HDP Data Science

Overview

This course provides instruction on the theory and practice of data science, including machine learning and natural language processing. It introduces the core concepts behind today's most commonly used algorithms and shows how to apply them in practice. We'll discuss concepts and key algorithms in all of the major areas: Classification, Regression, Clustering, and Dimensionality Reduction, including a primer on Neural Networks. We'll focus on both single-server tools and frameworks (Python, NumPy, pandas, SciPy, Scikit-learn, NLTK, TensorFlow, Jupyter) and large-scale tools and frameworks (Spark MLlib, Stanford CoreNLP, TensorFlowOnSpark/Horovod/MLeap, Apache Zeppelin). Download the data sheet to view the full list of objectives and labs.

Prerequisites

Students must have experience with Python, Scala, and Spark, prior exposure to statistics and probability, and a basic understanding of big data and Hadoop principles. Brief reviews of these topics are offered, but students new to Hadoop are encouraged to attend the Apache Hadoop Essentials (HDP-123) and HDP Spark Developer (DEV-343) courses, as well as the language-specific introductory courses.


Target Audience


Architects, software developers, analysts, and data scientists who need to apply data science and machine learning on Spark/Hadoop.

Day 1

An Introduction to Data Science, SciKit-Learn, HDFS, Reviewing Spark Apps, DataFrames, and NoSQL

Objectives

  • Discuss aspects of Data Science, the team members, and the team roles
  • Discuss use cases for Data Science
  • Discuss the current State of the Art and its future direction
  • Review HDFS, Spark, Jupyter, and Zeppelin
  • Work with SciKit-Learn, Pandas, NumPy, Matplotlib, and Seaborn

Labs

  • Hello, ML w/ SciKit-Learn
  • Spark REPLs, Spark Submit, & Zeppelin Review
  • HDFS Review
  • Spark DataFrames and Files
  • NiFi Review

Algorithms in Spark ML and SciKit-Learn: Linear Regression, Logistic Regression, Support Vectors, Decision Trees
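As a reference point for the first algorithm in this list, linear regression with a single feature has a closed-form ordinary least squares solution: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The following is a minimal, dependency-free Python sketch of that formula under the assumption of one input feature; `fit_simple_linear_regression` is a hypothetical helper for illustration, not the SciKit-Learn or Spark ML API (both libraries provide their own `LinearRegression` estimators).

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (closed form).

    Hypothetical illustration helper; assumes exactly one input feature.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxx: spread of x around its mean; Sxy: co-variation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated exactly from y = 1 + 2x
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)  # 1.0 2.0
```

Because the sample data lie exactly on a line, the fit recovers its intercept and slope; with noisy data the same formula returns the least-squares estimates.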

K-Means & GMM Clustering, Essential TensorFlow, NLP with NLTK, NLP with Stanford CoreNLP
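K-Means clustering is commonly computed with Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points until the centroids stop changing. The sketch below is a minimal stdlib-only Python version that assumes the caller supplies the initial centroids. `kmeans` is a hypothetical helper, not the SciKit-Learn or Spark ML API; library implementations typically choose starting centroids with smarter seeding such as k-means++.

```python
import math


def kmeans(points, initial_centroids, max_iters=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    Hypothetical illustration helper; initial centroids are supplied by the caller.
    Returns the final centroids and the clusters from the last assignment step.
    """
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(c))))
            else:
                new_centroids.append(c)
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(centroids)
```

Fixing the initial centroids keeps the example deterministic. Gaussian Mixture Models (GMM), also listed above, generalize this idea by giving each point a soft, probabilistic cluster membership instead of a hard assignment.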

Hyperparameter Tuning, K-Fold Validation, Ensemble Methods, ML Pipelines in Spark ML
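K-fold validation splits the data into k folds. Each fold serves once as the validation set while the other k - 1 folds are used for training, and the k validation scores are averaged. This is also the usual way to compare hyperparameter settings. The minimal Python sketch below only generates the train/validation index splits. `k_fold_indices` is a hypothetical helper that assumes contiguous, unshuffled folds in which the first `n_samples % k` folds receive one extra sample, similar to the unshuffled layout of SciKit-Learn's `KFold`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds.

    Hypothetical illustration helper: folds are contiguous and unshuffled,
    and the first n_samples % k folds get one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        # Training set: every index outside the current validation fold.
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print(validation)
```

Every sample appears in exactly one validation fold, so each observation contributes to the averaged score once. In practice the data are often shuffled before splitting.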
