Big Data Foundation for Developers

A hands-on course for developers to learn the popular big data tools Hadoop, Hive, and Spark, including machine learning with Spark

Apache Hadoop, YARN, Hive, and Spark are popular big data tools used by many organizations to build big data analytics solutions. In this course, students develop big data applications with these tools to process data and derive valuable insights from it. By the end of the course, students will be able to set up a personal big data development environment; master the fundamental concepts of Hadoop, YARN, Hive, and Spark; copy data into and out of a big data cluster; process data using the Map/Reduce paradigm; run Map/Reduce and Spark jobs on YARN; process big data in Spark using the Scala programming language; use RDDs and DataFrames to process big data; store data in Parquet format; and use Spark's machine learning libraries to develop solutions such as decision trees, recommendation engines, linear regression, and anomaly detection.

What you’ll learn

  • Apache Hadoop, Hive, and Spark are very popular big data tools used by many organizations. Don’t let your skills become obsolete.
  • Upskill yourself with the in-demand big data and machine learning skills.
  • Practice with 20 demos and more than 50 practice activities that push you beyond what you learn in the class to become a big data developer.
  • You will implement machine learning techniques using Spark to solve business problems such as prediction, recommendation, and anomaly detection.
  • By the end of this course, you will be able to set up a big data cluster, copy data to it, and process that data with big data tools.
  • Query big data using Hive and process it through DataFrames in Spark.
  • Store data in Parquet format to take advantage of predicate pushdown, and chain multiple transformations of data, including windowing and pivoting.
  • Includes introduction to Scala for use with Spark.

Course Content

  • Introduction –> 15 lectures • 33min.
  • Lesson 2 Hadoop – HDFS –> 21 lectures • 1hr 1min.
  • Hadoop Map/Reduce –> 26 lectures • 56min.
  • YARN –> 14 lectures • 31min.
  • Hive –> 15 lectures • 36min.
  • Spark Scala –> 25 lectures • 1hr 21min.
  • Spark RDDs, Dataframes and SQL –> 37 lectures • 2hr 6min.
  • Spark Machine Learning –> 24 lectures • 1hr 15min.
  • Conclusion –> 5 lectures • 22min.

Requirements

This is a hands-on development course, and you will practice more than 50 activities along the way. Java knowledge is assumed; the fundamentals of Scala are taught so that you can write Scala code to process data in Spark. The course provides a foundation for developers who want to join big data development teams in their organizations.
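For a taste of the Map/Reduce paradigm the course covers, here is a minimal word-count sketch in plain Java, the language assumed as a prerequisite. It is an illustration only, not course material and not Hadoop code: it uses Java streams on an in-memory list rather than the Hadoop API, and the class name `WordCountSketch` and method `countWords` are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of Hadoop or of the course material.
class WordCountSketch {

    // Counts word occurrences in the style of a Map/Reduce job:
    //  - "map": split every line into lower-cased words
    //  - "shuffle": group identical words together
    //  - "reduce": sum a 1 for each occurrence within a group
    static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())      // drop empty tokens from leading whitespace
                .collect(Collectors.groupingBy(
                        word -> word,                 // grouping key: the word itself
                        TreeMap::new,                 // sorted map, so the output order is stable
                        Collectors.summingInt(word -> 1)));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data tools", "big data jobs");
        System.out.println(countWords(lines)); // prints {big=2, data=2, jobs=1, tools=1}
    }
}
```

On a real cluster the same three steps are spread across many machines, and the grouping step moves data over the network. This sketch only shows the shape of the computation on a single machine.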