Cloudera Developer Training for Apache Hadoop

Cloudera Developer Training for Apache Hadoop Course Description

Duration: 4 days (32 hours)

In this course, you will learn to build powerful data processing applications. You will learn about MapReduce and the Hadoop Distributed File System (HDFS), write MapReduce code, and cover best practices for Hadoop development, debugging, and implementation of workflows.

This course covers concepts addressed on the Cloudera Certified Developer for Apache Hadoop (CCDH) exam.

Next Class Dates

Contact us to customize this class with your own dates, times and location. You can also call 1-888-563-8266 or chat live with a Learning Consultant.

Intended Audience for this Cloudera Developer Training for Apache Hadoop Course

  • Developers who need to write and maintain Apache Hadoop applications.

Course Prerequisites for Cloudera Developer Training for Apache Hadoop

Cloudera Developer Training for Apache Hadoop Course Objectives

  • Understand the internals of MapReduce and HDFS and write MapReduce code
  • Apply best practices for Hadoop development, debugging, and implementation of workflows and common algorithms
  • Leverage Hive, Pig, Sqoop, Flume, Oozie, and other Hadoop ecosystem projects
  • Create custom components such as WritableComparables and InputFormats to manage complex data types
  • Write and execute joins to link data sets in MapReduce
  • Work with advanced Hadoop API features required for real-world data analysis

Cloudera Developer Training for Apache Hadoop Course Outline

      1. Motivation for Hadoop
        1. Problems with Traditional Large-Scale Systems
        2. Requirements for a New Approach
      2. Hadoop: Basic Concepts
        1. Hadoop Distributed File System (HDFS)
        2. MapReduce
        3. Anatomy of a Hadoop Cluster
        4. Other Hadoop Ecosystem Components
      3. Writing a MapReduce Program
        1. MapReduce Flow
        2. Examining a Sample MapReduce Program
        3. Basic MapReduce API Concepts
        4. Driver Code
        5. Mapper
        6. Reducer
        7. Streaming API
        8. Using Eclipse for Rapid Development
        9. New MapReduce API
      4. Integrating Hadoop into the Workflow
        1. Relational Database Management Systems
        2. Storage Systems
        3. Importing Data from a Relational Database Management System with Sqoop
        4. Importing Real-Time Data with Flume
        5. Accessing HDFS Using FuseDFS and Hoop
      5. Delving Deeper into the Hadoop API
        1. ToolRunner
        2. Testing with MRUnit
        3. Reducing Intermediate Data with Combiners
        4. The configure and close Methods for Map/Reduce Setup and Teardown
        5. Writing Partitioners for Better Load Balancing
        6. Directly Accessing HDFS
        7. Using the Distributed Cache
      6. Common MapReduce Algorithms
        1. Sorting and Searching
        2. Indexing
        3. Machine Learning with Mahout
        4. Term Frequency
        5. Inverse Document Frequency
        6. Word Co-Occurrence
      7. Using Hive and Pig
        1. Hive Basics
        2. Pig Basics
      8. Practical Development Tips and Techniques
        1. Debugging MapReduce Code
        2. Using LocalJobRunner Mode for Easier Debugging
        3. Retrieving Job Information with Counters
        4. Logging
        5. Splittable File Formats
        6. Determining the Optimal Number of Reducers
        7. Map-Only MapReduce Jobs
      9. Advanced MapReduce Programming
        1. Custom Writables and WritableComparables
        2. Saving Binary Data Using SequenceFiles and Avro Files
        3. Creating InputFormats and OutputFormats
      10. Joining Data Sets in MapReduce
        1. Map-Side Joins
        2. Secondary Sort
        3. Reduce-Side Joins
      11. Graph Manipulation in Hadoop
        1. Graph Techniques
        2. Representing Graphs in Hadoop
        3. Implementing a Sample Algorithm: Single Source Shortest Path
      12. Creating Workflows with Oozie
        1. Motivation for Oozie
        2. Workflow Definition Format

Do you have the right background for Cloudera Developer Training for Apache Hadoop?

Skills Assessment

We ensure your success by asking all students to take a FREE Skill Assessment test. These short, instructor-written tests objectively measure your current skills and help us determine whether you can meet your goals by attending this course at your current skill level. If we determine that you need additional preparation or training to gain the most value from this course, we will recommend cost-effective ways to get ready for it.

Our required skill assessments ensure that:

  1. All students in the class are at a comparable skill level, so the class can run smoothly without beginners slowing it down for everyone else.
  2. NetCom students enjoy one of the industry's highest success rates and, when a certification exam is involved, pass rates.
  3. We stay committed to providing you real value. Again, your success is paramount; we will register you only if you have the skills to succeed.

This assessment is for your benefit and is best taken without any preparation or reference materials, so your skills can be objectively measured.

Award-Winning, World-Class Instructors

Jose P.
Jose Marcial Portilla has a BS and MS in Mechanical Engineering from Santa Clara University. He is skilled in analyzing data, specifically using Python and a variety of modules and libraries. He hopes to use his experience in teaching and data science to help other people learn the power of the Python programming language and its ability to analyze data, as well as present the data in clear and beautiful visualizations. He is the creator of some of the most popular Python courses on Udemy, including "Learning Python for Data Analysis and Visualization" and "The Complete Python Bootcamp". With almost 30,000 enrollments, Jose has been able to teach Python and its data science libraries to thousands of students. Jose is also a published author, having recently written "NumPy Succinctly" for Syncfusion's series of e-books.

Client Testimonials & Reviews about their Learning Experience

We are passionate about delivering the best learning experience for our students, and they are happy to share their experiences with us.
Read what students had to say about their experience at NetCom.
