
Cloudera Developer for Apache Hadoop

Xebia's four-day developer training course delivers the key concepts and expertise participants need to create robust data processing applications using Apache Hadoop. From workflow implementation and working with APIs through writing MapReduce code and executing joins, Cloudera's training course is the best preparation for the real-world challenges faced by Hadoop developers.

Programme and Course Overview

Xebia University delivers a developer-focused Cloudera Certified training course that closely analyzes Hadoop's structure and provides hands-on exercises that teach you how to import data from existing sources; process data with a variety of techniques such as Java MapReduce programs and Hadoop Streaming jobs; and work with Apache Hive and Pig.

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

  • The internals of MapReduce and HDFS and how to write MapReduce code
  • Best practices for Hadoop development and debugging, and for implementing workflows and common algorithms
  • How to leverage Hive, Pig, Sqoop, Flume, Oozie, and other Hadoop ecosystem projects
  • Creating custom components such as WritableComparables and InputFormats to manage complex data types
  • Writing and executing joins to link data sets in MapReduce
  • Advanced Hadoop API topics required for real-world data analysis
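To give a flavour of the MapReduce model the topics above revolve around, here is a minimal word-count sketch in the style of a Hadoop Streaming job (a hypothetical illustration, not course material): the mapper emits key/value pairs, a sort stands in for the shuffle, and the reducer aggregates per key.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Emit (word, 1) for every word, like a streaming mapper reading stdin."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Sum counts per key; assumes input sorted by key, as Hadoop's shuffle guarantees."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    docs = ["Hadoop streams data", "data moves through Hadoop"]
    print(dict(reducer(mapper(docs))))
```

In a real Hadoop Streaming job the mapper and reducer would be separate scripts reading stdin and writing tab-separated lines, with the framework handling the sort and the distribution across nodes.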

Developer Certification

Certification is a great differentiator; it helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.
Learn more about the CCDH (Cloudera Certified Developer for Apache Hadoop) certification exam on Cloudera's website.

Target Group & Prerequisites

This course is best suited to developers and engineers who have programming experience. Knowledge of Java is strongly recommended and is required to complete the hands-on exercises.

Key Promises of this Training

After completing this course, participants will understand:

  • The core technologies of Hadoop.
  • How HDFS and MapReduce work.
  • How to develop MapReduce applications.
  • How to unit test MapReduce applications.
  • How to use MapReduce combiners, partitioners and the distributed cache.
  • Best practices for developing and debugging MapReduce applications.
  • How to implement data input and output in MapReduce applications.
  • Algorithms for common MapReduce tasks.
  • How to join data sets in MapReduce.
  • How Hadoop integrates into the data center.
  • How Hive, Impala and Pig can be used for rapid application development.
  • How to create large workflows using Oozie.
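One of the promises above, joining data sets in MapReduce, can be sketched outside Hadoop as a reduce-side join: the map phase tags each record with its source, grouping by key stands in for the shuffle, and the reduce phase pairs records that share a key. This is a hypothetical illustration with made-up data, not course material.

```python
from collections import defaultdict

def reduce_side_join(users, orders):
    """Sketch of a reduce-side join between (user_id, name) and (user_id, item) records."""
    # Map phase: tag each record with its source and group by the join key.
    grouped = defaultdict(lambda: {"user": [], "order": []})
    for user_id, name in users:
        grouped[user_id]["user"].append(name)
    for user_id, item in orders:
        grouped[user_id]["order"].append(item)
    # Reduce phase: for each key, emit the cross product of both sources.
    joined = []
    for user_id, buckets in grouped.items():
        for name in buckets["user"]:
            for item in buckets["order"]:
                joined.append((user_id, name, item))
    return joined
```

In a real MapReduce job the tagging happens in two mappers (one per input), and each reducer invocation sees all tagged records for one key, so no global in-memory table is needed.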


Course Outline

  • Introduction
  • The Motivation for Hadoop
  • Hadoop: Basic Concepts and HDFS
  • Introduction to MapReduce
  • Hadoop Clusters and the Hadoop Ecosystem
  • Writing a MapReduce Program in Java
  • Writing a MapReduce Program Using Streaming
  • Delving Deeper into the Hadoop API
  • Practical Development Tips and Techniques
  • Partitioners and Reducers
  • Data Input and Output
  • Implementing Custom InputFormats and OutputFormats
  • Frequency
  • Joining Data Sets in MapReduce Jobs
  • Integrating Hadoop into the Enterprise Workflow
  • An Introduction to Hive, Impala, and Pig
  • An Introduction to Oozie
  • Conclusion

Please note that you need to bring your own laptop to this training. It should meet the following requirements:

  • At least 4GB RAM;
  • 15GB of free hard disk space;
  • VMware Player 5.x or above (Windows) / VMware Fusion 4.x or above (Mac);
  • Your laptop must support a 64-bit VMware guest image. If it is running a 64-bit version of Windows, or Mac OS X on a Core 2 Duo processor or later, no further test is required. Otherwise, VMware provides a compatibility-checking tool that can be downloaded from its website;
  • Your laptop must have VT-x virtualization support enabled in the BIOS;
  • If running Windows XP: 7-Zip or WinZip is needed (due to a bug in Windows XP's built-in Zip utility).