
Cloudera Designing and Building Big Data Applications

Xebia's four-day course on designing and building big data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH). You will work through the entire process of designing and building solutions: ingesting data, choosing the appropriate file format for storage, processing the stored data, and presenting the results to end users in an easy-to-digest form. The course goes beyond MapReduce, using additional elements of the EDH to develop converged applications that address real business needs.

Take Your Knowledge to the Next Level and Solve Real-World Problems with Training for Hadoop and the Enterprise Data Hub


Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

  • Creating a data set with Kite SDK
  • Developing custom Flume components for data ingestion
  • Managing a multi-stage workflow with Oozie
  • Analyzing data with Crunch
  • Writing user-defined functions for Hive and Impala
  • Transforming data with Morphlines
  • Indexing data with Cloudera Search

Audience and Prerequisites

This course is best suited to developers, engineers, and architects who want to use Hadoop and related tools to solve real-world problems. Participants should have already attended Cloudera Developer Training for Apache Hadoop or have equivalent practical experience. Good knowledge of Java and basic familiarity with Linux are required. Experience with SQL is helpful.

CCP: Data Engineer Certification

This course is an excellent starting point for people working towards the CCP: Data Engineer certification. Although further study is required to pass the exam (we recommend Developer Training for Spark and Hadoop II: Advanced Techniques), this course covers many of the subjects tested in the CCP: Data Engineer exam.

Learn more about the CCP Certification Exam here:

Course Outline

  • Introduction
  • Application Architecture
  • Defining and Using Data Sets
  • Using the Kite SDK Data Module
  • Importing Relational Data with Apache Sqoop
  • Capturing Data with Apache Flume
  • Developing Custom Flume Components
  • Managing Workflows with Apache Oozie
  • Processing Data Pipelines with Apache Crunch
  • Working with Tables in Apache Hive
  • Developing User-Defined Functions
  • Executing Interactive Queries with Impala
  • Understanding Cloudera Search
  • Indexing Data with Cloudera Search
  • Presenting Results to Users
  • Conclusion

Please note that you need to bring your own laptop to this training. The laptop should meet the following requirements:

  • Minimum RAM: 8 GB
  • Minimum free disk space: 25 GB
  • VMware Player 6.x or later (Windows) or VMware Fusion 6.x or later (Mac)
  • VT-x virtualization support must be enabled in the BIOS.
  • If running Windows XP: 7-Zip or WinZip (due to a bug in the built-in Zip utility of Windows XP)
  • The machine must be able to run a 64-bit VMware guest image.
    • If the machine is running a 64-bit version of Windows, or Mac OS X on a Core 2 Duo processor or later, no further check is required. Otherwise, VMware provides a tool to check compatibility, which can be downloaded from