 |
I. Introduction

A. Hadoop history, concepts

B. Ecosystem

C. Distributions

D. High level architecture

E. Hadoop challenges (hardware / software)

Hands on Session: Preparing to install Hadoop
 |
 |
II. Planning and Installation

A. Selecting software, Hadoop distributions

B. Sizing the cluster, planning for growth

C. Rack topology

D. Installation of Core Hadoop and Ecosystem tools

E. Directory structure, logs

Hands on Session: Cluster installation
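To make the installation step concrete, here is a minimal sketch of the two site files an administrator typically edits when bringing up Core Hadoop. Property names are the standard Hadoop ones; the hostname and data directories are illustrative placeholders, not values from this course.

```xml
<!-- core-site.xml: where clients find the filesystem (hostname is illustrative) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://nn.example.com:8020</value>
</property>

<!-- hdfs-site.xml: replication factor and local disks used by each DataNode -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
</property>
```

Listing one directory per physical disk in `dfs.datanode.data.dir` is what lets HDFS spread blocks across drives, which ties into the drive-replacement topic in the HDFS section.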
 |
 |
III. HDFS

A. Concepts (horizontal scaling, block replication, data locality, rack awareness)

B. Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)

C. Health monitoring

D. Command-line and browser-based administration

E. Adding storage, replacing defective drives; commissioning / decommissioning of DataNodes

Hands on Session: Getting familiar with HDFS commands
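The hands-on session above could start from a command sketch like the following. These are standard `hdfs` CLI commands, but the paths and the replication value are illustrative, and all of them require a running cluster.

```shell
# Basic file operations (paths are illustrative)
hdfs dfs -mkdir -p /user/student/demo
hdfs dfs -put localfile.txt /user/student/demo/
hdfs dfs -ls /user/student/demo
hdfs dfs -cat /user/student/demo/localfile.txt

# Change the replication factor of one file
hdfs dfs -setrep 2 /user/student/demo/localfile.txt

# Administration: cluster capacity, live/dead DataNodes, block health
hdfs dfsadmin -report
hdfs fsck / -files -blocks
```

`dfsadmin -report` and `fsck` connect the command-line work back to the health-monitoring topic in item C.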
 |
 |
IV. MapReduce2

A. MapReduce1

B. Terminology and Data Flow (Map, Shuffle, Reduce)

C. YARN Architecture

D. MapReduce Essential Configuration

Hands on Session: MapReduce UI walkthrough
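For item D, a minimal sketch of the essential configuration that makes MapReduce run on YARN is shown below. The property names are the standard ones; the ResourceManager hostname is an illustrative placeholder.

```xml
<!-- mapred-site.xml: submit MapReduce jobs to YARN rather than classic MR1 -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml (hostname is illustrative) -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm.example.com</value>
</property>
<!-- NodeManager auxiliary service that serves map output during the shuffle -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
```

The `mapreduce_shuffle` auxiliary service is the configuration hook behind the Shuffle step in the Map, Shuffle, Reduce data flow of item B.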
 |
 |
V. Schedulers

A. Working with Jobs

B. Scheduling Concepts

C. FIFO Scheduler

D. Fair Scheduler

E. Capacity Scheduler configuration

Hands on Session: Working with Schedulers
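A minimal Capacity Scheduler sketch for item E: two queues dividing the cluster, using the standard `capacity-scheduler.xml` property names. The queue names and percentages are illustrative assumptions, not values prescribed by the course.

```xml
<!-- capacity-scheduler.xml: split the root queue into prod and dev -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,dev</value>
</property>
<!-- Guaranteed shares: queue capacities under one parent must sum to 100 -->
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.capacity</name>
  <value>30</value>
</property>
<!-- Elasticity: dev may borrow idle capacity, but never beyond 50% -->
<property>
  <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
  <value>50</value>
</property>
```

The capacity / maximum-capacity pair illustrates the key scheduling concept from item B: guaranteed minimum shares combined with elastic use of idle resources.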
 |
 |
VI. Data Ingestion & Security

A. Flume for logs and other data ingestion into HDFS

B. Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL

C. Overview of Hive

D. Copying data between clusters (distcp)

E. Ranger installation and configuration for HDFS, Hive, HBase

Hands on Session: Set up and configure Flume, Sqoop, and Ranger; lab tasks for installing Hadoop with Ambari
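The Sqoop and distcp topics in items B and D can be sketched with the commands below. The command forms are standard, but the JDBC URL, credentials, table names, paths, and NameNode hostnames are all illustrative placeholders, and everything here requires running clusters and a reachable database.

```shell
# Sqoop import: SQL table -> HDFS directory (connection details are illustrative)
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl \
  --password-file /user/etl/.db-password \
  --table orders \
  --target-dir /data/orders

# Sqoop export: HDFS directory -> SQL table
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --table orders_summary \
  --export-dir /data/orders_summary

# distcp: parallel copy between two clusters (NameNode hosts are illustrative)
hadoop distcp hdfs://nn1.example.com:8020/data/orders \
              hdfs://nn2.example.com:8020/backup/orders
```

distcp runs as a MapReduce job, so the scheduler configuration from section V governs how much cluster capacity an inter-cluster copy can consume.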