|
I. Introduction |
|
A. Hadoop history, concepts |
|
B. Ecosystem |
|
C. Distributions |
|
D. High level architecture |
|
E. Hadoop challenges (hardware / software) |
|
Hands on session: Preparing to Install Hadoop
|
|
|
II. Planning and Installation |
|
A. Selecting software, Hadoop distributions |
|
B. Sizing the cluster, planning for growth |
|
C. Rack topology |
|
D. Installation of Core Hadoop and Ecosystem tools |
|
E. Directory structure, logs |
|
Hands on Session: Cluster installation |
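The rack topology covered in item C is normally supplied to Hadoop through a topology script referenced by the `net.topology.script.file.name` property in core-site.xml. A minimal sketch is shown below; the subnet-to-rack mapping and rack names are assumptions for illustration:

```shell
#!/bin/sh
# Sketch of a rack topology script. Hadoop invokes it with one or
# more host names/IPs as arguments and expects exactly one rack
# path per argument on stdout.
rack_for_hosts() {
  for host in "$@"; do
    case "$host" in
      10.1.1.*) echo "/dc1/rack1" ;;   # subnet-to-rack mapping is an assumption
      10.1.2.*) echo "/dc1/rack2" ;;
      *)        echo "/default-rack" ;; # Hadoop's conventional fallback rack
    esac
  done
}
rack_for_hosts "$@"
```

Hosts that match no rule land in `/default-rack`, which is also what Hadoop assumes when no script is configured at all.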
|
|
|
III. HDFS |
|
A. Concepts (horizontal scaling, block replication, data locality, rack awareness) |
|
B. Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode) |
|
C. Health monitoring |
|
D. Command-line and browser-based administration |
|
E. Adding storage, replacing defective drives - Commissioning / Decommissioning of DataNodes
|
Hands on Session: getting familiar with HDFS commands |
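The block-replication concepts from item A reduce to simple arithmetic that is worth working through once. A sketch, assuming a 1 GB file and the stock defaults of a 128 MB block size and replication factor 3 (the real values come from `dfs.blocksize` and `dfs.replication`):

```shell
# Back-of-the-envelope block math for a single HDFS file.
file_mb=1024       # a 1 GB file (assumption for the example)
block_mb=128       # default dfs.blocksize
replication=3      # default dfs.replication

blocks=$(( (file_mb + block_mb - 1) / block_mb ))   # ceiling division
replicas=$(( blocks * replication ))                # total block replicas
raw_mb=$(( file_mb * replication ))                 # raw disk consumed

echo "$blocks blocks, $replicas replicas, ${raw_mb} MB raw storage"
```

On a live cluster, `hdfs fsck /path -files -blocks` reports the actual block layout of a file, and `hdfs dfsadmin -report` summarizes capacity per DataNode.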
|
|
|
IV. MapReduce 2
|
A. MapReduce 1

B. Terminology and Data Flow (Map - Shuffle - Reduce)
|
C. YARN Architecture |
|
D. MapReduce Essential Configuration |
|
Hands on Session: MapReduce UI walkthrough
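The Map - Shuffle - Reduce flow from item B can be mimicked with ordinary shell pipes, which is essentially the contract Hadoop Streaming gives to shell mappers and reducers. A word-count sketch on hypothetical input:

```shell
# Word count as Map - Shuffle - Reduce using plain pipes.
input="the quick brown fox the lazy dog the end"

# Map: emit one record (word) per line
mapped=$(printf '%s\n' $input)

# Shuffle: sort so identical keys become adjacent
# (Hadoop sorts and groups between the map and reduce phases)
shuffled=$(printf '%s\n' "$mapped" | sort)

# Reduce: aggregate per key
reduced=$(printf '%s\n' "$shuffled" | uniq -c | awk '{print $2, $1}')

printf '%s\n' "$reduced"
```

The same division of labor applies in a real job: map tasks run in parallel per input split, the framework shuffles by key, and reduce tasks aggregate each key group.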
|
|
|
V. Schedulers |
|
A. Working with Jobs |
|
B. Scheduling Concepts |
|
C. FIFO Scheduler |
|
D. Fair Scheduler |
|
E. Capacity Scheduler - Configuration |
|
Hands on Session: Working with Schedulers |
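Capacity Scheduler configuration (item E) lives in capacity-scheduler.xml. A minimal sketch with two hypothetical queues, `dev` and `prod`; the property names are standard Capacity Scheduler properties, the queue names and percentages are assumptions:

```xml
<!-- capacity-scheduler.xml: two hypothetical queues under root -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>dev,prod</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>70</value>
  </property>
</configuration>
```

Queue capacities under a parent must sum to 100. Changes can be applied to a running ResourceManager with `yarn rmadmin -refreshQueues`, without a restart.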
|
|
|
VI. Data Ingestion & Security |
|
A. Flume for logs and other data ingestion into HDFS |
|
B. Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL |
|
C. Overview of Hive |
|
D. Copying data between clusters (distcp) |
|
E. Ranger installation and configuration for HDFS, Hive, HBase
|
Hands on session: setup and configure Flume, Sqoop, Ranger

Lab Tasks: Installing Hadoop with Ambari
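A Flume agent (item A) is defined in a plain properties file that wires a source, a channel, and a sink. A sketch for a hypothetical agent `a1` tailing an application log into HDFS; the property keys are standard Flume configuration keys, while the agent name, log path, and HDFS URL are assumptions:

```
# Hypothetical Flume agent "a1": tail a log file into HDFS
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type     = exec
a1.sources.r1.command  = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type     = memory
a1.channels.c1.capacity = 10000

a1.sinks.k1.type      = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/logs
a1.sinks.k1.channel   = c1
```

The agent is then started with `flume-ng agent` pointed at this file and the agent name (`-n a1`). Note that a memory channel trades durability for speed; a file channel survives agent restarts.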