Big Data Hadoop Admin


Course name – Hadoop Admin

Total course duration – 24 hours


Module Day 1 Content Mode Time
1 HDFS
  • Hadoop Cluster - Storage
  • HDFS Daemons
  • Hadoop Cluster - Storage - Conclusion
  • Hadoop Cluster - Data Processing
  • Other Computing Systems vs Hadoop
  • Major goals of HDFS Design
  • HDFS Federation
  • HDFS HA-Quorum Cluster
  • Role of HDFS Security - Kerberos
  • Data Serialization
  • File read and write paths
  • HDFS Commands
Hands-on 4 hours
2 YARN and MRv2
  • Introduction
  • Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects cluster settings
  • Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN daemons
  • Understand basic design strategy for MapReduce v2 (MRv2)
  • Determine how YARN handles resource allocations
  • Identify the workflow of a MapReduce job running on YARN
  • Deploy YARN and MRv2
  • Design strategy for MapReduce v2 (MRv2)
  • YARN Resource Allocations
  • Workflow of an MR job on YARN
  • Parameter files - MRv1 and MRv2 with YARN
Hands-on 4 hours
3 Hadoop Cluster Planning
  • Introduction
  • Principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster
  • Analyze the choices in selecting an OS
  • Understand kernel tuning and disk swapping
  • Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario
  • Given a scenario, determine the ecosystem components your cluster needs to run in order to fulfill the SLA
  • Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, and disk I/O
  • Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster
  • Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario
  • Choosing Hardware and Operating Systems
  • Cluster planning - OS and ecosystem
  • Cluster Planning - Hardware considerations
Hands-on 4 hours
4 Hadoop Cluster Installation and Administration
  • Introduction
  • Given a scenario, identify how the cluster will handle disk and machine failures
  • Analyze a logging configuration and logging configuration file format
  • Understand the basics of Hadoop metrics and cluster health monitoring
  • Identify the function and purpose of available tools for cluster monitoring
  • Be able to install all the ecosystem components in CDH5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig
  • Identify the function and purpose of available tools for managing the Apache Hadoop file system
  • Handle disk and machine failures
  • Logging Configuration
Hands-on 4 hours
5 Resource Management
  • Cluster Health Monitoring
  • Tools for Cluster Monitoring
  • Understand the overall design goals of each of the Hadoop schedulers
  • Given a scenario, determine how the FIFO Scheduler allocates cluster resources
  • Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN
  • Given a scenario, determine how the Capacity Scheduler allocates cluster resources
  • Schedulers Overview
  • FIFO Scheduler
  • Fair Scheduler
  • Capacity Scheduler
Hands-on 4 hours
6 Monitoring and Logging
  • Introduction
  • Analyze the NameNode and JobTracker Web UIs
  • Understand how to monitor cluster daemons
  • Identify and monitor CPU usage on master nodes
  • Describe how to monitor swap and memory allocation on all nodes
  • Identify how to view and manage Hadoop’s log files
  • Interpret a log file
Hands-on 4 hours