Big Data Administration

Hadoop Cluster Administration

Take this course to become job-ready through hands-on practice in Hadoop cluster administration, including single-node and multi-node cluster setup, and gain a thorough understanding of the HDFS and MapReduce architecture.

Learn to Plan, Install, Back Up & Secure a Hadoop Cluster from Experts

At Kovid Academy, you will find the training you need to become an effective Hadoop Big Data Administrator. Our curriculum is designed to give participants practical expertise in the field of Big Data administration.

This training course gives participants a comprehensive understanding of the nuts and bolts of a Hadoop cluster, enabling them to install, deploy, configure, upgrade, manage, monitor, tune and secure a distributed Hadoop cluster. It covers Hadoop cluster administration from the perspective of the configuration files in the vanilla Hadoop stack, and offers insight into Cloudera CDH and Apache Ambari. The course also includes quizzes, assignments and hands-on exercises, which help participants sharpen their skills further.

Latest Course Details

Coming Soon

Course Details

After successfully completing this course, you will have gained expertise in the following areas:

  1. Core components of Hadoop – YARN, MapReduce and HDFS
  2. Planning a Hadoop cluster in terms of the hardware and infrastructure based on the requirements
  3. Installing and configuring the Hadoop cluster using Cloudera Manager and Ambari with all the components
  4. Enabling High Availability for HDFS and the ResourceManager, and exploring HDFS Federation
  5. Contrasting MapR-FS and HDFS, and the configuration changes required to operate each
  6. Loading data from databases and streaming sources
  7. Managing the Service Level Agreements (SLAs) on a Multi-Tenant Distributed Hadoop Cluster by configuring Fair Scheduler or Capacity Scheduler
  8. Taking care of security, backups and high availability on a live Hadoop cluster
  9. Benchmarking a Hadoop cluster and best practices for maintaining a production Hadoop cluster
  10. Diagnosing and troubleshooting issues, and tuning performance, on a Hadoop cluster

This course is useful for aspiring Big Data Administrators, as well as for participants in roles such as Technical Analyst, System Administrator, Database Administrator (DBA), Server Administrator, Technical Support Executive, Programmer, Technical Consultant, Cloud Administrator and more.

Instructor-led training: 40 hours
Instructor interaction: Yes
Live support post-training: 1 year
Access to virtual machine: Lifetime
Access to Kovid cluster: 3 years
Kovid Academy Big Data Administrator certificate: Yes
30 CEU/PDU certificate: Yes

Module 1: Introduction to Big Data & Hadoop

  • Core fundamental concepts of working with Big Data
  • Understand the state of data and the need for distributed systems to store and process Big Data
  • Distributed architectures and software used in Big Data analytics
  • Case for Apache Hadoop
  • Hadoop distributions and ecosystem
  • Key skills required to embrace the role of a Hadoop Administrator

Module 2: Hadoop Core – HDFS & YARN

  • Distributed architecture of Hadoop
  • Hadoop Distributed File System
  • HDFS High Availability and Federation
  • File operations and read/write I/O handling in HDFS
  • Replication, Balancing, Rack Awareness in HDFS
  • HDFS Commands
  • Processing resource management using YARN
  • YARN daemons and architecture

Module 3: Hadoop Installation using Cloudera Manager

  • Prepare the Linux environment
  • Install Cloudera Manager
  • Install Cloudera CDH on multiple nodes of a Hadoop cluster using Cloudera Manager
  • Configurations using Cloudera Manager
  • Monitoring and Maintenance using Cloudera Manager

Module 4: Hadoop Installation using Ambari

  • Prepare the Linux environment
  • Install Ambari
  • Install Hortonworks HDP on multiple nodes of a Hadoop cluster using Ambari
  • Configurations using Ambari
  • Monitoring and Maintenance using Ambari
  • Comparison between MapR-FS and HDFS
  • Configuration differences in a MapR CDP and Hortonworks HDP

Module 5: Computation using YARN

  • YARN architecture
  • YARN configuration
  • Running applications on YARN
  • Application lifecycle management on YARN
  • Resource management and scheduling on YARN

Module 6: Configuration and Logging

  • Hadoop core configuration
  • Key configuration properties and values
  • Metrics collection and logging
  • Balancing the Hadoop cluster in terms of data and processing load

Module 7: Data Motion in HDFS

  • Mechanisms of storing data on HDFS
  • Using Sqoop to import data from, and export data to, databases
  • Using Flume to aggregate streaming data from log sources
  • Alternative tools for gathering streaming data, such as Kafka, Chukwa and Falcon

Module 8: Hadoop Cluster Planning

  • Planning a Hadoop Cluster
  • Key hardware and software configurations
  • Features comparison of leading Hadoop distributions from Cloudera, Hortonworks and MapR
  • Planning for integration of other software

Module 9: Configuring Hadoop Ecosystem components (Pig, Hive etc.)

  • Configuring the components of the Hadoop ecosystem
  • Users, resources and inter-process communication
  • Configuring NoSQL data stores to work with Hadoop
  • Configuring distributed coordination using ZooKeeper

Module 10: Hadoop Clients (Hue, Oozie etc.)

  • Configure Hadoop clients
  • Automate workflows

Module 11: Advanced Hadoop cluster configuration

  • Performance parameters
  • Resource utilization
  • Disaster Recovery

Module 12: Security configuration on a Hadoop cluster

  • Security configurations
  • Kerberos
  • Knox
  • Sentry
  • Ranger

Module 13: Resource management & cluster maintenance

  • Fair Scheduler, Capacity Scheduler
  • Distributed Copy (DistCp) and backups
  • HDFS Snapshots
  • Decommissioning nodes

Module 14: Monitoring, benchmarking and troubleshooting a Hadoop cluster

  • Monitoring resources and configuring alerts
  • Cluster benchmarking
  • Common troubleshooting & firefighting scenarios

Module 15: Conclusion & Project Discussion

  • Project discussions
  • Dissecting a live scenario
  • Planning and deploying a live cluster
  • Setting up security, high availability and disaster recovery
  • Setting up workflows and monitoring the cluster
  • Data acquisition and analytical workflow automation and monitoring
  • Rolling out upgrades on the cluster and troubleshooting issues

What are the prerequisites, if any, for this course?

Basic knowledge of Linux administration and a good understanding of general computing concepts are required to get started with the course.

What support is provided for preparing the prerequisites?

The basics of Linux administration will be covered in this course, and good references will also be shared with the participants.

What are the total training hours?

The course runs for 40 hours, spread over approximately 5 to 6 weeks. However, the number of hours may vary with the level of interaction the participants show during the training.

What are the details of the projects/assignments?

After completing each module, participants are required to undertake an assignment, which is evaluated the same day. In addition, as part of the training curriculum, participants work on a couple of projects, such as setting up the cluster in different modes like High Availability (HA) and Federation.

What training materials are provided?

For every module, materials and good references will be shared with the participants. In addition, all online training sessions will be recorded and made available in the LMS for future reference.

What are the system requirements for participants?

Participants are advised to have an i3 or higher processor with virtualization support, at least 4 GB of RAM (8 GB is recommended), a 64-bit operating system and about 100 GB of free hard disk space.
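
As a rough illustration, the requirements above can be expressed as a small Python check. This is only a sketch: the function names (`check_requirements`, `free_disk_gb`) are made up for this example, the RAM, 64-bit and virtualization values have to be supplied by the participant, and "about 100 GB" is treated here as a hard threshold of 100 GB, which is an assumption.

```python
import shutil

# Thresholds taken from the stated system requirements.
MIN_RAM_GB = 4            # minimum RAM
RECOMMENDED_RAM_GB = 8    # recommended RAM
MIN_FREE_DISK_GB = 100    # "about 100 GB"; treated as a hard limit (assumption)


def free_disk_gb(path="."):
    """Return the free disk space, in GB, on the volume that holds `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


def check_requirements(ram_gb, free_gb, is_64bit, has_virtualization):
    """Compare a machine against the course requirements.

    Returns (problems, warnings): `problems` lists unmet requirements,
    `warnings` lists recommendations that are not met.
    """
    problems = []
    warnings = []
    if not is_64bit:
        problems.append("a 64-bit operating system is required")
    if not has_virtualization:
        problems.append("processor virtualization support is required")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"at least {MIN_RAM_GB} GB of RAM is required")
    elif ram_gb < RECOMMENDED_RAM_GB:
        warnings.append(f"{RECOMMENDED_RAM_GB} GB of RAM is recommended")
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(f"about {MIN_FREE_DISK_GB} GB of free disk space is required")
    return problems, warnings


if __name__ == "__main__":
    # Example: a laptop with 4 GB RAM; disk space is measured, the rest is supplied by hand.
    problems, warnings = check_requirements(
        ram_gb=4, free_gb=free_disk_gb("."), is_64bit=True, has_virtualization=True
    )
    print("Problems:", problems)
    print("Warnings:", warnings)
```

A machine with exactly 4 GB of RAM passes the minimum but produces a warning, which mirrors the "8 GB is recommended" note.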

Is certification offered, and if so, how do you earn it?

Once the training is completed, there will be a certification examination and upon successful completion of the examination, certificates will be issued to the participants.

How many hours is a student expected to work?

This depends on the participants' experience and on how quickly they grasp the different modules. On average, we have noticed that participants need to spend twice the training hours on additional practice. For example, if the training is for 10 hours, participants need to spend an additional 20 hours. Also, the more time participants spend on a particular piece of software, the more comfortable they will become with it.
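
To make the arithmetic concrete, here is a minimal Python sketch of the time commitment. It assumes the "double the training hours" rule of thumb applies uniformly to the full 40-hour course spread over 5 to 6 weeks; the function name `study_plan` and its parameters are invented for this illustration, and actual effort will vary from one participant to another.

```python
def study_plan(training_hours, weeks, self_study_ratio=2.0):
    """Estimate the total time commitment for a course.

    `self_study_ratio` is the number of additional practice hours per
    training hour (2.0 reflects the "double the training hours" rule of thumb).
    """
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    self_study_hours = training_hours * self_study_ratio
    total_hours = training_hours + self_study_hours
    return {
        "self_study_hours": self_study_hours,
        "total_hours": total_hours,
        "hours_per_week": total_hours / weeks,
    }


if __name__ == "__main__":
    # The 10-hour example from the text: 20 additional hours, 30 in total.
    print(study_plan(10, weeks=1))
    # The full 40-hour course over 5 and over 6 weeks.
    print(study_plan(40, weeks=5))  # 80 extra hours, 120 total, 24.0 per week
    print(study_plan(40, weeks=6))  # 80 extra hours, 120 total, 20.0 per week
```

Under these assumptions, the 40-hour course works out to roughly 20 to 24 hours per week, training and practice combined.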