Big Data Hadoop

Category: New Courses

Course Overview

Program Code: NICE-NC-08
Eligibility: 10th / +2 Passed
Duration: 2 Months

Course Structure

1. Introduction to Hadoop

  • High Availability
  • Scaling
  • Advantages and Challenges 

2. Introduction to Big Data

  • What is Big Data?
  • Big Data Opportunities
  • Big Data Challenges
  • Characteristics of Big Data 

3. Introduction to Hadoop

  • Hadoop Distributed File System
  • Comparing Hadoop & SQL.
  • Industries using Hadoop.
  • Data Locality.
  • Hadoop Architecture.
  • Map Reduce & HDFS.
  • Using the Hadoop single node image (Clone). 

4. The Hadoop Distributed File System (HDFS)

  • HDFS Design & Concepts
  • Blocks, Name Nodes and Data Nodes
  • HDFS High-Availability and HDFS Federation.
  • Hadoop DFS: The Command-Line Interface
  • Basic File System Operations
  • Anatomy of File Read
  • Anatomy of File Write
  • Block Placement Policy and Modes
  • Configuration files in detail.
  • Metadata, FS image, Edit log, Secondary Name Node and Safe Mode.
  • How to add a new Data Node dynamically.
  • How to decommission a Data Node dynamically (without stopping the cluster).
  • FSCK utility (block report).
  • How to override the default configuration at the system level and the programming level.
  • HDFS Federation.
  • ZooKeeper Leader Election Algorithm.
  • Exercises and a small use case on HDFS.
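The block and replica rules covered in this module can be illustrated with a short Python sketch. It assumes the Hadoop 2.x+ defaults of a 128 MB block size and a replication factor of 3, plus a simplified form of the default rack-aware placement rule: when the writer is itself a Data Node, the first replica goes on that node, the second on a node in a different rack, and the third on another node in that same remote rack. The real Name Node also weighs free space and load, which this sketch ignores; the rack and node names are hypothetical.

```python
import random

BLOCK_SIZE = 128 * 1024 * 1024  # default dfs.blocksize since Hadoop 2.x (128 MB)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the length in bytes of each block a file of file_size bytes occupies.

    The last block holds only the leftover bytes; HDFS does not pad it to full size.
    """
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(writer_node, topology, rng=random):
    """Pick 3 Data Nodes (the default replication factor) for one block.

    Simplified default rack-aware rule, assuming the writer is itself a Data Node:
    1st replica on the writer's node, 2nd on a node in a different rack,
    3rd on another node in that same remote rack.
    topology maps a rack name to its node names (hypothetical example data).
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # Only racks other than the writer's, with at least two nodes, can host replicas 2 and 3.
    remote_racks = [r for r, nodes in topology.items()
                    if r != rack_of[writer_node] and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("need another rack with at least two Data Nodes")
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer_node, second, third]
```

Under these assumptions a 300 MB file occupies two full blocks and one 44 MB block, and with replication 3 the cluster stores nine block replicas for it.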

5. Map Reduce

  • Functional Programming Basics.
  • Map and Reduce Basics
  • How Map Reduce Works
  • Anatomy of a Map Reduce Job Run
  • Legacy Architecture -> Job Submission, Job Initialization, Task Assignment, Task Execution, Progress and Status Updates
  • Job Completion, Failures
  • Shuffling and Sorting
  • Splits, Record reader, Partition, Types of partitions & Combiner
  • Optimization Techniques -> Speculative Execution, JVM Reuse and Number of Slots.
  • Types of Schedulers and Counters.
  • Comparisons between Old and New API at code and Architecture Level.
  • Getting the data from RDBMS into HDFS using Custom data types.
  • Distributed Cache and Hadoop Streaming (Python, Ruby and R).
  • YARN.
  • Sequence Files and Map Files.
  • Enabling Compression Codecs.
  • Map side Join with distributed Cache.
  • Types of I/O Formats: MultipleOutputs, NLineInputFormat.
  • Handling small files using CombineFileInputFormat.
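The map, combine, partition, shuffle/sort and reduce stages listed above can be traced end to end with a small in-memory word count. This is a single-process sketch, not Hadoop code: the partitioner uses `zlib.crc32` as a stand-in for Hadoop's `HashPartitioner` (which works from the key's `hashCode()`), so a given key may land on a different reducer than it would on a real cluster.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def map_fn(line):
    # Mapper: emit (word, 1) for every whitespace-separated word, lower-cased.
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    # Reducer; also reused as the combiner because summing is associative and commutative.
    yield key, sum(values)


def partition(key, num_reducers):
    # Stand-in for Hadoop's HashPartitioner. crc32 is used because Python's built-in
    # str hash is randomized per process, which would make runs non-reproducible.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers=2):
    """Run word count over a list of input splits (each split is a list of lines)."""
    partitions = [[] for _ in range(num_reducers)]
    for split in splits:  # one map task per input split
        map_out = sorted(kv for line in split for kv in map_fn(line))
        # Combiner: pre-aggregate this map task's output before it is shuffled.
        for key, group in groupby(map_out, key=itemgetter(0)):
            for k, v in reduce_fn(key, (count for _, count in group)):
                partitions[partition(k, num_reducers)].append((k, v))
    results = {}
    for part in partitions:  # one reduce task per partition
        part.sort(key=itemgetter(0))  # shuffle & sort: bring equal keys together
        for key, group in groupby(part, key=itemgetter(0)):
            for k, v in reduce_fn(key, (count for _, count in group)):
                results[k] = v
    return results
```

Because every occurrence of a key is routed to the same partition, each word is reduced exactly once, and the final counts do not depend on the number of reducers.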

6. Map/Reduce Programming – Java Programming

  • Hands-on “Word Count” in Map/Reduce in standalone and pseudo-distributed mode.
  • Sorting files using Hadoop Configuration API discussion
  • Emulating “grep” for searching inside a file in Hadoop
  • DBInputFormat
  • Job Dependency API discussion
  • Input Format API discussion
  • Input Split API discussion
  • Custom Data type creation in Hadoop.
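The “grep” exercise is typically written as a map-only job (in the Java API, `job.setNumReduceTasks(0)`), where the mapper emits only the matching lines. The Python sketch below mimics that flow under the assumption that each record is keyed by its starting byte offset, as `TextInputFormat` does; it reads a string instead of an HDFS file, and the function names are hypothetical.

```python
import re


def records(text):
    """Yield (byte_offset, line) pairs, keyed the way TextInputFormat keys lines:
    by the byte offset at which each line starts."""
    offset = 0
    for raw in text.encode("utf-8").splitlines(keepends=True):
        yield offset, raw.decode("utf-8").rstrip("\r\n")
        offset += len(raw)


def grep_mapper(offset, line, pattern):
    # Map-only job: emit the record only when the line matches; no reducer runs.
    if pattern.search(line):
        yield offset, line


def run_grep(text, regex):
    pattern = re.compile(regex)
    return [kv for off, line in records(text) for kv in grep_mapper(off, line, pattern)]
```

With no reduce phase, each map task's output is written out directly, so matches from different splits are not merged or sorted.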

7. NoSQL

  • ACID in RDBMS and BASE in NoSQL.
  • CAP Theorem and Types of Consistency.
  • Types of NoSQL Databases in detail.
  • Columnar Databases in Detail (HBASE and CASSANDRA).
  • TTL, Bloom Filters and Compaction.
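HBase and Cassandra keep a Bloom filter for their on-disk data files so that a read can skip files that definitely do not contain a row key. A minimal sketch of the data structure follows; it derives its bit positions from salted SHA-256 digests purely for simplicity (production implementations use faster non-cryptographic hashes), and the sizes are arbitrary illustrative values.

```python
import hashlib


class BloomFilter:
    """Bit-array membership test: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # True only if every bit for this key is set.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

A `False` answer is always correct; a `True` answer can be a false positive, and the false-positive rate grows as more keys are added to a fixed number of bits.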

8. HBase

  • HBase Installation
  • HBase concepts
  • HBase Data Model and Comparison between RDBMS and NOSQL.
  • Master & Region Servers.
  • HBase Operations (DDL and DML) through Shell and Programming and HBase Architecture.
  • Catalog Tables.
  • Block Cache and sharding.
  • SPLITS.
  • DATA Modeling (Sequential, Salted, Promoted and Random Keys).
  • Java APIs and REST Interface.
  • Client-side Buffering, and processing 1 million records using client-side buffering.
  • HBASE Counters.
  • Enabling Replication and HBASE RAW Scans.
  • HBASE Filters.
  • Bulk Loading and Coprocessors (Endpoints and Observers with programs).
  • Real-world use case combining HDFS, MR and HBASE.
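Salting, one of the row-key designs listed above, prefixes each key with a small bucket number derived from the key itself, so monotonically increasing keys such as timestamps are spread across several regions instead of all landing on the last one. The sketch below assumes 4 buckets, a CRC32-based salt and a two-digit `NN-` prefix; all three are illustrative choices, not HBase defaults.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical; often matched to the number of pre-split regions


def salted_key(row_key: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix a row key with a deterministic salt so that sequential keys
    (e.g. timestamps) are spread across regions instead of hot-spotting one."""
    salt = zlib.crc32(row_key.encode("utf-8")) % buckets
    return f"{salt:02d}-{row_key}"


def scan_prefixes(buckets: int = NUM_BUCKETS):
    # A range read over the original keys needs one scan per salt prefix.
    return [f"{b:02d}-" for b in range(buckets)]
```

The trade-off is on the read side: a scan over a range of original keys must be issued once per prefix and the results merged by the client.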

9. Hive

  • Installation
  • Introduction and Architecture.
  • Hive Services, Hive Shell, Hive Server and Hive Web Interface (HWI)
  • Meta store
  • Hive QL
  • OLTP vs. OLAP
  • Working with Tables.
  • Primitive data types and complex data types.
  • Working with Partitions.
  • User Defined Functions
  • Hive Bucketed Tables and Sampling.
  • External partitioned tables, Map the data to the partition in the table, Writing the output of one query to another table, Multiple inserts
  • Dynamic Partition
  • Differences between ORDER BY, DISTRIBUTE BY and SORT BY.
  • Bucketing and Sorted Bucketing with Dynamic partition.
  • RC File.
  • INDEXES and VIEWS.
  • MAPSIDE JOINS.
  • Compression on hive tables and Migrating Hive tables.
  • Dynamic substitution in Hive and different ways of running Hive
  • How to enable Update in HIVE.
  • Log Analysis on Hive.
  • Access HBASE tables using Hive.
  • Hands-on Exercises
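The difference between `ORDER BY` and `DISTRIBUTE BY` with `SORT BY` comes down to how rows reach reducers: in classic Hive on MapReduce, `ORDER BY` produces a total order by passing all rows through a single reducer, while `DISTRIBUTE BY` routes rows to reducers by key and `SORT BY` orders rows only within each reducer. The in-memory sketch below models each reducer's output as a list; it routes rows with CRC32, whereas Hive uses its own hash function, so the reducer a given key lands on will differ.

```python
import zlib


def order_by(rows, key):
    # ORDER BY: one reducer receives every row and emits a single, totally ordered output.
    return [sorted(rows, key=key)]


def distribute_sort_by(rows, dist_key, sort_key, num_reducers):
    # DISTRIBUTE BY: rows with the same dist_key value always go to the same reducer.
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        reducers[zlib.crc32(str(dist_key(row)).encode("utf-8")) % num_reducers].append(row)
    # SORT BY: each reducer sorts only its own rows, so there is no global order.
    return [sorted(part, key=sort_key) for part in reducers]
```

Hive's `CLUSTER BY col` is shorthand for `DISTRIBUTE BY col SORT BY col`.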
 

11. Pig

  • Installation
  • Execution Types
  • Grunt Shell
  • Pig Latin
  • Data Processing
  • Schema on read
  • Primitive data types and complex data types.
  • Tuple schema, BAG Schema and MAP Schema.
  • Loading and Storing
  • Filtering
  • Grouping & Joining
  • Debugging commands (ILLUSTRATE and EXPLAIN).
  • Validations in Pig.
  • Type casting in Pig.
  • Working with Functions
  • User Defined Functions
  • Types of JOINS in Pig and Replicated Join in detail.
  • SPLIT and Multiquery execution.
  • Error Handling, FLATTEN and ORDER BY.
  • Parameter Substitution.
  • Nested FOREACH.
  • User Defined Functions, Dynamic Invokers and Macros.
  • How to access HBASE using Pig.
  • How to load and write JSON data using Pig.
  • Piggybank.
  • Hands-on Exercises
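In Pig, a replicated (fragment-replicate) join is requested with `USING 'replicated'`, and the relations listed after the first one must be small enough to fit in memory. The Python sketch below shows the idea behind it, which is also the map-side join with the distributed cache from the Map Reduce module: build a hash table from the small relation and stream the large one past it, so no reduce phase is needed. The sample relations and function name are hypothetical.

```python
def replicated_join(big, small, big_key, small_key):
    """Inner join where the small relation is held in an in-memory hash table
    (in Pig, replicated to every map task) and the big relation is streamed."""
    table = {}
    for row in small:
        table.setdefault(small_key(row), []).append(row)
    for row in big:  # in Pig, each map task streams one split of the big relation
        for match in table.get(big_key(row), []):
            yield row + match
```

Rows of the big relation with no match are dropped, as in an inner join, and the output keeps the big relation's input order.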

12. IC-WEB CLIENT (Customer Interaction Center)

  • Overview of IC-WEB Client
  • Account Identification
  • Customizing IC-WEB Client Profiles
    • IC Manager
    • IC Agent
  • Agent Inbox
  • E-Mail Response Management System (ERMS) and Order Routing with Rule Modeler
    • IC-Functions
    • Interactive Scripting
    • Broadcast Messaging
    • Call List Management

13. SQOOP

  • Installation
  • Import Data (full table, only a subset, target directory, protecting the password, file formats other than CSV, compressing, controlling parallelism, importing all tables)
  • Incremental Import (importing only new data, last imported data, storing the password in the metastore, sharing the metastore between Sqoop clients)
  • Free-form Query Import
  • Export data to RDBMS, HIVE and HBASE
  • Hands-on Exercises.
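Sqoop's incremental append mode (`--incremental append --check-column <col> --last-value <n>`) imports only rows whose check column is greater than the last imported value, and a saved job in the Sqoop metastore records the new last value after each run. The sketch below reproduces that logic against an in-memory SQLite table using Python's standard `sqlite3` module; it is not Sqoop itself, and the function name and `orders` table are hypothetical.

```python
import sqlite3


def incremental_append(conn, table, check_column, last_value):
    """Mimic `sqoop import --incremental append`: fetch only rows whose check column
    exceeds last_value and return them with the last value to use on the next run."""
    # table and check_column are interpolated directly; use trusted identifiers only.
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}",
        (last_value,),
    )
    col = [d[0] for d in cur.description].index(check_column)
    rows = cur.fetchall()
    new_last = rows[-1][col] if rows else last_value
    return rows, new_last
```

Rows are ordered by the check column, so the last row fetched carries the new high-water mark; an empty run leaves the stored value unchanged.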

14. HCATALOG

  • Installation.
  • Introduction to HCATALOG.
  • About HCatalog with PIG, HIVE and MR.
  • Hands-on Exercises.

16. FLUME

  • Installation
  • Introduction to Flume
  • Flume Agents: Sources, Channels and Sinks
  • Log user information from a Java program into HDFS using LOG4J and Avro Source
  • Log user information from a Java program into HDFS using Tail Source
  • Log user information from a Java program into HBASE using LOG4J and Avro Source
  • Log user information from a Java program into HBASE using Tail Source
  • Flume Commands

  Use case of Flume: Flume the data from Twitter into HDFS and HBASE. Do some
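A Flume agent moves events from a source through a channel to a sink, and each event carries a byte-array body plus optional string headers. The sketch below models that flow in plain Python: a bounded queue plays the role of a memory channel, a list of lines plays the tailed log, and a list plays the HDFS file. Flume's channel transactions, which make each batch put or take all-or-nothing, are deliberately left out, and all class and function names are hypothetical.

```python
import queue


class MemoryChannel:
    """Bounded buffer between a source and a sink, loosely modelled on a memory channel."""

    def __init__(self, capacity=100):
        self._q = queue.Queue(maxsize=capacity)

    def put(self, event):
        self._q.put_nowait(event)  # raises queue.Full when the channel is at capacity

    def take(self):
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None


def tail_source(lines, channel):
    # Source: wrap each new log line as an event (headers + byte body) and hand it on.
    for line in lines:
        channel.put({"headers": {"source": "tail"}, "body": line.encode("utf-8")})


def hdfs_like_sink(channel, batch_size, written):
    # Sink: drain up to batch_size events per call; `written` stands in for an HDFS file.
    for _ in range(batch_size):
        event = channel.take()
        if event is None:
            break
        written.append(event["body"].decode("utf-8"))
```

The bounded channel is what decouples the two sides: a fast source fills it until it is full, and the sink drains it in batches at its own pace.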

17. More Ecosystems

  • Hue (Hortonworks and Cloudera).

18. Oozie

  • Workflow nodes (Start, Action, End, Kill, Fork and Join), Schedulers, Coordinators and Bundles.
  • Workflows showing how to schedule Sqoop, Hive, MR and Pig jobs.
  • Real-world use case that finds the top websites used by users of certain ages, scheduled to run every hour.
  • ZooKeeper
  • HBASE Integration with HIVE and PIG.
  • Phoenix
  • Proof of concept (POC).
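An Oozie workflow is a directed acyclic graph of actions defined in `workflow.xml`, where a fork node starts several actions in parallel and the matching join node waits for all of them. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to compute which actions could run together in a hypothetical Sqoop → Hive/Pig → MR pipeline; it only models dependency order, not Oozie's XML, retries or coordinator scheduling.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions that must finish first.
WORKFLOW = {
    "sqoop-import": set(),
    "hive-agg": {"sqoop-import"},               # fork: hive-agg and pig-clean both
    "pig-clean": {"sqoop-import"},              # start once sqoop-import succeeds
    "mr-report": {"hive-agg", "pig-clean"},     # join: waits for both forked branches
}


def execution_waves(dag):
    """Group actions into waves; actions in the same wave could run in parallel."""
    ts = TopologicalSorter(dag)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)
    return waves
```

The middle wave corresponds to the fork, and the last wave can only start after the join, mirroring how the forked branches must both complete before the workflow continues.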

19. SPARK

  • Overview
  • Linking with Spark
  • Initializing Spark
    • Using the Shell
  • Resilient Distributed Datasets (RDDs)
    • Parallelized Collections
    • External Datasets
    • RDD Operations (Basics, Passing Functions to Spark, Working with Key-Value Pairs, Transformations, Actions)
    • RDD Persistence (Which Storage Level to Choose?, Removing Data)
  • Shared Variables
    • Broadcast Variables
    • Accumulators
  • Deploying to a Cluster
  • Unit Testing
  • Migrating from pre-1.0 Versions of Spark
  • Where to Go from Here
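The split between transformations and actions is what makes RDDs lazy: a transformation such as `map` or `filter` only records a step in the lineage, and nothing runs until an action such as `collect` or `count`. The `ToyRDD` class below is a hypothetical single-machine model of that behaviour, not the PySpark API (in PySpark the same chain would start from `sc.parallelize(range(10))`); its source is a function so that the test can count how often the data is read.

```python
class ToyRDD:
    """Toy model of an RDD lineage: transformations record steps, actions run them."""

    def __init__(self, source, steps=()):
        self._source = source      # zero-argument function returning an iterable
        self._steps = tuple(steps)

    # Transformations: return a new ToyRDD and compute nothing yet.
    def map(self, f):
        return ToyRDD(self._source, self._steps + (("map", f),))

    def filter(self, pred):
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    # Actions: evaluate the whole recorded lineage against the source data.
    def collect(self):
        out = []
        for item in self._source():
            keep = True
            for kind, f in self._steps:
                if kind == "map":
                    item = f(item)
                elif not f(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out

    def count(self):
        return len(self.collect())
```

In this model the source is read once per action, because each action recomputes the lineage from scratch; in Spark, persisting the RDD (`rdd.persist()` or `rdd.cache()`) is what avoids that recomputation.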
