Tuesday, July 1, 2014

Learn Hadoop, MapReduce and BigData from Scratch

A Complete Guide to Learn and Master the Popular Big Data Technologies
The growth of data, both structured and unstructured, is a major technological challenge, and it therefore presents a great opportunity for IT and technology professionals worldwide. There is simply too much data and too few professionals to manage and analyze it. We have put together a comprehensive course that will help you master the concepts, technologies and processes involved in Big Data.
In this course we will primarily cover MapReduce and its most popular implementation, Apache Hadoop. We will also cover the Hadoop ecosystem and the practical concepts involved in handling very large data sets.
The MapReduce algorithm is used in Big Data to scale computations. Running in parallel, MapReduce jobs load a manageable chunk of data into RAM, perform some intermediate calculations, load the next chunk, and keep going until all of the data has been processed. In its simplest representation it can be broken down into a Map step, which takes a data set we can think of as ‘unstructured’, followed by a Reduce step, which outputs a ‘structured’ and often smaller data set.
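The two steps can be sketched in plain Java. This is a dependency-free illustration of the idea, not the Hadoop API: the map step emits (word, 1) pairs from raw text, and the reduce step sums them per key into a word count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapReduceSketch {

    // Map step: turn one line of 'unstructured' text into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum the counts for each word, producing a smaller,
    // 'structured' result set.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        // In Hadoop, mappers run in parallel, each over its own chunk of data.
        for (String line : new String[] {"big data is big", "data is everywhere"}) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // {big=2, data=2, is=2, everywhere=1}
    }
}
```

In a real Hadoop job the framework handles the splitting, shuffling and grouping between the two steps; here the whole pipeline fits in one process purely to show the shape of the computation.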
In its simplest sense, Hadoop is an implementation of the MapReduce algorithm.
The term Hadoop is a convenient shorthand. There is the Hadoop project at a high level; then there is the core selection of tools that Hadoop usually refers to, such as the Hadoop Distributed File System (HDFS), the HDFS shell and the HDFS protocol ‘hdfs://’. Beyond that there is a larger stack of tools that are becoming central to the use of Hadoop, often referred to as the ‘Hadoop Ecosystem’. These tools include, but are not limited to, HBase, Pig, Hive, Crunch, Mahout and Avro. Finally, there is the new Hadoop 2.2.x release, which implements a new architecture for MapReduce and allows for efficient workflows using a directed acyclic graph (DAG) of jobs, a significant evolution of the classic MapReduce job.
Finally, Hadoop is written in Java. In Hadoop 2.2 and the Hadoop Ecosystem we see Java's significant contribution to the evolution of distributed computing.
Prerequisites
1. Familiarity with programming in Java. You can take our Java course for free if you want to brush up your Java skills here.
2. Familiarity with Linux.
3. Have access to an Amazon EMR account.
4. Have Oracle VirtualBox or VMware installed and functioning.
What Will I Learn?
In this course you will learn key concepts in Hadoop and how to write your own Hadoop jobs and MapReduce programs.
The course will specifically facilitate the following high-level outcomes:
1. Become literate in Big Data terminology and Hadoop.
2. Given a big data scenario, understand the role of Hadoop in overcoming the challenges posed by the scenario.
3. Understand how Hadoop functions in both storing and processing Big Data.
4. Understand the difference between MapReduce version 1 in Hadoop version 1.x.x and MapReduce version 2 in Hadoop version 2.2.x.
5. Understand distributed file system architecture and implementations such as the Hadoop Distributed File System and the Google File System.
6. Analyze and implement a MapReduce workflow, and design Java classes for ETL (extract, transform and load) and UDFs (user-defined functions) within this workflow.
7. Perform data mining and filtering.
The course will specifically facilitate the following practical outcomes:
1. Use the HDFS shell.
2. Use the Cloudera, Hortonworks and Apache Bigtop virtual machines for Hadoop code development and testing.
3. Configure, execute and monitor a Hadoop Job.
4. Use Hadoop data types, readers, writers and splitters.
5. Write ETL and UDF classes for Hadoop workflows with Pig and Hive.
6. Write filters for data mining and processing with Mahout, Crunch and Avro.
7. Test Hadoop code on the Hortonworks Sandbox.
8. Run Hadoop code on Amazon EMR.
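To give a feel for the ETL/UDF outcome above: a classic Hive user-defined function is a Java class exposing an `evaluate` method, which Hive finds by reflection. The sketch below is dependency-free so it compiles on its own; a real Hive UDF would extend Hive's `org.apache.hadoop.hive.ql.exec.UDF` base class and typically use Hadoop's `Text` type instead of `String`. The class name and cleanup logic here are illustrative, not from the course.

```java
// A Hive-style UDF that strips punctuation and lower-cases a field,
// a typical cleanup step in an ETL workflow. (In a real Hive UDF this
// class would extend org.apache.hadoop.hive.ql.exec.UDF.)
public class NormalizeWord {

    // Hive resolves the method named 'evaluate' by reflection.
    public String evaluate(String input) {
        if (input == null) {
            return null; // Hive UDFs conventionally pass NULL through
        }
        return input.replaceAll("[^\\p{Alnum}\\s]", "").toLowerCase().trim();
    }
}
```

Once packaged in a jar and registered in Hive (e.g. `CREATE TEMPORARY FUNCTION normalize AS 'NormalizeWord';`), such a function can be called directly inside a HiveQL query.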
Category: Technology

CURRICULUM

  • SECTION 1: Introduction to Big Data
    • 1. Introduction to the Course (04:55)
    • 2. Why Hadoop, Big Data and MapReduce Part - A (10:44)
    • 3. Why Hadoop, Big Data and MapReduce Part - B (10:22)
    • 4. Why Hadoop, Big Data and MapReduce Part - C (12:34)
    • 5. Architecture of Clusters (19:09)
    • 6. Virtual Machines (VMs), Provisioning a VM with Vagrant and Puppet (15:54)
  • SECTION 2: Hadoop Architecture
    • 7. Set up a single-node Hadoop pseudo-cluster Part - A (11:08)
    • 8. Set up a single-node Hadoop pseudo-cluster Part - B (12:34)
    • 9. Set up a single-node Hadoop pseudo-cluster Part - C (12:07)
    • 10. Clusters and Nodes, Hadoop Cluster Part - A (13:23)
    • 11. Clusters and Nodes, Hadoop Cluster Part - B (14:15)
    • 12. NameNode, Secondary NameNode, DataNodes Part - A (11:05)
    • 13. NameNode, Secondary NameNode, DataNodes Part - B (10:14)
    • 14. Running multi-node clusters on Amazon EMR Part - A (10:09)
    • 15. Running multi-node clusters on Amazon EMR Part - B (14:23)
    • 16. Running multi-node clusters on Amazon EMR Part - C (15:23)
    • 17. Running multi-node clusters on Amazon EMR Part - D (09:32)
    • 18. Running multi-node clusters on Amazon EMR Part - E (14:21)
  • SECTION 3: Distributed File Systems
    • 19. HDFS vs GFS, a comparison Part - A (13:17)
    • 20. HDFS vs GFS, a comparison Part - B (06:12)
    • 21. Run Hadoop on Cloudera, Web Administration (17:32)
    • 22. Run Hadoop on the Hortonworks Sandbox (19:14)
    • 23. File system operations with the HDFS shell Part - A (14:03)
    • 24. File system operations with the HDFS shell Part - B (19:37)
    • 25. Advanced Hadoop development with Apache Bigtop Part - A (13:12)
    • 26. Advanced Hadoop development with Apache Bigtop Part - B (07:10)
  • SECTION 4: MapReduce Version 1
    • 27. MapReduce concepts in detail Part - A (13:12)
    • 28. MapReduce concepts in detail Part - B (10:55)
    • 29. Job definition, configuration, submission, execution and monitoring Part - A (09:39)
    • 30. Job definition, configuration, submission, execution and monitoring Part - B (10:44)
    • 31. Job definition, configuration, submission, execution and monitoring Part - C (16:48)
    • 32. Hadoop Data Types, Paths, FileSystem, Splitters, Readers and Writers Part - A (09:32)
    • 33. Hadoop Data Types, Paths, FileSystem, Splitters, Readers and Writers Part - B (10:39)
    • 34. Hadoop Data Types, Paths, FileSystem, Splitters, Readers and Writers Part - C (18:52)
    • 35. The ETL class: Definition, Extract, Transform and Load Part - A (15:14)
    • 36. The ETL class: Definition, Extract, Transform and Load Part - B (24:14)
    • 37. The UDF class: Definition, User-Defined Functions Part - A (15:14)
    • 38. The UDF class: Definition, User-Defined Functions Part - B (24:14)
  • SECTION 5: MapReduce with Hive (Data Warehousing)
    • 39. Schema design for a data warehouse Part - A (15:14)
    • 40. Schema design for a data warehouse Part - B (24:14)
    • 41. Hive Configuration (17:53)
    • 42. Hive Query Patterns Part - A (13:52)
    • 43. Hive Query Patterns Part - B (12:54)
    • 44. Hive Query Patterns Part - C (16:42)
    • 45. Example Hive ETL class Part - A (14:02)
    • 46. Example Hive ETL class Part - B (11:11)
  • SECTION 6: MapReduce with Pig (Parallel Processing)
    • 47. Introduction to Apache Pig Part - A (12:17)
    • 48. Introduction to Apache Pig Part - B (13:45)
    • 49. Introduction to Apache Pig Part - C (09:07)
    • 50. Introduction to Apache Pig Part - D (10:09)
    • 51. Pig LoadFunc and EvalFunc classes (13:28)
    • 52. Example Pig ETL class Part - A (12:40)
    • 53. Example Pig ETL class Part - B (14:11)
  • SECTION 7: The Hadoop Ecosystem
    • 54. Introduction to Crunch Part - A (15:20)
    • 55. Introduction to Crunch Part - B (12:52)
    • 56. Introduction to Avro (15:18)
    • 57. Introduction to Mahout Part - A (12:51)
    • 58. Introduction to Mahout Part - B (13:05)
    • 59. Introduction to Mahout Part - C (13:32)
  • SECTION 8: MapReduce Version 2
    • 60. Apache Hadoop 2 and YARN Part - A (12:44)
    • 61. Apache Hadoop 2 and YARN Part - B (08:23)
    • 62. YARN Examples (14:51)
  • SECTION 9: Putting It All Together
    • 63. Amazon EMR example Part - A (12:03)
    • 64. Amazon EMR example Part - B (11:46)
    • 65. Amazon EMR example Part - C (08:26)
    • 66. Amazon EMR example Part - D (10:18)
    • 67. Apache Bigtop example Part - A (12:46)
    • 69. Apache Bigtop example Part - C (13:27)
    • 70. Apache Bigtop example Part - D (13:54)
    • 71. Apache Bigtop example Part - E (13:06)
    • 72. Apache Bigtop example Part - F (13:45)
    • 73. Course Summary (04:40)
    • 74. References (2 pages)
