StratChat: Learn Hadoop, MapReduce and BigData from Scratch

A Complete Guide to Learn and Master the Popular Big Data Technologies

The growth of data, both structured and unstructured, is a big technological challenge and thus provides a great opportunity for IT and technology professionals worldwide. There is just too much data and too few professionals to manage and analyze it. We have put together a comprehensive course that will help you master the concepts, technologies and processes involved in Big Data.

In this course we will primarily cover MapReduce and its most popular implementation, Apache Hadoop. We will also cover the Hadoop ecosystem and the practical concepts involved in handling very large data sets.

The MapReduce algorithm is used in Big Data to scale computations. Running in parallel, MapReduce tasks load a manageable chunk of data into RAM, perform some intermediate calculations, load the next chunk and keep going until all of the data has been processed. In its simplest representation it can be broken down into a Map step, which often takes a data set we can think of as ‘unstructured’, and then a Reduce step, which outputs a ‘structured’ data set that is often smaller.
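The chunk-by-chunk Map and Reduce flow described above can be sketched in a few lines of plain Python. This is only an illustration of the idea and not Hadoop code: the function names (read_chunks, map_step, shuffle, reduce_step, word_count) and the word-count task are assumptions chosen for the example, and the chunks are processed one after another here, whereas a real cluster would hand them to different nodes in parallel.

```python
from collections import defaultdict
from itertools import islice


def read_chunks(lines, chunk_size):
    """Yield successive lists of at most chunk_size lines."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def map_step(chunk):
    """Map: turn raw ('unstructured') text lines into (word, 1) pairs."""
    for line in chunk:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group the intermediate values by key between Map and Reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(grouped):
    """Reduce: collapse each key's values into one ('structured') count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, chunk_size=2):
    """Count words while holding only one chunk of lines at a time."""
    totals = defaultdict(int)
    for chunk in read_chunks(lines, chunk_size):
        # Intermediate result for this chunk only.
        partial = reduce_step(shuffle(map_step(chunk)))
        # Merge the partial result, then move on to the next chunk.
        for word, count in partial.items():
            totals[word] += count
    return dict(totals)


if __name__ == "__main__":
    text = ["big data is big", "hadoop handles big data", "data data"]
    print(word_count(text))
```

The output is a small dictionary of word counts, which is the ‘structured’ and usually smaller data set that the Reduce step produces from the raw input lines.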

In its simplest sense Hadoop is an implementation of the MapReduce Algorithm.

The term Hadoop is a convenient shorthand. There is the Hadoop project at a high level, then there is a core selection of tools that Hadoop refers to, such as the Hadoop Distributed File System (HDFS), the HDFS shell and the HDFS protocol ‘hdfs://’. Then there is a bigger stack of tools that are becoming central to the use of Hadoop, often referred to as the ‘Hadoop Ecosystem’. These tools include, but are not limited to, HBase, Pig, Hive, Crunch, Mahout and Avro. Then there is the new Hadoop 2.2.x version, which implements a new architecture for MapReduce and allows for efficient workflows using a ‘DAG’ of jobs, a significant evolution of the classic MapReduce job.

Finally, Hadoop is written in Java. In Hadoop we see Java’s significant contribution to the evolution of the distributed computing space, as represented by Hadoop 2.2 and the Hadoop Ecosystem.

Prerequisites

1. Familiarity with programming in Java. You can take our Java course for free if you want to brush up your Java skills.

2. Familiarity with Linux.

3. Access to an Amazon EMR account.

4. Oracle VirtualBox or VMware installed and functioning.

What Will I Learn?

In this course you will learn key concepts in Hadoop and learn how to write your own Hadoop Jobs and MapReduce programs.

The course will specifically facilitate the following high-level outcomes:

1. Become literate in Big Data terminology and Hadoop.

2. Given a big data scenario, understand the role of Hadoop in overcoming the challenges posed by the scenario.

3. Understand how Hadoop functions in both data storage and Big Data processing.

4. Understand the difference between MapReduce version 1 in Hadoop version 1.x.x and MapReduce version 2 in Hadoop version 2.2.x.

5. Understand the architecture of distributed file systems and implementations such as the Hadoop Distributed File System or the Google File System.

6. Analyze and implement a MapReduce workflow, and design Java classes for ETL (extract, transform and load) and UDFs (user-defined functions) for this workflow.

7. Perform data mining and filtering.

The course will specifically facilitate the following practical outcomes:

1. Use the HDFS shell.

2. Use the Cloudera, Hortonworks and Apache Bigtop virtual machines for Hadoop code development and testing.

3. Configure, execute and monitor a Hadoop job.

4. Use Hadoop data types, readers, writers and splitters.

5. Write ETL and UDF classes for Hadoop workflows with Pig and Hive.

6. Write filters for data mining and processing with Mahout, Crunch and Avro.

7. Test Hadoop code on the Hortonworks Sandbox.

8. Run Hadoop code on Amazon EMR.

Category: Technology


CURRICULUM

SECTION 1: Introduction to Big Data

1. Introduction to the Course (04:55)

2. Why Hadoop, Big Data and MapReduce Part - A (10:44)

3. Why Hadoop, Big Data and MapReduce Part - B (10:22)

4. Why Hadoop, Big Data and MapReduce Part - C (12:34)

5. Architecture of Clusters (19:09)

6. Virtual Machine (VM), Provisioning a VM with Vagrant and Puppet (15:54)

SECTION 2: Hadoop Architecture

7. Set up a single-node Hadoop pseudo cluster Part - A (11:08)

8. Set up a single-node Hadoop pseudo cluster Part - B (12:34)

9. Set up a single-node Hadoop pseudo cluster Part - C (12:07)

10. Clusters and Nodes, Hadoop Cluster Part - A (13:23)

11. Clusters and Nodes, Hadoop Cluster Part - B (14:15)

12. NameNode, Secondary NameNode, DataNodes Part - A (11:05)

13. NameNode, Secondary NameNode, DataNodes Part - B (10:14)

14. Running multi-node clusters on Amazon EMR Part - A (10:09)

15. Running multi-node clusters on Amazon EMR Part - B (14:23)

16. Running multi-node clusters on Amazon EMR Part - C (15:23)

17. Running multi-node clusters on Amazon EMR Part - D (09:32)

18. Running multi-node clusters on Amazon EMR Part - E (14:21)

SECTION 3: Distributed file systems

19. HDFS vs GFS, a comparison Part - A (13:17)

20. HDFS vs GFS, a comparison Part - B (06:12)

21. Run Hadoop on Cloudera, Web Administration (17:32)

22. Run Hadoop on Hortonworks Sandbox (19:14)

23. File system operations with the HDFS shell Part - A (14:03)

24. File system operations with the HDFS shell Part - B (19:37)

25. Advanced Hadoop development with Apache Bigtop Part - A (13:12)

26. Advanced Hadoop development with Apache Bigtop Part - B (07:10)

SECTION 4: MapReduce Version 1

27. MapReduce Concepts in detail Part - A (13:12)

28. MapReduce Concepts in detail Part - B (10:55)

29. Job definition, Job configuration, submission, execution and monitoring Part - A (09:39)

30. Job definition, Job configuration, submission, execution and monitoring Part - B (10:44)

31. Job definition, Job configuration, submission, execution and monitoring Part - C (16:48)

32. Hadoop Data Types, Paths, FileSystem, Splitters, Readers and Writers Part - A (09:32)

33. Hadoop Data Types, Paths, FileSystem, Splitters, Readers and Writers Part - B (10:39)

34. Hadoop Data Types, Paths, FileSystem, Splitters, Readers and Writers Part - C (18:52)

35. The ETL class, Definition, Extract, Transform, and Load Part - A (15:14)

36. The ETL class, Definition, Extract, Transform, and Load Part - B (24:14)

37. The UDF class, Definition, User Defined Functions Part - A (15:14)

38. The UDF class, Definition, User Defined Functions Part - B (24:14)

SECTION 5: MapReduce with Hive (Data warehousing)

39. Schema design for a Data warehouse Part - A (15:14)

40. Schema design for a Data warehouse Part - B (24:14)

41. Hive Configuration (17:53)

42. Hive Query Patterns Part - A (13:52)

43. Hive Query Patterns Part - B (12:54)

44. Hive Query Patterns Part - C (16:42)

45. Example Hive ETL class Part - A (14:02)

46. Example Hive ETL class Part - B (11:11)

SECTION 6: MapReduce with Pig (Parallel processing)

47. Introduction to Apache Pig Part - A (12:17)

48. Introduction to Apache Pig Part - B (13:45)

49. Introduction to Apache Pig Part - C (09:07)

50. Introduction to Apache Pig Part - D (10:09)

51. Pig LoadFunc and EvalFunc classes (13:28)

52. Example Pig ETL class Part - A (12:40)

53. Example Pig ETL class Part - B (14:11)

SECTION 7: The Hadoop Ecosystem

54. Introduction to Crunch Part - A (15:20)

55. Introduction to Crunch Part - B (12:52)

56. Introduction to Avro (15:18)

57. Introduction to Mahout Part - A (12:51)

58. Introduction to Mahout Part - B (13:05)

59. Introduction to Mahout Part - C (13:32)

SECTION 8: MapReduce Version 2

60. Apache Hadoop 2 and YARN Part - A (12:44)

61. Apache Hadoop 2 and YARN Part - B (08:23)

62. YARN Examples (14:51)

SECTION 9: Putting it all together

63. Amazon EMR example Part - A (12:03)

64. Amazon EMR example Part - B (11:46)

65. Amazon EMR example Part - C (08:26)

66. Amazon EMR example Part - D (10:18)

67. Apache Bigtop example Part - A (12:46)

68. Apache Bigtop example Part - B (13:01)

69. Apache Bigtop example Part - C (13:27)

70. Apache Bigtop example Part - D (13:54)

71. Apache Bigtop example Part - E (13:06)

72. Apache Bigtop example Part - F (13:45)

73. Course Summary (04:40)

74. References (2 pages)

https://www.udemy.com/learn-hadoop-mapreduce-and-bigdata-from-scratch/?couponCode=kiran80

StratChat

Tuesday, July 1, 2014
