FAQ - Hadoop

FAQ - Hadoop - 3

Posted by : Sushanth Sunday, 27 December 2015

21) What are the features of a “Fully Distributed” mode?

“Fully Distributed” mode is used in production environments, where ‘n’ machines form a Hadoop cluster and the Hadoop daemons run across those machines. The “NameNode” runs on a master host, the “DataNode” daemons run on the worker hosts, and the “TaskTracker/NodeManager” daemons also run on the worker machines, typically alongside the DataNodes. Masters and slaves are therefore separate machines in this mode.

22) Name the three modes in which Hadoop can be run.

The three modes in which Hadoop can be run are:

1. Standalone (local) mode

2. Pseudo distributed mode

3. Fully distributed mode

23) What is the role of “ZooKeeper” in a Hadoop cluster?

The purpose of “ZooKeeper” is cluster management. “ZooKeeper” will help you achieve coordination between Hadoop nodes. “ZooKeeper” also helps to:

Manage configuration across nodes
Implement reliable messaging
Implement redundant services
Synchronize process execution

Questions around MapReduce

24) What is “MapReduce”?

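“MapReduce” is a programming model and framework for processing large data sets in parallel across a cluster of computers. A job is expressed as a “map” function that turns input records into intermediate (key, value) pairs and a “reduce” function that combines all the values that share a key.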
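To make the model concrete, here is a minimal word-count sketch in plain Python. It only imitates the three phases in a single process; it does not use any Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))

counts = word_count(["big data big cluster", "data node"])
print(counts)
```

On a real cluster the map calls run on many machines and the shuffle moves data over the network, but the data flow is the same.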

25) What is the syntax to run a “MapReduce” program?

hadoop jar file.jar [MainClass] /input_path /output_path

The main class can be omitted when it is declared in the jar's manifest.

26) How would you debug Hadoop code?

There are many ways to debug Hadoop code, but the most popular methods are:

Using Counters.
Using the web interface provided by the Hadoop framework.
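The idea behind counters can be sketched in plain Python: the map logic increments a named counter whenever it meets something suspicious, and the totals are inspected after the run. This is only an imitation using collections.Counter; in a real Java job the counters are incremented through the task context and the totals appear in the job's web interface. The counter names and the 'name,amount' record layout below are assumptions made for the example.

```python
from collections import Counter

# Stand-in for the job-wide counters that Hadoop aggregates across tasks.
counters = Counter()

def map_record(line):
    """Parse a 'name,amount' row; count bad rows instead of failing the job."""
    counters["RECORDS_READ"] += 1
    parts = line.split(",")
    if len(parts) != 2 or not parts[1].strip().isdigit():
        counters["MALFORMED_RECORDS"] += 1
        return None
    return parts[0].strip(), int(parts[1])

rows = ["alice,10", "bob,abc", "broken line", "carol,7"]
parsed = [record for record in map(map_record, rows) if record is not None]

print(parsed)
print(dict(counters))
```

A non-zero count of malformed records points at bad input without having to dig through the task logs first.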

27) What are the main configuration parameters in a “MapReduce” program?

Users of the “MapReduce” framework need to specify these parameters:

Job’s input locations in the distributed file system
Job’s output location in the distributed file system
Input format
Output format
Class containing the “map” function
Class containing the “reduce” function
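As a rough checklist, the six parameters above can be written down as a small Python structure. This is not how Hadoop is configured (a Java job normally sets these on its Job object in the driver class); the JobConfig class, its field names, and the mapper and reducer class names are hypothetical and exist only to show what has to be supplied.

```python
from dataclasses import dataclass, fields

@dataclass
class JobConfig:
    """Hypothetical holder for the parameters a MapReduce job must specify."""
    input_path: str      # job's input location in the distributed file system
    output_path: str     # job's output location in the distributed file system
    input_format: str    # how input is split into records
    output_format: str   # how results are written
    mapper_class: str    # class containing the map function
    reducer_class: str   # class containing the reduce function

    def missing(self):
        """Return the names of parameters that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

config = JobConfig(
    input_path="/input_path",
    output_path="/output_path",
    input_format="TextInputFormat",
    output_format="TextOutputFormat",
    mapper_class="WordCountMapper",    # hypothetical class name
    reducer_class="WordCountReducer",  # hypothetical class name
)
print(config.missing())
```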

28) What is the default input type/format in “MapReduce”?

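By default, the input format in MapReduce is “text” (TextInputFormat). Each line of the input is one record: the key is the byte offset of the line within the file and the value is the content of the line.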

29) Why can’t we perform “aggregation” (addition) in a mapper? Why do we need the “reducer” for this?

A mapper cannot produce the final “aggregation” (addition) because each “mapper” task sees only the records of its own “input split”, and the values for a given key are usually spread across many splits. The “map” function is called once per record and has no view of the records handled by other mappers. All the values of a key are brought together only by the shuffle and sort, which delivers them to a single “reducer”. Hence, the final total has to be computed on the reducer side.
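A small plain-Python sketch shows the problem. Each list below stands for one input split handled by a separate mapper; the data and the function names are invented for the example. A mapper can at best compute a partial sum for its own split (which is what a combiner does), and only the step that sees every partial result can produce the true total.

```python
from collections import defaultdict

# Three input splits, each processed by a different mapper.
splits = [
    [("apples", 3), ("pears", 1)],
    [("apples", 4)],
    [("pears", 5), ("apples", 2)],
]

def map_side_sum(split):
    """Sum values per key using only the records of one split."""
    partial = defaultdict(int)
    for key, value in split:
        partial[key] += value
    return dict(partial)

def reduce_all(partials):
    """Group the partial sums by key and add them up, as a reducer would."""
    grouped = defaultdict(list)
    for partial in partials:
        for key, value in partial.items():
            grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

partials = [map_side_sum(split) for split in splits]
totals = reduce_all(partials)

print(partials)  # no single mapper knows the full total for "apples"
print(totals)
```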

30) What is the purpose of “RecordReader” in Hadoop?

The “InputSplit” defines a slice of work but does not describe how to access it. The “RecordReader” class loads the data from its source and converts it into (key, value) pairs suitable for reading by the “Mapper”. The “RecordReader” instance is defined by the “InputFormat”.
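The division of labour can be sketched in plain Python for line-oriented text. Here a split is just a (start, length) byte range, and the hypothetical read_split function plays the role of the record reader, turning that range into (byte offset, line) pairs. The boundary rule used (a reader that does not start at byte 0 skips its first line, and every reader finishes the line it has started even if it runs past the end of the split) is modelled on how Hadoop's line reader behaves, but this is a simplified imitation, not the real implementation.

```python
def read_split(data, start, length):
    """Yield (byte_offset, line) records for the byte range [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # The first (possibly partial) line belongs to the previous split.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        yield pos, data[pos:line_end].decode("utf-8")
        pos = line_end + 1

data = b"alpha\nbeta\ngamma\n"

# Two splits whose boundary falls in the middle of the line "beta".
first = list(read_split(data, 0, 8))
second = list(read_split(data, 8, 9))

print(first)
print(second)
```

Every line is read exactly once even though the split boundary cuts a line in half, which is why the mapper never has to care where the splits fall.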
