Thursday, 24 December 2015

Functional Programming: Map/Reduce

  • Functional operations do not modify data structures. They always create new data.
  • Original data still exists in unmodified form.
  • Order of operations does not matter 
  • Example: 
Map  :

ÄMAP is the name of a higher-order function that applies a given function element-wise to a list of elements and returns a list of results
ÄOriginated in functional programming but common in many languages
  square x = x * x
          map square [1,2,3,4,5] will return [1,4,9,16.25]
ÄNotice that I can process “map square” in parallel:
map square [1,2,3] -> [1,4,9]
map square [4,5] -> [16,25]


REDUCE (also known fold, accumulate, compress or inject) is a family of higher-order functions that iterate an arbitrary function over a data structure in some order and build up a return value. Typically, a REDUCE deals with two things: a combining function and a list of elements of some data structure. The fold then proceeds to combine elements of the data structure using the function in some systematic way.


MAX (1, 2, 3, 10,15, 20) -> 20

SUM (1, 7, 10) -> 18

In Hadoop REDUCE function always takes MAP function as an input 

ÄSome Examples of map/reduce:
Count of URL Access Frequency: The map function processes logs of web page requests and outputs <URL, 1>. The reduce function adds together all values for the same URL and emits a <URL, total count> pair
Reverse Web-Link Graph: The map function outputs <target, source> pairs for each link to a target URL found in a page named "source". The reduce function concatenates the list of all source URLs associated with a given target URL and emits the pair: <target, list(source)>
Count Words in the Document: The map function outputs each word plus an associated count of occurrences ("1" in this case). The reduce function sums together all the counts returned for a particular word. 
Difference between traditional and Mapreduce Programming:


Post a comment