In this assignment, you will be designing and implementing MapReduce algorithms for a variety of common data processing tasks.
Problem 3: Consider a simple social network dataset consisting of key-value pairs where each key is a person and each value is a friend of that person. Describe a MapReduce algorithm to count the number of friends each person has.
Map Input
The input is a 2-element list: [personA, personB]
personA: Name of a person formatted as a string
personB: Name of one of personA’s friends formatted as a string
This implies that personB is a friend of personA, but it does not imply that personA is a friend of personB.
Reduce Output
The output should be a (person, friend count) tuple.
person is a string and friend count is an integer describing the number of friends ‘person’ has.
You can test your solution to this problem using friends.json:
python friend_count.py friends.json
You can verify your solution against friend_count.json.
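One way to do that comparison is sketched below. It assumes, without confirmation from the assignment text, that both the script output and friend_count.json contain one JSON list of the form ["person", count] per line. The helper names load_counts and diff_counts are hypothetical, and the sample lines are made up for illustration.

```python
import json

def load_counts(lines):
    """Parse lines of the assumed form ["person", count] into a dict."""
    counts = {}
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            person, count = json.loads(line)
            counts[person] = count
    return counts

def diff_counts(actual, expected):
    """Return {person: (actual, expected)} for every person whose counts differ."""
    people = set(actual) | set(expected)
    return {p: (actual.get(p), expected.get(p))
            for p in people if actual.get(p) != expected.get(p)}

# Hypothetical lines standing in for the script output and friend_count.json.
mine = load_counts(['["alice", 2]', '["bob", 1]'])
reference = load_counts(['["bob", 1]', '["alice", 3]'])
print(diff_counts(mine, reference))  # {'alice': (2, 3)}
```

Comparing dictionaries rather than raw text means the order in which the reducer emits people does not matter. An empty result means the two sets of counts agree.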
import MapReduce
import sys

"""
Friend Count in the Simple Python MapReduce Framework
"""

mr = MapReduce.MapReduce()

# =============================
# Do not modify above this line

def mapper(record):
    # record: [personA, personB], meaning personB is a friend of personA
    person = record[0]
    # emit one count per friendship record, keyed by personA
    mr.emit_intermediate(person, 1)

def reducer(person, list_of_values):
    # person: name of a person
    # list_of_values: list of 1s, one per friend of that person
    mr.emit((person, len(list_of_values)))

# Do not modify below this line
# =============================
if __name__ == '__main__':
    inputdata = open(sys.argv[1])
    mr.execute(inputdata, mapper, reducer)
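The solution depends on the course's MapReduce module, so the logic can also be checked on its own. Here is a minimal sketch that simulates the map, shuffle, and reduce steps in memory. The function name count_friends and the sample records are hypothetical and are not taken from friends.json.

```python
from collections import defaultdict

def count_friends(records):
    """Simulate map, shuffle, and reduce for friend counting in memory."""
    # Map + shuffle: emit (personA, 1) per record and group the values by key.
    intermediate = defaultdict(list)
    for person, _friend in records:
        intermediate[person].append(1)
    # Reduce: the friend count is the number of values collected per person.
    return {person: len(values) for person, values in intermediate.items()}

# Hypothetical sample data for illustration only.
sample = [["alice", "bob"], ["alice", "carol"], ["bob", "alice"]]
print(count_friends(sample))  # {'alice': 2, 'bob': 1}
```

Like the solution above, this counts records rather than distinct friends, so it assumes each [personA, personB] pair appears only once in the input. A person who appears only as personB gets no entry, because friendship here is not assumed to be symmetric.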