In this assignment, you will be designing and implementing MapReduce algorithms for a variety of common data processing tasks.

Problem 5: Consider a set of key-value pairs where each key is a sequence id and each value is a string of nucleotides, e.g., GCTTCCGAAATGCTCGAA.... Write a MapReduce query to remove the last 10 characters from each string of nucleotides, then remove any duplicates generated by the trimming.

Map Input

The input is a 2-element list: [sequence id, nucleotides]

sequence id: Unique identifier formatted as a string

nucleotides: Sequence of nucleotides formatted as a string

Reduce Output

The output from the reduce function should be the unique trimmed nucleotide strings.
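The map and reduce steps described above can be sketched without the course framework. The following is a minimal, framework-free simulation, not the assignment's API: the function name `unique_trims` and the sample records are hypothetical and only illustrate why keying on the trimmed string removes duplicates.

```python
from collections import defaultdict

def unique_trims(records):
    """Simulate map -> group-by-key -> reduce for [sequence id, nucleotides] records."""
    groups = defaultdict(list)
    # Map phase: the trimmed string becomes the key, so identical
    # trimmed strings land in the same group.
    for _seq_id, nucleotides in records:
        groups[nucleotides[:-10]].append(1)
    # Reduce phase: emit each key exactly once; the grouped values are ignored.
    return list(groups)

# Hypothetical sample records (not taken from dna.json).
sample = [
    ["seq1", "GCTTCCGAAATGCTCGAA"],
    ["seq2", "GCTTCCGATTTTTTTTTT"],
    ["seq3", "AACCGGTTAACCGGTTAA"],
]
print(unique_trims(sample))  # ['GCTTCCGA', 'AACCGGTT']
```

Note that a string of 10 or fewer characters trims to the empty string, so all such inputs would collapse into a single empty result.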

You can test your solution to this problem using dna.json:

```
python unique_trims.py dna.json
```

You can verify your solution against unique_trims.json.
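One way to do that comparison is with a small helper script. This is only a sketch under two assumptions that are not confirmed by the assignment text: that your script's output was redirected to a file (the name `my_output.json` below is hypothetical), and that both files contain one JSON value per line. The helper names `load_records` and `same_results` are also made up for illustration.

```python
import json

def load_records(path):
    """Read one JSON value per non-empty line (assumed file layout)."""
    with open(path) as handle:
        return [json.loads(line) for line in handle if line.strip()]

def same_results(actual_path, expected_path):
    """Return True if both files hold the same records, ignoring order."""
    # Reducer output order is not significant, so compare sorted lists.
    return sorted(load_records(actual_path)) == sorted(load_records(expected_path))

if __name__ == "__main__":
    # "my_output.json" is a hypothetical file produced by redirecting
    # the output of unique_trims.py.
    print(same_results("my_output.json", "unique_trims.json"))
```

Comparing sorted lists rather than sets also catches a solution that fails to remove duplicates.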

```
import MapReduce
import sys

"""
Word Count Example in the Simple Python MapReduce Framework
"""

mr = MapReduce.MapReduce()

# =============================
# Do not modify above this line
def mapper(record):
    # record: [sequence id, nucleotides]
    # Drop the last 10 characters and emit the trimmed string as the key,
    # so that identical trimmed strings are grouped together.
    trim_nucleotid = record[1][:-10]
    mr.emit_intermediate(trim_nucleotid, 1)

def reducer(trim_nucleotid, list_of_values):
    # key: trimmed nucleotide string
    # value: list of 1s, one per occurrence (unused)
    # Emitting the key once per group removes the duplicates.
    mr.emit(trim_nucleotid)

# Do not modify below this line
# =============================
if __name__ == '__main__':
    inputdata = open(sys.argv[1])
    mr.execute(inputdata, mapper, reducer)
```