In this assignment, you will design and implement MapReduce algorithms for a variety of common data-processing tasks.
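As an illustration of the model, here is a minimal in-memory sketch of the map, shuffle, and reduce phases. It is not the framework the assignment provides. `run_mapreduce`, the mapper and reducer names, and the word-count task are hypothetical choices made only for this example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory driver: map each record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: a mapper may emit any number of (key, value) pairs per record.
        for key, value in mapper(record):
            # Shuffle phase: collect every value emitted for the same key.
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key (sorted only for deterministic output).
    return [reducer(key, values) for key, values in sorted(groups.items())]


def word_count_mapper(line):
    # Emit (word, 1) for every whitespace-separated word, lower-cased.
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(word, counts):
    # Sum the 1s emitted for this word.
    return word, sum(counts)
```

For example, `run_mapreduce(["the quick fox", "The lazy dog", "the fox"], word_count_mapper, word_count_reducer)` returns `[('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]`.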

Series of programming assignments from the "Introduction to Data Science" course ("Join the data revolution") by the University of Washington.

Problem 6: Top ten hash tags

Write a Python script, top_ten.py, that computes the ten most frequently occurring hash tags from the data you gathered in Problem 1.
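A possible core for top_ten.py, sketched under the assumption that the Problem 1 output holds one JSON-encoded tweet per line in the Twitter API v1.1 layout, where hashtags sit under `entities` then `hashtags` then `text`. The function name `top_ten_hashtags` is hypothetical, and tags are counted case-sensitively as written.

```python
import json
from collections import Counter


def top_ten_hashtags(lines):
    """Return the ten most frequent hashtags as (tag, count) pairs.

    Assumes one JSON tweet per line with hashtags under entities -> hashtags -> text.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        # Lines without "entities" (e.g. delete notices) contribute nothing.
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"]] += 1
    return counts.most_common(10)
```

Passing an open file works because iterating a file yields its lines, e.g. `with open(path, encoding="utf-8") as f: print(top_ten_hashtags(f))`.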

Problem 5: Which State is happiest?

Write a Python script, happiest_state.py, that returns the name of the happiest state as a string.
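One way to approach this, sketched under several assumptions: a tweet's sentiment is the sum of the AFINN scores of its lower-cased, whitespace-separated words; a tweet's state is read from a US `place` whose `full_name` looks like "City, ST" (the `state_of` helper is a hypothetical heuristic); and "happiest" means the highest mean tweet sentiment.

```python
from collections import defaultdict


def tweet_score(text, scores):
    # Sum of AFINN scores of the lower-cased words; unknown words count as 0.
    return sum(scores.get(word, 0) for word in text.lower().split())


def state_of(tweet):
    """Hypothetical heuristic: take 'ST' from a US place's full_name 'City, ST'."""
    place = tweet.get("place") or {}
    if place.get("country_code") != "US":
        return None
    suffix = place.get("full_name", "").rsplit(", ", 1)[-1]
    return suffix if len(suffix) == 2 and suffix.isupper() else None


def happiest_state(tweets, scores):
    """Return the state code with the highest mean tweet sentiment, or None."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tweet in tweets:
        state = state_of(tweet)
        if state is None or "text" not in tweet:
            continue
        totals[state] += tweet_score(tweet["text"], scores)
        counts[state] += 1
    if not counts:
        return None
    # Rank by mean rather than total so states with many tweets are not favored.
    return max(counts, key=lambda s: totals[s] / counts[s])
```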

Problem 4: Compute Term Frequency
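The definition of term frequency is not spelled out here. A common one, assumed in this sketch, is the number of occurrences of a term across all tweets divided by the total number of term occurrences across all tweets. Lower-casing and splitting on whitespace is also an assumption; `term_frequencies` is a hypothetical name.

```python
from collections import Counter


def term_frequencies(texts):
    """Relative frequency of each lower-cased, whitespace-separated term."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    total = sum(counts.values())
    # Each frequency is count / total, so the values sum to 1 when there is any text.
    return {term: n / total for term, n in counts.items()} if total else {}
```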

Problem 3: Derive the sentiment of new terms

In this part you will create a script that computes the sentiment of terms that do not appear in the file AFINN-111.txt.
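The method is left open here. One simple heuristic, assumed in this sketch, gives each unknown term the mean sentiment of the tweets it appears in, where a tweet's sentiment is the sum of the AFINN scores of its words. It splits on whitespace, so multi-word AFINN phrases are ignored; `derive_new_term_scores` is a hypothetical name.

```python
from collections import defaultdict


def derive_new_term_scores(texts, scores):
    """Estimate sentiment for terms missing from the AFINN dictionary.

    Heuristic (an assumption): each unknown term gets the average sentiment
    of the tweets it occurs in.
    """
    total = defaultdict(float)
    occurrences = defaultdict(int)
    for text in texts:
        words = text.lower().split()
        tweet_sentiment = sum(scores.get(w, 0) for w in words)
        # Count each unknown term at most once per tweet.
        for word in set(words):
            if word not in scores:
                total[word] += tweet_sentiment
                occurrences[word] += 1
    return {word: total[word] / occurrences[word] for word in total}
```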

  • Problem 0: Query Twitter with Python
  • Problem 1: Get Twitter Data
  • Problem 2: Derive the sentiment of EACH tweet
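For Problem 2, a minimal sketch: AFINN-111.txt lists one entry per line as a term, a tab, and an integer score (some terms are multi-word phrases, which splitting on the tab handles). Here a tweet's sentiment is taken to be the sum of the scores of its lower-cased, whitespace-separated words, with unknown words scoring 0; the function names are hypothetical.

```python
def load_afinn(lines):
    """Parse AFINN lines of the form 'term<TAB>integer score' into a dict."""
    scores = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        term, score = line.split("\t")
        scores[term] = int(score)
    return scores


def tweet_sentiment(text, scores):
    """Sum the scores of a tweet's words; words missing from AFINN count as 0."""
    return sum(scores.get(word, 0) for word in text.lower().split())
```

Typical use: `with open("AFINN-111.txt", encoding="utf-8") as f: scores = load_afinn(f)`.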

Week 6 covered:

  • HASHING: THE BASICS
  • UNIVERSAL HASHING
  • BLOOM FILTERS

Two programming assignments:

  • The goal is to implement a variant of the 2-SUM algorithm (covered in the Week 6 lecture on hash table applications).
  • The goal is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications).
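The exact 2-SUM variant is not specified here, so this is only a sketch of the hash-table idea behind it: store the inputs in a set, then for each x check whether t - x is present, an expected O(1) lookup. `count_targets`, which counts the targets in a range reachable as the sum of two distinct input values, is a hypothetical example of a variant.

```python
def has_two_sum(values, t):
    """True if some x != y, both in the set `values`, satisfy x + y == t."""
    for x in values:
        y = t - x
        # Hash-set membership test; y must differ from x (distinct values).
        if y != x and y in values:
            return True
    return False


def count_targets(numbers, lo, hi):
    """Hypothetical variant: count targets t in [lo, hi] that are 2-sums of distinct values."""
    values = set(numbers)
    return sum(1 for t in range(lo, hi + 1) if has_two_sum(values, t))
```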
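Median Maintenance tracks the median of a stream as numbers arrive. A common two-heap approach is sketched below: a max-heap for the smaller half (stored negated, since `heapq` only provides a min-heap) and a min-heap for the larger half. Reporting the lower median when the count is even is an assumed convention; `MedianMaintainer` is a hypothetical name.

```python
import heapq


class MedianMaintainer:
    """Running median via two heaps.

    low: smaller half as a max-heap (values negated).
    high: larger half as a min-heap.
    Invariants: every value in low <= every value in high,
    and len(low) == len(high) or len(high) + 1.
    """

    def __init__(self):
        self.low = []
        self.high = []

    def add(self, x):
        # Route x to the half it belongs in.
        if self.low and x > -self.low[0]:
            heapq.heappush(self.high, x)
        else:
            heapq.heappush(self.low, -x)
        # Rebalance so low has the same size as high, or one more element.
        if len(self.low) > len(self.high) + 1:
            heapq.heappush(self.high, -heapq.heappop(self.low))
        elif len(self.high) > len(self.low):
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        """Lower median: the ceil(k/2)-th smallest of the k values seen so far."""
        return -self.low[0]
```

Each `add` costs O(log k) and `median` is O(1), which is why heaps suit this problem.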

Week 5 covered:

  • DIJKSTRA'S SHORTEST-PATH ALGORITHM
  • HEAPS
  • BALANCED BINARY SEARCH TREES

Programming assignment: In this programming problem you'll code up Dijkstra's shortest-path algorithm. The file contains an adjacency list representation of an undirected weighted graph with 200 vertices labeled 1 to 200. Each row starts with a vertex and then lists tuples, each giving a vertex adjacent to it and the length of the edge between them. Your task is to run Dijkstra's shortest-path algorithm on this graph, using vertex 1 as the source, and to compute the shortest-path distances between 1 and every other vertex of the graph. If there is no path between a vertex v and vertex 1, we define the shortest-path distance between 1 and v to be 1000000.
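A sketch of Dijkstra's algorithm using Python's `heapq` with lazy deletion: instead of decreasing keys in place, it pushes a new entry and skips stale ones when popped, which may differ from the heap-based version in the lectures. The row format accepted by the hypothetical `parse_graph` (vertex label, then whitespace-separated `neighbor,length` tokens) is an assumption about the file; `NO_PATH` encodes the 1000000 convention above.

```python
import heapq

NO_PATH = 1000000  # distance reported when a vertex is unreachable from the source


def parse_graph(lines):
    """Parse rows like '1 2,7 3,9' (assumed format): vertex, then neighbor,length pairs."""
    graph = {}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        u = int(tokens[0])
        edges = graph.setdefault(u, [])
        for pair in tokens[1:]:
            v, length = pair.split(",")
            edges.append((int(v), int(length)))
    return graph


def dijkstra(graph, source, n):
    """Shortest-path distances from source to vertices 1..n (non-negative edge lengths)."""
    dist = {source: 0}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale entry: u was already finalized with a smaller distance
        done.add(u)
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return {v: dist.get(v, NO_PATH) for v in range(1, n + 1)}
```

Because the graph is undirected, this assumes each edge appears in the rows of both endpoints.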

Week 4 covered:

  • GRAPH SEARCH AND CONNECTIVITY

Programming assignment: The file contains the edges of a directed graph. Vertices are labeled as positive integers from 1 to 875714. Your task is to code up the algorithm from the video lectures for computing strongly connected components (SCCs), and to run this algorithm on the given graph.
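A sketch of Kosaraju's two-pass SCC algorithm, one standard way to compute SCCs: a depth-first pass over the reversed graph records finishing order, then a second pass over the original graph in decreasing finishing order peels off one SCC per search tree. Both passes are iterative, because a recursive DFS on 875714 vertices would exceed Python's default recursion limit. Reading edges as `(tail, head)` pairs and returning sizes largest first are assumptions; `kosaraju_scc_sizes` is a hypothetical name.

```python
from collections import defaultdict


def kosaraju_scc_sizes(edges, n):
    """Return SCC sizes, largest first, for a directed graph on vertices 1..n."""
    graph = defaultdict(list)
    reverse = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)

    # Pass 1: iterative DFS on the reversed graph, recording finishing order.
    visited = [False] * (n + 1)
    order = []
    for s in range(1, n + 1):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, iter(reverse[s]))]
        while stack:
            u, neighbors = stack[-1]
            for w in neighbors:
                if not visited[w]:
                    visited[w] = True
                    stack.append((w, iter(reverse[w])))
                    break
            else:
                # All neighbors explored: u is finished.
                stack.pop()
                order.append(u)

    # Pass 2: DFS on the original graph in decreasing finishing order;
    # each search tree found here is exactly one SCC.
    visited = [False] * (n + 1)
    sizes = []
    for s in reversed(order):
        if visited[s]:
            continue
        visited[s] = True
        stack, size = [s], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in graph[u]:
                if not visited[w]:
                    visited[w] = True
                    stack.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```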

Week 3 covered:

  • PROBABILITY REVIEW
  • LINEAR-TIME SELECTION
  • GRAPHS AND THE CONTRACTION ALGORITHM

Programming assignment: The file contains the adjacency list representation of a simple undirected graph. Your task is to code up the randomized contraction algorithm for the min cut problem and run it on the given graph to compute the min cut.
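A sketch of the randomized contraction algorithm. Instead of literally merging adjacency lists, it shuffles the edges once and contracts them in that order with a union-find structure, skipping edges whose endpoints are already in the same supernode. Every edge still crossing between supernodes has not been processed yet and sits at a uniformly random position among the unprocessed edges, so each contraction picks a uniformly random remaining edge, as in the original algorithm. The row format (a vertex label followed by its neighbors, whitespace-separated) and the default of 100 trials are assumptions. A single run finds a minimum cut with probability at least 1 / C(n, 2), so on the order of n^2 ln n runs are needed for a high-probability guarantee.

```python
import random


def parse_adjacency(lines):
    """Parse rows 'v n1 n2 ...' (assumed format) into sorted vertex and edge lists."""
    vertices, edges = set(), set()
    for line in lines:
        tokens = [int(t) for t in line.split()]
        if not tokens:
            continue
        v = tokens[0]
        vertices.add(v)
        for w in tokens[1:]:
            vertices.add(w)
            # Each undirected edge appears in both endpoints' rows; store it once.
            edges.add((min(v, w), max(v, w)))
    return sorted(vertices), sorted(edges)


def contract_once(vertices, edges, rng):
    """One contraction run; returns the number of edges crossing the final cut."""
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    remaining = len(vertices)
    order = list(edges)
    rng.shuffle(order)
    for u, v in order:
        if remaining <= 2:
            break
        ru, rv = find(u), find(v)
        if ru != rv:  # edges inside one supernode are self-loops and are skipped
            parent[ru] = rv
            remaining -= 1
    return sum(1 for u, v in edges if find(u) != find(v))


def min_cut(lines, trials=100, seed=None):
    vertices, edges = parse_adjacency(lines)
    rng = random.Random(seed)
    # Keep the smallest cut seen over independent runs.
    return min(contract_once(vertices, edges, rng) for _ in range(trials))
```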