Series of programming assignments from "Introduction to Data Science" course - Join the data revolution by University of Washington

Problem 5: Which State is happiest?

Write a Python script that returns the name of the happiest state as a string. The script should take a sentiment file and a file of tweets as input and be usable in the following way:

      $ python <sentiment_file> <tweet_file>

The file AFINN-111.txt contains a list of pre-computed sentiment scores.
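
As a minimal sketch of how such a file could be loaded and applied, assuming each line holds a term and an integer score separated by a tab. The helper names `load_scores` and `score_text` and the two sample lines are made up for illustration; they are not part of the assignment.

```python
import io


def load_scores(lines):
    """Build a {term: score} dict from tab-separated 'term<TAB>score' lines."""
    scores = {}
    for line in lines:
        term, sep, value = line.rstrip("\n").partition("\t")
        if sep:  # ignore lines that contain no tab
            scores[term] = int(value)
    return scores


def score_text(text, scores):
    """Sum the scores of the single-word terms found in the text."""
    return sum(scores.get(word, 0) for word in text.lower().split())


# Tiny in-memory stand-in for the sentiment file (made-up sample lines).
sample = io.StringIO("good\t3\nbad\t-3\n")
scores = load_scores(sample)
print(score_text("What a GOOD day", scores))
```

A dictionary gives constant-time lookups per word, at the cost of ignoring multi-word terms; handling those would need a separate pass.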

Assume the tweet file contains data formatted the same way as the livestream data.

We recommend that you build on your solution to Problem 2.

There are three different objects within the tweet that you can use to determine its origin.

1. The coordinates object

2. The place object

3. The user object

You are free to develop your own strategy for determining the state that each tweet originates from.
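
One possible strategy, sketched under stated assumptions: prefer the place object, whose `full_name` often looks like "Seattle, WA", and fall back to the free-text `location` field of the user object. The function name `guess_state` and the deliberately abbreviated `STATES` set are illustrative only; a fuller solution would list every state code and might also use the coordinates object.

```python
# Deliberately abbreviated; a full solution would list every state code.
STATES = {"CA", "NY", "TX", "WA"}


def guess_state(tweet):
    """Return a two-letter state code for a tweet dict, or None if unknown."""
    place = tweet.get("place") or {}
    user = tweet.get("user") or {}
    candidates = []
    if place.get("country_code") == "US":
        candidates.append(place.get("full_name") or "")
    candidates.append(user.get("location") or "")
    for text in candidates:
        # Look at the part after the last comma, e.g. "Austin, TX" -> "TX".
        tail = text.rsplit(",", 1)[-1].strip().upper()
        if tail in STATES:
            return tail
    return None


print(guess_state({"place": {"country_code": "US", "full_name": "Seattle, WA"}}))
```

The user location is typed in by hand, so this fallback is noisy; treating it as a last resort keeps the more reliable place object in charge.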

Limit the tweets you analyze to those in the United States.

The live stream has a slightly different format from the response to the query you used in Problem 0. In this file, each line is a Tweet object, as described in the Twitter documentation.

Note: Not every tweet dictionary will have a text key; real data is dirty. Be prepared to debug, and feel free to throw out tweets that your code can't handle (for example, non-English tweets) to get something working.
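
A hedged sketch of that defensive approach: parse each line on its own and skip anything that is not a JSON object with a text key. The generator name `usable_tweets` is hypothetical, and filtering on a `lang` field is an assumption about the data, since not every tweet object necessarily carries one.

```python
import json


def usable_tweets(lines, lang="en"):
    """Yield tweet dicts that have a text key and, if tagged, the given language."""
    for line in lines:
        try:
            tweet = json.loads(line)
        except ValueError:  # blank or malformed line
            continue
        if not isinstance(tweet, dict) or "text" not in tweet:
            continue  # e.g. stream messages that carry no text
        if tweet.get("lang", lang) != lang:
            continue  # tagged with a different language
        yield tweet


lines = ['{"text": "hi", "lang": "en"}', "", "not json", '{"delete": {}}']
print([t["text"] for t in usable_tweets(lines)])
```

Tweets without a language tag are kept here rather than dropped; whether that is the right default depends on how much untagged data the file contains.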

import sys
import json


def test(tf):
    # Collect (text, state) pairs for tweets whose place object is in the US.
    decodedText = []
    for x in tf:
        try:
            y = json.loads(x)
        except ValueError:
            continue  # skip blank or malformed lines
        if not isinstance(y, dict) or "text" not in y:
            continue  # not every tweet dictionary has a text key
        place = y.get("place")
        if not place or place.get("country_code") != "US":
            continue
        # full_name often looks like "Seattle, WA"; keep the part after the comma.
        state = (place.get("full_name") or "").split(",")[-1].strip()
        if len(state) == 2 and state.isupper():  # skips values such as "USA"
            decodedText.append((y["text"], state))
    return decodedText


def sfDict(sf):
    # Parse the tab-separated sentiment file into (term, score) pairs.
    x = []
    for s in sf:
        y = s.rstrip("\n").split("\t")
        if len(y) == 2:
            x.append((y[0], float(y[1])))
    return x


def check(decodedText, op):
    # Sum the sentiment of every tweet per state and print the happiest state.
    states = {}
    for (tweet, st) in decodedText:
        padded = " " + tweet.lower() + " "  # so terms match only as whole words
        val = 0.0
        for (x, y) in op:
            if (" " + x + " ") in padded:
                val += y
        states[st] = states.get(st, 0.0) + val
    if states:
        print(max(states, key=states.get))


def main():
    with open(sys.argv[1], encoding="utf-8") as sent_file, \
            open(sys.argv[2], encoding="utf-8") as tweet_file:
        x = test(tweet_file)
        y = sfDict(sent_file)
    check(x, y)


if __name__ == '__main__':
    main()
