Series of programming assignments from the University of Washington course "Introduction to Data Science" (Join the data revolution)

Problem 5: Which State is happiest?

Write a Python script, happiest_state.py, that returns the name of the happiest state as a string.

happiest_state.py should take a sentiment file and a file of tweets as input and be usable in the following way:

      $ python happiest_state.py <sentiment_file> <tweet_file>



The file AFINN-111.txt contains a list of pre-computed sentiment scores.
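AFINN-111.txt is a tab-separated file with one term and its integer score (from -5 to +5) per line; a few terms are multi-word phrases. A minimal sketch of loading it into a dictionary, assuming that layout; the function name load_afinn is hypothetical:

```python
import io


def load_afinn(lines):
    """Map each AFINN term (word or short phrase) to its integer score.

    Assumes one "term<TAB>score" entry per line, as in AFINN-111.txt.
    """
    scores = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # rsplit on the last tab keeps phrases such as "does not work" intact
        term, score = line.rsplit("\t", 1)
        scores[term] = int(score)
    return scores


sample = io.StringIO("abandon\t-2\ngood\t3\ndoes not work\t-3\n")
print(load_afinn(sample))  # {'abandon': -2, 'good': 3, 'does not work': -3}
```

With the real file, pass the open file object: with open("AFINN-111.txt", encoding="utf-8") as f: scores = load_afinn(f).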

Assume the tweet file contains data formatted the same way as the livestream data.

We recommend that you build on your solution to Problem 2.
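A minimal sketch of per-tweet scoring, assuming Problem 2 scores a tweet as the sum of the AFINN scores of its words, with unknown words counting as 0. The function name tweet_score and the lowercasing and punctuation stripping are illustrative choices, not the course's reference solution:

```python
import string


def tweet_score(text, scores):
    """Sum the AFINN scores of the words in a tweet; unknown words count as 0.

    Single-word lookup only: multi-word AFINN phrases are not matched.
    """
    total = 0
    for word in text.lower().split():
        # drop surrounding punctuation so "great!" matches "great"
        word = word.strip(string.punctuation)
        total += scores.get(word, 0)
    return total


scores = {"good": 3, "bad": -3, "love": 3}
print(tweet_score("Good morning! I love Seattle", scores))  # 6
```

Looking up each word in a dictionary is one hash lookup per word, instead of scanning all AFINN terms for every tweet.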

There are three different objects within the tweet that you can use to determine its origin.

1. The coordinates object

2. The place object

3. The user object
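One possible strategy, sketched under assumptions: use the place object first, since city-level places have a full_name of the form "City, ST", and fall back to the free-text user location only if it ends in a known two-letter state code. Tweets whose place lies outside the US are dropped. The function name tweet_state and the STATES set are hypothetical; using the coordinates object would require a point-in-polygon lookup against state boundaries, which is omitted here.

```python
# Two-letter codes for the 50 US states plus DC.
STATES = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", "GA", "HI",
    "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN",
    "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH",
    "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA",
    "WV", "WI", "WY",
}


def tweet_state(tweet):
    """Return a two-letter state code for a US tweet, or None if unknown."""
    place = tweet.get("place")
    if place:
        if place.get("country_code") != "US":
            return None  # limit the analysis to the United States
        # City-level places look like "Seattle, WA"; admin-level ones end in "USA".
        parts = place.get("full_name", "").split(",")
        if len(parts) == 2 and parts[1].strip() in STATES:
            return parts[1].strip()
    user = tweet.get("user") or {}
    location = user.get("location") or ""
    # Free-text profile location: accept only a trailing ", ST" such as "Austin, TX".
    parts = location.split(",")
    if len(parts) >= 2 and parts[-1].strip().upper() in STATES:
        return parts[-1].strip().upper()
    return None


print(tweet_state({"place": {"country_code": "US", "full_name": "Seattle, WA"}}))  # WA
```

The user location is typed freely by the account owner, so this fallback is a heuristic and will miss entries like "NYC" or "the Bay Area".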

You are free to develop your own strategy for determining the state that each tweet originates from.

Limit the tweets you analyze to those in the United States.

The live stream has a slightly different format from the response to the query you used in Problem 0. In this file, each line is a Tweet object, as described in the Twitter documentation.

Note: Not every tweet dictionary will have a text key -- real data is dirty. Be prepared to debug, and feel free to throw out tweets that your code can't handle (for example, non-English tweets) to get something working.
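A minimal sketch of reading the live-stream file, assuming one JSON object per line as described above. Malformed lines and dictionaries without a text key (for example, the stream's deletion notices) are skipped. The generator name iter_tweets is hypothetical:

```python
import io
import json


def iter_tweets(lines):
    """Yield tweet dicts that parse as JSON and carry a "text" key."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            continue  # real data is dirty: drop lines that are not valid JSON
        if isinstance(tweet, dict) and "text" in tweet:
            yield tweet


stream = io.StringIO(
    '{"text": "hello", "place": null}\n'
    '{"delete": {"status": {"id": 1}}}\n'
    'not json\n'
)
print([t["text"] for t in iter_tweets(stream)])  # ['hello']
```

Because it is a generator, the tweet file is processed one line at a time instead of being loaded whole with readlines().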

import sys
import json


def test(tf):
    """Return (text, state) pairs for tweets whose place is in the US."""
    decodedText = []
    for line in tf:
        try:
            y = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON
        # Real data is dirty: skip anything without text or a place object.
        if not isinstance(y, dict) or "text" not in y:
            continue
        place = y.get("place")
        if not place or place.get("country_code") != "US":
            continue
        # City-level places have a full_name like "Seattle, WA": take the
        # part after the comma and strip the leading space.
        parts = place.get("full_name", "").split(",")
        if len(parts) < 2:
            continue
        state = parts[1].strip()
        if len(state) != 2:
            continue  # admin-level places end in "USA", not a state code
        decodedText.append((y["text"], state))
    return decodedText


def sfDict(sf):
    """Map each AFINN term to its score ("term<TAB>score" per line)."""
    scores = {}
    for line in sf:
        if not line.strip():
            continue
        term, score = line.rstrip("\n").rsplit("\t", 1)
        scores[term] = float(score)
    return scores


def check(decodedText, op):
    """Print the state whose tweets have the highest total sentiment."""
    states = {}
    for tweet, st in decodedText:
        # Score each tweet once by looking up its words; multi-word
        # AFINN phrases are not matched by this word-level lookup.
        val = sum(op.get(word, 0.0) for word in tweet.lower().split())
        states[st] = states.get(st, 0.0) + val
    if states:
        # max() also works when every state's total is negative
        print(max(states, key=states.get))


def main():
    with open(sys.argv[1], encoding="utf-8") as sent_file, \
            open(sys.argv[2], encoding="utf-8") as tweet_file:
        x = test(tweet_file)
        y = sfDict(sent_file)
    check(x, y)


if __name__ == '__main__':
    main()
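The script above ranks states by their total sentiment, which favors states that simply produce more tweets. If "happiest" is read as the average sentiment per tweet instead, one alternative is sketched below. The helper happiest_by_average, its (state, score) input format, and the min_tweets threshold are all hypothetical choices, not part of the assignment:

```python
from collections import defaultdict


def happiest_by_average(scored, min_tweets=1):
    """Return the state with the highest mean tweet score, or None.

    scored: iterable of (state, score) pairs, one per tweet.
    min_tweets: hypothetical threshold that ignores states with too few tweets.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, score in scored:
        totals[state] += score
        counts[state] += 1
    averages = {
        s: totals[s] / counts[s] for s in totals if counts[s] >= min_tweets
    }
    if not averages:
        return None
    return max(averages, key=averages.get)


pairs = [("CA", 2), ("CA", 2), ("CA", 2), ("WY", 3)]
print(happiest_by_average(pairs))                # WY (mean 3.0 vs 2.0)
print(happiest_by_average(pairs, min_tweets=2))  # CA (WY has only one tweet)
```

By total, CA would win here (6 vs 3); by average, a single upbeat tweet from WY wins, which is why a minimum tweet count can be useful with sparse data.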
