Series of programming assignments from "Introduction to Data Science" course - Join the data revolution by University of Washington
Problem 5: Which State is happiest?
Write a Python script, happiest_state.py, that returns the name of the happiest state as a string.
happiest_state.py should take a file of tweets as an input and be usable in the following way:
$ python happiest_state.py <sentiment_file> <tweet_file>
The file AFINN-111.txt contains a list of pre-computed sentiment scores.
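As a minimal sketch of reading that file, the snippet below assumes each line holds a term and an integer score separated by a tab, and builds a lookup dictionary from it. The function name `load_sentiment_scores` and the in-memory sample are illustrative and not part of the assignment.

```python
import io


def load_sentiment_scores(lines):
    """Build a {term: score} dict from tab-separated "term<TAB>score" lines."""
    scores = {}
    for line in lines:
        line = line.rstrip("\n")
        # Split on the last tab so multi-word terms stay intact.
        term, sep, score = line.rpartition("\t")
        if not sep:
            continue  # blank or malformed line: no tab found
        scores[term] = int(score)
    return scores


# Hypothetical three-line sample standing in for AFINN-111.txt.
sample = io.StringIO("abandon\t-2\ncan't stand\t-3\nhappy\t3\n")
scores = load_sentiment_scores(sample)
print(scores["happy"])
```

A dictionary gives constant-time lookups per word, which matters once the tweet file grows to thousands of lines.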
Assume the tweet file contains data formatted the same way as the livestream data.
We recommend that you build on your solution to Problem 2.
There are three different objects within the tweet that you can use to determine its origin.
1. The coordinates object
2. The place object
3. The user object
You are free to develop your own strategy for determining the state that each tweet originates from.
Limit the tweets you analyze to those in the United States.
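One possible strategy, sketched under assumptions: trust the place object first when its `country_code` is "US" and its `full_name` ends in a two-letter state code (as in "Seattle, WA"), then fall back to the free-text `location` in the user object. The helper name `state_from_tweet` is hypothetical, and the coordinates object is left out because using it would need a reverse-geocoding step.

```python
# Two-letter postal codes for the 50 states plus DC.
US_STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS "
    "MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV "
    "WI WY DC".split()
)


def state_from_tweet(tweet):
    """Return a two-letter state code, or None if no US state can be inferred."""
    place = tweet.get("place") or {}
    if place.get("country_code") == "US":
        # Take whatever follows the last comma in "City, ST".
        candidate = (place.get("full_name") or "").rpartition(",")[2].strip().upper()
        if candidate in US_STATES:
            return candidate
    # Fall back to the user's profile location, e.g. "Austin, TX".
    location = (tweet.get("user") or {}).get("location") or ""
    candidate = location.rpartition(",")[2].strip().upper()
    return candidate if candidate in US_STATES else None


print(state_from_tweet({"place": {"country_code": "US", "full_name": "Seattle, WA"}}))
```

Profile locations are free text, so this fallback can misfire on strings that merely end in something resembling a state code; treat it as a rough heuristic.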
The live stream has a slightly different format from the response to the query you used in Problem 0. In this file, each line is a Tweet object, as described in the Twitter documentation.
Note: Not every tweet dictionary will have a text key, because real data is dirty. Be prepared to debug, and feel free to throw out tweets that your code can't handle (for example, non-English tweets) to get something working.
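A defensive reader along these lines might look like the sketch below. It assumes one JSON object per line and silently drops blank lines, lines that fail to parse, and objects without a "text" key. The generator name `iter_tweets` and the sample lines are made up for illustration.

```python
import json


def iter_tweets(lines):
    """Yield tweet dicts that parse as JSON and carry a "text" key."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # not valid JSON
        if isinstance(tweet, dict) and "text" in tweet:
            yield tweet


# Hypothetical stream sample: one usable tweet, one object without text,
# one blank line, and one line that is not JSON.
sample = [
    '{"text": "good morning"}',
    '{"delete": {"status": {"id": 1}}}',
    '',
    'not json',
]
texts = [t["text"] for t in iter_tweets(sample)]
print(texts)
```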
import sys
import json


def test(tf):
    """Return (text, state) pairs for tweets whose place object is in the US."""
    decodedText = []
    for line in tf:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        y = json.loads(line)
        place = y.get("place")
        # Keep only tweets that carry text and a US place object.
        if "text" not in y or not place or place.get("country_code") != "US":
            continue
        # full_name usually looks like "Seattle, WA"; take the part after the comma.
        parts = (place.get("full_name") or "").split(",")
        if len(parts) < 2:
            continue
        decodedText.append((y["text"], parts[1].strip()))
    return decodedText


def sfDict(sf):
    """Return a list of (term, score) pairs from the tab-separated sentiment file."""
    x = []
    for s in sf:
        y = s.rstrip("\n").split("\t")
        if len(y) == 2:
            x.append((y[0], float(y[1])))
    return x


def check(decodedText, op):
    """Sum the sentiment of all tweets per state and print the top-scoring state."""
    states = {}
    for (tweet, st) in decodedText:
        val = 0.0
        for (x, y) in op:
            # Count a term when it appears with a space before or after it.
            if (x + " ") in tweet or (" " + x) in tweet:
                val += y
        states[st] = states.get(st, 0.0) + val
    if states:
        # max() also works when every state has a negative total.
        print(max(states, key=states.get))


def main():
    with open(sys.argv[1], encoding="utf-8") as sent_file:
        y = sfDict(sent_file)
    with open(sys.argv[2], encoding="utf-8") as tweet_file:
        x = test(tweet_file)
    check(x, y)


if __name__ == '__main__':
    main()