Series of programming assignments from "Introduction to Data Science" course - Join the data revolution by University of Washington
Problem 3: Derive the sentiment of new terms

In this part you will create a script that computes the sentiment of terms that do not appear in the file AFINN-111.txt.
Here's how you might think about the problem: We know we can use certain words to deduce the sentiment of a tweet. Once you know the sentiment of the tweets that contain some term, you can assign a sentiment to the term itself.
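The reasoning above can be sketched with toy data. This is one simple interpretation, not a metric the assignment prescribes: a tweet's sentiment is taken to be the sum of its known word scores, and an unknown term receives the average sentiment of the tweets it appears in. The names KNOWN_SCORES, tweet_score, and new_term_scores, as well as the sample tweets, are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for a few AFINN-111 entries (term -> score).
KNOWN_SCORES = {"good": 3.0, "bad": -3.0, "love": 3.0}


def tweet_score(words, known):
    """Sum the scores of the words that appear in the known lexicon."""
    return sum(known.get(word, 0.0) for word in words)


def new_term_scores(tweets, known):
    """Average the sentiment of the tweets that contain each unknown term."""
    seen = defaultdict(list)  # unknown term -> sentiments of tweets containing it
    for tweet in tweets:
        words = tweet.lower().split()
        score = tweet_score(words, known)
        for word in set(words):  # count each term once per tweet
            if word not in known:
                seen[word].append(score)
    return {term: sum(vals) / len(vals) for term, vals in seen.items()}


tweets = ["good coffee", "love coffee", "bad traffic"]
scores = new_term_scores(tweets, KNOWN_SCORES)
print(scores["coffee"], scores["traffic"])
```

Here "coffee" appears only in positive tweets and "traffic" only in a negative one, so the derived scores inherit those signs.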
Don't feel obligated to use it, but the following paper may be helpful for developing a sentiment metric. Look at the Opinion Estimation subsection of the Text Analysis section in particular.
O'Connor, B., Balasubramanyan, R., Routledge, B., & Smith, N. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. ICWSM, May 2010.
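The paper's opinion estimate is, roughly, the ratio of messages containing positive words to messages containing negative words. A loose per-term adaptation might look like the sketch below. Applying the ratio to individual terms, the add-one smoothing, and the names POSITIVE, NEGATIVE, and ratio_sentiment are assumptions made for illustration; they are not taken from the paper or the assignment.

```python
from collections import Counter

# Hypothetical word lists; in practice these could be the AFINN terms
# with positive and negative scores respectively.
POSITIVE = {"good", "love"}
NEGATIVE = {"bad", "hate"}


def ratio_sentiment(tweets, positive, negative, smoothing=1.0):
    """Score each unknown term as (positive tweets + s) / (negative tweets + s)."""
    pos_counts = Counter()
    neg_counts = Counter()
    for tweet in tweets:
        words = set(tweet.lower().split())
        unknown = words - positive - negative
        # A tweet counts as positive (negative) if it contains any positive
        # (negative) word; a single tweet can count as both.
        if words & positive:
            pos_counts.update(unknown)
        if words & negative:
            neg_counts.update(unknown)
    terms = set(pos_counts) | set(neg_counts)
    return {
        t: (pos_counts[t] + smoothing) / (neg_counts[t] + smoothing)
        for t in terms
    }


tweets = ["good coffee", "love coffee", "bad traffic", "hate traffic good music"]
ratios = ratio_sentiment(tweets, POSITIVE, NEGATIVE)
```

Under this sketch, ratios above 1 lean positive and ratios below 1 lean negative. Terms that only occur in tweets with no positive or negative words are left out of the result.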
You are provided with a skeleton file, term_sentiment.py, which can be executed using the following command:
$ python term_sentiment.py <sentiment_file> <tweet_file>
Your script should print each term-sentiment pair to stdout, one pair per line: the term, a single space, then its sentiment as a float.
For example, the pair (“foo”, 103.256) should appear in the output as:
foo 103.256
The order of your output does not matter.
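A minimal sketch of the printing step, assuming the scores are already held in a dictionary. The names term_scores and format_pair are hypothetical, and the values are made up for illustration.

```python
import sys


def format_pair(term, score):
    """Render one term-sentiment pair as 'term score'."""
    return "{} {}".format(term, score)


# Hypothetical scores computed earlier in the script.
term_scores = {"foo": 103.256, "bar": -1.5}

# Write one pair per line; the order of the lines is not significant.
for term, score in term_scores.items():
    sys.stdout.write(format_pair(term, score) + "\n")
```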
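import sys
import json


def load_sentiments(sent_file):
    """Map each AFINN term to its score; the file is tab-separated."""
    scores = {}
    for line in sent_file:
        if not line.strip():
            continue
        term, score = line.rstrip("\n").split("\t")
        scores[term] = float(score)
    return scores


def load_tweets(tweet_file):
    """Return the text of every JSON line that has a "text" field."""
    texts = []
    for line in tweet_file:
        if not line.strip():
            continue
        tweet = json.loads(line)
        if "text" in tweet:
            texts.append(tweet["text"])
    return texts


def tweet_sentiment(words, scores):
    """Sum the AFINN scores of the known words in one tweet."""
    return sum(scores.get(word, 0.0) for word in words)


def term_sentiments(texts, scores):
    """Score each term missing from AFINN.

    The metric is one simple choice, not one the assignment prescribes:
    the mean sentiment of the tweets that contain the term.
    """
    totals = {}  # term -> [sum of tweet sentiments, number of tweets]
    for text in texts:
        words = text.lower().split()
        sentiment = tweet_sentiment(words, scores)
        for word in set(words):  # count each term once per tweet
            if word not in scores:
                entry = totals.setdefault(word, [0.0, 0])
                entry[0] += sentiment
                entry[1] += 1
    return {term: total / count for term, (total, count) in totals.items()}


def main():
    # argv[1] is the sentiment file, argv[2] is the tweet file.
    with open(sys.argv[1], encoding="utf-8") as sent_file:
        scores = load_sentiments(sent_file)
    with open(sys.argv[2], encoding="utf-8") as tweet_file:
        texts = load_tweets(tweet_file)
    for term, sentiment in term_sentiments(texts, scores).items():
        print(term, sentiment)


if __name__ == '__main__':
    main()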