Series of programming assignments from "Introduction to Data Science" course - Join the data revolution by University of Washington

Problem 3: Derive the sentiment of new terms

In this part you will be creating a script that computes the sentiment for the terms that do not appear in the file AFINN-111.txt.

Here's how you might think about the problem: We know we can use certain words to deduce the sentiment of a tweet. Once you know the sentiment of the tweets that contain some term, you can assign a sentiment to the term itself.
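One way to turn that idea into code, as a minimal sketch rather than the assignment's reference solution: score each tweet by summing the scores of its known words, then give every unknown term the average score of the tweets it appears in. The function names, the sample scores, and the sample tweets are all made up for illustration, and averaging is only one of several reasonable metrics.

```python
from collections import defaultdict


def tweet_score(text, scores):
    """Sum the known sentiment scores of the words in one tweet."""
    return sum(scores.get(word, 0.0) for word in text.lower().split())


def derive_term_sentiment(tweets, scores):
    """Average the score of the tweets in which each unknown term appears."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text in tweets:
        score = tweet_score(text, scores)
        # set() counts a term once per tweet, even if it is repeated.
        for term in set(text.lower().split()):
            if term not in scores:
                totals[term] += score
                counts[term] += 1
    return {term: totals[term] / counts[term] for term in totals}


# Hypothetical data: two known words and three short tweets.
scores = {"good": 3.0, "bad": -3.0}
tweets = ["good coffee", "bad coffee", "good pizza"]
derived = derive_term_sentiment(tweets, scores)
# "coffee" appears in one positive and one negative tweet, so it averages
# to 0.0; "pizza" appears only in a positive tweet, so it gets 3.0.
```

Splitting on whitespace ignores punctuation and multi-word phrases, so real tweets would likely need a more careful tokenizer.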

Don't feel obligated to use it, but the following paper may be helpful for developing a sentiment metric. Look at the Opinion Estimation subsection of the Text Analysis section in particular.

O'Connor, B., Balasubramanyan, R., Routledge, B., & Smith, N. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. International AAAI Conference on Weblogs and Social Media (ICWSM), May 2010.
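That subsection counts a message as positive if it contains any positive word and as negative if it contains any negative word, then takes the ratio of positive to negative messages. Carrying that ratio over to individual terms is an assumption made here, not something the paper or the assignment prescribes. The add-one smoothing, the name term_sentiment_ratio, and the sample data are likewise hypothetical.

```python
from collections import defaultdict


def term_sentiment_ratio(tweets, scores, smoothing=1.0):
    """Ratio of positive to negative tweets containing each unknown term."""
    pos = defaultdict(int)
    neg = defaultdict(int)
    seen = set()
    for text in tweets:
        words = set(text.lower().split())
        # A tweet is positive if it has any positive word and negative if it
        # has any negative word; it can be both at once.
        is_pos = any(scores.get(w, 0) > 0 for w in words)
        is_neg = any(scores.get(w, 0) < 0 for w in words)
        for term in words:
            if term in scores:
                continue
            seen.add(term)
            if is_pos:
                pos[term] += 1
            if is_neg:
                neg[term] += 1
    # The smoothing constant is an assumption that keeps the ratio defined
    # when a term never appears in a negative tweet.
    return {t: (pos[t] + smoothing) / (neg[t] + smoothing) for t in seen}


# Hypothetical data for illustration.
scores = {"good": 3, "bad": -3}
tweets = ["good coffee", "good coffee today", "bad traffic"]
ratios = term_sentiment_ratio(tweets, scores)
```

With this metric a ratio above 1 leans positive and a ratio below 1 leans negative; a term seen only in neutral tweets comes out at exactly 1.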

You are provided with a skeleton file, term_sentiment.py, which can be executed using the following command:

$ python term_sentiment.py <sentiment_file> <tweet_file>

Your script should print to stdout each term-sentiment pair, one pair per line, in the following format:

<term:string> <sentiment:float>

For example, if you have the pair (“foo”, 103.256), it should appear in the output as:

foo 103.256

The order of your output does not matter.
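The output format can be sketched in a few lines. The helper names format_pair and print_pairs are invented for this example and are not part of the skeleton file.

```python
import sys


def format_pair(term, sentiment):
    """Return one output line: the term, a space, then the sentiment."""
    return "%s %s" % (term, sentiment)


def print_pairs(pairs, out=sys.stdout):
    """Write one term-sentiment pair per line; the order is not significant."""
    for term, sentiment in pairs:
        out.write(format_pair(term, sentiment) + "\n")


print_pairs([("foo", 103.256)])  # prints: foo 103.256
```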

import sys
import json
def hw():
    print('Hello, world!')

def lines(fp):
    # Print the number of lines in an open file.
    print(str(len(fp.readlines())))
 
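def test(sf, tf):
    # Collect the text of every tweet in the tweet file (one JSON object per line).
    # sf, the sentiment file, is not read here; it is parsed by sfDict().
    decodedText = []
    for line in tf:
        line = line.strip()
        if not line:
            continue  # json.loads() rejects blank lines
        tweet = json.loads(line)
        # Some records (for example, delete notices) have no "text" field.
        if "text" in tweet:
            # Python 2: encode to a UTF-8 byte string so the text prints safely.
            decodedText.append(tweet["text"].encode("utf-8"))
    return decodedText
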
def sfDict(sf):
    # Parse the sentiment file: one tab-separated "term<TAB>score" pair per line.
    x = []
    for s in sf:
        s = s.strip()
        if not s:
            continue  # ignore blank lines
        term, score = s.split("\t")
        x.append((term, float(score)))
    return x
 
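def check(decodedText, op):
    # Score every tweet as the sum of the scores of the known terms it contains.
    results = []
    for z in decodedText:
        val = 0.0
        # Normalise whitespace and pad with spaces so that terms match only
        # as whole words or phrases ("good" must not match inside "goodbye").
        padded = " " + " ".join(z.lower().split()) + " "
        for (x, y) in op:
            if (" " + x + " ") in padded:
                val = val + float(y)
        results.append((z, val))
    # Return once every tweet has been scored.
    return results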
 
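def check2(decodedText, op):
    # Derive a sentiment for each term that is not in the sentiment file.
    # Metric used here (one simple choice): the average score of the tweets
    # in which the term appears.
    known = dict((x, float(y)) for (x, y) in op)
    totals = {}
    counts = {}
    for z in decodedText:
        words = z.lower().split()
        val = sum(known.get(word, 0.0) for word in words)
        for word in set(words):
            if word not in known:
                totals[word] = totals.get(word, 0.0) + val
                counts[word] = counts.get(word, 0) + 1
    # Print one "term sentiment" pair per line.
    for word in totals:
        print(word + " " + str(totals[word] / counts[word]))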
 
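def main():
    # argv[1] is the sentiment file, argv[2] is the tweet file.
    sent_file = open(sys.argv[1])
    tweet_file = open(sys.argv[2])
    x = test(sent_file, tweet_file)
    y = sfDict(sent_file)
    sent_file.close()
    tweet_file.close()
    # Only the term-sentiment pairs are required on stdout.
    check2(x, y)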
 
if __name__ == '__main__':
    main()
