Series of programming assignments from the "Introduction to Data Science" course ("Join the data revolution") by the University of Washington.

Problem 6: Top ten hash tags

Write a Python script, top_ten.py, that computes the ten most frequently occurring hash tags from the data you gathered in Problem 1.
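Counting and ranking can be done with the standard library's `collections.Counter`, whose `most_common(n)` method returns the `n` highest-count items, highest first. The following is a minimal sketch; the helper name `top_n` and the hashtag list are made-up illustrations, not part of the assignment.

```python
from collections import Counter


def top_n(hashtags, n=10):
    """Return the n most frequent hashtags as (tag, count) pairs, highest count first."""
    return Counter(hashtags).most_common(n)


# Made-up sample data: "data" appears 3 times, "python" twice, "baz" once.
sample = ["data", "python", "data", "baz", "python", "data"]
print(top_n(sample, n=2))  # [('data', 3), ('python', 2)]
```

The assignment does not say how to break ties. In Python 3.7+, `most_common` orders equal counts by first occurrence.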

top_ten.py should take a file of tweets as input and be usable in the following way: `$ python top_ten.py <tweet_file>`. Assume the tweet file contains data formatted the same way as the livestream data.

In the tweet file, each line is a Tweet object, as described in the Twitter documentation. You should not parse the “text” field; read the hashtags from the tweet's `entities` field instead.
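Because Twitter already extracts hashtags into each tweet's `entities` field, a single line can be handled roughly as below. This sketch assumes the v1.1 streaming format, where `entities.hashtags` is a list of objects with a `text` key that holds the tag without the leading `#`. Lines without that field, such as deletion notices, yield no hashtags. The helper name `hashtags_from_line` is hypothetical, and the tweet string is a hand-written minimal example, not real data.

```python
import json


def hashtags_from_line(line):
    """Return the hashtag texts of one JSON-encoded tweet line ([] if there are none)."""
    try:
        tweet = json.loads(line)
    except ValueError:
        return []  # blank or malformed line
    if not isinstance(tweet, dict):
        return []
    # Deletion notices and other non-tweet messages have no "entities" key.
    entities = tweet.get("entities") or {}
    return [h["text"] for h in entities.get("hashtags", [])]


# Hand-written minimal tweet; real tweets carry many more fields.
line = ('{"text": "Learning #Python and #data", "entities": {"hashtags": '
        '[{"text": "Python", "indices": [9, 16]}, {"text": "data", "indices": [21, 26]}]}}')
print(hashtags_from_line(line))              # ['Python', 'data']
print(hashtags_from_line('{"delete": {}}'))  # []
```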

Your script should print to stdout each hashtag-count pair, one per line, in the following format:

```
<hashtag:string> <count:float>
```

For example, if you have the pair (baz, 30) it should appear in the output as:

```
baz 30.0
```

Remember your output must contain floats, not ints.
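Converting the count with `float()` before formatting is enough to meet the float requirement. Here is a minimal sketch; the helper name `format_pair` is only illustrative.

```python
def format_pair(tag, count):
    """Format a (hashtag, count) pair as '<hashtag> <count>' with the count as a float."""
    return "{} {}".format(tag, float(count))


print(format_pair("baz", 30))  # baz 30.0
```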

```python
import sys
import json
from collections import Counter


def top_ten_hashtags(tweet_file):
    """Print the ten most frequent hashtags in tweet_file, one '<hashtag> <count>' per line."""
    counts = Counter()
    for line in tweet_file:
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip blank or malformed lines
        if not isinstance(tweet, dict):
            continue
        # Read hashtags from the structured "entities" field instead of parsing "text".
        # Non-tweet messages in the stream (e.g. delete notices) have no "entities".
        entities = tweet.get("entities") or {}
        for hashtag in entities.get("hashtags", []):
            tag = hashtag["text"]
            # Like the original script, keep only purely alphanumeric hashtags.
            if tag.isalnum():
                counts[tag] += 1

    # most_common(10) returns (tag, count) pairs sorted by descending count.
    for tag, count in counts.most_common(10):
        # The assignment requires the count to be printed as a float, e.g. "baz 30.0".
        print("{} {}".format(tag, float(count)))


def main():
    # Usage: python top_ten.py <tweet_file>
    with open(sys.argv[1], encoding="utf-8") as tweet_file:
        top_ten_hashtags(tweet_file)


if __name__ == '__main__':
    main()
```