Week 6 covered these topics:

  • HASHING: THE BASICS
  • UNIVERSAL HASHING
  • BLOOM FILTERS

Two programming assignments:

  • The goal is to implement a variant of the 2-SUM algorithm (covered in the Week 6 lecture on hash table applications).
  • The goal is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications).

Question 1

The goal of this problem is to implement a variant of the 2-SUM algorithm (covered in the Week 6 lecture on hash table applications).

The file contains 1 million integers, both positive and negative (there might be some repetitions!). This is your array of integers, with the ith row of the file specifying the ith entry of the array.

Your task is to compute the number of target values t in the interval [-10000,10000] (inclusive) such that there are distinct numbers x,y in the input file that satisfy x+y=t. (NOTE: ensuring distinctness requires a one-line addition to the algorithm from lecture.)
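To make the distinctness condition concrete, here is a small brute-force counter that can serve as a reference on toy inputs before anything is run on the 1-million-integer file. It is a sketch that assumes "distinct" means distinct values, which is how both implementations below check it (via `x != y`); the function name `count_targets_bruteforce` is hypothetical.

```python
from itertools import combinations


def count_targets_bruteforce(nums, lo, hi):
    """Count targets t in [lo, hi] with t == x + y for distinct values x, y in nums.

    Checks every pair of distinct values, so it is only meant for small inputs.
    """
    values = sorted(set(nums))  # dedupe: x and y must be different numbers
    sums = {x + y for x, y in combinations(values, 2)}  # pairs never repeat a value
    return sum(1 for t in sums if lo <= t <= hi)


# Two copies of 1 do not make t = 2 reachable, because x and y must be distinct.
print(count_targets_bruteforce([-3, -1, 1, 1, 2], -2, 2))  # prints 4
```

Here the reachable targets in [-2, 2] are -2, -1, 0 and 1; t = 2 would need 1 + 1, which uses the same value twice.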

Write your numeric answer (an integer between 0 and 20001) in the space provided.

OPTIONAL CHALLENGE: If this problem is too easy for you, try implementing your own hash table for it. For example, you could compare performance under the chaining and open addressing approaches to resolving collisions.

Implementation 1

filename = "algo6_2sum.txt"
with open(filename) as f:
  H = set(int(line) for line in f)  # hash set of the distinct input values
targets = range(-10000, 10001)      # [-10000, 10000], inclusive
answers = set()                     # targets t reached by some pair of distinct x, y

for x in H:
  for t in targets:
    y = t - x
    # y != x enforces distinctness: one value cannot be paired with itself
    if y != x and y in H:
      answers.add(t)

# Roughly len(H) * 20001 set lookups: correct, but very slow for a 10^6-value file
print(len(answers))
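The double loop above performs one set lookup per (value, target) pair, roughly 10^6 × 20001 ≈ 2 × 10^10 lookups for a file of about a million distinct values. One common alternative swaps the hash table for sorting plus binary search: once the values are sorted, every partner y of a given x with -10000 ≤ x + y ≤ 10000 lies in one contiguous slice, which `bisect` finds directly. This is a sketch of that different technique, not the lecture's algorithm; the function name `count_targets_sorted` and its default bounds are illustrative assumptions.

```python
import bisect


def count_targets_sorted(nums, lo=-10000, hi=10000):
    """Count targets t in [lo, hi] such that t == x + y for distinct values x, y."""
    values = sorted(set(nums))
    found = set()
    for x in values:
        left = bisect.bisect_left(values, lo - x)    # first y with y >= lo - x
        right = bisect.bisect_right(values, hi - x)  # one past the last y with y <= hi - x
        for y in values[left:right]:
            if y != x:                               # x and y must be distinct
                found.add(x + y)
    return len(found)


print(count_targets_sorted([-3, -1, 1, 1, 2], -2, 2))  # prints 4
```

The work per x is proportional to the length of its slice, so this is fast when the values are spread thinly relative to the 20001-wide target window and degrades toward the quadratic case when they are tightly clustered. To run it on the assignment file, read the integers as in Implementation 1 and pass the resulting list.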

Implementation 2

values = set()   # hash set of the distinct input integers
with open('HashInt.txt') as input_file:
    for line in input_file:
        values.add(int(line))

def target_sum(t):
    """Return 1 if t == x + y for some distinct values x, y in the set, else 0."""
    for element in values:
        t_el = t - element
        if t_el in values and t_el != element:
            return 1
    return 0

count = 0
for i in range(-10000, 10001):   # the target interval [-10000, 10000], inclusive
    if target_sum(i):
        count += 1
print(count)
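For the optional challenge of writing your own hash table, here is a minimal sketch of the two collision strategies it mentions: chaining (each bucket holds a list of keys) and open addressing with linear probing (each key lives in the first free slot at or after its home slot). Assumptions: keys are Python ints, capacity is fixed with no resizing, there is no deletion, and bucket indices come from Python's built-in `hash()`. A more faithful experiment could plug in a hash function of your own, for example one drawn from the universal hashing lecture. Both class names are hypothetical.

```python
class ChainedHashSet:
    """Hash set of ints that resolves collisions by chaining."""

    def __init__(self, capacity=1 << 20):
        self.buckets = [None] * capacity  # a bucket's list is created on first insert

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def add(self, key):
        i = self._index(key)
        if self.buckets[i] is None:
            self.buckets[i] = []
        if key not in self.buckets[i]:  # linear scan of a single chain
            self.buckets[i].append(key)

    def __contains__(self, key):
        bucket = self.buckets[self._index(key)]
        return bucket is not None and key in bucket


class LinearProbingHashSet:
    """Hash set of ints using open addressing with linear probing."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=1 << 21):
        self.slots = [self._EMPTY] * capacity
        self.size = 0

    def _probe(self, key):
        """Return the slot holding key, or the first empty slot on its probe path."""
        i = hash(key) % len(self.slots)
        while self.slots[i] is not self._EMPTY and self.slots[i] != key:
            i = (i + 1) % len(self.slots)
        return i

    def add(self, key):
        i = self._probe(key)
        if self.slots[i] is self._EMPTY:
            # keep at least one empty slot so _probe always terminates
            if self.size + 1 >= len(self.slots):
                raise RuntimeError("table full; choose a larger capacity")
            self.slots[i] = key
            self.size += 1

    def __contains__(self, key):
        return self.slots[self._probe(key)] is not self._EMPTY
```

Either class can stand in for the built-in set in the 2-SUM code: build it with `add` and test membership with `in`. With the default capacity, a million keys keep the probing table below half full, which keeps probe sequences short; timing both on the assignment file gives the comparison the challenge asks for.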

Question 2

The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting $x_i$ denote the $i$th number of the file, the $k$th median $m_k$ is defined as the median of the numbers $x_1, \ldots, x_k$. (So, if $k$ is odd, then $m_k$ is the $((k+1)/2)$th smallest number among $x_1, \ldots, x_k$; if $k$ is even, then $m_k$ is the $(k/2)$th smallest number among $x_1, \ldots, x_k$.)

In the box below you should type the sum of these 10000 medians, modulo 10000 (i.e., only the last 4 digits). That is, you should compute $(m_1 + m_2 + m_3 + \cdots + m_{10000}) \bmod 10000$.
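The definition can be checked directly with a deliberately slow reference that re-sorts each prefix and picks the element at 0-based index (k - 1) // 2, which is the ((k+1)/2)th smallest for odd k and the (k/2)th smallest for even k. It re-sorts every prefix, so it is only meant for validating a faster implementation on small streams; `running_medians` is a hypothetical name.

```python
def running_medians(stream):
    """Return [m_1, ..., m_n] per the definition, by sorting every prefix."""
    medians = []
    for k in range(1, len(stream) + 1):
        prefix = sorted(stream[:k])
        # ((k+1)/2)th smallest for odd k and (k/2)th smallest for even k
        # are both the element at 0-based index (k - 1) // 2
        medians.append(prefix[(k - 1) // 2])
    return medians


print(running_medians([3, 1, 2, 5]))               # prints [3, 1, 2, 2]
print(sum(running_medians([3, 1, 2, 5])) % 10000)  # prints 8
```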

OPTIONAL EXERCISE: Compare the performance achieved by heap-based and search-tree-based implementations of the algorithm.
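For the optional exercise, a small timing harness can compare a two-heap implementation with an ordered-structure one. Python's standard library has no balanced search tree, so this sketch substitutes a sorted list maintained with `bisect.insort` (binary search to find the position, but an O(k) element shift per insert). A true search-tree version, for instance one built on a third-party sorted-container package, would replace `medians_sorted_list`. All function names here are hypothetical, and the timings depend on machine and interpreter.

```python
import bisect
import heapq
import random
import time


def medians_two_heaps(stream):
    """Running medians: max-heap (negated values) for the low half, min-heap for the high half."""
    low, high, out = [], [], []
    for x in stream:
        if low and x > -low[0]:
            heapq.heappush(high, x)
        else:
            heapq.heappush(low, -x)
        # rebalance so that len(low) is len(high) or len(high) + 1
        if len(low) > len(high) + 1:
            heapq.heappush(high, -heapq.heappop(low))
        elif len(high) > len(low):
            heapq.heappush(low, -heapq.heappop(high))
        out.append(-low[0])  # max of the low half is m_k
    return out


def medians_sorted_list(stream):
    """Running medians via binary-search insertion into a sorted list."""
    seen, out = [], []
    for x in stream:
        bisect.insort(seen, x)
        out.append(seen[(len(seen) - 1) // 2])
    return out


def time_it(fn, stream):
    start = time.perf_counter()
    result = fn(stream)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    stream = list(range(1, 10001))  # same values as the assignment, shuffled
    random.shuffle(stream)
    a, t_heap = time_it(medians_two_heaps, stream)
    b, t_list = time_it(medians_sorted_list, stream)
    assert a == b  # both must agree on every m_k
    print(f"two heaps: {t_heap:.4f}s  sorted list: {t_list:.4f}s")
```

At n = 10000 the asymptotic difference may be hard to see, since CPython shifts list elements with a fast memory move; longer streams make the O(log k) versus O(k) per-insert gap more visible.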

Implementation 1

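import heapq
filename = "algo6_median.txt"
with open(filename) as f:
  X = [int(l) for l in f]
H_low = []    # max-heap (values stored negated): the smaller half, holds the median
H_high = []   # min-heap: the larger half

total = 0
for x_i in X:
  # route x_i to the half it belongs to
  if len(H_low) > 0 and x_i > -H_low[0]:
    heapq.heappush(H_high, x_i)
  else:
    heapq.heappush(H_low, -x_i)

  # rebalance so that len(H_low) is len(H_high) or len(H_high) + 1
  if len(H_low) > len(H_high) + 1:
    heapq.heappush(H_high, -heapq.heappop(H_low))
  elif len(H_high) > len(H_low):
    heapq.heappush(H_low, -heapq.heappop(H_high))

  # max of H_low: the ((k+1)/2)th smallest for odd k, the (k/2)th smallest for even k
  total += -H_low[0]

print(total % 10000)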

Implementation 2

def read_file(filename):
    """Read one integer per line, in file order."""
    l_input = []
    with open(filename) as myfile:
        for line in myfile:
            l_input.append(int(line))
    return l_input

def get_median(k1, k2):
    # Midpoint of k1 and k2, rounded down (// keeps it an int in Python 3)
    return (k1 + k2) // 2

def sort_insert(e, slist):
    """Insert e into the sorted list slist, binary-searching 1-based positions k1..k2."""
    n = len(slist)
    if n == 0:
        slist.append(e)
    else:
        k1 = 1
        k2 = n
        k = get_median(k1, k2)
        while k2 - k1 > 1:
            if e >= slist[k-1]: k1 = k
            else: k2 = k
            k = get_median(k1, k2)
        if k1 == k2:  # only happens when n == 1
            if e >= slist[k-1]: slist.insert(k, e)
            else: slist.insert(k-1, e)
        else:  # k2 == k1 + 1, and k == k1
            if e <= slist[k1-1]: slist.insert(k1-1, e)
            elif e >= slist[k2-1]: slist.insert(k2, e)
            else: slist.insert(k1, e)
    return slist

def sort_median(mylist):
    """Return the running medians m_1, ..., m_n of mylist."""
    slist = []        # every number seen so far, kept sorted
    medianlist = []
    for x in mylist:
        sort_insert(x, slist)
        # 0-based index (len - 1) // 2: ((k+1)/2)th smallest (k odd), (k/2)th smallest (k even)
        medianlist.append(slist[get_median(0, len(slist) - 1)])
    return medianlist

myfile = 'E://Median.txt'
mylist = read_file(myfile)
medianlist = sort_median(mylist)
print('answer:', sum(medianlist) % 10000)
