Hash Tables

Hash tables are a corner stone of data structures for most problems, they are useful when there are unique keys with given values

Runtime analysis

  1. 1.
    Insert:
    O(1)O(1)
  2. 2.
    Search:
    O(1)O(1)
  3. 3.
    Delete:
    O(1)O(1)
  4. 4.
    Update:
    O(1)O(1)

Take note…

  1. 1.
    Do we need a custom hashing function?

Techniques

Anagram hashing

This technique is particularly when trying to detect all anagrams and store them in buckets of same anagrams
  • Multiplicative hash with prime numbers (3 onwards)
  • Assign each character a prime number and calculate the hash
  • Hash will be unique per anagram
  • Does not scale well if more than 52 characters possible or
    n>5000n > 5000
primes = [3, 5, 7, ...]
def hash(string):
value = 1
for ch in string:
value *= primes[ord(ch) - ord('a')]
return value

Fingerprint hash tables/arrays over hash tables

I would recommend using these over hash tables where possible as they can represent the same information while reducing the complexity of the code
freq = [0] * 26
# is the same as
freq = {}
for ch in s:
if ch not in freq:
freq[ch] = 0

Duplicate checking using bitmasks

This is a slight optimization over using hash tables/sets if the only thing that needs to be done is duplicate detection
  • Not feasible if the numbers can be very large as size grows by powers of 2
acc = 0
for ch in string:
mask = 1 << (ord(ch) - ord('a'))
if acc ^ mask < acc: # duplicate found
return False
acc |= mask

Multiplicative hashing

h(k)=m(ϕ×kϕ×k)h(k) = \lfloor m(\phi \times k - \lfloor \phi \times k \rfloor) \rfloor
  • Where
    kk
    is the key to hash,
    ϕ\phi
    is the golden ratio =
    512\sqrt{5} - \frac{1}{2}

Pairing with another data structure

Often used to improve performance at the cost of using more space to store the hash table
  • Use Linked Lists to optimize to quickly delete/insert nodes in a certain order
    • Hash table has pointers to the nodes of the linked list

Types of hashing

  1. 1.
    Chaining (linked list per bucket)
  2. 2.
    Open addressing (probing for a blank spot if current bucket is filled)