HàPhan 河

What makes a good hashing function?

Below are my self-implement hash function follow this common way:

Screen-Shot-2019-11-06-at-4.59.06-PM

extension String {
    var myHash: UInt {
        let p: UInt = 1111991 //prime number, nearly all possible characters of str
        let m: UInt = UInt(1e9 + 11) //prime number, 
        // should be large to reduce the percentage of collision of two random strings (~ 1/m), 
        //and should be small enough to perform multiplication without overflowing (Int64)
        //https://en.wikipedia.org/wiki/Linear_congruential_generator
        
        var hashValue: UInt = 0
        var pPow: UInt = 1
        for c in self {
            let numberRepresenting = UInt(c.unicodeScalars.first!.value)
            hashValue += ((numberRepresenting * pPow) % m)
            pPow = (pPow * p) % m
        }
        return hashValue
    }
}

print("hello".myHash) //1980540924
print("thisisareally_long_string_to_compute_couldwemakeit".myHash) //23955027982
print("👏".myHash) //128079
print("1".myHash) //49
print(";".myHash) //59
print("a".myHash) //97
print("".myHash) //0
print(" ".myHash) //32

As you can see, the hash function will work with all unicode characters. It's using technique called polynomial rolling hash function.
p and m are constant, and the choice of them is important to get good hashing value. That is one element to judge a hash function, uniformly distribute the keys.

The second element to judge is the efficiently computation. Here I pre-calculate pPow, and the % function just take O(1) time complexity. Overall, the method above could satisfy O(n) where n is length of key.

Or you can reference those methods, it's simple and fast to catch up

extension String {
    var djb2hash: Int {
        let unicodeScalars = self.unicodeScalars.map { $0.value }
        return unicodeScalars.reduce(5381) {
            ($0 << 5) &+ $0 &+ Int($1)
        }
    }

    var sdbmhash: Int {
        let unicodeScalars = self.unicodeScalars.map { $0.value }
        return unicodeScalars.reduce(0) {
             Int($1) &+ ($0 << 6) &+ ($0 << 16) - $0
        }
    }
}

That's all. Happy coding.

Some extra works I tried to implement:

extension Int {
    var isPrime: Bool {
        if self % 2 == 0 {
            return false
        }
        
        let squareRoot = Int(sqrt(Double(self)))
        for i in stride(from: 3, through: squareRoot, by: 2) {
            if self % i == 0 {
                return false
            }
        }
        
        return true
    }
    
    var nearestPrime: Int {
        var goUp = self
        var goDown = self
        while true {
            goUp += 1
            goDown -= 1
            
            if goUp.isPrime {
                return goUp
            }
            if goDown.isPrime {
                return goDown
            }
        }
        return -1
    }
}

References

https://cp-algorithms.com/string/string-hashing.html
https://useyourloaf.com/blog/swift-hashable/
https://en.wikipedia.org/wiki/Rolling_hash
https://stackoverflow.com/questions/5924105/how-many-characters-can-be-mapped-with-unicode
https://www.geeksforgeeks.org/what-are-hash-functions-and-how-to-choose-a-good-hash-function/

Comments