HASH
is a hashing function designed for compatibility with Anduin containers. Inputs are hashed to an eight-digit base-36 string.
Collision
Hashes, by nature of mapping an infinite input domain to a restricted output, have a probability of collision. In order to calculate the probability of collision, we can use an approximation of the birthday problem, derived from its Taylor series expansion.
Where is the cardinality of the hash space, or .
(Number of items) | Probability of collision |
---|---|
1000 | 0.0000001772351919 |
10000 | 0.0000177233637 |
100000 | 0.001770782387 |
1000000 | 0.1624172444 |
2000000 | 0.507834792 |
5000000 | 0.9880959927 |
This approximation only holds as long as the hash function follows a continuous uniform distribution. Below is an empirical result from hashing two datasets, showing that it does.