HASH is a hashing function designed for compatibility with Anduin containers. Inputs are hashed to an eight-digit base-36 string.

Collision

Because a hash maps an infinite input domain onto a finite output space, collisions are always possible. To estimate the probability of at least one collision among n hashed items, we can use an approximation of the birthday problem, derived from its Taylor series expansion:

$$p(n) \approx 1 - e^{-n^2 / (2N)}$$

Where N is the cardinality of the hash space, or 36^8 = 2,821,109,907,456.
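
For reference, a brief sketch of how the approximation follows from the exact birthday-problem probability. The symbols n (number of items hashed) and N (size of the hash space) are notation chosen here, and the steps assume n is much smaller than N so that the first-order Taylor expansion ln(1 - x) ≈ -x is accurate.

$$
\begin{aligned}
p(n) &= 1 - \prod_{k=1}^{n-1}\left(1 - \frac{k}{N}\right) \\
     &\approx 1 - \exp\left(-\sum_{k=1}^{n-1}\frac{k}{N}\right)
      = 1 - e^{-n(n-1)/(2N)} \\
     &\approx 1 - e^{-n^2/(2N)}
\end{aligned}
$$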

Number of items    Probability of collision
1000               0.0000001772351919
10000              0.0000177233637
100000             0.001770782387
1000000            0.1624172444
2000000            0.507834792
5000000            0.9880959927
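
The figures in the table can be reproduced with a short script. This is a minimal sketch, assuming the approximation 1 - exp(-n^2 / (2N)) with N = 36^8 (eight base-36 digits); the names `HASH_SPACE` and `collision_probability` are illustrative and not part of HASH itself.

```python
import math

# Assumed size of the hash space: eight digits, 36 possible values each.
HASH_SPACE = 36 ** 8


def collision_probability(n_items: int, space: int = HASH_SPACE) -> float:
    """Approximate probability of at least one collision among n_items hashes."""
    # -expm1(-x) equals 1 - exp(-x) but keeps precision when x is very small.
    return -math.expm1(-(n_items ** 2) / (2 * space))


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000, 1_000_000, 2_000_000, 5_000_000):
        print(f"{n:>9}  {collision_probability(n):.10g}")
```

Using `math.expm1` rather than `1 - math.exp(...)` avoids cancellation error in the first rows, where the probability is on the order of 1e-7.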

This approximation holds only if the hash function distributes its outputs uniformly over the hash space. Below is an empirical result from hashing two datasets, showing that it does.