HASH is a hashing function. Inputs are hashed to an eight-digit base-36 string.
Collision
Hashes, by nature of mapping an infinite input domain to a restricted output, have a probability of collision.
The expected number of collisions can be calculated as:
For example, with items:
| (Number of items) | Expected Collisions |
|---|---|
| 1000 | ~0.000177 |
| 10000 | ~0.0177 |
| 100000 | ~1.77 |
| 1000000 | ~177 |
| 2000000 | ~709 |
| 5000000 | ~4431 |
This approximation only holds as long as the hash function follows a continuous uniform distribution.
Empirical Performance
The hash function demonstrates near-optimal collision resistance across different data types:
| Data Type | Test Size | Observed Collisions | Expected Collisions | Performance |
|---|---|---|---|---|
| Integers | 0 | ~710 | Excellent | |
| Floats | 3.53 (avg) | ~1.8 | Near-optimal | |
| Strings | 34 | ~24 | Near-optimal |
Notes:
- Integer test: Sequential integers from to showed zero collisions, significantly outperforming the theoretical expectation due to the deterministic nature of the input set.
- Float test: Average of 15 runs using
RANDARRAYto generate random floats. Result is close to the theoretical minimum of collisions. - String test: Dictionary of unique words. collisions is within acceptable variance of the theoretical for a random hash function.
Implementation
The hash function uses different algorithms optimized for each data type:
Numbers (Integers and Floats)
Uses multiplicative hashing with XOR mixing:
- Separate integer and fractional parts
- Hash integer part:
- Hash fractional part:
- XOR the parts together with sign handling
- Final multiply by and mod by
The use of as intermediate modulo (instead of ) preserves entropy during accumulation, preventing collision clusters.
Inspired by Knuth’s multiplicative hash (prime ) and MurmurHash finalizer patterns.
Strings
Uses FNV-1a inspired algorithm with large intermediate range:
- Initialize accumulator to FNV offset basis:
- For each character:
- Multiply accumulator by FNV prime ()
- Mod by to keep in range
- XOR with character code
- Mod by again
- Final multiply by and mod by
The two-stage approach ( during accumulation, at finalization) ensures uniform distribution across the output space.
Inspired by FNV-1a hash with modifications for Google Sheets’ numeric constraints.
Design Constraints
Google Sheets imposes several constraints that influenced the design:
- MOD limitations:
MOD(dividend, divisor)fails when - BITXOR requires integers: Float operands must be wrapped with
int() - No bitshift operators: Used division by powers of 2 instead
- Numeric precision: JavaScript float precision (53-bit mantissa) limits calculation size
These constraints led to the two-stage modulo approach, which balances collision resistance with computational safety.