HASH is a hashing function. Inputs are hashed to an eight-digit base-36 string.

Collision

Hashes, by nature of mapping an infinite input domain to a restricted output, have a probability of collision.

The expected number of collisions can be calculated as:

For example, with items:

(Number of items)Expected Collisions
1000~0.000177
10000~0.0177
100000~1.77
1000000~177
2000000~709
5000000~4431

This approximation only holds as long as the hash function follows a continuous uniform distribution.

Empirical Performance

The hash function demonstrates near-optimal collision resistance across different data types:

Data TypeTest SizeObserved CollisionsExpected CollisionsPerformance
Integers0~710Excellent
Floats3.53 (avg)~1.8Near-optimal
Strings34~24Near-optimal

Notes:

  • Integer test: Sequential integers from to showed zero collisions, significantly outperforming the theoretical expectation due to the deterministic nature of the input set.
  • Float test: Average of 15 runs using RANDARRAY to generate random floats. Result is close to the theoretical minimum of collisions.
  • String test: Dictionary of unique words. collisions is within acceptable variance of the theoretical for a random hash function.

Implementation

The hash function uses different algorithms optimized for each data type:

Numbers (Integers and Floats)

Uses multiplicative hashing with XOR mixing:

  1. Separate integer and fractional parts
  2. Hash integer part:
  3. Hash fractional part:
  4. XOR the parts together with sign handling
  5. Final multiply by and mod by

The use of as intermediate modulo (instead of ) preserves entropy during accumulation, preventing collision clusters.

Inspired by Knuth’s multiplicative hash (prime ) and MurmurHash finalizer patterns.

Strings

Uses FNV-1a inspired algorithm with large intermediate range:

  1. Initialize accumulator to FNV offset basis:
  2. For each character:
    • Multiply accumulator by FNV prime ()
    • Mod by to keep in range
    • XOR with character code
    • Mod by again
  3. Final multiply by and mod by

The two-stage approach ( during accumulation, at finalization) ensures uniform distribution across the output space.

Inspired by FNV-1a hash with modifications for Google Sheets’ numeric constraints.

Design Constraints

Google Sheets imposes several constraints that influenced the design:

  • MOD limitations: MOD(dividend, divisor) fails when
  • BITXOR requires integers: Float operands must be wrapped with int()
  • No bitshift operators: Used division by powers of 2 instead
  • Numeric precision: JavaScript float precision (53-bit mantissa) limits calculation size

These constraints led to the two-stage modulo approach, which balances collision resistance with computational safety.