/home/huy/everyday

# 01.13.2022 - Comparing floating-point numbers

Due to rounding errors, most floating-point numbers cannot be represented precisely in the computer. This led to some weird situations like:

0.1 + 0.2 = 0.30000000000000004


Different languages handle this calculation differently.

So, we should not directly compare two floating-point numbers, but use the approximate approach, called epsilon comparison:

approxEqAbs(a, b) =
diff := abs(a - b)
return diff <= ε


Where ε is some acceptable error margin, often called epsilon or machine epsilon. It's the distance between 1 and the next largest floating-point number, or float(1 + ε) != 1.

For example, for IEEE-754 single precision, ε = 2^-23:

            1 = 1.00000000000000000000000
1 + epsilon_m = 1.00000000000000000000001
^                       ^
2^0                    2^(-23)


For double precision, ε = 2^-52:

            1 = 1.0000000000000000000000000000000000000000000000000000
1 + epsilon_m = 1.0000000000000000000000000000000000000000000000000001
^                                                    ^
2^0                                                2^(-52)


It's predefined in many languages, for example:

• Number.EPSILON in JavaScript/TypeScript
• f16_epsilon, f32_epsilon, f64_epsilon, f128_epsilon in Zig
• FLT_EPSILON in C++
• f32::EPSILON, f64::EPSILON in Rust

See Values for standard hardware floating-point arithmetic table on Wikipedia for more details.

The above approximation sounds right, but it only works for small numbers around zero. This won't work correctly for larger numbers because the gap between floats grows larger.

A better approach for this case is to use relative epsilon comparison: Calculate the difference between the two numbers a and b, if the difference is smaller than n% of a, then a and b can be considered equal (assuming a is the larger number of the two numbers).

approxEqRelative(a, b) =
largest := max(a, b)
diff := abs(a - b)
return diff <= largest * ε


In reality, when implementing these methods, we also need to pay attention to some edge cases. Such as, if a and b are zeros or infinities, we should just compare them directly with ==, and if any of them is NaN, the comparison should just return false.

Following is the implementation of approxEqAbs and approxEqRelative in Zig's standard library:

lib/std/math.zig

pub fn approxEqAbs(comptime T: type, x: T, y: T, tolerance: T) bool {
assert(@typeInfo(T) == .Float);
assert(tolerance >= 0);

// Fast path for equal values (and signed zeros and infinites).
if (x == y)
return true;

if (isNan(x) or isNan(y))
return false;

return fabs(x - y) <= tolerance;
}

pub fn approxEqRel(comptime T: type, x: T, y: T, tolerance: T) bool {
assert(@typeInfo(T) == .Float);
assert(tolerance > 0);

// Fast path for equal values (and signed zeros and infinites).
if (x == y)
return true;

if (isNan(x) or isNan(y))
return false;

return fabs(x - y) <= max(fabs(x), fabs(y)) * tolerance;
}


References: