Estrin's method for polynomial evaluation

Synopsis

#include <boost/math/tools/estrin.hpp>

namespace boost { namespace math { namespace tools {

// Advanced interface: Use if you can preallocate a scratch buffer of size (coeffs.size() +1)/2:
// The scratch buffer size is *unchecked* in release compiles, so use at your own risk!
template<typename RandomAccessContainer1, typename RandomAccessContainer2, typename RealOrComplex>
inline auto evaluate_polynomial_estrin(RandomAccessContainer1 const & coeffs, RandomAccessContainer2& scratch, RealOrComplex z);

// Template specialization for std::array, no preallocation is necessary so massive performance improvements are typically observed:
template <typename RealOrComplex1, size_t n, typename RealOrComplex2>
inline RealOrComplex2 evaluate_polynomial_estrin(const std::array<RealOrComplex1, n> &coeffs, RealOrComplex2 z);

}}} // namespaces

Boost.math provided free functions which evaluate polynomials by Estrin's method. Though Estrin's method is not optimal from the standpoint of minimizing arithmetic operations (that claim goes to Horner's method), it nonetheless is well-suited to SIMD pipelines on modern CPUs. For example, on an 2022 M1 Pro, evaluating a double precision polynomial of length N using Estrin's method with scratch space takes 0.28 N nanoseconds for large N, whereas Horner's method takes 1.24 N ns. If you know your polynomial coefficients at compile time and can store them in a std::array, then Estrin's method is unconditionally faster. If the coefficients are computed at runtime, then only for N roughly greater than 90 is Estrin's method better. These numbers are highly dependent on compiler flags and architecture; ensure the compiler is allowed to emit vector instructions and fmas to take full advantage of the benefits of Estrin's method.

To measure the performance on your system, refer to the google benchmark file reporting/performance/estrin_performance.cpp.

Note that Estrin's method is less accurate that Horner's method. Refer to example/estrin_vs_horner_accuracy.cpp for details.

References

Muller, Jean-Michel (2005). Elementary Functions: Algorithms and Implementation (2nd ed.). Birkhäuser. p. 58. ISBN 0-8176-4372-9.