Some trigonometric functions can be approximated by evaluating a polynomial, which can be faster than the library function if you can live with a restricted argument range. This article is not about deriving the coefficients of these polynomials but about implementing their evaluation.
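
As a rough illustration of what evaluating such a polynomial can look like, here is a minimal sketch using Horner's scheme. The names `horner` and `sin_poly` are hypothetical, and the coefficients are plain Taylor terms chosen only for the example; they are not the coefficients used in the article, and the result is only reasonable for small arguments.

```cpp
#include <cstddef>

// Evaluate c[0] + c[1]*x + ... + c[N-1]*x^(N-1) with Horner's scheme:
// one multiply and one add per coefficient, no explicit powers of x.
// (Hypothetical helper for illustration.)
template <std::size_t N>
float horner(const float (&c)[N], float x) {
    float r = c[N - 1];
    for (std::size_t i = N - 1; i-- > 0;) {
        r = r * x + c[i];
    }
    return r;
}

// Illustrative sine approximation: x * P(x^2), where P holds the Taylor
// terms 1 - x^2/6 + x^4/120 - x^6/5040. Assumption: these are NOT the
// article's coefficients, and accuracy degrades as |x| grows.
inline float sin_poly(float x) {
    static const float c[] = { 1.0f, -1.0f / 6.0f, 1.0f / 120.0f, -1.0f / 5040.0f };
    return x * horner(c, x * x);
}
```

Calling `sin_poly(0.5f)` agrees with `std::sin(0.5f)` to within float rounding, while larger arguments would first need to be reduced into the small range the polynomial was built for.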

Here's a result of porting the packed vector library to ARM NEON compiler intrinsics.

A packed vector library is a library that makes it easier to use the SIMD instruction set extensions (packed vectors) of modern CPUs.
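
To make the idea concrete, here is a minimal sketch of the kind of wrapper such a library might provide. The type `float4` and its members are hypothetical and are not taken from the library described here. The sketch assumes SSE on x86, NEON on ARM, and falls back to plain scalar code elsewhere.

```cpp
#if defined(__SSE__) || defined(_M_X64)
  #include <xmmintrin.h>
  #define PV_SSE 1
#elif defined(__ARM_NEON)
  #include <arm_neon.h>
  #define PV_NEON 1
#endif

// Hypothetical wrapper: four packed floats behind one portable interface.
struct float4 {
#if defined(PV_SSE)
    __m128 v;
    static float4 load(const float* p) { return { _mm_loadu_ps(p) }; }
    void store(float* p) const { _mm_storeu_ps(p, v); }
    friend float4 operator+(float4 a, float4 b) { return { _mm_add_ps(a.v, b.v) }; }
    friend float4 operator*(float4 a, float4 b) { return { _mm_mul_ps(a.v, b.v) }; }
#elif defined(PV_NEON)
    float32x4_t v;
    static float4 load(const float* p) { return { vld1q_f32(p) }; }
    void store(float* p) const { vst1q_f32(p, v); }
    friend float4 operator+(float4 a, float4 b) { return { vaddq_f32(a.v, b.v) }; }
    friend float4 operator*(float4 a, float4 b) { return { vmulq_f32(a.v, b.v) }; }
#else
    // Scalar fallback so the same user code still compiles without SIMD.
    float v[4];
    static float4 load(const float* p) {
        float4 r;
        for (int i = 0; i < 4; ++i) r.v[i] = p[i];
        return r;
    }
    void store(float* p) const {
        for (int i = 0; i < 4; ++i) p[i] = v[i];
    }
    friend float4 operator+(float4 a, float4 b) {
        for (int i = 0; i < 4; ++i) a.v[i] += b.v[i];
        return a;
    }
    friend float4 operator*(float4 a, float4 b) {
        for (int i = 0; i < 4; ++i) a.v[i] *= b.v[i];
        return a;
    }
#endif
};
```

User code then writes `float4::load(a) + float4::load(b)` once, and the wrapper picks the matching intrinsics for the target at compile time.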

There are quite a few packed vector libraries for C++. Most of them don't provide all the functionality I wanted, so I wrote my own.