hckrnws
For function multiversioning, the intrinsic headers in both GCC and Clang have logic to take care of selecting targets. You also don't need to dispatch by hand when writing manual optimizations: the same function name with different targets is supported and dispatches automatically.
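For anyone who hasn't seen it, here is a minimal sketch of the automatic-dispatch idea using the target_clones attribute, which GCC and recent Clang accept in plain C. The function name sum_ints and the MULTIVERSION macro are made up for illustration, and the guard is an assumption on my part: target_clones relies on ifunc support, so the sketch only enables it on x86-64 Linux with glibc and falls back to a single ordinary function everywhere else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* on glibc, pulling in a libc header makes __GLIBC__ visible */

/* Assumption: only enable multiversioning where ifunc-based dispatch is
   known to work (x86-64, Linux, glibc, compiler knows the attribute). */
#if defined(__has_attribute)
#  if __has_attribute(target_clones) && defined(__x86_64__) && \
      defined(__linux__) && defined(__GLIBC__)
#    define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#  endif
#endif
#ifndef MULTIVERSION
#  define MULTIVERSION /* fallback: one ordinary version, no dispatch */
#endif

/* Hypothetical example function. Where MULTIVERSION is active, the compiler
   emits an AVX2 clone and a baseline clone, plus a resolver that picks one
   at load time. Callers just call sum_ints(). */
MULTIVERSION
long sum_ints(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Call sites need nothing special: sum_ints(v, n) is all you write. The other form, several definitions of the same name each tagged with a different target attribute, is to my knowledge C++-only in GCC, so check your compiler's docs before relying on it in C.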
Is it actually better/faster though? To see the difference between -O and -O2/-O3, compile some code for an x64 target on Godbolt and look at the output. -O produces optimised x86 code. -O2/-O3 produces enormous amounts of incomprehensible SSE/AVX/whatever code for even the simplest stuff, leading to a huge blowout in code size that can potentially interact badly with caching.
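If you want something concrete to paste into Godbolt, a reduction loop like the sketch below is a reasonable test case. The function name sum_bytes is made up for illustration. Build it once with -O1 and once with -O3 and compare the output. Whether and where the vectorizer kicks in depends on the compiler and version (as far as I know, GCC turns it on at -O3 and, with a more conservative cost model, at -O2 since GCC 12), so treat what you see as specific to that setup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test function: sums n bytes into a 32-bit total.
   A scalar build is a short loop; a vectorized build typically adds
   widening and shuffle instructions plus a scalar tail loop. */
uint32_t sum_bytes(const uint8_t *p, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

Comparing the instruction counts of the two builds gives a rough feel for the code-size difference being described here. Adding -Os as a third build shows what the compiler does when told to favour size.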
We had a look at this in embedded, where you don't have infinite memory to play with. At the moment it's OK because there are no advanced instructions available to use, but it'll get ugly in the future when gcc realises it can use new instructions and produces five times the amount of object code for the same source code.
That is while using C extensions, though, and yes, Microslop would rather have you using C++.
https://herbsutter.com/2012/05/03/reader-qa-what-about-vc-an...
Even if, in the years after that post, they added support for C11 and C17, minus some stuff like aligned allocation (aligned_alloc).
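In practice the aligned allocation gap gets papered over with a small wrapper. A minimal sketch, assuming the alignment is a power of two: the names portable_aligned_alloc and portable_aligned_free are made up for illustration, while _aligned_malloc and _aligned_free are the Microsoft CRT functions and aligned_alloc is the C11 one.

```c
#include <stddef.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>   /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical wrapper. Assumes alignment is a power of two. */
void *portable_aligned_alloc(size_t alignment, size_t size)
{
#if defined(_WIN32)
    /* Note the argument order: size first, then alignment. */
    return _aligned_malloc(size, alignment);
#else
    /* C11 asks for size to be a multiple of alignment, so round up. */
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
#endif
}

/* Memory from _aligned_malloc must go back through _aligned_free;
   memory from aligned_alloc is released with plain free. */
void portable_aligned_free(void *p)
{
#if defined(_WIN32)
    _aligned_free(p);
#else
    free(p);
#endif
}
```

The separate free function is the main design point: on Windows, passing an _aligned_malloc pointer to free is a bug, so the pair has to travel together.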