How to make devices (phones, laptops, desktops) process faster and lag less
Lag is how long your device takes to finish a program's instructions
This post allows all uses.
To reduce lag:
Look for new versions of programs that have more optimizations;
Turn off features (such as shadows or animations) that you do not use;
Replace old hardware parts with ones that use new processes -- such as smaller transistors (extra cores + higher gigahertz/clocks) or hardware accelerators (such as matrix multipliers);
If you use open source (FLOSS), use vectorizers for your particular CPU — SSE (Streaming SIMD Extensions) for Intel/AMD, and NEON for Arm — to recompile your programs and OS;
If you use closed source, look for a version that optimizes for your particular CPU's features/extensions.
Windows comes with mechanisms ( https://wikipedia.org/wiki/WoW64#Performance ) to detect whether you have a 32-bit or 64-bit CPU, and uses the fastest modes.
MSVC (Microsoft's Visual C++ compiler) has auto-vectorizers that let old programs use new CPU extensions/features ( https://devblogs.microsoft.com/cppblog/avx-512-auto-vectorization-in-msvc/ ),
GCC has auto-vectorizers ( https://gcc.gnu.org/projects/tree-ssa/vectorization.html ),
Clang/LLVM has auto-vectorizers ( https://llvm.org/devmtg/2014-02/slides/golin-AutoVectorizationLLVM.pdf )
Intel Integrated Performance Primitives (for Windows and Linux) uses cpuid to detect various CPU extensions as you launch programs, and chooses ( https://www.reddit.com/r/simd/comments/6h9xy8/different_simd_codepaths_chosen_at_runtime_based/ ) the fastest code paths,
Solaris uses cpuid to detect CPU features to speed up standard functions ( https://device.report/m/69d161d7d4feb288dac686a25cf8ee8019426789f720e509abc6c64558e4edc7 ) and has features for programmers to use CPU extensions ( https://stackoverflow.com/questions/53210019/override-hwcap-2-in-mapfile-on-solaris-x86-platforms )
C++'s "syntactic sugar" (classes, templates, and the STL) reduces code size,
plus this sugar gives compilers more room to optimize sources:
https://stackoverflow.com/questions/13676172/optimization-expectations-in-the-stl
https://devblogs.microsoft.com/cppblog/algorithm-optimizations-advanced-stl-part-2/
Linux allows you to recompile the OS to use the fastest extensions ( https://www.reddit.com/r/linux/comments/53p4uu/compiling_glibc_with_cpuspecific_optimizations/ ),
GCC/LLVM/Clang allow you to recompile your software to use the fastest extensions.
Arch Linux was produced to minimize the effort to recompile all of your packages (what Linux calls programs) with CPU optimizations: https://bbs.archlinux.org/viewtopic.php?id=48957
New versions of Linux come with Multiarch, which is similar to Windows' WoW64: https://stackoverflow.com/questions/40343790/how-does-a-64b-linux-knows-to-manage-a-32b-application https://help.ubuntu.com/community/MultiArch
New versions of GCC/LLVM/Clang use cpuid to run the fastest code paths.
Auto-parallelization produces threaded code (takes sources with lots of loops, splits the load across all local CPUs):
https://wikipedia.org/wiki/Automatic_parallelization
https://www.intel.com/content/www/us/en/developer/articles/technical/automatic-parallelization-with-intel-compilers.html “Adding the -Qparallel (Windows*) or -parallel (Linux* or macOS*) option to the compile command is the only action required of the programmer. However, successful parallelization is subject to certain conditions”
https://gcc.gnu.org/wiki/AutoParInGCC (gcc or g++) "You can trigger it by 2 flags -floop-parallelize-all -ftree-parallelize-loops=4"
https://polly.llvm.org/docs/UsingPollyWithClang.html "To automatically detect parallel loops and generate OpenMP code for them you also need to add -mllvm -polly-parallel -lgomp to your CFLAGS. clang -O3 -mllvm -polly -mllvm -polly-parallel -lgomp file.c"
https://link.springer.com/chapter/10.1007/978-3-030-64616-5_38 "LLVM Based Parallelization of C Programs for GPU"
https://stackoverflow.com/questions/41553533/auto-parallelization-of-simple-do-loop-memory-reference-too-complex (parallelizes Fortran)
Whereas the code above vectorizes (or parallelizes across local CPUs/GPUs),
TensorFlow's "MapReduce" ( https://www.tensorflow.org/federated/api_docs/python/tff/backends/mapreduce ) parallelizes across clouds of CPUs/GPUs.
The sort of software (programs) which gets the most benefit from MapReduce is artificial CNS (central nervous systems, i.e. large neural networks), such as: