RFC 110: Just-In-Time (JIT) compilation of expressions (on hold, NOT adopted)
Author: |
Even Rouault |
Contact: |
even.rouault @ spatialys.com |
Started: |
2025-11-25 |
Status: |
Draft, on hold |
Target: |
GDAL 3.13 |
Warning
This RFC has been put on hold. Its candidate implementation has NOT been merged into the official GDAL source code.
Summary
This RFC introduces the optional use of the CLang and LLVM embedded libraries to provide Just-In-Time (JIT) / on-the-fly compilation of code, and its use to speed-up evaluation of expressions, as used in VRTDerivedBand and gdal raster calc.
Motivation
Currently our expression evaluation engines are interpreters, and thus do not benefit from potential optimizations that a compiler can bring. By using the on-the-fly compilation capabilities of CLang and LLVM, and in particular their auto-vectorization optimizations when iterating over arrays, we can benefit from the instruction set of the running CPU, in particular SIMD instructions, such as SSE/AVX on Intel or Neon on ARM CPUs.
Technical details
A new LLVM expression dialect is added. The aim is to keep it synchronized as
much as possible to the MuParser dialect, but acknowledging there are subtle
differences in some numeric results, particularly when Float32 is used.
The LLVM/CLang dependency is optional, and does not make any assumption on which compiler is used to build GDAL itself. It is perfectly fine to build GDAL with GCC or Microsoft Visual C++ compiler.
The libraries are reasonably easy to find from packaging systems. Below a few examples from the changes done in our Continuous Integration / Docker images
Platform |
Build Requirements |
Execution Requirements |
|---|---|---|
Ubuntu Linux |
|
|
Alpine Linux |
|
|
Fedora Linux |
|
|
Conda-Forge |
|
|
New files
The following files are added:
cmake/modules/packages/FindLLVM.cmake:
CMake Find module for LLVM and CLang. Borrowed from another project (BSD-2 License) with modifications to also look for the clang-cpp libraries, in addition to the LLVM libraries.
gcore/gdal_jit_cpp.h:
Public GDAL C++ API to compile C code and get a function pointer for an entry point in that C code. The API does not leak LLVM/CLang object.
/** Returns an executable function from the provided C code. * * @param cCode Valid C code that has a function called functionName and * whose signature must be FunctionSignature. * The C code must not use any \#include statement. * @param functionName Entry point in the C code * @param papszOptions NULL-terminated list of options, or NULL. Unused for now. * @param[out] posDisassembledCode Pointer to a string that must receive the * disassembly of the compiled code, or nullptr * if not useful. * @param[out] pbHasVeclib Pointer to a boolean to indicate if a math vector lib * has been found, or nullptr if not useful. * @return a std::function of signature FunctionSignature corresponding to the * entry point in the C code (may be invalid in case of error.) * @since 3.13 */ template <typename FunctionSignature> std::function<FunctionSignature> CPL_DLL GDALGetJITFunction( const std::string &cCode, const std::string &functionName, CSLConstList papszOptions = nullptr, std::string *posDisassembledCode = nullptr, bool *pbHasVeclib = nullptr);
Note that we return a std::function and not a C callback because the code compiled by LLVM is owned by LLVM objects, and so we need to capture them in the lambda we return as the std::function, so as to keep the underlying C callable code valid as long as the std::function instance is valid.
To be noted that the compilation process does not involve any access to on-disk files (verified with the
straceutility). Only in-memory buffers are created.There is also a function to indicate if JIT capabilities are present:
/** Return which JIT engines are available. * * At time of writing, the return value may be an empty vector or a vector * with "LLVM". * * @since 3.13 */ std::vector<std::string> CPL_DLL GDALGetJITEngines();
That header file is installed, as it may be useful to other projects.
gcore/gdal_jit.cpp:
Implementation of the above header, using libLLVM and libclang-cpp C++ semi-public/semi-private/unstable APIs. That implementation has been verified to build and work for all versions from LLVM 14 up to LLVM 22dev. There are a few #ifdef to account for breaking changes in the LLVM API, but not that much.
gcore/gdal_c_expr.h:
Definition of a C++ class
GDAL_c_expr_nodeand astd::unique_ptr<GDAL_c_expr_node> CPL_DLL GDAL_c_expr_compile(const char *expr)method that parses a C-like MuParser expression string and return the root node of the abstract syntax tree (AST) of the expression. The parser supports all MuParser operators and functions, as well as ournan,isnan,isnodatacustom extensions.gcore/gdal_c_expr.cpp: Implementation of the above
gcore/gdal_c_expr.y: Bison grammar for a C-like MuParser expression
gcore/gdal_c_expr_parser.cpp/.h:
Files generated from above file, and included in
gdal_c_expr.cpp
Modified files
The ExprPixelFunc function in frmts/vrt/pixelfunctions.cpp, which is
the method used to evaluate MuParser/ExprTK expressions is modified.
When the JIT is present and dialect LLVM is specific,
GDAL_c_expr_compile is used to build an AST from it, and then a
GDALCFunctionGenerator class translates that AST back into a C function
that iterates over all pixels.
For example, give an expression AVG(A) on a 3-band raster, in gdal raster calc
flatten mode, where the compute data type is int, the following C code will
be generated in a in-memory string:
typedef __SIZE_TYPE__ size_t; static inline double my_sum(const int* __restrict const* inArrays, size_t nSources, size_t iCol) { double res = 0; for (size_t iSrc = 0; iSrc < nSources; ++iSrc) res += inArrays[iSrc][iCol]; return res; } double round(double); void computePixels(const int* __restrict const* const inArrays, const double* __restrict const* const auxDoubleArrays, int* __restrict const outArray, size_t nSources) { (void)auxDoubleArrays; for(size_t i = 0; i < nSources; ++i) { outArray[i] = (int)round((my_sum(inArrays, 3, i) * (1.0 / 3))); } }
That C code is then passed to an internal JITCompute method that calls
GDALGetJITFunction() and execute the returned function on the source buffers.
JITCompute maintains a per-thread cache of JIT compiled functions.
Runtime Configuration options
The following (advanced) configuration options are added:
GDAL_JIT_DEBUG=YES/NO (default: NO): when enabled, and
CPL_DEBUGconfiguration option is set toONorGDAL_JIT, debug messages specific to JIT usage, including generated C code and disassembly of the JIT code, will be emitted.GDAL_JIT_USE_VECLIB=YES/NO (default: YES): whether a mathematic library providing vectorized implementations of common transcendental functions,
sin(),cos(),tan(),exp(),log(), etc. can be used, when it is available. Such library is not linked to GDAL itself at GDAL compilation time, but is dynamically loaded by LLVM at runtime.GDAL_JIT_VECLIB_PATH=</path/to/dynamic_library>: path to the dynamic library providing vectorized implementations of transcendental functions. Defaults to the path of the
Accelerate frameworkon Mac OS X,SVMLon Windows, orlibmvec.so.1on Linux systems for Intel or ARM64 architectures, when they use a GNU C library (so not Alpine Linux for example)GDAL_JIT_VECLIB_TYPE=mvec|SVML|Accelerate|...: value supported by Clang -fveclib= option. Must be consistent with the library pointed by GDAL_JIT_VECLIB_PATH. See https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fveclib for more details.
Normally, users should not have to bother with any of them.
Benchmarks
On a Skylake x86_64 CPU, i.e. with AVX2 and Fused multiply add (FMA) extensions:
$ gdal raster resize --size 10000,5000 autotest/gdrivers/data/small_world.tif big_world.tif
Averaging all 3 bands (per pixel), with Byte output:
$ time gdal raster calc big_world.tif big_world_avg.tif \ --flatten --calc "avg(X)" -q --dialect LLVM --overwrite \ --output-data-type Byte real 0m0,283s $ time gdal raster calc big_world.tif big_world_avg.tif \ --flatten --calc "avg(X)" -q --dialect muparser --overwrite \ --output-data-type Byte real 0m1,255s $ time gdal raster calc big_world.tif big_world_mean.tif --dialect builtin \ --flatten --calc "mean" -q --overwrite --output-data-type Byte real 0m0,246sNote
The builtin "mean" method is slightly faster than the JIT'ed "avg(X)", since it has a manual ultra optimized SSE2 implementation.
Computing the maximum minus the minimum of all 3 bands (per pixel):
$ time gdal raster calc big_world.tif big_world_max_minus_min.tif \ --flatten --calc "avg(X)" -q --dialect LLVM --overwrite \ --output-data-type Byte real 0m0,248s $ time gdal raster calc big_world.tif big_world_max_minus_min.tif \ --flatten --calc "avg(X)" -q --dialect muparser --overwrite \ --output-data-type Byte real 0m1,838s
Computing the log10 of each pixel, per band, as Float64:
$ time gdal raster calc big_world.tif big_world_log10.tif \ --calc "log10(X)" -q --dialect LLVM --overwrite real 0m2,850s $ time gdal raster calc big_world.tif big_world_log10.tif \ --calc "log10(X)" -q --dialect muparser --overwrite real 0m5,631s $ time gdal raster calc big_world.tif big_world_log10_builtin.tif --dialect builtin \ --calc "log10" -q --overwrite real 0m6,490s
Computing the log10 of each pixel, per band, as Float32:
$ time gdal raster calc big_world.tif big_world_log10.tif \ --calc "log10(X)" -q --dialect LLVM --overwrite --output-data-type Float32 real 0m1,842s $ time gdal raster calc big_world.tif big_world_log10.tif \ --calc "log10(X)" -q --dialect muparser --overwrite --output-data-type Float32 real 0m4,893s $ time gdal raster calc big_world.tif big_world_log10_builtin.tif --dialect builtin \ --calc "log10" -q --overwrite --output-data-type Float32 real 0m6,032s
Computing the modulus of a complex number from one band with the real part and another one with the imaginary part:
$ time gdal raster calc two_float_bands.tif modulus_jit.tif --flatten \ --calc "sqrt(X[1]^2 + X[2]^2)" -q --overwrite --dialect LLVM --output-data-type Float32 real 0m0,549s $ time gdal raster calc two_float_bands.tif modulus_nojit.tif --flatten \ --calc "sqrt(X[1]^2 + X[2]^2)" -q --overwrite --dialect muparser --output-data-type Float32 real 0m1,582s
Warning
Above examples are best cases where the I/O cost is minimal. When slow I/O is involved (remote datasets and/or compression and/or suboptimal data arrangement), the CPU time can be negligible compared to I/O, and thus this enhancement will not bring any significant improvements.
Backward compatibility
This enhancement is fully backwards compatible.
Security
Executing the function returned by GDALGetJITFunction() can obviously lead
to arbitrary code execution depending on the content of the provided C code.
That said, in the VRTDerivedBand context, we are confident that there is no security
risk because the GDALCFunctionGenerator class composes the C code in a very
controlled way. Only recognized operators and functions are emitted, and none of
them involve risky operations. Furthermore the identifiers (names
referencing input arrays or variables) are never emitted verbatim from the
input but always translated from a controlled set.
Furthermore the impossibility to include headers from the C code provided to
GDALGetJITFunction() avoids the risk that malicious content in an external
file would impact our own code.
Documentation
The new public methods are documented.
The https://gdal.org/en/stable/development/building_from_source.html#cmake-general-configure-options
paragraph will be extended to document the CMake variables needed to configure
with LLVM/CLang support: LLVM_ROOT, LLVM_FIND_VERSION, LLVM_CONFIG_EXECUTABLE, LLVM_CLANG_LIBS
Testing
New tests have been added to test all the operators and functions.
A number of tests in autotest/gdrivers/vrtderived.py and autotest/utilities/test_gdal_raster_calc.py
are run both with MuParser and LLVM dialects.
The following continuous integration configurations are enhanced to add libLLVM/libclang-cpp support to GDAL:
Fedora Rawhide, x86_64 architecture
Alpine Linux, x86_64 architecture
Ubuntu 24.04, x86_64 architecture
MacOSX, arm64 architecture, using Conda-Forge dependencies
Windows, x86_64 architecture, using Conda-Forge dependencies
(added) Ubuntu 25.10, arm64 architecture (to have glibc >= 3.40, which is the minimum version to be able to use libmvec on Linux ARM64)
Docker images
The osgeo/gdal:alpine-normal and osgeo/gdal:ubuntu-full Docker images
are modified to include the libclang-cpp and libLLVM
Potential further enhancements
The approach in the above branch currently makes libclang and libLLVM dependencies
of the GDAL core. The weight of those runtime libraries is typically more than
100 MB. We could potentially use the GDAL plugin architecture, so that people
building GDAL could instead create a gdal_LLVM pseudo-driver that could be
opt-in at installation time, typically provided by a gdal-LLVM sub-package.
The API in gdal_jit_cpp.h is independent of VRT expressions and could
be re-used by other parts of GDAL, or external users, for other purposes.
Funding
Funded by GDAL Sponsorship Program (GSP)
Voting history
TBD