RFC 105: Add and use safe path manipulation functions
Author: |
Even Rouault |
Contact: |
even.rouault @ spatialys.com |
Started: |
2025-01-11 |
Status: |
Adopted, implemented in 3.11 |
Target: |
GDAL 3.11 |
Summary
This RFC adds safe versions, for use by C++ code, of all functions of cpl_path.cpp (such as CPLGetPath(), CPLGetDirname(), CPLGetBasename(), CPLGetExtension(), CPLFormFilename(), CPLFormCIFilename(), etc.), that returns a result stored in more or less ephemeral storage, to avoid potential security issues related to their mis-use. It also covers converting most of the code base to the safer alternatives.
Motivation
Above mentioned functions return a const char *
string, which is owned by
a thread-local rotating buffer of 10 strings of up to 2048 bytes each, managed
by the CPLGetStaticResult()
function of port/cpl_path.cpp
The current functions are very easy to use, allowing to do things like the following, without caring about freeing temporary buffers:
const char *pszDirname = CPLGetDirname(pszFilename);
const char *pszBasename = CPLGetBasename(pszFilename);
const char *pszSidecarFile = CPLFormFilename(pszDirname, pszBasename, "bin");
But this ease of use doesn't come risk-free, given that there is only room for 10 strings for a given thread. So if doing the following:
const char* pszExt = CPLGetExtension(mainFile);
for (int i = 0; i < 10; ++i )
{
do_something(CPLGetExtension(auxiliaryFiles[i]));
}
do_something_else(pszExt);
when reaching the last line, pszExt
is no longer an alias of
mainFile
, but of CPLGetExtension(auxiliaryFiles[0]). This will in the best
cases "just" cause malfunction, in bad cases, reading outside the bounds of
buffers, and perhaps even worse when unlucky.
This risk of such mis-use has been identified for long, and the documentation of those functions states that the result they return is short-lived, and that it must be stored in a longer-living storage when not immediately used.
Thus, patterns like the following is the recommended practice:
const std::string osExt = CPLGetExtension(mainFile);
for (int i = 0; i < 10; ++i )
{
do_something(std::string(CPLGetExtension(auxiliaryFiles[i])).c_str());
}
do_something_else(osExt.c_str());
However, experience has shown that it is extremely easy to forget that, and over the years, this has caused several heap buffer overflows bugs detected by oss-fuzz, and probably latent ones still uncovered. The later in date is https://issues.oss-fuzz.com/issues/388868487 , fixed by https://github.com/OSGeo/gdal/pull/11638 , and shows how subtle those bugs can be. In that instance, the issue is due to complex chains of calls where we end up using more than 10 instances of the thread-local storage.
Implementation
Rather to react on a case by case when fuzzers found issues, we want a systematic approach to suppress their root case:
Start from the implementation of the CPLXXXX() functions, copy them to C++ CPLXXXXSafe() functions that return a
std::string
instead of aconst char *
, and modify them to no longer use thread-local rotating buffer of strings, but rely on std::string natural concatenation routines instead of the convoluted C concatenation logic, resulting in clearer code.make the now superseded CPLXXXX() functions call the Safe versions, and store the result in the thread-local rotating buffer of strings. So those functions will mostly behave as their current implementation
That is a simple as:
const char *CPLGetPath(const char *pszFilename) { return CPLPathReturnTLSString(CPLGetPathSafe(pszFilename), __FUNCTION__); }
deprecate the use of the legacy CPLXXXX() functions in C++ code and replace their use by the use of the new CPLXXXXSafe() functions.
Most of the unsafe calls can be automatically replaced by the safer alternatives
for patterns like the following ones some_func(CPLGetBasename(x))
, to be replaced by
some_func(CPLGetBasenameSafe(x).c_str())
, using sed
.
Patterns like variable_name = CPLGetBasename(x)
will however require manual
intervention. If variable_name is a std::string
or CPLString
, then
replacing by variable_name = CPLGetBasenameSafe(x)
is appropriate. If
variable_name is a const char*
, case-by-case analysis has to be done,
to either change its type to std::string / CPLString, or create a temporary
std::string, and have variable_name be assigned to std_string_temp.c_str().
All in all, during development of the candidate implementation, several hundreds manual replacements have been made.
A #define
based protection will prevent any GDAL C++ code from accidentally
re-using the unsafe functions.
Although most call sites can benefit from the safe alternatives, we cannot remove completely the legacy functions,
because they are used by C code. For such code, immediately storing the result with CPLStrdup() is the best alternative (and remembering to CPLFree())
there is at least one known external code base (MapServer) that use some of the legacy functions, although it uses them from a C++ source code file, and thus could eventually migrate to the CPLXXXXSafe() functions when they are released.
Impact on performance
There might be a theoretical impact on performance due to dynamic memory
allocation done by temporary std::string
allocations, although normally
those uses occur in the identification and open part of drivers, and not in
performance critical loops. As the identification part is still critical, and
its main use if to get the filename extension, we extend the GDALOpenInfo class,
so it stores the filename extension as a member variable, which will save tens
of drivers from the need of calling CPLGetExtensionSafe(), as well as a
convenience GDALOpenInfo::IsExtensionEqualToCI(const char* pszExtension) method.
So, all in all, the performance impact of those changes is thought/hoped to be
in the hardly measurable category.
Risks
Due to the amount of changes, in particular for replacements that could not be automated, there is a non-zero risk of causing regressions, in particular in drivers with poor coverage of our regression test suite.
Backward compatibility
No impact
Testing
Existing autotest suite will cover changes.
Documentation
No impact
Funding
Funded by GDAL Sponsorship Program (GSP).
Voting history
+1 from PSC members HowardB, JukkaR, KurtS, JavierJS, DanB and EvenR