
cpl_string.h File Reference

#include "cpl_vsi.h"
#include "cpl_error.h"
#include "cpl_conv.h"
#include <string>


Classes

class  CPLString
 Convenient string class based on std::string. More...
class  CPLStringList
 String list class designed around our use of C "char**" string lists. More...

Functions

int CSLCount (char **papszStrList)
void CSLDestroy (char **papszStrList)
char ** CSLDuplicate (char **papszStrList)
char ** CSLMerge (char **papszOrig, char **papszOverride)
 Merge two lists.
char ** CSLTokenizeString2 (const char *pszString, const char *pszDelimeter, int nCSLTFlags)
char ** CSLLoad (const char *pszFname)
char ** CSLLoad2 (const char *pszFname, int nMaxLines, int nMaxCols, char **papszOptions)
int CSLFindString (char **, const char *)
int CSLPartialFindString (char **papszHaystack, const char *pszNeedle)
int CSLFindName (char **papszStrList, const char *pszName)
int CSLTestBoolean (const char *pszValue)
const char * CPLParseNameValue (const char *pszNameValue, char **ppszKey)
char ** CSLSetNameValue (char **papszStrList, const char *pszName, const char *pszValue)
void CSLSetNameValueSeparator (char **papszStrList, const char *pszSeparator)
char * CPLEscapeString (const char *pszString, int nLength, int nScheme)
char * CPLUnescapeString (const char *pszString, int *pnLength, int nScheme)
char * CPLBinaryToHex (int nBytes, const GByte *pabyData)
GByte * CPLHexToBinary (const char *pszHex, int *pnBytes)
CPLValueType CPLGetValueType (const char *pszValue)
size_t CPLStrlcpy (char *pszDest, const char *pszSrc, size_t nDestSize)
size_t CPLStrlcat (char *pszDest, const char *pszSrc, size_t nDestSize)
size_t CPLStrnlen (const char *pszStr, size_t nMaxLen)
int CPLEncodingCharSize (const char *pszEncoding)
char * CPLRecode (const char *pszSource, const char *pszSrcEncoding, const char *pszDstEncoding)
char * CPLRecodeFromWChar (const wchar_t *pwszSource, const char *pszSrcEncoding, const char *pszDstEncoding)
wchar_t * CPLRecodeToWChar (const char *pszSource, const char *pszSrcEncoding, const char *pszDstEncoding)
int CPLIsUTF8 (const char *pabyData, int nLen)
char * CPLForceToASCII (const char *pabyData, int nLen, char chReplacementChar)
int CPLStrlenUTF8 (const char *pszUTF8Str)
CPLString CPLURLGetValue (const char *pszURL, const char *pszKey)
CPLString CPLURLAddKVP (const char *pszURL, const char *pszKey, const char *pszValue)

Detailed Description

Various convenience functions for working with strings and string lists.

A StringList is just an array of strings with the last pointer being NULL. An empty StringList may be either a NULL pointer, or a pointer to a pointer memory location with a NULL value.
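As a rough illustration of the layout described above, the following standalone sketch walks a NULL-terminated array of strings. The helper name list_count is hypothetical and only mirrors what CSLCount() is documented to do; it is not the library's implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for CSLCount(): counts the entries that precede
 * the terminating NULL pointer. */
static int list_count(char **list)
{
    int n = 0;

    if (list == NULL)          /* a NULL pointer is a valid empty list */
        return 0;
    while (list[n] != NULL)    /* stop at the NULL terminator */
        n++;
    return n;
}
```

Both a NULL pointer and an array whose first element is NULL count as empty lists here, matching the two forms of empty StringList described above.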

A common convention for StringLists is to use them to store name/value lists. In this case the contents are treated like a dictionary of name/value pairs. The actual data is formatted with each string having the format "<name>:<value>" (though "=" is also an acceptable separator). A number of the functions in the file operate on name/value style string lists (such as CSLSetNameValue(), and CSLFetchNameValue()).

To some extent, the CPLStringList C++ class can be used to abstract the management of string lists while still allowing them to be returned from C functions.


Function Documentation

char* CPLBinaryToHex ( int  nBytes,
const GByte *  pabyData 
)

Binary to hexadecimal translation.

Parameters:
nBytes number of bytes of binary data in pabyData.
pabyData array of data bytes to translate.
Returns:
hexadecimal translation, zero terminated. Free with CPLFree().
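The translation can be pictured with a small standalone sketch. The helper name bytes_to_hex is hypothetical, the use of uppercase digits is an assumption of this sketch, and the result is released with free() here rather than CPLFree() because the sketch does not link against GDAL.

```c
#include <stdlib.h>

/* Hypothetical helper, not GDAL's implementation: writes two hex digits
 * per input byte followed by a terminating zero byte. */
static char *bytes_to_hex(int nbytes, const unsigned char *data)
{
    static const char digits[] = "0123456789ABCDEF"; /* uppercase is assumed */
    char *hex = (char *)malloc((size_t)nbytes * 2 + 1);
    int i;

    if (hex == NULL)
        return NULL;
    for (i = 0; i < nbytes; i++) {
        hex[2 * i]     = digits[data[i] >> 4];    /* high nibble */
        hex[2 * i + 1] = digits[data[i] & 0x0F];  /* low nibble */
    }
    hex[2 * nbytes] = '\0';
    return hex;
}
```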
int CPLEncodingCharSize ( const char *  pszEncoding  ) 

Return bytes per character for encoding.

This function returns the size in bytes of the smallest character in this encoding. For fixed width encodings (ASCII, UCS-2, UCS-4) this is straightforward. For encodings like UTF8 and UTF16, which represent some characters as a sequence of atomic character sizes, the function still returns the atomic character size (1 for UTF8, 2 for UTF16).

This function will return the correct value for well known encodings with corresponding CPL_ENC_ values. It may not return the correct value for other encodings even if they are supported by the underlying iconv or windows transliteration services. Hopefully it will improve over time.

Parameters:
pszEncoding the name of the encoding.
Returns:
the size of a minimal character in bytes or -1 if the size is unknown.
char* CPLEscapeString ( const char *  pszInput,
int  nLength,
int  nScheme 
)

Apply escaping to string to preserve special characters.

This function will "escape" a variety of special characters to make the string suitable to embed within a string constant or to write within a text stream, but in a form that can be reconstituted to its original form. The escaping will even preserve zero bytes, allowing preservation of raw binary data.

CPLES_BackslashQuotable(0): This scheme turns a binary string into a form suitable to be placed within double quotes as a string constant. The backslash, quote, '\0' and newline characters are all escaped in the usual C style.

CPLES_XML(1): This scheme converts the '<', '>', '"' and '&' characters into their XML/HTML equivalents (&lt;, &gt;, &quot; and &amp;), making a string safe to embed as CDATA within an XML element. The '\0' is not escaped and should not be included in the input.

CPLES_URL(2): Everything except alphanumerics and the underscore are converted to a percent followed by a two digit hex encoding of the character (leading zero supplied if needed). This is the mechanism used for encoding values to be passed in URLs.

CPLES_SQL(3): All single quotes are replaced with two single quotes. Suitable for use when constructing literal values for SQL commands where the literal will be enclosed in single quotes.

CPLES_CSV(4): If the value contains commas, semicolons, tabs, double quotes, or newlines, it is placed in double quotes, and double quotes in the value are doubled. Suitable for use when constructing field values for .csv files. Note that CPLUnescapeString() currently does not support this format, only CPLEscapeString(). See cpl_csv.cpp for csv parsing support.

Parameters:
pszInput the string to escape.
nLength the number of bytes of data to preserve. If this is -1, strlen(pszInput) will be used to compute the length.
nScheme the encoding scheme to use.
Returns:
an escaped, zero terminated string that should be freed with CPLFree() when no longer needed.
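To make the CPLES_URL rule concrete, here is a standalone sketch that follows the description above literally: alphanumerics and the underscore pass through, and every other byte becomes a percent sign plus two hex digits. The name url_escape_sketch is hypothetical, uppercase hex digits are an assumption, and the actual library may treat additional characters as safe.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the CPLES_URL rule as documented above.
 * Not GDAL's implementation. Free the result with free(). */
static char *url_escape_sketch(const char *input)
{
    size_t len = strlen(input);
    /* Worst case: every byte expands to "%XX", plus the terminator. */
    char *out = (char *)malloc(len * 3 + 1);
    size_t i, o = 0;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)input[i];
        if (isalnum(ch) || ch == '_') {
            out[o++] = (char)ch;               /* safe character, copy as is */
        } else {
            sprintf(out + o, "%%%02X", ch);    /* percent plus two hex digits */
            o += 3;
        }
    }
    out[o] = '\0';
    return out;
}
```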
char* CPLForceToASCII ( const char *  pabyData,
int  nLen,
char  chReplacementChar 
)

Return a new string that is made only of ASCII characters. If non-ASCII characters are found in the input string, they will be replaced by the provided replacement character.

Parameters:
pabyData input string to test
nLen length of the input string, or -1 if the function must compute the string length, in which case the string must be null terminated.
chReplacementChar character which will be used when the input stream contains a non-ASCII character. Must be valid ASCII.
Returns:
a new string that must be freed with CPLFree().
Since:
GDAL 1.7.0
CPLValueType CPLGetValueType ( const char *  pszValue  ) 

Detect the type of the value contained in a string: whether it is a real, an integer or a string. Leading and trailing spaces are skipped in the analysis.

Note: in the context of this function, integer must be understood in a broad sense. It does not mean that the value can fit into a 32 bit integer for example. It might be larger.

Parameters:
pszValue the string to analyze
Returns:
returns the type of the value contained in the string.
GByte* CPLHexToBinary ( const char *  pszHex,
int *  pnBytes 
)

Hexadecimal to binary translation.

Parameters:
pszHex the input hex encoded string.
pnBytes the returned count of decoded bytes placed here.
Returns:
returns binary buffer of data - free with CPLFree().
int CPLIsUTF8 ( const char *  pabyData,
int  nLen 
)

Test if a string is encoded as UTF-8.

Parameters:
pabyData input string to test
nLen length of the input string, or -1 if the function must compute the string length, in which case the string must be null terminated.
Returns:
TRUE if the string is encoded as UTF-8, FALSE otherwise.
Since:
GDAL 1.7.0
const char* CPLParseNameValue ( const char *  pszNameValue,
char **  ppszKey 
)

Parse NAME=VALUE string into name and value components.

Note that if ppszKey is non-NULL, the key (or name) portion will be allocated using VSIMalloc(), and returned in that pointer. It is the application's responsibility to free this string, but the application should not modify or free the returned value portion.

This function also supports "NAME:VALUE" strings and will strip white space from around the delimiter when forming name and value strings.

Eventually CSLFetchNameValue() and friends may be modified to use CPLParseNameValue().

Parameters:
pszNameValue string in "NAME=VALUE" format.
ppszKey optional pointer through which to return the name portion.
Returns:
the value portion (pointing into original string).
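The ownership rules above (a newly allocated key, and a value that points into the input) can be sketched in standalone C. The name parse_name_value is hypothetical, the key is allocated with malloc() rather than VSIMalloc(), and returning NULL when no separator is present is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not GDAL's implementation. Splits on the first
 * '=' or ':' and strips spaces and tabs around the separator. */
static const char *parse_name_value(const char *pair, char **key_out)
{
    size_t i;

    for (i = 0; pair[i] != '\0'; i++) {
        if (pair[i] == '=' || pair[i] == ':') {
            const char *value = pair + i + 1;
            while (*value == ' ' || *value == '\t')
                value++;                       /* skip space after separator */
            if (key_out != NULL) {
                size_t klen = i;
                while (klen > 0 &&
                       (pair[klen - 1] == ' ' || pair[klen - 1] == '\t'))
                    klen--;                    /* trim space before separator */
                *key_out = (char *)malloc(klen + 1);
                if (*key_out != NULL) {
                    memcpy(*key_out, pair, klen);
                    (*key_out)[klen] = '\0';
                }
            }
            return value;                      /* points into the input */
        }
    }
    return NULL;                               /* assumed: no separator found */
}
```

The caller frees the key but never the returned value, since the value is only a pointer into the original string.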
char* CPLRecode ( const char *  pszSource,
const char *  pszSrcEncoding,
const char *  pszDstEncoding 
)

Convert a string from a source encoding to a destination encoding.

The only guaranteed supported encodings are CPL_ENC_UTF8, CPL_ENC_ASCII and CPL_ENC_ISO8859_1. Currently, the following conversions are supported:

  • CPL_ENC_ASCII -> CPL_ENC_UTF8 or CPL_ENC_ISO8859_1 (no conversion in fact)
  • CPL_ENC_ISO8859_1 -> CPL_ENC_UTF8
  • CPL_ENC_UTF8 -> CPL_ENC_ISO8859_1

If an error occurs, an error may or may not be posted with CPLError().

Parameters:
pszSource a NULL terminated string.
pszSrcEncoding the source encoding.
pszDstEncoding the destination encoding.
Returns:
a NULL terminated string which should be freed with CPLFree().
Since:
GDAL 1.6.0
char* CPLRecodeFromWChar ( const wchar_t *  pwszSource,
const char *  pszSrcEncoding,
const char *  pszDstEncoding 
)

Convert wchar_t string to UTF-8.

Convert a wchar_t string into a multibyte UTF-8 string. The only guaranteed supported source encoding is CPL_ENC_UCS2, and the only guaranteed supported destination encodings are CPL_ENC_UTF8, CPL_ENC_ASCII and CPL_ENC_ISO8859_1. In some cases (i.e. using iconv()) other encodings may also be supported.

Note that the wchar_t type varies in size on different systems. On win32 it is normally 2 bytes, and on unix 4 bytes.

If an error occurs, an error may or may not be posted with CPLError().

Parameters:
pwszSource the source wchar_t string, terminated with a 0 wchar_t.
pszSrcEncoding the source encoding, typically CPL_ENC_UCS2.
pszDstEncoding the destination encoding, typically CPL_ENC_UTF8.
Returns:
a zero terminated multi-byte string which should be freed with CPLFree(), or NULL if an error occurs.
Since:
GDAL 1.6.0
wchar_t* CPLRecodeToWChar ( const char *  pszSource,
const char *  pszSrcEncoding,
const char *  pszDstEncoding 
)

Convert UTF-8 string to a wchar_t string.

Convert an 8-bit, multi-byte per character input string into a wide character (wchar_t) string. The only guaranteed supported source encodings are CPL_ENC_UTF8, CPL_ENC_ASCII and CPL_ENC_ISO8859_1 (LATIN1). The only guaranteed supported destination encoding is CPL_ENC_UCS2. Other source and destination encodings may be supported depending on the underlying implementation.

Note that the wchar_t type varies in size on different systems. On win32 it is normally 2 bytes, and on unix 4 bytes.

If an error occurs, an error may or may not be posted with CPLError().

Parameters:
pszSource input multi-byte character string.
pszSrcEncoding source encoding, typically CPL_ENC_UTF8.
pszDstEncoding destination encoding, typically CPL_ENC_UCS2.
Returns:
the zero terminated wchar_t string (to be freed with CPLFree()) or NULL on error.
Since:
GDAL 1.6.0
size_t CPLStrlcat ( char *  pszDest,
const char *  pszSrc,
size_t  nDestSize 
)

Appends a source string to a destination buffer.

This function ensures that the destination buffer is always NUL terminated (provided that its length is at least 1 and that there is at least one byte free in pszDest, that is to say strlen(pszDest_before) < nDestSize).

This function is designed to be a safer, more consistent, and less error prone replacement for strncat. Its contract is identical to libbsd's strlcat.

Truncation can be detected by testing if the return value of CPLStrlcat is greater than or equal to nDestSize.

char szDest[5];
CPLStrlcpy(szDest, "ab", sizeof(szDest));
if (CPLStrlcat(szDest, "cde", sizeof(szDest)) >= sizeof(szDest))
    fprintf(stderr, "truncation occurred!\n");
Parameters:
pszDest destination buffer. Must be NUL terminated before running CPLStrlcat
pszSrc source string. Must be NUL terminated
nDestSize size of destination buffer (including space for the NUL terminator character)
Returns:
the theoretical length of the destination string after concatenation (=strlen(pszDest_before) + strlen(pszSrc)). If strlen(pszDest_before) >= nDestSize, then it returns nDestSize + strlen(pszSrc).
Since:
GDAL 1.7.0
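The return-value rules for CPLStrlcat() can be easier to follow in code. The following standalone sketch implements the strlcat contract as described above; the name strlcat_sketch is hypothetical and this is not the library's source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the documented strlcat contract. */
static size_t strlcat_sketch(char *dest, const char *src, size_t dest_size)
{
    size_t dest_len = 0;
    size_t src_len = strlen(src);
    size_t room, ncopy;

    /* Measure the existing string without reading past the buffer. */
    while (dest_len < dest_size && dest[dest_len] != '\0')
        dest_len++;

    /* No terminator inside the buffer: nothing can be appended. */
    if (dest_len == dest_size)
        return dest_size + src_len;

    room = dest_size - dest_len - 1;           /* bytes free before the NUL */
    ncopy = src_len < room ? src_len : room;
    memcpy(dest + dest_len, src, ncopy);
    dest[dest_len + ncopy] = '\0';

    /* Length the result would have had with an unlimited buffer. */
    return dest_len + src_len;
}
```

With a 5-byte buffer holding "ab", appending "cde" leaves "abcd" in the buffer and returns 5, which is not less than the buffer size and therefore signals truncation.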
size_t CPLStrlcpy ( char *  pszDest,
const char *  pszSrc,
size_t  nDestSize 
)

Copy source string to a destination buffer.

This function ensures that the destination buffer is always NUL terminated (provided that its length is at least 1).

This function is designed to be a safer, more consistent, and less error prone replacement for strncpy. Its contract is identical to libbsd's strlcpy.

Truncation can be detected by testing if the return value of CPLStrlcpy is greater than or equal to nDestSize.

char szDest[5];
if (CPLStrlcpy(szDest, "abcde", sizeof(szDest)) >= sizeof(szDest))
    fprintf(stderr, "truncation occurred!\n");
Parameters:
pszDest destination buffer
pszSrc source string. Must be NUL terminated
nDestSize size of destination buffer (including space for the NUL terminator character)
Returns:
the length of the source string (=strlen(pszSrc))
Since:
GDAL 1.7.0
int CPLStrlenUTF8 ( const char *  pszUTF8Str  ) 

Return the number of UTF-8 characters of a nul-terminated string.

This is different from strlen() which returns the number of bytes.

Parameters:
pszUTF8Str a nul-terminated UTF-8 string
Returns:
the number of UTF-8 characters.
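The difference from strlen() can be shown with a standalone sketch that counts only the bytes that start a character, skipping UTF-8 continuation bytes (those of the form 10xxxxxx). The name utf8_char_count is hypothetical, and the sketch assumes well-formed UTF-8 input.

```c
#include <stddef.h>

/* Hypothetical sketch: counts characters in a NUL-terminated UTF-8 string
 * by skipping continuation bytes. Assumes the input is valid UTF-8. */
static int utf8_char_count(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)  /* not a continuation byte */
            n++;
    }
    return n;
}
```

For a string containing one two-byte character among four ASCII letters, strlen() reports 6 bytes while this count reports 5 characters.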
size_t CPLStrnlen ( const char *  pszStr,
size_t  nMaxLen 
)

Returns the length of a NUL terminated string by reading at most the specified number of bytes.

The CPLStrnlen() function returns MIN(strlen(pszStr), nMaxLen). Only the first nMaxLen bytes of the string will be read. Useful to test if a string contains at least nMaxLen characters without reading the full string up to the NUL terminating character.

Parameters:
pszStr a NUL terminated string
nMaxLen maximum number of bytes to read in pszStr
Returns:
strlen(pszStr) if the length is less than nMaxLen, otherwise nMaxLen if the NUL character has not been found in the first nMaxLen bytes.
Since:
GDAL 1.7.0
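The bounded scan described for CPLStrnlen() can be sketched as follows. The name strnlen_sketch is hypothetical; the point is only that the loop never reads past the first nMaxLen bytes.

```c
#include <stddef.h>

/* Hypothetical sketch of MIN(strlen(s), max_len) computed without
 * reading more than max_len bytes. */
static size_t strnlen_sketch(const char *s, size_t max_len)
{
    size_t n = 0;

    while (n < max_len && s[n] != '\0')
        n++;
    return n;
}
```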
char* CPLUnescapeString ( const char *  pszInput,
int *  pnLength,
int  nScheme 
)

Unescape a string.

This function does the opposite of CPLEscapeString(). Given a string with special values escaped according to some scheme, it will return a new copy of the string returned to its original form.

Parameters:
pszInput the input string. This is a zero terminated string.
pnLength location to return the length of the unescaped string, which may in some cases include embedded '\0' characters.
nScheme the escaped scheme to undo (see CPLEscapeString() for a list).
Returns:
a copy of the unescaped string that should be freed by the application using CPLFree() when no longer needed.
CPLString CPLURLAddKVP ( const char *  pszURL,
const char *  pszKey,
const char *  pszValue 
)

Return a new URL with a new key=value pair.

Parameters:
pszURL the URL.
pszKey the key to find.
pszValue the value of the key (may be NULL to unset an existing KVP).
Returns:
the modified URL.
Since:
GDAL 1.9.0

References CPLURLAddKVP(), and CPLString::ifind().

Referenced by CPLURLAddKVP().

CPLString CPLURLGetValue ( const char *  pszURL,
const char *  pszKey 
)

Return the value matching a key from a key=value pair in a URL.

Parameters:
pszURL the URL.
pszKey the key to find.
Returns:
the value, or an empty string if not found.
Since:
GDAL 1.9.0

References CPLURLGetValue(), and CPLString::ifind().

Referenced by CPLURLGetValue().

int CSLCount ( char **  papszStrList  ) 

Return number of items in a string list.

Returns the number of items in a string list, not counting the terminating NULL. Passing in NULL is safe, and will result in a count of zero.

Lists are counted by iterating through them so long lists will take more time than short lists. Care should be taken to avoid using CSLCount() as an end condition for loops as it will result in O(n^2) behavior.

Parameters:
papszStrList the string list to count.
Returns:
the number of entries.
void CSLDestroy ( char **  papszStrList  ) 

Free string list.

Frees the passed string list (null terminated array of strings). It is safe to pass NULL.

Parameters:
papszStrList the list to free.
char** CSLDuplicate ( char **  papszStrList  ) 

Clone a string list.

Efficiently allocates a copy of a string list. The returned list is owned by the caller and should be freed with CSLDestroy().

Parameters:
papszStrList the input string list.
Returns:
newly allocated copy.
int CSLFindName ( char **  papszStrList,
const char *  pszName 
)

Find StringList entry with given key name.

Parameters:
papszStrList the string list to search.
pszName the key value to look for (case insensitive).
Returns:
-1 on failure or the list index of the first occurrence matching the given key.
int CSLFindString ( char **  papszList,
const char *  pszTarget 
)

Find a string within a string list.

Returns the index of the entry in the string list that contains the target string. The string in the string list must be a full match for the target, but the search is case insensitive.

Parameters:
papszList the string list to be searched.
pszTarget the string to be searched for.
Returns:
the index of the string within the list or -1 on failure.
char** CSLLoad ( const char *  pszFname  ) 

Load a text file into a string list.

The VSI*L API is used, so VSIFOpenL() supported objects that aren't physical files can also be accessed. Files are returned as a string list, with one item in the string list per line. End of line markers are stripped (by CPLReadLineL()).

If reading the file fails a CPLError() will be issued and NULL returned.

Parameters:
pszFname the name of the file to read.
Returns:
a string list with the file's lines, now owned by the caller. To be freed with CSLDestroy().
char** CSLLoad2 ( const char *  pszFname,
int  nMaxLines,
int  nMaxCols,
char **  papszOptions 
)

Load a text file into a string list.

The VSI*L API is used, so VSIFOpenL() supported objects that aren't physical files can also be accessed. Files are returned as a string list, with one item in the string list per line. End of line markers are stripped (by CPLReadLineL()).

If reading the file fails a CPLError() will be issued and NULL returned.

Parameters:
pszFname the name of the file to read.
nMaxLines maximum number of lines to read before stopping, or -1 for no limit.
nMaxCols maximum number of characters in a line before stopping, or -1 for no limit.
papszOptions NULL-terminated array of options. Unused for now.
Returns:
a string list with the file's lines, now owned by the caller. To be freed with CSLDestroy().
Since:
GDAL 1.7.0

References VSIFCloseL(), VSIFEofL(), and VSIFOpenL().

char** CSLMerge ( char **  papszOrig,
char **  papszOverride 
)

Merge two lists.

The two lists are merged, ensuring that if any keys appear in both, the value from the second (papszOverride) list takes precedence.

Parameters:
papszOrig the original list, being modified.
papszOverride the list of items being merged in. This list is unaltered and remains owned by the caller.
Returns:
updated list.
int CSLPartialFindString ( char **  papszHaystack,
const char *  pszNeedle 
)

Find a substring within a string list.

Returns the index of the entry in the string list that contains the target string as a substring. The search is case sensitive (unlike CSLFindString()).

Parameters:
papszHaystack the string list to be searched.
pszNeedle the substring to be searched for.
Returns:
the index of the string within the list or -1 on failure.
char** CSLSetNameValue ( char **  papszList,
const char *  pszName,
const char *  pszValue 
)

Assign value to name in StringList.

Set the value for a given name in a StringList of "Name=Value" pairs ("Name:Value" pairs are also supported for backward compatibility with older stuff.)

If there is already a value for that name in the list then the value is changed, otherwise a new "Name=Value" pair is added.

Parameters:
papszList the original list, the modified version is returned.
pszName the name to be assigned a value. This should be a well formed token (no spaces or very special characters).
pszValue the value to assign to the name. This should not contain any newlines (CR or LF) but is otherwise pretty much unconstrained. If NULL any corresponding value will be removed.
Returns:
modified stringlist.
void CSLSetNameValueSeparator ( char **  papszList,
const char *  pszSeparator 
)

Replace the default separator (":" or "=") with the passed separator in the given name/value list.

Note that if a separator other than ":" or "=" is used, the resulting list will not be manipulatable by the CSL name/value functions any more.

The CPLParseNameValue() function is used to break the existing lines, and it also strips white space from around the existing delimiter, thus the old separator and any white space will be replaced by the new separator. For formatting purposes it may be desirable to include some white space in the new separator, e.g. ": " or " = ".

Parameters:
papszList the list to update. Component strings may be freed but the list array will remain at the same location.
pszSeparator the new separator string to insert.
int CSLTestBoolean ( const char *  pszValue  ) 

Test what boolean value is contained in the string.

If pszValue is "NO", "FALSE", "OFF" or "0", FALSE will be returned. Otherwise, TRUE will be returned.

Parameters:
pszValue the string to be tested.
Returns:
TRUE or FALSE.
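The rule above amounts to a short list of "false" words with everything else treated as true. The standalone sketch below illustrates it; the names equal_ci and test_boolean_sketch are hypothetical, and the case-insensitive comparison is an assumption of this sketch that the text above does not state.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: case-insensitive equality of two strings. */
static int equal_ci(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Hypothetical sketch of the documented rule: returns 0 (FALSE) for
 * "NO", "FALSE", "OFF" or "0", and 1 (TRUE) for anything else. */
static int test_boolean_sketch(const char *value)
{
    static const char *const false_words[] = { "NO", "FALSE", "OFF", "0" };
    size_t i;

    for (i = 0; i < sizeof(false_words) / sizeof(false_words[0]); i++) {
        if (equal_ci(value, false_words[i]))
            return 0;
    }
    return 1;
}
```

Note that under this rule any unrecognised text, including an empty string, evaluates to TRUE.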
char** CSLTokenizeString2 ( const char *  pszString,
const char *  pszDelimiters,
int  nCSLTFlags 
)

Tokenize a string.

This function will split a string into tokens based on the specified delimiter(s) with a variety of options. The returned result is a string list that should be freed with CSLDestroy() when no longer needed.

The available parsing options are:

  • CSLT_ALLOWEMPTYTOKENS: allow the return of empty tokens when two delimiters in a row occur with no other text between them. If not set, empty tokens will be discarded;
  • CSLT_STRIPLEADSPACES: strip leading space characters from the token (as reported by isspace());
  • CSLT_STRIPENDSPACES: strip ending space characters from the token (as reported by isspace());
  • CSLT_HONOURSTRINGS: double quotes can be used to hold values that should not be broken into multiple tokens;
  • CSLT_PRESERVEQUOTES: string quotes are carried into the tokens when this is set, otherwise they are removed;
  • CSLT_PRESERVEESCAPES: if set backslash escapes (for backslash itself, and for literal double quotes) will be preserved in the tokens, otherwise the backslashes will be removed in processing.

Example:

Parse a string into tokens based on various white space (space, newline, tab) and then print out results and cleanup. Quotes may be used to hold white space in tokens.

    char **papszTokens;
    int i;

    papszTokens = 
        CSLTokenizeString2( pszCommand, " \t\n", 
                            CSLT_HONOURSTRINGS | CSLT_ALLOWEMPTYTOKENS );

    for( i = 0; papszTokens != NULL && papszTokens[i] != NULL; i++ )
        printf( "arg %d: '%s'", i, papszTokens[i] );
    CSLDestroy( papszTokens );
Parameters:
pszString the string to be split into tokens.
pszDelimiters one or more characters to be used as token delimiters.
nCSLTFlags an ORing of one or more of the CSLT_ flag values.
Returns:
a string list of tokens owned by the caller.

References CPLStringList::AddString(), CPLStringList::Assign(), CPLStringList::Count(), and CPLStringList::StealList().


Generated for GDAL by doxygen 1.7.1.