Vector driver in Python implementation tutorial
Added in version 3.1.
Introduction
Since GDAL 3.1, the capability of writing read-only vector drivers in Python has been added. It is strongly advised to read the Vector driver implementation tutorial first, which will give the general principles of how a vector driver works.
This capability does not require the use of the GDAL/OGR SWIG Python bindings (but a vector Python driver may use them.)
Note: per project policies, this is considered as an "experimental" feature and the GDAL project will not accept such Python drivers to be included in the GDAL repository. Drivers aiming at inclusion in GDAL master should priorly be ported to C++. The rationale for this is that:
the correctness of the Python code can mostly be checked at runtime, whereas C++ benefits from static analysis (at compile time, and other checkers).
Python code is executed under the Python Global Interpreter Lock, which makes them not scale.
Not all builds of GDAL have Python available.
Linking mechanism to a Python interpreter
Driver location
Driver filenames must start with gdal_ or ogr_ and have the .py extension. They will be searched in the following directies:
the directory pointed by the
GDAL_PYTHON_DRIVER_PATH
configuration option (there may be several paths separated by : on Unix or ; on Windows)if not defined, the directory pointed by the
GDAL_DRIVER_PATH
configuration option.if not defined, in the directory (hardcoded at compilation time on Unix builds) where native plugins are located.
GDAL does not try to manage Python dependencies that are imported by the driver .py script. It is up to the user to make sure its current Python environment has all required dependencies installed.
Import section
Drivers must have the following import section to load the base classes.
from gdal_python_driver import BaseDriver, BaseDataset, BaseLayer
The gdal_python_driver
module is created dynamically by GDAL and is not
present on the file system.
Metadata section
In the first 1000 lines of the .py file, a number of required and optional KEY=VALUE driver directives must be defined. They are parsed by C++ code, without using the Python interpreter, so it is vital to respect the following constraints:
each declaration must be on a single line, and start with
# gdal: DRIVER_
(space character between sharp character and gdal, and between colon character and DRIVER_)the value must be a literal value of type string (except for # gdal: DRIVER_SUPPORTED_API_VERSION which can accept an array of integers), without expressions, function calls, escape sequences, etc.
strings may be single or double-quoted
The following directives must be declared:
# gdal: DRIVER_NAME = "NAME"
: the short name of the driver# gdal: DRIVER_SUPPORTED_API_VERSION = [1]
: the API version(s) supported by the driver. Must include 1, which is the only currently supported version in GDAL 3.1# gdal: DRIVER_DCAP_VECTOR = "YES"
: declares a vector driver# gdal: DRIVER_DMD_LONGNAME = "a longer name of the driver"
Additional directives:
# gdal: DRIVER_DMD_EXTENSIONS = "ext1 ext2"
: list of extension(s) recognized by the driver, without the dot, and separated by space# gdal: DRIVER_DMD_HELPTOPIC = "https://example.com/my_help.html"
: URL to a help page for the driver# gdal: DRIVER_DMD_OPENOPTIONLIST = "<OpenOptionList><Option name='OPT1' type='boolean' description='bla' default='NO'/></OpenOptionList>"
where the XML is anOptionOptionList
.and all other metadata items found in gdal.h starting with
GDAL_DMD_
orGDAL_DCAP
by creating an item name which starts with# gdal: DRIVER_
and the value of theGDAL_DMD_
orGDAL_DCAP
metadata item. For example#define GDAL_DMD_CONNECTION_PREFIX "DMD_CONNECTION_PREFIX"
becomes# gdal: DRIVER_DMD_CONNECTION_PREFIX
Example:
# gdal: DRIVER_NAME = "DUMMY"
# gdal: DRIVER_SUPPORTED_API_VERSION = [1]
# gdal: DRIVER_DCAP_VECTOR = "YES"
# gdal: DRIVER_DMD_LONGNAME = "my dummy plugin"
# gdal: DRIVER_DMD_EXTENSIONS = "foo bar"
# gdal: DRIVER_DMD_HELPTOPIC = "https://example.com/my_help.html"
Driver class
The entry point .py script must contains a single class that inherits from
gdal_python_driver.BaseDriver
.
That class must define the following methods:
- identify(self, filename, first_bytes, open_flags, open_options={})
- Parameters:
filename (str) -- File name, or more generally, connection string.
first_bytes (binary) -- First bytes of the file (if it is a file). At least 1024 (if the file has at least 1024 bytes), or more if a native driver in the driver probe sequence has requested more previously.
open_flags (int) -- Open flags. To be ignored for now.
open_options (dict) -- Open options.
- Returns:
True if the file is recognized by the driver, False if not, or -1 if that cannot be known from the first bytes.
- open(self, filename, first_bytes, open_flags, open_options={})
- Parameters:
filename (str) -- File name, or more generally, connection string.
first_bytes (binary) -- First bytes of the file (if it is a file). At least 1024 (if the file has at least 1024 bytes), or more if a native driver in the driver probe sequence has requested more previously.
open_flags (int) -- Open flags. To be ignored for now.
open_options (dict) -- Open options.
- Returns:
an object deriving from gdal_python_driver.BaseDataset or None
Example:
# Required: class deriving from BaseDriver
class Driver(BaseDriver):
def identify(self, filename, first_bytes, open_flags, open_options={}):
return filename == 'DUMMY:'
# Required
def open(self, filename, first_bytes, open_flags, open_options={}):
if not self.identify(filename, first_bytes, open_flags):
return None
return Dataset(filename)
Dataset class
The Driver.open() method on success should return an object from a class that
inherits from gdal_python_driver.BaseDataset
.
Layers
The role of this object is to store vector layers. There are two implementation
options. If the number of layers is small or they are fast to construct, then
the __init__
method can defined a layers
attribute that is a sequence of
objects from a class that inherits from gdal_python_driver.BaseLayer
.
Example:
class Dataset(BaseDataset):
def __init__(self, filename):
self.layers = [Layer(filename)]
Otherwise, the following two methods should be defined:
- layer_count(self)
- Returns:
the number of layers
- layer(self, idx)
- Parameters:
idx (int) -- Index of the layer to return. Normally between 0 and self.layer_count() - 1, but calling code might pass any value. In case of invalid index, None should be returned.
- Returns:
an object deriving from gdal_python_driver.BaseLayer or None. The C++ code will take care of caching that object, and this method will only be called once for a given idx value.
Example:
class Dataset(BaseDataset):
def layer_count(self):
return 1
def layer(self, idx):
return [Layer(self.filename)] if idx = 0 else None
Metadata
The dataset may define a metadata
dictionary, in __init__
of
key: value of type string, for the default metadata domain.
Alternatively, the following method may be implemented.
- metadata(self, domain)
- Parameters:
domain (str) -- metadata domain. Empty string for the default one
- Returns:
None, or a dictionary of key:value pairs of type string;
Other methods
The following method may be optionally implemented:
- close(self)
Called at the destruction of the C++ peer GDALDataset object. Useful to close database connections for example.
Layer class
The Dataset object will instantiate one or several objects from a class that
inherits from gdal_python_driver.BaseLayer
.
Metadata, and other definitions
The following attributes are required and must defined at __init__ time:
- name
Layer name, of type string. If not set, a
name
method must be defined.
- fields
Sequence of field definitions (may be empty). Each field is a dictionary with the following properties:
- name
Required
- type
A integer value of type ogr.OFT_ (from the SWIG Python bindings), or one of the following string values:
String
,Integer
,Integer16
,Integer64
,Boolean
,Real
,Float
,Binary
,Date
,Time
,DateTime
If that attribute is not set, a
fields
method must be defined and return such a sequence.
- geometry_fields
Sequence of geometry field definitions (may be empty). Each field is a dictionary with the following properties:
- name
Required. May be empty
- type
Required. A integer value of type ogr.wkb_ (from the SWIG Python bindings), or one of the following string values:
Unknown
,Point
,LineString
,Polygon
,MultiPoint
,MultiLineString
,MultiPolygon
,GeometryCollections
or all other values returned byOGRGeometryTypeToName()
- srs
The SRS attached to the geometry field as a string that can be ingested by
OGRSpatialReference::SetFromUserInput()
, such as a PROJ string, WKT string, orAUTHORITY:CODE
.
If that attribute is not set, a
geometry_fields
method must be defined and return such a sequence.
The following attributes are optional:
- fid_name
Feature ID column name, of type string. May be empty string. If not set, a
fid_name
method may be defined.
- metadata
A dictionary of key: value strings, corresponding to metadata of the default metadata domain. Alternatively, a
metadata
method that accepts a domain argument may be defined.
- iterator_honour_attribute_filter
Can be set to True if the feature iterator takes into account the
attribute_filter
attribute that can be set on the layer.
- iterator_honour_spatial_filter
Can be set to True if the feature iterator takes into account the
spatial_filter
attribute that can be set on the layer.
- feature_count_honour_attribute_filter
Can be set to True if the feature_count method takes into account the
attribute_filter
attribute that can be set on the layer.
- feature_count_honour_spatial_filter
Can be set to True if the feature_count method takes into account the
spatial_filter
attribute that can be set on the layer.
Feature iterator
The Layer class must implement the iterator interface, so typically with
a __iter__
method.
The resulting iterator must produce dictionaries for each feature's content. The keys allowed in the returned dictionary are:
- id
Strongly recommended. The value must be an integer to be recognized as a FID.
- type
Required. The value must be the string
"OGRFeature"
- fields
Required. The value must be either a dictionary whose keys are field names; or None
- geometry_fields
Required. the value must be a dictionary whose keys are geometry field names (possibly the empty string for unnamed geometry columns); or None.
The value of each key must be either a geometry encoded as a WKT string; a geometry encoded as ISO WKB as a bytes-like object; or None.
- style
Optional. The value must be a string conforming to the Feature Style Specification.
Filtering
By default, any attribute or spatial filter set by the user of the OGR API will be evaluated by the generic C++ side of the driver, by iterating over all features of the layer.
If the iterator_honour_attribute_filter
(resp. iterator_honour_spatial_filter
)
attribute of the layer object is set to True
, the attribute filter (resp.
spatial filter) must be honoured by the feature iterator method.
The attribute filter is set in the attribute_filter
attribute of the
layer object. It is a string conforming to OGR SQL.
When the attribute filter is changed by the OGR API, the attribute_filter_changed
optional method is called (see below paragraph about optional methods).
An implementation of attribute_filter_changed
may decide to fallback on
evaluation by the generic C++ side of the driver by calling the SetAttributeFilter
method (see below passthrough example)
The geometry filter is set in the spatial_filter
attribute of the
layer object. It is a string encoding as ISO WKT. It is the responsibility of
the user of the OGR API to express it in the CRS of the layer.
When the attribute filter is changed by the OGR API, the spatial_filter_changed
optional method is called (see below paragraph about optional methods).
An implementation of spatial_filter_changed
may decide to fallback on
evaluation by the generic C++ side of the driver by calling the SetSpatialFilter
method (see below passthrough example)
Optional methods
The following methods may be optionally implemented:
- extent(self, force_computation)
- Returns:
the list [xmin,ymin,xmax,ymax] with the spatial extent of the layer.
- feature_count(self, force_computation)
- Returns:
the number of features of the layer.
If self.feature_count_honour_attribute_filter or self.feature_count_honour_spatial_filter are set to True, the attribute filter and/or spatial filter must be honoured by this method.
- feature_by_id(self, fid)
- Parameters:
fid (int) -- feature ID
- Returns:
a feature object in one of the formats of the
__next__
method described above, or None if no object matches fid
- attribute_filter_changed(self)
This method is called whenever self.attribute_filter has been changed. It is the opportunity for the driver to potentially change the value of self.iterator_honour_attribute_filter or feature_count_honour_attribute_filter attributes.
- spatial_filter_changed(self)
This method is called whenever self.spatial_filter has been changed (its value is a geometry encoded in WKT) It is the opportunity for the driver to potentially change the value of self.iterator_honour_spatial_filter or feature_count_honour_spatial_filter attributes.
- test_capability(self, cap)
- Parameters:
string (cap) -- potential values are BaseLayer.FastGetExtent, BaseLayer.FastSpatialFilter, BaseLayer.FastFeatureCount, BaseLayer.RandomRead, BaseLayer.StringsAsUTF8 or other strings supported by
OGRLayer::TestCapability()
- Returns:
True if the capability is supported, False otherwise.
Full example
The following example is a passthrough driver that forwards the calls to the
SWIG Python GDAL API. It has no practical use, and is just intended to show
case most possible uses of the API. A real-world driver will only use part of the
API demonstrated. For example, the passthrough driver implements attribute and
spatial filters in a completely dummy way, by calling back the C++ part of the
driver. The iterator_honour_attribute_filter
and iterator_honour_spatial_filter
attributes, and the attribute_filter_changed
and spatial_filter_changed
method implementations, could have omitted with the same result.
The connection strings recognized by the drivers are
PASSHTROUGH:connection_string_supported_by_non_python_drivers
. Note that
the prefixing by the driver name is absolutely not a requirement, but something
specific to this particular driver which is a bit artificial (without the prefix,
the connection string would go directly to the native driver). The CityJSON
driver mentioned in the Other examples paragraph does
not need it.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# This code is in the public domain, so as to serve as a template for
# real-world plugins.
# or, at the choice of the licensee,
# Copyright 2019 Even Rouault
# SPDX-License-Identifier: MIT
# gdal: DRIVER_NAME = "PASSTHROUGH"
# API version(s) supported. Must include 1 currently
# gdal: DRIVER_SUPPORTED_API_VERSION = [1]
# gdal: DRIVER_DCAP_VECTOR = "YES"
# gdal: DRIVER_DMD_LONGNAME = "Passthrough driver"
# gdal: DRIVER_DMD_CONNECTION_PREFIX = "PASSTHROUGH:"
from osgeo import gdal, ogr
from gdal_python_driver import BaseDriver, BaseDataset, BaseLayer
class Layer(BaseLayer):
def __init__(self, gdal_layer):
self.gdal_layer = gdal_layer
self.name = gdal_layer.GetName()
self.fid_name = gdal_layer.GetFIDColumn()
self.metadata = gdal_layer.GetMetadata_Dict()
self.iterator_honour_attribute_filter = True
self.iterator_honour_spatial_filter = True
self.feature_count_honour_attribute_filter = True
self.feature_count_honour_spatial_filter = True
def fields(self):
res = []
layer_defn = self.gdal_layer.GetLayerDefn()
for i in range(layer_defn.GetFieldCount()):
ogr_field_def = layer_defn.GetFieldDefn(i)
field_def = {"name": ogr_field_def.GetName(),
"type": ogr_field_def.GetType()}
res.append(field_def)
return res
def geometry_fields(self):
res = []
layer_defn = self.gdal_layer.GetLayerDefn()
for i in range(layer_defn.GetGeomFieldCount()):
ogr_field_def = layer_defn.GetGeomFieldDefn(i)
field_def = {"name": ogr_field_def.GetName(),
"type": ogr_field_def.GetType()}
srs = ogr_field_def.GetSpatialRef()
if srs:
field_def["srs"] = srs.ExportToWkt()
res.append(field_def)
return res
def test_capability(self, cap):
if cap in (BaseLayer.FastGetExtent, BaseLayer.StringsAsUTF8,
BaseLayer.RandomRead, BaseLayer.FastFeatureCount):
return self.gdal_layer.TestCapability(cap)
return False
def extent(self, force_computation):
# Impedance mismatch between SWIG GetExtent() and the Python
# driver API
minx, maxx, miny, maxy = self.gdal_layer.GetExtent(force_computation)
return [minx, miny, maxx, maxy]
def feature_count(self, force_computation):
# Dummy implementation: we call back the generic C++ implementation
return self.gdal_layer.GetFeatureCount(True)
def attribute_filter_changed(self):
# Dummy implementation: we call back the generic C++ implementation
if self.attribute_filter:
self.gdal_layer.SetAttributeFilter(str(self.attribute_filter))
else:
self.gdal_layer.SetAttributeFilter(None)
def spatial_filter_changed(self):
# Dummy implementation: we call back the generic C++ implementation
# the 'inf' test is just for a test_ogrsf oddity
if self.spatial_filter and 'inf' not in self.spatial_filter:
self.gdal_layer.SetSpatialFilter(
ogr.CreateGeometryFromWkt(self.spatial_filter))
else:
self.gdal_layer.SetSpatialFilter(None)
def _translate_feature(self, ogr_f):
fields = {}
layer_defn = ogr_f.GetDefnRef()
for i in range(ogr_f.GetFieldCount()):
if ogr_f.IsFieldSet(i):
fields[layer_defn.GetFieldDefn(i).GetName()] = ogr_f.GetField(i)
geom_fields = {}
for i in range(ogr_f.GetGeomFieldCount()):
g = ogr_f.GetGeomFieldRef(i)
if g:
geom_fields[layer_defn.GetGeomFieldDefn(
i).GetName()] = g.ExportToIsoWKb()
return {'id': ogr_f.GetFID(),
'type': 'OGRFeature',
'style': ogr_f.GetStyleString(),
'fields': fields,
'geometry_fields': geom_fields}
def __iter__(self):
for f in self.gdal_layer:
yield self._translate_feature(f)
def feature_by_id(self, fid):
ogr_f = self.gdal_layer.GetFeature(fid)
if not ogr_f:
return None
return self._translate_feature(ogr_f)
class Dataset(BaseDataset):
def __init__(self, gdal_ds):
self.gdal_ds = gdal_ds
self.layers = [Layer(gdal_ds.GetLayer(idx))
for idx in range(gdal_ds.GetLayerCount())]
self.metadata = gdal_ds.GetMetadata_Dict()
def close(self):
del self.gdal_ds
self.gdal_ds = None
class Driver(BaseDriver):
def _identify(self, filename):
prefix = 'PASSTHROUGH:'
if not filename.startswith(prefix):
return None
return gdal.OpenEx(filename[len(prefix):], gdal.OF_VECTOR)
def identify(self, filename, first_bytes, open_flags, open_options={}):
return self._identify(filename) is not None
def open(self, filename, first_bytes, open_flags, open_options={}):
gdal_ds = self._identify(filename)
if not gdal_ds:
return None
return Dataset(gdal_ds)
Other examples
Other examples, including a CityJSON driver, may be found at https://github.com/OSGeo/gdal/tree/master/examples/pydrivers