Vector driver in Python implementation tutorial

Added in version 3.1.

Introduction

GDAL 3.1 adds the ability to write read-only vector drivers in Python. It is strongly advised to first read the Vector driver implementation tutorial, which gives the general principles of how a vector driver works.

This capability does not require the GDAL/OGR SWIG Python bindings, although a vector Python driver may use them.

Note: per project policies, this is considered an "experimental" feature, and the GDAL project will not accept such Python drivers for inclusion in the GDAL repository. Drivers aiming at inclusion in GDAL master should first be ported to C++. The rationale is that:

  • the correctness of Python code can mostly be checked only at runtime, whereas C++ benefits from static analysis (at compile time, and through other checkers).

  • Python code is executed under the Python Global Interpreter Lock, which prevents it from scaling across threads.

  • Not all builds of GDAL have Python available.

Linking mechanism to a Python interpreter

See Linking mechanism to a Python interpreter

Driver location

Driver filenames must start with gdal_ or ogr_ and have the .py extension. They are searched for in the following directories:

  • the directory pointed to by the GDAL_PYTHON_DRIVER_PATH configuration option (there may be several paths, separated by : on Unix or ; on Windows)

  • if not defined, the directory pointed to by the GDAL_DRIVER_PATH configuration option.

  • if not defined, the directory (hardcoded at compilation time on Unix builds) where native plugins are located.

GDAL does not try to manage the Python dependencies imported by the driver .py script. It is up to users to make sure their current Python environment has all required dependencies installed.
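
The lookup order above can be approximated in Python as follows. This is only a sketch for reasoning about where a script will be found, not GDAL's actual C++ logic: the function name candidate_driver_files is hypothetical, configuration options are read from an environment-like mapping only, both options are split on the path separator (an assumption for GDAL_DRIVER_PATH), and the fallback to the compiled-in plugin directory is left out because its location depends on the build.

```python
import os


def candidate_driver_files(env=None):
    """Return the .py files matching the naming rule, in search-path order.

    Hypothetical helper: GDAL_PYTHON_DRIVER_PATH is tried first, then
    GDAL_DRIVER_PATH. The compiled-in plugin directory is not covered.
    """
    env = os.environ if env is None else env
    search_path = env.get("GDAL_PYTHON_DRIVER_PATH") or env.get("GDAL_DRIVER_PATH")
    if not search_path:
        return []
    found = []
    # os.pathsep is ':' on Unix and ';' on Windows
    for directory in search_path.split(os.pathsep):
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            # Driver filenames must start with gdal_ or ogr_ and end in .py
            if name.startswith(("gdal_", "ogr_")) and name.endswith(".py"):
                found.append(os.path.join(directory, name))
    return found
```

Calling candidate_driver_files() with no argument inspects the current process environment, which can help when a driver script is unexpectedly not picked up.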

Import section

Drivers must have the following import section to load the base classes.

from gdal_python_driver import BaseDriver, BaseDataset, BaseLayer

The gdal_python_driver module is created dynamically by GDAL and is not present on the file system.

Metadata section

In the first 1000 lines of the .py file, a number of required and optional KEY=VALUE driver directives must be defined. They are parsed by C++ code, without using the Python interpreter, so it is vital to respect the following constraints:

  • each declaration must be on a single line, and start with # gdal: DRIVER_ (space character between sharp character and gdal, and between colon character and DRIVER_)

  • the value must be a literal value of type string (except for # gdal: DRIVER_SUPPORTED_API_VERSION which can accept an array of integers), without expressions, function calls, escape sequences, etc.

  • strings may be single or double-quoted

The following directives must be declared:

  • # gdal: DRIVER_NAME = "NAME": the short name of the driver

  • # gdal: DRIVER_SUPPORTED_API_VERSION = [1]: the API version(s) supported by the driver. Must include 1, which is the only currently supported version in GDAL 3.1

  • # gdal: DRIVER_DCAP_VECTOR = "YES": declares a vector driver

  • # gdal: DRIVER_DMD_LONGNAME = "a longer name of the driver"

Additional directives:

  • # gdal: DRIVER_DMD_EXTENSIONS = "ext1 ext2": list of extension(s) recognized by the driver, without the dot, and separated by space

  • # gdal: DRIVER_DMD_HELPTOPIC = "https://example.com/my_help.html": URL to a help page for the driver

  • # gdal: DRIVER_DMD_OPENOPTIONLIST = "<OpenOptionList><Option name='OPT1' type='boolean' description='bla' default='NO'/></OpenOptionList>" where the XML is an OpenOptionList.

  • and all other metadata items found in gdal.h starting with GDAL_DMD_ or GDAL_DCAP_, by creating an item name that starts with # gdal: DRIVER_ followed by the value of the GDAL_DMD_ or GDAL_DCAP_ metadata item. For example #define GDAL_DMD_CONNECTION_PREFIX "DMD_CONNECTION_PREFIX" becomes # gdal: DRIVER_DMD_CONNECTION_PREFIX

Example:

# gdal: DRIVER_NAME = "DUMMY"
# gdal: DRIVER_SUPPORTED_API_VERSION = [1]
# gdal: DRIVER_DCAP_VECTOR = "YES"
# gdal: DRIVER_DMD_LONGNAME = "my dummy plugin"
# gdal: DRIVER_DMD_EXTENSIONS = "foo bar"
# gdal: DRIVER_DMD_HELPTOPIC = "https://example.com/my_help.html"
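
Because the directives are parsed without the Python interpreter, a malformed line is easy to miss. The following sketch can serve as a quick pre-flight check of a driver script. It is an approximation and not GDAL's C++ parser: the names read_directives and missing_required are hypothetical, and ast.literal_eval is more permissive than the constraints listed above (for instance, it accepts escape sequences).

```python
import ast
import re

# One directive per line: "# gdal: DRIVER_<KEY> = <literal>"
_DIRECTIVE = re.compile(r"^# gdal: DRIVER_(\w+)\s*=\s*(.+?)\s*$")


def read_directives(source, max_lines=1000):
    """Collect the driver directives found in the first max_lines lines."""
    directives = {}
    for line in source.splitlines()[:max_lines]:
        match = _DIRECTIVE.match(line)
        if match:
            # literal_eval accepts quoted strings and lists of integers
            directives[match.group(1)] = ast.literal_eval(match.group(2))
    return directives


def missing_required(directives):
    """Return the required directive keys that are absent."""
    required = ("NAME", "SUPPORTED_API_VERSION", "DCAP_VECTOR", "DMD_LONGNAME")
    return [key for key in required if key not in directives]


sample = '''# gdal: DRIVER_NAME = "DUMMY"
# gdal: DRIVER_SUPPORTED_API_VERSION = [1]
# gdal: DRIVER_DCAP_VECTOR = "YES"
'''
parsed = read_directives(sample)
print(parsed)
print("missing:", missing_required(parsed))
```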

Driver class

The entry point .py script must contain a single class that inherits from gdal_python_driver.BaseDriver.

That class must define the following methods:

identify(self, filename, first_bytes, open_flags, open_options={})
Parameters:
  • filename (str) -- File name, or more generally, connection string.

  • first_bytes (binary) -- First bytes of the file (if it is a file). At least 1024 (if the file has at least 1024 bytes), or more if a native driver in the driver probe sequence has requested more previously.

  • open_flags (int) -- Open flags. To be ignored for now.

  • open_options (dict) -- Open options.

Returns:

True if the file is recognized by the driver, False if not, or -1 if that cannot be known from the first bytes.

open(self, filename, first_bytes, open_flags, open_options={})
Parameters:
  • filename (str) -- File name, or more generally, connection string.

  • first_bytes (binary) -- First bytes of the file (if it is a file). At least 1024 (if the file has at least 1024 bytes), or more if a native driver in the driver probe sequence has requested more previously.

  • open_flags (int) -- Open flags. To be ignored for now.

  • open_options (dict) -- Open options.

Returns:

an object deriving from gdal_python_driver.BaseDataset or None

Example:

# Required: class deriving from BaseDriver
class Driver(BaseDriver):

    def identify(self, filename, first_bytes, open_flags, open_options={}):
        return filename == 'DUMMY:'

    # Required
    def open(self, filename, first_bytes, open_flags, open_options={}):
        if not self.identify(filename, first_bytes, open_flags):
            return None
        return Dataset(filename)
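
For file-based formats, identify() typically inspects first_bytes rather than the file name. The variant below sketches the three possible return values. The MYFMT signature and the MYFMT: connection prefix are hypothetical, and the fallback classes exist only so that the sketch can be run outside GDAL, where the gdal_python_driver module is not available.

```python
try:
    from gdal_python_driver import BaseDriver, BaseDataset
except ImportError:
    # Stand-ins so the sketch can be exercised outside GDAL
    class BaseDriver:
        pass

    class BaseDataset:
        pass

MAGIC = b"MYFMT"  # hypothetical file signature


class Dataset(BaseDataset):

    def __init__(self, filename):
        self.filename = filename
        self.layers = []


class Driver(BaseDriver):

    def identify(self, filename, first_bytes, open_flags, open_options={}):
        if not first_bytes:
            # No bytes to inspect (e.g. a connection string): cannot tell
            return -1
        return first_bytes.startswith(MAGIC)

    def open(self, filename, first_bytes, open_flags, open_options={}):
        verdict = self.identify(filename, first_bytes, open_flags, open_options)
        if verdict is False:
            return None
        # -1 means "unknown": do the more thorough check here
        # (in this sketch, a hypothetical connection prefix)
        if verdict == -1 and not filename.startswith("MYFMT:"):
            return None
        return Dataset(filename)
```

Note that -1 is truthy in Python, so a test such as "if not self.identify(...)" lets the "unknown" case proceed to the rest of open(), which must then be able to reject the input itself.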

Dataset class

The Driver.open() method on success should return an object from a class that inherits from gdal_python_driver.BaseDataset.

Layers

The role of this object is to store vector layers. There are two implementation options. If the number of layers is small or they are fast to construct, then the __init__ method can define a layers attribute that is a sequence of objects from a class that inherits from gdal_python_driver.BaseLayer.

Example:

class Dataset(BaseDataset):

    def __init__(self, filename):
        self.layers = [Layer(filename)]

Otherwise, the following two methods should be defined:

layer_count(self)
Returns:

the number of layers

layer(self, idx)
Parameters:

idx (int) -- Index of the layer to return. Normally between 0 and self.layer_count() - 1, but calling code might pass any value. In case of invalid index, None should be returned.

Returns:

an object deriving from gdal_python_driver.BaseLayer or None. The C++ code will take care of caching that object, and this method will only be called once for a given idx value.

Example:

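class Dataset(BaseDataset):

    def __init__(self, filename):
        # Keep the connection string so that layers can be built on demand
        self.filename = filename

    def layer_count(self):
        return 1

    def layer(self, idx):
        # Return a single Layer object (not a list), or None for an invalid index
        return Layer(self.filename) if idx == 0 else None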

Metadata

The dataset may define, in __init__, a metadata attribute holding a dictionary of key: value pairs of type string for the default metadata domain. Alternatively, the following method may be implemented.

metadata(self, domain)
Parameters:

domain (str) -- metadata domain. Empty string for the default one

Returns:

None, or a dictionary of key: value pairs of type string.

Other methods

The following method may be optionally implemented:

close(self)

Called at the destruction of the C++ peer GDALDataset object. Useful to close database connections for example.
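
As an illustration of the metadata() and close() methods, here is a minimal sketch of a dataset backed by a database connection. The use of sqlite3, the db_path parameter and the metadata keys are assumptions made for the example, and the fallback BaseDataset class only exists so that the code can be run outside GDAL.

```python
import sqlite3

try:
    from gdal_python_driver import BaseDataset
except ImportError:
    class BaseDataset:  # stand-in so the sketch runs outside GDAL
        pass


class Dataset(BaseDataset):

    def __init__(self, db_path):
        self.layers = []
        self.conn = sqlite3.connect(db_path)

    def metadata(self, domain):
        # Only the default domain (empty string) has content here
        if domain == "":
            # Values must be strings, hence the str() conversion
            return {"SOURCE": "sqlite", "LAYER_COUNT": str(len(self.layers))}
        return None

    def close(self):
        # Release the connection when the C++ peer is destroyed
        if self.conn is not None:
            self.conn.close()
            self.conn = None
```

Guarding close() against a second call is a defensive choice of this sketch, not a documented requirement.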

Layer class

The Dataset object will instantiate one or several objects from a class that inherits from gdal_python_driver.BaseLayer.

Metadata, and other definitions

The following attributes are required and must be defined at __init__ time:

name

Layer name, of type string. If not set, a name method must be defined.

fields

Sequence of field definitions (may be empty). Each field is a dictionary with the following properties:

name

Required

type

An integer value of type ogr.OFT_ (from the SWIG Python bindings), or one of the following string values: String, Integer, Integer16, Integer64, Boolean, Real, Float, Binary, Date, Time, DateTime

If that attribute is not set, a fields method must be defined and return such a sequence.

geometry_fields

Sequence of geometry field definitions (may be empty). Each field is a dictionary with the following properties:

name

Required. May be empty

type

Required. An integer value of type ogr.wkb_ (from the SWIG Python bindings), or one of the following string values: Unknown, Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection or all other values returned by OGRGeometryTypeToName()

srs

The SRS attached to the geometry field as a string that can be ingested by OGRSpatialReference::SetFromUserInput(), such as a PROJ string, WKT string, or AUTHORITY:CODE.

If that attribute is not set, a geometry_fields method must be defined and return such a sequence.

The following attributes are optional:

fid_name

Feature ID column name, of type string. May be empty string. If not set, a fid_name method may be defined.

metadata

A dictionary of key: value strings, corresponding to metadata of the default metadata domain. Alternatively, a metadata method that accepts a domain argument may be defined.

iterator_honour_attribute_filter

Can be set to True if the feature iterator takes into account the attribute_filter attribute that can be set on the layer.

iterator_honour_spatial_filter

Can be set to True if the feature iterator takes into account the spatial_filter attribute that can be set on the layer.

feature_count_honour_attribute_filter

Can be set to True if the feature_count method takes into account the attribute_filter attribute that can be set on the layer.

feature_count_honour_spatial_filter

Can be set to True if the feature_count method takes into account the spatial_filter attribute that can be set on the layer.
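
Put together, a layer that declares everything through attributes could look like the sketch below. The layer name, field names, SRS and metadata values are made up for the example, and the fallback BaseLayer class only exists so that the code can be run outside GDAL.

```python
try:
    from gdal_python_driver import BaseLayer
except ImportError:
    class BaseLayer:  # stand-in so the sketch runs outside GDAL
        pass


class Layer(BaseLayer):

    def __init__(self):
        # Required attributes
        self.name = "places"
        self.fields = [
            {"name": "label", "type": "String"},
            {"name": "population", "type": "Integer64"},
        ]
        self.geometry_fields = [
            {"name": "geom", "type": "Point", "srs": "EPSG:4326"},
        ]
        # Optional attributes
        self.fid_name = "id"
        self.metadata = {"DESCRIPTION": "hypothetical sample layer"}
        # The *_honour_* attributes are not set: filters are then left to
        # the generic C++ side of the driver
```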

Feature iterator

The Layer class must implement the iterator interface, typically with an __iter__ method.

The resulting iterator must produce dictionaries for each feature's content. The keys allowed in the returned dictionary are:

id

Strongly recommended. The value must be an integer to be recognized as a FID.

type

Required. The value must be the string "OGRFeature"

fields

Required. The value must be either a dictionary whose keys are field names; or None

geometry_fields

Required. The value must be a dictionary whose keys are geometry field names (possibly the empty string for unnamed geometry columns); or None.

The value of each key must be either a geometry encoded as a WKT string; a geometry encoded as ISO WKB as a bytes-like object; or None.

style

Optional. The value must be a string conforming to the Feature Style Specification.
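
A minimal sketch of such an iterator, written as a generator, is shown below. The in-memory rows and field names are invented for the example, geometries are passed as WKT strings, and the fallback BaseLayer class only exists so that the code can be run outside GDAL.

```python
try:
    from gdal_python_driver import BaseLayer
except ImportError:
    class BaseLayer:  # stand-in so the sketch runs outside GDAL
        pass


class Layer(BaseLayer):

    def __init__(self):
        self.name = "places"
        self.fields = [{"name": "label", "type": "String"}]
        self.geometry_fields = [{"name": "geom", "type": "Point"}]
        # Hypothetical data source: (fid, label, x, y)
        self._rows = [(1, "Paris", 2.35, 48.85), (2, "Rome", 12.5, 41.9)]

    def __iter__(self):
        # A generator function satisfies the iterator interface
        for fid, label, x, y in self._rows:
            yield {
                "type": "OGRFeature",
                "id": fid,
                "fields": {"label": label},
                "geometry_fields": {"geom": f"POINT ({x} {y})"},
            }
```

Because __iter__ is a generator function, each new iteration over the layer restarts from the first row.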

Filtering

By default, any attribute or spatial filter set by the user of the OGR API will be evaluated by the generic C++ side of the driver, by iterating over all features of the layer.

If the iterator_honour_attribute_filter (resp. iterator_honour_spatial_filter) attribute of the layer object is set to True, the attribute filter (resp. spatial filter) must be honoured by the feature iterator method.

The attribute filter is set in the attribute_filter attribute of the layer object. It is a string conforming to OGR SQL. When the attribute filter is changed by the OGR API, the attribute_filter_changed optional method is called (see the paragraph on optional methods below). An implementation of attribute_filter_changed may decide to fall back on evaluation by the generic C++ side of the driver by calling the SetAttributeFilter method (see the passthrough example below).

The spatial filter is set in the spatial_filter attribute of the layer object. It is a string encoded as ISO WKT. It is the responsibility of the user of the OGR API to express it in the CRS of the layer. When the spatial filter is changed by the OGR API, the spatial_filter_changed optional method is called (see the paragraph on optional methods below). An implementation of spatial_filter_changed may decide to fall back on evaluation by the generic C++ side of the driver by calling the SetSpatialFilter method (see the passthrough example below).
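
One possible pattern, sketched below under stated assumptions, is to honour only the filters that the driver knows how to evaluate and to leave everything else to the generic C++ side. Here the layer only handles filters of the form level = <integer>; the field name, the regular expression and the in-memory rows are hypothetical, and the fallback BaseLayer class only exists so that the code can be run outside GDAL.

```python
import re

try:
    from gdal_python_driver import BaseLayer
except ImportError:
    class BaseLayer:  # stand-in so the sketch runs outside GDAL
        pass

# Only filters of the form "level = <integer>" are handled in Python
_SIMPLE_FILTER = re.compile(r"^\s*level\s*=\s*(-?\d+)\s*$", re.IGNORECASE)


class Layer(BaseLayer):

    def __init__(self):
        self.name = "roads"
        self.fields = [{"name": "level", "type": "Integer"}]
        self.geometry_fields = []
        self.iterator_honour_attribute_filter = False
        self._rows = [(1, 1), (2, 2), (3, 1)]  # hypothetical (fid, level)
        self._wanted_level = None

    def attribute_filter_changed(self):
        current = getattr(self, "attribute_filter", None) or ""
        match = _SIMPLE_FILTER.match(current)
        self._wanted_level = int(match.group(1)) if match else None
        # Claim the filter only when it has the one form handled here
        self.iterator_honour_attribute_filter = match is not None

    def __iter__(self):
        for fid, level in self._rows:
            if self._wanted_level is not None and level != self._wanted_level:
                continue
            yield {"type": "OGRFeature", "id": fid,
                   "fields": {"level": level}, "geometry_fields": None}
```

When the filter is anything else, the iterator returns all features and iterator_honour_attribute_filter is False, so the filter is evaluated by the generic code as described above.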

Optional methods

The following methods may be optionally implemented:

extent(self, force_computation)
Returns:

the list [xmin,ymin,xmax,ymax] with the spatial extent of the layer.

feature_count(self, force_computation)
Returns:

the number of features of the layer.

If self.feature_count_honour_attribute_filter or self.feature_count_honour_spatial_filter are set to True, the attribute filter and/or spatial filter must be honoured by this method.

feature_by_id(self, fid)
Parameters:

fid (int) -- feature ID

Returns:

a feature object in the same dictionary format as the one produced by the feature iterator described above, or None if no feature matches fid

attribute_filter_changed(self)

This method is called whenever self.attribute_filter has been changed. It is the opportunity for the driver to potentially change the value of self.iterator_honour_attribute_filter or feature_count_honour_attribute_filter attributes.

spatial_filter_changed(self)

This method is called whenever self.spatial_filter has been changed (its value is a geometry encoded in WKT). It is the opportunity for the driver to potentially change the value of self.iterator_honour_spatial_filter or feature_count_honour_spatial_filter attributes.

test_capability(self, cap)
Parameters:

cap (str) -- potential values are BaseLayer.FastGetExtent, BaseLayer.FastSpatialFilter, BaseLayer.FastFeatureCount, BaseLayer.RandomRead, BaseLayer.StringsAsUTF8 or other strings supported by OGRLayer::TestCapability()

Returns:

True if the capability is supported, False otherwise.
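
The optional methods can be sketched on a small in-memory layer as follows. The point data is invented for the example. The fallback BaseLayer class only exists so that the code can be run outside GDAL, and its constants are assumed to carry the same values as the corresponding OGR capability strings.

```python
try:
    from gdal_python_driver import BaseLayer
except ImportError:
    class BaseLayer:
        # Stand-in; values assumed to mirror the OGR capability strings
        FastGetExtent = "FastGetExtent"
        FastSpatialFilter = "FastSpatialFilter"
        FastFeatureCount = "FastFeatureCount"
        RandomRead = "RandomRead"
        StringsAsUTF8 = "StringsAsUTF8"


class Layer(BaseLayer):

    def __init__(self):
        self.name = "places"
        self.fields = []
        self.geometry_fields = [{"name": "geom", "type": "Point"}]
        # Hypothetical data source: fid -> (x, y)
        self._points = {1: (2.35, 48.85), 2: (12.5, 41.9)}

    def _feature(self, fid):
        x, y = self._points[fid]
        return {"type": "OGRFeature", "id": fid, "fields": None,
                "geometry_fields": {"geom": f"POINT ({x} {y})"}}

    def __iter__(self):
        for fid in self._points:
            yield self._feature(fid)

    def extent(self, force_computation):
        xs = [x for x, _ in self._points.values()]
        ys = [y for _, y in self._points.values()]
        return [min(xs), min(ys), max(xs), max(ys)]

    def feature_count(self, force_computation):
        # No filter is honoured here: the feature_count_honour_*
        # attributes are deliberately left unset
        return len(self._points)

    def feature_by_id(self, fid):
        return self._feature(fid) if fid in self._points else None

    def test_capability(self, cap):
        return cap in (BaseLayer.FastGetExtent, BaseLayer.FastFeatureCount,
                       BaseLayer.RandomRead)
```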

Full example

The following example is a passthrough driver that forwards calls to the SWIG Python GDAL API. It has no practical use, and is only intended to showcase most possible uses of the API. A real-world driver will use only part of the API demonstrated. For example, the passthrough driver implements attribute and spatial filters in a completely dummy way, by calling back the C++ part of the driver. The iterator_honour_attribute_filter and iterator_honour_spatial_filter attributes, and the attribute_filter_changed and spatial_filter_changed method implementations, could have been omitted with the same result.

The connection strings recognized by the driver are of the form PASSTHROUGH:connection_string_supported_by_non_python_drivers. Note that prefixing by the driver name is absolutely not a requirement, but something specific to this particular driver, which is a bit artificial (without the prefix, the connection string would go directly to the native driver). The CityJSON driver mentioned in the Other examples paragraph does not need it.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# This code is in the public domain, so as to serve as a template for
# real-world plugins.
# or, at the choice of the licensee,
# Copyright 2019 Even Rouault
# SPDX-License-Identifier: MIT

# gdal: DRIVER_NAME = "PASSTHROUGH"
# API version(s) supported. Must include 1 currently
# gdal: DRIVER_SUPPORTED_API_VERSION = [1]
# gdal: DRIVER_DCAP_VECTOR = "YES"
# gdal: DRIVER_DMD_LONGNAME = "Passthrough driver"
# gdal: DRIVER_DMD_CONNECTION_PREFIX = "PASSTHROUGH:"

from osgeo import gdal, ogr

from gdal_python_driver import BaseDriver, BaseDataset, BaseLayer

class Layer(BaseLayer):

    def __init__(self, gdal_layer):
        self.gdal_layer = gdal_layer
        self.name = gdal_layer.GetName()
        self.fid_name = gdal_layer.GetFIDColumn()
        self.metadata = gdal_layer.GetMetadata_Dict()
        self.iterator_honour_attribute_filter = True
        self.iterator_honour_spatial_filter = True
        self.feature_count_honour_attribute_filter = True
        self.feature_count_honour_spatial_filter = True

    def fields(self):
        res = []
        layer_defn = self.gdal_layer.GetLayerDefn()
        for i in range(layer_defn.GetFieldCount()):
            ogr_field_def = layer_defn.GetFieldDefn(i)
            field_def = {"name": ogr_field_def.GetName(),
                         "type": ogr_field_def.GetType()}
            res.append(field_def)
        return res

    def geometry_fields(self):
        res = []
        layer_defn = self.gdal_layer.GetLayerDefn()
        for i in range(layer_defn.GetGeomFieldCount()):
            ogr_field_def = layer_defn.GetGeomFieldDefn(i)
            field_def = {"name": ogr_field_def.GetName(),
                         "type": ogr_field_def.GetType()}
            srs = ogr_field_def.GetSpatialRef()
            if srs:
                field_def["srs"] = srs.ExportToWkt()
            res.append(field_def)
        return res

    def test_capability(self, cap):
        if cap in (BaseLayer.FastGetExtent, BaseLayer.StringsAsUTF8,
                BaseLayer.RandomRead, BaseLayer.FastFeatureCount):
            return self.gdal_layer.TestCapability(cap)
        return False

    def extent(self, force_computation):
        # Impedance mismatch between SWIG GetExtent() and the Python
        # driver API
        minx, maxx, miny, maxy = self.gdal_layer.GetExtent(force_computation)
        return [minx, miny, maxx, maxy]

    def feature_count(self, force_computation):
        # Dummy implementation: we call back the generic C++ implementation
        return self.gdal_layer.GetFeatureCount(True)

    def attribute_filter_changed(self):
        # Dummy implementation: we call back the generic C++ implementation
        if self.attribute_filter:
            self.gdal_layer.SetAttributeFilter(str(self.attribute_filter))
        else:
            self.gdal_layer.SetAttributeFilter(None)

    def spatial_filter_changed(self):
        # Dummy implementation: we call back the generic C++ implementation
        # the 'inf' test is just for a test_ogrsf oddity
        if self.spatial_filter and 'inf' not in self.spatial_filter:
            self.gdal_layer.SetSpatialFilter(
                ogr.CreateGeometryFromWkt(self.spatial_filter))
        else:
            self.gdal_layer.SetSpatialFilter(None)

    def _translate_feature(self, ogr_f):
        fields = {}
        layer_defn = ogr_f.GetDefnRef()
        for i in range(ogr_f.GetFieldCount()):
            if ogr_f.IsFieldSet(i):
                fields[layer_defn.GetFieldDefn(i).GetName()] = ogr_f.GetField(i)
        geom_fields = {}
        for i in range(ogr_f.GetGeomFieldCount()):
            g = ogr_f.GetGeomFieldRef(i)
            if g:
                geom_fields[layer_defn.GetGeomFieldDefn(
                    i).GetName()] = g.ExportToIsoWkb()
        return {'id': ogr_f.GetFID(),
                'type': 'OGRFeature',
                'style': ogr_f.GetStyleString(),
                'fields': fields,
                'geometry_fields': geom_fields}

    def __iter__(self):
        for f in self.gdal_layer:
            yield self._translate_feature(f)

    def feature_by_id(self, fid):
        ogr_f = self.gdal_layer.GetFeature(fid)
        if not ogr_f:
            return None
        return self._translate_feature(ogr_f)

class Dataset(BaseDataset):

    def __init__(self, gdal_ds):
        self.gdal_ds = gdal_ds
        self.layers = [Layer(gdal_ds.GetLayer(idx))
                    for idx in range(gdal_ds.GetLayerCount())]
        self.metadata = gdal_ds.GetMetadata_Dict()

    def close(self):
        del self.gdal_ds
        self.gdal_ds = None


class Driver(BaseDriver):

    def _identify(self, filename):
        prefix = 'PASSTHROUGH:'
        if not filename.startswith(prefix):
            return None
        return gdal.OpenEx(filename[len(prefix):], gdal.OF_VECTOR)

    def identify(self, filename, first_bytes, open_flags, open_options={}):
        return self._identify(filename) is not None

    def open(self, filename, first_bytes, open_flags, open_options={}):
        gdal_ds = self._identify(filename)
        if not gdal_ds:
            return None
        return Dataset(gdal_ds)

Other examples

Other examples, including a CityJSON driver, may be found at https://github.com/OSGeo/gdal/tree/master/examples/pydrivers