Vector driver in Python implementation tutorial

New in version 3.1.

Introduction

GDAL 3.1 adds the capability of writing read-only vector drivers in Python. It is strongly advised to read the Vector driver implementation tutorial first, which gives the general principles of how a vector driver works.

This capability does not require the use of the GDAL/OGR SWIG Python bindings (although a Python vector driver may use them).

Note: per project policies, this is considered an “experimental” feature, and the GDAL project will not accept such Python drivers for inclusion in the GDAL repository. Drivers aiming at inclusion in GDAL master should first be ported to C++. The rationale for this is that:

  • the correctness of the Python code can mostly be checked only at runtime, whereas C++ benefits from static analysis (at compile time, and through other checkers).

  • Python code is executed under the Python Global Interpreter Lock, which prevents it from scaling.

  • Not all builds of GDAL have Python available.

Linking mechanism to a Python interpreter

See Linking mechanism to a Python interpreter

Driver location

Driver filenames must start with gdal_ or ogr_ and have the .py extension. They will be searched for in the following directories:

  • the directory pointed to by the GDAL_PYTHON_DRIVER_PATH configuration option (there may be several paths, separated by : on Unix or ; on Windows).

  • if not defined, the directory pointed to by the GDAL_DRIVER_PATH configuration option.

  • if not defined, the directory (hardcoded at compilation time on Unix builds) where native plugins are located.

GDAL does not try to manage Python dependencies that are imported by the driver .py script. It is up to users to make sure their current Python environment has all required dependencies installed.
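
The naming and search rules above can be illustrated with a short sketch. This is not GDAL's implementation (the real lookup is done in C++); the helper names candidate_directories, is_driver_filename and find_driver_scripts are hypothetical, and the hardcoded native plugin directory is left out because its value depends on the build.

```python
import os


def candidate_directories(options):
    """Return the directories to scan, given a dict of configuration options.

    GDAL_PYTHON_DRIVER_PATH takes precedence and may hold several paths
    separated by os.pathsep (':' on Unix, ';' on Windows). Otherwise
    GDAL_DRIVER_PATH is used, treated here as a single directory.
    """
    value = options.get("GDAL_PYTHON_DRIVER_PATH")
    if value:
        return [path for path in value.split(os.pathsep) if path]
    value = options.get("GDAL_DRIVER_PATH")
    if value:
        return [value]
    # The build-time plugin directory is not modelled in this sketch.
    return []


def is_driver_filename(name):
    """True if the file name follows the gdal_*.py / ogr_*.py convention."""
    return name.endswith(".py") and name.startswith(("gdal_", "ogr_"))


def find_driver_scripts(options):
    """List candidate driver scripts found in the configured directories."""
    found = []
    for directory in candidate_directories(options):
        if os.path.isdir(directory):
            for name in sorted(os.listdir(directory)):
                if is_driver_filename(name):
                    found.append(os.path.join(directory, name))
    return found
```

A script named ogr_dummy.py would be picked up by such a scan, whereas dummy.py or ogr_dummy.txt would be ignored.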

Import section

Drivers must have the following import section to load the base classes.

from gdal_python_driver import BaseDriver, BaseDataset, BaseLayer

The gdal_python_driver module is created dynamically by GDAL and is not present on the file system.

Metadata section

In the first 1000 lines of the .py file, a number of required and optional KEY=VALUE driver directives must be defined. They are parsed by C++ code, without using the Python interpreter, so it is vital to respect the following constraints:

  • each declaration must be on a single line, and start with # gdal: DRIVER_ (space character between sharp character and gdal, and between colon character and DRIVER_)

  • the value must be a literal value of type string (except for # gdal: DRIVER_SUPPORTED_API_VERSION which can accept an array of integers), without expressions, function calls, escape sequences, etc.

  • strings may be single or double-quoted

The following directives must be declared:

  • # gdal: DRIVER_NAME = "some_name": the short name of the driver

  • # gdal: DRIVER_SUPPORTED_API_VERSION = [1]: the API version(s) supported by the driver. Must include 1, which is the only currently supported version in GDAL 3.1

  • # gdal: DRIVER_DCAP_VECTOR = "YES": declares a vector driver

  • # gdal: DRIVER_DMD_LONGNAME = "a longer description of the driver"

Additional directives:

  • # gdal: DRIVER_DMD_EXTENSIONS = "ext1 ext2": list of extension(s) recognized by the driver, without the dot, and separated by space

  • # gdal: DRIVER_DMD_HELPTOPIC = "url_to_help_page"

  • # gdal: DRIVER_DMD_OPENOPTIONLIST = xml_value where xml_value is an OpenOptionList specification, like "<OpenOptionList><Option name='OPT1' type='boolean' description='bla' default='NO'/></OpenOptionList>"

  • and all other metadata items found in gdal.h starting with GDAL_DMD_ (resp. GDAL_DCAP_), by creating an item name which starts with # gdal: DRIVER_ followed by the value of the GDAL_DMD_ (resp. GDAL_DCAP_) metadata item. For example, #define GDAL_DMD_CONNECTION_PREFIX "DMD_CONNECTION_PREFIX" becomes # gdal: DRIVER_DMD_CONNECTION_PREFIX

Example:

# gdal: DRIVER_NAME = "DUMMY"
# gdal: DRIVER_SUPPORTED_API_VERSION = [1]
# gdal: DRIVER_DCAP_VECTOR = "YES"
# gdal: DRIVER_DMD_LONGNAME = "my super plugin"
# gdal: DRIVER_DMD_EXTENSIONS = "foo bar"
# gdal: DRIVER_DMD_HELPTOPIC = "http://example.com/my_help.html"
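
To make the constraints on directives more concrete, here is a minimal sketch of a line-based scanner written in Python. It only approximates the rules listed above and is not the C++ parser used by GDAL; the names DIRECTIVE_RE, parse_directives and missing_required are hypothetical. It can be handy as a quick self-check of a driver script before deploying it.

```python
import re

# One directive per line: "# gdal: DRIVER_<KEY> = <literal>", with a space
# after the sharp character and after the colon.
DIRECTIVE_RE = re.compile(r"^# gdal: DRIVER_([A-Z0-9_]+)\s*=\s*(.+?)\s*$")

# Directives documented above as mandatory (without the DRIVER_ prefix).
REQUIRED = ("NAME", "SUPPORTED_API_VERSION", "DCAP_VECTOR", "DMD_LONGNAME")


def parse_directives(source, max_lines=1000):
    """Collect directives found in the first max_lines lines of source."""
    directives = {}
    for line in source.splitlines()[:max_lines]:
        match = DIRECTIVE_RE.match(line)
        if not match:
            continue
        key, raw = match.groups()
        if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
            # Single- or double-quoted string literal, taken verbatim.
            directives[key] = raw[1:-1]
        elif raw.startswith("[") and raw.endswith("]"):
            # Array of integers, as used by DRIVER_SUPPORTED_API_VERSION.
            directives[key] = [int(v) for v in raw[1:-1].split(",") if v.strip()]
        # Anything else (expressions, function calls, ...) is ignored.
    return directives


def missing_required(directives):
    """Return the mandatory keys absent from a parsed directive dict."""
    return [key for key in REQUIRED if key not in directives]
```

Running parse_directives() on the text of a driver script and passing the result to missing_required() gives the list of mandatory directives that were not found.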

Driver class

The entry point .py script must contain a single class that inherits from gdal_python_driver.BaseDriver.

That class must define the following methods:

identify(self, filename, first_bytes, open_flags, open_options={})
Parameters
  • filename (str) – File name, or more generally, connection string.

  • first_bytes (binary) – First bytes of the file (if it is a file). At least 1024 (if the file has at least 1024 bytes), or more if a native driver in the driver probe sequence has requested more previously.

  • open_flags (int) – Open flags. To be ignored for now.

  • open_options (dict) – Open options.

Returns

True if the file is recognized by the driver, False if not, or -1 if that cannot be known from the first bytes.

open(self, filename, first_bytes, open_flags, open_options={})
Parameters
  • filename (str) – File name, or more generally, connection string.

  • first_bytes (binary) – First bytes of the file (if it is a file). At least 1024 (if the file has at least 1024 bytes), or more if a native driver in the driver probe sequence has requested more previously.

  • open_flags (int) – Open flags. To be ignored for now.

  • open_options (dict) – Open options.

Returns

an object deriving from gdal_python_driver.BaseDataset or None

Example:

# Required: class deriving from BaseDriver
class Driver(BaseDriver):

    # Required
    def identify(self, filename, first_bytes, open_flags, open_options={}):
        return filename == 'DUMMY:'

    # Required
    def open(self, filename, first_bytes, open_flags, open_options={}):
        if not self.identify(filename, first_bytes, open_flags):
            return None
        return Dataset(filename)
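
The example above identifies a connection string. For file-based formats, identify() would more typically inspect first_bytes. The following is a sketch under stated assumptions: the MAGIC signature and the .dummy extension are hypothetical, and returning -1 when no bytes are available is only one possible policy. The try/except around the import is there solely so that the snippet can be exercised outside of GDAL; a real driver uses the plain import shown in the Import section.

```python
try:
    from gdal_python_driver import BaseDriver
except ImportError:
    # Stand-in so that the sketch can be run outside of GDAL.
    class BaseDriver(object):
        pass

# Hypothetical file signature and extension, for illustration only.
MAGIC = b"DUMMYFMT"
EXTENSION = ".dummy"


class SignatureDriver(BaseDriver):

    def identify(self, filename, first_bytes, open_flags, open_options={}):
        if first_bytes:
            # The header is available: the decision can be made for sure.
            return first_bytes.startswith(MAGIC)
        if filename.lower().endswith(EXTENSION):
            # No bytes to inspect, but the extension matches: cannot tell.
            return -1
        return False
```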

Dataset class

The Driver.open() method on success should return an object from a class that inherits from gdal_python_driver.BaseDataset.

Layers

The role of this object is to store vector layers. There are two implementation options. If the number of layers is small or they are fast to construct, then the __init__ method can define a layers attribute that is a sequence of objects from a class that inherits from gdal_python_driver.BaseLayer.

Example:

class Dataset(BaseDataset):

    def __init__(self, filename):
        self.layers = [Layer(filename)]

Otherwise, the following two methods should be defined:

layer_count(self)
Returns

the number of layers

layer(self, idx)
Parameters

idx (int) – Index of the layer to return. Normally between 0 and self.layer_count() - 1, but calling code might pass any value. In case of invalid index, None should be returned.

Returns

an object deriving from gdal_python_driver.BaseLayer or None. The C++ code will take care of caching that object, and this method will only be called once for a given idx value.

Example:

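class Dataset(BaseDataset):

    def __init__(self, filename):
        self.filename = filename

    def layer_count(self):
        return 1

    def layer(self, idx):
        # Return a single layer object (not a list), or None if idx is invalid
        return Layer(self.filename) if idx == 0 else None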

Metadata

The dataset may define, in __init__, a metadata attribute that is a dictionary of key: value pairs of type string, for the default metadata domain. Alternatively, the following method may be implemented.

metadata(self, domain)
Parameters

domain (str) – metadata domain. Empty string for the default one

Returns

None, or a dictionary of key: value pairs of type string.
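
A minimal sketch of the method-based variant follows. The class name MetadataDataset, the "EXTRA" domain and the metadata keys are hypothetical; the try/except around the import only serves to run the snippet outside of GDAL.

```python
try:
    from gdal_python_driver import BaseDataset
except ImportError:
    # Stand-in so that the sketch can be run outside of GDAL.
    class BaseDataset(object):
        pass


class MetadataDataset(BaseDataset):

    def __init__(self, filename):
        self.filename = filename
        self.layers = []

    def metadata(self, domain):
        if domain == "":
            # Default metadata domain
            return {"SOURCE": self.filename}
        if domain == "EXTRA":
            # Hypothetical extra domain, for illustration only
            return {"NOTE": "some value"}
        # Unknown domain
        return None
```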

Other methods

The following method may optionally be implemented:

close(self)

Called at the destruction of the C++ peer GDALDataset object. Useful to close database connections for example.
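
As an illustration, a dataset backed by a database connection could release it in close(). The sketch below uses the standard sqlite3 module purely as an example of a resource to release; the class name SqliteDataset is hypothetical, and the try/except around the import only serves to run the snippet outside of GDAL.

```python
import sqlite3

try:
    from gdal_python_driver import BaseDataset
except ImportError:
    # Stand-in so that the sketch can be run outside of GDAL.
    class BaseDataset(object):
        pass


class SqliteDataset(BaseDataset):

    def __init__(self, filename):
        self.conn = sqlite3.connect(filename)
        # Layers would normally be built from the connection; none here.
        self.layers = []

    def close(self):
        # Guard against being called more than once.
        if self.conn is not None:
            self.conn.close()
            self.conn = None
```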

Layer class

The Dataset object will instantiate one or several objects from a class that inherits from gdal_python_driver.BaseLayer.

Metadata, and other definitions

The following attributes are required and must be defined at __init__ time:

name

Layer name, of type string. If not set, a name method must be defined.

fields

Sequence of field definitions (may be empty). Each field is a dictionary with the following properties:

name

Required

type

An integer value of type ogr.OFT_ (from the SWIG Python bindings), or one of the following string values: String, Integer, Integer16, Integer64, Boolean, Real, Float, Binary, Date, Time, DateTime

If that attribute is not set, a fields method must be defined and return such a sequence.

geometry_fields

Sequence of geometry field definitions (may be empty). Each field is a dictionary with the following properties:

name

Required. May be empty

type

Required. An integer value of type ogr.wkb_ (from the SWIG Python bindings), or one of the following string values: Unknown, Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollections or all other values returned by OGRGeometryTypeToName()

srs

The SRS attached to the geometry field as a string that can be ingested by OGRSpatialReference::SetFromUserInput(), such as a PROJ string, WKT string, or AUTHORITY:CODE.

If that attribute is not set, a geometry_fields method must be defined and return such a sequence.

The following attributes are optional:

fid_name

Feature ID column name, of type string. May be empty string. If not set, a fid_name method may be defined.

metadata

A dictionary of key: value strings, corresponding to metadata of the default metadata domain. Alternatively, a metadata method that accepts a domain argument may be defined.

iterator_honour_attribute_filter

Can be set to True if the feature iterator takes into account the attribute_filter attribute that can be set on the layer.

iterator_honour_spatial_filter

Can be set to True if the feature iterator takes into account the spatial_filter attribute that can be set on the layer.

feature_count_honour_attribute_filter

Can be set to True if the feature_count method takes into account the attribute_filter attribute that can be set on the layer.

feature_count_honour_spatial_filter

Can be set to True if the feature_count method takes into account the spatial_filter attribute that can be set on the layer.
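
The attributes described above could be set as in the following sketch. The layer name, field names and SRS are hypothetical, and string type names are used so that the SWIG bindings are not needed. The try/except around the import only serves to run the snippet outside of GDAL.

```python
try:
    from gdal_python_driver import BaseLayer
except ImportError:
    # Stand-in so that the sketch can be run outside of GDAL.
    class BaseLayer(object):
        pass


class AttributesLayer(BaseLayer):

    def __init__(self, filename):
        # Required attributes
        self.name = "places"
        self.fields = [
            {"name": "label", "type": "String"},
            {"name": "population", "type": "Integer64"},
        ]
        self.geometry_fields = [
            {"name": "geom", "type": "Point", "srs": "EPSG:4326"},
        ]
        # Optional attributes
        self.fid_name = "id"
        self.metadata = {"SOURCE": filename}
        # The *_honour_* flags are left unset: this sketch does no
        # filtering of its own.
```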

Feature iterator

The Layer class must implement the iterator interface, typically with an __iter__ method.

The iterator must return a dictionary with the feature content.

If self.iterator_honour_attribute_filter or self.iterator_honour_spatial_filter are set to True, the attribute filter and/or spatial filter must be honoured by this method.

The keys allowed in the returned dictionary are:

id

Strongly recommended. The value must be of type int to be recognized as a FID by GDAL

type

Required. The value must be the string “OGRFeature”

fields

Required. The value must be a dictionary whose keys are field names, or None

geometry_fields

Required. The value must be a dictionary whose keys are geometry field names (possibly the empty string for unnamed geometry columns), or None. The value of each key must be a geometry encoded as WKT, or None.

style

Optional. The value must be a string conforming to the Feature Style Specification.
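
A minimal sketch of an iterator producing such dictionaries is given below, written as a generator. The in-memory rows and the field names are hypothetical, and the geometry column is unnamed, hence the empty-string key. The try/except around the import only serves to run the snippet outside of GDAL.

```python
try:
    from gdal_python_driver import BaseLayer
except ImportError:
    # Stand-in so that the sketch can be run outside of GDAL.
    class BaseLayer(object):
        pass


class IterLayer(BaseLayer):

    def __init__(self):
        self.name = "places"
        self.fields = [{"name": "label", "type": "String"}]
        self.geometry_fields = [{"name": "", "type": "Point",
                                 "srs": "EPSG:4326"}]
        # Hypothetical in-memory content: (label, x, y)
        self._rows = [("A", 2.5, 48.5), ("B", 12.5, 41.5)]

    def __iter__(self):
        for fid, (label, x, y) in enumerate(self._rows):
            yield {
                "id": fid,                  # int, used as the FID
                "type": "OGRFeature",       # mandatory constant
                "fields": {"label": label},
                "geometry_fields": {"": "POINT (%s %s)" % (x, y)},
            }
```

Because __iter__ is a generator function, each new iteration over the layer starts again from the first feature.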

Optional methods

The following methods may optionally be implemented:

extent(self, force_computation)
Returns

the list [xmin,ymin,xmax,ymax] with the spatial extent of the layer.

feature_count(self, force_computation)
Returns

the number of features of the layer.

If self.feature_count_honour_attribute_filter or self.feature_count_honour_spatial_filter are set to True, the attribute filter and/or spatial filter must be honoured by this method.

feature_by_id(self, fid)
Parameters

fid (int) – feature ID

Returns

a feature object in the format returned by the feature iterator described above, or None if no object matches fid

attribute_filter_changed(self)

This method is called whenever self.attribute_filter has been changed. It is the opportunity for the driver to potentially change the value of the self.iterator_honour_attribute_filter or self.feature_count_honour_attribute_filter attributes.

spatial_filter_changed(self)

This method is called whenever self.spatial_filter has been changed (its value is a geometry encoded in WKT). It is the opportunity for the driver to potentially change the value of the self.iterator_honour_spatial_filter or self.feature_count_honour_spatial_filter attributes.

test_capability(self, cap)
Parameters

cap (str) – potential values are BaseLayer.FastGetExtent, BaseLayer.FastSpatialFilter, BaseLayer.FastFeatureCount, BaseLayer.RandomRead, BaseLayer.StringsAsUTF8 or other strings supported by OGRLayer::TestCapability()

Returns

True if the capability is supported, False otherwise.
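
Several of these optional methods could be sketched as follows for a layer holding points in memory. The class name PointLayer and its points argument are hypothetical, the layer is assumed to hold at least one point, and the capability constants of the stand-in base class are assumed to mirror the OGR capability strings. The try/except around the import only serves to run the snippet outside of GDAL.

```python
try:
    from gdal_python_driver import BaseLayer
except ImportError:
    # Stand-in so that the sketch can be run outside of GDAL. The values
    # are assumed to match the OGR capability strings.
    class BaseLayer(object):
        FastGetExtent = "FastGetExtent"
        FastFeatureCount = "FastFeatureCount"
        RandomRead = "RandomRead"
        StringsAsUTF8 = "StringsAsUTF8"


class PointLayer(BaseLayer):

    def __init__(self, points):
        # points: non-empty list of (x, y) tuples; the FID is the list index
        self.name = "points"
        self.fields = []
        self.geometry_fields = [{"name": "", "type": "Point"}]
        self._points = list(points)

    def _feature(self, fid):
        x, y = self._points[fid]
        return {"id": fid, "type": "OGRFeature", "fields": {},
                "geometry_fields": {"": "POINT (%s %s)" % (x, y)}}

    def __iter__(self):
        for fid in range(len(self._points)):
            yield self._feature(fid)

    def extent(self, force_computation):
        xs = [p[0] for p in self._points]
        ys = [p[1] for p in self._points]
        return [min(xs), min(ys), max(xs), max(ys)]

    def feature_count(self, force_computation):
        return len(self._points)

    def feature_by_id(self, fid):
        if 0 <= fid < len(self._points):
            return self._feature(fid)
        return None

    def test_capability(self, cap):
        # Everything is in memory, so these operations are cheap.
        return cap in (BaseLayer.FastGetExtent, BaseLayer.FastFeatureCount,
                       BaseLayer.RandomRead)
```

Note that extent() returns [xmin, ymin, xmax, ymax], which differs from the (minx, maxx, miny, maxy) order returned by GetExtent() in the SWIG bindings.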

Full example

The following example is a passthrough driver that forwards the calls to the SWIG Python GDAL API. The connection strings recognized by the driver are "PASSTHROUGH:connection_string_supported_by_non_python_drivers". Note that prefixing with the driver name is not a requirement, but something specific to this particular driver, which is a bit artificial (without the prefix, the connection string would go directly to the native driver). The CityJSON driver mentioned in the Other examples paragraph does not need it.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# This code is in the public domain, so as to serve as a template for
# real-world plugins.
# or, at the choice of the licensee,
# Copyright 2019 Even Rouault
# SPDX-License-Identifier: MIT

# gdal: DRIVER_NAME = "PASSTHROUGH"
# API version(s) supported. Must include 1 currently
# gdal: DRIVER_SUPPORTED_API_VERSION = [1]
# gdal: DRIVER_DCAP_VECTOR = "YES"
# gdal: DRIVER_DMD_LONGNAME = "Passthrough driver"
# gdal: DRIVER_DMD_CONNECTION_PREFIX = "PASSTHROUGH:"

from osgeo import gdal, ogr

from gdal_python_driver import BaseDriver, BaseDataset, BaseLayer

class Layer(BaseLayer):

    def __init__(self, gdal_layer):
        self.gdal_layer = gdal_layer
        self.name = gdal_layer.GetName()
        self.fid_name = gdal_layer.GetFIDColumn()
        self.metadata = gdal_layer.GetMetadata_Dict()
        self.iterator_honour_attribute_filter = True
        self.iterator_honour_spatial_filter = True
        self.feature_count_honour_attribute_filter = True
        self.feature_count_honour_spatial_filter = True

    def fields(self):
        res = []
        layer_defn = self.gdal_layer.GetLayerDefn()
        for i in range(layer_defn.GetFieldCount()):
            ogr_field_def = layer_defn.GetFieldDefn(i)
            field_def = {"name": ogr_field_def.GetName(),
                         "type": ogr_field_def.GetType()}
            res.append(field_def)
        return res

    def geometry_fields(self):
        res = []
        layer_defn = self.gdal_layer.GetLayerDefn()
        for i in range(layer_defn.GetGeomFieldCount()):
            ogr_field_def = layer_defn.GetGeomFieldDefn(i)
            field_def = {"name": ogr_field_def.GetName(),
                         "type": ogr_field_def.GetType()}
            srs = ogr_field_def.GetSpatialRef()
            if srs:
                field_def["srs"] = srs.ExportToWkt()
            res.append(field_def)
        return res

    def test_capability(self, cap):
        if cap in (BaseLayer.FastGetExtent, BaseLayer.StringsAsUTF8,
                BaseLayer.RandomRead, BaseLayer.FastFeatureCount):
            return self.gdal_layer.TestCapability(cap)
        return False

    def extent(self, force_computation):
        # Impedance mismatch between SWIG GetExtent() and the Python
        # driver API
        minx, maxx, miny, maxy = self.gdal_layer.GetExtent(force_computation)
        return [minx, miny, maxx, maxy]

    def feature_count(self, force_computation):
        return self.gdal_layer.GetFeatureCount(True)

    def attribute_filter_changed(self):
        if self.attribute_filter:
            self.gdal_layer.SetAttributeFilter(str(self.attribute_filter))
        else:
            self.gdal_layer.SetAttributeFilter(None)

    def spatial_filter_changed(self):
        # the 'inf' test is just for a test_ogrsf oddity
        if self.spatial_filter and 'inf' not in self.spatial_filter:
            self.gdal_layer.SetSpatialFilter(
                ogr.CreateGeometryFromWkt(self.spatial_filter))
        else:
            self.gdal_layer.SetSpatialFilter(None)

    def _translate_feature(self, ogr_f):
        fields = {}
        layer_defn = ogr_f.GetDefnRef()
        for i in range(ogr_f.GetFieldCount()):
            if ogr_f.IsFieldSet(i):
                fields[layer_defn.GetFieldDefn(i).GetName()] = ogr_f.GetField(i)
        geom_fields = {}
        for i in range(ogr_f.GetGeomFieldCount()):
            g = ogr_f.GetGeomFieldRef(i)
            if g:
                geom_fields[layer_defn.GetGeomFieldDefn(
                    i).GetName()] = g.ExportToIsoWkt()
        return {'id': ogr_f.GetFID(),
                'type': 'OGRFeature',
                'style': ogr_f.GetStyleString(),
                'fields': fields,
                'geometry_fields': geom_fields}

    def __iter__(self):
        for f in self.gdal_layer:
            yield self._translate_feature(f)

    def feature_by_id(self, fid):
        ogr_f = self.gdal_layer.GetFeature(fid)
        if not ogr_f:
            return None
        return self._translate_feature(ogr_f)

class Dataset(BaseDataset):

    def __init__(self, gdal_ds):
        self.gdal_ds = gdal_ds
        self.layers = [Layer(gdal_ds.GetLayer(idx))
                    for idx in range(gdal_ds.GetLayerCount())]
        self.metadata = gdal_ds.GetMetadata_Dict()

    def close(self):
        del self.gdal_ds
        self.gdal_ds = None


class Driver(BaseDriver):

    def _identify(self, filename):
        prefix = 'PASSTHROUGH:'
        if not filename.startswith(prefix):
            return None
        return gdal.OpenEx(filename[len(prefix):], gdal.OF_VECTOR)

    def identify(self, filename, first_bytes, open_flags, open_options={}):
        return self._identify(filename) is not None

    def open(self, filename, first_bytes, open_flags, open_options={}):
        gdal_ds = self._identify(filename)
        if not gdal_ds:
            return None
        return Dataset(gdal_ds)

Other examples

Other examples, including a CityJSON driver, may be found at https://github.com/OSGeo/gdal/tree/master/gdal/examples/pydrivers