gdal vector pipeline

Added in version 3.11.

Process a vector dataset.

Synopsis

Usage: gdal vector pipeline [OPTIONS] <PIPELINE>

Process a vector dataset.

Positional arguments:

Common Options:
  -h, --help              Display help message and exit
  --json-usage            Display usage as JSON document and exit
  --config <KEY>=<VALUE>  Configuration option [may be repeated]
  --progress              Display progress bar

<PIPELINE> is of the form: read|concat [READ-OPTIONS] ( ! <STEP-NAME> [STEP-OPTIONS] )* ! write [WRITE-OPTIONS]

A pipeline chains several steps, separated with the ! (exclamation mark) character. The first step must be read or concat, and the last one write. Each step has its own positional or non-positional arguments. Apart from read, concat and write, all other steps can potentially be used several times in a pipeline.
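The chaining rule above can be sketched in Python. This is a minimal illustration, not part of GDAL: the helper name build_pipeline_argv is hypothetical, and it encodes one reading of the rule (read or concat first, write last, none of those three in between). It returns an argument list in which each step token and each ! separator is a separate element, which is how a shell passes the unquoted pipelines shown on this page.

```python
from typing import List, Sequence

# Steps allowed in first position, per the rule stated above.
FIRST_STEPS = {"read", "concat"}


def build_pipeline_argv(steps: Sequence[Sequence[str]]) -> List[str]:
    """Return an argv list for `gdal vector pipeline` (hypothetical helper).

    Each step is a sequence of tokens whose first element is the step name,
    for example ["reproject", "--dst-crs=EPSG:32632"].
    """
    if len(steps) < 2:
        raise ValueError("a pipeline needs a first step and a write step")
    if any(len(step) == 0 for step in steps):
        raise ValueError("empty step")

    names = [step[0] for step in steps]
    if names[0] not in FIRST_STEPS:
        raise ValueError("the first step must be read or concat")
    if names[-1] != "write":
        raise ValueError("the last step must be write")
    # Assumption: read, concat and write are only valid at the two ends.
    for name in names[1:-1]:
        if name in FIRST_STEPS or name == "write":
            raise ValueError(f"{name} cannot be used in the middle of a pipeline")

    argv = ["gdal", "vector", "pipeline"]
    for index, step in enumerate(steps):
        if index > 0:
            argv.append("!")  # step separator
        argv.extend(step)
    return argv


argv = build_pipeline_argv(
    [
        ["read", "in.gpkg"],
        ["reproject", "--dst-crs=EPSG:32632"],
        ["write", "out.gpkg", "--overwrite"],
    ]
)
print(" ".join(argv))
# gdal vector pipeline read in.gpkg ! reproject --dst-crs=EPSG:32632 ! write out.gpkg --overwrite
```

Building a list rather than a single string avoids shell quoting issues when the list is later handed to a process launcher.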

Potential steps are:

  • read

* read [OPTIONS] <INPUT>
------------------------

Read a vector dataset.

Positional arguments:
  -i, --input <INPUT>                       Input vector datasets [required]

Options:
  -l, --layer, --input-layer <INPUT-LAYER>  Input layer name(s) [may be repeated]

Advanced Options:
  --if, --input-format <INPUT-FORMAT>       Input formats [may be repeated]
  --oo, --open-option <KEY>=<VALUE>         Open options [may be repeated]

  • concat

* concat [OPTIONS] <INPUT>...
-----------------------------

Concatenate vector datasets.

Positional arguments:
  -i, --input <INPUT>                                        Input vector datasets [1.. values] [required]

Options:
  -l, --layer, --input-layer <INPUT-LAYER>                   Input layer name(s) [may be repeated]
  --mode <MODE>                                              Determine the strategy to create output layers from source layers. MODE=merge-per-layer-name|stack|single (default: merge-per-layer-name)
  --output-layer <OUTPUT-LAYER>                              Name of the output vector layer (single mode), or template to name the output vector layers (stack mode)
  --source-layer-field-name <SOURCE-LAYER-FIELD-NAME>        Name of the new field to add to contain identification of the source layer, with value determined from 'source-layer-field-content'
  --source-layer-field-content <SOURCE-LAYER-FIELD-CONTENT>  A string, possibly using {AUTO_NAME}, {DS_NAME}, {DS_BASENAME}, {DS_INDEX}, {LAYER_NAME}, {LAYER_INDEX}
  --field-strategy <FIELD-STRATEGY>                          How to determine target fields from source fields. FIELD-STRATEGY=union|intersection (default: union)
  -s, --src-crs <SRC-CRS>                                    Source CRS
  -d, --dst-crs <DST-CRS>                                    Destination CRS

Advanced Options:
  --if, --input-format <INPUT-FORMAT>                        Input formats [may be repeated]
  --oo, --open-option <KEY>=<VALUE>                          Open options [may be repeated]

Details for options can be found in gdal vector concat.

  • clip

* clip [OPTIONS]
----------------

Clip a vector dataset.

Options:
  --active-layer <ACTIVE-LAYER>    Set active layer (if not specified, all)
  --bbox <BBOX>                    Clipping bounding box as xmin,ymin,xmax,ymax
                                   Mutually exclusive with --geometry, --like
  --bbox-crs <BBOX-CRS>            CRS of clipping bounding box
  --geometry <GEOMETRY>            Clipping geometry (WKT or GeoJSON)
                                   Mutually exclusive with --bbox, --like
  --geometry-crs <GEOMETRY-CRS>    CRS of clipping geometry
  --like <DATASET>                 Dataset to use as a template for bounds
                                   Mutually exclusive with --bbox, --geometry
  --like-sql <SELECT-STATEMENT>    SELECT statement to run on the 'like' dataset
                                   Mutually exclusive with --like-where
  --like-layer <LAYER-NAME>        Name of the layer of the 'like' dataset
  --like-where <WHERE-EXPRESSION>  WHERE SQL clause to run on the 'like' dataset
                                   Mutually exclusive with --like-sql

Details for options can be found in gdal vector clip.
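As an illustration of the clip options listed above, the following Python sketch assembles the tokens of a clip step and rejects combinations that the synopsis marks as mutually exclusive (--bbox, --geometry, --like). The helper name clip_step is hypothetical, and the --option=value spelling is assumed to work for these options in the same way as --dst-crs=EPSG:32632 does elsewhere on this page.

```python
from typing import List, Optional, Sequence


def clip_step(
    bbox: Optional[Sequence[float]] = None,
    geometry: Optional[str] = None,
    like: Optional[str] = None,
) -> List[str]:
    """Return the tokens of a clip step (hypothetical helper, not a GDAL API)."""
    given = [
        name
        for name, value in (("bbox", bbox), ("geometry", geometry), ("like", like))
        if value is not None
    ]
    if len(given) > 1:
        raise ValueError("mutually exclusive options: " + ", ".join(given))

    step = ["clip"]
    if bbox is not None:
        if len(bbox) != 4:
            raise ValueError("bbox must be xmin,ymin,xmax,ymax")
        # The bounding box is passed as a single comma-separated value.
        step.append("--bbox=" + ",".join(str(value) for value in bbox))
    if geometry is not None:
        step.append("--geometry=" + geometry)  # WKT or GeoJSON text
    if like is not None:
        step.append("--like=" + like)
    return step


print(clip_step(bbox=(2, 49, 3, 50)))
# ['clip', '--bbox=2,49,3,50']
```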

  • edit

* edit [OPTIONS]
----------------

Edit metadata of a vector dataset.

Options:
  --active-layer <ACTIVE-LAYER>    Set active layer (if not specified, all)
  --geometry-type <GEOMETRY-TYPE>  Layer geometry type
  --crs <CRS>                      Override CRS (without reprojection)
  --metadata <KEY>=<VALUE>         Add/update dataset metadata item [may be repeated]
  --unset-metadata <KEY>           Remove dataset metadata item [may be repeated]
  --layer-metadata <KEY>=<VALUE>   Add/update layer metadata item [may be repeated]
  --unset-layer-metadata <KEY>     Remove layer metadata item [may be repeated]

Details for options can be found in gdal vector edit.

  • filter

* filter [OPTIONS]
------------------

Filter a vector dataset.

Options:
  --active-layer <ACTIVE-LAYER>  Set active layer (if not specified, all)
  --bbox <BBOX>                  Bounding box as xmin,ymin,xmax,ymax
  --where <WHERE>|@<filename>    Attribute query in a restricted form of the queries used in the SQL WHERE statement

Details for options can be found in gdal vector filter.

  • geom

* geom <COMMAND> [OPTIONS]
where <COMMAND> is one of:
  - buffer:              Compute a buffer around geometries of a vector dataset.
  - explode-collections: Explode geometries of type collection of a vector dataset.
  - make-valid:          Fix validity of geometries of a vector dataset.
  - segmentize:          Segmentize geometries of a vector dataset.
  - set-type:            Modify the geometry type of a vector dataset.
  - simplify:            Simplify geometries of a vector dataset.
  - swap-xy:             Swap X and Y coordinates of geometries of a vector dataset.

Details for options can be found in gdal vector geom.

  • reproject

* reproject [OPTIONS]
---------------------

Reproject a vector dataset.

Options:
  --active-layer <ACTIVE-LAYER>  Set active layer (if not specified, all)
  -s, --src-crs <SRC-CRS>        Source CRS
  -d, --dst-crs <DST-CRS>        Destination CRS [required]

Details for options can be found in gdal vector reproject.

  • select

* select [OPTIONS] <FIELDS>
---------------------------

Select a subset of fields from a vector dataset.

Positional arguments:
  --fields <FIELDS>              Fields to select (or exclude if --exclude) [may be repeated] [required]

Options:
  --active-layer <ACTIVE-LAYER>  Set active layer (if not specified, all)
  --exclude                      Exclude specified fields
                                 Mutually exclusive with --ignore-missing-fields
  --ignore-missing-fields        Ignore missing fields
                                 Mutually exclusive with --exclude

Details for options can be found in gdal vector select.

  • sql

* sql [OPTIONS] <statement>|@<filename>
---------------------------------------

Apply SQL statement(s) to a dataset.

Positional arguments:
  --sql <statement>|@<filename>      SQL statement(s) [may be repeated] [required]

Options:
  -l, --output-layer <OUTPUT-LAYER>  Output layer name(s) [may be repeated]
  --dialect <DIALECT>                SQL dialect (e.g. OGRSQL, SQLITE)

Details for options can be found in gdal vector sql.

  • write

* write [OPTIONS] <OUTPUT>
--------------------------

Write a vector dataset.

Positional arguments:
  -o, --output <OUTPUT>                                Output vector dataset [required]

Options:
  -f, --of, --format, --output-format <OUTPUT-FORMAT>  Output format ("GDALG" allowed)
  --co, --creation-option <KEY>=<VALUE>                Creation option [may be repeated]
  --lco, --layer-creation-option <KEY>=<VALUE>         Layer creation option [may be repeated]
  --overwrite                                          Whether overwriting existing output is allowed
  --update                                             Whether to open existing dataset in update mode
  --overwrite-layer                                    Whether overwriting existing layer is allowed
  --append                                             Whether appending to existing layer is allowed
  -l, --output-layer <OUTPUT-LAYER>                    Output layer name

Description

gdal vector pipeline processes a vector dataset by chaining several processing steps in a single command.

GDALG output (on-the-fly / streamed dataset)

A pipeline can be serialized as a JSON file using the GDALG output format. The resulting file can then be opened as a vector dataset using the GDALG: GDAL Streamed Algorithm driver, which applies the specified pipeline on the fly, in a streamed way.

The command_line member of the JSON file should nominally be the whole command line without the final write step, and is what is generated by gdal vector pipeline ! .... ! write out.gdalg.json.

{
    "type": "gdal_streamed_alg",
    "command_line": "gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632"
}
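A GDALG document of this shape can also be produced by hand. The Python sketch below is only an illustration under stated assumptions: the helper name gdalg_document is hypothetical, it drops a trailing write step as described above, and it refuses tokens containing whitespace because the quoting rules applied to command_line are not described here.

```python
import json
from typing import Dict, Sequence


def gdalg_document(steps: Sequence[Sequence[str]]) -> Dict[str, str]:
    """Build the content of a .gdalg.json file (hypothetical helper)."""
    kept = [list(step) for step in steps]
    # The command line is nominally stored without the final write step.
    if kept and kept[-1][:1] == ["write"]:
        kept = kept[:-1]
    if not kept:
        raise ValueError("at least one step other than write is required")
    for step in kept:
        for token in step:
            # Assumption: quoting is out of scope, so such tokens are rejected.
            if not token or any(ch.isspace() for ch in token):
                raise ValueError(f"unsupported token: {token!r}")

    command_line = "gdal vector pipeline ! " + " ! ".join(
        " ".join(step) for step in kept
    )
    return {"type": "gdal_streamed_alg", "command_line": command_line}


doc = gdalg_document(
    [
        ["read", "in.gpkg"],
        ["reproject", "--dst-crs=EPSG:32632"],
        ["write", "out.gdalg.json"],
    ]
)
print(json.dumps(doc, indent=4))
```

The printed text has the same two members as the JSON document shown above; saving it under a name ending in .gdalg.json is left to the caller.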

The final write step can be added, but if so, it must explicitly specify the stream output format and a non-significant output dataset name.

{
    "type": "gdal_streamed_alg",
    "command_line": "gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! write --output-format=streamed streamed_dataset"
}

Examples

Example 1: Reproject a GeoPackage file to CRS EPSG:32632 ("WGS 84 / UTM zone 32N")

$ gdal vector pipeline --progress ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! write out.gpkg --overwrite

Example 2: Serialize the command of a reprojection of a GeoPackage file in a GDALG file, and later read it

$ gdal vector pipeline --progress ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! write in_epsg_32632.gdalg.json --overwrite
$ gdal vector info in_epsg_32632.gdalg.json

Example 3: Union 2 source shapefiles (with similar structure), reproject them to EPSG:32632, keep only cities with more than 1 million inhabitants and write to a GeoPackage

$ gdal vector pipeline --progress ! concat --mode=single --dst-crs=EPSG:32632 france.shp belgium.shp ! filter --where "pop > 1e6" ! write out.gpkg --overwrite
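The same pipeline can be launched from a script. The following Python sketch is an illustration only: the run_pipeline helper is hypothetical, the single-layer mode is spelled --mode=single as in the concat synopsis, and the command is actually executed only when a gdal executable and both shapefiles are present (otherwise it prints what it would run). Passing the WHERE clause as its own list element avoids any shell quoting.

```python
import os
import shutil
import subprocess
from typing import List, Optional

INPUTS = ["france.shp", "belgium.shp"]

# One list element per command-line token; "!" separates the steps.
ARGV = [
    "gdal", "vector", "pipeline", "--progress",
    "!", "concat", "--mode=single", "--dst-crs=EPSG:32632", *INPUTS,
    "!", "filter", "--where", "pop > 1e6",
    "!", "write", "out.gpkg", "--overwrite",
]


def run_pipeline(argv: List[str], inputs: List[str]) -> Optional[int]:
    """Run argv if the executable and all inputs exist, else do a dry run."""
    have_tool = shutil.which(argv[0]) is not None
    have_inputs = all(os.path.exists(path) for path in inputs)
    if not (have_tool and have_inputs):
        print("dry run:", argv)
        return None
    # No shell is involved, so "pop > 1e6" reaches gdal as a single argument.
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    run_pipeline(ARGV, INPUTS)
```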