gdal vector concat

Added in version 3.11.

Concatenate vector datasets

Synopsis

Usage: gdal vector concat [OPTIONS] <INPUTS>... <OUTPUT>

Concatenate vector datasets.

Positional arguments:
  -i, --input <INPUTS>                                       Input vector datasets [1.. values] [required] [not available in pipelines]
  -o, --output <OUTPUT>                                      Output vector dataset [required] [not available in pipelines]

Common Options:
  -h, --help                                                 Display help message and exit
  --json-usage                                               Display usage as JSON document and exit
  --config <KEY>=<VALUE>                                     Configuration option [may be repeated]
  -q, --quiet                                                Quiet mode (no progress bar or warning message) [not available in pipelines]

Options:
  -l, --layer, --input-layer <INPUT-LAYER>                   Input layer name(s) [may be repeated] [not available in pipelines]
  -f, --of, --format, --output-format <OUTPUT-FORMAT>        Output format ("GDALG" allowed) [not available in pipelines]
  --co, --creation-option <KEY>=<VALUE>                      Creation option [may be repeated] [not available in pipelines]
  --lco, --layer-creation-option <KEY>=<VALUE>               Layer creation option [may be repeated] [not available in pipelines]
  --overwrite                                                Whether overwriting existing output dataset is allowed [not available in pipelines]
  --update                                                   Whether to open existing dataset in update mode [not available in pipelines]
  --overwrite-layer                                          Whether overwriting existing output layer is allowed [not available in pipelines]
  --append                                                   Whether appending to existing layer is allowed [not available in pipelines]
                                                             Mutually exclusive with --upsert
  --skip-errors                                              Skip errors when writing features [not available in pipelines]
  --mode <MODE>                                              Determine the strategy to create output layers from source layers . MODE=merge-per-layer-name|stack|single (default: merge-per-layer-name)
  --output-layer <OUTPUT-LAYER>                              Name of the output vector layer (single mode), or template to name the output vector layers (stack mode)
  --source-layer-field-name <SOURCE-LAYER-FIELD-NAME>        Name of the new field to add to contain identification of the source layer, with value determined from 'source-layer-field-content'
  --source-layer-field-content <SOURCE-LAYER-FIELD-CONTENT>  A string, possibly using {AUTO_NAME}, {DS_NAME}, {DS_BASENAME}, {DS_INDEX}, {LAYER_NAME}, {LAYER_INDEX}
  --field-strategy <FIELD-STRATEGY>                          How to determine target fields from source fields. FIELD-STRATEGY=union|intersection (default: union)
  -s, --input-crs <INPUT-CRS>                                Input CRS
  -d, --output-crs <OUTPUT-CRS>                              Output CRS

Advanced Options:
  --if, --input-format <INPUT-FORMAT>                        Input formats [may be repeated] [not available in pipelines]
  --oo, --open-option <KEY>=<VALUE>                          Open options [may be repeated] [not available in pipelines]
  --output-oo, --output-open-option <KEY>=<VALUE>            Output open options [may be repeated] [not available in pipelines]
  --upsert                                                   Upsert features (implies 'append') [not available in pipelines]
                                                             Mutually exclusive with --append

Description

gdal vector concat concatenates several source datasets.

It has 3 main modes:

  • --mode = merge-per-layer-name (the default). The output dataset generated by the command will contain as many layers as there are different layer names in the source datasets. For example if there are 2 datasets, one with layers a and b, and the other one with layers b and c, 3 output layers will be created: a, b (merging the 2 source layers) and c.

  • --mode = stack. The output dataset generated by the command will contain as many layers as there are layers in the source datasets. For example if there are 2 datasets ds1 (with layers a and b) and ds2 (with layers b and c), 4 output layers will be created: ds1_a, ds1_b, ds2_b and ds2_c.

  • --mode = single. The output dataset generated by the command will contain one single layer, merging all layers in the source datasets.
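The three modes can be pictured with a small Python model. This is only a sketch of the naming rules described above, not GDAL's implementation. The function name output_layers and the dict-based description of the datasets are made up for illustration, the stack names assume that dataset and layer names differ (so the result looks like ds1_a), and the single-mode name assumes the default "merged" of --output-layer.

```python
def output_layers(datasets, mode="merge-per-layer-name"):
    """Return {output layer name: [(dataset, source layer), ...]}.

    `datasets` maps a dataset base name to its list of layer names.
    Toy model of the rules in the text, not GDAL code.
    """
    result = {}
    for ds_name, layers in datasets.items():
        for layer in layers:
            if mode == "merge-per-layer-name":
                key = layer                 # one output layer per distinct layer name
            elif mode == "stack":
                key = f"{ds_name}_{layer}"  # one output layer per source layer
            elif mode == "single":
                key = "merged"              # every source layer goes into one layer
            else:
                raise ValueError(f"unknown mode: {mode}")
            result.setdefault(key, []).append((ds_name, layer))
    return result


datasets = {"ds1": ["a", "b"], "ds2": ["b", "c"]}
for mode in ("merge-per-layer-name", "stack", "single"):
    print(mode, list(output_layers(datasets, mode)))
```

With the two datasets from the text, the model lists 3 layers for merge-per-layer-name, 4 for stack and 1 for single.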

When an output layer merges several source layers, by default the resulting schema will contain the union of all source fields. It is possible to select only the intersection by setting --field-strategy to intersection. Regarding the resulting CRS, by default the CRS of the first source layer will be used as the target CRS, and features of other source layers that do not match this CRS will be reprojected to it. --output-crs can be used to select a given destination CRS.

This command can also be used as the first step of gdal vector pipeline.

GDALG output (on-the-fly / streamed dataset)

This program supports serializing the command line as a JSON file using the GDALG output format. The resulting file can then be opened as a vector dataset using the GDALG: GDAL Streamed Algorithm driver, which applies the specified pipeline in an on-the-fly / streamed way.

Program-Specific Options

--field-strategy union|intersection

Determines how the schema of the target layer is built from the schemas of the input layers:

  • union (default) to use a super-set of all the fields from all source layers.

  • intersection to use a sub-set of all the common fields from all source layers.
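As an illustration of the two strategies, here is a minimal Python sketch that works on field names only. It is not how GDAL builds the schema: the function name target_fields is hypothetical, field types are ignored, and the ordering of the result (order of first appearance) is an assumption made for this example.

```python
def target_fields(layer_schemas, strategy="union"):
    """Return the target field names for a list of source-layer schemas.

    `layer_schemas` is a list of field-name lists, one per source layer.
    Assumption: fields are kept in order of first appearance.
    """
    if not layer_schemas:
        return []
    if strategy == "union":
        fields = []
        for schema in layer_schemas:
            for name in schema:
                if name not in fields:
                    fields.append(name)  # keep every field seen in any layer
        return fields
    if strategy == "intersection":
        first, *others = layer_schemas
        # keep only the fields present in every source layer
        return [name for name in first if all(name in s for s in others)]
    raise ValueError(f"unknown strategy: {strategy}")


schemas = [["name", "population"], ["name", "area"]]
print(target_fields(schemas, "union"))         # ['name', 'population', 'area']
print(target_fields(schemas, "intersection"))  # ['name']
```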

--input-crs, -s <INPUT-CRS>

Set (override) input spatial reference. If not specified the SRS found in the input dataset will be used.

The coordinate reference systems that can be passed are anything supported by the OGRSpatialReference.SetFromUserInput() call, which includes EPSG Projected, Geographic or Compound CRS (e.g. EPSG:4296), a well known text (WKT) CRS definition, PROJ.4 declarations, or the name of a .prj file containing a WKT CRS definition.

If the SRS has an explicit vertical datum that points to a geoid grid, and the input dataset is a single band dataset, a vertical correction will be applied to the values of the dataset.

--mode merge-per-layer-name|stack|single

Determine the strategy to create output layers from source layers. See introductory paragraph for more details.

--output-crs, -d <OUTPUT-CRS>

Set output spatial reference. Inputs will be reprojected to this CRS if necessary.

The coordinate reference systems that can be passed are anything supported by the OGRSpatialReference.SetFromUserInput() call, which includes EPSG Projected, Geographic or Compound CRS (e.g. EPSG:4296), a well known text (WKT) CRS definition, PROJ.4 declarations, or the name of a .prj file containing a WKT CRS definition.

If the SRS has an explicit vertical datum that points to a geoid grid, and the input dataset is a single band dataset, a vertical correction will be applied to the values of the dataset.

--output-layer <OUTPUT-LAYER>

Name of the output vector layer in single mode (the default is "merged"), or template to name the output vector layers in stack mode (the default is {AUTO_NAME}). Not allowed in merge-per-layer-name mode.

The template in stack mode can be a string with the following variables that will be substituted with a value computed from the input layer being processed:

  • {AUTO_NAME}: equivalent to {DS_BASENAME}_{LAYER_NAME} if both values are different, or {LAYER_NAME} when they are identical (case of shapefile).

  • {DS_NAME}: name of the source dataset

  • {DS_BASENAME}: base name of the source dataset

  • {DS_INDEX}: index of the source dataset

  • {LAYER_NAME}: name of the source layer

  • {LAYER_INDEX}: index of the source layer
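The substitution of these variables can be sketched in Python as follows. This is a hedged illustration rather than GDAL's code: the function name expand_template is hypothetical, "base name" is assumed to mean the file name without directory and extension, and the two index values are passed in by the caller because the text does not state how they are numbered.

```python
import os
import re


def expand_template(template, ds_name, ds_index, layer_name, layer_index):
    """Expand {AUTO_NAME}, {DS_NAME}, ... in `template` for one source layer."""
    # Assumption: base name = file name without directory and extension.
    ds_basename = os.path.splitext(os.path.basename(ds_name))[0]
    # {AUTO_NAME} collapses to the layer name when both names are identical
    # (the shapefile case mentioned in the text).
    if ds_basename == layer_name:
        auto_name = layer_name
    else:
        auto_name = f"{ds_basename}_{layer_name}"
    values = {
        "AUTO_NAME": auto_name,
        "DS_NAME": ds_name,
        "DS_BASENAME": ds_basename,
        "DS_INDEX": str(ds_index),
        "LAYER_NAME": layer_name,
        "LAYER_INDEX": str(layer_index),
    }
    # Single pass over the template; unknown {NAMES} are left untouched.
    return re.sub(r"\{([A-Z_]+)\}",
                  lambda m: values.get(m.group(1), m.group(0)),
                  template)


print(expand_template("{AUTO_NAME}", "data/ds1.gpkg", 0, "a", 0))    # ds1_a
print(expand_template("{AUTO_NAME}", "data/roads.shp", 0, "roads", 0))  # roads
```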

--source-layer-field-content <SOURCE-LAYER-FIELD-CONTENT>

If specified, the schema of the target layer will be extended with a new field (whose name is given by --source-layer-field-name, or source_ds_lyr otherwise), whose content is determined by the specified template (see --output-layer for variables that can be used).

--source-layer-field-name <SOURCE-LAYER-FIELD-NAME>

If specified, the schema of the target layer will be extended with a field whose name is the value of this option and whose content is determined by --source-layer-field-content.
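The effect of these two options on the merged features can be modelled in a few lines of Python. This is a toy sketch, not GDAL code: features are plain dicts, the function name tag_features is hypothetical, and the content argument stands for the template after its variables have been substituted. The default field name source_ds_lyr is the one given above.

```python
def tag_features(features, content, field_name="source_ds_lyr"):
    """Return copies of `features` with an extra source-identification field."""
    return [{**feature, field_name: content} for feature in features]


france = [{"name": "Paris"}, {"name": "Lyon"}]
germany = [{"name": "Berlin"}]

# Equivalent in spirit to --source-layer-field-name=country
merged = (tag_features(france, "france", "country")
          + tag_features(germany, "germany", "country"))
for feature in merged:
    print(feature)
```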

Standard Options

--append

Whether appending features to existing layer(s) is allowed. This also creates the output dataset if it does not exist yet.

--co, --creation-option <NAME>=<VALUE>

Many formats have one or more optional dataset creation options that can be used to control particulars about the file created. For instance, the GeoPackage driver supports creation options to control the version.

May be repeated.

The dataset creation options available vary by format driver, and some simple formats have no creation options at all. The options supported for a format can be listed with the --formats command line option, but the documentation for the format is the definitive source of information on driver creation options. See Vector drivers format specific documentation for legal creation options for each format.

Note that dataset creation options are different from layer creation options.

--if, --input-format <format>

Format/driver name to be attempted to open the input file(s). It is generally not necessary to specify it, but it can be used to skip automatic driver detection, when it fails to select the appropriate driver. This option can be repeated several times to specify several candidate drivers. Note that it does not force those drivers to open the dataset. In particular, some drivers have requirements on file extensions.

May be repeated.

--input-layer <INPUT-LAYER>

Specifies the name of one or more layers to process. By default, all layers will be processed.

--lco, --layer-creation-option <NAME>=<VALUE>

Many formats have one or more optional layer creation options that can be used to control particulars about the layer created. For instance, the GeoPackage driver supports layer creation options to control the feature identifier or geometry column name, setting the identifier or description, etc.

May be repeated.

The layer creation options available vary by format driver, and some simple formats have no layer creation options at all. The options supported for a format can be listed with the --formats command line option, but the documentation for the format is the definitive source of information on driver creation options. See Vector drivers format specific documentation for legal creation options for each format.

Note that layer creation options are different from dataset creation options.

--oo, --open-option <NAME>=<VALUE>

Dataset open option (format specific).

May be repeated.

-f, --of, --format, --output-format <OUTPUT-FORMAT>

Which output vector format to use. Allowed values may be given by gdal --formats | grep vector | grep rw | sort

--output-open-option, --output-oo <NAME>=<VALUE>

Added in version 3.12.

Dataset open option for output dataset (format specific).

May be repeated.

--overwrite

Allow program to overwrite existing target file or dataset. Otherwise, by default, gdal errors out if the target file or dataset already exists.

--overwrite-layer

Whether overwriting the existing output vector layer is allowed.

--skip-errors

Added in version 3.12.

Whether failures to write feature(s) should be ignored. Note that this option sets the size of the transaction unit to one feature at a time, which may cause severe slowdown when inserting into databases.

--update

Whether to open an existing output dataset in update mode.

--upsert

Added in version 3.12.

Variant of --append where the OGRLayer::UpsertFeature() operation is used to insert or update features instead of appending with OGRLayer::CreateFeature().

This is currently implemented only in a few drivers: GPKG -- GeoPackage vector, Elasticsearch: Geographically Encoded Objects for Elasticsearch and MongoDBv3 (drivers that implement upsert expose the GDAL_DCAP_UPSERT capability).

The upsert operation uses the FID of the input feature, when it is set (and the FID column name is not the empty string), as the key to update existing features. It is crucial to make sure that the FID in the source and target layers are consistent.

For the GPKG driver, it is also possible to upsert features whose FID is unset or non-significant (the --unset-fid option of gdal vector edit can be used to ignore the FID from the source feature), when there is a UNIQUE column that is not the integer primary key.
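The FID-keyed behaviour described above can be pictured with a small Python model. It is only a sketch of the insert-or-replace idea, not the driver implementation: features are plain dicts with a hypothetical "fid" key, the function name upsert is made up for the example, and the UNIQUE-column variant of the GPKG driver is not modelled.

```python
def upsert(target, features):
    """Insert-or-replace `features` into `target`, keyed by FID.

    A feature whose FID already exists in the target replaces the
    existing one; any other feature is added as a new one.
    """
    by_fid = {feature["fid"]: dict(feature) for feature in target}
    for feature in features:
        by_fid[feature["fid"]] = dict(feature)
    return list(by_fid.values())


target = [{"fid": 1, "name": "old"}, {"fid": 2, "name": "keep"}]
incoming = [{"fid": 1, "name": "new"}, {"fid": 3, "name": "added"}]
for feature in upsert(target, incoming):
    print(feature)
```

The model also shows why consistent FIDs matter: if FID 1 designated a different feature in the source than in the target, the existing target feature would be silently replaced.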

Return status code

The program returns status code 0 in case of success, and non-zero in case of error (non-blocking errors emitted as warnings are considered as a successful execution).
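A calling script can rely on this status code. The following Python sketch shows one way to do so, assuming a gdal executable on the PATH. The helper name run_and_check and the file names are hypothetical, and the command is only run when gdal is found.

```python
import shutil
import subprocess
import sys


def run_and_check(command):
    """Run `command` (a list of arguments) and return True on status code 0."""
    completed = subprocess.run(command)
    return completed.returncode == 0


# Hypothetical input and output file names.
command = ["gdal", "vector", "concat", "--mode=single",
           "france.shp", "germany.shp", "merged.gpkg"]

if shutil.which("gdal") is None:
    print("gdal executable not found; command not run:", " ".join(command))
elif run_and_check(command):
    print("concat succeeded")
else:
    print("concat failed", file=sys.stderr)
```

Since warnings do not change the status code, a script that needs to detect them has to inspect the program's messages as well.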

Examples

Example 1: Creating a GeoPackage stacking all input shapefiles in separate layers.

gdal vector concat --mode=stack *.shp out.gpkg

Example 2: Adding a field to indicate the source layer, and reprojecting to a single CRS

Concatenate the content of france.shp and germany.shp into merged.shp, reprojecting them to ETRS89, and add a 'country' field to each feature whose value is 'france' or 'germany' depending on where it comes from:

gdal vector concat --mode=single --source-layer-field-name=country --output-crs=EPSG:4258 france.shp germany.shp merged.shp