geospatial_learn package

Submodules

geospatial_learn.raster module

The raster module.

Description

A series of tools for the manipulation of geospatial imagery/rasters, such as masking or raster-algebra-type functions, and the conversion of Sentinel 2 data to gdal-compatible formats.

raster.array2raster(array, bands, inRaster, outRas, dtype, FMT=None)

Save a raster from a numpy array using the geoinfo from another.

Parameters
  • array (np array) – a numpy array.

  • bands (int) – the no of bands.

  • inRaster (string) – the path of a raster.

  • outRas (string) – the path of the output raster.

  • dtype (int) – a GDAL datatype (see the GDAL website) e.g. gdal.GDT_Int32 - though you need to know what the number represents!

  • FMT (string) – (optional) a GDAL raster format (see the GDAL website) eg Gtiff, HFA, KEA.
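
A minimal usage sketch; the array and file paths here are hypothetical:

    import numpy as np
    from osgeo import gdal
    from geospatial_learn import raster

    # a hypothetical single band array, assumed to match the reference
    # raster's dimensions, written out with its georeferencing
    arr = np.zeros((512, 512), dtype=np.int32)
    raster.array2raster(arr, 1, 'reference.tif', 'output.tif',
                        gdal.GDT_Int32, FMT='Gtiff')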

raster.batch_gdaldem(inlist, prop='aspect')

Batch DEM calculation on a load of gdal files, from some format to tif

Parameters
  • inlist (list) – A list of raster paths

  • prop (string) – one of “hillshade”, “slope”, “aspect”, “color-relief”, “TRI”, “TPI”, “Roughness”

Returns

Return type

List of file paths
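
A minimal sketch, assuming a hypothetical folder of DEM tifs:

    from glob import glob
    from geospatial_learn import raster

    # derive slope from every DEM in the folder
    dems = glob('dems/*.tif')
    outlist = raster.batch_gdaldem(dems, prop='slope')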

raster.batch_translate(folder, wildcard, FMT='Gtiff')

Using the gdal python API, this function translates the format of files to a commonly used format

Parameters
  • folder (string) – the folder containing the rasters to be translated

  • wildcard (string) – the format wildcard to search for e.g. ‘.tif’

  • FMT (string (optional)) – a GDAL raster format (see the GDAL website) eg Gtiff, HFA, KEA

raster.batch_translate_adf(inlist)

Batch translate a load of adf (ArcGIS) files to tif

Parameters

inlist (list) – A list of raster paths

Returns

Return type

List of file paths

raster.batch_wms_download(gdf, wms, layer, img_size, outdir, attribute='id', espg='27700', res=0.25)

Download a load of wms tiles with georeferencing

Parameters
  • gdf (geopandas gdf) – the geodataframe defining the extents of the tiles to download

  • wms (string) – the wms address

  • layer (string) – the wms layer

  • img_size (tuple) – image x,y dims

  • espg (string) – the projection EPSG code

  • outdir (string) – path to the output directory

  • res (float) – per pixel resolution of imagery in metres

raster.bbox2raster(array, bands, bbox, outras, pixel_size=0.25, proj=27700, dtype=5, FMT='Gtiff')

Using a bounding box and other information, georeference an image and write it to disk

Parameters
  • array (np array) – a numpy array.

  • bands (int) – the no of bands.

  • bbox (list or tuple) – xmin, ymin, xmax, ymax

  • pixel_size (float) – pixel size in metres (unless proj is degrees!)

  • outras (string) – the path of the output raster.

  • proj (int) – the EPSG code eg 27700 for osgb

  • dtype (int) – a GDAL datatype (see the GDAL website) e.g. gdal.GDT_Int32 = 5 - though you need to know what the number represents!

  • FMT (string) – (optional) a GDAL raster format (see the GDAL website) eg Gtiff, KEA.

raster.calc_ndvi(inputIm, outputIm, bandsList, blocksize=256, FMT=None, dtype=None)

Create a copy of an image with an ndvi band added

Parameters
  • inputIm (string) – the input image

  • outputIm (string) – the output image

  • bandsList (list) – a list of band indices to be used, eg [3, 4] for Sent2 data

  • FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’

  • blocksize (int) – the chunk of raster read in & write out
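
A minimal sketch; the paths are hypothetical and the band indices follow the Sent2 example above:

    from geospatial_learn import raster

    # bands 3 & 4 as per the Sent2 example above
    raster.calc_ndvi('s2_scene.tif', 's2_scene_ndvi.tif', [3, 4], FMT='Gtiff')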

raster.clip_raster(inRas, inShp, outRas, cutline=True)

Clip a raster

Parameters
  • inRas (string) – the input image

  • inShp (string) – the input polygon file path

  • outRas (string) – the clipped raster

  • cutline (bool (optional)) – retain raster values only inside the polygon
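
A minimal sketch with hypothetical paths:

    from geospatial_learn import raster

    # clip the scene to the polygon, retaining values only inside it
    raster.clip_raster('scene.tif', 'aoi.shp', 'scene_clip.tif', cutline=True)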

raster.color_raster(inRas, color_file, output_file)

Generate a txt colorfile and make an RGB image from a grayscale one

Parameters
  • inRas (string) – Path to input raster (single band greyscale)

  • color_file (string) – Path to output colorfile.txt

  • output_file (string) – Path to the output RGB raster

raster.combine_scene(scl, c_scn, blocksize=256)

Combine another scene classification with the sen2cor one

Parameters
  • scl (string) – the sen2cor one

  • c_scn (string) – the independently derived one - this will be modified

  • blocksize (int) – chunk to process

raster.hist_match(inputImage, templateImage)

Adjust the pixel values of a grayscale image such that its histogram matches that of a target image.

Writes to the inputImage dataset so that it matches

As the entire band histogram is required this can become memory intensive with big rasters eg 10 x 10k+

Inspired by/adapted from something on Stack Exchange image processing - credit to that author

Parameters
  • inputImage (string) – image to transform; the histogram is computed over the flattened array

  • templateImage (string) – template image can have different dimensions to source
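
A minimal sketch with hypothetical paths; note the input raster is modified in place:

    from geospatial_learn import raster

    # adjust image.tif so its histogram matches that of template.tif
    raster.hist_match('image.tif', 'template.tif')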

raster.mask_raster(inputIm, mval, overwrite=True, outputIm=None, blocksize=None, FMT=None)

Perform a numpy masking operation on a raster where all values corresponding to the mask value are retained - does this in blocks for efficiency on larger rasters

Parameters
  • inputIm (string) – the input raster

  • mval (int) – the mask value eg 1, 2 etc

  • FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’

  • outputIm (string (optional)) – optionally write a separate output image, if None, will mask the input

  • blocksize (int) – the chunk of raster to read in

Returns

A string of the output file path

Return type

string
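
A minimal sketch with hypothetical paths; treating overwrite=False plus outputIm as the way to write a separate file is an assumption from the signature:

    from geospatial_learn import raster

    # retain only the pixels equal to 1, writing the result to a new file
    out = raster.mask_raster('classmap.tif', 1, overwrite=False,
                             outputIm='classmap_masked.tif')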

raster.mask_raster_multi(inputIm, mval=1, rule='==', outval=None, mask=None, blocksize=256, FMT=None, dtype=None)

Perform a numpy masking operation on a raster where all values equal to, less than or greater than the mask value are retained - does this in blocks for efficiency on larger rasters

Parameters
  • inputIm (string) – the input raster

  • mval (int) – the masking value that delineates pixels to be kept

  • rule (string) – the logic for masking either ‘==’, ‘<’ or ‘>’

  • outval (numerical dtype eg int, float) – the areas removed will be written to this value default is 0

  • mask (string) – the mask raster to be used (optional)

  • FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’

  • dtype (int) – a gdal datatype (optional)

  • blocksize (int) – the chunk of raster read in & write out

raster.mask_with_poly(vector_path, raster_path, value=0)

Change raster values inside a polygon and update the raster

Parameters
  • vector_path (string) – input shapefile

  • raster_path (string) – input raster

  • value (int) – the value to write inside the polygon(s)

raster.multi_temp_filter(inRas, outRas, bands=None, windowSize=None)

The multi temp filter for SAR data as outlined & published by Quegan et al

This is only suitable for small images, as it holds intermediate data in memory

Parameters
  • inRas (string) – the input raster

  • outRas (string) – the output raster

  • bands (list) – the list of bands to process (optional)

  • windowSize (int) – the filter window size

raster.polygonize(inRas, outPoly, outField=None, mask=True, band=1, filetype='ESRI Shapefile')

Polygonise a raster

Parameters
  • inRas (string) – the input image

  • outPoly (string) – the output polygon file path

  • outField (string (optional)) – the name of the field containing burned values

  • mask (bool (optional)) – use the input raster as a mask

  • band (int) – the input raster band

raster.raster2array(inRas, bands=[1])

Read a raster and return an array, either single or multiband

Parameters
  • inRas (string) – input raster

  • bands (list) – a list of bands to return in the array

raster.rasterize(inShp, inRas, outRas, field=None, fmt='Gtiff')

Rasterize a polygon to the extent & geo transform of another raster

Parameters
  • inShp (string) – the input polygon file path

  • inRas (string) – the input image providing the extent & geo transform

  • outRas (string) – the output raster file path

  • field (string (optional)) – the name of the field containing burned values, if none will be 1s

  • fmt (string) – the gdal image format
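
A minimal sketch; the paths and the ‘class_id’ field are hypothetical:

    from geospatial_learn import raster

    # burn the polygon attribute into the grid of scene.tif
    raster.rasterize('polys.shp', 'scene.tif', 'burned.tif', field='class_id')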

raster.remove_cloud_S2(inputIm, sceneIm, blocksize=256, FMT=None, min_size=4, dist=1)

Remove cloud using a scene classification

This saves back to the input raster by default

Parameters
  • inputIm (string) – the input image

  • sceneIm (string) – the scenemap to use as a mask for removing cloud. It is assumed the scene map consists of 1 shadow, 2 cloud, 3 land, 4 water

  • FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’

  • min_size (int) – the minimum size in pixels of cloud mask objects to retain

  • blocksize (int) – the square chunk processed at any one time

raster.remove_cloud_S2_stk(inputIm, sceneIm1, sceneIm2=None, baseIm=None, blocksize=256, FMT=None, max_size=10, dist=1)

Remove cloud using a classification where cloud == 1

Esoteric - from the Forest Sentinel project, but retained here

Parameters
  • inputIm (string) – the input image

  • sceneIm1, 2 (string) – the classification rasters used to mask out the cloud-affected areas

  • baseIm (string) – Another multiband raster of same size extent as the inputIm where the baseIm image values are used rather than simply converting to zero (in the use case of 2 sceneIm classifications)

Notes

Useful if you have a base image which is a cloudless composite, which you intend to replace with the current image for the next round of classification/change detection

raster.rgb_ind(inputIm, outputIm, blocksize=256, FMT=None, dtype=5)

Create a copy of an image with rgb indices added

Parameters
  • inputIm (string) – the input rgb image

  • outputIm (string) – the output image

  • FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’

  • blocksize (int) – the chunk of raster read in & write out

raster.srtm_gdaldem(inlist, prop='aspect')

Batch DEM calculation on a load of srtm files

SRTM scale & z factor vary across the globe, so this calculates them based on latitude

Parameters
  • inlist (list) – A list of raster paths

  • prop (string) – one of “hillshade”, “slope”, “aspect”, “color-relief”, “TRI”, “TPI”, “Roughness”

Returns

Return type

List of file paths

raster.stack_ras(rasterList, outFile)

Stack some rasters

Parameters
  • rasterList (list) – the list of input rasters

  • outFile (string) – the output file path including file extension

raster.stat_comp(inRas, outMap, bandList=None, stat='percentile', q=95, blocksize=256, FMT=None, dtype=6)

Calculate a depth-wise stat on a multi band raster with selected or all bands

Parameters
  • inRas (string) – input Raster

  • outMap (string) – the output raster calculated

  • stat (string) – the statistic to be calculated; make sure there are no nans as nan percentile is far too slow

  • blocksize (int) – the chunk processed

  • q (int) – the ith percentile if percentile is the stat used

  • FMT (string) – gdal compatible (optional) defaults is tif

  • dtype (int) – gdal datatype (default 6, gdal.GDT_Float32)

raster.temporal_comp(fileList, outMap, stat='percentile', q=95, folder=None, blocksize=256, FMT=None, dtype=5)

Calculate an image based on a time series collection of imagery (eg a year's worth of S2 data)

Parameters
  • fileList (list of strings) – the files to be input, if None a folder must be specified

  • outMap (string) – the output raster calculated

  • stat (string) – the statistic to be calculated

  • blocksize (int) – the chunk processed

  • q (int) – the ith percentile if percentile is the stat used

  • FMT (string) – gdal compatible (optional) defaults is tif

  • dtype (int) – gdal datatype (default 5, gdal.GDT_Int32)

raster.tile_rasters(inRas, outDir, tilesize=['256', '256'], overlap='0')

Split a large raster into smaller ones

Parameters
  • inRas (string) – the path to input raster

  • outDir (string) – the path to the output dir

  • tilesize (list of str) – the sides of a square tile [“256”, “256”]

  • overlap (string) – the overlap in pixels between tiles, if required

raster.wmsGrabber(bbox, image_size, wms, layer, outfile, espg='27700', res=0.25)

Return a wms tile from a given source and optionally write to disk with georef

Parameters
  • bbox (list or tuple) – xmin, ymin, xmax, ymax

  • image_size (tuple) – image x,y dims

  • wms (string) – the wms address

  • layer (string) – the wms (sub)layer

  • espg (string) – the projection EPSG code

  • outfile (string) – path to outfile, if None only array is returned

raster.write_vrt(infiles, outfile)
Parameters
  • infiles (list of strings) – the input rasters

  • outfile (string) – the output .vrt
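
A minimal sketch, assuming a hypothetical folder of tiles:

    from glob import glob
    from geospatial_learn import raster

    tiles = glob('tiles/*.tif')
    raster.write_vrt(tiles, 'mosaic.vrt')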

geospatial_learn.learning module

The learning module.

Description

The learning module's set of functions provides a framework to optimise and classify EO data for both per-pixel and object-based properties

learning.RF_oob_opt(model, X_train, min_est, max_est, step, regress=False)

This function uses the oob score to find the best parameters.

This cannot be parallelized due to the warm start bootstrapping, so is potentially slower than the other cross val in the create_model function

This function is based on an example from the sklearn site

This function plots a graph displaying the oob rate

Parameters
  • model (string (.gz)) – path to model to be saved

  • X_train (np array) – numpy array of training data where the 1st column is labels

  • min_est (int) – min no of trees

  • max_est (int) – max no of trees

  • step (int) – the step at which no of trees is increased

  • regress (bool) – boolean where if True it is a regressor

Returns

error rate, best estimator

Return type

tuple of np arrays

learning.classify_object(model, inShape, attributes, field_name=None, write='gpd')

Classify a polygon/point file attributes (‘object based’) using an sklearn model

Parameters
  • model (string) – path to input model

  • inShape (string) – input shapefile path (must be .shp for now….)

  • attributes (list of strings) – list of attribute names

  • field_name (string) – name of classified label field (optional)

  • write (string) – either gpd(geopandas) or ogr
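
A minimal sketch; the model path, shapefile and attribute names are hypothetical (the attributes would typically have been written beforehand, e.g. by shape.zonal_stats):

    from geospatial_learn import learning

    attrs = ['meanblue', 'meangreen', 'meanred']  # hypothetical field names
    learning.classify_object('model.gz', 'segments.shp', attrs,
                             field_name='label', write='gpd')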

learning.classify_pixel(model, inputDir, bands, outMap, probMap)

A function to classify an image using a pre-saved model - assumes a folder of tiled rasters for memory management - classify_pixel_bloc is recommended instead of this function

Parameters

  • model (sklearn model) – a path to a scikit learn model that has been saved

  • inputDir (string) – a folder with images to be classified

  • bands (int) – the no of image bands eg 8

  • outMap (string) – path to output image excluding the file format ‘pathto/mymap’

  • probMap (string) – path to output prob image excluding the file format ‘pathto/mymap’

  • FMT (string) – optional parameter - gdal readable fmt

learning.classify_pixel_bloc(model, inputImage, outMap, bands=[1, 2, 3], blocksize=None, FMT=None, ndvi=None, dtype=5)

A block processing classifier for large rasters, supports KEA, HFA, & Gtiff formats. KEA is recommended, Gtiff is the default

Parameters
  • model (sklearn model / keras model) – a path to a model that has been saved

  • inputImage (string) – path to image including the file fmt ‘Myimage.tif’

  • bands (list of ints) – list of band indices to be used eg [1,2,3]

  • outMap (string) – path to output image excluding the file format ‘pathto/mymap’

  • FMT (string) – optional parameter - gdal readable fmt

  • blocksize (int (optional)) – size of raster chunk in pixels; 256 tends to be quickest; if you put None it will read the size from gdal (this doesn’t always pay off!)

  • dtype (int (optional - gdal syntax gdal.GDT_Int32)) – a gdal datatype - default is int32

Notes

Block processing is sequential, but quite a few sklearn models are parallel so that has been prioritised rather than raster IO
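
A minimal sketch with hypothetical paths; note outMap excludes the file extension per the parameter description:

    from geospatial_learn import learning

    learning.classify_pixel_bloc('model.gz', 'scene.tif', 'outputs/classmap',
                                 bands=[1, 2, 3], blocksize=256)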

learning.classify_ply(incld, inModel, train_field='training', class_field='label', rgb=True, outcld=None, ignore=['x', 'y', 'scalar_ScanAngleRank', 'scalar_NumberOfReturns', 'scalar_ReturnNumber', 'scalar_GpsTime', 'scalar_PointSourceId'])

Classify a point cloud (ply format)

Parameters
  • incld (string) – the input point cloud

  • class_field (string) – the name of the field that the results will be written to; this must already exist! Create in CloudCompare or cgal

  • train_field (string) – the name of the training label field so it can be ignored

  • rgb (bool) – whether there is rgb data to be included

  • outcld (string) – path to a new ply to write if not writing to the input one

  • ignore (list) – the pointcloud attributes to ignore for classification

learning.create_model(X_train, outModel, clf='erf', group=None, random=False, cv=5, cores=-1, strat=True, test_size=0.3, regress=False, params=None, scoring=None, class_names=None, save=True)

Brute force or random model creation using scikit learn. Either use the default params in this function or enter your own (recommended - see sklearn)

Parameters
  • X_train (np array) – numpy array of training data where the 1st column is labels. If the groupkfold is used, the last column will be the group labels

  • outModel (string) – the output model path which is a gz file, if using keras it is h5

  • params (dict) – a dict of model params (see scikit learn)

  • clf (string) – an sklearn or xgb classifier/regressor: logit, sgd, linsvc, svc, svm, nusvm, erf, rf, gb, xgb

  • group (np.array) – array of group labels for train/test split and grid search, useful to avoid autocorrelation

  • random (bool) – if True, a random param search

  • cv (int) – no of folds

  • cores (int or -1 (default)) – the no of parallel jobs

  • strat (bool) – a stratified grid search

  • test_size (float) – percentage to hold out to test

  • regress (bool) – a regression model if True, a classifier if False

  • scoring (string) – a suitable sklearn scoring type (see notes)

  • class_names (list of strings) – class names in order of their numerical equivalents

Returns

  • A list of

  • [grid.best_estimator_, grid.cv_results_, grid.best_score_, grid.best_params_, classification_report]

Scoring types - there are a lot - some of which won’t work for multi-class, regression etc - see the sklearn docs!

‘accuracy’, ‘adjusted_rand_score’, ‘average_precision’, ‘f1’, ‘f1_macro’, ‘f1_micro’, ‘f1_samples’, ‘f1_weighted’, ‘neg_log_loss’, ‘neg_mean_absolute_error’, ‘neg_mean_squared_error’, ‘neg_median_absolute_error’, ‘precision’, ‘precision_macro’, ‘precision_micro’, ‘precision_samples’, ‘precision_weighted’, ‘r2’, ‘recall’, ‘recall_macro’, ‘recall_micro’, ‘recall_samples’, ‘recall_weighted’, ‘roc_auc’
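
A minimal sketch; the training file is hypothetical and is assumed to have been saved via joblib (e.g. by get_training), with the labels in the 1st column:

    import joblib
    from geospatial_learn import learning

    X_train = joblib.load('training.gz')
    results = learning.create_model(X_train, 'model.gz', clf='rf',
                                    cv=5, scoring='f1_macro')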

learning.create_model_autosk(X_train, outModel, cores=-1, class_names=None, incld_est=None, excld_est=None, incld_prep=None, excld_prep=None, total_time=120, res_args={'cv': 5}, mem_limit=None, per_run=None, test_size=0.3, wrkfolder=None, scoring=None, save=True, ply=False)

Auto-sklearn to create a model

Parameters
  • X_train (np array) – numpy array of training data where the 1st column is labels

  • outModel (string) – the output model path which is a gz file, if using keras it is h5

  • cores (int or -1 (default)) – the no of parallel jobs

  • class_names (list of strings) – class names in order of their numerical equivalents

  • incld_est (list of strings) – estimators to include eg [‘random_forest’]

  • excld_est (list of strings) – estimators to exclude eg [‘random_forest’]

  • incld_prep (list of strings) – preproc to include

  • excld_prep (list of strings) – preproc to exclude

  • total_time (int) – time in seconds for the whole search process

  • res_args (dict) – strategy for overfit avoidance e.g. {‘cv’:5}

  • mem_limit (int) – memory limit per job

  • per_run (int) – time limit per run

  • test_size (float) – percentage to hold out to test

  • wrkfolder (string) – path to dir for intermediate working

  • scoring (string) – a suitable sklearn scoring type (see notes)

Returns

  • A list of

  • [model, classif_report]

learning.create_model_tpot(X_train, outModel, gen=5, popsize=50, group=None, cv=5, cores=-1, dask=False, test_size=0.2, regress=False, params=None, scoring=None, verbosity=2, warm_start=False)

Create a model using the tpot library, where genetic algorithms are used to optimise the pipeline and params.

Parameters
  • X_train (np array) – numpy array of training data where the 1st column is labels

  • outModel (string) – the output model path (which is a .py file) from which to run the pipeline

  • cv (int) – no of folds

  • cores (int or -1 (default)) – the no of parallel jobs

  • regress (bool) – a regression model if True, a classifier if False

  • test_size (float) – size of test set held out

  • params (a dict of model params (see tpot)) – enter your own params dict rather than the range provided e.g.

        {'sklearn.ensemble.RandomForestClassifier':
             {'n_estimators': [200],
              'max_features': ['sqrt', 'log2'],
              'max_depth': [10, None]},
         'xgboost.sklearn.XGBClassifier':
             {'n_estimators': [200],
              'learning_rate': [0.1, 0.2, 0.4]}}

  • scoring (string) – a suitable sklearn scoring type (see notes)

  • warm_start (bool) – use the previous population, useful if interactive

learning.get_polars(inShp, polars=['VV', 'VH'])

Get list of fields containing polarisations from a polygon/point file

Parameters
  • inShp (string) – the input polygon

  • polars (list of strings) – the attributes headed with polarisations eg ‘VV’

learning.get_training(inShape, inRas, bands, field, outFile=None)

Collect training as an np array for use with create model function

Parameters
  • inShape (string) – the input shapefile - must be esri .shp at present

  • inRas (string) – the input raster from which the training is extracted

  • bands (int) – no of bands

  • field (string) – the attribute field containing the training labels

  • outFile (string (optional)) – path to the training file saved as joblib format (eg - ‘training.gz’)

Returns

  • A tuple containing

  • -np array of training data

  • -list of polygons with invalid geometry that were not collected
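
A minimal sketch; the paths and the ‘class_id’ field are hypothetical:

    from geospatial_learn import learning

    train, rejects = learning.get_training('train_polys.shp', 'scene.tif',
                                           8, 'class_id',
                                           outFile='training.gz')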

learning.get_training_ply(incld, label_field='training', classif_field='label', rgb=True, outFile=None, ignore=['x', 'y', 'scalar_ScanAngleRank', 'scalar_NumberOfReturns', 'scalar_ReturnNumber', 'scalar_GpsTime', 'scalar_PointSourceId'])

Get training from a point cloud

Parameters
  • incld (string) – the input point cloud

  • label_field (string) – the name of the field representing the training points which must be positive integers

  • classif_field (string) – the name of the field that will be used for classification later; must be specified so it can be ignored

  • rgb (bool) – whether there is rgb data to be included

  • outFile (string) – path to training array to be saved as .gz via joblib

  • ignore (list) – the pointcloud attributes to ignore for training

Returns

  • np array of training where first column is labels

  • list of feature names for later ref/plotting

learning.get_training_shp(inShape, label_field, feat_fields, outFile=None)

Collect training from a shapefile attribute table. Used for object-based classification (typically).

Parameters
  • inShape (string) – the input polygon

  • label_field (string) – the field name for the class labels

  • feat_fields (list) – the field names of the feature data

  • outFile (string (optional)) – path to training data to be saved (.gz)

Returns

  • training data as a dataframe, first column is labels, rest are features

  • list of reject features

learning.plot_feature_importances(modelPth, featureNames, model_type='scikit')

Plot the feature importances of an ensemble classifier

Parameters
  • modelPth (string) – A sklearn model path

  • featureNames (list of strings) – a list of feature names

learning.ply_features(incld, outcld=None, k=[50, 100, 200], props=['anisotropy', 'curvature', 'eigenentropy', 'eigen_sum', 'linearity', 'omnivariance', 'planarity', 'sphericity'], nrm_props=None)

Calculate point cloud features and write to file

Currently memory intensive due to using pyntcloud….

Parameters
  • incld (string) – the input point cloud

  • outcld (string) – the output point cloud

  • k (list) – the no of neighbors to use when calculating the props; multiple values are more effective

  • props (list) – the properties you wish to include

  • nrm_props (list) – properties based on normals if they exist (this will fail if they don’t) e.g. [“inclination_radians”, “orientation_radians”]

learning.prob_pixel_bloc(model, inputImage, bands, outMap, classes, blocksize=None, FMT=None, one_class=None)

A block processing classifier for large rasters that produces a probability output.

Supports KEA, HFA, & Gtiff formats -KEA is recommended, Gtiff is the default

Parameters
  • model (string) – a path to a scikit learn model that has been saved

  • inputImage (string) – path to image including the file fmt ‘Myimage.tif’

  • bands (int) – the no of image bands eg 8

  • outMap (string) – path to output image excluding the file format ‘pathto/mymap’

  • classes (int) – no of classes

  • blocksize (int (optional)) – size of raster chunk; 256 tends to be quickest; if you put None it will read the size from gdal (this doesn’t always pay off!)

  • FMT (string) – optional parameter - gdal readable fmt eg ‘Gtiff’

  • one_class (int) – choose a single class to produce output prob raster

Notes

Block processing is sequential, but quite a few sklearn models are parallel so that has been prioritised rather than raster IO

learning.regression_results(y_true, y_pred)
learning.rmse_vector_lyr(inShape, attributes)

Using sklearn get the rmse of 2 vector attributes (the actual and predicted of course in the order [‘actual’, ‘pred’])

Parameters
  • inShape (string) – the input vector of OGR type

  • attributes (list) – a list of strings denoting the attributes

geospatial_learn.shape module

The shape module.

Description

This module contains various functions for the writing of data in OGR vector formats. The functions are mainly concerned with writing geometric or pixel based attributes, with the view to them being classified in the learning module

shape.buffer(inShp, outfile, dist)

Buffer a shapefile by a given distance outputting a new one

Parameters
  • inShp (string) – input shapefile

  • outfile (string) – output shapefile

  • dist (float) – the distance in map units to buffer

shape.create_ogr_poly(outfile, spref, file_type='ESRI Shapefile', field='id', field_dtype=0)

Create an ogr dataset and layer (convenience)

Parameters
  • outfile (string) – path to ogr file

  • spref (wkt or int) – spatial reference, either a WKT string or an EPSG code

  • file_type (string) – ogr file designation

  • field (string) – attribute field e.g. “id”

  • field_dtype (int or ogr.OFT…) – ogr dtype of field e.g. 0 == ogr.OFTInteger

shape.extent2poly(infile, filetype='raster', outfile=None, polytype='ESRI Shapefile', geecoord=False, lyrtype='ogr')

Get the coordinates of a file’s extent and return an ogr polygon ring, with the option to save the file

Parameters
  • infile (string) – input ogr compatible geometry file or gdal raster

  • filetype (string) – whether the input file is a ‘raster’ or ‘vector’

  • outfile (string) – the path of the output file, if not specified, it will be input file with ‘extent’ added on before the file type

  • polytype (string) – ogr compatible file type (see gdal/ogr docs) default ‘ESRI Shapefile’ ensure your outfile string has the equiv. e.g. ‘.shp’ or in case of memory only ‘Memory’ (outfile would be None in that case)

  • geecoord (bool) – optionally convert to WGS84 lat,lon

  • lyrtype (string) – either ‘gee’ which means earth engine or ‘ogr’ which returns ds and lyr

Returns

Return type

a GEE polygon geometry or ogr dataset and layer
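
A minimal sketch with hypothetical paths, returning an ogr dataset and layer:

    from geospatial_learn import shape

    ds, lyr = shape.extent2poly('scene.tif', filetype='raster',
                                outfile='scene_extent.shp', lyrtype='ogr')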

shape.filter_shp(inShp, expression, outField, outLabel)

Filter and index the features of an OGR polygon file by attribute

Potentially useful for rule sets or prepping a subsidiary underlying raster operation

Parameters
  • inShp (string) – input shapefile

  • expression (string) – expression e.g. “DN >= 168”

  • outField (string) – the field in which the label will reside

  • outLabel (int) – the label identifying the filtered features

shape.geom2pixelbbox(inshp, inras, label='Tree', outfile=None)

Convert shapefile geometries to a df of pixel bounding boxes. Projections must be the same!

Parameters
  • inshp (string) – input ogr compatible geometry

  • inras (string) – input raster

  • label (string) – label name def. ‘Tree’

  • outfile (string) – path to save annotation csv

shape.mesh_from_raster(inras, outshp=None, band=1)
shape.meshgrid(inRaster, outShp, gridHeight=1, gridWidth=1)
shape.ms_snake(inShp, inRas, outShp, band=2, buf1=0, buf2=0, algo='ACWE', nodata_value=0, iterations=200, smoothing=1, lambda1=1, lambda2=1, threshold='auto', balloon=-1)

Deform a polygon using active contours on the values of an underlying raster.

This uses morphsnakes and explanations are from there.

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • outShp (string) – output shapefile

  • band (int) – an integer val eg - 2

  • algo (string) – either “GAC” (geodesic active contours) or the default “ACWE” (active contours without edges)

  • buf1 (int) – the buffer if any in map units for the bounding box of the poly which extracts underlying pixel values.

  • buf2 (int) – the buffer if any in map units for the expansion or contraction of the poly which will initialise the active contour. This is here as you may wish to adjust the init polygon so it does not converge on an adjacent one or undesired area.

  • nodata_value (numerical) – If used the no data val of the raster

  • iterations (uint) – Number of iterations to run.

  • smoothing (uint, optional) – Number of times the smoothing operator is applied per iteration. Reasonable values are around 1-4. Larger values lead to smoother segmentations.

  • lambda1 (float, optional) – Weight parameter for the outer region. If lambda1 is larger than lambda2, the outer region will contain a larger range of values than the inner region.

  • lambda2 (float, optional) – Weight parameter for the inner region. If lambda2 is larger than lambda1, the inner region will contain a larger range of values than the outer region.

  • threshold (float, optional) – Areas of the image with a value smaller than this threshold will be considered borders. The evolution of the contour will stop in these areas.

  • balloon (float, optional) – Balloon force to guide the contour in non-informative areas of the image, i.e., areas where the gradient of the image is too small to push the contour towards a border. A negative value will shrink the contour, while a positive value will expand the contour in these areas. Setting this to zero will disable the balloon force.

shape.poly2dictlist(inShp)

Convert an ogr file to a list of json-like dicts

shape.rasterext2poly(inras)
shape.shape_props(inShape, prop, inRas=None, label_field='ID')

Calculate various geometric properties of a set of polygons. Output will be relative to geographic units where relevant, but normalised where not (eg Eccentricity)

Parameters
  • inShape (string) – input shape file path

  • inRas (string) – a raster to get the correct dimensions from (optional), required for scikit-image props

  • prop (string) – Scikit image regionprops prop (see http://scikit-image.org/docs/dev/api/skimage.measure.html)

OGR is used to generate most of these as it is faster, but the string keys are the same as scikit-image; see notes for which require a raster

Notes

Only shape file needed (OGR / shapely / numpy based):

‘MajorAxisLength’, ‘MinorAxisLength’, ‘Area’, ‘Eccentricity’, ‘Solidity’, ‘Extent’, ‘Perimeter’

Raster required:

‘Orientation’ and the remainder of props calculable with scikit-image. These process a bit slower than the above ones

shape.shp2gj(inShape, outJson)

Converts a shapefile to a geojson/json

Parameters
  • inShape (string) – input shapefile

  • outJson (string) – output geojson

Notes

Credit to the person who posted this on the pyshp site

shape.snake(inShp, inRas, outShp, band=1, buf=1, nodata_value=0, boundary='fixed', alpha=0.1, beta=30.0, w_line=0, w_edge=0, gamma=0.01, max_iterations=2500, smooth=True, eq=False, rgb=False)

Deform a line using active contours based on the values of an underlying raster - based on skimage at present so not quick!

Notes

Param explanations for snake/active contour from scikit-image api

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • band (int) – an integer val eg - 2

  • buf (int) – the buffer area to include for the snake deformation

  • alpha (float) – Snake length shape parameter. Higher values makes snake contract faster.

  • beta (float) – Snake smoothness shape parameter. Higher values makes snake smoother.

  • w_line (float) – Controls attraction to brightness. Use negative values to attract toward dark regions.

  • w_edge (float) – Controls attraction to edges. Use negative values to repel snake from edges.

  • gamma (float) – Explicit time stepping parameter.

  • max_iterations (int) – No of iterations to evolve snake

  • boundary (string) – Scikit-image text: Boundary conditions for the contour. Can be one of ‘periodic’, ‘free’, ‘fixed’, ‘free-fixed’, or ‘fixed-free’. ‘periodic’ attaches the two ends of the snake, ‘fixed’ holds the end-points in place, and ‘free’ allows free movement of the ends. ‘fixed’ and ‘free’ can be combined by parsing ‘fixed-free’, ‘free-fixed’. Parsing ‘fixed-fixed’ or ‘free-free’ yields same behaviour as ‘fixed’ and ‘free’, respectively.

  • nodata_value (numerical) – If used the no data val of the raster

  • rgb (bool) – read in bands 1-3 assuming them to be RGB

shape.sqlfilter(inShp, sql)

Return an OGR layer via sql statement for some further analysis

See https://gdal.org/user/ogr_sql_dialect.html for examples

Notes

An OS Master map example

“SELECT * FROM TopographicArea WHERE DescriptiveGroup=’General Surface’”

Parameters
  • inShp (string) – input shapefile

  • sql (string) – sql expression (ogr dialect)

Returns

Return type

ogr lyr
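
A minimal sketch; the layer name in the statement is hypothetical (for a shapefile it is usually the file’s basename):

    from geospatial_learn import shape

    lyr = shape.sqlfilter('topo.shp', "SELECT * FROM topo WHERE DN >= 168")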

shape.texture_stats(inShp, inRas, band, gprop='contrast', offset=2, angle=0, write_stat=None, nodata_value=0, mean=False)

Calculate and optionally write texture stats for an OGR compatible polygon based on underlying raster values

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster path

  • gprop (string) – a skimage glcm property: entropy, contrast, dissimilarity, homogeneity, ASM, energy, correlation

  • offset (int) – distance in pixels to measure - minimum of 2!!!

  • angle (int) – angle in degrees from pixel (int)

  • mean (bool) – take the mean of all offsets

  • Important to note that the results will be unreliable for glcm texture features if seg is true, as non-masked values will be zero or some weird no data and will affect results

Notes

Important

The texture of the bounding box is at present the “reliable” measure

Using the segment only results in potentially spurious results due to the scikit-image algorithm measuring texture over zero/nodata to number pixels (e.g 0>54). The segment part will be developed in due course to overcome this issue

shape.thresh_seg(inShp, inRas, outShp, band, buf=0, algo='otsu', min_area=4, nodata_value=0)

Use an image processing technique to threshold foreground and background in a polygon segment.

The default is otsu’s method.

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • band (int) – an integer val eg - 2

  • algo (string) – ‘otsu’, ‘niblack’ or ‘sauvola’

  • nodata_value (numerical) – If used the no data val of the raster

shape.write_id_field(inShape, fieldName='id')

Write an id field to an ogr vector file

Parameters
  • inShape (string) – input OGR vector file

  • fieldName (string) – name of field being written

shape.write_text_field(inShape, fieldName, attribute)

Write a string to an ogr vector file

Parameters
  • inShape (string) – input OGR vector file

  • fieldName (string) – name of field being written

  • attribute (string) – ‘text to enter in each entry of column’

shape.zonal_point(inShp, inRas, field, band=1, nodata_value=0, write_stat=True)

Get the pixel val at a given point and write to vector

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • field (string) – the name of the field

  • band (int) – an integer val eg - 2

  • nodata_value (numerical) – If used the no data val of the raster

shape.zonal_rgb_idx(inShp, inRas, nodata_value=0)

Calculate RGB-based indices per segment/AOI

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • nodata_value (numerical) – If used the no data val of the raster

shape.zonal_stats(inShp, inRas, band, bandname, layer=None, stat='mean', write_stat=True, nodata_value=0, all_touched=True, expression=None)

Calculate zonal stats for an OGR polygon file

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • band (int) – an integer val eg - 2

  • bandname (string) – eg - blue

  • layer (string) – if using a db type format with multi layers, specify the name of the layer in question

  • stat (string) – string of a stat to calculate, if omitted it will be ‘mean’; others: ‘mode’, ‘min’, ‘mean’, ‘max’, ‘std’, ‘sum’, ‘count’, ‘var’, ‘skew’, ‘kurt(osis)’, ‘vol’

  • write_stat (bool (optional)) – If True, stat will be written to OGR file, if false, dataframe only returned (bool)

  • nodata_value (numerical) – If used the no data val of the raster

  • all_touched (bool) – whether to use all touched when rasterising the polygon; if the poly is smaller/comparable to the pixel size, True is perhaps the best option

  • expression (string) – process a selection only eg expression e.g. “DN >= 168”
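
A minimal sketch with hypothetical paths:

    from geospatial_learn import shape

    # write the per-polygon mean of band 1 to the shapefile and
    # return it as a dataframe
    df = shape.zonal_stats('segments.shp', 'scene.tif', 1, 'blue',
                           stat='mean', write_stat=True)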

shape.zonal_stats_all(inShp, inRas, bandnames, statList=['mean', 'min', 'max', 'median', 'std', 'var', 'skew', 'kurt'])

Calculate zonal stats for an OGR polygon file

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • bandnames (list) – eg - [‘b’,’g’,’r’,’nir’]

  • statList (list) – the list of stats to calculate

geospatial_learn.utilities module

Created on Thu Sep 8 22:35:39 2016 @author: Ciaran Robb

The utilities module - things here don’t have an exact theme or home yet so may eventually move elsewhere

If you use code to publish work cite/acknowledge me and authors of libs etc as appropriate

utilities.apply_lut(src, lut)
utilities.colorscale(seg, prop='Area', custom=None)

Colour an array according to a region prop value

Parameters
  • seg (np.array) – input array of labelled image

  • prop (string) – a scikit-image region prop

  • custom (list) – a custom list of values to apply to array

Returns

Return type

np array of attributed regions

utilities.combine_grid(inRas1, inRas2, outRas, outShp, min_area=None)
utilities.do_phasecong(tempIm, low_t=0, hi_t=0, norient=6, nscale=6, sigma=2)

Process phase congruency on an image

utilities.fixply(incloud, outcloud, field='scalar_label')

Fix a ply file for use in cgal after cloudcompare

Parameters
  • incloud (string) – path to an input ply

  • outcloud (string) – path to the output ply

  • field (string) – the scalar field to alter

utilities.houghseg(inRas, outShp, edge='canny', sigma=2, thresh=0, ratio=2, n_orient=6, n_scale=5, hArray=True, vArray=True, valrange=1, interval=10, band=2, min_area=None)

Detect and write Hough lines to a line shapefile and create rectangular segments from them

Implemented for the paper Robb et al. (2020), Semi-automated field plot segmentation from UAS imagery for experimental agriculture, Frontiers in Plant Science

Parameters
  • inRas (string) – path to an input raster from which the geo-reffing is obtained

  • outShp (string) – path to the output shapefile

  • edge (string) – edge method ‘canny’ or ‘ph’

  • sigma (float) – scalar value for gaussian smoothing

  • thresh (int/float) – the high hysteresis threshold

  • band (int) – the image band

  • hArray (bool) – axis 1 of the image

  • vArray (bool) – axis 2 of the image

  • min_area (float) – the minimum area of segment to retain

utilities.image_thresh(image)
utilities.imangle(im)

Determine the orientation of non-zero vals in an image

Parameters

im (np array) – input image

Returns

axes – orientations of each side and binary array

Return type

tuple

utilities.iter_ransac(image, sigma=3, no_iter=10, order='col', mxt=2500)
utilities.min_bound_rectangle(points)

Find the smallest bounding rectangle for a set of points. Returns a set of points representing the corners of the bounding box.

Parameters

points (list) – An nx2 iterable of points

Returns

an nx2 list of coordinates

Return type

list

utilities.ms_toposeg(inRas, outShp, iterations=100, algo='ACWE', band=2, dist=30, se=3, usemin=False, imtype=None, useedge=True, burnedge=False, merge=False, close=True, sigma=4, hi_t=None, low_t=None, init=4, smooth=1, lambda1=1, lambda2=1, threshold='auto', balloon=1)

Topology preserving segmentation, implemented in python/numpy, inspired by ms_topo and morphsnakes

This uses morphsnakes level sets to make the segments and param explanations are mainly from there.

Parameters
  • inRas (string) – input raster whose pixel vals will be used

  • outShp (string) – the output shapefile

  • band (int) – an integer val eg - 2

  • algo (string) – either “GAC” (geodesic active contours) or “ACWE” (active contours without edges)

  • sigma (float) – the size of stdv defining the gaussian envelope if using canny edge; a unitless value

  • iterations (uint) – Number of iterations to run.

  • smooth (uint, optional) – Number of times the smoothing operator is applied per iteration. Reasonable values are around 1-4. Larger values lead to smoother segmentations.

  • lambda1 (float, optional) – Weight parameter for the outer region. If lambda1 is larger than lambda2, the outer region will contain a larger range of values than the inner region.

  • lambda2 (float, optional) – Weight parameter for the inner region. If lambda2 is larger than lambda1, the inner region will contain a larger range of values than the outer region.

  • threshold (float, optional) – Areas of the image with a value smaller than this threshold will be considered borders. The evolution of the contour will stop in these areas.

  • balloon (float, optional) – Balloon force to guide the contour in non-informative areas of the image, i.e., areas where the gradient of the image is too small to push the contour towards a border. A negative value will shrink the contour, while a positive value will expand the contour in these areas. Setting this to zero will disable the balloon force.

utilities.ms_toposnakes(inSeg, inRas, outShp, iterations=100, algo='ACWE', band=2, sigma=4, alpha=100, smooth=1, lambda1=1, lambda2=1, threshold='auto', balloon=-1)

Topology preserving morphsnakes, implemented in python/numpy exclusively by C.Robb

This uses morphsnakes and explanations are from there.

Parameters
  • inSeg (string) – input segmentation raster

  • raster_path (string) – input raster whose pixel vals will be used

  • band (int) – an integer val eg - 2

  • algo (string) – either “GAC” (geodesic active contours) or “ACWE” (active contours without edges)

  • sigma (float) – the size of stdv defining the gaussian envelope if using canny edge; a unitless value

  • iterations (uint) – Number of iterations to run.

  • smooth (uint, optional) – Number of times the smoothing operator is applied per iteration. Reasonable values are around 1-4. Larger values lead to smoother segmentations.

  • lambda1 (float, optional) – Weight parameter for the outer region. If lambda1 is larger than lambda2, the outer region will contain a larger range of values than the inner region.

  • lambda2 (float, optional) – Weight parameter for the inner region. If lambda2 is larger than lambda1, the inner region will contain a larger range of values than the outer region.

  • threshold (float, optional) – Areas of the image with a value smaller than this threshold will be considered borders. The evolution of the contour will stop in these areas.

  • balloon (float, optional) – Balloon force to guide the contour in non-informative areas of the image, i.e., areas where the gradient of the image is too small to push the contour towards a border. A negative value will shrink the contour, while a positive value will expand the contour in these areas. Setting this to zero will disable the balloon force.

utilities.ragmerge(inSeg, inRas, outShp, band, thresh=0.02)
Parameters
  • inSeg (string) – Path to Input segmentation raster

  • inRas (string) – Path to underlying raster that will be used to merge segments

  • outShp (string) – Path to output segmentation shape

  • band (int) – The band on which to perform the RAG merge

  • thresh (float) – The RAG merge threshold

utilities.ransac_lines(inRas, outRas, sigma=3, row=True, col=True, binwidth=40)
utilities.raster2array(inRas, bands=[1])

Read a raster and return an array, either single or multiband

Parameters
  • inRas (string) – input raster

  • bands (list) – a list of bands to return in the array

utilities.temp_match(vector_path, raster_path, band, nodata_value=0, ind=None)

Based on polygons, return template matched images

Parameters
  • vector_path (string) – input shapefile

  • raster_path (string) – input raster

  • band (int) – an integer val eg - 2

  • nodata_value (numerical) – If used the no data val of the raster

  • ind (int) – The feature ID to use - if used this will use one feature and rotate it 90° for the second

Returns

Return type

list of template match arrays same size as input

utilities.visual_callback_2d(background, fig=None)

Returns a callback that can be passed as the argument iter_callback of morphological_geodesic_active_contour and morphological_chan_vese for visualizing the evolution of the levelsets. Only works for 2D images.

Parameters
  • background ((M, N) array) – Image to be plotted as the background of the visual evolution.

  • fig (matplotlib.figure.Figure) – Figure where results will be drawn. If not given, a new figure will be created.

Returns

callback – A function that receives a levelset and updates the current plot accordingly. This can be passed as the iter_callback argument of morphological_geodesic_active_contour and morphological_chan_vese.

Return type

Python function

utilities.wipe_ply_field(incloud, outcloud, tfield='training', field='label')

Scrub a field from a ply file

Parameters
  • incloud (string) – path to an input ply

  • outcloud (string) – path to the output ply

  • tfield (string) – the training field

  • field (string) – the field to erase (that was previously full of class values)

geospatial_learn.convnet module

The convnet module.

Description

A module for using pytorch to classify EO data for semantic segmentation and object identification

convnet.chip_pred(inRas, model, outMap, encoder, classes=['1'], tilesize=256, bands=[1, 2, 3], weights='imagenet', device='cuda')

Chip-based prediction of EO imagery. Based on pytorch & albumentations

Parameters
  • inRas (string) – the input raster

  • model (string or pytorch model) – the model to predict

  • outMap (string) – the output classification map

  • encoder (string) – the encoder component of the CNN e.g. resnet34

  • tilesize (int) – the image chip/tile size that will be processed def 256

  • bands (list of ints) – the image bands to use

Notes

This is an early version with some stuff to add/change

convnet.collect_train(masklist, tilelist, outdir, chip_size=256, bands=[1, 2, 3])
Collect and save chips of both mask and image from a list of images for a semantic segmentation task

The list of images must correspond / be in the same order

Parameters
  • masklist (list) – A list of images containing the training masks

  • tilelist (list) – A list of images containing the corresponding spectral info

  • outdir (string) – Where the training chips will be written

  • chip_size (int) – the training “chip” size e.g. 256x256 pixels dependent on the nnet used

Returns

  • A tuple of lists of the respective paths of both masks and corresponding images

convnet.collect_train_chip(masklist, tilelist, outdir, chip_size=256, include_zero=True, bands=[1, 2, 3])

Collect and save chips of an image from a list of masks and images for a chip-based CNN (i.e. we are simply labelling a chip NOT segmenting anything)

Please note that areas of 0 (no mask) will count as a class

The list of images must correspond / be in the same order.

Parameters
  • masklist (list) – A list of images containing the training masks

  • tilelist (list) – A list of images containing the corresponding spectral info

  • outdir (string) – Where the training chips will be written

  • chip_size (int) – the training “chip” size e.g. 256x256 pixels dependent on the nnet used

  • include_zero (bool) – whether to include a non-masked area as class 0

Returns

  • A tuple of lists of the respective paths of both masks and corresponding images

convnet.makewrkdir(directory)
convnet.semseg_pred(inRas, model, outMap, encoder, classes=['1'], tilesize=256, bands=[1, 2, 3], weights='imagenet', device='cuda')

Semantic segmentation of EO-imagery - an early version, things are to be changed in the near future. Based on segmentation_models.pytorch & albumentations

Parameters
  • inRas (string) – the input raster

  • model (string or pytorch model) – the model to predict

  • outMap (string) – the output classification map

  • encoder (string) – the encoder component of the CNN e.g. resnet34

  • tilesize (int) – the image chip/tile size that will be processed def 256

  • bands (list of ints) – the image bands to use

Notes

This is an early version with some stuff to add/change
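
A minimal sketch with hypothetical paths; the encoder is assumed to match the one the model was trained with:

    from geospatial_learn import convnet

    convnet.semseg_pred('scene.tif', 'best_model.pth', 'classmap.tif',
                        'resnet34', classes=['1'], tilesize=256,
                        bands=[1, 2, 3])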

convnet.train_semantic_seg(maindir, plot=False, bands=[1, 2, 3], tilesize=256, f1=False, preTrain=True, proc='cuda:0', activation='softmax2d', classes=['1'], weights='imagenet', modelpth='./best_model.pth', plot_score=True, params={'batch_size': 16, 'classes': 1, 'device': 'cuda', 'encoder': 'resnet34', 'epochs': 50, 'in_channels': 3, 'lr': 0.0001, 'model': 'Unet', 'num_workers': 2}, nt=-1)

Multi-class semantic segmentation of EO-imagery - an early version, things are to be changed in the near future. Based on segmentation_models.pytorch & albumentations

Parameters
  • maindir (string) – the working directory where everything is done

  • modelpth (string) – where to save the model eg ‘dir/best_model.pth’

  • plot (bool) – whether to plot intermediate data results eg visualise the image aug, test results etc.

  • bands (list of ints) – the image bands to use

  • tilesize (int) – the size of the image tile used in training and thus classification

  • classes (list) – a list of strings with the classes eg [‘1’, ‘2’, ‘3’] as labelled in raster

  • weights (string) – the encoder weights, typically imagenet or None for rand init

  • f1 (bool) – whether to save a classification report (will be a plot in working dir)

  • activation (string) – the neural net activation function e.g ‘sigmoid’ for binary or softmax2d for multiclass

  • params (dict) – the convnet model params

    models: Unet, UNet11, UNet16, Linknet, FPN, PSPNet, PAN, DeepLabV3 and DeepLabV3+

    encoders: ‘resnet18’, ‘resnet34’, ‘resnet50’, ‘resnet101’, ‘resnet152’, ‘resnext50_32x4d’, ‘resnext101_32x4d’, ‘resnext101_32x8d’, ‘resnext101_32x16d’, ‘resnext101_32x32d’, ‘resnext101_32x48d’, ‘dpn68’, ‘dpn68b’, ‘dpn92’, ‘dpn98’, ‘dpn107’, ‘dpn131’, ‘vgg11’, ‘vgg11_bn’, ‘vgg13’, ‘vgg13_bn’, ‘vgg16’, ‘vgg16_bn’, ‘vgg19’, ‘vgg19_bn’, ‘senet154’, ‘se_resnet50’, ‘se_resnet101’, ‘se_resnet152’, ‘se_resnext50_32x4d’, ‘se_resnext101_32x4d’, ‘densenet121’, ‘densenet169’, ‘densenet201’, ‘densenet161’, ‘inceptionresnetv2’, ‘inceptionv4’, ‘efficientnet-b0’, ‘efficientnet-b1’, ‘efficientnet-b2’, ‘efficientnet-b3’, ‘efficientnet-b4’, ‘efficientnet-b5’, ‘efficientnet-b6’, ‘efficientnet-b7’, ‘mobilenet_v2’, ‘xception’, ‘timm-efficientnet-b0’, ‘timm-efficientnet-b1’, ‘timm-efficientnet-b2’, ‘timm-efficientnet-b3’, ‘timm-efficientnet-b4’, ‘timm-efficientnet-b5’, ‘timm-efficientnet-b6’, ‘timm-efficientnet-b7’, ‘timm-efficientnet-b8’, ‘timm-efficientnet-l2’

Notes

This is an early version with some stuff to add/change
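
A minimal sketch; the working directory is hypothetical and the params dict simply mirrors the defaults in the signature:

    from geospatial_learn import convnet

    params = {'model': 'Unet', 'encoder': 'resnet34', 'in_channels': 3,
              'classes': 1, 'epochs': 50, 'lr': 0.0001, 'batch_size': 16,
              'device': 'cuda', 'num_workers': 2}
    convnet.train_semantic_seg('workdir', bands=[1, 2, 3], tilesize=256,
                               classes=['1'], params=params,
                               modelpth='workdir/best_model.pth')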

Module contents