geospatial_learn package

Submodules

geospatial_learn.raster module

The raster module.

Description

A series of tools for the manipulation of geospatial imagery/rasters, such as masking or raster-algebra-type functions, and the conversion of Sentinel 2 data to gdal-compatible formats.

raster.array2raster(array, bands, inRaster, outRas, dtype, FMT=None)

Save a raster from a numpy array using the geoinfo from another.

Parameters
  • array (np array) – a numpy array.

  • bands (int) – the no of bands.

  • inRaster (string) – the path of a raster.

  • outRas (string) – the path of the output raster.

  • dtype (int) – a GDAL datatype (see the GDAL website) e.g. gdal.GDT_Int32 - though you need to know what the number represents!

  • FMT (string) – (optional) a GDAL raster format (see the GDAL website) eg Gtiff, HFA, KEA.
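
A minimal usage sketch; the array and file paths here are hypothetical:

    import numpy as np
    from osgeo import gdal
    from geospatial_learn import raster

    # a hypothetical single band array, assumed to match the reference
    # raster's dimensions, written out with its georeferencing
    arr = np.zeros((512, 512), dtype=np.int32)
    raster.array2raster(arr, 1, 'reference.tif', 'output.tif',
                        gdal.GDT_Int32, FMT='Gtiff')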

raster.batch_gdaldem(inlist, prop='aspect')

Batch DEM calculation on a load of gdal files, from some format to tif

Parameters
  • inlist (list) – A list of raster paths

  • prop (string) – one of “hillshade”, “slope”, “aspect”, “color-relief”, “TRI”, “TPI”, “Roughness”

Returns

Return type

List of file paths
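
A minimal sketch, assuming a hypothetical folder of DEM tifs:

    from glob import glob
    from geospatial_learn import raster

    # derive slope from every DEM in the folder
    dems = glob('dems/*.tif')
    outlist = raster.batch_gdaldem(dems, prop='slope')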

raster.batch_translate(folder, wildcard, FMT='Gtiff')

Using the gdal python API, this function translates the format of files to a commonly used format

Parameters
  • folder (string) – the folder containing the rasters to be translated

  • wildcard (string) – the format wildcard to search for e.g. ‘.tif’

  • FMT (string (optional)) – a GDAL raster format (see the GDAL website) eg Gtiff, HFA, KEA

raster.batch_translate_adf(inlist)

Batch translate a load of adf (ArcGIS) files to tif

Parameters

inlist (list) – A list of raster paths

Returns

Return type

List of file paths

raster.batch_wms_download(gdf, wms, layer, img_size, outdir, attribute='id', espg='27700', res=0.25)

Download a load of wms tiles with georeferencing

Parameters
  • gdf (geopandas gdf) – the geodataframe defining the extents of the tiles to download

  • wms (string) – the wms address

  • layer (string) – the wms layer

  • img_size (tuple) – image x,y dims

  • espg (string) – the projection EPSG code

  • outdir (string) – path to the output directory

  • res (float) – per pixel resolution of imagery in metres

raster.bbox2raster(array, bands, bbox, outras, pixel_size=0.25, proj=27700, dtype=5, FMT='Gtiff')

Using a bounding box and other information, georeference an image and write it to disk

Parameters
  • array (np array) – a numpy array.

  • bands (int) – the no of bands.

  • bbox (list or tuple) – xmin, ymin, xmax, ymax

  • pixel_size (float) – pixel size in metres (unless proj is degrees!)

  • outras (string) – the path of the output raster.

  • proj (int) – the EPSG code eg 27700 for osgb

  • dtype (int) – a GDAL datatype (see the GDAL website) e.g. gdal.GDT_Int32 = 5 - though you need to know what the number represents!

  • FMT (string) – (optional) a GDAL raster format (see the GDAL website) eg Gtiff, KEA.

raster.calc_ndvi(inputIm, outputIm, bandsList, blocksize=256, FMT=None, dtype=None)

Create a copy of an image with an ndvi band added

Parameters
  • inputIm (string) – the input image

  • outputIm (string) – the output image

  • bandsList (list) – a list of band indices to be used, eg [3, 4] for Sent2 data

  • FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’

  • blocksize (int) – the chunk of raster read in & write out
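
A minimal sketch; the paths are hypothetical and the band indices follow the Sent2 example above:

    from geospatial_learn import raster

    # bands 3 & 4 as per the Sent2 example above
    raster.calc_ndvi('s2_scene.tif', 's2_scene_ndvi.tif', [3, 4], FMT='Gtiff')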

raster.clip_raster(inRas, inShp, outRas, cutline=True)

Clip a raster

Parameters
  • inRas (string) – the input image

  • inShp (string) – the input polygon file path

  • outRas (string) – the clipped raster

  • cutline (bool (optional)) – retain raster values only inside the polygon
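
A minimal sketch with hypothetical paths:

    from geospatial_learn import raster

    # clip the scene to the polygon, retaining values only inside it
    raster.clip_raster('scene.tif', 'aoi.shp', 'scene_clip.tif', cutline=True)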

raster.color_raster(inRas, color_file, output_file)

Generate a txt colorfile and make an RGB image from a grayscale one

Parameters
  • inRas (string) – Path to input raster (single band greyscale)

  • color_file (string) – Path to output colorfile.txt

  • output_file (string) – Path to the output RGB raster

raster.combine_scene(scl, c_scn, blocksize=256)

Combine another scene classification with the sen2cor one

Parameters
  • scl (string) – the sen2cor one

  • c_scn (string) – the independently derived one - this will be modified

  • blocksize (int) – chunk to process

raster.hist_match(inputImage, templateImage)

Adjust the pixel values of a grayscale image such that its histogram matches that of a target image.

Writes to the inputImage dataset so that it matches

As the entire band histogram is required this can become memory intensive with big rasters eg 10 x 10k+

Inspired by/adapted from something on Stack Exchange image processing - credit to that author

Parameters
  • inputImage (string) – image to transform; the histogram is computed over the flattened array

  • templateImage (string) – template image can have different dimensions to source
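
A minimal sketch with hypothetical paths; note the input raster is modified in place:

    from geospatial_learn import raster

    # adjust image.tif so its histogram matches that of template.tif
    raster.hist_match('image.tif', 'template.tif')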

raster.mask_raster(inputIm, mval, overwrite=True, outputIm=None, blocksize=None, FMT=None)

Perform a numpy masking operation on a raster where all values corresponding to the mask value are retained - does this in blocks for efficiency on larger rasters

Parameters
  • inputIm (string) – the input raster

  • mval (int) – the mask value eg 1, 2 etc

  • FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’

  • outputIm (string (optional)) – optionally write a separate output image, if None, will mask the input

  • blocksize (int) – the chunk of raster to read in

Returns

A string of the output file path

Return type

string
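
A minimal sketch with hypothetical paths; treating overwrite=False plus outputIm as the way to write a separate file is an assumption from the signature:

    from geospatial_learn import raster

    # retain only the pixels equal to 1, writing the result to a new file
    out = raster.mask_raster('classmap.tif', 1, overwrite=False,
                             outputIm='classmap_masked.tif')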

raster.mask_raster_multi(inputIm, mval=1, rule='==', outval=None, mask=None, blocksize=256, FMT=None, dtype=None)

Perform a numpy masking operation on a raster where all values equal to, less than or greater than the mask value are retained - does this in blocks for efficiency on larger rasters

Parameters
  • inputIm (string) – the input raster

  • mval (int) – the masking value that delineates pixels to be kept

  • rule (string) – the logic for masking either ‘==’, ‘<’ or ‘>’

  • outval (numerical dtype eg int, float) – the areas removed will be written to this value default is 0

  • mask (string) – the mask raster to be used (optional)

  • FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’

  • dtype (int) – a gdal datatype (optional)

  • blocksize (int) – the chunk of raster read in & write out

raster.mask_with_poly(vector_path, raster_path, value=0)

Change raster values inside a polygon and update the raster

Parameters
  • vector_path (string) – input shapefile

  • raster_path (string) – input raster

  • value (int) – the value to write inside the polygon(s)

raster.multi_temp_filter(inRas, outRas, bands=None, windowSize=None)

The multi temp filter for SAR data as outlined & published by Quegan et al

This is only suitable for small images, as it holds intermediate data in memory

Parameters
  • inRas (string) – the input raster

  • outRas (string) – the output raster

  • bands (list) – the list of bands to process (optional)

  • windowSize (int) – the filter window size

raster.polygonize(inRas, outPoly, outField=None, mask=True, band=1, filetype='ESRI Shapefile')

Polygonise a raster

Parameters
  • inRas (string) – the input image

  • outPoly (string) – the output polygon file path

  • outField (string (optional)) – the name of the field containing burned values

  • mask (bool (optional)) – use the input raster as a mask

  • band (int) – the input raster band

raster.raster2array(inRas, bands=[1])

Read a raster and return an array, either single or multiband

Parameters
  • inRas (string) – input raster

  • bands (list) – a list of bands to return in the array

raster.rasterize(inShp, inRas, outRas, field=None, fmt='Gtiff')

Rasterize a polygon to the extent & geo transform of another raster

Parameters
  • inShp (string) – the input polygon file path

  • inRas (string) – the input image providing the extent & geo transform

  • outRas (string) – the output raster file path

  • field (string (optional)) – the name of the field containing burned values, if none will be 1s

  • fmt (string) – the gdal image format
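
A minimal sketch; the paths and the ‘class_id’ field are hypothetical:

    from geospatial_learn import raster

    # burn the polygon attribute into the grid of scene.tif
    raster.rasterize('polys.shp', 'scene.tif', 'burned.tif', field='class_id')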

raster.remove_cloud_S2(inputIm, sceneIm, blocksize=256, FMT=None, min_size=4, dist=1)

Remove cloud using a scene classification

This saves back to the input raster by default

Parameters
  • inputIm (string) – the input image

  • sceneIm (string) – the scenemap to use as a mask for removing cloud. It is assumed the scene map consists of 1 shadow, 2 cloud, 3 land, 4 water

  • FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’

  • min_size (int) – the minimum size in pixels of cloud mask objects to retain

  • blocksize (int) – the square chunk processed at any one time

raster.remove_cloud_S2_stk(inputIm, sceneIm1, sceneIm2=None, baseIm=None, blocksize=256, FMT=None, max_size=10, dist=1)

Remove cloud using a classification where cloud == 1

Esoteric - from the Forest Sentinel project, but retained here

Parameters
  • inputIm (string) – the input image

  • sceneIm1, 2 (string) – the classification rasters used to mask out the cloud-affected areas

  • baseIm (string) – Another multiband raster of same size extent as the inputIm where the baseIm image values are used rather than simply converting to zero (in the use case of 2 sceneIm classifications)

Notes

Useful if you have a base image which is a cloudless composite, which you intend to replace with the current image for the next round of classification/change detection

raster.rgb_ind(inputIm, outputIm, blocksize=256, FMT=None, dtype=5)

Create a copy of an image with rgb indices added

Parameters
  • inputIm (string) – the input rgb image

  • outputIm (string) – the output image

  • FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’

  • blocksize (int) – the chunk of raster read in & write out

raster.srtm_gdaldem(inlist, prop='aspect')

Batch DEM calculation on a load of srtm files

SRTM scale & z factor vary across the globe, so this calculates them based on latitude

Parameters
  • inlist (list) – A list of raster paths

  • prop (string) – one of “hillshade”, “slope”, “aspect”, “color-relief”, “TRI”, “TPI”, “Roughness”

Returns

Return type

List of file paths

raster.stack_ras(rasterList, outFile)

Stack some rasters

Parameters
  • rasterList (list) – the list of input rasters

  • outFile (string) – the output file path including file extension

raster.stat_comp(inRas, outMap, bandList=None, stat='percentile', q=95, blocksize=256, FMT=None, dtype=6)

Calculate a depth-wise stat on a multi band raster with selected or all bands

Parameters
  • inRas (string) – input Raster

  • outMap (string) – the output raster calculated

  • stat (string) – the statistic to be calculated; make sure there are no nans as nan percentile is far too slow

  • blocksize (int) – the chunk processed

  • q (int) – the ith percentile if percentile is the stat used

  • FMT (string) – gdal compatible (optional) defaults is tif

  • dtype (int) – gdal datatype (default 6, gdal.GDT_Float32)

raster.temporal_comp(fileList, outMap, stat='percentile', q=95, folder=None, blocksize=256, FMT=None, dtype=5)

Calculate an image based on a time series collection of imagery (eg a year's worth of S2 data)

Parameters
  • fileList (list of strings) – the files to be input, if None a folder must be specified

  • outMap (string) – the output raster calculated

  • stat (string) – the statistic to be calculated

  • blocksize (int) – the chunk processed

  • q (int) – the ith percentile if percentile is the stat used

  • FMT (string) – gdal compatible (optional) defaults is tif

  • dtype (int) – gdal datatype (default 5, gdal.GDT_Int32)

raster.tile_rasters(inRas, outDir, tilesize=['256', '256'], overlap='0')

Split a large raster into smaller ones

Parameters
  • inRas (string) – the path to input raster

  • outDir (string) – the path to the output dir

  • tilesize (list of str) – the sides of a square tile [“256”, “256”]

  • overlap (string) – the overlap in pixels between tiles, if required

raster.wmsGrabber(bbox, image_size, wms, layer, outfile, espg='27700', res=0.25)

Return a wms tile from a given source and optionally write to disk with georef

Parameters
  • bbox (list or tuple) – xmin, ymin, xmax, ymax

  • image_size (tuple) – image x,y dims

  • wms (string) – the wms address

  • layer (string) – the wms (sub)layer

  • espg (string) – the projection EPSG code

  • outfile (string) – path to outfile, if None only array is returned

raster.write_vrt(infiles, outfile)
Parameters
  • infiles (list of strings) – the input rasters

  • outfile (string) – the output .vrt
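
A minimal sketch, assuming a hypothetical folder of tiles:

    from glob import glob
    from geospatial_learn import raster

    tiles = glob('tiles/*.tif')
    raster.write_vrt(tiles, 'mosaic.vrt')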

geospatial_learn.learning module

The learning module.

Description

The learning module's set of functions provides a framework to optimise and classify EO data for both per-pixel and object-based properties

learning.RF_oob_opt(model, X_train, min_est, max_est, step, regress=False)

This function uses the oob score to find the best parameters.

This cannot be parallelized due to the warm start bootstrapping, so is potentially slower than the other cross val in the create_model function

This function is based on an example from the sklearn site

This function plots a graph displaying the oob rate

Parameters
  • model (string (.gz)) – path to model to be saved

  • X_train (np array) – numpy array of training data where the 1st column is labels

  • min_est (int) – min no of trees

  • max_est (int) – max no of trees

  • step (int) – the step at which no of trees is increased

  • regress (bool) – boolean where if True it is a regressor

Returns

error rate, best estimator

Return type

tuple of np arrays

learning.classify_object(model, inShape, attributes, field_name=None, write='gpd')

Classify a polygon/point file attributes (‘object based’) using an sklearn model

Parameters
  • model (string) – path to input model

  • inShape (string) – input shapefile path (must be .shp for now….)

  • attributes (list of strings) – list of attribute names

  • field_name (string) – name of classified label field (optional)

  • write (string) – either gpd(geopandas) or ogr
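
A minimal sketch; the model path, shapefile and attribute names are hypothetical (the attributes would typically have been written beforehand, e.g. by shape.zonal_stats):

    from geospatial_learn import learning

    attrs = ['meanblue', 'meangreen', 'meanred']  # hypothetical field names
    learning.classify_object('model.gz', 'segments.shp', attrs,
                             field_name='label', write='gpd')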

learning.classify_pixel(model, inputDir, bands, outMap, probMap)

A function to classify an image using a pre-saved model - assumes a folder of tiled rasters for memory management - classify_pixel_bloc is recommended instead of this function

Parameters

  • model (sklearn model) – a path to a scikit learn model that has been saved

  • inputDir (string) – a folder with images to be classified

  • bands (int) – the no of image bands eg 8

  • outMap (string) – path to output image excluding the file format ‘pathto/mymap’

  • probMap (string) – path to output prob image excluding the file format ‘pathto/mymap’

  • FMT (string) – optional parameter - gdal readable fmt

learning.classify_pixel_bloc(model, inputImage, outMap, bands=[1, 2, 3], blocksize=None, FMT=None, ndvi=None, dtype=5)

A block processing classifier for large rasters, supports KEA, HFA, & Gtiff formats. KEA is recommended, Gtiff is the default

Parameters
  • model (sklearn model / keras model) – a path to a model that has been saved

  • inputImage (string) – path to image including the file fmt ‘Myimage.tif’

  • bands (list of ints) – list of band indices to be used eg [1,2,3]

  • outMap (string) – path to output image excluding the file format ‘pathto/mymap’

  • FMT (string) – optional parameter - gdal readable fmt

  • blocksize (int (optional)) – size of raster chunk in pixels; 256 tends to be quickest; if you put None it will read the size from gdal (this doesn’t always pay off!)

  • dtype (int (optional - gdal syntax gdal.GDT_Int32)) – a gdal datatype - default is int32

Notes

Block processing is sequential, but quite a few sklearn models are parallel so that has been prioritised rather than raster IO
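
A minimal sketch with hypothetical paths; note outMap excludes the file extension per the parameter description:

    from geospatial_learn import learning

    learning.classify_pixel_bloc('model.gz', 'scene.tif', 'outputs/classmap',
                                 bands=[1, 2, 3], blocksize=256)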

learning.classify_ply(incld, inModel, train_field='training', class_field='label', rgb=True, outcld=None, ignore=['x', 'y', 'scalar_ScanAngleRank', 'scalar_NumberOfReturns', 'scalar_ReturnNumber', 'scalar_GpsTime', 'scalar_PointSourceId'])

Classify a point cloud (ply format)

Parameters
  • incld (string) – the input point cloud

  • class_field (string) – the name of the field that the results will be written to; this must already exist! Create in CloudCompare or cgal

  • train_field (string) – the name of the training label field so it can be ignored

  • rgb (bool) – whether there is rgb data to be included

  • outcld (string) – path to a new ply to write if not writing to the input one

  • ignore (list) – the pointcloud attributes to ignore for classification

learning.create_model(X_train, outModel, clf='erf', group=None, random=False, cv=5, cores=-1, strat=True, test_size=0.3, regress=False, params=None, scoring=None, class_names=None, save=True)

Brute force or random model creation using scikit learn. Either use the default params in this function or enter your own (recommended - see sklearn)

Parameters
  • X_train (np array) – numpy array of training data where the 1st column is labels. If the groupkfold is used, the last column will be the group labels

  • outModel (string) – the output model path which is a gz file, if using keras it is h5

  • params (dict) – a dict of model params (see scikit learn)

  • clf (string) – an sklearn or xgb classifier/regressor: logit, sgd, linsvc, svc, svm, nusvm, erf, rf, gb, xgb

  • group (np.array) – array of group labels for train/test split and grid search, useful to avoid autocorrelation

  • random (bool) – if True, a random param search

  • cv (int) – no of folds

  • cores (int or -1 (default)) – the no of parallel jobs

  • strat (bool) – a stratified grid search

  • test_size (float) – percentage to hold out to test

  • regress (bool) – a regression model if True, a classifier if False

  • scoring (string) – a suitable sklearn scoring type (see notes)

  • class_names (list of strings) – class names in order of their numerical equivalents

Returns

  • A list of

  • [grid.best_estimator_, grid.cv_results_, grid.best_score_, grid.best_params_, classification_report]

Scoring types - there are a lot - some of which won’t work for multi-class, regression etc - see the sklearn docs!

‘accuracy’, ‘adjusted_rand_score’, ‘average_precision’, ‘f1’, ‘f1_macro’, ‘f1_micro’, ‘f1_samples’, ‘f1_weighted’, ‘neg_log_loss’, ‘neg_mean_absolute_error’, ‘neg_mean_squared_error’, ‘neg_median_absolute_error’, ‘precision’, ‘precision_macro’, ‘precision_micro’, ‘precision_samples’, ‘precision_weighted’, ‘r2’, ‘recall’, ‘recall_macro’, ‘recall_micro’, ‘recall_samples’, ‘recall_weighted’, ‘roc_auc’
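
A minimal sketch; the training file is hypothetical and is assumed to have been saved via joblib (e.g. by get_training), with the labels in the 1st column:

    import joblib
    from geospatial_learn import learning

    X_train = joblib.load('training.gz')
    results = learning.create_model(X_train, 'model.gz', clf='rf',
                                    cv=5, scoring='f1_macro')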

learning.create_model_autosk(X_train, outModel, cores=-1, class_names=None, incld_est=None, excld_est=None, incld_prep=None, excld_prep=None, total_time=120, res_args={'cv': 5}, mem_limit=None, per_run=None, test_size=0.3, wrkfolder=None, scoring=None, save=True, ply=False)

Auto-sklearn to create a model

Parameters
  • X_train (np array) – numpy array of training data where the 1st column is labels

  • outModel (string) – the output model path which is a gz file, if using keras it is h5

  • cores (int or -1 (default)) – the no of parallel jobs

  • class_names (list of strings) – class names in order of their numerical equivalents

  • incld_est (list of strings) – estimators to include eg [‘random_forest’]

  • excld_est (list of strings) – estimators to exclude eg [‘random_forest’]

  • incld_prep (list of strings) – preproc to include

  • excld_prep (list of strings) – preproc to exclude

  • total_time (int) – time in seconds for the whole search process

  • res_args (dict) – strategy for overfit avoidance e.g. {‘cv’:5}

  • mem_limit (int) – memory limit per job

  • per_run (int) – time limit per run

  • test_size (float) – percentage to hold out to test

  • wrkfolder (string) – path to dir for intermediate working

  • scoring (string) – a suitable sklearn scoring type (see notes)

Returns

  • A list of

  • [model, classif_report]

learning.create_model_tpot(X_train, outModel, gen=5, popsize=50, group=None, cv=5, cores=-1, dask=False, test_size=0.2, regress=False, params=None, scoring=None, verbosity=2, warm_start=False)

Create a model using the tpot library, where genetic algorithms are used to optimise the pipeline and params.

Parameters
  • X_train (np array) – numpy array of training data where the 1st column is labels

  • outModel (string) – the output model path (which is a .py file) from which to run the pipeline

  • cv (int) – no of folds

  • cores (int or -1 (default)) – the no of parallel jobs

  • regress (bool) – a regression model if True, a classifier if False

  • test_size (float) – size of test set held out

  • params (a dict of model params (see tpot)) – enter your own params dict rather than the range provided e.g.

        {'sklearn.ensemble.RandomForestClassifier':
             {'n_estimators': [200],
              'max_features': ['sqrt', 'log2'],
              'max_depth': [10, None]},
         'xgboost.sklearn.XGBClassifier':
             {'n_estimators': [200],
              'learning_rate': [0.1, 0.2, 0.4]}}

  • scoring (string) – a suitable sklearn scoring type (see notes)

  • warm_start (bool) – use the previous population, useful if interactive

learning.get_polars(inShp, polars=['VV', 'VH'])

Get list of fields containing polarisations from a polygon/point file

Parameters
  • inShp (string) – the input polygon

  • polars (list of strings) – the attributes headed with polarisations eg ‘VV’

learning.get_training(inShape, inRas, bands, field, outFile=None)

Collect training as an np array for use with create model function

Parameters
  • inShape (string) – the input shapefile - must be esri .shp at present

  • inRas (string) – the input raster from which the training is extracted

  • bands (int) – no of bands

  • field (string) – the attribute field containing the training labels

  • outFile (string (optional)) – path to the training file saved as joblib format (eg - ‘training.gz’)

Returns

  • A tuple containing

  • -np array of training data

  • -list of polygons with invalid geometry that were not collected
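
A minimal sketch; the paths and the ‘class_id’ field are hypothetical:

    from geospatial_learn import learning

    train, rejects = learning.get_training('train_polys.shp', 'scene.tif',
                                           8, 'class_id',
                                           outFile='training.gz')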

learning.get_training_ply(incld, label_field='training', classif_field='label', rgb=True, outFile=None, ignore=['x', 'y', 'scalar_ScanAngleRank', 'scalar_NumberOfReturns', 'scalar_ReturnNumber', 'scalar_GpsTime', 'scalar_PointSourceId'])

Get training from a point cloud

Parameters
  • incld (string) – the input point cloud

  • label_field (string) – the name of the field representing the training points which must be positive integers

  • classif_field (string) – the name of the field that will be used for classification later; must be specified so it can be ignored

  • rgb (bool) – whether there is rgb data to be included

  • outFile (string) – path to training array to be saved as .gz via joblib

  • ignore (list) – the pointcloud attributes to ignore for training

Returns

  • np array of training where first column is labels

  • list of feature names for later ref/plotting

learning.get_training_shp(inShape, label_field, feat_fields, outFile=None)

Collect training from a shapefile attribute table. Used for object-based classification (typically).

Parameters
  • inShape (string) – the input polygon

  • label_field (string) – the field name for the class labels

  • feat_fields (list) – the field names of the feature data

  • outFile (string (optional)) – path to training data to be saved (.gz)

Returns

  • training data as a dataframe, first column is labels, rest are features

  • list of reject features

learning.plot_feature_importances(modelPth, featureNames, model_type='scikit')

Plot the feature importances of an ensemble classifier

Parameters
  • modelPth (string) – A sklearn model path

  • featureNames (list of strings) – a list of feature names

learning.ply_features(incld, outcld=None, k=[50, 100, 200], props=['anisotropy', 'curvature', 'eigenentropy', 'eigen_sum', 'linearity', 'omnivariance', 'planarity', 'sphericity'], nrm_props=None)

Calculate point cloud features and write to file

Currently memory intensive due to using pyntcloud….

Parameters
  • incld (string) – the input point cloud

  • outcld (string) – the output point cloud

  • k (list) – the no of neighbors to use when calculating the props; multiple values are more effective

  • props (list) – the properties you wish to include

  • nrm_props (list) – properties based on normals if they exist (this will fail if they don’t) e.g. [“inclination_radians”, “orientation_radians”]

learning.prob_pixel_bloc(model, inputImage, bands, outMap, classes, blocksize=None, FMT=None, one_class=None)

A block processing classifier for large rasters that produces a probability output.

Supports KEA, HFA, & Gtiff formats -KEA is recommended, Gtiff is the default

Parameters
  • model (string) – a path to a scikit learn model that has been saved

  • inputImage (string) – path to image including the file fmt ‘Myimage.tif’

  • bands (int) – the no of image bands eg 8

  • outMap (string) – path to output image excluding the file format ‘pathto/mymap’

  • classes (int) – no of classes

  • blocksize (int (optional)) – size of raster chunk; 256 tends to be quickest; if you put None it will read the size from gdal (this doesn’t always pay off!)

  • FMT (string) – optional parameter - gdal readable fmt eg ‘Gtiff’

  • one_class (int) – choose a single class to produce output prob raster

Notes

Block processing is sequential, but quite a few sklearn models are parallel so that has been prioritised rather than raster IO

learning.regression_results(y_true, y_pred)
learning.rmse_vector_lyr(inShape, attributes)

Using sklearn get the rmse of 2 vector attributes (the actual and predicted of course in the order [‘actual’, ‘pred’])

Parameters
  • inShape (string) – the input vector of OGR type

  • attributes (list) – a list of strings denoting the attributes

geospatial_learn.shape module

The shape module.

Description

This module contains various functions for the writing of data in OGR vector formats. The functions are mainly concerned with writing geometric or pixel based attributes, with the view to them being classified in the learning module

shape.buffer(inShp, outfile, dist)

Buffer a shapefile by a given distance outputting a new one

Parameters
  • inShp (string) – input shapefile

  • outfile (string) – output shapefile

  • dist (float) – the distance in map units to buffer

shape.create_ogr_poly(outfile, spref, file_type='ESRI Shapefile', field='id', field_dtype=0)

Create an ogr dataset and layer (convenience)

Parameters
  • outfile (string) – path to ogr file

  • spref (wkt or int) – spatial reference, either a WKT string or an EPSG code

  • file_type (string) – ogr file designation

  • field (string) – attribute field e.g. “id”

  • field_dtype (int or ogr.OFT…) – ogr dtype of field e.g. 0 == ogr.OFTInteger

shape.extent2poly(infile, filetype='raster', outfile=None, polytype='ESRI Shapefile', geecoord=False, lyrtype='ogr')

Get the coordinates of a file’s extent and return an ogr polygon ring, with the option to save the file

Parameters
  • infile (string) – input ogr compatible geometry file or gdal raster

  • filetype (string) – whether the input file is a ‘raster’ or ‘vector’

  • outfile (string) – the path of the output file, if not specified, it will be input file with ‘extent’ added on before the file type

  • polytype (string) – ogr compatible file type (see gdal/ogr docs) default ‘ESRI Shapefile’ ensure your outfile string has the equiv. e.g. ‘.shp’ or in case of memory only ‘Memory’ (outfile would be None in that case)

  • geecoord (bool) – optionally convert to WGS84 lat,lon

  • lyrtype (string) – either ‘gee’ which means earth engine or ‘ogr’ which returns ds and lyr

Returns

Return type

a GEE polygon geometry or ogr dataset and layer
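
A minimal sketch with hypothetical paths, returning an ogr dataset and layer:

    from geospatial_learn import shape

    ds, lyr = shape.extent2poly('scene.tif', filetype='raster',
                                outfile='scene_extent.shp', lyrtype='ogr')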

shape.filter_shp(inShp, expression, outField, outLabel)

Filter and index the features of an OGR polygon file by attribute

Potentially useful for rule sets or prepping a subsidiary underlying raster operation

Parameters
  • inShp (string) – input shapefile

  • expression (string) – expression e.g. “DN >= 168”

  • outField (string) – the field in which the label will reside

  • outLabel (int) – the label identifying the filtered features

shape.geom2pixelbbox(inshp, inras, label='Tree', outfile=None)

Convert shapefile geometries to a df of pixel bounding boxes. Projections must be the same!

Parameters
  • inshp (string) – input ogr compatible geometry

  • inras (string) – input raster

  • label (string) – label name def. ‘Tree’

  • outfile (string) – path to save annotation csv

shape.mesh_from_raster(inras, outshp=None, band=1)
shape.meshgrid(inRaster, outShp, gridHeight=1, gridWidth=1)
shape.ms_snake(inShp, inRas, outShp, band=2, buf1=0, buf2=0, algo='ACWE', nodata_value=0, iterations=200, smoothing=1, lambda1=1, lambda2=1, threshold='auto', balloon=-1)

Deform a polygon using active contours on the values of an underlying raster.

This uses morphsnakes and explanations are from there.

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • outShp (string) – output shapefile

  • band (int) – an integer val eg - 2

  • algo (string) – either “GAC” (geodesic active contours) or the default “ACWE” (active contours without edges)

  • buf1 (int) – the buffer if any in map units for the bounding box of the poly which extracts underlying pixel values.

  • buf2 (int) – the buffer if any in map units for the expansion or contraction of the poly which will initialise the active contour. This is here as you may wish to adjust the init polygon so it does not converge on an adjacent one or undesired area.

  • nodata_value (numerical) – If used the no data val of the raster

  • iterations (uint) – Number of iterations to run.

  • smoothing (uint, optional) – Number of times the smoothing operator is applied per iteration. Reasonable values are around 1-4. Larger values lead to smoother segmentations.

  • lambda1 (float, optional) – Weight parameter for the outer region. If lambda1 is larger than lambda2, the outer region will contain a larger range of values than the inner region.

  • lambda2 (float, optional) – Weight parameter for the inner region. If lambda2 is larger than lambda1, the inner region will contain a larger range of values than the outer region.

  • threshold (float, optional) – Areas of the image with a value smaller than this threshold will be considered borders. The evolution of the contour will stop in these areas.

  • balloon (float, optional) – Balloon force to guide the contour in non-informative areas of the image, i.e., areas where the gradient of the image is too small to push the contour towards a border. A negative value will shrink the contour, while a positive value will expand the contour in these areas. Setting this to zero will disable the balloon force.

shape.poly2dictlist(inShp)

Convert an ogr file to a list of json-like dicts

shape.rasterext2poly(inras)
shape.shape_props(inShape, prop, inRas=None, label_field='ID')

Calculate various geometric properties of a set of polygons. Output will be relative to geographic units where relevant, but normalised where not (eg Eccentricity)

Parameters
  • inShape (string) – input shape file path

  • inRas (string) – a raster to get the correct dimensions from (optional), required for scikit-image props

  • prop (string) – Scikit image regionprops prop (see http://scikit-image.org/docs/dev/api/skimage.measure.html)

OGR is used to generate most of these as it is faster, but the string keys are the same as scikit-image; see notes for which require a raster

Notes

Only shape file needed (OGR / shapely / numpy based):

‘MajorAxisLength’, ‘MinorAxisLength’, ‘Area’, ‘Eccentricity’, ‘Solidity’, ‘Extent’, ‘Perimeter’

Raster required:

‘Orientation’ and the remainder of props calculable with scikit-image. These process a bit slower than the above ones

shape.shp2gj(inShape, outJson)

Converts a shapefile to a geojson/json

Parameters
  • inShape (string) – input shapefile

  • outJson (string) – output geojson

Notes

Credit to the person who posted this on the pyshp site

shape.snake(inShp, inRas, outShp, band=1, buf=1, nodata_value=0, boundary='fixed', alpha=0.1, beta=30.0, w_line=0, w_edge=0, gamma=0.01, max_iterations=2500, smooth=True, eq=False, rgb=False)

Deform a line using active contours based on the values of an underlying raster - based on skimage at present so not quick!

Notes

Param explanations for snake/active contour from scikit-image api

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • band (int) – an integer val eg - 2

  • buf (int) – the buffer area to include for the snake deformation

  • alpha (float) – Snake length shape parameter. Higher values makes snake contract faster.

  • beta (float) – Snake smoothness shape parameter. Higher values makes snake smoother.

  • w_line (float) – Controls attraction to brightness. Use negative values to attract toward dark regions.

  • w_edge (float) – Controls attraction to edges. Use negative values to repel snake from edges.

  • gamma (float) – Explicit time stepping parameter.

  • max_iterations (int) – No of iterations to evolve snake

  • boundary (string) – Scikit-image text: Boundary conditions for the contour. Can be one of ‘periodic’, ‘free’, ‘fixed’, ‘free-fixed’, or ‘fixed-free’. ‘periodic’ attaches the two ends of the snake, ‘fixed’ holds the end-points in place, and ‘free’ allows free movement of the ends. ‘fixed’ and ‘free’ can be combined by parsing ‘fixed-free’, ‘free-fixed’. Parsing ‘fixed-fixed’ or ‘free-free’ yields same behaviour as ‘fixed’ and ‘free’, respectively.

  • nodata_value (numerical) – If used the no data val of the raster

  • rgb (bool) – read in bands 1-3 assuming them to be RGB

shape.sqlfilter(inShp, sql)

Return an OGR layer via sql statement for some further analysis

See https://gdal.org/user/ogr_sql_dialect.html for examples

Notes

An OS Master map example

“SELECT * FROM TopographicArea WHERE DescriptiveGroup=’General Surface’”

Parameters
  • inShp (string) – input shapefile

  • sql (string) – sql expression (ogr dialect)

Returns

Return type

ogr lyr
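
A minimal sketch; the layer name in the statement is hypothetical (for a shapefile it is usually the file’s basename):

    from geospatial_learn import shape

    lyr = shape.sqlfilter('topo.shp', "SELECT * FROM topo WHERE DN >= 168")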

shape.texture_stats(inShp, inRas, band, gprop='contrast', offset=2, angle=0, write_stat=None, nodata_value=0, mean=False)

Calculate and optionally write texture stats for an OGR compatible polygon based on underlying raster values

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster path

  • gprop (string) – a skimage glcm property: entropy, contrast, dissimilarity, homogeneity, ASM, energy, correlation

  • offset (int) – distance in pixels to measure - minimum of 2!!!

  • angle (int) – angle in degrees from pixel (int)

  • mean (bool) – take the mean of all offsets

  • Important to note that the results will be unreliable for glcm texture features if seg is true, as non-masked values will be zero or some weird no data and will affect results

Notes

Important

The texture of the bounding box is at present the “reliable” measure

Using the segment only results in potentially spurious results due to the scikit-image algorithm measuring texture over zero/nodata to number pixels (e.g 0>54). The segment part will be developed in due course to overcome this issue

shape.thresh_seg(inShp, inRas, outShp, band, buf=0, algo='otsu', min_area=4, nodata_value=0)

Use an image processing technique to threshold foreground and background in a polygon segment.

The default is otsu’s method.

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • band (int) – an integer val eg - 2

  • algo (string) – ‘otsu’, ‘niblack’ or ‘sauvola’

  • nodata_value (numerical) – If used the no data val of the raster

shape.write_id_field(inShape, fieldName='id')

Write an id field to an ogr vector file

Parameters
  • inShape (string) – input OGR vector file

  • fieldName (string) – name of field being written

shape.write_text_field(inShape, fieldName, attribute)

Write a string to an ogr vector file

Parameters
  • inShape (string) – input OGR vector file

  • fieldName (string) – name of field being written

  • attribute (string) – ‘text to enter in each entry of column’

shape.zonal_point(inShp, inRas, field, band=1, nodata_value=0, write_stat=True)

Get the pixel val at a given point and write to vector

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • field (string) – the name of the field

  • band (int) – an integer val eg - 2

  • nodata_value (numerical) – If used the no data val of the raster

shape.zonal_rgb_idx(inShp, inRas, nodata_value=0)

Calculate RGB-based indices per segment/AOI

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • nodata_value (numerical) – If used the no data val of the raster

shape.zonal_stats(inShp, inRas, band, bandname, layer=None, stat='mean', write_stat=True, nodata_value=0, all_touched=True, expression=None)

Calculate zonal stats for an OGR polygon file

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • band (int) – an integer val eg - 2

  • bandname (string) – eg - blue

  • layer (string) – if using a db type format with multi layers, specify the name of the layer in question

  • stat (string) – string of a stat to calculate, if omitted it will be ‘mean’; others: ‘mode’, ‘min’, ‘mean’, ‘max’, ‘std’, ‘sum’, ‘count’, ‘var’, ‘skew’, ‘kurt(osis)’, ‘vol’

  • write_stat (bool (optional)) – If True, stat will be written to OGR file, if false, dataframe only returned (bool)

  • nodata_value (numerical) – If used the no data val of the raster

  • all_touched (bool) – whether to use all touched when rasterising the polygon; if the poly is smaller/comparable to the pixel size, True is perhaps the best option

  • expression (string) – process a selection only eg expression e.g. “DN >= 168”
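
A minimal sketch with hypothetical paths:

    from geospatial_learn import shape

    # write the per-polygon mean of band 1 to the shapefile and
    # return it as a dataframe
    df = shape.zonal_stats('segments.shp', 'scene.tif', 1, 'blue',
                           stat='mean', write_stat=True)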

shape.zonal_stats_all(inShp, inRas, bandnames, statList=['mean', 'min', 'max', 'median', 'std', 'var', 'skew', 'kurt'])

Calculate zonal stats for an OGR polygon file

Parameters
  • inShp (string) – input shapefile

  • inRas (string) – input raster

  • bandnames (list) – eg - [‘b’,’g’,’r’,’nir’]

  • statList (list) – the list of stats to calculate

geospatial_learn.utilities module

Created on Thu Sep 8 22:35:39 2016 @author: Ciaran Robb

The utilities module - things here don’t have an exact theme or home yet so may eventually move elsewhere

If you use code to publish work cite/acknowledge me and authors of libs etc as appropriate

utilities.apply_lut(src, lut)
utilities.colorscale(seg, prop='Area', custom=None)

Colour an array according to a region prop value

Parameters
  • seg (np.array) – input array of labelled image

  • prop (string) – a scikit-image region prop

  • custom (list) – a custom list of values to apply to array

Returns

Return type

np array of attributed regions

utilities.combine_grid(inRas1, inRas2, outRas, outShp, min_area=None)
utilities.do_phasecong(tempIm, low_t=0, hi_t=0, norient=6, nscale=6, sigma=2)

Process phase congruency on an image

utilities.fixply(incloud, outcloud, field='scalar_label')

Fix a ply file for use in cgal after cloudcompare

Parameters
  • incloud (string) – path to an input ply

  • outcloud (string) – path to the output ply

  • field (string) – the scalar field to alter

utilities.houghseg(inRas, outShp, edge='canny', sigma=2, thresh=0, ratio=2, n_orient=6, n_scale=5, hArray=True, vArray=True, valrange=1, interval=10, band=2, min_area=None)

Detect and write Hough lines to a line shapefile and create rectangular segments from them

Implemented for the paper Robb et al. (2020), Semi-automated field plot segmentation from UAS imagery for experimental agriculture, Frontiers in Plant Science

Parameters
  • inRas (string) – path to an input raster from which the geo-reffing is obtained

  • outShp (string) – path to the output shapefile

  • edge (string) – edge method ‘canny’ or ‘ph’

  • sigma (float) – scalar value for gaussian smoothing

  • thresh (int/float) – the high hysteresis threshold

  • band (int) – the image band

  • hArray (bool) – axis 1 of the image

  • vArray (bool) – axis 2 of the image

  • min_area (float) – the minimum area of segment to retain

utilities.image_thresh(image)
utilities.imangle(im)

Determine the orientation of non-zero vals in an image

Parameters

im (np array) – input image

Returns

axes – orientations of each side and binary array

Return type

tuple

utilities.iter_ransac(image, sigma=3, no_iter=10, order='col', mxt=2500)
utilities.min_bound_rectangle(points)

Find the smallest bounding rectangle for a set of points. Returns a set of points representing the corners of the bounding box.

Parameters

points (list) – An nx2 iterable of points

Returns

an nx2 list of coordinates

Return type

list

utilities.ms_toposeg(inRas, outShp, iterations=100, algo='ACWE', band=2, dist=30, se=3, usemin=False, imtype=None, useedge=True, burnedge=False, merge=False, close=True, sigma=4, hi_t=None, low_t=None, init=4, smooth=1, lambda1=1, lambda2=1, threshold='auto', balloon=1)

Topology preserving segmentation, implemented in python/numpy, inspired by ms_topo and morphsnakes

This uses morphsnakes level sets to make the segments and param explanations are mainly from there.

Parameters
  • inRas (string) – input raster whose pixel vals will be used

  • outShp (string) – the output shapefile

  • band (int) – an integer val eg - 2

  • algo (string) – either “GAC” (geodesic active contours) or “ACWE” (active contours without edges)

  • sigma (float) – the size of stdv defining the gaussian envelope if using canny edge; a unitless value

  • iterations (uint) – Number of iterations to run.

  • smooth (uint, optional) – Number of times the smoothing operator is applied per iteration. Reasonable values are around 1-4. Larger values lead to smoother segmentations.

  • lambda1 (float, optional) – Weight parameter for the outer region. If lambda1 is larger than lambda2, the outer region will contain a larger range of values than the inner region.

  • lambda2 (float, optional) – Weight parameter for the inner region. If lambda2 is larger than lambda1, the inner region will contain a larger range of values than the outer region.

  • threshold (float, optional) – Areas of the image with a value smaller than this threshold will be considered borders. The evolution of the contour will stop in these areas.

  • balloon (float, optional) – Balloon force to guide the contour in non-informative areas of the image, i.e., areas where the gradient of the image is too small to push the contour towards a border. A negative value will shrink the contour, while a positive value will expand the contour in these areas. Setting this to zero will disable the balloon force.

utilities.ms_toposnakes(inSeg, inRas, outShp, iterations=100, algo='ACWE', band=2, sigma=4, alpha=100, smooth=1, lambda1=1, lambda2=1, threshold='auto', balloon=-1)

Topology preserving morphsnakes, implemented in python/numpy exclusively by C.Robb

This uses morphsnakes and explanations are from there.

Parameters
  • inSeg (string) – input segmentation raster

  • raster_path (string) – input raster whose pixel vals will be used

  • band (int) – an integer val eg - 2

  • algo (string) – either “GAC” (geodesic active contours) or “ACWE” (active contours without edges)

  • sigma (float) – the size of stdv defining the gaussian envelope if using canny edge; a unitless value

  • iterations (uint) – Number of iterations to run.

  • smooth (uint, optional) – Number of times the smoothing operator is applied per iteration. Reasonable values are around 1-4. Larger values lead to smoother segmentations.

  • lambda1 (float, optional) – Weight parameter for the outer region. If lambda1 is larger than lambda2, the outer region will contain a larger range of values than the inner region.

  • lambda2 (float, optional) – Weight parameter for the inner region. If lambda2 is larger than lambda1, the inner region will contain a larger range of values than the outer region.

  • threshold (float, optional) – Areas of the image with a value smaller than this threshold will be considered borders. The evolution of the contour will stop in these areas.

  • balloon (float, optional) – Balloon force to guide the contour in non-informative areas of the image, i.e., areas where the gradient of the image is too small to push the contour towards a border. A negative value will shrink the contour, while a positive value will expand the contour in these areas. Setting this to zero will disable the balloon force.

utilities.ragmerge(inSeg, inRas, outShp, band, thresh=0.02)
Parameters
  • inSeg (string) – Path to Input segmentation raster

  • inRas (string) – Path to underlying raster that will be used to merge segments

  • outShp (string) – Path to output segmentation shape

  • band (int) – The band on which to perform the RAG merge

  • thresh (float) – The RAG merge threshold

utilities.ransac_lines(inRas, outRas, sigma=3, row=True, col=True, binwidth=40)
utilities.raster2array(inRas, bands=[1])

Read a raster and return an array, either single or multiband

Parameters
  • inRas (string) – input raster

  • bands (list) – a list of bands to return in the array

utilities.temp_match(vector_path, raster_path, band, nodata_value=0, ind=None)

Based on polygons, return template matched images

Parameters
  • vector_path (string) – input shapefile

  • raster_path (string) – input raster

  • band (int) – an integer val eg - 2

  • nodata_value (numerical) – If used the no data val of the raster

  • ind (int) – The feature ID to use - if used this will use one feature and rotate it 90° for the second

Returns

Return type

list of template match arrays same size as input

utilities.visual_callback_2d(background, fig=None)

Returns a callback that can be passed as the argument iter_callback of morphological_geodesic_active_contour and morphological_chan_vese for visualizing the evolution of the levelsets. Only works for 2D images.

Parameters
  • background ((M, N) array) – Image to be plotted as the background of the visual evolution.

  • fig (matplotlib.figure.Figure) – Figure where results will be drawn. If not given, a new figure will be created.

Returns

callback – A function that receives a levelset and updates the current plot accordingly. This can be passed as the iter_callback argument of morphological_geodesic_active_contour and morphological_chan_vese.

Return type

Python function

utilities.wipe_ply_field(incloud, outcloud, tfield='training', field='label')

Scrub a field from a ply file

Parameters
  • incloud (string) – path to an input ply

  • outcloud (string) – path to the output ply

  • tfield (string) – the training field

  • field (string) – the field to erase (that was previously full of class values)

geospatial_learn.convnet module

The convnet module.

Description

A module for using pytorch to classify EO data for semantic segmentation and object identification

convnet.chip_pred(inRas, model, outMap, encoder, classes=['1'], tilesize=256, bands=[1, 2, 3], weights='imagenet', device='cuda')

Chip-based prediction of EO imagery. Based on pytorch & albumentations

Parameters
  • inRas (string) – the input raster

  • model (string or pytorch model) – the model to predict

  • outMap (string) – the output classification map

  • encoder (string) – the encoder component of the CNN e.g. resnet34

  • tilesize (int) – the image chip/tile size that will be processed def 256

  • bands (list of ints) – the image bands to use

Notes

This is an early version with some stuff to add/change

convnet.collect_train(masklist, tilelist, outdir, chip_size=256, bands=[1, 2, 3])
Collect and save chips of both mask and image from a list of images for a semantic segmentation task

The list of images must correspond / be in the same order

Parameters
  • masklist (list) – A list of images containing the training masks

  • tilelist (list) – A list of images containing the corresponding spectral info

  • outdir (string) – Where the training chips will be written

  • chip_size (int) – the training “chip” size e.g. 256x256 pixels dependent on the nnet used

Returns

  • A tuple of lists of the respective paths of both masks and corresponding images

convnet.collect_train_chip(masklist, tilelist, outdir, chip_size=256, include_zero=True, bands=[1, 2, 3])

Collect and save chips of an image from a list of masks and images for a chip-based CNN (i.e. we are simply labelling a chip NOT segmenting anything)

Please note that areas of 0 (no mask) will count as a class

The list of images must correspond / be in the same order.

Parameters
  • masklist (list) – A list of images containing the training masks

  • tilelist (list) – A list of images containing the corresponding spectral info

  • outdir (string) – Where the training chips will be written

  • chip_size (int) – the training “chip” size e.g. 256x256 pixels dependent on the nnet used

  • include_zero (bool) – whether to include a non-masked area as class 0

Returns

  • A tuple of lists of the respective paths of both masks and corresponding images

convnet.makewrkdir(directory)
convnet.semseg_pred(inRas, model, outMap, encoder, classes=['1'], tilesize=256, bands=[1, 2, 3], weights='imagenet', device='cuda')

Semantic segmentation of EO-imagery - an early version, things are to be changed in the near future. Based on segmentation_models.pytorch & albumentations

Parameters
  • inRas (string) – the input raster

  • model (string or pytorch model) – the model to predict

  • outMap (string) – the output classification map

  • encoder (string) – the encoder component of the CNN e.g. resnet34

  • tilesize (int) – the image chip/tile size that will be processed def 256

  • bands (list of ints) – the image bands to use

Notes

This is an early version with some stuff to add/change
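
A minimal sketch with hypothetical paths; the encoder is assumed to match the one the model was trained with:

    from geospatial_learn import convnet

    convnet.semseg_pred('scene.tif', 'best_model.pth', 'classmap.tif',
                        'resnet34', classes=['1'], tilesize=256,
                        bands=[1, 2, 3])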

convnet.train_semantic_seg(maindir, plot=False, bands=[1, 2, 3], tilesize=256, f1=False, preTrain=True, proc='cuda:0', activation='softmax2d', classes=['1'], weights='imagenet', modelpth='./best_model.pth', plot_score=True, params={'batch_size': 16, 'classes': 1, 'device': 'cuda', 'encoder': 'resnet34', 'epochs': 50, 'in_channels': 3, 'lr': 0.0001, 'model': 'Unet', 'num_workers': 2}, nt=-1)

Multi-class semantic segmentation of EO-imagery - an early version, things are to be changed in the near future. Based on segmentation_models.pytorch & albumentations

Parameters
  • maindir (string) – the working directory where everything is done

  • modelpth (string) – where to save the model eg ‘dir/best_model.pth’

  • plot (bool) – whether to plot intermediate data results eg visualise the image aug, test results etc.

  • bands (list of ints) – the image bands to use

  • tilesize (int) – the size of the image tile used in training and thus classification

  • classes (list) – a list of strings with the classes eg [‘1’, ‘2’, ‘3’] as labelled in raster

  • weights (string) – the encoder weights, typically imagenet or None for rand init

  • f1 (bool) – whether to save a classification report (will be a plot in working dir)

  • activation (string) – the neural net activation function e.g ‘sigmoid’ for binary or softmax2d for multiclass

  • params (dict) – the convnet model params

    models: Unet, UNet11, UNet16, Linknet, FPN, PSPNet, PAN, DeepLabV3 and DeepLabV3+

    encoders: ‘resnet18’, ‘resnet34’, ‘resnet50’, ‘resnet101’, ‘resnet152’, ‘resnext50_32x4d’, ‘resnext101_32x4d’, ‘resnext101_32x8d’, ‘resnext101_32x16d’, ‘resnext101_32x32d’, ‘resnext101_32x48d’, ‘dpn68’, ‘dpn68b’, ‘dpn92’, ‘dpn98’, ‘dpn107’, ‘dpn131’, ‘vgg11’, ‘vgg11_bn’, ‘vgg13’, ‘vgg13_bn’, ‘vgg16’, ‘vgg16_bn’, ‘vgg19’, ‘vgg19_bn’, ‘senet154’, ‘se_resnet50’, ‘se_resnet101’, ‘se_resnet152’, ‘se_resnext50_32x4d’, ‘se_resnext101_32x4d’, ‘densenet121’, ‘densenet169’, ‘densenet201’, ‘densenet161’, ‘inceptionresnetv2’, ‘inceptionv4’, ‘efficientnet-b0’, ‘efficientnet-b1’, ‘efficientnet-b2’, ‘efficientnet-b3’, ‘efficientnet-b4’, ‘efficientnet-b5’, ‘efficientnet-b6’, ‘efficientnet-b7’, ‘mobilenet_v2’, ‘xception’, ‘timm-efficientnet-b0’, ‘timm-efficientnet-b1’, ‘timm-efficientnet-b2’, ‘timm-efficientnet-b3’, ‘timm-efficientnet-b4’, ‘timm-efficientnet-b5’, ‘timm-efficientnet-b6’, ‘timm-efficientnet-b7’, ‘timm-efficientnet-b8’, ‘timm-efficientnet-l2’

Notes

This is an early version with some stuff to add/change
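
A minimal sketch; the working directory is hypothetical and the params dict simply mirrors the defaults in the signature:

    from geospatial_learn import convnet

    params = {'model': 'Unet', 'encoder': 'resnet34', 'in_channels': 3,
              'classes': 1, 'epochs': 50, 'lr': 0.0001, 'batch_size': 16,
              'device': 'cuda', 'num_workers': 2}
    convnet.train_semantic_seg('workdir', bands=[1, 2, 3], tilesize=256,
                               classes=['1'], params=params,
                               modelpth='workdir/best_model.pth')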

Module contents