geospatial_learn package
Submodules
geospatial_learn.raster module
The raster module.
Description
A series of tools for the manipulation of geospatial imagery/rasters, such as masking or raster-algebra type functions, and the conversion of Sentinel-2 data to GDAL-compatible formats.
raster.array2raster(array, bands, inRaster, outRas, dtype, FMT=None)
Save a raster from a numpy array using the geoinfo from another; a usage sketch follows the parameters.
- Parameters
array (np array) – a numpy array.
bands (int) – the no of bands.
inRaster (string) – the path of a raster.
outRas (string) – the path of the output raster.
dtype (int) – a GDAL datatype (see the GDAL website), e.g. gdal.GDT_Int32; note you need to know what the integer represents
FMT (string) – (optional) a GDAL raster format (see the GDAL website) eg Gtiff, HFA, KEA.
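A minimal usage sketch (the file paths and the thresholding step are illustrative only): read an array with raster2array, derive something from it, then write it back out with the georeferencing of the source raster.

    import numpy as np
    from osgeo import gdal
    from geospatial_learn import raster

    # read band 1 of an existing raster into a numpy array
    arr = raster.raster2array('scene.tif', bands=[1])

    # an illustrative calculation - a simple binary threshold
    out = np.where(arr > 100, 1, 0)

    # write the result using the geoinfo of the source raster, as 32-bit int Gtiff
    raster.array2raster(out, 1, 'scene.tif', 'scene_thresh.tif',
                        gdal.GDT_Int32, FMT='Gtiff')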
raster.batch_gdaldem(inlist, prop='aspect')
Batch DEM-derivative calculation on a list of GDAL-readable files, writing tif output.
- Parameters
inlist (string) – A list of raster paths
prop (string) – one of “hillshade”, “slope”, “aspect”, “color-relief”, “TRI”, “TPI”, “Roughness”
- Returns
- Return type
List of file paths
raster.batch_translate(folder, wildcard, FMT='Gtiff')
Using the gdal python API, this function translates files to commonly used formats.
- Parameters
folder (string) – the folder containing the rasters to be translated
wildcard (string) – the format wildcard to search for e.g. ‘.tif’
FMT (string (optional)) – a GDAL raster format (see the GDAL website) eg Gtiff, HFA, KEA
raster.batch_translate_adf(inlist)
Batch translate a list of adf (ArcGIS) files to tif.
- Parameters
inlist (string) – A list of raster paths
- Returns
- Return type
List of file paths
raster.batch_wms_download(gdf, wms, layer, img_size, outdir, attribute='id', espg='27700', res=0.25)
Download a set of WMS tiles with georeferencing.
- Parameters
gdf (geopandas gdf)
wms (string) – the wms address
layer (string) – the wms layer
img_size (tuple) – image x,y dims
espg (string) – the projection EPSG code
outdir (string) – path to the output directory
res (int) – per pixel resolution of imagery in metres
raster.bbox2raster(array, bands, bbox, outras, pixel_size=0.25, proj=27700, dtype=5, FMT='Gtiff')
Using a bounding box and other information, georeference an array and write it to disk.
- Parameters
array (np array) – a numpy array.
bands (int) – the no of bands.
bbox (list or tuple) – xmin, ymin, xmax, ymax
pixel_size (int) – pixel size in metres (unless proj is degrees!)
outras (string) – the path of the output raster.
proj (int) – the EPSG code, e.g. 27700 for OSGB
dtype (int) – a GDAL datatype (see the GDAL website), e.g. gdal.GDT_Int32 = 5; note you need to know what the integer represents
FMT (string) – (optional) a GDAL raster format (see the GDAL website) eg Gtiff, KEA.
raster.calc_ndvi(inputIm, outputIm, bandsList, blocksize=256, FMT=None, dtype=None)
Create a copy of an image with an NDVI band added.
- Parameters
inputIm (string) – the granule folder
bands (list) – a list of band indices to be used, e.g. [3,4] for Sentinel-2 data
FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’
blocksize (int) – the chunk of raster read in & write out
raster.clip_raster(inRas, inShp, outRas, cutline=True)
Clip a raster; a usage sketch follows the parameters.
- Parameters
inRas (string) – the input image
inShp (string) – the input polygon file path
outRas (string (optional)) – the clipped raster
cutline (bool (optional)) – retain raster values only inside the polygon
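A minimal usage sketch (paths are placeholders):

    from geospatial_learn import raster

    # clip 'scene.tif' to 'aoi.shp'; cutline=True retains values only inside
    # the polygon rather than just cropping to its bounding box
    raster.clip_raster('scene.tif', 'aoi.shp', 'scene_clip.tif', cutline=True)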
raster.color_raster(inRas, color_file, output_file)
Generate a txt colorfile and make an RGB image from a grayscale one.
- Parameters
inRas (string) – Path to input raster (single band greyscale)
color_file (string) – Path to output colorfile.txt
raster.combine_scene(scl, c_scn, blocksize=256)
Combine another scene classification with the sen2cor one.
- Parameters
scl (string) – the sen2cor one
c_scn (string) – the independently derived one - this will be modified
blocksize (int) – the chunk size to process
raster.hist_match(inputImage, templateImage)
Adjust the pixel values of a grayscale image such that its histogram matches that of a target image.
Writes to the inputImage dataset so that it matches
As the entire band histogram is required this can become memory intensive with big rasters eg 10 x 10k+
Inspired by/adapted from a Stack post on image processing; credit to that author.
- Parameters
inputImage (string) – image to transform; the histogram is computed over the flattened array
templateImage (string) – template image can have different dimensions to source
raster.mask_raster(inputIm, mval, overwrite=True, outputIm=None, blocksize=None, FMT=None)
Perform a numpy masking operation on a raster where all values corresponding to the mask value are retained; done in blocks for efficiency on larger rasters.
- Parameters
inputIm (string) – the input raster
mval (int) – the mask value eg 1, 2 etc
FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’
outputIm (string (optional)) – optionally write a separate output image, if None, will mask the input
blocksize (int) – the chunk of raster to read in
- Returns
A string of the output file path
- Return type
string
raster.mask_raster_multi(inputIm, mval=1, rule='==', outval=None, mask=None, blocksize=256, FMT=None, dtype=None)
Perform a numpy masking operation on a raster where all values equal to, less than or greater than the mask value are retained; done in blocks for efficiency on larger rasters. A usage sketch follows the parameters.
- Parameters
inputIm (string) – the granule folder
mval (int) – the masking value that delineates pixels to be kept
rule (string) – the logic for masking either ‘==’, ‘<’ or ‘>’
outval (numerical dtype, e.g. int or float) – the value that removed areas will be written to; default is 0
mask (string) – the mask raster to be used (optional)
FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’
mode (string) – None > 10m data, ‘20’ >20m
blocksize (int) – the chunk of raster read in & write out
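A minimal usage sketch (paths and the mask value are placeholders):

    from geospatial_learn import raster

    # keep only pixels equal to 1, writing everything else to 0 (the default
    # outval), processing the raster in 256-pixel blocks
    raster.mask_raster_multi('classmap.tif', mval=1, rule='==', blocksize=256)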
raster.mask_with_poly(vector_path, raster_path, value=0)
Change raster values inside a polygon and update the raster.
- Parameters
vector_path (string) – input shapefile
raster_path (string) – input raster
value (int) – the value to alter
raster.multi_temp_filter(inRas, outRas, bands=None, windowSize=None)
The multi-temporal filter for SAR data as outlined and published by Quegan et al.
This is only suitable for small images, as it holds intermediate data in memory
- Parameters
inRas (string) – the input raster
outRas (string) – the output raster
blocksize (int) – the chunk processed
windowsize (int) – the filter window size
FMT (string) – gdal compatible (optional) defaults is tif
raster.polygonize(inRas, outPoly, outField=None, mask=True, band=1, filetype='ESRI Shapefile')
Polygonise a raster; a usage sketch follows the parameters.
- Parameters
inRas (string) – the input image
outPoly (string) – the output polygon file path
outField (string (optional)) – the name of the field containing burned values
mask (bool (optional)) – use the input raster as a mask
band (int) – the input raster band
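A minimal usage sketch (paths and the field name are placeholders):

    from geospatial_learn import raster

    # polygonise band 1 of a classification raster, using the raster itself
    # as the mask and writing the burned values to a 'DN' field
    raster.polygonize('classmap.tif', 'classmap.shp', outField='DN',
                      mask=True, band=1)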
raster.raster2array(inRas, bands=[1])
Read a raster and return an array, either single or multiband.
- Parameters
inRas (string) – input raster
bands (list) – a list of bands to return in the array
raster.rasterize(inShp, inRas, outRas, field=None, fmt='Gtiff')
Rasterize a polygon to the extent and geotransform of another raster; a usage sketch follows the parameters.
- Parameters
inRas (string) – the input image
outRas (string) – the output raster file path
field (string (optional)) – the name of the field containing burned values; if None, 1s will be burned in
fmt (string) – the gdal image format
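A minimal usage sketch (paths and the field name are placeholders):

    from geospatial_learn import raster

    # burn the 'class_id' attribute of the polygons into a raster that matches
    # the extent and geotransform of 'template.tif'
    raster.rasterize('training.shp', 'template.tif', 'training_mask.tif',
                     field='class_id', fmt='Gtiff')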
raster.remove_cloud_S2(inputIm, sceneIm, blocksize=256, FMT=None, min_size=4, dist=1)
Remove cloud using a scene classification.
This saves back to the input raster by default
- Parameters
inputIm (string) – the input image
sceneIm (string) – the scenemap to use as a mask for removing cloud It is assumed the scene map consists of 1 shadow, 2 cloud, 3 land, 4 water
FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’
min_size (int) – size in pixels to retain of cloud mask
blocksize (int) – the square chunk processed at any one time
raster.remove_cloud_S2_stk(inputIm, sceneIm1, sceneIm2=None, baseIm=None, blocksize=256, FMT=None, max_size=10, dist=1)
Remove cloud using a classification where cloud == 1. Esoteric - from the Forest Sentinel project, but retained here.
- Parameters
inputIm (string) – the input image
sceneIm1, sceneIm2 (string) – the classification rasters used to mask out the areas in the input
baseIm (string) – Another multiband raster of same size extent as the inputIm where the baseIm image values are used rather than simply converting to zero (in the use case of 2 sceneIm classifications)
Notes
Useful if you have a base image which is a cloudless composite, which you intend to replace with the current image for the next round of classification/change detection.
raster.rgb_ind(inputIm, outputIm, blocksize=256, FMT=None, dtype=5)
Create a copy of an image with RGB indices added.
- Parameters
inputIm (string) – the input rgb image
outputIm (string) – the output image
FMT (string) – the output gdal format eg ‘Gtiff’, ‘KEA’, ‘HFA’
blocksize (int) – the chunk of raster read in & write out
raster.srtm_gdaldem(inlist, prop='aspect')
Batch DEM calculation on a list of SRTM files.
SRTM scale & z factor vary across the globe so this calculates based on latitude
- Parameters
inlist (string) – A list of raster paths
prop (string) – one of “hillshade”, “slope”, “aspect”, “color-relief”, “TRI”, “TPI”, “Roughness”
- Returns
- Return type
List of file paths
raster.stack_ras(rasterList, outFile)
Stack some rasters.
- Parameters
rasterList (list of strings) – the input rasters
outFile (string) – the output file path including file extension
raster.stat_comp(inRas, outMap, bandList=None, stat='percentile', q=95, blocksize=256, FMT=None, dtype=6)
Calculate a depth-wise statistic on a multiband raster with selected or all bands.
- Parameters
inRas (string) – input Raster
outMap (string) – the output raster calculated
stat (string) – the statistic to be calculated; make sure there are no nans, as nan percentile is far too slow
blocksize (int) – the chunk processed
q (int) – the ith percentile if percentile is the stat used
FMT (string) – gdal compatible (optional) defaults is tif
dtype (string) – gdal datatype (default gdal.GDT_Int32)
raster.temporal_comp(fileList, outMap, stat='percentile', q=95, folder=None, blocksize=256, FMT=None, dtype=5)
Calculate an image based on a time-series collection of imagery (e.g. a year's worth of S2 data); a usage sketch follows the parameters.
- Parameters
fileList (list of strings) – the files to be input; if None, a folder must be specified
outMap (string) – the output raster calculated
stat (string) – the statistic to be calculated
blocksize (int) – the chunk processed
q (int) – the ith percentile if percentile is the stat used
FMT (string) – gdal compatible (optional) defaults is tif
dtype (string) – gdal datatype (default gdal.GDT_Int32)
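A minimal usage sketch (the folder and file pattern are placeholders):

    from glob import glob
    from geospatial_learn import raster

    # build a 95th-percentile composite from a year's worth of imagery
    files = sorted(glob('s2_timeseries/*.tif'))
    raster.temporal_comp(files, 'composite_p95.tif', stat='percentile', q=95,
                         blocksize=256)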
raster.tile_rasters(inRas, outDir, tilesize=['256', '256'], overlap='0')
Split a large raster into smaller ones.
- Parameters
inRas (string) – the path to input raster
outDir (string) – the path to the output dir
tilesize (list of str) – the sides of a square tile [“256”, “256”]
overlap (string) – the overlap per tile, if required
raster.wmsGrabber(bbox, image_size, wms, layer, outfile, espg='27700', res=0.25)
Return a WMS tile from a given source and optionally write to disk with georeferencing.
- Parameters
bbox (list or tuple) – xmin, ymin, xmax, ymax
image_size (tuple) – image x,y dims
wms (string) – the wms address
layer (string) – the wms (sub)layer
espg (string) – the projection EPSG code
outfile (string) – path to outfile, if None only array is returned
raster.write_vrt(infiles, outfile)
- Parameters
infiles (list of strings) – the input rasters
outfile (string) – the output .vrt
geospatial_learn.learning module
The learning module
Description
The learning module's set of functions provides a framework to optimise and classify EO data for both per-pixel and object properties.
learning.RF_oob_opt(model, X_train, min_est, max_est, step, regress=False)
This function uses the oob score to find the best parameters.
This cannot be parallelized due to the warm start bootstrapping, so is potentially slower than the other cross val in the create_model function
This function is based on an example from the sklearn site
This function plots a graph displaying the oob rate
- Parameters
model (string (.gz)) – path to model to be saved
X_train (np array) – numpy array of training data where the 1st column is labels
min_est (int) – min no of trees
max_est (int) – max no of trees
step (int) – the step at which no of trees is increased
regress (bool) – boolean where if True it is a regressor
- Returns
error rate, best estimator
- Return type
tuple of np arrays
learning.classify_object(model, inShape, attributes, field_name=None, write='gpd')
Classify the attributes of a polygon/point file ('object based') using an sklearn model.
- Parameters
model (string) – path to input model
inShape (string) – input shapefile path (must be .shp for now….)
attributes (list of strings) – list of attribute names
field_name (string) – name of classified label field (optional)
write (string) – either gpd(geopandas) or ogr
learning.classify_pixel(model, inputDir, bands, outMap, probMap)
A function to classify an image using a pre-saved model; assumes a folder of tiled rasters for memory management. classify_pixel_bloc is recommended instead of this function.
- Parameters
model (sklearn model) – a path to a scikit learn model that has been saved
inputDir (string) – a folder with images to be classified
bands (int) – the no of image bands eg 8
outMap (string) – path to output image excluding the file format 'pathto/mymap'
probMap (string) – path to output prob image excluding the file format 'pathto/mymap'
FMT (string) – optional parameter - gdal readable fmt
learning.classify_pixel_bloc(model, inputImage, outMap, bands=[1, 2, 3], blocksize=None, FMT=None, ndvi=None, dtype=5)
A block-processing classifier for large rasters; supports KEA, HFA, & Gtiff formats. KEA is recommended, Gtiff is the default. A usage sketch follows the notes.
- Parameters
model (sklearn model / keras model) – a path to a model that has been saved
inputImage (string) – path to image including the file fmt ‘Myimage.tif’
bands (band) – list of band indices to be used eg [1,2,3]
outMap (string) – path to output image excluding the file format ‘pathto/mymap’
FMT (string) – optional parameter - gdal readable fmt
blocksize (int (optional)) – size of raster chunk in pixels; 256 tends to be quickest. If you put None it will read the block size from gdal (this doesn't always pay off!)
dtype (int (optional - gdal syntax gdal.GDT_Int32)) – a gdal dataype - default is int32
Notes
Block processing is sequential, but quite a few sklearn models are parallel so that has been prioritised rather than raster IO
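A minimal usage sketch (paths are placeholders; the model is assumed to have been saved by create_model):

    from geospatial_learn import learning

    # classify a 3-band image block by block with a previously saved model
    learning.classify_pixel_bloc('model.gz', 'scene.tif', 'classmap.tif',
                                 bands=[1, 2, 3], blocksize=256)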
learning.classify_ply(incld, inModel, train_field='training', class_field='label', rgb=True, outcld=None, ignore=['x', 'y', 'scalar_ScanAngleRank', 'scalar_NumberOfReturns', 'scalar_ReturnNumber', 'scalar_GpsTime', 'scalar_PointSourceId'])
Classify a point cloud (ply format).
- Parameters
incld (string) – the input point cloud
class_field (string) – the name of the field that the results will be written to this must already exist! Create in CldComp. or cgal
train_field (string) – the name of the training label field so it can be ignored
rgb (bool) – whether there is rgb data to be included
outcld (string) – path to a new ply to write if not writing to the input one
ignore (list) – the pointcloud attributes to ignore for classification
learning.create_model(X_train, outModel, clf='erf', group=None, random=False, cv=5, cores=-1, strat=True, test_size=0.3, regress=False, params=None, scoring=None, class_names=None, save=True)
Brute force or random model creation using scikit-learn. Either use the default params in this function or enter your own (recommended - see sklearn). A usage sketch follows the notes on scoring.
- Parameters
X_train (np array) – numpy array of training data where the 1st column is labels. If the groupkfold is used, the last column will be the group labels
outModel (string) – the output model path which is a gz file, if using keras it is h5
params (dict) – a dict of model params (see scikit learn)
clf (string) – an sklearn or xgb classifier/regressor logit, sgd, linsvc, svc, svm, nusvm, erf, rf, gb, xgb,
group (np.array) – array of group labels for train/test split and grid search useful to avoid autocorrelation
random (bool) – if True, a random param search
cv (int) – no of folds
cores (int or -1 (default)) – the no of parallel jobs
strat (bool) – a stratified grid search
test_size (float) – percentage to hold out to test
regress (bool) – a regression model if True, a classifier if False
scoring (string) – a suitable sklearn scoring type (see notes)
class_names (list of strings) – class names in order of their numerical equivalents
- Returns
A list of
[grid.best_estimator_, grid.cv_results_, grid.best_score_, grid.best_params_, classification_report]
Scoring types - there are a lot - some of which won’t work for multi-class, regression etc - see the sklearn docs!
‘accuracy’, ‘adjusted_rand_score’, ‘average_precision’, ‘f1’, ‘f1_macro’, ‘f1_micro’, ‘f1_samples’, ‘f1_weighted’, ‘neg_log_loss’, ‘neg_mean_absolute_error’, ‘neg_mean_squared_error’, ‘neg_median_absolute_error’, ‘precision’, ‘precision_macro’, ‘precision_micro’, ‘precision_samples’, ‘precision_weighted’, ‘r2’, ‘recall’, ‘recall_macro’, ‘recall_micro’, ‘recall_samples’, ‘recall_weighted’, ‘roc_auc’
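A minimal per-pixel workflow sketch (the paths, band count and label field are placeholders): collect training with get_training, then fit an extra randomised forest with a 5-fold search.

    from geospatial_learn import learning

    # training data: first column is the labels taken from the 'class_id' field
    X_train = learning.get_training('training.shp', 'scene.tif', 8, 'class_id',
                                    outFile='training.gz')[0]

    # brute-force search over the default erf params, holding out 30% for testing
    results = learning.create_model(X_train, 'model.gz', clf='erf', cv=5,
                                    cores=-1, test_size=0.3)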
learning.create_model_autosk(X_train, outModel, cores=-1, class_names=None, incld_est=None, excld_est=None, incld_prep=None, excld_prep=None, total_time=120, res_args={'cv': 5}, mem_limit=None, per_run=None, test_size=0.3, wrkfolder=None, scoring=None, save=True, ply=False)
Auto-sklearn to create a model.
- Parameters
X_train (np array) – numpy array of training data where the 1st column is labels
outModel (string) – the output model path which is a gz file, if using keras it is h5
cores (int or -1 (default)) – the no of parallel jobs
class_names (list of strings) – class names in order of their numerical equivalents
incld_est (list of strings) – estimators to include e.g. ['random_forest']
excld_est (list of strings) – estimators to exclude e.g. ['random_forest']
incld_prep (list of strings) – preprocessing to include
excld_prep (list of strings) – preprocessing to exclude
total_time (int) – time in seconds for the whole search process
res_args (dict) – strategy for overfit avoidance e.g. {‘cv’:5}
mem_limit (int) – memory limit per job
per_run (int) – time limit per run
test_size (float) – percentage to hold out to test
wrkfolder (string) – path to dir for intermediate working
scoring (string) – a suitable sklearn scoring type (see notes)
- Returns
A list of
[model, classif_report]
learning.create_model_tpot(X_train, outModel, gen=5, popsize=50, group=None, cv=5, cores=-1, dask=False, test_size=0.2, regress=False, params=None, scoring=None, verbosity=2, warm_start=False)
Create a model using the tpot library, where genetic algorithms are used to optimise the pipeline and params.
- Parameters
X_train (np array) – numpy array of training data where the 1st column is labels
outModel (string) – the output model path (which is a .py file) from which to run the pipeline
cv (int) – no of folds
cores (int or -1 (default)) – the no of parallel jobs
regress (bool) – a regression model if True, a classifier if False
test_size (float) – size of test set held out
params (dict) – enter your own params dict rather than the range provided (see tpot), e.g. {'sklearn.ensemble.RandomForestClassifier': {'n_estimators': [200], 'max_features': ['sqrt', 'log2'], 'max_depth': [10, None]}, 'xgboost.sklearn.XGBClassifier': {'n_estimators': [200], 'learning_rate': [0.1, 0.2, 0.4]}}
scoring (string) – a suitable sklearn scoring type (see notes)
warm_start (bool) – use the previous population, useful if interactive
learning.get_polars(inShp, polars=['VV', 'VH'])
Get a list of fields containing polarisations from a polygon/point file.
- Parameters
inShp (string) – the input polygon
polars (list of strings) – the attributes headed with polarisations eg ‘VV’
learning.get_training(inShape, inRas, bands, field, outFile=None)
Collect training as an np array for use with the create_model function.
- Parameters
inShape (string) – the input shapefile - must be esri .shp at present
inRas (string) – the input raster from which the training is extracted
bands (int) – no of bands
field (string) – the attribute field containing the training labels
outFile (string (optional)) – path to the training file saved as joblib format (eg - ‘training.gz’)
- Returns
A tuple containing
-np array of training data
-list of polygons with invalid geometry that were not collected
learning.get_training_ply(incld, label_field='training', classif_field='label', rgb=True, outFile=None, ignore=['x', 'y', 'scalar_ScanAngleRank', 'scalar_NumberOfReturns', 'scalar_ReturnNumber', 'scalar_GpsTime', 'scalar_PointSourceId'])
Get training from a point cloud.
- Parameters
incld (string) – the input point cloud
label_field (string) – the name of the field representing the training points which must be positive integers
classif_field (string) – the name of the field that will be used for classification later must be specified so it can be ignored
rgb (bool) – whether there is rgb data to be included
outFile (string) – path to training array to be saved as .gz via joblib
ignore (list) – the pointcloud attributes to ignore for training
- Returns
np array of training where first column is labels
list of feature names for later ref/plotting
learning.get_training_shp(inShape, label_field, feat_fields, outFile=None)
Collect training from a shapefile attribute table. Used (typically) for object-based classification. A usage sketch follows the returns.
- Parameters
inShape (string) – the input polygon
label_field (string) – the field name for the class labels
feat_fields (list) – the field names of the feature data
outFile (string (optional)) – path to training data to be saved (.gz)
- Returns
training data as a dataframe, first column is labels, rest are features
list of reject features
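A minimal object-based workflow sketch (the field names are placeholders, and converting the returned dataframe to a numpy array for create_model is an assumption):

    from geospatial_learn import learning

    feats = ['b_mean', 'g_mean', 'r_mean', 'nir_mean']  # illustrative attribute names
    train, rejects = learning.get_training_shp('training_polys.shp', 'class_id',
                                               feats, outFile='obj_training.gz')

    # fit a random forest on the attribute table, then classify another polygon file
    learning.create_model(train.to_numpy(), 'obj_model.gz', clf='rf', cv=5)
    learning.classify_object('obj_model.gz', 'segments.shp', feats,
                             field_name='pred', write='gpd')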
learning.plot_feature_importances(modelPth, featureNames, model_type='scikit')
Plot the feature importances of an ensemble classifier.
- Parameters
modelPth (string) – A sklearn model path
featureNames (list of strings) – a list of feature names
learning.ply_features(incld, outcld=None, k=[50, 100, 200], props=['anisotropy', 'curvature', 'eigenentropy', 'eigen_sum', 'linearity', 'omnivariance', 'planarity', 'sphericity'], nrm_props=None)
Calculate point cloud features and write to file.
Currently memory intensive due to using pyntcloud….
- Parameters
incld (string) – the input point cloud
outcld (string) – the output point cloud
k (list) – the numbers of neighbours to use when calculating the props; multiple values are more effective
props (list) – the properties you wish to include
nrm_props (list) – properties based on normals if they exist (this will fail if they don't) e.g. ['inclination_radians', 'orientation_radians']
learning.prob_pixel_bloc(model, inputImage, bands, outMap, classes, blocksize=None, FMT=None, one_class=None)
A block-processing classifier for large rasters that produces a probability output. Supports KEA, HFA, & Gtiff formats; KEA is recommended, Gtiff is the default.
- Parameters
model (string) – a path to a scikit learn model that has been saved
inputImage (string) – path to image including the file fmt ‘Myimage.tif’
bands (int) – the no of image bands eg 8
outMap (string) – path to output image excluding the file format ‘pathto/mymap’
classes (int) – no of classes
blocksize (int (optional)) – size of raster chunk; 256 tends to be quickest. If you put None it will read the block size from gdal (this doesn't always pay off!)
FMT (string) – optional parameter - gdal readable fmt eg ‘Gtiff’
one_class (int) – choose a single class to produce output prob raster
Block processing is sequential, but quite a few sklearn models are parallel so that has been prioritised rather than raster IO
learning.regression_results(y_true, y_pred)
learning.rmse_vector_lyr(inShape, attributes)
Using sklearn, get the rmse of 2 vector attributes (the actual and predicted, in the order ['actual', 'pred']).
- Parameters
inShape (string) – the input vector of OGR type
attributes (list) – a list of strings denoting the attributes
geospatial_learn.shape module
The shape module.
Description
This module contains various functions for writing data in OGR vector formats. The functions are mainly concerned with writing geometric or pixel-based attributes, with a view to them being classified in the learning module.
shape.buffer(inShp, outfile, dist)
Buffer a shapefile by a given distance, outputting a new one.
- Parameters
inShp (string) – input shapefile
outfile (string) – output shapefile
dist (float) – the distance in map units to buffer
shape.create_ogr_poly(outfile, spref, file_type='ESRI Shapefile', field='id', field_dtype=0)
Create an ogr dataset and layer (convenience).
- Parameters
outfile (string) – path to ogr file
spref (wkt or int) – spatial reference, either a WKT string or an EPSG code
file_type (string) – ogr file designation
field (string) – attribute field e.g. “id”
field_type (int or ogr.OFT…..) – ogr dtype of field e.g. 0 == ogr.OFTInteger
shape.extent2poly(infile, filetype='raster', outfile=None, polytype='ESRI Shapefile', geecoord=False, lyrtype='ogr')
Get the coordinates of a file's extent and return an ogr polygon ring, with the option to save the file. A usage sketch follows the returns.
- Parameters
infile (string) – input ogr compatible geometry file or gdal raster
filetype (string) – the path of the output file, if not specified, it will be input file with ‘extent’ added on before the file type
outfile (string) – the path of the output file, if not specified, it will be input file with ‘extent’ added on before the file type
polytype (string) – ogr compatible file type (see gdal/ogr docs), default 'ESRI Shapefile'; ensure your outfile string has the equiv. e.g. '.shp', or in case of memory only 'Memory' (outfile would be None in that case)
geecoord (bool) – optionally convert to WGS84 lat,lon
lyrtype (string) – either ‘gee’ which means earth engine or ‘ogr’ which returns ds and lyr
- Returns
- Return type
a GEE polygon geometry or ogr dataset and layer
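A minimal usage sketch (paths are placeholders):

    from geospatial_learn import shape

    # write the extent of a raster as a shapefile polygon
    shape.extent2poly('scene.tif', filetype='raster', outfile='scene_extent.shp',
                      polytype='ESRI Shapefile')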
shape.filter_shp(inShp, expression, outField, outLabel)
Filter and index an OGR polygon file's features by attribute.
Potentially useful for rule sets or prepping a subsidiary underlying raster operation
- Parameters
inShp (string) – input shapefile
expression (string) – expression e.g. “DN >= 168”
outField (string) – the field in which the label will reside
outLabel (int) – the label identifying the filtered features
shape.geom2pixelbbox(inshp, inras, label='Tree', outfile=None)
Convert shapefile geometries to a df of pixel bounding boxes. Projections must be the same!
- Parameters
inshp (string) – input ogr compatible geometry
inras (string) – input raster
label (string) – label name def. ‘Tree’
outfile (string) – path to save annotation csv
shape.mesh_from_raster(inras, outshp=None, band=1)
shape.meshgrid(inRaster, outShp, gridHeight=1, gridWidth=1)
shape.ms_snake(inShp, inRas, outShp, band=2, buf1=0, buf2=0, algo='ACWE', nodata_value=0, iterations=200, smoothing=1, lambda1=1, lambda2=1, threshold='auto', balloon=-1)
Deform a polygon using active contours on the values of an underlying raster.
This uses morphsnakes and explanations are from there.
- Parameters
inShp (string) – input shapefile
inRas (string) – input raster
outShp (string) – output shapefile
band (int) – an integer val eg - 2
algo (string) – either “GAC” (geodesic active contours) or the default “ACWE” (active contours without edges)
buf1 (int) – the buffer if any in map units for the bounding box of the poly which extracts underlying pixel values.
buf2 (int) – the buffer if any in map units for the expansion or contraction of the poly which will initialise the active contour. This is here as you may wish to adjust the init polygon so it does not converge on an adjacent one or undesired area.
nodata_value (numerical) – If used the no data val of the raster
iterations (uint) – Number of iterations to run.
smoothing (uint, optional) – Number of times the smoothing operator is applied per iteration. Reasonable values are around 1-4. Larger values lead to smoother segmentations.
lambda1 (float, optional) – Weight parameter for the outer region. If lambda1 is larger than lambda2, the outer region will contain a larger range of values than the inner region.
lambda2 (float, optional) – Weight parameter for the inner region. If lambda2 is larger than lambda1, the inner region will contain a larger range of values than the outer region.
threshold (float, optional) – Areas of the image with a value smaller than this threshold will be considered borders. The evolution of the contour will stop in these areas.
balloon (float, optional) – Balloon force to guide the contour in non-informative areas of the image, i.e., areas where the gradient of the image is too small to push the contour towards a border. A negative value will shrink the contour, while a positive value will expand the contour in these areas. Setting this to zero will disable the balloon force.
shape.poly2dictlist(inShp)
Convert an ogr file to a list of JSON-like dicts.
shape.rasterext2poly(inras)
shape.shape_props(inShape, prop, inRas=None, label_field='ID')
Calculate various geometric properties of a set of polygons. Output will be relative to geographic units where relevant, but normalised where not (e.g. Eccentricity).
- Parameters
inShape (string) – input shape file path
inRas (string) – a raster to get the correct dimensions from (optional), required for scikit-image props
prop (string) – a scikit-image regionprops property (see http://scikit-image.org/docs/dev/api/skimage.measure.html). OGR is used to generate most of these as it is faster, but the string keys are the same as scikit-image; see notes for which require a raster
Notes
Only shape file needed (OGR / shapely / numpy based)
'MajorAxisLength', 'MinorAxisLength', 'Area', 'Eccentricity', 'Solidity', 'Extent': 'Extent', 'Perimeter': 'Perim'
Raster required
- 'Orientation' and the remainder of props calculable with scikit-image. These process a bit slower than the above ones.
shape.shp2gj(inShape, outJson)
Converts a shapefile to geojson/json.
- Parameters
inShape (string) – input shapefile
outJson (string) – output geojson
Notes
Credit to person who posted this on the pyshp site
shape.snake(inShp, inRas, outShp, band=1, buf=1, nodata_value=0, boundary='fixed', alpha=0.1, beta=30.0, w_line=0, w_edge=0, gamma=0.01, max_iterations=2500, smooth=True, eq=False, rgb=False)
Deform a line using active contours based on the values of an underlying raster; based on skimage at present so not quick!
Notes
Param explanations for snake/active contour from scikit-image api
- Parameters
inShp (string) – input shapefile
inRas (string) – input raster
band (int) – an integer val eg - 2
buf (int) – the buffer area to include for the snake deformation
alpha (float) – Snake length shape parameter. Higher values makes snake contract faster.
beta (float) – Snake smoothness shape parameter. Higher values makes snake smoother.
w_line (float) – Controls attraction to brightness. Use negative values to attract toward dark regions.
w_edge (float) – Controls attraction to edges. Use negative values to repel snake from edges.
gamma (float) – Explicit time stepping parameter.
max_iterations (int) – No of iterations to evolve snake
boundary (string) – Scikit-image text: Boundary conditions for the contour. Can be one of ‘periodic’, ‘free’, ‘fixed’, ‘free-fixed’, or ‘fixed-free’. ‘periodic’ attaches the two ends of the snake, ‘fixed’ holds the end-points in place, and ‘free’ allows free movement of the ends. ‘fixed’ and ‘free’ can be combined by parsing ‘fixed-free’, ‘free-fixed’. Parsing ‘fixed-fixed’ or ‘free-free’ yields same behaviour as ‘fixed’ and ‘free’, respectively.
nodata_value (numerical) – If used the no data val of the raster
rgb (bool) – read in bands 1-3 assuming them to be RGB
shape.sqlfilter(inShp, sql)
Return an OGR layer via sql statement for some further analysis.
See https://gdal.org/user/ogr_sql_dialect.html for examples
Notes
An OS Master map example
“SELECT * FROM TopographicArea WHERE DescriptiveGroup=’General Surface’”
- Parameters
inShp (string) – input shapefile
sql (string) – sql expression (ogr dialect)
- Returns
- Return type
ogr lyr
shape.texture_stats(inShp, inRas, band, gprop='contrast', offset=2, angle=0, write_stat=None, nodata_value=0, mean=False)
Calculate and optionally write texture stats for an OGR compatible polygon based on underlying raster values.
- Parameters
inShp (string) – input shapefile
inRas (string) – input raster path
gprop (string) – a skimage glcm property: entropy, contrast, dissimilarity, homogeneity, ASM, energy, correlation
offset (int) – distance in pixels to measure - minimum of 2!!!
angle (int) – angle in degrees from pixel (int)
mean (bool) – take the mean of all offsets
Important to note that the results will be unreliable for glcm texture features if seg is true, as non-masked values will be zero or some weird no data and will affect results.
Notes
Important
The texture of the bounding box is at present the "reliable" measure.
Using the segment only results in potentially spurious results due to the scikit-image algorithm measuring texture over zero/nodata to number pixels (e.g 0>54). The segment part will be developed in due course to overcome this issue
shape.thresh_seg(inShp, inRas, outShp, band, buf=0, algo='otsu', min_area=4, nodata_value=0)
Use an image processing technique to threshold foreground and background in a polygon segment.
The default is Otsu's method.
- Parameters
inShp (string) – input shapefile
inRas (string) – input raster
band (int) – an integer val eg - 2
algo (string) – ‘otsu’, niblack, sauvola
nodata_value (numerical) – If used the no data val of the raster
shape.write_id_field(inShape, fieldName='id')
Write an id field to an ogr vector file.
- Parameters
inShape (string) – input OGR vector file
fieldName (string) – name of field being written
shape.write_text_field(inShape, fieldName, attribute)
Write a string to an ogr vector file.
- Parameters
inShape (string) – input OGR vector file
fieldName (string) – name of field being written
attribute (string) – ‘text to enter in each entry of column’
shape.zonal_point(inShp, inRas, field, band=1, nodata_value=0, write_stat=True)
Get the pixel val at a given point and write to vector.
- Parameters
inShp (string) – input shapefile
inRas (string) – input raster
field (string) – the name of the field
band (int) – an integer val eg - 2
nodata_value (numerical) – If used the no data val of the raster
shape.zonal_rgb_idx(inShp, inRas, nodata_value=0)
Calculate RGB-based indices per segment/AOI.
- Parameters
inShp (string) – input shapefile
inRas (string) – input raster
nodata_value (numerical) – If used the no data val of the raster
shape.zonal_stats(inShp, inRas, band, bandname, layer=None, stat='mean', write_stat=True, nodata_value=0, all_touched=True, expression=None)
Calculate zonal stats for an OGR polygon file; a usage sketch follows the parameters.
- Parameters
inShp (string) – input shapefile
inRas (string) – input raster
band (int) – an integer val eg - 2
bandname (string) – eg - blue
layer (string) – if using a db type format with multi layers, specify the name of the layer in question
stat (string) – the stat to calculate; if omitted it will be 'mean'. Others: 'mode', 'min', 'mean', 'max', 'std', 'sum', 'count', 'var', 'skew', 'kurt(osis)', 'vol'
write_stat (bool (optional)) – If True, stat will be written to OGR file, if false, dataframe only returned (bool)
nodata_value (numerical) – If used the no data val of the raster
all_touched (bool) – whether to use all_touched when rasterising the polygon; if the poly is smaller than or comparable to the pixel size, True is perhaps the best option
expression (string) – process a selection only eg expression e.g. “DN >= 168”
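A minimal usage sketch (paths and the band name are placeholders):

    from geospatial_learn import shape

    # write the per-polygon mean of band 1 to the OGR file; set write_stat=False
    # to get a dataframe back instead of writing to the file
    shape.zonal_stats('segments.shp', 'scene.tif', 1, 'blue',
                      stat='mean', write_stat=True, nodata_value=0)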
shape.zonal_stats_all(inShp, inRas, bandnames, statList=['mean', 'min', 'max', 'median', 'std', 'var', 'skew', 'kurt'])
Calculate zonal stats for an OGR polygon file.
- Parameters
inShp (string) – input shapefile
inRas (string) – input raster
band (int) – an integer val eg - 2
bandnames (list) – eg - [‘b’,’g’,’r’,’nir’]
nodata_value (numerical) – If used the no data val of the raster
geospatial_learn.utilities module
Created on Thu Sep 8 22:35:39 2016
@author: Ciaran Robb
The utilities module - things here don't have an exact theme or home yet so may eventually move elsewhere
If you use code to publish work cite/acknowledge me and authors of libs etc as appropriate
utilities.apply_lut(src, lut)
utilities.colorscale(seg, prop='Area', custom=None)
Colour an array according to a region prop value.
- Parameters
seg (np.array) – input array of labelled image
prop (string) – a scikit-image region prop
custom (list) – a custom list of values to apply to array
- Returns
- Return type
np array of attributed regions
utilities.combine_grid(inRas1, inRas2, outRas, outShp, min_area=None)
utilities.do_phasecong(tempIm, low_t=0, hi_t=0, norient=6, nscale=6, sigma=2)
Process phase congruency on an image.
utilities.fixply(incloud, outcloud, field='scalar_label')
Fix a ply file for use in cgal after cloudcompare.
- Parameters
incloud (string) – path to an input ply
outcloud (string) – path to the output ply
field (string) – the scalar field to alter
utilities.houghseg(inRas, outShp, edge='canny', sigma=2, thresh=0, ratio=2, n_orient=6, n_scale=5, hArray=True, vArray=True, valrange=1, interval=10, band=2, min_area=None)
Detect and write Hough lines to a line shapefile and create rectangular segments from them; a usage sketch follows the parameters.
Implemented for the paper Robb et al. (2020), Semi-automated field plot segmentation from UAS imagery for experimental agriculture, Frontiers in Plant Science
- Parameters
inRas (string) – path to an input raster from which the geo-reffing is obtained
outShp (string) – path to the output shapefile
edge (string) – edge method ‘canny’ or ‘ph’
sigma (float) – scalar value for gaussian smoothing
thresh (int/float) – the high hysteresis threshold
band (int) – the image band
hArray (bool) – process axis 1 of the image
vArray (bool) – process axis 2 of the image
min_area (float) – the minimum area of segment to retain
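A minimal usage sketch (paths and the minimum area are placeholders):

    from geospatial_learn import utilities

    # detect plot boundaries with Canny edges and write rectangular segments,
    # dropping anything smaller than 100 map units squared
    utilities.houghseg('uas_ortho.tif', 'plots.shp', edge='canny', sigma=2,
                       band=2, min_area=100)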
utilities.image_thresh(image)
utilities.imangle(im)
Determine the orientation of non-zero vals in an image.
- Parameters
im (np array) – input image
- Returns
axes – orientations of each side and binary array
- Return type
tuple
utilities.iter_ransac(image, sigma=3, no_iter=10, order='col', mxt=2500)
utilities.min_bound_rectangle(points)
Find the smallest bounding rectangle for a set of points. Returns a set of points representing the corners of the bounding box.
- Parameters
points (list) – An nx2 iterable of points
- Returns
an nx2 list of coordinates
- Return type
list
utilities.ms_toposeg(inRas, outShp, iterations=100, algo='ACWE', band=2, dist=30, se=3, usemin=False, imtype=None, useedge=True, burnedge=False, merge=False, close=True, sigma=4, hi_t=None, low_t=None, init=4, smooth=1, lambda1=1, lambda2=1, threshold='auto', balloon=1)
Topology-preserving segmentation, implemented in python/numpy, inspired by ms_topo and morphsnakes.
This uses morphsnakes level sets to make the segments and param explanations are mainly from there.
- Parameters
inSeg (string) – input segmentation raster
raster_path (string) – input raster whose pixel vals will be used
band (int) – an integer val eg - 2
algo (string) – either “GAC” (geodesic active contours) or “ACWE” (active contours without edges)
sigma (the size of stdv defining the gaussian envelope if using canny edge) – a unitless value
iterations (uint) – Number of iterations to run.
smooth (uint, optional) – Number of times the smoothing operator is applied per iteration. Reasonable values are around 1-4. Larger values lead to smoother segmentations.
lambda1 (float, optional) – Weight parameter for the outer region. If lambda1 is larger than lambda2, the outer region will contain a larger range of values than the inner region.
lambda2 (float, optional) – Weight parameter for the inner region. If lambda2 is larger than lambda1, the inner region will contain a larger range of values than the outer region.
threshold (float, optional) – Areas of the image with a value smaller than this threshold will be considered borders. The evolution of the contour will stop in these areas.
balloon (float, optional) – Balloon force to guide the contour in non-informative areas of the image, i.e., areas where the gradient of the image is too small to push the contour towards a border. A negative value will shrink the contour, while a positive value will expand the contour in these areas. Setting this to zero will disable the balloon force.
utilities.ms_toposnakes(inSeg, inRas, outShp, iterations=100, algo='ACWE', band=2, sigma=4, alpha=100, smooth=1, lambda1=1, lambda2=1, threshold='auto', balloon=-1)
Topology-preserving morphsnakes, implemented in python/numpy exclusively by C.Robb.
This uses morphsnakes and explanations are from there.
- Parameters
inSeg (string) – input segmentation raster
raster_path (string) – input raster whose pixel vals will be used
band (int) – an integer val eg - 2
algo (string) – either “GAC” (geodesic active contours) or “ACWE” (active contours without edges)
sigma (the size of stdv defining the gaussian envelope if using canny edge) – a unitless value
iterations (uint) – Number of iterations to run.
smooth (uint, optional) – Number of times the smoothing operator is applied per iteration. Reasonable values are around 1-4. Larger values lead to smoother segmentations.
lambda1 (float, optional) – Weight parameter for the outer region. If lambda1 is larger than lambda2, the outer region will contain a larger range of values than the inner region.
lambda2 (float, optional) – Weight parameter for the inner region. If lambda2 is larger than lambda1, the inner region will contain a larger range of values than the outer region.
threshold (float, optional) – Areas of the image with a value smaller than this threshold will be considered borders. The evolution of the contour will stop in these areas.
balloon (float, optional) – Balloon force to guide the contour in non-informative areas of the image, i.e., areas where the gradient of the image is too small to push the contour towards a border. A negative value will shrink the contour, while a positive value will expand the contour in these areas. Setting this to zero will disable the balloon force.
utilities.ragmerge(inSeg, inRas, outShp, band, thresh=0.02)
- Parameters
inSeg (string) – Path to Input segmentation raster
inRas (string) – Path to underlying raster that will be used merge segments
outShape (string) – Path to output segmentation shape
band (int) – The band on which to perform the RAG merge
thresh (float) – The RAG merge threshold
utilities.ransac_lines(inRas, outRas, sigma=3, row=True, col=True, binwidth=40)
utilities.raster2array(inRas, bands=[1])
Read a raster and return an array, either single or multiband.
- Parameters
inRas (string) – input raster
bands (list) – a list of bands to return in the array
utilities.temp_match(vector_path, raster_path, band, nodata_value=0, ind=None)
Based on polygons, return template matched images.
- Parameters
vector_path (string) – input shapefile
raster_path (string) – input raster
band (int) – an integer val eg - 2
nodata_value (numerical) – If used the no data val of the raster
ind (int) – The feature ID to use - if used this will use one feature and rotate it 90 for the second
- Returns
- Return type
list of template match arrays same size as input
utilities.visual_callback_2d(background, fig=None)
Returns a callback that can be passed as the argument iter_callback of morphological_geodesic_active_contour and morphological_chan_vese for visualizing the evolution of the levelsets. Only works for 2D images.
- Parameters
background ((M, N) array) – Image to be plotted as the background of the visual evolution.
fig (matplotlib.figure.Figure) – Figure where results will be drawn. If not given, a new figure will be created.
- Returns
callback – A function that receives a levelset and updates the current plot accordingly. This can be passed as the iter_callback argument of morphological_geodesic_active_contour and morphological_chan_vese.
- Return type
Python function
utilities.wipe_ply_field(incloud, outcloud, tfield='training', field='label')
Scrub a field from a ply file.
- Parameters
incloud (string) – path to an input ply
outcloud (string) – path to the output ply
tfield (string) – the training field
field (string) – the field to erase (that was previously full of class values)
geospatial_learn.convnet module
The convnet module.
Description
A module for using pytorch to classify EO data for semantic segmentation and object identification.
convnet.chip_pred(inRas, model, outMap, encoder, classes=['1'], tilesize=256, bands=[1, 2, 3], weights='imagenet', device='cuda')
Chip-based prediction of EO imagery. Based on pytorch & albumentations.
- Parameters
inRas (string) – the input raster
model (string or pytorch model) – the model to predict
outMap (string) – the output classification map
encoder (string) – the encoder component of the CNN e.g. resnet34
tilesize (int) – the image chip/tile size that will be processed def 256
bands (list of ints) – the image bands to use
Notes
This is an early version with some stuff to add/change
convnet.collect_train(masklist, tilelist, outdir, chip_size=256, bands=[1, 2, 3])
Collect and save chips of both mask and image from a list of images for a semantic segmentation task.
The list of images must correspond/ be in the same order
- Parameters
masklist (list) – A list of images containing the training masks
tilelist (list) – A list of images containing the corresponding spectral info
outdir (string) – Where the training chips will be written
chip_size (int) – the training “chip” size e.g. 256x256 pixels dependent on the nnet used
- Returns
A tuple of lists of the respective paths of both masks and corresponding images
convnet.collect_train_chip(masklist, tilelist, outdir, chip_size=256, include_zero=True, bands=[1, 2, 3])
Collect and save chips of an image from a list of masks and images for a chip-based CNN (i.e. we are simply labelling a chip NOT segmenting anything).
Please note that areas of 0 (no mask) will count as a class
The list of images must correspond/ be in the same order.
- Parameters
masklist (list) – A list of images containing the training masks
tilelist (list) – A list of images containing the corresponding spectral info
outdir (string) – Where the training chips will be written
chip_size (int) – the training “chip” size e.g. 256x256 pixels dependent on the nnet used
include_zero (bool) – whether to include a non-masked area as class 0
- Returns
A tuple of lists of the respective paths of both masks and corresponding images
convnet.makewrkdir(directory)
convnet.semseg_pred(inRas, model, outMap, encoder, classes=['1'], tilesize=256, bands=[1, 2, 3], weights='imagenet', device='cuda')
Semantic segmentation of EO imagery; an early version, things are to be changed in the near future. Based on segmentation_models.pytorch & albumentations. A usage sketch follows the notes.
- Parameters
inRas (string) – the input raster
model (string or pytorch model) – the model to predict
outMap (string) – the output classification map
encoder (string) – the encoder component of the CNN e.g. resnet34
tilesize (int) – the image chip/tile size that will be processed def 256
bands (list of ints) – the image bands to use
Notes
This is an early version with some stuff to add/change
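A minimal usage sketch (the paths, classes and encoder are placeholders; the model is assumed to have been trained with train_semantic_seg):

    from geospatial_learn import convnet

    # predict over a raster in 256x256 chips on the GPU
    convnet.semseg_pred('scene.tif', 'best_model.pth', 'semseg_map.tif',
                        'resnet34', classes=['1', '2'], tilesize=256,
                        bands=[1, 2, 3], device='cuda')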
convnet.train_semantic_seg(maindir, plot=False, bands=[1, 2, 3], tilesize=256, f1=False, preTrain=True, proc='cuda:0', activation='softmax2d', classes=['1'], weights='imagenet', modelpth='./best_model.pth', plot_score=True, params={'batch_size': 16, 'classes': 1, 'device': 'cuda', 'encoder': 'resnet34', 'epochs': 50, 'in_channels': 3, 'lr': 0.0001, 'model': 'Unet', 'num_workers': 2}, nt=-1)
Multi-class semantic segmentation of EO imagery; an early version, things are to be changed in the near future. Based on segmentation_models.pytorch & albumentations. A usage sketch follows the notes.
- Parameters
mainDir (string) – the working directory where everything is done
modelpth (string) – where to save the model eg ‘dir/best_model.pth’
plot (bool) – whether to plot intermediate data results eg visualise the image aug, test results etc.
bands (list of ints) – the image bands to use
tilesize (int) – the size of the image tile used in training and thus classification
classes (list) – a list of strings with the classes eg [‘1’, ‘2’, ‘3’] as labelled in raster
weights (string) – the encoder weights, typically imagenet or None for rand init
f1 (bool) – whether to save a classification report (will be a plot in working dir)
activation (string) – the neural net activation function e.g ‘sigmoid’ for binary or softmax2d for multiclass
params (dict) – the convnet model params models: Unet, UNet11, UNet16, U Linknet, FPN, PSPNet, PAN, DeepLabV3 and DeepLabV3+ encoders: ‘resnet18’,’resnet34’,’resnet50’, ‘resnet101’,’resnet152’,’resnext50_32x4d’, ‘resnext101_32x4d’,’resnext101_32x8d’,’resnext101_32x16d’,’resnext101_32x32d’, ‘resnext101_32x48d’,’dpn68’,’dpn68b’,’dpn92’,’dpn98’,’dpn107’,’dpn131’,’vgg11’, ‘vgg11_bn’,’vgg13’,’vgg13_bn’,’vgg16’,’vgg16_bn’,’vgg19’,’vgg19_bn’,’senet154’, ‘se_resnet50’,’se_resnet101’,’se_resnet152’,’se_resnext50_32x4d’,’se_resnext101_32x4d’, ‘densenet121’,’densenet169’,’densenet201’,’densenet161’,’inceptionresnetv2’, ‘inceptionv4’,’efficientnet-b0’,’efficientnet-b1’,’efficientnet-b2’,’efficientnet-b3’, ‘efficientnet-b4’,’efficientnet-b5’,’efficientnet-b6’,’efficientnet-b7’, ‘mobilenet_v2’,’xception’,’timm-efficientnet-b0’,’timm-efficientnet-b1’, ‘timm-efficientnet-b2’,’timm-efficientnet-b3’,’timm-efficientnet-b4’,’timm-efficientnet-b5’, ‘timm-efficientnet-b6’,’timm-efficientnet-b7’,’timm-efficientnet-b8’,’timm-efficientnet-l2’
Notes
This is an early version with some stuff to add/change
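A minimal usage sketch (the working directory, class list and params values are placeholders; training chips are assumed to have been written by collect_train):

    from geospatial_learn import convnet

    params = {'model': 'Unet', 'encoder': 'resnet34', 'in_channels': 3,
              'classes': 2, 'epochs': 50, 'batch_size': 16, 'lr': 0.0001,
              'device': 'cuda', 'num_workers': 2}

    convnet.train_semantic_seg('workdir', bands=[1, 2, 3], tilesize=256,
                               classes=['1', '2'], modelpth='./best_model.pth',
                               params=params)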