Introduction: Spatial analysis is a specialized branch of data mining in which data pertaining to space and time are analyzed and modeled. It is often referred to as “geospatial analysis”. Because statistical methods are so widely used with spatial data, it is also called “spatial statistics”. Spatial data analysis has many practical applications, including mapping, geological studies, research on wildlife and vegetation, soil analysis, medical applications such as the study of brain atrophy, epidemiology, and even advanced astronomical research. Modern engineering practices such as chip design, GPS tracking, mobility sensors, and routing algorithms also use a variety of spatial analysis techniques.

Literature Survey: To gain a well-rounded perspective on the field of spatial analysis and the various methodologies used, the following literature survey was conducted.

1. Kumar (2015) Spatial data mining: Recent trends and techniques, Computer and Computational Sciences (ICCCS), 2015 International Conference on; IEEE, DOI: 10.1109/ICCACS.2015.7361319
2. M. Bendechache (2015) Distributed clustering algorithm for spatial data mining, Spatial Data Mining and Geographical Knowledge Services (ICSDM), 2015 2nd IEEE International Conference on; IEEE, DOI: 10.1109/ICSDM.2015.7298026
3. J. Paul Elhorst (2010) Applied Spatial Econometrics: Raising the Bar, Spatial Economic Analysis, 5:1, 9-28, DOI: 10.1080/17421770903541772
4. J. Chen (2011) Comparisons with spatial autocorrelation and spatial association rule mining, Spatial Data Mining and Geographical Knowledge Services (ICSDM), 2011 IEEE International Conference on; IEEE, DOI: 10.1109/ICSDM.2011.5969000
5. D. Lord (2010) The Statistical Analysis of Crash-Frequency Data: A Review and Assessment of Methodological Alternatives, Transportation Research Part A Policy and Practice, DOI: 10.1016/j.tra.2010.02.001
6. D.A. Griffith (2011) Positive spatial autocorrelation, mixture distributions, and geospatial data histograms, Spatial Data Mining and Geographical Knowledge Services (ICSDM), 2011 IEEE International Conference on; IEEE, DOI: 10.1109/ICSDM.2011.5968119
7. Y. Fan (2008) Spatial patterns of brain atrophy in MCI patients, identified via high-dimensional pattern classification, predict subsequent cognitive decline, Neuroimage; 39(4): 1731–1743, DOI: 10.1016/j.neuroimage.2007.10.031
8. F. Dormann (2007) Effects of incorporating spatial autocorrelation into the analysis of species distribution data, Global Ecology and Biogeography, 16, 129–138, DOI: 10.1111/j.1466-8238.2006.00279.x
9. A. Griffith (2006) Spatial Modeling In Ecology: The Flexibility Of Eigenfunction Spatial Analyses, Ecology, 87(10), 2006, pp. 2603–2613
10. Guisan (2005) Predicting species distribution: offering more than simple habitat models, Ecology Letters, 8: 993–1009, DOI: 10.1111/j.1461-0248.2005.00792.x
11. T. Hengl (2004) A generic framework for spatial prediction of soil variables based on regression-kriging, Geoderma 120; 75–93
12. Burnett (2003) A multi-scale segmentation/object relationship modeling methodology for landscape analysis, Ecological Modelling 168, 233–249
13. S. Shekhar (2002) Spatial contextual classification and prediction models for mining geospatial data, IEEE Transactions on Multimedia, Volume: 4, Issue: 2; IEEE, DOI: 10.1109/TMM.2002.1017732
14. S. Ferrier (2002) Extended statistical approaches to modelling spatial pattern in biodiversity, Biodiversity and Conservation 11: 2275–2307, 2002, DOI: 10.1023/A:1021302930424
15. Lehmann (2002) GRASP: generalized regression analysis and spatial prediction, Ecological Modelling 157 (2002) 189–207
16. Veldkamp (2001) Predicting land-use change, Agriculture, Ecosystems and Environment 85; 1–6
17. Stockwell (1999) The GARP modelling system: problems and solutions to automated spatial prediction, International Journal of Geographical Information Science, 13:2, 143-158, DOI: 10.1080/136588199241391
18. H. Fielding (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models, Environmental Conservation 24 (1): 38–49
19. P. E. Gessler (1995) Soil-landscape modelling and spatial prediction of soil attributes, International Journal of Geographical Information Systems, 9:4, 421-432, DOI: 10.1080/02693799508902047
20. D. Moore (1993) Soil Attribute Prediction using Terrain Analysis, Soil Sci. Soc. Am. J. 57:443-452

Analysis: Based on the above literature survey, the following spatial analysis methods were found to be the most widely used in practice. A short summary of each method, with references, is presented below.

Spatial Regression: Spatial regression methods capture spatial dependency in regression analysis, avoiding statistical problems such as unstable parameters and unreliable significance tests, as well as providing information on spatial relationships among the variables involved. The estimated spatial relationships can be used for spatial and spatio-temporal predictions. (1) Depending on the specific technique, spatial dependency can enter the regression model as relationships between the independent variables and the dependent variable, between the dependent variable and a spatial lag of itself, or in the error terms. Geographically weighted regression (GWR) is a local version of spatial regression that generates parameters disaggregated by the spatial units of analysis. (2) This allows assessment of the spatial heterogeneity in the estimated relationships between the independent and dependent variables.
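To make the idea of locally estimated parameters concrete, here is a minimal sketch of a GWR-style fit. It fits one weighted least-squares regression per location, with weights that decay with distance. The Gaussian kernel, the fixed `bandwidth`, the function name `gwr_coefficients`, and the synthetic data are all assumptions made for illustration; they are not taken from the cited papers.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Fit one weighted least-squares regression per location (basic GWR sketch)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])        # design matrix with intercept
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)  # assumed Gaussian kernel weights
        XtW = Xd.T * w                           # same as Xd.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas

# Hypothetical data: the slope of y on x drifts from west to east.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
slope = 1.0 + 0.2 * coords[:, 0]
y = 2.0 + slope * x

betas = gwr_coefficients(coords, x, y, bandwidth=1.5)
# betas[:, 1] holds one local slope estimate per location.
```

Mapping the column `betas[:, 1]` would show the spatial heterogeneity that a single global regression coefficient hides. In practice the bandwidth is usually chosen by cross-validation rather than fixed by hand.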

Spatial Autocorrelation: Spatial dependency is the co-variation of properties within geographic space: characteristics at proximal locations appear to be correlated, either positively or negatively. Spatial dependency leads to the spatial autocorrelation problem in statistics because, like temporal autocorrelation, it violates the assumption of independence among observations on which standard statistical techniques rely. For example, regression analyses that do not compensate for spatial dependency can have unstable parameter estimates and yield unreliable significance tests. Spatial regression models need to capture these relationships so that they do not suffer from these weaknesses. It is also appropriate to view spatial dependency as a source of information rather than something to be corrected. (3)
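One common way to quantify spatial autocorrelation is Moran's I, which is not named in the text above and is used here only as an illustration. The sketch below computes it on a small grid with rook (shared-edge) neighbors. The helper names and the two test patterns are assumptions made for this example.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I: (n / S0) * (z' W z) / (z' z), with z the centered values."""
    z = values - values.mean()
    return (len(values) / weights.sum()) * (z @ weights @ z) / (z @ z)

def rook_weights(nrows, ncols):
    """Binary weights: cells sharing an edge on a regular grid are neighbors."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:
                W[i, i + ncols] = W[i + ncols, i] = 1.0
            if c + 1 < ncols:
                W[i, i + 1] = W[i + 1, i] = 1.0
    return W

W = rook_weights(6, 6)
rows, cols = np.divmod(np.arange(36), 6)
gradient = cols.astype(float)                # smooth west-to-east trend
checker = ((rows + cols) % 2).astype(float)  # alternating pattern

i_gradient = morans_i(gradient, W)  # positive: neighbors are similar
i_checker = morans_i(checker, W)    # negative: neighbors are dissimilar
```

The smooth gradient gives a clearly positive value and the checkerboard a strongly negative one, matching the two signs of correlation described above.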

Point Pattern Analysis: Point pattern analysis (PPA) is the study of the spatial arrangement of points in (usually 2-dimensional) space. The simplest formulation is a set X = {x ∈ D}, where D, which can be called the ‘study region’, is a subset of R^n, an n-dimensional Euclidean space. The easiest way to visualize a 2-D point pattern is a map of the locations, which is simply a scatterplot with the provision that the axes are equally scaled.
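Beyond plotting, a point pattern is often summarized numerically. As one possible illustration, the sketch below computes the distance from each point to its nearest neighbor. This statistic is an assumption chosen for the example rather than something taken from the text above, and the brute-force distance matrix is only suitable for small patterns.

```python
import numpy as np

def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point (brute force)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself
    return dist.min(axis=1)

# Hypothetical pattern: a perfectly regular 5 x 5 grid with unit spacing.
grid = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
nn = nearest_neighbor_distances(grid)
```

On the regular grid every nearest-neighbor distance equals the grid spacing. A clustered pattern with the same number of points in the same region would show much smaller values.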

Semivariogram: In spatial statistics, the theoretical variogram is a function describing the degree of spatial dependence of a spatial random field or stochastic process. As a concrete example from the field of gold mining, a variogram gives a measure of how much two samples taken from the mining area will vary in gold percentage depending on the distance between those samples. Samples taken far apart will typically vary more than samples taken close to each other. The semivariogram was first defined by Matheron (1963) as half the average squared difference between the values at points separated by a distance h. (4)(5)
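The definition above translates almost directly into code. The following sketch computes an empirical semivariogram by grouping all pairs of samples into distance bins and taking half the mean squared difference within each bin. The function name, the binning scheme, and the transect data are assumptions made for illustration.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Half the mean squared difference of values, per distance bin."""
    i, j = np.triu_indices(len(values), k=1)      # every unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    which = np.digitize(dist, bin_edges) - 1      # bin index for each pair
    gamma = np.full(len(bin_edges) - 1, np.nan)   # NaN where a bin is empty
    for b in range(len(gamma)):
        mask = which == b
        if mask.any():
            gamma[b] = 0.5 * sqdiff[mask].mean()
    return gamma

# Hypothetical transect: ten samples one unit apart with a linear trend.
coords = np.arange(10, dtype=float).reshape(-1, 1)
grades = np.arange(10, dtype=float)
gamma = empirical_semivariogram(coords, grades, bin_edges=[0.5, 1.5, 2.5, 3.5])
```

For this linear trend, pairs at lag h differ by exactly h, so the semivariance at lags 1, 2, and 3 is h²/2. Real data would usually level off at a sill instead of growing without bound.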

Kriging: In geostatistics, kriging, or Gaussian process regression, is a method of interpolation in which the interpolated values are modeled by a Gaussian process governed by prior covariances, as opposed to a piecewise-polynomial spline chosen to optimize smoothness of the fitted values. Under suitable assumptions on the priors, kriging gives the best linear unbiased prediction of the intermediate values. Interpolating methods based on other criteria, such as smoothness, need not yield the most likely intermediate values. The method is widely used in spatial analysis and computer experiments. The technique is also known as Wiener–Kolmogorov prediction, after Norbert Wiener and Andrey Kolmogorov.
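As a rough illustration of how the prior covariances drive the interpolation, the sketch below implements simple kriging, the variant in which the mean is treated as known. The exponential covariance function, its `sill` and `length` parameters, the assumed mean, and the sample data are all hypothetical choices made for this example.

```python
import numpy as np

def exp_cov(a, b, sill, length):
    """Assumed exponential covariance: sill * exp(-distance / length)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / length)

def simple_krige(coords, values, targets, mean, sill=1.0, length=2.0):
    """Simple kriging (known mean): predictions and kriging variances."""
    K = exp_cov(coords, coords, sill, length)   # covariances among the data
    k = exp_cov(coords, targets, sill, length)  # data-to-target covariances
    weights = np.linalg.solve(K, k)             # one weight column per target
    pred = mean + weights.T @ (values - mean)
    var = sill - np.sum(k * weights, axis=0)
    return pred, var

# Hypothetical 1-D samples; 0.7 is an assumed known mean.
coords = np.array([[0.0], [1.0], [3.0], [4.0], [7.0]])
values = np.array([1.2, 0.8, -0.3, 0.1, 1.5])
pred_data, var_data = simple_krige(coords, values, coords, mean=0.7)
pred_new, var_new = simple_krige(coords, values, np.array([[2.0]]), mean=0.7)
```

Without a nugget term the predictor reproduces the observations exactly at the sampled locations, with zero kriging variance there. At the unsampled location the variance lies between zero and the sill.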

Bayesian Hierarchical Models: A Bayesian hierarchical model is a statistical model written in multiple levels (hierarchical form) that estimates the parameters of the posterior distribution using the Bayesian method. (6) The sub-models combine to form the hierarchical model, and Bayes’ theorem is used to integrate them with the observed data and account for all the uncertainty that is present. The result of this integration is the posterior distribution, also known as the updated probability estimate, which is obtained as additional evidence on the prior distribution is acquired.
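A small worked case may help show how the levels combine. The sketch below assumes a two-level normal model: observations y_ij ~ N(theta_j, sigma2) within each group j, and group means theta_j ~ N(mu, tau2). To keep the posterior in closed form, sigma2, mu, and tau2 are treated as known. A full hierarchical analysis would place priors on them as well. All names and numbers are hypothetical.

```python
import numpy as np

def posterior_group_means(group_means, group_sizes, sigma2, mu, tau2):
    """Posterior mean and variance of each theta_j in the two-level normal model."""
    data_prec = group_sizes / sigma2  # precision contributed by the data
    prior_prec = 1.0 / tau2           # precision contributed by the upper level
    post_var = 1.0 / (data_prec + prior_prec)
    post_mean = post_var * (data_prec * group_means + prior_prec * mu)
    return post_mean, post_var

# Hypothetical groups with 1, 4, and 100 observations.
ybar = np.array([2.0, 5.0, 8.0])
n = np.array([1, 4, 100])
post_mean, post_var = posterior_group_means(ybar, n, sigma2=4.0, mu=5.0, tau2=1.0)
```

The group with a single observation is pulled strongly toward the upper-level mean of 5, while the group with 100 observations barely moves. This is how the posterior weighs prior structure against observed evidence.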

Simulation of Random Fields: Gaussian Markov random fields (GMRFs) are powerful and important tools for modeling spatial data. They have been widely used in different areas of spatial statistics, including disease mapping, spatiotemporal modeling, and image analysis. Constructing a GMRF is straightforward: it is simply a finite-dimensional random vector following a multivariate Gaussian distribution with additional conditional independence properties, hence the term Markov. Combining the analytical results for the Gaussian distribution with the Markov properties is convenient and makes it possible to handle a large class of statistical models. Historically, the most common method of inference for the parameters of GMRFs has been maximum likelihood. (7)(8) The properties of maximum likelihood estimators are asymptotic in nature, and their small-sample behavior is often unknown. On the other hand, the Markov property has become a requirement for constructing efficient Markov chain Monte Carlo (MCMC) algorithms for GMRFs. Rue (9) showed that the Markov property makes it possible to apply numerical methods for sparse matrices. He proposed fast algorithms for sampling and for evaluating the log-density of a GMRF, and conducted efficient MCMC-based inference. Rue and Held (10) provide a comprehensive account of the main properties of GMRFs, emphasize the strong connection between GMRFs and numerical methods for sparse matrices, and outline various applications of GMRFs for statistical inference (e.g., spatial statistics, time-series analysis, graphical models).
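The link between the Markov property and sparse matrices can be sketched as follows. Conditional independence shows up as zeros in the precision matrix Q, and a sample is obtained from the Cholesky factor of Q. The AR(1)-type precision matrix, the function names, and the use of dense NumPy routines (a real implementation would use a sparse Cholesky factorization) are assumptions made for brevity. This is not a reproduction of the algorithms in (9).

```python
import numpy as np

def ar1_precision(n, phi):
    """Tridiagonal precision matrix of a stationary AR(1) process, unit innovations."""
    Q = np.zeros((n, n))
    idx = np.arange(n)
    Q[idx, idx] = 1.0 + phi ** 2
    Q[0, 0] = Q[-1, -1] = 1.0
    Q[idx[:-1], idx[1:]] = -phi
    Q[idx[1:], idx[:-1]] = -phi
    return Q

def sample_gmrf(Q, z):
    """Map standard normal draws z to a zero-mean sample with covariance inv(Q)."""
    L = np.linalg.cholesky(Q)       # Q = L @ L.T
    return np.linalg.solve(L.T, z)  # x = L^{-T} z

def gmrf_logpdf(Q, x):
    """Log-density of a zero-mean GMRF with precision Q, evaluated at x."""
    L = np.linalg.cholesky(Q)
    half_logdet = np.log(np.diag(L)).sum()  # equals 0.5 * log det(Q)
    return -0.5 * len(x) * np.log(2 * np.pi) + half_logdet - 0.5 * x @ Q @ x

rng = np.random.default_rng(1)
Q = ar1_precision(50, phi=0.8)
z = rng.standard_normal(50)
x = sample_gmrf(Q, z)
logp = gmrf_logpdf(Q, x)
```

Only the diagonal and first off-diagonals of Q are non-zero, because each element is conditionally independent of the rest given its two neighbors. That sparsity is what makes the factorization cheap for large fields.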

Spatiotemporal Analysis: Spatiotemporal data analysis is an emerging research area, driven by the development and application of novel computational techniques that allow the analysis of large spatiotemporal databases. Spatiotemporal models arise when data are collected across time as well as space and have at least one spatial and one temporal property. An event in a spatiotemporal dataset describes a spatial and temporal phenomenon that exists at a certain time t and location x. The analysis of spatiotemporal data requires that both temporal and spatial correlations be taken into account. Assessing both the temporal and spatial dimensions of data adds significant complexity to the data analysis process for two major reasons: 1) continuous and discrete changes in the spatial and non-spatial properties of spatiotemporal objects, and 2) the influence of collocated neighboring spatiotemporal objects on one another.

References:

(1) Song, Yongze; Yong Ge. “Spatial distribution estimation of malaria in northern China and its scenarios in 2020, 2030, 2040 and 2050”. Malaria Journal.
(2) Fotheringham, A. S.; Charlton, M. E.; Brunsdon, C. (1998). “Geographically weighted regression: a natural evolution of the expansion method for spatial data analysis”. Environment and Planning A. 30 (11): 1905–1927. doi:10.1068/a301905
(3) Knegt, De; Coughenour, M.B.; Skidmore, A.K.; Heitkönig, I.M.A.; Knox, N.M.; Slotow, R.; Prins, H.H.T. (2010). “Spatial autocorrelation and the scaling of species–environment relationships”. Ecology. 91: 2455–2465. doi:10.1890/09-1359.1
(4) Matheron, Georges (1963). “Principles of geostatistics”. Economic Geology. 58 (8): 1246–1266. doi:10.2113/gsecongeo.58.8.1246. ISSN 1554-0774
(5) Ford, David. “The Empirical Variogram” (PDF). faculty.washington.edu/edford. Retrieved 31 October 2017
(6) Allenby, Rossi, McCulloch (January 2005). “Hierarchical Bayes Model: A Practitioner’s Guide”. Journal of Bayesian Applications in Marketing
(7) Cressie N (1993) Statistics for Spatial Data. New York: Wiley-Interscience.
(8) Richardson S, Guihenneuc C, Lasserre V (1992) Spatial linear models with autocorrelated error structure. The Statistician 41: 539–557.
(9) Rue H (2001) Fast sampling of Gaussian Markov random fields. Journal of the Royal Statistical Society Series B 63: 325–338.
(10) Rue H, Held L (2005) Gaussian Markov Random Fields: Theory and Applications, vol. 104 of Monographs on Statistics and Applied Probability. London: Chapman & Hall.