Introduction: data mining, Spatial Data Mining and Geographical Knowledge

Introduction: Spatial
analysis refers to a specialized branch of data mining in which data pertaining
to space and time are analyzed and modeled. It is often referred to as
“geospatial analysis”. Because statistical methods are so prevalent in work with
spatial data, it is also called “spatial statistics”. Spatial data analysis has
many practical applications, including mapping, geological studies, research on
wildlife and vegetation, soil analysis, medical applications such as brain
atrophy studies and epidemiology, and even advanced astronomical research.
Modern engineering practices such as chip design, GPS tracking, mobility sensors
and routing algorithms also use a variety of spatial analysis techniques.

Literature Survey: For
a well-rounded perspective on the field of spatial analysis and the various
methodologies used, the following literature was surveyed.

(2015) Spatial data mining: Recent trends and techniques, Computer and
Computational Sciences (ICCCS), 2015 International
Conference on; IEEE,
DOI: 10.1109/ICCACS.2015.7361319


M. Bendechache (2015)
Distributed clustering algorithm for spatial data mining, Spatial Data
Mining and Geographical Knowledge Services (ICSDM), 2015 2nd IEEE International
Conference on; IEEE,
DOI: 10.1109/ICSDM.2015.7298026


J. Paul Elhorst (2010) Applied Spatial
Econometrics: Raising the Bar, Spatial Economic Analysis, 5:1, 9-28, DOI:


J. Chen
(2011) Comparisons with spatial autocorrelation and spatial association rule
mining,  Spatial Data
Mining and Geographical Knowledge Services (ICSDM), 2011 IEEE International Conference
on; IEEE, DOI: 10.1109/ICSDM.2011.5969000


D. Lord (2010) The Statistical Analysis of
Crash-Frequency Data: A Review and Assessment of Methodological Alternatives,
Transportation Research Part A Policy and Practice, DOI:


Griffith (2011) Positive spatial autocorrelation, mixture distributions, and
geospatial data histograms, Spatial Data
Mining and Geographical Knowledge Services (ICSDM), 2011 IEEE International
Conference on; IEEE,
DOI: 10.1109/ICSDM.2011.5968119


Y. Fan (2008) Spatial patterns of brain atrophy
in MCI patients, identified via high-dimensional pattern classification,
predict subsequent cognitive decline, Neuroimage; 39(4): 1731–1743. DOI:10.1016/j.neuroimage.2007.10.031


F. Dormann (2007) Effects of incorporating
spatial autocorrelation into the analysis of species distribution data, Global
Ecology and Biogeography, 16, 129–138, DOI: 10.1111/j.1466-8238.2006.00279.x


A. Griffith (2006) Spatial Modeling In Ecology: The
Flexibility Of Eigenfunction Spatial Analyses, Ecology, 87(10), 2006, pp.


Guisan (2005) Predicting species distribution:
offering more than simple habitat models, Ecology Letters, 8: 993–1009, DOI:


T. Hengl (2004) A generic framework for spatial
prediction of soil variables based on regression-kriging, Geoderma 120 ; 75–93


Burnett (2003) A multi-scale segmentation/object
relationship modeling methodology for landscape analysis, Ecological Modelling
168, 233–249


S. Shekhar (2002) Spatial contextual
classification and prediction models for mining geospatial data,  IEEE
Transactions on Multimedia, Volume: 4, Issue: 2;
IEEE, DOI: 10.1109/TMM.2002.1017732


S. Ferrier (2002) Extended statistical
approaches to modelling spatial pattern in biodiversity, Biodiversity and Conservation
11: 2275–2307, 2002, DOI: 10.1023/A:1021302930424


Lehmann (2002) GRASP: generalized regression
analysis and spatial prediction, Ecological Modelling 157 (2002) 189–207


Veldkamp (2001) Predicting land-use change,
Agriculture, Ecosystems and Environment 85; 1–6


Stockwell (1999) The GARP modelling system:
problems and solutions to automated spatial prediction, International Journal
of Geographical Information Science, 13:2, 143-158, DOI:


H. Fielding (1997) A review of methods for the
assessment of prediction errors in conservation presence/absence models,
Environmental Conservation 24 (1): 38–49


P. E. Gessler (1995) Soil-landscape modelling
and spatial prediction of soil attributes, International Journal of
Geographical Information Systems, 9:4, 421-432, DOI: 10.1080/02693799508902047


D. Moore (1993) Soil Attribute Prediction using
Terrain Analysis, Soil Sci. Soc. Am. J. 57:443-452


Analysis: Based on the
above literature survey, the following spatial analysis methods are the most
widely used in practice. Presented below is a short summary of these methods and their

Spatial Regression: Spatial regression methods capture
spatial dependency in regression analysis, avoiding statistical
problems such as unstable parameters and unreliable significance tests, and
provide information on spatial relationships among the variables involved.
The estimated spatial relationships can be used for spatial and spatio-temporal
predictions. (1) Depending on the specific technique, spatial dependency
can enter the regression model as relationships between the independent
variables and the dependent variable, between the dependent variable and a
spatial lag of itself, or in the error terms. Geographically weighted
regression (GWR) is a local version of spatial regression that generates
parameters disaggregated by the spatial units of analysis. (2) This allows
assessment of the spatial heterogeneity in the estimated relationships between
the independent and dependent variables.
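
The local fitting idea behind GWR can be sketched in a few lines. This is a minimal illustration rather than the formulation of any cited paper: it assumes a Gaussian distance-decay kernel with a fixed bandwidth, and the function name `gwr_coefficients` and the synthetic data are hypothetical.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Fit one weighted least-squares regression per location.

    coords: (n, 2) site coordinates; X: (n,) or (n, p) predictors; y: (n,) response.
    Returns an (n, p + 1) array of local coefficients (intercept first).
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)  # assumed Gaussian kernel
        xtw = design.T * w  # weight each observation by its kernel value
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

# Synthetic data (hypothetical): the slope grows from west to east.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
local_slope = 1.0 + 0.3 * coords[:, 0]
y = 2.0 + local_slope * x + rng.normal(scale=0.1, size=200)

betas = gwr_coefficients(coords, x, y, bandwidth=1.5)
west = betas[coords[:, 0] < 3, 1].mean()
east = betas[coords[:, 0] > 7, 1].mean()
print(f"mean local slope, west: {west:.2f}, east: {east:.2f}")
```

A single global regression would report one average slope; the per-location coefficients expose the west-to-east gradient, which is the spatial heterogeneity the text refers to.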

Spatial Autocorrelation: Spatial dependency is the
co-variation of properties within geographic space: characteristics at proximal
locations appear to be correlated, either positively or negatively. Spatial
dependency leads to the spatial autocorrelation problem
in statistics because, like temporal autocorrelation, it violates the
assumption of independence among observations made by standard statistical
techniques. For example, regression analyses that do not compensate
for spatial dependency can have unstable parameter estimates and yield
unreliable significance tests. Spatial regression models need to capture these
relationships so that they do not suffer from these weaknesses. It is also
appropriate to view spatial dependency as a source of information rather than
something to be corrected. (3)
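
One common way to quantify spatial autocorrelation is global Moran's I, which the text above does not name. It is sketched here as an illustration, and the binary neighbour weights and toy values are assumptions.

```python
def morans_i(values, weights):
    """Global Moran's I for values z_i and spatial weights w_ij (w_ii = 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Four sites on a line; adjacent sites are neighbours (assumed binary weights).
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

trend = [1.0, 2.0, 3.0, 4.0]        # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]  # neighbours are as different as possible

print(round(morans_i(trend, W), 3))        # 0.333
print(round(morans_i(alternating, W), 3))  # -1.0
```

Positive values indicate that neighbouring sites tend to be alike and negative values that they tend to differ, matching the positive and negative correlation described above.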

Point Pattern Analysis: Point pattern analysis (PPA) is
the study of the spatial arrangement of points in (usually 2-dimensional)
space. The simplest formulation is a set X = {x ∈ D}, where
D, which can be called the ‘study region,’ is a subset of R^n, an n-dimensional
Euclidean space. The easiest way to visualize a 2-D point pattern is a map of
the locations, which is simply a scatterplot with the provision that the
axes are equally scaled.
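
Beyond plotting, a point pattern can be summarized numerically. The sketch below uses the Clark-Evans nearest-neighbour ratio, a statistic not mentioned in the text above and chosen here only as a simple example. It assumes a known study-region area and applies no edge correction.

```python
import math

def clark_evans_ratio(points, area):
    """Observed mean nearest-neighbour distance divided by the value
    expected under complete spatial randomness at the same density."""
    n = len(points)
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

# A regular 5 x 5 grid with unit spacing in a study region of area 25 (assumed).
grid = [(float(i), float(j)) for i in range(5) for j in range(5)]
ratio = clark_evans_ratio(grid, area=25.0)
print(ratio)  # 2.0
```

A ratio near 1 is consistent with a random pattern, values above 1 suggest regular spacing (as for the grid), and values below 1 suggest clustering.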

Variogram: In spatial statistics, the
theoretical variogram is a function describing the degree of spatial
dependence of a spatial random field or stochastic process. To take
a concrete example from the field of gold mining, a
variogram gives a measure of how much two samples taken from the mining
area will vary in gold percentage depending on the distance between those
samples. Samples taken far apart will generally vary more than samples taken
close to each other. The semivariogram was first defined by Matheron (1963) as
half the average squared difference between points separated by distance h. (4)(5)
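
The definition of the semivariogram as half the average squared difference translates directly into code. The following is a minimal sketch of an empirical estimator; the distance bins, the function name and the toy transect are assumptions made for illustration.

```python
import math

def empirical_semivariogram(coords, values, bin_edges):
    """Return one semivariance per distance bin (None for empty bins).

    A pair (i, j) falls in bin b when bin_edges[b] < distance <= bin_edges[b + 1].
    """
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])
            for b in range(n_bins):
                if bin_edges[b] < h <= bin_edges[b + 1]:
                    sums[b] += (values[i] - values[j]) ** 2
                    counts[b] += 1
                    break
    # Half the average squared difference within each bin.
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Hypothetical samples along a transect with unit spacing.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
grades = [1.0, 2.0, 4.0, 7.0]
gamma = empirical_semivariogram(coords, grades, bin_edges=[0.0, 1.0, 2.0, 3.0])
print([round(g, 2) for g in gamma])  # [2.33, 8.5, 18.0]
```

The semivariance grows with separation distance in this example, which is the behaviour described for samples taken far apart.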

Kriging: In geostatistics, kriging, or Gaussian
process regression, is a method of interpolation in
which the interpolated values are modeled by a Gaussian
process governed by prior covariances,
as opposed to a piecewise-polynomial spline chosen
to optimize smoothness of the fitted values. Under suitable assumptions on the
priors, kriging gives the best linear unbiased prediction of
the intermediate values. Interpolating methods based on other criteria, such as
smoothness, need not yield the most likely intermediate values. The method is
widely used in spatial analysis and computer experiments. The technique is also
known as Wiener–Kolmogorov prediction, after Norbert
Wiener and Andrey Kolmogorov.
Bayesian Hierarchical Models: A Bayesian hierarchical
model is a statistical
model written in multiple levels (hierarchical form) that
estimates the parameters of the posterior distribution using
the Bayesian method. (6) The sub-models
combine to form the hierarchical model, and Bayes’
theorem is used to integrate them with the observed data and
account for all the uncertainty that is present. The result of this integration
is the posterior distribution, also known as the updated probability estimate:
the prior distribution revised in light of the additional evidence acquired.
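
The way the levels combine can be illustrated with the simplest two-level case. The sketch below assumes a normal-normal model, y_j | theta_j ~ N(theta_j, sigma2_j) and theta_j | mu, tau2 ~ N(mu, tau2), with the hyperparameters mu and tau2 held fixed. A full hierarchical analysis would also place a prior on them; that step is omitted here, and all numbers are hypothetical.

```python
def normal_normal_posterior(y, sigma2, mu, tau2):
    """Posterior mean and variance of theta_j given y_j and fixed (mu, tau2)."""
    precision = 1.0 / sigma2 + 1.0 / tau2
    mean = (y / sigma2 + mu / tau2) / precision  # precision-weighted average
    return mean, 1.0 / precision

# Hypothetical group-level observations and their sampling variances.
observations = [2.0, -1.0, 0.5]
sampling_vars = [1.0, 0.25, 4.0]

posteriors = [
    normal_normal_posterior(y, s2, mu=0.0, tau2=1.0)
    for y, s2 in zip(observations, sampling_vars)
]
for (mean, var), y in zip(posteriors, observations):
    print(f"observed {y:+.2f} -> posterior mean {mean:+.3f}, variance {var:.3f}")
```

Each posterior mean lies between the observation and the prior mean, and the noisiest observation (variance 4.0) is pulled furthest towards the prior, which is how the upper level of the hierarchy shares information across groups.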

Simulation of Random Fields: Gaussian Markov random fields
(GMRFs) are powerful and important tools for modeling spatial data. They have
been widely used in different areas of spatial statistics, including disease
mapping, spatiotemporal modeling and image analysis. Constructing a GMRF is
straightforward: it is just a finite-dimensional random vector following a
multivariate Gaussian distribution with additional conditional independence
properties, hence the term Markov. Combining the analytical results for the
Gaussian distribution with the Markov properties is convenient and valuable,
because it makes a large class of statistical models tractable. Historically,
the most common method of inference for the parameters of GMRFs has been
maximum likelihood (7)(8). The properties of maximum likelihood estimators are
asymptotic in nature, and their small-sample behavior is often unknown. On the
other hand, the Markov property has become a requirement for constructing
efficient Markov chain Monte Carlo (MCMC) algorithms for GMRFs. Rue (9) showed
that the Markov property makes it possible to apply numerical methods for sparse
matrices. He proposed fast algorithms for sampling and evaluating the
log-density of a GMRF, and conducted efficient MCMC-based inference. Rue and
Held (10) provide a comprehensive account of the main properties of GMRFs,
emphasize the strong connection between GMRFs and numerical methods for sparse
matrices, and outline various applications of GMRFs for statistical inference
(e.g., spatial statistics, time-series analysis, graphical models).
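
Sampling a GMRF through its precision matrix can be sketched as follows. This is an illustrative example under stated assumptions, not the algorithm of the cited papers: it uses a one-dimensional AR(1) process as the GMRF and a dense Cholesky factorization for brevity, whereas the sparse-matrix methods discussed above would exploit the band structure of the precision matrix.

```python
import numpy as np

def ar1_precision(n, phi):
    """Tridiagonal precision matrix of a stationary AR(1) process
    with unit innovation variance (a simple one-dimensional GMRF)."""
    Q = np.zeros((n, n))
    idx = np.arange(n)
    Q[idx, idx] = 1.0 + phi ** 2
    Q[0, 0] = Q[-1, -1] = 1.0
    Q[idx[:-1], idx[1:]] = -phi
    Q[idx[1:], idx[:-1]] = -phi
    return Q

def sample_gmrf(Q, rng):
    """Draw x ~ N(0, inv(Q)) using the Cholesky factor of the precision."""
    L = np.linalg.cholesky(Q)  # Q = L @ L.T
    z = rng.standard_normal(Q.shape[0])
    return np.linalg.solve(L.T, z)  # covariance of x is inv(Q)

def gmrf_logpdf(x, Q):
    """Log-density of a zero-mean GMRF with precision matrix Q."""
    L = np.linalg.cholesky(Q)
    half_logdet = np.log(np.diag(L)).sum()  # equals 0.5 * log(det(Q))
    return -0.5 * len(x) * np.log(2 * np.pi) + half_logdet - 0.5 * x @ Q @ x

Q = ar1_precision(5, phi=0.9)
x = sample_gmrf(Q, np.random.default_rng(0))
print(Q[0, 2])  # 0.0: sites 0 and 2 are conditionally independent given the rest
print(gmrf_logpdf(x, Q))
```

The zeros in Q encode the conditional independence (Markov) structure, and they are what makes sparse factorizations, and therefore fast sampling and density evaluation, possible for large fields.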

Spatiotemporal Analysis: Spatiotemporal data analysis is an
emerging research area, driven by the development and application of novel
computational techniques that allow the analysis of large spatiotemporal
databases. Spatiotemporal models arise when data are collected across time as
well as space and have at least one spatial and one temporal property. An event
in a spatiotemporal dataset describes a spatial and temporal phenomenon that
exists at a certain time t and location x. The analysis of spatiotemporal data
requires that both temporal correlations and spatial correlations be taken into
account. Assessing both the temporal and spatial dimensions of data adds
significant complexity to the data analysis process for two major reasons: 1)
continuous and discrete changes of the spatial and non-spatial properties of
spatiotemporal objects, and 2) the influence of collocated neighboring
spatiotemporal objects on one another.


(1) Song, Yongze; Yong Ge. “Spatial
distribution estimation of malaria in northern China and its scenarios in 2020,
2030, 2040 and 2050”. Malaria Journal.


(2) Fotheringham, A. S.; Charlton, M. E.; Brunsdon,
C. (1998). “Geographically weighted regression: a natural evolution of the
expansion method for spatial data analysis”. Environment and Planning
A. 30 (11): 1905–1927. doi:10.1068/a301905


(3) Knegt, De; Coughenour, M.B.; Skidmore, A.K.;
Heitkönig, I.M.A.; Knox, N.M.; Slotow, R.; Prins, H.H.T. (2010). “Spatial
autocorrelation and the scaling of species–environment relationships”. Ecology. 91:
2455–2465. doi:10.1890/09-1359.1


(4) Matheron, Georges (1963). “Principles of
geostatistics”. Economic Geology. 58 (8): 1246–1266. doi:10.2113/gsecongeo.58.8.1246. ISSN 1554-0774


(5) Ford, David. “The
Empirical Variogram” (PDF).
Retrieved 31 October 2017


(6) Allenby, Rossi, McCulloch (January 2005). “Hierarchical
Bayes Model: A Practitioner’s Guide”. Journal of
Bayesian Applications in Marketing

(7) Cressie N (1993) Statistics for Spatial Data. New York: Wiley-Interscience.


(8) Richardson S, Guihenneuc C, Lasserre V
(1992) Spatial linear models with autocorrelated error structure. The
Statistician 41: 539-557.


(9) Rue H (2001) Fast sampling of Gaussian Markov random fields. Journal of the
Royal Statistical Society Series B 63: 325–338.


(10) Rue
H, Held L (2005) Gaussian Markov Random Fields: Theory and Applications, vol. 104 of
Monographs on Statistics and Applied Probability. London: Chapman & Hall.