# Univariate Spatial Dimensionality Reduction

Extening PCoA and Moran Eigenvector Maps to include attributes

spatial
r
Author

Josiah Parry

Published

May 22, 2024

In discussing principal coordinate analysis (PCoA), the question naturally arose of “how could we incorporate non-spatial data into this method?” Well, that is what I wanted to explore.

If we include an attribute e.g. crime rate, and embed that into the spatial components generated by PCoA we would truly be doing spatial dimension reduction. Not only would we be encoding spatial patterns, but we would be encoding spatial patterns as they relate to some attribute across the spatial surface.

This is already explored in the context of gene expression via {spatialPCA} (website here). This is somewhat different to what I explored in my previous blog post. My brief review of the paper and R package tells me that the spatialPCA method applies the spatial weights matrix into the covariance matrix used in PCA.

What I’m going to explore is a bit different than that.

# Univariate spatial encoding

Naturally, we want to look at how we can incorporate attributes into PCoA. Lets first start by creating a spatial weights matrix.

As always, we use the Guerry dataset since it is tried and true. Here I create a spatial weights matrix using contiguity to identify neighborhoods and we use a guassian kernel (like spatial PCA). The Gaussian kernel makes it so that locations that are closer have more weight than those further away. Since we are using contiguity, the distance is estimated by using centroids.

geom <- guerry$geometry pnts <- st_centroid(geom) # neighbors via contiguity nb <- st_contiguity(geom) # gaussian kernel weights for neighbors wt <- st_kernel_weights(nb, pnts, "gaussian") (listw <- nb2listw(nb, wt, style = "B")) Characteristics of weights list object: Neighbour list object: Number of regions: 85 Number of nonzero links: 420 Percentage nonzero weights: 5.813149 Average number of links: 4.941176 Weights style: B Weights constants summary: n nn S0 S1 S2 B 85 7225 698.7942 2378.909 24825.15 ## Revisiting the spatial lag The spatial lag is arguably the most fundamental spatial statistic. It is, in essence, the weighted average of a variable $$x$$ across a neighborhood. To calculate the spatial lag, referred often to as $$Wy$$, we multiply our spatial weights matrix by our neighboring values. Let’s walk through how this works really quickly. We have the index positions of our neighbors and the weights that are associated with each of them. head(nb) head(wt) [[1]] [1] 36 37 67 69 [[2]] [1] 7 49 57 58 73 76 [[3]] [1] 17 21 40 56 61 69 [[4]] [1] 5 24 79 80 [[5]] [1] 4 24 36 [[6]] [1] 24 28 36 40 41 46 80 [[1]] [1] 1.553402 1.857660 2.062100 1.676694 [[2]] [1] 1.801787 1.717777 1.439955 1.721547 1.260566 1.429496 [[3]] [1] 1.599532 1.527097 1.376795 1.722723 1.865664 1.350771 [[4]] [1] 2.040754 1.356645 1.871658 1.685343 [[5]] [1] 2.040754 1.674375 1.689488 [[6]] [1] 2.075805 1.679763 1.357435 1.308397 2.009760 1.812262 1.432539 To calculate the spatial lag we first find the neighbors’ values for a vector x, multiply them by the weights and sum them up. In this case the variable is literacy. x <- guerry$literacy
xj <- find_xj(x, nb)
head(xj)
[[1]]
[1] 29 73 45 32

[[2]]
[1] 67 63 45 54 54 44

[[3]]
[1] 13 23 29 20 19 32

[[4]]
[1] 69 42 23 37

[[5]]
[1] 46 42 29

[[6]]
[1] 42 40 29 29 21 27 37

With the neighboring values, we can multiply them by the spatial weights.

xj_wt <- purrr::map2(xj, wt, $$.xj, .wt) .xj * .wt) head(xj_wt) [[1]] [1] 45.04865 135.60918 92.79449 53.65420 [[2]] [1] 120.71975 108.21994 64.79797 92.96353 68.07056 62.89782 [[3]] [1] 20.79392 35.12323 39.92704 34.45446 35.44761 43.22467 [[4]] [1] 140.81202 56.97907 43.04814 62.35767 [[5]] [1] 93.87468 70.32374 48.99514 [[6]] [1] 87.18379 67.19052 39.36560 37.94350 42.20495 48.93109 53.00394 The last step in calculating the spatial lag is to sum that all up: x_lag <- vapply(xj_wt, sum, numeric(1)) head(x_lag) [1] 327.1065 517.6696 208.9709 303.1969 213.1936 375.8234 Or, simplified, that is: head(st_lag(x, nb, wt)) [1] 327.1065 517.6696 208.9709 303.1969 213.1936 375.8234 ## Stopping short of the spatial lag We use the spatial lag to get a univariate estimate of the spatial neighborhoods value. We have a matrix of xj values that we multiply the spatial weights matrix and then perform a row summation. What if we didn’t perform that summation, and instead applied PCA onto the weighted values of xj? We would then be encoding space and attributes into the components! In the below code chunk I am converting the spatial weights matrix to a sparse matrix and also creating a sparse matrix of the neighboring values. I am also scaling them. m <- as(listw, "CsparseMatrix") xj <- as( nb2listw( nb, find_xj(scale(x), nb), style = "B" ), "CsparseMatrix" ) Now, we multiply them and perform PCA on the resultant matrix: Code plot_comp <- function(comp) { ggplot(guerry, aes(fill = comp)) + geom_sf(color = "black", lwd = 0.2) + scale_fill_viridis_c() + theme_void() + theme(legend.position = "none") } comps <- prcomp_irlba(xj * m, 4) plot_comp(compsrotation[,1]) plot_comp(compsrotation[,2]) plot_comp(compsrotation[,3]) plot_comp(compsrotation[,4]) We can see that there are some meaningful components in here! The original variable’s distribution: ggplot(guerry, aes(fill = literacy)) + geom_sf(color = "black", lwd = 0.2) + scale_fill_viridis_c() + labs(title = "Literacy") + theme_void() We can see that there is some resemblance to original variable in these components. ## Do they exhibit spatial autocorrelation? Yes, yes they do! Code comps_autocorr <- function(comps) { apply(compsrotation, 2, function(.x) { broom::tidy(global_moran_test(.x, nb, wt)) }, simplify = FALSE) |> dplyr::bind_rows(.id = "component") |> dplyr::mutate( estimate1 = round(estimate1, 3), p.value = rstatix::p_format(p.value) ) |> dplyr::select(component, "Moran's I" = estimate1, p_val = p.value) } comps_autocorr(comps) # A tibble: 4 × 3 component Moran's I p_val <chr> <dbl> <chr> 1 PC1 0.689 <0.0001 2 PC2 0.611 <0.0001 3 PC3 0.409 <0.0001 4 PC4 0.594 <0.0001 How do we use this, though? That is the part I am not so clear on. We have multiple components and they all exhibit significant autocorrelation. The applicability of each of these components may depend a lot upon what they are used to predict. The interpretation of them is tougher than identifying the applicability. Code regress_component <- function(z) { vars <- setdiff( colnames(guerry), c("code_dept", "count", "region", "geometry", "area", "distance", "ave_id_geo", "main_city", "dept", "department") ) lapply(vars, \(var) { broom::glance(summary(lm(guerry[[var]] ~ z))) }) |> setNames(vars) |> dplyr::bind_rows(.id = "variable") |> dplyr::arrange(-r.squared) |> dplyr::select(variable, r.squared) |> dplyr::filter(r.squared > 0.1) } If we regress our components upon the variables in the Guerry dataset we might be able to see which patterns they can help explain away. This helper function filters out regressions with an \(R^2$$ of less than 0.1.

regress_component(comps$rotation[,1]) regress_component(comps$rotation[,2])
regress_component(comps$rotation[,3]) regress_component(comps$rotation[,4])
# A tibble: 1 × 2
variable r.squared
<chr>        <dbl>
1 literacy     0.136
# A tibble: 5 × 2
variable        r.squared
<chr>               <dbl>
1 donation_clergy     0.276
2 desertion           0.193
3 instruction         0.132
4 literacy            0.121
5 clergy              0.112
# A tibble: 1 × 2
variable      r.squared
<chr>             <dbl>
1 crime_parents     0.114
# A tibble: 2 × 2
variable    r.squared
<chr>           <dbl>
1 prostitutes     0.319
2 wealth          0.189

Interestingly the 3rd and 4th components are the most useful if we were looking for something to predict upon—this is a derived use case where we’re fishing for something that looks good.

## Univariate Spatial Attribute Comopnent w/ Regression

Let’s explore this 4rd component a bit more. If we regress prostitutes ~ literacy we see that there is a much weaker model. Surprisingly, in fact. And the residuals are only very mildly autocorrelated. Why is the component such a strong predictor????

mod <- lm(prostitutes ~ literacy, data = guerry)
summary(mod)

Call:
lm(formula = prostitutes ~ literacy, data = guerry)

Residuals:
Min     1Q Median     3Q    Max
-415.6 -122.3  -40.5   57.9 4314.3

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -208.002    134.721  -1.544  0.12641
literacy       8.981      3.147   2.854  0.00545 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 502.9 on 83 degrees of freedom
Multiple R-squared:  0.08935,   Adjusted R-squared:  0.07838
F-statistic: 8.144 on 1 and 83 DF,  p-value: 0.005455
global_moran_test(resid(mod), nb, wt)

Moran I test under randomisation

data:  x
weights: listw

Moran I statistic standard deviate = 5.0023, p-value = 2.832e-07
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
0.142390596      -0.011904762       0.000951401 

Adding in the component to the model, the $$R^2$$ shoots right up! But why?

mod <- lm(prostitutes ~ literacy + comps$rotation[,4], data = guerry) summary(mod)  Call: lm(formula = prostitutes ~ literacy + comps$rotation[, 4], data = guerry)

Residuals:
Min     1Q Median     3Q    Max
-766.7 -103.4  -65.6   36.5 3210.5

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)         -165.523    113.683  -1.456    0.149
literacy               6.326      2.688   2.353    0.021 *
comps$rotation[, 4] 2604.594 440.074 5.919 7.25e-08 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 423.5 on 82 degrees of freedom Multiple R-squared: 0.3619, Adjusted R-squared: 0.3464 F-statistic: 23.26 on 2 and 82 DF, p-value: 9.995e-09 global_moran_test(resid(mod), nb, wt)  Moran I test under randomisation data: x weights: listw Moran I statistic standard deviate = 2.3311, p-value = 0.009874 alternative hypothesis: greater sample estimates: Moran I statistic Expectation Variance 0.100827939 -0.011904762 0.002338715  Often when there is a spatial effect of a variable, we utilize its spatial lag in the model. This is an SLX model but simplified. mod <- lm(prostitutes ~ literacy + st_lag(x, nb, wt), data = guerry) summary(mod)  Call: lm(formula = prostitutes ~ literacy + st_lag(x, nb, wt), data = guerry) Residuals: Min 1Q Median 3Q Max -500.7 -138.6 -29.5 54.1 4237.2 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -155.2878 149.8723 -1.036 0.30319 literacy 10.7142 3.8112 2.811 0.00617 ** st_lag(x, nb, wt) -0.3858 0.4764 -0.810 0.42034 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 503.9 on 82 degrees of freedom Multiple R-squared: 0.09657, Adjusted R-squared: 0.07454 F-statistic: 4.383 on 2 and 82 DF, p-value: 0.01554 global_moran_test(resid(mod), nb, wt)  Moran I test under randomisation data: x weights: listw Moran I statistic standard deviate = 5.2839, p-value = 6.322e-08 alternative hypothesis: greater sample estimates: Moran I statistic Expectation Variance 0.167643014 -0.011904762 0.001154633  Is this all autocorrelation? What if we look at including just a spatial component? What is interesting is that if we recreate this process using only the Moran’s Eigenvectors, there is nothing meaningful to be extracted that predicts prostitution as well as the spatial univariate component! Code sp_comps <- irlba::prcomp_irlba( scale(scale(m, T, T), T, F), 4 ) comps_autocorr(sp_comps) regress_component(sp_comps$rotation[,1])
regress_component(sp_comps$rotation[,2]) regress_component(sp_comps$rotation[,3])
regress_component(sp_comps\$rotation[,4])
# A tibble: 4 × 3
component Moran's I p_val
<chr>           <dbl> <chr>
1 PC1             1.04  <0.0001
2 PC2             0.981 <0.0001
3 PC3             0.934 <0.0001
4 PC4             0.889 <0.0001
# A tibble: 10 × 2
variable        r.squared
<chr>               <dbl>
1 desertion           0.366
2 suicides            0.258
3 crime_pers          0.203
4 literacy            0.197
5 crime_prop          0.178
6 instruction         0.171
7 wealth              0.158
8 donation_clergy     0.155
9 lottery             0.143
10 commerce            0.138
# A tibble: 4 × 2
variable    r.squared
<chr>           <dbl>
1 literacy        0.256
2 instruction     0.230
3 donations       0.160
4 commerce        0.151
# A tibble: 2 × 2
variable    r.squared
<chr>           <dbl>
1 instruction     0.142
2 literacy        0.121
# A tibble: 0 × 2
# ℹ 2 variables: variable <chr>, r.squared <dbl>