Univariate Spatial Dimensionality Reduction

Extening PCoA and Moran Eigenvector Maps to include attributes

spatial
r
Author

Josiah Parry

Published

May 22, 2024

In discussing principal coordinate analysis (PCoA), the question naturally arose of “how could we incorporate non-spatial data into this method?” Well, that is what I wanted to explore.

If we include an attribute e.g. crime rate, and embed that into the spatial components generated by PCoA we would truly be doing spatial dimension reduction. Not only would we be encoding spatial patterns, but we would be encoding spatial patterns as they relate to some attribute across the spatial surface.

This is already explored in the context of gene expression via {spatialPCA} (website here). This is somewhat different to what I explored in my previous blog post. My brief review of the paper and R package tells me that the spatialPCA method applies the spatial weights matrix into the covariance matrix used in PCA.

What I’m going to explore is a bit different than that.

Univariate spatial encoding

Naturally, we want to look at how we can incorporate attributes into PCoA. Lets first start by creating a spatial weights matrix.

As always, we use the Guerry dataset since it is tried and true. Here I create a spatial weights matrix using contiguity to identify neighborhoods and we use a guassian kernel (like spatial PCA). The Gaussian kernel makes it so that locations that are closer have more weight than those further away. Since we are using contiguity, the distance is estimated by using centroids.

geom <- guerry$geometry
pnts <- st_centroid(geom)

# neighbors via contiguity
nb <- st_contiguity(geom)

# gaussian kernel weights for neighbors
wt <- st_kernel_weights(nb, pnts, "gaussian")

(listw <- nb2listw(nb, wt, style = "B"))
Characteristics of weights list object:
Neighbour list object:
Number of regions: 85 
Number of nonzero links: 420 
Percentage nonzero weights: 5.813149 
Average number of links: 4.941176 

Weights style: B 
Weights constants summary:
   n   nn       S0       S1       S2
B 85 7225 698.7942 2378.909 24825.15

Revisiting the spatial lag

The spatial lag is arguably the most fundamental spatial statistic. It is, in essence, the weighted average of a variable \(x\) across a neighborhood.

To calculate the spatial lag, referred often to as \(Wy\), we multiply our spatial weights matrix by our neighboring values. Let’s walk through how this works really quickly.

We have the index positions of our neighbors and the weights that are associated with each of them.

head(nb)
head(wt)
[[1]]
[1] 36 37 67 69

[[2]]
[1]  7 49 57 58 73 76

[[3]]
[1] 17 21 40 56 61 69

[[4]]
[1]  5 24 79 80

[[5]]
[1]  4 24 36

[[6]]
[1] 24 28 36 40 41 46 80
[[1]]
[1] 1.553402 1.857660 2.062100 1.676694

[[2]]
[1] 1.801787 1.717777 1.439955 1.721547 1.260566 1.429496

[[3]]
[1] 1.599532 1.527097 1.376795 1.722723 1.865664 1.350771

[[4]]
[1] 2.040754 1.356645 1.871658 1.685343

[[5]]
[1] 2.040754 1.674375 1.689488

[[6]]
[1] 2.075805 1.679763 1.357435 1.308397 2.009760 1.812262 1.432539

To calculate the spatial lag we first find the neighbors’ values for a vector x, multiply them by the weights and sum them up. In this case the variable is literacy.

x <- guerry$literacy
xj <- find_xj(x, nb)
head(xj)
[[1]]
[1] 29 73 45 32

[[2]]
[1] 67 63 45 54 54 44

[[3]]
[1] 13 23 29 20 19 32

[[4]]
[1] 69 42 23 37

[[5]]
[1] 46 42 29

[[6]]
[1] 42 40 29 29 21 27 37

With the neighboring values, we can multiply them by the spatial weights.

xj_wt <- purrr::map2(xj, wt, \(.xj, .wt) .xj * .wt)
head(xj_wt)
[[1]]
[1]  45.04865 135.60918  92.79449  53.65420

[[2]]
[1] 120.71975 108.21994  64.79797  92.96353  68.07056  62.89782

[[3]]
[1] 20.79392 35.12323 39.92704 34.45446 35.44761 43.22467

[[4]]
[1] 140.81202  56.97907  43.04814  62.35767

[[5]]
[1] 93.87468 70.32374 48.99514

[[6]]
[1] 87.18379 67.19052 39.36560 37.94350 42.20495 48.93109 53.00394

The last step in calculating the spatial lag is to sum that all up:

x_lag <- vapply(xj_wt, sum, numeric(1))
head(x_lag)
[1] 327.1065 517.6696 208.9709 303.1969 213.1936 375.8234

Or, simplified, that is:

head(st_lag(x, nb, wt))
[1] 327.1065 517.6696 208.9709 303.1969 213.1936 375.8234

Stopping short of the spatial lag

We use the spatial lag to get a univariate estimate of the spatial neighborhoods value. We have a matrix of xj values that we multiply the spatial weights matrix and then perform a row summation.

What if we didn’t perform that summation, and instead applied PCA onto the weighted values of xj?

We would then be encoding space and attributes into the components! In the below code chunk I am converting the spatial weights matrix to a sparse matrix and also creating a sparse matrix of the neighboring values. I am also scaling them.

m <- as(listw, "CsparseMatrix")

xj <- as(
  nb2listw(
    nb, find_xj(scale(x), nb), style = "B"
  ),
  "CsparseMatrix"
)

Now, we multiply them and perform PCA on the resultant matrix:

Code
plot_comp <- function(comp) {
    ggplot(guerry, aes(fill = comp)) +
    geom_sf(color = "black", lwd = 0.2) +
    scale_fill_viridis_c() +
    theme_void() +
    theme(legend.position = "none")
}
comps <- prcomp_irlba(xj * m, 4)

plot_comp(comps$rotation[,1])
plot_comp(comps$rotation[,2])
plot_comp(comps$rotation[,3])
plot_comp(comps$rotation[,4])

We can see that there are some meaningful components in here!

The original variable’s distribution:

ggplot(guerry, aes(fill = literacy)) +
    geom_sf(color = "black", lwd = 0.2) +
    scale_fill_viridis_c() +
    labs(title = "Literacy") +
    theme_void()

We can see that there is some resemblance to original variable in these components.

Do they exhibit spatial autocorrelation?

Yes, yes they do!

Code
comps_autocorr <- function(comps) {
    apply(comps$rotation, 2, function(.x) {
        broom::tidy(global_moran_test(.x, nb, wt))
        }, simplify = FALSE) |>
        dplyr::bind_rows(.id = "component") |>
        dplyr::mutate(
            estimate1 = round(estimate1, 3), p.value = rstatix::p_format(p.value)
        ) |>
        dplyr::select(component, "Moran's I" = estimate1, p_val = p.value) 
}
comps_autocorr(comps)
# A tibble: 4 × 3
  component `Moran's I` p_val  
  <chr>           <dbl> <chr>  
1 PC1             0.689 <0.0001
2 PC2             0.611 <0.0001
3 PC3             0.409 <0.0001
4 PC4             0.594 <0.0001

How do we use this, though? That is the part I am not so clear on. We have multiple components and they all exhibit significant autocorrelation. The applicability of each of these components may depend a lot upon what they are used to predict. The interpretation of them is tougher than identifying the applicability.

Code
regress_component <- function(z) {
  vars <- setdiff(
    colnames(guerry), 
    c("code_dept", "count", "region", "geometry", "area", "distance", "ave_id_geo", "main_city", "dept", "department")
  )
  
  lapply(vars, \(var) {
    broom::glance(summary(lm(guerry[[var]] ~ z)))
  }) |> 
    setNames(vars) |> 
    dplyr::bind_rows(.id = "variable") |> 
    dplyr::arrange(-r.squared) |> 
    dplyr::select(variable, r.squared) |>
    dplyr::filter(r.squared > 0.1)
}

If we regress our components upon the variables in the Guerry dataset we might be able to see which patterns they can help explain away. This helper function filters out regressions with an \(R^2\) of less than 0.1.

regress_component(comps$rotation[,1])
regress_component(comps$rotation[,2])
regress_component(comps$rotation[,3])
regress_component(comps$rotation[,4])
# A tibble: 1 × 2
  variable r.squared
  <chr>        <dbl>
1 literacy     0.136
# A tibble: 5 × 2
  variable        r.squared
  <chr>               <dbl>
1 donation_clergy     0.276
2 desertion           0.193
3 instruction         0.132
4 literacy            0.121
5 clergy              0.112
# A tibble: 1 × 2
  variable      r.squared
  <chr>             <dbl>
1 crime_parents     0.114
# A tibble: 2 × 2
  variable    r.squared
  <chr>           <dbl>
1 prostitutes     0.319
2 wealth          0.189

Interestingly the 3rd and 4th components are the most useful if we were looking for something to predict upon—this is a derived use case where we’re fishing for something that looks good.

Univariate Spatial Attribute Comopnent w/ Regression

Let’s explore this 4rd component a bit more. If we regress prostitutes ~ literacy we see that there is a much weaker model. Surprisingly, in fact. And the residuals are only very mildly autocorrelated. Why is the component such a strong predictor????

mod <- lm(prostitutes ~ literacy, data = guerry)
summary(mod)

Call:
lm(formula = prostitutes ~ literacy, data = guerry)

Residuals:
   Min     1Q Median     3Q    Max 
-415.6 -122.3  -40.5   57.9 4314.3 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) -208.002    134.721  -1.544  0.12641   
literacy       8.981      3.147   2.854  0.00545 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 502.9 on 83 degrees of freedom
Multiple R-squared:  0.08935,   Adjusted R-squared:  0.07838 
F-statistic: 8.144 on 1 and 83 DF,  p-value: 0.005455
global_moran_test(resid(mod), nb, wt)

    Moran I test under randomisation

data:  x  
weights: listw    

Moran I statistic standard deviate = 5.0023, p-value = 2.832e-07
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.142390596      -0.011904762       0.000951401 

Adding in the component to the model, the \(R^2\) shoots right up! But why?

mod <- lm(prostitutes ~ literacy + comps$rotation[,4], data = guerry)
summary(mod)

Call:
lm(formula = prostitutes ~ literacy + comps$rotation[, 4], data = guerry)

Residuals:
   Min     1Q Median     3Q    Max 
-766.7 -103.4  -65.6   36.5 3210.5 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         -165.523    113.683  -1.456    0.149    
literacy               6.326      2.688   2.353    0.021 *  
comps$rotation[, 4] 2604.594    440.074   5.919 7.25e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 423.5 on 82 degrees of freedom
Multiple R-squared:  0.3619,    Adjusted R-squared:  0.3464 
F-statistic: 23.26 on 2 and 82 DF,  p-value: 9.995e-09
global_moran_test(resid(mod), nb, wt)

    Moran I test under randomisation

data:  x  
weights: listw    

Moran I statistic standard deviate = 2.3311, p-value = 0.009874
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.100827937      -0.011904762       0.002338716 

Often when there is a spatial effect of a variable, we utilize its spatial lag in the model. This is an SLX model but simplified.

mod <- lm(prostitutes ~ literacy + st_lag(x, nb, wt), data = guerry)
summary(mod)

Call:
lm(formula = prostitutes ~ literacy + st_lag(x, nb, wt), data = guerry)

Residuals:
   Min     1Q Median     3Q    Max 
-500.7 -138.6  -29.5   54.1 4237.2 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)   
(Intercept)       -155.2878   149.8723  -1.036  0.30319   
literacy            10.7142     3.8112   2.811  0.00617 **
st_lag(x, nb, wt)   -0.3858     0.4764  -0.810  0.42034   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 503.9 on 82 degrees of freedom
Multiple R-squared:  0.09657,   Adjusted R-squared:  0.07454 
F-statistic: 4.383 on 2 and 82 DF,  p-value: 0.01554
global_moran_test(resid(mod), nb, wt)

    Moran I test under randomisation

data:  x  
weights: listw    

Moran I statistic standard deviate = 5.2839, p-value = 6.322e-08
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.167643014      -0.011904762       0.001154633 

Is this all autocorrelation? What if we look at including just a spatial component? What is interesting is that if we recreate this process using only the Moran’s Eigenvectors, there is nothing meaningful to be extracted that predicts prostitution as well as the spatial univariate component!

Code
sp_comps <- irlba::prcomp_irlba(
  scale(scale(m, T, T), T, F), 4
)

comps_autocorr(sp_comps)
regress_component(sp_comps$rotation[,1])
regress_component(sp_comps$rotation[,2])
regress_component(sp_comps$rotation[,3])
regress_component(sp_comps$rotation[,4])
# A tibble: 4 × 3
  component `Moran's I` p_val  
  <chr>           <dbl> <chr>  
1 PC1             1.04  <0.0001
2 PC2             0.981 <0.0001
3 PC3             0.934 <0.0001
4 PC4             0.889 <0.0001
# A tibble: 10 × 2
   variable        r.squared
   <chr>               <dbl>
 1 desertion           0.366
 2 suicides            0.258
 3 crime_pers          0.203
 4 literacy            0.197
 5 crime_prop          0.178
 6 instruction         0.171
 7 wealth              0.158
 8 donation_clergy     0.155
 9 lottery             0.143
10 commerce            0.138
# A tibble: 4 × 2
  variable    r.squared
  <chr>           <dbl>
1 literacy        0.256
2 instruction     0.230
3 donations       0.160
4 commerce        0.151
# A tibble: 2 × 2
  variable    r.squared
  <chr>           <dbl>
1 instruction     0.142
2 literacy        0.121
# A tibble: 0 × 2
# ℹ 2 variables: variable <chr>, r.squared <dbl>