Comparison of Interpolation Models for Greater Boston Noise Monitoring

Introduction

Noise monitoring stations were placed throughout the Greater Boston Area in order to measure ambient sounds in decibels. The information collected was provided with latitudes and longitudes in order to spatially represent the monitoring site’s location. Each site was monitored only during the day or during the night in order to represent a varying dataset with hypothetically different noise values.

The goal of this analysis was to interpolate these noise values in order to determine the effectiveness of two interpolation models: Inverse Distance Weighting and Ordinary Kriging. The data was separated into daytime and nighttime sites and analyzed in order to predict what noise values would be present in surrounding locations. This type of analysis could produce an interesting look at both the differences and magnitude of noise during the day and night and how two interpolation models can predict noise in other nearby locations.

Data and Methods

The data provided was in the form of an excel spreadsheet split into separate tabs with metadata, monitoring site locations, and surveys of people on the ground. The data of interest was the monitoring site locations which were converted to a .csv file and then read into R as a data frame. The data was then filtered into separate daytime and nighttime data frames. There were a total of 8 variables with 400 observations (monitoring sites). After separating out the datasets there were 227 daytime observations and 173 nighttime observations. The variable of interest for analysis is “Un_weight”, or the unweighted decibel reading at each monitoring site.

#load packages
library(sp)
library(tidyverse)
library(maptools)
library(tmap)
library(sf)
library(rgdal)
library(rgeos)
library(spatstat) #must run spatstat before gstat for idw to work without mask
library(gstat)
library(raster)
library(xts)
library(rgl)

#read in the data
setwd("C:/Users/Jon/Desktop/Final Project")
bostonNeighborhoods_sf <- st_read(dsn = ".", layer="Boston_Neighborhoods")
## Reading layer `Boston_Neighborhoods' from data source `C:\Users\Jon\Desktop\Final Project' using driver `ESRI Shapefile'
## Simple feature collection with 26 features and 7 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -71.19125 ymin: 42.22792 xmax: -70.92278 ymax: 42.39699
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
Monitoring_Sites <- read.csv("GrtrBostonNeighborhoodNoiseSurvey.csv", 
                        stringsAsFactors = FALSE)
Monitoring_Sites_Day <- subset(Monitoring_Sites, T_OD == "D")

Monitoring_Sites_Night <- subset(Monitoring_Sites, T_OD == "N")

glimpse(Monitoring_Sites_Day)
## Observations: 227
## Variables: 8
## $ site_id   <dbl> 102172016, 102182015, 102212015, 102222015, 10223201...
## $ Lat       <dbl> 42.35122, 42.34266, 42.34905, 42.34934, 42.33944, 42...
## $ Lon       <dbl> -71.06611, -71.10331, -71.09622, -71.16388, -71.1017...
## $ A_weight  <dbl> 71.03479, 65.90164, 70.30381, 68.41760, 65.25820, 62...
## $ Un_weight <dbl> 82.06686, 78.17036, 81.79270, 76.54583, 76.30549, 71...
## $ A_UN      <dbl> 11.032073, 12.268715, 11.488891, 8.128227, 11.047287...
## $ T_OD      <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D...
## $ D_OW      <chr> "WD", "WD", "WE", "WE", "WD", "WE", "WD", "WD", "WD"...
glimpse(Monitoring_Sites_Night)
## Observations: 173
## Variables: 8
## $ site_id   <dbl> 102182016, 102202016, 103042016, 103062016, 10309201...
## $ Lat       <dbl> 42.33839, 42.35294, 42.37222, 42.34554, 42.30786, 42...
## $ Lon       <dbl> -71.10805, -71.15082, -71.06385, -71.09030, -71.0812...
## $ A_weight  <dbl> 69.01023, 66.75523, 68.40336, 62.94559, 73.29572, 69...
## $ Un_weight <dbl> 76.46157, 72.43792, 76.19239, 69.73315, 82.85453, 80...
## $ A_UN      <dbl> 7.451338, 5.682682, 7.789034, 6.787567, 9.558807, 10...
## $ T_OD      <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N...
## $ D_OW      <chr> "WD", "WE", "WD", "WE", "WD", "WE", "WE", "WD", "WD"...

When plotted, the unweighted decibel readings showed a fairly normal distribution and therefore transformations of the data were not needed in order to interpolate.

#plot a histogram of data
#appears to be uniformly distributed, does not need a transformation
ggplot(Monitoring_Sites_Day, aes(x=Un_weight))+ geom_histogram(bins = 50) +
  ggtitle("Histogram of Daytime Monitoring Sites") +
  labs(aes(x = "Unweighted Sound Measurements in Decibels", 
           y = "Number of Observations"))

ggplot(Monitoring_Sites_Night, aes(x=Un_weight))+ geom_histogram(bins = 50) + 
  ggtitle("Histogram of Nighttime Monitoring Sites") +
  labs(aes(x = "Unweighted Sound Measurements in Decibels", 
           y = "Number of Observations"))

Maps of the monitoring locations can be seen below for both day and night sites. It can be seen from both the histograms and the maps that there is a higher magnitude of decibel levels during the day than at night.

The first analysis used an inverse distance weighting model to interpolate noise values. The first step was to project each data frame into an spdf. Also, there appeared to be a duplicate point record so the “-zerodist” function was used in order to remove it before further errors occurred during processing. After removing the duplicate, the daytime observations dropped to 226 (only one duplicate removed).

#project the spdf
#coordinates(Monitoring_Sites_Day) <- ~Lon + Lat
#proj4string(Monitoring_Sites_Day) <- CRS("+proj=longlat +datum=WGS84")
#Monitoring_Sites_Day <- spTransform(Monitoring_Sites_Day, "+proj=lcc +lat_1=41.71666666666667 +lat_2=42.68333333333333 +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83 +units=m +no_defs")

#use zerodist function to remove duplicate spatial points
#Monitoring_Sites_Day <- Monitoring_Sites_Day[-zerodist(Monitoring_Sites_Day)[,1],]

#coordinates(Monitoring_Sites_Night) <- ~Lon + Lat
#proj4string(Monitoring_Sites_Night) <- CRS("+proj=longlat +datum=WGS84")
#Monitoring_Sites_Night <- spTransform(Monitoring_Sites_Night, "+proj=lcc +lat_1=41.71666666666667 +lat_2=42.68333333333333 +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83 +units=m +no_defs")

glimpse(Monitoring_Sites_Day)
## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
##   ..@ data       :'data.frame':  226 obs. of  6 variables:
##   .. ..$ site_id  : num [1:226] 1.02e+08 1.02e+08 1.02e+08 1.02e+08 1.02e+08 ...
##   .. ..$ A_weight : num [1:226] 71 65.9 70.3 68.4 65.3 ...
##   .. ..$ Un_weight: num [1:226] 82.1 78.2 81.8 76.5 76.3 ...
##   .. ..$ A_UN     : num [1:226] 11.03 12.27 11.49 8.13 11.05 ...
##   .. ..$ T_OD     : chr [1:226] "D" "D" "D" "D" ...
##   .. ..$ D_OW     : chr [1:226] "WD" "WD" "WE" "WE" ...
##   ..@ coords.nrs : num(0) 
##   ..@ coords     : num [1:226, 1:2] 235749 232688 233269 227694 232822 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   ..@ bbox       : num [1:2, 1:2] 227361 889244 241185 904817
##   .. ..- attr(*, "dimnames")=List of 2
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot

Next, an IDW raster from the “Un_weight” variable was produced. First, convex hulls were created to encompass the interpolation area for each data set. Next, a sampling grid of 10,000 cells was created for each data set. An IDW model was run on each, transformed to a raster, coordinate system set, and masked to the hull boundaries. The results of each can be seen below.

#create the empty interpolation grid with a convex hull polygon to encompass
# points

m.hull1 <- gConvexHull(Monitoring_Sites_Day, byid = FALSE, id = FALSE)

tm_shape(Monitoring_Sites_Day) + tm_dots() + tm_shape(m.hull1) + tm_borders(col = "blue")
m.hull2 <- gConvexHull(Monitoring_Sites_Night, byid = FALSE, id = FALSE)

tm_shape(Monitoring_Sites_Night) + tm_dots() + tm_shape(m.hull2) + tm_borders(col = "blue")
# define a sampling grid based on the extent of the point layer
grid1 <- spsample(Monitoring_Sites_Day, type = "regular", n = 10000)
grid2 <- spsample(Monitoring_Sites_Night, type = "regular", n = 10000)

# run the idw for the Un_weight variable
idw1 <- idw(Monitoring_Sites_Day$Un_weight ~ 1, 
           Monitoring_Sites_Day, newdata = grid1)
## [inverse distance weighted interpolation]
idw2 <- idw(Monitoring_Sites_Night$Un_weight ~ 1, 
            Monitoring_Sites_Night, newdata = grid2)
## [inverse distance weighted interpolation]
# transform to a data frame to rename the columns
idw.output1 <- as.data.frame(idw1)
names(idw.output1)[1:3] <- c("long", "lat", "prediction")

idw.output2 <- as.data.frame(idw2)
names(idw.output2)[1:3] <- c("long", "lat", "prediction")

# convert idw.output to a raster

#create a spatial points data frame
spg1 <- idw.output1
coordinates(spg1) <- ~ long + lat

spg2 <- idw.output2
coordinates(spg2) <- ~ long + lat

#coerce to spatial pixels dataframe
gridded(spg1) <- TRUE
gridded(spg2) <- TRUE

#coerce to raster
raster_idw1 <- raster(spg1)
raster_idw2 <- raster(spg2)

# set the CRS
projection(raster_idw1) <- CRS(proj4string(Monitoring_Sites_Day))
projection(raster_idw2) <- CRS(proj4string(Monitoring_Sites_Night))

# clip or 'mask' the raster to m.hull1
masked_idw1 <- mask(raster_idw1, m.hull1)
masked_idw2 <- mask(raster_idw2, m.hull2)

The second analysis used ordinary kriging to interpolate values between monitoring sites for both data sets. First, variograms were produced , which were then fit to empirical models using the “goodness-of-fit-test”. The same hulls were used from the IDW to clip the extent of the interpolation area and then grids were generate as an overlay raster.

#variograms for each data set
v1 <- variogram(Un_weight ~ 1, Monitoring_Sites_Day)

v2 <- variogram(Un_weight ~ 1, Monitoring_Sites_Night)

#goodness-of-fit test
vmf1 <- fit.variogram(v1, vgm(c("Sph", "Log", "Gau", "Mat", "Exp", "Lin", "Cir",
                                "Pen", "Ste", "Nug", "Bes")))
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : singular model in variogram fit
vmf2 <- fit.variogram(v2, vgm(c("Sph", "Log", "Gau", "Mat", "Exp", "Lin", "Cir",
                                "Pen", "Ste", "Nug", "Bes")))
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : singular model in variogram fit
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : No convergence after 200 iterations: try different initial
## values?

## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : No convergence after 200 iterations: try different initial
## values?
m.grid1 <- spsample(m.hull1, type = "regular", cellsize=200, 
                   offset = c(0.5, 0.5))

m.grid2 <- spsample(m.hull2, type = "regular", cellsize=200, 
                    offset = c(0.5, 0.5))

gridded(m.grid1) <- TRUE
plot(m.grid1)

gridded(m.grid2) <- TRUE
plot(m.grid2)

Finally, each data set was interpolated using oridinary kriging. Both predicted and variance values were mapped in the results section.

kDay <- krige(Un_weight ~ 1, locations = Monitoring_Sites_Day, newdata = m.grid1, model = vmf1)
## [using ordinary kriging]
summary(kDay)
## Object of class SpatialPixelsDataFrame
## Coordinates:
##         min      max
## x1 227361.2 241161.2
## x2 889243.8 904843.8
## Is projected: TRUE 
## proj4string :
## [+proj=lcc +lat_1=41.71666666666667 +lat_2=42.68333333333333
## +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83
## +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0]
## Number of points: 3373
## Grid attributes:
##    cellcentre.offset cellsize cells.dim
## x1          227461.2      200        69
## x2          889343.8      200        78
## Data attributes:
##    var1.pred        var1.var    
##  Min.   :69.11   Min.   :38.43  
##  1st Qu.:71.04   1st Qu.:39.82  
##  Median :72.19   Median :40.68  
##  Mean   :72.57   Mean   :40.95  
##  3rd Qu.:74.30   3rd Qu.:41.90  
##  Max.   :76.92   Max.   :45.62
kNight <- krige(Un_weight ~ 1, locations = Monitoring_Sites_Night, newdata = m.grid2, model = vmf2)
## [using ordinary kriging]
summary(kNight)
## Object of class SpatialPixelsDataFrame
## Coordinates:
##         min      max
## x1 227657.0 241057.0
## x2 887583.9 904583.9
## Is projected: TRUE 
## proj4string :
## [+proj=lcc +lat_1=41.71666666666667 +lat_2=42.68333333333333
## +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83
## +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0]
## Number of points: 3294
## Grid attributes:
##    cellcentre.offset cellsize cells.dim
## x1          227757.0      200        67
## x2          887683.9      200        85
## Data attributes:
##    var1.pred        var1.var    
##  Min.   :61.50   Min.   :26.88  
##  1st Qu.:68.62   1st Qu.:32.83  
##  Median :69.84   Median :36.01  
##  Mean   :69.94   Mean   :36.62  
##  3rd Qu.:71.35   3rd Qu.:40.63  
##  Max.   :76.83   Max.   :44.31

Results

The variograms produced for the OK models can be seen below. The first model is for the daytime data set and has an “Ste” empirical model fit. The second model is for the nighttime data set and has a “Lin” empirical model fit. It appears that the second model is not a very good fit and this could likely be due to outliers in the original data set. This may negatively affect the OK model for nighttime decibel predictions.

The maps below show the results of the IDW interpolation vs the ordinary kriging interpolation. These show predicted decibel values across the Greater Boston Area. IDW values are based on a weighted average of the known values from the monitoring sites while OK value are based upon distance lags between the monitoring sites and their semivariance.

Variance maps were also produced in order to show the variation from the mean for each data set.

Discussion

This analysis was meant to show different interpolation methods for noise data collected across the Greater Boston Area. By using both IDW and OK interpolation methods, two very different predictions were produced. It appears that the IDW produced a much “finer” interpolation of values since it relied upon averages of values between sites. The OK interpolation, on the other hand seems to have “bands” or “zones” of similarly interpolated values. This diference is likely due to the OK model producing average values for each distance lag between monitoring sites.Based upon the interpolations produced, it appears that noise in the study areas varies greatly based upon time of day (day or night), in both magnitude and spread. This can be seen in both the original data histograms and the variance maps, which indicate a much higher variation of noise across the study area during the night.

It is difficult to determine which model is more useful without validating the models (which I ran into some difficulty doing so). It must also be noted that both of these models assume a planar surface, and in a city with large structures such as buildings, noise can travel in very non-linear paths. It would be interesting to implement additional variables into a universal kriging model to see how much they would change the predictions and validate the model (increase or decrease variance). The variogram models produced for the OK may also need to be further inspected with possible removal of outliers in order for a better empirical model fit and ultimately better performance.