Introduction:

Boston is a city with many different kinds of neighborhoods of unique populations and structures. One product of these differences is elevated noise levels from humans, traffic, construction, and other sources. For this analysis I'd like to take a look at decible levels of ambient noise in Boston using sampled point sets and compare day values to night values, to observe spatial patterns and find areas that may be louder at night. While it is likely that the city is uniformly louder during the day, I am interested in exploring whether there are particular areas that experience elevated nighttime noise in comparison to the day.

Data and Methods:

Data for this analysis comes from researcher Erica Walker's collected dataset of decible samples around the city of Boston. The 'Un_weighted' values were used to be able to observe the raw readings of volume regardless of frequency range. Shapefiles for Boston neighborhoods were used from MassGIS and Boston OpenData. Methodology will include using ordinary Kriging to interpolate the data values across the city as a raster for all readings, just day, and just night. Then raster algebra will be used to find the difference between the day and night interpolations, and finally zonal statistics to determine the mean differences per neighborhood to see if any particular neighborhoods happen to be louder at night.

towns_sf <- st_read(dsn = "Shapefiles",layer = "TOWNSSURVEY_POLY")
## Reading layer `TOWNSSURVEY_POLY' from data source `C:\Users\David\Documents\SSU\GPH_942\Assignments\FinalProject\Shapefiles' using driver `ESRI Shapefile'
## Simple feature collection with 1243 features and 22 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 33863.73 ymin: 777606.4 xmax: 330837 ymax: 959743
## epsg (SRID):    NA
## proj4string:    +proj=lcc +lat_1=41.71666666666667 +lat_2=42.68333333333333 +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83 +units=m +no_defs
BostonNoise_MA83 <- st_transform(BostonNoise_sf, crs = st_crs(towns_sf))

# subset to just city of Boston
Boston_sf<-towns_sf%>%filter(TOWN=="BOSTON")

Plot 1: Unweighted Noise Samples (Day and Night)

tm_shape(Boston_sf) + tm_fill(col = "grey") + tm_borders() + tm_shape(BostonNoise_MA83) + tm_dots(col = "T_OD", size = 0.05, title = "Time of Day of Noise Samples") + tm_layout()
#####
# convert BostonNoise 
BostonNoise_sp <- as_Spatial(BostonNoise_sf$geometry)
BostonNoise_sp <- spTransform(BostonNoise_sp, CRS=CRS("+init=epsg:26986") )
BostonNoise_sp$A_weight <- BostonNoise_MA83$A_weight
BostonNoise_sp$Un_weight <- BostonNoise_MA83$Un_weight
BostonNoise_sp$A_UN <- BostonNoise_MA83$A_UN
BostonNoise_sp$T_OD <- BostonNoise_MA83$T_OD
BostonNoise_sp$D_OW <- BostonNoise_MA83$D_OW

ggplot(BostonNoise, aes(x= Un_weight)) + 
  geom_histogram(bins = 35, na.rm = TRUE) +
  labs(title = "Plot 2:", subtitle = "Unweighted Decible Samples",
       x = "Decibles" , y = "Count") +
  geom_vline(xintercept = mean(BostonNoise$Un_weight, na.rm = TRUE),
             color= "red", linetype = "dashed", lwd = 1) +
  geom_vline(xintercept = median(BostonNoise$Un_weight, na.rm = TRUE),
              color= "blue", linetype = "dashed", lwd = 1) +  
  annotate(geom = "text", x = 73, y = 15, label = "mean", color = "red",
           angle = 90) +
  annotate(geom = "text", x = 70, y = 15, label = "median", color = "blue",
           angle = 90)

as.data.frame(BostonNoise_sp) %>% 
  ggplot(aes(x=T_OD, y =Un_weight)) +
  geom_boxplot(fill = "light blue", notch = TRUE) +
  ggtitle("Plot 3: Decibel Level by Time of Day") +
  xlab("Time of Day") + ylab("Un-Weighted Decibel Values")

The boxplot (Plot 3) appears to display significantly higher readings for daytime decible samples than at night, though much of the interquartile ranges overlap leaving room to explore anomalies in the data. Plot 2 shows the noise data is relatively normally distributed, with perhaps slight bimodal peaks at 68 and 78 decibles.

Plot 4: Noise Samples (Day)

BostonNoise_D_sp <- BostonNoise_sp[BostonNoise_sp$T_OD == "D",]
BostonNoise_N_sp <- BostonNoise_sp[BostonNoise_sp$T_OD == "N",]
nrow(BostonNoise_D_sp)
## [1] 227
nrow(BostonNoise_N_sp)
## [1] 173
Boston_sf<-towns_sf%>%filter(TOWN=="BOSTON")

tm_shape(Boston_sf) + tm_fill(col = "grey") + tm_borders() + tm_shape(BostonNoise_D_sp) + tm_dots(col = "Un_weight", size = 0.05, title = "Unweighted Decibels" , breaks = c(40,45,50,55,60,65,70,75,80,85,90),palette="-RdBu") + tm_layout()

Plot 5: Noise Samples (Night)

tm_shape(Boston_sf) + tm_fill(col = "grey") + tm_borders() + tm_shape(BostonNoise_N_sp) + tm_dots(col = "Un_weight", size = 0.05, title = "Unweighted Decibels", breaks = c(40,45,50,55,60,65,70,75,80,85,90), palette="-RdBu") + tm_layout()

Plots 4 and 5 show the day and night samples plotted separately, and colorized based on unweighted decible readings. It is a bit challenging to discern particular trends for neighborhoods since much of the samples are bunched, so interpolation will help to smooth the data out for interpretation.

Kriging All Boston Noise Data

First, all of the Boston noise data will be Kriged.

nrow(BostonNoise_sp) * (nrow(BostonNoise_sp)-1)/2
## [1] 79800
vc <- variogram(Un_weight ~ 1,BostonNoise_sp, cloud= TRUE)
ggplot(vc, aes(x=dist,y=gamma)) + geom_point(alpha=0.6) + 
  xlab("separation,m") + ylab("semivariance, Un_weight^2") +
  ggtitle("Plot 6: Variogram Cloud of All Noise Data")

v <- variogram(Un_weight ~ 1,BostonNoise_sp)
vmf <- fit.variogram(v, vgm(c("Log", "Sph", "Gau", "Mat", "Cir", "Wav")))
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : singular model in variogram fit
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : No convergence after 200 iterations: try different initial
## values?
vmf
##   model    psill    range
## 1   Nug 34.20100    0.000
## 2   Gau 12.80389 1267.744
print(plot(v, pl=T, model = vmf, main="Plot 7: Noise Data Semivariogram"))

Boston_sp <- as(Boston_sf, "Spatial")
Boston_sp <- spTransform(Boston_sp, CRS=CRS("+init=epsg:26986"))
proj4string(Boston_sp)
## [1] "+init=epsg:26986 +proj=lcc +lat_1=42.68333333333333 +lat_2=41.71666666666667 +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(BostonNoise_sp)
## [1] "+init=epsg:26986 +proj=lcc +lat_1=42.68333333333333 +lat_2=41.71666666666667 +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
m.grid <- spsample(Boston_sp, type = "regular", cellsize = 100, offset = c(0.5, 0.5))
gridded(m.grid) <- TRUE
## Warning in points2grid(points, tolerance, round): grid has empty column/
## rows in dimension 1
proj4string((m.grid))
## [1] "+init=epsg:26986 +proj=lcc +lat_1=42.68333333333333 +lat_2=41.71666666666667 +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
BostonNoise_sp <- BostonNoise_sp[-zerodist(BostonNoise_sp)[,1],]

k40 <- krige(Un_weight ~ 1, locations = BostonNoise_sp, newdata=m.grid, model=vmf)
## [using ordinary kriging]

The variogram cloud in Plot 6 shows there are some outliers but there is not a particular trend to dissimilarity. Plot 7 shows a fitted model, though it seems some of the variance is more complex past the range than the model is able to fit.

Plot 8: Kriged Prediction of Decibel Samples (All)

tm_shape(k40) + tm_raster("var1.pred", palette = "YlOrRd", 
                          title="Decibels") + tm_layout()  

The Kriged prediction of noise for Boston shows high values near Downtown, Roxbury, and near 90 and the Charles River around Fenway. There are also quieter pockets throughout the city.

Plot 9: Kriged Variance of Decibles (All)

tm_shape(k40) + tm_raster("var1.var", palette = "plasma", 
                          title="Decibels^2") +  tm_layout() +  tm_shape(BostonNoise_sp) + tm_dots(size=0.001,col="white") 

Plot 9 shows variance from Kriging is lowest in East Boston and Fenway areas, with a general increase in variance as you move away from those areas throughout Boston.

Kriging Day Noise Data

nrow(BostonNoise_D_sp) * (nrow(BostonNoise_D_sp)-1)/2
## [1] 25651
vcd <- variogram(Un_weight ~ 1,BostonNoise_D_sp, cloud= TRUE)
ggplot(vcd, aes(x=dist,y=gamma)) + geom_point(alpha=0.6) + 
  xlab("separation,m") + ylab("semivariance, Un_weight^2") +
  ggtitle("Plot 10: Variogram Cloud of Day Readings")

vd <- variogram(Un_weight ~ 1,BostonNoise_D_sp)
vmf_d <- fit.variogram(vd, vgm(c("Log", "Sph", "Gau", "Mat", "Cir", "Wav")))
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : singular model in variogram fit
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : No convergence after 200 iterations: try different initial
## values?
print(plot(vd, pl=T, model = vmf_d, main="Plot 11: Day Noise Semivariogram"))

# create grid of kriging
BostonNoise_D_sp <- spTransform(BostonNoise_D_sp, CRS=CRS("+init=epsg:26986"))
BostonNoise_D_sp <- BostonNoise_D_sp[-zerodist(BostonNoise_D_sp)[,1],] 

# interpolate Kriged values using created model and grid
k40_d <- krige(Un_weight ~ 1, locations = BostonNoise_D_sp, newdata=m.grid, model=vmf_d)
## [using ordinary kriging]

Plot 11 seems to show a much more linear model for day noise data as compared to all samples.

Plot 12: Kriged Prediction of Decible Samples (Day)

tm_shape(k40_d) + tm_raster("var1.pred", palette = "YlOrRd", 
                          title="Decibels") +  tm_layout() 

Predicted values for day noise data seem to be very smooth and showing an emanation from downtown throughout the rest of the city. This would be interesting to continue to analyze to see if there are actual sources of noise in this area, or if this pattern emerges simply because of it being a hub of human activity.

Plot 13: Kriged Variance of Decibles (Day)

tm_shape(k40_d) + tm_raster("var1.var", palette = "plasma", 
                          title="Decibels^2") +  tm_layout() +  tm_shape(BostonNoise_D_sp) + tm_dots(size=0.001,col="green")

Similar to Plot 12, the variance in Plot 13 shows a smooth pattern of lower variance near the heart of the city, with increased values as distance from that area increases.

LOOCV Model Validation (Day)

kcv.ok_d <- krige.cv(Un_weight ~ 1, locations=BostonNoise_D_sp, model = vmf_d)
summary(kcv.ok_d)
## Object of class SpatialPointsDataFrame
## Coordinates:
##                min      max
## coords.x1 227361.2 241185.5
## coords.x2 889243.8 904817.3
## Is projected: NA 
## proj4string : [NA]
## Number of points: 226
## Data attributes:
##    var1.pred        var1.var        observed        residual         
##  Min.   :68.91   Min.   :38.65   Min.   :58.27   Min.   :-12.343912  
##  1st Qu.:71.17   1st Qu.:39.30   1st Qu.:67.29   1st Qu.: -5.086627  
##  Median :73.26   Median :39.68   Median :73.02   Median : -0.439889  
##  Mean   :72.95   Mean   :40.08   Mean   :72.96   Mean   :  0.001744  
##  3rd Qu.:74.64   3rd Qu.:40.47   3rd Qu.:78.10   3rd Qu.:  5.089011  
##  Max.   :76.79   Max.   :45.64   Max.   :89.16   Max.   : 16.053960  
##      zscore                fold       
##  Min.   :-1.9406345   Min.   :  1.00  
##  1st Qu.:-0.8128000   1st Qu.: 57.25  
##  Median :-0.0698189   Median :113.50  
##  Mean   : 0.0001315   Mean   :113.50  
##  3rd Qu.: 0.7853191   3rd Qu.:169.75  
##  Max.   : 2.5618516   Max.   :226.00
sqrt(sum(kcv.ok_d$residual^2)/length(kcv.ok_d$residual))
## [1] 6.518416

Kriging Night Noise Data

vcn <- variogram(Un_weight ~ 1,BostonNoise_N_sp, cloud= TRUE)
ggplot(vcn, aes(x=dist,y=gamma)) + geom_point(alpha=0.6) + 
  xlab("separation,m") + ylab("semivariance, Un_weight^2") +
  ggtitle("Plot 14: Variogram Cloud of Night Readings")

vn <- variogram(Un_weight ~ 1,BostonNoise_N_sp)
vmf_n <- fit.variogram(vn, vgm(c("Log", "Sph", "Gau", "Mat", "Cir", "Wav")))
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : singular model in variogram fit
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : No convergence after 200 iterations: try different initial
## values?
vmf_n
##   model    psill    range
## 1   Nug 25.58488    0.000
## 2   Gau 19.88985 1191.243
print(plot(vn, pl=T, model = vmf_n, main = "Plot 15: Night Noise Semivariogram"))

BostonNoise_N_sp <- spTransform(BostonNoise_N_sp, CRS=CRS("+init=epsg:26986"))
# BostonNoise_N_sp <- BostonNoise_N_sp[-zerodist(BostonNoise_N_sp)[,1],]
nrow(BostonNoise_N_sp)
## [1] 173
k40_n <- krige(Un_weight ~ 1, locations = BostonNoise_N_sp, newdata=m.grid, model=vmf_n)
## [using ordinary kriging]

Similar to day data, night noise data shows consistent dissimilarity among point pairs. Plot 15 shows fit model much more similar to the overall Boston noise data as compared to the more linear day model shown in Plot 11.

Plot 16: Kriged Prediction of Decible Samples (Night)

tm_shape(k40_n) + tm_raster("var1.pred", palette = "YlOrRd", 
                            title="Decibels") +  tm_layout() 

Kriged prediction of night data shows much more variation than at day. Particularly loud pockets are revealed again near Downtown and along the lines of Roslindale and Jamaica Plain. East Boston and Hyde Park appear to be quieter.

Plot 17: Kriged Variance of Decibles (Night)

# let's also map out the variance values of the interpolation
tm_shape(k40_n) + tm_raster("var1.var", palette = "plasma", 
                            title="Decibels^2") +   tm_layout()  +  tm_shape(BostonNoise_D_sp) + tm_dots(size=0.001,col="white")

The plot of variance shows an interesting pocket of high variability in the southern portion of the city, possibly due to fewer readings in the area and a high degree of difference in the surrounding points.

LOOCV Model Validation (Night)

kcv.ok_n <- krige.cv(Un_weight ~ 1, locations=BostonNoise_N_sp, model = vmf_n)
summary(kcv.ok_n)
## Object of class SpatialPointsDataFrame
## Coordinates:
##                min      max
## coords.x1 227657.0 241146.8
## coords.x2 887583.9 904701.9
## Is projected: NA 
## proj4string : [NA]
## Number of points: 173
## Data attributes:
##    var1.pred        var1.var        observed        residual        
##  Min.   :60.77   Min.   :28.31   Min.   :44.34   Min.   :-26.28592  
##  1st Qu.:68.29   1st Qu.:30.51   1st Qu.:65.16   1st Qu.: -4.19237  
##  Median :70.05   Median :32.47   Median :70.10   Median :  0.39911  
##  Mean   :69.88   Mean   :33.36   Mean   :69.90   Mean   :  0.01528  
##  3rd Qu.:71.67   3rd Qu.:35.33   3rd Qu.:74.33   3rd Qu.:  4.30378  
##  Max.   :76.68   Max.   :44.71   Max.   :86.42   Max.   : 18.66064  
##      zscore               fold    
##  Min.   :-4.416015   Min.   :  1  
##  1st Qu.:-0.746792   1st Qu.: 44  
##  Median : 0.068274   Median : 87  
##  Mean   : 0.001245   Mean   : 87  
##  3rd Qu.: 0.742510   3rd Qu.:130  
##  Max.   : 3.192570   Max.   :173
sqrt(sum(kcv.ok_n$residual^2)/length(kcv.ok_n$residual))
## [1] 6.577525

Observing Differences in Day and Night Decible Values

This section will look at noise on a neighborhood basis to see which ones may be louder at night vs day, possibly indicating differences in human behaviour in those neighborhoods.

k40_DN_sub <- k40_d
k40_DN_sub$var1.pred.diff <- k40_d$var1.pred - k40_n$var1.pred
k40_DN_sub$var1.var.diff <- k40_d$var1.var - k40_n$var1.var

# bring in Boston neighborhoods shapefile
# download.file(url = "http://bostonopendata-boston.opendata.arcgis.com/datasets/3525b0ee6e6b427f9aab5d0a1d0a1a28_0.zip",
#               destfile = "BostonNeighborhoods.zip")
# unzip the shapefile
unzip(zipfile = "BostonNeighborhoods.zip")
## Warning in unzip(zipfile = "BostonNeighborhoods.zip"): error 1 in
## extracting from zip file
# read in the shapefile using sf::st_read
BostonNeighborhoods_sf <- st_read(dsn = ".",layer = "Boston_Neighborhoods")
## Reading layer `Boston_Neighborhoods' from data source `C:\Users\David\Documents\SSU\GPH_942\Assignments\FinalProject' using driver `ESRI Shapefile'
## Simple feature collection with 26 features and 7 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -71.19125 ymin: 42.22792 xmax: -70.92278 ymax: 42.39699
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
BostonNeighborhoods_sp <- as_Spatial(BostonNeighborhoods_sf$geometry)
BostonNeighborhoods_sp <- spTransform(BostonNeighborhoods_sp, CRS=CRS("+init=epsg:26986"))

Plot 18: Kriged Prediction of Decibels (Day - Night Data)

tm_shape(k40_DN_sub) + tm_raster("var1.pred.diff", palette = "-RdBu", 
                            title="Decibels") +
  tm_shape(BostonNeighborhoods_sf) + tm_fill(id = "Name", alpha = 0) +
  tm_borders(alpha = .15, col = "black")
## Variable "var1.pred.diff" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Plot 18 shows the predictions of night noise subtracted from day noise, meaning positive values are areas predicted to be louder during the day, and negative to be louder at night.

Plot 19: Kriged Variance of Decibles (Day - Night Data)

tm_shape(k40_DN_sub) + tm_raster("var1.var.diff", palette = "plasma", 
                            title="Decibels^2") +
  tm_shape(BostonNeighborhoods_sf) + tm_fill(id = "Name", alpha = 0) +
  tm_borders(alpha = .15, col = "white")

Plot 19 reveals the difference in variabilty between day and night. Positive values display higher variability in the Kriging during the day, negative values mean higher variability in the night data, with values near zero showing similar variability between the two data sets.

Plot 20: Zonal Statistics by Boston Neighborhoods

BostonNeighborhoods_sp$Name <- BostonNeighborhoods_sf$Name
BostonNeighborhoods_sp$SqMiles <- BostonNeighborhoods_sf$SqMiles
r <-  raster(k40_DN_sub,layer=3, values=TRUE) 

zonal<- extract(r, BostonNeighborhoods_sp)

output = unlist(lapply(zonal, function(x) if (!is.null(x)) mean(x, na.rm=TRUE) else NA ))
matrix(output,ncol = 1)
output <- as.data.frame(output)
colnames(output)<- c("ZonalMean")
row.names(output)<-1:26
#row.names(BostonNeighborhoods_sp)<-1:26
BostonNeighborhoods_sf <-cbind(BostonNeighborhoods_sf, output$ZonalMean)
summary(BostonNeighborhoods_sf$output.ZonalMean)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4012  1.3212  2.3520  2.6547  3.3916  6.7451
tm_shape(BostonNeighborhoods_sf) + tm_fill(id = "Name", col = "output.ZonalMean", title="Neighborhood Mean ", popup.vars = c("output.ZonalMean")) +
  tm_borders(alpha = .15, col = "black")

Zonal statistics was applied to each neighborhood for the subtracted dataset to reveal if any neighborhoods overall were louder at night. No neighborhoods were shown to have this pattern, though a handful were very close. Brighton and Hyde Park measured at roughly 1/2 decible louder during the day which may even bet within the margin of error. These neighborhoods certainly had pockets which were in fact louder at night, but it wasn’t high enough to imply a difference once averaged across the whole neighborhood area.

Results:

The results seem to indicate that there is a clear hub of high volume near Downtown Boston during the day, with a general decrease in noise as you move away from that area. Nighttime noise on the other hand is more nuanced, with pockets of high and low values spread throughout. While there were no neighborhoods showing to be overall louder at night when averaged on their interpolated difference values, there are still clearly pockets in the cities that seem to show increased noise at night.

Summary and Discussion:

In this analysis, I set out to use ordinary Kriging with noise readings to interpolate values across the city. Additionally, I hoped to identify day and night differences in noise levels throughout the city. The results seem to provide a helpful first step in investigating this topic, with a few followup questions that could be pursued such as: Why is there such a clear pattern of noise around Downtown Boston? Are there different activities or types of behaviours in different neighborhoods that seem to inform the variation in noise levels? How might weekday and weekend noise comparision further explain these temporal differences?

Overall this was an interesting topic and set of data to explore. Challenges included fixing Kriging errors because of duplicate points in the dataset, converting the zonal statistics vector to a useable column for joining to the neighborhoods shapefile, and properly indexing the two datasets. I appreciated the simplicity of spatial analysis in R for some tasks, such as subtracting the day and night raster layers which was just one line of code and quickly executed.