Boston is a city with many different kinds of neighborhoods of unique populations and structures. One product of these differences is elevated noise levels from humans, traffic, construction, and other sources. For this analysis I'd like to take a look at decible levels of ambient noise in Boston using sampled point sets and compare day values to night values, to observe spatial patterns and find areas that may be louder at night. While it is likely that the city is uniformly louder during the day, I am interested in exploring whether there are particular areas that experience elevated nighttime noise in comparison to the day.
Data for this analysis comes from researcher Erica Walker's collected dataset of decible samples around the city of Boston. The 'Un_weighted' values were used to be able to observe the raw readings of volume regardless of frequency range. Shapefiles for Boston neighborhoods were used from MassGIS and Boston OpenData. Methodology will include using ordinary Kriging to interpolate the data values across the city as a raster for all readings, just day, and just night. Then raster algebra will be used to find the difference between the day and night interpolations, and finally zonal statistics to determine the mean differences per neighborhood to see if any particular neighborhoods happen to be louder at night.
towns_sf <- st_read(dsn = "Shapefiles",layer = "TOWNSSURVEY_POLY")
## Reading layer `TOWNSSURVEY_POLY' from data source `C:\Users\David\Documents\SSU\GPH_942\Assignments\FinalProject\Shapefiles' using driver `ESRI Shapefile'
## Simple feature collection with 1243 features and 22 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 33863.73 ymin: 777606.4 xmax: 330837 ymax: 959743
## epsg (SRID): NA
## proj4string: +proj=lcc +lat_1=41.71666666666667 +lat_2=42.68333333333333 +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83 +units=m +no_defs
BostonNoise_MA83 <- st_transform(BostonNoise_sf, crs = st_crs(towns_sf))
# subset to just city of Boston
Boston_sf<-towns_sf%>%filter(TOWN=="BOSTON")
tm_shape(Boston_sf) + tm_fill(col = "grey") + tm_borders() + tm_shape(BostonNoise_MA83) + tm_dots(col = "T_OD", size = 0.05, title = "Time of Day of Noise Samples") + tm_layout()
#####
# convert BostonNoise
BostonNoise_sp <- as_Spatial(BostonNoise_sf$geometry)
BostonNoise_sp <- spTransform(BostonNoise_sp, CRS=CRS("+init=epsg:26986") )
BostonNoise_sp$A_weight <- BostonNoise_MA83$A_weight
BostonNoise_sp$Un_weight <- BostonNoise_MA83$Un_weight
BostonNoise_sp$A_UN <- BostonNoise_MA83$A_UN
BostonNoise_sp$T_OD <- BostonNoise_MA83$T_OD
BostonNoise_sp$D_OW <- BostonNoise_MA83$D_OW
ggplot(BostonNoise, aes(x= Un_weight)) +
geom_histogram(bins = 35, na.rm = TRUE) +
labs(title = "Plot 2:", subtitle = "Unweighted Decible Samples",
x = "Decibles" , y = "Count") +
geom_vline(xintercept = mean(BostonNoise$Un_weight, na.rm = TRUE),
color= "red", linetype = "dashed", lwd = 1) +
geom_vline(xintercept = median(BostonNoise$Un_weight, na.rm = TRUE),
color= "blue", linetype = "dashed", lwd = 1) +
annotate(geom = "text", x = 73, y = 15, label = "mean", color = "red",
angle = 90) +
annotate(geom = "text", x = 70, y = 15, label = "median", color = "blue",
angle = 90)
as.data.frame(BostonNoise_sp) %>%
ggplot(aes(x=T_OD, y =Un_weight)) +
geom_boxplot(fill = "light blue", notch = TRUE) +
ggtitle("Plot 3: Decibel Level by Time of Day") +
xlab("Time of Day") + ylab("Un-Weighted Decibel Values")
The boxplot (Plot 3) appears to display significantly higher readings for daytime decible samples than at night, though much of the interquartile ranges overlap leaving room to explore anomalies in the data. Plot 2 shows the noise data is relatively normally distributed, with perhaps slight bimodal peaks at 68 and 78 decibles.
BostonNoise_D_sp <- BostonNoise_sp[BostonNoise_sp$T_OD == "D",]
BostonNoise_N_sp <- BostonNoise_sp[BostonNoise_sp$T_OD == "N",]
nrow(BostonNoise_D_sp)
## [1] 227
nrow(BostonNoise_N_sp)
## [1] 173
Boston_sf<-towns_sf%>%filter(TOWN=="BOSTON")
tm_shape(Boston_sf) + tm_fill(col = "grey") + tm_borders() + tm_shape(BostonNoise_D_sp) + tm_dots(col = "Un_weight", size = 0.05, title = "Unweighted Decibels" , breaks = c(40,45,50,55,60,65,70,75,80,85,90),palette="-RdBu") + tm_layout()
tm_shape(Boston_sf) + tm_fill(col = "grey") + tm_borders() + tm_shape(BostonNoise_N_sp) + tm_dots(col = "Un_weight", size = 0.05, title = "Unweighted Decibels", breaks = c(40,45,50,55,60,65,70,75,80,85,90), palette="-RdBu") + tm_layout()
Plots 4 and 5 show the day and night samples plotted separately, and colorized based on unweighted decible readings. It is a bit challenging to discern particular trends for neighborhoods since much of the samples are bunched, so interpolation will help to smooth the data out for interpretation.
First, all of the Boston noise data will be Kriged.
nrow(BostonNoise_sp) * (nrow(BostonNoise_sp)-1)/2
## [1] 79800
vc <- variogram(Un_weight ~ 1,BostonNoise_sp, cloud= TRUE)
ggplot(vc, aes(x=dist,y=gamma)) + geom_point(alpha=0.6) +
xlab("separation,m") + ylab("semivariance, Un_weight^2") +
ggtitle("Plot 6: Variogram Cloud of All Noise Data")
v <- variogram(Un_weight ~ 1,BostonNoise_sp)
vmf <- fit.variogram(v, vgm(c("Log", "Sph", "Gau", "Mat", "Cir", "Wav")))
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : singular model in variogram fit
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : No convergence after 200 iterations: try different initial
## values?
vmf
## model psill range
## 1 Nug 34.20100 0.000
## 2 Gau 12.80389 1267.744
print(plot(v, pl=T, model = vmf, main="Plot 7: Noise Data Semivariogram"))
Boston_sp <- as(Boston_sf, "Spatial")
Boston_sp <- spTransform(Boston_sp, CRS=CRS("+init=epsg:26986"))
proj4string(Boston_sp)
## [1] "+init=epsg:26986 +proj=lcc +lat_1=42.68333333333333 +lat_2=41.71666666666667 +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(BostonNoise_sp)
## [1] "+init=epsg:26986 +proj=lcc +lat_1=42.68333333333333 +lat_2=41.71666666666667 +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
m.grid <- spsample(Boston_sp, type = "regular", cellsize = 100, offset = c(0.5, 0.5))
gridded(m.grid) <- TRUE
## Warning in points2grid(points, tolerance, round): grid has empty column/
## rows in dimension 1
proj4string((m.grid))
## [1] "+init=epsg:26986 +proj=lcc +lat_1=42.68333333333333 +lat_2=41.71666666666667 +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
BostonNoise_sp <- BostonNoise_sp[-zerodist(BostonNoise_sp)[,1],]
k40 <- krige(Un_weight ~ 1, locations = BostonNoise_sp, newdata=m.grid, model=vmf)
## [using ordinary kriging]
The variogram cloud in Plot 6 shows there are some outliers but there is not a particular trend to dissimilarity. Plot 7 shows a fitted model, though it seems some of the variance is more complex past the range than the model is able to fit.
tm_shape(k40) + tm_raster("var1.pred", palette = "YlOrRd",
title="Decibels") + tm_layout()
The Kriged prediction of noise for Boston shows high values near Downtown, Roxbury, and near 90 and the Charles River around Fenway. There are also quieter pockets throughout the city.
tm_shape(k40) + tm_raster("var1.var", palette = "plasma",
title="Decibels^2") + tm_layout() + tm_shape(BostonNoise_sp) + tm_dots(size=0.001,col="white")
Plot 9 shows variance from Kriging is lowest in East Boston and Fenway areas, with a general increase in variance as you move away from those areas throughout Boston.
nrow(BostonNoise_D_sp) * (nrow(BostonNoise_D_sp)-1)/2
## [1] 25651
vcd <- variogram(Un_weight ~ 1,BostonNoise_D_sp, cloud= TRUE)
ggplot(vcd, aes(x=dist,y=gamma)) + geom_point(alpha=0.6) +
xlab("separation,m") + ylab("semivariance, Un_weight^2") +
ggtitle("Plot 10: Variogram Cloud of Day Readings")
vd <- variogram(Un_weight ~ 1,BostonNoise_D_sp)
vmf_d <- fit.variogram(vd, vgm(c("Log", "Sph", "Gau", "Mat", "Cir", "Wav")))
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : singular model in variogram fit
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : No convergence after 200 iterations: try different initial
## values?
print(plot(vd, pl=T, model = vmf_d, main="Plot 11: Day Noise Semivariogram"))
# create grid of kriging
BostonNoise_D_sp <- spTransform(BostonNoise_D_sp, CRS=CRS("+init=epsg:26986"))
BostonNoise_D_sp <- BostonNoise_D_sp[-zerodist(BostonNoise_D_sp)[,1],]
# interpolate Kriged values using created model and grid
k40_d <- krige(Un_weight ~ 1, locations = BostonNoise_D_sp, newdata=m.grid, model=vmf_d)
## [using ordinary kriging]
Plot 11 seems to show a much more linear model for day noise data as compared to all samples.
tm_shape(k40_d) + tm_raster("var1.pred", palette = "YlOrRd",
title="Decibels") + tm_layout()
Predicted values for day noise data seem to be very smooth and showing an emanation from downtown throughout the rest of the city. This would be interesting to continue to analyze to see if there are actual sources of noise in this area, or if this pattern emerges simply because of it being a hub of human activity.
tm_shape(k40_d) + tm_raster("var1.var", palette = "plasma",
title="Decibels^2") + tm_layout() + tm_shape(BostonNoise_D_sp) + tm_dots(size=0.001,col="green")
Similar to Plot 12, the variance in Plot 13 shows a smooth pattern of lower variance near the heart of the city, with increased values as distance from that area increases.
kcv.ok_d <- krige.cv(Un_weight ~ 1, locations=BostonNoise_D_sp, model = vmf_d)
summary(kcv.ok_d)
## Object of class SpatialPointsDataFrame
## Coordinates:
## min max
## coords.x1 227361.2 241185.5
## coords.x2 889243.8 904817.3
## Is projected: NA
## proj4string : [NA]
## Number of points: 226
## Data attributes:
## var1.pred var1.var observed residual
## Min. :68.91 Min. :38.65 Min. :58.27 Min. :-12.343912
## 1st Qu.:71.17 1st Qu.:39.30 1st Qu.:67.29 1st Qu.: -5.086627
## Median :73.26 Median :39.68 Median :73.02 Median : -0.439889
## Mean :72.95 Mean :40.08 Mean :72.96 Mean : 0.001744
## 3rd Qu.:74.64 3rd Qu.:40.47 3rd Qu.:78.10 3rd Qu.: 5.089011
## Max. :76.79 Max. :45.64 Max. :89.16 Max. : 16.053960
## zscore fold
## Min. :-1.9406345 Min. : 1.00
## 1st Qu.:-0.8128000 1st Qu.: 57.25
## Median :-0.0698189 Median :113.50
## Mean : 0.0001315 Mean :113.50
## 3rd Qu.: 0.7853191 3rd Qu.:169.75
## Max. : 2.5618516 Max. :226.00
sqrt(sum(kcv.ok_d$residual^2)/length(kcv.ok_d$residual))
## [1] 6.518416
vcn <- variogram(Un_weight ~ 1,BostonNoise_N_sp, cloud= TRUE)
ggplot(vcn, aes(x=dist,y=gamma)) + geom_point(alpha=0.6) +
xlab("separation,m") + ylab("semivariance, Un_weight^2") +
ggtitle("Plot 14: Variogram Cloud of Night Readings")
vn <- variogram(Un_weight ~ 1,BostonNoise_N_sp)
vmf_n <- fit.variogram(vn, vgm(c("Log", "Sph", "Gau", "Mat", "Cir", "Wav")))
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : singular model in variogram fit
## Warning in fit.variogram(object, x, fit.sills = fit.sills, fit.ranges =
## fit.ranges, : No convergence after 200 iterations: try different initial
## values?
vmf_n
## model psill range
## 1 Nug 25.58488 0.000
## 2 Gau 19.88985 1191.243
print(plot(vn, pl=T, model = vmf_n, main = "Plot 15: Night Noise Semivariogram"))
BostonNoise_N_sp <- spTransform(BostonNoise_N_sp, CRS=CRS("+init=epsg:26986"))
# BostonNoise_N_sp <- BostonNoise_N_sp[-zerodist(BostonNoise_N_sp)[,1],]
nrow(BostonNoise_N_sp)
## [1] 173
k40_n <- krige(Un_weight ~ 1, locations = BostonNoise_N_sp, newdata=m.grid, model=vmf_n)
## [using ordinary kriging]
Similar to day data, night noise data shows consistent dissimilarity among point pairs. Plot 15 shows fit model much more similar to the overall Boston noise data as compared to the more linear day model shown in Plot 11.
tm_shape(k40_n) + tm_raster("var1.pred", palette = "YlOrRd",
title="Decibels") + tm_layout()
Kriged prediction of night data shows much more variation than at day. Particularly loud pockets are revealed again near Downtown and along the lines of Roslindale and Jamaica Plain. East Boston and Hyde Park appear to be quieter.
# let's also map out the variance values of the interpolation
tm_shape(k40_n) + tm_raster("var1.var", palette = "plasma",
title="Decibels^2") + tm_layout() + tm_shape(BostonNoise_D_sp) + tm_dots(size=0.001,col="white")
The plot of variance shows an interesting pocket of high variability in the southern portion of the city, possibly due to fewer readings in the area and a high degree of difference in the surrounding points.
kcv.ok_n <- krige.cv(Un_weight ~ 1, locations=BostonNoise_N_sp, model = vmf_n)
summary(kcv.ok_n)
## Object of class SpatialPointsDataFrame
## Coordinates:
## min max
## coords.x1 227657.0 241146.8
## coords.x2 887583.9 904701.9
## Is projected: NA
## proj4string : [NA]
## Number of points: 173
## Data attributes:
## var1.pred var1.var observed residual
## Min. :60.77 Min. :28.31 Min. :44.34 Min. :-26.28592
## 1st Qu.:68.29 1st Qu.:30.51 1st Qu.:65.16 1st Qu.: -4.19237
## Median :70.05 Median :32.47 Median :70.10 Median : 0.39911
## Mean :69.88 Mean :33.36 Mean :69.90 Mean : 0.01528
## 3rd Qu.:71.67 3rd Qu.:35.33 3rd Qu.:74.33 3rd Qu.: 4.30378
## Max. :76.68 Max. :44.71 Max. :86.42 Max. : 18.66064
## zscore fold
## Min. :-4.416015 Min. : 1
## 1st Qu.:-0.746792 1st Qu.: 44
## Median : 0.068274 Median : 87
## Mean : 0.001245 Mean : 87
## 3rd Qu.: 0.742510 3rd Qu.:130
## Max. : 3.192570 Max. :173
sqrt(sum(kcv.ok_n$residual^2)/length(kcv.ok_n$residual))
## [1] 6.577525
This section will look at noise on a neighborhood basis to see which ones may be louder at night vs day, possibly indicating differences in human behaviour in those neighborhoods.
k40_DN_sub <- k40_d
k40_DN_sub$var1.pred.diff <- k40_d$var1.pred - k40_n$var1.pred
k40_DN_sub$var1.var.diff <- k40_d$var1.var - k40_n$var1.var
# bring in Boston neighborhoods shapefile
# download.file(url = "http://bostonopendata-boston.opendata.arcgis.com/datasets/3525b0ee6e6b427f9aab5d0a1d0a1a28_0.zip",
# destfile = "BostonNeighborhoods.zip")
# unzip the shapefile
unzip(zipfile = "BostonNeighborhoods.zip")
## Warning in unzip(zipfile = "BostonNeighborhoods.zip"): error 1 in
## extracting from zip file
# read in the shapefile using sf::st_read
BostonNeighborhoods_sf <- st_read(dsn = ".",layer = "Boston_Neighborhoods")
## Reading layer `Boston_Neighborhoods' from data source `C:\Users\David\Documents\SSU\GPH_942\Assignments\FinalProject' using driver `ESRI Shapefile'
## Simple feature collection with 26 features and 7 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -71.19125 ymin: 42.22792 xmax: -70.92278 ymax: 42.39699
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
BostonNeighborhoods_sp <- as_Spatial(BostonNeighborhoods_sf$geometry)
BostonNeighborhoods_sp <- spTransform(BostonNeighborhoods_sp, CRS=CRS("+init=epsg:26986"))
tm_shape(k40_DN_sub) + tm_raster("var1.pred.diff", palette = "-RdBu",
title="Decibels") +
tm_shape(BostonNeighborhoods_sf) + tm_fill(id = "Name", alpha = 0) +
tm_borders(alpha = .15, col = "black")
## Variable "var1.pred.diff" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Plot 18 shows the predictions of night noise subtracted from day noise, meaning positive values are areas predicted to be louder during the day, and negative to be louder at night.
tm_shape(k40_DN_sub) + tm_raster("var1.var.diff", palette = "plasma",
title="Decibels^2") +
tm_shape(BostonNeighborhoods_sf) + tm_fill(id = "Name", alpha = 0) +
tm_borders(alpha = .15, col = "white")
Plot 19 reveals the difference in variabilty between day and night. Positive values display higher variability in the Kriging during the day, negative values mean higher variability in the night data, with values near zero showing similar variability between the two data sets.
BostonNeighborhoods_sp$Name <- BostonNeighborhoods_sf$Name
BostonNeighborhoods_sp$SqMiles <- BostonNeighborhoods_sf$SqMiles
r <- raster(k40_DN_sub,layer=3, values=TRUE)
zonal<- extract(r, BostonNeighborhoods_sp)
output = unlist(lapply(zonal, function(x) if (!is.null(x)) mean(x, na.rm=TRUE) else NA ))
matrix(output,ncol = 1)
output <- as.data.frame(output)
colnames(output)<- c("ZonalMean")
row.names(output)<-1:26
#row.names(BostonNeighborhoods_sp)<-1:26
BostonNeighborhoods_sf <-cbind(BostonNeighborhoods_sf, output$ZonalMean)
summary(BostonNeighborhoods_sf$output.ZonalMean)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4012 1.3212 2.3520 2.6547 3.3916 6.7451
tm_shape(BostonNeighborhoods_sf) + tm_fill(id = "Name", col = "output.ZonalMean", title="Neighborhood Mean ", popup.vars = c("output.ZonalMean")) +
tm_borders(alpha = .15, col = "black")
Zonal statistics was applied to each neighborhood for the subtracted dataset to reveal if any neighborhoods overall were louder at night. No neighborhoods were shown to have this pattern, though a handful were very close. Brighton and Hyde Park measured at roughly 1/2 decible louder during the day which may even bet within the margin of error. These neighborhoods certainly had pockets which were in fact louder at night, but it wasn’t high enough to imply a difference once averaged across the whole neighborhood area.
The results seem to indicate that there is a clear hub of high volume near Downtown Boston during the day, with a general decrease in noise as you move away from that area. Nighttime noise on the other hand is more nuanced, with pockets of high and low values spread throughout. While there were no neighborhoods showing to be overall louder at night when averaged on their interpolated difference values, there are still clearly pockets in the cities that seem to show increased noise at night.
In this analysis, I set out to use ordinary Kriging with noise readings to interpolate values across the city. Additionally, I hoped to identify day and night differences in noise levels throughout the city. The results seem to provide
a helpful first step in investigating this topic, with a few followup questions that could be pursued such as: Why is there such a clear pattern of noise around Downtown Boston? Are there different activities or types of behaviours in
different neighborhoods that seem to inform the variation in noise levels? How might weekday and weekend noise comparision further explain these temporal differences?
Overall this was an interesting topic and set of data to explore. Challenges included fixing Kriging errors because of duplicate points in the dataset, converting the zonal statistics vector to a useable column for joining to the neighborhoods
shapefile, and properly indexing the two datasets. I appreciated the simplicity of spatial analysis in R for some tasks, such as subtracting the day and night raster layers which was just one line of code and quickly executed.