This is my final deliverable for my Senior Project.
This map
shows the many cities in the United States that Wilford Woodruff has
lived, traveled to, and journaled about throughout his life. Each point
on the map depicts a different city and is colored based on how many
times it was mentioned in Wilford’s journals. The number of mentions in
Wilford’s journals and the city name can be viewed upon selecting each
individual point. The pop-up text also includes a link that can allow
you to view the different journals entries that mention that city.
This map also includes a time slider that allows you to filter
the graph based on date. Using the slider allows you to visualize where
Wilford has lived and traveled to over time.
library(tidyverse)
library(readr)
library(USAboundaries)
library(sf)
library(leaflet)
library(RColorBrewer)
library(leaflet.extras2)
library(geojsonsf)
library(lubridate)
# upload data
#wwp <- read_csv('data/wwp_data.csv')
wwp <- read_csv("C:/Users/spenc/Documents/GitHub/Consult_S23_WWP/data/derived/Spencer Journal Dates.csv") %>%
select(-c(date, day_text))
# remove data that does not include locations
location_data <- wwp[!is.na(wwp$Places),]
# rename columns
colnames(location_data) <- c("id", "document_type","parent_id", "parent_name", "uuid", "page",
"website_url", "short_url", "image_url", "original_transcript",
"text_transcript", "people", "all_places", "dates", "topics")
# filter data to only include journals
journal <- location_data %>%
filter(document_type == "Journals")
############# adding dates to the data #############
sorted_data <- journal[order(journal$id),]
# appending the id to the end of every text script
sorted_data$text_transcript <- paste0(sorted_data$text_transcript, ' %%%%', sorted_data$id, '%%%% ')
######## All this is to get exact days, which will hopefully be unnecessary for 2 reasons
# This paste turns it into one long string
n_full_text <- paste(sorted_data$`text_transcript`, collapse = "")
# This is the pattern that removes all possible ways that WW wrote his dates
pattern_trey <- "(([J|F|M|A|J|S|O|N|D][a-z]{2,8}\\s){1,2}\\d{1,2}(th|st|rd|nd)?,?(\\s\\d{4})?|[J|F|M|A|J|S|O|N|D][a-z]{3,8}\\s\\d{1,2}(th|st|rd|nd)?) ~"
# Removing of the dates
matches <- str_extract_all(n_full_text, pattern=pattern_trey) %>% unlist()
# Emma wrote this, it adds an NA in the beginning where there is not a date
matches2 <- append(matches, NA, after = 0)
# This then resplits the data by date as opposed by page
text <- strsplit(n_full_text, split = pattern_trey) %>% unlist()
# This creates our database which is huge for us
papers <- data.frame(
date = matches2,
text = text
)
# separating the ids from the text
# some of the days include multiple pages which why there are a lot of columns
papers2 <- papers %>%
separate(text, c("text", "id", "extra", "id2", "extra2", "id3", "extra3", "id4", "extra4",
"id5", "extra5", "id6", "extra6", "id7", "extra7", "id8", "extra8",
"id9", "extra9", "id10", "extra10", "id11", "extra11", "id12", "extra12",
"id13", "extra13", "id14", "extra14", "id15", "extra15",
"id16", "extra16", "id17", "extra17", "id18", "extra18",
"id19", "extra19", "id20", "extra20", "id21", "extra21", "id22", "extra22",
"id23", "extra23", "id24", "extra24", "id25", "extra25",
"id26", "extra26", "id27", "extra27", "id28", "extra28",
"id29", "extra29", "id30", "extra30", "id31", "extra31", "id32", "extra32",
"id33", "extra33", "id34", "extra34"), "%%%%")
papers2 <- papers2 %>%
subset(select = -c(extra, extra2, extra3, extra4, extra5, extra6, extra7, extra8,
extra9, extra10, extra11, extra12, extra13, extra14, extra15, extra16,
extra17, extra18, extra19, extra20, extra21, extra22, extra23, extra24,
extra25, extra26, extra27, extra28, extra29, extra30, extra31, extra32,
extra33, extra34))%>%
pivot_longer(cols=c(3:36), names_to='drop_column', values_to='id') %>%
drop_na()
# removing na rows, dropping extra text, and making the date column as a date variable
papers3 <- papers2 %>%
subset(select = -c(drop_column)) %>%
mutate(day = mdy(date)) %>%
select(id, day)
journal$id <- as.character(journal$id)
# joining the date table with the rest of the data by id
data_dates <- journal %>% inner_join(papers3, by='id')
################ wrangling places column
###########################################################################
# separating all places into separate columns
sep_places <- data_dates %>%
separate(all_places, c("place1", "place2", "place3", "place4", "place5", "place6",
"place7", "place8", "place9", "place10", "place11", "place12", "place13", "place14", "place15", "place16",
"place17", "place18", "place19", "place20", "place21", "place22", "place23", "place24", "place25", "place26",
"place27"), "[|]")
# using pivot_longer to make each location it's own row
sep_places <- sep_places %>%
pivot_longer(cols=c(13:39),
names_to='place_n',
values_to='location') %>%
drop_na(location)
# dropping all columns that do not include a county
format <- sep_places %>%
mutate(format = grepl("[A-Za-z ]+(,)+[A-Za-z ]+(,)+[A-Z a-z]+$", location)) %>%
##Anything with 3 words and 2 commas (including those with more)
subset(format != 'FALSE')
##(not because they wouldn't get found, but because now it's standardized)
##Also drops references to just states ('Missouri','Utah',etc)
## grepl is basically str_detect(), but probably faster
# separating location column into city, county, and state columns
data <- format %>%
separate(location, c('city', 'county_name', 'state_name'), sep=',') %>%
mutate(county_yn = grepl('County', county_name)) %>%
subset(county_yn) %>%
select(city, county_name, state_name, short_url,
text_transcript, day)
# removing extra spaces from county and state columns
data$county_name <- trimws(data$county_name, which = c("left"))
data$state_name <- trimws(data$state_name, which = c("left"))
# removing territory from state column
data <- data %>% mutate(state_name = str_remove_all(state_name, " Territory"))
data$county_name <- str_replace(data$county_name, "Great Salt Lake County", "Salt Lake County")
data$city <- str_replace(data$city, "Great Salt Lake City", "Salt Lake City")
########### adding city coordinates to the data
############################################################################
cities <- read_csv("data/uscities.csv") %>%
select(city, state_name, lat, lng)
# creating a geometry point out of the latitude and longitude
cities$point <- st_geometry(st_as_sf(cities,coords = c("lng","lat")))
# joining city data with location data
city_data <- data %>% inner_join(cities, by = c("city", "state_name"))
## creating a url for every city
city_data['state_url'] <- city_data['state_name']
city_data['city_url'] <- city_data['city']
city_data$state_url <- gsub(" ", "+", city_data$state_url)
city_data$city_url <- gsub(" ", "+", city_data$city_url)
city_data['search_url'] <- paste0("https://wilfordwoodruffpapers.org/places?search=",city_data$city_url, "+", city_data$state_url)
# grouping data by city and state
group_data <- city_data %>%
group_by(city, state_name)
# creating a count column for how many times every city is mentioned [said county, think it isn't]
count_data <- transform(group_data,city_frequency=ave(seq(nrow(group_data)),search_url,FUN=length))
############################################################################
################################## GRAPH ###################################
############################################################################
############################################################################
# converting data frame into an sf points object
count_data <- sf::st_as_sf(count_data)
# making the 'day' column into an as.POSIXct object type
data2 <- count_data
data2 <- sf::st_as_sf(data2)
data2 <- st_cast(data2, "POINT")
data2 <- data2[order(data2$day), ]
data2$day = as.POSIXct(
seq.POSIXt(as.POSIXct(min(data2$day)), as.POSIXct(max(data2$day)), length.out = nrow(data2)))
data2$day = as.Date(data2$day)
data3 <- data2 %>%
select(day, point)
# creating a color palette and bins for number of mentions
mybins <- c(0,2,5,10,50,100,200,Inf)
mypalette <- colorBin(palette="YlGnBu", domain=data2$city_frequency, na.color="transparent", bins=mybins)
# creating leaflet graph
## Slider will later be built into the search, so probably not needed
leaflet() %>%
addProviderTiles('CartoDB.Positron') %>%
setView(-98.5795, 39.8283, zoom = 3) %>%
addTimeslider(data = data3, fillOpacity = 1, popup = ~paste("<b>", "<a href=", data2$search_url, ">", data2$city, "</a>", ",", "</b>", data2$state_name, "<br>Number of Mentions:",data2$city_frequency),
color = ~mypalette(data2$city_frequency), radius = 5, weight = 5,
options = timesliderOptions(position = "topright", timeAttribute = "day",
showAllOnStart = TRUE, alwaysShowDate = TRUE)) %>%
addLegend(data=data2, pal=mypalette, values=~city_frequency, opacity=0.9, title = "Mentions",
position = "bottomleft")
Wilford Woodruff traveled to and lived in many different places
throughout his life. He was born in Farmington, Connecticut on March 1st
in 1807. He lived there until 1811 when he moved to Richland, New York
with his family. This is where Woodruff spent most of his youth. In
1833, Wilford Woodruff joined the Church of Jesus Christ of Latter Day
Saints and shortly after moved to Kirtland, Ohio, where the Chruch
headquarters were located. In 1838, Woodruff then moved the Missouri
with the Saints to flee the persecution in Kirtland. Then in 1839,
Woodruff moved to Nauvoo, Illinois, where he helped build the city and
construct the Nauvoo Temple. Outside of the United States, Woodruff did
a lot of travelling to other countries, including his mission in
England.
Wilford Woodruff first arrived in Salt Lake City in
1847, which remained his primary residence for the rest of his life.
In my analysis, I looked at the different cities in the United
States that Wilford Woodruff mentioned in his journals. Specifically, I
wanted to see which cities were mentioned the most, and how the
locations of cities mentioned changed throughout his life. Salt Lake
City, Utah is the most mentioned city is Woodruff’s writings, with it
being mentioned 875 times. Salt Lake City is significantly mentioned
more than any other city, which the next closed being Nauvoo, Illinois
with 272 mentions. Ogden, Utah, Provo, Utah, and St. George Utah are the
next most mentioned cities.
From the map, you can see that Utah
is the most densely populated state. Although Woodruff didn’t move to
Utah until 1847, this is where he spent a majority of his life, which is
why this makes sense that it is also one of the most widely traveled.
In total, Wilford Woodruff mentions a total of 36 different
states in his journals, with the top states being Utah, Illinois,
Massachusetts, Ohio, and Idaho.