Chapter 4 Visualizing Edges
In this chapter, we continue to use the American Mafia data introduced in Chapter 2.
This chapter covers the following topics:
- How to convert point data into line geometry
- How to visualize edges with constant color and size
- How to visualize edges with varying edge width by attributes
- How to visualize edges with varying edge color by attributes
- How to visualize edges with varying edge size and edge color with a combined legend
A copy of all the code is available at the bottom of the page.
Before proceeding to the code, please load the following packages:
library(sf) #for using spatial objects
library(tidyverse) #for using tidy syntax
library(tmap) #for visualizing maps
library(tigris) #for downloading TIGER boundary shapefiles
library(stplanr) #for using od2line function to convert points to lines
library(igraph) #for calculating node degree
library(SSNtools) #load sample datasets MafiaNodes and MafiaEdges
4.1 Convert Points into Lines
4.1.1 Method 1: Use the od2line function in the stplanr package
od2line takes two arguments: 1) an OD dataframe whose first two columns contain the origin and destination variables, and 2) a shapefile whose IDs can be matched to the origins and destinations. In our case, the first is MafiaEdges, and the second is MafiaNodes converted into an sf geometry object. This method is particularly useful if you store your OD dataframe and your shapefile separately. It is also the easiest way to convert our dataset into lines. The only downside is one more package dependency. We encourage readers to explore other useful OD-related functions in the stplanr package, such as dist_google(), od_coords2line(), od_to_odmatrix(), and so on.
#library(SSNtools)
#library(tidyverse)
#library(stplanr)
data(MafiaNodes)
data(MafiaEdges)
# convert MafiaNodes to an sf geometry object (shapefile)
MafiaSpatial = MafiaNodes %>%
  st_as_sf(coords=c("LonX", "LatY"), crs = 4326)
# create line geometry
MafiaEdges_toLine = od2line(MafiaEdges, MafiaSpatial)
# print first three rows
MafiaEdges_toLine[c(1:3),]
## Simple feature collection with 3 features and 2 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -80.1252 ymin: 25.9783 xmax: -73.9652 ymax: 40.7449
## Geodetic CRS: WGS 84
## Source Target geometry
## 1 CARUSO-FRANK LISI-GAETANO LINESTRING (-74.0049 40.613...
## 2 LANSKY-MEYER GAMBINO-CARLO LINESTRING (-80.1252 25.978...
## 3 CARUSO-FRANK DIMAGGIO-ROSARIO LINESTRING (-74.0049 40.613...
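od2line matches the origin and destination codes against the IDs in the spatial object, so it can help to catch unmatched IDs before the conversion. The following is a minimal sketch of such a pre-check in base R. The nodes and edges tables are toy data invented for illustration, not the Mafia data.

```r
# toy data, invented for illustration only
nodes = data.frame(NODE = c("A", "B", "C"), stringsAsFactors = FALSE)
edges = data.frame(Source = c("A", "B", "A"),
                   Target = c("B", "C", "D"),
                   stringsAsFactors = FALSE)

# IDs used in the edge list that have no matching node
missing_ids = setdiff(c(edges$Source, edges$Target), nodes$NODE)

# keep only the edges whose two endpoints both exist in the node table
edges_ok = edges[edges$Source %in% nodes$NODE & edges$Target %in% nodes$NODE, ]
```

With real data, MafiaNodes$NODE and the Source and Target columns of MafiaEdges would take the place of the toy columns.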
4.1.2 Method 2: Group by lineID and Summarize Points into Line
The second method is to group the point coordinates by line ID, summarize the geometry (you can also summarize flow values in the same statement if available), and then use st_cast in the sf package to turn the grouped point coordinates into a line geometry. This method is particularly useful if your points are organized by line ID and you do not want to use extra packages. It is also convenient for visualizing GPS trajectory data, because each row tends to be a point associated with a line ID and you want to connect all the points that represent one trajectory.
For our data, we do not have line IDs for points, but for demonstration purposes, we can create such an ID with row_number(). Then we use pivot_longer() in the tidyr package (loaded with tidyverse) to transform one row per edge into two rows of edge points and show what the data look like. Note that some connected mafia members in MafiaNodes share the same location. Two identical points cannot form a line, so we remove those edges in the code.
#library(SSNtools)
#library(tidyverse)
#library(sf)
data(MafiaNodes)
data(MafiaEdges)
# convert MafiaNodes to an sf geometry object (shapefile)
MafiaSpatial = MafiaNodes %>%
  st_as_sf(coords=c("LonX", "LatY"), crs = 4326)
# transform MafiaEdges to a format where each node in the edge is coded with an edge ID
MafiaEdges2 = MafiaEdges %>% mutate(ID = row_number()) %>%
  pivot_longer(cols = c("Source", "Target"), names_to = "type", values_to = "NODE") %>%
  left_join(MafiaSpatial, by=c('NODE'), copy=FALSE)
# print first three rows of the data
MafiaEdges2[c(1:3),]
## # A tibble: 3 × 7
## ID type NODE Family NY NiceLabel geometry
## <int> <chr> <chr> <chr> <int> <chr> <POINT [°]>
## 1 1 Source CARUSO-FRANK Genovese 2 Frank Caruso (-74.0049 40.6133)
## 2 1 Target LISI-GAETANO Lucchese 2 Gaetano Lisi (-73.9908 40.7142)
## 3 2 Source LANSKY-MEYER Genovese 1 Meyer Lansky (-80.1252 25.9783)
# convert points to lines
MafiaEdges_toLine = MafiaEdges2 %>%
  group_by(ID) %>%
  # this is an optional step to remove points that are at the same location;
  filter(n_distinct(geometry) > 1) %>%
  st_as_sf() %>%
  group_by(ID) %>%
  summarise() %>%
  st_cast("LINESTRING")
MafiaEdges_toLine[c(1:4),]
## Simple feature collection with 4 features and 1 field
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -118.157 ymin: 25.9783 xmax: -73.9652 ymax: 40.7449
## Geodetic CRS: WGS 84
## # A tibble: 4 × 2
## ID geometry
## <int> <LINESTRING [°]>
## 1 1 (-74.0049 40.6133, -73.9908 40.7142)
## 2 2 (-80.1252 25.9783, -73.9652 40.595)
## 3 3 (-74.0049 40.6133, -73.985 40.7449)
## 4 4 (-118.157 33.9231, -117.93 34.0768)
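For GPS trajectories with more than two points per line, the order of the points matters. The sketch below applies the same group-and-cast idea to a toy table of GPS fixes. The gps table, its track_id column, and the coordinates are invented for illustration. It passes do_union = FALSE to summarise() so that the points are combined in row order rather than unioned.

```r
library(sf)
library(dplyr)

# hypothetical GPS fixes: two trajectories, three points each, in visiting order
gps = data.frame(
  track_id = c(1, 1, 1, 2, 2, 2),
  lon = c(-74.00, -73.95, -74.02, -87.63, -87.70, -87.60),
  lat = c(40.71, 40.75, 40.80, 41.88, 41.90, 41.95)
)

tracks = gps %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
  group_by(track_id) %>%
  # do_union = FALSE combines the points in their original row order
  summarise(do_union = FALSE) %>%
  st_cast("LINESTRING")
```

Each row of tracks is then one trajectory that can be passed to tm_lines() like the edges in this chapter.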
4.1.3 Method 3: Join Two Point Geometry into One Row and Unite into Line
The third way to convert points into lines is to join the two point geometries of an edge into one row and cast them into a line. This method is particularly useful if you have the coordinates of both origin and destination points in one dataframe. In our case, MafiaEdges does not contain coordinates, so we have to join MafiaNodes to get the coordinates before using this method.
#library(SSNtools)
#library(tidyverse)
#library(sf)
data(MafiaNodes)
data(MafiaEdges)
# attach point geometry to MafiaEdges for both Source and Target nodes
MafiaEdges_toLine = MafiaEdges %>%
  left_join(MafiaNodes, by=c('Source' = 'NODE'), copy=FALSE) %>%
  left_join(MafiaNodes, by=c('Target' = 'NODE'), copy=FALSE) %>%
  #LonX.x and LatY.x are coordinates for Source;
  #LonX.y and LatY.y are coordinates for Target
  select(c(Source, Target, LonX.x, LatY.x, LonX.y, LatY.y)) %>%
  # this is an optional step to remove edges whose two points are at the same location
  # (an edge is kept if either the longitude or the latitude differs)
  filter(LonX.x != LonX.y | LatY.x != LatY.y)
# this helper function converts a row with four coordinates into a
# two by two matrix and casts it into a linestring.
st_segment = function(r){st_linestring(t(matrix(unlist(r), 2, 2)))}
# loop through each row and cast it into a linestring
MafiaEdges_toLine$geometry = st_sfc(
  sapply(1:nrow(MafiaEdges_toLine),
    function(i){
      st_segment(MafiaEdges_toLine[i,][c('LonX.x', 'LatY.x', 'LonX.y', 'LatY.y')])},
    simplify=FALSE))
# ensure the output is an sf object and set the crs
MafiaEdges_toLine = MafiaEdges_toLine %>% st_as_sf() %>% st_set_crs(4326)
MafiaEdges_toLine[c(1:3),]
## Simple feature collection with 3 features and 6 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -80.1252 ymin: 25.9783 xmax: -73.9652 ymax: 40.7449
## Geodetic CRS: WGS 84
## Source Target LonX.x LatY.x LonX.y LatY.y
## 1 CARUSO-FRANK LISI-GAETANO -74.0049 40.6133 -73.9908 40.7142
## 2 LANSKY-MEYER GAMBINO-CARLO -80.1252 25.9783 -73.9652 40.5950
## 3 CARUSO-FRANK DIMAGGIO-ROSARIO -74.0049 40.6133 -73.9850 40.7449
## geometry
## 1 LINESTRING (-74.0049 40.613...
## 2 LINESTRING (-80.1252 25.978...
## 3 LINESTRING (-74.0049 40.613...
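The reshaping inside the st_segment helper above is easy to get wrong, so here is a small base R sketch of what it does to a single row. It uses the coordinates of the first edge printed above. The variable names by_column and pts are ours.

```r
# coordinates of the first edge, in the order LonX.x, LatY.x, LonX.y, LatY.y
r = c(-74.0049, 40.6133, -73.9908, 40.7142)

# matrix() fills column by column, so each column holds one point
by_column = matrix(r, 2, 2)

# transposing puts one point per row: row 1 = Source, row 2 = Target,
# with columns (longitude, latitude), i.e. one row per vertex of the line
pts = t(by_column)
```

Passing pts to st_linestring() yields the two-vertex line for that edge. Without the transpose, the longitudes and latitudes would be mixed across points.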
4.2 Visualizing Edges
The simplest edge visualization code snippet using tmap is the following:
#library(SSNtools)
#library(tidyverse)
#library(sf)
#library(tmap)
#library(tigris)
#library(stplanr)
# prepare data
data(MafiaNodes)
data(MafiaEdges)
# convert coordinates to sf point geometries
MafiaSpatial = MafiaNodes %>%
  st_as_sf(coords=c("LonX", "LatY"), crs = 4326)
# convert point geometries to lines
MafiaEdges_toLine = od2line(MafiaEdges, MafiaSpatial)
# states is a function in tigris to download U.S. state boundary shapefile
us_states = states(cb=TRUE, progress_bar = FALSE) %>%
  filter(!STUSPS %in% c('PR','AS', 'AK', 'GU','MP','VI', 'HI'))
# tmap functions to visualize maps
tmap_mode('plot')
tm_shape(us_states) +
tm_polygons() +
tm_shape(MafiaEdges_toLine) +
tm_lines()
4.3 Visualizing Edges by Color
To visualize edges by color, we create a column weight for edges based on the edge distance. Then we assign the column weight to the argument col in tm_lines. By default, the color classes use evenly spaced intervals of the column variable (in this case, weight values). In reality, however, the weight distribution of non-planar networks is often skewed, with a few edges having very high weights or flow values and most of the rest having low values. Therefore, we need to adjust the breaks to better visualize the SSN. To do that, we give the argument breaks a vector of fixed numbers, and set the argument style to fixed.
# create a line weight column based on edge distance
# (st_length() returns meters for this CRS; the legend labels below are written in km)
MafiaEdges_toLine = MafiaEdges_toLine %>% mutate(weight = as.numeric(st_length(geometry)))
tmap_mode('plot')
tm_shape(us_states) +
tm_polygons(alpha=0, border.col = 'grey') +
#reorder edges so that long distance edges are drawn first and short-ranged edges drawn last
tm_shape(arrange(MafiaEdges_toLine, desc(weight))) +
#define line color with column `weight` and properties associated with lines
tm_lines(col='weight', scale=2, alpha=0.2,
breaks = round(quantile(MafiaEdges_toLine$weight, probs=c(0, 0.5, 0.9, 0.99, 1)), 0),
style="fixed", n = 4,
labels=c('0-14','14-1630','1630-4000','4000-4150'),
palette=c('#CCEBC5', '#7BCCC4', '#2B8CBE', '#094081'),
title.col = c('Distance (km)')) +
tm_layout(legend.position = c('right', 'bottom'))
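To see why skewed breaks help, the following base R sketch compares equal-interval classes with the quantile breaks used above on a toy skewed vector. The vector w is invented for illustration: 90 short edges and 10 long ones.

```r
# toy skewed weights: 90 short edges and 10 long edges (invented values)
w = c(1:90, seq(1000, 4000, length.out = 10))

# four equal-width classes: nearly every edge lands in the first class
equal_counts = as.vector(table(cut(w, breaks = 4)))

# skewed quantile breaks, same probabilities as in the map code above
brks = quantile(w, probs = c(0, 0.5, 0.9, 0.99, 1))
quantile_counts = as.vector(table(cut(w, breaks = brks, include.lowest = TRUE)))
# quantile_counts is 50, 40, 9, 1
```

With equal intervals the short edges would share a single color, while the quantile breaks separate them into two classes and still reserve a class for the longest edge.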
We can also visualize lines by color based on a categorical variable, such as mafia families. To assign a categorical value to each edge, we identified the top 5 mafia families with the most members. They are Genovese, Lucchese, Gambino, Detroit, and Chicago. If an edge is affiliated with one of the top 5 mafia families, it will be joined with the respective mafia family name. Otherwise, it will have the value Others. The biggest difference from the code above is to change style = 'fixed' to style = 'cat' in tm_lines() and to change the color palette to represent categorical colors.
top_5 = c('Genovese', 'Lucchese', 'Gambino', 'Detroit', 'Chicago')
MafiaNodes = MafiaNodes %>% mutate(Family = ifelse(Family %in% top_5, Family, 'Others'))
MafiaEdges_toLine = MafiaEdges_toLine %>%
  left_join(MafiaNodes %>% select(c(NODE, Family)), by=c('Source' = 'NODE'), copy=FALSE) %>%
  left_join(MafiaNodes %>% select(c(NODE, Family)), by=c('Target' = 'NODE'), copy=FALSE) %>%
  #the two left joins above will automatically create Family.x and Family.y to differentiate having `Family` twice.
  mutate(edge_family = ifelse(Family.x %in% top_5, Family.x, 'Others')) %>%
  mutate(edge_family = ifelse(Family.y %in% top_5, Family.y, edge_family))
tmap_mode('plot')
tm_shape(us_states) +
tm_polygons(alpha=0, border.col = 'grey') +
tm_shape(arrange(MafiaEdges_toLine, desc(edge_family))) +
tm_lines(col='edge_family', style='cat', alpha=0.5, lwd=1,
palette=c('#57B897', '#F7774F', '#7A8CC1', '#E072B5', '#FAD324', 'lightgrey'),
title.col = c('Edges by Families')) +
tm_layout(legend.position = c('right', 'bottom'))
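The two mutate() calls above imply a precedence rule that is easy to miss: when both endpoints belong to a top 5 family, the family of the Target node wins. The base R sketch below replays the same two ifelse() steps on four toy edges. The rows, including the family names outside top_5, are invented for illustration.

```r
top_5 = c('Genovese', 'Lucchese', 'Gambino', 'Detroit', 'Chicago')

# toy edges: Family.x is the Source family, Family.y is the Target family
edges = data.frame(
  Family.x = c('Genovese', 'Others', 'Others', 'Gambino'),
  Family.y = c('Others', 'Lucchese', 'Others', 'Chicago'),
  stringsAsFactors = FALSE
)

# step 1: start from the Source family
edges$edge_family = ifelse(edges$Family.x %in% top_5, edges$Family.x, 'Others')
# step 2: overwrite with the Target family whenever it is in the top 5
edges$edge_family = ifelse(edges$Family.y %in% top_5, edges$Family.y, edges$edge_family)
```

The last toy edge connects Gambino to Chicago and ends up labeled Chicago. If a different rule is preferred (for example, a separate category for cross-family edges), it would have to replace step 2.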
4.4 Visualizing Edges by Line Width
To visualize edges by line width, we assign the column weight to the argument lwd in tm_lines. The following map uses a constant color across the different line widths.
#library(tmap)
tmap_mode('plot')
tm_shape(us_states) +
tm_polygons(alpha=0, border.col = 'grey') +
tm_shape(MafiaEdges_toLine) +
#define line width with column `weight` and properties associated with lines
tm_lines(lwd='weight', scale=2, alpha=0.2, legend.lwd.is.portrait = TRUE,
title.lwd = c('Distance (m)')) +
tm_layout(legend.position = c('right', 'bottom'))
Since we already established that the edges are better viewed with skewed weight breaks, we manually assign break values for line width. To adjust the line width breaks, we need to create a column to store relative edge width, which is similar to the node size visualization in Chapter 2. We chose skewed quantile breaks for line width. This step is quite manual for line width aesthetics, and better supported for color aesthetics. The upcoming tmap v4 will have significant improvements on the ease of use.
# We create a column called line_width that stores relative line width
brks = round(quantile(MafiaEdges_toLine$weight, probs=c(0, 0.5, 0.9, 0.99, 1)), 0)
MafiaEdges_toLine = MafiaEdges_toLine %>% mutate(
  line_width = case_when(
    weight >= brks[1] & weight <= brks[2] ~ 0.1,
    weight > brks[2] & weight <= brks[3] ~ 0.3,
    weight > brks[3] & weight <= brks[4] ~ 0.5,
    weight > brks[4] & weight <= brks[5] ~ 1
  )
)
tmap_mode('plot')
tm_shape(us_states) +
tm_polygons(alpha=0, border.col = 'grey') +
tm_shape(MafiaEdges_toLine) +
#define line width with column `line_width` and properties associated with lines
tm_lines(lwd='line_width', scale=2, alpha=0.2,
legend.lwd.is.portrait = TRUE,
lwd.legend = c(0.1, 0.3, 0.5, 1)*2,
lwd.legend.labels=c('0-14','14-1630','1630-4000','4000-4150'),
title.lwd = c('Distance (km)')) +
tm_layout(legend.position = c('right', 'bottom'))
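The case_when() step used to build line_width can also be written with cut(), which avoids repeating the break comparisons. This is a sketch in base R under the same interval convention (lower bound included only for the first class). The weight and brks values are illustrative, chosen to mirror the legend labels rather than taken from the data.

```r
# illustrative weights and breaks (not computed from the Mafia data)
weight = c(0, 10, 14, 500, 1630, 3000, 4000, 4100, 4150)
brks   = c(0, 14, 1630, 4000, 4150)
widths = c(0.1, 0.3, 0.5, 1)

# cut() returns the class index (1 to 4) of each weight;
# indexing widths by that class gives the relative line width
cls = cut(weight, breaks = brks, include.lowest = TRUE, labels = FALSE)
line_width = widths[cls]
```

A weight outside the range of brks gets NA from cut(), which makes it easy to spot edges that fall outside the rounded breaks.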
4.5 Visualizing Edges by Color and Width
Similar to node visualization, to visualize edges with both color and line width, we need to add arguments for both and create a combined legend through tm_add_legend. Unfortunately, the package does not have an automatic way to combine the color and line width legends, so we have to define the values manually.
tmap_mode('plot')
map = tm_shape(us_states) +
  tm_polygons(alpha=0, border.col = 'grey') +
  #reorder edges so that long distance edges are drawn first and short-ranged edges drawn last
  tm_shape(arrange(MafiaEdges_toLine, desc(weight))) +
  tm_lines(
    #arguments that define the styles for color
    col="weight", alpha=0.2,
    breaks = round(quantile(MafiaEdges_toLine$weight, probs=c(0, 0.5, 0.9, 0.99, 1)), 0),
    style="fixed", n = 4,
    palette=c('#CCEBC5', '#7BCCC4', '#2B8CBE', '#094081'),
    legend.col.show = FALSE,
    #arguments that define the styles for line width
    lwd='line_width', scale=2,
    legend.lwd.show = FALSE
  ) +
  #add manual legends to combine color and line width schema
tm_add_legend(
type=c('line'),
col=c('#CCEBC5', '#7BCCC4', '#2B8CBE', '#094081'),
lwd=c(0.1, 0.3, 0.5, 1)*2,
labels=c('0-14','14-1630','1630-4000','4000-4150'),
title='Distance (km)') +
tm_layout(legend.position = c('right', 'bottom'))
map
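Because the manual legend repeats the class labels by hand, the labels can drift out of sync with the breaks. One option is to derive the labels from the break vector. The sketch below assumes breaks in meters that are converted to km for display. The brks_m values are illustrative, chosen to reproduce the labels used above, and are not computed from the data.

```r
# illustrative breaks in meters (assumed values, not computed from the data)
brks_m = c(0, 14000, 1630000, 4000000, 4150000)

# convert to km and pair each lower bound with the next upper bound
brks_km = round(brks_m / 1000)
legend_labels = paste(head(brks_km, -1), tail(brks_km, -1), sep = "-")
# legend_labels is "0-14" "14-1630" "1630-4000" "4000-4150"
```

The resulting vector could then be passed to the labels argument of tm_add_legend() in place of the hard-coded strings.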
Perfect! Now we would like to add nodes with size scaled by their degree. Here we show a new technique, which is to reuse the map we already saved as the variable map above and add new components to it. In this way, we do not have to rebuild the portion of the map that we already produced, which speeds up mapping and testing considerably.
#library(igraph)
g = graph_from_data_frame(MafiaEdges, directed = FALSE, vertices=MafiaSpatial)
MafiaSpatial$degree = degree(g)
map = map +
  tm_shape(MafiaSpatial) +
tm_symbols(size="degree", scale=2, #scale up the node size
col='orange', border.col='darkorange',
alpha=0.2, border.alpha = 0.2,
title.size=c('Degree'))
map
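For an undirected graph, the degree of a node is the number of edge endpoints that touch it. As a sanity check on the degree column, the following base R sketch counts endpoints directly on a toy edge list, without igraph. The nodes and edges tables are invented for illustration.

```r
# toy node and edge tables, invented for illustration
nodes = data.frame(NODE = c("A", "B", "C", "D"), stringsAsFactors = FALSE)
edges = data.frame(Source = c("A", "A", "B"),
                   Target = c("B", "C", "C"),
                   stringsAsFactors = FALSE)

# stack both endpoint columns; using the node IDs as factor levels
# keeps isolated nodes (here "D") in the table with a count of zero
endpoints = factor(c(edges$Source, edges$Target), levels = nodes$NODE)
nodes$degree = as.integer(table(endpoints))
```

Nodes that appear in no edge get a degree of 0, which is worth keeping in mind when scaling symbol sizes.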
To export the tmap object into a local folder, you can add:
tmap_save(map, filename='YOUR_LOCAL_FOLDER_PATH/map.png')
Here is the full code to replicate the map above:
library(sf)
library(tidyverse)
library(tmap)
library(tigris)
library(stplanr)
library(igraph)
library(SSNtools) #provides the MafiaNodes and MafiaEdges datasets
data(MafiaNodes)
data(MafiaEdges)
# convert MafiaNodes to an sf geometry object (shapefile)
MafiaSpatial = MafiaNodes %>%
  st_as_sf(coords=c("LonX", "LatY"), crs = 4326)
# create line geometry
MafiaEdges_toLine = od2line(MafiaEdges, MafiaSpatial)
# states is a function in tigris to download U.S. state boundary shapefile
us_states = states(cb=TRUE, progress_bar = FALSE) %>%
  filter(!STUSPS %in% c('PR','AS', 'AK', 'GU','MP','VI', 'HI'))
# create weight column for each edge
MafiaEdges_toLine = MafiaEdges_toLine %>% mutate(weight = as.numeric(st_length(geometry)))
brks = round(quantile(MafiaEdges_toLine$weight, probs=c(0, 0.5, 0.9, 0.99, 1)), 0)
# create line_width column for each edge
MafiaEdges_toLine = MafiaEdges_toLine %>% mutate(
  line_width = case_when(
    weight >= brks[1] & weight <= brks[2] ~ 0.1,
    weight > brks[2] & weight <= brks[3] ~ 0.3,
    weight > brks[3] & weight <= brks[4] ~ 0.5,
    weight > brks[4] & weight <= brks[5] ~ 1
  )
)
# create degree column for each node
g = graph_from_data_frame(MafiaEdges, directed = FALSE, vertices=MafiaSpatial)
MafiaSpatial$degree = degree(g)
tmap_mode('plot')
map = tm_shape(us_states) +
  tm_polygons(alpha=0, border.col = 'grey') +
  #reorder edges so that long distance edges are drawn first and short-ranged edges drawn last
  tm_shape(arrange(MafiaEdges_toLine, desc(weight))) +
  tm_lines(
    #arguments that define the styles for color
    col="weight", alpha=0.2,
    breaks = round(quantile(MafiaEdges_toLine$weight, probs=c(0, 0.5, 0.9, 0.99, 1)), 0),
    style="fixed", n = 4,
    palette=c('#CCEBC5', '#7BCCC4', '#2B8CBE', '#094081'),
    legend.col.show = FALSE,
    #arguments that define the styles for line width
    lwd='line_width', scale=2,
    legend.lwd.show = FALSE
  ) +
  #add manual legends to combine color and line width schema
tm_add_legend(
type=c('line'),
col=c('#CCEBC5', '#7BCCC4', '#2B8CBE', '#094081'),
lwd=c(0.1, 0.3, 0.5, 1)*2,
labels=c('0-14','14-1630','1630-4000','4000-4150'),
title='Distance (km)') +
tm_shape(MafiaSpatial) +
tm_symbols(size="degree", scale=2, #scale up the node size
col='orange', border.col='darkorange',
alpha=0.2, border.alpha = 0.2,
title.size=c('Degree')) +
tm_layout(legend.position = c('right', 'bottom'))
#tmap_save(map, filename='YOUR_LOCAL_FOLDER_PATH/map.png')
map