library(tidyverse)
library(pdftools)
library(tidytext)
library(tokenizers)
Mills, Race, and Rawls
28/08/2024 - This blog post is based on some charts I made for my Contemporary Political Theory students while we were covering critics of Rawls, including Mills. I was trying to keep it interesting for them. I’m retrospectively posting it here, tagged with the original date, because why not…
In “Black Rights/White Wrongs: The Critique of Racial Liberalism”, Charles Mills writes the following on Rawls:
“The person seen as the most important twentieth-century American political philosopher and theorist of social justice, and a fortiori the most important American contract theorist, had nothing to say about the remediation of racial injustice, so central to American society and history. His five major books (excluding the two lecture collections on the history of ethics and political philosophy)—A Theory of Justice, Political Liberalism, Collected Papers, The Law of Peoples, and Justice as Fairness: A Restatement—together total over 2,000 pages. If one were to add together all their sentences on race and racism, one might get half a dozen pages, if that much”
Charles Mills, 2017, 35
Now I thought, “why not attempt this?” It seems straightforward, and we have tools that make it pretty easy.
But first, we have to conduct a thought experiment. Suppose one legally obtained PDFs of:
A Theory of Justice
Political Liberalism
Rawls’ Collected Papers
The Law of Peoples
Justice as Fairness: A Restatement
In a manner similar to the purely hypothetical nature of the original position, how might one add together all the sentences on race and racism?
Well, first you might load up some packages (the library() calls at the top of this post). Then you might read your hypothetical PDFs in as a list:
files <- list.files(pattern = "pdf$")

rawls_books <- lapply(files, pdf_text)
After that, you could extract the sentences and words from all the books and turn them into dataframes:
# Use map to extract sentences from each PDF and combine into a data frame
text_df_sentences <- purrr::map_dfr(files, function(file) {
  pdf_text <- pdf_text(file)
  sentences <- unlist(tokenizers::tokenize_sentences(pdf_text))
  data.frame(
    sentence = sentences,
    document = rep(file, length(sentences))
  )
})
# Same again, but tokenize into words
text_df_words <- purrr::map_dfr(files, function(file) {
  pdf_text <- pdf_text(file)
  words <- unlist(tokenizers::tokenize_words(pdf_text))
  data.frame(
    word = words,
    document = rep(file, length(words))
  )
})
Then you might do some basic analysis and count the most frequent words (after removing stop words):
text_df_words <- text_df_words %>%
  anti_join(stop_words)

word_freq <- text_df_words %>%
  count(word, sort = TRUE)

most_freq <- head(word_freq, 10)

glimpse(most_freq)
Rows: 10
Columns: 2
$ word <chr> "justice", "political", "principles", "conception", "society", "s…
$ n <int> 7484, 6358, 4413, 3913, 3476, 3094, 3055, 2649, 2615, 2540
Could turn that into a nice chart.
p1 <- most_freq %>%
  mutate(word = fct_reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_bar(stat = "identity", fill = "#268bd2") +
  coord_flip() +
  labs(title = "10 most frequent words across Rawls' big 5 books",
       caption = "Source: Yusuf Khan - original analysis of entirely legal PDFs of COLL, JAF, LOP, PL, TOJ") +
  theme_minimal() +
  ylab("") +
  xlab("") +
  theme(plot.title.position = "plot",
        plot.title = element_text(size = 14, face = "bold"),
        plot.subtitle = element_text(face = "italic"),
        axis.text.y = element_text(face = "bold")) +
  geom_text(aes(label = n), hjust = 1.2, size = 3, colour = "white", fontface = "bold")

# ggsave("rawls_count.png", plot = p1, width = 6, height = 4, units = "in", bg = "#fdf6e3")
p1
What about counting the sentences mentioning race? Well…you could interpret Mills very crudely and just flag race/racism/racist/racial. This approach isn’t very careful. Other terms such as “black”, “white”, “indigenous”, “Jim Crow”, “native”, “skin”, “colour/color”, “segregation” and so on could all be used to discuss race.
So perhaps a more sophisticated approach is called for? Ah, but even when you've looked for these terms… it isn't looking good. So to keep things simple, you resolve to look for race/racism/racist/racial. This crudely supports Mills' point, but it's not very complete.
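Just to give a flavour of what that broader filter might look like, here is a rough sketch. The term list and the names race_terms and broad_pattern are my own illustrative choices, not anything Mills specifies, and terms like "white" would obviously over-count; note too that wrapping the pattern in regex(..., ignore_case = TRUE) also catches sentences that begin with "Race" or "Racial".

# Purely illustrative: a wider (still imperfect) set of race-related terms
race_terms <- c("race", "racism", "racist", "racial", "black", "white",
                "indigenous", "jim crow", "native", "segregation")
broad_pattern <- regex(str_c("\\b(?:", str_c(race_terms, collapse = "|"), ")\\b"),
                       ignore_case = TRUE)

# Sentences matching any of the broader terms, per book
text_df_sentences %>%
  filter(str_detect(sentence, broad_pattern)) %>%
  count(document, sort = TRUE)

Anyway, back to the simple four-term filter: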
count_race <- text_df_sentences %>%
  filter(str_detect(sentence, "\\b(?:race|racism|racist|racial)\\b")) %>%
  mutate(book = case_match(
    document,
    "RawlsCOLL.pdf" ~ "Collected Papers",
    "RawlsJAF.pdf" ~ "Justice as Fairness",
    "RawlsLOP.pdf" ~ "Law of Peoples",
    "RawlsPL.pdf" ~ "Political Liberalism",
    "RawlsTOJ.pdf" ~ "Theory of Justice",
    .default = document
  ),
  count = 1) %>%
  group_by(book) %>%
  summarise(sum = sum(count)) %>%
  ungroup() %>%
  arrange(sum)
glimpse(count_race)
Rows: 5
Columns: 2
$ book <chr> "Law of Peoples", "Theory of Justice", "Political Liberalism", "C…
$ sum <dbl> 3, 8, 9, 10, 12
You could also turn this into a chart and report the findings.
p2 <- count_race %>%
  mutate(book = fct_reorder(book, sum)) %>%
  ggplot(aes(book, sum)) +
  geom_bar(stat = "identity", fill = "#d33682") +
  coord_flip() +
  theme_minimal() +
  ylab("") +
  xlab("") +
  labs(title = "5 books, over 2000 pages, and about 1 million words...but Rawls only \nhas 42 sentences containing 'race', 'racism', 'racist', or 'racial'",
       subtitle = "Sentences containing race/racism/racist/racial across 5 of Rawls' books",
       caption = "Source: Yusuf Khan - original analysis of entirely legal PDFs") +
  theme(plot.title.position = "plot",
        plot.title = element_text(size = 14, face = "bold"),
        plot.subtitle = element_text(face = "italic"),
        axis.text.y = element_text(face = "bold")) +
  geom_text(aes(label = sum), hjust = 1.8, size = 3, colour = "white", fontface = "bold") +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 4))
ggsave("rawls_race.png", plot = p2, width = 6.5, height = 4, units = "in", bg = "#fdf6e3")
p2
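(In case you are wondering where the numbers in that title come from: the 42 is just the per-book sentence counts added up, and the "about 1 million words" is roughly the number of rows in the word-level data frame if you count it before the anti_join with stop_words above.)

# 3 + 8 + 9 + 10 + 12 = 42 sentences matching the four terms
sum(count_race$sum)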
Let’s return to Mills’ claim:
If one were to add together all their sentences on race and racism, one might get half a dozen pages, if that much
How does Rawls fare? First, let’s check the average sentence length across all his major works:
# Calculate number of words per sentence
text_df_sentences <- text_df_sentences %>%
  rowwise() %>%
  mutate(word_count = str_count(sentence, "\\S+"))

# Calculate average sentence length across all sentences
average_sentence_length <- mean(text_df_sentences$word_count)

average_sentence_length
[1] 23.83138
Next, let’s assume a page has 450 words and calculate the number of sentences per page (sorry, I couldn’t be bothered to break the files up by page, though see the aside below for how you could get real page counts):
words_per_page <- 450

# Estimate sentences per page
sentences_per_page <- words_per_page / average_sentence_length

sentences_per_page
[1] 18.88267
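As an aside, if you did want real page counts rather than the 450-words-per-page assumption, pdf_text() already gives them to you: it returns a character vector with one element per page, so the hypothetical rawls_books list from earlier knows how long each book is. A minimal sketch (pages_per_book is my own name):

# Each element of rawls_books has one element per page of that PDF
pages_per_book <- purrr::map_int(rawls_books, length)

# Total pages across the five hypothetical PDFs
sum(pages_per_book)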
If Rawls’ work only has 42 sentences containing “race”, “racism”, “racist”, or “racial”…this roughly comes to [rapidly taps racism calculator]:
rawls_race_pages <- 42 / sentences_per_page

rawls_race_pages
[1] 2.224262
2.2 pages. I am shocked. SHOCKED I SAY.
But remember, this is all a thought experiment. Suppose one legally obtained those PDFs. So, all of this is hypothetical, of course…