About this project
I immediately clicked on Wolf Hall when I saw it in the
Guardian's list of the 100 best novels, to see who had voted for
it. It is the book I would choose for number one. I was amazed to
see that everyone who voted for it was a woman. Sad, but true,
like WTF! This made me very curious about who voted for whom and
if there were obvious gender trends in the voting. I used Claude
AI to analyze all of the data from the Guardian's list and to
make this visualization. It is useful in a few different ways.
First, it is a good way to find books. You can navigate from
books that you like to voters who selected them and then to the
other books that those voters also chose. Just click around in
the graph or in the lists. Second, it shows very clear gender
trends in the voting. Men tend to vote for male authors.
I looked at the 'gender' of books two ways. First, by author: what
percentage of picks go to authors of the opposite gender (non-binary
voters are in all of the lists and rankings but not in these
male/female statistics). Second, I thought maybe some books
would be more stereotypically 'male' or 'female' and that might
drive voting preferences differently from the author's gender.
Maybe male voters just prefer 'male' books, you know, spies,
war, mayhem, exuberant misogyny and all the rest. I certainly
love and admire lots of those books. Maybe women just don't
write as many books like that. So, I had Claude code all of
the books as more male or more female using stereotypical,
subjective interpretations (including subject matter and aspects
of writing style).
It turns out that coding the books as male/female shifts the
results a few points, but men's preference for male authors
easily trumps their preference for male-coded subjects.
Some of the coolest results are in the voter rankings. I have
ranked voters by consensus vs contrarian, most idiosyncratic vs
most canonical taste, and those who vote most against the
overall gender pattern. These rankings also switch around in
interesting ways when you include all 700 or so books vs. just
the top 100.
Nota bene, there are some tangled, intractable issues here.
There is a phantom false premise that haunts these kinds of
comparisons: the assumption that the universe of books to vote
for is not already skewed by gender. Obviously that's wrong.
It's a real garden of forking paths. So the stats are
clear: these voters voted this way in this sample. But the
ground is less solid than you might want for building a broader
analysis. The most important result, however, remains clear:
Hilary Mantel was robbed.
I wrote this introduction old school, typing in a text
editor. The rest was written and programmed by Claude. Knowing
a tiny bit about how LLMs work, I suspect there is a good chance
this entire analysis is completely, fraudulently wrong. But it
is wonderfully plausible, which is all we can really get in the
post-truth reality.
— Seth Rosenthal
What the data shows
The Guardian's 100 best novels of all time (May 2026) was
assembled from the top-10 ballots of 172 contributors: novelists,
critics, academics, and Guardian journalists. That's 1,720 picks in
total, spread across 694 distinct books chosen at least once.
The published top 100 is the consensus filter; the remaining 594 honourable mentions
form a long tail of personal favourites.
The headline pattern is gendered asymmetry, but not the
asymmetry you might expect from the canon being male-dominated. Female
voters cast their ballots across the whole subject-matter gradient.
They pick the male-coded canon (Moby-Dick, Brothers Karamazov, the
modernists) and the female-coded canon (Austen, the Brontës, Morrison,
Atwood) in roughly equal measure. Their voting distribution is
bimodal: two near-equal peaks, one in male-coded
territory and one in female-coded territory. Male voters, by contrast,
cluster narrowly: their distribution is essentially unimodal, with a
dominant peak around mildly male-coded modernist titles (Ulysses,
The Trial, Brothers Karamazov, Blood Meridian), and 70% of their
picks go to male-authored books versus a 50/50 split for female
voters.
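The 70% and 50/50 figures are simple shares of picks grouped by voter gender. Here is a minimal sketch of that calculation in Python. It uses made-up toy records, not the real Guardian data, and the `picks` list and `male_author_share` function are illustrative assumptions rather than the project's actual code.

```python
from collections import defaultdict

# Hypothetical toy records: one (voter_gender, author_gender) pair per pick.
# These are NOT the real Guardian data.
picks = [
    ("M", "M"), ("M", "M"), ("M", "F"), ("M", "M"),
    ("F", "M"), ("F", "F"), ("F", "F"), ("F", "M"),
]


def male_author_share(picks):
    """Return {voter_gender: fraction of that group's picks by male authors}."""
    totals = defaultdict(int)
    male = defaultdict(int)
    for voter_gender, author_gender in picks:
        totals[voter_gender] += 1
        if author_gender == "M":
            male[voter_gender] += 1
    return {gender: male[gender] / totals[gender] for gender in totals}


shares = male_author_share(picks)
for gender, share in sorted(shares.items()):
    print(f"{gender} voters: {share:.0%} of picks go to male authors")
```

Non-binary voters would simply be left out of the input records for this particular statistic, as the introduction describes.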
When the analysis is extended to all 694 books, including everyone's
personal favourites that didn't make the consensus top 100, the
pattern strengthens rather than weakens. The male
voters' distribution sharpens into an even tighter single peak (the
smaller secondary peak at +1.5, visible in top-100 mode, disappears), while the female bimodality
persists.
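One rough way to check whether a distribution has one peak or two is to bin the subject-matter scores and count the local maxima. The sketch below assumes fixed-width bins and toy scores. How the peaks were actually identified for this project isn't described here, so `count_peaks` is a hypothetical illustration only.

```python
from collections import Counter


def count_peaks(scores, bin_width=1.0):
    """Bin the scores and count bins strictly taller than both neighbours.

    Flat-topped plateaus are not counted as peaks in this simple version.
    """
    bins = Counter(round(score / bin_width) for score in scores)
    lowest, highest = min(bins), max(bins)
    counts = [bins.get(b, 0) for b in range(lowest, highest + 1)]
    padded = [0] + counts + [0]  # so edge bins can also qualify as peaks
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Hypothetical toy scores (negative = male-coded, positive = female-coded).
narrow = [-2.0, -2.0, -1.0, -1.0, -1.0, -1.0, 0.0, 0.0]
spread = [-3.0, -2.0, -2.0, -2.0, -1.0, 0.0, 1.0, 2.0, 2.0, 2.0, 3.0]

print(count_peaks(narrow), count_peaks(spread))
```

With real data the answer depends heavily on the bin width, so a result like this is a prompt to look at the histogram, not a substitute for it.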
A surprising feature of the long tail: it doesn't diversify
the canon in the way the headline statistics might suggest. It is more
contemporary (median publication year 1973 vs 1937 for the top 100)
and less canonical (mean canonicity −0.33 vs −1.18, where lower scores are more canonical), but it doesn't
really shift the gender mix among authors (38% female-authored in the
long tail vs 37% in the top 100), and its subject-matter mean is
actually more male-coded than the top 100 (−0.21 vs +0.23, where negative is male-coded and positive is female-coded).
The long tail is dominated by the mid-century male literary
canon (Bellow, Updike, Roth, Coetzee, Bernhard, Beckett,
Powell, DeLillo): authors whom voters mention as personal favourites but who
didn't collect enough votes to reach the top 100. Contemporary diverse voices are
present in the long tail, but outnumbered by these male-literary
"honourable mentions."
The voter rankings shift in interesting ways when you switch from
top-100 mode to all-books mode. The two voters whose entire
top-10 ballots missed the cut, Natalie Haynes and Nussaibah Younis,
naturally dominate the "most contrarian" ranking in all-books mode
(they're invisible in top-100 mode). Mark Haddon's top-100 picks look
extremely canonical (mean canonicity −2.50); his full top-10 includes
enough recent work that his mean shifts to +0.60, a swing of just over 3 points.
Nikesh Shukla's top-100 picks look strongly female-coded (+2.50);
his full top-10 is essentially gender-neutral (−0.05). These shifts
suggest that the picks which aggregate to the top 100 don't
always reflect each voter's full reading personality — the consensus
filter has its own bias toward female-coded literary fiction at the
"officially approved" end and male-coded modernism at the
"officially serious" end, with each voter's idiosyncrasies thinning
out under aggregation.
For a frank discussion of how the codings were done, the
assumptions made, and the specific judgment calls behind
individual books, see Methodology & coding notes.
View the source code →