The Two Ways of Getting It Wrong

I enjoy teaching people about statistical reasoning, but it does get tedious trying to get them to apply it. That is, people do reasonably well with the abstract case (raw numbers) or an illustrative example (incidences of renal failure, HIV tests), but their success rate drops dramatically as soon as the conversation turns to anything that’s genuinely important to them.

In my world, the topic of indie versus traditional (“trad”) publishing comes up often, as it did yesterday in response to this biased blog post. Talk to them long enough and eventually trad supporters will mention publishing companies’ valuable role as “gatekeepers” (people whose job it is to let the good folks through and keep the riffraff out) and how that serves a vital function for readers.

The detection of “good” books — for now let’s skip trying to define “good” — like the detection of HIV, is a binary (pass/fail, true/false) measure. That makes it seem like there is only one way to err: you are either right or you are wrong.


Turns out on any binary measure there are actually TWO ways to fail. In statistics those are called Type I (alpha) and Type II (beta) error, and although the expert will note some very slight nuanced differences, these are basically false positives and false negatives, respectively.
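For the programmers in the audience, here is a minimal sketch of the four possible outcomes of any accept/reject decision. The function name `classify` and the flags `is_good` and `accepted` are my own hypothetical labels for illustration, not anything drawn from a real dataset.

```python
def classify(is_good: bool, accepted: bool) -> str:
    """Name the outcome of a single accept/reject decision."""
    if accepted and is_good:
        return "true positive"
    if accepted and not is_good:
        return "false positive (Type I)"   # a bad manuscript let through
    if not accepted and is_good:
        return "false negative (Type II)"  # a good manuscript turned away
    return "true negative"


# Walk through all four combinations of truth and decision.
for is_good in (True, False):
    for accepted in (True, False):
        print(f"good={is_good!s:5} accepted={accepted!s:5} -> {classify(is_good, accepted)}")
```

Two of the four boxes are errors, and they are different errors with different victims.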

Most examples you hear about come from published writers talking in interviews about how long it took to get their books published. Every big name has this story. Frank Herbert’s Dune, which is probably the greatest work of fantastic fiction of the 20th century, was rejected by publishers some 20-odd times.

Think about that. How easy would it have been to give up after the tenth rejection in a row? Or the fifteenth? Or the twentieth? This is why I have said many times that success in writing requires an almost messianic belief in yourself, and that other people’s opinion of your work is not diagnostic of anything.

(Some writing really does suck, but diagnosing that requires a two-pronged test. Rejection — even serial rejection — proves nothing.)

Herbert’s example, and all the others like it — Stephen King will tell a similar tale — raises the important question: if trad publishers are such good gatekeepers, how come they’re always getting it wrong?

In reality, the situation is actually much, much worse than that little thought experiment would indicate, and that’s because it only addressed one of the two ways to fail.

Estimations of publisher accuracy are hampered in this case by something called Survivorship Bias. The victims of the publishers’ Type I and Type II error are not equally represented in the media. You’ll hear from King and Herbert but you’ll never hear from the Type II folks (authors of rejected “good” manuscripts) because to the media such a person is just another schlop unpublished wannabe.

But that’s exactly what’s required for an accurate, non-subjective estimation of publishers’ worth as gatekeepers. Take this example from Wikipedia and replace “diagnosed bowel cancer” with “publishable manuscript.”

Publishers’ subjective sense of their rectitude is wholly confined to the ‘Condition Positive’ column — books they accepted and published — because that is the only column for which they have any real knowledge of outcomes.

So a publisher might say “Most of the manuscripts I designated ‘good’ in the submission process were published, and most of what I published ended up being ‘good’ per the estimation of the marketplace, 2/3 in fact. Only 33% did poorly. Therefore, I add value as a gatekeeper.”

But look at the numbers. That conclusion is literally based on only a tiny fraction of the truth because published books, which is the entire population from which this informal analysis is drawn, only account for a tiny fraction of all the manuscripts on offer, which is the total population of study! (In this example, 30 cases versus 2,000, but of course the real numbers are in the hundreds of thousands and millions, respectively.)

When publishers tout their value as gatekeepers, they are touting not only their sensitivity but also their specificity, yet they omit the latter entirely from their “proof.” It’s like doctors telling you how good they are at curing a disease but basing that only on the subset of people who got well.

Of course your success rate is good!
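To make that concrete, here is a rough sketch using the figures from the example (30 published out of 2,000 submitted, two thirds of the published books doing well). The number of "good" manuscripts among the rejections is exactly what nobody knows, so the values tried for `rejected_good` are pure assumptions, as are the function and variable names.

```python
def rates(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard rates computed from the four cells of a 2x2 table."""
    return {
        "precision": tp / (tp + fp),    # share of accepted books that did well
        "sensitivity": tp / (tp + fn),  # share of good manuscripts accepted
        "specificity": tn / (tn + fp),  # share of poor manuscripts rejected
    }


TOTAL = 2_000        # all manuscripts on offer (figure from the example)
ACCEPTED_GOOD = 20   # published and did well
ACCEPTED_POOR = 10   # published and did poorly
REJECTED = TOTAL - ACCEPTED_GOOD - ACCEPTED_POOR

# Hypothetical guesses at how many rejected manuscripts were actually good.
for rejected_good in (0, 20, 200):
    r = rates(ACCEPTED_GOOD, ACCEPTED_POOR, rejected_good, REJECTED - rejected_good)
    print(
        f"rejected good={rejected_good:>3}  "
        f"precision={r['precision']:.2f}  "
        f"sensitivity={r['sensitivity']:.2f}  "
        f"specificity={r['specificity']:.3f}"
    )
```

The success rate the publisher can see (what statisticians call precision, or positive predictive value) never moves, because it is computed only from the accepted pile. Sensitivity, which is the number that would actually show how many good manuscripts got through, swings with a quantity the publisher never observes.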

Because there are so many more unpublished manuscripts (a lot more than 2,000), a completely theoretical specificity of 85% (which is astronomically high for a human decision-maker and almost certainly much higher than their actual rate, whatever it is) still omits a huge number of people: 15% of all the manuscripts in the world.

That’s a whole lotta bad decisions being swept under the rug.

Note that, for simplicity’s sake, I am leaving aside any gold-standard base incidence of “good” manuscripts. Including it would adjust the true error rate down, but I’m trying to keep this short.
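If you want to see the scale of it, the arithmetic is short. This is only a back-of-the-envelope sketch: the one million manuscripts and the 90% share are numbers I made up for illustration, and `misjudged` is a hypothetical helper, not a standard statistic.

```python
def misjudged(total: int, accuracy: float, share: float = 1.0) -> int:
    """Manuscripts judged wrongly when `accuracy` applies to `share` of `total`."""
    return round(total * share * (1.0 - accuracy))


MANUSCRIPTS = 1_000_000  # assumed pool size, for illustration only

# Simplified version: the 15% miss rate applied to every manuscript.
print(misjudged(MANUSCRIPTS, 0.85))

# With a base rate: specificity strictly applies only to the manuscripts that
# are not "good". Assume, hypothetically, that 90% of the pool falls there.
print(misjudged(MANUSCRIPTS, 0.85, share=0.90))
```

Adding the base rate shrinks the count somewhat, which is the downward adjustment mentioned above, but it stays enormous either way.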

Now, in indie publishing, obviously there is a very low barrier to entry and there are pretty much no gatekeepers at all, save the doubt and fear of heading off into the wilderness alone, which means that, although many more “good” manuscripts are turned into books (almost all?), a great many more bad ones are as well.

We all know this. But it’s not a problem unique to indie publishing. There is always a trade-off between sensitivity and specificity, and tests that improve one will usually degrade the other, whether in security screening at the airport, spam filtering in your inbox, or the detection of ghosts and ESP!
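Here is a toy simulation of that trade-off. Everything in it is an assumption made for illustration: I pretend every manuscript gets a single quality score, that "good" ones score a bit higher on average than the rest, and that the gatekeeper accepts anything at or above a threshold. Real editorial decisions are obviously not this tidy.

```python
import random


def simulate(threshold: float, n: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Return (sensitivity, specificity) for a gatekeeper with this threshold."""
    rng = random.Random(seed)  # fixed seed: same manuscripts for every threshold
    good = [rng.gauss(1.0, 1.0) for _ in range(n)]  # scores of "good" manuscripts
    bad = [rng.gauss(0.0, 1.0) for _ in range(n)]   # scores of everything else
    sensitivity = sum(score >= threshold for score in good) / n  # good ones accepted
    specificity = sum(score < threshold for score in bad) / n    # bad ones rejected
    return sensitivity, specificity


# A low threshold is the indie end of the dial, a high one the trad end.
for t in (-1.0, 0.0, 0.5, 1.0, 2.0):
    sens, spec = simulate(t)
    print(f"threshold={t:+.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Raise the bar and fewer bad manuscripts get through, but more good ones are turned away. Lower it and the reverse happens. No setting of the dial fixes both at once unless the scoring itself gets better.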

Being genuinely, measurably good at something is damned hard. Welcome to life.

Now, you may look at the indie market and decide you prefer the trad system, which uses biased decision-making and insider connections to artificially reduce noise. I can’t argue the preference for the exclusivity of a walled garden. Some people prefer gated communities and private schools and quinoa over bacon.

But given the actual state of the world, what is that preference really but a wistful, nostalgic plea?

“I miss the simple days of yesteryear, before Netflix and tablet PCs. I miss Andy Griffith and print magazines and tube socks.”

Well, okay. But regardless of your personal preferences, those days are gone. As Gomer would say, “She ain’t comin’ back.”

More to the point, you cannot argue that the reason you prefer the good ol’ days is because “Publishers are good gatekeepers.” That’s bunk. They’re shoddy gatekeepers who’ve convinced you of their value by excluding most of what matters from their subjective estimation of their own accuracy. (And of course it’s no surprise that it’s in their economic interest to do so.)

The fact is, lots of people were getting screwed by the old system — and not just unlucky white dudes. Minorities, women, anyone who didn’t go to an Ivy League school, authors who wrote transgressive material, or made books with an unusual format, or who wrote in a tight niche (anthropomorphic dinosaur erotica, anyone?), or basically anything that didn’t easily fit the publisher’s legacy production assembly line have all been missing or under-represented.

If you’re one of the winners, particularly if you’re also ignorant of statistics, then it’s easy to convince yourself you got in because of merit. Here’s the deal: I’m not saying you didn’t earn it. I’m saying lots of other people did as well and were still left out. Therefore your merit was not the determining factor of your success. (I will skip the discussion of Confirmation Bias and the Ratcheting Effect, which can explain both your initial and continued success. You’re smart enough to look those up on your own.)

By the way, all the same arguments apply to music and film. As a whole, indie media is democratic. Telecomputing has cut a tunnel-to-market right under the castle walls. That will not change, and if trad producers are unwilling to change or adapt, I ask them, how will you handle the next disruption? Or the one after that?

Or did you think technology was done changing things?

The Doors of Timur by Vasily Vereshchagin


