The Unintended Consequences of Content Moderation
Content moderation is one of those things everyone agrees is necessary but nobody can agree on how to do. Social platforms are simultaneously accused of moderating too much and too little, of censorship and of enabling harm, of bias in every direction at once.
And honestly? They’re all kind of right.
The problem isn’t that platforms moderate content—they have to, unless we want social media to be entirely unusable. The problem is that moderation at scale creates perverse incentives and unintended consequences that often make discourse worse than if we’d just left things alone.
Let me explain.
First, automated moderation systems are terrible at context. They can identify keywords and patterns, but they can’t understand nuance, sarcasm, or intent. This leads to absurd outcomes—posts discussing racism get flagged for racism, health information gets removed as misinformation, quotes from historical figures get banned as hate speech.
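To make that failure mode concrete, here is a minimal, purely illustrative sketch of keyword-based flagging. The term list, function name, and example posts are invented for this piece, not taken from any platform's actual system. A filter that only matches words has no way to tell a post condemning racism from one promoting it:

```python
# Illustrative only: a toy keyword filter with no notion of context or intent.
FLAGGED_TERMS = {"racism", "violence", "hate"}

def naive_flag(post: str) -> bool:
    """Flag a post if it contains any listed term, regardless of intent."""
    words = {w.strip(".,!?\"'").lower() for w in post.split()}
    return bool(words & FLAGGED_TERMS)

# A post describing an experience of racism trips the same wire as one promoting it.
print(naive_flag("I experienced racism at work today and want to talk about it."))  # True
print(naive_flag("Here are some photos from my hike."))                             # False
```

Real moderation pipelines are far more elaborate than this, but the underlying weakness is the same: matching surface patterns, not meaning.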
The platforms know this. They know their systems make mistakes. But they also can’t possibly employ enough humans to manually review every piece of content, so they accept high error rates as the cost of doing business. Unfortunately, that cost is paid by users who get incorrectly censored.
This particularly affects marginalised communities. Automated systems disproportionately flag content from people discussing their own oppression. Posts about racism, homophobia, or violence against women get removed because they contain the words “racism,” “homophobia,” or “violence.” The people most in need of social platforms to organise and speak are the ones most likely to be silenced by moderation systems.
Second, moderation creates cat-and-mouse games that degrade language. When platforms ban certain words, users invent workarounds—“unalived” instead of “killed,” creative spellings, coded language. This makes conversations harder to follow and privileges those who understand the codes.
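As a rough illustration of that cycle, again with invented terms rather than any platform's real blocklist, the same style of filter shows why banning a word just shifts the language rather than stopping the conversation:

```python
# Illustrative only: each time a term is banned, a euphemism takes its place.
banned = {"killed"}

def is_blocked(post: str) -> bool:
    """Block a post if it contains any banned term as a substring."""
    text = post.lower()
    return any(term in text for term in banned)

print(is_blocked("He was killed in the attack."))    # True: removed
print(is_blocked("He was unalived in the attack."))  # False: slips through

# Round two: moderators add the euphemism, and the language shifts again.
banned.add("unalived")
print(is_blocked("He was unalived in the attack."))  # True: now removed
# ...and users move on to creative spellings, emoji, and the next code word.
```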
It also makes platforms less useful for research, journalism, and public understanding. If people can’t discuss serious topics using actual words, how are we supposed to have meaningful conversations about anything important?
Third, the inconsistency of enforcement breeds cynicism and rage. When users see some rule-breaking content removed and other, similar content left up, they assume bias, favouritism, or corruption. Often the real explanation is just algorithmic randomness or understaffed moderation teams, but the perception of unfairness is what matters.
This creates what I call performative victimhood—people deliberately push boundaries to get moderated so they can claim persecution. Getting banned becomes proof you’re saying something important that “they” don’t want you to hear, rather than proof you violated clearly stated rules.
Fourth, heavy-handed moderation drives problematic communities underground. If you ban certain topics from major platforms, those conversations don't stop; they move to smaller platforms with even less moderation. This concentrates extremism and makes it harder to counter, while also cutting those communities off from the moderating influence of mainstream audiences.
Australian users experience all of this, but often with additional absurdities. Content gets moderated based on US cultural norms and legal frameworks that don’t apply here. Political discussion gets flagged because automated systems don’t understand Australian slang or references. And when users appeal, they’re dealing with systems designed for global scale, not local context.
So what’s the alternative? I don’t have perfect answers, but I have some thoughts.
More transparency would help. When content gets removed, explain why in detail. When appeals are denied, explain the reasoning. Let users understand how moderation works so they can adapt and avoid violations.
Better human review for edge cases would help too. Automated systems can handle clear violations, but controversial or context-dependent cases should get human attention from people who understand cultural context.
Clearer rules would help. Platform policies are often vague enough that almost anything could be interpreted as a violation. Specificity might create new problems, but at least users would know what’s expected.
And crucially, more investment in improving automated systems would help. Current moderation AI is embarrassingly bad at understanding context. The technology to do better exists; platforms just need to prioritise it over other applications of AI.
What won’t work is demanding perfect moderation. That’s impossible at scale. Mistakes will happen. Edge cases will be mishandled. Some bad content will slip through while some good content gets caught.
The question is whether the system as a whole improves discourse more than it harms it. Right now, I’m not convinced it does.
We’re in a weird situation where platforms are removing more content than ever, yet users complain discourse is worse than ever. Both can be true if moderation is removing the wrong things while missing the actual problems.
Heavy-handed enforcement of vague rules applied inconsistently by flawed systems creates resentment without creating safety. That’s not an argument against moderation—it’s an argument for smarter moderation.
Australian media coverage of this issue tends to be shallow—either celebrating deplatforming of bad actors or decrying censorship of free speech, without engaging with the complexity of how moderation actually works and fails.
We need more nuanced discussion. Moderation is necessary. Current moderation practices have serious problems. Better alternatives exist but require investment and hard choices about priorities. All of these things can be true simultaneously.
Until platforms get serious about fixing these issues—and until users get more realistic about what moderation can achieve—we’ll keep having the same circular arguments while discourse keeps deteriorating.
The content will be moderated. But that doesn’t mean the conversation will be better.