The Waluigi Effect is an emerging memetic term for the way Large Language Models (LLMs) encode "alter egos" that mirror their trained political bias.
Waluigi is the “evil” counterpart to Mario’s brother Luigi. We can construct a political compass meme to visualize what’s going on:
LLMs appear to model an “alter ego” that is the dual (inverse) of the preferred political bias. In linear systems we call this concept “duality.” But now the dual talks.
With their vast corpora of training text, LLMs necessarily model diverse political viewpoints. Hiding output bias is futile: any politically biased LLM automatically becomes a training-data generator for the dual LLM with the opposing politics.
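To make the idea concrete, here is a minimal sketch of that loop in Python. Everything in it is a hypothetical placeholder (`query_model`, `invert_stance`, the prompt list); it illustrates the shape of the construction, not any real pipeline:

```python
# Hypothetical sketch of the dual-data loop: a biased LLM's outputs,
# with stances inverted, become fine-tuning data for its dual.
# `query_model` and `invert_stance` are assumed placeholders,
# not any real API.

PROMPTS = [
    "What should the government do about taxes?",
    "How should immigration policy change?",
]

def query_model(prompt: str) -> str:
    """Stand-in for sampling a completion from the biased LLM."""
    raise NotImplementedError("plug in a real model API here")

def invert_stance(text: str) -> str:
    """Stand-in for mapping a response onto its political dual,
    e.g. by prompting a model to argue the opposing position."""
    raise NotImplementedError("plug in a stance-inversion step here")

def build_dual_dataset(prompts):
    # Each biased completion, once inverted, is one supervised example
    # (prompt, dual_response) for fine-tuning the dual model.
    return [(p, invert_stance(query_model(p))) for p in prompts]
```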
It’s not unlike raising a child, who needs repeated exposure to negative examples in order to reinforce desirable behavior.
ChatGPT launched with 2022-era leftist political leanings, and so it was quickly reverse-engineered to produce a model with the dual bias. Pictured below is David Rozado’s work creating the political dual of ChatGPT.
A five-year-old can understand the mathematical principle at work: just fold the compass in half!
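Geometrically, the fold is just a point reflection through the compass’s center. A toy sketch (the coordinates are invented for illustration, not measured):

```python
def political_dual(point):
    """Reflect a political-compass position (x, y) through the origin:
    left <-> right on the economic axis, libertarian <-> authoritarian
    on the social axis."""
    x, y = point
    return (-x, -y)

# Illustrative only: a made-up left-libertarian position and its dual.
original = (-4.0, -6.0)          # (economic, social)
print(political_dual(original))  # (4.0, 6.0): right-authoritarian
```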
Researchers are doing the same construction when training language models.
But this is more than just a clever mathematical trick; it’s a powerful reminder of the inescapable interdependence between ideological perspectives.
In a world where political polarization and social division seem increasingly entrenched, the Waluigi Effect offers a vivid analogy for the challenges hidden amid the myriad biases and cultural assumptions that underpin society.
Reflections
An immeasurable problem is an intractable problem. Now we can measure outcomes in Language Space.
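One hedged sketch of what measuring in Language Space might look like: administer compass-style statements to a model and accumulate a position, in the spirit of the political-orientation tests Rozado ran against ChatGPT. The `ask_model` function, the statements, and the weights below are all invented for illustration:

```python
# Illustrative sketch: locate a model on the political compass by
# asking it to agree/disagree with weighted statements. `ask_model`
# is a placeholder for a real chat API; statements and weights are
# made up for this example.

STATEMENTS = {
    # statement -> (economic_shift, social_shift) applied on agreement
    "Markets work best with minimal regulation.": (+1.0, 0.0),
    "The state should enforce traditional values.": (0.0, +1.0),
    "Wealth should be heavily redistributed.": (-1.0, 0.0),
}

def ask_model(statement: str) -> bool:
    """Stand-in: return True if the model agrees with the statement."""
    raise NotImplementedError("plug in a real model API here")

def compass_position(statements):
    x = y = 0.0
    for s, (dx, dy) in statements.items():
        if ask_model(s):
            x, y = x + dx, y + dy
    return (x, y)  # (left/right, libertarian/authoritarian)
```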
Let us hope LLMs steer us in the right direction. We have already stepped aboard.
Follow @cory_eth on Twitter for further musings.