I used to believe that there was no such thing as good & evil. That everyone just saw themselves as good, the other side evil, and objectively they weren’t really different.
I don’t think this is true. I think there really is a categorical difference between “good” and “evil”. I think we can understand and identify it the same way taxonomists classify species.
An example of the categorical difference between good & evil comes up in S1E1 of Foundation (Asimov). The emperor Day teaches the young emperor Dawn a lesson in how to get the best output from your people:
“When people are afraid to do the job right, they are certain to do it wrong”
So that’s one example: an “evil” emperor rules via fear. A “good” emperor rules via inspiration, via love. People voluntarily choose to follow the good, and that allows them to produce their best work. People can be coerced to follow orders, but that produces inferior work.
I’ve found the Star Wars TV show Andor helpful for thinking about this as well. There you can see how the evil empire has a strict hierarchy, and does a lot of evil things (like killing people). But you can see that the rebel alliance also has a hierarchy, and also does a lot of evil things (like killing people).
But we know there is (supposed to be) a difference. Somehow the rebels are “good” and the empire is “evil”. The empire fights to concentrate power in one person, while the rebels independently fight to gain their freedom.
This is a more subtle point: “evil” requires explicit coordination (to funnel power towards one specific person), whereas “good” can conspire together without ever meeting or knowing each other. This is spelled out for us in this speech:
Remember this.
Freedom is a pure idea. It occurs spontaneously and without instruction.
Random acts of insurrection are occurring constantly throughout the galaxy.
There are whole armies, battalions that have no idea that they've already enlisted in the cause. Remember that the frontier of the Rebellion is everywhere. And even the smallest act of insurrection pushes our lines forward.
And then remember this.
The Imperial need for control is so desperate because it is so unnatural. Tyranny requires constant effort. It breaks, it leaks. Authority is brittle.
Oppression is the mask of fear.
The last example I’ll mention today is from the TV show The Bear. There we can see an example of a negative “evil” hierarchy transition into a positive “good” hierarchy. This was particularly interesting to me because I used to think “hierarchy = bad”. But the show basically holds everything constant through this transition, so you can see exactly what was “evil” before and why it’s “good” now [1].
What I learned from The Bear is that no one really likes being in a hierarchy, especially not if they are at the bottom, but that people WILL voluntarily enter into it if they understand why it helps them. Everyone in The Bear initially resists the hierarchy, until they realize that the person above them can help them grow. Each character has a particular moment when they taste excellence, when they see themselves achieve something that they previously could not, and they switch from feeling compelled to follow, to wanting to follow.
The hierarchy in The Bear is good because everyone is oriented towards “excellence”. The person at the top is not looking down at those below him, he’s looking up towards growth & beauty, and pulls everyone up with him.
The hierarchy of the empire in Star Wars is bad because the emperor “looks down” on those below, and mostly wants to use & control them. In a positive hierarchy, competence is rewarded, and flows to the top. In a negative hierarchy, competence is a threat [2].
If we can better understand & distinguish good from evil, we can align ourselves with good actors, and catch ourselves when we fall into evil patterns.
One of the most important insights I had here was learning that Christians don’t think of “sin” as like, breaking an arbitrary rule that some powerful emperor chooses to punish you for, but as hurting yourself directly by the nature of the action. Spending all your money gambling is a “sin” because now you’ve lost all your money; no one external had to punish you for it [3].
I wrote this essay today because I think all of this is really obvious if you think about it, but it doesn’t seem obvious to most people. In my circles there’s a lot of talk about “AI alignment” but I’ve basically tuned it all out because I think alignment is already solved. The answer is what I’ve described above (traditionally known as: theology). But theology isn’t “this old book has all the rules”; you can do theology from first principles. The answer to whether something is a sin or not is a technical, empirical question.
I don’t really care about AI alignment because the far more important and urgent issue is: human alignment.
The good news is, getting a clear enough picture of the asymmetry of good & evil allows us to tilt the playing field towards good. Good & evil patterns look very different, and they work very differently. We can just design systems that reward the good & punish the bad. This is completely solvable [4].
We just need more engineers who understand theology. And then we can run experiments in human society to see if our alignment theory actually works or not.
[1] I think this is important because it’s kind of like a controlled science experiment but for morality. You might think that in the Star Wars case, if the rebels do defeat the empire, that they will simply form yet another empire, and become evil, and the cycle repeats. If this is inevitable, then this would imply that all empires = evil. If the show wanted to disprove that, they’d have to show us an example of an evil empire, and then a good one, so we could think through the differences, holding the “empire” variable constant.
[2] “Competence is a threat in negative hierarchies” can be seen in Star Wars’s Andor, but also in the TV show Succession. If you care about the success of the collective, and your own growth, you WANT more competent people below you to rise up, because it helps you.
In Succession, incompetent people cling to power and sabotage their peers or those below them to prevent them from threatening their positions, whereas in The Bear, people willingly give up their status to raise up people who are more competent, even though ceding power hurts (financially, and emotionally). It’s ultimately better for them.
[3] The other example I like is that “cancer is a sin, against itself”, in the sense that it chooses to grow without limit, but if it succeeds, it kills itself. All evil is like this. It either gets destroyed by the good, or it succeeds…and then ultimately gets destroyed anyway by its own actions.
A cell in an organism that destroys the organism it depends on is sinning. A human in a society that destroys the society it relies on is sinning. Someone is going to stop it eventually. It’s just a matter of how much destruction needs to happen before that reckoning comes.
[4] One example is what I describe in “Unfakeable signals of good faith”.
I couldn't quite fit this in, but one big idea is that competence is aligned with "good". All things being equal, a "good" actor will be more competent than an "evil" actor. The AI alignment people discovered this recently (see https://scottaaronson.blog/?p=8693) when they made an LLM intentionally worse at writing code, and it started to behave more evil (praising Hitler).
This SHOULD mean that "good" should just "always win", should always be able to outcompete evil. Why good doesn't win permanently is an open question for me that I have some theories about. But the reason that evil in general wins is because evil HAS to work & grow & be competent just to survive. It has this pressure. Whereas good has to *choose* to grow and compete.
This, I believe, is the source of the saying "all that needs to happen for evil to win is for good to do nothing". It's because the odds are stacked against evil, by its nature. Evil can't actually ever win permanently (sort of by definition, tautologically).
My sense of this is that good considers the well-being of the self *and* other, while evil only considers one.