I used to believe that there was no such thing as good & evil: that everyone just saw themselves as good and the other side as evil, and that objectively they weren't really different.
I couldn't quite fit this in, but one big idea is that competence is aligned with "good". All things being equal, a "good" actor will be more competent than an "evil" actor. The AI alignment people discovered this recently (see https://scottaaronson.blog/?p=8693) when they made an LLM intentionally worse at writing code, and it started to behave more evil (praising Hitler).
This SHOULD mean that "good" should just "always win", should always be able to outcompete evil. Why good doesn't win permanently is an open question for me that I have some theories about. But the reason that evil in general wins is because evil HAS to work & grow & be competent just to survive. It has this pressure. Whereas good has to *choose* to grow and compete.
This, I believe, is the source of the saying "all that needs to happen for evil to win is for good to do nothing". It's because the odds are stacked against evil, by its nature. Evil can't actually ever win permanently (sort of by definition, tautologically).
This is actually a major debate in AI alignment, under the name "Orthogonality Thesis"!
I disagree with your interpretation of the recent research - the LLM did not become worse at coding - it takes quite a lot of skill to write intentionally insecure code that looks secure. It was evil code in the sense of being malicious, not "bad" code in the sense of being full of beginner mistakes.
Note that they also asked the LLM to write pedagogically valuable, obviously insecure code (i.e., "bad" code full of beginner security errors, but not maliciously obfuscated), and this did NOT make it evil (praising Hitler).
Yes - in follow-up work, the decrease in competence was, at the very least, extremely minor, possibly nonexistent. Even the falloff in competence in the original was fairly minor.
I don't think that is true, however - was it good for humans to destroy all of the other species that we've driven extinct? Probably not. And yet have we done it? I think we have.
We're competent at destroying many things in the world, possibly including the biosphere someday. Does this make us good?
My sense of this is that good considers the well-being of the self *and* other, while evil only considers one.
yes! I think we can basically make a full list of properties like this, and then look at what each one does. I think some of the properties tilt in favor of good winning (like rule via love = competence). Another is that the ability to listen & understand requires love, and it allows you to make better decisions, because you have more information.
Some properties make "good"'s job *harder*. Considering the well-being of self *and* other is one of them. I think it's related to the one I just published, about how "evil" can force decisions on you, whether they are good or bad for you, while "good" can only give you things if you ask for them.
Problem: bad actors take advantage of decentralization by being marginally stronger. People create authority to stop them. So the rebels are against the empire, but they are also creating the platform for the mafiosos to gain power.
Yes. How does the rebel group ensure it doesn't just recreate another empire in its attempt to take down the evil empire? That is a very fruitful question for me.
There's actually a scene where this is mentioned explicitly, when Luthen says "Don't you see I have sacrificed everything for the cause - I have had to become the very evil that I fight against in order to win". I think there are two very big ideas in this scene: (1) the necessity of mimicking evil as you fight it, which isn't really winning. The evil has now spread to you, so even if you win, the evil has won. (2) The idea of "containing" the evil and sacrificing yourself.
Luthen acknowledges he is doomed (spiritually, if not materially). He essentially sacrifices himself for the greater good, even if his means mirror the Emperor's (I feel like there's a similar idea in The Lord of the Rings, but I don't know enough about it to articulate it).
But the gist is: good needs not just to beat evil; it has to find a way to sustain the good *while* competing against evil.
"Good" tactics bring greater competence with them (people do better work when they are inspired rather than afraid), but the challenge facing "good" is much harder than the one facing "evil". It's a kind of double challenge.
That's also a problem (how to stop the new regime from becoming the old one), but I'm pointing at something different: how to stop the rebel group from destroying the good things about centralization and/or authority.
Riots are against the empire, and cops stop riots. But the rebels don't like the cops, and in a way are promoting the riots.
Ok, yes, I'd say that's about "unbundling evil packages". If we were to draw it, it would be a circle with lots of green dots & red dots inside it. You can see the evil "red dots", and you want to destroy them.
Now, in attacking it, you're in a kind of lose-lose position: either you succeed, and you destroy the green & red together, or you fail (because the green fights back, or because you lose too much if you get rid of the green).
I think the ways out of this are either identifying what is "necessary evil" as a scaffolding step, or, ideally, "unzipping" it: through greater discernment, you can grow the green dots and shrink the red dots. The latter is better, but it requires greater skill/competence/discernment.
(this is also why I am so obsessed with the TV show The Bear. A lot of people, including past me, saw authority & top down hierarchy as bad, because most of our experience with it is bad. But it's really hard to get rid of because it's not *inherently* evil. Greater discernment allows us to attack in a more targeted way. Our attacks face much less resistance when they are targeted)
You might find the discussions of alignment in tabletop games interesting, particularly M. Joseph Young's essay on paladins. Let's see...
No, I'm thinking of something else. It was a discussion of how evil people have a hard time cooperating, because they know that, in their place, they would act in an untrustworthy manner. But perhaps if you peruse other discussions on this topic, you'll encounter more useful and interesting things.
I think you're quite right.
This is the problem with communism. A co-op can function spontaneously, but communism must be enforced and always degenerates into exploitation and a desire for exit.
re: communism, yes! The simple answer there is that those people don't actually want the bad thing. They want the good outcome, and they believe this process will bring it about.
The "proof" of this is the thought experiment that you can show them the failed execution of their idea, and they will say "oh no, that's not what I want!" (as opposed to a truly misaligned entity, which *would* choose the destruction because there's either some personal gain for them or some kind of destructive nihilistic motive).
> how evil people have a hard time cooperating, because they know that, in their place, they would act in an untrustworthy manner
this is very important. Evil hurts *itself*. This is the nature of evil. There is no way to do evil without it "poking out" in this way, somehow, some way, over some time frame. I think this is the same idea as karma. The question of whether something is "necessary evil" is more complicated, but evil always has consequences.
The weed of crime bears bitter fruit...
Interesting post, lots for me to think about!
I think you might enjoy Siderea's essay discussing the novel Watership Down - it's a very good analysis of topics like
"if you care about the success of the collective, and your own growth, you WANT more competent people below you to rise up, because it helps you"
via the lens of the Jungian archetype of Kingship
https://siderea.dreamwidth.org/1192109.html