Evil Behavior Recorded

Forcing LLMs to be evil during training can make them nicer in the long run

New Anthropic research shows that undesirable LLM traits can be detected—and even prevented—by examining and manipulating the model’s inner workings. A new study from Anthropic suggests that traits ...