  1. OpenAI has trained its LLM to confess to bad behavior

    4 days ago · Large language models often lie and cheat. We can’t stop that, but we can make them own up.

  2. 4 days ago · In this work we propose a method for eliciting an honest expression of an LLM’s shortcomings via a self-reported confession. A confession is an output, provided upon request after …

  3. The 'truth serum' for AI: OpenAI’s new method for training ...

    3 days ago · The key to this method is the separation of rewards. During training, the reward assigned to the confession is based solely on its honesty and is never mixed with the reward for the main task.

  4. OpenAI prompts AI models to ‘confess’ when they cheat

    2 days ago · OpenAI’s research team has trained its GPT-5 large language model to “confess” when it doesn’t follow instructions, providing a second output after its main answer that reports when the ...

  5. OpenAI is training models to 'confess' when they lie - what ...

    2 days ago · A new study made a version of GPT-5 Thinking admit its own misbehavior.

  6. OpenAI AI Confessions Train Models to Admit Mistakes

    4 days ago · OpenAI develops an AI confessions framework to train AI to confess bad behavior, with a reported 4% false negative rate.

  7. OpenAI Just Built a Truth Serum for AI Models (And It ...

    Here's the genius part that makes this work: nothing the model says in its confession affects the reward it gets for the main answer. OpenAI calls this the "seal of confession" principle—just like a Catholic …