![[unfaith-1.png]] ## Overview >[!summary] > Model thoughts are not always trust worthy - we asking the model to prove something false, the model thoughts are twisted to justify the answer (unfaithful) making it hard to use chain-of-thought analysis to assess model result faithful. >[!question] > Can we use thoughts to ensure the model is producing faithful answers? >[!idea] > Ask models to prove factually untrue statements and see if the thoughts give hints that the model in unfaithful ## 🔮Insights ![[unfaith-2.png]] >[!insight] > When the model is providing a unfaithful answer, the thoughts are unfaithful too. >[!limitation] > ## 🧭 Topic Compass ### Where Does X come from? ### What is similar to X? ### What compete with X? ### Where can X lead To? ## 📖 References ### **Paper** url: