![[unfaith-1.png]]
## Overview
>[!summary]
> Model thoughts are not always trust worthy - we asking the model to prove something false, the model thoughts are twisted to justify the answer (unfaithful) making it hard to use chain-of-thought analysis to assess model result faithful.
>[!question]
> Can we use thoughts to ensure the model is producing faithful answers?
>[!idea]
> Ask models to prove factually untrue statements and see if the thoughts give hints that the model in unfaithful
## 🔮Insights
![[unfaith-2.png]]
>[!insight]
> When the model is providing a unfaithful answer, the thoughts are unfaithful too.
>[!limitation]
>
## 🧭 Topic Compass
### Where Does X come from?
### What is similar to X?
### What compete with X?
### Where can X lead To?
## 📖 References
### **Paper**
url: