You inevitably end up with teams ranked below a team they beat head-to-head. There are too many teams and too few games for the transitive property to hold.
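Here is a toy sketch of why that happens. The three teams and their results are made up for illustration: once results form a cycle (A beats B, B beats C, C beats A), every possible ranking has to put at least one winner below the team it beat.

```python
from itertools import permutations

# Hypothetical (winner, loser) results that form a cycle.
results = [("A", "B"), ("B", "C"), ("C", "A")]

def violations(ranking, results):
    """Count games where the winner is ranked below the loser."""
    position = {team: i for i, team in enumerate(ranking)}  # 0 = top rank
    return sum(1 for winner, loser in results if position[winner] > position[loser])

# Try every possible ordering of the three teams and keep the best case.
min_violations = min(violations(r, results) for r in permutations("ABC"))
print(min_violations)
```

With over a hundred teams playing a dozen or so games each, cycles like this are all over the schedule, so no ranking system can honor every head-to-head result.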
The fact that the ESPN FPI model seems generally reasonable, scores well on a variety of model scoring metrics, AND has some strange outlier cases makes me more confident that they aren't up to any funny business at ESPN. Some strange results in a statistical model are to be expected. Hopefully not too many and hopefully not too strange.
BYU's and CU's raw FPI scores both round to 11, so they are essentially tied, separated by less than a point. Top-ranked Ohio St is at 27, to give a sense of scale. Is BYU really as much better than Colorado as the bowl game showed? Sometimes a team just has an off night. Take Oregon, for example. At the end of the season Oregon was ranked a few spots below Ohio St in FPI despite the head-to-head win (and despite being undefeated vs 2 losses for Ohio St). Then Oregon got blown out in the playoff. A sample size of one game tells you very little about the relative strength of two teams, especially in a sport with as much randomness as football. That's why no one thinks Kansas is a better team than BYU despite our loss to them.
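To put a rough number on how little one game tells you, here is a back-of-the-envelope sketch. It rests on two assumptions of mine, not anything from ESPN: that a gap in FPI can be read as an expected point margin on a neutral field, and that single-game margins vary around that expectation with a standard deviation of about 14 points (a round number I picked for illustration).

```python
from math import erf, sqrt

def win_probability(rating_gap, game_sd=14.0):
    """Chance the higher-rated team wins a single game.

    Models the final margin as Normal(rating_gap, game_sd).
    game_sd=14.0 is an assumed value, not an ESPN parameter.
    """
    return 0.5 * (1.0 + erf(rating_gap / (game_sd * sqrt(2.0))))

# A gap of about one point, like BYU vs Colorado.
print(f"{win_probability(1.0):.3f}")
# A gap of 16 points, like Ohio St (27) vs a team rated 11.
print(f"{win_probability(16.0):.3f}")
```

Under those assumptions a one-point gap gives the higher-rated team roughly a 53% chance, barely better than a coin flip, and even a 16-point gap leaves real room for an upset.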