That's what calibration and the other methods measure. Your model's grade should take a severe hit every time you say a team has a 90% chance of winning and it loses. Yes, predicting college football games is hard. Most models will probably end up around 50/50 against the spread.
The methods for evaluating probabilistic models that predict binary outcomes are well known. The problem comes up all over the place (e.g., will it rain tomorrow?). I'd rather take the weatherman who is right only 48% of the time overall, but whose 90% chance of rain really means rain, than the weatherman who is right 52% of the time but whose 90% chance of rain really means a 40% chance.
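
To make the weatherman comparison concrete, here is a rough sketch in Python using two common scoring rules for probabilistic forecasts of binary outcomes, the Brier score and log loss, plus a simple calibration table. These are my choice of illustration and not necessarily the exact methods meant above. The function names and the forecast data are made up for the example: both forecasters say "90%" ten times, it rains 9 times for one and 4 times for the other.

```python
import math
from collections import defaultdict

def brier_score(probs, outcomes):
    """Mean squared gap between forecast probability and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; confident misses are punished very heavily."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)

def calibration_table(probs, outcomes, n_bins=10):
    """Group forecasts into equal-width bins and return
    (mean forecast, observed frequency, count) for each non-empty bin."""
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    rows = []
    for idx in sorted(bins):
        ps, ys = zip(*bins[idx])
        rows.append((sum(ps) / len(ps), sum(ys) / len(ys), len(ps)))
    return rows

# Hypothetical data: ten days on which each forecaster said "90% chance of rain".
forecasts = [0.9] * 10
rain_calibrated = [1] * 9 + [0] * 1     # it rained on 9 of the 10 days
rain_overconfident = [1] * 4 + [0] * 6  # it rained on only 4 of the 10 days

for name, rain in [("calibrated", rain_calibrated),
                   ("overconfident", rain_overconfident)]:
    print(name,
          "brier:", round(brier_score(forecasts, rain), 3),
          "log loss:", round(log_loss(forecasts, rain), 3),
          "table:", calibration_table(forecasts, rain))
```

On this toy data the calibrated forecaster gets a Brier score of about 0.09 and a log loss of about 0.33, while the overconfident one gets about 0.49 and 1.42. The calibration table shows the same thing directly: a mean forecast of 0.9 against an observed frequency of 0.9 in one case and 0.4 in the other.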