It makes things up a decent percentage of the time. When it builds its tokenizer, I think it gets really jumbled about who the subject is and who is doing the describing in online conversations, which is where most of the training data comes from. And it's not just the models that struggle: the data preparation itself is done by thousands of people, often in low-cost countries, and errors made there carry through into the model.
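A toy sketch of why attribution can get lost (this is not any real model's tokenizer, just a naive regex split for illustration): once a conversation is tokenized, speaker names become ordinary tokens in one flat stream, with nothing structurally tying a claim back to who made it.

```python
import re

def toy_tokenize(text):
    # Naive split on words and punctuation; real tokenizers learn
    # subword units, but the flattening effect is the same.
    return re.findall(r"\w+|[^\w\s]", text.lower())

convo = 'Alice: "The bug is in the parser." Bob: "No, the parser is fine."'
tokens = toy_tokenize(convo)

# "alice" and "bob" are just tokens like any other; the model has to
# infer from context alone which speaker each claim belongs to.
print(tokens)
```

If the training text mislabels or drops those speaker cues during data preparation, the model never had a clean signal to learn attribution from in the first place.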