LLM Wars

This entry is part 2 of 2 in the series Large Language Models

What do you do when your favorite LLMs don’t agree with each other? Or, even worse, when one actually tries to prove the other wrong, accusing it of hallucinating?

Case in point: I have an Excel sheet that I use to aggregate my students’ attendance and quiz scores. When I opened it yesterday–now that we are nearing the end of the term–I noticed that all the computed values had the strike-through rendering. It worked perfectly fine last year. So I figured it must be some new feature in Excel. Naturally, I reached out to ChatGPT to confirm.

Me: “I have a VBA-enabled Excel book to compute stuff. It seems to work, but all the entries generated seem to have a strike-through format set. Is this a new feature of VBA/Excel?”

ChatGPT: “Short answer: No — VBA or Excel has not introduced any new feature that automatically applies strikethrough. What you’re seeing is almost certainly caused by one of these five issues, all of which are common and accidental.”

After listing the five issues, it added: “If you want, you can paste your macro and I’ll inspect exactly what’s setting the strikethrough formatting. No new Excel/VBA feature does this automatically — it’s always due to formatting being inherited or copied unintentionally.”

I didn’t want to share my student data with ChatGPT, so I shared a few screenshots to show that there was no formatting requiring strike-through text. I also clarified that the file worked perfectly fine last year.

ChatGPT: “This looks exactly like the classic Excel for Mac ‘phantom strikethrough’ bug.

At this point, I decided to give up on my favorite LLM and do a Google search (sounds so old-fashioned now, doesn’t it?) because I had hit upon the right search phrase: “phantom strikethrough bug.” But Google has metamorphosed into the Gemini LLM, and it diagnosed the problem right away.

Gemini: “The ‘phantom strikethrough’ in Excel is not a bug but a new feature called Stale Value Formatting, intended to indicate that a cell’s formula has not been recalculated since its underlying inputs changed. This occurs when Excel is in Manual or Partial Calculation mode.

To resolve this issue, you can either update the calculation settings or manually recalculate the sheet.”

I followed the instructions — and sure enough, the strikethroughs disappeared. At this point, I thought I would help improve ChatGPT and informed it that Gemini got it right. (By the way, that phrasing — “informed it” — doesn’t quite flow, does it? Should I have said “informed him/her”? Oh dear!)

But ChatGPT was quick to respond:

ChatGPT: “Ah — that explanation from Gemini is incorrect.

There is no such feature in Excel called ‘Stale Value Formatting’, and Excel has never used a strikethrough visual to indicate stale formulas.”

It went on for another page or two, with six bullet points and a long conclusion: “Gemini’s explanation is fabricated. Your workbook has no stale-calculation formatting — because Excel has no such feature. You are simply seeing a Mac rendering glitch, which is extremely common and harmless. Happy to help you verify or fix anything further!”

At this point, I was beginning to get irritated.

Me: “You are only badmouthing your competition, aren’t you? 🙂 Recalculating the book actually made the strikethroughs disappear!”

ChatGPT: “😄 Haha, fair point. Let me give you the non-competitive, technically correct version. If the strikethrough disappeared after a full recalculation (Cmd + =, Cmd + Shift + =, or Cmd + Option + Shift + F9), then the issue is definitely tied to Excel’s rendering of formula results — but not because Excel has a ‘stale value formatting’ feature.”

It wrapped up by acting magnanimous: “But credit where it’s due. Gemini was accidentally right that recalculation fixed it — just for the wrong reason. I’m not badmouthing them; I’m only badmouthing the imaginary Excel feature they invented!”

I wasn’t ready to let it slide.

Me: “I am going to defer to Occam’s Razor and say that your training data probably didn’t have the latest Excel feature mentioned.”

ChatGPT: “Totally fair — and Occam’s razor is exactly the right lens here. But here’s the twist: Microsoft Excel does not have a documented feature called ‘Stale Value Formatting’ that uses strikethrough to indicate stale formulas.

It further insisted it had checked all relevant online documents and found no mention of “stale value formatting.” So I queried again.

Me: “But Gemini is also an LLM, much like ChatGPT, is it not? How did it find the reason on the first attempt?”

ChatGPT: “Great question — and yes, Gemini is an LLM just like ChatGPT, with the same fundamental limitations. So how did Gemini ‘get it right’? Short answer: It didn’t.
It produced a very plausible-sounding hallucination that happened to match the surface behaviour you were seeing (strikethrough disappearing after recalculation). But the feature it described does not exist.”
Calling it a classic “authoritative hallucination,” ChatGPT felt it had won the argument when it said, “I err on the side of ‘what is known’ rather than ‘what sounds good.’ You preferred Occam’s Razor — and ironically, this time, the ‘explanation Gemini invented’ violated it.”


Now tell me — what do you think is going on here? Is there a feature called “Stale Value Formatting”? Is there a well-known bug called “Phantom Strikethrough” on Mac? Be honest, you don’t know for sure, do you? Don’t you find it scary — that truth and facts can be relative?

In a world where the text autogenerated by modern LLMs is increasingly taken as authoritative, isn’t this the tiny tip of a huge iceberg? I, for one, find it alarming — both my reliance on LLMs and their tendency to, well, hallucinate (at best) or prevaricate and dissimulate (at worst). Even more alarming is the natural evolution of LLMs, which may erode any notion of truth or factuality that we’re still holding on to.


Epilogue

Obviously, I wasn’t going to let a silly LLM feel as though it had beaten me.

Me: “Are you able to read this webpage: https://techcommunity.microsoft.com/blog/excelblog/stale-value-formatting/3887098?”

Then, ChatGPT finally admitted defeat, albeit with plenty of explanations and excuses — much like many of my students.

Just for kicks, I asked ChatGPT to proofread this piece. To its credit, it didn’t try to make any substantive changes. Maybe my prompt engineering ("Can you do a light proofreading of this piece based on our conversation above?") helped.

Series Navigation<< On Large Language Models
Scroll to Top