My lab is a walking bundle of ironies
March 14, 2026 · IELTS speaking: describe your workplace
My lab is basically a walking bundle of ironies. We’re tucked away in the Department of Formal Methods, but is anyone actually doing “formal” work? Fat chance. Everyone is just obsessed with the ChatGPT hype and the whole LLM circus.
My project is to use LLMs to solve Detective Winston cases: figure out who the killer is and see which company's language model has the most powerful brain.
But just guessing the killer doesn’t cut it. The reasoning has to be correct too. My advisor had a bright idea: why not use a chatbot to analyze the reasoning process?
And I’m like, “Wait… isn’t that like making the player a referee?”
My advisor went, “Not really. If the reasoning is from ChatGPT, we let Gemini judge it. If the reasoning is from Gemini, we let Claude judge it.”
Well, guess what? Each LLM gives low scores to any reasoning that doesn't look like something it would have generated itself…
The irony doesn’t stop at the grading; the dataset is a mess too. Every case in the dataset has some key prop in the story. I’ll just call it the “triangle.”
My primary advisor was set on using triangles with equal side lengths.
My co-advisor got ChatGPT to build a triangle generator and then blended the triangles into the storylines. Both advisors were like, "Great! Hurry up and run experiments on this dataset!"
After two weeks of preprocessing (translating the cases into Japanese, matching them with Detective Conan and Sherlock Holmes stories, and so on) and another two weeks of running experiments, I fancied some visualizations. But the triangles looked all crooked and weird. I assumed my code had a bug, but when I checked again I realized that the triangles my co-advisor had generated were "isosceles."
But I had assumed the triangles in our case collection were “equilateral”.
Neither my co-advisor nor I am familiar with triangles. My main advisor is "moderately" familiar with geometry, but he didn't use very technical terms when he described them. So at the time, none of us realized that triangles could be classified.
Anyway… we've already sunk time and a small fortune (not mine) into isosceles triangles, and we've even started writing the evaluation section, because the paper deadline is two months away.
Still, look on the bright side: I'm lucky that my co-advisor is a doer. He contributed code and data.
Analysis:
The whole script is dripping with irony, and the final line is the finishing touch.