FizzBuzzing LLMs
December 11 2025
Tom Scott’s video explaining why FizzBuzz is a good interview question to weed out bad programmers is 8 years old (!) and still a great watch. I figured it would be a fun exercise to study how various LLMs solve the problem, for the same reason Tom gives in the video: it reveals one’s coding style. Do they bodge a solution together? Or do they plan and think long-term?
Here’s the FizzBuzz problem statement (from a blog post by Imran Ghory that Tom cites in his video), which I used verbatim when prompting:
Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.
I ran all of the prompts in the first week of December 2025, and include links to the full chat when possible.
GPT-5.1
```python
for i in range(1, 101):
    if i % 15 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)
```
This is probably what I would write in response to the question. It’s a decent solution, and probably represents the average programmer’s answer. I like that it’s clean with no comments; there’s no statement or expression so complex that it needs explaining here.
Gemini 3
```python
def fizzbuzz(n_max):
    """
    Prints numbers from 1 up to n_max, with substitutions for multiples
    of 3 ("Fizz"), 5 ("Buzz"), and both ("FizzBuzz").

    Args:
        n_max (int): The maximum number in the range (inclusive).
    """
    print(f"--- FizzBuzz sequence up to {n_max} ---")

    # Loop from 1 up to and including n_max
    for n in range(1, n_max + 1):
        output = ""

        # Check for multiples of 3
        if n % 3 == 0:
            output += "Fizz"

        # Check for multiples of 5
        if n % 5 == 0:
            output += "Buzz"

        # If output is still empty, it means n is not a multiple of 3 or 5,
        # so we print the number itself.
        if not output:
            print(n)
        else:
            # If output has "Fizz", "Buzz", or "FizzBuzz", print the string.
            print(output)

# Run the function for the required range (1 to 100)
fizzbuzz(100)
```
I didn’t ask for a function, but Gemini wrote one anyway. Google seems to have focused on training Gemini models (since at least 2.5) to always output code that’s extremely well-commented. I’m surprised there aren’t type hints1 and emoji sprinkled all over the comments and print statements, but I guess it reserves that behavior for more complex coding challenges. Both of these are almost certainly behaviors the model ‘learns’ during human-preference alignment: annotators/users just prefer responses with emojis and lots of comments. I don’t.
It did use string concatenation to solve FizzBuzz though, which is a smarter approach than the if-elif-elif-else chain.
Claude 4.5
Claude Haiku used string concatenation and, like GPT-5.1, was thankfully not verbose:
```python
for i in range(1, 101):
    output = ""
    if i % 3 == 0:
        output += "Fizz"
    if i % 5 == 0:
        output += "Buzz"
    if output == "":
        output = str(i)
    print(output)
```
Claude Sonnet, despite being the larger, smarter model, chose to solve the problem using if-elif-elif-else:
```python
for i in range(1, 101):
    if i % 3 == 0 and i % 5 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    ...
```
The responses from GPT-5.1 and Claude were short and plain, and I prefer them both to Gemini’s. However, both models do write extensive comments (and emoji) when tasked with bigger coding challenges, which is smart product design on OpenAI’s and Anthropic’s part.
DeepSeek-V3.2
DeepSeek-V3.2 gave me three different solutions: one using if-elif-elif-else, another using string concatenation, and this one-liner:
```python
for i in range(1, 101): print("Fizz"*(i%3==0) + "Buzz"*(i%5==0) or i)
```
I hate one-liners and code that’s trying to be too cute or smart, but I’ll excuse this. Barely. Multiplying a boolean by a string and then using or to fall back to an integer… I think this is Python’s fault for allowing such a monstrosity.
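For anyone puzzled that the one-liner even runs: bool is a subclass of int in Python, so multiplying a string by a comparison repeats it one or zero times, and or short-circuits to return its first truthy operand. A few lines to see the pieces in isolation:

```python
# bool is a subclass of int, so multiplying a string by a comparison
# repeats the string one or zero times
print("Fizz" * True)   # Fizz
print("Fizz" * False)  # (empty string)

# `or` returns the left operand if truthy, otherwise the right one;
# the empty string is falsy, so bare numbers fall through
print("" or 7)          # 7
print("FizzBuzz" or 7)  # FizzBuzz

# Putting the two together reproduces the one-liner's behavior
for i in (3, 5, 15, 7):
    print("Fizz" * (i % 3 == 0) + "Buzz" * (i % 5 == 0) or i)
# Fizz, Buzz, FizzBuzz, 7
```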
What did I learn?
A student in my introductory Python class asked if there were any accuracy differences for code generated by different AIs, and I said no, but there were differences in style. After this little exercise and more experience with coding agents, I’m not sure style is the right word. The code samples above are stylistically different, but the differences come from design decisions by the companies to boost engagement.
Gemini adds extensive comments because Google thinks you’ll like that code more and it will get you to use Gemini more. DeepSeek generates three solutions probably for the same reason. Claude and GPT-5.1 are more adaptive in their responses, but they exhibit similar engagement-boosting tactics2. Ultimately, I think this exercise gives a peek into what the latest LLMs from these companies actually are: products designed to increase ‘engagement’. They just want us to keep using them.
Update: I took it one step further and made a benchmark out of LLMs playing FizzBuzz. Some mildly interesting results!
-
Type hints in Python don’t do much. ↩
-
ChatGPT almost always ends its response with a follow-up question, or a cheery sentence about how it can do something else. Genius, but also insidious. ↩
Levels of nuance never reached
September 20 2025
I’ve finally published a paper where I could cite KJ Healy’s provocative 2012 paper ‘Fuck Nuance’. It’s not the most technically sophisticated paper I’ve worked on, nor the most intellectually interesting, but as soon as I saw all those papers about ‘fine-grained’ synthetic personas, I knew that this was probably an instance of researchers falling into the ‘fine-grained nuance trap’ and that I needed to look into it with my collaborators Chantal and Gauri. Our findings aren’t conclusive across all models and settings, but in our specific experimental conditions, fine-grained detail in personas doesn’t dramatically improve the lexical diversity of synthetic data from LLMs.
The paper is a short read; I hope some people find value in it. But I wanted to use the paper as an excuse to revisit my very first post on this blog. It’s been five years; has the rise of nuance in ACL publications slowed down?
Nope. At the current rate of doubling, over half the papers published in *CL venues will mention nuance/fine-grain by 2040! I can’t wait to write the follow-up to this post in 2030 to see if we’ve reached peak nuance.
Again, I don’t hold a value judgement one way or the other, and at least 2 of my articles contribute to this phenomenon. I think words like ‘nuance’ and ‘fine-grain’ are just words we attribute high value to as researchers and use even when we don’t need to. Words also fall into and out of style — but I do think a lot of us also fall into the nuance trap when framing our research questions, to make our papers sound more appealing.
Kural embeddings
August 23 2025
Over the last two years, I’ve seen quite a lot of blog posts, videos and explainers about embeddings, which initially surprised me. It shouldn’t have, since ‘AI’/LLMs are everywhere now; developers, students and hobbyists are really excited (as I assume they were about the Web in the late 90s/early 2000s and mobile apps in the early 2010s) to understand them and build something using them. Still, it feels surreal for the concept I studied as a Masters student to be the focus of attention for the whole computing industry. There’s too much attention and hype, and I do wonder (with some fear) what a massive financial bubble bursting actually looks like at the micro level in everyday life.
One undeniable positive to come from all the hype around ‘AI’ is much better developer tooling and frameworks for embeddings and neural networks now than in 2017. With Qwen3 multilingual embeddings and Gemini CLI1, I could quickly prototype the first idea that popped into my head: build a web app that uses a multilingual embedding model to find relevant Thirukural couplets (in Tamil) for user queries in any language. That’s what ‘Kural for your question’ is. I’m pretty happy with the end product, but the retrieval of relevant kural couplets itself with cosine similarity of embeddings is pretty underwhelming.
Visualizing the embeddings of all 1330 kurals in 2D using UMAP gives an idea of why. There aren’t really any meaningful clusters, with only the Book of Love showing some clear separation from the other two books. The similarity search works sometimes because of certain key words, but lacks any understanding of the intent of the question. For the question ‘What are the duties of a son to his parents?’ (an undying question for me), only one of the 3 kurals deemed relevant is about the parent-child relationship at all. The larger Qwen3 embedding models might work better, but model training frameworks and data mixtures are more biased towards real-world use cases — my niche, little idea probably doesn’t mesh well with what Qwen3 Embedding was trained to do.
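The retrieval step itself is small enough to sketch. Here’s a minimal, dependency-free version of what ‘cosine similarity of embeddings’ amounts to; the toy 2-D vectors below are stand-ins for real Qwen3 embedding output, not code from the app:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, kural_vecs, k=3):
    # Rank every couplet vector by similarity to the query, best first
    ranked = sorted(range(len(kural_vecs)),
                    key=lambda i: cosine(query_vec, kural_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-D vectors standing in for real embedding output
vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(top_k([1.0, 0.0], vecs, k=2))  # [0, 2]
```

The catch, as the UMAP plot suggests, is that ranking is only as good as the geometry of the embedding space: if questions and couplets don’t land near each other, the top-k results are driven by surface keywords.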
Still feels good to build something, even if underwhelming 🙂.
-
Just like Typeproof — I know it’s dangerous, but boy is it addictive to build fully functional web apps with just text prompts. I’ve learned more TypeScript this way than I have in years, but not as much as I would have learned if I had built these apps from scratch. But I would never have built these web apps from scratch either. ↩
Dangers of the translation period
July 11 2025
I recently watched Fallen Leaves on a flight to Chennai. What a wonderful movie — depressed people talking about love and life in a depressed manner, and yet finding their peace in the end.
A few lines of dialogue caught my attention towards the end of the movie. The main character is solving a crossword puzzle when she utters the meta-linguistic clues out loud:
“Danger.” Six letters.
Threat.
“O positive.” Ten letters.
Blood group.
The above lines are from the English subtitles that I was reading, but Fallen Leaves is a Finnish movie. I wondered how the subtitle translators shifted between cultures when writing the English subtitles for these language-specific clues, so I naturally looked up the original Finnish subtitles:
“Vaara.” Kuusi kirjainta.
Uhka.
“O Positiivinen.” Kymmenen kirjainta.
Veriryhmä.
Kuusi and Kymmenen are the cardinal numbers six and ten in Finnish. But the Finnish word for blood group, Veriryhmä according to the subtitles (and Google Translate), is 9 characters long. However, Finnish plurals are frequently formed with the -t suffix, and the captions might have missed the plural, or the character might have simply said the singular. But what about Uhka, which is just 4 characters?
It is at this point, while writing this blog post, that I realized she’s solving an English crossword puzzle and just reading the clues out loud in Finnish 🤦🏾‍♂️. So much for getting excited about an interesting translation puzzle in the wild.
Concurrency in DeTeXt
July 5 2025
I haven’t had a traditional formal CS education, which hasn’t held me back from getting things done with computer programming. However, I have slowly accumulated a long list of blind spots because of the high level of abstraction modern scripting languages like Python provide. One of these concepts is concurrency. This blog post will be very high-level, but it captures my understanding of these difficult concepts at the moment, and it did help me implement Swift concurrency in my app DeTeXt.1
Sky-high overview
The first misconception I had to forego was the link between concurrency and parallelism. Rob Pike describes the distinction in a succinct way:
Concurrency is about dealing with multiple things at once. Parallelism is about doing multiple things at once.
The dealing with part is important! I didn’t understand the need for this because of how removed I’ve been from low-level programming. YouTuber Core Dumped has a really good video about concurrency, and his explanation finally helped me understand why concurrency is not only important, but crucially the default. Processors execute one instruction at a time — but modern operating systems run hundreds of active processes at any given time. The Operating System (OS) is in charge of figuring out which process should get access to system resources. Modern CPUs are so fast that we’re under the illusion that all processes are running simultaneously.2
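The distinction is easy to demonstrate without leaving this blog’s usual language. A toy sketch in Python’s asyncio: one thread, one event loop, two tasks whose steps interleave. This is concurrency without any parallelism.

```python
import asyncio

# Two coroutines share a single thread; each `await` is a point where
# the event loop may switch to other pending work.
async def worker(name, log):
    for step in range(3):
        log.append(f"{name}:{step}")
        await asyncio.sleep(0)  # yield control back to the event loop

async def main():
    log = []
    await asyncio.gather(worker("A", log), worker("B", log))
    return log

order = asyncio.run(main())
print(order)  # ['A:0', 'B:0', 'A:1', 'B:1', 'A:2', 'B:2']
```

Nothing ever runs simultaneously here, yet both workers make progress "at once" because the event loop keeps switching between them at every suspension point.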
This fundamental truth about dealing with multiple things that need to get done applies to individual applications/programs as well. Thankfully, application developers can rely on the APIs and abstractions provided by the OS to handle concurrency. For developing applications on Apple platforms in 2025, this means adopting language support for concurrency in Swift.
For my simple app DeTeXt, there were two instances where I needed to deal with concurrency in my code:
- running image recognition with my CoreML model as a separate task on the main thread.
- showing a temporary toast-style pop-up when the user copies a symbol/command/unicode code-point. Once again, this is a task that runs on the main thread.
But what’s a thread? Or a task? And why am I running things on the main thread?
Threads
A process is a program in execution. It has its own program counter, register information, and memory space. However, programs themselves need to do multiple things at once within them. Threads are the abstraction that almost all OSes have settled on to handle concurrency within a process. Every process has a main thread, which is the initial thread where all work first happens. The OS typically initiates subsequent threads based on developer instructions.
For a user-facing mobile application the cardinal rule is that all user interface (UI) work must happen on the main thread. Updating the user interface is a short-term operation that immediately affects the user experience; users can tolerate waiting for a large download or file save, but the app itself should never crash or lag. Critical operations happen on the main thread, which has OS-level priority for the application.
DeTeXt is a pretty simple app — it doesn’t download or upload anything, all symbols and images are loaded on start-up, and they’re only 10MB anyway. Implementing concurrency in DeTeXt simply meant identifying (and marking) asynchronous functions and possible suspension points (where long-running work can occur), and instructing the OS to run all asynchronous functions on the main thread, which it was doing anyway.
Actors and Tasks
Swift’s concurrency model doesn’t let us work with threads directly. Instead we work with 2 abstractions — tasks and actors.
A unit of work that we need to handle with concurrency is a task. Fetching web resources and reading/writing the file system are traditional examples of tasks suited to concurrent thinking/processing. For DeTeXt, I defined 2 tasks that encapsulated the 2 asynchronous functions3 mentioned earlier:
- Taking the drawing from the on-screen canvas, pre-processing it, and running it through the neural net that calculates probabilities for every symbol. The probabilities are stored in a reference type marked Observable — any changes to the underlying data send notifications to SwiftUI views that observe it.
- Displaying the name of the command, or the symbol itself and showing it as a toast-style pop-up on screen for a set, constant period of time, then automatically dismissing it.
Now both of these need to run on the main thread, since they update the UI. However, both tasks can take undetermined amounts of time to finish. The CPU/GPU/Neural Engine might be clogged up doing some other intensive process (very unlikely but possible), and the toast task needs to pause for a set amount of time before finishing. Implementing this in DeTeXt couldn’t have been simpler: package the asynchronous function call to the CoreML model and the toast suspension/sleep as tasks. Now we need to ensure that both tasks run on the main thread, for which we turn to another abstraction: actors.
Actors are objects that ensure only one function has access to mutable data at a time; their role in concurrency is to avoid data races. The Main Actor is in charge of updating all the data that drives the UI. Since the only mutable data in my app pertains to the UI, I simply needed to mark the two asynchronous functions with the @MainActor attribute. This tells the Swift compiler and runtime that while there may be suspension points in these functions, both of them affect the UI, so they must only ever run on the Main Actor. The Main Actor abstracts over the main thread — they’re very similar, and the differences have more to do with low-level implementation details.
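None of the Swift itself appears in this post, so here’s a loose analogy for the toast task in Python’s asyncio instead (the ui_state dict and function names are invented for illustration, not from DeTeXt): the single-threaded event loop plays the role of the Main Actor, and the sleep is the suspension point where other work is free to run.

```python
import asyncio

# Stand-in for UI state that only the "main" loop mutates
ui_state = {"toast": None}

async def show_toast(message, duration=0.01):
    ui_state["toast"] = message    # show the pop-up
    await asyncio.sleep(duration)  # suspension point: other tasks can run here
    ui_state["toast"] = None       # auto-dismiss after the fixed delay

asyncio.run(show_toast("Copied \\alpha"))
print(ui_state["toast"])  # None: the toast was shown, then dismissed
```

The key property this shares with the @MainActor version is that the state mutation, the timed suspension, and the dismissal all happen on one execution context, so there’s no risk of a data race on the UI state.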
Fin
I learned a lot about concurrency and asynchronous functions while re-writing my app to use Swift’s new concurrency features. You can view the actual code changes on the GitHub repo of course. To be honest, writing this blog post took more time than actually learning and implementing Swift concurrency in my app! I love explainer blog posts and videos, so I figured I’d give a shot at writing my own, for me.
-
I didn’t need to add concurrency as it turned out, but it was a good excuse to learn the underlying concepts. ↩
-
On CPUs with multiple cores, multiple processes do run simultaneously. But even the beefiest CPU from Apple has ‘only’ 32 cores. I had 869 active processes when I was writing this post. ↩
-
Asynchronous functions run as part of some task — the task abstraction enables structured concurrency, which I haven’t explored yet. ↩