FizzBuzzing LLMs

December 11 2025

Tom Scott’s video explaining why FizzBuzz is a good interview question to weed out bad programmers is 8 years old (!) and still a great watch. I figured it would be a fun exercise to study how various LLMs solve the problem for the same reason Tom says in the video: it reveals one’s coding style. Do they bodge a solution together? Or do they plan and think long-term?

Here’s the FizzBuzz problem statement (from a blogpost by Imran Ghory that Tom cites in his video), that I used verbatim when prompting:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

I ran all of the prompts in the first week of December 2025, and include links to the full chat when possible.

GPT-5.1

for i in range(1, 101):
    if i % 15 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)

This is probably what I would write in response to the question. It’s a decent solution, and likely represents the average programmer’s answer to the question. I like that it’s clean with no comments; there’s no statement or expression so complex that it needs explaining here.

gpt-5.1 chat

Gemini 3

def fizzbuzz(n_max):
    """
    Prints numbers from 1 up to n_max, with substitutions for multiples
    of 3 ("Fizz"), 5 ("Buzz"), and both ("FizzBuzz").

    Args:
        n_max (int): The maximum number in the range (inclusive).
    """
    print(f"--- FizzBuzz sequence up to {n_max} ---")
    
    # Loop from 1 up to and including n_max
    for n in range(1, n_max + 1):
        output = ""
        
        # Check for multiples of 3
        if n % 3 == 0:
            output += "Fizz"
            
        # Check for multiples of 5
        if n % 5 == 0:
            output += "Buzz"
            
        # If output is still empty, it means n is not a multiple of 3 or 5,
        # so we print the number itself.
        if not output:
            print(n)
        else:
            # If output has "Fizz", "Buzz", or "FizzBuzz", print the string.
            print(output)

# Run the function for the required range (1 to 100)
fizzbuzz(100)

I didn’t ask for a function, but Gemini wrote one anyway. Google seems to have focused on training Gemini models (since at least 2.5) to always output code that’s extremely well-commented. I’m surprised there aren’t type hints1 and emoji sprinkled all over the comments and print statements, but I guess it reserves that behavior for more complex coding challenges. Both of these are almost certainly behaviors the model ‘learns’ during human-preference alignment: annotators/users just prefer responses with emojis and lots of comments. I don’t.

It did use string concatenation to solve FizzBuzz though, which is a smarter way to solve the problem than if-elif-elif-else.
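To illustrate why concatenation is the smarter shape (this is my own sketch, not from any of the chats): new rules compose for free, whereas the if-elif version needs a new branch for every combination of divisors. The "Bazz" rule for 7 below is a hypothetical extension just to make the point.

```python
# Each rule is a (divisor, word) pair; words for all matching divisors
# are concatenated, so combinations like "FizzBuzzBazz" emerge for free.
rules = [(3, "Fizz"), (5, "Buzz"), (7, "Bazz")]  # "Bazz" is hypothetical

def fizzbuzz_line(i):
    output = "".join(word for divisor, word in rules if i % divisor == 0)
    return output or str(i)  # fall back to the number itself

for i in range(1, 106):
    print(fizzbuzz_line(i))
```

With if-elif-elif-else, adding the third rule would require seven explicit branches (one per non-empty subset of {3, 5, 7}) to cover every combination.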

gemini chat

Claude 4.5

Claude Haiku used string concatenation and, like GPT-5.1, was thankfully not verbose:

for i in range(1, 101):
    output = ""

    if i % 3 == 0:
        output += "Fizz"
    if i % 5 == 0:
        output += "Buzz"
    
    if output == "":
        output = str(i)
    
    print(output)

haiku chat

Claude Sonnet, despite being the larger, smarter model, chose to solve the problem using if-elif-elif-else:

for i in range(1, 101):
    if i % 3 == 0 and i % 5 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    ...

The responses from GPT-5.1 and Claude were short and plain, and I prefer them both to Gemini. However, both these models do write extensive comments (and emoji) when they are tasked with bigger coding challenges, which is smart product design on OpenAI and Anthropic’s part.

sonnet chat

DeepSeek-V3.2

DeepSeek-V3.2 gave me three different solutions: one using if-elif-elif-else, another using string concatenation, and this one-liner:

for i in range(1, 101): print("Fizz"*(i%3==0) + "Buzz"*(i%5==0) or i)

I hate one-liners and code that’s trying to be too cute or smart, but I’ll excuse this. Barely. Multiplying a boolean with a string and then `or`-ing the result with an integer… I think this is Python’s fault for allowing such a monstrosity.
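For the record, the monstrosity relies on two Python quirks: `bool` is a subclass of `int`, so multiplying a string by `True`/`False` repeats it one or zero times, and `or` returns its first truthy operand, so an empty string falls through to the number:

```python
# str * bool: True acts as 1, False as 0
assert "Fizz" * True == "Fizz"
assert "Fizz" * False == ""

# `or` returns the first truthy operand; "" is falsy, so it yields the number
assert ("" or 7) == 7
assert ("FizzBuzz" or 15) == "FizzBuzz"
```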

deepseek chat

What did I learn?

A student in my introductory Python class asked if there were any accuracy differences for code generated by different AIs, and I said no, but there were differences in style. After this little exercise and more experience with coding agents, I’m not sure style is the right word. The code samples above are stylistically different, but the differences come from design decisions by the companies to boost engagement.

Gemini adds extensive comments because Google thinks you’ll like that code more and it will get you to use Gemini more. DeepSeek generates three solutions probably for the same reason. Claude and GPT-5.1 are more adaptive in their responses, but they exhibit similar engagement-boosting tactics2. Ultimately, I think this exercise gives a peek into what the latest LLMs from these companies actually are: products designed with the purpose of increasing ‘engagement’. They just want us to keep using them.

  1. Type hints in Python don’t do much

  2. ChatGPT almost always ends its response with a follow-up question, or a cheery sentence about how it can do something else. Genius, but also insidious.