Kural embeddings
August 23 2025
Over the last two years, I’ve seen quite a lot of blog posts, videos and explainers about embeddings, which initially surprised me. It shouldn’t have, since ‘AI’/LLMs are everywhere now; developers, students and hobbyists are really excited (as I assume they were about the Web in the late 90s/early 2000s and mobile apps in the early 2010s) to understand them and build something using them. Still, it feels surreal for a concept I studied as a Master’s student to be the focus of attention for the whole computing industry. There’s too much attention and hype, and I do wonder (with some fear) what a massive financial bubble bursting actually looks like at the micro level in everyday life.
One undeniable positive to come from all the hype around ‘AI’ is that developer tooling and frameworks for embeddings and neural networks are much better now than they were in 2017. With Qwen3 multilingual embeddings and Gemini CLI[1], I could quickly prototype the first idea that popped into my head: build a web app that uses a multilingual embedding model to find relevant Thirukural couplets (in Tamil) for user queries in any language. That’s what ‘Kural for your question’ is. I’m pretty happy with the end product, but the retrieval itself, which ranks kural couplets by the cosine similarity of their embeddings to the query embedding, is pretty underwhelming.
Visualizing the embeddings of all 1330 kurals in 2D using UMAP gives an idea why. There aren’t really any meaningful clusters, with only the Book of Love showing some clear separation from the other two books. The similarity search sometimes works because of certain keywords, but it lacks any understanding of the intent of the question. For the question ‘What are the duties of a son to his parents?’ (an undying question for me), only one of the 3 kurals deemed relevant is about the parent-child relationship at all. The larger Qwen3 embedding models might work better, but model training frameworks and data mixtures are more biased towards real-world use cases; my niche little idea probably doesn’t mesh well with what Qwen3 Embedding was trained to do.
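For readers who haven't seen it spelled out, here is a minimal sketch of what retrieval by cosine similarity amounts to. It is not the app's actual code: the function names (`cosine_similarity`, `top_k`) and the tiny 3-dimensional vectors are made up for illustration, and in a real setup the vectors would come from the embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, kural_vecs, k=3):
    """Return the k (kural_id, score) pairs most similar to the query."""
    scored = [
        (kural_id, cosine_similarity(query_vec, vec))
        for kural_id, vec in kural_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: ids and vectors are invented, not real embeddings.
toy_kurals = {
    1: [1.0, 0.0, 0.0],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 1.0, 0.0],
    4: [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.2, 0.0]

results = top_k(toy_query, toy_kurals, k=3)
for kural_id, score in results:
    print(kural_id, round(score, 3))
```

The ranking depends only on the angle between vectors, so whatever the model happens to encode as "nearby" (shared keywords, for instance) decides the result, whether or not it matches the intent of the question.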
Still feels good to build something, even if underwhelming 🙂.
-
[1] Just like Typeproof. I know it’s dangerous, but boy is it addictive to build fully functional web apps with just text prompts. I’ve learned more TypeScript this way than I have in years, but not as much as I would have learned if I had built these apps from scratch. But I would never have built these web apps from scratch either.