Posts List

Grokking GenAI: Multimodal Reasoning with Gemini - Part 2

When I wrote Grokking GenAI: Multimodal Reasoning with Gemini last year, multimodality felt like a breakthrough. An AI that could read text, look at images, listen to audio, and even understand code already felt futuristic. But over the past year, something important has changed.

Grokking GenAI: Multimodal Reasoning with Gemini

Imagine you’re trying to plan a trip to Hawaii. You’ve got a few pictures of beautiful beaches, a list of things you want to see, and a rough budget in mind. How do you pull it all together? You might browse travel blogs, compare prices, and even watch videos of the islands. You’re using different kinds of information (pictures, text, and video) to make sense of your trip.