Posts List

Building Real-Time RAG Systems with Gemini & the Multimodal Live API

Most retrieval-augmented generation systems today feel a bit stiff. You ask a question. You wait. You get an answer. It works, but it doesn’t “feel” like a conversation.

Grokking GenAI: Multimodal Reasoning with Gemini - Part 2

When I wrote Grokking GenAI: Multimodal Reasoning with Gemini last year, multimodality felt like a breakthrough. An AI that could read text, look at images, listen to audio, and even understand code seemed futuristic on its own. But over the past year, something important has changed.

Grokking GenAI: Multimodal Reasoning with Gemini

Imagine you’re trying to plan a trip to Hawaii. You’ve got a few pictures of beautiful beaches, a list of things you want to see, and a rough budget in mind. How do you pull it all together? You might browse travel blogs, compare prices, and even watch videos of the islands. You’re using different kinds of information – pictures, text, and video – to make sense of your trip.

Retrieval Augmented Generation (RAG) with Vertex AI and Langchain

Do you read a lot? Even if you don’t, imagine you’re at a library looking for information on a specific topic. Instead of browsing through every book on the shelves, you ask the librarian for help.