Read on to learn about my Capstone project for Google’s Gen AI Intensive Course 2025
Frequently, when researching a design or writing an article, I want to find a specific piece of information. If I know the information is in a book, I simply open the book's index and look for a relevant word that will lead me to it.
However, when the desired information is in a video (or other linear media, such as a podcast), I have to try one of the following:

- Skim through the video, scrubbing back and forth in the hope of spotting the right section
- Download the transcript and search it for relevant words
All of these methods are inefficient. I don't even know whether the desired information is there at all, so skimming through videos or downloading transcripts is a cumbersome, resource-intensive task.
In defining this problem, I'm not just conducting me-search. Everyone has to deal with this issue when working with linear media. It's in the nature of the medium - you can't experience video or audio without spending time. Jumping from one place to another (which we can do so easily with text and images) is extremely difficult with linear media. Even if you do jump around, you then have to hit play and watch or listen to understand what you've landed on. It takes time - much longer than scanning an alphabetised book index.
Students and researchers would benefit from a tool that solves this. It should also appeal to anyone looking to quickly find something within a video - whether it's a DIYer trying to fix a plumbing emergency, a parent whose kid has just asked what a dog's reverse sneeze is, or anyone else! Their chances of finding relevant information would be considerably improved by a video index.
So with that problem in mind, the use case I’m attempting to solve is creating book-style indexes for YouTube videos.
Recent advances in generative AI are extremely useful for solving this use case:

- Multimodal models can process video and audio directly, not just text.
- Long context windows mean an entire video (or its transcript) can fit in a single prompt.
- Structured output lets a model return an index in a machine-readable format such as JSON.
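To make that concrete, here's a minimal sketch of what the core of such a tool could look like, using the google-genai Python SDK (which can pass a YouTube URL directly to a Gemini model). The model name, prompt wording, and placeholder URL are my own illustrative assumptions, not the project's actual implementation:

```python
# A minimal sketch: ask a Gemini model to build a book-style index for a
# YouTube video. Model name, prompt, and URL placeholder are assumptions.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

prompt = (
    "Create a book-style index for this video: an alphabetised list of "
    "key terms, each followed by the MM:SS timestamps where it is discussed."
)

response = client.models.generate_content(
    model="gemini-2.0-flash",  # assumed model; any multimodal Gemini model should work
    contents=types.Content(
        parts=[
            # The Gemini API accepts a YouTube URL directly as file data
            types.Part(file_data=types.FileData(file_uri="https://www.youtube.com/watch?v=VIDEO_ID")),
            types.Part(text=prompt),
        ]
    ),
)

print(response.text)  # the generated index, ready to scan like a book's
```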
Having defined my use case and considered what was possible with GenAI, I felt confident that I could build a tool.