You're Using AI for Text. Your Business Runs on Everything Else.

Claude and ChatGPT process text. Your business doesn't. Coaching calls, screen shares, product photos, video ads, whiteboard sketches... the real work is visual and audio, and text-only AI tools can't see any of it. Gemini does. Not as a transcription service... as a tool that reasons across photos, video, and audio natively, in the same conversation where you're already working.

Marlon Brand

Founder, Undeniable · Last updated April 2026

01 The Blind Spot

The Blind Spot in Your AI Workflow

You recorded a coaching call on Monday. By Wednesday the action items were gone... buried in a 45-minute recording nobody's going to re-watch. You took photos of a whiteboard after a strategy session. They're still in your camera roll, unsearchable. You saw a competitor's video ad that kept showing up in your feed. You watched it twice, thought “that's working,” and moved on without knowing why.

None of this is laziness. Your AI tool literally can't see these things.

You've been optimizing text prompts for months while every photo, every recording, every screen capture sat untouched. That's the Format Tax... the hours you spend manually translating visual and audio work into text so your AI can process it. Screenshots of dashboards you retype into prompts. Meeting recordings you transcribe and summarize by hand. Competitor ads you describe in words instead of uploading.

30 minutes a working day. 130 hours a year. At $100 to $200 an hour, that's $13,000 to $26,000 spent... not on creating, but on format conversion. You're the middleware between your business and your AI tool.
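
If you want to run that math on your own numbers, here's a minimal sketch. It assumes 260 working days a year (52 weeks of 5 days), which is the figure that turns 30 minutes a day into 130 hours. The function name and parameters are illustrative, not part of any tool.

```python
def format_tax(minutes_per_day: float, hourly_rate: float,
               working_days: int = 260) -> tuple[float, float]:
    """Return (hours per year, dollar cost per year) of manual format conversion.

    working_days=260 is an assumption: 52 weeks x 5 days, no holidays removed.
    """
    hours_per_year = minutes_per_day / 60 * working_days
    return hours_per_year, hours_per_year * hourly_rate


# The two ends of the range used in this guide.
for rate in (100, 200):
    hours, cost = format_tax(30, rate)
    print(f"{hours:.0f} hours/year at ${rate}/hour = ${cost:,.0f}")
```

Swap in your own minutes and rate. If you only lose 15 minutes a day, the number halves. If you bill more than $200 an hour, it grows accordingly.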

I started feeding my meeting video recordings directly to Gemini three months ago. Not the audio... the video. Gemini sees the screen shares, the slides, the whiteboard sketches... everything a transcript misses. The action items appear in 30 seconds, with context from what was shown, not just what was said.

That's not a better transcription service. That's a fundamentally different kind of understanding. And recordings are just the start.

Already using AI for text-based work? Our Claude Code Slash Commands guide covers the persistent-systems side. This guide covers the multimodal side. Different tools, different strengths.

02 Definition

What “Multimodal” Actually Means (It's Not What Google Says)

Google markets Gemini as “multimodal AI.” That sounds like a feature checkbox. What it actually means is more specific.

Gemini doesn't just accept images as input. It reasons across formats simultaneously. Upload a video and ask about a moment at the 14-minute mark. Send a photo of your office whiteboard and ask it to turn the diagram into a project plan. Record yourself talking through an idea and get structured notes back.

Claude can look at an image and describe it. ChatGPT can describe and edit images through DALL-E. Gemini does both natively in the same conversation... and goes further. Edit the background. Remove an object. Transfer the visual style. Generate a variation. That's the line between understanding and creating.

This isn't incremental. Text-only AI tools require you to translate everything into words before processing it. Gemini processes the original format directly. A photo stays a photo. A recording stays a recording. A video stays a video. The translation step disappears, and the Format Tax goes with it.

03 Workflows

What Gemini Means for Your Content Workflow

This isn't a feature list. It's four workflows that coaches, consultants, and small agency owners do every week... and what changes when your AI can actually see what you're working with.

Client Recordings

What you're doing now

Record a coaching call on Zoom. Use Otter or Zoom transcription. You get a text transcript. The words are there. But the transcript doesn't know your client pulled up a whiteboard and sketched their funnel. It doesn't see the screen share. It doesn't capture the slide you walked them through.

What changes with Gemini

Upload the video recording. Gemini processes both the audio and the visual stream. Screen shares, slides, whiteboard sketches, facial expressions. Ask “What did we discuss when I shared my screen at the 12-minute mark?” and get an answer with visual context, not just the words spoken over it.

The transcript gives you words. The video gives you the meeting.

Social Content

What you're doing now

Open Canva. Browse templates. Adjust text, swap colors, resize for three platforms. Twenty minutes for one post.

What changes with Gemini

Describe what you want. “A professional social graphic with a dark background, my headshot on the left, and the text ‘Stop guessing. Start measuring.’ in a clean sans-serif.” Gemini generates it. Or upload your headshot and ask for a branded version with a different background for LinkedIn.

This doesn't replace a designer for complex work. It replaces the 20-minute Canva sessions for the routine posts you publish three times a week.

Competitor Research

What you're doing now

You see a competitor's video ad that keeps showing up in your feed. You watch it. You think “that's working” without being able to articulate why. Maybe you screenshot one frame. You move on.

What changes with Gemini

Screen-record the ad (or grab it from Meta's Ad Library). Upload to Gemini. Ask: “Break down what's working in this ad. Pacing, visual transitions, audio cues, text overlays, call-to-action timing.” You get a structured analysis of why it works. Hook timing. Emotional arc. The moment the CTA appears relative to value delivery.

Works for static ads too... upload a screenshot and get the visual hierarchy. But video is where the multimodal advantage is sharpest. No other tool watches a video ad and tells you why the pacing works.

Course and Training Material

What you're doing now

Run a 90-minute workshop. Record it. Maybe transcribe the audio. But the transcript misses the slides, the live demo, the diagram you drew on the whiteboard halfway through.

What changes with Gemini

Upload the video. Gemini sees the slides, reads the diagrams, follows the demo, and hears the explanation... all at once. Ask for a structured outline that references both what you said and what you showed. Quiz questions based on the actual diagrams, not just the spoken words.

The gap between audio-only transcription and full video understanding is the gap between “what was said” and “what was taught.” Most of what sticks in a workshop is visual. Audio transcription throws that away.

The time saved per instance is modest: 15 minutes here, 20 minutes there. But the minutes compound. Four coaching calls a week. Three social posts. One competitor analysis. One training session a month. That's where the 130 hours comes from.

04 Visual Capabilities

Gemini's Visual Capabilities Your Competitors Are Already Using

You need visuals constantly. Pitch decks, landing page mockups, social graphics, product photos with clean backgrounds. Right now that means stock photo searches, Canva sessions, or waiting on a designer for assets that aren't even final yet.

Gemini generates and edits images natively. Not through a plugin or a separate tool. Inside the same conversation where you're working on everything else.

What this looks like in practice:

You're preparing a pitch for a new client. You sketch a rough layout on paper. You take a photo of the sketch. Gemini turns it into a polished diagram. Not a perfect rendering... but clean enough for a deck.

You need a hero image for a landing page. Instead of searching stock photo sites for 30 minutes and settling for something generic, you describe what you want. “A clean, professional workspace with warm lighting, a laptop showing analytics, and a coffee cup. Photorealistic.” Gemini generates options.

You have a product photo with a cluttered background. Upload it, ask Gemini to swap the background for a clean white studio setup. The product stays. The mess disappears.

You receive a financial report as a PDF with charts you need to reference in a presentation. Upload it. Ask Gemini to extract the data from the charts into a table. What used to be manual transcription is now a 10-second upload.

And then there's video. You need a short clip for a social post or a client pitch. Hiring a videographer for 8 seconds of footage makes no sense. Gemini's Veo model generates 8-second clips from text descriptions... dialogue, sound effects, and cinematic camera work included. Describe a product showcase. Describe a course intro. Veo creates it. The output is good enough for social content and pitch decks. It's still a preview model, so you won't get precise control over timing or brand-specific elements. But for quick concept videos, the math has changed. An 8-second clip used to cost hundreds of dollars. Now it costs a prompt and a paid Gemini plan.

These capabilities work right now. The people using them aren't more technical than you. They just know they exist.

05 Tool Selection

When to Use What (The Honest Breakdown)

If you're reading this on an Undeniable guide page, you might expect us to say “just use Claude for everything.” We won't. That's not how good operators work.

Each tool is genuinely better at specific things. Using one for everything is like hiring one person for every role in your business.

Your Task | Best Tool | Why
Processing images, video, audio | Gemini | Native multimodal understanding and generation
Generating visuals and video | Gemini | Built-in image creation, editing, video generation (Veo)
Writing and editing code | Claude Code or Gemini (Antigravity) | Antigravity is Google's agent-first IDE. Claude Code lives in your terminal. Different tools, different workflows.
Building repeatable AI workflows | Claude | Custom skills, persistent context across sessions
Quick conversations, brainstorming | ChatGPT | Fast, conversational, most people already know it
Google Workspace automation | Gemini | Native Docs, Sheets, Gmail integration

I use all three. Gemini for anything visual or audio. Claude for code and persistent systems. ChatGPT for quick conversations that don't need memory.

The real question isn't “which AI is best.” It's “which AI is best for this specific task, right now.” That judgment is what separates operators from everyone else.

We wrote a deeper comparison specifically for coaches and consultants in our Claude vs ChatGPT guide. That guide goes deeper on the persistent-systems side. This one covers the multimodal side.

Different tools. Different strengths. Use all of them.

06 Getting Started

Getting Started With Gemini in 10 Minutes

You don't need a tutorial. You need three things to try today so you can see the difference yourself.

Try this first

Upload a recent client call video recording (not just audio) to Gemini. Ask: “Summarize this meeting, including what was shown on screen, and list the action items with timestamps.” Compare what you get to a standard audio transcript.

Try this second

Screen-record a competitor's video ad that's performing well. Upload it and ask: “Break down what's working in this ad. Analyze the pacing, visual transitions, audio cues, and CTA timing.” Compare that analysis to your own instinct. Notice what Gemini names that you felt but couldn't articulate.

Try this third

Describe a social graphic you need for your next post. Be specific about the layout, colors, text, and mood. See what Gemini generates. Iterate once. Compare the time to your last Canva session.

Where to access Gemini

  • Gemini app (gemini.google.com)... the free tier handles image analysis, meeting summaries, and content generation. Video generation (Veo) and some advanced features require a paid plan.
  • Google AI Studio (aistudio.google.com)... for API access and more control. Better for building repeatable workflows.
  • Google Workspace... if you're already in Gmail, Docs, and Sheets, Gemini integrates directly.
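
If you go the AI Studio route, a repeatable version of the "try this first" prompt might look like the sketch below. It assumes Google's google-genai Python SDK (pip install google-genai) and an API key from AI Studio stored in a GEMINI_API_KEY environment variable. The model name, the file name, and the summarize_recording function are placeholders, so swap in whatever is current when you read this. Treat it as a starting point under those assumptions, not a reference implementation.

```python
import time

# The same prompt as "Try this first" above.
MEETING_PROMPT = (
    "Summarize this meeting, including what was shown on screen, "
    "and list the action items with timestamps."
)


def summarize_recording(client, video_path: str,
                        model: str = "gemini-2.5-flash",  # placeholder model name
                        prompt: str = MEETING_PROMPT,
                        poll_seconds: float = 5.0) -> str:
    """Upload a video recording and return the model's text summary.

    `client` is assumed to be a google-genai `genai.Client` (or anything
    exposing the same `files` and `models` methods).
    """
    uploaded = client.files.upload(file=video_path)

    # Video is processed server-side after upload; wait until it is ready.
    while uploaded.state is not None and uploaded.state.name == "PROCESSING":
        time.sleep(poll_seconds)
        uploaded = client.files.get(name=uploaded.name)

    if uploaded.state is not None and uploaded.state.name == "FAILED":
        raise RuntimeError(f"Processing failed for {video_path}")

    response = client.models.generate_content(
        model=model,
        contents=[uploaded, prompt],
    )
    return response.text


# Usage (needs network access and a real API key):
#   import os
#   from google import genai
#   client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
#   print(summarize_recording(client, "client-call.mp4"))
```

Passing the client in, rather than creating it inside the function, keeps the key handling in one place and lets you reuse the same function for every recording you upload. For one-off calls, the Gemini app is still the faster path.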

We're not listing specific pricing here because it changes. Check Google's current pricing page for the latest.

07 Limitations

What Gemini Can't Do (Yet)

Every tool has edges. Knowing where Gemini's edges are saves you from the wrong expectations.

Memory is implicit, not structured

Gemini does remember things about you across sessions... your role, your preferences, recurring topics. But this memory is learned from your conversations, not defined by you. There's no project file you write and control. No explicit instruction set that loads at the start of every session. Gemini's memory is like a coworker who gradually picks up on your preferences. Claude's CLAUDE.md is a briefing document you hand them on day one.

Coding tools are split across products

Google has Antigravity (agent-first IDE, their flagship), Jules (autonomous PR agent, Python and JavaScript only), Gemini Code Assist (agent mode in VS Code and IntelliJ), and Gemini CLI (terminal agent). Four separate products. Antigravity is the most ambitious... it deploys multiple autonomous agents inside a VS Code fork. But the ecosystem doesn't share context across products the way Claude Code does with CLAUDE.md and custom skills. If you want one persistent environment with project memory, that's a different tool.

Image generation has guardrails

Content policy restrictions mean some image requests get refused. Faces, branded content, and sensitive subjects have limitations. This is Google being cautious, and it means the tool won't always do what you ask.

Video generation is early and paid-only

Veo 3.1 generates 8-second clips from text descriptions... dialogue, sound effects, cinematic camera work. But it's a preview model with limited control over precise timing, specific brand elements, and longer-form content. It also requires a paid Gemini plan. Great for social clips and concept videos. Not ready for full production work.

Locked to the Google ecosystem

The deepest integrations are with Google Workspace. If your business runs on Microsoft 365 or any other stack, the tightest features won't be available to you.

None of these are reasons not to use Gemini. They're reasons to know what it's for and what it's not for. Use the right tool for the job. Skip the rest.

08 FAQ

Common Questions, Straight Answers

Ready to Skip the DIY?

We build the full AI infrastructure underneath all this for one client per quarter. Gemini, Claude, the custom skills, the workflows, the whole system.

Apply to Work With Undeniable

No commitment. No pitch deck. Just a straight conversation about what's possible.