VideoObject schema: making video content visible to AI

Bas Vermeer, SEO/AEO Specialist

The invisibility problem of video for AI

Video is one of the most powerful content formats on the web. It captures attention, explains complex concepts clearly and keeps visitors on your page longer. But for search engines and AI models, video content is largely invisible. A crawler or text-based AI model cannot watch a video. It cannot listen to spoken words or see what is shown on screen. Without supplementary metadata, your video is a black box to machines: they know something is there, but not what.

This is exactly the problem that VideoObject schema solves. By adding structured metadata about your video, you translate the visual and auditory content into a format that machines can process. You tell them what the video contains, how long it lasts, when it was published and where to find the thumbnail.

VideoObject is one of the richer schema types within Schema.org, with dozens of properties describing every aspect of a video. As part of your Schema.org strategy, it is an essential type for any website that publishes video content.

IMPORTANT

Google uses VideoObject schema to generate video rich results: large thumbnails with play buttons in search results. Pages with video rich results get up to 41% more clicks than pages without, according to Google's own research.

VideoObject schema: the essential implementation

A basic VideoObject contains the required properties that Google needs for video rich results, supplemented with recommended properties that further increase visibility.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "Implementing Schema.org markup: step-by-step guide",
  "description": "In this video you learn how to implement Schema.org JSON-LD markup on your website for better visibility in search engines and AI models.",
  "thumbnailUrl": "https://aeo-expert.nl/images/video/schema-tutorial-thumb.jpg",
  "uploadDate": "2026-04-20T10:00:00+02:00",
  "duration": "PT12M30S",
  "contentUrl": "https://aeo-expert.nl/videos/schema-tutorial.mp4",
  "embedUrl": "https://www.youtube.com/embed/abc123xyz",
  "publisher": {
    "@type": "Organization",
    "name": "AEO Expert",
    "logo": {
      "@type": "ImageObject",
      "url": "https://aeo-expert.nl/images/logo.png"
    }
  }
}
</script>

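The required properties are "name", "description", "thumbnailUrl" and "uploadDate". Without these four, Google will not show the video rich result. The "duration" is expressed in ISO 8601 duration format: PT12M30S reads as a period (P) with a time part (T) of 12 minutes (M) and 30 seconds (S), or 750 seconds in total. Google recommends including a timezone offset in "uploadDate", as the example does with +02:00.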
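If your video pipeline stores lengths in seconds, the ISO 8601 string can be generated instead of typed by hand. The sketch below is a minimal Python example under our own assumptions: iso8601_duration and missing_required are hypothetical helper names, not part of any library, and the required-property list simply mirrors the four properties named above.

```python
# Hypothetical helpers for preparing VideoObject data; not part of any library.
REQUIRED_PROPERTIES = ("name", "description", "thumbnailUrl", "uploadDate")


def iso8601_duration(total_seconds: int) -> str:
    """Convert a length in seconds to an ISO 8601 duration such as PT12M30S."""
    if total_seconds < 0:
        raise ValueError("duration cannot be negative")
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours}H")
    if minutes:
        parts.append(f"{minutes}M")
    # Always emit at least one component, so zero becomes "PT0S".
    if seconds or not parts:
        parts.append(f"{seconds}S")
    return "PT" + "".join(parts)


def missing_required(video: dict) -> list:
    """Return the required properties (as listed in this article) that are absent or empty."""
    return [key for key in REQUIRED_PROPERTIES if not video.get(key)]


video = {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "name": "Implementing Schema.org markup: step-by-step guide",
    "thumbnailUrl": "https://aeo-expert.nl/images/video/schema-tutorial-thumb.jpg",
    "uploadDate": "2026-04-20T10:00:00+02:00",
    "duration": iso8601_duration(750),  # 12 * 60 + 30 seconds
}
print(video["duration"])        # PT12M30S
print(missing_required(video))  # ['description']
```

Running a check like this before publishing catches a forgotten property before Google's own validation does.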

contentUrl versus embedUrl

The difference between "contentUrl" and "embedUrl" is subtle but important. The contentUrl points to the direct video file (for example a .mp4 URL). The embedUrl points to the embeddable player (for example a YouTube embed URL). If your video is on YouTube and also has a direct download, include both.

  • contentUrl: direct link to the video file. Used by crawlers to verify and potentially index the video.
  • embedUrl: link to the embeddable player. Used to display the video in search results and other platforms.
  • If your video is only on YouTube, use embedUrl. contentUrl is optional if no direct file is available.
  • Always use HTTPS URLs for both contentUrl and embedUrl to avoid security warnings.
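The rules in this list can be encoded in a small check that runs before the markup is rendered. This is a sketch under stated assumptions: video_urls is a hypothetical helper, and treating "neither URL present" as an error is a choice of this example (a video with neither gives crawlers nothing to fetch), not a documented Google rule.

```python
from typing import Optional
from urllib.parse import urlparse


def video_urls(embed_url: Optional[str] = None, content_url: Optional[str] = None) -> dict:
    """Build the contentUrl/embedUrl part of a VideoObject (hypothetical helper)."""
    urls = {}
    if content_url:
        urls["contentUrl"] = content_url  # direct file, e.g. an .mp4
    if embed_url:
        urls["embedUrl"] = embed_url      # embeddable player, e.g. a YouTube embed URL
    if not urls:
        # Assumption of this sketch: require at least one of the two.
        raise ValueError("provide a contentUrl, an embedUrl or both")
    for key, url in urls.items():
        if urlparse(url).scheme != "https":
            raise ValueError(f"{key} must use HTTPS: {url}")
    return urls


# YouTube-only video: embedUrl alone is enough.
print(video_urls(embed_url="https://www.youtube.com/embed/abc123xyz"))
```

The returned dictionary can be merged into the rest of the VideoObject before serialization.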

Advanced VideoObject properties

The basic implementation suffices for video rich results, but advanced properties describe your video even better for AI models.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "E-E-A-T optimization: complete guide",
  "description": "Comprehensive walkthrough of E-E-A-T optimization for AI visibility.",
  "thumbnailUrl": "https://aeo-expert.nl/images/video/eeat-guide-thumb.jpg",
  "uploadDate": "2026-04-22T14:00:00+02:00",
  "duration": "PT25M15S",
  "embedUrl": "https://www.youtube.com/embed/def456uvw",
  "interactionStatistic": {
    "@type": "InteractionCounter",
    "interactionType": { "@type": "WatchAction" },
    "userInteractionCount": 15420
  },
  "transcript": "Welcome to this guide on E-E-A-T optimization. In this video we discuss how to build expertise, experience, authority and trustworthiness...",
  "inLanguage": "en",
  "hasPart": [
    {
      "@type": "Clip",
      "name": "What is E-E-A-T?",
      "startOffset": 0,
      "endOffset": 180,
      "url": "https://aeo-expert.nl/video/eeat-guide?t=0"
    },
    {
      "@type": "Clip",
      "name": "Proving expertise",
      "startOffset": 180,
      "endOffset": 540,
      "url": "https://aeo-expert.nl/video/eeat-guide?t=180"
    }
  ]
}
</script>

The "transcript" property is particularly valuable for AI models. A transcript makes the spoken content of your video searchable and citable. AI models that cannot watch the video itself can read the transcript and use the information in their answers.
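A full transcript is long, free-form text, and it ends up inside a script element. A minimal sketch of one way to handle that in Python: serialize with json.dumps and escape every "<" as \u003c, so a transcript that happens to contain "</script>" cannot close the tag early. The function name jsonld_script is hypothetical; the escape is standard JSON and parses back to the same text.

```python
import json


def jsonld_script(data: dict) -> str:
    """Render a dict as a JSON-LD script element (hypothetical helper)."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # json.dumps leaves "<" untouched. Outside strings, JSON never contains "<",
    # and inside strings \u003c is an equivalent escape that HTML will not read as a tag.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder: in practice, load the full, reviewed transcript here.
transcript = "Welcome to this guide on E-E-A-T optimization."

video = {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "name": "E-E-A-T optimization: complete guide",
    "inLanguage": "en",
    "transcript": transcript,
}
print(jsonld_script(video))
```

ensure_ascii=False keeps accented characters readable in the page source; both settings produce valid JSON.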

Clip markup for key moments

The "hasPart" property with Clip objects describes specific segments within the video. Google uses this data to show "key moments" in search results: clickable timestamps that jump directly to the relevant part of the video. This is comparable to how a good heading hierarchy makes a textual article scannable.

Each Clip object has a name, a startOffset and an endOffset (both in seconds from the start of the video) and a url that links directly to the clip's start time, for example via a time parameter such as ?t=180. Google shows a maximum of five key moments per video in search results.
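If you keep a chapter list (title plus start time) for each video, the Clip objects can be derived from it instead of maintained by hand. The sketch below rests on explicit assumptions: each chapter runs until the next one starts and the last one until the end of the video (1515 seconds matches PT25M15S in the example above), and the player on the page honours a ?t= start-time parameter. build_clips and the chapter format are hypothetical.

```python
def build_clips(page_url: str, chapters: list, duration_seconds: int) -> list:
    """Turn (title, start_seconds) chapters into schema.org Clip objects (hypothetical helper)."""
    ordered = sorted(chapters, key=lambda chapter: chapter[1])
    separator = "&" if "?" in page_url else "?"
    clips = []
    for index, (name, start) in enumerate(ordered):
        # Assumption: a chapter ends where the next one begins; the last ends with the video.
        end = ordered[index + 1][1] if index + 1 < len(ordered) else duration_seconds
        if not 0 <= start < end <= duration_seconds:
            raise ValueError(f"invalid offsets for clip {name!r}: {start}-{end}")
        clips.append({
            "@type": "Clip",
            "name": name,
            "startOffset": start,
            "endOffset": end,
            # Assumption: the player on page_url starts playback at ?t=<seconds>.
            "url": f"{page_url}{separator}t={start}",
        })
    return clips


chapters = [("What is E-E-A-T?", 0), ("Proving expertise", 180)]
video_part = {"hasPart": build_clips("https://aeo-expert.nl/video/eeat-guide", chapters, 1515)}
```

Chaining chapters without gaps is a design choice of this sketch; the example markup above ends the second clip at 540 seconds, which is equally valid if the segment is shorter.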

VideoObject for YouTube videos

Most websites host their videos on YouTube and then embed them on their own site. In that case, the VideoObject markup on your own page is extra important: YouTube's own metadata describes the video on YouTube, while your markup ties the video to your page and its context.

<!-- YouTube embed with surrounding VideoObject markup -->
<div class="video-container">
  <iframe src="https://www.youtube.com/embed/abc123xyz"
          title="Implementing Schema.org markup"
          allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
          allowfullscreen>
  </iframe>
</div>

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "Implementing Schema.org markup",
  "description": "Step-by-step guide for implementing Schema.org JSON-LD on your website.",
  "thumbnailUrl": "https://img.youtube.com/vi/abc123xyz/maxresdefault.jpg",
  "uploadDate": "2026-04-20T10:00:00+02:00",
  "duration": "PT12M30S",
  "embedUrl": "https://www.youtube.com/embed/abc123xyz",
  "publisher": {
    "@type": "Organization",
    "name": "AEO Expert"
  },
  "transcript": "The full transcription of the video..."
}
</script>

Note that the thumbnailUrl uses the YouTube thumbnail URL. YouTube provides standard thumbnails in various sizes via the path img.youtube.com/vi/VIDEO_ID/. The "maxresdefault.jpg" variant has the highest resolution, but it only exists for videos uploaded in high definition; "hqdefault.jpg" is a safer fallback.
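Because not every video has a maxresdefault.jpg thumbnail, a build step can try the variants from large to small and keep the first one that exists. A sketch, assuming a missing variant answers with a non-200 HTTP status; youtube_thumbnail_urls, url_exists and pick_thumbnail are hypothetical names, and the existence check is passed in so it can be replaced or tested offline.

```python
import urllib.error
import urllib.request
from typing import Callable

# Ordered from highest to lowest resolution.
THUMBNAIL_VARIANTS = ("maxresdefault.jpg", "sddefault.jpg", "hqdefault.jpg")


def youtube_thumbnail_urls(video_id: str) -> list:
    """Candidate thumbnail URLs under img.youtube.com/vi/VIDEO_ID/."""
    return [f"https://img.youtube.com/vi/{video_id}/{variant}" for variant in THUMBNAIL_VARIANTS]


def url_exists(url: str, timeout: float = 5.0) -> bool:
    """Send a HEAD request and report whether the server answered 200 OK."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # HTTPError (e.g. a 404) is a subclass of URLError; timeouts are OSErrors.
        return False


def pick_thumbnail(video_id: str, exists: Callable[[str], bool] = url_exists) -> str:
    """Return the largest thumbnail variant that exists."""
    for url in youtube_thumbnail_urls(video_id):
        if exists(url):
            return url
    raise LookupError(f"no thumbnail found for video {video_id}")
```

At build time, pick_thumbnail("abc123xyz") would return the URL to write into thumbnailUrl.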

Video and AI citations: why transcripts are essential

Adding a transcript to your VideoObject schema is one of the most impactful steps you can take for AI visibility. Here is why.

AI models process text. They are trained on text, they reason in text and they generate text. A video without a transcript is a closed box for an AI model. With a transcript, that same video becomes a rich source of citable information. When a user asks an AI model "how do I implement Schema.org markup?", the model can search through your video transcript and cite the relevant passage, including a reference to the video.

  1. Generate transcripts automatically via YouTube (autocaptioning) or specialized services like Rev or Otter.ai.
  2. Review and correct automatically generated transcripts. Autocaptions regularly contain errors, especially with technical terms.
  3. Structure the transcript with timestamps so readers (and machines) can quickly navigate to the relevant section.
  4. Also publish the transcript as visible text on the page, below the video, so visitors and crawlers can read it even without the structured data.
  5. Reference the transcript in the VideoObject schema via the "transcript" property.

A video without a transcript is like a book in a vault: the content is valuable, but no one can access it. A transcript opens the vault for every AI that wants to read.
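Steps 3 to 5 can share one source: a list of timestamped segments, for example exported from a captioning tool and corrected by hand. The sketch below uses hypothetical helper names and an assumed (start_seconds, text) segment format; it renders the list as timestamped text that can be shown below the video and passed unchanged to the "transcript" property.

```python
def format_timestamp(seconds: int) -> str:
    """Render seconds as MM:SS, or H:MM:SS for videos of an hour or longer."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}" if hours else f"{minutes:02d}:{secs:02d}"


def format_transcript(segments: list) -> str:
    """Join (start_seconds, text) segments into one line per segment, in time order."""
    ordered = sorted(segments, key=lambda segment: segment[0])
    return "\n".join(f"[{format_timestamp(start)}] {text.strip()}" for start, text in ordered)


# Placeholder segments; in practice these come from the corrected captions.
segments = [
    (0, "Welcome to this guide on E-E-A-T optimization."),
    (180, "Proving expertise."),
]
transcript = format_transcript(segments)
video = {"@type": "VideoObject", "transcript": transcript}  # same text as shown on the page
print(transcript)
```

Using one string for both the page text and the markup keeps the two from drifting apart when the transcript is corrected.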

VideoObject combined with Article schema

When a video is part of a blog post or article, you combine the VideoObject with the Article schema in a @graph structure.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "headline": "Implementing VideoObject schema",
      "author": { "@type": "Person", "name": "Jan de Vries" },
      "datePublished": "2026-04-24",
      "video": { "@id": "#main-video" }
    },
    {
      "@type": "VideoObject",
      "@id": "#main-video",
      "name": "Implementing VideoObject schema: video tutorial",
      "description": "Practical tutorial on VideoObject schema implementation.",
      "thumbnailUrl": "https://aeo-expert.nl/images/video/videoobject-thumb.jpg",
      "uploadDate": "2026-04-24T10:00:00+02:00",
      "duration": "PT18M45S",
      "embedUrl": "https://www.youtube.com/embed/ghi789rst"
    }
  ]
}
</script>

The "video" property on the Article references the VideoObject via @id. This tells machines that the video is an integral part of the article, not a standalone embed. AI models use this relationship to understand that the video and the text together tell a complete story.
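The "#main-video" identifier in the example is a relative reference that resolves against the page URL. Some implementations prefer to write the full URL so the identifier stays unambiguous if the markup is reused elsewhere; the sketch below builds the @graph that way. The page URL and the helper name article_with_video are hypothetical.

```python
import json


def article_with_video(page_url: str, article: dict, video: dict) -> dict:
    """Link an Article and its VideoObject in one @graph via a shared @id (hypothetical helper)."""
    video_id = f"{page_url}#main-video"  # absolute form of the "#main-video" reference
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Article", **article, "video": {"@id": video_id}},
            {"@type": "VideoObject", **video, "@id": video_id},
        ],
    }


graph = article_with_video(
    "https://aeo-expert.nl/blog/videoobject-schema",  # hypothetical page URL
    {"headline": "Implementing VideoObject schema", "datePublished": "2026-04-24"},
    {
        "name": "Implementing VideoObject schema: video tutorial",
        "uploadDate": "2026-04-24T10:00:00+02:00",
        "embedUrl": "https://www.youtube.com/embed/ghi789rst",
    },
)
print(json.dumps(graph, indent=2))
```

Setting the @id last means a stray @id in the input cannot break the link between the two nodes.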

Key takeaways

  • VideoObject schema makes video content visible to machines that cannot watch the video itself, by offering essential metadata in a structured format.
  • Google uses VideoObject for video rich results with large thumbnails and play buttons, which can increase click-through rate by up to 41%.
  • The transcript property is the most valuable element for AI citations: it makes spoken content searchable and citable.
  • Clip markup via hasPart describes key moments within the video, which Google can display as clickable timestamps in search results.
  • Always combine VideoObject with Article or WebPage schema via a @graph array when the video is part of a larger content piece.

Frequently asked questions

Should I add VideoObject schema if my video is on YouTube?

Yes, absolutely. YouTube generates its own structured data, but that is limited to the YouTube platform. By adding VideoObject schema to your own page where the video is embedded, you make the video visible in the context of your own website. Google can then show your page as a video rich result, instead of only the YouTube result.

How long should the transcript be?

The transcript should contain the complete spoken content of the video. Shortened or summarized transcripts miss valuable information that AI models could cite. If a complete transcription is not feasible, provide at minimum an extensive summary of the main points. The more complete the transcript, the greater the chance of AI citations.

What is the difference between VideoObject and embedding?

Embedding is visually displaying a video player on your page. VideoObject schema is the structured metadata that tells machines what the video contains. They complement each other: the embed provides the visual experience for visitors, the schema provides the machine-readable description for search engines and AI models. Embedding without schema is like a movie without a description in the program guide.

How many Clip objects can I add?

There is no technical maximum, but Google shows a maximum of five key moments per video in search results. Therefore focus on the most valuable segments, typically five to ten, rather than marking up every minute of the video. Choose segments that provide standalone value and that match frequently asked questions or search terms. More than twenty clips per video dilutes the signal without additional benefit.

Can VideoObject schema help rank video content?

VideoObject schema is a prerequisite for video rich results in Google, and video rich results generate significantly more clicks than regular search results. While the schema itself is not a direct ranking factor, it increases the chance that Google prominently displays your video. Combined with a transcript and clip markup, you maximize the visibility and discoverability of your video content.

Video is one of the most consumed content formats on the web. VideoObject schema is the bridge that ensures machines do not overlook this rich content.
