This post was generated with AI assistance.

If you use Obsidian for research and watch a lot of YouTube, the Obsidian Web Clipper extension should be one of your most-used tools. The pitch is straightforward: clip any YouTube video directly into your vault as a structured note, complete with metadata, an AI-generated summary, and the full transcript — all without leaving the browser.

The reality, at least recently, is that most templates floating around the internet are broken. YouTube has changed its UI twice in 2026 alone, and each change silently kills the transcript extraction. You clip a video, get a beautiful note with all your frontmatter, and then a completely empty transcript section. No error. Just nothing.

This post walks through exactly what the Web Clipper is doing, what broke, how we fixed it, and what the final working template looks like.


What Obsidian Web Clipper Actually Does

The extension is a browser-side scraper that reads a web page and transforms it into a Markdown note using a JSON template you define. When you click the clip button on a YouTube page, it does three things:

1. Extracts structured metadata from the page’s schema. YouTube embeds a <script type="application/ld+json"> block on every video page containing a schema.org VideoObject (written as @VideoObject in template variables): a standardized data structure with the video title, author, upload date, description, thumbnail URL, duration, and embed URL. The clipper reads this directly. This is reliable because it’s machine-readable data YouTube intentionally publishes.

2. Runs CSS selectors against the live DOM. For anything not in the schema — like the transcript — the clipper uses CSS selectors to find specific HTML elements on the page and extract their text. This is where things get fragile. If YouTube changes its component names or DOM structure, the selector silently returns nothing.

3. Calls an AI model with the extracted context. If your template includes a {{"prompt"}} block, the clipper sends that prompt along with the page context to a language model and injects the response directly into your note. This is how you get the auto-generated summary, key takeaways, and mindmap without any extra steps.
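
The first of those three steps, reading the schema, can be reproduced outside the browser. The following is a minimal Python sketch, not the clipper's actual implementation: it assumes the page HTML is already available as a string, and the names LdJsonCollector, find_video_object, and SAMPLE_PAGE are hypothetical helpers invented for illustration.

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self.blocks.append("".join(self._buffer))
            self._in_ld_json = False


def find_video_object(html):
    """Return the first JSON-LD object whose @type is VideoObject, or None."""
    collector = LdJsonCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole clip
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "VideoObject":
                return item
    return None


# Hypothetical, heavily trimmed stand-in for a real video page.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@type": "VideoObject", "name": "Example talk", "duration": "PT1H23M45S",
 "embedUrl": "https://www.youtube.com/embed/VIDEO_ID"}
</script>
</head><body></body></html>
"""

video = find_video_object(SAMPLE_PAGE)
print(video["name"], video["duration"])
```

Because the data is plain JSON, a lookup like this either finds the object or returns None, which is much easier to detect than a selector that quietly matches nothing.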


The Template We Started With

The original template in circulation looked like this for transcript extraction:

{{selectorHtml:ytd-transcript-segment-renderer .segment-timestamp, ytd-transcript-segment-renderer yt-formatted-string|join:"\n"|markdown|callout:("transcript","Transcript (YouTube)",true)}}

It targeted ytd-transcript-segment-renderer components and extracted both the timestamp and the text from each segment. When it was written, this worked. The problem is that ytd-transcript-segment-renderer no longer exists in YouTube’s DOM. The component was renamed as part of their February 2026 UI overhaul, and then renamed again in March 2026. The selector runs, finds nothing, and returns an empty string — no warning, no fallback.
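
The silent failure is easy to reproduce. The sketch below is a simplified Python stand-in for a selector lookup, assuming a tag-name match is close enough to illustrate the behavior. TagTextCollector and select_text are hypothetical helpers, and hypothetical-renamed-segment is a made-up tag name, not YouTube's real replacement component.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collects text found inside elements with one specific tag name."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name
        self._depth = 0
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag_name:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag_name and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0 and data.strip():
            self.segments.append(data.strip())


def select_text(html, tag_name):
    """Join the text of all matching elements; an empty string means no match."""
    collector = TagTextCollector(tag_name)
    collector.feed(html)
    return "\n".join(collector.segments)


OLD_DOM = "<ytd-transcript-segment-renderer>0:01 Hello</ytd-transcript-segment-renderer>"
# Made-up tag name standing in for whatever the component was renamed to.
NEW_DOM = "<hypothetical-renamed-segment>0:01 Hello</hypothetical-renamed-segment>"

old_result = select_text(OLD_DOM, "ytd-transcript-segment-renderer")
new_result = select_text(NEW_DOM, "ytd-transcript-segment-renderer")
print(repr(old_result))  # the segment text
print(repr(new_result))  # an empty string, with no exception raised
```

Nothing in the second call signals a problem, which is why an empty transcript section is usually the first and only symptom.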

The original also had a few other issues worth cleaning up:

  • Duration was pulled from #ytd-player .ytp-time-duration — a live player DOM element that isn’t always rendered when the clipper runs, producing inconsistent results
  • URL was pulled directly from {{schema:@VideoObject:@id}} which sometimes returns the embed URL rather than the watch URL, breaking the video embed in the note
  • The replace filter syntax on duration used replace:"PT","","S","" — passing two find/replace pairs to a filter that only accepts one, triggering a parse error in newer clipper versions
  • published and created were both typed as "type": "date" when created should be "type": "datetime" to store the full timestamp

What We Changed and Why

Transcript Selector

This is the core fix. As of April 2026, the working replacement is not a CSS selector at all but a built-in variable:

{{transcript}}

Yes — that simple. After going through two broken CSS selector approaches (both targeting the transcript engagement panel by target-id, both subsequently invalidated by YouTube DOM changes), the most reliable solution is the built-in {{transcript}} variable that the Web Clipper populates natively from YouTube’s transcript API rather than scraping the DOM. It doesn’t require the transcript panel to be open. It doesn’t break when YouTube renames a component. It just works.

URL Reconstruction

Changed from {{schema:@VideoObject:@id}} to:

{{schema:@VideoObject:embedUrl|replace:"embed/":"watch?v="}}

The embedUrl field in the schema is consistently present and formatted as https://www.youtube.com/embed/VIDEO_ID. The replace filter converts it to a standard watch URL. This produces a reliable link every time and makes the video embed render correctly in the note.
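
For readers who want to see the same transformation outside the template language, here is a small Python sketch. It assumes the embed URL has the shape described above, and embed_to_watch_url is a hypothetical helper name.

```python
def embed_to_watch_url(embed_url):
    """Mirror the template's replace filter: swap 'embed/' for 'watch?v='."""
    # Only the first occurrence is replaced, so the video ID is left untouched.
    return embed_url.replace("embed/", "watch?v=", 1)


watch_url = embed_to_watch_url("https://www.youtube.com/embed/VIDEO_ID")
print(watch_url)  # https://www.youtube.com/watch?v=VIDEO_ID
```

One caveat with this approach: if an embed URL ever carried its own query string, a plain text replacement would yield a malformed watch URL, so it is worth checking the generated link on the first few clips.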

Duration Cleanup

Changed from a broken replace that passed two find/replace pairs to a single clean operation:

{{schema:@VideoObject:duration|replace:"PT":""}}

YouTube stores duration in ISO 8601 format, something like PT1H23M45S. Stripping the PT prefix gives you 1H23M45S, which is readable enough for a frontmatter field. The original attempt to also strip the trailing S passed a second find/replace pair, which the filter does not accept and newer clipper versions reject.
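
If you post-process clipped notes and want a clock-style value instead, the ISO 8601 string is easy to parse. This Python sketch is one possible approach under the assumption that durations only use hours, minutes, and seconds. strip_pt and to_clock are hypothetical helper names, and none of this runs inside the clipper.

```python
import re

# Matches time-only ISO 8601 durations such as PT1H23M45S, PT5M, or PT42S.
ISO_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def strip_pt(duration):
    """Same effect as the template filter: drop the leading PT."""
    return duration.replace("PT", "", 1)


def to_clock(duration):
    """Convert PT1H23M45S to 1:23:45; return the input unchanged if it does not match."""
    match = ISO_DURATION.match(duration)
    if not match:
        return duration
    hours, minutes, seconds = (int(part or 0) for part in match.groups())
    # Normalize through total seconds so values like PT83M45S also come out right.
    total = hours * 3600 + minutes * 60 + seconds
    h, remainder = divmod(total, 3600)
    m, s = divmod(remainder, 60)
    return f"{h}:{m:02d}:{s:02d}"


print(strip_pt("PT1H23M45S"))  # 1H23M45S
print(to_clock("PT1H23M45S"))  # 1:23:45
```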

AI Prompt Block

The original prompt was passing {{transcript}} and {{selectorHtml:...}} references inside the prompt string itself, which caused unclosed string errors in the clipper’s template parser. The fix is to keep the prompt text clean — no nested variable references, no internal quotes — and let the clipper’s context system handle passing the transcript to the model automatically via the context field.


The Final Template

{
  "schemaVersion": "0.1.0",
  "name": "YouTube (Open Transcript) 2026",
  "behavior": "create",
  "noteNameFormat": "{{schema:@VideoObject:uploadDate|date:\"YYYY-MM-DD\"}} {{schema:@VideoObject:author}} - {{schema:@VideoObject:name|safe_name|trim}}",
  "path": "✂ Clippings/YouTube",
  "triggers": [
    "https://www.youtube.com/watch"
  ],
  "properties": [
    { "name": "title", "value": "{{schema:@VideoObject:name}}", "type": "text" },
    { "name": "channel", "value": "{{schema:@VideoObject:author}}", "type": "text" },
    { "name": "url", "value": "{{schema:@VideoObject:embedUrl|replace:\"embed/\":\"watch?v=\"}}", "type": "text" },
    { "name": "published", "value": "{{schema:@VideoObject:uploadDate|date:\"YYYY-MM-DD\"}}", "type": "date" },
    { "name": "created", "value": "{{time|date:\"YYYY-MM-DDTHH:mm:ssZ\"}}", "type": "datetime" },
    { "name": "duration", "value": "{{schema:@VideoObject:duration|replace:\"PT\":\"\"}}", "type": "text" },
    { "name": "thumbnailUrl", "value": "{{schema:@VideoObject:thumbnailUrl|first}}", "type": "text" },
    { "name": "genre", "value": "{{schema:@VideoObject:genre}}", "type": "multitext" },
    { "name": "watched", "value": "", "type": "text" }
  ],
  "noteContentFormat": "![{{title}}]({{schema:@VideoObject:embedUrl|replace:\"embed/\":\"watch?v=\"}})\n\n{{schema:@VideoObject:description|callout:(\"summary\",\"Description\",true)}}\n\n{{\"Analyze this YouTube video and generate the following sections:\\n\\n## Summary\\n\\nBriefly summarize the video in 3-5 sentences.\\n\\n## Key Takeaways\\n\\nList the 5-7 most important takeaways as concise bullet points.\\n\\n## Mindmap\\n\\nGenerate a Mermaid mindmap of the main topics using simple syntax only, no icons.\\n\\n## Notable Quotes\\n\\nList notable quotes from the transcript.\"}}\n\n{{transcript|callout:(\"transcript\",\"Transcript\",true)}}",
  "context": "# {{schema:@VideoObject:name}}\n\n{{schema:@VideoObject:description}}\n\n## Basic Information\n\n- Link: {{schema:@VideoObject:embedUrl|replace:\"embed/\":\"watch?v=\"}}\n- Channel: {{schema:@VideoObject:author}}\n- Duration: {{schema:@VideoObject:duration|replace:\"PT\":\"\"}}\n\n## Transcript\n\n{{transcript}}"
}
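
Before importing a template like this, a quick sanity check can catch the escaping mistakes described earlier. The Python sketch below is a rough, assumption-laden check rather than a real validator: it only confirms that the JSON parses and that every string field has as many {{ as }}. The unbalanced_fields helper and the trimmed RAW sample, including its deliberately broken property, are hypothetical.

```python
import json


def unbalanced_fields(template):
    """Return the names of string fields where '{{' and '}}' counts differ."""
    problems = []

    def check(label, value):
        if isinstance(value, str) and value.count("{{") != value.count("}}"):
            problems.append(label)

    for key, value in template.items():
        check(key, value)
    for prop in template.get("properties", []):
        check(f"properties.{prop.get('name')}", prop.get("value"))
    return problems


# Trimmed sample for illustration; the "broken" property is missing its closing braces.
RAW = r'''
{
  "name": "Example",
  "noteContentFormat": "{{transcript|callout:(\"transcript\",\"Transcript\",true)}}",
  "properties": [
    {"name": "duration", "value": "{{schema:@VideoObject:duration|replace:\"PT\":\"\"}}", "type": "text"},
    {"name": "broken", "value": "{{schema:@VideoObject:name", "type": "text"}
  ]
}
'''

template = json.loads(RAW)  # raises json.JSONDecodeError if the quoting is wrong
print(unbalanced_fields(template))  # ['properties.broken']
```

A check like this says nothing about whether a filter or variable actually exists in the clipper. It only flags the structural slips that tend to creep in when a template is edited by hand.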

What You Get

When you clip a YouTube video with this template, Obsidian creates a note with:

  • Frontmatter — title, channel, watch URL, publish date, clip timestamp, duration, thumbnail URL, genre tags, and an empty watched field you can fill in later
  • Embedded video at the top of the note
  • Collapsible description callout — the full YouTube description tucked away so it doesn’t clutter your reading view
  • AI-generated analysis — summary, key takeaways, a Mermaid mindmap of the topic structure, and notable quotes, all generated at clip time with no extra steps
  • Collapsible transcript callout — the full transcript available for searching and reference

The note name format uses YYYY-MM-DD Channel - Title, where the date is the video's upload date, so your clippings folder sorts chronologically by publication and is scannable by source.
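
As a rough illustration of that naming scheme, the Python sketch below builds the same kind of file name. The character set stripped here is an assumption about what a file-name-safe title needs and may not match the clipper's safe_name filter exactly. note_name is a hypothetical helper.

```python
import re
from datetime import datetime


def note_name(upload_date, author, title):
    """Build 'YYYY-MM-DD Channel - Title' from schema-style values."""
    day = datetime.fromisoformat(upload_date).strftime("%Y-%m-%d")
    # Assumed set of characters that are unsafe in file names or Obsidian links.
    safe_title = re.sub(r'[\\/:*?"<>|#^\[\]]', "", title).strip()
    return f"{day} {author} - {safe_title}"


name = note_name("2026-04-02T10:00:00-07:00", "Example Channel", "Clipping 101: What broke?")
print(name)  # 2026-04-02 Example Channel - Clipping 101 What broke
```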


Why This Matters for Research

The real value isn’t the individual note — it’s what happens when you have fifty of them linked together in Obsidian. The transcript becomes searchable across your entire vault. The Mermaid mindmap gives you an instant topic scaffold you can expand into your own notes. The watched property lets you build a Dataview query that surfaces everything you’ve clipped but haven’t processed yet.
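
If you are not using Dataview, the same "clipped but not processed" list can be approximated with a short script. This Python sketch makes several assumptions: notes are .md files with simple key: value frontmatter, the folder path matches the template's path setting, and an empty watched value means unprocessed. All function names are hypothetical.

```python
from pathlib import Path


def frontmatter_lines(note_text):
    """Return the lines between the opening and closing '---' markers, if any."""
    lines = note_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return lines[1:index]
    return []  # no closing marker, so treat the note as having no frontmatter


def is_unwatched(note_text):
    """True when the note has a 'watched' property with an empty value."""
    for line in frontmatter_lines(note_text):
        key, separator, value = line.partition(":")
        if separator and key.strip() == "watched":
            return value.strip().strip("\"'") == ""
    return False


def unwatched_notes(folder):
    """List the file names of unprocessed clippings in one folder."""
    return sorted(
        path.name
        for path in Path(folder).glob("*.md")
        if is_unwatched(path.read_text(encoding="utf-8"))
    )


sample = "---\ntitle: Example talk\nwatched:\n---\nNote body"
print(is_unwatched(sample))  # True
```

Calling unwatched_notes on your own clippings folder, for example the "✂ Clippings/YouTube" path from the template, would return the notes still waiting to be processed. The line-based parsing is deliberately naive and will not handle multi-line YAML values.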

For anyone doing serious research from video sources — whether that’s technical content, investigative journalism, academic lectures, or anything else — this turns passive watching into an active, queryable knowledge base. The clip takes about two seconds. The note it produces would take fifteen minutes to write manually.

Templates that scrape the DOM break periodically because YouTube changes its markup. If a transcript comes back empty again, you now know exactly what to look for.