Lightweight Excerpts for Astro

How can I easily create excerpts for my content in Astro?

That was a question that hit me last week when I ported my website over to Astro.

Update 22-11-2024:

I fixed anchor tags being rendered as "The Link Text [/the/link/]". Now only the link text remains.

My quick hack was a Claude-generated function that loaded the Markdown and stripped every bit of Markdown syntax using multiple regular expressions. I didn’t like it, but it did the job.

Until this week…

This week I posted my TIL about disk usage analysis, and I needed to use MDX for it. The resulting HTML was more complex than before, and my excerpt builder went up in flames. Well, that didn’t take long.

A broken auto-generated excerpt, showing internal JS code.

There are quite a few approaches to excerpts out there: Paul Scanlon’s, Igor Dimitrijević’s and Chen Hui Jing’s.

Here is another approach, inspired by Chen’s but relying heavily on popular dependencies:

$ npm i micromark micromark-extension-mdxjs micromark-extension-frontmatter html-to-text

Here’s the function. I placed it into utils/excerpt.ts:

import { micromark } from "micromark";
import { frontmatter, frontmatterHtml } from "micromark-extension-frontmatter";
import { mdxjs } from "micromark-extension-mdxjs";
import { compile } from "html-to-text";

// html-to-text offers a lot of options for customization.
// https://www.npmjs.com/package/html-to-text
const compiledPlaintextConvert = compile({
  // Do not hard-wrap the output at a fixed column.
  wordwrap: false,
  // Formatter options such as ignoreHref are set per selector.
  selectors: [
    // Keep only the link text and drop the href.
    { selector: "a", options: { ignoreHref: true } },
  ],
});

/**
 * Creates an excerpt from markdown content
 * @param content The markdown content
 * @param maxLength Maximum length of the excerpt
 * @returns Truncated content with "…" if needed
 */
export function createExcerpt(
  content: string,
  maxLength: number = 100,
): string {
  // Render Markdown, supports MDX and Frontmatter.
  const html = micromark(content, {
    extensions: [frontmatter(), mdxjs()],
    htmlExtensions: [frontmatterHtml()],
  });
  // Convert HTML to plaintext.
  const plaintext = compiledPlaintextConvert(html);

  // Truncate the content.
  if (plaintext.length <= maxLength) return plaintext;
  const lastSpace = plaintext.lastIndexOf(" ", maxLength);
  return lastSpace === -1
    ? plaintext.slice(0, maxLength) + "…"
    : plaintext.slice(0, lastSpace) + "…";
}
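
To make the truncation step easier to follow, here is a minimal sketch that isolates it from the Markdown rendering. The helper name truncateAtWord is hypothetical and not part of the function above; it assumes its input is already plain text and mirrors the same cut-at-the-last-space logic.

```typescript
// Hypothetical helper for illustration: the truncation step of createExcerpt
// on its own, without the Markdown and HTML conversion.
function truncateAtWord(plaintext: string, maxLength: number = 100): string {
  // Short enough already: return it unchanged, without an ellipsis.
  if (plaintext.length <= maxLength) return plaintext;

  // Look for the last space at or before index maxLength,
  // so the cut never lands in the middle of a word.
  const lastSpace = plaintext.lastIndexOf(" ", maxLength);

  // No space found (one very long word): fall back to a hard cut.
  const cut = lastSpace === -1 ? maxLength : lastSpace;
  return plaintext.slice(0, cut) + "…";
}

console.log(truncateAtWord("The quick brown fox jumps", 12)); // The quick…
console.log(truncateAtWord("Supercalifragilistic", 5)); // Super…
console.log(truncateAtWord("short", 100)); // short
```

Note that the ellipsis is appended after the cut, so the returned string can be one character longer than maxLength.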

I quite like the readability of it. Plus, it supports frontmatter and MDX. Works nicely:

An auto-generated excerpt, this time not showing any JavaScript or HTML. Only good ol’ plain text.