
By Beto Dealmeida

Archiving external resources

Automatically archiving external links in blog posts

One of the things my blog engine, nefelibata, focuses on is data persistence. Posts and their metadata are saved in text files, as either Markdown or JSON, and there is no database. External images are mirrored locally, and the user receives a warning if their blog contains any external CSS, JavaScript or image files.
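The post doesn't include the engine's code, so what follows is only a minimal sketch of the kind of check that could sit behind that warning, written with Python's standard-library html.parser. The names ExternalResourceFinder and find_external_resources are hypothetical, and the sketch assumes that any resource URL with a host name is external; a real check would probably also accept URLs on the blog's own domain.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalResourceFinder(HTMLParser):
    """Collect (kind, url) pairs for stylesheets, scripts and images hosted elsewhere."""

    def __init__(self) -> None:
        super().__init__()
        self.external: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img":
            kind, url = "image", attributes.get("src")
        elif tag == "script":
            kind, url = "javascript", attributes.get("src")
        elif tag == "link" and "stylesheet" in (attributes.get("rel") or "").lower().split():
            kind, url = "css", attributes.get("href")
        else:
            return
        # Assumption: any URL with a host (including protocol-relative
        # "//cdn.example.com/x.js") counts as external; relative paths are local.
        if url and urlparse(url).netloc:
            self.external.append((kind, url))


def find_external_resources(html: str) -> list[tuple[str, str]]:
    """Return every external stylesheet, script and image referenced in `html`."""
    finder = ExternalResourceFinder()
    finder.feed(html)
    finder.close()
    return finder.external


if __name__ == "__main__":
    page = """
    <link rel="stylesheet" href="https://cdn.example.com/style.css">
    <script src="/js/local.js"></script>
    <img src="https://example.com/photo.jpg">
    """
    for kind, url in find_external_resources(page):
        print(f"warning: external {kind} file: {url}")
```

On the sample page, this reports the stylesheet and the image but not the locally hosted script.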

I was reading about the Internet Archive and had an idea. I'm a big fan of the Wayback Machine, and I knew you could ask it to save a page. It turns out this is easy to do programmatically, so I modified my blog engine to request a snapshot of every external link in a blog post. If everything worked, the first link in this post will have two extra attributes, data-archive-url and data-archive-date:

<a
    href="https://github.com/betodealmeida/nefelibata/"
    data-archive-url="https://web.archive.org/web/20200609031420/https://github.com/betodealmeida/nefelibata/"
    data-archive-date="2020-06-09T03:14:20+00:00"
>
    nefelibata
</a>
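Requesting a snapshot and turning the result into those two attributes can be sketched in Python with only the standard library. This is a hypothetical illustration, not nefelibata's code: the helper names are made up, and it assumes that a GET to the Wayback Machine's Save Page Now endpoint (https://web.archive.org/save/ followed by the page URL) reports the new snapshot either in a Content-Location header or by redirecting to it, so check the Internet Archive's current documentation before relying on that. The date handling is firmer: snapshot URLs embed a 14-digit UTC timestamp, which is how 20200609031420 in the example above becomes 2020-06-09T03:14:20+00:00.

```python
import re
from datetime import datetime, timezone
from typing import Optional
from urllib.request import Request, urlopen

# Hypothetical constants for this sketch.
SAVE_ENDPOINT = "https://web.archive.org/save/"
WAYBACK_HOST = "https://web.archive.org"
# Snapshot URLs look like https://web.archive.org/web/20200609031420/<original url>
SNAPSHOT_PATTERN = re.compile(r"/web/(\d{14})/")


def request_snapshot(url: str, timeout: float = 120.0) -> Optional[str]:
    """Ask the Wayback Machine to archive `url`; return the snapshot URL or None.

    Assumption: the save endpoint reports the snapshot in a Content-Location
    header or redirects to it (urlopen follows redirects, so geturl() returns
    the final address).
    """
    request = Request(SAVE_ENDPOINT + url, headers={"User-Agent": "archive-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        candidates = [response.headers.get("Content-Location"), response.geturl()]
    for candidate in candidates:
        if candidate and SNAPSHOT_PATTERN.search(candidate):
            # Content-Location may be a bare path such as /web/<timestamp>/<url>.
            if candidate.startswith("/"):
                candidate = WAYBACK_HOST + candidate
            return candidate
    return None


def archive_attributes(snapshot_url: str) -> dict[str, str]:
    """Build the data-archive-* attributes for a link from its snapshot URL."""
    match = SNAPSHOT_PATTERN.search(snapshot_url)
    if match is None:
        raise ValueError(f"not a Wayback snapshot URL: {snapshot_url}")
    # Wayback timestamps are YYYYMMDDhhmmss in UTC.
    moment = datetime.strptime(match.group(1), "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return {
        "data-archive-url": snapshot_url,
        "data-archive-date": moment.isoformat(),
    }
```

Only request_snapshot touches the network. Keeping archive_attributes a pure function means the date formatting can be checked offline against the snapshot URL shown in the example link.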

I'm now working on the CSS and JavaScript needed to surface these archived links. This way, readers can follow the archived copy if the original page is gone by the time they read the post.
