Manually archiving Reddit posts with pandoc

Tuesday, March 03, 2026

The other day I wanted to archive a post from Reddit. This is necessary for long-term note taking: Reddit posts and replies disappear regularly because people delete their accounts. Obviously, this is their right, and they probably have good reasons. It does mean that saving a Reddit link to a helpful post is not good enough if you want to reference the post later.

The only truly effective way to be sure you can view the post later on is to copy the text of the post or reply and store it locally. Unfortunately, just selecting the text in the browser does not preserve formatting, inline links, and so on. It turns out there’s a quick and effective way to get, more or less, the original Markdown-formatted source: convert the HTML source to Markdown using pandoc.

Note that you should only do this for reference-type posts and replies, e.g. link dumps, DnD adventure summaries, and how-to guides. Anything that could be interpreted as personal is better left under the author’s complete control.

Here’s how it works:

  1. Open the post you want to archive in the browser, and open the HTML inspector pane (probably by pressing F12).

  2. Select the element containing the text you’re interested in.

  3. Right click > Copy > Inner HTML. Depending on your browser, the way to get the inner HTML might be slightly different.

  4. Paste the inner HTML into a file, e.g. contents.html.

  5. Convert the HTML into Markdown using pandoc:

    pandoc contents.html -o contents.md
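
If you do this more than once, steps 4 and 5 can be wrapped in a small shell function. This is only a sketch: the function name `html2md` and the `PANDOC` override variable are made up for illustration, and the explicit `-f html -t markdown` flags just spell out what pandoc would otherwise infer from the file extensions.

```shell
# html2md: convert a saved HTML fragment to a Markdown file next to it.
# The function name and the PANDOC override are illustrative, not part of pandoc.
html2md() {
    input=$1
    output="${input%.html}.md"   # contents.html -> contents.md
    # -f / -t name the input and output formats explicitly
    # instead of relying on the file extensions.
    "${PANDOC:-pandoc}" -f html -t markdown "$input" -o "$output"
}
```

Calling `html2md contents.html` should then write `contents.md` into the same directory.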

In my experience the output is pretty decent. You might have to fix some stray line breaks, and there are a few formatting constructs pandoc is a bit too eager to preserve. But the plaintext Markdown source already reads better than the raw HTML. With some effort this approach could probably be braincoded into an automated script. The only tricky part to automate is finding the right element to focus on.
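
Some of that cleanup can possibly be pushed onto pandoc itself. The sketch below assumes the over-eager constructs are raw HTML and `<div>`/`<span>` wrappers, which may not match what you see in your own output, so treat the extension list as a starting point to adjust. The function name `html2md_clean` and the `PANDOC` override are hypothetical; the extensions and `--wrap=none` are regular pandoc options.

```shell
# html2md_clean: convert HTML to Markdown with less HTML carry-over.
# Function name and PANDOC override are illustrative only.
html2md_clean() {
    # Disabling these writer extensions asks pandoc not to emit raw HTML,
    # <div>/<span> wrappers, or their ::: and [text]{...} equivalents.
    to='markdown-raw_html-native_divs-native_spans-fenced_divs-bracketed_spans'
    # --wrap=none keeps each paragraph on one line instead of hard-wrapping it.
    "${PANDOC:-pandoc}" -f html -t "$to" --wrap=none "$1" -o "$2"
}
```

Used as `html2md_clean contents.html contents.md`. Line breaks that come from the HTML itself would still need fixing by hand.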


Generated with BYOB. License: CC-BY-SA. This page is designed to last.