Over the past two weeks I’ve been on something of a nostalgia trip. After coming across a copy of my old blog on archive.org, and not having a suitable backup of the content itself, I’ve decided to preserve my posts in a useful format for the future.
I spent some time thinking about what format I would ideally like to keep these posts in: text with some simple markup for links and formatting. The obvious choice was Markdown, sticking to the true core of the markup and not using any of the extensions or plugins available for most parsers these days.
Of course not everything should be embedded in the post body, so I took pointers from Hugo and Jekyll and worked the posts into a YAML/Markdown hybrid. Borrowing the --- document separator from YAML’s multi-document feature makes it easy to parse these documents with a few lines of Python:
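    import yaml

    # args.file is the path to a post, e.g. from argparse
    with open(args.file) as fobj:
        raw_doc = fobj.read()

    # maxsplit=2 so a later '---' in the post body is left alone
    _, header, text = raw_doc.split('---', 2)
    docs = [yaml.safe_load(header), text]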
So an example file would look like:
    ---
    title: "Post Title"
    date: 2020-07-09
    tags:
      - tag1
      - tag2
    ---
    Post *content* **goes** [here](#)
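Splitting on the bare string --- can trip over a Markdown horizontal rule or a --- inside a header value. A slightly stricter alternative is to treat only whole lines containing --- as delimiters. The sketch below is one way to do that using only the standard library; the function name split_front_matter and the sample text are made up for illustration, and the returned header string can be handed to yaml.safe_load exactly as before.

```python
def split_front_matter(raw_doc: str) -> tuple[str, str]:
    """Split a post into (YAML header text, Markdown body).

    Assumption: the file starts with a line containing only '---' and the
    header ends at the next line containing only '---'.
    """
    lines = raw_doc.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        raise ValueError("post does not start with a '---' line")

    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            header = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return header, body

    raise ValueError("front matter is never closed by a second '---' line")


# Hypothetical sample post; the body contains a horizontal rule on purpose.
example = """---
title: "Post Title"
date: 2020-07-09
tags:
  - tag1
  - tag2
---
Post *content* **goes** [here](#)

---

Text after a Markdown horizontal rule.
"""

header, body = split_front_matter(example)
print(header)
print(body)
```

Only the first two delimiter lines are consumed, so the horizontal rule in the body survives untouched.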
Gathering The Corpus
As it turns out, I wrote a lot of posts. While I don’t think I was ever at the level of a daily blogger, I was a very directed technical person who only took time to write a blog post when I felt the need to broadcast something to the world. So having a collection of over 300 posts feels impressive to me. Sure, the vast majority of them are small and pointless, but good for me.
The next issue is that these posts were written on various platforms (B2, B2evolution, Mephisto, WordPress), and archive.org stored only one type of output: HTML. Converting these posts is all manual labor: find the post on archive.org, copy and paste the formatted text, and correct the formatting in Visual Studio Code.
So far this process is ongoing. I’ve done 3 years of posts and I’ve still got 10 more years to go…
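The conversion described above was done by hand. Purely as an illustration, a first pass over simple post HTML could be sketched with Python’s standard html.parser module. This is not the process used for these posts; the class and function names are hypothetical, and it only handles paragraphs, emphasis, bold, and links, so real archive.org pages would still need manual cleanup.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Convert a small subset of HTML (p, em/i, strong/b, a) to Markdown."""

    INLINE_MARKERS = {"em": "*", "i": "*", "strong": "**", "b": "**"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []   # output fragments, joined at the end
        self.hrefs = []   # stack of link targets for open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE_MARKERS:
            self.parts.append(self.INLINE_MARKERS[tag])
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.INLINE_MARKERS:
            self.parts.append(self.INLINE_MARKERS[tag])
        elif tag == "a" and self.hrefs:
            self.parts.append("](" + self.hrefs.pop() + ")")
        elif tag == "p":
            # Separate paragraphs with a blank line.
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html: str) -> str:
    """Return a rough Markdown rendering of a simple HTML fragment."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


fragment = '<p>Post <em>content</em> <strong>goes</strong> <a href="#">here</a></p>'
print(html_to_markdown(fragment))
```

Tags outside this small subset are silently dropped and only their text is kept, which is why a manual review pass would still be needed.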
Reviewing My Old Posts
The next job would be to review the posts and see what is applicable and acceptable for the modern internet. 2003-2007 was a strange time on the internet: most of the big social media platforms weren’t around yet, and people generally felt safe posting far too much information online. Everyone had their guard down, and blogs were usually only seen by close friends and family in a person’s “bubble”.
For example, I purchased an Apple PowerBook in 2003, and for some reason I felt the need to share the tracking number online so people could see where my delivery was up to. Today that’d be stupid, really stupid. I think if you did that now on social media someone would call up the courier and re-route your parcel.
Another “elephant in the room” is the direct posting of inane mental drivel from the angst-filled years of my late teens. I was stupid, young, and not really thinking about what I posted and why. In the end I actually lost my job over a post I made on that blog, and today some of those posts would be classified as hateful. Thankfully I’ve grown and I’m not the same person I was back then, so these posts will be confined to this post archive and archive.org forever and will never be published again, stored there as a reminder to myself.
Gathering my posts has been an interesting exercise. While it has been a lot of up-front work, I’ll hopefully be able to avoid repeating it by sticking to the Hugo/Jekyll-style mixed YAML/Markdown format for all future posts. At least now I’ll have a corpus of my output in one place that I can take to any new redesign or site that takes my fancy.
Also, from a GTD perspective this feels like one long review of my past. It has allowed me to see some of my actions in a new light, rediscover old hobbies I’d long since forgotten, and has given me some new aims for the future.