Stage Clear: May 2025

Stage Clear is a monthly roundup of the projects I've been working on.

Scanning

I scanned a new-to-the-internet issue of Total PC Gaming from 2008! There was a lot of learning, and a lot of personal firsts, involved: melting spine glue with a cheap heat gun, using a (very sharp) rotary cutter to clean up rough page edges, scanning an entire magazine page-by-page on a flatbed scanner since the ADF didn't pull the glossy paper through cleanly (a process that kept me on my feet for two straight hours), photoshopping out many, many tears and glue marks, and making my first contributions to both Retromags and OldGameMags. As a result, this issue is currently downloadable at three distinct locations. I think that counts, informally, as preserved.

An update on the PC Gamer issues I mentioned in last month's Stage Clear: actually, I won't be scanning them. I'd checked the web and then said, overconfidently, that they were commercially unavailable, but those issues are still purchasable through the PC Gamer iOS app, and those aren't toes I'm willing to tread on. They've also already been preserved, just not shared with the public. If (or when) the storefront is ever shut down, the people sitting on those rips will need no help from me to get them out there.

Magazine Indexing

Database stats as of the end of May 2025 (delta from last month):

Titles indexed: 26 (+10)
Distinct issues indexed: 2,756 (+630)
Pages indexed: 450,357 (+65,909)

New additions to the index: the years of Australian magazine PC PowerPlay I didn't index last month, every issue of Australian multiplatform magazine Hyper, every issue I could find of PCGames/Electronic Entertainment/PC Entertainment/PC Games (all the same publication), as well as Zero, CD-ROM User, CD-ROM Today, a few issues of Total PC Gaming and GPS News, the newsletter of the Japanese Game Preservation Society (both English and Japanese editions).

Next in the queue, I'll continue the PC focus with Computer Games Strategy Plus/Computer Games.

Video Game Morgue File wiki

I put the magazine index through its paces by pulling every page (pre-2002) that mentioned Baldur's Gate. The cloud OCR is so good that it caught a bunch of stuff the original index (which used local OCR) missed out on, so despite the monthly bills I'm satisfied I'm getting my money's worth.

The purpose was to create and fill out a Baldur's Gate wiki page with every ad, preview and review of it I could find. In only the magazines that have already been indexed, that came to 18 previews, 13 reviews and 8 ads!

The search query also pulled (by design) a ton of material relating to the game's expansion and its sequel, so I'll continue to sort through the output and create pages for those titles too.

Software

I couldn't find a duplicate detector that worked on a document level rather than a byte level, so this month I wrote one.

Here's an example: there's a magazine that's been scanned. Those images are zipped and renamed to .cbz. They're also converted to a PDF. Those two files are uploaded. A few months later, some unscrupulous person changes the metadata on the PDF to add their own URL and reuploads it to their own site. Later, someone else who downloaded that second PDF rips the images back out of it and creates a .cbr to upload somewhere else. And then I come along, trying to find the best available version of every issue of that magazine, and download all four files.

Those files will have completely distinct file contents and checksums, so no byte-level duplicate detector will flag them. But the images are the same in all four. Visually, they're indistinguishable. For the purposes of file hoarding, they're duplicates, and only one needs to be retained.

So I wrote a script to create a content hash for a file: a string that represents all the images inside it and can be compared against the hashes of other files. I've been running it across my collection as I've been going through it. So far I've freed up a couple of dozen gigabytes, but I suspect there are more duplicates still to catch.
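
As a rough illustration of the idea (a sketch under stated assumptions, not the actual script, which isn't shown here): the Python below computes a content hash for zip-based archives such as .cbz using only the standard library. The function name `content_hash` and the `IMAGE_EXTENSIONS` list are made up for this example. It assumes the image bytes survive unchanged between copies, and it ignores file names, page order and container metadata. PDFs and .cbr files would need third-party libraries to get at the images, and re-encoded images would need a perceptual hash rather than SHA-256.

```python
import hashlib
import io
import zipfile

# Hypothetical list of extensions treated as page images.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def content_hash(archive) -> str:
    """Digest the page images in a zip/cbz, ignoring names and metadata.

    `archive` may be a path or a file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        # Hash the uncompressed bytes of every image member.
        page_digests = [
            hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
            and info.filename.lower().endswith(IMAGE_EXTENSIONS)
        ]
    # Sorting the per-page digests makes the result independent of
    # member names and ordering (an assumption of this sketch: two
    # archives holding the same set of images count as duplicates).
    page_digests.sort()
    combined = hashlib.sha256("\n".join(page_digests).encode("ascii"))
    return combined.hexdigest()
```

Two archives with matching hashes hold the same set of images, however they were zipped or renamed, so only one of them needs keeping. Grouping files by this string (for example in a dictionary keyed on the hash) surfaces the duplicate sets.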

It's a messy, hackish bit of coding that's inextricably hooked into the rest of the indexing pipeline, so it's probably not feasible to get this one up on GitHub.

Retrohistories

Lastly, for the first time since last year, I'm back working on the unfinished video again! I don't yet have any progress to show, and I can't promise a specific release date, but that project is the present focus. Between rewriting Magnus, getting this site up and running and general busy-life stuff, I've neglected the channel for far too long and it's high time I started turning around videos again.

And that's it for the roundup. This time next month I'll be in sunny Norway, spending as much time outdoors as I can, so I'll be back at the tail end of July with a roundup of anything I managed to do in the time I wasn't tramping happily through Oslomarka.
