The story of the recovery

I can give my half, though some of it is lost to faded memory. As I recall, after trying to play the game and having it crash pretty quick, I used ‘txd’ to disassemble it. It was too badly damaged for that, so I hacked up ‘txd’ to basically ignore errors as much as possible and plough on. If you try this, the end of the disassembly before it fails entirely contains a bunch of this


      13006:  0d 0d 0d                store           local12 #0d
      13009:  0d 0d 0d                store           local12 #0d
      1300c:  0d 0d 0d                store           local12 #0d
      ...
      13075:  0d 0d 0d                store           local12 #0d
      13078:  12 80 02 62             get_prop_addr   "tower" #02 -> g52
  

That's not good, but it pointed me in more or less the right direction, and had me guessing some kind of run-length-encoding thing. I think I tried just removing the '0d's and getting some glimpses of text beyond that point. Eventually I looked further back in the disassembly and noticed this section:


      12954:  09 09 09 09             and             #09 #09 -> local8
      12958:  09 09 09 09             and             #09 #09 -> local8
      1295c:  09 09 09 09             and             #09 #09 -> local8
      12960:  09 09 09 09             and             #09 #09 -> local8
      12964:  09 09 09 09             and             #09 #09 -> local8
  

Somewhere around this time I realized I needed to look in the .dsk file, not the .z3 file, because of the effects of interleave. I found that removing some of the '09's made the interleave work right (for track $15 — an Apple II track is 4096 bytes, and the first three tracks aren't used for the game, so track $15 is 0x12000-0x12FFF in the Z-code). This confirmed the idea that I was looking at a disk image that had been compressed and then uncompressed incorrectly. I removed the '09s', shifted a section by a byte, matched some other damaged parts up against v4 to deduce what they were, and I had a good track $15.

I removed the '0d's, but while track $16 certainly seemed to have snippets of good text and code, it was extremely damaged. By comparing it to v4, I was able to figure out some patterns

  1. Except at the beginning and end of the track, some byte values were completely fine.

  2. Other byte values could represent 1,2, or 3 different values, with no indication of which is which.

  3. The beginning and end of the track were a mess.

I think I worked through track $16 and part of $17, putting in patches by modifying a copy of the ‘ap2inf’ program to apply them as it extracted the .dsk file, before I realized what was going on. Some sort of table-based compression scheme had been used on each track separately, and the tables had been corrupted. Since DDD was far and away the most common compressor in the Apple ][ pirate scene (and this file proudly presented itself as “Cracked by Coast to Coast”), I thought of it immediately. I tracked down an annotated copy of DDD (the internet is amazing), and things became clear.

The DDD compression scheme is simple. The whole disk is recorded as a single bit stream. Each track begins with 20(decimal) octets which are the 20 most common bytes in the uncompressed track. These 20 octets map to prefix codes which are 4,5,6, or 7 bits long. There are 2 8 bit prefix codes, one of which is used to indicate run length encoding; it's followed by two more 8 bit codes with the byte to repeat and the repeat count. The other is never written but if read, is also treated as the RLE code. All other bytes are encoded with 9 bits.

So what had happened is there was some decoding error on track $15, which had resulted in two erroneous RLE codes being read. This caused DDD to fill its track buffer before the end of the compressed track. It then wrote out the damaged track and read the next 20*8 = 160 (decimal) bits into its compression table. Then it started merrily reading along using a completely wrong compression table… and then it started reading the real compression table for the next track as if it were ordinary data. And then it started reading the next track, still with the bogus compression table. Because it started early, it would end early, and the same thing would happen with every subsequent track, with usually minor shifts — though towards the end there's another RLE code error which shifted it more.

This explains the substitutions. The prefix code '1100' (which as it turns out always represents '00', because that's the most common byte in every track) would decode as whatever the first byte in that nonsense table was — hex '62', for track $16. But the 9-bit code for hex '62' would ALSO decode as hex '62'. And to make it worse, while the bytes in the real tables were always unique, the bogus tables were under no such constraint. On track $16, another prefix code, which should properly represent '02', also was in the table as '62'. So the correct '00', '02', and '62' would be '62' in the damaged file.

It also explains the mess at the beginning and end of the track. Basically each track can be divided into several regions

  1. Garbage caused by “decoding” the frequency table
  2. Decoded data where the decoder is not aligned with the start of the codes
  3. The bulk of the track, aligned data decoded with the broken table that's really some previous track data,
  4. “Lost” data, where the decoder was reading track data as a new frequency table.
  5. More misaligned data
  6. Aligned data decoded with the new frequency table.

What I did is use comparison with v4 to guess the alignment of section 3 and discard section 1. If I got this wrong I could tell because the interleave would be wrong and things would break at address $xx100, so I could adjust it. I would go through and, by comparing with v4 (repeatedly running my hacked txd), fill in section 2 and substitute the proper values in for section 3. Then when I hit section 4 (substitutions would stop making sense), I would guess the alignment of section 6, fill in the “hole” where 4 and 5 were, and do substitutions for section 6.

Originally I was doing this by patching a copy of ap2inf by hand for each substitution. This was not going to get done before the heat death of the universe. After Alessandro contacted me I started making new tools. First was modifying ap2inf to take a patch list. The patch list could contain ‘default’ substitutions; this is useful because it turns out the literal substitution is by definition the least likely value for a byte to be. Also if you can fill in opcode bytes (especially PRINT and PRINT_RET) automatically, the disassembly works a lot better. Next was a program that would go through and find the next uncertain byte in a track, and suggest the possible substitutions (based on the ones I did already). Then some emacs fun — I wrote some elisp to hilight the byte I was working on in a side-by-side view of the v65 and v4 disassemblies. And an automatic “string rectifier” which would fix identical strings automatically when I reached them. All this sped things up enormously.

Alessandro figured out how to recover Section 4 and the misaligned sections; one of those sections falls on a difference between v4 and v65.

We each independently did a full repair, and checked the files against each other. I don't know his exact technique; I didn't ask, as I figured keeping them different would mean errors would be less likely to be identical between our files. I also melted my eyeballs checking my own work, especially the early stuff. And I wrote tools to tell me of calls to routines which didn’t exist, jumps to bad destinations, two-byte immediates that were neither strings nor routines (there are some legit ones), and finally two-byte immediates and routines which could have been ambiguous (these were very annoying).

There was some other damage to the file, sections of bytes which had been XORed with various “almost” constants. These were aligned to the sectors in the .dsk file and were probably either unrelated later read errors, or happened when writing out the uncompressed data.

I made an attempt to figure out what the damaged compressed data was. What I determined (through the ridiculous speed of modern computers, checking out 2^28 or so possibilities) is there’s no possible pattern of damaged compressed data for track $15, that is the right length, which could have caused the trouble. There is one pattern that's one bit short (though missing a bit is not the ONLY problem, which was a theory at one point, as the very first error is a shifted byte), and many that are longer. So either I've made an error, or the problem was not damage to the compressed file but some sort of actual processor or memory malfunction while decoding.