======================= "ZX7" - by Einar Saukas ======================= "ZX7" is an optimal LZ77/LZSS data compressor for all platforms, including the ZX-Spectrum. Available implementations of standard LZ77/LZSS compression algorithm use either a "greedy" or "flexible parsing" strategy, that cannot always guarantee the best possible encoding. In comparison, "ZX7" provides a highly efficient compression algorithm that always generate perfectly optimal LZ77/LZSS encoding. ===== USAGE ===== To compress a file, use the command-line compressor as follows: zx7 Cobra.scr This will generate a compressed file called "Cobra.scr.zx7". Afterwards choose a decompressor routine in assembly Z80 (for the ZX-Spectrum), according to your requirements for speed and size: * "Standard" routine: 69 bytes only * "Turbo" routine: 88 bytes, about 25% faster * "Mega" routine: 244 bytes, about 30% faster Finally compile the chosen decompressor routine and load the compressed file somewhere in memory. To decompress data, just call the routine specifying the source address of compressed data in HL and the target address in DE. For instance, if you compile the decompressor routine to address 65000, load "Cobra.scr.zx7" at address 51200, and want to decompress it directly to the screen, then execute the following code: LD HL, 51200 ; source address (put "Cobra.scr.zx7" there) LD DE, 16384 ; target address (screen memory in this case) CALL 65000 ; decompress routine compiled at this address It's also possible to decompress data into a memory area that partially overlaps the compressed data itself (only if you won't need to decompress it again later, obviously). In this case, the last address of compressed data must be at least "delta" bytes higher than the last address of decompressed data. The exact value of "delta" for each case is reported by ZX7 during compression. See image below: |------------------| compressed data |---------------------------------| decompressed data start >> <---> delta For convenience, there's also a command-line decompressor that works as follows: dzx7 Cobra.scr.zx7 ===== NOTES ===== For simplicity, provided implementation of ZX7 compressor (in C) will load all data in memory at once, process everything, then write the output file directly. If you want to compress a very large file (over 1Gb), it should take just a few seconds (only a few times more than the time it takes to load the file itself from disk), but it will need a modern computer with lots of memory. More specifically, compressing n bytes of data requires approximately 17n bytes of free memory (plus a small constant overhead). Technically it means compressing within asymptotically optimal space O(n), asymptotically optimal expected time O(n), and asymptotically optimal worst case time O(n*w) only. The provided ZX7 decompressor (in C) works even better. It writes the output file while reading the compressed file, without keeping it in memory. Therefore it always use the same amount of memory, regardless of file size. Thus even very large compressed files (over 1Gb) can always be decompressed using very small computers with limited memory, even if it took a lot of memory to compress it originally. It means decompressing within asymptotically optimal space and time O(n) only, although in this case it means storage space O(n) for input and output files, and much smaller memory space O(w) for processing. As a matter of fact, it would be trivial to modify the ZX7 compressor to operate with limited memory too. Theoretically, the ZX7 algorithm only requires optimal memory space O(w) and optimal storage space O(n) for both compressing and decompression n bytes of data with a sliding window of w bytes. However, such compressor implementation would have to write and read intermediate files twice. Since current ZX7 data format is focused on 8-bits Z80 machines using small data blocks (typically under 64Kb), modifying the ZX7 compressor to generate intermediate files is simply not worth it. ====== EXTRAS ====== ZX7 compressor now contains a few extra "hidden" features, that are slightly harder to use properly, and not supported by the ZX7 decompressor in C. These features are only intended for very specific special cases, so you can safely skip this entire section and simply keep using ZX7 regular features as usual. If you really want to use these new "hidden" features detailed in this section, keep in mind that only certain ZX7 Assembly routines will be able to decompress your data afterwards, and they will take additional steps to make it work. Please read carefully these instructions before attempting to use any of them! 1. COMPRESSING BACKWARDS When using ZX7 for "in-place" decompression (decompressing data to overlap the same memory area storing the compressed data), you must always leave a small margin of "delta" bytes of compressed data at the end. However it won't work to decompress some large data that will occupy all the upper memory until the last memory address, since there won't be even a couple bytes left at the end. A possible workaround is to compress and decompress data backwards, starting at the last memory address. Therefore you will only need to leave a small margin of "delta" bytes of compressed data at the beginning instead. Technically, it will require that lowest address of compressed data should be at least "delta" bytes lower than lowest address of decompressed data. See image below: compressed data |------------------| decompressed data |---------------------------------| <---> << start delta To compress a file backwards, use the command-line compressor as follows: zx7 -b Cobra.scr To decompress it later, you must call one of the supplied "backwards" variants of the Assembly decompressor, specifying last source address of compressed data in HL and last target address in DE. For instance, if you compile a "backwards" Assembly decompressor routine to address 64000, load backwards compressed file "Cobra.scr.zx7" (with size 2450 bytes) to address 51200, and want to decompress it directly to the ZX-Spectrum screen (with 6912 bytes), then execute the following code: LD HL, 51200+2450-1 ; source (last address of "Cobra.scr.zx7") LD DE, 16384+6912-1 ; target (last address of screen memory) CALL 64000 ; backwards decompress routine Notice that compressing backwards may sometimes produce slightly smaller compressed files in certain cases, slightly larger compressed files in others. Overall it shouldn't make much difference either way. 2. COMPRESSING WITH PREFIX The LZ77/LZSS compression is achieved by "abbreviating repetitions", such that certain sequences of bytes are replaced with much shorter references to previous occurrences of these same sequences. For this reason, it's harder to get very good compression ratio on very short files, or in the initial parts of larger files, due to lack of choices for previous sequences that could be referenced. A possible improvement is to compress data while also taking into account what else will be already stored in memory during decompression later. Thus the compressed data may even contain shorter references to repetitions stored in some previous "prefix" memory area, instead of just repetitions within the decompressed area itself. An input file may contain both some prefix data to be referenced only, and the actual data to be compressed. An optional parameter can specify how many bytes must be skipped before compression. See below: compressed data |-------------------| prefix decompressed data |--------------|---------------------------------| start >> <--------------> <---> skip delta As usual, if you want to decompress data into a memory area that partially overlaps the compressed data itself, the last address of compressed data must be at least "delta" bytes higher than the last address of decompressed data. For instance, if you want the first 6144 bytes of a certain file to be skipped (not compressed but possibly referenced), then use the command-line compressor as follows: zx7 +6144 Cobra.cbr In practice, suppose an action game uses a few generic sprites that are common for all levels (such as player graphics), and other sprites are specific for each level (such as enemies). All generic sprites must stay always accessible at a certain memory area, but any level specific data can be only decompressed as needed, to the memory area immediately following it. In this case, the generic sprites area could be used as prefix when compressing and decompressing each level, in an attempt to improve compression. For instance, suppose generic graphics are loaded from file "generic.gfx" to address 56000, occupying 2500 bytes, and level specific graphics will be decompressed immediately afterwards, to address 58500. To compress each level using "generic.gfx" as a 2500 bytes prefix, use the command-line compressor as follows: copy /b generic.gfx+level_1.gfx prefixed_level_1.gfx zx7 +2500 prefixed_level_1.gfx copy /b generic.gfx+level_2.gfx prefixed_level_2.gfx zx7 +2500 prefixed_level_2.gfx copy /b generic.gfx+level_3.gfx prefixed_level_3.gfx zx7 +2500 prefixed_level_3.gfx To decompress it later, you simply need to use one of the normal variants of the Assembly decompressor, as usual. In this case, if you loaded compressed file "prefixed_level_1.gfx.zx7" to address 48000 for instance, decompressing it will require the following code: LD HL, 48000 ; source address (put "prefixed_level_1.gfx.zx7" there) LD DE, 58500 ; target address (level specific memory area in this case) CALL 65000 ; decompress routine compiled at this address However decompression will only work properly if exactly the same prefix data is present in the memory area immediately preceding the decompression address. Therefore you must be extremely careful to ensure the prefix area does not store variables, self-modifying code, or anything else that may change prefix content between compression and decompression. Also don't forget to recompress your files whenever you modify a prefix! In certain cases, compressing with a prefix may considerably help compression. In others, it may not even make any difference. It mostly depends on how much similarity exists between data to be compressed and its provided prefix. 3. COMPRESSING BACKWARDS WITH SUFIX Both features above can be used together. A file can be compressed backwards, with an optional parameter to specify how many bytes should be skipped (not compressed but possibly referenced) from the end of the input file instead. See below: compressed data |-------------------| decompressed data sufix |---------------------------------|--------------| << start <---> <--------------> delta skip As usual, if you want to decompress data into a memory area that partially overlaps the compressed data itself, lowest address of compressed data must be at least "delta" bytes lower than lowest address of decompressed data. For instance, if you want to skip the last 768 bytes of a certain input file and compress everything else (possibly referencing this "sufix" of 768 bytes), then use the command-line compressor as follows: zx7 -b +768 Cobra.cbr In previous example, suppose the action game now stores level-specific sprites in the memory area from address 33000 to 33511 (512 bytes), just before generic sprites that are stored from address 33512 to 34535 (1024 bytes). In this case, these generic sprites could be used as sufix when compressing and decompressing level-specific data as needed, in an attempt to improve compression. To compress each level using "generic.gfx" as a 1024 bytes sufix, use the command-line compressor as follows: copy /b "level_1.gfx+generic.gfx level_1_sufixed.gfx zx7 -b +1024 level_1_sufixed.gfx copy /b "level_2.gfx+generic.gfx level_2_sufixed.gfx zx7 -b +1024 level_2_sufixed.gfx copy /b "level_3.gfx+generic.gfx level_3_sufixed.gfx zx7 -b +1024 level_3_sufixed.gfx To decompress it later, use the backwards variant of the Assembly decompressor. In this case, if you compile a "backwards" decompressor routine to address 64000, and load compressed file "level_1_sufixed.gfx.zx7" (with 217 bytes) to address 39000 for instance, decompressing it will require the following code: LD HL, 39000+217-1 ; source (last address of "level_1_sufixed.gfx.zx7") LD DE, 33000+512-1 ; target (last address of level-specific data) CALL 64000 ; backwards decompress routine Analogously, decompression will only work properly if exactly the same sufix data is present in the memory area immediately following the decompression area. Therefore you must be extremely careful to ensure the sufix area does not store variables, self-modifying code, or anything else that may change sufix content between compression and decompression. Also don't forget to recompress your files whenever you modify a sufix! Also if you are using "in-place" decompression, you must leave a small margin of "delta" bytes of compressed data just before the decompression area. ======= LICENSE ======= The optimal C compressor is available under the "BSD-3" license. In practice, this is relevant only if you want to modify its source code and/or incorporate the compressor within your own products. Otherwise, if you just execute it to compress files, you can simply ignore these conditions. The Z80 assembly decompressors can be used freely within your own ZX-Spectrum programs (or any other Z80 platform), even for commercial releases. The only condition is that you must indicate somehow in your documentation that you have used "ZX7". ======= HISTORY ======= 2012-12-30: Initial release (with optimal compressor and Z80 decompressors) 2012-12-31: Minor changes to source code since it wasn't strictly ANSI C (thanks to Metalbrain!) 2013-01-03: Minor changes to data format increasing maximum offset by 6% thus improving compression, and some improvements in the "Standard" Z80 routine (thanks to Antonio Villena!) 2013-01-05: Further improvements in the "Standard" Z80 routine (thanks to Metalbrain!) 2013-02-25: Further improvements in the "Turbo" Z80 routine, released libraries for z88dk and Boriel's ZX BASIC. 2015-09-01: Implemented C decompressor, and updated compressor to calculate and display "delta" 2016-01-21: Improved documentation. 2016-02-11: Added extra features. 2016-02-12: Improved documentation once more. 2016-06-10: Added command-line option "-f" ======= CREDITS ======= The compressed file format is directly based (although slightly improved) on Team Bomba's Bitbuster - http://www.teambomba.net/bombaman/downloadd26a.html and Gasman's Bitbuster Extreme - www.west.co.tt/matt/speccy/apology/ Some of the size improvements used in "Standard" version were suggested by Antonio Villena and Metalbrain. The main speed improvement used in "Turbo" version was originally suggested by Urusergi for Magnus Lind's Exomizer - http://hem.bredband.net/magli143/exo/ The optimal LZ77/LZSS compress algorithm was invented by myself (Einar Saukas). To the best of my knowledge, there was no similar high-performance solution available. I thereby present this implementation as evidence of "prior art", thus preventing anyone from ever patenting it. Software patents are evil!!!