PZX (Perfect ZX Tape) file format version 1.0 ============================================= The PZX file format consists of a sequence of blocks. Each block has the following uniform structure: 0 u32 tag - unique identifier for the block type. 4 u32 size - size of the block in bytes, excluding the tag and size fields themselves. 8 u8[size] data - arbitrary amount of block data. The block tags are four ASCII letter identifiers indicating how to interpret the block data. The first letter is stored first in the file (i.e., at offset 0 of the block), the last letter is stored last (i.e., at offset 3). This means the tag may be internally conveniently represented either as 32bit multicharacter constants stored in big endian order or 32bit reversed multicharacter constant stored in little endian order, whichever way an implementation prefers. All integral values specified by this format are stored in little endian format, that is, the least significant byte at lower offsets. All values specified are in decimal, unless prefixed with 0x, in which case they are hexadecimal. Any bits specified as unused should be set to zero by encoding implementations and ignored by decoding implementations. All strings specified by this format are stored in UTF-8 encoding. However implementations are not required to render Unicode data at all. Any implementation may choose to render only the characters it understands and replace the rest with either a question mark character, or distinctive hexadecimal representation of the character, whatever the implementor prefers. For example an implementation which understands ASCII characters only may render only those ASCII characters in the 32-126 range it understands. Also note that the minimal compliant implementation is not required to render any strings at all. The sole purpose of the format is to encode sequence of pulses of certain duration. Pulse level is defined to be either low or high. When loading, low level corresponds to the bit 6 of port 0xFE reset (EAR on) and high level set (EAR off). Similarly, when saving, low level corresponds to the bit 3 of port 0xFE reset (MIC on) and high level to set (MIC off). The block descriptions below define the initial level at start of each block (and would do so even for any future block which might be ever introduced, so an implementation may take this for granted), and assume the level changes after each pulse generated (even if its duration is zero). However note that implementations may as well decide to invert that initial level and invert the level before each pulse, if it makes them easier to implement. All that matters is that the level of the pulses themselves remain the same. Any durations are expressed in T cycles of standard 48k Spectrum CPU. This means one T cycle equals 1/3500000 second. It's up to an implementation to convert the durations appropriately if it may not or does not want to use the T cycles directly. However note that this is not necessary in case of 128k emulation. In that case the CPU speed is not substantially different and the ROM I/O routines use the same timing constants anyway, so an implementation may safely use the T cycles directly as well. Overall regarding duration precision, note that the pulses generated by the original Spectrum ROM saving routine itself vary by +1, -3, and -1 T cycles in case of first pulse of first bit of leader, regular, and checksum bytes, respectively. And memory contention can increase this by another 6 T cycles. This means that an encoding implementation may choose to consider pulses which do not vary by more than about 2% as reasonably equal for comparison purposes when encoding pulse durations. Blocks ------ There are four mandatory block types which any implementation must implement in order to claim PZX compatibility, namely PZXT, PULS, DATA and PAUS blocks. All other block types may be safely ignored, although implementors are welcome to implement them as they see fit. However whenever an implementation encounters a block it doesn't understand, it must skip it and continue as if the block was not there at all. In case the specified block size is smaller than the minimum size possible for given block type, it is an error and an application may either ignore the block and try to process the next one or to stop processing the file entirely, whatever an implementor finds more appropriate. Similarly, if some data inside the block indicate that the block should have at least some minimum size but the specified block size is smaller than this, this is an error as well and the above applies as well. To allow custom extensions, anyone is free to define his own custom block, however to prevent clashes with the standard, such blocks may use only lowercase tag names, while any standardized blocks use uppercase tag names. The core blocks which must be supported are: PZXT - PZX header block ----------------------- 0 u8 major - major version number (currently 1). 1 u8 minor - minor version number (currently 0). 2 u8[?] info - tape info, see below. This block distinguishes the PZX files from other files and may provide additional info about the file as well. This block must be always present as the first block of any PZX file. Any implementation should check the version number before processing the rest of the PZX file or even the rest of this block itself. Any implementation should accept only files whose major version it implements and reject anything else. However an implementation may report a warning in case it encounters minor greater than it implements for given major, if the implementor finds it convenient to do so. Note that this block also allows for simple concatenation of PZX files. Any implementation should thus check the version number not only in case of the first block, but anytime it encounters this block anywhere in the file. This in fact suggests that an implementation might decide not to treat the first block specially in any way, except checking the file starts with this block type, which is usually done because of file type recognition anyway. The rest of the block data may provide additional info about the tape. Note that an implementation is not required to process any of this information in any way and may as well safely ignore it. The additional information consists of sequence of strings, each terminated either by character 0x00 or end of the block, whichever comes first. This means the last string in the sequence may or may not be terminated. The first string (if there is any) in the sequence is always the title of the tape. The following strings (if there are any) form key and value pairs, each value providing particular type of information according to the key. In case the last value is missing, it should be treated as empty string. The following keys are defined (for reference, the value in brackets is the corresponding type byte as specified by the TZX standard): Publisher [0x01] - Software house/publisher Author [0x02] - Author(s) Year [0x03] - Year of publication Language [0x04] - Language Type [0x05] - Game/utility type Price [0x06] - Original price Protection [0x07] - Protection scheme/loader Origin [0x08] - Origin Comment [0xFF] - Comment(s) Note that some keys (like Author or Comment) may be used more than once. Any encoding implementation must use any of the key names as they are listed above, including the case. For any type of information not covered above, it should use either the generic Comment field or invent new sensible key name following the style used above. This allows any decoding implementation to classify and/or localize any of the key names it understands, and use any others verbatim. Overall, same rules as for use in TZX files apply, for example it is not necessary to specify the Language field in case all texts are in English. PULS - Pulse sequence --------------------- 0 u16 count - bits 0-14 optional (see bit 15) repeat count, always greater than zero bit 15 repeat count present: 0 not present 1 present 2 u16 duration1 - bits 0-14 low/high (see bit 15) pulse duration bits bit 15 duration encoding: 0 duration1 1 ((duration1<<16)+duration2) 4 u16 duration2 - optional (see bit 15 of duration1) low bits of pulse duration ... ditto repeated until the end of the block This block is used to represent arbitrary sequence of pulses. The sequence consists of pulses of given duration, with each pulse optionally repeated given number of times. The duration may be up to 0x7FFFFFFF T cycles, however longer durations may be achieved by concatenating the pulses by use of zero pulses. The repeat count may be up to 0x7FFF times, however more repetitions may be achieved by simply storing the same pulse again together with another repeat count. The optional repeat count is stored first. When present, it is stored as 16 bit value with bit 15 set. When not present, the repeat count is considered to be 1. Note that the stored repeat count must be always greater than zero, so when decoding, a value 0x8000 is not a zero repeat count, but prefix indicating the presence of extended duration, see below. The pulse duration itself is stored next. When it fits within 15 bits, it is stored as 16 bit value as it is, with bit 15 not set. Otherwise the 15 high bits are stored as 16 bit value with bit 15 set, followed by 16 bit value containing the low 16 bits. Note that in the latter case the repeat count must be present unless the duration fits within 16 bits, otherwise the decoding implementation would treat the high bits as a repeat count. The above can be summarized with the following pseudocode for decoding: count = 1 ; duration = fetch_u16() ; if ( duration > 0x8000 ) { count = duration & 0x7FFF ; duration = fetch_u16() ; } if ( duration >= 0x8000 ) { duration &= 0x7FFF ; duration <<= 16 ; duration |= fetch_u16() ; } The pulse level is low at start of the block by default. However initial pulse of zero duration may be easily used to make it high. Similarly, pulse of zero duration may be used to achieve pulses lasting longer than 0x7FFFFFFF T cycles. Note that if the repeat count is present in case of zero pulse for some reason, any decoding implementation must consistently behave as if there was one zero pulse if the repeat count is odd and as if there was no such pulse at all if it is even. For example, the standard pilot tone of Spectrum header block (leader < 128) may be represented by following sequence: 0x8000+8063,2168,667,735 The standard pilot tone of Spectrum data block (leader >= 128) would be: 0x8000+3223,2168,667,735 For the record, the standard ROM save routines create the pilot tone in such a way that the level of the first sync pulse is high and the level of the second sync pulse is low. The bit pulses then follow, each bit starting with high pulse. The creators of the PZX files should use this information to determine if they got the polarity of their files right. Note that although most loaders are not polarity sensitive and would work even if the polarity is inverted, there are some loaders which won't, so it is better to always stick to this scheme. DATA - Data block ----------------- 0 u32 count - bits 0-30 number of bits in the data stream bit 31 initial pulse level: 0 low 1 high 4 u16 tail - duration of extra pulse after last bit of the block 6 u8 p0 - number of pulses encoding bit equal to 0. 7 u8 p1 - number of pulses encoding bit equal to 1. 8 u16[p0] s0 - sequence of pulse durations encoding bit equal to 0. 8+2*p0 u16[p1] s1 - sequence of pulse durations encoding bit equal to 1. 8+2*(p0+p1) u8[ceil(bits/8)] data - data stream, see below. This block is used to represent binary data using specified sequences of pulses. The data bytes are processed bit by bit, most significant bits first. Each bit of the data is represented by one of the sequences, s0 if its value is 0 and s1 if its value is 1, respectively. Each sequence consists of pulses of specified durations, p0 pulses for sequence s0 and p1 pulses for sequence s1, respectively. The initial pulse level is specified by bit 31 of the count field. For data saved with standard ROM routines, it should be always set to high, as mentioned in PULS description above. Also note that pulse of zero duration may be used to invert the pulse level at start and/or the end of the sequence. It would be also possible to use it for pulses longer than 65535 T cycles in the middle of the sequence, if it was ever necessary. For example, the sequences for standard ZX Spectrum bit encoding are: (initial pulse level is high): bit 0: 855,855 bit 1: 1710,1710 The sequences for ZX81 encoding would be (initial pulse level is high): bit 0: 530, 520, 530, 520, 530, 520, 530, 4689 bit 1: 530, 520, 530, 520, 530, 520, 530, 520, 530, 520, 530, 520, 530, 520, 530, 520, 530, 4689 The sequence for direct recording at 44100kHz would be (assuming initial pulse level is low): bit 0: 79,0 bit 1: 0,79 The sequence for direct recording resampled to match the common denominator of standard pulse width would be (assuming initial pulse level is low): bit 0: 855,0 bit 1: 0,855 After the very last pulse of the last bit of the data stream is output, one last tail pulse of specified duration is output. Non zero duration is usually necessary to terminate the last bit of the block properly, for example for data block saved with standard ROM routine the duration of the tail pulse is 945 T cycles and only than goes the level low again. Of course the specified duration may be zero as well, in which case this pulse has no effect on the output. This is often the case when multiple data blocks are used to represent continuous stream of pulses. PAUS - Pause ------------ 0 u32 duration - bits 0-30 duration of the pause bit 31 initial pulse level: 0 low 1 high This block may be used to produce pauses during which the pulse level is not particularly important. The pause consists of pulse of given duration and given level. However note that some emulators may choose to simulate random noise during this period, occasionally toggling the pulse level. In such case the level must not be toggled during the first 70000 T cycles and then more than once during each 70000 T cycles. For example, in case of ZX Spectrum program saved by standard SAVE command there is a low pulse of about one second (49-50 frames) between the header and data blocks. On the other hand, there is usually no pause between the blocks necessary at all, as long as the tail pulse of the preceding block was used properly. Additional blocks which should be supported: BRWS - Browse point ------------------- 0 u8[?] text - text describing this browse point This block may be used when it is desirable to mark some point on the tape as target for tape browsing. The text is the single line describing this target. If an implementation supports tape browsing, it should implement a mode in which it allows jumping to PZXT or BRWS blocks only. These blocks are the reasonable points to which jumping makes sense, and both provide descriptive name of that point. In case of the PZXT block it is the tape title (or "Tape start" if no name is specified) and in case of BRWS block it is the the description of that browse point. An implementation may choose to also implement a mode in which it allows jumping to arbitrary blocks as well. In such case the blocks may be referred to by their tag names as well, and an implementation may choose to attempt to extract any additional information from the block itself as it sees fit. Creators of the PZX files are urged to put the BRWS blocks to wherever it makes sense. They are often not needed, but they should be inserted prior any level data block to which a user might eventually need to fast forward or rewind to, together with appropriately descriptive name. STOP - Stop tape command ------------------------ 0 u16 flags - when exactly to stop the tape (1 48k only, other always). This block is used to instruct an emulator to stop the virtual tape deck. The flags field is used to specify under which conditions this should happen. In case the value is 0, the tape should be always stopped, in case the value is 1, it should be stopped only if 48k Spectrum is being emulated. Other values are not defined yet, however for future compatibility, any implementation should treat any value it doesn't understand as 0. Creators of the PZX files are urged to put these blocks to whichever position the program asks the user to stop the tape at, and use the flags field appropriately to the situation as well. Issues ====== Q: Some pulse sequences are more common than the others. Shall we include some of them in the format definition and provide a way of simply specifying them in the blocks? For example, PULS block of zero size might refer to the standard pilot sequence. Similarly, in DATA block, setting p0 to 0 might indicate that p1 specifies one of the default sets of sequences, say 0 for standard ZX Spectrum and 1 for ZX81. Do we want to do either or both of this or is it just a feature creep? A: It is better not to hard code any specific knowledge of this type into an implementation. This way an implementation doesn't have to deal with special cases and it has a better chance of understanding any future revisions of the format. The size savings alone are not worth it. F.A.Q. ====== Q: How do I recognize that a DATA block is suitable for flashloading? A: By testing the pulse sequences. What exactly you need to test may depend on what loaders your implementation can flashload. In the common case of the standard loaders you would simply test that each sequence consists of two non-zero pulses, and that the total duration of the sequence s0 is less than the total duration of sequence s1. Mapping of TZX blocks to PZX blocks: ==================================== Here is a brief roadmap how each of the TZX blocks may be mapped to corresponding PZX block(s): ID 10 - Standard speed data block Trivially encoded with PULS and DATA blocks. ID 11 - Turbo speed data block Trivially encoded with PULS and DATA blocks. ID 12 - Pure tone Trivially encoded with PULS block. ID 13 - Sequence of pulses of various lengths Trivially encoded with PULS block. ID 14 - Pure data block Trivially encoded with DATA block. ID 15 - Direct recording block Encoded with DATA block, following the example sequences in DATA block description. ID 18 - CSW recording block Encoded either as PULS block or DATA block, using the DRB like encoding with appropriate denominator in the latter case. In either case, it would help if the pulse durations are cleaned from the noise caused by interfering sampling frequency. ID 19 - Generalized data block Encoded as PULS block and DATA block for alphabets with no more than 2 symbols. In case of 3+ symbols for the data the pilot would be encoded as PULS block, and for data an encoding same as for CSW above would be used. ID 20 - Pause (silence) or 'Stop the tape' command Encoded either as PAUS block or STOP block. ID 21 - Group start Encoded as BRWS block. ID 22 - Group end Not needed. ID 23 - Jump to block ID 24 - Loop start ID 25 - Loop end ID 26 - Call sequence ID 27 - Return from sequence ID 28 - Select block All these were dropped as not really needed. ID 2A - Stop the tape if in 48K mode Encoded as STOP block. ID 2B - Set signal level Not needed, every block defines pulse levels regardless of other blocks. ID 30 - Text description Encoded as BRWS block. ID 32 - Archive info Included as part of PZXT block. ID 31 - Message block ID 33 - Hardware type ID 35 - Custom info block All these were dropped as not needed, although they might be easily included via custom blocks if anyone really has some use for them. ID 5A - "Glue" block (90 dec, ASCII Letter 'Z') Not exactly needed, although PZXT block may be used in the same way as well. History ======= 1.0 (28.6.2007) * Changed ZXTP tag to PZXT to eliminate any possible problems with draft implementations. 0.3 draft (20.5.2007) * Some values are now intentionally limited to 31 bits to prevent overflow issues while decoding. * Number of pulses of DATA block bit sequences is now stored as 8 bits only. 0.2 draft (12.5.2007) * The ZXTP block now stores the info types verbosely (suggested by Kio and AndyC). * The encoding of PULS block now allows 32 bit durations and optional repeat count. * The DATA block and PAUS blocks now contain explicit initial pulse level. * The PAUS block now consists of single pulse, and specifies how random noise should be used. 0.1 draft (28.4.2007) + Patrik Rak presents the initial draft release.