h1 RLES Files

p 2007-02-17 kio@little-bat.de
2009-06-24 revised

h2 Summary

p RLES files are intended to store audio cassette tapes with programs or data for home computers of the eighties. RLES files store only the timings of a square wave signal, which is what most home computers of this era used, without information about the exact signal level. Only output "high" and "low" are encoded. Within this limit they can store any audio tape. In addition they may contain information for display to the user.

p RLES = run-length encoded square wave.

p File name extension: ".rles"

h4 Compression

p Apart from their inherent data reduction compared to full audio files, rles files are not compressed. They are approximately 10 times the size of the data they contain. They should be compressed at file scope with a widely accepted compressor, e.g. ZIP. If the preferred compression method changes over time, they can simply be decompressed and re-compressed with the new method without touching the contents.

p A file format meant for archiving should be as simple as possible.

h2 Format Description

h3 File Format

p All 2 or 4 byte values are stored with the low byte first, as the Z80 does.
    char   = character; 8 bits; UTF-8 encoded Unicode text
    ushort = 2 bytes; 16 bits; lsb first
    ulong  = 4 bytes; 32 bits; lsb first

pre
    ; File magic
    char[8]  "RlesTape"           ; 8 bytes: file magic
    char[4]  "1.1", $00           ; 4 bytes: major + minor version, delimiter
    ; info text for display: (multiple, optional)
    char[4]  "info"
    ulong    length               ; total length of data following
    char[]   "info text",$00      ; UTF-8 with C-style delimiter
    char[]   padding              ; optional padding up to the full length
    ; Raw data: (multiple)
    char[4]  "rles"
    ulong    length               ; total length of data [bytes] following
    ulong    sampling_frequency   ; frequency in hertz
    char[]   rles_data            ; one high and low phase per byte
    ; Header of a file concatenated by the user:
    char[8]  "RlesTape"           ; file magic
    char[4]  "1.0", $00           ; major + minor version
    ...
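p The layout above can be illustrated with a minimal writer sketch. It is not a reference implementation: the helper names make_block and make_rles_file are hypothetical, and the sketch assumes a file with exactly one "info" block followed by one "rles" block and no padding.

```python
import struct

MAGIC = b"RlesTape"


def make_block(block_type: bytes, payload: bytes) -> bytes:
    """Return one block: 4-byte type, 4-byte little-endian length, payload."""
    if len(block_type) != 4:
        raise ValueError("block type must be exactly 4 bytes")
    # "<I" = unsigned 32-bit integer, low byte first (the format's ulong)
    return block_type + struct.pack("<I", len(payload)) + payload


def make_rles_file(info_text: str, sampling_frequency: int,
                   rles_data: bytes, version: bytes = b"1.1") -> bytes:
    """Assemble magic + one "info" block + one "rles" block (hypothetical helper)."""
    header = MAGIC + version + b"\x00"                      # 12 bytes in total
    info = make_block(b"info", info_text.encode("utf-8") + b"\x00")
    rles = make_block(b"rles",
                      struct.pack("<I", sampling_frequency) + rles_data)
    return header + info + rles


# Two data bytes: 8 samples high, then 8 + 15 samples low.
tape = make_rles_file("Demo", 22050, bytes([0x88, 0x01]))
```

p Note that the length field of the "rles" block counts the 4 bytes of the sampling frequency as well as the data bytes.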
h3 The "File Magic"

p The file starts with a "magic cookie". In addition to the file name extension, this file magic identifies a rles file.

p It consists of the text "RlesTape", followed by the major and minor file revision numbers, each exactly one alphanumeric character, separated by a dot "." and terminated by $00. The total size of the file magic is 12 bytes.

p Any decoder which supports a certain major revision number must be able to parse any file with this major revision number. It may ignore blocks it does not yet support because they were introduced by later sub revisions. The major revision number will probably never change.

h3 Block Layout

h4 General Layout

p All data is grouped in "blocks". All blocks, whether already defined or defined in future sub revisions, must follow the scheme below, so that old decoders can still parse the file:

pre
    char[4]  "abcd"          ; type identifier
    ulong    length          ; total byte count of data following
    char     data[length]    ; data following

h4 Public and Private Block Types

p Encoders may store blocks which are not officially defined. So as not to interfere with blocks officially defined in future sub revisions of the file standard, they should use at least one uppercase ASCII letter in the type identifier. Public block types will always contain only lowercase letters and possibly underscores and digits. If you wish to extend the public file version with a new block type, please contact me for submission and discussion.

h4 Unknown Block Types

p Unknown blocks are either blocks defined in later sub revisions or private block types of the encoder; both can be skipped by relying on the general block layout. Or they are simply junk, and skipping will fail. In that case the file may be assumed to end here and may be truncated by an encoder.

h4 Empty Files

p Empty files may either be 0 bytes long (so that "touch foo.rles" works) or consist of the file magic plus some empty "rles" data blocks. A 0 byte long file is the preferred way to store an empty file.
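p The rules for the file magic, the general block layout, unknown blocks and empty files can be sketched as a small reader. This is a sketch under assumptions, not a reference decoder: the names parse_rles and is_private_type are hypothetical, and a block whose declared length runs past the end of the file is treated as junk that ends the file.

```python
import struct

MAGIC = b"RlesTape"


def is_private_type(block_type: bytes) -> bool:
    """Private block types contain at least one uppercase ASCII letter."""
    return any(0x41 <= c <= 0x5A for c in block_type)


def parse_rles(data: bytes):
    """Return (version, blocks); blocks is a list of (type, payload) tuples."""
    if len(data) == 0:
        return None, []                  # a 0-byte file is a valid empty tape
    major, dot, minor = data[8:9], data[9:10], data[10:11]
    if (len(data) < 12 or data[:8] != MAGIC or dot != b"."
            or not major.isalnum() or not minor.isalnum() or data[11] != 0):
        raise ValueError("not a rles file")
    version = data[8:11].decode("ascii")

    blocks = []
    pos = 12
    while pos + 8 <= len(data):          # room for a type and a length field
        block_type = data[pos:pos + 4]
        (length,) = struct.unpack_from("<I", data, pos + 4)
        end = pos + 8 + length
        if end > len(data):
            break                        # junk or truncated: file ends here
        blocks.append((block_type, data[pos + 8:end]))
        pos = end
    return version, blocks


sample = (b"RlesTape1.1\x00"
          + b"info" + struct.pack("<I", 5) + b"Demo\x00"
          + b"XtrA" + struct.pack("<I", 2) + b"\x01\x02"   # private block
          + b"rles" + struct.pack("<I", 6)
          + struct.pack("<I", 22050) + b"\x88\x01")
version, blocks = parse_rles(sample)
known = [(t, p) for t, p in blocks if t in (b"info", b"rles")]
```

p The reader returns every well-formed block and leaves it to the caller to skip the types it does not know, here the private "XtrA" block.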
h3 "rles" Data Block

p There should be at least one "rles" data block in a rles file.

p "rles" blocks are intermixed with other block types.

pre
    char[4]  "rles"
    ulong    length
    ulong    sampling_frequency
    char     data[length-4]       ; rles data

h4 Sampling Frequency

p The "rles" data block starts with the declaration of the sampling frequency used when encoding this block. Encoders should use the 'natural' frequency of their audio hardware or an integer fraction thereof. Typical sampling frequencies are, for example, 44100, 22050 and 11025 Hz. Each "rles" block may have its own sampling frequency.

h4 Run Length Encoded Data

p Each byte of the "rles" data stores the duration of one "high" and one "low" phase of the monaural audio signal.

p The duration of the "high" audio signal phase is stored in the upper nibble and the duration of the "low" audio signal phase is stored in the lower nibble.

p The high nibble of a byte is recorded first and the low nibble is recorded thereafter.

p Polarity is always as saved by the computer but may be inverted by the decoder if desired. Some computers are insensitive to polarity, e.g. the ZX Spectrum; others are not, e.g. the C64.

h5 Nibble Values

p Each nibble may have any value from 0 to 15. The value represents the number of samples at the declared sampling frequency for the corresponding audio signal phase.

p If a nibble value is in the range 1 to 15, then this value is the direct number of samples for this phase. If one nibble is 0, then its counterpart is scaled by 15 (see below). Bytes with two zero nibbles should not be generated and should be skipped and ignored when read.

h5 Zero Nibbles

p If one nibble value is 0, then the signal effectively does not change for the "duration" of this nibble. Zero nibbles are inserted when the opposite level phase is too long to be encoded in one nibble, that is, if the opposite phase's duration is longer than 15 samples.
In addition, if one nibble value is 0, then the opposite nibble is scaled by 15, so that it can span more time. This does not affect accuracy, because the preceding or following adjacent phase length has no 0 counterpart, is unscaled and can store the remainder.

p There are two exceptions: (2009-06-24)
To allow starting with a 'low' level: if the first nibble of the data is 0, then the opposite (following) nibble is not scaled.
To allow storing an odd number of phases: if the last nibble of the data is 0, then the opposite (preceding) nibble is also not scaled.

p Because it is 'the opposite nibble in the same byte' that is scaled by 15, this "long" nibble may come before or after the zero nibble, and thus the "fractional" nibble either precedes the "long" nibble or follows it.

h6 Long "low" phase

p The "fractional" nibble precedes the "long" nibble:

pre
    char $88    ; 8 samples high, 8 samples low
    char $01    ; 0 samples high, 1 x 15 samples low
                ; = 23 samples low total
    char ...

h6 Long "high" phase

p The "fractional" nibble follows the "long" nibble:

pre
    char $10    ; 1 x 15 samples high, 0 samples low
    char $88    ; 8 samples high, 8 samples low
                ; = 23 samples high total
    char ...

h6 Example durations for the ZX Spectrum:

pre
              Sampling freq.    CPU clock   Frequ.
              22050    44100    cycles      Hz
    Pilot:    13       27       2168        807
    Sync1:    4        8        667         2623
    Sync2:    4        9        735         2381
    0-Bit:    5        11       855         2047
    1-Bit:    11       22       1710        1023

h4 Encoding

p Encoders should use the sampling frequency of the built-in audio hardware or an integer fraction thereof, typically 44100/22050/11025 or 48000/24000/12000 Hz.

p The recommended default sampling frequency is 22050 Hz.

p The encoder may allow the user to choose different "quality settings", or it may adjust the sampling frequency on its own in response to the recorded signal.

p Changing the sampling frequency does not necessarily change the file size, as one nibble per high and low signal phase is stored and the number of signal phases does not change.
However, the file size may grow considerably if the run lengths of signal phases frequently do not fit into one nibble. Reducing the sampling frequency also results in a smaller set of distinct values stored in the file, which improves subsequent compression.

h4 Decoding

p As said in the beginning, "rles" data does not preserve information about signal shape and amplitude. It only preserves information about the duration of the high and low audio signal phases.

p The decoder must rescale the run lengths from the sampling frequency declared in the "rles" data block to its own sampling frequency. If encoder and decoder have the same 'natural' sampling frequency, then this is a simple task using integer arithmetic, typically a scale-up, e.g. from 22050 to 44100 Hz.

p If decoder and encoder use different 'natural' sampling frequencies, e.g. if the decoder runs at 48000 Hz and the tape was sampled at 44100 Hz, then most signal phase run lengths will have a fractional part after scaling. However, there is no need to preserve the overall playing time exactly across signal phases, and it is actually better not to. Signal phase run lengths should be rounded immediately on a per-phase basis, and the remaining fractional samples should be discarded and not carried over to the next phase. This way the run length of the current signal phase is rounded as well as possible (±0.5 samples) and the carry does not add to the run length of the next, opposite, signal phase.

p As a result the exact playing time of the whole tape will vary slightly, but the decodability will be better.

h5 Silence Level

p Each byte of the "rles" data defines the duration of one "1" and one "0" phase.

p However, this represents the saved signal as emitted at the saving computer's output data pin. Writing to and reading from the tape recorder implies transmission across analogue signal paths with capacitors, which block low frequency signals and direct current.
When there is no alternating signal, either because the tape stops or because a certain duration of silence is recorded (generated by a permanently high or low level at the saving computer's output pin), the signal at the reading computer's input pin slowly approaches a final "silence" level, which is read as "0" or "1" depending on the reading computer's hardware.

p E.g.: the silence level of the EAR input pin of the ZX Spectrum was "1" up to issue 2 and "0" for later PCB revisions.

p Therefore, when the tape is stopped or when silence is detected, that is, if the upper or lower nibble is repeatedly 0, the decoder must after a certain time toggle from the tape input signal level to the reading machine's silence level (if they are different).

h3 "info" Block

p Info texts are stored intermixed with "rles" data blocks and refer to the immediately following block.

p Info texts may be displayed by the decoder when winding or playing back the rles file. The encoder may try to decode data blocks and their meta info and insert appropriate "info" blocks on its own.

h3 "Rles" Block - Subsequent "RlesTape" Header

p There is no "Rles" block. If a "Rles" block is encountered, this means that the user has concatenated two or more rles files.

p A concatenated rles file should always be detected right after opening, because the appended files may have different file versions than the starting file. A decoder may play the file as far as it can decode it. An encoder must be able to read and write the highest version number encountered in any of the concatenated files, or it must not write to this file at all. If the file is written to, then the superfluous "RlesTape" magics should be removed and all data converted to the most appropriate (probably highest) version number.
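p As a closing illustration, the nibble rules from the "Run Length Encoded Data" section and the per-phase rounding from the "Decoding" section can be sketched as follows. This is a sketch, not a reference decoder. The function names decode_rles and rescale are hypothetical. It assumes that the two exceptions for the first and last nibble apply to the data of each "rles" block separately, and that adjacent runs of the same level are merged into one phase.

```python
def decode_rles(data: bytes):
    """Return a list of (level, samples) runs; level 1 = high, 0 = low."""
    runs = []
    last = len(data) - 1
    for i, byte in enumerate(data):
        high, low = byte >> 4, byte & 0x0F
        if high == 0 and i != 0:
            low *= 15            # zero high nibble: low phase counts in units of 15
        elif low == 0 and i != last:
            high *= 15           # zero low nibble: high phase counts in units of 15
        # Exceptions: a zero first nibble of the data and a zero last nibble
        # of the data leave their counterpart unscaled (handled by the index
        # checks above). A byte with two zero nibbles contributes nothing.
        for level, samples in ((1, high), (0, low)):
            if samples == 0:
                continue
            if runs and runs[-1][0] == level:
                # Same level as the previous run: extend it (long phases).
                runs[-1] = (level, runs[-1][1] + samples)
            else:
                runs.append((level, samples))
    return runs


def rescale(runs, src_hz: int, dst_hz: int):
    """Rescale each run on its own, rounding to the nearest sample.

    The rounding remainder is discarded and never carried to the next run.
    """
    return [(level, (n * dst_hz + src_hz // 2) // src_hz)
            for level, n in runs]


long_low = decode_rles(bytes([0x88, 0x01]))    # example: long "low" phase
long_high = decode_rles(bytes([0x10, 0x88]))   # example: long "high" phase
upscaled = rescale(long_low, 22050, 44100)
```

p For the two example byte sequences from the "Zero Nibbles" section this yields a low phase of 23 samples and a high phase of 23 samples, respectively. Handling of the silence level is left out of the sketch.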