Garbage Collector for the K1-16/16 CPU ====================================== Garbage Collector: New definition: Allocated Memory management with no explicit MemFree but a JVM style Garbage Collector with Memory Compactor as opposed to old definition: Allocated Memory management with Reference Counting and Garbage Collection meant the Memory Compactor only Garbage Collection vs. Reference Counting ========================================= Recent Vcc source compiled with Vicci created code with reference counting for allocated objects. Vcc made a difference between locking references, which incremented the refcnt, and non-locking references, which were mostly used for function arguments to eliminate superfluous locking & unlocking with delete-test. This worked good. What is the main benefit of a memory allocation model with Garbage Collector? ----------------------------------------------------------------------------- The main benefit is, that you don't have to delete memory in your code. Simplification: This simplifies the source for programs and reduces the amount of opcode variants, e.g. for CatStr which actually could have up to 9 variants: with str or str¢ or range for it's two arguments: 3 x 3 = 9. Exceptions: It also simplifies the implementation of exceptions drastically: With a refcnt model the compiler needs to keep track of allocated objects on the stack in order to destroy them, in case an exception bypasses the normal stack unwinding at procedure exit. Worse: If the program defines allocated objects in a local context or just somewhere shortly after the first instructions, then the precise amount of unwinding varies with the position in the program. Even more worse: The exact position of allocated objects on the stack relative to the stack top varies in expressions with every intermediate value pushed and popped. With a GC model there is no need for all this: The compiler just remembers the stack position at the 'try' instruction and resets the stack to this, if an exception occurs. Other benefit: The GC can detect unreachable rings of objects, which would otherwise keep themselves alive for ever. What is the main drawback of a GC? ---------------------------------- The Garbage Collector needs to find every living allocated object. In a memory-limited computer like the K1 computer it is also no option to implement a so-called 'conservative' GC, which blindly believes that every value it sees in RAM might be a pointer to allocated memory, which produces many false positives; and therefore always some unused memory is not freed during a GC run. We need a 'precise' GC. This means, the GC needs knowledge of where to find local and global objects and knowledge of objects in these objects. 'knowledge of objects in objects': This means it needs a class description associated with ever memory block. This is quite good: quickly we also use this for virtual functions and reflections. super! :-) 'find global objects': Global variables could be viewed as members in an instance of a "Globals" class for which we simply provide a class description as for the regular classes. 'find local objects': These are all on the variables stack, at some times some also on the return stack, by now. The GC cannot tell which are just numbers and which are pointers to allocated memory. Since the memory model of the K1 computer uses handles to movable memory blocks, and these handles are allocated in one area, an address range of only at most 8k out of 64k can possibly be a valid memory reference. A conservative GC could exclude at least 7/8 of all values on the stack. A precise GC requires information which value actually _is_ a memory reference. A solution could be to put objects on a separate stack. This requires at least one more address register and maybe a register for the top memory object. Also memory usage for Vstack and Rstack plus Mstack would be less optimal. Another solution could be to put memory objects on the return stack. They could be clearly told apart from return addresses, which never can point into the memory handles range. An advantage would be, that we could use this for return to dynamic code in allocated memory too, which needs a reference to the memory block (on the Rstack) and an offset (on the Vstack or maybe on the Rstack as well). If we want to implement something like that. Putting arguments on the return stack requires some fiddling with return addresses for the compiler. Implementation ============== Memory management is similar to RefCnt management: Memory blocks are allocated in a free area (the heap) from bottom upwards and handles (pointers) to these blocks are allocated from top downwards. Variables point to the handles. Handles have no refcnt. Each handle is either in the free list or points to a memory block, which may be inside or outside the heap. EACH memory block is referenced by exactly ONE handle. Memory blocks start with a length byte (for Arrays) OR a reference to the class description object (for Objects). Levels of implementation: 1. no change to the compiler: we just ignore the difference between str and str¢ and Pop and PopStr. no GC at all. (The system will crash after a while. Good enough for testing.) 2. Add class descriptions and store references to them in objects, in place of the length byte. Adapt all opcodes which read or write the length byte. 3. Add a 'semi precise' GC: It knows how to parse the class description. All global data is made an instance of the Globals class with an appropriate class description. All values on Vstack and Rstack are treated as possible references to allocated memory, but only an address range of some 100s (at most 8k) actually point into the address range of handles. 4. Modify the compiler to put allocated objects on the return stack. 5. Modify the GC to no longer scan the Vstack, which should make it a 'precise' GC. (neglecting that sometimes values are pushed on the Rstack which may be still misinterpreted as a memory reference. So we still have to be careful.) The Garbage Collection: ----------------------- This is now more complicated and takes longer. Initially written in Vcc, maybe with some hand-optimized Millicode and later partially rewritten in Microcode. Interrupts are enabled during a GC, but the GC needs to poll them, because the GC is executed entierly in the Microcode Rom. Interrupt code must not access objects in the heap. Interrupt code must not allocate, deallocate or resize memory in the system area. Memory blocks outside the heap are not marked and swept by the GC. They are never disposed of or moved by the GC. They can be accessed by interrupt code while the GC is running, but the GC must poll interrupts, because the GC executes in Microcode Rom only. They must not contain (refer to) objects in the heap, because these may look "unreachable" to the GC and are not accessible for interrupt code during a GC. Memory outside heap in program area: Mostly string literals and maybe some array literals in program code. There are no Millicode opcodes for object literals or array of object literals. At boot time some objects, e.g. Class objects, might be stored in the program area. (mostly deprecated) If a program is unloaded, all living references to Objects in program code must be moved into the heap. The "System" area: Objects, which must be accessible at interrupt time, may also be placed in a special "system" area: This is the bottom part of the heap protected by raising heap_start. All objects and arrays in this area must only refer to objects outside the heap: either in the "system" area or in the program area. If they are created, deleted or resized then this area is resized, which may require a GC in the heap, and then the object is resized etc. and higher objects are moved accordingly with interrupts off. Changes to objects in the system area are possible but should be rare. Management of objects in the system area requires special variants of memory allocating opcodes. phase 1: mark: Mark all memory in the heap as potentially unreachable. This is done by setting the msb of the length byte of all memory blocks in the heap. phase 2: sweep: Clear the "unreachable" bit in all reachable memory: We need a method which cannot fail. Fur minimum space requirement recursions are rewritten to use tables. In case of table overflow the GC must be able to recover. phase 2A: For all bytes on all Vstacks and all Rstacks and maybe some other places, (note: semi precise GC) if the byte points inside the address range of memory handles, and if the handle is in use and points into the heap, then clear the "unreachable" bit in the addressed memory block. Now in the handles area are handles to presumably "unreachable" blocks, and "reachable" blocks with childs which may still be marked "unreachable" and must be marked "reachable". (not yet "inspected" blocks.) What we need is a handles area with truely unreachable blocks and "inspected" reachable blocks. So let's split the handles area with a "slider" into a lower (currently empty) region with only inspected handles and an upper area of maybe not yet inspected handles, which we'll shrink to zero by advancing the slider: phase 2B: While slider did not reach top of handles inspect and skip handle at slider: if it's not free handle and if it points inside the heap and if it points to a "reachable" block or a block without finalizer: if it's a flat object or array of basic types, then there is nothing to inspect; skip it. else its an Object or Array of Objects: inspect it. inspect: for all contained Objects (note: incl. the Class object) if the handle is not NULL and points to an unreachable block: mark_reachable() (note: will add objects to gc_table[], may restart phase 2B!) while gc_table[] is not empty: pop the last object and inspect it. mark_reachable: marke the object reachable. if it points above the slider: it will be inspected later anyway: done if it's an array of basic types: there is nothing to inspect: done now: it's a not-yet inspected reachable block on the wrong side of the slider. put it in the gc_table[] for later non-recursive inspection: if there is space in the table: store it. done. else reset phase 2B: reset slider and clear table. goodie: inspect objects above slider to reduce the risk of another reset of phase 2B. All memory still marked as "unreachable" IS unreachable but blocks with finalizers must be preserved until their finalizer was run: phase 3: Find and Register Finalizers Find all unreachable memory blocks which have a finalizer code. mark them reachable while there is space in the finalizers[] table, put them in else just don't: the finalizer will be called in the nect GC All memory still marked as "unreachable" IS unreachable and will now be disposed: phase 4: Prepare for non-quadratic Memory Compaction: Exchange pointer and length byte in all handle - memory pairs: For all memory handles: If the handle points into the heap, then if the memory block is still marked unreachable add the handle to the free_handles list. clear the msb of the memory block's length byte (it was used for the "unreachable" bit) so that it can be told apart from a pointer into the handles area else store the length byte of the memory block in the handle and store the address of the handle in the length byte of the memory block. phase 5: Memory Compaction: For all memory blocks in the heap: If the length byte is a pointer into the handles region, then this is a reachable block: move memory down store length byte from handle in the length byte store address of memory block in the handle else this is an unreachable block: step over it Clear the reclaimed memory. Store the final gap_start pointer. phase 6: Run Finalizers: note: unlike Java we guarantee that the finalizer is called, unless the system crashes… For all handles in the finalizer list: call the finalizer of the object. clear finalizer bit in the memory block's length byte: memory will be reclaimed in the next GC catch outofmemory: don't clear the finalizer bit. chances are we can run it later. catch other exceptions: log error, clear finalizer bit. give up with this finalizer. The Length Byte: (note: bytes are 16 bit wide) ---------------- Some information is stored in the 8 msbits of the length byte of memory blocks: %1....... msb toggled during Garbage Collection: block may be "unreachable". %.0...... Flat Object or Array: non-object items only. Length = lower 14 bits. (16k-1 max). %.10..... Array of Objects. Length = lower 13 bits. (8k-1 max). %.11..... Object with type info. The first data member (the first byte after the length byte) is the Class object. %.111.... Object with Finalizer which was not yet executed. The finalizer can be found in the Class object. When the finalizer was run at the end of a GC, this bit is reset and the object disposed in the next GC. %.11.abcd reserved. Class "Class" and Class Objects ------------------------------- All Objects are objects of a class and their first data member is a "Class Object". Class Objects describe classes. They are Objects of the class "Class". They are named after the class they describe. (unless this results in syntactical hickup) For each class defined in a program the compiler creates a Class Object. Class "Class" itself is described by a Class Object. The Class Object of Class Objects is the Class Object for class "Class" or "NamedClass". Eventually all Objects are based on a baseclass named "Object". Then data member "class" could be managed by the programmer, but it adds a level of recursion to Classes in the GC and does not eliminate all special handling for Objects for the compiler. type Class = { Class class; // Class Object of this Class Object: the Class Object for class "Class" // eventually not visible in sources. uint length; // length of objects of this class plus the msbits // this becomes the memory block's length byte. // required for allocating memory for an object of this class. Class baseclass; // if this class is based on a parent class, else NULL. uint[] o_offsets; // list of offsets of all subobjects and subarrays in objects of this class // including offsets from base class. Needed by the GC. May be NULL. void()[] vtable; // list of virtual functions. Incl. those from the base class, // either the same or the overloaded function. May be NULL. // if class has a finalizer then vtable[0] = finalizer. (length = %0111...) } type NamedClass = Class + { str name; // class name incl. names of surrounding scopes str[] m_names; // list of names of data members, excluding base class and hidden ones. May be NULL. Class[] m_types; // list of types of data members, excluding base class and hidden ones. May be NULL. // some special, low values indicate basic types + bitmask to indicate arrays. uint[] v_types; // typedefs for the functions in the vtable[]. Or maybe FQNs. Todo... } A class info of type "Class" contains only informations required for allocation, GC and for virtual functions. A class info of type "NamedClass" adds names and type infos for it's data members. This info is useful for reflection and to display the object in a debugger's variables view. Which one is used is a matter of settings in the compiler. Named classes are not used by default because they are so fat: they allocate 4 more objects in the heap. global Class Class = { // Class Object for this Class Object itself: // class = Class; // (added by compiler, invisible in source) // Description for class "Class": length = 5 + %0110 << 12; // Object with type info, no finalizer baseclass = NULL; o_offsets = [2,3,4]; // note: the Class Object needs not be in here vtable = NULL; } global Class NamedClass = { // Class Object for this Class Object itself: // class = Class; // (added by compiler, invisible in source) // Description for class "NamedClass": length = 9 + %0110 << 12; // Object with type info, no finalizer baseclass = Class; o_offsets = [2,3,4,5,6,7,8]; vtable = NULL; } with support for Debugger and Reflection: global NamedClass Class = { // Class Object for this Class Object itself: // class = NamedClass; // (added by compiler, invisible in source) // Description for class "Class": length = 5 + %0110 << 12; // Object with type info, no finalizer baseclass = NULL; o_offsets = [2,3,4]; // note: the Class Object needs not be in here vtable = NULL; name = "Class"; m_names = []; // TODO m_types = []; // TODO v_types = []; // TODO } global NamedClass NamedClass = { // Class Object for this Class Object itself: // class = NamedClass; // (added by compiler, invisible in source) // Description for class "NamedClass": length = 9 + %0110 << 12; // Object with type info, no finalizer baseclass = Class; o_offsets = [2,3,4,5,6,7,8]; vtable = NULL; name = "NamedClass"; m_names = []; // TODO m_types = []; // TODO v_types = []; // TODO } The Handles: ------------ Handles are pointers to memory blocks inside or outside heap, and are stored at the top of the heap growing downwards. There's an unidirectional list running through all free handles starting at free_handles. Whether a handle is free or used can be determined by it's address: if it points above handles_start is's a free handle. With the 'semi precise' GC, we must consider that small negative numbers on the stacks are mistaken for handle references. The most common negative numbers are the very small ones, which seem to point to the topmost handles. Since these handles were allocated during system start for final objects, which never become unreferenced, there is no problem for misinterpreting small negative numbers as pointers to handles. They just "help" to keep these objects alive. Finalizers ---------- Unlike JAVA, finalizers are guaranteed to be called. This simplifies throwing exceptions. Finalizers are called at the end of a Garbage Collection, when the Heap was already compacted. It is not guaranteed that finalizers are called in the next GC, because they are collected in a small table which may overflow. Then the finalizer is called in a later GC. All objects referenced by unreachable blocks with finalizers are preserved until the finalizer was executed. All finalizers for all unreachable blocks are called in a totally arbitrary order. With no respect to any object hierarchy or time of last access or creation or when they became unreachable. When a finalizer runs, all finalizers in all other accessible objects may or may not yet have been executed. There might be a test for the finalizer bit, but this won't help to detect 'outer' finalizers in case of recursion. Finalizers should not do too much work. But still they could trigger another (recursive) garbage collection. This is detected and finalizers are not called inside the recursive GC. If a finalizer throws an outofmemory exception, then it will be called again at the end of the next GC. If a finalizer throws any other exception, this will be ignored and the Object marked as "finalized". Programs may call finalizers directly, before an objects becomes unreachable. Programs should call finalizers directly for local objects because then they are disposed of earlier. A finalizer should not call itself.