Garbage Collector for the K1-16/16 CPU
======================================

Garbage Collector:
	New definition:
		Allocated Memory management with no explicit MemFree 
		but a JVM style Garbage Collector with Memory Compactor	
	as opposed to old definition: 
		Allocated Memory management with Reference Counting
		and Garbage Collection meant the Memory Compactor only


Garbage Collection vs. Reference Counting
=========================================

Recent Vcc source compiled with Vicci created code with reference counting for allocated objects.
Vcc made a difference between locking references, which incremented the refcnt, and non-locking references,
which were mostly used for function arguments to eliminate superfluous locking & unlocking with delete-test.
This worked good.


What is the main benefit of a memory allocation model with Garbage Collector?
-----------------------------------------------------------------------------

The main benefit is, that you don't have to delete memory in your code. 

Simplification:
This simplifies the source for programs and reduces the amount of opcode variants, e.g. for CatStr 
which actually could have up to 9 variants: with str or str¢ or range for it's two arguments: 3 x 3 = 9.

Exceptions:
It also simplifies the implementation of exceptions drastically:
With a refcnt model the compiler needs to keep track of allocated objects on the stack in order to destroy them,
in case an exception bypasses the normal stack unwinding at procedure exit. Worse: If the program defines allocated objects
in a local context or just somewhere shortly after the first instructions, then the precise amount of unwinding
varies with the position in the program. Even more worse: The exact position of allocated objects on the stack 
relative to the stack top varies in expressions with every intermediate value pushed and popped.
With a GC model there is no need for all this: The compiler just remembers the stack position at the 'try' instruction
and resets the stack to this, if an exception occurs. 

Other benefit:
The GC can detect unreachable rings of objects, which would otherwise keep themselves alive for ever.


What is the main drawback of a GC?
----------------------------------

The Garbage Collector needs to find every living allocated object. In a memory-limited computer like the K1 computer
it is also no option to implement a so-called 'conservative' GC, which blindly believes that every value it sees in 
RAM might be a pointer to allocated memory, which produces many false positives; and therefore always some unused memory 
is not freed during a GC run. 
We need a 'precise' GC. This means, the GC needs knowledge of where to find local and global objects and knowledge of
objects in these objects.

'knowledge of objects in objects': This means it needs a class description associated with ever memory block.
This is quite good: quickly we also use this for virtual functions and reflections. super! :-)

'find global objects': Global variables could be viewed as members in an instance of a "Globals" class for which we 
simply provide a class description as for the regular classes.

'find local objects': These are all on the variables stack, at some times some also on the return stack, by now.
The GC cannot tell which are just numbers and which are pointers to allocated memory. Since the memory model of the 
K1 computer uses handles to movable memory blocks, and these handles are allocated in one area, an address range of 
only at most 8k out of 64k can possibly be a valid memory reference. A conservative GC could exclude at least 7/8 of
all values on the stack. A precise GC requires information which value actually _is_ a memory reference.

A solution could be to put objects on a separate stack. This requires at least one more address register and maybe 
a register for the top memory object. Also memory usage for Vstack and Rstack plus Mstack would be less optimal.

Another solution could be to put memory objects on the return stack. They could be clearly told apart from 
return addresses, which never can point into the memory handles range. An advantage would be, that we could use
this for return to dynamic code in allocated memory too, which needs a reference to the memory block (on the Rstack)
and an offset (on the Vstack or maybe on the Rstack as well). If we want to implement something like that.
Putting arguments on the return stack requires some fiddling with return addresses for the compiler.


Implementation
==============

Memory management is similar to RefCnt management:
Memory blocks are allocated in a free area (the heap) from bottom upwards and handles (pointers) to these blocks 
are allocated from top downwards. Variables point to the handles. Handles have no refcnt.
Each handle is either in the free list or points to a memory block, which may be inside or outside the heap.
EACH memory block is referenced by exactly ONE handle.
Memory blocks start with a length byte (for Arrays) OR a reference to the class description object (for Objects).

Levels of implementation:

1. 	no change to the compiler: we just ignore the difference between str and str¢ and Pop and PopStr.
	no GC at all. (The system will crash after a while. Good enough for testing.)
2.	Add class descriptions and store references to them in objects, in place of the length byte.
	Adapt all opcodes which read or write the length byte.
3.	Add a 'semi precise' GC: It knows how to parse the class description. All global data is made an instance of the 
	Globals class with an appropriate class description. All values on Vstack and Rstack are treated as possible
	references to allocated memory, but only an address range of some 100s (at most 8k) actually point into the
	address range of handles.
4.	Modify the compiler to put allocated objects on the return stack.
5.	Modify the GC to no longer scan the Vstack, which should make it a 'precise' GC. (neglecting that sometimes values
 	are pushed on the Rstack which may be still misinterpreted as a memory reference. So we still have to be careful.)


The Garbage Collection:
-----------------------

This is now more complicated and takes longer. 
Initially written in Vcc, maybe with some hand-optimized Millicode and later partially rewritten in Microcode.

Interrupts are enabled during a GC, but the GC needs to poll them, because the GC is executed entierly in the Microcode Rom.
Interrupt code must not access objects in the heap.
Interrupt code must not allocate, deallocate or resize memory in the system area. 
 
Memory blocks outside the heap are not marked and swept by the GC. 
They are never disposed of or moved by the GC.
They can be accessed by interrupt code while the GC is running, 
	 but the GC must poll interrupts, because the GC executes in Microcode Rom only.
They must not contain (refer to) objects in the heap, because these may look "unreachable" to the GC 
	 and are not accessible for interrupt code during a GC.

Memory outside heap in program area:
	Mostly string literals and maybe some array literals in program code.
	There are no Millicode opcodes for object literals or array of object literals.
	At boot time some objects, e.g. Class objects, might be stored in the program area. (mostly deprecated)
	If a program is unloaded, all living references to Objects in program code must be moved into the heap.

The "System" area:
	Objects, which must be accessible at interrupt time, may also be placed in a special "system" area:
	This is the bottom part of the heap protected by raising heap_start. All objects and arrays in this area 
	must only refer to objects outside the heap: either in the "system" area or in the program area.
	If they are created, deleted or resized then this area is resized, which may require a GC in the heap, 
	and then the object is resized etc. and higher objects are moved accordingly with interrupts off. 
	Changes to objects in the system area are possible but should be rare. 
	Management of objects in the system area requires special variants of memory allocating opcodes.


phase 1: mark: 
Mark all memory in the heap as potentially unreachable. 
This is done by setting the msb of the length byte of all memory blocks in the heap. 

phase 2: sweep: 
Clear the "unreachable" bit in all reachable memory: 
We need a method which cannot fail. Fur minimum space requirement recursions are rewritten to use tables.
In case of table overflow the GC must be able to recover.

phase 2A:
For all bytes on all Vstacks and all Rstacks and maybe some other places,		(note: semi precise GC)	
	if the byte points inside the address range of memory handles,
	and if the handle is in use and points into the heap, then
		clear the "unreachable" bit in the addressed memory block.

Now in the handles area are handles to presumably "unreachable" blocks, and "reachable" blocks
with childs which may still be marked "unreachable" and must be marked "reachable". (not yet "inspected" blocks.)
What we need is a handles area with truely unreachable blocks and "inspected" reachable blocks.
So let's split the handles area with a "slider" into a lower (currently empty) region with only inspected handles
and an upper area of maybe not yet inspected handles, which we'll shrink to zero by advancing the slider:

phase 2B:
While slider did not reach top of handles inspect and skip handle at slider:
	if it's not free handle
	and if it points inside the heap
	and if it points to a "reachable" block or a block without finalizer:
		if it's a flat object or array of basic types, then there is nothing to inspect; skip it.
		else its an Object or Array of Objects: inspect it.

inspect:
	for all contained Objects (note: incl. the Class object)
		if the handle is not NULL and points to an unreachable block:
			mark_reachable()		(note: will add objects to gc_table[], may restart phase 2B!)
	while gc_table[] is not empty:
		pop the last object and inspect it.

mark_reachable:
	marke the object reachable.
	if it points above the slider: it will be inspected later anyway: done
	if it's an array of basic types: there is nothing to inspect: done
	now: it's a not-yet inspected reachable block on the wrong side of the slider.
	put it in the gc_table[] for later non-recursive inspection:
		if there is space in the table: store it. done.
		else reset phase 2B: 
		reset slider and clear table. 
		goodie: inspect objects above slider to reduce the risk of another reset of phase 2B.


All memory still marked as "unreachable" IS unreachable 
but blocks with finalizers must be preserved until their finalizer was run:

phase 3: Find and Register Finalizers
Find all unreachable memory blocks which have a finalizer code.
	mark them reachable
	while there is space in the finalizers[] table, put them in
	else just don't: the finalizer will be called in the nect GC


All memory still marked as "unreachable" IS unreachable 
and will now be disposed:

phase 4: Prepare for non-quadratic Memory Compaction:
Exchange pointer and length byte in all handle - memory pairs:
For all memory handles:
	If the handle points into the heap, then
		if the memory block is still marked unreachable
			add the handle to the free_handles list.
			clear the msb of the memory block's length byte (it was used for the "unreachable" bit)
			so that it can be told apart from a pointer into the handles area
		else
			store the length byte of the memory block in the handle and
			store the address of the handle in the length byte of the memory block.

phase 5: Memory Compaction:
For all memory blocks in the heap:
	If the length byte is a pointer into the handles region,
	then this is a reachable block:
		move memory down
		store length byte from handle in the length byte
		store address of memory block in the handle	
	else this is an unreachable block:
		step over it
Clear the reclaimed memory.		
Store the final gap_start pointer.

phase 6: Run Finalizers:
note: unlike Java we guarantee that the finalizer is called, unless the system crashes…
For all handles in the finalizer list: 	
	call the finalizer of the object.
	clear finalizer bit in the memory block's length byte: memory will be reclaimed in the next GC
	catch outofmemory: 
		don't clear the finalizer bit. chances are we can run it later.
	catch other exceptions: 
		log error, clear finalizer bit. give up with this finalizer.


The Length Byte:  (note: bytes are 16 bit wide)
----------------

Some information is stored in the 8 msbits of the length byte of memory blocks:

%1.......	msb toggled during Garbage Collection: block may be "unreachable".
%.0......	Flat Object or Array: non-object items only. Length = lower 14 bits. (16k-1 max). 
%.10.....	Array of Objects. Length = lower 13 bits. (8k-1 max). 

%.11.....	Object with type info. The first data member (the first byte after the length byte) is the Class object.
%.111....	Object with Finalizer which was not yet executed. The finalizer can be found in the Class object.
			When the finalizer was run at the end of a GC, this bit is reset and the object disposed in the next GC.
%.11.abcd	reserved.
			

Class "Class" and Class Objects
-------------------------------

All Objects are objects of a class and their first data member is a "Class Object".
Class Objects describe classes. They are Objects of the class "Class". 
They are named after the class they describe. (unless this results in syntactical hickup)
For each class defined in a program the compiler creates a Class Object.
Class "Class" itself is described by a Class Object. 
The Class Object of Class Objects is the Class Object for class "Class" or "NamedClass".

Eventually all Objects are based on a baseclass named "Object".
Then data member "class" could be managed by the programmer, 
but it adds a level of recursion to Classes in the GC
and does not eliminate all special handling for Objects for the compiler.


	type Class =
	{
		Class 	class;		// Class Object of this Class Object: the Class Object for class "Class"
							// eventually not visible in sources.

		uint	length;		// length of objects of this class plus the msbits
							// this becomes the memory block's length byte.
							// required for allocating memory for an object of this class.
	
		Class	baseclass;	// if this class is based on a parent class, else NULL.
	
		uint[]	o_offsets;	// list of offsets of all subobjects and subarrays in objects of this class
							// including offsets from base class. Needed by the GC. May be NULL.
	
		void()[] vtable;	// list of virtual functions. Incl. those from the base class,
							// either the same or the overloaded function. May be NULL.
							// if class has a finalizer then vtable[0] = finalizer. (length = %0111...)
	}
	
	type NamedClass = Class +
	{
		str		name;		// class name incl. names of surrounding scopes
		str[]	m_names;	// list of names of data members, excluding base class and hidden ones. May be NULL.
		Class[]	m_types;	// list of types of data members, excluding base class and hidden ones. May be NULL.
							// some special, low values indicate basic types + bitmask to indicate arrays.
		uint[]	v_types;	// typedefs for the functions in the vtable[]. Or maybe FQNs. Todo...
	}


A class info of type "Class" contains only informations required for allocation, GC and for virtual functions.
A class info of type "NamedClass" adds names and type infos for it's data members.
This info is useful for reflection and to display the object in a debugger's variables view.
Which one is used is a matter of settings in the compiler.
Named classes are not used by default because they are so fat: they allocate 4 more objects in the heap.
	
	global Class Class = 
	{
		// Class Object for this Class Object itself:
		// class = Class;					// (added by compiler, invisible in source)
		
		// Description for class "Class":
		length = 5 + %0110 << 12;			// Object with type info, no finalizer
		baseclass = NULL;
		o_offsets = [2,3,4];				// note: the Class Object needs not be in here
		vtable = NULL;		
	}

	global Class NamedClass = 
	{
		// Class Object for this Class Object itself:
		// class = Class;					// (added by compiler, invisible in source)
		
		// Description for class "NamedClass":
		length = 9 + %0110 << 12;			// Object with type info, no finalizer
		baseclass = Class;
		o_offsets = [2,3,4,5,6,7,8];
		vtable = NULL;
	}

with support for Debugger and Reflection:

	global NamedClass Class = 
	{
		// Class Object for this Class Object itself:
		// class = NamedClass;				// (added by compiler, invisible in source)

		// Description for class "Class":
		length = 5 + %0110 << 12;			// Object with type info, no finalizer
		baseclass = NULL;
		o_offsets = [2,3,4];				// note: the Class Object needs not be in here
		vtable = NULL;		

		name = "Class";
		m_names = [];		// TODO
		m_types = [];		// TODO
		v_types = [];		// TODO
	}

	global NamedClass NamedClass = 
	{
		// Class Object for this Class Object itself:
		// class = NamedClass;				// (added by compiler, invisible in source)

		// Description for class "NamedClass":
		length = 9 + %0110 << 12;			// Object with type info, no finalizer
		baseclass = Class;
		o_offsets = [2,3,4,5,6,7,8];
		vtable = NULL;

		name = "NamedClass";
		m_names = [];		// TODO
		m_types = [];		// TODO
		v_types = [];		// TODO
	}


The Handles:
------------

Handles are pointers to memory blocks inside or outside heap, and are stored at the top of the heap growing downwards.
There's an unidirectional list running through all free handles starting at free_handles. 
Whether a handle is free or used can be determined by it's address: if it points above handles_start is's a free handle.

With the 'semi precise' GC, we must consider that small negative numbers on the stacks are mistaken for handle references.
The most common negative numbers are the very small ones, which seem to point to the topmost handles.
Since these handles were allocated during system start for final objects, which never become unreferenced,
there is no problem for misinterpreting small negative numbers as pointers to handles. They just "help" to 
keep these objects alive.


Finalizers
----------

Unlike JAVA, finalizers are guaranteed to be called. This simplifies throwing exceptions.
Finalizers are called at the end of a Garbage Collection, when the Heap was already compacted.
It is not guaranteed that finalizers are called in the next GC, because they are collected in a small table which may overflow.
Then the finalizer is called in a later GC.

All objects referenced by unreachable blocks with finalizers are preserved until the finalizer was executed.

All finalizers for all unreachable blocks are called in a totally arbitrary order.
With no respect to any object hierarchy or time of last access or creation or when they became unreachable.
When a finalizer runs, all finalizers in all other accessible objects may or may not yet have been executed.
There might be a test for the finalizer bit, but this won't help to detect 'outer' finalizers in case of recursion.

Finalizers should not do too much work. But still they could trigger another (recursive) garbage collection.
This is detected and finalizers are not called inside the recursive GC.

If a finalizer throws an outofmemory exception, then it will be called again at the end of the next GC.
If a finalizer throws any other exception, this will be ignored and the Object marked as "finalized".

Programs may call finalizers directly, before an objects becomes unreachable.
Programs should call finalizers directly for local objects because then they are disposed of earlier.

A finalizer should not call itself.