Concept and Interface of
the Cache Package for the virtual motherboard
Concept
Access to RAM over the virtual bus can be slow compared to a builtin simulator.
Therefore, as with real hardware, we use caches to reduce the overhead of the bus.
In the moment we assume having one CPU only, that is, one process that
reads and writes the cache. To make the cache as fast as possible, especially
on systems with several (real) CPUs, other threads - collectively called service threads -
will take care of the interfacing the cache with the virtual bus.
As a general rule, access to the resources should have no, or very little overhead,
in the (normal) case where we do not need the virtual bus.
Because we move access to the virtual bus into separate processes, all forms
of access to the bus is affected. That is, also interrupts, power-on, power-off
and reset, and registration.
Interrupts
This is a simple case, but illustrates the problem. A typical CPU will
check for interrupts once per cycle. If the CPU thread checks for input on the
virtual bus that can be very time consuming. Here is the solution:
We have 64-Bit interrupt vector. The CPU Thread is allowed reading it at any
time. If it is 0, there is no interrupt and the CPU thread can continue. This is the
normal, fast case. If the bitvector is not zero, an interrupt has occurred
and things become different. In this case, the CPU can lock the bit-vector,
copy bits into private variables (typically with an or operation) and reset
the bit-vector to zero.
If an interrupt arrives, the service thread will also lock the bitvector and add a bit to it.
Locking ensures, that write operations on the bit-vector are atomic. The read operation
is not atomic. The CPU gets a snapshot of the bit vector. This is, however, not
a problem. It will see the complete changes, either when locking it or if the
bit-vector is still zero, on the next cycle.
Reset, Power-on, and Power-off
Power is for the CPU thread a read only variable. There is no synchronization
needed. The CPU might miss short power-fails, if it isn't checking fast enough.
Reset needs a kind of acknowledge. Reading the reset variable again needs
no synchronisation. Resetting it to zero needs locking.
Startup and Shutdown
This is done by (one of) the service threads after the threads are started.
The service threads will shut down together and (one of them) will take case
of deregistration.
Cache
Cache organization
A multiway cache is organized as an array of sets of lines. A given
address for memory access is mapped to a set typically by ignoring n low-order Bits and
m high-order Bits. After stripping these bits off, the remaining bits
are the index into the array of sets.
Within the set, there are up to k lines (1<= k).
Each line consists of a start address, 2^n content byte, and again
some administrative information. Next, we match the given address against
the start adresses in the set, neglecting the n low-order bit.
If the given address matches one of the start adresses in the set,
we have a cache hit. The content byte of the line are a valid copy of the
memory at the start adress of the line. In this case, we can use the n low-oder bits of the address
as an indes into the 2^n content byte and access the cache content for the
given address. If there is no match, we have a cache miss. The cache does not
contain any information about the data at the given address.
The number of lines in a set is limited by k. Therefor, we implement a set of
lines as an array with k elements and add to each line a status flag.
If this flag is zero (easy to test) the cache line is part of the set
and read to use.
We provide for a separate data and instruction cache. A second-level cache
might follow at a later point.
Cache operation
If data is requested from the cache, the cache organization
let you find easily a matching cache line. The sets are searched linearly.
If we have cache hit, we can retrieve or store data in the cache line found.
For later use each access on the cache will increment an access counter
(wrap around like a gigantic clock) for the cache as a whole, and stores
the time of the last access in the set.
If we have a cache miss, things are more complex. We have to read the data
over the bus and store it in a cache line, before we can continue in the same
way as with a cache hit. To read the data, we have to interface with the
service tread. To store the data in the cache, we have to find a cache line
that will hold the data. If there is an unused line in the set, we use it.
If all lines are in use, we find the least recently used and reuse it.
Before we can reuse the cache line, we have to check if there were any write
operations on this line. If so, we have to update the memory over the bus before
the reuse.
Locking, Waiting and Threads
The main part of the line is the data_address pair, which contains beside
data and address also status and size. The size is non negative can be less or equal to LINESIZE
The status can be Valid, Invalid, Reading, or Writing.
A line that is invalid is as good as non existing. Only the CPU thread can
change a line from Invalid to something else. No locking is required.
A line that is valid is under complete control of the CPU thread. The service
thread will never touch it.
A line that is valid can be changed by the CPU thread to reading or writing.
It should then forward the line to the service thread, otherwise it will keep that status forever.
A line that is writing is under the control of the service thread. The bus-write thread will
issue a bus message for the write operation and change the status back to valid
and reset the dirty flag. It will signal the CPU thread that a line became
valid again, just in case the CPU thread was waiting for it. If several write requests
are issued, the CPU can still wait only for one of them. If it is not the one it was
waiting for. It can continue waiting.
A line that is reading is under the control of the service thread. The bus-write thread will
issue a bus message for the read operation. Then it will forward the line as pending
to the bus-read thread. When the answer for the read request arrives, the bus-read thread will
check the pending lines, find the line and store the data received. It will then set the
status back to Valid and signals the CPU thread.
Note that only the CPU thread will ever set the address for a line. So what if an answer
arrives and there is no pending line for it? This should not happen in the first place
since requests are made only for lines that are pending.
If it happens anyway, the answer is ignored.
It might very well happen that a line becomes valid but the CPU is not waiting for a signal.
For instance there might be an instruction prefetch that causes a read operation on the
instruction cache. The CPU will issue the read but will not wait for it to complete.
Then the execution may take a different route and the data will never be needed. Or the
execution proceeds as predicted. In this case we will read the instruction, and it might be
in the cache as valid or as reading. In the later case the thread starts waiting for it
until the read completes.
By launching several read or write requests it can happen that the CPU thread can not
find an INVALID or VALID line for the next request. In this case, it will have to wait
until the first line in the set becomes valid.