MICRONOTES
================================================================================
Note 9.0                    LSI-11/73 Cache Concepts                  No replies
JAWS::KAISER                                        307 lines  25-MAR-1985 09:18
--------------------------------------------------------------------------------
      +---------------+					   +-----------------+
      | d i g i t a l |					   |  uNOTE # 009    |
      +---------------+					   +-----------------+


      +----------------------------------------------------+-----------------+
      | Title: Cache Concepts and the LSI-11/73		   | Date: 02-JUL-84 |
      +----------------------------------------------------+-----------------+
      | Originator: Charlie Giorgetti			   | Page 1 of 6     |
      +----------------------------------------------------+-----------------+


      The goal is to  introduce	 the  concept  of  cache  and  its  particular
      implementation  on  the  LSI-11/73  (KDJ11-A).   This  is not a detailed
      discussion of the different cache	 organizations	and  their  impact  on
      system performance.

			      What Is A Cache ?
			      -----------------

      The purpose of having a cache is to simulate a  system  having  a	 large
      amount of moderately fast memory.	 To do this the cache system relies on
      a small amount of very fast,  easily  accessed  memory  (the  cache),  a
      larger  amount of slower, less expensive memory (the backing store), and
      the statistics of program behavior.

      The goal is to store some of the data and its  associated	 addresses  in
      the  cache  and  all  of	the data at its usual addresses (including the
      currently cached data) in the backing store.  If it can be arranged that
      most  of	the  time  when the processor needs data it is located in fast
      memory, then the program will execute more quickly,  slowing  down  only
      occasionally  for	 main memory operations.  The placement of data in the
      cache should not be a concern to the programmer but is a consequence  of
      how the cache functions.

      Figure 1 is an example of a memory organization  showing	a  cache  with
      backing  store.	If  the	 data needed by the microprocessor (uP) can be
      found in the cache then it is accessed much faster due to the local data
      path  and faster cache memory than by having to access the backing store
      on the slower system bus.


				     +-----------+	 System Bus
	+------+  CPU Internal Buses |	System	 | For Memory and I/O Options
	|      |<-----------------+->|	 Bus	 |<-------------------------->
	|  uP  |		  |  | Interface |	      |
	|      |<----------+	  |  +-----------+	      |
	+------+ Fast Path |	  |		      +-------+---------+
		 to Cache  |	  |		      |			|
			 +-+------+--+		      |	 System Memory	|
			 |	     |		      | (Backing Store) |
			 |   Cache   |		      |			|
			 |	     |		      +-----------------+
			 +-----------+

		Figure 1 - An Example Memory System with Cache


      A cache memory system can only work if it can successfully predict, most
      of the time, what memory locations the program will require.  If a
      program accessed data from memory in a completely random fashion, it
      would be impossible to predict what data would be needed next.  In that
      case a cache would operate no better than a conventional memory
      system.

      Programs rarely generate random addresses.  The next memory address
      referenced is usually very near the current address.  This is the
      principle of program locality:  the next address generated is in the
      neighborhood of the current address.  This behavior helps make cache
      systems feasible.

      Program locality is not a law, but a statement of how many programs
      behave.  Many programs execute code in a linear fashion or in loops,
      making the next address predictable.  Jumps and context switching, on
      the other hand, give the appearance of random address generation.
      Predicting what word a program will reference next is never completely
      successful, so the rate of correct "guesses" is a statistical function
      of the size and organization of the cache and the behavior of the
      program being executed.

      Cache performance is measured statistically as the number of memory
      references found versus not found in the cache.  When memory is
      referenced and the address is found in the cache, this is known as a
      hit.  When it is not, it is termed a miss.  Cache performance is
      usually stated in terms of the hit ratio or the miss ratio, where these
      are defined as:


					Number of Cache Hits
		      Hit Ratio =  ---------------------------------
				   Total Number of Memory References



		      Miss Ratio =  1 - Hit Ratio
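      As a quick numerical sketch of these definitions (the reference counts
      below are hypothetical, chosen only for illustration), the two ratios
      can be computed directly:

      ```c
      #include <assert.h>
      #include <math.h>

      /* Hit and miss ratios per the definitions above.  The counts
         passed in are hypothetical, for illustration only. */
      static double hit_ratio(long hits, long total_refs)
      {
          return (double)hits / (double)total_refs;
      }

      static double miss_ratio(long hits, long total_refs)
      {
          return 1.0 - hit_ratio(hits, total_refs);
      }
      ```

      For example, 9000 hits out of 10000 total memory references gives a
      hit ratio of 0.90 and a miss ratio of 0.10.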




		      The LSI-11/73 Cache Implementation
		      ----------------------------------

      The cache organization chosen must be one that can be implemented within
      the physical and cost constraints of the design.

      The LSI-11/73 implements a direct map cache.  A direct map  organization
      has a single unique cache location for a given address and this is where
      the associated data from backing store are maintained.   This  means  an
      access to cache requires one address comparison to determine if there is
      a hit.  The significance of this is that a small amount of circuitry  is
      required	to  perform  the comparison operation.	The LSI-11/73 has an 8
      KByte cache.  This means that there are 4096  unique  address  locations
      each of which stores two bytes of information.


      The cache not only maintains the data from backing store but also
      includes other information needed to determine whether its content is
      valid:  parity bits and a valid-entry bit.  The following diagram
      shows the logical layout of the cache and what each field at each
      cache entry address is used for.


	  Binary Cache
	  Entry Address	   P   V      TAG	P1     B1      P0     B0
			 +---+---+-------------+---+----------+---+----------+
	   000000000000	 |   |	 |	       |   |	      |	  |	     |
			 +---+---+-------------+---+----------+---+----------+
	   000000000001	 |   |	 |	       |   |	      |	  |	     |
			 +---+---+-------------+---+----------+---+----------+
	   000000000010	 |   |	 |	       |   |	      |	  |	     |
			 +---+---+-------------+---+----------+---+----------+
		.				 .
		.				 .
		.				 .

			 +---+---+-------------+---+----------+---+----------+
	   111111111101	 |   |	 |	       |   |	      |	  |	     |
			 +---+---+-------------+---+----------+---+----------+
	   111111111110	 |   |	 |	       |   |	      |	  |	     |
			 +---+---+-------------+---+----------+---+----------+
	   111111111111	 |   |	 |	       |   |	      |	  |	     |
			 +---+---+-------------+---+----------+---+----------+

		      Figure 2 - LSI-11/73 Cache Layout


      The Cache Entry Address is the address of one of 4096 entries within the
      cache.   This  value  has a one-to-one relationship with a field in each
      address that is generated	 by  the  processor  (described	 in  the  next
      section on how the physical address accesses cache).

      Each field has the following meaning:

	  Tag (TAG) - This nine bit field contains information that is
	  compared to the address label, described in the next section on
	  how the physical address accesses cache.  When the physical
	  address is generated, the address label is compared to the tag
	  field.  A match is considered a hit provided the entry is valid
	  and there are no parity errors.

	  Cache Data (B0 and B1) - These two bytes  are	 the  actual  data
	  stored in cache.

	  Valid Bit (V) - The valid bit indicates whether the  information
	  in B0 and B1 is usable as data if a cache hit occurs.	 The valid
	  bit is set when the entry is allocated  during  a  cache  update
	  which occurs as a result of a miss.

	  Tag Parity Bit (P) - Even parity calculated for the value
	  stored in the tag field.


	  Parity Bits (P0 and P1) - P0 is even parity calculated  for  the
	  data	byte  B0 and P1 is odd parity calculated for the data byte
	  B1.
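      The entry layout of Figure 2 can be sketched as a C structure.  This
      is a model for illustration only, not the hardware's actual storage
      format; the field names follow the figure, and the helper shows one
      way an even-parity bit could be computed:

      ```c
      #include <stdint.h>

      /* One cache entry, following the fields of Figure 2. */
      struct cache_entry {
          unsigned tag : 9;   /* TAG: label of the cached address    */
          unsigned v   : 1;   /* V:   entry holds valid data         */
          unsigned p   : 1;   /* P:   even parity over the tag field */
          unsigned p0  : 1;   /* P0:  even parity over data byte B0  */
          unsigned p1  : 1;   /* P1:  odd parity over data byte B1   */
          uint8_t  b0;        /* B0:  low data byte                  */
          uint8_t  b1;        /* B1:  high data byte                 */
      };

      /* Even-parity bit for a value: set when the value has an odd
         number of 1 bits, so that value plus parity together contain
         an even count of 1 bits. */
      static unsigned even_parity(unsigned value)
      {
          unsigned ones = 0;
          while (value) {
              ones += value & 1u;
              value >>= 1;
          }
          return ones & 1u;
      }
      ```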

      When the processor generates a  physical	address,  the  on-board	 cache
      control  logic must determine if there is a hit by looking at the unique
      location in cache.  To determine	what  location	to  check,  the	 cache
      control logic considers each address generated as being made up of three
      unique parts.  The following are the three fields of  a  22-bit  address
      (in an unmapped or 18-bit environment the label field is six or four
      bits smaller, respectively):


       21 20 19 18 17 16 15 14 13    12 11 10 09 08 07 06 05 04 03 02 01    00
      +--+--+--+--+--+--+--+--+--+  +--+--+--+--+--+--+--+--+--+--+--+--+  +--+
      |	 |  |  |  |  |	|  |  |	 |  |  |  |  |	|  |  |	 |  |  |  |  |	|  |  |
      +--+--+--+--+--+--+--+--+--+  +--+--+--+--+--+--+--+--+--+--+--+--+  +--+

      |<-------- LABEL --------->|  |<-------------- INDEX ------------>|  BYTE
									  SELECT

    Figure 3 - Components of a 22-bit Address For Cache Address Selection

      Each field has the following meaning:

	  Index - This twelve bit field determines which one of	 the  4096
	  cache	 data  entries to compare with for a cache hit.	 The index
	  field is the displacement into the cache and corresponds to  the
	  Cache Entry Address.

	  Label - Once the location in the cache is selected, the nine bit
	  label	 field	is  compared  to the tag field stored in the cache
	  entry under consideration.  If the address  label  and  the  tag
	  field match, the valid bit is set, and there is no parity error,
	  then a hit has occurred.

	  Byte Select Bit - This bit determines if the reference is on	an
	  odd  or  even	 byte  boundary.  All Q-bus reads are word only so
	  this bit has no effect on a cache read.  Q-bus writes can access
	  either  words or bytes.  If there is a word write the cache will
	  be updated if there is a hit.	 If there is a miss  a	new  cache
	  entry	 will  be  made.  If there is a byte write, the cache will
	  only be updated if there is a hit.  A miss will not create a new
	  entry on a byte write.
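      The field extraction described above can be sketched in C, with the
      bit positions taken from the 22-bit layout in the figure:

      ```c
      #include <stdint.h>

      /* Extract the three fields of a 22-bit physical address.  Bit 00
         is the byte select, bits 01-12 index one of the 4096 cache
         entries, and bits 13-21 form the label compared to the tag. */
      static unsigned byte_select(uint32_t addr) { return addr & 1u; }
      static unsigned index_field(uint32_t addr) { return (addr >> 1) & 0xFFFu; }
      static unsigned label_field(uint32_t addr) { return (addr >> 13) & 0x1FFu; }
      ```

      A read hits when label_field(addr) matches the tag stored at entry
      index_field(addr), the valid bit is set, and no parity error is
      detected.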

      The LSI-11/73 direct map cache must update the backing store on a memory
      write.  The LSI-11/73 uses the write through method.  With this
      technique, writes to backing store occur concurrently with cache
      writes.  The result is that the backing store always contains the same
      data as the cache.
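      A toy model of write-through behavior for a word write might look
      like the following.  This is a sketch under simplifying assumptions
      (a small backing store, no parity), not the hardware logic; the
      sizes follow the 8 KByte, 4096-entry organization described earlier:

      ```c
      #include <stdint.h>

      #define ENTRIES 4096u            /* 8 KByte cache, 2 bytes/entry */

      struct entry { unsigned tag; unsigned valid; uint16_t data; };

      static struct entry cache_tab[ENTRIES];
      static uint16_t backing[1u << 16];  /* small model backing store */

      /* Word write with write-through: the backing store is always
         updated, and the cache entry is updated or allocated, so the
         two always hold the same data. */
      static void write_word(uint32_t addr, uint16_t data)
      {
          uint32_t word  = addr >> 1;            /* drop byte select  */
          uint32_t index = word & (ENTRIES - 1); /* 12-bit index      */
          unsigned label = (unsigned)(word >> 12);

          backing[word] = data;                  /* write through     */
          cache_tab[index].tag   = label;        /* update/allocate   */
          cache_tab[index].data  = data;
          cache_tab[index].valid = 1;
      }
      ```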


		       Features Of The LSI-11/73 Cache
		       -------------------------------

      The LSI-11/73 direct map cache has a number of features that assist
      overall system performance in addition to the speed enhancement that
      results from faster memory access.  These features consist of the
      following:

			 o Q-bus DMA monitoring
			 o I/O page reference monitoring
			 o Memory management control of cache access
			 o Program control of cache parameters
			 o Statistical monitoring of cache performance

      The  LSI-11/73  cache  control  logic  monitors  the  Q-bus  during  DMA
      transactions.   When  an	address	 that  has its data stored in cache is
      accessed during DMA, the cache  and  backing  store  contents  might  no
      longer  be  the  same.   This  is	 an unacceptable situation.  The cache
      control logic invalidates a cache entry if the address  is  used	during
      DMA.   This  also	 includes  addresses  used during Q-bus Block Mode DMA
      transfers.

      Memory references to the I/O page are not cached because that data is
      volatile:  its contents can change without a Q-bus access, so the
      cache could end up with stale data.

      Devices in the I/O page are not the only case in which caching
      information for faster access is undesirable.  Another is a device
      that does not reside in the I/O page but can change its contents
      without a bus reference, such as dual ported memory.

      Another situation is partitioning and tuning an application for
      instruction code execution versus the data being manipulated.  In this
      case the instruction stream may execute many times over for different
      data values.  A speed enhancement can be obtained if the instructions
      are cached while the data is not:  data that is never cached cannot
      replace instructions in the cache.

      The memory management unit (MMU) of the LSI-11/73 can assist in this
      situation.  Pages of memory allocated for data can be marked to bypass
      the cache and therefore not affect instructions that loop many times.
      The cache and the MMU work together to achieve the goal of increased
      system performance.

      The dynamics of cache operation are under program control through use of
      the  Cache Control Register (CCR), an LSI-11/73 on-board register.  This
      register can "turn" the cache on or off, force cache parity  errors  for
      diagnostic  testing,  and	 invalidate all cache entries.	The details of
      the CCR are described in the  KDJ11-A  CPU  Module  User's  Guide	 (part
      number EK-KDJ1A-UG-001).

      During  system  design  or  at  run-time	the  performance  enhancements
      provided	by  the	 cache	system can be monitored under program control.
      This is accomplished by using another LSI-11/73 on-board register, the
      Hit/Miss	Register  (HMR).   This	 register  tracks  the last six memory
      references and indicates if a hit or miss took place.   The  details  of
      the HMR are also described in the KDJ11-A CPU Module User's Guide.

				   Summary
				   -------

      Caches are a mechanism that can help improve overall system performance.
      The dynamics of a given cache are dictated by its organization and the
      behavior of the programs running on the machine.  The LSI-11/73 cache is
      designed to be flexible in use, simple in implementation, and able to
      enhance application performance.

      More  detailed  discussions  on  how  caches  work   and	 other	 cache
      organizations  can  be  found in computer architecture texts that have a
      discussion of memory hierarchy.