UNIT 2

Memory, Register and Register Transfer: Register Transfer Language, Bus and Memory Transfer, Bus Architecture, Bus Arbitration, Arithmetic Logic, IEEE Standard for Floating Point Numbers, Memory Hierarchy, Cache Memory, Virtual Memory, Memory Management Hardware.


REGISTER TRANSFER LANGUAGE

Introduction


A digital system is a collection of digital hardware modules. Modules are registers, counters, arithmetic elements, etc., connected via:
• data paths: routes on which information is moved
• control paths: routes on which control signals are moved
Micro-operations (micro-ops) are operations on data stored in registers. Digital modules (often just called “registers”) are defined by their information contents and the set of micro-ops they perform. Register transfer language is a concise and precise means of describing those operations.

Register Transfer Operations


Registers are denoted by upper case letters, optionally followed by digits or letters. Register transfer operations are the movement of data stored in registers and the processing performed on that data.

Register Transfer Language (RTL)

Digital System: An interconnection of hardware modules that do a certain task on the information.
Registers + Operations performed on the data stored in them = Digital Module
Modules are interconnected with common data and control paths to form a digital computer system
Microoperations: operations executed on data stored in one or more registers.
For any function of the computer, a sequence of microoperations is used to describe it
The result of the operation may:
- replace the previous binary information of a register, or
- be transferred to another register
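As an illustration, below is a minimal Python sketch (register names and values are hypothetical) of the conditional register transfer commonly written in RTL as P: R2 ← R1, meaning the transfer takes place only when the control function P is 1:

# Registers modeled as entries in a dictionary (values are illustrative).
registers = {"R1": 0b10101010, "R2": 0b00000000}

def transfer(dst, src, condition=1):
    # Perform the register transfer dst <- src only if the control condition is 1.
    if condition:
        registers[dst] = registers[src]

P = 1                                  # control function P
transfer("R2", "R1", condition=P)      # RTL statement  P: R2 <- R1
print(format(registers["R2"], "08b"))  # prints 10101010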

Bus and Memory Transfers

• Paths must be provided to transfer information from one register to another
• A Common Bus System is a scheme for transferring information between registers in a multiple-register configuration
• A bus: set of common lines, one for each bit of a register, through which binary information is transferred one at a time
• Control signals determine which register is selected by the bus during each particular register transfer



• The transfer of information from a bus into one of many destination registers is done:
- By connecting the bus lines to the inputs of all destination registers and then:
- activating the load control of the particular destination register selected
• We write R2 ← C to symbolize that the content of register C is loaded into register R2 using the common system bus
• It is equivalent to: BUS ← C (select C), R2 ← BUS (load R2)
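A minimal Python sketch of this two-step transfer, with the bus modeled as a multiplexer whose select lines pick the source register (register names and values are illustrative):

registers = {"A": 0x11, "B": 0x22, "C": 0x33, "R2": 0x00}
sources = ["A", "B", "C"]              # registers connected to the bus lines

def bus_transfer(dst, select, load=1):
    bus = registers[sources[select]]   # BUS <- selected source register
    if load:                           # activate the load control of the destination
        registers[dst] = bus           # R2 <- BUS

bus_transfer("R2", select=2)           # select C and load R2
print(hex(registers["R2"]))            # prints 0x33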



• Memory read: transfer from memory
• Memory write: transfer to memory
• Data being read or written is called a memory word, designated M (refer to section 2-7)
• It is necessary to specify the address of M when writing/reading memory
• This is done by enclosing the address in square brackets following the letter M
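A minimal Python sketch of the bracket notation, assuming the usual register names AR (address register) and DR (data register); Read is DR ← M[AR] and Write is M[AR] ← R1:

M = [0] * 16            # a tiny 16-word memory (size is illustrative)
AR, DR, R1 = 5, 0, 99   # address register, data register, source register

M[AR] = R1              # Write:  M[AR] <- R1
DR = M[AR]              # Read:   DR <- M[AR]
print(DR)               # prints 99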

Bus Architecture



• Memory bus (also called system bus since it interconnects the subsystems)
• Interconnects the processor with the memory systems and also connects the I/O bus.
• Three sets of signals –address bus, data bus, and control bus.

System bus


A system’s bus characteristics are chosen according to the needs of the processor: its speed and the word length for instructions and data.
• Processor internal bus(es) characteristics differ from the system external bus(es)

Address bus


• Processor issues the address of the instruction byte or word to the memory system through the address bus
• Processor execution unit, when required, issues the address of the data (byte or word) to the memory system through the address bus
• The address bus of 32-bits fetches the instruction or data from an address specified by a 32-bit number.
Example
Let a processor at start-up reset the program counter to address 0. The processor then issues address 0 on the bus and the instruction at address 0 is fetched from memory.
Let a processor instruction be such that it needs to load register r1 from the memory address M. The processor issues address M on the address bus and the data at address M is fetched.
• When the processor issues the address of an instruction, it gets back the instruction through the data bus
• When it issues the address of data, it loads the data through the data bus
• When it issues the address of data, it stores the data in the memory through the data bus.

Control bus


• The control bus issues signals to control the timing of various actions during interconnection.
• Control bus signals from the processor: memory read, memory write, I/O read, I/O write
• Control bus signals synchronize the subsystems: memory and I/O systems
• Control bus signals from the processor: address latch enable, data valid
• Control bus signals from the systems to the processor: interrupt and hold
• Control bus signals from the processor to the systems: interrupt acknowledge, hold acknowledge

Bus arbitration


Processor and DMA controllers both need to initiate data transfers on the bus and access main memory.
The device that is allowed to initiate transfers on the bus at any given time is called the bus master.
When the current bus master relinquishes its status as the bus master, another device can acquire this status.
The process by which the next device to become the bus master is selected and bus mastership is transferred to it is called bus arbitration.
(a) Centralized arbitration:
A single bus arbiter performs the arbitration.
(b) Distributed arbitration:
All devices participate in the selection of the next bus master.
The bus arbiter may be the processor or a separate unit connected to the bus.
Normally, the processor is the bus master, unless it grants bus mastership to one of the DMA controllers.
A DMA controller requests control of the bus by asserting the Bus Request (BR) line.
In response, the processor activates the Bus-Grant1 (BG1) line, indicating that the controller may use the bus when it is free.
The BG1 signal is connected to all DMA controllers in a daisy-chain fashion.
When the BBSY signal is 0, it indicates that the bus is busy. When BBSY becomes 1, the DMA controller that asserted BR can acquire control of the bus.
Centralized arbitration scheme with one Bus-Request (BR) line and one Bus-Grant (BG) line forming a daisy chain.
Several pairs of BR and BG lines are possible, perhaps one per device as in the case of interrupts.
Bus arbiter has to ensure that only one request is granted at any given time.
It may do so according to a fixed priority scheme, or a rotating priority scheme.
Rotating priority scheme:
There are four devices, and initial priority is 1,2,3,4.
After the request from device 1 is granted, the priority changes to 2,3,4,1
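A minimal Python sketch of this rotating-priority selection (the four-device setup mirrors the example above): after each grant, the order is rotated so the device just served gets the lowest priority.

priority = [1, 2, 3, 4]      # initial priority order of the four devices

def grant(requests):
    # Grant the bus to the highest-priority requesting device, then rotate.
    global priority
    for dev in priority:
        if dev in requests:
            i = priority.index(dev)
            priority = priority[i + 1:] + priority[:i + 1]  # rotate past the winner
            return dev
    return None

print(grant({1, 3}))  # grants device 1
print(priority)       # priority is now [2, 3, 4, 1], as in the example above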

Distributed arbitration


All devices waiting to use the bus share the responsibility of carrying out the arbitration process.
Arbitration process does not depend on a central arbiter and hence distributed arbitration has higher reliability.
Each device is assigned a 4-bit ID number.
All the devices are connected using 5 lines, 4 arbitration lines to transmit the ID, and one line for the Start-Arbitration signal.
To request the bus a device:
  • Asserts the Start-Arbitration signal.
  • Places its 4-bit ID number on the arbitration lines.
    The pattern that appears on the arbitration lines is the logical-OR of all the 4-bit device IDs placed on the arbitration lines.
    Device A has the ID 5 and wants to request the bus:
    - Transmits the pattern 0101 on the arbitration lines.
    Device B has the ID 6 and wants to request the bus:
    - Transmits the pattern 0110 on the arbitration lines.
    Pattern that appears on the arbitration lines is the logical OR of the patterns:
    - Pattern 0111 appears on the arbitration lines
    Each device compares the pattern on the arbitration lines with its own ID, starting from the most significant bit; a device that detects a 1 on a line where its own ID has a 0 removes itself from the competition. The device with the highest ID (here device B, with ID 6) therefore wins the arbitration, as the sketch below illustrates.
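A minimal Python sketch of one arbitration round under these rules: the lines carry the wired-OR of the competing IDs, and scanning from the most significant bit, a device that has a 0 where the lines show a 1 drops out, so the highest ID wins.

def arbitrate(ids):
    # Return the ID of the winning device (the highest competing ID).
    contenders = set(ids)
    for bit in (3, 2, 1, 0):               # scan the 4 arbitration lines, MSB first
        lines = 0
        for dev in contenders:
            lines |= dev                   # wired-OR of the competing IDs
        if lines & (1 << bit):
            # devices with a 0 in this bit position drop out of the competition
            contenders = {dev for dev in contenders if dev & (1 << bit)}
    return contenders.pop()

print(arbitrate([0b0101, 0b0110]))  # devices A (ID 5) and B (ID 6): prints 6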

Arithmetic Logic

An arithmetic logic unit (ALU) is a combinational digital electronic circuit that performs arithmetic and bitwise operations on integer binary numbers. This is in contrast to a floating-point unit (FPU), which operates on floating point numbers. An ALU is a fundamental building block of many types of computing circuits, including the central processing unit (CPU) of computers, FPUs, and graphics processing units (GPUs). A single CPU, FPU or GPU may contain multiple ALUs. The inputs to an ALU are the data to be operated on, called operands, and a code indicating the operation to be performed and, optionally, status information from a previous operation; the ALU's output is the result of the performed operation. In many designs, the ALU also exchanges additional information with a status register, which relates to the result of the current or previous operations.


• Part of the computer that actually performs arithmetic and logical operations on data
• All of the other elements of the computer system are there mainly to bring data into the ALU for it to process and then to take the results back out
• Based on the use of simple digital logic devices that can store binary digits and perform simple Boolean logic operations
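A minimal Python sketch of the ALU interface described above (the operation names and flag set are illustrative, not a specific processor's ALU): operands and an operation code go in; a result and status flags come out.

def alu(op, a, b, width=8):
    # A toy ALU: returns (result, zero flag, carry flag) for width-bit operands.
    mask = (1 << width) - 1
    raw = {
        "ADD": a + b,
        "SUB": a - b,
        "AND": a & b,
        "OR":  a | b,
        "XOR": a ^ b,
    }[op]
    result = raw & mask                       # truncate to the ALU word width
    carry = 1 if raw > mask or raw < 0 else 0
    zero = 1 if result == 0 else 0
    return result, zero, carry

print(alu("ADD", 0xF0, 0x20))  # (16, 0, 1): the sum carries out of 8 bits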

IEEE 754 floating-point standard

Basic Definition

The standard defines three types of floating-point formats: the arithmetic format, the basic format, and the interchange format; each is described in its own subsection below. In the binary encodings, the leading “1” bit of the significand is implicit, and the exponent is “biased” to make sorting easier: all 0s is the smallest exponent, all 1s is the largest, with a bias of 127 for single precision and 1023 for double precision.
Summary: value = (-1)^sign × (1 + significand) × 2^(exponent - bias)
Example: decimal -0.75 = -3/4 = -3/2^2; binary -0.11 = -1.1 × 2^-1; floating point: exponent = -1 + bias = 126 = 01111110.
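The example can be checked with a short Python sketch using the struct module: encoding -0.75 in single precision yields sign 1, biased exponent 01111110 (126), and a fraction field of 1 followed by zeros.

import struct

bits = struct.unpack(">I", struct.pack(">f", -0.75))[0]
sign = bits >> 31
exponent = (bits >> 23) & 0xFF
fraction = bits & 0x7FFFFF

print(sign)                      # 1  (negative)
print(format(exponent, "08b"))   # 01111110  (126 = -1 + bias of 127)
print(format(fraction, "023b"))  # 10000000000000000000000  (significand 1.1)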

Arithmetic format

All the mandatory operations defined by the standard are supported by the format. The format may be used to represent floating-point operands or results for the operations described in the standard.

Basic format

This format covers five floating-point representations, three binary and two decimal, whose encodings are specified by the standard, and which can be used for arithmetic. At least one of the basic formats is implemented in any conforming implementation.

Interchange format

A fully specified, fixed-length binary encoding that allows data interchange between different platforms and that can be used for storage.

Memory Hierarchy

Computers have several different types of memory. This memory is often viewed as a hierarchy as shown below.

Our main concern here will be the computer's main or RAM memory. The cache memory is important because it boosts the speed of accessing memory, but it is managed entirely by the hardware. The rotating magnetic memory, or disk memory, is used by virtual memory management.

Cache memory

A cache memory is a fast random access memory where the computer hardware stores copies of information currently used by programs (data and instructions), loaded from the main memory. The cache has a significantly shorter access time than the main memory due to the applied faster but more expensive implementation technology. The cache has a limited volume that also results from the properties of the applied technology. If information fetched to the cache memory is used again, the access time to it will be much shorter than if this information were stored in the main memory, and the program will execute faster.
Time efficiency of using cache memories results from the locality of access to data that is observed during program execution. We observe here time and space locality:
Time locality is the tendency to use the same instructions and data many times during neighbouring time intervals,
Space locality is the tendency to access, within short intervals of time, instructions and data stored under neighbouring addresses in the main memory.
Due to these localities, the information loaded to the cache memory is used several times and the execution time of programs is much reduced. A cache can be implemented as a multi-level memory. Contemporary computers usually have two levels of caches. In older computer models, the cache memory was installed outside the processor (in integrated circuits separate from the processor itself). The access to it was organized over the processor's external system bus. In today's computers, the first level of the cache memory is installed in the same integrated circuit as the processor. It significantly speeds up the processor's co-operation with the cache. Some microprocessors have the second level of cache memory placed also in the processor's integrated circuit. The volume of the first level cache memory is from several thousand to several tens of thousands of bytes. The second level cache memory has a volume of several hundred thousand bytes. A cache memory is maintained by a special processor subsystem called the cache controller.
If there is a cache memory in a computer system, then at each access to a main memory address in order to fetch data or instructions, the processor hardware sends the address first to the cache memory. The cache control unit checks if the requested information resides in the cache. If so, we have a "hit" and the requested information is fetched from the cache. The actions concerned with a read with a hit are shown in the figure below.
If the requested information does not reside in the cache, we have a "miss" and the necessary information is fetched from the main memory to the cache and to the requesting processor unit. The information is not copied into the cache as single words but as a larger block of a fixed volume. Together with the information block, a part of the address of the beginning of the block is always copied into the cache. This part of the address is next used at readout during identification of the proper information block. The actions executed in a cache memory on a "miss" are shown below.
To simplify the explanations, we have assumed a single level of cache memory below. If there are two cache levels, then on a "miss" at the first level, the address is transferred in a hardwired way to the cache at the second level. If at this level a "hit" happens, the block that contains the requested word is fetched from the second level cache to the first level cache. If a "miss" occurs also at the second cache level, the blocks containing the requested word are fetched to the cache memories at both levels. The size of the cache block at the first level is from 8 to several tens of bytes (the number must be a power of 2). The size of the block in the second level cache is many times larger than the size of the block at the first level.
The cache memory can be connected in different ways to the processor and the main memory:
• as an additional subsystem connected to the system bus that connects the processor with the main memory,
• as a subsystem that intermediates between the processor and the main memory,
• as a separate subsystem connected with the processor, in parallel regarding the main memory.
The third solution is applied the most frequently.
We will now discuss different kinds of information organization in cache memories.
There are three basic methods used for mapping of information fetched from the main memory to the cache memory:
• associative mapping
• direct mapping
• set-associative mapping.
In today's computers, caches and main memories are byte-addressed, so we will refer to byte-addressed organization in the sections on cache memories that follow.

Cache memory with associative mapping

With the associative mapping of the contents of cache memory, the address of a word in the main memory is divided into two parts: the tag and the byte index (offset). Information is fetched into the cache in blocks. The byte index determines the location of the byte in the block, whose address is generated from the tag bits extended by zeros in the index part (this corresponds to the address of the first byte in the block). If the number of bits in the byte index is n, then the size of the block is 2^n. The cache is divided into lines. In each line one block can be written, together with its tag and usually some control bits. It is shown in the figure below.
When a block is fetched into the cache (on a miss), the block is written into an arbitrary free line. If there is no free line, one block of information is removed from the cache to free one line. The block to be removed is determined according to a selected strategy; for example, the least used block can be selected. To support the block selection, each access to a block residing in the cache is registered by changing the control bits in the line the block occupies.
The principle of the read operation in cache memory is shown below. The requested address contains the tag (bbbbb) and the byte index in the block (X). The tag is compared in parallel with all tags written down in all lines. If a tag match is found in a line, we have a hit and the line contains the requested information block. Then, based on the byte index, the requested byte is selected in the block and read out into the processor. If none of the lines contains the requested tag, the requested block does not reside in the cache. The missing block is next fetched from the main memory or an upper level cache memory.
The functioning of a cache with associative mapping is based on the associative access to memory. The requested data are found by a parallel comparison of the requested tag with the tags registered in cache lines. For a big number of lines, the comparator unit is very large and costly. Therefore, associative mapping is applied in cache memories of limited size (i.e. containing not too many lines).
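A minimal Python sketch of this lookup (block size and memory contents are illustrative): the cache is modeled as a mapping from tags to blocks, which stands in for the parallel tag comparison.

BLOCK_SIZE = 16                  # bytes per block, so the byte index has 4 bits
cache = {}                       # tag -> block; models the parallel tag compare

def read(address, memory):
    tag, offset = address // BLOCK_SIZE, address % BLOCK_SIZE
    if tag not in cache:         # miss: fetch the whole block from main memory
        start = tag * BLOCK_SIZE
        cache[tag] = memory[start:start + BLOCK_SIZE]
    return cache[tag][offset]    # hit: select the requested byte in the block

memory = list(range(256))        # a toy 256-byte main memory
print(read(0x37, memory))        # prints 55; the first access misses, later ones hit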

Cache memory with direct mapping

The name of this mapping comes from the direct mapping of data blocks into cache lines. With direct mapping, the main memory address is divided into three parts: a tag, a block index and a byte index. In a given cache line, only those blocks can be written whose block indices are equal to the line number. Together with a block, the tag of its address is stored. It is easy to see that each block number matches only one line in the cache.

The readout principle in a cache with direct mapping is shown below. The block index (the middle part of the address) is decoded in a decoder, which selects lines in the cache. In the selected line, the tag is compared with the requested one. If a match is found, the block residing in the line is exactly the one that has been requested, since in the main memory there are no two blocks with the same block indices and tags.
We have a hit in this case and the requested byte is read from the block. If there was no tag match, it means that either there is no block yet in the line or the residing block is different from the requested one. In both cases, the requested block is fetched from the main memory or the upper level cache. Together with the fetched block, its tag is stored in the cache line.

With direct mapping, all blocks with the same index have to be written into the same cache line. This can cause frequent block swapping in cache lines, since only one block can reside in a line at a time. This is called block thrashing in the cache. For large data structures used in programs, this phenomenon can substantially decrease the efficiency of cache space use. The solution shown in the next section eliminates this drawback.
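A minimal Python sketch of the direct-mapped address split (an illustrative cache of 8 lines with 16-byte blocks): the block index alone selects the line, which is exactly what makes thrashing possible.

BLOCK_SIZE = 16   # 4 byte-index bits
NUM_LINES = 8     # 3 block-index bits

def split(address):
    offset = address % BLOCK_SIZE
    line = (address // BLOCK_SIZE) % NUM_LINES   # block index selects the line
    tag = address // (BLOCK_SIZE * NUM_LINES)
    return tag, line, offset

# Blocks 0 and 8 share line 0, so they would keep evicting each other:
print(split(0x000))  # (0, 0, 0)
print(split(0x080))  # (1, 0, 0) -- same line, different tag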

Cache memory with set-associative mapping

With this mapping, the main memory address is structured as in the previous case. It comprises a tag, a block index and a byte index. The block-into-line mapping is the same as for direct mapping. But in set-associative mapping many blocks with different tags can be written down into the same line (a set of blocks). Access to blocks written down in a line is done using the associative access principle, i.e. by comparing the requested tag with all tags stored in the selected line. From both mentioned features, the name of this mapping is derived. The figure below shows operations during a read from a cache of this type.

First, the block index of the requested address is used to select a line in the cache. Next, comparator circuits compare the requested tag with all tags stored in the line. On a match, the requested byte is fetched from the selected block and sent to the processor. On a miss (no match), the requested block is fetched from the main memory or the upper level cache. The new block is stored in a free block slot in the line or in the slot freed by a block sent back to the main memory (or the upper level cache). To select a block to be removed, different strategies can be applied. The most popular is the LRU (least-recently used) strategy, where the block not used for the longest time is removed. Other strategies are the FIFO (first-in-first-out) strategy, where the block that has been stored for the longest time is selected, and the LFU (least-frequently used) strategy, where the least frequently used block is selected. To implement these strategies, some status fields are maintained associated with the tags of blocks.
Due to the set-associative mapping, block thrashing in the cache is eliminated to a large degree. The number of blocks written down in the same cache line is from 2 to 6, with a block size of 8 to 64 bytes.
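A minimal Python sketch of a 2-way set-associative cache with LRU replacement (all parameters are illustrative): each line holds a small set of tagged blocks, and on a miss in a full set the least-recently used block is replaced.

from collections import OrderedDict

BLOCK_SIZE, NUM_LINES, WAYS = 16, 8, 2
cache = [OrderedDict() for _ in range(NUM_LINES)]  # one tag -> block set per line

def read(address, memory):
    offset = address % BLOCK_SIZE
    line = (address // BLOCK_SIZE) % NUM_LINES
    tag = address // (BLOCK_SIZE * NUM_LINES)
    s = cache[line]
    if tag in s:
        s.move_to_end(tag)           # hit: mark the block most-recently used
    else:
        if len(s) == WAYS:
            s.popitem(last=False)    # miss on a full set: evict the LRU block
        start = address - offset
        s[tag] = memory[start:start + BLOCK_SIZE]
    return s[tag][offset]

memory = list(range(1024))
read(0x000, memory); read(0x080, memory)  # both map to line 0; no thrashing now
print(read(0x000, memory))                # prints 0: still a hit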

Memory updating methods after cache modification

A cache memory contains copies of data stored in the main memory. When a change of data in the cache takes place (e.g. a modification due to a processor write), the contents of the main memory and cache memory cells with the same address differ. To eliminate this lack of data coherency, two methods are applied:
• write-through: the new cache contents are written to the main memory immediately after the write to the cache memory,
• write-back: the new cache contents are not written to the main memory immediately after the change, but only when the given block of data is replaced by a new block fetched from the main memory or an upper level cache. After a data write to the cache, only state bits are changed in the modified block, indicating that the block has been modified (a dirty block).
Write-back updating is more time efficient, since block cells that were modified many times while in the cache are updated in the main memory only once.
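A minimal Python sketch contrasting the two methods on a single hypothetical cache block: write-through updates main memory on every write, while write-back only marks the block dirty and copies it out once, on eviction.

memory = [0] * 32
block = {"base": 0, "data": memory[0:16], "dirty": False}

def write(offset, value, policy):
    block["data"][offset] = value
    if policy == "write-through":
        memory[block["base"] + offset] = value  # main memory updated immediately
    else:                                       # write-back
        block["dirty"] = True                   # just mark the block as dirty

def evict():
    if block["dirty"]:                          # write-back: copy out only now,
        base = block["base"]                    # however many writes occurred
        memory[base:base + len(block["data"])] = block["data"]
        block["dirty"] = False

write(3, 42, "write-back")
print(memory[3])  # prints 0: main memory not yet updated
evict()
print(memory[3])  # prints 42: the dirty block was written back once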

Virtual Memory

• The basic abstraction provided by the OS memory management is virtual memory
– A process's address space in memory is not necessarily the same as the physical memory (RAM) address in which it resides
– When a process requests a memory address, the OS will translate the address from a virtual address to a physical address

Virtual address


• Processes access memory using a virtual address
– The virtual address is not the same as the physical RAM address in which it resides
– The OS (hardware MMU) translates the virtual address into the physical RAM address
– Who determines the mapping for the translation?



• Virtual memory enables programs to execute without requiring their entire address space to reside in physical memory
– Saves space
• Many programs do not need all of their code and data at once (or ever), so there is no need to allocate memory for it
– Allows flexibility for the application and OS
• Indirection allows moving programs around in memory; the OS can adjust the amount of memory allocated based upon the program's run-time behavior
• Allows processes to address more or less memory than is physically installed in the machine
– Isolation and protection
• One process cannot access memory addresses in others

Memory Management Unit

• Protection
– Restrict which physical addresses processes can use, so they can't stomp on each other
• Fast translation
– Accessing memory must be fast, regardless of the protection scheme
• Fast context switching
– Overhead of updating memory hardware on a context switch must be low

As a program runs, the memory addresses that it uses to reference its data are logical addresses. The real-time translation to physical addresses is performed in hardware by the CPU's Memory Management Unit (MMU). The MMU has two special registers that are accessed by the CPU's control unit. Data to be sent to main memory or retrieved from memory is stored in the Memory Data Register (MDR). The desired logical memory address is stored in the Memory Address Register (MAR). The address translation is also called address binding and uses a memory map that is programmed by the operating system.

Note: The job of the operating system is to load the appropriate data into the MMU when a process is started and to respond to the occasional page fault by loading the needed memory and updating the memory map.

Before memory addresses are loaded onto the system bus, they are translated to physical addresses by the MMU.
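A minimal Python sketch of this address binding (page size and memory map are illustrative): the virtual page number indexes the OS-programmed memory map to find the physical frame, and the page offset is carried over unchanged.

PAGE_SIZE = 4096                  # 4 KiB pages (illustrative)
memory_map = {0: 7, 1: 3, 2: 12}  # virtual page -> physical frame, set by the OS

def translate(virtual_address):
    page = virtual_address // PAGE_SIZE
    offset = virtual_address % PAGE_SIZE
    if page not in memory_map:
        # the MMU raises a fault; the OS loads the page and updates the map
        raise LookupError("page fault")
    return memory_map[page] * PAGE_SIZE + offset

print(hex(translate(0x1234)))  # page 1 -> frame 3: prints 0x3234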