April 1983 © BYTE Publications Inc
Thomas W. Starnes
Motorola Inc., Microprocessor Division
3501 Ed Bluestein Blvd.
Austin, TX 78721
In the mid 1970s at Motorola, a new idea was taking shape. As more and more demands were being made on the MC6800 family of microprocessors, the push was on toward developing greater programmability of a 16-bit microprocessor. A project to develop the MC68000, known as Motorola's Advanced Computer System on Silicon (MACSS), was started.
The project team began with the freedom to design this entirely new product to best fit the needs of the microprocessor marketplace. Developers at Motorola explored many possibilities and made many difficult decisions. The result can be seen in the MC68000, viewed by most industry experts as the most powerful, yet easy to program, microprocessor available. In this first of four articles, I will discuss many of the philosophies behind the design choices that were made on the MC68000.
Many criteria can qualify a processor as an 8-, 16-, or 32-bit device. A manufacturer might base its label on the width of the data bus, address bus, data sizes, internal data paths, arithmetic and logic unit (ALU), and/or fundamental operation code (op code). Generally, the data-bus size has determined the processor size, though perhaps the best choice would be based on the size of the op code. I'll talk a bit about these features and then show how the MC68000 is both a 16- and 32-bit microprocessor.
Designers must make hundreds of decisions to shape the architecture of a new microprocessor. The needs of the users of the new product must be considered as the most important factors. After all, the users are the ones who really need a functional product, and if they are not happy with the features or performance, they will keep looking for a better alternative.
Unfortunately, it may be impossible to meet all of the needs of the users due to certain design limitations. The design must be inexpensive enough to produce in mass quantity. Also, current technology will permit only certain types and numbers of circuits to be manufactured on a silicon chip. These are the foremost factors that dictate the upper limits of the capabilities of a microprocessor.
In planning the new 16-bit MACSS, designers had to make a decision concerning the general architecture first. What should it look like? A great deal of software written for the MC6800 family already existed. A processor that provides enhancements over an older processor, yet can run all of the programs for the older processor, has a real asset: it can capitalize on the existing software base. This may attract users by ensur ing that they won't have to rewrite at least some of their programs.
Unfortunately, architectures, such as the early 8-bit microprocessors, were rather crude. Because they were designed to replace logic circuits, not enough thought was put into the software aspect of the parts. Their instruction set was oriented toward hardware. The designers did not con sider carefully the future of these products, their expandability and compatibility. To try to design a microprocessor to be compatible with the older 8-bit parts was limiting.
Designers at Motorola decided that the new MACSS should be the fast est, most flexible processor available. They would design it for program mers, to make their job easier, by providing functions in a way that most programmers could best use them.
Early on, it appeared that to have a really powerful new generation of microprocessors, a totally new architecture should be used and that earlier designs should be considered as examples rather than as models. This gave the MC68000 designers the freedom to introduce completely new concepts into microprocessors and to optimize the functionality of the new chip.
The planners decided there was one area in which ties to the 8-bit product family would be advantageous without exception. That area was in peripherals. Motorola decided that this new 16-bit microprocessor would directly interface to the 8-bit collection of MC6800 peripherals. Because so many input/output (I/O) operations are 8-bit oriented, it seemed logical to retain this compatibility even though the 8-bit microprocessor interface would naturally be about half as fast as a comparable 16-bit. Compatability with 8-bit MC6800 peripherals had the added benefit of immediately ensuring support of the new microprocessor with a complete family of peripheral chips, rather than requiring a wait of perhaps years for 16-bit versions to become available.
A properly designed 16-bit microprocessor has many advantages over the most sophisticated 8-bit microprocessor, especially to the programmer (see figures 1 and 2). The 8 bits of op code for the smaller processor provide only 256 different instruction variations. This may seem to be a lot at first glance, but consider the following.
Figure 1: Op code organization for the MC6800. This processor is limited in its abilities because of its 8-bit size.
M6800 OP CODE ------------------------------- | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | ------------------------------- | \_____/ \_____________/ | | | | | | | | | REGISTER | | 0=A | | 1=B | | | | ADDRESS MODE | 00=IMMEDIATE | 01=DIRECT | 10=INDEXED | 11=EXTENDED | | OPERATON $0=SUBRACT 1=COMPARE 2=SUBTRACT W/CARRY 4=AND 5=BIT 6=LOAD 7=STORE 8=EXCLUSIVE OR 9=ADD W/CARRY A=OR B=ADD
Figure 2: The MC68000 ADD instruction op code shows the power available with 16-bit operations. Multiple registers with variable operand sizes and a large address field give a programmer tremendous flexibility in programming.
MC68000 OP CODE --------------------------------------------------------------- | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | --------------------------------------------------------------- OPERATION \_________/ \_/ \_____/ \_____________________/ ADD | | | | REGISTER | | EFFECTIVE ADDRESS FIELD D4 | | MEMORY A2 INCREMENT (1 OF 8) | | (1 OF 12 MODES PLUS | | 1 OF 8 REGISTERS) TO (1) OR | FROM (0) | MEMORY | | OPERAND SIZE 16 BITS (8, 16, OR 32 BITS)
If the microprocessor has two registers from which to move and manipulate data, those two registers require 1 bit for encoding the op code. If four different addressing modes are offered for accessing memory data, these require 2 more bits for encoding. This leaves the microprocessor with only 5 bits with which to encode the operation to be performed. Only 32 different operations can be performed.
Now admittedly this is plenty of operations for most applications, but realize that only two data registers and four memory-addressing modes are not very many to someone doing serious programming. Registers are there for fast data manipulation, and constantly swapping the contents of too few registers is not very fast. A more powerful microprocessor would have many registers, and they would all have to be accessible by the different operations.
Additionally, the more addressing modes you have for accessing memory data, the more efficiently you can get values in memory. Obviously, 8 bits of op code cannot give the microprocessor both the variety and the number of operations that a good 16-bit microprocessor can. With 64,000 different instructions possible in a 16-bit op code, you can perform far more complex operations.
This, then, is the real advantage of 16-bit over 8-bit microprocessors to the programmer. A 16-bit microprocessor will have twice the data-bus width of the 8-bit version. This wider bus allows twice as much information to go in and out of the processor in the same amount of time. This can, with proper internal design, almost double the rate at which operations take place over the rate of a similar 8-bit machine. Sixteen-bit microprocessors should give the programmer far greater flexibility in coding and perform similar operations in less than half the time of an 8-bit microprocessor.
Users of the 8-bit microprocessors originally had difficulty imagining what kind of programs could fill up 64K bytes of memory. Many systems had no more than 8K bytes of ROM (read-only memory) and RAM (random-access read/write memory). But as time went on and the general software base grew, systems with up to 64K bytes of memory became more prevalent. Either code had to become more efficient or ways of fitting more than 64K bytes of memory in a system had to be developed. Sixteen-bit microprocessors could make code more efficient.
In planning MACSS, designers foresaw that the 16-bit, 64K-byte addressing range of popular 8-bit microprocessors would be quickly outgrown by the newly proposed microprocessor. Each additional bit of address could double the addressing range of the processor.
Look at the techniques of expanding beyond a 16-bit addressing range and analyze the design trade-offs (see figure 3). You could extend the addressing range of early computers and minicomputers simply by appending some additional bits to the most significant of the 16 address bits. These additional bits were usually stored in an additional register, the page register. This method is called paging, because you work out of one page at a time. The page is set manually, and the lower 16-bits of address are included in the instruction stream or registers.
Figure 3: Three methods of addressing memory. The Linear method arranges a contiguous memory area. The Paged method organizes memory into blocks or pages of a prescribed length. The Segmented method gives each user or program a specific area in memory. Both the Paged and the Segmented method give the programmer access to only a small portion of memory.
LINEAR _______ | | FFFFFFFF | | | | | | | | | | / / / / | | | | | | | | | | | | | | | | |_______| 00000000 PAGED (PAGE REGISTER): (ADDRESS REGISTER) _______ | | FF:FFFF | | |_______| FF:0000 | | | | | | / / / / | | | | |_______| | | | | |_______| 01:0000 | | | | |_______| 00:0000 SEGMENTED (SEGMENT REGISTER): (ADDRESS REGISTER) ARE ADDED _______ | | | | | | / / / / | | | | |_______| | | 2C:FFFF | | |_______| 2C:0000 | | | XXX | |_______| | | A5:FFFF | | |_______| A5:0000
Paging has the advantage of being quite simple to implement in the processor. No real circuit change is needed over the straightforward 16-bit addressing, because all the expansion is done simply by appending bits to the core. It also has the advantage of having fairly dense code, because only 16 bits of address are carried around in the instructions.
However, there are many disadvantages to paging. The programmer is limited to accessing only the particular page of memory that happens to be set in the page register. To be assured that the right page is being used requires a check to see what is currently in the page register, possibly saving that page number, and loading the register with the desired page number. This takes time and requires both additional thought by the programmer and additional code in the software. This additional code typically takes up the room saved by carrying around only 16 bits of address.
One way to get around the single-page limitation of paging is to provide many page registers. Other characteristics that determine which register will be active on a particular bus cycle include instruction fetch, data read/write, and stack access. While these additional registers give the programmer access to more than one page at a time, there is still only one page available for each type of access.
Some extensions to paging came out to compensate for some of the losses experienced in paging. Segmentation, for example, follows the same general principles of pagination. The key difference in segmentation is that the page number becomes a segment number and the segment number is essentially added to the core 16-bit address. This allows some relocation of the core address but still forces the programmer to check that the desired segment is loaded, and limits the range of any segment to only 64K bytes of memory.
To a programmer, the simplest address technique is a direct addressing of any memory location. This would be without regard for whether the wanted data is near recently accessed data or whether it is miles away. The programmer wants a linear view of data, that is, the ability to specify a very simple, albeit long, address that will access any data.
Now, beyond the processor's memory-addressing method, memory management is sometimes used. With it more sophisticated systems dynamically relocate or control the various blocks of memory. This is done for protection purposes in larger systems. The advantage is that you can protect one user's work space from the devastating effects of another user's poor programs running amuck. To this end, a separate memory management unit (MMU), in conjunction with the operating system, performs some addition to or translation of an address. This technique may sound similar to paging and segmenting memory, but this is done to serve a completely different purpose, and in a different way. The application program writer never sees this memory management and writes code as though the entire memory were available.
To expand the memory space on the MACSS, the best option, though not the easiest to implement on the chip, is a linear address space. This space is not broken up by paging, segmentation, or banking schemes. It is a very simple addressing technique, requiring the least effort by the programmer, while still allowing more advanced operations such as memory management.
A linear address is simply a straightforward 32-bit, for example, address. The address space is not broken up into blocks; and it is contiguous. Accessing such an address merely requires the expression of the 32-bits in the instruction or using a single address register. For convenience, if the upper 16 bits of the address are either all 0s or all 1s, then a shorter, 16-bit form of the address can be sign-extended to automatically provide the correct address. This is the way the MC68000 accesses memory and I/O.
How big an address space should a 16-bit microprocessor address? The natural address sizes greater than 16 bits are 24 and 32 bits, which are 3 and 4 bytes long, respectively. For a 16-bit microprocessor, the odd number of bytes becomes slightly unwieldy. Looking a little further into the future, it seemed that even the 16 megabytes of a 24-bit address might not meet the needs of large systems.
While 32 bits of address, reaching 4 gigabytes of memory, seems tremendous, once the need for more than 16 bits is established, 32 bits is the next most convenient size. It takes exactly two 16-bit bus transfers to move an address into the processor, and once the second transfer is needed, as it would be even for an 18-bit address, it is just as well to use the whole 16 bits brought in. Thus, engineers selected a virtual-memory address space of 32 bits for the MC68000.
Now, from a practical packaging standpoint, 32 address signal lines are quite a few. The placement of integrated circuits (ICs) in dual inline packages (DIPs) with greater than 40 leads was rare before 1980. With only a few systems in the early '80s requiring more than 16 megabytes of memory, it seemed a reasonable trade-off to bring only the 24 least significant address bits to the outside world. That way fewer pins would be required, and MACSS could fit within a 64-pin DIP. Still, all 32 bits of address are maintained within the processor, and there are simple means of determining the upper 8 bits' values.
With the size of the memory address space determined, it was easier to settle on the register scheme of the new processor. The size and the number of registers had to be decided.
Designers originally envisioned onboard registers for a processor because operating on memory data requires a time-consuming transfer across the external bus. It just happens that in programming most data is operated on a number of times in succession before a result is obtained. Often many combinations with many different data pieces are used. The merging of these two observations leads to onboard or on-chip registers for fast manipulation of frequently used data.
It seems that from the day registers were brought into the processor, programmers have wanted more registers for their use. The goal, then, when designing processors, is to provide as many registers as possible for the programmer. In the MC6800, only two registers (A and B) were available for data manipulation, and one index register (X) to point to non-stack data. These few registers are being loaded and saved almost as often as the data within them is manipulated.
The loading and saving of registers is usually wasted time. The amount of time spent bringing data into on-chip registers for fast manipulation depends upon the exact use of that data. However, the more registers available, the more likely it is that a register will not have to be saved just so that some other data can be operated on in that register.
The design of the internal execution of instructions through a microprocessor will determine many things about the suitability of the chip for programming. Instructions may operate either on what are called dedicated registers or on a general register set. Each of these methods has advantages and disadvantages.
In a microprocessor that uses dedicated registers, an instruction includes the address of the data to be worked on in specific registers. These registers are inherent in the instruction. The ADD instruction, for example, will add only from a memory location to, say, register A -- not to register B, and not from register A to memory. If the value to be added to is not already in register A, it must first be placed there. Before it can be placed there, a number in A may have to be saved. All of this can be quite troublesome. This is not very different from the situation in which there simply are not enough registers.
Contrast this with the example of a processor that uses true general-purpose registers. In a general-register machine, the ADD instruction may add data from memory to any of the internal registers. The instruction must contain information on which register it will operate on. This is determined when the instruction is assembled. If there were four registers in the processor, the ADD operation could be performed in register A, B, C, or D, as selected by the programmer.
Now if the value to be added to is in register C, the programmer simply designates C as the operand register. There is no need to shuffle registers and no need to save any register contents. The general-register machine, then, is easier to program and typically requires less time to execute an operation.
As it always happens, this ease of programming does not come free. You will see later that allowing a selection of registers requires bits in the op code for encoding and, therefore, more bits of the op code. Also, it is typically more difficult for the microprocessor designer to implement the circuitry that incorporates various registers because it takes time to determine which register is to be used and to activate that register. Streamlining internal operations so that this time is not detectable requires quite a bit of planning.
So while fewer registers or dedicated registers may be easier for the microprocessor designer to implement, they make programming the new chip more cumbersome and less flexible. But the extra time, effort, and expense of implementing general-register principles pays off by easing the programming of these devices.
Therefore, the MC68000 was designed with general-purpose registers. Any instruction may select any register for use as a source or destination operand or as a pointer in any allowable addressing mode. This tremendous flexibility gives programmers the ultimate in data and pointer placement.
A close observation of the use of registers indicates they usually have one of two purposes: they may retain data for manipulation, or they may contain an address that points to a memory location. The use of a register for each of these purposes is quite different.
When data is moved into or out of a register or is manipulated within the register, all types of conditional information from the operation are important. Thus, you typically would like all condition codes to be properly set after a data operation. This way these condition codes may be used to branch or with other data operations.
On the other hand, an address might be placed in or taken from a register, or modified by incrementing or decrementing. Rarely is it important whether a carry comes out of the ALU or whether the result is negative (i.e., has a 1 in the most significant bit). In fact, a programmer would prefer manipulation of an address to have no effect on the condition codes. Often in the middle of a complex data operation, you must bring in a new address or increment an address. To have this operation modify the condition codes most of the time will foul up the data operation in progress, and so is undesirable.
Therefore, two generic register types emerge: a data register (D0 through D7) and an address register (A0 through A7). The MC68000 has both types. In a data register, any operation will affect the condition codes of the microprocessor as is appropriate for the operation and the data used. However, in an address-register operation, condition codes will not be changed, but the codes from previous data operations will be retained. This way you can have address and index pointer changes made, without affecting the accuracy of the results, in the middle of a complex data operation that requires many instructions and transfers from memory.
What size and how many of each type of register should be included in the microprocessor? The more registers there are, the better it is for the programmer. Unfortunately, the more register and control circuits in the chip, the more expensive it is. A good balance must be attained.
Two registers are too few, four are nice, but it is difficult to imagine even a complex routine requiring more than eight different memory pointers. The encoding of eight registers requires an even three bits. Because it seemed that eight was a good upper bound, the MC68000 has eight address registers and also eight data registers.
With 16 registers available, divided half and half for data and address, almost any sizable routine will never require the temporary storing of a value in a register just so that the register can be used for something else. And, within the routine, manipulations of memory pointers in address registers will not interfere with an ongoing data calculation, because of the distinction of how the condition codes work for the different register types. It is easy to see how the MC68000 is easier to program.
Earlier I explained that MACSS would handle all of its addresses as 32-bit quantities. Anyone who has ever programmed 8-bit microprocessors, which have 8-bit accumulators and 16-bit index registers, has seen the difficulty with the two different sizes. Once programmers figure out how to put the 16-bit value in both 8-bit accumulators, things get tougher when they try to get arithmetic carries from the lower half to the upper half of the value.
A little of this experience led the MC68000 designers to decide that using data that is the same size as the address register could make some software design significantly easier. In order to handle a linear 32-bit virtual-address space, the MC68000 needed to have 32-bit address registers. How would 32-bit data registers fit into a 16-bit microprocessor?
You would expect a 16-bit microprocessor to process 8- and 16-bit data, but does it make sense for it to also process 32-bit data? Obviously, the addresses will have to be handled in that size. Designers recognized that in 8-bit microprocessors the ability to handle 16-bit data came in quite handy for more advanced applications. The 8-bit processors soon had to be upgraded to handle 16-bit operands, and users of 16-bit mini-computers needed 32-bit operations.
Once a few 32-bit operations become necessary in a microprocessor, you need a whole array of operations. If a multiplication operation generates a 32-bit result, in order to do anything with that result, other 32-bit operations are needed. For consistency, again, Motorola decided that the data registers would be 32 bits wide and operations on all 32 bits could take place with a single instruction.
The exact manner of processing data and addresses through the MC68000 came about later, with careful analysis of the internal architecture and the need for address and data in the sequence of instructions. The chip ended up with three separate arithmetic units, which could work in parallel. I'll describe their purpose to give some insight into how the machine works.
The MC68000 has a 16-bit-wide ALU that essentially performs all data calculations and provides single-pass evaluation of the 16-bit data, for which the MC68000 is primarily designed. There are also two other internal arithmetic units. Both are 16 bits wide and are generally used in conjunction with each other to perform the various address calculations associated with operand effective addresses. This makes sense because all addresses are 32 bits wide. An effective address (EA) is the calculated result based on a selected addressing mode of the processor. In the MC6800, for instance, if an "index-register-plus-offset" address mode were used, the EA would be the result of adding the contents of X with the given offset. Because EA evaluation takes time and can be a significant portion of the instruction, it is important to perform this quickly.
At one time, then, one 32-bit address and one 16-bit data calculation can take place within the MC68000. This speeds instruction execution time considerably by processing addresses and data in parallel. The MC68000 also operates on 32-bit data. This is usually done by taking two passes of 16-bit data, one for the lower word and one for the upper word. This is reflected in the execution time of many 16- and 32-bit instructions.
Another way designers made the MACSS faster was to include what is called a prefetch queue. This prefetch queue is more intelligent than other microprocessor queues; its control varies according to the instruction stream contents.
The prefetch queue is a very effective means of increasing microprocessor performance; it attempts to have as much instruction information as possible available before a particular instruction begins execution. The microprocessor uses an otherwise idle data bus to prefetch from the instruction stream. This keeps the bus active more of the time, increasing performance because processing of instructions is often limited by the time it takes to get all the relevant information into the processor.
The part of memory from which instructions are fetched, the program space, contains op codes and addressing information. The prefetch queue can contain enough information to execute one instruction, decode the next instruction, and fetch the following instruction from memory -- all at the same time.
Exactly what is in the queue is very dependent upon the exact instruction sequences. The queue is intelligent enough to stay fairly full without being too wasteful.
For instance, when a conditional branching instruction is detected, the prefetch is ready to either branch or not by the time a decision is made. The queue tries to fetch both the op code following the branch instruction and the op code at the calculated branch location. Then, when the condition codes are compared and a decision is made whether to branch, the processor can begin immediate decoding of either instruction. The other unnecessary op code is ignored.
You can use the prefetch queue in many other special ways as well. One example is in speeding up the repetitious Move Multiple Registers instruction, where it is used to accelerate successive data transfers. The prefetch queue allows many frequently used instructions to execute in exactly the time it takes to fetch the op code (actually, the time to prefetch the next op code).
One other significant implementation feature from the MACSS project emerged from the choice between a random logic design versus a microcoded design. Both techniques have advantages and disadvantages. Earlier microprocessors were largely of random logic design. Advanced techniques of very large scale integration (VLSI) and the increasing complexity of the chips have made microcoding more attractive.
Random logic design of a microprocessor or other logic device is the building of the device from discrete components-gates, buffers, and transistors. This limits the components to those that are essential. There are no unused gates, duplicated circuits, or clever uses of otherwise unused components. The design is usually packed as tightly as possible and is quite fast.
The difficulty is that, as the design becomes more and more complex, as VLSI has, the planning and layout of the components and signal traces become exponentially more difficult and often impossibly so. This means that it takes exorbitant amounts of time to design the circuits.
Another problem with the use of random logic in very complex circuits occurs in modeling and testing. Before such circuits are finally placed in silicon, they must be modeled and simulated on computers because of the great difficulty in running down bugs once the chip is in silicon compared to debugging a wire-wrap board. The entire circuit must be modeled all at once to ensure that one combination of signals affects only the expected section of the device.
Similarly, once the circuit is in silicon, the pass/fail testing of the components in a random logic chip is quite difficult. You typically have only a few lines to send sequences of patterns through for testing. Because a particular section of the circuit may be exercised only by a very few given inputs, a normal test may not detect a stuck gate or other error caused by some strange combination of inputs.
On the other hand, in much the same way that microprocessors made designing systems with medium-scale integration/large-scale integration (MSI/LSI) easier, microprogramming has come to ease the complications in the design of microprocessors. Microprogramming is to a microprocessor what a microprocessor is to a logic design of a system. A microprocessor has central components that can be considered black boxes with inputs and outputs. For each given operation (instruction, interrupt condition, etc.), the microprocessor can route certain information to these black boxes as inputs, and the outputs can be routed to other components. The control of this routing is performed by a microcontroller or microsequencer.
Similar to a microprocessor, the microsequencer directs the flow of data through the various components (ALU, registers, condition flags, shifters, buses, etc.) according to microprogrammed instructions. Each instruction has its own microroutine, or sequence of microwords, which routes the associated data to the proper component in the proper order. Conditions and branches may redirect the microroutines.
Microcoding a complex circuit simplifies design mostly because it makes the circuit modular. It takes a controller, a block of microprogram, and the components through which data is to flow. Each of these elements may be modeled, built, and tested with individual inputs and outputs. Microcoding is a big step toward simplifying the design process because it breaks up the design into manageable blocks, thereby easing the testing of the finished product.
Another advantage of microcoding is that it allows tremendous flexibility in the exact operation of the circuit. Its microwords allow more combinations of the inputs through the components than most random logic would allow. Microcoding's programmability makes it especially attractive to silicon designers because random logic in silicon is not easily changed.
You can change the microROM of the microcoded device right up to the minute before the masks for the device are processed. To change a small facet of an operation may mean altering a few bits in the microROM, but this changes only whether or not there is a gate on the bit's transistor-a simple alteration. Similarly, after the silicon is cast, should a change be necessary, it will likely be just a microcode change, which would be much easier than random logic modification in silicon.
The disadvantage of a microcoded circuit lies primarily in its generality. Because it is made up of modules and is programmed, the microcoded circuit is more wasteful of transistors and therefore makes a larger circuit. This may add up to 20 percent more board space or chip area than a tight random logic design. But microcoding has advantages that make up for this disadvantage, making it the design choice for modern VLSI circuits.
There are two types of microprogramming, horizontal and vertical (see figure 4). Horizontal microcoding is the more direct form. It is unencoded, so that, for instance, 1 bit in each microword would enable each register. For 16 registers, then, 16 bits of microcode must be dedicated. Horizontal microwords tend to be quite long, and because the size of the microcode directly affects chip size, they can quickly increase chip cost.
Figure 4: Comparison of horizontal and vertical microcode patterns.
VERTICAL MICROWORD ------------------------------------- | | 1 | 0 | 0 | 1 | | ------------------------------------- ENCODED "REGISTER 4" WITH ENABLE BIT HORIZONTAL MICROWORD -----------------------------------------------/ /--- | | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | | -----------------------------------------------/ /--- UNENCODED "REGISTER 4"
A denser but slower form of microcoding is vertical microcoding. Here, control functions are encoded, so that only 4 bits of microcode are required to select one of 16 registers. While it needs a much shorter microword, vertical microprogramming is potentially slower than horizontal microprogramming. Vertical microprogramming will take at least one level of logic gates to decode the encoded signals. This level of gates may just throw the total gate propagation delay over the threshold of the clock pickets, forcing an additional clock cycle into the instruction.
In the MACSS project, the MC68000 was selected to be microcoded. In retrospect this was a very wise decision. The first silicon prototype worked well enough so that the major circuits in the device could be tested, and subsequent "fixes" were often just microcode corrections. The instruction set was not firm until just before the masks went to wafer fabrication, allowing some late decisions to be made to improve the performance of the chip.
A combination of horizontal and vertical microcoding was used on the MC68000 to gain the optimum advantages of both. Essentially, a microcode and a nanocode were developed. The microcode is a series of pointers into assorted microsubroutines in the nanocode. The nanocode performs the actual routing and selecting of registers and functions, and directs results. This combination is quite efficient because a great deal of code can share many common routines and yet retain the individuality required of different instructions.
Decoding of an instruction's op code generates starting addresses in the microcode for the type of operation and the addressing mode. Completion of an instruction enables interrupts to be accepted or allows access to the prefetch queue for the next op code. The prefetch queue actually keeps bus use at 85 to 95 percent, i.e., the bus is idle only 5 to 15 percent of the time!
Let's look back now at the MC68000 and see what parts of it might qualify it as a 16-bit device. The internal data ALU is 16 bits. It processes 32-bit addresses, though only 24 bits are brought off chip. The op code that tells the processor what operation to perform is 16 bits wide. The data bus is 16 bits wide. The microprocessor will operate on either 8, 16, or 32 bits of data automatically. There are 16 general-purpose 32-bit-wide registers in the chip.
The MC68000 is generally considered a 16-bit microprocessor, though it uses 32-bit addresses and contains 32-bit registers. It also can operate on 32 bits of data as easily as 8 and 16. Many users of the MC68000 consider it a 32-bit just as much as a 16-bit processor. Whatever you consider it there is no doubt that the MC68000 is indeed a powerful microprocessor. In coming articles, I will discuss in more detail exactly what operations are available in the MC68000 and will illustrate examples of MC68000 code.
Motorola has recently developed an improved version of the MC68000: the MC68010. It is completely compatible with object codes of earlier versions of the 68000 and has added virtual memory support and improved loop instruction execution.
By using virtual memory techniques. the 68010 can appear to access up to 16 megabytes of memory when considerably less physical memory is available to a user. The physical memory can be accessed by the micro processor while a much larger "virtual" memory is maintained as an image on a secondary storage device such as a floppy disk. When the microprocessor is instructed to access a location in the virtual memory that is not within the physical memory (referred to as a page fault), the access is suspended while the location and data are retrieved from the floppy disk and placed into physical memory. Then the suspended access is completed. The 68010 provides hardware support for virtual memory with the ability to suspend an instruction when a page fault is detected and then to complete the instruction after physical memory has been updated.
The MC68010 uses instruction continuation rather than instruction restart to support virtual memory. When a page fault occurs, the microprocessor stores its internal state on the supervisor stack. When the page fault has been repaired, the previous internal state is reloaded into the microprocessor, and it continues with the suspended instruction. Instruction continuation has the additional advantage of handling hardware support for virtual I/O devices.
As mentioned in the body of this article, the 68000 uses a prefetch queue to improve the speed of instruction execution. The 68010 goes one step further by making the prefetch queue more intelligent. Detection of a three-word looping instruction will put the microprocessor into a special mode. In this loop mode, the microprocessor will need only to make data transfers on the bus, because it latches up the queue and executes the instruction repeatedly out of the queue. Once the termination condition for the loop is reached, normal operation of the prefetch queue is resumed. This operation is invisible to the programmer and provides efficient execution of program loops.
About the Author
Thomas Starnes is an electrical engineer who has spent the last five years helping to plan the direction of the MC68000 family of processor products for Motorola.