I’ve been a bit lazy giving updates but I did spend some time working out an overall design for the instruction set I outlined last time. Getting close on that, but for today I’ll go over some new circuit components that should be useful.
A few posts ago I described demultiplexers, which can be used to direct an input value to one of its ‘n’ output lines. A decoder is quite similar, except instead of passing an input value through a decoder simply selects (sets to 1) one of its ‘n’ outputs based on its inputs.
For example, consider the 2:4 decoder (2 input lines, selecting one of 4 output lines) above. When A1 is high (1) and A0 is low (0) then line X2 is selected (2 is 10 in binary; A1=1 and A0=0).
Similarly, the other combinations of A1 and A0 select their corresponding output line:
Here’s how it all works. Take a close look at the table above and note the combination of 0’s and 1’s for the A1 and A0 input lines. If both A1 and A0 are low (0) then X0 is high (1), if A1 is low (0) and A0 is high (1) then X1 is high (1), and so on.
The 2:4 decoder circuit emulates this logic table using a series of AND and NOT gates to combine its two input lines (A1 and A0) accordingly.
An AND gate (the semi-circular symbols in the circuit above) outputs high (1) only if both of its inputs are also high otherwise it outputs low (0). A NOT gate (triangles in the circuit above) outputs the “opposite” of its input (e.g. high to low, low to high).
Let’s look at the X2 example again. According to the table above, X2 is selected (high) only when A1=1 and A0=0. However we can’t simply AND the two input values (e.g. “A1 AND A0”) because we’d get “1 AND 0” which results in 0. Not the quite result we’re looking for.
The key here is the A0=0 part. What we really want is “1 AND 1” but only when A0 is 0. That’s where the NOT comes into play…NOT 0 = 1!
This allows us to express our desired “1 AND 1” as “1 AND (NOT 0)”; or in terms of our input lines “A1 AND (NOT A0)”. The only way that expression is true (1) is if A1 = 1 and A0=0. If you don’t believe me, the table below might help 🙂
||A1 AND (NOT A0)
||0 AND 1
||0 AND 0
||1 AND 1
||1 AND 0
There’s a whole world of boolean logic (or boolean algebra) available to us to play games like this 🙂 We’ll probably use a few more tricks like this in future circuits too.
Following through with this, the remaining output lines can be determined using similar boolean logic:
||(NOT A1) AND (NOT A0)
||(NOT A1) and A0
||A1 AND (NOT A0)
||A1 AND A0
I’ll leave it up to you to figure out how these are all wired into that 2:4 decoder circuit diagram 🙂
By combining multiple AND and NOT gates it’s possible to create 3:8, 4:16, etc. decoders. Since I’m pretty sure I’ll need at least the 3:8 and 4:16 decoders I wired some up:
You probably noticed that I cheated a bit 🙂
Rather than creating a large number of AND and NOT gates I instead used a lower order decoder (e.g. a 2:4 for the 3:8 and a 3:8 for the 4:16) along with a 2-output demultiplexer.
The decoder handles the “lower most” input lines (e.g. the lower 2 of 3 inputs, or the lower 3 or 4 inputs) while the demultiplexer handles the “upper most” input line by redirecting decoder’s outputs to either the lower or upper ‘n’ output lines (e.g. the lower/upper 4 outputs, or the lower/upper 8 outputs).
Well, I just realized that I didn’t really give any examples on what a decoder can be used for! It’s getting late though so trust me, they’re useful 🙂
Taking a short break from creating circuits to jot down some preliminary thoughts on the overall architecture for Nandy. While I could probably be a bit more ambitious for now I think the following would be a reasonable goal:
My first computer was an 8-bit processor with 128K of RAM; so I’ll feel right at home working with something this primitive 🙂
Since the instruction set is kind of the heart and soul of the CPU let’s talk about that in more detail. I spent some time reading through the technical manuals of several classic CPUs (like the Zilog Z80 and Motorola 6809) to get a feel for their capabilities. Ultimately the MIPS series of CPUs caught my interest due to its relative simplicity compared to the others.
Not that MIPS is a simple processor…not at all! All that instruction pipelining stuff, wow. A project for another time maybe 🙂
Registers & Memory
Nandy will be a wholly 16-bit CPU which means:
- all registers will be 16-bit words
- every memory (RAM) location will be a 16-bit word
- all instructions will be 16-bit words (since they’re stored in RAM)
This choice will help simplify the CPU design in some aspects (e.g. shear number of gates required) and make it more complicated in other ways (e.g. instruction encoding, numeric limitations, etc.). I’ll talk about those issues when I run into them 🙂
I’m new at this. I’m building a CPU from NAND gates. I don’t really want to design and build a 50+ instruction set CPU…at least not yet. So how few operations can I get away with and still have something that can be called a “computer”?
Well, one of the primary things computers do is math. Let’s start there:
- negative (a = -a)
- bit logic (AND, OR, NOT, XOR)
- bit shifting (multiply and divide by 2)
I’m going to avoid multiplication and division (beyond bit shifting). Those can calculations get rather complicated to do with logic gates and can actually be performed using repeated addition or subtraction in software.
I’m also going to limit Nandy to integer math, no floating-point values at this point. If I’m feeling ambitious I might look into either hardware or software implementation of fixed-point arithmetic; which would give some ability to work with non-integer (“real”) values.
In addition to mathematical operations, I’ll also need some ability to “branch” in order to perform if/then/else type logic. I’m thinking I’ll at least need:
- branch if zero
- branch if negative
- branch always
That should cover most “comparison” type expressions (e.g. if A < B then …) and “goto” branching for sub-routine calls and such.
Since there are only a handful of registers in a CPU for storing data but a “huge” amount of RAM nearby I’ll need some instructions for accessing memory (RAM). As I briefly mentioned above I was inspired by the RISC / load-store architecture which means most of my instructions will operate only on registers and only a few of them will actually write to and read from memory:
- read 16-bit word from RAM and store in a register
- write a register value into a 16-bit word in RAM
Finally, I’ll also need some way to load “immediate” values into registers; e.g. “A = 3”:
- load a register with a 16-bit constant
This last one will be tricky. Nandy’s registers will be 16-bits wide and so will it’s instructions. So how do I load a 16-bit register with a 16-bit value when the instruction is only 16-bits wide? Note, somewhere in that 16-bit instruction I have to encode “load this immediate value” (along with the other instructions too) so at least a couple bits need to be used for that…leaving fewer than 16 bits for that 16-bit value…
Food for thought; problem for another day 🙂
Well, in total that puts me at around 15 instructions. Not too bad!
So what are addressing modes? Basically these are various methods the CPU can use to access data; which is typically placed in some location (address) in the computer’s RAM.
Nandy is a bit unique in that two of the three addressing modes I’m considering don’t actually address RAM, but instead refer to the CPUs internal registers or constant values. The third mode though is the primary (well, only actually) means to access data in RAM.
As I mentioned above, most of the Nandy instructions will operate only on registers; some CPUs refer to this as the register-direct addressing mode. Basically the idea is that I load registers with values of interest (either with immediate values or from RAM) then perform a calculation. The result of that calculation is stored into another register. So basically, this mode moves data between registers only.
I mentioned this one above too. Basically immediate addressing lets you load a constant value into a register. For example I might load A with 3, then B with 4 using immediate addressing. The A and B registers can then be added together and stored into a third register C (i.e. register-direct addressing).
This is the big one. This form of addressing consists of two parts: a base value (in one register) and an offset from that base (in another register or as an immediate value). Essentially you derive the final address by:
Address = Base + Offset
Once you have that address you can use it to access data in RAM or branch to that location (e.g. if/then expression) which is also somewhere in RAM.
So why use a pair of base/offset value instead of just a single one (i.e. as with absolute/direct or register-indirect addressing)?
Well, one of the neat features of this approach is that the “offset” can be a positive or negative value. This means that I can refer to data that’s either “before” (negative) or “after” (positive) the base value. This sort of thing really becomes useful when performing branching (e.g. jump back a few instructions if register A is zero).
The other nice thing is that the “offset” is kind of optional; setting it to 0 means that only the “base” is used to compute the final address. So really, you get register-indirect built-in and even kind of get absolute/direct addressing too!
That’s all for now. Maybe next time I’ll have a few more circuits to share.
We’ve already agreed that multiplexers are cool. What’s next?
Well, they’re really only half of the picture for logical switches. The demultiplexer essentially performs the opposite task. That is, given a single input line send its value to one of ‘n’ output lines.
Let’s start with a DMUX2x1 (that is 2 outputs, 1 line each):
With this circuit, A is the single input (being either 0 or 1) and SEL states whether to place A on X0 (SEL = 0) or X1 (SEL = 1), with the other being set to 0. In the example above we have A = 1 and SEL = 1, so the value of A (1) is output on X1 (with X0 being 0).
How do we extend this to more than 2 outputs? As with the MUX4x1 (and others) we use multiple DMUX2x1s to create a DMUX4x1; then multiple DMUX4x1s to create a DMUX8x1 and so on. Let’s not get ahead of ourselves though and do a DMUX4x1 first:
This circuit works pretty much the exact opposite way the MUX4x1 did. Recall that SEL0 and SEL1 represent the least-significant and most-significant digits of the two-digit binary number used to select what value goes on the output (00 for X0, 01 for X1, 10 for X2 and 11 for X3).
With the DMUX4x1 above, the most-significant digit is decoded by the left-most DMUX2x1 and sends the input value (A) to either the upper-right DMUX2x1 (if SEL1=0) or the lower-right DMUX2x1 (if SEL1=1).
With this decision made, the choice is narrowed down to either X0/X1 (upper DMUX2x1) or X2/X3 (lower DMUX2x1). The selected DMUX2x1 (upper or lower) then decodes the least significant digit (SEL0) to determine which of its two lines to output A on (e.g. X2 with SEL0=0 or X3 with SEL0=1).
Like I said, pretty much like the MUX4x1 but in reverse 🙂 For reference, here’s the MUX4x1 again:
Okay, let’s keep going and combine some DMUX4x1s into a DMUX8x1:
For eight output lines (X0 to X7) we need to have 3 SEL lines (2^3 = 8) which form a three digit binary number (000, 001, 010, 011, 100, 101, 110, 111).
The DMUX4x1 we created above accepts two SEL lines to select one of its 4 outputs (00, 01, 10, 11) so to cover all eight outputs we’ll need two of those: one for selecting X0 through X3 and a second for X4 through X7. But which two SEL lines (SEL0, SEL1, SEL2) do we use?
Look a couple paragraphs up to where I listed the eight 3-digit binary numbers. Notice how the 2-digit binary numbers are repeated twice, once with the 3rd digit as 0 and again with it as 1:
||2nd & 1st digits
This means that we can connect SEL0 (1st digit) and SEL1 (2nd digit) to our two DMUX4x1s. The 3rd digit (SEL2) can then be used to select between those two DMUX4x1 using a DMUX2x1.
Let’s work through the example in the DMUX4x1 figure above. With SEL2 = 1, A (1) is output on X1 of the DMUX2x1 which is fed into the lower DMUX4x1. Lines SEL1 (1) and SEL1 (0) then tell that DMUX4x1 to output A on X2 (10 in binary is 2 in decimal) which corresponds to X6 of the DMUX8x1. Or in other words, setting SEL2 = 1, SEL1 = 1, SEL0 = 0 (110 in binary or 6 in decimal) outputs A (1) on X6.
Got all that?
I really wish I had a way that you could play with these circuits live 🙂 Maybe I’ll try to get some Logisim files up for you to experiment with.
Now to round out our collection, here’s a DMUX16x1:
For those of you keeping track, here’s where we stand for the number of NAND gates being used:
They certainly add up!
That’s it for multiplexers and demultiplexers (at least for now) so what’s the next topic? Well, I’ve been giving some thought to the basic CPU hardware (word size, addressing, registers, etc.) and instruction encoding. Thinking next time I’ll put some of those ideas down on “paper”.
I’ve been playing more with Logisim – getting used to the UI and building more circuits – multiplexers and demultiplexers in fact.
So last time I showed you the MUX 2×1 I built to help create a single “bit” of memory:
To recap, a multiplexer lets you choose between two (or more) input lines (e.g. A0 and A1 above), sending only one of them to the output (X). The SEL line states which of the two inputs (SEL = 0 for A0, SEL = 1 for A1) to pass through to X.
That’s great for two lines, but what if you have more to chose from? Turns out this basic logic can be relatively easily extended to additional lines by combining multiple MUXs. Let’s start with a MUX4x1:
So now we have two SEL lines (SEL0 and SEL1) and four input lines to choose from (A0 to A3). How do we set the SEL lines to pick a particular input line?
If you’re not already familiar with binary numbers, now would be a good time to read up 🙂
Given two SEL lines (each being 0 or 1) we can create 2^2 = 4 values in binary: 00, 01, 10, 11 which correspond to the decimal values 0, 1, 2 and 3. Hmm, notice that we named the input lines above A0, A1, A2 and A3? That wasn’t coincidental 🙂
That means in order to pick input A2 as our output (as in the diagram above) we need to set the two SEL lines to the binary number 10 (or 2 in decimal) or in other words SEL0 = 0 and SEL1 = 1.
Why that and not SEL0=1 and SEL=0?
In terms of binary number counting SEL0 represents the “least significant digit” (right most) of the two digit binary number (e.g. 0 of 10); which means that it repeats each time a more significant digit (e.g. SEL1) changes. In decimal numbers we might count from 0 to 9, 10 to 19, 20 to 29, etc. Binary works the same way, but we only have 1 and 0 to work with. So 00, 01, 10, 11.
Read that over a couple times if you need to 🙂
Okay, so with SEL0 representing the least significant digit, let’s see what the circuit does with it. You can see that SEL0 (0) is fed into the two both of the MUX2x1’s on the left. This means that the top MUX2x1 will select A0 (the first of its two inputs) and the bottom MUX2x1 will select A2 (the first of its two inputs).
That narrows down our options from 4 lines to just 2 (A0 or A2). Now we need to choose which of those two to output – that’s where the “most significant digit” (SEL1) comes into play. The outputs of the two MUX2x1’s are fed into a third MUX2x1 with the SEL1 line making the choice. In this case, SEL1=1 so its second input (A2, the output from the lower MUX2x1) is output on X.
Following through with this logic we get…drum roll…a MUX8x1 and MUX16x1!
Cool huh? Maybe a little hypnotic too 🙂
That’s all for now, I’ll write about the demultiplexers next time!
After playing around with about a half dozen applications for simulating logic circuits, the winner is…
It’s free, pretty easy to use, lets me visualize and interact with the circuits, has decent documentation and best of all its file format is XML based so I should be able to translate any circuits I make into my own simulator if/when I get there. There are fancier ones out there, most for $$$, but I’m happy with this one for now 🙂
And so, my first NAND based circuit… a D-type flip flop (DFF)
Exciting isn’t it?
Here’s what it does, when CLK (representing the computer clock which cycles from low (0) to high (1), to low, etc.) is high, the value at A (either high or low) is “pushed” into the various NAND gates. This value is output on X and its inverse (logical NOT) on ~X. The neat part is that when CLK goes low, those values are retained at X and ~X. At least until CLK goes high again, then A is loaded as before and X/~X change.
What’s the use of this? Well, turns out the DFF is a corner stone to building a single “bit” of memory. Yeah, that’s right at least 4 NAND gates are required to represent a single bit of memory. We’re not done though, because to truly “remember” the bit value we need a few more logic gates to retain X when CLK cycles back to high.
Leading to my second and third circuits (no I’m not going to count them all). The key behind a ‘bit’ remembering its value is to feed the DFF’s output (X) back into A. Unfortunately we can’t directly connect X back to A since there would be no way for the various NAND gates to know whether to operate on the value from A or from X.
Fortunately this can be solved by adding a “switch” that lets only one or the other through to the other gates. A “multiplexer” (below) accepts two or more inputs (A0 and A1 in this case) and one or more “select” lines (SEL). The value on the SEL line dictates which of the inputs are passed through to the output (X).
In the figure above, A0 is high (1) and A1 is low (0). Since SEL is low (0) the multiplexer selects the A0 line, resulting in 1 on X. If SEL was 1 then A1 would have been selected, with 0 being output on X.
Get it? Let’s combine that with the DFF above:
So here we’ve added the multiplexer (2×1 means 2 input lines, 1 wire to each) to “switch” between A (a new value to load) and X (the current value in the DFF). Which input to use is stated by SET (0 = X, 1 = A).
As before, when CLK is high the value of the DFF can be changed. However, if SET is high (1) then A is passed in (e.g. the DFF is “set” to 1), otherwise if SET is low (0) then X is used. As before, once CLK goes low the DFF retains its last value and outputs it on X (which is fed into the MUX).
Here’s the sneaky bit, as long as CLK is low, changing either A or SET will have no affect on the DFF’s internal state. Once CLK goes high again, if SET is 0 then the DFF’s current value (output on X) is fed back into itself and so “remembers” its value. This will continue as CLK cycles from low to high; at least until SET becomes 1 again to give the DFF a few value.
Hopefully that all made sense 🙂
Well, that now means that a single “bit” of memory requires 9 NAND gates (5 for the DFF and another 4 for the MUX2x1).
Let’s do some math:
1 word = 16 bits x 9 NANDs = 144 NANDs
64K words of memory = 144 NANDs x 65536 = 9,437,184
That’s almost 10 million NAND gates just for a mere 64K of memory! Hmm, I think I’ll probably need to cheat for main memory like this as I suspect the simulator with die trying to sequence that many NAND gates. I might try using the NAND gate bits for CPU registers though, there will only be a handful and it’d be a shame not to use my bit circuit at all 🙂
That’s enough for now. Think I’ll next try making a few more multiplexers (4, 8 and 16 inputs) as I think those will be useful later on. I’ll also look into the logic needed for demultiplexers which are basically the opposite of multiplexers (given a single line, output it to one of ‘n’ others).
I also need to give some thought to the basic CPU requirements (e.g. word size, memory addressing, number of registers, instruction set, etc.) as they will dictate which types of low-level circuits I need to create.
So where should I start with building a CPU? I think I’ll begin by jotting down some thoughts of things I’d like to accomplish.
First and foremost, I want to build all the logic using only NAND gates. That’s not to say I won’t abstract them into components like multiplexers and adders as I go “up” in the design. I’m pretty sure connecting hundreds of NAND gates individually would get pretty tedious 🙂
Second, I want to be able to actually run programs on my CPU. I’m not ready to delve deep into IC electronics just yet which that means I’ll need some kind of logic gate simulator. I’ve found a few software packages that show promise (I’ll get back to you on my next post) but I’m tempted to try writing my own. Something simple…maybe just emulate a few thousand NAND gates, hook them up and see what happens!
Third, speaking of programming the CPU, I’d be happy with just a basic assembly language – but once I get things up and running I might try my hand at a high-level language compiler…maybe even a minimal OS…I can dream.
What is your name?
Oh, and I’ve decided to nickname my CPU “Nandy”. Combine NAND with a once-popular home computer brand from the 1980’s…
As a final note for tonight, I thought I’d share some links I found inspirational:
Meet the lowly NAND gate.
Yes, there are more sophisticated logic gates like AND, OR, XOR and so on, but the NAND gate is special. How so? It can be used to create all of the others – a “universal gate” so to speak.
Why is this important to me? Because this meager little logic gate will be the focus of my next project…design my own CPU!
An eon or so ago when I was a young man at university I took a principles of computer hardware course in which I spent a lot of time combining elemental logic gates into more sophisticated devices such as multiplexers, adders, and even a full Arithmetic Logical Unit (ALU). It was a blast!
I recently got it in my head that I’d like to try my hand at that again. With my class notes and textbooks long turned to dust I dug around and eventually stumbled across Elements of Computing Systems – the tag line says it all “The NAND to Tetris Companion”. Yup, this book walks you through designing your own CPU, assembly language, virtual machine and high-level language. Great stuff!
While a bit short in technical details in places (how exactly does a D-Flip Flip work?) it was enough to inspire me to try my hand at building my own.
So here we go…