Fantasy CPU emulator part 2

2021-12-21

Welcome to part two of "Fantasy CPU emulator" in which we go through the most interesting takeaways from writing the application.

If you have not read part one please read it first, to get a better understanding of what I've built.

You can view / download the complete code on GitHub.

Modelling the CPU

The following code shows the CPU's type definitions:

export type Memory = number[];

export type Printer = number[];

export type Cpu = {
  ip: number;
  is: number;
  r0: number;
  r1: number;
  memory: Memory;
  state: CpuState;
  printer: Printer;
};

The CPU stores the values of all registers, which are always numbers. It also stores what came out of the printer, and what the current state of the memory is.

The CPUState in an union with the following possibilities:

export type CpuState =
  | 'finished'
  | 'reading-instruction'
  | 'executing-instruction'
  | 'crashed';

The CpuState models what the CPU is doing at the moment. Before an instruction can be performed the CPU must first read the instruction, only then it can execute the instruction. This means that executing an instruction is always done in two steps: reading and writing.

Technically there is no reason to do the reading and executing in a two steps, it is done purely for a visual / educational standpoint, as this makes the inner workings of the CPU easier to visualize. This in turn makes it easier to understand what the CPU is doing.

The CpuState also tracks why a program stopped, which can be for two reasons:

  • 'finished' is for when a program is done. This happens when the program executes a 0 instruction.
  • 'crashed' is for when to computer says no. This happens when executing a non existing instruction.

Making the CPU work

Now that we know how we model the state of the CPU, it is time to look at how to implement the CPU itself.

The memory the CPU can access can only store numbers. So each instruction corresponds with a number. I choose to model this like so:

type Operation = (cpu: Cpu) => Cpu;

const operations: Record<number, Operation> = {
  0: exit,
  1: addition,
  2: subtraction,
  3: incR0,
  4: incR1,
  5: decR0,
  6: decR1,
  7: beep,
  8: print,
  9: loadIntoR0,
  10: loadIntoR1,
  11: writeFromR0,
  12: writeFromR1,
  13: jump,
  14: jumpIfR0Zero,
  15: jumpIfR0NotZero
};

First I defined a type called Operation which is a function which takes the Cpu and returns a new Cpu, which has applied / performed the operation.

The operations object is a Record which maps numbers to operations. The reason I choose an object instead of an array, is because I found it to be a little more explicit, I quite like the fact that you see the mapping of the number to the operation so explicitly.

I could also define an array, and then manually add the instruction by index, one line at a time. But this feels a little more messy to me due to the array's creation then spanning multiple statements.

Lets look at the most important function execute:

export function execute(cpu: Cpu): Cpu {
  switch (cpu.state) {
    case 'crashed':
    case 'finished':
      return cpu;

    case 'reading-instruction': {
      const is = cpu.memory[cpu.ip];
      return { ...cpu, is, state: 'executing-instruction' };
    }

    case 'executing-instruction': {
      const is = cpu.is;

      if (!isOperation(is)) {
        return { ...cpu, state: 'crashed' };
      }

      const operation = operations[is];

      const nextState = operation(cpu);

      if (nextState.state !== 'finished') {
        nextState.state = 'reading-instruction';
      }

      return nextState;
    }
  }
}

What is does is based is alter the Cpu based on its current state.

The cases for CpuState's 'finished' and 'error' are the easiest to implement. They simply cause the CPU not to be affected at all, and return the cpu parameter as is.

In the 'reading-instruction' state, the IS register is set to what is currently in memory, on the location which is contained in the IP register. Remember each instruction will increment the IP register by one or two (depending on if the instruction is a two byte or one byte instruction) which is how the program moves forward.

The 'executing-instruction' is the most important, here we actually perform the operation. First it makes sure that the IS register contains a valid operation. The isOperation checks ifIS is defined, and a valid operation, meaning a number between -1 and 16. If this is not the case the CpuState is set to 'crashed'.

If the instruction is valid, it is executed. The result of the instruction is the next version of the CPU. There is one other thing that execute will then also: set the CpuState back to 'reading-instruction'. The reason that happens in execute and not in the individual operations, is that the instructions need less code.

The only instruction which actually changes the CpuState is the exit instruction. Which puts the state to 'finished' which is why the if-statement is still needed.

Speaking of the exit instruction here it is:

function exit(cpu: Cpu): Cpu {
  return { ...cpu, is: 0, state: 'finished' };
}

All the exit does is set the CpuState to 'finished' and then it is done. It will also set the IS register to 0 but this is not really needed, because it is already correctly set by execute, but I do it anyway for clarity. Also note that the IP register must be left alone, because the program has stopped.

Lets look at the addition and subtraction instructions next:

function addition(cpu: Cpu): Cpu {
  const r0 = cpu.r0 + cpu.r1;

  return { ...cpu, is: 1, r0, ip: cpu.ip + 1 };
}
function subtraction(cpu: Cpu): Cpu {
  const r0 = cpu.r0 - cpu.r1;

  return { ...cpu, is: 2, r0, ip: cpu.ip + 1 };
}

As you can see they both set the register R0 to be the result of the operation. Also because they are both one-byte instructions the IP register is incremented by one.

Up next are the increment and decrement instructions.

function incR0(cpu: Cpu): Cpu {
  const r0 = cpu.r0 + 1;

  return { ...cpu, is: 3, r0, ip: cpu.ip + 1 };
}
function decR0(cpu: Cpu): Cpu {
  const r0 = cpu.r0 - 1;

  return { ...cpu, is: 5, r0, ip: cpu.ip + 1 };
}

As you can see incrementing / decrementing a register is not very hard at all.

The R1 variants are not shown / explained but they work exactly the same, but are performed on the R1 register.

A more interesting instruction is beep:

function beep(cpu: Cpu): Cpu {
  /* istanbul ignore if */
  if (window.AudioContext) {
    const context = new AudioContext();
    const oscillator = context.createOscillator();
    const gain = context.createGain();
    oscillator.connect(gain);
    oscillator.frequency.value = 520;
    oscillator.type = 'sine';
    gain.connect(context.destination);
    gain.gain.value = 30 * 0.01;
    oscillator.start(context.currentTime);
    oscillator.stop(context.currentTime + 200 * 0.001);
  }

  return { ...cpu, is: 7, ip: cpu.ip + 1 };
}

I had never created a sound before in the browser so I was not sure if it was possible. But luckily the AudioContext API exists, so we can do just that.

I'm not a sound expert, so I cannot claim to know exactly how the code produces the sound. I simply fiddled with this example on StackOverflow, by Petr Tripolsky, until it sounded good to me.

The istanbul ignore exists because JSDom does not support the AudioContext API, and I did not want to mock the entire API just to test this.

Time to delve into the first of the two-byte instructions: print:

function print(cpu: Cpu): Cpu {
  const printer = [...cpu.printer, getValueAtNextAddress(cpu)];

  return { ...cpu, is: 8, printer, ip: cpu.ip + 2 };
}

I choose for an array of numbers to represent the printer, and not just "print" the value, because I want to know what was printed, and not loose that information in the future. This so we can do time travel debugging later.

To get the value at the next address I call a helper function named getValueAtNextAddress. It will return the value of next cell based on the current IP. It will also guard against going out of bounds meaning: accessing indexes outside of the memory array, by returning 0 if that happens.

I think it would be better to throw an error instead and set the CpuState to 'error', but it works for the time being...

print is also special in the way it uses the second / next cell. It will use the value of the next cell literally! Whatever is in the next cell will be printed.

All other two-byte operations will use the value of the second cell as a memory address to fetch the value from. You could say that they treat the second / next cell as a pointer, not a literal value.

Also all two byte instructions act on two bytes in memory, so they have to increase the IP by 2. If we did not do this the next cell would be interpreted as another instruction.

Lets look at loadIntoR0 next:

function loadIntoR0(cpu: Cpu): Cpu {
  const address = getValueAtNextAddress(cpu);

  const r0 = cpu.memory[address];

  return { ...cpu, is: 9, r0, ip: cpu.ip + 2 };
}

loadIntoR0 uses the value of the next / second memory cell as the address in memory, from which to take the value from to load into R0.

Meaning that if the CPU encounters: 9, 10 this means: take whatever is the value inside of memory address 10 and load it into R0.

Writing also does the same thing:

function writeFromR0(cpu: Cpu): Cpu {
  const address = getValueAtNextAddress(cpu);

  const memory = [...cpu.memory];
  memory[address] = cpu.r0;

  return { ...cpu, is: 11, memory, ip: cpu.ip + 2 };
}

By now you must wonder why I copy the memory and the state. The reason is so we can create the history overview. By never mutating the objects / arrays directly we can create easily restore the CPU to those states. This enables time travel debugging, more on that later on.

Lets look at Dijkstra's least favorite instruction: jump.

function jump(cpu: Cpu): Cpu {
  const address = getValueAtNextAddress(cpu);

  return { ...cpu, is: 13, ip: address };
}

Jump directly alters the value of the IP register by reading the value from the address at the second / next cell.

Many moons ago when I first wrote the Clojure variant of this CPU, I was surprised on how easy jump was to implement. You would think that such a powerful instruction would take more lines of code.

The following instructions: jumpIfR0Zero and jumpIfR0NotZero only take the R0 register into account. This means that all branching / looping can only ever be done based R0.

Here is the code:

function jumpIfR0Zero(cpu: Cpu): Cpu {
  if (cpu.r0 === 0) {
    const address = getValueAtNextAddress(cpu);

    return { ...cpu, is: 14, ip: address };
  } else {
    return { ...cpu, is: 14, ip: cpu.ip + 2 };
  }
}
function jumpIfR0NotZero(cpu: Cpu): Cpu {
  if (cpu.r0 !== 0) {
    const address = getValueAtNextAddress(cpu);

    return { ...cpu, is: 15, ip: address };
  } else {
    return { ...cpu, is: 15, ip: cpu.ip + 2 };
  }
}

The branching is done using an if-statement, which is quite a luxury to have.

The code we have seen so far was written to be very functional. Functional programming is very suitable for this type of problem. Especially the use of pure functions makes testing the CPU a joy.

A pure function is a function which when given the same input produce the same output, without mutating anything of the outside world. This property is what makes the tests so easy to write.

For example here is the test for decR1:

it('should perform the decR1 instruction when the IS is 6', () => {
  const cpu = { ...initCpuFromMemory([6, 0]), is: 6, r0: 0, r1: 5 };
  cpu.state = 'executing-instruction';

  expect(execute(cpu)).toEqual({
    ip: 1,
    is: 6,
    r0: 0,
    r1: 4,
    memory: [6, 0],
    state: 'reading-instruction',
    printer: []
  });
});

If only all test were so easy to write.

Functional core, imperative shell

I tried to make as much code as possible unaware that React even exists.

This has the following benefits:

  • Because React and the core functionality are separated, the code is less complex. The React code may drive the CPU related code, but is not the CPU code itself. React handles the events, such as pause and step, and the UI, the CPU will handle itself. This will prevent React knowledge from bleeding into the CPU.
  • It makes the CPU portable to other languages and frameworks. React might not live forever, so this makes migration easier in the future.
  • Testing vanilla TypeScript / JavaScript code does not require any React specific setup, which makes testing easier. It requires fewer lines of code, and does not require any React specific knowledge.

The React UI

Know that we know how the CPU works, lets see how we visualize and drive it from React. The main React component is called FantasyCpu,it uses a useReducer which produces the objects which are of type FantasyCpuState:

export type CpuHistory = Cpu[];

export type FantasyCpuState = {
  cpu: Cpu;
  savePoint: Cpu;
  history: CpuHistory;
  error: boolean;
  playing: boolean;
  speed: number;
};

This FantasyCpuState represents the UI state. It contains the a Cpu, the playing state, the current speed of the CPU, and if there are errors.

It also contains a savePoint, which is a version of the cpu we restore to when the restore button is clicked.

The final thing it contains is a history, which is an array of the past Cpu states. This is the one element in the FantasyCpuState which I struggled to classify, is it a UI thing, or a CPU domain thing?

I'm still not a 100% sure if classifying it as UI state is the correct choice, but my reason to do so is that a CPU does not have a notion of history. I reserve the right to change my mind in the future however.

The history is used to display a history of the CPU, for debugging purposes. Also when a history is clicked we restore to that state. This is the time travel debugging I was talking about earlier.

As I've mentioned before we use useReducer to handle the state of the FantasyCpu. The reducer function, and its actions, are defined like this:

export type FantasyCpuStateActions =
  | { type: 'pause' }
  | { type: 'play' }
  | { type: 'save' }
  | { type: 'step' }
  | { type: 'setSpeed'; payload: number }
  | { type: 'restore'; payload: Cpu }
  | { type: 'restart' }
  | { type: 'reset'; payload: number[] }
  | { type: 'update'; payload: { text: string; index: number } };

function reducer(
  state: FantasyCpuState,
  action: FantasyCpuStateActions
): FantasyCpuState {
  switch (action.type) {
    case 'pause':
      return { ...state, playing: false };

    case 'play':
      return { ...state, playing: true };

    case 'save':
      return { ...state, savePoint: state.cpu };

    case 'setSpeed':
      return { ...state, speed: action.payload };

    case 'step':
      if (!canExecute(state.cpu)) {
        return { ...state, playing: false };
      }

      const nextState = execute(state.cpu);

      return {
        ...state,
        history: [...state.history, state.cpu],
        cpu: nextState,
        playing: state.playing ? canExecute(nextState) : false,
        error: nextState.state === 'crashed'
      };

    case 'restore':
      return { ...state, cpu: action.payload, playing: false };

    case 'restart':
      return {
        ...state,
        playing: true,
        cpu: {
          ...state.savePoint,
          r0: 0,
          r1: 0,
          ip: 0,
          is: 0,
          state: 'reading-instruction'
        }
      };

    case 'reset':
      return initFantasyCpuState(action.payload);

    case 'update':
      const number = parseInt(action.payload.text, 10);

      if (isNaN(number)) {
        return { ...state, error: true };
      }

      const memory = [...state.cpu.memory];
      memory[action.payload.index] = number;

      const nextCpu = { ...state.cpu, memory };

      return {
        ...state,
        error: false,
        cpu: nextCpu,
        savePoint: nextCpu
      };
  }
}

A reducer is a function which when given a state, in our case an object of type FantasyCpuState, and an action, in our case of type FantasyCpuStateActions, produces a new FantasyCpuState with the action applied.

If this sound a lot like Redux you will be right. It was the direct inspiration for the useReducer hook. It is basically a tiny Redux baked into React itself.

But why use useReducer instead of useState?

The reason is this: whenever you find two or more useState's which are intertwined together, meaning they affect each other / depend on each other useReducer is a better idea.

This becomes apparent in the "step" action:

case 'step':
  if (!canExecute(state.cpu)) {
    return { ...state, playing: false };
  }

  const nextState = execute(state.cpu);

  return {
    ...state,
    history: [...state.history, state.cpu],
    cpu: nextState,
    playing: state.playing ? canExecute(nextState) : false,
    error: nextState.state === 'crashed'
  };

The trick is that whenever a "step" occurs, and the CPU has to move forward, lots of UI state needs to be manipulated at once: The history needs to be appended, the playing state stops based on the CpuState, and whenever there is an error we want to display it.

Worst yet this also happens conditionally, because when the CpuState is either "finished" or "crashed" the CPU must do nothing. Which is what the canExecute guards against.

My initial attempt at implementing FantasyCpu did not use useReducer it was a hot mess. To understand why, take a look at how the autoplay functionality works with useReducer:

import { useEffect, Dispatch } from 'react';
import { FantasyCpuState, FantasyCpuStateActions } from '../../FantasyCpu';

export function useAutoplay(
  state: FantasyCpuState,
  dispatch: Dispatch<FantasyCpuStateActions>
) {
  useEffect(
    function autoPlay() {
      if (!state.playing || state.error) {
        return;
      }

      // Do a step every second
      const timeoutId = setInterval(() => {
        dispatch({ type: 'step' });
      }, state.speed);

      return () => {
        clearInterval(timeoutId);
      };
    },
    [dispatch, state.error, state.playing, state.speed]
  );
}

useAutoplay is a custom hook which uses useEffect to set an interval at which to call dispatch("step"). The code is relatively straightforward, assuming you know the basics of useEffect.

Now lets look at an older version which uses useState and it's version of the step and autoplay:

export function FantasyCpu({ memory }: Props) {
  const [cpuState, setCpuState] = useState(() => initCpuFromMemory(memory));
  const [savePoint, setSavePoint] = useState<CpuState>(cpuState);

  const [history, setHistory] = useState([]);

  const [playing, setPlaying] = useState(false);
  const [speed, setSpeed] = useState(200);
  const [saving, setSaving] = useSaving();

  function step() {
    if (cpuState.state === 'finished' || cpuState.state === 'crashed') {
      setPlaying(false);
      return;
    }

    // Create a save point if there is not one.
    if (!savePoint) {
      save();
    }

    setHistory([...history, cpuState]);

    // Execute is imported from cpu.ts
    setCpuState(execute);
  }

  useEffect(
    function autoPlay() {
      if (!state.playing || state.error) {
        return;
      }

      const intervalId = setInterval(() => {
        step();
      }, speed);

      return () => {
        clearInterval(intervalId);
      };
    },
    [playing, state.error, speed]
  );

  function save() {
    setSaving(true);
    setSavePoint(cpuState);
  }

  return null;
}

The code feels more disjointed, the reason for it having al these separate functions, is so they can also be called individually. For example the save function is called when the save button is clicked, and when performing the initial step when there is no save point yet.

The relationship between these states is difficult to see in this version of the code. They are intertwined deeply, but you must read and grok the entire code to see how the are intertwined.

But the real reason I refactored the code to use useReducer is the following: the autoplay useEffect is missing a dependency, the step function. The problem is that if I add the step function as a dependency, it will run the useEffect on every render, as the step function is not stable. As each call to FantasyCpu will re-create the step function.

One solution is to use a useCallback around the step function, so it returns the same function every time, but this has a cascading effect: the save function must then also be wrapped inside a useCallback, for the same reason.

So by using useReducer here I managed do two things:

  • The code is more readable as one action is responsible handling the entire state change. The logic is no longer put into separate functions, which are at some distance to each other.
  • The need for useCallback was eliminated which would have made the code more complex.

Now that we have seen the reducer and the auto play lets look at the FantasyCpu component itself.

// Import omitted.

type Props = {
  initialMemory: number[];
};

export function FantasyCpu({ initialMemory }: Props) {
  const [state, dispatch] = useReducer(
    reducer,
    initialMemory,
    initFantasyCpuState
  );

  const saving = useShowSavingMessage(state.savePoint);

  useAutoplay(state, dispatch);

  const { cpu } = state;

  return (
    <div className="FantasyCpu p-0 md:p-10">
      <div className="px-2 sm:px-10 py-5 grid gap-4 bg-white md:w-680">
        <Registers cpu={cpu} />

        <Memory cpu={cpu} dispatch={dispatch} />

        <div className="sticky bottom-0 bg-white grid gap-4 py-2">
          <Printer printer={cpu.printer} />

          <HistoryView history={state.history} dispatch={dispatch} />

          <div className="grid xl:grid-cols-6 md:grid-cols-3 gap-4">
            <PlayButton
              playing={state.playing}
              cpuState={cpu.state}
              dispatch={dispatch}
            />

            <SpeedSelect speed={state.speed} dispatch={dispatch} />

            <StepButton dispatch={dispatch} />

            <SaveButton dispatch={dispatch} />

            <RestoreButton savePoint={state.savePoint} dispatch={dispatch} />

            <ResetButton dispatch={dispatch} initialMemory={initialMemory} />
          </div>

          {saving ? <strong>Saved current CPU state</strong> : null}

          {state.error ? <ErrorMessage cpuState={cpu.state} /> : null}
        </div>
      </div>

      <CheatSheet />
    </div>
  );
}

The nice thing about useReducer is that you can now pass along the dispatch to child components, allowing them to send actions themselves.

This has a nice benefit, it makes components such as the PlayButton actually able to trigger the operation. Changing it from being a dumb component, into a component which handles a piece of the domain. Contrast these two versions of PlayButton:

<PlayButton
  playing={state.playing}
  cpuState={cpu.state}
  dispatch={dispatch}
/>

<PlayButton
  playing={playing}
  cpuState={cpuState}
  onClick={playingChanged}
/>
}

The first use of PlayButton is smart / domain aware it will handle the logic of when to dispatch the 'pause', 'play' or 'restart' actions.

The second PlayButton only handles the UI, and forces the FantasyCpu to handle the change in state. It is dumb and not aware of its purpose beyond displaying the UI.

Having lots of dumb components forces you to create one or components which are too smart. This has the nasty effect of these components ballooning in size, due to their many responsibilities.

So my advice: try to give your components not only something to display, but also something to do.

Conclusion

So it seems like useReducer is pretty neat. This does not mean however that useReducer should always be used.

My rule of thumb is this: if the states intrinsically linked to each other use useReducer. Otherwise use useState as it does have less overhead, when the state is not coupled.

You can view / download the complete code on GitHub.

Please share this post if you enjoyed it.