## Building a Processor, Part 7: Optimization Debugging

*This is part 7 of my Building a Processor series, where I try to build a processor on an FPGA board. This post is mostly a continuation of the previous post, about optimizing the debouncer circuit I wrote.*


In the last post, I did what I thought were some cool optimizations to simplify the debounce circuit, but I was disappointed to see how little it ended up affecting the final design. So, today I'm going to look into why it didn't help. I did a similar optimization in a larger overall design, and saw benefits there, and I have a couple theories why that might be:

- The optimizers work across module boundaries, so the effects of my optimizations are dependent on the rest of the circuit around the debouncers
- The results are largely irrelevant if I'm not area-constrained in the first place

## Turning on keep-hierarchy

To test the first theory, I'm going to turn the "keep hierarchy" setting to Yes in the Synthesize process properties. Doing this, for the unoptimized circuit, I get the following area report:

| Metric | Used | Available | Utilization |
|---|---|---|---|
| Number of Slice Registers | 76 | 18,224 | 1% |
| Number of Slice LUTs | 92 | 9,112 | 1% |
| Number of occupied Slices | 37 | 2,278 | 1% |

And when I use the optimized module, I get this report:

| Metric | Used | Available | Utilization |
|---|---|---|---|
| Number of Slice Registers | 76 | 18,224 | 1% |
| Number of Slice LUTs | 84 | 9,112 | 1% |
| Number of occupied Slices | 34 | 2,278 | 1% |

Which almost exactly matches the results I saw in the previous post. Looking at the two "Technology" schematics, I'm starting to think that I was wrong in thinking that optimizing the debouncer could be beneficial, since the majority of both circuits is the 17-bit resettable counter circuitry.
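To put a number on "almost exactly matches", here's a quick Python sketch that diffs the two keep-hierarchy reports. The dictionary keys and the `savings` helper are names I made up for illustration; the counts are copied straight from the two reports above.

```python
# Area numbers copied from the two keep-hierarchy reports above.
# Key names are my own shorthand, not anything the tools emit.
unoptimized = {"slice_registers": 76, "slice_luts": 92, "occupied_slices": 37}
optimized = {"slice_registers": 76, "slice_luts": 84, "occupied_slices": 34}


def savings(before, after):
    """Return {resource: (absolute saving, percent saving)}."""
    return {
        name: (
            before[name] - after[name],
            100.0 * (before[name] - after[name]) / before[name],
        )
        for name in before
    }


for name, (delta, pct) in savings(unoptimized, optimized).items():
    print(f"{name}: {delta} fewer ({pct:.1f}%)")
```

The register count doesn't move at all, and the LUT and slice savings are both in the single digits, which fits the idea that the counter dominates both circuits.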

## Trying to make the area-optimizer work harder

I'm going to try one last thing to see if my improvements were actually improvements, which is to tell the optimizers to try even harder to reduce area. I'm not really sure which metric they will try to minimize, so a similar caveat still applies, that the numbers might not be indicative of the optimizer's "best work". The results are pretty interesting though; here is the new report for the unoptimized debouncer, with keep-hierarchy still turned on:

| Metric | Used | Available | Utilization |
|---|---|---|---|
| Number of Slice Registers | 76 | 18,224 | 1% |
| Number of Slice LUTs | 85 | 9,112 | 1% |
| Number of occupied Slices | 37 | 2,278 | 1% |

And for the optimized debouncer:

| Metric | Used | Available | Utilization |
|---|---|---|---|
| Number of Slice Registers | 76 | 18,224 | 1% |
| Number of Slice LUTs | 79 | 9,112 | 1% |
| Number of occupied Slices | 34 | 2,278 | 1% |

So not much of a difference... let's try turning keep-hierarchy off. Here's the new area report for the unoptimized debouncer:

| Metric | Used | Available | Utilization |
|---|---|---|---|
| Number of Slice Registers | 43 | 18,224 | 1% |
| Number of Slice LUTs | 50 | 9,112 | 1% |
| Number of occupied Slices | 27 | 2,278 | 1% |

Whoa, that's very different. Let's see what it looks like for the optimized debouncer:

| Metric | Used | Available | Utilization |
|---|---|---|---|
| Number of Slice Registers | 76 | 18,224 | 1% |
| Number of Slice LUTs | 81 | 9,112 | 1% |
| Number of occupied Slices | 29 | 2,278 | 1% |

Odd, this is in some metrics worse than with keep-hierarchy turned on: the LUT count went up from 79 to 81, even though the occupied-slice count dropped from 34 to 29.

My takeaway from this is that the difference between the two circuits is less than the variability I'm getting from different optimization options. This is a little bit of a let-down, since it means that it's very hard to test the area-usage of subcomponents in isolation. I guess we'll have to wait until the circuit is more complicated before doing much more optimization, or focus more on timing performance, which perhaps we can check more easily by setting the clock period lower. So, back to adding more functionality for now.
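To make the comparison concrete, here's a small Python sketch that lines up the LUT counts from the six reports in this post. The setting labels are my own shorthand (not tool option names), and I'm only looking at LUTs; the same thing could be done for registers or slices.

```python
# LUT counts copied from the reports in this post, as (unoptimized, optimized).
# The setting labels are my own shorthand, not tool option names.
slice_luts = {
    "keep-hierarchy on": (92, 84),
    "keep-hierarchy on, area effort": (85, 79),
    "keep-hierarchy off, area effort": (50, 81),
}

# Gap between the two designs under each setting
# (positive means the optimized debouncer used fewer LUTs).
gaps = {setting: unopt - opt for setting, (unopt, opt) in slice_luts.items()}

# Spread of each design across the settings (max minus min).
unopt_counts = [unopt for unopt, _ in slice_luts.values()]
opt_counts = [opt for _, opt in slice_luts.values()]
spread = {
    "unoptimized": max(unopt_counts) - min(unopt_counts),
    "optimized": max(opt_counts) - min(opt_counts),
}

for setting, gap in gaps.items():
    print(f"{setting}: unoptimized - optimized = {gap:+d} LUTs")
for design, value in spread.items():
    print(f"{design}: {value} LUT spread across settings")
```

The gap between the two designs even changes sign depending on the settings, so with this few data points I can't really say which debouncer is "smaller".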
