Saturday, August 19, 2017

Chisel Sharpening: If it's not tested, it's broken

"If it's not tested, it's broken."

-- Bruce Eckel

I'm a big believer in the quote above, and cite it somewhat frequently -- perhaps to the tedium of my colleagues. In my last post, I showed a Chisel3-based description of a Wishbone interconnect. While it might have looked cool, without tests I had no idea whether it worked correctly or not. After adding a simple UVM testbench around my interconnect I can, yet again, confirm the truth of the quote above. But, enough pontificating, let's dig into the specifics.

Error Types

Especially when implementing something while learning a new language or technique, I find that I make four types of errors:
  • Implementation errors and oversights - There are your run of the mill bugs. For example, I neglected to implement an intended feature, or I implemented the logic incorrectly. Good planning helps to minimize these errors, but they are why we place such value on good verification.
  •  Errors in description - These are learning mistakes related to the new language or technique. I structured a description around my understanding of the language/technique, only to find that it resulted in unexpected behavior. The ease or difficulty in avoiding and/or diagnosing errors in description is a key determinant for me in deciding how easy a new technique is to adopt. 
  • Errors in reuse - This type of error occurs when reusing existing IP, only to find that it functions differently from my understanding. 
  • Tool or library issues - These are errors that you hope not to encounter, but do crop up from time to time. 
Not surprisingly, I encountered the first three categories of errors while verifying my Wishbone interconnect.  I did encounter one tool/library issue, but I'll get to that later...

UVM Testbench

I decided to verify a 2x4 configuration of the Wishbone interconnect, and created a very (very) basic UVM testbench around the interconnect and instantiated two Wishbone master agents and four memory target devices - mapped at 0x00000000, 0x00001000, 0x00002000, and 0x00003000, respectively.


And, a very basic write/read test:
























I ran this in Questa (the Altera Modelsim Starter Edition, to be precise). On the first test run, nothing worked and quite a few signals were driven to X. In order to accelerate progress, I turned on register randomization (something supported by the Verilog code generated by Chisel).

Reuse Error: Chisel Arbiter

My simple write/read test causes both masters to perform a write to the same target device as the first operation. I had assumed that the Chisel-provided arbiter would grant the output to the selected master until that master dropped its request. However, this turned out to not be the case. It took browsing the source code (for me at least) to understand the mechanism the library developer had provided for controlling the arbiter locking behavior. Once I understood that mechanism, it was very simple to customize the locking behavior.
The arbiter is such a useful and reusable construct that I'll devote a future post to it, rather than delve into the details here.

Description Error: Multiple Assignments

My original interconnect description used a nested loop structure across the masters and slaves, with a conditional assignment to propagate back the slave response. 

  for (i <- 0 until p.N_SLAVES) {
    for (j <- 0 until p.N_MASTERS) {
      // ...
      
      // Propagate slave response back to active master
      when (out_arb(i).io.in(j).ready) {
          in_rsp(j) := out_rsp(i);
      } .otherwise {
          in_rsp(j).park_rsp();
      }
    }
    out_arb(i).io.out.bits.assign_req2p(io.s(i));
  }
  
As it turns out, this code trips over one of the corner cases of Chisel: when procedural code (eg for loops) make multiple assignments, the last assignment is taken. In this case, that means that both masters were being fed the response from the last slave device.
Investigating description errors like these unfortunately involve digging into the generated Verilog code. This is both tedious and not quite as bad as it sounds. Chisel picks sensible names for module I/O signals, so these are easy to track. However, Chisel also generates lots of anonymously-named internal signals (eg _T_51) that are used to implement the logic within the module.
I don't have a concrete proposal for the Chisel authors on how to improve this situation, but I would like to think a graphical view, such as a schematic, might be helpful in relating the input Scala code to the resulting Verilog.

Reuse Error: Mux Arguments

After better-understanding Chisel's behavior with respect to multiple assignments, I decided that using Chisel's 1-hot Mux primitive would be the best way to handle the response data. Here I bumped into a limitation of the Mux that the Arbiter primitive allowed me to ignore: multiplexing bundles with signals of different I/O directions is not supported (and, sadly, only uncovered very late in the transformation process from Chisel to Verilog). It all makes a lot of sense once you think it through.
Understanding this limitation drove me to redefine the I/O bundles I used to describe the Wishbone interface. It was a fairly straight-forward process, and one that I'll describe in more depth in a future post on structuring I/Os for standard interfaces with Chisel.

Understanding Data Manipulation Techniques

 Chisel encourages descriptions that involve collections of data. In some ways, this isn't so different from other hardware-description languages. What's different is the set of operators Chisel provides for manipulating these data collections. One early example I ran across was implementing address decode for the masters. I had arrays of target base/limit addresses, and wanted to determine which target device each master was selecting. This was very easy (and compactly) described with the following code:

    val slave_req = io.addr_base.zip(io.addr_limit).map(e => (
        io.m(i).ADR >= e._1 && io.m(i).ADR <= e._2) &&
        io.m(i).CYC && io.m(i).STB)

This code describes the following:
  • Combine the addr_base and addr_limit arrays into an array of (addr_base,addr_limit) tuples using the 'zip' operation
  • Convert this array of tuples into an array of Bool where the entry is 'true' if the target is selected
These techniques also apply nicely to selecting fields from composite data structures, as shown below. In the case below, we want to determine whether a given target device is actively selected by any master.

      when (out_arb(j).io.in.map((f) => f.valid).reduceLeft(_|_)) {
        out_arb(j).io.out.bits.assign_b2(io.s(j));
      } .otherwise {
        // If no master is requesting, deactivate the slave requests
        io.s(j).park_req()
      }

In this case, the code does the following:
  • out_arb(j).io.in is an array of composite data going into the per-target arbiters. The map() operation selects just the 'valid' field from each array element
  • Then, the reduceLeft() operation performs a reduction across the array
Now, both of these operations can be described in other hardware description languages. But both likely would involve several layers of temporary data fields. It's actually really nice to be able to describe the high-level view of the manipulation to be performed, and be confident that a sensible implementation of this implementation will be inferred (and, after having to dig into the implementation for other reasons, I can state that the implementation is sensible).

Tool Issue: Register without Reset

I mentioned earlier that I had turned on register initial-value randomization when I first started simulations. After getting my test running correctly, I had hoped this would not be needed. However, it turns out that Chisel's Arbiter primitive contains a register without a reset value. Perhaps this hasn't created an issue for many Chisel users because the Verilator 2-state simulator is often used. However, with a 4-state simulator like Questa/Modelsim, an uninitialized register is a fatal issue that results in X propagation and a non-functioning design.

I strongly recommend using registers that are reset, and will provide this feedback to the Chisel team. 

Conclusions

I'd class many of the issues I faced as all part of the learning curve for a new tool or technique. Challenging (and sometimes time-consuming) to surmount, perhaps, but issues I'd likely not face in the future. I've also gained an new-found appreciation for the descriptive power that Chisel's support for Scala's collection-manipulation operators bring to the description of hardware. 

For now, I have a working Wishbone interconnect described with Chisel. And, despite a few hiccups along the way, I'm still feeling pretty good about the expressive power that Chisel brings to hardware description.


Next, I'm curious to see how synthesis results compare for a hand-coded Wishbone interconnect and the Chisel-generated one. 
As always, you can find the source for my experiments here:






No comments:

Post a Comment