Monday, October 2, 2017

Designing Standard-protocol Interfaces with Chisel Bundles

Standard interfaces are all around us, and enhance interoperability between devices created by different organizations. While some standard interfaces are quite niche in nature, others, like the ubiquitous phono jack, have been used for many applications that are only slightly related.

When it comes to design and reuse of design IP, using higher-level interfaces (certainly higher-level than just a set of wires) helps to make use and reuse of the IP easier. An IP that connects with the rest of the design via interfaces is easier to understand than a block that has a wire-level interface -- even if those hundreds of wires are equivalent to several high-level interfaces. Connecting an IP with top-level interfaces to the rest of the design is much easier and less error-prone than individually connecting hundreds of signals.

SystemVerilog provides the interface construct as both a design and a verification feature. A SystemVerilog interface describes the low-level signals of which the interface is composed. The ways in which those signals can be used (e.g. initiator vs target) are captured using a modport.

  interface wb_if #(
      parameter int WB_ADDR_WIDTH = 32,
      parameter int WB_TGA_WIDTH = 1,
      parameter int WB_DATA_WIDTH = 32,
      parameter int WB_TGD_WIDTH = 1,
      parameter int WB_TGC_WIDTH = 1
      );

    reg[(WB_ADDR_WIDTH-1):0]    ADR;
    reg[(WB_TGA_WIDTH-1):0]     TGA;
    reg[2:0]                    CTI;
    reg[1:0]                    BTE;
    reg[(WB_DATA_WIDTH-1):0]    DAT_W;
    reg[(WB_TGD_WIDTH-1):0]     TGD_W;
    reg[(WB_DATA_WIDTH-1):0]    DAT_R;
    reg[(WB_TGD_WIDTH-1):0]     TGD_R;
    reg                         CYC;
    reg[(WB_TGC_WIDTH-1):0]     TGC;
    reg                         ERR;
    reg[(WB_DATA_WIDTH/8)-1:0]  SEL;
    reg                         STB;
    reg                         ACK;
    reg                         WE;

    modport master(
      output ADR,
      output TGA,
      output CTI,
      output BTE,
      output DAT_W,
      output TGD_W,
      input  DAT_R,
      input  TGD_R,
      output CYC,
      output TGC,
      input  ERR,
      output SEL,
      output STB,
      input  ACK,
      output WE);

    ...

  endinterface

An example of a Wishbone SV interface is shown above, with just the 'master' modport shown. As you can see, parameters are specified on the interface declaration, core signals are declared without direction, and directions for different uses of the signals are specified via modport declarations.

Chisel provides the Bundle construct to group signals together. While the concept and high-level use of a Chisel Bundle is quite similar to a SystemVerilog interface, there are some significant differences. This blog captures the best practices that I've discovered thus far while describing Chisel bundles for standard interfaces.

SV Interfaces vs Chisel Bundles

If you've already spent time working with SystemVerilog interfaces, understanding the differences between SV Interfaces and Chisel Bundles will help the best practices below make more sense.

While SystemVerilog provides the modport construct for describing a usage of an interface, Chisel doesn't have a similar notion. All signals in a Chisel bundle are given a direction. Bundles may be instantiated as-is, or instantiated 'Flipped' with reversed signal directions.

Chisel bundles can be hierarchical, so a bundle type can be composed of several instances of other bundle types. In contrast, a SystemVerilog interface must effectively be single-level.

Being an object-oriented language, Chisel allows methods to be defined on a bundle type that assign values to the bundle signals. This can be very useful by making it easy for the user of a bundle type to drive the bundle signals to a useful state.
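As a minimal sketch of the 'Flipped' idea, consider the hypothetical two-signal handshake below. The names SimpleReq, Initiator, Target, and Top are made up for illustration and are not part of the Wishbone code in this post; the sketch also assumes a Chisel 3 release that supports the true.B literal style.

```scala
import chisel3._

// Hypothetical handshake bundle, described from the initiator's perspective.
class SimpleReq extends Bundle {
  val valid = Output(Bool())
  val ready = Input(Bool())
}

class Initiator extends Module {
  val io = IO(new Bundle {
    val out = new SimpleReq           // directions exactly as declared
  })
  io.out.valid := true.B
}

class Target extends Module {
  val io = IO(new Bundle {
    val in = Flipped(new SimpleReq)   // valid is now an input, ready an output
  })
  io.in.ready := io.in.valid
}

class Top extends Module {
  val io = IO(new Bundle { val done = Output(Bool()) })
  val i = Module(new Initiator)
  val t = Module(new Target)
  t.io.in <> i.io.out                 // one connection instead of one per signal
  io.done := t.io.in.ready
}
```

The same bundle type serves both ends of the connection, which is roughly the role that a pair of modports plays in SystemVerilog.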

Chisel Bundle Best Practices

At the end of this blog post is a Chisel description of a Wishbone interface, which I'll refer to in the best practices description below.

Describe from the Initiator's Perspective

Since signal directions are specified on the signals of a Chisel bundle, it's helpful to be consistent in picking either the initiator or the target and describing all interfaces in those terms. I've picked the initiator as the standard perspective to use.

Note that the Wishbone signal directions are captured from the initiator's (master's) perspective. For example, ADR and CYC are outputs, while DAT_R and ACK are inputs.

Collect Related Signals in a Sub-Bundle

Users of a standard interface will often benefit from working with sub-elements of the protocol. Declaring these sub-elements as part of the interface declaration can be very helpful. Since some Chisel elements (such as the Mux) expect all elements of a bundle to have the same direction, it's important that all elements of a sub-bundle have the same direction. In the Wishbone example at the end of this post, I've created a 'ReqData' bundle to capture all signals related to the transaction request, and a 'RspData' bundle to capture all signals related to the transaction response.

Collect Protocol Parameters into a Parameters Class

Standard protocols are often parameterized. For example, the Wishbone address, data, and tag widths are variable. Collecting protocol parameters into a class, instead of passing them individually to the bundle constructor, has two key benefits:

Less typing when creating multiple instances of the interface with the same parameterization
It's easier to create the 'cloneType' method (see next tip), and this can even be placed in a base class if you prefer

Define a cloneType Method

Chisel needs to clone Bundle objects for several reasons. A parameterized standard interface bundle must provide a cloneType method to ensure that the proper parameters are used when the interface bundle is cloned. You can see the definition of the cloneType method in the Wishbone bundle at the end of this post.

Provide tieoff and tieoff_flipped Methods

It should be easy for any user of a standard interface to tie off that interface; in other words, to effectively disable it. The tieoff() method is used for initiator interfaces, and drives the response signals to inactive values. The tieoff_flipped() method is used for target interfaces, and drives the request signals (ADR, CYC, etc) to inactive values.

Note that if a clock or reset must be applied to an interface for it to function properly, the tieoff() method can accept handles to these required signals.

Provide Utility Methods

The ability to provide utility methods for driving interface signals to pre-defined states helps minimize the code an IP must write. In the case of Wishbone, setting the error-response state is done directly by the set_error() method. Any IP that needs to return an error can call this method to set the appropriate values.

I've found the best practices above to be helpful in structuring interfaces that are easily reusable. If you've been working with Chisel, what best practices have you discovered in working with Chisel bundles?

Chisel Bundle for a Wishbone Interface

  import chisel3._

  class Wishbone(val p : Wishbone.Parameters) extends Bundle {
    val req = new Wishbone.ReqData(p)
    val rsp = new Wishbone.RspData(p)

    override def cloneType() : this.type = {
      new Wishbone(p).asInstanceOf[this.type]
    }

    // Tie off an initiator interface: drive the response signals inactive
    def tieoff() : Unit = {
      rsp.tieoff()
    }

    // Tie off a target interface: drive the request signals inactive
    def tieoff_flipped() : Unit = {
      req.tieoff_flipped()
    }
  }

  object Wishbone {

    class Parameters (
      val ADDR_WIDTH : Int=32,
      val DATA_WIDTH : Int=32,
      val TGA_WIDTH : Int=1,
      val TGD_WIDTH : Int=1,
      val TGC_WIDTH : Int=1) { }

    // Response signals, from the initiator's perspective
    class RspData(val p : Wishbone.Parameters) extends Bundle {
      val DAT_R = Input(UInt(p.DATA_WIDTH.W))
      val TGD_R = Input(UInt(p.TGD_WIDTH.W))
      val ERR = Input(Bool())
      val ACK = Input(Bool())

      override def cloneType() : this.type = {
        new RspData(p).asInstanceOf[this.type]
      }

      def tieoff() : Unit = {
        DAT_R := 0.U
        TGD_R := 0.U
        ERR := false.B
        ACK := false.B
      }

      // Drive the error-response state
      def set_error() : Unit = {
        ERR := true.B
        ACK := true.B
      }
    }

    // Request signals, from the initiator's perspective
    class ReqData(val p : Wishbone.Parameters) extends Bundle {
      val ADR = Output(UInt(p.ADDR_WIDTH.W))
      val TGA = Output(UInt(p.TGA_WIDTH.W))
      val CTI = Output(UInt(3.W))
      val BTE = Output(UInt(2.W))
      val DAT_W = Output(UInt(p.DATA_WIDTH.W))
      val TGD_W = Output(UInt(p.TGD_WIDTH.W))
      val CYC = Output(Bool())
      val TGC = Output(UInt(p.TGC_WIDTH.W))
      val SEL = Output(UInt((p.DATA_WIDTH/8).W))
      val STB = Output(Bool())
      val WE = Output(Bool())

      def tieoff_flipped() : Unit = {
        ADR := 0.U
        TGA := 0.U
        CTI := 0.U
        BTE := 0.U
        DAT_W := 0.U
        TGD_W := 0.U
        CYC := false.B
        TGC := 0.U
        SEL := 0.U
        STB := false.B
        WE := false.B
      }

      override def cloneType() : this.type = {
        new ReqData(p).asInstanceOf[this.type]
      }
    }
  }

Monday, August 28, 2017

Chisel Sharpening: In the end, it's all about results

At the end of the day, it's all about results, of course. A productivity improvement is great, as long as the implementation results at least stay constant. Any decrease in implementation results definitely detracts from any productivity improvements.

So, after all the work thus far describing and verifying the wishbone interconnect, we still have to answer the question: how are the results? How does the Chisel description compare, in terms of synthesis results, to the results from the hand-coded description?

In this post, I'll be using Intel/Altera Quartus for synthesis (since I have access to an Altera prototype board and Quartus). If anyone wishes to donate a Zynq prototype board, I'd be happy to report the results from Xilinx Vivado as well.

There are two things that we care about when it comes to implementation: speed and size. What's the maximum frequency at which the implementation operates? How many logic elements are required? How many registers?

Prerequisites

Synthesis tools are very good, by design, at eliminating unused logic. Any I/Os in our design that aren't connected will be optimized out of the design. Consequently, synthesizing the interconnect will only work under two conditions: if all the interconnect I/Os are connected to FPGA device I/Os, or if we build a design around the interconnect that utilizes all the interconnect I/Os.

Given the number of I/Os the wishbone interconnect has, the second path is the most feasible.

We already have a synthesizable Wishbone target memory device from the verification environment. Now, we just need something to drive the master interfaces. Just to further explore Chisel, I created a Wishbone initiator using the LFSR Chisel module to randomize the address (to select the target device), read/write, and write data.

I connected up the LEDs to counters that toggle the LED every time the respective target device is accessed 16,384 times -- just for kicks, and because I like blinking lights...

Results

For comparison, I'll leave everything the same about the design except for the interconnect. Here are the relevant details:

Design	Registers	ALMs	Fmax (MHz)
Hand-coded	264	239	160.64
Chisel	227	275	201.37

So, Chisel uses:

15% more ALMs than the hand-coded RTL
14% fewer registers than the hand-coded RTL

So, it's a bit of a wash in terms of size.

The performance difference is fairly significant, though: the Chisel design is 25% faster than the hand-coded RTL. So, the (slightly) higher-level description certainly doesn't hurt the results! Now, I'm sure I could achieve the same results with the hand-coded description that the Chisel description achieved. I'm still a bit surprised, though, that the Chisel description achieved better results out of the box with a less-than-expert user. So, definitely a promising conclusion to my initial Chisel exploration!
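The relative differences quoted above can be recomputed directly from the table. The short plain-Scala sketch below does just that; the object and method names are illustrative, and only the six numbers come from the table.

```scala
object Results {
  // Relative change of 'to' with respect to 'from', in percent
  def pctChange(from: Double, to: Double): Double =
    (to - from) / from * 100.0

  def main(args: Array[String]): Unit = {
    // Hand-coded value first, Chisel value second (taken from the table)
    println(f"Registers: ${pctChange(264, 227)}%.1f%%")
    println(f"ALMs:      ${pctChange(239, 275)}%.1f%%")
    println(f"Fmax:      ${pctChange(160.64, 201.37)}%.1f%%")
  }
}
```

A negative value means the Chisel design uses less of that resource than the hand-coded RTL.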

Now, just for fun, here is the design running on the Cyclone V prototype board.

Stay tuned for more details on what I'm learning about Chisel constructs. And, you can find the code used in my experiments here: https://github.com/mballance/wb_sys_ip

Saturday, August 19, 2017

Chisel Sharpening: If it's not tested, it's broken

"If it's not tested, it's broken."

-- Bruce Eckel

I'm a big believer in the quote above, and cite it somewhat frequently -- perhaps to the tedium of my colleagues. In my last post, I showed a Chisel3-based description of a Wishbone interconnect. While it might have looked cool, without tests I had no idea whether it worked correctly or not. After adding a simple UVM testbench around my interconnect I can, yet again, confirm the truth of the quote above. But, enough pontificating, let's dig into the specifics.

Error Types

Especially when implementing something while learning a new language or technique, I find that I make four types of errors:

Implementation errors and oversights - These are your run-of-the-mill bugs. For example, I neglected to implement an intended feature, or I implemented the logic incorrectly. Good planning helps to minimize these errors, but they are why we place such value on good verification.
Errors in description - These are learning mistakes related to the new language or technique. I structured a description around my understanding of the language/technique, only to find that it resulted in unexpected behavior. The ease or difficulty in avoiding and/or diagnosing errors in description is a key determinant for me in deciding how easy a new technique is to adopt.
Errors in reuse - This type of error occurs when reusing existing IP, only to find that it functions differently from my understanding.
Tool or library issues - These are errors that you hope not to encounter, but do crop up from time to time.

Not surprisingly, I encountered the first three categories of errors while verifying my Wishbone interconnect. I did encounter one tool/library issue, but I'll get to that later...

UVM Testbench

I decided to verify a 2x4 configuration of the Wishbone interconnect. I created a very (very) basic UVM testbench around the interconnect, and instantiated two Wishbone master agents and four memory target devices, mapped at 0x00000000, 0x00001000, 0x00002000, and 0x00003000, respectively.

And, a very basic write/read test:

I ran this in Questa (the Altera Modelsim Starter Edition, to be precise). On the first test run, nothing worked and quite a few signals were driven to X. In order to accelerate progress, I turned on register randomization (something supported by the Verilog code generated by Chisel).

Reuse Error: Chisel Arbiter

My simple write/read test causes both masters to perform a write to the same target device as the first operation. I had assumed that the Chisel-provided arbiter would grant the output to the selected master until that master dropped its request. However, this turned out to not be the case. It took browsing the source code (for me at least) to understand the mechanism the library developer had provided for controlling the arbiter locking behavior. Once I understood that mechanism, it was very simple to customize the locking behavior.

The arbiter is such a useful and reusable construct that I'll devote a future post to it, rather than delve into the details here.

Description Error: Multiple Assignments

My original interconnect description used a nested loop structure across the masters and slaves, with a conditional assignment to propagate back the slave response.

  for (i <- 0 until p.N_SLAVES) {
    for (j <- 0 until p.N_MASTERS) {
      // ...
      
      // Propagate slave response back to active master
      when (out_arb(i).io.in(j).ready) {
          in_rsp(j) := out_rsp(i);
      } .otherwise {
          in_rsp(j).park_rsp();
      }
    }
    out_arb(i).io.out.bits.assign_req2p(io.s(i));
  }

As it turns out, this code trips over one of the corner cases of Chisel: when procedural code (e.g. for loops) makes multiple assignments to the same signal, the last assignment is taken. In this case, that means that both masters were being fed the response from the last slave device.
Investigating description errors like these unfortunately involves digging into the generated Verilog code. This is both tedious and not quite as bad as it sounds. Chisel picks sensible names for module I/O signals, so these are easy to track. However, Chisel also generates lots of anonymously-named internal signals (e.g. _T_51) that are used to implement the logic within the module.
I don't have a concrete proposal for the Chisel authors on how to improve this situation, but I would like to think a graphical view, such as a schematic, might be helpful in relating the input Scala code to the resulting Verilog.
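To make the last-assignment behavior concrete, here is a reduced sketch of the same pattern. It is not the interconnect code: the module, its ports (sel, s_rsp, buggy, fixed), and the 2x2 sizing are hypothetical and chosen only to keep the example small.

```scala
import chisel3._

// Hypothetical 2-master, 2-slave response routing.
class LastConnect extends Module {
  val io = IO(new Bundle {
    val sel   = Input(Vec(2, UInt(1.W)))   // slave selected by each master
    val s_rsp = Input(Vec(2, UInt(8.W)))   // response from each slave
    val buggy = Output(Vec(2, UInt(8.W)))
    val fixed = Output(Vec(2, UInt(8.W)))
  })

  for (j <- 0 until 2) {
    // Problem: each pass assigns in both the 'when' and the 'otherwise'
    // branch, so the final pass (i = 1) completely overrides the
    // assignments made for i = 0.
    for (i <- 0 until 2) {
      when (io.sel(j) === i.U) {
        io.buggy(j) := io.s_rsp(i)
      } .otherwise {
        io.buggy(j) := 0.U
      }
    }

    // One possible fix: assign a default once, then only override it
    // conditionally inside the loop.
    io.fixed(j) := 0.U
    for (i <- 0 until 2) {
      when (io.sel(j) === i.U) {
        io.fixed(j) := io.s_rsp(i)
      }
    }
  }
}
```

In the first loop, a master selecting slave 0 never sees that slave's response, because the 'otherwise' branch of the last pass wins. The second form relies on the same last-assignment rule, but uses it deliberately.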

Reuse Error: Mux Arguments

After gaining a better understanding of Chisel's behavior with respect to multiple assignments, I decided that using Chisel's 1-hot Mux primitive would be the best way to handle the response data. Here I bumped into a limitation of the Mux that the Arbiter primitive allowed me to ignore: multiplexing bundles with signals of different I/O directions is not supported (and, sadly, this is only uncovered very late in the transformation process from Chisel to Verilog). It all makes a lot of sense once you think it through.

Understanding this limitation drove me to redefine the I/O bundles I used to describe the Wishbone interface. It was a fairly straightforward process, and one that I'll describe in more depth in a future post on structuring I/Os for standard interfaces with Chisel.
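As a rough illustration of the shape that works with the 1-hot mux, the sketch below multiplexes a bundle whose fields carry no individual directions; the direction is applied to the bundle as a whole. The Rsp and RspMux names and the field widths are assumptions for this example, not the interconnect's actual types.

```scala
import chisel3._
import chisel3.util.Mux1H

// Hypothetical response bundle: no per-field directions
class Rsp extends Bundle {
  val data = UInt(8.W)
  val ack  = Bool()
}

class RspMux extends Module {
  val io = IO(new Bundle {
    val sel = Input(Vec(4, Bool()))   // expected to be one-hot
    val in  = Input(Vec(4, new Rsp))
    val out = Output(new Rsp)
  })

  // Select the response whose 'sel' bit is set
  io.out := Mux1H(io.sel, io.in)
}
```

Mux1H assumes that at most one select bit is set; its result is not meaningful if several bits are set at once.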

Understanding Data Manipulation Techniques

Chisel encourages descriptions that involve collections of data. In some ways, this isn't so different from other hardware-description languages. What's different is the set of operators Chisel provides for manipulating these data collections. One early example I ran across was implementing address decode for the masters. I had arrays of target base/limit addresses, and wanted to determine which target device each master was selecting. This was very easily (and compactly) described with the following code:

  val slave_req = io.addr_base.zip(io.addr_limit).map(e => (
    io.m(i).ADR >= e._1 && io.m(i).ADR <= e._2) &&
    io.m(i).CYC && io.m(i).STB)

This code describes the following:

Combine the addr_base and addr_limit arrays into an array of (addr_base,addr_limit) tuples using the 'zip' operation
Convert this array of tuples into an array of Bool where the entry is 'true' if the target is selected
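The two steps above are ordinary Scala collection operations, so they can be tried out without any hardware types. The sketch below models the decode with plain Long and Boolean values. The base addresses match the testbench described earlier, while the limit values and the AddrDecode name are assumptions made for illustration.

```scala
// Plain-Scala model of the address decode (no Chisel types involved)
object AddrDecode {
  val addrBase  = Seq(0x0000L, 0x1000L, 0x2000L, 0x3000L)
  val addrLimit = Seq(0x0fffL, 0x1fffL, 0x2fffL, 0x3fffL)  // assumed limits

  // One Boolean per target: true if this request selects that target
  def decode(adr: Long, cyc: Boolean, stb: Boolean): Seq[Boolean] =
    addrBase.zip(addrLimit).map { case (base, limit) =>
      (adr >= base && adr <= limit) && cyc && stb
    }
}
```

For example, AddrDecode.decode(0x1004L, true, true) marks only the second target as selected, and any request with cyc or stb low selects no target.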

These techniques also apply nicely to selecting fields from composite data structures, as shown below. In the case below, we want to determine whether a given target device is actively selected by any master.

  when (out_arb(j).io.in.map((f) => f.valid).reduceLeft(_|_)) {
    out_arb(j).io.out.bits.assign_b2(io.s(j));
  } .otherwise {
    // If no master is requesting, deactivate the slave requests
    io.s(j).park_req()
  }

In this case, the code does the following:

out_arb(j).io.in is an array of composite data going into the per-target arbiters. The map() operation selects just the 'valid' field from each array element
Then, the reduceLeft() operation performs a reduction across the array
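The same map-then-reduce pattern can be modeled in plain Scala. In the sketch below, Port is a hypothetical stand-in for one arbiter input, with Boolean in place of a hardware Bool.

```scala
object AnyValid {
  // Hypothetical stand-in for one arbiter input port
  final case class Port(valid: Boolean, bits: Int)

  // True if at least one port is requesting
  def anyValid(ports: Seq[Port]): Boolean =
    ports.map(f => f.valid)   // keep only the 'valid' field of each port
         .reduceLeft(_ | _)   // OR the fields together, left to right
}
```

Note that reduceLeft throws on an empty collection, so this form assumes at least one port, which always holds for an arbiter with one or more inputs.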

Now, both of these operations can be described in other hardware description languages. But both likely would involve several layers of temporary data fields. It's actually really nice to be able to describe the high-level view of the manipulation to be performed, and be confident that a sensible implementation of this description will be inferred (and, after having to dig into the implementation for other reasons, I can state that the implementation is sensible).

Tool Issue: Register without Reset

I mentioned earlier that I had turned on register initial-value randomization when I first started simulations. After getting my test running correctly, I had hoped this would not be needed. However, it turns out that Chisel's Arbiter primitive contains a register without a reset value. Perhaps this hasn't created an issue for many Chisel users because the Verilator 2-state simulator is often used. However, with a 4-state simulator like Questa/Modelsim, an uninitialized register is a fatal issue that results in X propagation and a non-functioning design.

I strongly recommend using registers that are reset, and will provide this feedback to the Chisel team.
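For reference, the difference between a register with and without a reset value looks roughly like this in Chisel 3. The module below is a made-up example, not the Arbiter's internals.

```scala
import chisel3._

class ResetExample extends Module {
  val io = IO(new Bundle {
    val toggle    = Input(Bool())
    val noReset   = Output(Bool())
    val withReset = Output(Bool())
  })

  // No reset value: a 4-state simulator starts this register at X
  val r0 = Reg(Bool())
  // Reset value of false: known state once reset has been applied
  val r1 = RegInit(false.B)

  when (io.toggle) {
    r0 := !r0   // inverting X yields X, so the unknown value persists
    r1 := !r1
  }

  io.noReset   := r0
  io.withReset := r1
}
```

Because r0 is only ever updated from its own value, a 4-state simulator never resolves it unless register randomization is enabled.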

Conclusions

I'd class many of the issues I faced as part of the learning curve for a new tool or technique. Challenging (and sometimes time-consuming) to surmount, perhaps, but issues I'd likely not face in the future. I've also gained a new-found appreciation for the descriptive power that Chisel's support for Scala's collection-manipulation operators brings to the description of hardware.

For now, I have a working Wishbone interconnect described with Chisel. And, despite a few hiccups along the way, I'm still feeling pretty good about the expressive power that Chisel brings to hardware description.

Next, I'm curious to see how synthesis results compare for a hand-coded Wishbone interconnect and the Chisel-generated one.

As always, you can find the source for my experiments here:

https://github.com/mballance/wb_sys_ip.git