Wednesday, August 2, 2017

Chisel Sharpening: Initial impressions

Recently, RiscV (https://riscv.org/) has been all the rage. The free (as in speech and beer) instruction-set architecture has experienced an explosion of interest in the last couple of years, after being more-or-less an academic curiosity since 2010 or so.

This post isn't about RiscV, though. It's about Chisel (https://github.com/freechipsproject/chisel3/wiki), the design language that the UC Berkeley team behind RiscV uses to implement Rocket Chip, their proof-of-concept RiscV implementation. The claimed productivity benefits of using Chisel for hardware design are substantial. For example, one data point compares two implementations of a RiscV architecture, one written in Chisel and one in Verilog (https://riscv.org/wp-content/uploads/2015/01/riscv-chisel-tutorial-bootcamp-jan2015.pdf).


3x fewer lines of code, even (but maybe especially) if the savings were in wiring, sounds pretty good to me! So, a while ago I decided that I should dig in and learn more about Chisel.

So, what's Chisel?

Chisel certainly isn't a one-for-one replacement for a hardware description language (HDL) like Verilog or VHDL. Chisel is a class library written in Scala (https://www.scala-lang.org/). Descriptions written using that class library are compiled, executed, and converted to Verilog. Now, your first thought when a high-level language like Scala is mentioned might be "High-Level Synthesis", but that's not what Chisel is about. Chisel is very much focused on RTL (register-transfer level) design.
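To make the "executed, then converted to Verilog" idea concrete, here's a toy sketch in plain Scala. It is not Chisel and uses no Chisel API; ToyGenerator and register are made-up names. It only shows that ordinary Scala code, when run, can emit an RTL module whose widths come from parameters. Chisel does this far more robustly, by building an in-memory circuit from Bundle and Module objects and emitting Verilog through its intermediate FIRRTL representation.

```scala
// Toy generator (NOT Chisel): ordinary Scala that emits Verilog text.
object ToyGenerator {
  // Emit a Verilog module for a register whose width is a Scala parameter
  def register(name: String, width: Int): String = {
    require(width > 0, "width must be positive")
    val msb = width - 1
    s"""module $name(
       |  input clock,
       |  input [$msb:0] d,
       |  output reg [$msb:0] q
       |);
       |  always @(posedge clock) q <= d;
       |endmodule
       |""".stripMargin
  }

  def main(args: Array[String]): Unit = {
    // Running the program "elaborates" one module per requested width
    for (w <- Seq(8, 32)) print(register(s"reg_$w", w))
  }
}
```

The point is the shape of the flow, not the output: the Scala program is the generator, and the RTL is what it produces when executed, which is RTL design rather than high-level synthesis.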

A learning project

I learn almost everything by doing, so I decided to assign myself a learning project to see if I could make Chisel go. A few years back, I coded a parameterized Wishbone interconnect in SystemVerilog. I was curious to see how a similar project coded in Chisel would compare.

All about the interfaces

The first thing to do is to capture a Wishbone interface. As with many hardware interfaces, Wishbone is parameterized with address, data, and tag widths. Quite sensibly, Chisel provides a class called Bundle for specifying reusable collections of signals. Below is the code that declares the parameters for a Wishbone interface, along with the Bundle (WishboneMaster) that specifies the signals for a Master interface.

import chisel3._
import chisel3.util._

// Widths of the parameterized Wishbone signals
class WishboneParameters(
    val ADDR_WIDTH : Int = 32,
    val DATA_WIDTH : Int = 32,
    val TGA_WIDTH  : Int = 1,
    val TGD_WIDTH  : Int = 1,
    val TGC_WIDTH  : Int = 1)

// Signals of a Wishbone Master interface, from the master's point of view
class WishboneMaster(val p : WishboneParameters) extends Bundle {
  val ADR   = Output(UInt(p.ADDR_WIDTH.W))
  val TGA   = Output(UInt(p.TGA_WIDTH.W))
  val CTI   = Output(UInt(3.W))
  val BTE   = Output(UInt(2.W))
  val DAT_W = Output(UInt(p.DATA_WIDTH.W))
  val TGD_W = Output(UInt(p.TGD_WIDTH.W))
  val DAT_R = Input(UInt(p.DATA_WIDTH.W))
  val TGD_R = Input(UInt(p.TGD_WIDTH.W))
  val CYC   = Output(Bool())
  val TGC   = Output(UInt(p.TGC_WIDTH.W))
  val ERR   = Input(Bool())
  val SEL   = Output(UInt((p.DATA_WIDTH / 8).W))  // one select bit per byte lane
  val STB   = Output(Bool())
  val ACK   = Input(Bool())
  val WE    = Output(Bool())

    // ...
}
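Because the widths in the Bundle are ordinary Scala Int expressions, the parameter arithmetic can be checked without Chisel at all. The sketch below is a plain-Scala mirror of the WishboneMaster signal list; WbParams, Sig and masterSignals are hypothetical names made up for illustration, not part of Chisel.

```scala
// Plain-Scala model of the WishboneMaster signal widths (no Chisel required).
final case class WbParams(
    addrWidth: Int = 32,
    dataWidth: Int = 32,
    tgaWidth: Int = 1,
    tgdWidth: Int = 1,
    tgcWidth: Int = 1) {
  require(dataWidth % 8 == 0, "data width must be a whole number of bytes")
}

// One signal: name, width in bits, and whether the master drives it
final case class Sig(name: String, width: Int, fromMaster: Boolean)

object WbWidths {
  def masterSignals(p: WbParams): Seq[Sig] = Seq(
    Sig("ADR", p.addrWidth, true),
    Sig("TGA", p.tgaWidth, true),
    Sig("CTI", 3, true),
    Sig("BTE", 2, true),
    Sig("DAT_W", p.dataWidth, true),
    Sig("TGD_W", p.tgdWidth, true),
    Sig("DAT_R", p.dataWidth, false),
    Sig("TGD_R", p.tgdWidth, false),
    Sig("CYC", 1, true),
    Sig("TGC", p.tgcWidth, true),
    Sig("ERR", 1, false),
    Sig("SEL", p.dataWidth / 8, true), // one byte-select bit per byte lane
    Sig("STB", 1, true),
    Sig("ACK", 1, false),
    Sig("WE", 1, true))
}
```

With the default 32-bit parameters, SEL comes out 4 bits wide, and the master drives 79 bits in total while receiving 35.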

Coding up the interconnect

As with many learning projects, the Wishbone interconnect took a series of iterations. Eventually, I arrived at the code below, broken up a bit so I can comment on each part:

class WishboneInterconnectParameters(
    val N_MASTERS  : Int=1,
    val N_SLAVES   : Int=1,
    val wb_p       : WishboneParameters) {
}

class WishboneInterconnect(
    val p : WishboneInterconnectParameters,
    val typename : String = "WishboneInterconnect") extends Module {

  val io = IO(new Bundle {
    // Per-slave address-decode ranges
    val addr_base  = Input(Vec(p.N_SLAVES, UInt(p.wb_p.ADDR_WIDTH.W)))
    val addr_limit = Input(Vec(p.N_SLAVES, UInt(p.wb_p.ADDR_WIDTH.W)))

    // Masters attach to flipped master ports; slaves attach to master ports
    val m = Vec(p.N_MASTERS, Flipped(new WishboneMaster(p.wb_p)))
    val s = Vec(p.N_SLAVES, new WishboneMaster(p.wb_p))
  })

  // Name the generated Verilog module after its configuration
  override def desiredName: String = typename

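Here's the interface declaration. It contains vectors of per-slave address base and limit values for address decode, followed by vectors of master and slave interfaces. Note the 'Flipped' method, which reverses the Input/Output direction of every element within a bundle: the interconnect receives the signals a master drives, so its master-side ports are flipped copies of WishboneMaster.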
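Here's a minimal plain-Scala model of what Flipped does to directions. Dir, Port, FlipModel and flipped are hypothetical names for illustration, not Chisel types; the real Flipped operates on Chisel Data objects, but the effect on each element's direction is the same idea.

```scala
// Model of port directions and of flipping them (NOT the Chisel API).
sealed trait Dir
case object In extends Dir
case object Out extends Dir

final case class Port(name: String, dir: Dir)

object FlipModel {
  // Reverse the direction of every element, as Flipped(...) does for a Bundle
  def flipped(ports: Seq[Port]): Seq[Port] =
    ports.map(p => p.copy(dir = if (p.dir == In) Out else In))

  // A few WishboneMaster signals, as seen from the master
  val master: Seq[Port] = Seq(
    Port("ADR", Out), Port("STB", Out), Port("CYC", Out),
    Port("ACK", In), Port("DAT_R", In))
}
```

On the interconnect's side, ADR becomes an input and ACK becomes an output, which is exactly what a port facing a master needs. Flipping twice gives back the original directions.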

Conceptually, what we're building is this: each slave interface has an associated arbiter that is connected to all masters. Fortunately, Chisel's utility library (chisel3.util) provides arbiters as built-in elements, including the round-robin RRArbiter used here.


  val in_rsp = Seq.fill(p.N_MASTERS)(Wire(new WishboneMaster(p.wb_p)))
  val out_rsp = Seq.fill(p.N_SLAVES)(Wire(new WishboneMaster(p.wb_p)))

  for (i <- 0 until p.N_MASTERS) {
    // Drive the response back to master i
    in_rsp(i).assign_rsp2p(io.m(i))
  }

  // One round-robin arbiter per slave, with one input per master
  val out_arb = Seq.fill(p.N_SLAVES)(Module(new RRArbiter(
      new WishboneMaster(p.wb_p), p.N_MASTERS)))

This code creates a couple of temp arrays for routing the response back from the slave to the master, as well as an array of per-slave interface arbiters.
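For intuition about what each per-slave arbiter decides, here's a generic round-robin policy modeled in plain Scala. It is a sketch of the policy only, under the assumption that "round-robin" means "search starting just after the last winner"; it is not a transcription of Chisel's RRArbiter logic, and RoundRobin and grant are made-up names.

```scala
// Generic round-robin arbitration, modeled in plain Scala (not RRArbiter itself).
object RoundRobin {
  /** Grant the first requester after `last`, wrapping around.
    * @param requests one Boolean per master (true = requesting)
    * @param last     index of the previously granted master
    * @return Some(index) of the granted master, or None if nobody requests
    */
  def grant(requests: IndexedSeq[Boolean], last: Int): Option[Int] = {
    val n = requests.length
    (1 to n).iterator
      .map(offset => (last + offset) % n) // candidates in priority order
      .find(i => requests(i))
  }
}
```

Starting the search just after the previous winner is what keeps a continuously requesting master from starving the others: with two masters both requesting, grants alternate 0, 1, 0, 1.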

  // For each slave, hook up all masters
  for (i <- 0 until p.N_SLAVES) {
    for (j <- 0 until p.N_MASTERS) {
      // m_sel: index of the first slave whose base address is <= this master's ADR
      val m_sel = io.addr_base.indexWhere((base: UInt) => io.m(j).ADR >= base)
      // m_ex: ADR is >= some base address and <= some limit address
      val m_ex = io.addr_base.exists((base: UInt) => io.m(j).ADR >= base) &&
          io.addr_limit.exists((limit: UInt) => io.m(j).ADR <= limit)
      out_arb(i).io.in(j).bits.assign_p2req(io.m(j))

      // Request slave i only when the decoded target is slave i
      out_arb(i).io.in(j).valid := m_ex && m_sel === i.U

      // Propagate slave response back to active master
      when (out_arb(i).io.in(j).ready) {
        in_rsp(j) := out_rsp(i)
      } .otherwise {
        in_rsp(j).park_rsp()
      }
    }
    out_arb(i).io.out.bits.assign_req2p(io.s(i))
  }
}

Finally, we do the address decode to determine which slave a master's request address selects, and connect everything up to the arbiters. Note how simple it is to query the base/limit address arrays! The 'm_ex' signal is true if the master is selecting a valid slave, while 'm_sel' holds the target slave index.
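As a cross-check on the decode semantics, here's a plain-Scala model that assumes each slave owns an inclusive range [base, limit] and checks both bounds for the same slave index, so an address that falls in a gap between slaves matches nothing. AddrDecode and select are hypothetical names. In this model, isDefined plays the role of 'm_ex' and the returned index plays 'm_sel'. As far as I can tell, the Chisel code above tests the base and limit vectors independently, so it agrees with this model only for some address maps (for example, bases listed in descending order with no gaps between slaves).

```scala
// Plain-Scala model of base/limit address decode (not Chisel hardware).
object AddrDecode {
  // Slave k owns the inclusive range [bases(k), limits(k)]
  def select(addr: Long, bases: IndexedSeq[Long], limits: IndexedSeq[Long]): Option[Int] = {
    require(bases.length == limits.length, "one base and one limit per slave")
    // Check both bounds for the SAME slave index
    bases.indices.find(k => addr >= bases(k) && addr <= limits(k))
  }
}
```

With slaves at 0x0000-0x0FFF, 0x1000-0x1FFF, 0x2000-0x27FF and 0x3000-0x3FFF, address 0x2500 selects slave 2, while 0x2900 (in the gap) selects nothing.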

Generating RTL

One of the things I spent far too much time on was finding out how to generate Verilog from my Chisel description. Turns out the incantation is quite simple once you know what it is:

object WishboneInterconnectDriver extends App {
  val N_MASTERS  = 2
  val N_SLAVES   = 4
  val ADDR_WIDTH = 32
  val DATA_WIDTH = 32

  // Module name encodes the configuration, e.g. "wishbone_ic_32_32_2x4"
  val typename = "wishbone_ic_%d_%d_%dx%d".format(
      ADDR_WIDTH, DATA_WIDTH, N_MASTERS, N_SLAVES)

  // Elaborate the design and emit Verilog
  chisel3.Driver.execute(args, () => new WishboneInterconnect(
      new WishboneInterconnectParameters(N_MASTERS, N_SLAVES,
          wb_p = new WishboneParameters(ADDR_WIDTH, DATA_WIDTH)),
      typename))
}

The code above calls the Chisel 'Driver', passing in an instance of the WishboneInterconnect class. The result of running this code is a set of files, one of which is the Verilog RTL. The output RTL is somewhat low-level, and 1656 lines long (!). When it comes to debugging, I'll be interested to see how much this gets in the way. But it's all sensible RTL at the end of the day...
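Since desiredName returns typename, each configuration yields a distinctly named Verilog module, which makes it possible to instantiate several differently sized interconnects in one design. The naming scheme is just the driver's format string; here it is pulled out into a hypothetical helper (ModuleNames.icName is a made-up name) so it can be checked on its own.

```scala
// Derive the interconnect's module name from its configuration,
// using the same format string as the driver.
object ModuleNames {
  def icName(addrWidth: Int, dataWidth: Int, nMasters: Int, nSlaves: Int): String =
    "wishbone_ic_%d_%d_%dx%d".format(addrWidth, dataWidth, nMasters, nSlaves)
}
```

For the configuration in the driver (32-bit address and data, 2 masters, 4 slaves) this gives "wishbone_ic_32_32_2x4".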

Results

Okay, so the hand-coded SystemVerilog interconnect took a total of 326 lines of SystemVerilog code. But a little over 100 of those were the per-slave arbiter. If we ignore those lines, we have 206 lines of SystemVerilog. The Chisel description is 56 lines of code. So that's roughly 3.7x to 5.8x less code, depending on whether you ignore or count the arbiter implementation. Not bad, and I'm definitely feeling more comfortable with Chisel after working through an example like this.
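The ratios behind that range, computed directly from the line counts above:

```latex
\frac{206}{56} \approx 3.7 \quad \text{(arbiter excluded)}, \qquad
\frac{326}{56} \approx 5.8 \quad \text{(arbiter included)}
```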

If you're interested, you can find the complete code on GitHub:

This repository contains both the hand-coded SystemVerilog and the Chisel representation. 

So, what did we learn?

Well, initial experiments certainly seem to bear out the productivity benefits of Chisel. Library elements, such as the arbiter module, are a great productivity boost! Array operations raise the abstraction level.

Figuring out the basics can be a bit challenging. Even figuring out how to run the conversion to Verilog took some digging. And because Chisel is embedded in another language, semantic errors tend to show up as Java exceptions rather than nice high-level error messages.

So, thus far, some good and some bad. Over the next couple of posts, I plan to dig into a few other areas of comparison, including verification of the RTL and how efficiently hand-coded and Chisel-generated implementations synthesize. So, stay tuned for more.

Have you experimented at all with Chisel? Or with other HDL alternatives, for that matter? What has your experience been?

