Monday, August 28, 2017

Chisel Sharpening: In the end, it's all about results

At the end of the day, it's all about results, of course. A productivity improvement is great, as long as the implementation results at least hold steady. Any decrease in implementation results detracts from the productivity gains.

So, after all the work thus far describing and verifying the Wishbone interconnect, we still have to answer the question: how are the results? How does the Chisel description compare, in terms of synthesis results, to the hand-coded description?

In this post, I'll be using Intel/Altera Quartus for synthesis (since I have access to an Altera prototype board and Quartus). If anyone wishes to donate a Zynq prototype board, I'd be happy to report the results from Xilinx Vivado as well.

There are two things that we care about when it comes to implementation: speed and size. What's the maximum frequency at which the implementation operates? How many logic elements are required? How many registers?

Prerequisites

Synthesis tools are very good, by design, at eliminating unused logic. Any I/Os in our design that aren't connected will be optimized out. Consequently, synthesizing the interconnect will only work under one of two conditions: either all the interconnect I/Os are connected to FPGA device I/Os, or we build a design around the interconnect that utilizes all of the interconnect I/Os.
Given the number of I/Os the Wishbone interconnect has, the second path is the more feasible.

We already have a synthesizable Wishbone target memory device from the verification environment. Now, we just need something to drive the master interfaces. Just to further explore Chisel, I created a Wishbone initiator using the LFSR Chisel module to randomize the address (to select the target device), read/write, and write data. 
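Here's roughly what that initiator looks like. This is a simplified sketch rather than the exact code in the repository -- the module name and the tie-offs below are mine, and I've left the LED counters out -- but it shows the idea: use chisel3.util's LFSR16 to pick a new pseudo-random address, write-enable, and write data each time the current request is acknowledged.

import chisel3._
import chisel3.util._

// Simplified, illustrative sketch of an LFSR-driven Wishbone initiator
class WishboneRandomInitiator(val p : WishboneParameters) extends Module {
  val io = IO(new Bundle {
    val m = new WishboneMaster(p)
  })

  val rnd   = LFSR16()            // 16-bit pseudo-random value, advances every cycle
  val rnd_d = RegNext(rnd, 1.U)   // delayed copy, used for write data

  // Hold the current request steady until it is acknowledged, then pick new values
  val adr_r = RegInit(0.U(p.ADDR_WIDTH.W))
  val we_r  = RegInit(false.B)
  val dat_r = RegInit(0.U(p.DATA_WIDTH.W))

  when (io.m.ACK) {
    adr_r := rnd ## 0.U(2.W)      // word-aligned pseudo-random address
    we_r  := rnd(15)              // pseudo-random read/write select
    dat_r := rnd_d                // pseudo-random write data
  }

  io.m.ADR   := adr_r
  io.m.DAT_W := dat_r
  io.m.WE    := we_r
  io.m.SEL   := Fill(p.DATA_WIDTH/8, 1.U(1.W))
  io.m.CYC   := true.B
  io.m.STB   := true.B

  // Tie off the tag and burst-related signals for this simple initiator
  io.m.TGA   := 0.U
  io.m.TGD_W := 0.U
  io.m.TGC   := 0.U
  io.m.CTI   := 0.U
  io.m.BTE   := 0.U
}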



I connected the LEDs to counters that toggle each LED every time the respective target device has been accessed 16,384 times -- just for kicks, and because I like blinking lights...
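The blink logic itself is just a wrap-around counter per target. A rough sketch, where 'target_access' (one pulse per completed access to the target) and 'io.led' are placeholder names of my own:

  // Toggle the LED every 16,384 accesses to this target (illustrative sketch;
  // Counter comes from chisel3.util)
  val (access_cnt, wrapped) = Counter(target_access, 16384)
  val led_r = RegInit(false.B)
  when (wrapped) { led_r := !led_r }
  io.led := led_r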

Results

For comparison, I'll keep everything about the design the same except for the interconnect. Here are the relevant details:

Design       Registers   ALMs   Fmax (MHz)
Hand-coded   264         239    160.64
Chisel       227         275    201.37
So, Chisel uses:
  • 14% more ALMs than the hand-coded RTL
  • 14% fewer registers than the hand-coded RTL
So, it's a bit of a wash in terms of size.

The performance difference is fairly significant, though: the Chisel design is 25% faster than the hand-coded RTL. So, the (slightly) higher-level description certainly doesn't hurt the results! Now, I'm sure I could tune the hand-coded description to match the Chisel results. I'm still a bit surprised, though, that the Chisel description achieved better results out of the box with a less-than-expert user. So, definitely a promising conclusion to my initial Chisel exploration!


Now, just for fun, here is the design running on the Cyclone V prototype board.





Stay tuned for more details on what I'm learning about Chisel constructs. And, you can find the code used in my experiments here: https://github.com/mballance/wb_sys_ip




Saturday, August 19, 2017

Chisel Sharpening: If it's not tested, it's broken

"If it's not tested, it's broken."

-- Bruce Eckel

I'm a big believer in the quote above, and cite it somewhat frequently -- perhaps to the tedium of my colleagues. In my last post, I showed a Chisel3-based description of a Wishbone interconnect. While it might have looked cool, without tests I had no idea whether it worked correctly or not. After adding a simple UVM testbench around my interconnect I can, yet again, confirm the truth of the quote above. But, enough pontificating, let's dig into the specifics.

Error Types

Especially when implementing something while learning a new language or technique, I find that I make four types of errors:
  • Implementation errors and oversights - These are your run-of-the-mill bugs. For example, I neglected to implement an intended feature, or I implemented the logic incorrectly. Good planning helps to minimize these errors, but they are why we place such value on good verification.
  •  Errors in description - These are learning mistakes related to the new language or technique. I structured a description around my understanding of the language/technique, only to find that it resulted in unexpected behavior. The ease or difficulty in avoiding and/or diagnosing errors in description is a key determinant for me in deciding how easy a new technique is to adopt. 
  • Errors in reuse - This type of error occurs when reusing existing IP, only to find that it functions differently from my understanding. 
  • Tool or library issues - These are errors that you hope not to encounter, but do crop up from time to time. 
Not surprisingly, I encountered the first three categories of errors while verifying my Wishbone interconnect. I did encounter one tool/library issue as well, but I'll get to that later...

UVM Testbench

I decided to verify a 2x4 configuration of the Wishbone interconnect, and created a very (very) basic UVM testbench around it, instantiating two Wishbone master agents and four memory target devices -- mapped at 0x00000000, 0x00001000, 0x00002000, and 0x00003000, respectively.


And, a very basic write/read test:

I ran this in Questa (the Altera Modelsim Starter Edition, to be precise). On the first test run, nothing worked and quite a few signals were driven to X. In order to accelerate progress, I turned on register randomization (something supported by the Verilog code generated by Chisel).

Reuse Error: Chisel Arbiter

My simple write/read test causes both masters to perform a write to the same target device as their first operation. I had assumed that the Chisel-provided arbiter would grant the output to the selected master until that master dropped its request. However, this turned out not to be the case. It took some browsing of the source code (for me, at least) to understand the mechanism the library developer provided for controlling the arbiter's locking behavior. Once I understood that mechanism, customizing the locking behavior was very simple.
The arbiter is such a useful and reusable construct that I'll devote a future post to it, rather than delve into the details here.

Description Error: Multiple Assignments

My original interconnect description used a nested loop structure across the masters and slaves, with a conditional assignment to propagate back the slave response. 

  for (i <- 0 until p.N_SLAVES) {
    for (j <- 0 until p.N_MASTERS) {
      // ...
      
      // Propagate slave response back to active master
      when (out_arb(i).io.in(j).ready) {
          in_rsp(j) := out_rsp(i);
      } .otherwise {
          in_rsp(j).park_rsp();
      }
    }
    out_arb(i).io.out.bits.assign_req2p(io.s(i));
  }
  
As it turns out, this code trips over one of the corner cases of Chisel: when procedural code (e.g., for loops) makes multiple connections to the same signal, the last connection wins. In this case, that meant that both masters were being fed the response from the last slave device.
Investigating description errors like these unfortunately involves digging into the generated Verilog code. This is tedious, though not quite as bad as it sounds. Chisel picks sensible names for module I/O signals, so these are easy to track. However, Chisel also generates lots of anonymously-named internal signals (e.g., _T_51) that implement the logic within the module.
I don't have a concrete proposal for the Chisel authors on how to improve this situation, but I would like to think a graphical view, such as a schematic, might be helpful in relating the input Scala code to the resulting Verilog.
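For a stand-alone illustration of the last-connect behavior (using placeholder signals, not the interconnect code itself):

  // Both when() blocks are elaborated; the later connection takes priority
  val x = Wire(UInt(8.W))
  x := 0.U                    // default value
  when (cond_a) { x := a }
  when (cond_b) { x := b }    // if cond_a and cond_b are both true, b wins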

Reuse Error: Mux Arguments

After better understanding Chisel's behavior with respect to multiple connections, I decided that using Chisel's one-hot Mux primitive would be the best way to handle the response data. Here I bumped into a limitation of the Mux that the Arbiter primitive had allowed me to ignore: multiplexing bundles with signals of different I/O directions is not supported (and, sadly, this is only uncovered very late in the transformation process from Chisel to Verilog). It all makes a lot of sense once you think it through.
Understanding this limitation drove me to redefine the I/O bundles I used to describe the Wishbone interface. It was a fairly straightforward process, and one that I'll describe in more depth in a future post on structuring I/Os for standard interfaces with Chisel.
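The shape of the fix, roughly, is to split the response signals into their own single-direction bundle and select among the per-slave responses with a one-hot mux. The names below (WishboneRsp, slave_rsp) are mine for illustration, not necessarily what ended up in the repository:

  // Hypothetical response-only bundle: every field flows the same direction,
  // so the whole bundle can legally be passed through Mux1H
  class WishboneRsp(val p : WishboneParameters) extends Bundle {
    val DAT_R = UInt(p.DATA_WIDTH.W)
    val TGD_R = UInt(p.TGD_WIDTH.W)
    val ACK   = Bool()
    val ERR   = Bool()
  }

  // Inside the interconnect, for master j: one-hot select of the response from
  // whichever per-slave arbiter has granted this master. slave_rsp(i) is assumed
  // to be a WishboneRsp wire driven from slave interface i.
  val gnt = Seq.tabulate(p.N_SLAVES) ( i => out_arb(i).io.in(j).ready )
  val rsp = Mux1H(gnt, Seq.tabulate(p.N_SLAVES) ( i => slave_rsp(i) ))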

Understanding Data Manipulation Techniques

Chisel encourages descriptions that involve collections of data. In some ways, this isn't so different from other hardware description languages. What's different is the set of operators Chisel provides for manipulating these data collections. One early example I ran across was implementing address decode for the masters. I had arrays of target base/limit addresses, and wanted to determine which target device each master was selecting. This was easily (and compactly) described with the following code:

    val slave_req = io.addr_base.zip(io.addr_limit).map(e => (
        io.m(i).ADR >= e._1 && io.m(i).ADR <= e._2) &&
        io.m(i).CYC && io.m(i).STB)

This code describes the following:
  • Combine the addr_base and addr_limit arrays into an array of (addr_base,addr_limit) tuples using the 'zip' operation
  • Convert this array of tuples into an array of Bool where the entry is 'true' if the target is selected
These techniques also apply nicely to selecting fields from composite data structures, as shown below, where we want to determine whether a given target device is actively selected by any master.

      when (out_arb(j).io.in.map((f) => f.valid).reduceLeft(_|_)) {
        out_arb(j).io.out.bits.assign_b2(io.s(j));
      } .otherwise {
        // If no master is requesting, deactivate the slave requests
        io.s(j).park_req()
      }

In this case, the code does the following:
  • out_arb(j).io.in is an array of composite data going into the per-target arbiters. The map() operation selects just the 'valid' field from each array element
  • Then, the reduceLeft() operation performs a reduction across the array
Now, both of these operations can be described in other hardware description languages, but both would likely involve several layers of temporary data fields. It's really nice to be able to describe the high-level view of the manipulation to be performed, and be confident that a sensible implementation of that description will be inferred (and, after having to dig into the generated code for other reasons, I can state that the implementation is sensible).

Tool Issue: Register without Reset

I mentioned earlier that I had turned on register initial-value randomization when I first started simulations. After getting my test running correctly, I had hoped this would not be needed. However, it turns out that Chisel's Arbiter primitive contains a register without a reset value. Perhaps this hasn't created an issue for many Chisel users because the Verilator 2-state simulator is often used. However, with a 4-state simulator like Questa/Modelsim, an uninitialized register is a fatal issue that results in X propagation and a non-functioning design.

I strongly recommend using registers that are reset, and will provide this feedback to the Chisel team. 
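For reference, the difference on the Chisel side is simply whether the register is created with an initial value:

  val unreset_r = Reg(UInt(8.W))        // no reset value: starts as X in a 4-state simulator
  val reset_r   = RegInit(0.U(8.W))     // defined reset value: takes 0 when reset is asserted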

Conclusions

I'd class many of the issues I faced as all part of the learning curve for a new tool or technique. Challenging (and sometimes time-consuming) to surmount, perhaps, but issues I'd likely not face in the future. I've also gained a new-found appreciation for the descriptive power that Chisel's support for Scala's collection-manipulation operators brings to the description of hardware.

For now, I have a working Wishbone interconnect described with Chisel. And, despite a few hiccups along the way, I'm still feeling pretty good about the expressive power that Chisel brings to hardware description.


Next, I'm curious to see how synthesis results compare for a hand-coded Wishbone interconnect and the Chisel-generated one. 
As always, you can find the source for my experiments here: https://github.com/mballance/wb_sys_ip






Wednesday, August 2, 2017

Chisel Sharpening: Initial impressions

Recently, RISC-V (https://riscv.org/) has been all the rage. The free (as in speech and beer) instruction-set architecture has seen an explosion of interest in the last couple of years, after being more or less an academic curiosity since 2010 or so.

This post isn't about RISC-V, though. It's about Chisel (https://github.com/freechipsproject/chisel3/wiki), the design language that the UC Berkeley team working on RISC-V uses to implement Rocket Chip, their proof-of-concept RISC-V implementation. Claims of the productivity benefits of using Chisel for hardware design are substantial. As one data point, consider this comparison of two implementations of a RISC-V architecture -- one using Chisel and one in Verilog (https://riscv.org/wp-content/uploads/2015/01/riscv-chisel-tutorial-bootcamp-jan2015.pdf).


3x fewer lines of code -- even (but maybe especially) if the savings were in wiring -- sounds pretty good to me! So, I decided a while ago that I should dig in and learn more about Chisel.

So, what's Chisel?

Chisel certainly isn't a one-for-one replacement for a hardware description language (HDL) like Verilog or VHDL. Chisel is a class library, written in Scala (https://www.scala-lang.org/). Descriptions written using that class library are compiled, executed, and converted to Verilog. Now, your first thought when a high-level language like Scala is mentioned might be "High-Level Synthesis", but that's not what Chisel is all about. Chisel is very much focused on RTL (register-transfer level) design.

A learning project

I learn almost everything by doing, so I decided to assign myself a learning project to see if I could make Chisel go. A few years back, I coded a parameterized Wishbone interconnect in SystemVerilog. I was curious to see how a similar project coded in Chisel would compare.

All about the interfaces

The first thing to do is to capture a Wishbone interface. As with many hardware interfaces, Wishbone is parameterized with address, data, and tag widths. Quite sensibly, Chisel provides a class, Bundle, for specifying reusable collections of signals. Below, you can see the code used to declare the parameters for a Wishbone interface, and the Bundle (WishboneMaster) that specifies the signals for a Master interface.

class WishboneParameters (
    val ADDR_WIDTH  :  Int=32,
    val DATA_WIDTH  :  Int=32,
    val TGA_WIDTH   :  Int=1,
    val TGD_WIDTH   :  Int=1,
    val TGC_WIDTH   :  Int=1) {
  
    def cloneType() = (new WishboneParameters(ADDR_WIDTH, DATA_WIDTH,
        TGA_WIDTH, TGD_WIDTH, TGC_WIDTH)).asInstanceOf[this.type]
}

class WishboneMaster(val p : WishboneParameters) extends Bundle {
    val ADR = Output(UInt(p.ADDR_WIDTH.W));
    val TGA = Output(UInt(p.TGA_WIDTH.W));
    val CTI = Output(UInt(3.W));
    val BTE = Output(UInt(2.W));
    val DAT_W = Output(UInt(p.DATA_WIDTH.W));
    val TGD_W = Output(UInt(p.TGD_WIDTH.W));
    val DAT_R = Input(UInt(p.DATA_WIDTH.W));
    val TGD_R = Input(UInt(p.TGD_WIDTH.W));
    val CYC = Output(Bool());
    val TGC = Output(UInt(p.TGC_WIDTH.W));
    val ERR = Input(Bool());
    val SEL = Output(UInt((p.DATA_WIDTH/8).W));
    val STB = Output(Bool());
    val ACK = Input(Bool());
    val WE = Output(Bool());

    // ...
}

Coding up the interconnect

As with many learning projects, the Wishbone interconnect involved a series of iterations. Eventually, I arrived at the code below, broken up a bit to support comments:

class WishboneInterconnectParameters(
    val N_MASTERS  : Int=1,
    val N_SLAVES   : Int=1,
    val wb_p       : WishboneParameters) {
}

class WishboneInterconnect(
    val p : WishboneInterconnectParameters,
    val typename : String = "WishboneInterconnect") extends Module {

  val io = IO(new Bundle {
    val addr_base  = Input(Vec(p.N_SLAVES, UInt(p.wb_p.ADDR_WIDTH.W)))
    val addr_limit = Input(Vec(p.N_SLAVES, UInt(p.wb_p.ADDR_WIDTH.W)))

    val m = Vec(p.N_MASTERS, Flipped(new WishboneMaster(p.wb_p)))
    val s = Vec(p.N_SLAVES, new WishboneMaster(p.wb_p))
  });

  override def desiredName() : String = typename;

Here's the interface declaration. Note that we have vectors of address base and limit for address decode, then vectors of master and slave interfaces. Note the 'Flipped' method that reverses the Input/Output direction of elements within a bundle.

Now, what we're building is effectively shown below. Each slave interface has an associated arbiter that is connected to all masters. Fortunately, Chisel provides an Arbiter as a built-in element of the class library.


  val in_rsp = Seq.fill(p.N_MASTERS) ( Wire(new WishboneMaster(p.wb_p) ))
  val out_rsp = Seq.fill(p.N_SLAVES) ( Wire(new WishboneMaster(p.wb_p) ))
  
  for (i <- 0 until p.N_MASTERS) {
    // Drive back to master
    in_rsp(i).assign_rsp2p(io.m(i));
  }
  
  val out_arb = Seq.fill(p.N_SLAVES) ( Module(new RRArbiter(
      new WishboneMaster(p.wb_p), p.N_MASTERS)) )

This code creates a couple of temp arrays for routing the response back from the slave to the master, as well as an array of per-slave interface arbiters.

  // For each slave, hook up all masters
  for (i <- 0 until p.N_SLAVES) {
    for (j <- 0 until p.N_MASTERS) {
      val m_sel = io.addr_base.indexWhere((p:UInt) => (io.m(j).ADR >= p))
      val m_ex = (io.addr_base.exists((p:UInt) => (io.m(j).ADR >= p)) &&
          io.addr_limit.exists((p:UInt) => (io.m(j).ADR <= p)));
      out_arb(i).io.in(j).bits.assign_p2req(io.m(j))

      when (m_ex && m_sel === i.asUInt()) {
        out_arb(i).io.in(j).valid := Bool(true);
      } .otherwise {
        out_arb(i).io.in(j).valid := Bool(false);
      }
      
      // Propagate slave response back to active master
      when (out_arb(i).io.in(j).ready /* out_arb(i).io.in(j).valid && 
          out_arb(i).io.chosen === j.asUInt() */) {
          in_rsp(j) := out_rsp(i);
      } .otherwise {
          in_rsp(j).park_rsp();
      }
    }
    out_arb(i).io.out.bits.assign_req2p(io.s(i));
  }
}

Finally, we do the address decode to determine which slave a master's request address selects, and connect everything up to the arbiters. Note how simple it is to query the base/limit address arrays! The 'm_ex' field is true if the master is selecting a valid slave, while the 'm_sel' field holds the target index.

Generating RTL

One of the things I spent far too much time on was finding out how to generate Verilog from my Chisel description. Turns out the incantation is quite simple once you know what it is:

object WishboneInterconnectDriver extends App {
    var N_MASTERS = 2;
    var N_SLAVES = 4;
    var ADDR_WIDTH = 32;
    var DATA_WIDTH = 32;
 
  var typename = "wishbone_ic_%d_%d_%dx%d".format(
      ADDR_WIDTH, DATA_WIDTH, N_MASTERS, N_SLAVES);
  
  chisel3.Driver.execute(args, () => new WishboneInterconnect(
      new WishboneInterconnectParameters(N_MASTERS, N_SLAVES,
          wb_p=new WishboneParameters(ADDR_WIDTH, DATA_WIDTH)
      ), typename)
  )
}

The code above calls the Chisel 'Driver', passing in an instance of the WishboneInterconnect class. The result of running this code is a set of files, one of which is the Verilog RTL. The output RTL is somewhat low-level -- and 1656 lines long (!). When it comes to debugging, I'll be interested to see how much this gets in the way. But, it's all sensible RTL at the end of the day...

Results

Okay, so the hand-coded SystemVerilog interconnect took a total of 326 lines of SystemVerilog code. But, a little over 100 of those were the per-slave arbiter. If we ignore those lines, we have 206 lines of SystemVerilog. The Chisel description is 56 lines of code. So, 3-6x less code, depending on whether you ignore or count the arbiter implementation. Not bad, and I'm definitely feeling more comfortable with Chisel after working through an example like this.

If you're interested, you can find the complete code on GitHub: https://github.com/mballance/wb_sys_ip

This repository contains both the hand-coded SystemVerilog and the Chisel representation. 

So, what did we learn?

Well, initial experiments certainly seem to bear out the productivity benefits of Chisel. Library elements, such as the arbiter module, are a great productivity boost! Array operations raise the abstraction level.

Figuring out the basics can be a bit challenging. Even figuring out how to run the Chisel-to-Verilog conversion took some digging. Because Chisel is embedded in another language, semantic errors tend to show up as Java exceptions, rather than nice, high-level error messages.

So, thus far, some good and some bad. Over the next couple of posts, I plan to dig into a couple of other areas of comparison -- including verification of the RTL, and how efficiently hand-coded and Chisel-generated implementations synthesize. So, stay tuned for more.

Have you experimented at all with Chisel? Or with other HDL alternatives, for that matter? What has your experience been?


Saturday, April 12, 2014

System Level Verification: What, Another Framework?


Frameworks have been enormously helpful in making testing and verification productive and reusable. In the RTL verification space, AVM, VMM, OVM, and UVM have all contributed to standardizing a pattern for RTL verification that encompasses encapsulation and reuse, automation for stimulus generation, and mechanisms for results checking. In the software space, frameworks such as JUnit and CPPUnit have, likewise, provided structure around standardizing how test results are checked and how test suites are composed and executed in an automated fashion.

As more focus is given to verification issues at SoC and system level, it makes sense to ask: are the requirements for a verification framework in a system level environment met by existing (primarily RTL-centric) verification  frameworks, or is there something fundamentally different at system level? As it turns out, there are some rather unique aspects of system-level verification that make existing verification frameworks unsuitable for application in this space.

Perhaps the most visible difference in a system-level design is the presence of embedded processors that are simultaneously part of the design and part of the verification environment. The verification framework must enable automation for test generation, as well as facilitate managing the processors as design and verification resources in configurations ranging from 'bare metal', to OS and test, to an OS running application software and a test.

Another difference with a system-level environment is that verification must look both forward and backward in the verification process. It is a requirement that the same test scenario be able to run in an SoC-level simulation/emulation context that includes a SystemVerilog/UVM testbench, in a prototype context (FPGA or first silicon), and potentially on the end product. Now, that certainly doesn't mean that the same verification will be done at each step: the purpose and goals of verification in each context are quite different. However, being able to re-run the same scenario in two verification contexts provides some continuity between the contexts and avoids having to start from scratch when changing contexts. For example, consider an SoC-level environment in simulation. Much of the test stimulus is still hardware-centric, but early embedded-software scenarios are being executed. When moving to emulation, it is enormously valuable to be able to run the same scenario that was proven to run in simulation and not have to start developing a test from scratch. Continuity is even more helpful when moving from emulation (which still largely maintains the appearance of simulation) to an FPGA prototype environment that is radically different.

As mentioned in a previous post, system-level environments tend to look more like cooperating islands of test functionality than the monolithic testbench used for block and subsystem RTL verification. A system-level verification framework must enable reuse of verification infrastructure across these islands, as well as facilitate the cooperation of these islands in carrying out the broader verification task.

Just because the requirements are different for a system-level verification framework doesn't mean that the design of a system-level verification framework must start from first principles. Hardware- and software-centric test frameworks have been in development for over a decade (some would argue much, much longer), and there is much to be gained from the features that, over time, were found to be useful for verifying hardware and software.

I've recently started work on a lightweight C++-based verification framework that targets the requirements of system-level verification. The framework borrows features and patterns from existing verification frameworks, and adds features and patterns to address some of the unique requirements of driving embedded software tests and coordinating the multiple 'islands' of a system-level verification environment. The framework is designed to be encapsulated in environments as diverse as UVM and special-purpose verification hardware accelerators. The next few blog posts will explore some of the specifics of this developing framework.

~Everything is a system -- some are just smaller than others

Saturday, March 29, 2014

System-level verification: Islands vs Continents

As the saying goes, "There is nothing permanent except change". However, even with constant change, similarities with what came before abound and true and complete discontinuities are relatively rare. This is certainly true in the functional verification space as we begin looking beyond the methodologies that served us well for unit-level and subsystem-level verification and ask "what's next?". There are many similarities between the requirements for a system-level verification framework and requirements for frameworks targeting unit-level and SoC-level environments. In both cases, encapsulation, reuse, abstraction, and test automation are important. However, there are also fairly significant differences in requirements as well. The biggest differences involve testbench structure, time correlation expectations and requirements, and modeling languages.

In a unit-level and SoC-level environment, testbench environments tend to be monolithic. The very fact that it is common to refer to 'the testbench' highlights the all-encompassing nature of the testbench environment in unit- and subsystem-level verification. By contrast, system-level (and to a certain extent SoC-level) verification tends to be more distributed -- more like loosely-connected islands than the all-encompassing continent of a unit-level testbench environment.

In a SoC-level verification environment, the primary 'island' is the verification-centric embedded software running on the processor or processors that is effectively verifying the system from the inside out. This software may run autonomously relative to the testbench that is testing the design from the outside in, or it may be coordinated -- either loosely or quite tightly -- with the activity of the testbench surrounding the design.

In a system-level verification environment, the 'island' effect is even more pronounced. System-level verification is typically carried out in the lab with either an FPGA prototype or first silicon. In this case, the testbench will be divided into islands such as test equipment connected to the design interfaces, test code running on the embedded processors, and specialized test hardware built into the chip.

A key requirement in unit- and SoC-level testbench environments has historically been tight time synchronization and correlation of activity. Given the simulation- and emulation-centric nature of unit, subsystem, and SoC-level verification this makes perfect sense: since the execution engine maintains a coherent view of events and time, the testbench environment can maximize modeling ease and the repeatability and predictability of results. However, this global view of time comes at the cost of execution speed. Simulation-based testbench environments are by and large single threaded, and remain largely unable to take advantage of the recent explosion in the availability of multi-core machines to accelerate simulation speed.

By contrast, a system-level verification environment cannot afford to sacrifice the much higher execution speed delivered by an FPGA or first-silicon prototype in order to maintain lock-step synchronization across the entire environment. Even if the higher execution speed could be sacrificed, maintaining full time synchronization would artificially constrain the system and make the results of system-level verification impossible to trust.

Finally, verification frameworks designed for use at unit to SoC level have typically been written in a verification-centric language such as Vera, 'e', or SystemVerilog. This makes sense, of course, since these languages provide features specific to RTL verification. However, the fact that these existing languages are typically tightly tied to simulation environments makes reusing verification IP and tests created with them in a system-level environment nearly impossible. A system-level verification framework is essentially constrained to use a 'lowest common denominator' language in order to ensure maximum reuse.

A verification framework that seeks to provide value for system-level verification must be designed from the ground up with these requirements in mind. Over the next few posts, we'll have a look at how these requirements are being addressed in a new system level verification framework currently being developed.

~Everything is a system -- some are just smaller than others

Tuesday, March 18, 2014

Verification Frameworks and System-Level Verification

It seems the last decade or so has been the decade of the verification language and the verification framework. From SystemC, Vera, and 'e' to SystemVerilog, and from VMM and AVM to OVM and UVM, a lot of focus has been placed on making design verification engineers productive in creating verification environments and tests. By and large, these verification languages and frameworks focused on block-level and subsystem-level verification -- areas where automating application of test stimulus at the signal level was the name of the game. The world changes, however. Today, there is a growing interest in verification at the SoC and system level. At this level, where embedded software and high-level test stimulus are important, frameworks that are tied to RTL-centric language features are a bit of an impedance mismatch -- despite how relevant and valuable they are for unit- and subsystem-level verification.

Looking forward, the question must be raised: given the importance of verification frameworks in enabling productive, modular, and reusable verification in RTL-centric environments, might a verification framework focused on system-level verification bring the same benefits? As you might guess, I believe in the value of having a verification framework that addresses the somewhat-unique (at least as compared to those of RTL-centric environments) requirements of system-level environments.

Now, just because there might be value in a different verification framework focused on the needs of system-level verification doesn't mean that we have to start from scratch in designing this framework. The experiences from the past decade in terms of general requirements for and useful attributes of a verification framework are invaluable in informing the core elements of a new system-level verification framework.

So, the past informs the future even as the environment and requirements change. Over the course of the next few blog posts, I'll outline more details on key attributes of a system-level verification framework.

Are you doing verification at the system level? If so, what verification framework are you using?



Sunday, March 16, 2014

SVEditor: What's that reference? (Part 1)


One key feature of integrated development environments -- especially those for object oriented languages -- is the ability to find the references to a data type, a method, or a data field. A few months back, I implemented initial reference-finding functionality focused on module declarations and module instantiations. This functionality was used to support the design hierarchy view. Being able to quickly identify root modules by finding module declarations that were not referenced in the project was key to making the design hierarchy view efficient on large designs. Now, I've started work on more general support for locating references to types and fields.

SVEditor creates an AST (abstract syntax tree) for each source file that it parses. The ASTs are stored on the filesystem, and the most recently used are cached in memory. This enables SVEditor to manage large projects without being limited by the amount of memory available to Java, as well as to avoid re-parsing source files on startup (provided they haven't been modified). Bringing ASTs in from disk is faster than re-parsing them, but it is still a time-consuming operation. Consequently, all SVEditor operations seek to minimize the number of ASTs that must be traversed.
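To make the caching scheme concrete, here's an illustrative sketch of the policy. It's in Scala (for consistency with the other code on this blog) rather than SVEditor's actual Java, and the names are mine:

import scala.collection.mutable

// Illustrative LRU-style cache: ASTs live on disk, and only the most
// recently used ones are kept in memory
class AstCache[Ast](capacity: Int, loadFromDisk: Int => Ast) {
  private val cache = mutable.LinkedHashMap[Int, Ast]()

  def get(fileId: Int): Ast = {
    // Remove and re-insert on access so insertion order tracks recency of use
    val ast = cache.remove(fileId).getOrElse(loadFromDisk(fileId))
    cache.put(fileId, ast)
    if (cache.size > capacity)
      cache.remove(cache.head._1)       // evict the least recently used AST
    ast
  }
}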

Finding references is one of those global operations that requires information from all (or nearly all) the files in the environment. When performing a single reference lookup, waiting a while is not a huge problem. However, reference searching is also used as a building block for other operations. As noted before, doing reference lookups for all modules in a design (often at least hundreds) is how the design hierarchy is built. In cases like these, reference lookups must be very fast.

The approach currently being implemented within SVEditor has two steps: coarse data collection during parsing and fine-grained data analysis during a lookup.

During parsing, all identifiers within a file are collected into a set that is associated with the file id. This per-file identifier hash allows a quick check to be performed to see if any references to a given element are likely. 

During a reference-finding operation, a set of files on which to perform more-involved analysis is built based on the per-file identifier hash. This first-pass filter enables more-detailed analysis to be performed on a much smaller set of files, while requiring very little storage overhead.
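Putting the two steps together, a reference lookup looks roughly like the sketch below -- again in Scala for illustration, not SVEditor's actual Java implementation:

// Where a reference was found: which file, and where in that file
case class Reference(fileId: Int, line: Int)

// Per-file index built during parsing: the set of identifiers seen in the file
case class FileIndex(fileId: Int, identifiers: Set[String])

// Step 1: coarse filter -- a file can only reference 'name' if that
// identifier appears somewhere in it
def candidateFiles(name: String, index: Seq[FileIndex]): Seq[Int] =
  index.filter(_.identifiers.contains(name)).map(_.fileId)

// Step 2: detailed analysis -- load and traverse the AST only for the
// candidate files that survived the filter
def findReferences(name: String,
                   index: Seq[FileIndex],
                   analyzeAst: (Int, String) => Seq[Reference]): Seq[Reference] =
  candidateFiles(name, index).flatMap(fileId => analyzeAst(fileId, name))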

Next time, more details on the detailed AST analysis to identify class field and method references.