Sunday, June 26, 2022

Tools and Techniques to Improve YAML-File Usability

 


This blog post is a bit of a departure from many that I’ve created for this blog. Most of my blog posts are about things I’ve created. This post is about a collection of tools that I use in developing the things I create.  

I’ve recently come back to working on some new features in PyUCIS, the Python library for accessing functional coverage data. PyUCIS provides an implementation of the Accellera UCIS, and several back-end implementations. Good tests are critical when developing new functionality and, in the case of PyUCIS, tests rely on having coverage data to manipulate. As it so happens, while the UCIS API is good for providing tools access to coverage data, it’s not a great interface for humans (and, specifically, for test writers). What test writers need is a very concise and easy-to-read mechanism to capture the coverage data on which the library should operate. How should we capture this data? A couple decades ago, I might have toyed with developing a small language grammar to capture exactly the data I needed. Today, using a mark-up language like YAML or JSON to capture such data is my go-to approach.

YAML - A Data Format for Everything and Nothing

There are many reasons for the popularity of YAML for capturing application-configuration information, such as what we need to capture coverage data. YAML’s structure of a nested series of mappings and lists lends itself to easily capturing all manner of data. Furthermore, support for reading and writing YAML is available the vast majority of programming languages. 

However, the ease with which we can define new data formats, and create simple processors to accept data captured in these formats can be deceptive. It’s tempting to think that, because YAML defines a standard set of structures for capturing data, users will find it easy and intuitive to capture data in our specific format. It’s tempting to think that our format might be so simple that only a little documentation with a few examples may be more than sufficient. The truth, however, is that making our application-specific data format usable requires us to do many of the same things that we would have to do if we defined a custom language. Our YAML-based format must be fully-documented, our data processors must be robust in accepting valid content, rejecting invalid content, and not silently ignore unrecognized input. I’ve had the painful experience of coming back to a project (yep, one that I created) after a few months away and having to dig into the YAML-processing code to remember the data format. 

The apparent ease with which we can access data from our application code is also a bit deceptive. Most YAML-reading libraries provide access to the data through a hierarchy of maps and list that mirrors the structure of the data. Depending on how we might want to subsequently process the data, we might first copy it to a set of custom data object, or we might access it by directly querying the maps and lists. In both cases,  

The really thing about YAML, though, is that many tools exist precisely to help make a custom YAML-based format easy to use and reliable to implement. For the most part, I will focus on tools available in the Python ecosystem. However, many of these tools are equally-useful in when implementing applications in other languages. YAML-processing libraries exist in other language ecosystems as well.

PyUCIS Coverage Example

Let’s look at the following tools in the context of the YAML data format that PyUCIS uses to capture coverage data for testing. Here’s a small example:


The root of data in the document is named ‘coverage’. Currently, ‘coverage’ consists of a series of covergroup types under the ‘covergroups’ section. Each covergroup type has a name and a list of instances. A covergroup instance holds coverpoints, which have bins in which hit counts are stored. The format is intended to make it very simple to capture coverage data for use in testing coverage reporting and merging tools. It’s also not a bad format to bring in coverage data from other tools.

PyYAML
It’s incredibly simple to read data from a YAML-formatted file. I’ve tended to use the PyYAML Python library, but there are many other choices. With PyYAML, reading in file like the example above is incredibly simple:

import yaml

with open(“coverage.yaml”, “r”) as fp:
yaml_data = yaml.load(fp, Loader=yaml.FullLoader)

The result is a hierarchy of Python dictionaries and lists containing the data from the file, which we can walk by indexing. For example:

for cg in yaml_data[“coverage”][“covergroups”]:
  print(“Covergroup type: %s” % cg[“name”])

JSON Schema

One thing we will always want to ensure is that a coverage file conforms to the required syntax. One way to do this is to hand-code a validator that walks through the data structure from the parser and confirms that required elements are present and unexpected elements are not. Another is to create a schema for the document and use a validation library. 
We will create a schema for the coverage file format. Creating a schema is the most efficient way to enable validation of our file format. In addition, once we have a schema, there are many other ways that we can use it.
Despite the fact that we are using YAML for our data, we will capture the schema using json-schema.


The example above is the first part of the schema for our coverage data. It’s a bit verbose, but notice a few things:

  • The root of our document is an object (a dictionary with keys and values) with a single root element coverage
  • A coverage section is an array of covergroupType. Note that the schema refers to this separate declaration, which allows it to be referenced and reused in multiple locations.
  • covergroupType  specifies that it is an object that has three possible sub-entries (name, weight, instances)
  • Of these possible sub-entries, only ‘name’ is required

This merely scratches the surface of what is possible to describe with json-schema. There’s a bit of a learning curve, but my experience has been that it’s pretty straightforward once you learn a few fundamentals.

Once we have a schema, we can validate the data-structure returned from the YAML parser against the schema using the jsonschema Python library.

import yaml

import json

import jsonschema


with open(“coverage.yaml”, “r”) as fp:

yaml_data = yaml.load(fp, Loader=yaml.FullLoader)

with open(“coverage_schema.json”, “r”) as fp:

     schema = json.load(fp)

jsonschema.validate(instance=yaml_data, schema=schema)


Validating a document prior to attempting to process the data structure from the YAML parser allows us to simplify our processing code because we can assuming that the structure of the data is correct.

Python-JsonSchema-Objects

The simplest way to obtain data is to operate directly on the data structure returned by the parser. 

While  this is simple and straightforward, there is at least one significant pitfall: it’s almost never a good idea to use string literals. Consider what happens if we change the name of one of our optional keywords just a bit. 

weight=1

if “weight” in cg.keys():

  weight = cg[“weight”]

If we neglect to update all the locations in our code that use this string literal, some of our data will simply be silently ignored. Clearly, there are some incremental steps we can take – for example, defining a constant for each string literal, making it easier to update. 

Another approach is to work with classes that are generated from our schema. This approach makes it much more likely that we’ll find data misuse issues earlier, and has the added benefit of giving us actual classes to work with. I recently discovered the python-jsonschema-objects project, and used it on PyUCIS for the first time. Thus far, I’m extremely impressed and plan to use it more broadly.

The short version of how it works is as follows. python-jsonschema-objects works off of a JSON-schema document. Each section of the schema (eg covergroupType) should be given a title from which the class name will be derived. Call python-schema-objects to build a Python namespace containing class declarations. Your code can then create classes and populate them – either directly or from parsed data.

It looks like this:

import python_jsonscehma_objects as pjs


builder = pjs.ObjectBuilder(schema)

ns = builder.build_classes()

cov = ns.CoverageData().from_json(json.dumps(yaml_data))


if cov.covergroups is not None:

  for cg in cov.covergroups:

    print(“cg: %s” % cg.name)


The ‘ns’ object above contains the classes derived from the definitions in the schema. We can create an instance of a CoverageData class that contains our schema-compliant data just by loading the JSON representation of that YAML data. From there on, we can directly access our data as class fields.

VSCode YAML Editor

Thus far, we’ve primarily looked at tools that help the developer. The final two tools are focused on improving the user experience. Both leverage our document schema.

Visual Studio Code (VSCode) is a free integrated development environment (IDE) produced by Microsoft. In open source terms, it’s free as in beer. My understanding is that there are compatible truly open source versions as well. As with many IDEs, there is an extensive ecosystem of plug-ins available  to assist in developing different types of code. One of those plug-ins supports YAML development.

So, what does having a schema allow an intelligent editor to do for us? Well, for one thing, it can check the validity of a YAML file as we type it and allow us to fix errors as we go. 


It can suggest what content is valid based on where we are in the document. For example, the schema states that we can have coverpoints and crosses elements inside an instances section. The editor knows this, and prompts us with what it knows is valid.



It can also shows us information about the document section we’re hovering over. Features like these can significantly improve ease of use, making it easier for your users to get started. 

Sphinx Json Schema
Over time, I’ve really come to love Sphinx-Doc for documenting projects. I really like the way it enables combining human-created content with content extracted from the implementation code. I think it finds a great middle ground between tools that fully-generate documentation from code comments and documentation that is fully human created.

Not surprisingly, Sphinx has an extension that supports extracting data from a JSON schema. The extracted data provides a great synopsis of the data format. It’s very likely that you’ll want to add in a bit of extra description on top of what makes sense to put directly in the schema documentation.


The heading and table above are the result of using sphinx-jsonschema to document the covergroupType entity in our coverage schema. All the sub-elements are documented, and complex sub-elements have links to the relevant documentation. The text below the table is description that was manually added to the document. As with most Sphinx plug-ins, the jsonschema plug-in saves the developer from the laborious work of documenting the structure of the document.

Conclusion
YAML is an excellent textual format structure for capturing structured data in a human readable way. Making use of a few readily-available free and open-source tools can make domain-specific YAML-based file formats much easier and reliable to implement, and can dramatically increase their usability. Next time you start sketching out a YAML-file format to use in your application, I’d encourage you to also reach for some of these tools. Your users will thank you – even if the sole user ends up being you!

References
    • PyUCIS GitHub - https://github.com/fvutils/pyucis
    • PyUCIS Docs - https://fvutils.github.io/pyucis/
    • json-schema - https://json-schema.org/
    • jsonschema Python library - https://pypi.org/project/jsonschema/
    • RedHat YAML editor for VSCode - https://github.com/redhat-developer/vscode-yaml
    • sphinx-jsonschema - https://sphinx-jsonschema.readthedocs.io/en/latest/



Copyright 2022 Matthew Ballance

The views and opinions expressed above are solely those of the author and do not represent those of my employer or any other party.





Sunday, June 12, 2022

PyVSC: Working with Coverage Data

 


I’ve been investing some time in documentation updates this weekend, after a couple of PyVSC users pointed out some under-described aspects of the PyVSC coverage flow. Given that these areas were under-documented in the past, it seemed a good opportunity to highlight what can be done with functional coverage data once it is sampled by a PyVSC covergroup.

So, we’ve described some functional coverage goals using a PyVSC covergroup and coverpoints, created a covergroup instance, and sampled some coverage data – perhaps it was randomly-generated stimulus or data sampled from a monitor. What now?

Runtime Coverage API

One simple thing we can do is to query coverage achieved using the coverage APIs implemented by PyVSC covergroup classes. The `get_coverage` method returns the coverage achieved by all instances of a covergroup type. The `get_inst_coverage` method returns the coverage achieved by the specified covergroup instance.

Let’s look at an example:


In the example above, we define a covergroup with a coverpoint that contains four bins (1, 2, 4, 8). We create two instances of this covergroup and sample them with two different values. After each call to sample, we display the coverage achieved by all instances of the covergroup (type coverage) and the coverage achieved by each instance.


The output from this example is shown above. After sampling the first covergroup, the coverage achieved for that, and all, instances is 25% since one of four bins was hit. After sampling the second covergroup, the coverage achieved for that covergroup instance is also 25%. Because two different bins are hit between the two covergroup instances, two of four bins are hit (50%) for type coverage.


Runtime Coverage Reports

Another way to look at collected coverage is via a coverage report. PyVSC provides two methods that are nearly identical for obtaining a textual coverage report:

  • get_coverage_report – Returns the report as a string
  • report_coverage – Writes the report to a string (stdout by default)

Both of these methods accept a keyword parameter named ‘details’ which controls whether bin hits are reported or just the top-level coverage achieved. Let’s look at a derivative of the first example to better understand the textual coverage report options.


This example is nearly identical to the first one, but with calls to ‘report_coverage’ instead of calls to the covergroup get_coverage methods.


The output from running this example is shown above. When reporting ‘details’ is enabled, the content of each coverage bin is reported. When reporting ‘details’ is disabled, only the top-level coverage achieved is reported. Displaying a coverage report with details is often helpful for confirming the correctness of a coverage model during development.


Saving Coverage Data

The PyUCIS library implements a Python interface to coverage data via the Accellera UCIS data model. It implements an object-oriented interface to coverage data, in addition to the Python equivalent of the UCIS C API. PyVSC uses the PyUCIS library to save coverage data, and can do so in a couple of interesting ways. Coverage data is written via the vsc.write_coverage_db method.

PyVSC can save coverage data to the XML interchange format defined by the UCIS standard. This is the default operation model for write_coverage_db. The example below shows saving it to a file named  ‘cov.xml’. 


PyVSC can also save coverage data to a custom database format, provided the tool that implements that database implements the UCIS C API. The example below saves coverage data to a custom database using the UCIS C API implemented in the shared library named ‘libucis.so’.


Both of these paths to saving coverage may provide ways to bring coverage data collected by PyVSC into coverage-analysis flows implemented by commercial EDA tools. Check your tool’s documentation and/or check with your application engineer to understand which options may be available. Feel free to report what works for you on the PyVSC discussion forum so that others can benefit as well.

Viewing Coverage Data

Obviously, you can use commercial EDA tools to view coverage data from PyVSC if your tool provides a path to bring UCIS XML in, or if it implements the UCIS C API. PyUCIS Viewer provides a very simple open-source graphical application for viewing coverage in UCIS XML format. 

To use PyUCIS Viewer, save coverage data in UCIS XML interchange format, then run PyUICIS Viewer on that XML file:

% pyucis-viewer cov.xml

A simple tree-based graphical viewer will open to show type and instance coverage. 


Conclusion

There are several options for viewing and manipulating coverage once it has been collected via a covergroup modeled with PyVSC. In a future post, we’ll look at some additional manipulation and reporting options being implemented within PyUCIS

Until then, check out the latest additions to the PyVSC documentation and raise questions and issues on the PyVSC GitHub page.


Copyright 2022 Matthew Ballance

The views and opinions expressed above are solely those of the author and do not represent those of my employer or any other party.


Sunday, March 27, 2022

TbLink-RPC: Simplifying the Multi-Language Testbench


SystemVerilog/UVM is, by far, the most widely-used language and methodology for block and subsystem-level verification environments today. The simplicity of that statement overlooks the fact that it’s often very common to have other bits of non-SystemVerilog code connected. Maybe it’s some C/C++ code that implements a reference algorithm used by the scoreboard. Maybe it’s an instruction-set simulation model used to implement software behavior. Maybe it’s infrastructure that allows early firmware code to drive behavior in the simulation. Either way, integrating non-SystemVerilog code is a non-trival development and maintenance task, despite the fact that SystemVerilog defines a standard API (the Direct Procedure Interface – DPI).

This is the first in a series of posts describing a project that I’ve been working with a goal of simplifying this situation. The TbLink-RPC (roughly Testbench Link Remote Procedure Call) provides infrastructure that dramatically reduces the code a testbench developer needs to create in order to integrate code with a simulation environment. 

Two primary experience brought me to working on the TbLink-RPC project. The first was a somewhat long history of feeling like I had to reinvent the wheel every time I needed to integrate non-SystemVerilog code into a testbench environment. The second was my interest in ‘alternative’ testbench languages and my experience using Python as a verification language.

Back in the 2017/2018 timeframe, I started to get re-involved with open-source hardware design targeting, primarily, FPGAs. When you’re designing gateware (frankly, any software-like thing), it’s imperative to have a good test environment. I did a bit of exploration, trying out bespoke C++ testbench environments and a few other things, before landing on Python and cocotb as my test framework. A lot of this was motivated by open-source tool capabilities and community. I was committed to using open-source tools as much as possible for my open-source gateware, and open-source simulators (eg Icarus and Verilator) didn’t support sufficient SystemVerilog features to be able to use SV-UVM. After looking around a bit, cocotb seemed to have the largest community around it making it the obvious choice.

I found a lot to like about Python and cocotb for developing testbench environments. Python has a large collection of libraries, and the ability to easily incorporate these in a testbench boosted my productivity. I find the Python language easy to write and use – especially for smaller projects. Having a pure-Python testbench worked for me as a hobbyist. In many ways, that is because I create all my testbench environments from the ground up and don’t use commercial Verification IP (VIP/UVCs). 

This same approach doesn’t work in most commercial environments. Testbench environments must reuse existing VIPs/UVCs (either commercial or developed in-house), and it’s common for a testbench architecture to remain mostly-unchanged across multiple design cycles. Doing a wholesale conversion to Python (or any other ‘alternative’ language) doesn’t make sense. Furthermore, bringing in small amounts of a different language has a high development and maintenance cost.

What I concluded I really wanted was a framework that would simplify the process of integrating some amount of testbench code written in any language (more precisely, any non-SystemVerilog language since the simulator already supports SystemVerilog) into a simulation environment.

Concept

I’ll get into more detail about the TbLink-RPC architecture in a future post. Fundamentally, though, the idea is to form a point-to-point connection between two environments through which pairs of objects can communicate via method calls. 


There are a few things that make TbLink-RPC different from other remote-procedure-call solutions. The first is that TbLink-RPC is simulation centric. Or, rather, it is designed to work with environments that maintain a local “simulated” notion of time. To that end, TbLink-RPC supports both blocking (time-consuming) and non-blocking functions, and defines a protocol to ensure that time advances at the intended time.

TbLink-RPC is designed to support both single OS process and multi OS-process integration. Single-process integration (where both environments run in the same OS process) provide higher performance.    But, single-process integration isn’t always feasible. For example, one environment might be an instruction-set simulator (eg QEMU) that must run in its own process in order to manage memory in its own highly-specialized way. 

TbLink-RPC emphasizes automation and modularity. Code-generation automation is used to create the boilerplate code, minimizing user effort to integrate new APIs. The entire system is designed such that integrations created independently can be easily combined. Even better, this is done in such a way that SystemVerilog users don’t need to deal with generated DPI integration code.


Example

An example is often the simplest way to get across a concept, and I have a very simple one here. For now, I’ll just show the key elements of a typical use case: connecting a Python reference model to a UVM testbench environment. 


Now, in this case our DUT isn’t that exciting. It’s just an adder, so our reference model is correspondingly trivial. 


Our reference-model class contains a method named ‘add’ that returns the sum of two parameters passed to it. Note that we apply a decorator (tblink_rpc.iftype) to the class. This registers the class with the TbLink-RPC infrastructure. Note, also, that we apply a decorator to the ‘add’ method and specify typing annotations for the parameters and return type. Together, these register the method with TbLink-RPC and specify the parameter types to be used.


In order to call our Python class from SystemVerilog, we will need a SV class to call. That class is shown above, and combines a class from the TbLink-RPC library (TbLinkLaunchUvmObj) with some generated implementation classes created from the API definition that in Python. If our class contained methods implemented in SystemVerilog that could be called from Python, then this class would contain the implementation. Since that’s not the case, there isn’t anything to implement here.

From a UVM testbench perspective, using our Python class involves two steps:

  • Launch an instance of the class in Python connected with the SV class
  • Call the API

The first step, launching, is shown above. The TbLink-RPC library class (TbLinkLaunchUvmObj) that our class inherits from implements the details of starting up and communicating with an embedded or remote environment. We just have to specify the details of how to do this via the configuration object. In this case, and in most cases, we will use a factory method to fill in common details. Because we are starting a Python environment, we must specify the Python module (uvm_python_obj) that contains the Python class we wish to call.


Finally, we need to call our reference model. Our scoreboard contains a method that accepts operand data from the agent driving the ALU, and a result from the agent monitoring the ALU output. We call the reference model by making a SV class-method call to obtain the expected result from the Python reference model.

Next Steps

TbLink-RPC is designed to simplify integrating code into simulation environment. In the context of a SystemVerilog environment, this makes it much easier and simpler to bring in external models written in non-SystemVerilog. Over the course of the next few posts, I’ll go into a bit more detail on TbLink-RPC architecture and the nuts and bolts of the integration process.


References


Copyright 2022 Matthew Ballance

The views and opinions expressed above are solely those of the author and do not represent those of my employer or any other party.


Sunday, January 23, 2022

Documenting SystemVerilog with Sphinx



I've been digging into a project over the last few months whose value proposition is to simplify the process of connecting simulation-like environments and things like reference models, testbench languages, etc. I'll write more (likely much more) about this project in the future. This post, however, is about documentation and, specifically, documentation for SystemVerilog code.

As you can imagine, SystemVerilog support is critical for a project that connects into simulation environments. SystemVerilog and UVM currently are, after all, the most widely-deployed solution for block- and subsystem-level verification. But, support for other languages (C, C++, Python, etc) are important as well. Consequently, when it comes to documenting APIs, I need to cover a fair amount of ground.


Sphinx for Documenting Code

I've been using Sphinx Documentation Generator for the last couple of years. Sphinx accepts input formatted with reStructuredText  mark-up and generates formatted output in HTML, PDF, and several other formats. Sphinx has a wide variety of plug-ins that help to make formatting different types of document content more productive. Sphinx was originally created for documenting Python code, and it shows. That said, its features support documenting far more than just Python code -- as we'll shortly see.


Leveraging Code Comments

Tools that generate documentation from code comments have been around for a long time. My experience has been that they're a good way to quickly create somewhat-generic documentation, provided the doc-generation tool supports the right coding language and the codebase contains enough comments. My experience has also been that documentation created from API code comments is lacking key insights from the author on the code's architecture. 

One thing that I've liked about Sphinx is that it supports referencing code-comment documentation, but relies on the document author to do so. This approach encourages the documentation author to incorporate API documentation alongside explanatory text (that could be awkward to include in a code comment) instead of having large fully auto-generated API-reference sections. This approach also gives the documentation author much more control over the document structure than fully-automated API-documentation tools can typically afford to provide.


What About Non-Python Code?

As mentioned previously, Sphinx was originally a tool for documenting Python packages. As such, it provides built-in features for extracting documentation comments from Python code. The Doxygen tool is one very popular way of documenting languages such as C and C++ from code comments. The Breathe plug-in for Sphinx processes the XML output files from Doxygen, allowing Sphnix documents to bring in documentation code-comments from any language supported by Doxygen.

The good news is that Doxygen supports a wide range of languages. Using Doxygen to pull in doc comments from the C++ codebase of my project will work with relative ease. Unfortunately, though, Doxygen doesn't support SystemVerilog.

 

Doxygen and Filters

Believe it or not, a commonly-recommended path to support a new language with Doxygen is to write a filter script that transforms the source language (that Doxygen doesn't support) into the form of a language that Doxygen does. 

As luck would have it, there is a SystemVerilog filter for Doxygen: DoxygenFilterSystemVerilog. It's written in PERL and uses regular expressions to recognize SystemVerilog constructs and translate them into C++ equivalents that Doxygen can understand. Doxygen primarily cares about constructs like classes, class fields, and function declarations. This makes an approximate-translation strategy workable.

Dealing with Packages and Namespaces

There's just one small issue that we need to solve in order to have a complete flow. SystemVerilog class-based code relies heavily on the pre-processor to assemble a set of classes under the appropriate package. In C++, a namespace is just a 'tag' that gets applied to content. Any number of files may declare content inside the same namespace with no issues. SystemVerilog, conversely, requires that a given package be declared only once and that all content in that package be declared inside that single package scope. The end result is that users sensibly place different classes in different files and 'glue' the whole thing together using include directives and the pre-processor.

If we are only documenting code from a single package, this likely wouldn't cause a problem. We could simply run the Doxygen filter on each class file. When documenting a codebase with multiple packages, we need to ensure we stay consistent with the packages in which classes are declared. How do we do this? By pre-processing the code, of course!


Putting it all Together

There really are two parts to our flow. The first is processing the SystemVerilog code to make it 'look' like C++.


That part of the flow is shown above. In this portion of the flow I'm using Verilator for the pre-processor
because of a very unique feature. Most pre-processors that I'm aware of strip out comments as part of preprocessing. Verilator allows the user to (optionally) retain comments. Since the vast majority of the 'interesting' content is found in documentation comments, this is a critical feature for this flow.


Once we have our "c++-ified" SystemVerilog, we can run that through Doxygen. This will result in some XML files containing the information and relationships Doxygen extracted from the XML code. These XML files are what the Breathe plug-in reads, and exposes to Sphinx.

As I mentioned earlier, Sphinx brings in documentation code-comment content on demand instead of automatically assembling it into a document or document section. 


The snippet above shows bringing in the code-comment documentation for all fields and methods within the specified class (in this case tblink_rpc::IEndpoint). The result in the document looks something like what is shown below:


There are other Sphinx doc tags that allow referencing a single function, etc.


Looking Forward

In my current work, I'm focused on SystemVerilog classes. The approach above works well for documenting class structures, but doesn't work so well for document SystemVerilog modules and interfaces. My current thinking is to use the Sphinx Verilog Domain plug-in to document modules, and continue to use the flow above for classes. Would a single tool be better? Yes. But having two complementary tools is just fine.

Now that I have a way to document SystemVerilog class code, I'm digging into that process. If you're curious about the commands/scripts I'm using, you can have a look at the tblink-rpc-docs project and the Makefile in that project.

Hope you find this helpful, and feel free to comment back on your approach to documenting SystemVerilog classes.


Sunday, April 18, 2021

SoC Integration Testing: Hw/Sw Coordination (Part 2)


Controlling the outside world -- specifically interface BFMs -- from embedded software is critical to SoC integration tests that exercise interface IP. In the last post, we showed how to pass data from embedded software to Python by tracing execution of the processor core and reading the mirrored values of registers and memory to obtain parameter values. While functional, doing things in this way is highly specific to one message-passing approach and is pretty labor intensive. In this post, we'll add some abstraction and automation to improve usability and scalability.


Design Goals



While we're initially focused on providing a nice automated way to communicate between embedded software and the test harness in a simulation environment, the design goals go beyond this. The diagram above shows the basic architecture. Endpoints provide a portal for one environment to call APIs in another environment. Each endpoint supports a known set of APIs, and different endpoints will support different sets of APIs. 

Each environment interacts with APIs on an endpoint without needing to know how communication is implemented. For example, execution trace might be used to implement processor to Python communication in a simulation-based environment. When the design is synthesized to FPGA, communication might be implemented via an external interface. With appropriate abstraction, neither the test software running on the processor nor the Python test code should need to change despite the fact that data is being moved in very different ways. 

In order for this to be feasible, we'll need to collect some meta-data about the APIs.


Example

I always find an example to be helpful, so let's look at the enhancements to the flow in the context of a simple example.
 


The diagram above shows the key elements of a very small SoC called Tiny SoC. We can test many aspects of integration using just software on the processor. For example, we can read registers in the peripheral devices and check that they are correct. We can carry out DMA transfers. But, we need to control the outside world when testing the full path from software through the UART and SPI devices.

Bus functional models (BFMs) or Verification IP (VIP) provide very effective ways to interact with interface protocols from testbench code. What we need in addition is a way to control these BFMs from the software running on the core in the design.


Capturing the API

Let's focus on the UART for now. Our UART BFM provides a detailed API for configuring individual attributes of the UART protocol (eg baud-rate divisor) and for interacting with the UART protocol a byte at a time. That's fine for IP-level testing, but is a bit too low-level for software-driven testing.

For software-driven testing, we want to instruct the BFM to do some reasonable amount of work and let it go. To help with this, the UART BFM defines a higher-level API intended for use by software. 

An example of that higher-level API is shown above. Calling the uart_bfm_tx_bytes_incr API causes the BFM to begin sending a stream of bytes starting with a specific value and incrementing. There is another API that instructs the BFM to expect to receive a stream of bytes sent by the software running on the processor.

To enable automation, we describe the Python API that we will call from embedded software using special annotations. We collect related APIs together in a class, and identify whether these methods are exported by the Python environment and will be called by the embedded software, or are imported by the Python and will be called by Python code. 


Since we want embedded software to call this API, the API is considered to be exported by Python. You can also see the configuration function that updates the UART's configuration (eg baud rate).

Each of the method parameters is given a Python3 type annotation. This enables the Python libraries to know the type of each parameter and collect the right data to pass when the functions are called. 

On the C side, we simply need to have functions with the same signature as what we've captured in the Python API definition.


While the code shown above (link) is hand-coded, we could generate it automatically based on what is specified in the Python API definition. 


Connecting to Implementation: Python

Connecting all of this up on the Python side involves connecting the relevant BFMs and API implementations together. 


The snippet above is from the cocotb test that runs when a baremetal software test is run (link). At the beginning of simulation, the test locates the relevant BFMs. The u_dbg_bfm is the tracer BFM that monitors execution of software on the processor core. This BFM implements an Endpoint, as shown in the diagram at the beginning of the post. The u_uart_bfm is the BFM connected to the UART interface on TinySoC. 

Once we have all the BFMs, we can create an instance of the higher-level UART BFM API (uart_bfm_sw) and tell the debug BFM that it should handle the embedded software calling these APIs.


Example C-Test
With the BFMs connected on the Python side, we can now focus on how to interact with the BFM from the software test.

The software test snippet above transmits some data via the UART to the waiting UART BFM to check (link). Before we can send data, both the UART IP and the external BFM need to be configured in the same way. We program the UART IP via its registers, and call the uart_bfm_config function to cause the corresponding Python method to be invoked. This will cause the UART BFM mode to be configured.

Next, we call the uart_bfm_rx_bytes_incr to tell the UART BFM that it should expect to receive 20 bytes. It should expect the first byte to have a value 10 and subsequent bytes to increment by one. By telling the BFM what to expect, our test is self-checking and the required amount of interaction is small.

Finally, we again interact with the UART IP actually send the data that the BFM is expecting. 

Next Steps
The API definition and Endpoint architecture described in the post above provides a modular way to capture the APIs used to communicate across environments. Because the API signature is captured in machine-readable way, it also enables the use of automation when implementing the APIs for different environments. 

As I mentioned at the beginning of the post, the API and Endpoint architecture is designed so it can be applied in many verification environments -- it's certainly not restricted to just communicating between embedded software test and the test harness. I've been interested for a while in methodology for creating and verifying firmware along with the IP that it controls such that it's ready to go when SoC-integration testing begins. My next post will begin exploring how to create, verify, and deliver firmware along with an IP.

References


Disclaimer
The views and opinions expressed above are solely those of the author and do not represent those of my employer or any other party.

Sunday, March 28, 2021

SoC Integration Testing: Hw/Sw Test Coordination (Part 1)

 



IP- and subsystem-level testbenches are quite monolithic. There is a single entity (the testbench) that applies stimulus to the design, collects metrics, and checks results. In contrast, an SoC-level testbench is composed of at least two islands: the software running on the design’s processor and the external testbench connected to the design interfaces. Efficiently developing SoC tests involving both islands requires the ability to easily and efficiently coordinate their activity.

There are a two times when it’s imperative that the behavior of the test island(s) inside the design and the test island outside the design are coordinated – specifically, the beginning and end of the test when all islands must be in agreement. But, there are many other points in time where it is advantageous to be able communicate between the test islands. 

Especially when running in simulation, the ability to efficiently pass debug information from software out to the test harness dramatically speeds debug. 

It’s often useful to collect metrics on what’s happening in the software environment during test – think of this as functional coverage for software. 

Verifying our design requires applying external stimulus to prove that the design (including firmware) reacts appropriately. This requires the ability to coordinate between initiating traffic on external interfaces and running firmware on the design processors to react – another excellent application of hardware/software coordination. 

Often, checking results consumes a particularly-large portion of the software-test’s time. The ability to offload this to the test harness (which runs on the host server) can shorten our simulation times significantly. 

Key Care-Abouts

When it comes to our key requirements for communication, one of the biggest is efficiency – at least while we’re in simulation. The key metric being how many clock cycles it takes to transfer data from software to testbench. When we look at a simulation log, we want to see most activity (and simulation time) focused on actually testing our SoC, and not on sending debug messages back to the test harness. A mechanism with a low overhead will allow us to collect more debug data, check more results, and generally have more flexibility and freedom in transferring data between the two islands.

Non-Invasive

One approach to efficiency is to use custom hardware for communication. Currently, though this may change, building the communication path into the design seems to be disfavored. So, having the communication path be non-invasive is a big plus.

Portable

Designs, of course, don’t stay in simulation forever. The end goal is to run them in emulation and prototyping for performance validation, then eventually on real silicon where validation continues -- just at much higher execution speed. Ideally, our communication path will be portable across these changes in environment. The low-level transport may change – for example, we may move from a shared-memory mailbox to using an external interface – but we shouldn’t need to fundamentally change our embedded software tests or the test behavior running on the test harness.

Scalable

A key consideration – which really has nothing to do with the communication medium at all – is how scalable the solution is in general. How much work is required to add a piece of data (message, function, etc) that will be communicated? How much specialized expertise is required? The simpler the process is to incrementally enhance the data communicated, the greater the likelihood that it will be used.

Current Approaches

Of the approaches that I’ve seen in use, most involve either software-accessible memory or the use of an existing external interface as the transport mechanism between software and the external test harness. In fact, one of the earliest cases of hardware/software interaction that I used was the Arm Trickbox – a memory-mapped special-purpose hardware device that supported sending messages to the simulation transcript and terminating the test, among other actions.

In both of these cases, some amount of code will run on the processor to format messages and put them in the mailbox or send them via the interface. 

Challenges

Using a memory-based communication is generally possible in a simulation-based environment – provided we can snoop writes to memory, and/or read memory contents directly from the test harness. That doesn’t mean that memory-based communication is efficient, though, and in simulation, we care a lot about efficiency due to the speed of hardware simulators.

Our first challenge comes from the fact that all data coming from the software environment needs to be copied from its original location in memory into the shared-memory mailbox. This is because the test harness only has access to portions of the address space, and generally can’t piece together data stored in caches. The result is that we have to copy all data sent from software to the test harness out to main (non-cached) memory. Accessing main memory is slow, and thus communication between software and the test harness significantly lengthens our simulations.

Our second challenge comes from the fact that the mailbox is likely to be smaller than the largest message we wish to send. This means that our libraries on both sides of the mailbox need to manage synchronizing data transmission with available space in the mailbox. This means that one of the first tasks we need to undertake when bringing up our SoC is to test the communication path between software and test harness.

A final challenge, which really ought not to be a challenge, is that we’ll often end up custom-developing the communication mechanism since there aren’t readily-available reusable libraries that we can easily deploy. More about that later.

Making use of Execution Trace

In a previous post, I wrote about using processor-execution trace for enhanced debug. I've also used processor trace as a simple way to detect test termination. For example, here is the Python test-harness code that terminates the test when one of 'test_pass' or 'test_fail' are invoked:

In order to support test-result checking, the processor-execution trace BFM has the ability to track both the register state and memory state as execution proceeds.


The memory mirror is a sparse memory model that contains only the data that the core is actively using. It's initialized from the software image loaded into simulation memory, and updated when the core performs a write. The memory mirror provides the view of memory from the processor core's perspective -- in other words, pre-cache. 

Our test harness has access to the processor core's view of register values and memory content at the point that a function is called. As it turns out, we can build on this to create a very efficient way to transferring data from software to the test harness.

In order to access the value of function parameters, we need to know the calling convention for our processor core. Here's the table describing register usage in the RISC-V calling convention:

Note that x10-17 are used to pass the first eight function arguments. 

Creating Abstraction
We could, of course, directly access registers and memory from our test-harness code to get the value of function parameters. But, a little abstraction will help us out in the long run.

The architecture-independent core-debug BFM defines a class API for accessing the value of function parameters. This is very similar to the varadic-argument API used in C programming:


Now, we just need to implement a RISC-V specific version of this API in order to simplify accessing function parameter values:

Here's how we use this implementation. Assume we have a embedded-software function like this:
When we detect that this function has been called, we can access the value of the string passed to the function from the test harness like this:


Advantages
There are several advantages to using a trace-driven approach to data communication between processor core and test harness. Because the trace BFM sees the processor's view of memory, there's no need to (slowly) copy data out to main memory in order for the test harness to see it. This allows data to stay in caches and avoids unnecessary copying.

Perhaps more importantly, our trace-based communication mechanism allow us to offload data processing to the test harness. Take, for example, the very-common debug printf:


The user passes a format string and then a variable number of arguments that will all be converted to string representations that can be displayed. If our communication mechanism is an in-memory mailbox or external interface, we need to perform the string formatting on the design's processor core. If, however, we use the trace-based mechanism for communication, the string formatting can all be done by the test harness in zero simulation time. This allows us to keep our simulations shorter and more-focused on the test at hand, while maximizing the debug and metrics data we collect.


Next Steps

SoC integration tests are distributed tests carried out by islands of test behavior running on the processor(s) and on the test harness controlling the external interfaces. Testing more-interesting scenarios requires coordinating these islands of test functionality. 

In this post, we’ve looked at using execution-trace to implement a high-efficiency mechanism for communicating from embedded test software back to the test harness. While this mechanism is mostly-specific to simulation, it has the advantage of simplifying communication, debug, and metrics collection at this early phase of integration testing when, arguably, we most-need a high degree of visibility. 

While we have an efficient mechanism, we don’t yet has a mechanism that makes it easy to add new APIs (scalable) nor a mechanism that is easily portable to environments that need to use a different transport mechanism.

In the next post, we’ll have a look at putting some structure and abstraction around communication that will help with both of these points.

References

Disclaimer
The views and opinions expressed above are solely those of the author and do not represent those of my employer or any other party.


Sunday, February 28, 2021

SoC Integration Testing: IP-Integrated Debug and Analysis

 


One of the things I've always liked about side projects is the freedom to stop and explore a topic of interest as it comes up. One such topic that came up for me recently is IP-integrated debug and analysis instrumentation. I started thinking about this after the last post (link) which focused on exposing a higher-abstraction-level view of processor-core execution. My initial approach to doing this involved a separate bus-functional model (BFM) intended to connect to any RISC-V processor core via an interface. After my initial work on this bus-functional model that could be bolted onto a RISC-V core, two things occurred to me:
  • Wouldn't it be helpful if processor cores came with this type of visibility built-in instead of as a separate bolt-on tool?
  • Wouldn't SoC bring-up be simpler if more of the IPs within the SoC exposed an abstracted view of what they were doing internally instead of forcing us to squint at (nearly) meaningless signals and guess?
And, with that, I decided to take a detour to explore this a bit more. Now, it's not unheard of to create an abstracted view of an IP's operation during block-level verification. Often, external monitors that are used to reconstruct aspects of the design state, and that information is used to guide stimulus generation, or as part of correctness checking. Some amount of probing down into the design may also be done.

While this is great for block-level verification, none of this infrastructure can reasonably move forward to the SoC level. That leaves us with extremely limited visibility when trying to debug a failure at SoC level.

If debug and analysis instrumentation were embedded into the IP during its development, an abstracted view of the IP's operation would consistently be available independent of whether it's being verified at block level or whether it's part of a much larger system.

Approach

After experimenting with this a bit, I've concluded that the process of embedding debug and analysis instrumentation within an IP is actually pretty straightforward. The key goals guiding the approach are:
  • Adding instrumentation must impose no overhead when the design is synthesized. 
  • Exposing debug and analysis information must be optional. We don't want to slow down simulation unnecessarily if we're not even taking advantage of the debug information
When adding embedded debug and analysis instrumentation to an IP, our first step is to create a 'socket' within the IP to which we can route the lower-level signals from which we'll construct the higher-level view of the IP's operation. From a design RTL perspective, this socket is an empty module whose ports are all inputs. We instance this 'debug-socket' module in the design and connect the signals of interest to it.

Because the module contains no implementation and only accepts inputs, synthesis tools very efficiently optimize it out. This means that having the debug socket imposes no overhead on the synthesized result.

Of course, we need to plug something into the debug socket. In the example we're about to see, what we put in the socket is a Python-based bus functional model. The same thing could, of course, be done with a SystemVerilog/UVM agent as well.

Example - DMA Engine

Let's look at a simple example of adding instrumentation to an existing IP. Over the years, I've frequently used the wb_dma core from opencores.org as a learning vehicle, and when creating examples. I created my first OVM testbench around the wb_dma core, learned how to migrate to UVM with it, and have even used it in SoC-level examples. 

DMA Block Diagram

The wb_dma IP supports up to 31 DMA channels internally that all communicate with the outside world via two initiator interfaces and are controlled by a register interface. It isn't overly complex, but determining what the DMA engine is attempting to do by observing traffic on the interfaces is a real challenge!

When debugging a potential issue with the DMA, the key pieces of information to have are:
  • When is a channel active? In other words, when does it have pending transfers to perform?
  • When a channel is active, what is it's configuration? In other words, source/destination address, transfer size, etc.
  • When is a channel actually performing transfers?
While there may be additional things we'd like to know, this is a good start.



The waveform trace above shows the abstracted view of operation produced for the DMA engine. Note the groups of traces that each describe what one channel is doing. The dst, src, and sz traces describe how an active channel is configured. If the channel is inactive, these traces are blanked out. The active signal is high when the channel is actually performing transfers. Looking at the duty cycle of the active signals across simultaneously-active channels gives us a good sense for whether a given channel is being given sufficient access to the initiator interfaces. 

Let's dig into the details a bit more on how this is implemented.

DMA Debug/Analysis Socket
We first need to establish a debug/analysis "socket" -- an empty module -- that has access to all the signals we need. In the fwperiph-dma IP (a derivative of the original wb_dma project), this socket is implemented by the fwperiph_dma_debug module


And, that's all we need. The debug/analysis socket has access to:
  • Register writes (adr, dat_w, we)
  • Information on which channel is active (ch_sel, dma_busy)
  • Information on when a transfer completes (dma_done_all)
Note that, within the module, we have an `ifdef block allowing us to instance a module. This is the mechanism via which we insert the actual debug BFM into the design. Ideally, we would use the SystemVerilog bind construct, but this IP is designed to support a pure-Verilog flow. The `ifdef block accomplishes roughly the same thing as a type bind.

Debug/Analysis BFM

The debug/analysis BFM has two components. One is a Verilog module that translates from the low-level signals up to operations such as "write channel 2 CSR" and "transfer on channel 3 complete". This module is about 250 lines of code, much of it of low complexity. 

The other component of the BFM is the Python class that tracks the higher-level view of what channels are active, how they are configured, and ensures that the debug information exposed in signal traces is updated. The Python BFM can also provide callbacks to enable higher-level analysis in Python. The Python BFM is around 150 lines of code. 

So, in total we have ~400 lines of code dedicated to debug and analysis -- a similar amount and style to what might be present in a block-level verification environment. The difference, here, is that this same code is reusable when we move to SoC level. 

Results

Thus far, I've mostly used the waveform-centric view provided by the DMA-controller integrated debug. Visual inspection isn't the most-efficient way to do analysis, but I've already had a couple of 'ah-ha' moments while developing some cocotb-based tests for the DMA controller. 


I was developing a full-traffic test that was intended to keep all DMA channels busy for most of the time when I saw the pattern in the image above. Notice that a transfer starts on each channel (left-hand side), and no other transfers start until all the previously-started transfers are complete (center-screen). Something similar happens on the right-hand side of the trace. Seeing this pattern graphically alerted me that my test was unintentionally waiting for all transfers to complete before starting the next batch, and thus artificially throttling activity on the DMA engine.


With the test issue corrected, the image above shows expected behavior where new transfers start while other channels are still busy.


Looking Forward

I've found the notion of IP-integrated debug and analysis instrumentation very intriguing, and early experience indicates that it's useful in practice. It's certainly true that not all IPs benefit from exposing this type of information, but my feeling is that many that contain complex, potentially-parallel, operations exposed via simple interfaces will. Examples, such as DMA engines, processor cores, and PCIe/USB/Ethernet controllers come to mind. And, think how nice it would be to have IP with this capability built-in!

In this blog post, we've looked at the information exposed via the waveform trace. This is great to debug the IP's behavior -- while it's being verified on its own or during SoC bring-up. At the SoC level, the higher-level information exposed by at the Python level may be even more important. As we move to SoC level, we become increasingly interested in validation -- specifically, confirming that we have configured the various IPs in the design to support the intended use, but not over-configured them and, thus, incurred excess implementation costs. My feeling is that the information exposed at the Python level can help to derive performance metrics to help answer these questions.

This has been a fun detour, and I plan to continue exploring it in the future -- especially, how it can enable higher-level analysis in Python. But, now it's time to look at how we can bring the embedded-software and hardware (Python)  portions of our SoC testbench closer together. Look for that in the new few weeks.

References

Disclaimer
The views and opinions expressed above are solely those of the author and do not represent those of my employer or any other party.