16 December 2022

Enhancing SONiC - the process of developing custom network functionality

In the previous article, we introduced our reasons for exploring SONiC’s potential in terms of building new network functionalities, presented a use case, and described how we approached the task on the logical level.

In this part, we want to share the experience we have gained through the development process. Note: please do not treat this as a comprehensive developer’s guide. What we explain here is our process and the key takeaways from it, which we believe might come in handy for other SONiC developers.

The developer’s resources

Before we start, a few words about the knowledge sources and repos.

From a developer’s perspective, the SONiC project has several resources used on a daily basis.

Apart from the official sources, we’ve found the Developer’s Overview of Sonic by Praveen Chaudhary (also presented on YouTube) to be a good starting point for an aspiring SONiC developer.

Development plan

The intended solution was to extend access control lists (ACLs) with user-defined fields (UDFs). How did we approach that?

Having to deal with ACLs, the first place we checked was the doc/acl directory, specifically ACL High Level Design. It showcases the flow diagram of the creation of ACL objects in terms of the logical architecture of the control plane (figure modified):

Fig. 1 Flow diagram of the creation of ACL objects, source: SONiC GitHub

The logical architecture does not map 1-to-1 to the physical software architecture. In terms of the processes involved, ACL creation needs to engage the following physical components:

Fig. 2 Physical components involved in ACL creation

Our plan was to update the control flow and verify that the ACL configuration based on user-defined fields is handled correctly throughout the entire stack.

SAI specification

First, based on the SAI "UDF-based ACL" documentation and the actual SAI definitions, we specified the proper SAI API usage for our use case.

For the ACL and UDF mechanisms, SAI defines several object types that must be used. Those objects and their relationships are presented in the picture below:

Fig. 3 SAI objects’ relationships for ACL and UDF mechanisms 

Note: for the ACL RULE and ACL TABLE objects, not all attributes are shown, only those relevant to the use case.

In general, a UDF Group may contain multiple UDF objects (in fact a UDF object references the UDF Group to which it belongs). Multiple UDF objects may point to the same or different UDF Match objects (when a packet goes through a UDF group object, one UDF is selected based on its UDF match object). For the purposes of our use case, single UDF Match and UDF objects are enough. 
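The relationships described above can be illustrated with a toy model. This is a minimal sketch, not the SAI API: the class names, the fields (ethertype, ip_protocol, priority, base, offset, length) and the rule that a higher priority value wins are our simplifying assumptions, and the packet parsing assumes an untagged Ethernet frame carrying IPv4.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class UdfMatch:
    """Toy stand-in for a UDF Match object; None means 'wildcard'."""
    ethertype: Optional[int] = None
    ip_protocol: Optional[int] = None
    priority: int = 0  # assumption: higher value is preferred

    def matches(self, packet: bytes) -> bool:
        if self.ethertype is not None:
            if int.from_bytes(packet[12:14], "big") != self.ethertype:
                return False
        if self.ip_protocol is not None:
            # IPv4 protocol field: 14-byte Ethernet header + offset 9
            if len(packet) < 24 or packet[23] != self.ip_protocol:
                return False
        return True


@dataclass(eq=False)
class UdfGroup:
    """Toy stand-in for a UDF Group: defines how many bytes are extracted."""
    length: int
    udfs: List["Udf"] = field(default_factory=list)


@dataclass(eq=False)
class Udf:
    """Toy stand-in for a UDF: references its group and its match object."""
    group: UdfGroup
    match: UdfMatch
    base: str   # "L2", "L3" or "L4"
    offset: int

    def __post_init__(self) -> None:
        # The UDF points at its group; the group only collects its members.
        self.group.udfs.append(self)


def base_offset(packet: bytes, base: str) -> int:
    if base == "L2":
        return 0
    if base == "L3":
        return 14
    if base == "L4":
        return 14 + (packet[14] & 0x0F) * 4  # IPv4 header length from IHL
    raise ValueError(f"unknown base: {base}")


def extract(group: UdfGroup, packet: bytes) -> Optional[bytes]:
    """Select one UDF of the group via its match object and extract the field."""
    for udf in sorted(group.udfs, key=lambda u: u.match.priority, reverse=True):
        if udf.match.matches(packet):
            start = base_offset(packet, udf.base) + udf.offset
            return packet[start:start + group.length]
    return None


# Ethernet + IPv4 (protocol 17, UDP) + 8-byte UDP header + 4 payload bytes
packet = (bytes(12) + b"\x08\x00" + bytes([0x45]) + bytes(8) + bytes([17])
          + bytes(10) + bytes(8) + b"\xde\xad\xbe\xef")
group = UdfGroup(length=2)
Udf(group=group, match=UdfMatch(ethertype=0x0800, ip_protocol=17), base="L4", offset=8)
print(extract(group, packet).hex())  # prints "dead"
```

As in our use case, the example uses a single UDF Match and a single UDF; adding more UDFs to the same group would only extend the list that extract() walks through.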

The initial research showed that most of the development work would be done in the Orchestration Agent (aka Orchagent). It’s the component responsible for interpreting the user configuration and calling the SAI API (through a database proxy). We needed to implement two kinds of changes in Orchagent:

  1. add new logic for creating UDF objects with SAI,
  2. modify the existing logic to use the UDF in ACL tables and ACL rules with SAI.
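
Both changes have to respect the references between the objects: an object can only be referenced once it exists. A minimal sketch of deriving a valid creation order follows. The edges reflect our reading of the relationships described earlier, the short type names are illustrative, and the snippet is not part of Orchagent.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each object type maps to the object types it references (illustrative names).
REFERENCES = {
    "UDF_MATCH": set(),
    "UDF_GROUP": set(),
    "UDF": {"UDF_GROUP", "UDF_MATCH"},
    "ACL_TABLE": {"UDF_GROUP"},
    "ACL_RULE": {"ACL_TABLE"},
}


def creation_order(references):
    """Return the object types ordered so that dependencies come first."""
    return list(TopologicalSorter(references).static_order())


order = creation_order(REFERENCES)
print(order)
```

Removal would walk the same list in reverse, so that no object is deleted while something still points at it.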

Development execution

The development can be divided into two phases:

  1. Core logic changes in Orchagent - implementing the core logic for configuring UDF with SAI API and connecting the UDFs with ACLs.
  2. Integrating the Orchagent changes through SONiC - fixing any follow-up bugs or problems.

Phase 1 Core logic changes - Orchagent

Knowing what new data is required by SAI from the user, we’ve updated the Configuration Database schema, which serves as an input for Orchestration Agent.

Having specified the required input and expected output for Orchagent, it was enough to implement the test case for configuring UDF-based ACLs using the existing Orchagent unittest infrastructure:

Fig. 4 Orchagent unit test scenario visualization 
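
The idea of such a test can be sketched as follows. The real Orchagent unit tests are written against the C++ code base; the Python below only illustrates the principle of feeding a configuration in and asserting on the SAI calls that come out. FakeSai, apply_udf_config and all configuration field names are hypothetical.

```python
class FakeSai:
    """Hypothetical stand-in that records create calls instead of reaching an ASIC."""

    def __init__(self):
        self.created = []   # list of (object_type, attributes)
        self._next_oid = 1

    def create(self, object_type, **attrs):
        oid = self._next_oid
        self._next_oid += 1
        self.created.append((object_type, attrs))
        return oid


def apply_udf_config(sai, config):
    """Toy translation of one user config entry into SAI-like create calls."""
    match_oid = sai.create("UDF_MATCH", l3_type=config["l3_type"])
    group_oid = sai.create("UDF_GROUP", length=config["length"])
    sai.create("UDF", match=match_oid, group=group_oid,
               base=config["base"], offset=config["offset"])
    return group_oid


def test_udf_config_creates_expected_objects():
    # Written first: it pins down the expected output before the logic exists.
    sai = FakeSai()
    config = {"l3_type": 17, "length": 2, "base": "L4", "offset": 8}

    group_oid = apply_udf_config(sai, config)

    created_types = [object_type for object_type, _ in sai.created]
    assert created_types == ["UDF_MATCH", "UDF_GROUP", "UDF"]
    udf_attrs = sai.created[2][1]
    assert udf_attrs["group"] == group_oid
    assert udf_attrs["offset"] == 8
```

A test runner such as pytest would pick up the test function by its name; it can also be called directly.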

Finally, using the test-first approach, we implemented and verified the required logic in Orchagent. Looking at the ACL flow diagram, the central piece was working as expected:

Fig. 5 ACL configuration phase 1 work-in-progress 

Phase 2 Breaking further through this jungle (watch out for the bugs)

At that point, we were not in possession of a physical switch with SONiC installed and were thus limited by the capabilities of the SONiC virtual (software) switch. We performed multiple trials, but could not get the software switch to respect the ACL configurations. That made it impossible for us to test ACLs on the data plane. 

We could only verify whether the control plane worked as expected. A valid approach to test that in SONiC is to check the ASIC_DB database, which reflects the current state of the underlying SAI-compliant ASIC.

The scenario was simple:

  • pass a simple UDF-based ACL config to swssconfig,
  • verify that the content of ASIC_DB Redis database resembles our input.

Fig. 6 The principle of the end-to-end test scenario
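
The check against ASIC_DB can be sketched as below. We assume that the keys follow the pattern ASIC_STATE:<SAI object type>:<object id>, which is how we understand the database layout. The sample keys and their object ids are made up for illustration, and the list of required object types is our assumption for this use case. To stay self-contained, the sketch works on a plain list of keys rather than a live Redis connection.

```python
from collections import Counter
from typing import Iterable, Set


def object_types(keys: Iterable[str]) -> Counter:
    """Count ASIC_DB entries per SAI object type."""
    counts = Counter()
    for key in keys:
        parts = key.split(":", 2)  # prefix, object type, remainder
        if len(parts) >= 2 and parts[0] == "ASIC_STATE":
            counts[parts[1]] += 1
    return counts


def missing_types(keys: Iterable[str], required: Set[str]) -> Set[str]:
    """Return the required object types that have no entry at all."""
    present = object_types(keys)
    return {object_type for object_type in required if present[object_type] == 0}


# Assumption: the object types a UDF-based ACL should leave behind in ASIC_DB.
REQUIRED = {
    "SAI_OBJECT_TYPE_UDF_MATCH",
    "SAI_OBJECT_TYPE_UDF_GROUP",
    "SAI_OBJECT_TYPE_UDF",
    "SAI_OBJECT_TYPE_ACL_TABLE",
    "SAI_OBJECT_TYPE_ACL_ENTRY",
}

# Made-up sample: ACL objects are present, UDF objects are not.
sample_keys = [
    "ASIC_STATE:SAI_OBJECT_TYPE_ACL_TABLE:oid:0x1",
    "ASIC_STATE:SAI_OBJECT_TYPE_ACL_ENTRY:oid:0x2",
    "ASIC_STATE:SAI_OBJECT_TYPE_PORT:oid:0x3",
]

print(sorted(missing_types(sample_keys, REQUIRED)))
```

An empty result means every expected object type is represented; comparing the attribute values of each entry with the input would be the next, stricter step.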

The first results were negative. The ASIC_DB database showed no valid UDF-based ACLs configured. Orchagent syslog analysis gave us the reason. 

There is a library in SONiC (saimetadata) responsible for enriching SAI API with additional runtime metadata. We configured a SAI object not used by SONiC before and some part of the SAI metadata logic for that object was not implemented yet. 

Moreover, there are two components dependent on saimetadata: Orchagent (via the SAI Redis proxy) and SyncD (responsible for interfacing with the vendor ASIC according to the state of ASIC_DB). So now we knew that both Orchagent and SyncD needed a fix.

Fig. 7 ACL configuration phase 2 work-in-progress

The saimetadata fix required adding missing validation of the data and turned out to be rather simple. We rebuilt and redeployed the swss and syncd containers.

At that point, the control plane test was successful: a UDF-based ACL configuration was correctly digested by the system, which could be verified by the ASIC state represented in ASIC_DB.

Fig. 8 ACL+UDF successful configuration applied

Development environment setup

Now we would like to share some of our experience with development setup. 

Because of the large number and variety of components in the SONiC system, the build system is rather complicated. There is a convenient Makefile available for the user. The actual build of targets, i.e. .deb packages, Docker images and binary images, is dockerized in multiple layers of containers and governed by autotools, aka GNU Build System. That gives a useful separation between the host machine and the build process, resulting in a robust build toolchain working out-of-the-box.

That, however, does not play well with the smart modern IDEs used for development. Accurate index search, test framework integration, an integrated debugger, code completion, and compilation errors shown in the editor before the code is compiled: all of these increase one's productivity, but require the development tools to be fully aligned with the build process.

Remote development, but hosted locally

Our use case required most of the work to be performed within the Orchagent component, written in C++. Practically speaking, the simplest setup for a C++ project would be to have the dependencies and build tools natively on the host system, alongside the IDE (CLion by JetBrains in our case):

Fig. 9 Regular IDE development overview 

Given the system dependencies and package versions required, it is not possible to build Orchagent natively on a regular Ubuntu machine using the preferred IDE in a simple manner.

To work around that problem, we used a tool called JetBrains Gateway. The Orchagent build is configured inside a Debian Docker container hosted locally, with all the required .deb packages installed by hand. The JetBrains client then connects via SSH to the Docker builder instance and runs a CLion backend there. The host system is used only for rendering and interacting with the IDE frontend. The Gateway supports any SSH-enabled system, but hosting a Docker instance locally was the most convenient solution for us:

Fig. 10 Remote IDE development overview 

Judging from the information we got from the SONiC community via the sonicproject Google Group, such an approach does not seem to be widespread among SONiC developers.

There are of course alternatives to the JetBrains Gateway. For example, an analogous approach is available for VS Code in the Visual Studio Code Remote Development extension.

Summary - lessons learned

  • This specific project required flexibility and trial-and-error methods. We never knew if we were chasing a dead end or a silver bullet. One can never be certain about the time estimates for research tasks.

  • The SONiC control plane is divided into functional containers. The data plane is abstracted behind the SAI layer. That results in a flexible and relatively easy-to-learn architecture, open for extension.

  • From the developer's perspective, SONiC has a rather complicated build system with knowledge stored in multiple spots.

  • “Remote” development on a local container can be a lifesaver when working on projects with strict or complicated dependency requirements. 

In the next part, we will take you through the final stage - verification of our proof of concept in a lab setup.

Łukasz Drożdż
Software Engineer

Michał Pawłowski
Senior Network Engineer

Tomasz Sikorski
Engineering Manager