29 September 2020

Low-level programming

P4-programmable smartNIC controlled by ONOS (video)

65-minute read


The traditional NIC (Network Interface Card) is a relatively simple device equipped with Ethernet interface(s) and used to enable connectivity between machines. SmartNICs, by contrast, offer much more sophisticated capabilities and allow you to perform advanced operations on packets. SmartNICs are thus perfectly suited to optimize network performance in a data center, especially those that are programmable and offer the computing resources required.

Consider the example of VNF (Virtual Network Function) offloading. The idea is to execute common network functions—a firewall, a NAT or a load balancer—directly on a smartNIC rather than as a virtual appliance (a virtual machine or a container) deployed on the server.

SmartNICs with P4 support are a particularly compelling solution in this context. They make it possible to express how the network dataplane has to process packets using P4 language, which is gaining in popularity in the networking industry and is considered the next step in the evolution of SDN.

An important aspect is how to effectively control the P4 smartNIC at runtime. What you need is control plane software, which can be based on an SDN controller, for example. The video below walks you through how we integrated the smartNIC with the open-source ONOS controller using a dedicated smartNIC proxy developed expressly for the purpose by CodiLime’s R&D. We present a VNF offloading use case in practice by providing a custom P4 implementation of an example network function (a firewall) and loading it onto the smartNIC.

In fact, the scope of our PoC is wider than only P4-related aspects. We show how a heterogeneous dataplane—a P4 smartNIC plus an emulated OVS-based leaf-spine fabric—can be controlled by a single SDN solution (ONOS) using different protocols. We also enhanced the ONOS Web UI and CLI with additional features related to our firewall use case. To the best of our knowledge, this is among the first such PoCs, if not the first.

Hi, my name is Paweł. And my name is Artur. Welcome to our webinar. Today we would like to present the solution we have developed in CodiLime's R&D department: a P4-programmable smartNIC controlled by an ONOS controller.

So before we start, let me say a few words about our company. CodiLime is a networking industry expert. We've been providing cloud and network expertise to top global networking hardware and software providers and telecoms since 2011. Currently, we have more than 200 network, software and DevOps engineers on board. And what is important regarding the topic of this webinar, we do have experience in integrating smartNIC devices with third-party management or control plane software, including SDN controllers. And we are also experts in the DPDK and P4 development for those areas. We have also successfully delivered a number of such projects already for top tech companies in Silicon Valley.


OK, so I think it's high time to get started with the content we have prepared for you today. So we'll start with just a few words on technologies or components that are important to our solution. Next, we'll give you a brief overview of our PoC architecture and then we'll discuss its software components one by one. We'll also present how the entire environment is configured and how it can be set up in an optimized way. And finally, we'll show you a demo and then there will be time for a Q&A session, as we usually do during our webinars. So let's start.

As I mentioned, the essence of our solution is controlling a P4-programmable smartNIC with the ONOS SDN controller. To the best of our knowledge, this is among the first such PoCs, if not the first. So before we start discussing the detailed architecture of our proof of concept and its internal components, we'd like to give you a very brief introduction to what P4 is all about, what smartNIC we used and how P4 is in general handled in ONOS.

OK, so what is P4? If I had to answer this question with just one statement, I would say that the P4 approach is to define how the data plane of a given network device is to process packets. This concept is based on generally the same principle that is used in the areas of CPU, GPU or DSP programming, where we write high-level programs using a domain-specific language. Next, we compile the code and it is executed by the given domain-specific processor. And now, how to build a network system according to the same paradigm?

So basically by doing the same thing. But it is absolutely crucial that we start using programmable chips. In addition, we need a framework to describe what network functions we want to perform and how the overall data plane processing pipeline should be organized. And this is where the P4 language helps us to express that in a standardized way. So when your P4 programs are ready, you just compile them and load the executables into the chip, and that's all.

P4 stands for Programming Protocol-independent Packet Processors. The name comes from the paper published in 2014 where the idea of the P4 language was originally described. The syntax of the language is similar to C, I would say, but P4 as such is less flexible than C. For example, some constructs that are present in C, like loops, are not available in P4. So far two versions of the language have been released. The first one was called P4_14 and the second one is known as P4_16.

What is very important is that the P4_16 specification introduces the so-called architecture model, which defines all the functional blocks available for a given data plane. Here on the slide, you can find different examples of architecture models. In general, all those models are defined as a pipeline that incoming packets are processed within, and such a pipeline contains different blocks. Some of them are fully programmable, and these are called programmable blocks, and some are fixed-function, meaning that P4 programmers do not have any control over them. So now let's have a look at how to create a P4 program and apply it to the target device.

So, the first step would be to check which architecture model our device complies with, because we need to know which blocks are in fact programmable. Then we write the P4 code, and after it is ready, we compile it using a compiler usually provided by the device manufacturer. The compiler generates binaries which are loaded onto the device, and from then on all the objects defined in the P4 code are present in the data plane.

Now, to effectively talk to this data plane, we need control plane software. For this purpose, we can use, for example, an open-source SDN controller. In our case, the data plane is a smartNIC and the control plane is based on ONOS. A couple of words on ONOS. It's a well-known open-source controller developed by the Open Networking Foundation. It has a typical architecture, with a core part where the ONOS services are implemented. Those services provide a kind of inventory of currently connected devices, hosts, links, etc. They also provide an overview of the current network topology and store all the match-action rules installed in the devices. Those subsystems expose APIs, which can in turn be consumed by ONOS applications. Some of them are offered just out of the box within a given ONOS release.

But since ONOS is a modular platform, you can write a totally new app on your own, extending the ONOS capabilities with new functions. In the southbound part, you've got a collection of drivers and protocol libraries you can use to communicate with particular devices. One of them is P4Runtime, which is a protocol used to control a P4-defined data plane. In fact, P4Runtime is an API specification based on so-called protocol buffers, which provide a method for serializing structured data in an efficient way. It was designed around simplicity and performance. Data structures are called messages and they are defined in so-called .proto files. P4Runtime uses gRPC as a transport, which is an open-source RPC system, and its main role is to manage the way a client and a server can interact.

And what's important, it has a lot of nice features supported natively, like security and authentication mechanisms or bidirectional streaming, etc. What is more, using protobufs and gRPC allows you to automatically generate the required code for both the client and the server in many different languages, like Java, C++, Python, Go, etc. And just to give you an example, maybe without any deep dive since we don't have time for this today, this is how a P4Runtime WriteRequest message may look. Here it is presented in the so-called protobuf text format. We can see how an example table defined in some P4 program can be populated with sample data using this message. ONF has been developing a number of projects where P4 is used.

However, those implementations are dedicated to multi-port physical switches, like a DC fabric switch, for example. What we would like to focus on today is the smartNIC as a device onto which P4 programs are loaded. Unlike traditional network interfaces, smartNICs usually offer some level of programmability, meaning that you can define how they will perform operations on packets. Especially those devices that support P4 can be an interesting option. You might want to read our blog posts where we analyze such solutions.

All right. One of the possible scenarios where we can employ a smartNIC is so-called VNF offloading. This can be an interesting use case when we think about deployments in the data center where you would like to optimize network performance in general, and this is what we want to show you today. Which smartNIC did we use for that purpose? We took a solution offering native support for P4: a Netronome Agilio card equipped with two 10GbE interfaces and the NFP chip on board.

The latest version of the SDK declares support for the P4Runtime protocol, but it is not actually fully supported. Therefore another protocol, called Thrift, is the preferred way to control this card at runtime. We'll talk about that later on.

Now, I would like to briefly discuss the high-level architecture of our PoC that we have developed in CodiLime's R&D department. So, the main data plane component here is the smartNIC, of course, that is installed in one of our bare metal servers. What we want to achieve is to offload some network functions to be executed directly on the smartNIC. It can be a firewall, it can be a load balancer, it can be NAT, et cetera.

For the first round, we have prepared an example P4 implementation of a firewall. But as I said, this is only one option, in fact, because other functions can be executed as well after preparing the required P4 implementations in the future. And as you can see, the scope of our PoC is wider than P4 itself, because apart from the smartNIC we have set up, on the other bare metal server, a kind of emulated leaf-spine fabric based on OVS software switches. Both the smartNIC and the emulated leaf-spine fabric are controlled by the same SDN controller, which is ONOS in our case. What we want to show is that a single SDN controller can control a heterogeneous data plane environment using multiple protocols.

So we are using OpenFlow for the OVS switches, and to control the smartNIC data plane we have developed a dedicated proxy that converts the P4Runtime protocol to the Thrift protocol, which is understood by the smartNIC, as mentioned earlier. Now let's discuss the different elements one by one, starting with the P4 program. OK. Now I would like to present our P4 code to you, to make you familiar with the details of our implementation and with the P4 language itself. As was said before, P4_16 is a universal language which can be executed on different types of devices. The capabilities of those devices are described by a P4 architecture shipped by the manufacturer. Here we see the V1Model, which is one of the reference architectures provided by p4.org, and it can also be used with our Agilio smartNIC.

This model is composed of a parser, five control blocks and one fixed-function block. The parser and deparser are responsible for unpacking and packing packets in the appropriate way. The checksum-related blocks can be used to verify and update checksums in case of packet header manipulation. The ingress and egress blocks are used to execute the processing logic. The parser is responsible for extracting previously defined packet headers. Header extraction is necessary to allow the P4 program to operate on that data. The parser block is quite limited and mainly allows us to make use of if statements.

In fact, the entire logic of our firewall sits in the ingress block. The heart of it is two tables. The first one is the forwarding table; the second one is the actual firewall table. Our forwarding mechanism is quite simple: it differentiates packets only by input port value, and then forwards the packet to the given output port or drops it. We had to do this because in P4 there is no default behavior and everything needs to be defined by ourselves. The firewall table contains all the fields which we use for matching, of course, like the protocol from the IPv4 header or the source address from the Ethernet header. For a given packet, our table can perform one of two defined actions: allow or drop. Actually, there is a third action here, NoAction, but it is only there to serve as the default.
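The two-table logic just described can be sketched in ordinary Python. This is only an illustrative model, not our actual P4 code: the table names and the allow/drop actions come from the description above, while the port numbers, protocol values and MAC addresses are invented for the example, and the firewall lookup is shown as an exact match for simplicity (the real table uses ternary matching):

```python
# Minimal Python model of the two match-action tables from the ingress block.
# The real implementation is P4 code running on the smartNIC.

# Forwarding table: keyed on ingress port; action forwards to an output port.
forwarding_table = {
    0: ("forward", 1),  # packets arriving on port 0 go out on port 1
    1: ("forward", 0),  # packets arriving on port 1 go out on port 0
}

# Firewall table: sketched as an exact match on (IP protocol, source MAC).
firewall_table = {
    (6, "aa:bb:cc:dd:ee:01"): "drop",  # e.g. block TCP from this source MAC
}

def process(ingress_port, protocol, eth_src):
    """Apply the firewall table first, then the forwarding table."""
    action = firewall_table.get((protocol, eth_src), "allow")  # NoAction ~ allow
    if action == "drop":
        return None
    entry = forwarding_table.get(ingress_port)
    if entry is None:
        return None  # no default behavior in P4: unmatched packets are dropped
    verb, out_port = entry
    return out_port if verb == "forward" else None
```

For instance, a UDP packet from an unknown source entering port 0 is forwarded to port 1, while a TCP packet from the blocked MAC is dropped.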

The last part is the deparser block, which is responsible for reassembling packets. During the build phase, the P4 code is compiled into three artifacts relevant to the Agilio smartNIC: a Netronome binary file, a design file and a P4 info file. Aside from the binary and the design file, we have the P4 info file, which is a definition of our data plane implementation. This file is then utilized by the P4Runtime client. The next element of our PoC architecture I would like to discuss is our custom firewall control application. As for the details, our application is composed of many different parts, which are presented in the tree diagram. Starting from the top, we can specify two basic elements of our firewall application: the pipeconf and the control application.

First, let's talk about the pipeconf. In ONOS, a pipeconf is a service to manage the configuration of protocol-independent pipelines; that's actually straight from the ONOS documentation. Going down, we divide the pipeconf package into three components: behaviors, models and extensions. This division is imposed by the way a pipeconf object is created in the code. Here we have an example from our codebase of how we create the pipeconf object with the previously mentioned elements. Our implementation of the pipeline interpreter acts as the behavior component.

The task of the behavior part is to map ONOS internal data structures to P4-defined data fields. For example, we need to tell ONOS that a field named protocol from the IPv4 structure, defined in our P4 program of course, corresponds to the IPv4 proto field in ONOS internal structures. But passing only the field name is not enough. ONOS actually talks to the smartNIC via the P4Runtime protocol, which uses identifiers instead of names. Here comes the model part of the pipeconf object. It loads the P4 info file and automatically allows us to use P4 names in the ONOS code. Those names are mapped to the identifiers during communication with the P4-defined device. Here is an example of this. We can see that the table t_firewall is translated to this ID, and the same goes for one of the fields, which is translated to ID 1. This definition also contains the name of the field, its size and the match type.
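The name-to-ID mapping performed by the model part can be sketched as follows. Note that a real P4 info file is a protobuf message, shown here as equivalent JSON for brevity, and the numeric IDs and the `ingress::` name prefix are invented for the example:

```python
import json

# Hypothetical excerpt of a P4 info file: it maps human-readable P4 names
# to the numeric IDs that P4Runtime messages actually carry on the wire.
p4info_json = """
{
  "tables": [
    {"preamble": {"id": 33572104, "name": "ingress::t_firewall"},
     "matchFields": [
        {"id": 1, "name": "ipv4.protocol", "bitwidth": 8, "matchType": "TERNARY"}
     ]}
  ],
  "actions": [
    {"preamble": {"id": 16800567, "name": "ingress::drop"}}
  ]
}
"""

def build_name_to_id(p4info):
    """Flatten the table and action preambles into one name -> id lookup."""
    mapping = {}
    for table in p4info.get("tables", []):
        mapping[table["preamble"]["name"]] = table["preamble"]["id"]
    for action in p4info.get("actions", []):
        mapping[action["preamble"]["name"]] = action["preamble"]["id"]
    return mapping

ids = build_name_to_id(json.loads(p4info_json))
```

With such a lookup in place, application code can keep using the P4 names while the wire protocol sees only the IDs.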

Actions are mapped to IDs as well. The last part is the extensions of the pipeconf object. Here we determine which files specific to a given data plane have to be used in our pipeconf object. We have discussed the structure of the pipeconf package; now let's take a look at the control application, which is the control plane part of the firewall application. In ONOS, you can write an application which is pipeline-aware, meaning that it is dedicated to a given data plane. Our control application contains a firewall service to manage firewall devices and rules, as well as a fully featured user interface. Our graphical user interface components are based on Angular, integrated with the native ONOS interface. Here is an example of our firewall Web UI.

Also, a firewall-related command line interface is integrated with the ONOS environment. It extends the native ONOS CLI with some firewall-specific commands. Here are examples of this CLI and its usage. The firewall service is the actual logic executor for our application; its main task is to manage and apply rules for P4-defined devices. The last element of our PoC software stack I would like to discuss is the smartNIC proxy. Our proxy acts as an adaptation layer for the Agilio smartNIC runtime interface. To better understand why we decided to build this adapter, let's start from the beginning. Normally, ONOS communicates with P4-capable devices via the P4Runtime protocol.

As we mentioned earlier, the Agilio smartNIC manufacturer declares support for P4Runtime as well, in addition to the legacy Thrift-based protocol. These are the two ways to control the smartNIC at runtime. Since both ends support P4Runtime, our natural choice was to use it to realize communication between the data plane and the control plane. But after a first try, it turned out we were not able to do this. We realized that ONOS supports the newest version of the P4Runtime protocol, while the RTE for our smartNIC was using a very early P4Runtime version.

And those two versions were not compatible with each other. Our next step was to downgrade the version of P4Runtime in ONOS, but it didn't work either. The P4Runtime implementation we got from Netronome seems to have some limitations: for example, it was not possible to use ternary matching, and the packet-out mechanism was not implemented. As you can see, it was not possible to make it work according to our expectations. Then we started to think about how to deal with it. There is a Thrift-based protocol supported by the smartNIC, but ONOS doesn't support it. So we made a decision to implement a smartNIC proxy. ONOS populates the firewall tables using the P4Runtime protocol, and those requests are accepted by the proxy's gRPC server. After that, the data is repacked into the Thrift-based protocol and sent to the smartNIC. OK, so maybe a couple of words about the architecture of our PoC.
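The core job of the proxy, translating a P4Runtime-style table entry into a flat rule for a Thrift-style RTE API, can be sketched as below. All field names, IDs and dictionary shapes here are illustrative assumptions, not the actual Netronome or proxy API:

```python
# Hypothetical sketch of the proxy's translation step: a P4Runtime-style
# table entry (numeric IDs on the wire) is repacked into a name-based rule
# that a Thrift-style RTE interface could accept.

def p4runtime_to_rte(table_entry, p4info_names):
    """Translate numeric IDs back to names and flatten the match/action."""
    rule = {"table": p4info_names[table_entry["table_id"]], "match": {}}
    for m in table_entry["match"]:
        field = p4info_names[(table_entry["table_id"], m["field_id"])]
        # a ternary match carries an explicit value/mask pair
        rule["match"][field] = {"value": m["value"], "mask": m["mask"]}
    rule["action"] = p4info_names[table_entry["action_id"]]
    return rule

# Illustrative ID-to-name mapping, as would be derived from the P4 info file.
names = {
    33572104: "t_firewall",
    (33572104, 1): "standard_metadata.ingress_port",
    16800567: "drop",
}

# A drop-everything-from-port-0 entry, as in the demo later on.
entry = {
    "table_id": 33572104,
    "match": [{"field_id": 1, "value": 0, "mask": 0xFFFF}],
    "action_id": 16800567,
}

rule = p4runtime_to_rte(entry, names)
```

The real proxy additionally has to handle gRPC and Thrift transport details; this sketch shows only the repacking idea.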

OK, so as I mentioned earlier, the main data plane component is a P4-programmable smartNIC, as you know. But we also have a kind of leaf-spine fabric which is based on Open vSwitch instances, created using the mininet utility. Both the smartNIC and the leaf-spine fabric are controlled by ONOS, but using different protocols. Additionally, there are emulated hosts connected to the data plane network just to generate and receive real traffic within the data plane. So we've got three end-user instances and two hosts acting as application servers.

All the emulated hosts and the mininet with the leaf-spine topology are running in Docker containers. To enable interconnection between all these components, common Linux network techniques and objects are used, such as linux-bridge instances or veth links, etc. And this is how it looks on a physical level. The demo is deployed on two bare metal servers. The first hosts the smartNIC equipped with 10G Ethernet interfaces. On the second one, all other data plane and control plane components are set up. The general idea of the demo is that end users, depicted here as Host A, Host B and Host C, will try to access services available on the application servers that you can see here. In order to do that, the traffic first needs to go through the OVS-based leaf-spine fabric that the end users' containers are attached to.

Here the default gateways for them are configured. Next, the traffic leaves the physical interface attached to the linux-bridge instance, which interconnects that interface with the leaf-spine fabric. Then the traffic is transmitted over the wire to smartNIC physical interface 0, where it is supposed to be forwarded to interface 1, of course if the firewall configuration in the smartNIC allows for that. And finally, the traffic returns to the first bare metal server in order to reach Server 1 and Server 2, connected to another linux-bridge. So this is how the entire traffic forwarding path looks. We will analyze different scenarios in a while, when we show you how our setup works in practice.

But now let's take a look at how the leaf-spine fabric is configured. It contains two spines and four leaves, and it is defined in a corresponding mininet Python script you can see here. It is quite simple, I would say, and this is how you usually configure a custom topology in mininet. So you define the objects, two spines and four leaves, you define links between them and you define the name of the topology to be used when launching mininet. What is important is that we want to have more than just one IP subnet configured on the leaf-spine fabric in our setup, so we need to support IP routing there somehow. And how to accomplish that? Well, ONOS provides at least three applications out of the box that are capable of doing so. The first one is the segment routing app. It is designed for complex use cases, I would say, since it is a part of the so-called Trellis suite.
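The topology itself boils down to a small amount of data: every leaf links to every spine. The sketch below shows just that topology data in plain Python, with the mininet-specific `Topo` subclassing and `addSwitch`/`addLink` calls omitted; the switch names are our own placeholders:

```python
# Leaf-spine fabric from the mininet script, reduced to pure topology data:
# every leaf connects to every spine (a full bipartite mesh).

spines = ["spine1", "spine2"]
leaves = ["leaf1", "leaf2", "leaf3", "leaf4"]

# With 4 leaves and 2 spines this yields 8 leaf-to-spine links.
links = [(leaf, spine) for leaf in leaves for spine in spines]
```

In the actual mininet script, each of these pairs becomes an `addLink` call inside a custom topology class registered under the topology name passed to mininet at launch.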

If you have heard about the Open CORD project, you may know it is being used there. Another option is the SDN-IP application; this one is mostly oriented towards BGP peering. And there's also an application called Simple Fabric, which just provides simple routing between subnets connected to different leaves. This is completely enough for our needs, so we decided to use it. So this is how the fabric is going to be configured. We've got three subnets here.

The first one, 10.0.1.0, is for our emulated application servers. They are connected to the fabric through the smartNIC, as you can see. The two other subnets, 10.0.10.0 and 10.0.20.0, are for our emulated end users. Here you can see how the fabric has to be configured in ONOS to work according to our expectations. Basically, in ONOS the standard method to inject configs into the system is to use JSON files, and we will discuss the most important parts of the file used to configure our leaf-spine fabric. First, we define the proper interfaces, which correspond to the appropriate ports on the OVS instances where both the application servers and the end users are connected. Then, in the sections related to the Simple Fabric application, we define the subnets and assign interfaces to them. And lastly, we provide an IP configuration for each subnet.
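To give a feel for the shape of such a file, here is a hedged sketch of a network-config JSON built in Python. The three subnets come from the description above, but the device IDs, port numbers, interface names and the exact Simple Fabric schema keys are placeholders; the real ONOS config file may differ in structure:

```python
import json

# Illustrative sketch of an ONOS network-config JSON for this setup.
# Device IDs, ports and schema details are assumptions, not the real file.
config = {
    "ports": {
        "of:0000000000000001/3": {
            "interfaces": [{"name": "servers", "ips": ["10.0.1.1/24"]}]
        },
        "of:0000000000000002/3": {
            "interfaces": [{"name": "users-ab", "ips": ["10.0.10.1/24"]}]
        },
        "of:0000000000000003/3": {
            "interfaces": [{"name": "users-c", "ips": ["10.0.20.1/24"]}]
        },
    },
    "apps": {
        "org.onosproject.simplefabric": {
            "simpleFabric": {
                "ipSubnets": [
                    {"ipPrefix": "10.0.1.0/24", "gatewayIp": "10.0.1.1"},
                    {"ipPrefix": "10.0.10.0/24", "gatewayIp": "10.0.10.1"},
                    {"ipPrefix": "10.0.20.0/24", "gatewayIp": "10.0.20.1"},
                ]
            }
        }
    },
}

doc = json.dumps(config, indent=2)
```

The resulting JSON would then be pushed to ONOS through its network configuration REST endpoint.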

This is completely enough for our leaf-spine fabric to be under the control of ONOS. Now, how to configure the smartNIC in ONOS? First, we need to remember that our smartNIC will not be seen directly by ONOS; ONOS will talk to the smartNIC proxy instead. To enable that, we need to configure the IP address of the proxy, and this address will be used for gRPC communication between ONOS and the proxy. Then, we need to indicate a driver to use, bmv2 here, which is the driver for the well-known BMv2 software switch usually used for testing P4 programs. You might have heard about it. And finally, the pipeconf ID has to be provided, which corresponds to the P4 program on the smartNIC.

After doing this and activating all the required apps in ONOS, we can list all the devices controlled by ONOS. In our case these are six OVS instances forming the leaf-spine fabric, plus the smartNIC proxy, which is not a real device but more something like an agent of a real device. And this is how the complete network topology is presented in the ONOS Web UI. As you can see, we've got the leaf-spine fabric here, to which the emulated hosts are connected. There is also the smartNIC proxy shown in the topology scheme. You might notice that it is not linked to any other object here; this is because it is not a real data plane element, as we know. On the other hand, there is no smartNIC depicted here. Again, this is because ONOS does not talk to it directly.

And the tricky thing here is that we can see the two application server instances connected directly to the leaf-spine fabric. As you remember, there is also the smartNIC in between in the physical setup. However, since ONOS does not include the smartNIC in the topology view, it just detects the traffic coming from the application servers on the leaf switch the smartNIC is connected to, and that is why ONOS draws direct links between that switch and the servers. Finally, let's take a look at how we built the entire setup. Two bare metal servers are provisioned and configured remotely by a single builder machine, mostly using Ansible playbooks, which are downloaded from our local GitLab repository.

The source code of the smartNIC proxy software and the custom ONOS apps we developed are also stored there. There's also a CI/CD pipeline, which builds some Docker images that we use. The setup is built in five main phases. First, we install the required software and dependencies on the bare metals. In the second step, we download configuration files and Docker images, and we build them. Then, we configure the Docker network and something which we call an internal management network, just for the purpose of our demo. In the next step, control plane components are set up, like ONOS and the smartNIC proxy, and they are configured accordingly. And finally, we launch the leaf-spine fabric and the emulated hosts, so the end-user and application server containers. The entire setup contains 8 Docker containers.

These are: the custom ONOS instance, the smartNIC proxy container, the mininet container with the leaf-spine fabric and five containers representing the end users and application servers. To deploy this, we use something like 40 Ansible playbooks, including ansible-roles definitions, etc. OK, now I think it's high time to present how all this works. The demo is organized as follows. We'll be adding new rules to the firewall, one by one, to show how it handles different traffic patterns. For each use case we will start with a brief explanation of what we want to configure, and then we will show you a short video, recorded earlier, that shows how to do it and how it impacts the communication between the emulated end users and application servers in our setup.

OK, so first let's take a look at how the entire network configuration looks. There are three IP subnets configured on the leaf-spine fabric, as you may remember. Server 1 has the IP address 10.0.1.11, Server 2 has the IP address 10.0.1.22, and they are in the same subnet. The IP address for Host A is 10.0.10.101 and 10.0.10.102 is for Host B; they are also in the same subnet. And we've also got Host C with IP address 10.0.20.103, which is in another subnet. OK, having said that, I think we are ready to watch the first video. First, I would like to show you some Ansible playbooks. Here's an example file.
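The addressing plan above can be sanity-checked with Python's standard `ipaddress` module. The host addresses are taken from the description; treating all three subnets as /24 networks is an assumption here:

```python
import ipaddress

# The three subnets of the demo fabric (assumed /24).
subnets = {
    "servers": ipaddress.ip_network("10.0.1.0/24"),
    "users_ab": ipaddress.ip_network("10.0.10.0/24"),
    "users_c": ipaddress.ip_network("10.0.20.0/24"),
}

# Host addresses as given in the demo walkthrough.
hosts = {
    "server1": ipaddress.ip_address("10.0.1.11"),
    "server2": ipaddress.ip_address("10.0.1.22"),
    "host_a": ipaddress.ip_address("10.0.10.101"),
    "host_b": ipaddress.ip_address("10.0.10.102"),
    "host_c": ipaddress.ip_address("10.0.20.103"),
}

def subnet_of(host):
    """Return the name of the subnet a host address belongs to, if any."""
    for name, net in subnets.items():
        if hosts[host] in net:
            return name
    return None
```

This confirms the layout from the transcript: both servers share one subnet, Host A and Host B share another, and Host C sits alone in the third, so reaching the servers from Host C requires routing across subnets.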

Actually, it is not an Ansible playbook, because it's just a YAML file with configuration. As you can see, you've got some parts related to the configuration here: some ONOS-related configs with definitions of configuration files, and also some applications that we need to activate in ONOS. Then we've got some parts related to the fabric config. So we tell Ansible which file with the topology is to be used, and we specify the name for the topology, as you can see here. There's also a section regarding the end-user container configuration.

Here is a list, in fact, and this is an example element of the configuration for Host B. You can see different parameters here, like the IP address, the MAC address for this container, the default gateway and also some information on how to connect this container to the fabric. So we specify the name of the OVS switch, which is leaf 2 in this case, and the number of the OpenFlow port. Something similar for the application servers is also on the list, as you can see. What is especially important here is that we specify the name of the linux-bridge instance that those containers are attached to. OK, I think we encountered some technical issues, so maybe I will try once again.

OK, here we can see some devices in ONOS, and hosts as well. And there is also the firewall CLI application, as was mentioned before. Here we can see that there are no rules in the firewall database. We can also see the flows which are globally configured by ONOS on the devices, mostly on the OVS switch instances; these are mainly for the IP routing, which is realized using those flows. OK, so now let me show the IP configuration for the end users, first for Host A, then for Host B, and for Host C as well. And the same information for Server 1 and Server 2.

As you can see, all this information is exactly the same as we showed you in the configuration YAML file. Now we'll start to check the IP connectivity between the end users and the application servers. But first, we will launch some traffic capturing, and this is going to be executed on the interfaces on the linux-bridge instances which are physically attached to the smartNIC physical interfaces. So let's identify those interfaces first, and then we'll start capturing traffic on them using the well-known tcpdump tool. Now we'll try to ping the application servers from the end users. We can see that it is possible to ping application server 1 from Host A, and the same for Server 2. Also, Host B can ping the application servers without any issues, and it is the same for Host C. This was possible because we don't have any firewall rules in our firewall database, as we could see in the firewall CLI listing. So now we start adding firewall rules to our smartNIC. To warm up, let's start with something very simple.

We'll block the traffic based on the physical interface on the smartNIC; this way, as you can easily see, we'll block all the traffic coming from the end users to the application servers. OK, so here's the firewall Web UI component. This is something that you don't normally have after you download and install the ONOS software, because this is a custom UI component that has been developed in the CodiLime R&D department just for the purpose of this demo. This is a table, basically, with some parameters corresponding to the fields that are implemented in our P4 firewall implementation; these are the keys for the matching. So, if we want to add a rule, we need to click plus and then select the action. There are two options you can select: "allow" or "deny". Since we want to deny some traffic, we select "deny". Then we select the number of the ingress interface and just click OK.

And as you can see, we have successfully added a rule to our firewall database. What we can see now in the firewall CLI application in ONOS is a new element in the firewall database, which is "deny" for ingress port 0. So we've got the first rule. Now, let's take a look at the flows, but for the smartNIC proxy only. Here we can see we have successfully added a new flow, which is going to drop the traffic. It is in the table called t_firewall, as you could see earlier, and we match on the field called standard metadata ingress port. The value is 0, we're using the ternary match, and the action to be applied is set to "drop".
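The ternary match mentioned here has simple semantics: a packet field matches an entry when the bits selected by the mask agree. A one-line Python sketch (with illustrative 16-bit values) makes this concrete:

```python
# Ternary matching as used by table entries like the one above: only the
# bits set in the mask participate in the comparison. A mask of all ones
# is an exact match; a mask of zero matches anything (a wildcard).

def ternary_match(key, value, mask):
    """True if `key` matches the ternary (value, mask) entry."""
    return (key & mask) == (value & mask)
```

So the rule with value 0 and a full mask on the ingress-port field matches only packets arriving on port 0, which is exactly the drop rule we just installed.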

We can see more or less the same in a lower-level tool, available directly on the smartNIC RTE, where we can list the rules of the t_firewall table as well. We can see a new rule that reflects exactly what we configured in ONOS: there is a reference to port P0, reflecting the value zero, and the "drop" action from the ingress control block. Now let's see what happens. We cannot access any service on Server 1 and we cannot ping Server 2 from Host B or from Host A. In general, we can't do anything, because all the traffic coming from the end users is being blocked by the firewall. Now we'll show you how to remove that rule.

Instead of the plus button, you need to click the minus one. After selecting the rule, just click OK and wait for synchronization between the front end and the back end of this ONOS component. Now let's take a look at the firewall database: there are no rules, as we have removed the previous one. If you check the flows on the smartNIC proxy, you can see that the flow that was there before is gone. You can see exactly the same when you type "firewall list" in the ONOS CLI, and the same effect in the RTE CLI. OK, so now let's check the traffic. It is working once again because there is no blocking rule left in the firewall table. OK, so now let's do something more ambitious.

Now we'll experiment with TCP. We've got an NGINX server launched on Server 1, listening on TCP ports 9051 to 9060. First, we will try to reach the service from end users Host A and Host B. Then we will block TCP connections from the subnet that Host A and Host B belong to towards the application server, but only for a single destination TCP port: 9051. In the second step, we will block that TCP traffic for destination ports 9056 to 9060, but only for Host B, so we shouldn't expect any impact on Host A. OK, let's see what happens. We'll add a rule, specify the action "deny", select the IP proto field value, which is 0x06 for TCP, then specify the subnet in the IPv4 source address field and the particular IP address of Server 1.

And we need to select the destination port, which is 9051 in our case. OK, the rule has been applied, as we can see. Let's try to access the service from Host B: we cannot, and the same is true for Host A. Now we are trying from Host C and, as we can see, it is possible, because Host C is not affected by this rule. OK, now we will add another rule, as I said, this time only for Host B, so we specify the right IP address and then the range of destination TCP ports for that address. We need to wait a couple of seconds; sometimes it takes a while because the front end of this UI component needs to be synchronized with the back end, as I said.
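The two TCP rules above can be modelled as simple match predicates. The addresses below are hypothetical (the demo's real addressing plan isn't given in the transcript); the sketch only illustrates how a source subnet, a destination address and a destination-port range combine into one rule.

```python
import ipaddress

def matches(rule: dict, pkt: dict) -> bool:
    """Check a packet against a rule; fields absent from the rule are wildcards."""
    if "src_net" in rule and \
            ipaddress.ip_address(pkt["src"]) not in ipaddress.ip_network(rule["src_net"]):
        return False
    if "dst" in rule and pkt["dst"] != rule["dst"]:
        return False
    if "dports" in rule and pkt["dport"] not in rule["dports"]:
        return False
    return True

# Rule 1: deny TCP from the end-user subnet to Server 1, dst port 9051 only.
rule1 = {"src_net": "10.0.1.0/24", "dst": "10.0.2.10", "dports": range(9051, 9052)}
# Rule 2: deny TCP from Host B only, dst ports 9056-9060.
rule2 = {"src_net": "10.0.1.12/32", "dst": "10.0.2.10", "dports": range(9056, 9061)}

pkt = {"src": "10.0.1.11", "dst": "10.0.2.10", "dport": 9051}  # Host A -> Server 1
print(matches(rule1, pkt))  # True: blocked by rule 1
pkt = {"src": "10.0.1.11", "dst": "10.0.2.10", "dport": 9056}  # Host A, port 9056
print(matches(rule2, pkt))  # False: rule 2 applies to Host B only
```

A /32 source network is how "only Host B" is expressed, while the /24 covers the whole end-user subnet.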

Now we are trying to access the service from Host A, on a port from the range blocked for Host B only. And we can. For Host B it is not possible, also on another port from the range we specified. And again, no impact on Host C. OK, now let's experiment a bit with UDP for a change. This time we will launch a simple UDP server listening on port 7031 on Server 2 and try to send a message from Host C to Server 2. Then we'll block the communication from Host C to Server 2 for UDP destination port 7031 and see what happens. In the second use case, we will launch another UDP server, this time listening on a different port, 7032, and again block the connectivity, but now based on the source port on Host C. So let's check how it works. First of all, we need to establish a UDP server on Server 2. We will use the very popular netcat tool and specify that this UDP server should listen on local port 7031.

Now we will try to send a message from Host C. We open the connection using netcat, specifying the IP address of Server 2 and, in this case, the remote port, and send a simple message, just a kind of hello. As you can see, it has been received on Server 2. Now the server tries to send something back, and again it succeeds, so we have two-way communication. Next, we will block the communication from Host C to Server 2: we specify the right IP address, select UDP in the IP proto field and specify the right destination port number.
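The netcat exchange above can be reproduced with a few lines of Python. This sketch runs entirely on localhost with an ephemeral port rather than the demo's real hosts and port 7031; only the two-way hello exchange mirrors the demo.

```python
import socket

# A UDP "server" (standing in for Server 2) and a "client" (Host C) on localhost.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))          # ephemeral port instead of the demo's 7031
server.settimeout(5)
port = server.getsockname()[1]

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(5)
client.sendto(b"hello from Host C", ("127.0.0.1", port))

# The server receives the hello and learns the client's address from it.
data, addr = server.recvfrom(1024)
print(data.decode())  # hello from Host C

# The server answers on the reverse path, as in the demo before any rule is added.
server.sendto(b"hello from Server 2", addr)
reply, _ = client.recvfrom(1024)
print(reply.decode())  # hello from Server 2

server.close()
client.close()
```

Once the deny rule is installed on the smartNIC, only the client-to-server direction would fail; the server's `sendto` on the reverse path would still get through, exactly as shown in the demo.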

Now let's try again to send a message from Host C to Server 2. As we can see, it is no longer possible, because this traffic pattern is blocked by the firewall on the smartNIC. But Server 2, of course, can still send messages to Host C, because the reverse path is not blocked. OK, so as I said, now we will launch another UDP server listening on another port. We'll open the connection to the server, but this time specifying the source port on Host C, and send a simple message. As you can see, it is possible to do so, and of course the server can react with another message.

So we'll block this traffic pattern now; we just need to specify the right number in the source port field of our form. OK, now that it is applied, let's check the effect. As expected, we cannot send any message from Host C to Server 2 anymore, while the reverse path is still working. Now let's launch another UDP server listening on the same port on Server 2 and open the connection from Host C, but this time specifying a different source port. As you can see, this time it is possible to send messages to Server 2, so our firewall is working according to expectations. OK, now it's time for ICMP. This will be a very simple use case: we will block the ICMP traffic from Host A to the subnet that Server 1 and Server 2 belong to.

First, let's check the ping without any rule: Host A can ping Server 1 and Server 2 without any issues. Then we'll add the rule, but this time using the firewall CLI component, because we can add, remove or list the rules here as well. We've got four rules now and we will add the next one. As you can see, we need to type "firewall add" and then specify the right parameters: the address of Host A, the destination address of the subnet we want to block the communication to, and the right value in the IP proto field, which is 1 for ICMP. Then we select the right device and the action, which is "deny". And as we can see, the new rule has been added.
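The exact syntax of the custom ONOS "firewall add" command isn't spelled out in the transcript, so the key=value format below is purely a guess. The sketch only illustrates the mapping from CLI arguments to the match fields of the ICMP rule (source host, destination subnet, IP proto 1, device, action).

```python
def parse_firewall_add(cmd: str) -> dict:
    """Parse a hypothetical 'firewall add key=value ...' command into a rule dict."""
    parts = cmd.split()
    if parts[:2] != ["firewall", "add"]:
        raise ValueError("expected a 'firewall add' command")
    rule = {}
    for kv in parts[2:]:
        key, value = kv.split("=", 1)
        rule[key] = value
    return rule

# Hypothetical command corresponding to the ICMP rule from the demo:
rule = parse_firewall_add(
    "firewall add src=10.0.1.11 dst=10.0.2.0/24 proto=1 device=smartnic1 action=deny"
)
print(rule["proto"], rule["action"])  # 1 deny
```

Overriding the rule with action "allow", as done later in the demo, would simply resubmit the same match fields with `action=allow`.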

OK, so as you can see, our new rule does work, because we cannot ping Server 1 or Server 2 from Host A. Of course, there is no impact on Host B or Host C, because they are not affected by that rule. Now we'll show you how to override exactly the same rule with a new action, "allow", in order to permit this communication once again. And now, as we can see, the pings from Host A work again. So far we've been defining the firewall rules on the smartNIC based mostly on Layer 3 and Layer 4 parameters: particular IP addresses or subnets, port numbers or IP proto field values. Now let's make use of Layer 2 header fields like the MAC address or EtherType.

What we are going to do is block ARP requests from application server 1 to its default gateway, which is configured on the leaf 1 switch of the leaf-spine fabric. We will specify the EtherType for ARP and the source MAC address of Server 1. First, let's check the default gateway of Server 1, and then the physical interface that is connected to the data plane network. We will use that interface to send ARP requests to the given IP address, the default gateway of this host. And we can successfully receive the ARP replies.
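For reference, the two Layer 2 fields the rule will match on sit in the Ethernet header: the source MAC and the EtherType, which is the standard value 0x0806 for ARP. The sketch below builds just that header; the Server 1 MAC address is made up.

```python
import struct

def ethernet_header(dst_mac: bytes, src_mac: bytes, ethertype: int) -> bytes:
    """Pack a 14-byte Ethernet header: dst MAC, src MAC, EtherType (big-endian)."""
    return struct.pack("!6s6sH", dst_mac, src_mac, ethertype)

BROADCAST = b"\xff" * 6                      # ARP requests are sent to broadcast
SERVER1_MAC = bytes.fromhex("0203ab00cd01")  # hypothetical Server 1 MAC

frame_header = ethernet_header(BROADCAST, SERVER1_MAC, 0x0806)
# The firewall rule matches on the last two fields of this header:
# the source MAC (Server 1) and EtherType 0x0806 (ARP).
print(frame_header.hex())
```

Matching on EtherType alone would block all ARP traffic; adding the source MAC key narrows the rule to ARP frames originating from Server 1, which is exactly what the demo configures.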

Now let's copy the MAC address of Server 1 and add another rule: we select the action "deny", the EtherType, this time for ARP, and then put in the right source MAC address of Server 1. We need to wait a bit for this rule to be applied. OK, the rule has been applied, as we can see, so let's check the impact. As we can see, our ARP requests no longer get any replies. OK, that was the last example from our demo part; now I think we can start the Q&A session.

OK, the first question: why have you decided to use ONOS? The answer is basically very simple: ONOS has very good support for P4Runtime, and this is something we wanted to use from the very beginning. Of course, as you could see, we ended up with a kind of adapter that translates P4Runtime to the Thrift protocol understood by our smartNIC. But ONOS supports P4Runtime and was a mature solution for programming a P4-defined data plane, so we wanted to use it. This is the reason.

Another question: what were the criteria for selecting the Netronome Agilio as the smartNIC? OK, I can answer this question. Simply put, we selected the Netronome Agilio because it was one of the cheapest smartNICs on the market and one of the few that support P4. I think it was just a natural choice at the time; this card has been available on the market for a couple of years. So we did some brief research and then selected this solution. If you want to check other solutions that support P4, as I said, you can have a look at our blog posts where we analyze them.

Another question: does ONOS have good support for operational aspects, e.g. retrieving counters on routers, switches etc.? I would say that ONOS is quite advanced for all the operations concerning flow control, i.e. programming how the data plane should react and operate, no matter whether it is done via P4Runtime or OpenFlow. So I think ONOS is really well suited for that. But if we are talking about management aspects, like retrieving counters, I would say there may be other solutions that are a little better for that; maybe OpenDaylight is more advanced.

OK, another question: did you run any performance tests on the smartNIC data plane? Maybe I will answer this one. Yes, we actually ran performance tests on the smartNIC. We were able to fully utilize both 10-gigabit ports of our smartNIC with very low latency. We could achieve full throughput without any drops and without any increased latency, so we didn't encounter performance issues with the P4-programmed data plane. Our reference test was a two-port 10-gigabit card running testpmd with routing via DPDK, and we actually got better latency on the smartNIC.

OK, another question: how about the scalability of this solution? In general, all the control applications we provided or extended ONOS with are scalable as such, so you can control multiple devices or proxy agents using this firewall control application. But when it comes to the proxy implementation, it can handle only one physical device. So if you want to support more physical devices that do not have native support for P4Runtime and you need to use such a proxy, you have to operate in a proxy-per-smartNIC model. This is what you need to take into account.

OK, are there any other questions? Let's wait a few more minutes. OK, I think there are no more questions. Thank you so much for attending this webinar; we really hope it was interesting for you. If you have any other questions, you can reach us via social media, for instance. Thank you so much, and see you next time.

Artur Jaworski

Software Engineer

Paweł Parol

Solutions Architect