Linux is a popular and powerful operating system that is widely used in both personal and professional settings. One of the key features of Linux is its robust networking capabilities, which make it an excellent choice for a wide range of networking tasks.
The Linux network stack has grown significantly over the years and currently supports not only basic functionality but also advanced and complex features (e.g. network namespaces, which allow the creation of separate, isolated network stack instances). As a result, Linux is often used as a network device (more about that can be found in our article Linux as a network device).
At first glance, networking might seem an easy task, as its role is simply to provide connectivity. However, ensuring that packets are transported from a computer to the right destination somewhere on the Internet, or exposing an application running on a server to traffic from external sources, can quickly get complicated.
Things get really complex when networking does not work as expected. Troubleshooting network issues is challenging, as it requires an understanding of networking concepts and configuration options. Linux is no different - knowledge of the network stack and of networking tools is required to find and solve problems.
This blog post describes the foundations of Linux network troubleshooting. The first part lists commonly used tools; the second describes example issues, possible approaches to finding their causes, and ways to resolve them.
Linux network troubleshooting commands and tools
The table below lists tools which are useful for troubleshooting networking problems in Linux. These tools are generally supported natively by most Linux distributions (pre-installed or available as packages) and let you configure, modify, or check network-related settings and statuses. It is not an exhaustive list - many other tools are available and new ones are constantly being created - but it covers the ones most commonly used to diagnose and resolve issues.
Table 1. Tools and commands used for network troubleshooting in Linux
Linux network troubleshooting: Examples & use cases
In this part we provide examples of networking issues along with troubleshooting steps that allow us to diagnose the problem, apply a correction, and check the results.
Use case 1: virtual machine network connectivity issues
Description
A virtual machine (VM) is running on a server (both are Linux-based). The VM needs Internet access.
There are two machines:
- Server, can be recognized by terminal prompt
[user@term:~]$
- VM (virtual machine) has a network interface with an IP address: 192.168.122.44, can be recognized by terminal prompt
ubuntu@vm:~$
Troubleshooting
First we log into the VM using ssh and check the basic network configuration - the IP address assigned to it:
[user@term:~]$ ssh ubuntu@192.168.122.44
ubuntu@vm:~$
ubuntu@vm:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:36:cd:8c brd ff:ff:ff:ff:ff:ff
altname enp0s3
inet 192.168.122.44/24 metric 100 brd 192.168.122.255 scope global dynamic ens3
valid_lft 3284sec preferred_lft 3284sec
inet6 fe80::5054:ff:fe36:cd8c/64 scope link
valid_lft forever preferred_lft forever
SSH access works fine and the VM has an IP address assigned. The next step is to verify external connectivity:
ubuntu@vm:~$ ping 1.1.1.1
ping: connect: Network is unreachable
The ping to an external address (1.1.1.1 in this case) does not work - the output reports “Network is unreachable”. This means the target network cannot be reached, and the message indicates a routing issue: there is no route entry for the destination IP (the same happens when interfaces have not been assigned IP addresses). The message can also appear when a firewall on the path towards the destination rejects packets with “ICMP admin prohibited” and the source host misinterprets this, displaying “Network is unreachable”. In our case, let’s start by checking the routing entries:
ubuntu@vm:~$ ip route
192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.44 metric 100
192.168.122.1 dev ens3 proto dhcp scope link src 192.168.122.44 metric 100
There is only a route for the directly connected prefix (192.168.122.0/24) and no default route. Consequently, the network stack is not able to send packets destined for IP addresses outside that prefix.
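The “Network is unreachable” text comes straight from the kernel: with no matching route, the connect()/sendto() system call fails with errno ENETUNREACH, and ping simply prints the corresponding message. The same errno values explain the other error strings seen later in this post. A minimal Python sketch (probe is a hypothetical helper of ours, shown for illustration only):

```python
import errno
import socket

def probe(host, port, timeout=2.0):
    """Try a TCP connection and report the kernel error, if any.
    Hypothetical helper for illustration, not a standard API."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "connected"
    except socket.timeout:
        return "timed out"  # no answer at all - often a firewall that DROPs packets
    except OSError as e:
        # ENETUNREACH  -> "Network is unreachable" (no matching route)
        # ECONNREFUSED -> "Connection refused"     (nothing listening, or firewall REJECT)
        # EHOSTUNREACH -> "No route to host"       (e.g. a neighbour/ARP failure)
        return errno.errorcode.get(e.errno, str(e))
    finally:
        s.close()
```

On the VM in its broken state, a call like probe("1.1.1.1", 80) would report ENETUNREACH, mirroring what ping printed above.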
In this setup, the VM uses libvirt’s default network, which provides DHCP and DNS services for connected VMs and which (in its default configuration) assigns IP addresses from the range 192.168.122.2-192.168.122.254. On the host system there is a virbr0 bridge, which typically acts as the default gateway and has the IP address 192.168.122.1. A ping from the VM shows that it can reach the host’s virbr0 address. Let’s add the default route via that bridge:
ubuntu@vm:~$ sudo ip route add default via 192.168.122.1
ubuntu@vm:~$ ip route
default via 192.168.122.1 dev ens3
192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.44 metric 100
192.168.122.1 dev ens3 proto dhcp scope link src 192.168.122.44 metric 100
ubuntu@vm:~$
ubuntu@vm:~$ ping -c3 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=63 time=3.16 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=63 time=2.04 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=63 time=4.23 ms
--- 1.1.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 2.041/3.142/4.225/0.891 ms
Now we can ping the IP address 1.1.1.1 - external connectivity works.
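Checks like this one can also be scripted. As a side note, the kernel exposes the IPv4 routing table in /proc/net/route, where a default route shows up as destination 00000000 with the RTF_GATEWAY flag set. A small Linux-only sketch (the helper name is ours, for illustration):

```python
RTF_GATEWAY = 0x0002  # destination is reached via a gateway

def has_default_ipv4_route(path="/proc/net/route"):
    """Return True if an IPv4 default route is present.
    Illustrative, Linux-only helper (IPv6 routes live in /proc/net/ipv6_route)."""
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            # the default route has destination 00000000 (hex, little-endian)
            if fields[1] == "00000000" and int(fields[3], 16) & RTF_GATEWAY:
                return True
    return False
```

On the VM, this would have returned False before the `ip route add default` step and True afterwards.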
Use case 2: DNS does not resolve domain names
Description
The setup is the same as in use case 1: a virtual machine (VM) is running on a server (both are Linux-based). On the VM we want to use domain names (DNS machinery) to access external hosts.
There are two machines:
- Server, can be recognized by terminal prompt
[user@term:~]$
- VM (virtual machine) has a network interface with an IP address: 192.168.122.44, can be recognized by terminal prompt
ubuntu@vm:~$
Troubleshooting
In the first use case we used an IP address to specify the destination (for ping); now we want to use an FQDN (fully qualified domain name), which relies on DNS resolution:
ubuntu@vm:~$ ping google.com
ping: google.com: Temporary failure in name resolution
DNS name resolution does not work (we get the message “Temporary failure in name resolution”). We need to check the DNS settings on our VM - on most Linux systems the DNS configuration is defined in the /etc/resolv.conf file:
ubuntu@vm:~$ cat /etc/resolv.conf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 127.0.0.53
options edns0 trust-ad
search .
Based on the content of /etc/resolv.conf (the first line of the file: “This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8)”), we see that the systemd-resolved service is used for domain name resolution. To verify which DNS servers are configured on our machine, we can use the resolvectl tool:
ubuntu@vm:~$ resolvectl status
Global
Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Link 2 (ens3)
Current Scopes: DNS
Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.122.1
DNS Servers: 192.168.122.1
The VM is launched on a Linux server which uses the libvirt toolkit to manage virtualization, and, as one might expect, the DNS server provided through libvirt’s default network (IP: 192.168.122.1) is configured on the VM. However, resolution fails, so let’s check whether the configured server responds to DNS queries at all. This can be done using the dig tool (the +short option provides a concise answer):
ubuntu@vm:~$ dig +short google.com @192.168.122.1
;; communications error to 192.168.122.1#53: connection refused
ubuntu@vm:~$ dig +short google.com @1.1.1.1
216.58.208.206
The configured DNS server is not available, but when we try the external DNS server 1.1.1.1 (the public resolver operated by Cloudflare) there is an answer. It looks as if there is an issue with the DNS resolver service that should be serving our VM (e.g. it might have been stopped). Even if we do not have the ability to troubleshoot and make changes on the server, it is still possible to configure DNS on our VM.
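As an aside, the “connection refused” answer from dig means the query never reached a DNS service: the VM got back an ICMP “port unreachable”, i.e. nothing is listening on port 53 at 192.168.122.1. At its core, dig builds a small DNS packet and sends it to the server. A minimal Python sketch of an A-record query (illustrative helper names, not a full DNS client):

```python
import socket
import struct

def build_query(name, qid=0x1234):
    """Build a minimal DNS query for an A record - the bare bones of what
    `dig <name>` sends. (Illustrative sketch, not a full DNS implementation.)"""
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # RD=1, one question
    qname = b"".join(bytes([len(p)]) + p.encode() for p in name.split(".")) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)          # QTYPE=A, QCLASS=IN

def dns_query(name, server, timeout=2.0):
    """Send the query over UDP to <server>:53 and report what happens."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    try:
        s.connect((server, 53))      # connect() so ICMP errors are reported back
        s.send(build_query(name))
        return s.recv(512)           # raw DNS response bytes
    except ConnectionRefusedError:
        return "connection refused"  # ICMP port unreachable - nothing on port 53
    except socket.timeout:
        return "timed out"
    finally:
        s.close()
```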
As seen earlier, the systemd-resolved service provides domain name resolution, and we can modify its configuration file /etc/systemd/resolved.conf to add DNS servers - the following line is set (two DNS servers’ addresses are added):
DNS=1.1.1.1 8.8.8.8
And then restart the service:
ubuntu@vm:~$ sudo systemctl restart systemd-resolved.service
ubuntu@vm:~$ resolvectl status
Global
Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
DNS Servers: 1.1.1.1 8.8.8.8
Link 2 (ens3)
Current Scopes: none
Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
The second command (resolvectl status) confirms that new entries defining DNS servers have been applied.
ubuntu@vm:~$ ping -c3 google.com
PING google.com (142.250.203.142) 56(84) bytes of data.
64 bytes from waw07s06-in-f14.1e100.net (142.250.203.142): icmp_seq=1 ttl=119 time=5.38 ms
64 bytes from waw07s06-in-f14.1e100.net (142.250.203.142): icmp_seq=2 ttl=119 time=7.19 ms
64 bytes from waw07s06-in-f14.1e100.net (142.250.203.142): icmp_seq=3 ttl=119 time=6.83 ms
--- google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 5.379/6.466/7.186/0.782 ms
Now the DNS resolution on our VM works and it is possible to use domain names instead of IP addresses.
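Note that ping, like most applications, does not talk to DNS servers directly - it calls the system resolver via getaddrinfo(), which on glibc systems follows /etc/nsswitch.conf and /etc/resolv.conf. A short sketch to check resolution exactly the way applications see it (resolve is our illustrative helper):

```python
import socket

def resolve(name):
    """Resolve a hostname the way most applications do - via getaddrinfo()."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET,
                                   type=socket.SOCK_STREAM)
        return sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        # EAI_AGAIN is what ping reports as "Temporary failure in name resolution"
        return f"resolution failed: {exc}"
```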
Use case 3: external access to application running in virtual machine
Description
An application (a redis database) is launched on the VM (virtual machine). We need to access the application from the server (on which the VM is running), using the VM's IP address and the application port (with the redis-cli client).
There are two machines:
- Server, can be recognized by terminal prompt
[user@term:~]$
- VM (virtual machine) has a network interface with an IP address: 192.168.122.44, can be recognized by terminal prompt
ubuntu@vm:~$
Troubleshooting
From the server we try to connect to the redis application running on the VM (VM has an IP: 192.168.122.44, redis is launched on default port 6379):
[user@term:~]$ redis-cli -h 192.168.122.44 -p 6379
Could not connect to Redis at 192.168.122.44:6379: Connection timed out
not connected>
A common reason for the “Connection timed out” error is a firewall blocking packets. A firewall can run on the target host’s operating system (e.g. iptables) or on a device along the path from source to destination. In the described case, the connection goes from the server host to a VM running on that same server, so there is no intermediate device between them. First we check the firewall settings on the VM. On Linux systems, firewall functionality is generally realized by iptables/nftables. The VM has Ubuntu installed, which uses ufw (Uncomplicated Firewall) - a higher-level management interface whose goal is to hide the complexity of packet filtering (underneath, it configures iptables or nftables).
Let’s check if ufw is active and if there are any rules applied.
[user@term:~]$ ssh ubuntu@192.168.122.44
Welcome to Ubuntu 22.10 (GNU/Linux 5.19.0-26-generic x86_64)
<truncated for brevity>
ubuntu@vm:~$ sudo ufw status numbered
Status: active
To Action From
-- ------ ----
[ 1] 22/tcp ALLOW IN Anywhere
[ 2] 22/tcp (v6) ALLOW IN Anywhere (v6)
ufw is enabled and there are rules for ssh (thanks to them we could log into the VM). The default behavior of ufw is to block incoming packets, so if there is no explicit rule allowing particular traffic, it is dropped - that is the reason for the connection errors in our case. Let’s add a rule allowing incoming connections on TCP port 6379:
ubuntu@vm:~$ sudo ufw allow to 192.168.122.44 proto tcp port 6379
Rule added
ubuntu@vm:~$ sudo ufw status numbered
Status: active
To Action From
-- ------ ----
[ 1] 22/tcp ALLOW IN Anywhere
[ 2] 192.168.122.44 6379/tcp ALLOW IN Anywhere
[ 3] 22/tcp (v6) ALLOW IN Anywhere (v6)
Now we return to the terminal window where we are logged into the server and try again to connect to the redis application (database) using redis-cli:
[user@term:~]$ redis-cli -h 192.168.122.44 -p 6379
Could not connect to Redis at 192.168.122.44:6379: Connection refused
not connected>
The connection still fails, but this time with a different error message: “Connection refused”. This message is seen when no process on the target host is listening on the given port, or when, for example, a firewall blocks traffic but is set to reject packets (generally, ‘reject’ means the firewall informs the source about its action, whereas ‘drop’ does not). Using the ss tool we can check all listening TCP sockets on the VM and verify whether the redis application is exposed (ss options: n - numeric, do not resolve; l - listening; t - TCP; p - show the process using the socket):
ubuntu@vm:~$ sudo ss -nltp
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 4096 127.0.0.54:53 0.0.0.0:* users:(("systemd-resolve",pid=494,fd=16))
LISTEN 0 511 127.0.0.1:6379 0.0.0.0:* users:(("redis-server",pid=12665,fd=6))
LISTEN 0 4096 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=494,fd=14))
LISTEN 0 4096 *:22 *:* users:(("sshd",pid=783,fd=3),("systemd",pid=1,fd=134))
Indeed, there is a redis process running and listening on TCP port 6379, but it is bound to the loopback interface (lo), which has the address 127.0.0.1. This address is only accessible to local connections (originating from the host where the lo interface is configured).
As a consequence, it is not possible to connect to the redis application from other machines (the server in our case), but it is possible to connect from the VM itself:
ubuntu@vm:~$ redis-cli
127.0.0.1:6379> hgetall *
(empty array)
127.0.0.1:6379>
Note that we use the redis-cli command hgetall * to verify that it is possible to perform actions on the database itself (hgetall returns all fields and values of the given hash key; in our case the database is empty, so the result is an empty array).
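The difference between a loopback-only service and an externally reachable one is visible in the Local Address column of the ss output and comes down to the address passed to bind(). A small sketch (listen_on is our illustrative helper):

```python
import socket

def listen_on(addr):
    """Open a listening TCP socket bound to the given local address.
    Illustrative helper; port 0 lets the kernel pick a free port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((addr, 0))
    s.listen(1)
    return s

# Bound to 127.0.0.1, like redis here: reachable only via loopback.
loopback_only = listen_on("127.0.0.1")
# Bound to 0.0.0.0 (INADDR_ANY): reachable on every address the host has.
any_address = listen_on("0.0.0.0")
```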
Before changing the configuration of redis to listen on an externally accessible interface, let’s use socat to relay the external connection to a loopback address and test if this configuration allows external access to redis.
ubuntu@vm:~$ socat TCP4-LISTEN:6379,bind=192.168.122.44 TCP:127.0.0.1:6379
The above command relays TCP port 6379 bound to the IPv4 address 192.168.122.44 to TCP port 6379 bound to the address 127.0.0.1 - the latter is used by the redis process. It is a temporary configuration that allows only a single connection (when that connection finishes, socat exits; for multiple connections one can use socat’s fork and reuseaddr options).
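Conceptually, what socat does here is a plain TCP relay: accept a connection, open a second one to the target, and copy bytes in both directions. A minimal single-connection Python sketch of the same idea, in case socat is unavailable (relay_once is our name, not a standard API):

```python
import socket
import threading

def relay_once(listen_host, listen_port, target_host, target_port):
    """Accept a single TCP connection and shuttle bytes between it and the
    target - roughly what the single-shot socat command above does."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((listen_host, listen_port))
    srv.listen(1)
    client, _ = srv.accept()
    upstream = socket.create_connection((target_host, target_port))

    def pump(src, dst):
        # Copy until EOF, then pass the EOF on by closing the write side.
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

    t = threading.Thread(target=pump, args=(client, upstream))
    t.start()
    pump(upstream, client)
    t.join()
    for s in (client, upstream, srv):
        s.close()
```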
On another terminal (after logging into the VM), let’s check the listening TCP sockets:
ubuntu@vm:~$ sudo ss -nltp
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 4096 127.0.0.54:53 0.0.0.0:* users:(("systemd-resolve",pid=494,fd=16))
LISTEN 0 5 192.168.122.44:6379 0.0.0.0:* users:(("socat",pid=17896,fd=5))
LISTEN 0 511 127.0.0.1:6379 0.0.0.0:* users:(("redis-server",pid=12665,fd=6))
LISTEN 0 4096 127.0.0.53%lo:53 0.0.0.0:* users:(("systemd-resolve",pid=494,fd=14))
LISTEN 0 4096 *:22 *:* users:(("sshd",pid=783,fd=3),("systemd",pid=1,fd=134))
We can see a socat process listening on address 192.168.122.44, port 6379. Now it should be possible to connect to the redis application running on the VM. We go back to the terminal window where we are logged into the server and try to connect:
[user@term:~]$ redis-cli -h 192.168.122.44 -p 6379
192.168.122.44:6379> hgetall *
(empty array)
192.168.122.44:6379>
It works - the redis-cli tool (running on the server) connects to our redis instance (running on the VM). To make this permanent (and stop using socat to relay connections), we need to change the redis configuration so it listens on the external interface. Before doing so, however, one should secure the database so that it only accepts authorized connections (secure it before exposing it). That is out of the scope of this blog post and is not covered here.
Use case 4: connect server to switch
Description
We want to configure the connection between the server’s interface ens3f1 and the switch port Ethernet36. Both interfaces (the server’s and the switch’s) are 25Gb and have been connected with a DAC (Direct Attach Copper) cable. Both the server and the switch should be configured as layer 3 devices, so there should be IP connectivity between them.
There are two devices:
- Server, can be recognized by terminal prompt:
[user@term:~]$
- Switch with SONiC (open-source network operating system), can be recognized by terminal prompt:
admin@sonic:~$
Troubleshooting
The server interface ens3f1 is already connected to the switch interface Ethernet36. First we verify that the connection is recognized by both ends.
On the server:
[user@term:~]$ ip link show ens3f1
16: ens3f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether ac:1f:6b:ed:6a:05 brd ff:ff:ff:ff:ff:ff
altname enp179s0f1
The interface is enabled (the UP flag) but its state is reported as DOWN (NO-CARRIER - no link signal is detected). Now let’s check the status on the switch.
This part generally falls outside the Linux network troubleshooting topic; however, the situation we have encountered is common - we have a connectivity issue and need to verify the configuration and logs on both ends: the Linux server and the switch to which it is connected. Note that the commands below are specific to the SONiC network operating system (if you want to read more about SONiC, please check the article Developing custom network functionality using SONiC):
admin@sonic:~$ show interface status Ethernet36
Interface Lanes Speed MTU FEC Alias Vlan Oper Admin Type Asym PFC
----------- ------- ------- ----- ----- ------- ------ ------ ------- -------------- ----------
Ethernet36 36 25G 9100 rs etp10 routed down up SFP/SFP+/SFP28 N/A
The switch also reports the interface state as down.
Now we go back to the terminal window with the server and check the interface settings with ethtool:
[user@term:~]$ ethtool ens3f1
Settings for ens3f1:
Supported ports: [ FIBRE ]
Supported link modes: 10000baseT/Full
25000baseCR/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: None BaseR RS
Advertised link modes: 10000baseT/Full
25000baseCR/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: None BaseR RS
Speed: Unknown!
Duplex: Unknown! (255)
Port: Direct Attach Copper
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: g
Current message level: 0x00000007 (7)
drv probe link
Link detected: no
One can see that the interface ens3f1 supports 10Gb and 25Gb link speeds. What is more, it supports (and advertises) different FEC (Forward Error Correction) modes: None, BaseR, and RS. The switch reports that its port Ethernet36 uses FEC with RS encoding. That mode is also supported by the network card on the server; however, we should check the FEC configuration, as a different mode might be set or FEC could be turned off. To do that, we again use ethtool, this time with the --show-fec option.
[user@term:~]$ ethtool --show-fec ens3f1
FEC parameters for ens3f1:
Configured FEC encodings: Off
Active FEC encoding: Off
FEC is turned off on the interface. The switch requires FEC with RS encoding, so the link cannot be established. Let’s change the FEC configuration on the server’s interface - we set it to auto so that it can negotiate the mode with the other side:
[user@term:~]$ sudo ethtool --set-fec ens3f1 encoding auto
[user@term:~]$ ip link show ens3f1
16: ens3f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether ac:1f:6b:ed:6a:05 brd ff:ff:ff:ff:ff:ff
altname enp179s0f1
After the change is applied, we see that interface ens3f1 on the server is up. The same is visible on the switch:
admin@sonic:~$ show interfaces status Ethernet36
Interface Lanes Speed MTU FEC Alias Vlan Oper Admin Type Asym PFC
----------- ------- ------- ----- ----- ------- ------ ------ ------- -------------- ----------
Ethernet36 36 25G 9100 rs etp10 routed up up SFP/SFP+/SFP28 N/A
Looking at the driver’s logs for interface ens3f1 on the server, it is clear that the FEC mode has been negotiated (we use the dmesg tool, which prints the kernel’s message buffer, including messages from device drivers):
[user@term:~]$ dmesg | grep ens3f1
[16946973.105255] i40e 0000:b3:00.1 ens3f1: renamed from eth0
[17109033.632111] i40e 0000:b3:00.1 ens3f1: NIC Link is Down
[17109048.596726] i40e 0000:b3:00.1 ens3f1: NIC Link is Up, 25 Gbps Full Duplex, Requested FEC: CL108 RS-FEC, Negotiated FEC: CL108 RS-FEC, Autoneg: False, Flow Control: None
Now it is possible to assign IP addresses and establish communication - on the switch:
admin@sonic:~$ sudo config interface ip add Ethernet36 192.168.100.1/24
and on the server:
[user@term:~]$ sudo ip address add 192.168.100.2/24 dev ens3f1
[user@term:~]$ ping -c 3 192.168.100.1
PING 192.168.100.1 (192.168.100.1) 56(84) bytes of data.
64 bytes from 192.168.100.1: icmp_seq=1 ttl=64 time=0.442 ms
64 bytes from 192.168.100.1: icmp_seq=2 ttl=64 time=0.157 ms
64 bytes from 192.168.100.1: icmp_seq=3 ttl=64 time=0.158 ms
--- 192.168.100.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2040ms
rtt min/avg/max/mdev = 0.157/0.252/0.442/0.134 ms
We have IP connectivity between the server’s and the switch’s interfaces.
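As a closing aside, the link state that ip link and ethtool print is also exposed under /sys/class/net, which is handy when scripting checks like the ones above. A Linux-only sketch (link_info is our illustrative helper; some fields are unreadable while the link is down):

```python
from pathlib import Path

def link_info(ifname):
    """Read basic link state from sysfs - part of what `ip link` and
    ethtool report. Linux-only, illustrative helper."""
    base = Path("/sys/class/net") / ifname

    def read(attr):
        try:
            return (base / attr).read_text().strip()
        except OSError:
            return "unknown"  # e.g. 'speed' is not readable while the link is down

    return {
        "operstate": read("operstate"),  # up / down / unknown
        "carrier": read("carrier"),      # 1 = link signal detected
        "speed_mbps": read("speed"),     # negotiated speed in Mb/s
    }
```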
Use case 5: connect server 10Gb port with switch 25Gb port
Description
We want to connect the server's interface eno2 and the switch's interface Ethernet0. The switch is expected to be configured as a layer 3 device, so there should be IP connectivity between the two devices (server and switch).
The network port on the server is a 10Gb interface, while the switch port is 25Gb; however, the switch port also supports speeds of 1/10/25 Gb. A physical connection (cable) between the server and switch ports is established.
There are two devices:
- Server, which can be recognized by terminal prompt:
[user@term:~]$
- Switch, with SONiC (open-source network operating system), can be recognized by terminal prompt:
admin@sonic:~$
Troubleshooting
The server and switch ports are connected with a physical cable, so let’s check the status of the interface on the server side.
[user@term:~]$ ip link show eno2
18: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether ec:f4:bb:da:4a:7a brd ff:ff:ff:ff:ff:ff
altname enp1s0f1
The interface is disabled so the first step is to enable it:
[user@term:~]$ sudo ip link set up eno2
[user@term:~]$ ip link show eno2
18: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether ec:f4:bb:da:4a:7a brd ff:ff:ff:ff:ff:ff
altname enp1s0f1
After enabling the interface, its state is still DOWN (NO-CARRIER). As the next step, let’s check what ethtool reports:
[user@term:~]$ ethtool eno2
Settings for eno2:
Supported ports: [ FIBRE ]
Supported link modes: 10000baseT/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: 10000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: Unknown!
Duplex: Unknown! (255)
Port: FIBRE
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
Cannot get wake-on-lan settings: Operation not permitted
Current message level: 0x00000007 (7)
drv probe link
Link detected: no
We see “Link detected: no”, which is no surprise, taking into account the outputs of the previous commands. We can check the kernel logs related to the interface eno2:
[user@term:~]$ dmesg | grep eno2
[ 286.193572] ixgbe 0000:01:00.1 eno2: renamed from eth0
[ 405.125867] ixgbe 0000:01:00.1: registered PHC device on eno2
[ 405.237725] 8021q: adding VLAN 0 to HW filter on device eno2
[ 405.306018] ixgbe 0000:01:00.1 eno2: detected SFP+: 5
There is nothing that could explain why the link is not detected. We need to check the other side of the connection - the status visible on the switch. As in use case 4, the switch runs the SONiC network operating system, and the commands below are specific to that system. Checking the Ethernet0 interface status:
admin@sonic:~$ show interfaces status Ethernet0
Interface Lanes Speed MTU FEC Alias Vlan Oper Admin Type Asym PFC
----------- ------- ------- ----- ----- ------- ------ ------ ------- -------------- ----------
Ethernet0 0 25G 9100 N/A etp1 routed down up SFP/SFP+/SFP28 N/A
The switch reports the link as down (“Oper” column). Importantly, the Ethernet0 port is reported as a 25Gb interface, while the other side is a 10Gb interface. Ports on this switch also support 1Gb and 10Gb, so we need to check the configuration. First, verify whether the port Ethernet0 has auto-negotiation enabled:
admin@sonic:~$ show interfaces autoneg status Ethernet0
Interface Auto-Neg Mode Speed Adv Speeds Rmt Adv Speeds Type Adv Types Oper Admin
----------- --------------- ------- ------------ ---------------- ------ ----------- ------ -------
Ethernet0 disabled 25G N/A N/A N/A N/A down up
Auto-negotiation is disabled. To enable it, we run:
admin@sonic:~$ sudo config interface autoneg Ethernet0 enabled
admin@sonic:~$ show interfaces autoneg status Ethernet0
Interface Auto-Neg Mode Speed Adv Speeds Rmt Adv Speeds Type Adv Types Oper Admin
----------- --------------- ------- ------------ ---------------- ------ ----------- ------ -------
Ethernet0 enabled 10G N/A N/A N/A N/A up up
admin@sonic:~$ show interfaces status Ethernet0
Interface Lanes Speed MTU FEC Alias Vlan Oper Admin Type Asym PFC
----------- ------- ------- ----- ----- ------- ------ ------ ------- -------------- ----------
Ethernet0 0 10G 9100 N/A etp1 routed up up SFP/SFP+/SFP28 N/A
After enabling auto-negotiation, the switch reports the link as operational (“Oper” up) and the link speed as 10Gb (as expected by the interface on the server).
Now we go back to the terminal window where we are logged into the server and check the link status:
[user@term:~]$ ip link show eno2
18: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether ec:f4:bb:da:4a:7a brd ff:ff:ff:ff:ff:ff
altname enp1s0f1
As a final verification, we configure IP addresses on the relevant interfaces on both the server (IP: 192.168.200.2/24) and the switch (IP: 192.168.200.1/24) and check if they are mutually reachable.
The server interface IP address:
[user@term:~]$ ip address show eno2
18: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether ec:f4:bb:da:4a:7a brd ff:ff:ff:ff:ff:ff
altname enp1s0f1
inet 192.168.200.2/24 scope global eno2
valid_lft forever preferred_lft forever
inet6 fe80::eef4:bbff:feda:4a7a/64 scope link
valid_lft forever preferred_lft forever
Checking the reachability of the switch’s Ethernet0 interface from the server:
[user@term:~]$ ping -c 3 192.168.200.1
PING 192.168.200.1 (192.168.200.1) 56(84) bytes of data.
64 bytes from 192.168.200.1: icmp_seq=1 ttl=64 time=0.366 ms
64 bytes from 192.168.200.1: icmp_seq=2 ttl=64 time=0.186 ms
64 bytes from 192.168.200.1: icmp_seq=3 ttl=64 time=0.193 ms
--- 192.168.200.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2055ms
rtt min/avg/max/mdev = 0.186/0.248/0.366/0.083 ms
Ping between server and switch works - connection is established.
The troubleshooting was done on both the Linux server and the switch to which it is connected, and configuration changes were carried out on both devices (enabling the interface on the server and enabling auto-negotiation for the port on the switch). This illustrates that both ends of a link have to be configured consistently.
Summary
Network troubleshooting in Linux requires a combination of tools and techniques to identify and fix problems. Many tools are available, and in this blog post we list and describe the most commonly used ones. We also walk through example issues and the processes for identifying and solving them (similar steps can be applied in other situations).
In real life, network troubleshooting can be complex (e.g. not limited to Linux alone, but also involving interconnected network devices). However, knowledge of the tools and an understanding of networking concepts allow you to diagnose and resolve issues, keeping your Linux systems running smoothly.
If you are looking for further information about Linux and network-related topics, check out our other articles: