Debugging the weirdest routing problem I ever had
by Peter Stolz
I am currently having problems routing wireguard traffic. A few words regarding my setup:
I have a network namespace that contains a wireguard interface and a virtual ethernet bridge to the main network namespace. This approach allows me to easily run multiple different VPN networks on the same machine, without a need for full containerization.
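A minimal sketch of such a setup (not my exact provisioning, just the idea) looks roughly like this. The namespace name cloudjet, the addresses and the interface names peer and wg match what you will see later; the host-side veth name veth-cj and the wireguard config path are made up for illustration:

# create the namespace
ip netns add cloudjet
# veth pair connecting the namespace to the main namespace
# (veth-cj is a hypothetical name for the host-side end)
ip link add veth-cj type veth peer name peer
ip link set peer netns cloudjet
ip addr add 10.0.0.2/31 dev veth-cj
ip link set veth-cj up
ip -n cloudjet addr add 10.0.0.3/31 dev peer
ip -n cloudjet link set peer up
ip -n cloudjet route add default via 10.0.0.2
# wireguard interface living inside the namespace
ip link add wg type wireguard
ip link set wg netns cloudjet
ip netns exec cloudjet wg setconf wg /etc/wireguard/cloudjet.conf   # illustrative path
ip -n cloudjet link set wg up
# plus routes for the tunneled networks via wg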
The Problem
Traffic from the wireguard interface is not forwarded.
Debugging environment
As the VPN connection is not working, it is annoying to connect to it with my work machine, since I need music and a search engine. While I could set only certain routes for debugging, I decided to start two wireguard interfaces inside network namespaces, so that I can send packets through both connections without impacting my internet.
Sanity sanity check
Are the packets even arriving?
Yes they are! Running ip netns exec cloudjet tcpdump -l -n -i wg --immediate-mode
shows that packets are actually arriving on the wg interface, meaning that the VPN connection is properly established.
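Another quick sanity check, assuming the wg tool is installed on the box, is to ask wireguard itself for the peer and handshake state inside the namespace:
ip netns exec cloudjet wg show wg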
pwru approach
Packets, where are you, or pwru, is an eBPF-based kernel network tracing tool which allows you to see which functions your packets go through.
Send a ping packet and trace kernel functions
./pwru --filter-proto icmp
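The ICMP traffic itself comes from my work machine, where the client side of each tunnel sits in its own namespace as described above. Something along these lines, where the client namespace name client1 is hypothetical and the target is just some address that has to be forwarded out of the server's namespace:
ip netns exec client1 ping -c 3 1.1.1.1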
Box that works:
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] napi_gro_receive 154442720178892
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] dev_gro_receive 154442720221892
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] inet_gro_receive 154442720226976
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] skb_defer_rx_timestamp 154442720233069
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_rcv_core 154442720238541
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] nf_hook_slow 154442720243358
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] nf_ip_checksum 154442720250804
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_route_input_noref 154442720264170
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_route_input_rcu 154442720269028
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_route_input_slow 154442720272957
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] __mkroute_input 154442720281718
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] fib_validate_source 154442720287101
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] __fib_validate_source 154442720291080
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_forward 154442720300311
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] nf_hook_slow 154442720304345
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_forward_finish 154442720314249
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_output 154442720319693
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] nf_hook_slow 154442720323127
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] apparmor_ipv4_postroute 154442720327744
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] skb_ensure_writable 154442720339737
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] skb_ensure_writable 154442720343767
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] inet_proto_csum_replace4 154442720348111
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] __xfrm_decode_session 154442720353867
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] decode_session4 154442720361108
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] security_xfrm_decode_session 154442720365725
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_finish_output 154442720373732
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] __ip_finish_output 154442720377994
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_finish_output2 154442720382119
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] dev_queue_xmit 154442720388809
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] __dev_queue_xmit 154442720395938
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] qdisc_pkt_len_init 154442720402760
Box that doesn't work:
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] napi_gro_receive 84154855159254
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] dev_gro_receive 84154855195472
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] inet_gro_receive 84154855202014
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] skb_defer_rx_timestamp 84154855209949
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] ip_rcv_core 84154855217012
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] nf_hook_slow 84154855222953
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] nf_ip_checksum 84154855230207
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] ip_route_input_noref 84154855241248
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] ip_route_input_rcu 84154855246930
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] ip_route_input_slow 84154855252069
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] ip_error 84154855261497
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] kfree_skb 84154855266767
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] skb_release_head_state 84154855272638
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] skb_release_data 84154855279731
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] skb_free_head 84154855285181
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] kfree_skbmem 84154855291012
As we can observe, the function calls are identical up to ip_route_input_slow. In the working case the packet is passed to __mkroute_input, whereas the other machine calls ip_error. At least this tells us that we don't have to look for the packet on another interface. It is discarded, as the stack trace ends in kfree_skb.
First things first, ensure packet forwarding is enabled!
sysctl -a | grep net.ipv4.ip_forward
Spoiler alert: it is important where you check that.
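In other words, the check really has to happen in both places, because each network namespace has its own value:
sysctl net.ipv4.ip_forward                          # main namespace
ip netns exec cloudjet sysctl net.ipv4.ip_forward   # inside the netns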
Compare the routes inside the netns
default via 10.0.0.2 dev peer
10.0.0.2/31 dev peer proto kernel scope link src 10.0.0.3
10.10.10.0/24 dev wg scope link
172.16.0.0/16 dev wg scope link
192.168.0.0/16 dev wg scope link
-> The two files are identical
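Roughly, that comparison is just a file diff along these lines (the file names are illustrative):
ip -n cloudjet route > /tmp/routes_working.txt    # on the working box
ip -n cloudjet route > /tmp/routes_broken.txt     # on the broken box, then copied over
diff /tmp/routes_working.txt /tmp/routes_broken.txt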
Compare iptables rules
root@cloudjetTesting1 ~/cloudjet (remove_dead_code)# ip netns exec cloudjet iptables -L -v -n
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
6050 519K ACCEPT all -- wg * 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
root@vpn [] ~# ip netns exec cloudjet iptables -L -v -n
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 ACCEPT all -- wg * 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
As we can see, the rules here are identical; however, they differ in the pkts count. It seems like the packets we send to the second machine are not arriving at the iptables hook.
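One way to make that observation more tangible is to watch the FORWARD counters live while pings are going through the tunnel:
watch -n 1 'ip netns exec cloudjet iptables -L FORWARD -v -n'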
Ensure we are not running tc filters
ip netns exec cloudjet tc filter del dev wg ingress
ip netns exec cloudjet tc filter del dev wg egress
Sadly without effect
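To be sure nothing is attached there in the first place, the current qdisc and filter state can be inspected as well:
ip netns exec cloudjet tc qdisc show dev wg
ip netns exec cloudjet tc filter show dev wg ingress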
Compare more iptables rules
root@cloudjetTesting1 ~/cloudjet (remove_dead_code)# ip netns exec cloudjet iptables -L -v -n -t nat
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
146 9843 MASQUERADE all -- * peer 0.0.0.0/0 0.0.0.0/0
root@vpn [] ~# ip netns exec cloudjet iptables -L -v -n -t nat
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * peer 10.10.10.0/24 0.0.0.0/0
As expected we have the same rules on both machines! They differ once again only in pkts processed.
UFW issues?
A super unlikely scenario is trouble with the Ubuntu firewall ufw, but unfortunately it is disabled on both machines. Therefore I am still on the journey of recovering my sanity. At this point I am almost certain this will be a meme blogpost, as the bug is either trivial or something crazy.
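For completeness, the ufw check in question is just:
ufw status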
Am I stupid?
At this point I am just rambling on in hopes of having a rubber-ducky moment while formulating my thought process. As the only real difference is the pkts count in the iptables rules, something tells me that there is another layer that prevents the packet from reaching iptables.
Diffing the network namespace interfaces
Looking at “all” the routes
root@cloudjetTesting1 ~/cloudjet (remove_dead_code)# ip -n cloudjet route show table all
default via 10.0.0.2 dev peer
10.0.0.2/31 dev peer proto kernel scope link src 10.0.0.3
10.10.10.0/24 dev wg scope link
172.16.0.0/16 dev wg scope link
192.168.0.0/16 dev wg scope link
local 10.0.0.3 dev peer table local proto kernel scope host src 10.0.0.3
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1
fe80::/64 dev peer proto kernel metric 256 pref medium
local ::1 dev lo table local proto kernel metric 0 pref medium
local fe80::7c39:a8ff:fe02:d270 dev peer table local proto kernel metric 0 pref medium
multicast ff00::/8 dev peer table local proto kernel metric 256 pref medium
multicast ff00::/8 dev wg table local proto kernel metric 256 pref medium
A diff yielded that only an IPv6 address is different, which makes sense, as these addresses are generated based on the MACs of the interfaces, and those are roughly random since I use virtual interfaces.
If the routes are similar to that extent, there is probably no difference in the interface configuration, but let's exhaust that route anyway:
root@cloudjetTesting1 ~/cloudjet (remove_dead_code)# ip -n cloudjet a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: peer@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 7e:39:a8:02:d2:70 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.0.3/31 scope global peer
valid_lft forever preferred_lft forever
inet6 fe80::7c39:a8ff:fe02:d270/64 scope link
valid_lft forever preferred_lft forever
3: tun: <NO-CARRIER,POINTOPOINT,MULTICAST,NOARP,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 500
link/none
4: tun-redirect: <NO-CARRIER,POINTOPOINT,MULTICAST,NOARP,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 500
link/none
5: wg: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
link/none
As predicted, only the MAC of the veth interface is different, and therefore the IPv6 address as well.
One reboot to fix it all?
Something I am terribly afraid of is restarting the working machine to see if I manually influenced it. Therefore I will attempt to recreate the working machine, and then I can hopefully diff what happens on my test setup compared to the automatically created machine.
The Solution
Wow, after many wasted hours I started playing around with the order of operations used to set up my server. As it turns out, if net.ipv4.ip_forward=1 is set after the network namespaces are created, it does not work. If ip_forward was already enabled when the machine booted, everything works as expected.
Lessons learned
Based on the network namespace man page:
Network namespaces provide isolation of the system resources associated with networking: network devices, IPv4 and IPv6 protocol stacks, IP routing tables, firewall rules, the /proc/net directory (which is a symbolic link to /proc/PID/net), the /sys/class/net directory, various files under /proc/sys/net, port numbers (sockets), and so on.
Looking at /proc/sys/net/ there is an ipv4 folder, and inside it is an ip_forward file. That is the location where net.ipv4.ip_forward is saved. So if these files differ between network namespaces, it is no surprise it did not work.
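The per-namespace nature is easy to verify by reading the file directly in both places:
cat /proc/sys/net/ipv4/ip_forward                          # main namespace
ip netns exec cloudjet cat /proc/sys/net/ipv4/ip_forward   # the namespace's own copy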
And indeed running:
ip netns exec cloudjet sysctl -a | grep net.ipv4.ip_forward
returns 0.
Therefore I either need to set ip_forward globally before creating the namespace or enable it inside the netns.
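As a minimal sketch, the two fixes look like this (note that sysctl -w only changes the running value; persisting it across reboots additionally needs an entry in /etc/sysctl.conf or /etc/sysctl.d/):
# option 1: enable forwarding globally before the namespace is created
sysctl -w net.ipv4.ip_forward=1
ip netns add cloudjet
# ... rest of the setup ...

# option 2: enable it inside the already existing namespace
ip netns exec cloudjet sysctl -w net.ipv4.ip_forward=1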
If you made it this far, thanks for reading, and I hope you learned some new commands to troubleshoot your network problems.
tags: networknamespace - linux - namespaces - routing