Debugging the weirdest routing problem I ever had
by Peter Stolz
I am currently having problems routing wireguard traffic. A few words regarding my setup:
I have a network namespace that contains a wireguard interface and a virtual ethernet bridge to the main network namespace. This approach allows me to easily run multiple different VPN networks on the same machine, without a need for full containerization.
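A minimal sketch of such a setup (not my exact provisioning, just the idea) looks roughly like this. The namespace name cloudjet, the addresses and the interface names peer and wg match what you will see later; the host-side veth name veth-cj and the wireguard config path are made up for illustration:

# create the namespace
ip netns add cloudjet
# veth pair connecting the namespace to the main namespace
# (veth-cj is a hypothetical name for the host-side end)
ip link add veth-cj type veth peer name peer
ip link set peer netns cloudjet
ip addr add 10.0.0.2/31 dev veth-cj
ip link set veth-cj up
ip -n cloudjet addr add 10.0.0.3/31 dev peer
ip -n cloudjet link set peer up
ip -n cloudjet route add default via 10.0.0.2
# wireguard interface living inside the namespace
ip link add wg type wireguard
ip link set wg netns cloudjet
ip netns exec cloudjet wg setconf wg /etc/wireguard/cloudjet.conf   # illustrative path
ip -n cloudjet link set wg up
# plus routes for the tunneled networks via wg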
The Problem
Traffic from the wireguard interface is not forwarded.
Debugging environment
As the VPN connection is not working, it is annoying to connect to it with my work machine, since I need music and a search engine. While I could set only certain routes for debugging, I decided to start two wireguard interfaces inside network namespaces, so that I can send packets through both connections without impacting my internet.
Sanity sanity check
Are the packets even arriving?
Yes they are! Running ip netns exec cloudjet tcpdump -l -n -i wg --immediate-mode
shows that packets are actually arriving on the wg interface, meaning that the VPN connection is properly established.
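Another quick sanity check, assuming the wg tool is installed on the box, is to ask wireguard itself for the peer and handshake state inside the namespace:
ip netns exec cloudjet wg show wg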
pwru approach
Packets, where are you, or pwru, is an eBPF-based kernel network tracing tool which allows you to see which functions your packets go through.
Send a ping packet and trace kernel functions
./pwru --filter-proto icmp
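The ICMP traffic itself comes from my work machine, where the client side of each tunnel sits in its own namespace as described above. Something along these lines, where the client namespace name client1 is hypothetical and the target is just some address that has to be forwarded out of the server's namespace:
ip netns exec client1 ping -c 3 1.1.1.1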
Box that works:
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] napi_gro_receive 154442720178892
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] dev_gro_receive 154442720221892
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] inet_gro_receive 154442720226976
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] skb_defer_rx_timestamp 154442720233069
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_rcv_core 154442720238541
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] nf_hook_slow 154442720243358
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] nf_ip_checksum 154442720250804
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_route_input_noref 154442720264170
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_route_input_rcu 154442720269028
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_route_input_slow 154442720272957
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] __mkroute_input 154442720281718
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] fib_validate_source 154442720287101
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] __fib_validate_source 154442720291080
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_forward 154442720300311
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] nf_hook_slow 154442720304345
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_forward_finish 154442720314249
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_output 154442720319693
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] nf_hook_slow 154442720323127
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] apparmor_ipv4_postroute 154442720327744
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] skb_ensure_writable 154442720339737
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] skb_ensure_writable 154442720343767
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] inet_proto_csum_replace4 154442720348111
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] __xfrm_decode_session 154442720353867
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] decode_session4 154442720361108
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] security_xfrm_decode_session 154442720365725
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_finish_output 154442720373732
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] __ip_finish_output 154442720377994
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] ip_finish_output2 154442720382119
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] dev_queue_xmit 154442720388809
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] __dev_queue_xmit 154442720395938
0xffff8e3b487eb400 [kworker/1:0-wg-crypt-wg] qdisc_pkt_len_init 154442720402760
Box that doesn't work:
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] napi_gro_receive 84154855159254
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] dev_gro_receive 84154855195472
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] inet_gro_receive 84154855202014
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] skb_defer_rx_timestamp 84154855209949
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] ip_rcv_core 84154855217012
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] nf_hook_slow 84154855222953
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] nf_ip_checksum 84154855230207
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] ip_route_input_noref 84154855241248
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] ip_route_input_rcu 84154855246930
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] ip_route_input_slow 84154855252069
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] ip_error 84154855261497
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] kfree_skb 84154855266767
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] skb_release_head_state 84154855272638
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] skb_release_data 84154855279731
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] skb_free_head 84154855285181
0xffff89a14bc08f00 [kworker/0:1-wg-crypt-wg] kfree_skbmem 84154855291012
As we can observe, the function calls are identical up to ip_route_input_slow. In the working case the packet is passed to __mkroute_input, whereas the other machine calls ip_error. At least this tells us that we don't have to look for the packet on another interface. It is discarded, as the stack trace ends in kfree_skb.
First things first, ensure packet forwarding is enabled!
sysctl -a | grep net.ipv4.ip_forward
Spoiler alert: it is important where you check that.
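In other words, the check really has to happen in both places, because each network namespace has its own value:
sysctl net.ipv4.ip_forward                          # main namespace
ip netns exec cloudjet sysctl net.ipv4.ip_forward   # inside the netns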
Compare the routes inside the netns
default via 10.0.0.2 dev peer
10.0.0.2/31 dev peer proto kernel scope link src 10.0.0.3
10.10.10.0/24 dev wg scope link
172.16.0.0/16 dev wg scope link
192.168.0.0/16 dev wg scope link
-> The two files are identical
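Roughly, that comparison is just a file diff along these lines (the file names are illustrative):
ip -n cloudjet route > /tmp/routes_working.txt    # on the working box
ip -n cloudjet route > /tmp/routes_broken.txt     # on the broken box, then copied over
diff /tmp/routes_working.txt /tmp/routes_broken.txt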
Compare iptables rules
root@cloudjetTesting1 ~/cloudjet (remove_dead_code)# ip netns exec cloudjet iptables -L -v -n
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
6050 519K ACCEPT all -- wg * 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
root@vpn [] ~# ip netns exec cloudjet iptables -L -v -n
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 ACCEPT all -- wg * 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
As we can see, the rules here are identical; however, they differ in the pkts count. It seems like the packets we send to the second machine are not arriving at the iptables hook.
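One way to make that observation more tangible is to watch the FORWARD counters live while pings are going through the tunnel:
watch -n 1 'ip netns exec cloudjet iptables -L FORWARD -v -n'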
Ensure we are not running tc filters
ip netns exec cloudjet tc filter del dev wg ingress
ip netns exec cloudjet tc filter del dev wg egress
Sadly without effect
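To be sure nothing is attached there in the first place, the current qdisc and filter state can be inspected as well:
ip netns exec cloudjet tc qdisc show dev wg
ip netns exec cloudjet tc filter show dev wg ingress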
Compare more iptables rules
root@cloudjetTesting1 ~/cloudjet (remove_dead_code)# ip netns exec cloudjet iptables -L -v -n -t nat
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
146 9843 MASQUERADE all -- * peer 0.0.0.0/0 0.0.0.0/0
root@vpn [] ~# ip netns exec cloudjet iptables -L -v -n -t nat
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * peer 10.10.10.0/24 0.0.0.0/0
As expected we have the same rules on both machines! They differ once again only in pkts processed.
UFW issues?
A super unlikely scenario is trouble with the Ubuntu firewall ufw, but unfortunately it is disabled on both machines. Therefore I am still on the journey of recovering my sanity. At this point I am almost certain this will be a meme blogpost, as the bug is either trivial or something crazy.
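For completeness, the ufw check in question is just:
ufw status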
Am I stupid?
At this point I am just rambling on in hopes of having a rubber-ducky moment while formulating my thought process. As the only real difference is the pkts count in the iptables rules, something tells me that there is another layer that prevents the packet from reaching iptables.
Diffing the network namespace interfaces
Looking at “all” the routes
root@cloudjetTesting1 ~/cloudjet (remove_dead_code)# ip -n cloudjet route show table all
default via 10.0.0.2 dev peer
10.0.0.2/31 dev peer proto kernel scope link src 10.0.0.3
10.10.10.0/24 dev wg scope link
172.16.0.0/16 dev wg scope link
192.168.0.0/16 dev wg scope link
local 10.0.0.3 dev peer table local proto kernel scope host src 10.0.0.3
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1
fe80::/64 dev peer proto kernel metric 256 pref medium
local ::1 dev lo table local proto kernel metric 0 pref medium
local fe80::7c39:a8ff:fe02:d270 dev peer table local proto kernel metric 0 pref medium
multicast ff00::/8 dev peer table local proto kernel metric 256 pref medium
multicast ff00::/8 dev wg table local proto kernel metric 256 pref medium
A diff yielded that only an IPv6 address is different, which makes sense, as these addresses are generated based on the MACs of the interfaces, and those are roughly random since I use virtual interfaces.
If the routes are similar to that extent, there is probably no difference in the interface configuration, but let's exhaust that route anyway:
root@cloudjetTesting1 ~/cloudjet (remove_dead_code)# ip -n cloudjet a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: peer@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 7e:39:a8:02:d2:70 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.0.3/31 scope global peer
valid_lft forever preferred_lft forever
inet6 fe80::7c39:a8ff:fe02:d270/64 scope link
valid_lft forever preferred_lft forever
3: tun: <NO-CARRIER,POINTOPOINT,MULTICAST,NOARP,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 500
link/none
4: tun-redirect: <NO-CARRIER,POINTOPOINT,MULTICAST,NOARP,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 500
link/none
5: wg: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
link/none
As predicted, only the MAC of the veth interface is different, and therefore the IPv6 address as well.
One reboot to fix it all?
Something I am terribly afraid of is restarting the working machine to see if I manually influenced it. Therefore I will attempt to recreate the working machine, and then I can hopefully diff what happens on my test setup compared to the automatically created machine.
The Solution
Wow, after many wasted hours I started playing around with the order of operations used to set up my server. As it turns out, if net.ipv4.ip_forward=1 is set after the network namespaces are created, it does not work. If ip_forward was already enabled when the machine booted, everything works as expected.
Lessons learned
Based on the network namespace man page:
Network namespaces provide isolation of the system resources associated with networking: network devices, IPv4 and IPv6 protocol stacks, IP routing tables, firewall rules, the /proc/net directory (which is a symbolic link to /proc/PID/net), the /sys/class/net directory, various files under /proc/sys/net, port numbers (sockets), and so on.
Looking at /proc/sys/net/ there is an ipv4 folder, and inside it is an ip_forward file. That is the location where net.ipv4.ip_forward is saved. So if these files differ between network namespaces, it is no surprise it did not work.
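The per-namespace nature is easy to verify by reading the file directly in both places:
cat /proc/sys/net/ipv4/ip_forward                          # main namespace
ip netns exec cloudjet cat /proc/sys/net/ipv4/ip_forward   # the namespace's own copy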
And indeed running:
ip netns exec cloudjet sysctl -a | grep net.ipv4.ip_forward
returns 0.
Therefore I either need to set ip_forward globally before creating the namespace or enable it inside the netns.
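As a minimal sketch, the two fixes look like this (note that sysctl -w only changes the running value; persisting it across reboots additionally needs an entry in /etc/sysctl.conf or /etc/sysctl.d/):
# option 1: enable forwarding globally before the namespace is created
sysctl -w net.ipv4.ip_forward=1
ip netns add cloudjet
# ... rest of the setup ...

# option 2: enable it inside the already existing namespace
ip netns exec cloudjet sysctl -w net.ipv4.ip_forward=1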
If you made it this far, thanks for reading, and I hope you learned some new commands to troubleshoot your network problems.
tags: networknamespace - linux - namespaces - routing