NetExp: Post-mortem on VXLAN overlay and Post-Quantum Encryption
TL;DR: Tailscale can flap between the clearnet path and paths that run through other tunnels, so it is generally not a good idea to run overlay tunnels on top of Tailscale.
So one day I was deploying the LUX node with a VXLAN overlay on top of Tailscale on a mini PC (see the Homelab page), and the fan kept going furrrrr, then quiet, on repeat.
Something was wrong.
I went on a debugging adventure. The CPU spiked every minute, and htop showed BIRD and Tailscale as the culprits. BIRD reported Hold Time Exceeded.
BIRD was dying while flushing routes, which meant the links themselves were causing the problem.
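For context on that error: a BGP session (RFC 4271) runs a hold timer that every KEEPALIVE or UPDATE from the peer resets. If the peer goes silent for longer than the hold time, the session is torn down and the routes learned from it are flushed, which is how a stalling link turns into a dead BIRD session. The toy model below is a sketch only; the function name and the example timestamps are hypothetical, and 90 s is simply the hold time RFC 4271 suggests, not necessarily what this network uses.

```python
# Toy model of the BGP hold timer (RFC 4271): each message received from the
# peer restarts the timer; a silence longer than hold_time tears the session
# down. Function name and example timestamps are hypothetical.

def first_hold_expiry(arrivals, hold_time, start=0.0):
    """Return the time the hold timer first expires, or None.

    arrivals: sorted timestamps (seconds) of KEEPALIVE/UPDATE messages.
    start: time the session came up, when the timer first starts.
    Only the gaps between observed events are checked.
    """
    last = start
    for t in arrivals:
        if t - last > hold_time:
            # The timer ran out before this message showed up.
            return last + hold_time
        last = t
    return None


if __name__ == "__main__":
    # Keepalives every 30 s, then a 110 s stall (say, the link choking on CPU).
    print(first_hold_expiry([30, 60, 90, 200], hold_time=90))  # 180
```

With keepalives arriving regularly the function returns None; one long stall is enough to drop the session.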
Was VXLAN generating too many packets and causing Tailscale's CPU spikes? Yes. Along the way I stumbled upon WireGuard's best feature: packet coalescing.
So let's turn all the VXLAN links into WireGuard links! Better, but the CPU still spiked every 5 minutes.
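To show why coalescing helps, here is a toy model, not WireGuard's or Tailscale's actual code: if every packet pays a fixed processing cost, grouping consecutive packets into larger batches cuts the number of expensive operations. The 65,535-byte batch limit and the 1,400-byte packet size are assumptions chosen only for illustration.

```python
# Toy model of packet coalescing (illustrative, not WireGuard's real code):
# instead of paying a per-packet cost for every packet, consecutive packets
# are merged into batches up to a size limit and handled together.

def coalesce(sizes, limit=65535):
    """Greedily group packet sizes into batches whose total is <= limit.

    A single packet larger than the limit still gets its own batch.
    """
    batches, current, total = [], [], 0
    for size in sizes:
        if current and total + size > limit:
            batches.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    packets = [1400] * 1000  # assumed: 1000 full-size packets
    batches = coalesce(packets)
    print(len(packets), "packets ->", len(batches), "batches")
    # prints: 1000 packets -> 22 batches
```

In this model 46 packets fit per batch, so the per-packet work drops by roughly a factor of 45.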
I took down the LUX-HKG link and session so I could sleep in peace. (The mini PC is in the same room.)
Okay, on day two I restored the link and session and checked the WireGuard interface:
interface: inter-lux
  public key: [REDACTED]
  private key: [REDACTED]
  listening port: 11514

peer: [REDACTED]
  endpoint: [REDACTED]
  allowed ips: 0.0.0.0/0, ::/0
  latest handshake: 1 minute, 50 seconds ago
  transfer: 21.37 GiB received, 1.72 TiB sent
Hmmmm, 1.72 TiB. That's more data than I could possibly use on my own in a month.
What could possibly generate 1.72 TiB in 6 hours?
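To put that number in perspective, a quick back-of-the-envelope calculation using only the figures above (1.72 TiB sent, 6 hours) gives the sustained rate the link must have been pushing, averaged over the whole period.

```python
# Back-of-the-envelope: what average rate does 1.72 TiB in 6 hours imply?
TIB = 2 ** 40                  # bytes in one tebibyte

sent_bytes = 1.72 * TIB        # "1.72 TiB sent" from the wg output
seconds = 6 * 60 * 60          # 6 hours

bytes_per_s = sent_bytes / seconds
mbit_per_s = bytes_per_s * 8 / 1e6

print(f"{bytes_per_s / 1e6:.1f} MB/s ~= {mbit_per_s:.0f} Mbit/s")
# prints: 87.6 MB/s ~= 700 Mbit/s
```

Roughly 700 Mbit/s around the clock is far beyond anything a homelab link should be doing on its own.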
I went on another debugging hunt, and in the end the culprit turned out to be the tunnel itself.
In Tailscale's own terms, it finds the best possible direct path and upgrades the connection to that path.
In reality, it flapped between an IPv4 path found through NAT traversal and an IPv6 path that was itself carried over VXLAN and Tailscale.
Such a simple mistake, and I had forgotten about it.
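One way to catch this kind of loop, sketched under assumptions: take the prefixes that are routed into the overlay (for example, the ones BIRD learns over the tunnels) and flag any candidate endpoint that falls inside them, since such an endpoint can only be reached through the tunnel it is supposed to carry. The function name, prefixes, and addresses below are hypothetical (documentation ranges), and this is not how Tailscale itself chooses paths.

```python
import ipaddress

# Hypothetical check for self-referencing paths: an endpoint that lives inside
# a prefix routed via the overlay would carry the tunnel over itself.

def recursive_endpoints(candidates, overlay_prefixes):
    """Return the candidate addresses that fall inside an overlay prefix."""
    nets = [ipaddress.ip_network(p) for p in overlay_prefixes]
    flagged = []
    for c in candidates:
        addr = ipaddress.ip_address(c)
        # An IPv4 address is never "in" an IPv6 network (and vice versa);
        # Python returns False for mixed versions instead of raising.
        if any(addr in net for net in nets):
            flagged.append(c)
    return flagged


if __name__ == "__main__":
    overlay = ["2001:db8:100::/48", "10.42.0.0/16"]  # assumed overlay routes
    candidates = ["203.0.113.10", "2001:db8:100::7"]  # assumed candidates
    print(recursive_endpoints(candidates, overlay))
    # prints: ['2001:db8:100::7']
```

Here the public IPv4 candidate is safe, while the IPv6 candidate would route the tunnel through the overlay, the same shape as the flapping described above.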
With the help of Claude Code, I quickly added each node's public IPs and available ports to the source of truth, and a cron job now works out which endpoint and port each link should use.
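A minimal sketch of what such a cron job could do, assuming a hypothetical source-of-truth layout (node name mapped to public IPv4/IPv6 and ports) and a "first listed port, prefer IPv6" policy; neither is the actual schema or logic used here. It pins each peer to a public endpoint with `wg set <interface> peer <key> endpoint <ip>:<port>`, so the peer is never left pointing at an address that lives inside the overlay. The peer key is a placeholder.

```python
import json

# Hypothetical source-of-truth layout: node -> public IPs and tunnel ports.
SOURCE_OF_TRUTH = json.loads("""
{
  "lux": {"ipv4": "198.51.100.20", "ipv6": null, "ports": [11514, 11515]},
  "hkg": {"ipv4": "203.0.113.30", "ipv6": "2001:db8:30::1", "ports": [11514]}
}
""")


def pick_endpoint(node, prefer_ipv6=True):
    """Choose a public endpoint for a node (assumed policy, see lead-in)."""
    entry = SOURCE_OF_TRUTH[node]
    port = entry["ports"][0]  # assumption: first listed port wins
    v4, v6 = entry.get("ipv4"), entry.get("ipv6")
    if v6 and (prefer_ipv6 or not v4):
        return f"[{v6}]:{port}"  # IPv6 endpoints need brackets
    if v4:
        return f"{v4}:{port}"
    raise ValueError(f"{node} has no public address")


def wg_set_command(iface, peer_pubkey, node):
    """Build a `wg set` argv that pins the peer to its public endpoint."""
    return ["wg", "set", iface, "peer", peer_pubkey,
            "endpoint", pick_endpoint(node)]


if __name__ == "__main__":
    print(pick_endpoint("lux"))  # 198.51.100.20:11514
    print(pick_endpoint("hkg"))  # [2001:db8:30::1]:11514
    # A cron job could pass this argv to subprocess.run(..., check=True).
    print(wg_set_command("inter-lux", "PEER_PUBLIC_KEY", "hkg"))
```

Re-running this periodically also undoes WireGuard's endpoint roaming if a peer ever gets re-learned through the wrong path.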
Some other notable updates on the network: we have ditched GitHub and now use a Cloudflare R2 bucket as the source of truth, for speed and reliability. (And also to hide my shit patches.)
Meanwhile, Rosenpass came up while I was debugging, so I threw it at Claude Code to help me implement it on the links between nodes.
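Rosenpass runs a post-quantum key exchange and hands the result to WireGuard as the peer's preshared key, which WireGuard mixes into its own handshake. The sketch below covers only that last hand-off step and is not Rosenpass's own tooling: it writes a 32-byte key in the base64 format `wg` expects and builds a `wg set <interface> peer <key> preshared-key <file>` command. The function name and file layout are hypothetical, and `os.urandom` merely stands in for the key Rosenpass would derive and rotate.

```python
import base64
import os
import tempfile


def install_psk(iface, peer_pubkey, key_bytes, directory):
    """Write a 32-byte key as a wg key file and return the `wg set` argv."""
    if len(key_bytes) != 32:
        raise ValueError("WireGuard preshared keys are 32 bytes")
    path = os.path.join(directory, f"{iface}.psk")  # hypothetical layout
    # Create the file owner-readable only before the secret goes in.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(base64.b64encode(key_bytes).decode() + "\n")
    return ["wg", "set", iface, "peer", peer_pubkey, "preshared-key", path]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Stand-in for the Rosenpass-derived key.
        argv = install_psk("inter-lux", "PEER_PUBLIC_KEY", os.urandom(32), d)
        print(argv[:6])
```

Because the preshared key is only an extra input to WireGuard's handshake, the link stays at least as strong as plain WireGuard even if the post-quantum part were ever broken.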
So, yay! I never have to worry about bad guys stealing my ICMP packets when Q-day arrives, and neither will you.
Please upgrade to post-quantum encrypted links, for example with Rosenpass, as soon as possible. Give it a try.
P.S. It took more than two days to debug, and in the meantime, to keep it quiet, I jailed it in this cage.
