# Proxmox SDN / EVPN Routing Incident – Postmortem and Redesign
## Summary

This document records a real-world routing failure encountered in a Proxmox + SDN (EVPN/BGP/VRF) environment, the investigative timeline, the conclusions, and the resulting redesign.
The core issue was asymmetric routing caused by Linux kernel route selection when SDN VRFs and host-connected networks overlap: when the hypervisor itself resides inside one of the destination subnets, return frames take a different path than the forward traffic and never come back through the border router that handled the flow in the first place.
The final decision was to move network 172.23.30.0/24 back to a classic VLAN, because it is trusted, owned infrastructure (DNS, AD, management), and Linux SDN is not equivalent to VMware NSX-style data-plane isolation.
## Initial Environment

### Physical / Host Context

- Platform: Proxmox VE
- SDN: Proxmox SDN with EVPN + BGP
- Routing daemon: FRR
- Firewall: OPNsense (ruled out later)
- Overlay users: VMs, storage, services
### Key Networks

| Network | Purpose | Notes |
|---|---|---|
| 172.23.30.0/24 | Infra / DNS / AD / VM mgmt | Initially moved into SDN (mistake) |
| 172.23.69.0/24 | SDN-managed | Worked correctly |
| 172.23.96.0/24 | Hypervisor management | Host is directly connected |
| 172.23.97.0/24 | OOB / iDRAC | Out-of-band |
| 10.0.0.0/24 | SDN fabric | BGP underlay |
The DNS server / domain controller lives in 172.23.30.0/24, connected to an SDN VNet.
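For context, the original topology corresponds to Proxmox SDN configuration along these lines. This is a sketch only: the zone, controller, VXLAN, and VNet identifiers are illustrative placeholders, not the actual values from the cluster.

```
# /etc/pve/sdn/zones.cfg (illustrative sketch)
evpn: evpnzone
	controller evpnctl
	vrf-vxlan 1001
	mtu 1450

# /etc/pve/sdn/vnets.cfg (illustrative sketch)
vnet: vnet30
	zone evpnzone
	tag 1030
```

Proxmox renders such a zone as a kernel VRF device (here `vrf_evpnzone`) with its own routing table, which is what the later `ip route show table 1001` inspection looks at.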
## Symptoms

- DNS resolution failed only for some subnets
- ICMP and TCP behaved inconsistently
- Tailscale sometimes worked, sometimes not
- Network 20 worked perfectly
- Network 69 worked correctly
- Networks 30 and 96 were flaky or broken
Key observation:
When traffic originated from 172.23.30.0/24 and targeted 172.23.96.0/24, replies bypassed the SDN fabric.
## Investigation Timeline

### 1. Firewall / OPNsense ruled out

- reply-to disabled
- state types adjusted
- asymmetric routing options tested
- no effect
Conclusion: Not a firewall issue.
### 2. Tailscale behavior

Tailscale traffic worked because:

- the source IP was not present in the host routing table
- traversal was forced via the gateway/BGP
But:
- could reach 96 and 30
- could not reliably reach 69
Conclusion:
Tailscale only bypassed host routing by coincidence, masking the real problem.
### 3. Routing table inspection

#### Host routing (excerpt)

```
C>* 172.23.96.0/24 is directly connected, vmbr0.2396
B>* 172.23.30.0/24 is directly connected, vrf_evpnzone
```

#### VRF table (1001)

```
ip route show table 1001
```

Key finding:

- 172.23.96.0/24 existed in BOTH the main table (connected) and the VRF (via BGP)
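This kind of overlap can be spotted mechanically. A minimal sketch, assuming the route listings have been captured as text (the sample lines below are abbreviated from the excerpts above, not a full table dump):

```python
import ipaddress

# Example route listings as captured from the host (abbreviated).
main_table = [
    "172.23.96.0/24 dev vmbr0.2396 proto kernel scope link",
]
vrf_table = [
    "172.23.96.0/24 via 10.0.0.11 dev vmbr_sdn proto bgp",
    "172.23.30.0/24 dev vrf_evpnzone proto kernel scope link",
]

def prefixes(lines):
    """Extract the destination prefix (first token) of each route line."""
    return {ipaddress.ip_network(line.split()[0]) for line in lines}

# Any prefix present in both tables is a candidate for asymmetric routing:
# the kernel answers from the connected route, bypassing the VRF.
conflicts = prefixes(main_table) & prefixes(vrf_table)
print(conflicts)  # {IPv4Network('172.23.96.0/24')}
```

The same check generalizes to `ipaddress.ip_network(...).overlaps(...)` if the prefixes are not identical but merely nested.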
### 4. Policy routing tests

#### Working case (69)

```
ip route get 172.23.96.21 from 172.23.69.1
```

Result:

```
via 10.0.0.11 dev vmbr_sdn table vrf_evpnzone
```

#### Broken case (30)

```
ip route get 172.23.96.21 from 172.23.30.1
```

Result:

```
local 172.23.96.21 dev lo table local
```

This showed the kernel's `local` table short-circuiting the lookup.
### 5. Missing policy rule discovered

A rule existed for 69:

```
ip rule add from 172.23.69.0/24 lookup vrf_evpnzone priority 99
```

But not for 30. After adding one:

```
ip rule add from 172.23.30.0/24 lookup vrf_evpnzone priority 96
```

The routing lookup appeared correct, yet DNS still failed.
## Root Cause

### The real problem

Linux routing is not VM-aware. It is host-aware.
Even with:
- VRFs
- policy routing
- BGP-learned routes
The kernel will always:

- prefer the `local` table
- prefer directly connected host addresses
- short-circuit replies when the destination IP belongs to the host
So when:

- the Proxmox host lives in 172.23.96.0/24
- the DNS server in 172.23.30.0/24 replies to an address in 96
The packet:
- never enters EVPN fabric
- never reaches border router
- exits via host interface
Result:
- asymmetric routing
- MTU mismatch
- DNS response drops
This is not fixable cleanly in Linux without extreme hacks.
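The short-circuit can be illustrated with a toy model of the lookup order. This is not the real kernel, just the one fact that matters here: a destination the host itself owns is resolved from the `local` table before any policy rule or VRF table is ever consulted.

```python
import ipaddress

# Addresses the host itself owns. The kernel populates the `local` table
# with an entry like this for every configured interface address.
host_local_addrs = {ipaddress.ip_address("172.23.96.21")}

# BGP-learned route inside the EVPN VRF that *should* carry the reply.
vrf_routes = {ipaddress.ip_network("172.23.96.0/24"): "via 10.0.0.11 (fabric)"}

def reply_path(dst: str) -> str:
    dst_ip = ipaddress.ip_address(dst)
    # The local table wins before any policy rule or VRF table,
    # so a host-owned destination never reaches the EVPN routes at all.
    if dst_ip in host_local_addrs:
        return "local (short-circuit: reply exits via host interface)"
    for prefix, nexthop in vrf_routes.items():
        if dst_ip in prefix:
            return nexthop
    return "unreachable"

print(reply_path("172.23.96.21"))  # local (short-circuit: reply exits via host interface)
print(reply_path("172.23.96.99"))  # via 10.0.0.11 (fabric)
```

The two calls mirror the broken and working `ip route get` results above: any 96-subnet address the host owns is delivered locally, while other 96 addresses would still follow the fabric route.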
## Final Decision

### Network 30 moved back to VLAN

Reasoning:
- It is mine
- It is trusted
- It is control-plane traffic
- DNS must be symmetric
Quote-worthy conclusion:

> Linux SDN is not VMware NSX. The kernel always wins.
## What Stayed in SDN

Storage networks remain in SDN because:
- 10G requirement
- strict access control (NFS export control)
- no dependency on DNS symmetry
- no overlap with host-connected subnets
## New Network Design

### Design Principles

- Trusted infrastructure stays simple
- SDN is for isolation, not ownership
- DNS never crosses asymmetric paths
- Hypervisor routes must never overlap SDN tenant space
### Final Network Layout

#### VLAN-based (Trusted)

| VLAN | Network | Purpose |
|---|---|---|
| 30 | 172.23.30.0/24 | DNS, AD, infra VMs |
| 96 | 172.23.96.0/24 | Hypervisor mgmt |
| 97 | 172.23.97.0/24 | OOB / iDRAC |
Routing: classic gateway, no VRFs
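A minimal ifupdown2 sketch of that layout, as it would appear on the Proxmox host. The physical port, bridge name, and host address are assumptions for illustration, not the actual configuration:

```
# /etc/network/interfaces (excerpt, illustrative)
auto vmbr0
iface vmbr0 inet manual
	bridge-ports eno1
	bridge-stp off
	bridge-fd 0
	bridge-vlan-aware yes
	bridge-vids 30 96 97

# Hypervisor management address on VLAN 96. VMs attach to vmbr0 with a
# VLAN tag in their NIC config; all inter-VLAN routing happens on the
# upstream gateway, so the host holds no VRFs and no overlapping routes.
auto vmbr0.96
iface vmbr0.96 inet static
	address 172.23.96.21/24
	gateway 172.23.96.1
```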
#### SDN / EVPN

| Network | Purpose |
|---|---|
| Storage-A | NFS bulk |
| Storage-B | VM disks |
| Storage-C | Backup |
Characteristics:
- 10G
- EVPN
- BGP-learned
- no host IP overlap
- no DNS dependency
## Kubernetes Placement

- Runs on trusted VLAN
- Uses DNS from VLAN 30
- No SDN dependency
- Gigabit sufficient
## Final Mental Rule

If it must talk to DNS to function, keep it out of SDN.
## Next Steps

- Remove 172.23.30.0/24 from SDN entirely
- Remove VRF rules for 30
- Validate symmetric routing
- Document SDN usage boundaries
- Consider separate tenant-only Proxmox cluster if needed
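The no-overlap invariant from the redesign can also be asserted mechanically, for example as a small check kept alongside the network documentation. The SDN prefix below is the documented underlay; the storage-network prefixes are not listed in this document, so they would need to be filled in:

```python
import ipaddress

# Host-connected (VLAN) networks from the final layout.
vlan_nets = [
    ipaddress.ip_network("172.23.30.0/24"),  # DNS, AD, infra VMs
    ipaddress.ip_network("172.23.96.0/24"),  # hypervisor mgmt
    ipaddress.ip_network("172.23.97.0/24"),  # OOB / iDRAC
]

# SDN/EVPN prefixes. Only the underlay is documented here; add the
# storage-network prefixes (Storage-A/B/C) when filling this in.
sdn_nets = [
    ipaddress.ip_network("10.0.0.0/24"),     # BGP underlay
]

# Design rule: hypervisor routes must never overlap SDN tenant space.
overlaps = [(a, b) for a in vlan_nets for b in sdn_nets if a.overlaps(b)]
assert not overlaps, f"host/SDN overlap detected: {overlaps}"
print("ok: no host-connected subnet overlaps the SDN fabric")
```

Running this after any topology change would have flagged the 172.23.96.0/24 dual-table situation before it reached production traffic.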
## Closing Note

This was not a misconfiguration. This was a design mismatch between Linux routing semantics and expectations shaped by enterprise SDN platforms.
The correct fix was architectural, not technical.