Site-to-site VPN using IPSec virtual tunnels and BGP

Why?

If you have multiple networks and servers that should be part of a coherent whole, you have to think about how the services on one server can be reached from the other sites. Luckily there are ProxyJump and tailscale for point-to-point connections.

I want to roll out services that can reach each other by just using their IP addresses, though; the goal is a full-blown distributed kubernetes setup spanning all or most of my networks. Expect another article about that soon.

There are two aspects to this: creating the VPN connection and configuring the routing so that data is sent to the correct remote router.

Since kubernetes will be using BGP to distribute routes, it was the obvious choice for routing between the pfSense routers and their networks as well. BGP is actually not that hard.

Instead of using my usual go-to VPN solution (OpenVPN), I chose to try something leaner: IPSec. Usually you run IPSec in a “gateway” fashion where every endpoint knows which networks to route to which peer, but since the goal was kubernetes and BGP, I opted for the “virtual tunnel interface” mode of IPSec.

This is the setup I will be describing in this article. Both pfSense routers have a number of networks attached; I omitted them here for clarity.

In my case it’s two networks, each with a pfSense as router, and one stand-alone VPS. The VPS could of course be any Linux machine.

How?

The connection between the pfSense routers is the easy part: just fill out all boxes symmetrically.


Since my IP address at home is dynamic, I cannot use it as an identifier; instead I chose an arbitrary “Distinguished name”.

For Phase 2 of IPSec I chose the Routed (VTI) option and configured virtual IP addresses for the tunnel endpoints.

I would have liked to configure the pfSense with the fixed IP address to accept the connection from anywhere; then the dynamic IP on the other side of the connection would not have mattered. Unfortunately, setting up the virtual tunnel fails in weird ways when the peer address is unknown. This is why I use a dynamic DNS entry pointing to the dynamic IP as the peer address. When I reboot the pfSense, the IP sometimes changes, which delays the creation of the tunnel until the DNS entry has been updated.


To set up the VPS, I wrote this ansible role. To use it, you need to supply the peer IP, the network to use for the tunnel, any routes to create and, of course, the PSK.

If you read the configuration file, you will see that the traffic selector is set up to put all traffic through the tunnel. Obviously that is not desirable, so what gives? The kernel can also select the traffic destined for an IPSec tunnel using firewall marks, and the configuration file uses mark 158. When creating the tunnel interface, we ask the kernel to tag all packets going through it with the same mark (confusingly called key in the command). That way only the packets routed into the tunnel interface end up being encrypted and sent to the other side.
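The tunnel creation on the Linux side can be sketched roughly as follows. This is not the ansible role itself, just a minimal illustration of the iproute2 commands involved; the function name `vti_commands`, the interface name `vti0` and all addresses are placeholders I made up for the example. Only the mark 158 is taken from the text above.

```python
def vti_commands(name, local_ip, peer_ip, mark, tunnel_cidr, routes):
    """Return the iproute2 invocations for a VTI tunnel as argv lists.

    Nothing is executed here; run the commands as root (for example via
    subprocess.run) once they look right for your setup.
    """
    cmds = [
        # "key" is what ip(8) calls the mark; it must match the mark
        # configured for the IPSec connection (158 in this article).
        ["ip", "tunnel", "add", name, "mode", "vti",
         "local", local_ip, "remote", peer_ip, "key", str(mark)],
        # Virtual address of this end of the tunnel.
        ["ip", "addr", "add", tunnel_cidr, "dev", name],
        ["ip", "link", "set", name, "up"],
    ]
    # Static routes: whatever is routed into the VTI device gets marked
    # and is therefore picked up by the catch-all traffic selector.
    cmds += [["ip", "route", "add", net, "dev", name] for net in routes]
    return cmds


if __name__ == "__main__":
    # All values below are hypothetical documentation addresses.
    for argv in vti_commands("vti0", "203.0.113.10", "198.51.100.20", 158,
                             "10.6.106.2/30", ["192.168.10.0/24"]):
        print(" ".join(argv))
```

Returning the commands instead of running them makes it easy to review what would happen before touching the routing table of a remote machine.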

Setting up BGP was easy using the package “FRR”. The only non-obvious step was setting up a “Route Map” that permits everything. This map is applied to all peerings to allow any route to be pushed.
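On the pfSense all of this is clicked together in the FRR package's GUI, but it may help to see what a permit-everything route map looks like in plain FRR configuration syntax. The sketch below renders such a configuration as text; the helper `frr_bgp_config`, the map name `ALLOW-ALL`, the AS numbers and the addresses are all assumptions for illustration and not taken from my actual setup.

```python
def frr_bgp_config(local_as, router_id, neighbors, route_map="ALLOW-ALL"):
    """Render a minimal FRR BGP configuration as text.

    neighbors maps the peer's tunnel IP to its AS number.
    """
    lines = [
        # A permit entry without any match clause accepts every route.
        f"route-map {route_map} permit 10",
        "!",
        f"router bgp {local_as}",
        f" bgp router-id {router_id}",
    ]
    for ip, remote_as in neighbors.items():
        lines.append(f" neighbor {ip} remote-as {remote_as}")
    lines.append(" address-family ipv4 unicast")
    for ip in neighbors:
        # Apply the map in both directions of every peering.
        lines.append(f"  neighbor {ip} route-map {route_map} in")
        lines.append(f"  neighbor {ip} route-map {route_map} out")
    lines.append(" exit-address-family")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical private AS numbers and tunnel addresses.
    print(frr_bgp_config(65001, "10.6.106.1", {"10.6.106.2": 65002}))
```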

Please note from the linked ansible role that the VPS does not start any BGP daemon but is instead configured with a static list of routes. This is because it will become a kubernetes node and the kubernetes plugin “calico” will handle BGP for that node. The routes are only for bootstrapping a connection to the rest of the cluster after a reboot.

Problems and solutions

I want to close this article with an unordered collection of debugging techniques and how they helped me.

First of all, the error message

14[KNL] <con1000|14> querying policy 0.0.0.0/0|/0 === 0.0.0.0/0|/0 out failed, not found

will be spammed into the pfSense’s IPSec log. According to the internet that is not something to worry about. 🤷

Obviously all traffic over the tunnel is firewalled; all tunnels are handled by the single rule tab “IPSec”. It’s possible to filter by interface within a rule, though.

If the connection sometimes just “stops” and then restarts at a later time, it’s possible that the IKE lifetimes are not consistent between the peers. Both peers must be configured to expect a re-key at the same time.

Reading the log on both sides is important. When one site connects and the other logs something like “no proposal chosen”, that points to a misconfiguration of the ciphers used. Also, using tcpdump makes many problems obvious.
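Both the re-key problem and “no proposal chosen” come down to the two sides not being configured symmetrically, so it can help to write down the settings of both peers and compare them mechanically. Below is a minimal sketch of that idea; the dictionary layout, the function `compare_peers` and the example values are made up for illustration and do not correspond to any pfSense export format.

```python
def compare_peers(a, b):
    """Return a list of human-readable mismatches between two peers."""
    problems = []
    for key in ("ike_proposals", "esp_proposals"):
        # The peers need at least one proposal in common; otherwise the
        # responder typically logs "no proposal chosen".
        if not set(a[key]) & set(b[key]):
            problems.append(f"{key}: no common proposal")
    for key in ("ike_lifetime", "esp_lifetime"):
        # Lifetimes in seconds; both sides should expect the re-key
        # at the same time.
        if a[key] != b[key]:
            problems.append(f"{key}: {a[key]}s vs {b[key]}s")
    return problems


if __name__ == "__main__":
    # Hypothetical settings written down by hand from both GUIs.
    home = {
        "ike_proposals": ["aes256-sha256-modp2048"],
        "esp_proposals": ["aes256-sha256"],
        "ike_lifetime": 28800,
        "esp_lifetime": 3600,
    }
    remote = dict(home, esp_proposals=["aes128-sha1"], ike_lifetime=86400)
    for problem in compare_peers(home, remote):
        print(problem)
```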