GENEVE (Generic Network Virtualization Encapsulation) is a tunnel protocol designed for flexibility: its header carries extensible options in a Type-Length-Value (TLV) format.
The Linux kernel has supported GENEVE since at least version 4.3, but there is little documentation on how to set up a GENEVE tunnel and how to configure its options.
To take advantage of all of the GENEVE tunnel's capabilities, we will use the tc and ip utilities from the iproute2 package. The examples were tested on Ubuntu 20.04 with kernel 5.4 and iproute2 5.5.0.
In this article, we will see how to set up a GENEVE tunnel interface and how to set the GENEVE header on egress traffic and parse it on ingress traffic using the Linux Traffic Control system.
Egress
First, we create the GENEVE tunnel interface:
sudo ip link add name gnv0 type geneve dstport 0 external
sudo ip link set gnv0 up
The first command creates a GENEVE tunnel interface. The external keyword puts it in collect-metadata mode: instead of being fixed on the device, the tunnel parameters of each packet (remote address, VNI, options) are supplied by an external control plane, in our case tc. The second command brings the device up.
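Because ip link add fails when gnv0 already exists, a setup script that may be run more than once can check for the device first. The following is a minimal POSIX sh sketch, assuming the script runs as root (so sudo is dropped); ensure_geneve_dev is a hypothetical helper name, not part of iproute2.

```shell
#!/bin/sh
# Hypothetical helper (not part of iproute2): create the external-mode GENEVE
# device only if it is missing, then bring it up. Assumes it runs as root.
ensure_geneve_dev() {
    dev=${1:-gnv0}
    # "ip link show DEV" exits non-zero when the device does not exist.
    if ! ip link show "$dev" >/dev/null 2>&1; then
        ip link add name "$dev" type geneve dstport 0 external || return 1
    fi
    # Setting an already-up device up again is harmless.
    ip link set "$dev" up
}

# Usage: ensure_geneve_dev gnv0
```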
We can inspect the interface with ip link show:
ubuntu@gnv:~$ ip link show gnv0
3: gnv0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65465 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 0e:a4:0c:8f:71:2e brd ff:ff:ff:ff:ff:ff
The qdisc field of gnv0 is set to noqueue. tc filters are attached to a qdisc, so for tc to perform the GENEVE encapsulation for us, we first need to add a qdisc to the interface with the tc qdisc command:
sudo tc qdisc add dev gnv0 root handle 1: prio
This command makes a prio qdisc with handle 1: the root qdisc of the tunnel interface, so we can now use tc to manipulate egress traffic on it. (Run ip link show gnv0 again and you will see that the qdisc is now prio.)
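The same check can be scripted. This is a minimal sketch that assumes the ip link show output format shown above; link_qdisc is a hypothetical helper that reads that output on stdin and prints the word following qdisc. Listing the qdiscs directly with tc qdisc show dev gnv0 is an alternative.

```shell
#!/bin/sh
# Hypothetical helper: print the qdisc name from `ip link show DEV` output on stdin.
# The first output line contains "... mtu N qdisc NAME state ...".
link_qdisc() {
    sed -n 's/.* qdisc \([^ ]*\) .*/\1/p' | head -n 1
}

# Usage (should print "prio" after the tc qdisc add step above):
#   ip link show gnv0 | link_qdisc
```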
Now let's create a simple tc filter that takes every IPv4 packet (protocol ip) sent to the GENEVE tunnel interface and encapsulates it:
sudo tc filter add dev gnv0 protocol ip parent 1: \
matchall \
action tunnel_key set \
src_ip 172.31.0.11 \
dst_ip 172.31.0.12 \
dst_port 6081 \
id 123456 \
geneve_opts 0FF01:80:123456789 \
pass
The filter command is composed of two elements, a classifier and a list of actions; the actions are performed on packets that match the classification criteria. In the command above the classifier is matchall, so every IPv4 packet leaving gnv0 is processed (flow-based classifiers such as flower can instead filter on inner and outer headers). The action list has a single action, tunnel_key with the set option, which sets the tunnel metadata that gnv0 uses to build the GENEVE header and the outer IP/UDP headers. The resulting packet will have the following fields set:
+----------+--------------------+
| src | 172.31.0.11 |
| dst | 172.31.0.12 |
| dst_port | 6081 |
| vni | 123456 |
| opts | 0FF01:80:123456789 |
+----------+--------------------+
The trailing pass is the action's control verb: it tells tc to let the packet continue without further processing by other filters or actions.
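Per the tc-tunnel_key man page, geneve_opts takes CLASS:TYPE:DATA, where CLASS is a 16-bit and TYPE an 8-bit hexadecimal value and DATA is variable-length hex; several options can be separated by commas. To illustrate what such a string becomes inside the GENEVE header, here is a hedged sh sketch (geneve_opts_to_hex and is_hex are hypothetical helpers) that encodes each option with the RFC 8926 layout: option class, type (its high-order bit marks a critical option, so the 80 above is critical), three reserved bits and a 5-bit length counting the data in 4-byte words, then the data. Because of that length unit, the sketch only accepts data made of whole 4-byte words, so it rejects the nine-digit 123456789 used above; padding the data to a multiple of eight hex digits (for example 12345678) avoids depending on how an odd-length value gets converted.

```shell
#!/bin/sh
# Hypothetical helper: turn a tc geneve_opts string (CLASS:TYPE:DATA[,...]) into
# the option bytes of a GENEVE header (RFC 8926, section 3.5), printed as hex.
# Per option: Class (16 bits) | Type (8 bits) | 3 reserved bits + Length (5 bits) | Data

# Succeed if $1 is a non-empty string of at most $2 hex digits.
is_hex() {
    case $1 in ''|*[!0-9a-fA-F]*) return 1 ;; esac
    [ "${#1}" -le "$2" ]
}

geneve_opts_to_hex() {
    local rest opt class type data words out=""
    rest=$1
    while [ -n "$rest" ]; do
        # Take the next comma-separated option off the list.
        opt=${rest%%,*}
        case $rest in *,*) rest=${rest#*,} ;; *) rest="" ;; esac
        # Each option must be exactly CLASS:TYPE:DATA.
        case $opt in
            *:*:*:*) echo "bad option '$opt'" >&2; return 1 ;;
            *:*:*) ;;
            *) echo "bad option '$opt'" >&2; return 1 ;;
        esac
        class=${opt%%:*}
        data=${opt#*:}
        type=${data%%:*}
        data=${data#*:}
        # Class and type are hex values that must fit in 16 and 8 bits.
        if ! is_hex "$class" 8 || ! is_hex "$type" 8 ||
            [ $((0x$class)) -gt 65535 ] || [ $((0x$type)) -gt 255 ]; then
            echo "bad class or type in '$opt'" >&2
            return 1
        fi
        # Length counts 4-byte words, so data must be 1 to 31 whole words
        # (a multiple of 8 hex digits, at most 248 digits for the 5-bit field).
        if ! is_hex "$data" 248 || [ $(( ${#data} % 8 )) -ne 0 ]; then
            echo "data in '$opt' is not 1..31 whole 4-byte words" >&2
            return 1
        fi
        words=$(( ${#data} / 8 ))
        # Reserved bits stay zero, so the Length byte is just the word count.
        out=$out$(printf '%04x%02x%02x' "$((0x$class))" "$((0x$type))" "$words")
        out=$out$(printf '%s' "$data" | tr 'ABCDEF' 'abcdef')
    done
    printf '%s\n' "$out"
}
```

For example, geneve_opts_to_hex 0102:80:00800022 prints 0102800100800022: class 0102, type 80, length 01 (one word), then the data.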
ARP issues
One problem is that the Linux GENEVE device is a layer 2 (Ethernet) device, so the kernel has to resolve the next hop's MAC address with ARP before it can send packets through the tunnel. One workaround is to assign an IP address to the tunnel device and add a static ARP entry for the next hop with the arp command:
sudo ip addr add 10.0.0.2/30 dev gnv0
sudo arp -s 10.0.0.1 00:00:00:00:00:01
Now, when we add routes through the tunnel interface via 10.0.0.1, the next hop is already resolved by the static ARP entry:
sudo ip route add 1.2.3.4 dev gnv0 via 10.0.0.1
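Since the rest of the setup uses iproute2, the static entry can also be created with ip neigh instead of the older arp tool. A minimal sketch, assuming root and the addresses used above; add_tunnel_neigh is a hypothetical helper. The replace subcommand adds the entry or overwrites an existing one, so reruns do not fail, and nud permanent makes it a static entry that does not age out.

```shell
#!/bin/sh
# Hypothetical helper: install a static neighbor (ARP) entry for the next hop
# behind the tunnel device. Assumes it runs as root.
add_tunnel_neigh() {
    dev=$1 addr=$2 lladdr=$3
    # "replace" adds the entry or updates an existing one, so reruns succeed;
    # "nud permanent" marks it static so it never ages out.
    ip neigh replace "$addr" lladdr "$lladdr" dev "$dev" nud permanent
}

# Usage, equivalent to the arp command above:
#   add_tunnel_neigh gnv0 10.0.0.1 00:00:00:00:00:01
#   ip neigh show dev gnv0
```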
Ingress
To parse incoming GENEVE traffic, we need to create another qdisc, this time for ingress:
sudo tc qdisc add dev gnv0 ingress
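Adding the ingress qdisc a second time fails because it already exists. A hedged, rerunnable sketch, assuming root; ensure_ingress_qdisc is a hypothetical helper, and it assumes tc qdisc show lists the qdisc with its ffff: handle, as in qdisc ingress ffff: parent ffff:fff1.

```shell
#!/bin/sh
# Hypothetical helper: add the ingress qdisc to a device only if it is missing.
# Assumes it runs as root and that `tc qdisc show` lists it as "qdisc ingress ffff: ...".
ensure_ingress_qdisc() {
    dev=${1:-gnv0}
    if ! tc qdisc show dev "$dev" | grep -q '^qdisc ingress ffff:'; then
        tc qdisc add dev "$dev" ingress
    fi
}

# Usage: ensure_ingress_qdisc gnv0
```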
ingress is a special qdisc for processing incoming packets; its handle is ffff:, which is why the filter below uses parent ffff:. Now we can use a filter to parse the incoming traffic:
sudo tc filter add dev gnv0 protocol ip parent ffff: \
flower geneve_opts 0FF01:80:123456789 \
action tunnel_key unset \
action simple sdata "tlv match 0FF01:80:123456789"
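This filter uses the flower classifier, which matches a flow by a set of keys; in the example above, it matches incoming packets whose GENEVE options are 0FF01:80:123456789. This time the tunnel_key action is called with the unset option. The GENEVE header itself has already been removed by gnv0 when the packet was received; unset releases the tunnel metadata (outer addresses, VNI and options) that the device attached to the packet. The second action, simple, prints its sdata string to the kernel log, which can be read with dmesg.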
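One way to check that the ingress filter matched is to count kernel log lines containing the sdata text. A minimal sketch; count_tag_lines is a hypothetical helper that matches only the sdata string, not any prefix the simple action adds, and reading the kernel log may require sudo. Running tc -s filter show dev gnv0 parent ffff: to look at the action counters is another option.

```shell
#!/bin/sh
# Hypothetical helper: count log lines on stdin that contain a fixed string,
# e.g. the sdata text given to the simple action.
count_tag_lines() {
    # -F: literal match, -c: print the count. grep exits 1 when the count
    # is 0, which is not an error here.
    grep -cF -- "$1" || true
}

# Usage:
#   sudo dmesg | count_tag_lines "tlv match 0FF01:80:123456789"
```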