In my last post, I
explored changing the model of an IPAM application to better describe an entity
called VRF. And while my result is (I'd humbly say) closer to better describing
that entity, I'm not completely satisfied. So, to better understand the thing
I'm trying to model, I've tried to gather some solid information about it.
And, as always, along with some really cool stuff, I've
found that there's also a ton of misinformation and confusion surrounding VRFs,
as well as an aura of mystique, especially in the minds of people new to the
networking field.
These days you'd be
hard pressed to find a service provider network not using VRFs in some form or
another - from Cisco's VRF-lite to full-blown MPLS L3VPNs - and that goes for
many enterprise networks as well. Therefore a good understanding of what VRFs
essentially are, how and when we implement them, and which tradeoffs and
caveats we must be wary of is at least beneficial to a network engineer,
maybe even important. So, I decided to write a couple of posts on VRFs, and try
and throw together some of the answers to the questions that I have struggled
with in the past in trying to see the "big picture" about them.
So, without further
ado, let's start this series from the beginning and ask the two most obvious
questions - what the heck are VRFs and why would we need them anyway?
Mo routers mo problems
You may recall that
a useful way of thinking about network devices in general (and Layer-3 devices
in particular) is to think of them as a combination of three distinct (yet
interconnected and interdependent) entities:
- Control plane;
- Forwarding plane (also called "data plane");
- Management plane.
In the context of a
router, the following functions loosely map to each of the planes:
- Control plane is the part of the router hardware and software (but mostly software) which is in charge of gathering, storing and processing network reachability information, the most important of which are the routes. That's the part that takes the interface addresses and static routes, runs the dynamic routing protocols on that router and takes all of their inputs, and produces the routing table.
- Forwarding (or data) plane is the software and hardware (but mostly hardware) which is programmed by the control plane functions, and is tasked with actually processing (routing) the packets according to the reachability information gathered by the control plane. This is where the "rubber meets the road" - packets are received, dequeued, decapsulated, outgoing interfaces are determined, packets are mangled or dropped or punted, or, if they're lucky - new headers are (re)written, packets are enqueued and sent on their merry way.
- Management plane is everything else that allows us as operators to have both an insight into the box and to interact with it and configure it.
So to see where VRFs
fit in there, we needn't look farther than the acronym itself: VRF (although it
has a long history of being interpreted differently) currently stands for
"Virtual Routing and Forwarding".
In a nutshell, VRF
is a virtualization feature on layer 3 devices which allows us to have several
virtual routers in a single physical router box. More precisely: that is a
feature that allows us to segment the router's control plane into several
virtual instances - that's the "Virtual Routing" part - and do the
same to its forwarding plane (since it is dependent on the control plane) -
that's the "Forwarding" part.
The product of such
segmentation of control plane and forwarding plane is the creation of Layer 3 Virtual Private Networks:
- They're Layer 3 because the segmentation process results in creation of routed (layer 3) networks,
- They're Virtual because we're creating virtual layer 3 topologies on top of shared physical topology,
- They're Private because those virtual topologies (networks) are separate and disjunct from each other (barring any configuration to the contrary).
So it's useful to
keep one thing in mind: VRFs are intrinsically tied to Layer 3 VPNs -
implementation of VRFs creates L3 VPNs, and vice versa - you cannot create a
Layer 3 VPN without configuring VRFs.
Let's answer the
other question - what are those good for? Well, VRFs allow us to:
- Save money - instead of adding another physical device whenever we need to have logically separate routing domains, we can just add another VRF and all our troubles are solved [1] and the management loves us because we're not asking for more budget for new gear. [2]
- Have and operate overlapping address spaces - the "customers" of your network can use the same addresses/subnets in their networks [3], and since you can now place them in different routing domains (VRFs) - you can easily [4] manage and support all [5] of their crazy addressing notions and routing demands.
- Think more logically about the networks - we can keep routing tables smaller when we segment the space, as opposed to a single huge table. [6] That makes them easier to operate on [7] and easier to understand [8].
- Isolate the routing information - which can be a boon if your customers are paranoid about their security [9] and want to keep their routing information private to themselves and subsequently disable access to their infrastructure from your other customers.
VRFs are available
on practically all modern (and not-so-modern) routing platforms - even on Linux hosts. There
are some terminological differences between vendors, so I'll try and list those
for some of the vendors:
- Cisco uses two names for this "splitting-routers-into-independent-virtual-instances" technology, with some overlapping functionality and syntax:
  - VRF-Lite;
  - VRF;
- Juniper uses a concept of "routing-instances", with two types relevant for this discussion (there are others, but they relate to other types of VPNs):
  - virtual-router - analogous to Cisco's VRF-Lite;
  - vrf - for MPLS L3VPNs;
- Ericsson IPOS and SEOS devices use a concept of "context", which is much more general than "regular" VRFs: it describes virtualization not only of the L3 functionality, but also the router's management plane, so with Ericsson "contexts" you get even individually managed virtual devices, as opposed to "classic" VRFs.
- Nokia (ex-Alcatel) routers use a concept of "vprn", which stands for "Virtual Private Routed Network", and it's used for L3 VPNs both with and without MPLS transport.
One more thing that
I keep mentioning here but not clarifying: what is the difference between the
"VRF-Lite" and "just" VRFs? Well, there are quite a few
differences, and we'll cover them in more detail in future posts in this
series, but for now - let's say that the main difference between the two is
MPLS: you can create Layer 3 VPNs using VRF-Lite without
having an MPLS transport network. On the other hand, "full" VRFs need
to have an MPLS transport infrastructure in place.
How do they implement it on the boxes and why should I care?
There are a couple
of ways we could approach this problem. Today in 2018 AD, with virtualization
being so ubiquitous - we could fully virtualize the router. We could go all in
- virtualize all of the functions, including the management plane, and create as
many fully-virtual routers as the hardware permits. While it is a valid
approach - and widely used by many vendors today (Cisco Nexus 7k VDCs, Ericsson
IPOS and SEOS contexts, all sorts of virtual routers and appliances running on
commodity x86 hardware - Juniper vMX and vSRX, Cisco IOS XRv and IOSv, etc.) -
we need to remember that the need for such segmentation was evident way, way
before such "full" virtualization was so commonplace. In those days,
let's say late nineties - virtualization was, of course, not an unknown
concept, but was limited to hardware platforms with an abundance of resources
which could handle the resource hunger of virtualization - think mainframes.
Routing platforms in those days hardly fit the description of hardware which
could afford virtualization as we know it today. So the early designers needed
to adopt a more targeted approach, and hence their decision to focus on
virtualizing only control and forwarding plane functionality.
As previously
mentioned, control plane is usually implemented in the router software - the
routing protocols are run as software processes, and the resulting routing
tables are (usually) kept in RAM. This consequently means that the control
plane lends itself to relatively easy virtualization - we can initialize and
populate additional structures in RAM to hold the new routing tables, and we
can spin up new routing protocol processes to interact with those new routing
tables. That would incur some overhead on the box, but it would be manageable,
provided enough RAM and CPU power was available.
Forwarding plane
presents a bit more of a challenge: it is commonly implemented in hardware,
with a plethora of different approaches to implementing it, and a whole lot of
different hardware elements - TCAM for storing and matching forwarding
information, and all sorts of proprietary packet processors and ASICs in charge
of the packet mangling and forwarding. So, how would we go about introducing
new entities in functions which are so tied to the hardware?
For starters, the
easiest approach would be to throw additional dedicated hardware at the problem
- we could install additional store-and-match-thingies (e.g. TCAM) as well as
forwarding-thingies (e.g. ASICs) into the box to process the traffic for the newly
created routing instances. But the drawbacks of this approach are immediately
obvious:
- Store-and-match-thingies tend to be both expensive and power-hungry - the price of our devices and the price of running them with hardware VRFs would skyrocket;
- The same applies to the forwarding-engine thingies, maybe even more so;
- Scalability is a major issue: how many additional pieces of hardware should we add? Do we add, for example, 5 entities, and support max 5 additional VRFs? What if the customer needs only three? They paid for hardware which can service 5 VRFs and are using only 3 - not really what we would call great value for money. What if they need more? Equally bad.
So it's kind of
obvious that any sort of static and dedicated assignment of hardware elements
to VRFs is not the way to go. Therefore, most router platform designs today
implement a logical sharing of hardware resources (both forwarding databases
and forwarding engines). The forwarding databases are unified across a single
hardware "store-and-match-thingy", but they are programmed by
individual control-plane entities so that their entries contain identifiers
which signify which VRF they refer to, allowing per-VRF lookups and per-VRF
specific forwarding.
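To make the idea of one shared forwarding database with per-VRF identifiers a bit more concrete, here's a minimal Python sketch. It's a toy model and not how any particular TCAM or ASIC actually works: the `SharedFib` class, the numeric VRF IDs and the next-hop labels are all made up for illustration.

```python
import ipaddress


class SharedFib:
    """Toy model of one forwarding database shared by all VRFs.

    Every entry is tagged with a (hypothetical) numeric VRF ID, so the same
    prefix can exist several times without the entries clashing.
    """

    def __init__(self):
        # (vrf_id, network) -> next hop; one structure for the whole box
        self.entries = {}

    def install(self, vrf_id, prefix, next_hop):
        self.entries[(vrf_id, ipaddress.ip_network(prefix))] = next_hop

    def lookup(self, vrf_id, address):
        """Longest-prefix match restricted to entries tagged with vrf_id."""
        addr = ipaddress.ip_address(address)
        best = None
        for (entry_vrf, network), next_hop in self.entries.items():
            if entry_vrf != vrf_id or addr not in network:
                continue
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, next_hop)
        return best[1] if best else None


fib = SharedFib()
fib.install(1, "10.0.0.0/8", "to-customer-a-core")
fib.install(1, "10.1.1.0/24", "to-customer-a-branch")
fib.install(2, "10.1.1.0/24", "to-customer-b-site")

print(fib.lookup(1, "10.1.1.5"))  # longest match among VRF 1 entries
print(fib.lookup(2, "10.1.1.5"))  # same address, looked up in VRF 2
print(fib.lookup(2, "10.2.0.1"))  # VRF 2 has no covering route
```

The point of the sketch is that the VRF ID is simply part of the lookup key: overlapping prefixes coexist in one table, and a lookup in one VRF never sees another VRF's entries.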
It is important to
note that all this implementation-specific stuff has some subtle consequences
of importance not only to network platform designers, but to network engineers
as well: all of this can have and most often does
have an impact on aspects of performance and scalability. So always,
always, always bear in mind: using VRFs - as with many other features and
nerd-knobs - is always an exercise in compromises.
Let me illustrate
that with a single example: the total number of (unicast) routes a certain
platform supports does not change when you implement VRFs. It is a platform
limitation representing the total quantity of (usually) forwarding database
storage available on the box. Don't forget that this limitation does not
magically go away when you partition the routing information into separate
spaces - you will not get the total maximum number of routes per VRF, you get
it for the whole box. Which consequently means that the maximum number of
routes in any single VRF is limited as well, by whatever the other VRFs (and
the global table) already consume.
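Here's a back-of-the-envelope sketch of that point in Python. The numbers are invented: the 1000-route limit, the `RouteBudget` class and the VRF names are purely hypothetical and don't describe any real platform.

```python
class RouteBudget:
    """Toy model: one platform-wide route limit shared by every VRF."""

    def __init__(self, platform_limit):
        self.platform_limit = platform_limit
        self.used = {}  # VRF name -> number of installed routes

    def total_used(self):
        return sum(self.used.values())

    def install(self, vrf, count):
        """Try to install `count` routes into `vrf`; return how many fit."""
        free = self.platform_limit - self.total_used()
        accepted = max(0, min(count, free))
        self.used[vrf] = self.used.get(vrf, 0) + accepted
        return accepted


budget = RouteBudget(platform_limit=1000)  # hypothetical box-wide limit
print(budget.install("global", 400))       # all 400 fit
print(budget.install("customer-a", 500))   # all 500 fit
print(budget.install("customer-b", 500))   # only the remaining 100 fit
```

Real platforms differ in how they react when the table fills up (and some let you cap routes per VRF), so treat this only as an illustration of the shared budget.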
So as always - keep
these compromises in mind when thinking of implementing VRFs and procuring
hardware, and be ready to ask yourself and your vendor support and salespeople:
"How does this $performance_thing or this $limit_thing change when we implement
VRFs?".
How does it work in practice?
So, the first thing
I wanted to know once I started messing with VRFs is - how do the new routing
tables get populated? What ends up in there now?
Let's first see what
ends up in a routing table in a "normal" scenario, when we're not
using VRFs and have just a single (global or default) routing table:
- For each L3 addressed interface (physical or logical) which is "up" - an entry for a connected network will be unconditionally inserted in a routing table (along with a specific /32 prefix in Cisco and Juniper devices, designated as "Local")
- All manually configured static routes which pass muster (e.g. resolvable outgoing interface)
- All routes from routing protocols which pass both protocol criteria ("best route" in that protocol) and successfully compete for installation against other sources (e.g. best AD value)
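The competition between route sources can be sketched in a few lines of Python. This is a simplification under stated assumptions: the distance values follow Cisco's well-known defaults (other vendors use different preference numbers), the `build_routing_table` helper is hypothetical, and the candidates are assumed to have already won the "best route" selection inside their own protocol.

```python
import ipaddress

# Administrative distance per route source (Cisco-style defaults; lower wins).
ADMIN_DISTANCE = {"connected": 0, "static": 1, "ospf": 110, "rip": 120}


def build_routing_table(candidates):
    """Keep, for each prefix, the candidate whose source has the lowest AD.

    `candidates` is a list of (prefix, source, next_hop) tuples.
    """
    table = {}
    for prefix, source, next_hop in candidates:
        network = ipaddress.ip_network(prefix)
        current = table.get(network)
        if current is None or ADMIN_DISTANCE[source] < ADMIN_DISTANCE[current[0]]:
            table[network] = (source, next_hop)
    return table


candidates = [
    ("192.0.2.0/24", "connected", "GigabitEthernet0/0"),
    ("198.51.100.0/24", "rip", "192.0.2.2"),
    ("198.51.100.0/24", "ospf", "192.0.2.3"),
    ("203.0.113.0/24", "static", "192.0.2.4"),
]

for network, (source, next_hop) in build_routing_table(candidates).items():
    print(network, source, next_hop)
```

For 198.51.100.0/24 both RIP and OSPF offer a route, and the OSPF one is installed because its administrative distance is lower.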
It would be great if
we could implement VRFs so that all the above still holds true, so that
engineers working with virtual routing instances could have the same behavior
as before, with some additions. So, let's review what you can generally expect
to see in a (very lite) VRF-lite implementation:
- We must instruct the router to create a new L3 entity called VRF, and provide it a (hopefully meaningful) name which we can use to refer to it. Most routing platforms allow VRFs to be configured separately for IPv4 and IPv6, so when you configure your VRF - you will have to specify whether you want it to be used for IPv4, IPv6 or both. When you configure a VRF, the router will (in most cases [10] ) perform the needed partitioning of control plane and forwarding plane elements, and create the numeric ID for the VRF that we mentioned before. If we're implementing VRFs as part of an MPLS L3VPN solution, we'll need to configure some additional information besides the name and IP protocol version, but we'll go into those when we tackle MPLS L3VPNs.
- We must manually assign/configure L3 interfaces so that they "belong" to a VRF; that way the router knows which numerical ID to assign to their connected routes. Additional syntax is required for this.
- The static route syntax will be modified so that it is clear from the configuration in which routing table we want it to be inserted.
- The routing protocol syntax will be modified so that the router knows under which VRF they are running and into which routing table their "best routes" should attempt to be installed.
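The steps above can be modeled with a small Python sketch. It isn't vendor syntax and doesn't mirror any real router's internals: the `Router` class, its method names, the interface names and the "CUSTOMER-A" VRF are all hypothetical, and routing protocols are left out to keep it short.

```python
import ipaddress


class Router:
    """Toy VRF-lite model: one routing table per VRF plus a default one."""

    def __init__(self):
        self.tables = {"default": {}}  # VRF name -> {network: (source, detail)}
        self.vrf_ids = {"default": 0}  # VRF name -> numeric ID picked by the box

    def create_vrf(self, name):
        self.vrf_ids[name] = len(self.vrf_ids)
        self.tables[name] = {}

    def add_interface(self, ifname, address, vrf="default"):
        """Bind an addressed interface to a VRF; its connected route lands there."""
        iface = ipaddress.ip_interface(address)
        self.tables[vrf][iface.network] = ("connected", ifname)

    def add_static(self, prefix, next_hop, vrf="default"):
        """Static routes must name the VRF whose table they belong to."""
        self.tables[vrf][ipaddress.ip_network(prefix)] = ("static", next_hop)

    def show_route(self, vrf="default"):
        return {str(net): entry for net, entry in self.tables[vrf].items()}


r = Router()
r.create_vrf("CUSTOMER-A")
r.add_interface("eth0", "192.0.2.1/24")  # stays in the default table
r.add_interface("eth1", "10.0.0.1/24", vrf="CUSTOMER-A")
r.add_static("10.20.0.0/16", "10.0.0.2", vrf="CUSTOMER-A")

print(r.show_route())                  # only eth0's connected network
print(r.show_route(vrf="CUSTOMER-A"))  # eth1's network plus the static route
```

Note that every call which touches a route takes a `vrf` argument, and that leaving it out silently means "the default table".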
One more syntactic
consequence of VRF implementation is that once VRFs are in place, all IP commands on the platform necessarily
become "VRF aware". Syntax for displaying the contents of routing or
forwarding tables needs to include a reference to the VRF we want to see, ping
and traceroute commands must specify in which VRF the traffic will be
generated, commands displaying information about routing protocols running in
VRFs must specify the VRF, etc. It's easy to forget this, and then wonder why
the box isn't giving us the information we need, or why ping and traceroute
fail and we seemingly have reachability issues.
Final word of
warning: even though we haven't necessarily messed with the router's management
plane by introducing VRFs, if we've placed some of the router's management
interfaces into a new VRF - we need to think about reachability of the
router-generated and for-router traffic:
- SSH (and hopefully not telnet) access,
- AAA servers,
- NTP servers,
- Syslog/SNMP trap servers,
- NetFlow export targets.
Conclusion
VRFs are a great
tool for segmenting and virtualizing the network. It's a mature solution, with
good support across vendors and platforms, and is unavoidable when we need to
support overlapping address spaces and isolation on a shared hardware
infrastructure - and that's a pretty common requirement for networks these
days, so I'd argue that you, as a network engineer, must know about them. While it's all nice and rosy, there are a couple of
things you should keep in mind:
- Using VRFs implies some limitations - be sure to know what those are when you're planning and implementing them;
- Using VRFs means syntax for many IP-related commands becomes VRF-aware;
- Placing the router's management interfaces into VRFs might change some of the for-router or router-generated traffic patterns, or necessitate some VRF-aware configuration additions.
In the next post,
I'll go over configuring and exploring the possibilities of the litest of VRF-lite,
on Cisco and Juniper devices. Stay tuned!
[1] Disclaimer:
chances are - when you start meddling with VRFs - your troubles have probably
just begun.
[2] It might actually
grow, because vendors often charge extra for VRF support. And - no, you're
still asking for too much money. No, the budget should be optimally negative.
Nah, the business still hates ya and sees you as a cost center. Welcome to networking.
[3] Terms and
conditions apply. The rest of your network architecture might have something to
say about that.
[4] For a given value
of "easily".
[5] For a given value
of "all".
[6] That's usually
marketing. You usually just end up partitioning the space differently, but all
of the scale is still there. But it's hidden. But it looks a bit more tidy. But
it's not.
[7] It's not. You get
additional syntax to remember.
[8] It's not. You get
additional headaches for trying to understand L3 VPNs.
[9] They're probably
not, but the good news is - you have to be for them.
[10] Some older
platforms actually required the platform to be set up and the forwarding plane
elements pre-partitioned by the operator, in advance, in order to support VRFs
- see Cisco
SDM (Switch Database Management) templates, on Catalyst 3560-X or 3750-X
switches.