Simplicity, scalability, efficiency, flexibility — who doesn’t want to be able to use those words when talking about their data center? As more and more companies adopt web-scale networking and watch their growth rapidly increase, the need for an equally scalable and powerful solution becomes apparent. Fortunately, Cumulus Networks has a solution. We believe in listening to what our customers want and providing them with what they need; that’s why we support the Facebook Backpack for 64 to 128 ports of 100gig connectivity and the Edge-Core OMP800 for 256 ports of 100gig connectivity. So, what exactly is so great about these chassis? Let’s take a closer, more technical look.
When designing and building out new data centers, customers have universally agreed on spine and leaf networks as the way to go. Easy scale-out by adding more leafs when server racks are added, and more manageable oversubscription by adding more spines, make this design an obvious choice. We at Cumulus have built some of the largest data centers in the world out of one-rack-unit switches: 48 port leafs and 32 port spines. These web-scale data centers build “three-tier” spine and leaf networks using spine and leaf “pods” and an additional layer of “superspines” to connect the pods together.
(A three-tier spine and leaf network with superspines at the top and an example of a single spine and leaf pod)
On the other side of the spectrum you have customers building fewer than 16 racks of compute, who can easily build two-tier spine and leaf networks with two leafs per rack.
If you have more than 16 racks but don’t want to go through the trouble of designing and cabling up a layer of superspines, there haven’t been many options. Now that Cumulus supports the Facebook Backpack and the Edge-Core OMP800, customers can build much larger pods of at least 128 racks, with two leafs per rack. Every front panel port supports any mix of interface speeds: a single 100G or 40G port, or 2x50G, 4x25G or 4x10G breakouts.
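To make the pod sizes above concrete, here is a minimal sketch of the arithmetic. It assumes every leaf takes exactly one port on each spine, so the spine's port count caps the number of leafs in a pod. The function name is made up for illustration.

```python
def racks_per_pod(spine_ports: int, leafs_per_rack: int = 2) -> int:
    """Upper bound on racks in one pod.

    Assumption: each leaf uses one port on every spine, so a spine with
    N ports can serve at most N leafs.
    """
    return spine_ports // leafs_per_rack


if __name__ == "__main__":
    for name, ports in [
        ("32 port one-rack-unit spine", 32),
        ("Facebook Backpack", 128),
        ("Edge-Core OMP800", 256),
    ]:
        print(f"{name}: up to {racks_per_pod(ports)} racks")
    # 32 port one-rack-unit spine: up to 16 racks
    # Facebook Backpack: up to 64 racks
    # Edge-Core OMP800: up to 128 racks
```

Under that assumption, a 32 port spine tops out at 16 racks, which is where the two-tier design runs out of room and the chassis takes over.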
(The Edge-Core OMP800 256 port chassis on the left and the Facebook Backpack 128 port chassis on the right)
Both the Backpack and OMP800 use a spine and leaf network inside the chassis, exactly like a spine-superspine network design, but without all the cabling complexity. What both Facebook and Edge-Core have done is take each pod spine, which sits at the leaf tier of that design, and turn it into a line card, and take each superspine and turn it into a fabric card. Inside the chassis, each element acts like a standalone switch. There is no hidden fabric protocol or secret communication channel between the cards. It’s all ethernet with a fully routed, layer 3 backplane.
(The superspines of the three-tier spine and leaf network become the fabric cards within the chassis, and the pod spines become the line cards of the chassis)
The line card
On both chassis, each line card contains two Broadcom Tomahawk ASICs, each controlling 16 front panel ports. This is the same Tomahawk ASIC used in the 32 port, 100gig switches already offered by Cumulus today. The remaining 16 ports on each ASIC connect into the fabric, giving each ASIC 32 100gig ports: 16 front panel ports that connect to leafs and 16 internal fabric ports. Each ASIC has its own dedicated CPU. With an ASIC and a CPU, each half of the line card acts as a fully independent 32 port switch, running its own copy of Cumulus Linux with its own configuration. This means you can upgrade or reboot half of a line card without impacting the other half.
(This is a line card from the OMP800. The two yellow boxes are the two Tomahawk ASICs, one for each half of the card. At the bottom are the 32 QSFP ports. The connectors at the top provide connectivity to the fabric modules. The two black heatsinks at the top and bottom of the image in the center are the two CPUs, one for each ASIC and each half of the line card)
Inside the fabric, the Backpack and OMP800 do things a little differently. The Backpack has four fabric modules, each with a single Tomahawk ASIC. The OMP800 also has four fabric modules, but puts two Tomahawk ASICs on each module. Just like the line cards, the fabric modules have a dedicated CPU for each ASIC. Again, each ASIC runs its own full copy of Cumulus Linux with its own configuration.
(This is the OMP800 fabric module. The two fabric ASICs are highlighted. There are no QSFP ports, since all connections are internal to the chassis. Just like the line cards, there are two black CPU heatsinks, near the top of the image).
The fabric itself, on both chassis, is fully non-blocking, with 1:1 oversubscription. On the Backpack, there are four ethernet connections from each line card ASIC to each fabric card. On the OMP800, since there are twice as many fabric ASICs, there are only two connections from each line card ASIC to each fabric ASIC.
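The link counts above fall out of simple port arithmetic, sketched here. The per-ASIC port numbers and fabric module counts come from the text. The number of line card ASICs is derived by assuming a fully populated chassis with 16 front panel ports per ASIC, and the class and field names are made up for illustration.

```python
from dataclasses import dataclass

ASIC_PORTS = 32                                  # Tomahawk ports per ASIC
FRONT_PANEL_PORTS = 16                           # per line card ASIC
FABRIC_PORTS = ASIC_PORTS - FRONT_PANEL_PORTS    # fp ports per line card ASIC


@dataclass(frozen=True)
class Chassis:
    name: str
    front_panel_ports: int        # assumes a fully populated chassis
    fabric_modules: int
    asics_per_fabric_module: int

    @property
    def line_card_asics(self) -> int:
        return self.front_panel_ports // FRONT_PANEL_PORTS

    @property
    def fabric_asics(self) -> int:
        return self.fabric_modules * self.asics_per_fabric_module

    @property
    def links_per_fabric_asic(self) -> int:
        """Links from one line card ASIC to one fabric ASIC."""
        return FABRIC_PORTS // self.fabric_asics

    def is_non_blocking(self) -> bool:
        """1:1 check: fabric-facing ports match front panel ports on every
        line card ASIC, and no fabric ASIC needs more ports than it has."""
        uplinks = self.links_per_fabric_asic * self.fabric_asics
        fabric_ports_used = self.line_card_asics * self.links_per_fabric_asic
        return uplinks == FRONT_PANEL_PORTS and fabric_ports_used <= ASIC_PORTS


BACKPACK = Chassis("Backpack", 128, fabric_modules=4, asics_per_fabric_module=1)
OMP800 = Chassis("OMP800", 256, fabric_modules=4, asics_per_fabric_module=2)

if __name__ == "__main__":
    for c in (BACKPACK, OMP800):
        print(c.name, c.links_per_fabric_asic, c.is_non_blocking())
    # Backpack 4 True
    # OMP800 2 True
```

In both cases every fabric ASIC ends up with all 32 of its ports in use, which is why doubling the fabric ASICs on the OMP800 halves the links per ASIC rather than adding capacity per line card.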
The Cumulus touch
The idea of multiple connections and a spine and leaf style topology within a chassis is nothing new, but what’s different with the chassis supported by Cumulus is that each element in the chassis is a fully independent switch. Each line card has normal ethernet connections to each fabric card and they just speak BGP between them!
Logging into an OMP800, line card 1, ASIC 1, we see 32 ethernet interfaces: swp1-16 and fp0-15. Like all Cumulus switches the “swp” ports are the front panel ports; fp0-15 are the internal ethernet ports that connect the line card to the fabric cards. We can see each fabric card (labeled “fc”) as an LLDP peer and 16 way ECMP for a route across the fabric ports.
(Here we see 16 LLDP neighbor entries, one per fp port: two links to each of the eight fabric ASICs.)
(The routing output on the OMP800 shows 16 equal cost routes across each of the fp interfaces.)
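To give a feel for how 16 way ECMP spreads traffic over the fp ports, here is a rough sketch of flow-based next-hop selection. This is not the Tomahawk's actual hash function, which isn't described here. SHA-256 over the 5-tuple is a stand-in chosen only because it is deterministic, and the function name is made up for illustration.

```python
import hashlib

# The 16 internal fabric ports on one line card ASIC, as named in the article.
FABRIC_PORTS = [f"fp{i}" for i in range(16)]


def pick_fabric_port(src_ip, dst_ip, proto, src_port, dst_port,
                     next_hops=FABRIC_PORTS):
    """Map a flow's 5-tuple onto one of the equal cost next hops.

    Illustrative only: real ASICs use their own hardware hash. The point
    is that one flow always lands on the same link, while different
    flows spread across all of them.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    # Vary only the source port to see flows spread over the fabric links.
    for sport in range(49152, 49160):
        port = pick_fabric_port("10.0.0.1", "10.0.1.1", 6, sport, 443)
        print(sport, "->", port)
```

Because every packet of a flow hashes to the same port, packets within a flow stay in order, and a problem on one fabric link only affects the flows that hash onto it.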
Because the backplane is just ethernet, it’s easy to look at interface counters, troubleshoot issues between line cards and fabric cards and understand exactly what link or fabric card will be used for a given packet. No more taking screenshots of the support engineer running secret commands on how to troubleshoot your chassis. You could even SPAN a line card’s fabric ports or ERSPAN any port on the fabric. As a former TAC engineer, I think that’s pretty cool.
(Each fp has its own Rx, Tx and Error counters. You can see how each individual connection is doing and even shut down a specific port if there were problems)
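Since the fp ports are ordinary ethernet interfaces, one way to watch them is through the standard Linux per-interface statistics files. The sketch below assumes the fp ports expose the usual /sys/class/net/<interface>/statistics counters like any other Linux network device; I haven't verified that on a chassis, and the function names are made up for illustration.

```python
from pathlib import Path

COUNTER_FILES = ("rx_packets", "tx_packets", "rx_errors", "tx_errors")


def read_fp_counters(sysfs_root="/sys/class/net", ports=range(16)):
    """Read Rx, Tx and error counters for fp0..fp15.

    Assumption: the fabric ports show up as normal Linux netdevs with
    standard sysfs statistics. Ports that are not present are skipped.
    """
    counters = {}
    for i in ports:
        stats_dir = Path(sysfs_root) / f"fp{i}" / "statistics"
        if not stats_dir.is_dir():
            continue
        counters[f"fp{i}"] = {
            name: int((stats_dir / name).read_text()) for name in COUNTER_FILES
        }
    return counters


def ports_with_errors(counters):
    """Return the fabric ports reporting any Rx or Tx errors."""
    return [
        port for port, c in counters.items()
        if c["rx_errors"] or c["tx_errors"]
    ]


if __name__ == "__main__":
    for port in ports_with_errors(read_fp_counters()):
        print(f"{port} is reporting errors")
```

Run periodically, something like this would point at the one fabric link that needs attention rather than the whole chassis.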
CPU and physical specs
I mentioned earlier that each ASIC has its own CPU, meaning you can upgrade or reboot one ASIC at a time without impacting the rest of the system. On a traditional chassis, all of the line cards and all of the fabric cards share a single, central CPU on the supervisor module. On these traditional chassis architectures, that central supervisor owns and manages the whole system. If you lose your supervisor you lose the system. If you want to upgrade, you hope that a “hitless upgrade” feature like ISSU works, and that’s if your upgrade path is supported. I’m sure we’ve all seen the feature or major upgrade path that isn’t supported with ISSU.
In contrast, with a standalone CPU per ASIC I can upgrade half of a line card or any part of the fabric without worrying. Since it’s just ethernet and just BGP inside, there is no need for a fragile system like ISSU to make everything play nice across software versions.
Finally, looking at the physical aspects, the OMP800 provides 256 ports of connectivity in 10 rack units, saving 14 rack units of space compared with the 24 one-rack-unit switches needed to build the same fabric. Having a chassis instead of 32 port switches also saves 256 cables and 512 optics and over 30% in power draw. The savings on optics alone, we all know, will make the chassis pay for itself!
If you’re considering building more than 16 racks of compute, the new Cumulus open networking chassis seem like a no-brainer. Try out Cumulus technology for free with Cumulus in the Cloud and get your data center revolution started.
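The space, cable and optic numbers can be reproduced with a few lines of arithmetic. This sketch assumes the alternative is a non-blocking two-tier fabric built from one-rack-unit, 32 port switches, with half of each lower-tier switch's ports facing down and half facing up, and two optics per cable. Power draw is not modeled, and the function name is made up for illustration.

```python
SWITCH_PORTS = 32  # ports on a one-rack-unit fixed switch


def fixed_switch_equivalent(front_panel_ports: int, chassis_rack_units: int) -> dict:
    """Compare a chassis with the same fabric built from 32 port switches.

    Assumptions: 1:1 oversubscription (16 ports down, 16 up on each
    lower-tier switch), one rack unit per switch, two optics per cable.
    """
    downlinks_per_switch = SWITCH_PORTS // 2
    lower_tier = front_panel_ports // downlinks_per_switch
    cables = lower_tier * (SWITCH_PORTS - downlinks_per_switch)
    upper_tier = cables // SWITCH_PORTS
    switches = lower_tier + upper_tier
    return {
        "switches": switches,
        "rack_units_saved": switches - chassis_rack_units,
        "cables_saved": cables,
        "optics_saved": 2 * cables,
    }


if __name__ == "__main__":
    print(fixed_switch_equivalent(front_panel_ports=256, chassis_rack_units=10))
    # {'switches': 24, 'rack_units_saved': 14, 'cables_saved': 256, 'optics_saved': 512}
```

The 256 cables are exactly the links that move onto the chassis backplane, which is why they and their 512 optics disappear from the bill of materials.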
<< This article was originally published on blog here. >>