Author: ccie14023

In Praise of Vendor Lock-In

There is one really nice thing about having a blog whose readership consists mainly of car insurance spambots: I don’t have to feel guilty when I don’t post anything for a while. I had started a series on programmability, but I got sidetracked by the inevitable run-up to Cisco Live that consumes Cisco TMEs, and so that thread got a bit neglected. Meanwhile, an old article by the great Ivan Pepelnjak got me out of post-CL recuperation and back onto the blog. Ivan’s article talks about how vendor lock-in is inevitable. Thank you, Ivan. Allow me to go further, and write a paean in praise of vendor lock-in. Now this might seem predictable given that I work at Cisco, and previously worked at Juniper. Of course, lock-in is very good for the vendor who gets the lock. However, I also spent many years in IT, and also worked at a partner, and I can say from experience that I prefer to manage single-vendor networks. At least, as single-vendor as is possible in a modern network. Two stories will help to illustrate this. In my first full-fledged network engineer job, I managed the network for a large metropolitan newspaper (back when such a thing existed). The previous network team had installed a bunch of Foundry gear. They also had a fair amount of Cisco. It was...


TAC Tales #11: Full up

No customer is happy if they have to reboot one of their Internet-facing routers periodically, and this was one of our biggest customers. (At HTTS, they were all big customers.) This customer had a GSR connecting to the Internet, with partial BGP routes, and he kept getting this error:
%RP-3-ENCAP: Failure to allocate encap table entry, exceeded max number of entries, slot 2
Eventually the router would stop passing traffic and when this happened, he had to reload it. Needless to say, he wasn’t happy. The error came with a traceback, which shows what functions the code was executing when the error was generated. The last function was this:
arp_background(0x5053d290)+0x140
Well, this was obviously some sort of ARP issue. But why was ARP causing the router to stop forwarding traffic? Looking up the error, I found that it meant the route processor was unable to allocate a rewrite entry for the slot 2 line card. As a packet leaves the fabric of a large router like the GSR, the headers are rewritten with the destination layer 2 info. The rewrite table used for this was full. I had the customer run a hidden command a few times, and we could see the table entries incrementing quickly:
Adjacency Table has 3167 adjacencies
Adjacency Table...


Programmability for Network Engineers

Since I finished my series of articles on the CCIE, I thought I would kick off a new series on my current area of focus:  network programmability.  The past year at Cisco, programmability and automation have been my focus, first on Nexus and now on Catalyst switches.  I did do a two-part post on DCNM, a product which I am no longer covering, but it’s well worth a read if you are interested in learning the value of automation. One thing I’ve noticed about this topic is that many of the people working on and explaining programmability have a background in software engineering.  I, on the other hand, approach the subject from the perspective of a network engineer.  I did do some programming when I was younger, in Pascal (showing my age here) and C.  I also did a tiny bit of C++ but not enough to really get comfortable with object-oriented programming.  Regardless, I left programming (now known as “coding”) behind for a long time, and the field has advanced in the meantime.  Because of this, when I explain these concepts I don’t bring the assumptions of a professional software engineer, but assume you, the reader, know nothing either. Thus, it seems logical that in starting out this series, I need to explain what exactly programmability means in the context of network engineering, and what it means to do something programmatically....


TAC Tales #10: Out to Lunch

When you work at TAC, you are required to be “on-shift” for four hours each day. This doesn’t mean that you work four hours a day, just that you are actively taking cases only four hours per day. The other four (or more) hours you work on your existing backlog, calling customers, chasing down engineering for bug fixes, doing recreates, and, if you’re lucky, doing some training on the side. While you are on shift, you still work on the other stuff, but you are responsible for monitoring your “queue” and taking cases as they come in. On our queue we generally liked to have four customer support engineers (CSEs) on shift at any time. Occasionally we had more or fewer, but never fewer than two. We didn’t like to run with two engineers for very long; if a P1 comes in, a CSE can be tied up for hours, unable to deal with the other cases that come in, and the odds are not low that more than one P1 comes in. With all on-shift CSEs tied up, it was up to the duty manager to start paging off-shift engineers as cases came in, never a good thing. If you were ever on hold for a long time with a P1, there is a good chance the call center agent was simply unable to find a CSE because...


Cisco DCNM 10 Overlay Provisioning Part 2

Introduction: My role at Cisco is transitioning to enterprise, so I won’t be working on Nexus switches much any more. I figured it would be a good time to finish this article on DCNM. In my previous article, I talked about DCNM’s overlay provisioning capabilities, and explained the basic structure DCNM uses to describe multi-tenant data centers. In this article, we will look at the details of 802.1q-triggered auto-configuration, as well as VMtracker-triggered auto-configuration. Please be aware that the types of triggers and their behaviors depend on the platform you are using. For example, you cannot do dot1q-based triggers on Nexus 9K, and on Nexus 5K, while you can use VMtracker, it will not prune unneeded VLANs. If you have not read my previous article, please review it so the terminology is clear. Have a look at the topology we will use: The spine switches are not particularly relevant, since they are just passing traffic and not actively involved in the auto-configuration. The Nexus 5K leaves are, of course, and attached to each is an ESXi server. The one on the left has two VMs in two different VLANs, 501 and 502. The 5K will learn about the active hosts via 802.1q triggering. The rightmost host has only one VM, and in this case the switch will learn about the host via VMtracker. In both cases the switches will...
