Archives

All posts by ccie14023

Since I finished my series of articles on the CCIE, I thought I would kick off a new series on my current area of focus:  network programmability.  The past year at Cisco, programmability and automation have been my focus, first on Nexus and now on Catalyst switches.  I did do a two-part post on DCNM, a product which I am no longer covering, but it’s well worth a read if you are interested in learning the value of automation.

One thing I’ve noticed about this topic is that many of the people working on and explaining programmability have a background in software engineering.  I, on the other hand, approach the subject from the perspective of a network engineer.  I did do some programming when I was younger, in Pascal (showing my age here) and C.  I also did a tiny bit of C++ but not enough to really get comfortable with object-oriented programming.  Regardless, I left programming (now known as “coding”) behind for a long time, and the field has advanced in the meantime.  Because of this, when I explain these concepts I don’t bring the assumptions of a professional software engineer, but assume you, the reader, know nothing either.

Thus, it seems logical that in starting out this series, I need to explain what exactly programmability means in the context of network engineering, and what it means to do something programmatically.

Programmability simply means the capacity for a network device to be configured and managed by a computer program, as opposed to being configured and managed directly by humans.  This is a broad definition, but technically using an interface like Expect (really just CLI) or SNMP qualifies as a type of programmability.  Thus, we can qualify this by saying that programmability in today’s world includes the design of interfaces that are optimized for machine-to-machine control.

To manage a network device programmatically really just means using a computer program to control that network device.  However, when we consider a computer program, it has certain characteristics over and above simply controlling a device.  Essential to programming is the design of control structures that make decisions based on certain pieces of information.

Thus, we could use NETCONF to simply push configuration to a router or switch, but this isn’t the most optimal reason to use it.  It would be a far more effective use of NETCONF if we read some piece of data from the device (say interface errors) and took an action based on that data (say, shutting the interface down when the counters got too high.)  The other advantage of programmability is the ability to tie together multiple systems.  For example, we could read a device inventory out of APIC-EM, and then push config to devices based on the device type.  In other words, the decision-making capability of programmability is most important.

Network programmability encompasses a number of technologies:

  • Day 0 technologies to bring network devices up with an appropriate software version and configuration, with little to no human intervention.  Examples:  ZTP, PoAP, PnP.
  • Technologies to push and read configuration and operational data from devices.  Examples:  SNMP, NETCONF.
  • Automation systems such as Puppet, Chef, and Ansible, which are not strictly programming languages, but allow for configuration of numerous devices based on the role of the device.
  • The use of external programming languages, such as Ruby and Python, to interact with network devices.
  • The use of on-box programming technologies, such as on-box Python and EEM, to control network devices.

In this series of articles we will cover all of these topics as well as the mysteries of data models, YANG, YAML, JSON, XML, etc., all within the context of network engineering.  I know when I first encountered YANG and data models, I was quite confused and I hope I clear up some of this confusion.

1
1

When you work at TAC, you are required to be “on-shift” for 4 hours each day.  This doesn’t mean that you work four hours a day, just that you are actively taking cases only four hours per day.  The other four (or more) hours you work on your existing backlog, calling customers, chasing down engineering for bug fixes, doing recreates, and, if you’re lucky, doing some training on the side.  While you were on shift, you would still work on the other stuff, but you were responsible for monitoring your “queue” and taking cases as they came in.  On our queue we generally liked to have four customer support engineers (CSE’s) on shift at any time.  Occasionally we had more or less, but never less than two.  We didn’t like to run with two engineers for very long;  if a P1 comes in, a CSE can be tied up for hours unable to deal with the other cases that come in, and the odds are not low that more than one P1 come in.  With all CSE’s on-shift tied up, it was up to the duty manager to start paging off-shift engineers as cases came in, never a good thing.  If ever you were on hold for a long time with a P1, there is a good chance the call center agent was simply unable to find a CSE because they were all tied up.  Sometimes it was due to bad planning, sometimes lack of staff.  Sometimes you would start a shift with five CSE’s on the queue and they’d all get on P1’s in the first five minutes.  The queue was always unpredictable.

At TAC, when you were on-shift, you could never be far from your desk.  You were expected to stay put, and if you had to get up to use the bathroom or go to the lab, you notified your fellow on-shift engineers so they knew you wouldn’t be available.  Since I preferred the 10am-2pm shift, in 2 years I took lunch away from my desk maybe 5 times.  Most days I told the other guys I was stepping out, ran to the cafeteria, and ran back to my desk to eat while taking cases.

Thus, I was quite happy one day when I had a later shift and my colleague Delvin showed up at my desk and asked if I wanted to go to a nice Chinese lunch.  Eddy Lau, one of our new CSE’s who had recently emigrated from China, had found an excellent and authentic restaurant.  We hopped into Eddy’s car and drove over to the restaurant, where we proceeded to have a two-hour long Chinese feast.  “It’s so great to actually go to lunch,” I said to Delvin, “since I eat at my desk every day.”  Eddy was happy to help his new colleagues out.

As we were driving back, Devlin asked Eddy, “When are you on shift?”

“Right now,” said Eddy.

“You’re on shift now?!” Delvin asked incredulously.  “Dude, you can’t leave for two hours if you’re on shift.  Who are you on shift with?”

“Just me and Sarah,” Eddy said, not really comprehending the situation.

“You left Sarah on shift by herself?!” Devlin asked.  “What if a P1 comes in?  What if she gets swamped by cases?  You can’t leave someone alone on the queue!”

We hurried back to the office and pulled up WebMonitor, which showed not only active cases, but who had taken cases that shift and how many.  Sarah had taken a single case.  By some amazing stroke of luck, it had been a very quiet shift.

I walked by Eddy’s desk and he gave me a thumbs up and a big smile.  I figured he wouldn’t last long.  A couple months later, after blowing a case, Eddy got put on RMA duty and subsequently quit.

If you ever wonder why you had to wait so long on the phone, it could be a busy day.  Or it could be your CSE’s decided to take a long lunch without telling anyone.

Introduction

My role at Cisco is transitioning to enterprise so I won’t be working on Nexus switches much any more.  I figured it would be a good time to finish this article on DCNM.  In my previous article, I talked about DCNM’s overlay provisioning capabilities, and explained the basic structure DCNM uses to describe multi-tenancy data centers.  In this article, we will look at the details of 802.1q-triggered auto-configuration, as well as VMtracker-based triggered auto-configuration.  Please be aware that the types of triggers and their behaviors depends on the platform you are using.  For example, you cannot do dot1q-based triggers on Nexus 9k, and on Nexus 5k, while I can use VMTracker, it will not prune unneeded VLANs.  If you have not read my previous article, please review it so the terminology is clear.

Have a look at the topology we will use:

autoconfig

The spine switches are not particularly relevant, since they are just passing traffic and not actively involved in the auto-configuration.  The Nexus 5K leaves are, of course, and attached to each is an ESXi server.  The one on the left has two VMs in two different VLANs, 501 and 502.  The 5k will learn about the active hosts via 802.1q triggering.  The rightmost host has only one VM, and in this case the switch will learn about the host via VMtracker.  In both cases the switches will provision the required configuration in response to the workloads, without manual intervention, pulling their configs from DCNM as described in part 1.

Underlay

Because we are focused on overlay provisioning, I won’t go through the underlay piece in detail.  However, when you set up the underlay, you need to configure some parameters that will be used by the overlay.  Since you are using DCNM, I’m assuming you’ll be using the Power-on Auto-Provision feature, which allows a switch to get its configuration on bootup without human intervention.

config-fabric

Recall that a fabric is the highest level construct we have in DCNM.  The fabric is a collection of switches running an encapsulation like VXLAN or FabricPath together.  Before we create any PoAP definitions, we need to set up a fabric.  During the definition of the fabric, we choose the type of provisioning we want.  Since we are doing auto-config, we choose this option as our Fabric Provision Mode.  The previous article describes the Top Down option.

Next, we need to build our PoAP definitions.  Each switch that is configured via PoAP needs a definition, which tells DCNM what software image and what configuration to push.  This is done from the Configure->PoAP->PoAP Definitions section of DCNM.  Because generating a lot of PoAP defs for a large fabric is tedious, DCNM10 also allows you to build a fabric plan, where you specify the overall parameters for your fabric and then DCNM generates the PoAP definitions automatically, incrementing variables such as management IP address for you.  We won’t cover fabric plans here, but if you go that route the auto-config piece is basically the same.config-poap-defs

Once we are in the PoAP definition for the individual switch, we can enable auto-configuration and select the type we want.

poap-def

In this case I have only enabled the 802.1q trigger.  If I want to enable VMTracker, I just check the box for it and enter my vCenter server IP address and credentials in the box below.  I won’t show the interface configuration, but please note that it is very important that you choose the correct access interfaces in the PoAP defs.  As we will see, DCNM will add some commands under the interfaces to make the auto-config work.

Once the switch has been powered on and has pulled down its configuration, you will see the relevant config under the interfaces:

n5672-1# sh run int e1/33
interface Ethernet1/33
switchport mode trunk
encapsulation dynamic dot1q
spanning-tree port type edge trunk

If the encapsulation command is not there, auto-config will not work.

Overlay Definition

Remember from the previous article that, after we define the Fabric, we need to define the Organization (Tenant), the Partition (VRF), and then the Network.  Defining the organization is quite easy: just navigate to the organizations screen, click the plus button, and give it a name.  You may only have one tenant in your data center, but if you have more than one you can define them here.  (I am using extremely creative and non-trademark-violating names here.)  Be sure to pick the correct Fabric name in the drop-down at the top of the screen;  often when you don’t see what you are expecting in DCNM, it is because you are not on the correct fabric.

config-organization

Next, we need to add the partition, which is DCNM’s name for a VRF.  Remember, we are talking about mutlitenancy here.  Not only do we have the option to create multiple tenants, but each tenant can have multiple VRFs.  Adding a VRF is just about as easy as adding an organization.  DCNM does have a number of profiles that can be used to build the VRFs, but for most VXLAN fabrics, the default EVPN profile is fine.  You only need to enter the VRF name.  The partition ID is already populated for you, and there is no need to change it.

partition

There is something important to note in the above screen shot.  The name given to the VRF is prepended with the name of the organization.  This is because the switches themselves have no concept of organization.  By prepending the org name to the VRF, you can easily reuse VRF names in different organizations without risk of conflict on the switch.

Finally, let’s provision the network.  This is where most of the configuration happens.  Under the same LAN Fabric Automation menu we saw above, navigate to Networks.  As before, we need to pick a profile, but the default is fine for most layer 3 cases.

network

Once we specify the organization and partition that we already created, we tell DCNM the gateway address.  This is the Anycast gateway address that will be configured on any switch that has a host in this VLAN.  Remember that in VXLAN/EVPN, each leaf switch acts as a default gateway for the VLANs it serves.  We also specify the VLAN ID, of course.

Once this is saved, the profile is in DCNM and ready to go.  Unlike with the underlay config, nothing is actually deployed on the switch at this point.  The config is just sitting in DCNM, waiting for a workload to become active that requires it.  If no workload requires the configuration we specified, it will never make it to a switch.  And, if switch-1 requires the config while switch-2 does not, well, switch-2 will never get it.  This is the power of auto-configuration.  It’s entirely likely that when you are configuring your data center switches by hand, you don’t configure VLANs on switches that don’t require them, but you have to figure that out yourself.  With auto-config, we just deploy as needed.

Let’s take a step back and review what we have done:

  1. We have told DCNM to enable 802.1q triggering for switches that are configured with auto-provisioning.
  2. We have created an organization and partition for our new network.
  3. We have told DCNM what configuration that network requires to support it.

Auto-Config in Action

Now that we’ve set DCNM up, let’s look at the switches.  First of all, I verify that there is no VRF or SVI configuration for this partition and network:


jemclaug-hh14-n5672-1# sh vrf all
VRF-Name VRF-ID State Reason
default 1 Up --
management 2 Up --

jemclaug-hh14-n5672-1# sh ip int brief vrf all | i 192.168
jemclaug-hh14-n5672-1#

We can see here that there is no VRF other than the default and management VRFs, and there are no SVI’s with the 192.168.x.x prefix. Now I start a ping from my VM1, which you will recall is connected to this switch:

jeffmc@ABC-VM1:~$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=9 ttl=255 time=0.794 ms
64 bytes from 192.168.1.1: icmp_seq=10 ttl=255 time=0.741 ms
64 bytes from 192.168.1.1: icmp_seq=11 ttl=255 time=0.683 ms

Notice from the output that the first ping I get back is sequence #9. Back on the switch:

jemclaug-hh14-n5672-1# sh vrf all
VRF-Name VRF-ID State Reason
ABCCorp:VRF1 4 Up --
default 1 Up --
management 2 Up --
jemclaug-hh14-n5672-1# sh ip int brief vrf all | i 192.168
Vlan501 192.168.1.1 protocol-up/link-up/admin-up
jemclaug-hh14-n5672-1#

Now we have a VRF and an SVI! As I stated before, the switch itself has no concept of organization, which is really just a tag DCNM applies to the front of the VRF. If I had created a VRF1 in the XYZCorp organization, the switch would not see it as a conflict because it would be XYZCorp:VRF1 instead of ABCCorp:VRF1.

If we want to look at the SVI configuration, we need to use the expand-port-profile option. The profile pulled down from DCNM is not shown in the running config:


jemclaug-hh14-n5672-1# sh run int vlan 501 expand-port-profile

interface Vlan501
no shutdown
vrf member ABCCorp:VRF1
ip address 192.168.1.1/24 tag 12345
fabric forwarding mode anycast-gateway

VMTracker

Let’s have a quick look at VMTracker. As I mentioned in this blog and previous one, dot1q triggering requires the host to actually send data before it auto-configures the switch. The nice thing about VMTracker is that it will configure the switch when a VM becomes active, regardless of whether it is actually sending data. The switch itself is configured with the address of and credentials for your vCenter server, so it becomes aware when a workload is active.

Note: Earlier I said you have to configure the vCenter address and credentials in DCNM. Don’t be confused! DCNM is not talking to vCenter, the Nexus switch actually is. You only put it in DCNM if you are using Power-on Auto-Provisioning. In other words, DCNM will not establish a connection to vCenter, but will push the address and credentials down to the switch, and the switch establishes the connection.

We can see the VMTracker configuration on the second Nexus 5K:


jemclaug-hh14-n5672-2(config-vmt-conn)# sh run | sec vmtracker
feature vmtracker
encapsulation dynamic vmtracker
vmtracker fabric auto-config
vmtracker connection vc
remote ip address 172.26.244.120
username administrator@vsphere.local password 5 Qxz!12345
connect

The feature is enabled, and the “encapsulation dynamic vmtracker” command is applied to the relevant interfaces. (You can see the command here, but because I used the “| sec” option to view the config, you cannot see what interface it is applied under. We can see that I also supplied the vCenter IP address and login credentials. (The password is sort-of encrypted.) Notice also the connect statement. The Nexus will not connect to the vCenter server until this is applied. Now we can look at the vCenter connection:

jemclaug-hh14-n5672-2(config-vmt-conn)# sh vmtracker status
Connection Host/IP status
-----------------------------------------------------------------------------
vc 172.26.244.120 Connected

We have connected successfully!

As with dot1q triggering, there is no VRF or SVI configured yet for our host:

jemclaug-hh14-n5672-2# sh vrf all
VRF-Name VRF-ID State Reason
default 1 Up --
management 2 Up --
jemclaug-hh14-n5672-2# sh ip int brief vrf all | i 192.168

We now go to vSphere (or vCenter) and power up the VM connected to this switch:

 

vcenter-power-on

Once we bring up the VM, we can see the switch has been notified, and the VRF has been automatically provisioned, along with the SVI.

jemclaug-hh14-n5672-2# sh vmtracker event-history | i VM2
761412 Dec 21 2016 13:43:02:572793 ABC-VM2 on 172.26.244.177 in DC4 is powered on
jemclaug-hh14-n5672-2# sh vrf all | i ABC
ABCCorp:VRF1 4 Up --
jemclaug-hh14-n5672-2# sh ip int brief vrf all | i 192
Vlan501 192.168.1.1 protocol-up/link-up/admin-up
jemclaug-hh14-n5672-2#

Thus, we had the same effect as with dot1q triggering, but we didn’t need to wait for traffic!

I hope these articles have been helpful. Much of the documentation on DCNM right now is not in the form of a walk-through, and while I don’t offer as much detail, hopefully these articles should get you started. Remember, with DCNM, you get advanced features free for 30 days, so go ahead and download and play with it.

In the final post in my “Ten Years a CCIE” series, I take a look at the age-old question: Is a CCIE really worth it? I conclude the series with some thoughts on the value of this journey.

I’ve written this series of posts in the hope that others considering the pursuit of a CCIE  would have some idea of the process, the struggles, and the reward of passing this notorious exam.  I would like to finish this series with a final article examining the age-old question:  What is the value of a CCIE?  In other words, was it worth it?  All of the other articles in the series were written a couple of years ago, back when I worked for Juniper in IT, with only some slight revisions before publishing.  However, this article is being written now (late 2016), in very different circumstances.  I work at Cisco now, not Juniper.  I work in the switching business unit, not in IT.  I work in product management, and as a Principal Engineer have direct influence on the direction of our products.  I specifically work on programmability, automation, and SDN solutions.  As I write this, I see many potential CCIE candidates wondering if it is even worth pursuing a career in networking any more.  After all, won’t SDN and automation just eliminate their jobs? Wouldn’t they be better off learning Python?  Isn’t Cisco a company in its death throes, facing extinction at the hands of upstarts like Arista and Viptela?

Well, I can’t predict the future.  But I can look at the issues from the perspective of someone who has been in the industry a long time now, and at least set your mind at ease.  The short answer:  it’s worth it.  The longer answer requires us to examine the question from a few different angles.

Technical Knowledge

As I’ve pointed out in previous articles, during the CCIE preparation process, you will master a vast amount of material, if you approach the test honestly.  But, as anyone in this industry knows, the second you pass, a timer starts counting down the value of the material you learned.  On my R&S exam (2004), I studied DLSw+, ATM, ISDN, and Frame Relay.  On my Security (2008) exam, I studied the VPN 3k concentrator, PIX, and NAC Framework.  How much of that is valuable now?

True, I studied many things that are still in use today.  OSPF, BGP, EIGRP, and ISIS were all heavily tested on the R&S exam even back in 2004.  But how much of it do I remember?  I certainly have a good knowledge of each of them, but even after taking the JNCIE-SP exam less than two years ago, I find my knowledge of the details of routing protocols fading away.  If I were to take the JNCIE exam today, I would certainly fail.

Thus, some technologies I learned are obsolete, and some are not, but I have forgotten a lot.  However, there are also technologies that I never learned.  For example, ISE and Cisco TrustSec are huge topics that I am just starting to play with.  Despite my newness to these technologies, I do have the same right to call myself a “Security CCIE” as a guy who passed it yesterday, and knows those subjects cold!

So, then, what about the value of the technical knowledge?  Is it worth it?

First, it is because no matter what becomes obsolete, and no matter what you forget, the intensity of your study will guarantee you still have a core of knowledge that will be with you for a long time.  Fortunately, despite the rapidity of change our industry is supposedly undergoing, networking is conservative by nature.  The Internet is a large distributed system consisting of many systems running disparate operating systems and they have to work together.  Just ask the guys pushing IPv6.  The core of networking doesn’t change very much.  You will also learn that even new technologies, like VXLAN, build on concepts you already know.

Second, even obsolete knowledge is valuable.  Newer engineers often don’t realize how certain technologies developed, or why we do things the way that we do.  They don’t know the things that have been tried and which failed in the past.  Despite the premium our industry places on youth, old-timers do have important insight gleaned from playing with things like, say, DLSw+.  It’s important for us not to simply wax nostalgic (as I am here) about the glorious days of IPX, but to explain to younger engineers how these dinosaurs actually worked, so that they can understand what decisions protocol designers made and why, and why some technologies fell into disuse or were replaced.  And, millennials, it is your responsibility to sit up and listen, if not to seek out such information.

Technical “soft skills”

In addition to specific technical knowledge, CCIE preparation simply makes you better at configuring and troubleshooting in general.  The hours of frustration in the lab and the broken configs that have you pulling your hair out ultimately help you to learn what makes a network break, and how to go about systematically debugging it.  Certainly you can learn those skills elsewhere, but when you are faced with the challenge of building a lab in a short amount of time, you have to be able to think quickly on your feet.  Even troubleshooting a live network outage doesn’t quite compare the intensity of the CCIE crucible, the eight hour slog during which one mistake can cost you months of studying.  The exams where I took multiple attempts required me to seriously up my game between tries, and each time I’ve come out a stronger thinker.

Non-Technical Skills

Passing the CCIE exam honestly is a challenge, plain and simple.  It requires discipline, persistence, extreme attention to detail, resourcefulness, ability to read documentation carefully, and time management skills.  While you won’t pass the CCIE without having those skills to begin with, there is no question that the rigors of CCIE preparation will help to hone them.  You will be a better person all around if you submit yourself to the discipline of passing the exam honestly.

It’s a bit like studying the martial arts.  Most martial artists will tell you that, in addition to being able to deliver a mean axe kick to the head, they have achieved self-discipline and confidence from their practice of martial arts, and that they find these skills apply elsewhere in life.  It’s true of any rigorous pursuit, really, and definitely true of the CCIE.  Personally I have no doubt that the many hours of laborious study have made me a more detail-oriented and better engineer.

Employment value:  Employee’s perspective

In my own case, my CCIE certification was directly responsible for my getting hired at Cisco TAC in 2005.  It was absolutely essential for me to move to a VAR in 2007.  Believe it or not, it was critical in my getting a job at Juniper in 2009, and having two CCIE’s and a JNCIE helped open the door to be re-hired at Cisco in 2015.  Obviously, employers see value in it.

You still see CCIE certification listed as a requirement in many job descriptions.  Without one you are cut out of those positions.  Many employers see it as a key differentiator and qualification for a senior position in network engineering.  VARs are still required to have a number of CCIEs on staff, so for some positions they will only talk to someone who has a CCIE.

Rarely is it the only qualification, however, and rarely is it enough, on its own, to get you hired.  Shortly after I passed my second CCIE I got laid off from the VAR where I worked.  I ended up interviewing at Nexus IS (now Dimension Data), another VAR, a couple months later.  Having been out of work I was rusty, and the technical interviewer grilled me mercilessly.  I’ve stated in the past that I don’t find this sort of thing productive, and I was uncomfortable, and didn’t have a great performance.  Despite the fact that I had two CCIE’s, and despite the value of such credentials for a reseller, I got a call from HR in a few days telling me that they chose not to hire me.  (These things often work out in the end;  I am in a good place now, and I wouldn’t have made it here if I had stayed in the VAR world.)  The entire experience reinforces a point I made in my Cheaters post:  don’t think an ugly plaque and a number are a guarantee of employment.

That said, there is little doubt that a person with X skill-set and a CCIE is in a better place than a person with just X skill-set.  As I said above, the certification simply opens doors.  You cannot rely on it alone, but combined with good experience, the certification is invaluable.  I would never have made it this far without one.  Now it’s true that there are some very talented and knowledgeable engineers who don’t have, and in some cases disdain, the CCIE.  Many of these folks do quite well and have successful careers.  But again, add the famous number and you will always do better in the end.

Employment value:  Employer’s perspective

Let’s turn it around now, and look at it from the perspective of a hiring manager.  When I have the stack of resumes in front of me, how important is it for me to look for that famous CCIE number?

This is a much harder question to answer, and I’ve touched on some of the problems of the CCIE above.  If you are looking at somebody who has a CCIE #1xxx, you know you are getting someone with age and experience.  But if you’re looking for someone to do hands-on implementation work, will this be the right candidate?  Maybe, of course, if Mr. #1xxx has been doing a lot of that lately.  The point is, you can’t tell from the number alone.

Now let’s say you are looking at CCIE #50xxx.  Does her CCIE number mean she will be the perfect fit for the hands-on job?  Quite possibly, because she has a lot of recent hands-on experience.  Will she be unqualified for the management role because she lacks the experience of CCIE #1xxx?  Who knows?

The point is, from a hiring manager’s perspective, the CCIE is simply one piece of data in the overall picture of the candidate.  It tells you something, something important, but it’s not nearly enough to be sure you are picking the right person.  If you are hiring for a VAR and just need a number to get your Gold status, then the value of the CCIE jumps up a little higher.  If you are just hiring a network engineer for IT, you need to take a host of other factors into account.

At the end of the day, I hope that people don’t make hiring decisions based on that one criterion.  I had one job where we hired in a CCIE (against my advice) because the hiring manager believed anyone in possession of a CCIE number was a genius.  (See “The CCIE Mystique”, here.)  The guy was a disaster.  Did he pass by cheating, or was he just good at taking tests?  I don’t know.  I’ve also known many, many extremely smart and talented network engineers who never got their CCIE, including several who just couldn’t pass it.  Keep that in mind when you are getting overly impressed with the famous four letters.

SDN and the New World Order

I would like to conclude this post, and this series, with a few thoughts on the relevance of CCIE certification in the future.  Various people in the industry, most of whom hold MBA’s in finance or marketing, tell us that CLI is dead and that the Ciscos and Junipers will be replaced by cheap, “white-label” hardware.  Google and Facebook are managing thousands of network devices with a handful of staff, using scripting and automation.  A CCIE certification is a waste of time, according to this thinking, because Cisco is spiraling down into oblivion.  Soon, it will all be code.  Learn Python instead.

I’d like to present my thoughts with a caveat.  I’m not always a great technology prognosticator.  When a friend showed me a web browser for the first time in 1994, I told him this Web thing would never take off.  Oops.  However, I have a lot of experience in the industry, and right now my focus at Cisco is programmability and automation.  I think I have a right to an opinion here.

First of all, there is no question that interest in automation and programmability is increasing.  All of the vendors, including my employer, are devoting a lot of resources towards developing and increasing their programmability and automation capabilities.  I’ve spent my first year here at Cisco working on automating network devices with Puppet, Ansible, Python, and our own tools like DCNM.  We’ve been working hard to build out YANG data models for our features.  I barely touched Expect scripting before I came here!  Customers are interested in managing their networks more efficiently, sadly, sometimes with fewer people.

Let me look at this from a wider perspective.  As computing power increases, are humans redundant?  For example, as a pilot I know that it’s entirely possible to replace human pilots with computers.  Air Traffic Control could relay instructions digitally to the computers that now control most airplanes.  With ILS and precision approaches, airplanes could land themselves at most airports.  But would you get into an airplane that had no human pilots?

The problem is, even in the age of Watson, computers react predictably to predictable circumstances, but predictably badly to unpredictable circumstances.  Many of the alleged errors that are introduced with CLI will still be introduced with NETCONF when the operator puts in the wrong data.  Scripts and automation tools have to power to replicate errors across huge numbers of devices faster than CLI.

At the end of the day, you still need to know what it is that you’re automating.  Networks cannot go away.  If all the Cisco and Juniper boxes out there vanished suddenly, our digital world would come to a screeching halt.  We still need people who understand what switches, routers, and firewalls do, regardless of whether they are managing them with CLI or Python.  Look at the cockpit of a modern airplane.  The old dial gauges have been replaced by flat-panel displays.  Do you think pilots no longer need to study weather systems, aerodynamics, and engine operation?  Of course they do!  Just the same, we need network engineers, a lot of them, to make networks work correctly.

As to whether Cisco itself goes away, who knows?  Obviously returning here, I have a great deal of faith in the company and its ability to execute.  I think that a lot of the industry hype is just hype.  Sure, some of the largest network operators will displace our hardware with Open Compute boxes.  Sure, it will hurt us.  But we are still in most networks and I don’t see that changing for a long time.

Closing thoughts, and a postscript

I began this series with an article on what I called, “The CCIE Mystique.”  I don’t know if that quite exists the same way it did in 2001 when I started this journey, but I suspect it is still there a least a little bit.  Those of us who have been through it have had the veil pulled back a bit, and we see it for what it is:  a very hard, but ultimately very surmountable challenge.  A test that measures some, but not even close to all of the skill required to be a network engineer.  A ticket to a better career, but not a guarantee of one.

Being back at Cisco has been an amazing experience.  This is the company that started my career, and that has built an entire industry.  I’d be happy to finish things out here, but I have no idea in this age of mass layoffs whether that will be possible.

I had an interesting chat with the head of the CCIE route/switch lab curriculum not long ago.  I must say, that for whatever complaints I have about the program, I was quite impressed with him and his awareness of where the program is strong and where it is lacking.  I have no doubt that those guys work very hard on the CCIE program and it’s always easier to sit on the sidelines and complain.  Ten years ago, when I started this journey, I wouldn’t have believed I would be at Cisco, helping to develop and market products, with a lab stocked full of gear, chatting with a CCIE proctor.  Everyone’s path is different, and I certainly cannot promise my readers that investing in one certification will land them a pot of gold.  There were a lot of other factors in my career trajectory.  But at the end of a decade (actually 12 years as I write this), I can say that the time, money, and effort spent in pursuit of the elusive CCIE was well worth it.

As I close out this series I hope that my little autobiographical exclusion was not too self-indulgent.  I hope that those of you who are new to the industry or studying for the exam got some insight from the articles.  I hope that any other old CCIE’s who stopped by relived a few memories.  Thanks for reading the series, and good luck on your studies!

Ivan Peplnjak referenced a piece by Robert Graham called “Why cybersecurity certifications suck,” which I found amusing considering that I am doing some question writing right now.  I’ve certainly been a member of the chorus of complainers in the past, in various venues, but I have to say that my experience on the writing end of things has changed my perspective a bit.  I’m currently working on CCIE R/S questions, and previously worked on Juniper certifications.  Graham references a question which is pure trivia, and then attributes the problem to question writers who are “only one short step ahead of their students” because they “lack experience and deep knowledge.”

Writing and Writers

I can’t speak for every exam, but these statements are definitely incorrect for the Cisco and Juniper examinations I’ve worked on.  In both cases, I was required to prove myself to be a “subject matter expert” based on both credentials and work experience.  Obviously this does not mean I am a SME on every single topic on the exam, but in both cases I was also given the choice on which topics to write on.  I was never forced into writing questions on a topic that wasn’t familiar.

Writing exam questions is not easy.  First, let’s consider the motivation of the question writer.  Some are professionals in the training department of the company or organization whose exam they are writing for.  These people often produce high quality questions because they have some formal training in the subject and have done it a lot.  They are writing the questions simply because their job requires it of them.

Others, like myself, have a day job.  We do the question writing on the side.  We are given some training and extensive guidelines on how to write questions.  Usually there is some extra incentive for us.  At both Juniper and Cisco, I was able to recertify my existing certifications by writing questions, thus avoiding the dreaded recert exam.  Folks like us pose two problems:  (1) We are not experts at writing questions and (2) we have a motivation to get as many questions out the door as possible.  Given number 2, there can be an overwhelming desire to pick a topic, go dig through the relevant doc until you find footnote 7 on page 135, and drop a quick trivia question.

The hardest questions to write are ones that test thinking and not mere trivia.  I have spent hours on questions of that type.  They are my favorite to write, but often require drawing a diagram, setting up the scenario in the lab, collecting outputs, and pulling it all together into a question with right and wrong answers.  Given the difficulty, you can see why many writers skip right to the easy trivia, especially if they are in it to recertify.

Most exams are in multiple choice format these days, although there are several that use simulators and fill-in-the-blank style questions.  This is just a limitation of administering so many exams to so many people.  I wish we could do essay questions on the CCIE written, but it’s not happening.

Wrong Answers

Interestingly enough, the hardest part about writing multiple choice questions is coming up with the wrong answers.  That’s because they need to be really wrong, but also plausible.  Really wrong in that if the question says “pick the correct 2 out of 4”, it’s no good if 3 out of 4 are actually correct.  It’s very easy to accidentally write a wrong answer that is correct.  Plausible because they cannot be so obviously wrong that they give away the correct answer(s).  Let me take an example.

Which two cities are in Asia?  (Choose two.)

A.  Beijing
B.  Golden Retriever
C.  Singapore
D.  Banana

In this case, even if I didn’t know what cities were in Asia, I could easily find the right answer because I know B and D are not cities.  A better question would be:

Which two cities are in Asia?  (Choose two.)

A.  Beijing
B.  Warsaw
C.  Singapore
D.  Lisbon

However, the closer you make the wrong answers to reality, the more likely you will make one that is accidentally correct!

Writing Questions

So how do I write questions?  Usually I go through the topics assigned, and pick one that I am comfortable with.  Then I read through books, configuration guides, and other material on that subject until I think of a suitable question.  This could be factual, or it could be more of a scenario.  I generally write the question and then the right answer(s) and start working on the wrong answers.  After I’ve written it, I review it several times.  Every exam I have worked on required a reference document, so I always make sure to keep track of where I got the idea.  I try to put myself in the perspective of the examinee.  If I were studying this blueprint, where would I look for information?  How far into the document would I read?  At what point would I consider myself to have mastered the subject, and what knowledge would I have then?

Cheaters

It is very important for vendors such as Cisco to maintain the integrity of their exams.  Unfortunately, there is an army of professional cheaters who steal questions and post them verbatim on the Internet.  There are various techniques a vendor can use to combat this, but one of the best is simply to make the question pool so large that nobody can memorize it all.  This in turn puts relentless pressure on vendors to write as many questions as possible.  If you’ve ever used a PDF copy of a test to study, not only are you cheating yourself, you are also putting pressure on us to make more and more questions, which can decrease the question pool quality.

Amusingly enough, when I took my private pilot written exam back in 2005, we studied using the famous red Gleim book, which (legally) had all the questions and answers on the test.  When I took the test, I brought in my calculator and flight computer and never used them, because I had already seen the questions so many times!  I would just look at the question and think, “OK, the answer there is a heading of 270 degrees.”  The only reason I got less than one hundred percent was because I intentionally answered a few wrong, on the advice of my instructor.  Apparently if you show up to the oral portion of the flight exam with a 100% written score, the examiner might think you were arrogant and try to put you in your place with challenging questions.

 

I hope this article provided a little insight into why certification exams sometimes have bad questions.  Please know my intention is not to defend bad questions on a test.  Questions that are poorly written and which make it through Q/A are unfair to test takers and cause unneeded stress and even failures.  I have been on the other side many times, and have experienced the frustration of wrong or unfair exams.  As far as Cisco goes, a number of recent blogs have complained about the quality of the CCIE R/S question pool.  Having worked with the people responsible for it, I can say I have a lot of confidence in them and I think they are doing a good job administering it.

**  NOTE:  I will not answer any specific questions about the exam content I have written, so don’t even try!

 

When I was still a new engineer, a fellow customer support engineer (CSE) asked a favor of me. I’ll call him Andy.

“I’m going on PTO, could you cover a case for me? I’ve filed a bug and while I’m gone there will be a conference call. Just jump on it and tell them that the bug has been filed an engineering is working on it.” The case was with one of our largest service provider clients. I won’t say which, but they were a household name.

When you’re new and want to make a good impression, you jump on chances like this. It was a simple request and would prove I’m a team player. Of course I accepted the case and went about my business with the conference call on my calendar for the next week.

Before I got on the call I took a brief look at the case notes and the DDTS (what Cisco calls a bug.) Everything seemed to be in order. The bug was filed and in engineering’s hands. Nothing to do but hop on the call and report that the bug was filed and we were working on it.

I dialed the bridge and after I gave my name the automated conference bridge said “there are 20 other parties in the conference.” Uh oh. Why did they need so many?

After I joined, someone asked for introductions. As they went around the call, there were a few engineers, several VP’s, and multiple senior directors. Double uh oh.

“Jeff is calling from Cisco,” the leader of the call said. “He is here to report on the P1 outage we had last week affecting multiple customers. I’m happy to tell you that Cisco has been working diligently on the problem and is here to report their findings and their solution. Cisco, take it away.”

I felt my heart in my throat. I cleared my voice, and sheepishly said: “Uh, we’ve, uh, filed a bug for your problem and, uh, engineering is looking into it.”

It was dead silence, followed by a VP chiming in: “That’s it?”

I was then chewed out thoroughly for not doing enough and wasting everyone’s time.

When Andy got back he grabbed the case back from me. “How’d the call go?” he asked.

I told him how it went horribly, how they were expecting more than I delivered, and how I took a beating for him.

Andy just smiled. Welcome to TAC.

My job as a customer support engineer (CSE) at TAC was the most quantified I’ve ever had.  Every aspect of our job performance was tracked and measured.  We live in the era of big data, and while numbers can be helpful, they can also mislead.  In TAC, there were many examples of that.

Take, for example, our customer satisfaction rating, known as a “bingo” score.  Every time a customer filled out a survey at the end of a TAC case, the engineer was notified and the bingo score recorded and averaged with all his previous scores.  While this would seem to be an effective measure of an engineer’s performance, it often wasn’t.

In TAC, we often ended up taking cases that were “requeues.”  These were cases that were previously worked by another engineer.  Imagine you got a requeue of a case that another CSE had handled terribly.  You close the case quickly, but the customer is still angry at the first CSE, so he gives a low bingo score.  That score was credited against the CSE who closed the case, so even though you took care of it, you got stuck with the low numbers.

This also happened with create-to-close numbers.  We were measured on how quickly we closed cases.  Imagine another CSE had been sitting on a case for six months doing nothing.  The customer requeues it, and you end up with the case, closing it immediately.  You end up with a six month create-to-close number even though it wasn’t your fault.

Even worse, if you think about it, the create-to-close number discouraged engineers from taking hard cases.  Easy cases close quickly, but hard ones stay open while recreates are done and bugs are filed.  The engineers who took the hardest cases and were very skilled often had terrible create-to-close numbers.

The bottom line is that you need more than data to understand a person.  Most things in life don’t lend themselves to easy quantification.  Numbers always need to be in context.  Corporate managers are obsessed with quantification, and the Google’s of the world are helping to drive our number-love even further.  Meanwhile, reducing people to numbers is a great way to treat them less humanly.

Introduction

I’ve been side-tracked for a while doing personal articles, so I thought it would be a good time to get back to some technical explanations.  Seeing that I work for Cisco now, I thought it would be a good time to cover some Cisco technology.  My focus here has been on programmability and automation.  Some of this work has involved using tools like Puppet and Ansible to configure switches, as well as Python and NETCONF.  I also recently had a chance to present BRKNMS-2002 at Cisco Live in Las Vegas, on LAN management with Data Center Network Manager 10.  It was my first Cisco Live breakout, and of course I had a few problems, from projector issues to live demo failures.  Ah well.  But for those of you who don’t have access to the CL library, or the time to watch the breakout, I thought I’d cover an important DCNM concept for you here on my blog.

Continue Reading

When I first started at TAC, I wasn’t allowed to take cases by myself.  If I grabbed a case, I had to get an experienced engineer to help me out.  One day I grabbed a case on a Catalyst 6k power supply, and asked Veena (not her real name) to help me on the case.

We got the customer on the phone.  He was an engineer at a New York financial institution, and sounded like he came from Brooklyn.  I lived in Williamsburg for a while with my mom back in the 1980’s before it was cool, and I know the accent.  He explained that he had a new 6k, and it wasn’t recognizing the power supply he had bought for it.  All of the modules had a “power denied” message on them.

I put the customer on speaker phone in my cube and Veena looked at the case notes.  As was often the case in TAC, we put the customer on mute while discussing the issues.  Veena thought it was a bad connection between the power supply and the switch.

“Here’s what I want you to do,” Veena said to the customer, un-muting the phone.  “I used to work in the BU, and a lot of times these power supplies don’t connect to the backplane.  You need to put it in hard.  Pull the power supply out and slam it in to the chassis.  I want to hear it crack!”

The customer seemed surprised.  “You want me to do what?!” he bristled.

“Slam it in!  Just slam it in as hard as you can!  We saw this in the BU all the time!”

“Hey lady,” he responded, “we paid a couple hundred grand for this box and I don’t want to break it.”

“It’s ok,” she said, “it’ll be fine.  I want to hear the crack!”

“Well, ok,” he said with resignation.  He put the phone down and we heard him shuffle off to the switch.  Meanwhile Veena looked at me and said “Pull up the release notes.”  I pulled up the notes, and we saw that the power supply wasn’t supported in his version of Catalyst OS.

Meanwhile in the background:  CRACK!!!

The customer came back on the line.  “Lady, I slammed that power supply into the chassis as hard as I could.  I think I broke something on it, and it still doesn’t work!”

“Yes,” Veena replied.  “We’ve discovered that your software doesn’t support the power supply and you will need to do an upgrade…”

Note:  This article was originally posted in 2016.  Since that time, the CCIE program has changed the process for earning a CCIE, and the separate written exam is no longer used.  This means that the problem of people claiming to be a “CCIE” when they have only passed the written exam is no longer the case.  I’m leaving the article as is for now, but will modify it in the future when I have time, to reflect the new circumstances.  Regardless, you should never claim you have a certification when you have only passed a part of the requirements.  (ccie14023, Feb 2020)

In this article in the “Ten Years a CCIE” series, I look at the question of cheating.  Is it possible to cheat on the CCIE exam?  And what does cheating do to the value of the certification?

Yes, you can cheat on the CCIE

Shortly after I passed my Security exam I spoke with the first CCIE to pass the Voice exam. He took a beta version while he worked at Cisco. I commented that I valued my CCIE so much because it was simply impossible to cheat at the exam. It wasn’t a written exam; you couldn’t just walk in knowing the answers; you had to think on your feet. He laughed at me and explained my ignorance.

A lot of people cheat on the CCIE lab exam, he said. Either they work in groups and share the contents of the exams  they’ve seen, or else they get copies of the exam from unscrupulous vendors on the Internet. Then, having seen the actual exam, and having researched the difficult problems, and configured it several times in their lab, they can pass with ease.

I was quite shocked to hear this. I had always studied alone, and when I started down the CCIE road, I didn’t just want to pass the exams, I wanted to beat them. I didn’t just want the CCIE, I wanted the CCIE mystique. I was flabbergasted that people would want the certification without the work. Of course there is a great appeal to gaining something so valuable with minimal effort, but how are you going to make it through a job interview?

The stupidity of cheating

I had encountered rampant cheating in graduate school. This was at the dawn of the Internet, and I saw that many of my fellow students ripped off entire papers from the Internet. We used to send our papers to each other via email, and occasionally I would paste a snippet into AltaVista (Google not being available yet), and often I would hit upon the original work that they’d stolen. Leaving aside the ethical issues of stealing someone else’s work, or ripping off the questions and answers for an exam, there is a practical downside to cheating. You are claiming a credential that you haven’t earned. I remember conducting a job interview of a girl with a Masters degree from the same program as myself. I asked her the subject of one of her papers, and it had to do with routing protocols. She couldn’t answer even the most basic questions about the content of her paper. It was obvious that she cheated her way through the program. And she looked like a complete fool claiming to be a “master” of a subject about which she knew nothing.

Sometimes engineers I know roll their eyes when they hear I have a CCIE. They have encountered one of my fellow “experts” only to find that he seemed hardly an expert at all. Since I know that it’s possible to cheat on this exam, I’m convinced that many of these so-called CCIE’s cheated on their exams. They look like fools as did the girl with her Masters degree.

I’ll talk more about the value of the certification and later post, but one thing to keep in mind is that there’s great value in the study process. There’s great value in learning. And if you study for the test as you are supposed to study for it, you’re guaranteed to learn a lot.

A blurry ethical line?

I, like almost everybody these days, passed my exams using material from legitimate vendors, primarily Internetwork Expert and IPExpert.  (The latter has closed their doors.)  These vendors provide quite a lot of material, but their signature product is a book of sample exams designed to prepare you for the real thing.  This brings up a question.  Presumably some of the scenarios covered by the “legitimate” vendors are scenarios that might come up on the real lab.  After all, how many ways are there to configure BGP?

Interestingly enough, any exam has to provide a certain amount of information to test-takers beforehand.  The CCIE exams have detailed blueprints which guide candidates in their studies.  It would be impossible to take and pass an exam without such advance information.  With merely a blueprint in hand, it would be possible to construct some kind of sample exam, but do vendors simply build them off the blueprint?  Or do they get information from candidates and use them to build their tests?

The ethical lines can be blurry, but one thing is for certain:  studying for a CCIE exam using an actual copy of a real test is blatant cheating and disgraceful behavior.

Cheating on the written exam

Cheating is also rampant on the written exam. This is even the case among CCIE’s who are recertifying, perhaps especially the case. As I mentioned in my recertification post, taking an exam every two years, especially a hard one, is a big hassle. Many CCIE’s get lazy about the process. There are vendors who will sell verbatim copies of the tests. There is still, of course, some work involved. Someone with a copy of the test has to actually memorize all the answers. But it is far easier than going into a test blind.

My last re-certification was quite painful, and yet I refuse to use any sort of brain dump.  Instead, I built an Anki database of questions.  It wasn’t perfect, and it took a couple of failures for me to build a database that had sufficient coverage.

False CCIEs

Another way to “cheat” is simply to falsely claim CCIE status.  Anyone with a CCO account can verify if somebody actually has a CCIE, and whether they are active, but oftentimes employers just don’t bother to check.  When I was at Cisco HTTS, we were very close to hiring someone for a CCIE-requiring position, when I ran his name through the tool.  His CCIE had been revoked because he hadn’t recertified.  He was clearly embarrassed, and had simply been too busy to recertify.  While I can empathize with that, the fact was that he did not have a CCIE and would need to take both written and lab to get it back.  We didn’t hire him, but it amazed me that nobody had bothered to check early in the hiring process.

There is also a large group of imposters who have passed the written, and somehow think this qualifies them to put “CCIE” on their resume.  I recently saw a poster on a LinkedIn group who gave herself a CCIE (Wrt) title.  I also remember one candidate who put “CCIE Routing/Switching” in huge, bold letters on his resume, with “written” in a tiny font right next to it.  Well, I have news for you.  There is no CCIE written certification.  You either have a CCIE or you don’t.  Pass the lab before you put CCIE anything on your resume.  If you are looking at an employer that is willing to sponsor you for the lab, then by all means, tell them you passed the written.  But don’t claim CCIE status without a number.

What is the value?

We all know that there are a number of CCIE’s out there who should not have the certification.  It reflects poorly on the CCIE community.  There is no question that whatever value the CCIE has is diminished by those who obtained their credential through fraud.  If you are frustrated with the exam and thinking of hunting for a brain dump, remember this:  if you can’t pass the exam, you have no right to call yourself a CCIE.

In my next and final article in the series, The Value of a CCIE, I will take a look at the value of the credential.  Ten years later, do I think it was worth it?  Would I recommend someone take the CCIE exam now?  What do I think the future is for network experts in the world of SDN and automation?