Archives

All posts by ccie14023

New Year’s resolutions are made to be broken, and I haven’t been keeping up with my resolution to do more blog posts.  Now that I am back at Cisco, I am focusing on programmability and automation, and I do have a lot to say.  However, in honor of my return to Cisco, I thought I would post a new Tac Tales entry.  There is a moral to this story.

One day my boss came to me and said my team would be supporting the MWAM module in the Cat 6K.  I had done a lot of Cat 6K work at that point, but I had never even heard of an MWAM, and failed to see why cases on it would be sent to the routing protocols team.  My boss didn’t seem too concerned with my objections, and said, “Just go watch the VoD.”  VoD = Video on Demand.  So, I did.  I watched the VoD, and it started out by telling me how many processors were on the card and of what kind;  what types of buses were used to transmit data;  what kind of memory it had;  and how it interfaced with the Catalyst and its backplane.  Never did the video ever tell me what the card actually did.  I had no idea why one would buy an MWAM or what one would do with it.  I hoped a case wouldn’t come in on the card, and when it did I immediately escalated to engineering because I had no idea what to do.  Fortunately I only ever had one case on the MWAM.  (And the fun thing about coming back to Cisco after 10 years is that I can go look up all these cases I remember and read my notes.  Very cool!)

What is the moral, you ask?  Well, as a Technical Marketing Engineer, a big part of my role is communicating technical concepts clearly to others.  How often have you bought a book or looked at a web page to learn some new protocol, only to find that the description of it begins with packet header formats or state machines?  Fine, but tell me what it actually does before you tell me how it works.  Imagine if you went out into the jungle and encountered someone who has never seen a car.  You wouldn’t start explaining it to him by saying “it uses an internal combustion engine which has a four-stroke cycle of intake, compression, power, exhaust.”  You’d say, “it has wheels and takes me places very fast.”  Now in defense of the MWAM VoD guy, he may have designed his video for people who already knew what the card was.  But often I have found that people make this assumption, and when I backtrack and start at the beginning when explaining something, often people say, “you know, I’ve always been afraid to ask about that, but thanks for explaining it.”

Meanwhile, my second try at Cisco is much more fun than the last.  And thankfully, no MWAMs.  TAC was an great experience and period of growth, but it’s a not a fun job.

I feel a bit of guilt for letting this blog languish for a while. I can see from the response to my articles explaining confusing Juniper features that my work had some benefit outside my own edification, and so I hate to leave articles unfinished which might have been helpful. In addition, WordPress is not easy to maintain and I keep losing notifications of comments, which means that when I am not logging in, I miss the opportunity to respond to kind words and questions.

As it is, my work explaining Juniper to the masses will have to be put on hold, as I have left Juniper after six years and returned to my old employer Cisco! I worked at Juniper longer than I had anywhere else, and it’s amazing to consider that I just closed the door on half a decade. But, I even after attaining my JNCIE I always felt like a Cisco guy at heart, and so here I am again. A few random thoughts then:

1. I interviewed for a number of jobs, and now that I am hired I can say that I really hate interviewing. My interviews at Cisco were very fair and reasonable. Just for the heck of it I did a phone screen with Google and completely bombed it. I’m not ashamed to admit that. I’m not supposed to reveal their questions, and I won’t, but they were mostly basic questions about TCP functionality, and MAC/ARP stuff, and it’s amazing how you forget some of the basics over the years. I wasn’t really interested in working there so I did no preparation, and in fact the recruiter warned me to brush up on basics. I just figured my work and blog show that I am at least somewhat technical. I plan to write some posts on the art of technical interviewing, but I was certainly underwhelmed by Google’s screening process, as I’m sure they were by my performance. I really wanted the Cisco job, and what a difference attitude makes! (Oh, and I completely munged an MPLS FRR/Node & Link protection question, less than a year after passing the JNCIE-SP. Uh, whoops.)

2. I bear Juniper no ill will. It was an interesting six years. When I came on board, during the Kevin Johnson years, it was all rah-rah pep talks about how we were going to be the next $10 billion company (errr, no…) followed by a plethora of product disasters. Killing off Netscreen gave the firewall market to Palo Alto, Fortinet, and amazingly resuscitated Checkpoint. Junos Space was a disaster, and Pulse slightly less so. QFabric was not a bad idea, but was far too complex. You needed to buy a professional services contract with the product, because it was too complex to install by itself. And yet it supposedly simplified the data center? There was a fiasco with our load balancer product. And then came the activist investors with their Integrated Operating Plan. I will permanently loathe activist investors. Juniper was hurting and they just magnified the hurt. There’s nothing worse than a bunch of generic business-types who wouldn’t know a router if they saw one trying to tell a router company how to do its business. They thought they could apply the same formula you learn in B-school to any company no matter what it manufactures or does. Then we had the CEO revolving door.

Despite all of this, as I said, I like Juniper. I did ok there, and there are a lot of people I respect working there. Rami Rahim is a good choice for CEO. I left for personal reasons. They still have some good products and good ideas, and I think competition is always good for the marketplace. For the sake of my friends there, I hope Juniper does well.

3. If you read my bio, you will see that I was THE network architect for Juniper IT, meaning I covered everything. This included (in theory at least) campus LAN, WAN, data center, wireless, network security, etc. I did something in all of these spaces. It was a broad level of knowledge, but not deep. That’s why I did my JNCIE-SP–I was hungering to go deep on something. My new job at Cisco is principal technical strategy engineer for data center. This is an opportunity to go deep but not as broad, and I’m happy to be doing that. The data center space is where it’s at these days, and I can’t wait to get deeper into it.

4. Coming back to Cisco after an eight year hiatus was bizarre. It was cool to pull up all my old bugs and postings to internal aliases to see what I was doing back then. Heck, I actually sounded like I knew a thing or two. I was thrilled to find out I am on the same team as Tim Stevenson, whose work as a Cat 6K TME I admired when I worked in TAC. Just for fun I walked though my old building and floor (K, floor 2) and nearly fell over when I saw that it looked identical. I mean, not only the cubes, but there were these giant signs for the different teams (e.g. “HTTS AT&T TEAM”) which were still hanging there as though the intervening eight years had never happened.

Unfortunately, I have to leave a few in progress articles in the dustbin. First, I shouldn’t really be promoting Juniper now that I am working for Cisco. And second, I’ve lost access to VMM, the internal Juniper tool I used to spin up VM versions of Juniper routers. However, I hope to start posting on Cisco topics now that I have access to that gear. Cisco’s products are generally better documented than Juniper’s, but I promise to fill any gaps I might find. And I will leave my previous articles up in hopes that they will benefit future engineers who struggle with Junos.

Onwards!

I don’t advertise this blog so I’m always amazed that people even find it. I figured the least-read articles on this blog were my “TAC Tales,” but someone recently commented that they wanted to see more… Well, I’m happy to oblige.

The recent events at United reminded me of a case where operations were down for one of the major airlines at Miami International Airport. It didn’t directly impact flight operations, but ticketing and baggage handling systems were down. Naturally, it was a P1 and so I dialed into the conference bridge.

This airline had four Cat 6500’s acting as their core devices for the network. The four switches had vastly disparate configurations, both hardware and software. I seem to recall one of them was running a Supe 1 module, which was even old in 2007 when I took the case. There was a different software version on each of them.

EIGRP was acting funny. As a TAC engineer in the routing protocols team, I absolutely hated EIGRP. EIGRP Stuck-In-Active was my nightmare case. It was always such a pain to track down the source, and meanwhile you’d have peers resetting all over the place. OSPF doesn’t do that, nor ISIS. I once got in a debate on an internal Cisco alias with some EIGRP guys. Granted, I had insulted their life’s work, but I stated that EIGRP was fast, but unreliable and prone to meltdown. Their retort was that properly designed EIGRP networks do not melt down. Great, but when are networks ever properly designed? They are so often slapped together haphazardly, grow organically, and overall need to be resilient when even when unplanned. Of course, those of us in design and architecture positions do our best to build highly available networks, but you don’t want to be running a protocol that flips out when a route at some far end of the network disappears. Anyhow…

The adjacencies on all four boxes were resetting constantly. It was totally unstable. Every five minutes or so, some manager from the airline would hop on the bridge to tell us that they were using handwritten tickets and baggage tags, that lines at the ticket counters were going out the door, etc, etc. Because that really helps me to concentrate. I tried to troubleshoot the way TAC engineers are trained to troubleshoot: collect logs, search for bugs in the relevant software, look for configuration issues. With routing adjacency flaps on switches, always check for STP issues. I couldn’t figure it out.

Finally some high-level engineer for the airline got on the phone and took over like a five-star general. He had his ops team systematically shut down and reset the switches, one at a time. The instability stopped. Wish I’d thought of that.

The standards for a routing protocol like OSPF are written by slow-moving committees, and hence don’t change much. These committees often have members from multiple competing vendors who disagree on exactly what should be done, and even when they do agree, nothing happens fast in IETF committees. Conversely, Cisco owns EIGRP, and they can change it as much as they want. Even their internal committees are nowhere near as bureaucratic as IETF. This means that there can be significant changes in the EIGRP code between IOS releases, much more so than for OSPF, and it is thus vital to keep code revisions amongst participating routers fairly close.

In this case, the consulting engineers for the airline helped them to standardize the hardware and software revisions. They never re-opened the case.

In my previous post, we saw the theory behind hub-and-spoke VPN. We saw how H/S involves multiple VRFs with cross-importation between them, and we traced the basic flow of a route advertised from one spoke to another.
Next, we are going to look at two options for configuring H/S VPNs. In this post, I will cover using BGP as the PE-CE routing protocol without independent route reflectors. In my next post, I will cover OSPF. Finally, I will return to BGP and examine the issues that come up when we use independent route reflectors with hub and spoke VPN. Continue Reading

One of the JNCIE-SP exam objectives I found difficult was hub and spoke VPN. Conceptually it’s not easy, and as is often the case, the documentation is only somewhat helpful. This series of posts is designed to walk you through the concepts of hub and spoke VPN, as well as its basic configuration using BGP, and then OSPF as the PE-CE protocol. Finally I will talk about route reflector issues when using H/S VPN. It is an important topic to master for your JNCIE exam, and if properly explained you will find it’s not as difficult as it seems. This article assumes you are familiar with basic configuration of Layer 3 MPLS VPN, including vrf import and export policies.

Note: I recommend that you only use vrf import/export policies for the JNCIE lab and avoid the vrf-target command for layer 3 VPN.

We’ll take a simple example topology with three sites.  Normally, if these sites are configured in a Layer 3 MPLS VPN, site 1 and site 2 are able to talk directly over the MLPS network. That is, the traffic between site 1 and site 2 is never seen by the hub site.
hs1
This is, of course desirable. Routing the site-to-site traffic through a central site could produce significant delays, especially if the geographic distance between the sites is large. It also would consume bandwidth unnecessarily on the hub interfaces. By default MPLS VPNs avoid this behavior.

Hub and spoke VPN changes this model and actually routes the spoke-to-spoke traffic through the hub site.
hs2
Given the above paragraph, why would anybody want to do this? Well, I would give you the caveat that for expert-level certification exams, you should not spend too much time asking why. Nevertheless, one reason you might use this configuration is to apply some sort of monitoring to the spoke-to-spoke traffic.

You will notice that the hub site has two interfaces. Whether physical or logical, the hub site will need to have two interfaces.

We’re going to blow the picture up a bit, and put some actual routers in to flesh out the picture, and then we are going to look at how the control plane works.

hs-route

Site 1 CE advertises a route (172.255.255.9) to its PE. The PE router then advertises the route via MBGP to the hub PE. The hub PE then advertises it over the top link to the hub CE. The hub CE learns the route and then re-advertises it back to the same PE it learned it from. If you haven’t seen h/s VPN before this may confuse you, but here comes the trick: the two interfaces that connect the hub CE to the hub PE are in different VRFs. Therefore, the hub PE will have a copy of the 172.255.255.9 route in both VRFs. This is the part which makes it a bit confusing.

The hub PE, having re-learned the route from the hub CE, then advertises it out to the site 2 PE, which sends it on its way to the site 2 CE. Since site 2’s route came from the hub PE via the hub CE, the route actually is pointing along the path through the hub site. As a result of this hocus-pocus, the data plane will forward through the hub site. To achieve this we will need to define specific route export/import policies on the PEs and do some cross-VRF importing, as illustrated below.

hs-communities

The spoke PEs do not learn routes directly from each other, but instead only learn routes that have been re-advertised by the hub site. Therefore, they will only import routes into their VRF that have the hub VRF target. They will not, repeat, will not import routes from the spoke VRF target. This may be counterintuitive because they are not importing routes with their “own” VRF target. However, this is how we achieve the cross-VRF importation. Even though they do not import routes with their own target, they do export with it. All routes exported from the spoke sites have the spoke target.

The hub PE is even more complex. It has two VRFs. The spoke VRF imports routes that have the spoke target, but it doesn’t export anything. That’s because we want to force the spoke sites to pull from the hub VRF. No routes are exported from the spoke. Meanwhile the hub instance does not import anything at all. It pulls no routes from MBGP or the other PEs. Instead, any routes in its table (aside from interface routes) it learns from the hub CE. Unlike the spoke, however, it does export routes, which are tagged with the hub VRF target. It is this target that the spokes accept.

In my next article you will see how we configure a null policy to achieve these goals. In the meantime, a little memory trick helped me to configure this. On the hub router, the spoke instance imports but does not export. I named my import policy “spoke-in” which sounds a bit like Spokane, the city. It may not be the greatest memory trick, but you can remember all the rest of the hub PE policies if you remember this. Remember, each VRF has a “null” policy, and they are the opposite of each other. So, if the spoke has an import policy (“spoke-in”), its export policy must be null, and therefore the import policy for the hub must be null and it must have an export policy (“hub-out”). This will become clearer in future articles when I explain the configuration if it’s not clear now.

To summarize this article:

  • Hub and Spoke MPLS VPN routes traffic through a hub site instead of directly between spokes.
  • To achieve this, the control plane (i.e. routing) also follows a hub and spoke model.
  • The spoke routers import and export from different hub PE VRFs.
  • The hub PE has two VRFs, one for sending routes to the hub CE and one for receiving them.

Look forward to the next article which will explain how to configure this with BGP as the PE-CE routing protocol.

Ah, the joys of being an “expert.”  I had forgotten what happens after you pass an exam like the JNCIE.  One of my colleagues starts grilling me on various topics for which I am unprepared, since they weren’t covered on the exam.  MX architecture, MC-LAG, MX virtual chassis, etc.  Be careful what you wish for.  The second you put “expert” on your title some people will take it as a challenge.  Of course, being in a very non-hands-on position, I am not an expert in many of the things day-to-day engineers know quite well.  I certainly know the topics on the test.  Then again, having obtained a CCIE 10 years ago, I know better than to start parading myself around as an expert on anything.  The JNCIE number goes on the blog and the bottom of my LinkedIn, but I am not putting it in my email signature.  Meanwhile, time to start reading Doug Hanks’ book on MX architecture!

Shortly after I went to work at TAC, my first team, which was dedicated to enterprise customers, was dissolved, and I ended up on the Routing Protocols team.  The RP team supported both enterprise and service provider customers, and I had zero experience on the SP side.  There was quite a learning curve ahead.

One day my phone rang with a P1.  I dreaded P1’s.  When a P1 came in, you were thrown head-first into a potentially huge outage with no knowledge of the case in advance.  Often times a case that had been worked by another engineer got raised to P1 and you had to deal with someone else’s mess.

On this particular day, the case was a line card problem on a 12000-series, or GSR.  GSR was a service provider box and I knew nothing about it.  I didn’t even think it ran IOS (it does).  I had no idea where to start.  You would think they would have given me training on every product we covered, but at HTTS, at least when I worked there, the general approach was to throw you to the wolves.

So, I politely put the customer on hold and started running around the second floor of building K where HTTS is located, shouting:  “Does anyone know GSR?  Anybody around to help me?!”  Finally I stumbled across my teammate Abe in the break room stirring up a cup of coffee.  Abe had been a product manager for the GSR before he came to TAC.  “Abe, you gotta help me!  I took a P1 on a GSR and I’ve never touched one!”

“Way to go, grab the bull by the horns!” was Abe’s response.  We rushed back to my cube and Abe walked me through GSR line card troubleshooting.  Abe and I have been great friends ever since that day.

Remember, when you call for support and the person on the other end of the phone sounds like they might not know what they are doing, they probably don’t.  And if they put you on hold they may be running around screaming for help.  It happened a lot in TAC.

Back to the blog, now that the JNCIE-SP is finished. I got #2332. The last time I did an expert-level exam was 2008, and I forgot just how challenging it is. I passed my JNCIP in June and it took me until November, working solidly most of the time, to get my number. It’s been a great experience. I work in a director-level architecture role at Juniper, and I am getting more and more removed from day-to-day, hands-on work. When I was in Cisco TAC, it was extremely technical, detailed work every day. Now it is meetings and PowerPoints. However, my ability to contribute at this level is entirely dependent on my technical expertise, and it feels great to refresh the knowledge and hit the CLI again. They say CLI will be dead with automation and SDN–don’t count on it. They can’t change the fundamental way networks operate, and when you look at SDN solutions, they are a lot more complicated then how they are presented. Being acquainted with MPLS and routing protocols in depth is the best preparation for anything to come, and the only way to learn those topics is at the command line. Period. Continue Reading

For the handful of people who come across this blog and have posted comments, thank you very much for the kind words. This blog is on hold for a bit while I finish up my JNCIE-SP, which I am taking in a couple weeks. I’ve come across a lot of excellent blog post topics from my studying, so I hope to get back into the swing of things soon. Keep an eye out, especially if you are new to Juniper or struggling with some of the less-documented commands.