A new journey with Arista


As the title suggests, after many years working in large enterprises, service providers and, most recently, partners, I am headed to a network vendor.  This is a large change for me but a very good one.  I am tremendously excited for the opportunity.

I am joining Arista for multiple reasons.  I was able to bring in Arista at a prior position and watch the entire network change in a drastic fashion.  We were able to automate many tasks that were previously manual.  It seems like every time I turn around Arista has integrated with another technology, e.g. Docker, OpenStack, Go etc.

I was fortunate enough to attend Tech Field Day last year, where Ken Duda, CTO of Arista Networks, delivered one of the most awesome presentations on the evolution of EOS and code quality.

Everyone I have met within Arista has been amazing.  This was an easy decision given the talent and passion of every individual I have met within the organization. It fits my career goals of moving towards SDN, continuous integration/continuous delivery and all things automation.  I start late August.


Network continuous integration using Jenkins, Jinja2 and Ansible.

I have been into the devops and agile life lately.  I have read the following books, which have really changed how I think about technology.

Devops 2.0
Ansible for Devops
Learning Continuous integration with Jenkins

The Devops 2.0 book is fantastic and @vfarcic constantly updates the book so it is worth the price of admission.  I have said this many times before about devops: devops is simply using open source tooling to run through a workflow in an automated fashion.  Continuous integration means constantly testing your code as it is integrated, before it reaches production.

We as network engineers commonly use the CLI or manual tasks to put changes into production because we know what we are doing will “just work.”  We have done it a million times and it has never failed.  We do not take into account human error or issues within the process, e.g. firewall rules, a wrong IP address etc.

This is the devops workflow, which should be followed for any new configuration change.


The script/configuration is handed to Jenkins.  Jenkins schedules the task.  Ansible iterates upon the script.  Checks and balances are done.  The script/config is then run on the switch.  This is another use case for docker on a switch, as this could all be run in a test container before trying the code on the actual switch.  Finally, notifications of the test build being successful are sent out either through Slack or through email.

So in this blog post we are going to do something really simple so any network engineer can follow along and get their feet wet with continuous integration.  I also highly recommend the three books I listed above.

Our topology is simple:
-2 Oracle VirtualBox VMs
1.) Ubuntu 14.04 LTS
2.) Arista vEOS running 4.16
-Jenkins is running on the Ubuntu VM
-Ansible is running on the Ubuntu VM
-Python 2.7 is running on the Ubuntu VM

I will start with a simple line of configuration.  NTP is probably the easiest one-liner of configuration we as network people normally use.  An NTP server can be added to a switch with a one-liner:

ntp server

We are going to make this slightly fancy with Ansible and Jinja2 templates, so that if we ever wanted to change our NTP server we could do it on a large number of switches in an Ansible inventory.  The J2 template is very simple, but lets take a look at our Ansible structure first.
group_vars – Will hold all the variables in this case the NTP server config.
veos.yaml – contains the ntp server
inventory – contains all the hosts within the inventory file.
ntpserver.yaml – The ansible playbook
scripts – optional directory this has a python script we will get to that later
templates – directory that holds Jinja2 templates
ntpserver.j2 – holds the configuration “ntp server x.x.x.x”
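To make the "variables plus template" idea concrete, here is a minimal sketch of what the rendering step boils down to.  It uses Python's built-in `string.Template` as a stand-in for Jinja2 so it runs without extra packages.  The NTP address and host names are made up for illustration and are not the values from my lab.

```python
from string import Template

# Stand-in for templates/ntpserver.j2. In Jinja2 the same line would be
# written as: ntp server {{ ntp_server }}
NTP_TEMPLATE = Template("ntp server $ntp_server")

# Stand-in for group_vars/veos.yaml. 192.0.2.10 is a documentation address,
# not a real NTP server.
GROUP_VARS = {"ntp_server": "192.0.2.10"}

# Stand-in for the inventory file. These host names are hypothetical.
INVENTORY = ["veos-1", "veos-2"]


def render_for_inventory(template, variables, hosts):
    """Render the same one-line config for every host in the inventory."""
    return {host: template.substitute(variables) for host in hosts}


if __name__ == "__main__":
    rendered = render_for_inventory(NTP_TEMPLATE, GROUP_VARS, INVENTORY)
    for host, config in rendered.items():
        print(host, "->", config)
```

Changing the one value in the variables is all it takes to change the NTP server for every host in the inventory.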

Lets first check out our ansible-playbook

This ansible-playbook simply uses the veos hosts which are located in the inventory file.
The second step uses the eos_template module with the template located in templates/ntpserver.j2 and applies this NTP server to each host like a giant for loop.


This would simply add an NTP server to an Arista vEOS device as long as it is in the .eapi file and in the inventory file.

So for those who are following so far, the next step is running this through CI.  That is where Jenkins comes in.  Jenkins can execute the playbook and run through its checks and balances.  Here this is extremely simple as we are not doing much.  The workflow is as follows once again.

Here is the job we will run.
Here is an example of the job.

The first step is to run the playbook in “--check” mode.  This will run the playbook as a dry run and not make any changes.  This is simply to catch errors, such as a problem connecting to something in the host file, for example if either switch did not connect.

The second step is to start the configuration.  This will apply the configuration on the switch as it is in the Jinja2 templates.
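The two build steps above can be sketched in Python as an ordered list of commands that stops at the first failure, which is roughly what Jenkins does with consecutive build steps.  This is only a sketch: the file names come from the layout described earlier, and the `runner` argument is something I added so the logic can be tried without Ansible installed.

```python
import subprocess

PLAYBOOK = "ntpserver.yaml"  # playbook name from the directory layout
INVENTORY = "inventory"      # inventory file name from the directory layout


def build_stages(playbook=PLAYBOOK, inventory=INVENTORY):
    """Return the two build steps in order: dry run first, real run second."""
    base = ["ansible-playbook", "-i", inventory, playbook]
    return [
        ("dry-run", base + ["--check"]),
        ("apply", base),
    ]


def run_stages(stages, runner=subprocess.call):
    """Run each stage in order and stop at the first non-zero exit code.

    Returns (name_of_failed_stage, exit_code), or (None, 0) if all passed.
    """
    for name, command in stages:
        exit_code = runner(command)
        if exit_code != 0:
            return name, exit_code
    return None, 0


if __name__ == "__main__":
    failed_stage, code = run_stages(build_stages())
    if failed_stage is None:
        print("build passed")
    else:
        print("build failed at stage", failed_stage, "with exit code", code)
```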

The last step is a Python script shown here.
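For readers who want a feel for what a post-change check could look like, here is a hedged sketch.  It is not the script from my Jenkins job.  It assumes the switch is reached over eAPI, builds the JSON-RPC body for the `runCmds` method and looks for the expected line in the running config that comes back.  The request id and the sample addresses are made up, and the actual HTTP call is left out so the snippet runs anywhere.

```python
import json


def build_eapi_request(commands, request_id="ci-check"):
    """Build the JSON-RPC body that eAPI's runCmds method expects."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": commands, "format": "text"},
        "id": request_id,
    })


def ntp_server_configured(eapi_response_text, ntp_server):
    """Return True if the last command's output has 'ntp server <address>'."""
    reply = json.loads(eapi_response_text)
    # With format "text" each command result carries its CLI output as a string.
    running_config = reply["result"][-1]["output"]
    for line in running_config.splitlines():
        if line.split()[:3] == ["ntp", "server", ntp_server]:
            return True
    return False


if __name__ == "__main__":
    print(build_eapi_request(["enable", "show running-config"]))
```

A Jenkins step would POST that body to the switch and fail the build when the check returns False.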

Alright, enough typing.  Lets go ahead and run the job and watch the console in Jenkins.
We were able to run through the tests and everything is successful.  Checking the switch shows the new NTP server.

Lets purposely fail this setup and try to configure an NTP server of something bogus that the switch would never take.


This time it failed.  The first dry run will work because it can connect.  The second execution will fail while trying to add the commands to the switch.
Typically at this point either an email or a chat notification is sent out to make the rest of the team aware.

This was a good exercise walking through CI of network changes.  This is for sure the way to go for testing and checking for network changes in live production.


Tech field day VMware NSX at Interop


I was fortunate enough to go to Interop in Las Vegas on behalf of Tech Field Day, and I cannot thank them enough for the opportunity.  This was the first Interop I was able to take part in.  It was really eye-opening and will for sure add to what I know about new trends in networking.

I have been using the NSX product for the better part of almost 3 years in a large environment.  Before NSX I also leveraged VMware vCNS, the predecessor to NSX.  I have written numerous blog posts and scripts all related to NSX.  Alright, enough about me and on to the VMware NSX presentations.

Bruce Davie, CTO of VMware's networking and security business unit, presented on the current state of VMware NSX.  VMware's most compelling case for customers to evaluate or run NSX in their environments comes down to three reasons.
Agility – Having an open and central API that can talk to all components within the NSX stack, for example NSX edge routers, distributed firewall etc.

Security – Allowing true micro segmentation of VMs at the vNIC level and the ability to virtualize security components as virtual machines.

Application Continuity – Allowing NSX related virtual machines to live within the private or public cloud.

Where NSX is today customer wise.
Screenshot from 2016-05-18 09:44:25
Where NSX was 8 months ago at VMworld.
Screenshot from 2016-05-18 09:43:50

The numbers are quite impressive: doubling your customers and tripling the number of customers going into production with NSX based networks.  The spread across verticals is just as impressive.  The list of customers shows different use cases across health care providers, financial institutions, large enterprises and retailers.

Talking about operations and visibility, Bruce was quick to point out the criticism that operations and visibility are lacking once a customer moves to an overlay network.  VMware took those criticisms very seriously and made it a priority to invest in visibility.  Bruce gave a demo from VMworld last year.  In the demo Bruce presented vROps, which is VMware's go-to operations tool for all of their products.  This particular demo had the network management pack, which monitors and alerts on both the physical and virtual networks.  Traceflow was also mentioned.  Traceflow injects real data packets from one NSX virtual machine to another NSX virtual machine to troubleshoot whether a particular service is blocked along the path within the virtual network.

VMware has integrated many third-party monitoring tools within their portfolio.  Bruce mentioned two.  Gigamon allows for a virtual tap directly into the hypervisor, or can simply strip VXLAN headers to view raw data packets on the physical network.  Arkin is a super slick UI that gives performance statistics on both the physical and virtual networks as well as vSphere information.

NSX everywhere was quite possibly the most intriguing presentation of the day.  Bruce talked about the future of NSX: taking the security policies that exist within the private cloud and moving the same security policies to any hypervisor or bare metal machine in AWS, Azure etc.  We also touched on running NSX with VDI and AirWatch, VMware's mobile device management product.

We talked about the future hypervisors and platforms that NSX will integrate into.  There are plans to integrate NSX with Hyper-V in the future.  As of today NSX Transformers is supported on bare metal Linux, KVM and vSphere 6.x.  NSX-V will continue to operate on vSphere 6.x.

Moving docker containers manually and automated.

This is a blog post that will cover part of my container obsession, just the basics.  I'll go over how to manually create a docker container, move a container anywhere and publish the container to Docker Hub.  Finally, I wrote a quick Python script that automates the creation of as many containers as a user wants.

This is all within my home test lab which was using the following
-Ubuntu 14.04 LTS
-Docker 1.11.0
-Python 2.7 (yah yah I know)
-Assuming the docker client and daemon are on the same host.

First things first, in typical Debian fashion we will want to go ahead and install docker.  The funny thing about Ubuntu is that a package called docker already existed, so in the Ubuntu world the package is docker.io and we want to run the following commands

#sudo apt-get update
#sudo apt-get install docker.io
Screenshot from 2016-05-16 13:02:53

With the 14.04 repository 1.6 is the current docker version which will most likely work for this exercise.  I upgraded to 1.11 due to the ip mac/vlan project.  So we will upgrade to the latest and greatest straight from docker source.

#sudo wget -qO- https://get.docker.com/ | sh
This command should grab the latest version off of the docker repo
Screenshot from 2016-05-16 13:05:56

Alright.  So we are about to deploy our very first container within Ubuntu 14.04.  The process is really easy.  I love what docker has done with containers to package everything up so it is so simple that it can be run from a simple one-liner.  So lets get into it.

#sudo docker run -dit ubuntu:14.04.1 /bin/bash
docker run is the command to start a container.  The argument -dit combines -d (detached), -i (interactive) and -t (allocate a terminal).  ubuntu:14.04.1 is the container image and /bin/bash is the command to run.
Screenshot from 2016-05-16 13:09:01
It is important to note that this ubuntu image comes directly from Docker Hub.  From Docker Hub anyone can pull any public image down to any machine running docker.  So after docker realizes that the ubuntu image is not local it will pull the image down.  Notice there are 4 pulls: the image is made up of 4 layers.  This is what makes docker really significant.  What is even more awesome is that if I built something on top of the ubuntu image I would only ever have to update the layer I added, so I would never have to keep downloading the ubuntu image over and over again.

So the container is running within this system.
#sudo docker ps -a
Screenshot from 2016-05-16 13:14:28
We can see this container's unique ID, which I will get into in a bit, the image it uses, the bash command it started with and how long it has run for.  Ports are something I am not going to get into, but basically we can expose network TCP/UDP ports on each one of these containers.

We can easily kill the container with the following commands
#sudo docker stop CONTAINERID
#sudo docker rm CONTAINERID
The container needs to be stopped and then deleted.  Just as an FYI, a stopped container can be brought back up at any point.  The files live within /var/lib/docker/aufs/diff under the container ID, so they will need to be deleted as well.

So lets go ahead and create our very own docker container, unique to lets say a project, and try to move it from the VM I am using to another docker host.

The first thing is to create a file in whichever directory called Dockerfile.  It has to be called exactly that.
#sudo vim Dockerfile
Screenshot from 2016-05-16 16:20:01
This file is what docker will build the image from.  The first line is a simple comment.  The FROM ubuntu line says to pull down the ubuntu image.  I am the maintainer.  The next two commands tell the container to first update its repo then go ahead and download iperf.  The last simply writes output to testfile so later we know it all worked.
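As a rough reconstruction of the file described above, here is a small Python sketch that writes a Dockerfile with those same steps.  It is a guess at the shape of the file, not a copy of mine: the image tag, the maintainer line, the `-y` flag and the text echoed into testfile are all assumptions.

```python
import os

# Reconstruction from the description, not the author's actual file.
# The maintainer and the echoed text are placeholders.
DOCKERFILE_LINES = [
    "# Simple iperf test image",
    "FROM ubuntu:14.04.1",
    "MAINTAINER Your Name <you@example.com>",
    "RUN apt-get update",
    "RUN apt-get install -y iperf",
    'RUN echo "build worked" > /testfile',
]


def render_dockerfile(lines=DOCKERFILE_LINES):
    """Join the instructions into the text of a Dockerfile."""
    return "\n".join(lines) + "\n"


def write_dockerfile(directory):
    """Write the file as exactly 'Dockerfile', the default name docker build looks for."""
    path = os.path.join(directory, "Dockerfile")
    with open(path, "w") as handle:
        handle.write(render_dockerfile())
    return path


if __name__ == "__main__":
    print(render_dockerfile())
```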

We are ready to create our docker image.  Run the build from the directory that holds the Dockerfile.
#docker build -t firstimage:0.1 .
#docker run -it firstimage:0.1 /bin/bash
Screenshot from 2016-05-16 16:27:45
We can see that this has iperf installed from the apt-get we told it to run in the Dockerfile.

Now the first option we have to get this container off of this host is to manually move it.  It is possible, once the docker image is built, to save it, move it to any docker platform and run it there!

Saving the docker image
#docker save -o firstimage.tar firstimage:0.1

The image is now saved, so we need to SCP it to another host
#scp firstimage.tar user@machine:/pathtoscpfile
Screenshot from 2016-05-16 16:45:34
The part that always gets me is that this file is only 156MB!

Next lets go ahead and jump on the machine we moved the file to, load the image and run it

#docker load -i /pathtoscpfile/firstimage.tar
#docker run -it firstimage:0.1 /bin/bash

Screenshot from 2016-05-16 16:45:34
So in that tutorial we have successfully moved a container the manual way.
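The manual move is just four commands in a fixed order, two on each host.  Here is a small sketch that lays them out so the file names stay consistent between the save, the copy and the load.  The function name and the remote path handling are my own for illustration; it only builds the commands and does not run anything.

```python
def move_image_commands(image, tarball, remote, remote_dir):
    """Return the commands to run on each host to move a docker image by hand."""
    remote_tarball = remote_dir.rstrip("/") + "/" + tarball
    return {
        "source_host": [
            ["docker", "save", "-o", tarball, image],
            ["scp", tarball, "{}:{}".format(remote, remote_dir)],
        ],
        "destination_host": [
            ["docker", "load", "-i", remote_tarball],
            ["docker", "run", "-it", image, "/bin/bash"],
        ],
    }


if __name__ == "__main__":
    plan = move_image_commands(
        "firstimage:0.1", "firstimage.tar", "user@machine", "/pathtoscpfile"
    )
    for host, commands in plan.items():
        for command in commands:
            print(host + ":", " ".join(command))
```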

Next we will walk through how to push this to docker hub.  The first thing to do is sign up for docker hub.

So here is my docker hub repository
Screenshot from 2016-05-16 16:53:08
The repository we are going to move firstimage to is going to be called burnyd/iperf-test

We first want to create a docker tag, so we need to find the unique image ID
#docker images | grep firstimage
Screenshot from 2016-05-16 17:08:54
We have found the unique image ID.  Now we need to create the tag
#docker tag 4ea31abc539d burnyd/iperf-test:1.0
If we check the images again we can see there is a new image created.

So lets go ahead and push this to Docker Hub.
#docker push burnyd/iperf-test:1.0
Screenshot from 2016-05-16 17:11:01
Now if we check our docker hub we should see this image.
Screenshot from 2016-05-16 17:12:19
It is that simple.  I should be able to pull this container and run it on any host running docker.  Lets try a new host.

#docker pull burnyd/iperf-test:1.0
Screenshot from 2016-05-16 17:15:45
So now we should be able to simply run the image.
#docker run -it burnyd/iperf-test:1.0 /bin/bash
Screenshot from 2016-05-16 17:16:49
So there we have it.  We can grab this image from anywhere in the world!

The last section of this blog post is an automated script I created to run docker images.  Schedulers like Kubernetes and Swarm are really the way to go.  I simply created a Python script that prompts the user for how many containers and what commands they would like to run.  I posted it on GitHub here.

On this test we will create 20 containers.  Let them run then delete them and tear it all down.  We could easily create 20 iperf client streams to a server if we wanted to for example.

Screenshot from 2016-05-16 17:35:33

Okay now lets go ahead and tear it all down!
Screenshot from 2016-05-16 17:36:20
Here is the code for anyone interested without going to GitHub.

Screenshot from 2016-05-16 17:37:30
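For anyone who wants something to copy and paste, here is a minimal sketch of the same idea: start N detached containers, keep their IDs, then tear them all down.  This is not the script from my GitHub repository.  The function names, the `runner` argument and the `sleep 300` command are assumptions I made so the logic can be tried without docker installed.

```python
import shlex
import subprocess


def create_containers(count, image, command, runner=subprocess.check_output):
    """Start `count` detached containers and return their IDs."""
    container_ids = []
    for _ in range(count):
        # `docker run -d` prints the new container's ID on stdout.
        output = runner(["docker", "run", "-d", image] + shlex.split(command))
        container_ids.append(output.decode().strip())
    return container_ids


def tear_down(container_ids, runner=subprocess.check_output):
    """Force-remove every container, even if it is still running."""
    for container_id in container_ids:
        runner(["docker", "rm", "-f", container_id])


if __name__ == "__main__":
    ids = create_containers(20, "ubuntu:14.04.1", "sleep 300")
    print("started", len(ids), "containers")
    tear_down(ids)
    print("removed", len(ids), "containers")
```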


Arista ZTP basics

ZTP within Arista switches makes deploying infrastructure really easy.  ZTP takes away a lot of human error and allows for zero touch provisioning of switches.  Zero touch provisioning really falls in line with a lot of the SDN/automation craze.  We have all been there when it comes to installing a switch and doing the simple CTRL+C and CTRL+V from Notepad.  It never really works out too well.  From personal experience, at my last position we were able to hand switch installs over to facilities, who simply plugged them in after a simple script was run per environment.

In this blog post we will work with the following environment.


I have 4 VMs, EOS-1 to EOS-4, within VMware.  vEOS inside of VMware is an easy install.  Once all 4 vEOS VMs are loaded an Ubuntu 14.04 LTS VM will be needed.  That VM will require an install of ISC DHCP Server and Arista ZTP server.  For the Arista ZTP server simply follow all the apt-get packages and it should install rather easily.  Once that is out of the way, make sure to go ahead and change the default ZTP identifier to MAC address for all the vEOS VMs.

Now that everything is installed lets take a look at how ZTP officially works.


The switch will come online, in this case vEOS1 -> vEOS4.  The switch will then send out a DHCP discover on the management interface first, then on all interfaces.  Once the switch receives an IP from the DHCP server, it will then be pointed at a file through the DHCP options.  That is step two.  This file is generally called a “bootstrap”.  A bootstrap file is simply a Python script which tells the switch where to grab its config from.  After the script has been downloaded and all script configurations have been applied the switch will then reboot.

Okay, that's great, but how does that work?

Here is a screen shot of my DHCP server on the Ubuntu 14.04 LTS VM


This is really where the magic happens.  If everything was followed correctly in the Arista ZTP server guide on the Ubuntu 14.04 VM you should have your ZTP server running on port 8080.  So when the VM is brought online it will receive its bootstrap file directly from the ZTP server.  Lets reload one of my switches and watch the ZTP server at the same time.


From the switch's perspective it booted up and had zero configuration on it.  It sends out a DHCP request first and receives an IP address.  The DHCP server, through a DHCP option, then instructs the switch where to get its bootstrap file from.  It then downloads the bootstrap file, executes it and reboots.

From the perspective of the Ubuntu 14.04 LTS server / ZTP server.


The first line says that the node 005056a878fa is requesting the bootstrap file.  What I will explain next is how the ZTP server knows this is a distinct switch, e.g. ToR5 vs ToR1.  When switches first boot up they identify themselves with either a system MAC address or a serial number.  I just used the MAC address feature.  So, not to go off path here, but 005056a878fa is the switch's system MAC address.  I will explain the next few lines in a little bit.

So far we know somewhat how this process works, but how do we know where to store the configuration etc. for each device?  How do we make each switch unique?

In the following file. /etc/ztpserver/ztpserver.conf


The location is where the ZTP files are.  I will explain later, but the bootstrap file for example is located within /usr/share/ztpserver/bootstrap, and the same goes for image files etc.  For the unique identifier I chose systemmac.  Serial number can be used as well.  The server URL is where the ZTP server can be reached.

So under /usr/share/ztpserver there is a list of the VM configurations for ZTP I have created.  Lets go to the node I just ZTP'd.

/share/ztpserver/nodes/005056a878fa/ is where the node information is.  So each time a new switch is created all the information should be under its own directory within the nodes folder with its MAC address.  There are 3 files located within this directory.

-Startup-configuration – Holds the configuration for the unique switch

-Definition – This is one of my favorites with ZTP, which I will get into.  A definition can say something like: make sure that when the switch boots it is always on a certain version of code, or download this batch, Python or Go script.

-Pattern – Pattern allows the environment to be built dynamically via LLDP neighbors if need be.
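To show how the system MAC maps onto a node directory, here is a small sketch that turns a MAC address in any common notation into the node ID and lists the three files ZTP expects to find there.  The data root matches the paths used in this post and the file names are the ones the ZTP server uses on disk as far as I know, so treat both as assumptions if your install differs.

```python
import posixpath
import re

DATA_ROOT = "/usr/share/ztpserver"  # data root used in this post
NODE_FILES = ("startup-config", "definition", "pattern")


def node_id_from_mac(mac):
    """Strip separators and lowercase, e.g. 00:50:56:A8:78:FA -> 005056a878fa."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError("not a MAC address: {!r}".format(mac))
    return digits


def node_paths(mac, data_root=DATA_ROOT):
    """Return the expected path of each per-node file for this switch."""
    node_dir = posixpath.join(data_root, "nodes", node_id_from_mac(mac))
    return {name: posixpath.join(node_dir, name) for name in NODE_FILES}


if __name__ == "__main__":
    for name, path in sorted(node_paths("00:50:56:a8:78:fa").items()):
        print(name, "->", path)
```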

So here are some snippets of my startup-config, definition and pattern.  My pattern file is the default, but these files are all needed by ZTP.  Keep in mind I built the configs beforehand; ZTP just delivers them.  In a few days or weeks I will get ZTP to build the configurations from a simple Python script.




Screenshot from 2016-05-09 17:28:37


Screenshot from 2016-05-09 17:28:54

In my definition file the very first action is install_image.  What this simply does is make sure that is the current image.  The second takes the dnsscript from /usr/share/ztpserver/files/scripts/dnsscripts and sends it to the switch.  What this does is simply add a DNS A record for the switch's management address each time it ZTPs.  It is a simple bash script I put together here.


What the script says is do an nsupdate to the DNS server I have here at home.  It adds the host, which is the string returned by hostname, and MGTINT is the IP address pulled with a complex grep from the ifconfig of ma1, which is the management interface.  I need to go ahead and do this same thing for all interfaces with some sort of fancy for loop in the future.
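The same two steps, pulling the management IP out of ifconfig and building the nsupdate input, can be sketched in Python.  This is an illustration of the logic rather than a port of my bash script: the DNS server, host name, TTL and sample ifconfig text are all made up.

```python
import re


def management_ip(ifconfig_output):
    """Pull the first IPv4 address out of `ifconfig ma1` style output."""
    match = re.search(r"inet (?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})", ifconfig_output)
    if match is None:
        raise ValueError("no IPv4 address found")
    return match.group(1)


def build_nsupdate_input(dns_server, fqdn, ip_address, ttl=3600):
    """Build the text that would be piped into nsupdate to add an A record."""
    lines = [
        "server {}".format(dns_server),
        "update add {} {} A {}".format(fqdn, ttl, ip_address),
        "send",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "ma1  Link encap:Ethernet\n     inet addr:192.0.2.21  Bcast:192.0.2.255"
    print(build_nsupdate_input("192.0.2.53", "veos-1.lab.example.", management_ip(sample)))
```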

So once that script is sent to the switch, an event-handler within the configuration runs it from the CLI once the switch has booted.


Lastly, I wanted to create a quick Python script that will simply connect to all 3 vEOS switches and write erase / reload each of the switches.
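A hedged sketch of that reset loop is below.  It is not my script.  It assumes the switches are reached over eAPI at the usual /command-api URL, and it assumes the `now` keyword is accepted on both commands to skip the confirmation prompts.  The `send` function is left to the caller, so nothing is actually erased by running this as-is, and the switch names are hypothetical.

```python
import json

# Assumption: "now" skips the confirmation prompts on both commands.
RESET_COMMANDS = ["enable", "write erase now", "reload now"]


def build_reset_request(request_id):
    """Build the eAPI JSON-RPC body that wipes and reloads one switch."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "runCmds",
        "params": {"version": 1, "cmds": RESET_COMMANDS, "format": "json"},
        "id": request_id,
    })


def reset_all(switches, send):
    """Call send(url, body) once per switch and collect whatever it returns."""
    results = {}
    for switch in switches:
        url = "https://{}/command-api".format(switch)
        results[switch] = send(url, build_reset_request(switch))
    return results


if __name__ == "__main__":
    def dry_run(url, body):
        # Print instead of sending, so this example is safe to run.
        print("POST", url, body)
        return "not sent"

    reset_all(["veos-1", "veos-2", "veos-3"], dry_run)
```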



Right now it is a build up / tear down for any type of testing I want to do.  In the future I for sure want to get an automated script that will build the configuration and the nodes directory automatically for me.  But for right now I can wr erase and remove all nodes.  I also have more DNS entries to make for all the interfaces.




Cloning VM’s from powercli then automatically add to a NSX load balancer.


This environment will be built out from the script.  It is a simple PowerShell script using PowerCLI.  I am not that great with PowerCLI, but I was able to learn a lot about it this week with some down time.  It really is not that hard.

The script can be found on my GitHub here.  The script basically first clones a VM from a template as many times as there are in the range.  After cloning it returns the vSphere MOB ID for the VMs, as the NSX load balancer uses either a MOB reference or an IP.  I'm not sure how to power on the VM and then grab the IP.  I guess I could have done that but I really do not have the time and this was just a small fun project.

After the VMs are created the NSX portion takes over.  I could not figure out for the life of me how to get PowerShell to communicate properly with the NSX API.  Thankfully Chris Wahl had a really good blog post on how to do just that, followed up with his GitHub on how to do so.

There are effectively three API calls.  The first creates the application profile.  I put in 443 but that can be changed.  The second creates the back end pool; I am also assuming that SNAT is being used.  The third then creates a virtual IP with that pool inside of it.  I need to add more to this logic, like returning the pool in a string etc.  I just ran out of usable time today unfortunately.  But for the most part the script should work within PowerShell.

NSX Python automated build

I figured I would share this NSX Python script to build an entire environment.  I worked with VMware a few months ago to get a lot of this going.  Some of this is my code, some VMware's.  But generally it builds out this environment.  This assumes that the NSX manager appliance is built out and connected to vCenter, clusters are prepared with the NSX VIBs and controllers are deployed.  The click click click click has always killed me.  This should take 5-10 minutes to run all three scripts total.


This is all within my GitHub page here

Here are some screen shots of how the script works..

Create the distributed logical router

Screenshot from 2016-04-18 08:57:46

Create routing on the distributed logical router for bgp purposes

Screenshot from 2016-04-18 09:07:42

Create 4 edge services gateways with BGP peering to the top of rack switches and DLR.  BGP is created, syslog is created and the firewall is disabled on the edges.

Screenshot from 2016-04-18 11:00:49

Everything you need other than the network build can be found within the vSphere MOB https://vcenterip/mob -> content -> root folder.


Everything is within the github repository.

There is an A side VLAN 1085 (only on ToR1) and a B side VLAN 1086 (only on ToR2)

Each Logical switch is unicast mode

Edge firewall is disabled, ECMP is enabled, syslog is created.

This was set up as three scripts because I had to use each of these all the time to either convert a VLAN backed environment to VXLAN or swap an OSPF environment over to a BGP environment.

Building Hypervisor leaf spine overlay network with BGP

This has been long overdue.  In this blog post I will explain why a leaf spine model scales best for an overlay network.  I was recently on the #packetpushers podcast in the design and build show for BGP within the data center.  We talked about why BGP is the best that we currently have for building a leaf spine infrastructure.  I am big into VMware's NSX but this sort of topology would relate to any overlay model using the same principles.  This will be a rather large post covering the following technologies

1.)Leaf Spine architecture
a.)Spine layer
b.)Leaf layer
c.)Physical links
d.)East West bandwidth

a.)BGP Peerings
c.)AS Numbering
d.)Community Strings
e.)Dynamic Peering

a.)NSX edge router placement
b.)VTEP Communication

Here is your typical Leaf Spine infrastructure.


Spine Layer


A common misconception here is that the spine switches have to be linked together.  This is due to the prior ways of thinking with first hop redundancy protocols.  Each connection from leaf to spine is a point to point layer 3 routed link.  Spine switches are never connected together, as their sole purpose is to provide east west connectivity for leaf switches.  So any traffic that egresses a leaf switch should simply pick, via some ECMP method, either spine to reach another leaf switch.  Spine switches are very similar to “P” routers in an MPLS design.  Each spine is also within the same BGP AS#.

Leaf switches

Each leaf switch has its own purpose in this environment.  Starting from left to right, the transit leafs provide connectivity to anything leaving the environment.  When traffic egresses an environment typically we would send it to another data center, the internet, some sort of vendor connectivity or a public/private cloud.

Services leaf in a design is generally where you put your external services.  This can be a mixture of bare metal and virtual devices.  I would suggest putting load balancers,AD/LDAP and any type of IP storage in this environment.  Typically load balancers use source nat to have traffic ingress back through the load balancer after leaving a VM.  In the future I will experiment more with hardware based VTEPs.

Edge/Management rack is for connectivity for our NSX or overlay networks.  This is where our NSX routers peer via BGP with the top of rack switches and provide connectivity for all of our compute subnets.

Compute racks.  Once we have our edge rack connected this is where we put all of our compute.  These are the clusters where we have ESXi hosts running our web, app and DB related VMs.

The physical links within this infrastructure from leaf to spine have to be the same speed.  So if you built your environment for 40Gb/s links and 100Gb/s came out the week after and is the new hotness, you are stuck at 40Gb/s.  BGP is a path vector protocol, or what I would like to call a “glorified next hop collector”.  Bandwidth is not taken into consideration, so a 40Gb link is the same as a 100Gb link.  Do not worry, I will explain why you can scale out more spines and it should not matter!

East-West traffic is the largest driving purpose for a leaf spine infrastructure.  Lets take 2 VM’s across two different leaf switches for example.

I only included one leaf switch on each compute rack for simplicity.  As the drawing shows, for the VMs to reach each other each will land on a leaf switch.  That leaf switch has 160Gb/s of bandwidth to reach the other leaf switch.  This seems like overkill at the moment, but once you start layering a lot of web, app and DB applications, thousands per rack, it makes a lot of sense.  Getting back to our previous point about physical links, if we find that we need more bandwidth there is nothing stopping anyone from adding another spine and one more 40Gb/s link per leaf.  Most implementations I have experienced use the Trident 2, which typically has 48 10Gb ports and 4 40Gb ports.  So 4 spines is the most I have seen at the moment.
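The bandwidth math is simple enough to sketch.  With one 40Gb/s uplink per spine, 4 spines give a leaf 160Gb/s, and a fifth spine adds another 40Gb/s.  The oversubscription function is my own addition to show what 48 10Gb server ports against those uplinks works out to.

```python
def leaf_uplink_gbps(spines, link_gbps=40):
    """Total leaf-to-spine bandwidth with one uplink per spine."""
    return spines * link_gbps


def oversubscription(down_ports, down_gbps, spines, link_gbps=40):
    """Ratio of server-facing bandwidth to spine-facing bandwidth on one leaf."""
    return (down_ports * down_gbps) / float(leaf_uplink_gbps(spines, link_gbps))


if __name__ == "__main__":
    print("4 spines:", leaf_uplink_gbps(4), "Gb/s of uplink per leaf")
    print("5 spines:", leaf_uplink_gbps(5), "Gb/s of uplink per leaf")
    print("48 x 10Gb down, 4 x 40Gb up:", oversubscription(48, 10, 4), "to 1")
```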


Why BGP?
BGP historically has been given a bad reputation when it comes to convergence as the timers are slower and it was harder to use than your usual IGP that a network person could turn on within a few lines of CLI.. eww CLI!

OSPF is really not an applicable choice in this design as it is typically really difficult to filter with OSPF.  EIGRP is out of the question due to it being proprietary.

BGP has made vast improvements as a protocol.  It is enterprise ready.  BGP has quicker timers and we can make it dynamic now.

The peerings in a BGP leaf spine architecture are rather easy.  iBGP between each leaf switches and eBGP between each leaf to spine connection.  ECMP is rather vital in this topology as BGP by default DOES NOT leverage multiple links in an ECMP fashion.  So generally it has to be turned on.

Community strings are vital.  In the past network people have used prefix-lists, access-lists and route-maps to control traffic leaving a routing protocol.  They still have their uses today, but generally traffic leaving each environment should have a community string that matches its BGP AS.  So for example if compute rack 1 uses 65004 it should use a community string of 65004:100.  What works out really well for advertising subnets in the environment dynamically is leveraging the transit switches to aggregate all of the community strings into one large community list for outbound advertisements to other data centers.  Today trying to use prefix-lists to control traffic that potentially touches a large number of routers is less than ideal.  If filtering is necessary, the edge routers are about the only place I would apply prefix-lists to filter traffic.
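The community scheme can be sketched in a few lines: each rack tags its routes with its own AS plus :100, and the transit leafs match on the combined list.  Apart from 65004, the AS numbers below are made up for illustration, and the function names are mine.

```python
def community_for(asn, value=100):
    """Build the community a rack attaches to its routes, e.g. 65004:100."""
    return "{}:{}".format(asn, value)


def transit_allow_list(rack_asns, value=100):
    """Aggregate every rack community into the list the transit leafs match on."""
    return sorted(community_for(asn, value) for asn in rack_asns)


def advertise_outbound(route_communities, allow_list):
    """A route leaves the data center only if it carries an allowed community."""
    return any(community in allow_list for community in route_communities)


if __name__ == "__main__":
    allowed = transit_allow_list([65004, 65005, 65006])
    print(allowed)
    print(advertise_outbound(["65004:100"], allowed))
    print(advertise_outbound(["65099:100"], allowed))
```

Adding a new rack then only means adding its AS to the list, with no prefix-list edits anywhere.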

Dynamic BGP

The first I heard about this was 2-3 years ago with MPLS routers.  Cisco has moved this technology into all of their latest releases.  I have also tested this with Arista switches.  The idea is that you have a subnet you use for BGP for virtual routers.  If all of your NSX edge routers are located on that subnet within the same BGP AS, you can dynamically bring up BGP peers on that network.  I like this as there is no need to add neighbors or make a physical switch change.
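The decision the physical switch makes for a dynamic peer can be sketched as two checks: is the peer inside the listen range, and does it present the expected AS?  The subnet and AS number below are hypothetical and only there to make the example runnable.

```python
import ipaddress


def accepts_peer(peer_ip, peer_asn, listen_range, expected_asn):
    """Accept a dynamic BGP peer only if it is in the listen range and in the right AS."""
    in_range = ipaddress.ip_address(peer_ip) in ipaddress.ip_network(listen_range)
    return in_range and peer_asn == expected_asn


if __name__ == "__main__":
    listen_range = "192.0.2.0/24"  # hypothetical edge router subnet
    edge_asn = 65010               # hypothetical AS shared by the edge routers
    print(accepts_peer("192.0.2.14", 65010, listen_range, edge_asn))
    print(accepts_peer("198.51.100.7", 65010, listen_range, edge_asn))
```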


So any new virtual router talking to the physical switch in this scenario will automatically peer with it.  Now here is the tricky part.  If you are using multiple tenants within the same physical infrastructure and they need to talk to each other, as-override is needed, because tenants that share a BGP AS would otherwise reject each other's routes as loops.  Normally everything should just follow the default route that is advertised; I configure as-override just in case the default route is lost.
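Why as-override matters can be shown with a small Python sketch of eBGP loop prevention. The AS numbers and function names are hypothetical, and the rewrite shown is the general idea of as-override rather than any vendor's exact behavior.

```python
def accepts(as_path, local_as):
    """eBGP loop prevention: reject a route whose AS path already contains our AS."""
    return local_as not in as_path


def as_override(as_path, neighbor_as, provider_as):
    """Replace the neighbor's AS in the path with the advertising switch's AS."""
    return [provider_as if asn == neighbor_as else asn for asn in as_path]


# Hypothetical AS numbers: both tenants share AS 65010, the switch is AS 65001.
TENANT_AS, SWITCH_AS = 65010, 65001

# A route from tenant A as the switch would send it on to tenant B.
path_to_tenant_b = [SWITCH_AS, TENANT_AS]
without_override = accepts(path_to_tenant_b, TENANT_AS)
with_override = accepts(
    as_override(path_to_tenant_b, TENANT_AS, SWITCH_AS), TENANT_AS
)
```

Without the override, tenant B sees its own AS in the path and drops the route; with it, the path no longer contains the tenant AS and the route is accepted.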


NSX, or a hypervisor-based overlay in general, is what really scales in this environment.  In our edge rack we place the NSX edge routers that peer with the physical network.  These routers advertise the address space where our VMs live.  NSX has supported ECMP since the 6.1.x days.  As of the 6.2.x release, NSX also shows the BGP AS path for a given route, which was not possible before.


The compute racks are what make east-west, VM-to-VM connectivity possible.  Within NSX, each hypervisor terminates VXLAN tunnels to the other hypervisors to overlay layer 2 segments.


I have written about this before. The idea here is that the communication runs between the VXLAN vmk (VTEP) interfaces.  The outer part of the VXLAN packet contains the source and destination VTEP/VXLAN vmk interfaces.  The inner part of the encapsulated packet contains the addresses of the VMs that are talking to each other.
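For anyone who wants to see what sits between the outer and inner packet, here is a short Python sketch of the 8-byte VXLAN header as laid out in RFC 7348, plus the usual encapsulation overhead with an outer IPv4 header and no VLAN tag. The helper names and the VNI value 5001 are hypothetical.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348
# Outer Ethernet (14) + outer IPv4 (20) + UDP (8) + VXLAN (8) bytes.
VXLAN_OVERHEAD = 14 + 20 + 8 + 8


def build_vxlan_header(vni):
    """Pack the 8-byte VXLAN header: flags, 24 reserved bits, VNI, 8 reserved bits."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header):
    """Read the VNI back out of a VXLAN header."""
    _flags_word, vni_word = struct.unpack("!II", header[:8])
    return vni_word >> 8


header = build_vxlan_header(5001)  # 5001 is a hypothetical VNI
```

The 50 bytes of overhead are the reason the VTEP network needs a larger MTU than the VM networks it carries.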

The broad idea here is that each network related to VTEPs needs to be advertised into BGP.  Traffic from edge/management to any of the compute clusters will use the VTEP network, so it needs to be advertised properly.  All of the data plane traffic should be VXLAN and use those segments.  The end result should look like the following.


VCP-NV 610 passed


On November 27th, on my second attempt, I passed the VCP-NV 610 exam. The test was all multiple choice and tested my knowledge of NSX in all aspects. The first time, I walked in not expecting much, and I failed by a lot. I had to go back and study the things I was weak on. In my production and lab environments I had focused a lot on the routing aspect of NSX, and not enough on all the other services, because routing is what I felt most comfortable with. Once I went back and made sure I understood all the concepts of NSX, I passed the exam on the second try.

To be honest, the exam is not structured very well. There were a few questions that seemed unrelated to the blueprint. It is a good thing I did not need high marks to pass. Otherwise, for someone with some hands-on experience with NSX, i.e. the hands-on labs or a home lab, it would be an easy pass. I would recommend this test to all my network friends: if you have a valid CCNA, you do not need the normal VMware class to get the certification.

NSX troubleshooting commands.

NSX Controller related commands
show control-cluster status – Shows if a controller is connected to the cluster

This command is run on every NSX controller to make sure that each controller has been added to the 3-node cluster. If an NSX controller does not show all processes as enabled, it either has to be rebooted or deleted and then re-added.

If for some reason the join does not complete, do the following.
1.) Ping the other NSX controllers to check connectivity.
2.) Reload the controller.
3.) Check the NSX installation management page to see if the controller is set up.

show control-cluster logical-switches vni xxx – This command shows which one of the NSX controllers handles all the functionality for a particular VXLAN/VNI.
In my experience, if you do not see a logical switch/VNI associated with a specific controller, do the following.
1.) Make sure the right VNI is being used.
2.) Find the logical switch and change its mode to multicast, then quickly back to unicast.

show control-cluster logical-switches vtep-table xxx – Discover which hosts participate in a VXLAN

1.) If you do not see VTEPs showing up on the controller that owns that VNI, restart the netcpa agent by logging into the ESXi host and issuing the following command: /etc/init.d/netcpad restart
2.) If restarting netcpa did not resolve the issue, the only way to fix it at this point is a reboot of the host.
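When a VNI spans many hosts, working out which VTEPs are missing is just a set difference. Here is a tiny Python sketch, assuming you have copied the VTEP IPs by hand from your host inventory and from the controller's vtep-table output; the function name and all addresses are hypothetical.

```python
def missing_vteps(expected_vteps, controller_vteps):
    """Return the VTEP IPs that should be in the VNI but the controller does not list."""
    return sorted(set(expected_vteps) - set(controller_vteps))


# Hypothetical addresses, typed in by hand from the host inventory and from
# the controller's vtep-table output for one VNI.
expected = ["172.16.10.11", "172.16.10.12", "172.16.10.13"]
seen_on_controller = ["172.16.10.11", "172.16.10.13"]

to_check = missing_vteps(expected, seen_on_controller)
```

The hosts behind the addresses in `to_check` are the ones where I would try the netcpa restart first.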

show control-cluster logical-switches arp-table xxx – Discover the VMs' ARP entries in a VXLAN
The Connection-ID shows which host an entry belongs to; it maps back to the output of the previous command.

1.) If an IP address does not show up on the controller when issuing the arp-table command for its VXLAN/VNI, chances are that the VM will not be able to communicate with the outside world due to an issue with the host where it lives. Migrate that VM to another host that has a working VTEP.
2.) If the IP address shows up but cannot ping its default gateway, check the default gateway of the host and make sure it matches the default gateway on the LIF; the same goes for the host's OS.

show control-cluster logical-switches mac-table xxx – Discover the VMs' MAC addresses in a VXLAN

As with the arp-table, the Connection-ID maps directly to the VTEP table.

1.) If the MAC does not show up on the controller, chances are there is an issue with the host. Check that the host's VTEP interface shows up when issuing the command to see all the VTEPs that participate in that VXLAN/VNI. vMotion the VM to another host and reboot the non-functional host.
2.) Check to make sure that the MAC address is correct in the guest operating system.

show control-cluster logical-routers instance all – Shows each edge's association with each host.
Like the other controller commands, the output of this command will look different per controller. The LR-ID number will be needed for later commands.

show control-cluster logical-routers interface-summary – Provides all the interfaces for the associated LDR/Edge

show control-cluster logical-routers interface routerID interface – Provides the default gateway IP / MAC and MTU

show control-cluster logical-routers routes routerID – Shows all the routes for a given ESG. Note this is different per controller.

NSX edge commands
show ip route
show ip route ospf/static/bgp
show ip ospf
show ip ospf neighbors
show ip ospf database
show firewall flows – Shows every single flow going through the Edge router at that time. Similar to iptables -L
show firewall flows top 10 – Provides the top 10 largest sessions
show firewall flows top 10 sort-by-pkts – Provides the top 10 by number of packets
show flowtable – Shows all flows.
show ip forwarding – Displays the FIB, whereas show ip route shows the RIB
show system uptime – Shows the uptime of a device.
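The RIB versus FIB distinction behind show ip route and show ip forwarding can be sketched in Python. This is a toy model, not how the Edge stores routes: the prefixes, next hops and distance values are illustrative assumptions, and `build_fib` and `lookup` are made-up names.

```python
import ipaddress

# Hypothetical RIB: every candidate route per prefix as (source, distance,
# next hop). The distance values are illustrative assumptions.
rib = {
    "0.0.0.0/0": [("bgp", 20, "10.0.0.1")],
    "10.1.0.0/16": [("ospf", 110, "10.0.0.2"), ("static", 1, "10.0.0.3")],
    "10.1.2.0/24": [("bgp", 20, "10.0.0.4")],
}


def build_fib(rib):
    """Keep only the lowest-distance route for each prefix."""
    return {
        ipaddress.ip_network(prefix): min(routes, key=lambda r: r[1])[2]
        for prefix, routes in rib.items()
    }


def lookup(fib, destination):
    """Longest-prefix match of a destination address against the FIB."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in fib if address in net]
    return fib[max(matches, key=lambda net: net.prefixlen)] if matches else None


fib = build_fib(rib)
```

The RIB keeps both the OSPF and the static candidate for 10.1.0.0/16, while the FIB only holds the winner, which is why the two commands can show different things.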

ESXi Related troubleshooting commands
esxcli network vswitch dvs vmware vxlan list – Lists the VTEP segment and default gateway for the VTEP with MTU
net-vdr -l --instance – Lists the routers along with their associated LIFs etc.
esxcli software vib list | grep vxlan – Lists the VXLAN vib, which needs to be installed on each host. If the vib is not installed, the host cannot participate in VXLAN.