BGP: TROUBLESHOOT NEIGHBOR RELATIONSHIPS

“Neighbors. Everybody needs good neighbors! With a little understanding, you can find the perfect blend.” So went the lyrics to your favourite Aussie soap opera. But understand this: sometimes, neighbors can be evil. Sometimes neighbors will steal your ball, kidnap your cat, wake up at 4am to poison your milk, and sometimes – just sometimes – neighbors will spend 15 long years dedicating their lives to putting posters of your face up around the city; posters declaring you to be “the enemy of beauty”, “absolute bobbins” and “comparable to the toilet”.

I became an expert in troubleshooting neighbor relationships during those times. Sadly, I can’t tell you how I “fixed” the relationship with my execrable neighbor Garren – at least not without implicating myself in the eyes of the law. But I can tell you this: the experience taught me everything I need to know about troubleshooting BGP relationships.

If you ever find a new BGP neighborship isn’t coming up, here’s a list of things to check. That is, if you’re not a coward?

 

1) YOU SURE YOU GOT THE RIGHT IP, BUDDY?

A router’s source IP address has to match whatever the neighbor’s BGP configuration is expecting. It’s so easy to accidentally type the wrong IP, especially if the neighbor is coming from a loopback.

So for example, here we have two routers. Router1 is trying to start a BGP neighbor relationship with Router2's GigabitEthernet interface address, 192.168.12.2.

This won’t work though, because Router2 is sourcing the session from its loopback, 2.2.2.2. As such, the neighbor relationship will never come up. Check out this faulty config:

ROUTER 1:

interface Loopback0
 ip address 1.1.1.1 255.255.255.255

router bgp 1
 neighbor 192.168.12.2 remote-as 2 
 neighbor 192.168.12.2 ebgp-multihop 2
 neighbor 192.168.12.2 update-source Loopback0

ROUTER 2:

interface Loopback0
 ip address 2.2.2.2 255.255.255.255

router bgp 2
 neighbor 1.1.1.1 remote-as 1
 neighbor 1.1.1.1 ebgp-multihop 2
 neighbor 1.1.1.1 update-source Loopback0

The fix is simple: reconfigure Router1 so that the commands say “neighbor 2.2.2.2”.
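Here's a sketch of what Router1's corrected config might look like, with Router2 left exactly as it is above. It removes the old neighbor first, so the stale 192.168.12.2 entry isn't left lying around:

ROUTER 1:

router bgp 1
 ! Remove the neighbor that points at Router2's physical interface
 no neighbor 192.168.12.2 remote-as 2
 ! Peer with Router2's loopback instead, since that's where Router2 sources its session
 neighbor 2.2.2.2 remote-as 2
 neighbor 2.2.2.2 ebgp-multihop 2
 neighbor 2.2.2.2 update-source Loopback0

This assumes Router1 actually has a route to 2.2.2.2. If it doesn't, the session still won't come up (see point 4).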

 

2) YOU SURE YOU GOT THE RIGHT AUTONOMOUS SYSTEM, FRIEND?

Typos in your neighbor's AS number are just as easy to make. Below we see a mistake on Router2: it thinks the remote AS is 11, not 1.

router bgp 2
 neighbor 192.168.12.1 remote-as 11

It’s another simple fix – just check your typing. It’s an easy mistake to make. Don’t feel bad!
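If you'd like the fix spelled out, here's a sketch for Router2. It deletes the neighbor that has the wrong AS and re-adds it with the right one. Removing a neighbor this way also drops any other settings you had for it, so re-add those too:

router bgp 2
 ! Remove the neighbor that was configured with the mistyped AS
 no neighbor 192.168.12.1 remote-as 11
 ! Re-add it with Router1's real AS number
 neighbor 192.168.12.1 remote-as 1

Afterwards, "show ip bgp summary" on either router is a quick way to check whether the neighbor has come up.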

 

3) ARE YOUR MULTIHOP SETTINGS ON POINT?

Remember, by default IOS sends eBGP packets with a Time To Live of 1, so it expects an eBGP neighbor to be directly connected. Check out the line that's missing on Router1, compared to Router2:

ROUTER 1:

router bgp 1
 neighbor 2.2.2.2 remote-as 2
 neighbor 2.2.2.2 update-source Loopback0

ROUTER 2:

router bgp 2
 neighbor 1.1.1.1 remote-as 1
 neighbor 1.1.1.1 ebgp-multihop 2
 neighbor 1.1.1.1 update-source Loopback0

If you're connecting to a loopback, you're gonna want to jack that TTL up to a sweet number. Don't overdo it though. There's no use setting the TTL to 255 just for the sake of it (and 255 is what you get if you type ebgp-multihop without a number). For example, if the direct link goes down, do you really want the BGP relationship to take an alternative path through 15 other routers and 4 other autonomous systems? No sir! Set it to what you need, and no more. Don't be greedy. You're better than that.
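As a sketch, here's Router1 with the missing line put back, using a TTL of 2 to match Router2. That's assumed to be enough here because the two routers are directly connected and only their loopbacks are one step further away; adjust the number to the real hop count in your own network:

ROUTER 1:

router bgp 1
 neighbor 2.2.2.2 remote-as 2
 ! Let the eBGP packets live long enough to reach Router2's loopback
 neighbor 2.2.2.2 ebgp-multihop 2
 neighbor 2.2.2.2 update-source Loopback0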

 

4) YOU SURE YOU HAVE A ROUTE TO THE NEIGHBOR, PAL?

If you’re making a neighborship to another router’s loopback, remember to check that the IP is actually in your routing table!

Router1#show ip route 2.2.2.2
% Network not in table

You know the fix: make sure that you've either got static routes in both directions, or, where possible, run an interior gateway protocol like OSPF, EIGRP or IS-IS.
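For the two routers in this post, static routes might look like the sketch below. The next hops 192.168.12.2 (Router2) and 192.168.12.1 (Router1) are assumptions based on the addressing used earlier, so swap in your own:

ROUTER 1:

! Reach Router2's loopback via Router2's directly connected interface
ip route 2.2.2.2 255.255.255.255 192.168.12.2

ROUTER 2:

! Reach Router1's loopback via Router1's directly connected interface
ip route 1.1.1.1 255.255.255.255 192.168.12.1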

Once that's set up, ping to double-check that all is good. Source the ping from your loopback interface, so you're testing the same addresses BGP will use. If you're load-balancing over multiple paths, send a bigger batch of pings to make sure that it's all good.

Router1#ping 2.2.2.2 source loopback0 repeat 50

Type escape sequence to abort.
Sending 50, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (50/50), round-trip min/avg/max = 8/16/28 ms
Router1#

 

5) ARE YOU BLOCKING TCP PORT 179 BY MISTAKE?

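It's quite possible that there's an access-list on an interface somewhere in the path, for example one that blocks everything apart from web and mail traffic. Remove the access-list if it's not needed, or permit TCP port 179 if it is. Remember that BGP needs port 179 in both directions: whichever router opens the session sends to destination port 179, and the replies come back from source port 179.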
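Here's a sketch of an inbound extended access-list on Router1 that lets BGP through from Router2's loopback. The ACL name EDGE-IN and the interface GigabitEthernet0/0 are hypothetical, and your existing web and mail entries would stay in the list alongside these lines:

ip access-list extended EDGE-IN
 ! Router2 opening a session to Router1 (destination port 179)
 permit tcp host 2.2.2.2 host 1.1.1.1 eq bgp
 ! Router2 replying to a session that Router1 opened (source port 179)
 permit tcp host 2.2.2.2 eq bgp host 1.1.1.1
 ! Handy for the loopback-to-loopback ping tests from point 4
 permit icmp host 2.2.2.2 host 1.1.1.1
 ! (your existing web and mail entries go here)
!
interface GigabitEthernet0/0
 ip access-group EDGE-IN in

Don't forget the implicit deny at the end of every access-list: anything you don't explicitly permit gets dropped.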

 

6) STILL NOT WORKING? HERE ARE A FEW OTHER MISCELLANEOUS THINGS TO CONSIDER

  • There could be a password mismatch, of course. But you thought of that already, right? Right?
  • Interestingly, the keepalive and hold timers DON'T have to match! The default keepalive is 60 seconds, and the default hold time is three times that (180 seconds). If one router is configured differently, the two routers advertise their hold times to each other in their OPEN messages and both use the smaller value, with the keepalive interval typically set to a third of it. (I wish my old neighbor Garren wasn't alive.)
  • If there’s an MTU mismatch on the two router interfaces (or indeed, on any devices in between the two routers), the neighbor relationship will still come up – but it might flap when the devices actually try to exchange prefixes. There’s a good article on Cisco’s website all about MTU troubleshooting for BGP.
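Here's a sketch of the first two settings on Router1, with hypothetical values. The password has to be identical on both routers, so Router2 would need the same password on its neighbor 1.1.1.1 line. The timers line sets a keepalive of 10 seconds and a hold time of 30; if Router2 keeps the defaults, the two routers should settle on the smaller hold time of 30 seconds:

router bgp 1
 ! MD5 authentication for the session; must match Router2's config exactly
 neighbor 2.2.2.2 password S3cr3tP4ss
 ! keepalive 10 seconds, hold time 30 seconds
 neighbor 2.2.2.2 timers 10 30

For the MTU check, a full-sized ping with the don't-fragment bit set is a quick test. The 1500-byte size assumes a standard Ethernet MTU along the whole path:

Router1#ping 2.2.2.2 source loopback0 size 1500 df-bit

Afterwards, "show ip bgp neighbors 2.2.2.2" shows the hold time and keepalive interval the routers actually agreed on.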
