Cisco 4900M Upgrade failure issue.

So you’ve got a Cisco 4900M switch to upgrade. You’ve uploaded the new image to flash, and set the boot statement correctly. However, your switch still won’t come up on the new version of code you’ve uploaded, no matter how many times you reload. This makes no sense at all and you begin to tear your hair out.

4900M-SW-1#dir
Directory of bootflash:/
 
    6  -rw-    25442405   Sep 7 2012 12:55:14 +01:00  cat4500e-entservicesk9-mz.122-53.SG1.bin
    7  -rw-    25646261   Sep 8 2012 13:31:20 +01:00  cat4500e-entservicesk9-mz.122-53.SG2.bin
    8  -rw-       41218  Jan 14 2013 11:10:26 +01:00  extra_logs.txt
    9  -rw-    25936915  Mar 18 2013 10:00:40 +01:00  cat4500e-entservicesk9-mz.122-54.SG1.bin

4900M-SW-1(config)# boot system flash bootflash:cat4500e-entservicesk9-mz.122-54.SG1.bin

The cause is easily missed: 4900M platforms seem to ship with their config register set to 0x2101 by default, which tells the switch to load the first image found in flash rather than the one named in your boot statement. “show version” will confirm the current register value. Fix this with:

4900M-SW-1(config)# config-register 0x2102
4900M-SW-1(config)#^Z
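
The difference between 0x2101 and 0x2102 lies in the lowest four bits of the register, the boot field. As a rough illustration, here is a minimal Python sketch that decodes it. The function name is made up for this example, and the meanings follow the usual IOS convention (0 = ROM monitor, 1 = first image in flash as described above, 2 to F = boot system statements), so check your platform documentation before relying on it.

```python
def describe_boot_field(config_register: int) -> str:
    """Describe the boot field (lowest four bits) of a config register value.

    The meanings below are the usual IOS convention and are an assumption
    here; individual platforms can differ.
    """
    boot_field = config_register & 0xF
    if boot_field == 0x0:
        return "stay at the ROM monitor prompt"
    if boot_field == 0x1:
        return "load the first image found in flash"
    return "follow the boot system statements in the configuration"


for value in (0x2101, 0x2102):
    print(f"0x{value:04X}: {describe_boot_field(value)}")
```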

Using “show bootvar” will verify your config register has been changed. After a reload, your switch should come up on the correct image (whatever is specified in your boot statement). “show version” will then show the corrected config register:

cisco WS-C4900M (MPC8548) processor (revision 2) with 524288K bytes of memory.
Processor board ID XXXXXXXXX
MPC8548 CPU at 1.33GHz, Cisco Catalyst 4900M
Last reset from Reload
10 Virtual Ethernet interfaces
36 Gigabit Ethernet interfaces
16 Ten Gigabit Ethernet interfaces
511K bytes of non-volatile configuration memory.
 
Configuration register is 0x2102

I should say here that 4900Ms are very strange beasts and have caused me a few issues in the past, more code related than hardware. An older version caused an outage when uploading an IOS image via FTP, thanks to a process deciding to hog 100% of the CPU (a bug confirmed by Cisco). Call me paranoid, but I now avoid uploading new images during the day if I can help it!

Cisco Nexus and problems with Links/Circuits and SFPs

When migrating from the more traditional Cisco infrastructure to Nexus platforms (eg 7Ks and 5Ks), a few teething problems tend to emerge.

Migrating carrier circuits and 1 Gig links on to Nexus 5Ks for example, can be a tad frustrating at first because there are a couple of “gotchas”. Ignoring the simple stuff like fibres being the wrong way round or dodgy cables, here are a couple I’ve come across.

First Gotcha: “SFP Invalid”
This can happen despite having a valid SFP in the port (eg: GLC-SX-MM), because the Nexus port in question wants to run at 10Gig by default. It won’t be resolved until you set the speed as shown below. Earlier versions of NX-OS didn’t support 1Gig operation, so if the switch rejects the command, chances are you need to upgrade.

interface Eth1/24
 speed 1000

Second Gotcha: “Link still won’t come up despite setting speed 1000”
Try this (especially if it’s a carrier circuit and works fine on the old kit). Note that this feature was only added in 5.0.3N2 code onwards. If you’re migrating a carrier circuit that was set to “speed nonegotiate”, you will likely need this.

interface Eth1/24
 no negotiate auto

Third Gotcha: “Persistent Invalid/Unsupported transceiver state”
This tends to happen more when setting up 10 Gig links than 1 Gig links, but sometimes your “smart hands” person simply inserts the wrong optic into the slot, such as a FET SFP (meant for FEXes) or an FC transceiver. You can always verify what’s going on with:

NXASW-A# sho int e1/29 transceiver
Ethernet1/29
    transceiver is present
    type is 1000base-SX
    name is CISCO-FINISAR
    part number is FTLF8519P2BCL-CS
    revision is 0000
    serial number is XXXXXXXXXX
    nominal bitrate is 1300 MBit/sec
    Link length supported for 50/125um fiber is 550 m
    Link length supported for 62.5/125um fiber is 270 m
    cisco id is --
    cisco extended id number is 4
 

If the nominal bitrate is 4000-something, it’s an FC SFP. Its type may appear as 1000base-unknown.
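
If you have a lot of ports to check, the same rule of thumb can be applied to captured output. The following is only a sketch: the function names and the 4000 to 4999 MBit/sec range are my own assumptions based on the "4000-something" observation above, and the sample is a trimmed copy of the output shown earlier.

```python
import re

# Trimmed copy of the transceiver output shown above.
SAMPLE = """\
Ethernet1/29
    transceiver is present
    type is 1000base-SX
    nominal bitrate is 1300 MBit/sec
"""


def nominal_bitrate_mbit(output: str):
    """Return the nominal bitrate in MBit/sec, or None if the line is absent."""
    match = re.search(r"nominal bitrate is (\d+) MBit/sec", output)
    return int(match.group(1)) if match else None


def looks_like_fc(output: str) -> bool:
    """Guess whether the optic is an FC SFP from its nominal bitrate.

    Assumption: "4000-something" is taken to mean 4000 to 4999 MBit/sec.
    """
    rate = nominal_bitrate_mbit(output)
    return rate is not None and 4000 <= rate < 5000


print(nominal_bitrate_mbit(SAMPLE), looks_like_fc(SAMPLE))
```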

NNMi discovers a Cisco device multiple times

This drove me absolutely crazy and took quite some time to figure out.

We know that NNMi autodiscovery is clever enough to determine whether a node it’s found on one address is already managed in the topology via another address.

What this doesn’t account for, however, is that devices with multiple VRFs can be discovered multiple times IF the SNMP agent returns a different hostname for each polled IP address in the different VRFs. Because of this it’s important to have the same domain name configured within each VRF.

ip domain-name yourdomain.com
 
vrf context management
  ip domain-name yourdomain.com

If, for example, the domain-name isn’t configured within the vrf contexts, the agent will reply with just the short hostname (eg: yournode rather than yournode.yourdomain.com), and NNMi will then decide that it’s a different node altogether. This means you get a messy map and a headache.
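
To spot affected devices before discovery does, you could compare the hostname reported on each management address. This is a minimal sketch with made-up addresses and a hypothetical helper name; it assumes you have already collected the reported names by some other means (an SNMP poll of each address, for instance).

```python
from collections import defaultdict


def group_by_reported_name(reported: dict[str, str]) -> dict[str, list[str]]:
    """Group polled addresses by the hostname the agent returned for each."""
    groups = defaultdict(list)
    for address, name in reported.items():
        groups[name.lower()].append(address)
    return dict(groups)


# Hypothetical results: one switch polled on addresses in two different VRFs.
reported = {
    "10.0.0.1": "yournode.yourdomain.com",
    "192.168.1.1": "yournode",
}

groups = group_by_reported_name(reported)
if len(groups) > 1:
    print("Inconsistent hostnames reported:", sorted(groups))
```

More than one group for a single physical device suggests the domain name is missing in at least one VRF.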

This is probably just as valid with other network management tools that use an autodiscovery mechanism.

The other fix, of course, is to lock down SNMP access solely to a single target address (eg: a loopback), but you may find this hampers functionality if you rely on your management system trying different interfaces/addresses when one becomes unreachable. It’s more common to lock down by source management subnets rather than by target, IMO.

Cisco: setup express and the potential nightmares.

Some Cisco switches these days have an annoying extra feature called express setup.

What you may find, sooner or later, is that someone does some cabling in a cabinet which then causes a cable loom or RJ45 plug to rest against the mode button on your switch. Suddenly, you’re called out at silly o’clock because the switch lost its config and is no longer functioning as expected. :(

Avoiding this nightmare scenario is simple. Add this to your configuration standards documentation where applicable:

Switch(config)# no setup express

Headache averted!
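
If your standards are checked against saved configs, a line like this is easy to audit. Below is a minimal sketch, assuming the command appears verbatim in the saved configuration; the function name and the sample config are invented for illustration.

```python
def missing_standard_lines(config_text: str, required: list[str]) -> list[str]:
    """Return the required lines that do not appear in the configuration."""
    present = {line.strip() for line in config_text.splitlines()}
    return [line for line in required if line not in present]


# Lines your standards document says every switch should carry.
REQUIRED = ["no setup express"]

# Invented sample configs for illustration.
good_config = """\
hostname Switch
no setup express
"""
bad_config = """\
hostname Switch
"""

print(missing_standard_lines(good_config, REQUIRED))
print(missing_standard_lines(bad_config, REQUIRED))
```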

Unsupported SFPs in Cisco Switches – a warning!

So… an SFP has blown and there are no spare Cisco ones around. As luck would have it, you’ve found one made by HP or some other manufacturer and tried installing it, only to see in the logs that there is an “Unsupported transceiver found in GiX/X” (or something similar).

Damn.

A bit of googling shows you that this is easily worked around with the following commands. The first of these is a hidden command and must be typed in full:

Switch(config)# service unsupported-transceiver
Switch(config)# no errdisable detect cause gbic-invalid

What you aren’t told, however, is that the first command behaves differently across the various Cisco platforms! On most kit, it’ll happily take the command, give you a warning about support (on your head be it) and then your SFP will spring into life. On others, such as the 4900M, for example, it will take down and reinitialise all of your X2 modules and SFPs.

This kind of outage is probably not what you want during the day. Unless your switch is already isolated, schedule the change out of hours if you can’t lab it first to find out what the consequences will be.