So you’ve got a Cisco 4900M switch to upgrade. You’ve uploaded the new image to flash, and set the boot statement correctly. However, your switch still won’t come up on the new version of code you’ve uploaded, no matter how many times you reload. This makes no sense at all and you begin to tear your hair out.
Directory of bootflash:/
6 -rw- 25442405 Sep 7 2012 12:55:14 +01:00 cat4500e-entservicesk9-mz.122-53.SG1.bin
7 -rw- 25646261 Sep 8 2012 13:31:20 +01:00 cat4500e-entservicesk9-mz.122-53.SG2.bin
8 -rw- 41218 Jan 14 2013 11:10:26 +01:00 extra_logs.txt
9 -rw- 25936915 Mar 18 2013 10:00:40 +01:00 cat4500e-entservicesk9-mz.122-54.SG1.bin
4900M-SW-1(config)# boot system flash bootflash:cat4500e-entservicesk9-mz.122-54.SG1.bin
The issue here is easily missed. The 4900M platforms seem to ship with their config register set to 0x2101 by default, which tells the switch to load the first image it finds in flash and ignore the boot statement. In the listing above, that is the 122-53.SG1 image. Doing a “show version” will verify the current setting. Fix this with:
4900M-SW-1(config)# config-register 0x2102
Using “show bootvar” will verify your config register has been changed. After a reload, your switch should now boot the correct image (whatever is specified in your boot statement). You will see the config register is now correct from “show version”:
cisco WS-C4900M (MPC8548) processor (revision 2) with 524288K bytes of memory.
Processor board ID XXXXXXXXX
MPC8548 CPU at 1.33GHz, Cisco Catalyst 4900M
Last reset from Reload
10 Virtual Ethernet interfaces
36 Gigabit Ethernet interfaces
16 Ten Gigabit Ethernet interfaces
511K bytes of non-volatile configuration memory.
Configuration register is 0x2102
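For what it's worth, the difference between 0x2101 and 0x2102 comes down to the lowest four bits of the register, usually called the boot field. Here is a minimal Python sketch of that decoding. The function name describe_boot_field is made up for illustration, and the descriptions are my paraphrase of general Cisco behaviour rather than anything taken from 4900M documentation.

```python
def describe_boot_field(config_register: int) -> str:
    """Return a rough description of the boot field (lowest 4 bits)."""
    boot_field = config_register & 0x000F
    if boot_field == 0x0:
        return "stay at the ROM monitor prompt"
    if boot_field == 0x1:
        return "boot the first image found in flash"
    # Values 0x2 to 0xF: follow the boot system statements in the config
    return "boot using the boot system statements"


for value in (0x2101, 0x2102):
    print(f"{value:#06x}: {describe_boot_field(value)}")
```

Only the last hex digit differs between the two values, which is why the mistake is so easy to overlook.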
I should say here that 4900Ms are very strange beasts and have caused me a few issues in the past, though these were more code related than hardware related. An older version caused an outage when uploading an IOS image via FTP, thanks to a process deciding to hog 100% of the CPU (a confirmed bug from Cisco). Call me paranoid, but I now avoid uploading new images during the day if I can help it!
When migrating from the more traditional Cisco infrastructure to Nexus platforms (eg 7Ks and 5Ks), a few teething problems tend to emerge.
Migrating carrier circuits and 1 Gig links on to Nexus 5Ks for example, can be a tad frustrating at first because there are a couple of “gotchas”. Ignoring the simple stuff like fibres being the wrong way round or dodgy cables, here are a couple I’ve come across.
First Gotcha: “SFP Invalid”
This can happen despite having a valid SFP in the port (eg: GLC-SX-MM), as the Nexus port in question wants to run at 10Gig by default. It won’t be resolved until you set the speed on the interface:
speed 1000
Earlier versions of NX-OS wouldn’t support 1Gig operation, so if the switch doesn’t allow this, chances are you need to upgrade.
Second Gotcha: “Link still won’t come up despite setting speed 1000”
Try this (especially if it’s a carrier circuit and works fine on the old kit). Note that this feature was only added in 5.0.3N2 code onwards. If you’re migrating a carrier circuit that was set to “speed nonegotiate”, you will likely need this.
no negotiate auto
Third Gotcha: “Persistent Invalid/Unsupported transceiver state”
This tends to happen more with 10 Gig links than 1 Gig links, and sometimes the cause is that your “smart hands” person has inserted a FET SFP (meant for FEXes) into the slot, or something like an FC transceiver. You can always verify what’s going on with:
NXASW-A# sho int e1/29 transceiver
transceiver is present
type is 1000base-SX
name is CISCO-FINISAR
part number is FTLF8519P2BCL-CS
revision is 0000
serial number is XXXXXXXXXX
nominal bitrate is 1300 MBit/sec
Link length supported for 50/125um fiber is 550 m
Link length supported for 62.5/125um fiber is 270 m
cisco id is --
cisco extended id number is 4
If the nominal bitrate is 4000-something, it’s an FC SFP. It may appear as 1000base-unknown.
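If you have a lot of ports to check, the same rule of thumb can be scripted. Below is a rough Python sketch that pulls the type and nominal bitrate out of pasted transceiver output. The function name check_transceiver and the 4000 to 4999 range are my own assumptions based on the "4000-something" observation above, not an official way of identifying FC optics.

```python
import re

# Assumption: "4000-something" MBit/sec is read as the range 4000-4999.
FC_BITRATE_RANGE_MBIT = range(4000, 5000)


def check_transceiver(output: str) -> dict:
    """Extract the type and nominal bitrate from transceiver output text."""
    type_match = re.search(r"^\s*type is (\S+)", output, re.MULTILINE)
    rate_match = re.search(r"nominal bitrate is (\d+) MBit/sec", output)
    bitrate = int(rate_match.group(1)) if rate_match else None
    return {
        "type": type_match.group(1) if type_match else None,
        "bitrate_mbit": bitrate,
        # Flag the SFP only when the bitrate falls in the assumed FC range
        "looks_like_fc": bitrate is not None and bitrate in FC_BITRATE_RANGE_MBIT,
    }


sample = """transceiver is present
type is 1000base-SX
name is CISCO-FINISAR
nominal bitrate is 1300 MBit/sec
"""
print(check_transceiver(sample))
```

The sample text is a trimmed copy of the output shown above, so it should come back as a 1000base-SX optic that does not look like FC.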
This drove me absolutely crazy and took quite some time to figure out.
We know that NNMi autodiscovery is clever enough to determine whether a node it’s found on one address is already managed in the topology via another address.
What this doesn’t account for, however, is that devices with multiple VRFs can be discovered multiple times IF the SNMP agent returns a different hostname for each polled IP address in the different VRFs. Because of this it’s important to have the same domain name configured within each VRF.
ip domain-name yourdomain.com
vrf context management
ip domain-name yourdomain.com
If, for example, the domain-name isn’t configured within the vrf contexts, the agent will reply with just the short hostname, eg: yournode rather than yournode.yourdomain.com, and NNMi will then decide that it’s a different node altogether. This means you get a messy map and a headache.
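To make the failure mode concrete, here is a toy Python model of name-based deduplication. It is not how NNMi works internally (I have no visibility of that), and the addresses, names and the group_by_reported_name function are all hypothetical. It just shows why one switch turns into two nodes when the reported names differ.

```python
from collections import defaultdict


def group_by_reported_name(polls):
    """Group polled addresses by the hostname the SNMP agent returned."""
    nodes = defaultdict(list)
    for address, reported_name in polls:
        nodes[reported_name.lower()].append(address)
    return dict(nodes)


# Hypothetical poll results: the same switch, one address per VRF.
polls = [
    ("10.0.0.1", "yournode.yourdomain.com"),  # default VRF, domain name set
    ("192.168.0.1", "yournode"),              # management VRF, no domain name
]
print(group_by_reported_name(polls))
```

With the domain name set in both VRFs, both addresses would report yournode.yourdomain.com and collapse into a single entry.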
This is probably just as valid with other network management tools that use an autodiscovery mechanism.
The other fix, of course, is to lock down SNMP access solely to a single target address (eg: a loopback) but you may find this hampers functionality if you allow your management system to try different interfaces/addresses in the case of it becoming unreachable. It’s more common to lock down by source management subnets rather than target IMO.
Some Cisco switches these days have an annoying extra feature called express setup.
What you may find, sooner or later, is that someone does some cabling in a cabinet which then causes a cable loom or RJ45 plug to rest against the mode button on your switch. Suddenly, you’re called out at silly o’clock because the switch lost its config and is no longer functioning as expected. :(
Avoiding this nightmare scenario is simple. Add this to your configuration standards documentation where applicable:
Switch(config)# no setup express
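If you keep backups of your switch configs, a quick script can tell you which ones are missing the command. This is a minimal Python sketch, assuming the disabled state appears as a literal "no setup express" line in the saved config. The function name and the sample config text are made up for illustration.

```python
def express_setup_disabled(config_text: str) -> bool:
    """Return True if the config contains a 'no setup express' line."""
    return any(
        line.strip() == "no setup express"
        for line in config_text.splitlines()
    )


# Hypothetical config snippets for two switches
hardened = "hostname Switch-A\nno setup express\n"
unhardened = "hostname Switch-B\n"

print(express_setup_disabled(hardened))
print(express_setup_disabled(unhardened))
```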