NNMi discovers a Cisco device multiple times

This drove me absolutely crazy and took quite some time to figure out.

We know that NNMi autodiscovery is clever enough to determine whether a node it’s found on one address is already managed in the topology via another address.

What this doesn’t account for, however, is that devices with multiple VRFs can be discovered multiple times IF the SNMP agent returns a different hostname for each polled IP address in the different VRFs. Because of this it’s important to have the same domain name configured within each VRF.

ip domain-name yourdomain.com
 
vrf context management
  ip domain-name yourdomain.com

If, for example, the domain name isn’t configured within the VRF contexts, the agent will reply with just the short hostname (eg: yournode rather than yournode.yourdomain.com), and NNMi will then decide that it’s a different node altogether. This means you get a messy map and a headache.
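
A quick way to spot the mismatch before NNMi does is to ask the agent for its sysName on each management address and compare the answers. The following is only a sketch: the function names and the "address sysName" input format are made up for illustration, and the snmpget loop in the comment assumes net-snmp is installed and uses placeholder addresses and a placeholder community string.

```shell
#!/bin/sh
# Reads "address sysName" pairs on stdin and prints each distinct sysName once.
# More than one line of output means the agent answers with different names,
# which is the condition that leads to duplicate nodes.
distinct_sysnames() {
    awk '{ print $2 }' | LC_ALL=C sort -u
}

# Returns 0 if every address reported the same sysName, 1 otherwise.
sysnames_consistent() {
    count=$(distinct_sysnames | awk 'END { print NR }')
    [ "$count" -le 1 ]
}

# Gathering the input with net-snmp (placeholder addresses and community):
#   for ip in 10.0.0.1 192.168.0.1; do
#       printf '%s %s\n' "$ip" \
#           "$(snmpget -v2c -c public -Oqv "$ip" SNMPv2-MIB::sysName.0)"
#   done | sysnames_consistent || echo "sysName differs between addresses"
```

If the check fails, the VRF that returns the short name is the one missing its ip domain-name line.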

This is probably just as valid with other network management tools that use an autodiscovery mechanism.

The other fix, of course, is to lock down SNMP access solely to a single target address (eg: a loopback) but you may find this hampers functionality if you allow your management system to try different interfaces/addresses in the case of it becoming unreachable. It’s more common to lock down by source management subnets rather than target IMO.

CentOS 6.3 NNMi Installation for the Lazy

Can’t be bothered to RTFM? Here is a quickstart guide to installing NNMi9 on CentOS 6.3 (it’s essentially the same procedure as on RHEL). This is handy for getting a test instance up and running quickly. Obviously, for a full-blown production environment, you should always read the deployment guide and release notes.

Install Base CentOS build with /var/opt/OV and /opt/OV filesystems and hardware sized to the requirements of your environment (see below).

Nodes   /opt/OV   /var/opt/OV  CPUs  RAM  -Xmx_value
250     3GB       10GB         2     4GB  -Xmx2g
250-3K  3GB       30GB         10    8GB  -Xmx4g
3K-8K   3GB       40GB         25    16GB -Xmx8g
8K-18K  3GB       60GB         40    24GB -Xmx12g
18K-25K 3GB       80GB         40    32GB -Xmx16g
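
If you want the heap figure from this table in a script (for example, to sanity-check a test box), a lookup along these lines would do. It is a sketch only: the function name is invented here, and treating each tier's upper bound as inclusive is my assumption, not something the sizing table states.

```shell
#!/bin/sh
# Prints the suggested ovjboss -Xmx value for a given node count, following
# the sizing table. Upper bounds are treated as inclusive (an assumption).
nnmi_xmx_for_nodes() {
    nodes=$1
    if   [ "$nodes" -le 250 ];   then printf '%s\n' "-Xmx2g"
    elif [ "$nodes" -le 3000 ];  then printf '%s\n' "-Xmx4g"
    elif [ "$nodes" -le 8000 ];  then printf '%s\n' "-Xmx8g"
    elif [ "$nodes" -le 18000 ]; then printf '%s\n' "-Xmx12g"
    elif [ "$nodes" -le 25000 ]; then printf '%s\n' "-Xmx16g"
    else
        # Anything larger is outside the table, so refuse to guess.
        echo "more than 25000 nodes: outside the sizing table" >&2
        return 1
    fi
}

# Usage: nnmi_xmx_for_nodes 862
```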

Download the HP Public Keys and import them.

rpm --import hpPublicKey.pub
rpm --import hpPublicKey2048.pub

Install required packages and disable iptables for the most pain-free installation (you’re installing this in a trusted environment, right?)

yum install compat-libstdc++-33.i686
yum install compat-libstdc++-33.x86_64
yum install glibc.x86_64
yum install libXi.i686
yum install xorg-x11-apps-7.4-10.el6.x86_64
yum install xorg-x11-server-Xorg.x86_64
yum install libX11.i686
yum install libX11.x86_64
yum groupinstall "X Window System"

service iptables stop
chkconfig iptables off

Edit /etc/sysctl.conf and adjust/add parameters according to the HP Deployment Documentation.

vi /etc/sysctl.conf

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

# NNMi settings for UDP receive and send buffer sizes
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608

:wq!

/sbin/sysctl -p
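
After reloading, it is worth confirming that the running kernel actually picked up the values. A minimal sketch, assuming the values from the excerpt above are treated as minimums; the helper names are hypothetical, and the comparison goes through awk so that very large kernel values do not overflow the shell's integer test.

```shell
#!/bin/sh
# Returns 0 if ACTUAL is greater than or equal to WANTED.
# Usage: value_at_least WANTED ACTUAL
value_at_least() {
    awk -v w="$1" -v a="$2" 'BEGIN { exit !(a + 0 >= w + 0) }'
}

# Compares the running value of one sysctl key against a wanted minimum.
# Usage: check_sysctl KEY WANTED
check_sysctl() {
    key=$1
    wanted=$2
    actual=$(/sbin/sysctl -n "$key") || return 2
    if value_at_least "$wanted" "$actual"; then
        printf 'OK   %s = %s\n' "$key" "$actual"
    else
        printf 'LOW  %s = %s (wanted %s)\n' "$key" "$actual" "$wanted"
        return 1
    fi
}

# Usage, with the values from the sysctl.conf excerpt:
#   check_sysctl kernel.shmmax 68719476736
#   check_sysctl kernel.shmall 4294967296
#   check_sysctl net.core.rmem_max 8388608
#   check_sysctl net.core.wmem_max 8388608
```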

Update /etc/security/limits.conf as follows:

vi /etc/security/limits.conf

*       soft    nofile  4096
*       hard    nofile  4096

Install NNMi from an X session. This is highly annoying but easy enough: you should be able to tunnel X11 over SSH via PuTTY’s connection properties. You will first need to allow X11 forwarding (X11Forwarding yes) in /etc/ssh/sshd_config and restart sshd.

If you only have sudo access to root (common in enterprise environments), then you will find that this is a show-stopper. You need to do the following (don’t blindly copy and paste, you need to copy your own xauth information):

[user@nnmiserver ~] xauth list
nnmi.localdomain.com/unix:10  MIT-MAGIC-COOKIE-1  58682a5bb5a4f731ae15c186ff3d68f8
[user@nnmiserver ~] sudo su -
[root@nnmiserver ~] xauth add nnmi.localdomain.com/unix:10  MIT-MAGIC-COOKIE-1  58682a5bb5a4f731ae15c186ff3d68f8

Avoid using ReflectionX on Windows 7, as this has a habit of crashing/hanging during installation; this caused massive headaches and took a while to figure out! Xming/Exceed are fine.

Assuming you’ve downloaded the iso, mount it as follows:

mkdir /nnminstall
mount -o loop -t iso9660 Software,_NNM_i_Linux_9.20_Eng_TB768-15004.iso /nnminstall
cd /nnminstall
./setup.bin

Once installed and running, tune the ovjboss Xmx value as per the table at the top of this article.

ovstop -c ovjboss
vi /var/opt/OV/shared/nnm/conf/props/ovjboss.jvmargs

# -Xms: Initial Java Heap Size
# -Xmx: Maximum Java Heap Size
# -Xss: Java stack size (default to OS-supplied value)
#
-Xms128m
-Xmx2048m
#-Xss128m

:wq!

ovstart -c ovjboss
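
If you would rather not hand-edit the file, the same change can be scripted. This is a sketch under assumptions: set_xmx is a name I have made up, it only rewrites a line that starts with -Xmx (plain ASCII hyphen), and you should try it on a copy of ovjboss.jvmargs first. Stop ovjboss before and start it afterwards, exactly as in the manual steps.

```shell
#!/bin/sh
# Rewrites the active (uncommented) -Xmx line in a jvmargs-style file.
# Usage: set_xmx FILE VALUE    e.g. set_xmx ovjboss.jvmargs 4g
set_xmx() {
    file=$1
    value=$2
    tmp="$file.tmp.$$"
    # Only lines beginning with -Xmx are changed; commented lines are kept.
    sed "s/^-Xmx.*/-Xmx$value/" "$file" > "$tmp" || return 1
    # Write back through the original file so its owner and mode are preserved.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage:
#   cp /var/opt/OV/shared/nnm/conf/props/ovjboss.jvmargs /tmp/ovjboss.jvmargs
#   set_xmx /tmp/ovjboss.jvmargs 4g
```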

From here, download and install the latest patch RPMs to bring it up to the current version. :)

NNMi and Firewall Connections Monitoring

One thing that is sometimes overlooked on firewalls is the connection count. Badly written applications or incorrect firewall configurations can mean that the connections table becomes saturated, causing disconnections, connection failures and a myriad of other problems. This can result in people running tests, seeing packet loss, and concluding that there must be a duplex mismatch, erroring link, or something else fundamental along the path.

On a side note, one of the painful situations that causes this on Checkpoint is someone adjusting the TCP timeout value in global properties to something way above the default. TCP timeouts on Checkpoint should ALWAYS be set at the service object level, NOT globally.

We can monitor the connection tables on firewalls via NNMi and generate alerts and also affect the node status (colour) on node maps to help us find these problems.

This article will cover Nokia Checkpoint and Cisco ASA Firewalls as an example, but it can easily be replicated for any firewall by using a different OID for the number of concurrent connections.

Firstly, ensure that you have loaded the MIBs for your firewalls (CHECKPOINT-MIB for Nokias and/or the CISCO-FIREWALL-TC and CISCO-UNIFIED-FIREWALL-MIB for ASA)

Secondly, for all Nokia Checkpoint firewalls, you MUST run cpconfig, and enable the SNMP extension. Bear in mind that you will need to restart Checkpoint services with cprestart, which is disruptive!

Now, we must create Node Groups for monitoring our firewalls. I suggest that you create node groups that define members by SysOID, as each model/configuration of firewall will have a different limit to the maximum amount of concurrent connections allowed. Low, Mid and Top-end groups are a good idea so you can define a reasonably granular threshold for each group. You should consult the vendor documents to decide on what is appropriate for your environment.

The following SysOIDs may be useful:

nokiaIP110      .1.3.6.1.4.1.94.1.21.2.1.10
nokiaIP1220     .1.3.6.1.4.1.94.1.21.2.1.139
nokiaIP1260     .1.3.6.1.4.1.94.1.21.2.1.140
nokiaIP1280     .1.3.6.1.4.1.94.1.21.2.1.149
nokiaIP150      .1.3.6.1.4.1.94.1.21.2.1.151
nokiaIP2255     .1.3.6.1.4.1.94.1.21.2.1.143
nokiaIP2450     .1.3.6.1.4.1.94.1.21.2.1.148
nokiaIP260      .1.3.6.1.4.1.94.1.21.2.1.141
nokiaIP266      .1.3.6.1.4.1.94.1.21.2.1.142
nokiaIP290      .1.3.6.1.4.1.94.1.21.2.1.146
nokiaIP3400     .1.3.6.1.4.1.94.1.21.2.1.13
nokiaIP350      .1.3.6.1.4.1.94.1.21.2.1.138
nokiaIP380      .1.3.6.1.4.1.94.1.21.2.1.137
nokiaIP390      .1.3.6.1.4.1.94.1.21.2.1.144
nokiaIP3XX      .1.3.6.1.4.1.94.1.21.2.1.9
nokiaIP400      .1.3.6.1.4.1.94.1.21.2.1.3
nokiaIP410      .1.3.6.1.4.1.94.1.21.2.1.4
nokiaIP440      .1.3.6.1.4.1.94.1.21.2.1.5
nokiaIP4XX      .1.3.6.1.4.1.94.1.21.2.1.2
nokiaIP530      .1.3.6.1.4.1.94.1.21.2.1.11
nokiaIP560      .1.3.6.1.4.1.94.1.21.2.1.145
nokiaIP600      .1.3.6.1.4.1.94.1.21.2.1.7
nokiaIP650      .1.3.6.1.4.1.94.1.21.2.1.8
nokiaIP690      .1.3.6.1.4.1.94.1.21.2.1.147
nokiaIP6XX      .1.3.6.1.4.1.94.1.21.2.1.6
nokiaIP710      .1.3.6.1.4.1.94.1.21.2.1.15
nokiaIP740      .1.3.6.1.4.1.94.1.21.2.1.12
ASA5505         .1.3.6.1.4.1.9.1.745
ASA5510         .1.3.6.1.4.1.9.1.699
ASA5520         .1.3.6.1.4.1.9.1.670
ASA5540         .1.3.6.1.4.1.9.1.672
ASA5550         .1.3.6.1.4.1.9.1.753

Now, we create a MIB expression for the OID we want to monitor.


Clicking on the right hand side of MIB Variable lets us drill down the MIB tree to the OID we want. In this case, it’s 1.3.6.1.4.1.2620.1.1.25.3 (fwNumConn). For ASAs, you want 1.3.6.1.4.1.9.9.491.1.1.1.6.

This gives us the following:

We now create a Custom Poller Policy (The 25000 here suggests devices that have a maximum of 25K connections limit, but you could call it anything you like, such as “Low-End-CP-Firewalls-25K-MAX”). From here, we create a new collection policy (see right hand side of image below). We select “Generate incident” on Node collection to generate incidents when the threshold is breached, and we also select “Affect Node Status”, since connection count being over threshold is going to impact performance.

If using NPS, export the collection. You may also prefer to change the “Incident Source Object” to “Custom Node Collection” rather than “Custom Polled Instance”, as instances tend to work better where there are multiple objects within the same OID, such as BGP peerings.

And then define a threshold… This should be a bit below the maximum amount of supported connections for the group of devices you are monitoring. Eg: for a 25K connections device, select 20000.
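
The arithmetic behind that figure is simple enough to script when you have several node groups to work through. A sketch only: the 80 percent default is inferred from the 25K to 20000 example and is not a vendor recommendation, and both function names are hypothetical.

```shell
#!/bin/sh
# Prints a high threshold as a percentage of the device's connection limit.
# Usage: conn_threshold MAX_CONNECTIONS [PERCENT]   (PERCENT defaults to 80)
conn_threshold() {
    max=$1
    pct=${2:-80}
    echo $(( max * pct / 100 ))
}

# Prints BREACH if a polled connection count has reached the threshold,
# otherwise OK.
# Usage: conn_status CURRENT MAX_CONNECTIONS
conn_status() {
    current=$1
    max=$2
    if [ "$current" -ge "$(conn_threshold "$max")" ]; then
        echo "BREACH"
    else
        echo "OK"
    fi
}

# Usage:
#   conn_threshold 25000        # low-end group
#   conn_status 21000 25000
```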

Now, back in the Custom Poller Policy form, we can assign our Node Group, eg: Checkpoint_LowEnd_Firewalls (a node group that includes IP260/IP290 and selects nodes by SysOIDs).

Once all forms are saved, we can verify this is working by navigating to Monitoring > Custom Polled Instances.

This configuration means that firewalls that are added to the topology will automatically fall into the correct node groups and alerting thresholds. It also means that the map will change when the threshold is breached. Split the node groups down as much as you want, but bear in mind that you will have to create a new polling policy/collection+threshold for each group.

It should also be noted that in NNMi, if you adjust a threshold, the collection policy will be suspended and you will have to re-enable it. Don’t let this catch you out!

You may find that keeping an eye on this particular aspect of your firewalls may save you some real headaches later on.

Monitoring NNMi9 Node Inventory Changes

NNMi9 is a fantastic product and works very well in automatic discovery mode. However, it can be quite difficult to keep track of what has changed in the topology from day to day.

For this reason, I thought it’d be a good idea to create a simple script that summarised the changes in the form of nodes added and removed from the topology since the last invocation of the script.

Firstly, it’s good practice to run nnmsetcmduserpw.ovpl while logged in as your normal user account to avoid having to store the admin username and password in plain text, so do this first. Next, I set up a cron job:

# crontab -e

   0 8 * * * /home/networkadmin/nnmi-changes.sh >/dev/null 2>&1

The cron job references this script:

#!/bin/sh
#
#     File: nnmi-changes.sh
#  Version: 1.1
# Function: Check for changes in NNMI Node Inventory
#   Author: sol@subnetzero.org
#
# For security reasons, you should make sure you have
# used nnmsetcmduserpw.ovpl to store the admin user/pass
# in encrypted form in your home dir.
#
# If for some reason you don't want to do this, you must
# uncomment and set the NNMUSER and NNMPASS vars below.
#
# This script can be run as a cron job if required.
#
# Set some vars
BASE_DIR=/home/networkadmin
MAILADDR="nnmi-notifications@yourdomain.com"
SUBJECT="NNMi Node Topology Changes"
MAIL=1
#NNMUSER=nnmadminuser
#NNMPASS=nnmpass

# Check we can write to dir
if [ ! -w $BASE_DIR ]; then
   printf "Unable to write to $BASE_DIR\n"
   exit 1
fi

# Put list of Node shortnames into a temp file
# We have to run sort with LC_COLLATE=C as sort behaves
# differently under cron, so running the script from
# command line creates false alerts.
if [ "$NNMUSER" ]; then
   /opt/OV/bin/nnmtopodump.ovpl -u $NNMUSER -p $NNMPASS -type node |\
    grep "shortname" |\
    sed -e 's/<[^>]*>//g' -e 's/ //g' |\
    LC_COLLATE=C sort -d > $BASE_DIR/nodelist.tmp
else
   /opt/OV/bin/nnmtopodump.ovpl -type node |\
    grep "shortname" |\
    sed -e 's/<[^>]*>//g' -e 's/ //g' |\
    LC_COLLATE=C sort -d > $BASE_DIR/nodelist.tmp
fi

# Check for previous nodelist file, then diff the
# current output against the last one, grep for
# changes and do a simple substitution to make
# things more obvious.
#
if [ -f $BASE_DIR/nodelist ]; then
   CHANGELIST=`comm -3 $BASE_DIR/nodelist.tmp $BASE_DIR/nodelist |\
                sed  -e 's/^\([A-Za-z0-9]\)/Node Added: \1/g' \
                     -e 's/\t/Node Removed: /g'`
   NODECOUNT=`wc -l $BASE_DIR/nodelist.tmp | awk '{print $1}'`

   # See if there are any changes.
   if [ "$CHANGELIST" ]; then
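      # Pass the data as arguments so node names are never
      # interpreted as a printf format string.
      printf '%s\n\nTotal Nodes: %s\n' "$CHANGELIST" "$NODECOUNT"

      # Send a mail if required
      if [ "$MAIL" -eq "1" ]; then
         printf '%s\n\nTotal Nodes: %s\n' "$CHANGELIST" "$NODECOUNT" |\
          /bin/mailx -s "$SUBJECT" "$MAILADDR"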
      fi
   else
      printf "No changes in topology.\n\nTotal Nodes: $NODECOUNT\n"

      # Send a mail if required
      if [ "$MAIL" -eq "1" ]; then
         printf "No changes in topology.\n\nTotal Node count: $NODECOUNT\n" |\
          /bin/mailx -s "$SUBJECT" "$MAILADDR"
      fi
   fi

   # Now checks are finished, the temp file can
   # become the reference file for the next check.
   mv $BASE_DIR/nodelist.tmp $BASE_DIR/nodelist
else
   printf "First time run. Nothing to report\n"
   mv $BASE_DIR/nodelist.tmp $BASE_DIR/nodelist
fi

This will give you a mail in your inbox in the morning along these lines:

-----Original Message-----
From: nnmi@nnmiserver [mailto:nnmi@nnmiserver] 
Sent: 24 April 2013 08:01
To: nnmi-notifications@yourdomain.com
Subject: NNMi Topology Changes

Node Removed: TESTNODE1
Node Added: TESTNODE2
Node Added: TESTNODE3

Total Nodes: 862
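
The Added/Removed labels in that mail come from running comm -3 over two sorted lists. If you want to see the idea in isolation, here is a standalone sketch. It swaps the script's sed substitutions for awk (so it does not depend on sed understanding \t), and the function name diff_nodelists is made up for this example.

```shell
#!/bin/sh
# Prints "Node Added"/"Node Removed" lines for two sorted node lists.
# Usage: diff_nodelists NEW_LIST OLD_LIST
diff_nodelists() {
    # comm -3 drops lines common to both files. Lines found only in the
    # first file are printed as-is; lines found only in the second file
    # are prefixed with a tab.
    LC_ALL=C comm -3 "$1" "$2" |
        awk -F '\t' '{
            if ($1 != "") print "Node Added: " $1
            else          print "Node Removed: " $2
        }'
}

# Usage (both lists must already be sorted in the C locale):
#   diff_nodelists nodelist.tmp nodelist
```

Both inputs need to be sorted the same way comm expects, which is why the locale is pinned to C here.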

Note: After upgrade to 9.22, permissions on the /opt/OV/nonOV/perl/a/lib/site_perl/5.8.8 directory had been incorrectly set which caused the script to fail. This was fixed with:

chmod 555 /opt/OV/nonOV/perl/a/lib/site_perl/5.8.8