Do NOT install HP OMI agent v12 on NNMi 9 Servers!

As per the title, do NOT install or upgrade the OMI agent to v12 on NNMi boxes, as this will overwrite files in the /opt/OV/nonOV/perl directory. NNMi 9 relies on Perl 5.8.8, but Perl 5.16.0 files are installed when the OMI agent is installed.
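A quick way to check whether the bundled interpreter has already been replaced – the path below is an assumption based on the @INC entries in the error further down, so confirm it on your own install first:

/opt/OV/nonOV/perl/a/bin/perl -v | grep 'This is perl'
# A healthy NNMi 9 server should report v5.8.8; if this reports v5.16.0,
# the OMI v12 agent has already overwritten the bundled Perl.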

The first issues you're likely to see are NNMi self-monitoring alerts; after that you'll notice that you can't run any of the .ovpl commands any more, instead getting a message like this:

Can't locate OVNNMvars.pm in @INC (@INC contains: /opt/OV/nonOV/perl/a/lib/site_perl/5.16.0/x86_64-linux-thread-multi /opt/OV/nonOV/perl/a/lib/site_perl/5.16.0 /opt/OV/nonOV/perl/a/lib/5.16.0/x86_64-linux-thread-multi /opt/OV/nonOV/perl/a/lib/5.16.0 /opt/OV/nonOV/perl/a/lib/site_perl/5.16.0/x86_64-linux-thread-multi /opt/OV/nonOV/perl/a/lib/site_perl/5.16.0 /opt/OV/nonOV/perl/a/lib/site_perl .) at /opt/OV/bin/nnmversion.ovpl line 19.
 BEGIN failed--compilation aborted at /opt/OV/bin/nnmversion.ovpl line 19.

Getting back to a known state wasn’t fun, but we managed in the end by working out which files the agent had installed or modified and restoring them from backups. Avoid the pain by not getting into the situation in the first place – and make sure you’re backing up binaries, not just data files.
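As a cheap insurance policy, something like the following pre-install snapshot would have saved us a lot of time (a minimal sketch – adjust the paths and retention to suit your environment):

# Snapshot the NNMi binaries and bundled Perl before installing any other HP agent on the box
tar -czf /var/tmp/opt-OV-$(date +%Y%m%d).tar.gz /opt/OV

# Restoring /opt/OV/nonOV/perl from this archive is far quicker than
# reverse-engineering what an agent installer changed.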

I believe the latest version of NNMi 10 uses Perl 5.16.0 itself, so this shouldn’t be an issue there, but I’d personally still avoid installing any product that writes into the same file structure.

Effective Interface Monitoring with iSPI for Metrics

While iSPI for Metrics is useful for graphing, arguably the more critical use for it is in generating alerts when things are not working as expected.

I found today that Juniper does not include “FCS Lan Errors” in the ifInErrors counter, which made tracking down an issue take a bit longer than usual. This is what I’m talking about:

Interface: ge-1/1/0, Enabled, Link is Up
Encapsulation: Ethernet, Speed: 1000mbps
Traffic statistics:                                           Current delta
  Input bytes:            42120054608900 (6704304 bps)           [16457880]
  Output bytes:           33903229061248 (11640616 bps)          [38429868]
  Input packets:             60739200100 (3676 pps)                 [82119]
  Output packets:            51621757514 (5316 pps)                [125843]
Error statistics:
  Input errors:                        0                                [0]
  Input drops:                         0                                [0]
  Input framing errors:          1435325                            [12231]
  Policed discards:                    0                                [0]
  L3 incompletes:                      0                                [0]

For effective monitoring of critical WAN interfaces, create an interface group using, for example, a filter on interface name (or on ifAlias – the interface description – if you have a naming standard), and then attach a monitoring policy with the thresholds you want to that group. Alternatively you can use a node group based policy, bearing in mind you’ll then be alerted for all interfaces on the nodes in that group.
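As a rough illustration only (not taken from a real configuration, and the exact attribute names and operators depend on your NNMi version), an ifAlias-based interface group filter might look something like this:

Interface Group: Critical-WAN-Interfaces
  Filter: ifAlias like WAN*
  # Assumes interface descriptions follow a standard such as "WAN <carrier> <circuit id>";
  # the group is then referenced from your monitoring configuration with the desired thresholds.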

Example:
[Screenshot: interface threshold configuration]

You need to enable Interface Fault Polling and Interface Performance Polling for this to work, and you may also want to consider lowering the default 5-minute poll interval for critical infrastructure.

I tend to set the trigger count to 1 and the rearm count higher (at least 2), so that when the error condition clears the incident doesn’t just disappear before someone has had a chance to look at it.

As a side note, you’ll also want to set generic thresholds – e.g. for “Routers”, at least the following:

[Screenshot: generic router threshold configuration]