Recovered from the older tannerjc.net wiki snapshot dated January 23, 2016.

Info

Testing

-bash-3.2$ ssh -p 4545  nocpulse@10.0.0.180
Permission denied (publickey,keyboard-interactive).
-bash-3.2$
[root@satellite nocpulse]# ssh -p 4545 -i /var/lib/nocpulse/.ssh/nocpulse-identity nocpulse@10.0.0.180
Last login: Tue Sep 21 22:49:26 2010 from satellite.sat53.net
-bash-3.2$
[root@satellite nocpulse]# ssh -i /var/lib/nocpulse/.ssh/nocpulse-identity nocpulse@10.0.0.180

RHN Satellite kickstart on 2010-06-03

-bash-3.2$ exit
logout
Connection to 10.0.0.180 closed.
[root@satellite nocpulse]#
-bash-3.2$ ssh -i /var/lib/nocpulse/.ssh/nocpulse-identity nocpulse@10.0.0.180
Last login: Tue Sep 21 22:51:09 2010 from satellite.sat53.net

RHN Satellite kickstart on 2010-06-03

-bash-3.2$
-bash-3.2$ ssh -p 4545 -i /var/lib/nocpulse/.ssh/nocpulse-identity nocpulse@10.0.0.180
Last login: Tue Sep 21 22:51:14 2010 from satellite.sat53.net
-bash-3.2$
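The manual logins above can be collected into a small helper. This is a minimal sketch, not part of the Satellite tooling: the names rhnmd_ssh_cmd and rhnmd_check are made up for illustration, and the ssh options are the ones the scout reports using in its own probe error message (user nocpulse, port 4545, the nocpulse identity file, BatchMode).

```shell
#!/bin/sh
# Sketch only: rhnmd_ssh_cmd and rhnmd_check are hypothetical helper names.

IDENTITY=/var/lib/nocpulse/.ssh/nocpulse-identity

# Print the ssh command line for a client address and an optional port
# (defaults to 4545, the port used in the transcripts above).
rhnmd_ssh_cmd() {
    host=$1
    port=${2:-4545}
    echo "/usr/bin/ssh -l nocpulse -p $port -i $IDENTITY -o StrictHostKeyChecking=no -o BatchMode=yes $host"
}

# Run a no-op command on the client; the exit status of ssh shows
# whether key-based login succeeded.
rhnmd_check() {
    if $(rhnmd_ssh_cmd "$@") true; then
        echo "rhnmd login OK: $1"
    else
        echo "rhnmd login FAILED: $1" >&2
        return 1
    fi
}
```

For example, rhnmd_check 10.0.0.180 tests the rhnmd port, and rhnmd_check 10.0.0.180 22 tests the regular sshd port with the same key.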
[root@satellite nocpulse]# rpm -qa | fgrep -i noc
perl-NOCpulse-Object-1.26.10-1.el5sat
NOCpulsePlugins-2.208.6-7.el5sat
nocpulse-common-2.1.8-9.el5sat
perl-NOCpulse-PersistentConnection-1.5.4-1.el5sat
perl-NOCpulse-Utils-1.14.11-1.el5sat
perl-NOCpulse-OracleDB-1.28.12-3.el5sat
perl-NOCpulse-Scheduler-1.58.11-3.el5sat
perl-NOCpulse-SetID-1.6.11-1.el5sat
perl-NOCpulse-Gritch-1.27.4-1.el5sat
perl-NOCpulse-Probe-1.183.6-2.el5sat
perl-NOCpulse-CLAC-1.9.8-1.el5sat
perl-NOCpulse-ProcessPool-0.10.3-2.el5sat
nocpulse-db-perl-3.6.2-2.el5sat
perl-NOCpulse-Debug-1.23.15-1.el5sat
[root@satellite init.d]# file /usr/sbin/MonitoringScout
/usr/sbin/MonitoringScout: symbolic link to `/etc/rc.d/np.d/sysvStep'
[root@satellite init.d]# file /etc/rc.d/np.d/sysvStep
/etc/rc.d/np.d/sysvStep: perl script text executable
[root@satellite init.d]# head /etc/rc.d/np.d/sysvStep | egrep -v ^\#
use NOCpulse::Object;
use lib qw(/etc/rc.d/np.d);
use RootOnlyPlease;
use NOCpulse::Config;
use SysVStep;
[root@nagios log]# diff /etc/nocpulse/rhnmd_config /etc/nocpulse/rhnmd_config.orig
35c35
< LogLevel INFO
---
> LogLevel QUIET
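The diff shows that the only change on the client was raising LogLevel from QUIET to INFO in rhnmd_config, which is what makes the rhnmd logins visible in /var/log/messages. The same edit could be scripted roughly as follows. This is a sketch under assumptions: set_loglevel is a hypothetical helper name, GNU sed is assumed for the -i option, and the restart command in the comment is the usual RHEL 5 form rather than something taken from these notes.

```shell
# Sketch only: set_loglevel is a hypothetical helper name; assumes GNU sed (-i).
# Rewrite the active (uncommented) LogLevel line in an sshd-style config file.
set_loglevel() {
    file=$1
    level=$2
    sed -i "s/^[[:space:]]*LogLevel[[:space:]].*/LogLevel $level/" "$file"
}

# Example, on the monitored client (keep a copy of the original first):
#   cp -p /etc/nocpulse/rhnmd_config /etc/nocpulse/rhnmd_config.orig
#   set_loglevel /etc/nocpulse/rhnmd_config INFO
#   service rhnmd restart    # assumption: standard init script name
```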
  • Client
== /var/log/messages ==
Sep 21 23:14:28 nagios rhnmd[5539]: WARNING: /etc/ssh/moduli does not exist, using fixed modulus
Sep 21 23:14:28 nagios rhnmd[5539]: Accepted publickey for nocpulse from 10.0.0.5 port 52475 ssh2
Sep 21 23:14:36 nagios rhnmd[5546]: WARNING: /etc/ssh/moduli does not exist, using fixed modulus
Sep 21 23:14:36 nagios rhnmd[5546]: Accepted publickey for nocpulse from 10.0.0.5 port 52481 ssh2
  • Probe server - /var/log/nocpulse
== kernel.log ==
2010-09-21 19:16:40 Spawned 7
2010-09-21 19:16:40 Scheduler says we're up-to-date
2010-09-21 19:16:42 Reaped 7

== dequeue.log ==
2010-09-21 19:16:45 74: looking for queue entries
2010-09-21 19:16:45 74: sending entry: queuename=ts_dbversion=1.0mac=52:54:00:21:88:0Dsatcluster=f473e45010f2fn=batch_insertdata=1-7-pctused%091285111001%097%0A1-7-space_used%091285111001%0912%0A1-7-space_avail%091285111001%09167%0A
2010-09-21 19:16:45 74: 		Success

== kernel.log ==

== dequeue.log ==
2010-09-21 19:16:50 74: looking for queue entries

== kernel.log ==

== ack_handler.log ==
2010-09-21 19:16:51 waiting for data....

Troubleshooting

[root@satellite nocpulse]# su - nocpulse
-bash-3.2$ rhn-catalog
7 ServiceProbe on nagios (10.0.0.180      ): Linux: Disk Usage
8 ServiceProbe on nagios (10.0.0.180      ): Linux: Load
  • If no probes are listed, push the scout configs again at https://hostname/network/monitoring/scout/index.pxt?set_label=scout_list
-bash-3.2$ rhn-catalog --commandline --dump 7 | head -n 2
7 ServiceProbe on nagios (10.0.0.180      ): Linux: Disk Usage
      Run as: Unix::Disk.pm --sshhost=10.0.0.180       --sshbannerignore= --fs_0=/dev/hda1 --critical_avail= --critical=90 --critical_used= --timeout=15 --warn=75 --warn_avail= --warn_used= --shell=SSHRemoteCommandShell --sshuser=nocpulse --sshport=4545
-bash-3.2$ rhn-runprobe --probe 7
2010-09-21 22:46:01 	No items changed
2010-09-21 22:46:01 	Notification not required
2010-09-21 22:46:01 	NOTE: Running in test mode; no changes saved, nothing enqueued
2010-09-21 22:46:01
============================================================
OK: Filesystem /dev/hda1 (/boot): Filesystem pct used 7%; Space available 167 MB; Space used 12 MB
============================================================
-bash-3.2$ rhn-runprobe --probe 8
2010-09-21 22:46:15 	No items changed
2010-09-21 22:46:15 	Notification not required
2010-09-21 22:46:15 	NOTE: Running in test mode; no changes saved, nothing enqueued
2010-09-21 22:46:15
============================================================
OK: CPU load 1-min ave 0.00; CPU load 5-min ave 0.01; CPU load 15-min ave 0.00
============================================================
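To exercise every probe in the catalog rather than one at a time, the two commands above can be chained. A minimal sketch, assuming each rhn-catalog line starts with the numeric probe ID as in the listing above; probe_ids is a hypothetical helper name.

```shell
# Sketch only: probe_ids is a hypothetical helper name.
# Read rhn-catalog output on stdin and print the numeric probe ID
# found at the start of each line; other lines are ignored.
probe_ids() {
    awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Example (as the nocpulse user on the scout); runs each probe in test mode:
#   for id in $(rhn-catalog | probe_ids); do
#       rhn-runprobe --probe "$id"
#   done
```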
  • If the client does not have the RSA key in /var/lib/nocpulse/.ssh/authorized_keys, the probe fails with the following error:
-bash-3.2$ rhn-runprobe 9
2010-09-22 01:01:21 	Caught an error, status changing to UNKNOWN
2010-09-22 01:01:21 	NOTE: Running in test mode; no changes saved, nothing enqueued
2010-09-22 01:01:21
============================================================
UNKNOWN: The RHN Monitoring Daemon (RHNMD) is not responding: Permission denied (publickey,keyboard-interactive). Please make sure the daemon is running and the host is accessible from the monitoring scout. Command was: /usr/bin/ssh -l nocpulse -p 4545 -i /var/lib/nocpulse/.ssh/nocpulse-identity -o StrictHostKeyChecking=no -o BatchMode=yes 192.168.2.78 /bin/sh -s
============================================================
  • Try executing a probe in debug mode
-bash-3.2$ rhn-runprobe --debug=2 --probe 42
2011-01-18 14:14:53 	Items changed or removed:
2011-01-18 14:14:53 		pctused '0' is OK
2011-01-18 14:14:53 		NOCpulse::Probe::Shell::ConnectError '' is UNKNOWN
2011-01-18 14:14:53 	Notification not required
2011-01-18 14:14:53 	NOTE: Running in test mode; no changes saved, nothing enqueued
2011-01-18 14:14:53
============================================================
OK: CPU pct used 0%
============================================================
  • Active logs
[root@satellite nocpulse]# ps aux | fgrep -i nocpulse | awk '{print $15}' | sort | uniq | egrep -v ^\$
--hbfile=/var/lib/nocpulse/commands/heartbeat
--hbfile=/var/log/nocpulse/ack_handler.log
--hbfile=/var/log/nocpulse/dequeue.log
--hbfile=/var/log/nocpulse/generate_config.log
--hbfile=/var/log/nocpulse/kernel.log
--hbfile=/var/log/nocpulse/notif-escalator.log
--hbfile=/var/log/nocpulse/notifier.log
--hbfile=/var/log/nocpulse/notif-launcher.log
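The pipeline above depends on --hbfile= being the 15th whitespace-separated field of the ps output, which shifts if a process has a different number of arguments. A slightly more tolerant sketch checks every field instead; hbfiles is a hypothetical helper name.

```shell
# Sketch only: hbfiles is a hypothetical helper name.
# Read `ps` output on stdin and print each distinct argument that starts
# with --hbfile=, regardless of which column it appears in.
hbfiles() {
    awk '{ for (i = 1; i <= NF; i++) if (index($i, "--hbfile=") == 1) print $i }' | sort -u
}

# Example: ps aux | hbfiles
```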
  • Good runprobe example
-bash-3.2$ rhn-runprobe --live --probe 42
2011-01-18 14:20:01 	No items changed
2011-01-18 14:20:01 	Notification not required
2011-01-18 14:20:01
============================================================
OK: CPU pct used 1%
============================================================
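In every rhn-runprobe run shown in this section, the result sits on the line between the two rows of equals signs and begins with a status word (OK or UNKNOWN in these examples). A minimal sketch for pulling out just that word, for use in scripts; probe_status is a hypothetical helper name, and the layout is assumed to match the transcripts above.

```shell
# Sketch only: probe_status is a hypothetical helper name.
# Read rhn-runprobe output on stdin and print the status word from the
# first line that follows a row of "=" characters.
probe_status() {
    awk '
        /^=+$/ && !seen { seen = 1; next }
        seen && !printed { sub(/:.*/, ""); print; printed = 1 }
    '
}

# Example (2>&1 in case the report is written to standard error):
#   rhn-runprobe --probe 7 2>&1 | probe_status
```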

Database

SQL> select RECID from host;

     RECID
----------
1000010127
1000010128
1000010129
1000010130
1000010186
1000010187
1000010188
1000010206
1000010386
1000010406
1000010426

     RECID
----------
1000010447
1000010466
1000010467

14 rows selected.
SQL> select * from check_probe;

  PROBE_ID PROBE_TYPE	   HOST_ID SAT_CLUSTER_ID
---------- ------------ ---------- --------------
	 9 check	1000010406	       21
	10 check	1000010406	       21