I've been working on setting up nrpe to allow a nagios server to execute check commands on another server.
From looking at NRPE, it seems to be a simple TCP server, a bit like a telnet server: it receives a command, executes it on the host machine, and then sends the result back. NRPE also uses SSL encryption between the check_nrpe client on the nagios server and the nrpe daemon on the remote server, so the communication seems quite secure.
=================      =================
| Nagios Server |  ->  | Remote Server |
=================      =================
   check_nrpe      ->  nrpe server -> check_ping
So I've compiled nrpe from source to get the nrpe server daemon and the check_nrpe plugin.
As a note, I didn't compile everything on one server. I compiled nrpe on both machines: the remote box runs the nrpe server daemon that was built there, and the nagios server uses the check_nrpe binary that was built there, which I copied to the nagios plugin folder.
Building on each server, rather than copying binaries across, stops you getting compile and execute errors due to package differences between the two servers.
Once I got nrpe up and running, I tried to use the check_nrpe plugin from the nagios server to connect to the remote server and run a check_users command. It would fail with "NRPE: Unable to read output".
But if I sent a bogus command that wasn't defined in the nrpe.cfg file on the remote server, nrpe would come back saying the command didn't exist.
STRANGE indeed. That means NRPE is fine, the SSL is working, but something else is causing a problem.
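The reasoning above (which reply means which layer is broken) can be written down as a small lookup. This is only a rough sketch: diagnose_nrpe is a hypothetical helper name, the hints are my own reading of the symptoms, and the "not defined" and "SSL handshake" patterns assume the wording NRPE usually prints, so adjust them to what your version actually returns.

```shell
# Hypothetical helper: map a check_nrpe reply to the layer that is likely at fault.
# The match patterns are assumptions about NRPE's message wording.
diagnose_nrpe() {
    case "$1" in
        *"Unable to read output"*)
            # nrpe accepted the request but the plugin printed nothing
            echo "plugin: command is defined but produced no output (path, permissions, sudo)" ;;
        *"not defined"*)
            # nrpe is reachable, the command is missing from nrpe.cfg
            echo "config: command is not defined in nrpe.cfg" ;;
        *"SSL handshake"*)
            # the connection itself failed before any command ran
            echo "transport: SSL or allowed_hosts problem" ;;
        *)
            echo "unknown: look at the remote server logs" ;;
    esac
}
```

Calling it as diagnose_nrpe "NRPE: Unable to read output" prints the "plugin" hint, which is the case I was stuck in.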
In my /var/log/messages on the remote server I would find entries like this:
sudo(pam_unix)[15694]: authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=nagios
These entries would appear after I ran the check_nrpe plugin from the nagios server.
It turns out that nrpe on the remote host needs to execute the check commands as root. This makes sense, as the data the check_users plugin needs would only be available as root. This is where sudo gets involved.
So nrpe fires up a sudo command to run the nagios plugin to get, say, our check_users data.
The command would be something like:
sudo -u nagios /etc/nagios/plugins/./check_users -w 4 -c 8
If you were to go to the remote server and do the following:
# su nagios (become the nagios user)
# sudo -u nagios /etc/nagios/plugins/./check_users -w 4 -c 8
After the sudo, you would see it request a password. As nrpe can't enter passwords, this is what is causing the error. It invokes the command, but can't complete it, as sudo wants a password.
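To see the same failure the way nrpe sees it, without an interactive prompt getting in the way, sudo's -n (non-interactive) option makes it fail instead of asking for a password. Below is a minimal sketch built on that; can_run_without_password is a hypothetical helper, and the SUDO variable is only there as an assumption so the function can be tried with a stand-in for sudo.

```shell
# Hypothetical helper: report whether a command gets through sudo without a prompt.
# SUDO is overridable purely for experimenting; by default the real sudo is used.
SUDO="${SUDO:-sudo}"

can_run_without_password() {
    # -n makes sudo exit with an error rather than prompt for a password
    if "$SUDO" -n "$@" >/dev/null 2>&1; then
        echo "ok: no password prompt"
    else
        # a non-zero exit can also come from the plugin itself (WARNING/CRITICAL)
        echo "fail: sudo wants a password or the command returned non-zero"
    fi
}
```

Run as the nagios user, something like can_run_without_password -u nagios /etc/nagios/plugins/check_users -w 4 -c 8 should report "fail" before the sudoers change. Because nagios plugins return non-zero for WARNING and CRITICAL states, a "fail" afterwards is not proof of a password problem on its own.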
To solve this, you need to edit the /etc/sudoers file.
Here is a good doc explaining it in detail:
http://www.chinalinuxpub.com/doc/www.siliconvalleyccie.com/linux-hn/sudo.htm#_Toc32905575
The short of it is, I used visudo and added this to the sudoers file:
nagios ALL=(ALL) NOPASSWD: /usr/local/nagios/libexec/check_users
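sudoers rules are matched per command path, so you have to add an entry for each plugin file, and the path in the entry has to match the path nrpe actually invokes. With this done, nrpe will work correctly.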
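Since every plugin needs its own entry, typing them by hand gets tedious. Here is a rough sketch that prints one line per plugin in the same form as the entry above. sudoers_lines and PLUGIN_DIR are hypothetical names, and any plugin other than check_users is just an illustration, not something from my setup.

```shell
# Assumption: plugins live in the directory used in the sudoers entry above.
PLUGIN_DIR="/usr/local/nagios/libexec"

# Hypothetical helper: print one NOPASSWD sudoers line per plugin name given.
sudoers_lines() {
    for plugin in "$@"; do
        printf 'nagios ALL=(ALL) NOPASSWD: %s/%s\n' "$PLUGIN_DIR" "$plugin"
    done
}
```

For example, sudoers_lines check_users check_load prints two entries. I would still paste the result in through visudo rather than writing to /etc/sudoers directly, so that a syntax error gets caught before it locks sudo up.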
Cheers for now.
Labels: Linux, Nagios, NRPE