NSClient Error – Could not get data for
I started receiving this error in Nagios: “NSClient – Error: Could not get data for 10 perhaps we don’t collect data this far back?” I researched the issue, but the obvious solutions weren’t working.
When troubleshooting errors with NSClient, the easiest (and quickest) way to find the cause is to enable logging. You can do this by editing your nsc.ini file and uncommenting the file line in the log section:
;# LOG FILE
;# The file to print log statements to
file=NSC.log
Now, open services.msc and restart the NSClient service.
After you manually run another check (or wait until Nagios does it for you), you should get some valuable information in the NSC.log file. In mine, I kept seeing:
error:modules\CheckSystem\PDHCollector.cpp:264: Failed to get CPU value: \Processor(_total)\% Processor Time: Failed to get mutex : (
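If the log has grown large, a short script can pull out just the entries like the one above. This is a minimal Python sketch, not part of NSClient: the function name find_errors is made up for illustration, and the pattern is inferred from the single log line shown here, so it may need adjusting for other entries.

```python
import re

# Matches entries shaped like "error:<source file>:<line>: <message>".
# The shape is inferred from the one sample line in this post (an assumption).
ERROR_LINE = re.compile(r"error:(?P<source>.+?):(?P<line>\d+): (?P<message>.*)")


def find_errors(log_lines, keyword="PDHCollector"):
    """Return (source, line number, message) for error entries mentioning keyword."""
    hits = []
    for raw in log_lines:
        match = ERROR_LINE.search(raw)
        if match and keyword in raw:
            hits.append(
                (match["source"], int(match["line"]), match["message"].strip())
            )
    return hits
```

To use it, open the log and pass the file object in, for example find_errors(open("NSC.log")), then print each tuple it returns.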
I first verified that the CPU buffer size in the nsc.ini file was large enough. The recommendation is to go a step higher than your longest check. For example, if the longest CPU average you check is the 1-minute average, the recommendation is to set the buffer to 2m. However, setting this value too high could cause memory or performance issues on the box.
Currently, I monitor the 10-minute average, the 60-minute average and the 1440-minute average (or 1-day average). What was confusing was that it was failing on the very first query, the 10-minute average, which is the shortest of the three and should fit in the buffer easily.
When I tried viewing the CPU in the performance monitor built into Windows, it too failed, which turned out to be the “ah-ha” moment. I simply needed to rebuild the performance counters and restart the NSClient service again.
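The “one step higher” rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above: the function name suggest_buffer is hypothetical, and the unit suffixes (m, h, d, w) are my assumption about what nsc.ini accepts, so verify the result against the comments in your own nsc.ini.

```python
# Unit suffixes and their length in minutes, largest first.
# Assumption: these suffixes match what the buffer setting in nsc.ini accepts.
UNITS = [("w", 7 * 24 * 60), ("d", 24 * 60), ("h", 60), ("m", 1)]


def suggest_buffer(check_minutes):
    """Suggest a buffer one step above the longest averaging window.

    check_minutes is a list of positive whole numbers of minutes,
    one per CPU check configured in Nagios.
    """
    longest = max(check_minutes)
    if longest <= 0:
        raise ValueError("averaging windows must be positive")
    # Express the longest window in the largest unit that divides it evenly,
    # then add one of that unit.
    for suffix, size in UNITS:
        if longest % size == 0:
            return f"{longest // size + 1}{suffix}"
```

With checks of 10, 60 and 1440 minutes, the longest window is one day, so the sketch suggests 2d. A single 1-minute check gives 2m, matching the example above.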
Rebuilding the performance counters in Windows 2003 is pretty easy. The command is:
lodctr /R
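If you have several servers to fix, the rebuild and the service restart can be wrapped in one small script. The following Python sketch makes a few assumptions: the service name NSClientpp is a guess (check services.msc for the name on your box), the function names are hypothetical, and the commands have to run from an administrator prompt. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Assumption: the service is registered under this name; confirm in services.msc.
SERVICE_NAME = "NSClientpp"


def rebuild_commands(service=SERVICE_NAME):
    """Return the commands in order: rebuild counters, then restart the service."""
    return [
        ["lodctr", "/R"],
        ["net", "stop", service],
        ["net", "start", service],
    ]


def run_rebuild(service=SERVICE_NAME, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in rebuild_commands(service):
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first command that fails,
            # for example "net stop" on a service that is not running.
            subprocess.run(command, check=True)
```

Calling run_rebuild() shows what would be executed; run_rebuild(dry_run=False) actually runs the three commands.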
Microsoft offers a different approach to rebuilding the counters if you’re running Windows 2000. However, I ran the same command on a Win2k server and found that it resolved the problem there as well. My servers have now been reporting the correct CPU averages for the past 2 days. Before the rebuild, nothing I tried would get them to work, not even restarting the service. I’m still not sure what caused the corruption, but rebuilding the counters seems to be the solution.