Saturday, August 14, 2010

Zenoss Quirks

So I ran into two really obnoxious issues that took a considerable amount of digging to resolve:

1. Zenoss was marking my apache process monitor as failed/recovered randomly and often.
2. Zenoss would not let go of misapplied process monitors that had been picked up by overly liberal regexes. For example, it would include both of these:

tail -f /var/log/nginx/access.log
nginx: worker

in the nginx process monitor if I was tailing the log when I modeled the host. The problem was that after I killed the tail, the monitor kept alerting "Process Not Running" for it. FOREVER. Even if I deleted and recreated the process monitor, the host, the events, everything.
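
To illustrate how a liberal regex sweeps up the tail, here's a rough Python sketch. It assumes the pattern is applied as an unanchored search against the full command line. The two patterns are my guesses for illustration (a bare "nginx" and an anchored alternative), not the exact ones from my setup.

```python
import re

# The two command lines from the example above.
commands = [
    "tail -f /var/log/nginx/access.log",
    "nginx: worker",
]

# Assumed "overly liberal" pattern: matches anything that mentions nginx.
loose = re.compile(r"nginx")

# One possible tighter pattern: the command line must start with "nginx: ".
strict = re.compile(r"^nginx: ")

loose_hits = [c for c in commands if loose.search(c)]
strict_hits = [c for c in commands if strict.search(c)]

print("loose: ", loose_hits)   # includes the tail command
print("strict:", strict_hits)  # only the real nginx process
```

The anchored version ignores the tail because "nginx" only appears in the middle of that command line, inside the log path.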

In the first case, it turns out that a child process shows up as "apache defunct" while Apache is shutting it down, and Zenoss would occasionally pick this up as a live apache process. It would then mark it as "Down" after the process terminated. The solution was to make my regex more specific, so that it only matches the command line apache is actually started with:

apache2 \-k start
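
A quick way to sanity-check a regex like this before putting it into Zenoss is to run it against a few sample command lines in Python. This is only a sketch: the sample lines are made up to resemble typical ps output, the broad "apache" pattern is an assumption about what a too-loose regex looks like, and matching() is a hypothetical helper rather than anything from Zenoss.

```python
import re

def matching(pattern, command_lines):
    """Return the command lines that the pattern matches anywhere."""
    return [line for line in command_lines if re.search(pattern, line)]

# Illustrative samples: a normal apache process and a defunct child.
sample = [
    "/usr/sbin/apache2 -k start",
    "[apache2] <defunct>",
]

broad = matching(r"apache", sample)              # assumed loose pattern
narrow = matching(r"apache2 \-k start", sample)  # the regex shown above

print("broad: ", broad)   # picks up the defunct child as well
print("narrow:", narrow)  # only the real apache process
```

The defunct entry never carries the "-k start" arguments, which is why the more specific regex skips it.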

The second case was much more obnoxious because not only would the events not clear, they would return and begin alerting every time the device was recreated.

After some digging online, it seemed that the best course was to restart the zenprocess daemon.

This is best done under Settings > Daemons. You can also view the logs there (which showed the bad checks prior to the restart and nothing after).

When that's complete, re-add your device and you should be rid of the baggage.