monitoring

The golden rule of monitoring: don't check for failures. This seems obvious once you're used to it: always define what you consider to be the working state for a service, and treat any other state as a failure, never the other way around.
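
As a minimal sketch of the difference, assume a hypothetical service that reports its state as a plain string. The state names and both function names below are illustrative assumptions, not taken from any real monitoring tool. The first check lists the failures someone thought of in advance; the second lists only the working state and treats everything else as a failure.

```python
# Hypothetical: the only state we consider "working" for this service.
WORKING_STATES = {"running"}

# Hypothetical: the failure states someone happened to think of in advance.
KNOWN_FAILURES = {"stopped", "crashed"}


def is_failure_by_known_failures(state: str) -> bool:
    """Fragile check: only flags states that were anticipated."""
    return state in KNOWN_FAILURES


def is_failure_by_working_state(state: str) -> bool:
    """Robust check: anything that is not a known working state is a failure."""
    return state not in WORKING_STATES


if __name__ == "__main__":
    # "zombie" stands in for a state nobody anticipated.
    for state in ["running", "crashed", "zombie"]:
        print(
            state,
            is_failure_by_known_failures(state),
            is_failure_by_working_state(state),
        )
```

With the unanticipated state, only the second check raises an alert. The first silently reports the service as healthy.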

A typical example would be parsing a log file for lines containing "ERROR:" when all other lines contain either "NOTICE:" or "WARNING:". Once new, unexpected lines containing "CRITICAL:" start appearing: