Tips and Tricks for monitoring Log Files with System Center Operations Manager

November 8, 2014

Last week I worked on monitoring a business-critical application. This application does not write events to the Windows event log, only to its own log files. System Center Operations Manager gives you several options for monitoring these log files, and this post collects some tips and tricks for doing so. There are some important considerations to take into account when monitoring log files with System Center Operations Manager. The following is taken from the Microsoft support site (http://support2.microsoft.com/kb/2691973):

When monitoring a log file, Operations Manager remembers the last line read within the file (a ‘high water mark’). It will not re-read data before this point unless the file is deleted and recreated, or renamed and recreated, which will reset the high water mark. If a logfile is deleted and recreated with the same name within the same minute, the high water mark will not be reset, and log entries will be ignored until the high water mark is exceeded. An implication of this is that log files that are cleared periodically without being renamed and recreated, or deleted and recreated, will not have entries in them processed until the high water mark from before the log is cleared is exceeded. Operations Manager cannot monitor ‘circular log files’ (i.e. log files that get to a certain size or line count, then start writing the newest entries at the beginning of the log) for the same reason. The log file must be deleted or renamed and then recreated, or the application configured to write to a new log once the current log is filled. Example:

  • 100 lines are written to logfile.txt
  • logfile.txt is cleared of all entries
  • log entries are written to logfile.txt (position 0 of the file)
  • None of the new entries will be processed until line 101 is written
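
The example above can be mimicked with a small Python model. This is only a toy sketch of a line-based high water mark, assuming the position is counted in lines as in the example; the class and method names are made up for illustration and say nothing about how Operations Manager implements it internally.

```python
class HighWaterMarkReader:
    """Toy model of a line-based high water mark (hypothetical, not SCOM code)."""

    def __init__(self):
        # Number of lines that have already been processed.
        self.high_water_mark = 0

    def poll(self, lines):
        """Return only the lines beyond the remembered position."""
        new_lines = lines[self.high_water_mark:]
        if new_lines:
            self.high_water_mark = len(lines)
        return new_lines


reader = HighWaterMarkReader()

# 100 lines are written to the log and processed.
log = [f"entry {i}" for i in range(1, 101)]
first_read = reader.poll(log)

# The file is cleared in place and 100 fresh entries are written.
log = [f"new entry {i}" for i in range(1, 101)]
after_clear = reader.poll(log)

# Only when line 101 is written does processing resume.
log.append("new entry 101")
after_line_101 = reader.poll(log)

print(len(first_read), len(after_clear), after_line_101)
```

In this model the 100 entries written after the clear are never seen, which is exactly why a cleared or circular log silently loses alerts.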

Each line of a log file must end with a new line (0x0D0x0A hex sequence) before it will be read and processed by Operations Manager.

If a rule or monitor is configured to match a pattern for log file names (e.g. using the ? or * wildcard characters), it is important that only ONE log that matches the pattern is written. If multiple logs that match the pattern are being written to, the high water mark is reset to the beginning of the file with each write to a different file. The result is that all previous log entries will be reprocessed. Example:

  • The log file name pattern is generic_csv??.txt
  • The current log is generic_csv01.txt and writes happen to this log.
  • A new log, generic_csv02.txt, is created. Writes occur to this log.
  • When the next line is written to generic_csv01.txt, Operations Manager will read from the beginning of generic_csv01.txt, not from the last point that was read from generic_csv01.txt. Lines previously processed will be processed again, possibly resulting in alerts or other actions (depending on the rule configuration).
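
The wildcard behaviour can be sketched the same way. The model below assumes a single remembered position per pattern that is reset whenever a write arrives for a different matching file, as the quoted text describes; `PatternWatcher` and `on_write` are hypothetical names used only for this illustration.

```python
import fnmatch


class PatternWatcher:
    """Toy model: one high water mark shared by every file matching a pattern."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.last_file = None  # file that received the previous write
        self.position = 0      # lines already processed in that file

    def on_write(self, name, lines):
        """Return the lines that would be processed after a write to `name`."""
        if not fnmatch.fnmatchcase(name, self.pattern):
            return []
        if name != self.last_file:
            # A write to a different matching file resets the position.
            self.last_file = name
            self.position = 0
        new_lines = lines[self.position:]
        self.position = len(lines)
        return new_lines


watcher = PatternWatcher("generic_csv??.txt")

log01 = ["line a", "line b"]
read_1 = watcher.on_write("generic_csv01.txt", log01)

log02 = ["line x"]
read_2 = watcher.on_write("generic_csv02.txt", log02)

# Back to the first file: everything in it is read again, not just "line c".
log01.append("line c")
read_3 = watcher.on_write("generic_csv01.txt", log01)

print(read_3)
```

The last read returns the two old lines together with the new one, so any alerts they raised earlier would be raised a second time.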

Another consideration is that you won’t get an alert when the log file you configured does not exist.

When monitoring log files you again have the choice between a rule and a monitor. If you want the result to affect the health state of your object, use a monitor; in all other cases, use a rule. When using a monitor, you preferably have a log entry that indicates a healthy state, so the object can automatically turn healthy again.

Where do I start?

You want to know which entries you can expect in a log file. Some entries will affect the functionality of the application, where others will not. I found that in many cases it’s difficult for application administrators to specify which errors and warnings can be written to an application log. If you are not sure about this, consult the supplier and get all the information you can. Another option is to read through old log files in search of errors and warnings. I got the tip to use Notepad++ for this: its Find in Files option gives you all the matching entries in the log files in a specific folder. You can also simply collect ALL the entries containing the word Error or Warning.

In this case I used regular expressions to filter the entries that should be picked up. The application I have been working on this week logged an entry: Error code = <0>. In this application that means everything is OK; all higher numbers are BAD. You could configure a regular expression like this:

[Screenshot: logfile mon regex]
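
A pattern along the following lines fits the description above. It is a sketch rather than necessarily the exact expression used in the monitor, and it assumes the entry is always written as Error code = <N>, where N is a whole number without leading zeros. The Python snippet is only there to try the pattern out; the sample lines are made up.

```python
import re

# Matches "Error code = <N>" for any N from 1 upwards, but not "<0>".
NONZERO_ERROR = re.compile(r"Error code = <[1-9][0-9]*>")

# Hypothetical sample log lines for illustration.
samples = [
    "2014-11-03 10:15:02 Job finished. Error code = <0>",
    "2014-11-03 10:20:41 Job finished. Error code = <3>",
    "2014-11-03 10:25:17 Job finished. Error code = <10>",
]

bad_entries = [line for line in samples if NONZERO_ERROR.search(line)]

for line in bad_entries:
    print(line)
```

The character class [1-9] as the first digit is what excludes <0> while still allowing codes such as <10>; it also avoids lookarounds, so the same expression should carry over to most regular expression flavours.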

You will find that using regular expressions gives you a certain flexibility (not only in log file monitoring). A nice tool I found this week is http://regexpal.com/. Here you can test your regular expressions so you don’t have to wonder whether you got the syntax right:

[Screenshot: Regex Tester]

You should create a class in Operations Manager for your application so that you can target the monitors and rules at it. A computer or group can also be targeted, but in my opinion this is not the way to go.

So how do I create these monitors and rules?

Setting up monitors and rules for log files can be done from the Operations Console. Go to Authoring, select Rules or Monitors and choose Create new. From here the different monitoring templates are available:

[Screenshot: createlogfilemonitor]

Maintenance mode: when the targeted object is in maintenance mode, the high water mark keeps running but no alerts will be raised.

One last tip: you can display the Params you create in the error description. This way you can quickly view the entry that raised the alert:

[Screenshot: createlogfilemonitor2]

I hope this post gives you a bit more insight into the possibilities. If any questions come up, please let me know. Let’s make it manageable!

5 Comments
  1. leo

    We have SCOM watching a log file on RH Linux servers for errors. When an error is logged in the file an alert is raised. If we put this alert into maintenance mode the alerts stop during that period. Our problem is that errors can still be recorded in the log file during the maintenance mode. Once SCOM is taken out of MM the high water mark starts the monitor back at the same spot it left off at so we get flooded with alerts that we were trying to prevent by putting the alert in Maintenance mode.

    Is the only way to avoid getting those alerts to have the underlying log file moved / deleted / renamed?

  2. Hi Marthijn

    Have you tried the FREE NiCE Log File Management Pack as well? This is a great solution that was developed for the Microsoft System Center community to improve log file monitoring.
    Please see link to the solution and feedback from users here – http://www.nice.de/log-file-monitoring-scom-nice-logfile-mp

    Thanks!

    • Hi,

      Yes, I have heard great things about this management pack but have not yet found the time to play around with it. I will recommend it, as fellow community members are enthusiastic about this pack.

      Thanks!

  3. Bryan

    Any suggestions for monitoring for the following (it is on two separate lines in my log file)? The date / time section will be unique as will the Error / Kernel and BEA numbers.

    <ExecuteRequest failed
    java.lang.OutOfMemoryError: getNewTla.

    Any suggestions / assistance is appreciated!

    Bry

    • Hi Bryan,

      This response is a bit late but maybe you can try the free Nice log file monitoring pack. It has extended possibilities which will probably help you.

      Best regards,

      Marthijn.
