Sample Script - Network Appliance snapshot monitor

As part of an HA Exchange 2003 rollout using iSCSI on Network Appliance filers, I needed a utility to confirm that our snapshots were running every few hours without issues. I couldn't find one, so I wrote a script to do it.

Here's a little background:
We're running Exchange 2003 on two-node MSCS clusters, with the back-end storage on a Network Appliance FAS-960c cluster over iSCSI. We have an R200 NearStore hooked up to Veritas NetBackup over NDMP. Exchange backups are taken as SnapManager for Exchange snapshots, SnapMirrored from the filers to the R200, and then written to tape. We SnapMirror every 3 hours during the business day.

The Goal:
If the snapshots on the NearStore are ever older than 4 hours, we need to be alerted. SnapManager for Exchange will alert us if the backup process fails between Exchange and the front end filer, but it doesn't (currently) monitor that the snapshot was successfully snapmirrored all the way to the NearStore.

Requirements:
1. This script connects to each filer and the NearStore over RSH, runs snapmirror status, and checks whether any lag time is four hours or more. To run it, you'll have to configure each filer to accept RSH connections from the host and account running the script.

2. Change the SMTP alerting variables for your environment. This script is written to use bmail, a freeware SMTP command line utility, but can easily be modified to use postie or blat.

3. To change the alarm threshold from 4 hours, edit the two script lines whose comments refer to the lag time: the GEQ 04 comparison and the find /V filter that drops lag times of 00 through 03 hours.
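The core check in requirements 1 and 3 can be sketched in Python as a cross-check of the batch logic. This is a minimal sketch, assuming `snapmirror status` prints one relationship per row as `Source Destination State Lag Status` with the lag as `hh:mm:ss` (the layout the batch script's token parsing implies); any lag field not in that form is skipped. `parse_lag_hours`, `lagging_relationships`, and `LAG_THRESHOLD_HOURS` are hypothetical names.

```python
"""Sketch: flag SnapMirror relationships whose lag meets or exceeds a threshold.

Assumed `snapmirror status` row layout (inferred from the batch script's
parsing, not from NetApp documentation):
    Source  Destination  State  Lag  Status
with the lag as hh:mm:ss.
"""
from typing import List, Optional

LAG_THRESHOLD_HOURS = 4  # hypothetical name; mirrors the script's "GEQ 04" test


def parse_lag_hours(lag: str) -> Optional[int]:
    """Return the hours part of an hh:mm:ss lag, or None if it isn't in that form."""
    parts = lag.split(":")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return None
    return int(parts[0])  # int("08") == 8, so zero-padded hours compare numerically


def lagging_relationships(status_output: str,
                          threshold: int = LAG_THRESHOLD_HOURS) -> List[str]:
    """Return every data row whose lag hours are >= threshold, not just the first."""
    flagged = []
    for line in status_output.splitlines():
        fields = line.split()
        # Data rows have a colon in the source field (filer:volume); the header does not.
        if len(fields) < 4 or ":" not in fields[0]:
            continue
        hours = parse_lag_hours(fields[3])
        if hours is not None and hours >= threshold:
            flagged.append(line.strip())
    return flagged


sample = "\n".join([
    "Snapmirror is on.",
    "Source Destination State Lag Status",
    "filer1:exch_db1 nearstore1:exch_db1 Snapmirrored 02:10:11 Idle",
    "filer2:exch_log1 nearstore1:exch_log1 Snapmirrored 05:42:03 Idle",
])
print(lagging_relationships(sample))
# ['filer2:exch_log1 nearstore1:exch_log1 Snapmirrored 05:42:03 Idle']
```

Converting the hours with `int()` avoids depending on how cmd's `IF ... GEQ` treats zero-padded values such as `08`, and returning a list makes it easy to report every lagging relationship rather than a single flag.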


Deployment:

Copy the script and save it as a .cmd file. Edit the configuration section, then schedule the script as a scheduled task. Make sure you've added the host and account to the RSH access table on all filers and on your NearStore.


The Script:

Cut and paste the following code:


::REM -------------------------------- Begin Sample Batch Script ----------------------------------------
::REM Network Appliance Snapmirror Monitor script "SnapMonitor"
::REM by John D. Seaman, www.japan-page.net/batch
::REM (C) 2006 John D. Seaman, copylefted under terms of the GNU/GPL


@echo off
cls


::REM About this script...

echo.
echo Network Appliance SnapMirror Monitor v.2005.6.15 by JDS
echo.

echo.
echo This batch script will confirm that the lag time on SnapMirrors
echo on the R200 is not older than 4 hours. Anything older indicates
echo a problem with the SnapMirror process.
echo.
echo Due to a limitation with looping logic in batch files, this script
echo will alert and report only 1 instance of a lagging snapmirror. If
echo multiple lagging snapmirrors exist you'll notice this in the log file
echo anyway.
echo.
echo This script must run against both filers and the near store to
echo produce a complete report.
echo.
echo.
echo.


::REM ---------- Begin User Configuration ---------------------

::REM Each front-end filer and the NearStore must appear on its own line in _filers.ini. I prefer to
::REM generate this on the fly because people have a bad habit of deleting static .ini files...

::REM Generate filer list...

echo filer1>_filers.ini
echo filer2>>_filers.ini
echo nearstore1>>_filers.ini


::REM Set SMTP alert variables

set _sbj="Error: SnapMirror replication problem detected"
set _msg="A SnapMirror error was detected. Examine the log file for more info."
set _hst=smtp.yourdomain.com
set _frm=%computername%@yourdomain.com
set _too=admin@yourdomain.com


::REM ---------- End User Configuration ---------------------

::REM Debug mode (0 deletes files, 1 keeps output files)

set _debug=0

::REM Create a log file, initialize error variable

set _log=snapmon.log
set _alert=0


echo.>%_log%
echo.>>%_log%
echo.>>%_log%
echo NetApp SnapMirror Monitor v.2005.6.15 by JDS >>%_log%
echo Generated by %computername% at %time% on %date%... >>%_log%
echo ------------------------------------------------------------------------------ >>%_log%
echo.>>%_log%
echo.>>%_log%

::REM Do it finally...

::REM Get the SnapMirror status output in a loop for all 3 NetApp devices...


::REM Process in a loop

for /f %%i in (_filers.ini) do call :checkfiler %%i

echo.
echo.>>%_log%

::REM Send out SMTP alert, if needed.

if /i %_alert% EQU 1 (
echo Alert condition detected, sending alert e.mail...
echo Alert condition detected, sending alert e.mail...>>%_log%
echo.
echo.>>%_log%

bmail -f %_frm% -s %_hst% -t %_too% -a %_sbj% -b %_msg% -m %_log% -d -h >smtp.log

echo Alert mail send call completed...
echo Alert mail send call completed... >>%_log%
)


echo Finished processing...
echo Finished processing...>>%_log%

::REM Cleanup

if /i not %_debug% EQU 1 del /q _*.txt

goto :EOF


::REM ---------------- F U N C T I O N S ----------------

:checkfiler

echo.
echo.>>%_log%

echo Now checking filer %1
echo ---------------------------------------------
echo Now checking filer %1 >>%_log%
echo --------------------------------------------->>%_log%

::REM Get the filer snapmirror status

rsh %1 snapmirror status >_%1out.txt

::REM Keep only lines containing ":" (the source/destination rows), dropping the header and status lines

type _%1out.txt | find ":" >_%1out2.txt


::REM Get the lag time field...

if exist _error.txt del /q _error.txt
if exist _snaplag.txt del /q _snaplag.txt
for /f "tokens=1,2,3,4,5*" %%i in (_%1out2.txt) do echo %%l >>_snaplag.txt

::REM Get the hours field of the lag time (before the first colon), flag anything >= 4

for /f "delims=:, tokens=1,2,3*" %%i in (_snaplag.txt) do (
echo SnapMirror lag time value is %%i...
if /i %%i GEQ 04 (set _alert=1)
)

echo Alert status for host %1 is %_alert%...
echo Alert status for host %1 is %_alert%...>>%_log%

::REM This line strips out any lag times of 03 hours or less. Change this if you change the lag time from "4" hours.

type _%1out2.txt | find /V " 00:" | find /V " 01:" | find /V " 02:" | find /V " 03:">>%_log%
del /q _%1out2.txt

echo.>>%_log%
echo.>>%_log%


goto :EOF


::REM ---------------- E N D ----- F U N C T I O N S ----------------


:EOF
::REM -------------------------------- End Sample Batch Script ----------------------------------------
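The script's own note says it alerts on only one lagging SnapMirror. As a hedged alternative, here's a Python sketch of the same flow that gathers every lagging relationship from every filer into a single alert mail, using the standard library's `subprocess`, `email.message.EmailMessage`, and `smtplib` in place of bmail. It assumes an `rsh` client is on the PATH and that the filers accept it, and it assumes the same row layout the batch script parses; `collect_lagging`, `build_alert`, `main`, and all host names and addresses are hypothetical placeholders.

```python
"""Sketch: poll every filer over rsh and build one alert mail that lists every
lagging relationship. Assumes an `rsh` client on PATH, filers that accept it,
and the row layout "Source Destination State Lag(hh:mm:ss) Status".
All host names and addresses below are placeholders."""
import smtplib
import subprocess
from email.message import EmailMessage
from typing import Callable, List

FILERS = ["filer1", "filer2", "nearstore1"]  # same hosts the script writes to _filers.ini
LAG_THRESHOLD_HOURS = 4


def rsh_snapmirror_status(filer: str) -> str:
    """Run `rsh <filer> snapmirror status`; raises CalledProcessError on failure."""
    result = subprocess.run(["rsh", filer, "snapmirror", "status"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def collect_lagging(filers: List[str], run: Callable[[str], str],
                    threshold: int = LAG_THRESHOLD_HOURS) -> List[str]:
    """Return "<filer>: <row>" for each row whose lag hours are >= threshold."""
    report = []
    for filer in filers:
        for row in run(filer).splitlines():
            fields = row.split()
            if len(fields) < 4 or ":" not in fields[0]:
                continue  # header/status lines, like the script's find ":" filter
            hours = fields[3].split(":")[0]
            if hours.isdigit() and int(hours) >= threshold:
                report.append(f"{filer}: {row.strip()}")
    return report


def build_alert(lagging: List[str], sender: str, recipient: str) -> EmailMessage:
    """Compose the alert; the subject reuses the script's _sbj text."""
    msg = EmailMessage()
    msg["Subject"] = "Error: SnapMirror replication problem detected"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Lagging SnapMirror relationships:\n" + "\n".join(lagging))
    return msg


def main() -> None:
    """Entry point for a scheduled task (placeholder SMTP host and addresses)."""
    lagging = collect_lagging(FILERS, rsh_snapmirror_status)
    if lagging:
        alert = build_alert(lagging, "monitor@yourdomain.com", "admin@yourdomain.com")
        with smtplib.SMTP("smtp.yourdomain.com") as smtp:
            smtp.send_message(alert)
```

The `run` parameter is injected so the parsing can be tried against captured `snapmirror status` output without touching a filer; a scheduled task would call `main()`. Note that an unreachable filer raises `CalledProcessError` instead of producing an alert, which you may prefer to catch and report as its own alert condition.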

Comments

You should look into RecoverGuard. It has the tests you're looking for and much more.
