Emailing Solaris FMA Alerts

Solaris Fault Management is a great feature, but it lacks basic reporting functionality on Solaris 10.

Here is a script I put together a couple of years ago that emails alerts as they occur.

It runs from cron and emails any alerts logged in the last X minutes.
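For readers who want to see the shape of such a job, here is a minimal sketch of the approach. It is not the original script. It assumes Python 3.7 or later is available on the host, that `fmdump -t 5m` is accepted as a relative time (my reading of fmdump(1M), so check it on your release), and that a local SMTP listener accepts mail. The recipient address, the `MINUTES` value, and the function names are all hypothetical.

```python
import re
import shutil
import smtplib
import socket
import subprocess
from email.message import EmailMessage

MINUTES = 5                    # assumption: should match the cron interval
RECIPIENT = "root@localhost"   # hypothetical recipient address
UUID_RE = re.compile(r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b")


def extract_uuids(fmdump_output):
    """Return the unique event UUIDs found in fmdump output, in order."""
    seen = []
    for uuid in UUID_RE.findall(fmdump_output):
        if uuid not in seen:
            seen.append(uuid)
    return seen


def build_message(host, minutes, summary, details, faulty):
    """Assemble the alert email from the collected command output."""
    msg = EmailMessage()
    msg["Subject"] = f"FMA events on {host}"
    msg["From"] = f"root@{host}"
    msg["To"] = RECIPIENT
    body = [
        f"Fault Management Events Discovered on {host} "
        f"in the last {minutes} minutes:",
        summary.strip(),
        "",
        *details,
        "fmadm faulty output:",
        "",
        faulty.strip(),
    ]
    msg.set_content("\n".join(body))
    return msg


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def main():
    # Assumed relative-time form: events logged in the last MINUTES minutes.
    summary = run(["fmdump", "-t", f"{MINUTES}m"])
    uuids = extract_uuids(summary)
    if not uuids:
        return  # no new events, so send no mail
    details = [run(["fmdump", "-v", "-u", uuid]) for uuid in uuids]
    faulty = run(["fmadm", "faulty"])
    msg = build_message(socket.getfqdn(), MINUTES, summary, details, faulty)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


# Only run on hosts that actually have the FMA tools installed.
if __name__ == "__main__" and shutil.which("fmdump"):
    main()
```

Keeping the window (`MINUTES`) equal to the cron interval avoids both gaps and duplicate mails. As far as I know, Solaris 10 cron does not accept the `*/5` step syntax, so a five-minute schedule would list the minutes explicitly, for example `0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/bin/fma_mail.py` (the path is hypothetical).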


Here is an example of the output:

Fri Dec 10 18:30:00 PST 2010

Fault Management Events Discovered on badserver.testdomain.com.com in the last 5 minutes:
Dec 10 18:25:42.7028 6f05831c-e318-6771-fafd-ecb888797fed SUN4V-8000-X2


TIME                 UUID                                 SUNW-MSG-ID
Dec 10 18:25:42.7028 6f05831c-e318-6771-fafd-ecb888797fed SUN4V-8000-X2
3%  fault.memory.datapath

Problem in: hc://:product-id=SUNW,T5140:chassis-id=BEL0824NQ1:server-id=badserver.testdomain.com.com:serial=e0332120/motherboard=0/chip=1/branch=0/dram-channel=0/dimm=0
Affects: mem:///unum=MB/CMP1/BR0/CH0/D0/J1800
FRU: hc://:serial=e0332120:part=36HTF51272F667E1D4/motherboard=0/chip=1/branch=0/dram-channel=0/dimm=0
Location: MB/CMP1/BR0/CH0/D0/J1800

93%  fault.memory.datapath

Problem in: hc://:product-id=SUNW,T5140:chassis-id=BEL0824NQ1:server-id=badserver.testdomain.com.com:serial=e2274b22/motherboard=0/chip=0/branch=1/dram-channel=1/dimm=0
Affects: mem:///unum=MB/CMP0/BR1/CH1/D0/J1100
FRU: hc://:serial=e2274b22:part=36HTF51272F677E1D4/motherboard=0/chip=0/branch=1/dram-channel=1/dimm=0
Location: MB/CMP0/BR1/CH1/D0/J1100

2%  fault.memory.datapath

Problem in: hc://:product-id=SUNW,T5140:chassis-id=BEL0824NQ1:server-id=badserver.testdomain.com.com:serial=e133d466/motherboard=0/chip=0/branch=0/dram-channel=1/dimm=0
Affects: mem:///unum=MB/CMP0/BR0/CH1/D0/J0700
FRU: hc://:serial=e133d466:part=36HTF51272F667E1D4/motherboard=0/chip=0/branch=0/dram-channel=1/dimm=0
Location: MB/CMP0/BR0/CH1/D0/J0700

Diagnose online at: http://www.sun.com/msg/SUN4V-8000-X2

fmadm faulty output:

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Dec 10 18:25:42 6f05831c-e318-6771-fafd-ecb888797fed  SUN4V-8000-X2  Major

Host        : badserver.testdomain.com.com
Platform    : SUNW,T5140    Chassis_id  :

Fault class : fault.memory.datapath max 93%
Affects     : mem:///unum=MB/CMP0/BR1/CH1/D0/J1100
faulted but still in service
FRU         : "MB/CMP0/BR1/CH1/D0/J1100" (hc://:serial=e2274b22:part=36HTF51272F677E1D4/motherboard=0/chip=0/branch=1/dram-channel=1/dimm=0) 93%
"MB/CMP1/BR0/CH0/D0/J1800" (hc://:serial=e0332120:part=36HTF51272F667E1D4/motherboard=0/chip=1/branch=0/dram-channel=0/dimm=0) 3%
"MB/CMP0/BR0/CH1/D0/J0700" (hc://:serial=e133d466:part=36HTF51272F667E1D4/motherboard=0/chip=0/branch=0/dram-channel=1/dimm=0) 2%

Description : Errors have been detected on multiple memory modules, suggesting
that a problem exists somewhere else in the system.
Refer to http://sun.com/msg/SUN4V-8000-X2 for more information.

Response    : Error reports from the affected components will be logged for
examination by Sun.

Impact      : System performance and stability may be affected.

Action      : Contact your service provider for further diagnosis.


2 responses to "Emailing Solaris FMA Alerts"

  1. Matthew Obi April 6, 2011 at 11:19 pm

    In the example above, how did you resolve this issue? Did you need to replace the affected DIMMs?

  2. jflaster April 8, 2011 at 9:41 pm

    For this server, we ultimately ended up replacing the motherboard.

