rageek

A place for Unix Thoughts and Ideas

Solaris 10 update 9 zpool woes

Years ago I had a huge issue with the Oracle OEM agents core dumping constantly and rapidly filling up 100GB+ filesystems in a couple of hours.

My solution at the time was to consolidate the core dumps under /var/core and make /var/core a compressed 5GB ZFS filesystem.
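
The redirection side of that is coreadm territory; a sketch of what the setup looks like, with the core-file naming pattern here as a placeholder rather than the exact one I used:

# Route every core dump to /var/core with a descriptive name
# (%f = executable name, %p = process id); the pattern is illustrative
coreadm -g /var/core/core.%f.%p -e global -e global-setid
coreadm    # with no arguments, shows the active configuration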

Since ZFS root wasn't supported at the time, the pool was backed by a file in /var. This worked extremely well.
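
Roughly, the pool setup looked like this (the pool name and backing-file path below are placeholders, not the exact ones I used):

# A 5GB file in /var backs the pool; a compressed dataset is mounted at /var/core
mkfile 5g /var/corepool.img
zpool create corepool /var/corepool.img
zfs create -o compression=on -o mountpoint=/var/core corepool/core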

When I moved all my databases into local zones, I implemented a similar scheme, except that the zone roots were on VxFS filesystems instead of UFS. The core filesystem was again backed by a file that lived in the zone root and was presented to the zone as a legacy ZFS mount through a dataset definition.
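
The shape of that configuration, with the zone, pool, and path names below as placeholders:

# Global zone: a file inside the zone root (on VxFS) backs the pool
mkfile 5g /zones/dbzone01/root/corepool.img
zpool create dbzone01_core /zones/dbzone01/root/corepool.img
zfs set mountpoint=legacy dbzone01_core

# Delegate the dataset to the zone
zonecfg -z dbzone01 <<EOF
add dataset
set name=dbzone01_core
end
commit
EOF

# Inside the zone, /etc/vfstab then mounts it as a legacy ZFS filesystem:
# dbzone01_core  -  /var/core  zfs  -  yes  -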

This also worked well, until update 9 of Solaris 10, where some subtle changes in the startup services tried to online all of my zpools before my VxFS filesystems came online. The end result was that all of my zpools came up faulted.

This was easily remedied by running a zpool clear:
for i in `zpool list | grep FAULTED | awk '{print $1}'`; do zpool clear $i; done

But this required manual intervention and delayed the starting of my zones.

The fix for this issue was to create a Solaris SMF service that starts right after system/filesystem/local and clears any faulted zpools.

Here is my service:

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!--
    Copyright 2004 Sun Microsystems, Inc.  All rights reserved.
    Use is subject to license terms.

    pragma ident        "@(#)zpool_clear.xml 1.2     04/08/09 SMI"
-->
<service_bundle type='manifest' name='zpool_clear'>

<service
    name='system/filesystem/zpool_clear'
    type='service'
    version='1'>

    <single_instance/>
        <dependency
            name='usr'
            type='service'
            grouping='require_all'
            restart_on='none'>
            <service_fmri value='svc:/system/filesystem/local'/>
        </dependency>

        <exec_method
            type='method'
            name='start'
            exec='/lib/svc/method/zpool_clear.sh start'
            timeout_seconds='30' />

         <exec_method
            type='method'
            name='stop'
            exec='/lib/svc/method/zpool_clear.sh stop'
            timeout_seconds='30' />
        <property_group name='startd' type='framework'>
                <propval name='duration' type='astring' value='transient' />
        </property_group>

        <instance name='default' enabled='true' />

        <stability value='Unstable' />

        <template>
                <common_name>
                        <loctext xml:lang='C'>
                                Zpool Service
                        </loctext>
                </common_name>
        </template>
</service>
</service_bundle>
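
If you want to catch XML typos before the import step further down, svccfg can validate the manifest first:

svccfg validate /var/svc/manifest/site/zpool_clear.xml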

And my startup script

#!/bin/sh
#
# zpool_clear.sh
#
case "$1" in
'start')
        for i in `zpool list | grep FAULTED | awk '{print $1}'`
        do
                echo "clearing FAULTED status on Zpool $i"
                zpool clear $i
        done

        # Pick up any datasets that could not auto-mount while their pool was faulted
        zfs mount -a
        ;;
'stop')
        # Nothing to do on stop; this case just keeps the SMF stop method quiet
        ;;
*)
        echo "Usage: $0 start"
        ;;
esac
exit 0

Installation:
cp zpool_clear.sh /lib/svc/method/zpool_clear.sh
cp zpool_clear.xml /var/svc/manifest/site
chmod +x /lib/svc/method/zpool_clear.sh

svccfg import /var/svc/manifest/site/zpool_clear.xml
svcadm enable zpool_clear
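
Once imported, the service state and its dependency on system/filesystem/local can be checked with svcs (the FMRI matches the manifest above):

svcs -l svc:/system/filesystem/zpool_clear:default    # state and enabled flag
svcs -d svc:/system/filesystem/zpool_clear:default    # should list system/filesystem/local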

Updated 4/16/2012: added stop method to manifest to suppress errors while importing on Solaris 11
Updated 8/20/2012: added zfs mount -a to catch auto-mounting zfs datasets


2 responses to “Solaris 10 update 9 zpool woes”

  1. RakuMyLady February 7, 2012 at 1:27 am

    Thank you for sharing your manifest solution! I have a very similar problem yet it is related to NetApp luns over iscsi network. Even if the iscsi service isn’t ready (for whatever reason, being either network error or the host iscsiadm client), the zfs pool recoveries “jump in”, regardless. It becomes a race as to “who is first”. These same pools are being used as zone roots, so I had to implement a quick workaround that included exporting these pools to avoid data corruption. Now this still is non-functional when it comes to adding in a VCS component, but that’s a whole different layer of “non-communication”. In conclusion, my workaround is not nearly as elegant as yours. I’ll be interested in pursuing how I can apply an adaptation of your solution for my “most irritating” iscsi network scenario. Great post!

    • jflaster February 7, 2012 at 2:15 am

      You’re welcome. I noticed that sometimes I get emails about the faults and sometimes I don’t. Thanks for the heads up on the similar iSCSI issues. It would be nice if they changed this behavior. I wonder if it is still present in Solaris 11.
