rageek

A place for Unix Thoughts and Ideas

Default gateway issues with Solaris 11 auto-installer

I have been finishing up my Solaris 11 baseline, and I noticed an odd issue: after the first reboot, my server no longer had a default route.

Digging into my logs, I found the following error on the initial boot.

Error creating default route:
 "/usr/sbin/route get default 10.0.0.1 -ifp net0"

The service's log file, /var/svc/log/network-install:default.log, showed more detail:

[ Apr 8 16:55:06 Executing start method ("/lib/svc/method/net-install"). ]
add net default: gateway 10.0.0.1
   route to: default
destination: default
       mask: default
    gateway: 10.16.148.1
  interface: net0
      flags: <UP,GATEWAY,DONE,STATIC>
 recvpipe  sendpipe  ssthresh    rtt,ms rttvar,ms  hopcount      mtu     expire
       0         0         0         0         0         0      1500         0 
Error creating default route:
"/usr/sbin/route get default 10.16.148.1 -ifp net0"

Everything in the output looks correct, except for the error itself.

So I dug into the service method to see what it was doing:

        if [ "$net_install_route" != "" ]; then
                if [ $ipv6_interface == 1 ]; then
                        details="-inet6 default"
                else
                        details="default"
                fi
                details="$details $net_install_route -ifp $ifp"
                cmd="$ROUTE add $details"
                $cmd
                cmd="$ROUTE get $details"
                $cmd
                if [ $? -ne 0 ]; then
                        err=$?
                        msg="Error creating default route:\n\"$cmd\""
                        net_record_err "$msg" $err
                        return $SMF_EXIT_ERR_FATAL
                fi
                rootdir=$SMF_SYSVOL_FS
                /usr/bin/mkdir -p $rootdir/etc/inet
                if [ $? -ne 0 ]; then
                        err=$?
                        msg="Error creating \"$rootdir/etc/inet\" directory"
                        net_record_err "$msg" $err
                        return $SMF_EXIT_ERR_FATAL
                fi
                cmd="$ROUTE -R $rootdir -p add $details"
                $cmd
                if [ $? -ne 0 ]; then
                        err=$?
                        msg="Error adding persistent default route:\n\"$cmd\""
                        net_record_err "$msg" $err
                        return $SMF_EXIT_ERR_FATAL
                fi
        fi

Looking at the code, the method adds the default route, verifies that it was added successfully using route get, and then creates a persistent entry.

The logic is written so that if "/usr/sbin/route get default 10.0.0.1 -ifp net0" returns anything but zero, the method logs the error and stops without continuing.
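The add, verify, persist sequence can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not Oracle's code: add_default_route and dry_route are hypothetical names, ROUTE defaults to dry_route so the commands are only printed, and the persistent step uses route -p add (as the workaround further down does) rather than the method's -R $rootdir form. Unlike the method, it saves the route get exit status once, immediately after the command.

```sh
#!/bin/sh
# Hypothetical re-creation of the add/verify/persist steps; not Oracle's code.
# dry_route only prints what would run. Set ROUTE=/usr/sbin/route to apply
# the changes for real on a Solaris 11 system.
dry_route() { printf 'route %s\n' "$*"; }
ROUTE=${ROUTE:-dry_route}

add_default_route() {
        gw=$1      # gateway address, e.g. 10.0.0.1
        ifp=$2     # interface name, e.g. net0
        details="default $gw -ifp $ifp"

        # 1. Add the route to the running system.
        $ROUTE add $details || return 1

        # 2. Verify it, saving the exit status before anything else runs.
        $ROUTE get $details
        rc=$?
        if [ $rc -ne 0 ]; then
                echo "Error verifying default route (exit code $rc)" >&2
                return $rc
        fi

        # 3. Make the route persistent across reboots.
        $ROUTE -p add $details
}

add_default_route 10.0.0.1 net0
```

In dry-run mode this prints the three route commands in order; if the get step fails, the function reports the real exit code and never reaches the persistent add.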

Running the command manually worked fine and returned an exit code of zero, so something else was going on.

Next I looked at net_record_err in /lib/svc/share/net_include.sh, the function responsible for logging the error:

net_record_err()
{
        message=$1
        err=$2

        echo "$message" | smf_console
        if [ $err -ne 0 ]; then
                echo "Error code = $err" | smf_console
        fi
}

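Looking at this code, a non-zero exit code should have produced an "Error code = " line in my log, but there wasn't one.

My first conclusion was that route get was returning zero. The method itself explains the missing line, though: err=$? runs inside the then branch, so $? at that point is the exit status of the [ $? -ne 0 ] test, which is always 0. The real exit code is thrown away, and net_record_err never prints it. So route get apparently did fail at that point in the initial boot, even though the same command returns zero when run by hand afterwards, and the method then exits before the persistent route is ever created.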
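Reading $? twice, as the net-install method does, can be reproduced in any POSIX shell. A minimal sketch, with a hypothetical failing_cmd function standing in for a route get that fails during boot:

```sh
#!/bin/sh
# Hypothetical stand-in for a "route get" that fails with exit status 3.
failing_cmd() { return 3; }

# Pattern used by the net-install method: $? is read twice.
failing_cmd
if [ $? -ne 0 ]; then
        buggy_err=$?    # status of the [ ] test just above, which is 0
fi
echo "buggy pattern records err=$buggy_err"

# Safer pattern: save the status once, right after the command.
failing_cmd
rc=$?
if [ $rc -ne 0 ]; then
        fixed_err=$rc
fi
echo "fixed pattern records err=$fixed_err"
```

Running it prints err=0 for the first pattern and err=3 for the second. Since net_record_err only prints "Error code =" when its second argument is non-zero, the first pattern always hides the code.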

Since the method runs on the initial boot and auto-installer has no post-install script, I have no opportunity to fix the script before it runs. Rather than wait for Oracle to release a fix, I decided to simply code around it in my first-boot script.

In my first-boot script I added the following lines:

LOG=/var/svc/log/network-install:default.log
# Only act if net-install logged the route error
if grep /usr/sbin/route "$LOG" > /dev/null 2>&1; then
        # Pull the failed "route get" command out of the quotes in the log;
        # tail -1 keeps only the newest match if the log spans several boots
        CG=$(grep /usr/sbin/route "$LOG" | tail -1 | cut -d'"' -f2)
        # If the check succeeds now, add the same route persistently
        if $CG > /dev/null 2>&1; then
                echo "Manually setting static route to deal with issue in net-install script"
                PG=$(echo "$CG" | sed -e 's/get/-p add/')
                $PG
        fi
fi

This looks for the error in the net-install log and, if it finds it, runs the route get command itself. If that exits with 0, it adds the same route persistently with route -p add.
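The text handling in the workaround can be tried on any machine without touching the network. A minimal sketch: it copies the two error lines from the log above into a temporary file (created with mktemp, an assumption about the test system) and prints the commands the first-boot script would derive instead of running them.

```sh
#!/bin/sh
# Sample copy of the relevant net-install log lines (from the log above).
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Error creating default route:
"/usr/sbin/route get default 10.16.148.1 -ifp net0"
EOF

# Same extraction as the first-boot script: the command sits between
# the double quotes, so field 2 with " as the delimiter is the command.
CG=$(grep /usr/sbin/route "$LOG" | tail -1 | cut -d'"' -f2)
# Swap the first "get" for "-p add" to build the persistent version.
PG=$(printf '%s\n' "$CG" | sed -e 's/get/-p add/')

echo "check:   $CG"
echo "persist: $PG"
rm -f "$LOG"
```

The sed substitution replaces only the first "get" on the line, which is safe here because /usr/sbin/route itself does not contain that string.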
