A place for Unix Thoughts and Ideas

Installing Oracle RAC 11gR2 on Solaris with Veritas 5.0MP3

A couple of months ago I became very intimate with ZFS Live Upgrade and Veritas File System checkpoints as I repeatedly tried, in vain, to upgrade my 11gR1 installation to 11gR2.

New installs worked fine, but upgrades would hang or fail while running root.sh on the second node.

After attempting this upgrade 20 times (thank goodness for ZFS and checkpoints; failed Oracle upgrades are painful to roll back), I stumbled upon the cause of my pain.

The Oracle installer fails in root.sh if the CRS cluster address is not the native address on the private network adapter (the address on e1000g1 itself rather than on a logical interface such as e1000g1:1). For some odd reason, when VCS brought the private NICs online, the IPs were being added in the wrong order. It didn't happen on the first node, but it definitely happened on the second.

Here is how to check.
On my example system, the private network addresses are assigned as follows:

#primary nic testnode-01-priv-crs testnode-01-priv-udp1
#secondary nic testnode-01-priv-udp2

#primary nic testnode-02-priv-crs testnode-02-priv-udp1
#secondary nic testnode-02-priv-udp2

# grep "`hostname | cut -d'.' -f1`-priv-crs" /etc/hosts
testnode-01-priv-crs

# ifconfig -a | gegrep -A1 'g1|ge1'
e1000g1: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1500 index 4
inet netmask fffffff0 broadcast

e1000g1:1: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1500 index 4
inet netmask fffffff0 broadcast

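On this system the addresses are in the wrong order: the priv-crs address is on the logical interface e1000g1:1 instead of being the native address on e1000g1.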
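If you have several nodes to check, the comparison can be scripted. This is a minimal sketch, not part of my actual procedure: it assumes the `<short-hostname>-priv-crs` naming shown above, takes the private NIC name (e.g. e1000g1) as an argument, and parses Solaris-style `ifconfig -a` output with awk. On Solaris 10, put /usr/xpg4/bin ahead of /usr/bin in PATH (or substitute nawk), since the old /usr/bin/awk does not support `-v`.

```shell
#!/bin/sh
# Sketch: is the <short-hostname>-priv-crs address the native address on NIC?
# Assumes the naming convention above; needs a POSIX awk (/usr/xpg4/bin/awk
# or nawk on Solaris 10).

# native_addr NIC: read "ifconfig -a" output on stdin and print the inet
# address of interface NIC exactly (so e1000g1 does not match e1000g1:1).
native_addr() {
    awk -v nic="$1" '
        # Header lines start in column 1, e.g. "e1000g1:1: flags=..."
        /^[^ \t]/ { cur = $1; sub(/:$/, "", cur) }
        # Indented address lines, e.g. "inet 10.0.0.1 netmask fffffff0 ..."
        cur == nic && $1 == "inet" { print $2; exit }
    '
}

# hosts_addr NAME: read hosts(4)-style lines on stdin and print the address
# whose host names include NAME, skipping comment lines.
hosts_addr() {
    awk -v h="$1" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit } }
    '
}

# check_crs_native NIC: compare /etc/hosts with what is plumbed on NIC.
check_crs_native() {
    nic=$1
    host=$(hostname | cut -d. -f1)
    crs_ip=$(hosts_addr "${host}-priv-crs" < /etc/hosts)
    native_ip=$(ifconfig -a | native_addr "$nic")
    if [ -n "$crs_ip" ] && [ "$crs_ip" = "$native_ip" ]; then
        echo "OK: ${host}-priv-crs ($crs_ip) is native on $nic"
    else
        echo "WRONG ORDER: native on $nic is '$native_ip', ${host}-priv-crs is '$crs_ip'"
        return 1
    fi
}
```

Source the file on each node and run `check_crs_native e1000g1`; it returns non-zero when the CRS address is not the native one.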

Here is how I fixed it (on my system, the CSSD and MultiPrivNIC resources are in the cvm group):

Freeze the group containing the CSSD and private NIC resources:
hagrp -freeze cvm

Stop CSSD on both nodes:
/etc/init.d/init.cssd stop

Reconfigure the IPs on the affected systems:
hagrp -clear cvm
hagrp -unfreeze cvm
haconf -makerw
hares -modify multi_priv Enabled 0

ifconfig e1000g1 removeif
ifconfig e1000g1 netmask
ifconfig e1000g1 addif

hares -modify multi_priv Enabled 1
haconf -dump -makero
hagrp -freeze cvm
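The same sequence can be written as a small shell function, with the NIC, addresses and netmask passed in as hypothetical parameters. It assumes the situation described earlier (the CRS address on a logical interface and the udp1 address on the native one). The exact ifconfig arguments are my reconstruction using standard Solaris `removeif`/`addif` syntax, and the group and resource names (cvm, multi_priv) are the ones from my system. Set DRYRUN=1 to print the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the IP reordering step. NIC, CRS_IP, UDP1_IP and NETMASK are
# hypothetical parameters; cvm and multi_priv are the names on my system.
# DRYRUN=1 prints each command instead of executing it.

run() {
    if [ -n "${DRYRUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reorder_priv_ips NIC CRS_IP UDP1_IP NETMASK
reorder_priv_ips() {
    nic=$1 crs_ip=$2 udp1_ip=$3 mask=$4

    # Clear faults, unfreeze, open the config and disable MultiPrivNIC.
    run hagrp -clear cvm
    run hagrp -unfreeze cvm
    run haconf -makerw
    run hares -modify multi_priv Enabled 0

    # Drop the logical interface holding the CRS address, make the CRS
    # address native, then re-add the udp1 address as a logical interface.
    run ifconfig "$nic" removeif "$crs_ip"
    run ifconfig "$nic" "$crs_ip" netmask "$mask"
    run ifconfig "$nic" addif "$udp1_ip" netmask "$mask" up

    # Re-enable the resource, save and close the config, refreeze the group.
    run hares -modify multi_priv Enabled 1
    run haconf -dump -makero
    run hagrp -freeze cvm
}
```

Running `DRYRUN=1` first and reading the output is a cheap way to confirm the addresses before touching a live interconnect.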

The IPs are now in the correct order, and the VCS health checks will succeed.

Restart CRS on both nodes:
/etc/init.d/init.cssd start

After CSSD restarts, unfreeze the group:

hagrp -unfreeze cvm
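Rather than watching for CSSD by hand, you could poll for it before unfreezing. This is a minimal sketch: the health-check command is a parameter, and whether something like `crsctl check css` exits non-zero while CSS is down on your release is an assumption you should verify before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: wait until a health check passes, then unfreeze cvm.
# DRYRUN=1 prints the unfreeze command instead of running it.

run() {
    if [ -n "${DRYRUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# wait_then_unfreeze TIMEOUT_SECONDS CHECK_CMD [ARGS...]
wait_then_unfreeze() {
    timeout=$1; shift
    waited=0
    # Re-run the check every 10 seconds until it exits 0 or time runs out.
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "check still failing after ${timeout}s; leaving cvm frozen" >&2
            return 1
        fi
        sleep 10
        waited=$((waited + 10))
    done
    run hagrp -unfreeze cvm
}
```

For example, `wait_then_unfreeze 600 "$CRS_HOME/bin/crsctl" check css`, where CRS_HOME (hypothetical here) points at your 11gR1 CRS home. Leaving the group frozen on timeout is deliberate, so VCS does not act on a half-started stack.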

Here are a few more things to note before running the CRS upgrade:

  1. The Veritas documentation for 11gR2 is more complete for some platforms than for others. My final configs were a blend of the Solaris and Linux documentation.
  2. Unset all Oracle environment variables before running the Oracle Universal Installer.
  3. If you were previously using 11gR1 with MultiPrivNIC and deleted all interfaces from the cluster config (using oifcfg delif -global), you must add them back to the configuration before starting the upgrade. If you don't, the upgrade will hang.
  4. Run the following to keep CRS from starting at boot (this seemed to fix the odd CRS startup issues I had on 10g) and to set the CSS misscount:
      • $GRID_HOME/bin/crsctl disable crs
      • $GRID_HOME/bin/crsctl set css misscount 600
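Items 2 through 4 can be gathered into one prep script. This is a sketch under assumptions: which variables count as "Oracle environment variables" (here, anything exported whose name starts with ORA), the NIC/subnet pairs, and the bin directories you pass in are all hypothetical. `oifcfg setif -global <if_name>/<subnet>:cluster_interconnect` is the standard oifcfg form for registering an interconnect. Set DRYRUN=1 to print the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical pre-upgrade prep helpers. DRYRUN=1 prints instead of running.

run() {
    if [ -n "${DRYRUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Item 2: unset exported variables whose names start with ORA
# (ORACLE_HOME, ORACLE_SID, ORA_CRS_HOME, ...). The prefix is an assumption.
clear_oracle_env() {
    for v in $(env | sed -n 's/^\(ORA[A-Za-z0-9_]*\)=.*/\1/p'); do
        unset "$v"
    done
}

# Item 3: re-register each interconnect, given as NIC/SUBNET pairs.
# readd_interconnects CRS_BIN PAIR...
readd_interconnects() {
    bin=$1; shift
    for pair in "$@"; do
        run "$bin/oifcfg" setif -global "${pair}:cluster_interconnect"
    done
}

# Item 4: keep CRS from starting at boot and set the CSS misscount.
# crs_boot_prep CRS_BIN
crs_boot_prep() {
    run "$1/crsctl" disable crs
    run "$1/crsctl" set css misscount 600
}
```

Run `clear_oracle_env` last. Unsetting variables only affects the current shell, so launch the installer from that same shell afterwards.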

I should also add that the installer received a major update in the release and may not have the problems above. Once I had a working procedure, I stuck with it. Let me know if you would like me to post my full procedure for installing/upgrading 11gR2 with VCS 5.0MP3 on Solaris 10.


