2011-04-29

OCR Corruption after Adding/Removing voting disk to a cluster when CRS stack is running (Doc ID 390880.1)

Applies to:

Oracle Server Enterprise Edition - Version: 10.2.0.1 to 10.2.0.4
Information in this document applies to any platform.
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3
Oracle Clusterware

Description

In 10gR2, Oracle introduced multiple voting disks to eliminate a single point of failure. The command to add a voting disk is:
# crsctl add css votedisk /ocfs1/votdisk2.dbf
More details on this command can be found at http://download-west.oracle.com/docs/cd/B19306_01/rac.102/b14197/votocr.htm#sthref226

When the CRS stack is running, the crsctl add css votedisk command and the corresponding remove command (crsctl delete css votedisk) fail with the message "Cluster is not in a ready state for online disk addition". This is caused by internal bug 3972986: without the fix for this bug, voting disks cannot be added or deleted dynamically while the Clusterware is running.

Some customers may misinterpret the "Cluster is not in a ready state for online disk addition" message and rerun the command with the -force attribute while the stack is running in order to force the change. This is known to corrupt the OCR. Do not use the -force attribute to add or delete a voting disk while the Clusterware is running.
Without the fix for bug 3972986, voting disks **CANNOT** be added or deleted when the clusterware is running on any node.
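
As an illustration only, the rule above can be backed by a small shell guard that refuses to proceed when a CSS daemon is visible in a process listing. This is a minimal sketch, not an Oracle-supplied tool: the function name force_guard is hypothetical, it assumes the CSS daemon shows up in the process list as ocssd.bin, and it only inspects the listing it is given, so the same check still has to be made on every other node of the cluster.

```shell
# force_guard: hypothetical helper, not part of Oracle Clusterware.
# $1 is the text of a process listing, for example the output of "ps -ef".
# Returns 1 if a CSS daemon (assumed to be named ocssd.bin) appears in it.
force_guard() {
    if printf '%s\n' "$1" | grep -q 'ocssd\.bin'; then
        echo "REFUSE: CSS daemon found; stop the Clusterware on all nodes first"
        return 1
    fi
    echo "OK: no CSS daemon in this listing; check every other node as well"
    return 0
}

# Possible use on one cluster node, as root (not executed here):
#   force_guard "$(ps -ef)" && crsctl add css votedisk /ocfs1/votdisk2.dbf -force
```

The listing is passed in as an argument so that the check can be tried out without a running cluster. A clean result on one node says nothing about the others.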

Likelihood of Occurrence

The chance of hitting this issue is very high.

Possible Symptoms

Unfortunately, there is no clear symptom of the corruption caused by running the crsctl add/delete command with the -force attribute. The OCR corruption eventually manifests itself in different ways and with different errors. In some cases it may lead to node evictions or to crsctl query commands returning inconsistent results.
It is also important to note that no errors are reported when a voting disk is added or deleted with the -force attribute while the Oracle Clusterware is up and running.

Workaround or Resolution

Until the patch for bug 3972986 becomes available, the workaround is to run the add or remove voting disk commands only after bringing down the Clusterware on all nodes. As mentioned in the Oracle documentation, this requires the use of the -force option. The -force attribute can be safely used ONLY if the Clusterware is stopped on all the nodes of the cluster.
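
The offline sequence described above can be sketched as a checklist generator. The function below only prints the commands in the order they would be run and executes nothing. The function name, the bracketed node labels and the node names in the usage line are hypothetical, and the crsctl commands are assumed to be run as root.

```shell
# votedisk_change_plan: hypothetical helper that prints, but never runs,
# the offline procedure for adding or deleting a voting disk.
# Usage: votedisk_change_plan add|delete <voting_disk_path> <node> [<node> ...]
votedisk_change_plan() {
    action=$1
    vdisk=$2
    case $action in
        add|delete) ;;
        *) echo "action must be add or delete" >&2; return 2 ;;
    esac
    if [ "$#" -lt 3 ]; then
        echo "give the voting disk path and every cluster node" >&2
        return 2
    fi
    shift 2    # the remaining arguments are the cluster nodes

    # 1. Stop the Clusterware on every node before touching the voting disks.
    for node in "$@"; do
        echo "[$node] crsctl stop crs"
    done
    # 2. Only now is -force safe to use.
    echo "[any one node] crsctl $action css votedisk $vdisk -force"
    # 3. Restart the Clusterware everywhere and confirm the result.
    for node in "$@"; do
        echo "[$node] crsctl start crs"
    done
    echo "[any one node] crsctl query css votedisk"
}

# Example (node names are placeholders):
#   votedisk_change_plan add /ocfs1/votdisk2.dbf node1 node2
```

Printing the plan rather than executing it keeps the operator in control of each step, which matters here because the damaging case reports no error.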
If a voting disk has already been added or removed using the -force option while the Oracle Clusterware stack was running, stop the Clusterware on all the nodes and restore the OCR from a backup taken before the voting disk was added or removed. The steps to restore the OCR are documented at http://download-west.oracle.com/docs/cd/B19306_01/rac.102/b14197/votocr.htm#i1012456
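
Choosing a backup "from before the change" can be sketched as follows. This is an assumption-laden illustration, not the documented procedure: it assumes the OCR backups are files ending in .ocr in a single directory, that their modification times reflect when each backup was taken, and that a marker file has been created whose modification time is the moment the -force change was made. The function name, the marker file and the paths in the usage comments are hypothetical. The ocrconfig -showbackup command remains the authoritative way to list the available backups.

```shell
# latest_backup_before: hypothetical helper.
# $1: directory holding OCR backup files (*.ocr)
# $2: marker file whose modification time is the time of the -force change
# Prints the newest backup that is not newer than the marker, if any.
latest_backup_before() {
    dir=$1
    marker=$2
    if [ ! -e "$marker" ]; then
        echo "marker file not found: $marker" >&2
        return 2
    fi
    # ls -t lists newest first, so the first file that is not newer than
    # the marker is the most recent backup from before the change.
    ls -t "$dir"/*.ocr 2>/dev/null | while read -r f; do
        if [ ! "$f" -nt "$marker" ]; then
            echo "$f"
            break
        fi
    done
}

# Possible use (paths and timestamp are placeholders):
#   touch -t 200609101530 /tmp/votedisk_change
#   latest_backup_before /u01/crs/cdata/crs /tmp/votedisk_change
# Then, with the Clusterware stopped on all nodes, as root:
#   ocrconfig -restore <file printed above>
```

If the function prints nothing, no backup in that directory predates the change and another backup location has to be checked.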
Oracle development is already aware of this issue and a fix will be issued shortly.

Patches

At this time there is no patch for this issue.

Modification History

09-Sep-06 Note created by Anil Nair
12-Sep-06 Implemented changes suggested by John.
18-Sep-06 Implemented suggestion to add remove
