2012-12-27

CSSD aborting from thread clssnmvDiskPingMonitorThread

11.2.0.3 CRS Abort With "CSSD aborting from thread clssnmvDiskPingMonitorThread" if Only One Voting Disk/File is Configured [ID 1466639.1]

Applies to:

Oracle Server - Enterprise Edition - Version 11.2.0.3 to 11.2.0.3 [Release 11.2]
Information in this document applies to any platform.

Description

On 11.2.0.3 (prior to the 11.2.0.3.4 PSU), one of the cluster nodes may intermittently experience a CRS restart (without a node reboot), with an ocssd.log message reporting "clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1". As a result, the ASM and database instances on the affected node are also restarted. The cause is a race condition between different threads checking voting disk availability. It was reported and fixed in unpublished bug 13869978.

Occurrence

It affects only clusters with exactly 1 voting disk/file configured for Grid Infrastructure 11.2.0.3, before the 11.2.0.3.4 PSU has been applied.
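The exposure condition above can be written as a small check. This is a minimal sketch. The function names are hypothetical, and it assumes GI version strings such as "11.2.0.3.3", where the fifth field is the GI PSU level. A bare "11.2.0.3" is treated as the base release (PSU 0).

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted Oracle version string (e.g. "11.2.0.3.3") into integers."""
    return tuple(int(part) for part in version.split("."))


def is_exposed_to_13869978(gi_version: str, voting_files: int) -> bool:
    """Hypothetical helper: True if the cluster matches the affected configuration."""
    fields = parse_version(gi_version)
    base = fields[:4]
    # Assumption: a fifth field is the GI PSU level; absent means base release.
    psu = fields[4] if len(fields) > 4 else 0
    # Only 11.2.0.3 below PSU 4 is affected ...
    if base != (11, 2, 0, 3) or psu >= 4:
        return False
    # ... and only when exactly one voting disk/file is configured.
    return voting_files == 1


print(is_exposed_to_13869978("11.2.0.3.3", 1))  # True
print(is_exposed_to_13869978("11.2.0.3.4", 1))  # False
print(is_exposed_to_13869978("11.2.0.3.2", 3))  # False
```

The voting file count is passed in directly. Obtaining it from the cluster is left to the administrator.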

Symptoms

<grid-home>/log/<node>/cssd/ocssd.log shows the following:
2012-05-28 07:45:32.823: [    CSSD][1075423552](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1
2012-05-28 07:45:32.835: [    CSSD][1075423552]###################################
2012-05-28 07:45:32.835: [    CSSD][1075423552]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
2012-05-28 07:45:32.835: [    CSSD][1075423552]###################################
2012-05-28 07:45:32.835: [    CSSD][1075423552](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2012-05-28 07:45:32.849: [    CSSD][1075423552]

----- Call Stack Trace -----
2012-05-28 07:45:32.857: [    CSSD][1075423552]calling              call     entry                argument values in hex
2012-05-28 07:45:32.858: [    CSSD][1075423552]location             type     point                (? means dubious value)
2012-05-28 07:45:32.859: [    CSSD][1075423552]-------------------- -------- -------------------- ----------------------------
2012-05-28 07:45:32.881: [    CSSD][1075423552]clssscExit()+740     call     kgdsdst()            000000000 ? 000000000 ?
2012-05-28 07:45:32.884: [    CSSD][1075423552]clssnmvDiskCheck()+  call     clssscExit()         2AAAAC477780 ? 000000002 ?
2012-05-28 07:45:32.887: [    CSSD][1075423552]clssnmvDiskPingMoni  call     clssnmvDiskCheck()   2AAAAC477780 ? 2AAAAC0A3C40 ?
2012-05-28 07:45:32.888: [    CSSD][1075423552]torThread()+423                                    04019A0B8 ? 000000000 ?
2012-05-28 07:45:32.890: [    CSSD][1075423552]clssscthrdmain()+25  call     clssnmvDiskPingMoni  2AAAAC477780 ? 2AAAAC0A3C40 ?
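The abort line carries the voting disk arithmetic: how many files are available, how many are configured, and how many CSSD needs. The following sketch extracts those numbers. The regular expression is built only from the message text quoted above. Treat it as an assumption, not a documented log format. The function name `parse_abort` is hypothetical.

```python
import re

# Built from the abort message quoted in this note; not an official log grammar.
ABORT_RE = re.compile(
    r"clssnmvDiskCheck: Aborting, (\d+) of (\d+) configured voting disks available, need (\d+)"
)


def parse_abort(line: str):
    """Return (available, configured, needed), or None if the line is not an abort."""
    match = ABORT_RE.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


line = ("2012-05-28 07:45:32.823: [    CSSD][1075423552](:CSSNM00018:)"
        "clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1")
available, configured, needed = parse_abort(line)
print(available, configured, needed)  # 0 1 1
# With a single voting file, one spurious "unavailable" result already falls below the need.
print(available < needed)  # True
```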
In some cases, the following may also appear in ocssd.log:
2012-03-20 23:11:19.337: [    CSSD][3956]clssnmFindVFByVDIN: Requested guid 0b11163b-77614f16-bf6dea8e-e0b9a98b, vdisk guid 0b11163b-77614f16-bf6dea8e-e0b9a98b (0000000007D8E248) - len 16, vfile (0000000007D8B980), link (0000000007D8B980)
2012-03-20 23:11:19.337: [    CSSD][3956]clssnmFindVFByVDIN: Voting file not found - queue(0000000007CF8AC0), prev (0000000007D8B980), next (0000000007D8B980)
2012-03-20 23:11:19.337: [    CSSD][3956]clssnmvDiskCheck: No voting file found for guid 0b11163b-77614f16-bf6dea8e-e0b9a98b
By contrast, when there is a genuine voting disk I/O problem, ocssd.log usually shows warnings like the following before CSSD aborts:
2012-05-22 14:13:21.939: [    CSSD][1101846848]clssnmvDiskCheck: (ORCL:DATA01) No I/O completed after 75% maximum time, 27000 ms, will be considered unusable in 6640 ms
..
2012-05-22 14:13:26.408: [    CSSD][1101846848]clssnmvDiskCheck: (ORCL:DATA01) No I/O completed after 90% maximum time, 27000 ms, will be considered unusable in 2170 ms

Or, if the voting disk is inaccessible rather than merely slow, an OS error is printed instead.
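The distinction above suggests a rough triage heuristic. For each abort, check whether "No I/O completed after ..." warnings appeared since the previous abort. This is only a sketch with hypothetical names. It does not detect OS errors, because the note does not quote their format. An abort without preceding warnings is merely *consistent with* bug 13869978 and does not prove it.

```python
from collections.abc import Iterable

ABORT_MARK = "clssnmvDiskCheck: Aborting"
IO_WARNING_MARK = "No I/O completed after"


def classify_aborts(lines: Iterable[str]) -> list[str]:
    """Label each voting-disk abort by whether I/O warnings preceded it."""
    results = []
    saw_io_warning = False
    for line in lines:
        if IO_WARNING_MARK in line:
            saw_io_warning = True
        elif ABORT_MARK in line:
            results.append("preceded by I/O warnings" if saw_io_warning
                           else "no preceding I/O warning")
            saw_io_warning = False  # start a fresh window after each abort
    return results


log = [
    "... clssnmvDiskCheck: (ORCL:DATA01) No I/O completed after 75% maximum time, 27000 ms, ...",
    "... clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1",
    "... clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1",
]
print(classify_aborts(log))
# ['preceded by I/O warnings', 'no preceding I/O warning']
```

Because the function accepts any iterable of lines, an open file handle for ocssd.log can be passed to it directly.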

Workaround

Use 3 or more voting disks/files instead of 1.
If the voting disk is on ASM, move the voting disk to a normal or high redundancy diskgroup. Please refer to note 428681.1 OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE) for instructions to move voting disks.
As a best practice, configure multiple voting disks.
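Multiple voting files help because CSSD needs a strict majority of the configured voting files. The log line "0 of 1 ... need 1" matches this rule. The sketch below works through the arithmetic for the voting file counts that ASM creates per diskgroup redundancy in 11.2: 1 for external, 3 for normal and 5 for high. With a single file, no file may be unavailable, even spuriously. The function names are illustrative only.

```python
# Voting files placed in an ASM diskgroup, by diskgroup redundancy (11.2).
VOTING_FILES_BY_REDUNDANCY = {"EXTERNAL": 1, "NORMAL": 3, "HIGH": 5}


def votes_needed(configured: int) -> int:
    """Strict majority of configured voting files that must be accessible."""
    return configured // 2 + 1


def tolerated_failures(configured: int) -> int:
    """Voting files that may be unavailable before CSSD aborts."""
    return configured - votes_needed(configured)


for redundancy, n in VOTING_FILES_BY_REDUNDANCY.items():
    print(f"{redundancy:8} {n} voting file(s): need {votes_needed(n)}, "
          f"tolerate {tolerated_failures(n)} unavailable")
```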

Patches

The fix for bug 13869978 is included in the 11.2.0.3 Grid Infrastructure PSU 4 and later. Apply the 11.2.0.3.4 GI PSU (patch 14275572).
Alternatively, interim patch 13869978 has been provided for the 11.2.0.3.2 and 11.2.0.3.3 PSUs on various platforms. Check My Oracle Support "Patches & Updates" for availability.
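The patching options above reduce to a lookup by the current GI PSU level of an 11.2.0.3 home. This is a hypothetical helper that only restates the note. Per-platform availability of the interim patch still has to be confirmed on My Oracle Support.

```python
def remedy_for(psu_level: int) -> str:
    """Suggest a fix path for an 11.2.0.3 GI home at the given PSU level (0 = base)."""
    if psu_level >= 4:
        return "fix included (11.2.0.3.4 GI PSU or later)"
    if psu_level in (2, 3):
        # Interim patch availability varies by platform; verify on My Oracle Support.
        return "apply 11.2.0.3.4 GI PSU (patch 14275572) or interim patch 13869978"
    return "apply 11.2.0.3.4 GI PSU (patch 14275572)"


print(remedy_for(3))
```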


References

NOTE:428681.1 - OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE)
