2012-02-03

Long Log File Sync Time in 11.2

AIX: Things To Check When Seeing Long Log File Sync Time in 11.2. [ID 1318709.1]
You should apply 11.2.0.2.5 (PSU5 for 11GR2 ) in order to fix bug 9829397

Applies to:

Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.3 - Release: 11.2 to 11.2
IBM AIX on POWER Systems (64-bit)

Symptoms

  Long log file sync times on AIX 6.1 (+ later) and Oracle 11.2


Top 5 Timed Foreground Events

Event                                       Waits       Time(s)   Avg wait (ms)   % DB time   Wait Class
db file sequential read      6,456,003        43,555                       7               42.00   User I/O
log file sync                             5,192        21,585                4157                20.81  Commit
read by other session        1,436,853        13,718                    10                13.23  User I/O
log buffer space                       4,749          8,790                1851                  8.48  Configuration
DB CPU                                                     5,518                                          5.32


Performing  a TRUSS of LGWR  e.g. truss -dD -o lgwrtruss.log -p <ospid>
you can see that the time reported next to lseek may be very high, and warning messages are being written to the LGWR trace.  This is the elapsed  time since the previous call aio_nwait64.


0.0009: aio_nwait64(0x0000000000001000, 0x0000000000000012, 0x0FFFFFFFFFFEC0A0, 0x800000000000D032, 0x3FFC000000000003, 0x000000000000011C, 0x0000000000000000, 0x0000000000000000) = 0x0000000000000012
2.4272: lseek(20, 0, 1) = 1689988                    <====== Time between lseek and previous call is high
kwrite(20, "\n * * * 2 0 1 1 - 0 3".., 29) = 29
kwrite(21, " 3 ? + O M 9 ~ r T\n", 10) = 10
0.0002: lseek(20, 0, 1) = 1690017                   
kwrite(20, " W a r n i n g : l o g".., 52) = 52    <====== Warning message written to LGWR trace
kwrite(21, " 3 ? V S 2 ~ 0 q\n", 9) = 9
kwrite(20, "\n", 1) = 1
0.0002: thread_post(12321045) = 0

The above shows LGWR taking a long time and then writing the below message to the LGWR trace file.
Warning: log write elapsed time XXXXms, size XKB

If we look just at the times since the previous call, by looking at lseek times , they jump when load becomes slightly higher.  Write times were good though.


0.0003: lseek(20, 0, 1) = 1686342
0.0002: lseek(20, 0, 1) = 1686564
0.0002: lseek(20, 0, 1) = 1686757
0.0002: lseek(22, 0, 0) = 0
0.0002: lseek(482, 0, 0) = 0
0.0002: lseek(483, 0, 0) = 0
0.0026: lseek(20, 0, 1) = 1687283
2.1588: lseek(20, 0, 1) = 1687285  <== elapsed time since previous call (aio_nwait64) jumps, which we see
2.1983: lseek(20, 0, 1) = 1687367          represented next to the lseek call.
2.1029: lseek(20, 0, 1) = 1687449
2.1846: lseek(20, 0, 1) = 1687531

Changes

Database Upgraded to 11.2 or a AIX version, TL or SP is applied

Cause

The operating system, or file system is not correctly configured.

Solution

It is important to ensure that all AIX system parameters are correctly configured and that the file system block size aligns with the block size of the online redo logs.  

There have been multiple occasions where an upgrade of Oracle to 11g, or the application of a later Technology Level (TL) and Service Pack (SP) has seem a sudden increase in log file sync wait times with the symptom being high delta between aio_nwait64 and lseek calls seen during truss and high log file sync times.

Even though the only change may be an Oracle upgrade or a OS patch, multiple issues with the above symptoms have been addressed by either correctly configuring the OS or filesystem containing online redo.

Some previous changes that have resolved long LFS issue in Oracle 11.2  on AIX include :-
- Disabling Disk I/O Pacing for the filesystem with Redo Logs. (Enabled by default in AIX 6.1) 
- Correctly aligning file system and redo log block size to avoid demoted I/O's
- Correctly setting vmo and network parameters

Please review the following IBM whitepaper "Oracle Architecture and Tuning on AIX" and ensure that minimum recommended settings are met.

Disabling or delaying the warning message from being written to the LGWR trace file also may help improve LFS times.  In 10.2 set event 10468 level 4.  In 11.2, you can vary the message warning threshold using 
"_long_log_write_warning_threshold"  which sets the threshold for long log write warning messages in ms.

If the everything is correctly configured then applying the fix for Bug 12412983 - AIX: "asynch descriptor resize" may also help.  This contains both the fix of bug 12412983 and bug 9829397.

There is also a merge patch 12986882: MERGE  ON TOP OF 11.2.0.2.0 FOR BUGS 9829397 12412983.

Note: 
The following recommendation may help in alleviating long 'log file sync' waits
However we highly advise you to contact IBM to verify any changes to configuration on the OS level
Please also ensure all recommended OS patches have been installed

Niciun comentariu:

Trimiteți un comentariu