Wednesday, February 25, 2015

RS-700 Celloflsrv hang detected on Exadata

RS-700 [Celloflsrv hang detected. It will be terminated] [SYS_121111_140712] [] [] [] [] [] [] [] [] [] []
Server Model Oracle Corporation SUN SERVER X4-2L High Capacity
Release Version
Release Label OSS_12.

This is Bug 19132065: on Oracle Linux, semtimedop() wakeups by timeout lag, which causes offload operations to fail (and may degrade performance) and produces errors similar to one or more of the following:
- ORA-700 [Offload issue job timed out]
- ORA-700 [Offload group not open]
- RS-700 [Celloflsrv hang detected. It will be terminated]

This bug is related to the storage server software version.
It is caused by delayed RCU (read-copy-update) processing on the DB node, which causes offload jobs to fail in the cell services.
It affects database performance, not availability.
The error occurs mostly when cellsrv attempts read optimization.
Reducing the RCU delay is the workaround across the whole stack.

Step 1: Set rcu_delay at runtime

# echo 1 > /proc/sys/kernel/rcu_delay
Verify the setting:
# cat /proc/sys/kernel/rcu_delay
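
The runtime step and its verification can be wrapped in a small check that does nothing on kernels that do not expose the tunable. This is only a minimal sketch: the function name apply_rcu_delay and its optional path argument are hypothetical, added here so the logic can be tried against a scratch file before touching /proc.

```shell
#!/bin/sh
# Hypothetical helper (not part of Exadata tooling): write 1 to the
# rcu_delay tunable and read it back, as in Step 1.
# The path defaults to the file used above; pass another path for a dry run.
apply_rcu_delay() {
    target="${1:-/proc/sys/kernel/rcu_delay}"
    # Skip quietly on systems where the tunable is missing or read-only.
    if [ ! -w "$target" ]; then
        echo "skip: $target not present or not writable" >&2
        return 1
    fi
    echo 1 > "$target" || return 1
    # Succeed only if the value read back is 1.
    [ "$(cat "$target")" = "1" ]
}
```

Run as root, calling apply_rcu_delay with no argument acts on /proc/sys/kernel/rcu_delay; a non-zero return means the value was not applied.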

Step 2: Set rcu_delay in /etc/sysctl.conf so the setting persists across reboots

Add "kernel.rcu_delay=1" to /etc/sysctl.conf
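
If the file is edited by script rather than by hand, it helps to make the change idempotent so that repeated runs do not leave duplicate or conflicting entries. The following is a sketch under that assumption; the function name persist_sysctl and its third (file path) argument are hypothetical and exist only for illustration and testing.

```shell
#!/bin/sh
# Hypothetical helper: ensure "key=value" appears exactly once in a
# sysctl-style file. The file defaults to /etc/sysctl.conf.
persist_sysctl() {
    key="$1"; value="$2"; conf="${3:-/etc/sysctl.conf}"
    touch "$conf" || return 1
    tmp="$(mktemp)" || return 1
    # Drop any existing uncommented assignment of this key.
    # The dot in the key acts as a regex wildcard here, which is
    # acceptable for this sketch. grep -v exits non-zero when no
    # lines remain, so that status is ignored.
    grep -v -E "^[[:space:]]*${key}[[:space:]]*=" "$conf" > "$tmp" || true
    echo "${key}=${value}" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

As root, `persist_sysctl kernel.rcu_delay 1` would leave a single kernel.rcu_delay=1 line in /etc/sysctl.conf; `sysctl -p` can then reload the file if desired.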

Step 3: Restart cellsrv on storage servers

CellCLI> alter cell restart services cellsrv;

This workaround is automatically applied in the following cases:
- When a new system is deployed with Exadata or using OEDA Sep 2014 or later.
- When storage servers are upgraded to or and the patchmgr plugins patch is properly staged before running patchmgr, as documented.
- When database servers are upgraded to or using v3.58 or later.