Tuesday, May 5, 2015

Exadata Local Read-only file system | End_request: I/O error

---On the database nodes, we cannot write to / or to any other local file system.

 [root@dummyhostname01 ~]# df  
 Filesystem       1K-blocks    Used  Available Use% Mounted on  
 /dev/mapper/VGExaDb-LVDbSys1  
             30963708  22372264   7018580 77% /  
 tmpfs         264152064     4  264152060  1% /dev/shm  
 /dev/sda1         516040   40016   449812  9% /boot  
 /dev/mapper/VGExaDb-LVDbOra1  
             103212320  25806260  72163180 27% /u01  
 [root@dummyhostname01 oracle.cellos]# cd conf  
 [root@dummyhostname01 conf]# ls  
 ls: reading directory .: Input/output error  
 [root@dummyhostname01 log]# cd /u01  
 [root@dummyhostname01 u01]# ls  
 app lost+found  
 [root@dummyhostname01 u01]# ll  
 total 20  
 drwxr-xr-x 5 root oinstall 4096 Mar 23 19:11 app  
 drwx------ 2 root root   16384 Mar 10 19:26 lost+found  
 [root@dummyhostname01 u01]# touch test  
 touch: cannot touch `test': Read-only file system  
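
As the df output above shows, df alone does not reveal the problem. A minimal sketch for spotting it is to look for the ro mount option, assuming the kernel has already flagged the affected file systems as read-only in /proc/mounts. The function name list_ro_mounts is hypothetical, not part of any Exadata tooling.

```shell
# List mount points whose mount options include "ro" (read-only).
# Reads a mounts table in /proc/mounts format; defaults to /proc/mounts.
# list_ro_mounts is a hypothetical helper name used for illustration.
list_ro_mounts() {
    awk '{
        # Field 4 holds the comma-separated mount options, field 2 the mount point.
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "${1:-/proc/mounts}"
}
```

Running list_ro_mounts with no argument checks the live system. Passing a saved copy of /proc/mounts as the first argument checks that file instead.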


---Looking at tail -f /var/log/messages:

 May 5 02:18:24 dummyhostname01 adclient[21135]: INFO <fd:25 sudo(100222)> client.sudo Set credentials for user 'root': mapping misconfiguration. Passing user to next service module.  
 May 5 02:18:26 dummyhostname01 kernel: megaraid_sas: Iop2SysDoorbellIntfor scsi0  
 May 5 02:18:27 dummyhostname01 kernel: megasas: Found FW in FAULT state, will reset adapter scsi0.  
 May 5 02:18:27 dummyhostname01 kernel: megaraid_sas: resetting fusion adapter scsi0.  
 May 5 02:18:27 dummyhostname01 kernel: megaraid_sas: Reset not supported, killing adapter scsi0.  
 May 5 02:18:27 dummyhostname01 kernel: sd 0:2:0:0: [sda] Unhandled error code  
 May 5 02:18:27 dummyhostname01 kernel: sd 0:2:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK  
 May 5 02:18:27 dummyhostname01 kernel: sd 0:2:0:0: [sda] CDB: Write(10): 2a 00 16 b5 5d 40 00 00 08 00  
 May 5 02:18:27 dummyhostname01 kernel: blk_update_request: 4 callbacks suppressed  
 May 5 02:18:27 dummyhostname01 kernel: end_request: I/O error, dev sda, sector 380984640  
 May 5 02:18:27 dummyhostname01 kernel: Buffer I/O error on device dm-3, logical block 17606304  
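
The relevant lines are easy to miss among unrelated messages such as the adclient entry above. A rough sketch for pulling out only the controller and block-layer errors, assuming the same message patterns as in this log (the pattern list and the function name storage_errors are illustrative, not exhaustive):

```shell
# Print kernel log lines that mention the RAID controller driver
# or block-layer I/O errors. Reads the file(s) given as arguments,
# or stdin when none are given.
# storage_errors is a hypothetical helper name.
storage_errors() {
    grep -E 'kernel: .*(megaraid_sas|megasas|I/O error|DID_NO_CONNECT)' "$@"
}
```

For example, storage_errors /var/log/messages scans the system log, and dmesg output can be piped in after adjusting the pattern, since dmesg lines do not carry the "kernel:" prefix.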

---Firmware / hardware diagnostics on the database node.
---Since you cannot write to the local file systems, write the output to an NFS mount if one is present on the DB node.

 dmesg > /NFS_MOUNT/sundiag/dmesg.txt   
 ipmitool sunoem cli force 'show /SP/console/history' > /NFS_MOUNT/sundiag/console.out   
 /opt/MegaRAID/MegaCli/MegaCli64 -FwTermLog Dsply -a0 >/NFS_MOUNT/sundiag/fwterm.txt   
 [root@dummyhostname01 sundiag]# cat /NFS_MOUNT/sundiag/fwterm.txt  
 User specified controller is not present.  
 Failed to get CpController object.  
 Exit Code: 0x01  
 /opt/MegaRAID/MegaCli/MegaCli64 -AdpEventLog -GetEvents -f /NFS_MOUNT/sundiag/events.txt -a0   
 lspci -vvv > /NFS_MOUNT/sundiag/lspci.out  
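
The commands above can be wrapped in a small script that first confirms the target directory really accepts writes. This is only a sketch: is_writable_dir and collect_diag are hypothetical helper names, and the diagnostic commands are the ones used above, unchanged apart from the output path.

```shell
# Return 0 only if a file can actually be created in directory $1.
# A real write probe is used because a file system that has gone
# read-only can still pass a plain permission check.
is_writable_dir() {
    probe="$1/.write_probe.$$"
    # The subshell keeps a failed redirection from aborting a strict POSIX shell.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Collect the diagnostics into directory $1, e.g. /NFS_MOUNT/sundiag.
collect_diag() {
    out="$1"
    is_writable_dir "$out" || { echo "cannot write to $out" >&2; return 1; }
    dmesg > "$out/dmesg.txt"
    ipmitool sunoem cli force 'show /SP/console/history' > "$out/console.out"
    /opt/MegaRAID/MegaCli/MegaCli64 -FwTermLog Dsply -a0 > "$out/fwterm.txt"
    /opt/MegaRAID/MegaCli/MegaCli64 -AdpEventLog -GetEvents -f "$out/events.txt" -a0
    lspci -vvv > "$out/lspci.out"
}
```

Calling collect_diag /NFS_MOUNT/sundiag would then gather everything in one step. As the fwterm.txt output above shows, the MegaCli calls may still fail once the adapter has been killed.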

Solution.

--Finally, forcefully reboot / restart the node from ILOM.

 -> stop -f /SYS  
 Are you sure you want to immediately stop /SYS (y/n)? y  
 Stopping /SYS immediately  
