Date:  Mon, 27 Mar 2006 23:15:35 +0300
From:  Ji Do <jido (at mark) k.ro>
Subject:  [coba-e:04378] Server crash: kernel errors with hardware RAID CentOS/Bluequartz Installation CD v3.5
To:  coba-e (at mark) bluequartz.org
Message-Id:  <200603272015.k2RKFZnv029665 (at mark) k.ro>
X-Mail-Count: 04378


I use CentOS 4.2 (Final)/Bluequartz Installation CD v3.5 with kernel
2.6.9-34.EL, an Adaptec ATA 2400A hardware RAID controller (BIOS 1.92),
RAID 1, and two brand-new 300 GB hard drives.

I have tested this configuration on two different servers with different
Adaptec controllers and different hard drives. I also tested different
RAM modules, both on the motherboard and on the Adaptec controllers.

If I save files to the server (via wget, FTP, or SFTP), the RAID crashes
and the second hard drive shows "Status: Impacted".

After this I cannot log in at the console, via SSH, or through the BQ GUI.
I can still ping and port-scan the server, though, and in that respect
everything seems OK.
Only a cold reboot and a "Rebuild RAID" (Ctrl-A menu of the Adaptec BIOS)
will bring the server back to life.

Sometimes I get a read-only filesystem. Sometimes I get this error:

ext3-fs error (device i2o/hda3) in start_transaction journal has aborted

LogWatch sends me this:

WARNING:  Kernel Errors Present
/dev/i2o/hda error: Failure communi...:  4 Time(s)
Buffer I/O error on device i2o/hd...:  33 Time(s)
EXT3-fs error (device i2o/hda7...:  1 Time(s)
end_request: I/O error, dev i2o/hda, se...:  4 Time(s)
lost page write due to I/O error on i2o/hda7...:  33 Time(s)

and this too:

Mar 24 04:15:48 end_request: I/O error, dev i2o/hda, sector 390697107 
Mar 24 04:15:48 end_request: I/O error, dev i2o/hda, sector 390697371 
Mar 24 04:15:48 end_request: I/O error, dev i2o/hda, sector 390697851 
Mar 24 04:15:48 end_request: I/O error, dev i2o/hda, sector 390698579 

Here are the corresponding entries from /var/log/messages:

Mar 24 04:15:48 bq kernel: /dev/i2o/hda error: Failure communicating to device.
Mar 24 04:15:48 bq kernel: end_request: I/O error, dev i2o/hda, sector 390697107
Mar 24 04:15:48 bq kernel: Buffer I/O error on device i2o/hda7, logical block 45664293
Mar 24 04:15:48 bq kernel: lost page write due to I/O error on i2o/hda7

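In case it helps anyone looking at similar logs, here is a rough sketch of
how the failing sectors could be pulled out of /var/log/messages and
converted to byte offsets on the device. It only assumes the "end_request"
line format shown above, with each entry on a single line as in the real
log file. The function names are my own, and the 512-byte sector size is an
assumption, not something the logs state.

```python
import re

# Matches kernel lines such as:
#   "... kernel: end_request: I/O error, dev i2o/hda, sector 390697107"
SECTOR_RE = re.compile(r"end_request: I/O error, dev ([^,\s]+), sector (\d+)")

# Assumption: the block layer reports 512-byte sectors.
SECTOR_SIZE = 512


def failing_sectors(lines):
    """Return {device: sorted list of distinct failing sector numbers}."""
    found = {}
    for line in lines:
        match = SECTOR_RE.search(line)
        if match:
            device, sector = match.group(1), int(match.group(2))
            found.setdefault(device, set()).add(sector)
    return {device: sorted(sectors) for device, sectors in found.items()}


def sector_offset_bytes(sector):
    """Byte offset of a sector from the start of the device."""
    return sector * SECTOR_SIZE


# Small demo using two of the lines quoted in this message.
demo_lines = [
    "Mar 24 04:15:48 bq kernel: end_request: I/O error, dev i2o/hda, sector 390697107",
    "Mar 24 04:15:48 bq kernel: lost page write due to I/O error on i2o/hda7",
]
for device, sectors in failing_sectors(demo_lines).items():
    for sector in sectors:
        print(device, sector, sector_offset_bytes(sector))
```

To run it against a real log, the demo list can be replaced with the lines
of the file, for example open("/var/log/messages").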

I have used the same controller on another machine with Bluequartz/FC1 for
more than a year and have not had any problems like this. That machine has
a 120 GB RAID 1.

Does anybody have a solution? Maybe another RAID controller? Which one? Or
a kernel patch?

