Group: comp.os.linux.hardware


Subject: Brand new machine mystery lockup
From: Robert M. Riches Jr.
Date: 10/27/2007 2:17:00 PM
On 2007-10-27, Yan Seiner <yan@NsOeSiPnAeMr.com> wrote:
> I just built a server that seems to be possessed, or at least flaky.
>
> It's built on an Asus M2N-SLI DELUXE mobo, with an AMD Athlon 64 X2 4600+
> CPU, 2 gig RAM, and an Adaptec ASC-29320ALP SCSI adapter. The SCSI
> adapter has 2 Fujitsu 36 GB 15K drives in a software RAID-1. The power
> supply is a SILVERSTONE ST50EF-SC ATX12V / EPS12V 500W.
>
> Once in a while (like every 2-5 days) the machine locks up:
>
> Screen goes black, all fans go to full-on, and neither the power nor the
> reset button will work. It takes a flip of the power switch on the PS to
> restart it.
>
> Normally I would say that it's the PS, but sometimes - only sometimes,
> though - the system won't boot because mdadm can't find any of the md
> devices to boot. At this point the kernel's already booted off the SCSI
> drives, so I know they're spinning; just mdadm can't find them. This
> typically happens on a soft-reboot; again, I have to fully power cycle
> the machine to get it to boot.
>
> Of course there are no errors anywhere at any time in any log. The
> machine just stops.
>
> Google says people have had trouble with that SCSI adapter under windows
> but that seems to be a driver problem and it's reported to work fine with
> linux.
>
> So, I have 3 possible culprits:
>
> Power Supply
> Mobo
> SCSI adapter
>
> Any place I can look? Any diagnostics I can do? I have about 2 weeks
> left of Newegg's 30 day return timeframe, so I can do some testing....

Running memtest86 for several hours may be useful.

HTH

-- 
Robert Riches
spamtrap42@verizon.net
(Yes, that is one of my email addresses.)
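For readers unfamiliar with what a memory test does, the idea behind the memtest86 suggestion can be sketched as a write-then-verify pass over a buffer. This is only a toy illustration: memtest86 boots outside the operating system and exercises nearly all physical RAM with many more patterns, while this sketch touches one small userspace buffer. The function name, buffer size, and pattern list are assumptions made for the example.

```python
def pattern_check(size_bytes=1 << 20, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Fill a buffer with each byte pattern, read it back, and
    return a list of (pattern, offset) pairs that did not match."""
    buf = bytearray(size_bytes)
    mismatches = []
    for pattern in patterns:
        expected = bytes([pattern]) * size_bytes
        buf[:] = expected                      # write the pattern
        if buf != expected:                    # read back and compare
            mismatches.extend(
                (pattern, i) for i in range(size_bytes) if buf[i] != pattern
            )
    return mismatches
```

An empty result from a sketch like this says very little about the machine; a real diagnosis still needs memtest86 running for hours as suggested.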

Subject: Brand new machine mystery lockup
From: Cydrome Leader
Date: 10/29/2007 5:37:00 PM
In comp.periphs.scsi Yan Seiner <yan@nsoesipnaemr.com> wrote:
> I just built a server that seems to be possessed, or at least flaky.
>
> It's built on an Asus M2N-SLI DELUXE mobo, with an AMD Athlon 64 X2 4600+
> CPU, 2 gig RAM, and an Adaptec ASC-29320ALP SCSI adapter. The SCSI
> adapter has 2 Fujitsu 36 GB 15K drives in a software RAID-1. The power
> supply is a SILVERSTONE ST50EF-SC ATX12V / EPS12V 500W.
>
> Once in a while (like every 2-5 days) the machine locks up:
>
> Screen goes black, all fans go to full-on, and neither the power nor the
> reset button will work. It takes a flip of the power switch on the PS to
> restart it.
>
> Normally I would say that it's the PS, but sometimes - only sometimes,
> though - the system won't boot because mdadm can't find any of the md
> devices to boot. At this point the kernel's already booted off the SCSI
> drives, so I know they're spinning; just mdadm can't find them. This
> typically happens on a soft-reboot; again, I have to fully power cycle
> the machine to get it to boot.
>
> Of course there are no errors anywhere at any time in any log. The
> machine just stops.
>
> Google says people have had trouble with that SCSI adapter under windows
> but that seems to be a driver problem and it's reported to work fine with
> linux.
>
> So, I have 3 possible culprits:
>
> Power Supply
> Mobo
> SCSI adapter
>
> Any place I can look? Any diagnostics I can do? I have about 2 weeks
> left of Newegg's 30 day return timeframe, so I can do some testing....

Newegg is pretty good about returns. Just send it back, and try again.

Subject: Brand new machine mystery lockup
From: Cydrome Leader
Date: 10/29/2007 7:56:15 PM
In comp.periphs.scsi General Schvantzkopf <schvantzkopf@yahoo.com> wrote:
> On Sat, 27 Oct 2007 13:48:48 +0000, Yan Seiner wrote:
>
>> I just built a server that seems to be possessed, or at least flaky.
>>
>> It's built on an Asus M2N-SLI DELUXE mobo, with an AMD Athlon 64 X2
>> 4600+ CPU, 2 gig RAM, and an Adaptec ASC-29320ALP SCSI adapter. The
>> SCSI adapter has 2 Fujitsu 36 GB 15K drives in a software RAID-1. The
>> power supply is a SILVERSTONE ST50EF-SC ATX12V / EPS12V 500W.
>>
>> Once in a while (like every 2-5 days) the machine locks up:
>>
>> Screen goes black, all fans go to full-on, and neither the power nor the
>> reset button will work. It takes a flip of the power switch on the PS
>> to restart it.
>>
>> Normally I would say that it's the PS, but sometimes - only sometimes,
>> though - the system won't boot because mdadm can't find any of the md
>> devices to boot. At this point the kernel's already booted off the SCSI
>> drives, so I know they're spinning; just mdadm can't find them. This
>> typically happens on a soft-reboot; again, I have to fully power cycle
>> the machine to get it to boot.
>>
>> Of course there are no errors anywhere at any time in any log. The
>> machine just stops.
>>
>> Google says people have had trouble with that SCSI adapter under windows
>> but that seems to be a driver problem and it's reported to work fine
>> with linux.
>>
>> So, I have 3 possible culprits:
>>
>> Power Supply
>> Mobo
>> SCSI adapter
>>
>> Any place I can look? Any diagnostics I can do? I have about 2 weeks
>> left of Newegg's 30 day return timeframe, so I can do some testing....
>
> I wrote a system stress test that you can run,
>
> http://www.polybus.com/sys_basher_web/
>
> Sys_basher puts all of the subsystems except graphics under maximum load.
> It's multithreaded so it can keep all of your cores at maximum load. It
> also does a good job of stressing memory and disk subsystems. The log
> file records the temperatures after each test and it writes the log to
> disk between tests so that you'll have a record if the system crashes.

This program looks interesting. Can you port it to Solaris 10? Bug me off
list if you need access to test machines to run it on.
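The point about writing the log to disk between tests matters because a hard lockup loses anything still sitting in buffers. A minimal sketch of that general technique follows; it is not taken from sys_basher, and the names `append_durable` and `run_tests` and the log line format are hypothetical.

```python
import os

def append_durable(log_path, line):
    """Append one line and force it to stable storage before returning."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(line + "\n")
        log.flush()             # push Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to push its cache to the disk

def run_tests(tests, log_path):
    """Run each (name, callable) pair, logging before and after every test.

    If the machine dies mid-test, the last line on disk is a START line
    that names the test that was running.
    """
    for name, test in tests:
        append_durable(log_path, f"START {name}")
        result = test()
        append_durable(log_path, f"DONE {name} result={result}")
```

With a log written this way, a lockup that leaves no kernel messages at least narrows the failure to whichever subsystem was under load at the time.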

Subject: Brand new machine mystery lockup
From: scott@slp53.sl.home (Scott Lurndal)
Date: 11/8/2007 12:46:23 AM
Yan Seiner <yan@NsOeSiPnAeMr.com> writes:
>Screen goes black, all fans go to full-on, and neither the power nor the
>reset button will work. It takes a flip of the power switch on the PS to
>restart it.
>

Ok. This behavior indicates that the following is happening:

   - Interrupts are disabled
   - The processor is in a tight loop
   - The processor temperature goes up (due to the tight loop)
   - an SMI interrupt triggers the BIOS to speed up the fans (due
     to the high processor temperature).

As for reasons, in descending order of probability:

   - Undetected memory parity error
   - HBA hardware problem (driver is polling and never sees the polled
     bit) (this would also be considered a driver problem as no sane
     driver should poll forever).
   - A device driver or operating system bug.

scott
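The remark that no sane driver should poll forever can be illustrated with a bounded polling loop. This is a userspace sketch of the idea only, not kernel or Adaptec driver code; `poll_until`, `PollTimeout`, and the `read_ready` callback are hypothetical names standing in for a read of a hardware status bit.

```python
import time

class PollTimeout(Exception):
    """Raised when the polled condition never becomes true."""

def poll_until(read_ready, timeout_s=1.0, interval_s=0.01):
    """Call read_ready() until it returns True or timeout_s elapses.

    Returns True on success; raises PollTimeout instead of spinning forever.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_ready():
            return True
        if time.monotonic() >= deadline:
            raise PollTimeout(f"status bit not set after {timeout_s} s")
        time.sleep(interval_s)  # yield instead of spinning in a tight loop
```

A loop written this way turns a stuck status bit into a reportable error, whereas an unbounded loop with interrupts disabled would produce exactly the silent hang described above.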