Hello everyone,
 yesterday I installed Kanotix 2006-01-RC4 and immediately did an error-free upgrade to etch.
 
 My hard disks look like this:
 
 Code: 
# fdisk -l
 Disk /dev/hda: 13.5 GB, 13578485760 bytes
 255 heads, 63 sectors/track, 1650 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes
 
 Device     Boot      Start         End      Blocks   Id  System
 /dev/hda1   *           1        1530    12289693+   7  HPFS/NTFS
 /dev/hda2            1531        1649      955867+   f  W95 Ext'd (LBA)
 /dev/hda5            1531        1649      955836    b  W95 FAT32
 
 Disk /dev/sda: 80.0 GB, 80026361856 bytes
 255 heads, 63 sectors/track, 9729 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes
 
 Device     Boot      Start         End      Blocks   Id  System
 /dev/sda1   *           1         729     5855661   83  Linux
 /dev/sda2            9536        9729     1558305   82  Linux swap / Solaris
 /dev/sda3             730        9535    70734195   83  Linux
 
 The root filesystem is on /dev/sda1, and all home directories are on /dev/sda3.
 
 My problem:
 
 Occasionally the system hangs (the mouse still moves, but nothing else responds) or it freezes completely. X has also crashed on me twice.
 Shortly before a freeze I often hear one (1) soft clack from the computer, as if the read/write arm were moving to its park position.
 
 To check /dev/sda for hardware damage, I ran smartctl with various options.
 
 Note on attribute 199 (UDMA_CRC_Error_Count) in the output below: what should I make of the many UDMA errors??
 
 Code: 
 # smartctl -a -d ata -F samsung2 /dev/sda
 smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
 Home page is http://smartmontools.sourceforge.net/
 
 === START OF INFORMATION SECTION ===
 Device Model:     SAMSUNG HD080HJ
 Serial Number:    S08EJ1NL469000
 Firmware Version: ZH100-41
 User Capacity:    80.026.361.856 bytes
 Device is:        In smartctl database [for details use: -P show]
 ATA Version is:   7
 ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
 Local Time is:    Fri Jun 15 16:55:55 2007 CEST
 
 ==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.
 
 SMART support is: Available - device has SMART capability.
 SMART support is: Enabled
 
 === START OF READ SMART DATA SECTION ===
 SMART overall-health self-assessment test result: PASSED
 
 General SMART Values:
 Offline data collection status:  (0x02) Offline data collection activity
 was completed without error.
 Auto Offline Data Collection: Disabled.
 Self-test execution status:      (   0) The previous self-test routine completed
 without error or no self-test has ever
 been run.
 Total time to complete Offline
 data collection:                 (1848) seconds.
 Offline data collection
 capabilities:                    (0x5b) SMART execute Offline immediate.
 Auto Offline data collection on/off support.
 Suspend Offline collection upon new
 command.
 Offline surface scan supported.
 Self-test supported.
 No Conveyance Self-test supported.
 Selective Self-test supported.
 SMART capabilities:            (0x0003) Saves SMART data before entering
 power-saving mode.
 Supports SMART auto save timer.
 Error logging capability:        (0x01) Error logging supported.
 General Purpose Logging supported.
 Short self-test routine
 recommended polling time:        (   1) minutes.
 Extended self-test routine
 recommended polling time:        (  30) minutes.
 
 SMART Attributes Data Structure revision number: 16
 Vendor Specific SMART Attributes with Thresholds:
 ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 1 Raw_Read_Error_Rate     0x000f   253   093   051    Pre-fail  Always       -       0
 3 Spin_Up_Time            0x0007   100   100   025    Pre-fail  Always       -       4032
 4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       676
 5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
 7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
 8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0
 9 Power_On_Hours          0x0032   253   253   000    Old_age   Always       -       175
 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   253   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       625
 187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       131073
 190 Unknown_Attribute       0x0022   076   073   000    Old_age   Always       -       54
 194 Temperature_Celsius     0x0022   076   073   000    Old_age   Always       -       54
 195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       24890
 196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0
 197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0
 198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
 199 UDMA_CRC_Error_Count    0x003e   135   094   000    Old_age   Always       -       23746
200 Multi_Zone_Error_Rate   0x000a   253   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   100   000    Old_age   Always       -       0
 202 TA_Increase_Count       0x0032   253   253   000    Old_age   Always       -       0
 
 SMART Error Log Version: 1
 ATA Error Count: 17176 (device log contains only the most recent five errors)
 CR = Command Register [HEX]
 FR = Features Register [HEX]
 SC = Sector Count Register [HEX]
 SN = Sector Number Register [HEX]
 CL = Cylinder Low Register [HEX]
 CH = Cylinder High Register [HEX]
 DH = Device/Head Register [HEX]
 DC = Device Command Register [HEX]
 ER = Error register [HEX]
 ST = Status register [HEX]
 Powered_Up_Time is measured from power on, and printed as
 DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
 SS=sec, and sss=millisec. It "wraps" after 49.710 days.
 
 Error 17176 occurred at disk power-on lifetime: 155 hours (6 days + 11 hours)
 When the command that caused the error occurred, the device was active or idle.
 
 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 84 51 08 5f 00 00 e0  Error: ICRC, ABRT at LBA = 0x0000005f = 95
 
 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 ca 00 08 5f 00 00 e0 00      01:54:03.375  WRITE DMA
 ca 00 10 3f 00 00 e0 00      01:54:03.313  WRITE DMA
 The DMA errors start again here. I also ran 3 self-tests (2 short, 1 long; see the self-test log further down): no errors -> so the clack was not a head crash?!
 Code: 
 Error 17175 occurred at disk power-on lifetime: 155 hours (6 days + 11 hours)
 When the command that caused the error occurred, the device was active or idle.
 
 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 84 51 10 3f 00 00 e0  Error: ICRC, ABRT at LBA = 0x0000003f = 63
 
 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 ca 00 10 3f 00 00 e0 00      01:54:03.313  WRITE DMA
 c8 00 50 7f 12 00 e0 00      01:54:03.250  READ DMA
 c8 00 00 7f 11 00 e0 00      01:54:03.250  READ DMA
 c8 00 00 cf 12 00 e0 00      01:54:03.250  READ DMA
 c8 00 a8 d7 10 00 e0 00      01:54:03.250  READ DMA
 
 Error 17174 occurred at disk power-on lifetime: 154 hours (6 days + 10 hours)
 When the command that caused the error occurred, the device was active or idle.
 
 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 84 51 f8 3f 0f b0 e0  Error: ICRC, ABRT at LBA = 0x00b00f3f = 11538239
 
 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 ca 00 f8 3f 0f b0 e0 00      01:17:14.375  WRITE DMA
 
 Error 17173 occurred at disk power-on lifetime: 154 hours (6 days + 10 hours)
 When the command that caused the error occurred, the device was active or idle.
 
 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 84 51 f8 3f 0f b0 e0  Error: ICRC, ABRT at LBA = 0x00b00f3f = 11538239
 
 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 ca 00 f8 3f 0f b0 e0 00      01:17:14.313  WRITE DMA
 
 Error 17172 occurred at disk power-on lifetime: 154 hours (6 days + 10 hours)
 When the command that caused the error occurred, the device was active or idle.
 
 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 84 51 f8 3f 0f b0 e0  Error: ICRC, ABRT at LBA = 0x00b00f3f = 11538239
 
 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 ca 00 f8 3f 0f b0 e0 00      01:17:14.313  WRITE DMA
 
 SMART Self-test log structure revision number 1
 Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
 # 1  Extended offline    Completed without error       00%       156         -
 # 2  Short offline       Completed without error       00%       155         -
 # 3  Short offline       Completed without error       00%       155         -
SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
 Warning: ATA Specification requires selective self-test log data structure revision number = 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
 1        0        0  Not_testing
 2        0        0  Not_testing
 3        0        0  Not_testing
 4        0        0  Not_testing
 5        0        0  Not_testing
 Selective self-test flags (0x0):
 After scanning selected spans, do NOT read-scan remainder of disk.
 If Selective self-test is pending on power-up, resume after 0 minute delay.
 
 
 I am no hard disk expert, but I noticed the many DMA errors. Presumably there was no head crash, though (which is what the "clack" made me fear), otherwise there would surely be other or more error messages, right?
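
 For reading the hex values, here is a minimal sketch that decodes the ATA error register (the "ER" column in the smartctl log, "err 0x84" in the kernel messages). The function name decode_ata_error is made up for illustration, and only four bits are covered, with names as I understand them from the ATA/ATAPI-7 specification for DMA commands. ICRC would mean a CRC error on the interface between controller and disk rather than a defect on the platters.

```shell
#!/bin/sh
# Hypothetical helper: decode a subset of the ATA error register bits.
# Bit names are an assumption based on ATA/ATAPI-7 (UDMA commands).
decode_ata_error() {
    val=$(( $1 ))    # accepts hex input such as 0x84
    out=""
    if [ $(( val & 0x80 )) -ne 0 ]; then out="$out ICRC"; fi   # interface CRC error
    if [ $(( val & 0x40 )) -ne 0 ]; then out="$out UNC";  fi   # uncorrectable data
    if [ $(( val & 0x10 )) -ne 0 ]; then out="$out IDNF"; fi   # sector ID not found
    if [ $(( val & 0x04 )) -ne 0 ]; then out="$out ABRT"; fi   # command aborted
    echo "${out# }"  # strip the leading space
}

decode_ata_error 0x84
```

 For 0x84 this prints "ICRC ABRT", which agrees with smartctl's own annotation "Error: ICRC, ABRT" in the log above.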
 
 I have also already checked the following:
 
 Code: 
# hdparm -Tt /dev/sda
 /dev/sda:
 Timing cached reads:   862 MB in  2.00 seconds = 431.01 MB/sec
 Timing buffered disk reads:  176 MB in  3.02 seconds =  58.35 MB/sec
 # hdparm -i /dev/sda
 
 /dev/sda:
 HDIO_GET_IDENTITY failed: Inappropriate ioctl for device
 # dmesg | grep -i sata
 sata_nv 0000:00:0e.0: version 2.0
 ata1: SATA max UDMA/133 cmd 0xF80 ctl 0xF02 bmdma 0xE000 irq 209
 ata2: SATA max UDMA/133 cmd 0xE80 ctl 0xE02 bmdma 0xE008 irq 209
 scsi0 : sata_nv
 ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 scsi1 : sata_nv
 ata2: SATA link down (SStatus 0 SControl 300)
 RocketRAID 3xxx SATA Controller driver v1.0 (060426)
 # lsmod | grep -i sata
 # [no output]
 
 One more thing:
 
 1.
 When the system hangs and I reset it with the reset button, the PC and the hard disks do not lose power. The computer starts a new boot sequence from the BIOS, and the BIOS then always hangs while detecting the SATA disk.
 
 After a hard reset with the power switched off, the BIOS has no problems.
 
 2.
 Before Kanotix I also had very serious hard disk problems with Kubuntu (Feisty). When the system froze (which happened constantly), I could see ATA/SATA errors with hex numbers and "frozen" (or similar) on the tty console.
 Effect 1 was reproducible there as well.
 
 
 My theory:
 
 Some operation in Linux does not get along with the controller of the SATA disk and confuses it, so that the disk no longer delivers or accepts data.
 
 Or am I missing kernel modules? I have read that some people have sata and/or libata loaded. Does that also apply to the Kanotix kernel 2.6.18-kanotix-1? And if so, how do I do that?
 
 Or do I have to change a DMA setting somewhere (in the BIOS?? with hdparm??)?
 
 Where else can I look to narrow down the problem?
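
 One way to narrow it down might be to record the raw value of SMART attribute 199 (UDMA_CRC_Error_Count) before and after a change, for example a different cable or port, and see whether it keeps growing. A minimal sketch, with crc_error_count as a made-up helper name that reads the attribute table from standard input:

```shell
#!/bin/sh
# Hypothetical helper: print the raw value (last column) of SMART
# attribute 199 from a smartctl attribute table on standard input.
crc_error_count() {
    awk '$1 == 199 { print $NF }'
}

# Example on the live system (needs root; same options as used above):
#   smartctl -A -d ata -F samsung2 /dev/sda | crc_error_count
```

 Fed with the attribute table from the output above, this would print 23746.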
 
 Thanks for any help
 
 -----------------------------------
Addendum 1:
 
 Code: 
# modprobe libata
FATAL: Module libata not found.
 but I do find something:
 
 Code: 
# locate libata
/usr/include/linux/libata.h
 /usr/lib/ssl/engines/libatalla.so
 /usr/src/linux-headers-2.6.18-kanotix-1/include/linux/libata.h
 
 Does that mean anything?
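
 Since dmesg already shows sata_nv bringing up ata1 while lsmod lists nothing, the driver may simply be compiled into the kernel instead of being a loadable module. A minimal sketch for checking that, assuming the kernel config is installed as /boot/config-$(uname -r) (a common Debian location that I have not verified on Kanotix); sata_config_lines is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: list enabled SATA-related options from a kernel
# config file. "=y" means built into the kernel image, "=m" means
# built as a loadable module.
sata_config_lines() {
    grep -i 'sata' "$1" | grep -E '=(y|m)$'
}

# Assumed config location (not verified on Kanotix):
#   sata_config_lines "/boot/config-$(uname -r)"
```

 If the relevant lines end in "=y", there would be no module for modprobe to find, which would fit the "Module libata not found" message.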
 
 ----------------------------------
Addendum 2:
 Just now /dev/sda1 remounted itself read-only, unnoticed, while I was working.
 
 On the tty1 console I could read the following messages:
 
 Code: 
ata1.00: speed down requested but no transfer mode left
ata1.00: exception Emask [....]
 ata1.00: tag 0 cmd 0xc5 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
 
 I then rebooted the system with
 
 
 Code: 
# shutdown -r now
 
 During boot these 3 lines repeated hundreds of times.
 
 Further messages:
 
 Code: 
# dmesg
 [.... ]
 
 sda: Write Protect is off
 sda: Mode Sense: 00 3a 00 00
 SCSI device sda: drive cache: write back
 ata1.00: speed down requested but no transfer mode left
 ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x400000 action 0x2
 ata1.00: tag 0 cmd 0xc5 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
 ata1: soft resetting port
 ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 ata1.00: configured for PIO0
 ata1: EH complete
 SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
 sda: Write Protect is off
 sda: Mode Sense: 00 3a 00 00
 SCSI device sda: drive cache: write back
 ata1.00: speed down requested but no transfer mode left
 ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x400000 action 0x2
 ata1.00: tag 0 cmd 0x39 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
 ata1: soft resetting port
 ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 ata1.00: configured for PIO0
 ata1: EH complete
 ata1.00: speed down requested but no transfer mode left
 ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x400000 action 0x2
 ata1.00: tag 0 cmd 0x39 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)
 ata1: soft resetting port
 ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
 ata1.00: configured for PIO0
 ata1: EH complete
 SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
 sda: Write Protect is off
 sda: Mode Sense: 00 3a 00 00
 SCSI device sda: drive cache: write back
 SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
 sda: Write Protect is off
 sda: Mode Sense: 00 3a 00 00
 SCSI device sda: drive cache: write back
 
 [...repeats many times...]
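
 Because the same lines repeat so often, a summary may be easier to read than the raw log. A minimal sketch that counts the "ATA bus error" lines per command opcode; bus_errors_by_cmd is a made-up helper name, and it relies on grep -o, which GNU grep supports:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on standard input and print
# "<opcode> <count>" for every command that hit an ATA bus error.
bus_errors_by_cmd() {
    grep 'ATA bus error' \
        | grep -o 'cmd 0x[0-9a-f]*' \
        | sort | uniq -c \
        | awk '{ print $3, $1 }'
}

# Example on the live system:
#   dmesg | bus_errors_by_cmd
```

 For the three error lines visible in the excerpt above, it would report 0x39 twice and 0xc5 once.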
 
 
 Does that help???