[PATCH 2.6.12.4] sata_sis.c: Introducing device ID 0x182
Our new SiS-based AMD desktop systems come with a very new SiS chipset whose Serial ATA controller has the device ID 0x182. Without this patch the system cannot use the hard disk in native mode.

As a proof of concept we patched the kernel on a system with an older SiS chipset and then transferred the hard disk to the new system; the new chipset looks compatible enough to run without problems.

Regards
Rainer

Signed-off-by: Rainer Koenig <[EMAIL PROTECTED]>

--- linux-2.6.12.4/drivers/scsi/sata_sis.c	2005-08-05 09:04:37.0 +0200
+++ linux/drivers/scsi/sata_sis.c	2005-08-11 10:22:07.0 +0200
@@ -62,6 +62,7 @@
 static struct pci_device_id sis_pci_tbl[] = {
 	{ PCI_VENDOR_ID_SI, 0x180, PCI_ANY_ID, PCI_ANY_ID, 0, 0, sis_180 },
 	{ PCI_VENDOR_ID_SI, 0x181, PCI_ANY_ID, PCI_ANY_ID, 0, 0, sis_180 },
+	{ PCI_VENDOR_ID_SI, 0x182, PCI_ANY_ID, PCI_ANY_ID, 0, 0, sis_180 },
 	{ }	/* terminate list */
 };

--
Dipl.-Inf. (FH) Rainer Koenig
Project Manager Linux Business Clients
Fujitsu Siemens Computers, VP BC E SW OS
Phone: +49-821-804-3321   Fax: +49-821-804-2131
Re: SATA status report updated
Hi Simon,

Simon Oosthoek <[EMAIL PROTECTED]> writes:

> I'm wondering how the support for the SIS 182 controller is doing, I
> noticed they have a GPL driver on their website for kernel 2.6.10,
> which is not a drop in replacement for sata_sis.c in 2.6.12.5, I
> haven't tried compiling it as an add-on module outside the tree,
> though...

I tried the sources from the SiS website (which seem to add more detail than my simple patch that just adds the device ID) as a drop-in for the Fedora installation kernel 2.6.11-1.1369_FC4, but the kernel build failed at the sata_sis module. The problem is that the SiS source contains conditional code that depends on the symbol "KERN_2_6_10", which is defined by their out-of-tree makefile but not by the standard kernel build process. After adding a #define KERN_2_6_10 to the source, it compiled inside the kernel build process as well.

> Adding the 0x182 identifier to the 180 driver does compile (duh!), but
> I haven't tried it on hardware.

Working at a PC manufacturer I have access to the hardware; I have tried it out a lot and have not run into any problems so far.

> As a temporary measure, there was a patch posted to this list [1] a
> while ago, would it be a good idea to include this while full support
> is being worked on?

Seeing that the source from the SiS website goes into far more detail than my simple addition of the device ID (SiS hopefully has much deeper knowledge of their hardware than I have ;-), I would rather go for integrating the SiS source into the current kernel.

This problem is quite urgent, since it is a kind of showstopper for brand-new hardware. We have a query from a university that wants to buy 7000 PCs with that hardware over the next 4 years, but until yesterday they were unable to install Fedora Core 4 on the machine because the installer doesn't see any hard disks. I managed to build a simple quick-and-dirty driver disk to at least get Linux installed on the hard disk. But the problem applies to every other Linux distribution as well, so we urgently need support for this device in the mainstream kernel, in the hope that it will soon trickle down into the distributions' installation kernels.

In general, SATA is replacing parallel ATA on new PC platforms, and we have already received announcements that future platforms will come with SATA only. So I can't emphasize enough that SATA support is absolutely important for Linux on the desktop.

If there is something I can do to help or contribute, let me know.

Best regards
Rainer
--
Dipl.-Inf. (FH) Rainer Koenig
Project Manager Linux Business Clients
Fujitsu Siemens Computers, VP BC E SW OS
Phone: +49-821-804-3321   Fax: +49-821-804-2131
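P.S.: For reference, the workaround boils down to something like the sketch below. KERN_2_6_10 is the symbol from the SiS sources; the code inside the #ifdef is purely illustrative, I did not copy it from their driver.

/*
 * The SiS sources guard their 2.6.10 code paths with a symbol that only
 * their out-of-tree makefile defines (-DKERN_2_6_10).  Defining it at
 * the top of the file (or adding it to EXTRA_CFLAGS in the in-kernel
 * Makefile) makes the in-tree build take the same code path as their
 * out-of-tree build.
 */
#define KERN_2_6_10

#ifdef KERN_2_6_10
/* code written against the 2.6.10 libata interfaces */
#else
/* fallback for older kernels */
#endif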
Re: SATA status report updated
Hi Jeff,

Jeff Garzik <[EMAIL PROTECTED]> writes:

> Here is a list of problems with the patch. I'll paste this into the
> bug as well:

[Lots of interesting issues]

> 8) The DMA pad code is very buggy. It uses the dma_map_single() to
> map a buffer, but never synchronizes nor flushes the buffer. This can
> and will lead to data corruption, particularly on x86-64 platform.

That's very bad, since the target platform for that chipset is supposed to support AMD64. :-(

From your comments I've learned that my patch (just the device ID) does too little, while the SiS-provided patch does too many things it shouldn't do. How can we find a solution for that? Would it make sense for me to try to find the "goods" in the SiS patch and merge them somehow into the current kernel? But which kernel should I base that work on: the latest development kernel, the kernel of my distribution (whatever that will be; sooner or later it has to work with all distributions), or a kernel that is "close" to the SiS patch, e.g. 2.6.10?

As I mentioned before, getting hardware to try out patches isn't a big deal, since I'm located in a PC factory and can get test machines if needed. What would be good tests to detect, for example, the problems you mentioned above? Are there hardware-specific tests for SATA hard disks around? I would be very interested in that, since testing under Linux will also become daily work for me and my colleagues from the system test department.

Best regards
Rainer (posting from home)
--
Please send NO spam to my mail addresses.
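P.S.: To make sure I understood point 8, here is the general streaming-DMA pattern from the DMA-mapping documentation as a simplified sketch. This is not the actual libata/sata_sis pad code, just the rule it apparently violates.

#include <linux/dma-mapping.h>

/*
 * Simplified pattern: map the buffer for the transfer, let the hardware
 * run, then unmap (or sync) before the CPU looks at the data again.
 * Mapping with dma_map_single() and never syncing or unmapping is what
 * point 8 warns about: with bounce buffering or an IOMMU (common on
 * x86-64) the CPU can end up reading stale data.
 */
static void read_from_device_sketch(struct device *dev, void *buf, size_t len)
{
	dma_addr_t handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

	/* ...program the controller with "handle" and wait for the
	 * transfer to complete (hardware specific, omitted here)... */

	/* Without this unmap (or a dma_sync_single_for_cpu() call) the
	 * data the device wrote may never become visible to the CPU.   */
	dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);
}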
Re: SATA status report updated
Hi Simon,

Simon Oosthoek <[EMAIL PROTECTED]> writes:

> Unfortunately I'm not able to check the logic of the driver, because
> although I can read C, I'm totally unfamiliar with the disk controller
> logic in the kernel...

Well, today I spent some time looking at the SiS driver and comparing it with the driver in kernel 2.6.10. Keeping Jeff's comments about libata in mind (together with a printout of libata.h) helped a bit in understanding the differences. So I will see if I can extract the important parts from the SiS driver while using whatever libata already provides. It will take some time in any case, since kernel hacking is not the main focus of my job, but I will try. I guess the main task is to find the 0x182-specific details and merge them into the kernel driver.

Best regards
Rainer
--
Rainer König, Diplom-Informatiker (FH), Augsburg, Germany
[x86_64] SATA disks on ICH5 not detected in 2.6.{8,9}
Hi,

I hope my following questions are not too far offtopic on the LKML. I just filed this bug:

http://bugzilla.kernel.org/show_bug.cgi?id=4142

It's about problems with SATA support on the x86_64 architecture. And yes, it looks like the bug is solved (either on purpose or accidentally) in 2.6.10. But I'm still a bit concerned, because the impact of the problem is rather deep: this bug prevents people from installing Linux on their machines if they have an AMD64/EM64T architecture and only SATA disks. So if someone buys a current high-end system for a lot of bucks and a current distribution to install on it, he's stuck, because the install kernels can't recognize the SATA disks in the machine. Looking at the bug trackers of different distributions and also the kernel bug tracker, I have the impression that a lot of people are suffering from such a bug.

Here at my workplace I'm suffering as well, because we produce machines and tell our customers "yes, it runs with Linux" based on, e.g., a distribution that worked with an earlier kernel that didn't have this bug. Ok, shit happens, I know, but I wonder if there is something we can do about it. One thing that comes to my mind is: how do distributors select the kernel they use for their distribution? And: is there a chance to submit bug alerts to distributors with a recommendation like "Don't use kernel x.y.z because it has problems detecting well-known hardware"? I guess the impact of such a problem doesn't only hit PC hardware vendors; a distributor will get a lot of annoyed customers if the distribution can't be installed on some kinds of hardware.

Can I offer some help as well? Ok, I'm far from being a kernel hacker, but at least I have access to current and brand-new hardware here in my laboratory. So I wonder if I can find the resources (time, hardware and people) to do a sort of regression test on this hardware (especially with the AMD64/EM64T architecture in mind) every time a new kernel is released. Right now I'm buried under a workload that is increasing faster than I can handle it :-) but I hope my employer soon agrees to hire some students to support me. And seeing the trouble such a bug causes when it's detected at a customer (and not before, in any testing environment) makes me think there is much sense in spending part of my resources on testing new kernels.

Thanks for reading this, comments are welcome.

Rainer

P.S.: I'm not subscribed to the LKML, but I read it via the NNTP gateway.
--
Dipl.-Inf. (FH) Rainer Koenig
Project Manager Linux
Fujitsu Siemens Computers, VP BC E SW OS
Phone: +49-821-804-3321   Fax: +49-821-804-2131
Re: How do I debug PCI resource allocation problems
On Friday, 9 November 2007 00:51, Robert Hancock wrote:
> Rainer Koenig wrote:
> >
> > Ok, short description of the problem:
> >
> > I run 64-bit Linux using 2 GB of RAM, no problem at all. Then I turn off
> > the machine, add 2 more GB so that now I have 4 GB of RAM. Turning it on
> > I see the splash screen of the boot loader, starting the kernel turns the
> > screen black and that's it. The machine comes up, I can even ssh to it
> > over the net. That is how I obtained the following data.

Just for the record: problem solved. It was caused by wrong e820 information from the BIOS:

> > <4> BIOS-e820: df80 - e010 (reserved)

That region overlaps with e000-efff, which is the resource that then leads to this:

> > <3>PCI: Cannot allocate resource region 2 of device 0000:00:02.0

> Looks like the BIOS reserved part of that memory range already. Question
> is why vesafb is trying to reserve that range in the first place though.

Yep.

> > Question: Memory region 2 at 12000? That is beyond the 4GB boundary
> > and the BIOS guys I know told me that every PCI IOMEM region should
> > reside within the first 4 GBs! When running the machine with 2 GB only
> > lspci output looks like this for the VGA device:

> 64-bit capable PCI devices can indeed have BARs which can be located
> above 4GB. However, I can't see why lspci is detecting that from this
> configuration space: the BAR contents for region 2 are 2008, which
> means prefetchable memory at 0x2000 which can be located anywhere
> within 32-bit memory space. That doesn't make any sense though, since
> that's in the middle of RAM! Quite likely this bogus resource setting of
> the graphics controller is a large part of your problem. Question is
> who's doing this..

I compiled a debug kernel with lots of PCI verbosity. It then reported:

<7> got res [12000:12fff] bus [12000:12fff] flags 1208 for BAR 2 of 0000:00:02.0
<7>PCI: moved device 0000:00:02.0 resource 2 (1208) to 2000

I guess that explains it a bit. It looks like the "move" also affects the bytes that lspci -xx prints out. Anyway, once we had enough information about which range was giving us problems, the BIOS developers were able to find and fix the bug quite quickly. I installed the new BIOS and the problem was gone. :-)

Thanks anyway for the comments.

Regards
Rainer
--
Rainer Koenig, Diplom-Informatiker (FH), Augsburg, Germany
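P.S.: For anyone else decoding BAR values by hand: the rules Robert applied to the 2008 value are the standard PCI BAR encoding, nothing driver-specific. A small stand-alone program (purely illustrative, not taken from the kernel or lspci) that does the same decoding:

#include <stdio.h>
#include <stdint.h>

/*
 * Decode the low bits of a BAR value:
 * bit 0     : 1 = I/O space, 0 = memory space
 * bits 2:1  : memory type, 00 = 32-bit, 10 = 64-bit (the upper half of a
 *             64-bit BAR then lives in the following BAR register)
 * bit 3     : prefetchable
 * the base address is the value with the low four bits masked off.
 */
static void decode_bar(uint32_t bar)
{
	if (bar & 0x1) {
		printf("I/O ports at 0x%x\n", bar & ~0x3u);
		return;
	}
	printf("%s, %s memory at 0x%x\n",
	       ((bar >> 1) & 0x3) == 0x2 ? "64-bit" : "32-bit",
	       (bar & 0x8) ? "prefetchable" : "non-prefetchable",
	       bar & ~0xfu);
}

int main(void)
{
	decode_bar(0x2008);	/* the region 2 value quoted in this thread */
	return 0;
}

Run on the quoted value it prints "32-bit, prefetchable memory at 0x2000", which matches Robert's reading.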
How do I debug PCI resource allocation problems
This will get long, sorry. But I'm a bit desperate, because I'm encountering strange problems on a new mainboard with an Intel Q35 chipset and a shared-memory graphics card. The logs and data I use here are from a SLED10 SP1 (x86_64) installation, but the problem occurs with whatever distribution I try out.

Ok, short description of the problem:

I run 64-bit Linux using 2 GB of RAM, no problem at all. Then I turn off the machine and add 2 more GB so that I now have 4 GB of RAM. Turning it on I see the splash screen of the boot loader, starting the kernel turns the screen black, and that's it. The machine comes up, I can even ssh to it over the net. That is how I obtained the following data.

First a look at the boot.msg log. I will point out the places of interest (at least the lines I think are important for further analysis).

--8<-boot.msg--start--
Inspecting /boot/System.map-2.6.16.46-0.12-smp
Loaded 23173 symbols from /boot/System.map-2.6.16.46-0.12-smp.
Symbols match kernel version 2.6.16.
No module symbols loaded - kernel modules not enabled.
klogd 1.4.1, log source = ksyslog started.
<4>Bootdata ok (command line is root=/dev/disk/by-id/scsi-SATA_ST3160815AS_9RX01AP0-part5 vga=0x31a resume=/dev/sda1 splash=silent)
<5>Linux version 2.6.16.46-0.12-smp ([EMAIL PROTECTED]) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #1 SMP Thu May 17 14:00:09 UTC 2007
<6>BIOS-provided physical RAM map:
<4> BIOS-e820: - 0009d800 (usable)
<4> BIOS-e820: 0009d800 - 000a (reserved)
<4> BIOS-e820: 000ce000 - 000d (reserved)
<4> BIOS-e820: 000e - 0010 (reserved)
<4> BIOS-e820: 0010 - df5d (usable)
<4> BIOS-e820: df5d - df5dc000 (ACPI data)
<4> BIOS-e820: df5dc000 - df5df000 (ACPI NVS)
<4> BIOS-e820: df5df000 - df70 (reserved)
<4> BIOS-e820: df80 - e010 (reserved)
<4> BIOS-e820: f800 - fc00 (reserved)
<4> BIOS-e820: fec0 - fec1 (reserved)
<4> BIOS-e820: fee0 - fee01000 (reserved)
<4> BIOS-e820: ffb0 - 0001 (reserved)
* Comment: The following 2 lines are added when I upgrade from 2 GB to 4 GB.
<4> BIOS-e820: 0001 - 00011a00 (usable)
<4> BIOS-e820: 00011a00 - 00011c00 (reserved)
<6>DMI present.
<7>ACPI: RSDP (v000 PTLTD ) @ 0x000f7240
<7>ACPI: RSDT (v001 PTLTD RSDT 0x0006 LTP 0x) @ 0xdf5d6d0b
<7>ACPI: FADT (v001 FSC 0x0006 0x000f4240) @ 0xdf5dba5f
<7>ACPI: TCPA (v001 Phoenix 0x0006 TL 0x) @ 0xdf5dbad3
<7>ACPI: _MAR (v001 Intel OEMDMAR 0x0006 LOHR 0x0001) @ 0xdf5dbb05
<7>ACPI: SSDT (v001 FSC PST_CPU0 0x0006 CSF 0x0001) @ 0xdf5dbb35
<7>ACPI: SSDT (v001 FSC PST_CPU1 0x0006 CSF 0x0001) @ 0xdf5dbbeb
<7>ACPI: SSDT (v001 FSC PST_CPU2 0x0006 CSF 0x0001) @ 0xdf5dbca1
<7>ACPI: SSDT (v001 FSC PST_CPU3 0x0006 CSF 0x0001) @ 0xdf5dbd57
<7>ACPI: SPCR (v001 PTLTD $UCRTBL$ 0x0006 PTL 0x0001) @ 0xdf5dbe0d
<7>ACPI: MCFG (v001 PTLTD MCFG 0x0006 LTP 0x) @ 0xdf5dbe5d
<7>ACPI: HPET (v001 PTLTD HPETTBL 0x0006 LTP 0x0001) @ 0xdf5dbe99
<7>ACPI: MADT (v001 PTLTD APIC 0x0006 LTP 0x) @ 0xdf5dbed1
<7>ACPI: BOOT (v001 PTLTD $SBFTBL$ 0x0006 LTP 0x0001) @ 0xdf5dbf55
<7>ACPI: ASF! (v016 CETP CETP 0x0006 PTL 0x0001) @ 0xdf5dbf7d
<7>ACPI: DSDT (v001 FSC D2587/A1 0x0006 MSFT 0x0301) @ 0x
<6>No NUMA configuration found
<6>Faking a node at -00011a00
<6>Bootmem setup node 0 -00011a00
<7>On node 0 totalpages: 1004539
<7> DMA zone: 2979 pages, LIFO batch:0
<7> DMA32 zone: 896520 pages, LIFO batch:31
<7> Normal zone: 105040 pages, LIFO batch:31
<7> HighMem zone: 0 pages, LIFO batch:0
<6>ACPI: PM-Timer IO Port: 0x1008
<7>ACPI: Local APIC address 0xfee0
<6>ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
<6>Processor #0 6:15 APIC version 20
<6>ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
<6>Processor #1 6:15 APIC version 20
<6>ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
<6>Processor #2 6:15 APIC version 20
<6>ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
<6>Processor #3 6:15 APIC version 20
<6>ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
<6>ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
<6>ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
<6>ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
<6>ACPI: IOAPIC (id[0x04] address[0xfec0] gsi_base[0])
<6>IOAPIC[0]: apic_id 4, version 32, address
How do I analyze a soft lockup?
Hi there,

Environment:
Kernel is 2.6.16.27, arch x86_64, on a dual-core AMD64 machine with 4 GB of RAM. Also involved is an Areca 1100 SATA RAID controller with the drivers from the Tekram website.

Problem:
We get customer reports that a system stops with the following kernel messages (as they were submitted to us; we're not yet able to reproduce the problem in our laboratory :-( )

--8<-snip
BUG: soft lockup detected on CPU#1!

Call Trace:
 [] softlockup_tick+0xd3/0xe5
 [] update_process_times+0x42/0x68
 [] smp_local_timer_interrupt+0x31/0x54
 [] smp_apic_timer_interrupt+0x4f/0x66
 [] apic_timer_interrupt+0x66/0x70
 [] system_call+0x7e/0x83
 [] __handle_mm_fault+0x256/0x2d9
 [] __handle_mm_fault+0x253/0x2d9
 [] do_page_fault+0x23f/0x572
 [] system_call+0x7e/0x83
 [] system_call+0x7e/0x83
 [] system_call+0x7e/0x83
 [] system_call+0x7e/0x83
 [] error_exit+0x0/0x84
 [] system_call+0x7e/0x83
 [] __put_user_4+0x20/0x30
 [] schedule_tail+0x81/0x86
 [] ret_from_fork+0xc/0x25

BUG: soft lockup detected on CPU#1!

Call Trace:
 [] softlockup_tick+0xd3/0xe5
 [] update_process_times+0x42/0x68
 [] smp_local_timer_interrupt+0x31/0x54
 [] smp_apic_timer_interrupt+0x4f/0x66
 [] apic_timer_interrupt+0x66/0x70
 [] system_call+0x7e/0x83
 [] __handle_mm_fault+0x256/0x2d9
 [] __handle_mm_fault+0x253/0x2d9
 [] do_page_fault+0x23f/0x572
 [] system_call+0x7e/0x83
 [] system_call+0x7e/0x83
 [] do_page_fault+0x56f/0x572
 [] error_exit+0x0/0x84
 [] system_call+0x7e/0x83
 [] __put_user_4+0x20/0x30
 [] schedule_tail+0x81/0x86
 [] ret_from_fork+0xc/0x25
8<-snip--

The first thing that makes me very suspicious, or curious, is that the call stack shows "__handle_mm_fault" twice, but looking at the source of that function I don't see any recursion that would explain the double entry.

Is there any idea where I can start digging around?

TIA
Rainer
--
Rainer König, Diplom-Informatiker (FH), Augsburg, Germany
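P.S.: For anyone else reading the report, here is my rough understanding of what triggers the message, as a simplified sketch. This is not the actual kernel/softlockup.c from 2.6.16, just the idea behind it.

#include <linux/kernel.h>
#include <linux/jiffies.h>
#include <linux/threads.h>

/*
 * Simplified idea: a per-CPU watchdog thread periodically refreshes a
 * timestamp, and the timer interrupt complains if that timestamp is
 * older than about 10 seconds - i.e. the CPU ran kernel code that long
 * without ever scheduling the watchdog thread.
 */
static unsigned long touch_timestamp[NR_CPUS];	/* refreshed by the watchdog thread */

static void softlockup_tick_sketch(int cpu)
{
	if (time_after(jiffies, touch_timestamp[cpu] + 10 * HZ))
		printk(KERN_ERR "BUG: soft lockup detected on CPU#%d!\n", cpu);
}

So the message itself only says that CPU#1 did not let the watchdog thread run for roughly 10 seconds; the interesting part is finding out what it was doing instead.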
sched: How does the scheduler determine which CPU core gets the job?
Short summary:
==============

I am investigating an issue where parallel tasks are spread differently over the available CPU cores depending on whether the machine was cold-booted from power-off or warm-booted via init 6. After a cold boot the parallel processes are spread as expected, so that with N cores and N tasks every core gets one task. The same test after a warm boot shows that the tasks are spread differently, which results in lousy performance.

More details:
=============

I have a workstation here with 2 physical CPUs, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, which adds up to 48 cores (including hyperthreading). The test sample is an example from the LIGGGHTS tutorial files. The test is called like this:

mpirun -np 48 liggghts < in.chutewear

Performance and CPU load are monitored with htop. If I run the test after a cold boot everything is as I expect it to be: 48 parallel processes are started, distributed over 48 cores, and I see every CPU core working at around 100% load.

Same hardware, same test, the only difference being that I did a reboot in between: the behaviour is totally different. This time only a few CPU cores get the processes, and many cores are just idling around.

Question that comes to my mind:
===============================

What can cause such behaviour? Ok, the simple answer would be "talk to your machine vendor and ask them what they have done wrong during initialization when the system is rebooted". The bad news is that I'm working for that vendor, and we need an idea what to look for. After discussing this on the OpenMPI list I decided to ask here for help.

What we tried out so far:
=========================

- Compared dmesg output between cold and warm boot. Nothing special, just a few different numbers for computed performance and different timestamps.
- Compared the output of lstopo from hwloc, but nothing special here either.
- Wrote a script that makes a snapshot of all /proc/<pid>/status files for the liggghts jobs and compared the snapshots. It is now clear that we still launch 48 processes, but they are distributed differently (see also the small affinity check sketched in the P.S. below).
- Tried a newer kernel (the test runs on Ubuntu 14.04.4). Performance got a bit better, but the problem still exists.
- Took snapshots of /proc/sched_debug while the test is running after a cold and after a warm boot. The problem is that interpreting this output requires the details of how the scheduler works, but that's why I'm asking here.

So, if anyone has an idea what to look for, please post it here and add me to Cc:

TIA
Rainer
--
Dipl.-Inf. (FH) Rainer Koenig
Project Manager Linux Clients
FJ EMEIA PR PSO PM&D CCD ENG SW OSS&C

Fujitsu Technology Solutions
Bürgermeister-Ullrich-Str. 100
86199 Augsburg
Germany

Telephone: +49-821-804-3321
Telefax: +49-821-804-2131
Mail: mailto:rainer.koe...@ts.fujitsu.com
Internet ts.fujtsu.com
Company Details ts.fujitsu.com/imprint.html
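P.S.: The small affinity check mentioned above, in case it is useful for reproducing. It just wraps sched_getaffinity(2); the program itself and the assumption that the allowed CPU set (rather than the scheduler's placement) differs between cold and warm boot are mine, not established facts.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Print the CPU affinity mask of the PID given on the command line
 * (or of this process when no PID is given).  The output should match
 * the Cpus_allowed_list line in /proc/<pid>/status. */
int main(int argc, char **argv)
{
	cpu_set_t set;
	pid_t pid = (argc > 1) ? (pid_t)atoi(argv[1]) : 0;
	int cpu;

	if (sched_getaffinity(pid, sizeof(set), &set) != 0) {
		perror("sched_getaffinity");
		return 1;
	}

	printf("pid %d may run on CPUs:", (int)pid);
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			printf(" %d", cpu);
	printf("\n");
	return 0;
}

Running it against each liggghts PID after a cold boot and after a reboot should show whether the difference is in the allowed CPU set or purely in where the scheduler decides to place the tasks.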