[PATCH] sata_sis.c: Introducing device ID 0x182

2005-08-11 Thread Rainer Koenig
Our new SiS-based AMD desktop systems come with a very new SiS chipset
whose Serial ATA controller has the device ID 0x182. Without this patch
the system won't be able to use the hard disk in native mode. As a proof
of concept we patched the kernel on a system with an older SiS chipset
and then transferred the hard disk to the new system. It looks like the
new chipset is compatible enough to run without problems.


Patch signed-off-by: Rainer Koenig <[EMAIL PROTECTED]>

--- linux-  2005-08-05 09:04:37.0 
+++ linux/drivers/scsi/sata_sis.c   2005-08-11 10:22:07.0 +0200
@@ -62,6 +62,7 @@
 static struct pci_device_id sis_pci_tbl[] = {
 	{ PCI_VENDOR_ID_SI, 0x180, PCI_ANY_ID, PCI_ANY_ID, 0, 0, sis_180 },
 	{ PCI_VENDOR_ID_SI, 0x181, PCI_ANY_ID, PCI_ANY_ID, 0, 0, sis_180 },
+	{ PCI_VENDOR_ID_SI, 0x182, PCI_ANY_ID, PCI_ANY_ID, 0, 0, sis_180 },
 	{ }	/* terminate list */

Dipl.-Inf. (FH) Rainer Koenig
Project Manager Linux
Business Clients
Fujitsu Siemens Computers 
Phone: +49-821-804-3321
Fax:   +49-821-804-2131
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: SATA status report updated

2005-08-19 Thread Rainer Koenig
Hi Simon,

Simon Oosthoek <[EMAIL PROTECTED]> writes:

> I'm wondering how the support for the SIS 182 controller is doing, I
> noticed they have a GPL driver on their website for kernel 2.6.10,
> which is not a drop in replacement for sata_sis.c in, I
> haven't tried compiling it as an add-on module outside the tree,
> though...

I tried the sources from the SiS website (which seem to add more
details than my simple patch that just adds the device ID) as a drop-in
replacement in the Fedora installation kernel 2.6.11-1.1369_FC4, but the
kernel build failed at the sata_sis module. The problem is that the
source from SiS has conditional code that depends on the definition of a
symbol "KERN_2_6_10", which is defined by their "outside build makefile"
but not by the standard kernel build process. I added a
#define KERN_2_6_10 to the source, and then it also compiled inside the
kernel build process.

> Adding the 0x182 identifier to the 180 driver does compile (duh!), but
> I haven't tried it on hardware.

Working at a PC manufacturer, I have access to hardware. I tried out
a lot and haven't run into any problems so far.

> As a temporary measure, there was a patch posted to this list [1] a
> while ago, would it be a good idea to include this while full support
> is being worked on?

Seeing that the source from the SiS website goes into much more detail
than my simple addition of the device ID (of course SiS hopefully has a
much deeper knowledge of their hardware than I have ;-) I would rather
go for integrating the SiS source into the current kernel.

And this problem is quite urgent since it's a sort of "showstopper" for
brand-new hardware. We have an inquiry from a university that wants to buy
7000 PCs with that hardware over the next 4 years, but until yesterday they
were unable to install Fedora Core 4 on the machine since the installer
doesn't see any hard disks. I managed to build a simple quick&dirty
driver disk to get Linux at least installed on the hard disk. But the
problem also applies to every other Linux distribution, so we urgently
need to get support for that device into the mainline kernel, hoping
that it will then be picked up by the installation kernels of the
distributions.

Generally SATA is replacing parallel ATA in the new PC platforms, and we
have already received announcements that future platforms will come with
SATA only. So I can't emphasize enough that SATA support is absolutely
important for Linux on the desktop.

If there is something I can do to help or contribute let me know. 

Best regards
Dipl.-Inf. (FH) Rainer Koenig
Project Manager Linux
Business Clients
Fujitsu Siemens Computers 
Phone: +49-821-804-3321
Fax:   +49-821-804-2131

Re: SATA status report updated

2005-08-20 Thread Rainer Koenig
Hi Jeff,

Jeff Garzik <[EMAIL PROTECTED]> writes:

> Here is a list of problems with the patch.  I'll paste this into the
> bug as well:

[Lot of interesting issues]

> 8) The DMA pad code is very buggy.  It uses the dma_map_single() to
> map a buffer, but never synchronizes nor flushes the buffer.  This can
> and will lead to data corruption, particularly on x86-64 platform.

That's very bad since the target platform for that chipset is able
to support AMD64. :-(

From your comments I've learned that my patch (just the device ID) is
too tiny and the SiS-provided patch does too many things that it
shouldn't do. How can we find a solution for that?

Would it make sense for me to try to find the "goods" in the SiS patch
and merge them somehow into the current kernel? But which kernel shall I
take to do that work? The latest development kernel, the kernel of my
distribution (whatever this will be, sooner or later it has to work
with all distributions) or just a kernel that is "close" to the patch
from SiS, e.g. 2.6.10?

As I mentioned before, getting hardware to try out patches wouldn't be
that big a deal since I'm located in a PC factory and I can get test
machines if needed. What would be good tests to detect, e.g., the problems
that you mentioned above? Are there hardware-specific tests for SATA
hard disks around? I would be very interested in that, since testing
under Linux as well will become daily work for me and my colleagues from
the system test department.

Best regards
Rainer (posting from home)
Please send NO spam to my mail addresses.

Re: SATA status report updated

2005-08-22 Thread Rainer Koenig
Hi Simon,

Simon Oosthoek <[EMAIL PROTECTED]> writes:

> Unfortunately I'm not able to check the logic of the driver, because
> although I can read C, I'm totally unfamiliar with the disk controller
> logic in the kernel...

Well, today I've spent some time looking at the SiS driver and compared
it with the driver that is in kernel 2.6.10. And keeping Jeff's comments
about libata in mind (together with a printout of libata.h) helped a bit 
to understand the differences. So I will see if I can somehow get the
important things from the SiS driver while using whatever libata 
provides already. Will take some time anyway since kernel hacking is
not the main focus of my job. Anyway, I will try. I guess the main 
issue is to find the 0x182 specific details and merge them into the
kernel driver. 

Best regards
Rainer König, Diplom-Informatiker (FH), Augsburg, Germany

[x86_64] SATA disks on ICH5 not detected in 2.6.{8,9}

2005-02-01 Thread Rainer Koenig

I hope my following questions are not too off-topic for the LKML.

I just filed this bug: http://bugzilla.kernel.org/show_bug.cgi?id=4142

It's about problems with SATA support on the x86_64 architecture. And yes, it
looks like the bug is solved (either on purpose or accidentally) in 2.6.10.
But I'm still a bit concerned, because the impact of the problem is rather
deep: This bug prevents people from installing Linux on their machines if
they have an AMD64/EM64T architecture and if they have only SATA disks.
So if someone buys a current high-end system for a lot of bucks and gets
a current distribution to install on it, he's stuck because the install
kernels don't have the capability to recognize the SATA disks in the 

Looking at the bug-trackers of different distributions and also the 
kernel bug tracker I have the impression that a lot of people are 
suffering from such a bug. Here in my workplace I'm suffering as well,
because we produce machines and tell our customers "yes, it runs with
Linux" based on e.g. a distribution that was working with an earlier
kernel that didn't have this bug. Ok, shit happens, I know, but
I wonder, if there is something that we can do about that. 

One thing that comes to my mind is: How do distributors select
the kernel they use for their distribution? And: Is there a chance
to submit bug alerts to distributors with a recommendation like
"Don't use kernel x.y.z because it has problems in detecting well
known hardware"? I guess the impact of such a problem doesn't only
hit PC hardware vendors; a distributor will get a lot of annoyed
customers if the distribution is not installable on some sort of 

Can I offer some help as well? Ok, I'm far away from being a kernel
hacker, but at least I have access to current and brand-new hardware
here in my laboratory. So I wonder if I can find the resources (time,
hardware and people) to do a sort of "regression test" on this hardware
(especially thinking of the AMD64/EM64T architecture) every time a new
kernel is released. Currently I'm buried under a workload that is
increasing faster than I can handle it :-) but I have the hope that
my employer soon agrees to hire some students who will come to
support me. And seeing the trouble that such a bug causes when it's
detected at a customer (and not before in any testing environment)
makes me think that there is much sense in using part of my resources
for testing new kernels.

Thanks for reading this, comments are welcome. 
P.S.: I'm not subscribed to the LKML, but I read it via the NNTP 
Dipl.-Inf. (FH) Rainer Koenig
Project Manager Linux
Fujitsu Siemens Computers 
Phone: +49-821-804-3321
Fax:   +49-821-804-2131

Re: How do I debug PCI resource allocation problems

2007-11-16 Thread Rainer Koenig
On Friday, 9 November 2007 00:51, Robert Hancock wrote:
> Rainer Koenig wrote:
> >
> > Ok, short description of the problem:
> >
> > I run 64-bit Linux using 2 GB of RAM, no problem at all. Then I turn off
> > the machine, add 2 more GB so that now I have 4 GB of RAM. Turning it on
> > I see the splashscreen of the boot loader, starting the kernel turns the
> > screen black and that's it. The machine comes up, I can even ssh to it
> > over the net. That is how I obtained the following data.

Just for the record: problem solved. It was caused by wrong e820 information
from the BIOS:

> > <4> BIOS-e820: df80 - e010 (reserved)

That one overlaps with e000-efff which is the resource that 
then leads to this:
> > <3>PCI: Cannot allocate resource region 2 of device :00:02.0
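
For anyone who wants to check a boot log for this kind of conflict, here is a minimal Python sketch of the overlap test. It assumes the log format used in this thread, where every BIOS-e820 line gives a hexadecimal start address and an exclusive end address followed by the region type. The function names, the sample addresses and the window being tested are all made up for illustration; they are not the values from this machine.

```python
import re

# Matches lines such as "<4> BIOS-e820: 0000000000100000 - 0000000070000000 (usable)"
E820_LINE = re.compile(
    r"BIOS-e820:\s*([0-9a-fA-F]+)\s*-\s*([0-9a-fA-F]+)\s*\((.+?)\)"
)


def parse_e820(log_text):
    """Return a list of (start, end, kind) tuples found in a boot log."""
    regions = []
    for line in log_text.splitlines():
        match = E820_LINE.search(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            regions.append((start, end, match.group(3)))
    return regions


def overlaps(start_a, end_a, start_b, end_b):
    """True if the half-open ranges [start_a, end_a) and [start_b, end_b) intersect."""
    return start_a < end_b and start_b < end_a


def conflicting_regions(regions, win_start, win_end):
    """Non-usable e820 regions that intersect the window [win_start, win_end)."""
    return [
        region for region in regions
        if region[2] != "usable" and overlaps(region[0], region[1], win_start, win_end)
    ]


# Hypothetical sample data, not taken from the machine discussed here.
SAMPLE = """\
<4> BIOS-e820: 0000000000100000 - 0000000070000000 (usable)
<4> BIOS-e820: 000000007f000000 - 0000000080100000 (reserved)
<4> BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
"""

hits = conflicting_regions(parse_e820(SAMPLE), 0x80000000, 0x90000000)
for start, end, kind in hits:
    print(f"{start:#x}-{end:#x} ({kind}) overlaps the window")
```

Feeding the real boot.msg and the address window of the failing PCI resource into conflicting_regions() should list the reserved entries that get in the way, provided the log really uses exclusive end addresses as assumed here.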

> Looks like the BIOS reserved part of that memory range already. Question
> is why vesafb is trying to reserve that range in the first place though.


> > Question: Memory region 2 at 12000? That is beyond the 4GB boundary
> > and the BIOS guys I know told me that every PCI IOMEM region should
> > reside within the first 4 GBs! When running the machine with 2 GB only
> > lspci output looks like this for the VGA device:
> 64-bit capable PCI devices can indeed have BARs which can be located
> above 4GB. However, I can't see why lspci is detecting that from this
> configuration space: the BAR contents for region 2 are 2008, which
> means prefetchable memory at 0x2000 which can be located anywhere
> within 32-bit memory space. That doesn't make any sense though, since
> that's in the middle of RAM! Quite likely this bogus resource setting of
> the graphics controller is a large part of your problem. Question is
> who's doing this..
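
The BAR interpretation Robert describes can be sketched in a few lines of Python. This follows the standard PCI layout of a base address register (bit 0 selects I/O or memory space, bits 2:1 give the memory type, bit 3 is the prefetchable flag). The function name and the example values are hypothetical and are not the register contents from this machine.

```python
def decode_bar(value):
    """Decode the low 32 bits of a PCI base address register."""
    if value & 0x1:
        # I/O space BAR: bits 1:0 are flag bits, the rest is the port address.
        return {"space": "io", "address": value & ~0x3}
    # Memory BAR: bits 2:1 encode the decoder width (00 = 32-bit, 10 = 64-bit).
    width = {0b00: 32, 0b10: 64}.get((value >> 1) & 0x3)
    return {
        "space": "memory",
        "width_bits": width,              # None for reserved encodings
        "prefetchable": bool(value & 0x8),
        "address": value & ~0xF,          # low 32 bits only
    }


# Hypothetical register values for illustration.
print(decode_bar(0xD0000008))  # 32-bit prefetchable memory BAR
print(decode_bar(0xE000000C))  # 64-bit prefetchable memory BAR (low half)
```

For a 64-bit BAR the upper half of the address lives in the next BAR register, so a value above 4 GB only makes sense if bits 2:1 read 10. A 32-bit encoding combined with an address above 4 GB, as in the lspci output discussed here, points to something having rewritten the resource after the register was read.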

I compiled a debug kernel with lots of verbosity about PCI. That then 
<7>  got res [12000:12fff] bus [12000:12fff] flags 1208 for 
BAR 2 of :00:02.0
<7>PCI: moved device :00:02.0 resource 2 (1208) to 2000

I guess that explains it a bit. It looks like the "move" also affects the
bytes that lspci -xx prints out.

Anyway, once we had enough information about which range was giving us
problems, the BIOS developers were able to find and fix the bug quite
quickly. We installed the new BIOS and the problem was gone. :-)

Thanks anyway for the comments. 

Rainer Koenig, Diplom-Informatiker (FH), Augsburg, Germany

How do I debug PCI resource allocation problems

2007-11-08 Thread Rainer Koenig
This will get long, sorry. But I'm a bit desperate because I encounter strange 
problems on a new mainboard with Intel Q35 chipset and a shared memory 
graphics card. The logs and data I use here are from a SLED10 SP1 (x86_64) 
installation, but the problem occurs whatever distribution I try out. 

Ok, short description of the problem:

I run 64-bit Linux using 2 GB of RAM, no problem at all. Then I turn off the 
machine, add 2 more GB so that now I have 4 GB of RAM. Turning it on I see 
the splashscreen of the boot loader, starting the kernel turns the screen 
black and that's it. The machine comes up, I can even ssh to it over the net. 
That is how I obtained the following data.

First a look at the boot.msg log. I will point out places of interest (at
least the lines that I think are important for further analysis).

Inspecting /boot/System.map-
Loaded 23173 symbols from /boot/System.map-
Symbols match kernel version 2.6.16.
No module symbols loaded - kernel modules not enabled.

klogd 1.4.1, log source = ksyslog started.
<4>Bootdata ok (command line is 
root=/dev/disk/by-id/scsi-SATA_ST3160815AS_9RX01AP0-part5 vga=0x31a
resume=/dev/sda1 splash=silent)
<5>Linux version ([EMAIL PROTECTED]) (gcc version 4.1.2 
20070115 (prerelease) (SUSE Linux)) #1 SMP Thu May 17 14:00:09 UTC 2007
<6>BIOS-provided physical RAM map:
<4> BIOS-e820:  - 0009d800 (usable)
<4> BIOS-e820: 0009d800 - 000a (reserved)
<4> BIOS-e820: 000ce000 - 000d (reserved)
<4> BIOS-e820: 000e - 0010 (reserved)
<4> BIOS-e820: 0010 - df5d (usable)
<4> BIOS-e820: df5d - df5dc000 (ACPI data)
<4> BIOS-e820: df5dc000 - df5df000 (ACPI NVS)
<4> BIOS-e820: df5df000 - df70 (reserved)
<4> BIOS-e820: df80 - e010 (reserved)
<4> BIOS-e820: f800 - fc00 (reserved)
<4> BIOS-e820: fec0 - fec1 (reserved)
<4> BIOS-e820: fee0 - fee01000 (reserved)
<4> BIOS-e820: ffb0 - 0001 (reserved)

* Comment: The following 2 lines are added when I upgrade from 2 GB to 
4 GB.

<4> BIOS-e820: 0001 - 00011a00 (usable)
<4> BIOS-e820: 00011a00 - 00011c00 (reserved)
<6>DMI present.
<7>ACPI: RSDP (v000 PTLTD ) @ 
<7>ACPI: RSDT (v001 PTLTDRSDT   0x0006  LTP 0x) @ 
<7>ACPI: FADT (v001 FSC 0x0006  0x000f4240) @ 
<7>ACPI: TCPA (v001 Phoeni  x   0x0006 TL  0x) @ 
<7>ACPI: _MAR (v001 Intel  OEMDMAR  0x0006 LOHR 0x0001) @ 
<7>ACPI: SSDT (v001 FSCPST_CPU0 0x0006  CSF 0x0001) @ 
<7>ACPI: SSDT (v001 FSCPST_CPU1 0x0006  CSF 0x0001) @ 
<7>ACPI: SSDT (v001 FSCPST_CPU2 0x0006  CSF 0x0001) @ 
<7>ACPI: SSDT (v001 FSCPST_CPU3 0x0006  CSF 0x0001) @ 
<7>ACPI: SPCR (v001 PTLTD  $UCRTBL$ 0x0006 PTL  0x0001) @ 
<7>ACPI: MCFG (v001 PTLTDMCFG   0x0006  LTP 0x) @ 
<7>ACPI: HPET (v001 PTLTD  HPETTBL  0x0006  LTP 0x0001) @ 
<7>ACPI: MADT (v001 PTLTDAPIC   0x0006  LTP 0x) @ 
<7>ACPI: BOOT (v001 PTLTD  $SBFTBL$ 0x0006  LTP 0x0001) @ 
<7>ACPI: ASF! (v016   CETP CETP 0x0006 PTL  0x0001) @ 
<7>ACPI: DSDT (v001 FSCD2587/A1 0x0006 MSFT 0x0301) @ 
<6>No NUMA configuration found
<6>Faking a node at -00011a00
<6>Bootmem setup node 0 -00011a00
<7>On node 0 totalpages: 1004539
<7>  DMA zone: 2979 pages, LIFO batch:0
<7>  DMA32 zone: 896520 pages, LIFO batch:31
<7>  Normal zone: 105040 pages, LIFO batch:31
<7>  HighMem zone: 0 pages, LIFO batch:0
<6>ACPI: PM-Timer IO Port: 0x1008
<7>ACPI: Local APIC address 0xfee0
<6>ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
<6>Processor #0 6:15 APIC version 20
<6>ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
<6>Processor #1 6:15 APIC version 20
<6>ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
<6>Processor #2 6:15 APIC version 20
<6>ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
<6>Processor #3 6:15 APIC version 20
<6>ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
<6>ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
<6>ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
<6>ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
<6>ACPI: IOAPIC (id[0x04] address[0xfec0] gsi_base[0])
<6>IOAPIC[0]: apic_id 4, version 32, address 

How do I analyze a soft lockup?

2007-03-07 Thread Rainer Koenig
Hi there,


Kernel is, arch x86_64 on a Dual Core AMD64 machine with
4 GB of RAM. Also involved is an Areca 1100 SATA RAID controller
with the drivers from the Tekram website.


We get customer reports that a system stops with the following kernel
messages (as they are submitted to us, we're not able yet to reproduce
the problem in our laboratory :-( )


BUG: soft lockup detected on CPU#1!

Call Trace:
[] softlockup_tick+0xd3/0xe5
  [] update_process_times+0x42/0x68
  [] smp_local_timer_interrupt+0x31/0x54
  [] smp_apic_timer_interrupt+0x4f/0x66
  [] apic_timer_interrupt+0x66/0x70
[] system_call+0x7e/0x83
  [] __handle_mm_fault+0x256/0x2d9
  [] __handle_mm_fault+0x253/0x2d9
  [] do_page_fault+0x23f/0x572
  [] system_call+0x7e/0x83
  [] system_call+0x7e/0x83
  [] system_call+0x7e/0x83
  [] system_call+0x7e/0x83
  [] error_exit+0x0/0x84
  [] system_call+0x7e/0x83
  [] __put_user_4+0x20/0x30
  [] schedule_tail+0x81/0x86
  [] ret_from_fork+0xc/0x25

BUG: soft lockup detected on CPU#1!

Call Trace:
[] softlockup_tick+0xd3/0xe5
  [] update_process_times+0x42/0x68
  [] smp_local_timer_interrupt+0x31/0x54
  [] smp_apic_timer_interrupt+0x4f/0x66
  [] apic_timer_interrupt+0x66/0x70
[] system_call+0x7e/0x83
  [] __handle_mm_fault+0x256/0x2d9
  [] __handle_mm_fault+0x253/0x2d9
  [] do_page_fault+0x23f/0x572
  [] system_call+0x7e/0x83
  [] system_call+0x7e/0x83
  [] do_page_fault+0x56f/0x572
  [] error_exit+0x0/0x84
  [] system_call+0x7e/0x83
  [] __put_user_4+0x20/0x30
  [] schedule_tail+0x81/0x86
  [] ret_from_fork+0xc/0x25


The first thing that makes me very suspicious or curious is that the
call stack shows "__handle_mm_fault" twice, but looking at the source
of that function I don't see any recursion that would explain the 
double line. 

Is there any idea where I can start digging around? 

Rainer König, Diplom-Informatiker (FH), Augsburg, Germany

sched: How does the scheduler determine which CPU core gets the job?

2016-04-08 Thread Rainer Koenig
Short summary:
Investigating an issue where parallel tasks are spread differently over
the available CPU cores, depending on whether the machine was cold booted
from power off or warm booted by init 6. On cold boot the parallel processes
were spread as expected, so that with "N" cores and "N" tasks every core
gets one task. The same test after a warm boot shows that the tasks are
spread differently, which results in lousy performance.

More details:
Have a workstation here with 2 physical CPUs Intel(R) Xeon(R) CPU
E5-2680 v3 @ 2.50GHz, which sums up to 48 cores (including hyperthreading).

The test sample is an example from the LIGGGHTS tutorial files.

The test is invoked like this:

mpirun -np 48 liggghts < in.chutewear

The performance and CPU load is monitored with htop.

If I run the test after a cold boot everything is as I expected it to
be. 48 parallel processes are started, distributed over 48 cores and I
see that every CPU core is working at around 100% load.

Same hardware, same test, only difference is that meanwhile I did a
reboot. Behaviour is totally different. This time only a few CPU cores
get the processes and so many cores are just idling around.

Question that comes to my mind:
What can cause such a behaviour? Ok, the simple answer would be "talk to
your machine vendor and ask them what they have done wrong during
initialization when the system is rebooted". The bad news is that
I'm working for that vendor and we need an idea of what to look for. After
discussing this on the OpenMPI list I have now decided to ask here for help.

What we tried out so far:

- compared dmesg output between cold and warm boot. Nothing special,
  just a few different numbers for computed performance and different
- compared the output of lstopo from hwloc, but nothing special here
- wrote a script that makes a snapshot of all /proc//status files
  for the liggghts jobs and compared the snapshots. Now it's clear that
  we still launch 48 processes, but they are distributed differently.
- tried newer kernel (test is running on Ubuntu 14.04.4). Performance
  got a bit better, but problem still exists.
- Took snapshots of /proc/sched_debug when test is running after cold
  or warm boot. Problem is that for interpreting this output I would
  need the details how the scheduler works. But that's why I'm asking
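
A snapshot script of the kind mentioned in the list above could look roughly like the following Python sketch. It is not the script that was actually used; the function names are invented for illustration. It relies on two documented procfs details: the status file of a process consists of "Key: value" lines (including Name and Cpus_allowed_list), and field 39 of the stat file holds the CPU the task last ran on.

```python
import os


def parse_status(text):
    """Turn the 'Key: value' lines of a procfs status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def last_cpu(stat_text):
    """Return the CPU a task last ran on (field 39 of the procfs stat file)."""
    # The command name sits in parentheses and may contain spaces, so only
    # the part after the last ')' is split. That part starts with field 3.
    rest = stat_text.rsplit(")", 1)[1].split()
    return int(rest[39 - 3])


def snapshot(command_name):
    """List (pid, last CPU, allowed CPUs) for all processes with this name."""
    rows = []
    if not os.path.isdir("/proc"):
        return rows
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as handle:
                status = parse_status(handle.read())
            with open(f"/proc/{entry}/stat") as handle:
                cpu = last_cpu(handle.read())
        except (OSError, IndexError, ValueError):
            continue  # the process went away or the file was unreadable
        if status.get("Name") == command_name:
            rows.append((int(entry), cpu, status.get("Cpus_allowed_list", "?")))
    return sorted(rows)


if __name__ == "__main__":
    for pid, cpu, allowed in snapshot("liggghts"):
        print(f"pid {pid}: last ran on CPU {cpu}, allowed CPUs {allowed}")
```

Running this once after a cold boot and once after a warm boot while the test is active gives two lists that can be compared directly. If several processes report the same last CPU, or if the allowed CPU lists differ between the two boots, that narrows down whether the placement comes from an affinity mask or from the scheduler's own balancing.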

So, if anyone has an idea what to look for please post it here and add
me to Cc:

Dipl.-Inf. (FH) Rainer Koenig
Project Manager Linux Clients

Fujitsu Technology Solutions
Bürgermeister-Ullrich-Str. 100
86199 Augsburg

Telephone: +49-821-804-3321
Telefax:   +49-821-804-2131
Mail:  mailto:rainer.koe...@ts.fujitsu.com

Internet ts.fujtsu.com
Company Details  ts.fujitsu.com/imprint.html