[Qemu-devel] x86, nops settings result in kernel crash

2012-08-16 Thread Tomas Racek
Hi,

I am writing a file system test which I execute in qemu with kernel compiled 
from latest git sources and running it causes this error:

https://bugzilla.kernel.org/show_bug.cgi?id=45971

It works with v3.5, so I ran git bisect which pointed me to:

d6250a3f12edb3a86db9598ffeca3de8b4a219e9 x86, nops: Missing break resulting in 
incorrect selection on Intel

To be quite honest, I don't understand this stuff much but I tried to do some 
debugging and I figured out (I hope) that the crash is caused by setting 
ideal_nops to p6_nops (k8_nops was used before the break statement was added).

Here is cpuinfo from guest machine:

[root@test ~]# cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 2
model name  : QEMU Virtual CPU version 0.15.1
stepping: 3
microcode   : 0x1
cpu MHz : 2591.580
cache size  : 4096 KB
fpu : yes
fpu_exception   : yes
cpuid level : 4
wp  : yes
flags   : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pse36 clflush mmx fxsr sse sse2 syscall nx lm up rep_good nopl pni cx16 popcnt 
hypervisor lahf_lm
bogomips: 5183.16
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual

And from host machine:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 42
model name  : Intel(R) Core(TM) i5-2540M CPU @ 2.60GHz
stepping: 7
microcode   : 0x28
cpu MHz : 800.000
cache size  : 3072 KB
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm 
constant_tsc arch_perfmon pebs bts nopl xtopology nonstop_tsc aperfmperf pni 
pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid 
sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat 
epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips: 5183.17
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

If I use "qemu-kvm -cpu host", it works correctly. I hope you'll find something 
useful in it. 
Thanks for your time.

Regards,
Tomas



Re: [Qemu-devel] x86, nops settings result in kernel crash

2012-08-16 Thread Tomas Racek
- Original Message -
> On Thu, Aug 16, 2012 at 09:35:12AM -0400, Tomas Racek wrote:
> > Hi,
> > 
> > I am writing a file system test which I execute in qemu with kernel
> > compiled from latest git sources and running it causes this error:
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=45971
> > 
> > It works with v3.5, so I ran git bisect which pointed me to:
> > 
> > d6250a3f12edb3a86db9598ffeca3de8b4a219e9 x86, nops: Missing break
> > resulting in incorrect selection on Intel
> > 
> > To be quite honest, I don't understand this stuff much but I tried
> > to do some debugging and I figured out (I hope) that the crash is
> > caused by setting ideal_nops to p6_nops (k8_nops was used before
> > the break statement was added).
> 
> Maybe I overlooked it or maybe it was implied but did you try
> reverting
> the patch and rerunning your test? Does it work ok then?
> 

Yes, if I remove the break statement (introduced by this commit), it works fine.

Tomas

> Thanks.
> 
> --
> Regards/Gruss,
> Boris.
> 



Re: [Qemu-devel] x86, nops settings result in kernel crash

2012-08-17 Thread Tomas Racek


- Original Message -
> On 08/16/2012 11:53 AM, Alan Cox wrote:
> >>
> >> Yes, if I remove the break statement (introduced by this commit),
> >> it works fine.
> > 
> > What version of qemu is this - do we have qemu bug here I wonder.
> > 
> 
> Also, is it 32 or 64 bits?

It's 64-bit.

Regards,

Tomas

> 
>   -hpa
> 



Re: [Qemu-devel] x86, nops settings result in kernel crash

2012-08-20 Thread Tomas Racek
- Original Message -
> On Fri, Aug 17, 2012 at 03:43:56AM -0400, Tomas Racek wrote:
> > Well, I've added some debug statements to the code:
> > 
> > void __init arch_init_ideal_nops(void)
> > {
> > switch (boot_cpu_data.x86_vendor) {
> > case X86_VENDOR_INTEL:
> > /*
> >  * Due to a decoder implementation quirk, some
> >  * specific Intel CPUs actually perform better with
> >  * the "k8_nops" than with the SDM-recommended
> >  NOPs.
> >  */
> > if (boot_cpu_data.x86 == 6 &&
> > boot_cpu_data.x86_model >= 0x0f &&
> > boot_cpu_data.x86_model != 0x1c &&
> > boot_cpu_data.x86_model != 0x26 &&
> > boot_cpu_data.x86_model != 0x27 &&
> > boot_cpu_data.x86_model < 0x30) {
> > printk("NOPS: Option 1\n");
> > ideal_nops = k8_nops;
> > } else if (boot_cpu_has(X86_FEATURE_NOPL)) {
> > printk("NOPS: Option 2\n");
> >ideal_nops = p6_nops;
> > } else {
> > printk("NOPS: Option 3\n");
> > #ifdef CONFIG_X86_64
> > ideal_nops = k8_nops;
> > #else
> > ideal_nops = intel_nops;
> > #endif
> > }
> > break;
> > default:
> > #ifdef CONFIG_X86_64
> > ideal_nops = k8_nops;
> > #else
> > if (boot_cpu_has(X86_FEATURE_K8))
> > ideal_nops = k8_nops;
> > else if (boot_cpu_has(X86_FEATURE_K7))
> > ideal_nops = k7_nops;
> > else
> > ideal_nops = intel_nops;
> > #endif
> > }
> > }
> > 
> > This gives me Option 1 with "-cpu host" and Option 2 without.
> 
> This looks like an emulation bug. The interesting thing is that your
> both traces from the bugzilla point to generic_make_request_checks
> but
> it could also be due to timing.
> 
> Decoding the instruction stream in the second trace in the bugzilla
> gives:
> 
> [ 278.595106] Code: 03 48 89 03 48 8b 57 70 48 89 53 10 48 2b 01 8b
> 3f 48 89 45 98 48 8b 82 90 00 00 00 89 7d 94 48 8b 80 60 02 00 00 48
> 89 45 88 ac <17> 00 00 c8 45 85 e4 74 30 48 8b 43 10 48 8b 40 08 48
> 8b 40 48
> All code
> 
>0:   03 48 89add-0x77(%rax),%ecx
>3:   03 48 8badd-0x75(%rax),%ecx
>6:   57  push   %rdi
>7:   70 48   jo 0x51
>9:   89 53 10mov%edx,0x10(%rbx)
>c:   48 2b 01sub(%rcx),%rax
>f:   8b 3f   mov(%rdi),%edi
>   11:   48 89 45 98 mov%rax,-0x68(%rbp)
>   15:   48 8b 82 90 00 00 00mov0x90(%rdx),%rax
>   1c:   89 7d 94mov%edi,-0x6c(%rbp)
>   1f:   48 8b 80 60 02 00 00mov0x260(%rax),%rax
>   26:   48 89 45 88 mov%rax,-0x78(%rbp)
>   2a:   ac  lods   %ds:(%rsi),%al
>   2b:*  17  (bad)   <-- trapping instruction
>   2c:   00 00   add%al,(%rax)
>   2e:   c8 45 85 e4 enterq $0x8545,$0xe4
>   32:   74 30   je 0x64
>   34:   48 8b 43 10 mov0x10(%rbx),%rax
>   38:   48 8b 40 08 mov0x8(%rax),%rax
>   3c:   48 8b 40 48 mov0x48(%rax),%rax
> ...
> 
> Code starting with the faulting instruction
> ===
>0:   17  (bad)
>1:   00 00   add%al,(%rax)
>3:   c8 45 85 e4 enterq $0x8545,$0xe4
>7:   74 30   je 0x39
>9:   48 8b 43 10 mov0x10(%rbx),%rax
>d:   48 8b 40 08 mov0x8(%rax),%rax
>   11:   48 8b 40 48 mov0x48(%rax),%rax
> 
> 
> and an instruction with opcode 0x17 in 64-bit mode is, AFAICT,
> invalid (on 32-bit it is "pop %ss" according to this thing:
> http://www.onlinedisassembler.com).

I can provide you with more different traces if it can help. But I thought that 
maybe it will be more useful for you to try it on your own. So I've prepared 
some minimal debian installation which you could download here (apx 163M 
bzipped):

http://fi.muni.cz/~xracek/debian.img.bz2

Password:
root/asdfgh

Here is my config for guest kernel:

http://fi.muni.cz/~xracek/config

I use

qemu-kvm -m 1500 -hda debian.img -kernel linux/arch/x86/boot/bzImage -append 
"root=/dev/sda1"

After logging in just run "sh runtest.sh". This leads to crash in my case 
(host: Intel Core i5-2540M, kernel 3.5.2-1.fc17.x86_64, qemu 1.0.1).

Regards,

Tom

> 
> --
> Regards/Gruss,
> Boris.
> 



Re: [Qemu-devel] x86, nops settings result in kernel crash

2012-08-21 Thread Tomas Racek
> On 20.08.2012 21:13, Tomas Racek wrote:
> []
> Can we trim the old, large and now not-so-relevant discussion please?
> ;)
> 
> > I can provide you with more different traces if it can help. But I
> > thought that maybe it will be more useful for you to try it on
> > your own. So I've prepared some minimal debian installation which
> > you could download here (apx 163M bzipped):
> > 
> > http://fi.muni.cz/~xracek/debian.img.bz2
> > 
> > Password:
> > root/asdfgh
> > 
> > Here is my config for guest kernel:
> > 
> > http://fi.muni.cz/~xracek/config
> > 
> > I use
> > 
> > qemu-kvm -m 1500 -hda debian.img -kernel
> > linux/arch/x86/boot/bzImage -append "root=/dev/sda1"
> 
> Um.  I'd expect the image to be self-contained, no external kernel.
> I wanted to do a quick test to see if it fails on my machine too,
> d/loaded debian.img.bz2 but there's no kernel.  So.. no quick test
> for you ;)

Well, the point was to use the latest sources instead of some image which can 
be obsolete tomorrow. However I created a new image with today's kernel which 
you can use:

http://fi.muni.cz/~xracek/debian2.img.bz2

Other things are the same.

The runtest.sh sets environment for xfstests and runs test 285 which I wrote 
and and which should test if FS sends discard requests only on free sectors:
285:
1. Create loop device and FS on it.
2. Populate it with some garbage.
3. Get free sectors from FS.
4. Run fstrim and look for discard requests via blk tracer.
5. Compare free sectors to discard requests.

The test itself can have some issues but I'm pretty sure it shouldn't crash the 
system. ;-)

Regards,
Tom

> 
> > After logging in just run "sh runtest.sh". This leads to crash in
> > my case (host: Intel Core i5-2540M, kernel 3.5.2-1.fc17.x86_64,
> > qemu 1.0.1).
> 
> With all the above, this "runtest.sh" is informationally equal to
> your disk image.
> 
> /mjt
> 



Re: [Qemu-devel] [PATCH] x86, alternative: fix p6 nops on non-modular kernels

2012-08-22 Thread Tomas Racek
- Original Message -
> On 08/22/2012 12:54 PM, Avi Kivity wrote:
> > On 08/21/2012 12:28 PM, Tomas Racek wrote:
> >> 
> >> http://fi.muni.cz/~xracek/debian2.img.bz2
> >> 
> >> Other things are the same.
> >> 
> >> The runtest.sh sets environment for xfstests and runs test 285
> >> which I wrote and and which should test if FS sends discard
> >> requests only on free sectors:
> >> 285:
> >> 1. Create loop device and FS on it.
> >> 2. Populate it with some garbage.
> >> 3. Get free sectors from FS.
> >> 4. Run fstrim and look for discard requests via blk tracer.
> >> 5. Compare free sectors to discard requests.
> >> 
> >> The test itself can have some issues but I'm pretty sure it
> >> shouldn't crash the system. ;-)
> > 
> > Does the following patch help?
> > 
> 
> It's obvious that it should.  You're running a non-modular kernel,
> and those nops are discarded (probably a leftover from the days
> patching was a boot-only activity), so the kernel patched garbage
> over its own code.
> 

Works fine. Thank you!

Tom


> ---8<cut-here-8<---
> 
> From: Avi Kivity 
> Date: Wed, 22 Aug 2012 12:58:18 +0300
> Subject: [PATCH] x86, alternative: fix p6 nops on non-modular kernels
> 
> Probably a leftover from the early days of self-patching, p6nops are
> marked __initconst_or_module, which causes them to be discarded in a
> non-modular kernel.  If something later triggers patching, it will
> overwrite kernel code with garbage.
> 
> Reported-by: Tomas Racek 
> Signed-off-by: Avi Kivity 
> 
> diff --git a/arch/x86/kernel/alternative.c
> b/arch/x86/kernel/alternative.c
> index afb7ff7..ced4534 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -165,7 +165,7 @@ static int __init setup_noreplace_paravirt(char
> *str)
>  #endif
>  
>  #ifdef P6_NOP1
> -static const unsigned char  __initconst_or_module p6nops[] =
> +static const unsigned char p6nops[] =
>  {
>   P6_NOP1,
>   P6_NOP2,
> 
> 
> --
> error compiling committee.c: too many arguments to function
>