[Qemu-devel] Running KVM guest on X86

2012-08-06 Thread Bhushan Bharat-R65777
Hi Avi/All,

I am facing issue to boot KVM guest on x86 (I used to work on PowerPC platform 
and do not have enough knowledge of x86). I am working on making VFIO working 
on PowerPC Booke, So I have cloned Alex Williamsons git repository, compiled 
kernel for x86 on fedora with virtualization configuration (selected all kernel 
config options for same). Run below command to boot Guest (I have not provided 
vfio device yet): 

"qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel 
arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img -serial 
tcp::,server,telnet"

After the I can see qemu command line (able to run various commands like "info 
registers" etc), while guest does not boot (not even the first print comes).

Can anyone help in what I am missing or doing wrong?

Thanks
-Bharat




Re: [Qemu-devel] Running KVM guest on X86

2012-08-06 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Monday, August 06, 2012 9:27 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; Avi Kivity
> Subject: Re: Running KVM guest on X86
> 
> On Mon, 2012-08-06 at 15:40 +0000, Bhushan Bharat-R65777 wrote:
> > Hi Avi/All,
> >
> > I am facing issue to boot KVM guest on x86 (I used to work on PowerPC 
> > platform
> and do not have enough knowledge of x86). I am working on making VFIO working 
> on
> PowerPC Booke, So I have cloned Alex Williamsons git repository, compiled 
> kernel
> for x86 on fedora with virtualization configuration (selected all kernel 
> config
> options for same). Run below command to boot Guest (I have not provided vfio
> device yet):
> >
> > "qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel
> arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img -serial
> tcp::,server,telnet"
> >
> > After the I can see qemu command line (able to run various commands like 
> > "info
> registers" etc), while guest does not boot (not even the first print comes).
> >
> > Can anyone help in what I am missing or doing wrong?
> 
> x86 doesn't use the serial port for console by default, so you're making 
> things
> quite a bit more difficult that way.  Typically you'll want to provide a disk
> image (the -hda option is the easiest way to do this), a display (-vga std 
> -vnc
> :0 is again easiest), and probably something to install from (-cdrom
> ).  You can also add a -boot d to get it to choose the cdrom the
> first time for install.  Thanks,

Thanks Avi and Alex, I can see the KVM guest boot prints by adding -append 
"console=ttyS0"

Now my exact command is like: 
"qemu-system-x86_64 -enable-kvm -nographic -nodefconfig -kernel 
/boot/vmlinuz-3.5.0+  -initrd /boot/initramfs-3.5.0+.img -append 
"console=ttyS0" -hda fedora.qcow  -m 1024"

  Where fedora.qcow is created by "qemu-img create fedora.qcow 5G"

With this it is falling to Dracut, below are the error prints:


[2.288931] dracut: FATAL: No or empty root= argument
[2.291808] dracut: Refusing to continue


[2.294721] dracut Warning: Signal caught!
dracut Warning: Signal caught!
[2.298894] dracut Warning: dracut: FATAL: No or empty root= argument
dracut Warning: dracut: FATAL: No or empty root= argument
[2.304713] dracut Warning: dracut: Refusing to continue
dracut Warning: dracut: Refusing to continue

[2.320311] init (1) used greatest stack depth: 2664 bytes left
[2.323851] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x0100
[2.323851] 
[2.324551] Pid: 1, comm: init Not tainted 3.5.0+ #7
[2.324551] Call Trace:
[2.324551]  [] panic+0xc6/0x1e1
[2.324551]  [] ? _raw_write_unlock_irq+0x30/0x50
[2.324551]  [] do_exit+0xa20/0xb70
[2.324551]  [] ? retint_swapgs+0x13/0x1b
[2.324551]  [] do_group_exit+0x4f/0xc0
[2.324551]  [] sys_exit_group+0x17/0x20
[2.324551]  [] system_call_fastpath+0x16/0x1b
-

Should I mount the root-file-system in disk and pass -append "root=.

I had no luck when I tried "qemu-system-x86_64 -device virtio-scsi-pci,id=scsi  
-enable-kvm  -nographic -nodefconfig -nodefaults  -chardev stdio,id=charserial0 
 -device isa-serial,chardev=charserial0,id=serial0  -drive 
file=fedora.qcow,if=none,id=sda  -device scsi-disk,bus=scsi.0,drive=sda  
-device sga  -kernel /boot/vmlinuz-3.5.0+  -initrd /boot/initramfs-3.5.0+.img  
-append "console=ttyS0" -m 1024"
 

Thanks in Advance
-Bharat



Re: [Qemu-devel] Running KVM guest on X86

2012-08-09 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Stuart Yoder [mailto:b08...@gmail.com]
> Sent: Thursday, August 09, 2012 8:28 PM
> To: Bhushan Bharat-R65777
> Cc: Alex Williamson; qemu-devel@nongnu.org; Avi Kivity
> Subject: Re: [Qemu-devel] Running KVM guest on X86
> 
> On Tue, Aug 7, 2012 at 1:30 AM, Bhushan Bharat-R65777 
> wrote:
> >
> >
> >> -Original Message-
> >> From: Alex Williamson [mailto:alex.william...@redhat.com]
> >> Sent: Monday, August 06, 2012 9:27 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: qemu-devel@nongnu.org; Avi Kivity
> >> Subject: Re: Running KVM guest on X86
> >>
> >> On Mon, 2012-08-06 at 15:40 +, Bhushan Bharat-R65777 wrote:
> >> > Hi Avi/All,
> >> >
> >> > I am facing issue to boot KVM guest on x86 (I used to work on
> >> > PowerPC platform
> >> and do not have enough knowledge of x86). I am working on making VFIO
> >> working on PowerPC Booke, So I have cloned Alex Williamsons git
> >> repository, compiled kernel for x86 on fedora with virtualization
> >> configuration (selected all kernel config options for same). Run
> >> below command to boot Guest (I have not provided vfio device yet):
> >> >
> >> > "qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel
> >> arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img
> >> -serial tcp::,server,telnet"
> >> >
> >> > After the I can see qemu command line (able to run various commands
> >> > like "info
> >> registers" etc), while guest does not boot (not even the first print 
> >> comes).
> >> >
> >> > Can anyone help in what I am missing or doing wrong?
> >>
> >> x86 doesn't use the serial port for console by default, so you're
> >> making things quite a bit more difficult that way.  Typically you'll
> >> want to provide a disk image (the -hda option is the easiest way to
> >> do this), a display (-vga std -vnc :0 is again easiest), and probably
> >> something to install from (-cdrom ).  You can also add a
> >> -boot d to get it to choose the cdrom the first time for install.
> >> Thanks,
> >
> > Thanks Avi and Alex, I can see the KVM guest boot prints by adding -append
> "console=ttyS0"
> 
> Note, once you get to user space you will need a getty specified in
> inittab in order to get a login on your serial port.   Something like:
> 
>T0:23:respawn:/sbin/getty -L ttyS0

1)
I tried booting with prebuilt qcow2 then it works for me:
qemu-system-x86_64  -enable-kvm  -nographic  -device sga  -m 1024 -hda 
debian_squeeze_amd64_standard.qcow2

Does anyone help on how I can add my kernel to qcow2? Or create a proper qcow2?

2)
Also I tried as mentioned in section "3.9 Direct Linux Boot": 
http://qemu.weilnetz.de/qemu-doc.html#disk_005fimages : 

qemu-kvm  -enable-kvm  -nographic -kernel /boot/vmlinuz-3.5.0+ -hda 
/boot/initramfs-3.5.0+.img  -append "console=ttyS0 root=/dev/sda" -m 1024 

I get below error :
[1.299225] No filesystem could mount root, tried:  ext3 ext2 ext4 iso9660
[1.303232] Kernel panic - not syncing: VFS: Unable to mount root fs on 
unknown-block(8,0)
[1.307683] Pid: 1, comm: swapper/0 Not tainted 3.3.5-2.fc16.x86_64 #1
[1.311201] Call Trace:
[1.312548]  [] panic+0xba/0x1cd
[1.315160]  [] mount_block_root+0x258/0x283
[1.318275]  [] mount_root+0x53/0x57
[1.321047]  [] prepare_namespace+0x13d/0x176
[1.324206]  [] kernel_init+0x156/0x15b
[1.327114]  [] ? schedule_tail+0x27/0xb0
[1.330102]  [] kernel_thread_helper+0x4/0x10
[1.333413]  [] ? start_kernel+0x3c5/0x3c5
[1.336446]  [] ? gs_change+0x13/0x13

Thanks
-Bharat




Re: [Qemu-devel] Running KVM guest on X86

2012-08-09 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Thursday, August 09, 2012 11:25 PM
> To: Bhushan Bharat-R65777
> Cc: Stuart Yoder; qemu-devel@nongnu.org; Avi Kivity
> Subject: Re: [Qemu-devel] Running KVM guest on X86
> 
> On Thu, 2012-08-09 at 17:39 +0000, Bhushan Bharat-R65777 wrote:
> >
> > > -Original Message-
> > > From: Stuart Yoder [mailto:b08...@gmail.com]
> > > Sent: Thursday, August 09, 2012 8:28 PM
> > > To: Bhushan Bharat-R65777
> > > Cc: Alex Williamson; qemu-devel@nongnu.org; Avi Kivity
> > > Subject: Re: [Qemu-devel] Running KVM guest on X86
> > >
> > > On Tue, Aug 7, 2012 at 1:30 AM, Bhushan Bharat-R65777
> > > 
> > > wrote:
> > > >
> > > >
> > > >> -----Original Message-
> > > >> From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > >> Sent: Monday, August 06, 2012 9:27 PM
> > > >> To: Bhushan Bharat-R65777
> > > >> Cc: qemu-devel@nongnu.org; Avi Kivity
> > > >> Subject: Re: Running KVM guest on X86
> > > >>
> > > >> On Mon, 2012-08-06 at 15:40 +, Bhushan Bharat-R65777 wrote:
> > > >> > Hi Avi/All,
> > > >> >
> > > >> > I am facing issue to boot KVM guest on x86 (I used to work on
> > > >> > PowerPC platform
> > > >> and do not have enough knowledge of x86). I am working on making
> > > >> VFIO working on PowerPC Booke, So I have cloned Alex Williamsons
> > > >> git repository, compiled kernel for x86 on fedora with
> > > >> virtualization configuration (selected all kernel config options
> > > >> for same). Run below command to boot Guest (I have not provided vfio
> device yet):
> > > >> >
> > > >> > "qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel
> > > >> arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img
> > > >> -serial tcp::,server,telnet"
> > > >> >
> > > >> > After the I can see qemu command line (able to run various
> > > >> > commands like "info
> > > >> registers" etc), while guest does not boot (not even the first print
> comes).
> > > >> >
> > > >> > Can anyone help in what I am missing or doing wrong?
> > > >>
> > > >> x86 doesn't use the serial port for console by default, so you're
> > > >> making things quite a bit more difficult that way.  Typically
> > > >> you'll want to provide a disk image (the -hda option is the
> > > >> easiest way to do this), a display (-vga std -vnc :0 is again
> > > >> easiest), and probably something to install from (-cdrom
> > > >> ).  You can also add a -boot d to get it to choose the cdrom
> the first time for install.
> > > >> Thanks,
> > > >
> > > > Thanks Avi and Alex, I can see the KVM guest boot prints by adding
> > > > -append
> > > "console=ttyS0"
> > >
> > > Note, once you get to user space you will need a getty specified in
> > > inittab in order to get a login on your serial port.   Something like:
> > >
> > >T0:23:respawn:/sbin/getty -L ttyS0
> >
> > 1)
> > I tried booting with prebuilt qcow2 then it works for me:
> > qemu-system-x86_64  -enable-kvm  -nographic  -device sga  -m 1024 -hda
> > debian_squeeze_amd64_standard.qcow2
> >
> > Does anyone help on how I can add my kernel to qcow2? Or create a proper
> qcow2?
> >
> > 2)
> > Also I tried as mentioned in section "3.9 Direct Linux Boot":
> http://qemu.weilnetz.de/qemu-doc.html#disk_005fimages :
> >
> > qemu-kvm  -enable-kvm  -nographic -kernel /boot/vmlinuz-3.5.0+ -hda
> > /boot/initramfs-3.5.0+.img  -append "console=ttyS0 root=/dev/sda" -m
> > 1024
> >
> > I get below error :
> > [1.299225] No filesystem could mount root, tried:  ext3 ext2 ext4 
> > iso9660
> > [1.303232] Kernel panic - not syncing: VFS: Unable to mount root fs on
> unknown-block(8,0)
> 
> 
> Chances are your root is a partition on /dev/sda, not the disk itself (ex. 
> sda1,
> sda2)... Why exactly are you causing yourself so much pain by doing these 
> direct
> boot, no VGA options instead of starting with an ISO image, installing it, 
> then
> setting up a serial console if you want serial access?  This may be the norm 
> for
> powerpc VMs, but it's not how I think most people setup x86 VMs.  Thanks,

I have a fedora machine to which I do not have direct access (but I can reboot 
remotely, have a console). So far what I was trying direct booting VM using 
same initramfs and bzimage as of host,

Alex, How I can create a ISO image with my kernel? Where I should place that on 
host?

Thanks
-Bharat

> 
> Alex
> 
> > [1.307683] Pid: 1, comm: swapper/0 Not tainted 3.3.5-2.fc16.x86_64 #1
> > [1.311201] Call Trace:
> > [1.312548]  [] panic+0xba/0x1cd
> > [1.315160]  [] mount_block_root+0x258/0x283
> > [1.318275]  [] mount_root+0x53/0x57
> > [1.321047]  [] prepare_namespace+0x13d/0x176
> > [1.324206]  [] kernel_init+0x156/0x15b
> > [1.327114]  [] ? schedule_tail+0x27/0xb0
> > [1.330102]  [] kernel_thread_helper+0x4/0x10
> > [1.333413]  [] ? start_kernel+0x3c5/0x3c5
> > [1.336446]  [] ? gs_change+0x13/0x13
> >
> > Thanks
> > -Bharat
> >
> 
> 
> 



Re: [Qemu-devel] Running KVM guest on X86

2012-08-10 Thread Bhushan Bharat-R65777
> > > >> On Mon, 2012-08-06 at 15:40 +, Bhushan Bharat-R65777 wrote:
> > > >> > Hi Avi/All,
> > > >> >
> > > >> > I am facing issue to boot KVM guest on x86 (I used to work on
> > > >> > PowerPC platform
> > > >> and do not have enough knowledge of x86). I am working on making
> > > >> VFIO working on PowerPC Booke, So I have cloned Alex Williamsons
> > > >> git repository, compiled kernel for x86 on fedora with
> > > >> virtualization configuration (selected all kernel config options
> > > >> for same). Run below command to boot Guest (I have not provided vfio
> device yet):
> > > >> >
> > > >> > "qemu-system-x86_64 -enable-kvm -m 1024 -nographic -kernel
> > > >> arch/x86_64/boot/bzImage -initrd /boot/initramfs-3.5.0-rc4+.img
> > > >> -serial tcp::,server,telnet"
> > > >> >
> > > >> > After the I can see qemu command line (able to run various
> > > >> > commands like "info
> > > >> registers" etc), while guest does not boot (not even the first print
> comes).
> > > >> >
> > > >> > Can anyone help in what I am missing or doing wrong?
> > > >>
> > > >> x86 doesn't use the serial port for console by default, so you're
> > > >> making things quite a bit more difficult that way.  Typically
> > > >> you'll want to provide a disk image (the -hda option is the
> > > >> easiest way to do this), a display (-vga std -vnc :0 is again
> > > >> easiest), and probably something to install from (-cdrom
> > > >> ).  You can also add a -boot d to get it to choose the cdrom
> the first time for install.
> > > >> Thanks,
> > > >
> > > > Thanks Avi and Alex, I can see the KVM guest boot prints by adding
> > > > -append
> > > "console=ttyS0"
> > >
> > > Note, once you get to user space you will need a getty specified in
> > > inittab in order to get a login on your serial port.   Something like:
> > >
> > >T0:23:respawn:/sbin/getty -L ttyS0
> >
> > 1)
> > I tried booting with prebuilt qcow2 then it works for me:
> > qemu-system-x86_64  -enable-kvm  -nographic  -device sga  -m 1024 -hda
> > debian_squeeze_amd64_standard.qcow2
> >
> > Does anyone help on how I can add my kernel to qcow2? Or create a proper
> qcow2?
> >
> > 2)
> > Also I tried as mentioned in section "3.9 Direct Linux Boot":
> http://qemu.weilnetz.de/qemu-doc.html#disk_005fimages :
> >
> > qemu-kvm  -enable-kvm  -nographic -kernel /boot/vmlinuz-3.5.0+ -hda
> > /boot/initramfs-3.5.0+.img  -append "console=ttyS0 root=/dev/sda" -m
> > 1024
> >
> -hda  /boot/initramfs-3.5.0+.img is incorrect. Should be -hda
> debian_squeeze_amd64_standard.qcow2 -initrd /boot/initramfs-3.5.0+.img and
> root=/dev/sda1 probably.

I tried :
qemu-system-x86_64  -enable-kvm  -nographic  -kernel /boot/vmlinuz-3.5.0+  
-initrd /boot/initramfs-3.5.0+.img  -append "root=/dev/sda1 rw console=ttyS0" 
-m 1024 -hda debian_squeeze_amd64_standard.qcow2

With this I get the login prompt, but it is not taking input character from 
keyboard properly (not able to give login credentials even). Seeing some weird 
behavior, like sometimes it treat normal character as like ENTER pressed.

Below are some boot prints and it is found that there were some junk characters 
after "Setting console screen modes".

--
Setting console screen modes.
]Rcannot (un)set powersave mode
[9;30][14;30]Skipping font and keymap setup (handled by console-setup).
Setting up console font and keymap...done.
[   11.547904] rc (278) used greatest stack depth: 1760 bytes left
Starting portmap daemon...Already running..
Starting NFS common utilities: statd.
Starting enhanced syslogd: rsyslogd.
Starting deferred execution scheduler: atd.
Starting ACPI services
Starting periodic command scheduler: cron.
Starting MTA: exim4.

Debian GNU/Linux 6.0 debian-amd64 ttyS0

debian-amd64 login:


Thanks 
-Bharat





Re: [Qemu-devel] [Qemu-ppc] [PATCH: RFC] Adding BAR0 for e500 PCI controller

2012-08-14 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Wednesday, August 15, 2012 6:59 AM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat-
> R65777
> Subject: Re: [Qemu-ppc] [PATCH: RFC] Adding BAR0 for e500 PCI controller
> 
> On 08/14/2012 07:50 AM, Bharat Bhushan wrote:
> > PCI Root complex have TYPE-1 configuration header while PCI endpoint
> > have type-0 configuration header. The type-1 configuration header have
> > a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci
> > address space to CCSR address space. This can used for 2 purposes: 1)
> > for MSI interrupt generation 2) Allow CCSR registers access when
> > configured as PCI endpoint, which I am not sure is a use case with QEMU-KVM
> guest.
> >
> > What I observed is that when guest read the size of BAR0 of host
> > controller configuration header (TYPE1 header) then it always reads it
> > as 0. When looking into the QEMU hw/ppce500_pci.c, I do not find the
> > PCI controller device registering BAR0. I do not find any other
> > controller also doing so may they do not use BAR0.
> >
> > There are two issues when BAR0 is not there (which I can think of):
> > 1) There should be BAR0 emulated for PCI Root comaplex (TYPE1 header)
> > and when reading the size of BAR0, it should give size as per real h/w.
> >
> > 2) Do we need this BAR0 inbound address translation?
> > When BAR0 is of non-zero size then it will be configured for PCI
> > address space to local address(CCSR) space translation on inbound access.
> > The primary use case is for MSI interrupt generation. The device is
> > configured with a address offsets in PCI address space, which will be
> > translated to MSI interrupt generation MPIC registers. Currently I do
> > not understand the MSI interrupt generation mechanism in QEMU and also
> > IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines.
> > But this BAR0 will be used when using MSI on e500.
> 
> This patch is only trying to address #1, right?

Yes.

>  I don't see any connection from
> this BAR to CCSR.

Yes, it is not there. That can be the next step.

> 
> > +memory_region_init_io(&h->bar0, &pci_host_conf_be_ops, h,
> > +  "PCIHOST-bar0", 0x100);
> 
> 0x0100 is correct for e500mc-based systems, but it should be 0x0010 
> for
> e500v2-based systems.

Yes, I will correct this.

Thanks
-Bharat

> 
> -Scott



[Qemu-devel] Isuue assiging devices using VFIO on x86

2012-08-28 Thread Bhushan Bharat-R65777
Hi Alex,

In my susyem I have following devices:

I tried assigning a following PCI devices:
00:03.0 Communication controller: Intel Corporation 4 Series Chipset HECI 
Controller (rev 03)
00:03.2 IDE interface: Intel Corporation 4 Series Chipset PT IDER Controller 
(rev 03)
00:03.3 Serial controller: Intel Corporation 4 Series Chipset Serial KT 
Controller (rev 03)

and getting below error:

---
Command:
---
qemu-system-x86_64 -enable-kvm  -nographic  -kernel /boot/vmlinuz-3.6.0-rc2+ 
-initrd /boot/initramfs-3.6.0-rc2+.img -append "root=/dev/sda1" -m 1024 -drive 
file=/home/kvmdev/debian_squeeze_amd64_standard.qcow2 -device 
vfio-pci,host=:00:03.0 -device vfio-pci,host=:00:03.2 -device 
vfio-pci,host=:00:03.3

Error prints:

qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Warning, device 
:00:03.0 does not support reset

qemu-system-x86_64: -device vfio-pci,host=:00:03.0: VFIO :00:03.0 BAR 0 
is too small to mmap, this may affect performance.

qemu-system-x86_64: -device vfio-pci,host=:00:03.2: Warning, device 
:00:03.2 does not support reset

qemu-system-x86_64: -device vfio-pci,host=:00:03.3: Warning, device 
:00:03.3 does not support reset

qemu: hardware error: register_ioport_read: invalid opaque for address 0x3f6
CPU #0:
EAX=8003 EBX=3ffe0e80 ECX=8003 EDX=0cfc
ESI=2800 EDI=80002804 EBP=6fc0 ESP=6f38
EIP=3ffec005 EFL=0086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010   00c09300 DPL=0 DS   [-WA]
CS =0008   00c09b00 DPL=0 CS32 [-RA]
SS =0010   00c09300 DPL=0 DS   [-WA]
DS =0010   00c09300 DPL=0 DS   [-WA]
FS =0010   00c09300 DPL=0 DS   [-WA]
GS =0010   00c09300 DPL=0 DS   [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS32-busy
GDT= 000fcd68 0037
IDT= 000fdb60 
CR0=0011 CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3= 
DR6=0ff0 DR7=0400
EFER=
FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80
FPR0=  FPR1= 
FPR2=  FPR3= 
FPR4=  FPR5= 
FPR6=  FPR7= 
XMM00= XMM01=
XMM02= XMM03=
XMM04= XMM05=
XMM06= XMM07=
Aborted (core dumped)



Linux: http://github.com/awilliam/linux-vfio.git next
Qemu: https://github.com/awilliam/qemu-vfio.git  iommu-group-vfio

Any idea what's is the issue.

Thanks
-Bharat




Re: [Qemu-devel] Isuue assiging devices using VFIO on x86

2012-08-28 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, August 28, 2012 9:27 PM
> To: Bhushan Bharat-R65777
> Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org
> Subject: Re: Isuue assiging devices using VFIO on x86
> 
> On Tue, 2012-08-28 at 09:23 +0000, Bhushan Bharat-R65777 wrote:
> > Hi Alex,
> >
> > In my susyem I have following devices:
> >
> > I tried assigning a following PCI devices:
> > 00:03.0 Communication controller: Intel Corporation 4 Series Chipset HECI
> Controller (rev 03)
> > 00:03.2 IDE interface: Intel Corporation 4 Series Chipset PT IDER Controller
> (rev 03)
> > 00:03.3 Serial controller: Intel Corporation 4 Series Chipset Serial KT
> Controller (rev 03)
> >
> > and getting below error:
> >
> > ---
> > Command:
> > ---
> > qemu-system-x86_64 -enable-kvm  -nographic  -kernel 
> > /boot/vmlinuz-3.6.0-rc2+ -
> initrd /boot/initramfs-3.6.0-rc2+.img -append "root=/dev/sda1" -m 1024 -drive
> file=/home/kvmdev/debian_squeeze_amd64_standard.qcow2 -device vfio-
> pci,host=:00:03.0 -device vfio-pci,host=:00:03.2 -device vfio-
> pci,host=:00:03.3
> >
> > Error prints:
> > 
> > qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Warning, device
> :00:03.0 does not support reset
> >
> > qemu-system-x86_64: -device vfio-pci,host=:00:03.0: VFIO :00:03.0 
> > BAR
> 0 is too small to mmap, this may affect performance.
> >
> > qemu-system-x86_64: -device vfio-pci,host=:00:03.2: Warning, device
> :00:03.2 does not support reset
> >
> > qemu-system-x86_64: -device vfio-pci,host=:00:03.3: Warning, device
> :00:03.3 does not support reset
> >
> > qemu: hardware error: register_ioport_read: invalid opaque for address 0x3f6
> > CPU #0:
> > EAX=8003 EBX=3ffe0e80 ECX=8003 EDX=0cfc
> > ESI=2800 EDI=80002804 EBP=6fc0 ESP=6f38
> > EIP=3ffec005 EFL=0086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =0010   00c09300 DPL=0 DS   [-WA]
> > CS =0008   00c09b00 DPL=0 CS32 [-RA]
> > SS =0010   00c09300 DPL=0 DS   [-WA]
> > DS =0010   00c09300 DPL=0 DS   [-WA]
> > FS =0010   00c09300 DPL=0 DS   [-WA]
> > GS =0010   00c09300 DPL=0 DS   [-WA]
> > LDT=   8200 DPL=0 LDT
> > TR =   8b00 DPL=0 TSS32-busy
> > GDT= 000fcd68 0037
> > IDT= 000fdb60 
> > CR0=0011 CR2= CR3= CR4=
> > DR0= DR1= DR2=
> DR3=
> > DR6=0ff0 DR7=0400
> > EFER=
> > FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80
> > FPR0=  FPR1= 
> > FPR2=  FPR3= 
> > FPR4=  FPR5= 
> > FPR6=  FPR7= 
> > XMM00= 
> > XMM01=
> > XMM02= 
> > XMM03=
> > XMM04= 
> > XMM05=
> > XMM06= 
> > XMM07=
> > Aborted (core dumped)
> >
> > 
> >
> > Linux: http://github.com/awilliam/linux-vfio.git next
> > Qemu: https://github.com/awilliam/qemu-vfio.git  iommu-group-vfio
> >
> > Any idea what's is the issue.
> 
> Please try the vfio-pci-for-qemu-1.2-v3 qemu-vfio.git tag, I'm not even
> sure what's in that old iommu-group-vfio branch.  Thanks,

Can you share the command to configure qemu?

Thanks
-Bharat

> 
> Alex
> 



Re: [Qemu-devel] [PATCH RFC] PowerPC: Added uapi directory into linux-header

2012-12-14 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Friday, December 14, 2012 5:06 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan Bharat-R65777; 
> Jan
> Kiszka
> Subject: Re: [PATCH RFC] PowerPC: Added uapi directory into linux-header
> 
> 
> On 14.12.2012, at 12:04, Bharat Bhushan wrote:
> 
> > This is corrently done for powerpc.
> >
> > Signed-off-by: Bharat Bhushan 
> 
> Jan, could you please check if this is correct?
> 
> > ---
> > configure   |1 +
> > scripts/update-linux-headers.sh |5 +
> > 2 files changed, 6 insertions(+), 0 deletions(-)
> >
> > diff --git a/configure b/configure
> > index 780b19a..bdc2d5e 100755
> > --- a/configure
> > +++ b/configure
> > @@ -3701,6 +3701,7 @@ if test "$linux" = "yes" ; then
> > # For non-KVM architectures we will not have asm headers
> > if [ -e "$source_path/linux-headers/asm-$linux_arch" ]; then
> >   symlink "$source_path/linux-headers/asm-$linux_arch"
> > linux-headers/asm
> > +  symlink "$source_path/linux-headers/uapi/asm-$linux_arch"
> > + linux-headers/uapi/asm
> > fi
> > fi
> >
> > diff --git a/scripts/update-linux-headers.sh
> > b/scripts/update-linux-headers.sh index 4c7b566..9f6bf25 100755
> > --- a/scripts/update-linux-headers.sh
> > +++ b/scripts/update-linux-headers.sh
> > @@ -48,6 +48,11 @@ for arch in $ARCHLIST; do
> >
> > rm -rf "$output/linux-headers/asm-$arch"
> > mkdir -p "$output/linux-headers/asm-$arch"
> > +if [ $arch = powerpc ]; then
> 
> This looks bogus. There shouldn't be any powerpc specifics anywhere in this
> file.

This file have x86 specific also, why ?

-Bharat

> 
> 
> Alex
> 
> > +   rm -rf $output/linux-headers/uapi/asm-$arch/*
> > +cp "$linux/arch/$arch/include/uapi/asm/epapr_hcalls.h"
> "$output/linux-headers/uapi/asm-$arch/"
> > +fi
> > +
> > for header in kvm.h kvm_para.h; do
> > cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
> > done
> > --
> > 1.7.0.4
> >
> >
> 





Re: [Qemu-devel] [PATCH] Added uapi directory into linux-header

2012-12-17 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, December 17, 2012 7:45 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan Bharat-R65777; 
> Jan
> Kiszka; Peter Maydell
> Subject: Re: [PATCH] Added uapi directory into linux-header
> 
> 
> On 17.12.2012, at 05:08, Bharat Bhushan wrote:
> 
> > Linux ARCH specific header files are now in uapi/ directory also.
> > These header files (epapr_hcalls.h on powerpc) are needed in qemu in
> > linux-headers/uapi/asm/ directory as these are referenced by other
> > header files in linux-headers/asm/.
> >
> > This patch is about changing the scripts for same.
> >
> > Signed-off-by: Bharat Bhushan 
> 
> Please include people who were participating in the discussion previously in 
> CC
> when you post a new version.
> 
> > ---
> > configure   |1 +
> > scripts/update-linux-headers.sh |   10 ++
> > 2 files changed, 11 insertions(+), 0 deletions(-)
> >
> > diff --git a/configure b/configure
> > index 38b1cc6..51cce6b 100755
> > --- a/configure
> > +++ b/configure
> > @@ -3724,6 +3724,7 @@ if test "$linux" = "yes" ; then
> > # For non-KVM architectures we will not have asm headers
> > if [ -e "$source_path/linux-headers/asm-$linux_arch" ]; then
> >   symlink "$source_path/linux-headers/asm-$linux_arch"
> > linux-headers/asm
> > +  symlink "$source_path/linux-headers/uapi/asm-$linux_arch"
> > + linux-headers/uapi/asm
> 
> This is still wrong. You want this guarded by another if [ -e ].

This is not wrong because  

> 
> > fi
> > fi
> >
> > diff --git a/scripts/update-linux-headers.sh
> > b/scripts/update-linux-headers.sh index 4c7b566..d40f9c4 100755
> > --- a/scripts/update-linux-headers.sh
> > +++ b/scripts/update-linux-headers.sh
> > @@ -46,14 +46,24 @@ for arch in $ARCHLIST; do
> >
> > make -C "$linux" INSTALL_HDR_PATH="$tmpdir" SRCARCH=$arch
> > headers_install
> >
> > +if ! [ -e "$output/linux-headers/uapi" ] ; then
> > +mkdir "$output/linux-headers/uapi"
> > +fi
> 
> ... which would mean you don't need this bit.

... because here we ensure that the directory exists.

> 
> > +
> > rm -rf "$output/linux-headers/asm-$arch"
> > mkdir -p "$output/linux-headers/asm-$arch"
> > +rm -rf "$output/linux-headers/uapi/asm-$arch"
> > +mkdir -p "$output/linux-headers/uapi/asm-$arch"
> 
> Only if it exists, no?

Above it is ensured that the directory always exists .. 

> 
> > +
> > for header in kvm.h kvm_para.h; do
> > cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
> > done
> > if [ $arch = x86 ]; then
> > cp "$tmpdir/include/asm/hyperv.h" "$output/linux-headers/asm-x86"
> > fi
> > +if [ $arch = powerpc ]; then
> > +cp "$linux/arch/$arch/include/uapi/asm/epapr_hcalls.h"
> "$output/linux-headers/uapi/asm-$arch/"
> > +fi
> 
> Why is this pointing at $linux and not $tmpdir?

Because I found that $tmpdir does not have epapr_haclls.h, though I do not know 
why ? any idea why ?

> Also, no need for $arch - be
> explicit and make it "powerpc".

Ok.

-Bharat

> 
> 
> Alex
> 
> > done
> >
> > rm -rf "$output/linux-headers/linux"
> > --
> > 1.7.0.4
> >
> >
> 





Re: [Qemu-devel] [PATCH] Added uapi directory into linux-header

2012-12-17 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, December 17, 2012 8:24 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Jan Kiszka; Peter Maydell
> Subject: Re: [PATCH] Added uapi directory into linux-header
> 
> 
> On 17.12.2012, at 15:51, Bhushan Bharat-R65777 wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Monday, December 17, 2012 7:45 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan
> >> Bharat-R65777; Jan Kiszka; Peter Maydell
> >> Subject: Re: [PATCH] Added uapi directory into linux-header
> >>
> >>
> >> On 17.12.2012, at 05:08, Bharat Bhushan wrote:
> >>
> >>> Linux ARCH specific header files are now in uapi/ directory also.
> >>> These header files (epapr_hcalls.h on powerpc) are needed in qemu in
> >>> linux-headers/uapi/asm/ directory as these are referenced by other
> >>> header files in linux-headers/asm/.
> >>>
> >>> This patch is about changing the scripts for same.
> >>>
> >>> Signed-off-by: Bharat Bhushan 
> >>
> >> Please include people who were participating in the discussion
> >> previously in CC when you post a new version.
> >>
> >>> ---
> >>> configure   |1 +
> >>> scripts/update-linux-headers.sh |   10 ++
> >>> 2 files changed, 11 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/configure b/configure
> >>> index 38b1cc6..51cce6b 100755
> >>> --- a/configure
> >>> +++ b/configure
> >>> @@ -3724,6 +3724,7 @@ if test "$linux" = "yes" ; then
> >>># For non-KVM architectures we will not have asm headers
> >>>if [ -e "$source_path/linux-headers/asm-$linux_arch" ]; then
> >>>  symlink "$source_path/linux-headers/asm-$linux_arch"
> >>> linux-headers/asm
> >>> +  symlink "$source_path/linux-headers/uapi/asm-$linux_arch"
> >>> + linux-headers/uapi/asm
> >>
> >> This is still wrong. You want this guarded by another if [ -e ].
> >
> > This is not wrong because 
> >
> >>
> >>>fi
> >>> fi
> >>>
> >>> diff --git a/scripts/update-linux-headers.sh
> >>> b/scripts/update-linux-headers.sh index 4c7b566..d40f9c4 100755
> >>> --- a/scripts/update-linux-headers.sh
> >>> +++ b/scripts/update-linux-headers.sh
> >>> @@ -46,14 +46,24 @@ for arch in $ARCHLIST; do
> >>>
> >>>make -C "$linux" INSTALL_HDR_PATH="$tmpdir" SRCARCH=$arch
> >>> headers_install
> >>>
> >>> +if ! [ -e "$output/linux-headers/uapi" ] ; then
> >>> +mkdir "$output/linux-headers/uapi"
> >>> +fi
> >>
> >> ... which would mean you don't need this bit.
> >
> > ... because here we ensure that the directory exists.
> 
> But that means we create a uapi directory for archs that don't have one, 
> right?

Yes, I think that's the direction going forward (user specific header files in 
uapi/).

> 
> >
> >>
> >>> +
> >>>rm -rf "$output/linux-headers/asm-$arch"
> >>>mkdir -p "$output/linux-headers/asm-$arch"
> >>> +rm -rf "$output/linux-headers/uapi/asm-$arch"
> >>> +mkdir -p "$output/linux-headers/uapi/asm-$arch"
> >>
> >> Only if it exists, no?
> >
> > Above it is ensured that the directory always exists ..
> >
> >>
> >>> +
> >>>for header in kvm.h kvm_para.h; do
> >>>cp "$tmpdir/include/asm/$header" "$output/linux-headers/asm-$arch"
> >>>done
> >>>if [ $arch = x86 ]; then
> >>>cp "$tmpdir/include/asm/hyperv.h" "$output/linux-headers/asm-x86"
> >>>fi
> >>> +if [ $arch = powerpc ]; then
> >>> +cp "$linux/arch/$arch/include/uapi/asm/epapr_hcalls.h"
> >> "$output/linux-headers/uapi/asm-$arch/"
> >>> +fi
> >>
> >> Why is this pointing at $linux and not $tmpdir?
> >
> > Because I found that $tmpdir does not have epapr_haclls.h, though I do not
> know why ? any idea why ?
> 
> Smells like a kernel bug :). In fact, I remember that we used to forget
> uapi/asm/epapr_hcalls.h in the header list a while ago.

I am not able to locate which command (set of commands) are installing the 
header files (all/selectively header files of arch/powerpc/include/asm/ and 
arch/powerp/include/uapi/asm/) in $tmpdir/

> Are you sure you're
> basing on the latest kvm code?

kvm-ppc-next branch of http://github.com/agraf/linux-2.6.git 

-Bharat

> 
> 
> Alex
> 
> >
> >> Also, no need for $arch - be
> >> explicit and make it "powerpc".
> >
> > Ok.
> >
> > -Bharat
> >
> >>
> >>
> >> Alex
> >>
> >>> done
> >>>
> >>> rm -rf "$output/linux-headers/linux"
> >>> --
> >>> 1.7.0.4
> >>>
> >>>
> >>
> >
> >
> 





Re: [Qemu-devel] [PATCH 2/3] Reset qemu timers when guest reset

2012-12-17 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, December 17, 2012 7:53 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 2/3] Reset qemu timers when guest reset
> 
> 
> On 17.12.2012, at 07:08, Bharat Bhushan wrote:
> 
> > This patch install the timer reset handler. This will be called when
> > the guest is reset.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > hw/ppc_booke.c |   12 
> > 1 files changed, 12 insertions(+), 0 deletions(-)
> >
> > diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index d51e7fa..837a5b6
> > 100644
> > --- a/hw/ppc_booke.c
> > +++ b/hw/ppc_booke.c
> > @@ -231,6 +231,16 @@ void store_booke_tcr(CPUPPCState *env,
> > target_ulong val)
> >
> > }
> >
> > +static void ppc_booke_timer_reset_handle(void *opaque) {
> > +CPUPPCState *env = opaque;
> > +
> 
> Doesn't this need a cpu_synchronize_state() call?

There are some more registered reset_handler which changes the spr's but does 
not call synchronize..

But is not the qemu_system_reset() ( which calls registered reset handler) 
synchronizes the cpu state ?

-Bharat

> 
> Alex
> 
> > +env->spr[SPR_BOOKE_TSR] = 0;
> > +env->spr[SPR_BOOKE_TCR] = 0;
> > +
> > +booke_update_irq(env);
> > +}
> > +
> > void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t
> > flags) {
> > ppc_tb_t *tb_env;
> > @@ -251,4 +261,6 @@ void ppc_booke_timers_init(CPUPPCState *env, uint32_t
> freq, uint32_t flags)
> > qemu_new_timer_ns(vm_clock, &booke_fit_cb, env);
> > booke_timer->wdt_timer =
> > qemu_new_timer_ns(vm_clock, &booke_wdt_cb, env);
> > +
> > +qemu_register_reset(ppc_booke_timer_reset_handle, env);
> > }
> > --
> > 1.7.0.4
> >
> >
> 





Re: [Qemu-devel] [PATCH] Added uapi directory into linux-header

2012-12-17 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, December 17, 2012 8:39 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Jan Kiszka; Peter Maydell
> Subject: Re: [PATCH] Added uapi directory into linux-header
> 
> 
> On 17.12.2012, at 16:02, Bhushan Bharat-R65777 wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Monday, December 17, 2012 8:24 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Jan Kiszka;
> >> Peter Maydell
> >> Subject: Re: [PATCH] Added uapi directory into linux-header
> >>
> >>
> >> On 17.12.2012, at 15:51, Bhushan Bharat-R65777 wrote:
> >>
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Alexander Graf [mailto:ag...@suse.de]
> >>>> Sent: Monday, December 17, 2012 7:45 PM
> >>>> To: Bhushan Bharat-R65777
> >>>> Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan
> >>>> Bharat-R65777; Jan Kiszka; Peter Maydell
> >>>> Subject: Re: [PATCH] Added uapi directory into linux-header
> >>>>
> >>>>
> >>>> On 17.12.2012, at 05:08, Bharat Bhushan wrote:
> >>>>
> >>>>> Linux ARCH specific header files are now in uapi/ directory also.
> >>>>> These header files (epapr_hcalls.h on powerpc) are needed in qemu
> >>>>> in linux-headers/uapi/asm/ directory as these are referenced by
> >>>>> other header files in linux-headers/asm/.
> >>>>>
> >>>>> This patch is about changing the scripts for same.
> >>>>>
> >>>>> Signed-off-by: Bharat Bhushan 
> >>>>
> >>>> Please include people who were participating in the discussion
> >>>> previously in CC when you post a new version.
> >>>>
> >>>>> ---
> >>>>> configure   |1 +
> >>>>> scripts/update-linux-headers.sh |   10 ++
> >>>>> 2 files changed, 11 insertions(+), 0 deletions(-)
> >>>>>
> >>>>> diff --git a/configure b/configure index 38b1cc6..51cce6b 100755
> >>>>> --- a/configure
> >>>>> +++ b/configure
> >>>>> @@ -3724,6 +3724,7 @@ if test "$linux" = "yes" ; then
> >>>>>   # For non-KVM architectures we will not have asm headers
> >>>>>   if [ -e "$source_path/linux-headers/asm-$linux_arch" ]; then
> >>>>> symlink "$source_path/linux-headers/asm-$linux_arch"
> >>>>> linux-headers/asm
> >>>>> +  symlink "$source_path/linux-headers/uapi/asm-$linux_arch"
> >>>>> + linux-headers/uapi/asm
> >>>>
> >>>> This is still wrong. You want this guarded by another if [ -e ].
> >>>
> >>> This is not wrong because 
> >>>
> >>>>
> >>>>>   fi
> >>>>> fi
> >>>>>
> >>>>> diff --git a/scripts/update-linux-headers.sh
> >>>>> b/scripts/update-linux-headers.sh index 4c7b566..d40f9c4 100755
> >>>>> --- a/scripts/update-linux-headers.sh
> >>>>> +++ b/scripts/update-linux-headers.sh
> >>>>> @@ -46,14 +46,24 @@ for arch in $ARCHLIST; do
> >>>>>
> >>>>>   make -C "$linux" INSTALL_HDR_PATH="$tmpdir" SRCARCH=$arch
> >>>>> headers_install
> >>>>>
> >>>>> +if ! [ -e "$output/linux-headers/uapi" ] ; then
> >>>>> +mkdir "$output/linux-headers/uapi"
> >>>>> +fi
> >>>>
> >>>> ... which would mean you don't need this bit.
> >>>
> >>> ... because here we ensure that the directory exists.
> >>
> >> But that means we create a uapi directory for archs that don't have one,
> right?
> >
> > Yes, I think that's the direction going forward (user specific header files 
> > in
> uapi/).
> 
> Sure, but we shouldn't be faster than Linux itself here ;).

I am fine with what makes you happier :)

> 
> >
> >>
> >>>
> >>>>
> >>>>> +
> >>>>&g

Re: [Qemu-devel] [PATCH v2] Added uapi directory into linux-header

2012-12-17 Thread Bhushan Bharat-R65777
> > +++ b/scripts/update-linux-headers.sh
> > @@ -46,14 +46,26 @@ for arch in $ARCHLIST; do
> >
> > make -C "$linux" INSTALL_HDR_PATH="$tmpdir" SRCARCH=$arch
> > headers_install
> >
> > +if [ -e "$linux/arch/$arch/include/uapi" ] &&
> > +! [ -e "$output/linux-headers/uapi" ] ; then
> > +mkdir "$output/linux-headers/uapi"
> 
> mkdir -p
> 
> But looking through this whole thing, it seems like the root cause is actually
> different. We don't want any uapi directories exposed to user space. So let's 
> go
> back a step:
> 
> Why do we need the uapi include dir? Because some header is using it.
> 
> linux-headers/asm-powerpc/kvm_para.h:

The kvm_para.h (also kvm.h) are now defined in include/uapi/asm/

Is not this the correct thing that any header file in include/uapi/asm/ (in 
this case kvm_para.h) includes another header file (epapr_hcalls.h) in same 
directory?

Also I think now only the uapi/asm/*.h files should be exposed to userspace 
(QEMU here).

Waiting for David's comment for better clarity ... 

-Bharat

> 
> #include 
> 
> This is the root cause of the problem. We must never manually include any uapi
> header paths. We only ever include their normal asm-counterparts which then 
> may
> include uapi (in kernel) or actually are uapi (in user space). David, please
> correct me if I'm wrong.
> 
> Could you please try and see if this kernel side patch makes things work for 
> you
> too?
> 
> 
> Alex
> 
> 
> diff --git a/arch/powerpc/include/uapi/asm/kvm_para.h
> b/arch/powerpc/include/uapi/asm/kvm_para.h
> index ed0e025..e3af328 100644
> --- a/arch/powerpc/include/uapi/asm/kvm_para.h
> +++ b/arch/powerpc/include/uapi/asm/kvm_para.h
> @@ -78,7 +78,7 @@ struct kvm_vcpu_arch_shared {
> 
>  #define KVM_HCALL_TOKEN(num) _EV_HCALL_TOKEN(EV_KVM_VENDOR_ID, num)
> 
> -#include 
> +#include 
> 
>  #define KVM_FEATURE_MAGIC_PAGE   1
> 






Re: [Qemu-devel] [PATCH v2] Added uapi directory into linux-header

2012-12-17 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Tuesday, December 18, 2012 6:51 AM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel qemu-devel; Peter Maydell; Jan Kiszka; qemu-...@nongnu.org 
> List;
> Marcelo Tosatti; David Howells
> Subject: Re: [PATCH v2] Added uapi directory into linux-header
> 
> 
> On 18.12.2012, at 02:14, Bhushan Bharat-R65777 wrote:
> 
> >>> +++ b/scripts/update-linux-headers.sh
> >>> @@ -46,14 +46,26 @@ for arch in $ARCHLIST; do
> >>>
> >>>make -C "$linux" INSTALL_HDR_PATH="$tmpdir" SRCARCH=$arch
> >>> headers_install
> >>>
> >>> +if [ -e "$linux/arch/$arch/include/uapi" ] &&
> >>> +! [ -e "$output/linux-headers/uapi" ] ; then
> >>> +mkdir "$output/linux-headers/uapi"
> >>
> >> mkdir -p
> >>
> >> But looking through this whole thing, it seems like the root cause is
> >> actually different. We don't want any uapi directories exposed to
> >> user space. So let's go back a step:
> >>
> >> Why do we need the uapi include dir? Because some header is using it.
> >>
> >> linux-headers/asm-powerpc/kvm_para.h:
> >
> > The kvm_para.h (also kvm.h) are now defined in include/uapi/asm/
> >
> > Is not this the correct thing that any header file in include/uapi/asm/ (in
> this case kvm_para.h) includes another header file (epapr_hcalls.h) in same
> directory?
> >
> > Also I think now only the uapi/asm/*.h files should be exposed to userspace
> (QEMU here).
> 
> make headers_install should basically remove all the uapi magic and give us
> normal backwards-compatible asm trees :).

I am perfectly fine, How we can do this now :)

-Bharat

> 
> 
> Alex
> 
> >
> > Waiting for David's comment for better clarity ...
> >
> > -Bharat
> >
> >>
> >>#include 
> >>
> >> This is the root cause of the problem. We must never manually include
> >> any uapi header paths. We only ever include their normal
> >> asm-counterparts which then may include uapi (in kernel) or actually
> >> are uapi (in user space). David, please correct me if I'm wrong.
> >>
> >> Could you please try and see if this kernel side patch makes things
> >> work for you too?
> >>
> >>
> >> Alex
> >>
> >>
> >> diff --git a/arch/powerpc/include/uapi/asm/kvm_para.h
> >> b/arch/powerpc/include/uapi/asm/kvm_para.h
> >> index ed0e025..e3af328 100644
> >> --- a/arch/powerpc/include/uapi/asm/kvm_para.h
> >> +++ b/arch/powerpc/include/uapi/asm/kvm_para.h
> >> @@ -78,7 +78,7 @@ struct kvm_vcpu_arch_shared {
> >>
> >> #define KVM_HCALL_TOKEN(num) _EV_HCALL_TOKEN(EV_KVM_VENDOR_ID, num)
> >>
> >> -#include 
> >> +#include 
> >>
> >> #define KVM_FEATURE_MAGIC_PAGE 1
> >>
> >
> >
> >
> 





Re: [Qemu-devel] [PATCH v2] Added uapi directory into linux-header

2012-12-17 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Tuesday, December 18, 2012 7:00 AM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel qemu-devel; Peter Maydell; Jan Kiszka; qemu-...@nongnu.org 
> List;
> Marcelo Tosatti; David Howells
> Subject: Re: [PATCH v2] Added uapi directory into linux-header
> 
> 
> On 18.12.2012, at 02:27, Bhushan Bharat-R65777 wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Tuesday, December 18, 2012 6:51 AM
> >> To: Bhushan Bharat-R65777
> >> Cc: qemu-devel qemu-devel; Peter Maydell; Jan Kiszka;
> >> qemu-...@nongnu.org List; Marcelo Tosatti; David Howells
> >> Subject: Re: [PATCH v2] Added uapi directory into linux-header
> >>
> >>
> >> On 18.12.2012, at 02:14, Bhushan Bharat-R65777 wrote:
> >>
> >>>>> +++ b/scripts/update-linux-headers.sh
> >>>>> @@ -46,14 +46,26 @@ for arch in $ARCHLIST; do
> >>>>>
> >>>>>   make -C "$linux" INSTALL_HDR_PATH="$tmpdir" SRCARCH=$arch
> >>>>> headers_install
> >>>>>
> >>>>> +if [ -e "$linux/arch/$arch/include/uapi" ] &&
> >>>>> +! [ -e "$output/linux-headers/uapi" ] ; then
> >>>>> +mkdir "$output/linux-headers/uapi"
> >>>>
> >>>> mkdir -p
> >>>>
> >>>> But looking through this whole thing, it seems like the root cause
> >>>> is actually different. We don't want any uapi directories exposed
> >>>> to user space. So let's go back a step:
> >>>>
> >>>> Why do we need the uapi include dir? Because some header is using it.
> >>>>
> >>>> linux-headers/asm-powerpc/kvm_para.h:
> >>>
> >>> The kvm_para.h (also kvm.h) are now defined in include/uapi/asm/
> >>>
> >>> Is not this the correct thing that any header file in
> >>> include/uapi/asm/ (in
> >> this case kvm_para.h) includes another header file (epapr_hcalls.h)
> >> in same directory?
> >>>
> >>> Also I think now only the uapi/asm/*.h files should be exposed to
> >>> userspace
> >> (QEMU here).
> >>
> >> make headers_install should basically remove all the uapi magic and
> >> give us normal backwards-compatible asm trees :).
> >
> > I am perfectly fine, How we can do this now :)
> 
> Well, for starters, do the headers work if you apply the patch I sent in a
> previous mail plus the epapr_hcall.h copy? If so, then that's the way to go :)

Are you really sure that applying a patch and then syncing (or other way round) 
 is the way you want to go ?

To me it does not look good, I think we can go with the script changes to make 
install_header is updated to do the work.

-Bharat

> 
> 
> Alex
> 





Re: [Qemu-devel] [PATCH v2] Added uapi directory into linux-header

2012-12-18 Thread Bhushan Bharat-R65777
> >>> +++ b/scripts/update-linux-headers.sh
> >>> @@ -46,14 +46,26 @@ for arch in $ARCHLIST; do
> >>>
> >>>  make -C "$linux" INSTALL_HDR_PATH="$tmpdir" SRCARCH=$arch
> >>> headers_install
> >>>
> >>> +if [ -e "$linux/arch/$arch/include/uapi" ] &&
> >>> +! [ -e "$output/linux-headers/uapi" ] ; then
> >>> +mkdir "$output/linux-headers/uapi"
> >>
> >> mkdir -p
> >>
> >> But looking through this whole thing, it seems like the root
> >> cause is actually different. We don't want any uapi directories
> >> exposed to user space. So let's go back a step:
> >>
> >> Why do we need the uapi include dir? Because some header is using it.
> >>
> >> linux-headers/asm-powerpc/kvm_para.h:
> >
> > The kvm_para.h (also kvm.h) are now defined in include/uapi/asm/
> >
> > Is not this the correct thing that any header file in
> > include/uapi/asm/ (in
>  this case kvm_para.h) includes another header file (epapr_hcalls.h)
>  in same directory?
> >
> > Also I think now only the uapi/asm/*.h files should be exposed to
> > userspace
>  (QEMU here).
> 
>  make headers_install should basically remove all the uapi magic and
>  give us normal backwards-compatible asm trees :).
> >>>
> >>> I am perfectly fine, How we can do this now :)
> >>
> >> Well, for starters, do the headers work if you apply the patch I sent
> >> in a previous mail plus the epapr_hcall.h copy? If so, then that's
> >> the way to go :)
> >
> > Are you really sure that applying a patch and then syncing (or other way
> round)  is the way you want to go ?
> 
> Yes, because I'm quite confident we're generating broken headers right now.

Ok, so every time someone does the sync he/she has to do this? Also do we think 
that sometime in future this will be taken care by make header_install?

Thanks
-Bharat

> 
> Alex
> 
> >
> > To me it does not look good, I think we can go with the script changes to 
> > make
> install_header is updated to do the work.
> >
> > -Bharat
> >
> >>
> >>
> >> Alex
> >>
> >
> >





Re: [Qemu-devel] [PATCH 3/3] Enable kvm emulated watchdog

2012-12-27 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, December 17, 2012 8:09 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 3/3] Enable kvm emulated watchdog
> 
> 
> On 17.12.2012, at 07:08, Bharat Bhushan wrote:
> 
> > Enable the KVM emulated watchdog if KVM supports (use the capability
> > enablement in watchdog handler). Also watchdog exit
> > (KVM_EXIT_WATCHDOG) handling is added.
> > Watchdog state machine is cleared whenever VM state changes to running.
> > This is to handle the cases like return from debug halt etc.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > hw/ppc.h |2 +
> > hw/ppc_booke.c   |   71 
> > ++
> > target-ppc/kvm.c |   13 +-
> > 3 files changed, 85 insertions(+), 1 deletions(-)
> >
> > diff --git a/hw/ppc.h b/hw/ppc.h
> > index 2f3ea27..3672fe8 100644
> > --- a/hw/ppc.h
> > +++ b/hw/ppc.h
> > @@ -44,6 +44,8 @@ struct ppc_tb_t {
> >
> > uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t
> > tb_offset); clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t
> > freq);
> > +extern int cap_ppc_watchdog;
> > +extern int cap_booke_sregs;
> 
> No. Never export cap_ variables. They are kvm internal.
> 
> > /* Embedded PowerPC DCR management */
> > typedef uint32_t (*dcr_read_cb)(void *opaque, int dcrn); typedef void
> > (*dcr_write_cb)(void *opaque, int dcrn, uint32_t val); diff --git
> > a/hw/ppc_booke.c b/hw/ppc_booke.c index 837a5b6..f18df74 100644
> > --- a/hw/ppc_booke.c
> > +++ b/hw/ppc_booke.c
> > @@ -21,6 +21,8 @@
> >  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> > DEALINGS IN
> >  * THE SOFTWARE.
> >  */
> > +#include "sysemu.h"
> > +#include "kvm.h"
> > #include "hw.h"
> > #include "ppc.h"
> > #include "qemu-timer.h"
> > @@ -203,6 +205,11 @@ static void booke_wdt_cb(void *opaque)
> >  booke_timer->wdt_timer); }
> >
> > +static void ppc_booke_watchdog_clear_tsr(CPUPPCState *env,
> > +target_ulong tsr) {
> > +env->spr[SPR_BOOKE_TSR] = tsr & ~(TSR_ENW | TSR_WIS |
> > +TSR_WRS_MASK); }
> > +
> > void store_booke_tsr(CPUPPCState *env, target_ulong val) {
> > env->spr[SPR_BOOKE_TSR] &= ~val;
> > @@ -241,6 +248,64 @@ static void ppc_booke_timer_reset_handle(void *opaque)
> > booke_update_irq(env);
> > }
> >
> > +static void cpu_state_change_handler(void *opaque, int running,
> > +RunState state) {
> > +CPUPPCState *env = opaque;
> > +
> > +struct kvm_sregs sregs;
> > +
> > +if (!running) {
> > +return;
> > +}
> > +
> > +/*
> > + * Clear watchdog interrupt condition by clearing TSR.
> > + * Similar logic needed to be implemented for watchdog
> > + * emulation in qemu.
> > + */
> > +
> > +if (!kvm_enabled()) {
> > +/* FIXME: add handling for qemu emulated case */
> > +return;
> > +}
> > +
> > +if (cap_booke_sregs && cap_ppc_watchdog) {
> > +kvm_vcpu_ioctl(env, KVM_GET_SREGS, &sregs);
> > +
> > +/* Clear TSR.ENW, TSR.WIS and TSR.WRS */
> > +ppc_booke_watchdog_clear_tsr(env, sregs.u.e.tsr);
> 
> This should happen outside of the if (kvm_enabled()) block.
> 
> > +sregs.u.e.tsr = env->spr[SPR_BOOKE_TSR];
> > +sregs.u.e.update_special = KVM_SREGS_E_UPDATE_TSR;
> > +
> > +kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs);
> 
> Please create a kvmppc_... wrapper for all this in target-ppc/kvm.c. Or maybe
> even better yet add a helper variable that tells the kvm register sync 
> function
> to sync TSR as well and just use the normal cpu_synchronize_state() way of
> pushing register into the CPU.

Not sure what type of helper variable you are talking about.
What came in my mine is we define a helper variable as per bitmap of SREGS 
update feature KVM_SREGS_E_UPDATE_* (update_special) in env. Whenever any code 
changes the env[spr] it will set the update_special. Env->update_special will 
be checked in put_registers().

Thanks
-Bharat

> 
> > +}
> > +}
> > +
> > +static int kvm_booke_watchdog_enable(CPUPPCState *env) {
> > +int ret;
> > +struct kvm_enable_cap encap = {};
> > +
> > +if (!kvm_enab

[Qemu-devel] VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars

2013-02-14 Thread Bhushan Bharat-R65777
Hi Alex Williamson,

I am able (with some hacks :)) to directly assign the e1000 PCI device to KVM 
guest using VFIO on Freescale device. One of the problem I am facing is about 
the DMA mapping in IOMMU for PCI device BARs. On Freescale devices, the mmaped 
PCI device BARs are not required to be mapped in IOMMU. Typically the flow of 
in/out transaction (from CPU) is: 

Incoming flow:

-| |--| |---|   
|-|
CORE |<<--<-<--| IOMMU|<---<---<| 
PCI-Controller|<--<-<<| PCI device  |
-| |--| |---|   
|-|

Outgoing Flow: IOMMU is bypassed for out transactions

-| |--| |---|   
|-|
CORE |>>-->|   | IOMMU|->-->| 
PCI-Controller|>-->->>| PCI device  |
-| |   |--|   ^ |---|   
|-|
   |  |
   |--|


Also because of some hardware limitations on our IOMMU it is difficult to map 
these BAR regions with RAM (DDR) regions. So on Freescale device we want the 
VFIO_IOMMU_MAP_DMA ioctl to be called for RAM regions (DDR) only and _not_ for 
these mmaped ram regions of PCI device bars. I can understand that we need to 
add these mmaped PCI bars as RAM type in qemu memory_region_*(). So for that I 
tried to skip these regions in VFIO memory_listeners. Below changes which works 
for me. I am not sure whether this is correct approach, please suggest.

-
diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c
index c51ae67..63728d8 100644
--- a/hw/vfio_pci.c
+++ b/hw/vfio_pci.c
@@ -1115,9 +1115,35 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr 
iova,
 return -errno;
 }
 
-static bool vfio_listener_skipped_section(MemoryRegionSection *section)
+static int memory_region_is_mmap_bars(VFIOContainer *container,
+  MemoryRegionSection *section)
 {
-return !memory_region_is_ram(section->mr);
+VFIOGroup *group;
+VFIODevice *vdev;
+int i;
+
+QLIST_FOREACH(group, &container->group_list, next) {
+QLIST_FOREACH(vdev, &group->device_list, next) {
+if (vdev->msix->mmap_mem.ram_addr == section->mr->ram_addr)
+return 1;
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+VFIOBAR *bar = &vdev->bars[i];
+if (bar->mmap_mem.ram_addr == section->mr->ram_addr)
+return 1;
+}
+}
+}
+
+return 0;
+}
+
+static bool vfio_listener_skipped_section(VFIOContainer *container,
+  MemoryRegionSection *section)
+{
+if (!memory_region_is_ram(section->mr))
+   return 1;
+
+return memory_region_is_mmap_bars(container, section);
 }
 
 static void vfio_listener_region_add(MemoryListener *listener,
@@ -1129,7 +1155,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 void *vaddr;
 int ret;
 
-if (vfio_listener_skipped_section(section)) {
+if (vfio_listener_skipped_section(container, section)) {
 DPRINTF("vfio: SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n",
 section->offset_within_address_space,
 section->offset_within_address_space + section->size - 1);
@@ -1173,7 +1199,7 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 hwaddr iova, end;
 int ret;
-if (vfio_listener_skipped_section(section)) {
+if (vfio_listener_skipped_section(container, section)) {
 DPRINTF("vfio: SKIPPING region_del %"HWADDR_PRIx" - %"PRIx64"\n",
 section->offset_within_address_space,
 section->offset_within_address_space + section->size - 1);

-


Thanks
-Bharat




Re: [Qemu-devel] VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars

2013-02-18 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Thursday, February 14, 2013 11:57 PM
> To: Bhushan Bharat-R65777
> Cc: Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-B36701; qemu-
> de...@nongnu.org; qemu-...@nongnu.org
> Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars
> 
> On Thu, 2013-02-14 at 08:22 +, Bhushan Bharat-R65777 wrote:
> > Hi Alex Williamson,
> >
> > I am able (with some hacks :)) to directly assign the e1000 PCI device
> > to KVM guest using VFIO on Freescale device. One of the problem I am
> > facing is about the DMA mapping in IOMMU for PCI device BARs. On
> > Freescale devices, the mmaped PCI device BARs are not required to be
> > mapped in IOMMU. Typically the flow of in/out transaction (from CPU)
> > is:
> >
> > Incoming flow:
> >
> > -| |--| |---|
> |-|
> > CORE |<<--<-<--| IOMMU|<---<---<| 
> > PCI-Controller|<--<-
> <<| PCI device  |
> > -| |--| |---|
> |-|
> >
> > Outgoing Flow: IOMMU is bypassed for out transactions
> >
> > -| |--| |---|
> |-|
> > CORE |>>-->|   | IOMMU|->-->| 
> > PCI-Controller|>-->-
> >>| PCI device  |
> > -| |   |--|   ^ |---|
> |-|
> >|  |
> >|--|
> 
> Mapping the BAR into the IOMMU isn't for this second use case, CPU to device 
> is
> never IOMMU translated.  Instead, it's for this:
> 
> |--| |---|   |-|
> | IOMMU|<---<---<| PCI-Controller|<--<-<<| PCI device  |
> |--| |---|   |-|
>   |
>   |  |---|   |-|
>   +-->>--- ->| PCI-Controller|>-->->>| PCI device  |
>  |---|   |-|
> 
> It's perfectly fine to not support peer-to-peer DMA, but let's skip it where
> it's not supported in case it might actually work in some cases.
> 
> > Also because of some hardware limitations on our IOMMU it is difficult
> > to map these BAR regions with RAM (DDR) regions. So on Freescale
> > device we want the VFIO_IOMMU_MAP_DMA ioctl to be called for RAM
> > regions (DDR) only and _not_ for these mmaped ram regions of PCI
> > device bars. I can understand that we need to add these mmaped PCI
> > bars as RAM type in qemu memory_region_*(). So for that I tried to
> > skip these regions in VFIO memory_listeners. Below changes which works
> > for me. I am not sure whether this is correct approach, please
> > suggest.
> >
> > -
> > diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index c51ae67..63728d8
> > 100644
> > --- a/hw/vfio_pci.c
> > +++ b/hw/vfio_pci.c
> > @@ -1115,9 +1115,35 @@ static int vfio_dma_map(VFIOContainer *container,
> hwaddr iova,
> >  return -errno;
> >  }
> >
> > -static bool vfio_listener_skipped_section(MemoryRegionSection
> > *section)
> > +static int memory_region_is_mmap_bars(VFIOContainer *container,
> > +  MemoryRegionSection *section)
> >  {
> > -return !memory_region_is_ram(section->mr);
> > +VFIOGroup *group;
> > +VFIODevice *vdev;
> > +int i;
> > +
> > +QLIST_FOREACH(group, &container->group_list, next) {
> 
> I think you want to start at the global &group_list

Why you think global? My understanding is that the operation on dma_map/unmap 
is limited to a group in the given container.

> 
> > +QLIST_FOREACH(vdev, &group->device_list, next) {
> > +if (vdev->msix->mmap_mem.ram_addr ==
> > + section->mr->ram_addr)
> 
> Note the comment in memory.h:
> 
> struct MemoryRegion {
> /* All fields are private - violators will be prosecuted */
> ...
> ram_addr_t ram_addr;
> 
> You'll need to invent some kind of memory region comparison interfaces instead
> of accessing these private fields.

You mean adding some api ine memory.c?

> 
> > +return 1;
> > +

Re: [Qemu-devel] VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars

2013-02-18 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Tuesday, February 19, 2013 9:39 AM
> To: Bhushan Bharat-R65777
> Cc: Alex Williamson; Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-
> B36701; qemu-devel@nongnu.org; qemu-...@nongnu.org
> Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars
> 
> On 02/18/2013 10:05:20 PM, Bhushan Bharat-R65777 wrote:
> >
> >
> > > -Original Message-
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Thursday, February 14, 2013 11:57 PM
> > > To: Bhushan Bharat-R65777
> > > Cc: Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-B36701;
> > qemu-
> > > de...@nongnu.org; qemu-...@nongnu.org
> > > Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for
> > MMAPED PCI bars
> > >
> > > On Thu, 2013-02-14 at 08:22 +, Bhushan Bharat-R65777 wrote:
> > > > +static bool vfio_listener_skipped_section(VFIOContainer
> > *container,
> > > > +  MemoryRegionSection
> > > > +*section) {
> > > > +if (!memory_region_is_ram(section->mr))
> > > > +   return 1;
> > >
> > > Need some kind of conditional here for platforms that support
> > peer-to-peer.
> > > Thanks,
> >
> > What about adding a iommu_type will help here (VFIO_TYPE2_IOMMU),
> > which does not require BARs to be mapped in iommu ?
> 
> What specifically does "type 2" mean again?  If we're going to use it as the
> thing to key off of for all sorts of PAMU quirks, might as well just call it
> VFIO_IOMMU_FSL_PAMU.

Just thought of a common name so in case some other IOMMU also fall in this 
category and add a comment where this will be defined.
I am fine with this name as well :)

Thanks
-Bharat






Re: [Qemu-devel] VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars

2013-02-18 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, February 19, 2013 9:55 AM
> To: Bhushan Bharat-R65777
> Cc: Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-B36701; qemu-
> de...@nongnu.org; qemu-...@nongnu.org
> Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED PCI bars
> 
> On Tue, 2013-02-19 at 04:05 +, Bhushan Bharat-R65777 wrote:
> >
> > > -Original Message-
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Thursday, February 14, 2013 11:57 PM
> > > To: Bhushan Bharat-R65777
> > > Cc: Yoder Stuart-B08248; Wood Scott-B07421; Graf Alexander-B36701;
> > > qemu- de...@nongnu.org; qemu-...@nongnu.org
> > > Subject: Re: VFIO: Not require to make VFIO_IOMMU_MAP_DMA for MMAPED
> > > PCI bars
> > >
> > > On Thu, 2013-02-14 at 08:22 +, Bhushan Bharat-R65777 wrote:
> > > > Hi Alex Williamson,
> > > >
> > > > I am able (with some hacks :)) to directly assign the e1000 PCI
> > > > device to KVM guest using VFIO on Freescale device. One of the
> > > > problem I am facing is about the DMA mapping in IOMMU for PCI
> > > > device BARs. On Freescale devices, the mmaped PCI device BARs are
> > > > not required to be mapped in IOMMU. Typically the flow of in/out
> > > > transaction (from CPU)
> > > > is:
> > > >
> > > > Incoming flow:
> > > >
> > > > -| |--| |---|
> > > |-|
> > > > CORE |<<--<-<--| IOMMU|<---<---<| 
> > > > PCI-Controller|<--<-
> 
> > > <<| PCI device  |
> > > > -| |--| |---|
> > > |-|
> > > >
> > > > Outgoing Flow: IOMMU is bypassed for out transactions
> > > >
> > > > -| |--| |---|
> > > |-|
> > > > CORE |>>-->|   | IOMMU|->-->| 
> > > > PCI-Controller|>-->-
> 
> > > >>| PCI device  |
> > > > -| |   |--|   ^ |---|
> > > |-|
> > > >|  |
> > > >|--|
> > >
> > > Mapping the BAR into the IOMMU isn't for this second use case, CPU
> > > to device is never IOMMU translated.  Instead, it's for this:
> > >
> > > |--| |---|   |-|
> > > | IOMMU|<---<---<| PCI-Controller|<--<-<<| PCI device  |
> > > |--| |---|   |-|
> > >   |
> > >   |  |---|   |-|
> > >   +-->>--- ->| PCI-Controller|>-->->>| PCI device  |
> > >  |---|   |-|
> > >
> > > It's perfectly fine to not support peer-to-peer DMA, but let's skip
> > > it where it's not supported in case it might actually work in some cases.
> > >
> > > > Also because of some hardware limitations on our IOMMU it is
> > > > difficult to map these BAR regions with RAM (DDR) regions. So on
> > > > Freescale device we want the VFIO_IOMMU_MAP_DMA ioctl to be called
> > > > for RAM regions (DDR) only and _not_ for these mmaped ram regions
> > > > of PCI device bars. I can understand that we need to add these
> > > > mmaped PCI bars as RAM type in qemu memory_region_*(). So for that
> > > > I tried to skip these regions in VFIO memory_listeners. Below
> > > > changes which works for me. I am not sure whether this is correct
> > > > approach, please suggest.
> > > >
> > > > -
> > > > diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index c51ae67..63728d8
> > > > 100644
> > > > --- a/hw/vfio_pci.c
> > > > +++ b/hw/vfio_pci.c
> > > > @@ -1115,9 +1115,35 @@ static int vfio_dma_map(VFIOContainer
> > > > *container,
> > > hwaddr iova,
> > > >  return -errno;
> > > >  }
> > > >
> > > > -static bool vfio_listene

Re: [Qemu-devel] [PATCH 1/2] e500: Adding CCSR memory region

2012-10-03 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Wednesday, October 03, 2012 5:39 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 1/2] e500: Adding CCSR memory region
> 
> 
> On 03.10.2012, at 13:49, Bharat Bhushan wrote:
> 
> > All devices are also placed under CCSR memory region.
> > The CCSR memory region is exported to pci device. The MSI interrupt
> > generation is the main reason to export the CCSR region to PCI device.
> > This put the requirement to move mpic under CCSR region, but logically
> > all devices should be under CCSR. So this patch places all emulated
> > devices under ccsr region.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > hw/ppc/e500.c |   51 +--
> > 1 files changed, 37 insertions(+), 14 deletions(-)
> >
> > diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index 5bab340..197411d
> > 100644
> > --- a/hw/ppc/e500.c
> > +++ b/hw/ppc/e500.c
> > @@ -46,14 +46,23 @@
> > /* TODO: parameterize */
> > #define MPC8544_CCSRBAR_BASE   0xE000ULL
> > #define MPC8544_CCSRBAR_SIZE   0x0010ULL
> > -#define MPC8544_MPIC_REGS_BASE (MPC8544_CCSRBAR_BASE + 0x4ULL)
> > -#define MPC8544_SERIAL0_REGS_BASE  (MPC8544_CCSRBAR_BASE + 0x4500ULL)
> > -#define MPC8544_SERIAL1_REGS_BASE  (MPC8544_CCSRBAR_BASE + 0x4600ULL)
> > -#define MPC8544_PCI_REGS_BASE  (MPC8544_CCSRBAR_BASE + 0x8000ULL)
> > +#define MPC8544_MPIC_REGS_OFFSET   0x4ULL
> > +#define MPC8544_MPIC_REGS_BASE (MPC8544_CCSRBAR_BASE + \
> > +MPC8544_MPIC_REGS_OFFSET) #define
> > +MPC8544_SERIAL0_REGS_OFFSET 0x4500ULL #define
> > +MPC8544_SERIAL0_REGS_BASE  (MPC8544_CCSRBAR_BASE + \
> > +MPC8544_SERIAL0_REGS_OFFSET)
> > +#define MPC8544_SERIAL1_REGS_OFFSET 0x4600ULL #define
> > +MPC8544_SERIAL1_REGS_BASE  (MPC8544_CCSRBAR_BASE + \
> > +MPC8544_SERIAL1_REGS_OFFSET)
> > +#define MPC8544_PCI_REGS_OFFSET0x8000ULL
> > +#define MPC8544_PCI_REGS_BASE  (MPC8544_CCSRBAR_BASE + \
> > +MPC8544_PCI_REGS_OFFSET)
> 
> You don't use any of the bases anymore, right? Please remove the respective
> #define's.

Alex, some of these bases are used in device tree creation code.

Thanks
-Bharat
> 
> The rest of the patch looks very nice.
> 
> 
> Alex
> 
> > #define MPC8544_PCI_REGS_SIZE  0x1000ULL
> > #define MPC8544_PCI_IO 0xE100ULL
> > #define MPC8544_PCI_IOLEN  0x1ULL
> > -#define MPC8544_UTIL_BASE  (MPC8544_CCSRBAR_BASE + 0xeULL)
> > +#define MPC8544_UTIL_OFFSET0xeULL
> > +#define MPC8544_UTIL_BASE  (MPC8544_CCSRBAR_BASE +
> MPC8544_UTIL_OFFSET)
> > #define MPC8544_SPIN_BASE  0xEF00ULL
> >
> > struct boot_info
> > @@ -419,6 +428,8 @@ void ppce500_init(PPCE500Params *params)
> > qemu_irq **irqs, *mpic;
> > DeviceState *dev;
> > CPUPPCState *firstenv = NULL;
> > +MemoryRegion *ccsr;
> > +SysBusDevice *s;
> >
> > /* Setup CPUs */
> > if (params->cpu_model == NULL) {
> > @@ -474,8 +485,12 @@ void ppce500_init(PPCE500Params *params)
> > vmstate_register_ram_global(ram);
> > memory_region_add_subregion(address_space_mem, 0, ram);
> >
> > +ccsr = g_malloc0(sizeof(MemoryRegion));
> > +memory_region_init(ccsr, "e500-cssr", MPC8544_CCSRBAR_SIZE);
> > +memory_region_add_subregion(address_space_mem,
> > + MPC8544_CCSRBAR_BASE, ccsr);
> > +
> > /* MPIC */
> > -mpic = mpic_init(address_space_mem, MPC8544_MPIC_REGS_BASE,
> > +mpic = mpic_init(ccsr, MPC8544_MPIC_REGS_OFFSET,
> >  smp_cpus, irqs, NULL);
> >
> > if (!mpic) {
> > @@ -484,25 +499,33 @@ void ppce500_init(PPCE500Params *params)
> >
> > /* Serial */
> > if (serial_hds[0]) {
> > -serial_mm_init(address_space_mem, MPC8544_SERIAL0_REGS_BASE,
> > +serial_mm_init(ccsr, MPC8544_SERIAL0_REGS_OFFSET,
> >0, mpic[12+26], 399193,
> >serial_hds[0], DEVICE_BIG_ENDIAN);
> > }
> >
> > if (serial_hds[1]) {
> > -serial_mm_init(address_space_mem, MPC8544_SERIAL1_REGS_BASE,
> > +serial_mm_init(ccsr, MPC8544_SERIAL1_REGS_OFFSET,
> >

Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller

2012-10-03 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Wednesday, October 03, 2012 5:41 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel qemu-devel; qemu-...@nongnu.org List; Bhushan Bharat-R65777; 
> Avi
> Kivity
> Subject: Re: [PATCH 2/2] Adding BAR0 for e500 PCI controller
> 
> 
> On 03.10.2012, at 13:50, Bharat Bhushan wrote:
> 
> > PCI Root complex have TYPE-1 configuration header while PCI endpoint
> > have type-0 configuration header. The type-1 configuration header have
> > a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci
> > address space to CCSR address space. This can used for 2 purposes: 1)
> > for MSI interrupt generation 2) Allow CCSR registers access when
> > configured as PCI endpoint, which I am not sure is a use case with QEMU-KVM
> guest.
> >
> > What I observed is that when guest read the size of BAR0 of host
> > controller configuration header (TYPE1 header) then it always reads it
> > as 0. When looking into the QEMU hw/ppce500_pci.c, I do not find the
> > PCI controller device registering BAR0. I do not find any other
> > controller also doing so may they do not use BAR0.
> >
> > There are two issues when BAR0 is not there (which I can think of):
> > 1) There should be BAR0 emulated for PCI Root comaplex (TYPE1 header)
> > and when reading the size of BAR0, it should give size as per real h/w.
> >
> > This patch solves this problem.
> >
> > 2) Do we need this BAR0 inbound address translation?
> >When BAR0 is of non-zero size then it will be configured for
> > PCI address space to local address(CCSR) space translation on inbound 
> > access.
> > The primary use case is for MSI interrupt generation. The device is
> > configured with a address offsets in PCI address space, which will be
> > translated to MSI interrupt generation MPIC registers. Currently I do
> > not understand the MSI interrupt generation mechanism in QEMU and also
> > IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines.
> > But this BAR0 will be used when using MSI on e500.
> >
> > I can see one more issue, There are ATMUs emulated in
> > hw/ppce500_pci.c, but i do not see these being used for address translation.
> > So far that works because pci address space and local address space
> > are 1:1 mapped. BAR0 inbound translation + ATMU translation will
> > complete the address translation of inbound traffic.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > hw/ppc/e500.c|1 +
> > hw/ppce500_pci.c |   13 +
> > 2 files changed, 14 insertions(+), 0 deletions(-)
> >
> > diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index 197411d..c7ae2b6
> > 100644
> > --- a/hw/ppc/e500.c
> > +++ b/hw/ppc/e500.c
> > @@ -518,6 +518,7 @@ void ppce500_init(PPCE500Params *params)
> >
> > /* PCI */
> > dev = qdev_create(NULL, "e500-pcihost");
> > +qdev_prop_set_ptr(dev, "bar0_region", ccsr);
> > qdev_init_nofail(dev);
> > s = sysbus_from_qdev(dev);
> > sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git
> > a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2 100644
> > --- a/hw/ppce500_pci.c
> > +++ b/hw/ppce500_pci.c
> > @@ -87,6 +87,7 @@ struct PPCE500PCIState {
> > /* mmio maps */
> > MemoryRegion container;
> > MemoryRegion iomem;
> > +void *bar0;
> > };
> >
> > typedef struct PPCE500PCIState PPCE500PCIState; @@ -315,6 +316,8 @@
> > static int e500_pcihost_initfn(SysBusDevice *dev)
> > int i;
> > MemoryRegion *address_space_mem = get_system_memory();
> > MemoryRegion *address_space_io = get_system_io();
> > +PCIDevice *pdev;
> > +MemoryRegion bar0;
> >
> > h = PCI_HOST_BRIDGE(dev);
> > s = PPC_E500_PCI_HOST_BRIDGE(dev); @@ -342,6 +345,10 @@ static int
> > e500_pcihost_initfn(SysBusDevice *dev)
> > memory_region_add_subregion(&s->container, PCIE500_REG_BASE, &s->iomem);
> > sysbus_init_mmio(dev, &s->container);
> >
> > +bar0 = *(MemoryRegion *)s->bar0;
> > +pdev = pci_find_device(b, 0, 0);
> > +pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &bar0);
> 
> Any reason you can't just pass in s->bar0 here? Though I'm not sure whether we
> have to do something special to get a memory region alias.

Yes I think this should work:
 pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, (MemoryRegion 
*)s->bar0);

Thanks
-Bharat

> 
> Av

Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller

2012-10-04 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Thursday, October 04, 2012 6:02 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat-
> R65777
> Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
> 
> On 10/03/2012 01:50 PM, Bharat Bhushan wrote:
> >  sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git
> > a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2 100644
> > --- a/hw/ppce500_pci.c
> > +++ b/hw/ppce500_pci.c
> > @@ -87,6 +87,7 @@ struct PPCE500PCIState {
> >  /* mmio maps */
> >  MemoryRegion container;
> >  MemoryRegion iomem;
> > +void *bar0;
> >  };
> 
> void *?  Why?

Passing parameter via qdev is either possible by value or by void *. That's why 
I used void *.

> 
> >
> >  typedef struct PPCE500PCIState PPCE500PCIState; @@ -315,6 +316,8 @@
> > static int e500_pcihost_initfn(SysBusDevice *dev)
> >  int i;
> >  MemoryRegion *address_space_mem = get_system_memory();
> >  MemoryRegion *address_space_io = get_system_io();
> > +PCIDevice *pdev;
> > +MemoryRegion bar0;
> >
> >  h = PCI_HOST_BRIDGE(dev);
> >  s = PPC_E500_PCI_HOST_BRIDGE(dev); @@ -342,6 +345,10 @@ static
> > int e500_pcihost_initfn(SysBusDevice *dev)
> >  memory_region_add_subregion(&s->container, PCIE500_REG_BASE, 
> > &s->iomem);
> >  sysbus_init_mmio(dev, &s->container);
> >
> > +bar0 = *(MemoryRegion *)s->bar0;
> > +pdev = pci_find_device(b, 0, 0);
> > +pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &bar0);
> > +
> 
> This is broken, you're registering an object on the stack which will be freed 
> as
> soon as the function returns.
> 
> Just pass &s->bar0 as Alex suggests.

Ok.

> 
> However this is best done from the pci device's initialization function, not
> here (the MemoryRegion should be a member of the pci device as well).

We want to set BAR0 of PCI controller so we did this here. But yes, we also 
want that all devices under the PCI controller have this mapping, so when they 
access the MPIC register to send MSI then they can do that.

Which device's initialization function you are talking about?

Thanks
-Bharat

> 
> >  return 0;
> >  }
> >
> 
> 
> --
> error compiling committee.c: too many arguments to function





Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller

2012-10-04 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Thursday, October 04, 2012 8:28 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de
> Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
> 
> On 10/04/2012 03:46 PM, Bhushan Bharat-R65777 wrote:
> >
> >
> >> -Original Message-
> >> From: Avi Kivity [mailto:a...@redhat.com]
> >> Sent: Thursday, October 04, 2012 6:02 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de;
> >> Bhushan Bharat-
> >> R65777
> >> Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI
> >> controller
> >>
> >> On 10/03/2012 01:50 PM, Bharat Bhushan wrote:
> >> >  sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git
> >> > a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2 100644
> >> > --- a/hw/ppce500_pci.c
> >> > +++ b/hw/ppce500_pci.c
> >> > @@ -87,6 +87,7 @@ struct PPCE500PCIState {
> >> >  /* mmio maps */
> >> >  MemoryRegion container;
> >> >  MemoryRegion iomem;
> >> > +void *bar0;
> >> >  };
> >>
> >> void *?  Why?
> >
> > Passing parameter via qdev is either possible by value or by void *. That's
> why I used void *.
> 
> Then you shouldn't be using qdev for this.  But if you make the changes below,
> it will likely not be necessary.
> 
> >>
> >> However this is best done from the pci device's initialization
> >> function, not here (the MemoryRegion should be a member of the pci device 
> >> as
> well).
> >
> > We want to set BAR0 of PCI controller so we did this here. But yes, we also
> want that all devices under the PCI controller have this mapping, so when they
> access the MPIC register to send MSI then they can do that.
> 
> Please elaborate.  One way to describe what you want is to issue an 'info 
> mtree'
> and show what changes you want.

Setup is something like this:

  |-|
  | PCI device  |
  | |
  ---
|
|
|
|
 |---|
 |   |
 | PCI controller|
 |   |
 |   --  |
 |   |  BAR0 in   |  |
 |   |   TYPE1|  |
 |   | Config HDR |  |
 |   --  |
 |   |
  ---

BAR0: it is an inbound window, and on e500 devices this maps the pci bus 
address to CCSR address.

My requirement are:
 1) when guest read pci controller BAR0, it is able to get proper size ( Size 
of CCSR space)
 2) Guest can program this to any address in pci address space. When PCI device 
access this address range then that address should be mapped to CCSR address 
space.

Though this patch only met the requirement number (1). 

> 
> >
> > Which device's initialization function you are talking about?
> 
> static const TypeInfo e500_host_bridge_info = {
> .name  = "e500-host-bridge",
> .parent= TYPE_PCI_DEVICE,
> .instance_size = sizeof(PCIDevice),
> .class_init= e500_host_bridge_class_init,
> };
> 
> This needs to describe a derived class of PCIDevice, hold the BAR as a data
> member, and register the BAR in the init function (which doesn't exist yet
> because you haven't subclassed it).  This way the BAR lives in the device 
> which
> exposes it, like BARs everywhere.

Do you mean that we add the "MemoryRegion bar0" in PCIDevice struct. Do the 
same thing that I was doing in e500_pcihost_initfn() in the k->init() (will add 
this) function of "e500-host-bridge"

This way I should be able to met both of my requirements.

Thanks
-Bharat

> 
> --
> error compiling committee.c: too many arguments to function





Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller

2012-10-04 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, October 04, 2012 9:37 PM
> To: Bhushan Bharat-R65777
> Cc: Avi Kivity; qemu-devel@nongnu.org; qemu-...@nongnu.org
> Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
> 
> 
> On 04.10.2012, at 18:03, Bhushan Bharat-R65777 wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Avi Kivity [mailto:a...@redhat.com]
> >> Sent: Thursday, October 04, 2012 8:28 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de
> >> Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI
> >> controller
> >>
> >> On 10/04/2012 03:46 PM, Bhushan Bharat-R65777 wrote:
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Avi Kivity [mailto:a...@redhat.com]
> >>>> Sent: Thursday, October 04, 2012 6:02 PM
> >>>> To: Bhushan Bharat-R65777
> >>>> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de;
> >>>> Bhushan Bharat-
> >>>> R65777
> >>>> Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI
> >>>> controller
> >>>>
> >>>> On 10/03/2012 01:50 PM, Bharat Bhushan wrote:
> >>>>> sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git
> >>>>> a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2
> >>>>> 100644
> >>>>> --- a/hw/ppce500_pci.c
> >>>>> +++ b/hw/ppce500_pci.c
> >>>>> @@ -87,6 +87,7 @@ struct PPCE500PCIState {
> >>>>> /* mmio maps */
> >>>>> MemoryRegion container;
> >>>>> MemoryRegion iomem;
> >>>>> +void *bar0;
> >>>>> };
> >>>>
> >>>> void *?  Why?
> >>>
> >>> Passing parameter via qdev is either possible by value or by void *.
> >>> That's
> >> why I used void *.
> >>
> >> Then you shouldn't be using qdev for this.  But if you make the
> >> changes below, it will likely not be necessary.
> >>
> >>>>
> >>>> However this is best done from the pci device's initialization
> >>>> function, not here (the MemoryRegion should be a member of the pci
> >>>> device as
> >> well).
> >>>
> >>> We want to set BAR0 of PCI controller so we did this here. But yes,
> >>> we also
> >> want that all devices under the PCI controller have this mapping, so
> >> when they access the MPIC register to send MSI then they can do that.
> >>
> >> Please elaborate.  One way to describe what you want is to issue an 'info
> mtree'
> >> and show what changes you want.
> >
> > Setup is something like this:
> >
> >  |-|
> >  | PCI device  |
> >  | |
> >  ---
> >|
> >|
> >|
> >|
> > |---|
> > |   |
> > | PCI controller|
> > |   |
> > |   --  |
> > |   |  BAR0 in   |  |
> > |   |   TYPE1|  |
> > |   | Config HDR |  |
> > |   --  |
> > |   |
> >  ---
> >
> > BAR0: it is an inbound window, and on e500 devices this maps the pci bus
> address to CCSR address.
> >
> > My requirement are:
> > 1) when guest read pci controller BAR0, it is able to get proper size
> > ( Size of CCSR space)
> > 2) Guest can program this to any address in pci address space. When PCI 
> > device
> access this address range then that address should be mapped to CCSR address
> space.
> >
> > Though this patch only met the requirement number (1).
> 
> No, it also meets (2). The PCI address space is identical to the CPU memory
> space in our mapping right now. So if the guest maps BAR0 somewhere, it
> automatically maps CCSR into CPU address space, which exposes it to PCI 
> address
> space.

Really? I think on powerpc the pci address space is defined as: it maps the 
outbound window just below 0x1__, then CCSR and then inbound window. So 
inbound window is 1:1 map if guest physical starts from 0x0. But I do not think 
CCSR is 1:1 map in pci address space and cpu address space. 

> 
> >
> >>
> >>>
> >>> Which device's initialization

Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller

2012-10-05 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, October 04, 2012 9:37 PM
> To: Bhushan Bharat-R65777
> Cc: Avi Kivity; qemu-devel@nongnu.org; qemu-...@nongnu.org
> Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller
> 
> 
> On 04.10.2012, at 18:03, Bhushan Bharat-R65777 wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Avi Kivity [mailto:a...@redhat.com]
> >> Sent: Thursday, October 04, 2012 8:28 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de
> >> Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI
> >> controller
> >>
> >> On 10/04/2012 03:46 PM, Bhushan Bharat-R65777 wrote:
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Avi Kivity [mailto:a...@redhat.com]
> >>>> Sent: Thursday, October 04, 2012 6:02 PM
> >>>> To: Bhushan Bharat-R65777
> >>>> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de;
> >>>> Bhushan Bharat-
> >>>> R65777
> >>>> Subject: Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI
> >>>> controller
> >>>>
> >>>> On 10/03/2012 01:50 PM, Bharat Bhushan wrote:
> >>>>> sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]); diff --git
> >>>>> a/hw/ppce500_pci.c b/hw/ppce500_pci.c index 92b1dc0..16e4af2
> >>>>> 100644
> >>>>> --- a/hw/ppce500_pci.c
> >>>>> +++ b/hw/ppce500_pci.c
> >>>>> @@ -87,6 +87,7 @@ struct PPCE500PCIState {
> >>>>> /* mmio maps */
> >>>>> MemoryRegion container;
> >>>>> MemoryRegion iomem;
> >>>>> +void *bar0;
> >>>>> };
> >>>>
> >>>> void *?  Why?
> >>>
> >>> Passing parameter via qdev is either possible by value or by void *.
> >>> That's
> >> why I used void *.
> >>
> >> Then you shouldn't be using qdev for this.  But if you make the
> >> changes below, it will likely not be necessary.
> >>
> >>>>
> >>>> However this is best done from the pci device's initialization
> >>>> function, not here (the MemoryRegion should be a member of the pci
> >>>> device as
> >> well).
> >>>
> >>> We want to set BAR0 of PCI controller so we did this here. But yes,
> >>> we also
> >> want that all devices under the PCI controller have this mapping, so
> >> when they access the MPIC register to send MSI then they can do that.
> >>
> >> Please elaborate.  One way to describe what you want is to issue an 'info
> mtree'
> >> and show what changes you want.
> >
> > Setup is something like this:
> >
> >  |-|
> >  | PCI device  |
> >  | |
> >  ---
> >|
> >|
> >|
> >|
> > |---|
> > |   |
> > | PCI controller|
> > |   |
> > |   --  |
> > |   |  BAR0 in   |  |
> > |   |   TYPE1|  |
> > |   | Config HDR |  |
> > |   --  |
> > |   |
> >  ---
> >
> > BAR0: it is an inbound window, and on e500 devices this maps the pci bus
> address to CCSR address.
> >
> > My requirement are:
> > 1) when guest read pci controller BAR0, it is able to get proper size
> > ( Size of CCSR space)
> > 2) Guest can program this to any address in pci address space. When PCI 
> > device
> access this address range then that address should be mapped to CCSR address
> space.
> >
> > Though this patch only met the requirement number (1).
> 
> No, it also meets (2). The PCI address space is identical to the CPU memory
> space in our mapping right now. So if the guest maps BAR0 somewhere, it
> automatically maps CCSR into CPU address space, which exposes it to PCI 
> address
> space.
> 
> >
> >>
> >>>
> >>> Which device's initialization function you are talking about?
> >>
> >> static const TypeInfo e500_host_bridge_info = {
> >>.name  = "e500-host-bridge",
> >>.parent= TYPE_PCI_DEVICE,
> >>.instance_size = sizeof(PCIDevice),
> >>.cl

Re: [Qemu-devel] [PATCH 2/2] Adding BAR0 for e500 PCI controller

2012-10-08 Thread Bhushan Bharat-R65777
> > Which device's initialization function you are talking about?
> 
>  static const TypeInfo e500_host_bridge_info = {
>    .name  = "e500-host-bridge",
>    .parent= TYPE_PCI_DEVICE,
>    .instance_size = sizeof(PCIDevice),
>    .class_init= e500_host_bridge_class_init,
>  };
> 
>  This needs to describe a derived class of PCIDevice, hold the BAR
>  as a data member, and register the BAR in the init function (which
>  doesn't exist yet because you haven't subclassed it).  This way the
>  BAR lives in the device which exposes it, like BARs everywhere.
> >>>
> >>> Do you mean that we add the "MemoryRegion bar0" in PCIDevice struct.
> >>> Do the
> >> same thing that I was doing in e500_pcihost_initfn() in the k->init()
> >> (will add
> >> this) function of "e500-host-bridge"
> >>
> >> No, he means that you create a new struct like this:
> >>
> >>  struct foo {
> >>PCIDevice p;
> >>MemoryRegion bar0;
> >>  };
> >>
> >> Please check out any other random PCI device in QEMU. Almost all of
> >> them do this to store more information than their parent class can hold.
> >
> > Just want to be sure I understood you correctly: Do you mean something
> > like this : ( I know I have to switch to QOM mechanism to share
> > parameters)
> >
> > diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c index
> > 92b1dc0..a948bc6 100644
> > --- a/hw/ppce500_pci.c
> > +++ b/hw/ppce500_pci.c
> > @@ -89,6 +89,12 @@ struct PPCE500PCIState {
> > MemoryRegion iomem;
> > };
> >
> > +struct BHARAT {
> > +PCIDevice p;
> > +void *bar0;
> 
> MemoryRegion *bar0

If I uses " MemoryRegion *bar0"  or " MemoryRegion bar0" then how the size and 
address of bar0 will be populated?

As far as I can see, other PCI devices using this way to have extra information 
but they does not get MemoryRegion information from other object.

> 
> > +};
> > +
> > +typedef struct BHARAT bharat;
> > typedef struct PPCE500PCIState PPCE500PCIState;
> >
> > static uint64_t pci_reg_read4(void *opaque, target_phys_addr_t addr,
> > @@ -307,6 +313,16 @@ static const VMStateDescription
> > vmstate_ppce500_pci = {
> >
> > #include "exec-memory.h"
> >
> > +static int e500_pcihost_bridge_initfn(PCIDevice *d) {
> > +bharat *b = DO_UPCAST(bharat, p, d);
> > +
> > +printf("Addr = %llx, size = %llx\n", ((MemoryRegion *)b->bar0)->addr,
> (unsigned long long)int128_get64(((Me
> > +pci_register_bar(d, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
> > + (MemoryRegion *)b->bar0);
> 
> That one still has to call its parent initfn, no?

I am really sorry, but I did not get. The object says its parent is 
TYPE_PCI_DEVICE, so which function are you talking about?

> 
> > +return 0;
> > +}
> > +
> > +MemoryRegion ccsr;
> > static int e500_pcihost_initfn(SysBusDevice *dev) {
> > PCIHostState *h;
> > @@ -315,6 +331,7 @@ static int e500_pcihost_initfn(SysBusDevice *dev)
> > int i;
> > MemoryRegion *address_space_mem = get_system_memory();
> > MemoryRegion *address_space_io = get_system_io();
> > +PCIDevice *d;
> >
> > h = PCI_HOST_BRIDGE(dev);
> > s = PPC_E500_PCI_HOST_BRIDGE(dev); @@ -328,7 +345,12 @@ static int
> > e500_pcihost_initfn(SysBusDevice *dev)
> >  address_space_io, PCI_DEVFN(0x11, 0), 4);
> > h->bus = b;
> >
> > -pci_create_simple(b, 0, "e500-host-bridge");
> > +d = pci_create(b, 0, "e500-host-bridge");
> > +/* ccsr-> should be passed from hw/ppc/e500.c */
> > +ccsr.addr = 0xE000;
> > +ccsr.size = int128_make64(0x0010);
> 
> What is this?

I am trying to pass the CCSR information to "e500-host-bridge". This is 
currently hardcoded, but finally this will come from hw/ppc/e500.c.

Thanks
-Bharat

> 
> 
> Alex
> 
> > +qdev_prop_set_ptr(&d->qdev, "bar0_region", &ccsr);
> > +qdev_init_nofail(&d->qdev);
> >
> > memory_region_init(&s->container, "pci-container", PCIE500_ALL_SIZE);
> > memory_region_init_io(&h->conf_mem, &pci_host_conf_be_ops, h, @@
> > -345,11 +367,18 @@ static int e500_pcihost_initfn(SysBusDevice *dev)
> > return 0;
> > }
> >
> > +static Property pci_host_dev_info[] = {
> > +DEFINE_PROP_PTR("bar0_region", bharat, bar0),
> > +DEFINE_PROP_END_OF_LIST(),
> > +};
> > +
> > static void e500_host_bridge_class_init(ObjectClass *klass, void
> > *data) {
> > DeviceClass *dc = DEVICE_CLASS(klass);
> > PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> >
> > +k->init = e500_pcihost_bridge_initfn;
> > +dc->props = pci_host_dev_info;
> > k->vendor_id = PCI_VENDOR_ID_FREESCALE;
> > k->device_id = PCI_DEVICE_ID_MPC8533E;
> > k->class_id = PCI_CLASS_PROCESSOR_POWERPC; @@ -359,10 +388,11 @@
> > static void e500_host_bridge_class_init(ObjectClass *klass, void
> > *data) static const TypeInfo e500_host_bridge_info = {
> > .name  = "e500-host-bridge",
> > .parent= TYPE_PCI_DEVICE,
> > -.instance_size = sizeof(PCIDevice),
> > +.instan

Re: [Qemu-devel] [PATCH 3/3] Adding BAR0 for e500 PCI controller

2012-10-08 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Tuesday, October 09, 2012 1:54 AM
> To: Alexander Graf
> Cc: Andreas Färber; Bhushan Bharat-R65777; Avi Kivity; qemu-...@nongnu.org;
> qemu-devel@nongnu.org; Bhushan Bharat-R65777
> Subject: Re: [Qemu-devel] [PATCH 3/3] Adding BAR0 for e500 PCI controller
> 
> On 10/08/2012 02:04:43 PM, Alexander Graf wrote:
> >
> > On 08.10.2012, at 20:00, Andreas Färber wrote:
> >
> > > Am 08.10.2012 18:46, schrieb Bharat Bhushan:
> > >> #define BINARY_DEVICE_TREE_FILE"mpc8544ds.dtb"
> > >> #define UIMAGE_LOAD_BASE   0
> > >> -#define DTC_LOAD_PAD   0x180
> > >> +#define DTC_LOAD_PAD   0x50
> > >> #define DTC_PAD_MASK   0xF
> > >> #define INITRD_LOAD_PAD0x200
> > >> #define INITRD_PAD_MASK0xFF
> > >
> > > Was this change intentional? I don't see it being used here, and
> > commit
> > > message doesn't seem to mention it.
> >
> > I'd assume he tried to work around a bug I fixed in between. But this
> > change definitely is not intentional.
> 
> It looks like an accidental revert of
> http://patchwork.ozlabs.org/patch/179475/

Oops, it was not intended to be sent as patch. It was not accidental because 
without this revert I was not able to boot my guest.

Thanks
-Bharat





Re: [Qemu-devel] [PATCH 2/3] e500: Adding CCSR memory region

2012-10-09 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Tuesday, October 09, 2012 2:35 PM
> To: Andreas Färber
> Cc: Bhushan Bharat-R65777; ag...@suse.de; qemu-...@nongnu.org; qemu-
> de...@nongnu.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 2/3] e500: Adding CCSR memory region
> 
> On 10/08/2012 07:21 PM, Andreas Färber wrote:
> > Am 08.10.2012 18:46, schrieb Bharat Bhushan:
> >> All devices are also placed under CCSR memory region.
> >> The CCSR memory region is exported to pci device. The MSI interrupt
> >> generation is the main reason to export the CCSR region to PCI device.
> >> This put the requirement to move mpic under CCSR region, but
> >> logically all devices should be under CCSR. So this patch places all
> >> emulated devices under ccsr region.
> >>
> >> +sysbus_connect_irq(s, 0, mpic[pci_irq_nrs[0]]);
> >> +sysbus_connect_irq(s, 1, mpic[pci_irq_nrs[1]]);
> >> +sysbus_connect_irq(s, 2, mpic[pci_irq_nrs[2]]);
> >> +sysbus_connect_irq(s, 3, mpic[pci_irq_nrs[3]]);
> >> +memory_region_add_subregion(ccsr, MPC8544_PCI_REGS_OFFSET,
> >> + s->mmio[0].memory);
> >
> > ... I wonder if fiddling with SysBus MMIO is a good idea.
> > s->mmio[0].addr is not getting assigned this way, which is checked as
> > condition for deleting the subregion. But sysbus_mmio_map() only adds
> > to / deletes from get_system_memory().
> > The alternative would be using a custom field rather than the
> > SysBus-internal one. Avi/Alex?
> 
> IMO yes.  Or not use sysbus at all.

What about adding a API:
void sysbus_mmio_map_to_mr(SysBusDevice *dev, int n, target_phys_addr_t addr,
   MemoryRegion *mr)
{
assert(n >= 0 && n < dev->num_mmio);

if (dev->mmio[n].addr == addr) {
/* ??? region already mapped here.  */
return;
}
if (dev->mmio[n].addr != (target_phys_addr_t)-1) {
/* Unregister previous mapping.  */
memory_region_del_subregion(mr, dev->mmio[n].memory);
}
dev->mmio[n].addr = addr;
memory_region_add_subregion(mr, addr, dev->mmio[n].memory);
}

Thanks
-Bharat

> 
> 
> --
> error compiling committee.c: too many arguments to function





Re: [Qemu-devel] [PATCH 2/3] e500: Adding CCSR memory region

2012-10-09 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Tuesday, October 09, 2012 10:24 PM
> To: Bhushan Bharat-R65777
> Cc: Andreas Färber; ag...@suse.de; qemu-...@nongnu.org; qemu-devel@nongnu.org
> Subject: Re: [PATCH 2/3] e500: Adding CCSR memory region
> 
> On 10/09/2012 06:45 PM, Bhushan Bharat-R65777 wrote:
> >
> > What about adding a API:
> > void sysbus_mmio_map_to_mr(SysBusDevice *dev, int n, target_phys_addr_t 
> > addr,
> >MemoryRegion *mr) {
> > assert(n >= 0 && n < dev->num_mmio);
> >
> > if (dev->mmio[n].addr == addr) {
> > /* ??? region already mapped here.  */
> > return;
> > }
> > if (dev->mmio[n].addr != (target_phys_addr_t)-1) {
> > /* Unregister previous mapping.  */
> > memory_region_del_subregion(mr, dev->mmio[n].memory);
> > }
> > dev->mmio[n].addr = addr;
> > memory_region_add_subregion(mr, addr, dev->mmio[n].memory); }
> >
> 
> I think you can just use sysbus_mmio_get_region().  There are plenty of other
> users, so there's precedent.

You mean something like this : memory_region_add_subregion(mr, addr, 
sysbus_mmio_get_region(dev, 0);

Ok, but this will still not resolve the issue of not setting the 
"dev->mmio[n].addr", no ?

Thanks
-Bharat

> 
> 
> --
> error compiling committee.c: too many arguments to function





Re: [Qemu-devel] [PATCH 2/3] e500: Adding CCSR memory region

2012-10-09 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Avi Kivity [mailto:a...@redhat.com]
> Sent: Tuesday, October 09, 2012 10:31 PM
> To: Bhushan Bharat-R65777
> Cc: Andreas Färber; ag...@suse.de; qemu-...@nongnu.org; qemu-devel@nongnu.org
> Subject: Re: [PATCH 2/3] e500: Adding CCSR memory region
> 
> On 10/09/2012 06:57 PM, Bhushan Bharat-R65777 wrote:
> >
> >
> >> -Original Message-
> >> From: Avi Kivity [mailto:a...@redhat.com]
> >> Sent: Tuesday, October 09, 2012 10:24 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: Andreas Färber; ag...@suse.de; qemu-...@nongnu.org;
> >> qemu-devel@nongnu.org
> >> Subject: Re: [PATCH 2/3] e500: Adding CCSR memory region
> >>
> >> On 10/09/2012 06:45 PM, Bhushan Bharat-R65777 wrote:
> >> >
> >> > What about adding a API:
> >> > void sysbus_mmio_map_to_mr(SysBusDevice *dev, int n, target_phys_addr_t
> addr,
> >> >MemoryRegion *mr) {
> >> > assert(n >= 0 && n < dev->num_mmio);
> >> >
> >> > if (dev->mmio[n].addr == addr) {
> >> > /* ??? region already mapped here.  */
> >> > return;
> >> > }
> >> > if (dev->mmio[n].addr != (target_phys_addr_t)-1) {
> >> > /* Unregister previous mapping.  */
> >> > memory_region_del_subregion(mr, dev->mmio[n].memory);
> >> > }
> >> > dev->mmio[n].addr = addr;
> >> > memory_region_add_subregion(mr, addr, dev->mmio[n].memory); }
> >> >
> >>
> >> I think you can just use sysbus_mmio_get_region().  There are plenty
> >> of other users, so there's precedent.
> >
> > You mean something like this : memory_region_add_subregion(mr, addr,
> > sysbus_mmio_get_region(dev, 0);
> >
> > Ok, but this will still not resolve the issue of not setting the "dev-
> >mmio[n].addr", no ?
> 
> Correct.  But there are 20 uses already, so it can't matter much.  If someone
> wants to fix them, they can write a new API and do a sweep.
> 
> But really, sysbus just needs to go away.  It's pointless to give it more and
> more features.

Ok

Thanks
-Bharat

> 
> 
> --
> error compiling committee.c: too many arguments to function





Re: [Qemu-devel] [PATCH 2/2 v2] Adding BAR0 for e500 PCI controller

2012-10-09 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Wednesday, October 10, 2012 4:08 AM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; 
> afaer...@suse.de;
> Bhushan Bharat-R65777
> Subject: Re: [Qemu-devel] [PATCH 2/2 v2] Adding BAR0 for e500 PCI controller
> 
> On 10/09/2012 01:19:10 PM, Bharat Bhushan wrote:
> > +static int e500_ccsr_initfn(SysBusDevice *dev) {
> > +PPCE500CCSRState *pci_ccsr;
> > +
> > +pci_ccsr = CCSR(dev);
> > +memory_region_init(&pci_ccsr->ccsr_space, "e500-ccsr",
> > +   MPC8544_CCSRBAR_SIZE);
> > +return 0;
> > +}
> 
> Is this object supposed to represent CCSR (which is what the type name seems 
> to
> imply, along with the existence of a different
> PPCE500PCIBridgeState) or PCI BAR0 (which is what pci_ccsr seems to imply, 
> along
> with the fact that it's being added in the PCI patch)?

It is ccsr, I will correct this naming.

Thanks
-Bharat





Re: [Qemu-devel] Isuue assiging devices using VFIO on x86

2012-08-29 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, August 29, 2012 12:16 PM
> To: Bhushan Bharat-R65777
> Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org
> Subject: Re: Isuue assiging devices using VFIO on x86
> 
> On Tue, 2012-08-28 at 17:10 +0000, Bhushan Bharat-R65777 wrote:
> >
> > > -Original Message-
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Tuesday, August 28, 2012 9:27 PM
> > > To: Bhushan Bharat-R65777
> > > Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org
> > > Subject: Re: Isuue assiging devices using VFIO on x86
> > >
> > > On Tue, 2012-08-28 at 09:23 +, Bhushan Bharat-R65777 wrote:
> > > > Hi Alex,
> > > >
> > > > In my susyem I have following devices:
> > > >
> > > > I tried assigning a following PCI devices:
> > > > 00:03.0 Communication controller: Intel Corporation 4 Series
> > > > Chipset HECI
> > > Controller (rev 03)
> > > > 00:03.2 IDE interface: Intel Corporation 4 Series Chipset PT IDER
> > > > Controller
> > > (rev 03)
> > > > 00:03.3 Serial controller: Intel Corporation 4 Series Chipset
> > > > Serial KT
> > > Controller (rev 03)
> > > >
> > > > and getting below error:
> > > >
> > > > ---
> > > > Command:
> > > > ---
> > > > qemu-system-x86_64 -enable-kvm  -nographic  -kernel
> > > > /boot/vmlinuz-3.6.0-rc2+ -
> > > initrd /boot/initramfs-3.6.0-rc2+.img -append "root=/dev/sda1" -m
> > > 1024 -drive
> > > file=/home/kvmdev/debian_squeeze_amd64_standard.qcow2 -device vfio-
> > > pci,host=:00:03.0 -device vfio-pci,host=:00:03.2 -device
> > > vfio-
> > > pci,host=:00:03.3
> > > >
> > > > Error prints:
> > > > 
> > > > qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Warning,
> > > > device
> > > :00:03.0 does not support reset
> > > >
> > > > qemu-system-x86_64: -device vfio-pci,host=:00:03.0: VFIO
> > > > :00:03.0 BAR
> > > 0 is too small to mmap, this may affect performance.
> > > >
> > > > qemu-system-x86_64: -device vfio-pci,host=:00:03.2: Warning,
> > > > device
> > > :00:03.2 does not support reset
> > > >
> > > > qemu-system-x86_64: -device vfio-pci,host=:00:03.3: Warning,
> > > > device
> > > :00:03.3 does not support reset
> > > >
> > > > qemu: hardware error: register_ioport_read: invalid opaque for
> > > > address 0x3f6 CPU #0:
> > > > EAX=8003 EBX=3ffe0e80 ECX=8003 EDX=0cfc ESI=2800
> > > > EDI=80002804 EBP=6fc0 ESP=6f38
> > > > EIP=3ffec005 EFL=0086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > > > ES =0010   00c09300 DPL=0 DS   [-WA]
> > > > CS =0008   00c09b00 DPL=0 CS32 [-RA]
> > > > SS =0010   00c09300 DPL=0 DS   [-WA]
> > > > DS =0010   00c09300 DPL=0 DS   [-WA]
> > > > FS =0010   00c09300 DPL=0 DS   [-WA]
> > > > GS =0010   00c09300 DPL=0 DS   [-WA]
> > > > LDT=   8200 DPL=0 LDT TR = 
> > > >  8b00 DPL=0 TSS32-busy
> > > > GDT= 000fcd68 0037
> > > > IDT= 000fdb60 
> > > > CR0=0011 CR2= CR3= CR4=
> > > > DR0= DR1= DR2=
> > > DR3=
> > > > DR6=0ff0 DR7=0400 EFER=
> > > > FCW=037f FSW= [ST=0] FTW=00 MXCSR=1f80
> > > > FPR0=  FPR1= 
> > > > FPR2=  FPR3= 
> > > > FPR4=  FPR5= 
> > > > FPR6=  FPR7= 
> > > > XMM00=
> > > > XMM01=
> > > > XMM02=
> > > > XMM03=
> > > > XMM04=
> > > > XMM05=
> > > &

Re: [Qemu-devel] Isuue assiging devices using VFIO on x86

2012-08-29 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, August 29, 2012 8:44 PM
> To: Bhushan Bharat-R65777
> Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org
> Subject: Re: Isuue assiging devices using VFIO on x86
> 
> On Wed, 2012-08-29 at 07:11 +0000, Bhushan Bharat-R65777 wrote:
> >
> > > -Original Message-
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Wednesday, August 29, 2012 12:16 PM
> > > To: Bhushan Bharat-R65777
> > > Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org
> > > Subject: Re: Isuue assiging devices using VFIO on x86
> > >
> > > On Tue, 2012-08-28 at 17:10 +, Bhushan Bharat-R65777 wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > Sent: Tuesday, August 28, 2012 9:27 PM
> > > > > To: Bhushan Bharat-R65777
> > > > > Cc: k...@vger.kernel.org; Avi Kivity; qemu-devel@nongnu.org
> > > > > Subject: Re: Isuue assiging devices using VFIO on x86
> > > > >
> > > > > On Tue, 2012-08-28 at 09:23 +, Bhushan Bharat-R65777 wrote:
> > > > > > Hi Alex,
> > > > > >
> > > > > > In my susyem I have following devices:
> > > > > >
> > > > > > I tried assigning a following PCI devices:
> > > > > > 00:03.0 Communication controller: Intel Corporation 4 Series
> > > > > > Chipset HECI
> > > > > Controller (rev 03)
> > > > > > 00:03.2 IDE interface: Intel Corporation 4 Series Chipset PT
> > > > > > IDER Controller
> > > > > (rev 03)
> > > > > > 00:03.3 Serial controller: Intel Corporation 4 Series Chipset
> > > > > > Serial KT
> > > > > Controller (rev 03)
> > Hello Alex,
> > I am using " ./configure --target-list=x86_64-softmmu --enable-kvm " .
> > QEMU repo is at:
> > commit 3b4672f6614e07f9dc0a85a12a8c89d480a2493c
> > Author: Alex Williamson 
> > Date:   Tue Aug 14 14:07:19 2012 -0600
> >
> > vfio: Enable vfio-pci and mark supported
> >
> > Signed-off-by: Alex Williamson 
> >
> > Linux Repo is at :
> > commit d9875690d9b89a866022ff49e3fcea892345ad92
> > Author: Linus Torvalds 
> > Date:   Thu Aug 16 14:51:24 2012 -0700
> >
> > Linux 3.6-rc2
> >
> > ---
> >
> > Then I bound 3 pci devices with VFIO and run below command to launch
> > VM
> > qemu-system-x86_64 -enable-kvm  -nographic  -kernel
> > /boot/vmlinuz-3.6.0-rc2+ -initrd /boot/initramfs-3.6.0-rc2+.img
> > -append "root=/dev/sda1" -m 1024 -drive
> > file=/home/kvmdev/debian_squeeze_amd64_standard.qcow2 -device
> > vfio-pci,host=:00:03.0 -device vfio-pci,host=:00:03.2 -device
> > vfio-pci,host=:00:03.3
> >
> > Then I was getting below errors:
> > qemu-system-x86_64: -device vfio-pci,host=:00:03.0: vfio: failed
> > to set iommu for container: Operation not permitted
> >
> > qemu-system-x86_64: -device vfio-pci,host=:00:03.0: vfio: failed
> > to setup container for group 2
> >
> > qemu-system-x86_64: -device vfio-pci,host=:00:03.0: vfio: failed
> > to get group 2
> >
> > qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Device
> > 'vfio-pci' could not be initialized
> > -
> >
> > Then I set
> > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts = 1
> >
> > And getting below error:
> >
> > qemu-system-x86_64: -device vfio-pci,host=:00:03.0: Warning,
> > device :00:03.0 does not support reset
> >
> > qemu-system-x86_64: -device vfio-pci,host=:00:03.2: Warning,
> > device :00:03.2 does not support reset
> >
> > qemu-system-x86_64: -device vfio-pci,host=:00:03.3: Warning,
> > device :00:03.3 does not support reset
> >
> > qemu: hardware error: register_ioport_read: invalid opaque for address
> > 0x3f6 CPU #0:
> > EAX=8003 EBX=3ffe0e80 ECX=8003 EDX=0cfc ESI=2800
> > EDI=80002804 EBP=6fc0 ESP=6f38
> > EIP=3ffec005 EFL=0086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =0010   00c09300 DPL=0 DS   [-WA]
> > CS =0008   00c09b00 DPL=0 CS32 [-RA]
> > SS =0010   00c09300 DPL=0 DS   [-WA]
> > DS =0010 

Re: [Qemu-devel] [Qemu-ppc] [PATCH: RFC] Adding BAR0 for e500 PCI controller

2012-09-02 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Wednesday, August 15, 2012 6:59 AM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat-
> R65777
> Subject: Re: [Qemu-ppc] [PATCH: RFC] Adding BAR0 for e500 PCI controller
> 
> On 08/14/2012 07:50 AM, Bharat Bhushan wrote:
> > PCI Root complex have TYPE-1 configuration header while PCI endpoint
> > have type-0 configuration header. The type-1 configuration header have
> > a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci
> > address space to CCSR address space. This can used for 2 purposes: 1)
> > for MSI interrupt generation 2) Allow CCSR registers access when
> > configured as PCI endpoint, which I am not sure is a use case with QEMU-KVM
> guest.
> >
> > What I observed is that when guest read the size of BAR0 of host
> > controller configuration header (TYPE1 header) then it always reads it
> > as 0. When looking into the QEMU hw/ppce500_pci.c, I do not find the
> > PCI controller device registering BAR0. I do not find any other
> > controller also doing so may they do not use BAR0.
> >
> > There are two issues when BAR0 is not there (which I can think of):
> > 1) There should be BAR0 emulated for PCI Root comaplex (TYPE1 header)
> > and when reading the size of BAR0, it should give size as per real h/w.
> >
> > 2) Do we need this BAR0 inbound address translation?
> > When BAR0 is of non-zero size then it will be configured for PCI
> > address space to local address(CCSR) space translation on inbound access.
> > The primary use case is for MSI interrupt generation. The device is
> > configured with a address offsets in PCI address space, which will be
> > translated to MSI interrupt generation MPIC registers. Currently I do
> > not understand the MSI interrupt generation mechanism in QEMU and also
> > IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines.
> > But this BAR0 will be used when using MSI on e500.
> 
> This patch is only trying to address #1, right?  I don't see any connection 
> from
> this BAR to CCSR.
> 
> > +memory_region_init_io(&h->bar0, &pci_host_conf_be_ops, h,
> > +  "PCIHOST-bar0", 0x100);
> 
> 0x0100 is correct for e500mc-based systems, but it should be 0x0010 
> for
> e500v2-based systems.

Scott,

Currently we have a generic e500 machine which have CCSR size 0x0010 
(MPC8544_CCSRBAR_SIZE). We do not have e500mc and e500v2 machines. So should we 
make this 0x0010 as per generic e500 machine?

Can we somehow pass this via qdev/varargs from machine emulation code 
(hw/ppc/e500.c) ?

Thanks
-Bharat

> 
> -Scott



Re: [Qemu-devel] [PATCH] Adding BAR0 for e500 PCI controller

2012-09-19 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Wednesday, September 19, 2012 4:33 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; afaer...@suse.de; Bhushan
> Bharat-R65777
> Subject: Re: [PATCH] Adding BAR0 for e500 PCI controller
> 
> 
> On 19.09.2012, at 09:41, Bharat Bhushan wrote:
> 
> > PCI Root complex have TYPE-1 configuration header while PCI endpoint
> > have type-0 configuration header. The type-1 configuration header have
> > a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci
> > address space to CCSR address space. This can used for 2 purposes: 1)
> > for MSI interrupt generation 2) Allow CCSR registers access when
> > configured as PCI endpoint, which I am not sure is a use case with QEMU-KVM
> guest.
> >
> > What I observed is that when guest read the size of BAR0 of host
> > controller configuration header (TYPE1 header) then it always reads it
> > as 0. When looking into the QEMU hw/ppce500_pci.c, I do not find the
> > PCI controller device registering BAR0. I do not find any other
> > controller also doing so may they do not use BAR0.
> >
> > There are two issues when BAR0 is not there (which I can think of):
> > 1) There should be BAR0 emulated for PCI Root comaplex (TYPE1 header)
> > and when reading the size of BAR0, it should give size as per real h/w.
> >
> > This patch solves this problem.
> >
> > 2) Do we need this BAR0 inbound address translation?
> >When BAR0 is of non-zero size then it will be configured for
> > PCI address space to local address(CCSR) space translation on inbound 
> > access.
> > The primary use case is for MSI interrupt generation. The device is
> > configured with a address offsets in PCI address space, which will be
> > translated to MSI interrupt generation MPIC registers. Currently I do
> > not understand the MSI interrupt generation mechanism in QEMU and also
> > IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines.
> > But this BAR0 will be used when using MSI on e500.
> >
> > I can see one more issue, There are ATMUs emulated in
> > hw/ppce500_pci.c, but i do not see these being used for address translation.
> > So far that works because pci address space and local address space
> > are 1:1 mapped. BAR0 inbound translation + ATMU translation will
> > complete the address translation of inbound traffic.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > hw/pci_host.h|9 +
> > hw/ppc/e500.c|   21 +
> > hw/ppce500_pci.c |7 +++
> > 3 files changed, 37 insertions(+), 0 deletions(-)
> >
> > diff --git a/hw/pci_host.h b/hw/pci_host.h index 4b9c300..c1ec7eb
> > 100644
> > --- a/hw/pci_host.h
> > +++ b/hw/pci_host.h
> > @@ -41,10 +41,19 @@ struct PCIHostState {
> > MemoryRegion data_mem;
> > MemoryRegion mmcfg;
> > MemoryRegion *address_space;
> > +MemoryRegion bar0;
> > uint32_t config_reg;
> > PCIBus *bus;
> > };
> >
> > +typedef struct PPCE500CCSRState {
> > +SysBusDevice *parent;
> > +MemoryRegion ccsr_space;
> > +} PPCE500CCSRState;
> > +
> > +#define TYPE_CCSR "e500-ccsr"
> > +#define CCSR(obj) OBJECT_CHECK(PPCE500CCSRState, (obj), TYPE_CCSR)
> > +
> 
> All of this is e500 specific, so it should definitely not be in any generic 
> pci
> host header.

Yes, ok.

> 
> > /* common internal helpers for PCI/PCIe hosts, cut off overflows */
> > void pci_host_config_write_common(PCIDevice *pci_dev, uint32_t addr,
> >   uint32_t limit, uint32_t val,
> > uint32_t len); diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index
> > 6f0de6d..1f5da28 100644
> > --- a/hw/ppc/e500.c
> > +++ b/hw/ppc/e500.c
> > @@ -31,6 +31,7 @@
> > #include "hw/loader.h"
> > #include "elf.h"
> > #include "hw/sysbus.h"
> > +#include "hw/pci_host.h"
> > #include "exec-memory.h"
> > #include "host-utils.h"
> >
> > @@ -587,3 +588,23 @@ void ppce500_init(PPCE500Params *params)
> > kvmppc_init();
> > }
> > }
> > +
> > +static void e500_ccsr_type_init(Object *obj) {
> > +PPCE500CCSRState *ccsr = CCSR(obj);
> > +ccsr->ccsr_space.size = int128_make64(MPC8544_CCSRBAR_SIZE);
> > +}
> > +
> > +static const TypeInfo e500_ccsr_info = {
> > +.name  = TYPE_CCSR,
> > +.pare

Re: [Qemu-devel] RFC: vfio API changes needed for powerpc (v3)

2013-04-09 Thread Bhushan Bharat-R65777

So now the sequence would be something like:

1)VFIO_GROUP_SET_CONTAINER // add groups to the container

2)VFIO_SET_IOMMU(VFIO_FSL_PAMU)// set iommu model

3)count = VFIO_IOMMU_GET_MSI_BANK_COUNT// returns max # of MSI banks

4)VFIO_IOMMU_SET_ATTR(ATTR_GEOMETRY) // set overall aperture

5)VFIO_IOMMU_SET_ATTR(ATTR_WINDOWS) // set # of windows, including MSI 
banks

6)   For (int I = 0; I < count; i++)
VFIO_IOMMU_PAMU_MAP_MSI_BANK() // map the MSI banks, do not 
enable aperture here.

7)   Memory Listener will call-> VFIO_IOMMU_MAP_DMA// map the guest's 
memory
   ---> kernel enables aperture here on first VFIO_IOMMU_MAP_DMA

8)VFIO_DEVICE_SET_IRQS
   ---> VFIO in kernel makes 
pci_enable_msix()/pci_enable_msi_block() calls, this sets actual MSI addr/data 
in physical device.
  ---> As the address set by above APIs is not what we want so
-> is using MSIX, VFIO will update address in the MSI-X 
table
-> if using MSI, update MSI address in PCI configuration 
space.



Thanks
-Bharat


> -Original Message-
> From: Yoder Stuart-B08248
> Sent: Friday, April 05, 2013 3:40 AM
> To: Alex Williamson
> Cc: Wood Scott-B07421; ag...@suse.de; Bhushan Bharat-R65777; Sethi 
> Varun-B16395;
> k...@vger.kernel.org; qemu-devel@nongnu.org; io...@lists.linux-foundation.org
> Subject: RFC: vfio API changes needed for powerpc (v3)
> 
> -v3 updates
>-made vfio_pamu_attr a union, added flags
>-s/VFIO_PAMU_/VFIO_IOMMU_PAMU_/ for the ioctls to make it more
> clear which fd is being operated on
>-added flags to vfio_pamu_msi_bank_map/umap
>-VFIO_PAMU_GET_MSI_BANK_COUNT now just returns a __u32
> not a struct
>-fixed some typos
> 
> 
> 
> The Freescale PAMU is an aperture-based IOMMU with the following
> characteristics.  Each device has an entry in a table in memory
> describing the iova->phys mapping. The mapping has:
>-an overall aperture that is power of 2 sized, and has a start iova that
> is naturally aligned
>-has 1 or more windows within the aperture
>   -number of windows must be power of 2, max is 256
>   -size of each window is determined by aperture size / # of windows
>   -iova of each window is determined by aperture start iova / # of windows
>   -the mapped region in each window can be different than
>the window size...mapping must power of 2
>   -physical address of the mapping must be naturally aligned
>with the mapping size
> 
> These ioctls operate on the VFIO file descriptor (/dev/vfio/vfio).
> 
> /*
>  * VFIO_IOMMU_PAMU_GET_ATTR
>  *
>  * Gets the iommu attributes for the current vfio container.  This
>  * ioctl is applicable to an iommu type of VFIO_PAMU only.
>  * Caller sets argsz and attribute.  The ioctl fills in
>  * the provided struct vfio_pamu_attr based on the attribute
>  * value that was set.
> 
>  * Return: 0 on success, -errno on failure
>  */
> struct vfio_pamu_attr {
> __u32   argsz;
> __u32   flags;/* no flags currently */
> __u32   attribute;
> 
> union {
> /* VFIO_ATTR_GEOMETRY */
> struct {
> __u64 aperture_start; /* first addr that can be mapped
> */
> __u64 aperture_end;  /* last addr that can be mapped 
> */
> } attr;
> 
> /* VFIO_ATTR_WINDOWS */
> __u32 windows;  /* number of windows in the aperture */
> /* initially this will be the max number
>  * of windows that can be set
>  */
> 
> /* VFIO_ATTR_PAMU_STASH */
> struct {
> __u32 cpu; /* CPU number for stashing */
> __u32 cache;   /* cache ID for stashing */
> } stash;
> }
> };
> #define VFIO_IOMMU_PAMU_GET_ATTR  _IO(VFIO_TYPE, VFIO_BASE + x,
> struct vfio_pamu_attr)
> 
> /*
>  * VFIO_IOMMU_PAMU_SET_ATTR
>  *
>  * Sets the iommu attributes for the current vfio container.  This
>  * ioctl is applicable to an iommu type of VFIO_PAMU only.
>  * Caller sets struct vfio_pamu attr, including argsz and attribute and
>  * setting any fields that are valid for the attribute.
>  * Return: 0 on success, -errno on failure
>  */
> #define VFIO_IOMMU_PAMU_SET_ATTR  _IO(VFIO_TYPE, VFIO_BASE + x,
> struct vfio_pamu_attr)
> 
> /*
>  * VFIO_IOMMU

Re: [Qemu-devel] [PATCH] ppc: initialize GPRs as per epapr

2013-04-26 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Friday, April 26, 2013 11:51 AM
> To: Bhushan Bharat-R65777
> Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; Wood Scott-B07421; Bhushan
> Bharat-R65777; Yoder Stuart-B08248
> Subject: Re: [PATCH] ppc: initialize GPRs as per epapr
> 
> 
> On 26.04.2013, at 08:17, Bharat Bhushan wrote:
> 
> > ePAPR defines the initial values of cpu registers. This patch
> > initialize the GPRs as per ePAPR specification.
> >
> > This resolves the issue of guest reboot/reset (guest hang on reboot).
> 
> Why does it hang only on reboot, not on initial bootup?

may be memory pointed by env pointer are zero initialized initially.
Reboot also not always hangs. I have seen reboot mostly working on 
e500v2/e500mc and mostly hanging on e5500.

> 
> >
> > Signed-off-by: Bharat Bhushan 
> > Signed-off-by: Stuart Yoder 
> > ---
> > hw/ppc/e500.c |7 +++
> > 1 files changed, 7 insertions(+), 0 deletions(-)
> >
> > diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index c1bdb6b..a47f976
> > 100644
> > --- a/hw/ppc/e500.c
> > +++ b/hw/ppc/e500.c
> > @@ -37,6 +37,7 @@
> > #include "qemu/host-utils.h"
> > #include "hw/pci-host/ppce500.h"
> >
> > +#define EPAPR_MAGIC(0x45504150)
> > #define BINARY_DEVICE_TREE_FILE"mpc8544ds.dtb"
> > #define UIMAGE_LOAD_BASE   0
> > #define DTC_LOAD_PAD   0x180
> > @@ -444,6 +445,12 @@ static void ppce500_cpu_reset(void *opaque)
> 
> Does ePAPR mention anything wrt GPR state of secondary CPUs?

Yes, I think we handle this in hw/ppc/ppce500_spin.c

> 
> > cs->halted = 0;
> > env->gpr[1] = (16<<20) - 8;
> > env->gpr[3] = bi->dt_base;
> > +env->gpr[4] = 0;
> > +env->gpr[5] = 0;
> > +env->gpr[6] = EPAPR_MAGIC;
> > +env->gpr[7] = (64 * 1024 * 1024);
> 
> What is this?

Size of initial TLB ( should be big enough to cover kernel handler). I do not 
see ePAPR defines any value, I set this to 64M.

-Bharat

> 
> 
> Alex
> 
> > +env->gpr[8] = 0;
> > +env->gpr[9] = 0;
> > env->nip = bi->entry;
> > mmubooke_create_initial_mapping(env);
> > }
> > --
> > 1.7.0.4
> >
> >
> 





[Qemu-devel] [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex)

2012-06-08 Thread Bhushan Bharat-R65777
Hi All,

When Freescale PCI controller configured in Root Complex mode then, its 
configuration header (type 1) have one inbound BAR (BAR0, called as CCSRBAR). 
And rest of BARs (inbound and outbound) are supported by ATMU registers, which 
are outside the Type 1 configuration header. This BAR0 of Type 1 configuration 
header always translate to CCSR space and is of the size of CCSR. This BAR0 
(inbound window) is required for MSI interrupts support. With this window, the 
pci devices can write to the MPIC MSI registers to generate MSI interrupt.

As far as I know, as of now no emulated PCI controller supports this BAR0 in 
type 1 configuration header. But probably (I think so) that supporting this is 
not of big concern, but the point is that this window (BAR0) translate to 
mmio-regs (CCSR) and not to DDR memory.

So I have couple of concerns here:

1. Whenever PCI device does need DMA then these windows (inbound and outbound 
ATMUs registers) need to used to translate pci address to system physical 
address (Sometime we also call this as cpu address space). This will probably 
be done by : [Qemu-devel] [PATCH 00/12] IOMMU Infrastructure : patch-set ( I am 
trying to understand these patches :-))

2. Hook up this inbound BAR0 in the above patch-set to translate to mmio-regs. 
As this would be controller specific, a callback will be registered for 
translation. Now it will be upto the controller specific code on how it 
translates. As this is needed only for MSI interrupt, maybe, initially we do 
not do anything initially, till we want MSI emulation in QEMU.

Please provide your comment.

Thanks
-Bharat






Re: [Qemu-devel] [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex)

2012-06-08 Thread Bhushan Bharat-R65777
Just adding Alex Graf and qemu-ppc@.

Will add my response in reply. 

> -Original Message-
> From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
> Sent: Friday, June 08, 2012 4:32 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; Wood Scott-B07421; Yoder Stuart-B08248
> Subject: Re: [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI 
> controller
> (Root Complex)
> 
> On Fri, 2012-06-08 at 09:03 +, Bhushan Bharat-R65777 wrote:
> > Hi All,
> >
> > When Freescale PCI controller configured in Root Complex mode then,
> > its configuration header (type 1) have one inbound BAR (BAR0, called
> > as CCSRBAR). And rest of BARs (inbound and outbound) are supported by
> > ATMU registers, which are outside the Type 1 configuration header.
> > This BAR0 of Type 1 configuration header always translate to CCSR
> > space and is of the size of CCSR. This BAR0 (inbound window) is
> > required for MSI interrupts support. With this window, the pci devices
> > can write to the MPIC MSI registers to generate MSI interrupt.
> 
> So you should start with giving an overview, nobody here knows what "ATMU" or
> "CCSR" are, those are device specific acronyms
> 
> As a matter of fact it looks like I misunderstood you on IRC :-) IE. I thought
> your problem was that BAR0 is 'special' as it represents the inbound memory
> window (as it does on some PPC 4xx for example and a whole other collection of
> embedded devices). You seem to indicate that it's in fact MMIO:
> 
> > As far as I know, as of now no emulated PCI controller supports this
> > BAR0 in type 1 configuration header. But probably (I think so) that
> > supporting this is not of big concern, but the point is that this
> > window (BAR0) translate to mmio-regs (CCSR) and not to DDR memory.
> 
> So the 4xx case or the case where your BAR 0 actually represents system memory
> are something that is bothering me but not what you are trying to discuss 
> here.
> 
> IE. Normal BAR operations should work for an MMIO BAR, ie register it normally
> as a PCIe device and devices accessing those addresses will do the right 
> thing.
> 
> From there, AFAIK, the MSI code will simply do stl_le_phys, which I
> -believe- will hit a BAR that does MMIO decoding for those addresses, but I'll
> let people knowing qemu more in depth reply whether that's true or not.
> 
> There's very few devices with MSI support in hw/* so it's hard to test with
> anything other than virtio.
> 
> > So I have couple of concerns here:
> >
> > 1. Whenever PCI device does need DMA then these windows (inbound and
> > outbound ATMUs registers) need to used to translate pci address to
> > system physical address (Sometime we also call this as cpu address
> > space). This will probably be done by : [Qemu-devel] [PATCH 00/12]
> > IOMMU Infrastructure : patch-set ( I am trying to understand these
> > patches :-))
> 
> Yes, that's basically it. The patches allow you to add a set of routines that
> will be used for translating DMA accesses to system memory along with 
> map/unmap
> operations etc...
> 
> Beware that this only works with drivers that use the proper accessors.
> The patch series "fixes" some of them but not everything. I don't know off 
> hand
> (I don't have the code at hand right now) whether the MSI code goes through 
> that
> or not, if it does, you may need to be careful to
> -not- hit the translation layer and pass the accesses on.
> 
> > 2. Hook up this inbound BAR0 in the above patch-set to translate to
> > mmio-regs. As this would be controller specific, a callback will be
> > registered for translation. Now it will be upto the controller
> > specific code on how it translates. As this is needed only for MSI
> > interrupt, maybe, initially we do not do anything initially, till we
> > want MSI emulation in QEMU.
> 
> Well, virtio can make good use of MSI emulation ...
> 
> Cheers,
> Ben.
> 
> > Please provide your comment.
> >
> > Thanks
> > -Bharat
> >
> >
> 
> 



Re: [Qemu-devel] [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex)

2012-06-10 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Saturday, June 09, 2012 12:27 AM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; b...@kernel.crashing.org; Wood Scott-B07421; Yoder
> Stuart-B08248
> Subject: Re: [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI 
> controller
> (Root Complex)
> 
> On 06/08/2012 04:03 AM, Bhushan Bharat-R65777 wrote:
> > Hi All,
> >
> > When Freescale PCI controller configured in Root Complex mode then,
> > its configuration header (type 1) have one inbound BAR (BAR0, called
> > as CCSRBAR).
> 
> It maps to CCSRBAR, but it's called PCICSRBAR/PEXCSRBAR.
> 
> > And rest of BARs (inbound and outbound) are supported by ATMU
> > registers, which are outside the Type 1 configuration header.
> > This BAR0 of Type 1 configuration header always translate to CCSR
> > space and is of the size of CCSR. This BAR0 (inbound window) is
> > required for MSI interrupts support.
> 
> Some sort of inbound mapping is required for MSIs.  It's typically this one or
> PEXMSIBAR (which is a similar concept but more full-featured, and not in 
> config
> space), but doesn't have to be (e.g. we use normal inbound windows for MSI on
> Topaz, because of PAMU limitations that prevent us from using the CCSR address
> as the guest physical MSI destination).

When guest access BAR0 of configuration space or it accesses PEXMSIBAR, a 
common inbound mapping will translate the address to CCSR (Now the translated 
address may be further translated to real physical by IOMMU translation layer 
if configured in QEMU).

> 
> > As far as I know, as of now no emulated PCI controller supports this
> > BAR0 in type 1 configuration header. But probably (I think so) that
> > supporting this is not of big concern, but the point is that this
> > window (BAR0) translate to mmio-regs (CCSR) and not to DDR memory.
> >
> > So I have couple of concerns here:
> >
> > 1. Whenever PCI device does need DMA then these windows (inbound and
> > outbound ATMUs registers) need to used to translate pci address to
> > system physical address (Sometime we also call this as cpu address
> > space). This will probably be done by : [Qemu-devel] [PATCH 00/12]
> > IOMMU Infrastructure : patch-set ( I am trying to understand these
> > patches :-))
> >
> > 2. Hook up this inbound BAR0 in the above patch-set to translate to
> > mmio-regs. As this would be controller specific, a callback will be
> > registered for translation. Now it will be upto the controller
> > specific code on how it translates. As this is needed only for MSI
> > interrupt, maybe, initially we do not do anything initially, till we
> > want MSI emulation in QEMU.
> 
> MSIs may be the only thing that we plan to use it for, but if you want to
> properly emulate the chip you need to fully implement all the translation
> windows.

Do you mean address translation using ATMU inbound/outbound windows? Or 
complete translation support of CCSR space?

The former, I think, not sure, should get supported by "IOMMU Infrastructure" 
patch set.

The later is not supported. And if I understood correctly then, I think the I 
agree with you on emulating complete CCSR space. Do you think that we can start 
with supporting the MSI region mapping (MPIC mmio regs) and design it in way 
that later if needed it can be extended for rest of CCSR space.

If in kernel MPIC is there, then the QEMU will use the in-kernel MPIC to 
generate MSI interrupt or it will use QEMU emulated MPIC to generate MSI 
interrupt.

> 
> We also want the BAR itself to be properly emulated, as otherwise software can
> get confused when it reads it and sees zero as the location of the PCI DMA
> memory hole.
> 

Probably I did not understood clearly. Can you please elaborate of what type of 
confusion.

Thanks
-Bharat




Re: [Qemu-devel] [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI controller (Root Complex)

2012-06-11 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
> Sent: Friday, June 08, 2012 4:32 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; Wood Scott-B07421; Yoder Stuart-B08248
> Subject: Re: [RFC] Proposal: PCI/PCIe: inbound BAR0 emulation for PCI 
> controller
> (Root Complex)
> 
> On Fri, 2012-06-08 at 09:03 +, Bhushan Bharat-R65777 wrote:
> > Hi All,
> >
> > When Freescale PCI controller configured in Root Complex mode then,
> > its configuration header (type 1) have one inbound BAR (BAR0, called
> > as CCSRBAR). And rest of BARs (inbound and outbound) are supported by
> > ATMU registers, which are outside the Type 1 configuration header.
> > This BAR0 of Type 1 configuration header always translate to CCSR
> > space and is of the size of CCSR. This BAR0 (inbound window) is
> > required for MSI interrupts support. With this window, the pci devices
> > can write to the MPIC MSI registers to generate MSI interrupt.
> 
> So you should start with giving an overview, nobody here knows what "ATMU" or
> "CCSR" are, those are device specific acronyms

ATMU - Address Translation and Mapping Unit :- This facilitate flexibility in 
defining the address maps for the external interfaces (PCI, Rapidio etc). 
This basically provides following features:
1. Translating the local address to an external address space
2. Translating external addresses to the local address space

CCSR - Configuration, Control, and Status Register (CCSR) Space. This is memory 
mapped system register space.

In PCI/PCIe context this can be visualized as:


-|--|
Local|   |PCI controller (Root Complex)||
Address  |   |   ___| 
Space|<->|  |ATMU  |  |  Type 1||<--PCI Address Space---
 |   |  |__|  | Config Header  ||
 |   ||||
-|--|


> 
> As a matter of fact it looks like I misunderstood you on IRC :-) IE. I thought
> your problem was that BAR0 is 'special' as it represents the inbound memory
> window (as it does on some PPC 4xx for example and a whole other collection of
> embedded devices). You seem to indicate that it's in fact MMIO:

In some SOCs, BAR0 always maps to CCSR (MMIO) while in some new SOCs, the BAR0 
default maps to CCSR space while it can be programmed to map system memory in 
local address space.

> 
> > As far as I know, as of now no emulated PCI controller supports this
> > BAR0 in type 1 configuration header. But probably (I think so) that
> > supporting this is not of big concern, but the point is that this
> > window (BAR0) translate to mmio-regs (CCSR) and not to DDR memory.
> 
> So the 4xx case or the case where your BAR 0 actually represents system memory
> are something that is bothering me but not what you are trying to discuss 
> here.
> 
> IE. Normal BAR operations should work for an MMIO BAR, ie register it normally
> as a PCIe device and devices accessing those addresses will do the right 
> thing.
> 
> From there, AFAIK, the MSI code will simply do stl_le_phys, which I
> -believe- will hit a BAR that does MMIO decoding for those addresses, but I'll
> let people knowing qemu more in depth reply whether that's true or not.
> 
> There's very few devices with MSI support in hw/* so it's hard to test with
> anything other than virtio.

Probably testing with virtio should be ok as of now ( to put the framework in 
place).

Thanks
-Bharat
> 
> > So I have couple of concerns here:
> >
> > 1. Whenever PCI device does need DMA then these windows (inbound and
> > outbound ATMUs registers) need to used to translate pci address to
> > system physical address (Sometime we also call this as cpu address
> > space). This will probably be done by : [Qemu-devel] [PATCH 00/12]
> > IOMMU Infrastructure : patch-set ( I am trying to understand these
> > patches :-))
> 
> Yes, that's basically it. The patches allow you to add a set of routines that
> will be used for translating DMA accesses to system memory along with 
> map/unmap
> operations etc...
> 
> Beware that this only works with drivers that use the proper accessors.
> The patch series "fixes" some of them but not everything. I don't know off 
> hand
> (I don't have the code at hand right now) whether the MSI code goes through 
> that
> or not, if it does, you may need to be careful to
> -not- hit the translation layer and pass the accesses on.
> 
> > 2. Hook up this inbound BAR0 in the above patch-set

Re: [Qemu-devel] qemu_power_down_request handling

2012-04-16 Thread Bhushan Bharat-R65777

Also why not use qemu_system_shutdown_request() rather than 
qemu_system_powerdown_request() for "-watchdog-action shutdown" ?

Thanks
-Bharat

> -Original Message-
> From: Bhushan Bharat-R65777
> Sent: Monday, April 16, 2012 1:37 PM
> To: 'qemu-devel@nongnu.org'
> Cc: 'Alexander Graf'; Wood Scott-B07421
> Subject: qemu_power_down_request handling
> 
> Hi All,
> 
> The "-watchdog-action shutdown" uses the monitor system_shutdown mechanism but
> this does not actually powerdown the system.
> 
> In code, If there is pending powerdown_request 
> (qemu_system_powerdown_request()
> called) then it try to raise an irq. What is that irq supposed to do. 
> Currently
> it in NULL pointer so does not do anything.
> 
> Code snapshot -- vl.c : main_loop_should_exit()
> 
> 
> if (qemu_powerdown_requested()) {
> monitor_protocol_event(QEVENT_POWERDOWN, NULL);
> qemu_irq_raise(qemu_system_powerdown);
> }
> . . . .
> 
> Is the powerdown_request handling not complete?
> 
> Thanks
> -Bharat
> 




[Qemu-devel] qemu_power_down_request handling

2012-04-16 Thread Bhushan Bharat-R65777
Hi All,

The "-watchdog-action shutdown" uses the monitor system_shutdown mechanism but 
this does not actually powerdown the system. 

In code, If there is pending powerdown_request (qemu_system_powerdown_request() 
called) then it try to raise an irq. What is that irq supposed to do. Currently 
it in NULL pointer so does not do anything.

Code snapshot -- vl.c : main_loop_should_exit()


if (qemu_powerdown_requested()) {
monitor_protocol_event(QEVENT_POWERDOWN, NULL);
qemu_irq_raise(qemu_system_powerdown);
}
. . . .

Is the powerdown_request handling not complete?

Thanks
-Bharat 





Re: [Qemu-devel] qemu_power_down_request handling

2012-04-16 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, April 16, 2012 1:47 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; Wood Scott-B07421
> Subject: Re: qemu_power_down_request handling
> 
> 
> On 16.04.2012, at 10:07, Bhushan Bharat-R65777 wrote:
> 
> > Hi All,
> >
> > The "-watchdog-action shutdown" uses the monitor system_shutdown mechanism 
> > but
> this does not actually powerdown the system.
> >
> > In code, If there is pending powerdown_request
> (qemu_system_powerdown_request() called) then it try to raise an irq. What is
> that irq supposed to do. Currently it in NULL pointer so does not do anything.
> >
> > Code snapshot -- vl.c : main_loop_should_exit()
> >
> > 
> >if (qemu_powerdown_requested()) {
> >monitor_protocol_event(QEVENT_POWERDOWN, NULL);
> >qemu_irq_raise(qemu_system_powerdown);
> >}
> > . . . .
> >
> > Is the powerdown_request handling not complete?
> 
> The watchdog shutdown event is a soft power off. Imagine a PC where you press
> the power button, the OS gets notified through ACPI that it's supposed to shut
> down and shuts down the machine.
> 
> I don't know of any BookE system that has a comparable mechanism. We could
> implement it through qemu-ga and fall back to hard power off maybe?

Till someone develop this interface, does it make sense to use 
qemu_system_shutdown_request() rather than qemu_system_powerdown_request() for 
"-watchdog-action shutdown" ?

Thanks
-Bharat




Re: [Qemu-devel] [PATCH v2] ppc: initialize GPRs as per epapr

2013-04-29 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Monday, April 29, 2013 11:21 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Yoder Stuart-
> B08248; da...@gibson.dropbear.id.au; Bhushan Bharat-R65777; Bhushan Bharat-
> R65777
> Subject: Re: [PATCH v2] ppc: initialize GPRs as per epapr
> 
> On 04/29/2013 09:40:56 AM, Bharat Bhushan wrote:
> > ePAPR defines the initial values of cpu registers.
> > This patch initialize the GPRs as per ePAPR specification.
> >
> > This resolves the issue of guest reboot/reset (guest hang on reboot).
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > v1-->v2
> >  - Dynemic seting of initial map size in gpr[7]
> >  - For this the tlb size calculation is moved into a function.
> >
> >  hw/ppc/e500.c |   29 ++---
> >  1 files changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index c1bdb6b..decd86c
> > 100644
> > --- a/hw/ppc/e500.c
> > +++ b/hw/ppc/e500.c
> > @@ -37,6 +37,7 @@
> >  #include "qemu/host-utils.h"
> >  #include "hw/pci-host/ppce500.h"
> >
> > +#define EPAPR_MAGIC(0x45504150)
> >  #define BINARY_DEVICE_TREE_FILE"mpc8544ds.dtb"
> >  #define UIMAGE_LOAD_BASE   0
> >  #define DTC_LOAD_PAD   0x180
> > @@ -393,11 +394,10 @@ static inline hwaddr
> > booke206_page_size_to_tlb(uint64_t size)
> >  return 63 - clz64(size >> 10);
> >  }
> >
> > -static void mmubooke_create_initial_mapping(CPUPPCState *env)
> > +static int booke206_initial_map_tsize(CPUPPCState *env)
> >  {
> >  struct boot_info *bi = env->load_info;
> > -ppcmas_tlb_t *tlb = booke206_get_tlbm(env, 1, 0, 0);
> > -hwaddr size, dt_end;
> > +hwaddr dt_end;
> >  int ps;
> >
> >  /* Our initial TLB entry needs to cover everything from 0 to @@
> > -408,6 +408,23 @@ static void
> > mmubooke_create_initial_mapping(CPUPPCState *env)
> >  /* e500v2 can only do even TLB size bits */
> >  ps++;
> >  }
> > +return ps;
> > +}
> > +static uint64_t mmubooke_initial_mapsize(CPUPPCState *env)
> 
> Please leave a blank line after each of those } lines.
> 
> You change the function name to look like it's simply returning
> information, but it still creates the TLB entry as far as I can see.

This is just returning the tlb size . not creating the tlb entry.



> Then you go calling it multiple times (why?).  This may not be harmful, but it
> is ugly.
> 
> > +{
> > +int tsize;
> > +
> > +tsize = booke206_initial_map_tsize(env);
> > +return (1ULL << 10 << tsize);
> > +}
> 
> What value does this wrapper have?

This returning the size on initial tlb entry in bytes.

>  The caller needs both the "tsize"
> and the size in bytes, and you only call this wrapper once, so let the caller 
> do
> the conversion instead.

Caller does not need to know both. It only need to know the size in bytes.

I think git diff make a mess of this. just trying to put the code for clarity 


static int booke206_initial_map_tsize(CPUPPCState *env)
{
struct boot_info *bi = env->load_info;
hwaddr dt_end;
int ps;

/* Our initial TLB entry needs to cover everything from 0 to
   the device tree top */
dt_end = bi->dt_base + bi->dt_size;
ps = booke206_page_size_to_tlb(dt_end) + 1;
if (ps & 1) {
/* e500v2 can only do even TLB size bits */
ps++;
}
return ps;
}
static uint64_t mmubooke_initial_mapsize(CPUPPCState *env)
{
int tsize;

tsize = booke206_initial_map_tsize(env);
return (1ULL << 10 << tsize);
}

static void mmubooke_create_initial_mapping(CPUPPCState *env)
{
ppcmas_tlb_t *tlb = booke206_get_tlbm(env, 1, 0, 0);
hwaddr size;
int ps;

ps = booke206_initial_map_tsize(env);
size = (ps << MAS1_TSIZE_SHIFT);
tlb->mas1 = MAS1_VALID | size;
tlb->mas2 = 0;
tlb->mas7_3 = 0;
tlb->mas7_3 |= MAS3_UR | MAS3_UW | MAS3_UX | MAS3_SR | MAS3_SW | MAS3_SX;

env->tlb_dirty = true;
}

--

Thanks
-Bharat
> 
> -Scott




Re: [Qemu-devel] [PATCH] arm_gic: Keep track of GICD_CPENDR and GICD_SPENDR

2013-10-29 Thread Bhushan Bharat-R65777
Hi Christoffer,

Not related to the patch, for edge type of interrupt, will setting bit in 
ICD_SPENDR generate interrupt?

Thanks
-Bharat

> -Original Message-
> From: Christoffer Dall [mailto:christoffer.d...@linaro.org]
> Sent: Wednesday, October 23, 2013 8:57 PM
> To: peter.mayd...@linaro.org
> Cc: patc...@linaro.org; qemu-devel@nongnu.org; kvm...@lists.cs.columbia.edu
> Subject: [PATCH] arm_gic: Keep track of GICD_CPENDR and GICD_SPENDR
> 
> If software writes to the ISPENDR and sets the pending state of a level-
> triggered interrupt, the falling edge of the hardware input must not clear the
> pending state.  Conversely, if software writes to the ICPENDR, the pending 
> state
> of a level-triggered interrupt should only be cleared if the hardware input is
> not asserted.
> 
> This requires an extra state variable to keep track of software writes.
> 
> Signed-off-by: Christoffer Dall 
> ---
>  hw/intc/arm_gic.c| 20 +---
>  hw/intc/arm_gic_common.c |  5 +++--
>  hw/intc/gic_internal.h   |  4 
>  3 files changed, 24 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c index d1ddac1..db54061 
> 100644
> --- a/hw/intc/arm_gic.c
> +++ b/hw/intc/arm_gic.c
> @@ -101,6 +101,12 @@ static void gic_clear_pending(GICState *s, int irq, int 
> cm,
> uint8_t src)  {
>  unsigned cpu;
> 
> +/* If a level-triggered interrupt has been set to pending through the
> + * GICD_SPENDR, then a falling edge does not clear the pending state.
> + */
> +if (GIC_TEST_SW_PENDING(irq, cm))
> +return;
> +
>  GIC_CLEAR_PENDING(irq, cm);
>  if (irq < GIC_NR_SGIS) {
>  cpu = (unsigned)ffs(cm) - 1;
> @@ -177,8 +183,9 @@ uint32_t gic_acknowledge_irq(GICState *s, int cpu)
>  s->last_active[new_irq][cpu] = s->running_irq[cpu];
>  /* Clear pending flags for both level and edge triggered interrupts.
> Level triggered IRQs will be reasserted once they become inactive.  */
> -gic_clear_pending(s, new_irq, GIC_TEST_MODEL(new_irq) ? ALL_CPU_MASK : 
> cm,
> -  GIC_SGI_SRC(new_irq, cpu));
> +cm = GIC_TEST_MODEL(new_irq) ? ALL_CPU_MASK : cm;
> +GIC_CLEAR_SW_PENDING(new_irq, cm);
> +gic_clear_pending(s, new_irq, cm, GIC_SGI_SRC(new_irq, cpu));
>  gic_set_running_irq(s, cpu, new_irq);
>  DPRINTF("ACK %d\n", new_irq);
>  return new_irq;
> @@ -445,16 +452,23 @@ static void gic_dist_writeb(void *opaque, hwaddr offset,
>  for (i = 0; i < 8; i++) {
>  if (value & (1 << i)) {
>  GIC_SET_PENDING(irq + i, GIC_TARGET(irq + i));
> +if (!GIC_TEST_TRIGGER(irq + i)) {
> +GIC_SET_SW_PENDING(irq + i, GIC_TARGET(irq + i));
> +}
>  }
>  }
>  } else if (offset < 0x300) {
> +int cm = (1 << cpu);
>  /* Interrupt Clear Pending.  */
>  irq = (offset - 0x280) * 8 + GIC_BASE_IRQ;
>  if (irq >= s->num_irq)
>  goto bad_reg;
>  for (i = 0; i < 8; i++, irq++) {
>  if (irq > GIC_NR_SGIS && value & (1 << i)) {
> -gic_clear_pending(s, irq, 1 << cpu, 0);
> +GIC_CLEAR_SW_PENDING(irq, cm);
> +if (GIC_TEST_TRIGGER(irq + i) || !GIC_TEST_LEVEL(irq, cm)) {
> +GIC_CLEAR_PENDING(irq, cm);
> +}
>  }
>  }
>  } else if (offset < 0x400) {
> diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index
> 1d3b738..7f0615f 100644
> --- a/hw/intc/arm_gic_common.c
> +++ b/hw/intc/arm_gic_common.c
> @@ -43,11 +43,12 @@ static int gic_post_load(void *opaque, int version_id)
> 
>  static const VMStateDescription vmstate_gic_irq_state = {
>  .name = "arm_gic_irq_state",
> -.version_id = 1,
> -.minimum_version_id = 1,
> +.version_id = 2,
> +.minimum_version_id = 2,
>  .fields = (VMStateField[]) {
>  VMSTATE_UINT8(enabled, gic_irq_state),
>  VMSTATE_UINT8(pending, gic_irq_state),
> +VMSTATE_UINT8(sw_pending, gic_irq_state),
>  VMSTATE_UINT8(active, gic_irq_state),
>  VMSTATE_UINT8(level, gic_irq_state),
>  VMSTATE_BOOL(model, gic_irq_state), diff --git 
> a/hw/intc/gic_internal.h
> b/hw/intc/gic_internal.h index f9133b9..173c607 100644
> --- a/hw/intc/gic_internal.h
> +++ b/hw/intc/gic_internal.h
> @@ -43,6 +43,9 @@
>  #define GIC_SET_PENDING(irq, cm) s->irq_state[irq].pending |= (cm)  #define
> GIC_CLEAR_PENDING(irq, cm) s->irq_state[irq].pending &= ~(cm)  #define
> GIC_TEST_PENDING(irq, cm) ((s->irq_state[irq].pending & (cm)) != 0)
> +#define GIC_SET_SW_PENDING(irq, cm) s->irq_state[irq].sw_pending |=
> +(cm) #define GIC_CLEAR_SW_PENDING(irq, cm) s->irq_state[irq].sw_pending
> +&= ~(cm) #define GIC_TEST_SW_PENDING(irq, cm)
> +((s->irq_state[irq].sw_pending & (cm)) != 0)
>  #define GIC_SET_ACTIVE(irq, cm) s->irq_state[irq].active |= (cm)  #define
> GIC_CLEAR_ACT

Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/3 v2] Reset qemu timers when guest reset

2013-01-03 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Friday, January 04, 2013 1:51 AM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; ag...@suse.de; Bhushan Bharat-
> R65777
> Subject: Re: [Qemu-ppc] [PATCH 2/3 v2] Reset qemu timers when guest reset
> 
> On 12/27/2012 11:16:51 PM, Bharat Bhushan wrote:
> > This patch install the timer reset handler. This will be called when
> > the guest is reset.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> >  v2: same as v1
> >
> >  hw/ppc_booke.c |   12 
> >  1 files changed, 12 insertions(+), 0 deletions(-)
> >
> > diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index d51e7fa..837a5b6
> > 100644
> > --- a/hw/ppc_booke.c
> > +++ b/hw/ppc_booke.c
> > @@ -231,6 +231,16 @@ void store_booke_tcr(CPUPPCState *env,
> > target_ulong val)
> >
> >  }
> >
> > +static void ppc_booke_timer_reset_handle(void *opaque) {
> > +CPUPPCState *env = opaque;
> > +
> > +env->spr[SPR_BOOKE_TSR] = 0;
> > +env->spr[SPR_BOOKE_TCR] = 0;
> > +
> > +booke_update_irq(env);
> > +}
> 
> When does KVM_SET_SREGS get called?

This is part of reset processing and is not cpu_synchronize_state() called 
before all reset handlers are called and after that post_synchronize will do 
the KVM_SET_SREGS in kvm_put_registers().

Thanks
-Bharat





Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

2013-01-03 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
> Sent: Friday, January 04, 2013 7:08 AM
> To: Alexander Graf
> Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony Liguori; qemu-
> de...@nongnu.org qemu-devel; Bhushan Bharat-R65777
> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
> do_kvm_cpu_synchronize_state data integrity issue
> 
> On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote:
> >
> > On 03.01.2013, at 19:48, Jason J. Herne wrote:
> >
> > > On 01/03/2013 08:56 AM, Alexander Graf wrote:
> > >>> static void do_kvm_cpu_synchronize_state(void *_args)
> > >>> >{
> > >>> > struct kvm_cpu_syncstate_args *args = _args;
> > >>> >+CPUArchState *env = args->env;
> > >>> >+int register_level = args->register_level;
> > >>> >
> > >> This probably becomes more readable if we explicitly revert back to
> unsynced state first:
> > >>
> > >> /* Write back local modifications at our current level */ if
> > >> (register_level > env->kvm_vcpu_dirty) {
> > >> kvm_arch_put_registers(...);
> > >> env->kvm_vcpu_dirty = 0;
> > >> }
> > >>
> > >> and then do the sync we are requested to do:
> > >>
> > >> if (!env->kvm_vcpu_dirty) {
> > >> ...
> > >> }
> > >
> > > I agree, but only if we add a second conditional to the if 1st statement  
> > > as
> such:
> > >
> > > if (args->env->kvm_vcpu_dirty && register_level >
> > > env->kvm_vcpu_dirty)
> > >
> > > This is to cover the case where the caller is asking for register level 
> > > "1"
> and we're already dirty at level "2". In this case, nothing should happen and
> we'll need the "args->env->kvm_vcpu_dirty" to ensure that is the case.
> >
> > As before, I'd prefer to make this explicit:
> >
> > >
> > > static void do_kvm_cpu_synchronize_state(void *_args) {
> > >struct kvm_cpu_syncstate_args *args = _args;
> > >CPUArchState *env = args->env;
> > >int register_level = args->register_level;
> >
> > if (register_level > env->kvm_vcpu_dirty) {
> > /* We are more dirty than we need to - all is well */
> > return;
> > }
> >
> > >
> > >/* Write back local modifications at our current level */
> > >if (args->env->kvm_vcpu_dirty && register_level > env->kvm_vcpu_dirty) 
> > > {
> > >kvm_arch_put_registers(env, env->kvm_vcpu_dirty);
> > >env->kvm_vcpu_dirty = 0;
> > >}
> > >
> > >if (!args->env->kvm_vcpu_dirty) {
> > >kvm_arch_get_registers(env, register_level);
> > >env->kvm_vcpu_dirty = register_level;
> > >}
> > > }
> > >
> > > Do you agree?  Thanks for your time. :)
> >
> > Please also check out the discussions I've had with Bharat about his 
> > watchdog
> patches. There we need a mechanism to synchronize registers only when we
> actually need to, in order to avoid potential race conditions with a kernel
> timer.
> >
> > That particular case doesn't work well with levels. We can have multiple
> different potential race producers in the kernel that we need to avoid
> individually, so we can't always synchronize all of them when only one of them
> needs to be synchronized.
> >
> > The big question is what we should be doing about this. We basically have 3
> options:
> >
> >   * implement levels, treat racy registers as "manually synchronized", as
> Bharat's latest patch set does
> >   * implement levels, add a bitmap for additional special synchronization 
> > bits
> >   * replace levels by bitmap
> >
> > I'm quite frankly not sure which one of the 3 would be the best way forward.
> >
> >
> > Alex
> 
> Implement read/write accessors for individual registers, similarly to how its
> done in the kernel. See kvm_cache_regs.h.

Read/write for individual registers can be heavy for cases where multiple 
register needed to be updated. Is not it?

For cases where multiple register needed to be synchronized then I would like 
to elaborate the options as:

Option 1:
int timer_func(CPUArchState env)
{
cpu_synchronize_state(env);
//update env->timer_registers
env->update

Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

2013-01-04 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Friday, January 04, 2013 2:11 PM
> To: Bhushan Bharat-R65777
> Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; 
> Anthony
> Liguori; qemu-devel@nongnu.org qemu-devel
> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
> do_kvm_cpu_synchronize_state data integrity issue
> 
> 
> On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
> >> Sent: Friday, January 04, 2013 7:08 AM
> >> To: Alexander Graf
> >> Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony
> >> Liguori; qemu- de...@nongnu.org qemu-devel; Bhushan Bharat-R65777
> >> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
> >> do_kvm_cpu_synchronize_state data integrity issue
> >>
> >> On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote:
> >>>
> >>> On 03.01.2013, at 19:48, Jason J. Herne wrote:
> >>>
> >>>> On 01/03/2013 08:56 AM, Alexander Graf wrote:
> >>>>>> static void do_kvm_cpu_synchronize_state(void *_args)
> >>>>>>> {
> >>>>>>>struct kvm_cpu_syncstate_args *args = _args;
> >>>>>>> +CPUArchState *env = args->env;
> >>>>>>> +int register_level = args->register_level;
> >>>>>>>
> >>>>> This probably becomes more readable if we explicitly revert back
> >>>>> to
> >> unsynced state first:
> >>>>>
> >>>>> /* Write back local modifications at our current level */ if
> >>>>> (register_level > env->kvm_vcpu_dirty) {
> >>>>>kvm_arch_put_registers(...);
> >>>>>env->kvm_vcpu_dirty = 0;
> >>>>> }
> >>>>>
> >>>>> and then do the sync we are requested to do:
> >>>>>
> >>>>> if (!env->kvm_vcpu_dirty) {
> >>>>>...
> >>>>> }
> >>>>
> >>>> I agree, but only if we add a second conditional to the if 1st
> >>>> statement  as
> >> such:
> >>>>
> >>>> if (args->env->kvm_vcpu_dirty && register_level >
> >>>> env->kvm_vcpu_dirty)
> >>>>
> >>>> This is to cover the case where the caller is asking for register level 
> >>>> "1"
> >> and we're already dirty at level "2". In this case, nothing should
> >> happen and we'll need the "args->env->kvm_vcpu_dirty" to ensure that is the
> case.
> >>>
> >>> As before, I'd prefer to make this explicit:
> >>>
> >>>>
> >>>> static void do_kvm_cpu_synchronize_state(void *_args) {
> >>>>   struct kvm_cpu_syncstate_args *args = _args;
> >>>>   CPUArchState *env = args->env;
> >>>>   int register_level = args->register_level;
> >>>
> >>> if (register_level > env->kvm_vcpu_dirty) {
> >>>/* We are more dirty than we need to - all is well */
> >>>return;
> >>> }
> >>>
> >>>>
> >>>>   /* Write back local modifications at our current level */
> >>>>   if (args->env->kvm_vcpu_dirty && register_level > env->kvm_vcpu_dirty) 
> >>>> {
> >>>>   kvm_arch_put_registers(env, env->kvm_vcpu_dirty);
> >>>>   env->kvm_vcpu_dirty = 0;
> >>>>   }
> >>>>
> >>>>   if (!args->env->kvm_vcpu_dirty) {
> >>>>   kvm_arch_get_registers(env, register_level);
> >>>>   env->kvm_vcpu_dirty = register_level;
> >>>>   }
> >>>> }
> >>>>
> >>>> Do you agree?  Thanks for your time. :)
> >>>
> >>> Please also check out the discussions I've had with Bharat about his
> >>> watchdog
> >> patches. There we need a mechanism to synchronize registers only when
> >> we actually need to, in order to avoid potential race conditions with
> >> a kernel timer.
> >>>
> >>> That particular case doesn't work well with levels. We can have
> >>> multiple
> >> different potential race producers in t

Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

2013-01-04 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Friday, January 04, 2013 2:11 PM
> To: Bhushan Bharat-R65777
> Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; 
> Anthony
> Liguori; qemu-devel@nongnu.org qemu-devel
> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
> do_kvm_cpu_synchronize_state data integrity issue
> 
> 
> On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
> >> Sent: Friday, January 04, 2013 7:08 AM
> >> To: Alexander Graf
> >> Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony
> >> Liguori; qemu- de...@nongnu.org qemu-devel; Bhushan Bharat-R65777
> >> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
> >> do_kvm_cpu_synchronize_state data integrity issue
> >>
> >> On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote:
> >>>
> >>> On 03.01.2013, at 19:48, Jason J. Herne wrote:
> >>>
> >>>> On 01/03/2013 08:56 AM, Alexander Graf wrote:
> >>>>>> static void do_kvm_cpu_synchronize_state(void *_args)
> >>>>>>> {
> >>>>>>>struct kvm_cpu_syncstate_args *args = _args;
> >>>>>>> +CPUArchState *env = args->env;
> >>>>>>> +int register_level = args->register_level;
> >>>>>>>
> >>>>> This probably becomes more readable if we explicitly revert back
> >>>>> to
> >> unsynced state first:
> >>>>>
> >>>>> /* Write back local modifications at our current level */ if
> >>>>> (register_level > env->kvm_vcpu_dirty) {
> >>>>>kvm_arch_put_registers(...);
> >>>>>env->kvm_vcpu_dirty = 0;
> >>>>> }
> >>>>>
> >>>>> and then do the sync we are requested to do:
> >>>>>
> >>>>> if (!env->kvm_vcpu_dirty) {
> >>>>>...
> >>>>> }
> >>>>
> >>>> I agree, but only if we add a second conditional to the if 1st
> >>>> statement  as
> >> such:
> >>>>
> >>>> if (args->env->kvm_vcpu_dirty && register_level >
> >>>> env->kvm_vcpu_dirty)
> >>>>
> >>>> This is to cover the case where the caller is asking for register level 
> >>>> "1"
> >> and we're already dirty at level "2". In this case, nothing should
> >> happen and we'll need the "args->env->kvm_vcpu_dirty" to ensure that is the
> case.
> >>>
> >>> As before, I'd prefer to make this explicit:
> >>>
> >>>>
> >>>> static void do_kvm_cpu_synchronize_state(void *_args) {
> >>>>   struct kvm_cpu_syncstate_args *args = _args;
> >>>>   CPUArchState *env = args->env;
> >>>>   int register_level = args->register_level;
> >>>
> >>> if (register_level > env->kvm_vcpu_dirty) {
> >>>/* We are more dirty than we need to - all is well */
> >>>return;
> >>> }
> >>>
> >>>>
> >>>>   /* Write back local modifications at our current level */
> >>>>   if (args->env->kvm_vcpu_dirty && register_level > env->kvm_vcpu_dirty) 
> >>>> {
> >>>>   kvm_arch_put_registers(env, env->kvm_vcpu_dirty);
> >>>>   env->kvm_vcpu_dirty = 0;
> >>>>   }
> >>>>
> >>>>   if (!args->env->kvm_vcpu_dirty) {
> >>>>   kvm_arch_get_registers(env, register_level);
> >>>>   env->kvm_vcpu_dirty = register_level;
> >>>>   }
> >>>> }
> >>>>
> >>>> Do you agree?  Thanks for your time. :)
> >>>
> >>> Please also check out the discussions I've had with Bharat about his
> >>> watchdog
> >> patches. There we need a mechanism to synchronize registers only when
> >> we actually need to, in order to avoid potential race conditions with
> >> a kernel timer.
> >>>
> >>> That particular case doesn't work well with levels. We can have
> >>> multiple
> >> different potential race producers in t

Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

2013-01-04 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Friday, January 04, 2013 3:59 PM
> To: Bhushan Bharat-R65777
> Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; 
> Anthony
> Liguori; qemu-devel@nongnu.org qemu-devel
> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
> do_kvm_cpu_synchronize_state data integrity issue
> 
> 
> On 04.01.2013, at 11:23, Bhushan Bharat-R65777 wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Friday, January 04, 2013 2:11 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian
> >> Borntraeger; Anthony Liguori; qemu-devel@nongnu.org qemu-devel
> >> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
> >> do_kvm_cpu_synchronize_state data integrity issue
> >>
> >>
> >> On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote:
> >>
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
> >>>> Sent: Friday, January 04, 2013 7:08 AM
> >>>> To: Alexander Graf
> >>>> Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony
> >>>> Liguori; qemu- de...@nongnu.org qemu-devel; Bhushan Bharat-R65777
> >>>> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
> >>>> do_kvm_cpu_synchronize_state data integrity issue
> >>>>
> >>>> On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote:
> >>>>>
> >>>>> On 03.01.2013, at 19:48, Jason J. Herne wrote:
> >>>>>
> >>>>>> On 01/03/2013 08:56 AM, Alexander Graf wrote:
> >>>>>>>> static void do_kvm_cpu_synchronize_state(void *_args)
> >>>>>>>>> {
> >>>>>>>>>   struct kvm_cpu_syncstate_args *args = _args;
> >>>>>>>>> +CPUArchState *env = args->env;
> >>>>>>>>> +int register_level = args->register_level;
> >>>>>>>>>
> >>>>>>> This probably becomes more readable if we explicitly revert back
> >>>>>>> to
> >>>> unsynced state first:
> >>>>>>>
> >>>>>>> /* Write back local modifications at our current level */ if
> >>>>>>> (register_level > env->kvm_vcpu_dirty) {
> >>>>>>>   kvm_arch_put_registers(...);
> >>>>>>>   env->kvm_vcpu_dirty = 0;
> >>>>>>> }
> >>>>>>>
> >>>>>>> and then do the sync we are requested to do:
> >>>>>>>
> >>>>>>> if (!env->kvm_vcpu_dirty) {
> >>>>>>>   ...
> >>>>>>> }
> >>>>>>
> >>>>>> I agree, but only if we add a second conditional to the if 1st
> >>>>>> statement  as
> >>>> such:
> >>>>>>
> >>>>>> if (args->env->kvm_vcpu_dirty && register_level >
> >>>>>> env->kvm_vcpu_dirty)
> >>>>>>
> >>>>>> This is to cover the case where the caller is asking for register level
> "1"
> >>>> and we're already dirty at level "2". In this case, nothing should
> >>>> happen and we'll need the "args->env->kvm_vcpu_dirty" to ensure
> >>>> that is the
> >> case.
> >>>>>
> >>>>> As before, I'd prefer to make this explicit:
> >>>>>
> >>>>>>
> >>>>>> static void do_kvm_cpu_synchronize_state(void *_args) {  struct
> >>>>>> kvm_cpu_syncstate_args *args = _args;  CPUArchState *env =
> >>>>>> args->env;  int register_level = args->register_level;
> >>>>>
> >>>>> if (register_level > env->kvm_vcpu_dirty) {
> >>>>>   /* We are more dirty than we need to - all is well */
> >>>>>   return;
> >>>>> }
> >>>>>
> >>>>>>
> >>>>>>  /* Write back local modifications at our current level */  if
> >>>>>> (args->env->kvm_vcpu_dirty && register_level > env->kvm_vcp

Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

2013-01-04 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Friday, January 04, 2013 4:07 PM
> To: Bhushan Bharat-R65777
> Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; 
> Anthony
> Liguori; qemu-devel@nongnu.org qemu-devel
> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
> do_kvm_cpu_synchronize_state data integrity issue
> 
> 
> On 04.01.2013, at 11:32, Bhushan Bharat-R65777 wrote:
> 
> >
> >
> >> -Original Message-
> >>>>>
> >>>>> Int timer_func(CPUxxState env)
> >>>>> {
> >>>>>  Struct timer_regs;
> >>>>>  read_regs_type((env, &timer_regs,TIMER_REGS);  //update
> >>>>> env->timer_registers  Write_regs_type(env, TIMER_REGS) }
> >>>>>
> >>>>> This will get one type of register_types and can cause multiple
> >>>>> IOCTL per
> >>>> entry to QEMU.
> >>>>
> >>>> This is still too explicit. How about
> >>>>
> >>>> static inline void ppc_set_tsr(CPUState *env, target_ulong val) {
> >>>>   env->kvm_sync_extra |= SYNC_TIMER_REGS;
> >>>>   cpu_synchronize_registers(env);
> >>>>   env->spr[SPR_TSR] = val;
> >>>
> >>> You also want env->kvm_sync_dirty also, right?
> >>
> >> Not quite, since SYNC_TIMER_REGS spans more than only TSR. So we need
> >> to make sure we fetch the non-TSR register values before we can
> >> declare TIMER_REGS as dirty.
> >
> > Right , I actually want communicate mark dirty the TIMER_REGS only :)
> 
> Imagine TIMER_REGS includes TSR and TCR. Then
> 
> ppc_set_tsr();
> 
> should still work without a respective ppc_set_tcr() call. So we can't just 
> mark
> it dirty here. The only way we could optimize this bit would be by either
> splitting the bitmap per register or by extending the setter functions to the
> full scope of the sync:

Are you talking about GET all register of a type, say TIMER (TCR and TSR) on 
first call to cpu_synchronization_function but SET will be only for dirty 
register. Say if TSR is changed then setter will only communicate that TSR is 
changed?

Or

(
get multiple regs of a type , say TIMER (TCR and TSR), change one or more and 
mark dirty if any one changed. Set all registers of a type, TIMER (TCR and TSR) 
if dirty.
  In this case the first call by ppc_set_tsr() to cpu_synchronization_function 
will implicitly get all registers of TIMER (TCR and TSR) - means one or more 
registers of timer can be updated by QEMU. This function will update only 
env->TSR.
  Later if ppc_set_tcr() is called, it will also call 
cpu_synchronization_function but this will see that TIMER registers are already 
available to QEMU, so it will not fetch again. This function will update the 
TCR also.
  Finally they all, TIMER (TCR and TSR) will be pushed to kernel.
)

Thanks
-Bharat

> 
> static inline void ppc_set_timer_regs(CPUState *env, target_ulong tsr,
> target_ulong tcr) {
> ...
> }
> 
> 
> Alex
> 





Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

2013-01-06 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Jason J. Herne [mailto:jjhe...@linux.vnet.ibm.com]
> Sent: Friday, January 04, 2013 11:10 PM
> To: Alexander Graf
> Cc: Christian Borntraeger; Anthony Liguori; Marcelo Tosatti; qemu-
> de...@nongnu.org qemu-devel; Bhushan Bharat-R65777
> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
> do_kvm_cpu_synchronize_state data integrity issue
> 
> On 01/04/2013 11:27 AM, Alexander Graf wrote:
> >
> > On 04.01.2013, at 16:25, Jason J. Herne wrote:
> >
> >> If I've followed the conversation correctly this is what needs to be done:
> >>
> >> 1. Remove the level parameters from kvm_arch_get_registers and
> kvm_arch_put_registers.
> >>
> >> 2. Add a new bitmap parameter to kvm_arch_get_registers and
> kvm_arch_put_registers.
> >
> > I would combine these into "replace levels with bitmap".
> >
> >> 3. Define a bit that correlates to our current notion of "all runtime
> registers".  This bit, and all bits in this bitmap, would be architecture
> specific.
> >
> > Why would that bit be architecture specific? "All runtime registers" ==
> "registers that gdb can access" IIRC. The implementation on what exactly that
> means obviously is architecture specific, but the bit itself would not be, as
> the gdbstub wants to be able to synchronize in arch independent code.
> >
> >> 4. Remove the cpustate->kvm_sync_dirty field.  Replace it with a bitmap 
> >> that
> tracks which bits are dirty and need to be synced back to KVM-land.
> >>
> >> 5. As we do today, we'll assume registers are dirty and turn on their
> corresponding bit in this new bitmap whenever we "get" the registers from KVM.
> >
> > Yes. Changing these semantics is nothing for today :).
> >
> >> 6. Add other bits as needed on a case by case basis.
> >>
> >> Does this seem to match what was discussed, and what we want to do?
> >
> > It's probably the best way forward, keeping everyone happy.
> >
> > Please coordinate with Bharat on who actually wants to sit down to implement
> this. Or if you're quick you might be able to beat him to it regardless thanks
> to time zones :).
> >
> 
> Hi Bharat,
> 
> How would you like to handle these changes?  I can do them, or you could if 
> you
> prefer. Please let me know.

Hi Jason,

I will be happy to see the patch from you and because of some other things if 
you think that it will take time then let me know, I will do the changes. This 
framework will be used by my watchdog patches and only thing I want is my 
watchdog changes get pushed in QEMU.

Thanks
-Bharat





Re: [Qemu-devel] [PATCH 3/3 v2] Enable kvm emulated watchdog

2013-01-10 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, January 10, 2013 9:07 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 3/3 v2] Enable kvm emulated watchdog
> 
> 
> On 28.12.2012, at 06:16, Bharat Bhushan wrote:
> 
> > Enable the KVM emulated watchdog if KVM supports (use the capability
> > enablement in watchdog handler). Also watchdog exit
> > (KVM_EXIT_WATCHDOG) handling is added.
> > Watchdog state machine is cleared whenever VM state changes to running.
> > This is to handle the cases like return from debug halt etc.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > v2:
> >  - access cap_* from target_ppc/kvm.c only.
> >  - Added wrapper functions in target_ppc/kvm.c for
> >enable_watchdog and tsr_sregs synchronization.
> >  - Incorporated other Review comments
> >
> > hw/ppc.h |1 +
> > hw/ppc_booke.c   |   36 +++-
> > target-ppc/kvm.c |   56 
> > ++
> > target-ppc/kvm_ppc.h |   11 +
> > 4 files changed, 103 insertions(+), 1 deletions(-)
> >
> > diff --git a/hw/ppc.h b/hw/ppc.h
> > index 2f3ea27..6ad9e1f 100644
> > --- a/hw/ppc.h
> > +++ b/hw/ppc.h
> > @@ -90,3 +90,4 @@ enum {
> >
> > /* ppc_booke.c */
> > void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t
> > flags);
> > +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env);
> > diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index 837a5b6..7273259
> > 100644
> > --- a/hw/ppc_booke.c
> > +++ b/hw/ppc_booke.c
> > @@ -28,7 +28,7 @@
> > #include "nvram.h"
> > #include "qemu-log.h"
> > #include "loader.h"
> > -
> > +#include "kvm_ppc.h"
> >
> > /* Timer Control Register */
> >
> > @@ -203,6 +203,11 @@ static void booke_wdt_cb(void *opaque)
> >  booke_timer->wdt_timer); }
> >
> > +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env) {
> > +env->spr[SPR_BOOKE_TSR] &= ~(TSR_ENW | TSR_WIS | TSR_WRS_MASK); }
> > +
> > void store_booke_tsr(CPUPPCState *env, target_ulong val) {
> > env->spr[SPR_BOOKE_TSR] &= ~val;
> > @@ -241,10 +246,27 @@ static void ppc_booke_timer_reset_handle(void *opaque)
> > booke_update_irq(env);
> > }
> >
> > +static void cpu_state_change_handler(void *opaque, int running,
> > +RunState state) {
> 
> Needs a comment when this happens
> 
> > +CPUPPCState *env = opaque;
> > +
> > +if (!running) {
> > +return;
> > +}
> > +
> > +/*
> > + * Clear watchdog interrupt condition by clearing TSR.
> > + */
> > +ppc_booke_watchdog_clear_tsr(env);
> > +
> > +kvmppc_synch_sregs_tsr(env);
> 
> kvmppc_sync_tsr. Also please add the sync to store_booke_tsr(). Then here, you
> can just do
> 
>   store_booke_tsr(TSR_ENW | TSR_WIS | TSR_WRS_MASK);
> 
> > +}
> > +
> > void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t
> > flags) {
> > ppc_tb_t *tb_env;
> > booke_timer_t *booke_timer;
> > +int ret = 0;
> >
> > tb_env  = g_malloc0(sizeof(ppc_tb_t));
> > booke_timer = g_malloc0(sizeof(booke_timer_t)); @@ -262,5 +284,17
> > @@ void ppc_booke_timers_init(CPUPPCState *env, uint32_t freq, uint32_t 
> > flags)
> > booke_timer->wdt_timer =
> > qemu_new_timer_ns(vm_clock, &booke_wdt_cb, env);
> >
> > +ret = kvmppc_booke_watchdog_enable(env);
> > +
> > +if (ret) {
> > +/* TODO: Start the QEMU emulated watchdog if not running on KVM.
> > + * Also start the QEMU emulated watchdog if KVM does not support
> > + * emulated watchdog or somehow it is not enabled (supported but
> > + * not enabled is though some bug and requires debugging :)).
> > + */
> > +}
> > +
> > +qemu_add_vm_change_state_handler(cpu_state_change_handler, env);
> > +
> > qemu_register_reset(ppc_booke_timer_reset_handle, env); } diff
> > --git a/target-ppc/kvm.c b/target-ppc/kvm.c index 3f5df57..6828afa
> > 100644
> > --- a/target-ppc/kvm.c
> > +++ b/target-ppc/kvm.c
> > @@ -32,10 +32,12 @@
> > #include "device_tree.h"
> > #include "hw/sysbus.h"
> > #include "hw/spapr.h"
> > +#include "

Re: [Qemu-devel] [PATCH 0/7 v2] KVM regsync

2013-01-11 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Jason J. Herne [mailto:jjhe...@linux.vnet.ibm.com]
> Sent: Thursday, January 10, 2013 8:59 PM
> To: ag...@suse.de; borntrae...@de.ibm.com; aligu...@us.ibm.com;
> mtosa...@redhat.com; qemu-devel@nongnu.org; Bhushan Bharat-R65777;
> jan.kis...@siemens.com
> Subject: [PATCH 0/7 v2] KVM regsync
> 
> Rework the method used to synchronize CPU registers between Qemu & KVM.  This
> patch set extends kvm_arch_put_registers() and
> kvm_arch_get_registers() to take a register bitmap parameter.  All existing 
> code
> paths are updated to specify this new parameter.
> 
> IMPORTANT NOTE:  The PPC and i386 implementations are incomplete.
> I am submitting this code at this time only to get a review on the
> implementation of the existing code and to perhaps seek assistance with the
> mentioned architectures.
> 
> I am not sure who will finish the implementation of PPC/i386 yet.  Due to the
> fact that I am unfamiliar with these architectures at the register level and I
> do not have test environments I would like to humbly request that a maintainer
> of these architectures take a look at it.  Or perhaps Bharat could handle the
> PPC code?  This would only leave

I will handle the PPC code.

-Bharat

> i386 to worry about.  If I cannot find someone to handle i386 I will look into
> the feasibility of completing it myself.
> 
> In order to complete the missing implementations, kvm_arch_get_registers and
> kvm_arch_put_registers (and associated helper functions) will need to be 
> updated
> to only sync the registers contained in the new bitmap argument.  Also, each 
> set
> of registers represented  by one of the bits must be mutually exclusive with
> respect to every other bit.  if this is not the case then local register data
> can be lost when kvm_arch_get_registers is called causing an old register 
> value
> to overwrite a newer local value.
> 
> --
> -- Jason J. Herne (jjhe...@linux.vnet.ibm.com)
> 





Re: [Qemu-devel] [PATCH 4/7 v2] KVM regsync: Add register bitmap parameter to do_kvm_cpu_synchronize_state

2013-01-14 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Jason J. Herne [mailto:jjhe...@us.ibm.com]
> Sent: Thursday, January 10, 2013 8:59 PM
> To: ag...@suse.de; borntrae...@de.ibm.com; aligu...@us.ibm.com;
> mtosa...@redhat.com; qemu-devel@nongnu.org; Bhushan Bharat-R65777;
> jan.kis...@siemens.com
> Cc: Jason J. Herne
> Subject: [PATCH 4/7 v2] KVM regsync: Add register bitmap parameter to
> do_kvm_cpu_synchronize_state
> 
> From: "Jason J. Herne" 
> 
> do_kvm_cpu_synchronize_state is called via run_on_cpu, so we can only pass
> a single argument.  Create SyncStateArgs struct for this purpose and add
> register bitmap data member to it.
> 
> Signed-off-by: Jason J. Herne 
> Reviewed-by: Christian Borntraeger 
> ---
>  include/sysemu/kvm.h |6 ++
>  kvm-all.c|   27 +--
>  2 files changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index e0738ba..193d1f4 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -223,6 +223,12 @@ int kvm_check_extension(KVMState *s, unsigned int
> extension);
> 
>  uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
>uint32_t index, int reg);
> +
> +struct kvm_cpu_syncstate_args {
> +CPUState *cpu;
> +int regmap;
> +};
> +
>  void kvm_cpu_synchronize_state(CPUArchState *env);
>  void kvm_cpu_synchronize_post_reset(CPUArchState *env);
>  void kvm_cpu_synchronize_post_init(CPUArchState *env);
> diff --git a/kvm-all.c b/kvm-all.c
> index 1aa61bb..77ab72a 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -231,7 +231,7 @@ int kvm_init_vcpu(CPUArchState *env)
> 
>  cpu->kvm_fd = ret;
>  cpu->kvm_state = s;
> -cpu->kvm_vcpu_dirty = true;
> +cpu->kvm_vcpu_dirty = KVM_REGSYNC_FULL_STATE;
> 
>  mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
>  if (mmap_size < 0) {
> @@ -1491,20 +1491,27 @@ void kvm_flush_coalesced_mmio_buffer(void)
> 
>  static void do_kvm_cpu_synchronize_state(void *arg)
>  {
> -CPUState *cpu = arg;
> +struct kvm_cpu_syncstate_args *args = arg;
> 
> -if (!cpu->kvm_vcpu_dirty) {
> -kvm_arch_get_registers(cpu, KVM_REGSYNC_FULL_STATE);
> -cpu->kvm_vcpu_dirty = true;
> +/* Do not sync regs that are already dirty */
> +int regs_to_get = args->regmap & ~args->cpu->kvm_vcpu_dirty;
> +
> +if (regs_to_get) {
> +kvm_arch_get_registers(args->cpu, regs_to_get);
> +args->cpu->kvm_vcpu_dirty |= regs_to_get;
>  }
>  }
> 
>  void kvm_cpu_synchronize_state(CPUArchState *env)
>  {
>  CPUState *cpu = ENV_GET_CPU(env);
> +struct kvm_cpu_syncstate_args args;
> +
> +args.cpu = cpu;
> +args.regmap = KVM_REGSYNC_FULL_STATE;
> 
> -if (!cpu->kvm_vcpu_dirty) {
> -run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu);
> +if (args.regmap & ~cpu->kvm_vcpu_dirty) {
> +run_on_cpu(cpu, do_kvm_cpu_synchronize_state, &args);
>  }
>  }
> 
> @@ -1513,7 +1520,7 @@ void kvm_cpu_synchronize_post_reset(CPUArchState *env)
>  CPUState *cpu = ENV_GET_CPU(env);
> 
>  kvm_arch_put_registers(cpu, KVM_REGSYNC_RESET_STATE);
> -cpu->kvm_vcpu_dirty = false;
> +cpu->kvm_vcpu_dirty &= ~KVM_REGSYNC_RESET_STATE;
>  }
> 
>  void kvm_cpu_synchronize_post_init(CPUArchState *env)
> @@ -1521,7 +1528,7 @@ void kvm_cpu_synchronize_post_init(CPUArchState *env)
>  CPUState *cpu = ENV_GET_CPU(env);
> 
>  kvm_arch_put_registers(cpu, KVM_REGSYNC_FULL_STATE);
> -cpu->kvm_vcpu_dirty = false;
> +cpu->kvm_vcpu_dirty &= ~KVM_REGSYNC_FULL_STATE;
>  }
> 
>  int kvm_cpu_exec(CPUArchState *env)
> @@ -1540,7 +1547,7 @@ int kvm_cpu_exec(CPUArchState *env)
>  do {
>  if (cpu->kvm_vcpu_dirty) {
>  kvm_arch_put_registers(cpu, KVM_REGSYNC_RUNTIME_STATE);

s/KVM_REGSYNC_RUNTIME_STATE/cpu->kvm_vcpu_dirty


> -cpu->kvm_vcpu_dirty = false;
> +cpu->kvm_vcpu_dirty &= ~KVM_REGSYNC_RUNTIME_STATE;

cpu->kvm_vcpu_dirty = 0;

-Bharat

>  }
> 
>  kvm_arch_pre_run(cpu, run);
> --
> 1.7.9.5
> 





Re: [Qemu-devel] [PATCH 0/7 v2] KVM regsync

2013-01-14 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Jason J. Herne [mailto:jjhe...@linux.vnet.ibm.com]
> Sent: Thursday, January 10, 2013 8:59 PM
> To: ag...@suse.de; borntrae...@de.ibm.com; aligu...@us.ibm.com;
> mtosa...@redhat.com; qemu-devel@nongnu.org; Bhushan Bharat-R65777;
> jan.kis...@siemens.com
> Subject: [PATCH 0/7 v2] KVM regsync
> 
> Rework the method used to synchronize CPU registers between Qemu & KVM.  This
> patch set extends kvm_arch_put_registers() and
> kvm_arch_get_registers() to take a register bitmap parameter.  All existing 
> code
> paths are updated to specify this new parameter.
> 
> IMPORTANT NOTE:  The PPC and i386 implementations are incomplete.
> I am submitting this code at this time only to get a review on the
> implementation of the existing code and to perhaps seek assistance with the
> mentioned architectures.
> 
> I am not sure who will finish the implementation of PPC/i386 yet.  Due to the
> fact that I am unfamiliar with these architectures at the register level and I
> do not have test environments I would like to humbly request that a maintainer
> of these architectures take a look at it.  Or perhaps Bharat could handle the
> PPC code?  This would only leave
> i386 to worry about.  If I cannot find someone to handle i386 I will look into
> the feasibility of completing it myself.
> 
> In order to complete the missing implementations, kvm_arch_get_registers and
> kvm_arch_put_registers (and associated helper functions) will need to be 
> updated
> to only sync the registers contained in the new bitmap argument.  Also, each 
> set
> of registers represented  by one of the bits must be mutually exclusive with
> respect to every other bit.  if this is not the case then local register data
> can be lost when kvm_arch_get_registers is called causing an old register 
> value
> to overwrite a newer local value.

I think net step can be let the register bitmap to go all upto the KVM, so the 
KVM also read/write the specified registers set only.

Thanks
-Bharat




Re: [Qemu-devel] [PATCH 4/7 v2] KVM regsync: Add register bitmap parameter to do_kvm_cpu_synchronize_state

2013-01-16 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
> Sent: Wednesday, January 16, 2013 9:36 PM
> To: Jason J. Herne
> Cc: ag...@suse.de; borntrae...@de.ibm.com; aligu...@us.ibm.com; qemu-
> de...@nongnu.org; Bhushan Bharat-R65777; jan.kis...@siemens.com
> Subject: Re: [PATCH 4/7 v2] KVM regsync: Add register bitmap parameter to
> do_kvm_cpu_synchronize_state
> 
> On Thu, Jan 10, 2013 at 10:29:04AM -0500, Jason J. Herne wrote:
> > From: "Jason J. Herne" 
> >
> > do_kvm_cpu_synchronize_state is called via run_on_cpu, so we can only
> > pass a single argument.  Create SyncStateArgs struct for this purpose
> > and add register bitmap data member to it.
> >
> > Signed-off-by: Jason J. Herne 
> > Reviewed-by: Christian Borntraeger 
> > ---
> >  include/sysemu/kvm.h |6 ++
> >  kvm-all.c|   27 +--
> >  2 files changed, 23 insertions(+), 10 deletions(-)
> >
> > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index
> > e0738ba..193d1f4 100644
> > --- a/include/sysemu/kvm.h
> > +++ b/include/sysemu/kvm.h
> > @@ -223,6 +223,12 @@ int kvm_check_extension(KVMState *s, unsigned int
> > extension);
> >
> >  uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
> >uint32_t index, int reg);
> > +
> > +struct kvm_cpu_syncstate_args {
> > +CPUState *cpu;
> > +int regmap;
> > +};
> > +
> >  void kvm_cpu_synchronize_state(CPUArchState *env);  void
> > kvm_cpu_synchronize_post_reset(CPUArchState *env);  void
> > kvm_cpu_synchronize_post_init(CPUArchState *env); diff --git
> > a/kvm-all.c b/kvm-all.c index 1aa61bb..77ab72a 100644
> > --- a/kvm-all.c
> > +++ b/kvm-all.c
> > @@ -231,7 +231,7 @@ int kvm_init_vcpu(CPUArchState *env)
> >
> >  cpu->kvm_fd = ret;
> >  cpu->kvm_state = s;
> > -cpu->kvm_vcpu_dirty = true;
> > +cpu->kvm_vcpu_dirty = KVM_REGSYNC_FULL_STATE;
> >
> >  mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
> >  if (mmap_size < 0) {
> > @@ -1491,20 +1491,27 @@ void kvm_flush_coalesced_mmio_buffer(void)
> >
> >  static void do_kvm_cpu_synchronize_state(void *arg)  {
> > -CPUState *cpu = arg;
> > +struct kvm_cpu_syncstate_args *args = arg;
> >
> > -if (!cpu->kvm_vcpu_dirty) {
> > -kvm_arch_get_registers(cpu, KVM_REGSYNC_FULL_STATE);
> > -cpu->kvm_vcpu_dirty = true;
> > +/* Do not sync regs that are already dirty */
> > +int regs_to_get = args->regmap & ~args->cpu->kvm_vcpu_dirty;
> > +
> > +if (regs_to_get) {
> > +kvm_arch_get_registers(args->cpu, regs_to_get);
> > +args->cpu->kvm_vcpu_dirty |= regs_to_get;
> >  }
> >  }
> >
> >  void kvm_cpu_synchronize_state(CPUArchState *env)  {
> >  CPUState *cpu = ENV_GET_CPU(env);
> > +struct kvm_cpu_syncstate_args args;
> > +
> > +args.cpu = cpu;
> > +args.regmap = KVM_REGSYNC_FULL_STATE;
> >
> > -if (!cpu->kvm_vcpu_dirty) {
> > -run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu);
> > +if (args.regmap & ~cpu->kvm_vcpu_dirty) {
> > +run_on_cpu(cpu, do_kvm_cpu_synchronize_state, &args);
> >  }
> >  }
> >
> > @@ -1513,7 +1520,7 @@ void kvm_cpu_synchronize_post_reset(CPUArchState *env)
> >  CPUState *cpu = ENV_GET_CPU(env);
> >
> >  kvm_arch_put_registers(cpu, KVM_REGSYNC_RESET_STATE);
> > -cpu->kvm_vcpu_dirty = false;
> > +cpu->kvm_vcpu_dirty &= ~KVM_REGSYNC_RESET_STATE;
> >  }
> >
> >  void kvm_cpu_synchronize_post_init(CPUArchState *env) @@ -1521,7
> > +1528,7 @@ void kvm_cpu_synchronize_post_init(CPUArchState *env)
> >  CPUState *cpu = ENV_GET_CPU(env);
> >
> >  kvm_arch_put_registers(cpu, KVM_REGSYNC_FULL_STATE);
> > -cpu->kvm_vcpu_dirty = false;
> > +cpu->kvm_vcpu_dirty &= ~KVM_REGSYNC_FULL_STATE;
> >  }
> >
> >  int kvm_cpu_exec(CPUArchState *env)
> > @@ -1540,7 +1547,7 @@ int kvm_cpu_exec(CPUArchState *env)
> >  do {
> >  if (cpu->kvm_vcpu_dirty) {
> >  kvm_arch_put_registers(cpu, KVM_REGSYNC_RUNTIME_STATE);
> > -cpu->kvm_vcpu_dirty = false;
> > +cpu->kvm_vcpu_dirty &= ~KVM_REGSYNC_RUNTIME_STATE;
> >  }
> 
> 1) This implies a vcpu can enter guest mode with kvm_vcpu_dirty non-ze

Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61

2013-06-10 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Peter Maydell [mailto:peter.mayd...@linaro.org]
> Sent: Monday, June 10, 2013 2:56 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Wood Scott-
> B07421; Bhushan Bharat-R65777
> Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for
> target_bit above 61
> 
> On 10 June 2013 08:55, Bharat Bhushan  wrote:
> > QEMU timer supports a maximum timer of INT64_MAX. So starting timer
> > only for time which is calculated using target_bit < 62 and
> > deactivate/stop timer if the target bit is above 61.
> 
> Is this really what the hardware does? Or do we need to set a timer for the
> maximum QEMU allows and then reset it for the residual time when the first 
> timer
> expires?

No, this is not how hardware works. But the time with the maximum target bit of 
61 (with current range of CPU frequency PowerPC supports) will be many many 
years. So I think it is pretty safe to stop the timer.

Thanks
-Bharat

> 
> thanks
> -- PMM



Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61

2013-06-10 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Andreas Färber [mailto:afaer...@suse.de]
> Sent: Monday, June 10, 2013 5:43 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Wood Scott-
> B07421; Bhushan Bharat-R65777
> Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for
> target_bit above 61
> 
> Am 10.06.2013 09:55, schrieb Bharat Bhushan:
> > QEMU timer supports a maximum timer of INT64_MAX. So starting timer
> > only for time which is calculated using target_bit < 62 and
> > deactivate/stop timer if the target bit is above 61.
> >
> > This patch also fix the time calculation from target_bit.
> > The code was doing (1 << (target_bit + 1)) while this should be (1ULL
> > << (target_bit + 1)).
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > v1->v2
> >  - Added "booke: timer:" in patch subject
> >
> >  hw/ppc/ppc_booke.c |8 +++-
> >  1 files changed, 7 insertions(+), 1 deletions(-)
> >
> > diff --git a/hw/ppc/ppc_booke.c b/hw/ppc/ppc_booke.c index
> > e41b036..f4eda15 100644
> > --- a/hw/ppc/ppc_booke.c
> > +++ b/hw/ppc/ppc_booke.c
> > @@ -133,9 +133,15 @@ static void booke_update_fixed_timer(CPUPPCState
> *env,
> >  ppc_tb_t *tb_env = env->tb_env;
> >  uint64_t lapse;
> >  uint64_t tb;
> > -uint64_t period = 1 << (target_bit + 1);
> > +uint64_t period;
> >  uint64_t now;
> >
> > +/* Deactivate timer for target_bit > 61 */
> > +if (target_bit > 61)
> > +return;
> 
> Braces missing and trailing whitespace after return.

Ok, will correct

> 
> So IIUC we can only allow 63 bits due to signedness, thus a maximum of
> (1 << 62), thus target_bit <= 61.
> 
> Any chance at least the comment can be worded to explain that any better? 
> Maybe
> also use (target-bit + 1 >= 63) or period > INT64_MAX as condition?

How about this:
/* QEMU timer supports a maximum timer of INT64_MAX (0x7fff_).
 * Run booke fit/wdog timer when
 * ((1ULL << target_bit + 1) < 0x4000_), i.e target_bit = 61.
 * Also the time with this maximum target_bit (with current range of
 * CPU frequency PowerPC supports) will be many many years. So it is
 * pretty safe to stop the timer above this threshold. */


> 
> Best regards,
> Andreas
> 
> > +
> > +period = 1ULL << (target_bit + 1);
> > +
> >  now = qemu_get_clock_ns(vm_clock);
> >  tb  = cpu_ppc_get_tb(tb_env, now, tb_env->tb_offset);
> >
> >
> 
> 
> --
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg





Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61

2013-06-11 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, June 10, 2013 11:40 PM
> To: Wood Scott-B07421
> Cc: Bhushan Bharat-R65777; Andreas Färber; qemu-...@nongnu.org; qemu-
> de...@nongnu.org; Wood Scott-B07421
> Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for
> target_bit above 61
> 
> 
> On 10.06.2013, at 19:20, Scott Wood wrote:
> 
> > On 06/10/2013 09:26:18 AM, Alexander Graf wrote:
> >> On 06/10/2013 02:47 PM, Bhushan Bharat-R65777 wrote:
> >>>> -Original Message-
> >>>> From: Andreas Färber [mailto:afaer...@suse.de]
> >>>> Sent: Monday, June 10, 2013 5:43 PM
> >>>> To: Bhushan Bharat-R65777
> >>>> Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de; Wood
> >>>> Scott- B07421; Bhushan Bharat-R65777
> >>>> Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer
> >>>> for target_bit above 61 So IIUC we can only allow 63 bits due to
> >>>> signedness, thus a maximum of (1<<  62), thus target_bit<= 61.
> >>>> Any chance at least the comment can be worded to explain that any
> >>>> better? Maybe also use (target-bit + 1>= 63) or period>  INT64_MAX as
> condition?
> >>> How about this:
> >>> /* QEMU timer supports a maximum timer of INT64_MAX
> (0x7fff_).
> >>>  * Run booke fit/wdog timer when
> >>>  * ((1ULL<<  target_bit + 1)<  0x4000_), i.e target_bit =
> 61.
> >>>  * Also the time with this maximum target_bit (with current range of
> >>>  * CPU frequency PowerPC supports) will be many many years. So it is
> >>>  * pretty safe to stop the timer above this threshold. */
> >> How about
> >>  /* This timeout will take years to trigger. Treat the timer as
> >> disabled. */
> >
> > There should be at least a brief mention that it's because the QEMU
> > timer can't handle larger values,
> 
> If it can't handle higher values, maybe it's better to just set the timer 
> value
> to INT64_MAX when we detect an overflow? That would make the code plainly
> obvious.
> 

What about below comment (a mix of both :)):

/* Timeout calculated with (target_bit + 1) > 62 will take
 * hundreds of years to trigger. Treat the timer as disabled.
 * Also this timeout is within the qemu supported maximum
 * timeout limit (INT64_MAX.). */

-Bharat

> 
> Alex
> 
> > with the detailed explanation in the changelog.  A better lower bound on the
> number of years would be nice as well (e.g. "hundreds of years").
> >
> > -Scott
> 





Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61

2013-06-11 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Tuesday, June 11, 2013 6:10 PM
> To: Bhushan Bharat-R65777
> Cc: Wood Scott-B07421; Andreas Färber; qemu-...@nongnu.org; qemu-
> de...@nongnu.org
> Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for
> target_bit above 61
> 
> On 06/11/2013 01:40 PM, Bhushan Bharat-R65777 wrote:
> >
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Monday, June 10, 2013 11:40 PM
> >> To: Wood Scott-B07421
> >> Cc: Bhushan Bharat-R65777; Andreas Färber; qemu-...@nongnu.org; qemu-
> >> de...@nongnu.org; Wood Scott-B07421
> >> Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer
> >> for target_bit above 61
> >>
> >>
> >> On 10.06.2013, at 19:20, Scott Wood wrote:
> >>
> >>> On 06/10/2013 09:26:18 AM, Alexander Graf wrote:
> >>>> On 06/10/2013 02:47 PM, Bhushan Bharat-R65777 wrote:
> >>>>>> -Original Message-
> >>>>>> From: Andreas Färber [mailto:afaer...@suse.de]
> >>>>>> Sent: Monday, June 10, 2013 5:43 PM
> >>>>>> To: Bhushan Bharat-R65777
> >>>>>> Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de;
> >>>>>> Wood
> >>>>>> Scott- B07421; Bhushan Bharat-R65777
> >>>>>> Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate
> >>>>>> timer for target_bit above 61 So IIUC we can only allow 63 bits due to
> >>>>>> signedness, thus a maximum of (1<<   62), thus target_bit<= 61.
> >>>>>> Any chance at least the comment can be worded to explain that any
> >>>>>> better? Maybe also use (target-bit + 1>= 63) or period>   INT64_MAX as
> >> condition?
> >>>>> How about this:
> >>>>>  /* QEMU timer supports a maximum timer of INT64_MAX
> >> (0x7fff_).
> >>>>>   * Run booke fit/wdog timer when
> >>>>>   * ((1ULL<<   target_bit + 1)<   0x4000_), i.e 
> >>>>> target_bit
> =
> >> 61.
> >>>>>   * Also the time with this maximum target_bit (with current range 
> >>>>> of
> >>>>>   * CPU frequency PowerPC supports) will be many many years. So it 
> >>>>> is
> >>>>>   * pretty safe to stop the timer above this threshold. */
> >>>> How about
> >>>>   /* This timeout will take years to trigger. Treat the timer as
> >>>> disabled. */
> >>> There should be at least a brief mention that it's because the QEMU
> >>> timer can't handle larger values,
> >> If it can't handle higher values, maybe it's better to just set the
> >> timer value to INT64_MAX when we detect an overflow? That would make
> >> the code plainly obvious.
> >>
> > What about below comment (a mix of both :)):
> >
> >  /* Timeout calculated with (target_bit + 1)>  62 will take
> >   * hundreds of years to trigger. Treat the timer as disabled.
> >   * Also this timeout is within the qemu supported maximum
> >   * timeout limit (INT64_MAX.). */
> 
> Ok, next question: Why does return disable the timer?

Actually here disabled means _not_ starting the timer. This function will be 
called to start timer initially and then later it is called to restart after 
every expiry. If we do not start then it is as good as stopped/disabled (it is 
not disabled in TCR). Probably saying "do not start qemu timer" or something 
similar is better than saying disabling the timer.

Thanks
-Bharat

> 
> 
> Alex
> 





Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for target_bit above 61

2013-06-11 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Tuesday, June 11, 2013 6:27 PM
> To: Bhushan Bharat-R65777
> Cc: Wood Scott-B07421; Andreas Färber; qemu-...@nongnu.org; qemu-
> de...@nongnu.org
> Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer for
> target_bit above 61
> 
> On 06/11/2013 02:47 PM, Bhushan Bharat-R65777 wrote:
> >
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Tuesday, June 11, 2013 6:10 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: Wood Scott-B07421; Andreas Färber; qemu-...@nongnu.org; qemu-
> >> de...@nongnu.org
> >> Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer
> >> for target_bit above 61
> >>
> >> On 06/11/2013 01:40 PM, Bhushan Bharat-R65777 wrote:
> >>>> -Original Message-
> >>>> From: Alexander Graf [mailto:ag...@suse.de]
> >>>> Sent: Monday, June 10, 2013 11:40 PM
> >>>> To: Wood Scott-B07421
> >>>> Cc: Bhushan Bharat-R65777; Andreas Färber; qemu-...@nongnu.org;
> >>>> qemu- de...@nongnu.org; Wood Scott-B07421
> >>>> Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate timer
> >>>> for target_bit above 61
> >>>>
> >>>>
> >>>> On 10.06.2013, at 19:20, Scott Wood wrote:
> >>>>
> >>>>> On 06/10/2013 09:26:18 AM, Alexander Graf wrote:
> >>>>>> On 06/10/2013 02:47 PM, Bhushan Bharat-R65777 wrote:
> >>>>>>>> -Original Message-
> >>>>>>>> From: Andreas Färber [mailto:afaer...@suse.de]
> >>>>>>>> Sent: Monday, June 10, 2013 5:43 PM
> >>>>>>>> To: Bhushan Bharat-R65777
> >>>>>>>> Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org; ag...@suse.de;
> >>>>>>>> Wood
> >>>>>>>> Scott- B07421; Bhushan Bharat-R65777
> >>>>>>>> Subject: Re: [Qemu-devel] [PATCH v2]booke: timer: Deactivate
> >>>>>>>> timer for target_bit above 61 So IIUC we can only allow 63 bits due 
> >>>>>>>> to
> >>>>>>>> signedness, thus a maximum of (1<<62), thus target_bit<= 61.
> >>>>>>>> Any chance at least the comment can be worded to explain that any
> >>>>>>>> better? Maybe also use (target-bit + 1>= 63) or period>INT64_MAX 
> >>>>>>>> as
> >>>> condition?
> >>>>>>> How about this:
> >>>>>>>   /* QEMU timer supports a maximum timer of INT64_MAX
> >>>> (0x7fff_).
> >>>>>>>* Run booke fit/wdog timer when
> >>>>>>>* ((1ULL<<target_bit + 1)<0x4000_), i.e
> target_bit
> >> =
> >>>> 61.
> >>>>>>>* Also the time with this maximum target_bit (with current 
> >>>>>>> range
> of
> >>>>>>>* CPU frequency PowerPC supports) will be many many years. So 
> >>>>>>> it
> is
> >>>>>>>* pretty safe to stop the timer above this threshold. */
> >>>>>> How about
> >>>>>>/* This timeout will take years to trigger. Treat the timer as
> >>>>>> disabled. */
> >>>>> There should be at least a brief mention that it's because the
> >>>>> QEMU timer can't handle larger values,
> >>>> If it can't handle higher values, maybe it's better to just set the
> >>>> timer value to INT64_MAX when we detect an overflow? That would
> >>>> make the code plainly obvious.
> >>>>
> >>> What about below comment (a mix of both :)):
> >>>
> >>>   /* Timeout calculated with (target_bit + 1)>   62 will take
> >>>* hundreds of years to trigger. Treat the timer as disabled.
> >>>* Also this timeout is within the qemu supported maximum
> >>>* timeout limit (INT64_MAX.). */
> >> Ok, next question: Why does return disable the timer?
> > Actually here disabled means _not_ starting the timer. This function will be
> called to start timer initially and then later it is called to restart after
> every expiry. If we do not start then it is as good a

Re: [Qemu-devel] vfio for platform devices - 9/5/2012 - minutes

2013-09-12 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, September 11, 2013 10:45 PM
> To: Yoder Stuart-B08248
> Cc: Wood Scott-B07421; Sethi Varun-B16395; Bhushan Bharat-R65777; 'Peter
> Maydell'; 'Santosh Shukla'; 'Alexander Graf'; 'Antonios Motakis'; 'Christoffer
> Dall'; 'kim.phill...@linaro.org'; kvm...@lists.cs.columbia.edu; kvm-
> p...@vger.kernel.org; qemu-devel@nongnu.org
> Subject: Re: vfio for platform devices - 9/5/2012 - minutes
> 
> On Wed, 2013-09-11 at 16:42 +, Yoder Stuart-B08248 wrote:
> >
> > > -Original Message-
> > > From: Yoder Stuart-B08248
> > > Sent: Thursday, September 05, 2013 12:51 PM
> > > To: Wood Scott-B07421; Sethi Varun-B16395; Bhushan Bharat-R65777;
> > > 'Peter Maydell'; 'Santosh Shukla'; 'Alex Williamson'; 'Alexander
> > > Graf'; 'Antonios Motakis'; 'Christoffer Dall'; 'kim.phill...@linaro.org'
> > > Cc: kvm...@lists.cs.columbia.edu; 'kvm-...@vger.kernel.org'; 'qemu-
> > > de...@nongnu.org'
> > > Subject: vfio for platform devices - 9/5/2012 - minutes
> > >
> > > We had a call with those interested and/or working on vfio for
> > > platform devices.
> > >
> > > Participants: Scott Wood, Varun Sethi, Bharat Bhushan, Peter Maydell,
> > >   Santosh Shukla, Alex Williamson, Alexander Graf,
> > >   Antonios Motakis, Christoffer Dall, Kim Phillips,
> > >   Stuart Yoder
> > >
> > > Several aspects to vfio for platform devices:
> > >
> > > 1. IOMMU groups
> > >
> > >  -iommu driver needs to register a bus notifier for the platform bus
> > >   and create groups for relevant platform devices  -Antonios is
> > > looking at this for several ARM IOMMUs  -PAMU (Freescale) driver
> > > already does this
> > >
> > > 2. unbinding device from host
> > >
> > >  PCI:
> > >echo :06:0d.0 >
> > > /sys/bus/pci/devices/:06:0d.0/driver/unbind
> > >  Platform:
> > >echo ffe101300.dma >
> > > /sys/bus/platform/devices/ffe101300.dma/driver/unbind
> > >
> > >  -don't believe there are issues or work to do here
> > >
> > > 3. binding device to vfio-platform driver
> > >
> > >  PCI:
> > >echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
> > >
> > >  -this is probably the least understood issue-- platform drivers
> > >   register themselves with the bus for a specific "name"
> > >   string.  That is matched with device tree "compatible" strings
> > >   later to bind a device to a driver  -we want is to have the
> > > vfio-platform driver to dynamically bind
> > >   to a variety of platform devices previously unknown to
> > >   vfio-platform
> > >  -ideally unbinding and binding could be an atomic operation  -Alex
> > > W pointed out that x86 could leverage this work so
> > >   keep that in mind in what we design  -Kim Phillips (Linaro) will
> > > start working on this
> >
> > One thing we didn't discuss needs to be considered (probably by Kim
> > who is looking at the 'binding device' issue) is around returning a
> > passthru device back to the host.
> >
> > After a platform device has been bound to vfio and is in use by user
> > space or a virtual machine, we also need to be able to unwind all that
> > and return the device back to the host in a sane state.
> >
> > What happens when user space exits and the vfio file descriptors are
> > closed?
> 
> For reference, expectations of how vfio-pci handles these situations:
> 
> For vfio-pci, when the reference count on the device fd drops to zero we call 
> a
> device disable function that includes disabling the bus master bit in config
> space stop ongoing DMA.

There is no bus mastering for platform devices and device disabling is not a 
standard like PCI. Do you think that we might need to do some device specific 
handling. For example for DMA controller, we need to atleast disable the DMA 
controller and mask its interrupts (may be ensure that there is no 
pending/running dma transaction, release irqs etc). As we are not yet having 
any linkage to respective kernel driver then I am not sure how we will do that 
specific handling?

> 
> > What if the device is still active and doing bus
> > mastering?   (e.g. a VM cras

Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

2011-12-02 Thread Bhushan Bharat-R65777


> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of Stuart Yoder
> Sent: Friday, December 02, 2011 8:11 PM
> To: Alex Williamson
> Cc: Alexey Kardashevskiy; aafab...@cisco.com; k...@vger.kernel.org;
> p...@au1.ibm.com; qemu-devel@nongnu.org; joerg.roe...@amd.com;
> konrad.w...@oracle.com; ag...@suse.de; d...@au1.ibm.com; chrisw@sous-
> sol.org; Yoder Stuart-B08248; io...@lists.linux-foundation.org;
> a...@redhat.com; linux-...@vger.kernel.org; Wood Scott-B07421;
> be...@cisco.com
> Subject: Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
> 
> On Thu, Dec 1, 2011 at 3:25 PM, Alex Williamson
>  wrote:
> > On Thu, 2011-12-01 at 14:58 -0600, Stuart Yoder wrote:
> >> One other mechanism we need as well is the ability to enable/disable
> >> a domain.
> >>
> >> For example-- suppose a device is assigned to a VM, the device is in
> >> use when the VM is abruptly terminated.  The VM terminate would shut
> >> off DMA at the IOMMU, but now the device is in an indeterminate
> >> state.   Some devices have no simple reset bit and getting the device
> >> back into a sane state could be complicated-- something the
> >> hypervisor doesn't want to do.
> >>
> >> So now KVM restarts the VM, vfio init happens for the device and  the
> >> IOMMU for that device is re-configured, etc, but we really can't
> >> re-enable DMA until the guest OS tells us (via an hcall) that it is
> >> ready.   The guest needs to get the assigned device in a sane state
> >> before DMA is enabled.
> >
> > Giant red flag.  We need to paravirtualize the guest?  Not on x86.
> 
> It's the reality we have to deal with, but doing this would obviously
> only apply to platforms that need it.
> 
> > Some
> > devices are better for assignment than others.  PCI devices are moving
> > towards supporting standard reset mechanisms.
> >
> >> Does this warrant a new domain enable/disable API, or should we make
> >> this part of the setup API we are discussing here?
> >
> > What's wrong with simply not adding any DMA mapping entries until you
> > think your guest is ready?  Isn't that effectively the same thing?
> > Unmap ~= disable.  If the IOMMU API had a mechanism to toggle the
> > iommu domain on and off, I wouldn't be opposed to adding an ioctl to
> > do it, but it really seems like just a shortcut vs map/unmap.  Thanks,
> 
> Yes, we could do something like that I guess.

How do we determine whether guest is ready or not? There can be multiple device 
get ready at different time.
Further if guest have given the device to it user level process or its guest. 
Should not there be some mechanism for a guest to enable/disable on per device 
or group?

Thanks
-Bharat






Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

2011-12-02 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Friday, December 02, 2011 11:57 PM
> To: Bhushan Bharat-R65777
> Cc: Stuart Yoder; Alex Williamson; Alexey Kardashevskiy;
> aafab...@cisco.com; k...@vger.kernel.org; p...@au1.ibm.com; qemu-
> de...@nongnu.org; joerg.roe...@amd.com; konrad.w...@oracle.com;
> ag...@suse.de; d...@au1.ibm.com; chr...@sous-sol.org; Yoder Stuart-B08248;
> io...@lists.linux-foundation.org; a...@redhat.com; linux-
> p...@vger.kernel.org; Wood Scott-B07421; be...@cisco.com
> Subject: Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
> 
> On 12/02/2011 12:11 PM, Bhushan Bharat-R65777 wrote:
> > How do we determine whether guest is ready or not? There can be
> multiple device get ready at different time.
> 
> The guest makes a hypercall with a device handle -- at least that's how
> we do it in Topaz.
> 
> > Further if guest have given the device to it user level process or its
> guest. Should not there be some mechanism for a guest to enable/disable
> on per device or group?
> 
> Yes, the same mechanism can be used for that -- though in that case we'll
> also want the ability for the guest to be able to control another layer
> of mapping (which will get quite tricky with PAMU's limitations).
>  This isn't really VFIO's concern, though (unless you're talking about
> the VFIO implementation in the guest).

Scott, I am not sure if there is any real use case where device needed to 
assigned beyond 2 level (host + immediate guest) in nested virtualization.

But if there any exists then will not it be better to virtualizes the iommu 
(PAMU for Freescale)?

Thanks
-Bharat 






Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

2011-12-02 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Friday, December 02, 2011 11:57 PM
> To: Bhushan Bharat-R65777
> Cc: Stuart Yoder; Alex Williamson; Alexey Kardashevskiy;
> aafab...@cisco.com; k...@vger.kernel.org; p...@au1.ibm.com; qemu-
> de...@nongnu.org; joerg.roe...@amd.com; konrad.w...@oracle.com;
> ag...@suse.de; d...@au1.ibm.com; chr...@sous-sol.org; Yoder Stuart-B08248;
> io...@lists.linux-foundation.org; a...@redhat.com; linux-
> p...@vger.kernel.org; Wood Scott-B07421; be...@cisco.com
> Subject: Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework
> 
> On 12/02/2011 12:11 PM, Bhushan Bharat-R65777 wrote:
> > How do we determine whether guest is ready or not? There can be
> multiple device get ready at different time.
> 
> The guest makes a hypercall with a device handle -- at least that's how
> we do it in Topaz.

Yes, it is ok from hcall with device handle perspective.
But I could not understood how cleanly this can be handled with the idea of 
enabling iommu when guest is ready.

Thanks
-Bharat

> 
> > Further if guest have given the device to it user level process or its
> guest. Should not there be some mechanism for a guest to enable/disable
> on per device or group?
> 
> Yes, the same mechanism can be used for that -- though in that case we'll
> also want the ability for the guest to be able to control another layer
> of mapping (which will get quite tricky with PAMU's limitations).
>  This isn't really VFIO's concern, though (unless you're talking about
> the VFIO implementation in the guest).

May be thinking too ahead, but will not something like this will be needed for 
nested virtualization?

Thanks
-Bharat




Re: [Qemu-devel] [PATCH 4/4] Enable kvm emulated watchdog

2012-08-01 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Wednesday, August 01, 2012 7:57 AM
> To: Bhushan Bharat-R65777
> Cc: qemu-...@nongnu.org List; kvm-...@vger.kernel.org; Bhushan Bharat-R65777;
> qemu-devel qemu-devel; KVM list
> Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
> 
> 
> On 20.07.2012, at 07:23, Bharat Bhushan wrote:
> 
> > Enable the KVM emulated watchdog if KVM supports (use the capability
> > enablement in watchdog handler). Also watchdog exit
> > (KVM_EXIT_WATCHDOG) handling is added.
> > Watchdog state machine is cleared whenever VM state changes to running.
> > This is to handle the cases like return from debug halt etc.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > v4: Enbale watchdog support only when user specifies watchdog-action
> >Earlier this was [3/3] of the patch series. Now because i separated
> >linux-header updation in separate patch, so this become [4/4]
> >
> > v3:
> > - TSR clearing is removed in whatchdog exit handling as this is
> >   no more needed.
> >
> > v2:
> > - Merged ([PATCH 3/4] Watchdog exit handling support) and
> >   ([PATCH 4/4] Enable to use kvm emulated watchdog)
> > - Clear watchdog state machine when VM state changes to running.
> >
> > hw/ppc_booke.c   |5 
> > sysemu.h |1 +
> > target-ppc/cpu.h |1 +
> > target-ppc/kvm.c |   69 
> > ++
> > vl.c |2 +
> > 5 files changed, 78 insertions(+), 0 deletions(-)
> >
> > diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index 837a5b6..478dbcd
> > 100644
> > --- a/hw/ppc_booke.c
> > +++ b/hw/ppc_booke.c
> > @@ -203,6 +203,11 @@ static void booke_wdt_cb(void *opaque)
> >  booke_timer->wdt_timer); }
> >
> > +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env, target_ulong tsr)
> > +{
> > +env->spr[SPR_BOOKE_TSR] = tsr & ~(TSR_ENW | TSR_WIS |
> > +TSR_WRS_MASK); }
> > +
> > void store_booke_tsr(CPUPPCState *env, target_ulong val) {
> > env->spr[SPR_BOOKE_TSR] &= ~val;
> > diff --git a/sysemu.h b/sysemu.h
> > index bc2c788..fc388b7 100644
> > --- a/sysemu.h
> > +++ b/sysemu.h
> > @@ -131,6 +131,7 @@ extern uint8_t *boot_splash_filedata; extern int
> > boot_splash_filedata_size; extern uint8_t qemu_extra_params_fw[2];
> > extern QEMUClock *rtc_clock;
> > +extern int enable_watchdog_support;
> >
> > #define MAX_NODES 64
> > extern int nb_numa_nodes;
> > diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index
> > ca2fc21..163389a 100644
> > --- a/target-ppc/cpu.h
> > +++ b/target-ppc/cpu.h
> > @@ -1191,6 +1191,7 @@ void store_40x_dbcr0 (CPUPPCState *env, uint32_t
> > val); void store_40x_sler (CPUPPCState *env, uint32_t val); void
> > store_booke_tcr (CPUPPCState *env, target_ulong val); void
> > store_booke_tsr (CPUPPCState *env, target_ulong val);
> > +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env, target_ulong
> > +tsr);
> > void booke206_flush_tlb(CPUPPCState *env, int flags, const int
> > check_iprot); target_phys_addr_t booke206_tlb_to_page_size(CPUPPCState
> > *env, ppcmas_tlb_t *tlb); int ppcmas_tlb_check(CPUPPCState *env,
> > ppcmas_tlb_t *tlb, diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> > index b6ef72d..0226b5e 100644
> > --- a/target-ppc/kvm.c
> > +++ b/target-ppc/kvm.c
> > @@ -32,6 +32,7 @@
> > #include "device_tree.h"
> > #include "hw/sysbus.h"
> > #include "hw/spapr.h"
> > +#include "hw/watchdog.h"
> >
> > #include "hw/sysbus.h"
> > #include "hw/spapr.h"
> > @@ -60,6 +61,7 @@ static int cap_booke_sregs; static int cap_ppc_smt;
> > static int cap_ppc_rma; static int cap_spapr_tce;
> > +static int cap_ppc_watchdog;
> >
> > /* XXX We have a race condition where we actually have a level triggered
> >  * interrupt, but the infrastructure can't expose that yet, so the guest
> > @@ -86,6 +88,7 @@ int kvm_arch_init(KVMState *s)
> > cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
> > cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
> > cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
> > +cap_ppc_watchdog = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_WATCHDOG);
> >
> > if (!cap_interrupt_level) {
> > fprintf(stderr, "KVM: Couldn't find level irq capability. Expect 
> > the "
> > 

Re: [Qemu-devel] [PATCH 4/4] Enable kvm emulated watchdog

2012-08-01 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Wednesday, August 01, 2012 7:57 AM
> To: Bhushan Bharat-R65777
> Cc: qemu-...@nongnu.org List; kvm-...@vger.kernel.org; Bhushan Bharat-R65777;
> qemu-devel qemu-devel; KVM list
> Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
> 
> 
> On 20.07.2012, at 07:23, Bharat Bhushan wrote:
> 
> > Enable the KVM emulated watchdog if KVM supports (use the capability
> > enablement in watchdog handler). Also watchdog exit
> > (KVM_EXIT_WATCHDOG) handling is added.
> > Watchdog state machine is cleared whenever VM state changes to running.
> > This is to handle the cases like return from debug halt etc.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> > v4: Enbale watchdog support only when user specifies watchdog-action
> >Earlier this was [3/3] of the patch series. Now because i separated
> >linux-header updation in separate patch, so this become [4/4]
> >
> > v3:
> > - TSR clearing is removed in whatchdog exit handling as this is
> >   no more needed.
> >
> > v2:
> > - Merged ([PATCH 3/4] Watchdog exit handling support) and
> >   ([PATCH 4/4] Enable to use kvm emulated watchdog)
> > - Clear watchdog state machine when VM state changes to running.
> >
> > hw/ppc_booke.c   |5 
> > sysemu.h |1 +
> > target-ppc/cpu.h |1 +
> > target-ppc/kvm.c |   69 
> > ++
> > vl.c |2 +
> > 5 files changed, 78 insertions(+), 0 deletions(-)
> >
> > diff --git a/hw/ppc_booke.c b/hw/ppc_booke.c index 837a5b6..478dbcd
> > 100644
> > --- a/hw/ppc_booke.c
> > +++ b/hw/ppc_booke.c
> > @@ -203,6 +203,11 @@ static void booke_wdt_cb(void *opaque)
> >  booke_timer->wdt_timer); }
> >
> > +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env, target_ulong tsr)
> > +{
> > +env->spr[SPR_BOOKE_TSR] = tsr & ~(TSR_ENW | TSR_WIS |
> > +TSR_WRS_MASK); }
> > +
> > void store_booke_tsr(CPUPPCState *env, target_ulong val) {
> > env->spr[SPR_BOOKE_TSR] &= ~val;
> > diff --git a/sysemu.h b/sysemu.h
> > index bc2c788..fc388b7 100644
> > --- a/sysemu.h
> > +++ b/sysemu.h
> > @@ -131,6 +131,7 @@ extern uint8_t *boot_splash_filedata; extern int
> > boot_splash_filedata_size; extern uint8_t qemu_extra_params_fw[2];
> > extern QEMUClock *rtc_clock;
> > +extern int enable_watchdog_support;
> >
> > #define MAX_NODES 64
> > extern int nb_numa_nodes;
> > diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h index
> > ca2fc21..163389a 100644
> > --- a/target-ppc/cpu.h
> > +++ b/target-ppc/cpu.h
> > @@ -1191,6 +1191,7 @@ void store_40x_dbcr0 (CPUPPCState *env, uint32_t
> > val); void store_40x_sler (CPUPPCState *env, uint32_t val); void
> > store_booke_tcr (CPUPPCState *env, target_ulong val); void
> > store_booke_tsr (CPUPPCState *env, target_ulong val);
> > +void ppc_booke_watchdog_clear_tsr(CPUPPCState *env, target_ulong
> > +tsr);
> > void booke206_flush_tlb(CPUPPCState *env, int flags, const int
> > check_iprot); target_phys_addr_t booke206_tlb_to_page_size(CPUPPCState
> > *env, ppcmas_tlb_t *tlb); int ppcmas_tlb_check(CPUPPCState *env,
> > ppcmas_tlb_t *tlb, diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> > index b6ef72d..0226b5e 100644
> > --- a/target-ppc/kvm.c
> > +++ b/target-ppc/kvm.c
> > @@ -32,6 +32,7 @@
> > #include "device_tree.h"
> > #include "hw/sysbus.h"
> > #include "hw/spapr.h"
> > +#include "hw/watchdog.h"
> >
> > #include "hw/sysbus.h"
> > #include "hw/spapr.h"
> > @@ -60,6 +61,7 @@ static int cap_booke_sregs; static int cap_ppc_smt;
> > static int cap_ppc_rma; static int cap_spapr_tce;
> > +static int cap_ppc_watchdog;
> >
> > /* XXX We have a race condition where we actually have a level triggered
> >  * interrupt, but the infrastructure can't expose that yet, so the guest
> > @@ -86,6 +88,7 @@ int kvm_arch_init(KVMState *s)
> > cap_ppc_smt = kvm_check_extension(s, KVM_CAP_PPC_SMT);
> > cap_ppc_rma = kvm_check_extension(s, KVM_CAP_PPC_RMA);
> > cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
> > +cap_ppc_watchdog = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_WATCHDOG);
> >
> > if (!cap_interrupt_level) {
> > fprintf(stderr, "KVM: Couldn't find level irq capability. Expect 
> > the "
> > 

Re: [Qemu-devel] [PATCH 4/4] Enable kvm emulated watchdog

2012-08-02 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Wednesday, August 01, 2012 11:31 PM
> To: Bhushan Bharat-R65777
> Cc: Alexander Graf; qemu-...@nongnu.org List; kvm-...@vger.kernel.org; qemu-
> devel qemu-devel; KVM list
> Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
> 
> On 08/01/2012 12:27 PM, Bhushan Bharat-R65777 wrote:
> >
> >
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Wednesday, August 01, 2012 7:57 AM
> >> To: Bhushan Bharat-R65777
> >> Cc: qemu-...@nongnu.org List; kvm-...@vger.kernel.org; Bhushan
> >> Bharat-R65777; qemu-devel qemu-devel; KVM list
> >> Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
> >>
> >>
> >> On 20.07.2012, at 07:23, Bharat Bhushan wrote:
> >>> @@ -384,6 +437,15 @@ int kvm_arch_init_vcpu(CPUPPCState *cenv)
> >>> return ret;
> >>> }
> >>>
> >>> +if (enable_watchdog_support) {
> >>> +ret = kvm_watchdog_enable(cenv);
> >>
> >> Do you think this is a good idea? Why would real hardware not
> >> implement a watchdog just because the user didn't select an action?
> >
> > If there is no watchdog action then why we want to run watchdog timer?
> 
> On real hardware, if software sets WRC to a non-zero value, the watchdog 
> action
> is a system reset.  The user doesn't have to do anything special.

Scott, maybe I misunderstood you comment, did not you commented to enable kvm 
watchdog if there is watchdog action provided by use. 

Thanks
-Bharat



Re: [Qemu-devel] [PATCH 4/4] Enable kvm emulated watchdog

2012-08-02 Thread Bhushan Bharat-R65777


> -Original Message-
> From: Wood Scott-B07421
> Sent: Friday, August 03, 2012 2:02 AM
> To: Bhushan Bharat-R65777
> Cc: Wood Scott-B07421; Alexander Graf; qemu-...@nongnu.org List; kvm-
> p...@vger.kernel.org; qemu-devel qemu-devel; KVM list
> Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
> 
> On 08/02/2012 10:56 AM, Bhushan Bharat-R65777 wrote:
> >
> >
> >> -Original Message-
> >> From: Wood Scott-B07421
> >> Sent: Wednesday, August 01, 2012 11:31 PM
> >> To: Bhushan Bharat-R65777
> >> Cc: Alexander Graf; qemu-...@nongnu.org List;
> >> kvm-...@vger.kernel.org; qemu- devel qemu-devel; KVM list
> >> Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
> >>
> >> On 08/01/2012 12:27 PM, Bhushan Bharat-R65777 wrote:
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Alexander Graf [mailto:ag...@suse.de]
> >>>> Sent: Wednesday, August 01, 2012 7:57 AM
> >>>> To: Bhushan Bharat-R65777
> >>>> Cc: qemu-...@nongnu.org List; kvm-...@vger.kernel.org; Bhushan
> >>>> Bharat-R65777; qemu-devel qemu-devel; KVM list
> >>>> Subject: Re: [PATCH 4/4] Enable kvm emulated watchdog
> >>>>
> >>>>
> >>>> On 20.07.2012, at 07:23, Bharat Bhushan wrote:
> >>>>> @@ -384,6 +437,15 @@ int kvm_arch_init_vcpu(CPUPPCState *cenv)
> >>>>> return ret;
> >>>>> }
> >>>>>
> >>>>> +if (enable_watchdog_support) {
> >>>>> +ret = kvm_watchdog_enable(cenv);
> >>>>
> >>>> Do you think this is a good idea? Why would real hardware not
> >>>> implement a watchdog just because the user didn't select an action?
> >>>
> >>> If there is no watchdog action then why we want to run watchdog timer?
> >>
> >> On real hardware, if software sets WRC to a non-zero value, the
> >> watchdog action is a system reset.  The user doesn't have to do anything
> special.
> >
> > Scott, maybe I misunderstood you comment, did not you commented to
> > enable kvm watchdog if there is watchdog action provided by use.
> 
> I changed my mind. :-)
> 
> The main difference between the user specifically asking for an action of 
> system
> reset, and the user not asking for anything, is that only in the former case
> should we error out if watchdog functionality isn't available -- but if it's a
> pain to distinguish, don't error out in either case.

:-) ..  ok 

-Bharat

> 
> -Scott



[Qemu-devel] Query: gdbstub software breakpoint removal

2011-03-21 Thread Bhushan Bharat-R65777
Hi All,

I can see that the software breakpoint queue is cleared in 
kvm_remove_breakpoint() in kvm-all.c.  I.e 
QTAILQ_REMOVE(¤t_env->kvm_state->kvm_sw_breakpoints, bp, entry); is 
called when the breakpoint is cleared.

While the queue is not cleared on kvm_remove_all_breakpoints();
Is there any specific reason for that?


Thanks
-Bharat



[Qemu-devel] RE: Query: gdbstub software breakpoint removal

2011-03-21 Thread Bhushan Bharat-R65777
Hi Jan,

I will send a patch.

Thanks
-Bharat

> -Original Message-
> From: Jan Kiszka [mailto:jan.kis...@siemens.com]
> Sent: Monday, March 21, 2011 5:04 PM
> To: Bhushan Bharat-R65777
> Cc: qemu-devel@nongnu.org
> Subject: Re: Query: gdbstub software breakpoint removal
> 
> On 2011-03-21 12:01, Bhushan Bharat-R65777 wrote:
> > Hi All,
> >
> > I can see that the software breakpoint queue is cleared in
> kvm_remove_breakpoint() in kvm-all.c.  I.e QTAILQ_REMOVE(¤t_env-
> >kvm_state->kvm_sw_breakpoints, bp, entry); is called when the breakpoint
> is cleared.
> >
> > While the queue is not cleared on kvm_remove_all_breakpoints(); Is
> > there any specific reason for that?
> 
> Hmm, no particular reason that I recall. This rather looks like a very
> old leak. Patch welcome.
> 
> Jan
> 
> --
> Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence
> Center Embedded Linux