date:20150618

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Paolo Bonzini



On 18/06/2015 08:39, Peter Lieven wrote:
> 
> It seems like the mainloop is waiting here:
> 
> #0  0x7606c89c in __lll_lock_wait ()
>    from /lib/x86_64-linux-gnu/libpthread.so.0
> No symbol table info available.
> #1  0x76068065 in _L_lock_858 ()
>    from /lib/x86_64-linux-gnu/libpthread.so.0
> No symbol table info available.
> #2  0x76067eba in pthread_mutex_lock ()
>    from /lib/x86_64-linux-gnu/libpthread.so.0
> No symbol table info available.
> #3  0x559f2557 in qemu_mutex_lock (mutex=0x55ed6d40)
>     at util/qemu-thread-posix.c:76
>         err = 0
>         __func__ = "qemu_mutex_lock"
> #4  0x556306ef in qemu_mutex_lock_iothread ()
>     at /usr/src/qemu-2.2.0/cpus.c:1123
> No locals.

This means the VCPU is busy with some synchronous activity---maybe a
bdrv_aio_cancel?

Paolo

Re: [Qemu-devel] [PATCH v7 0/9] Add limited support of VMware's hyper-call rpc

2015-06-18 Thread Gerd Hoffmann

  Hi,

>  we enable this thing by default (why do we?)

Historical reasons :(

At least we recently got an option
to turn it off (-machine $name,vmport=off)

cheers,
  Gerd

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Peter Lieven


On 18.06.2015 at 08:59, Paolo Bonzini wrote:
> On 18/06/2015 08:39, Peter Lieven wrote:
>> It seems like the mainloop is waiting here:
>>
>> #0  0x7606c89c in __lll_lock_wait ()
>>    from /lib/x86_64-linux-gnu/libpthread.so.0
>> No symbol table info available.
>> #1  0x76068065 in _L_lock_858 ()
>>    from /lib/x86_64-linux-gnu/libpthread.so.0
>> No symbol table info available.
>> #2  0x76067eba in pthread_mutex_lock ()
>>    from /lib/x86_64-linux-gnu/libpthread.so.0
>> No symbol table info available.
>> #3  0x559f2557 in qemu_mutex_lock (mutex=0x55ed6d40)
>>     at util/qemu-thread-posix.c:76
>>         err = 0
>>         __func__ = "qemu_mutex_lock"
>> #4  0x556306ef in qemu_mutex_lock_iothread ()
>>     at /usr/src/qemu-2.2.0/cpus.c:1123
>> No locals.
>
> This means the VCPU is busy with some synchronous activity---maybe a
> bdrv_aio_cancel?


Here is what the other threads are doing (dropped VNC thread):

Thread 3 (Thread 0x74d4f700 (LWP 2637)):
#0  0x7606c89c in __lll_lock_wait ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x76068065 in _L_lock_858 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2  0x76067eba in pthread_mutex_lock ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#3  0x559f2557 in qemu_mutex_lock (mutex=0x55ed6d40)
at util/qemu-thread-posix.c:76
err = 0
__func__ = "qemu_mutex_lock"
#4  0x556306ef in qemu_mutex_lock_iothread ()
at /usr/src/qemu-2.2.0/cpus.c:1123
No locals.
#5  0x5564b9ac in kvm_cpu_exec (cpu=0x563cb870)
at /usr/src/qemu-2.2.0/kvm-all.c:1770
run = 0x77ee2000
ret = 65536
run_ret = -4
#6  0x556301dc in qemu_kvm_cpu_thread_fn (arg=0x563cb870)
at /usr/src/qemu-2.2.0/cpus.c:953
cpu = 0x563cb870
r = 65536
#7  0x76065e9a in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x75d9338d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#9  0x in ?? ()
No symbol table info available.

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
timeout=4999424576) at qemu-timer.c:326
ts = {tv_sec = 4, tv_nsec = 999424576}
tvsec = 4
#2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
at aio-posix.c:231
node = 0x0
was_dispatching = false
ret = 1
progress = false
#3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0, offset=4292007936,
qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
aio_context = 0x563528e0
co = 0x563888a0
rwco = {bs = 0x5637eae0, offset = 4292007936,
  qiov = 0x7554f760, is_write = false, ret = 2147483647, flags = 0}
#4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0, sector_num=8382828,
buf=0x744cc800 "(", nb_sectors=4, is_write=false, flags=0)
at block.c:2722
qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size = 2048}
iov = {iov_base = 0x744cc800, iov_len = 2048}
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#8  0x75d9338d in clone () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#9  0x in ?? ()
No symbol table info available.

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
tim

Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386

2015-06-18 Thread Pavel Dovgaluk

> From: Aurelien Jarno [mailto:aurel...@aurel32.net]
> On 2015-06-17 15:41, Pavel Dovgalyuk wrote:
> > In icount mode every translation block looks as follows:
> >
> > if icount < n then exit
> > icount -= n
> > instr1
> > instr2
> > ...
> > instrn
> > exit
> >
> > When one of these instructions initiates an exception, icount should be
> > restored and the adjusted number of instructions should be subtracted from
> > icount instead of the initial n.
> >
> > The tlb_fill function passes retaddr to raise_exception, which allows
> > restoring the current instruction in the TB and calculating icount correctly.
> >
> > When an exception is triggered by another function (e.g. by embedding a call
> > to an exception-raising helper into the TB), the PC is not passed as retaddr
> > and the correct icount is not recovered. In such cases icount will be
> > decreased by a value equal to the size of the TB.
> 
> Looking at how icount works, I see it's basically a variable in the CPU
> state (icount_decr.u16.low), which is already accessed from the TB.
> Couldn't we adjust it using additional code before generating an
> exception, when in icount mode?
>
> For example for MIPS, we can add some code before generate_exception
> which uses the value from s->gen_opc_icount[j] to adjust
> the variable icount_decr.u16.low.

It is possible, but it will incur additional overhead, because we will
have to update icount every time an exception might be generated.
We'll have to update the icount value before and after every helper call
that can cause an exception:

icount -= n
...
instr_k
icount += n - k
helper
icount -= n - k
...

And this overhead will slow down the code even if no exception occurs.
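
As a rough illustration of both the over-charging problem and the cost of the
per-helper update, here is a toy model in C. It is only a sketch: the toy_*
names are made up, the 0-based instruction index is my own convention, and it
assumes that every instruction calls a helper that may raise an exception.

```c
#include <assert.h>

/* Toy model of icount accounting for one translation block (TB).
 * Hypothetical names; this is not QEMU code. */
struct toy_tb_result {
    int icount;         /* budget left when the TB stops */
    int extra_updates;  /* icount adjustments beyond the one at TB entry */
};

/*
 * n                   - number of instructions in the TB
 * fault_at            - 0-based index of the instruction whose helper raises
 *                       an exception, or -1 for none
 * sync_around_helpers - nonzero models "icount += n - k; helper;
 *                       icount -= n - k" around every helper call
 */
static struct toy_tb_result toy_run_tb(int icount, int n, int fault_at,
                                       int sync_around_helpers)
{
    struct toy_tb_result r = { icount - n, 0 };  /* icount -= n at TB entry */

    assert(n > 0 && fault_at < n);

    for (int k = 0; k < n; k++) {
        if (sync_around_helpers) {
            r.icount += n - k;      /* give back the not yet completed part */
            r.extra_updates++;
        }
        if (k == fault_at) {
            return r;               /* the exception leaves the TB here */
        }
        if (sync_around_helpers) {
            r.icount -= n - k;      /* helper returned normally: undo */
            r.extra_updates++;
        }
    }
    return r;
}
```

With icount = 100 and n = 4, a fault at index 2 leaves icount at 96 without
the extra updates (the whole TB is charged) and at 98 with them. A run with
no exception at all still performs 2 * n extra updates, which is the overhead
I am worried about.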

Pavel Dovgalyuk

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Peter Lieven


On 18.06.2015 at 09:03, Peter Lieven wrote:
> On 18.06.2015 at 08:59, Paolo Bonzini wrote:
>> On 18/06/2015 08:39, Peter Lieven wrote:
>>> It seems like the mainloop is waiting here:
>>>
>>> #0  0x7606c89c in __lll_lock_wait ()
>>>    from /lib/x86_64-linux-gnu/libpthread.so.0
>>> No symbol table info available.
>>> #1  0x76068065 in _L_lock_858 ()
>>>    from /lib/x86_64-linux-gnu/libpthread.so.0
>>> No symbol table info available.
>>> #2  0x76067eba in pthread_mutex_lock ()
>>>    from /lib/x86_64-linux-gnu/libpthread.so.0
>>> No symbol table info available.
>>> #3  0x559f2557 in qemu_mutex_lock (mutex=0x55ed6d40)
>>>     at util/qemu-thread-posix.c:76
>>>         err = 0
>>>         __func__ = "qemu_mutex_lock"
>>> #4  0x556306ef in qemu_mutex_lock_iothread ()
>>>     at /usr/src/qemu-2.2.0/cpus.c:1123
>>> No locals.
>>
>> This means the VCPU is busy with some synchronous activity---maybe a
>> bdrv_aio_cancel?
>
> Here is what the other threads are doing (dropped VNC thread):


Sorry, something got messed up while copying the buffer. This should be the
correct output:

(gdb) thread apply all bt full

Thread 4 (Thread 0x7fffee9ff700 (LWP 2640)):
#0  0x76069d84 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0x559f27ae in qemu_cond_wait (cond=0x563beed0,
mutex=0x563bef00) at util/qemu-thread-posix.c:135
err = 0
__func__ = "qemu_cond_wait"
#2  0x5593f12e in vnc_worker_thread_loop (queue=0x563beed0)
at ui/vnc-jobs.c:222
job = 0x5637bbd0
entry = 0x0
tmp = 0x0
vs = {csock = -1, dirty = {{0, 0, 0} },
  lossy_rect = 0x563ecd10, vd = 0x74465010, need_update = 0,
  force_update = 0, has_dirty = 0, features = 195, absolute = 0,
  last_x = 0, last_y = 0, last_bmask = 0, client_width = 0,
  client_height = 0, share_mode = 0, vnc_encoding = 5, major = 0,
  minor = 0, auth = 0, challenge = '\000' ,
  info = 0x0, output = {capacity = 6257, offset = 1348,
buffer = 0x7fffe4000d10 ""}, input = {capacity = 0, offset = 0,
buffer = 0x0},
  write_pixels = 0x55925d57 , client_pf = {
bits_per_pixel = 32 ' ', bytes_per_pixel = 4 '\004',
depth = 24 '\030', rmask = 16711680, gmask = 65280, bmask = 255,
amask = 0, rshift = 16 '\020', gshift = 8 '\b', bshift = 0 '\000',
ashift = 24 '\030', rmax = 255 '\377', gmax = 255 '\377',
bmax = 255 '\377', amax = 0 '\000', rbits = 8 '\b',
gbits = 8 '\b', bbits = 8 '\b', abits = 0 '\000'},
  client_format = 0, client_be = false, audio_cap = 0x0, as = {
freq = 0, nchannels = 0, fmt = AUD_FMT_U8, endianness = 0},
  read_handler = 0, read_handler_expect = 0,
  modifiers_state = '\000' , led = 0x0,
  abort = false, initialized = false, output_mutex = {lock = {
  __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
__kind = 0, __spins = 0, __list = {__prev = 0x0,
  __next = 0x0}}, __size = '\000' ,
  __align = 0}}, bh = 0x0, jobs_buffer = {capacity = 0,
offset = 0, buffer = 0x0}, tight = {type = 0,
quality = 255 '\377', compression = 9 '\t', pixel24 = 0 '\000',
tight = {capacity = 0, offset = 0, buffer = 0x0}, tmp = {
  capacity = 0, offset = 0, buffer = 0x0}, zlib = {capacity = 0,
  offset = 0, buffer = 0x0}, gradient = {capacity = 0, offset = 0,
  buffer = 0x0}, levels = {0, 0, 0, 0}, stream = {{next_in = 0x0,
avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0,
total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0,
opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {
next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0,
avail_out = 0, total_out = 0, msg = 0x0, state = 0x0,
zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0,
reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0,
next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0,
state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0,
data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0,
avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0,
total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0,
opaque = 0x0, data_type = 0, adler = 0, reserved = 0}}},
  zlib = {zlib = {capacity = 0, offset = 0, buffer = 0x0}, tmp = {
  capacity = 0, offset = 0, buffer = 0x0}, stream = {
  next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0,
  avail_out = 0, total_out = 0, msg = 0x0, state = 0x0,
  zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0,
  reserved =

Re: [Qemu-devel] linux-user crashes on clone(2) when run on ppc host

2015-06-18 Thread Peter Maydell

On 17 June 2015 at 22:36, Emilio G. Cota wrote:
> On Wed, Jun 17, 2015 at 09:58:27 +0100, Peter Maydell wrote:
>> On 17 June 2015 at 01:52, Emilio G. Cota wrote:
>> > I'm having trouble running a simple multithreaded program on a PowerPC
>> > host machine.
>> >
>> > The machine I'm using is a ppc VM--I think it's running under KVM (I'm
>> > using OVH's RunAbove Power8 service):
>> >   admin@adsf:~/qemu$ uname -a
>> >   Linux adsf 3.13.0-37-generic #64-Ubuntu SMP Mon Sep 22 21:27:09 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
>> >
>> > admin@adsf:~/qemu$ ppc64le-linux-user/qemu-ppc64le foo
>>
>> Multithreaded binaries don't work with linux-user; there are a bunch
>> of known race conditions involving data structures we don't correctly
>> lock or make per-thread.
>>
>> This is a long-standing issue; we're hoping we might get to fixing
>> it some time this year.
>
> I don't think this is a race because it also breaks when
> run on a single core (with taskset -c 0).
>
> What data structures are you referring to? Are they ppc-specific?

None of the code generation data structures are locked at all --
if two threads try to generate code at the same time they'll
tend to clobber each other.

> On x86 hosts linux-user works reliably with multithreaded apps.

No, it doesn't :-) If any multithreaded app happens to run on
any host it is pure fluke.

-- PMM

Re: [Qemu-devel] [ Patch ] for CVE-2015-3242

2015-06-18 Thread Peter Maydell

On 18 June 2015 at 03:40, 罗大龙 wrote:
> /qemu-2.3.0/hw/arm/pxa2xx.c
>
> --- pxa2xx.c.new  2015-06-15 17:40:59.285002592 +0800
> +++ pxa2xx.c  2015-06-15 17:43:47.001002592 +0800
> @@ -1986,6 +1986,10 @@
>
>  s->rx_len = qemu_get_byte(f);
>  s->rx_start = 0;
> +   if (s->rx_len < 0 || s->rx_len > ARRAY_SIZE(s->rx_fifo)) {
> +   return -EINVAL;
> +   }
> +
>  for (i = 0; i < s->rx_len; i ++)
>  s->rx_fifo[i] = qemu_get_byte(f);

Hi. I'm afraid I can't apply this, you have provided no
Signed-off-by: (and no commit message either). Also, the
code you are trying to patch does not exist in QEMU master.

NB: we do not consider bugs in the pxa2xx board to be
security issues -- the code was never written with the
expectation of being able to defend against malicious
guests, and certainly not against malicious incoming
migration data (as here). Treat it as a developer tool,
not a security boundary.
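
For reference, the kind of check the quoted hunk is after can be sketched in
isolation as below. This is a toy in C with made-up names (toy_fifo,
toy_fifo_load, TOY_FIFO_SIZE); it is not the pxa2xx code and does not use the
QEMU migration API, it only shows validating a length field from an untrusted
stream before using it as a loop bound.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FIFO_SIZE 16

/* Hypothetical device state: a small receive FIFO. */
struct toy_fifo {
    uint8_t len;
    uint8_t data[TOY_FIFO_SIZE];
};

/*
 * Load a length-prefixed FIFO from an untrusted byte stream.
 * Returns 0 on success, -EINVAL if the stream is empty, the length
 * exceeds the FIFO, or the stream is shorter than the length claims.
 */
static int toy_fifo_load(struct toy_fifo *f, const uint8_t *in, size_t in_len)
{
    assert(f != NULL && in != NULL);

    if (in_len < 1) {
        return -EINVAL;
    }

    uint8_t len = in[0];

    /* len is unsigned here, so only the upper bounds need checking. */
    if (len > TOY_FIFO_SIZE || (size_t)len > in_len - 1) {
        return -EINVAL;
    }

    f->len = len;
    for (size_t i = 0; i < len; i++) {
        f->data[i] = in[1 + i];
    }
    return 0;
}
```

The point is only that the bound is checked before the copy loop runs, so a
bogus length is rejected instead of writing past the array.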

thanks
-- PMM

Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.

2015-06-18 Thread Li, Liang Z

> diff --git a/docs/migration.txt b/docs/migration.txt index f6df4be..b4b93d1
> 100644
> --- a/docs/migration.txt
> +++ b/docs/migration.txt
> @@ -291,3 +291,170 @@ save/send this state when we are in the middle of a
> pio operation  (that is what ide_drive_pio_state_needed() checks).  If
> DRQ_STAT is  not enabled, the values on that fields are garbage and don't
> need to  be sent.
> +
> += Return path =
> +
> +In most migration scenarios there is only a single data path that runs
> +from the source VM to the destination, typically along a single fd
> +(although possibly with another fd or similar for some fast way of throwing
> pages across).
> +
> +However, some uses need two way communication; in particular the
> +Postcopy destination needs to be able to request pages on demand from
> the source.
> +
> +For these scenarios there is a 'return path' from the destination to
> +the source;
> +qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for
> +the return path.
> +
> +  Source side
> + Forward path - written by migration thread
> + Return path  - opened by main thread, read by return-path thread
> +
> +  Destination side
> + Forward path - read by main thread
> + Return path  - opened by main thread, written by main thread AND
> postcopy
> +thread (protected by rp_mutex)
> +
> += Postcopy =
> +'Postcopy' migration is a way to deal with migrations that refuse to
> +converge; its plus side is that there is an upper bound on the amount
> +of migration traffic and time it takes, the down side is that during
> +the postcopy phase, a failure of
> +*either* side or the network connection causes the guest to be lost.

Hi David,

Do you have any ideas or plans for dealing with a failure that happens during
the postcopy phase?

Losing the guest is too frightening for a cloud provider. We had a discussion
with Alibaba, and they said that they can't use the postcopy feature unless
there is a mechanism to get the guest back.

Liang

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Kevin Wolf

On 18.06.2015 at 09:12, Peter Lieven wrote:
> Thread 2 (Thread 0x75550700 (LWP 2636)):
> #0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
> No symbol table info available.
> #1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
> timeout=4999424576) at qemu-timer.c:326
> ts = {tv_sec = 4, tv_nsec = 999424576}
> tvsec = 4
> #2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
> at aio-posix.c:231
> node = 0x0
> was_dispatching = false
> ret = 1
> progress = false
> #3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0, offset=4292007936,
> qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
> aio_context = 0x563528e0
> co = 0x563888a0
> rwco = {bs = 0x5637eae0, offset = 4292007936,
>   qiov = 0x7554f760, is_write = false, ret = 2147483647, flags = 
> 0}
> #4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0, sector_num=8382828,
> buf=0x744cc800 "(", nb_sectors=4, is_write=false, flags=0)
> at block.c:2722
> qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size = 2048}
> iov = {iov_base = 0x744cc800, iov_len = 2048}
> #5  0x5594b008 in bdrv_read (bs=0x5637eae0, sector_num=8382828,
> buf=0x744cc800 "(", nb_sectors=4) at block.c:2730
> No locals.
> #6  0x5599acef in blk_read (blk=0x56376820, sector_num=8382828,
> buf=0x744cc800 "(", nb_sectors=4) at block/block-backend.c:404
> No locals.
> #7  0x55833ed2 in cd_read_sector (s=0x56408f88, lba=2095707,
> buf=0x744cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
> ret = 32767

Here is the problem: The ATAPI emulation uses synchronous blk_read()
instead of the AIO or coroutine interfaces. This means that it keeps
polling for request completion while it holds the BQL until the request
is completed.

We can (and should) fix that, otherwise the VCPUs are blocked while we're
reading from the image, even without a hang. It doesn't fully fix your
problem, though, as bdrv_drain_all() and friends still exist.
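
To make the difference concrete, here is a minimal toy model in C. It is not
QEMU code and every toy_* name is made up for illustration. The synchronous
variant spins in the poll loop until the request completes, which is exactly
where a caller that holds a global lock keeps holding it; the callback variant
only records the request and returns, leaving completion to later event-loop
turns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in QEMU. */
typedef void toy_completion_fn(void *opaque, int ret);

struct toy_request {
    int ticks_left;          /* event-loop turns until the "disk" answers */
    toy_completion_fn *cb;   /* invoked once on completion, may be NULL */
    void *opaque;
    bool done;
};

/* One turn of the event loop; returns true once the request has completed. */
static bool toy_poll(struct toy_request *req)
{
    assert(req != NULL);
    if (!req->done && --req->ticks_left <= 0) {
        req->done = true;
        if (req->cb) {
            req->cb(req->opaque, 0);
        }
    }
    return req->done;
}

/*
 * Synchronous style: the caller spins until completion. If it holds a
 * global lock, it holds it for every one of these turns.
 * Returns the number of turns spent waiting.
 */
static int toy_read_sync(int latency_ticks)
{
    struct toy_request req = { .ticks_left = latency_ticks };
    int turns = 0;

    do {
        turns++;
    } while (!toy_poll(&req));
    return turns;
}

/* Callback style: record the request and return immediately. */
static void toy_read_async(struct toy_request *req, int latency_ticks,
                           toy_completion_fn *cb, void *opaque)
{
    assert(req != NULL);
    req->ticks_left = latency_ticks;
    req->cb = cb;
    req->opaque = opaque;
    req->done = false;
}

/* Example completion callback: stores the result code in an int. */
static void toy_store_result(void *opaque, int ret)
{
    *(int *)opaque = ret;
}
```

In the callback variant the submitter is free to drop the lock right after
toy_read_async() returns; the result only shows up once the event loop has
called toy_poll() often enough.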

Kevin

> #8  0x55834202 in ide_atapi_cmd_reply_end (s=0x56408f88)
> at hw/ide/atapi.c:190
> byte_count_limit = 21845
> size = 1801980
> ret = 0
> #9  0x55834657 in ide_atapi_cmd_read_pio (s=0x56408f88,
> lba=2095707, nb_sectors=16, sector_size=2048) at hw/ide/atapi.c:279
> No locals.
> #10 0x55834b25 in ide_atapi_cmd_read (s=0x56408f88, lba=2095707,
> nb_sectors=16, sector_size=2048) at hw/ide/atapi.c:393
> No locals.
> #11 0x558358ed in cmd_read (s=0x56408f88, buf=0x744cc800 "(")
> at hw/ide/atapi.c:824
> nb_sectors = 16
> lba = 2095707
> #12 0x55836373 in ide_atapi_cmd (s=0x56408f88)
> at hw/ide/atapi.c:1152
> buf = 0x744cc800 "("
> #13 0x558323e1 in ide_data_writew (opaque=0x56408f08, addr=368,
> val=0) at hw/ide/core.c:2020
> bus = 0x56408f08
> s = 0x56408f88
> p = 0x744cc80c "IHDR"
> #14 0x5564285f in portio_write (opaque=0x5641d5d0, addr=0, data=0,
> size=2) at /usr/src/qemu-2.2.0/ioport.c:204
> mrpio = 0x5641d5d0
> mrp = 0x5641d6f8
> __PRETTY_FUNCTION__ = "portio_write"
> #15 0x5564f07c in memory_region_write_accessor (mr=0x5641d5d0,
> addr=0, value=0x7554fb28, size=2, shift=0, mask=65535)
> at /usr/src/qemu-2.2.0/memory.c:443
> tmp = 0
> #16 0x5564f1c4 in access_with_adjusted_size (addr=0,
> value=0x7554fb28, size=2, access_size_min=1, access_size_max=4,
> access=0x5564efe0 , mr=0x5641d5d0)
> at /usr/src/qemu-2.2.0/memory.c:480
> access_mask = 65535
> access_size = 2
> i = 0
> #17 0x5565209f in memory_region_dispatch_write (mr=0x5641d5d0,
> addr=0, data=0, size=2) at /usr/src/qemu-2.2.0/memory.c:1117
> No locals.
> #18 0x556559c7 in io_mem_write (mr=0x5641d5d0, addr=0, val=0,
> size=2) at /usr/src/qemu-2.2.0/memory.c:1973
> No locals.
> #19 0x555fc4be in address_space_rw (as=0x55e7a880, addr=368,
> buf=0x77ee6000 "", len=2, is_write=true)
> at /usr/src/qemu-2.2.0/exec.c:2141
> l = 2
> ptr = 0x5567a7a6 "H\213E\370dH3\004%("
> val = 0
> addr1 = 0
> mr = 0x5641d5d0
> error = false
> #20 0x5564b454 in kvm_handle_io (port=368, data=0x77ee6000,
> direction=1, size=2, count=1) at /usr/src/qemu-2.2.0/kvm-all.c:1632
> i = 0
> ptr = 0x77ee6000 ""
> #21 0x5564baa4 in kvm_cpu_exec (cpu=0x5638e7e0)
> at /usr/src/qemu-2.2.0/kvm-all.c:1789
> run = 0x77ee5000
> ret = 0
> run_ret = 0
> #22 0x556301dc in qemu_kvm_cpu_thread_fn (arg=0x5638e7e0)
> at /usr/src/qemu-

Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386

2015-06-18 Thread Peter Maydell

On 18 June 2015 at 08:12, Pavel Dovgaluk wrote:
>> From: Aurelien Jarno [mailto:aurel...@aurel32.net]
>> Looking at how icount works, I see it's basically a variable in the CPU
>> state (icount_decr.u16.low), which is already accessed from the TB.
>> Couldn't we adjust it using additional code before generating an
>> exception, when in icount mode?
>>
>> For example for MIPS, we can add some code before generate_exception
>> which uses the value from s->gen_opc_icount[j] to adjust
>> the variable icount_decr.u16.low.
>
> It is possible, but it will incur additional overhead, because we will
> have to update icount every time an exception might be generated.
> We'll have to update the icount value before and after every helper call
> that can cause an exception:
>
> icount -= n
> ...
> instr_k
> icount += n - k
> helper
> icount -= n - k
> ...
>
> And this overhead will slow down the code even if no exception occurs.

Right, this is a tradeoff: in some cases it's faster to assume
no exception and handle state resync by doing a retranslate.
In some cases it's faster to assume there will be an exception
and do a manual sync. Guest load/store is obviously in the
first category. Guest doing an instruction which always takes
an exception (like syscall insns) is in the second category.
For other cases there's a choice. We need to support both
approaches; obviously you can argue for any particular case
whether it should be approach 1 or approach 2.
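
One way to picture the tradeoff is a deliberately simplified cost model,
sketched here in C. The model and the toy_prefer_manual_sync name are
assumptions for illustration, not measurements: approach 1 pays a
retranslation cost only when an exception actually happens, approach 2 pays a
small sync cost on every execution.

```c
#include <assert.h>

/*
 * Simplified expected-cost comparison (illustrative assumption only):
 *   approach 1: p_exception * retranslate_cost
 *   approach 2: sync_cost, paid every time
 * Returns 1 if approach 2 (manual sync) is expected to be cheaper, else 0.
 */
static int toy_prefer_manual_sync(double p_exception,
                                  double retranslate_cost,
                                  double sync_cost)
{
    assert(p_exception >= 0.0 && p_exception <= 1.0);
    return sync_cost < p_exception * retranslate_cost;
}
```

An instruction that always traps (p_exception = 1.0) favours the manual sync
as soon as the sync is cheaper than a retranslate, while a load/store that
almost never faults favours approach 1.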

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386

2015-06-18 Thread Pavel Dovgaluk

> From: Peter Maydell [mailto:peter.mayd...@linaro.org]
> On 18 June 2015 at 08:12, Pavel Dovgaluk wrote:
> >> From: Aurelien Jarno [mailto:aurel...@aurel32.net]
> >> Looking at how icount works, I see it's basically a variable in the CPU
> >> state (icount_decr.u16.low), which is already accessed from the TB.
> >> Couldn't we adjust it using additional code before generating an
> >> exception, when in icount mode?
> >>
> >> For example for MIPS, we can add some code before generate_exception
> >> which uses the value from s->gen_opc_icount[j] to adjust
> >> the variable icount_decr.u16.low.
> >
> > It is possible, but it will incur additional overhead, because we will
> > have to update icount every time an exception might be generated.
> > We'll have to update the icount value before and after every helper call
> > that can cause an exception:
> >
> > icount -= n
> > ...
> > instr_k
> > icount += n - k
> > helper
> > icount -= n - k
> > ...
> >
> > And this overhead will slow down the code even if no exception occurs.
> 
> Right, this is a tradeoff: in some cases it's faster to assume
> no exception and handle state resync by doing a retranslate.
> In some cases it's faster to assume there will be an exception
> and do a manual sync. Guest load/store is obviously in the
> first category. Guest doing an instruction which always takes
> an exception (like syscall insns) is in the second category.
> For other cases there's a choice. We need to support both
> approaches; obviously you can argue for any particular case
> whether it should be approach 1 or approach 2.

Syscall and non-implemented instructions are in a third category: they
always take an exception. In this case the translation should be stopped
without any additional actions.

By the way, I implemented this 'third category' approach for MIPS
and measured the performance. It does not show any performance degradation
when compared to the original unfixed version.
All other exception-generating helpers and instructions use approach 1.

Pavel Dovgalyuk

Re: [Qemu-devel] [PATCH v7 0/9] Add limited support of VMware's hyper-call rpc

2015-06-18 Thread Michael S. Tsirkin

On Wed, Jun 17, 2015 at 06:26:06PM -0400, Don Slutz wrote:
> 
> 
> On 06/17/15 09:44, Paolo Bonzini wrote:
> >
> > On 12/06/2015 16:05, Don Slutz wrote:
> >> Changes v6 to v7:
> ...
> > Looks good, feel free to send out patches 1+2+3+9 in a pull request if
> > you want.
> 
> If I am reading this correctly, I should add
> 
> Acked-by: Paolo Bonzini 
> 
> to these 4 patches.
> 
> Since I have never sent a pull request to QEMU before here is what I
> think should be in it:

I'd like to see a version with comments addressed first.


> 
> The following changes since commit f754c3c9cce3c4789733d9068394be4256dfe6a8:
> 
>   Merge remote-tracking branch
> 'remotes/agraf/tags/signed-s390-for-upstream' into staging (2015-06-17
> 12:43:26 +0100)
> 
> are available in the git repository at:
> 
> 
>   g...@github.com:dslutz/qemu.git vmware_pull_v7
> 
> for you to fetch changes up to a9e61af94c4452270521638c6bac11262ff2f2b7:
> 
>   MAINTAINERS: add VMware port (2015-06-17 18:21:02 -0400)
> 
> - 
> Don Slutz (4):
>   vmport: The io memory region needs to be at least a size of 4
>   vmport: Switch to trace
>   vmport: Fix vmport_cmd_ram_size
>   MAINTAINERS: add VMware port
> 
>  MAINTAINERS  |  7 +++
>  hw/misc/vmport.c | 15 +--
>  trace-events |  4 
>  3 files changed, 20 insertions(+), 6 deletions(-)
> 
> 
> Not clear at all about signing this pull.
> 
>-Don Slutz
> 
> > Thanks,
> >
> > Paolo
> >
> 

Re: [Qemu-devel] libcacard: use the library?

2015-06-18 Thread Paolo Bonzini



On 17/06/2015 22:15, Michael Tokarev wrote:
> I tried autoconf&automake&libtool.  It is a HugeMess, I disliked it.
> So I rewrote it as a simple shell script.
> 
> The result of both attempts is available at 
> http://www.corpit.ru/mjt/tmp/libcacard/
> There are 4 files in there:
> 
>  configure.ac Makefile.am -- auto*shit version, requires bootstrap like
>   libtoolize && aclocal && automake --foreign --add-missing && autoconf

More like autoreconf -fvi.

>  configure Makefile.in -- my small version based on what qemu ./configure
>   currently does.

Doesn't have dependency tracking.  That's already a no-no I think.

Paolo

Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.

2015-06-18 Thread Dr. David Alan Gilbert

* Li, Liang Z (liang.z...@intel.com) wrote:
> > diff --git a/docs/migration.txt b/docs/migration.txt index f6df4be..b4b93d1
> > 100644
> > --- a/docs/migration.txt
> > +++ b/docs/migration.txt
> > @@ -291,3 +291,170 @@ save/send this state when we are in the middle of a
> > pio operation  (that is what ide_drive_pio_state_needed() checks).  If
> > DRQ_STAT is  not enabled, the values on that fields are garbage and don't
> > need to  be sent.
> > +
> > += Return path =
> > +
> > +In most migration scenarios there is only a single data path that runs
> > +from the source VM to the destination, typically along a single fd
> > +(although possibly with another fd or similar for some fast way of throwing
> > pages across).
> > +
> > +However, some uses need two way communication; in particular the
> > +Postcopy destination needs to be able to request pages on demand from
> > the source.
> > +
> > +For these scenarios there is a 'return path' from the destination to
> > +the source;
> > +qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for
> > +the return path.
> > +
> > +  Source side
> > + Forward path - written by migration thread
> > + Return path  - opened by main thread, read by return-path thread
> > +
> > +  Destination side
> > + Forward path - read by main thread
> > + Return path  - opened by main thread, written by main thread AND
> > postcopy
> > +thread (protected by rp_mutex)
> > +
> > += Postcopy =
> > +'Postcopy' migration is a way to deal with migrations that refuse to
> > +converge; its plus side is that there is an upper bound on the amount
> > +of migration traffic and time it takes, the down side is that during
> > +the postcopy phase, a failure of
> > +*either* side or the network connection causes the guest to be lost.
> 
> Hi David,
> 
> Do you have any ideas or plans for dealing with a failure that happens during
> the postcopy phase?
>
> Losing the guest is too frightening for a cloud provider. We had a discussion
> with Alibaba, and they said that they can't use the postcopy feature unless
> there is a mechanism to get the guest back.

The VM memory image is still on the source VM, so you can restart
the source, however that's not safe, because once the destination has
started running it is sending out packets and also modifying the block storage.
If you restarted the source at that point what block and net state can
you accept being visible?

Dave

> 
> Liang
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] libcacard: use the library?

2015-06-18 Thread Michael Tokarev

On 18.06.2015 at 11:09, Paolo Bonzini wrote:
> On 17/06/2015 22:15, Michael Tokarev wrote:
>> I tried autoconf&automake&libtool.  It is a HugeMess, I disliked it.
>> So I rewrote it as a simple shell script.
>>
>> The result of both attempts is available at 
>> http://www.corpit.ru/mjt/tmp/libcacard/
>> There are 4 files in there:
>>
>>  configure.ac Makefile.am -- auto*shit version, requires bootstrap like
>>   libtoolize && aclocal && automake --foreign --add-missing && autoconf
> 
> More like autoreconf -fvi.

My 10-minute experience with auto*tools didn't go that far :)

>>  configure Makefile.in -- my small version based on what qemu ./configure
>>   currently does.
> 
> Doesn't have dependency tracking.  That's already a no-no I think.

Well, it is trivial to add.  For a first cut it works.

/mjt

Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr

2015-06-18 Thread Paolo Bonzini



On 18/06/2015 07:17, Pavel Dovgaluk wrote:
>>> > >
>>> > >  static inline RES_TYPE
>>> > > -glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong 
>>> > > ptr)
>>> > > +glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
>>> > > +  target_ulong ptr,
>>> > > +  uintptr_t retaddr)
>> > 
>> > Would it make sense to call these helper_cpu_ld##USUFFIX##MEMSUFFIX?
> I don't want to use 'helper' prefix, because helper functions are
> usually called directly from TB.

True, but in the end these have the same functionality as helpers, just
they're indirectly called from other helpers.

Paolo

Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386

2015-06-18 Thread Aurelien Jarno

On 2015-06-18 10:12, Pavel Dovgaluk wrote:
> > From: Aurelien Jarno [mailto:aurel...@aurel32.net]
> > On 2015-06-17 15:41, Pavel Dovgalyuk wrote:
> > > In icount mode every translation block looks as follows:
> > >
> > > if icount < n then exit
> > > icount -= n
> > > instr1
> > > instr2
> > > ...
> > > instrn
> > > exit
> > >
> > > When one of these instructions raises an exception, icount should be
> > > restored and the adjusted number of instructions should be subtracted
> > > from icount instead of the initial n.
> > >
> > > The tlb_fill function passes retaddr to raise_exception, which allows
> > > restoring the current instruction in the TB and calculating icount
> > > correctly.
> > >
> > > When an exception is triggered by another function (e.g. by embedding a
> > > call to an exception-raising helper into the TB), the PC is not passed as
> > > retaddr and the correct icount is not recovered. In such cases icount
> > > will be decreased by a value equal to the size of the TB.
> > 
> > Looking at how icount works, I see it's basically a variable in the CPU
> > state (icount_decr.u16.low), which is already accessed from the TB.
> > Couldn't we adjust it using additional code before generating an
> > exception, when in icount mode?
> > 
> > For example for MIPS, we can add some code before generate_exception
> > which uses the value from s->gen_opc_icount[j] to adjust
> > the variable icount_decr.u16.low.
> 
> It is possible, but it will incur additional overhead, because we will
> have to update icount every time the exception might be generated.
> We'll have to update icount value before and after every helper call
> that can cause an exception:
> 
> icount -= n
> ...
> instr_k
> icount += n - k
> helper
> icount -= n - k
> ...
> 
> And this overhead will slow down the code even if no exception occurs.

That's where I might disagree. Retranslation seems a very good idea on
paper, but in practice it doesn't always seem to bring the performance
improvement it should. In addition it seems to be highly dependent on
the target. Just to give some numbers, on MIPS (as your patch originally
concerns this architecture), 40% of code generation is actually due to
retranslation. The problem is that over time we have improved code
generation a lot (liveness analysis, better register allocation,
constant propagation, ...) and thus we have increased the code
generation time. While that clearly has some benefits when the code is
actually executed, it has none when the code is simply retranslated. In
short, we spend more time than before finding the CPU state
corresponding to an exception.

A simple way to show that is to apply the simple patch below, which
disables retranslation and saves the CPU state before each instruction:

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 1d128ee..5238d71 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -19435,6 +19435,7 @@ gen_intermediate_code_internal(MIPSCPU *cpu, TranslationBlock *tb,
 LOG_DISAS("\ntb %p idx %d hflags %04x\n", tb, ctx.mem_idx, ctx.hflags);
 gen_tb_start(tb);
 while (ctx.bstate == BS_NONE) {
+save_cpu_state(&ctx, 1);
 if (unlikely(!QTAILQ_EMPTY(&cs->breakpoints))) {
 QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
 if (bp->pc == ctx.pc) {
diff --git a/translate-all.c b/translate-all.c
index b6b0e1c..3d4c017 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -212,6 +212,8 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
 int64_t ti;
 #endif

+return -1;
+
 #ifdef CONFIG_PROFILER
 ti = profile_getclock();
 #endif

On an x86 host, this patch brings a 5% boot time improvement for a MIPS
guest. One reason is that the TCG code generator has good knowledge about
which TCG ops or helpers can trigger an exception, so it can optimize out
part of the instructions saving the CPU state. I guess that host CPUs have
also evolved over time, now being superscalar and out-of-order, so that
saving the CPU state can be done "in the background". Also, it's just a
quick and dirty patch; we can probably do even better.

All of that to say that I am worried about performance if more paths go
through the retranslation code, especially on MIPS, as it seems to be
costly. That said, I haven't really looked in detail at other targets
or hosts.

Coming back to your patches, we might want to simply fix icount first,
even if it has some performance impact, and deal with the
retranslation issue separately, as it concerns more than just icount.
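
To make the icount arithmetic discussed in this thread concrete, here is a
minimal standalone sketch in C. It is not QEMU code: ToyCPU, tb_enter() and
tb_fix_icount_on_exception() are hypothetical names that only model the
accounting (a TB charges its n instructions up front; if an exception is
raised after some of them completed, the ones that never ran are credited
back).

```c
#include <stdbool.h>

/* Hypothetical model of an instruction budget, not QEMU's real state. */
typedef struct {
    int icount;   /* remaining instruction budget */
} ToyCPU;

/*
 * Entering a TB of n instructions: refuse to run if the budget is too
 * small, otherwise charge all n instructions up front.
 */
static bool tb_enter(ToyCPU *cpu, int n)
{
    if (cpu->icount < n) {
        return false;
    }
    cpu->icount -= n;
    return true;
}

/*
 * An exception was raised after `executed` instructions of an
 * n-instruction TB completed: credit back the instructions that
 * were charged but never ran.
 */
static void tb_fix_icount_on_exception(ToyCPU *cpu, int n, int executed)
{
    cpu->icount += n - executed;
}
```

Whether `executed` is recovered by retranslation or by saving state before
each instruction is exactly the trade-off above; the sketch only shows what
value has to come out in the end.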

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] libcacard: use the library?

2015-06-18 Thread Paolo Bonzini



On 18/06/2015 10:11, Michael Tokarev wrote:
> 18.06.2015 11:09, Paolo Bonzini wrote:
>> On 17/06/2015 22:15, Michael Tokarev wrote:
>>> I tried autoconf&automake&libtool.  It is a HugeMess, I disliked it.
>>> So I rewrote it as a simple shell script.
>>>
>>> The result of both attempts is available at 
>>> http://www.corpit.ru/mjt/tmp/libcacard/
>>> There are 4 files in there:
>>>
>>>  configure.ac Makefile.am -- auto*shit version, requires bootstrap like
>>>   libtoolize && aclocal && automake --foreign --add-missing && autoconf
>>
>> More like autoreconf -fvi.
> 
> My 10-minute experience with auto*tools didn't go that far :)

You got everything else right, though.  Kudos.

>>>  configure Makefile.in -- my small version based on what qemu ./configure
>>>   currently does.
>>
>> Doesn't have dependency tracking.  That's already a no-no I think.
> 
> Well, it is trivial to add.  For a first cut it works.

And then it will be something else with cross-compilation, or something
else.  Let's just use autotools and call it a day...

Paolo

Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr

2015-06-18 Thread Aurelien Jarno

On 2015-06-18 10:16, Paolo Bonzini wrote:
> 
> 
> On 18/06/2015 07:17, Pavel Dovgaluk wrote:
> >>> > >
> >>> > >  static inline RES_TYPE
> >>> > > -glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
> >>> > > +glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
> >>> > > +  target_ulong ptr,
> >>> > > +  uintptr_t retaddr)
> >> > 
> >> > Would it make sense to call these helper_cpu_ld##USUFFIX##MEMSUFFIX?
> > I don't want to use 'helper' prefix, because helper functions are
> > usually called directly from TB.
> 
> True, but in the end these have the same functionality as helpers, just
> they're indirectly called from other helpers.

Not fully. The idea is that the helpers are non-inline functions
handling the slow path. The cpu_ld##USUFFIX##MEMSUFFIX are inline
functions handling the fast path, and calling the helpers for the slow
path. That allows for example GCC to optimize the fast path when the
cpu_ld functions are used in a loop.
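
As an illustration of that split, here is a small self-contained sketch in C.
It is only a toy: ToyEnv, toy_cpu_ld() and toy_helper_ld() are made-up names,
and the tiny direct-mapped cache merely stands in for a TLB. The inline
function handles the hit case and falls back to the out-of-line function on a
miss.

```c
#include <stdint.h>

#define TOY_CACHE_SIZE 16

/* Hypothetical one-level cache standing in for a TLB; not QEMU's layout. */
typedef struct {
    uint32_t tag[TOY_CACHE_SIZE];
    uint8_t  valid[TOY_CACHE_SIZE];
    uint32_t value[TOY_CACHE_SIZE];
    unsigned slow_calls;            /* how often the slow path was taken */
} ToyEnv;

/* Slow path: kept out of line, like a helper. Fills the cache entry. */
uint32_t toy_helper_ld(ToyEnv *env, uint32_t addr)
{
    unsigned idx = addr % TOY_CACHE_SIZE;

    env->slow_calls++;
    env->tag[idx] = addr;
    env->valid[idx] = 1;
    env->value[idx] = addr * 2u;    /* stand-in for the real memory access */
    return env->value[idx];
}

/* Fast path: inline, so the compiler can optimize it inside a caller's loop. */
static inline uint32_t toy_cpu_ld(ToyEnv *env, uint32_t addr)
{
    unsigned idx = addr % TOY_CACHE_SIZE;

    if (env->valid[idx] && env->tag[idx] == addr) {
        return env->value[idx];
    }
    return toy_helper_ld(env, addr);
}
```

A caller looping over toy_cpu_ld() only pays the function call on misses,
which is the property the naming discussion is about.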

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.

2015-06-18 Thread Paolo Bonzini

On 18/06/2015 09:50, Li, Liang Z wrote:
> Do you have any idea or plan to deal with a failure that happens during
> the postcopy phase?
> 
> Losing the guest is too frightening for a cloud provider. We had a
> discussion with Alibaba; they said that they can't use the postcopy
> feature unless there is a mechanism to get the guest back.

There's no solution to this problem, except for rollback to a previous
snapshot.

To give an idea, an example of an intended usecase for postcopy is
datacenter evacuation in 30 minutes after a tsunami alert.  That's not a
case where you care much about losing guests to network failures.

Why is there no solution?  Let's look at one of the best surveys on
migration,
http://courses.cs.vt.edu/~cs5204/fall05-kafura/Papers/Migration/ProcessMigration.pdf
(warning, 59 pages!):

  [3.2] If only part of the task state is transferred to another node,
  the task can start executing sooner, and the initial migration costs
  are lower.

  [3.4] Fault resilience can be improved in several ways. The impact of
  failures during migration can be reduced by maintaining process state
  on both the source and destination sites until the destination site
  instance is successfully promoted to a regular process and the source
  node is informed about this.

  [3.5] Migration algorithms should avoid linear dependencies on the
  amount of state to be transferred. For example, the eager data
  transfer strategy has costs proportional to the address space size

"Pre"copy means "start copying *before* promoting the destination to be
the primary host" and it has such a linear dependency on the amount of
state to be transferred. "Post"copy means "delay some copying to *after*
promoting the destination to be the primary host".

So we have:

                          Precopy     Postcopy
   3.2 Performance        - (1)       - (2)
   3.4 Fault resilience   +           -
   3.5 Scalability        -           +

  (1) smaller impact, longer freeze time
  (2) larger impact, extremely short freeze time

Postcopy can also limit the length of the non-resilient phase, by
starting with a precopy phase and only switching to postcopy after some
time.  Then you have:

                          Precopy     Hybrid      Postcopy
   3.2 Performance        - (1)       + (3)       - (2)
   3.4 Fault resilience   +           -           --
   3.5 Scalability        -           +           +

  (3) intermediate impact, extremely short freeze time

but there is still going to be a phase where migration is not resilient
to network faults.
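
To illustrate why precopy depends on the amount of state and may never
finish, here is a deliberately simplified model in C. It is an assumption of
mine, not how QEMU's migration code works: toy_precopy_rounds() and all of
its parameters are hypothetical, and the guest is assumed to dirty pages at a
constant rate while each round resends whatever was dirtied during the
previous one.

```c
#include <stdint.h>

/*
 * Toy precopy model (hypothetical, not QEMU's algorithm).
 *
 * total_pages     - guest memory size in pages
 * dirty_rate      - pages the guest dirties per second
 * bandwidth       - pages the link can send per second
 * max_final_pages - pages we can afford to send while the guest is paused
 * max_rounds      - give up after this many copy rounds
 *
 * Returns the number of copy rounds needed before the guest can be
 * paused, or -1 if precopy does not converge within max_rounds.
 */
int toy_precopy_rounds(uint64_t total_pages, uint64_t dirty_rate,
                       uint64_t bandwidth, uint64_t max_final_pages,
                       int max_rounds)
{
    uint64_t remaining = total_pages;
    int round;

    if (bandwidth == 0) {
        return -1;
    }
    for (round = 0; round < max_rounds; round++) {
        if (remaining <= max_final_pages) {
            return round;
        }
        /*
         * Sending `remaining` pages takes remaining / bandwidth seconds;
         * meanwhile the guest dirties dirty_rate pages per second.
         */
        remaining = remaining * dirty_rate / bandwidth;
        if (remaining > total_pages) {
            remaining = total_pages;
        }
    }
    return remaining <= max_final_pages ? max_rounds : -1;
}
```

In this model a return value of -1 is the point where a hybrid scheme would
stop iterating and switch to postcopy, accepting the non-resilient phase.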

Cloud operators can use a combination of precopy and postcopy.  For
example, I would not use postcopy for mass migration when doing
host updates, but it can be used as a last resort before a scheduled
downtime.

For example, say you're doing a rolling update and you want it complete
by next Sunday.  90% of the guests are shut down by the customers or can
be migrated successfully with precopy.  The others do not converge and
their SLA does not let you throttle them to complete precopy migration.

You then tell your customers that either they shut down and restart their
instances before Saturday 8:00 PM, or they might be shut down forcibly.
Then for customers who haven't rebooted you can do
postcopy---you have alerted them that something might go wrong.  So even
though postcopy would not be a first choice, it can still help cloud
operators.

Paolo

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Peter Lieven


On 18.06.2015 at 09:45, Kevin Wolf wrote:

On 18.06.2015 at 09:12, Peter Lieven wrote:

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
 timeout=4999424576) at qemu-timer.c:326
 ts = {tv_sec = 4, tv_nsec = 999424576}
 tvsec = 4
#2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
 at aio-posix.c:231
 node = 0x0
 was_dispatching = false
 ret = 1
 progress = false
#3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0, offset=4292007936,
 qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
 aio_context = 0x563528e0
 co = 0x563888a0
 rwco = {bs = 0x5637eae0, offset = 4292007936,
   qiov = 0x7554f760, is_write = false, ret = 2147483647, flags = 0}
#4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0, sector_num=8382828,
 buf=0x744cc800 "(", nb_sectors=4, is_write=false, flags=0)
 at block.c:2722
 qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size = 2048}
 iov = {iov_base = 0x744cc800, iov_len = 2048}
#5  0x5594b008 in bdrv_read (bs=0x5637eae0, sector_num=8382828,
 buf=0x744cc800 "(", nb_sectors=4) at block.c:2730
No locals.
#6  0x5599acef in blk_read (blk=0x56376820, sector_num=8382828,
 buf=0x744cc800 "(", nb_sectors=4) at block/block-backend.c:404
No locals.
#7  0x55833ed2 in cd_read_sector (s=0x56408f88, lba=2095707,
 buf=0x744cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
 ret = 32767

Here is the problem: The ATAPI emulation uses synchronous blk_read()
instead of the AIO or coroutine interfaces. This means that it keeps
polling for request completion while it holds the BQL until the request
is completed.


I will look at this.



We can (and should) fix that, otherwise the VCPU is blocked while we're
reading from the image, even without a hang. It doesn't fully fix your
problem, though, as bdrv_drain_all() and friends still exist.
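
The difference between the synchronous and the callback-based style can be
sketched with a toy model in C. None of this is the QEMU block layer API:
ToyRequest, toy_poll(), toy_read_sync() and toy_read_async() are hypothetical
names, and "latency" is simply the number of event-loop iterations the fake
backend needs.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void ToyCompletionFn(void *opaque, int ret);

/* Hypothetical in-flight request; not a QEMU structure. */
typedef struct {
    int ticks_left;          /* event-loop iterations until completion */
    bool done;
    ToyCompletionFn *cb;     /* completion callback, may be NULL */
    void *opaque;
} ToyRequest;

/* One event-loop iteration. Returns false if there was nothing to do. */
static bool toy_poll(ToyRequest *req)
{
    if (req->done) {
        return false;
    }
    if (--req->ticks_left <= 0) {
        req->done = true;
        if (req->cb) {
            req->cb(req->opaque, 0);
        }
    }
    return true;
}

/*
 * Synchronous style: the caller spins in the poll loop, keeping any lock
 * it holds, until the request is done. Returns how many iterations the
 * caller was stuck for.
 */
static int toy_read_sync(ToyRequest *req, int latency)
{
    int polls = 0;

    req->ticks_left = latency;
    req->done = false;
    req->cb = NULL;
    req->opaque = NULL;
    while (!req->done) {
        toy_poll(req);
        polls++;
    }
    return polls;
}

/* Asynchronous style: register a callback and return immediately. */
static void toy_read_async(ToyRequest *req, int latency,
                           ToyCompletionFn *cb, void *opaque)
{
    req->ticks_left = latency;
    req->done = false;
    req->cb = cb;
    req->opaque = opaque;
}

/* Example callback: records completion in an int flag. */
static void toy_mark_done(void *opaque, int ret)
{
    *(int *)opaque = (ret == 0) ? 1 : -1;
}
```

With toy_read_sync() the caller is blocked for the whole latency; with
toy_read_async() it returns at once and the main loop's later toy_poll()
calls deliver the result, so other work can run in between.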


Any idea which commands actually call bdrv_drain_all?

Peter

Re: [Qemu-devel] [PATCH v7 0/9] Add limited support of VMware's hyper-call rpc

2015-06-18 Thread Paolo Bonzini



On 18/06/2015 09:58, Michael S. Tsirkin wrote:
> > If I am reading this correctly, I should add
> > 
> > Acked-by: Paolo Bonzini 
> > 
> > to these 4 patches.
> > 
> > Since I have never sent a pull request to QEMU before here is what I
> > think should be in it:
> 
> I'd like to see a version with comments addressed first.

These four patches are unrelated.

Paolo

Re: [Qemu-devel] [PATCH v7 0/9] Add limited support of VMware's hyper-call rpc

2015-06-18 Thread Paolo Bonzini



On 18/06/2015 00:26, Don Slutz wrote:
> 
> The following changes since commit
> f754c3c9cce3c4789733d9068394be4256dfe6a8:
> 
> Merge remote-tracking branch 
> 'remotes/agraf/tags/signed-s390-for-upstream' into staging
> (2015-06-17 12:43:26 +0100)
> 
> are available in the git repository at:
> 
> 
> g...@github.com:dslutz/qemu.git vmware_pull_v7

Almost.  In your .gitconfig change

   url = g...@github.com:dslutz/qemu.git

to

   url = https://github.com/dslutz/qemu.git
   pushurl = g...@github.com:dslutz/qemu.git

It will also be faster for everyday use.

> for you to fetch changes up to
> a9e61af94c4452270521638c6bac11262ff2f2b7:
> 
> MAINTAINERS: add VMware port (2015-06-17 18:21:02 -0400)
> 
> ----------------------------------------------------------------
> Don Slutz (4):
>       vmport: The io memory region needs to be at least a size of 4
>       vmport: Switch to trace
>       vmport: Fix vmport_cmd_ram_size
>       MAINTAINERS: add VMware port
> 
>  MAINTAINERS      |  7 +++
>  hw/misc/vmport.c | 15 +--
>  trace-events     |  4
>  3 files changed, 20 insertions(+), 6 deletions(-)

Yes, this looks good.

> Not clear at all about signing this pull.

Instead of pushing a branch, do

   git tag -s -f for-upstream
   
   git push --tags name-of-github-remote

and git request-pull should just work.

Paolo

Re: [Qemu-devel] [PATCH] Add .dir-locals.el file to configure emacs coding style

2015-06-18 Thread Markus Armbruster

Michael Tokarev  writes:

> So, what is the consensus here?
>
> Everyone who talked wants the emacs mode, but everyone
> offers their own mode.
>
> I'd pick the stroustrup variant suggested by Markus
> since it is the shortest, but while being the shortest, it
> looks a bit "magical".

I don't think it's magical at all.  It uses .dir-locals exactly as
intended.  In fact, it's almost straight from the Emacs manual:

   The '.dir-locals.el' file should hold a specially-constructed list,
which maps major mode names (symbols) to alists (*note
(elisp)Association Lists::).  Each alist entry consists of a variable
name and the directory-local value to assign to that variable, when the
specified major mode is enabled.  Instead of a mode name, you can
specify 'nil', which means that the alist applies to any mode; or you
can specify a subdirectory name (a string), in which case the alist
applies to all files in that subdirectory.

   Here's an example of a '.dir-locals.el' file:

     ((nil . ((indent-tabs-mode . t)
              (fill-column . 80)))
      (c-mode . ((c-file-style . "BSD")
                 (subdirs . nil)))
      ("src/imported"
       . ((nil . ((change-log-default-name
                   . "ChangeLog.local"))))))

This sets 'indent-tabs-mode' and 'fill-column' for any file in the
directory tree, and the indentation style for any C source file.  The
special 'subdirs' element is not a variable, but a special keyword which
indicates that the C mode settings are only to be applied in the current
directory, not in any subdirectories.  Finally, it specifies a different
'ChangeLog' file name for any file in the 'src/imported' subdirectory.

The .dir-locals.el snippet I suggested is a straightforward cherry-pick
from the above:

((c-mode . ((c-file-style . "stroustrup")
            (indent-tabs-mode . nil))))

Here's one that takes better care of tabs:

((nil . ((indent-tabs-mode . nil)))
 (makefile-mode . ((indent-tabs-mode . t)))
 (c-mode . ((c-file-style . "stroustrup"))))

> On the other hand, the variant
> from Peter Maydell (https://wiki.linaro.org/PeterMaydell/QemuEmacsStyle)
> explicitly defines everything.

I'm afraid putting this into the source tree isn't as easy, because it
involves defining a new style, which you're not supposed to do in
.dir-locals.el (it's for directory local variables, not for defining
global constants and calling functions).

So users would have to put the style definition in their .emacs, and its
use in .dir-locals.el.  If we then commit the latter to the repository,
we screw everybody who hasn't added the former to his .emacs.  No go.

We could try something like

((nil . ((indent-tabs-mode . nil)))
 (makefile-mode . ((indent-tabs-mode . t)))
 (c-mode . ((c-file-style . (if (assoc "qemu" c-style-alist)
                                "qemu" "stroustrup")))))

but that triggers the "contains values that may not be safe" prompt, so
it's another no go.

Let's see what Peter's style adds to "stroustrup", to gauge how much
trouble getting it would be worth:

(defconst qemu-c-style
  '(
;; recommend to do indent-tabs-mode separately to cover other major modes
(indent-tabs-mode . nil)
;; same as stroustrup
(c-basic-offset . 4)
;; this is the default, and anyone changing it is nuts
(tab-width . 8)
;; default
(c-comment-only-line-offset . 0)
;; duplicate entry, the one below wins, this one has no effect
(c-hanging-braces-alist . ((substatement-open before after)))
(c-offsets-alist . ((statement-block-intro . +)
(substatement-open . 0)
(label . 0)
(statement-cont . +)
;; up to here same as stroustrup
;; except we don't have (substatement-label . 0)
;; and thus default to (substatement-label . 2)
;;
;; stroustrup has + instead of 0
;; C++ "namespace" blocks
;; do we care to differ from stroustrup?
(innamespace . 0)
;; stroustrup has + instead of 0
;; Brace that opens an in-class inline method
;; do we care to differ from stroustrup?
(inline-open . 0)
))
;; This isn't for indentation, it's for automatically inserting
;; newlines when you type braces in auto-newline minor mode.
(c-hanging-braces-alist .
(
 ;; same as stroustrup
 (brace-list-open)
 ;; only qemu
 ;; First line in an enum or static array list
 ;; suppress auto-newlines there

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Kevin Wolf

On 18.06.2015 at 10:30, Peter Lieven wrote:
> On 18.06.2015 at 09:45, Kevin Wolf wrote:
> >On 18.06.2015 at 09:12, Peter Lieven wrote:
> >>Thread 2 (Thread 0x75550700 (LWP 2636)):
> >>#0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
> >>No symbol table info available.
> >>#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
> >> timeout=4999424576) at qemu-timer.c:326
> >> ts = {tv_sec = 4, tv_nsec = 999424576}
> >> tvsec = 4
> >>#2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
> >> at aio-posix.c:231
> >> node = 0x0
> >> was_dispatching = false
> >> ret = 1
> >> progress = false
> >>#3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0, 
> >>offset=4292007936,
> >> qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
> >> aio_context = 0x563528e0
> >> co = 0x563888a0
> >> rwco = {bs = 0x5637eae0, offset = 4292007936,
> >>   qiov = 0x7554f760, is_write = false, ret = 2147483647, flags 
> >> = 0}
> >>#4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0, sector_num=8382828,
> >> buf=0x744cc800 "(", nb_sectors=4, is_write=false, flags=0)
> >> at block.c:2722
> >> qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size = 2048}
> >> iov = {iov_base = 0x744cc800, iov_len = 2048}
> >>#5  0x5594b008 in bdrv_read (bs=0x5637eae0, sector_num=8382828,
> >> buf=0x744cc800 "(", nb_sectors=4) at block.c:2730
> >>No locals.
> >>#6  0x5599acef in blk_read (blk=0x56376820, sector_num=8382828,
> >> buf=0x744cc800 "(", nb_sectors=4) at block/block-backend.c:404
> >>No locals.
> >>#7  0x55833ed2 in cd_read_sector (s=0x56408f88, lba=2095707,
> >> buf=0x744cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
> >> ret = 32767
> >Here is the problem: The ATAPI emulation uses synchronous blk_read()
> >instead of the AIO or coroutine interfaces. This means that it keeps
> >polling for request completion while it holds the BQL until the request
> >is completed.
> 
> I will look at this.
> 
> >
> >We can (and should) fix that, otherwise the VCPU is blocked while we're
> >reading from the image, even without a hang. It doesn't fully fix your
> >problem, though, as bdrv_drain_all() and friends still exist.
> 
> Any idea which commands actually call bdrv_drain_all?

At least 'stop' and all commands changing the BDS graph (block jobs,
snapshots, commit, etc.). For a full list, I would have to inspect each
command in the code.

The guest can even trigger bdrv_drain_all() by stopping a running DMA
operation.

Kevin

[Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints

2015-06-18 Thread Wen Congyang

Block replication is a very important feature which is used for
continuous checkpoints (for example: COLO).

Usage:
Please refer to docs/block-replication.txt

You can get the patch here:
https://github.com/wencongyang/qemu-colo/commits/block-replication-v6

The other newest COLO patches will be sent soon.

Note: you should apply the following patch first:
http://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg01317.html

TODO:
1. Continuous block replication. It will be started after basic functions
   are accepted.

Change Log:
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
   if there are too many I/O requests.
V4:
1. Introduce a new driver replication to avoid touching nbd and qcow2.
V3:
1. Use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target use the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu(use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (16):
  docs: block replication's description
  allow writing to the backing file
  Allow creating backup jobs when opening BDS
  block: Parse "backing_reference" option to reference existing BDS
  Backup: clear all bitmap when doing block checkpoint
  Don't allow a disk use backing reference target
  Add new block driver interface to connect/disconnect the remote target
  NBD client: implement block driver interfaces to connect/disconnect
    NBD server
  Introduce a new -drive option to control whether to connect to remote
    target
  NBD client: connect to nbd server later
  Add new block driver interfaces to control block replication
  skip nbd_target when starting block replication
  quorum: implement block driver interfaces for block replication
  introduce a new API qemu_opts_absorb_qdict_by_index()
  quorum: allow ignoring child errors
  Implement new driver for block replication

 block.c| 279 +++-
 block/Makefile.objs|   3 +-
 block/backup.c |  13 ++
 block/nbd.c|  69 +--
 block/quorum.c | 162 -
 block/replication.c| 441 +
 blockdev.c |   8 +
 blockjob.c |  10 +
 docs/block-replication.txt | 179 ++
 include/block/block.h  |  10 +
 include/block/block_int.h  |  18 ++
 include/block/blockjob.h   |  12 ++
 include/qemu/option.h  |   2 +
 qapi/block.json|  16 ++
 qemu-options.hx|   4 +
 tests/qemu-iotests/051 |  13 ++
 tests/qemu-iotests/051.out |  13 ++
 util/qemu-option.c |  44 +
 18 files changed, 1266 insertions(+), 30 deletions(-)
 create mode 100644 block/replication.c
 create mode 100644 docs/block-replication.txt

-- 
2.4.3

[Qemu-devel] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target

2015-06-18 Thread Wen Congyang

In some cases, we want to connect/disconnect the remote target when
we need, not in bdrv_open()/bdrv_close().

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block.c   | 24 
 include/block/block.h |  3 +++
 include/block/block_int.h |  3 +++
 3 files changed, 30 insertions(+)

diff --git a/block.c b/block.c
index 0b41af4..59071d4 100644
--- a/block.c
+++ b/block.c
@@ -4400,3 +4400,27 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
 {
 return &bs->stats;
 }
+
+void bdrv_connect(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_connect) {
+drv->bdrv_connect(bs, errp);
+} else if (bs->file) {
+bdrv_connect(bs->file, errp);
+} else {
+error_setg(errp, "this feature or command is not currently supported");
+}
+}
+
+void bdrv_disconnect(BlockDriverState *bs)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_disconnect) {
+drv->bdrv_disconnect(bs);
+} else if (bs->file) {
+bdrv_disconnect(bs->file);
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index 7cdb569..2c2a0cc 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -606,4 +606,7 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
 
 BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
 
+void bdrv_connect(BlockDriverState *bs, Error **errp);
+void bdrv_disconnect(BlockDriverState *bs);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 87fe89a..a3e5372 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -290,6 +290,9 @@ struct BlockDriver {
  */
 int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
 
+void (*bdrv_connect)(BlockDriverState *bs, Error **errp);
+void (*bdrv_disconnect)(BlockDriverState *bs);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
2.4.3

[Qemu-devel] [PATCH COLO-Block v6 01/16] docs: block replication's description

2015-06-18 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: Yang Hongyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 docs/block-replication.txt | 179 +
 1 file changed, 179 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 000..a29f51a
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,179 @@
+Block replication
+
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COarse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied for FT/HA (Fault-tolerance/High Assurance) scenario,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
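+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                       |
+                  |                                      (4)
+                  |                                       V
+                  |                              /-------------\
+                  |      Copy and Forward        |             |
+                  |---------(1)----------+       | Disk Buffer |
+                  |                      |       |             |
+                  |                     (3)      \-------------/
+                  |                 speculative      ^
+                  |                write through    (2)
+                  |                      |           |
+                  V                      V           |
+           +--------------+           +----------------+
+           | Primary Disk |           | Secondary Disk |
+           +--------------+           +----------------+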
+
+1) Primary write requests will be copied and forwarded to Secondary
+   QEMU.
+2) Before Primary write requests are written to Secondary disk, the
+   original sector content will be read from Secondary disk and
+   buffered in the Disk buffer, but it will not overwrite the existing
+   sector content(it could be from either "Secondary Write Requests" or
+   previous COW of "Primary Write Requests") in the Disk buffer.
+3) Primary write requests will be written to Secondary disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+
+         virtio-blk       ||
+             ^            ||                            .----------
+             |            ||                            | Secondary
+        1 Quorum          ||                            '----------
+         /      \         ||
+        /        \        ||
+   Primary      2 filter
+     disk         ^                                                             virtio-blk
+                  |                                                                  ^
+                3 NBD  ------->  3 NBD                                               |
+                client    ||     server                                          2 filter
+                          ||        ^                                                ^
+--------.                 ||        |                                                |
+Primary |                 ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
+--------'                 ||        |          backing        ^       backing
+                          ||        |                         |
+                          ||        |                         |
+                          ||        '-------------------------'
+                          ||           drive-backup sync=none
+
+1) The disk on the pr
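
The four workflow steps in the document above can be modelled with a short,
self-contained C sketch. This is only an illustration, not code from the
series: ToySecondary and the toy_* functions are hypothetical, sectors are
plain ints, and the read path (buffer first, then disk) is my assumption
about how the Secondary VM sees its disk between checkpoints.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_SECTORS 8

/* Hypothetical model of the Secondary side; not the real data structures. */
typedef struct {
    int  disk[TOY_SECTORS];      /* Secondary disk contents */
    int  buffer[TOY_SECTORS];    /* Disk buffer contents */
    bool buffered[TOY_SECTORS];  /* which sectors the Disk buffer holds */
} ToySecondary;

/*
 * Steps 2 and 3: a forwarded Primary write. The original sector is saved
 * in the Disk buffer unless the buffer already holds content for it, then
 * the new data is written through to the Secondary disk.
 */
static void toy_primary_write(ToySecondary *s, int sector, int data)
{
    if (!s->buffered[sector]) {
        s->buffer[sector] = s->disk[sector];
        s->buffered[sector] = true;
    }
    s->disk[sector] = data;
}

/* Step 4: a Secondary write only goes to the buffer and overwrites it. */
static void toy_secondary_write(ToySecondary *s, int sector, int data)
{
    s->buffer[sector] = data;
    s->buffered[sector] = true;
}

/* Assumed read path of the Secondary VM: Disk buffer first, then disk. */
static int toy_secondary_read(const ToySecondary *s, int sector)
{
    return s->buffered[sector] ? s->buffer[sector] : s->disk[sector];
}

/* Checkpoint: the buffered contents are dropped. */
static void toy_checkpoint(ToySecondary *s)
{
    memset(s->buffered, 0, sizeof(s->buffered));
}
```

After toy_checkpoint() the Secondary reads exactly what the Primary wrote,
which matches the statement that both VMs are identical right after a
checkpoint.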

[Qemu-devel] [PATCH COLO-Block v6 04/16] block: Parse "backing_reference" option to reference existing BDS

2015-06-18 Thread Wen Congyang

Usage:
-drive file=xxx,id=Y, \
-drive file=,id=X,backing_reference.drive_id=Y,backing_reference.hidden-disk.*

It will create such backing chain:
           {virtio-blk dev 'Y'}                  {virtio-blk dev 'X'}
                     |                                     |
                     |                                     |
                     v                                     v

[base] <- [mid] <- ( Y ) <------ (hidden target) <------ ( X )

                     v                  ^
                     v                  ^
                     v                  ^
                     v                  ^
                     drive-backup sync=none

X's backing file is hidden-disk, and hidden-disk's backing file is Y.
Disk Y may be opened or reopened in read-write mode, so a block backup
job is automatically created: source is Y and target is hidden disk.
Active disk X, hidden disk, and Y are all on the same AioContext.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block.c| 154 -
 include/block/block.h  |   1 +
 include/block/block_int.h  |   1 +
 tests/qemu-iotests/051 |  13 
 tests/qemu-iotests/051.out |  13 
 5 files changed, 179 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index df4cbce..d1ed227 100644
--- a/block.c
+++ b/block.c
@@ -1245,6 +1245,119 @@ free_exit:
 return ret;
 }
 
+static void backing_reference_completed(void *opaque, int ret)
+{
+BlockDriverState *hidden_disk = opaque;
+
+assert(!hidden_disk->backing_reference);
+}
+
+static int bdrv_open_backing_reference_file(BlockDriverState *bs,
+QDict *options, Error **errp)
+{
+const char *backing_name;
+QDict *hidden_disk_options = NULL;
+BlockDriverState *backing_hd, *hidden_disk;
+BlockBackend *backing_blk;
+AioContext *aio_context;
+Error *local_err = NULL;
+int ret = 0;
+
+backing_name = qdict_get_try_str(options, "drive_id");
+if (!backing_name) {
+error_setg(errp, "Backing reference needs option drive_id");
+ret = -EINVAL;
+goto free_exit;
+}
+qdict_del(options, "drive_id");
+
+qdict_extract_subqdict(options, &hidden_disk_options, "hidden-disk.");
+if (!qdict_size(hidden_disk_options)) {
+error_setg(errp, "Backing reference needs option hidden-disk.*");
+ret = -EINVAL;
+goto free_exit;
+}
+
+if (qdict_size(options)) {
+const QDictEntry *entry = qdict_first(options);
+error_setg(errp, "Backing reference used by '%s' doesn't support "
+   "the option '%s'", bdrv_get_device_name(bs), entry->key);
+ret = -EINVAL;
+goto free_exit;
+}
+
+backing_blk = blk_by_name(backing_name);
+if (!backing_blk) {
+error_setg(errp, "Device '%s' not found", backing_name);
+ret = -ENOENT;
+goto free_exit;
+}
+
+backing_hd = blk_bs(backing_blk);
+/* Backing reference itself? */
+if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) {
+error_setg(errp, "Backing reference itself");
+ret = -EINVAL;
+goto free_exit;
+}
+
+if (bdrv_op_is_blocked(backing_hd, BLOCK_OP_TYPE_BACKING_REFERENCE,
+   errp)) {
+ret = -EBUSY;
+goto free_exit;
+}
+
+/* hidden-disk is bs's backing file */
+ret = bdrv_open_backing_file(bs, hidden_disk_options, errp);
+hidden_disk_options = NULL;
+if (ret < 0) {
+goto free_exit;
+}
+
+hidden_disk = bs->backing_hd;
+if (!hidden_disk->drv || !hidden_disk->drv->supports_backing) {
+ret = -EINVAL;
+error_setg(errp, "Hidden disk's driver doesn't support backing files");
+goto free_exit;
+}
+
+bdrv_set_backing_hd(hidden_disk, backing_hd);
+bdrv_ref(backing_hd);
+
+/*
+ * backing hd may be opened or reopened in read-write mode, so we
+ * should backup backing hd to hidden disk
+ */
+bdrv_op_unblock(hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
+bs->backing_blocker);
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+hidden_disk->backing_blocker);
+
+bdrv_ref(hidden_disk);
+
+aio_context = bdrv_get_aio_context(bs);
+aio_context_acquire(aio_context);
+bdrv_set_aio_context(backing_hd, aio_context);
+backup_start(backing_hd, hidden_disk, 0, MIRROR_SYNC_MODE_NONE, NULL,
+ BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
+ backing_reference_completed, hidden_disk, &local_err);
+aio_context_release(aio_context);
+if (local_err) {
+error_propagate(errp

[Qemu-devel] [PATCH COLO-Block v6 09/16] Introduce a new -drive option to control whether to connect to remote target

2015-06-18 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 blockdev.c| 8 
 include/block/block.h | 1 +
 qemu-options.hx   | 4 
 3 files changed, 13 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 1cd1b79..07b0477 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -431,6 +431,10 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 qdict_put(bs_opts, "driver", qstring_from_str(buf));
 }
 
+if (qemu_opt_get_bool(opts, "no-connect", false)) {
+bdrv_flags |= BDRV_O_NO_CONNECT;
+}
+
 /* disk I/O throttling */
 memset(&cfg, 0, sizeof(cfg));
 cfg.buckets[THROTTLE_BPS_TOTAL].avg =
@@ -3214,6 +3218,10 @@ QemuOptsList qemu_common_drive_opts = {
 .name = "detect-zeroes",
 .type = QEMU_OPT_STRING,
 .help = "try to optimize zero writes (off, on, unmap)",
+},{
+.name = "no-connect",
+.type = QEMU_OPT_BOOL,
+.help = "do not connect to the remote target when opening the drive"
 },
 { /* end of list */ }
 },
diff --git a/include/block/block.h b/include/block/block.h
index 2c2a0cc..4b3a2b9 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -88,6 +88,7 @@ typedef struct HDGeometry {
 #define BDRV_O_PROTOCOL    0x8000  /* if no block driver is explicitly given:
   select an appropriate protocol driver,
   ignoring the format layer */
+#define BDRV_O_NO_CONNECT  0x1 /* do not connect to remote target */
 
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 5438f98..7bdd7b7 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -469,6 +469,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
 "   [[,iops_max=im]|[[,iops_rd_max=irm][,iops_wr_max=iwm]]]\n"
 "   [[,iops_size=is]]\n"
 "   [[,group=g]]\n"
+"   [,no-connect=on|off]\n"
 "use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
@@ -530,6 +531,9 @@ file sectors into the image file.
 conversion of plain zero writes by the OS to driver specific optimized
 zero write commands. You may even choose "unmap" if @var{discard} is set
 to "unmap" to allow a zero write to be converted to an UNMAP operation.
+@item no-connect=@var{no-connect}
+@var{no-connect} is "on" or "off", and controls whether QEMU skips connecting
+to the remote target when opening the drive. The default value is "off".
 @end table
 
 By default, the @option{cache=writeback} mode is used. It will report data
-- 
2.4.3

[Qemu-devel] [PATCH COLO-Block v6 03/16] Allow creating backup jobs when opening BDS

2015-06-18 Thread Wen Congyang

When opening a BDS, we need to create backup jobs for
image fleecing, so build backup.o as part of block-obj-y.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Cc: Jeff Cody 
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index c34fd7c..f068666 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,10 +22,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
 iscsi.o-libs   := $(LIBISCSI_LIBS)
-- 
2.4.3

[Qemu-devel] [PATCH COLO-Block v6 02/16] allow writing to the backing file

2015-06-18 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block.c | 40 +++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 0ffb855..df4cbce 100644
--- a/block.c
+++ b/block.c
@@ -745,6 +745,15 @@ static const BdrvChildRole child_backing = {
 .inherit_flags = bdrv_backing_flags,
 };
 
+static int bdrv_backing_rw_flags(int flags)
+{
+return bdrv_backing_flags(flags) | BDRV_O_RDWR;
+}
+
+static const BdrvChildRole child_backing_rw = {
+.inherit_flags = bdrv_backing_rw_flags,
+};
+
 static int bdrv_open_flags(BlockDriverState *bs, int flags)
 {
 int open_flags = flags | BDRV_O_CACHE_WB;
@@ -1131,6 +1140,20 @@ out:
 bdrv_refresh_limits(bs, NULL);
 }
 
+#define ALLOW_WRITE_BACKING_FILE "allow-write-backing-file"
+static QemuOptsList backing_file_opts = {
+.name = "backing_file",
+.head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head),
+.desc = {
+{
+.name = ALLOW_WRITE_BACKING_FILE,
+.type = QEMU_OPT_BOOL,
+.help = "allow write to backing file",
+},
+{ /* end of list */ }
+},
+};
+
 /*
  * Opens the backing file for a BlockDriverState if not yet open
  *
@@ -1145,6 +1168,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
 int ret = 0;
 BlockDriverState *backing_hd;
 Error *local_err = NULL;
+QemuOpts *opts = NULL;
+bool child_rw = false;
 
 if (bs->backing_hd != NULL) {
 QDECREF(options);
@@ -1157,6 +1182,18 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
 }
 
 bs->open_flags &= ~BDRV_O_NO_BACKING;
+
+opts = qemu_opts_create(&backing_file_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+ret = -EINVAL;
+error_propagate(errp, local_err);
+QDECREF(options);
+goto free_exit;
+}
+child_rw = qemu_opt_get_bool(opts, ALLOW_WRITE_BACKING_FILE, false);
+qemu_opts_del(opts);
+
 if (qdict_haskey(options, "file.filename")) {
 backing_filename[0] = '\0';
 } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
@@ -1189,7 +1226,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
 assert(bs->backing_hd == NULL);
 ret = bdrv_open_inherit(&backing_hd,
 *backing_filename ? backing_filename : NULL,
-NULL, options, 0, bs, &child_backing,
+NULL, options, 0, bs,
+child_rw ? &child_backing_rw : &child_backing,
 NULL, &local_err);
 if (ret < 0) {
 bdrv_unref(backing_hd);
-- 
2.4.3
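
For readers following the flag derivation in this patch, here is a minimal
sketch of the idea. It assumes the default backing role masks out write
access before the writable role adds it back. The SKETCH_* flag values and
the sketch_* function names are hypothetical and are not QEMU's.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration; these are not QEMU's values. */
#define SKETCH_O_RDWR     0x2
#define SKETCH_O_SNAPSHOT 0x8

/* Assumed default role: a backing file is opened without write access. */
static int sketch_backing_flags(int parent_flags)
{
    return parent_flags & ~(SKETCH_O_RDWR | SKETCH_O_SNAPSHOT);
}

/* Writable role: reuse the default derivation, then force read-write. */
static int sketch_backing_rw_flags(int parent_flags)
{
    return sketch_backing_flags(parent_flags) | SKETCH_O_RDWR;
}

/* Select the derivation the way the "child_rw ? ... : ..." expression does. */
static int sketch_child_flags(int parent_flags, int child_rw)
{
    assert(child_rw == 0 || child_rw == 1);
    return child_rw ? sketch_backing_rw_flags(parent_flags)
                    : sketch_backing_flags(parent_flags);
}
```

With child_rw set, the child ends up writable even when the parent flags
did not request write access, which is the point of the new option.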

[Qemu-devel] [PATCH COLO-Block v6 11/16] Add new block driver interfaces to control block replication

2015-06-18 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Cc: Luiz Capitulino 
Cc: Michael Roth 
Reviewed-by: Paolo Bonzini 
---
 block.c   | 40 
 include/block/block.h |  5 +
 include/block/block_int.h | 14 ++
 qapi/block.json   | 16 
 4 files changed, 75 insertions(+)

diff --git a/block.c b/block.c
index 59071d4..06222bf 100644
--- a/block.c
+++ b/block.c
@@ -4424,3 +4424,43 @@ void bdrv_disconnect(BlockDriverState *bs)
 bdrv_disconnect(bs->file);
 }
 }
+
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_start_replication) {
+drv->bdrv_start_replication(bs, mode, errp);
+} else if (bs->file) {
+bdrv_start_replication(bs->file, mode, errp);
+} else {
+error_setg(errp, "this feature or command is not currently supported");
+}
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_do_checkpoint) {
+drv->bdrv_do_checkpoint(bs, errp);
+} else if (bs->file) {
+bdrv_do_checkpoint(bs->file, errp);
+} else {
+error_setg(errp, "this feature or command is not currently supported");
+}
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
+{
+BlockDriver *drv = bs->drv;
+
+if (drv && drv->bdrv_stop_replication) {
+drv->bdrv_stop_replication(bs, failover, errp);
+} else if (bs->file) {
+bdrv_stop_replication(bs->file, failover, errp);
+} else {
+error_setg(errp, "this feature or command is not currently supported");
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index 4b3a2b9..573d39f 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -610,4 +610,9 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
 void bdrv_connect(BlockDriverState *bs, Error **errp);
 void bdrv_disconnect(BlockDriverState *bs);
 
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a3e5372..27ff3da 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -293,6 +293,20 @@ struct BlockDriver {
 void (*bdrv_connect)(BlockDriverState *bs, Error **errp);
 void (*bdrv_disconnect)(BlockDriverState *bs);
 
+void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode,
+   Error **errp);
+/* Drop Disk buffer when doing checkpoint. */
+void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+/*
+ * After failover, we should flush Disk buffer into secondary disk
+ * and stop block replication.
+ *
+ * If the guest is shut down, we should drop the Disk buffer and stop
+ * block replication.
+ */
+void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover,
+  Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block.json b/qapi/block.json
index aad645c..04dc4c2 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -40,6 +40,22 @@
   'data': ['auto', 'none', 'lba', 'large', 'rechs']}
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @unprotected: Replication is not started, or failover has completed.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.4
+##
+{ 'enum' : 'ReplicationMode',
+  'data' : ['unprotected', 'primary', 'secondary']}
+
+##
 # @BlockdevSnapshotInternal
 #
 # @device: the name of the device to generate the snapshot from
-- 
2.4.3
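
The three bdrv_*_replication() wrappers in this patch share one dispatch
pattern: use the driver callback if present, otherwise retry on the lower
layer (bs->file), otherwise report that the feature is unsupported. A
minimal sketch of that pattern follows. SketchNode and the sketch_*
functions are hypothetical stand-ins, not QEMU types, and the -ENOTSUP
return replaces the error_setg() call used in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a BlockDriverState with one optional callback. */
typedef struct SketchNode SketchNode;
struct SketchNode {
    int (*start)(SketchNode *node);  /* optional driver callback, may be NULL */
    SketchNode *file;                /* next lower layer, or NULL */
    int started;                     /* set by the example callback */
};

/* Example callback: records that replication was started on this node. */
static int sketch_mark_started(SketchNode *node)
{
    assert(node != NULL);
    node->started = 1;
    return 0;
}

/*
 * Walk down the chain until a node implements the callback.
 * Returns the callback's result, or -ENOTSUP if no layer implements it.
 */
static int sketch_start(SketchNode *node)
{
    while (node != NULL) {
        if (node->start != NULL) {
            return node->start(node);
        }
        node = node->file;
    }
    return -ENOTSUP;
}
```

The loop is equivalent to the recursion in the patch; only the first layer
that implements the callback is invoked.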

[Qemu-devel] [PATCH COLO-Block v6 05/16] Backup: clear all bitmap when doing block checkpoint

2015-06-18 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Cc: Jeff Cody 
---
 block/backup.c   | 13 +
 blockjob.c   | 10 ++
 include/block/blockjob.h | 12 
 3 files changed, 35 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index d3f648d..d3d8ba7 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -210,11 +210,24 @@ static void backup_iostatus_reset(BlockJob *job)
 bdrv_iostatus_reset(s->target);
 }
 
+static void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+error_setg(errp, "this feature or command is not currently supported");
+return;
+}
+
+hbitmap_reset_all(backup_job->bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
 .instance_size  = sizeof(BackupBlockJob),
 .job_type   = BLOCK_JOB_TYPE_BACKUP,
 .set_speed  = backup_set_speed,
 .iostatus_reset = backup_iostatus_reset,
+.do_checkpoint  = backup_do_checkpoint,
 };
 
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
diff --git a/blockjob.c b/blockjob.c
index 2755465..9d2128a 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -399,3 +399,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
 
 qemu_bh_schedule(data->bh);
 }
+
+void block_job_do_checkpoint(BlockJob *job, Error **errp)
+{
+if (!job->driver->do_checkpoint) {
+error_setg(errp, "this feature or command is not currently supported");
+return;
+}
+
+job->driver->do_checkpoint(job, errp);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 57d8ef1..b832dc3 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -50,6 +50,9 @@ typedef struct BlockJobDriver {
  * manually.
  */
 void (*complete)(BlockJob *job, Error **errp);
+
+/** Optional callback for job types that support checkpoint. */
+void (*do_checkpoint)(BlockJob *job, Error **errp);
 } BlockJobDriver;
 
 /**
@@ -348,4 +351,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
   BlockJobDeferToMainLoopFn *fn,
   void *opaque);
 
+/**
+ * block_job_do_checkpoint:
+ * @job: The job.
+ * @errp: Error object.
+ *
+ * Do block checkpoint on the specified job.
+ */
+void block_job_do_checkpoint(BlockJob *job, Error **errp);
+
 #endif
-- 
2.4.3

[Qemu-devel] [PATCH COLO-Block v6 06/16] Don't allow a disk use backing reference target

2015-06-18 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/block.c b/block.c
index d1ed227..0b41af4 100644
--- a/block.c
+++ b/block.c
@@ -1294,6 +1294,14 @@ static int bdrv_open_backing_reference_file(BlockDriverState *bs,
 }
 
 backing_hd = blk_bs(backing_blk);
+/* Don't allow a disk to use the backing reference target */
+ret = blk_attach_dev(backing_hd->blk, bs);
+if (ret < 0) {
+error_setg(errp, "backing_hd %s is used by the other device model",
+   backing_name);
+goto free_exit;
+}
+
 /* Backing reference itself? */
 if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) {
 error_setg(errp, "Backing reference itself");
@@ -2037,6 +2045,7 @@ void bdrv_close(BlockDriverState *bs)
 if (backing_hd->backing_hd->job) {
 block_job_cancel(backing_hd->backing_hd->job);
 }
+blk_detach_dev(backing_hd->backing_hd->blk, bs);
 bdrv_set_backing_hd(backing_hd, NULL);
 bdrv_unref(backing_hd->backing_hd);
 }
-- 
2.4.3

[Qemu-devel] [PATCH COLO-Block v6 13/16] quorum: implement block driver interfaces for block replication

2015-06-18 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block/quorum.c | 78 ++
 1 file changed, 78 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 77e55b2..01cfac0 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -82,6 +82,8 @@ typedef struct BDRVQuorumState {
 */
 
 QuorumReadPattern read_pattern;
+
+int replication_index; /* store which child supports block replication */
 } BDRVQuorumState;
 
 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -945,6 +947,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 }
 
 g_free(opened);
+s->replication_index = -1;
 goto exit;
 
 close_exit:
@@ -1032,6 +1035,77 @@ static void quorum_refresh_filename(BlockDriverState *bs)
 bs->full_open_options = opts;
 }
 
+static void quorum_start_replication(BlockDriverState *bs, ReplicationMode mode,
+ Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+int count = 0, i, index;
+Error *local_err = NULL;
+
+/*
+ * TODO: support REPLICATION_MODE_SECONDARY if we allow secondary
+ * QEMU becoming primary QEMU.
+ */
+if (mode != REPLICATION_MODE_PRIMARY) {
+error_setg(errp, "Invalid parameter 'mode'");
+return;
+}
+
+if (s->read_pattern != QUORUM_READ_PATTERN_FIFO) {
+error_setg(errp, "Invalid parameter 'read pattern'");
+return;
+}
+
+for (i = 0; i < s->num_children; i++) {
+bdrv_start_replication(s->bs[i], mode, &local_err);
+if (local_err) {
+error_free(local_err);
+local_err = NULL;
+} else {
+count++;
+index = i;
+}
+}
+
+if (count == 0) {
+/* No child supports block replication */
+error_setg(errp, "this feature or command is not currently supported");
+} else if (count > 1) {
+for (i = 0; i < s->num_children; i++) {
+bdrv_stop_replication(s->bs[i], false, NULL);
+}
+error_setg(errp, "too many children support block replication");
+} else {
+s->replication_index = index;
+}
+}
+
+static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+
+if (s->replication_index < 0) {
+error_setg(errp, "Block replication is not started");
+return;
+}
+
+bdrv_do_checkpoint(s->bs[s->replication_index], errp);
+}
+
+static void quorum_stop_replication(BlockDriverState *bs, bool failover,
+Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+
+if (s->replication_index < 0) {
+error_setg(errp, "Block replication is not started");
+return;
+}
+
+bdrv_stop_replication(s->bs[s->replication_index], failover, errp);
+s->replication_index = -1;
+}
+
 static BlockDriver bdrv_quorum = {
 .format_name= "quorum",
 .protocol_name  = "quorum",
@@ -1055,6 +1129,10 @@ static BlockDriver bdrv_quorum = {
 
 .is_filter  = true,
 .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+
+.bdrv_start_replication = quorum_start_replication,
+.bdrv_do_checkpoint = quorum_do_checkpoint,
+.bdrv_stop_replication  = quorum_stop_replication,
 };
 
 static void bdrv_quorum_init(void)
-- 
2.4.3
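
quorum_start_replication() accepts the configuration only when exactly one
child supports block replication and remembers that child's index. The
selection rule can be sketched on its own as below. The function name, the
bool array input and the error codes are assumptions made for illustration;
the patch probes each child with bdrv_start_replication() instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the index of the single child that supports replication.
 * Returns -ENOTSUP if no child supports it and -EINVAL if more than
 * one child does.
 */
static int sketch_pick_replication_child(const bool *supports, int num_children)
{
    int count = 0;
    int index = -1;
    int i;

    assert(supports != NULL || num_children == 0);

    for (i = 0; i < num_children; i++) {
        if (supports[i]) {
            count++;
            index = i;
        }
    }

    if (count == 0) {
        return -ENOTSUP;   /* no child supports block replication */
    }
    if (count > 1) {
        return -EINVAL;    /* too many children support block replication */
    }
    return index;
}
```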

[Qemu-devel] [PATCH COLO-Block v6 15/16] quorum: allow ignoring child errors

2015-06-18 Thread Wen Congyang

If a child is not ready, read/write/getlength/flush will
return -errno. This is not a critical error and can be ignored:
1. read/write:
   Do not report the error event.
2. getlength:
   Ignore it. If getlength fails on all children and every
   failure is ignored, return -EIO.
3. flush:
   Ignore it. If flush fails on all children and every
   failure is ignored, return 0.

Usage: children.x.ignore-errors=true
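
The getlength rule described above can be sketched as follows. This is a
simplified model that follows the commit message (failing children with
ignore-errors set are skipped); the hunk below instead skips such children
when they report a length of 0. SketchChild and sketch_getlength are
hypothetical names, not QEMU types or functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one quorum child. */
typedef struct SketchChild {
    int64_t length;      /* >= 0 is a length, < 0 is -errno */
    bool ignore_errors;  /* models children.x.ignore-errors=true */
} SketchChild;

/*
 * Return the common length of all children that report one.
 * A failing child is skipped when ignore_errors is set; otherwise its
 * error is returned. If every child is skipped, or two reported lengths
 * disagree, return -EIO.
 */
static int64_t sketch_getlength(const SketchChild *children, size_t n)
{
    int64_t result = -EIO;
    bool have_result = false;
    size_t i;

    assert(children != NULL || n == 0);

    for (i = 0; i < n; i++) {
        int64_t value = children[i].length;

        if (value < 0) {
            if (children[i].ignore_errors) {
                continue;          /* tolerated failure */
            }
            return value;          /* hard failure */
        }
        if (!have_result) {
            result = value;        /* first usable length */
            have_result = true;
            continue;
        }
        if (value != result) {
            return -EIO;           /* children disagree */
        }
    }
    return result;
}
```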

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block/quorum.c | 84 +-
 1 file changed, 77 insertions(+), 7 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 01cfac0..c5dbb69 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -30,6 +30,7 @@
 #define QUORUM_OPT_BLKVERIFY  "blkverify"
 #define QUORUM_OPT_REWRITE    "rewrite-corrupted"
 #define QUORUM_OPT_READ_PATTERN   "read-pattern"
+#define QUORUM_CHILDREN_OPT_IGNORE_ERRORS   "ignore-errors"
 
 /* This union holds a vote hash value */
 typedef union QuorumVoteValue {
@@ -65,6 +66,7 @@ typedef struct QuorumVotes {
 /* the following structure holds the state of one quorum instance */
 typedef struct BDRVQuorumState {
 BlockDriverState **bs; /* children BlockDriverStates */
+bool *ignore_errors;   /* ignore children's error? */
 int num_children;  /* children count */
 int threshold; /* if less than threshold children reads gave the
 * same result a quorum error occurs.
@@ -99,6 +101,7 @@ typedef struct QuorumChildRequest {
 uint8_t *buf;
 int ret;
 QuorumAIOCB *parent;
+int index;
 } QuorumChildRequest;
 
 /* Quorum will use the following structure to track progress of each read/write
@@ -211,6 +214,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
 acb->qcrs[i].buf = NULL;
 acb->qcrs[i].ret = 0;
 acb->qcrs[i].parent = acb;
+acb->qcrs[i].index = i;
 }
 
 return acb;
@@ -304,7 +308,7 @@ static void quorum_aio_cb(void *opaque, int ret)
 acb->count++;
 if (ret == 0) {
 acb->success_count++;
-} else {
+} else if (!s->ignore_errors[sacb->index]) {
 quorum_report_bad(acb, sacb->aiocb->bs->node_name, ret);
 }
 assert(acb->count <= s->num_children);
@@ -719,19 +723,31 @@ static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
 static int64_t quorum_getlength(BlockDriverState *bs)
 {
 BDRVQuorumState *s = bs->opaque;
-int64_t result;
+int64_t result = -EIO;
 int i;
 
 /* check that all file have the same length */
-result = bdrv_getlength(s->bs[0]);
-if (result < 0) {
-return result;
-}
-for (i = 1; i < s->num_children; i++) {
+for (i = 0; i < s->num_children; i++) {
 int64_t value = bdrv_getlength(s->bs[i]);
+
 if (value < 0) {
 return value;
 }
+
+if (value == 0 && s->ignore_errors[i]) {
+/*
+ * If the child is not ready, it cannot return -errno,
+ * otherwise refresh_total_sectors() will fail when
+ * we open the child.
+ */
+continue;
+}
+
+if (result == -EIO) {
+result = value;
+continue;
+}
+
 if (value != result) {
 return -EIO;
 }
@@ -769,6 +785,9 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
 
 for (i = 0; i < s->num_children; i++) {
 result = bdrv_co_flush(s->bs[i]);
+if (result < 0 && s->ignore_errors[i]) {
+result = 0;
+}
 result_value.l = result;
 quorum_count_vote(&error_votes, &result_value, i);
 }
@@ -843,6 +862,19 @@ static QemuOptsList quorum_runtime_opts = {
 },
 };
 
+static QemuOptsList quorum_children_common_opts = {
+.name = "quorum children",
+.head = QTAILQ_HEAD_INITIALIZER(quorum_children_common_opts.head),
+.desc = {
+{
+.name = QUORUM_CHILDREN_OPT_IGNORE_ERRORS,
+.type = QEMU_OPT_BOOL,
+.help = "ignore child I/O error",
+},
+{ /* end of list */ }
+},
+};
+
 static int parse_read_pattern(const char *opt)
 {
 int i;
@@ -861,6 +893,37 @@ static int parse_read_pattern(const char *opt)
 return -EINVAL;
 }
 
+static int parse_children_options(BDRVQuorumState *s, QDict *options,
+  const char *indexstr, int index,
+  Error **errp)
+{
+QemuOpts *children_opts = NULL;
+Error *local_err = NULL;
+int ret = 0;
+bool value;
+
+children_opts = qemu_opts_create(&quorum_children_common_opts, NULL, 0,
+ &error_abort);
+qemu_opts_absorb_qdict_by_index(children_opts, options, indexstr,
+&local_err);
+if (local_err) {
+ret = -EINVAL;
+goto out;
+}
+
+value = qemu_opt_get_bool(children

[Qemu-devel] [PATCH COLO-Block v6 10/16] NBD client: connect to nbd server later

2015-06-18 Thread Wen Congyang

The secondary QEMU starts later than the primary QEMU, so we
cannot connect to the NBD server in bdrv_open(). Introduce a new
open flag to control whether the connection is made at open time.
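
A minimal sketch of the deferred-connect idea, assuming the caller connects
explicitly at a later point when the flag is set. SKETCH_O_NO_CONNECT,
SketchClient and the sketch_* functions are hypothetical; the flag value is
arbitrary and is not taken from block.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical open flag; the value is arbitrary for this sketch. */
#define SKETCH_O_NO_CONNECT 0x1

/* Hypothetical stand-in for the NBD client state. */
typedef struct SketchClient {
    bool connected;
    int connect_calls;
} SketchClient;

/* Stands in for the connect step; a real client would open a socket here. */
static void sketch_connect(SketchClient *c)
{
    assert(c != NULL);
    c->connect_calls++;
    c->connected = true;
}

/* Open the client, connecting immediately unless the flag defers it. */
static int sketch_open(SketchClient *c, int flags)
{
    assert(c != NULL);
    c->connected = false;
    c->connect_calls = 0;

    if (!(flags & SKETCH_O_NO_CONNECT)) {
        sketch_connect(c);
    }
    return 0;
}
```

Opening with the flag set leaves the client unconnected until
sketch_connect() is called, which models connecting once the server exists.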

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block/nbd.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index bc9477a..4964cf8 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -298,11 +298,13 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
 return -EINVAL;
 }
 
-nbd_connect_server(bs, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-g_free(s->export);
-return -EINVAL;
+if (!(flags & BDRV_O_NO_CONNECT)) {
+nbd_connect_server(bs, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+g_free(s->export);
+return -EINVAL;
+}
 }
 
 return 0;
-- 
2.4.3

[Qemu-devel] [PATCH COLO-Block v6 08/16] NBD client: implement block driver interfaces to connect/disconnect NBD server

2015-06-18 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block/nbd.c | 67 -
 1 file changed, 49 insertions(+), 18 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 2176186..bc9477a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -44,6 +44,8 @@
 typedef struct BDRVNBDState {
 NbdClientSession client;
 QemuOpts *socket_opts;
+char *export;
+bool connected;
 } BDRVNBDState;
 
 static int nbd_parse_uri(const char *filename, QDict *options)
@@ -254,34 +256,56 @@ static int nbd_establish_connection(BlockDriverState *bs, Error **errp)
 return sock;
 }
 
-static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
-Error **errp)
+static void nbd_connect_server(BlockDriverState *bs, Error **errp)
 {
 BDRVNBDState *s = bs->opaque;
-char *export = NULL;
-int result, sock;
-Error *local_err = NULL;
-
-/* Pop the config into our state object. Exit if invalid. */
-nbd_config(s, options, &export, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-return -EINVAL;
-}
+int sock;
 
 /* establish TCP connection, return error if it fails
  * TODO: Configurable retry-until-timeout behaviour.
  */
 sock = nbd_establish_connection(bs, errp);
 if (sock < 0) {
-g_free(export);
-return sock;
+return;
 }
 
 /* NBD handshake */
-result = nbd_client_init(bs, sock, export, errp);
-g_free(export);
-return result;
+nbd_client_init(bs, sock, s->export, errp);
+
+s->connected = true;
+}
+
+static void nbd_disconnect_server(BlockDriverState *bs)
+{
+BDRVNBDState *s = bs->opaque;
+
+if (s->connected) {
+nbd_client_close(bs);
+s->connected = false;
+}
+}
+
+static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
+Error **errp)
+{
+BDRVNBDState *s = bs->opaque;
+Error *local_err = NULL;
+
+/* Pop the config into our state object. Exit if invalid. */
+nbd_config(s, options, &s->export, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return -EINVAL;
+}
+
+nbd_connect_server(bs, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+g_free(s->export);
+return -EINVAL;
+}
+
+return 0;
 }
 
 static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
@@ -318,7 +342,8 @@ static void nbd_close(BlockDriverState *bs)
 BDRVNBDState *s = bs->opaque;
 
 qemu_opts_del(s->socket_opts);
-nbd_client_close(bs);
+nbd_disconnect_server(bs);
+g_free(s->export);
 }
 
 static int64_t nbd_getlength(BlockDriverState *bs)
@@ -400,6 +425,8 @@ static BlockDriver bdrv_nbd = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_connect   = nbd_connect_server,
+.bdrv_disconnect= nbd_disconnect_server,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -418,6 +445,8 @@ static BlockDriver bdrv_nbd_tcp = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_connect   = nbd_connect_server,
+.bdrv_disconnect= nbd_disconnect_server,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -436,6 +465,8 @@ static BlockDriver bdrv_nbd_unix = {
 .bdrv_detach_aio_context= nbd_detach_aio_context,
 .bdrv_attach_aio_context= nbd_attach_aio_context,
 .bdrv_refresh_filename  = nbd_refresh_filename,
+.bdrv_connect   = nbd_connect_server,
+.bdrv_disconnect= nbd_disconnect_server,
 };
 
 static void bdrv_nbd_init(void)
-- 
2.4.3

[Qemu-devel] [PATCH COLO-Block v6 12/16] skip nbd_target when starting block replication

2015-06-18 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/block.c b/block.c
index 06222bf..2108d02 100644
--- a/block.c
+++ b/block.c
@@ -4430,6 +4430,10 @@ void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
 {
 BlockDriver *drv = bs->drv;
 
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+return;
+}
+
 if (drv && drv->bdrv_start_replication) {
 drv->bdrv_start_replication(bs, mode, errp);
 } else if (bs->file) {
@@ -4443,6 +4447,10 @@ void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
 {
 BlockDriver *drv = bs->drv;
 
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+return;
+}
+
 if (drv && drv->bdrv_do_checkpoint) {
 drv->bdrv_do_checkpoint(bs, errp);
 } else if (bs->file) {
@@ -4456,6 +4464,10 @@ void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
 {
 BlockDriver *drv = bs->drv;
 
+if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+return;
+}
+
 if (drv && drv->bdrv_stop_replication) {
 drv->bdrv_stop_replication(bs, failover, errp);
 } else if (bs->file) {
-- 
2.4.3

[Qemu-devel] [PATCH COLO-Block v6 14/16] introduce a new API qemu_opts_absorb_qdict_by_index()

2015-06-18 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 include/qemu/option.h |  2 ++
 util/qemu-option.c| 44 
 2 files changed, 46 insertions(+)

diff --git a/include/qemu/option.h b/include/qemu/option.h
index ac0e43b..f868893 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -126,6 +126,8 @@ QemuOpts *qemu_opts_from_qdict(QemuOptsList *list, const QDict *qdict,
Error **errp);
 QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict);
 void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp);
+void qemu_opts_absorb_qdict_by_index(QemuOpts *opts, QDict *qdict,
+ const char *index, Error **errp);
 
 typedef int (*qemu_opts_loopfunc)(void *opaque, QemuOpts *opts, Error **errp);
 int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func,
diff --git a/util/qemu-option.c b/util/qemu-option.c
index 840f5f7..0c8e898 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -1006,6 +1006,50 @@ void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp)
 }
 
 /*
+ * Adds all QDict entries whose key starts with "%index." to the QemuOpts if
+ * they can be added, and removes them from the QDict. When this function
+ * returns, the QDict contains only those entries that couldn't be added to
+ * the QemuOpts.
+ */
+void qemu_opts_absorb_qdict_by_index(QemuOpts *opts, QDict *qdict,
+ const char *index, Error **errp)
+{
+const QDictEntry *entry, *next;
+const char *key;
+int len = strlen(index);
+
+entry = qdict_first(qdict);
+
+while (entry != NULL) {
+Error *local_err = NULL;
+OptsFromQDictState state = {
+.errp = &local_err,
+.opts = opts,
+};
+
+next = qdict_next(qdict, entry);
+if (strncmp(entry->key, index, len) || *(entry->key + len) != '.') {
+entry = next;
+continue;
+}
+
+key = entry->key + len + 1;
+
+if (find_desc_by_name(opts->list->desc, key)) {
+qemu_opts_from_qdict_1(key, entry->value, &state);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+} else {
+qdict_del(qdict, entry->key);
+}
+}
+
+entry = next;
+}
+}
+
+/*
  * Convert from QemuOpts to QDict.
  * The QDict values are of type QString.
  * TODO We'll want to use types appropriate for opt->desc->type, but
-- 
2.4.3
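
The key matching in qemu_opts_absorb_qdict_by_index() requires the index
string followed by a literal '.', then looks up the remainder as the option
name. That step can be sketched in isolation as below. The helper name
sketch_strip_index is hypothetical and does not exist in QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * If key has the form "<index>.<name>", return a pointer to <name>
 * inside key. Otherwise return NULL. The '.' check prevents the index
 * "children.1" from matching the key "children.10.foo".
 */
static const char *sketch_strip_index(const char *key, const char *index)
{
    size_t len;

    assert(key != NULL && index != NULL);

    len = strlen(index);
    if (strncmp(key, index, len) != 0 || key[len] != '.') {
        return NULL;
    }
    return key + len + 1;
}
```

For example, stripping the index "children.0" from the key
"children.0.ignore-errors" leaves "ignore-errors", which is the name looked
up in the option descriptor list.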

[Qemu-devel] [PATCH COLO-Block v6 16/16] Implement new driver for block replication

2015-06-18 Thread Wen Congyang

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
---
 block/Makefile.objs |   1 +
 block/replication.c | 441 
 2 files changed, 442 insertions(+)
 create mode 100644 block/replication.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index f068666..84952b1 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -23,6 +23,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
+block-obj-y += replication.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
diff --git a/block/replication.c b/block/replication.c
new file mode 100644
index 000..da3b512
--- /dev/null
+++ b/block/replication.c
@@ -0,0 +1,441 @@
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/blockjob.h"
+#include "block/nbd.h"
+
+typedef struct BDRVReplicationState {
+ReplicationMode mode;
+int replication_state;
+char *export_name;
+NBDExport *exp;
+BlockDriverState *active_disk;
+BlockDriverState *hidden_disk;
+BlockDriverState *secondary_disk; /* nbd target */
+int error;
+} BDRVReplicationState;
+
+enum {
+BLOCK_REPLICATION_NONE, /* block replication is not started */
+BLOCK_REPLICATION_RUNNING,  /* block replication is running */
+BLOCK_REPLICATION_DONE, /* block replication is done(failover) */
+};
+
+#define COMMIT_CLUSTER_BITS 16
+#define COMMIT_CLUSTER_SIZE (1 << COMMIT_CLUSTER_BITS)
+#define COMMIT_SECTORS_PER_CLUSTER (COMMIT_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error **errp);
+
+#define NBD_OPT_EXPORT "export"
+#define REPLICATION_MODE "mode"
+static QemuOptsList replication_runtime_opts = {
+.name = "replication",
+.head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+.desc = {
+{
+.name = REPLICATION_MODE,
+.type = QEMU_OPT_STRING,
+},
+{
+.name = NBD_OPT_EXPORT,
+.type = QEMU_OPT_STRING,
+.help = "The NBD server name",
+},
+{ /* end of list */ }
+},
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+int flags, Error **errp)
+{
+int ret;
+BDRVReplicationState *s = bs->opaque;
+Error *local_err = NULL;
+QemuOpts *opts = NULL;
+const char *mode;
+
+ret = -EINVAL;
+opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+goto fail;
+}
+
+mode = qemu_opt_get(opts, REPLICATION_MODE);
+if (!mode) {
+error_setg(&local_err, "Missing the option mode");
+goto fail;
+}
+
+if (!strcmp(mode, "primary")) {
+s->mode = REPLICATION_MODE_PRIMARY;
+} else if (!strcmp(mode, "secondary")) {
+s->mode = REPLICATION_MODE_SECONDARY;
+} else {
+error_setg(&local_err,
+   "The option mode's value should be primary or secondary");
+goto fail;
+}
+
+if (s->mode == REPLICATION_MODE_SECONDARY) {
+s->export_name = g_strdup(qemu_opt_get(opts, NBD_OPT_EXPORT));
+if (!s->export_name) {
+error_setg(&local_err, "Missing the option export");
+goto fail;
+}
+}
+
+return 0;
+
+fail:
+qemu_opts_del(opts);
+/* propagate error */
+if (local_err) {
+error_propagate(errp, local_err);
+}
+return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+BDRVReplicationState *s = bs->opaque;
+
+if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
+replication_stop(bs, false, NULL);
+}
+
+g_free(s->export_name);
+}
+
+static int64_t replication_getlength(BlockDriverState *bs)
+{
+return bdrv_getlength(bs->file);
+}
+
+static int replication_get_io_status(BDRVReplicationState *s)
+{
+switch (s->replication_state) {
+case BLOCK_REPLICATION_NONE:
+return -EIO;
+case BLOCK_REPLICATION_RUNNING:
+return 0;
+case BLOCK_REPLICATION_DONE:
+return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 1;
+default:
+abort();
+}
+}
+
+static int replication_return_value(BDRVReplicationState *s, int ret)
+{
+if (s->mode == REPLICATION_MODE_SECONDARY) {
+return ret;
+}
+
+if (ret < 0) {
+s->error = ret;
+ret = 0;
+}
+
+return ret;
+}
+
+static coroutine_fn int replication_co_readv(BlockDriverState *bs,
+ int64_t sector_num,
+ int remaining_sectors,
+ QEMUIOVector *qiov)
+{
+BDRVReplicationState *s = bs->opaque;
+int ret;
+
+if (s->mode == REPLICATION_MODE_PRIMARY) {
+/* We only use it to forward p

Re: [Qemu-devel] [PATCH v2 1/6] qapi: qapi for audio backends

2015-06-18 Thread Markus Armbruster

"Kővágó Zoltán"  writes:

> On 2015-06-17 18:06, Markus Armbruster wrote:
>> "Kővágó Zoltán"  writes:
>>
>>> On 2015-06-17 15:37, Markus Armbruster wrote:
>>>> "Kővágó Zoltán"  writes:
>>>>
>>>>> On 2015-06-17 13:48, Markus Armbruster wrote:
>>>>>> "Kővágó Zoltán"  writes:
>>>>>>
>>>>>>> On 2015-06-17 09:46, Markus Armbruster wrote:
>>>>>> [...]
>>>>>>>>> +##
>>>>>>>>> +# @AudiodevBackendOptions
>>>>>>>>> +#
>>>>>>>>> +# A discriminated record of audio backends.
>>>>>>>>> +#
>>>>>>>>> +# Since: 2.4
>>>>>>>>> +##
>>>>>>>>> +{ 'union': 'AudiodevBackendOptions',
>>>>>>>>> +  'data': {
>>>>>>>>> +    'none':      'AudiodevNoOptions',
>>>>>>>>> +    'alsa':      'AudiodevAlsaOptions',
>>>>>>>>> +    'coreaudio': 'AudiodevNoOptions',
>>>>>>>>> +    'dsound':    'AudiodevDsoundOptions',
>>>>>>>>> +    'oss':       'AudiodevOssOptions',
>>>>>>>>> +    'pa':        'AudiodevPaOptions',
>>>>>>>>> +    'sdl':       'AudiodevNoOptions',
>>>>>>>>> +    'spice':     'AudiodevNoOptions',
>>>>>>>>> +    'wav':       'AudiodevWavOptions' } }
>>>>>>>>> +
>>>>>>>>> +##
>>>>>>>>> +# @AudioFormat
>>>>>>>>> +#
>>>>>>>>> +# An enumeration of possible audio formats.
>>>>>>>>> +#
>>>>>>>>> +# Since: 2.4
>>>>>>>>> +##
>>>>>>>>> +{ 'enum': 'AudioFormat',
>>>>>>>>> +  'data': [ 'u8', 's8', 'u16', 's16', 'u32', 's32' ] }
>>>>>>>>> +
>>>>>>>>> +##
>>>>>>>>> +# @AudiodevPerDirectionOptions
>>>>>>>>> +#
>>>>>>>>> +# General audio backend options that are used for both playback and recording.
>>>>>>>>> +#
>>>>>>>>> +# @fixed-settings: #optional use fixed settings for host DAC/ADC
>>>>>>>>> +#
>>>>>>>>> +# @frequency: #optional frequency to use when using fixed settings
>>>>>>>>> +#
>>>>>>>>> +# @channels: #optional number of channels when using fixed settings
>>>>>>>>> +#
>>>>>>>>> +# @format: #optional sample format to use when using fixed settings
>>>>>>>>
>>>>>>>> Are these guys used when @fixed-settings is off?
>>>>>>>
>>>>>>> No.
>>>>>>
>>>>>> If @fixed-settings, are the other three all required?  If not, what are
>>>>>> their defaults?
>>>>>
>>>>> No, they all have defaults: 44100 Hz, 2 channels and s16 format.
>>>>
>>>> Okay, this sort of explains why you have @fixed-settings.
>>>>
>>>> My first thought was that @fixed-settings is redundant, because we can
>>>> have any of @frequency, @channels, @format imply fixed settings.  Except
>>>> that doesn't let you ask for the *default* fixed settings, as you have
>>>> to specify at least one.
>>>>
>>>> What's the default for @fixed-settings?
>>>
>>> It's on by default.
>>>
>>>> What if I specify frequency, channels or format together with explicit
>>>> fixed-settings: false?
>>>
>>> They will be ignored.
>>>
>>> The audio system currently work like this: when an audio frontend
>>> wants to open an output with some format (frequency, channels, format)
>>> it checks fixed-settings. If it's false, it will just open the stream
>>> with the frontend specified settings. If it's true, it'll convert it
>>> into the format specified by @frequency, @channels, @format, then pass
>>> this converted/recoded stream to the backend.
>>
>> So user typically specifies either fixed-settings=off, or any
>> combination of the other three (including none of them).  Correct?
>>
>> We could reject the non-sensical combination of fixed-settings=off plus
>> any of the other three instead of silently ignoring their values.
>> Matter of taste, your choice.
>>
>> Whatever you do, make sure to document how these four work together.
>>
>> Thank you for educating me so patiently.
>
> The audio backend currently works like that you can pass any
> non-sensical values to it, like negative frequency, or 'kdp' count of
> channels, it will silently fallback to some default value, or just
> fail, but qemu will continue to run. We can make the new config more
> strict (and we should, I think), so if you have any idea where should
> we be more strict (without creating a backward compatibility
> headache), don't hesitate to point it out.

When a sensible default value exists, making the parameter optional is
usually a good idea.

We should refuse to start on non-sensical configuration, not silently
substitute defaults or disable the affected component (here: audio
backend).

I can't give you more specific guidance, because I'm an audio ignoramus
:)
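
To make the interplay of the four options concrete, here is a minimal sketch in C of the stricter behaviour discussed above: fixed-settings defaults to on, the other three default to 44100 Hz, 2 channels and s16, and fixed-settings=off combined with any of the other three is rejected rather than silently ignored. All type and function names (PerDirectionOptions, StreamSettings, resolve_settings) are hypothetical and are not QEMU's audio API; the has_* flags only imitate how optional QAPI members are usually represented.

```c
#include <stdbool.h>

/* All names below are hypothetical; they are not QEMU's audio structures. */
typedef enum { FMT_U8, FMT_S8, FMT_U16, FMT_S16, FMT_U32, FMT_S32 } SampleFormat;

typedef struct {
    int frequency;          /* in Hz */
    int channels;
    SampleFormat format;
} StreamSettings;

/* Imitates '#optional' members: a has_* flag plus the value itself. */
typedef struct {
    bool has_fixed_settings; bool fixed_settings;
    bool has_frequency;      int frequency;
    bool has_channels;       int channels;
    bool has_format;         SampleFormat format;
} PerDirectionOptions;

/*
 * Compute the settings the backend stream is opened with.
 * Returns 0 on success and -1 for a configuration that should be refused.
 * On failure the contents of *out are unspecified.
 */
int resolve_settings(const PerDirectionOptions *o,
                     const StreamSettings *frontend,
                     StreamSettings *out)
{
    /* fixed-settings is on unless explicitly switched off */
    bool fixed = o->has_fixed_settings ? o->fixed_settings : true;

    if (!fixed) {
        /* Strict variant: refuse options that would have no effect. */
        if (o->has_frequency || o->has_channels || o->has_format) {
            return -1;
        }
        *out = *frontend;   /* pass the frontend's format through */
        return 0;
    }

    /* Fixed settings: unspecified members take the documented defaults. */
    out->frequency = o->has_frequency ? o->frequency : 44100;
    out->channels  = o->has_channels  ? o->channels  : 2;
    out->format    = o->has_format    ? o->format    : FMT_S16;

    /* Refuse non-sensical values instead of falling back silently. */
    if (out->frequency <= 0 || out->channels <= 0) {
        return -1;
    }
    return 0;
}
```

Whether the rejected combination should be a hard error or merely a warning is the matter of taste mentioned above; the sketch picks the hard error.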

Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386

2015-06-18 Thread Pavel Dovgaluk

> From: Aurelien Jarno [mailto:aurel...@aurel32.net]
> On 2015-06-18 10:12, Pavel Dovgaluk wrote:
> > > From: Aurelien Jarno [mailto:aurel...@aurel32.net]
> > > On 2015-06-17 15:41, Pavel Dovgalyuk wrote:
> > > > In icount mode every translation block looks as follows:
> > > >
> > > > if icount < n then exit
> > > > icount -= n
> > > > instr1
> > > > instr2
> > > > ...
> > > > instrn
> > > > exit
> > > >
> > > > When one of these instructions initiates an exception, icount should be
> > > > restored and adjusted number of instructions should be subtracted from 
> > > > icount
> > > > instead of initial n.
> > > >
> > > > tlb_fill function passes retaddr to raise_exception, which allows 
> > > > restoring
> > > > current instructions in TB and correct icount calculation.
> > > >
> > > > When exception triggered with other function (e.g. by embedding call to
> > > > exception raising helper into TB), then PC is not passed as retaddr and
> > > > correct icount is not recovered. In such cases icount will be decreased
> > > > by the value equal to the size of TB.
> > >
> > > Looking at how icount work, I see it's basically a variable in the CPU
> > > state (icount_decr.u16.low), which is already accessed from the TB.
> > > Couldn't we adjust it using additional code before generating an
> > > exception, when in icount mode.
> > >
> > > For example for MIPS, we can add some code before generate_exception
> > > which use the value from s->gen_opc_icount[j] to adjust
> > > the variable icount_decr.u16.low.
> >
> > It is possible, but it will incur additional overhead, because we will
> > have to update icount every time the exception might be generated.
> > We'll have to update icount value before and after every helper call,
> > that can cause an exception:
> >
> > icount -= n
> > ...
> > instr_k
> > icount += n - k
> > helper
> > icount -= n - k
> > ...
> >
> > And this overhead will slowdown the code even if no exception occur.
> 
> That's where I might disagree. Retranslation seems a very good idea on
> the paper, but in practice it doesn't seems to always bring the
> performance improvement it should. In addition it seems to be highly
> dependent on the target. Just to give some numbers, on MIPS (as your
> patch originally concerns this architecture), 40% of code generation is
> actually due to retranslation. The problem is that over the time we have
> improved a lot the code generation (liveness analysis, better register
> allocation, constant propagation, ...) and thus we have increased the
> code generation time. While it clearly has some benefits when this code
> is actually executed, it's not the case when the code is simply
> retranslated. In short we spend more time to find the CPU state
> corresponding to an exception than before.
> 
...
> 
> All of that to say that I am worried for the performances to see more
> paths through the retranslation code, especially on MIPS as it seems to
> be costly. That said I haven't really look in details at other targets,
> nor hosts.

I fixed syscalls and exceptions that occur without any conditions,
and removed redundant calls to save_cpu_state. Then I measured the performance
without icount enabled, and Linux boots even faster than with the original
version. I'll submit this version for review soon.

> Now to come back about your patches, we might want to simply fix icount
> first, even if it has some performance impact, and deal with the
> retranslation issue separately, as it concerns more than just icount.

Pavel Dovgalyuk
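
For readers following the icount arithmetic quoted above, the bookkeeping can be sketched as a small C model. It is only an illustration under stated assumptions: CpuBudget and run_tb are hypothetical names, a translation block is reduced to a count n, and fault_at is the index of the instruction that raises the exception (so fault_at instructions completed before it). It does not reflect how TCG actually restores state.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU instruction budget. */
typedef struct {
    int64_t icount;
} CpuBudget;

/*
 * Model one translation block of n instructions.
 * fault_at < 0 means no exception; otherwise instruction number
 * fault_at (0-based) raises an exception before completing.
 * Returns the number of instructions that actually completed.
 */
int run_tb(CpuBudget *cpu, int n, int fault_at)
{
    if (cpu->icount < n) {
        return 0;                    /* "if icount < n then exit" */
    }
    cpu->icount -= n;                /* whole block is charged up front */

    if (fault_at >= 0 && fault_at < n) {
        /*
         * Only fault_at instructions completed, so the remaining
         * n - fault_at must be credited back. Skipping this step is
         * the bug described in the thread: icount ends up decreased
         * by the full size of the TB.
         */
        cpu->icount += n - fault_at;
        return fault_at;
    }
    return n;
}
```

With a budget of 100, a clean run of a 10-instruction block leaves 90; a fault at instruction 3 of the next block leaves 87 rather than 80.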

[Qemu-devel] [PATCH COLO-Frame v6 01/31] configure: Add parameter for configure to enable/disable COLO support

2015-06-18 Thread zhanghailiang

configure --enable-colo/--disable-colo to switch COLO
support on/off.
COLO support is off by default.

Signed-off-by: zhanghailiang 
Signed-off-by: Yang Hongyang 
Signed-off-by: Gonglei 
Signed-off-by: Lai Jiangshan 
---
 configure | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/configure b/configure
index 222694f..8b6242f 100755
--- a/configure
+++ b/configure
@@ -259,6 +259,7 @@ xfs=""
 vhost_net="no"
 vhost_scsi="no"
 kvm="no"
+colo="no"
 rdma=""
 gprof="no"
 debug_tcg="no"
@@ -932,6 +933,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-colo) colo="no"
+  ;;
+  --enable-colo) colo="yes"
+  ;;
   --disable-tcg-interpreter) tcg_interpreter="no"
   ;;
   --enable-tcg-interpreter) tcg_interpreter="yes"
@@ -1336,6 +1341,10 @@ Advanced options (experts only):
   --disable-slirp          disable SLIRP userspace network connectivity
   --disable-kvm            disable KVM acceleration support
   --enable-kvm             enable KVM acceleration support
+  --disable-colo           disable COarse-grain LOck-stepping Virtual
+                           Machines for Non-stop Service (default)
+  --enable-colo            enable COarse-grain LOck-stepping Virtual
+                           Machines for Non-stop Service
   --disable-rdma           disable RDMA-based migration support
   --enable-rdma            enable RDMA-based migration support
   --enable-tcg-interpreter enable TCG with bytecode interpreter (TCI)
@@ -4450,6 +4459,7 @@ echo "Linux AIO support $linux_aio"
 echo "ATTR/XATTR support $attr"
 echo "Install blobs $blobs"
 echo "KVM support   $kvm"
+echo "COLO support  $colo"
 echo "RDMA support  $rdma"
 echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support   $fdt"
@@ -5007,6 +5017,10 @@ if have_backend "ftrace"; then
 fi
 echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
 
+if test "$colo" = "yes"; then
+  echo "CONFIG_COLO=y" >> $config_host_mak
+fi
+
 if test "$rdma" = "yes" ; then
   echo "CONFIG_RDMA=y" >> $config_host_mak
 fi
-- 
1.7.12.4

[Qemu-devel] [PATCH COLO-Frame v6 00/31] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service

2015-06-18 Thread zhanghailiang

This is the 6th version of COLO. This series contains only the COLO frame
part: VM checkpoint, failover, proxy API and block replication API. It does
not include block replication itself; the block part is sent as a separate
series.

As usual, we provide two branches: 'colo-v1.3-basic' and
'colo-v1.3-developing'. The 'basic' branch is exactly the same as this patch
series and has the basic features of COLO.
We will keep this series as simple as possible, to make review easy.

You can get the newest integrated qemu colo patches (including the block
part) from github:
https://github.com/coloft/qemu/commits/colo-v1.3-basic
https://github.com/coloft/qemu/commits/colo-v1.3-developing (more features)
Please NOTE the difference between these two branches.
colo-v1.3-developing has some optimizations in the checkpoint process,
including:
   1) separating the ram and device save/load processes to reduce the amount
      of extra memory used during checkpoint
   2) live migrating part of the dirty pages to the slave during sleep time.
Besides, we add some statistics in the 'developing' branch, which you can get
with the command 'info migrate'.

For how to test COLO, please refer to the following link:
http://wiki.qemu.org/Features/COLO

For the kernel part (colo proxy) of COLO, we have sent an RFC patch to the
kernel community:
https://lkml.org/lkml/2015/6/18/32

COLO is a totally new feature which is still at an early stage;
your comments and feedback are warmly welcomed.

Cc: netfilter-de...@vger.kernel.org

TODO:
1. COLO function switch on/off
2. Optimize proxy part, include proxy script.
  1) Remove the limitation of forward network link.
  2) Reuse the nfqueue_entry and NF_STOLEN to enqueue skb
3. The capability of continuous FT

v6:
- Add a new qmp event 'COLO_EXIT' for COLO errors, which is useful
  for users to get involved in the failover verdict.
- Support '-net nic' configuration
- Fix a segmentation fault triggered by running 'colo_lost_heartbeat' directly
  when the VM is not in COLO state.
- Fix a qemu abort triggered by starting another migration while in COLO
  state.
- Optimize some code, especially the colo net part.

zhanghailiang (31):
  configure: Add parameter for configure to enable/disable COLO support
  migration: Introduce capability 'colo' to migration
  COLO: migrate colo related info to slave
  migration: Integrate COLO checkpoint process into migration
  migration: Integrate COLO checkpoint process into loadvm
  COLO: Implement colo checkpoint protocol
  COLO: Add a new RunState RUN_STATE_COLO
  QEMUSizedBuffer: Introduce two help functions for qsb
  COLO: Save VM state to slave when do checkpoint
  COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
  COLO VMstate: Load VM state into qsb before restore it
  arch_init: Start to trace dirty pages of SVM
  COLO RAM: Flush cached RAM into SVM's memory
  COLO failover: Introduce a new command to trigger a failover
  COLO failover: Implement COLO primary/secondary vm failover work
  qmp event: Add event notification for COLO error
  COLO failover: Don't do failover during loading VM's state
  COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
  COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
  tap: Make launch_script() public
  COLO NIC: Implement colo nic device interface configure()
  COLO NIC : Implement colo nic init/destroy function
  COLO NIC: Some init work related with proxy module
  COLO: Handle nfnetlink message from proxy module
  COLO: Do checkpoint according to the result of packets comparation
  COLO: Improve checkpoint efficiency by do additional periodic
checkpoint
  COLO: Add colo-set-checkpoint-period command
  COLO NIC: Implement NIC checkpoint and failover
  COLO: Disable qdev hotplug when VM is in COLO mode
  COLO: Implement shutdown checkpoint
  COLO: Add block replication into colo process

 configure  |  36 +-
 docs/qmp/qmp-events.txt|  16 +
 hmp-commands.hx|  30 ++
 hmp.c  |  15 +
 hmp.h  |   2 +
 include/exec/cpu-all.h |   1 +
 include/migration/migration-colo.h |  50 ++
 include/migration/migration-failover.h |  22 +
 include/migration/migration.h  |   3 +
 include/migration/qemu-file.h  |   3 +-
 include/net/colo-nic.h |  34 ++
 include/net/net.h  |   2 +
 include/net/tap.h  |  19 +
 include/sysemu/sysemu.h|   3 +
 migration/Makefile.objs|   2 +
 migration/colo-comm.c  |  68 +++
 migration/colo-failover.c  |  53 ++
 migration/colo.c   | 854 +
 migration/migration.c  |  68 ++-
 migration/qemu-file-buf.c  |  58 +++
 migration/ram.c| 249 +-
 migration/savevm.c |   2 +-
 net/Mak

[Qemu-devel] [PATCH COLO-Frame v6 03/31] COLO: migrate colo related info to slave

2015-06-18 Thread zhanghailiang

We can determine whether the VM on the destination should go into COLO mode
by referring to the info that has been migrated from the PVM.

Signed-off-by: zhanghailiang 
Signed-off-by: Yang Hongyang 
Signed-off-by: Lai Jiangshan 
Signed-off-by: Gonglei 
---
 include/migration/migration-colo.h |  2 ++
 migration/Makefile.objs|  1 +
 migration/colo-comm.c  | 47 ++
 trace-events   |  3 +++
 vl.c   |  5 +++-
 5 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 migration/colo-comm.c

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index c6d0c51..e20a0c1 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -14,7 +14,9 @@
 #define QEMU_MIGRATION_COLO_H
 
 #include "qemu-common.h"
+#include "migration/migration.h"
 
 bool colo_supported(void);
+void colo_info_mig_init(void);
 
 #endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 5a25d39..cb7bd30 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,5 +1,6 @@
 common-obj-y += migration.o tcp.o
 common-obj-$(CONFIG_COLO) += colo.o
+common-obj-y += colo-comm.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
new file mode 100644
index 000..0b76eb4
--- /dev/null
+++ b/migration/colo-comm.c
@@ -0,0 +1,47 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ *
+ */
+
+#include 
+#include "trace.h"
+
+static bool colo_requested;
+
+/* save */
+static void colo_info_save(QEMUFile *f, void *opaque)
+{
+qemu_put_byte(f, migrate_enable_colo());
+}
+
+/* restore */
+static int colo_info_load(QEMUFile *f, void *opaque, int version_id)
+{
+int value = qemu_get_byte(f);
+
+if (value && !colo_requested) {
+trace_colo_info_load("COLO request!");
+}
+colo_requested = value;
+
+return 0;
+}
+
+static SaveVMHandlers savevm_colo_info_handlers = {
+.save_state = colo_info_save,
+.load_state = colo_info_load,
+};
+
+void colo_info_mig_init(void)
+{
+register_savevm_live(NULL, "colo", -1, 1,
+ &savevm_colo_info_handlers, NULL);
+}
diff --git a/trace-events b/trace-events
index 52b7efa..3f63019 100644
--- a/trace-events
+++ b/trace-events
@@ -1466,6 +1466,9 @@ rdma_start_incoming_migration_after_rdma_listen(void) ""
 rdma_start_outgoing_migration_after_rdma_connect(void) ""
 rdma_start_outgoing_migration_after_rdma_source_init(void) ""
 
+# migration/colo-comm.c
+colo_info_load(const char *msg) "%s"
+
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
 kvm_vm_ioctl(int type, void *arg) "type 0x%x, arg %p"
diff --git a/vl.c b/vl.c
index 2201e27..988567a 100644
--- a/vl.c
+++ b/vl.c
@@ -90,6 +90,7 @@ int main(int argc, char **argv)
 #include "sysemu/dma.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/migration-colo.h"
 #include "sysemu/kvm.h"
 #include "qapi/qmp/qjson.h"
 #include "qemu/option.h"
@@ -4261,7 +4262,9 @@ int main(int argc, char **argv, char **envp)
 
 blk_mig_init();
 ram_mig_init();
-
+#ifdef CONFIG_COLO
+colo_info_mig_init();
+#endif
 /* If the currently selected machine wishes to override the units-per-bus
  * property of its default HBA interface type, do so now. */
 if (machine_class->units_per_default_bus) {
-- 
1.7.12.4
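
The idea of the patch above, a one-byte flag written on the source and read back on the destination, can be modelled without any QEMU types. The following is a rough sketch only: ByteStream is a hypothetical in-memory stand-in for QEMUFile, and every function name ending in _sketch is invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for QEMUFile. */
typedef struct {
    uint8_t data[16];
    size_t wpos;    /* next write position */
    size_t rpos;    /* next read position */
} ByteStream;

void stream_put_byte(ByteStream *s, uint8_t v)
{
    if (s->wpos < sizeof(s->data)) {
        s->data[s->wpos++] = v;
    }
}

int stream_get_byte(ByteStream *s)
{
    /* Reading past the written data yields 0, i.e. "not requested". */
    return s->rpos < s->wpos ? s->data[s->rpos++] : 0;
}

/* Destination-side record of what the source asked for. */
static bool colo_requested_sketch;

/* Source side: serialize whether the COLO capability is enabled. */
void colo_info_save_sketch(ByteStream *s, bool colo_enabled)
{
    stream_put_byte(s, colo_enabled ? 1 : 0);
}

/* Destination side: remember the flag for later queries. */
void colo_info_load_sketch(ByteStream *s)
{
    colo_requested_sketch = stream_get_byte(s) != 0;
}

bool colo_requested_on_destination(void)
{
    return colo_requested_sketch;
}
```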

[Qemu-devel] [PATCH COLO-Frame v6 06/31] COLO: Implement colo checkpoint protocol

2015-06-18 Thread zhanghailiang

We need a user-defined communication protocol to control the checkpoint
process.

A new checkpoint request is started by the Primary VM, and the interaction
proceeds as below:
Checkpoint synchronizing points:

                  Primary                 Secondary
  NEW             @
                                          Suspend
  SUSPENDED                               @
                  Suspend&Save state
  SEND            @
                  Send state              Receive state
  RECEIVED                                @
                  Flush network           Load state
  LOADED                                  @
                  Resume                  Resume

                  Start Comparing
NOTE:
 1) '@' marks the side that sends the message
 2) Every sync-point is synchronized by the two sides with only
    one handshake (single direction) for low latency.
    If stricter synchronization is required, a sync-point in the
    opposite direction should be added.
 3) Since sync-points are single direction, the remote side may
    already be far ahead when this side receives the sync-point.
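
The handshake order described above can be written down as a small C sketch. It is an illustration only: the CkptMsg and Side enums and both functions are hypothetical names, and the numeric message values used by the patch are deliberately not reproduced. The sender of each message follows the way colo_do_checkpoint_transaction() puts NEW and SEND and waits for the other three.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { SIDE_PRIMARY, SIDE_SECONDARY } Side;

/* Sync-points of one checkpoint round, in the order they occur. */
typedef enum {
    MSG_NEW,
    MSG_SUSPENDED,
    MSG_SEND,
    MSG_RECEIVED,
    MSG_LOADED,
    MSG_COUNT
} CkptMsg;

/* Which side sends a given sync-point (the '@' column in the table). */
Side ckpt_sender(CkptMsg m)
{
    switch (m) {
    case MSG_NEW:
    case MSG_SEND:
        return SIDE_PRIMARY;
    default:
        return SIDE_SECONDARY;
    }
}

/*
 * Check that a recorded message sequence is exactly one complete
 * checkpoint round: NEW, SUSPENDED, SEND, RECEIVED, LOADED.
 */
bool ckpt_round_is_valid(const CkptMsg *seq, size_t len)
{
    if (len != MSG_COUNT) {
        return false;
    }
    for (size_t i = 0; i < len; i++) {
        if (seq[i] != (CkptMsg)i) {
            return false;
        }
    }
    return true;
}
```

Because each sync-point travels in one direction only, the check above says nothing about how far the sender has progressed by the time the receiver sees the message, which is the caveat in note 3.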

Signed-off-by: Yang Hongyang 
Signed-off-by: Lai Jiangshan 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Gonglei 
---
 migration/colo.c | 237 ++-
 1 file changed, 235 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 45f9efd..0f7c36b 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -15,6 +15,41 @@
 #include "trace.h"
 #include "qemu/error-report.h"
 
+enum {
+COLO_CHECPOINT_READY = 0x46,
+
+/*
+* Checkpoint synchronizing points.
+*
+*                  Primary                 Secondary
+*  NEW             @
+*                                          Suspend
+*  SUSPENDED                               @
+*                  Suspend&Save state
+*  SEND            @
+*                  Send state              Receive state
+*  RECEIVED                                @
+*                  Flush network           Load state
+*  LOADED                                  @
+*                  Resume                  Resume
+*
+*                  Start Comparing
+* NOTE:
+* 1) '@' who sends the message
+* 2) Every sync-point is synchronized by two sides with only
+*one handshake(single direction) for low-latency.
+*If more strict synchronization is required, a opposite direction
+*sync-point should be added.
+* 3) Since sync-points are single direction, the remote side may
+*go forward a lot when this side just receives the sync-point.
+*/
+COLO_CHECKPOINT_NEW,
+COLO_CHECKPOINT_SUSPENDED,
+COLO_CHECKPOINT_SEND,
+COLO_CHECKPOINT_RECEIVED,
+COLO_CHECKPOINT_LOADED,
+};
+
 static QEMUBH *colo_bh;
 static Coroutine *colo;
 
@@ -34,19 +69,136 @@ bool loadvm_in_colo_state(void)
 return colo != NULL;
 }
 
+/* colo checkpoint control helper */
+static int colo_ctl_put(QEMUFile *f, uint64_t request)
+{
+int ret = 0;
+
+qemu_put_be64(f, request);
+qemu_fflush(f);
+
+ret = qemu_file_get_error(f);
+
+return ret;
+}
+
+static int colo_ctl_get_value(QEMUFile *f, uint64_t *value)
+{
+int ret = 0;
+uint64_t temp;
+
+temp = qemu_get_be64(f);
+
+ret = qemu_file_get_error(f);
+if (ret < 0) {
+return -1;
+}
+
+*value = temp;
+return 0;
+}
+
+static int colo_ctl_get(QEMUFile *f, uint64_t require)
+{
+int ret;
+uint64_t value;
+
+ret = colo_ctl_get_value(f, &value);
+if (ret < 0) {
+return ret;
+}
+
+if (value != require) {
+error_report("unexpected state! expected: %"PRIu64
+ ", received: %"PRIu64, require, value);
+exit(1);
+}
+
+return ret;
+}
+
+static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
+{
+int ret;
+
+ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
+if (ret < 0) {
+goto out;
+}
+
+ret = colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED);
+if (ret < 0) {
+goto out;
+}
+
+/* TODO: suspend and save vm state to colo buffer */
+
+ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
+if (ret < 0) {
+goto out;
+}
+
+/* TODO: send vmstate to Secondary */
+
+ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED);
+if (ret < 0) {
+goto out;
+}
+trace_colo_receive_message("COLO_CHECKPOINT_RECEIVED");
+
+ret = colo_ctl_get(control, COLO_CHECKPOINT_LOADED);
+if (ret < 0) {
+goto out;
+}
+trace_colo_receive_message("COLO_CHECKPOINT_LOADED");
+
+/* TODO: resume Primary */
+
+out:
+return ret;
+}
+
 static void *colo_thread(void *opaque)
 {
 MigrationState *s = opaque;
+QEMUFile *colo_control = NULL;
+int ret;
+
+colo_control = qemu_fopen_socket(qemu_get_fd(s->file), "rb");
+if (!colo_control) {
+

[Qemu-devel] [PATCH COLO-Frame v6 04/31] migration: Integrate COLO checkpoint process into migration

2015-06-18 Thread zhanghailiang

Add a migration state, MIGRATION_STATUS_COLO, and enter it
after the first live migration has finished successfully.
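
The intended state flow (ACTIVE to COLO once the first live migration is done, COLO to COMPLETED when the checkpoint loop ends) relies on conditional transitions: a state change only happens if the state is still the expected one. A minimal sketch of that idea using standard C11 atomics follows. The MigState values and set_state() are hypothetical names; QEMU's own atomic_cmpxchg() macro has a different calling convention, so this is not a drop-in for migrate_set_state().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical subset of migration states, for illustration only. */
typedef enum { ST_SETUP, ST_ACTIVE, ST_COLO, ST_COMPLETED } MigState;

/*
 * Move *state from old_state to new_state, but only if it currently
 * holds old_state. Returns true when the transition happened.
 */
bool set_state(atomic_int *state, int old_state, int new_state)
{
    int expected = old_state;

    /* On failure 'expected' is overwritten with the current value. */
    return atomic_compare_exchange_strong(state, &expected, new_state);
}
```

A transition requested from a stale state, for example ACTIVE to COMPLETED while the state is already COLO, simply fails and leaves the state untouched, which is the property that lets several threads race on the state safely.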

Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Lai Jiangshan 
---
 include/migration/migration-colo.h |  3 +++
 include/migration/migration.h  |  2 ++
 migration/colo.c   | 55 ++
 migration/migration.c  | 25 -
 qapi-schema.json   |  2 +-
 stubs/migration-colo.c |  9 +++
 trace-events   |  3 +++
 7 files changed, 92 insertions(+), 7 deletions(-)

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index e20a0c1..b4f75c2 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -19,4 +19,7 @@
 bool colo_supported(void);
 void colo_info_mig_init(void);
 
+void colo_init_checkpointer(MigrationState *s);
+bool migrate_in_colo_state(void);
+
 #endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 225e9e6..67ad4fd 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -81,6 +81,8 @@ struct MigrationState
 int64_t dirty_sync_count;
 };
 
+void migrate_set_state(MigrationState *s, int old_state, int new_state);
+
 void process_incoming_migration(QEMUFile *f);
 
 void qemu_start_incoming_migration(const char *uri, Error **errp);
diff --git a/migration/colo.c b/migration/colo.c
index bcd753b..1ff4e55 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,9 +10,64 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include "sysemu/sysemu.h"
 #include "migration/migration-colo.h"
+#include "trace.h"
+
+static QEMUBH *colo_bh;
 
 bool colo_supported(void)
 {
 return true;
 }
+
+bool migrate_in_colo_state(void)
+{
+MigrationState *s = migrate_get_current();
+return (s->state == MIGRATION_STATUS_COLO);
+}
+
+static void *colo_thread(void *opaque)
+{
+MigrationState *s = opaque;
+
+qemu_mutex_lock_iothread();
+vm_start();
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("stop", "run");
+
+/*TODO: COLO checkpoint savevm loop*/
+
+migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
+
+qemu_mutex_lock_iothread();
+qemu_bh_schedule(s->cleanup_bh);
+qemu_mutex_unlock_iothread();
+
+return NULL;
+}
+
+static void colo_start_checkpointer(void *opaque)
+{
+MigrationState *s = opaque;
+
+if (colo_bh) {
+qemu_bh_delete(colo_bh);
+colo_bh = NULL;
+}
+
+qemu_mutex_unlock_iothread();
+qemu_thread_join(&s->thread);
+qemu_mutex_lock_iothread();
+
+migrate_set_state(s, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_COLO);
+
+qemu_thread_create(&s->thread, "colo", colo_thread, s,
+   QEMU_THREAD_JOINABLE);
+}
+
+void colo_init_checkpointer(MigrationState *s)
+{
+colo_bh = qemu_bh_new(colo_start_checkpointer, s);
+qemu_bh_schedule(colo_bh);
+}
diff --git a/migration/migration.c b/migration/migration.c
index b31ce94..0589fc8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -298,6 +298,10 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
 get_xbzrle_cache_stats(info);
 break;
+case MIGRATION_STATUS_COLO:
+info->has_status = true;
+/* TODO: display COLO specific information (checkpoint info etc.) */
+break;
 case MIGRATION_STATUS_COMPLETED:
 get_xbzrle_cache_stats(info);
 
@@ -400,7 +404,7 @@ void qmp_migrate_set_parameters(bool has_compress_level,
 
 /* shared migration helpers */
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+void migrate_set_state(MigrationState *s, int old_state, int new_state)
 {
 if (atomic_cmpxchg(&s->state, old_state, new_state) == new_state) {
 trace_migrate_set_state(new_state);
@@ -583,7 +587,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 
 if (s->state == MIGRATION_STATUS_ACTIVE ||
 s->state == MIGRATION_STATUS_SETUP ||
-s->state == MIGRATION_STATUS_CANCELLING) {
+s->state == MIGRATION_STATUS_CANCELLING ||
+s->state == MIGRATION_STATUS_COLO) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
 return;
 }
@@ -784,6 +789,7 @@ static void *migration_thread(void *opaque)
 int64_t max_size = 0;
 int64_t start_time = initial_time;
 bool old_vm_running = false;
+bool enable_colo = migrate_enable_colo();
 
 qemu_savevm_state_header(s->file);
 qemu_savevm_state_begin(s->file, &s->params);
@@ -822,8 +828,10 @@ static void *migration_thread(void *opaque)
 }
 
 if (!qemu_file_get_error(s->file)) {
-migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
-  MIGRATION_STATUS_COMPLETED);
+if (!enable_colo) {
+migra

[Qemu-devel] [PATCH COLO-Frame v6 10/31] COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily

2015-06-18 Thread zhanghailiang

The RAM cache is initially the same as the SVM's/PVM's memory.

At each checkpoint, we cache the dirty RAM of the PVM in the RAM cache on
the slave (so that the RAM cache is always the same as the PVM's memory at
every checkpoint). We flush the cached RAM to the SVM after we have received
all of the PVM's vmstate (RAM/device).
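
As a rough model of the scheme described above, the following C sketch keeps a cache next to the live memory, stages incoming pages in the cache, and copies them into live memory only on flush. It is a simplification under stated assumptions: the names (RamCacheSketch and the ram_cache_* functions) and the tiny page geometry are hypothetical, and only pages received from the PVM are tracked. Pages the SVM dirtied on its own would also have to be restored from the cache, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4      /* tiny pages keep the example readable */
#define SKETCH_NPAGES    4

typedef struct {
    uint8_t ram[SKETCH_NPAGES * SKETCH_PAGE_SIZE];    /* SVM live memory */
    uint8_t cache[SKETCH_NPAGES * SKETCH_PAGE_SIZE];  /* PVM memory image */
    bool dirty[SKETCH_NPAGES];  /* pages staged since the last flush */
} RamCacheSketch;

/* The cache starts out identical to the VM's memory. */
void ram_cache_init(RamCacheSketch *c)
{
    memcpy(c->cache, c->ram, sizeof(c->ram));
    memset(c->dirty, 0, sizeof(c->dirty));
}

/* Stage one dirty page from the PVM; live memory is not touched yet. */
int ram_cache_load_page(RamCacheSketch *c, size_t page, const uint8_t *data)
{
    if (page >= SKETCH_NPAGES) {
        return -1;
    }
    memcpy(&c->cache[page * SKETCH_PAGE_SIZE], data, SKETCH_PAGE_SIZE);
    c->dirty[page] = true;
    return 0;
}

/* After all vmstate has arrived: copy staged pages into live memory. */
size_t ram_cache_flush(RamCacheSketch *c)
{
    size_t flushed = 0;

    for (size_t page = 0; page < SKETCH_NPAGES; page++) {
        if (c->dirty[page]) {
            memcpy(&c->ram[page * SKETCH_PAGE_SIZE],
                   &c->cache[page * SKETCH_PAGE_SIZE], SKETCH_PAGE_SIZE);
            c->dirty[page] = false;
            flushed++;
        }
    }
    return flushed;
}
```

Staging first and flushing later means a checkpoint that fails halfway through the transfer never leaves the SVM's live memory in a half-updated state.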

Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Yang Hongyang 
Signed-off-by: Lai Jiangshan 
Signed-off-by: Li Zhijian 
---
 include/exec/cpu-all.h |  1 +
 include/migration/migration-colo.h |  3 ++
 migration/colo.c   | 31 +++--
 migration/ram.c| 93 +-
 4 files changed, 122 insertions(+), 6 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index ac06c67..bcfa3bc 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -272,6 +272,7 @@ struct RAMBlock {
 struct rcu_head rcu;
 struct MemoryRegion *mr;
 uint8_t *host;
+uint8_t *host_cache; /* For colo, VM's ram cache */
 ram_addr_t offset;
 ram_addr_t used_length;
 ram_addr_t max_length;
diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index b2798f7..2110182 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -35,4 +35,7 @@ bool loadvm_enable_colo(void);
 void loadvm_exit_colo(void);
 void *colo_process_incoming_checkpoints(void *opaque);
 bool loadvm_in_colo_state(void);
+/* ram cache */
+int create_and_init_ram_cache(void);
+void release_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 7404507..439a6fa 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -325,11 +325,22 @@ void *colo_process_incoming_checkpoints(void *opaque)
 error_report("Can't open incoming channel!");
 goto out;
 }
+
+if (create_and_init_ram_cache() < 0) {
+error_report("Failed to initialize ram cache");
+goto out;
+}
+
 ret = colo_ctl_put(ctl, COLO_CHECPOINT_READY);
 if (ret < 0) {
 goto out;
 }
-/* TODO: in COLO mode, Secondary is runing, so start the vm */
+qemu_mutex_lock_iothread();
+/* in COLO mode, slave is runing, so start the vm */
+vm_start();
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("stop", "run");
+
 while (true) {
 int request = 0;
 int ret = colo_wait_handle_cmd(f, &request);
@@ -342,7 +353,12 @@ void *colo_process_incoming_checkpoints(void *opaque)
 }
 }
 
-/* TODO: suspend guest */
+/* suspend guest */
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("run", "stop");
+
 ret = colo_ctl_put(ctl, COLO_CHECKPOINT_SUSPENDED);
 if (ret < 0) {
 goto out;
@@ -354,7 +370,7 @@ void *colo_process_incoming_checkpoints(void *opaque)
 }
 trace_colo_receive_message("COLO_CHECKPOINT_SEND");
 
-/* TODO: read migration data into colo buffer */
+/*TODO Load VM state */
 
 ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED);
 if (ret < 0) {
@@ -362,16 +378,23 @@ void *colo_process_incoming_checkpoints(void *opaque)
 }
 trace_colo_receive_message("COLO_CHECKPOINT_RECEIVED");
 
-/* TODO: load vm state */
+/* TODO: flush vm state */
 
 ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
 if (ret < 0) {
 goto out;
 }
+
+/* resume guest */
+qemu_mutex_lock_iothread();
+vm_start();
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("stop", "start");
 }
 
 out:
 colo = NULL;
+release_ram_cache();
 if (ctl) {
 qemu_fclose(ctl);
 }
diff --git a/migration/ram.c b/migration/ram.c
index bc362f0..ded13f8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -224,6 +224,7 @@ static RAMBlock *last_sent_block;
 static ram_addr_t last_offset;
 static unsigned long *migration_bitmap;
 static uint64_t migration_dirty_pages;
+static bool ram_cache_enable;
 static uint32_t last_version;
 static bool ram_bulk_stage;
 
@@ -1313,6 +1314,8 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
 return 0;
 }
 
+static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block);
+
 /* Must be called from within a rcu critical section.
  * Returns a pointer from within the RCU-protected ram_list.
  */
@@ -1330,7 +1333,20 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 return NULL;
 }
 
-return memory_region_get_ram_ptr(block->mr) + offset;
+if (ram_cache_enable) {
+/*
+* During a COLO checkpoint, we need a bitmap of these migrated pages.
+* It helps us decide which pages in the RAM cache should be flushed
+* into the VM's RAM later.
+*/
+long

[Qemu-devel] [PATCH COLO-Frame v6 05/31] migration: Integrate COLO checkpoint process into loadvm

2015-06-18 Thread zhanghailiang

Switch from the normal migration loadvm process to the COLO checkpoint
process if COLO mode is enabled.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Lai Jiangshan 
Signed-off-by: Yang Hongyang 
---
 include/migration/migration-colo.h | 13 +
 migration/colo-comm.c  | 10 ++
 migration/colo.c   | 20 
 migration/migration.c  | 26 +-
 stubs/migration-colo.c | 10 ++
 trace-events   |  1 +
 6 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index b4f75c2..b2798f7 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -15,11 +15,24 @@
 
 #include "qemu-common.h"
 #include "migration/migration.h"
+#include "block/coroutine.h"
+#include "qemu/thread.h"
 
 bool colo_supported(void);
 void colo_info_mig_init(void);
 
+struct colo_incoming {
+QEMUFile *file;
+QemuThread thread;
+};
+
 void colo_init_checkpointer(MigrationState *s);
 bool migrate_in_colo_state(void);
 
+/* loadvm */
+extern Coroutine *migration_incoming_co;
+bool loadvm_enable_colo(void);
+void loadvm_exit_colo(void);
+void *colo_process_incoming_checkpoints(void *opaque);
+bool loadvm_in_colo_state(void);
 #endif
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
index 0b76eb4..f8be027 100644
--- a/migration/colo-comm.c
+++ b/migration/colo-comm.c
@@ -45,3 +45,13 @@ void colo_info_mig_init(void)
 register_savevm_live(NULL, "colo", -1, 1,
  &savevm_colo_info_handlers, NULL);
 }
+
+bool loadvm_enable_colo(void)
+{
+return colo_requested;
+}
+
+void loadvm_exit_colo(void)
+{
+colo_requested = false;
+}
diff --git a/migration/colo.c b/migration/colo.c
index 1ff4e55..45f9efd 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -13,8 +13,10 @@
 #include "sysemu/sysemu.h"
 #include "migration/migration-colo.h"
 #include "trace.h"
+#include "qemu/error-report.h"
 
 static QEMUBH *colo_bh;
+static Coroutine *colo;
 
 bool colo_supported(void)
 {
@@ -27,6 +29,11 @@ bool migrate_in_colo_state(void)
 return (s->state == MIGRATION_STATUS_COLO);
 }
 
+bool loadvm_in_colo_state(void)
+{
+return colo != NULL;
+}
+
 static void *colo_thread(void *opaque)
 {
 MigrationState *s = opaque;
@@ -71,3 +78,16 @@ void colo_init_checkpointer(MigrationState *s)
 colo_bh = qemu_bh_new(colo_start_checkpointer, s);
 qemu_bh_schedule(colo_bh);
 }
+
+void *colo_process_incoming_checkpoints(void *opaque)
+{
+colo = qemu_coroutine_self();
+assert(colo != NULL);
+
+/* TODO: COLO checkpoint restore loop */
+
+colo = NULL;
+loadvm_exit_colo();
+
+return NULL;
+}
diff --git a/migration/migration.c b/migration/migration.c
index 0589fc8..72763b6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -135,6 +135,7 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
 }
 }
 
+Coroutine *migration_incoming_co;
 static void process_incoming_migration_co(void *opaque)
 {
 QEMUFile *f = opaque;
@@ -145,7 +146,24 @@ static void process_incoming_migration_co(void *opaque)
 
 ret = qemu_loadvm_state(f);
 
-qemu_fclose(f);
+/* we get colo info, and know if we are in colo mode */
+if (loadvm_enable_colo()) {
+struct colo_incoming *colo_in = g_malloc0(sizeof(*colo_in));
+
+colo_in->file = f;
+migration_incoming_co = qemu_coroutine_self();
+qemu_thread_create(&colo_in->thread, "colo incoming",
+ colo_process_incoming_checkpoints, colo_in, QEMU_THREAD_JOINABLE);
+qemu_coroutine_yield();
+migration_incoming_co = NULL;
+#if 0
+/* FIXME  wait checkpoint incoming thread exit, and free resource */
+qemu_thread_join(&colo_in->thread);
+g_free(colo_in);
+#endif
+} else {
+qemu_fclose(f);
+}
 free_xbzrle_decoded_buf();
 migration_incoming_state_destroy();
 
@@ -593,6 +611,12 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 return;
 }
 
+if (loadvm_in_colo_state()) {
+error_setg(errp, "Secondary VM is not allowed to do migration while "
+   "in COLO status");
+return;
+}
+
 if (runstate_check(RUN_STATE_INMIGRATE)) {
 error_setg(errp, "Guest is waiting for an incoming migration");
 return;
diff --git a/stubs/migration-colo.c b/stubs/migration-colo.c
index c0bb8d8..827ee1f 100644
--- a/stubs/migration-colo.c
+++ b/stubs/migration-colo.c
@@ -22,6 +22,16 @@ bool migrate_in_colo_state(void)
 return false;
 }
 
+bool loadvm_in_colo_state(void)
+{
+return false;
+}
+
 void colo_init_checkpointer(MigrationState *s)
 {
 }
+
+void *colo_process_incoming_checkpoints(void *opaque)
+{
+return NULL;
+}
diff --git a/trace-events b/trace-events
index ae0a460..af05a12 100644
--- a/trace-events
+++ b/t

[Qemu-devel] [PATCH COLO-Frame v6 07/31] COLO: Add a new RunState RUN_STATE_COLO

2015-06-18 Thread zhanghailiang

The guest enters this state when it is paused to save/restore VM state
during a COLO checkpoint.
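
As a rough illustration of how a new state fits into a transition table, here
is a minimal standalone sketch. It is not QEMU code: the DemoRunState enum, the
demo_transitions table and demo_transition_valid() are hypothetical names, and
the table holds only a few of the pairs this patch touches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative subset of run states; not QEMU's full RunState enum. */
typedef enum {
    DEMO_STATE_INMIGRATE,
    DEMO_STATE_PAUSED,
    DEMO_STATE_RUNNING,
    DEMO_STATE_COLO
} DemoRunState;

typedef struct {
    DemoRunState from;
    DemoRunState to;
} DemoTransition;

/* Allowed transitions, mirroring a few of the pairs added by this patch. */
static const DemoTransition demo_transitions[] = {
    { DEMO_STATE_INMIGRATE, DEMO_STATE_RUNNING },
    { DEMO_STATE_INMIGRATE, DEMO_STATE_COLO },
    { DEMO_STATE_PAUSED,    DEMO_STATE_RUNNING },
    { DEMO_STATE_PAUSED,    DEMO_STATE_COLO },
    { DEMO_STATE_RUNNING,   DEMO_STATE_PAUSED },
    { DEMO_STATE_RUNNING,   DEMO_STATE_COLO },
    { DEMO_STATE_COLO,      DEMO_STATE_RUNNING },
};

/* Return true if moving from 'from' to 'to' is listed in the table. */
static bool demo_transition_valid(DemoRunState from, DemoRunState to)
{
    size_t i;
    size_t n = sizeof(demo_transitions) / sizeof(demo_transitions[0]);

    for (i = 0; i < n; i++) {
        if (demo_transitions[i].from == from &&
            demo_transitions[i].to == to) {
            return true;
        }
    }
    return false;
}
```

In this sketch the only way out of the COLO state is back to running, which
matches the single { RUN_STATE_COLO, RUN_STATE_RUNNING } entry in the patch.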

Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Lai Jiangshan 
Reviewed-by: Dr. David Alan Gilbert 
---
 qapi-schema.json | 5 -
 vl.c | 8 
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 993a3be..88c9fd6 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -148,12 +148,15 @@
 # @watchdog: the watchdog action is configured to pause and has been triggered
 #
 # @guest-panicked: guest has been panicked as a result of guest OS panic
+#
+# @colo: guest is paused to save/restore VM state under colo checkpoint (since
+# 2.4)
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
 'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
-'guest-panicked' ] }
+'guest-panicked', 'colo' ] }
 
 ##
 # @StatusInfo:
diff --git a/vl.c b/vl.c
index 988567a..8f81062 100644
--- a/vl.c
+++ b/vl.c
@@ -572,6 +572,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 
 { RUN_STATE_INMIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_INMIGRATE, RUN_STATE_PAUSED },
+{ RUN_STATE_INMIGRATE, RUN_STATE_COLO },
 
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
@@ -581,6 +582,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 
 { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
 { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_PAUSED, RUN_STATE_COLO},
 
 { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -591,9 +593,12 @@ static const RunStateTransition runstate_transitions_def[] = {
 
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
+{ RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO},
 
 { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
 
+{ RUN_STATE_COLO, RUN_STATE_RUNNING },
+
 { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
 { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
 { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR },
@@ -604,6 +609,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
 { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
 { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
+{ RUN_STATE_RUNNING, RUN_STATE_COLO},
 
 { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
 
@@ -614,9 +620,11 @@ static const RunStateTransition runstate_transitions_def[] = {
 { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
 { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
 { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_SUSPENDED, RUN_STATE_COLO},
 
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_WATCHDOG, RUN_STATE_COLO},
 
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-- 
1.7.12.4

[Qemu-devel] [PATCH COLO-Frame v6 13/31] COLO RAM: Flush cached RAM into SVM's memory

2015-06-18 Thread zhanghailiang

While the VMs are running, both the PVM and the SVM may dirty pages. At the
next checkpoint, the PVM's dirty pages are transferred to the SVM and stored
in the SVM's RAM cache, so after a checkpoint the content of the SVM's RAM
cache is always the same as the PVM's memory.

Instead of flushing the entire content of the RAM cache into the SVM's memory,
we do this in a more efficient way:
only pages dirtied by the PVM or the SVM since the last checkpoint are flushed.
This ensures that the SVM's memory is the same as the PVM's.

In addition, the RAM cache must be flushed before the device state is loaded.
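
The selective flush described above can be sketched in isolation as follows.
This is a simplified model, not the code in this patch: it assumes two plain
bitmaps (one for pages received into the cache, one for pages the SVM dirtied
itself) and a flat RAM array, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_BITS_PER_WORD (8u * sizeof(unsigned long))

/* Return 1 if bit 'nr' is set in 'map', else 0. */
static int demo_test_bit(const unsigned long *map, size_t nr)
{
    return (int)((map[nr / DEMO_BITS_PER_WORD] >>
                  (nr % DEMO_BITS_PER_WORD)) & 1ul);
}

/* Clear bit 'nr' in 'map'. */
static void demo_clear_bit(unsigned long *map, size_t nr)
{
    map[nr / DEMO_BITS_PER_WORD] &= ~(1ul << (nr % DEMO_BITS_PER_WORD));
}

/*
 * Copy every page that is dirty in either bitmap from 'cache' to 'ram'
 * and clear its bits. Clean pages are skipped entirely.
 * Returns the number of pages copied.
 */
static size_t demo_flush_ram_cache(uint8_t *ram, const uint8_t *cache,
                                   unsigned long *cache_dirty,
                                   unsigned long *ram_dirty,
                                   size_t nr_pages)
{
    size_t page, flushed = 0;

    for (page = 0; page < nr_pages; page++) {
        int dirty = demo_test_bit(cache_dirty, page) |
                    demo_test_bit(ram_dirty, page);
        if (!dirty) {
            continue;
        }
        memcpy(ram + page * DEMO_PAGE_SIZE,
               cache + page * DEMO_PAGE_SIZE, DEMO_PAGE_SIZE);
        demo_clear_bit(cache_dirty, page);
        demo_clear_bit(ram_dirty, page);
        flushed++;
    }
    return flushed;
}
```

A page dirtied only by the SVM is still overwritten from the cache, because
the cache holds the PVM's view of that page and the SVM's change must be
discarded at the checkpoint.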

Signed-off-by: zhanghailiang 
Signed-off-by: Lai Jiangshan 
Signed-off-by: Li Zhijian 
Signed-off-by: Yang Hongyang 
Signed-off-by: Gonglei 
---
 include/migration/migration-colo.h |  1 +
 migration/colo.c   |  2 -
 migration/ram.c| 92 ++
 3 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index 2110182..c03c391 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -37,5 +37,6 @@ void *colo_process_incoming_checkpoints(void *opaque);
 bool loadvm_in_colo_state(void);
 /* ram cache */
 int create_and_init_ram_cache(void);
+void colo_flush_ram_cache(void);
 void release_ram_cache(void);
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index c82feb5..07f677a 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -414,8 +414,6 @@ void *colo_process_incoming_checkpoints(void *opaque)
 }
 qemu_mutex_unlock_iothread();
 
-/* TODO: flush vm state */
-
 ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
 if (ret < 0) {
 goto out;
diff --git a/migration/ram.c b/migration/ram.c
index 8c9edf0..e677162 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1482,6 +1482,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 int flags = 0, ret = 0;
 static uint64_t seq_iter;
 int len = 0;
+bool need_flush = false;
 
 seq_iter++;
 
@@ -1548,6 +1549,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 ret = -EINVAL;
 break;
 }
+
+need_flush = true;
 ch = qemu_get_byte(f);
 ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
 break;
@@ -1558,6 +1561,8 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 ret = -EINVAL;
 break;
 }
+
+need_flush = true;
 qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
 break;
 case RAM_SAVE_FLAG_COMPRESS_PAGE:
@@ -1590,6 +1595,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 ret = -EINVAL;
 break;
 }
+need_flush = true;
 break;
 case RAM_SAVE_FLAG_EOS:
 /* normal exit */
@@ -1609,6 +1615,11 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 }
 
 rcu_read_unlock();
+
+if (!ret && ram_cache_enable && need_flush) {
+DPRINTF("Flush ram_cache\n");
+colo_flush_ram_cache();
+}
 DPRINTF("Completed load of VM with exit code %d seq iteration "
 "%" PRIu64 "\n", ret, seq_iter);
 return ret;
@@ -1694,6 +1705,87 @@ static void *memory_region_get_ram_cache_ptr(MemoryRegion *mr, RAMBlock *block)
 return block->host_cache + (addr - block->offset);
 }
 
+/* fix me: should this helper function be merged with
+ * migration_bitmap_find_and_reset_dirty ?
+ */
+static inline
+ram_addr_t host_bitmap_find_and_reset_dirty(MemoryRegion *mr,
+ram_addr_t start)
+{
+unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
+unsigned long nr = base + (start >> TARGET_PAGE_BITS);
+uint64_t mr_size = TARGET_PAGE_ALIGN(memory_region_size(mr));
+unsigned long size = base + (mr_size >> TARGET_PAGE_BITS);
+
+unsigned long next;
+
+next = find_next_bit(ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION],
+ size, nr);
+if (next < size) {
+clear_bit(next, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
+}
+return (next - base) << TARGET_PAGE_BITS;
+}
+
+/*
+* Flush content of RAM cache into SVM's memory.
+* Only flush the pages that were dirtied by the PVM, the SVM, or both.
+*/
+void colo_flush_ram_cache(void)
+{
+RAMBlock *block = NULL;
+void *dst_host;
+void *src_host;
+ram_addr_t ca  = 0, ha = 0;
+bool got_ca = 0, got_ha = 0;
+int64_t host_dirty = 0, both_dirty = 0;
+
+address_space_sync_dirty_bitmap(&address_space_memory);
+rcu_read_lock();
+block = QLIST_FIRST_RCU(&ram_list.blocks);
+while (true) {
+if (ca < block->used_length && ca <= ha) {
+ca = migration_bitmap_find_and_reset_dirty(block->mr, ca);
+if (ca < block->used_length) {
+go

[Qemu-devel] [PATCH COLO-Frame v6 15/31] COLO failover: Implement COLO primary/secondary vm failover work

2015-06-18 Thread zhanghailiang

If an error occurs, we give users (administrators) time to get involved in
the failover verdict: they can decide which side should take over the work
by using the 'colo_lost_heartbeat' command.

Note: The default verdict is that the primary VM takes over the work while the
secondary VM exits. So if users choose the secondary VM to take over, they must
make sure that the primary VM is dead; otherwise there will be a 'split-brain'
problem.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Lai Jiangshan 
---
 include/migration/migration-colo.h |   4 +
 include/migration/migration-failover.h |   2 +
 migration/colo-failover.c  |  12 ++-
 migration/colo.c   | 132 -
 trace-events   |   1 +
 5 files changed, 147 insertions(+), 4 deletions(-)

diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index f9c09f3..ea10374 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -43,4 +43,8 @@ int get_colo_mode(void);
 int create_and_init_ram_cache(void);
 void colo_flush_ram_cache(void);
 void release_ram_cache(void);
+
+/* failover */
+void colo_do_failover(MigrationState *s);
+
 #endif
diff --git a/include/migration/migration-failover.h b/include/migration/migration-failover.h
index a8767fc..5e59b1d 100644
--- a/include/migration/migration-failover.h
+++ b/include/migration/migration-failover.h
@@ -16,5 +16,7 @@
 #include "qemu-common.h"
 
 void failover_request_set(void);
+void failover_request_clear(void);
+bool failover_request_is_set(void);
 
 #endif
diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index 0f2f81f..d8f740c 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -23,7 +23,7 @@ static void colo_failover_bh(void *opaque)
 {
 qemu_bh_delete(failover_bh);
 failover_bh = NULL;
-/*TODO: Do failover work */
+colo_do_failover(NULL);
 }
 
 void failover_request_set(void)
@@ -33,6 +33,16 @@ void failover_request_set(void)
 qemu_bh_schedule(failover_bh);
 }
 
+void failover_request_clear(void)
+{
+failover_request = false;
+}
+
+bool failover_request_is_set(void)
+{
+return failover_request;
+}
+
 void qmp_colo_lost_heartbeat(Error **errp)
 {
 if (get_colo_mode() == COLO_MODE_UNKNOWN) {
diff --git a/migration/colo.c b/migration/colo.c
index cc3d321..3ecaec8 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -73,6 +73,67 @@ bool loadvm_in_colo_state(void)
 return colo != NULL;
 }
 
+static bool colo_runstate_is_stopped(void)
+{
+return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
+}
+
+/*
+ * There are two ways to enter this function:
+ * 1. From the COLO checkpoint incoming thread. In this case
+ * we should protect it with the iothread lock.
+ * 2. From a user command. Because hmp/qmp commands are
+ * handled in the main loop, taking the iothread lock there
+ * would cause a deadlock.
+ */
+static void secondary_vm_do_failover(void)
+{
+colo = NULL;
+
+if (!autostart) {
+error_report("\"-S\" qemu option will be ignored on the secondary side");
+/* recover runstate to normal migration finish state */
+autostart = true;
+}
+
+/* For Secondary VM, jump to incoming co */
+if (migration_incoming_co) {
+qemu_coroutine_enter(migration_incoming_co, NULL);
+}
+}
+
+static void primary_vm_do_failover(void)
+{
+MigrationState *s = migrate_get_current();
+
+if (!colo_runstate_is_stopped()) {
+vm_stop_force_state(RUN_STATE_COLO);
+}
+
+if (s->state != MIGRATION_STATUS_FAILED) {
+migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
+}
+
+vm_start();
+}
+
+static bool failover_completed;
+void colo_do_failover(MigrationState *s)
+{
+/* Make sure the VM is stopped during failover */
+if (!colo_runstate_is_stopped()) {
+vm_stop_force_state(RUN_STATE_COLO);
+}
+
+trace_colo_do_failover();
+if (get_colo_mode() == COLO_MODE_SECONDARY) {
+secondary_vm_do_failover();
+} else {
+primary_vm_do_failover();
+}
+failover_completed = true;
+}
+
 /* colo checkpoint control helper */
 static int colo_ctl_put(QEMUFile *f, uint64_t request)
 {
@@ -144,11 +205,23 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
 goto out;
 }
 
+if (failover_request_is_set()) {
+ret = -1;
+goto out;
+}
 /* suspend and save vm state to colo buffer */
 qemu_mutex_lock_iothread();
 vm_stop_force_state(RUN_STATE_COLO);
 qemu_mutex_unlock_iothread();
 trace_colo_vm_state_change("run", "stop");
+/*
+ * failover request bh could be called after
+ * vm_stop_force_state so we check failover_request_is_set() again.
+ */
+if (failover_request_is_set()) {
+ret = -1;
+goto out;
+}
 
 /* Disable block migration */
 s->params.blk = 0;
@@ -209,7 +282,7 @@ stat

[Qemu-devel] [PATCH COLO-Frame v6 11/31] COLO VMstate: Load VM state into qsb before restore it

2015-06-18 Thread zhanghailiang

We should not destroy the state of the secondary until we have received the
whole state from the primary, in case the primary fails in the middle of
sending it. So we cache the device state on the secondary before restoring it.

Besides, we should call qemu_system_reset() before loading the VM state,
which ensures that the data is intact.
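
The "stage first, apply only when complete" idea can be sketched as below.
This is a simplified standalone model and not the qsb-based code in this
patch: DemoSecondary, demo_receive_chunk() and demo_commit_checkpoint() are
hypothetical names, and the memset() merely stands in for the reset step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_STATE_MAX 64

typedef struct {
    uint8_t live[DEMO_STATE_MAX];   /* state the secondary is running with */
    uint8_t staged[DEMO_STATE_MAX]; /* checkpoint data received so far */
    size_t staged_len;
} DemoSecondary;

/* Append a received chunk to the staging buffer; 'live' is never touched. */
static bool demo_receive_chunk(DemoSecondary *s, const uint8_t *data,
                               size_t len)
{
    if (len > DEMO_STATE_MAX - s->staged_len) {
        return false; /* chunk would overflow the staging buffer */
    }
    memcpy(s->staged + s->staged_len, data, len);
    s->staged_len += len;
    return true;
}

/*
 * Apply the staged data only if exactly 'total_size' bytes arrived.
 * The staging buffer is emptied in both cases.
 */
static bool demo_commit_checkpoint(DemoSecondary *s, size_t total_size)
{
    bool complete = (s->staged_len == total_size);

    if (complete) {
        /* Reset to initial values first, then load the new state. */
        memset(s->live, 0, sizeof(s->live));
        memcpy(s->live, s->staged, total_size);
    }
    s->staged_len = 0;
    return complete;
}
```

If the sender dies after a partial transfer, the commit is refused and the
live state stays as it was, which is the property the commit message asks for.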

Note: If we drop the qemu_system_reset() call, some odd errors occur.
For example, qemu on the slave side crashes and reports:

KVM: entry failed, hardware error 0x7
EAX= EBX=e000 ECX=9578 EDX=434f
ESI=fc10 EDI=434f EBP= ESP=1fca
EIP=9594 EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0040 0400  9300
CS =f000 000f  9b00
SS =434f 000434f0  9300
DS =434f 000434f0  9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00
GDT= 0002dcc8 0047
IDT=  
CR0=0010 CR2= CR3= CR4=
DR0= DR1= DR2= DR3=
DR6=0ff0 DR7=0400
EFER=
Code=c0 74 0f 66 b9 78 95 00 00 66 31 d2 66 31 c0 e9 47 e0 fb 90  90 fa fc 66 c3 66 53 66 89 c3 66 e8 9d e8 ff ff 66 01 c3 66 89 d8 66 e8 40 e9 ff ff 66
ERROR: invalid runstate transition: 'internal-error' -> 'colo'

The reason is that some device state is skipped when saving device state to
the slave if the corresponding data still has its initial value, such as 0.
But the device state on the slave may already hold a non-initial value, so
after a checkpoint loop the device state values become inconsistent.
This happens when the PVM reboots or when the SVM runs ahead of the PVM during
the startup process.
Signed-off-by: zhanghailiang 
Signed-off-by: Yang Hongyang 
Signed-off-by: Gonglei 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/colo.c | 53 ++---
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 439a6fa..c82feb5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -315,8 +315,10 @@ void *colo_process_incoming_checkpoints(void *opaque)
 struct colo_incoming *colo_in = opaque;
 QEMUFile *f = colo_in->file;
 int fd = qemu_get_fd(f);
-QEMUFile *ctl = NULL;
+QEMUFile *ctl = NULL, *fb = NULL;
 int ret;
+uint64_t total_size;
+
 colo = qemu_coroutine_self();
 assert(colo != NULL);
 
@@ -331,10 +333,17 @@ void *colo_process_incoming_checkpoints(void *opaque)
 goto out;
 }
 
+colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+if (colo_buffer == NULL) {
+error_report("Failed to allocate colo buffer!");
+goto out;
+}
+
 ret = colo_ctl_put(ctl, COLO_CHECPOINT_READY);
 if (ret < 0) {
 goto out;
 }
+
 qemu_mutex_lock_iothread();
 /* in COLO mode, slave is runing, so start the vm */
 vm_start();
@@ -370,7 +379,18 @@ void *colo_process_incoming_checkpoints(void *opaque)
 }
 trace_colo_receive_message("COLO_CHECKPOINT_SEND");
 
-/*TODO Load VM state */
+/* read the VM state total size first */
+ret = colo_ctl_get_value(f, &total_size);
+if (ret < 0) {
+goto out;
+}
+
+/* read vm device state into colo buffer */
+ret = qsb_fill_buffer(colo_buffer, f, total_size);
+if (ret != total_size) {
+error_report("can't get all migration data");
+goto out;
+}
 
 ret = colo_ctl_put(ctl, COLO_CHECKPOINT_RECEIVED);
 if (ret < 0) {
@@ -378,6 +398,22 @@ void *colo_process_incoming_checkpoints(void *opaque)
 }
 trace_colo_receive_message("COLO_CHECKPOINT_RECEIVED");
 
+/* open colo buffer for read */
+fb = qemu_bufopen("r", colo_buffer);
+if (!fb) {
+error_report("can't open colo buffer for read");
+goto out;
+}
+
+qemu_mutex_lock_iothread();
+qemu_system_reset(VMRESET_SILENT);
+if (qemu_loadvm_state(fb) < 0) {
+error_report("COLO: loadvm failed");
+qemu_mutex_unlock_iothread();
+goto out;
+}
+qemu_mutex_unlock_iothread();
+
 /* TODO: flush vm state */
 
 ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
@@ -390,14 +426,25 @@ void *colo_process_incoming_checkpoints(void *opaque)
 vm_start();
 qemu_mutex_unlock_iothread();
 trace_colo_vm_state_change("stop", "start");
-}
+
+qemu_fclose(fb);
+fb = NULL;
+}
 
 out:
 colo = NULL;
+
+if (fb) {
+qemu_fclose(fb);
+}
+
 release_ram_cache();
 if (ctl) {
 qemu_fclose(ctl);
 }
+
+qsb_free(colo_buffer);
+
 loadvm_exit_colo();
 
 return NULL;
-- 
1.7.12.4

[Qemu-devel] [PATCH COLO-Frame v6 14/31] COLO failover: Introduce a new command to trigger a failover

2015-06-18 Thread zhanghailiang

We leave it to users to choose whatever heartbeat solution they want. If the
heartbeat is lost, or they detect other errors, they can use the
'colo_lost_heartbeat' command to tell COLO to do failover, and COLO will act
accordingly.

For example,
if the command is sent to the PVM, the Primary exits COLO mode and takes over;
if it is sent to the Secondary, the Secondary does the failover work and
finally takes over the service.
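
The per-side behaviour described above can be modelled with a small
standalone sketch. It is not the implementation in this series: DemoColoNode,
DemoFailoverResult and demo_lost_heartbeat() are hypothetical names, and
"leaving COLO mode" is modelled simply by switching the mode to unknown.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    DEMO_COLO_MODE_UNKNOWN,   /* not running in COLO mode */
    DEMO_COLO_MODE_PRIMARY,
    DEMO_COLO_MODE_SECONDARY
} DemoColoMode;

typedef enum {
    DEMO_FAILOVER_REJECTED,            /* not in COLO mode, nothing to do */
    DEMO_FAILOVER_PRIMARY_TAKEOVER,    /* primary continues alone */
    DEMO_FAILOVER_SECONDARY_TAKEOVER   /* secondary becomes the server */
} DemoFailoverResult;

typedef struct {
    DemoColoMode mode;
    bool failover_requested;
} DemoColoNode;

/* Handle a "heartbeat lost" notification according to the node's role. */
static DemoFailoverResult demo_lost_heartbeat(DemoColoNode *node)
{
    DemoFailoverResult result;

    switch (node->mode) {
    case DEMO_COLO_MODE_PRIMARY:
        result = DEMO_FAILOVER_PRIMARY_TAKEOVER;
        break;
    case DEMO_COLO_MODE_SECONDARY:
        result = DEMO_FAILOVER_SECONDARY_TAKEOVER;
        break;
    default:
        return DEMO_FAILOVER_REJECTED;
    }

    node->failover_requested = true;
    node->mode = DEMO_COLO_MODE_UNKNOWN; /* this side leaves COLO mode */
    return result;
}
```

A second notification on the same node is rejected in this model, because the
node has already left COLO mode.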

Cc: Luiz Capitulino 
Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Lai Jiangshan 
Signed-off-by: Yang Hongyang 
---
 hmp-commands.hx| 15 
 hmp.c  |  8 +++
 hmp.h  |  1 +
 include/migration/migration-colo.h |  4 
 include/migration/migration-failover.h | 20 
 migration/Makefile.objs|  2 +-
 migration/colo-comm.c  | 11 +
 migration/colo-failover.c  | 43 ++
 migration/colo.c   |  1 +
 qapi-schema.json   | 25 
 qmp-commands.hx| 19 +++
 stubs/migration-colo.c |  8 +++
 12 files changed, 156 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/migration-failover.h
 create mode 100644 migration/colo-failover.c

diff --git a/hmp-commands.hx b/hmp-commands.hx
index d3b7932..ed487a6 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1006,6 +1006,21 @@ Set the parameter @var{parameter} for migration.
 ETEXI
 
 {
+.name   = "colo_lost_heartbeat",
+.args_type  = "",
+.params = "",
+.help   = "Tell COLO that heartbeat is lost,\n\t\t\t"
+  "a failover or takeover is needed.",
+.mhandler.cmd = hmp_colo_lost_heartbeat,
+},
+
+STEXI
+@item colo_lost_heartbeat
+@findex colo_lost_heartbeat
+Tell COLO that heartbeat is lost, a failover or takeover is needed.
+ETEXI
+
+{
 .name   = "client_migrate_info",
 .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
 .params = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index 23abc7d..8e25d5a 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1271,6 +1271,14 @@ void hmp_client_migrate_info(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &err);
 }
 
+void hmp_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
+{
+Error *err = NULL;
+
+qmp_colo_lost_heartbeat(&err);
+hmp_handle_error(mon, &err);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
 const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index 0cf4f2a..c36c99c 100644
--- a/hmp.h
+++ b/hmp.h
@@ -68,6 +68,7 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
 void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
+void hmp_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/migration-colo.h b/include/migration/migration-colo.h
index c03c391..f9c09f3 100644
--- a/include/migration/migration-colo.h
+++ b/include/migration/migration-colo.h
@@ -17,6 +17,7 @@
 #include "migration/migration.h"
 #include "block/coroutine.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 
 bool colo_supported(void);
 void colo_info_mig_init(void);
@@ -35,6 +36,9 @@ bool loadvm_enable_colo(void);
 void loadvm_exit_colo(void);
 void *colo_process_incoming_checkpoints(void *opaque);
 bool loadvm_in_colo_state(void);
+
+int get_colo_mode(void);
+
 /* ram cache */
 int create_and_init_ram_cache(void);
 void colo_flush_ram_cache(void);
diff --git a/include/migration/migration-failover.h b/include/migration/migration-failover.h
new file mode 100644
index 000..a8767fc
--- /dev/null
+++ b/include/migration/migration-failover.h
@@ -0,0 +1,20 @@
+/*
+ *  COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ *  (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO.,LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef MIGRATION_FAILOVER_H
+#define MIGRATION_FAILOVER_H
+
+#include "qemu-common.h"
+
+void failover_request_set(void);
+
+#endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index cb7bd30..50d8392 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,6 +1,6 @@
 common-obj-y += migration.

[Qemu-devel] [PATCH COLO-Frame v6 09/31] COLO: Save VM state to slave when do checkpoint

2015-06-18 Thread zhanghailiang

We should save the PVM's RAM/device state to the slave when needed.

The slave caches the VM state in a QEMUSizedBuffer (qsb), so it needs to know
the size of the VM state data. Therefore, on the master, we first store the
VM state temporarily in a qsb, and then migrate the data to the slave.
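
To illustrate why the master stages the state before sending it, here is a
minimal sketch of size-prefixed framing. It is an assumption-laden model, not
the code in this patch: the 8-byte big-endian size header and the demo_*
helpers are hypothetical and only show that the size must be known before the
first payload byte goes out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write 'value' as 8 big-endian bytes into 'out'. */
static void demo_put_be64(uint8_t *out, uint64_t value)
{
    int i;

    for (i = 0; i < 8; i++) {
        out[i] = (uint8_t)(value >> (56 - 8 * i));
    }
}

/*
 * Frame an already-staged state blob as [8-byte size][payload].
 * Returns the number of bytes written to 'wire', or 0 if it does not fit.
 */
static size_t demo_frame_state(uint8_t *wire, size_t wire_cap,
                               const uint8_t *staged, size_t staged_len)
{
    if (wire_cap < 8 || staged_len > wire_cap - 8) {
        return 0;
    }
    demo_put_be64(wire, (uint64_t)staged_len);
    memcpy(wire + 8, staged, staged_len);
    return 8 + staged_len;
}
```

Because the header carries the total length, the receiver can tell a complete
checkpoint from a truncated one before it touches any live state.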

Signed-off-by: zhanghailiang 
Signed-off-by: Yang Hongyang 
Signed-off-by: Gonglei 
Signed-off-by: Lai Jiangshan 
Signed-off-by: Li Zhijian 
---
 migration/colo.c   | 61 +++---
 migration/ram.c| 48 --
 migration/savevm.c |  2 +-
 3 files changed, 96 insertions(+), 15 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 0f7c36b..7404507 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -52,6 +52,9 @@ enum {
 
 static QEMUBH *colo_bh;
 static Coroutine *colo;
+/* colo buffer */
+#define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
+QEMUSizedBuffer *colo_buffer;
 
 bool colo_supported(void)
 {
@@ -120,6 +123,8 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require)
 static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
 {
 int ret;
+size_t size;
+QEMUFile *trans = NULL;
 
 ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
 if (ret < 0) {
@@ -130,15 +135,47 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
 if (ret < 0) {
 goto out;
 }
+/* Reset colo buffer and open it for write */
+qsb_set_length(colo_buffer, 0);
+trans = qemu_bufopen("w", colo_buffer);
+if (!trans) {
+error_report("Open colo buffer for write failed");
+goto out;
+}
+
+/* suspend and save vm state to colo buffer */
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("run", "stop");
+
+/* Disable block migration */
+s->params.blk = 0;
+s->params.shared = 0;
+qemu_savevm_state_begin(trans, &s->params);
+qemu_mutex_lock_iothread();
+qemu_savevm_state_complete(trans);
+qemu_mutex_unlock_iothread();
 
-/* TODO: suspend and save vm state to colo buffer */
+qemu_fflush(trans);
 
 ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
 if (ret < 0) {
 goto out;
 }
+/* we send the total size of the vmstate first */
+size = qsb_get_length(colo_buffer);
+ret = colo_ctl_put(s->file, size);
+if (ret < 0) {
+goto out;
+}
 
-/* TODO: send vmstate to Secondary */
+qsb_put_buffer(s->file, colo_buffer, size);
+qemu_fflush(s->file);
+ret = qemu_file_get_error(s->file);
+if (ret < 0) {
+goto out;
+}
 
 ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED);
 if (ret < 0) {
@@ -152,9 +189,18 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
 }
 trace_colo_receive_message("COLO_CHECKPOINT_LOADED");
 
-/* TODO: resume Primary */
+ret = 0;
+/* resume master */
+qemu_mutex_lock_iothread();
+vm_start();
+qemu_mutex_unlock_iothread();
+trace_colo_vm_state_change("stop", "run");
 
 out:
+if (trans) {
+qemu_fclose(trans);
+}
+
 return ret;
 }
 
@@ -180,6 +226,12 @@ static void *colo_thread(void *opaque)
 }
 trace_colo_receive_message("COLO_CHECPOINT_READY");
 
+colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE);
+if (colo_buffer == NULL) {
+error_report("Failed to allocate colo buffer!");
+goto out;
+}
+
 qemu_mutex_lock_iothread();
 vm_start();
 qemu_mutex_unlock_iothread();
@@ -195,6 +247,9 @@ static void *colo_thread(void *opaque)
 out:
 migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
 
+qsb_free(colo_buffer);
+colo_buffer = NULL;
+
 if (colo_control) {
 qemu_fclose(colo_control);
 }
diff --git a/migration/ram.c b/migration/ram.c
index 57368e1..bc362f0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -38,6 +38,7 @@
 #include "trace.h"
 #include "exec/ram_addr.h"
 #include "qemu/rcu_queue.h"
+#include "migration/migration-colo.h"
 
 #ifdef DEBUG_MIGRATION_RAM
 #define DPRINTF(fmt, ...) \
@@ -1051,16 +1052,8 @@ static void reset_ram_globals(void)
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
-
-/* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
- * long-running RCU critical section.  When rcu-reclaims in the code
- * start to become numerous it will be necessary to reduce the
- * granularity of these critical sections.
- */
-
-static int ram_save_setup(QEMUFile *f, void *opaque)
+static int ram_save_init_globals(void)
 {
-RAMBlock *block;
 int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */
 
 mig_throttle_on = false;
@@ -1119,6 +1112,31 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 migration_bitmap_sync();
 qemu_mutex_unlock_ramlist();
 qemu_mutex_unlock_iothread();
+r

[Qemu-devel] [PATCH COLO-Frame v6 12/31] arch_init: Start to trace dirty pages of SVM

2015-06-18 Thread zhanghailiang

We will use this dirty bitmap together with the dirty bitmap of the VM's RAM
cache to decide which pages in the cache should be flushed into the VM's RAM.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
 migration/ram.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index ded13f8..8c9edf0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1621,6 +1621,7 @@ static int ram_load(QEMUFile *f, void *opaque, int 
version_id)
 int create_and_init_ram_cache(void)
 {
 RAMBlock *block;
+int64_t ram_cache_pages = last_ram_offset() >> TARGET_PAGE_BITS;
 
 rcu_read_lock();
 QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
@@ -1632,6 +1633,15 @@ int create_and_init_ram_cache(void)
 }
 rcu_read_unlock();
 ram_cache_enable = true;
+/*
+* Start dirty log for Secondary VM, we use this dirty bitmap together with
+* VM's cache RAM dirty bitmap to decide which page in cache should be
+* flushed into VM's RAM.
+*/
+migration_bitmap = bitmap_new(ram_cache_pages);
+migration_dirty_pages = 0;
+memory_global_dirty_log_start();
+
 return 0;
 
 out_locked:
@@ -1652,6 +1662,12 @@ void release_ram_cache(void)
 
 ram_cache_enable = false;
 
+if (migration_bitmap) {
+memory_global_dirty_log_stop();
+g_free(migration_bitmap);
+migration_bitmap = NULL;
+}
+
 rcu_read_lock();
 QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
 if (block->host_cache) {
-- 
1.7.12.4

[Qemu-devel] [PATCH COLO-Frame v6 02/31] migration: Introduce capability 'colo' to migration

2015-06-18 Thread zhanghailiang

Add a helper function, colo_supported(), that indicates whether COLO is
supported. We use it to control whether the 'colo' capability is shown
to users. They can use the QMP command 'query-migrate-capabilities' or
the HMP command 'info migrate_capabilities' to learn whether COLO is
supported.
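
The gating described above can be sketched in isolation as follows. This
is a simplified stand-in, not the QEMU code: the enum, the colo_built_in
flag and both functions are hypothetical names that only mirror the shape
of the query and set paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability list; the real one lives in the QAPI schema. */
enum cap { CAP_XBZRLE, CAP_COLO, CAP_MAX };

/* Stand-in for the build-time decision behind colo_supported(). */
static bool colo_built_in = false;

static bool cap_supported(enum cap c)
{
    return c != CAP_COLO || colo_built_in;
}

/* Query path: unsupported capabilities are simply not listed. */
static size_t list_caps(enum cap *out)
{
    size_t n = 0;
    int c;

    assert(out != NULL);
    for (c = 0; c < CAP_MAX; c++) {
        if (!cap_supported((enum cap)c)) {
            continue;
        }
        out[n++] = (enum cap)c;
    }
    return n;
}

/* Set path: enabling an unsupported capability is rejected with -1. */
static int set_cap(bool *enabled, enum cap c, bool state)
{
    assert(enabled != NULL);
    if (state && !cap_supported(c)) {
        return -1;
    }
    enabled[c] = state;
    return 0;
}
```

Hiding the capability in the query path and rejecting it in the set path
keeps the two views consistent for management tools.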

Cc: Juan Quintela 
Cc: Amit Shah 
Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Yang Hongyang 
Signed-off-by: Gonglei 
Signed-off-by: Lai Jiangshan 
---
 include/migration/migration-colo.h | 20 
 include/migration/migration.h  |  1 +
 migration/Makefile.objs|  1 +
 migration/colo.c   | 18 ++
 migration/migration.c  | 17 +
 qapi-schema.json   |  5 -
 stubs/Makefile.objs|  1 +
 stubs/migration-colo.c | 18 ++
 8 files changed, 80 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/migration-colo.h
 create mode 100644 migration/colo.c
 create mode 100644 stubs/migration-colo.c

diff --git a/include/migration/migration-colo.h 
b/include/migration/migration-colo.h
new file mode 100644
index 000..c6d0c51
--- /dev/null
+++ b/include/migration/migration-colo.h
@@ -0,0 +1,20 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_MIGRATION_COLO_H
+#define QEMU_MIGRATION_COLO_H
+
+#include "qemu-common.h"
+
+bool colo_supported(void);
+
+#endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 9387c8c..225e9e6 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -169,6 +169,7 @@ int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t 
*dst, int dlen);
 
 int migrate_use_xbzrle(void);
 int64_t migrate_xbzrle_cache_size(void);
+bool migrate_enable_colo(void);
 
 int64_t xbzrle_cache_resize(int64_t new_size);
 
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index d929e96..5a25d39 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,4 +1,5 @@
 common-obj-y += migration.o tcp.o
+common-obj-$(CONFIG_COLO) += colo.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += xbzrle.o
diff --git a/migration/colo.c b/migration/colo.c
new file mode 100644
index 000..bcd753b
--- /dev/null
+++ b/migration/colo.c
@@ -0,0 +1,18 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "migration/migration-colo.h"
+
+bool colo_supported(void)
+{
+return true;
+}
diff --git a/migration/migration.c b/migration/migration.c
index b04b457..b31ce94 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -25,6 +25,7 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "migration/migration-colo.h"
 
 #define MAX_THROTTLE  (32 << 20)  /* Migration speed throttling */
 
@@ -202,6 +203,9 @@ MigrationCapabilityStatusList 
*qmp_query_migrate_capabilities(Error **errp)
 
 caps = NULL; /* silence compiler warning */
 for (i = 0; i < MIGRATION_CAPABILITY_MAX; i++) {
+if (i == MIGRATION_CAPABILITY_COLO && !colo_supported()) {
+continue;
+}
 if (head == NULL) {
 head = g_malloc0(sizeof(*caps));
 caps = head;
@@ -342,6 +346,13 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 }
 
 for (cap = params; cap; cap = cap->next) {
+if (cap->value->capability == MIGRATION_CAPABILITY_COLO &&
+cap->value->state && !colo_supported()) {
+error_setg(errp, "COLO is not currently supported, please"
+ " configure with --enable-colo option in order to"
+ " support COLO feature");
+continue;
+}
 s->enabled_capabilities[cap->value->capability] = cap->value->state;
 }
 }
@@ -756,6 +767,12 @@ int64_t migrate_xbzrle_cache_size(void)
 return s->xbzrle_cache_size;
 }
 
+bool migrate_enable_colo(void)
+{
+MigrationState *s = migrate_get_current();
+return s->enabled_capabilities[MIGRATION_CAPABILITY_COLO];
+}
+
 /* migration thread support */
 
 static void *migration_thread(void *opaque)
diff --git a/q

[Qemu-devel] [PATCH COLO-Frame v6 24/31] COLO: Handle nfnetlink message from proxy module

2015-06-18 Thread zhanghailiang

The proxy module sends messages to QEMU through nfnetlink.
For now, a message only contains the result of the packet comparison.

We use a global variable, 'packet_compare_different', to store the result.
This variable must be accessed with the atomic helpers, such as
'atomic_set' and 'atomic_xchg'.
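
A minimal sketch of that access pattern, written with C11 <stdatomic.h>
rather than QEMU's own atomic_set()/atomic_xchg() macros so that it
stands alone. The variable and function names are hypothetical, and the
"read and clear" behaviour of the consumer is an assumption of this
sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for 'packet_compare_different'. */
static _Atomic int32_t packet_result;

/* Writer side: called from the netlink receive callback. */
static void publish_compare_result(_Atomic int32_t *flag, int32_t result)
{
    assert(flag != NULL);
    atomic_store(flag, result);
}

/*
 * Reader side: fetch the result and reset it to 0 in one atomic step,
 * so that a given notification is consumed only once.
 */
static int32_t consume_compare_result(_Atomic int32_t *flag)
{
    assert(flag != NULL);
    return atomic_exchange(flag, 0);
}
```

The exchange avoids the window that a separate read followed by a write
would leave open between the two threads.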

Cc: Stefan Hajnoczi 
Cc: Jason Wang 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
 net/colo-nic.c | 42 ++
 trace-events   |  1 +
 2 files changed, 43 insertions(+)

diff --git a/net/colo-nic.c b/net/colo-nic.c
index 55bc055..4496bfa 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -22,6 +22,7 @@
 #include "net/colo-nic.h"
 #include "qemu/error-report.h"
 #include "net/tap.h"
+#include "trace.h"
 
 /* Remove the follow define after proxy is merged into kernel,
 * using #include  instead.
@@ -79,6 +80,7 @@ typedef struct nic_device {
 
 static struct nfnl_handle *nfnlh;
 static struct nfnl_subsys_handle *nfnlssh;
+static int32_t packet_compare_different; /* The result of packet comparing */
 
 QTAILQ_HEAD(, nic_device) nic_devices = QTAILQ_HEAD_INITIALIZER(nic_devices);
 
@@ -242,6 +244,38 @@ static int colo_proxy_send(enum nfnl_colo_msg_types 
msg_type,
 return ret;
 }
 
+static int __colo_rcv_pkt(struct nlmsghdr *nlh, struct nfattr *nfa[],
+  void *data)
+{
+/* struct nfgenmsg *nfmsg = NLMSG_DATA(nlh); */
+int32_t  result = ntohl(nfnl_get_data(nfa, NFNL_COLO_COMPARE_RESULT,
+  int32_t));
+
+atomic_set(&packet_compare_different, result);
+trace_colo_rcv_pkt(result);
+return 0;
+}
+
+static struct nfnl_callback colo_nic_cb = {
+.call   = &__colo_rcv_pkt,
+.attr_count = NFNL_COLO_KERNEL_NOTIFY_MAX,
+};
+
+static void colo_proxy_recv(void *opaque)
+{
+unsigned char *buf = g_malloc0(2048);
+int len;
+int ret;
+
+len = nfnl_recv(nfnlh, buf, 2048);
+ret = nfnl_handle_packet(nfnlh, (char *)buf, len);
+if (ret < 0) {/* Notify colo thread the error */
+atomic_set(&packet_compare_different, -1);
+error_report("call nfnl_handle_packet failed");
+}
+g_free(buf);
+}
+
 static int check_proxy_ack(void)
 {
 unsigned char *buf = g_malloc0(2048);
@@ -297,6 +331,11 @@ int colo_proxy_init(enum COLOMode mode)
 goto err_out;
 }
 
+ret = nfnl_callback_register(nfnlssh, NFCOLO_KERNEL_NOTIFY, &colo_nic_cb);
+if (ret < 0) {
+goto err_out;
+}
+
 /* Netlink is not a reliable protocol, So it is necessary to request proxy
  * module to acknowledge in the first time.
  */
@@ -316,6 +355,8 @@ int colo_proxy_init(enum COLOMode mode)
 goto err_out;
 }
 
+   qemu_set_fd_handler(nfnl_fd(nfnlh), colo_proxy_recv, NULL, NULL);
+
 return 0;
 err_out:
 nfnl_close(nfnlh);
@@ -326,6 +367,7 @@ err_out:
 void colo_proxy_destroy(enum COLOMode mode)
 {
 if (nfnlh) {
+qemu_set_fd_handler(nfnl_fd(nfnlh), NULL, NULL, NULL);
 nfnl_close(nfnlh);
 }
 teardown_nic(mode, getpid());
diff --git a/trace-events b/trace-events
index e262c8a..a84a04b 100644
--- a/trace-events
+++ b/trace-events
@@ -1473,6 +1473,7 @@ colo_info_load(const char *msg) "%s"
 colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'"
 colo_receive_message(const char *msg) "Receive '%s'"
 colo_do_failover(void) ""
+colo_rcv_pkt(int result) "Result of net packets comparing is different: %d"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
1.7.12.4

[Qemu-devel] [PATCH COLO-Frame v6 17/31] COLO failover: Don't do failover during loading VM's state

2015-06-18 Thread zhanghailiang

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Signed-off-by: Lai Jiangshan 
---
 migration/colo.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index a65f9ea..76bdd44 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -53,6 +53,7 @@ enum {
 };
 
 static QEMUBH *colo_bh;
+static bool vmstate_loading;
 static Coroutine *colo;
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
@@ -89,6 +90,11 @@ static bool colo_runstate_is_stopped(void)
  */
 static void secondary_vm_do_failover(void)
 {
+/* Wait for incoming thread loading vmstate */
+while (vmstate_loading) {
+;
+}
+
 colo = NULL;
 
 if (!autostart) {
@@ -511,11 +517,15 @@ void *colo_process_incoming_checkpoints(void *opaque)
 
 qemu_mutex_lock_iothread();
 qemu_system_reset(VMRESET_SILENT);
+vmstate_loading = true;
 if (qemu_loadvm_state(fb) < 0) {
 error_report("COLO: loadvm failed");
+vmstate_loading = false;
 qemu_mutex_unlock_iothread();
 goto out;
 }
+
+vmstate_loading = false;
 qemu_mutex_unlock_iothread();
 
 ret = colo_ctl_put(ctl, COLO_CHECKPOINT_LOADED);
-- 
1.7.12.4

[Qemu-devel] [PATCH COLO-Frame v6 08/31] QEMUSizedBuffer: Introduce two help functions for qsb

2015-06-18 Thread zhanghailiang

Introduce two new QEMUSizedBuffer APIs which COLO will use to buffer
VM state:
One is qsb_put_buffer(), which puts the content of a given QEMUSizedBuffer
into a QEMUFile; it is used to send buffered VM state to the secondary.
The other is qsb_fill_buffer(), which reads 'size' bytes of data from the
file into the qsb; it is used to get VM state from the socket into a buffer.
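
The two directions can be sketched without any QEMU types. The example
below assumes a buffer made of fixed-size chunks ('struct chunk' is a
hypothetical stand-in for the iovec array inside a QEMUSizedBuffer) and
uses memcpy() in place of the QEMUFile read and write calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one iovec entry of a sized buffer. */
struct chunk {
    unsigned char *base;
    size_t len;
};

/* "put" direction: copy up to 'size' bytes out of the chunks into dst. */
static size_t chunks_gather(unsigned char *dst, const struct chunk *c,
                            size_t n, size_t size)
{
    size_t i, done = 0;

    assert(dst && c);
    for (i = 0; i < n && done < size; i++) {
        size_t l = c[i].len < size - done ? c[i].len : size - done;

        memcpy(dst + done, c[i].base, l);
        done += l;
    }
    return done;
}

/* "fill" direction: always starts at chunk 0 and fills chunk by chunk. */
static size_t chunks_scatter(struct chunk *c, size_t n,
                             const unsigned char *src, size_t size)
{
    size_t i, done = 0;

    assert(c && src);
    for (i = 0; i < n && done < size; i++) {
        size_t l = c[i].len < size - done ? c[i].len : size - done;

        memcpy(c[i].base, src + done, l);
        done += l;
    }
    return done;
}
```

Both helpers return the number of bytes actually moved, so a short count
tells the caller that the chunks ran out before 'size' was reached.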

Signed-off-by: Yang Hongyang 
Signed-off-by: zhanghailiang 
Reviewed-by: Dr. David Alan Gilbert 
---
 include/migration/qemu-file.h |  3 ++-
 migration/qemu-file-buf.c | 58 +++
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 4f67d79..286ca3a 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -140,7 +140,8 @@ ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t 
start, size_t count,
uint8_t *buf);
 ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
  off_t pos, size_t count);
-
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, int size);
+int qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, int size);
 
 /*
  * For use on files opened with qemu_bufopen
diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
index 16a51a1..686f417 100644
--- a/migration/qemu-file-buf.c
+++ b/migration/qemu-file-buf.c
@@ -365,6 +365,64 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t 
*source,
 return count;
 }
 
+
+/**
+ * Put the content of a given QEMUSizedBuffer into QEMUFile.
+ *
+ * @f: A QEMUFile
+ * @qsb: A QEMUSizedBuffer
+ * @size: size of content to write
+ */
+void qsb_put_buffer(QEMUFile *f, QEMUSizedBuffer *qsb, int size)
+{
+int i, l;
+
+for (i = 0; i < qsb->n_iov && size > 0; i++) {
+l = MIN(qsb->iov[i].iov_len, size);
+qemu_put_buffer(f, qsb->iov[i].iov_base, l);
+size -= l;
+}
+}
+
+/*
+ * Read 'size' bytes of data from the file into qsb.
+ * always fill from pos 0 and used after qsb_create().
+ *
+ * It will return size bytes unless there was an error, in which case it will
+ * return as many as it managed to read (assuming blocking fd's which
+ * all current QEMUFile are)
+ */
+int qsb_fill_buffer(QEMUSizedBuffer *qsb, QEMUFile *f, int size)
+{
+ssize_t rc = qsb_grow(qsb, size);
+int pending = size, i;
+qsb->used = 0;
+uint8_t *buf = NULL;
+
+if (rc < 0) {
+return rc;
+}
+
+for (i = 0; i < qsb->n_iov && pending > 0; i++) {
+int doneone = 0;
+/* read until iov full */
+while (doneone < qsb->iov[i].iov_len && pending > 0) {
+int readone = 0;
+buf = qsb->iov[i].iov_base;
+readone = qemu_get_buffer(f, buf,
+MIN(qsb->iov[i].iov_len - doneone, pending));
+if (readone == 0) {
+return qsb->used;
+}
+buf += readone;
+doneone += readone;
+pending -= readone;
+qsb->used += readone;
+}
+}
+return qsb->used;
+}
+
 typedef struct QEMUBuffer {
 QEMUSizedBuffer *qsb;
 QEMUFile *file;
-- 
1.7.12.4

[Qemu-devel] [PATCH COLO-Frame v6 31/31] COLO: Add block replication into colo process

2015-06-18 Thread zhanghailiang

Make sure the master starts block replication only after the slave's
block replication has started.

Signed-off-by: zhanghailiang 
Signed-off-by: Wen Congyang 
Signed-off-by: Yang Hongyang 
Signed-off-by: Li Zhijian 
---
 migration/colo.c | 139 ++-
 trace-events |   2 +
 2 files changed, 139 insertions(+), 2 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 499a042..8c7e674 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -20,6 +20,8 @@
 #include "qapi-event.h"
 #include "net/colo-nic.h"
 #include "qmp-commands.h"
+#include "block/block.h"
+#include "sysemu/block-backend.h"
 
 /*
 * We should not do checkpoint one after another without any time interval,
@@ -108,6 +110,76 @@ static bool colo_runstate_is_stopped(void)
 return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
 }
 
+static void blk_start_replication(bool primary, Error **errp)
+{
+ReplicationMode mode = primary ? REPLICATION_MODE_PRIMARY :
+ REPLICATION_MODE_SECONDARY;
+BlockBackend *blk, *temp;
+Error *local_err = NULL;
+
+for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
+if (blk_is_read_only(blk) || !blk_is_inserted(blk)) {
+continue;
+}
+
+bdrv_start_replication(blk_bs(blk), mode, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
+}
+}
+
+return;
+
+fail:
+for (temp = blk_next(NULL); temp != blk; temp = blk_next(temp)) {
+bdrv_stop_replication(blk_bs(temp), false, NULL);
+}
+}
+
+static void blk_do_checkpoint(Error **errp)
+{
+BlockBackend *blk;
+Error *local_err = NULL;
+
+for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
+if (blk_is_read_only(blk) || !blk_is_inserted(blk)) {
+continue;
+}
+
+bdrv_do_checkpoint(blk_bs(blk), &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
+
+static void blk_stop_replication(bool failover, Error **errp)
+{
+BlockBackend *blk;
+Error *local_err = NULL;
+
+for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
+if (blk_is_read_only(blk) || !blk_is_inserted(blk)) {
+continue;
+}
+
+bdrv_stop_replication(blk_bs(blk), failover, &local_err);
+if (!errp) {
+/*
+ * The caller doesn't care the result, they just
+ * want to stop all block's replication.
+ */
+continue;
+}
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+}
+}
+
 /*
  * there are two way to entry this function
  * 1. From colo checkpoint incoming thread, in this case
@@ -118,6 +190,8 @@ static bool colo_runstate_is_stopped(void)
  */
 static void secondary_vm_do_failover(void)
 {
+Error *local_err = NULL;
+
 /* Wait for incoming thread loading vmstate */
 while (vmstate_loading) {
 ;
@@ -128,6 +202,12 @@ static void secondary_vm_do_failover(void)
 }
 colo_proxy_destroy(COLO_MODE_SECONDARY);
 
+blk_stop_replication(true, &local_err);
+if (local_err) {
+error_report_err(local_err);
+}
+trace_colo_stop_block_replication("failover");
+
 colo = NULL;
 
 if (!autostart) {
@@ -145,6 +225,7 @@ static void secondary_vm_do_failover(void)
 static void primary_vm_do_failover(void)
 {
 MigrationState *s = migrate_get_current();
+Error *local_err = NULL;
 
 if (!colo_runstate_is_stopped()) {
 vm_stop_force_state(RUN_STATE_COLO);
@@ -156,6 +237,12 @@ static void primary_vm_do_failover(void)
 migrate_set_state(s, MIGRATION_STATUS_COLO, 
MIGRATION_STATUS_COMPLETED);
 }
 
+blk_stop_replication(true, &local_err);
+if (local_err) {
+error_report_err(local_err);
+}
+trace_colo_stop_block_replication("failover");
+
 vm_start();
 }
 
@@ -229,6 +316,7 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s, QEMUFile *control)
 int colo_shutdown, ret;
 size_t size;
 QEMUFile *trans = NULL;
+Error *local_err = NULL;
 
 ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
 if (ret < 0) {
@@ -282,6 +370,16 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s, QEMUFile *control)
 goto out;
 }
 
+/* we call this api although this may do nothing on primary side */
+qemu_mutex_lock_iothread();
+blk_do_checkpoint(&local_err);
+qemu_mutex_unlock_iothread();
+if (local_err) {
+error_report_err(local_err);
+ret = -1;
+goto out;
+}
+
 ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
 if (ret < 0) {
 goto out;
@@ -313,6 +411,10 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s, QEMUFile *control)
 trace_colo_receive_message("COLO_CHECKPOINT_LOADED");
 
 if (colo_sh

[Qemu-devel] [PATCH COLO-Frame v6 19/31] COLO NIC: Init/remove colo nic devices when add/cleanup tap devices

2015-06-18 Thread zhanghailiang

In COLO mode, we do some init work for the NICs that will be used for COLO.

Cc: Stefan Hajnoczi 
Cc: Jason Wang 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
 include/net/colo-nic.h |  3 +++
 net/Makefile.objs  |  1 +
 net/colo-nic.c | 70 ++
 net/net.c  |  2 ++
 net/tap.c  | 12 +
 stubs/migration-colo.c |  9 +++
 6 files changed, 92 insertions(+), 5 deletions(-)
 create mode 100644 net/colo-nic.c

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
index 3075d97..2bbe7bc 100644
--- a/include/net/colo-nic.h
+++ b/include/net/colo-nic.h
@@ -20,4 +20,7 @@ typedef struct COLONicState {
 char ifname[128];  /* e.g. tap name */
 } COLONicState;
 
+void colo_add_nic_devices(COLONicState *cns);
+void colo_remove_nic_devices(COLONicState *cns);
+
 #endif
diff --git a/net/Makefile.objs b/net/Makefile.objs
index ec19cb3..73f4a81 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -13,3 +13,4 @@ common-obj-$(CONFIG_HAIKU) += tap-haiku.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
 common-obj-$(CONFIG_NETMAP) += netmap.o
+common-obj-$(CONFIG_COLO) += colo-nic.o
diff --git a/net/colo-nic.c b/net/colo-nic.c
new file mode 100644
index 000..9745817
--- /dev/null
+++ b/net/colo-nic.c
@@ -0,0 +1,70 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ */
+#include "include/migration/migration.h"
+#include "migration/migration-colo.h"
+#include "net/net.h"
+#include "net/colo-nic.h"
+#include "qemu/error-report.h"
+
+
+typedef struct nic_device {
+COLONicState *cns;
+int (*configure)(COLONicState *cns, bool up, int side, int index);
+QTAILQ_ENTRY(nic_device) next;
+bool is_up;
+} nic_device;
+
+
+
+QTAILQ_HEAD(, nic_device) nic_devices = QTAILQ_HEAD_INITIALIZER(nic_devices);
+
+/*
+* colo_proxy_script usage
+* ./colo_proxy_script master/slave install/uninstall phy_if virt_if index
+*/
+
+void colo_add_nic_devices(COLONicState *cns)
+{
+struct nic_device *nic;
+NetClientState *nc = container_of(cns, NetClientState, cns);
+
+if (nc->info->type == NET_CLIENT_OPTIONS_KIND_HUBPORT ||
+nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
+return;
+}
+QTAILQ_FOREACH(nic, &nic_devices, next) {
+NetClientState *nic_nc = container_of(nic->cns, NetClientState, cns);
+if ((nic_nc->peer && nic_nc->peer == nc) ||
+(nc->peer && nc->peer == nic_nc)) {
+return;
+}
+}
+
+nic = g_malloc0(sizeof(*nic));
+nic->configure = NULL;
+nic->cns = cns;
+
+QTAILQ_INSERT_TAIL(&nic_devices, nic, next);
+}
+
+void colo_remove_nic_devices(COLONicState *cns)
+{
+struct nic_device *nic, *next_nic;
+
+QTAILQ_FOREACH_SAFE(nic, &nic_devices, next, next_nic) {
+if (nic->cns == cns) {
+QTAILQ_REMOVE(&nic_devices, nic, next);
+g_free(nic);
+}
+}
+}
diff --git a/net/net.c b/net/net.c
index 25c2ef3..3393ffa 100644
--- a/net/net.c
+++ b/net/net.c
@@ -279,6 +279,7 @@ static void qemu_net_client_setup(NetClientState *nc,
 peer->peer = nc;
 }
 QTAILQ_INSERT_TAIL(&net_clients, nc, next);
+colo_add_nic_devices(&nc->cns);
 
 nc->incoming_queue = qemu_new_net_queue(nc);
 nc->destructor = destructor;
@@ -354,6 +355,7 @@ void *qemu_get_nic_opaque(NetClientState *nc)
 static void qemu_cleanup_net_client(NetClientState *nc)
 {
 QTAILQ_REMOVE(&net_clients, nc, next);
+colo_remove_nic_devices(&nc->cns);
 
 if (nc->info->cleanup) {
 nc->info->cleanup(nc);
diff --git a/net/tap.c b/net/tap.c
index c558f79..64e4264 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -41,6 +41,7 @@
 #include "qemu/error-report.h"
 
 #include "net/tap.h"
+#include "net/colo-nic.h"
 
 #include "net/vhost_net.h"
 
@@ -611,7 +612,8 @@ static void net_init_tap_one(const NetdevTapOptions *tap, 
NetClientState *peer,
  const char *model, const char *name,
  const char *ifname, const char *script,
  const char *downscript, const char *vhostfdname,
- int vnet_hdr, int fd, Error **errp)
+ int vnet_hdr, int fd, bool setup_colo,
+ Error **errp)
 {
 Error *err = NULL;
 TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
@@ -759,7 +761,7 @@ int net_init_tap(const NetClientOptions *opts, const char 
*name,
 
 net_init_tap_one(tap, peer, "tap", name, NULL,
  script, downscript,
-

[Qemu-devel] [PATCH COLO-Frame v6 23/31] COLO NIC: Some init work related with proxy module

2015-06-18 Thread zhanghailiang

Implement the communication protocol with the proxy module using
nfnetlink, which requires the libnfnetlink library.

Tell the proxy module to do its initialization work and, moreover, ask
the kernel to acknowledge the request. The acknowledgement is necessary
the first time because Netlink is not a reliable protocol.

Cc: Stefan Hajnoczi 
Cc: Jason Wang 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
 configure  |  22 +++-
 net/colo-nic.c | 160 +
 2 files changed, 180 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 8b6242f..ab3942d 100755
--- a/configure
+++ b/configure
@@ -2310,7 +2310,25 @@ EOF
 rdma="no"
   fi
 fi
-
+##
+# COLO needs libnfnetlink libraries
+if test "$colo" != "no"; then
+  cat > $TMPC <
+int main(void) { return 0; }
+EOF
+  colo_libs="-lnfnetlink"
+  if compile_prog "" "$colo_libs"; then
+colo="yes"
+libs_softmmu="$libs_softmmu $colo_libs"
+  else
+if test "$colo" = "yes" ; then
+error_exit "libnfnetlink is required for colo feature." \
+"Make sure to have the libnfnetlink devel and headers installed."
+fi
+colo="no"
+  fi
+fi
 ##
 # VNC TLS/WS detection
 if test "$vnc" = "yes" -a \( "$vnc_tls" != "no" -o "$vnc_ws" != "no" \) ; then
@@ -2617,7 +2635,7 @@ EOF
 if compile_prog "$cfl" "$lib" ; then
 :
 else
-error_exit "$drv check failed" \
+rror_exit "$drv check failed" \
 "Make sure to have the $drv libs and headers installed."
 fi
 }
diff --git a/net/colo-nic.c b/net/colo-nic.c
index 7c3fcae..55bc055 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -10,6 +10,12 @@
  * later.  See the COPYING file in the top-level directory.
  *
  */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 #include "include/migration/migration.h"
 #include "migration/migration-colo.h"
 #include "net/net.h"
@@ -17,6 +23,53 @@
 #include "qemu/error-report.h"
 #include "net/tap.h"
 
+/* Remove the follow define after proxy is merged into kernel,
+* using #include  instead.
+*/
+#define NFNL_SUBSYS_COLO 12
+
+/* Message Format
+* <---NLMSG_ALIGN(hlen)-><-- NLMSG_ALIGN(len)->
+* ++- - -+- - - - - - - - - - - - - - +- - - - - - + - - -+
+* |   Header   | Pad |   Netfilter Netlink Header | Attributes | Pad  |
+* |struct nlmsghdr | | struct nfgenmsg||  |
+* ++- - -+- - - - - - - - - - - - - - + - - - - - -+ - - -+
+*/
+
+enum nfnl_colo_msg_types {
+NFCOLO_KERNEL_NOTIFY, /* Used by proxy module to notify qemu */
+
+NFCOLO_DO_CHECKPOINT,
+NFCOLO_DO_FAILOVER,
+NFCOLO_PROXY_INIT,
+NFCOLO_PROXY_RESET,
+
+NFCOLO_MSG_MAX
+};
+
+enum nfnl_colo_kernel_notify_attributes {
+NFNL_COLO_KERNEL_NOTIFY_UNSPEC,
+NFNL_COLO_COMPARE_RESULT,
+__NFNL_COLO_KERNEL_NOTIFY_MAX
+};
+
+#define NFNL_COLO_KERNEL_NOTIFY_MAX  (__NFNL_COLO_KERNEL_NOTIFY_MAX - 1)
+
+enum nfnl_colo_attributes {
+NFNL_COLO_UNSPEC,
+NFNL_COLO_MODE,
+__NFNL_COLO_MAX
+};
+#define NFNL_COLO_MAX  (__NFNL_COLO_MAX - 1)
+
+struct nfcolo_msg_mode {
+u_int8_t mode;
+};
+
+struct nfcolo_packet_compare { /* Unused */
+int32_t different;
+};
+
 typedef struct nic_device {
 COLONicState *cns;
 int (*configure)(COLONicState *cns, bool up, int side, int index);
@@ -24,6 +77,9 @@ typedef struct nic_device {
 bool is_up;
 } nic_device;
 
+static struct nfnl_handle *nfnlh;
+static struct nfnl_subsys_handle *nfnlssh;
+
 QTAILQ_HEAD(, nic_device) nic_devices = QTAILQ_HEAD_INITIALIZER(nic_devices);
 
 static int colo_nic_configure(COLONicState *cns,
@@ -154,19 +210,123 @@ void colo_remove_nic_devices(COLONicState *cns)
 }
 }
 
+static int colo_proxy_send(enum nfnl_colo_msg_types msg_type,
+   enum COLOMode mode, int flag, void *unused)
+{
+struct nfcolo_msg_mode params;
+union {
+char buf[NFNL_HEADER_LEN
+ + NFA_LENGTH(sizeof(struct nfcolo_msg_mode))];
+struct nlmsghdr nmh;
+} u;
+int ret;
+
+if (!nfnlssh || !nfnlh) {
+error_report("nfnlssh and nfnlh are uninited");
+return -1;
+}
+nfnl_fill_hdr(nfnlssh, &u.nmh, 0, AF_UNSPEC, 1,
+  msg_type, NLM_F_REQUEST | flag);
+params.mode = mode;
+u.nmh.nlmsg_pid = nfnl_portid(nfnlh);
+ret = nfnl_addattr_l(&u.nmh, sizeof(u),  NFNL_COLO_MODE, &params,
+ sizeof(params));
+if (ret < 0) {
+error_report("call nfnl_addattr_l failed");
+return ret;
+}
+ret = nfnl_send(nfnlh, &u.nmh);
+if (ret < 0) {
+error_report("call nfnl_send failed");
+}
+return ret;
+}
+
+static int check_proxy_ack(void)
+{
+unsigned char *buf = g_malloc0(2048);
+struct nlmsghdr *nlmsg;
+int len;
+int ret = -1;
+
+len = nfnl_recv(nfnlh, bu

[Qemu-devel] [PATCH COLO-Frame v6 21/31] COLO NIC: Implement colo nic device interface configure()

2015-06-18 Thread zhanghailiang

Implement the colo nic device interface configure() and
add a script to configure nic devices:
${QEMU_SCRIPT_DIR}/colo-proxy-script.sh
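
For illustration only, here is one way the argument vector for such a
script could be assembled and validated. The argument order follows the
usage line of the script (side, action, phy_if, virt_if, index);
build_script_argv() and SCRIPT_ARGC are hypothetical names, and rejecting
any missing or empty argument is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define SCRIPT_ARGC 6   /* script path plus five arguments */

/*
 * Hypothetical helper: fill argv[] for the proxy script.
 * Returns 0 on success, -1 if any argument is missing or empty.
 */
static int build_script_argv(const char *argv[SCRIPT_ARGC + 1],
                             char *index_str, size_t index_len,
                             const char *script, bool secondary, bool up,
                             const char *phy_if, const char *virt_if,
                             int index)
{
    int i;

    assert(argv && index_str && index_len > 0);

    /* snprintf() bounds the write, unlike a plain sprintf(). */
    snprintf(index_str, index_len, "%d", index);

    argv[0] = script;
    argv[1] = secondary ? "secondary" : "primary";
    argv[2] = up ? "install" : "uninstall";
    argv[3] = phy_if;
    argv[4] = virt_if;
    argv[5] = index_str;
    argv[6] = NULL;

    for (i = 0; i < SCRIPT_ARGC; i++) {
        if (!argv[i] || !argv[i][0]) {
            return -1;
        }
    }
    return 0;
}
```

With the example from the script header, the resulting vector would be
{ script, "primary", "install", "eth2", "tap0", "1", NULL }.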

Cc: Stefan Hajnoczi 
Cc: Jason Wang 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
 include/net/tap.h| 17 +
 net/colo-nic.c   | 47 ---
 net/tap.c| 17 -
 scripts/colo-proxy-script.sh | 90 
 4 files changed, 148 insertions(+), 23 deletions(-)
 create mode 100755 scripts/colo-proxy-script.sh

diff --git a/include/net/tap.h b/include/net/tap.h
index ac99b31..9688765 100644
--- a/include/net/tap.h
+++ b/include/net/tap.h
@@ -29,6 +29,23 @@
 #include "qemu-common.h"
 #include "qapi-types.h"
 #include "standard-headers/linux/virtio_net.h"
+#include "net/net.h"
+#include "net/vhost_net.h"
+
+typedef struct TAPState {
+NetClientState nc;
+int fd;
+char down_script[1024];
+char down_script_arg[128];
+uint8_t buf[NET_BUFSIZE];
+bool read_poll;
+bool write_poll;
+bool using_vnet_hdr;
+bool has_ufo;
+bool enabled;
+VHostNetState *vhost_net;
+unsigned host_vnet_hdr_len;
+} TAPState;
 
 int tap_enable(NetClientState *nc);
 int tap_disable(NetClientState *nc);
diff --git a/net/colo-nic.c b/net/colo-nic.c
index 9745817..c7dd473 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -15,7 +15,7 @@
 #include "net/net.h"
 #include "net/colo-nic.h"
 #include "qemu/error-report.h"
-
+#include "net/tap.h"
 
 typedef struct nic_device {
 COLONicState *cns;
@@ -28,10 +28,45 @@ typedef struct nic_device {
 
 QTAILQ_HEAD(, nic_device) nic_devices = QTAILQ_HEAD_INITIALIZER(nic_devices);
 
-/*
-* colo_proxy_script usage
-* ./colo_proxy_script master/slave install/uninstall phy_if virt_if index
-*/
+static int colo_nic_configure(COLONicState *cns,
+bool up, int side, int index)
+{
+int i, argc = 6;
+char *argv[7], index_str[32];
+char **parg;
+NetClientState *nc = container_of(cns, NetClientState, cns);
+TAPState *s = DO_UPCAST(TAPState, nc, nc);
+Error *err = NULL;
+
+if (!cns && index <= 0) {
+error_report("Can not parse colo_script or colo_nicname");
+return -1;
+}
+
+parg = argv;
+*parg++ = cns->script;
+*parg++ = (char *)(side == COLO_MODE_SECONDARY ? "secondary" : "primary");
+*parg++ = (char *)(up ? "install" : "uninstall");
+*parg++ = cns->nicname;
+*parg++ = cns->ifname;
+sprintf(index_str, "%d", index);
+*parg++ = index_str;
+*parg = NULL;
+
+for (i = 0; i < argc; i++) {
+if (!argv[i][0]) {
+error_report("Can not get colo_script argument");
+return -1;
+}
+}
+
+launch_script(argv, s->fd, &err);
+if (err) {
+error_report_err(err);
+return -1;
+}
+return 0;
+}
 
 void colo_add_nic_devices(COLONicState *cns)
 {
@@ -51,7 +86,7 @@ void colo_add_nic_devices(COLONicState *cns)
 }
 
 nic = g_malloc0(sizeof(*nic));
-nic->configure = NULL;
+nic->configure = colo_nic_configure;
 nic->cns = cns;
 
 QTAILQ_INSERT_TAIL(&nic_devices, nic, next);
diff --git a/net/tap.c b/net/tap.c
index 78104b2..1339085 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -43,23 +43,6 @@
 #include "net/tap.h"
 #include "net/colo-nic.h"
 
-#include "net/vhost_net.h"
-
-typedef struct TAPState {
-NetClientState nc;
-int fd;
-char down_script[1024];
-char down_script_arg[128];
-uint8_t buf[NET_BUFSIZE];
-bool read_poll;
-bool write_poll;
-bool using_vnet_hdr;
-bool has_ufo;
-bool enabled;
-VHostNetState *vhost_net;
-unsigned host_vnet_hdr_len;
-} TAPState;
-
 static void tap_send(void *opaque);
 static void tap_writable(void *opaque);
 
diff --git a/scripts/colo-proxy-script.sh b/scripts/colo-proxy-script.sh
new file mode 100755
index 000..9ebff21
--- /dev/null
+++ b/scripts/colo-proxy-script.sh
@@ -0,0 +1,90 @@
+#!/bin/sh
+#usage:
+# colo-proxy-script.sh primary/secondary install/uninstall phy_if virt_if index
+#.e.g:
+# colo-proxy-script.sh primary install eth2 tap0 1
+
+side=$1
+action=$2
+phy_if=$3
+virt_if=$4
+index=$5
+br=br1
+failover_br=br0
+
+script_usage()
+{
+echo -n "usage: ./colo-proxy-script.sh primary/secondary "
+echo -e "install/uninstall phy_if virt_if index\n"
+}
+
+primary_install()
+{
+tc qdisc add dev $virt_if root handle 1: prio
+tc filter add dev $virt_if parent 1: protocol ip prio 10 u32 match u32 \
+0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+tc filter add dev $virt_if parent 1: protocol arp prio 11 u32 match u32 \
+0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+tc filter add dev $virt_if parent 1: protocol ipv6 prio 12 u32 match u32 \
+0 0 flowid 1:2 action mirred egress mirror dev $phy_if
+
+/usr/local/sbin/iptables -t mangle -I PREROUTING -m physdev --physdev-in \
+$virt_if -j PMYCOLO --index $index --forwa

[Qemu-devel] [PATCH COLO-Frame v6 26/31] COLO: Improve checkpoint efficiency by do additional periodic checkpoint

2015-06-18 Thread zhanghailiang

Besides the normal checkpoint, which is triggered by the result of the
net packet comparison, we do an additional checkpoint periodically. This
reduces the number of dirty pages handled in one checkpoint when no
checkpoint has happened for a long time (a special case that occurs when
the net packets are always consistent).

Signed-off-by: zhanghailiang 
Signed-off-by: Yang Hongyang 
---
 migration/colo.c | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index cf7a6e1..b11ed7b 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,6 +10,7 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "migration/migration-colo.h"
 #include "trace.h"
@@ -25,6 +26,13 @@
 */
 #define CHECKPOINT_MIN_PERIOD 100  /* unit: ms */
 
+/*
+ * force checkpoint timer: unit ms
+ * this is large because COLO checkpoint will mostly depend on
+ * COLO compare module.
+ */
+#define CHECKPOINT_MAX_PEROID 1
+
 enum {
 COLO_CHECPOINT_READY = 0x46,
 
@@ -343,14 +351,7 @@ static void *colo_thread(void *opaque)
 proxy_checkpoint_req = colo_proxy_compare();
 if (proxy_checkpoint_req < 0) {
 goto out;
-} else if (!proxy_checkpoint_req) {
-/*
- * No checkpoint is needed, wait for 1ms and then
- * check if we need checkpoint again
- */
-g_usleep(1000);
-continue;
-} else {
+} else if (proxy_checkpoint_req) {
 int64_t interval;
 
 current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
@@ -359,8 +360,20 @@ static void *colo_thread(void *opaque)
 /* Limit the min time between two checkpoint */
 g_usleep((1000*(CHECKPOINT_MIN_PERIOD - interval)));
 }
+goto do_checkpoint;
+}
+
+/*
+ * No proxy checkpoint is request, wait for 100ms
+ * and then check if we need checkpoint again.
+ */
+current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+if (current_time - checkpoint_time < CHECKPOINT_MAX_PEROID) {
+g_usleep(10);
+continue;
 }
 
+do_checkpoint:
 /* start a colo checkpoint */
 if (colo_do_checkpoint_transaction(s, colo_control)) {
 goto out;
-- 
1.7.12.4
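
A minimal sketch of the decision that colo_thread() makes after this patch,
written as a pure function so it can be read in isolation. All sketch_* names
and both period values are assumptions made for illustration; they are not
taken from the patch, whose actual constants live in migration/colo.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits in milliseconds, chosen only for this sketch. */
#define SKETCH_MIN_PERIOD_MS 100
#define SKETCH_MAX_PERIOD_MS 10000

typedef enum {
    SKETCH_WAIT,        /* nothing to do yet, poll again later */
    SKETCH_CHECKPOINT   /* start a checkpoint after sleeping *delay_ms */
} sketch_action;

/*
 * compare_differs: the packet comparison asked for a checkpoint.
 * now_ms / last_checkpoint_ms: host clock readings in milliseconds.
 * delay_ms: out parameter, how long to sleep before the checkpoint.
 */
static sketch_action sketch_decide(bool compare_differs, int64_t now_ms,
                                   int64_t last_checkpoint_ms,
                                   int64_t *delay_ms)
{
    int64_t elapsed = now_ms - last_checkpoint_ms;

    assert(delay_ms != NULL);
    *delay_ms = 0;

    if (compare_differs) {
        /* Requested checkpoint: enforce the minimum spacing. */
        if (elapsed < SKETCH_MIN_PERIOD_MS) {
            *delay_ms = SKETCH_MIN_PERIOD_MS - elapsed;
        }
        return SKETCH_CHECKPOINT;
    }

    /* No request: force a checkpoint once the maximum period has passed. */
    return elapsed < SKETCH_MAX_PERIOD_MS ? SKETCH_WAIT : SKETCH_CHECKPOINT;
}
```

A caller would sleep for *delay_ms before a SKETCH_CHECKPOINT result and sleep
for a short polling interval before retrying on SKETCH_WAIT.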

[Qemu-devel] [PATCH COLO-Frame v6 29/31] COLO: Disable qdev hotplug when VM is in COLO mode

2015-06-18 Thread zhanghailiang

COLO does not support qdev hotplug migration, so disable it.

Signed-off-by: zhanghailiang 
Signed-off-by: Yang Hongyang 
---
 migration/colo.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index 0fcadcd..8d6d166 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -10,6 +10,7 @@
  * later.  See the COPYING file in the top-level directory.
  */
 
+#include "hw/qdev-core.h"
 #include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "migration/migration-colo.h"
@@ -325,6 +326,7 @@ out:
 static void *colo_thread(void *opaque)
 {
 MigrationState *s = opaque;
+int dev_hotplug = qdev_hotplug;
 QEMUFile *colo_control = NULL;
 int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 int i, ret;
@@ -340,6 +342,8 @@ static void *colo_thread(void *opaque)
 goto out;
 }
 
+qdev_hotplug = 0;
+
 /*
  * Wait for Secondary finish loading vm states and enter COLO
  * restore.
@@ -436,6 +440,8 @@ out:
 qemu_bh_schedule(s->cleanup_bh);
 qemu_mutex_unlock_iothread();
 
+qdev_hotplug = dev_hotplug;
+
 return NULL;
 }
 
@@ -493,10 +499,13 @@ void *colo_process_incoming_checkpoints(void *opaque)
 struct colo_incoming *colo_in = opaque;
 QEMUFile *f = colo_in->file;
 int fd = qemu_get_fd(f);
+int dev_hotplug = qdev_hotplug;
 QEMUFile *ctl = NULL, *fb = NULL;
 int i, ret;
 uint64_t total_size;
 
+qdev_hotplug = 0;
+
 colo = qemu_coroutine_self();
 assert(colo != NULL);
 
@@ -674,5 +683,7 @@ out:
 
 loadvm_exit_colo();
 
+qdev_hotplug = dev_hotplug;
+
 return NULL;
 }
-- 
1.7.12.4
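
The patch above follows a save, clear, restore pattern around a global flag.
A minimal sketch of that pattern, assuming a stand-in variable
sketch_hotplug_allowed in place of qdev_hotplug and a hypothetical callback
in place of the COLO thread body:

```c
/* Stand-in for the global qdev_hotplug flag; not the real variable. */
static int sketch_hotplug_allowed = 1;

/* Run 'body' with hotplug disabled, then restore the previous setting. */
static int sketch_run_with_hotplug_disabled(int (*body)(void))
{
    int saved = sketch_hotplug_allowed;
    int ret;

    sketch_hotplug_allowed = 0;
    ret = body();
    sketch_hotplug_allowed = saved;   /* restore whatever was set before */

    return ret;
}

/* Example body: reports the flag value it observes while running. */
static int sketch_observe(void)
{
    return sketch_hotplug_allowed;
}
```

Restoring the saved value rather than writing 1 keeps the behaviour correct
when hotplug was already disabled before COLO started.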

[Qemu-devel] [PATCH COLO-Frame v6 25/31] COLO: Do checkpoint according to the result of packets comparation

2015-06-18 Thread zhanghailiang

Only do a checkpoint when the PVM's and SVM's output net packets are
inconsistent.
We also limit the minimum time between two consecutive checkpoints, to
give the VM a chance to run.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
 include/net/colo-nic.h |  2 ++
 migration/colo.c   | 32 
 net/colo-nic.c |  5 +
 3 files changed, 39 insertions(+)

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
index 9ebc543..17f8800 100644
--- a/include/net/colo-nic.h
+++ b/include/net/colo-nic.h
@@ -27,4 +27,6 @@ void colo_remove_nic_devices(COLONicState *cns);
 int colo_proxy_init(enum COLOMode mode);
 void colo_proxy_destroy(enum COLOMode mode);
 
+int colo_proxy_compare(void);
+
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 4c9e781..cf7a6e1 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -18,6 +18,13 @@
 #include "qapi-event.h"
 #include "net/colo-nic.h"
 
+/*
+* We should not do checkpoint one after another without any time interval,
+* Because this will lead continuous 'stop' status for VM.
+* CHECKPOINT_MIN_PERIOD is the min time limit between two checkpoint action.
+*/
+#define CHECKPOINT_MIN_PERIOD 100  /* unit: ms */
+
 enum {
 COLO_CHECPOINT_READY = 0x46,
 
@@ -290,6 +297,7 @@ static void *colo_thread(void *opaque)
 {
 MigrationState *s = opaque;
 QEMUFile *colo_control = NULL;
+int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 int i, ret;
 
 if (colo_proxy_init(COLO_MODE_PRIMARY) != 0) {
@@ -325,15 +333,39 @@ static void *colo_thread(void *opaque)
 trace_colo_vm_state_change("stop", "run");
 
 while (s->state == MIGRATION_STATUS_COLO) {
+int proxy_checkpoint_req;
+
 if (failover_request_is_set()) {
 error_report("failover request");
 goto out;
 }
+/* wait for a colo checkpoint */
+proxy_checkpoint_req = colo_proxy_compare();
+if (proxy_checkpoint_req < 0) {
+goto out;
+} else if (!proxy_checkpoint_req) {
+/*
+ * No checkpoint is needed, wait for 1ms and then
+ * check if we need checkpoint again
+ */
+g_usleep(1000);
+continue;
+} else {
+int64_t interval;
+
+current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+interval = current_time - checkpoint_time;
+if (interval < CHECKPOINT_MIN_PERIOD) {
+/* Limit the min time between two checkpoint */
+g_usleep((1000*(CHECKPOINT_MIN_PERIOD - interval)));
+}
+}
 
 /* start a colo checkpoint */
 if (colo_do_checkpoint_transaction(s, colo_control)) {
 goto out;
 }
+checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 }
 
 out:
diff --git a/net/colo-nic.c b/net/colo-nic.c
index 4496bfa..e395c13 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -372,3 +372,8 @@ void colo_proxy_destroy(enum COLOMode mode)
 }
 teardown_nic(mode, getpid());
 }
+
+int colo_proxy_compare(void)
+{
+return atomic_xchg(&packet_compare_different, 0);
+}
-- 
1.7.12.4
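
colo_proxy_compare() in the patch above reads and clears a flag in a single
atomic step, so a request raised between the read and the reset cannot be
lost. A minimal sketch of the same idea with C11 atomics instead of QEMU's
atomic_xchg(); the sketch_* names and the producer function are assumptions
for illustration only.

```c
#include <stdatomic.h>

/* Set by the (hypothetical) comparing side when outputs diverge. */
static atomic_int sketch_packets_differ;

/* Producer: record that a checkpoint is needed. */
static void sketch_mark_different(void)
{
    atomic_store(&sketch_packets_differ, 1);
}

/* Consumer: return the pending request and reset it atomically. */
static int sketch_take_request(void)
{
    return atomic_exchange(&sketch_packets_differ, 0);
}
```

Several calls to sketch_mark_different() before the next poll collapse into a
single pending request, which is all the checkpoint loop needs to know.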

[Qemu-devel] [PATCH COLO-Frame v6 20/31] tap: Make launch_script() public

2015-06-18 Thread zhanghailiang

We also change the parameters of launch_script().

Cc: Stefan Hajnoczi 
Cc: Jason Wang 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
 include/net/tap.h |  2 ++
 net/tap.c | 31 ++-
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/include/net/tap.h b/include/net/tap.h
index 5da4edc..ac99b31 100644
--- a/include/net/tap.h
+++ b/include/net/tap.h
@@ -38,4 +38,6 @@ int tap_get_fd(NetClientState *nc);
 struct vhost_net;
 struct vhost_net *tap_get_vhost_net(NetClientState *nc);
 
+void launch_script(char *const args[], int fd, Error **errp);
+
 #endif /* QEMU_NET_TAP_H */
diff --git a/net/tap.c b/net/tap.c
index 64e4264..78104b2 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -60,9 +60,6 @@ typedef struct TAPState {
 unsigned host_vnet_hdr_len;
 } TAPState;
 
-static void launch_script(const char *setup_script, const char *ifname,
-  int fd, Error **errp);
-
 static void tap_send(void *opaque);
 static void tap_writable(void *opaque);
 
@@ -291,7 +288,14 @@ static void tap_cleanup(NetClientState *nc)
 qemu_purge_queued_packets(nc);
 
 if (s->down_script[0]) {
-launch_script(s->down_script, s->down_script_arg, s->fd, &err);
+char *args[3];
+char **parg;
+
+parg = args;
+*parg++ = (char *)s->down_script;
+*parg++ = (char *)s->down_script_arg;
+*parg = NULL;
+launch_script(args, s->fd, &err);
 if (err) {
 error_report_err(err);
 }
@@ -366,12 +370,10 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
 return s;
 }
 
-static void launch_script(const char *setup_script, const char *ifname,
-  int fd, Error **errp)
+void launch_script(char *const args[], int fd, Error **errp)
 {
 int pid, status;
-char *args[3];
-char **parg;
+const char *setup_script = args[0];
 
 /* try to launch network script */
 pid = fork();
@@ -388,10 +390,6 @@ static void launch_script(const char *setup_script, const char *ifname,
 close(i);
 }
 }
-parg = args;
-*parg++ = (char *)setup_script;
-*parg++ = (char *)ifname;
-*parg = NULL;
 execv(setup_script, args);
 _exit(1);
 } else {
@@ -595,7 +593,14 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
 if (setup_script &&
 setup_script[0] != '\0' &&
 strcmp(setup_script, "no") != 0) {
-launch_script(setup_script, ifname, fd, &err);
+char *args[3];
+char **parg;
+parg = args;
+*parg++ = (char *)setup_script;
+*parg++ = (char *)ifname;
+*parg = NULL;
+
+launch_script(args, fd, &err);
 if (err) {
 error_propagate(errp, err);
 close(fd);
-- 
1.7.12.4
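
With the new signature the caller builds the NULL-terminated argument vector
and launch_script() only forks and executes args[0]. A minimal POSIX sketch of
that division of labour; sketch_run_script() is a hypothetical helper, and
unlike the real function it does not close inherited file descriptors or
report through an Error object.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run args[0] with the NULL-terminated vector 'args'.
 * Returns the child's exit status, or -1 if it could not be
 * started or did not exit normally.
 */
static int sketch_run_script(char *const args[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execv(args[0], args);
        _exit(127);             /* only reached if execv() failed */
    }

    for (;;) {
        pid_t r = waitpid(pid, &status, 0);
        if (r == pid) {
            break;
        }
        if (r < 0 && errno != EINTR) {
            return -1;
        }
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the vector is owned by the caller, a script that takes more than two
arguments needs no further change to the helper.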

[Qemu-devel] [PATCH COLO-Frame v6 16/31] qmp event: Add event notification for COLO error

2015-06-18 Thread zhanghailiang

If an error happens during the VM's COLO FT stage, it is important to notify
users of this event. Together with 'colo_lost_heartbeat', this lets users
intervene in COLO's failover work immediately.
If users don't want to get involved in COLO's failover verdict,
it is still necessary to notify them that we have exited COLO mode.

Cc: Markus Armbruster 
Cc: Michael Roth 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
 docs/qmp/qmp-events.txt | 16 
 migration/colo.c| 12 ++--
 qapi/event.json | 15 +++
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/docs/qmp/qmp-events.txt b/docs/qmp/qmp-events.txt
index 4c13d48..7b6df2e 100644
--- a/docs/qmp/qmp-events.txt
+++ b/docs/qmp/qmp-events.txt
@@ -473,6 +473,22 @@ Example:
 { "timestamp": {"seconds": 1290688046, "microseconds": 417172},
   "event": "SPICE_MIGRATE_COMPLETED" }
 
+COLO_EXIT
+-
+
+Emitted when the VM exits COLO mode because an error occurred or at
+the request of users.
+
+Data:
+
+ - "mode": COLO mode, 'primary' or 'secondary'
+ - "error": Error message (json-string, optional)
+
+Example:
+
+{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
+ "event": "COLO_EXIT", "data": {"mode": "primary"}}
+
 
 STOP
 
diff --git a/migration/colo.c b/migration/colo.c
index 3ecaec8..a65f9ea 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -15,6 +15,7 @@
 #include "trace.h"
 #include "qemu/error-report.h"
 #include "migration/migration-failover.h"
+#include "qapi-event.h"
 
 enum {
 COLO_CHECPOINT_READY = 0x46,
@@ -325,13 +326,14 @@ static void *colo_thread(void *opaque)
 
 out:
 error_report("colo: some error happens in colo_thread");
+qapi_event_send_colo_exit("primary", true, "unknown", NULL);
 /* Give users time (2s) to get involved in this verdict */
 for (i = 0; i < 10; i++) {
 if (failover_request_is_set()) {
 error_report("Primary VM will take over work");
 break;
 }
-usleep(200*1000);
+usleep(200 * 1000);
 }
 qemu_mutex_lock_iothread();
 if (!failover_request_is_set()) {
@@ -533,13 +535,19 @@ void *colo_process_incoming_checkpoints(void *opaque)
 
 out:
 error_report("Detect some error or get a failover request");
+/*
+* Here, we raise a qmp event to the user,
+* It can help user to know what happens, and help deciding whether to
+* do failover.
+*/
+qapi_event_send_colo_exit("secondary", true, "unknown", NULL);
 /* Give users time (2s) to get involved in this verdict */
 for (i = 0; i < 10; i++) {
 if (failover_request_is_set()) {
 error_report("Secondary VM will take over work");
 break;
 }
-usleep(200*1000);
+usleep(200 * 1000);
 }
 /* check flag again*/
 if (!failover_request_is_set()) {
diff --git a/qapi/event.json b/qapi/event.json
index 378dda5..e269765 100644
--- a/qapi/event.json
+++ b/qapi/event.json
@@ -243,6 +243,21 @@
 { 'event': 'SPICE_MIGRATE_COMPLETED' }
 
 ##
+# @COLO_EXIT
+#
+# Emitted when the VM exits COLO mode because an error occurred or at
+# the request of users.
+#
+# @mode: 'primary' or 'secondary'.
+#
+# @error:  #optional, error message. Only present when an error happened.
+#
+# Since: 2.4
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'str', '*error':'str'}}
+
+##
 # @ACPI_DEVICE_OST
 #
 # Emitted when guest executes ACPI _OST method.
-- 
1.7.12.4
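
After raising the event, both exit paths in the patch above poll the failover
flag ten times at 200 ms intervals and then check it once more. A minimal
sketch of that bounded wait, assuming a hypothetical predicate callback in
place of failover_request_is_set() and nanosleep() in place of usleep():

```c
#include <stdbool.h>
#include <time.h>

/*
 * Poll 'is_set' up to 'tries' times, sleeping 'step_ms' after each
 * unsuccessful poll, then check one final time.
 */
static bool sketch_wait_for(bool (*is_set)(void), int tries, long step_ms)
{
    struct timespec ts = { step_ms / 1000, (step_ms % 1000) * 1000000L };

    for (int i = 0; i < tries; i++) {
        if (is_set()) {
            return true;
        }
        nanosleep(&ts, NULL);
    }
    return is_set();
}

/* Example predicates used to exercise the helper. */
static int sketch_polls;

static bool sketch_after_three(void)
{
    return ++sketch_polls >= 3;
}

static bool sketch_never(void)
{
    return false;
}
```

Calling sketch_wait_for(pred, 10, 200) would correspond to the two-second
window mentioned in the code comments.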

[Qemu-devel] [PATCH COLO-Frame v6 18/31] COLO: Add new command parameter 'colo_nicname' 'colo_script' for net

2015-06-18 Thread zhanghailiang

'colo_nicname' should be set to a network interface name,
for example 'eth2'. It is passed as a parameter to 'colo_script';
'colo_script' should be set to a script path.

We parse these parameters in tap.

Cc: Stefan Hajnoczi 
Cc: Jason Wang 
Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
 include/net/colo-nic.h | 23 +++
 include/net/net.h  |  2 ++
 net/tap.c  | 27 ---
 qapi-schema.json   |  8 +++-
 qemu-options.hx|  7 +++
 5 files changed, 63 insertions(+), 4 deletions(-)
 create mode 100644 include/net/colo-nic.h

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
new file mode 100644
index 000..3075d97
--- /dev/null
+++ b/include/net/colo-nic.h
@@ -0,0 +1,23 @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 FUJITSU LIMITED
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef COLO_NIC_H
+#define COLO_NIC_H
+
+typedef struct COLONicState {
+char nicname[128]; /* forward dev */
+char script[1024]; /* colo script */
+char ifname[128];  /* e.g. tap name */
+} COLONicState;
+
+#endif
diff --git a/include/net/net.h b/include/net/net.h
index e66ca03..615ba23 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -8,6 +8,7 @@
 #include "net/queue.h"
 #include "migration/vmstate.h"
 #include "qapi-types.h"
+#include "net/colo-nic.h"
 
 #define MAX_QUEUE_NUM 1024
 
@@ -84,6 +85,7 @@ struct NetClientState {
 char *model;
 char *name;
 char info_str[256];
+COLONicState cns;
 unsigned receive_disabled : 1;
 NetClientDestructor *destructor;
 unsigned int queue_index;
diff --git a/net/tap.c b/net/tap.c
index aa8b3f5..c558f79 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -616,6 +616,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
 Error *err = NULL;
 TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
 int vhostfd;
+NetClientState *nc = NULL;
 
 tap_set_sndbuf(s->fd, tap, &err);
 if (err) {
@@ -640,6 +641,17 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
 }
 }
 
+nc = &(s->nc);
+snprintf(nc->cns.ifname, sizeof(nc->cns.ifname), "%s", ifname);
+if (tap->has_colo_script) {
+snprintf(nc->cns.script, sizeof(nc->cns.script), "%s",
+ tap->colo_script);
+}
+if (tap->has_colo_nicname) {
+snprintf(nc->cns.nicname, sizeof(nc->cns.nicname), "%s",
+ tap->colo_nicname);
+}
+
 if (tap->has_vhost ? tap->vhost :
 vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
 VhostNetOptions options;
@@ -759,9 +771,10 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 
 if (tap->has_ifname || tap->has_script || tap->has_downscript ||
 tap->has_vnet_hdr || tap->has_helper || tap->has_queues ||
-tap->has_vhostfd) {
+tap->has_vhostfd || tap->has_colo_script || tap->has_colo_nicname) {
 error_setg(errp, "ifname=, script=, downscript=, vnet_hdr=, "
"helper=, queues=, and vhostfd= "
+"colo_script=, and colo_nicname= "
"are invalid with fds=");
 return -1;
 }
@@ -804,9 +817,11 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 }
 } else if (tap->has_helper) {
 if (tap->has_ifname || tap->has_script || tap->has_downscript ||
-tap->has_vnet_hdr || tap->has_queues || tap->has_vhostfds) {
+tap->has_vnet_hdr || tap->has_queues || tap->has_vhostfds ||
+tap->has_colo_script || tap->has_colo_nicname) {
 error_setg(errp, "ifname=, script=, downscript=, vnet_hdr=, "
-   "queues=, and vhostfds= are invalid with helper=");
+   "queues=, and vhostfds=, colo_script=, and "
+   "colo_nicname= are invalid with helper=");
 return -1;
 }
 
@@ -828,6 +843,12 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 return -1;
 }
 } else {
+if (queues > 1 && (tap->has_colo_script || tap->has_colo_nicname)) {
+error_report("queues > 1 is invalid if colo_script or "
+ "colo_nicname is specified");
+return -1;
+}
+
 if (tap->has_vhostfds) {
 error_setg(errp, "vhostfds= is invalid if fds= wasn't specified");
 return -1;
diff --git a/qapi-schema.json b/qapi-schema.json
index e5a0b1d..736f8bd 100644
--- a/qapi-schema

[Qemu-devel] [PATCH COLO-Frame v6 28/31] COLO NIC: Implement NIC checkpoint and failover

2015-06-18 Thread zhanghailiang

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
 include/net/colo-nic.h |  2 ++
 migration/colo.c   | 21 ++---
 net/colo-nic.c | 23 +++
 3 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
index 17f8800..252b3ae 100644
--- a/include/net/colo-nic.h
+++ b/include/net/colo-nic.h
@@ -28,5 +28,7 @@ int colo_proxy_init(enum COLOMode mode);
 void colo_proxy_destroy(enum COLOMode mode);
 
 int colo_proxy_compare(void);
+int colo_proxy_failover(void);
+int colo_proxy_checkpoint(enum COLOMode mode);
 
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index f446827..0fcadcd 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -120,6 +120,11 @@ static void secondary_vm_do_failover(void)
 ;
 }
 
+if (colo_proxy_failover() != 0) {
+error_report("colo proxy failed to do failover");
+}
+colo_proxy_destroy(COLO_MODE_SECONDARY);
+
 colo = NULL;
 
 if (!autostart) {
@@ -142,6 +147,8 @@ static void primary_vm_do_failover(void)
 vm_stop_force_state(RUN_STATE_COLO);
 }
 
+colo_proxy_destroy(COLO_MODE_PRIMARY);
+
 if (s->state != MIGRATION_STATUS_FAILED) {
 migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED);
 }
@@ -265,6 +272,11 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
 
 qemu_fflush(trans);
 
+ret = colo_proxy_checkpoint(COLO_MODE_PRIMARY);
+if (ret < 0) {
+goto out;
+}
+
 ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND);
 if (ret < 0) {
 goto out;
@@ -424,8 +436,6 @@ out:
 qemu_bh_schedule(s->cleanup_bh);
 qemu_mutex_unlock_iothread();
 
-colo_proxy_destroy(COLO_MODE_PRIMARY);
-
 return NULL;
 }
 
@@ -551,6 +561,11 @@ void *colo_process_incoming_checkpoints(void *opaque)
 goto out;
 }
 
+ret = colo_proxy_checkpoint(COLO_MODE_SECONDARY);
+if (ret < 0) {
+goto out;
+}
+
 ret = colo_ctl_get(f, COLO_CHECKPOINT_SEND);
 if (ret < 0) {
 goto out;
@@ -634,6 +649,7 @@ out:
 * just kill Secondary VM
 */
 error_report("SVM is going to exit in default!");
+colo_proxy_destroy(COLO_MODE_SECONDARY);
 exit(1);
 } else {
 /* if we went here, means Primary VM may dead, we are doing failover */
@@ -658,6 +674,5 @@ out:
 
 loadvm_exit_colo();
 
-colo_proxy_destroy(COLO_MODE_SECONDARY);
 return NULL;
 }
diff --git a/net/colo-nic.c b/net/colo-nic.c
index e395c13..4ddc9dc 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -373,6 +373,29 @@ void colo_proxy_destroy(enum COLOMode mode)
 teardown_nic(mode, getpid());
 }
 
+/*
+* Note: Weird, Only the VM in slave side need to do failover work !!!
+*/
+int colo_proxy_failover(void)
+{
+if (colo_proxy_send(NFCOLO_DO_FAILOVER, COLO_MODE_SECONDARY, 0, NULL) < 0) {
+return -1;
+}
+
+return 0;
+}
+
+/*
+* Note: Only the VM in master side need to do checkpoint
+*/
+int colo_proxy_checkpoint(enum COLOMode  mode)
+{
+if (colo_proxy_send(NFCOLO_DO_CHECKPOINT, mode, 0, NULL) < 0) {
+return -1;
+}
+return 0;
+}
+
 int colo_proxy_compare(void)
 {
 return atomic_xchg(&packet_compare_different, 0);
-- 
1.7.12.4

[Qemu-devel] [PATCH COLO-Frame v6 30/31] COLO: Implement shutdown checkpoint

2015-06-18 Thread zhanghailiang

For the Secondary VM, we forbid shutting it down directly while in COLO mode.
For the Primary VM's shutdown, we need to do some extra work to ensure that
the PVM and SVM act consistently.

Cc: Paolo Bonzini 
Signed-off-by: zhanghailiang 
Signed-off-by: Lai Jiangshan 
Signed-off-by: Li Zhijian 
---
 include/sysemu/sysemu.h |  3 +++
 migration/colo.c| 32 +++-
 vl.c| 26 --
 3 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 0304aa7..a77e18f 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -51,6 +51,8 @@ typedef enum WakeupReason {
 QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;
 
+extern int colo_shutdown_requested;
+
 void qemu_system_reset_request(void);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
@@ -58,6 +60,7 @@ void qemu_system_wakeup_request(WakeupReason reason);
 void qemu_system_wakeup_enable(WakeupReason reason, bool enabled);
 void qemu_register_wakeup_notifier(Notifier *notifier);
 void qemu_system_shutdown_request(void);
+void qemu_system_shutdown_request_core(void);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_system_debug_request(void);
diff --git a/migration/colo.c b/migration/colo.c
index 8d6d166..499a042 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -68,6 +68,8 @@ enum {
 COLO_CHECKPOINT_SEND,
 COLO_CHECKPOINT_RECEIVED,
 COLO_CHECKPOINT_LOADED,
+
+COLO_GUEST_SHUTDOWN
 };
 
 static QEMUBH *colo_bh;
@@ -224,7 +226,7 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require)
 
 static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
 {
-int ret;
+int colo_shutdown, ret;
 size_t size;
 QEMUFile *trans = NULL;
 
@@ -251,6 +253,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
 }
 /* suspend and save vm state to colo buffer */
 qemu_mutex_lock_iothread();
+colo_shutdown = colo_shutdown_requested;
 vm_stop_force_state(RUN_STATE_COLO);
 qemu_mutex_unlock_iothread();
 trace_colo_vm_state_change("run", "stop");
@@ -266,6 +269,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
 /* Disable block migration */
 s->params.blk = 0;
 s->params.shared = 0;
+qemu_savevm_state_header(trans);
 qemu_savevm_state_begin(trans, &s->params);
 qemu_mutex_lock_iothread();
 qemu_savevm_state_complete(trans);
@@ -308,6 +312,16 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control)
 }
 trace_colo_receive_message("COLO_CHECKPOINT_LOADED");
 
+if (colo_shutdown) {
+colo_ctl_put(s->file, COLO_GUEST_SHUTDOWN);
+qemu_fflush(s->file);
+colo_shutdown_requested = 0;
+qemu_system_shutdown_request_core();
+while (1) {
+;
+}
+}
+
 ret = 0;
 /* resume master */
 qemu_mutex_lock_iothread();
@@ -372,6 +386,10 @@ static void *colo_thread(void *opaque)
 error_report("failover request");
 goto out;
 }
+
+if (colo_shutdown_requested) {
+goto do_checkpoint;
+}
 /* wait for a colo checkpoint */
 proxy_checkpoint_req = colo_proxy_compare();
 if (proxy_checkpoint_req < 0) {
@@ -489,6 +507,18 @@ static int colo_wait_handle_cmd(QEMUFile *f, int *checkpoint_request)
 case COLO_CHECKPOINT_NEW:
 *checkpoint_request = 1;
 return 0;
+case COLO_GUEST_SHUTDOWN:
+qemu_mutex_lock_iothread();
+vm_stop_force_state(RUN_STATE_COLO);
+qemu_system_shutdown_request_core();
+qemu_mutex_unlock_iothread();
+trace_colo_receive_message("COLO_GUEST_SHUTDOWN");
+/* the main thread will exit and terminate the whole
+* process, do we need some cleanup?
+*/
+for (;;) {
+;
+}
 default:
 return -1;
 }
diff --git a/vl.c b/vl.c
index 8f81062..0ff9e37 100644
--- a/vl.c
+++ b/vl.c
@@ -1543,6 +1543,8 @@ static NotifierList wakeup_notifiers =
 NOTIFIER_LIST_INITIALIZER(wakeup_notifiers);
 static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
 
+int colo_shutdown_requested;
+
 int qemu_shutdown_requested_get(void)
 {
 return shutdown_requested;
@@ -1659,6 +1661,10 @@ void qemu_system_reset(bool report)
 void qemu_system_reset_request(void)
 {
 if (no_reboot) {
+qemu_system_shutdown_request();
+if (!shutdown_requested) {/* colo handle it ? */
+return;
+}
 shutdown_requested = 1;
 } else {
 reset_requested = 1;
@@ -1727,13 +1733,29 @@ void qemu_system_killed(int signal, pid_t pid)
 qemu_system_shutdown_request();
 }
 
-void qemu_system_shutdown_request(void)
+void qemu_system_shutdown_request_core(void)
 {

[Qemu-devel] [PATCH COLO-Frame v6 22/31] COLO NIC : Implement colo nic init/destroy function

2015-06-18 Thread zhanghailiang

When in colo mode, call colo nic init/destroy function.

Cc: Stefan Hajnoczi 
Cc: Jason Wang 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
 include/net/colo-nic.h |  4 +++
 migration/colo.c   | 15 +++
 net/colo-nic.c | 71 --
 3 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/include/net/colo-nic.h b/include/net/colo-nic.h
index 2bbe7bc..9ebc543 100644
--- a/include/net/colo-nic.h
+++ b/include/net/colo-nic.h
@@ -13,6 +13,7 @@
 
 #ifndef COLO_NIC_H
 #define COLO_NIC_H
+#include "migration/migration-colo.h"
 
 typedef struct COLONicState {
 char nicname[128]; /* forward dev */
@@ -23,4 +24,7 @@ typedef struct COLONicState {
 void colo_add_nic_devices(COLONicState *cns);
 void colo_remove_nic_devices(COLONicState *cns);
 
+int colo_proxy_init(enum COLOMode mode);
+void colo_proxy_destroy(enum COLOMode mode);
+
 #endif
diff --git a/migration/colo.c b/migration/colo.c
index 76bdd44..4c9e781 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -16,6 +16,7 @@
 #include "qemu/error-report.h"
 #include "migration/migration-failover.h"
 #include "qapi-event.h"
+#include "net/colo-nic.h"
 
 enum {
 COLO_CHECPOINT_READY = 0x46,
@@ -291,6 +292,11 @@ static void *colo_thread(void *opaque)
 QEMUFile *colo_control = NULL;
 int i, ret;
 
+if (colo_proxy_init(COLO_MODE_PRIMARY) != 0) {
+error_report("Init colo proxy error");
+goto out;
+}
+
 colo_control = qemu_fopen_socket(qemu_get_fd(s->file), "rb");
 if (!colo_control) {
 error_report("Open colo_control failed!");
@@ -364,6 +370,8 @@ out:
 qemu_bh_schedule(s->cleanup_bh);
 qemu_mutex_unlock_iothread();
 
+colo_proxy_destroy(COLO_MODE_PRIMARY);
+
 return NULL;
 }
 
@@ -428,6 +436,12 @@ void *colo_process_incoming_checkpoints(void *opaque)
 colo = qemu_coroutine_self();
 assert(colo != NULL);
 
+ /* configure the network */
+if (colo_proxy_init(COLO_MODE_SECONDARY) != 0) {
+error_report("Init colo proxy error\n");
+goto out;
+}
+
 ctl = qemu_fopen_socket(fd, "wb");
 if (!ctl) {
 error_report("Can't open incoming channel!");
@@ -590,5 +604,6 @@ out:
 
 loadvm_exit_colo();
 
+colo_proxy_destroy(COLO_MODE_SECONDARY);
 return NULL;
 }
diff --git a/net/colo-nic.c b/net/colo-nic.c
index c7dd473..7c3fcae 100644
--- a/net/colo-nic.c
+++ b/net/colo-nic.c
@@ -24,8 +24,6 @@ typedef struct nic_device {
 bool is_up;
 } nic_device;
 
-
-
 QTAILQ_HEAD(, nic_device) nic_devices = QTAILQ_HEAD_INITIALIZER(nic_devices);
 
 static int colo_nic_configure(COLONicState *cns,
@@ -68,6 +66,57 @@ static int colo_nic_configure(COLONicState *cns,
 return 0;
 }
 
+static int configure_one_nic(COLONicState *cns,
+ bool up, int side, int index)
+{
+struct nic_device *nic;
+
+assert(cns);
+
+QTAILQ_FOREACH(nic, &nic_devices, next) {
+if (nic->cns == cns) {
+if (up == nic->is_up) {
+return 0;
+}
+
+if (!nic->configure || (nic->configure(nic->cns, up, side, index) &&
+up)) {
+return -1;
+}
+nic->is_up = up;
+return 0;
+}
+}
+
+return -1;
+}
+
+static int configure_nic(int side, int index)
+{
+struct nic_device *nic;
+
+if (QTAILQ_EMPTY(&nic_devices)) {
+return -1;
+}
+
+QTAILQ_FOREACH(nic, &nic_devices, next) {
+if (configure_one_nic(nic->cns, 1, side, index)) {
+return -1;
+}
+}
+
+return 0;
+}
+
+static void teardown_nic(int side, int index)
+{
+struct nic_device *nic;
+
+QTAILQ_FOREACH(nic, &nic_devices, next) {
+configure_one_nic(nic->cns, 0, side, index);
+}
+}
+
 void colo_add_nic_devices(COLONicState *cns)
 {
 struct nic_device *nic;
@@ -98,8 +147,26 @@ void colo_remove_nic_devices(COLONicState *cns)
 
 QTAILQ_FOREACH_SAFE(nic, &nic_devices, next, next_nic) {
 if (nic->cns == cns) {
+configure_one_nic(cns, 0, get_colo_mode(), getpid());
 QTAILQ_REMOVE(&nic_devices, nic, next);
 g_free(nic);
 }
 }
 }
+
+int colo_proxy_init(enum COLOMode mode)
+{
+int ret = -1;
+
+ret = configure_nic(mode, getpid());
+if (ret != 0) {
+error_report("execute colo-proxy-script failed");
+}
+
+return ret;
+}
+
+void colo_proxy_destroy(enum COLOMode mode)
+{
+teardown_nic(mode, getpid());
+}
-- 
1.7.12.4
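
configure_one_nic() in the patch above is idempotent: it does nothing when the
NIC is already in the requested state, and it treats a failing script as an
error only when bringing the NIC up. A minimal sketch of that state handling;
the sketch_nic type and the two callbacks are assumptions for illustration and
do not mirror the real nic_device layout.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct sketch_nic {
    bool is_up;
    /* Returns 0 on success, non-zero on failure. */
    int (*configure)(struct sketch_nic *nic, bool up);
    int calls;                  /* how many times configure ran */
} sketch_nic;

static int sketch_set_state(sketch_nic *nic, bool up)
{
    if (nic->is_up == up) {
        return 0;               /* already there, nothing to do */
    }
    if (nic->configure == NULL) {
        return -1;
    }
    /* A failure only counts when bringing the NIC up. */
    if (nic->configure(nic, up) != 0 && up) {
        return -1;
    }
    nic->is_up = up;
    return 0;
}

/* Example callbacks: one always succeeds, one always fails. */
static int sketch_ok(sketch_nic *nic, bool up)
{
    (void)up;
    nic->calls++;
    return 0;
}

static int sketch_fail(sketch_nic *nic, bool up)
{
    (void)up;
    nic->calls++;
    return 1;
}
```

Ignoring failures on the way down means teardown always marks the NIC as
down, so a later install attempt is not skipped by the early return.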

Re: [Qemu-devel] [PATCH] Add .dir-locals.el file to configure emacs coding style

2015-06-18 Thread Michael Tokarev

18.06.2015 11:36, Markus Armbruster wrote:
> Michael Tokarev  writes:
> 
>> So, what is the consensus here?
>>
>> Everyone who talked wants the emacs mode, but everyone
>> offers their own mode.
>>
>> I'd pick the stroustrup variant suggested by Markus
>> since it is shortest, but while being shortest, it
>> looks a bit "magical".
> 
> I don't think it's magical at all.  It uses .dir-locals exactly as
> intended.  In fact, it's almost straight from the Emacs manual:

No, I mean the one-line "stroustrup" definition is a bit "magical"
as it "magically" has all parameters behind the scenes matching
our coding style, as opposed to listing every aspect explicitly.

[]
> Want me to post a formal patch adding my revised .dir-locals.el?

Daniel already posted his 2-line V2.  It might be a good
idea to add indent-tabs-mode nil to other modes too.

At any rate I'm *far* from being an emacs expert (even if it was my
only editor and development environment for about 10 years,
I forgot almost everything already).

Thanks,

/mjt

[Qemu-devel] [PATCH COLO-Frame v6 27/31] COLO: Add colo-set-checkpoint-period command

2015-06-18 Thread zhanghailiang

With this command, we can control the period of the checkpoint that is
forced when the net packet comparison does not request one.

Cc: Luiz Capitulino 
Cc: Eric Blake 
Cc: Markus Armbruster 
Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
---
 hmp-commands.hx| 15 +++
 hmp.c  |  7 +++
 hmp.h  |  1 +
 migration/colo.c   | 11 ++-
 qapi-schema.json   | 13 +
 qmp-commands.hx| 22 ++
 stubs/migration-colo.c |  4 
 7 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index ed487a6..8e0412e 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1021,6 +1021,21 @@ Tell COLO that heartbeat is lost, a failover or takeover is needed.
 ETEXI
 
 {
+.name   = "colo_set_checkpoint_period",
+.args_type  = "value:i",
+.params = "value",
+.help   = "set checkpoint period (in ms) for colo. "
+"Defaults to 100ms",
+.mhandler.cmd = hmp_colo_set_checkpoint_period,
+},
+
+STEXI
+@item colo_set_checkpoint_period @var{value}
+@findex colo_set_checkpoint_period
+Set checkpoint period to @var{value} (in ms) for colo.
+ETEXI
+
+{
 .name   = "client_migrate_info",
 .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
 .params = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index 8e25d5a..dfa47ed 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1279,6 +1279,13 @@ void hmp_colo_lost_heartbeat(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &err);
 }
 
+void hmp_colo_set_checkpoint_period(Monitor *mon, const QDict *qdict)
+{
+int64_t value = qdict_get_int(qdict, "value");
+
+qmp_colo_set_checkpoint_period(value, NULL);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
 const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index c36c99c..d66dc76 100644
--- a/hmp.h
+++ b/hmp.h
@@ -69,6 +69,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
 void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
 void hmp_colo_lost_heartbeat(Monitor *mon, const QDict *qdict);
+void hmp_colo_set_checkpoint_period(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/migration/colo.c b/migration/colo.c
index b11ed7b..f446827 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -18,6 +18,7 @@
 #include "migration/migration-failover.h"
 #include "qapi-event.h"
 #include "net/colo-nic.h"
+#include "qmp-commands.h"
 
 /*
 * We should not do checkpoint one after another without any time interval,
@@ -71,6 +72,9 @@ enum {
 static QEMUBH *colo_bh;
 static bool vmstate_loading;
 static Coroutine *colo;
+
+int64_t colo_checkpoint_period = CHECKPOINT_MAX_PEROID;
+
 /* colo buffer */
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 QEMUSizedBuffer *colo_buffer;
@@ -91,6 +95,11 @@ bool loadvm_in_colo_state(void)
 return colo != NULL;
 }
 
+void qmp_colo_set_checkpoint_period(int64_t value, Error **errp)
+{
+colo_checkpoint_period = value;
+}
+
 static bool colo_runstate_is_stopped(void)
 {
 return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
@@ -368,7 +377,7 @@ static void *colo_thread(void *opaque)
  * and then check if we need checkpoint again.
  */
 current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-if (current_time - checkpoint_time < CHECKPOINT_MAX_PEROID) {
+if (current_time - checkpoint_time < colo_checkpoint_period) {
 g_usleep(10);
 continue;
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index 736f8bd..abdd4c4 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -688,6 +688,19 @@
 { 'command': 'colo-lost-heartbeat' }
 
 ##
+# @colo-set-checkpoint-period
+#
+# Set colo checkpoint period
+#
+# @value: period of colo checkpoint in ms
+#
+# Returns: nothing on success
+#
+# Since: 2.4
+##
+{ 'command': 'colo-set-checkpoint-period', 'data': {'value': 'int'} }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 74626ac..28b1b96 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -800,6 +800,28 @@ Example:
 EQMP
 
 {
+ .name   = "colo-set-checkpoint-period",
+ .args_type  = "value:i",
+ .mhandler.cmd_new = qmp_marshal_input_colo_set_checkpoint_period,
+},
+
+SQMP
+colo-set-checkpoint-period
+--
+
+set checkpoint period
+
+Arguments:
+- "value": checkpoint period
+
+Example:
+
+-> { "execute": "colo-set-checkpoint-period", "arguments": { "value": 1000 } }
+<- { "return": {} }
+
+EQMP
+
+{
 .name   = "client
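
As posted, the setter stores whatever integer it is given, and the HMP wrapper passes NULL for errp. A minimal sketch of range checking, written as a standalone toy rather than QEMU code: the bounds PERIOD_MIN_MS and PERIOD_MAX_MS and both function names are assumptions made up for illustration; only the 100 ms default comes from the help text in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds, chosen only for illustration. */
#define PERIOD_MIN_MS     1
#define PERIOD_MAX_MS     60000
#define PERIOD_DEFAULT_MS 100   /* "Defaults to 100ms" per the help text */

static int64_t checkpoint_period_ms = PERIOD_DEFAULT_MS;

/*
 * Store the new period if it is within range and return true;
 * otherwise leave the old period untouched and return false.
 */
static bool set_checkpoint_period(int64_t value_ms)
{
    if (value_ms < PERIOD_MIN_MS || value_ms > PERIOD_MAX_MS) {
        return false;
    }
    checkpoint_period_ms = value_ms;
    return true;
}

/*
 * Same comparison as in colo_thread(), inverted: a checkpoint is due
 * once at least one full period has elapsed since the previous one.
 */
static bool checkpoint_due(int64_t now_ms, int64_t last_checkpoint_ms)
{
    assert(now_ms >= last_checkpoint_ms);
    return now_ms - last_checkpoint_ms >= checkpoint_period_ms;
}
```

In the real command the rejection would presumably be reported through errp instead of a bool return; that part is left out here.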

Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386

2015-06-18 Thread Aurelien Jarno

On 2015-06-18 10:16, Aurelien Jarno wrote:
> On an x86 host, this patch brings a 5% boot time improvement for a MIPS
> guest. One of the reasons is that the TCG code generator has a good
> knowledge of which TCG ops or helpers can trigger an exception, so it can
> optimize out part of the instructions saving the CPU state. I guess that
> host CPUs have also evolved over time, now being superscalar and
> out-of-order, so that saving the CPU state can be done "in background".
> Also it's just a quick and dirty patch, we can probably do even better.
> 
> All of that to say that I am worried about performance if more paths go
> through the retranslation code, especially on MIPS, where it seems to be
> costly. That said, I haven't really looked in detail at other targets
> or hosts.

For an i386 guest still on an x86 host, I get a 4% slower boot time by
not using retranslation (see patch below). This is not that much
compared to the complexity retranslation brings us.

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 58b1959..de65bba 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -8001,6 +8001,9 @@ static inline void gen_intermediate_code_internal(X86CPU *cpu,
 
 gen_tb_start(tb);
 for(;;) {
+gen_update_cc_op(dc);
+gen_jmp_im(pc_ptr - dc->cs_base);
+
 if (unlikely(!QTAILQ_EMPTY(&cs->breakpoints))) {
 QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
 if (bp->pc == pc_ptr &&
diff --git a/translate-all.c b/translate-all.c
index b6b0e1c..3d4c017 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -212,6 +212,8 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
 int64_t ti;
 #endif
 
+return -1;
+
 #ifdef CONFIG_PROFILER
 ti = profile_getclock();
 #endif

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] libcacard: use the library?

2015-06-18 Thread Markus Armbruster

Paolo Bonzini  writes:

> On 18/06/2015 10:11, Michael Tokarev wrote:
>> On 18.06.2015 11:09, Paolo Bonzini wrote:
>>> On 17/06/2015 22:15, Michael Tokarev wrote:
>>>> I tried autoconf&automake&libtool.  It is a HugeMess, I disliked it.
>>>> So I rewrote it as a simple shell script.
>>>>
>>>> The result of both attempts is available at
>>>> http://www.corpit.ru/mjt/tmp/libcacard/
>>>> There are 4 files in there:
>>>>
>>>>  configure.ac Makefile.am -- auto*shit version, requires bootstrap like
>>>>   libtoolize && aclocal && automake --foreign --add-missing && autoconf
>>>
>>> More like autoreconf -fvi.
>> 
>> My 10-minute experience with auto*tools didn't go that far :)
>
> You got everything else right, though.  Kudos.
>
>>>>  configure Makefile.in -- my small version based on what qemu ./configure
>>>>   currently does.
>>>
>>> Doesn't have dependency tracking.  That's already a no-no I think.
>> 
>> Well, it is trivial to add.  For a first cut it works.
>
> And then it will be something else with cross-compilation, or something
> else.  Let's just use autotools and call it a day...

In my experience, the Autotools are the worst build system, except for
all the others.

Libtool is particularly horrible.  But when you actually have the
problem it solves (building shared libraries on almost every rotten OS
known to man), you're in a particularly horrible place already.

So, Paolo's recommendation seconded.

Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr

2015-06-18 Thread Pavel Dovgaluk

> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> On 17/06/2015 14:42, Pavel Dovgalyuk wrote:
> > This patch introduces several helpers to pass return address
> > which points to the TB. Correct return address allows correct
> > restoring of the guest PC and icount. These functions should be used when
> > helpers embedded into TB invoke memory operations.
> >
> > Signed-off-by: Pavel Dovgalyuk 
> > ---
> >  include/exec/cpu_ldst_template.h |   42 
> > +++---
> >  include/exec/exec-all.h  |   27 
> >  softmmu_template.h   |   18 
> >  3 files changed, 79 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/exec/cpu_ldst_template.h 
> > b/include/exec/cpu_ldst_template.h
> > index 95ab750..1847816 100644
> > --- a/include/exec/cpu_ldst_template.h
> > +++ b/include/exec/cpu_ldst_template.h
> > @@ -62,7 +62,9 @@
> >  /* generic load/store macros */
> >
> >  static inline RES_TYPE
> > -glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
> > +glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
> > +  target_ulong ptr,
> > +  uintptr_t retaddr)
> 
> Would it make sense to call these helper_cpu_ld##USUFFIX##MEMSUFFIX?
> 
> > diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> > index 856e698..b3aefde 100644
> > --- a/include/exec/exec-all.h
> > +++ b/include/exec/exec-all.h
> > @@ -350,6 +350,33 @@ struct MemoryRegion *iotlb_to_region(CPUState *cpu,
> >  void tlb_fill(CPUState *cpu, target_ulong addr, int is_write, int mmu_idx,
> >uintptr_t retaddr);
> >
> > +uint8_t helper_call_ldb_cmmu(CPUArchState *env, target_ulong addr,
> > + int mmu_idx, uintptr_t retaddr);
> 
> Here we already have helper_ret_ldb_cmmu, so the new function is only
> needed if DATA_SIZE != 1.
> 
> > +uint16_t helper_call_ldw_cmmu(CPUArchState *env, target_ulong addr,
> > +  int mmu_idx, uintptr_t retaddr);
> 
> What about helper_ret_ldw_cmmu for consistency with the DATA_SIZE == 1 case?

tcg.h breaks these definitions:

/* Temporary aliases until backends are converted.  */
#ifdef TARGET_WORDS_BIGENDIAN
# define helper_ret_ldsw_mmu  helper_be_ldsw_mmu
# define helper_ret_lduw_mmu  helper_be_lduw_mmu
# define helper_ret_ldsl_mmu  helper_be_ldsl_mmu
# define helper_ret_ldul_mmu  helper_be_ldul_mmu
# define helper_ret_ldq_mmu   helper_be_ldq_mmu
# define helper_ret_stw_mmu   helper_be_stw_mmu
# define helper_ret_stl_mmu   helper_be_stl_mmu
# define helper_ret_stq_mmu   helper_be_stq_mmu
#else

Pavel Dovgalyuk

Re: [Qemu-devel] libcacard: use the library?

2015-06-18 Thread Daniel P. Berrange

On Thu, Jun 18, 2015 at 11:07:53AM +0200, Markus Armbruster wrote:
> Paolo Bonzini  writes:
> 
> > On 18/06/2015 10:11, Michael Tokarev wrote:
> >> On 18.06.2015 11:09, Paolo Bonzini wrote:
> >>> On 17/06/2015 22:15, Michael Tokarev wrote:
> >>>> I tried autoconf&automake&libtool.  It is a HugeMess, I disliked it.
> >>>> So I rewrote it as a simple shell script.
> >>>>
> >>>> The result of both attempts is available at
> >>>> http://www.corpit.ru/mjt/tmp/libcacard/
> >>>> There are 4 files in there:
> >>>>
> >>>>  configure.ac Makefile.am -- auto*shit version, requires bootstrap like
> >>>>   libtoolize && aclocal && automake --foreign --add-missing && autoconf
> >>>
> >>> More like autoreconf -fvi.
> >> 
> >> My 10-minute experience with auto*tools didn't go that far :)
> >
> > You got everything else right, though.  Kudos.
> >
> >>>>  configure Makefile.in -- my small version based on what qemu ./configure
> >>>>   currently does.
> >>>
> >>> Doesn't have dependency tracking.  That's already a no-no I think.
> >> 
> >> Well, it is trivial to add.  For a first cut it works.
> >
> > And then it will be something else with cross-compilation, or something
> > else.  Let's just use autotools and call it a day...
> 
> In my experience, the Autotools are the worst build system, except for
> all the others.

And home grown systems that attempt to superficially look like autoconf,
eg qemu's configure, are the worst of all, because they give the poor
users false hope that behaviour will be like all autotools apps.

> Libtool is particularly horrible.  But when you actually have the
> problem it solves (building shared libraries on almost every rotten OS
> known to man), you're in a particularly horrible place already.
> 
> So, Paolo's recommendation seconded.

Agreed.   I'm happy to review any autoconf conversion, as I've maintained
obscenely complicated autoconf scripts (e.g. libvirt's :-)

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PATCH v2] Add .dir-locals.el file to configure emacs coding style

2015-06-18 Thread Markus Armbruster

Peter Maydell  writes:

> On 4 June 2015 at 14:30, Daniel P. Berrange  wrote:
>> Some default emacs setups indent by 2 spaces and use tabs,
>> which is counter to the QEMU coding style rules. Adding a
>> .dir-locals.el file in the top level of the GIT repo will
>> inform emacs about the QEMU coding style, and so assist
>> contributors in avoiding common style mistakes before
>> they submit patches.
>>
>> Signed-off-by: Daniel P. Berrange 
>> ---
>>  .dir-locals.el | 2 ++
>>  1 file changed, 2 insertions(+)
>>  create mode 100644 .dir-locals.el
>>
>> diff --git a/.dir-locals.el b/.dir-locals.el
>> new file mode 100644
>> index 000..3ac0cfc
>> --- /dev/null
>> +++ b/.dir-locals.el
>> @@ -0,0 +1,2 @@
>> +((c-mode . ((c-file-style . "stroustrup")
>> +   (indent-tabs-mode . nil))))
>
> My .emacs defines a style like this:
>
> (defconst qemu-c-style
>   '((indent-tabs-mode . nil)
> (c-basic-offset . 4)
> (tab-width . 8)
> (c-comment-only-line-offset . 0)
> (c-offsets-alist . ((statement-block-intro . +)
> (substatement-open . 0)
> (label . 0)
> (statement-cont . +)
> (innamespace . 0)
> (inline-open . 0)
> ))
> (c-hanging-braces-alist .
> ((brace-list-open)
>  (brace-list-intro)
>  (brace-list-entry)
>  (brace-list-close)
>  (brace-entry-open)
>  (block-close . c-snug-do-while)
>  ;; structs have hanging braces on open
>  (class-open . (after))
>  ;; ditto if statements
>  (substatement-open . (after))
>  ;; and no auto newline at the end
>  (class-close)
>  ))
> )
>   "QEMU C Programming Style")
>
> which is a superset of Stroustrup and gets a few more
> corner cases right, I think.

Trouble is I can't figure out how to get that into Emacs using nothing
but files in the QEMU tree.  From within the tree, we can use file- and
directory-local variables, and it just works as long as Emacs deems the
variables safe.  When not, Emacs prompts for confirmation.  I'd rather
not go there.

Using an existing C style is a textbook example of safe local variables
(literally, the Emacs manual does it).

Adding simple customizations on top should be feasible.

However, I can't see how I could define a new C style there without
pushing the "local variables" feature well beyond its intended use, and
triggering the confirmation prompts.

If we take Dan's patch, every Emacs user who hasn't already configured a
suitable style profits.  Users who have may have to adjust their
configuration to work with or around Dan's patch.

"That action is best which procures the greatest happiness for the
greatest numbers."  Let's take Dan's patch.

I can post a follow-up extending tab avoidance to files that don't use
c-mode.

Reviewed-by: Markus Armbruster

Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386

2015-06-18 Thread Paolo Bonzini

On 18/06/2015 11:08, Aurelien Jarno wrote:
> For an i386 guest still on an x86 host, I get a 4% slower boot time by
> not using retranslation (see patch below). This is not that much
> compared to the complexity retranslation brings us.

QEMU could just always compute and store the restore_state information.
TCG needs to help fill it in (a new TCG opcode?), but it should be easy.

Paolo
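
One way to picture "always compute and store the restore_state information" is a small per-block table, filled in while translating, that maps host code offsets back to guest PCs, so that an exception only needs a lookup instead of a retranslation. The following is a rough sketch under that assumption and not QEMU code: InsnRecord and lookup_guest_pc are hypothetical names, and the real return-address adjustment is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One record per guest instruction of a translated block (hypothetical). */
typedef struct {
    uint32_t host_end;  /* host code offset just past this instruction */
    uint64_t guest_pc;  /* guest PC the instruction was translated from */
} InsnRecord;

/*
 * Binary search for the first record whose host code ends after
 * host_off, i.e. the guest instruction covering that host offset.
 * Records must be sorted by host_end, which is the natural order in
 * which a translator would emit them.
 * Returns 0 and fills *guest_pc on success, -1 if host_off lies
 * beyond the end of the block.
 */
static int lookup_guest_pc(const InsnRecord *recs, size_t n,
                           uint32_t host_off, uint64_t *guest_pc)
{
    size_t lo = 0, hi = n;

    assert(guest_pc != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (recs[mid].host_end <= host_off) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo == n) {
        return -1;
    }
    *guest_pc = recs[lo].guest_pc;
    return 0;
}
```

The cost is the memory for the table on every block, paid whether or not an exception ever happens; the benefit is that the exception path no longer depends on the translator being able to replay itself exactly.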

> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index 58b1959..de65bba 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -8001,6 +8001,9 @@ static inline void 
> gen_intermediate_code_internal(X86CPU *cpu,
>  
>  gen_tb_start(tb);
>  for(;;) {
> +gen_update_cc_op(dc);
> +gen_jmp_im(pc_ptr - dc->cs_base);
> +
>  if (unlikely(!QTAILQ_EMPTY(&cs->breakpoints))) {
>  QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
>  if (bp->pc == pc_ptr &&
> diff --git a/translate-all.c b/translate-all.c
> index b6b0e1c..3d4c017 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -212,6 +212,8 @@ static int cpu_restore_state_from_tb(CPUState *cpu, 
> TranslationBlock *tb,
>  int64_t ti;
>  #endif
>  
> +return -1;
> +
>  #ifdef CONFIG_PROFILER
>  ti = profile_getclock();
>  #endif

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Peter Lieven


On 18.06.2015 at 10:42, Kevin Wolf wrote:

On 18.06.2015 at 10:30, Peter Lieven wrote:

On 18.06.2015 at 09:45, Kevin Wolf wrote:

On 18.06.2015 at 09:12, Peter Lieven wrote:

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
 timeout=4999424576) at qemu-timer.c:326
 ts = {tv_sec = 4, tv_nsec = 999424576}
 tvsec = 4
#2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
 at aio-posix.c:231
 node = 0x0
 was_dispatching = false
 ret = 1
 progress = false
#3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0, offset=4292007936,
 qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
 aio_context = 0x563528e0
 co = 0x563888a0
 rwco = {bs = 0x5637eae0, offset = 4292007936,
   qiov = 0x7554f760, is_write = false, ret = 2147483647, flags = 0}
#4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0, sector_num=8382828,
 buf=0x744cc800 "(", nb_sectors=4, is_write=false, flags=0)
 at block.c:2722
 qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size = 2048}
 iov = {iov_base = 0x744cc800, iov_len = 2048}
#5  0x5594b008 in bdrv_read (bs=0x5637eae0, sector_num=8382828,
 buf=0x744cc800 "(", nb_sectors=4) at block.c:2730
No locals.
#6  0x5599acef in blk_read (blk=0x56376820, sector_num=8382828,
 buf=0x744cc800 "(", nb_sectors=4) at block/block-backend.c:404
No locals.
#7  0x55833ed2 in cd_read_sector (s=0x56408f88, lba=2095707,
 buf=0x744cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
 ret = 32767

Here is the problem: The ATAPI emulation uses synchronous blk_read()
instead of the AIO or coroutine interfaces. This means that it keeps
polling for request completion while it holds the BQL until the request
is completed.
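
The difference between the two styles can be shown with a toy model. This is only an illustration and not QEMU code: Request, poll_once, read_sync, read_async and on_complete are made-up names, and "I/O" here simply finishes after a fixed number of event-loop turns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Request {
    int polls_left;                 /* event-loop turns until "I/O" finishes */
    bool done;
    void (*cb)(struct Request *);   /* completion callback, may be NULL */
} Request;

/* One turn of a toy event loop: make progress on the request. */
static void poll_once(Request *req)
{
    if (!req->done && --req->polls_left <= 0) {
        req->done = true;
        if (req->cb) {
            req->cb(req);
        }
    }
}

/*
 * Synchronous style: the caller spins in the event loop until the
 * request completes. Any lock the caller holds stays held for the
 * whole time. Returns the number of turns the caller was stuck.
 */
static int read_sync(Request *req)
{
    int turns = 0;

    while (!req->done) {
        poll_once(req);
        turns++;
    }
    return turns;
}

/*
 * Asynchronous style: register a completion callback and return at
 * once; the caller can drop its locks before the loop runs again.
 */
static void read_async(Request *req, void (*cb)(Request *))
{
    req->cb = cb;
}

static int completions;

static void on_complete(Request *req)
{
    (void)req;
    completions++;
}
```

If the backend never answers (the hung NFS server case), read_sync never returns, while read_async has already returned and only the callback is delayed.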

I will look at this.


I need some further help. My way to "emulate" a hung NFS server is to
block it in the firewall. Currently I face the problem that I cannot mount
a CD ISO via libnfs (nfs://) without hanging QEMU (I previously tried with
a kernel NFS mount). It reads a few sectors and then stalls (maybe another bug):

(gdb) thread apply all bt full

Thread 3 (Thread 0x70c21700 (LWP 29710)):
#0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at 
util/qemu-thread-posix.c:120
err = 
__func__ = "qemu_cond_broadcast"
#1  0x55911164 in rfifolock_unlock (r=r@entry=0x56259910) at 
util/rfifolock.c:75
__PRETTY_FUNCTION__ = "rfifolock_unlock"
#2  0x55875921 in aio_context_release (ctx=ctx@entry=0x562598b0) at 
async.c:329
No locals.
#3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0, 
blocking=blocking@entry=true) at aio-posix.c:272
node = 
was_dispatching = false
i = 
ret = 
progress = false
timeout = 611734526
__PRETTY_FUNCTION__ = "aio_poll"
#4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0, 
offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0, 
is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:552
aio_context = 0x562598b0
co = 
rwco = {bs = 0x5627c0f0, offset = 7038976, qiov = 0x70c208f0, 
is_write = false, ret = 2147483647, flags = (unknown: 0)}
#5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0, 
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(", 
nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
flags=flags@entry=(unknown: 0)) at block/io.c:575
qiov = {iov = 0x70c208e0, niov = 1, nalloc = -1, size = 2048}
iov = {iov_base = 0x57874800, iov_len = 2048}
#6  0x558bc593 in bdrv_read (bs=, 
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(", 
nb_sectors=nb_sectors@entry=4) at block/io.c:583
No locals.
#7  0x558af75d in blk_read (blk=, 
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(", 
nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
ret = 
#8  0x557abb88 in cd_read_sector (sector_size=, buf=0x57874800 
"(", lba=3437, s=0x5760db70) at hw/ide/atapi.c:116
ret = 
#9  ide_atapi_cmd_reply_end (s=0x5760db70) at hw/ide/atapi.c:190
byte_count_limit = 
size = 
ret = 2
#10 0x556398a6 in memory_region_write_accessor (mr=0x577f85d0, addr=, value=0x70c20a68, size=2, shift=, mask=, 
attrs=...)
at /home/lieven/git/qemu/memory.c:459
tmp = 
#11 0x5563956b in access_with_adjusted_size (addr=addr@entry=0, 
value=value@entry=0x70c20a68, size=size@entry=2, access_size_min=, 
access_size_max=,
access=access@entry=0x55639840 , 
mr=mr@entry=0x577f85d0, attrs=attrs@e

Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr

2015-06-18 Thread Paolo Bonzini



On 18/06/2015 11:24, Pavel Dovgaluk wrote:
>>> > > +uint16_t helper_call_ldw_cmmu(CPUArchState *env, target_ulong addr,
>>> > > +  int mmu_idx, uintptr_t retaddr);
>> > 
>> > What about helper_ret_ldw_cmmu for consistency with the DATA_SIZE == 1 
>> > case?
> tcg.h breaks these definitions:
> 
> /* Temporary aliases until backends are converted.  */
> #ifdef TARGET_WORDS_BIGENDIAN
> # define helper_ret_ldsw_mmu  helper_be_ldsw_mmu
> # define helper_ret_lduw_mmu  helper_be_lduw_mmu
> # define helper_ret_ldsl_mmu  helper_be_ldsl_mmu
> # define helper_ret_ldul_mmu  helper_be_ldul_mmu
> # define helper_ret_ldq_mmu   helper_be_ldq_mmu
> # define helper_ret_stw_mmu   helper_be_stw_mmu
> # define helper_ret_stl_mmu   helper_be_stl_mmu
> # define helper_ret_stq_mmu   helper_be_stq_mmu
> #else

Isn't this exactly the same as your helper_call_ldw_cmmu?

Paolo

Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr

2015-06-18 Thread Pavel Dovgaluk

> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> On 18/06/2015 11:24, Pavel Dovgaluk wrote:
> >>> > > +uint16_t helper_call_ldw_cmmu(CPUArchState *env, target_ulong addr,
> >>> > > +  int mmu_idx, uintptr_t retaddr);
> >> >
> >> > What about helper_ret_ldw_cmmu for consistency with the DATA_SIZE == 1 
> >> > case?
> > tcg.h breaks these definitions:
> >
> > /* Temporary aliases until backends are converted.  */
> > #ifdef TARGET_WORDS_BIGENDIAN
> > # define helper_ret_ldsw_mmu  helper_be_ldsw_mmu
> > # define helper_ret_lduw_mmu  helper_be_lduw_mmu
> > # define helper_ret_ldsl_mmu  helper_be_ldsl_mmu
> > # define helper_ret_ldul_mmu  helper_be_ldul_mmu
> > # define helper_ret_ldq_mmu   helper_be_ldq_mmu
> > # define helper_ret_stw_mmu   helper_be_stw_mmu
> > # define helper_ret_stl_mmu   helper_be_stl_mmu
> > # define helper_ret_stq_mmu   helper_be_stq_mmu
> > #else
> 
> Isn't this exactly the same as your helper_call_ldw_cmmu?

Yes, but I can't compile it yet.

Pavel Dovgalyuk

[Qemu-devel] [PULL 0/1] virtio-input: evdev passthrough

2015-06-18 Thread Gerd Hoffmann

  Hi,

New member for the virtio-input family:  evdev passthrough support.

please pull,
  Gerd

The following changes since commit f754c3c9cce3c4789733d9068394be4256dfe6a8:

  Merge remote-tracking branch 'remotes/agraf/tags/signed-s390-for-upstream' 
into staging (2015-06-17 12:43:26 +0100)

are available in the git repository at:


  git://git.kraxel.org/qemu tags/pull-input-20150618-1

for you to fetch changes up to 535017c3852f72e8706c3636d0bb2587920bf57d:

  virtio-input: evdev passthrough (2015-06-18 10:45:12 +0200)


virtio-input: evdev passthrough


Gerd Hoffmann (1):
  virtio-input: evdev passthrough

 hw/input/Makefile.objs   |   1 +
 hw/input/virtio-input-host.c | 182 +++
 hw/virtio/virtio-pci.c   |  30 +++
 hw/virtio/virtio-pci.h   |  10 +++
 include/hw/virtio/virtio-input.h |  13 +++
 5 files changed, 236 insertions(+)
 create mode 100644 hw/input/virtio-input-host.c

[Qemu-devel] [PULL 1/1] virtio-input: evdev passthrough

2015-06-18 Thread Gerd Hoffmann

This allows assigning host input devices to the guest:

qemu -device virtio-input-host-pci,evdev=/dev/input/event

The guest gets exclusive access to the input device, so be careful
with assigning the keyboard if you have only one connected to your
machine.

Signed-off-by: Gerd Hoffmann 
---
 hw/input/Makefile.objs   |   1 +
 hw/input/virtio-input-host.c | 182 +++
 hw/virtio/virtio-pci.c   |  30 +++
 hw/virtio/virtio-pci.h   |  10 +++
 include/hw/virtio/virtio-input.h |  13 +++
 5 files changed, 236 insertions(+)
 create mode 100644 hw/input/virtio-input-host.c

diff --git a/hw/input/Makefile.objs b/hw/input/Makefile.objs
index 0dae710..624ba7e 100644
--- a/hw/input/Makefile.objs
+++ b/hw/input/Makefile.objs
@@ -11,6 +11,7 @@ common-obj-$(CONFIG_VMMOUSE) += vmmouse.o
 ifeq ($(CONFIG_LINUX),y)
 common-obj-$(CONFIG_VIRTIO) += virtio-input.o
 common-obj-$(CONFIG_VIRTIO) += virtio-input-hid.o
+common-obj-$(CONFIG_VIRTIO) += virtio-input-host.o
 endif
 
 obj-$(CONFIG_MILKYMIST) += milkymist-softusb.o
diff --git a/hw/input/virtio-input-host.c b/hw/input/virtio-input-host.c
new file mode 100644
index 000..b16cc4c
--- /dev/null
+++ b/hw/input/virtio-input-host.c
@@ -0,0 +1,182 @@
+/*
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+
+#include "qemu-common.h"
+#include "qemu/sockets.h"
+
+#include "hw/qdev.h"
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-input.h"
+
+#include "standard-headers/linux/input.h"
+
+/* - */
+
+static struct virtio_input_config virtio_input_host_config[] = {
+{ /* empty list */ },
+};
+
+static void virtio_input_host_event(void *opaque)
+{
+VirtIOInputHost *vih = opaque;
+VirtIOInput *vinput = VIRTIO_INPUT(vih);
+struct virtio_input_event virtio;
+struct input_event evdev;
+int rc;
+
+for (;;) {
+rc = read(vih->fd, &evdev, sizeof(evdev));
+if (rc != sizeof(evdev)) {
+break;
+}
+
+virtio.type  = cpu_to_le16(evdev.type);
+virtio.code  = cpu_to_le16(evdev.code);
+virtio.value = cpu_to_le32(evdev.value);
+virtio_input_send(vinput, &virtio);
+}
+}
+
+static void virtio_input_bits_config(VirtIOInputHost *vih,
+ int type, int count)
+{
+virtio_input_config bits;
+int rc, i, size = 0;
+
+memset(&bits, 0, sizeof(bits));
+rc = ioctl(vih->fd, EVIOCGBIT(type, count/8), bits.u.bitmap);
+if (rc < 0) {
+return;
+}
+
+for (i = 0; i < count/8; i++) {
+if (bits.u.bitmap[i]) {
+size = i+1;
+}
+}
+if (size == 0) {
+return;
+}
+
+bits.select = VIRTIO_INPUT_CFG_EV_BITS;
+bits.subsel = type;
+bits.size   = size;
+virtio_input_add_config(VIRTIO_INPUT(vih), &bits);
+}
+
+static void virtio_input_host_realize(DeviceState *dev, Error **errp)
+{
+VirtIOInputHost *vih = VIRTIO_INPUT_HOST(dev);
+VirtIOInput *vinput = VIRTIO_INPUT(dev);
+virtio_input_config id;
+struct input_id ids;
+int rc, ver;
+
+if (!vih->evdev) {
+error_setg(errp, "evdev property is required");
+return;
+}
+
+vih->fd = open(vih->evdev, O_RDWR);
+if (vih->fd < 0)  {
+error_setg_file_open(errp, errno, vih->evdev);
+return;
+}
+qemu_set_nonblock(vih->fd);
+
+rc = ioctl(vih->fd, EVIOCGVERSION, &ver);
+if (rc < 0) {
+error_setg(errp, "%s: is not an evdev device", vih->evdev);
+goto err_close;
+}
+
+rc = ioctl(vih->fd, EVIOCGRAB, 1);
+if (rc < 0) {
+error_setg_errno(errp, errno, "%s: failed to get exclusive access",
+ vih->evdev);
+goto err_close;
+}
+
+memset(&id, 0, sizeof(id));
+ioctl(vih->fd, EVIOCGNAME(sizeof(id.u.string)-1), id.u.string);
+id.select = VIRTIO_INPUT_CFG_ID_NAME;
+id.size = strlen(id.u.string);
+virtio_input_add_config(vinput, &id);
+
+if (ioctl(vih->fd, EVIOCGID, &ids) == 0) {
+memset(&id, 0, sizeof(id));
+id.select = VIRTIO_INPUT_CFG_ID_DEVIDS;
+id.size = sizeof(struct virtio_input_devids);
+id.u.ids.bustype = cpu_to_le16(ids.bustype);
+id.u.ids.vendor  = cpu_to_le16(ids.vendor);
+id.u.ids.product = cpu_to_le16(ids.product);
+id.u.ids.version = cpu_to_le16(ids.version);
+virtio_input_add_config(vinput, &id);
+}
+
+virtio_input_bits_config(vih, EV_KEY, KEY_CNT);
+virtio_input_bits_config(vih, EV_REL, REL_CNT);
+virtio_input_bits_config(vih, EV_ABS, ABS_CNT);
+virtio_input_bits_config(vih, EV_MSC, MSC_CNT);
+virtio_input_bits_config(vih, EV_SW,  SW_CNT);
+
+qemu_set_fd_handler(vih->fd, virtio_input_host_event, NULL, vih);
+return;
+
+err_close:
+close(vih->fd);

Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr

2015-06-18 Thread Paolo Bonzini



On 18/06/2015 11:33, Pavel Dovgaluk wrote:
> > > /* Temporary aliases until backends are converted.  */
> > > #ifdef TARGET_WORDS_BIGENDIAN
> > > # define helper_ret_ldsw_mmu  helper_be_ldsw_mmu
> > > # define helper_ret_lduw_mmu  helper_be_lduw_mmu
> > > # define helper_ret_ldsl_mmu  helper_be_ldsl_mmu
> > > # define helper_ret_ldul_mmu  helper_be_ldul_mmu
> > > # define helper_ret_ldq_mmu   helper_be_ldq_mmu
> > > # define helper_ret_stw_mmu   helper_be_stw_mmu
> > > # define helper_ret_stl_mmu   helper_be_stl_mmu
> > > # define helper_ret_stq_mmu   helper_be_stq_mmu
> > > #else
> > 
> > Isn't this exactly the same as your helper_call_ldw_cmmu?
> 
> Yes, but I can't compile it yet.

I'm not sure what the problem is.  Can you just move this part of
tcg/tcg.h to another header file?

Paolo

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Stefan Hajnoczi

On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven  wrote:
> On 18.06.2015 at 10:42, Kevin Wolf wrote:
>>
>> On 18.06.2015 at 10:30, Peter Lieven wrote:
>>>
>>> On 18.06.2015 at 09:45, Kevin Wolf wrote:
>>>>
>>>> On 18.06.2015 at 09:12, Peter Lieven wrote:
>
> Thread 2 (Thread 0x75550700 (LWP 2636)):
> #0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
> No symbol table info available.
> #1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
>  timeout=4999424576) at qemu-timer.c:326
>  ts = {tv_sec = 4, tv_nsec = 999424576}
>  tvsec = 4
> #2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
>  at aio-posix.c:231
>  node = 0x0
>  was_dispatching = false
>  ret = 1
>  progress = false
> #3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
> offset=4292007936,
>  qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
>  aio_context = 0x563528e0
>  co = 0x563888a0
>  rwco = {bs = 0x5637eae0, offset = 4292007936,
>qiov = 0x7554f760, is_write = false, ret = 2147483647,
> flags = 0}
> #4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
> sector_num=8382828,
>  buf=0x744cc800 "(", nb_sectors=4, is_write=false, flags=0)
>  at block.c:2722
>  qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size =
> 2048}
>  iov = {iov_base = 0x744cc800, iov_len = 2048}
> #5  0x5594b008 in bdrv_read (bs=0x5637eae0,
> sector_num=8382828,
>  buf=0x744cc800 "(", nb_sectors=4) at block.c:2730
> No locals.
> #6  0x5599acef in blk_read (blk=0x56376820,
> sector_num=8382828,
>  buf=0x744cc800 "(", nb_sectors=4) at block/block-backend.c:404
> No locals.
> #7  0x55833ed2 in cd_read_sector (s=0x56408f88,
> lba=2095707,
>  buf=0x744cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
>  ret = 32767

>>>> Here is the problem: The ATAPI emulation uses synchronous blk_read()
>>>> instead of the AIO or coroutine interfaces. This means that it keeps
>>>> polling for request completion while it holds the BQL until the request
>>>> is completed.
>>>
>>> I will look at this.
>
>
> I need some further help. My way to "emulate" a hung NFS server is to
> block it in the firewall. Currently I face the problem that I cannot mount
> a CD ISO via libnfs (nfs://) without hanging QEMU (I previously tried with
> a kernel NFS mount). It reads a few sectors and then stalls (maybe another
> bug):
>
> (gdb) thread apply all bt full
>
> Thread 3 (Thread 0x70c21700 (LWP 29710)):
> #0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
> util/qemu-thread-posix.c:120
> err = 
> __func__ = "qemu_cond_broadcast"
> #1  0x55911164 in rfifolock_unlock (r=r@entry=0x56259910) at
> util/rfifolock.c:75
> __PRETTY_FUNCTION__ = "rfifolock_unlock"
> #2  0x55875921 in aio_context_release (ctx=ctx@entry=0x562598b0)
> at async.c:329
> No locals.
> #3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
> blocking=blocking@entry=true) at aio-posix.c:272
> node = 
> was_dispatching = false
> i = 
> ret = 
> progress = false
> timeout = 611734526
> __PRETTY_FUNCTION__ = "aio_poll"
> #4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
> offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
> is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
> block/io.c:552
> aio_context = 0x562598b0
> co = 
> rwco = {bs = 0x5627c0f0, offset = 7038976, qiov =
> 0x70c208f0, is_write = false, ret = 2147483647, flags = (unknown: 0)}
> #5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0,
> sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(",
> nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
> flags=flags@entry=(unknown: 0)) at block/io.c:575
> qiov = {iov = 0x70c208e0, niov = 1, nalloc = -1, size = 2048}
> iov = {iov_base = 0x57874800, iov_len = 2048}
> #6  0x558bc593 in bdrv_read (bs=,
> sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(",
> nb_sectors=nb_sectors@entry=4) at block/io.c:583
> No locals.
> #7  0x558af75d in blk_read (blk=,
> sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(",
> nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
> ret = 
> #8  0x557abb88 in cd_read_sector (sector_size=,
> buf=0x57874800 "(", lba=3437, s=0x5760db70) at hw/ide/atapi.c:116
> ret = 
> #9  ide_atapi_cmd_reply_end (s=0x5760db70) at hw/ide/atapi.c:190
> byte_count_limit = 
> size = 
>

[Qemu-devel] [PATCH] ui/egl: use stride and y0_top

2015-06-18 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 Makefile |  4 +++-
 include/ui/egl-proto.h   |  2 ++
 qemu-eglview.c   | 14 --
 ui/egl.c |  4 
 ui/shader/texture-blit-flip.vert | 10 ++
 5 files changed, 31 insertions(+), 3 deletions(-)
 create mode 100644 ui/shader/texture-blit-flip.vert

diff --git a/Makefile b/Makefile
index 67eb59a..d10133a 100644
--- a/Makefile
+++ b/Makefile
@@ -458,7 +458,9 @@ ui/console-gl.o: $(SRC_PATH)/ui/console-gl.c \
ui/shader/texture-blit-vert.h ui/shader/texture-blit-frag.h
 
 qemu-eglview.o: $(SRC_PATH) qemu-eglview.c \
-   ui/shader/texture-blit-vert.h ui/shader/texture-blit-oes-frag.h
+   ui/shader/texture-blit-vert.h \
+   ui/shader/texture-blit-flip-vert.h \
+   ui/shader/texture-blit-oes-frag.h
 
 # documentation
 MAKEINFO=makeinfo
diff --git a/include/ui/egl-proto.h b/include/ui/egl-proto.h
index 1878224..3e149ed 100644
--- a/include/ui/egl-proto.h
+++ b/include/ui/egl-proto.h
@@ -24,7 +24,9 @@ typedef struct egl_msg {
 struct egl_newbuf {
 uint32_t width;
 uint32_t height;
+uint32_t stride;
 uint32_t fourcc;
+bool y0_top;
 } newbuf;
 struct egl_ptr_set {
 uint32_t x;
diff --git a/qemu-eglview.c b/qemu-eglview.c
index efe992b..ed7bee0 100644
--- a/qemu-eglview.c
+++ b/qemu-eglview.c
@@ -42,10 +42,12 @@ static GIOChannel *ioc;
 
 static uint32_t buf_width;
 static uint32_t buf_height;
+static bool buf_y0_top;
 static EGLImageKHR buf_image = EGL_NO_IMAGE_KHR;
 static GLuint buf_tex_id;
 
 static GLint texture_blit_prog;
+static GLint texture_blit_flip_prog;
 
 #define GL_CHECK_ERROR() do {   \
 GLint err = glGetError();   \
@@ -56,6 +58,7 @@ static GLint texture_blit_prog;
 } while (0)
 
 #include "ui/shader/texture-blit-vert.h"
+#include "ui/shader/texture-blit-flip-vert.h"
 #include "ui/shader/texture-blit-oes-frag.h"
 
 /* -- */
@@ -106,7 +109,11 @@ static gboolean egl_draw(GtkWidget *widget, cairo_t *cr, 
void *opaque)
 glClearColor(0.1f, 0.1f, 0.1f, 0.0f);
 glClear(GL_COLOR_BUFFER_BIT);
 
-qemu_gl_run_texture_blit(texture_blit_prog);
+if (buf_y0_top) {
+qemu_gl_run_texture_blit(texture_blit_flip_prog);
+} else {
+qemu_gl_run_texture_blit(texture_blit_prog);
+}
 eglSwapBuffers(qemu_egl_display, egl_surface);
 
 return TRUE;
@@ -131,13 +138,14 @@ static void egl_newbuf(egl_msg *msg, int msgfd)
 msgfd, msg->u.newbuf.width, msg->u.newbuf.height);
 buf_width = msg->u.newbuf.width;
 buf_height = msg->u.newbuf.height;
+buf_y0_top = msg->u.newbuf.y0_top;
 
 gtk_widget_set_size_request(draw, buf_width, buf_height);
 
 attrs[0] = EGL_DMA_BUF_PLANE0_FD_EXT;
 attrs[1] = msgfd;
 attrs[2] = EGL_DMA_BUF_PLANE0_PITCH_EXT;
-attrs[3] = buf_width * 4;
+attrs[3] = msg->u.newbuf.stride;
 attrs[4] = EGL_DMA_BUF_PLANE0_OFFSET_EXT;
 attrs[5] = 0;
 attrs[6] = EGL_WIDTH;
@@ -420,6 +428,8 @@ int main(int argc, char *argv[])
 gtk_widget_set_double_buffered(draw, FALSE);
 texture_blit_prog = qemu_gl_create_compile_link_program
 (texture_blit_vert_src, texture_blit_oes_frag_src);
+texture_blit_flip_prog = qemu_gl_create_compile_link_program
+(texture_blit_flip_vert_src, texture_blit_oes_frag_src);
 if (!texture_blit_prog) {
 fprintf(stderr, "shader compile/link failure\n");
 exit(1);
diff --git a/ui/egl.c b/ui/egl.c
index e420beb..c63b453 100644
--- a/ui/egl.c
+++ b/ui/egl.c
@@ -293,7 +293,9 @@ static void egl_gfx_switch(DisplayChangeListener *dcl,
 edpy->newbuf.display = edpy->idx;
 edpy->newbuf.u.newbuf.width = surface_width(edpy->ds);
 edpy->newbuf.u.newbuf.height = surface_height(edpy->ds);
+edpy->newbuf.u.newbuf.stride = stride;
 edpy->newbuf.u.newbuf.fourcc = fourcc;
+edpy->newbuf.u.newbuf.y0_top = false;
 
 egl_send_all(edpy->egl, &edpy->newbuf, edpy->dmabuf_fd);
 }
@@ -344,7 +346,9 @@ static void egl_scanout(DisplayChangeListener *dcl,
 edpy->newbuf.display = edpy->idx;
 edpy->newbuf.u.newbuf.width = surface_width(edpy->ds);
 edpy->newbuf.u.newbuf.height = surface_height(edpy->ds);
+edpy->newbuf.u.newbuf.stride = stride;
 edpy->newbuf.u.newbuf.fourcc = fourcc;
+edpy->newbuf.u.newbuf.y0_top = backing_y_0_top;
 
 egl_send_all(edpy->egl, &edpy->newbuf, edpy->dmabuf_fd);
 }
diff --git a/ui/shader/texture-blit-flip.vert b/ui/shader/texture-blit-flip.vert
new file mode 100644
index 000..ba081fa
--- /dev/null
+++ b/ui/shader/texture-blit-flip.vert
@@ -0,0 +1,10 @@
+
+#version 300 es
+
+in vec2  in_position;
+out vec2 ex_tex_coord;
+
+void main(void) {
+gl_Position = vec4(in_position, 0.0, 1.0);
+ex_tex_

Re: [Qemu-devel] [PULL 1/1] virtio-input: evdev passthrough

2015-06-18 Thread Michael S. Tsirkin

On Thu, Jun 18, 2015 at 11:33:55AM +0200, Gerd Hoffmann wrote:
> This allows to assign host input devices to the guest:
> 
> qemu -device virtio-input-host-pci,evdev=/dev/input/event
> 
> The guest gets exclusive access to the input device, so be careful
> with assigning the keyboard if you have only one connected to your
> machine.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/input/Makefile.objs   |   1 +
>  hw/input/virtio-input-host.c | 182 
> +++
>  hw/virtio/virtio-pci.c   |  30 +++
>  hw/virtio/virtio-pci.h   |  10 +++
>  include/hw/virtio/virtio-input.h |  13 +++
>  5 files changed, 236 insertions(+)
>  create mode 100644 hw/input/virtio-input-host.c
> 
> diff --git a/hw/input/Makefile.objs b/hw/input/Makefile.objs
> index 0dae710..624ba7e 100644
> --- a/hw/input/Makefile.objs
> +++ b/hw/input/Makefile.objs
> @@ -11,6 +11,7 @@ common-obj-$(CONFIG_VMMOUSE) += vmmouse.o
>  ifeq ($(CONFIG_LINUX),y)
>  common-obj-$(CONFIG_VIRTIO) += virtio-input.o
>  common-obj-$(CONFIG_VIRTIO) += virtio-input-hid.o
> +common-obj-$(CONFIG_VIRTIO) += virtio-input-host.o
>  endif
>  
>  obj-$(CONFIG_MILKYMIST) += milkymist-softusb.o
> diff --git a/hw/input/virtio-input-host.c b/hw/input/virtio-input-host.c
> new file mode 100644
> index 000..b16cc4c
> --- /dev/null
> +++ b/hw/input/virtio-input-host.c
> @@ -0,0 +1,182 @@
> +/*
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * (at your option) any later version.  See the COPYING file in the
> + * top-level directory.
> + */
> +
> +#include "qemu-common.h"
> +#include "qemu/sockets.h"
> +
> +#include "hw/qdev.h"
> +#include "hw/virtio/virtio.h"
> +#include "hw/virtio/virtio-input.h"
> +
> +#include "standard-headers/linux/input.h"
> +
> +/* - */
> +
> +static struct virtio_input_config virtio_input_host_config[] = {
> +{ /* empty list */ },
> +};
> +
> +static void virtio_input_host_event(void *opaque)
> +{
> +VirtIOInputHost *vih = opaque;
> +VirtIOInput *vinput = VIRTIO_INPUT(vih);
> +struct virtio_input_event virtio;
> +struct input_event evdev;
> +int rc;
> +
> +for (;;) {
> +rc = read(vih->fd, &evdev, sizeof(evdev));
> +if (rc != sizeof(evdev)) {
> +break;
> +}
> +
> +virtio.type  = cpu_to_le16(evdev.type);
> +virtio.code  = cpu_to_le16(evdev.code);
> +virtio.value = cpu_to_le32(evdev.value);
> +virtio_input_send(vinput, &virtio);
> +}
> +}
> +
> +static void virtio_input_bits_config(VirtIOInputHost *vih,
> + int type, int count)
> +{
> +virtio_input_config bits;
> +int rc, i, size = 0;
> +
> +memset(&bits, 0, sizeof(bits));
> +rc = ioctl(vih->fd, EVIOCGBIT(type, count/8), bits.u.bitmap);
> +if (rc < 0) {
> +return;
> +}
> +
> +for (i = 0; i < count/8; i++) {
> +if (bits.u.bitmap[i]) {
> +size = i+1;
> +}
> +}
> +if (size == 0) {
> +return;
> +}
> +
> +bits.select = VIRTIO_INPUT_CFG_EV_BITS;
> +bits.subsel = type;
> +bits.size   = size;
> +virtio_input_add_config(VIRTIO_INPUT(vih), &bits);
> +}
> +
> +static void virtio_input_host_realize(DeviceState *dev, Error **errp)
> +{
> +VirtIOInputHost *vih = VIRTIO_INPUT_HOST(dev);
> +VirtIOInput *vinput = VIRTIO_INPUT(dev);
> +virtio_input_config id;
> +struct input_id ids;
> +int rc, ver;
> +
> +if (!vih->evdev) {
> +error_setg(errp, "evdev property is required");
> +return;
> +}
> +
> +vih->fd = open(vih->evdev, O_RDWR);
> +if (vih->fd < 0)  {
> +error_setg_file_open(errp, errno, vih->evdev);
> +return;
> +}
> +qemu_set_nonblock(vih->fd);
> +
> +rc = ioctl(vih->fd, EVIOCGVERSION, &ver);
> +if (rc < 0) {
> +error_setg(errp, "%s: is not an evdev device", vih->evdev);
> +goto err_close;
> +}
> +
> +rc = ioctl(vih->fd, EVIOCGRAB, 1);
> +if (rc < 0) {
> +error_setg_errno(errp, errno, "%s: failed to get exclusive access",
> + vih->evdev);
> +goto err_close;
> +}
> +
> +memset(&id, 0, sizeof(id));
> +ioctl(vih->fd, EVIOCGNAME(sizeof(id.u.string)-1), id.u.string);
> +id.select = VIRTIO_INPUT_CFG_ID_NAME;
> +id.size = strlen(id.u.string);
> +virtio_input_add_config(vinput, &id);
> +
> +if (ioctl(vih->fd, EVIOCGID, &ids) == 0) {
> +memset(&id, 0, sizeof(id));
> +id.select = VIRTIO_INPUT_CFG_ID_DEVIDS;
> +id.size = sizeof(struct virtio_input_devids);
> +id.u.ids.bustype = cpu_to_le16(ids.bustype);
> +id.u.ids.vendor  = cpu_to_le16(ids.vendor);
> +id.u.ids.product = cpu_to_le16(ids.product);
> +id.u.ids.version = cpu_to_le16(ids.version);
> +virtio_input_add_config(vinput, &id);
> +}
> +
> +vi

Re: [Qemu-devel] [PATCH v7 0/9] Add limited support of VMware's hyper-call rpc

2015-06-18 Thread Michael S. Tsirkin

On Thu, Jun 18, 2015 at 10:33:00AM +0200, Paolo Bonzini wrote:
> 
> 
> On 18/06/2015 09:58, Michael S. Tsirkin wrote:
> > > If I am reading this correctly, I should add
> > > 
> > > Acked-by: Paolo Bonzini 
> > > 
> > > to these 4 patches.
> > > 
> > > Since I have never sent a pull request to QEMU before here is what I
> > > think should be in it:
> > 
> > I'd like to see a version with comments addressed first.
> 
> These four patches are unrelated.
> 
> Paolo


Quite possibly but personally I'm confused.
May I see a series with either just patches intended for merge,
or the ones not intended for merge called out explicitly?

-- 
MST

Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386

2015-06-18 Thread Aurelien Jarno

On 2015-06-18 11:29, Paolo Bonzini wrote:
> On 18/06/2015 11:08, Aurelien Jarno wrote:
> > For an i386 guest still on an x86 host, I get a 4% slower boot time by
> > not using retranslation (see patch below). This is not that much
> > compared to the complexity retranslation brings us.
> 
> QEMU could just always compute and store the restore_state information.
>  TCG needs to help filling it in (a new TCG opcode?), but it should be easy.

Yes, that was another approach I had in mind (I called it exception
table in my other mail), but it requires a tiny bit more work than just
saving the CPU state all the time. The problem is that the state
information we want to save varies from target to target. Going
through a TCG opcode means we can use the liveness analysis pass to save
the minimum amount of data.

That said, I would like to push the idea of always saving the CPU
state a bit further to see if we can keep the same performance. There are
still improvements to make, by removing more code on the core side (like
finding the call to tb_find_pc, which is now useless), or on the target
side by checking/improving helper flags. We might save the CPU state too
often if a helper doesn't declare that it doesn't touch globals.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

[Qemu-devel] How to trigger faults for missing peripherals?

2015-06-18 Thread Liviu Ionescu

In order to make the Cortex-M emulation accurate, I need to configure the
missing address ranges to trigger memory faults.

I noticed that the emulator defines a memory range covering the entire 64-bit
memory space. Is it possible to make it trigger exceptions?

If not, what would be the solution? To define the entire Cortex-M 32-bit
memory space as a memory range that triggers exceptions, and define the
existing memory ranges on top of it?


Regards,

Liviu

Re: [Qemu-devel] [Qemu-block] RFC cdrom in own thread?

2015-06-18 Thread Peter Lieven


Am 18.06.2015 um 11:36 schrieb Stefan Hajnoczi:

On Thu, Jun 18, 2015 at 10:29 AM, Peter Lieven  wrote:

Am 18.06.2015 um 10:42 schrieb Kevin Wolf:

Am 18.06.2015 um 10:30 hat Peter Lieven geschrieben:

Am 18.06.2015 um 09:45 schrieb Kevin Wolf:

Am 18.06.2015 um 09:12 hat Peter Lieven geschrieben:

Thread 2 (Thread 0x75550700 (LWP 2636)):
#0  0x75d87aa3 in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x55955d91 in qemu_poll_ns (fds=0x563889c0, nfds=3,
  timeout=4999424576) at qemu-timer.c:326
  ts = {tv_sec = 4, tv_nsec = 999424576}
  tvsec = 4
#2  0x55956feb in aio_poll (ctx=0x563528e0, blocking=true)
  at aio-posix.c:231
  node = 0x0
  was_dispatching = false
  ret = 1
  progress = false
#3  0x5594aeed in bdrv_prwv_co (bs=0x5637eae0,
offset=4292007936,
  qiov=0x7554f760, is_write=false, flags=0) at block.c:2699
  aio_context = 0x563528e0
  co = 0x563888a0
  rwco = {bs = 0x5637eae0, offset = 4292007936,
qiov = 0x7554f760, is_write = false, ret = 2147483647,
flags = 0}
#4  0x5594afa9 in bdrv_rw_co (bs=0x5637eae0,
sector_num=8382828,
  buf=0x744cc800 "(", nb_sectors=4, is_write=false, flags=0)
  at block.c:2722
  qiov = {iov = 0x7554f780, niov = 1, nalloc = -1, size =
2048}
  iov = {iov_base = 0x744cc800, iov_len = 2048}
#5  0x5594b008 in bdrv_read (bs=0x5637eae0,
sector_num=8382828,
  buf=0x744cc800 "(", nb_sectors=4) at block.c:2730
No locals.
#6  0x5599acef in blk_read (blk=0x56376820,
sector_num=8382828,
  buf=0x744cc800 "(", nb_sectors=4) at block/block-backend.c:404
No locals.
#7  0x55833ed2 in cd_read_sector (s=0x56408f88,
lba=2095707,
  buf=0x744cc800 "(", sector_size=2048) at hw/ide/atapi.c:116
  ret = 32767

Here is the problem: The ATAPI emulation uses synchronous blk_read()
instead of the AIO or coroutine interfaces. This means that it keeps
polling for request completion while it holds the BQL until the request
is completed.
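
To illustrate the difference, here is a minimal toy model in C. It is not
QEMU code: every name in it (toy_request, toy_poll, toy_read_sync,
toy_read_async, toy_store_result) is hypothetical, and the "backend" simply
answers after a fixed number of polls. The synchronous variant loops until
the request completes, so whatever lock its caller holds stays held; the
asynchronous variant only records a completion callback and returns at once.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy request: completes after a fixed number of poll calls. */
typedef struct toy_request {
    int polls_left;                     /* polls until the backend "answers" */
    bool done;
    int ret;
    void (*cb)(void *opaque, int ret);  /* completion callback, may be NULL */
    void *opaque;
} toy_request;

/* One event-loop iteration: make progress, fire the callback on completion. */
static void toy_poll(toy_request *req)
{
    if (req->done) {
        return;
    }
    if (req->polls_left > 0) {
        req->polls_left--;
    }
    if (req->polls_left == 0) {
        req->done = true;
        req->ret = 0;
        if (req->cb) {
            req->cb(req->opaque, req->ret);
        }
    }
}

/* Synchronous style: the caller is stuck in this loop until completion,
 * so a lock it holds stays held for the whole duration. */
static int toy_read_sync(toy_request *req, int latency)
{
    req->polls_left = latency;
    req->done = false;
    req->ret = -1;
    req->cb = NULL;
    req->opaque = NULL;
    while (!req->done) {
        toy_poll(req);
    }
    return req->ret;
}

/* Asynchronous style: record the callback and return immediately; the
 * event loop calls toy_poll() later and the result arrives through cb. */
static void toy_read_async(toy_request *req, int latency,
                           void (*cb)(void *opaque, int ret), void *opaque)
{
    req->polls_left = latency;
    req->done = false;
    req->ret = -1;
    req->cb = cb;
    req->opaque = opaque;
}

/* Example callback: stores the result in an int supplied by the caller. */
static void toy_store_result(void *opaque, int ret)
{
    *(int *)opaque = ret;
}
```

In the asynchronous case the caller can drop its lock between submitting the
request and receiving the callback, which is what the synchronous loop
prevents.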

I will look at this.


I need some further help. My way to "emulate" a hung NFS server is to
block it in the firewall. Currently I face the problem that I cannot mount
a CD ISO via libnfs (nfs://) without hanging QEMU (I previously tried with
a kernel NFS mount). It reads a few sectors and then stalls (maybe another
bug):

(gdb) thread apply all bt full

Thread 3 (Thread 0x70c21700 (LWP 29710)):
#0  qemu_cond_broadcast (cond=cond@entry=0x56259940) at
util/qemu-thread-posix.c:120
 err = 
 __func__ = "qemu_cond_broadcast"
#1  0x55911164 in rfifolock_unlock (r=r@entry=0x56259910) at
util/rfifolock.c:75
 __PRETTY_FUNCTION__ = "rfifolock_unlock"
#2  0x55875921 in aio_context_release (ctx=ctx@entry=0x562598b0)
at async.c:329
No locals.
#3  0x5588434c in aio_poll (ctx=ctx@entry=0x562598b0,
blocking=blocking@entry=true) at aio-posix.c:272
 node = 
 was_dispatching = false
 i = 
 ret = 
 progress = false
 timeout = 611734526
 __PRETTY_FUNCTION__ = "aio_poll"
#4  0x558bc43d in bdrv_prwv_co (bs=bs@entry=0x5627c0f0,
offset=offset@entry=7038976, qiov=qiov@entry=0x70c208f0,
is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at
block/io.c:552
 aio_context = 0x562598b0
 co = 
 rwco = {bs = 0x5627c0f0, offset = 7038976, qiov =
0x70c208f0, is_write = false, ret = 2147483647, flags = (unknown: 0)}
#5  0x558bc533 in bdrv_rw_co (bs=0x5627c0f0,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(",
nb_sectors=nb_sectors@entry=4, is_write=is_write@entry=false,
 flags=flags@entry=(unknown: 0)) at block/io.c:575
 qiov = {iov = 0x70c208e0, niov = 1, nalloc = -1, size = 2048}
 iov = {iov_base = 0x57874800, iov_len = 2048}
#6  0x558bc593 in bdrv_read (bs=,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(",
nb_sectors=nb_sectors@entry=4) at block/io.c:583
No locals.
#7  0x558af75d in blk_read (blk=,
sector_num=sector_num@entry=13748, buf=buf@entry=0x57874800 "(",
nb_sectors=nb_sectors@entry=4) at block/block-backend.c:493
 ret = 
#8  0x557abb88 in cd_read_sector (sector_size=,
buf=0x57874800 "(", lba=3437, s=0x5760db70) at hw/ide/atapi.c:116
 ret = 
#9  ide_atapi_cmd_reply_end (s=0x5760db70) at hw/ide/atapi.c:190
 byte_count_limit = 
 size = 
 ret = 2

This is still the same scenario Kevin explained.

The ATAPI CD-ROM emulation code is using synchronous blk_read().  This
function holds the QEMU global mutex while waiting for the I/O request
to complete.  This blocks other vcpu threads and the main loop thread.

The solution is to convert the CD-ROM emulat

Re: [Qemu-devel] [Qemu-trivial] [PATCH 2/2] Check value for invalid negative values

2015-06-18 Thread Frediano Ziglio

For the same reason that there is a v >= l test.
The v >= l test implies that the value can be out of range, so it is not
always a constant within the range.
The patch adds the v < 0 check so that every invalid value is caught. As these
checks are executed only for logging, there should be no performance penalty.
I also hope the compiler is able to optimize

if (v < 0 || v >= l)

with 

if ((unsigned) v >= l)
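
A minimal sketch of the two forms, assuming a table length that fits in an
int; the helper names are hypothetical and not taken from qxl-logger.c. Note
that when l is declared as size_t, as in qxl_v2n, C's usual arithmetic
conversions already make v >= l an unsigned comparison on common platforms,
so a negative v compares as a very large value.

```c
#include <stddef.h>

/* Explicit form: reject negative values, then compare against the length. */
static int out_of_range_explicit(size_t l, int v)
{
    return v < 0 || (size_t)v >= l;
}

/* Single-comparison form: a negative int converts to a large unsigned
 * value, which is >= l for any table length that fits in an int. */
static int out_of_range_unsigned(size_t l, int v)
{
    return (unsigned)v >= l;
}
```

Both helpers return 1 for v = -1 and v = 4 with l = 4, and 0 for v = 0 and
v = 3.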

Frediano

> 
> 11.06.2015 16:17, Frediano Ziglio wrote:
> > In qxl_v2n check that value is not negative.
> 
> Why do you think it is necessary?
> 
> Thanks,
> 
> /mjt
> 
> > Signed-off-by: Frediano Ziglio 
> > ---
> >  hw/display/qxl-logger.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/hw/display/qxl-logger.c b/hw/display/qxl-logger.c
> > index d944d3f..faed869 100644
> > --- a/hw/display/qxl-logger.c
> > +++ b/hw/display/qxl-logger.c
> > @@ -93,7 +93,7 @@ static const char *const spice_cursor_type[] = {
> >  
> >  static const char *qxl_v2n(const char *const n[], size_t l, int v)
> >  {
> > -if (v >= l || !n[v]) {
> > +if (v < 0 || v >= l || !n[v]) {
> >  return "???";
> >  }
> >  return n[v];
> 
> 
> 
>

Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386

2015-06-18 Thread Paolo Bonzini

On 18/06/2015 11:42, Aurelien Jarno wrote:
>> > QEMU could just always compute and store the restore_state information.
>> >  TCG needs to help filling it in (a new TCG opcode?), but it should be 
>> > easy.
> Yes, that was another approach I had in mind (I called it exception
> table in my other mail),

Okay, understood.  My idea was more like always generating the gen_op_*
arrays.

> but it requires a tiny bit more work than just
> saving the CPU state all the time. The problem is that the state
> information we want to save varies from target to target. Going
> through a TCG opcode means we can use the liveness analysis pass to save
> the minimum amount of data.

I mentioned a TCG opcode because the target PC is not available inside
the translator.  So the translator could pepper the TCG instruction
stream with things like

 checkpoint  $target_pc, $target_cc_op, $0

TCG can then use them to fill in an array stored inside the
TranslationBlock, together with the host PC.  Since the gen_opc_pc,
gen_opc_instr_start, gen_opc_icount arrays are inside tcg_ctx, it may be
a good idea to store the checkpoint information compressed in a byte
array (e.g. as a series of ULEB128 values---the host and target PCs can
even be stored as deltas from the last value).
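
A minimal sketch of ULEB128 encoding and decoding for such a byte array,
assuming unsigned values and well-formed input; the function names are
hypothetical and not part of QEMU. Deltas that can go negative would need a
signed variant (SLEB128) instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Write v to buf in ULEB128 form: 7 payload bits per byte, high bit set on
 * every byte except the last. Returns the number of bytes written (at most
 * 10 for a 64-bit value). */
static size_t uleb128_encode(uint8_t *buf, uint64_t v)
{
    size_t n = 0;
    do {
        uint8_t byte = v & 0x7f;
        v >>= 7;
        if (v != 0) {
            byte |= 0x80;   /* more bytes follow */
        }
        buf[n++] = byte;
    } while (v != 0);
    return n;
}

/* Read one ULEB128 value from buf into *out. Returns the number of bytes
 * consumed. Assumes the input is a valid encoding of at most 10 bytes. */
static size_t uleb128_decode(const uint8_t *buf, uint64_t *out)
{
    uint64_t v = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;
    do {
        byte = buf[n++];
        v |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    *out = v;
    return n;
}
```

Small deltas between consecutive PCs fit in one or two bytes this way, for
example 300 encodes as the two bytes 0xAC 0x02.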

As a first step, gen_intermediate_code_pc and tcg_gen_code_search_pc can
then be merged into a single target-independent function that
uncompresses the byte array up to the required host PC into tcg_ctx.
Later you can optimize them to remove the tcg_ctx arrays altogether.

So the patches could be something like this:

1) SPARC: put the jump target information directly in gen_opc_* without
using gen_opc_jump_pc (not trivial)

2) a few targets: instead of gen_opc_* arrays, use a new generic member
of tcg_ctx (similar to how csbase is used generically), e.g.
tcg_ctx.gen_opc_target1[] and tcg_ctx.gen_opc_target2[].

3) all targets: always fill in tcg_ctx.gen_*, even if search_pc is false

4) TCG: add support for a checkpoint operation, make it fill in
tcg_ctx.gen_*

5) all targets: change explicit filling of tcg_ctx.gen_* to use the
checkpoint operation

6) TCG/translate-all: convert gen_intermediate_code_pc as outlined above

> That said, I would like to push the idea of always saving the CPU
> state a bit further to see if we can keep the same performance. There are
> still improvements to make, by removing more code on the core side (like
> finding the call to tb_find_pc, which is now useless), or on the target
> side by checking/improving helper flags. We might save the CPU state too
> often if a helper doesn't declare that it doesn't touch globals.

True, on the other hand there are a lot of helpers to audit...

Paolo

Re: [Qemu-devel] [PATCH] cpu-exec: Do not invalidate original TB in cpu_exec_nocache()

2015-06-18 Thread Sergey Fedorov

On 18.06.2015 09:57, Paolo Bonzini wrote:
> On 17/06/2015 19:54, Sergey Fedorov wrote:
>>  
>> -/* tb_gen_code can flush our orig_tb, invalidate it now */
>> -tb_phys_invalidate(orig_tb, -1);
>> -tb = tb_gen_code(cpu, pc, cs_base, flags,
>> +tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
>>   max_cycles | CF_NOCACHE);
>> +tb->orig_tb = orig_tb;
> What happens here if tb_gen_code calls tb_flush?
>
> Paolo

I think I understand. Did you mean tcg_ctx.tb_ctx.tb_invalidated_flag
should be checked here?

Sergey

Re: [Qemu-devel] [PATCH] ui/egl: use stride and y0_top

2015-06-18 Thread Gerd Hoffmann

On Do, 2015-06-18 at 11:37 +0200, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann 

Oops, wrong patch, scratch that.

sorry,
  Gerd
