On Mon, Feb 13, 2023 at 09:38:50PM +0200, Yonatan Shtarkman wrote: > Thanks for looking into this Rich and Daniel, and thanks for the profound > explanation. > See my answers below. > > `'i' is not used inside the loop? Or is this error intermittent?` - 'i' can > be > removed, the error is intermittent. > This reproduces on libguestfs 1.46.2 and 1.48.6. > > `This is specifically a Python API problem or would it affect > the C API too?` - I assume it's not in the C API, since this issue doesn't > reproduce when using guestfish. > > I would likely need a complete self-contained reproducer to really go > anywhere with this.` - I can supply a container image with libguestfs 1.46.2 > running the for loop. Let me know if that helps.
Could you come up with a Python program, or a Python + guestfish program. Something like ... $ guestfish -N fs -m /dev/sda1 touch '/xxxó' + something in Python that uses the image (test1.img) to reproduce it? Rich. > Also, as a workaround, I avoided calling the event callback if null is > returned > by Py_BuildValue (still print the error and release the thread). > This seems to work for our usage (we only use the event callbacks for > logging), > any potential issues I'm missing? > > On Mon, Feb 13, 2023 at 8:14 PM Daniel P. Berrangé <berra...@redhat.com> > wrote: > > On Mon, Feb 13, 2023 at 06:07:58PM +0000, Richard W.M. Jones wrote: > > On Sun, Feb 12, 2023 at 03:31:08PM +0200, Yonatan Shtarkman wrote: > > > Hey, > > > When downloading a file whose path contains multi-byte utf-8, > libguestfs > > > sometimes crashes. > > > This reproduces when using python, and not when using guestfish. > > > > > > Code to reproduce: > > > for i in range(2000): > > > g.download ('/xxxó', '/tmp/1') > > > > 'i' is not used inside the loop? Or is this error intermittent? > > > > > #0 raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/ > raise.c:50 > > > #1 0x00007ffff7fac140 in <signal handler called> () at /lib/ > x86_64-linux-gnu/ > > > libpthread.so.0 > > > #2 0x00007ffff6f77701 in _Py_INCREF (op=<optimized out>) at /usr/ > include/ > > > python3.9/object.h:408 > > > #3 guestfs_int_py_event_callback_wrapper > > > (g=<optimized out>, flags=<optimized out>, array_len=0, array=0x0, > buf_len= > > > 47, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \ > 303by", > > > event_handle=0, event=16, callback=0x7ffff2516600) at handle.c:137 > > > #4 guestfs_int_py_event_callback_wrapper > > > (g=<optimized out>, callback=0x7ffff2516600, event=16, > event_handle > =0, > > > flags=<optimized out>, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm > --debug > > > settle -E \303by", buf_len=47, array=0x0, array_len=0) at handle.c:104 > > > #5 0x00007ffff6e0076a in guestfs_int_call_callbacks_message (g= > 0xf31290, event > > > =16, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \ > 303by", > > > buf_len=47) > > > at events.c:117 > > > #6 0x00007ffff6e1702e in guestfs_int_log_message_callback > > > (g=g@entry=0xf31290, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm > --debug > > > settle -E \303by", len=len@entry=47) at proto.c:145 > > > #7 0x00007ffff6dfb759 in handle_log_message (g=g@entry=0xf31290, > conn= > > > conn@entry=0x110e280) at conn-socket.c:395 > > > #8 0x00007ffff6dfbd63 in read_data (len=4, bufv=<optimized out>, > connv > = > > > <optimized out>, g=<optimized out>) at conn-socket.c:179 > > > #9 read_data (g=0xf31290, connv=0x110e280, bufv=<optimized out>, len= > 4) at > > > conn-socket.c:142 > > > #10 0x00007ffff6e1764a in recv_from_daemon (buf_rtn=0x7fffffffd858, > size_rtn= > > > 0x7fffffffd854, g=0xf31290) at proto.c:545 > > > #11 guestfs_int_recv_from_daemon (g=g@entry=0xf31290, size_rtn= > size_rtn@entry= > > > 0x7fffffffd854, buf_rtn=buf_rtn@entry=0x7fffffffd858) at proto.c:623 > > > #12 0x00007ffff6e17a5a in guestfs_int_recv > > > (g=g@entry=0xf31290, fn=fn@entry=0x7ffff6e3b3e8 "download", hdr= > hdr@entry= > > > 0x7fffffffd920, err=err@entry=0x7fffffffd8f0, xdrp=xdrp@entry=0x0, > ret= > > > ret@entry=0x0) > > > at proto.c:668 > > > > > > I debugged this issue and noticed that the appliance logs > from commandrvf are > > > truncated, leading to parse failure (missing utf-8 additional bytes): > > > https://github.com/libguestfs/libguestfs/blob/master/python/handle.c# > L134 > > > UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position > 0: > invalid > > > start byte > > > > So I thought we'd fixed this in: > > > > https://github.com/libguestfs/libguestfs/commit/ > 0ee02e0117527b86a31b2a88a14994ce7f15571f > > > > This is specifically a Python API problem or would it affect > > the C API too? > > The difference with any C API is that almost nothing at the C level will > be validating that the bytes are actually valid utf-8 sequences. > > So the truncated data is unlikely to result in a fatal error. Python is > aggressively validating all bytes, and so you get a hard error from the > truncated UTF-8. Other languages may vary, but I've not seen anything > that makes validation errors a failure in quite such an aggressive way > as python. The problems with decode exceptions have hit soooo many apps > using python over the past few years. Even worse if running in a C > locale as python will reject anything with 8th bit set as being outside > 7-bit asciii, instead of being 8-bit clean in its stream handling. > > > With regards, > Daniel > -- > |: https://berrange.com -o- > https://www.flickr.com/photos/dberrange > :| > |: https://libvirt.org -o- > https://fstop138.berrange.com > :| > |: https://entangle-photo.org -o- > https://www.instagram.com/dberrange > :| > > -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v _______________________________________________ Libguestfs mailing list Libguestfs@redhat.com https://listman.redhat.com/mailman/listinfo/libguestfs