On Mon, Feb 13, 2023 at 09:38:50PM +0200, Yonatan Shtarkman wrote:
> Thanks for looking into this Rich and Daniel, and thanks for the profound
> explanation.
> See my answers below.
> 
> `'i' is not used inside the loop?  Or is this error intermittent?` - 'i' can 
> be
> removed, the error is intermittent.
> This reproduces on libguestfs 1.46.2 and 1.48.6.
> 
> `This is specifically a Python API problem or would it affect
> the C API too?` - I assume it's not in the C API, since this issue doesn't
> reproduce when using guestfish.
> 
> I would likely need a complete self-contained reproducer to really go
> anywhere with this.` - I can supply a container image with libguestfs 1.46.2
> running the for loop. Let me know if that helps.

Could you come up with a Python program, or a Python + guestfish
program.  Something like ...

$ guestfish -N fs -m /dev/sda1 touch '/xxxó'

+ something in Python that uses the image (test1.img) to
reproduce it?

Rich.


> Also, as a workaround, I avoided calling the event callback if null is 
> returned
> by Py_BuildValue (still print the error and release the thread).
> This seems to work for our usage (we only use the event callbacks for 
> logging),
> any potential issues I'm missing?
> 
> On Mon, Feb 13, 2023 at 8:14 PM Daniel P. Berrangé <berra...@redhat.com> 
> wrote:
> 
>     On Mon, Feb 13, 2023 at 06:07:58PM +0000, Richard W.M. Jones wrote:
>     > On Sun, Feb 12, 2023 at 03:31:08PM +0200, Yonatan Shtarkman wrote:
>     > > Hey,
>     > > When downloading a file whose path contains multi-byte utf-8,
>     libguestfs
>     > > sometimes crashes.
>     > > This reproduces when using python, and not when using guestfish.
>     > >
>     > > Code to reproduce:
>     > > for i in range(2000):
>     > >     g.download ('/xxxó', '/tmp/1')
>     >
>     > 'i' is not used inside the loop?  Or is this error intermittent?
>     >
>     > > #0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/
>     raise.c:50
>     > > #1  0x00007ffff7fac140 in <signal handler called> () at /lib/
>     x86_64-linux-gnu/
>     > > libpthread.so.0
>     > > #2  0x00007ffff6f77701 in _Py_INCREF (op=<optimized out>) at /usr/
>     include/
>     > > python3.9/object.h:408
>     > > #3  guestfs_int_py_event_callback_wrapper
>     > >     (g=<optimized out>, flags=<optimized out>, array_len=0, array=0x0,
>     buf_len=
>     > > 47, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \
>     303by",
>     > > event_handle=0, event=16, callback=0x7ffff2516600) at handle.c:137
>     > > #4  guestfs_int_py_event_callback_wrapper
>     > >     (g=<optimized out>, callback=0x7ffff2516600, event=16, 
> event_handle
>     =0,
>     > > flags=<optimized out>, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm
>     --debug
>     > > settle -E \303by", buf_len=47, array=0x0, array_len=0) at handle.c:104
>     > > #5  0x00007ffff6e0076a in guestfs_int_call_callbacks_message (g=
>     0xf31290, event
>     > > =16, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm --debug settle -E \
>     303by",
>     > > buf_len=47)
>     > >     at events.c:117
>     > > #6  0x00007ffff6e1702e in guestfs_int_log_message_callback
>     > >     (g=g@entry=0xf31290, buf=0x113b8a0 "gs=0x0\r\ncommandrvf: udevadm
>     --debug
>     > > settle -E \303by", len=len@entry=47) at proto.c:145
>     > > #7  0x00007ffff6dfb759 in handle_log_message (g=g@entry=0xf31290, 
> conn=
>     > > conn@entry=0x110e280) at conn-socket.c:395
>     > > #8  0x00007ffff6dfbd63 in read_data (len=4, bufv=<optimized out>, 
> connv
>     =
>     > > <optimized out>, g=<optimized out>) at conn-socket.c:179
>     > > #9  read_data (g=0xf31290, connv=0x110e280, bufv=<optimized out>, len=
>     4) at
>     > > conn-socket.c:142
>     > > #10 0x00007ffff6e1764a in recv_from_daemon (buf_rtn=0x7fffffffd858,
>     size_rtn=
>     > > 0x7fffffffd854, g=0xf31290) at proto.c:545
>     > > #11 guestfs_int_recv_from_daemon (g=g@entry=0xf31290, size_rtn=
>     size_rtn@entry=
>     > > 0x7fffffffd854, buf_rtn=buf_rtn@entry=0x7fffffffd858) at proto.c:623
>     > > #12 0x00007ffff6e17a5a in guestfs_int_recv
>     > >     (g=g@entry=0xf31290, fn=fn@entry=0x7ffff6e3b3e8 "download", hdr=
>     hdr@entry=
>     > > 0x7fffffffd920, err=err@entry=0x7fffffffd8f0, xdrp=xdrp@entry=0x0, 
> ret=
>     > > ret@entry=0x0)
>     > >     at proto.c:668
>     > >
>     > > I debugged this issue and noticed that the appliance logs
>     from commandrvf are
>     > > truncated, leading to parse failure (missing utf-8 additional bytes):
>     > > https://github.com/libguestfs/libguestfs/blob/master/python/handle.c#
>     L134
>     > > UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 
> 0:
>     invalid
>     > > start byte
>     >
>     > So I thought we'd fixed this in:
>     >
>     > https://github.com/libguestfs/libguestfs/commit/
>     0ee02e0117527b86a31b2a88a14994ce7f15571f
>     >
>     > This is specifically a Python API problem or would it affect
>     > the C API too?
> 
>     The difference with any C API is that almost nothing at the C level will
>     be validating that the bytes are actually valid utf-8 sequences.
> 
>     So the truncated data is unlikely to result in a fatal error. Python is
>     aggressively validating all bytes, and so you get a hard error from the
>     truncated UTF-8.  Other languages may vary, but I've not seen anything
>     that makes validation errors a failure in quite such an aggressive way
>     as python. The problems with decode exceptions have hit soooo many apps
>     using python over the past few years. Even worse if running in a C
>     locale as python will reject anything with 8th bit set as being outside
>     7-bit asciii, instead of being 8-bit clean in its stream handling.
> 
> 
>     With regards,
>     Daniel
>     --
>     |: https://berrange.com      -o-    
> https://www.flickr.com/photos/dberrange
>     :|
>     |: https://libvirt.org         -o-            
> https://fstop138.berrange.com
>     :|
>     |: https://entangle-photo.org    -o-    
> https://www.instagram.com/dberrange
>     :|
> 
> 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs

Reply via email to