to C representations

Paolo Bonzini Wed, 03 Jul 2024 08:56:39 -0700

[warning: long email]

On Wed, Jul 3, 2024 at 2:48 PM Marc-André Lureau
<marcandre.lur...@gmail.com> wrote:
>
> Hi
>
> (adding Sebastian, one of the glib-rs developers in CC)
>
> On Mon, Jul 1, 2024 at 7:02 PM Paolo Bonzini <pbonz...@redhat.com> wrote:
>>
>> The qemu::util::foreign module provides:
>>
>> - A trait for structs that can be converted to a C ("foreign") representation
>> - A trait for structs that can be built from a C ("foreign") representation
>> - A wrapper for a pointer that automatically frees the contained data.
>
> You worry about technical debt, and I do too. Here you introduce quite 
> different traits than what glib-rs offers.
> We already touched this subject 2y ago, my opinion didn't change much
> (https://patchew.org/QEMU/20210907121943.3498701-1-marcandre.lur...@redhat.com/20210907121943.3498701-13-marcandre.lur...@redhat.com/)


Hi Marc-André, first of all thanks for reviewing and, probably, sorry
for not going more into detail as to why I reinvented this particular
wheel.

Like two years ago, I find the full/none nomenclature very confusing
and there is no trace of them in QEMU, but I could get over that
easily.

But also, I think the two have different focus and design. All I
wanted was a place to put common code to convert Rust objects (mostly
CStr/String/str and Error) to and from C representations (char * and
QEMU's Error*), to avoid rewriting it over and over. So I wanted the
traits to be as simple as possible, and I didn't like that glib puts
to_glib_none and to_glib_full into the same trait, though I understand
why it does so. I also didn't like that to_glib_full() is an express
ticket to a memory leak. :)

OwnedPointer<> and free_foreign solve all the qualms I had with glib-rs:

- OwnedPointer<> takes care of freeing at the cost of extra verbosity
for ".as_ptr()".

- because clone_to_foreign() does not leak, I don't need to think
about to_glib_none() from the get go.

This has a ripple effect on other parts of the design. For example,
because glib has to_glib_none(), its basic design is to never allocate
malloc-ed memory unless ownership of the pointer is passed to C.
glib-rs's to_glib_none can and will "clone to Rust-managed memory" if
needed. Instead, because I have free_foreign(), I can malloc freely.

In the future we could add something like Stash/to_glib_none(), but it
would be very different and with performance properties enforced by
the type system. It would "create a pointer into *existing*
Rust-managed memory" cheaply but, every time a cheap borrow is not
possible, users will have to call clone_to_foreign() explicitly. In
other words there will be a difference between OwnedPointer<> as the
result of an expensive operation, and [the equivalent of glib-rs]
Stash as the result of a cheap operation. This is unlike
to_glib_none() which is e.g. cheap for CStr but not for String. More
on this below.

> Also, you don't offer the equivalent of "to_glib_none" which uses a temporary 
> stash and is quite useful, as a majority of functions don't take ownership.

Right, in part you have already answered that: OwnedPointer handles
freeing in the same way as g_autofree, so you can use it as a
to_glib_none() replacement. If you need to pass the pointer and forget
about it, you can use clone_to_foreign_ptr(). If you need something
temporary that needs to be freed before returning, you use
clone_to_foreign() and as_ptr().

This is not any more unsafe than to_glib_none(). If you pass a
temporary pointer (x.clone_to_foreign().as_ptr()) where C code expects
full ownership, you get a dangling pointer either way. It is less
leak-happy than to_glib_full() though.

Regarding how to obtain a pointer cheaply, there are two use cases:

- borrow:

  fn borrow_foreign<'a>(&'a self) -> Stash<'a,
      <Self as CloneToForeign>::Foreign, &'a Self>;

Useful for slices and elementary integer types - including cases where
C parameters are passed by reference - and also for CStr. However
right now my main user is strings and it doesn't buy much there.
Because String and &str are not null-terminated, a "borrow" would
allocate memory, as in to_glib_none(), which is pretty weird. At this
point, I find it better to have the more explicit clone_to_foreign()
that tells you how expensive it is.

- consumption, which does not exist in glib-rs:

    fn into_foreign(self) -> Stash<'static,
      <Self as CloneToForeign>::Foreign, Self>;

This works for types that, like String, deref to a stable address even
when moved---this way we don't need to deal with pinning. And it is in
fact especially nice for the String case and therefore for anything
that uses format!(), because it could be implemented as

   s.reserve_exact(1);
   s.push('\0');

The two would be separate traits. There was one case in which I could
have used into_foreign(), namely the call to error_setg_internal() in
patch 5. But I am not sure if this will be more common than functions
taking &str, so I left it for later.

> Because much of our code is using GLib types and API style, I
> think we should strive for something that is close (if not just the
> same) to what glib-rs offers. It's already hard enough to handle
> one binding concept, having 2 will only make the matter worse.
> Consider a type like GHashTable<GUuid, QOM>, it will be very
> annoying to deal with if we have different bindings traits and
> implementations and we will likely end up duplicating glib-rs
> effort.

I'm not even sure if it's even true that much QEMU code is using GLib
types. We use quite a bit of GLib, and took a lot of inspiration from
it because GLib is good. But it's notable that we use very little GLib
*across modules*. There are only a handful of occurrences to
GList/GList/GHashTable in function arguments in include/, for all of
QEMU, and all of them are very rarely used functions. Why? Possibly
because, even where we take inspiration (the Error** idiom, QOM
casts), the implementation ends up completely different for reasons
that aren't just NIH.

In any case I don't see us using glib-rs much, if at all; uses of hash
tables in Rust code can just use HashMap<>. Yes, there is no
equivalent of the expensive, O(n) conversion
self.some_g_hash_table.to_glib_none(); but that's a feature, not a
limitation.

Also, anyway we couldn't extend the glib traits a lot, and glib-rs
only supports HashMap<String, String>.

> As for naming & consistency, glib-rs settled on something clearer imho:
>
> from_glib_full
> from_glib_none
> to_glib_full
> to_glib_none
>
> vs
>
> from_foreign
> cloned_from_foreign
> clone_to_foreign
> /nothing/

These last two are respectively clone_to_foreign_ptr and
clone_to_foreign. I am not wed to my names but I'm not sure the glib
ones are clearer.

What I want from names is clarity on who allocates or frees; I do that
with "clone", which is a common Rust term, to indicate that the C side
still needs to be freed after the call. I find it confusing that
from_glib_full() requires no freeing but to_glib_full() requires
freeing, for example - I understand _why_ but still "full" and "none"
are GIR terms that are not really familiar to most Rust or QEMU
developers.

And secondarily, clarity on what is cheap and what isn't. I don't have
that functionality at all for now, but the nomenclature is easily
extended to borrow_foreign() and into_foreign(). Talking about Rust
terms, glib-rs lacks "into" which is the standard Rust way to do
consuming conversions, while my code has into_native().

All in all, I think there are some significant differences between
glib-rs and this implementation that justify the effort.


Paolo

Re: [PATCH 03/14] rust: define traits and pointer wrappers to convert from/to C representations

Reply via email to