Since I wasn't able to attend the BOF, I delayed my reply until after I
could get some experience with Manos's code and make a list of things to
do in the immediate future. I think my recent patches reach the point at
which the work can be parallelized (a list of subsequent steps is at
https://wiki.qemu.org/RustInQemu#TODO).
On Thu, Sep 26, 2024 at 4:23 PM Alex Bennée <alex.ben...@linaro.org> wrote:
> During the various conversations I didn't hear anyone speak against the
> proposed migration although some concerns were raised about review and
> knowledge gaps.

Yes, I agree.

> One output from this discussion should be a clear statement that we are
> going forward with this work and the road map. A rough roadmap might
> look like:
>
>  - 9.2 --enable-rust is available and developers can build with it.
>        rust devices have -x-device or -rust-device CLI flags for
>        runtime selection.
>
>  - 10.x rust devices feature complete and migration compatible, enabled
>         by default when rust compiler detected. No CLI selection
>         required as legacy portions won't be built. Any partial
>         conversions should be behind --enable-prototype-rust configure
>         flag.
>
>  - 11.x distros have enough infrastructure to build on supported
>         platforms. Rust becomes a mandatory dependency, old C versions
>         of converted code removed from build.

Here is my version:

 9.2  --enable-rust is available and developers can build with it.
      Ideally this results in feature parity, but we cannot yet be sure
      that this is the case. (Current holes are documented at
      https://wiki.qemu.org/RustInQemu#TODO.)

 10.x --enable-rust is the default. Duplicates of existing code are
      allowed in order to gain experience with the creation of bindings
      to C code, but they should (probably must) have feature parity
      with the C code. It is possible to build Rust devices that have a
      limited amount of unsafe code in the device itself, especially at
      runtime as opposed to initialization.

 11.x --enable-rust is mandatory, and disabling it will drop support
      for devices or boards.

> We should publish the intention and the road map prominently although it
> was unclear if a blog post would be the best place vs expanding a
> section in the developers manual.
> Perhaps both make sense with a blog
> post for the statement of intent and rough timeline and the developer
> manual being expanded with any new rules and standards to follow?

Agreed. The blog post should also document some of the design decisions.

> There was some concern about the missing gaps in the support matrix
> especially as we support a number of "legacy" TCG backends. While *-user
> support is more insulated from the effects of rust conversions due to
> its relatively low set of dependencies it will still be a problem if we
> convert the core CPU QOM classes to rust.
>
> In theory if LLVM supports the architecture we should be able to
> generate binaries with a rust compiler although we may not have all
> tools available on that host.

Yeah, based on the detection code that we added in "configure", it seems
that LLVM (and Rust) supports all of the targets in our support matrix.

> Hanna and Stefan were keen to see more use of Rust in the block layer.
> Hanna cautioned that it was hard to find a reasonable starting point for
> the conversion. Originally she had started with the core sub-system but
> it quickly ran into thousands of lines of code which would be hard to
> get well reviewed before switching for such a crucial sub-system. Maybe
> this is an ordering problem and it would make more sense to start with
> individual block drivers first and work up from there.
>
> Alex mentioned softfloat could make a good candidate for conversion as
> while the rewrite had made things easier to follow and extend there were
> still some C macro tricks employed to get code re-use. It would depend
> on whether Rust's macro and trait system allows more of the common logic
> to be kept together.
>
> The qboot firmware for MicroVM's was also mentioned as a potential
> candidate.

I am not sure of the benefits there. qboot has a bunch of hand-crafted C
macros to work in 16-bit mode, and does not even have a free() function
(it does have malloc(), though).
Rewriting qboot code in Rust is unlikely to teach us many lessons that
are useful for the rest of QEMU, and it would not have substantial
benefits in terms of memory safety.

> With relative inexperience there was a concern we could inadvertently
> introduce technical debt in the code base (C-like Rust vs Rusty rust).
> What can we do to mitigate that issue?

My main suggestion is to take it slowly. While the recent flurry of
posted patches may suggest otherwise, a lot of those were developed over
several months (sometimes years!), and the posted version is not the
first.

The other suggestion is for reviewers not to be afraid to say "I have no
idea what you're doing". If a reviewer does not understand how the
language is being used, the affected code must either have clear
documentation explaining the design, or be rewritten.

Of course core bindings code will be more complex/complicated than we
all wish, but the important thing is to _absolutely_ get public APIs
right. Private implementation details can be subject to future evolution
and cleanup, while public APIs are much harder to change. That said:

- bad public APIs tend to be larger red flags of technical debt in Rust
  than in C

- we can look at what other projects (especially Linux) are doing, and
  use that to make informed decisions. There aren't many projects doing
  integration of Rust into massive C code bases, but we are not the
  first.

- many of our core APIs (chardev, memory region ops, timers, etc.) are
  fairly stable

- there are several patterns that are common across QEMU code (for
  example Error** or vtables), and therefore experience writing one set
  of C<->Rust bindings will help with other areas of the code as well.

Things may be different in other areas. For example, block devices may
also suffer an impedance mismatch between Rust async/await and C
coroutines.
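To make the "one set of bindings helps everywhere" point concrete, here
is a minimal sketch (all names here are hypothetical, not QEMU's actual
bindings API) of how a C-style vtable of function pointers can be hidden
behind a safe Rust trait, with the unsafe FFI call confined to a single
place:

```rust
// Illustrative sketch only: hypothetical names, not QEMU's actual
// bindings. A C-style vtable (a struct of function pointers) is wrapped
// behind a Rust trait, so each subsystem writes the unsafe glue once.

use std::os::raw::c_int;

/// Mirrors a hypothetical C vtable: struct DeviceOps { int (*reset)(int); }
#[repr(C)]
pub struct DeviceOps {
    pub reset: unsafe extern "C" fn(c_int) -> c_int,
}

/// Safe Rust-side interface that the bindings expose to device code.
pub trait Resettable {
    fn reset(&self, level: i32) -> i32;
}

/// Wrapper owning a pointer to the C vtable, implementing the trait.
pub struct CDevice {
    ops: &'static DeviceOps,
}

impl Resettable for CDevice {
    fn reset(&self, level: i32) -> i32 {
        // The unsafe FFI call is confined to this one place.
        unsafe { (self.ops.reset)(level) }
    }
}

// Stand-in for a function that would normally live on the C side.
unsafe extern "C" fn c_reset(level: c_int) -> c_int {
    level + 1
}

pub static OPS: DeviceOps = DeviceOps { reset: c_reset };

pub fn demo() -> i32 {
    let dev = CDevice { ops: &OPS };
    dev.reset(41)
}
```

The same wrapping pattern repeats for other vtable-based C APIs, which
is why experience with one set of bindings transfers to the others.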
> We make heavy use of GLib throughout the code base and while new Rust
> code should use native structures for arrays and the like there are
> places where Glib structures are passed across API boundaries. Should we
> consider updating APIs as we go or live with a degree of thunking until
> we have a better idea where the pain points are?

I think that's relatively rare and we can cross that bridge when we get
there. The glib_rs crate exists, but I don't really like its design,
because it's very much tuned towards cloning C data into Rust-managed
memory, and it hides the performance characteristics of the code that
uses it. But if all we need is GArray/GPtrArray/GByteArray, there's not
a lot of code to write and there's no need for thunking.

That said, C<->Rust interoperability is something that has to be tackled
sooner or later for objects such as Error. I have my favored approach
(https://lore.kernel.org/r/all/20240701145853.1394967-4-pbonz...@redhat.com/)
but it's not an easy subject. Again, let's take it slowly. It's okay if
the first devices cannot have a realize() function that fails, just
because we haven't figured out Error yet.

> One of the promises of Rust is its support for inline unit test
> integration although there was a little confusion about how this works
> in practice. Are separate test binaries created with just the code under
> test or is there a unit testable build of QEMU itself? Does this still
> work with mixed C and Rust code?

The posted patches have examples of both unit- and integration-testing
of Rust code.

> It was suggested creating a qemu-rust mailing list that all patches that
> touch or introduce rust could Cc. This would help those willing to
> review rust find patches without having to wade through the qemu-devel
> firehose.

Good idea, who does it? :)

> Some distros do allow exceptions for "vendoring"
> dependencies as part of the build but it is generally discouraged.

[...]
> The general consensus seemed to be we should be fairly conservative
> about adding new crate dependencies while the Rust ecosystem in distros
> matures. While we should support rustup/cargo installed tools so
> developers can build and test on existing LTS distros we should be
> aiming to build QEMU without downloading any additional packages. There
> may be some flexibility for build-only dependencies (c.f. our pyenv) but
> runtime dependencies should be served by the distro itself.

Note that this is _not_ what we are doing right now: all dependent
crates are vendored and included in the QEMU tarball. However, this is
also why we are very conservative about adding new crates. Right now
it's basically only "bilge" and its dependency "arbitrary-int", a crate
for "readable bitfields".

Thanks for writing this down!

Paolo
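P.S. Since both bilge-style bitfields and inline unit tests came up,
here is a brief hedged sketch of the two together. The register layout
and names below are made up purely for illustration, and the accessors
are hand-rolled in plain Rust to show roughly what bilge (together with
arbitrary-int's u3/u4 types) would generate from a declarative
description:

```rust
// Hypothetical register layout, purely for illustration. With bilge the
// struct below would be declared roughly as:
//
//     #[bitsize(8)]
//     #[derive(FromBits)]
//     struct Ctrl { enable: bool, irq_level: u3, prescaler: u4 }
//
// and the accessors would be generated. Here they are hand-rolled so
// the sketch has no crate dependencies.

#[derive(Clone, Copy)]
pub struct Ctrl(u8);

impl Ctrl {
    pub fn from_bits(bits: u8) -> Self {
        Ctrl(bits)
    }

    /// Bit 0: device enable flag.
    pub fn enable(self) -> bool {
        self.0 & 0x1 != 0
    }

    /// Bits 1..=3: interrupt level.
    pub fn irq_level(self) -> u8 {
        (self.0 >> 1) & 0x7
    }

    /// Bits 4..=7: clock prescaler.
    pub fn prescaler(self) -> u8 {
        self.0 >> 4
    }
}

// Inline unit tests: the #[cfg(test)] module is compiled only into the
// test binary, so it adds nothing to a normal build of the code above.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn decodes_fields() {
        let c = Ctrl::from_bits(0b1010_0111);
        assert!(c.enable());
        assert_eq!(c.irq_level(), 0b011);
        assert_eq!(c.prescaler(), 0b1010);
    }
}
```

This is also the answer to the "how do inline tests work" question:
tests live next to the code and run in a separate test binary (e.g.
under `cargo test` or an equivalent build-system target).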