On Thu Aug 28, 2025 at 8:27 PM JST, Danilo Krummrich wrote:
> On 8/26/25 6:07 AM, Alexandre Courbot wrote:
>>   /// Structure encapsulating the firmware blobs required for the GPU to 
>> operate.
>>   #[expect(dead_code)]
>>   pub(crate) struct Firmware {
>> @@ -36,7 +123,10 @@ pub(crate) struct Firmware {
>>       booter_unloader: BooterFirmware,
>>       /// GSP bootloader, verifies the GSP firmware before loading and 
>> running it.
>>       gsp_bootloader: RiscvFirmware,
>> -    gsp: firmware::Firmware,
>> +    /// GSP firmware.
>> +    gsp: Pin<KBox<GspFirmware>>,
>
> Is there a reason why we don't just propagate it through struct Gpu, which 
> uses 
> pin-init already?
>
> You can make Firmware pin_data too and then everything is within the single 
> allocation of struct Gpu.

I tried doing that at first, and hit the problem that the `impl PinInit`
returned by `GspFirmware::new` borrows a reference to the GSP firmware
binary loaded by `Firmware::new` - when `Firmware::new` returns, the
firmware gets freed, and the borrow checker complains.

We could move the GSP firmware loading code into the `pin_init!` of
`Firmware::new`, but now we hit another problem: in `Gpu::new` the
following code is executed:

    FbLayout::new(chipset, bar, &fw.gsp_bootloader, &fw.gsp)?

which requires the `Firmware` instance, which doesn't exist yet as the
`Gpu` object isn't initialized until the end of the method.

So we could move `FbLayout`, and everything else created by `Gpu::new`
to become members of the `Gpu` instance. It does make sense actually:
this `new` method is doing a lot of stuff, such as running FRTS, and
with Alistair's series it even runs Booter, the sequencer and so on.
Maybe we should move all firmware execution to a separate method that is
called by `probe` after the `Gpu` is constructed, as right now the `Gpu`
constructor looks like it does a bit more than it should.

... but even when doing that, `Firmware::new` and `FbLayout::new` still
require a reference to the `Bar`, and... you get the idea. :)

So I don't think the current design allows us to do that easily or at
all, and even if it does, it will be at a significant cost in code
clarity. There is also the fact that I am considering making the
firmware member of `Gpu` a trait object: the boot sequence is so
different between pre and post-Hopper that I don't think it makes sense
to share the same `Firmware` structure between the two. I would rather
see `Firmware` as an opaque trait object, which provides high-level
methods such as "start GSP" behind which the specifics of each GPU
family are hidden. If we go with this design, `Firmware` will become a
trait object and so cannot be pinned into `Gpu`.

This doesn't change my observation that `Gpu::new` should not IMHO do
steps like booting the GSP - it should just acquire the resources it
needs, return the pinned GPU object, and then `probe` can continue the
boot sequence. Having the GPU object pinned and constructed early
simplifies things quite a bit as the more we progress with boot, the
harder it becomes to construct everything in place (and the `PinInit`
closure also becomes more and more complex).

I'm still laying down the general design, but I'm pretty convinced that
having `Firmware` as a trait object is the right way to abstract the
differences between GPU families.

Reply via email to