On Fri May 30, 2025 at 6:30 AM JST, Timur Tabi wrote:
> On Wed, 2025-05-21 at 15:45 +0900, Alexandre Courbot wrote:
>
> I noticed something interesting in this change to Gpu::new().
>
>> +        // Check that the WPR2 region does not already exists - if it does, 
>> the GPU needs to be
>> +        // reset.
>> +        if regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI::read(bar).hi_val() != 0 {
>> +            dev_err!(
>> +                pdev.as_ref(),
>> +                "WPR2 region already exists - GPU needs to be reset to 
>> proceed\n"
>> +            );
>> +            return Err(EBUSY);
>> +        }
>
> You have a lot of checks in this code that display an error message and then 
> return an Err().
>
> But then ...
>
>> +
>> +        // Reset falcon, load FWSEC-FRTS, and run it.
>> +        gsp_falcon.reset(bar)?;
>> +        gsp_falcon.dma_load(bar, &fwsec_frts)?;
>> +        let (mbox0, _) = gsp_falcon.boot(bar, Some(0), None)?;
>> +        if mbox0 != 0 {
>> +            dev_err!(pdev.as_ref(), "FWSEC firmware returned error {}\n", 
>> mbox0);
>> +            return Err(EINVAL);
>> +        }
>
> There are several lines where you just terminate them with "?".  This means 
> that no error message is
> displays. 
>
> I think all of these ? should be replaced with something like:
>
>       gsp_falcon.reset(bar).inspect_err(|e| {
>             dev_err!(pdev.as_ref(), "Failed to reset GSP falcon: {:?}\n", e);
>         })?;
>
> This feels like something that would benefit from a macro, but I can't 
> imagine what that would look
> like.

This is because we are checking the cause of the error (unexpected value
after firmware runs) in this file, so it is the correct place to display
an error message. If the falcon reset fails, the error happens within
the `reset()` method which can display an error message if needed, so I
thought it was adequate to just propagate the error here.

Now doing so would not tell us *which* falcon failed, and this sequence
is so important that it is a good idea to understand where is fails
precicely, so I've added a few `inspect_err` as you suggested for
clarity.

Ideally we would have something like the user-space `thiserror` crate to
manage errors nicely and have custom error types like Lyude suggested.

Reply via email to