On Wed, Jul 9, 2025 at 9:08 PM Dragos Tatulea <dtatu...@nvidia.com> wrote:
>
> On Wed, Jul 09, 2025 at 08:59:13PM +0800, Wenli Quan wrote:
> > On Wed, Jul 9, 2025 at 6:38 PM Dragos Tatulea <dtatu...@nvidia.com> wrote:
> > >
> > > On Wed, Jul 09, 2025 at 05:36:04PM +0800, Wenli Quan wrote:
> > > > I am reporting an issue where the host system crashes when re-running
> > > > a script that creates a vDPA device after interrupting its previous
> > > > execution. I am attaching the script for your analysis, as I am unsure
> > > > of the exact step causing the crash.
> > > >
> > > Thanks for catching this Wenli. We'll look into it.
> > >
> > > > # uname -r
> > > > 6.16.0-rc2
> > > > # sh vdpa-setup.sh  0000:b5:00.1 1
> > > > interrupted by pressing Ctrl+C
> > > Could you specify during which stage of the script do you interrupt it?
> >
> > Interrupted after running for a few seconds.
> >
> I was not yet able to reproduce the issue.
>
> Could you print out the commands of the debug script so that we can see
> where it was interrupted? set -x should be enough.

I tried many more times and managed to reproduce it only once. The script
I used is below for your reference. It does not cause a crash every time.

sh test-vdpa-crash.sh
==== [TEST] Kill after 0.1s ====
[INFO] Running vdpa-setup.sh in background...
[INFO] Script PID: 2646
[INFO] Re-running script after termination
Then the host crashes.

Best Regards,
wenli
>
> > I encountered the same crash on both the 6.16.0-rc2 kernel and the one
> > with the "[PATCH vhost] vdpa/mlx5: Fix release of uninitialized
> > resources on error path" applied.
> >
> Right. This is another issue that is coming from mlx5_core.
>
> Thanks,
> Dragos
>
>
#!/bin/bash

DEVICE="0000:b5:00.1"
ARG2=1
SCRIPT="./vdpa-setup.sh"

KILL_TIMES=($(seq 0.1 0.1 13.0))

mkdir -p output

for TIME in "${KILL_TIMES[@]}"; do
    echo "==== [TEST] Kill after ${TIME}s ===="
    LOGFILE="output/test_kill_${TIME}.log"

    echo "[INFO] Running vdpa-setup.sh in background..." | tee "$LOGFILE"
    sh -x "$SCRIPT" "$DEVICE" "$ARG2" >>"$LOGFILE" 2>&1 &
    SCRIPT_PID=$!

    echo "[INFO] Script PID: $SCRIPT_PID" | tee -a "$LOGFILE"
    
    (sleep "$TIME" && echo "[INFO] Killing $SCRIPT_PID after $TIME s" >>"$LOGFILE" && kill -TERM "$SCRIPT_PID") &

    wait $SCRIPT_PID

    echo "[INFO] Re-running script after termination" | tee -a "$LOGFILE"
    sh "$SCRIPT" "$DEVICE" "$ARG2" >>"$LOGFILE" 2>&1

    echo "[DONE] Kill after ${TIME}s" | tee -a "$LOGFILE"
    echo
done

echo "===== All tests finished. Logs saved in ./output ====="
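
To answer the question of where each run was interrupted, the logs written by the loop above could be summarized with something like the following. This is only a sketch, not part of the original scripts: the function name last_traced_command is made up for illustration, and it assumes that the `sh -x` trace lines start with the default "+" prefix and that the "[INFO] Killing" marker lands in the log close to the moment of the kill. Because the marker is appended by a separate background subshell, the ordering is approximate.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original thread):
# print the last `sh -x` trace line logged before the kill marker.

last_traced_command() {
    # $1: path to one log file produced by the test loop
    awk '
        /^\[INFO\] Killing/ { exit }        # stop at the kill marker
        /^\+/               { last = $0 }   # remember the latest trace line
        END                 { if (last != "") print last }
    ' "$1"
}

# Summarize every log, skipping the unexpanded pattern if none exist.
for log in output/test_kill_*.log; do
    [ -e "$log" ] || continue
    printf '%s: %s\n' "$log" "$(last_traced_command "$log")"
done
```

The line printed for a given log is the command that was most likely running when SIGTERM was delivered.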
