On Wed, Jul 9, 2025 at 9:08 PM Dragos Tatulea <dtatu...@nvidia.com> wrote:
>
> On Wed, Jul 09, 2025 at 08:59:13PM +0800, Wenli Quan wrote:
> > On Wed, Jul 9, 2025 at 6:38 PM Dragos Tatulea <dtatu...@nvidia.com> wrote:
> > >
> > > On Wed, Jul 09, 2025 at 05:36:04PM +0800, Wenli Quan wrote:
> > > > I am reporting an issue where the host system crashes when re-running
> > > > a script that creates a vDPA device after interrupting its previous
> > > > execution. I am attaching the script for your analysis, as I am unsure
> > > > of the exact step causing the crash.
> > > >
> > > Thanks for catching this Wenli. We'll look into it.
> > >
> > > > # uname -r
> > > > 6.16.0-rc2
> > > > # sh vdpa-setup.sh 0000:b5:00.1 1
> > > > interrupted by pressing Ctrl+C
> > > Could you specify during which stage of the script do you interrupt it?
> >
> > Interrupted after running for a few seconds.
> >
> I was not yet able to reproduce the issue.
>
> Could you print out the commands of the debug script so that we can see
> where it was interrupted? set -x should be enough.

I tried again many times and managed to reproduce the crash only once.
The script below is for your reference; it does not cause a crash every
time.

sh test-vdpa-crash.sh
==== [TEST] Kill after 0.1s ====
[INFO] Running vdpa-setup.sh in background...
[INFO] Script PID: 2646
[INFO] Re-running script after termination

Then the host crashes.

Best Regards,
wenli

>
> > I encountered the same crash on both the 6.16.0-rc2 kernel and the one
> > with the "[PATCH vhost] vdpa/mlx5: Fix release of uninitialized
> > resources on error path" applied.
> >
> Right. This is another issue that is coming from mlx5_core.
>
> Thanks,
> Dragos
>
>

#!/bin/bash

DEVICE="0000:b5:00.1"
ARG2=1
SCRIPT="./vdpa-setup.sh"
KILL_TIMES=($(seq 0.1 0.1 13.0))

mkdir -p output

for TIME in "${KILL_TIMES[@]}"; do
    echo "==== [TEST] Kill after ${TIME}s ===="
    LOGFILE="output/test_kill_${TIME}.log"

    echo "[INFO] Running vdpa-setup.sh in background..." | tee "$LOGFILE"
    sh -x "$SCRIPT" "$DEVICE" "$ARG2" >>"$LOGFILE" 2>&1 &
    SCRIPT_PID=$!
    echo "[INFO] Script PID: $SCRIPT_PID" | tee -a "$LOGFILE"

    (sleep "$TIME" && echo "[INFO] Killing $SCRIPT_PID after $TIME s" >>"$LOGFILE" && kill -TERM "$SCRIPT_PID") &
    wait $SCRIPT_PID

    echo "[INFO] Re-running script after termination" | tee -a "$LOGFILE"
    sh "$SCRIPT" "$DEVICE" "$ARG2" >>"$LOGFILE" 2>&1

    echo "[DONE] Kill after ${TIME}s" | tee -a "$LOGFILE"
    echo
done

echo "===== All tests finished. Logs saved in ./output ====="
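
To see where each run was interrupted, the logs written by the script above can be scanned for the last `sh -x` trace line before the kill marker. The following is only a sketch: the helper name `last_traced_command` is hypothetical, and it assumes the default `sh -x` trace prefix (`+ `) and the `[INFO] Killing` marker exactly as the script writes it. Because the marker is appended just before `kill` is sent, the reported line is an approximation of the command that was running at that moment.

```shell
#!/bin/bash

# Hypothetical helper: print the last "sh -x" trace line (prefixed with
# one or more "+" and a space) that appears before the "[INFO] Killing"
# marker in a log file. If the marker is absent, the last trace line in
# the whole file is printed instead.
last_traced_command() {
    awk '
        /^\[INFO\] Killing/ { exit }          # stop reading at the kill marker
        /^\++ /             { last = $0 }     # remember the latest trace line
        END                 { if (last != "") print last }
    ' "$1"
}

# Summarise every log produced by the test loop (assumed to be in ./output).
for f in output/test_kill_*.log; do
    [ -e "$f" ] || continue                   # no logs yet: nothing to do
    printf '%s: %s\n' "$f" "$(last_traced_command "$f")"
done
```

Running this after a crash and reboot, assuming the logs were flushed to disk, should narrow down which step of vdpa-setup.sh was in progress when the script was terminated.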