[racket-users] Another futures-related bug hunt

Dominik Pantůček Fri, 08 May 2020 00:35:25 -0700

Hello fellow Racketeers,

my spare-time, out-of-curiosity venture into using HPR (High-Performance Racket) for creating a software 3D rendering pipeline seems to be pushing futures to their rough edges.


The scenario is sort of "usual":

* 7 futures + 1 in the RTT (runtime thread), together forming a binary tree
* GUI thread running

But this time, the futures perform not only data-heavy fixnum operations, but flonum operations as well.

Something along the lines of 2560x1440 fixnums and the same number of flonums is being handled in effectively 8 threads (give or take some optimizations that usually lower the 1440 height slightly).

The code in question is relatively short - say 60 lines - however it does not make much sense without the remaining 2k lines :)
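Since the real code is not shown, here is a minimal, hypothetical sketch of the kind of work split described above. It assumes a simple row-based division of the 2560x1440 frame: at each of three levels, one half of the rows goes to a new future and the other half stays in the current thread, which gives 1 + 2 + 4 = 7 futures plus the RTT. The names `process-row!`, `process-rows!`, `fx-buf` and `fl-buf`, as well as the per-pixel arithmetic, are illustrative assumptions, not the actual renderer.

```racket
#lang racket/base
;; Hypothetical sketch: NOT the original renderer, just the shape of the split.
(require racket/future
         racket/fixnum
         racket/flonum)

(define width 2560)
(define height 1440)

;; Assumed layout: one fixnum and one flonum per pixel, row-major.
(define fx-buf (make-fxvector (* width height) 0))
(define fl-buf (make-flvector (* width height) 0.0))

;; Hypothetical per-row work: fill both buffers for row y.
(define (process-row! y)
  (define base (fx* y width))
  (for ([x (in-range width)])
    (define i (fx+ base x))
    (fxvector-set! fx-buf i (fx+ x y))
    (flvector-set! fl-buf i (fl* (fx->fl x) 0.5))))

;; Process rows [start, end). While depth > 0, spawn a future for the upper
;; half and recurse on the lower half in the current thread (RTT or future),
;; then wait for the spawned future with touch.
;; Starting at depth 3 creates 7 futures; the 8th leaf runs in the caller.
(define (process-rows! start end depth)
  (cond
    [(zero? depth)
     (for ([y (in-range start end)])
       (process-row! y))]
    [else
     (define mid (quotient (+ start end) 2))
     (define f (future (lambda () (process-rows! mid end (sub1 depth)))))
     (process-rows! start mid (sub1 depth))
     (touch f)]))

(process-rows! 0 height 3)
```

Each future writes only to its own disjoint range of rows, so the sketch needs no synchronization between workers.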

If the operation runs in the RTT without futures, nothing bad happens. But under heavy load, and after a VERY variable amount of time (seconds to hours), it completely freezes with:


* 1 CPU being used at 100% (top/htop shows)
* Does not handle socket operations (X11 WM message for closing the window)
* Does not respond to SIGINT, whether from the keyboard or sent via kill

* Can only be stopped forcibly by SIGKILL (or similar) or by forcefully closing the window from the WM, which probably gets handled in the lower-level parts of GDK completely without Racket runtime intervention (it just prints Killed and the exit code is 137)

Based on these observations, I can only conclude that it is the RTT that gets stuck - but that is only the native thread perspective. From the Racket thread perspective, it can be either the "main" application thread, which is in (thread-wait) for the thread that performs the futures work, or the GUI thread, which is created by parameterizing the eventspace (just some trickery that allows me to send breaks when I receive the window close event).

Apart from the obvious strace (after the freeze) and gdb (before/after the freeze) debugging to find possible sources of this bug, is there even a remote possibility of getting any clue about how this can happen based on the information gathered so far? My thoughts go along these lines:


* flonums are boxed - but for some operations they may be immediate

* apparently it is a busy-wait loop in the RTT, otherwise 100% CPU usage is impossible with this workload

* unsafe ops are always suspicious, but again, the problem shows up even when I switch to the safe versions - it just takes longer to appear

* which means, the most probable cause is a race condition

And that is basically all I can tell right now.

Of course, any suggestions would be really welcome.

Cheers,
Dominik

P.S.: I am really curious what I will find when I finally put fsemaphores into the mix...