Hi Dominic, Thanks for taking the time to look into this. For most of your suggestions I already suspected this to be the case, as I attempted to use futures several times in the past, but it is good to know that other people are of the same opinion.
I looked at some other suggestions in this thread and they do make small improvements to the speed, but nothing spectacular. The biggest improvement (about 30%) comes from using Typed Racket -- this is attractive as the code looks nice and has type annotations too. For realistic data sets, the best TR version still takes about 5 seconds. While 5 seconds is not a lot, the code will be added to a GUI application, and this means that I will need to implement some kind of progress bar for the user to look at, while the results come in... Given these experiments and discussion, and from my personal point of view, futures are not attractive, as they require a lot of effort to get any performance benefits out of them, and simple and "innocent" code changes can result in a complete performance loss. Also the resulting code is not particularly elegant and maintainable, especially once I start using the unsafe operations. Out of curiosity, I also implemented the straightforward CP3 search (cp3-baseline) in C++ and it runs in 0.7 seconds. However, for now, I will go with the Typed Racket solution and just add a progress bar to the GUI. Thanks everyone for looking at this code and providing suggestions. Alex. On Thursday, June 18, 2020 at 4:45:38 AM UTC+8, Dominik Pantůček wrote: > > Hi Alex, > > I finally got to investigate the issue in depth. And there are two major > problems blocking your implementation from running the futures in > parallel. > > 1) Allocations of boxed flonums. I tried to get rid of those by > allocating "scratchpad" flvectors and mapping everything onto them. The > future scheduling started to look different, however there are many > allocations because of your loops in functions cp3-futures and > evaluate-cost. > > I didn't finish the work though, because I noticed another strange > thing. The Typed Racket code in the lambda function returned by spline > (your mmax-fn argument) blocks almost entirely the parallel execution on > ... type checks. > > 2) All those =, < and friends just block it. > > So how to fix this? Well, it is relatively easy albeit quite a lot of > manual work. > > (I am looking only at the cp3-futures function when talking about > possible improvements). > > * Change the all the for loops for let loop forms and use preallocated > flvectors to hold all the values. > * Switch to racket/unsafe/ops for everything inside futures (this is not > necessary, but takes away a few possible surprises) > * Restructuralize the way you split the work into 30 futures and just > use a binary tree of futures as suggested earlier. > * Use racket/unsafe/ops from regular racket to implement the spline > interpolation. I would also move the coefficients into one huge flvector > and forgot about lists at all. This is very specific workload. > > And do not worry about GCs. Once you get rid of all allocations inside > futures, the GCs will disappear. > > Also bear in mind that my suggestions are rather low level. By following > them you will get the algorithm to scale over multiple cores if you > really need it. Just remember it will be at a huge cost to readability. > You can slightly mitigate this by some refactoring and custom syntax, > but that is even more work and I would really consider whether you need > the parallelism for a computation that takes a few seconds anyway. > > Of course, if you plan to use your algorithm for much bigger data set, > this is the way to go (including the custom syntax). > > > Cheers, > Dominik > > On 17. 06. 20 10:50, Alex Harsanyi wrote: > > > > I am trying to speed up an algorithm using futures, but I am getting > > some unexpected results (and no real speed improvements), and I was > > wondering if someone more experienced could have a look a the code and > > tell me what am I doing wrong. > > > > I put up the code in this repository: > > https://github.com/alex-hhh/cp3-exeriments, unfortunately it is the > > simplest meaningful example that I can come up with. Most of the > > functions, however are just support functions and there are six > > implementation of the same algorithm. > > > > Basically, the problem I am trying to solve is to fit a model to > > existing data and this is done by evaluating 2.5 million combinations of > > parameters. My best, non-futures based, algorithm can do this in about > > 3 seconds (8 seconds in DrRacket). > > > > Given that each of this 2.5 million combination is completely > > independent from the others, they could all be done in parallel. Given > > this, I "sliced" the combinations into 30 groups and tried to perform > > each "slice" in its own future and select the best among the 30 results > > produced by these futures. > > > > Unfortunately, while the futures versions of the algorithm produce the > > correct result, the algorithm runs at the same speed as the non-futures > > version. My `processor-count` returns 8, so I would expect at least > > some level of parallelism. > > > > As a first step, I tried using `would-be-future`, to see if it reported > > any operations which might block, but nothing was printed out. > > > > I also tried using the futures visualized, and I found the following: > > > > * the code appears to be blocking on primitive operations, such as +, -, > > < etc. > > * I suspect these operations are inside the code generated by the `for` > > loops, so I am not sure how to remove them without making the code even > > more difficult to read. > > * there seems to be a lot more time spent in the garbage collector when > > running the futures visualizer than without it (DrRacket runs with > > unlimited memory) > > > > So I am wondering if someone who is more familiar with futures can look > > at the code and provide some hints about what can be done to make this > > code run in parallel (or if it cannot, I would like to understand why). > > > > This is already a long message, so I will not add further details here, > > but the repository at https://github.com/alex-hhh/cp3-exeriments has an > > explanation of what every function does, and I am happy to provide > > further clarifications if needed. > > > > Thanks, > > Alex. > > > > -- > > You received this message because you are subscribed to the Google > > Groups "Racket Users" group. > > To unsubscribe from this group and stop receiving emails from it, send > > an email to racket...@googlegroups.com <javascript:> > > <mailto:racket-users+unsubscr...@googlegroups.com <javascript:>>. > > To view this discussion on the web visit > > > https://groups.google.com/d/msgid/racket-users/8bf6f7c4-3b2f-4b86-9a8a-be68e82d09cfo%40googlegroups.com > > > < > https://groups.google.com/d/msgid/racket-users/8bf6f7c4-3b2f-4b86-9a8a-be68e82d09cfo%40googlegroups.com?utm_medium=email&utm_source=footer>. > > > -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/864d82f2-139d-4a6c-b99d-24f324b1003do%40googlegroups.com.