Re: [Rd] parallel PSOCK connection latency is greater on Linux?
Please check a tcpdump session on localhost while running the following
script:

    library(parallel)
    library(tictoc)
    cl <- makeCluster(1)
    Sys.sleep(1)

    for (i in 1:10) {
      tic()
      x <- clusterEvalQ(cl, iris)
      toc()
    }

The initialization phase comprises 7 packets. Then, the 1-second sleep
will help you see where the evaluation starts. Each clusterEvalQ
generates 6 packets:

1. main -> worker PSH, ACK 1026 bytes
2. worker -> main ACK 66 bytes
3. worker -> main PSH, ACK 3758 bytes
4. main -> worker ACK 66 bytes
5. worker -> main PSH, ACK 2484 bytes
6. main -> worker ACK 66 bytes

The first two are the command and its ACK; the following four are the
data sent back and their ACKs. In the first 4-5 iterations, I see no
delay at all. Then, in the following iterations, a 40 ms delay starts
to happen between packets 3 and 4, that is: the main process delays the
ACK to the first packet of the incoming result.

So I'd say Nagle is hardly to blame for this. It would be interesting
to see how many packets are generated with TCP_NODELAY on. If there are
still 6 packets, then we are fine. If we suddenly see a gazillion
packets, then TCP_NODELAY does more harm than good. On the other hand,
TCP_QUICKACK would surely solve the issue without any drawback. As
Nagle himself put it once, "set TCP_QUICKACK. If you find a case where
that makes things worse, let me know."

Iñaki

On Wed, 4 Nov 2020 at 04:34, Simon Urbanek wrote:
>
> I'm not sure the user would know ;). This is a very system-specific issue just
> because the Linux network stack behaves so differently from other OSes (for
> purely historical reasons). That makes it hard to abstract as a "feature" for
> the R sockets that are supposed to be platform-independent. At least
> TCP_NODELAY is actually part of POSIX so it is on better footing, and
> disabling delayed ACK is practically only useful to work around the other
> side having Nagle on, so I would expect it to be rarely used.
>
> This is essentially an RFC since we don't have a mechanism for socket options
> (well, almost, there is timeout and blocking already...) and I don't think we
> want to expose low-level details so perhaps one idea would be to add
> something like delay=NA to socketConnection() in order to not touch (NA),
> enable (TRUE) or disable (FALSE) TCP_NODELAY. I wonder if there is any other
> way we could infer the intention of the user to try to choose the right
> approach...
>
> Cheers,
> Simon
>
>
> > On Nov 3, 2020, at 02:28, Jeff wrote:
> >
> > Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that they
> > might determine what is best for their potentially latency- or
> > throughput-sensitive application?
> >
> > Best,
> > Jeff
> >
> > On Mon, Nov 2, 2020 at 14:05, Iñaki Ucar wrote:
> >> On Mon, 2 Nov 2020 at 02:22, Simon Urbanek
> >> wrote:
> >>> It looks like R sockets on Linux could do with TCP_NODELAY -- without
> >>> (status quo):
> >> How many network packets are generated with and without it? If there
> >> are many small writes and thus setting TCP_NODELAY causes many small
> >> packets to be sent, it might make more sense to set TCP_QUICKACK
> >> instead.
> >> Iñaki
> >>> Unit: microseconds
> >>>                    expr      min       lq     mean  median       uq      max neval
> >>>  clusterEvalQ(cl, iris) 1449.997 43991.99 43975.21 43997.1 44001.91 48027.83  1000
> >>> exactly the same machine + R but with TCP_NODELAY enabled in
> >>> R_SockConnect():
> >>> Unit: microseconds
> >>>                    expr     min     lq     mean  median      uq      max neval
> >>>  clusterEvalQ(cl, iris) 156.125 166.41 180.8806 170.247 174.298 5322.234  1000
> >>> Cheers,
> >>> Simon
> >>> > On 2/11/2020, at 3:39 AM, Jeff wrote:
> >>> >
> >>> > I'm exploring latency overhead of parallel PSOCK workers and noticed
> >>> > that serializing/unserializing data back to the main R session is
> >>> > significantly slower on Linux than it is on Windows/MacOS with similar
> >>> > hardware. Is there a reason for this difference and is there a way to
> >>> > avoid the apparent additional Linux overhead?
> >>> >
> >>> > I attempted to isolate the behavior with a test that simply returns an
> >>> > existing object from the worker back to the main R session.
> >>> >
> >>> > library(parallel)
> >>> > library(microbenchmark)
> >>> > gcinfo(TRUE)
> >>> > cl <- makeCluster(1)
> >>> > (x <- microbenchmark(clusterEvalQ(cl, iris), times = 1000, unit = "us"))
> >>> > plot(x$time, ylab = "microseconds")
> >>> > head(x$time, n = 10)
> >>> >
> >>> > On Windows/MacOS, the test runs in 300-500 microseconds depending on
> >>> > hardware. A few of the 1000 runs are an order of magnitude slower but
> >>> > this can probably be attributed to garbage collection on the worker.
> >>> >
> >>> > On Linux, the first 5 or so executions run at comparable speeds but all
> >>> > subsequent executions are two orders of magnitude slower (~40
> >>> > milliseconds).
> >>> >
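The pattern described in this thread (a few fast round trips, then roughly 40 ms each) can be located without extra packages. The following is only a sketch: the helper names time_round_trips() and first_slow() and the 20 ms threshold are assumptions made for illustration, not part of the parallel package or of anything proposed above.

```r
library(parallel)

# Time n round trips of clusterEvalQ() in milliseconds, using base R only.
time_round_trips <- function(cl, n = 10) {
  vapply(seq_len(n), function(i) {
    t0 <- proc.time()[["elapsed"]]
    clusterEvalQ(cl, iris)
    (proc.time()[["elapsed"]] - t0) * 1000
  }, numeric(1))
}

# Index of the first round trip slower than threshold_ms, or NA if none.
# The 20 ms default is an arbitrary cut-off between sub-millisecond
# round trips and the ~40 ms ones reported in the thread.
first_slow <- function(times_ms, threshold_ms = 20) {
  slow <- which(times_ms > threshold_ms)
  if (length(slow) > 0) slow[1] else NA_integer_
}

# Usage (starts one local PSOCK worker):
# cl <- makeCluster(1)
# times_ms <- time_round_trips(cl, n = 20)
# first_slow(times_ms)
# stopCluster(cl)
```

On a system without the delay, first_slow() would be expected to return NA; an index of around 5 or 6 would match the behavior reported for Linux.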
Re: [Rd] sprintf, check number of parameters
Dear Matthias,

thanks for the suggestion, R-devel now warns on arguments unused by the
format (both numbered and un-numbered). The new warning seems useful:
it often finds cases where arguments were accidentally passed to sprintf
but had been meant for a different function.

R allows combining both numbered and un-numbered references in a single
format, even though it may be better to avoid that and POSIX does not
allow it.

Best
Tomas

On 9/20/20 1:03 PM, Matthias Gondan wrote:
> Dear R developers,
>
> I am wondering if this should raise an error or a warning.
>
> > sprintf('%.f, %.f', 1, 2, 3)
> [1] "1, 2"
>
> I am aware that R has „numbered“ sprintf arguments (sprintf('%1$.f', …)),
> and in that case, omission of specific arguments may be intended. But in
> the usual syntax, omission of an argument is probably a mistake.
>
> Thank you for your consideration.
>
> Best wishes,
>
> Matthias

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
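To see how a given R build treats the reported call, the result and any warning can be captured together. This is a minimal sketch; sprintf_checked() is a hypothetical helper written for this illustration, and whether a warning is collected depends on the R version in use.

```r
# Run sprintf() and collect any warning messages it raises, so the
# result and the warnings can be inspected together.
sprintf_checked <- function(fmt, ...) {
  msgs <- character(0)
  value <- withCallingHandlers(
    sprintf(fmt, ...),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

res <- sprintf_checked('%.f, %.f', 1, 2, 3)
res$value     # "1, 2": the third argument is not used by the format
res$warnings  # non-empty only on R versions that warn about unused arguments
```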
Re: [Rd] sprintf, check number of parameters
Dear Tomas,

Thank you.

Regarding the "unnumbered" arguments, i.e. sprintf('%f %f', 1, 2, 3):
this was the case I wanted to report; here a warning can be very useful.

Regarding the "numbered" arguments, that is, sprintf('%$1f %$3f', 1, 2, 3):
here, omission of an argument might be intended, for example, in an
application with support for multiple languages. Therefore, I am
wondering if a warning should be raised.

Regarding the mixture: never heard that this works, and I would probably
not want to use it...

Your work is much appreciated, thanks again.

Best regards,

Matthias

-------- Original message --------
From: Tomas Kalibera
Date: 04.11.20 15:43 (GMT+01:00)
To: Matthias Gondan, r-devel@r-project.org
Subject: Re: [Rd] sprintf, check number of parameters

Dear Matthias,

thanks for the suggestion, R-devel now warns on arguments unused by the
format (both numbered and un-numbered). The new warning seems useful:
it often finds cases where arguments were accidentally passed to sprintf
but had been meant for a different function.

R allows combining both numbered and un-numbered references in a single
format, even though it may be better to avoid that and POSIX does not
allow it.

Best
Tomas

On 9/20/20 1:03 PM, Matthias Gondan wrote:
> Dear R developers,
>
> I am wondering if this should raise an error or a warning.
>
> > sprintf('%.f, %.f', 1, 2, 3)
> [1] "1, 2"
>
> I am aware that R has „numbered“ sprintf arguments (sprintf('%1$.f', …)),
> and in that case, omission of specific arguments may be intended. But in
> the usual syntax, omission of an argument is probably a mistake.
>
> Thank you for your consideration.
>
> Best wishes,
>
> Matthias
Re: [Rd] sprintf, check number of parameters
Dear Matthias,

On 11/4/20 4:01 PM, matthias-gondan wrote:
> Dear Tomas,
>
> Thank you.
>
> Regarding the "unnumbered" arguments, i.e. sprintf('%f %f', 1, 2, 3).
> This was the case I wanted to report, here a warning can be very useful.
>
> Regarding the "numbered" arguments, that is, sprintf('%$1f %$3f', 1,
> 2, 3). Here, omission of an argument might be intended, for example,
> in an application with support for multiple languages. Therefore, I am
> wondering if a warning should be raised.

It is rather "%1$f", etc. GCC, for example, also warns on unused
arguments with numbered references ("unused arguments in $-style
format"). I have not yet received any feedback from package maintainers
who would have found a problem with the new warning for message
translation. Would you have an example pattern that should be
supported? Shouldn't all arguments be used, anyway, just possibly in
different order?

Unless there is a strong reason to do otherwise, I would rather not
introduce more deviations from the C behavior. Of course, technically
it would be simple: not print a warning when there is at least one
numbered reference.

Best
Tomas

> Regarding the mixture: never heard that this works, and I would
> probably not want to use it...
>
> Your work is much appreciated, thanks again.
>
> Best regards,
>
> Matthias
>
>
> -------- Original message --------
> From: Tomas Kalibera
> Date: 04.11.20 15:43 (GMT+01:00)
> To: Matthias Gondan, r-devel@r-project.org
> Subject: Re: [Rd] sprintf, check number of parameters
>
> Dear Matthias,
>
> thanks for the suggestion, R-devel now warns on arguments unused by the
> format (both numbered and un-numbered). The new warning seems useful:
> it often finds cases where arguments were accidentally passed to
> sprintf but had been meant for a different function.
>
> R allows combining both numbered and un-numbered references in a single
> format, even though it may be better to avoid that and POSIX does not
> allow it.
>
> Best
> Tomas
>
> On 9/20/20 1:03 PM, Matthias Gondan wrote:
> > Dear R developers,
> >
> > I am wondering if this should raise an error or a warning.
> >
> > > sprintf('%.f, %.f', 1, 2, 3)
> > [1] "1, 2"
> >
> > I am aware that R has „numbered“ sprintf arguments (sprintf('%1$.f', …)),
> > and in that case, omission of specific arguments may be intended.
> > But in the usual syntax, omission of an argument is probably a mistake.
> >
> > Thank you for your consideration.
> >
> > Best wishes,
> >
> > Matthias
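The translation case raised above, where every argument is used but in a different order, might look like the following. This is only an illustrative sketch: the formats catalogue and the notify() helper are made up for this example and do not come from any package.

```r
# Hypothetical message catalogue: both formats use every argument,
# but the second one refers to them in a different order.
formats <- c(
  en = "%1$s sent %2$d messages",
  de = "%2$d Nachrichten von %1$s"
)

# Format the same arguments with the pattern for the requested language.
notify <- function(lang, user, n) {
  sprintf(formats[[lang]], user, n)
}

notify("en", "Ada", 3L)  # "Ada sent 3 messages"
notify("de", "Ada", 3L)  # "3 Nachrichten von Ada"
```

Because both patterns reference argument 1 and argument 2, neither call leaves an argument unused, so a warning on unused arguments would not be expected to affect this style of translation.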
Re: [Rd] sprintf, check number of parameters
Now that you ask, no real use case comes to my mind in which an argument
is skipped in one language but not another. Thank you for implementing
the warning, I guess it will be useful in the long run.

Best wishes

Matthias

-------- Original message --------
From: Tomas Kalibera
Date: 04.11.20 16:26 (GMT+01:00)
To: matthias-gondan, r-devel@r-project.org
Subject: Re: [Rd] sprintf, check number of parameters

Dear Matthias,

On 11/4/20 4:01 PM, matthias-gondan wrote:
> Dear Tomas,
>
> Thank you.
>
> Regarding the "unnumbered" arguments, i.e. sprintf('%f %f', 1, 2, 3).
> This was the case I wanted to report, here a warning can be very useful.
>
> Regarding the "numbered" arguments, that is, sprintf('%$1f %$3f', 1,
> 2, 3). Here, omission of an argument might be intended, for example,
> in an application with support for multiple languages. Therefore, I am
> wondering if a warning should be raised.

It is rather "%1$f", etc. GCC, for example, also warns on unused
arguments with numbered references ("unused arguments in $-style
format"). I have not yet received any feedback from package maintainers
who would have found a problem with the new warning for message
translation. Would you have an example pattern that should be
supported? Shouldn't all arguments be used, anyway, just possibly in
different order?

Unless there is a strong reason to do otherwise, I would rather not
introduce more deviations from the C behavior. Of course, technically
it would be simple: not print a warning when there is at least one
numbered reference.

Best
Tomas

> Regarding the mixture: never heard that this works, and I would
> probably not want to use it...
>
> Your work is much appreciated, thanks again.
>
> Best regards,
>
> Matthias
>
>
> -------- Original message --------
> From: Tomas Kalibera
> Date: 04.11.20 15:43 (GMT+01:00)
> To: Matthias Gondan, r-devel@r-project.org
> Subject: Re: [Rd] sprintf, check number of parameters
>
> Dear Matthias,
>
> thanks for the suggestion, R-devel now warns on arguments unused by the
> format (both numbered and un-numbered). The new warning seems useful:
> it often finds cases where arguments were accidentally passed to
> sprintf but had been meant for a different function.
>
> R allows combining both numbered and un-numbered references in a single
> format, even though it may be better to avoid that and POSIX does not
> allow it.
>
> Best
> Tomas
>
> On 9/20/20 1:03 PM, Matthias Gondan wrote:
> > Dear R developers,
> >
> > I am wondering if this should raise an error or a warning.
> >
> > > sprintf('%.f, %.f', 1, 2, 3)
> > [1] "1, 2"
> >
> > I am aware that R has „numbered“ sprintf arguments (sprintf('%1$.f', …)),
> > and in that case, omission of specific arguments may be intended.
> > But in the usual syntax, omission of an argument is probably a mistake.
> >
> > Thank you for your consideration.
> >
> > Best wishes,
> >
> > Matthias
[Rd] (no subject)
Hi All,

I am no longer with TIBCO and hope to be able to contribute more
directly to R now. It will take a little while to set up a build
environment and to start working on some bugzilla issues.

-Bill Dunlap
williamwdun...@gmail.com