Re: [Rd] parallel PSOCK connection latency is greater on Linux?

2020-11-04 Thread Iñaki Ucar
Please, check a tcpdump session on localhost while running the following script:

library(parallel)
library(tictoc)
cl <- makeCluster(1)
Sys.sleep(1)

for (i in 1:10) {
  tic()
  x <- clusterEvalQ(cl, iris)
  toc()
}

The initialization phase comprises 7 packets. Then, the 1-second sleep
will help you see where the evaluation starts. Each clusterEvalQ
generates 6 packets:

1. main -> worker PSH, ACK 1026 bytes
2. worker -> main ACK 66 bytes
3. worker -> main PSH, ACK 3758 bytes
4. main -> worker ACK 66 bytes
5. worker -> main PSH, ACK 2484 bytes
6. main -> worker ACK 66 bytes

The first two are the command and its ACK, the following are the data
back and their ACKs. In the first 4-5 iterations, I see no delay at
all. Then, in the following iterations, a 40 ms delay starts to happen
between packets 3 and 4, that is: the main process delays the ACK to
the first packet of the incoming result.

So I'd say Nagle is hardly to blame for this. It would be interesting
to see how many packets are generated with TCP_NODELAY on. If there
are still 6 packets, then we are fine. If we suddenly see a gazillion
packets, then TCP_NODELAY does more harm than good. On the other hand,
TCP_QUICKACK would surely solve the issue without any drawback. As
Nagle himself put it once, "set TCP_QUICKACK. If you find a case where
that makes things worse, let me know."
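
A minimal sketch of how the onset of the delay could be located from per-call timings, assuming elapsed times are collected in milliseconds. The helper names `first_slow_iteration` and `time_cluster_calls`, the 20 ms threshold, and the example timings are all hypothetical, chosen only to mirror the pattern described above (a few fast calls, then roughly 40 ms each).

```r
# Hypothetical helper: index of the first call slower than `threshold_ms`,
# or NA if no call exceeds the threshold.
first_slow_iteration <- function(elapsed_ms, threshold_ms = 20) {
  slow <- which(elapsed_ms > threshold_ms)
  if (length(slow) == 0L) NA_integer_ else slow[1L]
}

# Illustrative, made-up timings (ms): five fast calls, then ~40 ms each.
example_ms <- c(0.4, 0.3, 0.3, 0.4, 0.3, 40.2, 40.1, 40.3, 40.2, 40.1)
first_slow_iteration(example_ms)

# Collecting real timings needs a live PSOCK worker, so this part is only
# defined here, not run. It repeats the loop above with system.time().
time_cluster_calls <- function(n = 10) {
  cl <- parallel::makeCluster(1)
  on.exit(parallel::stopCluster(cl))
  Sys.sleep(1)
  vapply(seq_len(n), function(i) {
    system.time(parallel::clusterEvalQ(cl, iris))[["elapsed"]] * 1000
  }, numeric(1))
}
```

Calling `first_slow_iteration(time_cluster_calls())` on an affected Linux machine would then give the iteration to look for in the tcpdump output.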

Iñaki

On Wed, 4 Nov 2020 at 04:34, Simon Urbanek  wrote:
>
> I'm not sure the user would know ;). This is a very system-specific issue just 
> because the Linux network stack behaves so differently from other OSes (for 
> purely historical reasons). That makes it hard to abstract as a "feature" for 
> the R sockets that are supposed to be platform-independent. At least 
> TCP_NODELAY is actually part of POSIX so it is on better footing, and 
> disabling delayed ACK is practically only useful to work around the other 
> side having Nagle on, so I would expect it to be rarely used.
>
> This is essentially an RFC since we don't have a mechanism for socket options 
> (well, almost, there is timeout and blocking already...) and I don't think we 
> want to expose low-level details so perhaps one idea would be to add 
> something like delay=NA to socketConnection() in order to not touch (NA), 
> enable (TRUE) or disable (FALSE) TCP_NODELAY. I wonder if there is any other 
> way we could infer the intention of the user to try to choose the right 
> approach...
>
> Cheers,
> Simon
>
>
> > On Nov 3, 2020, at 02:28, Jeff  wrote:
> >
> > Could TCP_NODELAY and TCP_QUICKACK be exposed to the R user so that they 
> > might determine what is best for their potentially latency- or 
> > throughput-sensitive application?
> >
> > Best,
> > Jeff
> >
> > On Mon, Nov 2, 2020 at 14:05, Iñaki Ucar  wrote:
> >> On Mon, 2 Nov 2020 at 02:22, Simon Urbanek wrote:
> >>> It looks like R sockets on Linux could do with TCP_NODELAY -- without 
> >>> (status quo):
> >> How many network packets are generated with and without it? If there
> >> are many small writes and thus setting TCP_NODELAY causes many small
> >> packets to be sent, it might make more sense to set TCP_QUICKACK
> >> instead.
> >> Iñaki
> >>> Unit: microseconds
> >>>                    expr      min       lq     mean  median       uq      max neval
> >>>  clusterEvalQ(cl, iris) 1449.997 43991.99 43975.21 43997.1 44001.91 48027.83  1000
> >>> exactly the same machine + R but with TCP_NODELAY enabled in 
> >>> R_SockConnect():
> >>> Unit: microseconds
> >>>                    expr     min     lq     mean  median      uq      max neval
> >>>  clusterEvalQ(cl, iris) 156.125 166.41 180.8806 170.247 174.298 5322.234  1000
> >>> Cheers,
> >>> Simon
> >>> > On 2/11/2020, at 3:39 AM, Jeff  wrote:
> >>> >
> >>> > I'm exploring latency overhead of parallel PSOCK workers and noticed 
> >>> > that serializing/unserializing data back to the main R session is 
> >>> > significantly slower on Linux than it is on Windows/MacOS with similar 
> >>> > hardware. Is there a reason for this difference and is there a way to 
> >>> > avoid the apparent additional Linux overhead?
> >>> >
> >>> > I attempted to isolate the behavior with a test that simply returns an 
> >>> > existing object from the worker back to the main R session.
> >>> >
> >>> > library(parallel)
> >>> > library(microbenchmark)
> >>> > gcinfo(TRUE)
> >>> > cl <- makeCluster(1)
> >>> > (x <- microbenchmark(clusterEvalQ(cl, iris), times = 1000, unit = "us"))
> >>> > plot(x$time, ylab = "microseconds")
> >>> > head(x$time, n = 10)
> >>> >
> >>> > On Windows/MacOS, the test runs in 300-500 microseconds depending on 
> >>> > hardware. A few of the 1000 runs are an order of magnitude slower but 
> >>> > this can probably be attributed to garbage collection on the worker.
> >>> >
> >>> > On Linux, the first 5 or so executions run at comparable speeds but all 
> >>> > subsequent executions are two orders of magnitude slower (~40 
> >>> > milliseconds).
> >>> >
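
For reference, the two medians quoted in the benchmark output above can be compared with a few lines of R. This is only arithmetic on the figures already reported in the thread; the variable names are made up for the sketch.

```r
# Medians reported in the thread, in microseconds.
median_us <- c(status_quo = 43997.1, tcp_nodelay = 170.247)

# Extra latency per call without TCP_NODELAY, in milliseconds.
delta_ms <- unname(median_us["status_quo"] - median_us["tcp_nodelay"]) / 1000

# Ratio of the two medians.
speedup <- unname(median_us["status_quo"] / median_us["tcp_nodelay"])

round(delta_ms, 1)  # 43.8
round(speedup)      # 258
```

The difference of about 44 ms per call is in line with the roughly 40 ms delay reported for the slow iterations.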

Re: [Rd] sprintf, check number of parameters

2020-11-04 Thread Tomas Kalibera

Dear Matthias,

thanks for the suggestion. R-devel now warns about arguments not used by 
the format (with both numbered and un-numbered references). The new warning 
seems useful: it often finds cases where arguments were accidentally passed 
to sprintf but were meant for a different function.


R allows combining numbered and un-numbered references in a single 
format, although that is probably best avoided, and POSIX does not allow 
it.
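
A short sketch of the cases discussed, for illustration only. The first call is the one from the original report; whether it raises a warning depends on the R version in use, so the sketch records that in `warned` (a name made up here) rather than assuming either outcome.

```r
# Un-numbered references: the third argument is never consumed.
# suppressWarnings() keeps the call quiet on R versions that warn.
res <- suppressWarnings(sprintf('%.f, %.f', 1, 2, 3))
res  # "1, 2", as in the original report

# Record whether this R version warns about the unused argument.
warned <- tryCatch({
  sprintf('%.f, %.f', 1, 2, 3)
  FALSE
}, warning = function(w) TRUE)

# Numbered references allow arguments to be reordered.
sprintf('%2$s %1$s', 'a', 'b')  # "b a"
```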


Best
Tomas

On 9/20/20 1:03 PM, Matthias Gondan wrote:

Dear R developers,

I am wondering if this should raise an error or a warning.


sprintf('%.f, %.f', 1, 2, 3)

[1] "1, 2"

I am aware that R has "numbered" sprintf arguments (sprintf('%1$.f', …)), and in 
that case, omission of specific arguments may be intended. But in the usual 
syntax, omission of an argument is probably a mistake.

Thank you for your consideration.

Best wishes,

Matthias



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




Re: [Rd] sprintf, check number of parameters

2020-11-04 Thread matthias-gondan

Dear Tomas,

Thank you.

Regarding the "unnumbered" arguments, i.e. sprintf('%f %f', 1, 2, 3). This was
the case I wanted to report, here a warning can be very useful.

Regarding the "numbered" arguments, that is, sprintf('%$1f %$3f', 1, 2, 3).
Here, omission of an argument might be intended, for example, in an
application with support for multiple languages. Therefore, I am wondering if a
warning should be raised.

Regarding the mixture: never heard that this works, and I would probably not
want to use it...

Your work is much appreciated, thanks again.

Best regards,
Matthias

Original message
From: Tomas Kalibera
Date: 04.11.20 15:43 (GMT+01:00)
To: Matthias Gondan, r-devel@r-project.org
Subject: Re: [Rd] sprintf, check number of parameters

Dear Matthias,

thanks for the suggestion, R-devel now warns on unused arguments by
format (both numbered and un-numbered). It seems that the new warning is
useful, often it finds cases when arguments were accidentally passed to
sprintf but had been meant for a different function.

R allows combining both numbered and un-numbered references in a single
format, even though it may be better to avoid and POSIX does not allow
that.

Best
Tomas

On 9/20/20 1:03 PM, Matthias Gondan wrote:
> Dear R developers,
>
> I am wondering if this should raise an error or a warning.
>
>> sprintf('%.f, %.f', 1, 2, 3)
> [1] "1, 2"
>
> I am aware that R has "numbered" sprintf arguments (sprintf('%1$.f', …)),
> and in that case, omission of specific arguments may be intended. But in
> the usual syntax, omission of an argument is probably a mistake.
>
> Thank you for your consideration.
>
> Best wishes,
>
> Matthias


Re: [Rd] sprintf, check number of parameters

2020-11-04 Thread Tomas Kalibera


Dear Matthias,

On 11/4/20 4:01 PM, matthias-gondan wrote:
> Dear Tomas,
>
> Thank you.
>
> Regarding the "unnumbered" arguments, i.e. sprintf('%f %f', 1, 2, 3). 
> This was the case I wanted to report, here a warning can be very useful.
>
> Regarding the "numbered" arguments, that is, sprintf('%$1f %$3f', 1, 
> 2, 3). Here, omission of an argument might be intended, for example, 
> in an application with support for multiple languages. Therefore, I am 
> wondering if a warning should be raised.

It is rather "%1$f", etc.

GCC, for instance, also warns about unused arguments with numbered 
references ("unused arguments in $-style format"). I have not yet received 
any feedback from package maintainers who found the new warning to be a 
problem for message translation. Would you have an example pattern that 
should be supported? Shouldn't all arguments be used anyway, just possibly 
in a different order?

Unless there is a strong reason to do otherwise, I would rather not 
introduce more deviations from the C behavior. Of course, technically it 
would be simple: not print a warning when there is at least one numbered 
reference.
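
To make the two points above concrete, here is a minimal sketch at the R level. `has_numbered_ref` is a hypothetical helper, not how R's sprintf is implemented (the actual check would live in the C code); it only illustrates the "at least one numbered reference" test. The two message templates are invented examples of a translation that reorders arguments while still using all of them.

```r
# Hypothetical helper: TRUE if the format contains a "%n$" style reference.
has_numbered_ref <- function(fmt) {
  fmt <- gsub("%%", "", fmt, fixed = TRUE)  # ignore literal percent signs
  grepl("%[0-9]+\\$", fmt)
}

has_numbered_ref('%1$f %3$f')  # TRUE
has_numbered_ref('%f %f')      # FALSE

# Invented templates: same arguments, different order, none left unused.
msg_a <- sprintf('%1$s has %2$d files', 'alice', 3L)
msg_b <- sprintf('%2$d files belong to %1$s', 'alice', 3L)
```

Both templates consume every argument, so neither would trigger a warning about unused arguments.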

Best
Tomas

> Regarding the mixture: never heard that this works, and I would 
> probably not want to use it...
>
> Your work is much appreciated, thanks again.
>
> Best regards,
>
> Matthias
>
>
> Original message
> From: Tomas Kalibera
> Date: 04.11.20 15:43 (GMT+01:00)
> To: Matthias Gondan, r-devel@r-project.org
> Subject: Re: [Rd] sprintf, check number of parameters
>
> Dear Matthias,
>
> thanks for the suggestion, R-devel now warns on unused arguments by
> format (both numbered and un-numbered). It seems that the new warning is
> useful, often it finds cases when arguments were accidentally passed to
> sprintf but had been meant for a different function.
>
> R allows combining both numbered and un-numbered references in a single
> format, even though it may be better to avoid and POSIX does not allow
> that.
>
> Best
> Tomas
>
> On 9/20/20 1:03 PM, Matthias Gondan wrote:
> > Dear R developers,
> >
> > I am wondering if this should raise an error or a warning.
> >
> >> sprintf('%.f, %.f', 1, 2, 3)
> > [1] "1, 2"
> >
> > I am aware that R has "numbered" sprintf arguments (sprintf('%1$.f', …)),
> > and in that case, omission of specific arguments may be intended.
> > But in the usual syntax, omission of an argument is probably a mistake.
> >
> > Thank you for your consideration.
> >
> > Best wishes,
> >
> > Matthias




Re: [Rd] sprintf, check number of parameters

2020-11-04 Thread matthias-gondan
Now that you ask, no real use case comes to my mind in which an argument is 
skipped in one language but not another. Thank you for implementing the 
warning, I guess it will be useful in the long run.

Best wishes
Matthias

Original message
From: Tomas Kalibera
Date: 04.11.20 16:26 (GMT+01:00)
To: matthias-gondan, r-devel@r-project.org
Subject: Re: [Rd] sprintf, check number of parameters

Dear Matthias,

On 11/4/20 4:01 PM, matthias-gondan wrote:
> Dear Tomas,
>
> Thank you.
>
> Regarding the "unnumbered" arguments, i.e. sprintf('%f %f', 1, 2, 3).
> This was the case I wanted to report, here a warning can be very useful.
>
> Regarding the "numbered" arguments, that is, sprintf('%$1f %$3f', 1,
> 2, 3). Here, omission of an argument might be intended, for example,
> in an application with support for multiple languages. Therefore, I am
> wondering if a warning should be raised.

It is rather "%1$f", etc.

Say GCC warns also on unused arguments with numbered references ("unused
arguments in $-style format"). I have not yet received any feedback from
package maintainers who would have found a problem with the new warning
for message translation. Would you have an example pattern that should
be supported? Shouldn't all arguments be used, anyway, just possibly in
different order?

Unless there is a strong reason to do otherwise, I would rather not
introduce more deviations from the C behavior. Of course, technically it
would be simple: not print a warning when there is at least one numbered
reference.

Best
Tomas

> Regarding the mixture: never heard that this works, and I would
> probably not want to use it...
>
> Your work is much appreciated, thanks again.
>
> Best regards,
>
> Matthias


[Rd] (no subject)

2020-11-04 Thread Bill Dunlap
Hi All,

I am no longer with TIBCO and hope to be able to contribute more directly
to R now.  It will take a little while to set up a build environment and to
start working on some bugzilla issues.

-Bill Dunlap
williamwdun...@gmail.com
