[Rd] Request for stopifnot

2021-10-06 Thread Tim Taylor
Would R-core be receptive to adding an additional parameter to stopifnot so we 
can hide the call in the output as in stop?

i.e. The signature would become:
stopifnot2 <- function (..., exprs, exprObject, local = TRUE, .call = TRUE)

It looks like this would be a one-line change to the underlying stop call 
to:
stop(simpleError(msg, call = if((p <- sys.parent(1L)) && isTRUE(.call)) 
sys.call(p)))
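
To make the effect of the proposed `.call` argument concrete, here is a minimal sketch. The helpers `fail()`, `f()` and `g()` are hypothetical names used only for illustration; they are not part of base R, and the actual change would sit inside stopifnot() itself. The sketch only shows that a simpleError built with `call = NULL` carries no call, which is what hides it in the printed message.

```r
# Hypothetical stand-in for the internals of the proposed stopifnot2():
# attach the caller's call only when .call is TRUE.
fail <- function(msg, .call = TRUE) {
    p <- sys.parent(1L)
    stop(simpleError(msg, call = if (p && isTRUE(.call)) sys.call(p)))
}

f <- function(x) fail("x must be positive", .call = TRUE)
g <- function(x) fail("x must be positive", .call = FALSE)

# Capture the condition objects instead of letting them abort the script
e1 <- tryCatch(f(-1), error = identity)
e2 <- tryCatch(g(-1), error = identity)

print(conditionCall(e1))  # the call to f() is recorded
print(conditionCall(e2))  # NULL, so no "Error in ..." prefix would be shown
```

With `.call = FALSE` the message would print as a bare "Error: x must be positive", matching what stop(..., call. = FALSE) produces.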

Best

Tim
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R/CRAN switch to UCRT on Windows

2021-12-08 Thread Tim Taylor
Hear, hear! Also thank you Uwe and the rest of the CRAN team for all the work 
you put in. It is much appreciated!


From: R-devel  on behalf of Uwe Ligges 

Sent: 08 December 2021 14:24
To: Tomas Kalibera; r-devel
Subject: Re: [Rd] R/CRAN switch to UCRT on Windows

Thank you, Tomas, for your hard work on the new toolchain, its
documentation, and all your efforts in providing patches for R and for
several contributed packages.

Best,
Uwe



On 08.12.2021 14:56, Tomas Kalibera wrote:
> Please note an update concerning the support of UTF-8 as native encoding
> on Windows, which may at this point be of interest particularly to
> developers of packages with native code and to R users using R-devel
> (the development version of R) on Windows:
>
> https://developer.r-project.org/Blog/public/2021/12/07/upcoming-changes-in-r-4.2-on-windows/
>
>
> The key part is that CRAN will switch the incoming checks of R packages
> on Windows to a new toolchain targeting UCRT on Monday, December 13.
>
> It may take up to several days for all systems to synchronize and during
> this time, it may be difficult to build R-devel on Windows from source
> or to install packages.  After the switch, the snapshot R-devel builds
> and binary package builds provided by CRAN will be built using the new
> toolchain for UCRT. These new builds will use UTF-8 as the native
> encoding on recent Windows.
>
> These builds will be incompatible with the previous builds for MSVCRT
> and installed/binary packages will be incompatible as well. The
> recommended/simplest course of action for R-devel users is to uninstall
> the old build of R-devel, RTools, delete the old package libraries, and
> then install the new versions.
>
> Checks of CRAN packages with the new toolchain have been running since
> March with results available on CRAN pages. By now, most packages are
> working, but some packages using native (C, C++, Fortran) code still
> have to be updated. The Winbuilder service and R-hub support the new
> toolchain, there is also support/example for using github actions. The
> builds of R-devel and CRAN (and recommended binary packages) with the
> new toolchain are available regularly since March.
>
> I've created patches for CRAN (and required Bioconductor packages) which
> are installed automatically at package installation time by R. This
> feature will be also in R-devel after the switch and will be used
> temporarily to give package authors more time to fix their packages. Uwe
> Ligges, other CRAN team members and I have also been in touch with some
> package authors, providing advice how to fix their packages, when the
> issues required more explanation. I am prepared to help the remaining
> authors as well if needed.
>
> Please see the blog post and materials linked from there for more
> details and feel free to ask questions.
>
> Thanks
> Tomas
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] model.weights and model.offset: request for adjustment

2022-02-03 Thread tim . taylor


> On 03/02/2022 11:14 Martin Maechler  wrote:
> 
>  
> > Ben Bolker 
> > on Tue, 1 Feb 2022 21:21:46 -0500 writes:
> 
> > The model.weights() and model.offset() functions from the 'stats' 
> > package index possibly-missing elements of a data frame via $, e.g.
> 
> > x$"(offset)"
> > x$"(weights)"
> 
> > This returns NULL without comment when x is a data frame:
> 
> > x <- data.frame(a=1)
> > x$"(offset)"  ## NULL
> > x$"(weights)"  ## NULL
> 
> > However, when x is a tibble we get a warning as well:
> 
> > x <- tibble::as_tibble(x)
> > x$"(offset)"
> > ## NULL
> > ## Warning message:
> > ## Unknown or uninitialised column: `(offset)`.
> 
> > I know it's not R-core's responsibility to manage forward 
> > compatibility with tibbles, but in this case [[-indexing would seem to 
> > be better practice in any case.
> 
> Yes, I would agree:  we should use  [[ instead of $ here
> in order to force exact matching just as principle
> 
> Importantly, because  also  mf[["(weights)"]]
> will return  NULL without a warning for a model/data frame, and
> it seems it does so also for tibbles.
> 
> > Might a patch be accepted ... ?
> 
> That would not be necessary.
> 
> There's one remaining problem however:
> `$` access is clearly faster than `[[` for small data frames
> (because `$` is a primitive function doing everything in C, 
>  whereas `[[` calls the R level data frame method ).
> 
> Faster in both cases, i.e., when there *is* a column and when there
> is none (and NULL is returned), e.g., for the first case
> 
> > system.time(for(i in 1:2) df[["a"]])
>    user  system elapsed 
>   0.064   0.000   0.065 
> > system.time(for(i in 1:2) df$a)
>    user  system elapsed 
>   0.009   0.000   0.009 
> 
> So that's probably been the reason why  `$`  has been preferred?

Would .subset2(df, "a") be preferable?
R> df <- mtcars
R> system.time(for(i in 1:2) df[["hp"]])
   user  system elapsed 
  0.078   0.000   0.078 
R> system.time(for(i in 1:2) df$hp)
   user  system elapsed 
  0.011   0.000   0.010 
R> system.time(for(i in 1:2) .subset2(df,"hp"))
   user  system elapsed 
  0.004   0.000   0.004 
Tim
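
A self-contained version of the comparison above might look like the following sketch. The iteration count `n` is an assumption chosen for illustration (the counts in the timings above are not recoverable), and absolute timings will differ by machine. It also checks the two properties that matter for model.weights() and model.offset(): a missing column yields NULL silently, and no partial matching takes place.

```r
df <- mtcars
n <- 20000L  # assumed iteration count, for illustration only

# Elapsed time for each way of extracting one column
elapsed <- c(
    bracket = system.time(for (i in seq_len(n)) df[["hp"]])[["elapsed"]],
    dollar  = system.time(for (i in seq_len(n)) df$hp)[["elapsed"]],
    subset2 = system.time(for (i in seq_len(n)) .subset2(df, "hp"))[["elapsed"]]
)
print(elapsed)

# All three accessors return the same column
same_value <- identical(df[["hp"]], .subset2(df, "hp")) &&
    identical(df$hp, .subset2(df, "hp"))

# A missing column gives NULL without a warning
missing_is_null <- is.null(.subset2(df, "(offset)"))

# Exact matching: "h" does not partially match "hp"
no_partial_match <- is.null(.subset2(df, "h"))
```

Note that .subset2() bypasses method dispatch, so whether it behaves as wanted for classes such as tibbles would need to be checked separately.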



[Rd] R blog link on developer page

2022-11-08 Thread tim . taylor
The link to the R blog on https://developer.r-project.org/ currently points to 
the old site (https://developer.r-project.org/Blog/public). Should the link be 
updated to the new location (https://blog.r-project.org/)?
 
Apologies if this was the wrong list to raise this - please flag if there's a 
more appropriate one.
 
Tim


[Rd] Hash table plans

2023-01-04 Thread Tim Taylor
I note in r-devel the hash table functionality is still marked as 
experimental. Is it expected that this will progress to non-experimental 
in 4.3 or is there a need for more feedback from the wider community first?


Tim



Re: [Rd] memory leak in png()

2023-01-17 Thread Tim Taylor

On 17/01/2023 13:06, Duncan Murdoch wrote:
I don't have a valgrind-capable version of R, but I'd be interested to 
see whether this is a one-time loss, or repeated.  That is, do you get a 
much bigger loss from running the lossy code in a loop like this?


  for (i in 1:100) { png(filename='p.png'); plot(1:10); dev.off() }

Duncan Murdoch


Duncan - Not that I'm seeing

Rdevel -d valgrind --vanilla -e " for (i in 1:1) {png(filename='p.png'); plot(1:10); dev.off()}"


==63291== LEAK SUMMARY:
==63291==    definitely lost: 2,560 bytes in 4 blocks
==63291==    indirectly lost: 17,710 bytes in 762 blocks
==63291==      possibly lost: 1,820 bytes in 8 blocks
==63291==    still reachable: 52,177,408 bytes in 21,282 blocks
==63291==                       of which reachable via heuristic:
==63291==                         newarray           : 4,264 bytes in 1 blocks
==63291==         suppressed: 0 bytes in 0 blocks

Rdevel -d valgrind --vanilla -e " for (i in 1:100) {png(filename='p.png'); plot(1:10); dev.off()}"


==63464== LEAK SUMMARY:
==63464==    definitely lost: 2,560 bytes in 4 blocks
==63464==    indirectly lost: 17,710 bytes in 762 blocks
==63464==      possibly lost: 1,820 bytes in 8 blocks
==63464==    still reachable: 56,072,939 bytes in 19,283 blocks
==63464==                       of which reachable via heuristic:
==63464==                         newarray           : 4,264 bytes in 1 blocks
==63464==         suppressed: 0 bytes in 0 blocks

Tim




On 17/01/2023 4:55 a.m., Martin Maechler wrote:

Edward Ionides
 on Mon, 16 Jan 2023 09:04:49 -0500 writes:


 > Hi all,

 > Yesterday I discovered what seems to me like a memory leak in png() so I'm
 > reporting it in case that is helpful. Here is a small reproducible example:


 > R -d "valgrind --tool=memcheck --track-origins=yes --leak-check=full" --vanilla -e "png(filename='p.png'); plot(1:10); dev.off()"
 > ## HAS LEAK
 > ==1021711== LEAK SUMMARY:
 > ==1021711==    definitely lost: 9,216 bytes in 30 blocks
 > ==1021711==    indirectly lost: 19,370 bytes in 838 blocks
 > ==1021711==  possibly lost: 3,868 bytes in 8 blocks

 > R -d "valgrind --tool=memcheck --track-origins=yes --leak-check=full" --vanilla -e "pdf(file='p.pdf'); plot(1:10); dev.off()"
 > ## NO LEAK
 > ==1031300== LEAK SUMMARY:
 > ==1031300==    definitely lost: 0 bytes in 0 blocks
 > ==1031300==    indirectly lost: 0 bytes in 0 blocks
 > ==1031300==  possibly lost: 0 bytes in 0 blocks

I can reproduce, although I need to have the memcheck options in 
~/.valgrindrc

The same happens if grid-based graphics is used and for the
latest R-devel :

R-devel -d valgrind --vanilla -e 
'png("p.png");lattice::xyplot(1~1);dev.off()'


Using  png() shows leak
using  pdf()  is fine (0 bytes lost)


Looking at the full report (--leak-check=full  --track-origins=true
as 2 lines in ~/.valgrindrc ) I see several origins tracked to
internal malloc code,
but then also e.g.,

==1410108== 96 bytes in 1 blocks are possibly lost in loss record 700 
of 3,037

==1410108==    at 0x484A464: calloc (vg_replace_malloc.c:1328)
==1410108==    by 0x159EA3A0: g_malloc0 (gmem.c:155)
==1410108==    by 0x15A3AB8C: g_rc_box_alloc_full.constprop.0 
(grcbox.c:234)

==1410108==    by 0x1600A6C9: UnknownInlinedFun (pangofc-fontmap.c:899)
==1410108==    by 0x1600A6C9: UnknownInlinedFun (pangofc-fontmap.c:2145)
==1410108==    by 0x1600A6C9: pango_fc_font_map_load_fontset 
(pangofc-fontmap.c:2245)

==1410108==    by 0x158E7474: UnknownInlinedFun (itemize.c:892)
==1410108==    by 0x158E7474: UnknownInlinedFun (itemize.c:952)
==1410108==    by 0x158E7474: pango_itemize_with_font (itemize.c:1564)
==1410108==    by 0x158FA89E: 
pango_layout_check_lines.part.0.lto_priv.0 (pango-layout.c:4894)

==1410108==    by 0x158EF4DB: UnknownInlinedFun (pango-layout.c:4786)
==1410108==    by 0x158EF4DB: pango_layout_get_line (pango-layout.c:1715)
==1410108==    by 0x1B5E441F: PG_text_extents (cairoFns.c:1487)
==1410108==    by 0x1B5E441F: PangoCairo_StrWidth (cairoFns.c:1565)
==1410108==    by 0x4CEBE6: GEStrWidth (engine.c:2615)
==1410108==    by 0x4CEBE6: GEStrWidth (engine.c:2578)
==1410108==    by 0x1BE12D11: textRect (util.c:198)
==1410108==    by 0x1BDFAF19: gridText (grid.c:3740)
==1410108==    by 0x1BE01FAB: L_textBounds (grid.c:3892)

which (from the bottom up) shows package grid C code,
then R main but infrastructure for grDevices ("GEStrWidth (engine.c)") and
subdirectory grDevices/src/cairo/cairoFns.c
and then goes into system cairo or pangocairo libraries,
which here (Linux Fedora 36) are (I think)

libpangocairo-1.0.so.0
libpango-1.0.so.0

{as they are stripped, I don't know how to check }

To *fix* this I also have to defer to others.

Thank you for the report,
Martin

--
Martin Maechler
ETH Zurich  and  R Core team


 > For some context, I am running R4.2.2. My goal was to run 
valgrind on the
 > latest version of my

Re: [Rd] range() for Date and POSIXct could respect `finite = TRUE`

2023-04-28 Thread Tim Taylor
A tiny nit-pick of the nit-pick: I'd take NA to mean the finish date is 
missing, so you know neither when the event finished nor whether it has 
finished at all :-)


Either way the proposed method seems sensible.

Tim

On 28/04/2023 16:29, Paul McQuesten wrote:

A tiny nit-pick: Seems to me that end date = NA would mean the event has
not yet ended, whilst Inf would mean that the event is known to never
terminate, ie: an eternal fact, or physical law.

On Fri, Apr 28, 2023 at 10:12 AM Davis Vaughan via R-devel <
r-devel@r-project.org> wrote:


Hi all,

I noticed that `range.default()` has a nice `finite = TRUE` argument,
but it doesn't actually apply to Date or POSIXct due to how
`is.numeric()` works.

```
x <- .Date(c(0, Inf, 1, 2, Inf))
x
#> [1] "1970-01-01" "Inf"        "1970-01-02" "1970-01-03" "Inf"

# Darn!
range(x, finite = TRUE)
#> [1] "1970-01-01" "Inf"

# What I want
.Date(range(unclass(x), finite = TRUE))
#> [1] "1970-01-01" "1970-01-03"
```

I think `finite = TRUE` would be pretty nice for Dates in particular.

As a motivating example, sometimes you have ranges of dates
represented by start/end pairs. It is fairly natural to represent an
event that hasn't ended yet with an infinite date. If you need to then
compute a sequence of dates spanning the full range of the start/end
pairs, it would be nice to be able to use `range(finite = TRUE)` to do
so:

```
start <- as.Date(c("2019-01-05", "2019-01-10", "2019-01-11", "2019-01-14"))
end <- as.Date(c("2019-01-07", NA, "2019-01-14", NA))
end[is.na(end)] <- Inf

# `end = Inf` means that the event hasn't "ended" yet
data.frame(start, end)
#>        start        end
#> 1 2019-01-05 2019-01-07
#> 2 2019-01-10        Inf
#> 3 2019-01-11 2019-01-14
#> 4 2019-01-14        Inf

# Create a full sequence along all days in start/end
range <- .Date(range(unclass(c(start, end)), finite = TRUE))
seq(range[1], range[2], by = 1)
#>  [1] "2019-01-05" "2019-01-06" "2019-01-07" "2019-01-08" "2019-01-09"
#>  [6] "2019-01-10" "2019-01-11" "2019-01-12" "2019-01-13" "2019-01-14"
```

It seems like one option is to create a `range.Date()` method that
unclasses, forwards the arguments on to a second call to `range()`,
and then reclasses?

```
range.Date <- function(x, ..., na.rm = FALSE, finite = FALSE) {
   .Date(range(unclass(x), na.rm = na.rm, finite = finite), oldClass(x))
}
```

This is similar to how `rep.Date()` works.
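
The unclass/reclass idea above can be tried without defining an S3 method. The following is a minimal sketch; `range_date_finite()` is a hypothetical helper name used for illustration, not the proposed `range.Date()` method, and it assumes plain Date input.

```r
# Hypothetical helper: drop the class, let range.default() apply finite = TRUE,
# then restore the Date class.
range_date_finite <- function(x) {
    .Date(range(unclass(x), finite = TRUE))
}

x <- .Date(c(0, Inf, 1, 2, Inf))
rng <- range_date_finite(x)
print(format(rng))

# finite = TRUE also discards NA and -Inf
y <- .Date(c(NA, 3, Inf, -Inf, 1))
print(format(range_date_finite(y)))

# The finite range can then drive seq()
days <- seq(rng[1], rng[2], by = 1)
```

A real method would also need to handle the `...` arguments and subclasses of Date, which this sketch deliberately leaves out.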

Thanks,
Davis Vaughan



[Rd] as.matrix.dist patch (performance)

2023-08-10 Thread Tim Taylor
Please find attached a small patch to improve the performance of 
as.matrix.dist().  It's a tiny bit more involved than the current code 
but does bring a reasonable speed improvement for larger  objects 
(remaining comparable for smaller ones).


Example:

set.seed(1)
dat <- matrix(rnorm(2), ncol = 2);
system.time(as.matrix(dist(dat)))

As of r84931:

   user  system elapsed
  3.370   1.154   4.535

With this patch:

   user  system elapsed
  1.925   0.754   2.685

Submitting here in the first instance but happy to move to Bugzilla if 
more appropriate.


Cheers

Tim
Index: src/library/stats/R/dist.R
===
--- src/library/stats/R/dist.R	(revision 84931)
+++ src/library/stats/R/dist.R	(working copy)
@@ -49,10 +49,13 @@
 {
 size <- attr(x, "Size")
 df <- matrix(0, size, size)
-lower <- row(df) > col(df)
+idx <- seq_len(size)
+d1 <- unlist(lapply(idx[-1L], seq.int, to = size, by = 1L))
+d2 <- rep.int(idx[-size], times = rev(idx[-size]))
+lower <- cbind(d1,d2)
+upper <- cbind(d2,d1)
 df[lower] <- x ## preserving NAs in x
-df <- t(df)
-df[lower] <- x
+df[upper] <- x
 labels <- attr(x, "Labels")
 dimnames(df) <-
 	if(is.null(labels)) list(seq_len(size), seq_len(size)) else list(labels,labels)
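
For readers who want to try the indexing idea from the patch outside the R sources, here is a sketch of it as a standalone function. The name `as_matrix_dist_sketch()` is hypothetical, and the small input size is chosen only so that the result can be compared quickly with the existing as.matrix() method.

```r
# Standalone sketch of the patched logic: build (row, col) index pairs for the
# lower triangle once, then reuse them swapped for the upper triangle instead
# of transposing the whole matrix.
as_matrix_dist_sketch <- function(x) {
    size <- attr(x, "Size")
    df <- matrix(0, size, size)
    idx <- seq_len(size)
    # row indices of the lower triangle, column by column
    d1 <- unlist(lapply(idx[-1L], seq.int, to = size, by = 1L))
    # matching column indices
    d2 <- rep.int(idx[-size], times = rev(idx[-size]))
    df[cbind(d1, d2)] <- x  # lower triangle
    df[cbind(d2, d1)] <- x  # upper triangle
    labels <- attr(x, "Labels")
    dimnames(df) <- if (is.null(labels)) list(idx, idx) else list(labels, labels)
    df
}

set.seed(1)
dat <- matrix(rnorm(20), ncol = 2)
d <- dist(dat)
res <- as_matrix_dist_sketch(d)
```

The sketch assumes a dist object with at least two observations; the degenerate sizes would need separate checks.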


Re: [Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time

2023-08-14 Thread Tim Taylor
Martin,

Thank you. Everything you have written is helpful and I admit I am likely 
guilty of using as.character() instead of format() in the past.

Ignoring the above though, one thing I’m still unclear on is the special 
handling of zero (or rather non-zero time) seconds in the method. Is the 
motivation that as.character() outputs the minimum necessary information? It is 
clearly a very deliberate choice but the reasoning is still going a little over 
my head.

Best

Tim

> On 14 Aug 2023, at 09:52, Martin Maechler  wrote:
> 
> 
>> 
>> Andy Teucher 
>>on Fri, 11 Aug 2023 16:07:36 -0700 writes:
> 
>> I understand that `as.character.POSIXt()` had an overhaul in R 4.3 
>> (https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b),
>>  and I have come across a new behaviour and I wonder if it is unintended?
> 
> Well, as the NEWS entry says
> (partly visible in the url above -- which only shows one part of
>  the several changes for R 4.3) :
> 
>    • as.character(<POSIXt>) now behaves more in line with the methods
>      for atomic vectors such as numbers, and is no longer influenced
>      by options().  Ditto for as.character(<Date>).  The
>      as.character() method gets arguments digits and OutDec with
>      defaults _not_ depending on options().  Use of as.character(*,
>      format = .) now warns.
> 
> It was "inconsistent" to have  as.character(.) basically use format(.) for
> these datetime objects.
> as.character(x) for basic R types such as numbers, strings, logicals,... 
> fulfills the important property
> 
> as.character(x)[j] === as.character(x[j])
> 
> whereas that is very much different for format() where indeed,
> the formatting  of  x[1]  may quite a bit depend on the other
> x[j]'s values:
> 
>> as.character(c(1, pi, pi/2^20))
> [1] "1"                    "3.14159265358979"     "2.99605622633914e-06"
> 
>> format(c(1, pi, pi/2^20))
> [1] "1.000000e+00" "3.141593e+00" "2.996056e-06"
>> format(c(1, pi))
> [1] "1.000000" "3.141593"
>> format(c(1, 10))
> [1] " 1" "10"
>> 
> 
> 
>> When calling `as.character.POSIXt()` on a vector that contains elements 
>> where the time component is midnight (00:00:00), it drops the time component 
>> of that element in the resulting character vector. Previously the time 
>> component was retained: 
> 
>> In R 4.2.3:
> 
>> ```
>> R.version$version.string
>> #> [1] "R version 4.2.3 (2023-03-15)"
> 
>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
> 
>> (tc <- as.character(t))
>> #> [1] "1975-01-01 00:00:00" "1975-01-01 15:27:00"
>> ```
> 
>> In R 4.3.1:
> 
>> ```
>> R.version$version.string
>> #> [1] "R version 4.3.1 (2023-06-16)"
> 
>> (t <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00")))
>> #> [1] "1975-01-01 00:00:00 PST" "1975-01-01 15:27:00 PST"
> 
>> (tc <- as.character(t))
>> #> [1] "1975-01-01" "1975-01-01 15:27:00"
>> ```
> 
> You should have used format()  here  or at least should do so now.
> 
>> This has consequences when round-tripping from POSIXt ->
>> character -> POSIXt,
> 
> Well, I'd argue that such a "round trip" is not a "good idea"
> anyway, as there are quite a few platform (local timezone for
> one) issues, and precision is lost, notably for POSIXlt which
> may be more precise than you typically get, etc. 
> 
>> since `as.POSIXct.character()` drops the time component from the entire 
>> vector if any element does not have a time component:
> 
> Well, there *is* no as.POSIXct.character()  {but we understand what you 
> mean}: 
> If you look at the help page you'd see that there's  as.POSIXlt.character()
> {which is called from as.POSIXct.default()}
> with a 3rd argument 'format' and a 4th argument 'tryFormats'
> {and a lot more information -- the whole topic is far from trivial}.
> 
> Now, indirectly you would want R to be "smart", i.e. the
> as.POSIXlt.character() method "guess better" about what the
> user wants. ...
> ... and I agree that is not an unreasonable expectation, e.g.,
> for your example of wanting 
> 
>c("1975-01-01", "1975-01-01 15:27:00")
> 
> to  "work".
> 
> as.POSIXlt.character() is well documented to be trying all of
> the `tryFormats` in order, until it finds one that works for all
> vector components (or fail / use NA if none works);
> and here it's only a format which drops the time that works for
> all (i.e. both, in the example).
> 
> { Even though its behavior is well documented,
>  one could even argue that by default you'd want a warning in
>  such a case where "so much" is lost.
>  I think however that introducing such a warning  may trip too
>  much current code relying .. also, the extra *checking* maybe
>  somewhat costly .. (?)   anyway that's an interesting side topic
> }
> 
> Instead what you want here is for each string (element of the
> character vector) to try the `tryFormats` and use the best
> available *individually*  {smart R users ==> "think lapply(.)"} :
> Currently, this would b

Re: [Rd] R 4.3: Change in behaviour of as.character.POSIXt for datetime values with midnight time

2023-08-15 Thread Tim Taylor

Many thanks Martin!

I was completely overlooking the behaviour for a length 1 vector with 
00:00:00. More coffee needed for me I think.


Best

Tim


On 15/08/2023 08:58, Martin Maechler wrote:

Tim Taylor
 on Mon, 14 Aug 2023 12:26:51 +0100 writes:

 > Martin,
 > Thank you. Everything you have written is helpful and I admit I am likely guilty of using as.character() instead of format() in the past.

 > Ignoring the above though, one thing I’m still unclear on is the special 
handling of zero (or rather non-zero time) seconds in the method. Is the 
motivation that as.character() outputs the minimum necessary information? It is 
clearly a very deliberate choice but the reasoning is still going a little over my 
head.

 > Best
 > Tim

Hmm, I really don't understand what you don't understand.
Here's some annotated R code exemplifying that indeed now,
 as.character(x)[j] === as.character(x[j])
but previously that was not fulfilled  {when  as.character() was
the same as format() for POSIXct or POSIXlt}:

##-
x0 <- c("1975-01-01 00:00:00", "1975-01-01 15:27:00")
t0 <- as.POSIXct(x0)
str(t0) #  POSIXct[1:2], format: "1975-01-01 00:00:00" "1975-01-01 15:27:00"
t0      #  "1975-01-01 00:00:00 CET" "1975-01-01 15:27:00 CET"
t0[1]   #  "1975-01-01 CET" <-- yes, *no* 00:00:00   in no version of R

## In R <= 4.2.x  as.character() was using format() for POSIX{ct,lt} :
as.character(t0)    # "1975-01-01 00:00:00" "1975-01-01 15:27:00" << for R <= 4.2.x
as.character(t0)    # "1975-01-01"  "1975-01-01 15:27:00" << for R >= 4.3.0
as.character(t0[1]) # "1975-01-01"  {in all versions of R}


Note that indeed   as.character()  does drop redundant trailing 0s :

   > as.character(c(0.5, 0.75, pi))
   [1] "0.5"              "0.75"             "3.14159265358979"

whereas format() does not (ensuring resulting strings of the same nchar(.)):

   > format(  c(0.5, 0.75, pi))
   [1] "0.500000" "0.750000" "3.141593"
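
The elementwise property described above can be checked directly. The sketch below uses plain numbers so that it does not depend on the R version, and then shows one way to get version-independent date-time strings by giving format() an explicit format string. The variable names and the choice of `tz = "UTC"` are assumptions made for illustration.

```r
x <- c(0.5, 0.75, pi)

# as.character(): converting the whole vector equals converting each element
each_chr <- vapply(seq_along(x), function(j) as.character(x[j]), character(1))
chr_is_elementwise <- identical(as.character(x), each_chr)

# format(): the result for one element depends on the other elements
each_fmt <- vapply(seq_along(x), function(j) format(x[j]), character(1))
fmt_is_elementwise <- identical(format(x), each_fmt)

# For date-times, an explicit format keeps the midnight time component
t0 <- as.POSIXct(c("1975-01-01 00:00:00", "1975-01-01 15:27:00"), tz = "UTC")
stamps <- format(t0, "%Y-%m-%d %H:%M:%S")
print(stamps)
```

Using an explicit format string is one option for code that round-trips date-times through character vectors; it avoids relying on what as.character() does in any particular release.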



 >> On 14 Aug 2023, at 09:52, Martin Maechler  
wrote:
 >>
 >> 
 >>>
 >>>>>>> Andy Teucher
 >>>>>>> on Fri, 11 Aug 2023 16:07:36 -0700 writes:
 >>
 >>> I understand that `as.character.POSIXt()` had an overhaul in R 4.3 
(https://github.com/wch/r-source/commit/f6fd993f8a2f799a56dbecbd8238f155191fc31b), and I 
have come across a new behaviour and I wonder if it is unintended?
 >>
 >> Well, as the NEWS entry says
 >> (partly visible in the url above -- which only shows one part of
 >> the several changes for R 4.3) :
 >>
 >> • as.character(<POSIXt>) now behaves more in line with the methods
 >> for atomic vectors such as numbers, and is no longer influenced
 >> by options().  Ditto for as.character(<Date>).  The
 >> as.character() method gets arguments digits and OutDec with
 >> defaults _not_ depending on options().  Use of as.character(*,
 >> format = .) now warns.
 >>
 >> It was "inconsistent" to have  as.character(.) basically use format(.) for
 >> these datetime objects.
 >> as.character(x) for basic R types such as numbers, strings, logicals,...
 >> fulfills the important property
 >>
 >> as.character(x)[j] === as.character(x[j])
 >>
 >> whereas that is very much different for format() where indeed,
 >> the formatting  of  x[1]  may quite a bit depend on the other
 >> x[j]'s values:
 >>
 >>> as.character(c(1, pi, pi/2^20))
 >> [1] "1"                    "3.14159265358979"     "2.99605622633914e-06"
 >>
 >>> format(c(1, pi, pi/2^20))
 >> [1] "1.000000e+00" "3.141593e+00" "2.996056e-06"
 >>> format(c(1, pi))
 >> [1] "1.000000" "3.141593"
 >>> format(c(1, 10))
 >> [1] " 1" "10"
 >>>
 >>
 >>
 >>> When calling `as.character.POSIXt()` on a vector that contains 
elements where the time component is midnight (00:00:00), it drops the time component of 
that element in the resulting character vector. Previously the time component was 
retained:
 >>
 >>> In R 4.2.3:
 >>
 >>> ```
 >>> R.version$version.string
 >>> #> [1] "R version 4.2.3 (2023-03-15)"
 >>
 >>> (t <- as.POSIXct(c("1975-01

[Rd] Regenerate news feeds?

2023-11-17 Thread Tim Taylor
The news feeds (e.g. 
https://developer.r-project.org/blosxom.cgi/R-devel/NEWS) have some 
stray "\abbr" floating around. Do they need generating with a more 
recent version of R-devel?


I've run tools::Rd2txt on https://svn.r-project.org/R/trunk/doc/NEWS.Rd 
and r85550 does seem to remove these aberrations (compared to the same 
function called on an unpatched 4.3.2, where they remain).


Tim



Re: [Rd] as.matrix.dist patch (performance)

2024-01-16 Thread Tim Taylor
Cheers Ivan

Heather Turner has improved on it further in the R Contributors Slack. I’ll 
take another look and ensure something is added to Bugzilla, with attribution, 
in the next few days.

Tim

> On 16 Jan 2024, at 14:36, Ivan Krylov  wrote:
> 
> Dear Tim,
> 
> On Thu, 10 Aug 2023 22:38:44 +0100
> Tim Taylor  wrote:
> 
>> Submitting here in the first instance but happy to move to Bugzilla
>> if more appropriate.
> 
> It's a fine patch. The 1.7 times speed up from not transposing the
> return value shouldn't be sneezed at. I think it's time to move it to
> Bugzilla so that it won't be completely forgotten.
> 
> --
> Best regards,
> Ivan



[Rd] strcapture performance when perl = TRUE

2024-01-29 Thread Tim Taylor
I wanted to raise the possibility of improving strcapture performance in
cases where perl = TRUE. I believe we can do this in a non-breaking way
by calling regexpr instead of regexec (conditionally when perl = TRUE).
To illustrate this I've put together a 'proof of concept' function called
strcapture2 that utilises output from regexpr directly (following a very
nice substring approach that I've seen implemented by Toby Hocking
in the nc package - nc::capture_first_vec).

strcapture2 <- function(pattern, x, proto, perl = FALSE, useBytes = FALSE) {
    if (isTRUE(perl)) {
        m <- regexpr(pattern = pattern, text = x, perl = TRUE, useBytes = useBytes)
        nomatch <- is.na(m) | m == -1L
        ntokens <- length(proto)
        if (any(!nomatch)) {
            # start and length of each capture group (one column per group)
            start <- attr(m, "capture.start")
            len <- attr(m, "capture.length")
            end <- start + len - 1L
            # rows without a match become NA in every column
            end[nomatch, ] <- start[nomatch, ] <- NA
            res <- substring(x, start, end)
            out <- matrix(res, length(m))
            if (ncol(out) != ntokens) {
                stop("The number of captures in 'pattern' != 'length(proto)'")
            }
        } else {
            out <- matrix(NA_character_, length(m), ntokens)
        }
        utils:::conformToProto(out, proto)
    } else {
        # fall back to the existing implementation
        strcapture(pattern, x, proto, perl, useBytes)
    }
}
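
The substring approach relies on the capture attributes that regexpr() attaches to its result when perl = TRUE. The following minimal sketch shows those attributes in isolation. The group names `first` and `last` and the three-element input are assumptions chosen for illustration.

```r
x <- c("  Ben Franklin and Jefferson Davis", "Bob", NA_character_)
pattern <- "(?<first>[[:upper:]][[:lower:]]+) (?<last>[[:upper:]][[:lower:]]+)"

m <- regexpr(pattern, x, perl = TRUE)
nomatch <- is.na(m) | m == -1L

# One row per element of x, one column per capture group
start <- attr(m, "capture.start")
len <- attr(m, "capture.length")
end <- start + len - 1L

# Blank out rows without a match so that substring() returns NA for them
start[nomatch, ] <- NA
end[nomatch, ] <- NA

# substring() recycles x down the columns, which lines up with the matrix layout
caps <- matrix(substring(x, start, end), nrow = length(x),
               dimnames = list(NULL, attr(m, "capture.names")))
print(caps)
```

Because everything is vectorised over the whole input, no per-element list of matches is built, which is a plausible source of the speed difference reported in the timings.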

Now comparing with strcapture we can expand the named capture example
from the grep documentation:

notables <- c(
"  Ben Franklin and Jefferson Davis",
"\tMillard Fillmore",
"Bob",
NA_character_
)

regex <- "(?<first>[[:upper:]][[:lower:]]+) (?<last>[[:upper:]][[:lower:]]+)"
proto = data.frame("", "")

(strcapture(regex, notables, proto, perl = TRUE))
      X..    X...1
1     Ben Franklin
2 Millard Fillmore
3    <NA>     <NA>
4    <NA>     <NA>

(strcapture2(regex, notables, proto, perl = TRUE))
      X..    X...1
1     Ben Franklin
2 Millard Fillmore
3    <NA>     <NA>
4    <NA>     <NA>

Now to compare timings over multiple reps:

lengths <- sort(outer(c(1, 2, 5), 10^(1:4)))
reps <- 20 

time_strcapture <- function(text, length, regex, proto, reps) {
    text <- rep_len(text, length)
    str <- system.time(for (i in seq_len(reps)) strcapture(regex, text, proto, perl = TRUE))
    str2 <- system.time(for (i in seq_len(reps)) strcapture2(regex, text, proto, perl = TRUE))
    c(strcapture = str[["user.self"]], strcapture2 = str2[["user.self"]])
}
timings <- sapply(
lengths,
time_strcapture,
text = notables, regex = regex, reps = reps, proto = proto
)
cbind(lengths, t(timings))
      lengths strcapture strcapture2
 [1,]      10      0.005       0.003
 [2,]      20      0.005       0.002
 [3,]      50      0.008       0.003
 [4,]     100      0.012       0.002
 [5,]     200      0.021       0.003
 [6,]     500      0.051       0.003
 [7,]    1000      0.097       0.004
 [8,]    2000      0.171       0.005
 [9,]    5000      0.517       0.011
[10,]   10000      1.203       0.018
[11,]   20000      2.563       0.037
[12,]   50000      7.276       0.090

I've attached a plot of these timings in case helpful.

I appreciate that changing strcapture in this way does make it more
complicated, but I think the performance improvements make it worth
considering. Note that I've not thoroughly tested the above implementation
as I wanted to get feedback from the list before proceeding further.

Hope all this makes sense. Cheers

Tim



Re: [Rd] R 4.4.0 has version of Matrix 1.7-0, but it's not available on CRAN

2024-04-26 Thread Tim Taylor
On Fri, 26 Apr 2024, at 11:32 AM, Martin Maechler wrote:
>> Gábor Csárdi 
>> on Fri, 26 Apr 2024 11:55:36 +0200 writes:
>
> > I don't know if this is a bug, but it is certainly weird. AFAICT R
> > 4.4.0 has Matrix 1.7-0.
>
> Yes, it *is* available from CRAN:  You can see it when looking into the
>
> 4.4.0/   , specifically the
> 4.4.0/Recommended/ sub directory.
>
> Recommended packages should be built as part of R
> unless *you* really want to *not* get them by choosing at
> configure time, not to get them via extra flag
>   --without-recommended-packages.
>
> So, well, you got what you wanted.
>
> > However, currently CRAN has
>
> > Package: Matrix
> > Version: 1.6-5
> > Priority: recommended
> > Depends: R (>= 3.5.0), methods
> > ...
> .
> Yes, because it has to provide Matrix to R versions before 4.4.0
> and Matrix 1.7-0 has  'Depends: R (>= 4.4.0)'
>
> > (plus another version for R >= 4.5.0 only).
>
> > Which has some weird consequences, e.g. if I have an R 4.4.0
> > installation without the recommended packages, 
>
> (why would you explicitly choose *not* to have the
>  recommended packages when they *are* recommended .. :-b )
>   ...

Hi Martin.

I appreciate the efforts you are going to in order to balance these Matrix 
updates across CRAN versions. Related, but a little tangential, to the 
installation situation: I'd still expect the canonical CRAN link 
(https://cran.r-project.org/package=Matrix) to provide links to the *current* 
version.  Currently the links to the source tarball and the reference manual 
are for 1.6-5 (I'm guessing the vignettes are also from 1.6-5).

Best

Tim



Re: [Rd] [External] Non-API updates

2024-06-25 Thread Tim Taylor



On 25/06/2024 17:25, Josiah Parry wrote:

With respect to NOTES and WARN on CRAN, these do not result in any
package maintainer notifications. The only notification that the developers
receive is the threatening one that states that the packages will be
removed
from CRAN with a very short timeline.


I'd recommend checking 
https://cran.r-project.org/web//checks/check_results_josiah.parry_at_gmail.com.html 
on a regular basis or, better still, automating said checking. I use 
dang::checkCRANStatus() in my .Rprofile to stay up to date ...


R> dang::checkCRANStatus("josiah.pa...@gmail.com")
    Package WARN NOTE OK
1    arcgis   13
2 arcgisgeocode 8  5
3  arcgislayers   13
4  arcgisplaces 9  4
5   arcgisutils    13
6    arcpbf    2    7  4
7   b64    2    7  4
8 rsgeo    2    7  4
9   trendyy   13
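
For anyone wanting to roll their own version of the automation mentioned above, a minimal sketch follows. It is not what dang::checkCRANStatus() does internally: `tabulate_check_status()` and the toy data are hypothetical, and the sketch assumes the check results are available as a data frame with `Package` and `Status` columns (which, as far as I know, is the shape tools::CRAN_check_results() returns).

```r
# Hypothetical helper: count check flavours per package and status.
# Assumes `results` has character columns `Package` and `Status`.
tabulate_check_status <- function(results,
                                  levels = c("ERROR", "WARN", "NOTE", "OK")) {
    status <- factor(results$Status, levels = levels)
    tab <- table(Package = results$Package, Status = status)
    # Drop status columns that never occur, to keep the summary compact
    tab[, colSums(tab) > 0, drop = FALSE]
}

# Toy data standing in for real check results (made up for illustration)
toy <- data.frame(
    Package = c("pkgA", "pkgA", "pkgA", "pkgB", "pkgB"),
    Status  = c("OK", "NOTE", "OK", "WARN", "OK")
)
summary_tab <- tabulate_check_status(toy)
print(summary_tab)

# With network access the real results could be filtered by maintainer,
# e.g. (the `Maintainer` column name is an assumption):
# res  <- tools::CRAN_check_results()
# mine <- res[grepl("your.address@example.org", res$Maintainer, fixed = TRUE), ]
# tabulate_check_status(mine)
```

Calling something like this from .Rprofile or a scheduled job gives the same at-a-glance view without waiting for an email from CRAN.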


Tim

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Consider exporting some of the .Rd_get_xxx functions in tools

2024-07-31 Thread Tim Taylor
Would R-core consider exporting some of the .Rd_get_ family of functions 
within tools (e.g. tools:::.Rd_get_section(), tools:::.Rd_get_metadata())?


Whilst these are currently internal there is some use of them within 
documentation (e.g. help("Rd_db") uses tools:::.Rd_get_metadata() in 
its examples) which hopefully gives some justification for exporting.
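
To illustrate the kind of task these helpers cover, here is a rough sketch that pulls metadata out of a parsed Rd object using only exported functions. `rd_metadata()` is a hypothetical stand-in written for this example, not the internal implementation, and the Rd content is made up.

```r
# Hypothetical helper: collect the text of all top-level sections of one kind
# (e.g. "keyword") from an object returned by tools::parse_Rd().
rd_metadata <- function(rd, kind) {
    els  <- unclass(rd)
    tags <- vapply(els, function(el) attr(el, "Rd_tag"), character(1))
    hits <- els[tags == paste0("\\", kind)]
    unique(trimws(vapply(hits,
                         function(el) paste(unlist(el), collapse = ""),
                         character(1))))
}

# A made-up Rd file, written to a temporary location
rd_file <- tempfile(fileext = ".Rd")
writeLines(c(
    "\\name{foo}",
    "\\alias{foo}",
    "\\title{A made-up help page}",
    "\\description{Only here to illustrate.}",
    "\\keyword{utilities}",
    "\\keyword{documentation}"
), rd_file)

rd <- tools::parse_Rd(rd_file)
keywords <- rd_metadata(rd, "keyword")
print(keywords)
```

Having to re-implement this in user code is the sort of duplication that exporting the existing functions would avoid.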


Best

Tim

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Consider getNamespaceVersion() returning a numeric_version

2024-10-17 Thread Tim Taylor
I mean the `numeric_version` object not a numeric (double/int). 
Basically to protect me from myself I'd prefer not to have to remember 
to wrap `getNamespaceVersion()` with `as.package_version()`.


I suspect a grep of CRAN may highlight others who are erroneously 
comparing character objects rather than a comparison between a 
`numeric_version` object and a character.
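
A small sketch of the distinction, using the stats namespace only because it is always available:

```r
v  <- getNamespaceVersion("stats")   # a character vector named "version"
pv <- as.package_version(v)          # a package_version / numeric_version

print(class(v))
print(class(pv))

# Comparing a numeric_version with a string converts the string to a
# version first, so this is a genuine version comparison.
is_recent <- pv >= "3.0.0"
print(is_recent)
```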


Tim


On 17/10/2024 13:22, Dirk Eddelbuettel wrote:

> On 17 October 2024 at 12:38, Tim Taylor wrote:
> | Would R-Core be receptive to having getNamespaceVersion() return a
> | numeric_version object instead of a named character?
>
> Is this good enough? What's your actual issue a 'numeric' would address?
>
>     > as.package_version(getNamespaceVersion("base")) < "4.5.0"
>     [1] TRUE
>     > as.package_version(getNamespaceVersion("Rcpp")) > "1.0.11"
>     [1] TRUE
>     > as.package_version(getNamespaceVersion("Rcpp")) > "1.0.14"
>     [1] FALSE
>
> Dirk



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Consider getNamespaceVersion() returning a numeric_version

2024-10-17 Thread Tim Taylor
Would R-Core be receptive to having getNamespaceVersion() return a 
numeric_version object instead of a named character?


Tim

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Consider getNamespaceVersion() returning a numeric_version

2024-10-17 Thread Tim Taylor

On 17/10/2024 15:53, Prof Brian Ripley wrote:

> On 17/10/2024 13:42, Tim Taylor wrote:
>> I mean the `numeric_version` object not a numeric (double/int).
>> Basically to protect me from myself I'd prefer not to have to
>> remember to wrap `getNamespaceVersion()` with `as.package_version()`.
>>
>> I suspect a grep of CRAN may highlight others who are erroneously
>> comparing character objects rather than a comparison between a
>> `numeric_version` object and a character.
>
> Perhaps you could do that rather than speculating?  Similarly, try out
> over CRAN the effect of getNamespaceVersion changing its return type.
>
> It seems to be used in less than 40 CRAN packages, many boilerplate
> code from a single author and most use the version as a printable
> character string.  A few are clearly wrong: E.g.
>
>     if(getNamespaceVersion("reticulate") >= "1.36.0")
>
> will be false if that package ever reaches "1.100.0".  This is what
> compareVersion() is for.
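
The failure mode described above is easy to reproduce with plain strings. A minimal sketch; the version numbers are the ones from the example and nothing is read from an installed package:

```r
installed <- "1.100.0"   # hypothetical future version
required  <- "1.36.0"

# Character comparison goes character by character, so "1.1..." sorts
# before "1.3..." and the check fails.
as_strings <- installed >= required

# Converting one side to a version object gives the intended result.
as_versions <- package_version(installed) >= required

# compareVersion() returns -1, 0 or 1 for older, equal or newer.
via_compare <- utils::compareVersion(installed, required)

print(c(as_strings = as_strings, as_versions = as_versions))
print(via_compare)
```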


I've raised an issue with the package authors of that particular piece 
of code. I'll try and see what other packages are making similar mistakes.





>> On 17/10/2024 13:22, Dirk Eddelbuettel wrote:
>>
>>> On 17 October 2024 at 12:38, Tim Taylor wrote:
>>> | Would R-Core be receptive to having getNamespaceVersion() return a
>>> | numeric_version object instead of a named character?
>>
>> Is this good enough? What's your actual issue a 'numeric' would
>> address?
>>
>>     > as.package_version(getNamespaceVersion("base")) < "4.5.0"
>>     [1] TRUE
>>     > as.package_version(getNamespaceVersion("Rcpp")) > "1.0.11"
>>     [1] TRUE
>>     > as.package_version(getNamespaceVersion("Rcpp")) > "1.0.14"
>>     [1] FALSE
>
> There are differences, e.g.
>
>     > (z <- getNamespaceVersion("MASS"))
>      version
>     "7.3-61"
>     > (zz <- as.package_version(z))
>     [1] ‘7.3.61’
>     > as.character(zz)
>     [1] "7.3.61"
>
> and some uses need the first.  That makes changing the return value
> too disruptive.
>
> If the issue is only comparison, getNamespaceVersion's return value
> could be given a class and an Ops group method, but the existence of
> compareVersion() makes that less compelling.
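
One way the class plus Ops group method idea could look is sketched below. This is purely illustrative and not a proposed patch: the class name `ns_version` and its constructor are hypothetical, and only comparisons against character strings are handled.

```r
# Hypothetical constructor: keeps the original string, only adds a class
ns_version <- function(x) structure(x, class = "ns_version")

# Group method: comparisons are done on parsed versions; other operators error
Ops.ns_version <- function(e1, e2) {
    if (!.Generic %in% c("<", ">", "<=", ">=", "==", "!="))
        stop(sprintf("'%s' is not meaningful for version strings", .Generic))
    parse_side <- function(x) package_version(as.character(unclass(x)))
    match.fun(.Generic)(parse_side(e1), parse_side(e2))
}

z <- ns_version(c(version = "7.3-61"))   # same form as the MASS example

newer     <- z > "7.3-9"                 # compared as versions, not strings
text_form <- as.character(unclass(z))    # the original "7.3-61" is preserved

print(newer)
print(text_form)
```

The underlying value stays a character string with its hyphen intact, so code that prints or pastes the version would see no change; only the comparison operators behave differently.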


Yes, the issue for me was with a comparison. I think the additional class 
and Ops method could be worthwhile to prevent others from making a 
similar mistake to mine. That said, I do appreciate it adds more 
code when there are already alternatives available. If you'd be 
receptive I'd be happy to submit a patch in this regard.


Many thanks

Tim

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bug Report: Incorrect precedence between / and %/% in R 4.4.1

2025-01-31 Thread Tim Taylor
The higher precedence of %/% is documented in ?Syntax. Did something in 
particular make you think that it had the same precedence?
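
One way to see how the parser groups the expression is to inspect the unevaluated call. A small sketch:

```r
e <- quote(2 * 10 / 2 %/% 50)

top_level_op <- as.character(e[[1]])   # operator applied last
left_operand  <- deparse(e[[2]])       # what ends up on the left of it
right_operand <- deparse(e[[3]])       # what ends up on the right of it

print(top_level_op)
print(left_operand)
print(right_operand)

# %/% binds tighter than * and /, so 2 %/% 50 (which is 0) is evaluated
# first and the division is 20 / 0.
as_written  <- 2 * 10 / 2 %/% 50
with_parens <- ((2 * 10) / 2) %/% 50
print(c(as_written = as_written, with_parens = with_parens))
```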

Tim
> On 31 Jan 2025, at 13:37, Lionel Fotie via R-devel wrote:
> 
> Dear R Development Team,
> 
> I have encountered an unexpected behavior in R 4.4.1 regarding the precedence 
> of / and %/%.
> 
> Steps to reproduce:
> print(2 * 10 / 2 %/% 50)
> 
> Expected result:
> Since *, / and %/% have the same precedence, evaluation should be 
> left-to-right:
> (2 * 10) / 2 %/% 50  # Expected: 0
> 
> Actual result:
> [1] Inf
> 
> Workaround:
> Adding explicit parentheses fixes the issue:
> print(((2 * 10) / 2) %/% 50)  # Returns 0 (as expected)
> 
> This suggests that %/% is evaluated before /, contradicting the expected 
> left-associativity.
> Could you please confirm if this is a known issue ?
> 
> Best regards,
> 
> LIONEL FOTIE
> Data Scientist

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] binary R packages for GNU/Linux

2025-02-11 Thread Tim Taylor
On Tue, 11 Feb 2025, at 8:07 AM, Iñaki Ucar wrote:
> On 10 February 2025 at 23:19, Jeroen Ooms wrote:
>> some people prefer installing binaries via apt rather than
>> install.packages(), which is all fine, but both methods have pros and
>> cons.
>
> Some people prefer having all their binaries *managed* by apt/dnf, but
> still using install.packages() to trigger that work. I count myself
> there, and that's what I built.

Ditto. It really is automagical!

FWIW - I think this integration with the distributions is key to a painless 
experience. Perhaps a better question is to ask Dirk, Iñaki et al. what their 
pain points are and whether they need any additional assistance? E.g. I'm aware 
of discussions around System Requirements that could help:

https://bugs.r-project.org/show_bug.cgi?id=18586

Tim

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] request for discussion on lonely doc patch suggestion

2025-03-24 Thread Tim Taylor
FWIW, on the command line I’m a happy 'delta' user for a quick side by side 
comparison (https://github.com/dandavison/delta)

> On 24 Mar 2025, at 19:32, J C Nash wrote:
>
> For Linux users, meld is quite nice for side by side editing, though I've
> never tried using it for display. Just checking now suggests it isn't
> obvious how to "print" side by side display.
>
> I've made meld easier for my own use by creating an icon in Double
> Commander (DC allows the user to create iconized links to scripts and
> programs). There are two panes in the DC file manager. I highlight one
> file in each then click. This saves typing two full paths in a command
>
>   meld  path/to/file1 path/to/file2
>
> I suspect the highlight and click makes my use of meld reasonably
> attractive. I'm not sure I'd use it in the raw command line mode.
>
> Like Duncan, I welcome suggestions for similar tools, especially if
> there's a display option.
> 
> John Nash
> 
>> On 2025-03-24 15:21, Duncan Murdoch wrote:
>> I sent some comments directly to Ben.  I just want to reply publicly to this 
>> part:
>>> On 2025-03-24 1:18 p.m., Ben Bolker wrote:
>>> The patch file is attached (also available at bugzilla, if it doesn't
>>> get through to the list). I find the patch format a little hard to read,
>>> so I'm reproducing just the *new* text below.
>> I agree absolutely about the lack of readability of patch files.  A side by 
>> side display is much nicer.  If anyone out there isn't using one, you should.
>> I really like the one I use ("Beyond Compare"), but it's not open source.  
>> I've been using it for a very long time (20 years or more, I think), and I 
>> suspect there are very good open source competitors out there now (and may 
>> have been for all the time I've been using BC). Suggestions?
>> Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R CMD check and CRAN's Rust policy

2025-03-31 Thread Tim Taylor


> To Tim's comment—the check is a simple grep of the installation log for
> "Downloading crates." This could be easily circumvented on CRAN and locally
> by suppressing stdout/err. But that would be adversarial and I would like
> to adhere to the intent of the check.

Josiah - I do sympathise but, irrespective of this particular check, this 
highlights an inflexibility in your CI setup to handle different warnings as 
you wish. It is not adversarial to not fail on a warning produced by R CMD 
check, within your own CI. 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R CMD check and CRAN's Rust policy

2025-03-31 Thread Tim Taylor
On Mon, 31 Mar 2025, at 4:50 PM, Josiah Parry wrote:
> Following up with this as I address the new R-devel "Compiled code should
> not call entry points which might terminate R" WARNING and this issue has
> reared its head again.
>
> Would a path forward be an environment variable similar
> to _R_CHECK_CRAN_INCOMING_ to skip this check primarily for GitHub Actions
> and CI?
>

If this is primarily about CI then can you tweak your scripts not to fail on 
that particular warning? If you are using the r-lib/actions then I believe they 
utilise rcmdcheck (https://cran.r-project.org/package=rcmdcheck) which does 
give an output object you can work with.
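
As a rough sketch of what that tweak could look like: the helper below filters a character vector of warnings against patterns you have decided to tolerate. `unexpected_warnings()` and the toy warning text are hypothetical; the commented lines at the end assume the rcmdcheck result exposes `$errors` and `$warnings` as character vectors.

```r
# Hypothetical helper: drop warnings matching any tolerated (fixed) pattern
unexpected_warnings <- function(warnings, allowed_patterns) {
    keep <- rep(TRUE, length(warnings))
    for (p in allowed_patterns) {
        keep <- keep & !grepl(p, warnings, fixed = TRUE)
    }
    warnings[keep]
}

# Toy stand-ins for check output (made up for illustration)
toy_warnings <- c(
    "checking whether package can be installed ... WARNING\n  Downloading crates ...",
    "checking Rd files ... WARNING\n  something else went wrong"
)
remaining <- unexpected_warnings(toy_warnings, "Downloading crates")
print(length(remaining))

# In a CI script this might be wired up along these lines:
# res <- rcmdcheck::rcmdcheck(args = "--as-cran", error_on = "never")
# bad <- unexpected_warnings(res$warnings, "Downloading crates")
# if (length(res$errors) > 0 || length(bad) > 0) quit(status = 1)
```

This keeps every other warning fatal in CI while leaving the check itself untouched.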

Tim

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel