Dear Mike, et al,

My remarks are not necessarily related to tidyverse packages. The main point is that there are various purposes and business cases for writing code, and they may imply different trade-offs. Let me illustrate with some examples. I will focus on non-standard evaluation and dependencies.
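To make "non-standard evaluation" concrete, here is a minimal sketch. It uses base R's subset() as a stand-in for dplyr::filter() (an assumption made only to keep the example free of dependencies), and the toy data frame and its column names are made up for illustration.

```r
# Toy data (hypothetical); column x matches the filter(data, x > 1) style of call.
data <- data.frame(x = c(0, 2, 3), y = c("a", "b", "c"))

# NSE: subset() captures the unevaluated expression `x > 1` and evaluates it
# with the columns of `data` in scope.
nse_result <- subset(data, x > 1)

# Standard evaluation: data$x > 1 is an ordinary logical vector used as an index.
se_result <- data[data$x > 1, , drop = FALSE]

# Where NSE can surprise you: a variable in the calling environment that has
# the same name as a column is shadowed by that column.
y <- "b"
shadowed <- subset(data, y == y)               # column y compared with itself
explicit <- data[data$y == y, , drop = FALSE]  # column y compared with the variable y

nrow(shadowed)
nrow(explicit)
```

The two forms agree in the simple case; they differ once names in the data and names in the calling environment collide, which is the kind of detail that matters when debugging.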
TL;DR version (and this is my opinion, nobody has to agree):

1/ Interactive use: user-level NSE ok (as in the not-a-pipe operator, dplyr verbs), use any package you want.
2/ Applications & local packages: avoid NSE within functions, package an application with the dependencies you need, write code with maintainers in mind.
3/ Published R packages: avoid NSE within functions, minimize dependencies to what you cannot avoid.

Do Read version:

1/ One-off data analyses or exploratory data analyses.

There are cases where you don't need to guarantee that your code will run a few years from now: you are the only user, and once your task is done you quickly need to move on to the next. Especially in EDA, I write a lot of code that is nice to keep in a structured project folder, but most probably 1) I will be its only user and 2) I will use it only for this one small project, so maintenance is not an issue. Although I'm writing code in scripts, it is very close to interactive work on the command line. In such cases I use whatever gets the job done, including dplyr, tidyr, ggplot2, data.table, you name it. Here I basically don't care about dependencies, and if I write functions there are usually not many of them.

2/ Writing applications or packages for internal use.

When you write an application you are usually committing to a longer maintenance horizon and more than one user. There is a good chance that you are not the user, and also a good chance that you are not the only developer. There are many implications to this, but since you need to maintain things for a longer term, dependencies can become a liability. Fortunately, there are techniques to contain dependencies, for example using packrat or manually setting up a library containing the packages your application depends on. You can even use a Docker container. I have worked with custom libraries on several occasions.
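A minimal sketch of the "manually set up a library" approach, using only base R. The directory name is hypothetical and a temporary directory is used so the example is harmless to run; a real application would use a fixed path that ships with it.

```r
# Hypothetical application-specific library directory (the name is an assumption).
lib_dir <- file.path(tempdir(), "app-library")
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Put the application library first on the library search path, so that
# library() and loadNamespace() look there before the user and site libraries.
.libPaths(c(lib_dir, .libPaths()))

# Dependencies would then be installed into it explicitly, for example
# (not run here, since it needs a repository; "somepackage" is a placeholder):
# install.packages("somepackage", lib = lib_dir)

.libPaths()[1]
```

The application then starts R with this library in front, so the package versions it sees do not change when the user's own library is updated.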
Since you (or someone else) are going to maintain the application, it is worthwhile to sit down and think about the best way to set up the code so that it remains maintainable. This includes questions like: can I easily understand what happens when reading it? What expertise does the maintainer need to understand it? Non-standard evaluation is generally much harder to reason about than standard-evaluated code, which makes debugging and extending code harder.

Now some people will argue that something like filter(data, x>1) is easier to understand than data[data$x > 1,,drop=FALSE]. I agree that on a very shallow level, filter(data, x>1) is easy to follow, in the sense of "oh, the author probably wants to filter something here". But when you are debugging, you need to understand in much greater detail what happens: you need to know that 'x>1' is an expression that will be evaluated in the context of 'data'. You need to know about environments and parent environments and so on. All this knowledge can be avoided with data[data$x > 1,,drop=FALSE]. The latter also requires knowledge, but I think the concepts are much simpler. Hence, I tend to avoid NSE when writing applications, although there may still be good reasons to use it. Dependencies can be contained in various ways, so they are not such a big problem.

3/ Writing packages for CRAN.

Now you are committing to long-term maintenance, and to usage by interactive users, application builders, and possibly other package builders. A dependency now becomes a direct liability, in the sense that the author of your dependency can change interfaces and ask you to comply with the new version. Also, and especially because of recursive dependencies, importing a package may give you a whole tail of dependencies. This increases load time but also install time, especially on systems where you need to install from source.
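The tail of recursive dependencies can be inspected before committing to an import. A minimal sketch using tools::package_dependencies(): the helper name dependency_tail is hypothetical, and the locally installed packages are used as the database (an assumption, made so that no repository access is needed; available.packages() would query CRAN instead).

```r
# Locally installed packages as the dependency database (assumption: good
# enough for a quick look; it only knows about what is installed here).
db <- installed.packages()

# Hypothetical helper: every package pulled in, directly or indirectly, by `pkg`.
dependency_tail <- function(pkg, db) {
  tools::package_dependencies(
    pkg,
    db = db,
    which = c("Depends", "Imports", "LinkingTo"),
    recursive = TRUE
  )[[pkg]]
}

# Example with a base package, which is always installed.
tail_stats <- dependency_tail("stats", db)
length(tail_stats)
```

Running the same helper on a candidate import gives a quick impression of how many packages come along with it.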
Light-weight packages therefore have real advantages in applications that run many times (like a standalone script that is fired by users of a web application, or scripts that are scheduled to run at high frequency). It is also worth mentioning that an Imports or Depends puts a burden on the maintainer of the package you depend on: before submitting to CRAN, a package developer needs to check against all reverse dependencies (preferably recursively).

So now it is even more worthwhile to sit down and think about the best way to set up your code. Well-thought-out code can be a pleasure to maintain. Code that is hastily put together is a nightmare.

My philosophy is as follows: I depend on other packages only when they offer something that I cannot fairly trivially do myself. This may have to do with a statistical or numerical method I do not want to or cannot implement, or it can have something to do with performance, for example. This does indeed exclude much of the tidyverse almost automatically. Many tools in the tidyverse make already existing functionality easier for (interactive) use. But since much of the functionality is already present in base R, and because I find NSE hard to reason about in a programming context, I have until now not used any tidyverse packages as an Imports or Depends.

Hope this helps,
Best,
Mark

On Tue 17 Jul 2018 at 23:10, Michael Hannon <jmhannon.ucda...@gmail.com> wrote:

> Thanks, Mark.  Your points are well-taken, but I wouldn't refer to
> this as a "small side-track".  You don't say so, but this could be
> interpreted as a recommendation to avoid some or all of the
> "tidyverse" in developing packages.  I'm actually quite comfortable
> doing the base-R-style programming you recommend.  I've lately been
> trying to make a point of using the "tidy" stuff, as that's what I'm
> seeing almost exclusively from folks in my neighborhood these days.
> ("Resistance is few-tile...")
>
> Also, it would seem to be a corollary that if the ultimate goal is to
> make a package, then one shouldn't be using the convenience stuff
> (pipes, dplyr, etc., etc.), even during the development stages.  Can
> you comment?  Thanks.
>
> -- Mike
>
>
> On Tue, Jul 17, 2018 at 2:53 AM, Mark van der Loo
> <mark.vander...@gmail.com> wrote:
> > Michael,
> >
> > Just a small side-track here. I would avoid using the not-a-pipe operator
> > within functions or packages in general. It is great for interactive use,
> > but it does make debugging and hence long-term maintenance of functions
> > harder. There are two reasons for this. First, it hides intermediate
> > results, and second, it adds several layers to the call stack making the
> > output of functions like traceback() harder to interpret. I have documented
> > a simple example here: https://github.com/chriscardillo/norris/issues/1
> > (scroll down a bit).
> >
> > Regarding learning about quosures and so on. If the literal names of data
> > frames are known, you could consider replacing
> >
> > some_var <- next_data_frame %>% dplyr::select(-amount,...
> >
> > with something simpler like
> >
> > some_var <- next_data_frame[ !(names(next_data_frame) %in% c("amount", ... )) ]
> >
> > which might also save you some dependencies.
> >
> > Hope this helps,
> > Best,
> > Mark
> >
> >
> > On Tue 17 Jul 2018 at 11:28, Michael Hannon
> > <jmhannon.ucda...@gmail.com> wrote:
> >>
> >> Thanks to John and Zhian for their recent and informative comments.
> >>
> >> Regarding check() and NSE: the moral seems to be that a little
> >> learning is a dangerous thing.  I'm off to try to bring quosure to
> >> this issue.
> >>
> >> -- Mike
> >>
> >>
> >> On Mon, Jul 16, 2018 at 2:38 PM, Zhian Kamvar <zkam...@gmail.com> wrote:
> >> > Using dplyr like that is for exploratory data analysis.
> >> > You'll want to refer to dplyr's "Programming with dplyr" vignette
> >> > for using dplyr in a package:
> >> >
> >> > https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
> >> >
> >> > Hope that helps.
> >> >
> >> > On Jul 16, 2018, at 22:13 , Michael Hannon <jmhannon.ucda...@gmail.com>
> >> > wrote:
> >> >
> >> > Thanks, Georgi.  I've changed my approach and now do what I gather is
> >> > recommended practice: put all external package names into the
> >> > "Imports" section of the DESCRIPTION file and then use the
> >> > fully-qualified names for functions from those packages, as:
> >> >
> >> >     dplyr::select()
> >> >
> >> > The "check" operation is still not entirely "happy" with me, but it
> >> > doesn't flag any errors, and the package builds and runs.
> >> >
> >> > BTW, one source of "complaints" from "check()" is evidently the use of
> >> > NSE in the tidyverse functions.  For instance, the line:
> >> >
> >> >     next_data_frame %>% dplyr::select(-amount,
> >> >
> >> > generates the message:
> >> >
> >> >     standardize_format: no visible binding for global variable ‘amount’
> >> >
> >> > where, of course, "amount" is one of the column headings in
> >> > "next_data_frame".  There seems to be no harm done by this, and I plan
> >> > to ignore such messages, but if there's some additional wisdom that
> >> > applies here, I'd be happy to receive it.
> >> >
> >> > -- Mike
> >> >
> >> >
> >> > On Sun, Jul 15, 2018 at 12:05 AM, Georgi Boshnakov
> >> > <georgi.boshna...@manchester.ac.uk> wrote:
> >> >
> >> >
> >> > It seems that the R session used by 'check' doesn't look in the library
> >> > used by your interactive session. This discrepancy may happen since the
> >> > check tools do not load the same Renviron files as interactive sessions.
> >> > This may result in different libraries in interactive and 'check'
> >> > sessions. See ?Startup, especially section Note.
> >> > It is difficult to give more specific advice without details of your
> >> > setup.
> >> >
> >> >
> >> > Hope this helps,
> >> > Georgi Boshnakov
> >> >
> >> >
> >> > ________________________________________
> >> > From: R-package-devel [r-package-devel-boun...@r-project.org] on behalf
> >> > of Michael Hannon [jmhannon.ucda...@gmail.com]
> >> > Sent: 15 July 2018 02:13
> >> > To: r-package-devel@r-project.org
> >> > Subject: [R-pkg-devel] Package builds, installs, and runs but does not
> >> > pass devtools::check()
> >> >
> >> > Greetings.  I'm working on a small package, and I'm using the devtools
> >> > functions to create, build, etc., the package.
> >> >
> >> > As indicated in the subject line, I get no errors when I do:
> >> >
> >> >     build()
> >> >     install()
> >> >
> >> >
> >> > When I run a separate R session and load the package, i.e.,
> >> >
> >> >     library(my_pkg)
> >> >
> >> >
> >> > the package loads without error, and the two exported functions appear
> >> > to work as advertised.
> >> >
> >> > OTOH, if I include devtools::check() in the construction of the
> >> > package, I consistently get an error:
> >> >
> >> >     * installing *source* package ‘my_pkg’ ...
> >> >     ** R
> >> >     ** preparing package for lazy loading
> >> >     Error in loadNamespace(from, lib.loc = .library) :
> >> >       there is no package called ‘dplyr’
> >> >     Error : unable to load R code in package 'my_pkg'
> >> >
> >> > Clearly there *is* a package called "dplyr" on my system (see the
> >> > session info below, for instance).  And, as I've mentioned, the code
> >> > *does* run, and I can watch it successfully reading CSV files.
> >> >
> >> > Here's the relevant part of my DESCRIPTION file:
> >> >
> >> >     Depends: R (>= 3.4.4)
> >> >     Imports: readr,
> >> >         dplyr,
> >> >         ggplot2,
> >> >         purrr,
> >> >         magrittr
> >> >
> >> > I suspect the problem may be that I'm misunderstanding something about
> >> > the `import::from()` function, which I'm using for the first time to
> >> > load required functions into my code.  In each of the three files that
> >> > use dplyr I have the line:
> >> >
> >> >     import::from(dplyr, mutate, filter, rename, select, setdiff, slice,
> >> >                  "%>%")
> >> >
> >> > I've tried:
> >> >
> >> >     (1) putting that line in just one of the files (the lexically first
> >> >         one)
> >> >     (2) including different subsets of dplyr functions, as needed, in
> >> >         the various files
> >> >
> >> > Needless to say, I haven't seen any improvement with any of the above
> >> > (or any of the other thrashing I've done).
> >> >
> >> > If you can point me in the right direction, I'd appreciate it.  Thanks.
> >> >
> >> > -- Mike
> >> >
> >> >
> >> > session_info()
> >> >
> >> > Session info ------------------------------------------------------------------
> >> >  setting  value
> >> >  version  R version 3.4.4 (2018-03-15)
> >> >  system   x86_64, linux-gnu
> >> >  ui       X11
> >> >  language en_US
> >> >  collate  en_US.UTF-8
> >> >  tz       America/Los_Angeles
> >> >  date     2018-07-14
> >> >
> >> > Packages ----------------------------------------------------------------------
> >> >  package    * version date       source
> >> >  assertthat   0.2.0   2017-04-11 CRAN (R 3.3.3)
> >> >  base       * 3.4.4   2018-03-16 local
> >> >  bindr        0.1.1   2018-03-13 CRAN (R 3.4.3)
> >> >  bindrcpp     0.2.2   2018-03-29 CRAN (R 3.4.4)
> >> >  compiler     3.4.4   2018-03-16 local
> >> >  crayon       1.3.4   2017-09-16 CRAN (R 3.4.1)
> >> >  datasets   * 3.4.4   2018-03-16 local
> >> >  devtools   * 1.13.6  2018-06-27 CRAN (R 3.4.4)
> >> >  digest       0.6.15  2018-01-28 CRAN (R 3.4.3)
> >> >  dplyr      * 0.7.6   2018-06-29 CRAN (R 3.4.4)
> >> >  glue         1.2.0   2017-10-29 CRAN (R 3.4.2)
> >> >  graphics   * 3.4.4   2018-03-16 local
> >> >  grDevices  * 3.4.4   2018-03-16 local
> >> >  magrittr     1.5     2014-11-22 CRAN (R 3.2.2)
> >> >  memoise      1.1.0   2017-04-21 CRAN (R 3.3.3)
> >> >  methods    * 3.4.4   2018-03-16 local
> >> >  pillar       1.3.0   2018-07-14 CRAN (R 3.4.4)
> >> >  pkgconfig    2.0.1   2017-03-21 CRAN (R 3.4.0)
> >> >  purrr        0.2.5   2018-05-29 CRAN (R 3.4.4)
> >> >  R6           2.2.2   2017-06-17 CRAN (R 3.4.0)
> >> >  Rcpp         0.12.17 2018-05-18 CRAN (R 3.4.4)
> >> >  rlang        0.2.1   2018-05-30 CRAN (R 3.4.4)
> >> >  stats      * 3.4.4   2018-03-16 local
> >> >  tibble       1.4.2   2018-01-22 CRAN (R 3.4.3)
> >> >  tidyselect   0.2.4   2018-02-26 CRAN (R 3.4.3)
> >> >  utils      * 3.4.4   2018-03-16 local
> >> >  withr        2.1.2   2018-03-15 CRAN (R 3.4.3)
> >> >
> >> >
> >> > ______________________________________________
> >> > R-package-devel@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-package-devel