Re: [R] Fwd: Wanted to learn R Language

2017-11-30 Thread Hadley Wickham
Or try haven::read_xpt():
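
A minimal sketch of that suggestion, assuming the haven package is installed. The toy data frame `dm`, its column names, and the temporary file are made up for illustration and are not from the thread; the example writes a small SAS transport file and reads it back.

```r
# Toy stand-in for a clinical dataset; column names are illustrative only.
dm <- data.frame(
  USUBJID = c("S01", "S02", "S03"),
  AGE = c(34, 51, 47),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".xpt")
dm_back <- NULL

if (requireNamespace("haven", quietly = TRUE)) {
  # Write a SAS transport (XPT) file, then read it back as a tibble.
  haven::write_xpt(dm, path)
  dm_back <- haven::read_xpt(path)
  print(dm_back)
}
```

For native .sas7bdat files, haven::read_sas() plays the same role, and foreign::read.xport() also takes a file path.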


On Thu, Nov 30, 2017 at 3:13 PM, Jim Lemon  wrote:
> Hi SAS_learner,
> Have a look at the read.xport function in the foreign package.
> Jim
> On Fri, Dec 1, 2017 at 7:50 AM, SAS_learner  wrote:
>> Hello all ,
>> I have been a SAS user for a while and want to learn to program in R. My
>> biggest hurdle to starting is getting data (I work in the clinical domain),
>> which sits behind VPN-secured access. The only way I can learn during
>> my work time is to create my own data frames and write programs that can
>> be used for data (either SDTM or ADaM) validation or for checking
>> table counts. For this I need to imitate the clinical data structure.
>> Is there any place or package that would help me get started? I have a couple
>> of dummy SAS datasets in my work area, but I am not sure how I can
>> access them. Can anybody help me? Thanks ahead.
>> __
>> mailing list -- To UNSUBSCRIBE and more, see
>> PLEASE do read the posting guide
>> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Problem installing libxml2 under Homebrew

2018-02-17 Thread Hadley Wickham
Is there a reason that you can't use the binary provided by CRAN?
That's the easiest way to get xml2.

On Sat, Feb 17, 2018 at 1:53 AM, Peter Meilstrup
> I am trying to install xml2 from CRAN, and it is throwing an error
> that it cannot find the libxml2 library configuration.
> The thing is that pkg-config seems to be set up correctly:
> :/usr/local/opt/libxml2/lib/pkgconfig:/usr/local/opt/libxml2/lib/pkgconfig
> $ pkg-config --cflags --libs libxml-2.0
> -I/usr/local/Cellar/libxml2/2.9.7/include/libxml2
> -L/usr/local/Cellar/libxml2/2.9.7/lib -lx
> Output of install.packages:
>> install.packages("xml2")
> --- Please select a CRAN mirror for use in this session ---
> trying URL ''
> Content type 'application/x-gzip' length 251614 bytes (245 KB)
> ==
> downloaded 245 KB
> * installing *source* package ‘xml2’ ...
> ** package ‘xml2’ successfully unpacked and MD5 sums checked
> Found pkg-config cflags and libs!
> Using PKG_CFLAGS=-I/usr/include/libxml2
> Using PKG_LIBS=-L/usr/lib -lxml2 -lz -lpthread -licucore -lm
> Configuration failed because libxml-2.0 was not found. Try installing:
>  * deb: libxml2-dev (Debian, Ubuntu, etc)
>  * rpm: libxml2-devel (Fedora, CentOS, RHEL)
>  * csw: libxml2_dev (Solaris)
> If libxml-2.0 is already installed, check that 'pkg-config' is in your
> PATH and PKG_CONFIG_PATH contains a libxml-2.0.pc file. If pkg-config
> is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
> R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
> ERROR: configuration failed for package ‘xml2’
> Homebrew package info:
> dekkera:pkgconfig peter$ brew info libxml2 R
> libxml2: stable 2.9.7 (bottled), HEAD [keg-only]
> GNOME XML library
> /usr/local/Cellar/libxml2/2.9.7 (281 files, 10.4MB)
>   Poured from bottle on 2018-02-16 at 22:42:54
> From:
> ==> Options
> --HEAD
> Install HEAD version
> ==> Caveats
> This formula is keg-only, which means it was not symlinked into /usr/local,
> because macOS already provides this software and installing another version in
> parallel can cause all kinds of trouble.
> If you need to have this software first in your PATH run:
>   echo 'export PATH="/usr/local/opt/libxml2/bin:$PATH"' >> ~/.bash_profile
> For compilers to find this software you may need to set:
> LDFLAGS:  -L/usr/local/opt/libxml2/lib
> CPPFLAGS: -I/usr/local/opt/libxml2/include
> For pkg-config to find this software you may need to set:
> PKG_CONFIG_PATH: /usr/local/opt/libxml2/lib/pkgconfig
> If you need Python to find bindings for this keg-only formula, run:
>   echo /usr/local/opt/libxml2/lib/python2.7/site-packages >>
> /usr/local/lib/python2.7/site-packages/libxml2.pth
>   mkdir -p /Users/peter/Library/Python/2.7/lib/python/site-packages
>   echo 'import site;
> site.addsitedir("/usr/local/lib/python2.7/site-packages")' >>
> /Users/peter/Library/Python/2.7/lib/python/site-packages/homebrew.pth
> R: stable 3.4.3 (bottled)
> Software environment for statistical computing
> /usr/local/Cellar/R/3.4.0_1 (2,246 files, 59.1MB)
>   Poured from bottle on 2017-05-20 at 17:10:29
> /usr/local/Cellar/R/3.4.1_2 (2,114 files, 55.3MB)
>   Poured from bottle on 2017-09-19 at 03:23:33
> /usr/local/Cellar/R/3.4.2 (2,111 files, 55.1MB)
>   Poured from bottle on 2017-11-13 at 22:23:36
> /usr/local/Cellar/R/3.4.3 (2,110 files, 55.1MB)
>   Poured from bottle on 2017-12-02 at 23:20:16
> /usr/local/Cellar/R/3.4.3_1 (3,773 files, 115.4MB)
>   Built from source on 2018-02-08 at 22:01:49 with: --with-x11
> --with-cairo --with-java
> From:
> ==> Dependencies
> Build: pkg-config ✔
> Required: gcc ✔, gettext ✔, jpeg ✔, libpng ✔, pcre ✔, readline ✔, xz ✔
> Optional: openblas ✔
> ==> Requirements
> Optional: java ✔
> ==> Options
> --with-java
> Build with java support
> --with-openblas
> Build with openblas support



Re: [R] Converting a list to a data frame

2018-05-03 Thread Hadley Wickham
On Wed, May 2, 2018 at 11:53 AM, Jeff Newmiller
> Another approach:
> library(tidyr)
> L <- list( A = data.frame( x=1:2, y=3:4 )
>          , B = data.frame( x=5:6, y=7:8 )
>          )
> D <- data.frame( Type = names( L )
>                , stringsAsFactors = FALSE
>                )
> D$data <- L
> unnest(D, data)
> #>   Type x y
> #> 1    A 1 3
> #> 2    A 2 4
> #> 3    B 5 7
> #> 4    B 6 8

I think a slightly more idiomatic tidyverse solution is dplyr::bind_rows()

l <- list(
  A = data.frame(x = 1:2, y = 3:4),
  B = data.frame(x = 5:6, y = 7:8)
)

dplyr::bind_rows(l, .id = "type")
#>   type x y
#> 1    A 1 3
#> 2    A 2 4
#> 3    B 5 7
#> 4    B 6 8

This also has the advantage of returning a data frame when the inputs
are data frames.
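
For comparison, here is a rough base R sketch of the same operation, in case dplyr is not available. The helper names `tagged` and `combined` are illustrative; this is one possible equivalent, not something proposed in the thread.

```r
l <- list(
  A = data.frame(x = 1:2, y = 3:4),
  B = data.frame(x = 5:6, y = 7:8)
)

# Tag each data frame with its list name, then stack the pieces.
tagged <- Map(
  function(df, id) data.frame(type = id, df, stringsAsFactors = FALSE),
  l, names(l)
)
combined <- do.call(rbind, unname(tagged))
print(combined)
```

The dplyr version is shorter and also copes with data frames whose columns differ, which plain rbind() does not.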




Re: [R] Extract function parameters from a R expression

2018-06-20 Thread Hadley Wickham
You need to recursively walk the parse tree/AST. See, e.g.,
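
A minimal sketch of such a recursive walk, using only base R. The function name `find_call_args` and the inline program text are illustrative assumptions; the original question parsed a file rather than a string.

```r
# Walk an expression tree and collect the first argument of each call to `fn_name`.
find_call_args <- function(x, fn_name = "library") {
  if (is.expression(x)) {
    # Top-level result of parse(): visit each parsed statement in turn.
    return(unlist(lapply(as.list(x), find_call_args, fn_name = fn_name),
                  use.names = FALSE))
  }
  if (!is.call(x)) {
    # Constants, symbols and argument lists are treated as leaves.
    return(character())
  }
  hit <- character()
  if (identical(x[[1]], as.name(fn_name)) && length(x) >= 2) {
    hit <- as.character(x[[2]])
  }
  # Recurse into the function position and every argument.
  c(hit, unlist(lapply(as.list(x), find_call_args, fn_name = fn_name),
                use.names = FALSE))
}

prog <- 'library("xyz")\nif (TRUE) {\n  library(abc)\n}'
expr <- parse(text = prog, keep.source = FALSE)
pkgs <- find_call_args(expr)
print(pkgs)
```

Because the walk descends into nested calls, it also finds library() calls inside `if` blocks or braces. Calls with empty arguments, such as `x[, 1]`, may need extra handling.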


On Wed, Jun 20, 2018 at 10:08 AM, Sigbert Klinke
> Hi,
> I have read an R program with
> expr <- parse("myRprg.R")
> How can I extract the parameters of a specific R command, e.g. "library"?
> So, if myRprg.R contains the lines
> library("xyz")
> library("abc")
> then I would like to get "xyz" and "abc" back from expr.
> Thanks in advance
> Sigbert



Re: [R] [FORGED] Logical Operators' inconsistent Behavior

2017-05-21 Thread Hadley Wickham
On Fri, May 19, 2017 at 6:38 AM, S Ellison  wrote:
>> TRUE & FALSE is FALSE but TRUE & TRUE is TRUE, so TRUE & NA could be
>> either TRUE or FALSE and consequently is NA.
>> OTOH FALSE & (anything) is FALSE so FALSE & NA is FALSE.
>> As I said *think* about it; don't just go with your immediate knee-jerk
>> (simplistic) reaction.
> Hmm... not sure that was quite fair to the OP. Yes, FALSE & <anything> ==
> FALSE. But 'NA' does not mean 'anything'; it means 'missing' (see ?'NA'). It
> is much less obvious that FALSE & <missing> should generate a non-missing
> value. SQL, for example, generally takes the view that any expression
> involving 'missing' is 'missing'.

That's not TRUE ;)

sqlite> select (3 > 2) OR NULL;
1
sqlite> select (4 < 3) AND NULL;
0
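
The same three-valued logic can be checked directly in R. This is a small illustrative sketch; the vector name `res` is arbitrary.

```r
# The result is NA only when the missing value could change the answer.
res <- c(
  "TRUE & NA"  = TRUE & NA,
  "FALSE & NA" = FALSE & NA,
  "TRUE | NA"  = TRUE | NA,
  "FALSE | NA" = FALSE | NA
)
print(res)
```

Only `TRUE & NA` and `FALSE | NA` come back as NA, because in those two cases the unknown operand decides the outcome.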




Re: [R] Math ops behaviour with multiple classes

2017-06-08 Thread Hadley Wickham
On Thu, Jun 8, 2017 at 2:45 PM, Bert Gunter  wrote:
> I think you may be confusing (S3) class and ?mode.

Your point is well made, but to be precise, I think you should talk
about the "type of" an object, not its mode. mode() is a wrapper
around typeof(), designed (I believe) for S compatibility.
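
A short illustrative comparison (the object list and the name `types` are just for demonstration) shows where typeof() and mode() agree and where mode() collapses several types into one:

```r
objs <- list(
  int = 1L,
  dbl = 1.5,
  sym = quote(x),
  call = quote(f(x)),
  builtin = sum,
  closure = mean
)

types <- data.frame(
  typeof = vapply(objs, typeof, character(1)),
  mode = vapply(objs, mode, character(1)),
  stringsAsFactors = FALSE
)
print(types)
```

Integers and doubles both report mode "numeric", and builtins and closures both report mode "function", while typeof() keeps them apart.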




Re: [R] Inheritance for S3 classes

2017-08-08 Thread Hadley Wickham
You might find to be helpful (in
particular,, gives my
advice about subclass constructors)
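
One common pattern for the question below, sketched with hypothetical names (`new_parent`, `new_child`, and the fields are illustrative; they come neither from the poster's package nor from the linked material): give the parent constructor `...` and `class` arguments, so that a child constructor can forward its optional arguments and prepend its own class.

```r
# Parent constructor: accepts extra fields via `...` and extra classes via `class`.
new_parent <- function(x = double(), ..., class = character()) {
  stopifnot(is.double(x))
  structure(list(x = x, ...), class = c(class, "parent"))
}

# Child constructor: adds its own field and forwards everything else.
new_child <- function(x = double(), label = "", ...) {
  stopifnot(is.character(label), length(label) == 1)
  new_parent(x, label = label, ..., class = "child")
}

# Methods written for the parent are inherited by the child.
print.parent <- function(x, ...) {
  cat("<", class(x)[1], "> with ", length(x$x), " value(s)\n", sep = "")
  invisible(x)
}

obj <- new_child(c(1.5, 2.5), label = "demo", unit = "cm")
print(obj)
```

Forwarding through `...` avoids having to reconstruct the argument list by hand with a captured call object.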


On Mon, Aug 7, 2017 at 7:06 PM, Kym Nitschke  wrote:
> Hi R Users,
> I am relatively new to programming in R … so I apologise if my questions 
> appear ‘dumb’.
> I am using a package that defines a number of S3 classes. I want to create an 
> S3 child class of one of these classes. The parent class has a constructor 
> with many arguments. I have been having difficulty writing the child class 
> constructor. I have been unable to find a good reference on the internet for 
> writing S3 classes. What I have been able to find out so far is that the 
> child class constructor should call the parent class constructor … which in 
> this case requires passing the argument list with a variable number of 
> arguments (i.e. there are a number of optional arguments) from the child to 
> the parent.
> So my first question is … is there an easy way to do this? The 
> function will return a call object .. however the attributes function when 
> used on the call object returns a ‘NULL’.
> My second question is … can any one recommend a good reference for object 
> oriented programming in R which includes a comprehensive discussion of the S3 
> class model?
> Thanks
> Regards,
> Kym



Re: [R] tidyverse repeating error: "object 'rlang_mut_env_parent' not found"

2017-08-14 Thread Hadley Wickham
The most likely explanation is you have a new version of dplyr/tibble
and an old version of rlang. Try re-installing rlang.


On Mon, Aug 14, 2017 at 9:26 AM, Szumiloski, John
> UseRs,
> When doing some data manipulations using the tidyverse, I am repeatedly 
> getting the same error message in now three separate situations.  I can write 
> up a reproducible example, but want to lay out the high-level issues in case 
> someone recognizes exactly what is happening here.
> The error is:
> Error in mut_env_parent(overscope$.top_env, lexical_env) :
>   object 'rlang_mut_env_parent' not found
> and it occurs in three situations:
> 1)  Using tidyr::nest() on an output from dplyr::group_by()
> 2)  Using tibble::tibble() with  =  arguments
> 3)  Using dplyr::select() on a tibble to select two columns
> Any obvious clues as to what's happening here? The only non-base packages 
> loaded are current tidyverse, forcats, magrittr, readxl and stringr.
> Thanks,
> John
> John Szumiloski, Ph.D.
> Principal Scientist, Statistician
> Pharmaceutical Development / Drug Product Science & Technology
> NBR105-1-1411
> Bristol-Myers Squibb
> P.O. Box 191
> 1 Squibb Drive
> New Brunswick, NJ
> 08903-0191
> (732) 227-7167

Re: [R] functions from 'base' package are not accessible

2017-08-24 Thread Hadley Wickham
This was a change in tidyr 0.7.0 that is causing a lot of confusion,
so we are preparing tidyr 0.7.1 which will back this change out.

If you want a work around in the meantime, you can express your
operation a bit more elegantly as:


library(tidyr)

df <- data.frame(v1 = 1:5, somestring = 6:10, v3 = 11:15, v4 = 16:20)
df %>%
  gather(key = var, value = val, somestring:ncol(df)) %>%
  head(2)


On Thu, Aug 24, 2017 at 6:32 AM, Eugeny Melamud
> Hi all!
> The following code (executed in console)...
> somevar <- data.frame(v1 = 1:5, somestring = 6:10, v3 = 11:15, v4 = 
> 16:20);
> somevar %>% gather(key = var, value = val, which(names(somevar) == 
> "somestring"):length(somevar)) %>% head(2);
> throws...
> Error in which(names(somevar) == "somestring") :
>   could not find function "which"
> if I change which(names(somevar) == "somestring") with 0 I'll get
>Error in length(somevar) :
>   could not find function "length"
> So it looks like base package is not loaded. Still if type 'which' in console 
> I get
>   function (x, arr.ind = FALSE, useNames = TRUE)
>   {
> wh <- .Internal(which(x))
> if (arr.ind && !is.null(d <- dim(x)))
> arrayInd(wh, d, dimnames(x), useNames = useNames)
> else wh
>   }
> base (that contains which function) package is installed. R version is 3.4.1 
> and system is Win8
> Where should I look to understand how to fix the problem?
> Thank you in advance!
> Eugeny



Re: [R] To implement OO or not in R package, and if so, how to structure it?

2017-09-14 Thread Hadley Wickham
I just finished the first draft of the chapters on OO programming for
the 2nd edition of "Advanced R": - you might
find them helpful.


On Thu, Sep 14, 2017 at 7:58 AM, Alexander Shenkin  wrote:
> Hello all,
> I am trying to decide how to structure an R package.  Specifically, do I use
> OO classes, or just provide functions?  If the former, how should I
> structure the objects in relation to the type of data the package is
> intended to manage?
> I have searched for, but haven't found, resources that guide one in the
> *decision* about whether to implement OO frameworks or not in one's R
> package.  I suspect I should, but the utility of the package would be aided
> by *collections* of objects.  R, however, doesn't seem to implement
> collections.
> Background: I am writing an R package that will provide a framework for
> analyzing structural models of trees (as in trees made of wood, not
> statistical trees).  These models are generated from laser scanning
> instruments and model fitting algorithms, and hence may have aspects that
> are data-heavy.  Furthermore, computing metrics based on these structures can
> be computationally heavy.  Finally, as a result, each tree has a number of
> metrics associated with it (which may be expensive to calculate), along with
> the underlying data of that tree.  It will be important as well to perform
> calculations across many of these trees, as one would do in a dataframe.
> This last point is important: if one organizes data across potentially
> thousands of objects, how easy or hard is it to massage properties of those
> objects into a dataframe for analysis?
> Thank you in advance for thoughts and pointers.
> Allie



Re: [R] To implement OO or not in R package, and if so, how to structure it?

2017-09-14 Thread Hadley Wickham
> Did you read this?
> Maybe it could give you some insight in how to create package.

That resource is ~9 years old. There are more modern treatments
available. You can read mine at




Re: [R] httr package syntax (PUT)

2016-06-02 Thread Hadley Wickham
On Wed, Jun 1, 2016 at 7:16 PM, Jared Rodecker  wrote:
> Greetings fellow R users.
> I'm struggling with the syntax of submitting a PUT request
> I'm trying to insert a few PUT requests into some legacy R code that I have
> that performs daily ETL on a small database. These requests will add users
> to an email mailing list in MailChimp.
> I have been able to get my GET requests formatted into syntax that R
> (specifically the httr package) accepts:
> GET("
> query = list(apikey = 'XX'))
> However when I try to do something similar for PUT requests this simple
> syntax isn't working - you can't just pass the API KEY and/or requested
> parameters directly through the URL. I get a 401 error if I use the same
> syntax I used for GET.
> I believe that I need to use the CONFIG option to pass the API key (either
> using AUTHENTICATE or ADD_HEADERS) and the requested parameters in the BODY
> to get the PUT request to work but I can't get the syntax to work - this
> gives a 400 error:
> auth <- authenticate("anystring", "XX", type = "basic")
> parms <- '[{"email_address" : "", "status_if_new" :
> "subscribed"}]'
> PUT("
> ",config=auth,body=parms,encode="json")
> If anyone can point me to a more fleshed-out example that would be
> amazing...but even just some tips on how to get more info on my error
> message to help me troubleshoot my syntax would also be a big help.  I've
> also been trying to use httpPUT (from the RCurl package) but am also
> struggling with the syntax there.

If you use verbose() you should be able to see what the problem is -
httr does the json encoding for you. You want:

params <- list(email_address = "", status_if_new = "subscribed")
PUT("",
   config = auth,
   body = params,
   encode = "json"
)




Re: [R] About identification of CRAN CHECK machines in logs

2016-06-09 Thread Hadley Wickham
On Thu, Jun 9, 2016 at 9:24 AM, Marcelo Perlin  wrote:
> Hi,
> I recently released two packages (RndTexExams and GetTDData) in CRAN and
> I'm trying to track the number of downloads and location of users.
> I wrote a simple script to download and analyze the log files in http://cran
> I realized, however, that during the release of a new version of the
> packages there is a spike in the number of downloads. I believe that the
> CRAN checks are included in the number of installations of the package in
> the log files.

I don't think that's true. Why would CRAN be installing the package
from a mirror?




Re: [R] About identification of CRAN CHECK machines in logs

2016-06-10 Thread Hadley Wickham
On Fri, Jun 10, 2016 at 8:27 AM, Marcelo Perlin  wrote:
> I don't know Hadley. But you can see evidence of "something" systematically
> installing the packages in the log data. From my two CRAN packages I noticed
> a high correlation in the number of downloads.
> Try the following script, which will pick 5 random packages from CRAN and
> calculate the correlation matrix between their differenced number of
> downloads. To avoid spurious correlations,  I removed the weekends since we
> can expect some seasonality and also the zero entries. It's crude, I know,
> but it does show some positive associations between the number of
> installations of the packages.

Which is not at all surprising:

* there are very strong seasonal patterns
* there are big jumps after releases of new versions of R
* some people like to have all packages installed locally

This is an intrinsic problem with download data. There's no way to
tell if a downloader is really using your package or not.




Re: [R] Build command in library(devtools)

2016-07-20 Thread Hadley Wickham
The first place to start is to make sure you have the latest version of
devtools. If that doesn't work, please file an issue on devtools' GitHub.


On Wednesday, July 20, 2016, Steven Yen  wrote:

> Here is what I found. I had to go back to as early as R 3.0.3 (March,
> 2014) along with Rtools30.exe that works with that version of R, in
> order for devtools to work right. With other/later version of R, I end
> up building a package with
> library(devtools); build("yenlib",binary=F)
> with no error message but the package does not run correctly; or with
> library(devtools); build("yenlib",binary=T)
> which delivers an error that says the zip command failed (devtools calls
> Rtools when binary=T).
> Updated versions are good, but what's the use if they do not work for a
> situation like this.
> Any help/insight would be appreciated.
> On 7/20/2016 10:08 AM, Steven Yen wrote:
> > On 7/19/2016 4:38 PM, John McKown wrote:
> >> On Tue, Jul 19, 2016 at 3:15 PM, Steven Yen  
> >> >wrote:
> >>
> >> I recently updated my R and RStudio to the latest version and
> >> now the
> >> binary option in the "build" command in devtools stops working.
> >>
> >> I went around and used the binary=F option which worked but I get
> >> the
> >> .tar.gz file instead of the .zip file which I prefer.
> >>
> >> Does anyone understand the following error message:
> >>
> >> status 127
> >> running 'zip' failed
> >>
> >>
> >> I'm not totally sure, but I think that means that R cannot find the
> >> "zip" program in order to run it.
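
A quick way to test that guess from within R is sketched below. It only checks whether an executable named `zip` is visible on the PATH of the R session; the variable name `zip_path` is arbitrary.

```r
# Sys.which() returns "" when the program cannot be found on the PATH.
zip_path <- unname(Sys.which("zip"))

if (nzchar(zip_path)) {
  message("zip found at: ", zip_path)
} else {
  message("zip is not on the PATH seen by R")
}
```

If the result is empty on Windows, checking that the Rtools directories are on the PATH is a reasonable next step.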



Re: [R] Date Time in R

2016-07-26 Thread Hadley Wickham
I'd recommend reading up on how to create a minimal reproducible
example (e.g.
It is unlikely anyone will be able to help you unless you can reliably
communicate _exactly_ what you're doing. Unlike human languages,
computer languages are extremely strict, and just one wrong character
can make a big difference.
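
As a small illustration of that last point (the date strings here are invented for the example), a single different separator is enough to turn a valid date into NA, and dput() gives a compact way to share the exact data:

```r
# The format string must match the input exactly, separators included.
good <- as.Date("05-30-16", format = "%m-%d-%y")
bad  <- as.Date("05/30/16", format = "%m-%d-%y")
print(good)
print(bad)

# dput() prints a pasteable description of the data for a mailing-list post.
dput(data.frame(a = c("05-30-16", "05/30/16"), stringsAsFactors = FALSE))
```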


On Tue, Jul 26, 2016 at 10:45 AM, Shivi Bhatia  wrote:
> Hi David please see the code and some reproducible data:
> eir$date<- as.Date(eir$date,format = "%m-%d-%y")  then i had used the
> lubridate library to help with the dates:
> install.packages("lubridate")
> library(lubridate)
> eir$date <- mdy(eir$date)
> weekdays <- wdy(eir$date)
> week names <- wdy(eir$date, label = TRUE)
> This the output from the file:
> structure(list(date = structure(c(NA_real_, NA_real_, NA_real_), class =
> "Date"),
> month = structure(c(2L, 2L, 2L), .Label = c("Jun","May"), class = "factor"),
> day = c(30L, 30L, 30L),
> weekday = structure(c(2L,2L, 2L), .Label = c("Fri", "Mon", "Sat", "Sun",
> "Thu", "Tue","Wed"), class = "factor"),
> survey_rating = c(3L, 2L, 3L), query_status = c("Yes", "Don't know","No"),
>  a = c("05-30-16", "05-30-16", "05-30-16")), .Names = c("date","month",
> "day", "weekday", "survey_rating"), row.names = c(NA, 3L), class =
> "data.frame")
> There are several other variables that i have removed which are not
> relevant in this context.
> On Tue, Jul 26, 2016 at 8:55 PM, David L Carlson  wrote:
>> Show us the output, don’t just tell us what you are seeing. If the dates
>> are correct in the csv file, show us the structure of the data frame you
>> created with read.csv() and show the command(s) you used to convert the
>> character data to date format. The solution is likely to be simple if you
>> will cut/paste the R console and not just describe what is happening.
>> David C
>> *From:* Shivi Bhatia []
>> *Sent:* Tuesday, July 26, 2016 10:08 AM
>> *To:* David L Carlson
>> *Subject:* Re: [R] Date Time in R
>> Hi David,
>> This gives the results accurately. The first line shows all the variable
>> names and the rest shows all values stored for each of the variable. Here
>> date is appearing as correct.
>> Thanks, Shivi
>> On Tue, Jul 26, 2016 at 7:39 PM, David L Carlson 
>> wrote:
>> What does this produce?
>> > readLines("YourCSVfilename.csv", n=5)
>> If the data are in Excel, the date format used in .csv files is not always
>> in the same as the format used when viewing dates in the spreadsheet.
>> -
>> David L Carlson
>> Department of Anthropology
>> Texas A&M University
>> College Station, TX 77840-4352
>> -Original Message-
>> From: R-help [] On Behalf Of Shivi
>> Bhatia
>> Sent: Tuesday, July 26, 2016 7:42 AM
>> To: Marc Schwartz
>> Cc: R-help
>> Subject: Re: [R] Date Time in R
>> Thanks Marc for the help. this really helps.
>> I think there is some issue with the data saved in csv format for this
>> variable as when i checked:
>> str(eir$date)- this results in :-
>> Date[1:5327], format: NA NA NA NA NA.
>> Thanks again.
>> On Tue, Jul 26, 2016 at 5:58 PM, Marc Schwartz 
>> wrote:
>> > Hi,
>> >
>> > That eir$date might be a factor is irrelevant. There is an as.Date()
>> > method for factors, which does the factor to character coercion
>> internally
>> > and then calls as.Date.character() on the result.
>> >
>> > Using the example data below:
>> >
>> > eir <- data.frame(date = c("05-30-16", "05-30-16", "05-30-16",
>> >"05-30-16", "05-30-16", "05-30-16"))
>> >
>> > > str(eir)
>> > 'data.frame':   6 obs. of  1 variable:
>> >  $ date: Factor w/ 1 level "05-30-16": 1 1 1 1 1 1
>> >
>> > > eir
>> >   date
>> > 1 05-30-16
>> > 2 05-30-16
>> > 3 05-30-16
>> > 4 05-30-16
>> > 5 05-30-16
>> > 6 05-30-16
>> >
>> > eir$date <- as.Date(eir$date, format = "%m-%d-%y")
>> >
>> > > str(eir)
>> > 'data.frame':   6 obs. of  1 variable:
>> >  $ date: Date, format: "2016-05-30" ...
>> >
>> > > eir
>> > date
>> > 1 2016-05-30
>> > 2 2016-05-30
>> > 3 2016-05-30
>> > 4 2016-05-30
>> > 5 2016-05-30
>> > 6 2016-05-30
>> >
>> > eir$days <- weekdays(eir$date)
>> >
>> > > str(eir)
>> > 'data.frame':   6 obs. of  2 variables:
>> >  $ date: Date, format: "2016-05-30" ...
>> >  $ days: chr  "Monday" "Monday" "Monday" "Monday" ...
>> >
>> > > eir
>> > date   days
>> > 1 2016-05-30 Monday
>> > 2 2016-05-30 Monday
>> > 3 2016-05-30 Monday
>> > 4 2016-05-30 Monday
>> > 5 2016-05-30 Monday
>> > 6 2016-05-30 Monday
>> >
>> >
>> > I would check to be sure that you do not have any typos in your code.
>> >
>> > Regards,
>> >
>> > Marc Schwartz
>> >
>> >
>> > > On Jul 26, 2016, at 6:58 AM, Shivi Bhatia 
>> wrote:
>> > >
>> > > Hello Again,
>> > >
>> > > While i tried your solution as you suggested above it seems to b

Re: [R] lm() silently drops NAs

2016-07-26 Thread Hadley Wickham
On Tue, Jul 26, 2016 at 3:24 AM, Martin Maechler
> I have been asked (in private)

Martin was very polite to not share my name, but it was me :)

> > Hi Martin,
> y <- c(1, 2, 3, NA, 4)
> x <- c(1, 2, 2, 1, 1)
> t.test(y ~ x)
> lm(y ~ x)
> > Normally, most R functions follow the principle that
> > "missings should never silently go missing". Do you have
> > any background on why these functions (and most model/test
> > function in stats, I presume) silently drop missing values?
> And I think, the issue and an answer are important enough to be
> public, hence this posting to R-help :
> First note that in some sense it is not true that lm(y ~ x) silently drops
> NAs: Everybody who is taught about lm() is taught to look at
>summary( fm )  where  fm <- lm(y ~ x)

Good point - unfortunately the message was a bit subtle for me, and I
don't remember being taught to look for it :(

> and that (for the above case)  "says"
>  ' (1 observation deleted due to missingness) '
> and so is not entirely silent.
> This goes all back to the idea of having an 'na.action' argument
> which may be a function (a very good idea, "functional
> programming" in a double sense!)... which Chambers et al
> introduced in The White Book (*1) and which I think to remember
> was quite a revolutionary idea; at least I had liked that very much
> once I understood the beauty of passing functions as arguments
> to other functions.
> One problem already back then has been that we already had the
> ---much more primitive but often sufficient--- standard of an
> 'na.rm = FALSE' (i.e. default FALSE) argument.
> This has been taught in all good classes/courses about statistical
> modeling with S and R ever since ... I had hoped 
> (but it seems I was too optimistic, .. or many students have too
>  quickly forgotten what they were taught ..)
> Notably the white book itself, and the MASS (*2) book do teach
> this.. though possibly not loudly enough.
> Two more decisions about this were made back then, as well:
>   1) The default for na.action to be na.omit  (==> "silently dropping")
>   2) na.action being governed by  options(na.action = ..)
> '1)' may have been mostly "historical": I think it had been the behavior of
> other "main stream" statistical packages back then (and now?) *and*
> possibly more importantly, notably with the later (than white book = "S3")
> advent of na.replace, you did want to keep the missing in your
> data frame, for later analysis; e.g. drawing (-> "gaps" in
> plots) so the NA *were* carried along and would not be
> forgotten, something very important in most cases.
> and '2)' is something I personally no longer like very
> much, as it is "killing" the functional paradigm.
> OTOH, it has to be said in favor of that "session wide" / "global" setting
>   options(na.action = *)
> that indeed it depends on the specific data analysis, or even
> the specific *phase* of a data analysis, *what* behavior of NA
> treatment is desired and at the time it was thought smart
> that all methods (also functions "deep down" called by
> user-called functions) would automatically use the "currently
> desired" NA handling.
> There have been recommendations (I don't know exactly where and
> by whom) to always set
>options(na.action =
> in your global .First() or nowadays rather your  Rprofile, and
> I assume that some of the CRAN packages and some of the "teaching
> setups" would do that (and if you do that, the lm() and t.test()
> calls above give an error).

I think that's a bit too strict for me, so I wrote my own:

na.warn <- function(object, ...) {
  missing <- complete.cases(object)
  if (any(missing)) {
    warning("Dropping ", sum(missing), " rows with missing values",
      call. = FALSE)
  }
  na.exclude(object, ...)
}

That gives me the behaviour I want:

options(na.action = na.warn)

y <- c(1, 2, 3, NA, 4)
x <- c(1, 2, 2, 1, 1)
mod <- lm(y ~ x)
#> Warning: Dropping 4 rows with missing values

predict(mod)
#>   1   2   3   4   5
#> 2.5 2.5 2.5  NA 2.5
resid(mod)
#>    1    2    3    4    5
#> -1.5 -0.5  0.5   NA  1.5

To me, this would be the most sensible default behaviour, but I
realise it's too late to change without breaking many existing

On a related note, I've never really understood why it's called
na.exclude - from my perspective it causes the _inclusion_ of missing
values in the predictions/residuals.

Thanks for the (as always!) informative response, Martin.




Re: [R] lm() silently drops NAs

2016-07-26 Thread Hadley Wickham
> I think that's a bit too strict for me, so I wrote my own:
> na.warn <- function(object, ...) {
>   missing <- complete.cases(object)
>   if (any(missing)) {
> warning("Dropping ", sum(missing), " rows with missing values",
> call. = FALSE)
>   }
>   na.exclude(object, ...)
> }

That should, of course, have been:

missing <- !complete.cases(object)
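
Putting the correction together, a complete version might look like the sketch below. Passing the handler through the na.action argument of lm() rather than options() is a choice made here so that the example changes no global state; it is not how the earlier message used it.

```r
na.warn <- function(object, ...) {
  # TRUE for rows that contain at least one missing value
  missing <- !complete.cases(object)
  if (any(missing)) {
    warning("Dropping ", sum(missing), " rows with missing values",
      call. = FALSE)
  }
  # Drop the incomplete rows but remember their positions
  na.exclude(object, ...)
}

y <- c(1, 2, 3, NA, 4)
x <- c(1, 2, 2, 1, 1)

# Warns: Dropping 1 rows with missing values
mod <- lm(y ~ x, na.action = na.warn)
```

With the negation in place the warning counts the one incomplete row instead of the four complete ones.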





Re: [R] read.xlsx function crashing R Studio

2016-08-22 Thread Hadley Wickham
Or readxl.


On Mon, Aug 22, 2016 at 5:54 AM, jim holtman  wrote:
> try the openxlsx package
> Jim Holtman
> Data Munger Guru
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
> On Sun, Aug 21, 2016 at 1:30 PM, Kevin Kowitski 
> wrote:
>> Hey everyone,
>>I have used read.xlsx in the past rather than XLConnect for importing
>> Excel data to R.  However, I have been finding now that the read.xlsx
>> function has been causing my R studio to Time out.  I thought it might be
>> because the R studio I had was out of date so I installed R studio X64
>> 3.3.1 and reinstalled the xlsx package but it is still failing.  I have
>> been trying to use XLConnect in its place which has been working, except
>> that I am running into memory error:
>>   Error: OutOfMemoryError (Java): GC overhead limit exceeded
>> I did some online searching and found an option to increase memory:
>>   options(java.parameters = "-Xmx4g")
>> but it resulted in this new memory Error:
>>  Error: OutOfMemoryError (Java): Java heap space
>> Can anyone provide me with some help on getting the read.xlsx function
>> working?
>> -Kevin

Re: [R] workflow getting UTF-8 csv in and out of R on Mac (spreadsheet editor)

2016-09-02 Thread Hadley Wickham
You can use readr::write_excel_csv(), which adds a BOM that forces Excel to
read the file as UTF-8.
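
For anyone curious what the BOM amounts to, here is a base R sketch. The helper write_csv_with_bom() is hypothetical and is not how readr implements write_excel_csv(); it only illustrates that the byte-order mark is the three bytes EF BB BF placed in front of otherwise ordinary UTF-8 CSV content.

```r
# Hypothetical helper: write a data frame as UTF-8 CSV with a leading BOM.
write_csv_with_bom <- function(df, path) {
  # 1. Write the plain UTF-8 CSV to a temporary file.
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)
  write.csv(df, tmp, row.names = FALSE, fileEncoding = "UTF-8")

  # 2. Copy its bytes to the target, prefixed with the UTF-8 BOM (EF BB BF).
  body <- readBin(tmp, what = "raw", n = file.size(tmp))
  con <- file(path, open = "wb")
  on.exit(close(con), add = TRUE)
  writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), body), con)
  invisible(path)
}

out <- tempfile(fileext = ".csv")
write_csv_with_bom(data.frame(name = c("Zo\u00eb", "Jos\u00e9"), n = 1:2), out)

# Inspect the first three bytes of the result.
readBin(out, what = "raw", n = 3)
```

In practice the readr function is the simpler route; the sketch is only meant to show why Excel then recognises the encoding.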


On Friday, September 2, 2016, Erich Neuwirth 

> read_excel in Hadley’s readxl package
> should handle your encoding problems.
> Writing Excel files on a Mac, however, still is somewhat messy.
> And you probably should post this kind of question on r-sig-mac
> On 02 Sep 2016, at 13:03, Kai Mx wrote:
> >
> > Hi all,
> >
> > I am hoping for some advice on how to handle UTF-8 spreadsheet files in a
> > Mac environment - sort of off-topic, but still relevant for hopefully a
> > bunch of people.
> >
> > I am using R on Mac OS 10.10. Sometimes I have the urge to actually look at
> > a large spreadsheet on the big screen or make some changes to the tables.
> > Since most of my colleagues live in the M$ Excel - world I tend to use
> > Excel 2011 as well. However, Excel does not handle UTF-8 (which I like
> > because of different system locales).
> > So I actually do a write.csv with file-encoding in macroman, but even then
> > Excel won't just open it and I will have to work my way through the
> > import-dialogue.
> >
> > The other way around, it's even worse. I save the spreadsheet as macroman,
> > iconv it to utf-8 and then read.csv it to R.
> >
> > It works, but it's just really messy. Is there a (preferably light-weight)
> > csv-spreadsheet Editor for Mac OS that you use? Open-Office? I would like
> > NOT to actually buy another Excel version. However, for collaboration, a
> > xls-export would be phenomenal.
> >
> > Thanks!
> >
> > Kai

Re: [R] library(xlsx) fails with an error: Error: package ‘rJava’ could not be loaded

2015-04-20 Thread Hadley Wickham
You might want to try readxl instead, as it doesn't have any external
dependencies.

On Sat, Apr 18, 2015 at 3:07 PM, John Sorkin
> Windows 7 64-bit
> R 3.1.3
> RStudio 0.98.1103
> I am having difficulty loading and installing the xlsx package. The
> loading occurred without any problem, however the library command
> library(xlsx) produced an error related to rJava. I tried to install
> rJava separately, re-loaded the xlsx package, and entered the
> library(xlsx) command but received the same error message about rJava.
> Please see terminal messages below. Any suggestion that would allow me
> to load and run xlsx would be appreciated.
> Thank you,
> John
>> install.packages("xlsx")
> Installing package into ‘C:/Users/John/Documents/R/win-library/3.1’
> (as ‘lib’ is unspecified)
> trying URL
> ''
> Content type 'application/zip' length 400944 bytes (391 KB)
> opened URL
> downloaded 391 KB
> package ‘xlsx’ successfully unpacked and MD5 sums checked
> The downloaded binary packages are in
> C:\Users\John\AppData\Local\Temp\Rtmp4CO5m7\downloaded_packages
>> library(xlsx)
> Loading required package: rJava
> Error : .onLoad failed in loadNamespace() for 'rJava', details:
>   call: inDL(x, as.logical(local), as.logical(now), ...)
>   error: unable to load shared object
> 'C:/Users/John/Documents/R/win-library/3.1/rJava/libs/x64/rJava.dll':
>   LoadLibrary failure:  The specified module could not be found.
> Error: package ‘rJava’ could not be loaded
>> install.packages("rJava")
> Installing package into ‘C:/Users/John/Documents/R/win-library/3.1’
> (as ‘lib’ is unspecified)
> trying URL
> ''
> Content type 'application/zip' length 759396 bytes (741 KB)
> opened URL
> downloaded 741 KB
> package ‘rJava’ successfully unpacked and MD5 sums checked
> The downloaded binary packages are in
> C:\Users\John\AppData\Local\Temp\Rtmp4CO5m7\downloaded_packages
>> library(rJava)
> Error : .onLoad failed in loadNamespace() for 'rJava', details:
>   call: inDL(x, as.logical(local), as.logical(now), ...)
>   error: unable to load shared object
> 'C:/Users/John/Documents/R/win-library/3.1/rJava/libs/x64/rJava.dll':
>   LoadLibrary failure:  The specified module could not be found.
> Error: package or namespace load failed for ‘rJava’
>> library(xlsx)
> Loading required package: rJava
> Error : .onLoad failed in loadNamespace() for 'rJava', details:
>   call: inDL(x, as.logical(local), as.logical(now), ...)
>   error: unable to load shared object
> 'C:/Users/John/Documents/R/win-library/3.1/rJava/libs/x64/rJava.dll':
>   LoadLibrary failure:  The specified module could not be found.
> Error: package ‘rJava’ could not be loaded
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and
> Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> Confidentiality Statement:
> This email message, including any attachments, is for ...{{dropped:17}}


Re: [R] misbehavior with extract_numeric() from tidyr

2015-04-20 Thread Hadley Wickham
On Mon, Apr 20, 2015 at 1:57 PM, arnaud gaboury
> On Mon, Apr 20, 2015 at 6:09 PM, William Dunlap  wrote:
>> The hyphen without a following digit confuses tidyr::extract_numeric().
>> E.g.,
>>> extract_numeric("23 ft-lbs")
>>Warning message:
>>In extract_numeric("23 ft-lbs") : NAs introduced by coercion
>>[1] NA
>>> extract_numeric("23 ft*lbs")
>>[1] 23
> See [0] for the reason for the minus in the regex. It is not a bug but a wish.
> I am honestly very surprised the maintainer decided to go with such a
> simple solution for negative numbers.
> [0]

Any heuristic is going to fail in some circumstances. If you want to
be sure it's doing what you want for your use case, write the regular
expression yourself.
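
As an illustration of writing the expression yourself, here is a base R sketch. The helper extract_first_number() and its pattern are assumptions made for this example, not the tidyr implementation: the pattern accepts a minus sign only when it is directly followed by a digit, and returns the first number found in each string.

```r
# Hypothetical helper: first number in each string, NA where there is none.
extract_first_number <- function(x) {
  # Optional minus, digits, optional decimal part.
  pattern <- "-?[0-9]+(\\.[0-9]+)?"
  hit <- regexpr(pattern, x)
  found <- !is.na(hit) & hit != -1

  out <- rep(NA_real_, length(x))
  # regmatches() returns only the matched substrings, in order.
  out[found] <- as.numeric(regmatches(x, hit))
  out
}

extract_first_number(c("23 ft-lbs", "23 ft*lbs", "-4.5 C", "none"))
```

With this pattern the hyphen in "23 ft-lbs" is ignored because no digit follows it, so the first two inputs both give 23. Whether that is the right rule depends on the data, which is the point of the advice above.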




Re: [R] Rtools 3.3 is not compatible with R 3.2.0.?

2015-04-21 Thread Hadley Wickham
It's been fixed in the dev version, and I'm planning on submitting to
CRAN in the near future.

On Tue, Apr 21, 2015 at 6:01 PM, Shi, Tao  wrote:
> hi list,
> Any updates on this issue?  Thank you very much!
> Tao
>> devtools::install_github("rstudio/packrat")
> WARNING: Rtools 3.3 found on the path at c:/Rtools is not compatible with R 3.2.0.
> Please download and install Rtools 3.1 from , remove the incompatible
> version from your PATH, then run find_rtools().
> Downloading github repo rstudio/packrat@master
> Installing packrat
> "C:/PROGRA~1/R/R-32~1.0/bin/x64/R" --vanilla CMD INSTALL  \
> "C:/Users/tshi/AppData/Local/Temp/Rtmp6VYlhX/devtools25dc273e706c/rstudio-packrat-42b76ad"
>  --library="C:/Program Files/R/R-3.2.0/library"  \
> --install-tests
> * installing *source* package 'packrat' ...
> ** R
> ** inst
> ** tests
> ** preparing package for lazy loading
> ** help
> *** installing help indices
> ** building package indices
> ** testing if installed package can be loaded
> * DONE (packrat)
>> find_rtools()
> WARNING: Rtools 3.3 found on the path at c:/Rtools is not compatible with R 3.2.0.
> Please download and install Rtools 3.1 from , remove the incompatible
> version from your PATH, then run find_rtools().
>> sessionInfo()
> R version 3.2.0 (2015-04-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
> other attached packages:
> [1] devtools_1.7.0   packrat_0.4.3-19
> loaded via a namespace (and not attached):
> [1] httr_0.6.1 tools_3.2.0RCurl_1.95-4.5 stringr_0.6.2  bitops_1.0-6



Re: [R] Problem with particular file in XML package?

2015-05-28 Thread Hadley Wickham
I have also seen this problem on a student's windows machine (with R
3.2.0 and on multiple mirrors). It appeared that the package zip
itself was being corrupted (with an error to the tune of downloaded
file size does not agree with actual file size). The most likely
explanation that I could come up with was that a virus checker was
hitting a false positive and mangling the zip file.


On Thu, May 28, 2015 at 1:38 AM, Prof Brian Ripley
> This really should have been sent to the package maintainer.  But that the
> zip file is corrupt has been reported several times, and does not block
> installation for anyone else, so your (plural) diagnosis is wrong.
> On 28/05/2015 03:56, Gen wrote:
>> I have been attempting to install the R devtools package at work.  The
>> version of R is 3.1.2 (Pumpkin Helmet).  However, the installation of
>> devtools fails because devtools depends on rversions which in turn depends
>> upon the XML package (XML_3.98-1.1.tar.gz), and the XML package is not
>> importing correctly for us.
>> One of our system administrators tried scanning through the files in the
>> XML package, and he said that the particular file:
>> /src/contrib/XML_3.98-1.1.tar.gz/XML/inst/exampleData/ looks
>> corrupted.  The actual error message he received was: "Archive parsing
>> failed!  (Data is corrupted)."  For the record, I tried downloading an
>> older version of the XML package (XML_3.95-0.1.tar.gz) but that was also
>> without success -- this time there was a separate error message about not
>> being able to locate xml2-config.  (Perhaps XML_3.95-0.1.tar.gz is just not
>> compatible with R version 3.1.2?)
>> I tried browsing over to the "CRAN checks" link for the XML package and
>> noticed several red warning messages under the "Status" column -- not sure
>> if that is typical?  Has anyone else had trouble with the XML package
>> lately and if so, how did you resolve it?  Would it be possible to remove
>> the potentially corrupted file and then re-upload the package source
>> XML_3.98-1.1.tar.gz to the CRAN webpage?  Thanks for your
>> help/suggestions!
>> [[alternative HTML version deleted]]
>> __
>> mailing list -- To UNSUBSCRIBE and more, see
>> PLEASE do read the posting guide
> PLEASE do, including what it says about HTML mail, 'at a minimum'
> information required and upgrading before posting: R 3.1.2 is already 2
> versions obsolete.
> --
> Brian D. Ripley,
> Emeritus Professor of Applied Statistics, University of Oxford
> 1 South Parks Road, Oxford OX1 3TG, UK

Re: [R] dplyr - counting a number of specific values in each column - for all columns at once

2015-06-16 Thread Hadley Wickham
On Tue, Jun 16, 2015 at 12:24 PM, Dimitri Liakhovitski
> Hello!
> I have a data frame:
> md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
>   device = c(1,1,2,2,3,3))
> myvars = c("a", "b", "c")
> md[2,3] <- NA
> md[4,1] <- NA
> md
> I want to count number of 5s in each column - by device. I can do it like this:
> library(dplyr)
> group_by(md, device) %>%
> summarise(counts.a = sum(a==5, na.rm = T),
>   counts.b = sum(b==5, na.rm = T),
>   counts.c = sum(c==5, na.rm = T))
> However, in real life I'll have tons of variables (the length of
> 'myvars' can be very large) - so that I can't specify those counts.a,
> counts.b, etc. manually - dozens of times.
> Does dplyr allow to run the count of 5s on all 'myvars' columns at once?

md %>%
  group_by(device) %>%
  summarise_each(funs(sum(. == 5, na.rm = TRUE)))
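
For comparison, the same counts can be produced in base R with aggregate(). This is a different technique from the dplyr answer above, sketched with the data from the question; the helper name count_fives is made up for the example.

```r
md <- data.frame(a = c(3, 5, 4, 5, 3, 5), b = c(5, 5, 5, 4, 4, 1),
                 c = c(1, 3, 4, 3, 5, 5), device = c(1, 1, 2, 2, 3, 3))
md[2, 3] <- NA
md[4, 1] <- NA
myvars <- c("a", "b", "c")

# Count the 5s in one column, ignoring missing values.
count_fives <- function(v) sum(v == 5, na.rm = TRUE)

# Apply the counter to every column in myvars, separately for each device.
counts <- aggregate(md[myvars], by = list(device = md$device), FUN = count_fives)
counts
```

Because the columns are selected with md[myvars], the call does not change when myvars grows.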




Re: [R] what constitutes a 'complete sentence'?

2015-07-03 Thread Hadley Wickham
It might be a line break problem - I think you want:

Description: Functions designed to test for single gene/phenotype association and
    for pleiotropy on genetic and genomic data.
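
One way to check the line-break theory locally is to parse the field with read.dcf(), since DESCRIPTION is a DCF file and continuation lines must start with whitespace. The sketch below uses a made-up two-field file and a hypothetical helper parse_description(); it shows the general rule only, and does not establish what caused the NOTE in this particular package.

```r
good <- c(
  "Package: demo",
  "Description: Functions designed to test for single gene/phenotype association and",
  "    for pleiotropy on genetic and genomic data."
)
# Same text, but the continuation line has lost its indentation.
bad <- sub("^ +", "", good)

# Hypothetical helper: write the lines to a temp file and read one field back.
parse_description <- function(lines) {
  path <- tempfile()
  on.exit(unlink(path))
  writeLines(lines, path)
  read.dcf(path, fields = "Description")[[1]]
}

parse_description(good)

# The unindented continuation line is rejected as malformed.
tryCatch(parse_description(bad), error = function(e) conditionMessage(e))
```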


On Fri, Jul 3, 2015 at 10:09 AM, Federico Calboli
> Hi All,
> I am upgrading a package for CRAN, and I get this note:
> checking DESCRIPTION meta-information ... NOTE
> Malformed Description field: should contain one or more complete sentences.
> This is puzzling because:
> ...
> Description: Functions designed to test for single gene/phenotype association 
> and for pleiotropy on genetic and genomic data.
> ...
> In my understanding "Functions designed to test for single gene/phenotype 
> association and for pleiotropy on genetic and genomic data.” *is* a complete 
> sentence.  So, what is complete sentence in the opinion of whomever coded 
> that check?
> Best
> F
> --
> Federico Calboli
> Ecological Genetics Research Unit
> Department of Biosciences
> PO Box 65 (Biocenter 3, Viikinkaari 1)
> FIN-00014 University of Helsinki
> Finland

Re: [R] what constitutes a 'complete sentence'?

2015-07-03 Thread Hadley Wickham
In that case, you need to create a minimal reproducible example and make it
publicly available.


On Friday, July 3, 2015, Federico Calboli 

> > On 3 Jul 2015, at 12:14, Hadley Wickham wrote:
> >
> > It might be a line break problem - I think you want:
> >
> > Description: Functions designed to test for single gene/phenotype
> > association and
> >for pleiotropy on genetic and genomic data.
> Tried this and unfortunately it does not help.
> BW
> F
> >
> > Hadley
> >
> > On Fri, Jul 3, 2015 at 10:09 AM, Federico Calboli wrote:
> >> Hi All,
> >>
> >> I am upgrading a package for CRAN, and I get this note:
> >>
> >> checking DESCRIPTION meta-information ... NOTE
> >> Malformed Description field: should contain one or more complete
> sentences.
> >>
> >> This is puzzling because:
> >>
> >>
> >> ...
> >> Description: Functions designed to test for single gene/phenotype
> association and for pleiotropy on genetic and genomic data.
> >> ...
> >>
> >> In my understanding "Functions designed to test for single
> gene/phenotype association and for pleiotropy on genetic and genomic data.”
> *is* a complete sentence.  So, what is complete sentence in the opinion of
> whomever coded that check?
> >>
> >> Best
> >>
> >> F
> >>
> >> --
> >> Federico Calboli
> >> Ecological Genetics Research Unit
> >> Department of Biosciences
> >> PO Box 65 (Biocenter 3, Viikinkaari 1)
> >> FIN-00014 University of Helsinki
> >> Finland
> >>
> --
> Federico Calboli
> Ecological Genetics Research Unit
> Department of Biosciences
> PO Box 65 (Biocenter 3, Viikinkaari 1)
> FIN-00014 University of Helsinki
> Finland



Re: [R] : Ramanujan and the accuracy of floating point computations - using Rmpfr in R

2015-07-03 Thread Hadley Wickham
 It doesn’t appear to me that mpfr was ever designed to handle expressions 
 as the first argument.
>>> This could be a start. Obviously one would want to include code to do other
>>> substitutions probably using the all.vars function to pull out the other
>>> "constants" and 'numeric' values to make them of equivalent precision. I'm
>>> guessing you want to follow the parse-tree and then testing the numbers for
>>> integer-ness and then replacing by paste0("mpfr(", val, "L, ", prec, ")")
>>> Pre <- function(expr, prec){ sexpr <- deparse(substitute(expr) )
>> Why deparse?  That's almost never a good idea.  I can't try your code (I
>> don't have mpfr available), but it would be much better to modify the
>> expression than the text representation of it.  For example, I think
>> your code would modify strings containing "pi", or variables with those
>> letters in them, etc.  If you used substitute(expr) without the
>> deparse(), you could replace the symbol "pi" with the call to the Const
>> function, and be more robust.
> Really? I did try. I was  fairly sure that someone could do better but I 
> don’t see an open path along the lines you suggest. I’m pretty sure I tried 
> `substitute(expr, list(pi= pi))` when `expr` had been the formal argument and 
> got disappointed because there is no `pi` in the expression `expr`. I 
> _thought_ the problem was that `substitute` does not evaluate its first 
> argument, but I do admit to be pretty much of a klutz with this sort of 
> programming. I don’t think you need to have mpfr installed in order to 
> demonstrate this.

You might want to read - it's
my best attempt at explaining how to modify call trees in R.
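
As a rough sketch of what modifying the call tree (rather than its text) can look like: the recursive helper replace_symbol() below is hypothetical, written for this example. It walks an expression and swaps one symbol for another expression. The call to Const() is only constructed, never evaluated, so Rmpfr does not need to be installed to run it.

```r
# Hypothetical helper: replace every occurrence of the symbol `name` in `expr`.
replace_symbol <- function(expr, name, replacement) {
  if (is.symbol(expr)) {
    # Only an exact symbol match is replaced, so `pie` is left alone.
    if (identical(as.character(expr), name)) replacement else expr
  } else if (is.call(expr)) {
    # Recurse into the function position and every argument.
    # (Calls with empty arguments, such as x[1, ], are not handled here.)
    as.call(lapply(as.list(expr), replace_symbol,
                   name = name, replacement = replacement))
  } else {
    # Constants (numbers, strings) are returned unchanged.
    expr
  }
}

e <- quote(exp(pi * sqrt(163)) + pie)
e2 <- replace_symbol(e, "pi", quote(Const("pi", 120)))
e2
#> exp(Const("pi", 120) * sqrt(163)) + pie
```

Unlike a text substitution on the deparsed code, the variable pie and the string "pi" are untouched.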




Re: [R] Trellis Plots: translating lattice xyplot() to ggplot()

2015-07-10 Thread Hadley Wickham
Have you tried explicitly print()ing the lattice graphics in your knitr doc?
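
The reason print() matters is that lattice functions return an object instead of drawing as a side effect. A small sketch, using the built-in mtcars data and a made-up helper draw_each():

```r
library(lattice)  # recommended package, shipped with standard R installations

# xyplot() only builds a "trellis" object; nothing is drawn at this point.
p <- xyplot(mpg ~ wt, data = mtcars)
class(p)

# At top level R auto-prints the value of an expression, which is what draws
# the plot. Inside a loop or a function there is no auto-printing, so the
# object has to be printed explicitly.
draw_each <- function(vars) {
  plots <- list()
  for (v in vars) {
    plots[[v]] <- xyplot(reformulate(v, response = "mpg"), data = mtcars)
    print(plots[[v]])  # without print() nothing would be drawn here
  }
  invisible(plots)
}
```

Calling draw_each(c("wt", "hp")) then draws one plot per variable on whatever device is active.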


On Friday, July 10, 2015, Rich Shepard  wrote:

>   Hadley's ggplot2 book is quite old and a new version is in the works, but
> not yet out. I've been using lattice graphics but the knitr package doesn't
> support lattice, only basic plots and ggplot2. My Web searches for Trellis
> plots in ggplot2 equivalent to those in lattice have not been productive.
>   I would appreciate a pointer to a resource that would teach me how to
> translate from lattice xyplot() to ggplot2 ggplot().
>   This is one such plot needing translation:
> xyplot(value ~ sampdate | variable, data=carlin.1.melt, = T)
> Rich

Re: [R] Trellis Plots: translating lattice xyplot() to ggplot()

2015-07-10 Thread Hadley Wickham
You shouldn't be explicitly opening a device in a knitr document. I think
maybe you should post a minimal document so we can figure out what's going
wrong.

On Friday, July 10, 2015, Rich Shepard  wrote:

> On Fri, 10 Jul 2015, Hadley Wickham wrote:
>> Have you tried explicitly print()ing the lattice graphics in your knitr
>> doc?
> Hadley,
>   Only now. Had not thought of trying this before.
> pdf('carlin-1-descriptive.pdf')
> print(xyplot(value ~ sampdate | variable, data=carlin.1.melt, = T))
> No error messages, but no graphic, either. Without specifying the pdf
> device, TeX complains it cannot find a graphics device and lists
> bit-mapped,
> ps and svg devices.
>   Most likely I do not have the correct syntax.
> Thanks,
> Rich

Re: [R] Trellis Plots: translating lattice xyplot() to ggplot()

2015-07-10 Thread Hadley Wickham
I'd recommend starting with a simpler .Rmd or .Rnw file, rather than
using it with lyx. The basic .Rmd file below works for me without any
further adjustments:

# Lattice test

xyplot(mpg ~ wt, data = mtcars)
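
The chunk delimiters of that .Rmd file do not survive in the archive, so here is a sketch that writes out a complete minimal file from R. The chunk header and the library(lattice) line are assumptions about what such a file needs, not a recovery of the original message, and the file name is made up.

```r
# Lines of a minimal R Markdown document with one lattice chunk.
rmd_lines <- c(
  "# Lattice test",
  "",
  "```{r}",
  "library(lattice)",
  "xyplot(mpg ~ wt, data = mtcars)",
  "```"
)

rmd_path <- file.path(tempdir(), "lattice-test.Rmd")
writeLines(rmd_lines, rmd_path)

# To render it (requires the knitr package):
# knitr::knit(rmd_path)
```

Note that the chunk opens no graphics device itself; knitr takes care of that.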


On Fri, Jul 10, 2015 at 3:39 PM, Rich Shepard  wrote:
> On Fri, 10 Jul 2015, Hadley Wickham wrote:
>> You shouldn't be explicitly opening a device in a knitr document.
> Hadley,
>   Didn't think so.
>> I think maybe you should post a minimal document so we can figure out
>> what's going wrong.
>   Agreed. Attached are the raw data (carlin.csv) and a stripped down LyX
> document with the knitr chunks.
>   This is my first attempt to use knitr; I'm reading the knitr book and
> that's where I got the impression that lattice graphics are not supported.
> Rich

Re: [R] For Hadley Wickham: Need for a small fix in haven::read_spss

2015-07-20 Thread Hadley Wickham
(FWIW this would've been better sent to me directly or filed on
GitHub, rather than sent to R-help)

I think this is more of a problem with the way that you're accessing
the info, than the design of the underlying structure. I'd do
something like this:

attr_default <- function(x, which, default) {
  val <- attr(x, which)
  if (is.null(val)) default else val
}

sapply(spss1, attr_default, "label", NA_character_)

(code untested, but you get the idea)
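
Here is a runnable variant on made-up data, since the SPSS file from the question is not available. The data frame spss_like is hypothetical, and the use of exact = TRUE is an addition in this sketch: by default attr() does partial matching, so asking for "label" on a column that only has a "labels" attribute would return the value labels instead of NULL.

```r
attr_default <- function(x, which, default) {
  # exact = TRUE stops "label" from partially matching "labels"
  val <- attr(x, which, exact = TRUE)
  if (is.null(val)) default else val
}

# Stand-in for a data frame read with haven::read_spss().
spss_like <- data.frame(sex = c(1, 2, 1), age = c(34, 51, 27))
attr(spss_like$sex, "label") <- "Respondent sex"
attr(spss_like$sex, "labels") <- c(Male = 1, Female = 2)
attr(spss_like$age, "labels") <- c(Refused = 99)  # value labels, no variable label

var_labels <- sapply(spss_like, attr_default, "label", NA_character_)
var_labels

# One label per column, so the two can sit side by side.
data.frame(variable = names(spss_like), label = unname(var_labels))
```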


On Mon, Jul 20, 2015 at 8:56 AM, Dimitri Liakhovitski
> Hadley,
> you've added function labelled to haven, which is great. However, when
> it so happens that in SPSS a variable has no long label, your code
> considers it to be NULL rather than an NA. NULL is correct, but NA
> would probably be better.
> For example, I've read in an SPSS file:
> library(haven)
> spss1 <- read_spss("SPSS_Example.sav")
> varnames <- names(spss1)
> mylabels <- unlist(lapply(spss1, attr, "label"))
> length(varnames)
> [1] 64
> length(mylabels)
> [1] 62
> Because in this particular dataset there were 2 variables without
> either variable labels or data labels.
> When I run lapply(spss1, attr, "label") I see under those 2 variables
> "NULL" - which is true and valid.
> However, would it be possible to have an NA instead of NULL? This way
> the lengths of varnames and mylabels would be the same and one could put
> them side by side (e.g., in one data frame).
> Thanks a lot!
> --
> Dimitri Liakhovitski

Re: [R] ggplot2 - Specifying Colors Manually

2015-07-22 Thread Hadley Wickham
Try this:

ggplot(mydf,aes(x)) +
  geom_line(aes(y = y1, colour = "y1")) +
  geom_line(aes(y = y2, colour = "y2"))  +
  scale_color_manual(values = c(y1 = "green4", y2 = "blue2"))

Note that you don't need to use `mydf$` inside aes(), and the names in the
manual scale should match the values in the aes() calls.


On Wed, Jul 22, 2015 at 1:13 PM, Abiram Srivatsa  wrote:
> Hi,
> Given a data frame, I'm trying to graph multiple lines on one graph, each
> line being a different color and each colored line corresponding to a
> specific name in the legend. Here is a very basic data sample to work with:
>  x <- seq(0,40,10)
>  y1 <- sample(1:50,5)
>  y2 <- sample(1:50,5)
>  mydf <- data.frame(x,y1,y2)
>  p <- ggplot(mydf,aes(x=mydf$x)) +
> geom_line(aes(y=mydf$y1,colour="green4")) +
> geom_line(aes(y=mydf$y2,colour="blue2"))  +
>  scale_color_manual(name="legend",values=c(y1="green4",y2="blue2"))
>  p
> When I run this, the entire plot is blank. What I WANT to show up is two
> lines, one being the color of green4 and the other being blue2. Besides
> that, I'm trying to associate the colors with the names "y1" and "y2" in
> the legend, but my codes don't seem to be working.
> I'm very new to R/ggplot2, and I really appreciate any and all help I can
> get.
> Thank you!

Re: [R] ggplot2 - geom_text() with date as x-axis

2015-07-30 Thread Hadley Wickham
I'm a bit confused what you're trying to accomplish - the mix of
annotate() and geom_text() is confusing. The following code works for
me, and I think might be what you want:

ggplot(ins, aes(td, glucose)) +
  geom_point(colour = "red") +
  geom_line(colour = "blue") +
  annotate("text", x = texdat, y = 500, label = "Glucose", cex = 3)


On Thu, Jul 30, 2015 at 10:23 AM, John Kane  wrote:
> I am trying to annotate a graph using geom_text() and I seem to be 
> misunderstanding how to use a date in the co-ordinates---or, at least, I 
> think that is the problem. Code is below.
> Can anyone give me a suggestion of where I am going wrong?
> Thanks,
> John
> John Kane
> Kingston ON Canada
> ###===
> library(ggplot2)
> library(lubridate)
> ins  <-  structure(list(td = structure(c(1437804720, 1437824100, 1437836220,
> 1437851580, 1437863460, 1437878640, 1437890640, 1437904800, 1437918240,
> 1437926100, 1437941340, 1437951240), tzone = "UTC", class = c("POSIXct",
> "POSIXt")), glucose = c(328L, 390L, 358L, 387L, 440L, 328L, 365L,
> 450L, 467L, 477L, 408L, 457L), dose = c(NA, 0.5, NA, NA, 0.5,
> NA, NA, 0.5, NA, NA, NA, 0.5)), .Names = c("td", "glucose", "dose"
> ), row.names = c(NA, -12L), class = "data.frame")
> anon  <- na.omit(ins)  # extract shots
> texdat =  ymd_hm("2015-07-26 20:09")
> glucose  <-  ggplot(ins, aes(td, glucose)) + geom_point(colour = "red") + 
> geom_line(colour = "blue")
> p1  <-  glucose + annotate("text", x = texdat, y = 500, label = anon[ ,3 ], 
> cex = 3)
> p1
> # Now the problem
>  p2  <-  p1 +  geom_text(x = texdat, y = 400 , size = 2,  label= "Glucose")
> p2
> ###=

Re: [R] dplyr and function length()

2015-08-04 Thread Hadley Wickham
> No, the effect I described has nothing to do with USING dplyr. It occurs with
> any (preexisting) data.frame once dplyr is LOADED (require(dplyr)). It is
> this silent, sort of "backward acting" effect that disturbs me.

You're going to need to provide some evidence for that charge: dplyr
does not affect the behaviour of data.frames (only tbl_dfs)




Re: [R] dplyr and function length()

2015-08-04 Thread Hadley Wickham
>> length(df[,1]).
>> Both commands will return n.
>> However, once dplyr is loaded,
>> length(df[,1]) will return a value of 1.
>> length(df$m1) and also length(df[[1]]) will correctly return n.
>> I know that using length() may not be the most elegant or efficient way to 
>> get the value of n. However, what puzzles (and somewhat disturbs) me is that 
>> loading of dplyr affects how length() works, without there being a warning 
>> or masking message upon loading it.
>> Any clarification or comment would be welcome.
> Presumably, dplyr changes how [.data.frame works (by altering the default for 
> drop=, I expect) so that df[,1] is a data frame with 1 variable and not a 
> vector. And yes, that _is_ somewhat disturbing.

It changes the behaviour for [.tbl_df (tbl_df is a very minor
extension of data frame with custom [ and print methods).  This is
partly an experiment to see what happens when you make [ more
consistent - [.tbl_df always returns a data frame, so if you want a
vector you have to use [[.
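
The effect on length() can be reproduced in base R without dplyr, because `[` on a data frame already has both behaviours depending on drop. A small sketch with a made-up data frame:

```r
df <- data.frame(m1 = c(10, 20, 30), m2 = c("a", "b", "c"))

length(df[, 1])                # 3: a single column is simplified to a vector
length(df[, 1, drop = FALSE])  # 1: still a data frame, so length() counts columns
length(df[[1]])                # 3: [[ always extracts the column itself
```

The second form is roughly what the thread describes for [.tbl_df, which is why df[[1]] or df$m1 is the safer way to get at a column regardless of the class of df.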




Re: [R] attributes in dplyr and haven

2015-08-04 Thread Hadley Wickham
Install the latest version of dplyr? Should be fixed there.

On Tue, Aug 4, 2015 at 9:40 AM, Conklin, Mike (GfK)
> I read in spss files using haven's read_spss. Each column then gets 
> attributes assigned named
> label - a long description of the variable
> class - "labelled"
> labels --- answer labels i.e. 1=Male, 2=Female
>  example -
>> attributes(KPTV[[3]])
> $label
> [1] "DERIVED: Survey language"
> $class
> [1] "labelled"
> $labels
> English Spanish
>   1   2
> However, if I subset the data.frame  e.g. MassTV<-KPTV[row selection logic,] 
> the label attribute disappears
> attributes(MassTV[[3]])
> $labels
> English Spanish
>   1   2
> $class
> [1] "labelled"
> If I use dplyr to filter the data I simply get an ERROR that the label 
> attribute is not supported.
>> MassTV<-filter(KPTV,KPTV$MNO %in% KPMass$`KPMain$mno`)
> Error: column 'MNO' of type numeric has unsupported attributes: label
> Any ideas on how I can preserve the label attribute (i.e. the long 
> description of the variable name?)
> Thanks for any help,
> Mike
> --
> W. Michael Conklin
> Executive Vice President
> Marketing & Data Sciences - North America
> GfK | 8401 Golden Valley Road | Minneapolis | MN | 55427
> T +1 763 417 4545 | M +1 612 567 8287



Re: [R] writing binary data from RCurl and postForm

2015-08-05 Thread Hadley Wickham
> I think that is because the value returned from postForm has an attribute;
> remove it by casting the return to a vector
>   fl <- tempfile(fileext=".pdf")
>   writeBin(as.vector(postForm(url, binary=TRUE)), fl)
> The httr package might also be a good bet
>   writeBin(content(POST(url)), fl)

Or write response directly to disk with

POST(url, write_disk(fl))




Re: [R] Memory hungry routines

2014-12-29 Thread Hadley Wickham
You might find the advice at helpful.

> Is there any way to detect which calls are consuming memory?
> I run a program whose global variables take up about 50 Megabytes of
> memory, but when I monitor the progress of the program it seems to be
> allocating 150 Megabytes of memory, with peaks of up to 2 Gigabytes.
> I know that the global variables aren't "copied" many times by the
> routines, but I suspect something weird must be happening.
> Alberto Monteiro
> PS: the lines, below, count the memory allocated to all global
> variables, probably it could be adapted to track the local variables:
> y <- ls(pat="")   # get all names of the variables
> z <- rep(0, length(y))  # create array of sizes
> for (i in 1:length(y)) z[i] <- object.size(get(y[i]))  # loop: get all
> sizes (in bytes) of the variables
> # BTW, is there any way to vectorize the above loop?
> xix <- sort.int(z, index.return = TRUE)  # sort the sizes
> y <- y[xix$ix]  # apply the sort to the variables
> z <- z[xix$ix]  # apply the sort to the sizes
> y <- c(y, "total")  # add a totalizator
> z <- c(z, sum(z))  # sum them all
> cbind(y, z)  # ugly way to list them
> __
> mailing list -- To UNSUBSCRIBE and more, see
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.
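
On the side question in the quoted message about vectorizing the size loop: one possible sketch, in base R only, is below. The helper name object_sizes is hypothetical, and this only reports the sizes of objects that already exist; it does not show which calls allocate memory while the program runs.

```r
# Sizes (in bytes) of all objects in an environment, largest first.
# object_sizes() is a made-up helper name, not part of any package.
object_sizes <- function(env = globalenv()) {
  nms <- ls(envir = env)
  sizes <- vapply(
    nms,
    function(nm) as.numeric(object.size(get(nm, envir = env))),
    numeric(1)
  )
  sort(sizes, decreasing = TRUE)
}

# Example with a throwaway environment
e <- new.env()
assign("small", 1:10, envir = e)
assign("big", numeric(1e5), envir = e)
s <- object_sizes(e)
s
sum(s)  # total, in bytes
```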



Re: [R] R vs. RStudio?

2015-01-12 Thread Hadley Wickham
On Mon, Jan 12, 2015 at 2:01 AM, peter dalgaard  wrote:
>> On 11 Jan 2015, at 11:30 , Duncan Murdoch  wrote:
>> - I don't like the tiled display.  I find it doesn't give me enough space.
> This is a mixed blessing. For teaching purposes, it helps avoid shuffling 
> windows to uncover the editor, graph window, and terminal in order to 
> demonstrate various points.
> (One can fairly quickly get used to do that for one's own purposes, but in 
> the classroom it becomes "noise on the line".) However, the graph tile rather 
> too easily gets into the "Figure margins too large" issue and readability of 
> the text tiles can become a problem.

I used to really dislike the tiling, but now I'm mostly ok with it
(especially once I realised RStudio is designed to be used
fullscreen). It's certainly a huge improvement for new users, since
they never lose windows behind other windows, and the same type of
thing always appears in the same place. OTOH if the projector isn't
particularly good or the room is large, and you've cranked up the size
so everyone can read it, it can be hard to fit everything on one




Re: [R] R vs. RStudio?

2015-01-12 Thread Hadley Wickham
Is there a reason you don't just click the zoom button?

On Mon, Jan 12, 2015 at 8:22 AM, John Fox  wrote:
> Dear Peter and Jeff,
> I've used RStudio in teaching for quite some time now. For displaying
> graphics, I open a windows() graphics device on a Windows PC or a quartz()
> device on a Mac. I explain to the students that they don't have to do this,
> but I'm doing it so that I can make the graphs larger. There are still some
> issues arising from the paned display, but I find it reasonably simple to
> adjust the size of the panes as needed during a demonstration, often pushing
> the vertical divider far to the right.
> Best,
>  John
>> -Original Message-
>> From: R-help [] On Behalf Of peter
>> dalgaard
>> Sent: January-12-15 9:00 AM
>> To: Jeff Newmiller
>> Cc: R mailing list
>> Subject: Re: [R] R vs. RStudio?
>> On 12 Jan 2015, at 09:28 , Jeff Newmiller 
> wrote:
>> > If you have two screens the "zoom" plot window can fill the second
> screen.
>> Some laptops can handle a second external screen if you use a docking
>> station.
>> Unfortunately, such luxury is not available in the classroom. All too
> often, the
>> projector setup is calibrated to display 3-bullet PowerPoint
> presentations...
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000
>> Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Email:  Priv:



Re: [R] stringr::str_split_fixed query

2015-01-15 Thread Hadley Wickham
FWIW this is fixed in the dev version of stringr which uses stringi
under the hood:

> stringr::str_split_fixed('ab','',2)
 [,1] [,2]
[1,] "a"  "b"
> stringr::str_split_fixed('ab','',3)
 [,1] [,2] [,3]
[1,] "a"  "b"  ""


On Wed, Jan 14, 2015 at 12:47 PM, David Barron  wrote:
> I'm puzzled as to why I get this behaviour with str_split_fixed in the
> stringr package.
>> stringr::str_split_fixed('ab','',2)
>  [,1] [,2]
> [1,] ""   "ab"
>> stringr::str_split_fixed('ab','',3)
>  [,1] [,2] [,3]
> [1,] ""   "a"  "b"
> In the first example, I was expecting to get
>  [,1] [,2]
> [1,] "a"   "b"
> Can someone explain?
> Thanks,
> David



Re: [R] Latest version of Rtools is incompatible with latest version of R !!

2015-01-16 Thread Hadley Wickham
>> Funnily, this problem disappears when I use RTools31.exe. And, I am not the 
>> only one facing this issue. A lot of people in my group (in which we all are 
>> learning R) are facing the same problem. I am really puzzled as to why 
>> RTools32.exe isn't compatible with R 3.1.2 !!
>> Thanks again for your time !!Prameet
> You are asking the wrong question.  You should be asking why devtools
> says that Rtools 3.2 is incompatible.

Yes, that was my fault - I hadn't updated devtools' list of Rtools
versions to tell it about 3.2.

I've fixed it in the dev version which can be installed by following
the instructions at
 (I'd appreciate it if someone could try this out and let me know if
it doesn't work)




Re: [R] Passing a Data Frame Name as a Variable in a Function

2015-01-29 Thread Hadley Wickham
On Thu, Jan 29, 2015 at 11:43 AM, Alan Yong  wrote:
> Much thanks to everyone for their recommendations!  I agree that fishing in 
> the global environment isn't ideal & only shows my budding understanding of R.
> For now, I will adapt Chel Hee's "length(eval(parse(text=DFName))[,1])" 
> solution then fully explore Jeff's suggestion to put the data frames into a 
> list.

If you have to go down this route, at least do nrow(get(DFName))

> (1) Add a column to each data frame with a string that is parsed from the 
> appendage of the data frame name, i.e., string is "1001" from data frame 
> object of "df.1001"; then,
> (2) Bind the rows of all the files.

I'd highly recommend learning a little functional programming such as
the use of lapply (e.g.
Then you can easily do:

csvs <- dir(pattern = "\\.csv$")
all <- lapply(csvs, read.csv)
one <- do.call("rbind", all)

to find all the csv files in a directory, load into a list and then
collapse into a single data frame.

You're much better off learning how to do this than futzing around
with named objects in the global environment.
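
A self-contained sketch of that pattern, which also covers the "add a column parsed from the name" step from the quoted message. The file names (df.1001.csv, df.1002.csv), the temporary directory, and the helper read_with_id are all assumptions made up for the example:

```r
# Create two small CSV files in a scratch directory so the example runs anywhere
dir_path <- file.path(tempdir(), "csv_demo")
dir.create(dir_path, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(dir_path, "df.1001.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(dir_path, "df.1002.csv"), row.names = FALSE)

# Read one file and record the id embedded in its name ("1001" from "df.1001.csv")
read_with_id <- function(path) {
  df <- read.csv(path)
  df$id <- sub("^df\\.(.*)\\.csv$", "\\1", basename(path))
  df
}

csvs <- dir(dir_path, pattern = "\\.csv$", full.names = TRUE)
dfs <- lapply(csvs, read_with_id)  # a list of data frames
one <- do.call("rbind", dfs)       # a single data frame
one
```

Because the data frames live in a list, nothing needs to be looked up by name in the global environment.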




Re: [R] Package build help

2015-02-11 Thread Hadley Wickham
On Sun, Feb 8, 2015 at 5:15 PM, Duncan Murdoch  wrote:
> On 08/02/2015 4:06 PM, Glenn Schultz wrote:
>> Hello All,
>> I am in the final stages of building my first package "BondLab" and the 
>> check throws the following warning.  I think this is a namespace thing.  I 
>> have not done anything with the namespace at this point.  I am turning my 
>> attention to the namespace now.  Am I correct that this can be handled by the 
>> namespace?
> I would guess you have imported the lubridate and plyr packages, and
> also defined your own duration() and here() functions, hiding theirs.

You can also see this problem if you have

import(plyr, here)


Or with


since I think both provide a here() function.




Re: [R] Shiny User Group

2015-02-12 Thread Hadley Wickham
Maybe!forum/shiny-discuss ?


On Thu, Feb 12, 2015 at 9:07 AM, Doran, Harold  wrote:
> I found a google user group for shiny, and am curious if there is an SIG as 
> well. Didn't see one in my searches, but looking for an active place to ask 
> questions and share code.
> Thanks.



Re: [R] %>%

2015-02-17 Thread Hadley Wickham
On Tue, Feb 17, 2015 at 6:02 PM, Hervé Pagès  wrote:
> On 02/17/2015 02:10 PM, Erich Neuwirth wrote:
>> AFAIK dplyr imports magrittr.
>> So dplyr uses %>% from magrittr, it does not have its own version.
> But it has its own man page so who knows?

If you import and re-export a function from another package, you have
to document it, even if it's already documented elsewhere.




Re: [R] dplyr: producing a good old data frame

2015-02-25 Thread Hadley Wickham
Hi John,

Just printing the result gives a good indication where the problem lies:

> frm %>% rowwise() %>% do(MM=max(as.numeric(.)))
Source: local data frame [6 x 1]


do() is designed to produce scalars (e.g. a linear model), not
vectors, so it doesn't join the results back into a single vector. You
can either fix this yourself with unlist(), or use tidyr::unnest()
which will also handle vectors with length > 1.
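
A small base-R sketch of what is going on, assuming only that the MM column ends up as a list (which is what the "argument is not numeric or logical" warning in the quoted message suggests). The values are invented for the example:

```r
res <- data.frame(id = 1:3)
res$MM <- list(123, 231, 147)  # a list-column: one list element per row

is.list(res$MM)                # TRUE, so mean(res$MM) would warn and return NA

res$MM <- unlist(res$MM)       # flatten to an ordinary numeric column
mean(res$MM)
```

unlist() is fine when every element has length one; for longer elements something like tidyr::unnest(), as mentioned above, is the better fit.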


On Mon, Feb 23, 2015 at 2:54 PM, John Posner  wrote:
> I'm using the dplyr package to perform one-row-at-a-time processing of a data 
> frame:
>> rnd6 = function() sample(1:300, 6)
>> frm = data.frame(AA=rnd6(), BB=rnd6(), CC=rnd6())
>> frm
> 1 123  50  45
> 2  12  30 231
> 3 127 147 100
> 4 133  32 129
> 5  66 235  71
> 6  38 264 261
> The interface is nice and straightforward:
>> library(dplyr)
>> dplyr_result = frm %>% rowwise() %>% do(MM=max(as.numeric(.)))
> I've gotten used to the fact that dplyr_result is not a good old "vanilla" 
> data frame. The as.data.frame() function *seems* to do the trick:
>> dplyr_result_2 = as.data.frame(dplyr_result)
>> dplyr_result_2
> 1 123
> 2 231
> 3 147
> 4 133
> 5 235
> 6 264
> ... but there's trouble ahead:
>> mean(dplyr_result_2$MM)
> [1] NA
> Warning message:
> In mean.default(dplyr_result_2$MM) :
>   argument is not numeric or logical: returning NA
> I need to enlist unlist() to get me to my destination:
>> mean(unlist(dplyr_result_2$MM))
> [1] 188.8333
> [NOTE: dplyr's as_data_frame() function does a better job than 
> as.data.frame() of indicating that I was headed for trouble. ]
> By contrast, the plyr package's adply() function *does* produce a vanilla 
> data frame:
>  > library(plyr)
>> plyr_result = adply(frm, .margins=1, function(onerowfrm) 
>> max(as.numeric(onerowfrm[1,])))
>> mean(plyr_result$V1)
> [1] 188.8333
> Is there a good reason for dplyr to require the extra processing? My (naïve 
> ?) recommendation would be to have as_data_frame() produce a vanilla data 
> frame.
> Tx,
> John



Re: [R] readHTMLTable() in XML package

2015-03-02 Thread Hadley Wickham
This somewhat simpler rvest code does the trick for me:


i <- 1:10
urls <- paste0('',
  '&is_mobile=0&page=', i)

results_table <- function(url) {
  url %>% html %>% html_table(fill = TRUE) %>% .[[1]]
}

results <- lapply(urls, results_table)
out <- results %>% bind_rows()


On Mon, Mar 2, 2015 at 10:00 AM, Doran, Harold  wrote:
> I'm having trouble pulling down data from a website with my code below as I 
> keep encountering the same error, but the error occurs on different pages.
> My code below loops through a website and grabs data from the html table. The 
> error appears on different pages at different times and I'm not sure of the 
> root cause.
> Error in readHTMLTable(readLines(url), which = 1, header = TRUE) :
>   error in evaluating the argument 'doc' in selecting a method for function 
> 'readHTMLTable': Error in readHTMLTable(readLines(url), which = 1, header = 
> TRUE) :
>   error in evaluating the argument 'doc' in selecting a method for function 
> 'readHTMLTable':
> library(XML)
> for(i in 1:1000){
> url <- 
> paste(paste('',
>  i, sep=''), 
> '&division=1&region=0&numberperpage=100&competition=0&frontpage=0&expanded=1&year=15&full=1&showtoggles=0&hidedropdowns=0&showathleteac=1&=&is_mobile=0',
>  sep='')
> tmp <- readHTMLTable(readLines(url), which=1, header=TRUE)
> names(tmp) <- gsub("\\n", "", names(tmp))
> names(tmp) <- gsub(" +", "", names(tmp))
> tmp[] <- lapply(tmp, function(x) gsub("\\n", "", x))
> if(i == 1){
> dat <- tmp
> } else {
> dat <- rbind(dat, tmp)
> }
> cat('Grabbing data from page', i, '\n')
> }
> Thanks,
> Harold



Re: [R] Using and abusing %>% (was Re: Why can't I access this type?)

2015-03-27 Thread Hadley Wickham
> I didn't dispute whether '%>%' may be useful -- I just pointed out that it
> is slow.  However, it is only part of the problem: 'filter()' and
> 'select()', although aesthetically pleasing, also seem to be slow:
>> all.states <- data.frame(state.x77, Name = rownames(state.x77))
>> f1 <- function()
> + all.states[all.states$Frost > 150, c("Name", "Frost")]
>> f2 <- function()
> + subset(all.states, Frost > 150, select = c("Name", "Frost"))
>> f3 <- function() {
> + filt <- subset(all.states, Frost > 150)
> + subset(filt, select = c("Name", "Frost"))
> + }
>> f4 <- function()
> + all.states %>% subset(Frost > 150) %>%
> + subset(select = c("Name", "Frost"))
>> f5 <- function()
> + select(filter(all.states, Frost > 150), Name, Frost)
>> f6 <- function()
> + all.states %>% filter(Frost > 150) %>% select(Name, Frost)
>> mb <- microbenchmark(
> + f1(), f2(), f3(), f4(), f5(), f6(),
> + times = 1000L
> + )
>> print(mb, signif = 3L)
> Unit: microseconds
>  expr min   lq      mean median   uq  max neval   cld
>  f1() 115  124  134.8812    129  134 1500  1000 a
>  f2() 128  141  147.4694    145  151 1520  1000 a
>  f3() 303  328  344.3175    338  348 1740  1000  b
>  f4() 458  494  518.0830    510  523 1890  1000   c
>  f5() 806  848  887.7270    875  894 3510  1000    d
>  f6() 971 1010 1056.5659   1040 1060 3110  1000     e
> So, using '%>%', but leaving 'filter()' and 'select()' out of the equation,
> as in 'f4()' is only half as bad as the "full" 'dplyr' idiom in 'f6()'.  In
> this case, since we're talking microseconds, the speed-up is negligible but
> that *is* beside the point.

When benchmarking it's important to consider both the relative and
absolute difference and to think about how the cost scales as the data
grows - the cost of using %>% is fixed, and 500 µs doesn't seem
like a huge performance penalty to me.
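
To put numbers on the relative-versus-absolute distinction, here is a small sketch using the median timings as I read them from the benchmark quoted above (in microseconds). The variable names are just for illustration:

```r
# Median timings from the quoted microbenchmark output, in microseconds
median_us <- c(f1 = 129, f2 = 145, f3 = 338, f4 = 510, f5 = 875, f6 = 1040)

# Relative cost: how many times slower than the base-R subsetting in f1()
relative <- median_us / median_us[["f1"]]

# Absolute cost: extra time per call, in milliseconds
absolute_ms <- (median_us - median_us[["f1"]]) / 1000

round(relative, 1)
absolute_ms
```

The full dplyr idiom in f6() is roughly eight times slower in relative terms, but the absolute difference is under one millisecond per call on this 50-row data set.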




Re: [R] idiom for constructing data frame

2015-04-03 Thread Hadley Wickham
On Tue, Mar 31, 2015 at 6:42 PM, Sarah Goslee  wrote:
> On Tue, Mar 31, 2015 at 6:35 PM, Richard M. Heiberger  wrote:
>> I got rid of the extra column.
>> data.frame(r=seq(8), foo=NA, bar=NA, row.names="r")
> Brilliant!
> After much fussing, including a disturbing detour into nested lapply
> statements from which I barely emerged with my sanity (arguable, I
> suppose), here is a one-liner that creates a data frame of arbitrary
> number of rows given an existing data frame as template for column
> number and name:
> n <- 8
> df1 <- data.frame(A=runif(9), B=runif(9))
> do.call(data.frame, setNames(c(list(seq(n), "r"), as.list(rep(NA,
> ncol(df1)))), c("r", "row.names", colnames(df1))))
> It's not elegant, but it is fairly R-ish. I should probably stop
> hunting for an elegant solution now.

Given a template df, you can create a new df with subsetting:

df2 <- df1[rep(NA_real_, 8), ]
rownames(df2) <- NULL

This has the added benefit of preserving the types.
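
A quick sketch of the type-preservation point, using a made-up template with one numeric and one character column:

```r
df1 <- data.frame(A = runif(9), B = letters[1:9], stringsAsFactors = FALSE)

df2 <- df1[rep(NA_real_, 8), ]  # eight rows, every value NA
rownames(df2) <- NULL           # replace the "NA", "NA.1", ... row names

str(df2)  # A is still numeric and B is still character
```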




Re: [R] open xlsx file using read.xls function of gdata package

2015-04-03 Thread Hadley Wickham
You might try the readxl package - it's only available on github but it
reads both xlsx and xls. All going well, it should be on its way to CRAN
next week.


On Friday, April 3, 2015, Luigi Marongiu  wrote:

> Dear all,
> I am trying to open excel files using the gdata package. I can do that
> using a .xls file, but the same file, containing the same data,
> formatted in .xlsx gives error (R does not recognize the pattern from
> where to start reading the data).
> Does anybody know whether it is possible to read .xlsx with this package?
> Am I missing another package to implement the reading of the .xlsx?
> Thank you
> Luigi
> PS: this is the error I get:
> > my.file <- "array.xlsx"
> ><-read.xls(
> +   my.file,
> +   sheet="sheet x",
> +   verbose=FALSE,
> +   pattern="row name",
> +   na.strings=c("NA","#DIV/0!"),
> +   method="tab",
> +   perl="perl"
> + )
> > Warning message:
> In read.xls(my.file, sheet = "sheet x", verbose = FALSE,  :
>   pattern not found
> The verbose version runs like this:
> “array.xlsx”
> to tab  file
> “/tmp/Rtmp2tAjzz/”
> ...
> Executing ' '/usr/bin/perl'
> '/home/gigiux/R/x86_64-pc-linux-gnu-library/3.0/gdata/perl/'
>  'array.xlsx' '/tmp/Rtmp2tAjzz/' 'sheet x' '...
> Loading 'array.xlsx'...
> Done.
> Orignal Filename: array.xlsx
> Number of Sheets: 2
> Writing sheet 'sheet x' to file '/tmp/Rtmp2tAjzz/'
> Minrow=31 Maxrow=17310 Mincol=0 Maxcol=4
>   (Ignored 0 blank lines.)
> 0
> Done.
> Searching for lines containing pattern  row name ...
> Warning message:
> In read.xls(my.file, sheet = "sheet x", verbose = TRUE,  :
>   pattern not found
> >



Re: [R] RMarkdown vignettes v. Jupyter notebooks?

2018-10-11 Thread Hadley Wickham
I'd highly recommend Yihui's extensive write up:

On Thu, Oct 11, 2018 at 4:08 AM Spencer Graves
> Hello:
>What are the differences between Jupyter notebooks and RMarkdown
> vignettes?
>I'm trying to do real time monitoring of the broadcast quality of
> a radio station, and it seems to me that it may be easier to do that in
> Python than in R.[1]  This led me to a recent post to
> "" that mentioned "Jupyter, Mathematica, and the
> Future of the Research Paper"[2] by Paul Romer, who won the 2018 Nobel
> Memorial Prize in Economics only a few days ago.  In brief, this article
> suggests that Jupyter notebooks may replace publication in refereed
> scientific journals as the primary vehicle for sharing scientific
> research, because they make it so easy for readers to follow both the
> scientific and computational logic and test their own modifications.
>A "Jupyter Notebook Tutorial: The Definitive Guide"[3] suggested
> I first install Anaconda Navigator.  I got version 1.9.2 of that.  It
> opens with options for eight different "applications" including
> JupyterLab 0.34.9, Jupyter Notebook 5.6.0, Spyder 3.3.1 (an IDE for
> Python), and RStudio 1.1.456.
>This leads to several questions:
>  1.  In general, what experiences have people had with
> Jupyter Notebooks, Anaconda Navigator, and RMarkdown vignettes in
> RStudio, and the similarities and differences?  Do you know any
> references that discuss this?
>  2.  More specifically, does it make sense to try to use
> RStudio from within Anaconda Navigator, or is one better off using
> RStudio as a separate, stand alone application -- or should one even
> abandon RStudio and run R instead from within a Jupyter Notebook? [I'm
> new to this topic, so it's possible that this question doesn't even make
> sense.]
>Spencer Graves
> [1] If you have ideas for how best to do real time monitoring of
> broadcast quality of a radio station, I'd love to hear them.  I need
> software that will do that, preferably something that's free, open
> source.  The commercial software I've seen for this is not adequate for
> my purposes, so I'm trying to write my own.  I have a sample script in
> Python that will read a live stream from a radio tuner and output a
> *.wav of whatever length I want, and I wrote Python eight years ago for
> a similar real time application.  I'd prefer to use R, but I don't know
> how to get started.
> [2] 2018-04-13:
> "";.
> This further cites a similar article in The Atlantic from 2018-04-05:
> "".



Re: [R] Unable to install ggplot2

2019-03-06 Thread Hadley Wickham
rlang works with R 3.1 and up, but it does require compilation from
source, which I suspect is the root cause of this problem.


On Wed, Mar 6, 2019 at 5:36 PM peter dalgaard  wrote:
> Also, R seems to be version 3.2.x i.e. 3-4 years old. Earliest rlang is anno 
> 2017 as far as I can tell.
> -pd
> > On 6 Mar 2019, at 19:22 , Norberto Hernandez  
> > wrote:
> >
> > I have the same issue with ggplot2 and the rlang package, you need to
> > have the most updated version of the rlang library in order to get
> > installed ggplot2
> >
> > Regards
> > Norberto
> >
> > El mar., 5 mar. 2019 a las 14:24, Jeff Newmiller
> > () escribió:
> >>
> >> Please post the text version of the error in the future... your picture is 
> >> almost unreadable. Also, if it is actually important that you are using 
> >> RStudio then your question probably doesn't belong here. Also, if the 
> >> problem is a faulty contributed package then you will need to contact the 
> >> package maintainer as the Posting Guide mentioned below says.
> >>
> >> I don't know why the dependency is not being handled correctly, but my 
> >> suggestion would be to install the rlang package first, and once that is 
> >> installed try installing ggplot2. Read the errors... it says there is a 
> >> problem with the rlang package.
> >>
> >> On March 5, 2019 10:04:41 AM PST, Kamalika Ray  
> >> wrote:
> >>> Hi,
> >>> I have been trying to install the ggplot2 package but I am unable to do
> >>> so. My Mac OS version is 10.7.4 and I have downloaded the
> >>> R-Studio-1.1.463.
> >>> I have attached the screenshot of the error message which appears.
> >>>
> >>> Please help!
> >>>
> >>> Thank you,
> >>> Kamalika
> >>> India
> >>
> >> --
> >> Sent from my phone. Please excuse my brevity.
> >>
> >
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email:  Priv:



Re: [R] Why I can not get work the "tidyverse" and "corrr" libraries

2019-04-17 Thread Hadley Wickham
On Wed, Apr 17, 2019 at 1:06 PM Jeff Newmiller  wrote:
> From reading
> > namespace ‘rlang’ 0.3.0 is already loaded, but >= 0.3.1 is required
> it would seem that you need to upgrade your rlang package...

Typically this indicates you need to restart R.




Re: [R] ATTN: Urgent Guidance Needed on scraping tweets for last 10 years using TwitteR / search twitter function.

2014-07-30 Thread Hadley Wickham
The first twitter message was sent on March 21st, 2006...


On Wed, Jul 30, 2014 at 3:58 AM, Abhishek Dutta  wrote:
> Hi
> This is Abhishek and I am trying to look for tweets on 'Election' from
> 2000 to YTD. I have registered on twitter and performed a handshake
> between the systems as well. Next I am trying to fetching tweets
> chronologically using the below code:-
> tweets1.list = searchTwitter('Election',lang="en",since='2000-07-01',
> until='2014-07-30', cainfo="cacert.pem")
> All I get in return is 26 line items between 27th - 28th of July only.
> Can you please help me understand why it gives me so few
> tweets, backdated by two days only & also if there is an alternative
> method of fetching tweets over the last ten years, at the minimum,
> categorized by date ?
> Many thanks in advance for your guidance. This is an urgent request
> and hence requesting your immediate assistance.
> Best
> Abhishek
> --
> Abhishek Dutta



Re: [R] Just stumbled across this: Advanced R programming text & code - from Hadley

2014-08-11 Thread Hadley Wickham
Or just go to ...


On Sun, Aug 10, 2014 at 9:34 PM, John McKown
> Well, it says that it's from Hadley Wickham.
> This is code and text behind the Advanced R programming book.
> The site is built using jekyll, with a custom plugin to render .rmd
> files with knitr and pandoc. To create the site, you need:
> jekyll and s3_websiter gems: gem install jekyll s3_website
> pandoc
> knitr: install.packages("knitr")
> This contains a Rstudio project file. I know because I've done a git
> clone on it and loaded it into Rstudio, on Linux. If you don't have
> git, there is a "download zip" option on the site too.
> --
> There is nothing more pleasant than traveling and meeting new people!
> Genghis Khan
> Maranatha! <><
> John McKown



Re: [R] loading saved files with objects in same names

2014-08-23 Thread Hadley Wickham
In the future, you can avoid this problem by using saveRDS and readRDS.
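
A minimal sketch of why this helps: saveRDS() stores a single object without its name, so the name is chosen when the file is read back. The file paths and objects below are invented for the example. For .RData files that already exist, loading each one into its own environment is one way to keep same-named objects apart.

```r
path_a <- tempfile(fileext = ".rds")
path_b <- tempfile(fileext = ".rds")

# Two objects that would both have been called "dat" inside .RData files
saveRDS(data.frame(x = 1:3), path_a)
saveRDS(data.frame(x = 4:6), path_b)

# The name is picked at read time, so there is no clash
a <- readRDS(path_a)
b <- readRDS(path_b)

# For an existing .RData file, load into a separate environment instead
rdata_path <- tempfile(fileext = ".RData")
dat <- data.frame(x = 7:9)
save(dat, file = rdata_path)
env_a <- new.env()
load(rdata_path, envir = env_a)
env_a$dat
```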


On Mon, Aug 18, 2014 at 7:30 PM, Jinsong Zhao  wrote:
> Hi there,
> I have several saved data files (e.g., A.RData, B.RData and C.RData). In
> each file, there are some objects with same names but different contents.
> Now, I need to compare those objects through plotting. However, I can't find
> a way to load them into a workspace. The only thing I can do is to rename
> them and then save and load again.
> Is there a convenient way to load those objects?
> Thanks a lot in advance.
> Best regards,
> Jinsong



Re: [R] URLdecode problems

2014-09-01 Thread Hadley Wickham
Hi Oliver,

I think you're being misled by the default behaviour of warnings: they
all get displayed at once, before control returns to the console.  If
you make them immediate, you get a slightly more informative error:

> URLdecode("0;%20@%gIL")
Warning in URLdecode("0;%20@%gIL") :
  out-of-range values treated as 0 in coercion to raw
Error in rawToChar(out) : embedded nul in string: '0; @\0L'

So the out of range value (%g...) is getting converted to a raw(0),
aka a nul. Then rawToChar() chokes.

The code for URLdecode is simple enough that I'd recommend rewriting
yourself to better handle bad inputs.
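
As a rough sketch of what such a rewrite might look like (this is not the code of utils::URLdecode, and the function name url_decode_lenient is made up): it decodes a %XX escape only when both characters are hexadecimal digits and the value is not zero, and leaves anything else in the string untouched. Whether leaving malformed escapes in place is the right policy is an assumption; dropping them or raising an error would be equally easy.

```r
# Decode %XX escapes in a single string, leaving malformed escapes as-is.
url_decode_lenient <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)
  bytes <- charToRaw(x)
  hex_digits <- charToRaw("0123456789abcdefABCDEF")
  percent <- charToRaw("%")
  n <- length(bytes)
  out <- raw(0)
  i <- 1L
  while (i <= n) {
    decoded <- FALSE
    if (bytes[i] == percent && i + 2L <= n) {
      pair <- bytes[(i + 1L):(i + 2L)]
      if (all(pair %in% hex_digits)) {
        value <- strtoi(rawToChar(pair), 16L)
        if (value != 0L) {  # never emit an embedded nul
          out <- c(out, as.raw(value))
          i <- i + 3L
          decoded <- TRUE
        }
      }
    }
    if (!decoded) {  # ordinary byte, or an escape we refuse to decode
      out <- c(out, bytes[i])
      i <- i + 1L
    }
  }
  rawToChar(out)
}

url_decode_lenient("0;%20@%gIL")
url_decode_lenient("%20%use")
```

To apply it to a vector of user agents, something like vapply(agents, url_decode_lenient, character(1)) should work, where agents is whatever character vector holds them.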


On Mon, Sep 1, 2014 at 11:02 AM, Oliver Keyes  wrote:
> Hey all,
> So, I'm attempting to decode some (and I don't know why anyone did this)
> URL-encoded user agents. Running URLdecode over them generates the error:
> "Error in rawToChar(out) : embedded nul in string"
> Okay, so there's an embedded nul - fair enough. Presumably decoding the URL
> is exposing it in a format R doesn't like. Except when I try to dig down
> and work out what an encoded nul looks like, in order to simply remove them
> with something like gsub(), I end up with several different strings, all of
> which apparently resolve to an embedded nul:
>> URLdecode("0;%20@%gIL")
> Error in rawToChar(out) : embedded nul in string: '0; @\0L'
> In addition: Warning message:
> In URLdecode("0;%20@%gIL") :
>   out-of-range values treated as 0 in coercion to raw
>> URLdecode("%20%use")
> Error in rawToChar(out) : embedded nul in string: ' \0e'
> In addition: Warning message:
> In URLdecode("%20%use") :
>   out-of-range values treated as 0 in coercion to raw
> I'm a relative newb to encodings, so maybe the fault is simply in my
> understanding of how this should work, but - why are both strings being
> read as including nuls, despite having different values? And how would I go
> about removing said nuls?
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation

Re: [R] Detect expired RSQLiteConnection?

2014-09-02 Thread Hadley Wickham
DBI 0.3 (just released to CRAN) includes a new generic, dbIsValid(),
for exactly this purpose. Unfortunately no packages implement a method
for it yet, but eventually it will be the right way to detect this.

(I'm now the maintainer for RSQLite, so I added this to my to do list: Pull requests are very
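
Until methods exist everywhere, one possible stopgap is to wrap the call so that any failure counts as "not usable". This is only a sketch: connection_is_live() is a hypothetical name, and treating every error as FALSE is an assumption about what you want.

```r
# Hypothetical wrapper: TRUE only if DBI reports the connection as valid.
# Any error (DBI not installed, no dbIsValid() method for this object)
# is treated as "not usable", i.e. the try()-style fallback asked about.
connection_is_live <- function(con) {
  tryCatch(isTRUE(DBI::dbIsValid(con)), error = function(e) FALSE)
}

connection_is_live(NULL)  # something that is not a connection at all
```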


On Tue, Sep 2, 2014 at 7:32 AM, Duncan Murdoch  wrote:
> Is there a test for an expired RSQLiteConnection?  For example, if I run
> library(RSQLite)
> f <- tempfile()
> con <- dbConnect(SQLite(), f)
> dbDisconnect(con)
> con
> then I get
>> con
> and most operations using it give errors. (In my case I have a
> persistent connection object, but if I save the workspace and then
> reload it, I get the expired connection.) I'd like to detect this case.
>  Do I need to use try(), or parse the result of printing it?
> Duncan Murdoch

Re: [R] Operator proposal: %between%

2014-09-05 Thread Hadley Wickham
> Please add it if you think it fits, and expand it as discussed, I am not
> creating a package for one single utility function.

Why not?  There's nothing wrong with a package that only provides one function.




Re: [R] R's memory limitation and Hadoop

2014-09-16 Thread Hadley Wickham
Hundreds of thousands of records usually fit into memory fine.
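
A quick way to check this for yourself is to build a data frame of that size and measure it. The column layout below is made up purely for illustration; real data will differ.

```r
n_rows <- 5e5  # "hundreds of thousands" of records; illustrative size
records <- data.frame(
  id = seq_len(n_rows),
  group = sample(letters, n_rows, replace = TRUE),
  value = rnorm(n_rows)
)

# Size of the whole data frame in megabytes
size_mb <- as.numeric(object.size(records)) / 2^20
size_mb
```

For a handful of columns like this, the result is well under 100 MB.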


On Tue, Sep 16, 2014 at 12:40 PM, Barry King  wrote:
> Is there a way to get around R’s memory-bound limitation by interfacing
> with a Hadoop database or should I look at products like SAS or JMP to work
> with data that has hundreds of thousands of records?  Any help is
> appreciated.
> --
> __
> *Barry E. King, Ph.D.*
> Analytics Modeler
> Qualex Consulting Services, Inc.
> O: (317)940-5464
> M: (317)507-0661
> __

Re: [R] Hadley's book: paper/PDF/etc. versus github

2014-10-03 Thread Hadley Wickham
> Hi, folks.  I've got a sort of coupon that would allow me to get a
> copy of "Advanced R" by Hadley Wickham at no cost.  OTOH, I've already
> cloned the github repository, and having the "live" Rmd files (or in
> this case, rmd files) is enormously more useful to me than having any
> form of electronic or paper format.

I presume you mean (no need to be
secretive about it ;)

> The only reason I can think of for getting, say, a PDF version of the
> book is that corrected versions of such books are sometimes (always?)
> made available for free if you've already got the PDF version of the
> book.  (I know O'Reilly does this.)

The pdf version of the book is made from the files in that repo, so I
don't see any advantage there.  (You can build the pdf yourself if you
spend a few minutes looking for the right file ;)

> But if the github version is going to continue to exist, be updated,
> and be generally available, that's even better.  IS it going to exist,
> be updated, and be generally available?  Any thoughts?

The github version _is_ the authoritative version of the book (and in
some sense it's already slightly better than the book, since a number
of minor typos have been fixed since the book was published). C&H is
mostly print on demand, so later printings of the book are likely to
pick up the improvements, although there is still some additional
human checking in the process, so it'll only get updated every 6
months or so.

The repo and will continue to exist for the
foreseeable future.




Re: [R] Hadley's book: paper/PDF/etc. versus github

2014-10-06 Thread Hadley Wickham
Yes, I have, but the scripts would need some tweaking.

On Mon, Oct 6, 2014 at 12:28 PM, Greg Snow <> wrote:
> Hadley, have you tried producing the book in other electronic formats
> (other than pdf)? such as epub?  I tried and ended up with a file that
> worked, but all the example code was missing (which defeats the
> convenience of having it on an ebook reader), I did not check if
> everything else was there or not.
>  thanks,
> On Fri, Oct 3, 2014 at 6:37 AM, Hadley Wickham  wrote:
>>> Hi, folks.  I've got a sort of coupon that would allow me to get a
>>> copy of "Advanced R" by Hadley Wickham at no cost.  OTOH, I've already
>>> cloned the github repository, and having the "live" Rmd files (or in
>>> this case, rmd files) is enormously more useful to me than having any
>>> form of electronic or paper format.
>> I presume you mean (no need to be
>> secretive about it ;)
>>> The only reason I can think of for getting, say, a PDF version of the
>>> book is that corrected versions of such books are sometimes (always?)
>>> made available for free if you've already got the PDF version of the
>>> book.  (I know O'Reilly does this.)
>> The pdf version of the book is made from the files in that repo, so I
>> don't see any advantage there.  (You can build the pdf yourself if you
>> spend a few minutes looking for the right file ;)
>>> But if the github version is going to continue to exist, be updated,
>>> and be generally available, that's even better.  IS it going to exist,
>>> be updated, and be generally available?  Any thoughts?
>> The github version _is_ the authoritative version of the book (and in
>> some sense it's already slightly better than the book, since a number
>> of minor typos have been fixed since the book was published). C&H is
>> mostly print on demand, so later printings of the book are likely to
>> pick up the improvements, although there is still some additional
>> human checking in the process, so it'll only get updated every 6
>> months or so.
>> The repo and will continue to exist for the
>> foreseeable future.
>> Hadley
> --
> Gregory (Greg) L. Snow Ph.D.



Re: [R] How can I overwrite a method in R?

2014-10-09 Thread Hadley Wickham
This is usually ill-advised, but I think it's the right solution for
your problem:

assignInNamespace("plot.histogram", function(...) plot(1:10), "graphics")


On Thu, Oct 9, 2014 at 1:14 AM, Tim Hesterberg  wrote:
> How can I create an improved version of a method in R, and have it be used?
> Short version:
> I think plot.histogram has a bug, and I'd like to try a version with a fix.
> But when I call hist(), my fixed version doesn't get used.
> Long version:
> hist() calls plot() which calls plot.histogram() which fails to pass ...
> when it calls plot.window().
> As a result hist() ignores xaxs and yaxs arguments.
> I'd like to make my own copy of plot.histogram that passes ... to
> plot.window().
> If I just make my own copy of plot.histogram, plot() ignores it, because my
> version is not part of the same graphics package that plot belongs to.
> If I copy hist, hist.default and plot, the copies inherit the same
> environments as
> the originals, and behave the same.
> If I also change the environment of each to .GlobalEnv, hist.default fails
> in
> a .Call because it cannot find C_BinCount.

Re: [R] "source" command inside R package scripts

2014-10-21 Thread Hadley Wickham
Your source function will be called when the package is _built_, not
when it's loaded/attached. There's almost certainly a better way to
solve your problem than using source() inside a package.


On Tue, Oct 21, 2014 at 6:24 AM, Enrico Bibbona  wrote:
> I have built a new package. I would like to put an R script (let us call it
> "script.R") into a subdirectory of the /pkg/R/ directory, called /pkg/R/sub/,
> and I would like that code to be run when the package is installed.
> My way of doing so was to put an R script into /pkg/R/ with a source command
> like
> source("./R/sub/script.R")
> That does not give me any error, but I know that the script (which
> actually defines a few functions) is not run. What is wrong? What can I do
> better?
> Thanks, Enrico
> --
> Enrico Bibbona
> Dipartimento di Matematica
> Università di Torino

Re: [R] How to speed up list access in R?

2014-10-30 Thread Hadley Wickham
Or do all the subsetting in one pass - [ will use a hashmap.
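
A small sketch of what "one pass" could look like, using made-up data and Bill's split() suggestion from the quoted text below. The variable names are only illustrative.

```r
# Illustrative data; the names are made up for this sketch.
numbers <- rep(c(3, 1, 2, 1, 3), times = 4)
values <- seq_along(numbers)

# Group once (the split() approach quoted below) ...
d <- split(values, numbers)

# ... then pull out every key of interest with a single `[` call
# instead of calling `[[` repeatedly inside a loop.
keys <- c("3", "1")
subset_once <- d[keys]
subset_once
```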


On Thu, Oct 30, 2014 at 12:05 PM, William Dunlap  wrote:
> You can try using an environment instead of a list.
> Bill Dunlap
> TIBCO Software
> wdunlap
> On Thu, Oct 30, 2014 at 10:02 AM, Thomas Nyberg  wrote:
>> Thanks to all for the help everyone! For the moment I'll stick with Bill's
>> solution, but I'll check out the other recommendations as well.
>> Regarding the issue of slow looks ups for lists, are there any hash map
>> implementations in R that are faster? I like using fairly simple logic and
>> data structures when prototyping and then only optimize code when and where
>> it's necessary which is why I'm curious about these basic objects.
>> On another note, is there a vector style implementation that changes the
>> vectors in place? If I'm not mistaken, the append operation creates and
>> returns a new vector each time which is line with the functional nature of
>> R. If there were some way to have it mutable, it could be much faster. This
>> is fairly standard in many languages. Behind the scenes memory is allocated
>> at say 2 times the current size so that you only need log(n) extensions when
>> building up a vector like this. Are there any such equivalents in R? I
>> presume that lists are mutable (am I wrong?), but they seem to have the
>> lookup slowdown problem.
>> Again thanks a lot!
>> Cheers,
>> Thomas
>> On 10/30/2014 12:05 PM, William Dunlap wrote:
>>> Repeatedly extending vectors takes a lot of time.  You can do what you
>>> want with
>>>d2 <- split(values, factor(numbers, levels=unique(numbers)))
>>> If you would like the labels on d2 to be in numeric order then you can
>>> simplify that to
>>>d3 <- split(values, numbers)
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap
>>> On Thu, Oct 30, 2014 at 8:17 AM, Thomas Nyberg 
>>> wrote:


 I want to do the following: Given a set of (number, value) pairs, I want
 to create a list l so that l[[toString(number)]] returns the vector of
 values associated to that number. It is hundreds of times slower than the
 equivalent that I would write in python. I'm pretty new to R so I bet I'm
 using its data structures inefficiently, but I've tried more or less
 everything I can think of and can't really speed it up. I have done some
 profiling which helped me find problem areas, but I couldn't speed things
 up even with that information. I'm thinking I'm just fundamentally using R
 in a silly way.

 I've included code for the different versions. I wrote the python code in
 style to make it as clear to R programmers as possible. Thanks a lot! Any
 help would be greatly appreciated!


 R code (with two versions depending on commenting):


 numbers <- numeric(0)
 for (i in 1:5) {
  numbers <- c(numbers, sample(1:3, 1))
 }

 values <- numeric(0)
 for (i in 1:length(numbers)) {
  values <- append(values, sample(1:10, 1))
 }

 starttime <- Sys.time()

 d = list()
 for (i in 1:length(numbers)) {
  number = toString(numbers[i])
  value = values[i]
  if (is.null(d[[number]])) {
  #if (!(number %in% names(d))) {
  d[[number]] <- c(value)
  } else {
  d[[number]] <- append(d[[number]], value)
  }
 }

 endtime <- Sys.time()

 print(format(endtime - starttime))


 uncommented version: "45.64791 secs"
 commented version: "1.423056 mins"

 Another version of R code:


 numbers <- numeric(0)
 for (i in 1:5) {
  numbers <- c(numbers, sample(1:3, 1))
 }

 values <- numeric(0)
 for (i in 1:length(numbers)) {
  values <- append(values, sample(1:10, 1))
 }

 starttime <- Sys.time()

 d = list()
 for (number in unique(numbers)) {
  d[[toString(number)]] <- numeric(0)
 }
 for (i in 1:length(numbers)) {
  number = toString(numbers[i])
  value = values[i]
  d[[number]] <- append(d[[number]], value)
 }

 endtime <- Sys.time()

 print(format(endtime - starttime))


 "47.15579 secs"

 The python code:


 import random
 import time

 numbers = []
 for i in range(5):
  numbers += random.sample(range(3), 1)

 values = []
 for i in range(len(numbers)):
  values.append(random.randint(1, 10))

 starttime = time.time()

 d = {}
 for i in range(len(numbers)):
  number = numbers[i]
  value = values[i]
  if d.has_key(number):

Re: [R] Knitr: how to find out from within a .Rmd file the output type?

2014-10-31 Thread Hadley Wickham
Try knitr::opts_knit$get('')
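
Another possible route, sketched here under the assumption that your knitr version exports is_latex_output(), is to let knitr decide and fall back to "html" otherwise. table_type is a made-up variable name.

```r
# Assumes a knitr version that exports is_latex_output(); falls back to
# "html" when knitr is unavailable or nothing is being knitted to LaTeX.
table_type <- if (requireNamespace("knitr", quietly = TRUE) &&
                  isTRUE(knitr::is_latex_output())) {
  "latex"
} else {
  "html"
}
table_type
```

The value could then be passed along as type = table_type in the stargazer() call from the quoted chunk.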


On Fri, Oct 31, 2014 at 6:56 AM, Michal Kvasnička  wrote:
> Hi.
> Is there a way how to find out from within a .Rmd file what output format
> is generated?
> The reason is this: I write a paper in R markdown in RStudio. Sometimes I
> generate .html, sometimes .pdf. My paper presents a table of regression
> models using stargazer function. I've got the following code in my paper:
> ```{r, echo=FALSE, message=FALSE, results='asis'}
> model2 <- lm(...)
> model3 <- lm(...)
> model5 <- lm(...)
> stargazer(model2, model3, model5,
>   ...,
>   type="html")
> ```
> Whenever I change the output format from .html to .pdf, I have to change
> the line type="html" to type="latex" manually. (The same holds true for
> many other functions, e.g. xtable.)
> It would be nice to replace the direct declaration with
>   type=some_knitr_variable
> What is the true name of the some_knitr_variable? I was not able to find it
> anywhere.
> Many thanks for your help.
> Best wishes,
> Michal

Re: [R] CMD check error

2014-11-18 Thread Hadley Wickham
Do you have a .Rbuildignore? If so, what's in it?

On Tue, Nov 18, 2014 at 7:07 AM, Therneau, Terry M., Ph.D.
> I have a new package (local use only).  R CMD check fails with a message I
> haven't seen before, and I haven't been able to guess the cause.
> There are two vignettes, both of which have %\VignetteIndexEntry lines.
> Same failure both under R-3.1.1 and R-devel, so it's me and not R.  Linux OS.
> Hints anyone?
> Terry Therneau
> =
> tmt% R CMD build dart
> * preparing 'dart':
> * checking DESCRIPTION meta-information ... OK
> * installing the package to build vignettes
> * creating vignettes ... OK
> * checking for LF line-endings in source and make files
> * checking for empty or unneeded directories
> * looking to see if a 'data/datalist' file should be added
> * building 'dart_1.0-2.tar.gz'
> tmt% R CMD check dart*gz
> ...
> Installation failed.
> See '/people/biostat2/therneau/consult/bsi/dart.Rcheck ...
> tmt% more dart.Rcheck/00install.out
> ...
> ** installing vignettes
> Warning in file(con, "w") :
>cannot open file 
> '/people/biostat2/therneau/consult/bsi/dart.Rcheck/dart/doc/
> index.html': No such file or directory
> Error in file(con, "w") : cannot open the connection
> ERROR: installing vignettes failed
> * removing '/people/biostat2/therneau/consult/bsi/dart.Rcheck/dart'

Re: [R] Implements XPath 2.0 in R

2014-11-18 Thread Hadley Wickham
What do you need from XPath 2.0?  It will almost certainly be easier to
implement similar functionality using a little R code than to add
XPath 2.0 support.


On Mon, Nov 17, 2014 at 6:03 AM, Rees Morrison  wrote:
> Many users of R would like the enhanced extraction capabilities of XPath
> 2.0, but only XPath 1.0 is available.
> What would the best approach be to find someone to implement XPath 2.0 for
> R (assuming that is a good idea)?  What might the cost be and how would one
> set this package development in motion?
> Thanks
> --
> Rees Morrison
> General Counsel Metrics, LLC (management consulting and *data analytics*)
> 4 Hawthorne Ave.
> Princeton, NJ 08540-3840 USA
> (973) 568-9110
> Hosts

Re: [R] dplyr/summarize does not create a true data frame

2014-11-23 Thread Hadley Wickham
This bug is fixed in the dev version.

On Sunday, November 23, 2014, John Posner  wrote:

> Thanks to John Kane for an off-list consultation. As the following
> annotated transcript shows, it's the group_by() function that transforms a
> data frame into something else:  a "grouped_df" object that *looks*
> identical to the original data frame (e.g. the rows are in the original
> order -- *not* grouped, as arrange() would do), but does not always act
> like a data frame.
> > library(dplyr)
> > # set up data frame, and show its structure [ see below for clean copy
> of dput() code ]
> >
> > frm = structure(list(Id = structure(1:10, .Label = c("P01", "P02",
> + "P03", "P04", "P05", "P06", "P07", "P08", "P09", "P10"), class =
> "factor"),
> + Sex = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L), .Label =
> c("Female",
> + "Male"), class = "factor"), Height = structure(c(1L, 1L,
> + 3L, 2L, 1L, 3L, 1L, 2L, 1L, 1L), .Label = c("Short", "Medium",
> + "Tall"), class = "factor"), Value = c(69.47, 64.61, 74.77,
> + 73.31, 64.76, 72.78, 64.64, 55.96, 60.45, 51.11)), .Names = c("Id",
> + "Sex", "Height", "Value"), row.names = c(NA, -10L), class = "data.frame")
> >
> > str(frm)
> 'data.frame':   10 obs. of  4 variables:
>  $ Id: Factor w/ 10 levels "P01","P02","P03",..: 1 2 3 4 5 6 7 8 9 10
>  $ Sex   : Factor w/ 2 levels "Female","Male": 2 1 1 2 2 2 1 2 2 1
>  $ Height: Factor w/ 3 levels "Short","Medium",..: 1 1 3 2 1 3 1 2 1 1
>  $ Value : num  69.5 64.6 74.8 73.3 64.8 ...
> > # run group_by() on data frame, and show resulting structure
> >
> > after.group_by = frm %>% group_by(Sex, Height)
> > str(after.group_by)
> Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame': 10 obs. of  4
> variables:
>  $ Id: Factor w/ 10 levels "P01","P02","P03",..: 1 2 3 4 5 6 7 8 9 10
>  $ Sex   : Factor w/ 2 levels "Female","Male": 2 1 1 2 2 2 1 2 2 1
>  $ Height: Factor w/ 3 levels "Short","Medium",..: 1 1 3 2 1 3 1 2 1 1
>  $ Value : num  69.5 64.6 74.8 73.3 64.8 ...
>  - attr(*, "vars")=List of 2
>   ..$ : symbol Sex
>   ..$ : symbol Height
>  - attr(*, "drop")= logi TRUE
>  - attr(*, "indices")=List of 5
>   ..$ : int  1 6 9
>   ..$ : int 2
>   ..$ : int  0 4 8
>   ..$ : int  3 7
>   ..$ : int 5
>  - attr(*, "group_sizes")= int  3 1 3 2 1
>  - attr(*, "biggest_group_size")= int 3
>  - attr(*, "labels")='data.frame':  5 obs. of  2 variables:
>   ..$ Sex   : Factor w/ 2 levels "Female","Male": 1 1 2 2 2
>   ..$ Height: Factor w/ 3 levels "Short","Medium",..: 1 3 1 2 3
>   ..- attr(*, "vars")=List of 2
>   .. ..$ : symbol Sex
>   .. ..$ : symbol Height
> > # the two data structure *seem* to be the same ...
> > frm == after.group_by
> Id  Sex Height Value
> > # ... but they're not
> > frm[4]
> 1  69.47
> 2  64.61
> > after.group_by[4]
> Error in eval(expr, envir, enclos) : index out of bounds
> > # fortunately, we can convert back to a true data frame
> >[4]
> 1  69.47
> 2  64.61
> ## dput() code below
> structure(list(Id = structure(1:10, .Label = c("P01", "P02",
> "P03", "P04", "P05", "P06", "P07", "P08", "P09", "P10"), class = "factor"),
> Sex = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L), .Label =
> c("Female",
> "Male"), class = "factor"), Height = structure(c(1L, 1L,
> 3L, 2L, 1L, 3L, 1L, 2L, 1L, 1L), .Label = c("Short", "Medium",
> "Tall"), class = "factor"), Value = c(69.47, 64.61, 74.77,
> 73.31, 64.76, 72.78, 64.64, 55.96, 60.45, 51.11)), .Names = c("Id",
> "Sex", "Height", "Value"), row.names = c(NA, -10L), class = "data.frame")
> > -Original Message-
> > From: John Kane [ ]
> > Sent: Friday, November 21, 2014 12:33 PM
> > To: John Posner; ' '
> > Subject: RE: [R] dplyr/summarize does not create a true data frame
> >
> > Your code in creating 'frm' is not working for me and it is complicated
> enough
> > that I don't want to work it out. See ?dput for a better way to supply
> data.
> > Also see:
> >
> >
> > reproducible-example
> >
> > That said, I don't see why 'my.output[4]' is not working.  Try something
> like
> > str(frm) to see what you have there and/or resubmit the data in dput
> format
> >
> > See simple example below:
> >
> > dat1  <- data.frame(aa = sample(1:20, 100, replace = TRUE), bb = 1:100 )
> > dat1[2]
> >
> > John Kane
> > Kingston ON Canada
> >
> >
> > > -Original Message-
> > > From: 
> > > Sent: Fri, 21 Nov 2014 17:10:16 +
> > > To: 
> > > Subject: [R] dplyr/summarize does not create a true data frame
> > >
> > > I got an error when trying to ex

Re: [R] function to avoid <<-

2014-12-02 Thread Hadley Wickham
At the top level do:

myenv <- new.env(parent = emptyenv())

Then in your functions do

myenv$x <- 50
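
Spelled out a little more, the pattern could look like the sketch below. The names state, set_value() and get_value() are illustrative only, not from the original code.

```r
# Package-private store; `state`, `set_value()` and `get_value()` are
# illustrative names, not from the original post.
state <- new.env(parent = emptyenv())

set_value <- function(name, value) {
  assign(name, value, envir = state)
  invisible(value)
}

get_value <- function(name) {
  # Return NULL rather than erroring when nothing has been stored yet
  if (!exists(name, envir = state, inherits = FALSE)) return(NULL)
  get(name, envir = state, inherits = FALSE)
}

set_value("curselectCases", 5)
get_value("curselectCases")
```

Because the functions only ever touch the environment object, nothing is written to the global workspace and no <<- is needed.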


You also should not be using data() in that way. Perhaps you want
R/sysdata.rda. See for more details.


On Tue, Dec 2, 2014 at 2:28 AM, Karim Mezhoud  wrote:
> Dear All,
> I am writing a GUI package that needs global variables.
> I got many warning messages when I checked the code, for example:
> geteSet: no visible binding for global variable ‘curselectCases’
> I would like to write a function that creates a global place for Objects to
> be loaded as:
> Fun <- function(){
> Object <- 5
> Var2Global <- function(Object){
> .myDataEnv <- new.env(parent=emptyenv()) # not exported
> isLoaded <- function(Object) {
> exists(Object, .myDataEnv)
> }
> getData <- function(Object) {
> if (!isLoaded(Object)) data(Object, envir=.myDataEnv)
> .myDataEnv[[Object]]
> }
> }
> }
> To avoid the use of:  Object <<- 5
> but it seems not working yet. Object == 5 is not a global variable after
> running Fun().
> Any Idea?
> Thanks
>   Ô__
>  c/ /'_;kmezhoud
> (*) \(*)   ⴽⴰⵔⵉⵎ  ⵎⴻⵣⵀⵓⴷ

Re: [R] How are packages installed with install_github() updated in RStudio?

2015-08-18 Thread Hadley Wickham
RStudio just calls the same underlying R functions, so it doesn't make
any difference that you're using RStudio.  Currently, there's no
automatic way to update packages installed from github.


On Tue, Aug 18, 2015 at 8:14 AM, John Kane  wrote:
> Hi Michal,
> Because RStudio seems to use its own method of updating you might be better 
> off asking in their forum.
> John Kane
> Kingston ON Canada
>> -Original Message-
>> From:
>> Sent: Tue, 18 Aug 2015 10:43:20 +0200
>> To:
>> Subject: [R] How are packages installed with install_github() updated in
>> RStudio?
>> Hallo.
>> I use RStudio. Because of a bug in the latest CRAN version of dplyr, I
>> installed the GitHub version with install_github(). Now I wonder what
>> happens when there is a new version. Does RStudio update the packages
>> installed from GitHub? If so, does it replace it with the new CRAN
>> version,
>> or a new GitHub version?
>> Many thanks for you answer,
>> Michal Kvasnicka

Re: [R] Issues with RPostgres

2015-08-27 Thread Hadley Wickham
On Thu, Aug 27, 2015 at 3:46 PM, John McKown
> On Thu, Aug 27, 2015 at 2:29 PM, Abraham Mathew 
> wrote:
>> I have a user-defined function that I'm using alongside a postgresql
>> connection to
>> summarize some data. I've connected to the local machine with no problem.
>> However,
>> the connection keeps throwing the following error when I attempt to use it.
>> Can anyone point
>> to what I could be doing wrong.
>> > ds_summary(con, "test", vars=c("Age"), y=c("Class"))
>> Error in postgresqlNewConnection(drv, ...) :
>>   RS-DBI driver: (could not connect postgres@localhost on dbname "test"
>> )
>> con is the connection
> It would be helpful to see the assignment to "con" as well as any other
> assignments related to this. If you are using the DBI package, then what I
> am talking about would be something like:
> drv<-dbDriver("PgSQL")
> con<-dbConnect(drv,user=...,password=...,dbname="test");

FWIW the best way to create a connection is:

con <- dbConnect(RPostgreSQL::PostgreSQL(), ...)

The older string based approach is not advised.



Re: [R] ggplot2 scale_shape_manual with large numbers instead of shapes

2015-08-27 Thread Hadley Wickham
Something like this?

library(ggplot2)

df <- data.frame(
  x = runif(30),
  y = runif(30),
  z = factor(1:30)
)

ggplot(df, aes(x, y)) +
  geom_point(aes(shape = z), size = 5) +
  scale_shape_manual(values = c(letters, 0:9))


On Thu, Aug 27, 2015 at 4:48 PM, Marian Talbert  wrote:
> I'm trying to produce a plot with climate data in which colors describe one
> aspect of the data (emissions scenario) and numbers rather than shapes show
> the model used (there are 36 models for one emissions scenario and 34 for
> the other).  I'm trying to use numbers rather than symbols because there are
> 36 climate models and thus not enough symbols.  Numbering seems more
> consistent than some combo of letters and symbols.  I couldn't figure out
> how to define my own shapes as numbers 1 to 36 using scale_shape_manual so
> I'm adding the numbers with annotate.  The problem is that I'd like a second
> legend linking the numbering to the long model names but am having a hard
> time with this.  I've created a toy example below to make this more clear.
> p1 below was my original plot and I'd like p2 only with the second legend
> linking numbers to long model names any suggestions?
> library(ggplot2)
> Dat<-data.frame(Temp=c(rnorm(36,0,1),rnorm(36,1.5,1)),Precp=c(rnorm(36,0,1),rnorm(36,1,1)),
> model=factor(rep(paste("LongModelName",c(letters,1:10),sep="_"),times=2)),
>   Emissions=factor(rep(c("RCP 4.5","RCP 8.5"),each=36)))
>  EmissionsCol<-c("goldenrod2","red")
>  Pquants <- aggregate(Dat$Precp,list(RCP=Dat$Emissions),
>  Tquants <- aggregate(Dat$Temp,list(RCP=Dat$Emissions),
>  Quants<-data.frame(Emissions=Tquants$RCP,Tmin=Tquants[[2]][,1],
>   TMedian=Tquants[[2]][,2],Tmax=Tquants[[2]][,3],
> Pmin=Pquants[[2]][,1],PMedian=Pquants[[2]][,2],Pmax=Pquants[[2]][,3])
> #Original Plot
> Labels<-Dat$model
>  p1 <- ggplot()+geom_point(Dat,mapping=aes(x=Temp,y=Precp,colour=Emissions),
>  size=.1)+
>  scale_colour_manual(values=c("#EEB422BE","#FFBE"),guide="none")+
>   annotate("text", label=Labels, x=Dat$Temp,
> y=Dat$Precp,colour=c("#EEB422BE","#FFBE")[Dat$Emissions]) +
>   guides(fill=guide_legend(reverse=TRUE))+theme(axis.title =
> element_text(size = 2)) +
> geom_segment(data=Quants,mapping=aes(x=Tmin,y=PMedian,xend=Tmax,yend=PMedian),size=2,colour="black")+
> geom_segment(data=Quants,mapping=aes(x=TMedian,y=Pmin,xend=TMedian,yend=Pmax),size=2,colour="black")+
> geom_segment(data=Quants,mapping=aes(x=Tmin,y=PMedian,xend=Tmax,yend=PMedian,colour=Emissions),size=1)+
> geom_segment(data=Quants,mapping=aes(x=TMedian,y=Pmin,xend=TMedian,yend=Pmax,colour=Emissions),size=1)+
> geom_point(data=Quants,mapping=aes(x=TMedian,y=PMedian,fill=Emissions),size=6,pch=21,colour="black")+
>   scale_fill_manual(values=EmissionsCol)
> p1
> #with numbers instead of model names
> Labels<-as.numeric(factor(Dat$model))
>  p2<-
> ggplot()+geom_point(Dat,mapping=aes(x=Temp,y=Precp,colour=Emissions),size=.1)+
>  scale_colour_manual(values=c("#EEB422BE","#FFBE"),guide="none")+
>   annotate("text", label=Labels, x=Dat$Temp,
> y=Dat$Precp,colour=c("#EEB422BE","#FFBE")[Dat$Emissions])+
>   guides(fill=guide_legend(reverse=TRUE))+theme(axis.title =
> element_text(size = 2)) +
> geom_segment(data=Quants,mapping=aes(x=Tmin,y=PMedian,xend=Tmax,yend=PMedian),size=2,colour="black")+
> geom_segment(data=Quants,mapping=aes(x=TMedian,y=Pmin,xend=TMedian,yend=Pmax),size=2,colour="black")+
> geom_segment(data=Quants,mapping=aes(x=Tmin,y=PMedian,xend=Tmax,yend=PMedian,colour=Emissions),size=1)+
> geom_segment(data=Quants,mapping=aes(x=TMedian,y=Pmin,xend=TMedian,yend=Pmax,colour=Emissions),size=1)+
> geom_point(data=Quants,mapping=aes(x=TMedian,y=PMedian,fill=Emissions),size=6,pch=21,colour="black")+
>   scale_fill_manual(values=EmissionsCol)
> p2

Re: [R] Unexpected/undocumented behavior of 'within': dropping variable names that start with '.'

2015-09-20 Thread Hadley Wickham
The problem is that within.data.frame() calls as.list.environment() with
the default value of all.names = FALSE. I doubt this is a deliberate
feature; it is more likely to be a minor oversight.
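
To see the effect in isolation, here is a minimal sketch using a throwaway environment. The names `e`, `.id` and `a` are only illustrative and are not taken from the within() source:

```r
# Build a small environment holding one dot-prefixed and one ordinary variable.
e <- new.env()
assign(".id", TRUE, envir = e)
assign("a", 1, envir = e)

# With the default all.names = FALSE, names starting with a dot are skipped.
visible_names <- names(as.list(e))

# With all.names = TRUE every binding is kept.
all_names <- names(as.list(e, all.names = TRUE))

print(visible_names)
print(all_names)
```

Any function that converts an environment to a list this way will lose dot-prefixed columns unless it asks for all.names = TRUE.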


On Sun, Sep 20, 2015 at 11:49 AM, Brian  wrote:
> Dear List,
> Somewhere I missed something, and now I'm really missing something!
>> d.f <- data.frame(.id = c(TRUE, FALSE, TRUE), dummy = c(1, 2, 3), a =
> c(1, 2, 3), b = c(1, 2, 3) + 1)
>  > within(d.f, {d = a + b})
>dummy a b d
>  1 1 1 2 3
>  2 2 2 3 5
>  3 3 3 4 7
>  > d.f <- data.frame(.id = c(TRUE, FALSE, TRUE), .dummy = c(1, 2, 3), a
> = c(1, 2, 3), b = c(1, 2, 3) + 1)
>  > within(d.f, {d = a + b})
>a b d
>  1 1 2 3
>  2 2 3 5
>  3 3 4 7
> Could somebody please explain to me why this does this? I think could be
> considered a feature (for lots of calculations within a data frame you
> don't have to remove all extra variables at the end).  I just wish it
> was documented.
> Cheers,
> Brian
> sessionInfo()
>  R version 3.1.0 (2014-04-10)
>  Platform: x86_64-pc-linux-gnu (64-bit)
>  locale:
>   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>  attached base packages:
>  [1] splines   grid  stats graphics  grDevices utils datasets
>  [8] methods   base
>  other attached packages:
>   [1] scales_0.2.4   plyr_1.8.3 reshape2_1.4
> ccchDataProc_0.7
>   [5] ccchTools_0.6  xtable_1.7-4   tables_0.7.79  Hmisc_3.14-5
>   [9] Formula_1.1-2  survival_2.37-7ggplot2_1.0.1
> IDPmisc_1.1.17
>  [13] lattice_0.20-29myRplots_1.1   myRtools_1.2   meteoconv_0.1
>  [17] pixmap_0.4-11  RColorBrewer_1.0-5 maptools_0.8-30sp_1.1-1
>  [21] mapdata_2.2-3  mapproj_1.2-2  maps_2.3-9 chron_2.3-45
>  [25] MASS_7.3-35
>  loaded via a namespace (and not attached):
>   [1] acepack_1.3-3.3 cluster_1.15.2  colorspace_1.2-4
>   [4] compiler_3.1.0  data.table_1.9.4digest_0.6.4
>   [7] foreign_0.8-61  gtable_0.1.2labeling_0.3
>  [10] latticeExtra_0.6-26 munsell_0.4.2   nnet_7.3-8
>  [13] proto_0.3-10Rcpp_0.12.0 rpart_4.1-8
>  [16] stringr_0.6.2   tools_3.1.0
>  > within
>  function (data, expr, ...)
>  UseMethod("within")

Re: [R] retaining characters in a csv file

2015-09-22 Thread Hadley Wickham
The problem is that quotes in csv files are commonly held to be
meaningless (i.e. they don't automatically force components to be
strings).
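
For the base read.csv() call in the question, one option that may be enough is colClasses. This is only a sketch: the column names `id` and `code` are made up, and it assumes the affected column names are known in advance:

```r
# Two-column CSV; the second field is quoted and has significant leading zeros.
csv_text <- 'id,code\n5724550,"000202075214"\n'

# Default behaviour: the quotes are ignored and the field is converted to a number.
default_read <- read.csv(text = csv_text)

# Naming only the affected column in colClasses keeps it as character,
# while the remaining columns are still type-converted as usual.
kept_read <- read.csv(text = csv_text, colClasses = c(code = "character"))

str(default_read)
str(kept_read)
```

If the affected columns are not known ahead of time, this does not help, since read.csv() has no option to treat quoted fields differently from unquoted ones.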

Earlier this morning I committed a fix to readr so that numbers
starting with a sequence of zeros are read as character strings. You
may want to try out the dev version:


On Tue, Sep 22, 2015 at 5:00 PM, Therneau, Terry M., Ph.D.
> I have a csv file from an automatic process (so this will happen thousands
> of times), for which the first row is a vector of variable names and the
> second row often starts something like this:
> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .
> Notice the second variable which is
>   a character string (note the quotation marks)
>   a sequence of numeric digits
>   leading zeros are significant
> The read.csv function insists on turning this into a numeric.  Is there any
> simple set of options that
> will turn this behavior off?  I'm looking for a way to tell it to "obey the
> bloody quotes" -- I still want the first, third, etc columns to become
> numeric.  There can be more than one variable like this, and not always in
> the second position.
> This happens deep inside the httr library; there is an easy way for me to
> add more options to the read.csv call but it is not so easy to replace it
> with something else.
> Terry T



Re: [R] Control of x-axis variable ordering in ggplot

2015-10-23 Thread Hadley Wickham
You have two problems:

* geom_line() always draws from left-to-right (it sorts by x); use geom_path() to join points in data order
* you're defining colour outside of the plot in a very non-ggplot2 way.

Here's how I'd do it:

library(ggplot2)

data <- data.frame(
  x = rep(1:4, each = 25),
  y = rep(1:25, times = 4),
  g = rep(1:4, each = 25)
)
data$x <- data$x + 0.005 * data$y ^ 2 - 0.1 * data$y + 1

ggplot(data, aes(x, y, colour = factor(g))) +
  geom_point() +
  geom_path()



On Thu, Oct 22, 2015 at 8:46 PM, sbihorel
> Hi,
> Given a certain data.frame, the lattice xyplot function will plot the data
> and join the data point in the order of the data frame. It is my
> (probably flawed) understanding that, using the same data frame, ggplot
> orders the data by increasing order of the x-axis variable. Can one control
> this behavior?
> Thanks
> Sebastien
> Code example
> library(lattice)
> library(ggplot2)
> data <- data.frame(x=rep(1:4,each=25),
>                    y=rep(1:25,times=4),
>                    g=rep(1:4,each=25))
> data$x <- data$x + 0.005*(data$y)^2-0.1*data$y+1
> col <- 3:7
> xyplot(y~x,data=data,groups=g,type='l',col=col)
> ggplot(data, aes(x,y,group=g)) + geom_point(colour=col[data$g]) +
>   geom_line(colour=col[data$g])

Re: [R] Achieve independent fine user control of ggplot geom settings when using groups in multiple geom's

2015-10-30 Thread Hadley Wickham
I'd recommend reading the ggplot2 book - learning more about how
scales work in ggplot2 will help you understand why this isn't
possible.
On Thu, Oct 29, 2015 at 6:31 PM, sbihorel
> Thank for your reply,
> I may accept your point about the mapping consistency when the different
> geom's use the same data source. However, as pointed out in my example code,
> this does not have to be the case. Hence my question about the geom-specific
> control of group-dependent graphical settings.
> Sebastien
> On 10/29/2015 4:49 PM, Jeff Newmiller wrote:
>> I think a fundamental design principle of ggplot is that mapping of values
>> to visual representation are consistent within a single plot, so reassigning
>> color mapping for different elements would not be supported.
>> That being said, it is possible to explicitly control specific attributes
>> within a single geom outside of the mapping, though this usually does break
>> mappings in the legend.
>> On October 29, 2015 11:27:55 AM MST, sbihorel
>>  wrote:
>>> Thank you for your reply.
>>> I do not have anything specific data/geom/grouping in mind, rather a
>>> framework in which users would just pile of each other layer after
>>> layer
>>> of geom each defined with specific settings. A minimum realistic
>>> scenario would a geom_point followed by a geom_smooth or a geom_path
>>> using different colors...
>>> Sebastien
>>> On 10/29/2015 1:34 PM, Ista Zahn wrote:

>>>> I would say in a word, 'no'. What you seem to be implying is that you
>>>> want multiple color scales, multiple shape scales, etc. As far as I
>>>> know there is no support for that in ggplot2.
>>>>
>>>> Perhaps if you show us what you're actually trying to accomplish
>>>> someone can suggest a solution or at least a work-around.
>>>>
>>>> On Thu, Oct 29, 2015 at 12:26 PM, sbihorel
>>>>> Hello,
>>>>> Before I get to my question, I want to make clear that the topic of my
>>>>> present post is similar to posts I recently submitted to the list. Although
>>>>> I appreciate the replies I got, I believe that I did not correctly frame
>>>>> these previous posts to get to the bottom of things.
>>>>> I also want to make clear that the code example that I have inserted in this
>>>>> post is meant to illustrate my points/questions and does not reflect a
>>>>> particular interest in the data or the sequence of ggplot geom's used
>>>>> (except otherwise mentioned). Actually, I purposefully used junk meaningless
>>>>> data, geom's sequence, and settings, so that we agree the plot is ugly and
>>>>> that we, hopefully, don't get hang on specifics and start discussing about
>>>>> the merit of one approach vs another.
>>>>> So here are my questions:
>>>>> 1- Can a user independently control the settings of each geom's used in a
>>>>> ggplot call sequence when grouping is required?
>>>>> By control, I mean: user defines the graphical settings (groups, symbol
>>>>> shapes, colors, fill colors, line types, size scales, and alpha) and does
>>>>> not let ggplot choose these settings from some theme default.
>>>>> By independently, I mean: the set of graphical settings can be totally
>>>>> different from one group to the next and from one geom to the next.
>>>>> If this fine control can be achieved, how would you go about it (please, be
>>>>> assured that I already spent hours miserably failing to get to anything
>>>>> remotely productive, so your help would be really appreciated)?
>>>>> library(dplyr)
>>>>> library(tidyr)
>>>>> library(ggplot2)
>>>>> set.seed(1234)
>>>>> dummy <- data.frame(dummy = numeric())
>>>>> data <- data.frame(x1 = rep(-2:2, each = 80) + rnorm(4000, sd = 0.1),
>>>>>  g1 = rep(1:4, each = 1000))
>>>>> data <- data %>% mutate(y1 = -x1^2 + 2*x1 - 2 + g1 + rnorm(4000, sd = 0.25))
>>>>> data2 <- data %>% select(x2=x1, y2=y1, g2=g1) %>% mutate(x2=-x2)
>>>>> data3 <- data.frame(x3 = sample(seq(-2, 2, by = 0.1), 20, replace = TRUE),
>>>>>   y3 = runif(20, min=-8, max=4),
>>>>>   g3 = rep(1:4, each = 5))

[R] R Consortium projects

2015-11-04 Thread Hadley Wickham
Hi all,

I'm very pleased to announce that the Infrastructure Steering
Committee (ISC) of the R consortium is calling for proposals:

In brief:

* We want to fund projects that help the R community, broadly construed.

* Currently, we are mostly focussed on funding people who have the
  skills to solve a problem. In the future, we will explore how to
  match up people with skills and people with problems.

* Proposals are due Jan 10.

Please let me know if you have any questions!

Hadley Wickham
Chair, ISC



Re: [R] Weird behaviour function args in ellipses

2015-12-11 Thread Hadley Wickham
On Fri, Dec 11, 2015 at 1:13 PM, Duncan Murdoch
> On 11/12/2015 1:52 PM, Mario José Marques-Azevedo wrote:
>> Hi Duncan and David,
>> Thank you for explanation. I'm really disappointed with this R "resource".
>> I think that partial match, mainly in function args, must be optional and
>> not default. We can have many problems and lost hours find errors (it
>> occur
>> with me). I tried to find a solution to disable partial match, but it
>> seems
>> that is not possible. Program with hacks for this will be sad.
> Nowadays with smart editors, I agree that partial matching isn't really
> necessary.  However, R has been around for 20 years, and lots of existing
> code depends on it.   Eventually you'll get to know the quirks of the
> design.

And if you really dislike this behaviour, you can at least warn on it:

options(
  warnPartialMatchArgs = TRUE,
  warnPartialMatchAttr = TRUE,
  warnPartialMatchDollar = TRUE
)
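
A small sketch of what these options change in practice. The function `f` and the list `x` are hypothetical; the warnings are captured as text so the example runs without interrupting a script:

```r
# Turn the warnings on, remembering the previous settings.
old <- options(
  warnPartialMatchArgs = TRUE,
  warnPartialMatchDollar = TRUE
)

f <- function(value) value
x <- list(value = 1)

# Partial argument matching: 'val' silently matches 'value' by default,
# but now raises a warning whose message is captured here.
arg_msg <- tryCatch(f(val = 1), warning = function(w) conditionMessage(w))

# Partial matching with $ on a list behaves the same way.
dollar_msg <- tryCatch(x$val, warning = function(w) conditionMessage(w))

# Restore the previous settings.
options(old)

print(arg_msg)
print(dollar_msg)
```

Putting the options() call in a startup file would apply it to every session, at the cost of extra warnings from older code that relies on partial matching.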




Re: [R] stopifnot with logical(0)

2015-12-12 Thread Hadley Wickham
On Sat, Dec 12, 2015 at 3:54 AM, Martin Maechler
>> Henrik Bengtsson 
>> on Fri, 11 Dec 2015 08:20:55 -0800 writes:
> > On Fri, Dec 11, 2015 at 8:10 AM, David Winsemius 
>  wrote:
> >>
> >>> On Dec 11, 2015, at 5:38 AM, Dario Beraldi  
> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> I'd like to understand the reason why stopifnot(logical(0) == x) 
> doesn't
> >>> (never?) throw an exception, at least in these cases:
> >>
> >> The usual way to test for a length-0 logical object is to use length():
> >>
> >> x <- logical(0)
> >>
> >> stopifnot( !length(x) & mode(x)=="logical" )
> > I found
> > stopifnot(!length(x), mode(x) == "logical")
> > more helpful when troubleshooting, because it will tell you whether
> > it's !length(x) or mode(x) == "logical" that is FALSE.  It's as if you
> > wrote:
> > stopifnot(!length(x))
> > stopifnot(mode(x) == "logical")
> > /Henrik
> Yes, indeed, thank you Henrik  --- and Jeff Newmiller who's nice
> humorous reply added other relevant points.
> As author stopifnot(), I do agree with Dario's  "gut feeling"
> that stopifnot()  "somehow ought to do the right thing"
> in cases such as
>stopifnot(dim(x) == c(3,4))
> which is really subtle version of his cases
> {But the gut feeling is wrong, as I argue from now on}.

Personally, I think the problem there is that people forget that == is
vectorised, and for a non-vectorised equality check you really should
use identical:

stopifnot(identical(dim(x), c(3,4)))




Re: [R] stopifnot with logical(0)

2015-12-14 Thread Hadley Wickham
On Sat, Dec 12, 2015 at 1:51 PM, Martin Maechler
>>>>>> Hadley Wickham 
>>>>>> on Sat, 12 Dec 2015 08:08:54 -0600 writes:
> > On Sat, Dec 12, 2015 at 3:54 AM, Martin Maechler
> >  wrote:
> >>>>>>> Henrik Bengtsson  on
> >>>>>>> Fri, 11 Dec 2015 08:20:55 -0800 writes:
> >>
> >> > On Fri, Dec 11, 2015 at 8:10 AM, David Winsemius
> >>  wrote:
> >> >>
> >> >>> On Dec 11, 2015, at 5:38 AM, Dario Beraldi
> >>  wrote:
> >> >>>
> >> >>> Hi All,
> >> >>>
> >> >>> I'd like to understand the reason why
> >> stopifnot(logical(0) == x) doesn't >>> (never?) throw an
> >> exception, at least in these cases:
> >> >>
> >> >> The usual way to test for a length-0 logical object is
> >> to use length():
> >> >>
> >> >> x <- logical(0)
> >> >>
> >> >> stopifnot( !length(x) & mode(x)=="logical" )
> >>
> >> > I found
> >>
> >> > stopifnot(!length(x), mode(x) == "logical")
> >>
> >> > more helpful when troubleshooting, because it will tell
> >> you whether > it's !length(x) or mode(x) == "logical"
> >> that is FALSE.  It's as if you > wrote:
> >>
> >> > stopifnot(!length(x)) > stopifnot(mode(x) == "logical")
> >>
> >> > /Henrik
> >>
> >> Yes, indeed, thank you Henrik --- and Jeff Newmiller
> >> who's nice humorous reply added other relevant points.
> >>
> >> As author stopifnot(), I do agree with Dario's "gut
> >> feeling" that stopifnot() "somehow ought to do the right
> >> thing" in cases such as
> >>
> >> stopifnot(dim(x) == c(3,4))
> >>
> >> which is really subtle version of his cases {But the gut
> >> feeling is wrong, as I argue from now on}.
> > Personally, I think the problem there is that people
> > forget that == is vectorised, and for a non-vectorised
> > equality check you really should use identical:
> > stopifnot(identical(dim(x), c(3,4)))
> You are right "in theory"  but practice is less easy:
> identical() tends to be  too subtle for many users ... even
> yourself (;-), not really of course!),  Hadley, in the above case:
> Your stopifnot() would *always* stop, i.e., signal an error
> because typically all dim() methods return integer, and c(3,4)
> is double.
> So, if even Hadley gets it wrong so easily, I wonder if its good
> to advertize to always use  identical() in such cases.
> I indeed would quite often use identical() in such tests, and
> you'd too and would quickly find and fix the "trap" of course..
> So you are mostly right also in my opinion...

Ooops, yes - but you would discover this pretty quickly if you weren't
coding in an email client ;)
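
For the record, the trap looks like this (a minimal sketch with a made-up matrix `x`):

```r
x <- matrix(0, nrow = 3, ncol = 4)

# dim() returns an integer vector ...
dim_type <- typeof(dim(x))

# ... so comparing against a double vector is never identical,
double_check <- identical(dim(x), c(3, 4))

# while comparing against integer literals is.
integer_check <- identical(dim(x), c(3L, 4L))

print(c(dim_type = dim_type))
print(c(double_check = double_check, integer_check = integer_check))
```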

I wonder if R is missing an equality operator for this case. Currently:

* == is suboptimal because it's vectorised
* all.equal is suboptimal because it returns TRUE or a text string
* identical is suboptimal because it doesn't do common coercions

Do we need another function (equals()?) that uses the same coercion
rules as == but isn't vectorised? (Like == it would only work with
vectors, so you'd still need identical() for (e.g.) comparing
environments)
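
A rough sketch of what such a function might look like. To be clear, equals() is hypothetical and is not part of base R; this is just one possible reading of the idea:

```r
# Hypothetical helper: same coercion rules as ==, but no recycling.
equals <- function(x, y) {
  stopifnot(is.atomic(x), is.atomic(y))
  # Lengths must agree exactly; isTRUE() turns an NA result into FALSE.
  length(x) == length(y) && isTRUE(all(x == y))
}

m <- matrix(0, nrow = 3, ncol = 4)

# Integer vs double compares equal, unlike identical().
same_dim <- equals(dim(m), c(3, 4))

# Different lengths are never equal, unlike all(x == y).
diff_len <- equals(c(3, 4, 3, 4), c(3, 4))

# A missing value makes the answer FALSE rather than NA.
with_na <- equals(c(1, NA), c(1, NA))

print(c(same_dim = same_dim, diff_len = diff_len, with_na = with_na))
```

Whether NA should compare equal to NA is a design choice; this sketch takes the conservative route.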




Re: [R] stopifnot with logical(0)

2015-12-14 Thread Hadley Wickham
>> I wonder if R is missing an equality operator for this case. Currently:
>> * == is suboptimal because it's vectorised
>> * all.equal is suboptimal because it returns TRUE or a text string
>> * identical is suboptimal because it doesn't do common coercions
>> Do we need another function (equals()?) that uses the same coercion
>> rules as == but isn't vectorised? (Like == it would only work with
>> vectors, so you'd still need identical() for (e.g.) comparing
>> environments)
> I don't think so.  We already have all(), so all(x == y) would do what you
> want.

But that recycles, which is what we're trying to avoid here.
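
A minimal sketch of the recycling problem, with made-up vectors:

```r
x <- c(3, 4, 3, 4)
y <- c(3, 4)

# y is recycled to c(3, 4, 3, 4), so every comparison is TRUE
# even though the two vectors have different lengths.
recycled <- all(x == y)

# A zero-length comparison gives logical(0), and all(logical(0)) is TRUE,
# which is why stopifnot(logical(0) == x) never stops.
empty_case <- all(logical(0) == TRUE)

print(c(recycled = recycled, empty_case = empty_case))
```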




Re: [R] Make a box-whiskers plot in R with 5 variables, color coded.

2015-12-16 Thread Hadley Wickham
On Tue, Dec 15, 2015 at 9:55 AM, Martin Maechler
> > You are missing the closing bracket on the boxplot()
> > command.  Just finish with a ')'
> Hmm... I once learned
>  '()' =: parenthesis/es
>  '[]' =: bracket(s)
>  '{}' =: brace(s)
> Of course, I'm not a native English speaker, and my teacher(s) /
> teaching material may have been biased ... but, as all three
> symbol pairs play an important role in R, I think it would be
> really really helpful,  if we could agree on using the same
> precise English here.
> I'm happy to re-learn, but I'd really like to end up with three
> different simple English words, if possible.
> (Yes, I know and have seen/heard "curly braces", "round
>  parentheses", ... but I'd hope we can do without the extra adjective.)

I think this is what Americans are taught, but I can never remember
which is which. I use round brackets, square brackets, and squiggly
brackets, which are memorable, and even if you're not familiar with
the terms you can easily understand what I mean.




Re: [R] Make a box-whiskers plot in R with 5 variables, color coded.

2015-12-16 Thread Hadley Wickham
On Wed, Dec 16, 2015 at 9:34 AM, Hadley Wickham  wrote:
> On Tue, Dec 15, 2015 at 9:55 AM, Martin Maechler
>  wrote:
>> > You are missing the closing bracket on the boxplot()
>> > command.  Just finish with a ')'
>> Hmm... I once learned
>>  '()' =: parenthesis/es
>>  '[]' =: bracket(s)
>>  '{}' =: brace(s)
>> Of course, I'm not a native English speaker, and my teacher(s) /
>> teaching material may have been biased ... but, as all three
>> symbol pairs play an important role in R, I think it would be
>> really really helpful,  if we could agree on using the same
>> precise English here.
>> I'm happy to re-learn, but I'd really like to end up with three
>> different simple English words, if possible.
>> (Yes, I know and have seen/heard "curly braces", "round
>>  parentheses", ... but I'd hope we can do without the extra adjective.)
> I think this is what Americans are taught, but I can never remember
> which is which. I use round brackets, square brackets, and squiggly
> brackets, which are memorable, and even if you're not familiar with
> the terms you can easily understand what I mean.

I should mention that all three terms have accompanying arm motions ;)




Re: [R] vjust unresponsive (ggplot2)

2015-12-23 Thread Hadley Wickham
vjust was always a hack that I never thought should work. The margin
parameter of element_text() is the correct way to solve this problem as
of ggplot2 2.0.0.
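
A minimal sketch of that approach, assuming ggplot2 2.0.0 or later. The data and the 20pt value are arbitrary:

```r
library(ggplot2)

df <- data.frame(x = 1:10, y = 1:10)

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  # Push the y axis title away from the axis by adding space on its
  # right-hand side; margin() takes top, right, bottom, left (in pt by default).
  theme(axis.title.y = element_text(margin = margin(r = 20)))

p
```

For the x axis title the equivalent would be a top margin, e.g. margin(t = 20).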


On Tuesday, December 22, 2015, Nordlund, Dan (DSHS/RDA) 

> Ista,
> You are correct, I was not at the latest release of ggplot2.  I updated to
> the latest version and am now seeing the same result as you and the OP.  So
> it does look like an issue with the latest version of ggplot2.
> Dan
> Daniel Nordlund, PhD
> Research and Data Analysis Division
> Services & Enterprise Support Administration
> Washington State Department of Social and Health Services
> > -Original Message-
> > From: Ista Zahn [ ]
> > Sent: Tuesday, December 22, 2015 10:48 AM
> > To: Nordlund, Dan (DSHS/RDA)
> > Cc: 
> > Subject: Re: [R] vjust unresponsive (ggplot2)
> >
> > Hi Dan,
> >
> > Chances are that you haven't yet upgraded to ggplot2 version 2.0. unit (as
> > well as arrow and alpha) are now re-exported from ggplot2.
> >
> > Using the latest release I also see that vjust doesn't seem to do anything.
> >
> > Best,
> > Ista
> >
> > On Tue, Dec 22, 2015 at 1:37 PM, Nordlund, Dan (DSHS/RDA)
> > > wrote:
> > > Are you sure it is not working for you?  Your example code did not work for
> > > me at all until I removed the plot.margin parameter (unit wasn't
> > > recognized).  Once I did that hjust and vjust worked as expected.  However,
> > > values between .1 and .9 for vjust don't really move the axis title very much
> > > so it may not be real noticeable.  Try a value like 2 or 3, just to make sure you
> > > easily see the change in position before concluding that nothing is
> > > happening.
> > >
> > > Dan
> > >
> > > Daniel Nordlund, PhD
> > > Research and Data Analysis Division
> > > Services & Enterprise Support Administration Washington State
> > > Department of Social and Health Services
> > >
> > >
> > > -Original Message-
> > > From: R-help [ ] On Behalf Of Ryan Utz
> > > Sent: Tuesday, December 22, 2015 10:00 AM
> > > To: 
> > > Subject: [R] vjust unresponsive (ggplot2)
> > >
> > > Hi all,
> > >
> > > I cannot for the life of me get my axis titles to adjust vertically in a
> > > ggplot. I've seen several posts about this and have tried everything:
> > > keeping vjust within 0 and 1, adjusting the margins, etc. hjust is behaving
> > > just as it should but vjust just mocks me in silence. No error message is
> > > produced.
> > >
> > > Here's a sample code:
> > >
> > > x=data.frame(sample(1:10))
> > > x[,2]=sample(1:10)
> > >
> > > ggplot(data=x,aes(x=V2,y=V2))+theme(axis.title.y=element_text(vjust=.1
> > > ,hjust=0.6),
> > > plot.margin=unit(c(1,1,2,2),'cm'))
> > >
> > > No matter what I put into vjust, nothing happens. Am I missing something
> > > obvious??
> > >
> > > Thanks ahead of time for any help,
> > > Ryan
> > >
> > >
> > > --
> > >
> > > Ryan Utz, Ph.D.
> > > Assistant professor of water resources
> > > *chatham**UNIVERSITY*
> > > Home/Cell: (724) 272-7769
> > >

Re: [R] Updating github R packages

2016-02-17 Thread Hadley Wickham
It will be included in the next version of devtools - it's totally
do-able, but no one has done it yet.


On Wed, Feb 17, 2016 at 6:44 PM, Jeff Newmiller
> AFAIK the answer is no. That would be one of the main drawbacks of depending 
> on github for packages. It isn't really a package repository so much as it is 
> a herd of cats.
> --
> Sent from my phone. Please excuse my brevity.
> On February 16, 2016 6:43:02 PM PST, "Hoji, Akihiko"  wrote:
>>Is there a way to update a R package and its dependencies,  installed
>>from the github repo by a simple command equivalent to

Re: [R] simple question on data frames assignment

2016-04-07 Thread Hadley Wickham
== is also vectorised, and you're better off with TRUE and FALSE
rather than 1 and 0, so I'd recommend:

colordata$response <- colordata$color == 'blue'
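
Spelled out on the data from the question (a short sketch; the `response01` column is only an illustration for cases where 1/0 is genuinely needed):

```r
colordata <- data.frame(
  id = 1:5,
  color = c("blue", "red", "green", "blue", "orange"),
  stringsAsFactors = FALSE
)

# One vectorised comparison replaces the whole loop.
colordata$response <- colordata$color == "blue"

# If a 1/0 column is required downstream, convert the logical explicitly.
colordata$response01 <- as.integer(colordata$response)

print(colordata)
```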


On Thu, Apr 7, 2016 at 6:52 AM, David Barron  wrote:
> ifelse is vectorised, so just use that without the loop.
> colordata$response <- ifelse(colordata$color == 'blue', 1, 0)
> David
> On 7 April 2016 at 12:41, Michael Artz  wrote:
>> Hi I'm not sure how to ask this, but its a very easy question to answer for
>> an R person.
>> What is an easy way to check for a column value and then assigne a new
>> column a value based on that old column value?
>> For example, Im doing
>>  colordata <- data.frame(id = c(1,2,3,4,5), color = c("blue", "red",
>> "green", "blue", "orange"))
>>  for (i in 1:nrow(colordata)){
>>colordata$response[i] <- ifelse(colordata[i,"color"] == "blue", 1, 0)
>>  }
>> which works,  but I don't want to use the for loop I want to "vecotrize"
>> this.  How would this be implemented?

Re: [R] installation of dplyr

2016-04-19 Thread Hadley Wickham
You normally see these errors when compiling on a vm that has very
little memory.

On Tue, Apr 19, 2016 at 2:47 PM, Ben Tupper  wrote:
> Hello,
> I am getting a fresh CentOS 6.7 machine set up with all of the goodies for R 
> 3.2.3, including dplyr package. I am unable to successfully install it.  
> Below I show the failed installation using utils::install.packages() and then 
> again using devtools::install_github().  Each yields an error similar to the 
> other but not quite exactly the same - the error messages sail right over my 
> head.
> I can contact the package author if that would be better, but thought it best 
> to start here.
> Thanks!
> Ben
> Ben Tupper
> Bigelow Laboratory for Ocean Sciences
> 60 Bigelow Drive, P.O. Box 380
> East Boothbay, Maine 04544
>> sessionInfo()
> R version 3.2.3 (2015-12-10)
> Platform: x86_64-redhat-linux-gnu (64-bit)
> Running under: CentOS release 6.7 (Final)
> locale:
>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
> #   utils::install.packages()
>> install.packages("dplyr", repo = "")
> Installing package into ‘/usr/lib64/R/library’
> (as ‘lib’ is unspecified)
> trying URL ''
> Content type 'application/x-gzip' length 655997 bytes (640 KB)
> ==
> downloaded 640 KB
> * installing *source* package ‘dplyr’ ...
> ** package ‘dplyr’ successfully unpacked and MD5 sums checked
> ** libs
> g++ -m64 -I/usr/include/R -DNDEBUG -I../inst/include -DCOMPILING_DPLYR 
> -I/usr/local/include -I"/usr/lib64/R/library/Rcpp/include" 
> -I"/usr/lib64/R/library/BH/include"   -fpic  -O2 -g -pipe -Wall 
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
> --param=ssp-buffer-size=4 -m64 -mtune=generic  -c RcppExports.cpp -o 
> RcppExports.o
> g++ -m64 -I/usr/include/R -DNDEBUG -I../inst/include -DCOMPILING_DPLYR 
> -I/usr/local/include -I"/usr/lib64/R/library/Rcpp/include" 
> -I"/usr/lib64/R/library/BH/include"   -fpic  -O2 -g -pipe -Wall 
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
> --param=ssp-buffer-size=4 -m64 -mtune=generic  -c address.cpp -o address.o
> g++ -m64 -I/usr/include/R -DNDEBUG -I../inst/include -DCOMPILING_DPLYR 
> -I/usr/local/include -I"/usr/lib64/R/library/Rcpp/include" 
> -I"/usr/lib64/R/library/BH/include"   -fpic  -O2 -g -pipe -Wall 
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
> --param=ssp-buffer-size=4 -m64 -mtune=generic  -c api.cpp -o api.o
> g++ -m64 -I/usr/include/R -DNDEBUG -I../inst/include -DCOMPILING_DPLYR 
> -I/usr/local/include -I"/usr/lib64/R/library/Rcpp/include" 
> -I"/usr/lib64/R/library/BH/include"   -fpic  -O2 -g -pipe -Wall 
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
> --param=ssp-buffer-size=4 -m64 -mtune=generic  -c arrange.cpp -o arrange.o
> In file included from ../inst/include/dplyr.h:131,
>  from arrange.cpp:1:
> ../inst/include/dplyr/DataFrameSubsetVisitors.h: In constructor 
> ‘dplyr::DataFrameSubsetVisitors::DataFrameSubsetVisitors(const 
> Rcpp::DataFrame&, const Rcpp::CharacterVector&)’:
> ../inst/include/dplyr/DataFrameSubsetVisitors.h:40: warning: ‘column’ may be 
> used uninitialized in this function
> g++ -m64 -I/usr/include/R -DNDEBUG -I../inst/include -DCOMPILING_DPLYR 
> -I/usr/local/include -I"/usr/lib64/R/library/Rcpp/include" 
> -I"/usr/lib64/R/library/BH/include"   -fpic  -O2 -g -pipe -Wall 
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
> --param=ssp-buffer-size=4 -m64 -mtune=generic  -c between.cpp -o between.o
> g++ -m64 -I/usr/include/R -DNDEBUG -I../inst/include -DCOMPILING_DPLYR 
> -I/usr/local/include -I"/usr/lib64/R/library/Rcpp/include" 
> -I"/usr/lib64/R/library/BH/include"   -fpic  -O2 -g -pipe -Wall 
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
> --param=ssp-buffer-size=4 -m64 -mtune=generic  -c bind.cpp -o bind.o
> g++ -m64 -I/usr/include/R -DNDEBUG -I../inst/include -DCOMPILING_DPLYR 
> -I/usr/local/include -I"/usr/lib64/R/library/Rcpp/include" 
> -I"/usr/lib64/R/library/BH/include"   -fpic  -O2 -g -pipe -Wall 
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
> --param=ssp-buffer-size=4 -m64 -mtune=generic  -c combine_variables.cpp -o 
> combine_variables.o
> g++ -m64 -I/usr/include/R -DNDEBUG -I../inst/include -DCOMPILING_DPLYR 
> -I/usr/local/include -I"/usr/lib64/R/library/Rcpp/include" 
> -I"/usr/lib64/R/library/BH/include"   -fpic  -O2 -g -pipe -Wall 
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
> --param=ssp-buffer-size=4 -m64 -mtune=generic  

Re: [R] with vs. attach

2016-05-06 Thread Hadley Wickham
You may want to read, which captures my
latest thinking (and tooling) around this problem. Feedback is much
appreciated.
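
For readers who only want the base R version discussed further down the thread, a minimal sketch follows. This is not the tooling referred to above, and the name `eval_in_data` is made up:

```r
# Capture the unevaluated argument, then evaluate it with the data frame
# searched first and the caller's environment as the fallback.
eval_in_data <- function(expr, data) {
  captured <- substitute(expr)
  eval(captured, envir = data, enclos = parent.frame())
}

dat <- data.frame(a = 1:2)
a <- pi   # masked by dat$a inside eval_in_data()
x <- 2    # not in dat, so it is found in the calling environment

half_a <- eval_in_data(a / 2, dat)
scaled <- eval_in_data(a * x, dat)

print(half_a)
print(scaled)
```

Passing enclos = parent.frame() explicitly makes the lookup order visible: columns first, then whatever the caller can see.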


On Fri, May 6, 2016 at 2:14 PM, David Winsemius  wrote:
>> On May 6, 2016, at 5:47 AM, Spencer Graves 
>>  wrote:
>> On 5/6/2016 6:46 AM, peter dalgaard wrote:
>>> On 06 May 2016, at 02:43 , David Winsemius  wrote:
> On May 5, 2016, at 5:12 PM, Spencer Graves 
>  wrote:
> I want a function to evaluate one argument
> in the environment of a data.frame supplied
> as another argument.  "attach" works for
> this, but "with" does not.  Is there a way
> to make "with" work?  I'd rather not attach
> the data.frame.
> With the following two functions "eval.w.attach"
> works but "eval.w.with" fails:
> dat <- data.frame(a=1:2)
> eval.w.attach <- function(x, dat){
>  attach(dat)
>  X <- x
>  detach()
>  X
> }
> eval.w.with <- function(x, dat){
>  with(dat, x)
> }
> eval.w.attach(a/2, dat) # returns c(.5, 1)
 How about using eval( substitute( ...))?

 eval.w.sub <- function(expr, datt){
   eval( substitute(expr), env=datt)
 }
 eval.w.sub(a/2, dat)
 #[1] 0.5 1.0

>>> Actually, I think a better overall strategy is to say that if you want to 
>>> pass an expression to a function, then pass an expression object (or a call 
>>> object or maybe a formula object).
>>> Once you figure out _how_ your eval.w.attach works (sort of), you'll get 
>>> the creeps:
>>> Lazy evaluation causes the argument x to be evaluated after the attach(), 
>>> hence the evaluation environment of an actual argument is being temporarily 
>>> modified from inside a function.
>>> Apart from upsetting computer science purists, there could be hidden 
>>> problems: One major issue is that  values in "dat" could be masked by 
>>> values in the global environment, another issue is that an error in 
>>> evaluating the expression will leave dat attached. So at a minimum, you 
>>> need to recode using on.exit() magic.
>>> So my preferences go along these lines:
 dat <- data.frame(a=1:2)
 eval.expression <- function(e, dat) eval(e, dat)
 eval.expression(quote(a/2), dat)
>>> [1] 0.5 1.0
 eval.expression(expression(a/2), dat)
>>> [1] 0.5 1.0
 eval.formula <- function(f, dat) eval(f[[2]], dat)
 eval.formula(~a/2, dat)
>>> [1] 0.5 1.0
>> Hi, Peter:
>>  I don't like eval.expression or eval.formula, because they don't 
>> automatically accept what I naively thought should work and require more 
>> knowledge of the user.  What about David's eval.w.sub:
>> a <- pi
>> dat <- data.frame(a=1:2)
>> eval.w.sub <- function(a, Dat){
>>  eval( substitute(a), env=Dat)
>> }
>> > eval.w.sub(a/2, dat)
>> [1] 0.5 1.0
> I liked eval.expression and tested it with a bquote(...) argument to see if 
> that would succeed. It did, but it didn't return what you wanted for `a/2`, 
> so I tried seeing if a "double eval" would deliver both yours and my desired 
> results:
>  eval.w.sub <- function(a, Dat){
>   eval( eval(substitute(a),Dat), env=Dat)
>  }
> x=2
>  eval.w.sub( a/2, dat)
> [1] 0.5 1.0
>  eval.w.sub( bquote(2*a*.(x) ), dat)
> [1] 4 8
> We are here retracing the path that Hadley took in some of his ggplot2 design 
> decisions. Unfortunately for me those NSE rules often left me confused about 
> what should and shouldn't be 'quoted' in the as-character sense and what 
> should be quote()-ed or "unquoted" in the bquote() sense.
> --
>>  This produces what's desired in a way that seems simpler to me.
>>  By the way, I really appreciate Peter's insightful comments:
>> eval.w.attachOops <- function(x, Dat){
>>  attach(Dat)
>>  X <- x
>>  detach()
>>  X
>> }
>> > eval.w.attachOops(a/2, dat)
>> The following object is masked _by_ .GlobalEnv:
>> [1] 1.570796
>> > eval.w.attachOops(b/2, dat)
>> The following object is masked _by_ .GlobalEnv:
>> Error in eval.w.attachOops(b/2, dat) : object 'b' not found
>> > search()
>> [1] ".GlobalEnv"        "Dat"               "package:graphics"
>> [4] "package:grDevices" "package:utils"     "package:datasets"
>> [7] "package:methods"   "Autoloads"         "package:base"
>> > objects(2)
>> [1] "a"
>> *** NOTES:
>>  1.  This gives a likely wrong answer with a warning if "a" exists in 
>> .GlobalEnv, and leaves "Dat" (NOT "dat") attached upon exit.
>>  2.  A stray "detach()" [not shown here] detached "package:stats".  oops.
>> *** Using "on.exit" fixes the problem with failure to detach but not the 
>> likely wrong answer:
>> detach()
>> search()
>> eval.w.attachStillWrong <- function(x, dat){
>>  attach(dat)
>>  on.exit(detach(dat))
>>  X <- x
>>  X
>> }
>> The following object is masked _by_ .GlobalEnv:
>> [1] 1.570796
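
For reference, a minimal sketch of the substitute()-based alternative discussed in this thread, which avoids attach() altogether. The name eval_in_data is made up for illustration, and passing parent.frame() as the enclosure is one common choice rather than something the thread prescribes.

```r
a <- pi                       # a global that would mask dat$a under attach()
dat <- data.frame(a = 1:2)

# Hypothetical helper: capture the unevaluated argument, then evaluate it
# with the data frame's columns first and the caller's environment second.
eval_in_data <- function(x, data) {
  eval(substitute(x), data, parent.frame())
}

before <- search()
res <- eval_in_data(a / 2, dat)   # uses dat$a, not the global a
after <- search()                 # the search path is left untouched
```

Because nothing is attached, there is nothing to detach on error, and columns of the data frame take precedence over globals of the same name.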

Re: [R] with vs. attach

2016-05-09 Thread Hadley Wickham
On Sun, May 8, 2016 at 7:28 PM, Bert Gunter  wrote:
> Jeff:
> That's easy to do already with substitute(), since you can pass around
> an unevaluated expression (a parse tree) however you like. As I read
> it (admittedly quickly), its main feature is that it allows you
> more control over the environment in which the expression is finally
> evaluated -- as well as permitting nested expression evaluation fairly
> easily.
> But maybe we're saying the same thing ...  IMHO I think Hadley has
> gone overboard here, worrying about rarely important issues, as you
> seem to be intimating also.

These are absolutely critical issues that crop up as soon as other
people want to write functions that use your functions that use NSE.
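
A small base R sketch of the kind of problem meant here. All function names are invented for illustration, and the "fix" shown is just one possible pattern (an explicit quoted-expression version), not lazyeval's API.

```r
# A function using NSE in the eval(substitute()) style.
subset_rows <- function(df, cond) {
  rows <- eval(substitute(cond), df, parent.frame())
  df[rows, , drop = FALSE]
}

# A naive wrapper: substitute() inside subset_rows() now sees the symbol
# `cond`, not the expression the user typed.
naive_wrapper <- function(df, cond) subset_rows(df, cond)

# One possible fix: a version that takes an already-quoted expression plus
# the environment to fall back on, so wrappers can forward both explicitly.
subset_rows_q <- function(df, cond_expr, env = parent.frame()) {
  rows <- eval(cond_expr, df, env)
  df[rows, , drop = FALSE]
}
safe_wrapper <- function(df, cond) {
  subset_rows_q(df, substitute(cond), parent.frame())
}

df <- data.frame(v = 1:4)
direct <- subset_rows(df, v > 2)                                    # works
broken <- tryCatch(naive_wrapper(df, v > 2), error = function(e) e) # fails
fixed  <- safe_wrapper(df, v > 2)                                   # works
```

In the naive wrapper the column v is looked up in the caller's environment instead of the data frame. Here that gives an error; if a global v happened to exist it would silently give a wrong answer.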




Re: [R] with vs. attach

2016-05-09 Thread Hadley Wickham
On Mon, May 9, 2016 at 7:12 AM, peter dalgaard  wrote:
> On 09 May 2016, at 02:46 , Bert Gunter  wrote:
>> ... To be clear, Hadley or anyone else should also feel free to set me
>> straight, preferably publicly, but privately if you prefer.
> Not really to "set anyone straight", but there are some subtleties with mode 
> call objects versus expression objects and formulas to be aware of.
> E.g.,
>> a <- 2
>> do.call("print", list(a*pi))
> [1] 6.283185
>> do.call("print", list(quote(a*pi)))
> [1] 6.283185
>> do.call("print", list(expression(a*pi)))
> expression(a * pi)
>> do.call("print", list(~a*pi))
> ~a * pi
> Thing is, if you insert a call object into a parse tree, nothing is there to 
> preserve its nature as an unevaluated expression. Similarly, in
>> call("print", quote(a*pi))
> print(a * pi)
> the result is identical to quote(print(a * pi)), so when evaluated, quoting 
> is not seen by print().
> As far as I understand, this is also the reason that for math in ggplot, you 
> may need as.expression(bquote()).
> In general, I think that a number of things in R would have been more cleanly 
> implemented using formulas/expression objects than using substitution and 
> lazy evaluation, notably subset and offset arguments in lm/glm. It would have 
> been so much cleaner to have
> lm(math ~ age, data = foo, subset = ~ sex=="1")
> than the current situation where lm internally chops its own head off and 
> substitutes with model.frame, then evaluates the call to model.frame() which 
> in turn does eval(substitute(subset), data, env). Of course, at the time, ~ 
> was intended specifically for Wilkinson Rogers type formulas; "abusing" it 
> for other kinds of expressions is something of an afterthought.

Yeah, to my mind, the cool thing about formulas is that they provide a
concise way to capture an environment and an expression, and then
Wilkinson-Rogers formulas are just a special case.
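
A minimal base R illustration of that point. The helper names make_filter and eval_formula are hypothetical.

```r
# A formula created inside a function remembers that function's environment.
make_filter <- function(threshold) {
  ~ x > threshold
}

# Hypothetical evaluator: take the right-hand side of a one-sided formula
# and evaluate it in the data, falling back on the formula's environment.
eval_formula <- function(f, data) {
  eval(f[[2]], data, environment(f))
}

f <- make_filter(2)
res <- eval_formula(f, data.frame(x = 1:4))
```

Here x comes from the data frame and threshold from the environment captured when the formula was created, so the caller does not need to pass either around separately.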

It's obviously impossible to go back and change how lm() etc works now,
but I'm reasonably confident that lazyeval provides a strong
foundation going forward. The quasiquotation stuff is particularly
important - and unquote-splice makes it possible to do things that are
impossible with bquote().  (Of course, unquote-splice could be added
to bquote(), but I think you'll still run into issues with
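
To make the terms concrete: "unquote" inserts one value into a quoted expression, while "unquote-splice" inserts each element of a list as a separate argument. The sketch below uses only base R and builds the spliced call by hand with as.call(). It illustrates the idea under those assumptions and is not lazyeval's API.

```r
x <- 2

# Unquote: .(x) is replaced by the value of x inside the quoted expression.
unquoted <- bquote(2 * a * .(x))

# Splice by hand: each element of `args` becomes its own argument of sum().
args <- list(quote(a), quote(b), 10)
spliced <- as.call(c(list(quote(sum)), args))

# Evaluate the spliced call with a and b supplied in a list.
total <- eval(spliced, list(a = 1, b = 2))
```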




Re: [R] Question about ‘The R Project’.

2016-11-14 Thread Hadley Wickham
>> We have a question about ‘The R Project’.
>> It looks like it’s an open source software, but the document from the 
>> website shows that it’s free of use not free of price.
>> Please confirm whether it costs anything to use it commercially.
>> If so, could you tell us the price, too?
>> Best regards,
>> Jane Kim.
> Can I use R for commercial purposes?
> If you mean RStudio you have to pay for commercial use. RStudio and R are 
> different.

That's not true, as RStudio is also open source.  You don't have to
pay to use it commercially, but you might want to pay because we
provide additional features that are useful to people in bigger
companies.




Re: [R] on ``unfolding'' a json into data frame columns

2016-11-29 Thread Hadley Wickham
Two quick hints:

* use simplifyDataFrame = FALSE in fromJSON()

* read
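
A minimal sketch of the first hint, assuming jsonlite is installed. The sample data and the Status field are invented for illustration; only RawPayload comes from the question below. With simplifyDataFrame = FALSE, arrays of records stay plain lists, so each row yields exactly one parsed object and nested arrays can be kept in a list-column instead of multiplying rows.

```r
library(jsonlite)

# Invented sample data: one JSON document per row.
df <- data.frame(
  id = 1:2,
  response = c('{"Status":"ok","RawPayload":[200,1,128]}',
               '{"Status":"fail","RawPayload":[7]}'),
  stringsAsFactors = FALSE
)

# Parse each row separately; arrays of records are not turned into data frames.
parsed <- lapply(df$response, fromJSON, simplifyDataFrame = FALSE)

# Scalar fields become ordinary columns, arrays go into a list-column.
df$status <- vapply(parsed, function(p) p$Status, character(1))
df$raw_payload <- lapply(parsed, function(p) p$RawPayload)
```

Parsing everything first with lapply() and adding columns once also avoids growing a data frame with rbind() inside a loop.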


On Tue, Nov 29, 2016 at 8:06 AM, Daniel Bastos  wrote:
> Greetings!
> In an SQL table, I have a column that contains a JSON.  I'd like easy
> access to all (in an ideal world) of these JSON fields.  I started out
> trying to get all fields from the JSON and so I wrote this function.
> unfold.json <- function (df, column)
> {
> library(jsonlite)
> ret <- data.frame()
> for (i in 1:nrow(df)) {
> js <- fromJSON(df[i, ][[column]])
> ret <- rbind(ret, cbind(df[i, ], js))
> }
> ret
> }
> It takes a data frame and a column-string where the JSON is to be
> found.  It produces a new RET data frame with all the rows of DF but
> with new columns --- extracted from every field in the JSON.
> (The performance is horrible.)
> fromJSON sometimes produces a list that sometimes contains a data frame.
> As a result, I end up getting a RET data frame with duplicated rows.
> Here's what happens.
>> nrow(df)
> [1] 1
>> nrow(unfold.json(df, "response"))
> [1] 3
> Warning messages:
> 1: In data.frame(CreateUTC = "2016-11-29 02:00:43", Payload = list( :
>   row names were found from a short variable and have been discarded
> 2: In data.frame(..., check.names = FALSE) :
>   row names were found from a short variable and have been discarded
> I expected a data frame with 1 row.  The reason 3 rows are produced is
> that in the JSON there's an array with 3 elements.
>> fromJSON(df$response)$RawPayload
> [1] 200   1 128
> I have also cases where fromJSON(df$response)$Payload$Fields is a data
> frame containing various rows.  So unfold.json produces a data frame
> with these various rows.
> So I gave up on this general approach.
> (*) My humble approach
> For the moment I'm not interested in RawPayload nor Payload$Fields, so I
> nullified them in this new approach.  To improve performance, I guessed
> perhaps merge() would help and I think it did, but this was not at all a
> decision thought out.
> <- function (df, column)
> {
> library(jsonlite)
> ret <- data.frame()
> if (nrow(df) > 0) {
> for (i in 1:nrow(df)) {
> ls <- fromJSON(df[i, ][[column]])
> ls$RawPayload <- NULL
> ls$Payload$Fields <- NULL
> js <- data.frame(ls)
> ret <- rbind(ret, merge(df[i, ], js))
> }
> }
> ret
> }
> I'm looking for advice.  How would you approach this problem?
> Thank you!



Re: [R] Unexpected interference between dplyr and plm

2016-11-29 Thread Hadley Wickham
On Tue, Nov 29, 2016 at 11:52 AM, William Dunlap  wrote:
>> The other option would be to load dplyr first (which would give the warning
>> that stats::lag was masked) and then later load plm (which should give a
>> further warning that dplyr::lag is masked). Then the plm::lag function will
>> be found
> Another option is to write the package maintainers and complain
> that masking core functions is painful for users.

Don't worry; many people have done that.
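
Until then, calling the function with an explicit namespace sidesteps the load-order question. A base R sketch, where the local lag() below is only a stand-in that mimics a masking definition (it is not dplyr's or plm's implementation):

```r
# Stand-in for a package function that masks stats::lag().
lag <- function(x, n = 1L) c(rep(NA, n), head(x, -n))

x <- ts(1:5)

masked   <- lag(1:5)          # picks up the masking definition above
explicit <- stats::lag(x, 1)  # always the stats version, whatever is loaded
```

The same pattern applies to dplyr::lag() and plm::lag() when both packages are attached.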




Re: [R] ggplot aestetics: beginner question - I am lost in endless possibilites

2016-12-15 Thread Hadley Wickham
You are going to find your life much easier if you:

* Organise your code so it's easier to read
* Use a consistent naming scheme for your variables
* Learn a bit more about how to modify variables succinctly

Here's my rewriting of your script to make it easier to see what's going on.


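library(tibble)
library(ggplot2)

df <- tibble(
  name = c("Ernie", "Ernie", "Ernie", "Leon", "Leon", "Leon"),
  recorded_time = c("03.01.2011", "04.01.2011", "05.01.2011",
                    "04.01.2011", "05.01.2011", "06.01.2011"),
  known_state = c("breeding", "moulting", "moulting", "breeding", "breeding", NA)
)
# Parse the day.month.year strings into dates
df$recorded_time <- lubridate::dmy(df$recorded_time)

ggplot(df) +
  geom_tile(
    aes(recorded_time, name, fill = known_state),
    colour = "black",   # outline around each tile
    height = 0.5
  ) +
  scale_fill_discrete(na.value = "white")   # draw NA states in white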


On Thu, Dec 15, 2016 at 8:22 PM, Dagmar  wrote:
> # Dear all,
> # I hope someone can help me with this. I am so lost and can't find a
> solution even though I spent hours on searching for a solution of that tiny
> problem.
> # Maybe someone of you could give me hint?
> #This is my string:
> exdatframe <- data.frame(Name=c("Ernie","Ernie","Ernie",
> "Leon","Leon","Leon"),
>   recordedTime=c("03.01.2011","04.01.2011","05.01.2011",
> "04.01.2011","05.01.2011","06.01.2011"),
>knownstate =c("breeding","moulting","moulting",
>  "breeding","breeding",NA))
> exdatframe
> exdatframeT <- as.POSIXct(
>   strptime(as.character(exdatframe$recordedTime),"%d.%m.%Y"))
> exdatframeT
> exdatframe2 <- cbind(exdatframe, exdatframeT)
> exdatframe2$recordedTime <-NULL
> exdatframe2
> str(datframe)
> library(ggplot2)
> ggplot(exdatframe2)+geom_tile(aes(x=exdatframeT,y=Name,fill=knownstate),
> height=0.5)
> # Now all I want is:
> # 1) a black outline around the bars. Adding colour="black" like I have
> found elsewhere on the internet doesn't work
> # 2) change the colours: E.g. I want white for NAs. I can't find a command
> to describe my wishes.
> #?



Re: [R] Assessing the name of an object within an argument

2017-01-10 Thread Hadley Wickham
You might find helpful.
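
For a quick start, one common base R idiom is deparse(substitute()). This is a minimal sketch and not necessarily the approach described in the material referred to above.

```r
my_func <- function(dataset) {
  # substitute() returns the expression supplied for `dataset`,
  # deparse() turns that expression into a character string.
  dataset_name <- deparse(substitute(dataset))
  dataset_name
}

name <- my_func(iris)
```

Note that this only recovers what the immediate caller typed; if my_func() is called from another function, the result is the name of that function's argument.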


On Tue, Jan 10, 2017 at 2:49 AM,   wrote:
> Hi All,
> I have a function like
> my_func <- function(dataset)
> {
>   some operation
> }
> Now I would like not only to operate on the dataset (how this is done is
> obvious) but I would like to get the name of the dataset handed over as an
> argument.
> Example:
> my_func <- function(dataset = iris)
> {
>   print(dataset)  # here I do not want to print the dataset but the name
> of the object - iris in this case - instead
>   # quote() does not do the trick cause it prints "dataset" instead of
> "iris"
>   # gives an error saying that the object can not coerced to a
> symbol
> }
> Is there a way to do this?
> Kind regards
> Georg



Re: [R] Failure to understand namespaces in XML::getNodeSet

2017-01-31 Thread Hadley Wickham
See the last example in ?xml2::xml_find_all or use xml2::xml_ns_strip()
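
A short sketch of both options, assuming xml2 is installed. The namespace URI is a placeholder, since the real one is not shown in the question.

```r
library(xml2)

# The namespace URI is a placeholder; the original one is not shown here.
x <- read_xml(
  '<WorkSet xmlns="http://example.com/ns">
     <Description>MFIA 9-Plex (CharlesRiver)</Description>
   </WorkSet>'
)

# Without a prefix the XPath looks for un-namespaced nodes and finds nothing.
no_prefix <- xml_find_all(x, "/WorkSet//Description")

# xml_ns() assigns the prefix d1 to the default namespace.
with_prefix <- xml_find_all(x, "/d1:WorkSet//d1:Description", xml_ns(x))

# Alternatively, strip the default namespace (this modifies x in place).
xml_ns_strip(x)
after_strip <- xml_find_all(x, "/WorkSet//Description")
```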


On Tue, Jan 31, 2017 at 9:43 AM, Mark Sharp  wrote:
> I am trying to read a series of XML files that use a namespace and I have 
> failed, thus far, to discover the proper syntax. I have a reproducible 
> example below. I have two XML character strings defined: one without a 
> namespace and one with. I show that I can successfully extract the node using 
> the XML string without the namespace and fail when using the XML string with 
> the namespace.
> Mark
> PS I am having the same problem with the xml2 package and am hoping 
> understanding one will help with the other.
> ##
> library(XML)
> ## The first XML text (no_ns_xml) does not have a namespace defined
> no_ns_xml <- c("", "",
>"MFIA 9-Plex (CharlesRiver)",
>"")
> l_no_ns_xml <-xmlTreeParse(no_ns_xml, asText = TRUE, getDTD = FALSE,
>useInternalNodes = TRUE)
> ## The node is found
> getNodeSet(l_no_ns_xml, "/WorkSet//Description")
> ## The second XML text (with_ns_xml) has a namespace defined
> with_ns_xml <- c("",
>  "\";>",
>  "MFIA 9-Plex (CharlesRiver)",
>  "")
> l_with_ns_xml <-xmlTreeParse(with_ns_xml, asText = TRUE, getDTD = FALSE,
>useInternalNodes = TRUE)
> ## The node is not found
> getNodeSet(l_with_ns_xml, "/WorkSet//Description")
> ## I attempt to provide the namespace, but fail.
> ns <-  "";
> names(ns)[1] <- "xmlns"
> getNodeSet(l_with_ns_xml, "/WorkSet//Description", namespaces = ns)
> R. Mark Sharp, Ph.D.
> Director of Data Science Core
> Southwest National Primate Research Center
> Texas Biomedical Research Institute
> P.O. Box 760549
> San Antonio, TX 78245-0549
> Telephone: (210)258-9476
> e-mail:



Re: [R] Failure to understand namespaces in XML::getNodeSet

2017-01-31 Thread Hadley Wickham
I think you want

x <- read_xml('";>
  MFIA 9-Plex (CharlesRiver)

The collapse argument doesn't do what you think it does.
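
In short, collapse expects the separator string to put between the elements, not a logical flag. A base R sketch with made-up XML lines (stringr::str_c() takes a collapse string in the same way):

```r
# Made-up lines, as readLines() might return them.
xml_lines <- c("<a>", "  <b>text</b>", "</a>")

# Join the lines into a single string, separated by newlines.
xml_string <- paste(xml_lines, collapse = "\n")
```

The resulting length-one string is what a parser such as read_xml() expects when given literal XML.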


On Tue, Jan 31, 2017 at 5:36 PM, Mark Sharp  wrote:
> Hadley,
> Thank you. I am able to get the xml_ns_strip() function to work with my file 
> directly so I will likely be able to reach my immediate goal.
> However, I still have had no success with understanding the namespace 
> problem. I am not able to use read_xml() using the object I generated for the 
> reproducible example, which is simply a character vector of length 4 having 
> the contents of the XML file as produced by readLines(). I then used dput() to 
> define the structure. The resulting structure apparently is not to the liking 
> of read_xml(). I have reproduced the necessary code here for your 
> convenience. The error is below.
> ##
> library(xml2)
> library(stringr)
> with_ns_xml <- c("",
>  "\";>",
>  "MFIA 9-Plex (CharlesRiver)",
>  "")
> ## without str_c() collapse it complains of a vector of length > 1 also.
> read_xml(str_c(with_ns_xml, collapse = TRUE))
> ## produces the following error message.
> Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = 
> as_html,  :
>   Start tag expected, '<' not found [4]
> I have similar issues with xml2::xml_find_all
> xml_find_all(str_c(with_ns_xml, collapse = TRUE), "/WorkSet//Description")
> ## Produces the following error message.
> Error in UseMethod("xml_find_all") :
>   no applicable method for 'xml_find_all' applied to an object of class 
> "character"
> R. Mark Sharp, Ph.D.
>> On Jan 31, 2017, at 4:27 PM, Hadley Wickham  wrote:
>> See the last example in ?xml2::xml_find_all or use xml2::xml_ns_strip()
>> Hadley
>> On Tue, Jan 31, 2017 at 9:43 AM, Mark Sharp  wrote:
>>> I am trying to read a series of XML files that use a namespace and I have 
>>> failed, thus far, to discover the proper syntax. I have a reproducible 
>>> example below. I have two XML character strings defined: one without a 
>>> namespace and one with. I show that I can successfully extract the node 
>>> using the XML string without the namespace and fail when using the XML 
>>> string with the namespace.
>>> Mark
>>> PS I am having the same problem with the xml2 package and am hoping 
>>> understanding one will help with the other.
>>> ##
>>> library(XML)
>>> ## The first XML text (no_ns_xml) does not have a namespace defined
>>> no_ns_xml <- c("", "",
>>>   "MFIA 9-Plex (CharlesRiver)",
>>>   "")
>>> l_no_ns_xml <-xmlTreeParse(no_ns_xml, asText = TRUE, getDTD = FALSE,
>>>   useInternalNodes = TRUE)
>>> ## The node is found
>>> getNodeSet(l_no_ns_xml, "/WorkSet//Description")
>>> ## The second XML text (with_ns_xml) has a namespace defined
>>> with_ns_xml <- c("",
>>> "\";>",
>>> "MFIA 9-Plex (CharlesRiver)",
>>> "")
>>> l_with_ns_xml <-xmlTreeParse(with_ns_xml, asText = TRUE, getDTD = FALSE,
>>>   useInternalNodes = TRUE)
>>> ## The node is not found
>>> getNodeSet(l_with_ns_xml, "/WorkSet//Description")
>>> ## I attempt to provide the namespace, but fail.
>>> ns <-  "";
>>> names(ns)[1] <- "xmlns"
>>> getNodeSet(l_with_ns_xml, "/WorkSet//Description", namespaces = ns)
>>> R. Mark Sharp, Ph.D.
>>> Director of Data Science Core
>>> Southwest National Primate Research Center
>>> Texas Biomedical Research Institute
>>> P.O. Box 760549
>>> San Antonio, TX 78245-0549
>>> Telephone: (210)258-9476
>>> e-mail: