My mental model for `[` vs `[[` is that `[` can select multiple elements while `[[` selects exactly one. When multiple elements are returned from a list, the result must itself be a list, so for consistency `[` always returns a list when applied to a list, even if only one element is selected. The double bracket drops the containing list and returns the element itself.
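A quick sketch of that model, in case it helps. The list `x` and the data frame `d` are made-up examples, not anything from the earlier messages:

```r
# Hypothetical example list; the names are for illustration only
x <- list(a = 1:3, b = "text", c = c(TRUE, FALSE))

sub_many <- x[c("a", "c")]  # `[` with two names: a list of two elements
sub_one  <- x["a"]          # `[` with one name: still a list, of length 1
elem     <- x[["a"]]        # `[[`: the element itself, an integer vector

is.list(sub_many)     # TRUE
is.list(sub_one)      # TRUE
is.list(elem)         # FALSE
identical(elem, 1:3)  # TRUE

# A data frame is a list of columns, so the same rule applies to it
d <- data.frame(col1 = 1:3, col2 = letters[1:3], stringsAsFactors = FALSE)
class(d["col2"])      # "data.frame"
class(d[["col2"]])    # "character"
```

So `[` keeps the container and `[[` reaches inside it, whether the container is a plain list or a data frame.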
The is.vector() behavior is not intuitive to me... I avoid that function, as I think it is more useful to think of lists as vectors than as something "other".

On July 9, 2021 3:44:29 PM PDT, Bert Gunter <bgunter.4...@gmail.com> wrote:
>OK, I stand somewhat chastised.
>
>But my point still is that what you get when you "extract" depends on
>how you define "extract." Do note that ?"[" yields a help file titled
>"Extract or Replace Parts of an object"; and afaics, the term "subset"
>is not explicitly used as Duncan prefers. The relevant part of the
>Help file says for "[" for recursive objects says: "Indexing by [ is
>similar to atomic vectors and selects a list of the specified
>element(s)." That a data.frame is a list is explicitly stated, as I
>noted; that lists are in fact vectors is also explicitly stated (?list
>says: "Almost all lists in R internally are Generic Vectors") but then
>one is stuck with: a data.frame is a list and therefore a vector, but
>is.vector(d3) is FALSE. The explanation is explicit again in
>?is.vector ("is.vector returns TRUE if x is a vector of the specified
>mode having no attributes other than names. It returns FALSE
>otherwise."). But I would say these issues are sufficiently murky that
>my warning to be precise is not entirely inappropriate; unfortunately,
>I may have made them more so. Sigh....
>
>Cheers,
>Bert
>
>
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>On Fri, Jul 9, 2021 at 3:05 PM Duncan Murdoch
><murdoch.dun...@gmail.com> wrote:
>>
>> On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
>> > "Strictly speaking", Greg is correct, Bert.
>> >
>> > https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects
>> >
>> > Lists in R are vectors. What we colloquially refer to as "vectors"
>> > are more precisely referred to as "atomic vectors". And without a
>> > doubt, this "vector" nature of lists is a key underlying concept that
>> > explains why adding a dim attribute creates a matrix that can hold
>> > data frames. It is also a stumbling block for programmers from other
>> > languages that have things like linked lists.
>>
>> I would also object to v3 (below) as "extracting" a column from d.
>> "d[2]" doesn't extract anything, it "subsets" the data frame, so the
>> result is a data frame, not what you get when you extract something
>> from a data frame.
>>
>> People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly
>> legal. That extracts the 3rd element (the number 3). The problem is
>> that R has no way to represent a scalar number, only a vector of
>> numbers, so x[[3]] gets promoted to a vector containing that number
>> when it is returned and assigned to y.
>>
>> Lists are vectors of R objects, so if x is a list, x[[3]] is
>> something that can be returned, and it is different from x[3].
>>
>> Duncan Murdoch
>>
>> >
>> > On July 9, 2021 2:36:19 PM PDT, Bert Gunter
>> > <bgunter.4...@gmail.com> wrote:
>> >> "1. a column, when extracted from a data frame, *is* a vector."
>> >> Strictly speaking, this is false; it depends on exactly what is
>> >> meant by "extracted." e.g.:
>> >>
>> >> > d <- data.frame(col1 = 1:3, col2 = letters[1:3])
>> >> > v1 <- d[,2] ## a vector
>> >> > v2 <- d[[2]] ## the same, i.e
>> >> > identical(v1,v2)
>> >> [1] TRUE
>> >> > v3 <- d[2] ## a data.frame
>> >> > v1
>> >> [1] "a" "b" "c" ## a character vector
>> >> > v3
>> >>   col2
>> >> 1    a
>> >> 2    b
>> >> 3    c
>> >> > is.vector(v1)
>> >> [1] TRUE
>> >> > is.vector(v3)
>> >> [1] FALSE
>> >> > class(v3) ## data.frame
>> >> [1] "data.frame"
>> >> ## but
>> >> > is.list(v3)
>> >> [1] TRUE
>> >>
>> >> which is simply explained in ?data.frame (where else?!) by:
>> >> "A data frame is a **list** [emphasis added] of variables of the
>> >> same number of rows with unique row names, given class
>> >> "data.frame". If no variables are included, the row names
>> >> determine the number of rows."
>> >>
>> >> "2. maybe your question is "is a given function for a vector, or
>> >> for a data frame/matrix/array?". if so, i think the only way is
>> >> reading the help information (?foo)."
>> >>
>> >> Indeed! Is this not what the Help system is for?! But note also
>> >> that the S3 class system may somewhat blur the issue: foo() may
>> >> work appropriately and differently for different (S3) classes of
>> >> objects. A detailed explanation of this behavior can be found in
>> >> appropriate resources or (more tersely) via ?UseMethod .
>> >>
>> >> "you might find reading ?"[" and ?"[.data.frame" useful"
>> >>
>> >> Not just "useful" -- **essential** if you want to work in R,
>> >> unless one gets this information via any of the numerous online
>> >> tutorials, courses, or books that are available. The Help system
>> >> is accurate and authoritative, but terse. I happen to like this
>> >> mode of documentation, but others may prefer more extended
>> >> expositions. I stand by this claim even if one chooses to use the
>> >> "Tidyverse", data.table package, or other alternative frameworks
>> >> for handling data. Again, others may disagree, but R is structured
>> >> around these basics, and imo one remains ignorant of them at their
>> >> peril.
>> >>
>> >> Cheers,
>> >> Bert
>> >>
>> >>
>> >> Bert Gunter
>> >>
>> >> "The trouble with having an open mind is that people keep coming
>> >> along and sticking things into it."
>> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >>
>> >> On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minsh...@umich.edu>
>> >> wrote:
>> >>>
>> >>> Kai,
>> >>>
>> >>>> one more question, how can I know if the function is for column
>> >>>> manipulations or for vector?
>> >>>
>> >>> i still stumble around R code. but, i'd say the following (and
>> >>> look forward to being corrected! :):
>> >>>
>> >>> 1. a column, when extracted from a data frame, *is* a vector.
>> >>>
>> >>> 2. maybe your question is "is a given function for a vector, or
>> >>>    for a data frame/matrix/array?". if so, i think the only way
>> >>>    is reading the help information (?foo).
>> >>>
>> >>> 3. sometimes, extracting the column as a vector from a data
>> >>>    frame-like object might be non-intuitive. you might find
>> >>>    reading ?"[" and ?"[.data.frame" useful (as well as
>> >>>    ?"[.data.table" if you use that package). also, the str()
>> >>>    command can be helpful in understanding what is happening.
>> >>>    (the lobstr:: package's sxp() function, as well as more
>> >>>    verbose .Internal(inspect()) can also give you insight.)
>> >>>
>> >>>    with the data.table:: package, for example, if "DT" is a
>> >>>    data.table object, with "x2" as a column, adding or leaving
>> >>>    off quotation marks for the column name can make all the
>> >>>    difference between ending up with a vector, or with a (much
>> >>>    reduced) data table:
>> >>> ----
>> >>> > is.vector(DT[, x2])
>> >>> [1] TRUE
>> >>> > str(DT[, x2])
>> >>>  num [1:9] 32 32 32 32 32 32 32 32 32
>> >>> >
>> >>> > is.vector(DT[, "x2"])
>> >>> [1] FALSE
>> >>> > str(DT[, "x2"])
>> >>> Classes ‘data.table’ and 'data.frame': 9 obs. of 1 variable:
>> >>>  $ x2: num 32 32 32 32 32 32 32 32 32
>> >>>  - attr(*, ".internal.selfref")=<externalptr>
>> >>> ----
>> >>>
>> >>>    a second level of indexing may or may not help, mostly
>> >>>    depending on the use of '[' versus of '[['. this can
>> >>>    sometimes cause confusion when you are learning the language.
>> >>> ----
>> >>> > str(DT[, "x2"][1])
>> >>> Classes ‘data.table’ and 'data.frame': 1 obs. of 1 variable:
>> >>>  $ x2: num 32
>> >>>  - attr(*, ".internal.selfref")=<externalptr>
>> >>> > str(DT[, "x2"][[1]])
>> >>>  num [1:9] 32 32 32 32 32 32 32 32 32
>> >>> ----
>> >>>
>> >>>    the tibble:: package (used in, e.g., the dplyr:: package)
>> >>>    also (always?) returns a single column as a non-vector.
>> >>>    again, a second indexing with double '[[]]' can produce a
>> >>>    vector.
>> >>> ----
>> >>> > DP <- tibble(DT)
>> >>> > is.vector(DP[, "x2"])
>> >>> [1] FALSE
>> >>> > is.vector(DP[, "x2"][[1]])
>> >>> [1] TRUE
>> >>> ----
>> >>>
>> >>> but, note that a list of lists is also a vector:
>> >>> > is.vector(list(list(1), list(1,2,3)))
>> >>> [1] TRUE
>> >>> > str(list(list(1), list(1,2,3)))
>> >>> List of 2
>> >>>  $ :List of 1
>> >>>   ..$ : num 1
>> >>>  $ :List of 3
>> >>>   ..$ : num 1
>> >>>   ..$ : num 2
>> >>>   ..$ : num 3
>> >>>
>> >>> etc.
>> >>>
>> >>> hth. good luck learning!
>> >>>
>> >>> cheers, Greg
>> >>>
>> >>> ______________________________________________
>> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> PLEASE do read the posting guide
>> >>> http://www.R-project.org/posting-guide.html
>> >>> and provide commented, minimal, self-contained, reproducible
>> >>> code.
>> >
>>

--
Sent from my phone. Please excuse my brevity.