A bit too fast there, Duncan... x[[c(1,2)]] is illegal.

On July 9, 2021 5:16:13 PM PDT, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
>On 09/07/2021 6:44 p.m., Bert Gunter wrote:
>> OK, I stand somewhat chastised.
>>
>> But my point still is that what you get when you "extract" depends on
>> how you define "extract." Do note that ?"[" yields a help file titled
>> "Extract or Replace Parts of an object"; and afaics, the term "subset"
>> is not explicitly used as Duncan prefers.
>
>?"[[" gives you the same page, but I agree: this part of the
>documentation isn't written very clearly. The "Introduction to R" manual
>uses the terms I used (see section 2.7, "Index vectors; selecting and
>modifying subsets of a data set"), as does the source code (and the R
>Language Definition manual, though it's not as clear as the Intro).
>
>But the point isn't to chastise you, it's to educate you (and the OP).
>Thinking of [] as subsetting is more helpful than thinking of it as
>extraction. That way the result of x[c(1,2)] makes sense. It's a
>little bit more of a stretch, but the result of x[[c(1,2)]] also makes
>sense when you think of it as extraction.
>
>Duncan Murdoch
>
>> The relevant part of the
>> Help file says for "[" for recursive objects says: "Indexing by [ is
>> similar to atomic vectors and selects a list of the specified
>> element(s)." That a data.frame is a list is explicitly stated, as I
>> noted; that lists are in fact vectors is also explicitly stated (?list
>> says: "Almost all lists in R internally are Generic Vectors") but then
>> one is stuck with: a data.frame is a list and therefore a vector, but
>> is.vector(d3) is FALSE. The explanation is explicit again in
>> ?is.vector ("is.vector returns TRUE if x is a vector of the specified
>> mode having no attributes other than names. It returns FALSE
>> otherwise.").
>> But I would say these issues are sufficiently murky that
>> my warning to be precise is not entirely inappropriate; unfortunately,
>> I may have made them more so. Sigh....
>>
>> Cheers,
>> Bert
>>
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>> On Fri, Jul 9, 2021 at 3:05 PM Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
>>>
>>> On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
>>>> "Strictly speaking", Greg is correct, Bert.
>>>>
>>>> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects
>>>>
>>>> Lists in R are vectors. What we colloquially refer to as "vectors"
>>>> are more precisely referred to as "atomic vectors". And without a
>>>> doubt, this "vector" nature of lists is a key underlying concept that
>>>> explains why adding a dim attribute creates a matrix that can hold data
>>>> frames. It is also a stumbling block for programmers from other
>>>> languages that have things like linked lists.
>>>
>>> I would also object to v3 (below) as "extracting" a column from d.
>>> "d[2]" doesn't extract anything, it "subsets" the data frame, so the
>>> result is a data frame, not what you get when you extract something from
>>> a data frame.
>>>
>>> People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal.
>>> That extracts the 3rd element (the number 3). The problem is that R has
>>> no way to represent a scalar number, only a vector of numbers, so x[[3]]
>>> gets promoted to a vector containing that number when it is returned and
>>> assigned to y.
>>>
>>> Lists are vectors of R objects, so if x is a list, x[[3]] is something
>>> that can be returned, and it is different from x[3].
>>>
>>> Duncan Murdoch
>>>
>>>>
>>>> On July 9, 2021 2:36:19 PM PDT, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>>> "1. a column, when extracted from a data frame, *is* a vector."
>>>>> Strictly speaking, this is false; it depends on exactly what is meant
>>>>> by "extracted." e.g.:
>>>>>
>>>>> > d <- data.frame(col1 = 1:3, col2 = letters[1:3])
>>>>> > v1 <- d[,2] ## a vector
>>>>> > v2 <- d[[2]] ## the same, i.e
>>>>> > identical(v1,v2)
>>>>> [1] TRUE
>>>>> > v3 <- d[2] ## a data.frame
>>>>> > v1
>>>>> [1] "a" "b" "c" ## a character vector
>>>>> > v3
>>>>>   col2
>>>>> 1    a
>>>>> 2    b
>>>>> 3    c
>>>>> > is.vector(v1)
>>>>> [1] TRUE
>>>>> > is.vector(v3)
>>>>> [1] FALSE
>>>>> > class(v3) ## data.frame
>>>>> [1] "data.frame"
>>>>> ## but
>>>>> > is.list(v3)
>>>>> [1] TRUE
>>>>>
>>>>> which is simply explained in ?data.frame (where else?!) by:
>>>>> "A data frame is a **list** [emphasis added] of variables of the same
>>>>> number of rows with unique row names, given class "data.frame". If no
>>>>> variables are included, the row names determine the number of rows."
>>>>>
>>>>> "2. maybe your question is "is a given function for a vector, or for a
>>>>> data frame/matrix/array?". if so, i think the only way is reading
>>>>> the help information (?foo)."
>>>>>
>>>>> Indeed! Is this not what the Help system is for?! But note also that
>>>>> the S3 class system may somewhat blur the issue: foo() may work
>>>>> appropriately and differently for different (S3) classes of objects. A
>>>>> detailed explanation of this behavior can be found in appropriate
>>>>> resources or (more tersely) via ?UseMethod .
>>>>>
>>>>> "you might find reading ?"[" and ?"[.data.frame" useful"
>>>>>
>>>>> Not just 'useful" -- **essential** if you want to work in R, unless
>>>>> one gets this information via any of the numerous online tutorials,
>>>>> courses, or books that are available. The Help system is accurate and
>>>>> authoritative, but terse. I happen to like this mode of documentation,
>>>>> but others may prefer more extended expositions.
>>>>> I stand by this claim
>>>>> even if one chooses to use the "Tidyverse", data.table package, or
>>>>> other alternative frameworks for handling data. Again, others may
>>>>> disagree, but R is structured around these basics, and imo one remains
>>>>> ignorant of them at their peril.
>>>>>
>>>>> Cheers,
>>>>> Bert
>>>>>
>>>>>
>>>>> Bert Gunter
>>>>>
>>>>> "The trouble with having an open mind is that people keep coming along
>>>>> and sticking things into it."
>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>
>>>>> On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minsh...@umich.edu>
>>>>> wrote:
>>>>>>
>>>>>> Kai,
>>>>>>
>>>>>>> one more question, how can I know if the function is for column
>>>>>>> manipulations or for vector?
>>>>>>
>>>>>> i still stumble around R code. but, i'd say the following (and look
>>>>>> forward to being corrected! :):
>>>>>>
>>>>>> 1. a column, when extracted from a data frame, *is* a vector.
>>>>>>
>>>>>> 2. maybe your question is "is a given function for a vector, or for a
>>>>>>    data frame/matrix/array?". if so, i think the only way is reading
>>>>>>    the help information (?foo).
>>>>>>
>>>>>> 3. sometimes, extracting the column as a vector from a data frame-like
>>>>>>    object might be non-intuitive. you might find reading ?"[" and
>>>>>>    ?"[.data.frame" useful (as well as ?"[.data.table" if you use that
>>>>>>    package). also, the str() command can be helpful in understanding
>>>>>>    what is happening. (the lobstr:: package's sxp() function, as well
>>>>>>    as more verbose .Internal(inspect()) can also give you insight.)
>>>>>>
>>>>>>    with the data.table:: package, for example, if "DT" is a data.table
>>>>>>    object, with "x2" as a column, adding or leaving off quotation marks
>>>>>>    for the column name can make all the difference between ending up
>>>>>>    with a vector, or with a (much reduced) data table:
>>>>>> ----
>>>>>> > is.vector(DT[, x2])
>>>>>> [1] TRUE
>>>>>> > str(DT[, x2])
>>>>>>  num [1:9] 32 32 32 32 32 32 32 32 32
>>>>>> >
>>>>>> > is.vector(DT[, "x2"])
>>>>>> [1] FALSE
>>>>>> > str(DT[, "x2"])
>>>>>> Classes ‘data.table’ and 'data.frame': 9 obs. of 1 variable:
>>>>>>  $ x2: num 32 32 32 32 32 32 32 32 32
>>>>>>  - attr(*, ".internal.selfref")=<externalptr>
>>>>>> ----
>>>>>>
>>>>>>    a second level of indexing may or may not help, mostly depending on
>>>>>>    the use of '[' versus of '[['. this can sometimes cause confusion
>>>>>>    when you are learning the language.
>>>>>> ----
>>>>>> > str(DT[, "x2"][1])
>>>>>> Classes ‘data.table’ and 'data.frame': 1 obs. of 1 variable:
>>>>>>  $ x2: num 32
>>>>>>  - attr(*, ".internal.selfref")=<externalptr>
>>>>>> > str(DT[, "x2"][[1]])
>>>>>>  num [1:9] 32 32 32 32 32 32 32 32 32
>>>>>> ----
>>>>>>
>>>>>>    the tibble:: package (used in, e.g., the dplyr:: package) also
>>>>>>    (always?) returns a single column as a non-vector. again, a
>>>>>>    second indexing with double '[[]]' can produce a vector.
>>>>>> ----
>>>>>> > DP <- tibble(DT)
>>>>>> > is.vector(DP[, "x2"])
>>>>>> [1] FALSE
>>>>>> > is.vector(DP[, "x2"][[1]])
>>>>>> [1] TRUE
>>>>>> ----
>>>>>>
>>>>>>    but, note that a list of lists is also a vector:
>>>>>> > is.vector(list(list(1), list(1,2,3)))
>>>>>> [1] TRUE
>>>>>> > str(list(list(1), list(1,2,3)))
>>>>>> List of 2
>>>>>>  $ :List of 1
>>>>>>   ..$ : num 1
>>>>>>  $ :List of 3
>>>>>>   ..$ : num 1
>>>>>>   ..$ : num 2
>>>>>>   ..$ : num 3
>>>>>>
>>>>>>    etc.
>>>>>>
>>>>>> hth. good luck learning!
>>>>>>
>>>>>> cheers, Greg
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
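
To make the [ versus [[ distinction in this thread concrete, here is a minimal sketch in base R. The "illegal" remark at the top refers to an atomic vector such as x <- 1:10; per ?Extract, on a list a vector index inside [[ is applied recursively instead. The object names (lst, nested, cell and so on) are made up for illustration, and stringsAsFactors = FALSE is set explicitly so the column is a character vector regardless of R version.

```r
# Atomic vector: [ subsets, [[ extracts exactly one element.
x <- 1:10
sub <- x[c(1, 2)]   # a subset of length 2
one <- x[[3]]       # one element, returned as a length-1 vector

# [[ with more than one index on an atomic vector signals an error;
# capture the message rather than stopping the script.
atomic_err <- tryCatch(x[[c(1, 2)]], error = function(e) conditionMessage(e))

# List: a vector index inside [[ is applied recursively.
lst <- list(list("a", "b"), "c")
nested <- lst[[c(1, 2)]]   # same as lst[[1]][[2]], i.e. "b"

# Data frame (a list of columns): [ keeps the data frame, [[ returns the column.
d <- data.frame(col1 = 1:3, col2 = letters[1:3], stringsAsFactors = FALSE)
col_df  <- d[2]            # one-column data frame
col_vec <- d[[2]]          # character vector
cell    <- d[[c(1, 2)]]    # recursive: d[[1]][[2]], the 2nd value of col1
```

So d[2] stays a data frame, d[[2]] is the column itself, and whether x[[c(1,2)]] works depends on whether x is atomic or a list.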
-- 
Sent from my phone. Please excuse my brevity.