Gents: You've both been polite and thoughtful, but I think you should take your discussion private, no?
-- Bert

On Wed, May 22, 2013 at 12:57 PM, Alexandre Sieira
<alexandre.sie...@gmail.com> wrote:
> Please let's not turn this into an ad hominem discussion by adding remarks on
> what the other thinks or knows, as this will get us nowhere fast. Let's focus
> on the issue, ok? :)
>
> Again, the point behind my workaround was to try to change the rest of my
> program as little as possible while I waited for the maintainer of the hash
> package to respond. I found it was an acceptable compromise, even if it does,
> as you say, add complexity.
>
> As for embracing vectorization, I got into this problem exactly because I
> wanted the data to be returned in a vector using the values() function in
> the first place.
>
> I agree with your observation that simpler is better. However, I won't get
> into the details of why I decided to use hash instead of other data
> structures in my architecture, since I don't mean to put that up for
> discussion on a public list. I understand you offered alternatives with the
> best of intentions, and I thank you. But after careful consideration I still
> think using hash is the best option and will stick with it in my code.
>
> Given those premises, I would ask you and the list again if you think there
> is a better way of achieving what my unlistPOSIXct function does that is
> closer to the natural paradigm of R. The only equivalent I found in base R is
> the unlist function, but its documentation explicitly states it will coerce
> data to primitive data types. So unfortunately it doesn't help me.
>
> Working with POSIXct in a list precludes me from doing lots of necessary
> operations in a vectorized way, such as min() and max(), that will work on
> POSIXct vectors. That is why I need to convert the list back into a vector in
> an efficient manner and without unclassing the objects. I would really
> appreciate any help with that.
>
> Thank you again for your interest and advice.
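
On the unlistPOSIXct question above, here is a minimal sketch rather than a definitive answer. It assumes the workaround list holds one length-one POSIXct per key; the `stamps` list and its timestamps are made up for illustration and stand in for what values(h) returns, so the hash package itself is not used. The idea is that c() has a POSIXct method, so do.call(c, ...) concatenates the elements without dropping the class, and single-bracket indexing keeps the class when going back to a list.

```r
# Hypothetical stand-in for the list returned by values(h) in the workaround:
# one length-one POSIXct per key (timestamps invented for illustration).
t0 <- as.POSIXct("2013-05-21 09:54:03", tz = "UTC")
stamps <- list(a = t0, b = t0 + 4, c = t0 + 8)

# list -> vector: c() dispatches on the class of its first argument,
# so the POSIXct method is used and the result is still POSIXct.
v <- do.call(c, stamps)
print(class(v))
print(min(v))

# vector -> list: `[` has a POSIXct method, so each element keeps its class.
l <- lapply(seq_along(v), function(i) v[i])
str(l)
```

min() and max() then work on v directly. as.list(v) may be a shorter spelling of the last step on recent R versions, but that is worth checking on the version in use.
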
>
> --
> Alexandre Sieira
> CISA, CISSP, ISO 27001 Lead Auditor
>
> "The truth is rarely pure and never simple."
> Oscar Wilde, The Importance of Being Earnest, 1895, Act I
>
> On May 22, 2013 at 15:59:46, Jeff Newmiller (jdnew...@dcn.davis.ca.us)
> wrote:
> My perception of illogic was in your addition of more data structure
> complexity when faced with this difficulty. R has best performance when
> calculations are pushed into simple typed vectors where precompiled code can
> handle the majority of the work. These are simpler structures, not more
> complex structures. It seems like you are fighting the natural paradigm for
> working in R and holding fast to your ideas about how things "should be"
> rather than dealing with how they "are" by introducing lists rather than
> working with vectors or data frames.
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> Alexandre Sieira <alexandre.sie...@gmail.com> wrote:
>
>>Hi, Jeff.
>>
>>Thanks for your thoughtful suggestions.
>>
>>I do not plan to wait for the hash package to be redesigned to meet my
>>expectations. As a matter of fact, I have:
>>
>>a) Submitted a report of unexpected behavior in hash::values, which the
>>package maintainer quickly replied to and said they would examine.
>>b) Designed (with the help of this list) and implemented a workaround
>>in the form of wrapping the POSIXct objects in lists, which has my
>>program working correctly for now.
>>
>>If the hash package is updated and the workaround is no longer
>>necessary, then I'll reverse this change.
>>Otherwise, I'll look more deeply into my alternatives, which might
>>involve maintaining this workaround permanently or analyzing
>>alternative architectures.
>>
>>The hash package is a beautiful piece of code that is working perfectly
>>for me in many situations. Even with the list wrapping around the
>>POSIXct objects, it is meeting my performance requirements much better
>>than the alternatives I tested. So I'd rather not completely
>>re-engineer working complex code without a very good reason.
>>
>>However, I would like to respectfully disagree with you that my
>>reaction to hash::values behavior was illogical. I don't want to start
>>a flame war or anything, so let's try to keep the discussion civil. :)
>>
>>See, a hash table (or a queue, or a stack, or an R vector) is a data
>>structure that works as a container. You insert objects and you get
>>them back according to the specifics of each data structure (stacks
>>will have a FILO ordering, queues will have FIFO ordering, hashes will
>>maintain key/value pairs, and so on).
>>
>>It is completely unreasonable to insert an object of class X into a
>>container, and then get it back altered in a way that is not part of
>>the 'contract' behind the data structure. If I assign X to key K on a
>>hash, however I choose to ask the hash for the value associated with
>>key K back, I should get exactly X as a response. I believe most
>>computer scientists would agree that to be self-evident.
>>
>>And that is to be expected by reading hash::values documentation:
>>
>> Extract values from a hash object. This is a pseudo- accessor method
>>that returns hash values (without keys) as a vector if possible, a list
>>otherwise.
>>
>>
>>Moreover, it has this to say about non-primitive types:
>>
>> If the values are of different types or of a complex class than a
>>named list is returned.
>>
>>
>>It never says it will unclass objects, or coerce them into primitive
>>types.
>>Hence the 'contract' implies I will get back what I inserted,
>>unaltered, either in a vector or a list. And that is provably not what
>>is happening. I would have been ok with a vector of POSIXct or a named
>>list containing the POSIXct values, but instead I am getting a numeric
>>vector.
>>
>>I understand R is based on S, and that OOP concepts were introduced
>>later in its history. However, one of the key concepts in OOP is
>>encapsulation - as an outside entity you do not get to see the internal
>>implementation of a class, you interact with it exclusively through its
>>published "interface" (methods, public member variables, etc).
>>
>>I cannot find any justification for why an object "losing" its class
>>unintentionally is ever acceptable, as it violates the concept of
>>encapsulation. That is essentially what's happening if I look up
>>several keys using values(). So this violates the encapsulation of the
>>POSIXct class, as I am exposed to its internal numeric value. Moreover,
>>it breaks the "method-dispatch" of R functions that know to treat
>>POSIXct values differently. All of a sudden, the POSIXct objects I
>>inserted are being treated, for example, by format as numeric instead
>>of being dispatched to format.POSIXct as expected.
>>
>>So I don't think my reaction to this issue was illogical at all. Hope
>>you'll agree now that I've explained myself a little better. :)
>>
>>On May 21, 2013 at 22:44:19, Jeff Newmiller
>>(jdnew...@dcn.davis.ca.us) wrote:
>>I recommend that you not plan on waiting for the hash package to be
>>redesigned to meet your expectations. Also, your response to
>>discovering this feature of the hash package seems illogical.
>>
>>From a computer science perspective, the hash mechanism is an
>>implementation trick that is intended to improve lookup speed.
>>It does not actually represent a fundamental data structure like a
>>vector or a set does. You can always put your keys in a vector and
>>search through them (e.g. vector indexing by string) to get an
>>equivalent data retrieval. If the hash package is not improving the
>>speed of your data access, adding an extra layer of data structure is
>>hardly an appropriate solution.
>>
>>Why are you not using normal vectors or data frames and accessing with
>>string or logical indexing?
>>
>>If you are avoiding vectors because they seem slow in loops, perhaps
>>you just need to preallocate the vectors you will store your results in
>>before your loop to regain acceptable speed. Or, perhaps the
>>duplicated() or merge() functions could save you from this mess of
>>incremental data processing.
>>
>>Alexandre Sieira <alexandre.sie...@gmail.com> wrote:
>>
>>>You are absolutely right.
>>>
>>>I am storing POSIXct objects into a hash (from the hash package).
>>>However, if I try to get them out as a vector using the values()
>>>function, they are unclassed. And that breaks my (highly vectorized)
>>>code. Take a look at this:
>>>
>>>
>>> > h = hash()
>>> > h[["a"]] = Sys.time()
>>> > str(h[["a"]])
>>> POSIXct[1:1], format: "2013-05-20 16:54:28"
>>> > str(values(h))
>>> Named num 1.37e+09
>>> - attr(*, "names")= chr "a"
>>>
>>>
>>>I have reported this to the hash package maintainers. In the meantime,
>>>however, I am storing, for each key, a list containing a single
>>>POSIXct.
>>>Then, when I extract all using values(), I get a list
>>>containing all POSIXct entries with class preserved.
>>>
>>>
>>> > h = hash()
>>> > h[["a"]] = list( Sys.time() )
>>> > h[["b"]] = list( Sys.time() )
>>> > h[["c"]] = list( Sys.time() )
>>> > values(h)
>>>$a
>>>[1] "2013-05-21 09:54:03 BRT"
>>>
>>>$b
>>>[1] "2013-05-21 09:54:07 BRT"
>>>
>>>$c
>>>[1] "2013-05-21 09:54:11 BRT"
>>>
>>> > str(values(h))
>>>List of 3
>>> $ a: POSIXct[1:1], format: "2013-05-21 09:54:03"
>>> $ b: POSIXct[1:1], format: "2013-05-21 09:54:07"
>>> $ c: POSIXct[1:1], format: "2013-05-21 09:54:11"
>>>
>>>
>>>However, the next thing I need to do is a min() over that list, so I
>>>need to convert the list into a vector again.
>>>
>>>I agree completely with you that this is horrible for performance, but
>>>it is a temporary workaround until values() is "fixed".
>>>
>>>On May 20, 2013 at 19:40:14, Jeff Newmiller
>>>(jdnew...@dcn.davis.ca.us) wrote:
>>>I don't know what you plan to do with this list, but lists are quite a
>>>bit less efficient than fixed-mode vectors, so you are likely losing a
>>>lot of computational speed by using this list. I don't hesitate to use
>>>simple data frames (lists of vectors), but processing lists is on par
>>>with for loops, not vectorized computation. It may still support a
>>>simpler model of computation, but that is an analyst comprehension
>>>benefit rather than a computational efficiency benefit.
>>>
>>>Alexandre Sieira <alexandre.sie...@gmail.com> wrote:
>>>
>>>>I was trying to convert a vector of POSIXct into a list of POSIXct.
>>>>However, I had a problem that I wanted to share with you.
>>>>
>>>>Works fine with, say, numeric:
>>>>
>>>>
>>>> > v = c(1, 2, 3)
>>>> > v
>>>>[1] 1 2 3
>>>> > str(v)
>>>> num [1:3] 1 2 3
>>>> > l = as.vector(v, mode="list")
>>>> > l
>>>>[[1]]
>>>>[1] 1
>>>>
>>>>[[2]]
>>>>[1] 2
>>>>
>>>>[[3]]
>>>>[1] 3
>>>>
>>>> > str(l)
>>>>List of 3
>>>> $ : num 1
>>>> $ : num 2
>>>> $ : num 3
>>>>
>>>>If you try it with POSIXct, on the other hand…
>>>>
>>>>
>>>> > v = c(Sys.time(), Sys.time())
>>>> > v
>>>>[1] "2013-05-20 18:02:07 BRT" "2013-05-20 18:02:07 BRT"
>>>> > str(v)
>>>> POSIXct[1:2], format: "2013-05-20 18:02:07" "2013-05-20 18:02:07"
>>>> > l = as.vector(v, mode="list")
>>>> > l
>>>>[[1]]
>>>>[1] 1369083728
>>>>
>>>>[[2]]
>>>>[1] 1369083728
>>>>
>>>> > str(l)
>>>>List of 2
>>>> $ : num 1.37e+09
>>>> $ : num 1.37e+09
>>>>
>>>>The POSIXct values are coerced to numeric, which is unexpected.
>>>>
>>>>The documentation for as.vector says: "The default method handles 24
>>>>input types and 12 values of type: the details of most coercions are
>>>>undocumented and subject to change." It would appear that treatment
>>>>for POSIXct is either missing or needs adjustment.
>>>>
>>>>Unlist (for the reverse) is documented as converting to base types, so
>>>>I can't complain.
>>>>Just wanted to share that I ended up giving up on
>>>>vectorization and writing the two following functions:
>>>>
>>>>
>>>># list of POSIXct -> POSIXct vector, keeping the class
>>>>unlistPOSIXct <- function(x) {
>>>>  retval = rep(Sys.time(), length(x))
>>>>  # seq_along() is safe for an empty list, unlike 1:length(x)
>>>>  for (i in seq_along(x)) retval[i] = x[[i]]
>>>>  return(retval)
>>>>}
>>>>
>>>># POSIXct vector -> list of length-one POSIXct
>>>>listPOSIXct <- function(x) {
>>>>  retval = list()
>>>>  for (i in seq_along(x)) retval[[i]] = x[i]
>>>>  return(retval)
>>>>}
>>>>
>>>>Is there a better way to do this (other than using *apply instead of
>>>>for above) that better leverages vectorization? Am I missing something
>>>>here?
>>>>
>>>>Thanks!
>>>>
>>>>______________________________________________
>>>>R-help@r-project.org mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide
>>>>http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.

--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm