Gents:

You've both been polite and thoughtful, but I think you should take
your discussion private, no?

-- Bert

On Wed, May 22, 2013 at 12:57 PM, Alexandre Sieira
<alexandre.sie...@gmail.com> wrote:
> Please let's not turn this into an ad hominem discussion by adding remarks on 
> what the other thinks or knows, as this will get us nowhere fast. Let's focus 
> on the issue, ok? :)
>
> Again, the point behind my workaround was to try to change the rest of my 
> program as little as possible while I waited for the maintainer of the hash 
> package to respond. I found it was an acceptable compromise, even if it does, 
> as you say, add complexity.
>
>
> As for embracing vectorization, I got into this problem exactly because I 
> wanted the data to be returned in a vector using the values() function in 
> the first place.
>
>
> I agree with your observation that simpler is better. However, I won't get 
> into the details of why I decided to use hash instead of other data 
> structures in my architecture, since I don't mean to put that up for 
> discussion on a public list. I understand you offered alternatives with the 
> best of intentions, and I thank you. But after careful consideration I still 
> think using hash is the best option and will stick with it in my code.
>
> Given those premises, I would ask you and the list again if you think there 
> is a better way of achieving what my unlistPOSIXct function does that is 
> closer to the natural paradigm of R. The only equivalent I found in base R is 
> the unlist function, but its documentation explicitly states it will coerce 
> data to primitive data types. So unfortunately it doesn't help me.
>
> Working with POSIXct in a list precludes me from doing lots of necessary 
> operations in a vectorized way, such as min() and max(), that will work on 
> POSIXct vectors. That is why I need to convert the list back into a vector in 
> an efficient manner and without unclassing the objects. Would really 
> appreciate any help with that.
>
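
One base-R idiom that may fit the request above is to hand the list to c() through do.call(), so that the POSIXct method of c() does the combining and the class survives. This is only a sketch: the sample timestamps and the names lst and vec are illustrative, and the time zone attached to the result can differ between R versions.

```r
# Three POSIXct values held in a named list, as values() returns them
# under the list-wrapping workaround (sample data, fixed for repeatability).
t0  <- as.POSIXct("2013-05-21 09:54:03", tz = "UTC")
lst <- list(a = t0, b = t0 + 4, c = t0 + 8)

# do.call() passes the list elements as separate arguments to c().
# The first argument is a POSIXct, so c() dispatches to its POSIXct method.
# unname() stops list names from being read as argument names.
vec <- do.call(c, unname(lst))
names(vec) <- names(lst)   # restore the keys afterwards

class(vec)   # still a date-time class, not a bare numeric
min(vec)     # vectorized operations work again
max(vec)
```

Note that do.call(c, list()) returns NULL, so an empty list would need separate handling.
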
> Thank you again for your interest and advice.
>
> --
> Alexandre Sieira
> CISA, CISSP, ISO 27001 Lead Auditor
>
> "The truth is rarely pure and never simple."
> Oscar Wilde, The Importance of Being Earnest, 1895, Act I
> On 22 May 2013 at 15:59:46, Jeff Newmiller (jdnew...@dcn.davis.ca.us) 
> wrote:
> My perception of illogic was in your addition of more data structure 
> complexity when faced with this difficulty. R has best performance when 
> calculations are pushed into simple typed vectors where precompiled code can 
> handle the majority of the work. These are simpler structures, not more 
> complex structures. It seems like you are fighting the natural paradigm for 
> working in R and holding fast to your ideas about how things "should be" 
> rather than dealing with how they "are" by introducing lists rather than 
> working with vectors or data frames.
> ---------------------------------------------------------------------------
> Jeff Newmiller The ..... ..... Go Live...
> DCN:<jdnew...@dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
> Live: OO#.. Dead: OO#.. Playing
> Research Engineer (Solar/Batteries O.O#. #.O#. with
> /Software/Embedded Controllers) .OO#. .OO#. rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> Alexandre Sieira <alexandre.sie...@gmail.com> wrote:
>
>>Hi, Jeff.
>>
>>Thanks for your thoughtful suggestions.
>>
>>I do not plan to wait for the hash package to be redesigned to meet my
>>expectations. As a matter of fact, I have:
>>
>>a) Submitted a report of unexpected behavior in hash::values, which the
>>package maintainer quickly replied to and said would examine.
>>b) Designed (with the help of this list) and implemented a workaround
>>in the form of wrapping the POSIXct objects in lists, which has my
>>program working correctly for now.
>>
>>If the hash package is updated and the workaround is no longer
>>necessary, then I'll reverse this change. Otherwise, I'll look more
>>deeply into my alternatives which might involve maintaining this
>>workaround permanently, or analyzing alternative architectures.
>>
>>The hash package is a beautiful piece of code that is working perfectly
>>for me in many situations. Even with the list wrapping around the
>>POSIXct objects, it is meeting my performance requirements much better
>>than the alternatives I tested. So I'd rather not completely
>>re-engineer working complex code without a very good reason.
>>
>>However, I would like to respectfully disagree with you that my
>>reaction to hash::values behavior was illogical. I don't want to start
>>a flame war or anything, so let's try to keep the discussion civil. :)
>>
>>See, a hash table (or a queue, or a stack, or an R vector) is a data
>>structure that works as a container. You insert objects and you get
>>them back according to the specificities of each data structure (stacks
>>will have a FILO ordering, queues will have FIFO ordering, hashes will
>>maintain key/value pairs, and so on).
>>
>>It is completely unreasonable to insert an object of class X into a
>>container, and then get it back altered in a way that is not part of
>>the 'contract' behind the data structure. If I assign X to key K on a
>>hash, however I choose to ask the hash for the value associated with
>>key K back, I should get exactly X as a response. I believe most
>>computer scientists would agree that to be self-evident.
>>
>>And that is to be expected by reading hash::values documentation:
>>
>> Extract values from a hash object. This is a pseudo- accessor method
>>that returns hash values (without keys) as a vector if possible, a list
>>otherwise.
>>
>>
>>Moreover, it has this to say about non-primitive types:
>>
>> If the values are of different types or of a complex class than a
>>named list is returned.
>>
>>
>>It never says it will unclass objects, or coerce them into primitive
>>types. Hence the 'contract' implies I will get back what I inserted,
>>unaltered, either in a vector or a list. And that is provably not what
>>is happening. I would have been ok with a vector of POSIXct or a named
>>list containing the POSIXct values, but instead I am getting a numeric
>>vector.
>>
>>I understand R is based on S, and that OOP concepts were introduced
>>later into its history. However, one of the key concepts in OOP is
>>encapsulation - as an outside entity you do not get to see the internal
>>implementation of a class, you interact with it exclusively through its
>>published "interface" (methods, public member variables, etc.).
>>
>>I cannot find any justification for why an object "losing" its class
>>unintentionally is ever acceptable, as it violates the concept of
>>encapsulation. That is essentially what's happening if I look up
>>several keys using values(). So this violates the encapsulation of the
>>POSIXct class, as I am exposed to its internal numeric value. Moreover,
>>it breaks the "method-dispatch" of R functions that know to treat
>>POSIXct values differently. All of a sudden, the POSIXct objects I
>>inserted are being treated, for example, by format as numeric instead
>>of being dispatched to format.POSIXct as expected.
>>
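
The dispatch point above can be seen without the hash package at all. The following is a small base-R sketch with a made-up timestamp: once the class attribute is gone, generics such as format() no longer reach the POSIXct method, and the class has to be put back explicitly.

```r
# A fixed sample timestamp (illustrative value, UTC to avoid locale effects).
t0 <- as.POSIXct("2013-05-20 16:54:28", tz = "UTC")

# as.numeric() drops the class and leaves seconds since 1970-01-01 UTC,
# which is what an unclassed POSIXct looks like to the rest of R.
raw <- as.numeric(t0)

format(t0, "%Y-%m-%d %H:%M:%S")  # goes through the POSIXct method
format(raw)                      # goes through the default numeric method

# Rebuilding the object requires stating the origin and time zone again.
restored <- as.POSIXct(raw, origin = "1970-01-01", tz = "UTC")
```
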
>>So I don't think my reaction to this issue was illogical at all. Hope
>>you'll agree now that I've explained myself a little better. :)
>>
>>On 21 May 2013 at 22:44:19, Jeff Newmiller
>>(jdnew...@dcn.davis.ca.us) wrote:
>>I recommend that you not plan on waiting for the hash package to be
>>redesigned to meet your expectations. Also, your response to
>>discovering this feature of the hash package seems illogical.
>>
>>From a computer science perspective, the hash mechanism is an
>>implementation trick that is intended to improve lookup speed. It does
>>not actually represent a fundamental data structure like a vector or a
>>set does. You can always put your keys in a vector and search through
>>them (e.g. vector indexing by string) to get an equivalent data
>>retrieval. If the hash package is not improving the speed of your data
>>access, adding an extra layer of data structure is hardly an
>>appropriate solution.
>>
>>Why are you not using normal vectors or data frames and accessing with
>>string or logical indexing?
>>
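
For reference, the "vector indexing by string" idea mentioned above might look roughly like this with POSIXct values. It is a sketch with invented keys and timestamps, not code from either poster, and it says nothing about how lookup speed compares with the hash package.

```r
# A named POSIXct vector used as a simple key/value store.
times <- as.POSIXct(c("2013-05-21 09:54:03",
                      "2013-05-21 09:54:07",
                      "2013-05-21 09:54:11"), tz = "UTC")
names(times) <- c("a", "b", "c")

one  <- times["b"]           # look up a single key; class is kept
some <- times[c("a", "c")]   # look up several keys in one call
earliest <- min(times)       # vectorized summary, no unlisting needed
```
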
>>If you are avoiding vectors because they seem slow in loops, perhaps
>>you just need to preallocate the vectors you will store your results in
>>before your loop to regain acceptable speed. Or, perhaps the
>>duplicated() or merge() functions could save you from this mess of
>>incremental data processing.
>>
>>Alexandre Sieira <alexandre.sie...@gmail.com> wrote:
>>
>>>You are absolutely right.
>>>
>>>I am storing POSIXct objects into a hash (from the hash package).
>>>However, if I try to get them out as a vector using the values()
>>>function, they are unclassed. And that breaks my (highly vectorized)
>>>code. Take a look at this:
>>>
>>>
>>>> h = hash()
>>>> h[["a"]] = Sys.time()
>>>> str(h[["a"]])
>>> POSIXct[1:1], format: "2013-05-20 16:54:28"
>>>> str(values(h))
>>> Named num 1.37e+09
>>> - attr(*, "names")= chr "a"
>>>
>>>
>>>I have reported this to the hash package maintainers. In the meantime,
>>>however, I am storing, for each key, a list containing a single
>>>POSIXct. Then, when I extract all using values(), I get a list
>>>containing all POSIXct entries with class preserved.
>>>
>>>
>>>> h = hash()
>>>> h[["a"]] = list( Sys.time() )
>>>> h[["b"]] = list( Sys.time() )
>>>> h[["c"]] = list( Sys.time() )
>>>> values(h)
>>>$a
>>>[1] "2013-05-21 09:54:03 BRT"
>>>
>>>$b
>>>[1] "2013-05-21 09:54:07 BRT"
>>>
>>>$c
>>>[1] "2013-05-21 09:54:11 BRT"
>>>
>>>> str(values(h))
>>>List of 3
>>> $ a: POSIXct[1:1], format: "2013-05-21 09:54:03"
>>> $ b: POSIXct[1:1], format: "2013-05-21 09:54:07"
>>> $ c: POSIXct[1:1], format: "2013-05-21 09:54:11"
>>>
>>>
>>>However, the next thing I need to do is a min() over that list, so I
>>>need to convert the list into a vector again.
>>>
>>>I agree completely with you that this is horrible for performance, but
>>>it is a temporary workaround until values() is "fixed".
>>>
>>>On 20 May 2013 at 19:40:14, Jeff Newmiller
>>>(jdnew...@dcn.davis.ca.us) wrote:
>>>I don't know what you plan to do with this list, but lists are quite a
>>>bit less efficient than fixed-mode vectors, so you are likely losing a
>>>lot of computational speed by using this list. I don't hesitate to use
>>>simple data frames (lists of vectors), but processing lists is on par
>>>with for loops, not vectorized computation. It may still support a
>>>simpler model of computation, but that is an analyst comprehension
>>>benefit rather than a computational efficiency benefit.
>>>
>>>Alexandre Sieira <alexandre.sie...@gmail.com> wrote:
>>>
>>>>I was trying to convert a vector of POSIXct into a list of POSIXct.
>>>>However, I ran into a problem that I wanted to share with you.
>>>>
>>>>Works fine with, say, numeric:
>>>>
>>>>
>>>>> v = c(1, 2, 3)
>>>>> v
>>>>[1] 1 2 3
>>>>> str(v)
>>>> num [1:3] 1 2 3
>>>>> l = as.vector(v, mode="list")
>>>>> l
>>>>[[1]]
>>>>[1] 1
>>>>
>>>>[[2]]
>>>>[1] 2
>>>>
>>>>[[3]]
>>>>[1] 3
>>>>
>>>>> str(l)
>>>>List of 3
>>>> $ : num 1
>>>> $ : num 2
>>>> $ : num 3
>>>>
>>>>If you try it with POSIXct, on the other hand…
>>>>
>>>>
>>>>> v = c(Sys.time(), Sys.time())
>>>>> v
>>>>[1] "2013-05-20 18:02:07 BRT" "2013-05-20 18:02:07 BRT"
>>>>> str(v)
>>>> POSIXct[1:2], format: "2013-05-20 18:02:07" "2013-05-20 18:02:07"
>>>>> l = as.vector(v, mode="list")
>>>>> l
>>>>[[1]]
>>>>[1] 1369083728
>>>>
>>>>[[2]]
>>>>[1] 1369083728
>>>>
>>>>> str(l)
>>>>List of 2
>>>> $ : num 1.37e+09
>>>> $ : num 1.37e+09
>>>>
>>>>The POSIXct values are coerced to numeric, which is unexpected.
>>>>
>>>>The documentation for as.vector says: "The default method handles 24
>>>>input types and 12 values of type: the details of most coercions are
>>>>undocumented and subject to change." It would appear that treatment for
>>>>POSIXct is either missing or needs adjustment.
>>>>
>>>>unlist (for the reverse) is documented as converting to base types, so
>>>>I can't complain. Just wanted to share that I ended up giving up on
>>>>vectorization and writing the two following functions:
>>>>
>>>>
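>>>>unlistPOSIXct <- function(x) {
>>>>  # Preallocate a POSIXct vector, then copy each list element into it
>>>>  retval = rep(Sys.time(), length(x))
>>>>  for (i in seq_along(x)) retval[i] = x[[i]]
>>>>  return(retval)
>>>>}
>>>>
>>>>listPOSIXct <- function(x) {
>>>>  # x[i] keeps the POSIXct class; seq_along() is safe for empty input
>>>>  retval = list()
>>>>  for (i in seq_along(x)) retval[[i]] = x[i]
>>>>  return(retval)
>>>>}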
>>>>
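
For the vector-to-list direction, a loop-free variant of listPOSIXct could be sketched with lapply() over the indices, relying on single-bracket subsetting to keep the class of each element. The sample vector v is illustrative. Recent versions of R also appear to ship an as.list() method for POSIXct that would make this a one-liner, but that is an assumption worth checking on the installation in use.

```r
# Two sample timestamps one minute apart (fixed values for repeatability).
v <- as.POSIXct("2013-05-20 18:02:07", tz = "UTC") + c(0, 60)

# v[i] goes through the POSIXct subsetting method, so each list element
# is still a length-one POSIXct rather than a bare number.
l <- lapply(seq_along(v), function(i) v[i])

str(l)
```
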
>>>>Is there a better way to do this (other than using *apply instead of
>>>>for above) that better leverages vectorization? Am I missing something
>>>>here?
>>>>
>>>>Thanks!
>>>>
>>>>------------------------------------------------------------------------
>>>>______________________________________________
>>>>R-help@r-project.org mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide
>>>>http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
