Thank you a lot Morgan. Your suggestion helped me to speed up my code. But I still believe that the inefficience is an S4 issue.
Best regards, André Rossi 2011/9/12 Martin Morgan <mtmor...@fhcrc.org> > Hi André... > > > On 09/12/2011 07:20 AM, André Rossi wrote: > >> Dear Martin Morgan and Martin Maechler... >> >> Here is an example of the computational time when a slot of a S4 class >> is of another S4 class and when it is just one object. I'm sending you >> the data file. >> >> Thank you! >> >> Best regards, >> >> André Rossi >> >> ##############################**############################## >> >> setClass("SupervisedExample", >> representation( >> attr.value = "ANY", >> target.value = "ANY" >> )) >> >> setClass("StreamBuffer", >> representation=representation( >> examples = "list", #SupervisedExample >> max.length = "integer" >> ), >> prototype=list( >> max.length = as.integer(10000) >> ) >> ) >> b <- new("StreamBuffer") >> >> load("~/Dropbox/dataList2.**RData") >> > > For a reproducible example, I guess you have something like > > data <- replicate(10000, new("SupervisedExample")) > > > b@examples <- data #data is a list of SupervisedExample class. >> >> > system.time({for (i in 1:100) b@examples[[1]]@attr.value[1] = 2 }) >> > > Yes, this is slow. [[<-,S4 is not as clever as [[<-,list and performs extra > duplication, including those 10,000 S4 objects it contains. > > As before, an improvement is to think in terms of vectors, maybe a > 'SupervisedExamples' class to act as a collection of examples > > setClass("SupervisedExamples", > representation=representation( > attr.value = "list", > target.value = "list")) > > setClass("StreamBuffer", > representation=representation( > examples="SupervisedExamples")**) > > SupervisedExamples <- > function(attr.value=vector("**list", n), > target.value=vector("list", n), n, ...) > { > new("SupervisedExamples", attr.value=attr.value, > target.value=target.value, ...) > } > > StreamBuffer <- > function(examples, ...) > { > new("StreamBuffer", examples=examples, ...) > } > > data <- SupervisedExamples(n=100000) > > b <- StreamBuffer(data) > > I then have > > > system.time({for (i in 1:100) data@attr.value[[1]] = 2 }) > user system elapsed > 1.081 0.013 1.094 > > system.time({for (i in 1:100) b@examples@attr.value[[1]] <- 2}) > user system elapsed > 4.283 0.000 4.295 > > (note the 10x increase in size); still slower, but this will be amortized > when the updates are vectorized, e.g., > > > idx = sample(length(b@examples@attr.**value), 100) > > system.time(b@examples@attr.**value[idx] <- list(2)) > user system elapsed > 0.013 0.000 0.014 > > A further change might be to recognize 'StreamBuffer' as an abstract class > that SupervisedExamples extends > > setClass("StreamBuffer", > representation=representation( > "VIRTUAL", max.len="integer"), > prototype=prototype(max.len=**100000L), > validity=function(object) { > if (obj...@max.len < length(object)) > "too many elements" > else TRUE > }) > > setMethod(length, "StreamBuffer", function(x) { > stop("'length' undefined on '", class(x), "'") > }) > > setClass("SupervisedExamples", > representation=representation( > attr.value = "list", > target.value = "list"), > contains="StreamBuffer") > > setMethod(length, "SupervisedExamples", function(x) { > length(x@attr.value) > }) > > SupervisedExamples <- > function(attr.value=vector("**list", n), > target.value=vector("list", n), n, ...) > { > new("SupervisedExamples", attr.value=attr.value, > target.value=target.value, ...) > } > > data <- SupervisedExamples(n=100000) > > > system.time({for (i in 1:100) data@attr.value[[1]] = 2 }) > user system elapsed > 1.043 0.014 1.061 > > Martin Morgan > > user system elapsed >> 16.837 0.108 18.244 >> >> > system.time({for (i in 1:100) data[[1]]@attr.value[1] = 2 }) >> user system elapsed >> 0.024 0.000 0.026 >> >> ##############################**############################## >> >> >> 2011/9/10 Martin Morgan <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>> >> >> >> On 09/10/2011 08:08 AM, André Rossi wrote: >> >> Hi everybody! >> >> I'm creating an object of a S4 class that has two slots: >> ListExamples, which >> is a list, and idx, which is an integer (as the code below). >> >> Then, I read a data.frame file with 10000 (ten thousands) of >> lines and 10 >> columns, do some pre-processing and, basically, I store each >> line as an >> element of a list in the slot ListExamples of the S4 object. >> However, many >> operations after this take a considerable time. >> >> Can anyone explain me why dois it happen? Is it possible to >> speed up an >> script that deals with a big number of data (it might be >> data.frame or >> list)? >> >> Thank you, >> >> André Rossi >> >> setClass("Buffer", >> representation=representation( >> Listexamples = "list", >> idx = "integer" >> ) >> ) >> >> >> Hi André, >> >> Can you provide a simpler and more reproducible example, for instance >> >> > setClass("Buf", representation=representation(**__lst="list")) >> [1] "Buf" >> > b=new("Buf", lst=replicate(10000, list(10), simplify=FALSE)) >> > system.time({ b@lst[[1]][[1]] = 2 }) >> user system elapsed >> 0.005 0.000 0.005 >> >> Generally it sounds like you're modeling the rows as elements of >> Listofelements, but you're better served by modeling the columns >> (lst = replicate(10, integer(10000)), if all of your 10 columns were >> integer-valued, for instance). Also, S4 is providing some measure of >> type safety, and you're undermining that by having your class >> contain a 'list'. I'd go after >> >> setClass("Buffer", >> representation=representation( >> col1="integer", >> col2="character", >> col3="numeric" >> ## etc. >> ), >> validity=function(object) { >> nms <- slotNames(object) >> len <- sapply(nms, function(nm) length(slot(object, nm))) >> if (1L != length(unique(len))) >> "slots must all be of same length" >> else TRUE >> }) >> >> Buffer <- >> function(col1, col2, col3, ...) >> { >> new("Buffer", col1=col1, col2=col2, col3=col3, ...) >> } >> >> Let's see where the inefficiencies are before deciding that this is >> an S4 issue. >> >> Martin >> >> >> >> [[alternative HTML version deleted]] >> >> >> >> >> ______________________________**__________________ >> R-help@r-project.org <mailto:R-help@r-project.org> mailing list >> >> >> https://stat.ethz.ch/mailman/_**_listinfo/r-help<https://stat.ethz.ch/mailman/__listinfo/r-help> >> >> <https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help> >> > >> PLEASE do read the posting guide >> >> http://www.R-project.org/__**posting-guide.html<http://www.R-project.org/__posting-guide.html> >> >> <http://www.R-project.org/**posting-guide.html<http://www.R-project.org/posting-guide.html> >> > >> and provide commented, minimal, self-contained, reproducible code. >> >> >> >> -- >> Computational Biology >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 >> >> Location: M1-B861 >> Telephone: 206 667-2793 <tel:206%20667-2793> >> >> >> > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 > [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.