Dear Martin Morgan and Martin Maechler,

Here is an example comparing the computation time of the same update when the data sits in a slot of an S4 object (as a list of objects of another S4 class) and when it is a stand-alone list. I'm sending you the data file.
Thank you!

Best regards,

André Rossi

############################################################

setClass("SupervisedExample",
    representation(
        attr.value = "ANY",
        target.value = "ANY"
))

setClass("StreamBuffer",
    representation=representation(
        examples = "list", # list of SupervisedExample objects
        max.length = "integer"
    ),
    prototype=list(
        max.length = as.integer(10000)
    )
)

b <- new("StreamBuffer")

load("~/Dropbox/dataList2.RData")

b@examples <- data # data is a list of SupervisedExample objects

> system.time({for (i in 1:100) b@examples[[1]]@attr.value[1] = 2 })
   user  system elapsed
 16.837   0.108  18.244

> system.time({for (i in 1:100) data[[1]]@attr.value[1] = 2 })
   user  system elapsed
  0.024   0.000   0.026

############################################################

2011/9/10 Martin Morgan <mtmor...@fhcrc.org>

> On 09/10/2011 08:08 AM, André Rossi wrote:
>
>> Hi everybody!
>>
>> I'm creating an object of an S4 class that has two slots: ListExamples,
>> which is a list, and idx, which is an integer (as in the code below).
>>
>> Then, I read a data.frame file with 10000 (ten thousand) lines and 10
>> columns, do some pre-processing and, basically, store each line as an
>> element of a list in the slot ListExamples of the S4 object. However, many
>> operations after this take a considerable time.
>>
>> Can anyone explain why this happens? Is it possible to speed up a
>> script that deals with a large amount of data (it might be a data.frame
>> or a list)?
>>
>> Thank you,
>>
>> André Rossi
>>
>> setClass("Buffer",
>>     representation=representation(
>>         Listexamples = "list",
>>         idx = "integer"
>>     )
>> )
>>
>
> Hi André,
>
> Can you provide a simpler and more reproducible example, for instance
>
> > setClass("Buf", representation=representation(lst="list"))
> [1] "Buf"
> > b=new("Buf", lst=replicate(10000, list(10), simplify=FALSE))
> > system.time({ b@lst[[1]][[1]] = 2 })
>    user  system elapsed
>   0.005   0.000   0.005
>
> Generally it sounds like you're modeling the rows as elements of
> Listexamples, but you're better served by modeling the columns (lst =
> replicate(10, integer(10000), simplify=FALSE), if all of your 10 columns
> were integer-valued, for instance). Also, S4 is providing some measure of
> type safety, and you're undermining that by having your class contain a
> 'list'. I'd go after
>
> setClass("Buffer",
>     representation=representation(
>         col1="integer",
>         col2="character",
>         col3="numeric"
>         ## etc.
>     ),
>     validity=function(object) {
>         nms <- slotNames(object)
>         len <- sapply(nms, function(nm) length(slot(object, nm)))
>         if (1L != length(unique(len)))
>             "slots must all be of same length"
>         else TRUE
>     })
>
> Buffer <-
>     function(col1, col2, col3, ...)
> {
>     new("Buffer", col1=col1, col2=col2, col3=col3, ...)
> }
>
> Let's see where the inefficiencies are before deciding that this is an S4
> issue.
>
> Martin
>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
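
A self-contained variant of the timing comparison above may help with reproducing the problem without the data file. The sketch below is not from the thread: the synthetic `data` list (10000 examples with 10 numeric attributes each) is an assumed stand-in for dataList2.RData, and `ex`, `t.nested` and `t.local` are illustrative names. It times the nested replacement on the slot against the same update done on a local copy of the list that is written back once. Timings depend heavily on the R version, so none are shown here.

```r
library(methods)

setClass("SupervisedExample",
         representation(attr.value = "ANY", target.value = "ANY"))

setClass("StreamBuffer",
         representation(examples = "list", max.length = "integer"),
         prototype = list(max.length = 10000L))

# Synthetic stand-in for dataList2.RData (assumption, not the real data):
# 10000 examples with 10 numeric attributes each.
set.seed(1)
data <- replicate(10000,
                  new("SupervisedExample",
                      attr.value = runif(10), target.value = 1L),
                  simplify = FALSE)

b <- new("StreamBuffer", examples = data)

# Nested replacement directly on the slot of the outer object.
t.nested <- system.time(
    for (i in 1:100) b@examples[[1]]@attr.value[1] <- 2
)

# Same update on a local copy of the list, written back to the slot once.
t.local <- system.time({
    ex <- b@examples
    for (i in 1:100) ex[[1]]@attr.value[1] <- 2
    b@examples <- ex
})

print(t.nested)
print(t.local)
```

If the two timings differ by a wide margin, the cost is probably in the repeated slot assignment rather than in the element update itself. That is one possible reading and not something the thread establishes.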
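
The column-oriented class suggested in the reply can be exercised as follows. This is a sketch under assumptions: the column contents and the names `n`, `buf` and `msg` are made up for illustration, and `vapply` replaces `sapply` only to pin the result type.

```r
library(methods)

# Column-oriented buffer, following the class suggested in the reply.
setClass("Buffer",
         representation(col1 = "integer",
                        col2 = "character",
                        col3 = "numeric"),
         validity = function(object) {
             len <- vapply(slotNames(object),
                           function(nm) length(slot(object, nm)),
                           integer(1))
             if (length(unique(len)) != 1L)
                 "slots must all be of same length"
             else TRUE
         })

# Illustrative data (assumption): 10000 rows stored as three columns.
n <- 10000L
buf <- new("Buffer",
           col1 = seq_len(n),
           col2 = rep("a", n),
           col3 = numeric(n))

# Updating one value touches a single atomic vector, not a list of objects.
buf@col3[1] <- 2

# Direct slot edits skip the validity function, so re-check explicitly.
validObject(buf)

# Columns of unequal length are rejected when the object is created.
msg <- tryCatch(
    new("Buffer", col1 = 1:2, col2 = "a", col3 = 0.5),
    error = function(e) conditionMessage(e)
)
print(msg)
```

Keeping each column as an atomic vector lets the validity check compare lengths cheaply, whereas a list of 10000 row objects would need every element inspected to get a comparable guarantee.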