Hi,

If you want to build on Josh's example below (using an S4
subclass of data.frame), you might be interested in the multitude
of useful objects/classes defined in the Bioconductor IRanges package:

http://www.bioconductor.org/packages/release/bioc/html/IRanges.html

It has no other Bioconductor dependencies, so it's a "slim" install
in that respect. It defines a DataFrame class which keeps its
"metadata" around as you subset/index/etc. it, e.g.:

R> library(IRanges)
R> DF <- DataFrame(a=1:10, b=letters[1:10])
R> metadata(DF) <- list(units=list(a=NA, b='inches'))

R> sub.1 <- subset(DF, a %% 2 == 0)
R> sub.1
DataFrame with 5 rows and 2 columns
          a           b
  <integer> <character>
1         2           b
2         4           d
3         6           f
4         8           h
5        10           j

R> metadata(sub.1)
$units
$units$a
[1] NA

$units$b
[1] "inches"
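
Since the units sit in a named list, looking one up by variable name
stays short. A minimal sketch in base R, with a plain list standing in
for metadata(DF) so that it runs without IRanges; unit_of is a made-up
helper name, not something the package provides:

```r
# Plain base R stand-in for metadata(DF) from the session above
md <- list(units = list(a = NA, b = "inches"))

# Hypothetical helper (not part of IRanges): unit of one variable, by name
unit_of <- function(md, var_name) md$units[[var_name]]

unit_of(md, "b")  # "inches"
```

With the DataFrame above, the same call would be
unit_of(metadata(sub.1), "b").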

(although I noticed that transform,DataFrame isn't actually defined ...)
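
If you'd rather stay with the S4 subclass from Josh's example below,
the two refinements he mentions (requiring one unit per column, and a
method that shortens the lookup) could look roughly like this. It is
only a sketch in base R; the class name UnitsFrame and the generic
unitOf are made up for illustration:

```r
library(methods)

# Hypothetical class: a data.frame plus one unit string per column
setClass("UnitsFrame",
         representation(units = "character"),
         contains = "data.frame",
         validity = function(object) {
           if (length(object@units) != length(object@names))
             "'units' must have one entry per column"
           else
             TRUE
         })

# Hypothetical generic: unit of a single variable, looked up by name
setGeneric("unitOf", function(x, var) standardGeneric("unitOf"))
setMethod("unitOf", "UnitsFrame", function(x, var) {
  idx <- match(var, x@names)
  if (is.na(idx)) stop("no such column: ", var)
  x@units[[idx]]
})

# The unnamed data.frame fills the inherited part; units is the new slot
uf <- new("UnitsFrame",
          data.frame(x = 1:10, y = 11:20),
          units = c("x_unit", "y_unit"))

unitOf(uf, "y")  # "y_unit"
```

Because the validity function runs when new() is given arguments, a
units vector of the wrong length is rejected at construction time.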

Anyway, HTH.

-steve

On Mon, Oct 3, 2011 at 11:15 AM, Joshua Wiley <jwiley.ps...@gmail.com> wrote:
> Hi Bruno,
>
> It sounds like what you want is really a separate class, one that
> stores information about units for each variable.  This is far from an
> elegant example, but depending on your situation may be useful.  I
> create a new class inheriting from the data frame class.  This is
> likely fraught with problems because a formal S4 class is inheriting
> from an informal S3.  Then a data frame can be stored in the .Data
> slot (special---I did not make it), but character data can also be
> stored in the units slot (which I did define).  You could get fancier
> imposing constraints that the length of units be equal to the number
> of columns in the data frame or the like.  S3 methods for data frames
> should still mostly work, but you also have the ability to access the
> new units slot.  You could define special S4 methods to do the
> extraction then, if you wanted, so that your ultimate syntax to get
> the units of a particular variable would be shorter.
>
> setOldClass("data.frame")
>
> setClass("mydf", representation(units = "character"),
>  contains = "data.frame", S3methods = TRUE)
>
> tmp <- new("mydf")
>
> tmp@.Data <- mtcars
> tmp@row.names <- rownames(mtcars)
> tmp@units <- c("x", "y")
>
> ## data frameish
> colMeans(tmp)
> tmp + 10
>
> # but
> tmp@units
>
> Cheers,
>
> Josh
>
> N.B. I've read once and skimmed again Chambers' book, but I still do
> not have a solid grasp on S4 so I may have made some fundamental
> blunder in the example.
>
>
>
> On Mon, Oct 3, 2011 at 7:35 AM, bruno Piguet <bruno.pig...@gmail.com> wrote:
>> Dear all,
>>
>>  I'd like to have a dataframe store information about the units of
>> the data it contains.
>>
>>  You'll find below a minimal example of what I do so far: I add a
>> "units" attribute to the dataframe. But I don't like the long syntax
>> needed to access the unit of a given variable, namely something
>> like:
>>   var_unit <- attr(my_frame, "units")[[match(var_name, attr(my_frame, "names"))]]
>>
>>  Can anybody point me to a better solution ?
>>
>> Thanks in advance,
>>
>> Bruno.
>>
>>
>> # Dataframe creation
>> x <- c(1:10)
>> y <- c(11:20)
>> z <- c(101:110)
>> my_frame <- data.frame(x, y, z)
>> attr(my_frame, "units") <- c("x_unit", "y_unit")
>>
>> #
>> # later on, using dataframe
>> for (var_name in c("x", "y")) {
>>   idx <- match(var_name, attr(my_frame, "names"))
>>   var_unit <- attr(my_frame, "units")[[idx]]
>>   print (paste("max ", var_name, ": ", max(my_frame[[var_name]]), var_unit))
>> }
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, ATS Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/
>



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
