Hi, Bernd. Glad to hear from you! I'm out of the office so will get back to you 
with a more considered response Monday. My initial thought is yes: I agree the 
task of extending the functionality to truly arbitrary yet-unknown classes is 
daunting, but it seems we can get this encoding-reconstruction cycle a lot 
closer--and hence provide a lot of valuable benefit--with far less effort than 
getting it to work in 100% of cases.

I would add that part of the motivation for choosing a non-rds file format for 
a current project was to eliminate dependence on package versions for 
transporting Bioc objects between sessions.

Thanks,
Nate

----- Original Message -----
From: "Bernd Fischer" <bernd.fisc...@embl.de>
To: "Nathaniel Hayden" <nhay...@fhcrc.org>
Cc: bioc-devel@r-project.org
Sent: Thursday, August 7, 2014 1:25:52 PM
Subject: Re: How to trigger h5read.<classname> with h5read function in rhdf5

Dear Nathaniel! 

h5read is in fact designed to be able to call a user-defined function 
h5read.<myclass>, but this is not yet fully implemented or tested. 
I put it on hold because of the complexity of the task. But maybe you and the 
Bioc-devel list can help. 


I can imagine the following scenario: 
- h5write can write the attr(foo, "class") <- "myclass" attribute to the HDF5 
  object. This is already set up; one can invoke it by using 
  write.attributes=TRUE, as you mentioned. 
- h5write is a generic function, and one can write one's own h5write.<myclass> 
  function. 
- Before h5read reads the object, it tries to read the class attribute and 
  invokes h5read.<myclass>, which is defined somewhere outside rhdf5. 
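
The read-side dispatch in the last step can be sketched in base R without 
touching rhdf5. This is only an illustration, not how rhdf5 does it: 
dispatch_by_class is a hypothetical helper, "myclass" and "myclass_restored" 
are made-up class names, and the sketch assumes the object has already been 
read with its "class" attribute attached. 

```r
## Hypothetical helper, not part of rhdf5: look up h5read.<classname>
## for an object that already carries a "class" attribute and call it.
dispatch_by_class <- function(obj, envir = parent.frame()) {
    cl <- attr(obj, "class")
    if (is.null(cl)) return(obj)              # nothing to dispatch on
    for (k in cl) {                           # first matching class wins
        fname <- paste("h5read", k, sep = ".")
        if (exists(fname, envir = envir, mode = "function")) {
            reader <- get(fname, envir = envir, mode = "function")
            return(reader(obj))
        }
    }
    obj                                       # no reader found: return as read
}

## Example reader for the made-up class "myclass"
h5read.myclass <- function(obj) {
    structure(list(data = unclass(obj)), class = "myclass_restored")
}

x <- structure(1:3, class = "myclass")
res <- dispatch_by_class(x)
```

Objects without a "class" attribute, or with a class that has no matching 
reader, are returned unchanged, so the fallback is the plain read result. 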

The problems I came across are: 
1.) Usually, h5read.<myclass> is implemented in some package "mypackage". 
    How do I know which package it is if the package is not yet loaded? 
    Do we have to store an additional "BioCpackage" attribute in the HDF5 
    object? 
2.) What happens if the package provider changes the class definition in the 
    next BioC release? 
    Do we have to store a package version number as well? 
3.) How shall we deal with R attributes? 
    HDF5 attributes cannot store all R attributes, because HDF5 attributes are 
    restricted to a maximum size, whereas R attributes can be almost as large 
    as you like. One way would be to store attributes in a group called 
    /obj.ATTRIBUTES. 
    E.g. assume you have an R object foo with attributes 
    names = c("A","B",…) of length 2^30 and geneNames = c("ENSGA","ENSGB",…). 
    Should h5write write the following? 

    /foo : an HDF5 object, e.g. an integer array 
    /foo.ATTRIBUTES : a group 
    /foo.ATTRIBUTES/names : a string vector 
    /foo.ATTRIBUTES/geneNames : a string vector 

    This definitely breaks if someone wants to write a list that contains both 
    elements "foo" and "foo.ATTRIBUTES". Is this acceptable? 
4.) What is the best standard for storing S3/S4 objects in HDF5? 
    Assume there is an object foo of class baa with slots a = "integer", 
    b = "double" and c = "mysecondclass". 
    Should h5write write the following? 

    /foo : a group with attributes class="baa", BioCpackage="baapackage" 
    /foo/slots : a group 
    /foo/slots/a : integer 
    /foo/slots/b : double 
    /foo/slots/c : a group with attributes class="mysecondclass", 
                   BioCpackage="mysecondpackage" 
    /foo/slots/c/slots 

    And if foo has additional attributes as above, h5write would write in 
    addition: 

    /foo.ATTRIBUTES : a group 
    /foo.ATTRIBUTES/names : a string vector 
    /foo.ATTRIBUTES/geneNames : a string vector 

    This standard would allow the definition of a function that reads S3/S4 
    objects of any kind and still allow users to define their own function 
    h5read.<myclass>. 
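
The slot layout in point 4 can be prototyped in base R before any HDF5 I/O is 
involved. The sketch below walks an S4 object and returns a flat list keyed by 
the proposed paths. Everything here is an assumption for illustration: 
flatten_s4 is a hypothetical name, the classes are made up, the "@attributes" 
suffix is just a placeholder for group-level HDF5 attributes, "numeric" stands 
in for the "double" slot, and no package version is recorded. 

```r
library(methods)

## Made-up classes mirroring the example: baa has slots a, b and c,
## where c is itself an S4 object.
setClass("mysecondclass", slots = c(x = "character"))
setClass("baa", slots = c(a = "integer", b = "numeric",
                          c = "mysecondclass"))

## Hypothetical helper: map an S4 object to a flat list of
## path -> value entries following the /foo/slots/<name> layout.
flatten_s4 <- function(obj, path) {
    if (!isS4(obj)) {                         # leaf: store the value itself
        out <- list(obj)
        names(out) <- path
        return(out)
    }
    cl <- class(obj)
    pkg <- attr(cl, "package")                # package that defines the class
    ## Group-level metadata that would become HDF5 attributes
    meta <- list(list(class = as.character(cl),
                      BioCpackage = if (is.null(pkg)) NA_character_ else pkg))
    names(meta) <- paste0(path, "@attributes")
    ## Recurse into every slot, extending the path by /slots/<name>
    slots <- lapply(slotNames(obj), function(s) {
        flatten_s4(slot(obj, s), paste(path, "slots", s, sep = "/"))
    })
    c(meta, do.call(c, slots))
}

foo <- new("baa", a = 1:3, b = c(0.5, 1.5),
           c = new("mysecondclass", x = "ENSGA"))
layout <- flatten_s4(foo, "/foo")
names(layout)
```

A generic reader could invert this mapping by calling new() with the stored 
class name and the slot values, after loading the package named in the 
BioCpackage entry. 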


What do you think about this? I guess that is the direction that you have in 
mind. Any other suggestions and comments are welcome. 


Bernd 

On 07.08.2014, at 02:49, Nathaniel Hayden < nhay...@fhcrc.org > wrote: 


When reading from an hdf5 file, I would like a function I define to be called 
automatically whenever a dataset of a given type (see: 'class') is read. Since 
it looks like the existing infrastructure in rhdf5 (courtesy of the 
'callGeneric' parameter in h5read) was made for this, I would like to avoid 
duplicating work. But I can't find an example of the h5read.<classname> 
functionality indicated in the callGeneric description in the h5read man page. 

A simple example is if the type is integer, I want as.integer to be 
automatically called on the read-in object before it gets passed back. But I 
intend to extend this to other Bioconductor classes of arbitrary complexity. 

Based on the documentation, it seems like either using attr(foo, "class") <- 
"integer" (in conjunction with h5write(<...>, write.attributes=TRUE)) or adding 
a 'class' attribute through the h5writeAttribute interface should be enough to 
trigger the h5read.integer function upon calling h5read. Neither seems to work. 
Note that I can pass read.attributes=TRUE and the attributes get assigned to 
the object (for example, the object comes back with a "class" attribute), but 
that's not exactly what I'm after. 

In looking at the R/h5read.R source code, it looks like the block where the 
h5read.<classname> call gets set up (around line 59) queries the "class" 
attribute of the read-in obj before the h5 object's attributes are actually 
read, so the 'cl' variable never seems to get set. 

Here's an example where I would expect h5read.<classname> to be invoked, but it 
doesn't: 

library(rhdf5) 
h5read.integer <- function(obj) { as.integer(obj) } ## h5read.<classname> 
debug(h5read.integer) 
exists(paste("h5read","integer",sep="."),mode="function") 

h5fl <- tempfile(fileext=".h5") 
h5createFile(h5fl) 
ints <- 42L:33L 
attr(ints, "class") <- "integer" 
h5write(ints, h5fl, "foo", write.attributes=TRUE) 
H5close() 

## h5writeAttribute route 
##fid <- H5Fopen(h5fl) 
##did <- H5Dopen(fid, "foo") 
##h5writeAttribute("integer", did, name="class") 
##H5close() 

##res <- h5read(h5fl, "foo", read.attributes=FALSE) 
res <- h5read(h5fl, "foo", read.attributes=TRUE) 

Running the external h5dump utility confirms that a "class" attribute is 
attached to the foo DATASET, which seems to match what the h5read man page 
prescribes. If I edit the source code to set the 'cl' variable to "integer" my 
h5read.integer function gets invoked, as expected. 

Any help would be much appreciated. Thank you. 

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
