Ian,

On Aug 23, 2007, at 4:21 PM, Cook, Ian wrote:

> I am developing a tool for converting a large data frame stored in
> an uncompressed binary (XDR) RData file to a delimited text file.
> The data frame is too large to load() and extract rows from on a
> typical PC. I'm looking to parse through the file and extract
> individual entries without loading the whole thing into memory.
>
> In terms of some C source functions, instead of doing
> RestoreToEnv(R_Unserialize(connection)), which is essentially what
> load() does, I'm looking to get the documentation I would need to
> build a function "SaveToCSV()" so that I could do
> SaveToCSV(R_Unserialize(connection)).
>
> Where can I get documentation on the RData file format? Does a
> spec document exist?
>

I don't think so; basically the sources are all the documentation I'm aware of. It's a bit messy, because R supports so many old formats.

However, if you want a stand-alone program that handles only (uncompressed) XDR2, then I may have saved you a bit of work. I have a utility (based on the R sources) that lets you scan through XDR2 files and extract individual objects into a separate XDR2 file. This happens to be quite useful when you have a workspace that doesn't load into R and yet you want to save some pieces of it.

Have a look at http://urbanek.info/rdcopy.c. You can run it as:

  ./rdcopy foo           to list the objects in foo
  ./rdcopy foo -v        to show the full structure (all SEXPs with their offsets)
  ./rdcopy foo bar 19    to copy the SEXP at offset 19 from foo into a separate XDR2 file bar

To copy entire objects, use the offsets reported by the first form.

It's not perfect, but it serves its purpose: it resolves references by copying them instead of re-indexing, but it doesn't detect loops.

Maybe it helps, even though the task you describe is still far from trivial.

Cheers,
Simon

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
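As a rough illustration of where a stand-alone XDR2 reader would start, here is a minimal C sketch (not taken from rdcopy.c) that checks the header of an uncompressed XDR2 RData image held in memory and decodes the 32-bit flags word that precedes every serialized item. The bit layout is assumed from R's src/main/serialize.c (macro names adapted); the helpers xdr_int, decode_flags, parse_rdx2_header and the item_flags struct are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flags-word layout assumed from R's src/main/serialize.c:
 * bits 0-7 SEXPTYPE, bit 8 object, bit 9 attributes, bit 10 tag,
 * bits 12 and up the "levels" (gp) field. */
#define DECODE_TYPE(v)   ((v) & 255)
#define DECODE_LEVELS(v) ((v) >> 12)
#define IS_OBJECT_BIT    (1 << 8)
#define HAS_ATTR_BIT     (1 << 9)
#define HAS_TAG_BIT      (1 << 10)

typedef struct {
    int type;      /* SEXPTYPE: 2 = LISTSXP, 16 = STRSXP, 19 = VECSXP, ... */
    int levels;    /* gp field (general-purpose bits) */
    int is_object; /* 1 if the object bit is set (e.g. it has a class) */
    int has_attr;  /* 1 if an attribute pairlist is also serialized */
    int has_tag;   /* 1 if a tag (pairlist node name) is serialized */
} item_flags;

/* XDR stores 32-bit integers in big-endian byte order. */
static int32_t xdr_int(const unsigned char *p)
{
    assert(p != NULL);
    return (int32_t)(((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] << 8)  |  (uint32_t)p[3]);
}

/* Splits a flags word into its SEXPTYPE and the individual bit fields. */
static item_flags decode_flags(int32_t v)
{
    item_flags f;
    f.type      = DECODE_TYPE(v);
    f.levels    = DECODE_LEVELS(v);
    f.is_object = (v & IS_OBJECT_BIT) != 0;
    f.has_attr  = (v & HAS_ATTR_BIT) != 0;
    f.has_tag   = (v & HAS_TAG_BIT) != 0;
    return f;
}

/* Checks the "RDX2\n" file magic and the "X\n" stream format, then reads
 * the three header integers (format version, writer's R version, minimal
 * reader's R version). Returns the byte offset of the first item's flags
 * word, or -1 if buf does not look like an uncompressed XDR2 image. */
static long parse_rdx2_header(const unsigned char *buf, size_t n,
                              int32_t *format_version,
                              int32_t *writer_version,
                              int32_t *min_reader_version)
{
    if (n < 19 + 4)  /* 5 + 2 magic bytes, 3 ints, plus one flags word */
        return -1;
    if (memcmp(buf, "RDX2\n", 5) != 0 || memcmp(buf + 5, "X\n", 2) != 0)
        return -1;
    *format_version     = xdr_int(buf + 7);
    *writer_version     = xdr_int(buf + 11);
    *min_reader_version = xdr_int(buf + 15);
    return 19;
}
```

For a file written by save(), the first item should be a pairlist (LISTSXP, type 2) with the tag bit set, one node per saved object, with the object's name in the tag. A data frame would then show up as a VECSXP (type 19) with the object and attribute bits set, its columns being the list elements. Turning that into a SaveToCSV() still means handling CHARSXP strings, REFSXP back-references and every other type the walker may meet, which is where most of the remaining work lies.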