Dear all HDF experts,

I'm looking into using HDF5 for storing data from a large high-energy astrophysics observatory (as opposed to a custom binary format or something like ROOT's file format, which is commonly used in high-energy physics), and I have run into a few problems. At the basic level, our data can be described as follows:

- large data sets (gigabytes per second) whose full size we don't know a priori when writing
- the data is a set of "events", where each event contains a set of instrumental readouts (vectors of numbers) plus multiple sets of parameters
- there will be many hundreds of thousands of such events in a single file
- due to the large data rates, zero-suppression is needed (compression is not enough), meaning that the vector data must be variable-length
- we need only sequential access, not random access, but the speed and size-efficiency of reading and writing are critical due to the data volume
- the data will be written (and generally read) an event at a time (i.e. one row of the table at once)

At first glance, the HDF packet-table interface looked like a great solution: each packet would store an event, and within the packet we would put structured HDF data (the resulting data set would then look like a table, with a few columns containing variable-length arrays). However, variable-length packet tables do not seem to have ever been implemented in the HDF5 libraries, despite the examples and documentation for them. Is it possible to store variable-length arrays in a fixed-length packet table?

Furthermore, variable-length arrays in general don't seem to be well documented in HDF, though they appear to be supported. Has anybody had any experience with similar data, particularly tables containing columns that hold variable-length arrays? Is it efficient in HDF, and are there examples of its use, or a recommendation on which interfaces to use?

A second question: in reality, this data contains variable-length arrays of variable-length arrays. However, we can get around one level of nesting by just using an index variable, or by separating one dimension into separate tables, so it's not critical to store the data this way. It would be nice though, since it would reflect the actual hierarchy of the data directly in the format.
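
For concreteness, the index-variable workaround mentioned above could be sketched roughly as follows. This is a plain-Python illustration and not HDF5 code: the function names `flatten_event` and `unflatten_event` and the sample values are hypothetical, and it assumes each event is simply a list of readout vectors of varying length. The numbers go into one flat array, and a second array of offsets records where each vector starts.

```python
# Hypothetical illustration of the index-variable workaround (no HDF5 calls).
# An event is assumed to be a list of readout vectors of varying length.

def flatten_event(readouts):
    """Flatten a list of variable-length vectors into (values, offsets).

    offsets[i] is the start of vector i in values; the final entry is
    len(values), so vector i is values[offsets[i]:offsets[i + 1]].
    """
    values = []
    offsets = [0]
    for vec in readouts:
        values.extend(vec)
        offsets.append(len(values))
    return values, offsets


def unflatten_event(values, offsets):
    """Rebuild the nested list from the flat values and the offsets."""
    return [values[start:stop] for start, stop in zip(offsets, offsets[1:])]


# Made-up sample event: three readouts, one of them empty after zero-suppression.
event = [[3, 7, 2], [], [5, 1]]
values, offsets = flatten_event(event)
restored = unflatten_event(values, offsets)
```

With this layout each event would carry two flat variable-length arrays (`values` and `offsets`) instead of a nested variable-length type, at the cost of the hierarchy no longer being visible in the file structure itself.
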
Is such a nested format (variable-length arrays of variable-length arrays) even possible in HDF5?

Cheers,

Karl

--
Dr. Karl Kosack
CEA Saclay
Bat 709 DSM/IRFU/SAp
F-91191 Gif sur Yvette Cedex
FRANCE
_______________________________________________
Hdf-forum is for HDF software users discussion.
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
