Thanks for your replies, Mark!

So what I'd like to have happen is as follows (still using the case of
a uint32_t dataset in memory).

* When the dataset contains any value >= 2^16-1 (== 65535) that is
*not* equal to 2^32-1, the dataset is simply saved to disk as a
bog-standard uint32_t dataset, the same type it has in memory.

* When the dataset contains ONLY values < 2^16-1, except perhaps for
values equal to 2^32-1, the dataset is saved to disk as a bog-standard
uint16_t dataset, with every instance of 2^32-1 in RAM translated to
2^16-1 on disk.

That is, in the second case, later users who look at the HDF5 file
will see a 16-bit dataset and may see values of 2^16-1 ... that's
fine, since those users have no need to see 2^32-1 instead.  I do a
pre-check over the data and switch on the result to determine which of
these cases to fall into.
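For concreteness, a minimal sketch of that pre-check in plain C (the
helper name fits_in_uint16 is my own invention; it assumes 2^32-1 is
the only sentinel value):

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if every value either fits strictly below 2^16-1 or is
 * exactly the 2^32-1 sentinel, so the dataset can be written to disk
 * as uint16_t (with 2^16-1 standing in for the sentinel).  Note that
 * a genuine value of 65535 forces the 32-bit path, since 65535 on
 * disk is reserved for the sentinel. */
static int fits_in_uint16(const uint32_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= UINT16_MAX && data[i] != UINT32_MAX)
            return 0;  /* genuinely needs 32 bits on disk */
    }
    return 1;
}
```

For example, {0, 1, 4294967295} passes the check, while {0, 65535}
does not, because a real 65535 would collide with the on-disk
sentinel.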


After some more research online, I found this document:

https://support.hdfgroup.org/HDF5/doc/Supplements/dtype_conversion/Conversion.html

and I see it says that if no "conversion exception" callback is
defined, then any conversion between integer types is a "hard
conversion", which acts just as if one wrote
  out_type out = (out_type)in;

So I think I actually don't need to do anything special to take care
of the conversion!  The code "uint16_t out = (uint16_t)in;" translates
2^32-1 to 2^16-1 automatically, because C defines conversion to an
unsigned type as reduction modulo 2^N.

Best regards,
Kevin



On Thu, Oct 12, 2017 at 11:18 AM, Miller, Mark C. <mille...@llnl.gov> wrote:
> You know...I think my response here is a bit confused.
>
> Here's what I am getting at...an HDF5 dataset has a datatype associated with
> it. That type is determined at the time the dataset is created. The question
> is whether you want such datasets to be seen as *always* having uint32_t
> even if they are stored on disk as 16 bit, or whether you plan to have the
> dataset's type determined by whether the data is indeed 16 bits or not?
> Obviously, in the latter case the caller winds up having to take some
> special action to select the datatype upon creating the dataset to write
> to. In the former, the caller just always creates what it thinks are 32 bit
> datasets and writes the data from memory to those datasets, and, magic
> happens, only 16 bit data is stored in the file if indeed the data written
> all fits in 16 bits.
>
> Hope that makes a tad more sense.
>
> Mark
>
> "Miller, Mark C." wrote:
>
> Hmmm. Do you care whether the HDF5 dataset's type in the file shows as
> "uint32_t", for example? Or do you simply care that you are not wasting
> space storing an array of 16 bit values using a 32 bit type?
>
> If you DO NOT CARE about the HDF5 dataset's type, my first thought would be
> to handle this as a filter instead of a type_cb callback. Have you
> considered that?
>
> * https://support.hdfgroup.org/HDF5/doc/RM/RM_H5Z.html
> * https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetFilter
> * https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/
>
> The filter could handle *both* the type size change and the special value
> mapping, much like any "compression" filter would.
>
> Consumers of the dataset data in memory would still only ever think that
> the type in memory was a 32 bit type, but the data stored on disk would be
> 16 bits.
>
> Now, if you really do want the dataset's type to be some 16 bit type, so
> that things like h5ls, h5dump, and H5Dget_type all return a known 16 bit
> type, then yeah, a custom type conversion is probably the way to go. But
> note that it will still appear to HDF5 as a *custom* type and not a
> built-in 16 bit type. Also, I don't think type conversion can be handled as
> a 'plugin' the way filters are (HDF5 will not -- at least I don't think it
> will -- load custom type conversion code from some plugin), so anyone
> reading that data would also need to have linked with your implementation
> of that type conversion callback.
>
> Hope that helps.
>
> Mark
>
> "Hdf-forum on behalf of Kevin B. McCarty" wrote:
>
> Hi list,
>
> I am doing some work that should convert integer datasets automatically
> from a larger integer type in memory, to a smaller integer type on disk,
> when appropriate.
>
> To give a concrete example: I might have code that converts a uint32_t
> dataset in memory to a uint16_t dataset on disk if it turns out that the
> values in the in-memory dataset all can be expressed losslessly in 16 bits.
>
> The problem is that I wish to allow for the possibility of one specific
> value that does *not* fit in 16 bits, which however I'd like to translate
> to a suitable 16-bit replacement value on disk.  That is:
>
>   if (memValue == (uint32_t)(-1))
>     diskValue = (uint16_t)(-1); /* Quietly replace all instances of
>                                    4294967295 in RAM with 65535 on-disk */
>
> It seems clear that in order to effect this automatic replacement, I need
> to write a callback to be given to H5Pset_type_conv_cb() that will catch
> overflows and make them instead quietly translate the out-of-range value
> to the desired replacement value.  What I don't understand is what code
> should go in the body of the callback function to do this.  (Feel free to
> assume that the only out-of-range value that might occur will be the
> specific value I wish to translate.)
>
> I've not been able to find any examples showing how to write such a
> callback function online.  Advice would be greatly appreciated!
>
> Thanks in advance,
>
> --
> Kevin B. McCarty
> <kmcca...@gmail.com>
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> Hdf-forum@lists.hdfgroup.org
> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
> Twitter: https://twitter.com/hdf5



-- 
Kevin B. McCarty
<kmcca...@gmail.com>

