I only found chunking information in the docs on HDF5, not JLD. 

Could you be more specific? Do you mean using custom serialization in JLD 
files? And will that get around my problem that the dimensions of the data 
to be chunked are only specified at runtime?
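
In the meantime, here's the kind of dimension-agnostic indexing I'm after, 
sketched on a plain Julia array. I'm assuming (but haven't verified) that 
the same splatted-colon syntax works on an HDF5.jl dataset handle:

```julia
# Sketch only: dimension-agnostic slice assignment on a plain array.
# The hope is that the same colons... splat works on an HDF5 dset.
B = rand(64, 64, 10)                      # stand-in data; could be any N-d
N = ndims(B)
colons = ntuple(_ -> Colon(), N - 1)      # (:, :) here, (:, :, :) for 4-d, etc.
dest = zeros(size(B))
for i in 1:size(B, N)
    dest[colons..., i] = B[colons..., i]  # replaces the hard-coded dset[:,:,i]
end
@assert dest == B                         # every slice copied over
```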

On Tuesday, September 13, 2016 at 1:30:31 PM UTC-5, Kristoffer Carlsson 
wrote:
>
> How about using JLD.jl: https://github.com/JuliaIO/JLD.jl
>
> On Tuesday, September 13, 2016 at 7:00:25 PM UTC+2, sparrowhawker wrote:
>>
>> Hi,
>>
>> I'm new to Julia, and in the three months since I started using it I've 
>> been able to accomplish, in very little time, a lot of what I used to do 
>> in Matlab/Fortran. Here's my newest stumbling block.
>>
>> I have a process which creates nsamples within a loop. Each sample takes 
>> a long time to compute, say 1 to 10 seconds, as there are expensive 
>> finite difference operations which ultimately lead to a sample. I have to 
>> store each of the nsamples, and I know the size and dimensions of each 
>> sample (all samples have the same size and dimensions). However, 
>> depending on the run-time parameters, each sample may be a 32x32 image or 
>> perhaps a 64x64x64 voxset with 3 attributes, i.e., a 64x64x64x3 
>> hyper-rectangle. To be clear, each sample can be a hyper-rectangle of 
>> arbitrary dimension, specified at run time.
>>
>> Obviously, since I don't want to lose computation and want to see 
>> incremental progress, I'd like to save these samples to disk 
>> incrementally, instead of waiting to collect all nsamples at the end. For 
>> instance, if I had to store 1000 samples of size 64x64, I thought perhaps 
>> I could chunk and save 64x64 slices to an HDF5 file 1000 times. Is this 
>> the right approach? If so, here's a prototype program to do it, but it 
>> depends on knowing the number of dimensions of a slice, which is not 
>> known until runtime:
>>
>> using HDF5
>>
>> filename = "test.h5"
>> # open the file in write mode and get a file object
>> fid = h5open(filename, "w")
>> # matrix to write in chunks
>> B = rand(64, 64, 1000)
>> # figure out its dimensions
>> sizeTuple = size(B)
>> Ndims = length(sizeTuple)
>> # chunk size: a full slice along all but the last dimension,
>> # i.e. sizeArray ends in 1
>> sizeArray = ones(Int, Ndims)
>> for i in 1:(Ndims-1)
>>     sizeArray[i] = sizeTuple[i]
>> end
>> # create a chunked dataset "models" within the file root
>> dset = d_create(fid, "models", datatype(Float64), dataspace(size(B)),
>>                 "chunk", sizeArray)
>> # write one slice at a time along the last dimension
>> for i in 1:size(B, Ndims)
>>     dset[:, :, i] = slicedim(B, Ndims, i)
>> end
>> close(fid)
>>
>> This works, but the assignment dset[:, :, i] = ... requires syntax 
>> specific to writing a slice of a 3-dimensional array - and I don't know 
>> the number of dimensions until run time. Of course I could just write to 
>> a flat binary file incrementally, but HDF5.jl could make my life so much 
>> simpler!
>>
>> Many thanks for any pointers.
>>
>
