I have to point out that any such scheme as is being described would
need to be done quite carefully so as not to break the pass-by-reference
data semantics that Haskell enjoys, and the wealth of sharing that
comes with them.
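
For instance, the sharing concern in a nutshell (nothing
proposal-specific, just plain Haskell):

    -- Both components of the pair share a single copy of xs in the heap.
    -- A layout pass that flattened each field into its own contiguous
    -- block would have to either duplicate xs (blowing up the space
    -- usage) or detect the sharing and leave that field alone.
    sharedPair :: ([Int], [Int])
    sharedPair = let xs = [1 .. 1000000] in (xs, xs)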

Moreover, as I understand it, something like this is only feasible in
general for statically sized data structures (though there was
research on SML/NJ in the 90s, I believe in Zhong Shao's PhD thesis,
that evaluated an approach to chunking the cons cells of lists into
contiguous, array-like blocks). In general, any approach to this sort
of cleverness would also require a lot more dictionary passing than
Haskell already deals with.
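
GHC's existing UNPACK behaviour is a decent illustration of why the
static-size restriction shows up (this is just stock GHC, not part of
the proposal):

    -- A field whose size is statically known can be flattened into its
    -- parent constructor, removing a pointer and an indirection:
    data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double

    -- A list field cannot be unpacked: its size depends on its value,
    -- so it stays behind a pointer, which is exactly the case the
    -- SML/NJ cons-cell chunking work was after.
    data Path = Path ![Point]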


Additionally, I'm fairly certain both that any non-heuristic approach
to such optimization would quickly hit computational intractability a
la knapsack, and that it would play merry hell with satisfying the
memory alignment constraints required by some ABIs and encouraged by
some processors.
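
(The alignment part is just the usual round-up-to-a-boundary
arithmetic; the 8/16 below are only example alignments:)

    import Data.Bits (complement, (.&.))

    -- Round an offset up to the next multiple of a power-of-two
    -- alignment. Any scheme packing fields of mixed sizes has to insert
    -- the padding this implies, which is part of what turns a "perfect"
    -- packing into a bin-packing-style problem.
    alignTo :: Int -> Int -> Int
    alignTo align off = (off + align - 1) .&. complement (align - 1)

    -- alignTo 8 13 == 16, alignTo 16 13 == 16, alignTo 16 17 == 32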


On Friday, March 26, 2010, Sebastian Sylvan <sebastian.syl...@gmail.com> wrote:
>
>
> On Fri, Mar 26, 2010 at 10:52 PM, Mads Lindstrøm <mads_lindstr...@yahoo.dk> 
> wrote:
>
> Hi
>
> On Fri, 2010-03-26 at 21:33 +0000, Sebastian Sylvan wrote:
>
>>
>>
>> Reorganizing data on the fly sounds like it may be a pretty sensible
>> idea now that cache misses are so bad (in comparison). The fact that
>> Haskell data is generally immutable helps too.
>> However, I think your scheme sounds a bit complicated for my tastes,
>> and I don't really understand the idea of "lengths", what would "n
>> consecutive elements" mean for a tree? Surely this wouldn't work just
>> for list-like data structures?
>
> First of all, I had assumed that a data type's memory footprint was
> independent of its value. Similarly to how a C-union always occupies the
> same amount of memory. After reading your post, I reevaluated my
> assumption and my assumption might just be wrong.
>
> Take the Maybe data type. The Nothing value occupies no space other than the 
> Nothing-tag; the Just part occupies more (potentially a lot more, if it too 
> has been collapsed!). It's conceivable that you would force the size to be 
> the same a la C unions, and have some thresholding where if one of the 
> possible alternatives is too big then it would always use a pointer for that 
> one alternative, to minimize padding.
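
(A rough sketch of that size rule; the tag, pointer and threshold
numbers below are made up, not from the proposal:)

    -- Fixed-size, union-style layout for a sum type: alternatives above
    -- the threshold fall back to a pointer so they don't inflate the
    -- common case with padding. Assumes at least one alternative.
    fixedSize :: Int -> [Int] -> Int
    fixedSize threshold altSizes = tagBytes + maximum (map clamp altSizes)
      where
        tagBytes = 1
        ptrBytes = 8
        clamp s
          | s > threshold = ptrBytes   -- too big: keep this one boxed
          | otherwise     = s          -- small enough: store it inline
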
>
> What do "heavily quantized lookup table" mean? It is the "heavily
> quantized" part I do not get.
>
> E.g. store a small number of bits of offset information per field in a record 
> at the top, let's say 8 bits per field. Now, that means you can have 256 
> unique values, and therefore 256 fields, but nothing says that the offset 
> value needs to refer to "bytes", it could just as easily refer to "multiples 
> of 16 bytes" for larger structures. This means you would need to align all 
> your fields to 16-byte boundaries, which may need padding, but lets you refer 
> to fields with a larger offset without using a full pointer per field.
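
(Concretely, with the 8-bit / 16-byte numbers from the paragraph above:)

    import Data.Word (Word8)

    -- An 8-bit offset interpreted in 16-byte units: 256 values reach
    -- about 4 KiB of fields while costing only one byte per field.
    fieldByteOffset :: Word8 -> Int
    fieldByteOffset o = fromIntegral o * 16
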
>
> Maybe your gain is wasted by doing too much logic here. Perhaps a more 
> conservative suggestion would be to keep the pointers, so the runtime code is 
> the same, but store a little tag indicating that a value has an optional 
> memory extent for the purposes of GC. This way you could compact the leaves 
> of a data graph (still keeping the pointers), which lets the GC stop 
> following pointers once it hits the tag and reads the memory extent saying 
> "this is a some data totalling 612 bytes, and I promise that any pointers in 
> this memory block are all internal to the block". You'd reduce GC work, and 
> more importantly cache misses, but not overall memory usage.
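
(In made-up type form, the per-object information the GC would consult
might look something like:)

    -- Hypothetical: either trace pointer fields as usual, or trust the
    -- recorded extent and skip the whole block.
    data TraceInfo
      = TraceNormally        -- follow every pointer field
      | SelfContained !Int   -- extent in bytes; all pointers stay inside it
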
>
> --
> Sebastian Sylvan
>
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
