On Wednesday, January 27, 2016 at 2:57:42 AM UTC-6, gneuner2 wrote:
> What is this other field on which the file is sorted?

this field is the cost in operators to arrive at the key value

> WRT a set of duplicates: are you throwing away all duplicates? Keeping
> the 1st one encountered? Something else?

keep first instance, chuck the rest

> This structure uses a lot more space than necessary. Where did you
> get the idea that a bignum is 10 bytes?

not sure about the 10 bytes. if i shove 5 128-bit keys into a bignum, is that about 80 bytes plus some overhead? so 80*6000000 is 480 mb, not including overhead.

> In the worst case of every key being a bignum

no, every key is contained within a bignum, which can contain many, many keys.

> Since you are only comparing the hash entries for equality, you could
> save a lot of space [at the expense of complexity] by defining a
> {bucket x chain_size x 16} byte array and storing the 16-byte keys
> directly.

i must be able to grow the chains. i can't make it fixed size like that.

> > have another rather large bignum in memory that i use to reduce
> >but not eliminate record duplication of about .5 gb.
> ???

ha, ok. this is what this bignum is for: cycle elimination. a sequence of operators (2 bits per operator), when strung together, is a number like 4126740, which represents the operator sequence (0 0 3 0 0 3 1 1 2 2 1). i change the bit at that position in the bignum from 0 to 1. during data generation i look up my about-to-be-applied operator sequence in the bignum. if i see a one, i skip data generation. i'm not really happy with the volume of memory this takes, but it is an insanely fast lookup and keeps a ton of data off the hard drive.

> In the meantime, I would suggest you look up "merge sort" and it's

logarithmic? not happening
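
for anyone following along, here is a rough sketch of the cycle-elimination lookup described above, written in Python rather than Racket so it is easy to try. the names `encode` and `SeenSet` are made up for illustration, and the packing (plain base 4 with a leading sentinel bit so that leading zeros are preserved) is an assumption; it is not the encoding that produces 4126740 in the post, which is not spelled out there.

```python
def encode(ops):
    """Pack a sequence of 2-bit operators (each 0..3) into one integer.

    Assumed encoding, not the one from the post: a leading 1 bit acts as a
    sentinel so that (0, 0, 3) and (3,) map to different integers.
    """
    n = 1
    for op in ops:
        if not 0 <= op <= 3:
            raise ValueError("operators must fit in 2 bits")
        n = (n << 2) | op
    return n


class SeenSet:
    """One big integer used as a bit set, indexed by encoded sequence."""

    def __init__(self):
        self.bits = 0  # arbitrary-precision int, all bits initially 0

    def seen(self, ops):
        # Test the single bit at position encode(ops).
        return (self.bits >> encode(ops)) & 1 == 1

    def mark(self, ops):
        # Flip the bit at position encode(ops) from 0 to 1.
        # Python ints are immutable, so this builds a new integer each time;
        # a bytearray would avoid the copy for very large sets.
        self.bits |= 1 << encode(ops)


if __name__ == "__main__":
    seen = SeenSet()
    candidate = (0, 0, 3, 0, 0, 3, 1, 1, 2, 2, 1)
    if not seen.seen(candidate):
        seen.mark(candidate)      # first visit: generate data, then record it
    print(seen.seen(candidate))   # a second visit would now be skipped
```

the memory cost is one bit per possible encoded sequence, so it grows by a factor of 4 for every extra operator in the longest sequence tracked.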