On Wed, 2016-01-27 at 17:49 -0500, George Neuner wrote:
> On 1/27/2016 10:50 AM, Brandon Thomas wrote:
> > On Wed, 2016-01-27 at 04:01 -0500, George Neuner wrote:
> > > On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas
> > > <bthoma...@gmail.com> wrote:
> > > >
> > > > Is there anything stopping you from restructuring the data on
> > > > disk and using the hash directly from there?
> > >
> > > Scotty's hash table is much larger than he thinks it is and very
> > > likely is being paged to disk already. Deliberately implementing
> > > it as a disk file is unlikely to improve anything.
> >
> > That's a good point to keep in mind. But there are advantages,
> > including faster startup time, using less RAM+swap, making it
> > easier to keep the file updated, and making it easier to implement
> > a resize solution. There are probably more, but basically these are
> > the reasons why all the large (key, value) storage solutions I've
> > heard of use an explicit file instead of swap.
>
> I miscalculated the scale of Scotty's hash structure - it's not as
> bad as I thought initially. But even so, it is of a scale where it is
> unwieldy and bound to have virtual memory problems unless the machine
> is dedicated.
>
> Hashing is latency sensitive - it was designed to be a
> memory-resident technique. Obviously it _can_ be done using
> file-based buckets ... the effect is of querying an ISAM database for
> every access. The problem is that the latency increases by orders of
> magnitude: even from resident blocks, every access involves the file
> system API and the kernel. You did mention (user space) caching, but
> that greatly complicates the solution.
>
> Making the hash external I think is not a win - it definitely will
> handle much bigger files, but it will handle every file more slowly.
> I think it is better to leverage the file system rather than fight
> it.
>
> YMMV,
> George
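For concreteness, the file-backed bucket scheme George is describing amounts to something like the minimal Racket sketch below. Everything in it is illustrative, not from the thread: a hypothetical fixed-size-slot layout, made-up sizes, and no collision handling.

```racket
#lang racket
;; Minimal sketch of a file-backed hash table of the kind discussed
;; above: one fixed-size slot per bucket, no collision handling.
;; All sizes and names here are illustrative assumptions.

(define NUM-BUCKETS 1048576)   ; number of slots in the file
(define KEY-SIZE    32)        ; bytes reserved for the key
(define SLOT-SIZE   64)        ; 32-byte key + 32-byte value

;; Pad a key out to KEY-SIZE bytes (assumes the key fits).
(define (pad-key key)
  (define b (string->bytes/utf-8 key))
  (bytes-append b (make-bytes (- KEY-SIZE (bytes-length b)) 0)))

;; Byte offset of the bucket for a key; modulo keeps it in range.
(define (bucket-offset key)
  (* SLOT-SIZE (modulo (equal-hash-code key) NUM-BUCKETS)))

;; Every lookup is a seek plus a read through the file-system API --
;; the per-access kernel round trip George is pointing at.
(define (lookup port key)
  (file-position port (bucket-offset key))
  (define slot (read-bytes SLOT-SIZE port))
  (and (bytes? slot)
       (bytes=? (subbytes slot 0 KEY-SIZE) (pad-key key))
       (subbytes slot KEY-SIZE SLOT-SIZE)))
```

Compare with an in-memory `hash-ref`, which is a few pointer chases: here even a probe into a cache-resident block pays a `file-position`/`read-bytes` trip through the kernel, which is where the orders-of-magnitude latency gap comes from.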
Yup, the canonical solutions for storing large amounts of data scalably are rather complicated and take work to implement (I wouldn't say unmanageable, though), and they get much harder if you want, say, ACID properties. That's why the problem has been factored into libraries, like Redis for example. If the data can fit into RAM efficiently enough, that's fine. But if he's planning to scale, maybe it's worth looking at using Racket's FFI with a library like Redis. It might be the least work and the cleanest solution.

Regards,
Brandon Thomas

--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.