On Wed, 2016-01-27 at 17:49 -0500, George Neuner wrote:
> On 1/27/2016 10:50 AM, Brandon Thomas wrote:
> > On Wed, 2016-01-27 at 04:01 -0500, George Neuner wrote:
> > > On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas
> > > <bthoma...@gmail.com> wrote:
> > > 
> > > > Is there anything stopping you from restructuring
> > > > the data on disk and using the hash directly from there?
> > > 
> > > Scotty's hash table is much larger than he thinks it is and very
> > > likely is being paged to disk already.  Deliberately implementing
> > > it as a disk file is unlikely to improve anything.
> > 
> > That's a good point to keep in mind. But there are advantages,
> > including faster startup time, using less ram+swap, easier to keep
> > the file updated and it makes it easier to make a resize solution.
> > There are probably more, but basically it's the reasons why all
> > large (key,value) storage solutions I've heard of use an explicit
> > file instead of swap.
> 
> I miscalculated the scale of Scotty's hash structure - it's not as
> bad as I thought initially.  But even so, it is of a scale where it
> is unwieldy and bound to have virtual memory problems unless the
> machine is dedicated.
> 
> Hashing is latency sensitive - it was designed to be a memory
> resident technique.  Obviously it _can_ be done using file based
> buckets ... the effect is of querying an ISAM database for every
> access.  The problem is that the latency increases by orders of
> magnitude: even from resident blocks, every access involves the file
> system API and the kernel.  You did mention (user space) caching,
> but that greatly complicates the solution.
> Making the hash external I think is not a win - it definitely will
> handle much bigger files, but it will handle every file more slowly.
> I think it is better to leverage the file system rather than fight
> it.
> 
> YMMV,
> George

Yup, the canonical solutions for storing large amounts of data scalably
are rather complicated and take work to implement (I wouldn't say
unmanageable, though). And they get much harder if you want, say, ACID
properties. This is why that work has been factored into libraries and
services, like Redis for example. If the table can fit into RAM
efficiently enough, that's OK. If he's planning to scale, though, maybe
it's worth looking at using Racket's FFI (or a client library) with
something like Redis. It might be the least work and the cleanest
solution.
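
As a rough illustration of what I mean by handing the on-disk
(key,value) table to an existing library instead of writing the file
format by hand, here is a minimal sketch. It is in Python rather than
Racket, and it uses the standard library's dbm.dumb module purely as a
stand-in: it is not Redis, it is not tuned for large tables, and the
names build_store and lookup are made up for the example.

```python
import dbm.dumb
import os
import tempfile


def build_store(path, pairs):
    """Write (key, value) byte pairs into an on-disk table at `path`.

    The table is created if it does not exist; existing keys are kept,
    so the file can be updated in place instead of being rebuilt.
    """
    with dbm.dumb.open(path, "c") as db:
        for key, value in pairs:
            db[key] = value


def lookup(path, key, default=None):
    """Open an existing table read-only and fetch a single key.

    Only the library's index is read at open time, not every value,
    which is where the faster startup comes from.
    """
    with dbm.dumb.open(path, "r") as db:
        return db.get(key, default)


if __name__ == "__main__":
    # Hypothetical data; a real run would stream pairs from the input file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "table")
        build_store(path, [(b"alpha", b"1"), (b"beta", b"2")])
        print(lookup(path, b"beta"))
```

Every lookup here still goes through the file system, so George's
latency point applies to this sketch as well. What the library buys is
not having to design and maintain the file layout yourself.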

Regards,
Brandon Thomas
