I think it depends on whether this is CPU- or IO-bound, where the files 
will be stored, and how expensive it is to generate blocks, check for 
existence, copy, etc.  Over a distributed filesystem running across 
data centers the decision will probably be different than on a 
multi-core CPU with a single RAID box.

I would guess that the scratch-file method is the simplest.  The problem 
is that, without some coordination, you could have multiple processes 
creating the same temp file at the same time, which might not be a big 
deal if it's a cheap operation.  Otherwise, you could use the filesystem 
to coordinate, e.g. by writing an empty file with the hash name ahead of 
time to effectively take the lock before appending actual data, or you 
could use a data structure that keeps a record of previously written or 
currently active blocks.  If you are over a network you either have to 
do that coordination over sockets, or use the distributed filesystem to 
do it for you.  Another option is POSIX flock, but I don't have 
experience with it.
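For the lock-file plus scratch-file idea, something along these lines 
might work on a local filesystem (hash-name and the directory are 
placeholders, and note that File.createNewFile is only atomic on local 
filesystems, not over NFS):

```clojure
(import '(java.io File))

(defn write-block-once
  "Write data under its hash name, but only if no other thread or
  process has already claimed that hash.  Returns true on a
  successful write, nil/false otherwise."
  [dir hash-name data]
  (let [lock (File. dir (str hash-name ".lock"))
        tmp  (File. dir (str hash-name ".tmp"))
        dest (File. dir hash-name)]
    (when (and (not (.exists dest))
               (.createNewFile lock))   ; atomically take the lock
      (try
        (spit tmp data)                 ; write to a scratch file first
        (.renameTo tmp dest)            ; then publish under the hash name
        (finally
          (.delete lock))))))           ; release the lock
```

The second writer for a given hash either loses the createNewFile race 
or sees the destination already exists, so the block is written at most 
once.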

It might not work for your situation, but what I was initially thinking 
was to abstract access to the hashed blocks and use a work queue for 
writing.  The queue would automatically throw out duplicate blocks, and 
you could tune the number of parallel writer threads to maximize 
throughput.  If you want to distribute the load, you can distribute 
workers that pull jobs off the queue.  Depending on whether you want 
readers to block or to return empty-handed when requesting a block 
that isn't finished writing yet, you could use something like the 
watchers on agents to notify them when a block is ready, or just 
return.  Also, depending on your consistency and isolation 
requirements, the reader interface could query the write queue to get 
access to blocks that haven't even been written to disk yet.  In that 
way it could sort of act as a persistent cache.
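A rough sketch of that queue in Clojure, using one promise per block 
for blocking readers rather than agent watchers (write-fn is a 
placeholder for whatever actually persists a block, and error handling 
is omitted):

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

;; Pending write jobs, and a map of hash -> promise that is
;; delivered once the block has been written.
(def jobs   (LinkedBlockingQueue.))
(def blocks (atom {}))

(defn enqueue-block
  "Queue a block for writing.  Duplicate hashes are thrown out:
  only the first enqueue for a given hash becomes a write job."
  [hash-name data]
  (let [p (promise)]
    ;; merge keeps an existing entry, so only the first caller's
    ;; promise lands in the map -- that caller enqueues the job
    (when (identical? p (get (swap! blocks #(merge {hash-name p} %))
                             hash-name))
      (.put jobs [hash-name data p]))))

(defn start-writer
  "Spin up one writer thread; call this several times to tune the
  number of parallel writers."
  [write-fn]
  (future
    (loop []
      (let [[hash-name data p] (.take jobs)]
        (write-fn hash-name data)
        (deliver p true)
        (recur)))))

(defn get-block
  "Block until the named hash has been written (up to timeout-ms),
  or return nil immediately if the hash was never enqueued."
  [hash-name timeout-ms]
  (when-let [p (get @blocks hash-name)]
    (deref p timeout-ms nil)))
```

Readers that should not block can just check the map; readers that 
query before the disk write finishes still get the block once the 
promise is delivered, which is the persistent-cache behavior above.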

-Jeff

James Reeves wrote:
> Hi folks,
>
> I've been having some difficulty coming up with a scheme for writing
> to files in a thread-safe manner. The files are named with the hash of
> their content, so they are effectively immutable.
>
> The problem comes with writing them for the first time. I need to
> ensure that while a file is initially being written, no other thread
> attempts to read or write to the file.
>
> The best solution I've come up with so far is to write to a temporary
> file, then rename the file to its hash once it has been closed. This
> seems to work, but I'd be very interested to know how other people
> have handled similar concurrent I/O problems in Clojure.
>
> - James


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to 
clojure+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---
