Your architecture is a bit unusual in that you seem to be proposing that users get direct access to the Hadoop storage layer.
More common is to have a controller layer that mediates requests to store or read data. With that layer of abstraction, you can deal with some of the problems associated with file updates (there is a rough sketch of what I mean at the end of this message). See the recent HBase work, for instance.

Even with that layer of abstraction and the recent massive improvements to HBase, Hadoop still tends to be much better for batch processing than for real-time support of ad hoc user data reads and writes.

Depending on the data you have and your update patterns, you might be much happier with a clustered key-value store like Voldemort or Cassandra. Voldemort especially has very nice capabilities for dumping large amounts of data from Hadoop into a large store. It also works to support real-time(ish) random reads and writes (see the second snippet below).

On Thu, Jul 23, 2009 at 6:44 AM, Giovanni Tusa <giovan...@gmail.com> wrote:

> Could you also suggest me some other useful links, maybe with examples if
> any, on how to implement such a mechanism?

--
Ted Dunning, CTO
DeepDyve
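
First, a minimal sketch of the kind of mediating controller layer described above. This is untested and the class and method names (StorageController, resolve, and so on) are made up for illustration; only the Hadoop FileSystem calls are real API. The point is that clients call this service instead of opening HDFS paths themselves, and HDFS's effectively write-once semantics are hidden behind an update-by-rename:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    // Hypothetical controller that mediates all access to HDFS so
    // clients never touch the storage layer directly.
    public class StorageController {

        private final FileSystem fs;
        private final Path root;

        public StorageController(Configuration conf, String rootDir) throws IOException {
            this.fs = FileSystem.get(conf);
            this.root = new Path(rootDir);
        }

        // The controller owns the mapping from logical names to physical paths,
        // so the physical layout can change without clients noticing.
        private Path resolve(String logicalName) {
            return new Path(root, logicalName);
        }

        public void read(String logicalName, OutputStream out) throws IOException {
            try (FSDataInputStream in = fs.open(resolve(logicalName))) {
                IOUtils.copyBytes(in, out, 4096, false);
            }
        }

        // HDFS files can't be rewritten in place, so "update" is implemented
        // as write-new-then-rename. This is exactly the sort of file-update
        // problem the controller layer can hide from clients.
        public void update(String logicalName, InputStream data) throws IOException {
            Path target = resolve(logicalName);
            Path tmp = new Path(root, logicalName + ".tmp");
            try (FSDataOutputStream out = fs.create(tmp, true)) {
                IOUtils.copyBytes(data, out, 4096, false);
            }
            fs.delete(target, false);  // no-op if the file doesn't exist yet
            fs.rename(tmp, target);    // swap the new version into place
        }
    }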
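
Second, the basic Voldemort client calls for random reads and writes, per the Voldemort quickstart. The store name and bootstrap URL are placeholders for your own cluster:

    import voldemort.client.ClientConfig;
    import voldemort.client.SocketStoreClientFactory;
    import voldemort.client.StoreClient;
    import voldemort.client.StoreClientFactory;
    import voldemort.versioning.Versioned;

    public class VoldemortExample {
        public static void main(String[] args) {
            // Bootstrap against any node in the cluster; the factory
            // fetches cluster and store metadata from there.
            StoreClientFactory factory = new SocketStoreClientFactory(
                    new ClientConfig().setBootstrapUrls("tcp://localhost:6666"));
            StoreClient<String, String> client = factory.getStoreClient("my_store");

            // Random-access write, then read it back; values come wrapped
            // in a Versioned so conflicting writes can be reconciled.
            client.put("some_key", "some_value");
            Versioned<String> value = client.get("some_key");
            System.out.println(value.getValue());
        }
    }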