On Fri, Jan 2, 2015 at 5:54 PM, mck <m...@apache.org> wrote:
> You could manually chunk them down to 64Mb pieces.

Can this split and combine be done automatically by Cassandra when inserting/fetching the file, without the application having to manage it?
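For concreteness, this is roughly how I pictured the manual chunking you suggest (a minimal sketch using the DataStax Python driver; the keyspace, table and 64Mb chunk size are just my assumptions, not an existing schema):

# Minimal sketch of manual chunking with the DataStax Python driver.
# 'filestore' / 'file_chunks' and the chunk size are assumptions only.
from cassandra.cluster import Cluster

CHUNK_SIZE = 64 * 1024 * 1024  # 64Mb pieces, as suggested

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('filestore')  # hypothetical keyspace

insert = session.prepare(
    "INSERT INTO file_chunks (file_id, chunk_no, data) VALUES (?, ?, ?)")

def store_file(file_id, path):
    """Split a local file into 64Mb blobs, one row per chunk."""
    with open(path, 'rb') as f:
        chunk_no = 0
        while True:
            data = f.read(CHUNK_SIZE)
            if not data:
                break
            session.execute(insert, (file_id, chunk_no, bytearray(data)))
            chunk_no += 1

def fetch_file(file_id, path):
    """Reassemble the file by reading its chunks back in order."""
    rows = session.execute(
        "SELECT data FROM file_chunks WHERE file_id = %s ORDER BY chunk_no",
        (file_id,))
    with open(path, 'wb') as out:
        for row in rows:
            out.write(row.data)

This is the split/combine bookkeeping I was hoping Cassandra could hide from the application.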
> > 2) Can I replace HDFS with Cassandra so that I don't have to sync/fetch
> > the file from cassandra to HDFS when I want to process it in hadoop
> > cluster?
>
> We¹ keep HDFS as a volatile filesystem simply for hadoop internals. No
> need for backups of it, no need to upgrade data, and we're free to wipe
> it whenever hadoop has been stopped.
> ~mck

Since the hadoop MR streaming job requires the file being processed to be present in HDFS, I was wondering whether it can be read directly from mongodb instead of me manually fetching it and placing it in a directory before submitting the hadoop job (I have sketched this manual step at the end of this mail)?

>> There was a datastax project before in being able to replace HDFS with
>> Cassandra, but i don't think it's alive anymore.

I think you are referring to the Brisk project
(http://blog.octo.com/en/introduction-to-datastax-brisk-an-hadoop-and-cassandra-distribution/),
but I don't know its current status.

Can I use http://gerrymcnicol.azurewebsites.net/ for the task at hand?

Regards,
Seenu.
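P.S. To make the second question concrete, this is roughly the manual step I would like to get rid of (a sketch only; the HDFS paths, streaming jar location and mapper/reducer scripts are placeholders, not my real setup):

# Sketch of the manual workflow: reassemble the file from Cassandra
# (as in the earlier chunking sketch), copy it into HDFS, then submit
# the streaming job. All paths below are placeholders.
import subprocess

def submit_streaming_job(file_id, local_path):
    hdfs_input = '/user/seenu/input/%s.dat' % file_id
    hdfs_output = '/user/seenu/output/%s' % file_id

    # Copy the locally reassembled file into HDFS.
    subprocess.check_call(['hdfs', 'dfs', '-put', '-f', local_path, hdfs_input])

    # Submit the Hadoop streaming job against that HDFS path.
    subprocess.check_call([
        'hadoop', 'jar', '/usr/lib/hadoop-mapreduce/hadoop-streaming.jar',
        '-input', hdfs_input,
        '-output', hdfs_output,
        '-mapper', 'mapper.py',
        '-reducer', 'reducer.py',
        '-file', 'mapper.py',
        '-file', 'reducer.py',
    ])

Ideally the job would read the file straight from the datastore and this intermediate copy into HDFS would not be needed.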