On Fri, Jan 2, 2015 at 5:54 PM, mck <m...@apache.org> wrote:
> You could manually chunk them down to 64Mb pieces.

Can this split and combine be done automatically by Cassandra when inserting/fetching the file, without the application having to manage it?
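For concreteness, this is roughly how I pictured the manual chunking you suggest (a minimal sketch using the DataStax Python driver; the keyspace, table and 64Mb chunk size are just my assumptions, not an existing schema):

# Minimal sketch of manual chunking with the DataStax Python driver.
# 'filestore' / 'file_chunks' and the chunk size are assumptions only.
from cassandra.cluster import Cluster

CHUNK_SIZE = 64 * 1024 * 1024  # 64Mb pieces, as suggested

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('filestore')  # hypothetical keyspace

insert = session.prepare(
    "INSERT INTO file_chunks (file_id, chunk_no, data) VALUES (?, ?, ?)")

def store_file(file_id, path):
    """Split a local file into 64Mb blobs, one row per chunk."""
    with open(path, 'rb') as f:
        chunk_no = 0
        while True:
            data = f.read(CHUNK_SIZE)
            if not data:
                break
            session.execute(insert, (file_id, chunk_no, bytearray(data)))
            chunk_no += 1

def fetch_file(file_id, path):
    """Reassemble the file by reading its chunks back in order."""
    rows = session.execute(
        "SELECT data FROM file_chunks WHERE file_id = %s ORDER BY chunk_no",
        (file_id,))
    with open(path, 'wb') as out:
        for row in rows:
            out.write(row.data)

This is the split/combine bookkeeping I was hoping Cassandra could hide from the application.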
> > 2) Can I replace HDFS with Cassandra so that I don't have to sync/fetch
> > the file from cassandra to HDFS when I want to process it in hadoop
> > cluster?
>
> We¹ keep HDFS as a volatile filesystem simply for hadoop internals. No
> need for backups of it, no need to upgrade data, and we're free to wipe
> it whenever hadoop has been stopped.
> ~mck

Since the hadoop MR streaming job requires the file being processed to be present in HDFS, I was wondering whether it can be read directly from mongodb instead of me manually fetching it and placing it in a directory before submitting the hadoop job (I have sketched this manual step at the end of this mail)?

>> There was a datastax project before in being able to replace HDFS with
>> Cassandra, but i don't think it's alive anymore.

I think you are referring to the Brisk project
(http://blog.octo.com/en/introduction-to-datastax-brisk-an-hadoop-and-cassandra-distribution/),
but I don't know its current status.

Can I use http://gerrymcnicol.azurewebsites.net/ for the task at hand?

Regards,
Seenu.
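P.S. To make the second question concrete, this is roughly the manual step I would like to get rid of (a sketch only; the HDFS paths, streaming jar location and mapper/reducer scripts are placeholders, not my real setup):

# Sketch of the manual workflow: reassemble the file from Cassandra
# (as in the earlier chunking sketch), copy it into HDFS, then submit
# the streaming job. All paths below are placeholders.
import subprocess

def submit_streaming_job(file_id, local_path):
    hdfs_input = '/user/seenu/input/%s.dat' % file_id
    hdfs_output = '/user/seenu/output/%s' % file_id

    # Copy the locally reassembled file into HDFS.
    subprocess.check_call(['hdfs', 'dfs', '-put', '-f', local_path, hdfs_input])

    # Submit the Hadoop streaming job against that HDFS path.
    subprocess.check_call([
        'hadoop', 'jar', '/usr/lib/hadoop-mapreduce/hadoop-streaming.jar',
        '-input', hdfs_input,
        '-output', hdfs_output,
        '-mapper', 'mapper.py',
        '-reducer', 'reducer.py',
        '-file', 'mapper.py',
        '-file', 'reducer.py',
    ])

Ideally the job would read the file straight from the datastore and this intermediate copy into HDFS would not be needed.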