Hi Wilm,
The reason is that, for auditing purposes, I want to store the
original files as well.

Regards,
Seenu.

On Fri, Jan 2, 2015 at 11:09 PM, Wilm Schumacher <wilm.schumac...@gmail.com>
wrote:

> Hi,
>
> Perhaps I have totally misunderstood your problem, but why "bother" with
> Cassandra for storing the files in the first place?
>
> If your MR job on Hadoop is only run once for each file (as you wrote
> above), why not copy the data directly to HDFS, run your MR job and use
> Cassandra as the sink?
>
> As HDFS and YARN are more or less completely independent, you could
> perhaps use the "master" as ResourceManager (YARN) AND as NameNode and
> DataNode (HDFS), launch your MR job directly and, as mentioned, use
> Cassandra as the sink for the reduced data. This way you won't need
> dedicated hardware, as you only need HDFS once: process the files and
> delete them afterwards.
>
> Best wishes,
>
> Wilm
>
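
A toy, in-memory sketch of the flow Wilm proposes (stage the files, run one map-reduce pass, send only the reduced data to the sink, then delete the staged copies), with an optional archive step for the auditing requirement of keeping the original files. This is an illustration under stated assumptions, not real Hadoop or Cassandra code: plain dictionaries stand in for HDFS and for the archive, `InMemorySink` is a hypothetical stand-in for a Cassandra table, and the word-count job is an assumed example workload.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Mapper (assumed word count): emit (word, 1) for every token."""
    return [(word, 1) for line in lines for word in line.split()]


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reducer: sum the values emitted for each key."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


class InMemorySink:
    """Hypothetical stand-in for a Cassandra table; not a real driver API."""

    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}

    def write(self, rows: Dict[str, int]) -> None:
        self.rows.update(rows)


def run_job_once(
    staging: Dict[str, List[str]],
    sink: InMemorySink,
    archive: Optional[Dict[str, List[str]]] = None,
) -> None:
    """Process all staged files in one pass, then drop the staged copies."""
    if archive is not None:
        # Auditing requirement: keep an untouched copy of each original file
        # before the staged copies are deleted.
        for name, lines in staging.items():
            archive[name] = list(lines)
    pairs = [pair for lines in staging.values() for pair in map_phase(lines)]
    sink.write(reduce_phase(pairs))  # only the reduced data reaches the sink
    staging.clear()  # the staged copies are needed for this single run only


staging = {"a.log": ["error warn", "error"], "b.log": ["warn"]}
sink = InMemorySink()
archive: Dict[str, List[str]] = {}
run_job_once(staging, sink, archive)
print(sink.rows)  # {'error': 2, 'warn': 2}
```

The archive is written before the staged copies are cleared, so the originals survive the "process and delete" step; where that archive actually lives (Cassandra or another store) is left open here.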
