Hi Wilm,

The reason is that, for auditing purposes, I also want to store the original files.

Regards,
Seenu.

On Fri, Jan 2, 2015 at 11:09 PM, Wilm Schumacher <wilm.schumac...@gmail.com> wrote:
> Hi,
>
> perhaps I totally misunderstood your problem, but why "bother" with
> cassandra for storing in the first place?
>
> If your MR for hadoop is only run once for each file (as you wrote
> above), why not copy the data directly to hdfs, run your MR job and use
> cassandra as sink?
>
> As hdfs and yarn are more or less completely independent you could
> perhaps use the "master" as ResourceManager (yarn) AND NameNode and
> DataNode (hdfs) and launch your MR job directly and as mentioned use
> Cassandra as sink for the reduced data. By this you won't need dedicated
> hardware, as you only need the hdfs once, process and delete the files
> afterwards.
>
> Best wishes,
>
> Wilm
>
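
For reference, the workflow suggested in the quoted message (copy the file to hdfs, run the MR job with Cassandra as sink, delete the hdfs copy afterwards) could be sketched roughly as below. This is only a sketch under assumptions: the paths, the jar name and the job class are hypothetical placeholders, the job is assumed to take the hdfs input path as its only argument, and writing to Cassandra is assumed to happen inside the job itself. The script only builds the command lines; it does not run them.

```python
import posixpath
import shlex


def build_pipeline_commands(local_file, hdfs_dir, job_jar, job_class):
    """Build the three steps as argument lists: upload, run the job, clean up.

    All arguments are placeholders supplied by the caller; nothing here is
    taken from an actual cluster setup.
    """
    # Target path in hdfs: staging directory plus the original file name.
    hdfs_input = posixpath.join(hdfs_dir, posixpath.basename(local_file))
    return [
        # 1. Copy the local file into hdfs.
        ["hdfs", "dfs", "-put", local_file, hdfs_input],
        # 2. Launch the MR job (assumed to write its output to Cassandra).
        ["hadoop", "jar", job_jar, job_class, hdfs_input],
        # 3. Remove the temporary hdfs copy once the job has finished.
        ["hdfs", "dfs", "-rm", "-r", hdfs_input],
    ]


if __name__ == "__main__":
    commands = build_pipeline_commands(
        "/data/incoming/file1.log",       # hypothetical local file
        "/staging",                       # hypothetical hdfs directory
        "my-mr-job.jar",                  # hypothetical job jar
        "com.example.CassandraSinkJob",   # hypothetical job class
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Since the original files have to be kept for auditing, step 3 only removes the temporary hdfs copy; the originals would still need to be archived somewhere else.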