Check out the aforementioned Astyanax, and this post on the Cassandra File System design: http://www.datastax.com/dev/blog/cassandra-file-system-design
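To make the chunking idea concrete, here is a rough sketch of writing and reading a file with Astyanax's chunked object store recipe. It is written from memory of the Astyanax recipes documentation, so treat the class and method names (CassandraChunkedStorageProvider, ChunkedStorage.newWriter/newReader) and the cluster, keyspace, column family and seed values as assumptions to verify against the version you are running:

    // Sketch only: names below are from memory of the Astyanax recipes wiki
    // and may differ slightly between versions.
    import com.netflix.astyanax.AstyanaxContext;
    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
    import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
    import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
    import com.netflix.astyanax.recipes.storage.CassandraChunkedStorageProvider;
    import com.netflix.astyanax.recipes.storage.ChunkedStorage;
    import com.netflix.astyanax.recipes.storage.ChunkedStorageProvider;
    import com.netflix.astyanax.recipes.storage.ObjectMetadata;
    import com.netflix.astyanax.thrift.ThriftFamilyFactory;

    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;

    public class BlobStoreSketch {
        public static void main(String[] args) throws Exception {
            // Standard Astyanax context; cluster/keyspace/seed are placeholders.
            AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
                .forCluster("TestCluster")
                .forKeyspace("files")
                .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                    .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
                .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("pool")
                    .setPort(9160)
                    .setMaxConnsPerHost(4)
                    .setSeeds("127.0.0.1:9160"))
                .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                .buildKeyspace(ThriftFamilyFactory.getInstance());
            context.start();
            Keyspace keyspace = context.getClient();

            // The provider maps chunks onto an existing column family
            // (here "file_chunks", which is assumed to already exist).
            ChunkedStorageProvider provider =
                new CassandraChunkedStorageProvider(keyspace, "file_chunks");

            // Write: the recipe splits the stream into fixed-size chunks stored
            // separately, so a large file is never forced into a single row.
            ObjectMetadata meta = ChunkedStorage.newWriter(provider, "photos/img-001.jpg",
                    new FileInputStream("img-001.jpg"))
                .withChunkSize(64 * 1024)   // 64 KB chunks
                .call();
            System.out.println("stored " + meta.getObjectSize() + " bytes");

            // Read: reassemble the chunks back into a single stream.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ChunkedStorage.newReader(provider, "photos/img-001.jpg", out)
                .withBatchSize(8)
                .call();

            context.shutdown();
        }
    }

The file's metadata (size, chunk count, MIME type, owner, etc.) can then live in a regular Cassandra column family keyed by the same object name.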
Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/03/2013, at 1:38 PM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:

> Thanks for the great explanation.
>
> Dean
>
> On 3/4/13 1:44 PM, "Kanwar Sangha" <kan...@mavenir.com> wrote:
>
>> Problems with small files and HDFS
>>
>> A small file is one which is significantly smaller than the HDFS block
>> size (default 64 MB). If you're storing small files, then you probably
>> have lots of them (otherwise you wouldn't turn to Hadoop), and the
>> problem is that HDFS can't handle lots of files.
>>
>> Every file, directory and block in HDFS is represented as an object in
>> the namenode's memory, each of which occupies about 150 bytes as a rule
>> of thumb. So 10 million files, each using a block, would use about 3
>> gigabytes of memory (10 million file objects plus 10 million block
>> objects, i.e. roughly 20 million x 150 bytes). Scaling up much beyond
>> this level is a problem with current hardware. Certainly a billion
>> files is not feasible.
>>
>> Furthermore, HDFS is not geared up for efficient access to small files:
>> it is primarily designed for streaming access to large files. Reading
>> through small files normally causes lots of seeks and lots of hopping
>> from datanode to datanode to retrieve each small file, all of which is
>> an inefficient data access pattern.
>>
>> Problems with small files and MapReduce
>>
>> Map tasks usually process a block of input at a time (using the default
>> FileInputFormat). If the files are very small and there are a lot of
>> them, then each map task processes very little input, and there are a
>> lot more map tasks, each of which imposes extra bookkeeping overhead.
>> Compare a 1 GB file broken into 16 64 MB blocks with 10,000 or so
>> 100 KB files: the 10,000 files use one map each, and the job can be
>> tens or hundreds of times slower than the equivalent one with a single
>> input file.
>>
>> There are a couple of features to help alleviate the bookkeeping
>> overhead: task JVM reuse for running multiple map tasks in one JVM,
>> thereby avoiding some JVM startup overhead (see the
>> mapred.job.reuse.jvm.num.tasks property), and MultiFileInputSplit,
>> which can run more than one split per map.
>>
>> -----Original Message-----
>> From: Hiller, Dean [mailto:dean.hil...@nrel.gov]
>> Sent: 04 March 2013 13:38
>> To: user@cassandra.apache.org
>> Subject: Re: Storage question
>>
>> Well, astyanax I know can simulate streaming into Cassandra and
>> disperses the file across multiple rows in the cluster, so you could
>> check that out.
>>
>> Out of curiosity, why is HDFS not good for small files? For reading, it
>> should be the bomb with RF=3, since you can read from multiple nodes
>> and such. Writes might be a little slower but still shouldn't be too
>> bad.
>>
>> Later,
>> Dean
>>
>> From: Kanwar Sangha <kan...@mavenir.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Monday, March 4, 2013 12:34 PM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Storage question
>>
>> Hi - Can someone suggest the optimal way to store files / images? We
>> are planning to use Cassandra for the meta-data for these files. HDFS
>> is not good for small file sizes .. can we look at something else?
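As an aside on the JVM-reuse mitigation quoted above: it is just a per-job setting in the old org.apache.hadoop.mapred API. Below is a minimal identity-job sketch assuming that classic API; the job name, input/output paths and the -1 "reuse without limit" value are illustrative, and CombineFileInputFormat is only mentioned in a comment rather than wired in:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;

    public class SmallFilesJobSketch {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SmallFilesJobSketch.class);
            conf.setJobName("small-files-sketch");

            // Reuse one task JVM for an unlimited number of map tasks (-1)
            // instead of paying JVM startup cost for every tiny input file.
            conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
            // Equivalent typed setter on the old JobConf API:
            // conf.setNumTasksToExecutePerJvm(-1);

            // To pack many small files into fewer splits, CombineFileInputFormat
            // (the successor to MultiFileInputFormat/MultiFileInputSplit) could
            // replace TextInputFormat here, but it needs a custom RecordReader
            // and is left out of this sketch.
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);
            conf.setOutputKeyClass(LongWritable.class);
            conf.setOutputValueClass(Text.class);

            // No mapper/reducer set, so the identity classes are used;
            // the point here is only the JVM-reuse configuration.
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }

Note these settings only reduce the per-task overhead; the namenode memory cost of tracking millions of small files stays the same, which is why chunked storage in Cassandra (or archiving small files into larger ones) is the more direct fix.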