Hi,

We are evaluating standalone HDFS for one of our projects.
The file system would store audio, video, image, and text
files for various batch-processing applications hosted on
multiple machines across multiple platforms.

I would like some feedback on the best HDFS-based options
(fuse-dfs, HBase, or others) available for the requirements
below:

1.      The data to be stored consists of video, audio, image,
XML, and text files.
2.      These files need to be created, accessed, and deleted from both
Linux and Windows machines.
3.      The data is transient: we store it for a configurable
period (say, 2 days) while it is processed across multiple
machines, and then delete it once processing is
complete.
4.      The data needs to be stored as close as possible to the
processing machines (Linux or Windows) to reduce network I/O.
5.      The number of files to be stored per day is on the order of
millions. The number of folders needed to store the images
for a single video will also be on the order of millions.
6.      The number of files to be deleted per day will also be on the
order of millions, as we clean up files whose processing has
completed.
7.      Audio/video file sizes range from a few KB to a few GB.
8.      File permissions: at most, we need to restrict some hosts
to read-only rather than read-write access. This is a
nice-to-have, not a must-have, requirement.
9.      The setup may have 200-600 machines (roughly 30% Windows and
70% Linux), each with 250-500 GB of hard disk space.
10.     The file system should be mountable from Linux and Windows
machines (on Windows, by mapping a network drive).
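
To put requirements 3, 5 and 9 in perspective, here is a rough back-of-envelope sketch in Python. It assumes every machine runs a DataNode, the HDFS default replication factor of 3, one block per small file, and the commonly quoted rule of thumb of about 150 bytes of NameNode heap per file, directory, or block. The daily volumes used below (5 million files and 1 million folders per day) are hypothetical placeholders, not figures from the requirements, and the function names are illustrative only.

```python
# Back-of-envelope HDFS sizing sketch.
# ASSUMPTIONS (not from the requirements): every machine is a DataNode,
# replication factor 3 (HDFS default), one block per small file, and
# ~150 bytes of NameNode heap per namespace object (rule of thumb only).

REPLICATION = 3          # HDFS default dfs.replication
BYTES_PER_OBJECT = 150   # rough NameNode heap per file, directory, or block


def usable_capacity_tb(machines: int, disk_gb: int,
                       replication: int = REPLICATION) -> float:
    """Total raw disk divided by the replication factor, in TB (1 TB = 1000 GB)."""
    return machines * disk_gb / replication / 1000


def live_objects(files_per_day: int, dirs_per_day: int,
                 retention_days: int, blocks_per_file: float = 1.0) -> int:
    """Namespace objects alive at steady state: everything written within
    the retention window (each file counts once, plus its blocks)."""
    per_day = files_per_day * (1 + blocks_per_file) + dirs_per_day
    return int(per_day * retention_days)


def namenode_heap_gb(objects: int) -> float:
    """Approximate NameNode heap needed for the given number of objects."""
    return objects * BYTES_PER_OBJECT / 1e9


if __name__ == "__main__":
    # Requirement 9: 200-600 machines with 250-500 GB each.
    low = usable_capacity_tb(200, 250)
    high = usable_capacity_tb(600, 500)
    print(f"usable capacity: {low:.1f} TB to {high:.1f} TB")

    # Requirements 3 and 5, with HYPOTHETICAL daily volumes and 2-day retention.
    objs = live_objects(5_000_000, 1_000_000, retention_days=2)
    print(f"live namespace objects: {objs:,}")
    print(f"approx. NameNode heap: {namenode_heap_gb(objs):.1f} GB")
```

Under these assumptions the cluster would hold roughly 16.7 TB to 100 TB of replicated data, and about 22 million live namespace objects would need on the order of 3.3 GB of NameNode heap. The point is that millions of small files and folders stress the single NameNode's memory long before they stress disk capacity.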

Please let me know if you need more details.

Thanks in advance,
Amit
