Hey Snehal (removing the core-dev list; please only post to one list at a time),

The access time should be fine, but it depends on what you consider acceptable; if it isn't, I'd suggest putting HDFS behind a web cache like Squid. The best way to find out is to run the system as a prototype and evaluate it against your requirements.
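
For example, a quick way to evaluate this is to time individual reads with the standard FileSystem API. A rough sketch (the path below is just a placeholder for one of your image files):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadTimer {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/esagu/images/sample.jpg");  // placeholder path

        long start = System.currentTimeMillis();
        FSDataInputStream in = fs.open(p);
        byte[] buf = new byte[64 * 1024];
        while (in.read(buf) != -1) {
          // discard the data; we only care about elapsed time
        }
        in.close();

        System.out.println("Read " + p + " in "
            + (System.currentTimeMillis() - start) + " ms");
      }
    }

Run that against a representative sample of your images and you'll know whether the latency meets your needs.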

Hadoop can be used for small data, but it was designed and optimized for big data. The primary downside of small files is the per-file memory cost: the NameNode keeps the metadata for every file in memory, so many small files consume memory out of proportion to the data they hold. If your total storage is never going to grow very large, Hadoop may also be overkill as a solution.

We currently use HDFS for mostly random access.

Brian

On Mar 25, 2009, at 6:10 AM, snehal nagmote wrote:

Hello Sir,
I am doing my M.Tech at IIIT Hyderabad, and I am working on a research project whose aim is to develop a scalable storage system for eSagu.
eSagu is about taking crop images from the fields and storing them in a filesystem; those images are then accessed by agricultural scientists to detect problems. Many fields in A.P. currently use this system, and it may grow beyond A.P., so we require a scalable storage system.

1) My problem is that we are using Hadoop for storage, but Hadoop reads and writes in 64 MB chunks, while the images we store are very small, at most 2 to 3 MB, so the access time for these images would be large. Can you suggest how this access time can be reduced? Is there anything else we could do to improve performance, such as building our own cache? To what extent would that be feasible or helpful in this kind of application?
2) Second, would Hadoop be useful for small data like this? If not, what tricks could we use to make it usable for this kind of application?

Please help. Thanks in advance.



Regards,
Snehal Nagmote
IIIT Hyderabad
