Hey Snehal (removing the core-dev list; please only post to one list at a time),

The access time should be fine, but it depends on what you consider acceptable; if it isn't, I'd suggest putting HDFS behind a web cache like Squid. The best way to find out is to run the system as a prototype and evaluate it against your requirements.
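
For example, a quick way to evaluate this is to time individual reads with the standard FileSystem API. A rough sketch (the path below is just a placeholder for one of your image files):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadTimer {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/esagu/images/sample.jpg");  // placeholder path

        long start = System.currentTimeMillis();
        FSDataInputStream in = fs.open(p);
        byte[] buf = new byte[64 * 1024];
        while (in.read(buf) != -1) {
          // discard the data; we only care about elapsed time
        }
        in.close();

        System.out.println("Read " + p + " in "
            + (System.currentTimeMillis() - start) + " ms");
      }
    }

Run that against a representative sample of your images and you'll know whether the latency meets your needs.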

Hadoop can be used for small data, but it was designed and optimized for big data. The primary downside of small files is the per-file memory cost: the NameNode keeps the metadata for every file in memory, so many small files consume memory out of proportion to the data they hold. If your total storage is never going to grow very large, Hadoop may also be overkill as a solution.

We currently use HDFS for mostly random access.

Brian

On Mar 25, 2009, at 6:10 AM, snehal nagmote wrote:

Hello Sir,
I am doing my M.Tech at IIIT Hyderabad, and I am working on a research project whose aim is to develop a scalable storage system for eSagu.
eSagu is about taking crop images from the fields and storing them in a filesystem; those images are then accessed by agricultural scientists to detect problems. Many fields in A.P. currently use this system, and it may grow beyond A.P., so we require a scalable storage system.

1) My problem is that we are using Hadoop for storage, but Hadoop reads and writes in 64 MB chunks, while the images we store are very small, at most 2 to 3 MB, so the access time for these images would be large. Can you suggest how this access time can be reduced? Is there anything else we could do to improve performance, such as building our own cache? To what extent would that be feasible or helpful in this kind of application?
2) Second, would Hadoop be useful for small data like this? If not, what tricks could we use to make it usable for this kind of application?

Please help. Thanks in advance.



Regards,
Snehal Nagmote
IIIT Hyderabad
