The best way to restore is from a backup. We use distcp to keep this scalable: http://hadoop.apache.org/docs/r1.2.0/distcp2.html. The data we feed into HDFS also gets pushed to this backup, and the Hive metastore database gets pushed there as well. This combination works well for us (we had to use it once). Even if a namenode could never crash and all software worked fine 100% of the time, there is always the one crazy user/admin who will find a way to wipe all the data.
To me backups are not optional.
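
Roughly, a backup run looks something like this (the hostnames and paths are just examples, and it assumes the Hive metastore sits in MySQL; adjust to your own setup):

    # mirror the HDFS data onto the backup cluster (example hosts/paths)
    hadoop distcp -update -delete hdfs://prod-nn:8020/data hdfs://backup-nn:8020/data

    # dump the Hive metastore database as well (assuming a MySQL-backed metastore)
    mysqldump --single-transaction metastore > metastore-$(date +%F).sql

With both the raw HDFS data and the metastore dump on the backup cluster, you can rebuild the warehouse on a fresh cluster after a crash.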

On 17-7-2013 20:17, Hamza Asad wrote:
I use the data to generate reports on a daily basis and do a couple of analyses; it is insert-once, read-many, every day. But my main purpose is to secure my data and be able to recover it easily even if my Hadoop (datanode) or HDFS crashes. Up until now I have been retrieving data directly from HDFS. A few days back my Hadoop crashed, and when I repaired it I was unable to recover my old data that resided on HDFS. So please let me know: do I have to make an architectural change, or is there any way to recover data that resides in a crashed HDFS?


On Wed, Jul 17, 2013 at 11:00 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

    what's the purpose of data storage?
    what's the read and write throughput you expect?
    how will you access the data when reading?
    what are your SLAs on both read and write?

    there will be more questions others will ask, so be ready for that :)



    On Wed, Jul 17, 2013 at 11:10 PM, Hamza Asad <hamza.asa...@gmail.com> wrote:

        Please let me know which approach is better: either I save my
        data directly to HDFS and run Hive (Shark) queries over it, or I
        store my data in HBase and then query it. I want to ensure
        efficient data retrieval, and that the data remains safe and can
        easily be recovered if Hadoop crashes.

-- Muhammad Hamza Asad




-- Nitin Pawar




--
Muhammad Hamza Asad
