Edward, can you explain more please? Are you suggesting that I should use HBase for such tasks instead of Hive?
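For anyone following along, Edward's storage-handler suggestion (quoted below) can be sketched roughly like this. This is only an illustration: the table name, columns, and column-family mapping are made up, not from the thread. The point is that rows keyed by `:key` are upserted in HBase, so re-importing a changed row overwrites it instead of appending a duplicate, which is exactly what a plain Hive table cannot do.

```shell
# Sketch only: back a Hive table with HBase so that writing a row with an
# existing key overwrites it (update semantics) rather than appending.
# Table, column, and column-family names are hypothetical.
hive -e "
CREATE TABLE orders_hbase (
  order_id          STRING,
  status            STRING,
  last_update_time  STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping' = ':key,d:status,d:last_update_time'
)
TBLPROPERTIES ('hbase.table.name' = 'orders');
"
```

The trade-off, as the thread notes, is that you take on an HBase (or Cassandra) deployment just to get update semantics, and analytical scans over a storage handler are generally slower than over native Hive files.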
--
Ibrahim

On Mon, Dec 24, 2012 at 5:28 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> You can only do the last_update idea if this is an insert-only dataset.
>
> If your table takes updates, you need a different strategy:
> 1) full dumps every interval, or
> 2) a storage handler like HBase or Cassandra that takes update
> operations.
>
> On Mon, Dec 24, 2012 at 9:22 AM, Jeremiah Peschka <jeremiah.pesc...@gmail.com> wrote:
>
>> If it were me, I would find a way to identify the partitions that have
>> modified data and then re-load a subset of the partitions (only the ones
>> with changes) on a regular basis. Instead of updating/deleting data, you'll
>> be re-loading specific partitions as an all-or-nothing action.
>>
>> On Monday, December 24, 2012, Ibrahim Yakti wrote:
>>
>>> This is already done, but Hive supports neither update nor deletion of
>>> data, so when I import the records after a specific "last_update_time",
>>> Hive will append them, not replace them.
>>>
>>> --
>>> Ibrahim
>>>
>>> On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>
>>> You can use Apache Oozie to schedule your imports.
>>>
>>> Alternatively, you can have an additional column in your SQL table, say
>>> LastUpdatedTime or something. As soon as there is a change in this column,
>>> you can start the import from this point. This way you don't have to
>>> import everything every time there is a change in your table. You just
>>> have to move the most recent data, i.e. only the 'delta' amount of data.
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>
>>> My question was how to reflect MySQL updates to Hadoop/Hive; this is our
>>> problem now.
>>>
>>> --
>>> Ibrahim
>>>
>>> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>
>>> Cool. Then go ahead :)
>>>
>>> Just in case you need something in real time, you can have a look at
>>> Impala. (I know nobody likes to get preached to, but just in case ;) )
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>
>>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS with
>>> Hive. Hadoop/Hive will be used for data warehousing and batch
>>> processing; as I said, we want to use Hive for analytical queries.
>>>
>>> --
>>> Ibrahim
>>>
>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>
>>> Hello Ibrahim,
>>>
>>> A quick question: are you planning to replace your SQL DB with Hive?
>>> If that is the case, I would not suggest doing that. The two are meant
>>> for entirely different purposes. Hive is for batch processing, not for
>>> real-time systems. So if your requirements involve real-time things, you
>>> need to think before moving ahead.
>>>
>>> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
>>>
>>> HTH
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>
>>> Hi All,
>>>
>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>> analytical queries, and we are using Sqoop to import data into Hive. In
>>> our RDBMS the data is updated very frequently, and this needs to be
>>> reflected to Hive. Hive does not support update/delete, but there are
>>> many workarounds to do this task.
>>>
>>> What's in our mind is importing all the
>>>
>>
>> --
>> ---
>> Jeremiah Peschka
>> Founder, Brent Ozar Unlimited
>> Microsoft SQL Server MVP
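Putting the two non-HBase suggestions from this thread together, a delta import plus partition re-load might look like the sketch below. Everything concrete here is an assumption for illustration: the connection string, table name, `order_id`/`last_update_time` columns, the `dt` partition column, and the staging path are hypothetical; only the Sqoop flags and the `INSERT OVERWRITE ... PARTITION` pattern are the techniques Tariq and Jeremiah describe.

```shell
# Step 1 (Tariq's suggestion): pull only rows changed since the last run,
# using Sqoop's lastmodified incremental mode. --merge-key collapses
# re-imported rows onto their key instead of duplicating them in HDFS.
# Connection details, table, and column names are hypothetical.
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username etl -P \
  --table orders \
  --incremental lastmodified \
  --check-column last_update_time \
  --last-value '2012-12-24 00:00:00' \
  --merge-key order_id \
  --target-dir /staging/orders_delta

# Step 2 (Jeremiah's suggestion): instead of updating rows in Hive,
# re-load only the partitions that contain changed data, as an
# all-or-nothing INSERT OVERWRITE of each affected partition.
hive -e "
LOAD DATA INPATH '/staging/orders_delta' OVERWRITE INTO TABLE orders_staging;

INSERT OVERWRITE TABLE orders PARTITION (dt='2012-12-24')
SELECT order_id, status, last_update_time
FROM orders_staging
WHERE dt = '2012-12-24';
"
```

A scheduler such as Apache Oozie (also mentioned in the thread) would run this pair of steps on an interval and carry the `--last-value` watermark forward between runs.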