You can use Apache Oozie to schedule your imports. Alternatively, you can add a column to your SQL table, say LastUpdatedTime, that is set whenever a row is inserted or modified. Each import can then start from the latest value of this column that you have already imported. This way you don't have to re-import everything every time your table changes. You move only the most recent data, i.e. only the 'delta'.
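To make the 'delta' idea concrete, here is a rough sketch of the bookkeeping involved. It is only an illustration under assumptions of mine: an in-memory SQLite database stands in for MySQL, and the `orders` table, its `status` column, and the `fetch_delta` helper are made-up names. Sqoop's incremental import mode (`--incremental lastmodified` together with `--check-column` and `--last-value`) is built on the same idea of remembering the last value seen.

```python
import sqlite3


def fetch_delta(conn, last_seen):
    """Return the rows modified after `last_seen` and the new watermark."""
    rows = conn.execute(
        "SELECT id, status, LastUpdatedTime FROM orders "
        "WHERE LastUpdatedTime > ? ORDER BY LastUpdatedTime",
        (last_seen,),
    ).fetchall()
    # If nothing changed, keep the old watermark.
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark


# Hypothetical source table; SQLite is used here only so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, LastUpdatedTime TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "new", "2012-12-24 10:00:00"),
        (2, "new", "2012-12-24 11:00:00"),
    ],
)

# First run: the watermark is older than every row, so everything is imported.
first_batch, mark = fetch_delta(conn, "1970-01-01 00:00:00")

# A row changes in the source table between two scheduled runs.
conn.execute(
    "UPDATE orders SET status = 'shipped', "
    "LastUpdatedTime = '2012-12-24 12:00:00' WHERE id = 1"
)

# Second run: only the changed row comes back.
second_batch, mark = fetch_delta(conn, mark)
```

Two things to keep in mind with this approach: rows deleted in the source never show up in a delta, and a strict `>` comparison can miss a row written with exactly the same timestamp as the stored watermark, so the column needs enough resolution or the import should allow a small overlap.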

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/


On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <iya...@souq.com> wrote:

> My question was how to reflect MySQL updates to hadoop/hive, this is our
> problem now.
>
> --
> Ibrahim
>
>
> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>
>> Cool. Then go ahead :)
>>
>> Just in case you need something in realtime, you can have a look at
>> Impala. (I know nobody likes to get preached, but just in case ;) ).
>>
>> Best Regards,
>> Tariq
>> +91-9741563634
>> https://mtariq.jux.com/
>>
>>
>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>
>>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS with
>>> Hive. Hadoop/Hive will be used as a Data Warehouse & for batch processing;
>>> as I said, we want to use Hive for analytical queries.
>>>
>>> --
>>> Ibrahim
>>>
>>>
>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>
>>>> Hello Ibrahim,
>>>>
>>>> A quick question. Are you planning to replace your SQL DB with
>>>> Hive? If that is the case, I would not suggest doing that. Both are meant
>>>> for entirely different purposes. Hive is for batch processing and not for
>>>> real-time systems. So if your requirements involve real-time things, you
>>>> need to think before moving ahead.
>>>>
>>>> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
>>>>
>>>> HTH
>>>>
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>>
>>>>
>>>> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We are new to hadoop and hive. We are trying to use hive to
>>>>> run analytical queries and we are using sqoop to import data into hive.
>>>>> In our RDBMS the data is updated very frequently and this needs to be
>>>>> reflected in hive. Hive does not support update/delete but there are
>>>>> many workarounds to do this task.
>>>>>
>>>>> What we have in mind is importing all the tables into hive as is, then
>>>>> building the required tables for reporting.
>>>>>
>>>>> My questions are:
>>>>>
>>>>> 1. What is the best way to reflect MySQL updates into Hive with
>>>>> minimal resources?
>>>>> 2. Is sqoop the right tool to do the ETL?
>>>>> 3. Is Hive the right tool to do this kind of queries or should we
>>>>> search for alternatives?
>>>>>
>>>>> Any hint will be useful, thanks in advance.
>>>>>
>>>>> --
>>>>> Ibrahim