This is already done, but Hive does not support updating or deleting data, so when I import the records changed after a specific "last_update_time", Hive appends them instead of replacing the existing rows.

--
Ibrahim

On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq <donta...@gmail.com> wrote:

> You can use Apache Oozie to schedule your imports.
>
> Alternatively, you can have an additional column in your SQL table, say
> LastUpdatedTime or something. As soon as there is a change in this column
> you can start the import from this point. This way you don't have to import
> all the things every time there is a change in your table. You just have to
> move only the most recent data, say only the 'delta' amount of data.
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
>
> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>
>> My question was how to reflect MySQL updates to hadoop/hive, this is our
>> problem now.
>>
>>
>> --
>> Ibrahim
>>
>>
>> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>
>>> Cool. Then go ahead :)
>>>
>>> Just in case you need something in realtime, you can have a look at
>>> Impala. (I know nobody likes to get preached, but just in case ;) ).
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>>
>>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>
>>>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS with
>>>> Hive. Hadoop/Hive will be used as a data warehouse and for batch
>>>> processing; as I said, we want to use Hive for analytical queries.
>>>>
>>>>
>>>> --
>>>> Ibrahim
>>>>
>>>>
>>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>
>>>>> Hello Ibrahim,
>>>>>
>>>>> A quick question. Are you planning to replace your SQL DB with
>>>>> Hive? If that is the case, I would not suggest doing that. Both are meant
>>>>> for entirely different purposes. Hive is for batch processing and not for
>>>>> real time systems. So if your requirements involve real time things, you
>>>>> need to think before moving ahead.
>>>>>
>>>>> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
>>>>>
>>>>> HTH
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> We are new to hadoop and hive. We are trying to use hive to
>>>>>> run analytical queries and we are using sqoop to import data into hive.
>>>>>> In our RDBMS the data is updated very frequently and this needs to be
>>>>>> reflected in hive. Hive does not support update/delete but there are
>>>>>> many workarounds to do this task.
>>>>>>
>>>>>> What's in our mind is importing all the tables into hive as is, then
>>>>>> we build the required tables for reporting.
>>>>>>
>>>>>> My questions are:
>>>>>>
>>>>>> 1. What is the best way to reflect MySQL updates into Hive with
>>>>>> minimal resources?
>>>>>> 2. Is sqoop the right tool to do the ETL?
>>>>>> 3. Is Hive the right tool to do this kind of queries or should we
>>>>>> search for alternatives?
>>>>>>
>>>>>> Any hint will be useful, thanks in advance.
>>>>>>
>>>>>> --
>>>>>> Ibrahim
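
One possible way around the append-only behaviour discussed in this thread is to treat each import as a delta and reconcile it against the existing data, keeping only the newest row per primary key. The sketch below shows that logic in plain Python rather than in Hive itself. The column names "id" and "status", the sample rows, and the function name reconcile are assumptions made for illustration; only "last_update_time" comes from the thread. In Hive the same idea would typically be written as a query that rewrites the reporting table (for example with INSERT OVERWRITE), which is not shown here.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def reconcile(base: Iterable[Row], delta: Iterable[Row],
              key: str = "id", ts: str = "last_update_time") -> List[Row]:
    """Return one row per key, keeping the row with the greatest timestamp.

    `key` and `ts` are hypothetical column names; adjust them to the table.
    """
    latest: Dict[object, Row] = {}
    # Base rows are visited first, then delta rows.
    for row in list(base) + list(delta):
        current = latest.get(row[key])
        # ">=" lets a delta row win a tie against the base row it replaces.
        if current is None or row[ts] >= current[ts]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


# Rows already in the warehouse (made-up sample data).
base = [
    {"id": 1, "status": "new", "last_update_time": "2012-12-23 10:00:00"},
    {"id": 2, "status": "new", "last_update_time": "2012-12-23 11:00:00"},
]
# Rows imported since the last recorded "last_update_time".
delta = [
    {"id": 2, "status": "shipped", "last_update_time": "2012-12-24 09:00:00"},
    {"id": 3, "status": "new", "last_update_time": "2012-12-24 09:30:00"},
]

merged = reconcile(base, delta)
for row in merged:
    print(row["id"], row["status"], row["last_update_time"])
```

With this approach the appended delta never has to be updated in place: the updated row for id 2 simply supersedes the older one when the reporting table is rebuilt. The timestamps are compared as strings, which only works because they share a fixed "YYYY-MM-DD HH:MM:SS" layout.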