You can use Apache Oozie to schedule your imports.

Alternatively, you can add a column to your SQL table, say
LastUpdatedTime, that records when each row was last modified. On each run
you import only the rows whose value in this column is newer than the
previous import. This way you don't have to re-import the whole table every
time something changes; you move only the most recent data, the 'delta'.
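
As a rough sketch of the delta idea (untested against your setup): Sqoop's
import tool has an incremental mode that selects rows by a timestamp column,
using the --incremental, --check-column and --last-value arguments. The
Python helper below only builds such a command line. The function name, JDBC
URL, table name and target directory are made up for illustration, and you
should check the arguments against the Sqoop version you run.

```python
import shlex
from datetime import datetime


def build_incremental_import(jdbc_url, table, check_column, last_value, target_dir):
    """Return the argv list for a Sqoop import of rows changed since last_value.

    Hypothetical helper for illustration; it builds the command, it does not run it.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--incremental", "lastmodified",  # select rows by modification timestamp
        "--check-column", check_column,   # timestamp column to compare against
        "--last-value", last_value.strftime("%Y-%m-%d %H:%M:%S"),
        "--target-dir", target_dir,
    ]


# All values below are assumptions, not taken from your environment.
cmd = build_incremental_import(
    jdbc_url="jdbc:mysql://dbhost/shop",
    table="orders",
    check_column="LastUpdatedTime",
    last_value=datetime(2012, 12, 24, 0, 0, 0),
    target_dir="/user/hive/staging/orders_delta",
)

# shlex.join quotes the timestamp so the printed line can be pasted into a shell.
print(shlex.join(cmd))
```

A scheduler such as Oozie could run the resulting command, and you would
record the new high-water mark after each successful run so the next import
starts from there.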

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/


On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <iya...@souq.com> wrote:

> My question was how to reflect MySQL updates in Hadoop/Hive; this is our
> problem now.
>
>
> --
> Ibrahim
>
>
> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>
>> Cool. Then go ahead :)
>>
>> Just in case you need something in real time, you can have a look at
>> Impala. (I know nobody likes to be preached at, but just in case ;) )
>>
>>
>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>
>>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS with
>>> Hive. Hadoop/Hive will be used as a data warehouse and for batch
>>> processing; as I said, we want to use Hive for analytical queries.
>>>
>>>
>>> --
>>> Ibrahim
>>>
>>>
>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>
>>>> Hello Ibrahim,
>>>>
>>>>      A quick question. Are you planning to replace your SQL DB with
>>>> Hive? If that is the case, I would not suggest doing that. Both are meant
>>>> for entirely different purposes. Hive is for batch processing and not for
>>>> real-time systems. So if your requirements involve real-time things, you
>>>> need to think before moving ahead.
>>>>
>>>> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
>>>>
>>>> HTH
>>>>
>>>>
>>>> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>> analytical queries, and we are using Sqoop to import data into Hive. In
>>>>> our RDBMS the data is updated very frequently, and this needs to be
>>>>> reflected in Hive. Hive does not support update/delete, but there are
>>>>> many workarounds to do this task.
>>>>>
>>>>> What we have in mind is importing all the tables into Hive as is, then
>>>>> building the required tables for reporting.
>>>>>
>>>>> My questions are:
>>>>>
>>>>>    1. What is the best way to reflect MySQL updates into Hive with
>>>>>    minimal resources?
>>>>>    2. Is Sqoop the right tool to do the ETL?
>>>>>    3. Is Hive the right tool for this kind of query, or should we
>>>>>    look for alternatives?
>>>>>
>>>>> Any hint will be useful. Thanks in advance.
>>>>>
>>>>> --
>>>>> Ibrahim
>>>>>
>>>>
>>>>
>>>
>>
>
