I was actually trying to answer your actual questions. What are you
currently doing to tackle this update problem, and what kind of tweak are
you looking for? There is no direct, out-of-the-box solution for this,
as you have said.
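
For what it's worth, one workaround people often use is to let the
imports append and then reconcile at read time, keeping only the newest
row per key. Below is a minimal sketch in Python, purely to illustrate
the logic. The column names `id` and `last_update_time`, the function
names, and the sample rows are all my assumptions, not anything from your
schema. In Hive the same idea would have to be expressed as a query over
the base and delta data.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def select_delta(rows: Iterable[Row], watermark: str) -> List[Row]:
    """Return only the rows changed strictly after the last import (the 'delta').

    Assumes timestamps are 'YYYY-MM-DD HH:MM:SS' strings, so plain string
    comparison matches chronological order.
    """
    return [r for r in rows if r["last_update_time"] > watermark]


def merge_latest(base: Iterable[Row], delta: Iterable[Row]) -> List[Row]:
    """Reconcile appended rows: keep one row per id, the one with the newest timestamp."""
    latest: Dict[object, Row] = {}
    for row in list(base) + list(delta):
        current = latest.get(row["id"])
        if current is None or row["last_update_time"] >= current["last_update_time"]:
            latest[row["id"]] = row
    return sorted(latest.values(), key=lambda r: r["id"])


# Hypothetical data: what was imported earlier ...
base = [
    {"id": 1, "status": "new", "last_update_time": "2012-12-23 10:00:00"},
    {"id": 2, "status": "new", "last_update_time": "2012-12-23 11:00:00"},
]
# ... and what the source table looks like now (id 1 updated, id 3 inserted).
source = [
    {"id": 1, "status": "shipped", "last_update_time": "2012-12-24 09:00:00"},
    {"id": 2, "status": "new", "last_update_time": "2012-12-23 11:00:00"},
    {"id": 3, "status": "new", "last_update_time": "2012-12-24 09:30:00"},
]

delta = select_delta(source, watermark="2012-12-23 23:59:59")
merged = merge_latest(base, delta)
```

Here `delta` holds only the rows for ids 1 and 3, and `merged` has one
row per id with the updated version of id 1. Note that this does not
handle rows deleted in the source; those would need separate treatment.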

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/


On Mon, Dec 24, 2012 at 7:38 PM, Ibrahim Yakti <iya...@souq.com> wrote:

> This is already done, but Hive supports neither updating nor deleting data,
> so when I import the records changed after a specific "last_update_time",
> Hive appends them instead of replacing the existing rows.
>
>
> --
> Ibrahim
>
>
> On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>
>> You can use Apache Oozie to schedule your imports.
>>
>> Alternatively, you can have an additional column in your SQL table, say
>> LastUpdatedTime or something. As soon as there is a change in this column
>> you can start the import from that point. This way you don't have to import
>> everything every time there is a change in your table. You only have to
>> move the most recent data, i.e. only the 'delta'.
>>
>> Best Regards,
>> Tariq
>> +91-9741563634
>> https://mtariq.jux.com/
>>
>>
>> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>
>>> My question was how to reflect MySQL updates in Hadoop/Hive; this is our
>>> problem now.
>>>
>>>
>>> --
>>> Ibrahim
>>>
>>>
>>> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>
>>>> Cool. Then go ahead :)
>>>>
>>>> Just in case you need something in real time, you can have a look at
>>>> Impala. (I know nobody likes to be preached to, but just in case ;) ).
>>>>
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>>
>>>>
>>>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>
>>>>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS
>>>>> with Hive. Hadoop/Hive will be used as a data warehouse and for batch
>>>>> processing; as I said, we want to use Hive for analytical queries.
>>>>>
>>>>>
>>>>> --
>>>>> Ibrahim
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>>
>>>>>> Hello Ibrahim,
>>>>>>
>>>>>>      A quick question. Are you planning to replace your SQL DB with
>>>>>> Hive? If that is the case, I would not suggest doing that. Both are meant
>>>>>> for entirely different purposes. Hive is for batch processing and not for
>>>>>> real-time systems. So if your requirements involve real-time things, you
>>>>>> need to think before moving ahead.
>>>>>>
>>>>>> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>> Best Regards,
>>>>>> Tariq
>>>>>> +91-9741563634
>>>>>> https://mtariq.jux.com/
>>>>>>
>>>>>>
>>>>>> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>>>> analytical queries, and we are using Sqoop to import data into Hive.
>>>>>>> In our RDBMS the data is updated very frequently, and this needs to be
>>>>>>> reflected in Hive. Hive does not support update/delete, but there are
>>>>>>> many workarounds for this.
>>>>>>>
>>>>>>> What we have in mind is importing all the tables into Hive as is, then
>>>>>>> building the required tables for reporting.
>>>>>>>
>>>>>>> My questions are:
>>>>>>>
>>>>>>>    1. What is the best way to reflect MySQL updates in Hive with
>>>>>>>    minimal resources?
>>>>>>>    2. Is Sqoop the right tool to do the ETL?
>>>>>>>    3. Is Hive the right tool for this kind of query, or should we
>>>>>>>    look for alternatives?
>>>>>>>
>>>>>>> Any hint will be useful, thanks in advance.
>>>>>>>
>>>>>>> --
>>>>>>> Ibrahim
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
