You can only use the last_update idea if this is an insert-only dataset.

If your table takes updates, you need a different strategy:
1) Full dumps every interval.
2) Using a storage handler like HBase or Cassandra that supports update
operations (sketched below).
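
A minimal sketch of option 2, assuming the Hive HBase storage handler is on
your classpath (the table, column family, and column names here are
hypothetical). Because HBase rows with the same key overwrite each other,
re-importing a changed row effectively updates it in place:

    # create a Hive table backed by an HBase table, so rows re-imported
    # with an existing key overwrite the old values instead of appending
    hive -e "
    CREATE TABLE orders_hbase (id INT, status STRING, last_update STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,d:status,d:last_update')
    TBLPROPERTIES ('hbase.table.name' = 'orders');"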


On Mon, Dec 24, 2012 at 9:22 AM, Jeremiah Peschka <
jeremiah.pesc...@gmail.com> wrote:

> If it were me, I would find a way to identify the partitions that have
> modified data and then re-load a subset of the partitions (only the ones
> with changes) on a regular basis. Instead of updating/deleting data, you'll
> be re-loading specific partitions as an all-or-nothing action.
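>
> A rough sketch of that approach, with hypothetical paths, table, and
> partition column: re-import only the changed slice from MySQL with Sqoop,
> then overwrite just that partition in Hive:
>
>     # pull one day's worth of modified rows out of MySQL
>     sqoop import \
>       --connect jdbc:mysql://dbhost/shop --username etl -P \
>       --table orders \
>       --where "DATE(last_update_time) = '2012-12-23'" \
>       --target-dir /staging/orders/2012-12-23
>     # swap the whole partition in one shot
>     hive -e "LOAD DATA INPATH '/staging/orders/2012-12-23'
>              OVERWRITE INTO TABLE orders PARTITION (dt='2012-12-23');"
>
> The OVERWRITE makes the re-load all-or-nothing: readers see either the old
> partition or the fresh copy, never a mix.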
>
> On Monday, December 24, 2012, Ibrahim Yakti wrote:
>
>> That is already done, but Hive supports neither updates nor deletion of
>> data, so when I import the records changed after a specific
>> "last_update_time", Hive will append them rather than replace them.
>>
>>
>> --
>> Ibrahim
>>
>>
>> On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>
>> You can use Apache Oozie to schedule your imports.
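>>
>> (A hypothetical submission, assuming an Oozie server on its default port;
>> the coordinator would re-run the Sqoop import on whatever schedule you
>> configure:
>>
>>     oozie job -oozie http://oozie-host:11000/oozie \
>>       -config coordinator.properties -run
>>
>> where coordinator.properties points at a coordinator app wrapping the
>> import job.)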
>>
>> Alternatively, you can have an additional column in your SQL table, say
>> LastUpdatedTime or something. As soon as there is a change in this column
>> you can start the import from that point. This way you don't have to import
>> everything every time there is a change in your table. You just have to
>> move the most recent data, i.e. only the 'delta' amount of data.
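>>
>> A sketch of that delta import using Sqoop's built-in incremental mode
>> (the connection details and column name are made up):
>>
>>     # move only rows whose LastUpdatedTime is newer than the last run
>>     sqoop import \
>>       --connect jdbc:mysql://dbhost/shop --username etl -P \
>>       --table orders \
>>       --incremental lastmodified \
>>       --check-column LastUpdatedTime \
>>       --last-value "2012-12-24 00:00:00" \
>>       --target-dir /staging/orders_delta
>>
>> At the end of the run Sqoop prints the --last-value to feed into the next
>> run, so each import moves only the new delta.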
>>
>> Best Regards,
>> Tariq
>> +91-9741563634
>> https://mtariq.jux.com/
>>
>>
>> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>
>> My question was how to reflect MySQL updates in Hadoop/Hive; this is our
>> problem now.
>>
>>
>> --
>> Ibrahim
>>
>>
>> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>
>> Cool. Then go ahead :)
>>
>> Just in case you need something in real time, you can have a look at
>> Impala. (I know nobody likes being preached at, but just in case ;) )
>>
>> Best Regards,
>> Tariq
>> +91-9741563634
>> https://mtariq.jux.com/
>>
>>
>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>
>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS with
>> Hive. Hadoop/Hive will be used for data warehousing & batch processing;
>> as I said, we want to use Hive for analytical queries.
>>
>>
>> --
>> Ibrahim
>>
>>
>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>
>> Hello Ibrahim,
>>
>>      A quick question: are you planning to replace your SQL DB with Hive?
>> If that is the case, I would not suggest doing that. The two are meant for
>> entirely different purposes. Hive is for batch processing, not for
>> real-time systems. So if your requirements involve anything real-time, you
>> need to think before moving ahead.
>>
>> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
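>>
>> For instance, a basic import straight into a Hive table might look like
>> this (the connection string and names are made up):
>>
>>     # import the MySQL table and create/load a matching Hive table
>>     sqoop import \
>>       --connect jdbc:mysql://dbhost/shop --username etl -P \
>>       --table orders \
>>       --hive-import --hive-table orders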
>>
>> HTH
>>
>> Best Regards,
>> Tariq
>> +91-9741563634
>> https://mtariq.jux.com/
>>
>>
>> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>>
>> Hi All,
>>
>> We are new to Hadoop and Hive. We are trying to use Hive to run analytical
>> queries, and we are using Sqoop to import data into Hive. In our RDBMS the
>> data is updated very frequently, and this needs to be reflected in Hive.
>> Hive does not support update/delete, but there are many workarounds to do
>> this task.
>>
>> What's in our mind is importing all the
>>
>>
>
> --
> ---
> Jeremiah Peschka
> Founder, Brent Ozar Unlimited
> Microsoft SQL Server MVP
>
>
