After more reading, a suggested scenario looks like: MySQL --(extract/load)--> HDFS --> load into HBase --> read as an external table in Hive --(transform data & join tables)--> use Hive for joins & queries --> update HBase as needed & reload in Hive.
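For the "read as external in Hive" step, the DDL might look something like the sketch below, using the standard HBaseStorageHandler that ships with Hive's HBase integration. The table and column names (`orders`, `d:status`, `d:total`) are hypothetical; the script only writes the DDL to a file, which you would run with `hive -f` on a real cluster.

```shell
# Sketch only: expose an HBase table as an external Hive table.
# All table/column names are made up for illustration.
cat > create_orders_external.hql <<'HQL'
CREATE EXTERNAL TABLE orders_hbase (
  order_id STRING,
  status   STRING,
  total    STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:status,d:total")
TBLPROPERTIES ("hbase.table.name" = "orders");
HQL
echo "generated create_orders_external.hql"
```

Because the table is external and backed by HBase, updates written to HBase become visible to Hive queries without any reload step.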
What do you think, please?

--
Ibrahim

On Wed, Dec 26, 2012 at 9:27 AM, Ibrahim Yakti <[email protected]> wrote:

> Mohammad, I am not sure if the answers & the link were meant for me or for
> Kshiva's question.
>
> If I have partitioned my data based on status, for example, then when I run
> the update query it will add the updated data to a new partition ("success"
> or "shipped", for example) and it will keep the old data ("confirmed" or
> "paid", for example), right?
>
> --
> Ibrahim
>
> On Tue, Dec 25, 2012 at 8:59 AM, Mohammad Tariq <[email protected]> wrote:
>
>> Also, have a look at this:
>> http://www.catb.org/~esr/faqs/smart-questions.html
>>
>> Best Regards,
>> Tariq
>> +91-9741563634
>> https://mtariq.jux.com/
>>
>> On Tue, Dec 25, 2012 at 11:26 AM, Mohammad Tariq <[email protected]> wrote:
>>
>>> Have a look at Beeswax.
>>>
>>> BTW, do you have access to Google at your station? Same question on the
>>> Pig mailing list as well, that too twice.
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>> On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Are there any Hive editors in which we can write 100 to 150 Hive
>>>> scripts? I believe it is not easy to do all those scripts in CLI mode.
>>>> Something like an IDE for Java, or TOAD for SQL. Please advise, many
>>>> thanks.
>>>>
>>>> Thanks
>>>>
>>>> On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler <[email protected]> wrote:
>>>>
>>>>> This is not as hard as it sounds. The hardest part is setting up the
>>>>> incremental query against your MySQL database. Then you can write the
>>>>> results to new files in the HDFS directory for the table, and Hive will
>>>>> see them immediately. Yes, even though Hive doesn't support updates, it
>>>>> doesn't care how many files are in the directory. The trick is to avoid
>>>>> lots of little files.
>>>>>
>>>>> As others have suggested, you should consider partitioning the data,
>>>>> perhaps by time.
>>>>> Say you import a few HDFS blocks' worth of data each day; then use
>>>>> year/month/day partitioning to speed up your Hive queries. You'll need
>>>>> to add the partitions to the table as you go, but you can actually add
>>>>> them once a month, for example, for all partitions. Hive doesn't care
>>>>> if the partition directories don't exist yet or are empty. I also
>>>>> recommend using an external table, which gives you more flexibility on
>>>>> directory layout, etc.
>>>>>
>>>>> Sqoop might be the easiest tool for importing the data, as it will
>>>>> even generate a Hive table schema from the original MySQL table.
>>>>> However, that feature may not be useful in this case, as you already
>>>>> have the table.
>>>>>
>>>>> I think Oozie is horribly complex to use and overkill for this
>>>>> purpose. A simple bash script triggered periodically by cron is all
>>>>> you need. If you aren't using a partitioned table, you have a single
>>>>> sqoop command to run. If you have partitioned data, you'll also need a
>>>>> hive statement in the script to create the partition, unless you do
>>>>> those in batch once a month, etc.
>>>>>
>>>>> Hope this helps,
>>>>> dean
>>>>>
>>>>> On Mon, Dec 24, 2012 at 7:08 AM, Ibrahim Yakti <[email protected]> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>>> analytical queries, and we are using Sqoop to import data into Hive.
>>>>>> In our RDBMS the data is updated very frequently, and this needs to
>>>>>> be reflected in Hive. Hive does not support update/delete, but there
>>>>>> are many workarounds to do this task.
>>>>>>
>>>>>> What we have in mind is importing all the tables into Hive as is,
>>>>>> then building the required tables for reporting.
>>>>>>
>>>>>> My questions are:
>>>>>>
>>>>>> 1. What is the best way to reflect MySQL updates into Hive with
>>>>>> minimal resources?
>>>>>> 2. Is Sqoop the right tool to do the ETL?
>>>>>> 3. Is Hive the right tool for this kind of query, or should we look
>>>>>> for alternatives?
>>>>>>
>>>>>> Any hint will be useful. Thanks in advance.
>>>>>>
>>>>>> --
>>>>>> Ibrahim
>>>>>
>>>>> --
>>>>> *Dean Wampler, Ph.D.*
>>>>> thinkbiganalytics.com
>>>>> +1-312-339-1330
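Putting Dean's cron-plus-bash suggestion together, a daily script might look like the sketch below. The connection string, database, table, and column names (`dbhost`, `shop`, `orders`, `updated_at`) are made up for illustration, and the sqoop/hive commands are only echoed as a dry run rather than executed; a real job would also persist the `--last-value` watermark between runs.

```shell
#!/usr/bin/env bash
# Sketch of a daily incremental load: one Sqoop import per day into a
# year/month/day directory, plus a Hive statement to register the partition.
# All names are hypothetical; commands are echoed, not executed.
set -euo pipefail

Y=$(date +%Y); M=$(date +%m); D=$(date +%d)
TARGET="/data/orders/year=$Y/month=$M/day=$D"

# Incremental import of rows modified since the last run.
# (A real job would save and reload --last-value between runs.)
SQOOP_CMD="sqoop import \
--connect jdbc:mysql://dbhost/shop --table orders \
--incremental lastmodified --check-column updated_at \
--last-value '1970-01-01 00:00:00' \
--target-dir $TARGET"

# Register the partition; IF NOT EXISTS makes the script re-runnable.
HIVE_CMD="hive -e \"ALTER TABLE orders ADD IF NOT EXISTS \
PARTITION (year=$Y, month=$M, day=$D) LOCATION '$TARGET'\""

# Dry run: print the commands instead of executing them.
echo "$SQOOP_CMD"
echo "$HIVE_CMD"
```

On a real cluster you would execute the two commands directly and schedule the script from crontab, e.g. `0 2 * * * /path/to/daily_import.sh`. Because the Hive table is external and partitioned, each run only adds a new directory and partition; nothing is rewritten.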
