Thanks Mohammad, I will be waiting ... meanwhile, it seems I will get into HBase and give it a try, unless someone advises something better/easier.
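
If I go the HBase route, the "read as external table in Hive" step of the flow quoted below might look something like this. This is only a sketch: the HBase table, the "d" column family, and the column names are all invented, and the hive-hbase-handler jar has to be on Hive's classpath.

    hive -e "
    -- Map an existing HBase table 'orders' into Hive. Since a Put on an
    -- existing rowkey overwrites in place, updates made in HBase become
    -- visible to Hive queries immediately, with no re-import step.
    CREATE EXTERNAL TABLE hb_orders (
      order_id STRING,   -- bound to the HBase rowkey
      status   STRING,
      total    DOUBLE
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,d:status,d:total')
    TBLPROPERTIES ('hbase.table.name' = 'orders');
    "

Joins against normal Hive tables would then work as usual, though scanning through the HBase handler is slower than reading flat files in HDFS.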
--
Ibrahim

On Wed, Dec 26, 2012 at 5:52 PM, Mohammad Tariq <donta...@gmail.com> wrote:

> Hello Ibrahim,
>
> Sorry for the late response. Those replies were for Kshiva. I saw his
> question (exactly the same as this one) multiple times on the Pig mailing
> list as well, so I thought I would give him some pointers on how to use
> the list. I should have said so explicitly. Apologies for the confusion.
>
> Coming back to the actual point: yes, the flow is fine. This is how
> people normally do it. But I was looking for an alternative, so that we
> don't have to go through this long process for every update. I'll let you
> know once I find something useful, but so far I haven't found anything
> better than what Dean has suggested. Please do let me know if you find
> something before I do.
>
> Many thanks.
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
>
> On Wed, Dec 26, 2012 at 7:24 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>
>> After more reading, a suggested scenario looks like:
>>
>> MySQL ---(Extract / Load)---> HDFS ---> Load into HBase ---> Read as
>> external table in Hive ---(Transform Data & Join Tables)---> Use Hive
>> for joins & queries ---> Update HBase as needed & reload in Hive.
>>
>> What do you think, please?
>>
>> --
>> Ibrahim
>>
>>
>> On Wed, Dec 26, 2012 at 9:27 AM, Ibrahim Yakti <iya...@souq.com> wrote:
>>
>>> Mohammad, I am not sure if the answers & the link were meant for me or
>>> for Kshiva's question.
>>>
>>> If I have partitioned my data by status, for example, then when I run
>>> the update query it will add the updated rows to a new partition
>>> ("success" or "shipped", say) while keeping the old rows in the old
>>> partition ("confirmed" or "paid"), right?
>>>
>>> --
>>> Ibrahim
>>>
>>>
>>> On Tue, Dec 25, 2012 at 8:59 AM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>
>>>> Also, have a look at this:
>>>> http://www.catb.org/~esr/faqs/smart-questions.html
>>>>
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>>
>>>>
>>>> On Tue, Dec 25, 2012 at 11:26 AM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>
>>>>> Have a look at Beeswax.
>>>>>
>>>>> BTW, do you have access to Google at your station? The same question
>>>>> was asked on the Pig mailing list as well, twice.
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
>>>>>
>>>>>
>>>>> On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps <kshiva...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Are there any Hive editors in which we can write 100 to 150 Hive
>>>>>> scripts? I believe that is not easy to do in CLI mode, something
>>>>>> like an IDE for Java or TOAD for SQL. Please advise, many thanks.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler
>>>>>> <dean.wamp...@thinkbiganalytics.com> wrote:
>>>>>>
>>>>>>> This is not as hard as it sounds. The hardest part is setting up the
>>>>>>> incremental query against your MySQL database. Then you can write
>>>>>>> the results to new files in the HDFS directory for the table, and
>>>>>>> Hive will see them immediately. Yes, even though Hive doesn't
>>>>>>> support updates, it doesn't care how many files are in the
>>>>>>> directory. The trick is to avoid lots of little files.
>>>>>>>
>>>>>>> As others have suggested, you should consider partitioning the
>>>>>>> data, perhaps by time. Say you import a few HDFS blocks' worth of
>>>>>>> data each day; then use year/month/day partitioning to speed up
>>>>>>> your Hive queries. You'll need to add the partitions to the table
>>>>>>> as you go, but you can add those once a month, for example, for all
>>>>>>> partitions. Hive doesn't care if the partition directories don't
>>>>>>> exist yet or the directories are empty. I also recommend using an
>>>>>>> external table, which gives you more flexibility on directory
>>>>>>> layout, etc.
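
[For reference, a minimal sketch of the partitioned external table Dean describes. The "orders" table, its columns, and the /data/orders path are placeholders, not names from this thread.]

    hive -e "
    -- EXTERNAL: dropping the table leaves the HDFS files untouched.
    CREATE EXTERNAL TABLE orders (
      order_id   STRING,
      status     STRING,
      total      DOUBLE,
      updated_at STRING
    )
    PARTITIONED BY (year STRING, month STRING, day STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/orders';

    -- Partitions can be declared ahead of time, in batch; Hive does not
    -- mind if the directories are still missing or empty.
    ALTER TABLE orders ADD IF NOT EXISTS
      PARTITION (year='2012', month='12', day='26')
      LOCATION '/data/orders/2012/12/26';
    "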
>>>>>>> Sqoop might be the easiest tool for importing the data, as it will
>>>>>>> even generate a Hive table schema from the original MySQL table.
>>>>>>> However, that feature may not be useful in this case, as you
>>>>>>> already have the table.
>>>>>>>
>>>>>>> I think Oozie is horribly complex to use and overkill for this
>>>>>>> purpose. A simple bash script triggered periodically by cron is all
>>>>>>> you need. If you aren't using a partitioned table, you have a
>>>>>>> single sqoop command to run. If you have partitioned data, you'll
>>>>>>> also need a hive statement in the script to create the partition,
>>>>>>> unless you do those in batch once a month, etc.
>>>>>>>
>>>>>>> Hope this helps,
>>>>>>> dean
>>>>>>>
>>>>>>> On Mon, Dec 24, 2012 at 7:08 AM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>>>>> analytical queries, and we are using Sqoop to import data into
>>>>>>>> Hive. In our RDBMS the data is updated very frequently, and this
>>>>>>>> needs to be reflected in Hive. Hive does not support
>>>>>>>> update/delete, but there are many workarounds for this task.
>>>>>>>>
>>>>>>>> What we have in mind is importing all the tables into Hive as-is,
>>>>>>>> then building the required tables for reporting.
>>>>>>>>
>>>>>>>> My questions are:
>>>>>>>>
>>>>>>>> 1. What is the best way to reflect MySQL updates in Hive with
>>>>>>>> minimal resources?
>>>>>>>> 2. Is Sqoop the right tool to do the ETL?
>>>>>>>> 3. Is Hive the right tool for this kind of query, or should we
>>>>>>>> look for alternatives?
>>>>>>>>
>>>>>>>> Any hint will be useful. Thanks in advance.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ibrahim
>>>>>>>
>>>>>>> --
>>>>>>> *Dean Wampler, Ph.D.*
>>>>>>> thinkbiganalytics.com
>>>>>>> +1-312-339-1330
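
[To make Dean's cron suggestion concrete, a rough sketch of such a script. Every host, credential, table, column, and path here is a placeholder, and the last-value bookkeeping is deliberately simplified.]

    #!/bin/bash
    # Nightly incremental import from MySQL into a dated Hive partition.
    set -e

    year=$(date +%Y); month=$(date +%m); day=$(date +%d)
    dir="/data/orders/$year/$month/$day"
    last_file=/var/lib/etl/orders.last
    last=$(cat "$last_file" 2>/dev/null || echo '1970-01-01 00:00:00')

    # Pull only rows modified since the last run into a fresh dated
    # directory, one partition's worth of files per run.
    sqoop import \
      --connect jdbc:mysql://dbhost/shop \
      --username etl --password "$DB_PASS" \
      --table orders \
      --incremental lastmodified \
      --check-column updated_at \
      --last-value "$last" \
      --target-dir "$dir" \
      --fields-terminated-by '\t'

    # Register the new directory as a partition; Hive sees the rows at once.
    hive -e "ALTER TABLE orders ADD IF NOT EXISTS
             PARTITION (year='$year', month='$month', day='$day')
             LOCATION '$dir';"

    # Record the run time as the next --last-value. (Sqoop prints the exact
    # value to reuse; parsing its output would be more robust than this.)
    date '+%Y-%m-%d %H:%M:%S' > "$last_file"

Scheduled from cron, e.g.:

    0 2 * * * /usr/local/bin/orders_import.sh >> /var/log/orders_import.log 2>&1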