After more reading, a suggested scenario looks like: MySQL --(extract/load)--> HDFS --> load into HBase --> read as an external table in Hive --(transform data & join tables)--> use Hive for joins & queries --> update HBase as needed & reload in Hive.
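For the "read as external in Hive" step, the DDL might look something like the sketch below, using the standard HBaseStorageHandler that ships with Hive's HBase integration. The table and column names (`orders`, `d:status`, `d:total`) are hypothetical; the script only writes the DDL to a file, which you would run with `hive -f` on a real cluster.

```shell
# Sketch only: expose an HBase table as an external Hive table.
# All table/column names are made up for illustration.
cat > create_orders_external.hql <<'HQL'
CREATE EXTERNAL TABLE orders_hbase (
  order_id STRING,
  status   STRING,
  total    STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:status,d:total")
TBLPROPERTIES ("hbase.table.name" = "orders");
HQL
echo "generated create_orders_external.hql"
```

Because the table is external and backed by HBase, updates written to HBase become visible to Hive queries without any reload step.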
What do you think, please?

--
Ibrahim

On Wed, Dec 26, 2012 at 9:27 AM, Ibrahim Yakti <[email protected]> wrote:

> Mohammad, I am not sure if the answers & the link were meant for me or for
> Kshiva's question.
>
> If I have partitioned my data based on status, for example, then when I run
> the update query it will add the updated data to a new partition ("success"
> or "shipped", for example) and it will keep the old data ("confirmed" or
> "paid", for example), right?
>
> --
> Ibrahim
>
> On Tue, Dec 25, 2012 at 8:59 AM, Mohammad Tariq <[email protected]> wrote:
>
>> Also, have a look at this:
>> http://www.catb.org/~esr/faqs/smart-questions.html
>>
>> Best Regards,
>> Tariq
>> +91-9741563634
>> https://mtariq.jux.com/
>>
>> On Tue, Dec 25, 2012 at 11:26 AM, Mohammad Tariq <[email protected]> wrote:
>>
>>> Have a look at Beeswax.
>>>
>>> BTW, do you have access to Google at your station? Same question on the
>>> Pig mailing list as well, that too twice.
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>> On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Are there any Hive editors in which we can write 100 to 150 Hive
>>>> scripts? I believe it is not easy to do all those scripts in CLI mode.
>>>> Something like an IDE for Java, or TOAD for SQL. Please advise, many
>>>> thanks.
>>>>
>>>> Thanks
>>>>
>>>> On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler <[email protected]> wrote:
>>>>
>>>>> This is not as hard as it sounds. The hardest part is setting up the
>>>>> incremental query against your MySQL database. Then you can write the
>>>>> results to new files in the HDFS directory for the table, and Hive will
>>>>> see them immediately. Yes, even though Hive doesn't support updates, it
>>>>> doesn't care how many files are in the directory. The trick is to avoid
>>>>> lots of little files.
>>>>>
>>>>> As others have suggested, you should consider partitioning the data,
>>>>> perhaps by time.
>>>>> Say you import a few HDFS blocks' worth of data each day; then use
>>>>> year/month/day partitioning to speed up your Hive queries. You'll need
>>>>> to add the partitions to the table as you go, but you can actually add
>>>>> them once a month, for example, for all partitions. Hive doesn't care
>>>>> if the partition directories don't exist yet or are empty. I also
>>>>> recommend using an external table, which gives you more flexibility on
>>>>> directory layout, etc.
>>>>>
>>>>> Sqoop might be the easiest tool for importing the data, as it will
>>>>> even generate a Hive table schema from the original MySQL table.
>>>>> However, that feature may not be useful in this case, as you already
>>>>> have the table.
>>>>>
>>>>> I think Oozie is horribly complex to use and overkill for this
>>>>> purpose. A simple bash script triggered periodically by cron is all
>>>>> you need. If you aren't using a partitioned table, you have a single
>>>>> sqoop command to run. If you have partitioned data, you'll also need a
>>>>> hive statement in the script to create the partition, unless you do
>>>>> those in batch once a month, etc.
>>>>>
>>>>> Hope this helps,
>>>>> dean
>>>>>
>>>>> On Mon, Dec 24, 2012 at 7:08 AM, Ibrahim Yakti <[email protected]> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>>> analytical queries, and we are using Sqoop to import data into Hive.
>>>>>> In our RDBMS the data is updated very frequently, and this needs to
>>>>>> be reflected in Hive. Hive does not support update/delete, but there
>>>>>> are many workarounds to do this task.
>>>>>>
>>>>>> What we have in mind is importing all the tables into Hive as is,
>>>>>> then building the required tables for reporting.
>>>>>>
>>>>>> My questions are:
>>>>>>
>>>>>> 1. What is the best way to reflect MySQL updates into Hive with
>>>>>> minimal resources?
>>>>>> 2. Is Sqoop the right tool to do the ETL?
>>>>>> 3. Is Hive the right tool for this kind of query, or should we look
>>>>>> for alternatives?
>>>>>>
>>>>>> Any hint will be useful. Thanks in advance.
>>>>>>
>>>>>> --
>>>>>> Ibrahim
>>>>>
>>>>> --
>>>>> *Dean Wampler, Ph.D.*
>>>>> thinkbiganalytics.com
>>>>> +1-312-339-1330
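Putting Dean's cron-plus-bash suggestion together, a daily script might look like the sketch below. The connection string, database, table, and column names (`dbhost`, `shop`, `orders`, `updated_at`) are made up for illustration, and the sqoop/hive commands are only echoed as a dry run rather than executed; a real job would also persist the `--last-value` watermark between runs.

```shell
#!/usr/bin/env bash
# Sketch of a daily incremental load: one Sqoop import per day into a
# year/month/day directory, plus a Hive statement to register the partition.
# All names are hypothetical; commands are echoed, not executed.
set -euo pipefail

Y=$(date +%Y); M=$(date +%m); D=$(date +%d)
TARGET="/data/orders/year=$Y/month=$M/day=$D"

# Incremental import of rows modified since the last run.
# (A real job would save and reload --last-value between runs.)
SQOOP_CMD="sqoop import \
--connect jdbc:mysql://dbhost/shop --table orders \
--incremental lastmodified --check-column updated_at \
--last-value '1970-01-01 00:00:00' \
--target-dir $TARGET"

# Register the partition; IF NOT EXISTS makes the script re-runnable.
HIVE_CMD="hive -e \"ALTER TABLE orders ADD IF NOT EXISTS \
PARTITION (year=$Y, month=$M, day=$D) LOCATION '$TARGET'\""

# Dry run: print the commands instead of executing them.
echo "$SQOOP_CMD"
echo "$HIVE_CMD"
```

On a real cluster you would execute the two commands directly and schedule the script from crontab, e.g. `0 2 * * * /path/to/daily_import.sh`. Because the Hive table is external and partitioned, each run only adds a new directory and partition; nothing is rewritten.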
