Hello Ibrahim,

Sorry for the late response. Those replies were for Kshiva. I saw his question (exactly the same as this one) multiple times on the Pig mailing list as well, so I just thought of giving him some pointers on how to use the list. I should have made that clear. Apologies for the confusion.
Coming back to the actual point: yes, the flow is fine. Normally people do it like this. But I was looking for some alternate way, so that we don't have to go through this long process for the updates. I'll let you know once I find something useful, but so far I haven't found anything better than what Dean has suggested. Please do let me know if you find something before me. Many thanks.

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/


On Wed, Dec 26, 2012 at 7:24 PM, Ibrahim Yakti <iya...@souq.com> wrote:

> After more reading, a suggested scenario looks like:
>
> MySQL ---(Extract / Load)---> HDFS ---> Load into HBase ---> Read as an
> external table in Hive ---(Transform Data & Join Tables)---> Use Hive for
> joins & queries ---> Update HBase as needed & reload in Hive.
>
> What do you think, please?
>
> --
> Ibrahim
>
> On Wed, Dec 26, 2012 at 9:27 AM, Ibrahim Yakti <iya...@souq.com> wrote:
>
>> Mohammad, I am not sure if the answers & the link were meant for me or
>> for Kshiva's question.
>>
>> If I have partitioned my data based on status, for example, when I run
>> the update query it will add the updated data to a new partition
>> ("success" or "shipped", for example) and it will keep the old data
>> ("confirmed" or "paid", for example), right?
>>
>> --
>> Ibrahim
>>
>> On Tue, Dec 25, 2012 at 8:59 AM, Mohammad Tariq <donta...@gmail.com> wrote:
>>
>>> Also, have a look at this:
>>> http://www.catb.org/~esr/faqs/smart-questions.html
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>> On Tue, Dec 25, 2012 at 11:26 AM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>
>>>> Have a look at Beeswax.
>>>>
>>>> BTW, do you have access to Google at your station? The same question
>>>> was asked on the Pig mailing list as well, and twice at that.
>>>>
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>>
>>>> On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps <kshiva...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is there any Hive editor in which we can write 100 to 150 Hive
>>>>> scripts? I believe it is not easy to do all of those scripts in CLI
>>>>> mode. Something like an IDE for Java or TOAD for SQL. Please advise,
>>>>> many thanks.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler <
>>>>> dean.wamp...@thinkbiganalytics.com> wrote:
>>>>>
>>>>>> This is not as hard as it sounds. The hardest part is setting up the
>>>>>> incremental query against your MySQL database. Then you can write the
>>>>>> results to new files in the HDFS directory for the table, and Hive
>>>>>> will see them immediately. Yes, even though Hive doesn't support
>>>>>> updates, it doesn't care how many files are in the directory. The
>>>>>> trick is to avoid lots of little files.
>>>>>>
>>>>>> As others have suggested, you should consider partitioning the data,
>>>>>> perhaps by time. Say you import a few HDFS blocks' worth of data each
>>>>>> day; then use year/month/day partitioning to speed up your Hive
>>>>>> queries. You'll need to add the partitions to the table as you go,
>>>>>> but you can do that in batch, for example once a month for all of
>>>>>> that month's partitions. Hive doesn't care if the partition
>>>>>> directories don't exist yet or are empty. I also recommend using an
>>>>>> external table, which gives you more flexibility on directory
>>>>>> layout, etc.
>>>>>>
>>>>>> Sqoop might be the easiest tool for importing the data, as it will
>>>>>> even generate a Hive table schema from the original MySQL table.
>>>>>> However, that feature may not be useful in this case, as you already
>>>>>> have the table.
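[For concreteness, a minimal sketch of the setup Dean describes: an external, date-partitioned Hive table whose data arrives as new files from an incremental Sqoop import. The table name "orders", its columns, the HDFS paths, and the connection details are all assumptions for illustration, not taken from this thread.]

    # Sketch only: an external, date-partitioned Hive table. The name
    # "orders", its columns, and the HDFS location are hypothetical.
    # The comma delimiter matches Sqoop's default text output format.
    hive -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS orders (
      id      BIGINT,
      status  STRING,
      updated TIMESTAMP
    )
    PARTITIONED BY (year INT, month INT, day INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/orders';"

    # Incremental Sqoop import: appends only rows whose id is greater
    # than the last value already imported, written as new files under
    # the table's directory. The connect string, credentials, and the
    # LAST_IMPORTED_ID variable are placeholders.
    sqoop import \
      --connect jdbc:mysql://dbhost/shop \
      --username etl --password "$DB_PASS" \
      --table orders \
      --incremental append \
      --check-column id \
      --last-value "$LAST_IMPORTED_ID" \
      --target-dir /data/orders/year=2012/month=12/day=26

[With a layout like this, each day's import just drops new files into a dated directory, and queries pick them up as soon as the matching partition is registered; a sketch of wiring this into cron appears at the end of the thread.]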
>>>>>>
>>>>>> I think Oozie is horribly complex to use and overkill for this
>>>>>> purpose. A simple bash script triggered periodically by cron is all
>>>>>> you need. If you aren't using a partitioned table, you have a single
>>>>>> sqoop command to run. If you have partitioned data, you'll also need
>>>>>> a hive statement in the script to create the partition, unless you do
>>>>>> those in batch once a month, etc.
>>>>>>
>>>>>> Hope this helps,
>>>>>> dean
>>>>>>
>>>>>> On Mon, Dec 24, 2012 at 7:08 AM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>>>> analytical queries, and we are using Sqoop to import data into Hive.
>>>>>>> In our RDBMS the data is updated very frequently, and this needs to
>>>>>>> be reflected in Hive. Hive does not support update/delete, but there
>>>>>>> are many workarounds to do this task.
>>>>>>>
>>>>>>> What we have in mind is importing all the tables into Hive as is,
>>>>>>> then building the required tables for reporting.
>>>>>>>
>>>>>>> My questions are:
>>>>>>>
>>>>>>> 1. What is the best way to reflect MySQL updates into Hive with
>>>>>>> minimal resources?
>>>>>>> 2. Is Sqoop the right tool to do the ETL?
>>>>>>> 3. Is Hive the right tool for this kind of queries, or should we
>>>>>>> search for alternatives?
>>>>>>>
>>>>>>> Any hint will be useful, thanks in advance.
>>>>>>>
>>>>>>> --
>>>>>>> Ibrahim
>>>>>>
>>>>>> --
>>>>>> Dean Wampler, Ph.D.
>>>>>> thinkbiganalytics.com
>>>>>> +1-312-339-1330
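[Finally, a hedged sketch of the simple cron-driven script Dean recommends over Oozie: one incremental Sqoop command plus one Hive statement to register the day's partition. As before, every name here (orders, /data/orders, dbhost, the state file) is a placeholder, not something specified in the thread.]

    #!/usr/bin/env bash
    # Daily incremental load: a sketch of the cron job Dean describes.
    # Table, paths, host, and credentials are all hypothetical.
    set -e

    y=$(date +%Y); m=$(date +%m); d=$(date +%d)
    dir="/data/orders/year=$y/month=$m/day=$d"
    state=/var/lib/etl/orders.last_value

    # Import only rows modified since the last run, into a fresh dated
    # directory under the external table's location. On the first run,
    # the fallback timestamp pulls the whole table.
    sqoop import \
      --connect jdbc:mysql://dbhost/shop \
      --username etl --password "$DB_PASS" \
      --table orders \
      --incremental lastmodified \
      --check-column updated \
      --last-value "$(cat "$state" 2>/dev/null || echo '1970-01-01 00:00:00')" \
      --target-dir "$dir"

    # Register the partition. IF NOT EXISTS makes the script safe to
    # re-run, and Hive doesn't mind if the directory is still empty.
    hive -e "ALTER TABLE orders ADD IF NOT EXISTS PARTITION
             (year=$y, month=$m, day=$d) LOCATION '$dir';"

    date '+%Y-%m-%d %H:%M:%S' > "$state"

[It could be triggered from cron with something like: 0 2 * * * /opt/etl/orders_load.sh. If, as Dean suggests, partitions are instead added in monthly batches, the ALTER TABLE statement simply moves out of this script.]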