Thanks Mohammad, I will be waiting ... meanwhile, it seems I will get into HBase and give it a try, unless someone advises something better/easier.
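
If I go the HBase route, the "read as external table in Hive" step of the flow quoted below might look something like this. This is only a sketch: the HBase table, the "d" column family, and the column names are all invented, and the hive-hbase-handler jar has to be on Hive's classpath.

    hive -e "
    -- Map an existing HBase table 'orders' into Hive. Since a Put on an
    -- existing rowkey overwrites in place, updates made in HBase become
    -- visible to Hive queries immediately, with no re-import step.
    CREATE EXTERNAL TABLE hb_orders (
      order_id STRING,   -- bound to the HBase rowkey
      status   STRING,
      total    DOUBLE
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,d:status,d:total')
    TBLPROPERTIES ('hbase.table.name' = 'orders');
    "

Joins against normal Hive tables would then work as usual, though scanning through the HBase handler is slower than reading flat files in HDFS.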
--
Ibrahim

On Wed, Dec 26, 2012 at 5:52 PM, Mohammad Tariq <donta...@gmail.com> wrote:

> Hello Ibrahim,
>
> Sorry for the late response. Those replies were for Kshiva. I saw his
> question (exactly the same as this one) multiple times on the Pig mailing
> list as well, so I thought I would give him some pointers on how to use
> the list. I should have said so explicitly. Apologies for the confusion.
>
> Coming back to the actual point: yes, the flow is fine. This is how
> people normally do it. But I was looking for an alternative, so that we
> don't have to go through this long process for every update. I'll let you
> know once I find something useful, but so far I haven't found anything
> better than what Dean has suggested. Please do let me know if you find
> something before I do.
>
> Many thanks.
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
>
> On Wed, Dec 26, 2012 at 7:24 PM, Ibrahim Yakti <iya...@souq.com> wrote:
>
>> After more reading, a suggested scenario looks like:
>>
>> MySQL ---(Extract / Load)---> HDFS ---> Load into HBase ---> Read as
>> external table in Hive ---(Transform Data & Join Tables)---> Use Hive
>> for joins & queries ---> Update HBase as needed & reload in Hive.
>>
>> What do you think, please?
>>
>> --
>> Ibrahim
>>
>>
>> On Wed, Dec 26, 2012 at 9:27 AM, Ibrahim Yakti <iya...@souq.com> wrote:
>>
>>> Mohammad, I am not sure if the answers & the link were meant for me or
>>> for Kshiva's question.
>>>
>>> If I have partitioned my data by status, for example, then when I run
>>> the update query it will add the updated rows to a new partition
>>> ("success" or "shipped", say) while keeping the old rows in the old
>>> partition ("confirmed" or "paid"), right?
>>>
>>> --
>>> Ibrahim
>>>
>>>
>>> On Tue, Dec 25, 2012 at 8:59 AM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>
>>>> Also, have a look at this:
>>>> http://www.catb.org/~esr/faqs/smart-questions.html
>>>>
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>>
>>>>
>>>> On Tue, Dec 25, 2012 at 11:26 AM, Mohammad Tariq <donta...@gmail.com> wrote:
>>>>
>>>>> Have a look at Beeswax.
>>>>>
>>>>> BTW, do you have access to Google at your station? The same question
>>>>> was asked on the Pig mailing list as well, twice.
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
>>>>>
>>>>>
>>>>> On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps <kshiva...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Are there any Hive editors in which we can write 100 to 150 Hive
>>>>>> scripts? I believe that is not easy to do in CLI mode, something
>>>>>> like an IDE for Java or TOAD for SQL. Please advise, many thanks.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler
>>>>>> <dean.wamp...@thinkbiganalytics.com> wrote:
>>>>>>
>>>>>>> This is not as hard as it sounds. The hardest part is setting up the
>>>>>>> incremental query against your MySQL database. Then you can write
>>>>>>> the results to new files in the HDFS directory for the table, and
>>>>>>> Hive will see them immediately. Yes, even though Hive doesn't
>>>>>>> support updates, it doesn't care how many files are in the
>>>>>>> directory. The trick is to avoid lots of little files.
>>>>>>>
>>>>>>> As others have suggested, you should consider partitioning the
>>>>>>> data, perhaps by time. Say you import a few HDFS blocks' worth of
>>>>>>> data each day; then use year/month/day partitioning to speed up
>>>>>>> your Hive queries. You'll need to add the partitions to the table
>>>>>>> as you go, but you can add those once a month, for example, for all
>>>>>>> partitions. Hive doesn't care if the partition directories don't
>>>>>>> exist yet or the directories are empty. I also recommend using an
>>>>>>> external table, which gives you more flexibility on directory
>>>>>>> layout, etc.
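
[For reference, a minimal sketch of the partitioned external table Dean describes. The "orders" table, its columns, and the /data/orders path are placeholders, not names from this thread.]

    hive -e "
    -- EXTERNAL: dropping the table leaves the HDFS files untouched.
    CREATE EXTERNAL TABLE orders (
      order_id   STRING,
      status     STRING,
      total      DOUBLE,
      updated_at STRING
    )
    PARTITIONED BY (year STRING, month STRING, day STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/orders';

    -- Partitions can be declared ahead of time, in batch; Hive does not
    -- mind if the directories are still missing or empty.
    ALTER TABLE orders ADD IF NOT EXISTS
      PARTITION (year='2012', month='12', day='26')
      LOCATION '/data/orders/2012/12/26';
    "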
>>>>>>> Sqoop might be the easiest tool for importing the data, as it will
>>>>>>> even generate a Hive table schema from the original MySQL table.
>>>>>>> However, that feature may not be useful in this case, as you
>>>>>>> already have the table.
>>>>>>>
>>>>>>> I think Oozie is horribly complex to use and overkill for this
>>>>>>> purpose. A simple bash script triggered periodically by cron is all
>>>>>>> you need. If you aren't using a partitioned table, you have a
>>>>>>> single sqoop command to run. If you have partitioned data, you'll
>>>>>>> also need a hive statement in the script to create the partition,
>>>>>>> unless you do those in batch once a month, etc.
>>>>>>>
>>>>>>> Hope this helps,
>>>>>>> dean
>>>>>>>
>>>>>>> On Mon, Dec 24, 2012 at 7:08 AM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>>>>> analytical queries, and we are using Sqoop to import data into
>>>>>>>> Hive. In our RDBMS the data is updated very frequently, and this
>>>>>>>> needs to be reflected in Hive. Hive does not support
>>>>>>>> update/delete, but there are many workarounds for this task.
>>>>>>>>
>>>>>>>> What we have in mind is importing all the tables into Hive as-is,
>>>>>>>> then building the required tables for reporting.
>>>>>>>>
>>>>>>>> My questions are:
>>>>>>>>
>>>>>>>> 1. What is the best way to reflect MySQL updates in Hive with
>>>>>>>> minimal resources?
>>>>>>>> 2. Is Sqoop the right tool to do the ETL?
>>>>>>>> 3. Is Hive the right tool for this kind of query, or should we
>>>>>>>> look for alternatives?
>>>>>>>>
>>>>>>>> Any hint will be useful. Thanks in advance.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ibrahim
>>>>>>>
>>>>>>> --
>>>>>>> *Dean Wampler, Ph.D.*
>>>>>>> thinkbiganalytics.com
>>>>>>> +1-312-339-1330
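
[To make Dean's cron suggestion concrete, a rough sketch of such a script. Every host, credential, table, column, and path here is a placeholder, and the last-value bookkeeping is deliberately simplified.]

    #!/bin/bash
    # Nightly incremental import from MySQL into a dated Hive partition.
    set -e

    year=$(date +%Y); month=$(date +%m); day=$(date +%d)
    dir="/data/orders/$year/$month/$day"
    last_file=/var/lib/etl/orders.last
    last=$(cat "$last_file" 2>/dev/null || echo '1970-01-01 00:00:00')

    # Pull only rows modified since the last run into a fresh dated
    # directory, one partition's worth of files per run.
    sqoop import \
      --connect jdbc:mysql://dbhost/shop \
      --username etl --password "$DB_PASS" \
      --table orders \
      --incremental lastmodified \
      --check-column updated_at \
      --last-value "$last" \
      --target-dir "$dir" \
      --fields-terminated-by '\t'

    # Register the new directory as a partition; Hive sees the rows at once.
    hive -e "ALTER TABLE orders ADD IF NOT EXISTS
             PARTITION (year='$year', month='$month', day='$day')
             LOCATION '$dir';"

    # Record the run time as the next --last-value. (Sqoop prints the exact
    # value to reuse; parsing its output would be more robust than this.)
    date '+%Y-%m-%d %H:%M:%S' > "$last_file"

Scheduled from cron, e.g.:

    0 2 * * * /usr/local/bin/orders_import.sh >> /var/log/orders_import.log 2>&1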