Mohammad, I am not sure if the answers & the link were meant for me or for
Kshiva's question.

If I have partitioned my data based on status, for example, then when I run
the update query it will add the updated data to a new partition ("success"
or "shipped", for example) and keep the old data ("confirmed" or "paid",
for example), right?

--
Ibrahim
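A minimal sketch of the behavior being asked about, assuming a hypothetical
"orders" table partitioned by "status" and a hypothetical staging table
"orders_staging". Hive stores each partition in its own HDFS directory, so
writing a row into its new status partition leaves the stale copy in the old
partition until that partition is rewritten:

    # Hypothetical "orders" table, partitioned by status. Hive of this era
    # has no UPDATE, so moving an order between statuses is two separate
    # partition writes.
    hive -e "
      CREATE EXTERNAL TABLE IF NOT EXISTS orders (id BIGINT, amount DOUBLE)
      PARTITIONED BY (status STRING)
      LOCATION '/data/orders';

      -- Order 42 has moved from 'paid' to 'shipped'. This only writes
      -- files under /data/orders/status=shipped:
      INSERT INTO TABLE orders PARTITION (status = 'shipped')
      SELECT id, amount FROM orders_staging WHERE id = 42;

      -- The stale copy under status=paid stays put until that partition
      -- is explicitly rewritten:
      INSERT OVERWRITE TABLE orders PARTITION (status = 'paid')
      SELECT id, amount FROM orders WHERE status = 'paid' AND id <> 42;
    "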
On Tue, Dec 25, 2012 at 8:59 AM, Mohammad Tariq <donta...@gmail.com> wrote:
> Also, have a look at this:
> http://www.catb.org/~esr/faqs/smart-questions.html
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
>
> On Tue, Dec 25, 2012 at 11:26 AM, Mohammad Tariq <donta...@gmail.com> wrote:
>
>> Have a look at Beeswax.
>>
>> BTW, do you have access to Google at your station? The same question was
>> posted on the Pig mailing list as well, twice.
>>
>> Best Regards,
>> Tariq
>> +91-9741563634
>> https://mtariq.jux.com/
>>
>>
>> On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps <kshiva...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Are there any Hive editors where we can write 100 to 150 Hive scripts?
>>> I believe it is not easy to do all of those scripts in CLI mode --
>>> something like an IDE for Java, or TOAD for SQL. Please advise, many
>>> thanks.
>>>
>>> Thanks
>>>
>>> On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler
>>> <dean.wamp...@thinkbiganalytics.com> wrote:
>>>
>>>> This is not as hard as it sounds. The hardest part is setting up the
>>>> incremental query against your MySQL database. Then you can write the
>>>> results to new files in the HDFS directory for the table, and Hive
>>>> will see them immediately. Yes, even though Hive doesn't support
>>>> updates, it doesn't care how many files are in the directory. The
>>>> trick is to avoid lots of little files.
>>>>
>>>> As others have suggested, you should consider partitioning the data,
>>>> perhaps by time. Say you import a few HDFS blocks' worth of data each
>>>> day; then use year/month/day partitioning to speed up your Hive
>>>> queries. You'll need to add the partitions to the table as you go,
>>>> but you can actually add those in batch, for example once a month for
>>>> all partitions. Hive doesn't care if the partition directories don't
>>>> exist yet or are empty. I also recommend using an external table,
>>>> which gives you more flexibility in directory layout, etc.
>>>>
>>>> Sqoop might be the easiest tool for importing the data, as it will
>>>> even generate a Hive table schema from the original MySQL table.
>>>> However, that feature may not be useful in this case, as you already
>>>> have the table.
>>>>
>>>> I think Oozie is horribly complex to use and overkill for this
>>>> purpose. A simple bash script triggered periodically by cron is all
>>>> you need. If you aren't using a partitioned table, you have a single
>>>> sqoop command to run. If you have partitioned data, you'll also need
>>>> a hive statement in the script to create the partition, unless you do
>>>> those in batch once a month, etc.
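A sketch of the kind of cron-driven script described above, assuming a
hypothetical "orders" table with an "updated_at" column, year/month/day
partitions under /data/orders, and Sqoop 1.4's incremental import; the
connection string and credentials are placeholders:

    #!/bin/bash
    # Nightly incremental import: pull yesterday's changed rows from MySQL
    # straight into a dated partition directory, then register the partition.
    y=$(date +%Y); m=$(date +%m); d=$(date +%d)
    dir="/data/orders/year=$y/month=$m/day=$d"

    sqoop import \
      --connect jdbc:mysql://dbhost/shop \
      --username etl --password "$DB_PASS" \
      --table orders \
      --incremental lastmodified \
      --check-column updated_at \
      --last-value "$(date -d yesterday +'%Y-%m-%d 00:00:00')" \
      --target-dir "$dir"

    # Harmless if the partition already exists; these could also be added
    # in batch once a month, as suggested above.
    hive -e "ALTER TABLE orders ADD IF NOT EXISTS PARTITION
             (year='$y', month='$m', day='$d') LOCATION '$dir';"

    # Note: an updated row now exists in both its old and new partitions;
    # deduplicate at query time (latest updated_at per id).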
>>>>
>>>> Hope this helps,
>>>> dean
>>>>
>>>> On Mon, Dec 24, 2012 at 7:08 AM, Ibrahim Yakti <iya...@souq.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>> analytical queries, and we are using Sqoop to import data into Hive.
>>>>> In our RDBMS the data is updated very frequently, and this needs to
>>>>> be reflected in Hive. Hive does not support update/delete, but there
>>>>> are many workarounds to do this task.
>>>>>
>>>>> What we have in mind is importing all the tables into Hive as-is,
>>>>> then building the required tables for reporting.
>>>>>
>>>>> My questions are:
>>>>>
>>>>> 1. What is the best way to reflect MySQL updates into Hive with
>>>>> minimal resources?
>>>>> 2. Is Sqoop the right tool to do the ETL?
>>>>> 3. Is Hive the right tool for this kind of queries, or should we
>>>>> search for alternatives?
>>>>>
>>>>> Any hint will be useful, thanks in advance.
>>>>>
>>>>> --
>>>>> Ibrahim
>>>>
>>>>
>>>> --
>>>> *Dean Wampler, Ph.D.*
>>>> thinkbiganalytics.com
>>>> +1-312-339-1330
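On question 1: with the append-only approach discussed in this thread, stale
copies of updated rows remain in HDFS, and queries pick the newest version
per key. A sketch, again assuming a hypothetical "orders" table with "id"
and "updated_at" columns (Hive of this era has no window functions, hence
the self-join):

    # Latest version of each order: join-based deduplication.
    hive -e "
      SELECT o.*
      FROM orders o
      JOIN (
        SELECT id, MAX(updated_at) AS max_ts
        FROM orders
        GROUP BY id
      ) latest
      ON o.id = latest.id AND o.updated_at = latest.max_ts;
    "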