Hi, is there any Hive editor in which we can write 100 to 150 Hive scripts? I believe it is not easy to do all those scripts in CLI mode. Something like an IDE for Java or TOAD for SQL. Please advise, many thanks.
Thanks

On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:

> This is not as hard as it sounds. The hardest part is setting up the
> incremental query against your MySQL database. Then you can write the
> results to new files in the HDFS directory for the table, and Hive will see
> them immediately. Yes, even though Hive doesn't support updates, it doesn't
> care how many files are in the directory. The trick is to avoid lots of
> little files.
>
> As others have suggested, you should consider partitioning the data,
> perhaps by time. Say you import a few HDFS blocks' worth of data each
> day; then use year/month/day partitioning to speed up your Hive queries.
> You'll need to add the partitions to the table as you go, but you
> can add those once a month, for example, for all the partitions at once. Hive
> doesn't care if the partition directories don't exist yet or are
> empty. I also recommend using an external table, which gives you more
> flexibility on directory layout, etc.
>
> Sqoop might be the easiest tool for importing the data, as it will even
> generate a Hive table schema from the original MySQL table. However, that
> feature may not be useful in this case, as you already have the table.
>
> I think Oozie is horribly complex to use and overkill for this purpose. A
> simple bash script triggered periodically by cron is all you need. If you
> aren't using a partitioned table, you have a single sqoop command to run.
> If you have partitioned data, you'll also need a hive statement in the
> script to create the partition, unless you do those in batch once a month.
>
> Hope this helps,
> dean
>
> On Mon, Dec 24, 2012 at 7:08 AM, Ibrahim Yakti <iya...@souq.com> wrote:
>
>> Hi All,
>>
>> We are new to Hadoop and Hive. We are trying to use Hive to run
>> analytical queries, and we are using Sqoop to import data into Hive. In
>> our RDBMS the data is updated very frequently, and this needs to be
>> reflected in Hive. Hive does not support update/delete, but there are
>> many workarounds for this task.
>>
>> What we have in mind is importing all the tables into Hive as-is, then
>> building the required tables for reporting.
>>
>> My questions are:
>>
>> 1. What is the best way to reflect MySQL updates into Hive with
>>    minimal resources?
>> 2. Is Sqoop the right tool to do the ETL?
>> 3. Is Hive the right tool for this kind of queries, or should we
>>    search for alternatives?
>>
>> Any hint will be useful. Thanks in advance.
>>
>> --
>> Ibrahim
>
> --
> *Dean Wampler, Ph.D.*
> thinkbiganalytics.com
> +1-312-339-1330
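[Editor's note] The external, date-partitioned table Dean describes might be sketched in HiveQL as below. The table name (`orders`), its columns, and the HDFS paths are hypothetical placeholders, not anything from the thread:

```sql
-- External table: Hive tracks only the schema; the data files live under
-- a directory layout we control (hypothetical path /data/orders).
CREATE EXTERNAL TABLE orders (
  id            BIGINT,
  customer      STRING,
  total         DOUBLE,
  last_modified TIMESTAMP
)
PARTITIONED BY (year INT, month INT, day INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/orders';

-- Register partitions, possibly in batch for a whole month ahead of time:
-- as noted above, Hive doesn't mind if the directories are missing or empty.
ALTER TABLE orders ADD IF NOT EXISTS
  PARTITION (year=2012, month=12, day=24) LOCATION '/data/orders/2012/12/24'
  PARTITION (year=2012, month=12, day=25) LOCATION '/data/orders/2012/12/25';
```

Because the table is external, dropping it later removes only the metadata, not the files, which is the flexibility on directory layout Dean mentions.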
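[Editor's note] The cron-plus-bash approach from the reply above might look like the following sketch. The connection string, database (`shop`), table (`orders`), check column (`last_modified`), and paths are all assumptions; the script only prints the sqoop and hive commands (a dry run), so replace the `echo`s with real execution in an actual job:

```shell
#!/usr/bin/env bash
# Nightly incremental import sketch: MySQL -> HDFS -> new Hive partition.
# Dry run: prints the commands rather than executing them.
set -euo pipefail

# Partition values for yesterday (assumes GNU date, as on Linux).
year=$(date -d yesterday +%Y)
month=$(date -d yesterday +%m)
day=$(date -d yesterday +%d)
target="/data/orders/${year}/${month}/${day}"

# 1. Pull only the rows changed since midnight yesterday.
echo sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --table orders \
  --incremental lastmodified \
  --check-column last_modified \
  --last-value "${year}-${month}-${day} 00:00:00" \
  --target-dir "${target}"

# 2. Register the new partition so Hive sees the files immediately.
echo hive -e "ALTER TABLE orders ADD IF NOT EXISTS PARTITION (year=${year}, month=${month}, day=${day}) LOCATION '${target}';"
```

A crontab line such as `0 2 * * * /path/to/import.sh` would then run it each night; unless the table is partitioned, only the sqoop command is needed.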