Re: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

2012-08-01 Thread Bejoy Ks
Hi Techy, to add on to Gabriel's response: since one of the partitions was added successfully, a permissions issue on the Hive storage location is unlikely unless you have changed permissions recently. I assume you are trying to load data using the LOAD DATA statement, then you

Re: Efficiently Store data in Hive

2012-08-01 Thread Bejoy Ks
Hi Techy, LZO is not splittable on its own unless indexed, i.e. if you want your LZO-compressed files to be splittable, after compressing them with LZO you need to index them using the LZO indexer. This is mandatory for splittability if you use text files. But if you are using Sequence files, it has the s

Re: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

2012-08-01 Thread Gabriel Eisbruch
Hi Techy, this error usually appears when the user executing the query does not have permissions on the source or target folder. If you created a plain (non-external) table, it is probable that you do not have permission to write into /user/hive. Regarding your previous question, I am using Snappy to compress the d

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

2012-08-01 Thread Techy Teck
I am trying to load data into the date partition. My data was loaded successfully for 20120709, but when I tried to load the data for *20120710*, I see the exception below. Can anyone suggest why this is happening? *Loading data to table data_quality partition (ds=2012071

Re: cli timeouts

2012-08-01 Thread Travis Crawford
Interesting - this issue would certainly go away with local mode as there's no thrift call to fail. I'd very much prefer to run HMS as a centralized service though. Thanks for the info - I'll have to take a look at how the thrift client handles timeouts/reconnects/etc. --travis On Wed, Aug 1, 2

Re: cli timeouts

2012-08-01 Thread Edward Capriolo
The two setup options are: cli -> thrift metastore -> jdbc, and cli -> jdbc (what used to be called local mode). Local mode has fewer moving parts, so I prefer it. On Wed, Aug 1, 2012 at 2:54 PM, Travis Crawford wrote: > Oh interesting - you're saying instead of running a single > HiveMetaStore thrift service, mos

Re: cli timeouts

2012-08-01 Thread Travis Crawford
Oh interesting - you're saying instead of running a single HiveMetaStore thrift service, most users use the embedded HiveMetaStore mode and have each CLI instance connect to the DB directly? --travis On Wed, Aug 1, 2012 at 11:47 AM, Edward Capriolo wrote: > I feel that that interface is very ra

Efficiently Store data in Hive

2012-08-01 Thread Techy Teck
How can I efficiently store data in Hive, and also store and retrieve compressed data in Hive? Currently I am storing it as a TextFile. I was going through Bejoy's article (http://kickstarthadoop.blogspot.com/2011/10/how-to-efficiently-store-data-in-hive.html) and I found that LZO compression will

Re: cli timeouts

2012-08-01 Thread Edward Capriolo
I feel that that interface is very rarely used in the wild. The only use case I can figure out for it is people with very in-depth Hive experience who do not wish to interact with Hive through the QL language. That being said, I would think the coverage might be a little weak there. With the local

Re: cli timeouts

2012-08-01 Thread Travis Crawford
I'm using the thrift metastore via TFramedTransport. What value do you specify for hive.metastore.client.socket.timeout? I'm using 60. If I open the CLI, run "show tables", wait out the timeout period, and then run "show tables" again, the CLI hangs in: "main" prio=10 tid=0x4151a000 nid=0x448 runnable [

Difference between storing data as a TextFile and SequenceFile

2012-08-01 Thread Techy Teck
What is the difference between storing the data as a TextFile and as a SequenceFile? And which will be faster for Hive queries? I am creating a table like this: create table quality ( id bigint, total_chkout bigint, total_errpds bigint ) partitioned by (ds string) row format delimited fi

Re: cli timeouts

2012-08-01 Thread Edward Capriolo
Are you communicating with a thrift metastore or a JDBC metastore? I have kept connections open for long periods of time and do not remember ever seeing them time out. Edward On Wed, Aug 1, 2012 at 12:01 PM, Travis Crawford wrote: > Hey Hive gurus - > > Does anyone know how the CLI handles met

Re: Best Report Generating tools for hive/hadoop file system

2012-08-01 Thread Anurag Tangri
Cloudera has connectors for MicroStrategy and Tableau. It looks like Cloudera might have better working versions in the 4.x releases; worth checking. Datameer is another tool that also connects to Hive in its new release and lets you analyse data and generate reports and graphs. Thanks, Anurag Ta

Re: mapper is slower than hive' mapper

2012-08-01 Thread Yue Guan
The story here is that we have a workflow based on Hive queries. It takes several stages to get to the final data, and for each stage we have a Hive table. We are trying to rewrite the whole workflow in MapReduce. Ideally, this will remove all the intermediate steps and take two rounds of MapReduce t

Re: mapper is slower than hive' mapper

2012-08-01 Thread Bertrand Dechoux
My bad. I wasn't sure; at least I know now. But other solutions may use other serialization strategies like Thrift (which is only another customisation point of Hadoop). Bertrand On Wed, Aug 1, 2012 at 5:49 PM, Edward Capriolo wrote: > Hive does not use combiners it uses map side aggregation. Hi

cli timeouts

2012-08-01 Thread Travis Crawford
Hey Hive gurus - Does anyone know how the CLI handles metastore connection timeouts? It seems that if I leave a CLI session idle for more than hive.metastore.client.socket.timeout seconds and then run "show tables", the CLI hangs for the timeout period and then throws a SocketTimeoutException. Restarting the CLI and runn
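The thread does not settle how the CLI itself recovers from a timed-out metastore connection. As a hedged illustration only, one generic client-side pattern is to catch the SocketTimeoutException, reconnect, and retry the call exactly once. RetryOnTimeout, call and the reconnect callback are hypothetical names, not part of Hive's or Thrift's API.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical helper, NOT part of Hive: run a metastore call and, if the
// idle connection has timed out, reconnect and retry exactly once.
class RetryOnTimeout {
    static <T> T call(Callable<T> op, Runnable reconnect) throws Exception {
        try {
            return op.call();
        } catch (SocketTimeoutException e) {
            reconnect.run();   // e.g. reopen the (hypothetical) client transport
            return op.call();  // a second timeout propagates to the caller
        }
    }
}
```

Retrying only once keeps a genuinely unreachable metastore from hanging the session for multiple timeout periods.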

Re: mapper is slower than hive' mapper

2012-08-01 Thread Edward Capriolo
Hive does not use combiners; it uses map-side aggregation. Hive does use Writables: sometimes ones from Hadoop, sometimes its own custom Writables for things like timestamps. On Wed, Aug 1, 2012 at 11:40 AM, Bertrand Dechoux wrote: > I am not sure about Hive but if you look at Casc
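A minimal sketch of what map-side aggregation means for the query in this thread (select sum(column1) from table group by column2, column3), written in plain Java so it runs without Hadoop on the classpath. It assumes partial sums are kept in a HashMap keyed by (column2, column3) and emitted when the map reaches an assumed size limit or at the end of the task. MapSideAggregator, maxEntries and the emit callback (standing in for context.write) are hypothetical names, not Hive internals; in Hive itself this behaviour is switched on by the hive.map.aggr setting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical in-mapper ("map-side") aggregation: partial sums per group are
// held in memory and emitted in bulk, instead of one output record per input row.
class MapSideAggregator {
    private final Map<String, Long> partialSums = new HashMap<>();
    private final int maxEntries;                // assumed flush threshold
    private final BiConsumer<String, Long> emit; // stands in for context.write

    MapSideAggregator(int maxEntries, BiConsumer<String, Long> emit) {
        this.maxEntries = maxEntries;
        this.emit = emit;
    }

    // Called once per input row; column2 and column3 form the group key.
    void process(String column2, String column3, long column1) {
        String key = column2 + "\t" + column3;   // assumed tab-joined composite key
        partialSums.merge(key, column1, Long::sum);
        if (partialSums.size() >= maxEntries) {
            flush();                             // bound memory use
        }
    }

    // Emits every partial sum and clears the table; call it from cleanup() too.
    void flush() {
        partialSums.forEach(emit);
        partialSums.clear();
    }
}
```

Because a key can be flushed more than once, the reducer must still sum its inputs; the threshold trades mapper memory for fewer map output records.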

Re: mapper is slower than hive' mapper

2012-08-01 Thread Bertrand Dechoux
I am not sure about Hive, but if you look at Cascading, they use a pseudo-combiner instead of the standard (I mean Hadoop's) combiner. I guess Hive has a similar strategy. The point is that when you use a compiler, the compiler does smart things that you don't need to think about (like loop unwinding

Re: mapper is slower than hive' mapper

2012-08-01 Thread Edward Capriolo
As mentioned, if you avoid using new by reusing objects, and possibly use buffer objects, you may be able to match or beat the speed. But in the general case Hive saves you time by letting you not worry about low-level details like this. On Wed, Aug 1, 2012 at 10:35 AM, Connell, Chuck wro

Re: mapper is slower than hive' mapper

2012-08-01 Thread Yue Guan
Hive doesn't use Writable?! Could you please give me a pointer to the Hive code so I can see how they do the job? I checked the map output records and found this: my case: total mapper input record: 23091348, total mapper output record: 23091348, avg mapper output bytes/record: 34.819994, total combiner output re

Re: mapper is slower than hive' mapper

2012-08-01 Thread Bertrand Dechoux
One hint would be to reduce the number of Writable instances you need: create the object once and reuse it. By the way, Hive does not use Writable. ;) Bertrand On Wed, Aug 1, 2012 at 4:35 PM, Connell, Chuck wrote: > This is actually not surprising. Hive is essentially a MapReduce compiler. > It is
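The "create the object once and reuse it" advice can be sketched as follows, assuming a composite key over column2 and column3 from the query in this thread and tab-delimited input lines (column1, column2, column3). ReusableKey and ReusingMapper are hypothetical names; a real key would also implement Hadoop's WritableComparable, which is left out here so the sketch runs without Hadoop.

```java
// Hypothetical mutable composite key that is refilled for every record
// instead of being allocated with new inside map().
class ReusableKey {
    private String column2;
    private String column3;

    ReusableKey set(String column2, String column3) {
        this.column2 = column2;
        this.column3 = column3;
        return this;
    }

    String column2() { return column2; }
    String column3() { return column3; }
}

class ReusingMapper {
    // Allocated once per mapper instance, not once per map() call.
    private final ReusableKey key = new ReusableKey();

    // Stand-in for map(): refills the shared key and returns it for emitting.
    ReusableKey map(String line) {
        String[] fields = line.split("\t", -1); // assumed layout: column1, column2, column3
        return key.set(fields[1], fields[2]);
    }
}
```

In the standard MapReduce output path, context.write serializes the key and value at the time of the call, which is what makes refilling the same instance for the next record safe. Code that holds on to the returned key across calls, however, will see its contents change.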

RE: mapper is slower than hive' mapper

2012-08-01 Thread Connell, Chuck
This is actually not surprising. Hive is essentially a MapReduce compiler. It is common for regular compilers (C, C#, Fortran) to emit faster assembler code than you write yourself. Compilers know the tricks of their target language. Chuck Connell Nuance R&D Data Team Burlington, MA -Origi

mapper is slower than hive' mapper

2012-08-01 Thread Yue Guan
Hi there, I'm writing MapReduce jobs to replace some Hive queries, and I find that my mapper is slower than Hive's mapper. The Hive query is like: select sum(column1) from table group by column2, column3; My MapReduce program looks like this: public static class HiveTableMapper extends MapperText, MyKe

Re: Best Report Generating tools for hive/hadoop file system

2012-08-01 Thread Artem Ervits
The latest Eclipse BIRT release has Hive and Hadoop connectors. Artem Ervits Data Analyst New York Presbyterian Hospital From: Techy Teck [mailto:comptechge...@gmail.com] Sent: Tuesday, July 31, 2012 08:46 PM To: user@hive.apache.org Subject: Best Report Generating tools for hive/hadoop file system

Re: Unable to merge 3 tables in hive

2012-08-01 Thread iwannaplay games
Thanks, I did it by creating 3 external tables and then using this query to update createddate from the users table for a particular userid: insert overwrite table userinfo select u.userid, a.createddate from users a join userinfo u on u.userid = a.userid. I can also use the query option; I'll try that now :)

Hive EXTERNAL tables not working as explained in documentation!

2012-08-01 Thread Kuldeep Chitrakar
Hi, I was playing with external tables in Hive and got confused, because the concept of an external table as explained in the documentation does not match how it actually behaves. Hive Version: 0.7 CREATE EXTERNAL TABLE IF NOT EXISTS learn.crime_external_native ( Orig_State String, TypeofCrime String

Re: Unable to merge 3 tables in hive

2012-08-01 Thread Bejoy KS
Hi, if you have run a simple table-level Sqoop import, then all the tables will definitely be imported separately to HDFS. If you need the data from the 3 tables joined together, frame a proper SQL query and use it in the Sqoop import (sqoop import has a --query option). If it is like 3 tables h

RE: Unable to merge 3 tables in hive

2012-08-01 Thread Matouk Iftissen
Hi, are you using the import-all-tables tool? If so, make sure that you respect the requirements of this Sqoop tool. See the Sqoop user guide: http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1766722 for more information. -Original Message- From: iwannaplay games [mailto:f