Re: Upper case column names

2012-08-14 Thread kulkarni.swar...@gmail.com
Mayank, just out of curiosity, is there any reason other than convention to preserve the case of column names in Hive? On Tue, Aug 14, 2012 at 6:38 PM, Travis Crawford wrote: > On Tue, Aug 14, 2012 at 4:20 PM, Edward Capriolo wrote: > >> >> Just changing the code is not as easy as it sounds. It

Re: count(*) vs count(1) in hive

2012-08-14 Thread Bertrand Dechoux
count(1) is the old one; count(*) is the new one (see https://issues.apache.org/jira/browse/HIVE-287). I guess when you can use the newer one, you should use it. Bertrand On Tue, Aug 14, 2012 at 11:33 PM, Raihan Jamal wrote: > Is there any difference between count(*) and count(1) in Hive. And which >
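For readers who want to see the two forms side by side, here is a minimal sketch. It assumes Python's built-in sqlite3 as a stand-in engine, not Hive itself, and the table t1 with its columns is hypothetical. In standard SQL both count(*) and count(1) count every row, while count(column) skips NULLs; whether a given Hive release accepts count(*) is what the JIRA issue above covers.

```python
import sqlite3

def count_variants(rows):
    """Return (count(*), count(1), count(name)) for a hypothetical table t1."""
    conn = sqlite3.connect(":memory:")  # stand-in engine, not Hive
    conn.execute("CREATE TABLE t1 (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT count(*), count(1), count(name) FROM t1"
    ).fetchone()
    conn.close()
    return result

# One row has a NULL name, so count(name) differs from the other two.
ROWS = [(1, "a"), (2, None), (3, "c")]
COUNTS = count_variants(ROWS)
print(COUNTS)
```

The first two values agree because neither form looks at column contents; only count(name) is sensitive to NULLs.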

Re: Upper case column names

2012-08-14 Thread Travis Crawford
On Tue, Aug 14, 2012 at 4:20 PM, Edward Capriolo wrote: > > Just changing the code is not as easy as it sounds. It sounds like this > will break many things in production for a lot of people. Absolutely - case sensitivity would be a big change. In the patch we're playing around with we centraliz

Re: Upper case column names

2012-08-14 Thread Edward Capriolo
Just changing the code is not as easy as it sounds. It sounds like this will break many things in production for a lot of people. On Tuesday, August 14, 2012, Travis Crawford wrote: > Hey Mayank - > I've looked briefly at case-sensitivity in Hive, and there's a lot of places where fields are lower

Re: how to do random sampling in hive?

2012-08-14 Thread Roberto Sanabria
Try this: select * from table_name order by rand() limit 5; Cheers, R On Tue, Aug 14, 2012 at 3:23 PM, Raihan Jamal wrote: > I think you can use here LIMIT- > > Limit indicates the number of rows to be returned. The rows returned are > chosen at random. The following query returns 5 rows from
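The query above can be tried out with a small sketch. It assumes Python's built-in sqlite3 as a stand-in for Hive, so the function is spelled random() rather than Hive's rand(), and the single id column and the helper name sample_rows are hypothetical.

```python
import sqlite3

def sample_rows(n_rows, k):
    """Pick k rows at random by sorting on a random key, then limiting."""
    conn = sqlite3.connect(":memory:")  # stand-in engine, not Hive
    conn.execute("CREATE TABLE table_name (id INTEGER)")
    conn.executemany(
        "INSERT INTO table_name VALUES (?)", [(i,) for i in range(n_rows)]
    )
    # SQLite calls the function random(); the Hive query in the thread uses rand().
    rows = conn.execute(
        "SELECT id FROM table_name ORDER BY random() LIMIT ?", (k,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

SAMPLE = sample_rows(100, 5)
print(SAMPLE)
```

Sorting on a random key has to touch every row, so on a large table this approach is likely to be expensive; the sketch only illustrates the shape of the query.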

Re: how to do random sampling in hive?

2012-08-14 Thread Raihan Jamal
I think you can use LIMIT here. Limit indicates the number of rows to be returned. The rows returned are chosen at random. The following query returns 5 rows from t1 at random. SELECT * FROM t1 LIMIT 5 http://karmasphere.com/hive-queries-on-table-data *Raihan Jamal* On Tue, Aug 14, 2012

Re: Upper case column names

2012-08-14 Thread Travis Crawford
Hey Mayank - I've looked briefly at case-sensitivity in Hive, and there's a lot of places where fields are lowercased to normalize. For HCatalog, I'm playing around with a small patch that makes case-sensitivity optional and it works if you run queries with Pig/HCat against the metastore. It would

count(*) vs count(1) in hive

2012-08-14 Thread Raihan Jamal
Is there any difference between count(*) and count(1) in Hive? Which one should we use in general, and why? I am on Hive 0.6. *Raihan Jamal*

Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread sudeep tokala
Thanks Bejoy On Tue, Aug 14, 2012 at 2:00 PM, Bejoy Ks wrote: > Hi Sudeep > > You can also look at join optimizations like map join, bucketed map > join,sort merge join etc and choose the right one that fits your > requirement. > > https://cwiki.apache.org/confluence/display/Hive/LanguageManual

Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread sudeep tokala
Thanks for the reply Bertrand. On Tue, Aug 14, 2012 at 2:12 PM, Bertrand Dechoux wrote: > > > My question was every join in a hive query would constitute to a > Mapreduce job. > In the general case, yes. BUT if one side of your join is small enough (ie > you can keep all in memory), a hash join/m

Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread Bertrand Dechoux
> My question was every join in a hive query would constitute to a Mapreduce job. In the general case, yes. BUT if one side of your join is small enough (ie you can keep all in memory), a hash join/map join can be performed which is much more performant (no reduce is required). Bejoy KS has just p
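The idea behind a hash join/map join can be sketched in plain Python. This is an illustration of the general technique, not Hive's implementation: the small side is loaded into an in-memory hash table, and the large side is streamed past it, so no reduce-style regrouping of the large side is needed. The function name map_side_join and the sample data are hypothetical.

```python
from collections import defaultdict

def map_side_join(small, large):
    """Inner-join (key, value) pairs, holding only `small` in memory."""
    # Build phase: index the small side by join key.
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    # Probe phase: stream the large side and emit matches as they are found.
    for key, value in large:
        for small_value in lookup.get(key, ()):
            yield key, value, small_value

SMALL = [(1, "US"), (2, "FR")]
LARGE = [(1, "alice"), (2, "bob"), (1, "carol"), (3, "dave")]
JOINED = list(map_side_join(SMALL, LARGE))
print(JOINED)
```

The approach only works while the small side fits in memory, which is the condition stated in the message above.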

Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread Bejoy Ks
Hi Sudeep, You can also look at join optimizations like map join, bucketed map join, sort merge join etc. and choose the right one that fits your requirement. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins Regards, Bejoy KS From: sudee

Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread sudeep tokala
Hi Bertrand, Thanks for the reply. My question was whether every join in a Hive query constitutes a MapReduce job. A MapReduce job goes through serialization and deserialization of objects. Isn't that an overhead? Store data in a smarter way? Can you please elaborate on this? Regards Sudeep On Tue,

RE: Issue with creating table in hbase

2012-08-14 Thread Omer, Farah
Here is the complete stacktrace. Killed Tasks Task Complete Status Start Time Finish Time Errors Counters task_201207251201_0678_m_00 0.00% 14-Aug-2012 12:30:20 14-Aug-2012 12:30:44 (24sec) java.lang.Run

Re: Issue with creating table in hbase

2012-08-14 Thread kulkarni.swar...@gmail.com
Is that the complete stacktrace? On Tue, Aug 14, 2012 at 12:01 PM, Omer, Farah wrote: > Unfortunately the job’s log also doesn’t tell me anything very > meaningful. Have you or anyone might have seen this before? > > > > java.lang.RuntimeException: Error in configuring object > >

RE: Issue with creating table in hbase

2012-08-14 Thread Omer, Farah
Unfortunately the job's log also doesn't tell me anything very meaningful. Have you or anyone might have seen this before? java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.ut

Re: Issue with creating table in hbase

2012-08-14 Thread kulkarni.swar...@gmail.com
It seems like your MapReduce job is failing. Refer to the logs at the tracking URL "http://hadoop001:50030/jobdetails.jsp?jobid=job_201207251201_0678" to see exactly why it is failing. On Tue, Aug 14, 2012 at 11:35 AM, Omer, Farah wrote: > Thanks. That helped. > > Another relate

RE: Issue with creating table in hbase

2012-08-14 Thread Omer, Farah
Thanks. That helped. Another related question: I created this table on HIVE: hive> CREATE TABLE hbase_mstr_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "

Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread Bertrand Dechoux
You may want to be clearer. Is your question : how can I change the serialization strategy of Hive? (If so I let other users answer and I am also interested in the answer.) Else the answer is simple. If you want to join data which can not be stored into memory, you need to serialize them. The only

Upper case column names

2012-08-14 Thread Mayank Bansal
Hi, Column names in Hive are case insensitive by default. I was wondering if there is any way I could make them case sensitive. I am running a model on data that is now stored in Hive, and the model refers to columns in camel case. It would require a lot of effort to chang

Re: OPTIMIZING A HIVE QUERY

2012-08-14 Thread sudeep tokala
On Tue, Aug 14, 2012 at 11:08 AM, sudeep tokala wrote: > Hi all, > > How to avoid serialization and deserialization overhead in hive join query > ? will this optimize my query performance. > > Regards > sudeep >

Issue with creating table in hbase

2012-08-14 Thread Omer, Farah
Hi all, I was testing HBase integrated with Hive and ran into an issue. Does anyone have an idea what it means? hbase(main):001:0> CREATE TABLE hbase_mstr_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping"

unstructured file types load into hive table.

2012-08-14 Thread prabhu k
Hi Users, What types of unstructured files, other than Apache web logs, can be loaded into a Hive table? Thanks, Prabhu.

Loading transform file from HDFS

2012-08-14 Thread Tom Hall
I am using add file /home/deploy/transform.rb; which works fine when running Hive directly, but we usually use Hive server. Is there a way to use an HDFS path for the file? I tried hdfs:// but no joy. Thanks, Tom

Re: NOT IN clause in Hive

2012-08-14 Thread hj g
Maybe like this: select buyerid from (select b.buyerid, s.sellerid from transaction b left outer join transaction s on b.buyerid = s.sellerid group by b.buyerid, s.sellerid) t where sellerid is null 2012/8/14 Prakrati Agrawal > Dear Phil, > > Can you be a little more specific about using the left outer join? > > Tha

Re: New Issue raised in Jira

2012-08-14 Thread Philip Tromans
What you're trying to do can be achieved with: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions with a "D" in a format string. See: http://docs.oracle.com/javase/1.4.2/docs/api/java/text/SimpleDateFormat.html Phil. On 14 August 2012 07:30, Deep

Re: Loading data only into one node

2012-08-14 Thread Jasper Knulst
Hi, The only way to do this is to set the replication factor to 1 (dfs.replication = 1). You have to set this property to 1 and upload the file to HDFS locally on the DD where you want it to be stored. Still no guarantee that it will end up there. But why would you want to do this? It total

Re: NOT IN clause in Hive

2012-08-14 Thread hj g
Maybe like this: select * from (select b.buyerid, s.sellerid from transaction b left outer join transaction s on b.buyerid = s.sellerid group by b.buyerid, s.sellerid) t where sellerid is not null 2012/8/14 Prakrati Agrawal > Dear Phil, > > Can you be a little more specific about using the left outer join? > > Thank

Re: NOT IN clause in Hive

2012-08-14 Thread Philip Tromans
https://cwiki.apache.org/Hive/languagemanual-joins.html On 14 August 2012 10:29, Prakrati Agrawal wrote: > Dear Phil, > > Can you be a little more specific about using the left outer join? > > Thanks and Regards, > Prakrati > > -Original Message- > From: Philip Tromans [mailto:philip.j.tr

RE: NOT IN clause in Hive

2012-08-14 Thread Prakrati Agrawal
Dear Phil, Can you be a little more specific about using the left outer join? Thanks and Regards, Prakrati -Original Message- From: Philip Tromans [mailto:philip.j.trom...@gmail.com] Sent: Tuesday, August 14, 2012 2:55 PM To: user@hive.apache.org Subject: Re: NOT IN clause in Hive Hive

Re: Loading data only into one node

2012-08-14 Thread prabhu k
Hi Bejoy, I have a related query: with 1 master and 2 slave nodes, is it possible to send data to only one DataNode (one slave node)? Thanks, Prabhu. On Tue, Aug 14, 2012 at 2:17 PM, Bejoy Ks wrote: > Hi Shaik > > I didn't get your query correctly, but I assume with Master Node you meant > NameNod

Re: NOT IN clause in Hive

2012-08-14 Thread Philip Tromans
Hive doesn't support IN. You'll need to rewrite your query as a left outer join, and check whether the RHS is null. Phil. On 14 August 2012 10:20, Bertrand Dechoux wrote: > According to the error message, you are not using the correct syntax : > https://cwiki.apache.org/confluence/display/Hive/
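The rewrite Phil describes (left outer join, then keep rows whose right-hand side is null) can be sketched as follows. The sketch assumes Python's built-in sqlite3 as a stand-in for Hive; the table is named transactions here (hypothetical, chosen to avoid a reserved word in the stand-in engine), and the aliases b and s and the helper name are also assumptions.

```python
import sqlite3

def buyers_who_never_sold(rows):
    """Buyers that never appear as a seller, without using NOT IN."""
    conn = sqlite3.connect(":memory:")  # stand-in engine, not Hive
    conn.execute("CREATE TABLE transactions (buyerid INTEGER, sellerid INTEGER)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    query = """
        SELECT DISTINCT b.buyerid
        FROM transactions b
        LEFT OUTER JOIN transactions s ON b.buyerid = s.sellerid
        WHERE s.sellerid IS NULL   -- no matching seller row was found
        ORDER BY b.buyerid
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result

# Buyers are 1, 2, 4, 5; sellers are 2, 3, 1. Only 4 and 5 never sold.
ROWS = [(1, 2), (2, 3), (4, 1), (5, 2)]
NEVER_SOLD = buyers_who_never_sold(ROWS)
print(NEVER_SOLD)
```

Each buyer row is paired with any row where the same id appears as a seller; buyers with no such row get NULL on the right-hand side, and those are the ones the WHERE clause keeps.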

Re: NOT IN clause in Hive

2012-08-14 Thread Bertrand Dechoux
According to the error message, you are not using the correct syntax : https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-ALLandDISTINCTClauses Bertrand On Tue, Aug 14, 2012 at 11:12 AM, Prakrati Agrawal < prakrati.agra...@mu-sigma.com> wrote: > Dear al

NOT IN clause in Hive

2012-08-14 Thread Prakrati Agrawal
Dear all, I am trying to execute this query : SELECT distinct(buyerid) from transaction WHERE buyerid NOT IN(SELECT distinct(sellerid) from transaction) This gives me the following error: FAILED: Parse Error: line 1:59 cannot recognize input near 'SELECT' 'distinct' '(' in expression specificat

Re: Need Unstructured data sample file

2012-08-14 Thread Bejoy Ks
Hi Shaik, I don't have any experience dealing with image and audio files with Hive, but on the structured part I have dealt with Apache log files. You can get some insight here at the Hive wiki https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-ApacheWeblogData Regards

Re: Loading data only into one node

2012-08-14 Thread Bejoy Ks
Hi Shaik, I didn't get your query correctly, but I assume with Master Node you meant NameNode(NN) and JobTracker(JT) and with Slave Nodes it is DataNode(DN) and TaskTracker(TT). In HDFS the NN holds just the metadata; the actual blocks are stored in DNs. So your question seems a little out of t

Need Unstructured data sample file

2012-08-14 Thread shaik ahamed
Hi Users, I need clarification on the tasks below. 1. What are the unstructured file types in Hive, with examples? 2. What is the command to load unstructured (images & audio) files into a Hive table? Please let me know whether it is possible to go through with this in Hive, i