Re: Hive shell code exception, urgent help needed

2014-07-24 Thread Sarfraz Ramay
Hi, Thanks for your reply. I have been following links for the past two days now. Finally got Hadoop natively compiled. Let's see if that solves the problem. Yes, increasing the memory was on my list, but I think I tried that and it didn't work. Memory can be an issue, as it is working perfectly fine for quer

Re: Errors while creating a new table using existing table schema

2014-07-24 Thread Vidya Sujeet
Thanks all. I created a new database and it works fine there. On Sat, Jul 19, 2014 at 1:37 PM, Lefty Leverenz wrote: > And now it's documented in the DDL wiki: > - Use Database

Re: HIVE 0.12 SUM() returning NULL for decimal values

2014-07-24 Thread 丁桂涛(桂花)
try select sum(sales) from salestemp where sales is not null; On Thu, Jul 24, 2014 at 11:10 PM, Abhishek Gayakwad wrote: > I am trying to aggregate one column of decimal type, which is returning me > null. If I cast this column to double it returns me some value. following > are the steps to re

Re: Hive shell code exception, urgent help needed

2014-07-24 Thread Juan Martin Pampliega
Hi, The actual useful part of the error is: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. If you search for this plus "EC2" in Google you will find a couple of results that point to memory exhaustion issues. You should try increasing the configured memory

Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

2014-07-24 Thread Juan Martin Pampliega
The following article about using Klout's Brickhouse library to access an HBase table as a map through its key might be useful. http://brickhouseconfessions.wordpress.com/2013/08/06/squash-the-long-tail-with-brickhouses-hbase-udfs/ On Jul 24, 2014 8:56 PM, "Andrew Mains" wrote: > Agreed--as far

Re: CREATE TABLE throwing error for large number of columns (very long script)

2014-07-24 Thread Juan Martin Pampliega
Are you using MySQL or Postgres for the Metastore database? On Jul 24, 2014 9:08 PM, "Prasanth Jayachandran" < pjayachand...@hortonworks.com> wrote: > What version of hive are you using? What file format are you using? > > Thanks > Prasanth Jayachandran > > On Jul 24, 2014, at 5:03 PM, < > azaz.r

Re: CREATE TABLE throwing error for large number of columns (very long script)

2014-07-24 Thread Prasanth Jayachandran
What version of hive are you using? What file format are you using? Thanks Prasanth Jayachandran On Jul 24, 2014, at 5:03 PM, wrote: > I am trying to Create a table in Hive. It’s a very long script contained > large number of columns and also contains complex fields like STRUCT, ARRAY > etc

CREATE TABLE throwing error for large number of columns (very long script)

2014-07-24 Thread azaz.rasool
I am trying to create a table in Hive. It's a very long script containing a large number of columns, including complex fields like STRUCT, ARRAY etc. * Cannot create the full table in one shot using a CREATE TABLE statement, so I need to first run CREATE and then ALTER * If fields ar

Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

2014-07-24 Thread Andrew Mains
Agreed--as far as I can tell there isn't any support for this currently. This JIRA (https://issues.apache.org/jira/browse/HIVE-3727, referenced in http://hortonworks.com/blog/hbase-via-hive-part-1/) seems relevant, but there's no recent work on it, and I imagine the patch included is out of da

RE: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

2014-07-24 Thread java8964
I don't think the HBase-Hive integration is smart enough to utilize the existing index in HBase, but I think it depends on the version you are using. From my experience, there is a lot of room for improvement in the HBase-Hive integration, especially "push down" logic into HBase engi

Re: doing upsert possible?

2014-07-24 Thread Juan Martin Pampliega
Hi Yang. That's correct. You should check out the HBase UDFs in Klout's Brickhouse library https://github.com/klout/brickhouse/tree/master/src/main/java/brickhouse/hbase On Jul 24, 2014 8:07 PM, "Yang" wrote: > if we have a huge table, and every 1 hour only 1% of that has some > updates, it would

doing upsert possible?

2014-07-24 Thread Yang
if we have a huge table, and every 1 hour only 1% of that has some updates, it would be a huge waste to slurp in the whole table through MR job and write out the new table. instead, if we store this table in HBASE, and use the current HBase+Hive integration, as long as we can do upsert, then we ca

RE: python UDF and Avro tables

2014-07-24 Thread java8964
Are you trying to read the Avro file directly in your UDF? If so, that is not the correct way to do it in a UDF. Hive supports Avro files natively. I don't know your UDF requirements, but here is what I would normally do: create the table in Hive using AvroContainerInputFormat: create external tabl

Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

2014-07-24 Thread Yang
kind of found this http://hortonworks.com/blog/hbase-via-hive-part-1/ "From a performance perspective, there are things Hive can do today (ie, not dependent on data types) to take advantage of HBase. There’s also the possibility of an HBase-aware Hive to make use of HBase tables as intermediate

does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

2014-07-24 Thread Yang
if I do a join of a table based on txt file and a table based on HBase, and say the latter is very large, is HIVE smart enough to utilize the HBase table's index to do the join, instead of implementing this as a regular map reduce job, where each table is scanned fully, bucketed on join keys, and t

Re: Reg:Column Statistics with Parquet

2014-07-24 Thread Prasanth Jayachandran
You have to explicitly specify the column list in the analyze command to gather column stats. The following command will only collect basic stats like number of rows, total file size, raw data size, and number of files: analyze table user_table partition(dt='2014-06-01',hour='00') compute statistics; To colle

python UDF and Avro tables

2014-07-24 Thread Kevin Weiler
Hi All, I hope I’m not duplicating a previous question, but I couldn’t find any search functionality for the user list archives. I have written a relatively simple python script that is meant to take a field from a hive query and transform it (just some string processing through a dict) given
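A minimal sketch of the kind of streaming script described above, assuming Hive's TRANSFORM clause passes each row to the script as a tab-separated line on stdin and reads rows back from stdout. The lookup table `REPLACEMENTS`, the function names, and the choice of the first column are hypothetical; the original script is not shown in the thread.

```python
import sys

# Hypothetical lookup table; the real mapping is not shown in the thread.
REPLACEMENTS = {"nyc": "New York", "sf": "San Francisco"}


def transform_line(line, column=0):
    """Replace one tab-separated field using REPLACEMENTS; other fields pass through."""
    fields = line.rstrip("\n").split("\t")
    fields[column] = REPLACEMENTS.get(fields[column], fields[column])
    return "\t".join(fields)


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    # Read rows from Hive line by line and write the transformed rows back.
    for line in stream_in:
        stream_out.write(transform_line(line) + "\n")


# In the script shipped to Hive, call main() here.
```

Such a script is usually registered with `ADD FILE` and invoked through `SELECT TRANSFORM(...) USING 'python <script>' AS (...)`. The script only sees text rows, so it should not need to open the Avro files itself as long as the table is declared so that Hive can read them.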

HIVE 0.12 SUM() returning NULL for decimal values

2014-07-24 Thread Abhishek Gayakwad
I am trying to aggregate one column of decimal type, which is returning me null. If I cast this column to double it returns me some value. following are the steps to recreate this scenario. CREATE TABLE salestemp(sku int, sales decimal); LOAD DATA LOCAL INPATH '0

RE: [HELP]Hive Statistics

2014-07-24 Thread Navdeep Agrawal
Well, the problem didn’t exactly get solved, but I observed that this kind of behavior persists when I partition my table by a date type; otherwise it works. Maybe it's worth filing an issue. Thank you From: Navdeep Agrawal [mailto:navdeep_agra...@symantec.com] Sent: Thursday, July 24, 2014 1:22 PM To: us

Reg:Column Statistics with Parquet

2014-07-24 Thread Sandeep Samudrala
I am trying to enable Column statistics usage with Parquet tables. This is the query I am executing. However on explain, I see that even though *Basic stats: COMPLETE* is seen, *Column stats* is seen as *NONE*. Can someone please explain what else I need to debug/fix this. set hive.compute.query.usi

Re: A question about SessionManager

2014-07-24 Thread Navis류승우
https://issues.apache.org/jira/browse/HIVE-5799 is for that kind of case, but it is not included in a release yet. Thanks, Navis 2014-07-24 20:04 GMT+09:00 Zhanghe (D) : > Hey Guys, > >I'm working with HiveServer2. I know the HiveServer holds a session for > each client, and close it when the cli

Column Statistics with Parquet

2014-07-24 Thread Suma Shivaprasad
I am trying to enable Column statistics usage with Parquet tables. This is the query I am executing. However on explain, I see that even though *Basic stats: COMPLETE* is seen, *Column stats* is seen as *NONE*. Can someone please explain what else I need to debug/fix this. set hive.compute.query.us

A question about SessionManager

2014-07-24 Thread Zhanghe (D)
Hey Guys, I'm working with HiveServer2. I know the HiveServer holds a session for each client, and closes it when the client executes 'CloseSession'. But if the client is forced to terminate, like Ctrl+Z or kill -9, the session in HiveServer will not be closed. Does there exist a problem

Re: Hive shell code exception, urgent help needed

2014-07-24 Thread Sarfraz Ramay
Can anyone please help with this? I followed the advice here http://stackoverflow.com/questions/20390217/mapreduce-job-in-headless-environment-fails-n-times-due-to-am-container-exceptio and added the following properties to mapred-site.xml, but I am still getting the same error.

Fwd: Column Stats with parquet

2014-07-24 Thread Suma Shivaprasad
I am trying to enable Column statistics usage with Parquet tables. This is the query I am executing. However on explain, I see that even though *Basic stats: COMPLETE* is seen, *Column stats* is seen as *NONE*. Can someone please explain what else I need to debug/fix this. set hive.compute.query.us

Re: Hive UDF gives duplicate result regardless of parameters, when nested in a subquery

2014-07-24 Thread 丁桂涛(桂花)
Yeah. After setting hive.cache.expr.evaluation=false, all queries output expected results. And I found that it's related to the getDisplayString function in the UDF. At first the function returns a string regardless of its parameters. And I had to set hive.cache.expr.evaluation = false. But after

[HELP]Hive Statistics

2014-07-24 Thread Navdeep Agrawal
Stuck, need help. I created a small table with multiple partitions, desc (id int, term int) partitioned by id. Whenever I run analyze on any id, I get perfectly good answers. I am unable to figure out the difference each file is making. New table Table Parameters: transient_lastDdl

create table / data type syntax for csv files with comma in the column

2014-07-24 Thread Vidya Sujeet
Hello, I have a CSV file with columns that contain commas within a string enclosed in double quotes ("). ex: column name:*'Issue'* value:*"Other (phone, health club, etc)"* *Question:* What should the data type of 'Issue' be? Or how should I format the table (row format delimited terminated by) so that
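To illustrate why a plain comma delimiter splits such a value, here is a small sketch that uses Python's standard `csv` module to parse the quoted field and re-emit the row tab-delimited before loading. Pre-processing the file this way is only one possible workaround and is not something suggested in the thread; the helper name `csv_to_tsv` and the sample row (apart from the 'Issue' value) are assumptions.

```python
import csv
import io


def csv_to_tsv(text):
    """Parse quoted CSV text and return tab-delimited lines with the quotes removed."""
    rows = csv.reader(io.StringIO(text))
    return "\n".join("\t".join(row) for row in rows)


# Hypothetical two-column row; only the 'Issue' value comes from the question.
sample = '1,"Other (phone, health club, etc)"\n'

# A naive split on commas breaks the quoted value into three pieces.
naive = sample.strip().split(",")

# The csv module honours the quotes and keeps the value as one field.
converted = csv_to_tsv(sample)
```

The converted file could then be loaded into a table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'`, with 'Issue' as a STRING column. Values that themselves contain tabs or newlines would need extra handling.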

Column Stats with parquet

2014-07-24 Thread Suma Shivaprasad
I am trying to enable Column statistics usage with Parquet tables. This is the query I am executing. However on explain, I see that even though *Basic stats: COMPLETE* is seen, *Column stats* is seen as *NONE*. Can someone please explain what else I need to debug/fix this. set hive.compute.query.us