Re: Any way to avoid creating subdirectories by "Insert with union"

2016-02-23 Thread Gopal Vijayaraghavan
> Is there any way to avoid creating sub-directories while running in Tez? > Or is this by design and cannot be changed? Yes, this is by design. The Tez execution of UNION is entirely parallel and the task-ids overlap, so the files created have to have unique names. But the total counts for "Map 1
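A hedged sketch of what this usually looks like on disk and of two commonly used workarounds; the directory layout is illustrative and the settings are standard Hive/Hadoop properties, but neither workaround is confirmed anywhere in this thread.

-- Illustrative layout only: each UNION branch writes into its own numbered subdirectory.
--   /user/hive/warehouse/t3/1/000000_0
--   /user/hive/warehouse/t3/2/000000_0

-- Option 1 (assumption): let readers recurse into the subdirectories instead of removing them.
set mapred.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;

-- Option 2 (assumption): force a final reduce stage so the output lands directly under the table root.
insert overwrite table t3
select u.* from (
  select * from t1
  union all
  select * from t2
) u
distribute by rand();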

Any way to avoid creating subdirectories by "Insert with union"

2016-02-23 Thread mahender bigdata
Hi, The insert with union below will create sub-directories while executing in Tez. set hive.execution.engine=tez; insert overwrite table t3 select * from t1 limit 1 union select * from t2 limit 2 ; Is there any way to avoid creating sub-directories while running in Tez? Or is this by d

Re: a newline in column data ruins Hive

2016-02-23 Thread Nicholas Hakobian
We just had this problem recently with our data. There are actually two things you have to worry about: the reader (which the suggestion above seems to solve) and the intermediate stages (if using MR). We didn't have the issue with the reader since we use Parquet and Avro to store our data, but we ha
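A minimal sketch of the two-sided fix described here, assuming the data can live in a binary format and that the result file-format property applies to your Hive version; both points are assumptions, not confirmed in this thread.

-- Reader side: store the data in a binary format so embedded newlines are not parsed as row delimiters.
-- (table names are hypothetical)
create table t_clean stored as orc as select * from t_raw;

-- Intermediate/result side: write query results as SequenceFile instead of plain text.
set hive.query.result.fileformat=SequenceFile;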

Re: a newline in column data ruins Hive

2016-02-23 Thread Rajit Saha
Hi Mahender, You can try ESCAPED BY '\\', like the sample below: CREATE EXTERNAL TABLE test ( a1 int, b1 string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ESCAPED BY '\\' STORED AS TEXTFILE LOCATION ''; Thanks Rajit

a newline in column data ruins Hive

2016-02-23 Thread mahender bigdata
Hi, We are facing an issue while loading/reading data from a file which has line-delimiter characters like \n as part of the column data. When we try to query the Hive table, data with \n gets split into multiple rows. Is there a way to tell Hive to skip escape characters like \n ( row delimiter o

Combine rows with json string and map

2016-02-23 Thread Buntu Dev
I'm looking for ideas on how to go about merging columns from 2 tables. In one of the tables I have a JSON string column that needs to be added to the map column of the other table. json string: {"type": "fruit", "name":"apple"} map: {'type' -> 'fruit', 'f' -> 'foo', 'b' -> 'bar'} The resulting map fie
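A hedged starting point only, with hypothetical table and column names: pull the JSON fields out with get_json_object and build a map. Merging that with an existing map column generally needs either a UDF (for example Brickhouse's combine) or an explode/re-collect step, neither of which is shown here.

-- Illustrative only: parse the flat JSON into a map (keys assumed known in advance).
select
  map(
    'type', get_json_object(json_col, '$.type'),
    'name', get_json_object(json_col, '$.name')
  ) as json_as_map
from t_json;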

Re: How to set default value for a certain field

2016-02-23 Thread mahender bigdata
Thanks Zack. But that modifies the actual data that has null values, which might cause data issues; correct me if I'm wrong. On 2/23/2016 11:03 AM, Riesland, Zack wrote: “null defined as” is what we use *From:* mahender bigdata [mailto:mahender.bigd...@outlook.com] *Sent:* Tuesday, February

RE: How to set default value for a certain field

2016-02-23 Thread Riesland, Zack
“null defined as” is what we use From: mahender bigdata [mailto:mahender.bigd...@outlook.com] Sent: Tuesday, February 23, 2016 1:26 PM To: user@hive.apache.org Subject: Re: How to set default value for a certain field Any idea on below requirement. On 2/19/2016 2:47 PM, mahender bigdata wrote: Hi
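A minimal sketch of the "null defined as" approach, with hypothetical table and column names; note that it only changes which token represents NULL in the text files, which is what the follow-up above is questioning.

-- Hypothetical DDL: nulls are written/read as the token 'unknown' instead of the default \N.
CREATE TABLE t_defaults (id INT, status STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  NULL DEFINED AS 'unknown'
STORED AS TEXTFILE;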

Re: How to set default value for a certain field

2016-02-23 Thread mahender bigdata
Any ideas on the requirement below? On 2/19/2016 2:47 PM, mahender bigdata wrote: Hi, is there an ideal solution in Hive for specifying default values at the schema level? Currently we are using the *COALESCE* operator to convert null values to a default value; this requires reading the entire table. But it w
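A hedged sketch of the COALESCE approach described here, wrapped in a view so the substitution happens only at read time and the underlying data is not rewritten; view, table, and column names are hypothetical.

-- Substitute a default only when reading, without modifying stored rows.
CREATE VIEW t_with_defaults AS
SELECT id, COALESCE(status, 'unknown') AS status
FROM t;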

RE: hive memory error: GC overhead limit exceeded

2016-02-23 Thread Gary Clark
The problem is that your garbage collector has maxed out. One of the ways I got around this was to reduce the datasets in the query that you're running. Increasing limits is a temporary solution and eventually they will be hit. Thanks, Gazza From: Daniel Lopes [mailto:dan...@bankfacil.com.br] Sent:

hive memory error: GC overhead limit exceeded

2016-02-23 Thread Daniel Lopes
Hi, does anyone know this error? Running on Amazon EMR. 2016-02-19 10:32:34 Starting to launch local task to process map join; maximum memory = 932184064 # # java.lang.OutOfMemoryError: GC overhead limit exceeded # -XX:OnOutOfMemoryError="kill -9 %p kill -9 %p" # Executing /bin/sh -c "kill -9 15759
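Since the OutOfMemoryError is thrown in the local task that builds the map-join hash table, a few commonly used levers are sketched below; whether any of them applies to this EMR job is an assumption, not something established in the thread.

-- Fall back to a shuffle join instead of converting to a map join:
set hive.auto.convert.join=false;
-- Or keep map joins but shrink what qualifies as the "small" table (bytes):
set hive.mapjoin.smalltable.filesize=10000000;
-- Or give the local child JVM more heap, in MB (availability varies by Hive version):
set hive.mapred.local.mem=2048;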

Re: Spark SQL is not returning records for hive bucketed tables on HDP

2016-02-23 Thread @Sanjiv Singh
Yes, it is very strange and quite the opposite of my understanding of Spark SQL on Hive tables. I am facing this issue on the HDP setup, on which compaction is required only once. On the other hand, the Apache setup doesn't require compaction even once. Maybe something got triggered on the metastore after compact

Re: Spark SQL is not returning records for hive bucketed tables on HDP

2016-02-23 Thread @Sanjiv Singh
Try this, hive> create table default.foo(id int) clustered by (id) into 2 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true'); hive> insert into default.foo values(10); scala> sqlContext.table("default.foo").count // Gives 0, which is wrong because data is still in delta files Now run
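A hedged continuation of the repro above: forcing a major compaction so the delta files are rewritten into base files that Spark SQL can read. The statements are standard Hive ACID syntax; whether this resolves the HDP-specific behaviour is exactly what the thread is still debating.

hive> alter table default.foo compact 'major';
hive> show compactions;   -- wait until the compaction for default.foo shows as finished
scala> sqlContext.table("default.foo").count   // expected to return 1 once compaction completes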