Re: locks are held on tables even when no job running

2015-10-23 Thread Divakar Reddy
Your issue might be related to https://issues.apache.org/jira/browse/HIVE-10500. Can you please add this to the custom hive-site.xml and try? Name: datanucleus.connectionPoolingType Value: dbcp On Fri, Oct 23, 2015 at 2:37 PM, Mich Talebzadeh wrote: > Hi Eugene, > > > > The code drops the table
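A minimal sketch of the suggested hive-site.xml entry (only the property name and value come from the reply above; the surrounding file layout is the standard one):

    <property>
      <name>datanucleus.connectionPoolingType</name>
      <value>dbcp</value>
    </property>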

Hive 1.0.0 Error: cannot be cast

2015-10-23 Thread Daniel Lopes
I'm on HIVE 1.0.0 and I got this error: Query ID = hadoop_20151023210202_9d73cf48-62f0-4d47-ae26-f5dfff0a24d9 Total jobs = 3 Execution log at: /var/log/hive/tmp/hadoop/hadoop_20151023210202_9d73cf48-62f0-4d47-ae26-f5dfff0a24d9.log 2015-10-23 09:03:14 Starting to launch local task to process map join;
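The error text itself is truncated, but it surfaces while launching the local map-join task. A hedged workaround sketch, assuming the cast failure happens inside the auto-converted map join (the thread does not confirm this):

    -- Fall back to a regular shuffle join instead of the local map-join task.
    -- hive.auto.convert.join is a standard Hive property; whether it resolves
    -- this particular cast error is an assumption.
    SET hive.auto.convert.join=false;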

RE: locks are held on tables even when no job running

2015-10-23 Thread Mich Talebzadeh
Hi Eugene, The code drops the table if it exists and that is the exclusive lock. Once created, it is populated from another table: use asehadoop; drop table if exists t; create table t ( owner varchar(30) ,object_name varchar(30) ,subobject_name varchar(30)
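A minimal sketch of the pattern being described: the DROP takes the exclusive lock, and the new table is then populated from an existing table (the remaining columns and the source table name below are hypothetical, since the original DDL is truncated):

    use asehadoop;
    drop table if exists t;            -- this is where the exclusive lock is taken
    create table t (
      owner          varchar(30),
      object_name    varchar(30),
      subobject_name varchar(30)
    );
    insert into table t
    select owner, object_name, subobject_name
    from existing_table;               -- hypothetical source table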

Re: Issue with job serialization formats mangling results

2015-10-23 Thread Aaron Wiebe
Right on - that solved it. Thanks Gopal. On Fri, Oct 23, 2015 at 3:31 PM, Gopal Vijayaraghavan wrote: > > >>I've then created ORC and Parquet versions of this same table. The >>behavior remains... select * works, any filter creates horribly >>mangled results. >> >>To replicate - throw this into a

Reading JSON data & org.apache.hadoop.hive.contrib.serde2.JsonSerde

2015-10-23 Thread Sam Joe
Hi, Does *org.apache.hadoop.hive.contrib.serde2.JsonSerde* support reading nested data? Also, could you please point me to a location to download the jar for *org.apache.hadoop.hive.contrib.serde2.JsonSerde*? Appreciate your help! Thanks, Joel
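A minimal sketch of registering and using the contrib serde (the jar path depends on your distribution and is an assumption; the table layout is hypothetical):

    -- The contrib JsonSerde ships in the hive-contrib jar; the path below is an assumption.
    ADD JAR /usr/lib/hive/lib/hive-contrib.jar;

    CREATE TABLE json_events (       -- hypothetical table
      id   INT,
      name STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
    STORED AS TEXTFILE;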

Re: Issue with job serialization formats mangling results

2015-10-23 Thread Gopal Vijayaraghavan
>I've then created ORC and Parquet versions of this same table. The >behavior remains... select * works, any filter creates horribly >mangled results. > >To replicate - throw this into a file: > >{"id":1,"order_id":8,"number":1,"broken":"#\n---\nstuff\nstuff2: >\"stuff3\"\nstuff4: '730'\nstuff5: []

Issue with job serialization formats mangling results

2015-10-23 Thread Aaron Wiebe
Hey folks, I've been working on a rather odd issue for a while now, and I'm going to need a hand here. In one field of a table, I have YAML content (including \n's). Regardless of the storage format (parquet, orc, json using the openx serde), hive will unpack the newlines (even though t
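A minimal reproduction sketch, with field names taken from the sample record quoted in Gopal's reply above (the table name is hypothetical, and the mitigation mentioned in the final comment is an assumption, since the actual fix is truncated in this archive):

    -- A string column whose value contains literal \n characters.
    CREATE TABLE broken_yaml (
      id       INT,
      order_id INT,
      `number` INT,
      broken   STRING                 -- holds YAML text with embedded newlines
    )
    STORED AS ORC;

    -- As described above: select * works, but a filtered query comes back mangled.
    SELECT * FROM broken_yaml;
    SELECT broken FROM broken_yaml WHERE id = 1;

    -- One commonly suggested mitigation (an assumption here, not confirmed above):
    -- SET hive.query.result.fileformat=SequenceFile;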

Re: locks are held on tables even when no job running

2015-10-23 Thread Eugene Koifman
Mich, how were you running/killing the job? Was it ^C of the CLI or something else? (The only time you’d get an Exclusive lock is to drop an object, with DbTxnManager, which looks like what you are using.) The locks will time out, but https://issues.apache.org/jira/browse/HIVE-11317 may be relevant. Fu
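A minimal sketch of the checks implied above (DbTxnManager is the transaction manager Eugene refers to; output will vary by setup):

    -- Print the lock/transaction manager in effect; with the DbTxnManager it should show
    -- org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.
    SET hive.txn.manager;

    -- Inspect the outstanding locks, including the exclusive one taken by the DROP.
    SHOW LOCKS;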

Re: the number of files after merging

2015-10-23 Thread Prasanth Jayachandran
Hi, CONCATENATE uses CombineHiveInputFormat internally to group files for concatenation. Many small files are grouped until the total size reaches the default max split size of 256MB. These files are then merged to form a single file. So if you want to control the number of files you have to bump up t
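A minimal sketch of the knob being described: raising the max split size lets CONCATENATE group more small files into each merged file (the table name and the 512MB value are hypothetical; older setups may read the deprecated mapred.max.split.size name instead):

    -- 256MB is the default mentioned above; 512MB here is just an illustrative value.
    SET mapreduce.input.fileinputformat.split.maxsize=536870912;

    ALTER TABLE my_orc_table CONCATENATE;   -- hypothetical table name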

Re: the column names removed after insert select

2015-10-23 Thread Elliot West
Excellent news. Thanks. On 23 October 2015 at 15:50, Prasanth Jayachandran < pjayachand...@hortonworks.com> wrote: > Hi > > This has been fixed recently > https://issues.apache.org/jira/browse/HIVE-4243 > > This used to be a problem with the way hive writes rows out. The > ObjectInspectors sent o

Re: the column names removed after insert select

2015-10-23 Thread Prasanth Jayachandran
Hi, this has been fixed recently: https://issues.apache.org/jira/browse/HIVE-4243 This used to be a problem with the way Hive writes rows out. The ObjectInspectors sent out by Hive’s filesink operator contain internal column names and not the names of the destination table. From the record rea

Re: Hive on Spark

2015-10-23 Thread Xuefu Zhang
You need to increase spark.yarn.executor.memoryOverhead. It has nothing to do with the storage layer. --Xuefu On Fri, Oct 23, 2015 at 4:49 AM, Jone Zhang wrote: > I get the error every time I run a query on a large data set. I > think using MEMORY_AND_DISK can avoid this problem under the li
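A minimal sketch of the suggested change (the 1024MB figure is illustrative; size it to your containers):

    -- Increase the off-heap memory YARN reserves per executor container (value in MB).
    SET spark.yarn.executor.memoryOverhead=1024;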

Re: Hive on Spark

2015-10-23 Thread Jone Zhang
I get the error every time I run a query on a large data set. I think using MEMORY_AND_DISK can avoid this problem under limited resources. "15/10/23 17:37:13 Reporter WARN org.apache.spark.deploy.yarn.YarnAllocator>> Container killed by YARN for exceeding memory limits. 7.6 GB of 7.5 GB

Re: Hive on Spark

2015-10-23 Thread Xuefu Zhang
Yeah, for that you cannot really cache anything through Hive on Spark. Could you detail more what you want to achieve? When needed, Hive on Spark uses memory+disk for the storage level. On Fri, Oct 23, 2015 at 4:29 AM, Jone Zhang wrote: > 1. But there's no way to set the Storage Level through a properties f

Re: Hive on Spark

2015-10-23 Thread Jone Zhang
1. But there's no way to set the Storage Level through a properties file in Spark; Spark provides only the "def persist(newLevel: StorageLevel)" API... 2015-10-23 19:03 GMT+08:00 Xuefu Zhang : > Quick answers: > 1. You can pretty much set any Spark configuration in Hive using the set > command. > 2. No, you have t

Re: Hive on Spark

2015-10-23 Thread Xuefu Zhang
Quick answers: 1. You can pretty much set any Spark configuration in Hive using the set command. 2. No, you have to make the call. On Thu, Oct 22, 2015 at 10:32 PM, Jone Zhang wrote: > 1. How can I set the Storage Level when I use Hive on Spark? > 2. Does Spark have any intention of dynamically determine
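A minimal sketch of point 1, setting Spark configuration from within the Hive session (the values shown are illustrative):

    SET hive.execution.engine=spark;
    SET spark.executor.memory=4g;       -- illustrative values
    SET spark.executor.cores=2;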

Re: Need suggestions on processing JSON junk (e.g., invalid double quotes) data using HIVE

2015-10-23 Thread Artem Ervits
Hive 0.13 and up has a JSON serde built in, no need to register another serde. Flume has a Hive streaming sink so you could stream directly to Hive as well with Flume 1.6. The JSON serde is from HCatalog, BTW. Are you sure the text field doesn't have the quote? User-entered data may be malformed. Your oth
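A minimal sketch using the built-in HCatalog serde mentioned above (the table layout is hypothetical; whether a separate ADD JAR is still needed depends on the distribution):

    CREATE TABLE tweets_json (          -- hypothetical table
      id   BIGINT,
      text STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    STORED AS TEXTFILE;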

Re: the column names removed after insert select

2015-10-23 Thread Elliot West
I was seeing something similar in the initial ORC delta file when inserting rows into a newly created ACID table. Subsequent deltas had the correct column names. On 23 October 2015 at 08:25, patcharee wrote: > Hi > > I inserted a table from select (insert into table newtable select date, > hh,

RE: locks are held on tables even when no job running

2015-10-23 Thread Mich Talebzadeh
Hi Furcy, Thanks for the info. I ran the same job twice, killing it the first time and starting again. Actually your point about the 5 min duration seems to be correct. My process basically creates a new Hive table with two additional columns and populates it from an existing table, hence the lock

Re: locks are held on tables even when no job running

2015-10-23 Thread Furcy Pin
Hi Mich, I believe the duration of locks is defined by hive.txn.timeout, which is 5 min by default. https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties Retry your SHOW LOCKS command and check that the Last HeartBeat is not changing. If it is, it means your query is still act
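A minimal sketch of the checks described above (the default of 300 seconds corresponds to the 5 minutes mentioned):

    -- Print the current timeout; the value is in seconds, default 300.
    SET hive.txn.timeout;

    -- Run this twice and compare the Last Heartbeat column: if it keeps advancing,
    -- the query holding the lock is still active.
    SHOW LOCKS;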

locks are held on tables even when no job running

2015-10-23 Thread Mich Talebzadeh
Hi, What is the duration of locks held in Hive? I have got the following locks in Hive, although I have already killed the jobs! Lock ID  Database  Table  Partition  State  Type  Transaction ID  Last Heartbeat  Acquired At  User  Hostname  14031  asehadoop

the column names removed after insert select

2015-10-23 Thread patcharee
Hi, I inserted into a table from a select (insert into table newtable select date, hh, x, y from oldtable). After the insert the column names of the table have been removed; see the output below when I use hive --orcfiledump - Type: struct<_col0:int,_col1:int,_col2:int,_col3:int> while it is suppose
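A minimal sketch of the inspection step described (the warehouse path is hypothetical); before the fix referenced in the replies above, the dumped schema shows internal names such as _col0 instead of date, hh, x, y:

    hive --orcfiledump /user/hive/warehouse/newtable/000000_0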