Re: Creating hive external table gives GC pool 'PS MarkSweep' had collection(s)

2016-02-28 Thread Margus Roo
Hi, can someone confirm that Hive checks the files in the destination directory before creating an external table? At the moment, in Hive 1.2.1, an end user can easily kill the whole Hive server by creating an external table that points to a directory containing loads of files. Margus (margusja) Roo http://margus.r

Re: ORC file split calculation problems

2016-02-28 Thread Prasanth Jayachandran
Hi Patrick, please find answers inline. On Feb 26, 2016, at 9:36 AM, Patrick Duin <patd...@gmail.com> wrote: Hi Prasanth. Thanks for the quick reply! The logs don't show much more of the stacktrace, I'm afraid: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.orc.Or

Re: SARG predicate is ignored when query ORC table

2016-02-28 Thread Prasanth Jayachandran
Hi, please find answers inline. On Feb 28, 2016, at 2:50 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote: Hi Jessica, Interesting. The ORC files are laid out in stripes that are specified by orc.stripe.size (default 64MB). Within each stripe you have row groups of 10K rows that

[BEST PRACTICES]: Registering Hbase table as hive external table

2016-02-28 Thread Divya Gehlot
Hi, has anyone worked on registering HBase tables as Hive external tables? I would like to know the best practices as well as the pros and cons. I would really appreciate it if you could refer me to good blogs, study materials, etc. If anybody has hands-on or production experience, could you please share your tips? Than

Re: How the actual "sample data" are implemented when using tez reduce auto-parallelism

2016-02-28 Thread Rajesh Balamohan
"tez.shuffle-vertex-manager.desired-task-input-size" - Determines the amount of desired task input size per reduce task. Default is around 100 MB. "tez.shuffle-vertex-manager.min-task-parallelism" - Min task parallelism that ShuffleVertexManager should honor. I.e, if the client has set it as 100,

[Error] : while registering Hbase table with hive

2016-02-28 Thread Divya Gehlot
Hi, I am trying to register an HBase table with Hive and am getting the following error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException E

How to Query running in background in tez

2016-02-28 Thread mahender bigdata
Hi, I have two questions regarding Hive queries. 1. Is there a way to know which Hive query is running in the background by application ID? I would also like to know the location of the log while a Hive query runs in Tez mode. 2. If I have a 20-node cluster and I submit a query, the query takes ent

Re: Running hive queries in different queue

2016-02-28 Thread Rajit Saha
Thanks a lot, Sathi. I also found that if the Hive execution engine is MapReduce, set mapreduce.job.queuename=; works. If the Hive execution engine is Tez, we need to do set tez.queue.name=; Cheers, Rajit Saha Principal DevOps Engineer | BigData Lending
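
The engine-dependent setting described in the message above can be sketched as a small Python helper. This is only an illustration: the function name `queue_set_statement` and the queue name in the usage line are hypothetical, and the original message leaves the queue value blank. Only the two property names come from the thread.

```python
# Property names are taken from the message; everything else is illustrative.
QUEUE_PROPERTY = {
    "mr": "mapreduce.job.queuename",   # Hive on MapReduce
    "tez": "tez.queue.name",           # Hive on Tez
}


def queue_set_statement(engine: str, queue: str) -> str:
    """Build the Hive `set` statement that selects a queue for the session.

    `engine` is the value of hive.execution.engine ("mr" or "tez").
    `queue` is a caller-supplied queue name (hypothetical in this sketch).
    """
    key = engine.strip().lower()
    if key not in QUEUE_PROPERTY:
        raise ValueError(f"unsupported execution engine: {engine!r}")
    if not queue:
        raise ValueError("queue name must not be empty")
    return f"set {QUEUE_PROPERTY[key]}={queue};"


if __name__ == "__main__":
    # "etl" is a made-up queue name for demonstration only.
    print(queue_set_statement("tez", "etl"))
```

The resulting statement would be issued at the start of the Hive session, before the query that should run in that queue.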

Re: SARG predicate is ignored when query ORC table

2016-02-28 Thread Mich Talebzadeh
Hi Jessica, Interesting. The ORC files are laid out in stripes that are specified by *orc.stripe.size* (default 64MB). Within each stripe you have row groups of 10K rows that keep statistics for both data and index. Your query should perform a SARG pushdown that limits which rows are required for
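
The row-group pruning idea in the message above can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the ORC reader itself: the `RowGroupStats` class and the `build_stats` and `row_groups_to_read` functions are hypothetical, and only min/max statistics for one integer column and an equality predicate are modelled. The figure of 10K rows per row group comes from the message.

```python
from dataclasses import dataclass
from typing import List

ROWS_PER_GROUP = 10_000  # row-group size quoted in the message


@dataclass
class RowGroupStats:
    """Min/max of one column within one row group (hypothetical model)."""
    minimum: int
    maximum: int


def build_stats(values: List[int]) -> List[RowGroupStats]:
    """Split column values into row groups and record min/max for each."""
    stats = []
    for start in range(0, len(values), ROWS_PER_GROUP):
        chunk = values[start:start + ROWS_PER_GROUP]
        stats.append(RowGroupStats(min(chunk), max(chunk)))
    return stats


def row_groups_to_read(stats: List[RowGroupStats], wanted: int) -> List[int]:
    """Return indexes of row groups whose [min, max] range may contain `wanted`.

    Row groups outside that range can be skipped without reading their rows.
    """
    return [i for i, s in enumerate(stats) if s.minimum <= wanted <= s.maximum]


if __name__ == "__main__":
    column = list(range(35_000))               # sorted data: 4 row groups
    stats = build_stats(column)
    print(row_groups_to_read(stats, 12_345))   # only the second group qualifies
```

With sorted data the min/max ranges do not overlap, so a single row group is selected. With unsorted data the ranges overlap and fewer row groups can be skipped.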