Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Rajesh Balamohan
Congratulations Simhadri. :) ~Rajesh.B On Fri, Apr 19, 2024 at 2:02 AM Aman Sinha wrote: > Congrats Simhadri ! > > On Thu, Apr 18, 2024 at 12:25 PM Naveen Gangam > wrote: > >> Congrats Simhadri. Looking forward to many more contributions in the >> future. >> >> On Thu, Apr 18, 2024 at 12:25 PM

Re: Tez hook for "INSERT INTO TABLE PARTITION(...)" query

2023-01-03 Thread Rajesh Balamohan
If it is at the end of creating the partition, check whether "HMS::MetaStoreEventListener::onAddPartition" can be of help. This may need a custom listener to be added on the HMS side. https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/

Re: TPCDS query degrade with hive-3.1.2 because of wrong estimation for reducers

2022-10-02 Thread Rajesh Balamohan
Based on the plan, the filtered output in map-1 is mis-estimated, and the groupby operators also have large mis-estimates. This causes the number of reducers to be estimated as "4", which is too low for this query. Due to the partition factor of tez, it ends up with 8 reducer slots at runtime for hive 3.x.
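The jump from 4 estimated reducers to 8 runtime slots can be sketched as a simple scaling step. This is only an illustration in Python: the function name is hypothetical, and the factor of 2 is an assumption inferred from the numbers in the message (presumably the effect of hive.tez.max.partition.factor, though the message does not name the property).

```python
def reducer_slots(estimated_reducers: int, partition_factor: float) -> int:
    """Scale the planner's reducer estimate by a partition factor."""
    return int(estimated_reducers * partition_factor)

# Values from the message: an estimate of 4 becomes 8 slots at runtime,
# which is consistent with a factor of 2 (assumed here, not confirmed).
print(reducer_slots(4, 2.0))
```

The point of the sketch is that the factor only multiplies the estimate, so a badly low estimate stays low after scaling.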

Re: engine for hive3

2022-04-12 Thread Rajesh Balamohan
Defaults to tez. MR is deprecated and hive on spark isn't under active dev. On Wed, Apr 13, 2022 at 9:02 AM linuxspace wrote: > for hive3, what's the suggested engine? tez, spark or the default mr? > > Thanks. >

Re: Too many S3 API calls for simple queries like select and create external table

2022-02-21 Thread Rajesh Balamohan
If you are using parquet format, HIVE-25827 would be causing additional calls to s3 as the footer is read at least twice. Add to this the 9+ list_status calls being made for split gen. ~Rajesh.B On Mon, Feb 21, 2022 at 10:16 AM Sungwoo Park wr

Re: Question regarding lock manager

2021-09-06 Thread Rajesh Balamohan
For the specific code you mentioned, check if you have "hive.privilege.synchronizer" enabled or not. If so, disable it explicitly. PrivSync is needed for populating information_schema. ~Rajesh.B On Mon, Sep 6, 2021 at 8:04 PM Antoine DUBOIS wrote: > Hello all > After some digging and remote jav

Re: Running Hive on Spark

2019-03-13 Thread Rajesh Balamohan
nd "what does it mean if I do that" > > Best regards > Daniel > > On Tue 12 Mar 2019, 02:21 Rajesh Balamohan, wrote: > >> Not sure why you are using SparkThriftServer. OOTB HiveServer2 would be >> good enough for this. >> >> Is there any specific

Re: Running Hive on Spark

2019-03-11 Thread Rajesh Balamohan
Not sure why you are using SparkThriftServer. OOTB HiveServer2 would be good enough for this. Is there any specific reason for moving from tez to spark as execution engine? ~Rajesh.B On Mon, Mar 11, 2019 at 9:45 PM Daniel Mateus Pires wrote: > Hi there, > > I would like to run Hive using Spark

Re: hive-testbench - Hive + TEZ TPC-DS job gets stuck

2017-09-25 Thread Rajesh Balamohan
'Pending' count of 4 for a long time suggests that you may have to check the cluster capacity. ~Rajesh.B On Sun, Sep 24, 2017 at 7:41 AM, Krishnanand Khambadkone < kkhambadk...@yahoo.com> wrote: > Hi, I am trying to run a small 4GB TPC-DS test using the hortonworks > hive-testbench framework. I

Re: Out of Memory while generating ORC Splits

2017-09-13 Thread Rajesh Balamohan
ed by setting this parameter to > true is less than the number of mappers generated by setting the > split.strategy=BI. Therefore, I am hoping that using this parameter along > with HYBRID is better than using BI split strategy. Can you please comment > on this? > > Thanks,

Re: Out of Memory while generating ORC Splits

2017-09-13 Thread Rajesh Balamohan
With "HYBRID" can you try with "hive.orc.cache.use.soft.references=true"? That should help in preventing OOM with Hybrid strategy. ~Rajesh.B On Wed, Sep 13, 2017 at 2:54 PM, Jay wrote: > Hi All, > > I am running a simple select query as below > > select distinct vehicle_no from > rmd.gets_dw_e

Re: Fail to load table via Tez

2017-07-07 Thread Rajesh Balamohan
You can run "yarn logs -applicationId application_1499426430661_0113 > application_1499426430661_0113.log" to get the app logs. I would suggest trying "hive --hiveconf tez.grouping.max-size=134217728 --hiveconf tez.grouping.min-size=134217728" for running your hive query. You may
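For reference, the value 134217728 used for both grouping settings is 128 MiB expressed in bytes. A minimal Python sketch of the arithmetic follows; the helper name `mib_to_bytes` is hypothetical and only illustrates the unit conversion, not anything Tez does internally.

```python
def mib_to_bytes(mib: int) -> int:
    """Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes)."""
    return mib * 1024 * 1024

grouping_size = mib_to_bytes(128)

# Build the --hiveconf flags quoted in the message, with min and max
# set to the same value.
flags = (
    f"--hiveconf tez.grouping.max-size={grouping_size} "
    f"--hiveconf tez.grouping.min-size={grouping_size}"
)
print(flags)
```

Changing the argument to `mib_to_bytes` gives the byte value for a different grouping size.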

Re: u...@tez.apache.org

2016-12-25 Thread Rajesh Balamohan
ive-llap-functionality.html. > > > 2016. 12. 26., 오후 2:34, Rajesh Balamohan 작성: > > Much easier option is to make use of https://github.com/ > t3rmin4t0r/tez-autobuild (edit/set args in slider-gen.sh). > > ~Rajesh.B > > On Mon, Dec 26, 2016 at 11:02 AM, Rajesh Balamohan &g

Re: u...@tez.apache.org

2016-12-25 Thread Rajesh Balamohan
Much easier option is to make use of https://github.com/t3rmin4t0r/tez-autobuild (edit/set args in slider-gen.sh). ~Rajesh.B On Mon, Dec 26, 2016 at 11:02 AM, Rajesh Balamohan wrote: > Here is an example: > > hive --service llap --instances 1 --args "-XX:+UseG1GC > -agentl

Re: u...@tez.apache.org

2016-12-25 Thread Rajesh Balamohan
Here is an example: hive --service llap --instances 1 --args "-XX:+UseG1GC -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000" --cache 48000m --executors 8 --iothreads 8 --size 18m --xmx 128000m --loglevel INFO --javaHome /usr/jdk64/jdk1.8.0_77/ This would generate a "run.sh"

Re: tez + union stmt

2016-12-24 Thread Rajesh Balamohan
Are there any exceptions in hive.log? Is the tmp_pv_v4* table part of the select query? Assuming you are creating the table in staging.db, it would have created the table location as staging.db/foo (as you have not specified the location). Adding user@hive.apache.org as this is hive related. ~Raje

Re: [ANNOUNCE] New Hive Committer - Rajesh Balamohan

2016-12-13 Thread Rajesh Balamohan
s Rajesh! :) >> >> On Tue, Dec 13, 2016 at 9:36 PM, Pengcheng Xiong >> wrote: >> >>> Congrats Rajesh! :) >>> >>> On Tue, Dec 13, 2016 at 6:51 PM, Prasanth Jayachandran < >>> prasan...@apache.org >>> > wrote: >>> >

Re: Trace Key-Value pairs

2016-12-04 Thread Rajesh Balamohan
Hi Robert, Tez deals with bytes and does not understand if the data is coming from Hive/Pig/Cascading etc. So in case you print the content from Hive, you would get mostly binary data. For hive, the key would be org.apache.hadoop.hive.ql.io.HiveKey, and the value would be org.apache.hadoop.io.BytesWritable. Printing

Re: Some Hive on Tez queries don't finish

2016-11-28 Thread Rajesh Balamohan
Are there any exceptions seen in the app logs (you can ignore the Interrupted exceptions in the logs as you killed the job)? It would be helpful if you can share the app logs. ~Rajesh.B On Mon, Nov 28, 2016 at 2:53 PM, Premal Shah wrote: > Hi, > We've been running Hive 2.0.1 on Tez 0.8.4 fo

Re: msck repair table and hive v2.1.0

2016-07-14 Thread Rajesh Balamohan
Hi Stephen, Can you try turning off the multi-threaded approach by setting "hive.mv.files.thread=0"? You mentioned that your tables are in s3, but the external table created was pointing to HDFS. Was that intentional? ~Rajesh.B On Fri, Jul 15, 2016 at 6:58 AM, Stephen Sprague wrote: > i

Re: How the actual "sample data" are implemented when using tez reduce auto-parallelism

2016-02-28 Thread Rajesh Balamohan
"tez.shuffle-vertex-manager.desired-task-input-size" - Determines the desired input size per reduce task. Default is around 100 MB. "tez.shuffle-vertex-manager.min-task-parallelism" - Min task parallelism that ShuffleVertexManager should honor. I.e, if the client has set it as 100,

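As a rough illustration of how the two settings in the message above interact, the Python sketch below estimates a reducer count from the total shuffle output size. It is a deliberate simplification, not the actual ShuffleVertexManager code: the function name, the clamping against the configured parallelism, and the sample sizes are all assumptions made for illustration.

```python
import math

def estimate_reduce_tasks(total_output_bytes: int,
                          desired_task_input_bytes: int,
                          min_task_parallelism: int,
                          configured_parallelism: int) -> int:
    """Simplified estimate of reduce-task parallelism (illustrative only)."""
    # Enough tasks so that each one reads roughly the desired input size.
    desired = math.ceil(total_output_bytes / desired_task_input_bytes)
    # Never go below the minimum the client asked for.
    desired = max(desired, min_task_parallelism)
    # Auto-parallelism only reduces parallelism, so cap at the configured value.
    return min(desired, configured_parallelism)

MIB = 1024 * 1024
# 1 GiB of shuffle output, about 100 MiB desired per task, 100 reducers configured.
print(estimate_reduce_tasks(1024 * MIB, 100 * MIB, 1, 100))
```

Raising the minimum parallelism above the computed value makes the minimum win, and a small configured parallelism caps the result.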
Re: Hive on TEZ fails starting

2016-01-06 Thread Rajesh Balamohan
m > > > > NOTE: The information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this > message shall not be understood as given or en

Re: Hive on TEZ fails starting

2016-01-05 Thread Rajesh Balamohan
ion in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this > message shall not be understood as given or endorsed by Peridale Technology > L

Re: Hive on TEZ fails starting

2016-01-04 Thread Rajesh Balamohan
estroy it immediately. Any information in this > message shall not be understood as given or endorsed by Peridale Technology > Ltd, its subsidiaries or their employees, unless expressly so stated. It is > the responsibility of the recipient to ensure that this email is virus > free,

Re: Hive on TEZ fails starting

2016-01-04 Thread Rajesh Balamohan
Can you try removing double-quotes for "tez.lib.uris" in tez-site.xml (i.e just use hdfs://rhes564:9000/apps/tez-0.7.1-SNAPSHOT/tez-0.7.1-SNAPSHOT.tar.gz)? ~Rajesh.B On Tue, Jan 5, 2016 at 5:30 AM, Mich Talebzadeh wrote: > Hi, > > > > Trying to run Hive on TEZ for the first time. Getting the

Re: config recommendations to boost performance

2015-02-25 Thread Rajesh Balamohan
>> A query like "select name,count(id) from table where date='2015-01-01' or date='2015-01-02' group by (name)" takes almost forever and needs to be cancelled after ~30min. >> It should have ideally scanned only the 2 partitions. Do you see any container launches after which you had to kill the jo

Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho

2014-06-22 Thread Rajesh Balamohan
Congratulations Gopal and Szehon On Mon, Jun 23, 2014 at 9:12 AM, Carl Steinbach wrote: > The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho > committers on the Apache Hive Project. > > Please join me in congratulating Gopal and Szehon! > > Thanks. > > - Carl > -- ~Ra

Re: Hive 0.13 Metastore => MySQL BoneCP issue

2014-06-11 Thread Rajesh Balamohan
Hi Soam, Can you please provide the value specified for "datanucleus.connectionPool.maxIdle"? ~Rajesh.B On Wed, Jun 11, 2014 at 2:26 AM, Soam Acharya wrote: > Hi Vaibhav, > > good question. We're using 0.8.0 RELEASE. Would 0.7.1 be preferable > instead? > > Thanks! > > Soam > > > On Tue, Jun

Re: [ANNOUNCE] New Hive Committers - Prasanth J and Vaibhav Gumashta

2014-04-25 Thread Rajesh Balamohan
Congrats folks. On Apr 25, 2014 8:52 AM, "Sushanth Sowmyan" wrote: > Congrats, guys! :) > > On Fri, Apr 25, 2014 at 12:33 AM, Lefty Leverenz > wrote: > > Congratulations! > > > > -- Lefty > > > > > > On Fri, Apr 25, 2014 at 12:10 AM, Hari Subramaniyan < > > hsubramani...@hortonworks.com> wrote:

Re: Vectorizied execution on RCFile

2014-01-10 Thread Rajesh Balamohan
putformat from HIVE-4483 with RC File. It is much less than for > vectorized query on ORC. > > > > Eric > > > > *From:* Rajesh Balamohan [mailto:rajesh.balamo...@gmail.com] > *Sent:* Wednesday, January 8, 2014 6:47 PM > *To:* user@hive.apache.org > *Subject:* Vecto

Vectorizied execution on RCFile

2014-01-08 Thread Rajesh Balamohan
Hi All, Vectorization with ORCFile provides amazing performance. Does vectorization work with RCFile as well? As per explain plan of Hive 0.13 (snapshot), it does not use vectorization with RCFile. Any pointers would be appreciated. -- ~Rajesh.B

Re: Hive skewed tables

2013-11-14 Thread Rajesh Balamohan
o worry about which > data is skewed and let the framework handle it. > > > > On Thu, Nov 14, 2013 at 11:16 AM, Rajesh Balamohan < > rajesh.balamo...@gmail.com> wrote: > >> Thanks Nitin. I have only one partition in this table for testing. I >> thought wit

Re: Hive skewed tables

2013-11-13 Thread Rajesh Balamohan
for your query it will look at all partitions. > > The setting you have kept is only applicable to join queries as it clearly > says skewjoin. Non join queries it does not have an affect. > > > Thanks, > Nitin > > > > > On Thu, Nov 14, 2013 at 6:35 AM, Rajesh Ba

Hive skewed tables

2013-11-13 Thread Rajesh Balamohan
Hi All, I have the following skewed table "addresses_1" select id, count(*) c from addresses_1 group by id order by c desc limit 10; 1426246531554806 198477395958492 102641838220181 138947865211331 156483436193429 96411677179771 210082076168033 800174765152421 1391

Re: Hive 12 with Hadoop 2.x with ORC

2013-10-22 Thread Rajesh Balamohan
. (but i am certain if this > is a protobuf version issue). > > > On Tue, Oct 22, 2013 at 6:53 AM, Rajesh Balamohan > wrote: > > Hi All, > > > > When running Hive 12 with Hadoop 2.x with ORC, I get the following error > > while converting a table with text f

Hive 12 with Hadoop 2.x with ORC

2013-10-22 Thread Rajesh Balamohan
Hi All, When running Hive 12 with Hadoop 2.x with ORC, I get the following error while converting a table with text file to ORC format table. Any help will be greatly appreciated 2013-10-22 06:50:49,563 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeEx

Re: Impala Query problem

2013-10-22 Thread Rajesh Balamohan
Can you check whether you have connectivity to the port for meta store? On Oct 22, 2013 1:44 PM, "Garg, Rinku" wrote: > Hi All, > > We have installed Cloudera hadoop-2.0.0-mr1-cdh4.2.0 with > hive-0.10.0-cdh4.2.0. Both are working as desired. We can run any query > on hive shel

Re: Can we reduce/fix number of mappers in Hive

2013-10-09 Thread Rajesh Balamohan
Did you try adjusting the fileinputformat min and max size parameters? On Oct 9, 2013 5:51 PM, "Garg, Rinku" wrote: > Hi All > > We did a successful setup of hadoop-0.20.203.0 and hive-0.7.1. We have > the following query: > > Is there any option in Hive where mappers can be redu

Re: only one mapper

2013-08-21 Thread Rajesh Balamohan
Good to hear that. On Thu, Aug 22, 2013 at 9:02 AM, 闫昆 wrote: > thanks all i move lzo index to hive directory is work fine . > thanks > > > 2013/8/22 Rajesh Balamohan > >> Create the LZO index after moving the file to hive directory (i.e after >> executing your

Re: only one mapper

2013-08-21 Thread Rajesh Balamohan
Create the LZO index after moving the file to hive directory (i.e after executing your LOAD DATA* statement). The index file is needed only during job execution, and if it's not present in the same directory, the large file will not be split. On Thu, Aug 22, 2013 at 7:11 AM, 闫昆 wrote: > In hive i u

Re: HBase --> Hive / HCatalog --> PIG

2013-07-10 Thread Rajesh Balamohan
e case on a secure cluster. > > -Thiruvel > > From: Rajesh Balamohan > Reply-To: "user@hive.apache.org" > Date: Wednesday, July 10, 2013 5:30 PM > To: "user@hive.apache.org" > Subject: HBase --> Hive / HCatalog --> PIG > > Hi All, > >

HBase --> Hive / HCatalog --> PIG

2013-07-10 Thread Rajesh Balamohan
Hi All, Has anyone tried out the following use case in a security-enabled hadoop cluster? 1. Create a table in HBase 2. Create a table in Hive pointing to the HBase table created in #1 3. Since HCatalog uses hive metastore, this table will be visible to HCatalog as well. 4. Try to access hive table

RCFile performance

2013-02-04 Thread Rajesh Balamohan
Hi Experts, I have a large file with 300+ columns. In order to query only a few rows efficiently, I am using the RCFile format in Hive. I have tried setting the RCFile rowgroup size from the default size up to 32 MB. ex: set hive.io.rcfile.record.buffer.size = 134217728; However, I do not see major change