Questions about HIVE-20508

2019-09-19 Thread Julien Phalip
Hi, I'm interested in a new config property that was added as part of HIVE-20508, and had a few questions: 1) The update was merged into the master branch, however i

Delegation tokens for HDFS

2019-09-20 Thread Julien Phalip
Hi, My understanding is that the most common (perhaps the only?) way to let users run Hive queries on datasets stored in HDFS is to configure Hive as a proxy user in the NameNodes' config. I'm wondering if, instead of using proxy user privileges, a Hive client could be configured to first collect

Re: Delegation tokens for HDFS

2019-09-29 Thread Julien Phalip
quires >giving the 'hive' user proxy privileges. > > If you aren't using Hive Server 2, the user acquires tokens before the > query gets submitted to Yarn. > > There are trade offs in each of the models. > > .. Owen > > On Fri, Sep 20, 2019 at 9:37 AM

Custom OutputCommitter not called by Tez

2022-04-26 Thread Julien Phalip
Hi, I'm working on a custom storage handler. My custom output committer class gets called normally when using the "mr" engine. However, it seems to be entirely ignored when using the "tez" engine. I'm setting the JobConf's "mapred.output.committer.class" key to my fully-qualified output committer

RE: Re: Custom OutputCommitter not called by Tez

2022-04-27 Thread Julien Phalip
https://issues.apache.org/jira/browse/HIVE-25208 > > I hope this helps, > Peter > > > On 2022. Apr 27., at 1:46, Julien Phalip wrote: > > > > Hi, > > > > I'm working on a custom storage handle

Using IntelliJ debugger with Tez

2022-04-27 Thread Julien Phalip
Hi, I'm able to successfully use the IntelliJ debugger and set breakpoints with Hive while using the MapReduce engine by running this command: HADOOP_OPTS="-Djava.library.path=/usr/local/hadoop/lib -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=7070" hive --service hiveserver2 Howev

RE: Re: Custom OutputCommitter not called by Tez

2022-04-29 Thread Julien Phalip
https://issues.apache.org/jira/browse/TEZ-4279 > > > > > > Later we ended up with this final solution: https://issues.apache.org/jira/browse/HIVE-25208

Detecting write mode (append, overwrite) in custom storage handler

2022-04-29 Thread Julien Phalip
Hi, I'm working on a custom storage handler and am wondering if there's a way to detect the appropriate write mode for the output table. For example, an "INSERT" statement is expected to append rows to the table, whereas an "INSERT OVERWRITE" statement should first clear the table before adding ne
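The two semantics the question contrasts can be pinned down with a small, self-contained sketch. It assumes a hypothetical in-memory table and a boolean `overwrite` flag standing in for whatever signal the storage handler actually receives (the later messages in this thread look at `DefaultHiveMetaHook` for that); `SketchTable` and `commitInsert` are illustrative names, not Hive APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a custom storage handler's output table.
// Not a Hive API: it only illustrates append vs. overwrite semantics.
class SketchTable {
    private final List<String> rows = new ArrayList<>();

    // Called once the staged rows of an INSERT are ready to be committed.
    // overwrite == true models INSERT OVERWRITE; false models plain INSERT.
    void commitInsert(List<String> stagedRows, boolean overwrite) {
        if (overwrite) {
            rows.clear(); // INSERT OVERWRITE: drop the existing rows first
        }
        rows.addAll(stagedRows); // both modes then add the new rows
    }

    // Returns a copy so callers cannot mutate the table directly.
    List<String> rows() {
        return new ArrayList<>(rows);
    }
}
```

Two plain inserts of `a, b` and `c` leave `[a, b, c]`; a subsequent overwrite with `d` leaves only `[d]`. Doing the clear at commit time rather than at the start of the job means a failed overwrite leaves the old rows in place.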

RE: Detecting write mode (append, overwrite) in custom storage handler

2022-04-29 Thread Julien Phalip
r/src/java/org/apache/hadoop/hive/druid/DruidStorageHandler.java#L806> uses that method, so I'll have to dive into that a bit more. Thanks, Julien On 2022/04/30 00:44:44 Julien Phalip wrote: > Hi, > > I'm working on a custom storage handler and am wondering if there's a way

RE: RE: Re: Custom OutputCommitter not called by Tez

2022-04-30 Thread Julien Phalip
java#L900-L912>), whereas the former should theoretically be called if any error occurs during the job (i.e. a continuously failing task). One of my goals would be to clean up the temporary files created by the tasks if the job fails. What would be a good hook for that? Thank you, Julien On 2022/
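One way to approach the cleanup question is a job-level failure hook that deletes the work directory the tasks wrote into. The sketch below shows only the deletion step, using `java.nio.file` on a local path; which hook actually fires on failure under each engine is what the thread is asking, so `WorkDirCleanup` and `deleteRecursively` are hypothetical names, and on HDFS one would go through Hadoop's `FileSystem` API instead of `java.nio.file`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop's or Hive's API: a job-failure
// hook could delegate to it to remove the per-job temporary work directory.
final class WorkDirCleanup {
    private WorkDirCleanup() {}

    // Recursively deletes workDir. Returns false if it did not exist.
    static boolean deleteRecursively(Path workDir) throws IOException {
        if (!Files.exists(workDir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(workDir)) {
            // Reverse order puts every descendant before its parent directory,
            // so each directory is already empty when it is deleted.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }
}
```

Because the helper returns `false` for a missing directory instead of throwing, it is safe to call from both a failure hook and a later retry without coordinating which one ran first.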

RE: RE: Detecting write mode (append, overwrite) in custom storage handler

2022-04-30 Thread Julien Phalip
I realize this is in fact quite related to another thread that I recently started: https://lists.apache.org/thread/s0pzmgmq6trdjtxc50qwpww2dlzxql9b So this discussion could continue there. On 2022/04/30 04:08:27 Julien Phalip wrote: > I've noticed that the DefaultHiveMetaHook

RE: RE: Re: Custom OutputCommitter not called by Tez

2022-04-30 Thread Julien Phalip
other instance would load it up on the other end to perform the commit. Julien On 2022/04/29 21:06:49 Julien Phalip wrote: > Hi Peter, > > Looking at https://issues.apache.org/jira/browse/TEZ-4279, it seems that > the fix might have been applied to 0.9.3. Is that correct? If so, do you

RE: Custom OutputCommitter not called by Tez

2022-05-02 Thread Julien Phalip
Hi Peter, Thanks a lot for the breakdown, it all makes sense. Unfortunately I work with companies who are stuck with older versions of Hive, so I'm trying to find some workarounds. I was actually able to make it mostly work. Here's what I do: - In configureJobConf(): - Create a work di

AvroSerde's inferred schema for NOT NULL columns

2022-05-04 Thread Julien Phalip
Hi, I'm trying to create a table with a NOT NULL column: CREATE TABLE mytable (int_required BIGINT NOT NULL, ) However, it looks like the schema that AvroSerde generates ignores the "NOT NULL" part and outputs the following avro field schema: {"name":"int_required","type":["null","long"],"de
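In Avro, a field that admits nulls is usually declared as a union with `"null"` listed first, which also lets it carry a `null` default, while a required field uses the bare type. The hypothetical helper below (not AvroSerde's code) renders both forms for a `BIGINT` column, which maps to Avro `long`. It contrasts the nullable schema quoted above with what a `NOT NULL` column might be expected to produce. It assumes the field name needs no JSON escaping.

```java
// Hypothetical helper, not AvroSerde's implementation: renders the Avro
// field schema for a Hive BIGINT column (Avro "long") as a JSON string.
final class AvroFieldSketch {
    private AvroFieldSketch() {}

    // Assumes name is a plain identifier (no quotes or backslashes to escape).
    static String bigintField(String name, boolean notNull) {
        if (notNull) {
            // Required field: the bare primitive type, no null branch.
            return "{\"name\":\"" + name + "\",\"type\":\"long\"}";
        }
        // Nullable field: union with "null" first so that a null default is valid.
        return "{\"name\":\"" + name + "\",\"type\":[\"null\",\"long\"],\"default\":null}";
    }
}
```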

Issue with the "hive.io.file.readcolumn.names" property

2022-05-15 Thread Julien Phalip
Hi, I've noticed an odd behavior with the 'hive.io.file.readcolumn.names' conf property. Imagine a simple table "mytable" with two fields: "text" and "number". - If you run the query "SELECT * FROM mytable", then the "hive.io.file.readcolumn.names" has the value: "text,number". Makes sense so fa
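A storage handler that honors column projection might read this property roughly as follows. The sketch assumes the job configuration is exposed as a plain `java.util.Map` (standing in for Hadoop's `Configuration`) and treats a missing or blank value as "no explicit column list". `ReadColumns` and `projected` are hypothetical names. For the `SELECT * FROM mytable` example, where the value is `text,number`, it returns `[text, number]`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses the projected column names from the job config.
final class ReadColumns {
    static final String KEY = "hive.io.file.readcolumn.names";

    private ReadColumns() {}

    // Splits the comma-separated list; a missing or blank value yields an
    // empty list (an assumption of this sketch, not documented Hive behavior).
    static List<String> projected(Map<String, String> conf) {
        String value = conf.get(KEY);
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        List<String> cols = new ArrayList<>();
        for (String c : value.split(",")) {
            String trimmed = c.trim();
            if (!trimmed.isEmpty()) {
                cols.add(trimmed); // skip empty entries such as a trailing comma
            }
        }
        return cols;
    }
}
```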

RE: Issue with the "hive.io.file.readcolumn.names" property

2022-05-15 Thread Julien Phalip
Also, I forgot to mention, I'm using Hive v3.1.2. On 2022/05/16 03:09:19 Julien Phalip wrote: > Hi, > > I've noticed an odd behavior with the 'hive.io.file.readcolumn.names' conf > property. > > Imagine a simple table "mytable" with two fields:

RE: RE: Issue with the "hive.io.file.readcolumn.names" property

2022-05-18 Thread Julien Phalip
/HiveIcebergStorageHandler.java#L538 However, it still works fine for me with tez even without setting "tez.mrreader.config.update.properties". Do you know what's causing this? Is there a workaround for the "mr" engine to consistently get the proper value for "hive.i

Creating table with "interval" column type

2022-11-11 Thread Julien Phalip
Hi, I'm using Hive 3.1.2 and I can't quite figure out how to define a table with an "interval" column type. I've tried both: CREATE TABLE (duration INTERVAL); and: CREATE TABLE (duration INTERVAL DAY); but that returns an exception: cannot recognize input near 'INTERVAL' 'DAY' ',' in column

Tez hook for "INSERT INTO TABLE PARTITION(...)" query

2022-12-20 Thread Julien Phalip
Hi, I'm writing a custom Storage Handler and would need to run some custom code at the end of an INSERT query. I can easily do that by providing a custom OutputCommitter class and overriding the commitJob() method. However, that only works for the "mr" execution engine, as the "commitJob()" metho