COLUMNS_V2 high RDS load

2023-10-02 Thread Patrick Duin
Hi, We've been investigating some high DB load on our HMS server (version 2.3.9 on MySQL 5.7, Aurora 2.11.2). It seems to be caused by sort indexes being created for queries on the COLUMNS_V2 table. After some digging we think we're seeing the same thing this ticket/PR tries to solve: https://issues.a
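For context, a minimal sketch of the kind of metastore query involved, assuming the standard HMS MySQL schema (the literal CD_ID is made up for illustration):

```sql
-- The HMS fetches a table's columns ordered by position. In the stock
-- schema the COLUMNS_V2 primary key is (CD_ID, COLUMN_NAME), so the
-- ORDER BY on INTEGER_IDX cannot be satisfied from an index and MySQL
-- reports "Creating sort index" work under high call volume.
EXPLAIN
SELECT CD_ID, COLUMN_NAME, TYPE_NAME, INTEGER_IDX
FROM COLUMNS_V2
WHERE CD_ID = 12345
ORDER BY INTEGER_IDX ASC;
```

Adding a covering index on (CD_ID, INTEGER_IDX) is the kind of remedy such a ticket would propose; whether that matches the referenced issue cannot be confirmed from the truncated link.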

Re: IMetaStoreClient.alter_table does not support dropping columns?

2021-11-04 Thread Patrick Duin
Hi Yussuf, A Hive user here having the same issue. I think the interface method just follows the same code path as an ALTER TABLE query would. My current thinking is that this safeguard was probably more useful in the olden days of CSV files. With more modern file formats like Avro, ORC and

Re: Usage of IMetaStoreClient#reconnect

2021-09-29 Thread Patrick Duin
In some of the tools we use to interact with the metastore, we've moved away from long-running clients altogether; the Thrift protocol is best served by creating a new client per request. Try to just create a new client every time: they are fast to make. The metastore clients are also not th

Override hive.metastore.disallow.incompatible.col.type.changes

2021-01-18 Thread Patrick Duin
Hi, I'm struggling to override the 'hive.metastore.disallow.incompatible.col.type.changes' conf. I've got a table (Parquet format) which needs some columns renamed/dropped and structs changed. The Hive CLI doesn't have an option to drop columns, so I'm going the Hive Thrift API route, but I keep getting the exce
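A minimal sketch of the kind of change being attempted (table and column names are hypothetical, not from the truncated message). One common gotcha, stated here as an assumption about this thread: the property is typically enforced by the metastore service, so a client-side SET only helps when the check runs in-process:

```sql
-- Relax the compatibility check for this session (effective only if the
-- check runs in the same process; otherwise it must be set in the
-- metastore's own hive-site.xml).
SET hive.metastore.disallow.incompatible.col.type.changes=false;

-- REPLACE COLUMNS rewrites the full column list; anything omitted is dropped.
ALTER TABLE my_parquet_table REPLACE COLUMNS (
  id       BIGINT,
  new_name STRING   -- renamed from a previous column
);
```

Note the CLI may still reject this for non-native serdes, which is consistent with the thread going the Thrift API route.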

Re: What is the persistance of a parameters set in HQL ?

2020-06-03 Thread Patrick Duin
That's only for the session: if you start the Hive CLI, set some params, and run a query, they will be used. Close the client and start a new Hive CLI session and you're back to the defaults. To make them permanent you'll have to change them in hive-site.xml. Hope that helps, Patrick. On Wed 3 Jun.
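The distinction above can be shown with any property; the one used here is just an example:

```sql
-- Session-scoped: lasts only until this CLI/Beeline session ends.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- SET with no value prints the current effective value.
SET hive.exec.dynamic.partition.mode;

-- To make a value permanent, put it in hive-site.xml instead of SET.
```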

union distinct and complex types (maps)

2020-05-12 Thread Patrick Duin
Hi, I've got a question: is doing a UNION DISTINCT on a column with a 'map' type fully supported? We've seen in Spark that it is not, and Spark throws an exception. Hive seems fine, but we were wondering if anyone has ever had any issues with this (we're on Hive 2.3.x). Any pointers on where in the code
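A hypothetical repro of the question (table and column names are made up):

```sql
-- UNION DISTINCT must compare whole rows for equality, and map columns
-- have no well-defined equality/ordering in some engines: Spark raises
-- an AnalysisException for set operations on map columns, while Hive
-- 2.3.x accepts the query (per this thread).
SELECT id, tags FROM events_a   -- tags: map<string,string>
UNION DISTINCT
SELECT id, tags FROM events_b;
```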

Re: OutOfMemoryError after loading lots of dynamic partitions

2020-01-09 Thread Patrick Duin
hive-outofmemoryerror-heap-space/ On Wed, Jan 8, 2020 at 11:38 AM Patrick Duin wrote: The query is rather large, it won't tell you much (it's generated). It comes down to this: WITH gold AS ( select * f

Re: OutOfMemoryError after loading lots of dynamic partitions

2020-01-08 Thread Patrick Duin
<rock...@gmail.com>: Could you please post your insert query snippet along with the SET statements? On Wed, Jan 8, 2020 at 11:17 AM Patrick Duin wrote: Hi, I got a query that's producing about 3000 partitions which we load

OutOfMemoryError after loading lots of dynamic partitions

2020-01-08 Thread Patrick Duin
Hi, I've got a query that's producing about 3000 partitions, which we load dynamically (on Hive 2.3.5). At the end of this query (running on M/R, which runs fine) the M/R job is finished and we see this on the Hive CLI: Loading data to table my_db.temp__v1_2019_12_03_182627 partition (c_date=null, c_ho
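A sketch of the high-fanout dynamic-partition insert described above (table and column names follow the truncated preview; the SELECT body is an assumption), together with the limits that usually need raising for ~3000 partitions:

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Defaults (1000 total / 100 per node) are too low for ~3000 partitions.
SET hive.exec.max.dynamic.partitions=5000;
SET hive.exec.max.dynamic.partitions.pernode=2000;

INSERT OVERWRITE TABLE my_db.temp__v1_2019_12_03_182627 PARTITION (c_date, c_hour)
SELECT col0, col1, c_date, c_hour
FROM source_table;
```

These raise partition limits, not memory; since the OOM in this thread appears in the CLI during the load phase, the client-side heap (e.g. HADOOP_CLIENT_OPTS) is likely the relevant knob too.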

Re: out of memory using Union operator and array column type

2019-03-12 Thread Patrick Duin
set hive.map.aggr=false; Worked for me. Slow and steady wins the race :) Many thanks all! Patrick On Tue 12 Mar 2019 at 03:23, Gopal Vijayaraghavan wrote: I'll try the simplest query I can reduce it to with loads of memory and see if that gets anywhere. Other pointers are much appre

Re: out of memory using Union operator and array column type

2019-03-11 Thread Patrick Duin
regards Dev On Mon, Mar 11, 2019 at 9:21 PM Patrick Duin wrote: Very good question, yes that does give the same problem. On Mon 11 Mar 2019 at 16:28, Devopam Mittra wrote: Can you please try doing SELECT DISTINCT *

Re: out of memory using Union operator and array column type

2019-03-11 Thread Patrick Duin
Very good question, yes that does give the same problem. On Mon 11 Mar 2019 at 16:28, Devopam Mittra wrote: Can you please try doing SELECT DISTINCT * FROM DELTA into a physical table first? regards Dev On Mon, Mar 11, 2019 at 7:59 PM Patrick Duin wrot

out of memory using Union operator and array column type

2019-03-11 Thread Patrick Duin
Hi, I'm running into an OOM issue trying to do a UNION ALL on a bunch of Avro files. The query is something like this: with gold as ( select * from table1 where local_date=2019-01-01), delta as ( select * from table2 where local_date=2019-01-01) insert overwrite table3 PARTITION ('local_date'
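A cleaned-up sketch of the pattern from this thread (names and dates follow the truncated preview; the exact UNION flavour and select list are assumptions), with the workaround that the follow-up messages report resolving the OOM:

```sql
-- Map-side hash aggregation on wide rows with array columns was the
-- memory hog here; disabling it trades speed for stability.
SET hive.map.aggr=false;

WITH gold  AS (SELECT * FROM table1 WHERE local_date = '2019-01-01'),
     delta AS (SELECT * FROM table2 WHERE local_date = '2019-01-01')
INSERT OVERWRITE TABLE table3 PARTITION (local_date)
SELECT * FROM gold
UNION
SELECT * FROM delta;
```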

Re: Control large file output in dynamic partitioned insert

2018-09-25 Thread Patrick Duin
this benefits anyone else. On Mon 24 Sep 2018 at 18:22, Patrick Duin wrote: Hi all, I got a query doing an insert overwrite like this: WITH tbl1 AS ( SELECT col0, col1, local_date, local_hour FROM tbl1 WHERE ), tbl2

Control large file output in dynamic partitioned insert

2018-09-24 Thread Patrick Duin
Hi all, I got a query doing an insert overwrite like this: WITH tbl1 AS ( SELECT col0, col1, local_date, local_hour FROM tbl1 WHERE ), tbl2 AS ( SELECT col0, col1, local_date, local_hour FROM tbl2 WHERE ) INSERT OVERWRITE TABLE tbl3 PARTITION (local_date, local_hour)
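One common approach to controlling output file layout in a dynamic-partition insert is not stated in the truncated message and is offered here as an assumption; it routes all rows for a given partition to the same reducer so each partition gets a bounded number of files:

```sql
-- Names follow the sketch above; the WHERE predicates were elided in
-- the original and are left out here too.
INSERT OVERWRITE TABLE tbl3 PARTITION (local_date, local_hour)
SELECT col0, col1, local_date, local_hour
FROM tbl1
DISTRIBUTE BY local_date, local_hour;
```

Adding a second DISTRIBUTE BY term (e.g. a hash bucket) splits overly large partitions across more reducers, which is the usual lever when single output files grow too big.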

Re: Enabling Snappy compression on Parquet

2018-08-22 Thread Patrick Duin
ta and you are creating a table with snappy compression, you need to use "CREATE into new_compressed table as select * from un_compressed_table" in order to actually compress the data. Regards, Tanvi Thacker On Fri, Aug 10, 2018 at 6:30 AM Patrick

Enabling Snappy compression on Parquet

2018-08-10 Thread Patrick Duin
Hi, I've got some Hive tables in Parquet format and I am trying to find out how best to enable compression. I've done a bit of searching; the information is a bit scattered, but I found I can use this Hive property to enable compression. It needs to be set before doing an insert: set parquet.compressio
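The two ways of applying the property described above, sketched with hypothetical table names (behaviour varies somewhat by Hive/Parquet version):

```sql
-- Session-level: affects writes performed after the SET.
SET parquet.compression=SNAPPY;

-- Table-level: writers pick it up without a session setting. Per the
-- reply in this thread, existing data is only compressed when rewritten,
-- e.g. via CREATE TABLE ... AS SELECT.
CREATE TABLE compressed_copy
STORED AS PARQUET
TBLPROPERTIES ('parquet.compression' = 'SNAPPY')
AS SELECT * FROM uncompressed_table;
```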

Re: Parquet schema evolution, column conversion not supported

2018-07-27 Thread Patrick Duin
Replying to myself as I found my issue: I hadn't updated the schema of my partitions correctly, I'd only updated the table schema. The error went away when I updated my partitions. All data was queryable, both old and newly landed data. On Thu 26 Jul 2018 at 11:22, Patrick Duin wrote
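A sketch of the fix described above, with hypothetical table/column names; CASCADE (available since Hive 1.1) propagates the new table schema to existing partition metadata:

```sql
-- Without CASCADE (or explicit per-partition ALTERs), old partitions keep
-- their old schema and reads can fail with type-conversion errors like the
-- one in this thread.
ALTER TABLE my_table
CHANGE COLUMN payload payload STRUCT<a:INT, b:STRING, c:BIGINT>
CASCADE;
```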

Parquet schema evolution, column conversion not supported

2018-07-26 Thread Patrick Duin
I'm encountering errors in Hive 2.3.2 when reading sets of Parquet files where the schema has evolved. The error I'm seeing is: Failed with exception java.io.IOException:java.lang.RuntimeException: Hive internal error: conversion of string to array not supported yet. My schema has a top-level co

BeeJU: JUnit rules for testing code manipulating the Hive Metastore Client

2017-01-27 Thread Patrick Duin
Hi, We've just open sourced a library that we have been using internally at Hotels.com for unit testing applications that use the Hive metastore service. It's called BeeJU and is a set of JUnit rules that spin up (and tear down) a Hive Metastore client using an in-memory database. If you write any

Re: 答复: Difference between MANAGED_TABLE and EXTERNAL_TABLE in org.apache.hadoop.hive.metastore.TableType

2016-12-02 Thread Patrick Duin
Hi, I've noticed the same thing; we set the table parameter as well to make sure the table is external: replica.putToParameters("EXTERNAL", "TRUE"). Not sure if the tableType is actually used anywhere; we set it anyway, as well as the table parameter, just to be sure when using the Metastore API. No
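The DDL counterpart of the API call quoted above (table name hypothetical); the metastore keys off the "EXTERNAL" table parameter when deciding drop behaviour:

```sql
-- Marks the table external so DROP TABLE leaves the underlying data alone.
ALTER TABLE replica_table
SET TBLPROPERTIES ('EXTERNAL' = 'TRUE');
```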

Re: ORC file split calculation problems

2016-03-04 Thread Patrick Duin
inline... On Mar 1, 2016, at 8:41 AM, Patrick Duin wrote: Hi Prasanth, Thanks for this. I tried out the configuration and I wanted to share some numbers with you. My test setup is a cascading job that reads in 240 files (ranging from 1.5GB to

Re: ORC file split calculation problems

2016-03-01 Thread Patrick Duin
Would be good to know if other users have similar experiences. Again, thanks for your help. Kind regards, Patrick. On 2016-02-29 at 6:38 GMT+00:00, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote: Hi Patrick, Please find answers inline. On Feb 26, 2016, at 9:36 AM

Re: ORC file split calculation problems

2016-02-26 Thread Patrick Duin
s will be read for split pruning. The default strategy does it automatically (choosing when to read footers and when not to). It is configurable as well. Thanks, Prasanth On Feb 25, 2016, at 7:08 AM, Patrick Duin wrote

ORC file split calculation problems

2016-02-25 Thread Patrick Duin
Hi, We've recently moved one of our datasets to ORC and we use Cascading and Hive to read this data. We've had problems reading the data via Cascading, because of the generation of splits. We read in a large number of files (thousands) and they are about 1GB each. We found that the split calculati
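The knob discussed in the replies in this thread is, in later Hive versions, exposed as hive.exec.orc.split.strategy; a quick sketch of its values:

```sql
-- BI     = generate splits from file sizes only, skipping footer reads
--          (fast planning; suits many large, similar files).
-- ETL    = read file footers up front for accurate, prunable splits.
-- HYBRID = choose automatically based on file count/size (the default).
SET hive.exec.orc.split.strategy=BI;
```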

Re: hive ORC wrong number of index entries error

2015-09-24 Thread Patrick Duin
wn from default 256KB. You can do that by setting orc.compress.size tblproperties. On Sep 24, 2015, at 3:27 AM, Patrick Duin wrote: Thanks for the reply, My first thought was out of memory as well but the illegal argument exception happens before; it is a separate entr
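The suggestion quoted above as a concrete statement (table name hypothetical):

```sql
-- Shrink the ORC compression chunk size from the 256KB default to 64KB,
-- reducing per-writer buffer memory at some cost in compression ratio.
ALTER TABLE my_orc_table
SET TBLPROPERTIES ('orc.compress.size' = '65536');
```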

Re: hive ORC wrong number of index entries error

2015-09-24 Thread Patrick Duin
jayachand...@hortonworks.com>: Looks like you are running out of memory. Try increasing the heap memory or reducing the stripe size. How many columns are you writing? Any idea how many record writers are open per map task? - Prasanth On Sep 22, 2015, at 4:32 AM,

hive ORC wrong number of index entries error

2015-09-22 Thread Patrick Duin
Hi all, I am struggling to understand a stack trace I get trying to write an ORC file. I am using hive-0.13.0/hadoop-2.4.0. 2015-09-21 09:15:44,603 INFO [main] org.apache.hadoop.mapred.MapTask: Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewDirectOutputColle