Re: Hive on tez - fix number of tasks

2015-02-19 Thread Fabio C.
Thanks Siddharth, at first glance I couldn't find an option in Hive to disable split grouping, but I will check and eventually try the min-max setting for split size. Thanks a lot Fabio On Thu, Feb 19, 2015 at 11:02 AM, Siddharth Seth wrote: > Fabio, > One of the simplest ways to achieve th

Re: select on parquet hive tables always gives NULL ?

2015-02-19 Thread Yang
ah... found out. my issue is that hive 0.13 doesn't handle this correctly. could be a bug. used 0.14, it works. btw the UNION[int, null] translates to parquet as a field "optional int32 myfieldName", I found this by calling ParquetFileReader.readFooter() On Thu, Feb 19, 2015 at 11:32 AM, Yang

Re: Parquet support for Timestamp in 0.14

2015-02-19 Thread Yang
ah... found out. my issue is that hive 0.13 doesn't handle this correctly. could be a bug. used 0.14, it works. btw the UNION[int, null] translates to parquet as a field "optional int32 myfieldName", I found this by calling ParquetFileReader.readFooter() On Thu, Feb 19, 2015 at 12:08 PM, Yang

Re: Parquet support for Timestamp in 0.14

2015-02-19 Thread Yang
Szehon: another question related to the types support: if I convert an avro field of UNION to parquet, does hive support that UNION field? a UNION is needed because an avro field cannot take NULL, and I have to define every field as a UNION of the original type and NULL. Thanks Yang On Mon, Feb 9,

select on parquet hive tables always gives NULL ?

2015-02-19 Thread Yang
I created a parquet file and exposed it to hive using an external table, but selects from such tables always give NULL. to show the symptom, I created the following data set, where each record has only 2 fields, __PRIMARY_KEY__ and nullableInt. the schema represented in avro is the following (I co

CombineHiveInputFormat does not call getSplits on custom InputFormat

2015-02-19 Thread Luke Lovett
I'm working on defining a custom InputFormat and OutputFormat for use with Hive. I'd like tables using these IF/OF to be native tables, so that I can LOAD DATA and INSERT INTO them. However, I'm finding that with the default CombineHiveInputFormat, the getSplits method of my InputFormat is not

Re: Remove duplicated rows

2015-02-19 Thread Philippe Kernévez
Hi Dev and thank you for you On Wed, Feb 18, 2015 at 11:31 AM, Devopam Mittra wrote: > hi Philippe, > Performance improvement has two factors : 1. availability (read abundance) > of resources 2.need for speed > All "advise" usually is to address mainly these two factors , as I have > usually see

Table name concatenation

2015-02-19 Thread Philippe Kernévez
Hi, I'm using the property "hiveconf hive.cli.print.header" to add a first row with the names of the columns. Since my upgrade (0.12 -> 0.14) my columns are prefixed by the name of the table : before : campaign_name, campaign_id, etc. now : table.campaign_name, table.campaign_id, e

Re: Hive on tez - fix number of tasks

2015-02-19 Thread Siddharth Seth
Fabio, One of the simplest ways to achieve this is to disable split grouping completely. You may end up with a large number of tasks in this case though. This gets rid of the dynamic split generation based on cluster node. (You'll have to check with Hive on how to disable this). Other than this, se

Hive on tez - fix number of tasks

2015-02-19 Thread Fabio C.
Hi everyone, I see that Hive on Tez dynamically chooses the number of tasks to launch for each vertex in the generated DAG according to cluster load (in addition to data size). For research purposes I'd like to avoid this feature since I need every query (running on the same datasets) to be executed wi

Re: Union all with a field 'hard coded'

2015-02-19 Thread Lefty Leverenz
Xuefu, I've taken a stab at documenting this in the Union wikidoc (near the end). Would you please review it and make any necessary corrections or additions? Thanks. -- Lefty On Mon, Feb 2, 2015 at 2:02 PM, DU DU wrote: >