My suspicion is that ORC generates wrong splits because of this bug
> https://issues.apache.org/jira/browse/HIVE-6326. I will try to reproduce
> your scenario and see if I hit similar issue.
>
> Thanks
> Prasanth Jayachandran
>
> On Feb 10, 2014, at 1:46 PM, Avrilia Floratou
> wrote:
>
Hi Prasanth,
3) Which Hive version are you using?
> 4) What query are you using?
>
> Thanks
> Prasanth Jayachandran
>
> On Feb 10, 2014, at 1:26 PM, Avrilia Floratou
> wrote:
>
> Hi Prasanth,
>
> No, it's not a partitioned table. The table consists of only one file of
>
le (512MB on
> HDFS block boundary + remaining 3MB). This happens when the input format is
> set to HiveInputFormat.
>
> Thanks
> Prasanth Jayachandran
>
> On Feb 10, 2014, at 12:49 AM, Avrilia Floratou
> wrote:
>
> > Hi all,
> >
> > I'm running a
Hi all,
I'm running a query that scans a file stored in ORC format and extracts
some columns. My file is about 92 GB, uncompressed. I kept the default
stripe size. The MapReduce job generates 363 map tasks.
I have noticed that the first 180 map tasks finish in 3 seconds (each) and
after they complete
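As the thread above suggests, split generation here depends on which input format is in effect, and it can be toggled per session to compare the resulting map-task counts. A minimal sketch, with a hypothetical table name:

    -- compare split generation under the two input formats (orc_table is hypothetical)
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    select count(*) from orc_table;      -- note the number of map tasks

    set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    select count(*) from orc_table;      -- compare the map-task count here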
Hi,
I'm running Hive 0.12 on YARN and I'm trying to convert a common join into
a map join. My map join fails, and from the logs I can see that the memory
limit is very low:
Starting to launch local task to process map join; maximum memory =
514523136
How can I increase the maximum memory?
I'
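The reported maximum memory (514523136 bytes, roughly 490 MB) is the max heap of the client-side JVM that runs the local hash-table task, so it is usually raised by giving the Hive client a larger heap (for example via HADOOP_HEAPSIZE in hive-env.sh) rather than by a per-query setting. A hedged sketch of the session-level knobs that also matter, with illustrative values only:

    -- illustrative values, not recommendations
    set hive.auto.convert.join=true;                  -- allow conversion to a map join
    set hive.mapjoin.smalltable.filesize=50000000;    -- size threshold for the small table
    set hive.mapjoin.localtask.max.memory.usage=0.90; -- fraction of the local-task heap the hash table may use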
Hi all,
I'm using Hive 0.12 and running some experiments with ORC files. The
HDFS block size is 128 MB and I was wondering what the best stripe size
to use is. The default one (250 MB) is larger than the block size. Is each
stripe splittable, or in this case will each map task have to access data
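For experimentation, the stripe size can be set per table, so two copies of the data with different stripe sizes can be compared directly. A hypothetical DDL sketch that matches the stripe size to the 128 MB block size:

    -- 134217728 bytes = 128 MB; table and column names are hypothetical
    create table lineitem_orc (l_orderkey bigint, l_quantity double)
    stored as orc
    tblproperties ("orc.compress"="ZLIB", "orc.stripe.size"="134217728");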
ion functions, you will see the optimized plan. Also,
> another kind of cases that the correlation optimizer does not optimize
> right now is that a table is used in multiple MR jobs but rows in this
> table are shuffled in different ways.
>
> Thanks,
>
> Yin
>
>
> On Tu
Hi,
I'm running TPC-H query 21 on Hive 0.12 and have enabled
hive.optimize.correlation.
I could see the effect of the correlation optimizer on query 17, but when
running query 21 I don't actually see the optimizer being used. I used the
publicly available TPC-H queries for Hive and merged all the i
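One way to check whether the optimizer fired is to run explain with the flag off and then on and compare the number of MapReduce stages; when correlated operators are merged, the plan should contain fewer jobs. A sketch using a simple correlated shape (TPC-H-style table names, for illustration only):

    set hive.optimize.correlation=false;
    explain
    select o.o_custkey, count(*)
    from orders o join customer c on (o.o_custkey = c.c_custkey)
    group by o.o_custkey;   -- join and group-by share o_custkey; count the MapReduce stages

    set hive.optimize.correlation=true;
    -- rerun the same explain: a merged plan should show fewer stages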
Hello,
I'd like to run a few TPC-H queries on Hive 0.12. I've found the TPC-H
scripts here:
https://issues.apache.org/jira/browse/HIVE-600, but noticed that these
scripts were generated a long time ago. Since Hive could not support the
full SQL-92 specification, some queries were split into
smaller s
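For context, the splitting in those older scripts usually replaces a nested or correlated subquery (which early Hive could not parse) with an intermediate table that a second statement then joins against. A hedged illustration of the pattern, loosely modeled on Q17:

    -- step 1: materialize what would otherwise be a correlated subquery
    create table q17_tmp as
    select l_partkey, 0.2 * avg(l_quantity) as t_avg_quantity
    from lineitem
    group by l_partkey;

    -- step 2: join the intermediate result instead of nesting it
    select sum(l.l_extendedprice) / 7.0 as avg_yearly
    from lineitem l join q17_tmp t on (l.l_partkey = t.l_partkey)
    where l.l_quantity < t.t_avg_quantity;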
Hi all,
I'm using Hive 0.12. I have a file that contains 10 integer columns, stored
in ORC format. The ORC file is zlib-compressed and indexing is enabled.
I'm running a simple select count(*) with a predicate of the form (col1 = 0
OR col2 = 0, etc.). The predicate touches all 10 columns but its selectivity
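In case it is useful to others on the list, ORC's row-group indexes are only consulted when predicate pushdown is enabled; a sketch of the setting plus the shape of query described above (table and column names hypothetical):

    set hive.optimize.index.filter=true;   -- push the filter down into the ORC reader
    select count(*)
    from ints_orc
    where col1 = 0 or col2 = 0 or col3 = 0;
    -- with an OR across columns, a row group can be skipped only when the min/max
    -- statistics rule out every branch of the disjunction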
Hi all,
Does anyone know if Hive 0.7 or 0.8 can work with Hadoop 0.21.0 or 0.22.0?
Thanks,
Avrilia
Hi,
I have a question related to the Hadoop counters when RCFile is used.
I have 16 TB of (uncompressed) data stored in compressed RCFile format. The size
of the compressed RCFile is approximately 3 TB.
I ran a simple scan query on this table. Each split is 256 MB (HDFS block
size).
From the counters
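For what it is worth, assuming the splits line up with the 256 MB blocks of the ~3 TB compressed file, the expected numbers would be roughly:

    3 TB / 256 MB per split = (3 * 1024 * 1024 MB) / 256 MB ≈ 12,288 map tasks
    HDFS_BYTES_READ ≈ the compressed size (~3 TB), since decompression happens after the bytes are read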
would come to a normal join only after the map join attempt fails.
> AFAIK, if the number of buckets is the same or a multiple between the two tables
> involved in a join, and if the join is on the same columns that are bucketed,
> then with bucketmapjoin enabled it shouldn't execute a p
Hi,
I have two tables with 8 buckets each on the same key and want to join them.
I ran "explain extended" and get the plan produced by HIVE which shows that a
map-side join is a possible plan.
I then set in my script the hive.optimize.bucketmapjoin option to true and
reran the "explain extended
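For completeness, the flag by itself only permits the conversion; in Hive 0.12 the bucketed map join is typically requested with a MAPJOIN hint (or via auto conversion), and bucketing must have been enforced when the tables were populated. A sketch with hypothetical table and column names:

    set hive.optimize.bucketmapjoin=true;
    -- both tables clustered by the join key into 8 buckets when they were loaded
    select /*+ MAPJOIN(b) */ a.key, a.val, b.val
    from big_bucketed a join small_bucketed b on (a.key = b.key);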
e average of the whole
> data, then you can use sampling to avoid scanning the whole table. For
> partitioned tables, use dynamic partitions to load data from the source table into
> the target table on the fly.
>
>
> Hope it helps!
>
> Regards
> Bejoy.K.S
>
>
>
>
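A sketch of the two suggestions above, with hypothetical table and column names: bucket sampling reads only a fraction of a bucketed table, and a dynamic-partition insert routes rows to partitions on the fly.

    -- read one of 8 buckets instead of scanning the whole table
    select avg(metric) from source_bucketed tablesample (bucket 1 out of 8 on id);

    -- load the target's partitions dynamically from the source data
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    insert overwrite table target partition (dt)
    select id, metric, dt from source;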
Hello,
I have a question regarding the execution of some queries on bucketed tables.
I've created a compressed bucketed table using the following statement:
create external table partRC (P_PARTKEY BIGINT, P_NAME STRING, P_MFGR
STRING, P_BRAND STRING, P_TYPE STRING, P_SIZE INT, P_CONTAINER STRING,
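Since the statement above is cut off in the archive, here is a hedged sketch of the general shape such a DDL takes; the schema is trimmed, and the bucket count, location, and compression settings are illustrative only:

    create external table partrc_sketch (p_partkey bigint, p_name string)
    clustered by (p_partkey) into 32 buckets
    stored as rcfile
    location '/user/hive/partrc_sketch';

    -- when populating it, enforce the bucketing and compress the output
    set hive.enforce.bucketing=true;
    set hive.exec.compress.output=true;
    insert overwrite table partrc_sketch select p_partkey, p_name from part_staging;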
Hi,
I'd like to know what the current status of indexing in Hive is. What I've
found so far is that the user has to manually set the index table for each
query. Something like this:
insert overwrite directory "/tmp/index_result" select `_bucketname`
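For readers of the thread, the manual workflow being described is the compact-index pattern: build the index, dump the matching (`_bucketname`, `_offsets`) pairs to a directory, and point the query at that directory. A hedged sketch (table, column, and index names hypothetical; details vary by release):

    create index key_idx on table src (key)
    as 'COMPACT' with deferred rebuild;
    alter index key_idx on src rebuild;

    -- dump the index entries that match the predicate
    insert overwrite directory "/tmp/index_result"
    select `_bucketname`, `_offsets` from default__src_key_idx__ where key = 100;

    -- tell the query to read only the referenced blocks
    set hive.index.compact.file=/tmp/index_result;
    set hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
    select key, value from src where key = 100;

As far as I know, later releases can apply such indexes automatically when hive.optimize.index.filter is enabled, which removes the manual steps.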
Hi,
I want to convert data stored in a Hadoop sequence file to
BytesRefArrayWritable so that I can use RCFileOutputFormat and create an
RCFile.
My data contains integers, strings, and hashmaps. I guess I don't have to
write my own serializer/deserializer for these. I tried using the
ColumnarSerDe
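One way to sidestep writing the BytesRefArrayWritable conversion by hand, assuming the sequence-file data is already readable as a Hive table, is to let Hive rewrite it: declare a target table stored as RCFile with ColumnarSerDe and insert from the source table. Table and column names below are hypothetical.

    -- target RCFile table; ColumnarSerDe handles the columnar serialization
    create table events_rc (id int, name string, attrs map<string,string>)
    row format serde 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
    stored as rcfile;

    -- rewrite the sequence-file-backed table into RCFile
    insert overwrite table events_rc
    select id, name, attrs from events_seq;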