My suspicion is that ORC generates wrong splits because of this bug
> https://issues.apache.org/jira/browse/HIVE-6326. I will try to reproduce
> your scenario and see if I hit similar issue.
>
> Thanks
> Prasanth Jayachandran
>
> On Feb 10, 2014, at 1:46 PM, Avrilia Floratou
> wrote:
>
Hi Prasanth,
3) Which Hive version are you using?
> 4) What query are you using?
>
> Thanks
> Prasanth Jayachandran
>
> On Feb 10, 2014, at 1:26 PM, Avrilia Floratou
> wrote:
>
> Hi Prasanth,
>
> No, it's not a partitioned table. The table consists of only one file of
>
le (512MB on
> HDFS block boundary + remaining 3MB). This happens when the input format is
> set to HiveInputFormat.
>
> Thanks
> Prasanth Jayachandran
>
> On Feb 10, 2014, at 12:49 AM, Avrilia Floratou
> wrote:
>
> > Hi all,
> >
> > I'm running a
Hi all,
I'm running a query that scans a file stored in ORC format and extracts
some columns. My file is about 92 GB, uncompressed. I kept the default
stripe size. The MapReduce job generates 363 map tasks.
I have noticed that the first 180 map tasks finish in 3 seconds (each) and
after they complete
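As the thread above suggests, split generation here depends on which input format is in effect, and it can be toggled per session to compare the resulting map-task counts. A minimal sketch, with a hypothetical table name:

    -- compare split generation under the two input formats (orc_table is hypothetical)
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    select count(*) from orc_table;      -- note the number of map tasks

    set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
    select count(*) from orc_table;      -- compare the map-task count here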
Hi,
I'm running Hive 0.12 on YARN and I'm trying to convert a common join into
a map join. My map join fails, and from the logs I can see that the memory
limit is very low:
Starting to launch local task to process map join; maximum memory =
514523136
How can I increase the maximum memory?
I'
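The reported maximum memory (514523136 bytes, roughly 490 MB) is the max heap of the client-side JVM that runs the local hash-table task, so it is usually raised by giving the Hive client a larger heap (for example via HADOOP_HEAPSIZE in hive-env.sh) rather than by a per-query setting. A hedged sketch of the session-level knobs that also matter, with illustrative values only:

    -- illustrative values, not recommendations
    set hive.auto.convert.join=true;                  -- allow conversion to a map join
    set hive.mapjoin.smalltable.filesize=50000000;    -- size threshold for the small table
    set hive.mapjoin.localtask.max.memory.usage=0.90; -- fraction of the local-task heap the hash table may use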
Hi all,
I'm using Hive 0.12 and running some experiments with ORC files. The
HDFS block size is 128 MB and I was wondering what the best stripe size
to use is. The default one (250 MB) is larger than the block size. Is each
stripe splittable, or in this case will each map task have to access data
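For experimentation, the stripe size can be set per table, so two copies of the data with different stripe sizes can be compared directly. A hypothetical DDL sketch that matches the stripe size to the 128 MB block size:

    -- 134217728 bytes = 128 MB; table and column names are hypothetical
    create table lineitem_orc (l_orderkey bigint, l_quantity double)
    stored as orc
    tblproperties ("orc.compress"="ZLIB", "orc.stripe.size"="134217728");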
ion functions, you will see the optimized plan. Also,
> another kind of cases that the correlation optimizer does not optimize
> right now is that a table is used in multiple MR jobs but rows in this
> table are shuffled in different ways.
>
> Thanks,
>
> Yin
>
>
> On Tu
Hi,
I'm running TPC-H query 21 on Hive 0.12 and have enabled
hive.optimize.correlation.
I could see the effect of the correlation optimizer on query 17, but when
running query 21 I don't actually see the optimizer being used. I used the
publicly available TPC-H queries for Hive and merged all the i
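One way to check whether the optimizer fired is to run explain with the flag off and then on and compare the number of MapReduce stages; when correlated operators are merged, the plan should contain fewer jobs. A sketch using a simple correlated shape (TPC-H-style table names, for illustration only):

    set hive.optimize.correlation=false;
    explain
    select o.o_custkey, count(*)
    from orders o join customer c on (o.o_custkey = c.c_custkey)
    group by o.o_custkey;   -- join and group-by share o_custkey; count the MapReduce stages

    set hive.optimize.correlation=true;
    -- rerun the same explain: a merged plan should show fewer stages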
Hello,
I'd like to run a few TPC-H queries on Hive 0.12. I've found the TPC-H
scripts here:
https://issues.apache.org/jira/browse/HIVE-600, but noticed that these
scripts were generated a long time ago. Since Hive could not support the
full SQL-92 specification, some queries were split into
smaller s
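For context, the splitting in those older scripts usually replaces a nested or correlated subquery (which early Hive could not parse) with an intermediate table that a second statement then joins against. A hedged illustration of the pattern, loosely modeled on Q17:

    -- step 1: materialize what would otherwise be a correlated subquery
    create table q17_tmp as
    select l_partkey, 0.2 * avg(l_quantity) as t_avg_quantity
    from lineitem
    group by l_partkey;

    -- step 2: join the intermediate result instead of nesting it
    select sum(l.l_extendedprice) / 7.0 as avg_yearly
    from lineitem l join q17_tmp t on (l.l_partkey = t.l_partkey)
    where l.l_quantity < t.t_avg_quantity;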
Hi all,
I'm using Hive 0.12. I have a file that contains 10 integer columns, stored
in ORC format. The ORC file is zlib-compressed and indexing is enabled.
I'm running a simple select count(*) with a predicate of the form (col1 = 0
OR col2 = 0, etc.). The predicate touches all 10 columns but its selectivity
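In case it is useful to others on the list, ORC's row-group indexes are only consulted when predicate pushdown is enabled; a sketch of the setting plus the shape of query described above (table and column names hypothetical):

    set hive.optimize.index.filter=true;   -- push the filter down into the ORC reader
    select count(*)
    from ints_orc
    where col1 = 0 or col2 = 0 or col3 = 0;
    -- with an OR across columns, a row group can be skipped only when the min/max
    -- statistics rule out every branch of the disjunction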
Hi all,
Does anyone know if Hive 0.7 or 0.8 can work with Hadoop 0.21.0 or 0.22.0?
Thanks,
Avrilia
Hi,
I have a question related to the Hadoop counters when RCFile is used.
I have 16 TB of (uncompressed) data stored in compressed RCFile format. The size
of the compressed RCFile is approximately 3 TB.
I ran a simple scan query on this table. Each split is 256 MB (HDFS block
size).
From the counters
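For what it is worth, assuming the splits line up with the 256 MB blocks of the ~3 TB compressed file, the expected numbers would be roughly:

    3 TB / 256 MB per split = (3 * 1024 * 1024 MB) / 256 MB ≈ 12,288 map tasks
    HDFS_BYTES_READ ≈ the compressed size (~3 TB), since decompression happens after the bytes are read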
would come to a normal join only after the map join attempt fails.
> AFAIK, if the number of buckets is the same or a multiple between the two tables
> involved in a join, and if the join is on the same columns that are bucketed,
> then with bucketmapjoin enabled it shouldn't execute a p
Hi,
I have two tables with 8 buckets each on the same key and want to join them.
I ran "explain extended" and get the plan produced by HIVE which shows that a
map-side join is a possible plan.
I then set in my script the hive.optimize.bucketmapjoin option to true and
reran the "explain extended
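For completeness, the flag by itself only permits the conversion; in Hive 0.12 the bucketed map join is typically requested with a MAPJOIN hint (or via auto conversion), and bucketing must have been enforced when the tables were populated. A sketch with hypothetical table and column names:

    set hive.optimize.bucketmapjoin=true;
    -- both tables clustered by the join key into 8 buckets when they were loaded
    select /*+ MAPJOIN(b) */ a.key, a.val, b.val
    from big_bucketed a join small_bucketed b on (a.key = b.key);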
e average of the whole
> data, then you can use sampling to avoid scanning the whole table. For
> partitioned tables, use dynamic partitions to load data from the source table into
> the target table on the fly.
>
>
> Hope it helps!
>
> Regards
> Bejoy.K.S
>
>
>
>
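A sketch of the two suggestions above, with hypothetical table and column names: bucket sampling reads only a fraction of a bucketed table, and a dynamic-partition insert routes rows to partitions on the fly.

    -- read one of 8 buckets instead of scanning the whole table
    select avg(metric) from source_bucketed tablesample (bucket 1 out of 8 on id);

    -- load the target's partitions dynamically from the source data
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    insert overwrite table target partition (dt)
    select id, metric, dt from source;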
Hello,
I have a question regarding the execution of some queries on bucketed tables.
I've created a compressed bucketed table using the following statement:
create external table partRC (P_PARTKEY BIGINT, P_NAME STRING, P_MFGR
STRING, P_BRAND STRING, P_TYPE STRING, P_SIZE INT, P_CONTAINER STRING,
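Since the statement above is cut off in the archive, here is a hedged sketch of the general shape such a DDL takes; the schema is trimmed, and the bucket count, location, and compression settings are illustrative only:

    create external table partrc_sketch (p_partkey bigint, p_name string)
    clustered by (p_partkey) into 32 buckets
    stored as rcfile
    location '/user/hive/partrc_sketch';

    -- when populating it, enforce the bucketing and compress the output
    set hive.enforce.bucketing=true;
    set hive.exec.compress.output=true;
    insert overwrite table partrc_sketch select p_partkey, p_name from part_staging;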
Hi,
I'd like to know what the current status of indexing in Hive is. What I've
found so far is that the user has to manually set the index table for each
query. Something like this:
insert overwrite directory "/tmp/index_result" select `_bucketname`
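For readers of the thread, the manual workflow being described is the compact-index pattern: build the index, dump the matching (`_bucketname`, `_offsets`) pairs to a directory, and point the query at that directory. A hedged sketch (table, column, and index names hypothetical; details vary by release):

    create index key_idx on table src (key)
    as 'COMPACT' with deferred rebuild;
    alter index key_idx on src rebuild;

    -- dump the index entries that match the predicate
    insert overwrite directory "/tmp/index_result"
    select `_bucketname`, `_offsets` from default__src_key_idx__ where key = 100;

    -- tell the query to read only the referenced blocks
    set hive.index.compact.file=/tmp/index_result;
    set hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
    select key, value from src where key = 100;

As far as I know, later releases can apply such indexes automatically when hive.optimize.index.filter is enabled, which removes the manual steps.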
Hi,
I want to convert data stored in a Hadoop sequence file to
BytesRefArrayWritable so that I can use RCFileOutputFormat and create an
RCFile.
My data contains integers, strings, and hashmaps. I guess I don't have to
write my own serializer/deserializer for these. I tried using the
ColumnarSerDe
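One way to sidestep writing the BytesRefArrayWritable conversion by hand, assuming the sequence-file data is already readable as a Hive table, is to let Hive rewrite it: declare a target table stored as RCFile with ColumnarSerDe and insert from the source table. Table and column names below are hypothetical.

    -- target RCFile table; ColumnarSerDe handles the columnar serialization
    create table events_rc (id int, name string, attrs map<string,string>)
    row format serde 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
    stored as rcfile;

    -- rewrite the sequence-file-backed table into RCFile
    insert overwrite table events_rc
    select id, name, attrs from events_seq;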