Furthermore, ORC (or Parquet) requires that the data be sorted on the filtering
column.
Hive also provides other relevant features, such as partitioning.
Best regards
> On 31. Dec 2017, at 04:28, Sachit Murarka wrote:
>
>
> Hello,
> I have seen some blog saying that Indexing is
Hello,
I have seen some blogs saying that indexing is not recommended and that we
can use the ORC format instead. Could you please advise?
I could not see any official declaration.
Kind Regards,
Sachit Murarka
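The ORC suggestion in this thread (sort on the filtering column, partition the table) can be sketched roughly as follows. This is a minimal sketch, not an official recommendation: the tables events_orc and events_raw, their columns, and the partition value are hypothetical names chosen for illustration. The properties orc.compress and orc.create.index and the settings hive.optimize.ppd and hive.optimize.index.filter are standard Hive/ORC options.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE events_orc (
  user_id BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY", "orc.create.index"="true");

-- Sort on the filtering column while loading, so that each stripe and
-- row group covers a narrow range of user_id values.
INSERT OVERWRITE TABLE events_orc PARTITION (dt='2017-12-31')
SELECT user_id, payload
FROM events_raw          -- hypothetical source table with a dt column
WHERE dt = '2017-12-31'
SORT BY user_id;

-- Let the ORC reader use its built-in min/max statistics to skip data.
SET hive.optimize.ppd=true;
SET hive.optimize.index.filter=true;

-- Partition pruning handles dt; ORC statistics handle user_id.
SELECT count(*) FROM events_orc
WHERE dt = '2017-12-31' AND user_id = 42;
```

Sorting matters because the ORC statistics are min/max ranges: if the filtering column is unsorted, most ranges overlap the predicate and little data can be skipped.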
Thank you, I will do that.
B
Subject: Re: Hive indexing optimization
From: jpullokka...@hortonworks.com
To: user@hive.apache.org
Date: Tue, 30 Jun 2015 18:46:50 +
The index doesn't seem to be kicking in in this case.
Please file a bug for this.
Thanks
John
From: Bennie Leo
To: user@hive.apache.org
Subject: RE: Hive indexing optimization
I've attached the output. Thanks.
B
Subject: Re: Hive indexing optimization
From: jpullokka...@hortonworks.com
To: user@hive.apache.org
Date: Mon, 29 Jun 2015 19:17:44 +
Could you post explain extended output?
From: Bennie Leo
Reply-To:
To: user@hive.apache.org
Subject: RE: Hive indexing optimization
Here is the explain output:
STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)
      Vertices:
        Map 1
putFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
Thank you,
B
> Subject: Re: Hive indexing optimiz
"SELECT StartIp, EndIp, Country FROM ipv4geotable” should have been
rewritten as a scan against the index table.
Bitmap indexes seem to support inequalities (<=, <, >=).
Post the explain plan.
On 6/26/15, 8:56 PM, "Gopal Vijayaraghavan" wrote:
>Hi,
>
>Hive indexes won't really help you speed up t
Hi,
Hive indexes won't really help you speed up that query right now, because
of the plan it generates due to the <= clauses.
> CREATE TABLE ipv4table
> AS
> SELECT logon.IP, ipv4.Country
> FROM
> (SELECT * FROM logontable WHERE isIpv4(IP)) logon
> LEFT OUTER JOIN
> (SELECT StartIp, EndIp, Country
veCompactIndexInputFormat;"
?
I don't know how I could include this within my current query.
Cheers,
B
Subject: Re: Hive indexing optimization
From: jpullokka...@hortonworks.com
To: user@hive.apache.org
Date: Fri, 26 Jun 2015 01:27:21 +
Set hive.optimize.index.filter=tru
e.org<mailto:user@hive.apache.org>"
mailto:user@hive.apache.org>>
Subject: Hive indexing optimization
Hi,
I am attempting to optimize a query using indexing. My current query converts
an ipv4 address to a country using a geolocation table. However, the
geolocation table is fairly large and the query takes an impractical amount of
time. I have created indexes and set the binary search
Hello,
Is it possible to create an index on table stored as ORC and compressed as
Snappy?
Does it make sense? I am wondering if Hive indexing is a mature functionality?
Thanks,
Alain
Hi,
For large tables, it takes a lot of time to load the indexes into the index
table. Is there any way we can reduce the index load time?
CREATE TABLE SE_TX_SUMMARY (COUNTY string, BLOCKGROUPID string, GROUPING_ID
int) PARTITIONED BY (EXPOSED_TIME int) row format delimited fields
terminated by '
Hi Guys,
We have a Hive 0.12 ORC table that is partitioned on year, month, day, hour
and is bucketed by one column.
So far so good - we are seeing good speed-ups compared to the
non-ORC format.
- Now we want to add an index on another commonly used column. My
question was - Give
ve a file that contains 10 integer columns stored in
> ORC format. The ORC file is zlib compressed and indexing is enabled.
> I'm running a simple select count(*) with a predicate of the form (Col1 =0 OR
> col2 = 0 etc). The predicate touches all 10 columns but its selectivity is 0
> (
Hi all,
I'm using hive-12. I have a file that contains 10 integer columns stored in
ORC format. The ORC file is zlib compressed and indexing is enabled.
I'm running a simple select count(*) with a predicate of the form (Col1 =0
OR col2 = 0 etc). The predicate touches all 10 colum
Hi,
I am new to Hive, and am trying to setup an index on a Hive table to
improve query performance.
I am presently using the CDH 4.2 Hadoop distribution, which ships with
Hive 0.10, so from what I have read table index support should be
available.
What I am seeing though is that when I go and cre
The stub of an Indexing user doc in the Hive wiki's Language Manual now
includes some simple examples, adapted from the test suite.
Would someone who uses Hive indexes please review it and make any necessary
corrections & additions? For example, I omitted examples of indexes on
pa
I have written a custom index handler and wanted to test it. However, Hive
is not using it.
So I tested with the simple table (pokes (int foo, string bar)) which comes with
the Hive distribution for testing purposes.
Then I created a compact index and set
hive.optimize.index.filter=true;
However, upon
I do not have answers to any of your questions, but I appreciate you raising
them. My team is very interested in Hive indexing as well, so I look forward to
this discussion.
Chuck Connell
Nuance R&D Data Team
Burlington, MA
From: John Omernik [mailto:j...@omernik.com]
Sent: Thursday, Jul
I am playing with Hive indexing and a little discouraged by the gap between
the potential seen and the amount of documentation around indexing. I am
running Hive 0.9 and started playing with indexing as follows:
I have a table logs that has a bunch of fields, but for this, let's say
three
.ql.index.compact.HiveCompactIndexInputFormat;
> SELECT a, count(*) from t where j='and' group by a;
>
> Since the semantics of this usage make you specify the compact file, I have
> not been able to figure out a way to use multiple indexes in the same query.
> In this ca
In this case we are using the index on j, the column in the where clause.
I hope you now understand why indexing in Hive is a work in progress :-)
Good luck!
Mark
Mark Grover, Business Intelligence Analyst
OANDA Corporation
www: oanda.com www: fxtrade.com
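For reference, the manual index usage discussed in this thread usually looks something like the sketch below. It reuses the names from the thread (table t, columns a and j, index x, the /tmp/index_result directory). The index table name default__t_x__ is an assumption based on Hive's usual naming for an index x on table t in the default database, and the whole pattern only applies to Hive releases that still ship the indexing feature.

```sql
-- Step 1: query the index table (name assumed: default__t_x__) and write
-- the matching file names and offsets to a directory.
INSERT OVERWRITE DIRECTORY "/tmp/index_result"
SELECT `_bucketname`, `_offsets`
FROM default__t_x__
WHERE j = 'and';

-- Step 2: point the compact index input format at that result.
SET hive.index.compact.file=/tmp/index_result;
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

-- Step 3: run the real query; only the blocks listed in the index
-- result are read.
SELECT a, count(*) FROM t WHERE j = 'and' GROUP BY a;
```

Because hive.index.compact.file takes a single location, this form ties each query to one index result, which is the limitation raised above about using multiple indexes in the same query.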
----- Original Message -----
: Zhaojun (Terry)
Subject: Re: Indexing in hive
Ransom,
From this JIRA (https://issues.apache.org/jira/browse/HIVE-1644), it looks like
automatic use of indexes using hive.optimize.index.filter was introduced in
Hive 0.8. However, Ranjith seems to be using Hive 0.7.1 which doesn't support
che.org
Cc: "Zhaojun (Terry)"
Sent: Wednesday, May 16, 2012 8:32:55 PM
Subject: RE: Indexing in hive
“hive.optimize.index.filter” is the setting that makes Hive use indexes automatically.
If you set hive.optimize.index.groupby=true,
it will set hive.optimize.index.filter=false.
See your configurat
Hi Ranjith,
Hive 0.7 supports the ability to build indexes, but the query compiler in
0.7 doesn't know how to optimize queries with these indexes. Hive 0.8 was
the first release to include some support for optimizing query plans with
indexes, and that only applies to GROUP BY and WHERE clauses und
“hive.optimize.index.filter” is the setting that makes Hive use indexes automatically.
If you set hive.optimize.index.groupby=true,
it will set hive.optimize.index.filter=false.
Check your configuration.
Also, you need to build the index after creating it.
Best regards
Ransom.
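The sequence described here (create the index, build it, then enable automatic use) can be sketched as below. It is a sketch under assumptions, not a verified recipe: the names t, j, a and x are borrowed from the question in this thread, automatic index use needs Hive 0.8 or later as noted elsewhere in the thread, and lowering hive.optimize.index.filter.compact.minsize is an extra step assumed here so that small test tables qualify (as far as I know the default threshold is about 5 GB). Indexing was removed altogether in Hive 3.0.

```sql
-- Create a compact index on column j of table t (names from the thread).
CREATE INDEX x ON TABLE t (j)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

-- With DEFERRED REBUILD the index table stays empty until it is built.
ALTER INDEX x ON t REBUILD;

-- Let the optimizer use the index for WHERE filters.
SET hive.optimize.index.filter=true;

-- Assumption: drop the minimum input size so that small tables also
-- qualify for automatic index use.
SET hive.optimize.index.filter.compact.minsize=0;

-- Check whether the plan now reads from the index table.
EXPLAIN SELECT a, count(*) FROM t WHERE j = 'and' GROUP BY a;
```

If the plan still shows a full scan of t, the index is not being picked up.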
I am currently using hive 0.7.1 and creating indexes based on columns in the
where clause. However, when I run the explain plan I do not see the index being
leveraged. The syntax that I am using to build the index is as follows:
CREATE INDEX x ON TABLE t(j)
AS 'org.apache.hadoop.hive.ql.index.c
oject+%3D+HIVE+AND+component+%3D+Indexing+ORDER+BY+priority+DESC&mode=hide
Now that so much work has been contributed in this area, it would be awesome if
someone could take on HIVE-1502 (doc updates).
JVS
On Oct 7, 2011, at 11:30 AM, Avrilia Floratou wrote:
> Hi,
>
> I'd like
Hi,
I'd like to know the current status of indexing in Hive. What I've
found so far is that the user has to manually set the index table for each
query. Something like this:
**
insert overwrite directory "/tmp/index_result&
hi, i'm looking at adding indexes to our hive tables, am wondering if anyone
can share some thoughts on it. is there a performance/space trade off
comparison or metrics? obviously it would be costly to index all columns in
your tables, so what types of columns are worth indexing?
thx!
options I need to set up?
Thanks again!
Mark
P.S: I am using Hive 0.7.1 with Hadoop 0.20
On 11-09-15 05:41 PM, yongqiang he wrote:
Question 1:
Indexing should work for both. But i suggest u use block compression.
Question 3 (and perhaps, the most important):
block based compression.
On T
>>Question 1:
Indexing should work for both. But i suggest u use block compression.
>>Question 3 (and perhaps, the most important):
block based compression.
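A rough sketch of what block compression for a SequenceFile table could look like on a Hadoop 0.20-era setup is shown below. The table names logs_seq and logs_staging and the index name logs_idx are hypothetical, and the codec is just one possible choice. The three SET properties are the standard Hive/Hadoop output compression settings of that era.

```sql
-- Compress query output, using block (not record) compression for
-- SequenceFiles.
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec;

-- Hypothetical tables: logs_seq is STORED AS SEQUENCEFILE and
-- logs_staging holds the uncompressed rows.
INSERT OVERWRITE TABLE logs_seq
SELECT * FROM logs_staging;

-- The index stores file names and offsets, so rebuild it after the
-- data files have been rewritten (logs_idx is a hypothetical index name).
ALTER INDEX logs_idx ON logs_seq REBUILD;
```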
On Thu, Sep 15, 2011 at 2:16 PM, Mark Grover wrote:
> Hi all,
> I've a question regarding compression and indexi
Hi all,
I've a question regarding compression and indexing.
I would like to compress our Hive data (currently stored as
SequenceFile). Also, I have an index on this table and would like to
maintain the index as well (i.e. keep using it).
Question 1:
Sequence file compression can be blo
on a side note, i'm looking at adding indexes to our hive tables as well, is
there a performance/space trade off comparison or metrics?
thx!
On Wed, Aug 3, 2011 at 10:52 AM, Siddharth Ramanan <
siddharth.rama...@gmail.com> wrote:
> Hi all,
> I have used compact index for my table and th
Thanks! Can Hive index LZO compressed files then? LZO compression isn't
part of Cloudera's release, right?
On 03/08/2011 19:38, yongqiang he wrote:
Unfortunately it does not, because .gz files cannot be split.
2011/8/3 Martin Konicek:
Hi,
can indexes work on gzipped files?
The index gets built
Hi all,
I have used a compact index for my table, and the response time is
now the same for a query with and without the index. Previously, it
showed an improvement. I just changed some parameters to increase the heap size,
and since then it has been behaving oddly. So, how can I make sure that my query is
u
Unfortunately it does not, because .gz files cannot be split.
2011/8/3 Martin Konicek:
> Hi,
>
> can indexes work on gzipped files?
>
> The index gets built without errors using
> ALTER INDEX syslog_index ON syslog PARTITION(dt='2011-08-03') REBUILD;
>
> but when querying, no results are returned (a
Hi,
can indexes work on gzipped files?
The index gets built without errors using
ALTER INDEX syslog_index ON syslog PARTITION(dt='2011-08-03') REBUILD;
but when querying, no results are returned (and no errors reported). The
query should be correct because with plaintext files it works.
Bes
The reduce percentage keeps fluctuating while the alter index command is
being run. The logs just give an "out of memory" error. After tweaking some
properties, the earlier exceptions don't appear now. Can anyone guide me
here? I have increased the heap space up to 4 GB and am still getting the same
ex
Hi,
I am adding the log information for a reduce task. I am running hadoop
in standalone mode.
2011-07-28 19:16:42,621 ERROR
org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher: Error during JDBC
connection to jdbc:derby:;databaseName=TempStatsStore;create=true.
java.lang.ClassNotFoundExc
Hi,
I have a table which has close to a billion rows. I am trying to
create an index for the table, but when I run the alter command, I always end up
with map-reduce jobs that fail with errors. The same command runs fine for small
tables, though. I also notice that the number of reducers is set to 24, even if set
Hi all,
I tried to index the LZO file but got the following error:
java.lang.ClassCastException:
com.hadoop.compression.lzo.LzopCodec$LzopDecompressor cannot be cast to
com.hadoop.compression.lzo.LzopDecompressor
at
46 matches