Re: Indexing

2017-12-30 Thread Jörn Franke
. Furthermore, ORC (or Parquet) requires that the data be sorted on the filtering column. Hive also provides other relevant features, such as partitioning. Best regards > On 31. Dec 2017, at 04:28, Sachit Murarka wrote: > > > Hello, > I have seen some blog saying that Indexing is

Indexing

2017-12-30 Thread Sachit Murarka
Hello, I have seen some blogs saying that indexing is not recommended, and that we can use the ORC format instead. Can you please provide a suggestion? I could not find any official declaration. Kind Regards, Sachit Murarka

RE: Hive indexing optimization

2015-06-30 Thread Bennie Leo
Thank you, I will do that. B Subject: Re: Hive indexing optimization From: jpullokka...@hortonworks.com To: user@hive.apache.org Date: Tue, 30 Jun 2015 18:46:50 + The index doesn't seem to be kicking in in this case. Please file a bug for this. Thanks John From: Bennie Leo Reply-To

Re: Hive indexing optimization

2015-06-30 Thread John Pullokkaran
PM To: "user@hive.apache.org" <user@hive.apache.org> Subject: RE: Hive indexing optimization I've attached the output. Thanks. B Subject: Re: Hive indexing optimization From: jpullokka...@horton

RE: Hive indexing optimization

2015-06-29 Thread Bennie Leo
I've attached the output. Thanks. B Subject: Re: Hive indexing optimization From: jpullokka...@hortonworks.com To: user@hive.apache.org Date: Mon, 29 Jun 2015 19:17:44 + Could you post explain extended output? From: Bennie Leo Reply-To:

Re: Hive indexing optimization

2015-06-29 Thread John Pullokkaran
<user@hive.apache.org> Subject: RE: Hive indexing optimization Here is the explain output: STAGE PLANS: Stage: Stage-1 Tez Edges: Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE) Vertices: Map 1

RE: Hive indexing optimization

2015-06-29 Thread Bennie Leo
putFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Thank you, B > Subject: Re: Hive indexing optimiz

Re: Hive indexing optimization

2015-06-27 Thread John Pullokkaran
"SELECT StartIp, EndIp, Country FROM ipv4geotable" should have been rewritten as a scan against the index table. Bitmap indexes seem to support inequalities (<=, <, >=). Post the explain plan. On 6/26/15, 8:56 PM, "Gopal Vijayaraghavan" wrote: >Hi, > >Hive indexes won't really help you speed up t

Re: Hive indexing optimization

2015-06-26 Thread Gopal Vijayaraghavan
Hi, Hive indexes won't really help you speed up that query right now, because of the plan it generates due to the <= clauses. > CREATE TABLE ipv4table > AS > SELECT logon.IP, ipv4.Country > FROM > (SELECT * FROM logontable WHERE isIpv4(IP)) logon > LEFT OUTER JOIN > (SELECT StartIp, EndIp, Country

RE: Hive indexing optimization

2015-06-26 Thread Bennie Leo
veCompactIndexInputFormat;" ? I don't know how I could include this within my current query. Cheers, B Subject: Re: Hive indexing optimization From: jpullokka...@hortonworks.com To: user@hive.apache.org Date: Fri, 26 Jun 2015 01:27:21 + Set hive.optimize.index.filter=tru

Re: Hive indexing optimization

2015-06-25 Thread John Pullokkaran
e.org" <user@hive.apache.org> Subject: Hive indexing optimization Hi, I am attempting to optimize a query using indexing. My current query converts an ipv4 address to a country using a geolocation table. However, the geolocation table is fairly large and

Hive indexing optimization

2015-06-25 Thread Bennie Leo
Hi, I am attempting to optimize a query using indexing. My current query converts an ipv4 address to a country using a geolocation table. However, the geolocation table is fairly large and the query takes an impractical amount of time. I have created indexes and set the binary search

Hive Indexing and ORC

2014-09-06 Thread Alain Petrus
Hello, Is it possible to create an index on a table stored as ORC and compressed with Snappy? Does it make sense? Is Hive indexing a mature feature? Thanks, Alain

Indexing in Hive

2014-04-18 Thread saquib khan
Hi, For large tables, it takes a lot of time to load the indexes in the index table. Is there any way we can reduce the index load time? CREATE TABLE SE_TX_SUMMARY (COUNTY string, BLOCKGROUPID string, GROUPING_ID int) PARTITIONED BY (EXPOSED_TIME int) row format delimited fields terminated by '

Indexing in Hive 0.12 on a partitioned and bucketed table

2014-03-20 Thread Sagar Mehta
Hi Guys, We have a Hive 0.12 ORC table that is partitioned on year, month, day, hour and is bucketed by one column. So far so good - We are seeing good speed-ups compared to the non-ORC format. - Now we want to add an index on another commonly used column. My question was - Give

Re: Predicate pushdown/indexing on ORC file

2013-11-07 Thread Prasanth Jayachandran
ve a file that contains 10 integer columns stored in > ORC format. The ORC file is zlib compressed and indexing is enabled. > I'm running a simple select count(*) with a predicate of the form (Col1 =0 OR > col2 = 0 etc). The predicate touches all 10 columns but its selectivity is 0 > (

Predicate pushdown/indexing on ORC file

2013-11-07 Thread Avrilia Floratou
Hi all, I'm using hive-12. I have a file that contains 10 integer columns stored in ORC format. The ORC file is zlib compressed and indexing is enabled. I'm running a simple select count(*) with a predicate of the form (Col1 =0 OR col2 = 0 etc). The predicate touches all 10 colum

Help me understand Hive indexing.

2013-11-06 Thread Heller, Chris
Hi, I am new to Hive, and am trying to set up an index on a Hive table to improve query performance. I am presently using the CDH 4.2 Hadoop distribution, which ships with Hive 0.10, so from what I have read table index support should be available. What I am seeing though is that when I go and cre

Review & improvement request: Hive indexing doc

2013-06-28 Thread Lefty Leverenz
The stub of an Indexing user doc in the Hive wiki's Language Manual now includes some simple examples, adapted from the test suite. Would someone who uses Hive indexes please review it and make any necessary corrections & additions? For example, I omitted examples of indexes on pa

Problem with indexing in Hive

2012-07-26 Thread Ablimit Aji
I have written a custom index handler and wanted to test it. However, Hive is not using it. So I tested with the simple table (pokes (int foo, string bar)) that comes with the Hive distribution for testing purposes. Then I created a compact index and set hive.optimize.index.filter=true. However, upon

RE: Hive 0.9 and Indexing

2012-07-26 Thread Connell, Chuck
I do not have answers to any of your questions, but I appreciate you raising them. My team is very interested in Hive indexing as well, so I look forward to this discussion. Chuck Connell Nuance R&D Data Team Burlington, MA From: John Omernik [mailto:j...@omernik.com] Sent: Thursday, Jul

Hive 0.9 and Indexing

2012-07-26 Thread John Omernik
I am playing with Hive indexing and am a little discouraged by the gap between the potential seen and the amount of documentation around indexing. I am running Hive 0.9 and started playing with indexing as follows: I have a table logs that has a bunch of fields but for this, let's say three

Re: Indexing in hive

2012-05-16 Thread Ranjith
.ql.index.compact.HiveCompactIndexInputFormat; > SELECT a, count(*) from t where j='and' group by a; > > Since the semantics of this usage make you specify the compact file, I have > not been able to figure out a way to use multiple indexes in the same query. > In this ca

Re: Indexing in hive

2012-05-16 Thread Mark Grover
In this case we are using the index on j, the column in the where clause. I hope you now understand why indexing in Hive is a work in progress:-) Good luck! Mark Mark Grover, Business Intelligence Analyst OANDA Corporation www: oanda.com www: fxtrade.com - Original Message ---

Re: Indexing in hive

2012-05-16 Thread Raghunath, Ranjith
: Zhaojun (Terry) Subject: Re: Indexing in hive Ransom, From this JIRA (https://issues.apache.org/jira/browse/HIVE-1644), it looks like automatic use of indexes using hive.optimize.index.filter was introduced in Hive 0.8. However, Ranjith seems to be using Hive 0.7.1 which doesn't support

Re: Indexing in hive

2012-05-16 Thread Mark Grover
che.org Cc: "Zhaojun (Terry)" Sent: Wednesday, May 16, 2012 8:32:55 PM Subject: RE: Indexing in hive "hive.optimize.index.filter" is the conf that makes Hive use indexes automatically. If you set hive.optimize.index.groupby=true, it will set hive.optimize.index.filter=false. See your configurat

Re: Indexing in hive

2012-05-16 Thread Carl Steinbach
Hi Ranjith, Hive 0.7 supports the ability to build indexes, but the query compiler in 0.7 doesn't know how to optimize queries with these indexes. Hive 0.8 was the first release to include some support for optimizing query plans with indexes, and that only applies to GROUP BY and WHERE clauses und

RE: Indexing in hive

2012-05-16 Thread Hezhiqiang (Ransom)
"hive.optimize.index.filter" is the conf that makes Hive use indexes automatically. If you set hive.optimize.index.groupby=true, it will set hive.optimize.index.filter=false. Check your configuration. You also need to build the index after creating it. Best regards, Ransom.
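
The advice above can be sketched in HiveQL roughly as follows. This is only an illustrative sketch: it assumes Hive 0.8 or later (versions with the legacy index feature and automatic index use), borrows the table t, column j, and index name x from the original question in this thread, and uses the 'COMPACT' shorthand for the compact index handler.

```sql
-- Sketch only: t, j, a and x are the names from the original question.
-- Create the index definition without populating it yet.
CREATE INDEX x ON TABLE t(j) AS 'COMPACT' WITH DEFERRED REBUILD;

-- Build the index; a created but unbuilt index holds no data.
ALTER INDEX x ON t REBUILD;

-- Ask the optimizer to use indexes for WHERE filters.
-- Per the message above, leave hive.optimize.index.groupby unset here,
-- since enabling it is reported to disable the filter optimization.
SET hive.optimize.index.filter=true;

-- Check the plan to see whether the index is picked up.
EXPLAIN SELECT a, count(*) FROM t WHERE j='and' GROUP BY a;
```

Whether the index actually appears in the plan depends on the Hive version and on the query shape, so the EXPLAIN output is the thing to check.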

Indexing in hive

2012-05-16 Thread Raghunath, Ranjith
I am currently using hive 0.7.1 and creating indexes based on columns in the where clause. However, when I run the explain plan I do not see the index being leveraged. The syntax that I am using to build the index is as follows: CREATE INDEX x ON TABLE t(j) AS 'org.apache.hadoop.hive.ql.index.c

Re: Indexing

2011-10-10 Thread John Sichi
oject+%3D+HIVE+AND+component+%3D+Indexing+ORDER+BY+priority+DESC&mode=hide Now that so much work has been contributed in this area, it would be awesome if someone could take on HIVE-1502 (doc updates). JVS On Oct 7, 2011, at 11:30 AM, Avrilia Floratou wrote: > Hi, > > I'd like

Indexing

2011-10-07 Thread Avrilia Floratou
Hi, I'd like to know what's the current status of indexing in hive. What I've found so far is that the user has to manually set the index table for each query. Sth like this: ** insert overwrite directory "/tmp/index_result&

hive indexing

2011-09-30 Thread Shouguo Li
hi, i'm looking at adding indexes to our hive tables, am wondering if anyone can share some thoughts on it. is there a performance/space trade off comparison or metrics? obviously it would be costly to index all columns in your tables, so what types of columns are worth indexing? thx!

Re: Compression and Indexing

2011-09-16 Thread Mark Grover
options I need to set up? Thanks again! Mark P.S: I am using Hive 0.7.1 with Hadoop 0.20 On 11-09-15 05:41 PM, yongqiang he wrote: Question 1: Indexing should work for both. But i suggest u use block compression. Question 3 (and perhaps, the most important): block based compression. On T

Re: Compression and Indexing

2011-09-15 Thread yongqiang he
>>Question 1: Indexing should work for both. But i suggest u use block compression. >>Question 3 (and perhaps, the most important): block based compression. On Thu, Sep 15, 2011 at 2:16 PM, Mark Grover wrote: > Hi all, > I've a question regarding compression and indexi

Compression and Indexing

2011-09-15 Thread Mark Grover
Hi all, I have a question regarding compression and indexing. I would like to compress our Hive data (currently stored as SequenceFile). Also, I have an index on this table and would like to maintain the index as well (i.e. keep using it). Question 1: Sequence file compression can be blo

Re: Indexing Help

2011-08-05 Thread Shouguo Li
on a side note, i'm looking at adding indexes to our hive tables as well, is there a performance/space trade off comparison or metrics? thx! On Wed, Aug 3, 2011 at 10:52 AM, Siddharth Ramanan < siddharth.rama...@gmail.com> wrote: > Hi all, > I have used compact index for my table and th

Re: Indexing .gz files

2011-08-03 Thread Martin Konicek
Thanks! Can Hive index LZO compressed files then? LZO compression isn't part of Cloudera's release, right? On 03/08/2011 19:38, yongqiang he wrote: unfortunately it does not, because .gz files cannot be split. 2011/8/3 Martin Konicek: Hi, can indexes work on gzipped files? The index gets built

Indexing Help

2011-08-03 Thread Siddharth Ramanan
Hi all, I have used a compact index for my table, and the response time is now the same for a query with and without the index. Previously, it was showing improvement. I just changed some parameters to increase heap size and then it started behaving strangely. So, how can I make sure that my query is u

Re: Indexing .gz files

2011-08-03 Thread yongqiang he
Unfortunately it does not, because .gz files cannot be split. 2011/8/3 Martin Konicek : > Hi, > > can indexes work on gzipped files? > > The index gets built without errors using > ALTER INDEX syslog_index ON syslog PARTITION(dt='2011-08-03') REBUILD; > > but when querying, no results are returned (a

Indexing .gz files

2011-08-03 Thread Martin Konicek
Hi, can indexes work on gzipped files? The index gets built without errors using ALTER INDEX syslog_index ON syslog PARTITION(dt='2011-08-03') REBUILD; but when querying, no results are returned (and no errors reported). The query should be correct because it works with plaintext files. Bes

Re: Indexing help

2011-08-01 Thread Siddharth Ramanan
The reduce percentage keeps fluctuating while the alter index command is running. After tweaking some properties, the earlier exceptions no longer appear; the logs just give an "out of memory error". Can anyone guide me here? I have increased the heap space up to 4 GB, but I am still getting the same ex

Re: Indexing help

2011-07-28 Thread Siddharth Ramanan
Hi, I am adding the log information for a reduce task. I am running hadoop in standalone mode. 2011-07-28 19:16:42,621 ERROR org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher: Error during JDBC connection to jdbc:derby:;databaseName=TempStatsStore;create=true. java.lang.ClassNotFoundExc

Indexing help

2011-07-28 Thread Siddharth Ramanan
Hi, I have a table which has close to a billion rows. I am trying to create an index for the table, but when I run the alter command, I always end up with map-reduce jobs with errors. The same runs fine for small tables, though. I also notice that the number of reducers is set to 24, even if set

Error while indexing the LZO file

2011-07-27 Thread Ankit Jain
Hi all, I tried to index the LZO file but got the following error: java.lang.ClassCastException: com.hadoop.compression.lzo.LzopCodec$LzopDecompressor cannot be cast to com.hadoop.compression.lzo.LzopDecompressor at