Re: Re: The build-in indexes in ORC file does not work.

2016-03-20 Thread Joseph
le number is 800, each of them is about 51M. My query statement is: select count(*) from gprs where terminal_type = 25080; select * from gprs where terminal_type = 25080; In the gprs table, the "terminal_type" column's value is in [0, 25066] Joseph From: Jörn Franke Date: 2016-0

Re: The build-in indexes in ORC file does not work.

2016-03-19 Thread Jörn Franke
minal_type = 25080; > select * from gprs where terminal_type = 25080; > > In the gprs table, the "terminal_type" column's value is in [0, 25066] > > Joseph > > From: Jörn Franke > Date: 2016-03-16 19:26 > To: Joseph > CC: user; user > Subject: Re

Re: The build-in indexes in ORC file does not work.

2016-03-19 Thread Jörn Franke
How much data are you querying? What is the query? How selective is it supposed to be? What is the block size? > On 16 Mar 2016, at 11:23, Joseph wrote: > > Hi all, > > I have known that ORC provides three levels of indexes within each file: file > level, stripe level, and row level. > The fi

Re: The build-in indexes in ORC file does not work.

2016-03-19 Thread Gopal Vijayaraghavan
> I love to see these ORC table optimization help but it is not obvious to > me under what circumstances they bear fruit. Are you using Tez or LLAP? Your explain plans are clearly missing the optimizations I've added as part of Stinger.next. https://github.com/apache/hive/blob/master/ql/src/test/

Re: The build-in indexes in ORC file does not work.

2016-03-19 Thread Mich Talebzadeh
; Sent with Good (www.good.com) > -- > *From:* Joseph > *Sent:* Wednesday, March 16, 2016 9:46:25 AM > *To:* user > *Cc:* user; user > *Subject:* Re: Re: The build-in indexes in ORC file does not work. > > > terminal_type =0, 260,000,000 rows, almost cov

Re: Re: The build-in indexes in ORC file does not work.

2016-03-19 Thread Joseph
y way to check the use of stats? Joseph From: Gopal Vijayaraghavan Date: 2016-03-16 22:18 To: user@hive.apache.org CC: Joseph Subject: Re: The build-in indexes in ORC file does not work. > I have tried bloom filter, but it makes no improvement. I know about > tez, but never use, I will try it

Re: The build-in indexes in ORC file does not work.

2016-03-19 Thread Mich Talebzadeh
Hi Gopal, I am using Hive 2 on Spark 1.3.1 engine. OK, this is only a test table. What would be the best way to create this table in Hive as ORC format? Thanks Dr Mich Talebzadeh

Re: The build-in indexes in ORC file does not work.

2016-03-18 Thread Mich Talebzadeh
I love to see these ORC table optimization help but it is not obvious to me under what circumstances they bear fruit. Case in point. I have an ORC table with 100 Million rows created as follows: CREATE TABLE `dummy`( `id` int, `clustered` int, `scattered` int, `randomised` int, `random_

Re: The build-in indexes in ORC file does not work.

2016-03-18 Thread Gopal Vijayaraghavan
> I have tried bloom filter, but it makes no improvement. I know about > tez, but never use, I will try it later. ... >select count(*) from gprs where terminal_type=25080; > will not scan data > Time taken: 353.345 seconds CombineInputFormat does not do any split-elimination, so MapRed
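
The remark about split elimination can be tried from a Hive session. The following is a minimal sketch, not taken from the thread: it assumes the query runs on the MapReduce engine with Hive's default combining input format in effect. The property names are standard Hive settings, but whether they change the timing on this particular table is an assumption to verify.

```sql
-- Assumption: the session currently uses the default CombineHiveInputFormat,
-- which the message above says performs no split elimination.
-- Switching to the plain HiveInputFormat is one way to test that claim.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- Enable predicate pushdown, and let Hive hand the filter to the storage
-- layer so the ORC reader can consult its built-in statistics.
SET hive.optimize.ppd=true;
SET hive.optimize.index.filter=true;

-- Query from the thread: 25080 lies outside the reported value range
-- [0, 25066], so min/max statistics could in principle rule out every file.
SELECT count(*) FROM gprs WHERE terminal_type = 25080;
```

Comparing the elapsed time and the number of map tasks before and after these settings gives a rough indication of whether any files or stripes were skipped.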

Re: The build-in indexes in ORC file does not work.

2016-03-16 Thread Mich Talebzadeh
Hi, The parameters that control the stripe and row group are configurable via the ORC creation script CREATE TABLE dummy ( ID INT , CLUSTERED INT , SCATTERED INT , RANDOMISED INT , RANDOM_STRING VARCHAR(50) , SMALL_VC VARCHAR(10) , PADDING VARCHAR(10) ) CLUSTERED BY (ID) INT
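
For illustration, the stripe and row-group settings mentioned above are normally passed as ORC table properties. The sketch below is not the script from this message (which is truncated here): the table name `dummy_sketch` is hypothetical, bucketing is left out, and every value is an illustrative assumption rather than a recommendation.

```sql
-- Hypothetical table reusing the column list from the message above.
CREATE TABLE dummy_sketch (
  id            INT,
  clustered     INT,
  scattered     INT,
  randomised    INT,
  random_string VARCHAR(50),
  small_vc      VARCHAR(10),
  padding       VARCHAR(10)
)
STORED AS ORC
TBLPROPERTIES (
  "orc.compress"             = "SNAPPY",     -- compression codec
  "orc.create.index"         = "true",       -- write row-level indexes
  "orc.stripe.size"          = "268435456",  -- stripe size in bytes (256 MB here)
  "orc.row.index.stride"     = "10000",      -- rows per row group (index entry)
  "orc.bloom.filter.columns" = "id",         -- columns that get a bloom filter
  "orc.bloom.filter.fpp"     = "0.05"        -- bloom filter false-positive rate
);
```

A smaller `orc.row.index.stride` gives finer-grained min/max statistics at the cost of a larger index. Whether that helps depends on how well the filtered column is sorted or clustered in the data.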