Re: understanding execution semantics...

ashish tapdiya Tue, 08 Dec 2015 21:34:35 -0800

Hello James,

Thanks for the response. I will look into adding relevant TPC-H indexes.


We are doing a performance study comparing Impala, Phoenix using TPC-H
queries. For scale factor 1, I am not able to get numbers exhibiting trend
similar to performance graph on our website comparing Impala and Phoenix
.

I have few questions regarding the performance graph comparing Phoenix and
Impala for aggregation query (select count(1))

Q1: Was the Impala data stored in parquet/text format?

Q2: What and how performance parameters (hadoop, hbase) were tuned for
phoenix.

Q3. Can I get the same population script so that I can report numbers from
the local cluster.

Thanks,
Ashish

On Thu, Dec 3, 2015 at 5:35 PM, James Taylor <jamestay...@apache.org> wrote:

> Hi Ashish,
> Please make sure to add a secondary index on l_shipdate + l_discount + 
> l_quantity
> for that query. There's a kind of well-known, canonical description of the
> optimal indexes to add for the TPC benchmarks that I can't seem to find
> (but maybe someone else can point you to it). You should include the time
> to call resultset.next() as that call will block until the result is
> calculated. Also, subsequent runs of the query will always be faster due to
> various caching that happens (OS, HBase block cache, etc.) as well as the
> JIT compiler kicking in.
>
> Thanks,
> James
>
> On Thu, Dec 3, 2015 at 12:34 PM, ashish tapdiya <ashishtapd...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am profiling TPC-H queries using Phoenix. For query no. 6 and db size
>> 1GB (lineitem table size is around 760 MB),
>>
>> Query 6 : select sum(l_extendedprice * l_discount) as revenue from
>> lineitem_sf1 where l_shipdate >= TO_DATE('1993-01-01') and l_shipdate <
>> TO_DATE('1994-01-01') and l_discount between 0.06 and 0.08 and l_quantity <
>> 24
>>
>> the execution time recorded using following code:
>>
>> long startTime = System.currentTimeMillis();
>> rset = stmt.executeQuery();
>> long stopTime = System.currentTimeMillis();
>> long elapsedTime = stopTime - startTime;
>>
>> first run - 240 ms
>> second run onwards - 80 ms
>>
>> However, when I iterate the result set (single row), query response time
>> including result set iteration shoots up to 19 seconds.
>>
>> Does the query gets executed with stmt.executeQuery() or phoenix does not
>> execute query until first resultset.next() is invoked?
>>
>> Cluster includes 4 slaves nodes. Phoenix version is 4.3.0
>>
>> Thanks,
>> ~Ashish
>>
>
>

Re: understanding execution semantics...

Reply via email to