On Tue, Dec 8, 2015 at 9:33 PM, ashish tapdiya <ashishtapd...@gmail.com> wrote:
> Hello James,
>
> Thanks for the response. I will look into adding relevant TPC-H indexes.
>
> We are doing a performance study comparing Impala and Phoenix using TPC-H
> queries. For scale factor 1, I am not able to get numbers exhibiting a trend
> similar to the performance graph on our website comparing Impala and Phoenix.
>
> I have a few questions regarding the performance graph comparing Phoenix and
> Impala for the aggregation query (select count(1)):
>
> Q1: Was the Impala data stored in parquet/text format?

No, the data is stored in HBase for both Impala and Phoenix, as stated by the
title "Phoenix vs Impala (running over HBase)".

> Q2: Which performance parameters (hadoop, hbase) were tuned for phoenix,
> and how?

No tuning, but I'd recommend doing a major compaction prior to running the
queries.

> Q3: Can I get the same population script so that I can report numbers from
> the local cluster?

You can use our bin/performance.py script to generate the data.

> Thanks,
> Ashish
>
> On Thu, Dec 3, 2015 at 5:35 PM, James Taylor <jamestay...@apache.org>
> wrote:
>
>> Hi Ashish,
>> Please make sure to add a secondary index on l_shipdate + l_discount +
>> l_quantity for that query. There's a kind of well-known, canonical
>> description of the optimal indexes to add for the TPC benchmarks that I
>> can't seem to find (but maybe someone else can point you to it). You
>> should include the time to call resultset.next() as that call will block
>> until the result is calculated. Also, subsequent runs of the query will
>> always be faster due to various caching that happens (OS, HBase block
>> cache, etc.) as well as the JIT compiler kicking in.
>>
>> Thanks,
>> James
>>
>> On Thu, Dec 3, 2015 at 12:34 PM, ashish tapdiya <ashishtapd...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am profiling TPC-H queries using Phoenix. For query no. 6 and db size
>>> 1GB (lineitem table size is around 760 MB),
>>>
>>> Query 6 : select sum(l_extendedprice * l_discount) as revenue from
>>> lineitem_sf1 where l_shipdate >= TO_DATE('1993-01-01') and l_shipdate <
>>> TO_DATE('1994-01-01') and l_discount between 0.06 and 0.08 and l_quantity <
>>> 24
>>>
>>> the execution time was recorded using the following code:
>>>
>>> long startTime = System.currentTimeMillis();
>>> rset = stmt.executeQuery();
>>> long stopTime = System.currentTimeMillis();
>>> long elapsedTime = stopTime - startTime;
>>>
>>> first run - 240 ms
>>> second run onwards - 80 ms
>>>
>>> However, when I iterate the result set (single row), query response time
>>> including result set iteration shoots up to 19 seconds.
>>>
>>> Does the query get executed with stmt.executeQuery(), or does phoenix
>>> not execute the query until the first resultset.next() is invoked?
>>>
>>> The cluster includes 4 slave nodes. Phoenix version is 4.3.0.
>>>
>>> Thanks,
>>> ~Ashish
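
Since the advice above is to include the time spent in resultset.next() in the measurement, here is a minimal sketch of how the timing code might be restructured. It uses only standard java.sql calls; the names QueryTiming, Timing, and timeQuery are hypothetical and not part of Phoenix, and how the work is split between executeQuery() and next() may differ by query and driver version.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Phoenix): times a query end to end,
// including the result set iteration that the original snippet left out.
final class QueryTiming {

    /** Elapsed time of the two phases, in milliseconds. */
    static final class Timing {
        final long executeMs; // time spent inside executeQuery()
        final long iterateMs; // time spent draining the result set via next()

        Timing(long executeMs, long iterateMs) {
            this.executeMs = executeMs;
            this.iterateMs = iterateMs;
        }

        /** Total response time: the number to report for the query. */
        long totalMs() {
            return executeMs + iterateMs;
        }
    }

    static Timing timeQuery(Connection conn, String sql) throws SQLException {
        long start = System.nanoTime();
        try (PreparedStatement stmt = conn.prepareStatement(sql);
             ResultSet rs = stmt.executeQuery()) {
            long afterExecute = System.nanoTime();

            // next() can block until the server has computed the result,
            // so the clock must keep running until every row is consumed.
            while (rs.next()) {
                rs.getObject(1);
            }
            long afterIterate = System.nanoTime();

            return new Timing(
                    TimeUnit.NANOSECONDS.toMillis(afterExecute - start),
                    TimeUnit.NANOSECONDS.toMillis(afterIterate - afterExecute));
        }
    }
}
```

With a connection obtained from DriverManager.getConnection(...) using your own cluster's Phoenix JDBC URL, QueryTiming.timeQuery(conn, sql).totalMs() would give the end-to-end figure. Running the query a few times and discarding the first run should help separate cache and JIT warm-up effects from steady-state numbers.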