James, the schema is pretty simple. Here it is (I have renamed some of the actual column names):
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME     | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX | NULLABLE | COLUMN_DEF | SQ |
+-----------+-------------+------------+-------------+-----------+---------------+-------------+---------------+----------------+----------------+----------+------------+----+
| null      | null        | table_name | TIMESTAMP   | -5        | BIGINT        | null        | null          | null           | null           | 0        | null       |
| null      | null        | table_name | ID          | 12        | VARCHAR       | 255         | null          | null           | null           | 0        | null       |
| null      | null        | table_name | TEXT_FIELD  | 12        | VARCHAR       | 255         | null          | null           | null           | 1        | null       |
| null      | null        | table_name | USER_ID     | 12        | VARCHAR       | 255         | null          | null           | null           | 0        | null       |
| null      | null        | table_name | TEXT_FIELD  | 12        | VARCHAR       | 25523       | null          | null           | null           | 1        | null       |
| null      | null        | table_name | TYPE        | 12        | VARCHAR       | 255         | null          | null           | null           | 1        | null       |
| null      | null        | table_name | COUNT_1     | 4         | INTEGER       | null        | null          | null           | null           | 1        | null       |
| null      | null        | table_name | COUNT_2     | 4         | INTEGER       | null        | null          | null           | null           | 1        | null       |
| null      | null        | table_name | COUNT_3     | 4         | INTEGER       | null        | null          | null           | null           | 1        | null       |
| null      | null        | table_name | COUNT_4     | -5        | BIGINT        | null        | null          | null           | null           | 1        | null       |
| null      | null        | table_name | COUNT_5     | -5        | BIGINT        | null        | null          | null           | null           | 1        | null       |
| null      | null        | table_name | COUNT_6     | -5        | BIGINT        | null        | null          | null           | null           | 1        | null       |
| null      | null        | table_name | TAGS        | 2003      | VARCHAR_ARRAY | null        | null          | null           | null           | 1        | null       |
| null      | null        | table_name | UPDATED     | -5        | BIGINT        | null        | null          | null           | null           | 1        | null       |
| null      | null        | table_name | SOME_FIELD  | 12        | VARCHAR       | 255         | null          | null           | null           | 1        | null       |
| null      | null        | table_name | LOCATIONS   | 12        | VARCHAR       | 255         | null          | null           | null           | 1        | null       |
+-----------+-------------+------------+-------------+-----------+---------------+-------------+---------------+----------------+----------------+----------+------------+----+

*Query:*
SELECT USER_ID FROM HJK_SI_LEAD_FEED WHERE ID='507449491025170432';

On Sat, Sep 6, 2014 at 10:15 AM, James Taylor <jamestay...@apache.org> wrote:
> Vikas,
> Please post your schema and query.
> Thanks,
> James
>
> On Fri, Sep 5, 2014 at 9:18 PM, Vikas Agarwal <vi...@infoobjects.com> wrote:
> > Ours is also a single-node setup right now, and as of now there are fewer
> > than 1 million rows, which is expected to grow to around 100m at minimum.
> >
> > I am aware of secondary indexes, but when I am querying on the primary/row
> > key, why would it take so much time?
> >
> > I am querying directly using sqlline for Phoenix and the hbase shell for
> > the HBase query. I am not expecting to do any fine tuning for such a small
> > dataset. I am assuming a minimum performance level out of the box.
> >
> > On Friday, September 5, 2014, yeshwanth kumar <yeshwant...@gmail.com> wrote:
> >>
> >> hi vikas,
> >>
> >> we used phoenix on a 4 core/23GB machine, as a single-node setup.
> >> used HDP 2.1
> >> our table has 50-70M rows,
> >> a select on that table took less than 2 seconds.
> >> aggregation queries took less than 8 seconds.
> >> for achieving good performance we created a secondary index on the table.
> >>
> >> make sure you fine-tuned hbase;
> >> enabling compression on the data makes a difference in response.
> >> distribute the data and load over all regions in hbase,
> >> and look at the performance tips mentioned in the phoenix blog.
> >>
> >> -yeshwanth
> >>
> >> Cheers,
> >> Yeshwanth
> >>
> >> On Fri, Sep 5, 2014 at 5:42 PM, Vikas Agarwal <vi...@infoobjects.com>
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Preface: We are testing phoenix using the Hortonworks distribution for
> >>> HBase on an Amazon EC2 instance (r3.large, 2 CPU/15 GB RAM).
> >>>
> >>> In contrast to the performance benchmarks, I found Phoenix to be very
> >>> slow in querying even on the primary key or row key. So, I tried
> >>> increasing the RAM for HBase and Phoenix, and increased the CPU and RAM
> >>> by upgrading the EC2 machine type to r3.xlarge (4 CPU, 30 GB RAM). The
> >>> results were like this:
> >>>
> >>> Time taken to return the result of a query on the row key:
> >>> With Storm running and very little RAM available: 50 sec
> >>> With Storm stopped and RAM available to Phoenix and HBase: 18 sec
> >>> With a new machine of the next higher category (4 CPU and 30 GB RAM): 8 sec
> >>> Pure HBase query by row key with Storm stopped and (2 CPU, 15 GB RAM):
> >>> 0.0150 seconds. :)
> >>>
> >>> So, the difference seems to be many-fold over what native HBase is
> >>> giving us. I am not able to understand how this can be possible. What
> >>> am I missing here?
> >>>
> >>> --
> >>> Regards,
> >>> Vikas Agarwal
> >>> 91 – 9928301411
> >>>
> >>> InfoObjects, Inc.
> >>> Execution Matters
> >>> http://www.infoobjects.com
> >>> 2041 Mission College Boulevard, #280
> >>> Santa Clara, CA 95054
> >>> +1 (408) 988-2000 Work
> >>> +1 (408) 716-2726 Fax
> >>
> >
> > --
> > Regards,
> > Vikas Agarwal
> > 91 – 9928301411
> >
> > InfoObjects, Inc.
> > Execution Matters
> > http://www.infoobjects.com
> > 2041 Mission College Boulevard, #280
> > Santa Clara, CA 95054
> > +1 (408) 988-2000 Work
> > +1 (408) 716-2726 Fax

--
Regards,
Vikas Agarwal
91 – 9928301411

InfoObjects, Inc.
Execution Matters
http://www.infoobjects.com
2041 Mission College Boulevard, #280
Santa Clara, CA 95054
+1 (408) 988-2000 Work
+1 (408) 716-2726 Fax
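[Editor's note: a sketch of one diagnostic worth running here, not from the thread itself. In Phoenix, a WHERE clause on the full primary key should compile to a point lookup; if ID is not (the leading column of) the table's PRIMARY KEY constraint, the same query silently becomes a full table scan, which would explain multi-second response times. EXPLAIN shows which plan you are getting, and a secondary index on ID, as yeshwanth suggests, restores an indexed lookup. The index name below is hypothetical.]

```
-- Check the plan Phoenix will use for the point query:
EXPLAIN SELECT USER_ID FROM HJK_SI_LEAD_FEED WHERE ID = '507449491025170432';
-- A plan containing "FULL SCAN OVER HJK_SI_LEAD_FEED" means ID is not the
-- leading part of the row key; a point-lookup plan means it is.

-- If ID is not the row key, a covered secondary index makes the lookup fast
-- (IDX_LEAD_FEED_ID is a hypothetical name; INCLUDE covers USER_ID so the
-- query is answered entirely from the index):
CREATE INDEX IDX_LEAD_FEED_ID ON HJK_SI_LEAD_FEED (ID) INCLUDE (USER_ID);
```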