I don't have experience running Phoenix in AWS. Andrew Purtell is a good person to ask. I'm curious if our support under their EMR helps in any way: http://phoenix.apache.org/phoenix_on_emr.html
Thanks,
James

On Sat, Sep 6, 2014 at 12:27 AM, Alex Kamil <alex.ka...@gmail.com> wrote:
> not sure, that's not my experience with phoenix, but if you have unstable
> network connection to your storage (which EBS is well known for) it may
> affect the results
>
> On Sat, Sep 6, 2014 at 3:14 AM, Vikas Agarwal <vi...@infoobjects.com> wrote:
>>
>> Of course, I can do a lot of optimizations. However, my concern is
>> what I am missing that is causing Phoenix to perform badly while, at
>> exactly the same time, HBase is giving results amazingly fast.
>>
>> On Sat, Sep 6, 2014 at 12:41 PM, Alex Kamil <alex.ka...@gmail.com> wrote:
>>>
>>> well it is still network attached. If you allocate enough heap to fit the
>>> whole thing in memory (in hbase/conf/hbase-env.sh) you could probably
>>> eliminate this as a possible reason
>>>
>>> On Sat, Sep 6, 2014 at 2:43 AM, Vikas Agarwal <vi...@infoobjects.com>
>>> wrote:
>>>>
>>>> EBS but with new generation SSD not magnetic one.
>>>>
>>>> On Sat, Sep 6, 2014 at 12:11 PM, Alex Kamil <alex.ka...@gmail.com>
>>>> wrote:
>>>>>
>>>>> do you use EBS or ephemeral storage, I found EBS performance to be
>>>>> somewhat unpredictable
>>>>>
>>>>> On Sat, Sep 6, 2014 at 2:37 AM, Vikas Agarwal <vi...@infoobjects.com>
>>>>> wrote:
>>>>>>
>>>>>> HBase is 0.98.0
>>>>>> Phoenix is 4.0
>>>>>>
>>>>>> On Sat, Sep 6, 2014 at 12:04 PM, Vikas Agarwal <vi...@infoobjects.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Yes, that is why it is a trouble for me. However, on the contrary, HBase
>>>>>>> shell is also on the same machine and same environment, so if it is an
>>>>>>> issue of resource (CPU or memory) it should have affected HBase too, but
>>>>>>> HBase is able to give me results within 0.0150 seconds. :(
>>>>>>>
>>>>>>> No, I haven't tested it outside AWS. I guess, it should not be the
>>>>>>> case due to much better performance by native HBase query on HBase
>>>>>>> shell.
>>>>>>>
>>>>>>> On Sat, Sep 6, 2014 at 11:59 AM, James Taylor
>>>>>>> <jamestay...@apache.org> wrote:
>>>>>>>>
>>>>>>>> Something is up in your environment. What version of Phoenix and
>>>>>>>> HBase are you using and in what environment? Have you tried this
>>>>>>>> locally, outside of AWS to compare?
>>>>>>>>
>>>>>>>> Take a look at our perf numbers, generated more-or-less daily, and
>>>>>>>> which run over more data than what you're testing against:
>>>>>>>>
>>>>>>>> http://phoenix-bin.github.io/client/performance/phoenix-20140904095313.htm
>>>>>>>>
>>>>>>>> Some of these are point queries and they take in the neighborhood of
>>>>>>>> 0.01 seconds.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> James
>>>>>>>>
>>>>>>>> On Fri, Sep 5, 2014 at 10:48 PM, Vikas Agarwal
>>>>>>>> <vi...@infoobjects.com> wrote:
>>>>>>>> > Missed to mention that count query (posted in my last mail) is
>>>>>>>> > also taking very long time to return the count.
>>>>>>>> >
>>>>>>>> > On Sat, Sep 6, 2014 at 11:17 AM, Vikas Agarwal
>>>>>>>> > <vi...@infoobjects.com>
>>>>>>>> > wrote:
>>>>>>>> >>
>>>>>>>> >> As I mentioned, schema is nothing but bunch of fields (some being
>>>>>>>> >> integers, longs and text) along with primary key (row key) and I
>>>>>>>> >> am making simple query to get result for a particular primary key,
>>>>>>>> >> nothing more than that.
>>>>>>>> >>
>>>>>>>> >> 0: jdbc:phoenix:localhost> SELECT count(1) FROM table_name;
>>>>>>>> >> +------------+
>>>>>>>> >> |  COUNT(1)  |
>>>>>>>> >> +------------+
>>>>>>>> >> | 4667515    |
>>>>>>>> >> +------------+
>>>>>>>> >> 1 row selected (132.11 seconds)
>>>>>>>> >>
>>>>>>>> >> On Sat, Sep 6, 2014 at 11:09 AM, Puneet Kumar Ojha
>>>>>>>> >> <puneet.ku...@pubmatic.com> wrote:
>>>>>>>> >>>
>>>>>>>> >>> If you can share the schema, data type, cardinality of each
>>>>>>>> >>> dimension and usual queries, I can help to design a schema with
>>>>>>>> >>> performance of less than 1 sec using Phoenix.
>>>>>>>> >>>
>>>>>>>> >>> Thanks
>>>>>>>> >>>
>>>>>>>> >>> ------ Original message------
>>>>>>>> >>> From: James Taylor
>>>>>>>> >>> Date: Sat, Sep 6, 2014 10:15 AM
>>>>>>>> >>> To: user;
>>>>>>>> >>> Subject: Re: Phoenix response time
>>>>>>>> >>>
>>>>>>>> >>> Vikas,
>>>>>>>> >>> Please post your schema and query.
>>>>>>>> >>> Thanks,
>>>>>>>> >>> James
>>>>>>>> >>>
>>>>>>>> >>> On Fri, Sep 5, 2014 at 9:18 PM, Vikas Agarwal
>>>>>>>> >>> <vi...@infoobjects.com>
>>>>>>>> >>> wrote:
>>>>>>>> >>> > Ours is also a single node setup right now and as of now there
>>>>>>>> >>> > are less than 1 million rows which is expected to grow around
>>>>>>>> >>> > 100m at minimum.
>>>>>>>> >>> >
>>>>>>>> >>> > I am aware of secondary indexes but when I am querying on
>>>>>>>> >>> > primary/row key, why would it take so much time?
>>>>>>>> >>> >
>>>>>>>> >>> > I am directly querying using sqlline for Phoenix and hbase
>>>>>>>> >>> > shell for HBase query. I am not expecting to do any fine tuning
>>>>>>>> >>> > for such small dataset.
>>>>>>>> >>> > I am assuming a minimum performance level out of the box.
>>>>>>>> >>> >
>>>>>>>> >>> > On Friday, September 5, 2014, yeshwanth kumar
>>>>>>>> >>> > <yeshwant...@gmail.com>
>>>>>>>> >>> > wrote:
>>>>>>>> >>> >>
>>>>>>>> >>> >> hi vikas,
>>>>>>>> >>> >>
>>>>>>>> >>> >> we used phoenix on a 4 core/23Gb machine, as a single node
>>>>>>>> >>> >> setup.
>>>>>>>> >>> >> used HDP 2.1
>>>>>>>> >>> >> our table has 50-70M rows,
>>>>>>>> >>> >> select on that table took less than 2 seconds.
>>>>>>>> >>> >> Aggregation queries took less than 8 seconds.
>>>>>>>> >>> >> for achieving good performance we created secondary index on
>>>>>>>> >>> >> the table.
>>>>>>>> >>> >>
>>>>>>>> >>> >> make sure you finetuned hbase,
>>>>>>>> >>> >> enabling compression on the data makes a difference in
>>>>>>>> >>> >> response.
>>>>>>>> >>> >> if u distribute the data and load over all regions in hbase,
>>>>>>>> >>> >> look at the performance tips mentioned in phoenix blog
>>>>>>>> >>> >>
>>>>>>>> >>> >> -yeshwanth
>>>>>>>> >>> >>
>>>>>>>> >>> >> Cheers,
>>>>>>>> >>> >> Yeshwanth
>>>>>>>> >>> >>
>>>>>>>> >>> >> On Fri, Sep 5, 2014 at 5:42 PM, Vikas Agarwal
>>>>>>>> >>> >> <vi...@infoobjects.com>
>>>>>>>> >>> >> wrote:
>>>>>>>> >>> >>>
>>>>>>>> >>> >>> Hi,
>>>>>>>> >>> >>>
>>>>>>>> >>> >>> Preface: We are testing phoenix using Hortonworks
>>>>>>>> >>> >>> distribution for HBase on Amazon EC2 instance (r3.large,
>>>>>>>> >>> >>> 2 CPU/15 GB RAM).
>>>>>>>> >>> >>>
>>>>>>>> >>> >>> In contrast to performance benchmarks, I found Phoenix to
>>>>>>>> >>> >>> be very slow in querying even on primary key or row key.
>>>>>>>> >>> >>> So, tried to increase the RAM for HBase and Phoenix and
>>>>>>>> >>> >>> increasing the CPU and RAM by upgrading the EC2 machine
>>>>>>>> >>> >>> type to r3.xlarge (4 CPU, 30 GB RAM). Results were like
>>>>>>>> >>> >>> this:
>>>>>>>> >>> >>>
>>>>>>>> >>> >>> Time taken in returning result of query on row key:
>>>>>>>> >>> >>> With Storm running and very less RAM available: 50 sec
>>>>>>>> >>> >>>
>>>>>>>> >>> >>> With Storm stopped and RAM available to Phoenix and HBase:
>>>>>>>> >>> >>> 18 sec
>>>>>>>> >>> >>>
>>>>>>>> >>> >>> With new machine of next higher category (4 CPU and 30 GB
>>>>>>>> >>> >>> RAM): 8 sec
>>>>>>>> >>> >>>
>>>>>>>> >>> >>> Pure HBase query by row key with Storm stopped and (2 CPU,
>>>>>>>> >>> >>> 15 GB RAM): 0.0150 seconds. :)
>>>>>>>> >>> >>>
>>>>>>>> >>> >>> So, the difference seems to be many fold of what native
>>>>>>>> >>> >>> HBase is providing to us. I am not able to understand how
>>>>>>>> >>> >>> it can be possible. What am I missing here?
>>>>>>>> >>> >>>
>>>>>>>> >>> >>> --
>>>>>>>> >>> >>> Regards,
>>>>>>>> >>> >>> Vikas Agarwal
>>>>>>>> >>> >>> 91 – 9928301411
>>>>>>>> >>> >>>
>>>>>>>> >>> >>> InfoObjects, Inc.
>>>>>>>> >>> >>> Execution Matters
>>>>>>>> >>> >>> http://www.infoobjects.com
>>>>>>>> >>> >>> 2041 Mission College Boulevard, #280
>>>>>>>> >>> >>> Santa Clara, CA 95054
>>>>>>>> >>> >>> +1 (408) 988-2000 Work
>>>>>>>> >>> >>> +1 (408) 716-2726 Fax
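
As a rough sanity check on the numbers quoted in this thread, the gap between the Phoenix and HBase shell timings can be worked out directly. The sketch below only restates figures from the messages above (50 sec, 18 sec, 8 sec, 0.0150 sec, and the COUNT(1) of 4667515 rows in 132.11 seconds). The variable names and scenario labels are made up for illustration, and it says nothing about the cause of the slowdown.

```python
# Timings reported in the thread, in seconds.
# HBase shell lookup by row key (2 CPU, 15 GB RAM, Storm stopped).
HBASE_ROW_KEY_LOOKUP = 0.0150

# Phoenix lookups by row key; the labels are shorthand for the three
# scenarios described in the original message.
phoenix_row_key_lookups = {
    "Storm running, very little RAM": 50.0,
    "Storm stopped": 18.0,
    "new machine (4 CPU, 30 GB RAM)": 8.0,
}


def slowdown(phoenix_seconds, hbase_seconds=HBASE_ROW_KEY_LOOKUP):
    """Return how many times slower the Phoenix lookup was."""
    return phoenix_seconds / hbase_seconds


slowdowns = {
    label: slowdown(seconds)
    for label, seconds in phoenix_row_key_lookups.items()
}

# COUNT(1) query from the sqlline session quoted above.
COUNT_ROWS = 4_667_515
COUNT_SECONDS = 132.11
count_rows_per_second = COUNT_ROWS / COUNT_SECONDS

for label, factor in slowdowns.items():
    print(f"{label}: about {factor:,.0f}x slower than the HBase shell lookup")
print(f"COUNT(1) scan rate: about {count_rows_per_second:,.0f} rows/second")
```

Even the fastest Phoenix case is several hundred times slower than the HBase shell lookup on the same data, which fits James's remark that something is up in the environment rather than this being a tuning matter.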