Additionally, I wanted to mention that I am currently running the query on a single machine with 3 GB of RAM, and the join query is taking around 6 seconds.
Thanks,
Udbhav Agarwal

From: Udbhav Agarwal
Sent: 13 March, 2015 12:45 PM
To: 'Akhil Das'
Cc: [email protected]
Subject: RE: spark sql performance

Okay Akhil! Thanks for the information.

Thanks,
Udbhav Agarwal

From: Akhil Das [mailto:[email protected]]
Sent: 13 March, 2015 12:34 PM
To: Udbhav Agarwal
Cc: [email protected]
Subject: Re: spark sql performance

Can't say that unless you try it.

Thanks
Best Regards

On Fri, Mar 13, 2015 at 12:32 PM, Udbhav Agarwal <[email protected]> wrote:

Sounds great! So can I expect a response time in milliseconds from the join query over this much data (0.5 million records in each table)?

Thanks,
Udbhav Agarwal

From: Akhil Das [mailto:[email protected]]
Sent: 13 March, 2015 12:27 PM
To: Udbhav Agarwal
Cc: [email protected]
Subject: Re: spark sql performance

So you can cache up to 8 GB of data in memory (hopefully the data size of one table is < 2 GB), and then it should be pretty fast with Spark SQL. Also, I'm assuming you have around 12-16 cores in total.

Thanks
Best Regards

On Fri, Mar 13, 2015 at 12:22 PM, Udbhav Agarwal <[email protected]> wrote:

Let's say I am using 4 machines with 3 GB of RAM. My data is customer records with 5 columns each, in two tables of 0.5 million records. I want to perform a join query on these two tables.

Thanks,
Udbhav Agarwal

From: Akhil Das [mailto:[email protected]]
Sent: 13 March, 2015 12:16 PM
To: Udbhav Agarwal
Cc: [email protected]
Subject: Re: spark sql performance

The size and type of your data, plus your cluster configuration, should be enough, I think.

Thanks
Best Regards

On Fri, Mar 13, 2015 at 12:07 PM, Udbhav Agarwal <[email protected]> wrote:

Thanks Akhil, what more information should I give so we can estimate the query time in my scenario?
Thanks,
Udbhav Agarwal

From: Akhil Das [mailto:[email protected]]
Sent: 13 March, 2015 12:01 PM
To: Udbhav Agarwal
Cc: [email protected]
Subject: Re: spark sql performance

That totally depends on your data size and your cluster setup.

Thanks
Best Regards

On Thu, Mar 12, 2015 at 7:32 PM, Udbhav Agarwal <[email protected]> wrote:

Hi,

What is the query time for a join query on HBase with Spark SQL? Say the tables in HBase have 0.5 million records each. I am expecting a query time (latency) in milliseconds with Spark SQL. Is this possible?

Thanks,
Udbhav Agarwal
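The sizing argument in this thread (cache up to 8 GB, one table hopefully under 2 GB) can be sanity-checked with a quick back-of-envelope calculation. Below is a minimal Python sketch. The average field size of 50 bytes is a hypothetical assumption, not a figure from the thread, and the real in-memory size depends on the column types and on how Spark stores cached tables. It only estimates whether the data fits in the cache budget; it says nothing about join latency.

```python
# Back-of-envelope check of the cache sizing discussed in the thread.
# ASSUMPTION: BYTES_PER_FIELD is a hypothetical average; real sizes depend on
# the column types and on Spark's in-memory storage format.

GB = 1024 ** 3

RECORDS_PER_TABLE = 500_000   # 0.5 million records per table (from the thread)
COLUMNS = 5                   # 5 columns per record (from the thread)
BYTES_PER_FIELD = 50          # hypothetical average size of one field, in bytes
CACHE_BYTES = 8 * GB          # "cache up to 8GB of data in memory" (from the thread)


def table_size_bytes(records: int, columns: int, bytes_per_field: int) -> int:
    """Rough uncompressed size of one table: records x columns x field size."""
    return records * columns * bytes_per_field


def fits_in_cache(cache_bytes: int, *table_bytes: int) -> bool:
    """True if all the given tables together fit within the cache budget."""
    return sum(table_bytes) <= cache_bytes


one_table = table_size_bytes(RECORDS_PER_TABLE, COLUMNS, BYTES_PER_FIELD)
print(f"one table: ~{one_table / GB:.3f} GB")
print("one table under 2 GB:", one_table < 2 * GB)
print("both tables fit in cache:", fits_in_cache(CACHE_BYTES, one_table, one_table))
```

Under these assumptions each table comes to roughly 0.12 GB, well below the 2 GB mentioned in the thread; even at ten times the assumed field size (about 1.16 GB per table), both tables would still fit within 8 GB.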
