RE: Joining by timestamp.

2014-07-22 Thread durga
Thanks, Chen.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Joining-by-timestamp-tp10367p10449.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

RE: Joining by timestamp.

2014-07-21 Thread durga
Hi Chen,

Thank you very much for your reply. I think I do not understand how I can do the join using the Spark API. If you have time, could you please write some code?

Thanks again,
D.

RE: Joining by timestamp.

2014-07-21 Thread Cheng, Hao
Actually, it's just a pseudo-algorithm I described; you can do it with the Spark API. Hope the algorithm is helpful.

-----Original Message-----
From: durga [mailto:durgak...@gmail.com]
Sent: Tuesday, July 22, 2014 11:56 AM
To: u...@spark.incubator.apache.org
Subject: RE: Joining by timestamp.

Hi

RE: Joining by timestamp.

2014-07-21 Thread durga
Hi Chen,

I am new to Spark as well as SparkSQL. Could you please explain how I would create a table and run a query on top of it? That would be super helpful.

Thanks,
D.
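The thread never shows the actual table setup, and the Spark 1.0-era SparkSQL API is not reproduced here. As a self-contained sketch of the general shape of the query being discussed (create two tables, then run a timestamp non-equi join), here is the same SQL expressed against sqlite3; the table names, column names, and the 60-second window are all illustrative assumptions, not from the thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events_a (id TEXT, ts INTEGER)")
conn.execute("CREATE TABLE events_b (id TEXT, ts INTEGER)")
conn.executemany("INSERT INTO events_a VALUES (?, ?)", [("a1", 100), ("a2", 500)])
conn.executemany("INSERT INTO events_b VALUES (?, ?)", [("b1", 130), ("b2", 900)])

# Non-equi join: pair rows whose timestamps are within 60 seconds of each other.
rows = conn.execute(
    """
    SELECT a.id, b.id, a.ts, b.ts
    FROM events_a a JOIN events_b b
      ON abs(a.ts - b.ts) <= 60
    """
).fetchall()
print(rows)  # → [('a1', 'b1', 100, 130)]
```

As Cheng Hao notes later in the thread, this join condition cannot use an equi-join's hash-based execution, which is why it degrades badly on large tables.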

RE: Joining by timestamp.

2014-07-21 Thread Cheng, Hao
This is a very interesting problem. SparkSQL supports non-equi joins, but they are very inefficient on large tables. One possible solution is to make both tables partition-based, with (cast(ds as bigint) / 240) as the partition key; then, within each partition of dataset1, you probably can write…
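The message is truncated, but the partition-key idea it starts to describe can be sketched in plain Python (not Spark). The bucket width 240 comes from the message; the 60-second join tolerance, the record shapes, and the neighbor-bucket scan are illustrative assumptions. Bucketing both datasets by ts // 240 means each record only needs to be compared against the other dataset's records in its own and adjacent buckets, avoiding a full cross product:

```python
from collections import defaultdict

BUCKET = 240     # seconds per bucket, from cast(ds as bigint) / 240
TOLERANCE = 60   # hypothetical join tolerance; must be <= BUCKET for the
                 # neighbor-bucket scan below to be sufficient

def bucketize(records):
    """Group (ts, payload) records by their bucket key."""
    buckets = defaultdict(list)
    for ts, payload in records:
        buckets[int(ts) // BUCKET].append((ts, payload))
    return buckets

def join_by_timestamp(ds1, ds2, tolerance=TOLERANCE):
    """Pair records whose timestamps differ by at most `tolerance` seconds."""
    b2 = bucketize(ds2)
    out = []
    for ts1, p1 in ds1:
        key = int(ts1) // BUCKET
        # A match within `tolerance` can only live in this bucket or a neighbor.
        for k in (key - 1, key, key + 1):
            for ts2, p2 in b2.get(k, []):
                if abs(ts1 - ts2) <= tolerance:
                    out.append((p1, p2, ts1, ts2))
    return out

ds1 = [(100, "a"), (500, "b")]
ds2 = [(130, "x"), (900, "y")]
print(join_by_timestamp(ds1, ds2))  # → [('a', 'x', 100, 130)]
```

In Spark terms this corresponds roughly to keying both RDDs by the bucket (emitting each record under its neighboring keys as well) and joining on that key, though the thread never spells that part out.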