Yeah, it is not related to the timezone. I think you hit this issue: <https://issues.apache.org/jira/browse/SPARK-3173>. It was fixed after the 1.1 release.
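Until you can pick up a build with that fix, one possible workaround on 1.1.0 is to apply the timestamp predicate on the RDD itself, bypassing the SQL parser entirely. A rough, untested sketch, reusing the T case class and the sample data from the thread below:

case class T(a: String, ts: java.sql.Timestamp)
val data = sc.parallelize(1325548800000L :: 1335548800000L :: Nil)
  .map(i => T(i.toString, new java.sql.Timestamp(i)))
// valueOf parses the string in the JVM's local timezone.
val cutoff = java.sql.Timestamp.valueOf("1970-01-01 00:00:00")
// Compare Timestamp objects directly; !before(cutoff) means ts >= cutoff,
// so no string-to-timestamp casting happens inside the SQL layer.
val result = data.filter(t => !t.ts.before(cutoff)).map(_.a).collect()
// expected: Array(1325548800000, 1335548800000)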
On Mon, Oct 13, 2014 at 11:24 AM, Mohammed Guller <moham...@glassbeam.com> wrote:

> Good guess, but that is not the reason. Look at this code:
>
> scala> val data = sc.parallelize(1325548800000L::1335548800000L::Nil).map(i=> T(i.toString, new java.sql.Timestamp(i)))
> data: org.apache.spark.rdd.RDD[T] = MappedRDD[17] at map at <console>:17
>
> scala> data.collect
> res3: Array[T] = Array(T(1325548800000,2012-01-02 16:00:00.0), T(1335548800000,2012-04-27 10:46:40.0))
>
> scala> data.registerTempTable("x")
>
> scala> val s = sqlContext.sql("select a from x where ts>='1970-01-01 00:00:00';")
> s: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[20] at RDD at SchemaRDD.scala:103
> == Query Plan ==
> == Physical Plan ==
> Project [a#2]
>  ExistingRdd [a#2,ts#3], MapPartitionsRDD[22] at mapPartitions at basicOperators.scala:208
>
> scala> s.collect
> res5: Array[org.apache.spark.sql.Row] = Array()
>
> Mohammed
>
> From: Yin Huai [mailto:huaiyin....@gmail.com]
> Sent: Monday, October 13, 2014 7:19 AM
> To: Mohammed Guller
> Cc: Cheng, Hao; Cheng Lian; user@spark.apache.org
> Subject: Re: Spark SQL parser bug?
>
> Seems the reason that you got "wrong" results was caused by timezone.
>
> The time in java.sql.Timestamp(long time) means "milliseconds since January 1, 1970, 00:00:00 GMT. A negative number is the number of milliseconds before January 1, 1970, 00:00:00 GMT."
>
> However, in ts>='1970-01-01 00:00:00', '1970-01-01 00:00:00' is using your local timezone.
>
> Thanks,
> Yin
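To make the timezone point above concrete, here is a small sketch that runs in a plain Scala shell, no Spark required. The printed values assume a machine on US Pacific time (GMT-8), which matches the 2012-01-02 16:00:00.0 rendering in the transcript above; other timezones will print different local times:

scala> val ts = new java.sql.Timestamp(1325548800000L)  // epoch millis, i.e. 2012-01-03 00:00:00 GMT
ts: java.sql.Timestamp = 2012-01-02 16:00:00.0          // toString renders the instant in the local timezone

scala> java.sql.Timestamp.valueOf("2012-01-03 00:00:00").getTime  // valueOf parses in the local timezone
res0: Long = 1325577600000                              // 8 hours (28800000 ms) after the GMT instant above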
> On Mon, Oct 13, 2014 at 9:58 AM, Mohammed Guller <moham...@glassbeam.com> wrote:
>
> Hi Cheng,
>
> I am using version 1.1.0.
>
> Looks like that bug was fixed sometime after 1.1.0 was released. Interestingly, I tried your code on 1.1.0 and it gives me a different (incorrect) result:
>
> case class T(a:String, ts:java.sql.Timestamp)
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.createSchemaRDD
> val data = sc.parallelize(10000::20000::Nil).map(i=> T(i.toString, new java.sql.Timestamp(i)))
> data.registerTempTable("x")
> val s = sqlContext.sql("select a from x where ts>='1970-01-01 00:00:00';")
>
> scala> s.collect
> res1: Array[org.apache.spark.sql.Row] = Array()
>
> Mohammed
>
> From: Cheng, Hao [mailto:hao.ch...@intel.com]
> Sent: Sunday, October 12, 2014 1:35 AM
> To: Mohammed Guller; Cheng Lian; user@spark.apache.org
> Subject: RE: Spark SQL parser bug?
>
> Hi, I couldn’t reproduce the bug with the latest master branch. Which version are you using? Can you also list data in the table “x”?
>
> case class T(a:String, ts:java.sql.Timestamp)
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.createSchemaRDD
> val data = sc.parallelize(10000::20000::Nil).map(i=> T(i.toString, new java.sql.Timestamp(i)))
> data.registerTempTable("x")
> val s = sqlContext.sql("select a from x where ts>='1970-01-01 00:00:00';")
> s.collect
>
> output:
> res1: Array[org.apache.spark.sql.Row] = Array([10000], [20000])
>
> Cheng Hao
>
> From: Mohammed Guller [mailto:moham...@glassbeam.com]
> Sent: Sunday, October 12, 2014 12:06 AM
> To: Cheng Lian; user@spark.apache.org
> Subject: RE: Spark SQL parser bug?
>
> I tried even without the “T” and it still returns an empty result:
>
> scala> val sRdd = sqlContext.sql("select a from x where ts >= '2012-01-01 00:00:00';")
> sRdd: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[35] at RDD at SchemaRDD.scala:103
> == Query Plan ==
> == Physical Plan ==
> Project [a#0]
>  ExistingRdd [a#0,ts#1], MapPartitionsRDD[37] at mapPartitions at basicOperators.scala:208
>
> scala> sRdd.collect
> res10: Array[org.apache.spark.sql.Row] = Array()
>
> Mohammed
>
> From: Cheng Lian [mailto:lian.cs....@gmail.com]
> Sent: Friday, October 10, 2014 10:14 PM
> To: Mohammed Guller; user@spark.apache.org
> Subject: Re: Spark SQL parser bug?
>
> Hmm, there is a “T” in the timestamp string, which makes the string not a valid timestamp string representation. Internally Spark SQL uses java.sql.Timestamp.valueOf to cast a string to a timestamp.
>
> On 10/11/14 2:08 AM, Mohammed Guller wrote:
>
> scala> rdd.registerTempTable("x")
>
> scala> val sRdd = sqlContext.sql("select a from x where ts >= '2012-01-01T00:00:00';")
> sRdd: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[4] at RDD at SchemaRDD.scala:103
> == Query Plan ==
> == Physical Plan ==
> Project [a#0]
>  ExistingRdd [a#0,ts#1], MapPartitionsRDD[6] at mapPartitions at basicOperators.scala:208
>
> scala> sRdd.collect
> res2: Array[org.apache.spark.sql.Row] = Array()
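As a quick check of Cheng Lian's point about the "T": java.sql.Timestamp.valueOf accepts only the JDBC timestamp escape format, yyyy-[m]m-[d]d hh:mm:ss[.f...], so the ISO-8601 "T" separator is rejected outright. A minimal sketch in a plain Scala shell (the exception message shown is the stock JDK one):

scala> java.sql.Timestamp.valueOf("2012-01-01 00:00:00")
res0: java.sql.Timestamp = 2012-01-01 00:00:00.0

scala> java.sql.Timestamp.valueOf("2012-01-01T00:00:00")
java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]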