Re: TimeStamp selection with SparkSQL

2014-09-05 Thread Brad Miller

Preprocessing (after loading the data into HDFS). I started with data in JSON format in text files (stored in HDFS), then loaded the data into Parquet files with a bit of preprocessing. Now I always retrieve the data by creating a SchemaRDD from the Parquet file and using the SchemaRDD to b

Re: TimeStamp selection with SparkSQL

2014-09-05 Thread Brad Miller

My approach may be partly influenced by my limited experience with SQL and Hive, but I just converted all my dates to seconds-since-epoch and then selected samples from specific time ranges using integer comparisons.

On Thu, Sep 4, 2014 at 6:38 PM, Cheng, Hao wrote:
> There are 2 SQL dialects,
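
A minimal sketch of the seconds-since-epoch approach described in this message, written in plain Python rather than Spark so it runs on its own. The record layout, the `ts` field name, the timestamp format, and the helper names `to_epoch_seconds` and `select_range` are all assumptions made for illustration; none of them come from the thread. Timestamps are assumed to be in UTC.

```python
from datetime import datetime, timezone


def to_epoch_seconds(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a UTC timestamp string into whole seconds since the Unix epoch."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def select_range(records, start, end):
    """Keep records whose 'ts' value lies in the half-open range [start, end)."""
    lo, hi = to_epoch_seconds(start), to_epoch_seconds(end)
    # Plain integer comparisons, so no date type support is needed at query time.
    return [r for r in records if lo <= r["ts"] < hi]


# Hypothetical sample records; 'ts' is stored as an integer during preprocessing.
records = [
    {"id": 1, "ts": to_epoch_seconds("2014-09-03 23:59:59")},
    {"id": 2, "ts": to_epoch_seconds("2014-09-04 00:00:00")},
    {"id": 3, "ts": to_epoch_seconds("2014-09-04 18:38:00")},
    {"id": 4, "ts": to_epoch_seconds("2014-09-05 00:00:00")},
]

selected = select_range(records, "2014-09-04 00:00:00", "2014-09-05 00:00:00")
```

Storing the timestamp as an integer column at preprocessing time means the same range filter could presumably be written as an ordinary numeric comparison in a SQL `WHERE` clause, which is the point of the workaround.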

RE: TimeStamp selection with SparkSQL

2014-09-04 Thread Cheng, Hao

There are 2 SQL dialects: one offers very basic SQL support and the other is HiveQL. In most cases I think people prefer HiveQL, which also means you have to use HiveContext instead of SQLContext. In the particular query you showed, it seems datatime is of type Date; unfortunately, ne