Preprocessing (after loading the data into HDFS).
I started with data in JSON format in text files (stored in HDFS), then loaded it into Parquet files with a bit of preprocessing. Now I always retrieve the data by creating a SchemaRDD from the Parquet files and using the SchemaRDD to b
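For reference, a minimal sketch of that flow against the Spark 1.1-era API (jsonFile / saveAsParquetFile / parquetFile); the HDFS paths, app name and table name below are placeholders, not the ones from my actual job:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("json-to-parquet"))
    val sqlContext = new SQLContext(sc)

    // One-time preprocessing: read the JSON text files from HDFS and persist them as Parquet.
    val rawJson = sqlContext.jsonFile("hdfs:///data/events/json")
    rawJson.saveAsParquetFile("hdfs:///data/events/parquet")

    // All later reads: build a SchemaRDD straight from the Parquet files and register it for SQL.
    val events = sqlContext.parquetFile("hdfs:///data/events/parquet")
    events.registerTempTable("events")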
My approach may be partly influenced by my limited experience with SQL and
Hive, but I just converted all my dates to seconds-since-epoch and then
selected samples from specific time ranges using integer comparisons.
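Concretely, that looks something like the following sketch (the column name ts and the date range are made up for illustration):

    // Assumes preprocessing stored the event time as seconds-since-epoch in a column named `ts`.
    val fmt = new java.text.SimpleDateFormat("yyyy-MM-dd")
    val start = fmt.parse("2014-08-01").getTime / 1000
    val end   = fmt.parse("2014-09-01").getTime / 1000

    // The time-range selection is then just an integer comparison.
    val sample = sqlContext.sql(
      s"SELECT * FROM events WHERE ts >= $start AND ts < $end")
    sample.collect().foreach(println)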
On Thu, Sep 4, 2014 at 6:38 PM, Cheng, Hao wrote:
There are two SQL dialects: one is a very basic SQL implementation and the other is Hive QL. In most cases I think people prefer HQL, which also means you have to use HiveContext instead of SQLContext.
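For example, a minimal HiveContext sketch, assuming the placeholder Parquet path and table name used earlier in the thread:

    import org.apache.spark.sql.hive.HiveContext

    // HiveContext extends SQLContext, so the Parquet SchemaRDD can be
    // registered and queried here as well.
    val hiveContext = new HiveContext(sc)
    hiveContext.parquetFile("hdfs:///data/events/parquet").registerTempTable("events")

    // Spark 1.0.x exposes Hive QL via hql(...); in 1.1+, sql(...) on a
    // HiveContext runs the HiveQL dialect by default.
    val rows = hiveContext.hql("SELECT COUNT(*) FROM events")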
In this particular query you showed, it seems datatime is of type Date; unfortunately, ne