I'm pretty sure you can use a timestamp as a partitionColumn. It's a
TIMESTAMP type in MySQL, which is at base a numeric type, and Spark
requires a numeric type to be passed in.

This doesn't work: the WHERE clause Spark sends to MySQL contains raw
numerics, which don't compare meaningfully against the MySQL TIMESTAMP
column.


> minTimeStamp = 1325605540   <-- This is wrong, but I'm not sure what to put in here.
> maxTimeStamp = 1505641420
> numPartitions = 20*7
>
> dt = spark.read \
>     .format("jdbc") \
>     .option("url", os.environ["JDBC_URL"]) \
>     .option("dbtable", "schema.table") \
>     .option("numPartitions", numPartitions) \
>     .option("partitionColumn", "Timestamp") \
>     .option("lowerBound", minTimeStamp) \
>     .option("upperBound", maxTimeStamp) \
>     .load()
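For context on what Spark does with those bounds: the JDBC reader splits the numeric range [lowerBound, upperBound) into numPartitions strides and interpolates the raw numbers into a per-partition WHERE clause. A simplified sketch of that logic (illustrative only, not Spark's exact code, which also routes NULLs into the first partition):

```python
# Simplified sketch of Spark's JDBC range partitioning (the real logic is in
# JDBCRelation.columnPartition); it shows why bare epoch integers end up in
# the generated WHERE clauses.
def partition_predicates(column, lower, upper, num_partitions):
    stride = (upper - lower) // num_partitions
    preds = []
    current = lower
    for i in range(num_partitions):
        clauses = []
        if i > 0:
            clauses.append(f"{column} >= {current}")
        current += stride
        if i < num_partitions - 1:
            clauses.append(f"{column} < {current}")
        preds.append(" AND ".join(clauses))
    return preds

preds = partition_predicates("Timestamp", 1325605540, 1505641420, 20 * 7)
# First predicate is e.g. "Timestamp < 1326891510" -- a bare epoch integer,
# which is exactly what gets compared against the TIMESTAMP column.
```

So whatever is passed as lowerBound/upperBound is interpolated verbatim, with no type-aware conversion for the MySQL side.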

MySQL DB schema:

> create table table
> (
>     EventId VARCHAR(50) not null primary key,
>     userid VARCHAR(200) null,
>     Timestamp TIMESTAMP(19) default CURRENT_TIMESTAMP not null,
>     Referrer VARCHAR(4000) null,
>     ViewedUrl VARCHAR(4000) null
> );
>
> create index Timestamp on Fact_PageViewed (Timestamp);
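A side note on why the raw numbers misbehave (my understanding of MySQL's type-conversion rules, worth verifying): in a comparison against a TIMESTAMP column, MySQL treats a bare number as a YYYYMMDDHHMMSS-style datetime value, not as epoch seconds, so the two scales don't line up at all:

```python
from datetime import datetime, timezone

# The epoch-seconds bound Spark interpolates into the WHERE clause:
epoch_bound = 1452916570

# The same instant written in MySQL's numeric datetime form
# (YYYYMMDDHHMMSS, shown here in UTC):
instant = datetime.fromtimestamp(epoch_bound, tz=timezone.utc)
mysql_numeric = int(instant.strftime("%Y%m%d%H%M%S"))

print(mysql_numeric)  # 20160116035610 -- a completely different scale than 1452916570
```

An epoch-seconds value is therefore never a valid bound for a TIMESTAMP comparison in MySQL.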

I'm obviously doing it wrong, but couldn't find anything obvious while
digging around.

The query that gets generated looks like this (not exactly; it's optimized
to include some upstream query parameters):

> SELECT `Timestamp`,`Referrer`,`EventId`,`UserId`,`ViewedUrl`
> FROM schema.table (Timestamp)
> WHERE Timestamp >= 1452916570 AND Timestamp < 1454202540;   <-- this
> doesn't query against the MySQL TIMESTAMP type meaningfully.
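One workaround that may apply here (an assumption on my part, not verified against this exact schema): keep the bounds numeric, but partition on a genuinely numeric column by registering a subquery as the JDBC "table" that exposes UNIX_TIMESTAMP(Timestamp). A sketch, where `spark`, the bounds, and `numPartitions` are as in the snippet above:

```python
# Hypothetical workaround: wrap the table in a subquery that adds an integer
# epoch-seconds column (ts_sec) and let Spark partition on that instead of
# the TIMESTAMP column itself.
dbtable = (
    "(SELECT EventId, userid, Timestamp, Referrer, ViewedUrl, "
    "UNIX_TIMESTAMP(Timestamp) AS ts_sec "
    "FROM schema.table) AS t"
)

options = {
    "dbtable": dbtable,
    "partitionColumn": "ts_sec",   # numeric, so the raw bounds compare correctly
    "lowerBound": "1325605540",
    "upperBound": "1505641420",
    "numPartitions": "140",
}

# With a live SparkSession this would read as:
# dt = spark.read.format("jdbc") \
#     .option("url", os.environ["JDBC_URL"]) \
#     .options(**options) \
#     .load()
```

The generated per-partition predicates then compare integers to integers (`ts_sec >= 1452916570 AND ts_sec < ...`), which MySQL evaluates as intended and can still push down.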


Thanks!

Gary Lucas
