I'm pretty sure you can use a timestamp as a partitionColumn. The column is a TIMESTAMP type in MySQL, which is at base a numeric type, and Spark requires a numeric type for the partition column.
This doesn't work: the bounds show up in the generated WHERE clause as raw numerics, which don't compare meaningfully against the MySQL TIMESTAMP column.

    minTimeStamp = 1325605540  # <-- This is wrong, but I'm not sure what to put in here.
    maxTimeStamp = 1505641420
    numPartitions = 20 * 7

    dt = spark.read \
        .format("jdbc") \
        .option("url", os.environ["JDBC_URL"]) \
        .option("dbtable", "schema.table") \
        .option("numPartitions", numPartitions) \
        .option("partitionColumn", "Timestamp") \
        .option("lowerBound", minTimeStamp) \
        .option("upperBound", maxTimeStamp) \
        .load()

MySQL DB schema:

    create table table
    (
        EventId   VARCHAR(50)   not null primary key,
        userid    VARCHAR(200)  null,
        Timestamp TIMESTAMP(19) default CURRENT_TIMESTAMP not null,
        Referrer  VARCHAR(4000) null,
        ViewedUrl VARCHAR(4000) null
    );

    create index Timestamp on Fact_PageViewed (Timestamp);

I'm obviously doing it wrong, but I couldn't find anything obvious while digging around. The query that gets generated looks like this (not exactly; it's optimized to include some upstream query parameters):

    SELECT `Timestamp`,`Referrer`,`EventId`,`UserId`,`ViewedUrl`
    FROM schema.table (Timestamp)
    WHERE Timestamp >= 1452916570 AND Timestamp < 1454202540;

That WHERE clause doesn't query against the MySQL timestamp type meaningfully: as far as I can tell, MySQL treats a bare number in such a comparison as a YYYYMMDDHHMMSS-style value, not a Unix epoch, so the predicates don't select the rows I intended.

Thanks!
Gary Lucas
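Update: one workaround I'm experimenting with is to push the conversion down to MySQL with a subquery, so Spark partitions on a genuinely numeric column. This is an untested sketch; the derived column name ts_num and the alias t are just names I made up:

    import os
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Expose the TIMESTAMP column as an integer Unix epoch so Spark's
    # numeric lowerBound/upperBound compare against a numeric column.
    subquery = """(SELECT EventId, UserId, Timestamp, Referrer, ViewedUrl,
                          UNIX_TIMESTAMP(Timestamp) AS ts_num
                   FROM schema.table) t"""

    dt = spark.read \
        .format("jdbc") \
        .option("url", os.environ["JDBC_URL"]) \
        .option("dbtable", subquery) \
        .option("numPartitions", 20 * 7) \
        .option("partitionColumn", "ts_num") \
        .option("lowerBound", 1325605540) \
        .option("upperBound", 1505641420) \
        .load()

The trade-off I can see is that MySQL presumably can't use the Timestamp index for range predicates on a computed column, so each partition's query may scan more than a native range on Timestamp would.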
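Along the same lines, the hardcoded epoch bounds could be read from the table instead of guessed. Again an untested sketch reusing the same connection options (lo, hi, and the alias b are my names):

    # Fetch min/max epoch values so the partition stride matches the data.
    bounds = spark.read \
        .format("jdbc") \
        .option("url", os.environ["JDBC_URL"]) \
        .option("dbtable",
                "(SELECT MIN(UNIX_TIMESTAMP(Timestamp)) AS lo, "
                "MAX(UNIX_TIMESTAMP(Timestamp)) AS hi "
                "FROM schema.table) b") \
        .load() \
        .first()

    # lowerBound/upperBound only shape the partition stride; rows outside
    # them still land in the first/last partition.
    lower, upper = bounds["lo"], bounds["hi"]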