It works perfectly. You can use pseudo columns like ROWNUM in Oracle and
RRN in DB2. To avoid skewing you can apply the great coalesce
function...Spark is sparkling..

Best,


On Thu, Dec 7, 2017 at 2:20 PM, Tomasz Dudek <megatrontomaszdu...@gmail.com>
wrote:

> Hey Ravion,
>
> yes, you can obviously specify other column than a primary key. Be aware
> though, that if the key range is not spread evenly (for example in your
> code, if there's a "gap" in primary keys and no row has id between 0 and
> 17220) some of the executors may not assist in loading data (because
> "SELECT * FROM orders WHERE order_id IS BETWEEN 0 AND 17220 will return an
> empty result). I think you might want to repartition afterwards to ensure
> that df is evenly distributed(<--- could somebody confirm my last sentence?
> I don't want to mislead and I am not sure).
>
> The first question - could you just check and provide us the answer? :)
>
> Cheers,
> Tomasz
>
> 2017-12-03 7:39 GMT+01:00 ☼ R Nair (रविशंकर नायर) <
> ravishankar.n...@gmail.com>:
>
>> Hi all,
>>
>> I am using a query to fetch data from MYSQL as follows:
>>
>> var df = spark.read.
>> format("jdbc").
>> option("url", "jdbc:mysql://10.0.0.192:3306/retail_db").
>> option("driver" ,"com.mysql.jdbc.Driver").
>> option("user", "retail_dba").
>> option("password", "cloudera").
>> option("dbtable", "orders").
>> option("partitionColumn", "order_id").
>> option("lowerBound", "1").
>> option("upperBound", "68883").
>> option("numPartitions", "4").
>> load()
>>
>> Question is, can I use a pseudo column (like ROWNUM in Oracle or
>> RRN(employeeno) in DB2) in option where I specify the "partitionColumn" ?
>> If not, can we specify a partition column which is not a primary key ?
>>
>> Best,
>> Ravion
>>
>>
>>
>>
>


--

Reply via email to