Hi Deenar,
Please find the SQL query below:
var SQL_RDD = new JdbcRDD(
  sc,
  () => DriverManager.getConnection(url, user, pass),
  "select col1, col2, col3, ... col37 from schema.Table LIMIT ? OFFSET ?",
  100, 0, 1,
  (r: ResultSet) => (r.getInt("col1"), r.getInt("col2"), ..., r.getInt("col37")))
When I use 100, 0, 1 as above, SQL_RDD.count is 100.
When set to 100, 0, 2, SQL_RDD.count is 151.
When set to 100, 0, 3, SQL_RDD.count is 201.
Whereas I expect the count to be 100 on every execution; please let me know
if I am missing anything here.
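
For what it's worth to later readers: JdbcRDD binds each partition's computed
lower and upper bound into the two placeholders, so with "LIMIT ? OFFSET ?"
those bounds land in the LIMIT and OFFSET slots rather than in a key range. A
minimal sketch of the partition arithmetic (my reading of Spark 1.2's
JdbcRDD.getPartitions, not the actual class; it assumes the table has well
over 200 rows, as stated further down the thread) reproduces the counts above:

```scala
// Mirror of how JdbcRDD derives per-partition (lower, upper) bounds.
// Each pair is then bound to the two ?s in order, so the query above
// effectively becomes "... LIMIT lower OFFSET upper" for every partition.
def partitionBounds(lower: Long, upper: Long, n: Int): Seq[(Long, Long)] = {
  val length = BigInt(1) + upper - lower          // here: 1 + 0 - 100 = -99
  (0 until n).map { i =>
    val start = lower + ((BigInt(i) * length) / n).toLong
    val end   = lower + ((BigInt(i + 1) * length) / n).toLong - 1
    (start, end)
  }
}

// On a table with well over 200 rows, "LIMIT l OFFSET u" returns l rows,
// so the total count is the sum of the per-partition lower bounds.
def totalCount(n: Int): Long =
  partitionBounds(100L, 0L, n).map(_._1).sum

// totalCount(1) == 100, totalCount(2) == 151 (100 + 51),
// totalCount(3) == 201 (100 + 67 + 34) -- matching the observations above.
```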
Regards,
Satish Chandra
On Thu, Sep 24, 2015 at 12:48 AM, Deenar Toraskar <
[email protected]> wrote:
> Satish
>
> Can you post the SQL query you are using?
>
> The SQL query must have two placeholders, and both of them should define an
> inclusive range (<= and >=).
>
> e.g. select title, author from books where ? <= id and id <= ?
>
> Are you doing this?
>
> Deenar
>
>
>
>
> *Think Reactive Ltd*
> [email protected]
> 07714140812
>
>
>
> On 23 September 2015 at 13:47, satish chandra j <[email protected]>
> wrote:
>
>> HI,
>> Could anybody provide inputs if they have come across a similar issue?
>>
>> @Rishitesh
>> Could you share any sample code for using JdbcRDDSuite?
>>
>>
>> Regards,
>> Satish Chandra
>>
>> On Wed, Sep 23, 2015 at 5:14 PM, Rishitesh Mishra <
>> [email protected]> wrote:
>>
>>> I am using Spark 1.5. I always get count = 100, irrespective of num
>>> partitions.
>>>
>>> On Wed, Sep 23, 2015 at 5:00 PM, satish chandra j <
>>> [email protected]> wrote:
>>>
>>>> HI,
>>>> Currently using Spark 1.2.2. Could you please let me know the output
>>>> count you got by running JdbcRDDSuite?
>>>>
>>>> Regards,
>>>> Satish Chandra
>>>>
>>>> On Wed, Sep 23, 2015 at 4:02 PM, Rishitesh Mishra <
>>>> [email protected]> wrote:
>>>>
>>>>> Which version of Spark are you using? I can get correct results
>>>>> using JdbcRDD. In fact, there is a test suite precisely for this
>>>>> (JdbcRDDSuite).
>>>>> I changed it according to your input and got correct results from this
>>>>> test suite.
>>>>>
>>>>> On Wed, Sep 23, 2015 at 11:00 AM, satish chandra j <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> HI All,
>>>>>>
>>>>>> JdbcRDD constructor has following parameters,
>>>>>>
>>>>>> *JdbcRDD
>>>>>> <https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/rdd/JdbcRDD.html>*
>>>>>> (SparkContext
>>>>>> <https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/SparkContext.html>
>>>>>> sc,
>>>>>> scala.Function0<java.sql.Connection> getConnection, String sql,
>>>>>> *long lowerBound, long upperBound, int numPartitions*,
>>>>>> scala.Function1<java.sql.ResultSet,T> mapRow,
>>>>>> scala.reflect.ClassTag<T> evidence$1)
>>>>>>
>>>>>> where *lowerBound* is the lower boundary of the entire data set,
>>>>>> *upperBound* the upper boundary, and *numPartitions* the number of
>>>>>> partitions.
>>>>>>
>>>>>> The source table from which JdbcRDD fetches data in the Oracle DB has
>>>>>> more than 500 records, but the results are confusing across several
>>>>>> executions in which I changed the "numPartitions" parameter:
>>>>>>
>>>>>> LowerBound,UpperBound,numPartitions: Output Count
>>>>>>
>>>>>> 0 ,100 ,1 : 100
>>>>>>
>>>>>> 0 ,100 ,2 : 151
>>>>>>
>>>>>> 0 ,100 ,3 : 201
>>>>>>
>>>>>>
>>>>>> Please help me understand why the output count is 151 when
>>>>>> numPartitions is 2 and 201 when numPartitions is 3.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Satish Chandra
>>>>>>
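
Closing the loop on Deenar's suggestion: with two inclusive-range placeholders
on a key column, the same partition arithmetic tiles the key range exactly
once, so the count no longer depends on numPartitions. A sketch, assuming an
illustrative numeric key column `id` and bounds 0..100 (neither is from the
original table):

```scala
// Query rewritten per the advice above: both ?s bind to an inclusive range
// on a numeric key column ("id" here is an assumed, illustrative name).
val fixedSql = "select col1, col2 from schema.Table where ? <= id and id <= ?"

// Same bound arithmetic as JdbcRDD: with lowerBound = 0, upperBound = 100,
// the per-partition slices tile [0, 100] with no gaps or overlaps.
def slices(lower: Long, upper: Long, n: Int): Seq[(Long, Long)] = {
  val length = BigInt(1) + upper - lower
  (0 until n).map { i =>
    ((lower + (BigInt(i) * length) / n).toLong,
     (lower + (BigInt(i + 1) * length) / n).toLong - 1)
  }
}

// slices(0, 100, 2) == Seq((0, 49), (50, 100)): each id is fetched once,
// so the count is the same for any numPartitions.
```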