[ 
https://issues.apache.org/jira/browse/BEAM-14161?focusedWorklogId=747453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-747453
 ]

ASF GitHub Bot logged work on BEAM-14161:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Mar/22 21:02
            Start Date: 24/Mar/22 21:02
    Worklog Time Spent: 10m 
      Work Description: pabloem commented on pull request #16863:
URL: https://github.com/apache/beam/pull/16863#issuecomment-1078291316


   Looking at some of the database monitoring, the workload looks pretty much 
the same - but this is just a simple test database that is not serving any 
extra load, so I am not really sure that this monitoring information supports 
any hypothesis:
   
   Reading from the database with non-splittable reads:
   
   
![image](https://user-images.githubusercontent.com/1301740/160008947-7b9a8f32-1352-4e2c-9306-245d7b6fac23.png)
   
   
   Reading the database with splittable reads:
   
   
![image](https://user-images.githubusercontent.com/1301740/160007694-653e83d5-06fc-434e-95cc-46d63e71497c.png)
   
   The test is relatively simple. We would probably need something more 
complicated to have a stronger indication of the tradeoffs here.



Issue Time Tracking
-------------------

            Worklog Id:     (was: 747453)
    Remaining Estimate: 0h
            Time Spent: 10m

> Add dynamic splitting to JdbcIO.readWithPartitions
> --------------------------------------------------
>
>                 Key: BEAM-14161
>                 URL: https://issues.apache.org/jira/browse/BEAM-14161
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-jdbc
>            Reporter: Pablo Estrada
>            Assignee: Jean-Baptiste Onofré
>            Priority: P2
>             Fix For: Not applicable
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, the JDBC IO is basically a {{DoFn}} executed with a {{ParDo}}, so 
> parallelism is limited and the read is executed on one executor. 
> ReadWithPartitions does some preliminary partitioning of the data, but any 
> skew in the data range or workload will still leave the work unbalanced.
>  
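
To make the skew concern in the description concrete, here is a minimal,
self-contained sketch. It is not Beam's JdbcIO code: the class name
RangePartitionSketch and both helper methods are hypothetical, and the
equal-width split over a numeric key range is an assumption made for
illustration. It shows how a fixed up-front split can leave one range holding
most of the rows when the keys are clustered.

```java
import java.util.Arrays;

// Hypothetical illustration only: this is NOT Beam's JdbcIO implementation.
final class RangePartitionSketch {

    /** Splits [lower, upper) into n contiguous ranges of (nearly) equal width. */
    static long[][] partitionBounds(long lower, long upper, int n) {
        if (n <= 0 || upper <= lower) {
            throw new IllegalArgumentException("need n > 0 and upper > lower");
        }
        long[][] bounds = new long[n][2];
        long width = upper - lower;
        for (int i = 0; i < n; i++) {
            bounds[i][0] = lower + width * i / n;        // inclusive start
            bounds[i][1] = lower + width * (i + 1) / n;  // exclusive end
        }
        return bounds;
    }

    /** Counts how many keys fall into each range; keys outside every range are ignored. */
    static int[] countPerPartition(long[] keys, long[][] bounds) {
        int[] counts = new int[bounds.length];
        for (long key : keys) {
            for (int i = 0; i < bounds.length; i++) {
                if (key >= bounds[i][0] && key < bounds[i][1]) {
                    counts[i]++;
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Skewed keys: most rows cluster at the low end of the range [0, 1000).
        long[] keys = new long[100];
        for (int i = 0; i < 90; i++) {
            keys[i] = i;            // 90 keys in [0, 90)
        }
        for (int i = 90; i < 100; i++) {
            keys[i] = i * 10L;      // 10 keys in [900, 1000)
        }
        long[][] bounds = partitionBounds(0, 1000, 4);
        // Prints [90, 0, 0, 10]: one range carries 90% of the rows.
        System.out.println(Arrays.toString(countPerPartition(keys, bounds)));
    }
}
```

With a static split like this, whichever worker takes the first range does
nearly all of the reading. That is the kind of imbalance the issue proposes to
address with dynamic splitting.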



