Hi haifang,

1. This may be caused by filters not being pushed down to Oracle correctly, or by the performance impact of writing to Iceberg with a parallelism of one. Could you please check the actual number of records written to Iceberg? Additionally, could you provide the version of the Iceberg connector and the SQL statements used for the write? This will help us investigate any potential planner issue.

2. Using yesterday's maximum id as the lower bound is also a good approach; a sketch follows below. By the way, for scenarios that require continuous ingestion, you can also try Flink CDC.
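Here is a minimal sketch of what I mean for point 2, assuming a BIGINT auto-incremented primary key. All names and values (oracle_src, iceberg_sink, BIG_WIDE_TABLE, the column names, the JDBC URL, and the concrete bounds) are placeholders, not your actual schema; since scan.partition.lower-bound and scan.partition.upper-bound take long values, yesterday's MAX(id) can go in directly:

CREATE TABLE oracle_src (
  id BIGINT,
  update_time TIMESTAMP(3)
  -- ... the remaining columns of the wide table
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:oracle:thin:@//host:1521/SERVICE',
  'table-name' = 'BIG_WIDE_TABLE',
  'scan.partition.column' = 'id',              -- numeric, so the long-typed bounds fit directly
  'scan.partition.num' = '16',                 -- number of parallel read splits
  'scan.partition.lower-bound' = '100000000',  -- e.g. the MAX(id) already loaded yesterday
  'scan.partition.upper-bound' = '120000000'   -- e.g. the current MAX(id)
);

INSERT INTO iceberg_sink
SELECT * FROM oracle_src
WHERE id > 100000000;  -- keep an explicit predicate too, so the filter is pushed down to Oracle

With the bounds and scan.partition.num set, each of the parallel source subtasks reads its own id range instead of one task scanning the whole table.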
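And to expand on my earlier mail quoted below: when the only usable column is a timestamp, since the partition bounds are long-typed, one option is to push the time filter down in the query itself (again only a sketch, with t as a placeholder column on the same hypothetical oracle_src table):

SELECT * FROM oracle_src
WHERE t > TIMESTAMP '2022-01-01 07:00:01.333';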
Best,
Jiabao

> On Jan 11, 2024, at 10:52, haifang luo <luohaifang1...@gmail.com> wrote:
>
> Hello JiaBao
> Thank you for your reply~
> This doesn't seem to solve my problem.
> My steps are:
> Every day I read the Oracle table (a very large wide table) by timestamp or auto-incremented primary key id, and write it to the Iceberg table.
> The timestamp or the id is the only filter condition; there are no other filters, and both are indexed columns of the Oracle table.
> 1. If I do not configure a partitioned scan, the job always runs with a parallelism of one. When I execute only the SELECT query, the job completes quickly.
> But when I write the result of the SELECT query to the Iceberg table, the JDBC connector scans the Oracle table from scratch and is very slow. Whether it ingests the entire table or only a filtered subset, it takes more than 7 hours to execute. I have checked that the read and write performance of the Oracle database itself is fine.
> 2. If I add a partitioned scan and filter the same amount of data from Oracle into the Iceberg table, it completes the scan very quickly and finishes execution.
> I can't figure out whether this is a problem with the Flink connector or with Iceberg.
>
> Jiabao Sun <jiabao....@xtransfer.cn> wrote on Wed, Jan 10, 2024, at 18:15:
>> Hi haifang,
>>
>> lower-bound and upper-bound are defined as long types, so it is difficult to fill in a timestamp value.
>> However, you may use WHERE t > TIMESTAMP '2022-01-01 07:00:01.333', as JDBC supports filter pushdown.
>>
>> Best,
>> Jiabao
>>
>> On 2024/01/10 08:31:23, haifang luo wrote:
>> > Hello~~
>> > My Flink version: 1.15.4
>> > [image: image.png]
>> > 'scan.partition.column' type is timestamp, how should I fill in
>> > 'scan.partition.lower-bound' and 'scan.partition.upper-bound'?
>> > Thank you for your reply~~
>> >