Re: Iceberg connector

2024-04-16 Thread Péter Váry
Hi Chetas, > the only way out to use only the DataStream API (and not the table api) if I want to use a custom splitComparator? You can use watermark generation, and with it, watermark-based split ordering, using the Table API. OTOH, currently there is no way to define a custom comparator using
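A minimal sketch of the idea behind watermark-based split ordering. It assumes each split carries the lower bound of the watermark column for each of its files, for example from per-file column statistics. The `Split` type and the helper functions are hypothetical illustrations, not Iceberg's API. The real Iceberg Flink source is configured through its own builder and read options and may differ in detail.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for a source split: an id plus the per-file
    lower bound of the watermark (event-time) column, e.g. epoch millis."""
    split_id: str
    file_lower_bounds: List[int]


def split_watermark(split: Split) -> int:
    # The split's watermark is the smallest lower bound among its files,
    # so no record in the split is older than this value.
    return min(split.file_lower_bounds)


def order_splits(splits: List[Split]) -> List[Split]:
    # Hand out splits oldest-first; ties are broken by split id so the
    # order is deterministic.
    return sorted(splits, key=lambda s: (split_watermark(s), s.split_id))


if __name__ == "__main__":
    splits = [
        Split("b", [3_000, 5_000]),
        Split("a", [1_000, 9_000]),
        Split("c", [2_000]),
    ]
    print([s.split_id for s in order_splits(splits)])  # ['a', 'c', 'b']
```

Ordering by the minimum rather than the maximum value keeps the emitted watermark from jumping ahead of records that a later split could still contain.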

Re: Iceberg connector

2024-04-16 Thread Chetas Joshi
Hi Péter, Great! Thanks! The resources are really useful. I don't have TABLE_EXEC_ICEBERG_USE_FLIP27_SOURCE set, so it is the FlinkSource

Re: Pyflink w Nessie and Iceberg in S3 Jars

2024-04-16 Thread Robert Prat
Hi Péter, Thanks for pointing this out! I was aware of the difference in version between pyflink and some of the JAR dependencies. I was starting out with PyFlink 1.16 and I had some errors when creating the Dockerfile that seemed to be fixed when upgrading the version to 1.18. Thus the resul

Re: Pyflink w Nessie and Iceberg in S3 Jars

2024-04-16 Thread Péter Váry
Is it intentional that you are using iceberg-flink-runtime-1.16-1.3.1.jar with PyFlink 1.18.0? This might cause issues later. I would try to keep the Flink version consistent across all the dependencies. On Tue, Apr 16, 2024, 11:23 Robert Prat wrote: > I finally managed to make it work followi
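One way to catch the kind of version drift described above before it causes runtime errors is to compare the Flink version encoded in each connector jar's file name against the PyFlink version. This is a hypothetical helper. It assumes the `iceberg-flink-runtime-<flink-minor>-<iceberg-version>.jar` naming seen in this thread. Other connectors use different naming schemes and would need their own patterns.

```python
import re
from typing import Dict, List

# Matches names like "iceberg-flink-runtime-1.16-1.3.1.jar"; the first
# captured version is the Flink minor version the runtime jar targets.
_ICEBERG_RUNTIME = re.compile(r"^iceberg-flink-runtime-(\d+\.\d+)-[\w.\-]+\.jar$")


def flink_minor(version: str) -> str:
    """Reduce a full Flink/PyFlink version such as '1.18.0' to '1.18'."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def mismatched_jars(pyflink_version: str, jar_names: List[str]) -> Dict[str, str]:
    """Return {jar name: targeted Flink minor} for Iceberg runtime jars whose
    Flink version differs from the PyFlink version; other jars are ignored."""
    expected = flink_minor(pyflink_version)
    problems: Dict[str, str] = {}
    for name in jar_names:
        match = _ICEBERG_RUNTIME.match(name)
        if match and match.group(1) != expected:
            problems[name] = match.group(1)
    return problems


if __name__ == "__main__":
    jars = ["iceberg-flink-runtime-1.16-1.3.1.jar", "hadoop-common-2.8.3.jar"]
    print(mismatched_jars("1.18.0", jars))
    # {'iceberg-flink-runtime-1.16-1.3.1.jar': '1.16'}
```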

Re: Iceberg connector

2024-04-16 Thread Péter Váry
Hi Chetas, See my answers below: On Tue, Apr 16, 2024, 06:39 Chetas Joshi wrote: > Hello, > > I am running a batch flink job to read an iceberg table. I want to > understand a few things. > > 1. How does the FlinkSplitPlanner decide which fileScanTasks (I think one > task corresponds to one da

RE: Table Source from Parquet Bug

2024-04-16 Thread Sohil Shah
Hi David, Since this is a ClassNotFoundException, you may be missing a dependency. Could you share your pom.xml? Thanks -Sohil Project: Braineous https://bugsbunnyshah.github.io/braineous/ On 2024/04/16 15:22:34 David Silva via user wrote: > Hi, > > Our team would like to leverage Flink but we'r

Re: Table Source from Parquet Bug

2024-04-16 Thread Sohil Shah
Hello David, Since this is a ClassNotFoundException, you may be missing a dependency. Could you share your pom.xml? Thanks -Sohil Project: Braineous https://bugsbunnyshah.github.io/braineous/ On Tue, Apr 16, 2024 at 11:25 AM David Silva via user wrote: > Hi, > > Our team would like to leverage

Re: GCS FileSink Read Timeouts

2024-04-16 Thread Dylan Fontana via user
Thanks for the links! We tried `gs.writer.chunk.size` before and, unfortunately, found that it made no meaningful difference. I think the hadoop-connector link you sent doesn't apply, since the GCS FileSystem connector isn't using the Hadoop implementation but instead the Clou

Elasticsearch8 example

2024-04-16 Thread Tauseef Janvekar
Dear Team, Can anyone please share an example for flink-connector-elasticsearch8? I found that this connector was added on GitHub, but there is no proper documentation for it. It would be a great help if sample code for this connector could be provided. Thanks, Tauseef

Re: Pyflink Performance and Benchmark

2024-04-16 Thread Chase Zhang
On Mon, Apr 15, 2024 at 16:17 Niklas Wilcke wrote: > Hi Flink Community, > > I wanted to reach out to you to get some input about Pyflink performance. > Are there any resources available about Pyflink benchmarks and maybe a > comparison with the Java API? I wasn't able to find something valuabl

Re: Pyflink w Nessie and Iceberg in S3 Jars

2024-04-16 Thread Robert Prat
I finally managed to make it work following the advice of Robin Moffat, who replied to the earlier email: There are a lot of permutations that you've described, so it's hard to take one reproducible test case here to try and identify the error :) It certainly looks JAR related. You could try adding