Dynamic Iceberg Sink

2024-11-13 Thread Péter Váry
Hi Team, With Max Michels, we started to work on enhancing the current Iceberg Sink to allow inserting evolving records into a changing table. Created the design doc [1] Created the Iceberg proposal [2] Started the conversation on the Iceberg dev list [3] >From the abstract: - Flink Icebe

PostCommitTopology question

2024-08-27 Thread Péter Váry
Hi everyone, I am working on the Iceberg Connector's Table Maintenance function [1], and we plan to utilize the SinkV2 SupportsPostCommitTopology (formerly known as WithPostCommitTopology) to start the compaction after several commits. With Steven Zhen Wu were debating [2] the expectations about t

Force co-location of operators

2024-07-23 Thread Péter Váry
Hi Team, I have 2 operators with `forceNonParallel`: - Lock generator - Lock remover I need them to be located on the same JVM. My testing shows that if I put them to the same slotSharingGroup (or leave both of them as "default"), then they will be running on the same task manager slot. This ensur

Re: [VOTE] FLIP-453: Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-05-14 Thread Péter Váry
+1 (non-binding) Thanks, Peter On Tue, May 14, 2024, 09:50 gongzhongqiang wrote: > +1 (non-binding) > > Best. > Zhongqiang Gong > > Martijn Visser 于2024年5月14日周二 14:45写道: > > > Hi everyone, > > > > With no more discussions being open in the thread [1] I would like to > start > > a vote on FLIP-4

Re: Re: Iceberg table maintenance

2024-05-03 Thread Péter Váry
Hi Flink Team, The Iceberg table maintenance proposal is on vote on the Iceberg dev list [1]. Non-binding votes are important too, so if you are interested, please vote. Thanks, Peter [1] - https://lists.apache.org/thread/qhz17ppdbb57ql356j49qqk3nyk59rvm On Tue, Apr 2, 2024, 08:35 Péter Váry

Re: [DISCUSS] FLIP-453: Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-05-03 Thread Péter Váry
'll add it for completeness, thanks! > > > With regards to FLINK-35149, the fix version indicates a change at > Flink > > > CDC; is that indeed correct, or does it require a change in the SinkV2 > > > interface? > > > > > > Best regards, > >

Re: [DISCUSS] FLIP-453: Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-05-02 Thread Péter Váry
Hi Martijn, We might want to add FLIP-371 [1] to the list. (Or we aim only for higher level FLIPs?) We are in the process of using the new API in Iceberg connector [2] - so far, so good. I know of one minor known issue about the sink [3], which should be ready for the release. All-in-all, I thi

Iceberg table maintenance

2024-04-08 Thread Péter Váry
ight be another > option we can implement for users. > > Thanks, > Manu > > On Tue, Apr 2, 2024 at 5:26 AM Péter Váry > wrote: > >> Hi Ajantha, >> >> I thought about enabling post commit topology based compaction for sinks >> using options, like we us

Re: Re: Iceberg table maintenance

2024-04-01 Thread Péter Váry
the > chain. In this case, I think using tags is a better way than lock > mechanisms, for simplicity and ease of use for user. > > Thanks, > Wenjin. > > On 2024/03/30 13:22:12 Péter Váry wrote: > > Hi Wenjin, > > > > See my answers below: > > &

Re: Iceberg table maintenance

2024-03-30 Thread Péter Váry
tiple writers, the scheduling should be based on monitoring the changes on the table, instead of the information coming from the committer (which could only contain the changes only from a single writer) I hope this helps, Peter > Thank you. > > Best, > Wenjin > > On 2024/03/

Iceberg table maintenance

2024-03-28 Thread Péter Váry
Hi Team, I am working on adding a possibility to the Flink Iceberg connector to run maintenance tasks on the Iceberg tables. This will fix the small files issues and in the long run help compacting the high number of positional and equality deletes created by Flink tasks writing CDC data to Iceber

DataOutputSerializer serializing long UTF Strings

2024-01-19 Thread Péter Váry
Hi Team, During the root cause analysis of an Iceberg serialization issue [1], we have found that *DataOutputSerializer.writeUTF* has a hard limit on the length of the string (64k). This is inherited from the *DataOutput.writeUTF* method, where the JDK specifically defines this limit [2]. For our

Re: Flink SQL ANTI JOIN results with windowing in DataStream

2024-01-16 Thread Péter Váry
window did not fire. Sorry about the false alarm. Thanks, Peter Péter Váry ezt írta (időpont: 2024. jan. 16., K, 13:27): > Hi Team, > > I am working on Iceberg in process compaction, and trying to use SQL > window join to compare 2 streams like this: > > > > > *Ta

Flink SQL ANTI JOIN results with windowing in DataStream

2024-01-16 Thread Péter Váry
Hi Team, I am working on Iceberg in process compaction, and trying to use SQL window join to compare 2 streams like this: *Table fsFiles = tEnv.sqlQuery("SELECT runId, location, window_start, window_end " +"FROM TABLE(" +"* *TUMBLE(" +"TABLE " + fileSystemFilesTable +

Re: [DISCUSS] Externalized Python Connector Release/Dependency Process

2024-01-08 Thread Péter Váry
Thanks Danny for working on this! It would be good to do this in a way that the different connectors could reuse as much code as possible, so if possible put most of the code to the flink connector shared utils repo [1] +1 from for the general direction (non-binding) Thanks, Peter [1] https://g

[RESULT][VOTE] FLIP-372: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2024-01-03 Thread Péter Váry
Hi All, `FLIP-372: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable` [1], which has been renamed to `FLIP-372: Enhance and synchronize Sink API to match the Source API` has been accepted and voted through this thread [2]. The proposal received 6 binding appr

Re: [VOTE] FLIP-372: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-12-21 Thread Péter Váry
>>> Jiabao > >>> > >>> > >>> On 2023/12/18 12:06:05 Gyula Fóra wrote: > >>>> +1 (binding) > >>>> > >>>> Gyula > >>>> > >>>> On Mon, 18 Dec 2023 at 13:04, Márton Balassi > >

Re: [VOTE] FLIP-372: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-12-21 Thread Péter Váry
Best, > > Jiabao > > > > > > On 2023/12/18 12:06:05 Gyula Fóra wrote: > >> +1 (binding) > >> > >> Gyula > >> > >> On Mon, 18 Dec 2023 at 13:04, Márton Balassi > >> wrote: > >> > >>> +1 (binding) >

[VOTE] FLIP-372: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-12-18 Thread Péter Váry
Hi everyone, Since there were no further comments on the discussion thread [1], I would like to start the vote for FLIP-372 [2]. The FLIP started as a small new feature, but in the discussion thread and in a similar parallel thread [3] we opted for a somewhat bigger change in the Sink V2 API. Pl

Re: [DISCUSS] Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-12-13 Thread Péter Váry
/thread/h6nkgth838dlh5s90sd95zd6hlsxwx57 [3] - https://github.com/apache/flink/pull/23912 Péter Váry ezt írta (időpont: 2023. dec. 11., H, 14:28): > We identified another issue with the current Sink API in a thread [1] > related to FLIP-371 [2]. Currently it is not possible to evol

Re: [DISCUSS] Resolve diamond inheritance of Sink.createWriter

2023-12-13 Thread Péter Váry
ks for updating the patch. The latest patch looks good to me. I've > +1ed > > on the PR. > > > > Cheers, > > > > Jiangjie (Becket) Qin > > > > On Mon, Dec 11, 2023 at 9:21 PM Péter Váry > > wrote: > > > >> Thanks everyone for t

Re: [DISCUSS] Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-12-11 Thread Péter Váry
t; future > - Easier to declare parts of the api stable going forward as it's not all > or nothing > > The ability to do proper compile time validation is definitely a downside > but this should mostly make initial development a little harder I believe. > > Cheers,

Re: [DISCUSS] Resolve diamond inheritance of Sink.createWriter

2023-12-11 Thread Péter Váry
hat the consensus is to > move towards the mixin interface approach (and accept its disadvantages > given its advantages). > > +1 for the general direction of your proposed code change in > https://github.com/apache/flink/pull/23876. > > On Tue, Dec 5, 2023 at 3:44 PM Péter Váry

Re: [DISCUSS] Release Flink 1.18.1

2023-12-08 Thread Péter Váry
Hi Jing, Thanks for taking care of this! +1 (non-binding) Peter Sergey Nuyanzin ezt írta (időpont: 2023. dec. 8., P, 9:36): > Thanks Jing driving it > +1 > > also +1 to include FLINK-33313 mentioned by Benchao Li > > On Fri, Dec 8, 2023 at 9:17 AM Benchao Li wrote: > > > Thanks Jing for driving

SQL return type change from 1.17 to 1.18

2023-12-06 Thread Péter Váry
Hi Team, We are working on upgrading the Iceberg-Flink connector from 1.17 to 1.18, and found that some of our tests are failing. Prabhu Joseph created a jira [1] to discuss this issue, along with short example code. In a nutshell: - Create a table with an 'ARRAY' column - Run a select which retu

Re: [DISCUSS] Resolve diamond inheritance of Sink.createWriter

2023-12-05 Thread Péter Váry
or changes we can make easily, we should switch > to > > > the decorative/mixin interface approach used successfully in the source > > and > > > table interfaces. Let's try to do this as much as possible within the > v2 > > > and compatibility boundaries and w

Re: [DISCUSS] Release flink-connector-parent v1.01

2023-12-04 Thread Péter Váry
Hi Etienne, Which branch would you cut the release from? I find the flink-connector-parent branches confusing. If I merge a PR to the ci_utils branch, would it immediately change the CI workflow of all of the connectors? If I merge something to the release_utils branch, would it immediately cha

Re: [DISCUSS] Resolve diamond inheritance of Sink.createWriter

2023-12-01 Thread Péter Váry
or repos should build against the > >latest release and not the master branch. > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-321%3A+Introduce+an+API+deprecation+process > > [2] > > > > > https://nightlies.apache.org/flink/fli

Re: [DISCUSS] Resolve diamond inheritance of Sink.createWriter

2023-11-27 Thread Péter Váry
I think we should try to separate the discussion in a few different topics: - Concrete issue - How to solve this problem in 1.19 and wrt the affected createWriter interface - Update the documentation [1], so FLIP-321 is visible for every contributor - Generic issue

Python connector question

2023-11-27 Thread Péter Váry
Hi Team, During some, mostly unrelated work we come to realize that during the externalization of the connector's python modules and the related tests are not moved to the respective connectors repository. We created the jiras [1] to create the infra, and to move the python code and the tests to t

Re: [DISCUSS] Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-11-23 Thread Péter Váry
r implementations of WithPreCommitTopology) 3. Do you have a better idea? Thanks, Peter CC: Jiabao Sun - as he might be interested in this discussion [1] - https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/api/java/typeutils/TypeExtractor.html Péter Váry ezt írta

Re: [DISCUSS] Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-10-25 Thread Péter Váry
gt; cluttered. > > I'll need to experiment the builder approach a bit more to see if it makes > sense at all, but wanted to throw out the idea earlier to see what you > think. > > On Mon, Oct 9, 2023 at 6:59 AM Péter Váry > wrote: > > > Hi Team, > > >

[RESULT][VOTE] FLIP-371: Provide initialization context for Committer creation in TwoPhaseCommittingSink

2023-10-17 Thread Péter Váry
Hi all, I am happy to announce that FLIP-371: FLIP-371: Provide initialization context for Committer creation in TwoPhaseCommittingSink[1] has been accepted. There are 7 approving votes, 6 of which are binding: - Gyula Fóra(binding) - Márton Balassi (binding) - Leonard Xu (binding) - Gabor Somogyi

Re: [VOTE] FLIP-371: Provide initialization context for Committer creation in TwoPhaseCommittingSink

2023-10-16 Thread Péter Váry
; > > > On Wed, Oct 11, 2023 at 8:20 PM Gyula Fóra wrote: > > > >> Thanks , Peter. > >> > >> +1 > >> > >> Gyula > >> > >> On Wed, 11 Oct 2023 at 14:47, Péter Váry > >> wrote: > >> > >>

Re: [DISCUSS] FLIP-371: SinkV2 Committer imporvements

2023-10-16 Thread Péter Váry
t; Best, > Leonard > > 2023年10月10日 下午1:21,Péter Váry 写道: > > Hi everyone, > > It seems we have no more comments. So I would like to start a vote tomorrow > if there are no further things to discuss. > > Please let me know if you have any concerns, thanks! > >

[VOTE] FLIP-371: Provide initialization context for Committer creation in TwoPhaseCommittingSink

2023-10-11 Thread Péter Váry
Hi all, Thank you to everyone for the feedback on FLIP-371[1]. Based on the discussion thread [2], I think we are ready to take a vote to contribute this to Flink. I'd like to start a vote for it. The vote will be open for at least 72 hours (excluding weekends, unless there is an objection or an i

Re: [DISCUSS] FLIP-371: SinkV2 Committer imporvements

2023-10-09 Thread Péter Váry
I've just analyzed it through and I think it's useful feature. > > +1 from my side. > > G > > > On Thu, Oct 5, 2023 at 12:35 PM Péter Váry > wrote: > > > For the record, after the rename, the new FLIP link is: > > > > > https://cwiki.apa

Re: [DISCUSS] Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-10-09 Thread Péter Váry
have updated the FLIP to use this new approach. Any comments, thoughts are welcome. Thanks, Peter Péter Váry ezt írta (időpont: 2023. okt. 5., Cs, 16:04): > Hi Team, > > In my previous email[1] I have described our challenges migrating the > existing Iceberg SinkFunction based implem

[DISCUSS] Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-10-05 Thread Péter Váry
Hi Team, In my previous email[1] I have described our challenges migrating the existing Iceberg SinkFunction based implementation, to the new SinkV2 based implementation. As a result of the discussion around that topic, I have created the FLIP-371 [2] to address the Committer related changes, and

Re: [DISCUSS] FLIP-371: SinkV2 Committer imporvements

2023-10-05 Thread Péter Váry
For the record, after the rename, the new FLIP link is: https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+Provide+initialization+context+for+Committer+creation+in+TwoPhaseCommittingSink Thanks, Peter Péter Váry ezt írta (időpont: 2023. okt. 5., Cs, 11:02): > Thanks Gordon for

Re: [DISCUSS] FLIP-371: SinkV2 Committer imporvements

2023-10-05 Thread Péter Váry
t; wrote: > > > > > > Thanks, Peter. I agree that this is needed for Iceberg and beneficial > for > > > other connectors too. > > > > > > +1 > > > > > > On Wed, Oct 4, 2023 at 3:56 PM Péter Váry > > > > wrote: > > &

[DISCUSS] FLIP-371: SinkV2 Committer imporvements

2023-10-04 Thread Péter Váry
Hi Team, In my previous email[1] I have described our challenges migrating the existing Iceberg SinkFunction based implementation, to the new SinkV2 based implementation. As a result of the discussion around that topic, I have created the first [2] of the FLIP-s addressing the missing features th

Re: [DISCUSS] Sink V2 API missing features

2023-09-29 Thread Péter Váry
Martijn > > [1] https://lists.apache.org/thread/zjc4p47k4sxjcbnntt8od2grnmx51xg0 > [2] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-191%3A+Extend+unified+Sink+interface+to+support+small+file+compaction#FLIP191:ExtendunifiedSinkinterfacetosupportsmallfilecompaction-Alternat

[DISCUSS] Sink V2 API missing features

2023-09-28 Thread Péter Váry
Hi Flink Team, I am working on implementing a new Iceberg Sink which would use the new SinkV2 API [1], [2]. The main goal is to open up the possibility for doing the incremental small files compaction inside a Flink job using the new Iceberg Sink. During the implementation I have found several pl

Re: DefaultInputSplitAssigner question

2023-04-20 Thread Péter Váry
Created a PR for the change: https://github.com/apache/flink/pull/22437 Could you please review Zhu? Thanks, Peter Péter Váry ezt írta (időpont: 2023. ápr. 20., Cs, 13:54): > Thanks Zhu for the quick response! > > In the Iceberg Flink Source (old version, there is a FLIP-27 version

Re: DefaultInputSplitAssigner question

2023-04-20 Thread Péter Váry
s to me more an outdated documentation problem. Furthermore, > DefaultInputSplitAssigner is used for legacy sources which will be > deprecated in the future. > > [1] > https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/sources/#the-data-source-api > > Thank

DefaultInputSplitAssigner question

2023-04-19 Thread Péter Váry
Hi Team, Recently I ran into the DefaultInputSplitAssigner [1]. The javadoc documentation states: /** * This is the default implementation of the {@link InputSplitAssigner} interface. The default input * split assigner simply returns all input splits of an input vertex *in the order they wer

Re: [Discussion] externalize Hive connector

2023-01-10 Thread Péter Váry
Hi Team, Somewhat, but not strictly related: - We would like to use delegation tokens to connect from the IcebergFilesCommitter tasks to kerberized Hive Metastore servers when committing changes in the Iceberg connector [1]. Gabor Somogyi is working on generalizing token support [2]. I am working

Re: [VOTE] Apache Flink Kubernetes Operator Release 1.3.0, release candidate #1

2022-12-10 Thread Péter Váry
+1 (non-binding) 1. Checked the artifacts (binaries, headers, versions) 2. Checked the signatures 3. Compiled code 4. Created docker images 5. Run some manual tests 6. Run the examples Other than one small issue (which is a local one) everything works like charm. Thanks Matyas for taking care of

Re: [ANNOUNCE] New Apache Flink Committer - Matyas Orhidi

2022-11-22 Thread Péter Váry
Congratulations Mátyás! On Tue, Nov 22, 2022, 06:40 Matthias Pohl wrote: > Congratulations, Matyas :) > > On Tue, Nov 22, 2022 at 11:44 AM Xingbo Huang wrote: > > > Congrats Matyas! > > > > Best, > > Xingbo > > > > Yanfei Lei 于2022年11月22日周二 11:18写道: > > > > > Congrats Matyas! 🍻 > > > > > > Zhe

Re: flink-s3-fs-hadoop dependencies

2022-10-25 Thread Péter Váry
tion on all the places around Flink > > In filesToLoad one could specify core-site.xml, hdfs-site.xml etc. > Never tried it out but this idea is in my head for quite some time... > > BR, > G > > > On Tue, Oct 25, 2022 at 11:43 AM Péter Váry > wrote: > > > Hi Team, &g

flink-s3-fs-hadoop dependencies

2022-10-25 Thread Péter Váry
Hi Team, I have recently faced the issue that the S3 FileSystem read my core-site.xml until it was on the classpath, but later when I tried to add it using the HADOOP_CONF_DIR then the configuration file was not loaded. Filed a jira [1] and created a PR [2] for fixing it. HadoopUtils.getHadoopCon

Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-17 Thread Péter Váry
Hi all, I think the main advantage of having the Flink-Iceberg connector in its own repo is that we could release it independently of both the Iceberg and Flink releases. Previously having the Flink connector in the Iceberg repo made sense, since the Iceberg project was rapidly moving forward whic

Re: Re: Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-11 Thread Péter Váry
Thanks Abid, Count me in, and drop a note, if I can help in any way. Thanks, Peter On Tue, Oct 11, 2022, 20:13 wrote: > Hi Martijn, > > Yes catalog integration exists and catalogs can be created using Flink > SQL. > > https://iceberg.apache.org/docs/latest/flink/#creating-catalogs-and-using-ca

dev@flink.apache.org

2022-09-30 Thread Péter Váry
parate repositories. I think that will confuse the user. I would > expect that with modules you should be able to have support for multiple > versions in one repository. > > Best regards, > > Martijn > > On Fri, Sep 30, 2022 at 7:44 AM Péter Váry > wrote: > > > Than

dev@flink.apache.org

2022-09-29 Thread Péter Váry
/ > > [1] https://github.com/apache/flink-connector-elasticsearch > > On Thu, Sep 29, 2022 at 3:17 PM Péter Váry > wrote: > > > Hi Team, > > > > Just joining the conversation for the first time, so pardon me if I > repeat > > already answered question

dev@flink.apache.org

2022-09-29 Thread Péter Váry
Hi Team, Just joining the conversation for the first time, so pardon me if I repeat already answered questions. It might be already discussed, but I think the version for the "connected" system could be important as well. There might be some API changes between Iceberg 0.14.2, and 1.0.0, which w