subject:"Flink table maintenance"

Re: Flink Table Maintenance - Tag based locking

2024-08-07 Thread Ryan Blue

priate actions (suitable for more expensive > operations like merging equality deletes into data files, removing orphan > files) > > I would rephrase this a bit differently: Offer an ability to create a > Flink Table Maintenance service which will listen to the table changes and >

Re: Flink Table Maintenance - Tag based locking

2024-08-07 Thread Péter Váry

rase this a bit differently: Offer an ability to create a Flink Table Maintenance service which will listen to the table changes and trigger and execute the appropriate actions. I think the main difference here is that the whole monitoring/scheduling/executing is coupled into a single job. > - S

Re: Flink Table Maintenance - Tag based locking

2024-08-06 Thread Ryan Blue

Iceberg tables. > The `TriggerLockFactory.Lock` interface is specifically designed to allow > the user to choose the preferred type of locking. I was trying to come up > with a solution where the Apache Iceberg users don't need to rely on one > more external system for the F

Re: Flink Table Maintenance - Tag based locking

2024-08-06 Thread Anton Okolnychyi

coordinate >> externally. >> [..] >> > I agree with Ryan that an Iceberg solution is not a good choice here. >> >> I agree that we don't need to *rely* on locking in the Iceberg tables. >> The `TriggerLockFactory.Lock` interface is specifically designed

Re: Flink Table Maintenance - Tag based locking

2024-08-06 Thread Péter Váry

. It should have a way to coordinate > externally. > [..] > > I agree with Ryan that an Iceberg solution is not a good choice here. > > I agree that we don't need to *rely* on locking in the Iceberg tables. > The `TriggerLockFactory.Lock` interface is specifically designed to allow >

Re: Flink Table Maintenance - Tag based locking

2024-08-06 Thread Xianjin YE

actory.Lock` interface is specifically designed to allow the > user to choose the preferred type of locking. I was trying to come up with a > solution where the Apache Iceberg users don't need to rely on one more > external system for the Flink Table Maintenance to work. > >

Re: Flink Table Maintenance - Tag based locking

2024-08-06 Thread Péter Váry

re the Apache Iceberg users don't need to rely on one more external system for the Flink Table Maintenance to work. I understand that this discussion is very similar to the HadoopCatalog situation, where we have a hacky "solution" which works in some cases but is suboptimal.

Re: Flink Table Maintenance - Tag based locking

2024-08-05 Thread Manu Zhang

instead of 'tag', and we could use the Catalog's atomic >> change requirement to make locking atomic. My main concern with this >> approach is that it relies on the linear history of the table and produces >> more contention on the write side. I was OK with this in the

Re: Flink Table Maintenance - Tag based locking

2024-08-05 Thread Ryan Blue

logs atomic change > requirement to make locking atomic. My main concern with this approach is > that it relies on the linear history of the table and produces more > contention on the write side. I was OK with this in the very specific use-case > with Flink Table Maintenance (few chang

Re: Flink Table Maintenance - Tag based locking

2024-08-04 Thread Péter Váry

talogs atomic change requirement to make locking atomic. My main concern with this approach is that it relies on the linear history of the table and produces more contention on the write side. I was OK with this in the very specific use-case with Flink Table Maintenance (few changes, controlled time

Re: Flink Table Maintenance - Tag based locking

2024-08-04 Thread Manu Zhang

t is done in the Kafka Connect sink? I think that's a >> cleaner way to solve the problem if there is not going to be a way to fix >> it in Flink. >> >> Ryan >> >> On Wed, Jul 31, 2024 at 7:45 AM Péter Váry >> wrote: >> >>> Hi Team

Re: Flink Table Maintenance - Tag based locking

2024-08-04 Thread Steven Wu

aner way to solve the problem if there is not going to be a way to fix > it in Flink. > > Ryan > > On Wed, Jul 31, 2024 at 7:45 AM Péter Váry > wrote: > >> Hi Team, >> >> During the discussion around the Flink Table Maintenance [1], [2], I have >> highlig

Re: Flink Table Maintenance - Tag based locking

2024-08-04 Thread Ryan Blue

oblem if there is not going to be a way to fix it in Flink. Ryan On Wed, Jul 31, 2024 at 7:45 AM Péter Váry wrote: > Hi Team, > > During the discussion around the Flink Table Maintenance [1], [2], I have > highlighted that one of the main decision points is the way we preven

Flink Table Maintenance - Tag based locking

2024-07-31 Thread Péter Váry

Hi Team, During the discussion around the Flink Table Maintenance [1], [2], I have highlighted that one of the main decision points is the way we prevent Maintenance Tasks from running concurrently. At that time we did not find a better solution than providing an interface for locking

Re: Flink table maintenance

2024-04-21 Thread Gen Luo

ources >>>>>> (nodes) as the default jobs. >>>>>> >>>>>> 2. resources >>>>>>> Occupying lots of resources when tasks are idle may be an issue, but >>>>>>> to execute the tasks periodically in the job

Re: Flink table maintenance

2024-04-19 Thread Zhu Zhu

n), which can trigger tasks by >>>>>> executing CALL statements. In this way resource wasting can be avoided, >>>>>> but >>>>>> this is quite a different direction indeed. >>>>>> >>>>> >>>>> CALL statements are

Re: Flink table maintenance

2024-04-19 Thread Péter Váry

enance tasks by auto scaling. I'm afraid it's not a good >>>>> idea to rely on it. As far as I know, the rescaling it triggers needs to >>>>> restart the job, which may not be acceptable to most users. >>>>> >>>> >>>

Re: Flink table maintenance

2024-04-18 Thread Gen Luo

>>>> If the table is only operated in one Flink job, we can use >>>> OperatorCoordinator to coordinate the tasks. OperatorCoordinators may also >>>> contact each other to ensure different kinds of tasks are executed in >>>> expected ways

Re: Flink table maintenance

2024-04-18 Thread Péter Váry

ead the small files, and create a single big file >> 3. Updater - to collect the new files and update the Iceberg table >> >> Is it possible for the Update Operator to communicate with the Planner >> Operator through the OperatorCoordinator? >> >> >>> While I suppo

Re: Flink table maintenance

2024-04-18 Thread Gen Luo

nism if we want to >> coordinate the operation from different engines. >> >> (Am I replying to the correct mail?) >> > > Yes :) > >> >> Thanks, >> Gen >> >> On Tue, Apr 9, 2024 at 12:29 AM Brian Olsen >> wrote: >> >>>

Re: Flink table maintenance

2024-04-18 Thread Zhu Zhu

anks, >> Gen >> >> On Tue, Apr 9, 2024 at 12:29 AM Brian Olsen >> wrote: >> >>> Hey Iceberg nation, >>> >>> I would like to share about the meeting this Wednesday to further >>> discuss details of Péter's proposal on Flink Maintenance Tasks

Re: Flink table maintenance

2024-04-17 Thread Péter Váry

asks. >> Calendar Link: https://calendar.app.google/83HGYWXoQJ8zXuVCA >> >> List discussion: >> https://lists.apache.org/thread/10mdf9zo6pn0dfq791nf4w1m7jh9k3sl >> <https://www.google.com/url?q=https://lists.apache.org/thread/10mdf9zo6pn0dfq791nf4w1m7jh9k3sl&

Re: Flink table maintenance

2024-04-17 Thread Gen Luo

.apache.org/thread/10mdf9zo6pn0dfq791nf4w1m7jh9k3sl > <https://www.google.com/url?q=https://lists.apache.org/thread/10mdf9zo6pn0dfq791nf4w1m7jh9k3sl&sa=D&source=calendar&usd=2&usg=AOvVaw2-aePIRr6APFVHpRDipMgX> > > Design Doc: Flink table maintenance > <https://www.google.com

RE: Flink table maintenance

2024-04-11 Thread ismail simsek

- Technically would it be possible not to force partition cols into the PK? I believe this is possible, but probably less performant. It is mentioned in the docs https://iceberg.apache.org/spec/#scan-planning From the documentation: "An equality delete file must be applied to a data file when a

Re: Flink table maintenance

2024-04-08 Thread Brian Olsen

9k3sl <https://www.google.com/url?q=https://lists.apache.org/thread/10mdf9zo6pn0dfq791nf4w1m7jh9k3sl&sa=D&source=calendar&usd=2&usg=AOvVaw2-aePIRr6APFVHpRDipMgX> Design Doc: Flink table maintenance <https://www.google.com/url?q=https://docs.google.com/document/d/16g3vR18mVBy8j

Re: Flink table maintenance

2024-04-01 Thread Manu Zhang

Hi Peter, > Are you proposing to create a user facing locking feature in Iceberg, or > just something for internal use? > Since it's a general issue, I'm proposing to create a general user interface first, while the implementation can be left to users. For example, we use Airflow to sched

Re: Flink table maintenance

2024-04-01 Thread Péter Váry

Hi Ajantha, I thought about enabling post commit topology based compaction for sinks using options, like we use for the parametrization of streaming reads [1]. I think it will be hard to do it in a user friendly way because of the high number of parameters, but I think it is a possible solutio

Re: Flink table maintenance

2024-04-01 Thread Ajantha Bhat

Thanks for the proposal, Peter. I just wanted to know whether we have any plans for supporting SQL syntax for table maintenance (like CALL procedure) for pure Flink SQL users. I didn't see any custom SQL parser plugin support in Flink. I also saw that Branch write doesn't have SQL support (only Branch r

Re: Flink table maintenance

2024-04-01 Thread Péter Váry

Hi Manu, Just to clarify: - Are you proposing to create a user facing locking feature in Iceberg, or just something for internal use? I think we shouldn't add locking to Iceberg's user facing scope at this stage. A fully featured locking system has many more features that we need (prior

Re: Flink table maintenance

2024-04-01 Thread Manu Zhang

> > What would the community think of exploiting tags for preventing > concurrent maintenance loop executions. This issue is not specific to Flink maintenance jobs. We have a service scheduling Spark maintenance jobs by watching table commits. When we don't check in-progress maintenance jobs for

Re: Flink table maintenance

2024-03-28 Thread Péter Váry

What would the community think of exploiting tags for preventing concurrent maintenance loop executions? The issue: Some maintenance tasks cannot run in parallel, like DeleteOrphanFiles vs. ExpireSnapshots, or RewriteDataFiles vs. RewriteManifestFiles. We make sure not to run tasks started by a si

Flink table maintenance

2024-03-28 Thread Péter Váry

Hi Team, As discussed in yesterday's community sync, I am working on adding the ability to run maintenance tasks on Iceberg tables to the Flink Iceberg connector. This will fix the small files issue and in the long run help compact the high number of positional and equality deletes creat

32 matches
