Re: spec question on equality deletes

2024-04-15 Thread Renjie Liu
Hi, Wing: I totally agree that we should clearly define the expected behavior in the spec. I lean towards a), e.g. the row should be completely ignored or completely the same as the original row; an intermediate state should be defined as invalid. On Tue, Apr 16, 2024 at 8:40 AM Wing Yew Poon wrote: > Hi Yufei,

Re: Status of Kafka Connect contribution?

2024-04-15 Thread Ajantha Bhat
Bryan, thanks for the update. Can we please create subtasks in the GitHub issue and keep discussions there? Keeping discussions public would allow others to contribute more easily. Good to know that we are targeting the functional connector for 1.6.0. - Ajantha On Tue, Apr 16, 2024 at 7:07 AM B

[VOTE] Release Apache PyIceberg 0.6.1rc2

2024-04-15 Thread Honah J.
Hi Everyone, I propose that we release the following RC as the official PyIceberg 0.6.1 release. This is a patch release due to the following bugs: - Fail to create version 1 table with non-empty partition-spec and sort-order - Hive Ca

Re: Status of Kafka Connect contribution?

2024-04-15 Thread Bryan Keller
Hi Ajantha, Yes, there is still the coordinator piece to add before the sink is functional at all. There have been discussions with some in the community around the best path forward for that part to ensure we have a good foundation to build on, which is why we have held off on opening the PR.

Re: spec question on equality deletes

2024-04-15 Thread Wing Yew Poon
Hi Yufei, Thank you for your response. It sounds like on 2, your thinking is that (b) is the correct behavior. Indeed, I have tried it out with Spark and afaict, it does (b). However, that does not mean that it is the correct behavior. The spec should clearly define it. - Wing Yew On Mon, Apr 15,

Re: spec question on equality deletes

2024-04-15 Thread Wing Yew Poon
Hi Renjie, Thank you for your perspective. On 1, I am inclined to the same view as you. On 2, I feel that the spec should clearly define the expected behavior; it should not be left to engines. At worst, the spec can say, e.g., that the correct behavior is (b) but it is acceptable for an engine to

Re: spec question on equality deletes

2024-04-15 Thread Yufei Gu
Hi Wing Yew Poon, Here is my understanding, but not necessarily how an engine implements it. It should only consider the columns in equality_ids when we apply eq deletes. Also the engine should ignore the unrelated columns. It will still delete the row with id 3 in the following case you described

Status of Kafka Connect contribution?

2024-04-15 Thread Ajantha Bhat
Hey everyone, We've been testing the Kafka Connect connector from the 1.5.0 release, and unfortunately, it seems it's still not ready for use. We're not quite up to date on its current status. We've noticed that three pull requests have been merged so far: #8701, #9466, and #9641. There's also a

Streamlining the monitoring of active projects/proposals

2024-04-15 Thread Ajantha Bhat
Hey everyone, We have several active projects, such as Views, multi-table transactions, Kafka Connect, and partition stats, where proposals have been approved but implementation is still ongoing. Most of these proposals will involve multiple PRs, making it difficult to monitor progress and i

Re: How and where iceberg spark streaming determines latest StreamingOffset upon trigger

2024-04-15 Thread Nirav Patel
Hi, > Can you describe a bit more on your ingestion rate? > what exactly were the read limits? Streaming job ingestion is at most 1M records per batch. The trigger interval is 1 minute, which seems to be fine for regular stream processing. Our avg per minute record count is way less than that. >