Re: Iceberg MV Refresh

2024-06-20 Thread Piotr Findeisen
Hi Benny, on the staleness topic I'd recommend checking how Trino implements materialized views in Iceberg and how it defines staleness. In particular, a view can have a defined grace period, which defines how stale the data can be for the materialization to be considered useful (defaults to unlimi
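The grace-period idea can be illustrated with a simple freshness check. This is a minimal sketch, not Trino's or Iceberg's implementation. It assumes the materialization records a "fresh as of" timestamp and measures staleness as wall-clock time since then. The helper `is_usable` is hypothetical, and `grace_period=None` stands in for an unlimited grace period.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_usable(fresh_as_of: datetime, now: datetime,
              grace_period: Optional[timedelta]) -> bool:
    """Return True if the materialization is fresh enough to be used.

    Hypothetical helper: grace_period=None models an unlimited grace period.
    """
    if grace_period is None:
        return True  # any amount of staleness is acceptable
    return now - fresh_as_of <= grace_period


refreshed = datetime(2024, 6, 20, 12, 2, 10, tzinfo=timezone.utc)
now = datetime(2024, 6, 20, 12, 30, 0, tzinfo=timezone.utc)
print(is_usable(refreshed, now, timedelta(hours=1)))     # True
print(is_usable(refreshed, now, timedelta(minutes=10)))  # False
print(is_usable(refreshed, now, None))                   # True
```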

Re: Making the NDV property required for theta sketch blobs in Puffin

2024-06-22 Thread Piotr Findeisen
Hi, non-binding +1 to require that `apache-datasketches-theta-v1` sketch has ndv blob property set. Best PF On Sat, 22 Jun 2024 at 06:27, Jean-Baptiste Onofré wrote: > Hi Amogh > > +1 to have ndv blob metadata property required. > > As discussed during the last community meeting, we discussed
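Requiring the `ndv` property means a reader or writer can validate theta sketch blob metadata up front. The sketch below is a hypothetical validator run over one entry of the Puffin footer's `blobs` list. It assumes the NDV value is stored as a base-10 integer string, since Puffin blob properties are a string-to-string map.

```python
THETA_BLOB_TYPE = "apache-datasketches-theta-v1"


def validate_blob_metadata(blob: dict) -> None:
    """Raise ValueError if a theta sketch blob lacks a usable 'ndv' property.

    Hypothetical validator; `blob` is one entry of the footer's "blobs" list.
    """
    if blob.get("type") != THETA_BLOB_TYPE:
        return  # the rule only applies to theta sketch blobs
    ndv = blob.get("properties", {}).get("ndv")
    if ndv is None:
        raise ValueError(f"{THETA_BLOB_TYPE} blob is missing the 'ndv' property")
    # Assumption: the NDV is encoded as a base-10 integer string.
    if not ndv.isdigit():
        raise ValueError(f"'ndv' must be a non-negative integer string, got {ndv!r}")


blob = {"type": THETA_BLOB_TYPE, "fields": [1], "snapshot-id": 1,
        "sequence-number": 1, "offset": 4, "length": 100,
        "properties": {"ndv": "42"}}
validate_blob_metadata(blob)  # passes silently
```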

Re: Iceberg MV Refresh

2024-06-24 Thread Piotr Findeisen
that the refresh job ran on say 6/20/2024 >>>>> 12:02:10 UTC, then whatever data is in the materialization has to be >>>>> "fresh >>>>> as of" 6/20/2024 12:02:10 UTC. >>>>> >>>>> Thanks >>>>> Benny

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-03 Thread Piotr Findeisen
Hi Szehon, re listing 'removed' snapshots: If I understand correctly, what you're saying is the following: the Iceberg table format requires users to first delete metadata information about files and only then delete the files, and sometimes users want to order these events differently. We can solve this within

Re: Making the NDV property required for theta sketch blobs in Puffin

2024-07-05 Thread Piotr Findeisen
Agreed, this isn't a Puffin format change. Copying the comment from the "Spec: Make NDV blob metadata property required" PR: Puffin file format has a place for versioning within the magic, but the > Puffin format doesn't change, only its use changes. Puffin s

Re: [DISCUSS] Extend Snapshot Metadata Lifecycle

2024-07-08 Thread Piotr Findeisen
Hi Szehon, Walaa, Thank you Szehon for bringing this up. And thank you Walaa for providing more context from a similar existing solution to the problem. The choices that LakeChime seems to have made -- to keep information in a separate RDBMS and which particular metadata information to retain -- they indee

Building with JDK 21

2024-07-09 Thread Piotr Findeisen
Hi, Java 21 is the latest LTS version, released GA in September 2023. Some Iceberg users already run with Java 21 in production (and FWIW Trino runs with 22 already). I thought it would be nice to add support for building and testing Iceberg with Java 21. Conceptually this is simple (see PR

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-09 Thread Piotr Findeisen
Hi, I totally hear Ryan's concerns about further dividing the discussion. I had the same feeling when we opened discussions in Trino. The reality was more positive though. Discussions predominantly serve as a way to ask questions rather than drive decision-making in the project. Thinking from user

Re: Building with JDK 21

2024-07-09 Thread Piotr Findeisen
uld be fine if we disable the formatter >>> when using Java 21 and just make sure we always have tests run with Java 8 >>> and the formatter checks in our CI. If we go this route I think we stay >>> with Java 8 for formatting and save the reformat for when Java 8 is droppe

Re: Building with JDK 21

2024-07-10 Thread Piotr Findeisen
; Step 2. Upgrade to JDK21 (build + CI) and Java Format > > Regards > JB > > On Tue, Jul 9, 2024 at 2:31 PM Piotr Findeisen > wrote: > > > > Hi, > > > > Java 21 is the latest "LTS version" released GA in September 2023. > > Some Iceberg users

Re: Building with JDK 21

2024-07-10 Thread Piotr Findeisen
Hi, Thanks for the additional feedback. It sounds like we're OK with enabling builds and testing with JDK 21, with the caveat that the formatter is off. The non-negotiable condition is that CI still checks formatting. I think this is exactly what this PR is doing, whi

Re: [VOTE] Fix property names in REST spec for statistics / partition statistics

2024-07-10 Thread Piotr Findeisen
+1 (non binding) On Wed, 10 Jul 2024 at 10:11, Jean-Baptiste Onofré wrote: > +1 (non binding) > > Regards > JB > > On Wed, Jul 10, 2024 at 5:35 AM Eduard Tudenhöfner > wrote: > > > > Hey everyone, > > > > I propose to fix the property names in the REST spec for statistics / > partition statisti

Re: [VOTE] spec: remove the JSON spec for content file and file scan task sections

2024-07-11 Thread Piotr Findeisen
It looks like it's part of the spec that's not connected to the other parts of the spec (like "dead code"). +1 (non binding) On Thu, 11 Jul 2024 at 08:30, Eduard Tudenhöfner wrote: > +1 (non-binding) > > On Thu, Jul 11, 2024 at 8:29 AM Ajantha Bhat > wrote: > >> +1 (non-binding) >> >> - Ajantha >>

Re: [VOTE] Release Apache Iceberg 1.6.0 RC0

2024-07-12 Thread Piotr Findeisen
Hi, The release is probably good to go, but I didn't verify it, so neither -1 nor +1 from me. Still, it would be awesome if we could include these PRs, which are somewhat important for Trino: https://github.com/apache/iceberg/pull/10691 (OOM fix, especially under concurrency or for tables with large numbers of f

Re: Building with JDK 21

2024-07-19 Thread Piotr Findeisen
Hi, We recently started to test Hive3 with Java 11 and 17 and the tests pass. So dropping Java 8 doesn't technically require removing the Hive 3 related modules, unless users cannot do anything useful with them (because e.g. they can only run Hive run

Re: Building with JDK 21

2024-07-22 Thread Piotr Findeisen
that if we can separate the discussion about how to support >>>> Hive, then we should do that. >>>> >>>> +1 to removing Java 8 support >>>> +1 to adding Java 21 support. >>>> >>>> On Fri, Jul 19, 2024 at 12:58 PM huaxin g

Dropping JDK 8 support

2024-07-22 Thread Piotr Findeisen
Hi, in the "Building with JDK 21" email thread we discussed adding JDK 21 support and also dropping JDK 8 support, as these things were initially related. A lot of people expressed acceptance for dropping JDK 8 support, and release 2.0 was proposed as a timeline. There were also concerns raised,

Re: Building with JDK 21

2024-07-22 Thread Piotr Findeisen
t thing > to do. We want to be as transparent about these changes as possible. > > Kind regards, > Fokko Driesprong > > Op ma 22 jul 2024 om 14:37 schreef Piotr Findeisen < > piotr.findei...@gmail.com>: > >> Thanks for this lively discussion, it is great to see so

Re: [ANNOUNCE] Welcoming new committers and PMC members

2024-07-23 Thread Piotr Findeisen
Dear all, thank you for your trust. Very much appreciated. Kevin, Sung, Xuanwo, Honah, Renjie -- congratulations! It's awesome that your efforts were noticed and the value you bring to the table -- recognized. Best, Piotr On Tue, 23 Jul 2024 at 18:56, Steve Zhang wrote: > Congrats everyone! >

[VOTE] Drop Java 8 support in Iceberg 1.7.0

2024-07-26 Thread Piotr Findeisen
Hi, Dropping support for building and running on Java 8 was discussed previously on "Dropping JDK 8 support" and "Building with JDK 21" mail threads. As JB kindly pointed out, for a vote we need a "VOTE" thread, so here we go. Question: Should we drop Java 8 support in Iceberg 1.7.0? Best, Piotr

[DISCUSS] Iceberg 1.6.1 release

2024-07-26 Thread Piotr Findeisen
Hi, ParallelIterable memory limit PR [1] is backported to 1.6.x branch [2]. Are there any other bug fixes that should go into 1.6.1 release? Best, Piotr [1] https://github.com/apache/iceberg/pull/10691 [2] https://github.com/apache/iceberg/pull/10787

Re: [VOTE] Drop Java 8 support in Iceberg 1.7.0

2024-08-01 Thread Piotr Findeisen
Hi, Thank you all for your participation. This is the summary of votes: binding +1: 6, binding -1: 0, non-binding +1: 10, non-binding -1: 1. If I am not mistaken, this means we concluded the vote as 'yes' for dropping Java 8 support in the 1.7.0 release. I wish we had a unanimous decision, but I am aware

Re: [DISCUSS] Implementing a table-level statistics file to store column statistics

2024-08-02 Thread Piotr Findeisen
Hi, First of all, thank you Huaxin for raising this topic. It's important for Spark, but also for Trino. Min, max, and null counts can be derived from manifests. I am not saying that a query engine should derive them from manifests at query time, but it definitely can. If we want to pull min, max
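Deriving table-level min, max, and null counts from manifests amounts to folding per-file column statistics. The sketch below uses hypothetical dataclasses rather than the PyIceberg or Java API. It treats a statistic as unknown at the table level if any file lacks it. Iceberg lower/upper bounds may be truncated (e.g. for strings), and deletes are ignored here, so the results are bounds rather than exact values.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class FileColumnStats:  # hypothetical per-file stats for one column
    lower_bound: Optional[int]
    upper_bound: Optional[int]
    null_count: Optional[int]


@dataclass
class TableColumnStats:  # hypothetical table-level result
    min: Optional[int]
    max: Optional[int]
    null_count: Optional[int]


def _combine(values: List[Optional[int]], fn: Callable) -> Optional[int]:
    # If any file is missing the statistic, the table-level value is unknown.
    if not values or any(v is None for v in values):
        return None
    return fn(values)


def aggregate(files: Sequence[FileColumnStats]) -> TableColumnStats:
    return TableColumnStats(
        min=_combine([f.lower_bound for f in files], min),
        max=_combine([f.upper_bound for f in files], max),
        null_count=_combine([f.null_count for f in files], sum),
    )
```

Returning `None` rather than a partial answer errs on the side of not giving the optimizer a misleading statistic.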

Re: [DISCUSS] Clarify in REST spec expected implementation behavior for unknown updates or requirements

2024-08-02 Thread Piotr Findeisen
Hi, Thank you Amogh for starting the discussion and creating the PR in the first place. It makes sense to me to return 400 Bad Request for requests that the server doesn't understand. (Were any other options considered?) Best, Piotr On Fri, 2 Aug 2024 at 19:15, Amogh Jahagirdar <2am...@gmail.com> wrote:
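The proposed behavior can be sketched as a server-side check that rejects any update action it does not recognize. This is a hypothetical handler returning a `(status, message)` pair instead of using a real HTTP framework. The action set is an illustrative subset of update actions from the REST catalog spec, not the complete list.

```python
from typing import Iterable, Tuple

# Illustrative subset of table update actions defined by the REST catalog spec.
KNOWN_UPDATE_ACTIONS = {
    "assign-uuid", "upgrade-format-version", "add-schema",
    "set-current-schema", "add-snapshot", "set-properties",
    "remove-properties",
}


def validate_updates(updates: Iterable[dict]) -> Tuple[int, str]:
    """Hypothetical check: 400 for the first unknown action, otherwise 200."""
    for update in updates:
        action = update.get("action")
        if action not in KNOWN_UPDATE_ACTIONS:
            return 400, f"Unknown table update action: {action!r}"
    return 200, "OK"
```

Failing the whole request, rather than silently skipping unknown updates, keeps a commit from being applied partially.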

Re: [DISCUSS] Implementing a table-level statistics file to store column statistics

2024-08-07 Thread Piotr Findeisen
ood approach to derive >>>>> these statistics at query execution time. That's why I propose saving >>>>> these >>>>> metrics in the table-level stats file. I am thinking of reusing the >>>>> existing aggregate pushdown mechanism to compute min,

Re: [DISCUSS] Iceberg 1.6.1 release

2024-08-07 Thread Piotr Findeisen
8:13 AM Jean-Baptiste Onofré > wrote: > >> > >> Hi, > >> > >> It would be great to include the Avro update in 1.6.1 release. > >> > >> I agree for a maintenance release on 1.6.x, but I would like to > >> include a couple of updat

Re: [DISCUSS] Iceberg 1.6.1 release

2024-08-07 Thread Piotr Findeisen
gards, > Fokko > > Op wo 7 aug 2024 om 16:15 schreef Piotr Findeisen < > piotr.findei...@gmail.com>: > >> Hi >> >> Thank you JB and Eduard for commenting! >> >> JB, which Avro version we would be updating to for the CVE fix? >> >> Best

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-07 Thread Piotr Findeisen
Hi, Thank you Ajantha for creating this thread. The Iceberg UDFs are an interesting idea! Is there a plan to make the user-created functions shareable between the engines? If so, what would a CREATE FUNCTION statement look like in e.g. Spark or Trino? Meanwhile, I added a few comments in the doc. Be

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-08 Thread Piotr Findeisen
wrote: > Piotr, what do you mean by making user-created functions shareable > between engines? Do you mean UDFs written in imperative code? > > On Wed, Aug 7, 2024 at 12:00 PM Piotr Findeisen > wrote: > > > > Hi, > > > > Thank you Ajantha for creating this

Re: [DISCUSS] Iceberg 1.6.1 release

2024-08-15 Thread Piotr Findeisen
t sound reasonable? > > Kind regards, > Fokko > > Op wo 7 aug 2024 om 20:03 schreef Piotr Findeisen < > piotr.findei...@gmail.com>: > >> Hey Fokko, >> >> thanks, that makes sense! >> Do you maybe know the timeline for the Avro release? >> Trino

Re: Type promotion in v3

2024-08-19 Thread Piotr Findeisen
Hi, Lack of type information in lower/upper bounds is definitely an interesting problem. For example, the 4 bytes \x31\x32\x33\x34 can be interpreted as the string "1234" or as the integer 875770417 (stored little-endian). If the reader logic depends on the length of data in bytes, will this preve
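The ambiguity is easy to reproduce. Iceberg's single-value serialization stores an int as 4 little-endian bytes and a string as its UTF-8 bytes, so the same 4-byte bound decodes cleanly under both types. A reader that does not know the original type cannot tell which one was meant, and a length-based heuristic cannot either.

```python
import struct

raw = b"\x31\x32\x33\x34"

as_string = raw.decode("utf-8")          # interpreted as a string bound
as_int_le = struct.unpack("<i", raw)[0]  # interpreted as a little-endian int32 bound

print(as_string, as_int_le)  # 1234 875770417
```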

[DISCUSS] Release source and binary verification

2024-08-20 Thread Piotr Findeisen
Hi All, The release verification [1] includes testing that the release source tarball builds and also testing the binaries with downstream projects. Does it also contain, should it contain, or is it a conscious omission of: 1. verifying the source tarball is what it should be (source matches the git r

Re: [VOTE] Release Apache Iceberg 1.6.1 RC1

2024-08-20 Thread Piotr Findeisen
Hi -1 (non-binding) I verified the source tarball matches the git tag (except it lacks jitpack.yml, docs/ and 'examples/Convert table to Iceberg.ipynb'). However, I noted that source tarball verification is not part of https://iceberg.apache.org/how-to-release/#validating-a-source-release-candidate .
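The file-set part of this check can be scripted. The sketch below compares two path listings, for example one from `git ls-files` at the release tag and one from `tar -tzf` with the top-level directory prefix stripped. It reports files missing from the tarball, except paths listed as allowed exclusions, and files that exist only in the tarball. The function name and the allow-list mechanism are hypothetical. A full verification would also compare file contents, e.g. with `diff -r` against a checkout of the tag.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set, Tuple


def compare_trees(tarball_files: Iterable[str], git_files: Iterable[str],
                  allowed_missing: Iterable[str] = ()) -> Tuple[Set[str], Set[str]]:
    """Return (missing_from_tarball, extra_in_tarball) for two path listings."""
    tar = set(tarball_files)
    git = set(git_files)
    allowed = [PurePosixPath(a) for a in allowed_missing]

    def is_allowed(path: str) -> bool:
        # Allowed if it equals an allowed entry or lies under an allowed directory.
        p = PurePosixPath(path)
        return any(p == a or a in p.parents for a in allowed)

    missing = {p for p in git - tar if not is_allowed(p)}
    extra = tar - git  # files in the tarball that are not in the git tag
    return missing, extra
```

Extra files deserve more scrutiny than missing ones, since they are code that nobody reviewed in the repository.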

Re: [VOTE] Release Apache Iceberg 1.6.1 RC1

2024-08-21 Thread Piotr Findeisen
>> did you mean the Avro update (which I think we were planning for 1.6.2)? >> >> On Tue, Aug 20, 2024 at 7:05 PM Piotr Findeisen < >> piotr.findei...@gmail.com> wrote: >> >>> Hi >>> >>> -1 (non-binding) >>> >>> I ver

string bucketing compatibility issue

2021-07-16 Thread Piotr Findeisen
Hi, It was discovered by @Mateusz Gajewski that the Iceberg bucketing transformation for strings isn't the regular Murmur3 32-bit hash. Upon closer investigation we found out that the code: https://github.com/apache/iceberg/blob/0c50b2074cd5dad59bbcb4b4599ec3ae11a34b49/api/src/main/java/org/apache/icebe
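For cross-checking, here is a "regular" Murmur3 x86 32-bit hash written from the public algorithm description, applied to UTF-8 bytes with seed 0. The bucket is computed as `(hash & Integer.MAX_VALUE) % N`, following the Iceberg spec's definition of the bucket transform. This is an independent reference sketch, not Iceberg's code. It does not claim to reproduce whatever discrepancy the thread found, and the helper names are hypothetical.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Murmur3 x86 32-bit hash, returned as a signed 32-bit int (like Java)."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def scramble(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask  # rotl32(k, 15)
        return (k * c2) & mask

    h = seed & mask
    nblocks = len(data) // 4
    for i in range(nblocks):
        # Body: process 4-byte little-endian blocks.
        h ^= scramble(int.from_bytes(data[4 * i:4 * i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask  # rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & mask
    tail = data[4 * nblocks:]
    if tail:
        # Tail: the remaining 1-3 bytes, little-endian, no rotation of h.
        h ^= scramble(int.from_bytes(tail, "little"))
    # Finalization (fmix32).
    h ^= len(data) & mask
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_string(value: str, num_buckets: int) -> int:
    """Bucket a string per the spec's formula: hash UTF-8 bytes, mask sign bit, mod N."""
    return (murmur3_x86_32(value.encode("utf-8")) & 0x7FFFFFFF) % num_buckets
```

Hashing `value.encode("utf-8")` makes characters outside the Basic Multilingual Plane contribute their 4-byte UTF-8 form, which is exactly where implementations that hash UTF-16 code units can diverge.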

Re: string bucketing compatibility issue

2021-07-19 Thread Piotr Findeisen
Hi, I've filed https://github.com/apache/iceberg/issues/2837 for this as well. Best PF On Sat, Jul 17, 2021 at 12:48 AM Piotr Findeisen wrote: > Hi, > > It was discovered by @Mateusz Gajewski > that Iceberg bucketing > transformation for string isn't regular Murm

Re: Proposal: Support for views in Iceberg

2021-07-20 Thread Piotr Findeisen
Hi, FWIW, in Trino we just added Trino views support. https://github.com/trinodb/trino/pull/8540 Of course, this is by no means usable by other query engines. Anjali, your document does not talk much about compatibility between query engines. How do you plan to address that? For example, I am fa

Re: Proposal: Support for views in Iceberg

2021-07-22 Thread Piotr Findeisen
;> 4. Intermediate structured language of our own. (What additional >> functionality does it provide over Calcite?) >> >> Given that the view metadata is json, it is easily extendable to >> incorporate any new fields needed to make the SQL truly compatible across >

Re: Proposal: Z-Ordering in Iceberg

2021-07-22 Thread Piotr Findeisen
Hi Bhavyam, Has this been discussed on the sync? Ryan, will it be making it into the table metadata spec? Best, PF On Wed, Jul 21, 2021 at 1:50 PM Bhavyam Kamal wrote: > Hi Everyone, > > I would like to discuss and get feedback on the following proposal for > Z-Ordering in the Iceberg Sync today:

Re: [DISCUSS] Distinct count map

2021-07-23 Thread Piotr Findeisen
Hi, File-level distinct count (a number) has limited applicability in Trino. It's useful for point queries, where we can prune all the other files away, but in other cases, the Trino optimizer wouldn't be able to make educated use of it. Internally, Łukasz and I were talking about sketches

Re: [DISCUSS] Moving to apache-iceberg Slack workspace

2021-07-27 Thread Piotr Findeisen
Hi, I don't have an opinion on which Slack workspace this is in, as long as it's easy to join. A manual joining process is not healthy for sure. Btw, apache-iceberg is currently limited to @apache.org emails, which some people do not have (e.g. I do not). Will you be sharing an invite link or somethin

Re: Proposal: Support for views in Iceberg

2021-07-27 Thread Piotr Findeisen
t >> to support this use case. Different engines can build this logical >> structure when traversing their own AST during a create view query. >> >> 3. With these considerations, I think the "sql" field can potentially be >> a map (maybe called "engine-s

Re: [DISCUSS] Distinct count map

2021-07-27 Thread Piotr Findeisen
place for >>> sketches to live. >>> >>> Best, >>> Ryan >>> >>> [1] >>> https://docs.google.com/document/d/11o3T7XQVITY_5F9Vbri9lF9oJjDZKjHIso7K8tEaFfY/edit#heading=h.uqr5wcfm85p7 >>> [2] >>> https://docs.google.com/docu

Re: [DISCUSS] Moving to apache-iceberg Slack workspace

2021-07-29 Thread Piotr Findeisen
12:51 AM Eduard Tudenhoefner < >>>>> edu...@dremio.com> wrote: >>>>> >>>>>> Could we just update the slack link to >>>>>> https://join.slack.com/t/apache-iceberg/ on the website (see PR#2882 >>>>>> <https:

Re: [DISCUSS] UUID type

2021-07-29 Thread Piotr Findeisen
Hi, I agree with Ryan that it takes some precautions before one can assume uniqueness of UUID values, and that this shouldn't be anything special for UUIDs at all. After all, this is just a primitive type, which is commonly used for certain things, but "commonly" doesn't mean "always". The advantages

Re: [DISCUSS] Moving to apache-iceberg Slack workspace

2021-07-29 Thread Piotr Findeisen
] Best PF On Thu, Jul 29, 2021 at 12:51 PM Eduard Tudenhoefner wrote: > The default invite link expires after 30 days, that's why I was looking > for alternatives. Maybe a slack admin can check if the invite link can be > configured to not expire. > > On Thu, Jul 29, 2021, 1

Re: Proposal: Support for views in Iceberg

2021-07-29 Thread Piotr Findeisen
ot need it and Trino uses it only for validation). >- @Jacques Table references in the views can be arbitrary objects such >as tables from other catalogs or elasticsearch tables etc. I will clarify >it in the spec. > > I will work on incorporating all the comments in the s

Re: [DISCUSS] Moving to apache-iceberg Slack workspace

2021-07-29 Thread Piotr Findeisen
Hi, I was told the screenshot in my previous email doesn't show up, so sharing it as link instead https://gist.github.com/findepi/68e6a141d6ea06049c33e85c5ccd5835#gistcomment-3835460 Best PF On Thu, Jul 29, 2021 at 4:13 PM Piotr Findeisen wrote: > Hi, > > @Ryan Blue , wher

Re: [DISCUSS] UUID type

2021-09-13 Thread Piotr Findeisen
;> >>> If we want the values to be stored as 16-byte fixed, then we need to >>> make it easy to get the expected string representation in and out, just >>> like we do with date/time types. I don't think that's specific to any >>> engine. >>>
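Getting the string representation in and out of a 16-byte value is straightforward with Python's standard `uuid` module, whose `bytes` attribute is the big-endian 16-byte form. The helper names below are hypothetical, shown only to illustrate the round trip.

```python
import uuid


def uuid_to_bytes(text: str) -> bytes:
    """Parse the canonical string form into 16 big-endian bytes."""
    return uuid.UUID(text).bytes


def uuid_to_string(raw: bytes) -> str:
    """Render 16 big-endian bytes in the canonical 8-4-4-4-12 hex form."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return str(uuid.UUID(bytes=raw))


text = "f79c3e09-677c-4bbd-a479-3f349cb785e7"
raw = uuid_to_bytes(text)
assert uuid_to_string(raw) == text  # lossless round trip
```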

Re: [DISCUSS] UUID type

2021-09-17 Thread Piotr Findeisen
;s message here. >> >> Have we converged? I think most people would assume that silence is a >> vote for the status-quo. >> >> On Mon, Sep 13, 2021 at 7:30 AM Piotr Findeisen >> wrote: >> >>> Hi, >>> >>> It seems we converged he

Re: Drop table behavior

2021-11-23 Thread Piotr Findeisen
Hi, When you come from the storage perspective, the current design of 'not owning' the location makes sense. However, if you come from the SQL perspective, this is an impractical limitation. Analysts and other SQL users want to be able to delete their data and must have confidence that all the da

Re: Supporting gs:// prefix in S3URI for Google Cloud S3 Storage

2021-12-01 Thread Piotr Findeisen
Hi, Just curious. S3URI seems AWS S3-specific. What would be the goal of using S3URI with Google Cloud Storage URLs? What problem are we solving? PF On Wed, Dec 1, 2021 at 4:56 PM Russell Spitzer wrote: > Sounds reasonable to me if they are compatible > > On Wed, Dec 1, 2021 at 8:27 AM Mayur S

Re: Supporting gs:// prefix in S3URI for Google Cloud S3 Storage

2021-12-01 Thread Piotr Findeisen
are compatible > with the AWS S3 SDKs and if they are added to the list of supported > prefixes, they work with S3FileIO. > > > > Thanks, > > Mayur > > > > *From:* Piotr Findeisen > *Sent:* Wednesday, December 1, 2021 10:58 AM > *To:* Iceberg Dev List > *

Re: Supporting gs:// prefix in S3URI for Google Cloud S3 Storage

2021-12-02 Thread Piotr Findeisen
GCS, and some GCS features to be not supported in S3FileIO, so I >>> think a specific GCS FileIO would likely be better for GCS support in the >>> long term. >>> >>> >>> >>> Could you describe how you configure S3FileIO to talk to GCS? Do you >>

Re: Handling pandas.Timestamps in nanos

2021-12-03 Thread Piotr Findeisen
Hi, I don't know about Pandas, but the question about timestamp precision is interesting to me nonetheless. At Starburst, we've had customers asking for nanosecond timestamp precision, and this drove adding that capability to Trino. (Actually, picosecond timestamp precision was implemented, but I a
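Iceberg's `timestamp` type in format v1/v2 has microsecond precision, so nanosecond values (such as the integer from `pandas.Timestamp.value`) lose their last three digits on write. The sketch below shows one reasonable conversion, flooring toward negative infinity so that pre-1970 instants stay on the correct side. Truncating toward zero would be a different, equally explicit choice. The helper name and the sample instant are hypothetical.

```python
def nanos_to_micros(epoch_nanos: int) -> int:
    """Convert epoch nanoseconds to epoch microseconds, flooring toward -inf."""
    return epoch_nanos // 1_000


ns = 1_638_489_600_123_456_789  # hypothetical instant with sub-microsecond digits
us = nanos_to_micros(ns)        # 1_638_489_600_123_456
lost = ns - us * 1_000          # 789 ns of precision dropped
```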

Re: High memory usage with highly concurrent committers

2021-12-06 Thread Piotr Findeisen
Hi Igor, does fs.gs.outputstream.upload.chunk.size affect the file size I can upload? Can I upload e.g. a 1GB Parquet file while also setting fs.gs.outputstream.upload.chunk.size=8388608 (8 MiB)? Best PF On Fri, Dec 3, 2021 at 5:33 PM Igor Dvorzhak wrote: > No, right now this is a global

Re: [External] Re: Continuing the Secondary Index Discussion

2022-03-07 Thread Piotr Findeisen
Hi Zaicheng, thanks for following up on this. I'm certainly interested. The proposed time doesn't work for me though, I'm in the CET time zone. Best, PF On Sat, Mar 5, 2022 at 9:33 AM Zaicheng Wang wrote: > Hi dev folks, > > As discussed in the sync >

Iceberg NDV stats

2022-03-16 Thread Piotr Findeisen
Hi, We at Starburst are looking into adding number of distinct values (NDV) statistics to Iceberg tables, to let e.g. the Trino cost-based query optimizer produce better plans when working with Iceberg tables. The initial approach is for table-level statistics, and may be improved in the future. I w
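One common way to estimate NDV in bounded memory is a k-minimum-values (KMV) sketch. Theta sketches, later used in Puffin as `apache-datasketches-theta-v1`, build on the same idea. The sketch below is a simplified KMV for illustration only. It is not the Apache DataSketches implementation or its binary format, and the function names are hypothetical. It keeps the k smallest normalized hashes and estimates NDV as (k - 1) divided by the k-th smallest hash.

```python
import hashlib
import heapq
from typing import Iterable, List, Set


def _hash01(value: str) -> float:
    # Map a value to a pseudo-uniform float in [0, 1) using a stable 64-bit hash.
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


def kmv_estimate(values: Iterable[str], k: int = 1024) -> float:
    """Estimate the number of distinct values with a k-minimum-values sketch."""
    heap: List[float] = []  # negated hashes: a max-heap of the k smallest
    kept: Set[float] = set()  # hashes currently in the heap, to skip duplicates
    for v in values:
        h = _hash01(v)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heapreplace(heap, -h)  # drop the largest kept hash
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct hashes: exact count
    return (k - 1) / -heap[0]  # -heap[0] is the k-th smallest hash
```

The sketch has a fixed size regardless of table size, and two sketches over the same hash function can be merged by keeping the k smallest hashes of their union. That mergeability is what makes per-file or per-partition sketches combinable into table-level NDV.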

Re: Positional delete with vs without the delete row values

2022-05-09 Thread Piotr Findeisen
Hi Peter, FWIW, the Trino Iceberg connector writes deletion files with just positions, without row data. cc @Alexander Jo > For the 1st point we just need to collect the statistics during the delete, but we do not have to actually persist the data. I would be wary of creating ORC/Parquet files wit
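Position deletes without row data are enough for readers. Each delete is a `(file_path, pos)` pair, where `pos` is the 0-based ordinal of the deleted row in that data file. The reader filters those ordinals out while scanning. The sketch below shows this with in-memory rows. The function name and the file paths are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def apply_position_deletes(file_path: str, rows: Iterable[dict],
                           deletes: Iterable[Tuple[str, int]]) -> Iterator[dict]:
    """Yield rows of `file_path` whose 0-based position is not deleted."""
    deleted = {pos for path, pos in deletes if path == file_path}
    for pos, row in enumerate(rows):
        if pos not in deleted:
            yield row


rows = [{"id": i} for i in range(5)]
deletes = [("s3://bucket/a.parquet", 1), ("s3://bucket/a.parquet", 3),
           ("s3://bucket/other.parquet", 0)]
live = list(apply_position_deletes("s3://bucket/a.parquet", rows, deletes))
```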

[VOTE] Adopt Puffin format as a file format for statistics and indexes

2022-06-09 Thread Piotr Findeisen
Hi Everyone, I propose that we adopt Puffin file format as a file format for statistics and indexes in Iceberg tables. Puffin file format specification: https://github.com/apache/iceberg/blob/master/format/puffin-spec.md (previous discussions: https://github.com/apache/iceberg/pull/4944, https:/

Re: [VOTE] Adopt Puffin format as a file format for statistics and indexes

2022-06-22 Thread Piotr Findeisen
04 AM Szehon Ho >> wrote: >> >> +1, it's an exciting step for Iceberg, look forward to all the new >> statistics and secondary indices it will allow. >> >> >> >> Had a few questions of what the reference to Puffin file(s) will be in >> the

Re: [Proposal] Partition stats in Iceberg

2022-11-23 Thread Piotr Findeisen
Hi Ajantha, this is a very interesting document, thank you for your work on this! I've added a few comments there. I have one high-level design comment, so I thought it would be nicer to everyone if I re-posted it here: is "partition" the right level of keeping the stats? > We do this in Hive, but was

Re: [VOTE] Release Apache Iceberg 1.1.0 RC4

2022-11-28 Thread Piotr Findeisen
Hi, https://repo.maven.apache.org/maven2/org/apache/iceberg/iceberg-core/1.1.0/ is already published (on Nov 22, so before voting was concluded). Is it "the" release, or will a new tag be pushed to Maven Central? best, PF On Mon, Nov 28, 2022 at 9:18 AM Gabor Kaszab wrote: > > Hey All, > >

Re: Change default format-version of our forked Iceberg to v2

2023-01-11 Thread Piotr Findeisen
Hi, FWIW Trino already creates v2 tables by default. Thought it's worth sharing for context. Best PF On Tue, Jan 10, 2023 at 10:09 AM Manu Zhang wrote: > Hi all, > > We've maintained a forked Iceberg internally and all our use cases involve > v2 tables with row-level updates and deletes. Ou

Re: [DISCUSS] Hive locks removal

2023-01-20 Thread Piotr Findeisen
Hi Peter, Thanks for bringing this issue up and thanks for working on it as well. I haven't experienced these problems first-hand, so I don't have an opinion yet on what the solution should or shouldn't be. With this new approach, is there any plan to address situations where more than one application

Re: [DISCUSS] Release source and binary verification

2024-08-23 Thread Piotr Findeisen
Hi! Russell, Justin, JB, thanks for your comments! I started this thread because I believe that source and binary verification are very important verification steps we should be doing, especially in the light of supply chain attacks that we've witnessed (XZ). We should have processes in the Iceber

Re: [VOTE] Release Apache Iceberg 1.6.1 RC2

2024-08-23 Thread Piotr Findeisen
+1 (non-binding) Trino integration https://github.com/trinodb/trino/actions/runs/10529992246/job/29179087096?pr=23083 https://github.com/trinodb/trino/actions/runs/10529992246/job/29179401378?pr=23083 On Fri, 23 Aug 2024 at 15:19, Eduard Tudenhöfner wrote: > +1 (binding) > > - verified signat

Re: [DISCUSS] Drop Hive 2 support

2024-08-27 Thread Piotr Findeisen
Hi, The Trino query engine won't be affected by the drop. These are the artifacts Trino depends on: iceberg-api, iceberg-bundled-guava, iceberg-core, iceberg-nessie, iceberg-orc, iceberg-parquet, iceberg-snowflake. Best Piotr On Tue, 27 Aug 2024 at 02:51, Anton Okolnychyi wrote: > Are we aware of any e

Re: [Discuss] LZ4 compression in Puffin spec

2024-08-27 Thread Piotr Findeisen
Hi Gabor, Thanks for creating this discussion thread. This is indeed a good topic to discuss. The idea was to have lightweight compression for the footer for cases when Puffin files are bigger. It is true that the implementation doesn't follow the spec yet. If we remove this from the Puffin spec, we
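For context on where footer compression sits, here is a minimal footer reader written from the Puffin spec's layout. A file starts with the magic `PFA1`. The footer is Magic, FooterPayload, FooterPayloadSize (4-byte little-endian), Flags (4 bytes), Magic. Bit 0 of the first flags byte marks a compressed payload, which the spec defines as LZ4. This sketch deliberately does not handle the compressed case. It is not the Iceberg reader, and the function name is hypothetical.

```python
import json
import struct

MAGIC = b"PFA1"  # 0x50 0x46 0x41 0x31


def read_footer(data: bytes) -> dict:
    """Parse the uncompressed JSON footer payload of an in-memory Puffin file."""
    if not data.startswith(MAGIC) or not data.endswith(MAGIC):
        raise ValueError("not a Puffin file: bad magic")
    # Trailing layout: FooterPayloadSize (4, LE) | Flags (4) | Magic (4)
    payload_end = len(data) - 12
    payload_size = struct.unpack_from("<i", data, payload_end)[0]
    flags = data[len(data) - 8:len(data) - 4]
    payload_start = payload_end - payload_size
    footer_magic_start = payload_start - 4
    if footer_magic_start < 0 or data[footer_magic_start:payload_start] != MAGIC:
        raise ValueError("footer magic not found")
    if flags[0] & 0x01:
        # FooterPayloadCompressed: the spec uses LZ4 here; out of scope for this sketch.
        raise NotImplementedError("compressed footer payload is not supported")
    return json.loads(data[payload_start:payload_end].decode("utf-8"))
```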

Re: Request to Add RisingWave to Apache Iceberg Documentation

2024-08-28 Thread Piotr Findeisen
Hi Alice, Thank you for your message and awesome to learn about RisingWave project. Also, congratulations on your Series A funding! I’m reaching out on behalf of the Apache RisingWave team. RisingWave is an > Apache open-source project dedicated to real-time stream processing regarding the "Ap

Re: [DISCUSS] Additional language implementations for Iceberg Puffin reader/writer

2024-08-29 Thread Piotr Findeisen
Hi Gabor, thanks for starting this topic. It would be awesome to have Puffin readers/writers available in all languages supported by the Iceberg community! The topic is important for v3, but also if we want to support stats updates when writing to tables that already have some stats collected. i

Re: [Discuss] LZ4 compression in Puffin spec

2024-08-31 Thread Piotr Findeisen
y be in favor of > deprecating LZ4 framed and using LZ4 withing framing which already has high > quality native java implementation. > > Cheers, > Micah > > On Tue, Aug 27, 2024 at 5:44 AM Piotr Findeisen > wrote: > >> Hi Gabor >> >> Thanks for creating th

Re: [Discuss] LZ4 compression in Puffin spec

2024-10-14 Thread Piotr Findeisen
I think this is still fairly simple compared > to implementing the block format. > > On Sat, Aug 31, 2024 at 12:11 PM Piotr Findeisen < > piotr.findei...@gmail.com> wrote: > >> Hi Micah, >> >> Good point. >> Does unframed LZ4 provide a checksum of the cont

Re: [DISCUSS] [PyIceberg] Use of asserts to "programming the negative space"

2024-10-12 Thread Piotr Findeisen
Hi Andre, I am not very familiar with PyIceberg, but I am always for ensuring that assumptions in our code are validated. I am not quite sure that assert is the way to go, though. In Java, one typically does not use `assert`, which can be enabled or disabled. checkState / checkArgument are preferr
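Python has the same concern: `assert` statements are stripped when running with `python -O`. A Python analogue of Guava's `checkArgument` / `checkState` is a pair of small helpers that always run. The mapping to `ValueError` and `RuntimeError` below is an assumption chosen to mirror `IllegalArgumentException` and `IllegalStateException`. The helpers and the `Reader` class are hypothetical, not PyIceberg API.

```python
def check_argument(condition: bool, message: str) -> None:
    """Raise ValueError if `condition` is false (like Guava's checkArgument)."""
    if not condition:
        raise ValueError(message)


def check_state(condition: bool, message: str) -> None:
    """Raise RuntimeError if `condition` is false (like Guava's checkState)."""
    if not condition:
        raise RuntimeError(message)


class Reader:  # hypothetical class showing argument vs. state checks
    def __init__(self) -> None:
        self._closed = False

    def read(self, batch_size: int) -> list:
        # Unlike `assert`, these checks still run under `python -O`.
        check_argument(batch_size > 0, f"batch_size must be positive, got {batch_size}")
        check_state(not self._closed, "reader is closed")
        return list(range(batch_size))  # placeholder data

    def close(self) -> None:
        self._closed = True
```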

Re: [Discuss] Apache Iceberg 1.6.2 release because of Avro CVE ?

2024-10-12 Thread Piotr Findeisen
I'm fine both ways, with or without a release. I am a strong believer in low-ceremony and automated releases. Maybe we could automate at least the core part of the release (shipping binaries), and then we don't need to think so much about when (not) to release, because it would be cheap to redo if ne

Re: [DISCUSS] Iceberg Summit 2025 ?

2024-09-29 Thread Piotr Findeisen
Hi Meeting in person is always the best, but online is much more inclusive. So +1 for a hybrid event. Best Piotr On Mon, 30 Sept 2024 at 08:27, Eduard Tudenhöfner wrote: > +1 for a hybrid event > > On Sun, Sep 29, 2024 at 4:51 AM Steven Wu wrote: > >> +1 for hybrid with in-person elements. >

Re: [DISCUSS] [PyIceberg] Use of asserts to "programming the negative space"

2024-10-16 Thread Piotr Findeisen
eeper research if we are about to roll something on our own. It sounds unlikely that such a fundamental need is not addressed in Python ecosystem. Best Piotr On Tue, 15 Oct 2024 at 01:53, André Luis Anastácio wrote: > Thank you Piotr Findeisen and Sung Yun, for your insights. > >

Re: [Discuss] Iceberg View Interoperability

2024-10-28 Thread Piotr Findeisen
Hi, I have no experience with Substrait, but I agree it looks like the tool for the job. Or, as I proposed earlier, we define our own Iceberg IR for the views. We can experiment with serialized IR being stored as a String with a new dialect name, and this is how we should get this started. It's pro
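Storing a serialized IR under a new dialect name fits the existing view metadata shape. A view version carries a list of `sql` representations, each tagged with a `dialect`, and an engine picks the one it understands. In the sketch below, the lookup helper is hypothetical. The `iceberg-ir` dialect name and its placeholder payload are also hypothetical, standing in for whatever serialized form would be chosen.

```python
from typing import List, Optional


def find_sql(representations: List[dict], dialect: str) -> Optional[str]:
    """Return the SQL text for `dialect`, or None if the view has no such representation."""
    for rep in representations:
        if rep.get("type") == "sql" and rep.get("dialect") == dialect:
            return rep["sql"]
    return None


representations = [
    {"type": "sql", "sql": "SELECT count(*) FROM db.events", "dialect": "spark"},
    {"type": "sql", "sql": "SELECT count(*) FROM db.events", "dialect": "trino"},
    # Hypothetical: an engine-neutral IR serialized as a string under a new dialect.
    {"type": "sql", "sql": "<serialized IR>", "dialect": "iceberg-ir"},
]
```

An engine could try its native dialect first and fall back to the IR representation, so older readers that ignore the new dialect keep working.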

Re: [DISCUSS] - Deprecate Equality Deletes

2024-10-31 Thread Piotr Findeisen
Hi, Thank you Russell for bringing up this topic and for the nice write-up. From the perspective of engines like Trino, equality deletes bring little value and add a lot of complications, so +1 from me on this. I understand they exist for a reason, though. Maybe it was just a lazy choice that we should just revis

Re: [VOTE] Deletion Vectors in V3

2024-10-30 Thread Piotr Findeisen
Thank you Anton, +1 (non-binding) On Thu, 31 Oct 2024 at 05:07, John Zhuge wrote: > +1 (non-binding) > > On Wed, Oct 30, 2024 at 1:56 PM Anton Okolnychyi > wrote: > >> +1 (binding) >> >> - Anton >> >> ср, 30 жовт. 2024 р. о 21:32 Amogh Jahagirdar <2am...@gmail.com> пише: >> >>> +1 (binding)

Re: Very strange (AI generated) issues

2025-01-31 Thread Piotr Findeisen
Extending the issue template is an option, but let's be aware of downsides and let's make sure we believe it's net positive (also outside of current situation). Some people (and bots) will overlook the checkboxes. If "I am a human" is not checked, do we auto-reject the issue? Some people will noti

Re: [DISCUSS] Update supported blob types in puffin spec

2025-02-04 Thread Piotr Findeisen
Thanks Denys for starting this discussion! Thanks Ryan, I agree it would be better to have engine-agnostic data structures in the Blobs we maintain in the Iceberg project. At least for the "standard blob types". Note, however, that the Puffin format is intentionally open-ended. An application can put a

Re: Optimize object lookup in REST catalog

2024-12-05 Thread Piotr Findeisen
Hi, I like the idea of a single "get relation" call to get the relation in one shot. A similar thing applies to listing relations. This is obviously a less common operation, but not uncommon, and also more expensive. BI tools query information_schema.tables (and other information_schema information). To complete

Re: [PROPOSAL] Create Iceberg DockerHub repository

2024-12-05 Thread Piotr Findeisen
Hi, Sorry for coming late here. Did we consider GitHub Packages as a home for the Apache docker images? We already use GitHub for development, and GitHub Packages are better integrated with GitHub. In my personal opinion, GitHub Packages are also less likely to be rate limited. Best Piotr On Fri

Re: Very strange (AI generated) issues

2025-01-22 Thread Piotr Findeisen
Hi Thank you Jarek for taking care of this matter! > Should we react and block new users from interacting with Airflow repo if we see it happening again? Maintainers' time is not an infinite resource, so "yes!" from me (also for Iceberg). Best On Wed, 22 Jan 2025 at 15:40, Russell Spitzer

Re: Very strange (AI generated) issues

2025-01-23 Thread Piotr Findeisen
people into thinking they are "testing" >> issue creation where it actually created those issues >> * I guess whoever has the tool realised their mistake and either stopped >> it or removed some confusion >> * I have my own suspicions (which I am exploring) - but I aske

Re: [VOTE] Deprecate IRC snapshot-id Field of SetStatisticsUpdate

2025-01-21 Thread Piotr Findeisen
+1 non-binding On Tue, 21 Jan 2025 at 10:25, Fokko Driesprong wrote: > +1 > > Thanks for cleaning this up Christian! > > Kind regards, > Fokko > > Op di 21 jan 2025 om 08:25 schreef Christian Thiel < > christian.t.b...@gmail.com>: > >> Hi everyone, >> >> based on good feedback on the [DISCUSS] t