Hi Jingsong,

Yeah, auto compaction (and auto vacuum) is probably the ideal state we want to reach. It adds "additional latency" to finishing a batch (assuming compaction runs as a synchronous task), though that won't matter much with a proper interval, as end users would already expect some latency at a given interval. The detail to work out is how to bound the time compaction takes so that it doesn't exceed end users' expectations.
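Just to make that concrete, here's a rough sketch of the "manual" maintenance an auto mode would effectively run inside the sink on an interval. Treat it as a sketch based on my limited reading of the current API - `table` is assumed to be already loaded from a catalog, and I haven't verified the exact builder methods on the Spark-based rewrite action:

    import org.apache.iceberg.Table;
    import org.apache.iceberg.actions.Actions;

    // Hypothetical periodic maintenance; method names on the Spark-based action
    // are from memory and may not match the current builders exactly.
    static void runMaintenance(Table table) {
      // "auto vacuum": expire snapshots older than the retention window, which also
      // bounds how far back you can time travel (your snapshot-time-to-live idea).
      long retentionMillis = 10L * 24 * 60 * 60 * 1000;  // e.g. 10 days
      table.expireSnapshots()
          .expireOlderThan(System.currentTimeMillis() - retentionMillis)
          .commit();

      // "auto compaction": rewrite small data files into larger ones.
      Actions.forTable(table)
          .rewriteDataFiles()
          .execute();

      // (Separately, setting write.metadata.delete-after-commit.enabled=true on the
      // table keeps old metadata.json files from piling up, per the thread below.)
    }

If the compaction part runs synchronously in the commit path, its runtime is exactly the "additional latency" above, so bounding it would be the main design question.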
I guess it's better to fork a new discussion thread for this (as the original thread is basically for questions), but I'm still a newbie on Iceberg and exploring, so I probably wouldn't be much help as of now (treat me as one of the end users). If there's a plan on your end, please go ahead with designing and sharing it. It'd be amazing to see a plan for supporting this!

On Wed, Jul 29, 2020 at 2:40 PM Jingsong Li <jingsongl...@gmail.com> wrote:

> Thanks Jungtaek for starting this discussion.
>
> What our team wants to do is ingest data into the Iceberg table at a one-minute frequency. This frequency can also lead to a large number of small files.
>
> Auto compaction (rewriting manifests and data files) in the streaming sink (writer) looks wonderful. As you said, we also need to consider auto `expireSnapshots` and `write.metadata.delete-after-commit.enabled`.
>
> Ideally, we provide a streaming writer, giving two config options:
> - auto-compact = true (like Databricks auto compaction [1]: after a successful commit, the selected files (including manifests and data files) are merged for a reasonable compaction)
> - snapshot-time-to-live = 10 days (limit how far back you can time travel)
>
> I feel that this is very useful for users and can really bring Iceberg into the field of quasi real-time.
>
> [1]
> https://docs.databricks.com/delta/optimizations/auto-optimize.html#auto-compaction
>
> Best,
> Jingsong
>
> On Wed, Jul 29, 2020 at 3:27 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
>
>> Thanks for the PR! We will review it soon.
>>
>> The RewriteManifestAction is a parallel way to rewrite manifests and then commit them using the RewriteManifests table operation. The idea here is that the metadata tree (manifest list and manifest files) functions as an index over data in the table. Sometimes, you want to rewrite that index to make finding data easier.
>>
>> When data is appended, the new data files are placed into a new manifest that is added to the manifest list file. Manifests are also compacted when there are enough of them or a group reaches a certain size. Both of these operations tend to cluster data in manifests by the time that it was added to the table. This is great for time-series data because you get manifests that each have a few days or a week of data, and a time filter can find the right manifests and data files quickly. But if you want to look up data by a dimension other than time -- for example, using the bucket of an ID -- then the natural clustering doesn't work well. In that case, you can use RewriteManifests or the RewriteManifestsAction to cluster data files by some key. That really helps speed up queries for specific rows.
>>
>> On Mon, Jul 27, 2020 at 8:41 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>
>>> I'd love to contribute documentation about the actions - just need some time to understand the needs for some actions (like RewriteManifestAction).
>>>
>>> I just submitted a PR for the structured streaming sink [1]. I mentioned expireSnapshots() there with a link to the javadoc page, but it'd be nice if there were also a code example on the API page, as it wouldn't be bound to Spark's case (any fast-changing cases, including Flink streaming, would need this as well).
>>>
>>> 1.
>>> https://github.com/apache/iceberg/pull/1261
>>>
>>> On Tue, Jul 28, 2020 at 10:39 AM Ryan Blue <rb...@netflix.com> wrote:
>>>
>>>> > seems to fail when a high-rate writing streaming query is running on the other side
>>>>
>>>> This kind of situation is where you'd want to tune the number of retries for a table. That's a likely source of the problem. We can also check to make sure we're being smart about conflict detection. A rewrite needs to scan any manifests that might have data files that conflict, which is why retries can take a little while. The farther the rewrite is from active data partitions, the better. And we can double-check to make sure we're using the manifest file partition ranges to avoid doing unnecessary work.
>>>>
>>>> > from the end user's point of view, the actions are not documented
>>>>
>>>> Yes, we need to add documentation for the actions. If you're interested, feel free to open PRs! The actions are fairly new, so we don't yet have docs for them.
>>>>
>>>> Same with the streaming sink, we just need someone to write up docs and contribute them. We don't use the streaming sink, so I've unfortunately overlooked it.
>>>>
>>>> On Mon, Jul 27, 2020 at 3:25 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Thanks for the quick response!
>>>>>
>>>>> And yes, I also experimented with expireSnapshots() and it looked good. I can imagine some alternative conditions for expiring snapshots (like adjusting the "granularity" between snapshots instead of removing all snapshots before a specific timestamp), but for now it's just an idea and I don't have real-world needs to back it up.
>>>>>
>>>>> I also went through RewriteDataFilesAction and it looked good as well. There's an existing GitHub issue to make the action more intelligent, which is valid and good to add. One thing I noticed is that it's a bit of a time-consuming task (expected for sure, not a problem) and it seems to fail when a high-rate writing streaming query is running on the other side (this is a concern). I guess the action is only related to the old snapshots, hence no conflict is expected against fast appends. It would be nice to know whether this is expected behavior and it's recommended to stop all writes before running the action, or whether it sounds like a bug.
>>>>>
>>>>> I haven't gone through RewriteManifestAction, though for now I am only curious about the need for it. I'm eager to experiment with the streaming source which is in review - I don't know the details of Iceberg, so I'm not qualified to participate in reviewing. I'd rather play with it when it's available and use it as a chance to learn about Iceberg itself.
>>>>>
>>>>> Btw, from the end user's point of view, the actions are not documented - even the structured streaming sink is not documented and I had to go over the code. While I think it's obvious to document a streaming sink in the Spark docs (wondering why the documentation was missed), would we want to document the actions as well? It looks like these actions are still evolving, so I'm wondering whether we are waiting for them to stabilize, or whether the documentation was just missed.
>>>>>
>>>>> Thanks,
>>>>> Jungtaek Lim
>>>>>
>>>>> On Tue, Jul 28, 2020 at 2:45 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
>>>>>
>>>>>> Hi Jungtaek,
>>>>>>
>>>>>> That setting controls whether Iceberg cleans up old copies of the table metadata file. The metadata file holds references to all of the table's snapshots (that have not expired) and is self-contained. No operations need to access previous metadata files.
>>>>>>
>>>>>> Those aren't typically that large, but they could be when streaming data because you create a lot of versions. For streaming, I'd recommend turning it on and making sure you're running `expireSnapshots()` regularly to prune old table versions -- although expiring snapshots will remove them from table metadata and limit how far back you can time travel.
>>>>>>
>>>>>> On Mon, Jul 27, 2020 at 4:33 AM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi devs,
>>>>>>>
>>>>>>> I'm experimenting with Apache Iceberg as a Structured Streaming sink - I plan to experiment with the source as well, but I see the PR is still in review.
>>>>>>>
>>>>>>> It seems that "fast append" pretty much helps to retain reasonable latency for committing, though the metadata directory grows too fast. I found the option 'write.metadata.delete-after-commit.enabled' (false by default), enabled it, and the overall size looks fine afterwards.
>>>>>>>
>>>>>>> That said, given the option is false by default, I'm wondering what would be impacted by turning on this option. My understanding is that it doesn't affect time travel (as that refers to a snapshot), and restoring is also from a snapshot, so I'm not sure what to consider when turning on the option.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jungtaek Lim
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Software Engineer
>>>>>> Netflix
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>
> --
> Best, Jingsong Lee
>