I'm going to revert the change to NaN tracking that makes that field
required. I think we can make other fields required in table metadata.json
files and manifests, but that one in the manifest list isn't a good idea.
I'll open a PR to update it this weekend and I'll update the distinct
counts PR f
For the last month, I’ve been actively working with the v2 spec in Spark.
Specifically, I’m focused on implementing merge-on-read using the proposed API in
Spark [1]. Based on that work, I support adopting v2: the current design is
sufficient to implement the use cases we have considered. I expe
The motivation is that some query engines want to at least estimate a
min/max range for distinct value counts. Even if these estimates are
imperfect, they are better than no information.
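To make the min/max idea concrete: if the only input is an exact per-file distinct count for one column, the table-wide distinct count is at least the largest per-file count (every value in that file is distinct) and at most the sum of all per-file counts (if no value appears in more than one file). The helper below is a minimal Python sketch of that reasoning; `distinct_count_range` is a hypothetical name, not an Iceberg or engine API.

```python
from typing import Iterable, Tuple


def distinct_count_range(per_file_counts: Iterable[int]) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the distinct count across several files.

    Hypothetical helper: assumes each input is an exact distinct count for
    the same column in one data file.
    """
    counts = list(per_file_counts)
    if not counts:
        return (0, 0)
    # Lower bound: the union has at least as many values as its largest part.
    lower = max(counts)
    # Upper bound: reached only if no value is shared between files.
    upper = sum(counts)
    return (lower, upper)


print(distinct_count_range([120, 80, 200]))  # (200, 400)
```

With many files the upper bound grows quickly while the lower bound stays pinned to the single largest file, which is one way to see why a per-file count is most useful when a query touches only one file. If the per-file counts are themselves approximations, the bounds are approximate too.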
On Fri, Jul 23, 2021 at 4:08 PM Anton Okolnychyi wrote:
I am OK with bringing the metric back as long as it is computed while writing
data and is an approximation (to avoid excessive performance and space overhead
on write). The biggest problem seems to be that a per-file metric is not useful
unless we query a single file. That’s why we should have an idea how th
Yeah, as Ryan said, we are currently thinking about storing secondary
indexes and sketches at the partition level. To do that, we're considering
a new partition-granularity metadata file that holds stats useful for job
planning along with pointers to indexes and sketches.
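One reason sketches fit at the partition level is that they can be merged, while plain per-file counts cannot. The messages don't name a sketch type, so the following is a minimal sketch assuming a K-minimum-values (KMV) distinct-count sketch, chosen here only for illustration; real engines may use other sketch families such as HyperLogLog. `KMVSketch` and `_hash01` are hypothetical names, not part of any Iceberg or Trino API.

```python
import hashlib
import heapq
from typing import Iterable, List

HASH_SPACE = float(2**64)


def _hash01(value: str) -> float:
    # Hypothetical helper: map a value to a pseudo-uniform float in (0, 1].
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return (int.from_bytes(digest, "big") + 1) / HASH_SPACE


class KMVSketch:
    """Keeps the k smallest distinct hash values seen (K-minimum-values)."""

    def __init__(self, k: int = 256) -> None:
        self.k = k
        self.mins: List[float] = []  # ascending, at most k entries

    def add_all(self, values: Iterable[str]) -> "KMVSketch":
        hashes = {_hash01(v) for v in values}
        self.mins = heapq.nsmallest(self.k, hashes | set(self.mins))
        return self

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        # Union the retained hashes and keep the k smallest. For equal k the
        # result matches a sketch built over the combined data, so per-file
        # sketches can be rolled up into a partition-level sketch.
        merged = KMVSketch(min(self.k, other.k))
        merged.mins = heapq.nsmallest(merged.k, set(self.mins) | set(other.mins))
        return merged

    def estimate(self) -> float:
        if len(self.mins) < self.k:
            # Fewer than k distinct hashes: the count is exact (barring collisions).
            return float(len(self.mins))
        # KMV estimator: (k - 1) divided by the k-th smallest normalized hash.
        return (self.k - 1) / self.mins[-1]


# Two "files" with overlapping values; the true distinct count is 1500.
file_a = KMVSketch().add_all(f"user-{i}" for i in range(0, 1000))
file_b = KMVSketch().add_all(f"user-{i}" for i in range(500, 1500))
partition = file_a.merge(file_b)
print(round(partition.estimate()))
```

Summing the two exact per-file counts here would give 2000, double-counting the 500 shared values; the merged sketch avoids that because overlapping values hash to the same retained entries.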
As for the sketches you
Hey Piotr,
There are a few proposals for secondary indexes floating around [1][2].
The current thinking is that this would be the best place for sketches to
live.
Best,
Ryan
[1] https://docs.google.com/document/d/11o3T7XQVITY_5F9Vbri9lF9oJjDZKjHIso7K8tEaFfY/edit#heading=h.uqr5wcfm85p7
[2] http
Hi,
A file-level distinct count (a single number) has limited applicability in
Trino. It's useful for point queries, where we can prune all the other files
away, but in other cases the Trino optimizer wouldn't be able to make
educated use of it.
Internally, Łukasz and I were talking about sketches