Thanks for the updates, Jing!
+1 to start the vote.
Best,
Jark
On Fri, 22 Jul 2022 at 20:10, Jing Ge wrote:
Hi,
I have updated the FLIP. Please check again. If there are no other
concerns, I will start voting. Thank you all for your support!
Best regards,
Jing
On Fri, Jul 22, 2022 at 1:32 PM Jing Ge wrote:
Thanks Jark, fair enough, I will update the FLIP accordingly.
Best regards,
Jing
On Fri, Jul 22, 2022 at 6:07 AM Jark Wu wrote:
Hi Jing,
I have some concerns about the isBulkGetSupported() approach.
1. Catalog developers need to learn the contract between
`isBulkGetSupported()` and bulk get methods
2. The contract of isBulkGetSupported() is fragile, because developers may
forget to override
`isBulkGetSupported` and be c
Thanks Jingsong and Jark. I will create another FLIP to cover the
optimization of fetching partitions and partition stats from the
catalog in a single call.
Thanks for the hint w.r.t. the compatibility issue. I have updated the FLIP
to provide all methods as default interface methods
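
As a rough illustration of the default-method idea, here is a minimal sketch. SketchCatalog and its row-count methods are hypothetical stand-ins, not Flink's actual Catalog interface or the signatures in the FLIP. The bulk method defaults to one call per partition, so existing catalogs keep compiling, and no separate isBulkGetSupported() flag is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in; NOT Flink's actual Catalog interface.
interface SketchCatalog {
    // Existing single-partition lookup (hypothetical signature).
    long getPartitionRowCount(String partition);

    // New bulk method provided as a default: it falls back to one call per
    // partition, so catalogs written before it existed still compile.
    default List<Long> bulkGetPartitionRowCounts(List<String> partitions) {
        List<Long> result = new ArrayList<>(partitions.size());
        for (String partition : partitions) {
            result.add(getPartitionRowCount(partition));
        }
        return result;
    }
}

// A catalog that predates the bulk method and does not override it.
class LegacyCatalog implements SketchCatalog {
    @Override
    public long getPartitionRowCount(String partition) {
        // Dummy statistic for the sketch: the length of the partition name.
        return partition.length();
    }
}

// A catalog that overrides the default to answer in one simulated round trip.
class BatchingCatalog implements SketchCatalog {
    int roundTrips = 0;

    @Override
    public long getPartitionRowCount(String partition) {
        roundTrips++;
        return partition.length();
    }

    @Override
    public List<Long> bulkGetPartitionRowCounts(List<String> partitions) {
        roundTrips++; // one simulated metastore call for the whole batch
        List<Long> result = new ArrayList<>(partitions.size());
        for (String partition : partitions) {
            result.add((long) partition.length());
        }
        return result;
    }
}
```

Callers always invoke the bulk method. Whether it is served by a loop or by a single backend call is decided by the catalog implementation alone.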
I agree with Jingsong.
There are use cases to get partitions and partition stats in a single call
to reduce the IO cost.
For example, extending Catalog#listPartitions to
Catalog#listPartitionsWithStats,
and extending Catalog#listPartitionsByFilter to
Catalog#listPartitionsWithStatsByFilter.
This a
Thanks for your reply.
- Consider bulkGetPartitionStatistics, partition statistics are
already in HiveMetastoreClient.listPartitions. But on our side, we
need Catalog.getPartitions first, and then
Catalog.bulkGetPartitionStatistics.
- Consider bulkGetPartitionColumnStatistics, yes, as you said, w
Hi Jingsong,
Thanks for clarifying it. Are you suggesting a new method or changing the
name of the methods described in the FLIP?
Please see my answers and further questions below.
Best regards,
Jing
On Wed, Jul 20, 2022 at 4:28 AM Jingsong Li wrote:
Hi Jing,
I understand that the statistics for partitions are currently only
used by Hive, so we can look at the Hive implementation:
See HiveCatalog.getPartitionStatistics.
To get the statistics, we actually get them from the
org.apache.hadoop.hive.metastore.api.Partition object.
According to Hi
Thanks Jingsong for the suggestion.
Do you mean using a different naming convention? There is a thought and
description in the FLIP about using "list" or "bulkGet":
- bulkGetPartitionStatistics(...) has been chosen over
listPartitionStatistics(...), because, compared to database and partit
Hi Jing,
Thanks for driving this, LGTM.
Best,
Godfrey
On Fri, Jul 15, 2022 at 11:38, Jingsong Li wrote:
Thanks for starting this discussion.
Have we considered introducing a listPartitionWithStats() in Catalog?
Best,
Jingsong
On Fri, Jul 15, 2022 at 10:08 AM Jark Wu wrote:
Hi Jing,
Thanks for starting this discussion. The bulk fetch is a great improvement
for the optimizer.
The FLIP looks good to me.
Best,
Jark
On Fri, 8 Jul 2022 at 17:36, Jing Ge wrote:
Hi devs,
After having multiple discussions with Jark and Godfrey, I'd like to start
a discussion on the mailing list w.r.t. FLIP-247[1], which will
significantly improve the performance by providing the bulk fetch
capability for table and column statistics.
Currently the statistics information a