Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

Jiabao Sun Tue, 19 Dec 2023 06:29:28 -0800

Hi David,

Sorry, the last two comments at the bottom of the email thread were replies from 
a long time ago. 
I did not understand the display order of the mailing list well 
before, which caused the confusion.

You can refer to the comments above and the latest FLIP-377 document[1] for 
some updates on the previous discussion.

1. Although the option has since been renamed to "filter.handling.policy", I 
believe some explanation is still needed. 
The SupportsFilterPushDown interface is designed specifically for 
ScanTableSource, as its Javadoc clearly states. 
Additionally, if a LookupTableSource does not allow filtering, it is 
sufficient to use a ScanTableSource to read the entire data into memory.

2. Full table scans often occur when our query conditions do not hit an index. 
In such cases, the database needs to traverse all the data to find the records 
that match the query conditions. 
This can be costly, especially for large tables. 
If we do not push the filters down, we can traverse the data in a specific 
order, for example along a primary key index, 
and filter it with external computing resources. 
This approach can help reduce the CPU overhead on the database.

3. As explained in point 2, if we do not want to add query 
overhead to the database 
but still want to filter the data, we may choose to disable filter pushdown.
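
To make points 1 to 3 more concrete, here is a minimal, self-contained sketch 
of how a connector could reject all filters when pushdown is disabled, in the 
spirit of Jark's earlier remark about `Result#remainingFilters`. 
It does not use the real Flink classes: the `Result` class below is only a 
stand-in for Flink's SupportsFilterPushDown.Result, filters are plain strings 
instead of resolved expressions, and the `FilterHandlingPolicy` enum with its 
ALWAYS and NEVER values is a hypothetical name chosen for illustration. 
The actual option values are whatever the FLIP-377 document[1] defines.

```java
import java.util.ArrayList;
import java.util.List;

class FilterPolicySketch {

    // Hypothetical policy values, used only for this illustration.
    enum FilterHandlingPolicy { ALWAYS, NEVER }

    // Stand-in for Flink's SupportsFilterPushDown.Result:
    // accepted filters go to the source, remaining filters stay in the planner.
    static final class Result {
        final List<String> acceptedFilters;
        final List<String> remainingFilters;

        Result(List<String> acceptedFilters, List<String> remainingFilters) {
            this.acceptedFilters = acceptedFilters;
            this.remainingFilters = remainingFilters;
        }
    }

    // Decides which filters the source takes over, based on the policy.
    static Result applyFilters(List<String> filters, FilterHandlingPolicy policy) {
        if (policy == FilterHandlingPolicy.NEVER) {
            // Reject everything: the database sees no predicate and
            // all filtering happens on the compute side.
            return new Result(new ArrayList<>(), new ArrayList<>(filters));
        }
        // ALWAYS: this sketch accepts every filter. A real connector would
        // accept only the filters it can translate for the database.
        return new Result(new ArrayList<>(filters), new ArrayList<>());
    }

    public static void main(String[] args) {
        List<String> filters = List.of("id > 100", "name = 'flink'");
        Result r = applyFilters(filters, FilterHandlingPolicy.NEVER);
        System.out.println("accepted=" + r.acceptedFilters
                + " remaining=" + r.remainingFilters);
    }
}
```

With NEVER, every predicate comes back as a remaining filter, so the database 
can be read in primary key order while the filtering cost moves to the 
external computing resources.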

Best,
Jiabao

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768

> On Dec 19, 2023, at 20:09, David Radley <david...@apache.org> wrote:
> 
> Hi,
> 
> I had 3 comments:
> - the name of the config option is "scan.filter-push-down.enabled". This 
> implies it is only for scan sources and not lookups. I suggest removing the 
> "scan." prefix.
> - there is talk of having a numeric option, as the filter pushdown might 
> result in a full table scan. Without the filter being pushed down, I assume 
> there will be a full table scan anyway. How is the full table scan caused by 
> filter pushdown worse than the full table scan that will occur without it?
> - what are the use cases for not pushing down filters if the source supports 
> it? The only one I can think of is that during development, you can easily 
> compare query results with and without pushed-down filters, to check they are 
> the same. Are there other cases?
> 
> On 2023/10/24 14:13:38 Jiabao Sun wrote:
>> Thanks Jark, Martijn, Xuyang for the valuable feedback.
>> 
>> Adding only the "scan.filter-push-down.enabled" configuration option would 
>> be great for me as well.
>> Optimization for this public behavior can be added later.
>> 
>> I made some modifications to the FLIP document and added the approach of 
>> adding a new method to the Rejected Alternatives section.
>> 
>> Looking forward to your feedback again.
>> 
>> Best,
>> Jiabao
>> 
>> 
>>> On Oct 24, 2023, at 16:58, Jark Wu <imj...@gmail.com> wrote:
>>> 
>>> the current interface can already satisfy your requirements.
>>> The connector can reject all the filters by returning the input filters
>>> as `Result#remainingFilters`.
>>
