alamb opened a new issue, #15177: URL: https://github.com/apache/datafusion/issues/15177
### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/14586 [Comparing ClickBench on DataFusion 45 and DuckDB (link)](https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQWxsb3lEQiI6ZmFsc2UsIkFsbG95REIgKHR1bmVkKSI6ZmFsc2UsIkF0aGVuYSAocGFydGl0aW9uZWQpIjpmYWxzZSwiQXRoZW5hIChzaW5nbGUpIjpmYWxzZSwiQXVyb3JhIGZvciBNeVNRTCI6ZmFsc2UsIkF1cm9yYSBmb3IgUG9zdGdyZVNRTCI6ZmFsc2UsIkJ5Q29uaXR5IjpmYWxzZSwiQnl0ZUhvdXNlIjpmYWxzZSwiY2hEQiAoRGF0YUZyYW1lKSI6ZmFsc2UsImNoREIgKFBhcnF1ZXQsIHBhcnRpdGlvbmVkKSI6ZmFsc2UsImNoREIiOmZhbHNlLCJDaXR1cyI6ZmFsc2UsIkNsaWNrSG91c2UgQ2xvdWQgKGF3cykiOmZhbHNlLCJDbGlja0hvdXNlIENsb3VkIChhenVyZSkiOmZhbHNlLCJDbGlja0hvdXNlIENsb3VkIChnY3ApIjpmYWxzZSwiQ2xpY2tIb3VzZSAoZGF0YSBsYWtlLCBwYXJ0aXRpb25lZCkiOmZhbHNlLCJDbGlja0hvdXNlIChkYXRhIGxha2UsIHNpbmdsZSkiOmZhbHNlLCJDbGlja0hvdXNlIChQYXJxdWV0LCBwYXJ0aXRpb25lZCkiOmZhbHNlLCJDbGlja0hvdXNlIChQYXJxdWV0LCBzaW5nbGUpIjpmYWxzZSwiQ2xpY2tIb3VzZSAod2ViKSI6ZmFsc2UsIkNsaWNrSG91c2UiOmZhbHNlLCJDbGlja0hvdXNlICh0dW5lZCkiOmZhbHNlLCJDbGlja0hvdXNlICh0dW5lZCwgbWVtb3J5KSI6ZmFsc2UsIkNsb3VkYmVycnkiOmZhbHNlLCJDcmF0ZURCIjpmYWx 
zZSwiQ3J1bmNoeSBCcmlkZ2UgZm9yIEFuYWx5dGljcyAoUGFycXVldCkiOmZhbHNlLCJEYXRhYmVuZCI6ZmFsc2UsIkRhdGFGdXNpb24gKFBhcnF1ZXQsIHBhcnRpdGlvbmVkKSI6dHJ1ZSwiRGF0YUZ1c2lvbiAoUGFycXVldCwgc2luZ2xlKSI6ZmFsc2UsIkFwYWNoZSBEb3JpcyI6ZmFsc2UsIkRyaWxsIjpmYWxzZSwiRHJ1aWQiOmZhbHNlLCJEdWNrREIgKERhdGFGcmFtZSkiOmZhbHNlLCJEdWNrREIgKG1lbW9yeSkiOmZhbHNlLCJEdWNrREIgKFBhcnF1ZXQsIHBhcnRpdGlvbmVkKSI6dHJ1ZSwiRHVja0RCIjpmYWxzZSwiRWxhc3RpY3NlYXJjaCI6ZmFsc2UsIkVsYXN0aWNzZWFyY2ggKHR1bmVkKSI6ZmFsc2UsIkdsYXJlREIiOmZhbHNlLCJHcmVlbnBsdW0iOmZhbHNlLCJIZWF2eUFJIjpmYWxzZSwiSHlkcmEiOmZhbHNlLCJTYWxlc2ZvcmNlIEh5cGVyIChQYXJxdWV0KSI6ZmFsc2UsIlNhbGVzZm9yY2UgSHlwZXIiOmZhbHNlLCJJbmZvYnJpZ2h0IjpmYWxzZSwiS2luZXRpY2EiOmZhbHNlLCJNYXJpYURCIENvbHVtblN0b3JlIjpmYWxzZSwiTWFyaWFEQiI6ZmFsc2UsIk1vbmV0REIiOmZhbHNlLCJNb25nb0RCIjpmYWxzZSwiTW90aGVyRHVjayI6ZmFsc2UsIk15U1FMIChNeUlTQU0pIjpmYWxzZSwiTXlTUUwiOmZhbHNlLCJPY3RvU1FMIjpmYWxzZSwiT3B0ZXJ5eCI6ZmFsc2UsIk94bGEiOmZhbHNlLCJQYW5kYXMgKERhdGFGcmFtZSkiOmZhbHNlLCJQYXJhZGVEQiAoUGFycXVldCwgcGFydGl0aW9uZWQpIjpm YWxzZSwiUGFyYWRlREIgKFBhcnF1ZXQsIHNpbmdsZSkiOmZhbHNlLCJwZ19kdWNrZGIgKHdpdGggaW5kZXhlcykiOmZhbHNlLCJwZ19kdWNrZGIgKE1vdGhlckR1Y2sgZW5hYmxlZCkiOmZhbHNlLCJwZ19kdWNrZGIiOmZhbHNlLCJwZ19kdWNrZGIgKFBhcnF1ZXQpIjpmYWxzZSwiUG9zdGdyZVNRTCB3aXRoIHBnX21vb25jYWtlIjpmYWxzZSwiUGlub3QiOmZhbHNlLCJQb2xhcnMgKERhdGFGcmFtZSkiOmZhbHNlLCJQb2xhcnMgKFBhcnF1ZXQpIjpmYWxzZSwiUG9zdGdyZVNRTCAod2l0aCBpbmRleGVzKSI6ZmFsc2UsIlBvc3RncmVTUUwiOmZhbHNlLCJRdWVzdERCIjpmYWxzZSwiUmVkc2hpZnQiOmZhbHNlLCJTZWxlY3REQiI6ZmFsc2UsIlNpbmdsZVN0b3JlIjpmYWxzZSwiU25vd2ZsYWtlIjpmYWxzZSwiU3BhcmsiOmZhbHNlLCJTUUxpdGUiOmZhbHNlLCJTdGFyUm9ja3MiOmZhbHNlLCJUYWJsZXNwYWNlIjpmYWxzZSwiVGVtYm8gT0xBUCAoY29sdW1uYXIpIjpmYWxzZSwiVGltZXNjYWxlIENsb3VkIjpmYWxzZSwiVGltZXNjYWxlREIgKG5vIGNvbHVtbnN0b3JlKSI6ZmFsc2UsIlRpbWVzY2FsZURCIjpmYWxzZSwiVGlueWJpcmQgKEZyZWUgVHJpYWwpIjpmYWxzZSwiVW1icmEiOmZhbHNlLCJVcnNhIjpmYWxzZSwiVmljdG9yaWFMb2dzIjpmYWxzZX0sInR5cGUiOnsiQyI6dHJ1ZSwiY29sdW1uLW9yaWVudGVkIjp0cnVlLCJQb3N0Z3JlU1FMIGNvbXBhdGlibGUiOnRydWUsIm1hbmFnZWQiOnRydWUsImdjcCI6d 
HJ1ZSwic3RhdGVsZXNzIjp0cnVlLCJKYXZhIjp0cnVlLCJDKysiOnRydWUsIk15U1FMIGNvbXBhdGlibGUiOnRydWUsInJvdy1vcmllbnRlZCI6dHJ1ZSwiQ2xpY2tIb3VzZSBkZXJpdmF0aXZlIjp0cnVlLCJlbWJlZGRlZCI6dHJ1ZSwic2VydmVybGVzcyI6dHJ1ZSwiZGF0YWZyYW1lIjp0cnVlLCJhd3MiOnRydWUsImF6dXJlIjp0cnVlLCJhbmFseXRpY2FsIjp0cnVlLCJSdXN0Ijp0cnVlLCJzZWFyY2giOnRydWUsImRvY3VtZW50Ijp0cnVlLCJHbyI6dHJ1ZSwic29tZXdoYXQgUG9zdGdyZVNRTCBjb21wYXRpYmxlIjp0cnVlLCJEYXRhRnJhbWUiOnRydWUsInBhcnF1ZXQiOnRydWUsInRpbWUtc2VyaWVzIjp0cnVlfSwibWFjaGluZSI6eyIxNiB2Q1BVIDEyOEdCIjpmYWxzZSwiOCB2Q1BVIDY0R0IiOmZhbHNlLCJzZXJ2ZXJsZXNzIjpmYWxzZSwiMTZhY3UiOmZhbHNlLCJjNmEuNHhsYXJnZSwgNTAwZ2IgZ3AyIjp0cnVlLCJMIjpmYWxzZSwiTSI6ZmFsc2UsIlMiOmZhbHNlLCJYUyI6ZmFsc2UsImM2YS5tZXRhbCwgNTAwZ2IgZ3AyIjpmYWxzZSwiMTJHaUIsIDEgcmVwbGljYShzKSI6ZmFsc2UsIjhHaUIsIDEgcmVwbGljYShzKSI6ZmFsc2UsIjEyR2lCLCAyIHJlcGxpY2EocykiOmZhbHNlLCIxMjBHaUIsIDIgcmVwbGljYShzKSI6ZmFsc2UsIjE2R2lCLCAyIHJlcGxpY2EocykiOmZhbHNlLCIyMzZHaUIsIDIgcmVwbGljYShzKSI6ZmFsc2UsIjMyR2lCLCAyIHJlcGxpY2EocykiOmZhbHNlLCI2NEdpQiwgMiByZX BsaWNhKHMpIjpmYWxzZSwiOEdpQiwgMiByZXBsaWNhKHMpIjpmYWxzZSwiMTJHaUIsIDMgcmVwbGljYShzKSI6ZmFsc2UsIjEyMEdpQiwgMyByZXBsaWNhKHMpIjpmYWxzZSwiMTZHaUIsIDMgcmVwbGljYShzKSI6ZmFsc2UsIjIzNkdpQiwgMyByZXBsaWNhKHMpIjpmYWxzZSwiMzJHaUIsIDMgcmVwbGljYShzKSI6ZmFsc2UsIjY0R2lCLCAzIHJlcGxpY2EocykiOmZhbHNlLCI4R2lCLCAzIHJlcGxpY2EocykiOmZhbHNlLCJjNW4uNHhsYXJnZSwgNTAwZ2IgZ3AyIjpmYWxzZSwiQW5hbHl0aWNzLTI1NkdCICg2NCB2Q29yZXMsIDI1NiBHQikiOmZhbHNlLCJjNS40eGxhcmdlLCA1MDBnYiBncDIiOmZhbHNlLCJjNmEuNHhsYXJnZSwgMTUwMGdiIGdwMiI6ZmFsc2UsIlhMIjpmYWxzZSwiSnVtYm8iOmZhbHNlLCJQdWxzZSI6ZmFsc2UsIlN0YW5kYXJkIjpmYWxzZSwiZGMyLjh4bGFyZ2UiOmZhbHNlLCJyYTMuMTZ4bGFyZ2UiOmZhbHNlLCJyYTMuNHhsYXJnZSI6ZmFsc2UsInJhMy54bHBsdXMiOmZhbHNlLCJTMiI6ZmFsc2UsIlMyNCI6ZmFsc2UsIjJYTCI6ZmFsc2UsIjNYTCI6ZmFsc2UsIjRYTCI6ZmFsc2UsIkwxIC0gMTZDUFUgMzJHQiI6ZmFsc2UsImM2YS40eGxhcmdlLCA1MDBnYiBncDMiOmZhbHNlLCIxNiB2Q1BVIDY0R0IiOmZhbHNlLCI0IHZDUFUgMTZHQiI6ZmFsc2UsIjggdkNQVSAzMkdCIjpmYWxzZX0sImNsdXN0ZXJfc2l6ZSI6eyIxIjp0cnVlLCIyIjp0cnVlLCIzIjp0cnVlLCI0Ijp0cnVlLCI4Ijp0cnV 
lLCIxNiI6dHJ1ZSwiMzIiOnRydWUsIjY0Ijp0cnVlLCIxMjgiOnRydWUsInNlcnZlcmxlc3MiOnRydWUsInVuZGVmaW5lZCI6dHJ1ZX0sIm1ldHJpYyI6ImhvdCIsInF1ZXJpZXMiOlt0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlXX0=)

You can see that for query 23 DataFusion is almost 2x slower (around 10s where DuckDB takes 5s).

You can run this query like this:

```shell
cd datafusion
cd benchmarks
# download data
./bench.sh data clickbench_partitioned
# run the query with datafusion-cli (note the escaped double quotes)
datafusion-cli -c "SELECT * FROM 'data/hits_partitioned' WHERE \"URL\" LIKE '%google%' ORDER BY \"EventTime\" LIMIT 10;"
```

Here is the explain plan:

```
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion/benchmarks$ datafusion-cli -c "EXPLAIN SELECT * FROM 'data/hits_partitioned' WHERE \"URL\" LIKE '%google%' ORDER BY \"EventTime\" LIMIT 10;"
DataFusion CLI v46.0.0
+---------------+------------------------------------------+
| plan_type     | plan                                     |
+---------------+------------------------------------------+
| logical_plan  | Sort: data/hits_partitioned.EventTime ASC NULLS LAST, fetch=10 |
|               |   Filter: CAST(data/hits_partitioned.URL AS Utf8View) LIKE Utf8View("%google%") |
|               |     TableScan: data/hits_partitioned projection=[WatchID, JavaEnable, Title, GoodEvent, EventTime, EventDate, CounterID, ClientIP, RegionID, UserID, CounterClass, OS, UserAgent, URL, Referer, IsRefresh, RefererCategoryID, RefererRegionID, URLCategoryID, URLRegionID, ResolutionWidth, ResolutionHeight, ResolutionDepth, FlashMajor, FlashMinor, FlashMinor2, NetMajor, NetMinor, UserAgentMajor, UserAgentMinor, CookieEnable, JavascriptEnable, IsMobile, MobilePhone, MobilePhoneModel, Params, IPNetworkID, TraficSourceID, SearchEngineID, SearchPhrase, AdvEngineID, IsArtifical, WindowClientWidth, WindowClientHeight, ClientTimeZone, ClientEventTime, SilverlightVersion1, SilverlightVersion2, SilverlightVersion3, SilverlightVersion4, PageCharset, CodeVersion, IsLink, IsDownload, IsNotBounce, FUniqID, OriginalURL, HID, IsOldCounter, IsEvent, IsParameter, DontCountHits, WithHash, HitColor, LocalEventTime, Age, Sex, Income, Interests, Robotness, RemoteIP, WindowName, OpenerName, HistoryLength, BrowserLanguage, BrowserCountry, SocialNetwork, SocialAction, HTTPError, SendTiming, DNSTiming, ConnectTiming, ResponseStartTiming, ResponseEndTiming, FetchTiming, SocialSourceNetworkID, SocialSourcePage, ParamPrice, ParamOrderID, ParamCurrency, ParamCurrencyID, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID, UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash, URLHash, CLID], partial_filters=[CAST(data/hits_partitioned.URL AS Utf8View) LIKE Utf8View("%google%")] |
| physical_plan | SortPreservingMergeExec: [EventTime@4 ASC NULLS LAST], fetch=10 |
|               |   SortExec: TopK(fetch=10), expr=[EventTime@4 ASC NULLS LAST], preserve_partitioning=[true] |
|               |     CoalesceBatchesExec: target_batch_size=8192 |
|               |       FilterExec: CAST(URL@13 AS Utf8View) LIKE %google% |
|               |         DataSourceExec: file_groups={16 groups: [[Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_0.parquet:0..122446530, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_1.parquet:0..174965044, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_10.parquet:0..101513258, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_11.parquet:0..118419888, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_12.parquet:0..149514164, ...], [Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_14.parquet:108113265..151121699, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_15.parquet:0..103098894, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_16.parquet:0..101067219, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_17.parquet:0..116867853, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_18.parquet:0..133119589, ...], [Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_21.parquet:3887560..113455196, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_22.parquet:0..79775901, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_23.parquet:0..79631107, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_24.parquet:0..78257049, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_25.parquet:0..144169728, ...], [Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_28.parquet:106905624..162772407, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_29.parquet:0..79213288, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_3.parquet:0..192507052, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_30.parquet:0..124187913, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_31.parquet:0..123065410, ...], [Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_35.parquet:54087340..153632381, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_36.parquet:0..92487304, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_37.parquet:0..108247781, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_38.parquet:0..132005180, Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned/hits_39.parquet:0..103522954, ...], ...]}, projection=[WatchID, JavaEnable, Title, GoodEvent, EventTime, EventDate, CounterID, ClientIP, RegionID, UserID, CounterClass, OS, UserAgent, URL, Referer, IsRefresh, RefererCategoryID, RefererRegionID, URLCategoryID, URLRegionID, ResolutionWidth, ResolutionHeight, ResolutionDepth, FlashMajor, FlashMinor, FlashMinor2, NetMajor, NetMinor, UserAgentMajor, UserAgentMinor, CookieEnable, JavascriptEnable, IsMobile, MobilePhone, MobilePhoneModel, Params, IPNetworkID, TraficSourceID, SearchEngineID, SearchPhrase, AdvEngineID, IsArtifical, WindowClientWidth, WindowClientHeight, ClientTimeZone, ClientEventTime, SilverlightVersion1, SilverlightVersion2, SilverlightVersion3, SilverlightVersion4, PageCharset, CodeVersion, IsLink, IsDownload, IsNotBounce, FUniqID, OriginalURL, HID, IsOldCounter, IsEvent, IsParameter, DontCountHits, WithHash, HitColor, LocalEventTime, Age, Sex, Income, Interests, Robotness, RemoteIP, WindowName, OpenerName, HistoryLength, BrowserLanguage, BrowserCountry, SocialNetwork, SocialAction, HTTPError, SendTiming, DNSTiming, ConnectTiming, ResponseStartTiming, ResponseEndTiming, FetchTiming, SocialSourceNetworkID, SocialSourcePage, ParamPrice, ParamOrderID, ParamCurrency, ParamCurrencyID, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID, UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash, URLHash, CLID], file_type=parquet, predicate=CAST(URL@13 AS Utf8View) LIKE %google% |
|               | |
+---------------+------------------------------------------+
2 row(s) fetched.
Elapsed 0.056 seconds.
```

Something that immediately jumps out at me in the explain plan is this line:

```
| | DataSourceExec: file_groups={16 groups: ...}, projection=[WatchID, JavaEnable, Title, GoodEvent, EventTime, EventDate, CounterID, ClientIP, RegionID, UserID, CounterClass, OS, UserAgent, URL, Referer, IsRefresh, RefererCategoryID, RefererRegionID, URLCategoryID, URLRegionID, ResolutionWidth, ResolutionHeight, ResolutionDepth, FlashMajor, FlashMinor, FlashMinor2, NetMajor, NetMinor, UserAgentMajor, UserAgentMinor, CookieEnable, JavascriptEnable, IsMobile, MobilePhone, MobilePhoneModel, Params, IPNetworkID, TraficSourceID, SearchEngineID, SearchPhrase, AdvEngineID, IsArtifical, WindowClientWidth, WindowClientHeight, ClientTimeZone, ClientEventTime, SilverlightVersion1, SilverlightVersion2, SilverlightVersion3, SilverlightVersion4, PageCharset, CodeVersion, IsLink, IsDownload, IsNotBounce, FUniqID, OriginalURL, HID, IsOldCounter, IsEvent, IsParameter, DontCountHits, WithHash, HitColor, LocalEventTime, Age, Sex, Income, Interests, Robotness, RemoteIP, WindowName, OpenerName, HistoryLength, BrowserLanguage, BrowserCountry, SocialNetwork, SocialAction, HTTPError, SendTiming, DNSTiming, ConnectTiming, ResponseStartTiming, ResponseEndTiming, FetchTiming, SocialSourceNetworkID, SocialSourcePage, ParamPrice, ParamOrderID, ParamCurrency, ParamCurrencyID, OpenstatServiceName, OpenstatCampaignID, OpenstatAdID, OpenstatSourceID, UTMSource, UTMMedium, UTMCampaign, UTMContent, UTMTerm, FromTag, HasGCLID, RefererHash, URLHash, CLID], file_type=parquet, predicate=CAST(URL@13 AS Utf8View) LIKE %google% |
```

I think "projection" means that all of those columns are being read and decoded from Parquet, which makes sense because the query is a `SELECT *`.
However, in this case only the top 10 rows are returned (out of 100M rows in the file), so most of the decoded data is thrown away immediately.

### Describe the solution you'd like

I would like to close the gap with DuckDB with some general-purpose improvement.

### Describe alternatives you've considered

I think the way to improve performance here is to defer decoding ("materializing") the other columns until we know what the top 10 rows are.

Some wacky ideas:
1. Push the TopK / ordering into the scan somehow
2. Implement "late materialization"

Late materialization would look something like:
1. Decode only the `EventTime` column and a `row_id`
2. Determine the top 10 `row_id`s by sorting by `EventTime`
3. Decode only those 10 rows from the Parquet file(s)

### Additional context

_No response_

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
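As a rough illustration of the late materialization steps listed in the issue, here is a minimal in-memory sketch. It is not DataFusion code and uses none of its APIs: the functions `top_k_row_ids` and `materialize` are hypothetical, plain slices stand in for decoded Parquet columns, and a slice index plays the role of `row_id`. It also assumes the `URL` filter column is read in the first phase along with `EventTime`, since the filter has to be evaluated before the top rows can be chosen.

```rust
use std::collections::BinaryHeap;

/// Phase 1 (hypothetical helper): look only at the narrow columns
/// (the filter column and the sort key) and return the row ids of the
/// `k` smallest `event_time` values among rows whose URL contains "google".
fn top_k_row_ids(event_time: &[i64], url: &[&str], k: usize) -> Vec<usize> {
    // Max-heap of (event_time, row_id) holding at most k entries;
    // the top of the heap is the worst of the current best k.
    let mut heap: BinaryHeap<(i64, usize)> = BinaryHeap::new();
    for (row_id, (&t, u)) in event_time.iter().zip(url).enumerate() {
        if !u.contains("google") {
            continue; // row fails the filter, never considered further
        }
        heap.push((t, row_id));
        if heap.len() > k {
            heap.pop(); // evict the largest event_time
        }
    }
    let mut best = heap.into_vec();
    best.sort(); // ascending by event_time, ties broken by row_id
    best.into_iter().map(|(_, row_id)| row_id).collect()
}

/// Phase 2 (hypothetical helper): fetch a wide column only for the
/// selected row ids instead of decoding it for every row.
fn materialize<'a>(wide_column: &[&'a str], row_ids: &[usize]) -> Vec<&'a str> {
    row_ids.iter().map(|&id| wide_column[id]).collect()
}

fn main() {
    let event_time: [i64; 5] = [50, 10, 40, 20, 30];
    let url = [
        "http://google.com/a",
        "http://example.com",
        "http://google.com/b",
        "http://google.com/c",
        "http://other.org",
    ];
    // Stand-in for the ~100 other columns selected by `SELECT *`.
    let title = ["t0", "t1", "t2", "t3", "t4"];

    let ids = top_k_row_ids(&event_time, &url, 2);
    let rows = materialize(&title, &ids);
    println!("row ids: {:?}, titles: {:?}", ids, rows);
}
```

In a real Parquet reader the second phase would also need a way to fetch individual rows by position without decoding whole pages, which this sketch does not model.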