zhuqi-lucas commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2993487268
> Thank you @zhuqi-lucas for experimenting on this. Maybe it's a good idea to do some profiling to see the hot spots?
>
> For example, this is the profile I get from the sort-tpch benchmark.
>
> [image: profile from the sort-tpch benchmark]
>
> * You can see here that most of the work is concentrated in SortPreservingMerge rather than in the sorts, so in this case making `SortExec` faster probably won't do much to improve total performance. Maybe we can use `target_partitions=1` to concentrate more work on `SortExec` so we can have a look.
> * I made a change, [Speedup `interleave_views` (4-7x faster) arrow-rs#7695](https://github.com/apache/arrow-rs/pull/7695), that will probably help quite a bit with the performance of `SortPreservingMergeExec` and `SortExec`. Maybe we can look at where the next hotspots are after this change; I think probably a lot is in converting to `Row`, doing comparisons on byte slices, and doing allocations. But some parts also seem related to us not handling views as efficiently as possible.
> * One example I see is that we call `.gc()`, which currently has a slow implementation.
>
> [image]
>
> * Another one, `GenericByteViewArray::compare_unchecked`:
>
> [image]

Thank you @Dandandan, this is really helpful and valuable for further investigation. I will investigate based on these directions.
I have also taken on one of the topics above: https://github.com/apache/arrow-rs/issues/7621#issuecomment-2991750296
Maybe I can start from it to see whether we can benefit from it. Thanks again!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: github-h...@datafusion.apache.org