zhuqi-lucas commented on PR #15380:
URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2993487268

   > Thank you @zhuqi-lucas for experimenting on this. Maybe it would be a good idea to do some profiling to find the hot spots?
   > 
   > For example, this is the profile I get from the sort-tpch benchmark.
   > 
   > <img alt="image" width="1728" 
src="https://private-user-images.githubusercontent.com/163737/457561945-88a72c7b-472e-438f-964b-ee43101df958.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NTA0OTg1OTUsIm5iZiI6MTc1MDQ5ODI5NSwicGF0aCI6Ii8xNjM3MzcvNDU3NTYxOTQ1LTg4YTcyYzdiLTQ3MmUtNDM4Zi05NjRiLWVlNDMxMDFkZjk1OC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwNjIxJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDYyMVQwOTMxMzVaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1lNzQ4MGNhY2FhZjllN2I1Y2NhZTM5MGUyNGM2NThjYTQ3NDc2ZjY0ZGRhMTg2YWRjZWY4ZTU5YjNkNDljOWI1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.4BjfhYttn4cKWwp2bWfeOZpFi9oLvqqXgyKqBhe5LGc";>
   > * Most of the work here is concentrated in `SortPreservingMerge` rather than in the sorts, so in this case making `SortExec` faster probably won't improve total performance much. Maybe we can use `target_partitions=1` to put more of the work on `SortExec` so we can take a closer look.
   > * I made a change, [Speedup `interleave_views` (4-7x faster) arrow-rs#7695](https://github.com/apache/arrow-rs/pull/7695), that will probably help quite a bit with the performance of `SortPreservingMergeExec` and `SortExec`. After this change, maybe we can look at where the next hot spots are. I suspect a lot of the time goes to converting to `Row`, comparing byte slices, and doing allocations, but some parts also seem related to us not handling views as efficiently as possible.
   > * One example is that we call `.gc()`, which currently has a slow implementation.
   > 
   > <img alt="image" width="1179" 
src="https://private-user-images.githubusercontent.com/163737/457562908-07e3de93-9b3d-4f63-8d08-c328b8e39f73.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NTA0OTg1OTUsIm5iZiI6MTc1MDQ5ODI5NSwicGF0aCI6Ii8xNjM3MzcvNDU3NTYyOTA4LTA3ZTNkZTkzLTliM2QtNGY2My04ZDA4LWMzMjhiOGUzOWY3My5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwNjIxJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDYyMVQwOTMxMzVaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT02Y2RjMjY1M2U2OWQxODZkMWMxODQ0ZWY4NzlkNjMxYjg1M2QzZWM1NDc0MjkyNWI1YjJlOWYwYTg2MmEzMmI4JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.A8Vw--OYzb0KIUonmGmJCQ5QCNDtn5ElvBrf2nvxOVQ";>
   > * Another one is `GenericByteViewArray::compare_unchecked`:
   > 
   > <img alt="image" width="1070" 
src="https://private-user-images.githubusercontent.com/163737/457563415-fdddbf69-c176-4adc-9a05-c8e44c23ad3d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NTA0OTg1OTUsIm5iZiI6MTc1MDQ5ODI5NSwicGF0aCI6Ii8xNjM3MzcvNDU3NTYzNDE1LWZkZGRiZjY5LWMxNzYtNGFkYy05YTA1LWM4ZTQ0YzIzYWQzZC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwNjIxJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDYyMVQwOTMxMzVaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1hYjdhZmM2YzM3MzZlMWY2MmEyODYwOWQwNmY2YWM0MjBjMWJiOWY3MWJmODY0NTJiYWY4NDZkMmU3ZWRlYTMxJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.cU5MDDRd4tWNQpUI7RreL7an372KsDXyCtfaoCgLLh0";>
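As background for why the merge can dominate the profile, here is a minimal, std-only sketch of a k-way merge over already-sorted partitions, the general technique behind a sort-preserving merge. It is an illustration only: `k_way_merge` is a hypothetical function, and DataFusion's `SortPreservingMergeExec` works on Arrow record batches (and, as far as I know, uses a loser tree rather than a binary heap). Each output row costs one heap pop and at most one push, roughly O(log k) comparisons for k partitions, so merge work grows with the partition count. With `target_partitions=1` there is presumably a single sorted stream, so almost all of the work lands in `SortExec`.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical k-way merge: combines already-sorted partitions into one
/// sorted output using a min-heap keyed on (value, partition index).
fn k_way_merge<T: Ord + Clone>(partitions: &[Vec<T>]) -> Vec<T> {
    // `Reverse` turns the max-heap into a min-heap.
    let mut heap = BinaryHeap::new();
    // cursors[i] is the next unread position in partitions[i].
    let mut cursors = vec![0usize; partitions.len()];

    // Seed the heap with the first value of every non-empty partition.
    for (i, p) in partitions.iter().enumerate() {
        if let Some(first) = p.first() {
            heap.push(Reverse((first.clone(), i)));
            cursors[i] = 1;
        }
    }

    let total: usize = partitions.iter().map(|p| p.len()).sum();
    let mut out = Vec::with_capacity(total);

    // Pop the smallest head, emit it, then refill from the same partition.
    while let Some(Reverse((value, i))) = heap.pop() {
        out.push(value);
        if let Some(next) = partitions[i].get(cursors[i]) {
            heap.push(Reverse((next.clone(), i)));
            cursors[i] += 1;
        }
    }
    out
}
```

Comparing values directly here stands in for whatever comparator the real operator uses, such as the `Row` byte-slice comparisons mentioned above.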
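The `GenericByteViewArray::compare_unchecked` hot spot is about comparing string views. Below is a simplified, std-only model assuming the Arrow view layout, in which each 16-byte view stores the length and either the whole string inline (up to 12 bytes) or a 4-byte prefix plus a buffer index and offset. The `View` struct and `compare_views` function are hypothetical illustrations, not arrow-rs's types or its actual comparison code. The point is that comparing the zero-padded inline prefixes first settles many comparisons without dereferencing the data buffers; only ties fall back to a full byte comparison.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified string view modeled on the Arrow view layout:
/// strings of at most 12 bytes live inline; longer strings keep their first
/// 4 bytes inline as a prefix and point into a shared data buffer.
#[derive(Clone, Copy)]
struct View {
    len: u32,
    inline: [u8; 12],
    buffer_index: u32,
    offset: u32,
}

impl View {
    /// Builds a view, appending long strings to the last data buffer.
    fn new(s: &[u8], buffers: &mut Vec<Vec<u8>>) -> View {
        let mut inline = [0u8; 12];
        if s.len() <= 12 {
            inline[..s.len()].copy_from_slice(s);
            return View { len: s.len() as u32, inline, buffer_index: 0, offset: 0 };
        }
        inline[..4].copy_from_slice(&s[..4]);
        if buffers.is_empty() {
            buffers.push(Vec::new());
        }
        let buffer_index = buffers.len() - 1;
        let offset = buffers[buffer_index].len();
        buffers[buffer_index].extend_from_slice(s);
        View { len: s.len() as u32, inline, buffer_index: buffer_index as u32, offset: offset as u32 }
    }

    /// First 4 bytes, zero-padded for strings shorter than 4 bytes.
    fn prefix(&self) -> [u8; 4] {
        [self.inline[0], self.inline[1], self.inline[2], self.inline[3]]
    }

    /// Full string bytes, read inline or from the referenced buffer.
    fn bytes<'a>(&'a self, buffers: &'a [Vec<u8>]) -> &'a [u8] {
        let len = self.len as usize;
        if len <= 12 {
            &self.inline[..len]
        } else {
            let start = self.offset as usize;
            &buffers[self.buffer_index as usize][start..start + len]
        }
    }
}

/// Compares two views: differing zero-padded prefixes already determine the
/// order; only equal prefixes require reading and comparing the full bytes.
fn compare_views(a: &View, b: &View, buffers: &[Vec<u8>]) -> Ordering {
    match a.prefix().cmp(&b.prefix()) {
        Ordering::Equal => a.bytes(buffers).cmp(b.bytes(buffers)),
        unequal => unequal,
    }
}
```

The zero padding is safe: if one string is a strict prefix of the other, the padding byte (0) sorts before any real byte or ties and falls through to the full comparison.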
   
   
   
   Thank you @Dandandan, this is really helpful for further investigation. I will dig into these directions.
   
   
   I have also picked up one of the topics above:
   https://github.com/apache/arrow-rs/issues/7621#issuecomment-2991750296
   
   Maybe I can start there and see whether we benefit from it. Thanks again!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

