[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282355#comment-17282355 ]
Antoine Pitrou edited comment on ARROW-10899 at 2/10/21, 10:14 AM: ------------------------------------------------------------------- Even a radix sort is not necessarily stable. You have to be careful when implementing it: see https://en.wikipedia.org/wiki/Radix_sort Also, I assume your benchmarks are on random data? Data is very often not randomly distributed. On real-world data, a int32 or int64 column doesn't necessarily span the whole int32 or int64 range, for example. Perhaps heuristics would need to be devised, based on the input range and the input size you would either use a radix sort (which is O\(n\)) or a comparison-based sort such as spinsort (which is O\(n log n\)). was (Author: pitrou): Even a radix sort is not necessarily stable. You have to be careful when implementing it: see https://en.wikipedia.org/wiki/Radix_sort Also, I assume your benchmarks are on random data? Data is very often not randomly distributed. On real-world data, a int32 or int64 column doesn't necessarily span the whole int32 or int64 range, for example. Perhaps heuristics would need to be devised, based on the input range and the input size you would either use a radix sort (which is {{O(n)}}) or a comparison-based sort such as spinsort (which is {{O(n log n)}}). > [C++] Investigate radix sort for integer arrays > ----------------------------------------------- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ > Reporter: Antoine Pitrou > Assignee: Kirill Lykov > Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)