[
https://issues.apache.org/jira/browse/CALCITE-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17300077#comment-17300077
]
Vladimir Sitnikov commented on CALCITE-4522:
--------------------------------------------
{quote}There is a significant per-row overhead in a sort algorithm even if
comparisons are very cheap{quote}
Could you please clarify what is the overhead?
We discuss CPU cost here, aren't we?
N Log N comes from https://en.m.wikipedia.org/wiki/Comparison_sort which reads
{quote}Wikipedia: A comparison sort must have an average-case lower bound of
Ω(n log n) comparison operations,[1] which is known as linearithmic time{quote}
For instance, arrayList.sort(...) is inplace, and its average runtime is
dominated by comparison.
{quote}The only significant difference is sorting on zero fields (i.e. not
sorting at all) and sorting on 1 or more fields{quote}
Julian, I do not understand why do you think that a single ideal example proves
me wrong. Apparently the runtime of comparison (e.g. average) does depend on
the complexity of the comparator function. The complexity of the comparator
does have relation with the number of columns compared and its types (e.g. ints
are easier to compare than strings)
Here's a recent case from PostgreSQL mailing list that shows "bad plan being
selected" issue caused by the fact that PG does not account for the number of
columns in the sort key:
https://www.postgresql.org/message-id/CAKU4AWqx2TeXyhmr1TNBt0u+WNCr=dpTTpNcwudE=knjbfv...@mail.gmail.com
The current inmemory costing model in PostgreSQL agrees with me with an
exception that miss "number of keys in sort" feature.
> Sort cost should account for the number of columns in collation
> ---------------------------------------------------------------
>
> Key: CALCITE-4522
> URL: https://issues.apache.org/jira/browse/CALCITE-4522
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: hqx
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 9h 20m
> Remaining Estimate: 0h
>
> The old method to compute the cost of sort has some problem.
> # When the RelCollation is empty, there is no need to sort, but it still
> compute the cpu cost of sort.
> # use n * log\(n) * row_byte to estimate the cpu cost may be inaccurate,
> where n means the output row count of the sort operator, and row_byte means
> the average bytes of one row .
> Instead, I give follow suggestion.
> # the cpu cost is zero if the RelCollation is empty.
> # let heap_size be min\(offset + output_count, input_count), and use
> input_count * log\(heap_size)* row_byte to compute the cpu cost.
> When fetch is zero, I found the output_count is 1 not 0. This conveniently
> ensure the log\(heap_size) no less than zero
--
This message was sent by Atlassian Jira
(v8.3.4#803005)