New submission from Raymond Hettinger <raymond.hettin...@gmail.com>:

The current code for mode() does a good deal of extra work to support its two 
error outcomes (empty input and multimodal input).  The latter case is 
informative but doesn't provide any reasonable way to find just one of those 
modes, where any of the most popular would suffice.  This arises in 
nearest-neighbor algorithms, for example.  I suggest adding an option to the API:

   from collections import Counter

   def mode(seq, *, first_tie=False):
       if first_tie:
           # CHOOSE FIRST x ∈ S | ∄ y ∈ S : x ≠ y ∧ count(y) > count(x)
           return Counter(seq).most_common(1)[0][0]
       ...

Use it like this:

    >>> data = 'ABBAC'
    >>> assert mode(data, first_tie=True) == 'A'

With the current API, there is no reasonable way to get to 'A' from 'ABBAC'.
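
To make the proposal concrete, here is a minimal standalone sketch of how 
the flag could behave.  It is an illustration only, not the statistics 
module's implementation: the strict branch and its error messages are 
assumptions, and only the first_tie branch comes from the snippet above.

```python
from collections import Counter
from statistics import StatisticsError


def mode(seq, *, first_tie=False):
    """Illustrative sketch of the proposed API (not the stdlib code)."""
    counts = Counter(seq)
    if not counts:
        raise StatisticsError('no mode for empty data')
    if first_tie:
        # Ties go to the value that was encountered first.
        return counts.most_common(1)[0][0]
    # Assumed strict behaviour: refuse to pick when several values tie.
    table = counts.most_common()
    best = table[0][1]
    modes = [value for value, count in table if count == best]
    if len(modes) > 1:
        raise StatisticsError('no unique mode; found %d equally common values'
                              % len(modes))
    return modes[0]


first = mode('ABBAC', first_tie=True)   # 'A' and 'B' tie; 'A' came first
unique = mode('ABBBC')                  # 'B' is the only mode
```

The tie-breaking relies on Counter remembering insertion order, so it 
assumes Python 3.7 or later.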

Also, the new code path is much faster than the existing code path because it 
extracts only the single most common item using max(), rather than the n most 
common, which requires sorting the whole items() list.  New path: O(n).  
Existing path: O(n log n).
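
The difference between the two paths can be seen directly on a Counter.  
This is a small sketch; the remarks about how most_common() works 
internally reflect my understanding of CPython and should be treated as an 
assumption rather than a documented guarantee.

```python
from collections import Counter

counts = Counter('ABBAC')

# Asking for one item needs only a single linear pass over the counts.
single = counts.most_common(1)

# Asking for everything sorts the full list of (value, count) pairs.
full = counts.most_common()

# The single-item result is what a plain max() over the items yields;
# max() returns the first maximal item, so the earliest value wins ties.
by_max = max(counts.items(), key=lambda item: item[1])
```

Both calls agree on the leading entry; only the amount of work differs.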

Note, the current API is somewhat awkward to use.  In general, a user can't 
know in advance that the data only contains a single mode.  Accordingly, every 
call to mode() has to be wrapped in a try-except.  And if the user just wants 
one of those modal values, there is no way to get to it.  See 
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mode.html for 
comparison.
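
The wrapping that callers end up writing looks roughly like the sketch 
below.  Because the behaviour of statistics.mode() may differ between 
Python versions, the example uses strict_mode(), a hypothetical stand-in 
that raises on multimodal input as described above; mode_or_none() is 
likewise just an illustrative helper name.

```python
from collections import Counter
from statistics import StatisticsError


def strict_mode(data):
    """Hypothetical stand-in: raise unless exactly one value is most common."""
    table = Counter(data).most_common()
    if not table:
        raise StatisticsError('no mode for empty data')
    best = table[0][1]
    if sum(1 for _, count in table if count == best) > 1:
        raise StatisticsError('no unique mode')
    return table[0][0]


def mode_or_none(data):
    """Every call site needs a guard like this one."""
    try:
        return strict_mode(data)
    except StatisticsError:
        # The exception says there was a tie but not which values tied,
        # so the caller still has no modal value to work with.
        return None


tied = mode_or_none('ABBAC')
clear = mode_or_none('ABBBC')
```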

There may be better names for the flag.  "tie_goes_to_first_encountered" seemed 
a bit long though ;-)

----------
assignee: steven.daprano
components: Library (Lib)
messages: 334796
nosy: rhettinger, steven.daprano
priority: normal
severity: normal
status: open
title: Fix awkwardness of statistics.mode() for multimodal datasets
type: behavior
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35892>
_______________________________________