[ https://issues.apache.org/jira/browse/ARROW-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662297#comment-17662297 ]
Rok Mihevc commented on ARROW-5274:
-----------------------------------

This issue has been migrated to [issue #21744|https://github.com/apache/arrow/issues/21744] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [JavaScript] Wrong array type for countBy
> -----------------------------------------
>
>                 Key: ARROW-5274
>                 URL: https://issues.apache.org/jira/browse/ARROW-5274
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: JavaScript
>            Reporter: Yngve Kristiansen
>            Assignee: Yngve Kristiansen
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>   Original Estimate: 5m
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The {{countBy}} function does not return correct histograms. It appears to
> select the wrong typed-array type for storing the counts.
> The following line in {{countBy}} seems to cause the problem:
> {{const countByteLength = Math.ceil(Math.log(vector.dictionary.length) / Math.log(256));}}
> For example, if the dictionary length is 3 but the indices length is 1
> million, this expression evaluates to 1, so a Uint8Array is used and the
> counts overflow.
> Codepen example:
> [https://codepen.io/Yngve92/pen/mYdWrr]
> If I change the expression to
> {{const countByteLength = Math.ceil(Math.log(vector.length) / Math.log(256));}}
> it seems to work, but I am not sure whether this is correct.
> The expression appears on lines 63 and 189 of src/compute/dataframe.ts.
>
> PR submitted: [https://github.com/apache/arrow/pull/4265]
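
The overflow described in the report can be reproduced without Arrow itself. The following is a minimal sketch, not the code from src/compute/dataframe.ts: the names {{dictionaryLength}}, {{indices}}, {{countInto}} and {{countsArrayFor}} are hypothetical stand-ins, and only the two {{Math.ceil(...)}} expressions are taken from the report. It assumes a dictionary of 3 entries and 1,000,000 rows that all reference entry 0.

```typescript
// Hypothetical stand-ins for a dictionary-encoded vector (not the Arrow API):
// 3 distinct dictionary entries, 1,000,000 rows, every row pointing at entry 0.
const dictionaryLength = 3;
const indices = new Uint8Array(1000000);

// Byte width derived from the dictionary size, as in the line quoted in the report.
const byteLengthFromDictionary = Math.ceil(Math.log(dictionaryLength) / Math.log(256)); // 1
// Byte width derived from the row count, as in the change the reporter tried.
const byteLengthFromRows = Math.ceil(Math.log(indices.length) / Math.log(256)); // 3

type CountsArray = Uint8Array | Uint16Array | Uint32Array;

// Increment one counter per row, keyed by the row's dictionary index.
function countInto(counts: CountsArray, keys: Uint8Array): CountsArray {
  for (let i = 0; i < keys.length; i++) {
    counts[keys[i]] += 1;
  }
  return counts;
}

// A 1-byte counter wraps modulo 256: 1,000,000 % 256 === 64.
const narrow = countInto(new Uint8Array(dictionaryLength), indices);
// A 4-byte counter holds the true count.
const wide = countInto(new Uint32Array(dictionaryLength), indices);

// Assumption, not the fix from the PR: choose the counter type from the
// largest count that could occur (the row count) by comparing against each
// type's maximum value instead of taking a logarithm.
function countsArrayFor(maxCount: number, size: number): CountsArray {
  if (maxCount <= 0xff) return new Uint8Array(size);
  if (maxCount <= 0xffff) return new Uint16Array(size);
  return new Uint32Array(size);
}

const chosen = countInto(countsArrayFor(indices.length, dictionaryLength), indices);

console.log(byteLengthFromDictionary, byteLengthFromRows); // 1 3
console.log(narrow[0], wide[0], chosen[0]); // 64 1000000 1000000
```

The comparison-based helper is only one possible design. It sidesteps boundary cases of the logarithm, such as a row count of exactly 256, where {{Math.log(256) / Math.log(256)}} is 1 while a Uint8Array counter tops out at 255.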