[
https://issues.apache.org/jira/browse/SPARK-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437616#comment-15437616
]
Nicholas Chammas commented on SPARK-14241:
------------------------------------------
[~marmbrus] - Would it be tough to make this function deterministic, or somehow
"stable"? The linked Stack Overflow question shows some pretty surprising
behavior from an end-user perspective.
If this would be tough to change, what are some alternatives you would
recommend?
Do you think, for example, it would be possible to make a window function that
_is_ deterministic and does effectively the same thing? Maybe something like
{{row_number()}}, except the {{WindowSpec}} would not need to specify any
partitioning or ordering. (Having to specify an ordering would be the main
downside of using {{row_number()}} instead of {{monotonically_increasing_id()}}.)
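To make the contrast concrete, here is a minimal sketch in plain Python (not Spark code). It assumes the layout described in the Spark API docs for {{monotonically_increasing_id()}}: partition index in the upper bits, record number within the partition in the lower 33 bits. The helper names {{partition_ids}} and {{ordered_row_numbers}}, and the two partition layouts, are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Sequence


def partition_ids(partitions: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Simulate partition-based IDs: partition index shifted into the
    upper bits, record position within the partition in the lower 33 bits."""
    ids: Dict[str, int] = {}
    for p_index, rows in enumerate(partitions):
        for offset, row in enumerate(rows):
            ids[row] = (p_index << 33) + offset
    return ids


def ordered_row_numbers(rows: Sequence[str]) -> Dict[str, int]:
    """Simulate row_number()-style IDs: 1-based position after sorting
    by an explicit key (here the row value itself, assumed unique)."""
    return {row: n for n, row in enumerate(sorted(rows), start=1)}


def flatten(partitions: Sequence[Sequence[str]]) -> List[str]:
    """Collect all rows from all partitions into one list."""
    return [row for rows in partitions for row in rows]


# The same four rows, split across partitions in two different ways
# (hypothetical layouts, e.g. before and after a shuffle).
layout_1 = [["a", "b"], ["c", "d"]]
layout_2 = [["d"], ["a", "b", "c"]]

ids_1 = partition_ids(layout_1)
ids_2 = partition_ids(layout_2)

# Partition-based IDs depend on the layout: ID 0 lands on different rows.
row_with_id_0_in_layout_1 = [r for r, i in ids_1.items() if i == 0]
row_with_id_0_in_layout_2 = [r for r, i in ids_2.items() if i == 0]

# Order-based IDs depend only on the sort key, not on the layout.
stable_1 = ordered_row_numbers(flatten(layout_1))
stable_2 = ordered_row_numbers(flatten(layout_2))
```

In this sketch the partition-based IDs change whenever the rows are split differently, while the order-based numbers stay the same for both layouts. The cost is the one noted above: the order-based scheme needs a total ordering over the rows.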
> Output of monotonically_increasing_id lacks stable relation with rows of
> DataFrame
> ----------------------------------------------------------------------------------
>
> Key: SPARK-14241
> URL: https://issues.apache.org/jira/browse/SPARK-14241
> Project: Spark
> Issue Type: Bug
> Components: PySpark, Spark Core
> Affects Versions: 1.6.0, 1.6.1
> Reporter: Paul Shearer
>
> If you use monotonically_increasing_id() to append a column of IDs to a
> DataFrame, the IDs do not have a stable, deterministic relationship to the
> rows they are appended to. A given ID value can land on different rows
> depending on what happens in the task graph:
> http://stackoverflow.com/questions/35705038/how-do-i-add-an-persistent-column-of-row-ids-to-spark-dataframe/35706321#35706321
> From a user perspective this behavior is very unexpected, and many things one
> would normally like to do with an ID column are in fact only possible under
> very narrow circumstances. The function should either be made deterministic,
> or there should be a prominent warning note in the API docs regarding its
> behavior.