The melt function has recently been implemented in the PySpark Pandas
API (since melt is part of the Pandas API). I think the Scala/Java
Dataset and Python DataFrame APIs deserve this method just as much,
ideally all based on a single implementation.
I'd like to fuel the conversation with some code:
https://github.com/apache/spark/pull/36150
Cheers,
Enrico
On 02.01.22 at 20:59, Daniel Davies wrote:
Level: Intermediate (I think?)
Scenario: Feature Request
Hello dev@,
(First time posting on this mailing list; apologies in advance if this
should have been routed elsewhere or is missing any information).
Un-pivoting data is supported on numerous SQL engines and in Pandas
(via the 'melt' function), but it isn't directly available in Spark.
It's easy enough to derive this functionality using the 'stack'
function or a combination of struct, array, and explode (such as
the reproduction of the melt function in pandas-on-pyspark here
<https://github.com/apache/spark/blob/c92bd5cafe62ca5226176446735171cc877e805a/python/pyspark/pandas/frame.py#L9651>),
but I was wondering whether a more native solution has been
considered? It would make end-user code more lightweight at the very
least, and I wonder whether it could be made more efficient than the
stack or struct-array-explode workarounds.
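For anyone unfamiliar with the operation being requested, here is a
minimal sketch of the un-pivot semantics using plain Pandas rather
than Spark (the column names and data are made up for illustration);
the stack / struct-array-explode workarounds above produce the same
long-format result on a Spark DataFrame:

```python
import pandas as pd

# Wide format: one row per id, one column per measured variable.
df = pd.DataFrame({
    "id": [1, 2],
    "height": [10, 20],
    "width": [3, 4],
})

# melt un-pivots the value columns into (variable, value) pairs,
# repeating the id columns for each un-pivoted column.
long = pd.melt(df, id_vars=["id"], value_vars=["height", "width"])
print(long)
#    id variable  value
# 0   1   height     10
# 1   2   height     20
# 2   1    width      3
# 3   2    width      4
```

This is the shape of the API I had in mind; whether a native Spark
version should mirror the Pandas signature exactly is of course up
for discussion.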
I'm happy to try and make a PR if this is something that might be
useful within Spark. No worries if this isn't something you think
should be supported; the methods above work and are well documented
on StackOverflow. I was personally just caught out by this, and
thought it would be useful to raise.
I did see a thread in the Pony Mail archive about this issue, but it
looks like it didn't go anywhere. Does anyone else have context on
this
<https://lists.apache.org/list?dev@spark.apache.org:lte=60M:unpivot>?
Kind Regards,
--
Daniel Davies