[ https://issues.apache.org/jira/browse/SPARK-51426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-51426:
-----------------------------------
    Labels: pull-request-available  (was: )

> Setting metadata to empty dict does not work
> --------------------------------------------
>
>                 Key: SPARK-51426
>                 URL: https://issues.apache.org/jira/browse/SPARK-51426
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core
>    Affects Versions: 3.5.0
>        Environment: PySpark in Databricks.
> Databricks Runtime Version: 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)
>            Reporter: Sebastian Bengtsson
>            Priority: Major
>              Labels: pull-request-available
>
> It should be possible to remove column metadata in a DataFrame by setting the metadata to an empty dictionary. Surprisingly, this does not work. If column "a" has metadata set, the following has no effect:
> {code:java}
> df.withMetadata('a', {}){code}
> Expected: the metadata is removed, i.e. replaced by an empty dict.
> Observed: the metadata is still there, unaffected.
>
> Code to demonstrate this behavior:
> {code:java}
> from pyspark.sql.functions import col
>
> df = spark.createDataFrame([('',)], ['a'])
> print('no metadata:', df.schema['a'].metadata)
> df = df.withMetadata('a', {'foo': 'bar'})
> print('metadata has been set:', df.schema['a'].metadata)
> df = df.select([col('a').alias('a', metadata={})])
> print('metadata has not been removed:', df.schema['a'].metadata)
> df = df.withMetadata('a', {'baz': 'burr'})
> print('metadata has been replaced:', df.schema['a'].metadata)
> df = df.withMetadata('a', {})
> print('metadata still there:', df.schema['a'].metadata){code}
> Output:
> {code:java}
> no metadata: {}
> metadata has been set: {'foo': 'bar'}
> metadata has not been removed: {'foo': 'bar'}
> metadata has been replaced: {'baz': 'burr'}
> metadata still there: {'baz': 'burr'}
> {code}
> On the Python side the cause appears to be a truthiness check that treats an empty dict the same as None. Fixing this would include the following patch:
> {code:java}
> --- a/python/pyspark/sql/classic/column.py
> +++ b/python/pyspark/sql/classic/column.py
> @@ -518,7 +518,7 @@ class Column(ParentColumn):
>          sc = get_active_spark_context()
>          if len(alias) == 1:
> -            if metadata:
> +            if metadata is not None:
>                  assert sc._jvm is not None
>                  jmeta = getattr(sc._jvm, "org.apache.spark.sql.types.Metadata").fromJson(
>                      json.dumps(metadata)
> {code}
> But I suspect further changes in the Scala part of Spark are also required.
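> Whatever the final fix looks like, here is a minimal pytest-style regression-test sketch of the expected behavior (not part of the original report; the {{spark}} session fixture is an assumption, not Spark's own test harness):
> {code:python}
> # Hedged sketch: asserts the expected post-fix behavior of withMetadata({}).
> # The `spark` SparkSession fixture is assumed to be provided by the test setup.
> def test_with_metadata_empty_dict_clears_metadata(spark):
>     df = spark.createDataFrame([("",)], ["a"]).withMetadata("a", {"foo": "bar"})
>     assert df.schema["a"].metadata == {"foo": "bar"}
>     # After the fix, an empty dict should replace the existing metadata.
>     df = df.withMetadata("a", {})
>     assert df.schema["a"].metadata == {}
> {code}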
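> Until a fix lands, one possible workaround (a sketch, not from the original report) is to rebuild the schema without metadata and re-apply it. Note this goes through the RDD API, so it assumes a classic (non-Connect) session:
> {code:python}
> from pyspark.sql.types import StructField, StructType
>
> # Hedged workaround sketch: StructField's metadata defaults to empty, so
> # copying only name/type/nullability drops any existing column metadata.
> clean_schema = StructType(
>     [StructField(f.name, f.dataType, f.nullable) for f in df.schema.fields]
> )
> df = spark.createDataFrame(df.rdd, clean_schema)
> print(df.schema["a"].metadata)  # expected: {}
> {code}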