Sebastian Bengtsson created SPARK-51426:
-------------------------------------------
             Summary: Setting metadata to empty dict does not work
                 Key: SPARK-51426
                 URL: https://issues.apache.org/jira/browse/SPARK-51426
             Project: Spark
          Issue Type: Bug
          Components: PySpark, Spark Core
    Affects Versions: 3.5.0
         Environment: PySpark in Databricks.
Databricks Runtime Version: 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)
            Reporter: Sebastian Bengtsson


It should be possible to remove column metadata from a DataFrame by setting the metadata to an empty dictionary. Surprisingly, this has no effect. If column "a" has metadata set, the following leaves it unchanged:

{code:java}
df.withMetadata('a', {}){code}

Expected: the metadata is removed/replaced by an empty dict.

Experienced: the metadata is still there, unaffected.

Code to demonstrate this behavior:

{code:java}
from pyspark.sql.functions import col

df = spark.createDataFrame([('',)], ['a'])
print('no metadata:', df.schema['a'].metadata)
df = df.withMetadata('a', {'foo': 'bar'})
print('metadata has been set:', df.schema['a'].metadata)
df = df.select([col('a').alias('a', metadata={})])
print('metadata has not been removed:', df.schema['a'].metadata)
df = df.withMetadata('a', {'baz': 'burr'})
print('metadata has been replaced:', df.schema['a'].metadata)
df = df.withMetadata('a', {})
print('metadata still there:', df.schema['a'].metadata){code}

Output:

{code:java}
no metadata: {}
metadata has been set: {'foo': 'bar'}
metadata has not been removed: {'foo': 'bar'}
metadata has been replaced: {'baz': 'burr'}
metadata still there: {'baz': 'burr'}
{code}

Fixing this would include the following patch, which stops treating an empty (falsy) dict as "no metadata passed":

{code:java}
--- a/python/pyspark/sql/classic/column.py
+++ b/python/pyspark/sql/classic/column.py
@@ -518,7 +518,7 @@ class Column(ParentColumn):
         sc = get_active_spark_context()
         if len(alias) == 1:
-            if metadata:
+            if metadata is not None:
                 assert sc._jvm is not None
                 jmeta = getattr(sc._jvm, "org.apache.spark.sql.types.Metadata").fromJson(
                     json.dumps(metadata)
{code}

But I suspect further changes on the Scala side of Spark are also required.
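Until a fix lands, a possible workaround is to rebuild the schema with the column's metadata cleared and re-create the DataFrame from the underlying RDD. This is only a sketch, not part of the report: the helper name clear_metadata is illustrative, and it assumes classic PySpark (df.rdd is not available under Spark Connect).

{code:java}
from pyspark.sql import DataFrame
from pyspark.sql.types import StructField, StructType


def clear_metadata(df: DataFrame, column: str) -> DataFrame:
    # Hypothetical helper: rebuild the schema, replacing the target
    # column's metadata with an empty dict.
    fields = [
        StructField(f.name, f.dataType, f.nullable, metadata={})
        if f.name == column
        else f
        for f in df.schema.fields
    ]
    # Re-create the DataFrame against the cleaned schema. withMetadata()
    # and alias(metadata={}) cannot do this today because an empty dict
    # is falsy and is skipped by the "if metadata:" check shown above.
    return df.sparkSession.createDataFrame(df.rdd, StructType(fields))
{code}

With this, clear_metadata(df, 'a').schema['a'].metadata should return an empty dict in the scenario above, where df.withMetadata('a', {}) does not.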