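ostronaut commented on PR #49144:
URL: https://github.com/apache/spark/pull/49144#issuecomment-2545249990

Okay, thank you for your point, @cloud-fan! Just to double-check that I've understood everything correctly before further implementation: instead of prohibiting map expressions for partitioning, we can implement a `Rule[LogicalPlan]` named `InsertMapSortInPartitioningExpressions` (other name suggestions are welcome) that replaces `MapType` expressions with `MapSort`. With the map sorted, equal maps will then produce the same hash codes, as per the `InterpretedHashFunction.hash` logic (where the order of elements matters for the final hash value):

```scala
case map: MapData =>
  val (kt, vt) = dataType match {
    case udt: UserDefinedType[_] =>
      val mapType = udt.sqlType.asInstanceOf[MapType]
      mapType.keyType -> mapType.valueType
    case MapType(kt, vt, _) => kt -> vt
  }
  val keys = map.keyArray()
  val values = map.valueArray()
  var result = seed
  var i = 0
  while (i < map.numElements()) {
    result = hash(keys.get(i, kt), kt, result)
    result = hash(values.get(i, vt), vt, result)
    i += 1
  }
  result
```

Please let me know if I'm missing something or if you have any other recommendations!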
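
To illustrate the order dependence, here is a minimal standalone sketch. It does not use Spark: `orderedHash` and `sortedHash` are hypothetical names, entries are simplified to `(String, Int)` pairs, and `MurmurHash3.mix` from the Scala standard library stands in for Spark's own hash function. It only mirrors the shape of the loop (keys and values folded into a running result in stored order) and shows that sorting by key first removes the dependence on insertion order.

```scala
import scala.util.hashing.MurmurHash3

object MapHashOrderSketch {
  // Hypothetical stand-in for the map branch of the hash loop:
  // each key, then each value, is mixed into the running result in stored order.
  def orderedHash(entries: Seq[(String, Int)], seed: Int = 42): Int =
    entries.foldLeft(seed) { case (result, (k, v)) =>
      MurmurHash3.mix(MurmurHash3.mix(result, k.hashCode), v)
    }

  // Assumed effect of sorting the map first: entries are ordered by key
  // before hashing, so the stored order no longer influences the result.
  def sortedHash(entries: Seq[(String, Int)], seed: Int = 42): Int =
    orderedHash(entries.sortBy(_._1), seed)

  def main(args: Array[String]): Unit = {
    val a = Seq("x" -> 1, "y" -> 2)
    val b = Seq("y" -> 2, "x" -> 1) // same entries, different stored order

    // The unsorted hashes depend on entry order and will generally differ.
    println(s"unsorted equal: ${orderedHash(a) == orderedHash(b)}")
    // After sorting by key, both sequences are identical, so the hashes match.
    println(s"sorted equal:   ${sortedHash(a) == sortedHash(b)}")
  }
}
```

The sorted comparison is equal by construction, since both inputs become the same sequence before hashing. This is the property the proposed rule would rely on for partitioning.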
Okay, thank you for your point @cloud-fan! Just to double check if i've got everything correctly before further implementation: instead of prohibiting map expressions for partitioning, we can implemented a `Rule[LogicalPlan]` named `InsertMapSortInPartitioningExpressions` (any other name can be recommended) where we will replace MapType to MapSort. Having Map Sorted will then produce the same hash codes for the same maps, as per `InterpretedHashFunction.hash` logic (where order of elements matter for the final cash value): ```scala case map: MapData => val (kt, vt) = dataType match { case udt: UserDefinedType[_] => val mapType = udt.sqlType.asInstanceOf[MapType] mapType.keyType -> mapType.valueType case MapType(kt, vt, _) => kt -> vt } val keys = map.keyArray() val values = map.valueArray() var result = seed var i = 0 while (i < map.numElements()) { result = hash(keys.get(i, kt), kt, result) result = hash(values.get(i, vt), vt, result) i += 1 } result ``` Please let me know if im missing something or if you have any other recommendations! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org