LuciferYang commented on code in PR #54572:
URL: https://github.com/apache/spark/pull/54572#discussion_r2872452574
##########
core/src/main/scala/org/apache/spark/util/SizeEstimator.scala:
##########
@@ -97,32 +96,56 @@ object SizeEstimator extends Logging {
// Size of an object reference
// Based on https://wikis.oracle.com/display/HotSpotInternals/CompressedOops
private var isCompressedOops = false
+
+ // Whether Compact Object Headers (JEP 450/519) are enabled.
+ // With Compact Object Headers, the object header is 8 bytes on 64-bit JVMs
+ // (the class pointer is encoded inside the mark word), so objectSize = 8
+ // and pointerSize = 4 regardless of UseCompressedOops.
+ private var isCompactObjectHeaders = false
+
private var pointerSize = 4
// Minimum size of a java.lang.Object
private var objectSize = 8
initialize()
- // Sets object size, pointer size based on architecture and CompressedOops settings
- // from the JVM.
+ // Sets object size, pointer size based on architecture, CompressedOops
+ // and CompactObjectHeaders settings from the JVM.
private def initialize(): Unit = {
val arch = Utils.osArch
is64bit = arch.contains("64") || arch.contains("s390x")
+ isCompactObjectHeaders = is64bit && getIsCompactObjectHeaders
isCompressedOops = getIsCompressedOops
objectSize = if (!is64bit) 8 else {
- if (!isCompressedOops) {
+ if (isCompactObjectHeaders) {
+ 8
+ } else if (!isCompressedOops) {
16
} else {
12
}
}
- pointerSize = if (is64bit && !isCompressedOops) 8 else 4
+ pointerSize = if (is64bit && !isCompressedOops && !isCompactObjectHeaders) 8 else 4
Review Comment:
It seems that the `pointerSize` logic here assumes that enabling
`UseCompactObjectHeaders` implies `pointerSize` is always 4 bytes. However,
`UseCompactObjectHeaders` (which compresses the object header) and
`UseCompressedOops` (which compresses object references in fields) are
technically orthogonal.
According to [JDK-8341555](https://bugs.openjdk.org/browse/JDK-8341555), it
is valid to run with `-XX:+UseCompactObjectHeaders -XX:-UseCompressedOops`
(e.g., in large heap scenarios > 32GB). In that case, the object header would
be 8 bytes (compact), but the object references (oops) within fields would
still be 8 bytes (uncompressed).
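
For context, here is a minimal sketch of how such flags can be queried at runtime via `HotSpotDiagnosticMXBean` (this is not Spark's actual implementation; `vmFlag` is a hypothetical helper, and the flag lookup only works on HotSpot JVMs):

```scala
import java.lang.management.ManagementFactory
import com.sun.management.HotSpotDiagnosticMXBean
import scala.util.Try

object FlagCheck {
  private val diagnostic =
    ManagementFactory.getPlatformMXBean(classOf[HotSpotDiagnosticMXBean])

  // Returns Some(true/false) if the running JVM knows the flag,
  // None if the flag does not exist (e.g. UseCompactObjectHeaders on
  // JDKs older than 24, where getVMOption throws).
  def vmFlag(name: String): Option[Boolean] =
    Try(diagnostic.getVMOption(name).getValue.toBoolean).toOption

  def main(args: Array[String]): Unit = {
    println(s"UseCompressedOops       = ${vmFlag("UseCompressedOops")}")
    println(s"UseCompactObjectHeaders = ${vmFlag("UseCompactObjectHeaders")}")
  }
}
```

Probing both flags independently is what makes it possible to distinguish the `-XX:+UseCompactObjectHeaders -XX:-UseCompressedOops` combination in the first place.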
The current logic:
```scala
pointerSize = if (is64bit && !isCompressedOops && !isCompactObjectHeaders) 8 else 4
```
would incorrectly calculate `pointerSize` as 4 in the
`-XX:-UseCompressedOops` + `-XX:+UseCompactObjectHeaders` scenario.
I'd suggest simplifying this to:
```scala
pointerSize = if (is64bit && !isCompressedOops) 8 else 4
```
This correctly decouples the header size logic from the pointer size logic.
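
To illustrate, a small stand-alone sketch of the decoupled logic (illustrative names only, not Spark's API; the sizes follow the values already used in `SizeEstimator`):

```scala
object SizeSketch {
  // Minimum object size: header size rounded up to 8-byte alignment.
  def objectSize(is64bit: Boolean, compactHeaders: Boolean,
                 compressedOops: Boolean): Int =
    if (!is64bit) 8
    else if (compactHeaders) 8  // JEP 450/519: class pointer folded into mark word
    else if (compressedOops) 12 // 8-byte mark word + 4-byte compressed klass pointer
    else 16                     // 8-byte mark word + 8-byte klass pointer

  // Reference (oop) size depends only on UseCompressedOops, not on the header.
  def pointerSize(is64bit: Boolean, compressedOops: Boolean): Int =
    if (is64bit && !compressedOops) 8 else 4
}
```

With this split, `-XX:+UseCompactObjectHeaders -XX:-UseCompressedOops` yields an 8-byte header but 8-byte references, matching the JDK-8341555 scenario described above.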
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]