venkata91 commented on PR #26592: URL: https://github.com/apache/flink/pull/26592#issuecomment-2920638048
> > > > Thanks for the patch. It is a good catch.
> > > >
> > > > I understand it might be a little tricky to write a unit test, but we may be able to do an integration test. We can do the following:
> > > >
> > > > 1. Start a session cluster with normal-sized JM/TM (< 32 GB).
> > > > 2. When launching the job on the command line, use an additional JVM arg of `-XX:-UseCompressedOops` to force a 24-byte object header with JVM memory > 4 GB.
> > > >
> > > > This way we should be able to test it.
> > >
> > > Thanks for reviewing the patch!
> > >
> > > Yes, I'm working on an e2e test with `FlinkContainers`. In this case, we need two different JVMs (JM and TM), one with `UseCompressedOops` and one without it. Still working on the e2e test; will post it once it is ready.
> > >
> > > Last week, I was trying to reproduce the issue locally for Flink SQL with an inner join and custom datasets, but it is not reproducible locally. Note: with remote debugging I could confirm the issue still exists - basically the JM codegens `HashPartitioner` with `arrayBaseOffset` = 24 while on the TM side it is 16. What happens is the following:
> >
> > 1. Bytes [16 ... 31] are where the key is present (key length = 16).
> > 2. Bytes [32 ... 39] are filled with JVM garbage, which looks something like [1, 0, 0, 0, ... 0].
> > 3. The hash function reads bytes from [24 ... 39], where the first 8 bytes [24 ... 31] are valid bytes of the key and the remaining 8 bytes [32 ... 39] are just JVM garbage.
> > 4. The same pattern exists for both occurrences of the same key (in fact, for all the keys).
> > 5. Due to the above, the computed hashcode is consistent between both occurrences of the same key, so the records still get shuffled to the appropriate task, even though the hashcode computation is wrong.
> >
> > I'm still trying to construct a dataset where the [32 ... 39] bytes are different for the same key. Note: even in our internal environment, this issue is reproducible only when `nested fields projection pushdown` is enabled. This optimization is implemented in our internal connector.
>
> Upon investigating it further, it looks like the `json` FileSystemSource supports `nested fields projection pushdown`. With that enabled, I am able to reproduce it locally. Let me add an ITCase using that.

I think I got misled by [JsonFormatFactory.supportsNestedProjectionPushdown](https://github.com/apache/flink/blob/master/flink-formats/flink-json/src/main/java/org/apache/flink/formats/json/JsonFormatFactory.java#L113), but it seems the FileSystemTableSource that wraps the JsonFormat simply returns false. For a moment I thought I was able to reproduce it locally, but it seems I spoke too soon.
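
For reference, here is a minimal standalone sketch of the mismatch described in the steps above (the class name and the hardcoded `24` are illustrative assumptions, not code from the patch): a base offset captured at codegen time on one JVM misreads the key on a JVM whose `arrayBaseOffset(byte[].class)` is different. Run it once with `-XX:+UseCompressedOops` (offset 16) and once with `-XX:-UseCompressedOops` (offset 24) to see the two layouts.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo, not Flink code: shows how a byte[] base offset
// baked in on one JVM misreads the key on a JVM with a different layout.
public class BaseOffsetDemo {
    public static void main(String[] args) throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        long actualOffset = unsafe.arrayBaseOffset(byte[].class);
        System.out.println("arrayBaseOffset(byte[]) = " + actualOffset);

        // A 16-byte key, padded to 24 bytes so the misaligned read below
        // stays inside the allocation (the real bug reads past the array
        // into adjacent heap memory, i.e. "JVM garbage").
        byte[] key = new byte[24];
        for (int i = 0; i < 16; i++) {
            key[i] = (byte) (i + 1);
        }

        // What the local runtime reads: the first 8 bytes of the key.
        long good = unsafe.getLong(key, actualOffset);

        // What happens when an offset of 24 was captured at codegen time
        // but the local base offset is 16: the read starts 8 bytes into
        // the key and picks up trailing bytes that are not part of it.
        long hardcodedOffset = 24L; // illustrative value baked in remotely
        long bad = unsafe.getLong(key, hardcodedOffset);

        System.out.printf("read @ actual offset %d   : %016x%n", actualOffset, good);
        System.out.printf("read @ hardcoded offset %d: %016x%n", hardcodedOffset, bad);
    }
}
```

With compressed oops enabled locally, the second read starts 8 bytes into the key, which matches the [24 ... 39] read described in step 3 above.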