[ 
https://issues.apache.org/jira/browse/FLINK-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963838#comment-16963838
 ] 

Dawid Wysakowicz edited comment on FLINK-13702 at 10/31/19 10:27 AM:
---------------------------------------------------------------------

I agree the solution suggested by [~lzljs3620320] should work, it is very 
fragile though. It is based on the assumption that {{MemorySegments}} are 
immutable which in general case is not true. You can mutate MemorySegment via 
put methods. It is also possible that with this solution different code paths 
might work against different MemorySegments for the same {{LazyBinaryFormat}} 
instances. It does improve the current situation, I agree. We can add that 
solution, but we should keep in mind it is basically a hack and add a proper 
documentation with the assumptions that we do there.

I was also thinking if it would make sense to make {{MemorySegment[] 
getSegments}} return {{UnmodifiableMemorySegment}}. This would require more 
changes, as we would have to extract {{MemorySegment}} interface. Still that 
would not make the {{MemorySegment}} fully immutable as it still gives direct 
access to underlying memory array via {{getHeapMemory}}.
I would keep mind open for a better solution in a long run though.

As for the BinaryGeneric I think we could:
* remove the {{copy}} method from the class and move it to the {{Serializer}} 
as [~lzljs3620320] was suggesting
* in the {{materialize}} method call {{duplicate}}. This would ensure that 
every caller thread uses its own copy of the serializer.
* Moreover we can simply remove the getter for the serializer. There is no need 
to ever get it. The only place when it is called will actually never be 
reached. Because the {{if (input.getSegments() == null) {}}} will materialize 
the segments and they will always be not null. 
* additionally we can think of keeping {{TypeSerializerSnapshot}} instead of 
{{Serializer} and call {{restoreSerializer}} in the {{materialize}} method. 
Which should create a new copy of a Serializer. Unfortunately thats not the 
case for some of the {{Serializer}}s in Blink planner (this should be fixed 
independently anyway). To be on the safe side we would still call duplicate. 
What do you think?


was (Author: dawidwys):
I agree the solution suggested by [~lzljs3620320] should work, it is very 
fragile though. It is based on the assumption that {{MemorySegments}} are 
immutable which in general case is not true. You can mutate MemorySegment via 
put methods. It is also possible that with this solution different code paths 
might work against different MemorySegments for the same {{LazyBinaryFormat}} 
instances. It does improve the current situation, I agree. We can add that 
solution, but we should keep in mind it is basically a hack and add a proper 
documentation with the assumptions that we do there.

I was also thinking if it would make sense to make {{MemorySegment[] 
getSegments}} return {{UnmodifiableMemorySegment}}. This would require more 
changes, as we would have to extract {{MemorySegment}} interface. Still that 
would not make the {{MemorySegment}} fully immutable as it still gives direct 
access to underlying memory array via {{getHeapMemory}}.
I would keep mind open for a better solution in a long run though.

As for the BinaryGeneric I think we could:
* remove the {{copy}} method from the class and move it to the {{Serializer}} 
as [~lzljs3620320] was suggesting
* in the {{materialize}} method call {{duplicate}}. This would ensure that 
every caller thread uses its own copy of the serializer.
* additionally we can think of keeping {{TypeSerializerSnapshot}} instead of 
{{Serializer} and call {{restoreSerializer}} in the {{materialize}} method. 
Which should create a new copy of a Serializer. Unfortunately thats not the 
case for some of the {{Serializer}}s in Blink planner (this should be fixed 
independently anyway). To be on the safe side we would still call duplicate. 
What do you think?

> BaseMapSerializerTest.testDuplicate fails on Travis
> ---------------------------------------------------
>
>                 Key: FLINK-13702
>                 URL: https://issues.apache.org/jira/browse/FLINK-13702
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>    Affects Versions: 1.10.0
>            Reporter: Till Rohrmann
>            Assignee: Dawid Wysakowicz
>            Priority: Critical
>              Labels: test-stability
>
> The {{BaseMapSerializerTest.testDuplicate}} fails on Travis with an 
> {{java.lang.IndexOutOfBoundsException}}.
> https://api.travis-ci.org/v3/job/570973199/log.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to