Hello,
Because I was puzzled that, in my tests, the exception on restoring the keyed state backend only happens on scale-in/node failure and not on scale-out, I re-ran the scale-out test case.
And indeed, the exception from HeapKeyedStateBackendBuilder does not occur there, and restoring the keyed state backend works fine.
My incoming data is the same in all test cases, and I ran each test case multiple times, so I am sure that, at least in my current setup, the issue occurs only on scale-in.
All TaskManagers / tasks work properly after scale-out and the load is uniformly distributed.
Is the behaviour for loading keyed state from checkpoints, or more broadly the general process for reacting to the addition/removal of a TaskManager in a Flink cluster, different between the two cases?
In the scale-in scenario I see some additional exceptions due to the fact that the TaskManager is gone immediately (such as Netty and Akka exceptions for the removed TaskManager).
Is this normal, or might I have a problem with my configuration?
BR
Martin
Martin wrote on 08.01.2022 21:35 (GMT +01:00):
Hello David,
> That is just to avoid falling back to Kryo serialization, which is less effective, but this IMO shouldn't break your application.
Thanks for that hint. I would have spent some hours working on that without fixing my issue.
> Maybe can you try to check these for possible null values / cyclic references?
I will have a look into that.
The data model is quite complex and automatically derived from an OpenAPI spec of a 3GPP specification, and I am not completely aware of all its details, so there is a chance that the null or cyclic reference case could occur.
> I think if I were to debug this, I'd start with writing a unit test for the state serialization (manually instantiating the serializer and running hand crafted value through the ser/de cycle to see if it fails).
I think that approach is reasonable and I will go for that.
Can you point me to a starting point (docs, existing unit tests, code fragments) for manually running a Flink state serialization cycle?
Say I already have some of my complex objects; how do I go on from there?
> I'd probably need a minimal case to reproduce this locally to get more insight.
If I am not able to solve my issue with your suggested approach, I will try to create a minimal case.
Thanks for helping me, David!
BR
Martin
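A possible starting point for such a manual ser/de cycle, as a minimal sketch only. It assumes the Flink 1.14-era API is on the classpath (`TypeInformation.of(...).createSerializer(new ExecutionConfig())`, `DataOutputSerializer`, `DataInputDeserializer`); other Flink versions may need a different `createSerializer` overload. The class name `StateSerializationRoundTrip` and the helper `roundTrip` are made up for illustration. It also assumes the state descriptor derives its serializer from the class; if the job passes explicit type information or a custom serializer, that one should be exercised instead.

```java
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.core.memory.DataInputDeserializer;
import org.apache.flink.core.memory.DataOutputSerializer;

import java.io.IOException;

/** Hypothetical helper: pushes one value through a Flink serializer and back. */
final class StateSerializationRoundTrip {

    static <T> T roundTrip(T value, Class<T> type) throws IOException {
        // Let Flink pick the serializer for this class; types that are not
        // recognized as POJOs fall back to the Kryo-based serializer.
        TypeSerializer<T> serializer =
                TypeInformation.of(type).createSerializer(new ExecutionConfig());
        System.out.println("Serializer in use: " + serializer.getClass().getName());

        // Serialize into an in-memory buffer (1024 is only the initial size).
        DataOutputSerializer out = new DataOutputSerializer(1024);
        serializer.serialize(value, out);

        // Deserialize from a copy of exactly the bytes that were written.
        DataInputDeserializer in = new DataInputDeserializer(out.getCopyOfBuffer());
        return serializer.deserialize(in);
    }
}
```

In a unit test one would build a hand-crafted `Accumulator` (the class from the serialization trace), call `roundTrip(acc, Accumulator.class)`, and compare the result with the input. The printed serializer class shows whether the POJO serializer or the Kryo fallback is being used.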
> What I understand from the docs [1] and David A.'s answer on SO [2] is that I have to create TypeInfos for my POJOs with Lists and Maps.
That is just to avoid falling back to Kryo serialization, which is less effective, but this IMO shouldn't break your application. There might be of course some edge case that Kryo can't handle, in that case switching to the POJO serializer might actually help.
> Caused by: com.esotericsoftware.kryo.KryoException: java.io.IOException: Stream Closed
> Serialization trace:
> IdString (de.martin.model.ContainerInfo)
> pDUContainerInformation (de.martin.model.UsedContainer)
> usedContainer (de.martin.model.UnitUsage)
> usagePerKey (de.telekom.mschalln.processing.logic.Accumulator)
Maybe can you try to check these for possible null values / cyclic references? These are just guesses. I think if I were to debug this, I'd start with writing a unit test for the state serialization (manually instantiating the serializer and running hand crafted value through the ser/de cycle to see if it fails).
Other than that I can't really think of anything right now, I'd probably need a minimal case to reproduce this locally to get more insight.
D.
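The check for null values / cyclic references suggested above could be sketched with plain reflection, independent of Flink and Kryo. Everything here is illustrative: `ObjectGraphCheck`, `inspect`, and the `Node` sample class are hypothetical names, and the walker only reports what it finds; whether a null or a cycle is actually the cause of the Kryo exception remains a guess.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: walks an object graph and reports nulls and cycles. */
final class ObjectGraphCheck {

    /** Hypothetical stand-in for a generated model class. */
    static final class Node {
        String id;
        Node next;
        List<Node> children = new ArrayList<>();
        Node(String id) { this.id = id; }
    }

    static List<String> inspect(Object root) {
        List<String> findings = new ArrayList<>();
        // Identity-based set of objects on the current path (not equals-based).
        Set<Object> onPath = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(root, "root", onPath, findings);
        return findings;
    }

    private static void walk(Object obj, String path, Set<Object> onPath, List<String> findings) {
        if (obj == null) {
            findings.add("null at " + path);
            return;
        }
        Class<?> type = obj.getClass();
        boolean container = obj instanceof Collection || obj instanceof Map || type.isArray();
        // Enums and non-container JDK types (String, Integer, ...) are leaves.
        if (!container && (obj instanceof Enum || type.getName().startsWith("java."))) {
            return;
        }
        if (!onPath.add(obj)) {
            findings.add("cycle at " + path);
            return;
        }
        try {
            if (obj instanceof Collection) {
                int i = 0;
                for (Object e : (Collection<?>) obj) {
                    walk(e, path + "[" + i++ + "]", onPath, findings);
                }
            } else if (obj instanceof Map) {
                int i = 0;
                for (Object v : ((Map<?, ?>) obj).values()) {
                    walk(v, path + "{value " + i++ + "}", onPath, findings);
                }
            } else if (type.isArray()) {
                for (int i = 0; i < Array.getLength(obj); i++) {
                    walk(Array.get(obj, i), path + "[" + i + "]", onPath, findings);
                }
            } else {
                // Reflect over instance fields of user classes only; stop at JDK superclasses.
                for (Class<?> c = type; c != null && !c.getName().startsWith("java."); c = c.getSuperclass()) {
                    for (Field f : c.getDeclaredFields()) {
                        if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                            continue;
                        }
                        f.setAccessible(true);
                        walk(f.get(obj), path + "." + f.getName(), onPath, findings);
                    }
                }
            }
        } catch (IllegalAccessException e) {
            findings.add("inaccessible at " + path);
        } finally {
            onPath.remove(obj); // shared (non-cyclic) references are not reported
        }
    }
}
```

Calling `ObjectGraphCheck.inspect(accumulator)` on a hand-crafted state object returns one line per finding, with a path such as `root.next.next` that can be matched against the fields named in the serialization trace.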