Hi Niels,

the size of the jar does not play a role for Flink. What could be a problem is that the serialized `JobGraph` (user code with closures) is larger than 10 MB and thus exceeds Akka's default maximum framesize. In such a case, it cannot be sent to the `JobMaster`. You can control the framesize via `akka.framesize`.
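A minimal sketch of what raising that limit could look like in `flink-conf.yaml`. The values are placeholders chosen for illustration (50 MB and 10 minutes), not recommendations. `akka.client.timeout` is the client-side timeout raised in the quoted mail further down. It is an assumption here that the change has to be visible to both the submitting client and the cluster, and that the cluster needs a restart to pick it up.

```yaml
# flink-conf.yaml (sketch; values are illustrative assumptions)

# Maximum size of a single Akka message. The default is 10485760b (10 MB);
# 52428800b corresponds to 50 MB.
akka.framesize: 52428800b

# How long the client waits for the submission to be acknowledged.
akka.client.timeout: 600 s
```

If the submission succeeds with a larger framesize, that would point at the serialized job being the culprit rather than the jar itself.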
In order to debug the problem properly, I would need access to the client log and the JobManager logs if possible.

Cheers,
Till

On Tue, Feb 27, 2018 at 11:05 AM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Niels,
>
> There should be no size constraints on the complexity of an application or the size of a JAR file.
> The problem that you describe sounds a bit strange and should be fixed.
>
> Apparently, it has to spend more time on planning / submitting the application than before.
> Have you tried to increase the akka.client.timeout parameter?
>
> If that does not help, it would be good to learn what the JobManager is doing after the application was submitted.
> Either it just takes longer than before such that the client timeout is exceeded or it might even get stuck in some kind of deadlock (which would be bad).
> In that case it might help to take a few stacktraces of JM process after the application was submitted to check if the threads are making progress.
>
> I'll also include Till who is more familiar with the submission process and JM planning and coordination.
>
> Best, Fabian
>
>
> 2018-02-27 9:31 GMT+01:00 Niels <nielsdenis...@gmail.com>:
>
>> Hi All,
>>
>> We've been using Flink 1.3.2 for a while now, but recently failed to deploy our fat jar to the cluster. The deployment only works when we remove 2 arbitrary operators, thus giving us the impression our job is too large.
>> However, we only changed some case classes and serializers (to support Avro) compared to a working version of our jar. I'll provide some context below.
>>
>> *Streaming operators used:* (same list as when deploy worked)
>> - 9 Incoming streams from Kafka (all parsed from JSON -> Case Classes)
>> - 6 Stateful Joins (extend CoProcessFunction)
>> - 4 Stateful Processors (extend ProcessFunction)
>> - 5 Maps
>> - 2 Filters
>> - 1 Union of 3 Streams
>> - 1 Sink to Kafka (Case class -> JSON)
>>
>> *Changes made:*
>> - add extended Type Serializer for Avro support
>> - add companion objects to case classes for translation to Avro Generic Records
>> - alter stateful functions to use above changes
>>
>> *What does work:*
>> - remove 2 arbitrary operators and deploy fat jar
>> - run full program using sbt run locally
>>
>> Could it be that somehow the complexity causes deploying the job as a jar to fail? We simply get a timeout from Flink's CLI when trying to deploy, even when extending the timeout to several minutes.
>>
>> Any help would be very much appreciated!
>>
>> Thanks,
>> Niels
>>
>> --
>> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/