Re: Extremely large job serialization produced by union operator

杨力 Fri, 09 Mar 2018 14:37:16 -0800

Thank you for your response. It occurs both on a standalone cluster and on a
YARN cluster. I am trying to strip out the business code and reproduce it with
a minimal demo.


On Sat, Mar 10, 2018 at 2:27 AM Piotr Nowojski <pi...@data-artisans.com>
wrote:

> Hi,
>
> Could you provide more details about your queries and setup? Logs could be
> helpful as well.
>
> Piotrek
>
> > On 9 Mar 2018, at 11:00, 杨力 <bill.le...@gmail.com> wrote:
> >
> > I wrote a flink-sql app with the following topology.
> >
> > KafkaJsonTableSource -> SQL -> toAppendStream -> Map -> JDBCAppendTableSink
> > KafkaJsonTableSource -> SQL -> toAppendStream -> Map -> JDBCAppendTableSink
> > ...
> > KafkaJsonTableSource -> SQL -> toAppendStream -> Map -> JDBCAppendTableSink
> >
> > I have a dozen TableSources and tens of SQL queries. As a result, the
> > number of JDBCAppendTableSinks times the parallelism, that is, the number
> > of concurrent connections to the database, is too large for the database
> > server to handle. So I tried to union the DataStreams before connecting
> > them to the TableSink.
> >
> > KafkaJsonTableSource -> SQL -> toAppendStream -> Map
> >                                                      \
> > KafkaJsonTableSource -> SQL -> toAppendStream -> Map --- union -> JDBCAppendTableSink
> > ...                                                  /
> > KafkaJsonTableSource -> SQL -> toAppendStream -> Map
> >
> > With this strategy, job submission failed with an
> > OversizedPayloadException of 104 MB. Increasing akka.framesize avoids this
> > exception, but job submission then hangs and times out.
> >
> > I can't understand why a simple union operator would serialize to such a
> > large message. Can I avoid this problem?
> > Or can I change some configuration to fix the submission timeout?
> >
> > Regards,
> > Bill
>
>
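
The connection arithmetic described in the quoted message (number of sinks times parallelism) can be sketched as below. This is only a rough illustration: the class name and the figures (30 queries, parallelism 4) are hypothetical and do not come from the thread, and it assumes each parallel sink subtask holds exactly one JDBC connection.

```java
// Hypothetical helper; the thread only says "a dozen" sources and "tens" of queries.
final class ConnectionEstimate {

    // Assumption: every parallel subtask of a sink opens one JDBC connection.
    static int connections(int sinkCount, int sinkParallelism) {
        return sinkCount * sinkParallelism;
    }

    public static void main(String[] args) {
        int queries = 30;     // assumed number of SQL queries
        int parallelism = 4;  // assumed sink parallelism

        // Original topology: one JDBCAppendTableSink per query.
        System.out.println("separate sinks: " + connections(queries, parallelism));

        // Unioned topology: all streams feed a single JDBCAppendTableSink.
        System.out.println("after union:    " + connections(1, parallelism));
    }
}
```

Under these assumed figures the estimate drops from 120 connections to 4, which is the motivation given for the union. It says nothing about why the serialized job grows so large.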
