Hi Ori,
this indeed sounds strange. Can you also reproduce this behavior locally
with a faker source? We should definitely attach a profiler and see where
the bottleneck lies.
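For a local run, something along these lines could already be enough as a
generated source (just a sketch: I'm assuming Event is your Avro-generated
class with a builder, and all field values below are placeholders):

import org.apache.flink.streaming.api.functions.source.SourceFunction

class FakeEventSource extends SourceFunction[Event] {

  @volatile private var running = true

  override def run(ctx: SourceFunction.SourceContext[Event]): Unit = {
    var i = 0L
    while (running) {
      // Placeholder values; the setters mirror the getters used in your job
      val event = Event.newBuilder()
        .setEnvUrl(s"https://example.com/page/$i")
        .setCtxVisitId((i % 1000).toString)
        .setVisionsSId((i % 100).toString)
        .setWmUId((i % 50).toString)
        .setWmEnv("local")
        .setSId((i % 100).toString)
        .setTime(System.currentTimeMillis())
        .build()
      ctx.collect(event)
      i += 1
    }
  }

  override def cancel(): Unit = running = false
}

// e.g. new App(new FakeEventSource, new PrintSinkFunction[Session]).run(config)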
Which Flink version and state backend are you using?
Regards,
Timo
On 20.10.21 16:17, Ori Popowski wrote:
I have a simple Flink application with a keyBy, a session window, and an
AggregateFunction that incrementally aggregates a result, which is then
written to a sink.
Some of the requirements involve accumulating lists of fields from the
events (for example, all URLs), so not all of the resulting values are
primitives (although some are, like the total number of events and the
session duration).
This job experiences huge backpressure 40 minutes after launching.
I've found that the append and concatenate operations in my
AggregateFunction's add() and merge() methods are what's ruining the job
(i.e. causing the backpressure).
I've managed to create a reduced version of my job, where I just append
and concatenate some of the event values, and I can confirm that
backpressure starts just 40 minutes after launching the job:
import java.util.UUID

import org.apache.flink.api.common.functions.AggregateFunction

// Accumulator is a type alias for the five per-field vectors built up below:
// (Vector[String], Vector[String], Vector[String], Vector[Long], Vector[Long])
class SimpleAggregator extends AggregateFunction[Event, Accumulator, Session]
  with LazyLogging {

  override def createAccumulator(): Accumulator = (
    Vector.empty,
    Vector.empty,
    Vector.empty,
    Vector.empty,
    Vector.empty
  )

  // Appends the event's fields to the accumulated vectors
  override def add(value: Event, accumulator: Accumulator): Accumulator = {
    (
      accumulator._1 :+ value.getEnvUrl,
      accumulator._2 :+ value.getCtxVisitId,
      accumulator._3 :+ value.getVisionsSId,
      accumulator._4 :+ value.getTime.longValue(),
      accumulator._5 :+ value.getTime.longValue()
    )
  }

  // Concatenates two accumulators (called when session windows merge)
  override def merge(a: Accumulator, b: Accumulator): Accumulator = {
    (
      a._1 ++ b._1,
      a._2 ++ b._2,
      a._3 ++ b._3,
      a._4 ++ b._4,
      a._5 ++ b._5
    )
  }

  // Placeholder result in this reduced version
  override def getResult(accumulator: Accumulator): Session = {
    Session.newBuilder()
      .setSessionDuration(1000)
      .setSessionTotalEvents(1000)
      .setSId("-" + UUID.randomUUID().toString)
      .build()
  }
}
This is the job overall (simplified version):
class App(
  source: SourceFunction[Event],
  sink: SinkFunction[Session]
) {

  def run(config: Config): Unit = {
    val senv = StreamExecutionEnvironment.getExecutionEnvironment
    senv.setMaxParallelism(256)

    val dataStream = senv.addSource(source).uid("source")

    dataStream
      .assignAscendingTimestamps(_.getTime)
      .keyBy(event => (event.getWmUId, event.getWmEnv, event.getSId).toString())
      .window(EventTimeSessionWindows.withGap(config.sessionGap.asFlinkTime))
      .allowedLateness(0.seconds.asFlinkTime)
      .process(new ProcessFunction).uid("process-session")
      .addSink(sink).uid("sink")

    senv.execute("session-aggregation")
  }
}
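In the reduced version, the window stage uses the SimpleAggregator instead of
the ProcessFunction, roughly like this (a sketch: the exact wiring and the uid
name are made up, everything else is as in the job above):

dataStream
  .assignAscendingTimestamps(_.getTime)
  .keyBy(event => (event.getWmUId, event.getWmEnv, event.getSId).toString())
  .window(EventTimeSessionWindows.withGap(config.sessionGap.asFlinkTime))
  .allowedLateness(0.seconds.asFlinkTime)
  .aggregate(new SimpleAggregator).uid("aggregate-session")
  .addSink(sink).uid("sink")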
After 3 weeks of grueling debugging, profiling, checking the
serialization, and more, I couldn't solve the backpressure issue.
However, I had an idea and switched to Flink's ProcessWindowFunction, which
just collects all the events behind the scenes and gives them to me as an
iterable, where I can then do all my calculations.
Surprisingly, there's no backpressure. So even though the
ProcessWindowFunction actually accumulates more data, and also does
concatenations and appends, for some reason there's no backpressure.
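The ProcessWindowFunction version looks roughly like this (a sketch: the class
name and uid are made up, and the real calculations replace the placeholder
Session values):

import java.util.UUID

import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

class SessionWindowFunction
  extends ProcessWindowFunction[Event, Session, String, TimeWindow] {

  override def process(
      key: String,
      context: Context,
      elements: Iterable[Event],
      out: Collector[Session]): Unit = {
    // Flink has already collected the window's events; the lists are built
    // here once per window instead of being appended to on every add()/merge()
    val urls     = elements.map(_.getEnvUrl).toVector
    val visitIds = elements.map(_.getCtxVisitId).toVector
    val times    = elements.map(_.getTime.longValue()).toVector

    // urls, visitIds, and times feed the real session calculations;
    // the placeholder values below mirror the reduced aggregator
    out.collect(
      Session.newBuilder()
        .setSessionDuration(1000)
        .setSessionTotalEvents(1000)
        .setSId("-" + UUID.randomUUID().toString)
        .build()
    )
  }
}

// wired in with .process(new SessionWindowFunction).uid("process-session")
// instead of the aggregate() call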
To finish this long post: what I'm trying to understand is why there was
backpressure when I collected the events myself with an AggregateFunction,
but none when Flink does the collecting for me with a ProcessWindowFunction.
It seems to me something is fundamentally wrong here, since it would mean I
cannot do any non-reducing operations without creating backpressure. I don't
think these operations should cause the backpressure I experienced, so I'm
trying to understand what I did wrong here.
Thanks!