Hi all,

We're currently trying to upgrade our Camel from 3.20.2, spring boot 2.7.12
running on Java 11 to spring.boot.version=3.1.2 and camel.version=4.0.0
running on Java 17.

We are seeing weird behaviour in one of our applications where our SEDA
component just 'hangs' after a specific set of interactions involving
dynamically adding/removing routes via the dynamicRouter feature.

Some notes:
 - We use SEDA component (where things seem to hang)
 - flow is basically:
  initialize camel,
  process msg1 to create dynamic route and process message on it,
  process msg2 which deletes route,
  process msg3 which re-creates route and processes it,
  process msg4 which deletes route,
  process msg5 which re-creates route and HANGS at this point
 - Basically the 2nd time we delete the route and re-create again (which
has the same name as before) is when we see the 'hang' for processing
(super weird)
 - This only happens in version 4.x same code works fine in 3.x

We're really wondering what could've changed between version 3.x and 4.x
that causes this now to hang?

I've included some sudo code below with log snippets to help clarify

Our main RouteBuilder creates these two
    from("servlet:dialcommand")
      .routeId("dialCommandServletRoute")
      .convertBodyTo(String::class.java)
      .to("seda:dialcommandqueue?timeout=$commandTimeout")
      .removeHeader("args")


from("seda:dialcommandqueue?concurrentConsumers=$concurrentConsumerCount&timeout=$commandTimeout")
      .setExchangePattern(ExchangePattern.InOut)
      .routeId("dialcommandprocess")
      .dynamicRouter { it: Exchange -> dialDynamicRouting(it) }

Our dynamic router logic is like this:
fun dialDynamicRouting(exchange: Exchange): String? {

 // when we need to kill a route
   is KillRoute -> {
     context.routeController.stopRoute(routeId)
     context.removeRoute(routeId)

exchange.`in`.body = success("OK", exchange.`in`.getHeader("id",
String::class.java))
     return null

 // when we add a new route dynamically
    val dialCommandRoute = createDialCommandRoute(routeId, contextId,
contextKey, concurrency)
    // add to camel context
    context.addRoutes(dialCommandRoute)
    context.getRoute(routeId).properties["contextId"] = contextId
return dialCommandRoute.from

The "createDialCommandRoute" logic dynamic router created uses this type of
code
    from(from)
      .routeId(routeId)
      .process {
        val message = it.`in`
        val body = message.body
        if (body is DialCommand<*>) {
          body.id = message.getHeader("id", String::class.java) ?:
body.standardOptions.id ?: body.id
          body.configureLogContext()
        }
      }.id("[$routeId] prepare command object")
      .log(LoggingLevel.INFO, "[\${headers.id}] \${body.procName}")
      .process { it: Exchange -> it.`in`.body = run(it) }
  }

With DEBUG level logging enabled we can see where it hangs is basically
right between when it would normally
process items off that newly created route.

Log Sample 1: Example logs of a valid run where it creates the new route
and start processing it
2023-08-22  INFO 54168 --- [ialcommandqueue]
c.g.dm.automation.etl.route.Operator     : Route dialCommandRoute-6fddd8cf
does not exist. This command (RecoverUnfinishedOperations) will create the
route.
2023-08-22  INFO 54168 --- [ialcommandqueue]
c.g.dm.automation.etl.route.Operator     : Creating route:
dialCommandRoute-6fddd8cf
2023-08-22 DEBUG 54168 --- [ialcommandqueue]
o.a.c.impl.engine.AbstractCamelContext   :
seda://dialcommand:6fddd8cf?concurrentConsumers=8&timeout=0 converted to
endpoint: seda://dialcommand:6fddd8cf?concurrentConsumers=8&timeout=0 by
component: org.apache.camel.component.seda.SedaComponent@4a23350
2023-08-22 DEBUG 54168 --- [ialcommandqueue]
o.a.c.i.e.InternalRouteStartupManager    : Warming up route id:
dialCommandRoute-6fddd8cf having autoStartup=true
2023-08-22 DEBUG 54168 --- [ialcommandqueue]
o.a.c.i.e.InternalRouteStartupManager    : Route: dialCommandRoute-6fddd8cf
>>> Route[seda://dialcommand:6fddd8cf?concurrentConsumers=8&timeout=0 ->
null]
2023-08-22 DEBUG 54168 --- [ialcommandqueue]
o.a.c.i.e.InternalRouteStartupManager    : Starting consumer (order: 1004)
on route: dialCommandRoute-6fddd8cf
2023-08-22 DEBUG 54168 --- [ialcommandqueue]
o.a.c.i.e.BaseExecutorServiceManager     : Created new ThreadPool for
source:
Consumer[seda://dialcommand:6fddd8cf?concurrentConsumers=8&timeout=0] with
name: seda://dialcommand:6fddd8cf?concurrentConsumers=8&timeout=0. ->
org.apache.camel.util.concurrent.RejectableThreadPoolExecutor@10881933[Running,
pool size = 0, active threads = 0, queued tasks = 0, completed tasks =
0][seda://dialcommand:6fddd8cf?concurrentConsumers=8&timeout=0]
2023-08-22 DEBUG 54168 --- [ialcommandqueue]
o.a.c.i.e.InternalRouteStartupManager    : Route: dialCommandRoute-6fddd8cf
started and consuming from: seda://dialcommand:6fddd8cf

-- NOTE: this is where we see it now runs our route's process method on
different thread
2023-08-22  INFO 54168 --- [  e70c0565-main] dialCommandRoute-6fddd8cf
           : [e70c0565] dhRecoverUnfinishedOperations
2023-08-22  INFO 54168 --- [  e70c0565-main]
c.g.d.a.etl.cli.command.DialCommand      : Beginning
dhRecoverUnfinishedOperations
2023-08-22  INFO 54168 --- [  e70c0565-main]
c.g.d.a.etl.cli.command.DialCommand      : Completed
dhRecoverUnfinishedOperations
2023-08-22  INFO 54168 --- [ialcommandqueue] dialcommandprocess
          : [e70c0565] {"id":"e70c0565","status":"OK"}

Log Sample 2: Example logs snippet of where it creates the new route but
just hangs on us after creation is complete
2023-08-22  INFO 54168 --- [ialcommandqueue]
c.g.dm.automation.etl.route.Operator     : Route dialCommandRoute-6fddd8cf
does not exist. This command (RecoverUnfinishedOperations) will create the
route.
2023-08-22  INFO 54168 --- [ialcommandqueue]
c.g.dm.automation.etl.route.Operator     : Creating route:
dialCommandRoute-6fddd8cf
2023-08-22 DEBUG 54168 --- [ialcommandqueue]
o.a.c.impl.engine.AbstractCamelContext   :
seda://dialcommand:6fddd8cf?concurrentConsumers=8&timeout=0 converted to
endpoint: seda://dialcommand:6fddd8cf?concurrentConsumers=8&timeout=0 by
component: org.apache.camel.component.seda.SedaComponent@4a23350
2023-08-22 DEBUG 54168 --- [ialcommandqueue]
o.a.c.i.e.InternalRouteStartupManager    : Warming up route id:
dialCommandRoute-6fddd8cf having autoStartup=true
2023-08-22 DEBUG 54168 --- [ialcommandqueue]
o.a.c.i.e.InternalRouteStartupManager    : Route: dialCommandRoute-6fddd8cf
>>> Route[seda://dialcommand:6fddd8cf?concurrentConsumers=8&timeout=0 ->
null]
2023-08-22 DEBUG 54168 --- [ialcommandqueue]
o.a.c.i.e.InternalRouteStartupManager    : Starting consumer (order: 1005)
on route: dialCommandRoute-6fddd8cf
2023-08-22 DEBUG 54168 --- [ialcommandqueue]
o.a.c.i.e.BaseExecutorServiceManager     : Created new ThreadPool for
source:
Consumer[seda://dialcommand:6fddd8cf?concurrentConsumers=8&timeout=0] with
name: seda://dialcommand:6fddd8cf?concurrentConsumers=8&timeout=0. ->
org.apache.camel.util.concurrent.RejectableThreadPoolExecutor@5c41c3f8[Running,
pool size = 0, active threads = 0, queued tasks = 0, completed tasks =
0][seda://dialcommand:6fddd8cf?concurrentConsumers=8&timeout=0]
2023-08-22 DEBUG 54168 --- [ialcommandqueue]
o.a.c.i.e.InternalRouteStartupManager    : Route: dialCommandRoute-6fddd8cf
started and consuming from: seda://dialcommand:6fddd8cf
-- NOTE here is where we'd normally see the route thread be running like
above but we don't and it just hangs at this point!
 [  e70c0565-main] dialCommandRoute-6fddd8cf                : [e70c0565]
dhRecoverUnfinishedOperations (THIS LOG NEVER HAPPENS!!)

Additionally, we can also tell that our routes/endpoints are being
created/destroyed correctly if we log out context route/endpoint info
seda://dialcommandqueue?timeout=0,
seda://dialcommandqueue?concurrentConsumers=20&timeout=0,
seda://dialcommand:9591ba9e?concurrentConsumers=8&timeout=0 (NEW ENDPOINT
EXAMPLE)
servlet://dialcommand, direct://defaultCommand

Lastly, when it hangs we can see that the SEDA component thinks its up and
running
INFO 59169 --- [ialcommandqueue] c.g.dm.automation.etl.route.Operator     :
### SEDA info: currQueueSize: 0, endPointSize: 1000, queueSize: 0,
queue.remainCapacity: 1000, stoppingOrStopped: false, suspended: false,
isRunAllowed: true, isStarted: true, status: Started

We're totally stumped at this point as to what could be causing this issue?

Reply via email to