djanderson commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2684139203
Hey all, sorry for the noise on this thread lately. I've been trying to understand how to apply this example to an actual DataFusion-based server, without much luck.

I now have the dedicated executor from this PR fully integrated and working in my [executor starvation reproducer](https://github.com/djanderson/parquet-sink-dedicated-exec-repro); however, I'm not observing a significant improvement. Others have mentioned that a similar approach has worked for them in the past, which makes me question what I'm seeing, so I could really use another set of eyes on it.

I'm intentionally exercising the ingest path. It seems to exhibit the starvation issues more consistently, and the result of hitting the issue on ingest is potential data loss... definitely something a database should avoid 😬

Does it show what I think it shows, or _might_ it be some other phenomenon? If it shows 1) consistent executor starvation, 2) that this PR doesn't seem to improve it, and 3) that we therefore have no working community example of how to ingest data into a server-deployed DataFusion-based database, would we consider that a significant issue? Maybe even worth calling out as a [priority on the roadmap](https://github.com/apache/datafusion/issues/14580)?

As always, I'm willing to keep testing with the threadpool approach or other suggestions. LMK.
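For anyone following along who is new to the idea, the core of a "dedicated executor" is isolation: CPU-heavy work (such as encoding Parquet on ingest) is handed to a separate set of threads, so the threads that service network I/O are never blocked by it. The sketch below uses plain `std` threads and channels rather than an async runtime, so it illustrates only that isolation idea and is not the PR's implementation; the `DedicatedPool` type and its methods are hypothetical names chosen for this example.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// A boxed unit of CPU-bound work to run off the I/O threads.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for a dedicated executor: a fixed set of worker
/// threads that only run CPU-heavy jobs, so the caller's threads stay free.
pub struct DedicatedPool {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl DedicatedPool {
    pub fn new(num_threads: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        // All workers pull from one shared job queue.
        let receiver = Arc::new(Mutex::new(receiver));
        let workers: Vec<JoinHandle<()>> = (0..num_threads)
            .map(|_| {
                let rx = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock guard is dropped at the end of this statement,
                    // so it is held only while dequeuing, not while the job runs.
                    let next = rx.lock().unwrap().recv();
                    match next {
                        Ok(job) => job(),
                        // Channel closed: the pool is shutting down.
                        Err(_) => break,
                    }
                })
            })
            .collect();
        DedicatedPool { sender: Some(sender), workers }
    }

    /// Run `f` on the dedicated threads; the result arrives on the returned channel.
    pub fn spawn<T, F>(&self, f: F) -> Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // Ignore send errors: the caller may have dropped its receiver.
            let _ = tx.send(f());
        });
        self.sender
            .as_ref()
            .expect("pool is running")
            .send(job)
            .expect("worker threads are alive");
        rx
    }
}

impl Drop for DedicatedPool {
    fn drop(&mut self) {
        // Closing the channel makes each worker's recv() fail, ending its loop.
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}

fn main() {
    let pool = DedicatedPool::new(2);
    // CPU-heavy work goes to the dedicated threads...
    let sum = pool.spawn(|| (1..=1_000_000u64).sum::<u64>());
    // ...while this thread remains free for other (I/O-like) work.
    println!("sum = {}", sum.recv().unwrap());
}
```

Returning a channel from `spawn` keeps the caller decoupled from the workers; in an async server the equivalent would be awaiting a handle instead of blocking on `recv()`.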