In answer to this part of your question:

"*Understanding the Issue:* Are there known reasons within Spark that could explain this difference in behavior when loading dependencies via `--packages` versus placing JARs directly?"
`--jars` adds only the JAR files you list. `--packages` adds the named artifact and also its dependencies, resolved from its Maven coordinates.

*HTH*

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
London
United Kingdom

view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, quote "one test result is worth one-thousand expert opinions (Wernher von Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".

On Sat, 4 May 2024 at 12:24, Damien Hawes <marley.ha...@gmail.com> wrote:

> Hi folks,
>
> I'm contributing to the OpenLineage project, specifically the Apache Spark
> integration. My current focus is on extending the project to support data
> lineage extraction for Spark Streaming, beginning with Apache Kafka sources
> and sinks.
>
> I've encountered an obstacle when attempting to access information
> essential for lineage extraction from Apache Kafka-related classes within
> the OpenLineage Spark code base. Specifically, I need to access details
> like Kafka topic names and bootstrap servers from objects like
> StreamingDataSourceV2Relation.
>
> While I can successfully access these details if the Kafka JARs are placed
> directly in the 'spark/jars' directory, I'm unable to do so when using the
> `--packages` option for dependency management. This creates a significant
> obstacle for users who rely on `--packages` for their Spark applications.
>
> I've taken initial steps to investigate (viewable in this GitHub PR
> <https://github.com/OpenLineage/OpenLineage/pull/2647>, the class in
> question is *StreamingDataSourceV2RelationVisitor*), but I'd greatly
> appreciate any insights or guidance on the following:
>
> *1. Understanding the Issue:* Are there known reasons within Spark that
> could explain this difference in behavior when loading dependencies via
> `--packages` versus placing JARs directly?
> *2. Alternative Approaches:* Are there recommended techniques or
> patterns to access the necessary Kafka class information within a
> SparkListener extension, especially when dependencies are managed via
> `--packages`?
>
> I'm eager to find a solution that avoids heavy reliance on reflection.
>
> Thank you for your time and assistance!
>
> Kind regards,
> Damien
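
On the two questions in the quoted message, one possible explanation (an assumption on my part, not something confirmed in this thread) is class loader visibility: JARs in 'spark/jars' sit on the JVM's own classpath, whereas JARs pulled in with `--packages` may only be reachable through a separate class loader that Spark sets as the thread context class loader. A minimal Java sketch of a lookup that tries the context class loader first is below. The class name `ContextClassLookup` and the method `find` are hypothetical names for illustration, not part of Spark or OpenLineage, and the Kafka provider class name in `main` is used only as an example.

```java
import java.util.Optional;

// Hypothetical helper (not part of Spark or OpenLineage): looks a class up
// through the thread context class loader before falling back to the loader
// that defined this helper.
public final class ContextClassLookup {

    private ContextClassLookup() {}

    public static Optional<Class<?>> find(String className) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader own = ContextClassLookup.class.getClassLoader();

        for (ClassLoader loader : new ClassLoader[] {context, own}) {
            if (loader == null) {
                continue; // no loader available in this slot, try the next one
            }
            try {
                // 'false' means: locate the class but do not run its static initializers
                return Optional.of(Class.forName(className, false, loader));
            } catch (ClassNotFoundException | LinkageError e) {
                // not visible from this loader, try the next one
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Example class name only; whether it is visible depends on how the
        // Kafka connector was put on the classpath.
        String kafkaProvider = "org.apache.spark.sql.kafka010.KafkaSourceProvider";
        System.out.println(kafkaProvider + " visible: " + find(kafkaProvider).isPresent());
    }
}
```

This uses a single `Class.forName` call rather than reflective field or method access, so it only checks whether the class is visible. Whether it actually resolves the `--packages` case would need to be tested against a real `spark-submit` run.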