I like an artefact repo as the proper solution. The problem with environments that haven't yet fully embraced DevOps is that artefact repos are considered development tools and are often not yet used to promote packages to production, air-gapped if necessary.
-wim
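[Editor's note: --packages normally resolves against Maven Central, which a locked-down environment may block. Where an internal repository manager (such as the Nexus proxy mentioned later in this thread) is reachable, spark-submit can be pointed at it with --repositories so that Ivy never touches the public internet. A minimal sketch; the Nexus URL and application file are placeholders, and credentials/TLS setup will vary per site:]

```shell
# Resolve --packages against an internal repository manager instead of
# Maven Central. The repository URL below is a placeholder.
${SPARK_HOME}/bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  --repositories https://nexus.example.internal/repository/maven-public \
  --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 \
  my_app.py
```

[For fully air-gapped clusters, the spark.jars.ivySettings property can point Ivy at an ivysettings.xml that lists only internal resolvers.]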
On Wed, 21 Oct 2020 at 19:00, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi Wim,
>
> This is an issue DEV/OPS face all the time: no access to the internet from
> behind the company firewall. There is Nexus
> <https://www.sonatype.com/nexus/repository-pro> for this, which manages
> dependencies with typical load times in seconds. However, only authorised
> accounts can request it through a service account. I concur it is messy.
>
> cheers,
>
> On Wed, 21 Oct 2020 at 06:34, Wim Van Leuven <wim.vanleu...@highestpoint.biz> wrote:
>
>> Sean,
>>
>> The problem with --packages is that in enterprise settings security might
>> not allow the data environment to link to the internet, or even to the
>> internal proxying artefact repository.
>>
>> Also, wasn't an uberjar an antipattern? For some reason I don't like them...
>>
>> Kind regards
>> -wim
>>
>> On Wed, 21 Oct 2020 at 01:06, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Thanks again all.
>>>
>>> Anyway, as Nicola suggested, I used the trench-war approach to sort this
>>> out: just using jars, and working out their dependencies in the
>>> ~/.ivy2/jars directory using grep -lRi <missing> :)
>>>
>>> This now works with jars alone, after resolving the dependencies:
>>>
>>> ${SPARK_HOME}/bin/spark-submit \
>>>   --master yarn \
>>>   --deploy-mode client \
>>>   --conf spark.executor.memoryOverhead=3000 \
>>>   --class org.apache.spark.repl.Main \
>>>   --name "my own Spark shell on Yarn" "$@" \
>>>   --driver-class-path /home/hduser/jars/ddhybrid.jar \
>>>   --jars /home/hduser/jars/spark-bigquery-latest.jar,\
>>>     /home/hduser/jars/ddhybrid.jar,\
>>>     /home/hduser/jars/com.google.http-client_google-http-client-1.24.1.jar,\
>>>     /home/hduser/jars/com.google.http-client_google-http-client-jackson2-1.24.1.jar,\
>>>     /home/hduser/jars/com.google.cloud.bigdataoss_util-1.9.4.jar,\
>>>     /home/hduser/jars/com.google.api-client_google-api-client-1.24.1.jar,\
>>>     /home/hduser/jars/com.google.oauth-client_google-oauth-client-1.24.1.jar,\
>>>     /home/hduser/jars/com.google.apis_google-api-services-bigquery-v2-rev398-1.24.1.jar,\
>>>     /home/hduser/jars/com.google.cloud.bigdataoss_bigquery-connector-0.13.4-hadoop2.jar,\
>>>     /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar
>>>
>>> Compared to using the package itself as before:
>>>
>>> ${SPARK_HOME}/bin/spark-submit \
>>>   --master yarn \
>>>   --deploy-mode client \
>>>   --conf spark.executor.memoryOverhead=3000 \
>>>   --class org.apache.spark.repl.Main \
>>>   --name "my own Spark shell on Yarn" "$@" \
>>>   --driver-class-path /home/hduser/jars/ddhybrid.jar \
>>>   --jars /home/hduser/jars/spark-bigquery-latest.jar,\
>>>     /home/hduser/jars/ddhybrid.jar \
>>>   --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>>>
>>> I
>>> think, as Sean suggested, this approach may or may not work (it is a
>>> manual process), and if the jars change the whole thing has to be
>>> re-evaluated, adding to the complexity.
>>>
>>> Cheers
>>>
>>> On Tue, 20 Oct 2020 at 23:01, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> Rather, let --packages (via Ivy) worry about them, because they tell
>>>> Ivy what they need.
>>>> There's no 100% guarantee that conflicting dependencies are resolved in
>>>> a way that works in every single case, which you run into sometimes when
>>>> using incompatible libraries, but yes, this is the point of --packages
>>>> and Ivy.
>>>>
>>>> On Tue, Oct 20, 2020 at 4:43 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Thanks again all.
>>>>>
>>>>> Hi Sean,
>>>>>
>>>>> As I understood from your statement, you are suggesting just using
>>>>> --packages without worrying about individual jar dependencies?
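[Editor's note: the grep -lRi trick mentioned earlier in the thread works because compiled class paths are stored as plain strings inside jar entries, so a recursive, filename-only grep over the Ivy cache reveals which jar supplies a missing class. A sketch, using a scratch directory standing in for ~/.ivy2/jars, with a fake jar and an example class name rather than a real cache:]

```shell
# Simulate an Ivy cache directory with one jar-like file that contains
# a class path string, then locate which "jar" provides the class.
mkdir -p /tmp/ivy-demo/jars
printf 'com/google/api/client/http/HttpTransport' \
    > /tmp/ivy-demo/jars/google-http-client-1.24.1.jar

# -l: print only matching file names; -R: recurse; -i: case-insensitive.
# Against a real cache this would be: cd ~/.ivy2/jars && grep -lRi <class> .
grep -lRi 'com/google/api/client/http/HttpTransport' /tmp/ivy-demo/jars
# prints: /tmp/ivy-demo/jars/google-http-client-1.24.1.jar
```

[On real jars, grep still prints just the file name with -l even though the file is binary, which is what makes this a quick way to map a NoClassDefFoundError back to a jar.]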