I like an artefact repo as the proper solution. The problem with environments that haven't yet fully embraced DevOps is that artefact repos are considered development tools and are often not yet used to promote packages to production, air-gapped if necessary.
-wim
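[Editor's note: --packages normally resolves against Maven Central, which a locked-down environment may block. Where an internal repository manager (such as the Nexus proxy mentioned later in this thread) is reachable, spark-submit can be pointed at it with --repositories so that Ivy never touches the public internet. A minimal sketch; the Nexus URL and application file are placeholders, and credentials/TLS setup will vary per site:]

```shell
# Resolve --packages against an internal repository manager instead of
# Maven Central. The repository URL below is a placeholder.
${SPARK_HOME}/bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  --repositories https://nexus.example.internal/repository/maven-public \
  --packages com.github.samelamin:spark-bigquery_2.11:0.2.6 \
  my_app.py
```

[For fully air-gapped clusters, the spark.jars.ivySettings property can point Ivy at an ivysettings.xml that lists only internal resolvers.]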
On Wed, 21 Oct 2020 at 19:00, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi Wim,
>
> This is an issue DEV/OPS face all the time: no access to the internet from
> behind the company firewall. There is Nexus
> <https://www.sonatype.com/nexus/repository-pro> for this, which manages
> dependencies with typical load times in seconds. However, only authorised
> accounts can request it through a service account. I concur it is messy.
>
> cheers,
>
> On Wed, 21 Oct 2020 at 06:34, Wim Van Leuven <wim.vanleu...@highestpoint.biz> wrote:
>
>> Sean,
>>
>> The problem with --packages is that in enterprise settings security might
>> not allow the data environment to link to the internet, or even to the
>> internal proxying artefact repository.
>>
>> Also, wasn't an uberjar an antipattern? For some reason I don't like them...
>>
>> Kind regards
>> -wim
>>
>> On Wed, 21 Oct 2020 at 01:06, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Thanks again all.
>>>
>>> Anyway, as Nicola suggested, I used the trench-war approach to sort this
>>> out: just using jars, and working out their dependencies in the
>>> ~/.ivy2/jars directory using grep -lRi <missing> :)
>>>
>>> This now works with jars alone, after resolving the dependencies:
>>>
>>> ${SPARK_HOME}/bin/spark-submit \
>>>   --master yarn \
>>>   --deploy-mode client \
>>>   --conf spark.executor.memoryOverhead=3000 \
>>>   --class org.apache.spark.repl.Main \
>>>   --name "my own Spark shell on Yarn" "$@" \
>>>   --driver-class-path /home/hduser/jars/ddhybrid.jar \
>>>   --jars /home/hduser/jars/spark-bigquery-latest.jar,\
>>>     /home/hduser/jars/ddhybrid.jar,\
>>>     /home/hduser/jars/com.google.http-client_google-http-client-1.24.1.jar,\
>>>     /home/hduser/jars/com.google.http-client_google-http-client-jackson2-1.24.1.jar,\
>>>     /home/hduser/jars/com.google.cloud.bigdataoss_util-1.9.4.jar,\
>>>     /home/hduser/jars/com.google.api-client_google-api-client-1.24.1.jar,\
>>>     /home/hduser/jars/com.google.oauth-client_google-oauth-client-1.24.1.jar,\
>>>     /home/hduser/jars/com.google.apis_google-api-services-bigquery-v2-rev398-1.24.1.jar,\
>>>     /home/hduser/jars/com.google.cloud.bigdataoss_bigquery-connector-0.13.4-hadoop2.jar,\
>>>     /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar
>>>
>>> Compared to using the package itself as before:
>>>
>>> ${SPARK_HOME}/bin/spark-submit \
>>>   --master yarn \
>>>   --deploy-mode client \
>>>   --conf spark.executor.memoryOverhead=3000 \
>>>   --class org.apache.spark.repl.Main \
>>>   --name "my own Spark shell on Yarn" "$@" \
>>>   --driver-class-path /home/hduser/jars/ddhybrid.jar \
>>>   --jars /home/hduser/jars/spark-bigquery-latest.jar,\
>>>     /home/hduser/jars/ddhybrid.jar \
>>>   --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>>>
>>> I
>>> think, as Sean suggested, this approach may or may not work (it is a
>>> manual process), and if the jars change the whole thing has to be
>>> re-evaluated, adding to the complexity.
>>>
>>> Cheers
>>>
>>> On Tue, 20 Oct 2020 at 23:01, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> Rather, let --packages (via Ivy) worry about them, because they tell
>>>> Ivy what they need.
>>>> There's no 100% guarantee that conflicting dependencies are resolved in
>>>> a way that works in every single case, which you run into sometimes when
>>>> using incompatible libraries, but yes, this is the point of --packages
>>>> and Ivy.
>>>>
>>>> On Tue, Oct 20, 2020 at 4:43 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Thanks again all.
>>>>>
>>>>> Hi Sean,
>>>>>
>>>>> As I understood from your statement, you are suggesting just using
>>>>> --packages without worrying about individual jar dependencies?
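[Editor's note: the grep -lRi trick mentioned earlier in the thread works because compiled class paths are stored as plain strings inside jar entries, so a recursive, filename-only grep over the Ivy cache reveals which jar supplies a missing class. A sketch, using a scratch directory standing in for ~/.ivy2/jars, with a fake jar and an example class name rather than a real cache:]

```shell
# Simulate an Ivy cache directory with one jar-like file that contains
# a class path string, then locate which "jar" provides the class.
mkdir -p /tmp/ivy-demo/jars
printf 'com/google/api/client/http/HttpTransport' \
    > /tmp/ivy-demo/jars/google-http-client-1.24.1.jar

# -l: print only matching file names; -R: recurse; -i: case-insensitive.
# Against a real cache this would be: cd ~/.ivy2/jars && grep -lRi <class> .
grep -lRi 'com/google/api/client/http/HttpTransport' /tmp/ivy-demo/jars
# prints: /tmp/ivy-demo/jars/google-http-client-1.24.1.jar
```

[On real jars, grep still prints just the file name with -l even though the file is binary, which is what makes this a quick way to map a NoClassDefFoundError back to a jar.]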