Awesome, thanks Jeff.

On Sun, Jul 25, 2021 at 11:24 AM Jeff Zhang <zjf...@gmail.com> wrote:
> Hi Lior,
>
> It would be fixed in https://github.com/apache/zeppelin/pull/4127
>
> Lior Chaga <lio...@taboola.com> wrote on Sunday, Jul 25, 2021 at 3:58 PM:
>
>> After a couple of attempts at code fixes, where every time I seemed to
>> make things work only to find out that the next step in the process
>> breaks, I've found the simplest solution - put the extraJavaOptions in
>> spark-defaults.conf (instead of keeping them in the interpreter settings).
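A minimal sketch of what that workaround looks like in spark-defaults.conf (the property names are the ones discussed in this thread; the flag values below are placeholders, not the real ones). Each line in that file is a key followed by whitespace, with the rest of the line taken as the value, so multi-flag options need no quoting at all:

    spark.driver.extraJavaOptions    -DmyParam=1 -DmyOtherParam=2
    spark.executor.extraJavaOptions  -DSERVICENAME=Zeppelin -Dhttps.proxyHost=proxy.service.consul -Dhttps.proxyPort=3128

spark-submit reads this file from SPARK_CONF_DIR on its own, so the options no longer pass through the interpreter's command-line assembly at all.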
>>
>> On Sun, Jul 11, 2021 at 1:30 PM Lior Chaga <lio...@taboola.com> wrote:
>>
>>> Thanks Jeff,
>>> So I should escape the whitespace? Is there a ticket for it? I couldn't
>>> find one.
>>>
>>> On Sun, Jul 11, 2021 at 1:10 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> I believe this is because SparkInterpreterLauncher doesn't support
>>>> parameters containing whitespace (it uses whitespace as the delimiter
>>>> to separate parameters). This is a known issue.
>>>>
>>>> Lior Chaga <lio...@taboola.com> wrote on Sunday, Jul 11, 2021 at 4:14 PM:
>>>>
>>>>> So after adding the quotes in both SparkInterpreterLauncher and
>>>>> interpreter.sh, the interpreter is still failing with the same
>>>>> "Unrecognized option" error.
>>>>> But the weird thing is that if I copy the command supposedly executed
>>>>> by zeppelin (as it is printed to the log) and run it directly in a
>>>>> shell, the interpreter process runs properly. So my guess is that the
>>>>> forked process command that is created is not really identical to the
>>>>> one that is logged.
>>>>>
>>>>> This is how my cmd looks (censored a bit):
>>>>>
>>>>> /usr/local/spark/bin/spark-submit
>>>>> --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer
>>>>> --driver-class-path :/zeppelin/local-repo/spark/*:/zeppelin/interpreter/spark/*:::/zeppelin/interpreter/zeppelin-interpreter-shaded-0.10.0-SNAPSHOT.jar:/zeppelin/interpreter/spark/spark-interpreter-0.10.0-SNAPSHOT.jar:/etc/hadoop/conf
>>>>> --driver-java-options " -DSERVICENAME=zeppelin_docker -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///zeppelin/conf/log4j.properties -Dlog4j.configurationFile=file:///zeppelin/conf/log4j2.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-interpreter-spark-shared_process--zeppelin-test-spark3-7d74d5df4-2g8x5.log"
>>>>> --conf spark.driver.host=10.135.120.245
>>>>> --conf "spark.dynamicAllocation.minExecutors=1"
>>>>> --conf "spark.shuffle.service.enabled=true"
>>>>> --conf "spark.sql.parquet.int96AsTimestamp=true"
>>>>> --conf "spark.ui.retainedTasks=10000"
>>>>> --conf "spark.executor.heartbeatInterval=600s"
>>>>> --conf "spark.ui.retainedJobs=100"
>>>>> --conf "spark.sql.ui.retainedExecutions=10"
>>>>> --conf "spark.hadoop.cloneConf=true"
>>>>> --conf "spark.debug.maxToStringFields=200000"
>>>>> --conf "spark.executor.memory=70g"
>>>>> --conf "spark.executor.extraClassPath=../mysql-connector-java-8.0.18.jar:../guava-19.0.jar"
>>>>> --conf "spark.hadoop.fs.permissions.umask-mode=000"
>>>>> --conf "spark.memory.storageFraction=0.1"
>>>>> --conf "spark.scheduler.mode=FAIR"
>>>>> --conf "spark.sql.adaptive.enabled=true"
>>>>> --conf "spark.master=mesos://zk://zk003:2181,zk004:2181,zk006:2181,/mesos-zeppelin"
>>>>> --conf "spark.driver.memory=15g"
>>>>> --conf "spark.io.compression.codec=lz4"
>>>>> --conf "spark.executor.uri=https://artifactory.company.com/artifactory/static/spark/spark-dist/spark-3.1.2.2-hadoop-2.7-zulu"
>>>>> --conf "spark.ui.retainedStages=500"
>>>>> --conf "spark.mesos.uris=https://artifactory.company.com/artifactory/static/spark/spark-executor/jars/mysql-connector-java-8.0.18.jar,https://artifactory.company.com/artifactory/static/spark/spark-executor/jars/guava-19.0.jar"
>>>>> --conf "spark.driver.maxResultSize=8g"
>>>>> --conf "spark.executor.extraJavaOptions=-DSERVICENAME=Zeppelin -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=2015 -XX:-OmitStackTraceInFastThrow -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=55745 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -verbose:gc -Dlog4j.configurationFile=/etc/config/log4j2-executor-config.xml -XX:+UseG1GC -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:+PrintStringDeduplicationStatistics -XX:+UseStringDeduplication -XX:InitiatingHeapOccupancyPercent=35 -Dhttps.proxyHost=proxy.service.consul -Dhttps.proxyPort=3128"
>>>>> --conf "spark.dynamicAllocation.enabled=true"
>>>>> --conf "spark.default.parallelism=1200"
>>>>> --conf "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2"
>>>>> --conf "spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS"
>>>>> --conf "spark.app.name=zeppelin_docker_spark3"
>>>>> --conf "spark.shuffle.service.port=7337"
>>>>> --conf "spark.memory.fraction=0.75"
>>>>> --conf "spark.mesos.coarse=true"
>>>>> --conf "spark.ui.port=4041"
>>>>> --conf "spark.dynamicAllocation.executorIdleTimeout=60s"
>>>>> --conf "spark.sql.shuffle.partitions=1200"
>>>>> --conf "spark.sql.parquet.outputTimestampType=TIMESTAMP_MILLIS"
>>>>> --conf "spark.dynamicAllocation.cachedExecutorIdleTimeout=120s"
>>>>> --conf "spark.network.timeout=1200s"
>>>>> --conf "spark.cores.max=600"
>>>>> --conf "spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"
>>>>> --conf "spark.worker.timeout=150000"
>>>>> --conf "spark.driver.extraJavaOptions=-Dhttps.proxyHost=proxy.service.consul -Dhttps.proxyPort=3128 -Dlog4j.configuration=file:/usr/local/spark/conf/log4j.properties -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver -Djavax.jdo.option.ConnectionPassword=2eebb22277 -Djavax.jdo.option.ConnectionURL=jdbc:mysql://proxysql-backend.service.consul.company.com:6033/hms?useSSL=false&databaseTerm=SCHEMA&nullDatabaseMeansCurrent=true -Djavax.jdo.option.ConnectionUserName=hms_rw"
>>>>> --conf "spark.files.overwrite=true"
>>>>> /zeppelin/interpreter/spark/spark-interpreter-0.10.0-SNAPSHOT.jar
>>>>> 10.135.120.245
>>>>> 36419
>>>>> spark-shared_process :
>>>>>
>>>>> Error: Unrecognized option: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=2015
>>>>>
>>>>> Will continue tackling it...
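One way to see that difference outside of Zeppelin is a small shell sketch (illustrative only; it reuses the toy -DmyParam values from the first message in this thread rather than the real command, and printargs is just a stand-in for spark-submit):

    # stand-in for spark-submit that prints each argument it receives on its own line
    printargs() { printf '%s\n' "$@"; }

    LOGGED='--conf "spark.driver.extraJavaOptions=-DmyParam=1 -DmyOtherParam=2"'

    # Pasting the logged command into an interactive shell: the shell interprets
    # the quotes, so the whole key=value pair arrives as a single argument.
    eval printargs $LOGGED
    #   --conf
    #   spark.driver.extraJavaOptions=-DmyParam=1 -DmyOtherParam=2

    # Splitting the same string on whitespace (what a launcher that tokenizes the
    # command string effectively does): the quotes are just literal characters and
    # the second flag becomes a separate argument.
    printargs $LOGGED
    #   --conf
    #   "spark.driver.extraJavaOptions=-DmyParam=1
    #   -DmyOtherParam=2"

That matches the observation above: the logged string works when a shell parses it, but falls apart when the launcher splits it on spaces.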
>>>>>
>>>>> On Thu, Jul 8, 2021 at 4:49 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Lior for the investigation.
>>>>>>
>>>>>> Lior Chaga <lio...@taboola.com> wrote on Thursday, Jul 8, 2021 at 8:31 PM:
>>>>>>
>>>>>>> Ok, I think I found the issue. It's not only that the quotation marks
>>>>>>> are missing from the --conf param; they are also missing from
>>>>>>> --driver-java-options, which is concatenated to the
>>>>>>> INTERPRETER_RUN_COMMAND in interpreter.sh.
>>>>>>>
>>>>>>> I will fix it in my build, but I would like confirmation that this is
>>>>>>> indeed the issue (and that I'm not missing anything), so that I can
>>>>>>> open a pull request.
>>>>>>>
>>>>>>> On Thu, Jul 8, 2021 at 3:05 PM Lior Chaga <lio...@taboola.com> wrote:
>>>>>>>
>>>>>>>> I'm trying to run zeppelin using the local spark interpreter.
>>>>>>>> Basically everything works, but if I try to set
>>>>>>>> `spark.driver.extraJavaOptions` or `spark.executor.extraJavaOptions`
>>>>>>>> containing several arguments, I get an exception.
>>>>>>>> For instance, for providing `-DmyParam=1 -DmyOtherParam=2`, I'd get:
>>>>>>>> Error: Unrecognized option: -DmyOtherParam=2
>>>>>>>>
>>>>>>>> I noticed that the spark-submit looks as follows:
>>>>>>>>
>>>>>>>> spark-submit --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer --driver-class-path .... --conf spark.driver.extraJavaOptions=-DmyParam=1 -DmyOtherParam=2
>>>>>>>>
>>>>>>>> So I tried to patch SparkInterpreterLauncher to add quotation marks
>>>>>>>> (like in the example from the spark documentation -
>>>>>>>> https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties).
>>>>>>>> I see that the quotation marks were added:
>>>>>>>>
>>>>>>>> --conf "spark.driver.extraJavaOptions=-DmyParam=1 -DmyOtherParam=2"
>>>>>>>>
>>>>>>>> But I still get the same error.
>>>>>>>> Any idea how I can make it work?
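For reference, the quoting example on the Spark documentation page linked above looks roughly like this; the quotes work there because an interactive shell strips them before spark-submit sees the argument, which is exactly what does not happen when the launcher assembles the command by splitting a string on whitespace:

    ./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false \
      --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar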