Awesome, thanks Jeff.

On Sun, Jul 25, 2021 at 11:24 AM Jeff Zhang <zjf...@gmail.com> wrote:
> Hi Lior,
>
> It would be fixed in https://github.com/apache/zeppelin/pull/4127
>
> Lior Chaga <lio...@taboola.com> wrote on Sunday, Jul 25, 2021 at 3:58 PM:
>
>> After a couple of attempts at code fixes, where every time I seemed to
>> make things work only to find out that the next step in the process
>> breaks, I've found the simplest solution - put the extraJavaOptions in
>> spark-defaults.conf (instead of keeping them in the interpreter settings).
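A minimal sketch of what that workaround looks like in spark-defaults.conf (the property names are the ones discussed in this thread; the flag values below are placeholders, not the real ones). Each line in that file is a key followed by whitespace, with the rest of the line taken as the value, so multi-flag options need no quoting at all:

    spark.driver.extraJavaOptions    -DmyParam=1 -DmyOtherParam=2
    spark.executor.extraJavaOptions  -DSERVICENAME=Zeppelin -Dhttps.proxyHost=proxy.service.consul -Dhttps.proxyPort=3128

spark-submit reads this file from SPARK_CONF_DIR on its own, so the options no longer pass through the interpreter's command-line assembly at all.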
>>
>> On Sun, Jul 11, 2021 at 1:30 PM Lior Chaga <lio...@taboola.com> wrote:
>>
>>> Thanks Jeff,
>>> So I should escape the whitespace? Is there a ticket for it? I couldn't
>>> find one.
>>>
>>> On Sun, Jul 11, 2021 at 1:10 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> I believe this is because SparkInterpreterLauncher doesn't support
>>>> parameters containing whitespace (it uses whitespace as the delimiter
>>>> to separate parameters). This is a known issue.
>>>>
>>>> Lior Chaga <lio...@taboola.com> wrote on Sunday, Jul 11, 2021 at 4:14 PM:
>>>>
>>>>> So after adding the quotes in both SparkInterpreterLauncher and
>>>>> interpreter.sh, the interpreter is still failing with the same
>>>>> "Unrecognized option" error.
>>>>> But the weird thing is that if I copy the command supposedly executed
>>>>> by zeppelin (as it is printed to the log) and run it directly in a
>>>>> shell, the interpreter process runs properly. So my guess is that the
>>>>> forked process command that is created is not really identical to the
>>>>> one that is logged.
>>>>>
>>>>> This is how my cmd looks (censored a bit):
>>>>>
>>>>> /usr/local/spark/bin/spark-submit
>>>>> --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer
>>>>> --driver-class-path :/zeppelin/local-repo/spark/*:/zeppelin/interpreter/spark/*:::/zeppelin/interpreter/zeppelin-interpreter-shaded-0.10.0-SNAPSHOT.jar:/zeppelin/interpreter/spark/spark-interpreter-0.10.0-SNAPSHOT.jar:/etc/hadoop/conf
>>>>> --driver-java-options " -DSERVICENAME=zeppelin_docker -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///zeppelin/conf/log4j.properties -Dlog4j.configurationFile=file:///zeppelin/conf/log4j2.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-interpreter-spark-shared_process--zeppelin-test-spark3-7d74d5df4-2g8x5.log"
>>>>> --conf spark.driver.host=10.135.120.245
>>>>> --conf "spark.dynamicAllocation.minExecutors=1"
>>>>> --conf "spark.shuffle.service.enabled=true"
>>>>> --conf "spark.sql.parquet.int96AsTimestamp=true"
>>>>> --conf "spark.ui.retainedTasks=10000"
>>>>> --conf "spark.executor.heartbeatInterval=600s"
>>>>> --conf "spark.ui.retainedJobs=100"
>>>>> --conf "spark.sql.ui.retainedExecutions=10"
>>>>> --conf "spark.hadoop.cloneConf=true"
>>>>> --conf "spark.debug.maxToStringFields=200000"
>>>>> --conf "spark.executor.memory=70g"
>>>>> --conf "spark.executor.extraClassPath=../mysql-connector-java-8.0.18.jar:../guava-19.0.jar"
>>>>> --conf "spark.hadoop.fs.permissions.umask-mode=000"
>>>>> --conf "spark.memory.storageFraction=0.1"
>>>>> --conf "spark.scheduler.mode=FAIR"
>>>>> --conf "spark.sql.adaptive.enabled=true"
>>>>> --conf "spark.master=mesos://zk://zk003:2181,zk004:2181,zk006:2181,/mesos-zeppelin"
>>>>> --conf "spark.driver.memory=15g"
>>>>> --conf "spark.io.compression.codec=lz4"
>>>>> --conf "spark.executor.uri=https://artifactory.company.com/artifactory/static/spark/spark-dist/spark-3.1.2.2-hadoop-2.7-zulu"
>>>>> --conf "spark.ui.retainedStages=500"
>>>>> --conf "spark.mesos.uris=https://artifactory.company.com/artifactory/static/spark/spark-executor/jars/mysql-connector-java-8.0.18.jar,https://artifactory.company.com/artifactory/static/spark/spark-executor/jars/guava-19.0.jar"
>>>>> --conf "spark.driver.maxResultSize=8g"
>>>>> --conf "spark.executor.extraJavaOptions=-DSERVICENAME=Zeppelin -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=2015 -XX:-OmitStackTraceInFastThrow -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=55745 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -verbose:gc -Dlog4j.configurationFile=/etc/config/log4j2-executor-config.xml -XX:+UseG1GC -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:+PrintStringDeduplicationStatistics -XX:+UseStringDeduplication -XX:InitiatingHeapOccupancyPercent=35 -Dhttps.proxyHost=proxy.service.consul -Dhttps.proxyPort=3128"
>>>>> --conf "spark.dynamicAllocation.enabled=true"
>>>>> --conf "spark.default.parallelism=1200"
>>>>> --conf "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2"
>>>>> --conf "spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS"
>>>>> --conf "spark.app.name=zeppelin_docker_spark3"
>>>>> --conf "spark.shuffle.service.port=7337"
>>>>> --conf "spark.memory.fraction=0.75"
>>>>> --conf "spark.mesos.coarse=true"
>>>>> --conf "spark.ui.port=4041"
>>>>> --conf "spark.dynamicAllocation.executorIdleTimeout=60s"
>>>>> --conf "spark.sql.shuffle.partitions=1200"
>>>>> --conf "spark.sql.parquet.outputTimestampType=TIMESTAMP_MILLIS"
>>>>> --conf "spark.dynamicAllocation.cachedExecutorIdleTimeout=120s"
>>>>> --conf "spark.network.timeout=1200s"
>>>>> --conf "spark.cores.max=600"
>>>>> --conf "spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"
>>>>> --conf "spark.worker.timeout=150000"
>>>>> --conf "spark.driver.extraJavaOptions=-Dhttps.proxyHost=proxy.service.consul -Dhttps.proxyPort=3128 -Dlog4j.configuration=file:/usr/local/spark/conf/log4j.properties -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver -Djavax.jdo.option.ConnectionPassword=2eebb22277 -Djavax.jdo.option.ConnectionURL=jdbc:mysql://proxysql-backend.service.consul.company.com:6033/hms?useSSL=false&databaseTerm=SCHEMA&nullDatabaseMeansCurrent=true -Djavax.jdo.option.ConnectionUserName=hms_rw"
>>>>> --conf "spark.files.overwrite=true"
>>>>> /zeppelin/interpreter/spark/spark-interpreter-0.10.0-SNAPSHOT.jar
>>>>> 10.135.120.245
>>>>> 36419
>>>>> spark-shared_process :
>>>>>
>>>>> Error: Unrecognized option: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=2015
>>>>>
>>>>> Will continue tackling it...
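One way to see that difference outside of Zeppelin is a small shell sketch (illustrative only; it reuses the toy -DmyParam values from the first message in this thread rather than the real command, and printargs is just a stand-in for spark-submit):

    # stand-in for spark-submit that prints each argument it receives on its own line
    printargs() { printf '%s\n' "$@"; }

    LOGGED='--conf "spark.driver.extraJavaOptions=-DmyParam=1 -DmyOtherParam=2"'

    # Pasting the logged command into an interactive shell: the shell interprets
    # the quotes, so the whole key=value pair arrives as a single argument.
    eval printargs $LOGGED
    #   --conf
    #   spark.driver.extraJavaOptions=-DmyParam=1 -DmyOtherParam=2

    # Splitting the same string on whitespace (what a launcher that tokenizes the
    # command string effectively does): the quotes are just literal characters and
    # the second flag becomes a separate argument.
    printargs $LOGGED
    #   --conf
    #   "spark.driver.extraJavaOptions=-DmyParam=1
    #   -DmyOtherParam=2"

That matches the observation above: the logged string works when a shell parses it, but falls apart when the launcher splits it on spaces.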
>>>>>
>>>>> On Thu, Jul 8, 2021 at 4:49 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Lior for the investigation.
>>>>>>
>>>>>> Lior Chaga <lio...@taboola.com> wrote on Thursday, Jul 8, 2021 at 8:31 PM:
>>>>>>
>>>>>>> Ok, I think I found the issue. It's not only that the quotation marks
>>>>>>> are missing from the --conf param; they are also missing from
>>>>>>> --driver-java-options, which is concatenated to the
>>>>>>> INTERPRETER_RUN_COMMAND in interpreter.sh.
>>>>>>>
>>>>>>> I will fix it in my build, but I would like confirmation that this is
>>>>>>> indeed the issue (and that I'm not missing anything), so that I can
>>>>>>> open a pull request.
>>>>>>>
>>>>>>> On Thu, Jul 8, 2021 at 3:05 PM Lior Chaga <lio...@taboola.com> wrote:
>>>>>>>
>>>>>>>> I'm trying to run zeppelin using the local spark interpreter.
>>>>>>>> Basically everything works, but if I try to set
>>>>>>>> `spark.driver.extraJavaOptions` or `spark.executor.extraJavaOptions`
>>>>>>>> containing several arguments, I get an exception.
>>>>>>>> For instance, for providing `-DmyParam=1 -DmyOtherParam=2`, I'd get:
>>>>>>>> Error: Unrecognized option: -DmyOtherParam=2
>>>>>>>>
>>>>>>>> I noticed that the spark-submit looks as follows:
>>>>>>>>
>>>>>>>> spark-submit --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer --driver-class-path .... --conf spark.driver.extraJavaOptions=-DmyParam=1 -DmyOtherParam=2
>>>>>>>>
>>>>>>>> So I tried to patch SparkInterpreterLauncher to add quotation marks
>>>>>>>> (like in the example from the spark documentation -
>>>>>>>> https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties).
>>>>>>>> I see that the quotation marks were added:
>>>>>>>>
>>>>>>>> --conf "spark.driver.extraJavaOptions=-DmyParam=1 -DmyOtherParam=2"
>>>>>>>>
>>>>>>>> But I still get the same error.
>>>>>>>> Any idea how I can make it work?
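For reference, the quoting example on the Spark documentation page linked above looks roughly like this; the quotes work there because an interactive shell strips them before spark-submit sees the argument, which is exactly what does not happen when the launcher assembles the command by splitting a string on whitespace:

    ./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false \
      --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar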