Package option gets outdated jar when running with "latest"

2018-12-28 Thread Alessandro Liparoti
Hi everyone, I am encountering an annoying issue when running Spark with an external jar dependency downloaded from Maven. This is how we run it: spark-shell --repositories … --packages … When we release a new version with a big change in the API, things start to randomly break for some user…
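
One common workaround is to pin an exact release rather than resolving "latest", so a stale Ivy cache cannot hand back an outdated jar. A minimal sketch via the spark.jars.packages config; the repository URL and coordinates below are hypothetical:

    # Pinning an exact version makes resolution deterministic; a "latest"
    # marker can be answered from a stale local Ivy cache.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.jars.repositories", "https://repo.example.com/releases")  # hypothetical repo
        .config("spark.jars.packages", "com.example:mylib_2.11:1.2.0")  # exact version, not "latest"
        .getOrCreate()
    )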

Add packages for ipython notebook

2018-12-28 Thread Haibo Yan
Dear Spark devs, I am trying to run an IPython notebook with Kafka structured streaming support. I couldn't find a way to load the Kafka package by adding "--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0" to PYSPARK_DRIVER_PYTHON_OPTS, even after I changed my local pyspark script to "exec "${SPARK_H…
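
One approach that works in practice is to set PYSPARK_SUBMIT_ARGS before the JVM is launched, instead of PYSPARK_DRIVER_PYTHON_OPTS (which only configures the Python front end). A minimal sketch for the first notebook cell, using the package coordinate from the post:

    # PYSPARK_SUBMIT_ARGS is read when the gateway JVM starts, so this must run
    # before any SparkSession is created, and must end with "pyspark-shell".
    import os

    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        "--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 pyspark-shell"
    )

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-notebook").getOrCreate()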

Re: Add packages for ipython notebook

2018-12-28 Thread Haibo Yan
Thank you for replying, Sean. The error is as follows: Py4JJavaError: An error occurred while calling o49.load. : org.apache.spark.sql.AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".;…
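
For context, this exception comes from DataStreamReader.load when the Kafka source is not on the classpath. A minimal reader that reproduces it; the broker address and topic name are hypothetical:

    # Fails at load() with "Failed to find data source: kafka" unless the
    # spark-sql-kafka package was available when the JVM started.
    df = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
        .option("subscribe", "events")                        # hypothetical topic
        .load()
    )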

RE: barrier execution mode with DataFrame and dynamic allocation

2018-12-28 Thread Ilya Matiach
Hi Xiangrui, Thank you for the quick reply and the great questions. “How does mmlspark handle dynamic allocation? Do you have a watch thread on the driver to restart the job if there are more workers? And when the number of workers decreases, can training continue without driver involvement?” Curren…
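
For readers following along, a minimal sketch of the barrier execution API under discussion (Spark 2.4+), assuming an existing SparkContext sc; the per-partition body is a placeholder for real training work:

    # In a barrier stage all tasks are scheduled together and can synchronize,
    # which is why dynamic allocation interacts awkwardly with it.
    from pyspark import BarrierTaskContext

    def train_partition(rows):
        ctx = BarrierTaskContext.get()
        ctx.barrier()  # every task in the stage waits here before proceeding
        yield sum(1 for _ in rows)  # placeholder for one worker's training step

    result = sc.parallelize(range(8), 4).barrier().mapPartitions(train_partition).collect()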

DataSourceV2 implementation for JDBC sources

2018-12-28 Thread Thomas D'Silva
Hi, I am trying to use the DataSourceV2 API to implement a Spark connector for Apache Phoenix. I am not using JDBCRelation because I want to optimize how partitions are created during reads and to provide support for more complicated filter pushdown. For reading I am using JdbcUtils.resultSetToSpark…
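
For comparison, the built-in JDBC source only exposes column-range or predicate-based partitioning, which is the mechanism a custom connector would improve on. A sketch of the predicate form in pyspark; the Phoenix URL, table, and predicates are hypothetical:

    # Each predicate becomes one partition, i.e. one JDBC query against the table.
    df = spark.read.jdbc(
        url="jdbc:phoenix:zk-host:2181",  # hypothetical Phoenix JDBC URL
        table="EVENTS",                   # hypothetical table
        predicates=["region = 'us'", "region = 'eu'"],
        properties={"driver": "org.apache.phoenix.jdbc.PhoenixDriver"},
    )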