Hi,
I am trying to use the DataSourceV2 API to implement a Spark connector for
Apache Phoenix. I am not using JDBCRelation because I want to optimize how
partitions are created during reads and to provide support for more
complicated filter pushdown.
For reading I am using JdbcUtils.resultSetToSpark
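To make the question concrete, this is roughly the shape of the reader I
have in mind, against Spark 2.4's org.apache.spark.sql.sources.v2
interfaces. All of the Phoenix-specific logic (schema lookup, split
planning, the canPushDown predicate) is stubbed out as placeholders:

  import java.util.{List => JList}
  import scala.collection.JavaConverters._
  import org.apache.spark.sql.catalyst.InternalRow
  import org.apache.spark.sql.sources.Filter
  import org.apache.spark.sql.sources.v2.reader._
  import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport}
  import org.apache.spark.sql.types.StructType

  class PhoenixDataSource extends DataSourceV2 with ReadSupport {
    override def createReader(options: DataSourceOptions): DataSourceReader =
      new PhoenixReader(options)
  }

  class PhoenixReader(options: DataSourceOptions)
      extends DataSourceReader with SupportsPushDownFilters {

    private var pushed: Array[Filter] = Array.empty

    // Placeholder: the schema would come from Phoenix table metadata.
    override def readSchema(): StructType = new StructType()

    // Keep the filters Phoenix can evaluate server side; hand the rest
    // back to Spark to re-apply after the scan.
    override def pushFilters(filters: Array[Filter]): Array[Filter] = {
      val (supported, unsupported) = filters.partition(canPushDown)
      pushed = supported
      unsupported
    }

    override def pushedFilters(): Array[Filter] = pushed

    // Placeholder: plan one InputPartition per Phoenix split, instead of
    // JDBCRelation's single-column numeric range partitioning.
    override def planInputPartitions(): JList[InputPartition[InternalRow]] =
      Seq.empty[InputPartition[InternalRow]].asJava

    // Hypothetical predicate for which Spark filters Phoenix understands.
    private def canPushDown(f: Filter): Boolean = false
  }

The idea is that pushFilters returns only the filters Phoenix cannot
evaluate, so Spark re-applies just those, while planInputPartitions lets me
align partitions with Phoenix splits instead of a numeric range column.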
Hi Xiangrui,
Thank you for the quick reply and the great questions.
“How does mmlspark handle dynamic allocation? Do you have a watch thread on the
driver to restart the job if there are more workers? And when the number of
workers decrease, can training continue without driver involved?”
Curren
Thank you for replying, Sean. The error is as follows:
Py4JJavaError: An error occurred while calling o49.load.
: org.apache.spark.sql.AnalysisException: Failed to find data source:
kafka. Please deploy the application as per the deployment section of
"Structured Streaming + Kafka Integration Guide".;
Dear Spark dev,
I am trying to run an IPython notebook with Kafka structured streaming
support. I couldn't find a way to load the Kafka package, either by adding
"--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0"
to PYSPARK_DRIVER_PYTHON_OPTS or by changing my local pyspark script to
"exec "${SPARK_H
Hi everyone,
I am encountering an annoying issue when running Spark with an external jar
dependency downloaded from Maven. This is how we run it:
spark-shell --repositories --packages
When we release a new version with a big change in the API, things start
to randomly break for some users
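For concreteness, the invocation looks like this, with the repository URL
and package coordinates replaced by placeholders:

  spark-shell \
    --repositories https://repo.example.com/releases \
    --packages com.example:my-connector_2.11:1.2.3

Since --packages takes exact Ivy coordinates, a job keeps the version it
names until someone bumps the pin.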