Hi,
I'm running Spark 3 on Kubernetes and using the S3A staging committer (directory
committer) to write data to an S3 bucket. The same setup works fine with Spark
2.4.5, but with Spark 3 the final data (written in Parquet format) is not visible
in the S3 bucket, and when a read operation is performed on that
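For reference, a sketch of the configuration typically needed to bind the S3A directory committer into Spark 3's Parquet write path, as described in the Spark cloud-integration documentation (this assumes the spark-hadoop-cloud module is on the classpath; it is not a diagnosis of this specific failure):

spark.hadoop.fs.s3a.committer.name                directory
spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a  org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
spark.sql.sources.commitProtocolClass             org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class          org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter

Without the last two settings, Parquet writes may fall back to the default file output committer, in which case the staged task output is never promoted to the final destination.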
Hi,
I'm trying to get PySpark working with PyCharm for Structured Streaming.

Versions:
spark-3.0.1-bin-hadoop3.2
kafka_2.12-1.1.0

Basic code:

from __future__ import print_function
from src.config import config, hive_url
import sys
from sparkutils import sparkstuff as s

class MDStreaming:
    def __init__(self,
You should include commons-pool2-2.9.0.jar and remove
spark-streaming-kafka-0-10_2.12-3.0.1.jar (unnecessary jar).
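Concretely, the dependency setup described above can be passed at submit time, e.g. (the application filename here is a placeholder):

spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 \
  --jars commons-pool2-2.9.0.jar \
  your_streaming_app.py

The spark-sql-kafka-0-10 package is the one Structured Streaming needs; spark-streaming-kafka-0-10 is for the older DStream API and is not used here.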
On Mon, Feb 22, 2021 at 12:42 PM Mich Talebzadeh
wrote:
> Hi,
>
> Trying to make PySpark with PyCharm work with Structured Streaming
>
> spark-3.0.1-bin-hadoop3.2
> kafka_2.12-1.1.0
Many thanks Muru. That was a great help!
+---+-----+-------+
|key|value|headers|
+---+-----+-------+
I have written up a JIRA, and there is a gist attached with code that
reproduces the issue. This is a fairly serious issue, as it probably affects
everyone who uses Spark to fit binary logistic regressions.
https://issues.apache.org/jira/browse/SPARK-34448
Would be great if someone who understands
I'll take a look. At a glance: is it converging? You might turn down the
tolerance to check.
Also, what does scikit-learn say on the same data? We can continue on the
JIRA.
On Mon, Feb 22, 2021 at 5:42 PM Yakov Kerzhner wrote:
> I have written up a JIRA, and there is a gist attached that has code th
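As an illustration of the tolerance check suggested above, here is a minimal, self-contained sketch (plain Python, not Spark MLlib; the data, learning rate, and helper function are made up for illustration) that fits a one-variable logistic regression by gradient descent at two tolerances. If the loose and tight estimates differ noticeably, the looser run had stopped before converging:

```python
# Hypothetical illustration of a convergence check: fit the same logistic
# regression with a loose and a tight gradient tolerance and compare.
import math

def fit_logistic(xs, ys, tol, lr=0.1, max_iter=100000):
    """Fit p(y=1|x) = sigmoid(w*x) by gradient descent on the mean log-loss.

    Stops when the absolute gradient drops below tol. Returns the weight w.
    """
    w = 0.0
    for _ in range(max_iter):
        # Mean gradient of the logistic loss with respect to w.
        grad = sum((1 / (1 + math.exp(-w * x)) - y) * x
                   for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
        if abs(grad) < tol:
            break
    return w

# Small non-separable toy dataset (made up) so the MLE is finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 1, 0, 1, 0, 1]

w_loose = fit_logistic(xs, ys, tol=1e-2)
w_tight = fit_logistic(xs, ys, tol=1e-8)

# If this gap is noticeable, the loose run had not actually converged.
print(abs(w_tight - w_loose))
```

The same idea applies to Spark's LogisticRegression via its tol parameter: refit with a much smaller tolerance and see whether the coefficients move.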