Hi There,
My use case is to read a simple JSON message from a Kafka queue using Spark
Structured Streaming, but I'm getting the following error message when I run
my Kafka consumer. I don't get this error when using the Spark direct stream;
the issue happens only with Structured Streaming. Any
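For context, a minimal Structured Streaming read from Kafka looks roughly
like this in Scala (the broker address and topic name are placeholders, and
the spark-sql-kafka-0-10 package must be on the classpath):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-json").getOrCreate()

// Kafka source for Structured Streaming.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")  // placeholder address
  .option("subscribe", "my-topic")                   // placeholder topic
  .load()

// The Kafka source exposes key/value as binary, so cast before parsing JSON.
val messages = df.selectExpr("CAST(value AS STRING)")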
If you tested it end to end with the current version and it works fine, I'd
say go ahead, unless there is another similar way. If they change the
functionality you can always update it.
Regarding "non-experimental" functions, they could also be marked as
deprecated and then removed in a later version
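As an illustration (not from the thread), this staged removal is what Scala's
deprecation annotation is for; the names below are made up:

// A stable API is typically deprecated for at least one release
// before being removed.
@deprecated("use newApi instead", since = "2.4.0")
def oldApi(): Unit = newApi()

def newApi(): Unit = ()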
From: Pat Ferrel
Reply: Pat Ferrel
Date: February 12, 2019 at 5:40:41 PM
To: user@spark.apache.org
Subject: Spark with Kubernetes connecting to pod id, not address
We have a k8s deployment of several services, including Apache Spark. All
services seem to be operational. Our application con
Yeah, then the easiest would be to fork Spark and run using the forked
version, and in the case of YARN it should be pretty easy to do.
git clone https://github.com/apache/spark.git
cd spark
export MAVEN_OPTS="-Xmx4g -XX:ReservedCodeCacheSize=512m"
./build/mvn -DskipTests clean package
./dev/make-
* It is not getPartitions() but getNumPartitions().
On Tue, Feb 12, 2019 at 1:08 PM, Pedro Tuero (tuerope...@gmail.com)
wrote:
> And this is happening in every job I run. It is not just one case. If I
> add a forced repartition it works fine, even better than before. But I run
> the
And this is happening in every job I run. It is not just one case. If I add
a forced repartition it works fine, even better than before. But I run the
same code for different inputs, so the number of partitions to use must be
related to the input.
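For illustration (not from the thread), a forced repartition can be tied to
the input instead of a hard-coded number. A sketch, assuming sc is the
SparkContext; the path is a placeholder and the 2x multiplier is just a
made-up heuristic:

// Derive the target partition count from the input itself.
val rdd = sc.textFile("hdfs:///path/to/input")  // placeholder path
val repartitioned = rdd.repartition(rdd.getNumPartitions * 2)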
On Tue, Feb 12, 2019 at 11:22 AM, Pedro
Hi Jacek.
I'm not using Spark SQL; I'm using the RDD API directly.
I can confirm that the jobs and stages are the same on both executions.
In the Environment tab of the web UI, spark.default.parallelism=128 is shown
when using Spark 2.4, while in 2.3.1 it is not.
But in 2.3.1 it should be the same, because 12
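A quick way to see what each version actually resolves, independent of what
the Environment tab shows (my suggestion, not from the thread; assumes sc is
the SparkContext):

// Effective parallelism vs. whether the property is explicitly set.
println(sc.defaultParallelism)
println(sc.getConf.getOption("spark.default.parallelism"))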
Hi,
Can you show the plans with explain(extended = true) for both versions?
That's where I'd start to pinpoint the issue. Perhaps the underlying
execution engine changed in a way that affects keyBy? Dunno, just guessing...
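For example (Dataset API; df stands in for whatever query you're comparing):

// Prints both the logical and physical plans, which can then be
// diffed between 2.3.1 and 2.4.
df.explain(extended = true)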
Regards,
Jacek Laskowski
https://about.me/JacekLaskowski
Mastering Spark SQL https:
I tried a similar approach; it works well for user functions, but I need to
crash tasks or an executor when the Spark application runs "repartition". I
didn't find any way to inject a "poison pill" into the repartition call :(
On Mon, Feb 11, 2019 at 9:19 PM, Vadim Semenov:
> something like this
>
> import org.apac
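For concreteness, one workaround sketch (my assumption, since Vadim's snippet
is truncated above; rdd is a placeholder): the repartition itself can't be
poisoned, but the stage right after its shuffle can be made to fail for a
chosen partition:

import org.apache.spark.TaskContext

// Hypothetical poison pill: fail task 0 of the stage that consumes
// the shuffle produced by repartition.
val poisoned = rdd.repartition(8).mapPartitions { iter =>
  if (TaskContext.get.partitionId == 0)
    throw new RuntimeException("poison pill")
  iter
}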
Too little information to give an answer, if indeed an a priori answer is
possible.
However, I would do the following on your test instances:
- Run jstat -gc on all your nodes. It might be that the GC is taking a lot
of time.
- Poll with jstack semi-frequently. I can give you a fairly good idea
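If shelling into every node is awkward, roughly the same GC numbers can also
be read from inside the JVM (a sketch, not part of the original reply):

import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// Cumulative time and count for each garbage collector since JVM start.
for (gc <- ManagementFactory.getGarbageCollectorMXBeans.asScala)
  println(s"${gc.getName}: ${gc.getCollectionTime} ms over " +
    s"${gc.getCollectionCount} collections")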
Hi,
I'm in a somewhat similar situation. Here's what I do (it seems to be
working so far):
1. Stream in the JSON as a plain string.
2. Feed this string into a JSON library to validate it (I use Circe).
3. Using the same library, parse the JSON and extract fields X, Y and Z.
4. Create a dataset wi
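A rough sketch of that pipeline in Scala (Circe usage from memory; the field
names and types are placeholders, rawStrings is assumed to be a
Dataset[String] from step 1, and spark is an existing SparkSession):

import io.circe.parser.parse
import spark.implicits._

case class Record(x: String, y: Long, z: Double)  // hypothetical fields

val records = rawStrings.flatMap { s =>
  // Steps 2-3: parsing validates the JSON; rows that fail are dropped here.
  parse(s).toOption.flatMap { json =>
    val c = json.hcursor
    for {
      x <- c.get[String]("x").toOption
      y <- c.get[Long]("y").toOption
      z <- c.get[Double]("z").toOption
    } yield Record(x, y, z)
  }
}  // step 4: records is a Dataset[Record]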