Hi Kapil,
Thanks for the suggestion. Yes, it worked.
Regards
Sachit
On Tue, 9 Mar 2021, 00:19 Kapil Garg, wrote:
Hi Sachit,
What do you mean by "spark is running only 1 executor with 1 task"?
Did you submit the Spark application with multiple executors, but only 1 is
being used and the rest are idle?
If that's the case, then it might happen due to the spark.locality.wait setting,
which is set to 3s by default. This w
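For illustration, here is a minimal PySpark sketch of adjusting the setting mentioned above; the app name and the "0s" value are assumptions, not a recommendation.

from pyspark.sql import SparkSession

# Sketch only: spark.locality.wait controls how long Spark waits for a
# data-local slot before scheduling the task on a less-local executor.
# The default is 3s; "0s" disables the wait entirely (example value, assumed).
spark = (SparkSession.builder
         .appName("locality-wait-example")          # hypothetical app name
         .config("spark.locality.wait", "0s")
         .getOrCreate())

# The same can be passed at submit time (assumed invocation):
#   spark-submit --conf spark.locality.wait=0s my_app.py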
Hi All,
I am using Spark 3.0.1 Structured Streaming with PySpark.
The problem is that Spark is running only 1 executor with 1 task. The following
is a summary of what I am doing.
Can anyone help me understand why only 1 executor is being used?
def process_events(event):
    fetch_actual_data()
    # many more steps

def fetch_a
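The original snippet is truncated, so the following is only a hedged guess at the shape of such a job (broker, topic, and names are assumptions). It shows where parallelism usually comes from: with a foreach-style sink, the number of concurrent tasks follows the number of partitions of the micro-batch, so repartitioning the stream is one way to get more than a single task.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("events-stream").getOrCreate()  # hypothetical

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # assumed
      .option("subscribe", "events")                      # assumed topic
      .load())

def process_events(row):
    # placeholder for fetch_actual_data() and the other steps
    pass

query = (df.repartition(10)           # without this, tasks == input partitions
           .writeStream
           .foreach(process_events)   # invoked per row, inside per-partition tasks
           .start())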
In SS, checkpointing is now part of running a micro-batch and is supported
natively. (To be clear, my library doesn't deal with the native checkpointing
behavior.)
In other words, it can't be customized the way you have been doing with your
database. You probably don't need to do that with SS, but
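For reference, a minimal PySpark sketch of the native checkpointing described above (broker, topic, and paths are placeholders): the engine tracks offsets and sink state through the checkpointLocation option, with no user hook to store them elsewhere.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ss-checkpoint-demo").getOrCreate()  # hypothetical

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # assumed
      .option("subscribe", "orders")                      # assumed topic
      .load())

# Offsets are written to the checkpoint directory by the engine itself;
# there is no user-facing hook to persist them in a database instead.
query = (df.writeStream
           .format("parquet")                             # assumed sink
           .option("path", "/data/orders")                # placeholder
           .option("checkpointLocation", "/chk/orders")   # placeholder
           .start())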
Thanks Lim, this is really helpful. I have a few questions.
Our earlier approach used a low-level consumer to read offsets from a database
and used that information to read with Spark Streaming (DStreams), saving the
offsets back once processing finished. This way we never lost data.
With your libr
There are sections in the SS programming guide that answer exactly these
questions:
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#managing-streaming-queries
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries
A
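As a hedged PySpark sketch of what the monitoring section above covers (the sink and query name are assumptions, and `df` stands for any streaming DataFrame): even without a dedicated UI tab, progress can be polled from the StreamingQuery handle.

# `df` is assumed to be a streaming DataFrame, e.g. from spark.readStream...load()
query = (df.writeStream
           .format("console")              # assumed sink
           .queryName("orders_monitor")    # hypothetical query name
           .start())

print(query.status)             # whether the trigger is active, data available, etc.
print(query.lastProgress)       # dict with numInputRows, processing rates, offsets
for p in query.recentProgress:  # the most recent progress updates kept in memory
    print(p["batchId"], p["numInputRows"])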
In 3.0 the community just added it.
On Sun, 5 Jul 2020, 14:28 KhajaAsmath Mohammed wrote:
Hi,
We are trying to move our existing code from Spark DStreams to Structured
Streaming for one of the old applications we built a few years ago.
The Structured Streaming job doesn't have a Streaming tab in the Spark UI. Is
there a way to monitor the job we submitted in Structured Streaming? Since
OK, thanks for confirming. I will do it this way.
Regards
Srini
On Tue, Jun 9, 2020 at 11:31 PM Gerard Maas wrote:
Hi Srinivas,
Reading from different brokers is possible, but you need to connect to each
Kafka cluster separately.
Mixing connections to two different Kafka clusters in one subscriber is not
supported. (I'm sure it would give all kinds of weird errors.)
The "kafka.bootstrap.servers" opti
Thanks for the quick reply. This may work, but I have about 5 topics to listen
to right now. I am trying to keep all the topics in an array in a properties
file and read them all at once. That way it is dynamic: there is one code block
like the one below, and you can add or delete topics from the config f
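For topics that live on the same cluster, here is a hedged sketch of the config-driven approach described above (the file name, section, and keys are assumptions); the Kafka source's "subscribe" option accepts a comma-separated list, so the topic array only needs to be joined:

import configparser
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi-topic").getOrCreate()  # hypothetical

# Hypothetical INI-style file with a [kafka] section containing
# topics = topic1,topic2,topic3,topic4,topic5 and bootstrap_servers = ...
config = configparser.ConfigParser()
config.read("stream.properties")                     # assumed file name
topics = [t.strip() for t in config["kafka"]["topics"].split(",")]

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", config["kafka"]["bootstrap_servers"])
      .option("subscribe", ",".join(topics))         # comma-separated topic list
      .load())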
Hello,
I've never tried that, but doesn't this work?
val df_cluster1 = spark
  .read
  .format("kafka")
  .option("kafka.bootstrap.servers", "cluster1_host:cluster1_port")
  .option("subscribe", "topic1")
  .load()

val df_cluster2 = spark
  .read
  .format("kafka")
  .option("kafka.bootstrap.servers", "cluster2_host:cluster2_port") // placeholder, mirroring cluster1
  .option("subscribe", "topic2")
  .load()
Hello,
In Structured Streaming, is it possible to have one Spark application, with one
query, consume topics from multiple Kafka clusters?
I am trying to consume two topics, each from a different Kafka cluster, but one
of the topics is reported as an unknown topic and the job keeps running
without com
The key option does not work!
I have set "maxOffsetsPerTrigger", but it still receives one partition per
trigger in micro-batch mode. So where do I set it to receive from 10 partitions
in parallel, like Spark Streaming does?
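A hedged sketch of the two knobs usually involved here (all values are assumptions): maxOffsetsPerTrigger only caps how many records a micro-batch reads, while the number of parallel tasks normally follows the number of Kafka partitions; the minPartitions option (Spark 2.4+) can raise it further.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallel-read-demo").getOrCreate()  # hypothetical

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # assumed
      .option("subscribe", "events")                      # assumed topic
      .option("maxOffsetsPerTrigger", "100000")  # caps records per micro-batch, not parallelism
      .option("minPartitions", "10")             # request at least 10 input tasks (Spark 2.4+)
      .load())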
Hi
The NA function will replace nulls with some default value, and not all my
columns are of type string, so for the other data types (long/int etc.) I would
have to provide some default value.
But ideally those values should stay null.
Actually, the null-column drop is happening in this step:
df.selectExpr( "
See also here:
https://stackoverflow.com/questions/44671597/how-to-replace-null-values-with-a-specific-value-in-dataframe-using-spark-in-jav
On Mon, Apr 29, 2019 at 5:27 PM Jason Nerothin wrote:
Spark SQL has had an na.fill function on DataFrames since at least 2.1. Would
that work for you?
https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/DataFrameNaFunctions.html
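A hedged PySpark sketch of the suggestion above (the schema and fill values are assumptions); na.fill takes per-column defaults, which is exactly the limitation raised in the follow-up, since non-string columns need a non-null default:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("na-fill-demo").getOrCreate()  # hypothetical

df = spark.createDataFrame(
    [("a", None, None), ("b", 2, 3.5)],
    "name string, count int, score double")       # assumed schema

# Per-column defaults; columns not listed keep their nulls.
filled = df.na.fill({"count": 0, "score": 0.0})
filled.show()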
On Mon, Apr 29, 2019 at 4:57 PM Shixiong(Ryan) Zhu wrote:
Hey Snehasish,
Do you have a reproducer for this issue?
Best Regards,
Ryan
On Wed, Apr 24, 2019 at 7:24 AM SNEHASISH DUTTA wrote:
Hi,
While writing to Kafka using Spark Structured Streaming, if all the values in a
certain column are null, the column gets dropped.
Is there any way to override this, other than using the na.fill functions?
Regards,
Snehasish
If you are using the Kafka direct connect API, it might be committing offsets
back to Kafka itself.
On Thu, 7 Jun 2018, 4:10, licl wrote:
I met the same issue, and I tried deleting the checkpoint dir before the job,
but Spark still seems to read the correct offset even after the checkpoint dir
is deleted.
I don't know how Spark does this without the checkpoint's metadata.
Hi:
I am working on a realtime application using Spark Structured Streaming
(v2.2.1). The application reads data from Kafka, and if there is a failure, I
would like to ignore the checkpoint. Is there any configuration to just read
from the last Kafka offset after a failure and ignore any offset che
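A hedged sketch of one way to get the behaviour asked about (broker, topic, and paths are assumptions): startingOffsets is only honoured when a query starts without an existing checkpoint, so restarting the query against a fresh checkpointLocation effectively ignores the old one and picks up from the latest offsets.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ignore-checkpoint-demo").getOrCreate()  # hypothetical

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # assumed
      .option("subscribe", "events")                      # assumed topic
      .option("startingOffsets", "latest")                # or a per-partition JSON string
      .load())

query = (df.writeStream
           .format("console")                               # assumed sink
           .option("checkpointLocation", "/chk/events-v2")  # new dir => old checkpoint ignored
           .start())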
Structured Streaming AUTOMATICALLY saves the offsets in a checkpoint
directory that you provide. And when you start the query again with the
same directory it will just pick up where it left off.
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failur
Hi:
I am working with Spark (2.2.1) and Kafka (0.10) on AWS EMR, and for the last
few days, after running the application for 30-60 minutes, I get the exception
from the Kafka consumer included below.
The structured streaming application is processing 1 minute's worth of data
from a Kafka topic. So I've tried
Hi,
I have a problem with Structured Streaming and Kafka.
I have 2 brokers and a topic with 8 partitions and replication factor 2.
This is my driver program:
public static void main(String[] args)
{
    SparkSession spark = SparkSession
        .builder()
        .app