Hi,
You need to add the prefix "kafka." to the configurations that should be
propagated to Kafka. The rest are used by the Spark data source itself
(the Kafka connector in this case).
https://spark.apache.org/docs/2.4.5/structured-streaming-kafka-integration.html#kafka-specific-configurations
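For example, a minimal sketch of the difference (assuming df already has a string
"value" column; the servers, topic, size value and checkpoint path are placeholders):

    df.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092")  // "kafka." prefix: forwarded to the Kafka producer
      .option("kafka.max.request.size", "10485760")     // "kafka." prefix: forwarded to the Kafka producer
      .option("topic", "my-topic")                      // no prefix: interpreted by Spark's Kafka sink
      .option("checkpointLocation", "/tmp/checkpoints") // no prefix: regular Spark streaming option
      .start()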
Ho
Looking at the stack trace, your data from Spark gets serialized to an
ArrayList (of something), whereas in your Scala code you are using an Array
of Rows. So the types don't line up. That's the exception you are seeing:
the JVM searches for a signature that simply does not exist.
Try to turn the
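(The advice above is cut off, but one common fix, assuming the Scala method is being
handed the collection Spark serializes, is to accept the Java type and convert it on
the Scala side; the method and parameter names below are only illustrative:)

    import scala.collection.JavaConverters._
    import org.apache.spark.sql.Row

    // Accept the java.util.List that actually arrives, then convert it
    def writeGraphml(data: java.util.List[Row], graphmlPath: String): Unit = {
      val rows: Seq[Row] = data.asScala.toSeq  // java.util.ArrayList -> Scala Seq
      // ... work with `rows` here ...
    }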
Hello,
Even for me it comes out as 0 when I print it in onQueryProgress. I use a
LongAccumulator as well. Yes, it prints correctly when I run locally but not on
the cluster. One consolation is that when I send the metrics to Grafana, the
values do show up there.
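For reference, this is roughly the pattern I mean (a sketch only; names are
illustrative):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._

    val spark = SparkSession.builder().getOrCreate()
    val recordCount = spark.sparkContext.longAccumulator("recordCount")

    spark.streams.addListener(new StreamingQueryListener {
      override def onQueryStarted(event: QueryStartedEvent): Unit = {}
      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = {}
      override def onQueryProgress(event: QueryProgressEvent): Unit = {
        // The listener runs on the driver; on a cluster the value printed here is
        // only whatever the driver has merged back from finished tasks so far.
        println(s"recordCount = ${recordCount.value}")
      }
    })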
On Tue, May 26, 2020 at 3:10 AM Something Something <
mailinglis
Hi,
I've hit a wall trying to implement a couple of Scala methods in a
Python version of our project. I've implemented a number of these
already, but I'm getting hung up on this one.
My Python function looks like this:
def Write_Graphml(data, graphml_path, sc):
    return sc.getOrCreate()
I keep getting this error message:
*The message is 1169350 bytes when serialized which is larger than the
maximum request size you have configured with the max.request.size
configuration.*
As indicated in other posts, I am trying to set the “max.request.size”
configuration in the Producer as f
No, this is not working even if I use a LongAccumulator.
On Fri, May 15, 2020 at 9:54 PM ZHANG Wei wrote:
> There is a restriction in the AccumulatorV2 API [1]: the OUT type should be
> atomic or thread-safe. I'm wondering whether the implementation for
> `java.util.Map[T, Long]` can meet it or not. Is ther
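For what it's worth, a rough sketch of what a map-based AccumulatorV2 with a
thread-safe OUT type could look like (untested, just to illustrate the point in [1]):

    import java.util.concurrent.ConcurrentHashMap
    import scala.collection.JavaConverters._
    import org.apache.spark.util.AccumulatorV2

    // OUT is a ConcurrentHashMap so that reads of `value` are thread-safe
    class MapAccumulator extends AccumulatorV2[String, ConcurrentHashMap[String, Long]] {
      private val map = new ConcurrentHashMap[String, Long]()

      override def isZero: Boolean = map.isEmpty
      override def copy(): MapAccumulator = {
        val acc = new MapAccumulator
        acc.map.putAll(map)
        acc
      }
      override def reset(): Unit = map.clear()
      override def add(key: String): Unit =
        map.put(key, map.getOrDefault(key, 0L) + 1L)
      override def merge(other: AccumulatorV2[String, ConcurrentHashMap[String, Long]]): Unit =
        other.value.asScala.foreach { case (k, v) =>
          map.put(k, map.getOrDefault(k, 0L) + v)
        }
      override def value: ConcurrentHashMap[String, Long] = map
    }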
So even on RDDs, cache/persist mutate the RDD object. The important thing
for Spark is that the data represented by the RDD/DataFrame isn't mutated.
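A small illustration of that point (assuming a SparkSession named spark; the path is
a placeholder):

    val df = spark.read.parquet("/path/to/data")

    // cache() doesn't return a new, cached copy: it marks this DataFrame's
    // plan for caching and returns the very same object.
    df.cache()
    df.count()  // the first action materialises the cache

    // The data itself is never mutated; only the "please cache me" metadata
    // attached to the plan/RDD changes.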
On Mon, May 25, 2020 at 10:56 AM Chris Thomas
wrote:
>
> The cache() method on the DataFrame API caught me out.
>
> Having learnt that DataFrames a
The cache() method on the DataFrame API caught me out.
Having learnt that DataFrames are built on RDDs and that RDDs are
immutable, when I saw the statement df.cache() in our codebase I thought
‘This must be a bug; the result is not assigned, so the statement will have no
effect.’
However, I’ve sinc
Hey, from what I know you can try to union them with df.union(df2).
Not sure if this is what you need.
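A quick sketch (assuming both frames have the same schema):

    // union matches columns by position and needs the same number of columns;
    // unionByName matches them by name instead.
    val combined = df.union(df2)
    // or: val combined = df.unionByName(df2)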
> On 25. May 2020, at 13:53, Tanveer Ahmad - EWI wrote:
>
> Hi all,
>
> I need some help regarding Arrow RecordBatches/Pandas Dataframes to (Arrow
> enabled) Spark Dataframe conversions.
> Here th
Hi all,
I need some help regarding Arrow RecordBatches/Pandas DataFrames to (Arrow-
enabled) Spark DataFrame conversions.
The example here explains very well how to convert a single Pandas DataFrame to a
Spark DataFrame [1].
But in my case, some external applications are generating Arrow Record
Thanks Dhaval for the suggestion, but in the case I mentioned in the previous
mail, data can still be missed since the row numbers will change.
-
Manjunath
From: Dhaval Patel
Sent: Monday, May 25, 2020 3:01 PM
To: Manjunath Shetty H
Subject: Re: Parallelising JDBC reads
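To make the concern concrete, Spark's built-in JDBC partitioning looks roughly like
this (URL, table and column names are placeholders); if partitionColumn is a computed
row number rather than a stable column, rows can shift between the per-partition
queries and be missed:

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://host:5432/db")
      .option("dbtable", "my_table")
      .option("partitionColumn", "id")   // needs to be a stable numeric column
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")
      .load()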
Thanks Georg for the suggestion, but at this point changing the design is not
really an option.
Any other pointers would be helpful.
Thanks
Manjunath
From: Georg Heiler
Sent: Monday, May 25, 2020 11:52 AM
To: Manjunath Shetty H
Cc: Mike Artz ; user
Subject: R