Updating Broadcast Variable in Spark Streaming 2.4.4

2022-09-28 Thread Dipl.-Inf. Rico Bergmann
Hi folks! I'm trying to implement an update of a broadcast var in Spark Streaming. The idea is that whenever some configuration value changes (this is periodically checked by the driver), the existing broadcast variable is unpersisted and then re-broadcast. In a local test setup (usi…

Updating Broadcast Variable in Spark Streaming 2.4.4

2022-07-22 Thread Dipl.-Inf. Rico Bergmann
Hi folks! I'm trying to implement an update of a broadcast var in Spark Streaming. The idea is that whenever some configuration value changes (this is periodically checked by the driver), the existing broadcast variable is unpersisted and then re-broadcast. In a local test setup (usin…
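The pattern described in the two messages above can be sketched roughly as follows. This is a minimal Java sketch, not the author's actual code: the RefreshableConfig wrapper, the Map payload, and the driver calling refresh() from its periodic check are all assumptions; only broadcast(), unpersist(), and value() are Spark API.

```java
import java.util.Map;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

// Hypothetical driver-side wrapper: the driver refreshes the broadcast when it
// detects a configuration change; executor code always reads through get().
public class RefreshableConfig {
  private final JavaSparkContext jsc;
  private volatile Broadcast<Map<String, String>> current;

  public RefreshableConfig(JavaSparkContext jsc, Map<String, String> initial) {
    this.jsc = jsc;
    this.current = jsc.broadcast(initial);
  }

  // Called by the driver's periodic check when a configuration value changed.
  public void refresh(Map<String, String> newConfig) {
    Broadcast<Map<String, String>> old = current;
    current = jsc.broadcast(newConfig); // re-broadcast the new value
    old.unpersist();                    // drop the stale copies held by executors
  }

  public Map<String, String> get() {
    return current.value();
  }
}
```

A common precaution with this pattern is to perform the refresh only between batches, so that no running task loses its broadcast mid-computation.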

Re: Cast int to string not possible?

2022-02-17 Thread Rico Bergmann
I found the reason why it did not work: when returning the Spark data type I was calling new StringType(). When I changed it to DataTypes.StringType it worked. Greets, Rico. > On 17.02.2022 at 14:13, Gourav Sengupta wrote: > Hi, > can you please post a screen…
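The thread does not show the surrounding code, but the gist of the fix can be sketched as below. A minimal Java sketch with made-up column names: the point is to obtain the type from the DataTypes singletons rather than constructing it by hand, since a hand-constructed instance is not treated by the analyzer as the canonical StringType.

```java
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;

public class CastToStringSketch {
  // Cast an int column to string using the shared DataTypes.StringType singleton.
  // Column names "id" / "id_str" are placeholders, not from the original thread.
  static Dataset<Row> withStringId(Dataset<Row> df) {
    return df.withColumn("id_str", col("id").cast(DataTypes.StringType));
  }
}
```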

Re: Cast int to string not possible?

2022-02-17 Thread Rico Bergmann
… Best, Rico > On 17.02.2022 at 09:56, ayan guha wrote: > Can you try to cast any other Int field which is NOT a partition column? > On Thu, 17 Feb 2022 at 7:34 pm, Gourav Sengupta wrote: >> Hi, >> This appears interesting, casting INT to STRIN…

Re: Cast int to string not possible?

2022-02-16 Thread Rico Bergmann
… the column and the data type of the column. Best, Rico. > On 17.02.2022 at 03:17, Morven Huang wrote: > Hi Rico, do you have any code snippet? I have no problem casting int to string. >> On 17.02.2022 at 00:26, Rico Bergmann wrote: >> Hi! >> I am…

Cast int to string not possible?

2022-02-16 Thread Rico Bergmann
… column to StringType. But this fails with AnalysisException “cannot cast int to string”. Is this a bug? Or is it really not allowed to cast an int to a string? I'm using Spark 3.1.1 Best regards Rico.

Re: Spark DataFrame CodeGeneration in Java generates Scala specific code?

2021-04-29 Thread Rico Bergmann
Indeed, adding public constructors solved the problem... Thanks a lot! > On 29.04.2021 at 18:53, Rico Bergmann wrote: > It didn't have one. So I added public no-args and all-args constructors. But I still get the same error. >>> On 29.0…
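A minimal Java sketch of the fix described above, assuming the class is used with the bean encoder: the class name MyPojo is taken from the thread, but the fields and the asPojos helper are made up for illustration. Encoders.bean expects a public no-argument constructor plus getters and setters.

```java
import java.io.Serializable;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

public class MyPojo implements Serializable {
  private int id;
  private String name;

  public MyPojo() {}                    // public no-args constructor (the missing piece)

  public MyPojo(int id, String name) {  // public all-args constructor
    this.id = id;
    this.name = name;
  }

  public int getId() { return id; }
  public void setId(int id) { this.id = id; }
  public String getName() { return name; }
  public void setName(String name) { this.name = name; }

  // Hypothetical usage: map a DataFrame with matching columns to a typed Dataset.
  public static Dataset<MyPojo> asPojos(Dataset<Row> df) {
    return df.as(Encoders.bean(MyPojo.class));
  }
}
```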

Re: Spark DataFrame CodeGeneration in Java generates Scala specific code?

2021-04-29 Thread Rico Bergmann
> On Thu, Apr 29, 2021 at 9:55 AM Rico Bergmann wrote: >> Here is the relevant generated code and the Exception stacktrace. >> The problem in the generated code is at line 35. …

Re: Spark DataFrame CodeGeneration in Java generates Scala specific code?

2021-04-29 Thread Rico Bergmann
… is looking for members of a companion object when there is none here. Can you show any more of the stack trace or generated code? >> On Thu, Apr 29, 2021 at 7:40 AM Rico Bergmann wrote: >> Hi all! >> A simplified code snippet of what my Spark pipe…

Spark DataFrame CodeGeneration in Java generates Scala specific code?

2021-04-29 Thread Rico Bergmann
… out an unknown variable or type "MyPojo$.MODULE$". To me it looks like the CodeGenerator generates code for Scala (since, as far as I know, .MODULE$ is a Scala-specific variable). I tried it with Spark 3.1.1 and Spark 3.0.1. Does anyone h…

Re: Structured Streaming Microbatch Semantics

2021-03-05 Thread Dipl.-Inf. Rico Bergmann
Hi! Thanks for your reply! For several reasons we don't want to "pipe" the real data through Kafka. What problems might arise from this approach? Best, Rico. On 05.03.2021 at 09:18, Roland Johann wrote: Hi Rico, there is no way to defer records from one micro-batch t…

Re: Structured Streaming Microbatch Semantics

2021-03-05 Thread Dipl.-Inf. Rico Bergmann
… but I limit the number of events per trigger in the Kafka reader. What do you mean by process rate below batch duration? The process rate is in records per sec. (in my current deployment it's approx. 1), batch duration is in sec. (at around 60 sec.) Best, Rico. On 05.03.2021 at 10:58 …

Structured Streaming Microbatch Semantics

2021-03-05 Thread Dipl.-Inf. Rico Bergmann
… a few thousand records (rows) that have to be stored. And to write the data I use foreachBatch(). My question now is: is it guaranteed by Spark that all output records of one event are always contained in a single batch, or can the records also be split into multiple batches? Best, …
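A hedged Java sketch of the setup this thread describes: a Kafka source with a cap on records per trigger and a foreachBatch() sink. The broker, topic, limit, and output path are placeholders; only the option names and the foreachBatch API are Spark's.

```java
import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class MicroBatchSketch {
  static StreamingQuery start(SparkSession spark) throws Exception {
    Dataset<Row> events = spark.readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092") // placeholder
        .option("subscribe", "events")                    // placeholder topic
        .option("maxOffsetsPerTrigger", "1000")           // limit records per micro-batch
        .load();

    return events.writeStream()
        .foreachBatch((VoidFunction2<Dataset<Row>, Long>) (batch, batchId) -> {
          // The whole micro-batch is available here as a regular Dataset and
          // can be written with the normal batch writer.
          batch.write().format("parquet").mode("append").save("/tmp/out"); // placeholder sink
        })
        .start();
  }
}
```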

Spark 2.2.1 Dataframes multiple joins bug?

2020-03-23 Thread Dipl.-Inf. Rico Bergmann
… When doing a distinct on the 4-way join I get the expected number of records: buse.alias("buse").join(bdef.alias("bdef"), $"buse._c4" === $"bdef._c4").join(crnb.alias("crnb"), $"bdef._c9" === $"crnb._c4").join(wreg.alias("wreg"), $"crnb._c1" === $"wreg._c5").distinct.count res10: Long = 20554365 This (in my opinion) means that Spark is creating duplicate rows, although it shouldn't. Or am I missing something? Best, Rico.

Re: Spark DataSets and multiple write(.) calls

2018-11-20 Thread Dipl.-Inf. Rico Bergmann
… avoiding writing to disk (at least until the grouping). Any other ideas or proposals? Best, Rico. On 19.11.2018 at 19:12, Vadim Semenov wrote: > You can use checkpointing; in this case Spark will write out an RDD to whatever destination you specify, and then the RDD can be reused from…
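A small Java sketch of the checkpointing idea quoted above, assuming the expensive shared subpart is a DataFrame that several independent writes reuse. The checkpoint directory, filter expression, column names, and output paths are placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CheckpointSketch {
  static void run(SparkSession spark, Dataset<Row> input) {
    // Checkpoint files are written here (placeholder path).
    spark.sparkContext().setCheckpointDir("/tmp/spark-checkpoints");

    // Materialize the shared subpart once; checkpoint() is eager by default,
    // so later actions read the checkpointed data instead of recomputing it.
    Dataset<Row> shared = input.filter("value IS NOT NULL").checkpoint();

    // Both writes reuse the checkpointed result rather than the full lineage.
    shared.write().mode("overwrite").parquet("/tmp/out/all");                           // placeholder
    shared.groupBy("key").count().write().mode("overwrite").parquet("/tmp/out/counts"); // placeholder
  }
}
```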

Re: Spark DataSets and multiple write(.) calls

2018-11-19 Thread Dipl.-Inf. Rico Bergmann
Thanks for your advice. But I'm using batch processing. Does anyone have a solution for the batch processing case? Best, Rico. On 19.11.2018 at 09:43, Magnus Nilsson wrote: …

Spark DataSets and multiple write(.) calls

2018-11-19 Thread Dipl.-Inf. Rico Bergmann
… subpart that can be shared by all plans. Is there a possibility to do this? (Caching is not a solution because the input dataset is way too large...) Hoping for advice ... Best, Rico B.

Re: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to Case class

2018-11-16 Thread Rico B.
Did you or anyone else find a solution to this problem? I'm stuck with the same issue ...

Re: Strange codegen error for SortMergeJoin in Spark 2.2.1

2018-06-08 Thread Rico Bergmann
Hi! I finally found the problem. I was not aware that the program was run in client mode. The client used version 2.2.0. This caused the problem. Best, Rico. On 07.06.2018 at 08:49, Kazuaki Ishizaki wrote: > Thank you for reporting a problem. > Would it be possible to create a JIRA…

Strange codegen error for SortMergeJoin in Spark 2.2.1

2018-06-05 Thread Rico Bergmann
Threshold);") it should get 2 parameters, not just one. May be anyone has an idea? Best, Rico.

Problem running Kubernetes example v2.2.0-kubernetes-0.5.0

2018-04-11 Thread Rico Bergmann
Hi! I was trying to get the SparkPi example running using the spark-on-k8s distro from kubespark. But I get the following error: + /sbin/tini -s -- driver [FATAL tini (11)] exec driver failed: No such file or directory Did anyone get the example running on a Kubernetes cluster? Best, Rico

Re: memory leak query

2014-07-25 Thread Rico
Hi Michael, I had a similar question before. My problem was that my data was too large to be cached in memory because of serialization…

Re: Caching issue with msg: RDD block could not be dropped from memory as it does not exist

2014-07-25 Thread Rico
I figured out the issue. In fact, I had not realized before that when loaded into memory, the data is deserialized. As a result, what seems to be a 21 GB dataset occupies 77 GB in memory. Details about this are clearly explained in the guide on serialization and memory tuning.
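A minimal Java sketch of the usual remedy when the deserialized footprint blows up like this: keep the cached partitions in serialized form, optionally with Kryo. The input path is a placeholder, and the Kryo setting is only an assumption about the original setup.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class SerializedCacheSketch {
  // Kryo usually shrinks the serialized form further (assumption, not from the thread).
  static SparkConf withKryo(SparkConf conf) {
    return conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
  }

  static JavaRDD<String> cacheSerialized(JavaSparkContext jsc, String path) {
    JavaRDD<String> data = jsc.textFile(path); // placeholder input
    // Store partitions as serialized byte arrays instead of deserialized Java
    // objects: slower to access, but much closer to the on-disk size.
    data.persist(StorageLevel.MEMORY_ONLY_SER());
    return data;
  }
}
```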

Re: Are all transformations lazy?

2014-07-25 Thread Rico
It may be confusing at first, but there is also an important difference between the reduce and reduceByKey operations. reduce is an action on an RDD; hence, it will trigger evaluation of the transformations that produced the RDD. In contrast, reduceByKey is a transformation on PairRDDs, not an action…
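A small Java sketch of the distinction drawn above, with made-up sample data: reduce is an action and triggers evaluation immediately, while reduceByKey only builds a new (lazy) pair RDD.

```java
import java.util.Arrays;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ReduceVsReduceByKey {
  static void demo(JavaSparkContext jsc) {
    JavaRDD<Integer> nums = jsc.parallelize(Arrays.asList(1, 2, 3, 4));

    // Action: forces evaluation of the lineage and returns a value to the driver.
    int sum = nums.reduce((a, b) -> a + b);
    System.out.println(sum);

    JavaPairRDD<String, Integer> pairs = jsc.parallelizePairs(Arrays.asList(
        new Tuple2<>("a", 1), new Tuple2<>("b", 2), new Tuple2<>("a", 3)));

    // Transformation: just records the operation; nothing runs yet.
    JavaPairRDD<String, Integer> sums = pairs.reduceByKey((a, b) -> a + b);

    // Only this action triggers the actual computation of reduceByKey.
    System.out.println(sums.collect());
  }
}
```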