class KafkaCluster related errors

2021-06-06 Thread Kiran Biswal
I am using Spark 3.0.1, Kafka 0.10, and Scala 2.12, and I am getting an error related to KafkaCluster (not found: type KafkaCluster). Is this class deprecated? How do I find a replacement? I am upgrading from Spark 2.0.1 to Spark 3.0.1. In Spark 2.0.1, KafkaCluster was supported. https://spark.apache

Re: class KafkaCluster related errors

2021-06-07 Thread Kiran Biswal
is email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Sun, 6 Jun 2021 at 21:18, Kiran Biswal wrote: > >> *I am using spark 3.0.1 AND Kaf

java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes

2021-08-28 Thread Kiran Biswal
Hello Experts, During a join operation, I see the error below (Spark 3.0.2). Any suggestions on how to debug? Error: java.lang.AssertionError: assertion failed: Found duplicate rewrite attributes Source code: val dfFilteredFinal=dfFiltered .join(dfScenarioSite, Seq("tid","site"), "left_oute

Spark DStream application memory leak debugging

2021-09-25 Thread Kiran Biswal
Hello Experts, I have a Spark Streaming application (DStream). I use Spark 3.0.2 and Scala 2.12. This application reads about 20 different Kafka topics and produces a single stream; I filter the RDD per topic and store the results in Cassandra. I see that there is a steady increase in executor memory over the ho

Re: Spark DStream application memory leak debugging

2021-09-27 Thread Kiran Biswal
you're storing state. > You'd want to dump the heap to take a first look > > On Sat, Sep 25, 2021 at 7:24 AM Kiran Biswal > wrote: > >> Hello Experts >> >> I have a spark streaming application(DStream). I use spark 3.0.2, scala >> 2.12 This appl

protobuf data as input to spark streaming

2022-04-05 Thread Kiran Biswal
Hello Experts, Has anyone used protobuf (proto3) encoded data (from Kafka) as an input source for Spark Structured Streaming? I would appreciate it if you could share any sample code/example. Regards, Kiran

Re: protobuf data as input to spark streaming

2022-04-06 Thread Kiran Biswal
Stelios Philippou wrote: > Yes we are currently using it as such. > Code is in java. Will that work? > > On Wed, 6 Apr 2022 at 00:51, Kiran Biswal wrote: > >> Hello Experts >> >> Has anyone used protobuf (proto3) encoded data (from kafka) as input >> sour

Re: protobuf data as input to spark streaming

2022-04-08 Thread Kiran Biswal
Hello Stelios, Just a gentle follow-up: could you share any sample code/repo? Regards, Kiran On Wed, Apr 6, 2022 at 3:19 PM Kiran Biswal wrote: > Hello Stelios > > Preferred language would have been Scala or pyspark but if Java is proven > I am open to using it > > Any s

Re: protobuf data as input to spark streaming

2022-05-30 Thread Kiran Biswal
Hello Stelios, A friendly reminder: could you share any sample code/repo? Are you using a schema registry? Thanks, Kiran On Fri, Apr 8, 2022 at 4:37 PM Kiran Biswal wrote: > Hello Stelios > > Just a gentle follow up if you can share any sample code/repo > > Regards > Kiran

Structured streaming with protobuf proto3 schema registry

2022-06-06 Thread Kiran Biswal
Hello Experts, Has anyone been able to use a schema registry with Spark Structured Streaming where the input data is protobuf (proto3)? If the input is Avro, I believe a schema registry is doable. I am wondering about a protobuf schema registry: Kafka (protobuf proto3) -> schema published to registry -> spark structured

Driver throws exception every few hours

2022-09-19 Thread Kiran Biswal
Hello Experts, I am seeing the exceptions below thrown by the Spark driver every few hours, using Spark 3.3.0: com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:392) Caused by: com.fasterxml.jackson.databind.JsonMappingException: timeout (through reference chain: io

Autoscaling in Spark

2023-10-10 Thread Kiran Biswal
Hello Experts, Is there any true autoscaling option for Spark? The dynamic autoscaling works only for batch. Are there any guidelines on Spark Streaming autoscaling and how it would tie in to cluster-level autoscaling solutions? Thanks