Re: Checkpointing in foreachPartition in Spark batch

2025-04-17 Thread Abhishek Singla
@Ángel Álvarez Pascua Thanks; however, I am thinking of another solution that does not involve saving the dataframe result. Will update this thread with details soon. @daniel williams Thanks, I will surely check spark-testing-base out. Regards, Abhishek Singla On Thu, Apr 17, 2025 at 11

Re: Kafka Connector: producer throttling

2025-04-17 Thread Abhishek Singla
am missing something? Regards, Abhishek Singla On Thu, Apr 17, 2025 at 6:25 AM daniel williams wrote: > The contract between the two is a dataset, yes; but did you construct the > former via readStream? If not, it’s still batch. > > -dan > > > On Wed, Apr 16, 2025 at 4:54 PM Abh

Re: Checkpointing in foreachPartition in Spark batch

2025-04-16 Thread Abhishek Singla
failures. I wanted to know if there is an existing way in Spark batch to checkpoint already processed rows of a partition when using foreachPartition or mapPartitions, so that they are not processed again on rescheduling of the task due to failure or retriggering of the job due to failures. Regards, Abhish
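Spark batch has no built-in per-row checkpointing inside foreachPartition, so the thread is effectively about rolling your own against an external store. Below is a minimal pure-Python sketch of that idea; every name here (checkpoint_store, process_partition, flaky_send) is hypothetical, and a dict stands in for a real external store such as a database keyed by partition id. Note the semantics are at-least-once: a row that was in flight when the task died can be sent twice on retry.

```python
# Sketch: resume-from-checkpoint logic for a batch foreachPartition.
# checkpoint_store stands in for an external store (e.g. a DB) keyed by partition id.
checkpoint_store = {}

def process_partition(partition_id, rows, send):
    """Process rows, skipping indexes already checkpointed for this partition."""
    start = checkpoint_store.get(partition_id, 0)
    for i, row in enumerate(rows):
        if i < start:
            continue  # already processed in a previous (failed) attempt
        send(row)
        checkpoint_store[partition_id] = i + 1  # persist progress after each row

# Simulate a task that fails midway and is retried by the scheduler.
sent = []
def flaky_send(row):
    if row == "r2" and "r2" not in sent:
        sent.append(row)
        raise RuntimeError("transient failure after send")
    sent.append(row)

rows = ["r0", "r1", "r2", "r3"]
try:
    process_partition(0, rows, flaky_send)
except RuntimeError:
    pass
process_partition(0, rows, flaky_send)  # retry resumes at the checkpoint
# Only the in-flight row "r2" is duplicated; "r0" and "r1" are not reprocessed.
```

Checkpointing after every row is the simplest form; a real implementation would likely batch checkpoint writes (e.g. every N rows) and key the store by job id as well, so retriggered jobs can also resume.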

Re: Kafka Connector: producer throttling

2025-04-16 Thread Abhishek Singla
: "org.apache.kafka.common.serialization.ByteArraySerializer" } "linger.ms" and "batch.size" are successfully set in Producer Config (verified via startup logs) and are being honored.
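A point worth making explicit for this thread: linger.ms and batch.size control batching, not throughput. A batch is sent when it reaches batch.size bytes or when linger.ms elapses, whichever comes first, so under sustained load these settings cannot cap the send rate. A quick arithmetic sketch (record size is a hypothetical value):

```python
# linger.ms delays a send at most this long; batch.size flushes earlier on bytes.
linger_ms = 1000
batch_size = 16384   # bytes, the Kafka producer default
record_bytes = 200   # hypothetical average record size

records_to_fill_batch = batch_size // record_bytes
# If records arrive faster than this many per linger window, batches flush on
# size and linger.ms adds no delay — hence no effective throttling.
```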

Re: Kafka Connector: producer throttling

2025-04-16 Thread Abhishek Singla
kafka connector? Regards, Abhishek Singla

Checkpointing in foreachPartition in Spark batch

2025-04-16 Thread Abhishek Singla
offset so that those 100 should not get processed again. Regards, Abhishek Singla

Re: Kafka Connector: producer throttling

2025-04-16 Thread Abhishek Singla
ssl.truststore.password = null ssl.truststore.type = JKS transaction.timeout.ms = 6 transactional.id = null value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer On Wed, Apr 16, 2025 at 5:55 PM Abhishek Singla wrote: > Hi Daniel and Jungtaek, > > I am using Spark

Re: Kafka Connector: producer throttling

2025-04-16 Thread Abhishek Singla
10 records and they are flushed to the Kafka server immediately; however, the Kafka producer behaves as expected when publishing via kafka-clients using foreachPartition. Am I missing something here, or is throttling not supported in the kafka connector? Regards, Abhishek Singla On Thu, Mar 27, 2025 at 4:56
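Since the connector exposes no rate-limit option, one client-side workaround when writing via kafka-clients in foreachPartition is a token bucket in front of the send call. This is a generic sketch, not anything from the Spark or Kafka APIs; send_partition and the producer callback are hypothetical stand-ins.

```python
import time

class TokenBucket:
    """Allow at most `rate` acquisitions per second after an initial burst of `capacity`."""
    def __init__(self, rate, capacity=None):
        self.rate = float(rate)
        self.capacity = float(capacity if capacity is not None else rate)
        self.tokens = self.capacity
        self.last = time.monotonic()

    def acquire(self):
        while True:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return
            time.sleep((1.0 - self.tokens) / self.rate)

def send_partition(rows, producer_send, bucket):
    for row in rows:
        bucket.acquire()      # blocks until the rate budget allows another send
        producer_send(row)

# Small-scale demonstration: 200-token burst, then 2000 records/sec.
bucket = TokenBucket(rate=2000, capacity=200)
out = []
start = time.monotonic()
send_partition(range(400), out.append, bucket)
elapsed = time.monotonic() - start  # ~0.1s for the 200 records beyond the burst
```

Inside a real foreachPartition, the bucket would be created per task (one per partition), so the effective job-wide rate is roughly the per-task rate times the number of concurrent tasks.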

Re: Kafka Connector: producer throttling

2025-02-24 Thread Abhishek Singla
Isn't there a way to do it with the kafka connector instead of the kafka client? Isn't there any way to throttle the kafka connector? It seems like a common problem. Regards, Abhishek Singla On Mon, Feb 24, 2025 at 7:24 PM daniel williams wrote: > I think you should be using a foreachPa
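For completeness: Kafka's own answer to producer throttling is a broker-side quota on the client, not a connector option. A quota can be attached to the client.id the Spark producer uses via the standard CLI (the broker address and client name below are hypothetical):

```shell
# Limit the named producer client to ~1 MiB/s; the broker throttles it by
# delaying responses, which backpressures the producer.
kafka-configs.sh --bootstrap-server host:9092 --alter \
  --add-config 'producer_byte_rate=1048576' \
  --entity-type clients --entity-name spark-producer
```

This throttles by bytes rather than records, and requires control over the broker configuration, but it works regardless of whether the writes come from the connector or from kafka-clients.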

Kafka Connector: producer throttling

2025-02-24 Thread Abhishek Singla
Config: ProducerConfig values at runtime that they are not being set. Is there something I am missing? Is there any other way to throttle kafka writes? dataset.write().format("kafka").options(options).save(); Regards, Abhishek Singla
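One likely cause of the symptom described (ProducerConfig showing defaults at runtime): per the Structured Streaming + Kafka integration guide, Kafka producer properties passed to the Spark sink must be prefixed with "kafka." — unprefixed keys are not forwarded to the producer. A sketch of the options map, with hypothetical topic and broker address:

```python
# Producer properties need the "kafka." prefix to reach ProducerConfig;
# plain "linger.ms"/"batch.size" entries would be ignored by the sink.
options = {
    "topic": "events",                       # sink option, no prefix (hypothetical topic)
    "kafka.bootstrap.servers": "host:9092",  # hypothetical address
    "kafka.linger.ms": "1000",
    "kafka.batch.size": "100000",
}
producer_keys = {k for k in options if k.startswith("kafka.")}
```

With these options, the equivalent of dataset.write().format("kafka").options(options).save() should show the values in the ProducerConfig startup log. Note the sink manages serializers itself (byte arrays), so kafka.key.serializer / kafka.value.serializer should not be set.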

Re: Facing Error org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for s3ablock-0001-

2024-02-13 Thread Abhishek Singla
Hi Team, Could someone provide some insights into this issue? Regards, Abhishek Singla On Wed, Jan 17, 2024 at 11:45 PM Abhishek Singla < abhisheksingla...@gmail.com> wrote: > Hi Team, > > Version: 3.2.2 > Java Version: 1.8.0_211 > Scala Version: 2.12.15 > Cluster: S

Facing Error org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for s3ablock-0001-

2024-01-17 Thread Abhishek Singla
ionId, appConfig)) .option("checkpointLocation", appConfig.getChk().getPath()) .start() .awaitTermination(); Regards, Abhishek Singla
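The DiskChecker error for "s3ablock-…" typically indicates that the S3A output stream could not find a writable local directory to buffer upload blocks, usually because the configured buffer directories are missing, unwritable, or out of space on an executor. A hedged sketch of the relevant Hadoop settings as Spark conf (the path is hypothetical):

```python
# Hadoop S3A settings passed through Spark's "spark.hadoop." conf prefix.
s3a_conf = {
    # Local directories S3A uses to buffer blocks before upload; must exist,
    # be writable, and have free space on every executor.
    "spark.hadoop.fs.s3a.buffer.dir": "/mnt/tmp/s3a",
    # Alternative: buffer in memory instead of on local disk
    # (valid values: "disk", "array", "bytebuffer").
    "spark.hadoop.fs.s3a.fast.upload.buffer": "bytebuffer",
}
```

Memory buffering sidesteps disk checks entirely but shifts the pressure to executor memory, so it is usually paired with limits on active blocks per upload.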

Re: config: minOffsetsPerTrigger not working

2023-04-27 Thread Abhishek Singla

config: minOffsetsPerTrigger not working

2023-04-27 Thread Abhishek Singla
t:7077", "spark.app.name": "app", "spark.sql.streaming.kafka.useDeprecatedOffsetFetching": false, "spark.sql.streaming.metricsEnabled": true } But these configs do not seem to be working as I can see Spark processing batches of 3k-15k immediately one after another. Is there something I am missing? Ref: https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html Regards, Abhishek Singla
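Two caveats from the integration guide are worth checking against this report: minOffsetsPerTrigger was added in Spark 3.3.0 (older versions silently ignore it), and it is always bounded by maxTriggerDelay (default 15m), so smaller batches still fire once that delay expires. It is also a reader option on the Kafka source, not a "spark.sql.streaming.*" conf. A sketch of where the options belong, with hypothetical broker and topic names:

```python
# Options go on readStream.format("kafka"), not into the SparkConf.
kafka_source_options = {
    "kafka.bootstrap.servers": "host:9092",  # hypothetical address
    "subscribe": "events",                   # hypothetical topic
    "minOffsetsPerTrigger": "100000",  # wait until at least this many offsets are available...
    "maxTriggerDelay": "15m",          # ...but never wait longer than this before triggering
}
min_offsets = int(kafka_source_options["minOffsetsPerTrigger"])
```

If batches of 3k-15k fire back to back despite this, the usual suspects are a pre-3.3 Spark version, the option set as a session conf instead of a reader option, or maxOffsetsPerTrigger/trigger settings overriding the expected pacing.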