@Ángel Álvarez Pascua
Thanks. However, I am thinking of another solution that does not involve
saving the dataframe result. I will update this thread with details soon.
@daniel williams
Thanks, I will surely check spark-testing-base out.
Regards,
Abhishek Singla
On Thu, Apr 17, 2025 at 11
am missing something?
Regards,
Abhishek Singla
On Thu, Apr 17, 2025 at 6:25 AM daniel williams
wrote:
> The contract between the two is a dataset, yes; but did you construct the
> former via readStream? If not, it’s still batch.
>
> -dan
>
>
> On Wed, Apr 16, 2025 at 4:54 PM Abhishek Singla wrote:
failures. I wanted to know if there is an existing way in Spark batch to
checkpoint already-processed rows of a partition when using foreachPartition
or mapPartitions, so that they are not processed again when a task is
rescheduled after a failure or the job is retriggered due to failures.
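As far as I know there is nothing built into batch Spark that checkpoints individual rows: a failed task is rerun from the start of its partition. One workaround is to make the work idempotent yourself by keeping the ids of processed rows in an external store and skipping them inside mapPartitions. A rough sketch of that idea follows; ProcessedStore, its factory, handle() and the "id" column are all hypothetical placeholders, not an existing Spark API.

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.function.MapPartitionsFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

public class IdempotentPartitions {

    // Hypothetical tracker of already-processed record ids (e.g. backed by a
    // database or key-value store); not an existing Spark API.
    public interface ProcessedStore {
        boolean isProcessed(String recordId);
        void markProcessed(String recordId);
    }

    // Serializable factory so each executor can open its own store connection.
    public interface ProcessedStoreFactory extends Serializable {
        ProcessedStore open();
    }

    public static Dataset<String> process(Dataset<Row> input, ProcessedStoreFactory factory) {
        return input.mapPartitions((MapPartitionsFunction<Row, String>) rows -> {
            ProcessedStore store = factory.open();
            List<String> out = new ArrayList<>();
            while (rows.hasNext()) {
                Row row = rows.next();
                String id = row.getAs("id");   // assumes each record has a unique id column
                if (store.isProcessed(id)) {
                    continue;                  // skip work already done by a failed attempt
                }
                out.add(handle(row));          // hypothetical per-record work
                store.markProcessed(id);       // record progress so a retry skips this row
            }
            return out.iterator();
        }, Encoders.STRING());
    }

    private static String handle(Row row) {
        return row.toString();                 // placeholder for the real processing
    }
}

The markProcessed call would need to be atomic with (or at least ordered after) the actual side effect, otherwise a crash between the two can still produce duplicates.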
Regards,
Abhishek Singla
:
"org.apache.kafka.common.serialization.ByteArraySerializer"
}
"linger.ms" and "batch.size" are successfully set in Producer Config
(verified via startup logs) and are being honored.
kafka connector?
Regards,
Abhishek Singla
offset so that those 100 should not get processed again.
Regards,
Abhishek Singla
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 6
transactional.id = null
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
On Wed, Apr 16, 2025 at 5:55 PM Abhishek Singla
wrote:
> Hi Daniel and Jungtaek,
>
> I am using Spark
10 records and they are flushed to the Kafka server immediately; however, the
Kafka producer behaviour when publishing via kafka-clients using
foreachPartition is as expected. Am I missing something here, or is
throttling not supported in the Kafka connector?
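For comparison, this is roughly what the kafka-clients path inside foreachPartition looks like, where linger.ms and batch.size apply to the producer you build yourself; the broker address, topic, "value" column and the concrete values below are placeholders, not taken from the thread.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class KafkaClientsPublish {

    public static void publish(Dataset<Row> dataset, String bootstrapServers, String topic) {
        dataset.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put(ProducerConfig.LINGER_MS_CONFIG, "1000");   // example values only
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536");

            // One producer per partition; linger.ms/batch.size apply to it directly.
            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                while (rows.hasNext()) {
                    byte[] value = rows.next().getAs("value"); // assumes a binary "value" column
                    producer.send(new ProducerRecord<>(topic, value));
                }
                producer.flush();                              // drain anything still buffered
            }
        });
    }
}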
Regards,
Abhishek Singla
On Thu, Mar 27, 2025 at 4:56
Isn't there a way to do it with the Kafka connector instead of the Kafka
client? Isn't there any way to throttle the Kafka connector? It seems like a
common problem.
Regards,
Abhishek Singla
On Mon, Feb 24, 2025 at 7:24 PM daniel williams
wrote:
> I think you should be using a foreachPartition
Config:
ProducerConfig values* at runtime that they are not being set. Is there
something I am missing? Is there any other way to throttle kafka writes?
*dataset.write().format("kafka").options(options).save();*
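One thing worth double-checking on this path: the Kafka data source only forwards options whose keys carry the "kafka." prefix to the underlying producer, so the batching settings would need to be passed as kafka.linger.ms and kafka.batch.size rather than as bare producer keys. A minimal sketch with placeholder broker, topic and values:

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class KafkaSinkWrite {

    // The dataset is expected to carry the payload in a "value" column
    // (and optionally "key"/"topic"), as required by the Kafka sink.
    public static void writeToKafka(Dataset<Row> dataset) {
        Map<String, String> options = new HashMap<>();
        options.put("kafka.bootstrap.servers", "broker1:9092"); // placeholder broker
        options.put("topic", "output-topic");                   // placeholder topic
        options.put("kafka.linger.ms", "1000");                 // note the "kafka." prefix
        options.put("kafka.batch.size", "65536");

        dataset.write().format("kafka").options(options).save();
    }
}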
Regards,
Abhishek Singla
Hi Team,
Could someone provide some insights into this issue?
Regards,
Abhishek Singla
On Wed, Jan 17, 2024 at 11:45 PM Abhishek Singla <
abhisheksingla...@gmail.com> wrote:
> Hi Team,
>
> Version: 3.2.2
> Java Version: 1.8.0_211
> Scala Version: 2.12.15
> Cluster: S
ionId, appConfig))
.option("checkpointLocation", appConfig.getChk().getPath())
.start()
.awaitTermination();
Regards,
Abhishek Singla
t:7077",
"spark.app.name": "app",
"spark.sql.streaming.kafka.useDeprecatedOffsetFetching": false,
"spark.sql.streaming.metricsEnabled": true
}
But these configs do not seem to be working, as I can see Spark processing
batches of 3k-15k immediately one after another. Is there something I am
missing?
Ref:
https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html
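If the intent of those configs was to cap the size of each micro-batch, note that maxOffsetsPerTrigger is documented on that page as an option of the Kafka source itself (set via readStream().option(...)), not as a session conf, and spacing the batches out in time additionally needs a trigger interval. A rough sketch with placeholder broker, topic, checkpoint and sink values:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.Trigger;

public class ThrottledKafkaStream {

    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("app").getOrCreate();

        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
                .option("subscribe", "input-topic")                // placeholder topic
                .option("maxOffsetsPerTrigger", 1000L)             // cap offsets read per micro-batch
                .load();

        StreamingQuery query = stream.writeStream()
                .format("console")                                 // placeholder sink for the sketch
                .option("checkpointLocation", "/tmp/checkpoint")   // placeholder path
                .trigger(Trigger.ProcessingTime("30 seconds"))     // space out micro-batches
                .start();

        query.awaitTermination();
    }
}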
Regards,
Abhishek Singla