If you are using the older Spark Streaming (DStreams) API and processing Avro data from
Kafka, this might be useful:
https://www.linkedin.com/pulse/avro-data-processing-spark-streaming-kafka-chandan-prakash/
Regards,
Chandan
On Sat, Nov 3, 2018 at 9:04 AM Divya Narayan
wrote:
> Hi,
>
> I produced avr
correct?
>
> I am looking for something along the lines of
>
> ```
> df.withWatermark("ts", ...).filter(F.col("ts")
> ```
>
> Is there any way to get the watermark value to achieve that?
>
> Thanks!
>
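On the question above: there is no expression-level API for referencing the current watermark inside a filter, but a running query reports it through `StreamingQuery.lastProgress` under `eventTime.watermark`. A minimal pure-Python sketch of pulling it out (the `sample` payload below is illustrative, shaped like a StreamingQueryProgress JSON):

```python
# Sketch: extract the event-time watermark from a StreamingQueryProgress dict.
# In a live job you would obtain this dict from query.lastProgress.
from datetime import datetime

def extract_watermark(progress):
    """Return the event-time watermark as a datetime, or None if absent."""
    event_time = (progress or {}).get("eventTime", {})
    wm = event_time.get("watermark")
    return datetime.strptime(wm, "%Y-%m-%dT%H:%M:%S.%fZ") if wm else None

# Example progress payload, shaped like what query.lastProgress reports:
sample = {"eventTime": {"watermark": "2018-11-03T09:00:00.000Z"}}
print(extract_watermark(sample))  # 2018-11-03 09:00:00
```

The extracted value could then be used on the driver side to build the filter predicate for the next batch.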
--
Chandan Prakash
Can anyone clear up the questions asked here?
Regards,
Chandan
On Sat, Aug 11, 2018 at 10:03 PM chandan prakash
wrote:
> Hi All,
> I was going through this pull request about new CheckpointFileManager
> abstraction in structured streaming coming in 2.4
Hi All,
I was going through this pull request about new CheckpointFileManager
abstraction in structured streaming coming in 2.4 :
https://issues.apache.org/jira/browse/SPARK-23966
https://github.com/apache/spark/pull/21048
I went through the code in detail and found it will introduce a very nice
cannot plug in existing sinks like Kafka and
> you need to write the custom logic yourself and you cannot scale the
> partitions for the sinks independently.
>
> [1]
> https://spark.apache.org/docs/2.1.2/api/java/org/apache/spark/sql/ForeachWriter.html
>
> From: chandan prakash
>>> 1. Is there a way to output the results of a structured streaming query
>>> to multiple Kafka topics each with a different key column but without
>>> having to execute multiple streaming queries?
>>>
>>>
>>> 2. If not, would it be efficient to cascade the multiple queries such
>>> that the first query does the complex aggregation and writes output
>>> to Kafka and then the other queries just read the output of the first query
>>> and write their topics to Kafka thus avoiding doing the complex aggregation
>>> again?
>>>
>>>
>>> Thanks in advance for any help.
>>>
>>>
>>> Priyank
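For question 1 above: the Structured Streaming Kafka sink can route each row by an optional `topic` column, so a single query can fan out to several topics without running multiple queries. A toy pure-Python sketch of that per-row routing idea (record fields and topic names here are made up):

```python
# Toy sketch of per-row topic routing, mirroring what the Kafka sink does
# with a "topic" column: each record carries the topic it should go to.
def route_records(records, key_to_topic, default_topic):
    """Attach a destination topic to each record based on its key column."""
    routed = []
    for rec in records:
        topic = key_to_topic.get(rec["key"], default_topic)
        routed.append({**rec, "topic": topic})
    return routed

records = [{"key": "orders", "value": 1}, {"key": "clicks", "value": 2}]
mapping = {"orders": "orders-topic", "clicks": "clicks-topic"}
print(route_records(records, mapping, "fallback-topic"))
```

In actual Spark code the same effect comes from adding a `topic` column to the DataFrame before `writeStream.format("kafka")`, which avoids the cascading-query approach of question 2.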
>>>
>>>
>>>
>>
>
--
Chandan Prakash
ke a look
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
--
Chandan Prakash
Regards,
Chandan
On Thu, Jul 5, 2018 at 12:38 PM SRK wrote:
> Hi,
>
> Is there a way that Automatic Json Schema inference can be done using
> Structured Streaming? I do not want to supply a predefined schema and bind
> it.
>
> With Spark Kafka Direct I could do spark.read.json(). I see that this is
> not
> supported in Structured Streaming.
>
>
> Thanks!
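A common workaround for the schema question above: infer the schema once from a static sample with `spark.read.json`, then pass the result to `spark.readStream.schema(...)`. As a toy illustration of the inference step only, here is a pure-Python sketch that maps one sample record to a Spark-style struct (field names and type mapping are only examples; real inference should be left to `spark.read.json`):

```python
import json

# Toy sketch: derive a Spark-style struct (as plain JSON) from one sample
# record. In practice you would let spark.read.json infer the schema from a
# sample file and hand that schema to spark.readStream.schema(...).
_PY_TO_SPARK = {str: "string", bool: "boolean", int: "long", float: "double"}

def infer_struct(sample):
    fields = [{"name": k,
               "type": _PY_TO_SPARK.get(type(v), "string"),
               "nullable": True}
              for k, v in sample.items()]
    return {"type": "struct", "fields": fields}

sample = json.loads('{"user": "abc", "count": 3, "score": 1.5}')
print(infer_struct(sample))
```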
>
>
>
--
Chandan Prakash
Thanks
> Amiya
>
>
>
>
>
>
>
--
Chandan Prakash
LongAccumulator but it seems like it is per batch
and not for the life of the application.
Any other way or workaround to achieve this, please share.
Thanks in advance.
Regards,
--
Chandan Prakash
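On the accumulator question above: the streaming programming guide recommends a lazily instantiated singleton, so the accumulator is re-created on demand after a checkpoint restart and keeps a running total across batches. A pure-Python sketch of that pattern (class and method names are invented for illustration):

```python
import threading

# Sketch of the lazily-instantiated singleton pattern: the driver re-creates
# the counter on demand instead of serializing it into the checkpoint, so the
# total keeps accumulating across batches (and across restarts).
class RunningCounter:
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get_or_create(cls):
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n

# Each "batch" adds to the same counter, so the total spans batches.
for batch in ([1, 2], [3], [4, 5]):
    RunningCounter.get_or_create().add(sum(batch))
print(RunningCounter.get_or_create().total)  # 15
```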
Can you keep posting once you have any solution.
>
> Thanks
> Amiya
>
>
>
>
>
--
Chandan Prakash
to use yarn-client mode only
2. want to enable logging in yarn container only so that it is aggregated
and backed up by yarn every hour to hdfs/s3
*How can I work around this to enable driver log rolling and
aggregation as well?*
Any pointers will be helpful.
thanks in advance.
--
Chandan Prakash
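On the driver-log question above: one commonly used approach is a rolling appender in the driver's log4j configuration, so driver logs are rotated locally even though YARN only aggregates container logs. A sketch of a log4j 1.x properties fragment (paths and sizes are illustrative):

```
log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.File=/var/log/spark/driver.log
log4j.appender.rolling.MaxFileSize=50MB
log4j.appender.rolling.MaxBackupIndex=10
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
```

The rotated files can then be shipped to HDFS/S3 by an external agent, since YARN log aggregation will not pick up yarn-client driver logs.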
follow up for a fix.
>
> _
> From: Hyukjin Kwon
> Sent: Wednesday, February 14, 2018 6:49 PM
> Subject: Re: SparkR test script issue: unable to run run-tests.h on spark
> 2.2
> To: chandan prakash
> Cc: user @spark
>
>
>
> From a
like testthat as mentioned in doc
3. run run-tests.h
Every time I am getting this error line:
Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
object 'run_tests' not found
Calls: ::: -> get
Execution halted
Any Help?
--
Chandan Prakash
ons (e.g.
> updateStateByKey), but you're certainly not required to rely on it for
> fault tolerance. I try not to.
>
> On Fri, Aug 19, 2016 at 1:51 PM, chandan prakash <
> chandanbaran...@gmail.com> wrote:
>
>> Thanks Cody for the pointer.
>>
>> I am ab
> On Aug 18, 2016 8:53 AM, "chandan prakash"
> wrote:
>
>> Yes,
>> i looked into the source code implementation. sparkConf is serialized
>> and saved during checkpointing and re-created from the checkpoint directory
>> at time of restart. So any sparkConf p
guide.html#upgrading-application-code
>
>
> On Thu, Aug 18, 2016 at 4:39 AM, chandan prakash <
> chandanbaran...@gmail.com> wrote:
>
>> Is it possible that i use checkpoint directory to restart streaming but
>> with modified parameter value in config file (e.g. username/p
Is it possible that i use checkpoint directory to restart streaming but
with modified parameter value in config file (e.g. username/password for
db connection) ?
Thanks in advance.
Regards,
Chandan
On Thu, Aug 18, 2016 at 1:10 PM, chandan prakash
wrote:
> Hi,
> I am using direct kafk
val ssc = StreamingContext.getOrCreate(
  checkpointDir,
  setupSsc(topics, kafkaParams, jdbcDriver, jdbcUrl, jdbcUser, jdbcPassword, checkpointDir) _
)
--
Chandan Prakash
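On reloading credentials after a checkpoint restart: values captured in the checkpointed DStream graph (including SparkConf entries and the arguments above) are frozen, so the usual workaround is to read mutable settings from an external file at batch-execution time instead of at graph-construction time. A pure-Python sketch of that idea (file layout and key names are made up):

```python
import json
import os
import tempfile

# Sketch: since the checkpoint freezes values captured when the graph was
# built, re-read mutable settings (e.g. DB credentials) from an external
# file each time a batch actually needs them.
def load_credentials(path):
    with open(path) as f:
        return json.load(f)

# Simulate an ops change to the config file between "batches":
path = os.path.join(tempfile.mkdtemp(), "db.json")
with open(path, "w") as f:
    json.dump({"user": "old_user"}, f)
print(load_credentials(path)["user"])   # old_user

with open(path, "w") as f:
    json.dump({"user": "new_user"}, f)
print(load_credentials(path)["user"])   # new_user
```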
;
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/JavaStreamingContext-stop-hangs-tp27257.html
--
Chandan Prakash
rties ")
On Tue, May 24, 2016 at 10:24 PM, chandan prakash wrote:
> Any suggestion?
>
> On Mon, May 23, 2016 at 5:18 PM, chandan prakash <
> chandanbaran...@gmail.com> wrote:
>
>> Hi,
>> I am able to do logging for driver but not for executor.
>>
Any suggestion?
On Mon, May 23, 2016 at 5:18 PM, chandan prakash
wrote:
> Hi,
> I am able to do logging for driver but not for executor.
>
> I am running spark streaming under mesos.
> Want to do log4j logging separately for driver and executor.
>
> Used the below option in
estLogDriver.log) is happening
fine.
But for the executor, there is no logging happening (it should be at
/tmp/requestLogExecutor.log as specified in log4j_RequestLogExecutor.properties
on the executor machines)
*Any suggestions on how to get logging enabled for the executor?*
TIA,
Chandan
--
Chandan Prakash
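On the executor-logging question above: a commonly suggested setup is to ship the executor properties file with the job and point the executor JVMs at it via `spark.executor.extraJavaOptions`. A sketch of the submit command (the properties file name is the one from this thread; everything else is illustrative):

```
spark-submit \
  --files log4j_RequestLogExecutor.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j_RequestLogExecutor.properties" \
  ...
```

Without `--files` (or an equivalent distribution mechanism), the properties file must already exist at the given path on every executor machine.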
question about "but what if checkpoints don't let me
> do this" is always going to be "well, don't rely on checkpoints."
>
> If you want dynamic topicpartitions,
> https://issues.apache.org/jira/browse/SPARK-12177
>
>
> On Fri, May 13, 2016 at
Streaming's Kafka integration can't update its
>> parallelism when Kafka's partition number is changed?
>>
>
>
--
Chandan Prakash
but you're better off addressing the
> underlying issue if you can.
>
> On Tue, May 10, 2016 at 8:08 AM, Soumitra Siddharth Johri
> wrote:
> > Also look at back pressure enabled. Both of these can be used to limit
> the
> > rate
> >
> > Sent from my iPhone
fast rate and some with very slow rate.*
Regards,
Chandan
--
Chandan Prakash
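On limiting the consumption rate for the mixed fast/slow topics above: the relevant knobs for the direct Kafka DStream are backpressure and a per-partition rate cap. A configuration sketch (the numeric value is only an example to tune per workload):

```
spark.streaming.backpressure.enabled=true
spark.streaming.kafka.maxRatePerPartition=1000
```

Backpressure adapts the ingest rate to processing time, while `maxRatePerPartition` puts a hard upper bound on records read per Kafka partition per second.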
--
Chandan Prakash