See if this works:
println(query.lastProgress.eventTime.get("watermark"))
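For reference, a slightly fuller sketch of the same idea (this assumes query is a started StreamingQuery with a watermark defined on it; lastProgress can be null before the first trigger completes):

val progress = query.lastProgress   // most recent StreamingQueryProgress, null before the first trigger
if (progress != null) {
  // eventTime is a java.util.Map[String, String]; the "watermark" entry shows
  // the current event-time watermark once one has been computed.
  println(progress.eventTime.get("watermark"))
}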
Regards,
Sanjay
On 2018/09/30 09:05:40, peay wrote:
> Thanks for the pointers. I guess right now the only workaround would be to
> apply a "dummy" aggregation (e.g., group by the timestamp itself) only to
> have the sta
Hi,
I was going through Matei's Advanced Spark presentation at
https://www.youtube.com/watch?v=w0Tisli7zn4 and had a few questions.
The slide deck for that talk is at
http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf
The PageRank example int
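For context, a condensed sketch of the PageRank example from those slides; it roughly follows the standard Spark example (the input path, the "srcURL dstURL" line format, and 10 iterations are assumptions):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("PageRankSketch").setMaster("local[*]"))

// Input lines of the assumed form "srcURL dstURL".
val links = sc.textFile("links.txt")
  .map { line => val parts = line.split("\\s+"); (parts(0), parts(1)) }
  .distinct()
  .groupByKey()
  .cache()

var ranks = links.mapValues(_ => 1.0)

for (_ <- 1 to 10) {
  // Each page sends its rank, split evenly, to the pages it links to.
  val contribs = links.join(ranks).values.flatMap { case (urls, rank) =>
    urls.map(url => (url, rank / urls.size))
  }
  // Recombine contributions with the usual damping factor.
  ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
}

ranks.collect().foreach(println)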
Hi,
I had initially thought of a streaming approach to solve my problem, but I am
stuck in a few places and would like an opinion on whether this problem is
suitable for streaming, or whether it is better to stick to basic Spark.
Problem: I get chunks of log files in a folder and need to do some analysis on
them on an h
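For comparison, the plain (non-streaming) Spark version of such a job might look roughly like this, run once per hour by an external scheduler; the path and the analysis are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("HourlyLogBatch"))

// Read whatever log chunks are currently sitting in the folder.
val lines = sc.textFile("hdfs:///path/to/logDir/*.log")

// Example analysis: how many ERROR lines arrived in this run's chunk.
val errorCount = lines.filter(_.contains("ERROR")).count()
println(s"errors this run: $errorCount")

sc.stop()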
ening because of the lazy
transformations.
Wherever a sliding window is used, in most cases no actions will be taken on the
individual pre-window batches, so my gut feeling was that Streaming would still
read every batch as long as some action is taken on the windowed stream.
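For concreteness, a rough sketch of that situation (the source, app name, and intervals are made up):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("WindowOnlyAction").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(4))            // batch interval

val lines = ssc.socketTextStream("localhost", 9999)         // assumed source
// No action is ever called on `lines` itself ...
val windowed = lines.window(Seconds(20), Seconds(8))        // window length, slide interval

// ... but print() is an action on the windowed stream, so Streaming still has to
// read and retain every underlying batch that falls inside some window.
windowed.count().print()

ssc.start()
ssc.awaitTermination()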
Regards,
Sanjay
On Fr
Hi,
I want to run a map/reduce process over the last 5 seconds of data, every 4
seconds. This is quite similar to the sliding-window pictorial example under the
Window Operations section of
http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.html
.
The RDDs returned by window t
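A rough sketch of the 5-seconds-every-4-seconds case (both window length and slide interval must be multiples of the batch interval, so a 1-second batch interval is assumed; the source and the analysis are made up):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("FiveSecEveryFourSec").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(1))

val events = ssc.socketTextStream("localhost", 9999)

// Each RDD of `windowed` covers the last 5 seconds and is produced every 4 seconds.
val windowed = events.window(Seconds(5), Seconds(4))

// Example map/reduce over each window: word counts.
windowed.flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .print()

ssc.start()
ssc.awaitTermination()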
Hi Jaonary,
I believe the n folds should be mapped into n keys in Spark using a map
function. You can then reduce the returned PairRDD and you should get your metric.
I don't understand partitions fully, but from what I do understand of them, they
aren't required in your scenario.
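A loose sketch of that idea (the data, the fold assignment, and the metric are made up purely for illustration):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("FoldsAsKeys").setMaster("local[*]"))
val nFolds = 5

// Pretend these are per-record evaluation scores.
val scores = sc.parallelize(1 to 100).map(_.toDouble)

// Map each record to one of n fold keys, then reduce the PairRDD per key.
val metricPerFold = scores
  .map(x => (x.toInt % nFolds, x))
  .reduceByKey(_ + _)               // e.g. a summed metric per fold

metricPerFold.collect().foreach { case (fold, m) => println(s"fold $fold: $m") }

sc.stop()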
Regards,
Sanjay
e element being the count
>>>result for the corresponding original RDD.
>>>
>>>
>>>
>>>For reduce, it's the same using reduce operation...
>>>
>>>The only operations that are a bit more complex are reduceByWindow &
>>
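To make the quoted explanation concrete, a small sketch of count() and reduce() on a DStream, each of which yields a one-element RDD per batch (the source and intervals are made up):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("PerBatchCountReduce").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(10))

val lines = ssc.socketTextStream("localhost", 9999)

val countPerBatch = lines.count()                              // one element per batch: the line count
val charsPerBatch = lines.map(_.length.toLong).reduce(_ + _)   // one element per batch: total characters

countPerBatch.print()
charsPerBatch.print()

ssc.start()
ssc.awaitTermination()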
tiple RDDs from
a DStream in *every batch*. Can you elaborate on why you need multiple RDDs
every batch?
>
>
>TD
>
>
>
>On Wed, Mar 19, 2014 at 10:20 PM, Sanjay Awatramani
>wrote:
>
>Hi,
>>
>>
>>As I understand, a DStream consists of 1 or more
Hi,
As I understand, a DStream consists of 1 or more RDDs. And foreachRDD will run
a given func on each and every RDD inside a DStream.
I created a simple program which reads log files from a folder every hour:
JavaStreamingContext stcObj = new JavaStreamingContext(confObj, new Duration(60
* 60
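A sketch of that kind of program (written in Scala rather than Java for brevity; the folder path and the per-batch analysis are placeholders, and an hourly batch interval is assumed):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("HourlyLogStream")
val ssc = new StreamingContext(conf, Seconds(60 * 60))     // one batch per hour

// textFileStream only picks up files newly placed in the folder after start().
val logs = ssc.textFileStream("hdfs:///path/to/logDir")

// foreachRDD runs the given function once per batch, on that batch's RDD.
logs.foreachRDD { rdd =>
  println(s"lines in this hour's batch: ${rdd.count()}")
}

ssc.start()
ssc.awaitTermination()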