Re: Watermarking without aggregation with Structured Streaming

2018-10-24 Thread Sanjay Awatramani
Try if this works... println(query.lastProgress.eventTime.get("watermark")) Regards, Sanjay. On 2018/09/30 09:05:40, peay wrote: > Thanks for the pointers. I guess right now the only workaround would be to apply a "dummy" aggregation (e.g., group by the timestamp itself) only to have the sta…
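A minimal sketch of the suggestion above, assuming a running StreamingQuery named query. Note that lastProgress is null before the first trigger completes, and eventTime is a java.util.Map whose "watermark" key is only present once a watermark is defined, so both are wrapped defensively here:

```scala
import org.apache.spark.sql.streaming.StreamingQuery

// Read the current watermark from the most recent progress report.
// `query` is assumed to be an already-started StreamingQuery.
def currentWatermark(query: StreamingQuery): Option[String] = {
  // lastProgress is null until the first trigger has completed.
  Option(query.lastProgress)
    // eventTime is a java.util.Map[String, String]; "watermark" is
    // only present when the query defines a watermark.
    .flatMap(p => Option(p.eventTime.get("watermark")))
}
```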

Require some clarity on partitioning

2014-04-07 Thread Sanjay Awatramani
Hi, I was going through Matei's Advanced Spark presentation at https://www.youtube.com/watch?v=w0Tisli7zn4 and had a few questions. The slides for this video are at http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf . The PageRank example int…
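The snippet is cut off, but the partitioning idea in those slides is co-partitioning: hash-partition the links RDD once and cache it, so the per-iteration join with ranks never reshuffles the links. A sketch under that assumption (the edges input, partition count, and iteration count are all illustrative):

```scala
import org.apache.spark.{HashPartitioner, SparkContext}

// Sketch of the PageRank partitioning trick from the slides:
// fix the partitioning of `links` up front and cache it, so each
// iteration's join sends only the small `ranks` RDD over the wire.
def pageRankSketch(sc: SparkContext, edges: Seq[(String, Seq[String])]): Unit = {
  val links = sc.parallelize(edges)
    .partitionBy(new HashPartitioner(8)) // fixed partitioning
    .cache()

  var ranks = links.mapValues(_ => 1.0)  // inherits links' partitioner

  for (_ <- 1 to 10) {
    val contribs = links.join(ranks).values.flatMap {
      case (dests, rank) => dests.map(d => (d, rank / dests.size))
    }
    ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
  }
  ranks.collect().foreach(println)
}
```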

Implementation problem with Streaming

2014-03-25 Thread Sanjay Awatramani
Hi, I had initially thought of a streaming approach to solve my problem, but I am stuck at a few places and want an opinion on whether this problem is suitable for streaming, or whether it is better to stick to basic Spark. Problem: I get chunks of log files in a folder and need to do some analysis on them on an h…
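The problem description is truncated, but given the hourly log-chunk setup it describes, the streaming variant would look roughly like this sketch; the directory path and the hourly batch interval are assumptions, not values from the original message:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Minimal sketch: watch a directory for newly arriving log chunks
// with an hourly batch interval.
object HourlyLogs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HourlyLogs")
    val ssc  = new StreamingContext(conf, Seconds(60 * 60))

    // textFileStream only picks up files newly moved into the directory.
    val lines = ssc.textFileStream("/data/incoming-logs")
    lines.count().print() // placeholder for the real analysis

    ssc.start()
    ssc.awaitTermination()
  }
}
```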

Re: Sliding Window operations do not work as documented

2014-03-24 Thread Sanjay Awatramani
…ening because of the lazy transformations. Wherever a sliding window is used, in most cases no actions are taken on the pre-window batches; hence my gut feeling was that Streaming would read every batch as long as actions are taken on the windowed stream. Regards, Sanjay On Fr…

Sliding Window operations do not work as documented

2014-03-21 Thread Sanjay Awatramani
Hi, I want to run a map/reduce process over the last 5 seconds of data, every 4 seconds. This is quite similar to the sliding window pictorial example under the Window Operations section of http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.html . The RDDs returned by window t…
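A minimal sketch of the windowing described (5-second window, sliding every 4 seconds), assuming a source DStream of text lines named lines and a word-count job standing in for the actual map/reduce:

```scala
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

// Sketch: word counts over the last 5 seconds, recomputed every
// 4 seconds. `lines` is an assumed source DStream[String].
def windowedCounts(lines: DStream[String]): DStream[(String, Int)] = {
  lines
    .flatMap(_.split("\\s+"))
    .map(word => (word, 1))
    // Window length 5s, slide interval 4s. Per the docs, both must
    // be multiples of the batch interval (e.g., 1s here) for window
    // operations to behave as documented.
    .reduceByKeyAndWindow(_ + _, Seconds(5), Seconds(4))
}
```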

Re: N-Fold validation and RDD partitions

2014-03-21 Thread Sanjay Awatramani
Hi Jaonary, I believe the n folds should be mapped to n keys in Spark using a map function. You can then reduce the returned PairRDD and you should get your metric. I don't understand partitions fully, but from what I understand of them, they aren't required in your scenario.
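A sketch of that suggestion: assign each record a fold index as its key, then reduce the PairRDD to get one metric per fold. The fold count, the hash-based fold assignment, and the squared-value metric are all illustrative assumptions:

```scala
import org.apache.spark.rdd.RDD

// Sketch: tag each record with a fold index (the key), then
// aggregate a per-fold metric with a reduce over the PairRDD.
def perFoldMetric(data: RDD[Double], nFolds: Int = 5): RDD[(Int, Double)] = {
  data
    .map(x => (math.abs(x.hashCode) % nFolds, x)) // fold index as key
    .mapValues(x => x * x)                        // stand-in per-record metric
    .reduceByKey(_ + _)                           // one summed metric per fold
}
```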

Re: Relation between DStream and RDDs

2014-03-21 Thread Sanjay Awatramani
> …e element being the count result for the corresponding original RDD. For reduce, it's the same using the reduce operation... The only operations that are a bit more complex are reduceByWindow & …

Re: Relation between DStream and RDDs

2014-03-20 Thread Sanjay Awatramani
…tiple RDDs from a DStream in *every batch*. Can you elaborate on why you need multiple RDDs every batch? TD On Wed, Mar 19, 2014 at 10:20 PM, Sanjay Awatramani wrote: > Hi, as I understand, a DStream consists of 1 or more…

Relation between DStream and RDDs

2014-03-19 Thread Sanjay Awatramani
Hi, as I understand, a DStream consists of 1 or more RDDs, and foreachRDD will run a given function on each and every RDD inside a DStream. I created a simple program that reads log files from a folder every hour: JavaStreamingContext stcObj = new JavaStreamingContext(confObj, new Duration(60 * 60…
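The code above is cut off mid-line, but the pattern it describes is a file-based stream processed with foreachRDD. The original uses the Java API; the sketch below is the Scala equivalent, with the directory path as an assumption. The key point from the thread: a DStream yields exactly one RDD per batch interval, so foreachRDD fires once per batch:

```scala
import org.apache.spark.streaming.StreamingContext

// Sketch in the Scala API: a file-based DStream produces one RDD per
// batch interval, and foreachRDD runs the given function on each of
// those RDDs.
def describeBatches(ssc: StreamingContext): Unit = {
  ssc.textFileStream("/data/logs").foreachRDD { rdd =>
    // Called once per batch, even when that batch's RDD is empty.
    println(s"Batch RDD with ${rdd.count()} records")
  }
}
```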