Re: Time window on Processing Time

2017-08-30 Thread madhu phatak
ark.sql.functions._ > > ds.withColumn("processingTime", current_timestamp()) > .groupBy(window("processingTime", "1 minute")) > .count() > > > On Mon, Aug 28, 2017 at 5:46 AM, madhu phatak > wrote: > >> Hi, >> As I am playing with structure
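For readers of this thread, the approach in the reply above can be sketched as a small standalone program. This is only a minimal sketch under stated assumptions, not the poster's actual code: the object name `ProcessingTimeWindow`, the local master setting, the console sink, and the built-in `rate` test source (available from Spark 2.2) are all illustrative choices. The string column name is wrapped in `col(...)` because `window` expects a `Column` as its first argument.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, current_timestamp, window}

// Hypothetical example object; the name is not from the thread.
object ProcessingTimeWindow {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ProcessingTimeWindow")
      .master("local[2]") // assumption: local run for experimentation
      .getOrCreate()

    // Assumption: the built-in "rate" source stands in for the real input stream.
    val ds = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    // Stamp each row with the time at which it is processed, then apply a
    // one-minute tumbling window on that column instead of an event-time column.
    val counts = ds
      .withColumn("processingTime", current_timestamp())
      .groupBy(window(col("processingTime"), "1 minute"))
      .count()

    // Assumption: print the running counts to the console for inspection.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .option("truncate", "false")
      .start()

    query.awaitTermination()
  }
}
```

In a streaming query, `current_timestamp()` is evaluated once per micro-batch, so every row in a batch lands in the same window; the grouping reflects when the data was processed, not when it was produced.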

Time window on Processing Time

2017-08-28 Thread madhu phatak
Hi, As I am playing with structured streaming, I observed that the window function always requires a time column in the input data. So that means it's event time. Is it possible to use the old Spark Streaming style window function based on processing time? I don't see any documentation on the same. -- Regards, M

Review of ML PR

2017-08-14 Thread madhu phatak
Hi, I provided a PR around 2 months back to improve the performance of decision tree by allowing a flexible, user-provided storage class for intermediate data. I have posted a few questions about handling backward compatibility, but there have been no answers for a long time. Can anybody help me to move this f

Re: RandomForest caching

2017-05-12 Thread madhu phatak
Hi, I opened a JIRA. https://issues.apache.org/jira/browse/SPARK-20723 Can someone have a look? On Fri, Apr 28, 2017 at 1:34 PM, madhu phatak wrote: > Hi, > > I am testing RandomForestClassification with 50gb of data which is cached > in memory. I have 64gb of ram, in which 28gb

RandomForest caching

2017-04-28 Thread madhu phatak
Hi, I am testing RandomForestClassification with 50 GB of data, which is cached in memory. I have 64 GB of RAM, of which 28 GB is used for caching the original dataset. When I run random forest, it caches around 300 GB of intermediate data, which uncaches the original dataset. This caching is triggered by

Re: Contributing Documentation Changes

2015-04-24 Thread madhu phatak
ink that your own tutorials and such should live on your blog. The > goal isn't to pull in a bunch of external docs to the site. > > On Fri, Apr 24, 2015 at 12:57 AM, madhu phatak > wrote: > > Hi, > > As I was reading contributing to Spark wiki, it was mentioned that

Contributing Documentation Changes

2015-04-23 Thread madhu phatak
Hi, As I was reading the Contributing to Spark wiki, it was mentioned that we can contribute external links to Spark tutorials. I have written many of them on my blog. It would be great if someone could add them to the Spark website. Regards, Madhukara

Help needed to publish SizeEstimator as separate library

2014-11-19 Thread madhu phatak
Hi, As I was going through the Spark source code, SizeEstimator caught my eye. It's a very useful tool for size estimation on the JVM, which helps in use cases like a memory-bounded cache. It w