Re: Time window on Processing Time

2017-08-30 Thread madhu phatak
ark.sql.functions._ > > ds.withColumn("processingTime", current_timestamp()) > .groupBy(window("processingTime", "1 minute")) > .count() > > > On Mon, Aug 28, 2017 at 5:46 AM, madhu phatak > wrote: > >> Hi, >> As I am playing with structure
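
The reply quoted above is cut off at both ends, so here is a minimal, self-contained sketch of the same idea: stamp each row with the time it is processed, then window on that column. The `rate` test source, the console sink, the local master and the object name are assumptions added for illustration and do not come from the thread. The sketch passes a `Column` to `window`, which is the signature I am sure of. As far as I know, `current_timestamp()` is evaluated once per micro-batch in a streaming query, so all rows in a batch share the same processing time.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, current_timestamp, window}

// Hypothetical driver object; the name is illustrative only.
object ProcessingTimeWindowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]") // assumption: local run for experimentation
      .appName("processing-time-window-sketch")
      .getOrCreate()

    // Assumption: the built-in "rate" source stands in for the real input stream.
    val ds = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    // Attach the processing time to each row, then count rows per 1-minute window.
    val counts = ds
      .withColumn("processingTime", current_timestamp())
      .groupBy(window(col("processingTime"), "1 minute"))
      .count()

    // Print the running counts to the console after every micro-batch.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .option("truncate", "false")
      .start()

    query.awaitTermination()
  }
}
```

Because the window is keyed on the added column rather than on a field of the input, no time column is needed in the source data, which is the situation the original question describes.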

Time window on Processing Time

2017-08-28 Thread madhu phatak
Hi, As I am playing with structured streaming, I observed that the window function always requires a time column in the input data. So that means it's event time. Is it possible to use the old Spark Streaming style window function based on processing time? I don't see any documentation on this. -- Regards, M

Review of ML PR

2017-08-14 Thread madhu phatak
Hi, I submitted a PR around 2 months ago to improve the performance of decision trees by allowing a flexible, user-provided storage class for intermediate data. I have posted a few questions about handling backward compatibility, but there have been no answers for a long time. Can anybody help me to move this f

Re: RandomForest caching

2017-05-12 Thread madhu phatak
Hi, I opened a JIRA: https://issues.apache.org/jira/browse/SPARK-20723 Can someone have a look? On Fri, Apr 28, 2017 at 1:34 PM, madhu phatak wrote: > Hi, > > I am testing RandomForestClassification with 50gb of data which is cached > in memory. I have 64gb of ram, in which 28gb

RandomForest caching

2017-04-28 Thread madhu phatak
Hi, I am testing RandomForestClassification with 50 GB of data, which is cached in memory. I have 64 GB of RAM, of which 28 GB is used for caching the original dataset. When I run random forest, it caches around 300 GB of intermediate data, which uncaches the original dataset. This caching is triggered by

Re: spark-shell 1.5 doesn't seem to work in local mode

2015-09-19 Thread Madhu
other, but that should be about it. Does 1.5.0 pick up HADOOP_INSTALL? Wouldn't spark-shell --master local override that? 1.5 seemed to completely ignore --master local - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-spark-devel

spark-shell 1.5 doesn't seem to work in local mode

2015-09-19 Thread Madhu
38) ... 76 more <console>:10: error: not found: value sqlContext import sqlContext.implicits._ ^ <console>:10: error: not found: value sqlContext import sqlContext.sql ^ - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in contex

Re: Detecting configuration problems

2015-09-08 Thread Madhu
t) and raise an alarm if it's getting too high. Even a warning on the console would be better than a catastrophic OOM. - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Detecting-configuration-pr

Detecting configuration problems

2015-09-06 Thread Madhu
lenge and does not present Spark in a positive light. I can help with that effort if someone is willing to point me to the precise location of memory pressure during shuffle. Thanks! - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-spark-devel

Re: Contributing Documentation Changes

2015-04-24 Thread madhu phatak
ink that your own tutorials and such should live on your blog. The > goal isn't to pull in a bunch of external docs to the site. > > On Fri, Apr 24, 2015 at 12:57 AM, madhu phatak > wrote: > > Hi, > > As I was reading contributing to Spark wiki, it was mentioned that

Contributing Documentation Changes

2015-04-23 Thread madhu phatak
Hi, As I was reading the Contributing to Spark wiki, I saw that we can contribute external links to Spark tutorials. I have written many of them on my blog. It would be great if someone could add them to the Spark website. Regards, Madhukara

VertexId type in GraphX

2015-01-13 Thread Madhu
that helps. - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/VertexId-type-in-GraphX-tp10104.html Sent from the Apache Spark Developers List mailing list archive at Nabbl

Re: RDD data flow

2014-12-17 Thread Madhu
on this point. Maybe I'll add this to the docs. Thanks Patrick! - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-data-flow-tp9804p9820.html Sent from the Apache Spark Developers List mail

RDD data flow

2014-12-16 Thread Madhu
ion of the flow. The declaration of Partition is throwing me off. Thanks! - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-data-flow-tp9804.html Sent from the Apache Spark Developers Li

Re: [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-11 Thread Madhu
-hadoop1.0.4.jar Ran some of my 1.2 code successfully. Reviewed some docs, looks good. spark-shell.cmd works as expected. Env details: sbtconfig.txt: -Xmx1024M -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=128m sbt --version sbt launcher version 0.13.1 - -- Madhu https://www.linkedin.com/in

Re: [ANNOUNCE] Spark 1.2.0 Release Preview Posted

2014-11-20 Thread Madhu
Thanks Patrick. I've been testing some 1.2 features, looks good so far. I have some example code that I think will be helpful for certain MR-style use cases (secondary sort). Can I still add that to the 1.2 documentation, or is that frozen at this point? - -- Madhu https://www.linkedi

Help needed to publish SizeEstimator as separate library

2014-11-19 Thread madhu phatak
Hi, As I was going through the Spark source code, SizeEstimator caught my eye. It's a very useful tool for estimating object sizes on the JVM, which helps in use cases like memory-bounded caches. It w

Re: Jira tickets for starter tasks

2014-08-29 Thread Madhu
a good thing. Just my $0.02 - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Jira-tickets-for-starter-tasks-tp8102p8127.html Sent from the Apache Spark Developers List mailing list archive

Re: Handling stale PRs

2014-08-26 Thread Madhu
tion and a shared vision might be a reason for their success. - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Handling-stale-PRs-tp8015p8061.html Sent from the Apache Spark Developers List mailin

Re: Handling stale PRs

2014-08-26 Thread Madhu
dress immediate concerns of open PRs and excessive, overlapping Jira issues, we probably have to create a meta issue and assign resources to fix it. I don't mind helping with that also. - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apac

Re: Unit test best practice for Spark-derived projects

2014-08-07 Thread Madhu
How long does it take to get a spark context? I found that if you don't have a network connection (reverse DNS lookup most likely), it can take up to 30 seconds to start up locally. I think a hosts file entry is sufficient. - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View

Re: Buidling spark in Eclipse Kepler

2014-08-07 Thread Madhu
le to build *core* in Eclipse Kepler? In my view, tool independence is a good thing. I'll do what I can to support Eclipse. - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Buidling-spark-

Re: Eclipse Scala IDE/Scala test and Wiki

2014-06-03 Thread Madhu
I was able to edit the page and add Eclipse setup steps. Thanks Matei and Reynold! - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Eclipse-Scala-IDE-Scala-test-and-Wiki-tp6908p6930.html Sent

Eclipse Scala IDE/Scala test and Wiki

2014-06-02 Thread Madhu
#ContributingtoSpark-IDESetup I can't seem to edit that page. Confluence usually has an "Edit" button in the upper right, but it does not appear for me, even though I am logged in. Am I missing something? - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in

Re: Sorting partitions in Java

2014-05-20 Thread Madhu
wse/SPARK-983>Andrew mentioned covers the rdd.sortPartitions() use case. Can someone comment on the scope of SPARK-983? Thanks! - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Sorting-pa

Re: Sorting partitions in Java

2014-05-20 Thread Madhu
thod that deals with it efficiently and reliably. Is there another solution for sorting arbitrarily large partitions? If not, I don't mind developing and contributing a solution. - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-s

Sorting partitions in Java

2014-05-20 Thread Madhu
. Ideally, it would be nice to have an efficient, robust method in RDD to sort each partition. Does something like that exist? Thanks! - -- Madhu https://www.linkedin.com/in/msiddalingaiah -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Sorting-p

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-14 Thread Madhu
I built rc5 using sbt/sbt assembly on Linux without any problems. There used to be an sbt.cmd for Windows builds; has that been deprecated? If so, I can document the Windows build steps that worked for me. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-13 Thread Madhu
I just built rc5 on Windows 7 and tried to reproduce the problem described in https://issues.apache.org/jira/browse/SPARK-1712 It works on my machine: 14/05/13 21:06:47 INFO DAGScheduler: Stage 1 (sum at <console>:17) finished in 4.548 s 14/05/13 21:06:47 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whos

Re: Spark 1.0.0 rc3

2014-05-01 Thread Madhu
I'm guessing EC2 support is not there yet? I was able to build using the binary download on both Windows 7 and RHEL 6 without issues. I tried to create an EC2 cluster, but saw this: ~/spark-ec2 Initializing spark ~ ~/spark-ec2 ERROR: Unknown Spark version Initializing shark ~ ~/spark-ec2 ~/spark-

Re: get -101 error code when running select query

2014-04-23 Thread Madhu
I have seen a similar error message when connecting to Hive through JDBC. This is just a guess on my part, but check your query. The error occurs if you have a select that includes a null literal with an alias like this: select a, b, null as c, d from foo In my case, rewriting the query to use an