Re: RepartitionByKey Behavior

2018-06-22 Thread Nathan Kronenfeld
> > On Thu, Jun 21, 2018 at 4:51 PM, Chawla, Sumit wrote: > > > Hi, I have been trying to do this simple operation. I want to land all values with one key in the same partition, and not have any different key in the same partition. Is this possible? I am getting b and c alwa
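One way to read the question above: Spark's default HashPartitioner guarantees that all values for a given key land in the same partition, but it does not guarantee that different keys land in different partitions. When the key set is known up front, a custom Partitioner can enforce the stronger property. This is a hedged sketch, not code from the thread; all names are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext, Partitioner}

// Illustrative: one partition per distinct key, so no two keys share a partition.
// Assumes the full key set is known when the partitioner is built.
class ExactKeyPartitioner(keys: Seq[String]) extends Partitioner {
  private val index = keys.zipWithIndex.toMap
  override def numPartitions: Int = keys.size
  override def getPartition(key: Any): Int = index(key.asInstanceOf[String])
}

object RepartitionByKeyDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("repartition-by-key-demo"))
    val data = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("c", 4)))
    // partitionBy with the custom partitioner lands each key in its own partition.
    val parted = data.partitionBy(new ExactKeyPartitioner(Seq("a", "b", "c")))
    parted.glom().collect().foreach(p => println(p.mkString(", ")))
    sc.stop()
  }
}
```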

Re: Problems with spark.locality.wait

2014-11-13 Thread Nathan Kronenfeld
> > resources were offered, then wait for spark.locality.wait.node, which was set to 30 minutes, the 2 RACK_LOCAL tasks will wait 30 minutes even though resources are available. > > Has anyone met this problem? Do you have a nice solution? > > Thanks > > Ma chong -- Nathan Kronenfeld Senior Visualization Developer Oculus Info Inc 2 Berkeley Street, Suite 600, Toronto, Ontario M5A 4J5 Phone: +1-416-203-3003 x 238 Email: nkronenf...@oculusinfo.com
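For reference, the waits in question are configured per locality level, and can be shortened so the scheduler falls back from NODE_LOCAL to RACK_LOCAL and ANY quickly instead of stalling. A hedged spark-defaults.conf sketch (values are illustrative; 2014-era Spark 1.x takes the value in milliseconds):

```
# Shorten locality fallback so tasks stop waiting for NODE_LOCAL slots
spark.locality.wait       3000
spark.locality.wait.node  3000
spark.locality.wait.rack  3000
```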

PR 5140

2015-04-08 Thread Nathan Kronenfeld
Could I get someone to look at PR 5140 please? It's been languishing for more than two weeks.

Re: Spark streaming vs. spark usage

2015-04-17 Thread Nathan Kronenfeld
https://github.com/apache/spark/pull/5565, and would very much appreciate comments. Thanks, Nathan On Thu, Dec 19, 2013 at 12:42 AM, Reynold Xin wrote: > > On Wed, Dec 18, 2013 at 12:17 PM, Nathan Kronenfeld < > nkronenf...@oculusinfo.com> wrote: &

Re: Change for submitting to yarn in 1.3.1

2015-05-21 Thread Nathan Kronenfeld
> In researching and discussing these issues with Cloudera and others, we've > been told that only one mechanism is supported for starting Spark jobs: the > *spark-submit* scripts. > Is this new? We've been submitting jobs directly from a programmatically created spark context (instead of through s
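For context, the "programmatic" style referred to above is constructing the SparkContext directly in application code rather than launching through spark-submit. A minimal hedged sketch (master string and app name are illustrative; whether this works against YARN depends on the Spark version and deploy mode, as the reply in this thread notes):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DirectSubmit {
  def main(args: Array[String]): Unit = {
    // Build the context directly instead of launching via spark-submit.
    val conf = new SparkConf()
      .setMaster("yarn-client")   // 1.3.x-era master string; illustrative
      .setAppName("direct-submit-demo")
    val sc = new SparkContext(conf)
    try {
      println(sc.parallelize(1 to 10).sum())
    } finally {
      sc.stop()
    }
  }
}
```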

Re: Change for submitting to yarn in 1.3.1

2015-05-21 Thread Nathan Kronenfeld
Thanks, Marcelo > Instantiating SparkContext directly works. Well, sorta: it has > limitations. For example, see discussions about Spark not really liking > multiple contexts in the same JVM. It also does not work in "cluster" > deploy mode. > > That's fine - when one is doing something out of s

Testing spark applications

2015-05-21 Thread Nathan Kronenfeld
> > see discussions about Spark not really liking multiple contexts in the > same JVM > Speaking of this - is there a standard way of writing unit tests that require a SparkContext? We've ended up copying out the code of SharedSparkContext to our own testing hierarchy, but it occurs to me someone
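The trait being copied is roughly a ScalaTest mixin that creates one SparkContext per suite. A hedged sketch of that pattern, similar in spirit to Spark's own test helper mentioned above (names are illustrative, not Spark's actual internal code):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, Suite}

// Illustrative SharedSparkContext-style mixin: one local context per suite,
// created before the first test and stopped after the last.
trait SharedSparkContextLike extends BeforeAndAfterAll { self: Suite =>
  @transient protected var sc: SparkContext = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName(suiteName))
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop()
    super.afterAll()
  }
}
```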

repositories for spark jars

2014-03-17 Thread Nathan Kronenfeld
maven repo. Is this already done in some other repo about which I don't know, perhaps? I know it would save us a lot of time and grief simply to be able to point a project we build at the right version, and not have to rebuild and deploy spark manually.

Compile error when compiling for cloudera

2014-07-17 Thread Nathan Kronenfeld
't know flume from a hole in the wall - does anyone know what I can do to fix this? Thanks, -Nathan

Re: Compile error when compiling for cloudera

2014-07-17 Thread Nathan Kronenfeld
at might be changing the version of Jetty used by Spark? > It depends a lot on how you are building things. > > Good to specify exactly how you're building here. > > On Thu, Jul 17, 2014 at 3:43 PM, Nathan Kronenfeld > wrote: > > I'm trying to compile the latest code,

Re: Compile error when compiling for cloudera

2014-07-17 Thread Nathan Kronenfeld
er, that line being in toDebugString, where it really shouldn't affect anything (no signature changes or the like) On Thu, Jul 17, 2014 at 10:58 AM, Nathan Kronenfeld < nkronenf...@oculusinfo.com> wrote: > My full build command is: > ./sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.6

Fwd: Accumulator question

2014-10-08 Thread Nathan Kronenfeld
d queries using some relatively sizable accumulators; at the moment, we're creating one per query, and running out of memory after far too few queries. I've tried methods that don't involve accumulators; they involve a shuffle instead, and take 10x as long. Thanks, -Na
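One workaround direction for the memory growth described above is reusing a single accumulator across queries, resetting it between runs, instead of registering a fresh one per query. A hedged sketch against the Spark 1.x accumulator API (not the thread's actual resolution; names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorReuseDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("acc-reuse-demo"))
    // One accumulator reused across queries rather than one per query.
    val acc = sc.accumulator(0L, "rowCount")
    for (q <- 1 to 3) {
      acc.setValue(0L)                           // reset on the driver between runs
      sc.parallelize(1 to 100).foreach(_ => acc += 1L)
      println(s"query $q counted ${acc.value}")
    }
    sc.stop()
  }
}
```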

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-26 Thread Nathan Kronenfeld
On Wed, Feb 26, 2014 at 2:11 PM, Sean Owen wrote: > I also favor Maven. I don't think the logic is "because it's common". As > Sandy says, it's because of the things that brings: more plugins, > easier to consume by more developers, etc. These are, however, just > some reasons 'for', and have to be