Re: Working with Avro Generic Records in the interactive scala shell

2014-05-24 Thread Josh Marcus
Jeremy, Just to be clear, are you assembling a jar with that class compiled (with its dependencies) and including the path to that jar on the command line in an environment variable (e.g. SPARK_CLASSPATH=path ./spark-shell)? --j On Saturday, May 24, 2014, Jeremy Lewi wrote: > Hi Spark Users, >

Re: advice on maintaining a production spark cluster?

2014-05-20 Thread Josh Marcus
This isn't helpful of me to say, but, I see the same sorts of problem >> and messages semi-regularly on CDH5 + 0.9.0. I don't have any insight >> into when it happens, but usually after heavy use and after running >> for a long time. I had figured I'd see if the c

Re: advice on maintaining a production spark cluster?

2014-05-20 Thread Josh Marcus
20, 2014 at 3:28 PM, Josh Marcus wrote: > We're using spark 0.9.0, and we're using it "out of the box" -- not using > Cloudera Manager or anything similar. > > There are warnings from the master that there continue to be heartbeats > from the unregistered workers

Re: advice on maintaining a production spark cluster?

2014-05-20 Thread Josh Marcus
9, 2014 at 10:51 PM, Matei Zaharia >> >> > wrote: >> >>> Which version is this with? I haven’t seen standalone masters lose >>> workers. Is there other stuff on the machines that’s killing them, or what >>> errors do you see? >>> >>> Matei

advice on maintaining a production spark cluster?

2014-05-16 Thread Josh Marcus
Hey folks, I'm wondering what strategies other folks are using for maintaining and monitoring the stability of stand-alone spark clusters. Our master very regularly loses workers, and they (as expected) never rejoin the cluster. This is the same behavior I've seen using akka cluster (if that's w

Re: Is Spark a good choice for geospatial/GIS applications? Is a community volunteer needed in this area?

2014-04-23 Thread Josh Marcus
Hey there, I'd encourage you to check out the development currently going on with the GeoTrellis project (http://github.com/geotrellis/geotrellis) or talk to the developers on irc (freenode, #geotrellis) as they're currently developing raster processing capabilities with spark as a backend, as wel

Re: Is it common in spark to broadcast a 10 gb variable?

2014-03-12 Thread Josh Marcus
Aureliano, Just to answer your second question (unrelated to Spark), arrays in Java and Scala can't be larger than the maximum value of an Integer (Integer.MAX_VALUE = 2,147,483,647), which means that arrays are limited to about 2.1 billion elements. --j On Wed, Mar 12, 2014 at 1:08 PM, Aureliano Buendia wrot
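
To make the array limit above concrete, here is a minimal Java sketch. It is not from the thread: the class name ArrayLimit, the helper chunksNeeded, and the 10-billion-element figure are illustrative assumptions. It shows that array lengths are ints, and how many int-indexed chunks a larger collection would have to be split into. Actual JVMs may cap array length slightly below Integer.MAX_VALUE, and available heap is usually the tighter constraint.

```java
public class ArrayLimit {
    // Array lengths and indices are ints, so this is the theoretical ceiling.
    // (Assumption: real JVMs may allow slightly fewer elements than this.)
    static int maxLength() {
        return Integer.MAX_VALUE;
    }

    // Hypothetical helper: number of arrays of at most chunkSize elements
    // needed to hold `total` elements (ceiling division, computed in long).
    static long chunksNeeded(long total, int chunkSize) {
        return (total + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        System.out.println(maxLength());                            // 2147483647
        // Illustrative figure only: 10 billion elements, chunks of 2^30.
        System.out.println(chunksNeeded(10_000_000_000L, 1 << 30)); // 10
    }
}
```

The chunk arithmetic is done in long because the total element count itself does not fit in an int.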