Re: Multi-Line JSON in SparkSQL

2015-05-04 Thread Paul Brown
It's not JSON, per se, but data formats like Smile (http://en.wikipedia.org/wiki/Smile_%28data_interchange_format%29) provide support for markers that can't be confused with content and also provide reasonably similar ergonomics. -- p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-08 Thread Paul Brown
Moving over to the dev list, as this isn't a user-scope issue. I just ran into this issue with the missing saveAsTextFile, and here's a little additional information: - Code ported from 0.9.1 up to 1.0.0; works with local[n] in both cases. - Driver built as an uberjar via Maven. - Deployed to sma

Re: Any plans for new clustering algorithms?

2014-04-21 Thread Paul Brown
I agree that it will be good to see more algorithms added to the MLlib universe, although this does bring to mind a couple of comments: - MLlib as Mahout.next would be unfortunate. There are some gems in Mahout, but there are also lots of rocks. Setting a minimal bar of working, correctly impl

Re: Spark Streaming and Storehaus -- example?

2014-03-06 Thread Paul Brown
I'd hazard that this is a generic issue. The "store" is in the context of the driver code, not the worker code, and that's why Spark is trying to send it off to a worker for execution. It's not serializable (and shouldn't be...), so that fails. Try making a Scala object that lives on the worker
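
A minimal sketch of the serialization point made in this reply, using plain JVM serialization rather than Spark itself. Every name here (FakeStore, StoreHolder, CapturingTask, HolderTask) is hypothetical and stands in for the poster's real code; the explicit task classes model what a closure carries with it, and ObjectOutputStream only approximates what happens when a task is shipped to a worker.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import scala.collection.mutable

// Hypothetical stand-in for a store client. Deliberately NOT Serializable,
// like a handle that wraps connections or other driver-local state.
class FakeStore {
  private val data = mutable.Map.empty[String, Long]
  def put(key: String, value: Long): Unit = data.update(key, value)
  def get(key: String): Option[Long] = data.get(key)
}

// A Scala object is initialized once per JVM, on first access. Code that
// refers to StoreHolder.store therefore creates the store in whichever JVM
// runs it, instead of carrying a driver-side instance along.
object StoreHolder {
  lazy val store: FakeStore = new FakeStore
}

// Models a closure that captured a store created in driver code:
// the store becomes a field, so it has to be serialized with the task.
class CapturingTask(store: FakeStore) extends Serializable {
  def run(key: String, value: Long): Unit = store.put(key, value)
}

// Models a closure that only refers to the singleton: no fields to ship.
class HolderTask extends Serializable {
  def run(key: String, value: Long): Unit = StoreHolder.store.put(key, value)
}

object ClosureDemo {
  // Attempts Java serialization and reports whether it succeeded.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    println(s"capturing task serializable: ${canSerialize(new CapturingTask(new FakeStore))}")
    println(s"holder task serializable:    ${canSerialize(new HolderTask)}")
  }
}
```

In a real job the body of HolderTask.run would sit inside the function passed to the RDD or DStream operation, so that each worker JVM builds its own store on first use. Whether a per-JVM singleton is acceptable depends on the store client, which this sketch does not model.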