Re: Task not Serializable Exception

2017-01-02 Thread Prashant Sharma
Can you provide a minimal code snippet that reproduces this `NotSerializableException`? Thanks, - Prashant Sharma Spark Technology Center http://www.spark.tc/ -- On Sun, Jan 1, 2017 at 9:36 AM, khyati wrote: > Getting error for the following code snippet: > > object SparkTaskT

Re: Kafka Spark structured streaming latency benchmark.

2017-01-02 Thread Prashant Sharma
This issue was fixed in https://issues.apache.org/jira/browse/SPARK-18991. --Prashant On Tue, Dec 20, 2016 at 6:16 PM, Prashant Sharma wrote: > Hi Shixiong, > > Thanks for taking a look, I am trying to run and see if making > ContextCleaner run more frequently and/or making it non blocking wil

Re: Spark Improvement Proposals

2017-01-02 Thread Cody Koeninger
I'm bumping this one more time for the new year, and then I'm giving up. Please, fix your process, even if it isn't exactly the way I suggested. On Tue, Nov 8, 2016 at 11:14 AM, Ryan Blue wrote: > On lazy consensus as opposed to voting: > > First, why lazy consensus? The proposal was for consens

Re: What is mainly different from a UDT and a spark internal type that ExpressionEncoder recognized?

2017-01-02 Thread Shuai Lin
Disclaimer: I'm not a Spark guru, and what follows are notes I took while reading the Spark source code, so I could be wrong; if so, I'd appreciate it if someone could correct me. > > Let me rephrase this. How does the SparkSQL engine call the codegen APIs > to > do the job of p

Re: mllib metrics vs ml evaluators and how to improve apis for users

2017-01-02 Thread Joseph Bradley
Hi Ilya, Thanks for your thoughts. Here's my understanding of where we are headed: * We will want to move the *Metrics functionality to the spark.ml package, as part of *Evaluator or related classes such as model/result summaries. * It has not yet been decided if or when the spark.mllib package w

StateStoreSaveExec / StateStoreRestoreExec

2017-01-02 Thread Jeremy Smith
I have a question about state tracking in Structured Streaming. First let me briefly explain my use case: Given a mutable data source (e.g. an RDBMS) in which we assume we can retrieve a set of newly created row versions (being a row that was created or updated between two given `Offset`s, whateve