What about exposing transforms that make it easy to coerce data to what the
method needs? Instead of passing a DataFrame, you'd pass df.toSet to isin.
Assuming toSet returns a local list, wouldn't that have the problem of not
being able to handle extremely large lists? In contrast, I believe SQL's
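A minimal Scala sketch of that trade-off, with made-up data and column names:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val users  = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name")
    val wanted = Seq(1, 3).toDF("id")

    // isin needs the values as a local collection, so this only works
    // while the list fits in driver memory.
    val ids = wanted.as[Int].collect().toSet
    users.filter($"id".isin(ids.toSeq: _*)).show()

    // A left-semi join expresses the same filter but stays distributed,
    // which is roughly how SQL plans an IN (subquery).
    users.join(wanted, Seq("id"), "left_semi").show()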
Hi all,
Maybe I'm missing something, but from what was discussed here I've gathered
that the current mllib implementation returns exactly the same model whether
standardization is turned on or off.
I suggest considering an R script (please see below) which trains two
penalized logistic regressions
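The R script itself is not included in this digest; as a rough Spark-side
sketch of the same comparison, one could fit the same penalized model twice,
toggling only standardization ("training" below stands in for whatever
prepared DataFrame the script uses):

    import org.apache.spark.ml.classification.LogisticRegression

    // Same penalty, differing only in standardization.
    val lrStd   = new LogisticRegression().setRegParam(0.1).setStandardization(true)
    val lrNoStd = new LogisticRegression().setRegParam(0.1).setStandardization(false)

    val mStd   = lrStd.fit(training)
    val mNoStd = lrNoStd.fit(training)

    // If the claim above holds, these coefficients come out identical.
    println(s"with standardization:    ${mStd.coefficients}")
    println(s"without standardization: ${mNoStd.coefficients}")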
Thanks Joseph!
From: Joseph Torres
Date: Friday, April 27, 2018 at 11:23 AM
To: "Thakrar, Jayesh"
Cc: "dev@spark.apache.org"
Subject: Re: Datasource API V2 and checkpointing
The precise interactions with the DataSourceV2 API haven't yet been
hammered out in design. But much of this comes down to the core of
Structured Streaming rather than the API details.
The execution engine handles checkpointing and recovery. It asks the
streaming data source for offsets, and then
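For reference, the shape of that interaction in the pre-V2 Source trait looks
roughly like the following (simplified sketch with a made-up class name; as
noted above, the V2 equivalent was not yet settled):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.execution.streaming.{Offset, Source}
    import org.apache.spark.sql.types.StructType

    class SketchSource extends Source {
      def schema: StructType = ???                 // fixed schema of the stream
      // Engine asks how far the source can currently be read.
      def getOffset: Option[Offset] = ???
      // Engine requests a replayable slice between checkpointed offsets.
      def getBatch(start: Option[Offset], end: Offset): DataFrame = ???
      // Engine notifies the source once data up to `end` is checkpointed.
      def commit(end: Offset): Unit = ()
      def stop(): Unit = ()
    }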
Wondering if this issue is related to SPARK-23323?
Any pointers will be greatly appreciated.
Thanks,
Jayesh
From: "Thakrar, Jayesh"
Date: Monday, April 23, 2018 at 9:49 PM
To: "dev@spark.apache.org"
Subject: Datasource API V2 and checkpointing
I was wondering when checkpointing is enabled, w
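For context, checkpointing in Structured Streaming is enabled per query via
checkpointLocation; a minimal sketch with placeholder paths, assuming df is a
streaming DataFrame:

    val query = df.writeStream
      .format("parquet")
      .option("path", "/tmp/out")                      // placeholder output path
      .option("checkpointLocation", "/tmp/checkpoint") // offsets and state are recovered from here on restart
      .start()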
I see.
Supporting monotonically_increasing_id on streaming DataFrames would be really
helpful to me and, I believe, to many more users. Adding this functionality to
Spark would also perform better than reimplementing it inside each
application.
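As a sketch of the gap, the same expression works on a static DataFrame today
but not on a streaming one (spark is an assumed SparkSession):

    import org.apache.spark.sql.functions.monotonically_increasing_id

    // Works on a static DataFrame.
    spark.range(5).withColumn("uid", monotonically_increasing_id()).show()

    // The equivalent on a streaming DataFrame is currently rejected:
    // spark.readStream.format("rate").load()
    //   .withColumn("uid", monotonically_increasing_id())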
Hemant
On Thu, Apr 26, 2