What about exposing transforms that make it easy to coerce data to what the
method needs? Instead of passing a DataFrame, you'd pass df.toSet to isin.
Assuming toSet returns a local list, wouldn't that have the problem of not
being able to handle extremely large lists? In contrast, I believe SQL's
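A minimal Scala sketch of that trade-off, with made-up data and column names:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val users  = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name")
    val wanted = Seq(1, 3).toDF("id")

    // isin needs the values as a local collection, so this only works
    // while the list fits in driver memory.
    val ids = wanted.as[Int].collect().toSet
    users.filter($"id".isin(ids.toSeq: _*)).show()

    // A left-semi join expresses the same filter but stays distributed,
    // which is roughly how SQL plans an IN (subquery).
    users.join(wanted, Seq("id"), "left_semi").show()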
Hi all,
Maybe I'm missing something, but from what was discussed here I've gathered
that the current mllib implementation returns exactly the same model whether
standardization is turned on or off.
I suggest considering an R script (please see below) which trains two
penalized logistic regressions
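The R script itself is not included in this digest; as a rough Spark-side
sketch of the same comparison, one could fit the same penalized model twice,
toggling only standardization ("training" below stands in for whatever
prepared DataFrame the script uses):

    import org.apache.spark.ml.classification.LogisticRegression

    // Same penalty, differing only in standardization.
    val lrStd   = new LogisticRegression().setRegParam(0.1).setStandardization(true)
    val lrNoStd = new LogisticRegression().setRegParam(0.1).setStandardization(false)

    val mStd   = lrStd.fit(training)
    val mNoStd = lrNoStd.fit(training)

    // If the claim above holds, these coefficients come out identical.
    println(s"with standardization:    ${mStd.coefficients}")
    println(s"without standardization: ${mNoStd.coefficients}")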
Thanks Joseph!
From: Joseph Torres
Date: Friday, April 27, 2018 at 11:23 AM
To: "Thakrar, Jayesh"
Cc: "dev@spark.apache.org"
Subject: Re: Datasource API V2 and checkpointing
The precise interactions with the DataSourceV2 API haven't yet been
hammered out in design. But much of this comes down to the core of
Structured Streaming rather than the API details.
The execution engine handles checkpointing and recovery. It asks the
streaming data source for offsets, and then
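For reference, the shape of that interaction in the pre-V2 Source trait looks
roughly like the following (simplified sketch with a made-up class name; as
noted above, the V2 equivalent was not yet settled):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.execution.streaming.{Offset, Source}
    import org.apache.spark.sql.types.StructType

    class SketchSource extends Source {
      def schema: StructType = ???                 // fixed schema of the stream
      // Engine asks how far the source can currently be read.
      def getOffset: Option[Offset] = ???
      // Engine requests a replayable slice between checkpointed offsets.
      def getBatch(start: Option[Offset], end: Offset): DataFrame = ???
      // Engine notifies the source once data up to `end` is checkpointed.
      def commit(end: Offset): Unit = ()
      def stop(): Unit = ()
    }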
Wondering if this issue is related to SPARK-23323?
Any pointers will be greatly appreciated.
Thanks,
Jayesh
From: "Thakrar, Jayesh"
Date: Monday, April 23, 2018 at 9:49 PM
To: "dev@spark.apache.org"
Subject: Datasource API V2 and checkpointing
I was wondering when checkpointing is enabled, w
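For context, checkpointing in Structured Streaming is enabled per query via
checkpointLocation; a minimal sketch with placeholder paths, assuming df is a
streaming DataFrame:

    val query = df.writeStream
      .format("parquet")
      .option("path", "/tmp/out")                      // placeholder output path
      .option("checkpointLocation", "/tmp/checkpoint") // offsets and state are recovered from here on restart
      .start()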
I see.
Supporting monotonically_increasing_id on streaming DataFrames would be really
helpful to me and, I believe, to many more users. Adding this functionality to
Spark would also perform better than reimplementing it inside each
application.
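As a sketch of the gap, the same expression works on a static DataFrame today
but not on a streaming one (spark is an assumed SparkSession):

    import org.apache.spark.sql.functions.monotonically_increasing_id

    // Works on a static DataFrame.
    spark.range(5).withColumn("uid", monotonically_increasing_id()).show()

    // The equivalent on a streaming DataFrame is currently rejected:
    // spark.readStream.format("rate").load()
    //   .withColumn("uid", monotonically_increasing_id())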
Hemant
On Thu, Apr 26, 2