Requesting a Plan for Avro-typed Datasets

2019-05-16 Thread Aleksander Eskilson
Hi all, there's been longstanding demand for statically typed Datasets of Avro. Functionality from the now-deprecated Databricks Spark-Avro project was folded into Spark, but it can still only provide DataFrames over Avro data. As is discussed in the PR below, there are still drawbacks from not havin…

Open PRs RE: Datasets Typed by Arbitrary Avro

2019-04-18 Thread Aleksander Eskilson
…ution can come out of this visibility. Aleksander Eskilson https://github.com/bdrillard

[ML] Setting Non-Transform Params for a Pipeline & PipelineModel

2018-09-04 Thread Aleksander Eskilson
In a nutshell, what I'd like to do is instantiate a Pipeline (or an extension class of Pipeline) with metadata that is copied to the PipelineModel when fitted, persisted with it, and read again when the fitted model is loaded by another consumer. These params are specific to the PipelineModel more than…

[Catalyst] Code Generation and the Constant Pool Limit

2017-05-12 Thread Aleksander Eskilson
Hi all, I want to take a moment to highlight an issue and, hopefully, invite some developers to review a pull request [1] for SPARK-18016 [2]. Code generated by Catalyst currently places all split method…

Re: DataFrameReader Schema Supersedes Schema Provided by Encoder, Renders Fields Nullable

2016-10-14 Thread Aleksander Eskilson
I've opened a Jira for the issue you requested earlier: https://issues.apache.org/jira/browse/SPARK-17939 On Fri, Oct 14, 2016 at 9:24 AM Aleksander Eskilson wrote: > Interesting. I'm quite glad to read your explanation, it makes some of our > work quite a bit more clear. I…

Re: DataFrameReader Schema Supersedes Schema Provided by Encoder, Renders Fields Nullable

2016-10-14 Thread Aleksander Eskilson
…the data, there are cases that can still produce a null result. For example, when parsing JSON, we null out all columns other than _corrupt_record when we fail to parse a line. The fact that it's different in streaming is a bug. Would you mind opening up a JIRA ticket so we can discuss the right path f…

DataFrameReader Schema Supersedes Schema Provided by Encoder, Renders Fields Nullable

2016-10-13 Thread Aleksander Eskilson
Hi there, Working in the space of custom Encoders/ExpressionEncoders, I've noticed that the StructType schema as set when creating an object of the ExpressionEncoder[T] class [1] is not the schema actually used to set types for the columns of a Dataset, as created by using the .as(encoder) method…

Re: Catalyst - ObjectType for Encoders

2016-10-03 Thread Aleksander Eskilson
…> advanced users can experiment. > > So, you can write a custom encoder, but until we finalize the encoder API, > you might need to update it each Spark release. > > On Fri, Sep 30, 2016 at 6:36 AM, Aleksander Eskilson < > aleksander...@gmail.com> wrote: > >…

Catalyst - ObjectType for Encoders

2016-09-30 Thread Aleksander Eskilson
…scala/org/apache/spark/sql/types/ObjectType.scala#L39 Thanks, Aleksander Eskilson