Xinh,
Thanks for the clarification. I'm new to Spark and trying to navigate the
different APIs. I was just following some examples and retrofitting them, but
I see now I should stick with plain RDDs until my schema is known (at the end
of the data pipeline).
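In case it helps anyone searching the archives later, the shape I have in
mind is roughly this (an untested sketch against the Spark 1.6 Scala API;
the input path and field names are made up):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val sc = new SparkContext(new SparkConf().setAppName("dynamic-schema"))
    val sqlContext = new SQLContext(sc)

    // Stay on plain RDDs while the schema is still unknown.
    val rows: RDD[Row] = sc.textFile("hdfs:///input") // made-up path
      .map { line =>
        val parts = line.split(",")
        Row(parts(0), parts(1))
      }

    // At the end of the pipeline, once the fields are known, attach a schema.
    val schema = StructType(Seq(
      StructField("id", StringType, nullable = true),
      StructField("value", StringType, nullable = true)))
    val df = sqlContext.createDataFrame(rows, schema)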
Thanks again!
On 06/24/2016, Xinh wrote:
Hi Martin,
Since your schema is dynamic, how would you use Datasets? Would you know
ahead of time the row type T in a Dataset[T]?
One option is to start with DataFrames at the beginning of your data
pipeline, figure out the field types, and then switch completely over to
RDDs or Datasets in the next stage of the pipeline.
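Concretely, something along these lines (a quick sketch against the 1.6
Scala API, assuming an existing SQLContext; the JSON path is just a
placeholder):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row

    // Let Spark infer the schema at the start of the pipeline...
    val df = sqlContext.read.json("hdfs:///input.json") // placeholder path
    df.printSchema()      // inspect the inferred fields
    val types = df.dtypes // Array[(fieldName, typeName)] for programmatic use

    // ...then switch over to an RDD[Row] for the later stages.
    val rows: RDD[Row] = df.rdd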
Indeed. But I'm dealing with 1.6 for now unfortunately.
On 06/24/2016 02:30 PM, Ted Yu wrote:
In Spark 2.0, Dataset and DataFrame are unified.
Would this simplify your use case?
On Fri, Jun 24, 2016 at 7:27 AM, Martin Serrano wrote:
> Hi,
>
> I'm exposing a custom source to the Spark environment. I have a question
> about the best way to approach this problem.
>
> I created a custom r...
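For reference, the unification means that in Spark 2.0 DataFrame becomes a
type alias for Dataset[Row], so untyped and typed data share one API. A
quick illustration (the path and case class here are made up):

    import org.apache.spark.sql.{Dataset, Row, SparkSession}

    case class Record(id: String, value: String)

    val spark = SparkSession.builder().appName("unified").getOrCreate()
    import spark.implicits._

    // In 2.0, type DataFrame = Dataset[Row], so a read gives a Dataset directly.
    val df: Dataset[Row] = spark.read.json("hdfs:///input.json") // made-up path

    // Once the schema is known, bind a concrete type to the same data
    // (field names must line up with the inferred schema).
    val ds: Dataset[Record] = df.as[Record]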