Re: data source api v2 refactoring

2018-10-21 Thread JackyLee
I have pushed a patch for SQLStreaming that resolves the problem just discussed. JIRA: https://issues.apache.org/jira/browse/SPARK-24630 Patch: https://github.com/apache/spark/pull/22575 SQLStreaming just defined the table API for StructStreaming, and the Table APIs for Stre

RE: data source api v2 refactoring

2018-10-18 Thread Mendelson, Assaf
). Thanks, Assaf From: Wenchen Fan [mailto:cloud0...@gmail.com] Sent: Thursday, October 18, 2018 5:26 PM To: Reynold Xin Cc: Ryan Blue; Hyukjin Kwon; Spark dev list Subject: Re: data source api v2 refactoring [EXTERNAL EMAIL] Please report any suspicious attachments, links, or requests for

Re: data source api v2 refactoring

2018-10-18 Thread Wenchen Fan
esh" > *Cc: *Wenchen Fan , Hyukjin Kwon , > Spark Dev List > *Subject: *Re: data source api v2 refactoring > > > > Hi Jayesh, > > > > The existing sources haven't been ported to v2 yet. That is going to be > tricky because the existing sources imple

Re: data source api v2 refactoring

2018-09-19 Thread Thakrar, Jayesh
Thanks for the info Ryan – very helpful! From: Ryan Blue Reply-To: "rb...@netflix.com" Date: Wednesday, September 19, 2018 at 3:17 PM To: "Thakrar, Jayesh" Cc: Wenchen Fan , Hyukjin Kwon , Spark Dev List Subject: Re: data source api v2 refactoring Hi Jayesh, The exis

Re: data source api v2 refactoring

2018-09-19 Thread Ryan Blue
; > > *From: *Ryan Blue > *Reply-To: * > *Date: *Friday, September 7, 2018 at 2:19 PM > *To: *Wenchen Fan > *Cc: *Hyukjin Kwon , Spark Dev List < > dev@spark.apache.org> > *Subject: *Re: data source api v2 refactoring > > > > There are a few v2-related c

Re: data source api v2 refactoring

2018-09-07 Thread Thakrar, Jayesh
: Wenchen Fan Cc: Hyukjin Kwon , Spark Dev List Subject: Re: data source api v2 refactoring There are a few v2-related changes that we can work in parallel, at least for reviews: * SPARK-25006, #21978<https://github.com/apache/spark/pull/21978>: Add catalog to TableIdentifier - this proposes

Re: data source api v2 refactoring

2018-09-07 Thread Ryan Blue
LogicalWrite newLogicalWrite(writeConfig); >>>> } >>>> >>>> Without WriteConfig, the API looks like >>>> trait Table { >>>> LogicalWrite newAppendWrite(); >>>> >>>> LogicalWrite newDeleteWrite(deleteExprs); >>>>
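
The two API shapes compared in this message can be sketched side by side. This is only a minimal Java illustration, not the proposed Spark API: the names `Table`, `LogicalWrite`, `WriteConfig`, `newLogicalWrite`, `newAppendWrite`, and `newDeleteWrite` come from the quoted text, while `TableWithConfig`, `DemoTable`, `describe()`, the fields of `WriteConfig`, and the use of strings for delete expressions are assumptions made for the example. The snippet is cut off before it shows how a `WriteConfig` is obtained, so the sketch uses a plain constructor as a stand-in.

```java
import java.util.List;

final class WriteApiSketch {
    // Result of planning a write. Only the name comes from the thread.
    interface LogicalWrite {
        String describe(); // hypothetical method, for illustration only
    }

    // Hypothetical stand-in: the thread snippet does not show how a WriteConfig is built.
    static final class WriteConfig {
        final String mode;              // assumed: "append" or "delete"
        final List<String> deleteExprs; // assumed: empty for appends

        WriteConfig(String mode, List<String> deleteExprs) {
            this.mode = mode;
            this.deleteExprs = deleteExprs;
        }
    }

    // Shape with WriteConfig: a single entry point, the config carries the intent.
    interface TableWithConfig {
        LogicalWrite newLogicalWrite(WriteConfig writeConfig);
    }

    // Shape without WriteConfig: one method per kind of write.
    interface Table {
        LogicalWrite newAppendWrite();

        LogicalWrite newDeleteWrite(List<String> deleteExprs);
    }

    // Toy table implementing both shapes so they can be compared directly.
    static final class DemoTable implements Table, TableWithConfig {
        @Override
        public LogicalWrite newAppendWrite() {
            return () -> "append";
        }

        @Override
        public LogicalWrite newDeleteWrite(List<String> deleteExprs) {
            return () -> "delete where " + String.join(" AND ", deleteExprs);
        }

        @Override
        public LogicalWrite newLogicalWrite(WriteConfig writeConfig) {
            // The config-based entry point has to dispatch on the config contents.
            return writeConfig.mode.equals("delete")
                    ? newDeleteWrite(writeConfig.deleteExprs)
                    : newAppendWrite();
        }
    }
}
```

In this toy version the config-based entry point ends up dispatching to the per-operation methods anyway, which is one way to read the argument that the API is simpler without `WriteConfig`.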

Re: data source api v2 refactoring

2018-09-07 Thread Wenchen Fan
>>> >>> LogicalWrite newDeleteWrite(deleteExprs); >>> } >>> >>> >>> It looks to me that the API is simpler without WriteConfig, what do you >>> think? >>> >>> Thanks, >>> Wenchen >>> >>

Re: data source api v2 refactoring

2018-09-07 Thread Hyukjin Kwon
, but there is a difference: for micro-batch >>> mode, a physical scan outputs data for one epoch, but it's not true for >>> continuous mode. >>> >>> I'm not sure if it's necessary to include streaming epoch in the API >>> abstraction, for feature

Re: data source api v2 refactoring

2018-09-06 Thread Ryan Blue
:24 AM Ryan Blue > wrote: > >> Latest from Wenchen in case it was dropped. >> >> -- Forwarded message - >> From: Wenchen Fan >> Date: Mon, Sep 3, 2018 at 6:16 AM >> Subject: Re: data source api v2 refactoring >> To: >> Cc: Rya

Re: data source api v2 refactoring

2018-09-04 Thread Wenchen Fan
st from Wenchen in case it was dropped. > > -- Forwarded message - > From: Wenchen Fan > Date: Mon, Sep 3, 2018 at 6:16 AM > Subject: Re: data source api v2 refactoring > To: > Cc: Ryan Blue , Reynold Xin , < > dev@spark.apache.org> > > > Hi M

Fwd: data source api v2 refactoring

2018-09-04 Thread Ryan Blue
Latest from Wenchen in case it was dropped. -- Forwarded message - From: Wenchen Fan Date: Mon, Sep 3, 2018 at 6:16 AM Subject: Re: data source api v2 refactoring To: Cc: Ryan Blue , Reynold Xin , < dev@spark.apache.org> Hi Mridul, I'm not sure what's going

Re: data source api v2 refactoring

2018-09-04 Thread Marcelo Vanzin
r in archives ... [1] > Wondering which other senders were getting dropped (if yes). > > Regards > Mridul > > [1] > http://apache-spark-developers-list.1001551.n3.nabble.com/data-source-api-v2-refactoring-td24848.html > > > On Sat, Sep 1, 2018 at 8:58 PM Ryan Blue wrote: >

Re: data source api v2 refactoring

2018-09-01 Thread Mridul Muralidharan
-source-api-v2-refactoring-td24848.html On Sat, Sep 1, 2018 at 8:58 PM Ryan Blue wrote: > Thanks for clarifying, Wenchen. I think that's what I expected. > > As for the abstraction, here's the way that I think about it: there are > two important parts of a scan: the definitio

Re: data source api v2 refactoring

2018-09-01 Thread Ryan Blue
Thanks for clarifying, Wenchen. I think that's what I expected. As for the abstraction, here's the way that I think about it: there are two important parts of a scan: the definition of what will be read, and task sets that actually perform the read. In batch, there's one definition of the scan and
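
The split described here, between the definition of what will be read and the task sets that perform the read, can be illustrated with a small Java sketch. It is written under assumptions and is not the Spark API: `ScanDefinition`, `ReadTask`, `planTaskSet`, and the fake row format are all hypothetical names chosen for the example. The message is cut off before it finishes comparing batch and streaming, so the sketch only shows one definition producing one task set.

```java
import java.util.ArrayList;
import java.util.List;

final class ScanSketch {
    // Definition of what will be read (assumed here to be just a column list).
    static final class ScanDefinition {
        final List<String> columns;

        ScanDefinition(List<String> columns) {
            this.columns = columns;
        }
    }

    // One unit of work that actually performs part of the read.
    interface ReadTask {
        List<String> read();
    }

    // Plans one task set for a definition: one task per partition.
    static List<ReadTask> planTaskSet(ScanDefinition definition, int numPartitions) {
        List<ReadTask> tasks = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            final int partition = p;
            tasks.add(() -> {
                // Fake data: one value per requested column, tagged with the partition.
                List<String> values = new ArrayList<>();
                for (String column : definition.columns) {
                    values.add(column + "@" + partition);
                }
                return values;
            });
        }
        return tasks;
    }
}
```

Calling `planTaskSet` more than once with the same `ScanDefinition` yields independent task sets, which is the property the definition/task-set split is meant to allow.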

Re: data source api v2 refactoring

2018-08-31 Thread Jungtaek Lim
Nice suggestion, Reynold, and great news that Wenchen succeeded in prototyping it! One thing I would like to confirm is how continuous mode works with this abstraction. Would continuous mode also be abstracted with Stream, with createScan providing an unbounded Scan? Thanks, Jungtaek Lim (Heart

Re: data source api v2 refactoring

2018-08-31 Thread Ryan Blue
Thanks, Reynold! I think your API sketch looks great. I appreciate having the Table level in the abstraction to plug into as well. I think this makes it clear what everything does, particularly having the Stream level that represents a configured (by ScanConfig) streaming read and can act as a fac
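
The levels mentioned in this part of the thread (Table, then a Stream configured by a ScanConfig, then a Scan obtained through `createScan`) can be pictured with a rough Java sketch. The names `Table`, `Stream`, `ScanConfig`, `Scan`, and `createScan` appear in the thread; everything else is an assumption for illustration, including `createStream`, the offset parameters, `readAll()`, and `CounterTable`. The original API sketch is not reproduced in these snippets, so this should not be read as the proposed interface.

```java
import java.util.ArrayList;
import java.util.List;

final class StreamLevelsSketch {
    // Configuration of a read (assumed here to be a single column name).
    static final class ScanConfig {
        final String column;

        ScanConfig(String column) {
            this.column = column;
        }
    }

    // A bounded read over some range of the stream.
    interface Scan {
        List<String> readAll(); // hypothetical method
    }

    // A configured streaming read that hands out scans.
    interface Stream {
        // Offsets are an assumption; the thread does not show createScan's parameters.
        Scan createScan(long startOffset, long endOffset);
    }

    // Top level: the table that streams are created from.
    interface Table {
        Stream createStream(ScanConfig config); // hypothetical method name
    }

    // Toy table whose "data" is simply the offsets themselves.
    static final class CounterTable implements Table {
        @Override
        public Stream createStream(ScanConfig config) {
            return (startOffset, endOffset) -> () -> {
                List<String> rows = new ArrayList<>();
                for (long offset = startOffset; offset < endOffset; offset++) {
                    rows.add(config.column + "=" + offset);
                }
                return rows;
            };
        }
    }
}
```

Under these assumptions a micro-batch execution would call `createScan` once per offset range on the same `Stream`. How a continuous, unbounded scan fits into this shape is the open question raised elsewhere in the thread, and the sketch does not answer it.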

data source api v2 refactoring

2018-08-30 Thread Reynold Xin
I spent some time last week looking at the current data source v2 APIs, and I thought we should be a bit more buttoned up in terms of the abstractions and the guarantees Spark provides. In particular, I feel we need the following levels of "abstractions", to fit the use cases in Spark, from batch,