Re: DataSourceV2 capability API

2018-11-12 Thread JackyLee
I don't know if it is the right thing to make the table API ContinuousScanBuilder -> ContinuousScan -> ContinuousBatch; it makes batch/microBatch/continuous too different from each other. In my opinion, these are basically similar at the table level. So is it possible to design an API like this? ScanB... A purely speculative sketch of such a symmetric shape follows below.
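The message is truncated, so the following is only a guess at the more symmetric shape it seems to ask for (every name below is an assumption, not the actual proposal): a single ScanBuilder producing one Scan that exposes all three execution modes, rather than three parallel hierarchies.

    // Speculative sketch (every name is a guess): one ScanBuilder and one
    // Scan, with batch/micro-batch/continuous selected from the same object
    // instead of separate ContinuousScanBuilder/ContinuousScan hierarchies.
    trait Batch
    trait MicroBatchStream
    trait ContinuousStream

    trait Scan {
      def toBatch: Batch
      def toMicroBatchStream: MicroBatchStream
      def toContinuousStream: ContinuousStream
    }

    trait ScanBuilder {
      def build(): Scan
    }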

Re: DataSourceV2 capability API

2018-11-12 Thread Wenchen Fan
On Fri, Nov 9, 2018 at 9:11 AM Ryan Blue wrote:
> I'd have two places. First, a class that defines properties supported and identified by Spark, like the SQLConf...

Re: DataSourceV2 capability API

2018-11-09 Thread Ryan Blue
...ption.

On Fri, Nov 9, 2018 at 9:11 AM Ryan Blue wrote:
> I'd have two places. First, a class that defines properties supported and identified by Spark, like the SQLConf definitions. Second, in documentation for the v2 table API.

Re: DataSourceV2 capability API

2018-11-09 Thread Ryan Blue
> ...and identified by Spark, like the SQLConf definitions. Second, in documentation for the v2 table API.

On Fri, Nov 9, 2018 at 9:00 AM Felix Cheung wrote:
> One question is where will the list of capability strings be defined?
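A minimal sketch of the first of those two places (the object name and constants below are illustrative assumptions, not an agreed API): one constants object, in the spirit of the SQLConf definitions, fixing the spelling of the strings Spark itself checks for.

    // Hypothetical registry (names assumed): centralizes the capability
    // strings recognized by Spark, analogous to how SQLConf centralizes
    // configuration definitions.
    object TableCapabilities {
      val CONTINUOUS_STREAMING: String = "continuous-streaming"
      val MICRO_BATCH: String = "micro-batch"
      val READ_MISSING_COLUMNS: String = "read-missing-columns"
    }

    // e.g. table.isSupported(TableCapabilities.CONTINUOUS_STREAMING)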

Re: DataSourceV2 capability API

2018-11-09 Thread Reynold Xin
> ...definitions. Second, in documentation for the v2 table API.

On Fri, Nov 9, 2018 at 9:00 AM Felix Cheung wrote:
> One question is where will the list of capability strings be defined?

Re: DataSourceV2 capability API

2018-11-09 Thread Ryan Blue
> From: Ryan Blue
> Sent: Thursday, November 8, 2018 2:09 PM
> To: Reynold Xin
> Cc: Spark Dev List
> Subject: Re: DataSourceV2 capability API
>
> Yes, we currently use traits that have methods. Something like...

Re: DataSourceV2 capability API

2018-11-09 Thread Reynold Xin
...Felix Cheung wrote:
> One question is where will the list of capability strings be defined?
>
> From: Ryan Blue
> Sent: Thursday, November 8, 2018 2:09 PM
> To: Reynold Xin
> Cc: Spark Dev List...

Re: DataSourceV2 capability API

2018-11-09 Thread Ryan Blue
> ...defined?
>
> From: Ryan Blue
> Sent: Thursday, November 8, 2018 2:09 PM
> To: Reynold Xin
> Cc: Spark Dev List
> Subject: Re: DataSourceV2 capability API
>
> Yes, we currently use traits that have methods. Something...

Re: DataSourceV2 capability API

2018-11-09 Thread Felix Cheung
One question is where will the list of capability strings be defined?

> From: Ryan Blue
> Sent: Thursday, November 8, 2018 2:09 PM
> To: Reynold Xin
> Cc: Spark Dev List
> Subject: Re: DataSourceV2 capability API
>
> Yes, we currently use traits that have methods. Something...

Re: DataSourceV2 capability API

2018-11-08 Thread Ryan Blue
Yes, we currently use traits that have methods. Something like “supports reading missing columns” doesn’t need to deliver methods. The other example is where we don’t have an object to test for a trait (scan.isInstanceOf[SupportsBatch]) until we have a Scan with pushdown done. That could be expensive...
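A small sketch of the timing problem being described (only SupportsBatch comes from the message above; buildScanWithPushdown and the rest are illustrative stand-ins): the trait check can only run after pushdown has already been paid for.

    // Sketch only: trait check happens after expensive work.
    trait Scan
    trait SupportsBatch extends Scan

    // Building a Scan is where filter/column pushdown happens; it may be
    // expensive (remote metadata calls, planning work, ...).
    def buildScanWithPushdown(): Scan = new Scan {}

    val scan = buildScanWithPushdown()
    if (!scan.isInstanceOf[SupportsBatch]) {
      // the pushdown work above is wasted; a table-level capability check
      // could have failed fast before any of it ran
      throw new UnsupportedOperationException("batch scan not supported")
    }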

Re: DataSourceV2 capability API

2018-11-08 Thread Reynold Xin
This is currently accomplished by having traits that data sources can extend, as well as runtime exceptions, right? It's hard to argue one way vs. another without knowing how things will evolve (e.g. how many different capabilities there will be). On Thu, Nov 8, 2018 at 12:50 PM Ryan Blue wrote: ...
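For reference, a minimal sketch of that status quo (the trait and source names are illustrative, not the actual v2 interfaces): a capability is a marker trait a source mixes in, and unsupported cases surface as runtime exceptions.

    // Sketch of the current pattern (names illustrative).
    trait SupportsContinuousReading

    class KafkaLikeSource extends SupportsContinuousReading
    class FileLikeSource // no continuous support

    def planContinuous(source: Any): Unit = source match {
      case _: SupportsContinuousReading =>
        () // plan a continuous scan
      case _ =>
        // the unsupported case only surfaces as a runtime exception
        throw new UnsupportedOperationException(
          "continuous reading not supported")
    }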

DataSourceV2 capability API

2018-11-08 Thread Ryan Blue
Hi everyone, I’d like to propose an addition to DataSourceV2 tables: a capability API. This API would allow Spark to query a table to determine whether it supports a capability or not:

    val table = catalog.load(identifier)
    val supportsContinuous = table.isSupported("continuous-streaming")

There a...
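A minimal sketch of what a table implementing this check could look like (only isSupported and the "continuous-streaming" string come from the proposal itself; the trait shape and class below are assumptions):

    // Sketch only: the Table shape here is an assumption, not the final API.
    trait Table {
      def name: String
      // True if the table supports the named capability.
      def isSupported(capability: String): Boolean
    }

    // A hypothetical table that reports its capabilities from a simple set.
    class SimpleCapabilityTable(
        override val name: String,
        capabilities: Set[String]) extends Table {
      override def isSupported(capability: String): Boolean =
        capabilities.contains(capability)
    }

    val table = new SimpleCapabilityTable("events", Set("continuous-streaming"))
    val supportsContinuous = table.isSupported("continuous-streaming") // true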