They are based on a physical column; the column is real. The function only
exists in the data source.
For example:
SELECT ttl(a), ttl(b) FROM ks.tab
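To make the point about physical columns concrete, here is a minimal sketch of a toy analyzer, assuming that a source-specific function such as `ttl` or `writetime` resolves only when its argument is a physical column in the table schema, and that the source stores per-cell metadata next to each value. `SCHEMA`, `SOURCE_FUNCTIONS`, `Cell`, `MetadataCall`, `resolve`, and `evaluate` are hypothetical names, not part of Spark or any connector.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical schema for ks.tab: column name -> type name.
SCHEMA: Dict[str, str] = {"a": "int", "b": "text"}

# Hypothetical functions exposed only by the data source; they read per-cell
# metadata instead of a separate column.
SOURCE_FUNCTIONS = {"ttl", "writetime"}


@dataclass(frozen=True)
class Cell:
    value: object
    ttl: Optional[int]  # seconds until expiry, None if the cell never expires
    writetime: int      # write timestamp, microseconds since the epoch


@dataclass(frozen=True)
class MetadataCall:
    function: str
    column: str


def resolve(function: str, column: str, schema: Dict[str, str] = SCHEMA) -> MetadataCall:
    """Resolve a call such as ttl(a), failing the way analysis would."""
    if function not in SOURCE_FUNCTIONS:
        raise ValueError(f"undefined function: {function}")
    # The argument is an ordinary column reference, so it must exist physically.
    if column not in schema:
        raise ValueError(f"cannot resolve column: {column}")
    return MetadataCall(function, column)


def evaluate(call: MetadataCall, row: Dict[str, Cell]) -> object:
    """Read the requested metadata from the referenced cell."""
    # Attribute names on Cell match the function names in SOURCE_FUNCTIONS.
    return getattr(row[call.column], call.function)
```

Here `resolve("ttl", "a")` succeeds because `a` is in the schema, while `resolve("ttl", "c")` raises, which is the kind of analysis failure that would occur for a column that is not physical.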
On Tue, Sep 4, 2018 at 11:16 PM Reynold Xin wrote:
Russell, your special columns wouldn’t actually work with option 1, because
Spark would have to fail them in analysis without an actual physical
column.
On Tue, Sep 4, 2018 at 9:12 PM Russell Spitzer wrote:
I'm a big fan of 1 as well. I had to implement something similar using
custom expressions, and it was a bit more work than it should have been. In
particular, our use case is that columns have certain metadata (ttl,
writetime) that exists not as separate columns but as special values that
can be surfaced.
Thanks for posting the summary. I'm strongly in favor of option 1.
I think the API footprint is fairly small, but worth it. Not only does it
make sources easier to implement by handling parsing, it also makes sources
more reliable because Spark handles validation the same way across sources.
A g…
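The argument that Spark should own parsing and validation can be illustrated with a small sketch: Spark turns a partitioning clause into structured, already-validated transforms, so every source receives the same shape and users see the same errors. The transform names (`identity`, `date`, `bucket`), `Transform`, and `parse_partitioning` are hypothetical illustrations, not Spark's API.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical transforms, mapped to how many integer literals precede the column.
KNOWN_TRANSFORMS: Dict[str, int] = {"identity": 0, "date": 0, "bucket": 1}

_CALL = re.compile(r"^(\w+)\s*\((.*)\)$")


@dataclass(frozen=True)
class Transform:
    name: str
    literals: Tuple[int, ...]
    column: str


def _split_top_level(spec: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in spec:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def _parse_one(text: str, schema: Dict[str, str]) -> Transform:
    match = _CALL.match(text)
    if match is None:
        # A bare column name means: partition by its value unchanged.
        name, args = "identity", [text]
    else:
        name, args = match.group(1), [a.strip() for a in match.group(2).split(",")]
    if name not in KNOWN_TRANSFORMS:
        raise ValueError(f"unsupported transform: {name}")
    if len(args) != KNOWN_TRANSFORMS[name] + 1:
        raise ValueError(f"wrong number of arguments for {name}: {text}")
    *literals, column = args
    if column not in schema:
        raise ValueError(f"unknown column in {text}: {column}")
    if name == "date" and schema[column] != "timestamp":
        raise ValueError(f"date() needs a timestamp column, got {schema[column]}")
    return Transform(name, tuple(int(x) for x in literals), column)


def parse_partitioning(spec: str, schema: Dict[str, str]) -> List[Transform]:
    """Parse and validate once, so every source receives the same structure."""
    return [_parse_one(part, schema) for part in _split_top_level(spec)]
```

Because a source would receive `Transform` objects rather than raw strings, it never re-implements this parsing, and a typo such as `date(tss)` is rejected with the same message no matter which source backs the table.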
Ryan, Michael and I discussed this offline today. Some notes here:
His use case is to support partitioning data by derived columns rather
than physical columns, because he didn't want his users to keep adding a
"date" column when in reality it is purely derived from some timestamp
column. We…
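A minimal sketch of what partitioning by a derived column means, assuming the partition value is computed from a timestamp column at write time: users write only `ts`, and the source derives a `date=YYYY-MM-DD` partition from it. The function names, the `ts` column name, the UTC normalization, and the `date=...` key layout are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List


def date_partition(ts: datetime) -> str:
    """Derive the partition value from the timestamp itself."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware timestamp")
    # Normalize to UTC so one instant always maps to one partition.
    return ts.astimezone(timezone.utc).date().isoformat()


def group_by_partition(rows: List[dict], ts_column: str = "ts") -> Dict[str, List[dict]]:
    """Group rows by a derived date; the rows never carry a "date" column."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        groups[f"date={date_partition(row[ts_column])}"].append(row)
    return dict(groups)
```

The derived value lives only in the partition key, so it cannot drift out of sync with `ts` the way a hand-maintained `date` column could.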
I think I found a good solution to the problem of using Expression in the
TableCatalog API and in the DeleteSupport API.
For DeleteSupport, there is already a stable and public subset of
Expression named Filter that can be used to pass filters. The reason why
DeleteSupport would use Expression is…
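To show why the existing public `Filter` subset is enough to express deletes, here is a Python sketch with simplified stand-ins named after Spark's `EqualTo`, `GreaterThan`, and `And` filters (the real classes live in `org.apache.spark.sql.sources` on the JVM and cover many more cases). `InMemoryTable` and `delete_where` are hypothetical; the sketch assumes the filters passed in are implicitly ANDed.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


# Simplified stand-ins for Spark's public Filter classes.
@dataclass(frozen=True)
class EqualTo:
    attribute: str
    value: Any


@dataclass(frozen=True)
class GreaterThan:
    attribute: str
    value: Any


@dataclass(frozen=True)
class And:
    left: Any
    right: Any


def _validate(f: Any) -> None:
    """Reject anything this source cannot evaluate, before touching data."""
    if isinstance(f, And):
        _validate(f.left)
        _validate(f.right)
    elif not isinstance(f, (EqualTo, GreaterThan)):
        raise ValueError(f"unsupported filter: {f!r}")


def _matches(f: Any, row: Dict[str, Any]) -> bool:
    if isinstance(f, EqualTo):
        return row[f.attribute] == f.value
    if isinstance(f, GreaterThan):
        return row[f.attribute] > f.value
    # Only And remains after validation.
    return _matches(f.left, row) and _matches(f.right, row)


class InMemoryTable:
    def __init__(self, rows: List[Dict[str, Any]]):
        self.rows = list(rows)

    def delete_where(self, filters: List[Any]) -> int:
        """Delete rows matching all filters; an empty list matches every row."""
        for f in filters:
            _validate(f)
        kept = [r for r in self.rows if not all(_matches(f, r) for f in filters)]
        deleted = len(self.rows) - len(kept)
        self.rows = kept
        return deleted
```

Validating every filter before deleting anything means an unsupported filter fails the whole call instead of deleting a partial set of rows.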
I agree that it would be great to have a stable public expression API that
corresponds to what is parsed, not to the implementations. That would be
ideal, but I worry that it will get out of date, and that a data source that
needs to support a new expression would have to wait up to 6 months for a public
release…
Sorry, I completely disagree with using Expression in critical public APIs
that we expect a lot of developers to use. There's a huge difference
between exposing InternalRow and exposing Expression. InternalRow is a relatively
small surface (though still quite large) that I can see ourselves within a version
getting…
Reynold, did you get a chance to look at my response about using
`Expression`? I think that it's okay since it is already exposed in the v2
data source API. Plus, I wouldn't want to block this on building a public
expression API that is more stable.
I think that's the only objection to this SPIP.
I don’t think that we want to block this work until we have a public and
stable Expression. Like our decision to expose InternalRow, I think that
while this option isn’t great, it at least allows us to move forward. We
can hopefully replace it later.
Also note that the use of Expression is in the…
Seems reasonable at a high level. I don't think we can use Expressions and
SortOrders in public APIs, though. Those are not meant to be public and can
break easily across versions.
On Tue, Jul 24, 2018 at 9:26 AM Ryan Blue wrote:
> The recently adopted SPIP to standardize logical plans requires…