There’s already a wrapper to adapt Arrow to ColumnarBatch, as well as an
iterator to read a ColumnarBatch as a sequence of InternalRow. That’s what
we want to take advantage of. You’re right that the first thing that Spark
does is to get each row as InternalRow. But we still get a benefit from
vectorized materialization, even though the rows are consumed one at a time.
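
To make that concrete, here is a minimal sketch (my own illustration, not
Iceberg code) of the adapters involved: Spark ships ArrowColumnVector and
ColumnarBatch, and ColumnarBatch.rowIterator() provides the
Iterator<InternalRow> view mentioned above. It assumes the decoded data is
sitting in an Arrow VectorSchemaRoot:

  import java.util.Iterator;
  import org.apache.arrow.vector.VectorSchemaRoot;
  import org.apache.spark.sql.catalyst.InternalRow;
  import org.apache.spark.sql.vectorized.ArrowColumnVector;
  import org.apache.spark.sql.vectorized.ColumnVector;
  import org.apache.spark.sql.vectorized.ColumnarBatch;

  public class ArrowBatchAdapter {
    // Wrap each Arrow vector in Spark's ArrowColumnVector adapter, assemble
    // a ColumnarBatch, and expose the whole batch as a row-by-row iterator.
    public static Iterator<InternalRow> toRowIterator(VectorSchemaRoot root) {
      ColumnVector[] columns = root.getFieldVectors().stream()
          .map(ArrowColumnVector::new)
          .toArray(ColumnVector[]::new);
      ColumnarBatch batch = new ColumnarBatch(columns);
      batch.setNumRows(root.getRowCount());
      return batch.rowIterator();
    }
  }

Decoding into Arrow still happens a batch at a time; only the hand-off to
the rest of the operator tree is row by row, which is where the benefit
described above comes from.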
If the Iceberg reader were to wrap Arrow or ColumnarBatch behind an
Iterator[InternalRow] interface, it would still not work, right? Because it
seems to me there is a lot more going on upstream in the operator execution
path that would need to be handled here.
Hello devs,
As a follow-up to
https://github.com/apache/incubator-iceberg/issues/9 I’ve been reading
through how Spark does vectorized reading in its current implementation,
which is in the DataSource V1 path, and trying to see how we can achieve
the same impact in Iceberg’s reading. To start with I…
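
For anyone who wants to poke at the same code path, Spark’s V1 vectorized
Parquet reader is controlled by a session config, so a quick way to compare
the batched and row-at-a-time scans is to toggle it (sketch below; the
Parquet path is a placeholder):

  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SparkSession;

  public class VectorizedScanDemo {
    public static void main(String[] args) {
      SparkSession spark = SparkSession.builder()
          .master("local[*]")
          .appName("vectorized-scan-demo")
          .getOrCreate();

      // On by default; flip to false to force row-at-a-time Parquet reads.
      spark.conf().set("spark.sql.parquet.enableVectorizedReader", "true");

      // Placeholder path; point this at any flat-schema Parquet file.
      Dataset<Row> df = spark.read().parquet("/tmp/data.parquet");

      // The physical plan shows whether the scan produces ColumnarBatches.
      df.explain();
    }
  }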
Yes, I agree. I'll talk a little about a couple of the constraints of this
as well.
On Fri, May 24, 2019 at 5:52 AM Anton Okolnychyi wrote:
> The agenda looks good to me. I think it would also make sense to clarify
> the responsibilities of query engines and Iceberg. Not only in terms of
> uniqueness, but also in terms of applying diffs on read, for example.
The agenda looks good to me. I think it would also make sense to clarify the
responsibilities of query engines and Iceberg. Not only in terms of uniqueness,
but also in terms of applying diffs on read, for example.
> On 23 May 2019, at 01:59, Ryan Blue wrote:
>
> Here’s a rough agenda:
>
> U…
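
On Anton’s point about applying diffs on read: purely as an illustration
(invented names, not an Iceberg API), the idea can be pictured as a
reader-side iterator that drops rows whose keys appear in a set of deletes,
so the engine sees a merged view without any data files being rewritten:

  import java.util.Iterator;
  import java.util.NoSuchElementException;
  import java.util.Set;
  import java.util.function.Function;

  // Toy sketch of merge-on-read: filter out rows matching a delete diff.
  public class DiffApplyingIterator<T, K> implements Iterator<T> {
    private final Iterator<T> rows;
    private final Set<K> deletedKeys;
    private final Function<T, K> keyFn;
    private T next;

    public DiffApplyingIterator(Iterator<T> rows, Set<K> deletedKeys,
                                Function<T, K> keyFn) {
      this.rows = rows;
      this.deletedKeys = deletedKeys;
      this.keyFn = keyFn;
      advance();
    }

    // Pull rows until one survives the delete filter (or input is drained).
    private void advance() {
      next = null;
      while (rows.hasNext()) {
        T candidate = rows.next();
        if (!deletedKeys.contains(keyFn.apply(candidate))) {
          next = candidate;
          break;
        }
      }
    }

    @Override
    public boolean hasNext() {
      return next != null;
    }

    @Override
    public T next() {
      if (next == null) {
        throw new NoSuchElementException();
      }
      T result = next;
      advance();
      return result;
    }
  }

Where that filtering lives, in the engine or in Iceberg’s readers, is
exactly the kind of responsibility split Anton is asking to clarify.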