As Micah said, this would be pretty cool to use in Arrow datasets.  I can't
make any promises about helping develop it but if it were developed I could
help integrate into Arrow datasets / Acero and provide some proof of
concept.

On Wed, Jun 8, 2022, 6:35 AM Ryan Blue <b...@tabular.io> wrote:

> While I understand Kyle's concerns, I'm all for a C++ or Rust
> implementation.
>
> We know that this is going to help a lot of people that want to integrate
> Iceberg in engines that are outside the JVM ecosystem. I think it would be
> great to work with anyone that is interested and build up the community in
> this area!
>
> Ryan
>
> On Wed, Jun 8, 2022 at 3:16 AM OpenInx <open...@gmail.com> wrote:
>
>> As a cloud-native table format standard for the big-data ecosystem, I
>> believe supporting multiple languages is the correct direction so that
>> different languages can connect to the Apache Iceberg table format.
>>
>> But I can also see Kyle's point about lacking enough resources (developers
>> and reviewers) to accomplish this goal.  In my mind, Python, Golang, C++,
>> and Rust can all be regarded as native language support.  We may
>> just need to support the Rust SDK, and then all of the other languages can
>> just wrap the Rust SDK to access the table format.
>>
>> Anyway, we will need to wait for the REST catalog to be finished before we
>> introduce support for another language, because we cannot access the Iceberg
>> table by invoking the JVM catalog interfaces.
>>
>> On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>>>> There’s also the question of how useful this would be in practice given
>>>> the complexity of using C++ (or Rust etc) within some of the major
>>>> frameworks.
>>>>
>>>
>>> One place this would be useful is for Arrow's Dataset API [1].  An
>>> option the Arrow community might be open to is hosting parts of the code
>>> there (this is what is done for Apache Parquet C++).  This helps shape some
>>> of the answers to other questions posed (ORC and Parquet are already in the
>>> Repo, it provides a Filesystem interface, etc).  The project doesn't
>>> currently consume Avro, and I think the preferred approach is to make a
>>> clean room Avro parser.  But I agree this is a non-trivial effort to get
>>> underway.
>>>
>>> Another area to consider is compatibility testing.  I think before a
>>> third officially supported community library is introduced it would be good
>>> to have a compatibility framework in place to make sure implementations are
>>> all interpreting the specification correctly.  If there isn't already an
>>> effort here, I'd like to start contributing something (probably will have
>>> bandwidth sometime in Q3).
>>>
>>> Thanks,
>>> -Micah
>>>
>>>
>>> [1] https://arrow.apache.org/docs/cpp/dataset.html
>>>
>>> On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <k...@tabular.io> wrote:
>>>
>>>> Hi caneGuy,
>>>>
>>>> I personally don’t dislike this idea. I understand the performance
>>>> benefits.
>>>>
>>>> But this would be a huge undertaking for the community. We’d need to
>>>> ensure we had sufficient developer support for reviews (likely one of the
>>>> biggest issues), as well as a number of other things, particularly
>>>> dependencies, package management, etc. We’d also need to scope support down
>>>> to specific OS / compilers etc.
>>>>
>>>> We’d also need to be sure we had adequate developer support from a wide
>>>> enough range of the community to support the project long term. One issue
>>>> in open source is that developers will work on something tangential to
>>>> their project in another repository, but nobody is available to
>>>> maintain it.
>>>>
>>>> There’s also the question of how useful this would be in practice given
>>>> the complexity of using C++ (or Rust etc) within some of the major
>>>> frameworks.
>>>>
>>>> Again, I’m not opposed to the idea but just trying to be realistic
>>>> about the realities of such an undertaking. It would need full community
>>>> support (or at least support from enough community members to be
>>>> sustainable).
>>>>
>>>> If you wanted to make a design doc, the milestones tab in the Iceberg
>>>> project has some that you might use as reference.
>>>>
>>>> *I highly suggest you come to the next community sync and bring this up
>>>> to the community then.*
>>>>
>>>> If you’re not already on the invite list for the monthly community
>>>> sync, you can get on it by joining the Google group. You’ll receive invites
>>>> when they go out:
>>>> https://groups.google.com/g/iceberg-sync
>>>>
>>>> Looking forward to seeing you at the next community sync.
>>>>
>>>> A design document and/or any prior art would be very helpful as the
>>>> community sync does discuss many topics (possibly there is existing C++
>>>> support in StarRocks for Iceberg V1?).
>>>>
>>>> Thank you,
>>>> Kyle Bendickson
>>>> GitHub: kbendick
>>>>
>>>> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <s...@tabular.io> wrote:
>>>>
>>>>> Currently there is no existing effort to develop a C++ package. That
>>>>> being said, I think it would be awesome to have one! If anyone is willing
>>>>> to start that development effort, I can help with some of the groundwork
>>>>> to kickstart it.
>>>>>
>>>>> I would say the first step would be for someone to prepare a
>>>>> high-level proposal.
>>>>>
>>>>> -Sam
>>>>>
>>>>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zhoukang199...@gmail.com> wrote:
>>>>>
>>>>>> Hi team,
>>>>>> I am a dev from the StarRocks community, and we already support the
>>>>>> Iceberg v1 format.
>>>>>> We are also planning to support the v2 format. If there were a C++ package,
>>>>>> it would be very convenient for our implementation.
>>>>>> It would also make it faster for other C++ compute engines to support
>>>>>> the v2 format.
>>>>>>
>>>>>> Are there plans to provide a C++ SDK?
>>>>>> --
>>>>>> caneGuy
>>>>>>
>>>>> --
>>>>>
>>>>> Sam Redai <s...@tabular.io>
>>>>>
>>>>> Developer Advocate  |  Tabular <https://tabular.io/>
>>>>>
>>>>> c (267) 226-8606
>>>>>
>>>>
>
> --
> Ryan Blue
> Tabular
>
