> two
> > > possible idioms in the same library. It means code written against the
> > > library becomes less portable (you need to know how the memory allocator is
> > > using GC or not).
> > >
> > > I understand manual memory management i
> addressing the problem?
>
>
> -Micah
>
>
>
> On Thu, Oct 7, 2021 at 3:48 PM Hongze Zhang wrote:
>
> > We don't have to worry about that, since nothing changes on the
> > current manual release path unless "MemoryChunkCleaner" is explicit
s less portable (you need to know how the memory allocator is
> > using GC or not).
> >
> > I understand manual memory management in Java is tedious but is there a
> > specific problem this is addressing other than making Arrow have more
> > expected semantics to Java users?
rely on refcounting for keeping things in check, I'm not
>sure why changing the default is such a good idea...
>
>On Tue, Oct 5, 2021 at 2:20 AM Hongze Zhang wrote:
>
>> Hi Laurent,
>>
>>
>>
>>
>> Sorry, I might have described it unclearly, and yes
the GC itself to collect and free buffers?
>
>On Wed, Sep 29, 2021 at 11:58 PM Hongze Zhang wrote:
>
>> Hi,
>>
>> I would like to discuss the potential of introducing a GC-based
>> reference management strategy to Arrow Java, and we
>> have already been wor
Hi,
I would like to discuss the potential of introducing a GC-based reference
management strategy to Arrow Java, and we
have already been working on an implementation in our own project. I have put
the related code in the following branch, and
if it makes sense to upstream it to Apache Arrow I can open
On Wed, 2021-08-25 at 21:02 +0300, roee shlomo wrote:
> This means that an API to import an ArrowSchema (in C) into a
> Field/Schema
> (in Java) is not suitable for dictionary encoded arrays because there
> is an
> information loss. Specifically, there is nothing in Field/Schema to
> indicate the
b.com/apache/arrow/pull/10883
[3] https://github.com/apache/arrow/pull/10333
[4] https://github.com/apache/arrow/pull/10114
[5] https://github.com/apache/arrow/pull/10652
On Thu, 2021-08-05 at 18:27 +0800, Hongze Zhang wrote:
> Thanks everyone for the quick response! By the way, I may have raised this
Thanks everyone for the quick response! By the way, I may have raised this
review request a little late because I was also working on some other
projects over the last few months. I now have some time to
push this forward. :)
About ARROW-11776:
On Wed, 2021-08-04 at 08:45 -0700, Micah Kornfi
Hi,
I have some PRs that improve the Dataset API's Java implementation and
have not been reviewed for months. Could someone help me review
them? Thanks in advance.
1. https://github.com/apache/arrow/pull/10201
ARROW-11776: [Java][Dataset] Support writing to files within dataset
scanner via JN
On Wed, 2021-06-02 at 13:56 -0700, Micah Kornfield wrote:
> >
> > Any SQL interface to Arrow should follow the SQL standard. So, for
> > instance, if a column has TIMESTAMP type, it should behave as a
> > date-time without a time-zone.
>
>
> At least in bigquery we do the following mapping:
> SQ
Hi All,
Sorry to send a request to everyone, but I would like to ask whether anyone
could help finish the review for PR#7030[1].
As of now the PR contains the following parts:
1. Base dataset API for the Java language (which follows the shape of the C++ API)
2. A JNI-based implementation of FileSyste
ty dependencies.
>
>On Mon, Jul 20, 2020 at 3:52 AM Hongze Zhang wrote:
>
>> Hi,
>>
>> I want to follow up on the discussion[1] in the pending PR[2] for the
>> Java Dataset (it's no longer "Datasets" I guess?) API.
>>
>>
>> - Backgr
Hi all,
Has anyone tried using the Arrow Dataset API in a distributed system? E.g.
create scan tasks on machine 1, then send these tasks to machines 2, 3, 4
and execute them there.
So far I think a possible workaround is to:
1. Create Dataset on machine 1;
2. Call Scan(), collect all scan tasks from sca
Hi,
I want to follow up on the discussion[1] in the pending PR[2] for the Java
Dataset (it's no longer "Datasets" I guess?) API.
- Background:
We are transferring C++ Arrow buffers to Java-side BufferAllocators. We should
decide whether to use -XX:MaxDirectMemorySize as a limit of these buf
Hongze Zhang created ARROW-8596:
---
Summary: [C++][Dataset] Add test case to check if all essential
properties are reserved once ScannerBuilder::Project is called
Key: ARROW-8596
URL: https://issues.apache.org/jira
Hongze Zhang created ARROW-8499:
---
Summary: [C++][Dataset] In ScannerBuilder, batch_size will not
work if projecter is not empty
Key: ARROW-8499
URL: https://issues.apache.org/jira/browse/ARROW-8499
Hongze Zhang created ARROW-7808:
---
Summary: [Java][Dataset] Implement Datasets Java API
Key: ARROW-7808
URL: https://issues.apache.org/jira/browse/ARROW-7808
Project: Apache Arrow
Issue Type
Hongze Zhang created ARROW-7329:
---
Summary: AllocationManager: Allow managing different types of
memory other than those are allocated using Netty
Key: ARROW-7329
URL: https://issues.apache.org/jira/browse/ARROW
he future we might have
> OdbcDataSource and FlightDataSource
>
> Basically, dataset::FileFormat is meant to be a unified interface to
> interact with file formats. Here's an example of such usage without
> all the dataset machinery [3].
>
> François
>
> [1] https://issues.a
ub.com/apache/arrow/pull/5608
>
> Regards
>
> Antoine.
>
>
>
> On 27/11/2019 at 11:16, Hongze Zhang wrote:
> > Hi Micah,
> >
> >
> > Regarding our use cases, we'd use the API on Parquet files with some pushed
> > filters and
>
ew
>and relationships between components and how it will co-exist with existing
>Java code). If I understand correctly, one goal is to use this as a basis
>for a new Spark DataSet API with better performance than the vectorized
>spark parquet reader? Are there others?
>
>Wes, wha
I-based
>> interface to the C++ libraries as one potential approach to save on
>> development time.
>>
>> - Wes
>>
>>
>>
>> On Tue, Nov 26, 2019 at 5:54 AM Hongze Zhang wrote:
>> >
>> > Hi all,
>> >
>> >
>> >
Hi all,
Recently the datasets API has been improved a lot, and I have found some of the new
features very useful for my own work. For example, an important one for me is
the fix for ARROW-6952[1]. As I currently work on Java/Scala projects like
Spark, I am now investigating a way to call some of