function
> for that specific case.
>
> Best,
>
> Will
>
> On Tue, Mar 28, 2023 at 10:14 AM John Muehlhausen wrote:
>
> > Is there a way to pass a RecordBatch (or a batch wrapped as a Table) to
> > Take and get back a Table composed of in-place (zero copy) slices
Is there a way to pass a RecordBatch (or a batch wrapped as a Table) to
Take and get back a Table composed of in-place (zero copy) slices of the
input? I suppose this is not too hard to code, just wondered if there is
already a utility.
Result<Datum> Take(const Datum& values, const Datum& indices,
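A hand-rolled Python sketch of the idea, assuming the indices are already sorted (take_as_slices is an illustrative name, not an existing Arrow utility): consecutive indices are grouped into runs, and each run becomes a zero-copy Table.slice of the input.

    import pyarrow as pa

    def take_as_slices(table: pa.Table, sorted_indices):
        # Group consecutive indices into runs; each run becomes a zero-copy
        # slice of the input, so the result references the original buffers.
        runs = []
        start = prev = None
        for i in sorted_indices:
            if start is None:
                start = prev = i
            elif i == prev + 1:
                prev = i
            else:
                runs.append(table.slice(start, prev - start + 1))
                start = prev = i
        if start is not None:
            runs.append(table.slice(start, prev - start + 1))
        return pa.concat_tables(runs)  # chunks still point at the input data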
Hello,
pyarrow.Table
from_batches(batches, Schema schema=None)
Construct a Table from a sequence or iterator of Arrow RecordBatches.
What is the equivalent of this in Java? What is the relationship between
VectorSchemaRoot, Table and RecordBatch in Java? It all seems a bit
different...
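For reference, a minimal sketch of the Python call being quoted (the Java mapping asked about is not shown here):

    import pyarrow as pa

    batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], names=["x"])
    table = pa.Table.from_batches([batch, batch])  # schema inferred from the batches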
Specifi
:GetValueBytes(int64_t index)
> > >
> > >
> > > I think this would be problematic for Boolean?
> > >
> > > On Tue, Nov 15, 2022 at 11:01 AM John Muehlhausen wrote:
> > >
> > >> If that covers primitive and binary(string) types, that
If that covers primitive and binary(string) types, that would work for me.
On Tue, Nov 15, 2022 at 13:50 Antoine Pitrou wrote:
>
> Then perhaps we can define a method:
>
> std::string_view FlatArray::GetValueBytes(int64_t index)
>
> ?
>
>
> Le 15/11/2022 à 19:3
r place for this method if there is
> > consensus on adding it.
> >
> > Cheers,
> > Micah
> >
> > [1]
> >
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_base.h#L219
> >
> > On Mon, Nov 14, 2022 at 11:46 AM John Muehlha
There exists:
const uint8_t* BaseBinaryArray::GetValue(int64_t i, offset_type*
out_length) const
What about adding:
const uint8_t* Array::GetValue(int64_t i, offset_type* out_length) const
This would allow GetValue to get the untyped bytes/length of any value?
E.g. out_length would be set to size
if (fieldNullCount < 0)
{
    throw new InvalidDataException("Null count length must be >= 0"); // TODO: Localize exception message
}
Above from Ipc/ArrowReaderImplementation.cs.
pyarrow is fine with -1, probably due to the following. It would be ni
/// When building
/// messages using the encapsulated IPC message, padding bytes may be written
/// after a buffer, but such padding bytes do not need to be accounted for in
/// the size here.
length: long;
}
On Thu, Sep 22, 2022 at 9:10 AM John Muehlhausen wrote:
> Regarding tab=feather.read_tab
e positions of the messages are declared in the file's footer's
> "record_batches".
>
> [1] https://github.com/apache/arrow/blob/master/format/Message.fbs#L87
>
> Best,
> Jorge
>
>
> On Thu, Sep 22, 2022 at 3:01 AM John Muehlhausen wrote:
>
>
5
On Wed, Sep 21, 2022 at 7:49 PM John Muehlhausen wrote:
> The following seems like good news... like I should be able to decompress
> just one column of a RecordBatch in the middle of a compressed feather v2
> file. Is there a Python API for this kind of access? C++?
>
> ///
/// compression does not yield appreciable savings.
BUFFER
}
On Wed, Sep 21, 2022 at 7:03 PM John Muehlhausen wrote:
> ``Internal structure supports random access and slicing from the middle.
> This also means that you can read a large file chunk by chunk without
> having to pull the whole t
``Internal structure supports random access and slicing from the middle.
This also means that you can read a large file chunk by chunk without
having to pull the whole thing into memory.''
https://ursalabs.org/blog/2020-feather-v2/
For a compressed v2 file, can I decompress just one column of a ba
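For the column-selective part, pyarrow.feather.read_table accepts a columns argument, which loads only the requested columns (the file name and column name below are placeholders):

    import pyarrow.feather as feather

    table = feather.read_table("quotes.feather", columns=["price"])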
error: invalid operands to binary expression
('nonstd::sv_lite::basic_string_view >' and
'basic_string_view')
This from
val == "str"sv
Is there a way to access a util::string_view as a std::string_view other
than re-building a std::string_view from data()/size() ?
-John
ons& options, io::InputStream* stream);
On Fri, Jul 1, 2022 at 3:18 PM John Muehlhausen wrote:
> If I call `Consume(std::shared_ptr<Buffer> buffer)` and it is already
> pre-framed to contain (e.g.) an entire RecordBatch Message and nothing
> else, will it use this Buffer in zero-copy mode w
If I call `Consume(std::shared_ptr<Buffer> buffer)` and it is already
pre-framed to contain (e.g.) an entire RecordBatch Message and nothing
else, will it use this Buffer in zero-copy mode when calling my
Listener::OnRecordBatchDecoded() implementation? I.e. will data in that
RecordBatch refer directly to
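A rough Python analogue of the same round trip, for illustration only: serialize produces a single encapsulated RecordBatch message, and read_record_batch is expected to reference that buffer's memory rather than copy it.

    import pyarrow as pa

    batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], names=["x"])
    buf = batch.serialize()                              # one framed IPC RecordBatch message
    same = pa.ipc.read_record_batch(buf, batch.schema)   # references buf's memory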
om default C++ memory pool on Linux, and/or interception/auditing
> of system pool" on Tue, 14 Jun 2022 09:06:51 -0500,
> John Muehlhausen wrote:
>
> > Hello,
> >
> > This comment is regarding installation with `apt` on ubuntu 18.04 ...
> > `libarrow-dev/
oc
-fno-builtin-__libc_memalign -fno-builtin-__posix_memalign
-fno-builtin-operator_new -fno-builtin-operator_delete" cmake --preset
ninja-debug-minimal -DARROW_JEMALLOC=OFF -DARROW_MIMALLOC=OFF
-DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=/usr/local ..
On Tue, Jun 14, 2022 at 12:36 PM John Muehl
My best guess at this moment is that the Arrow lib I'm using was built with
a compiler that had something like __builtin_posix_memalign in effect ??
I say this because deploying __builtin_malloc has the same deleterious
effect on my own .so
On Tue, Jun 14, 2022 at 10:53 AM John Muehlh
>
> > Arrow still uses the system allocator for all non-buffer allocations.
> > So, for example, when reading in a large IPC file, the majority of the
> > data will be allocated by Arrow's memory pool. However, the schema,
> > and the wrapper array object itself wi
I take that back... the preload is not intercepting memory_pool.cc
-> SystemAllocator -> AllocateAligned -> posix_memalign (if indeed this is
the system allocator path), although it is intercepting posix_memalign from
a different .so
On Tue, Jun 14, 2022 at 10:27 AM John Muehlhausen wr
4, 2022 at 9:06 AM John Muehlhausen wrote:
> Hello,
>
> This comment is regarding installation with `apt` on ubuntu 18.04 ...
> `libarrow-dev/bionic,now 8.0.0-1 amd64`
>
> I'm a bit confused about the memory pool situation:
>
> * I run with `ARROW_DEFAULT
Hello,
This comment is regarding installation with `apt` on ubuntu 18.04 ...
`libarrow-dev/bionic,now 8.0.0-1 amd64`
I'm a bit confused about the memory pool situation:
* I run with `ARROW_DEFAULT_MEMORY_POOL=system` and check that
`arrow::default_memory_pool()->backend_name() ==
arrow::system_m
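A quick way to check which allocator backs the default pool from Python (a small sketch, assuming a recent pyarrow):

    import pyarrow as pa

    # e.g. after launching with ARROW_DEFAULT_MEMORY_POOL=system
    print(pa.default_memory_pool().backend_name)       # 'system', 'jemalloc' or 'mimalloc'
    print(pa.default_memory_pool().bytes_allocated())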
Motivation:
We have memory-mappable Arrow IPC files with N batches where column(s) are
sorted to support binary search. Because log2(n) < log2(n/2)+log2(n/2) and
binary search is required on each batch, we prefer the batches to be as
large as possible to reduce total search time... perhaps larger
to build.
>
> This is one of many reasons we recommend using conda to organizations
> because things like the VS runtime are automatically handled. I'm not
> sure if there's a way to equivalently handle this with pip
>
> On Tue, Oct 6, 2020 at 9:16 AM John Muehlhausen
"pip install pyarrow
If you encounter any importing issues of the pip wheels on Windows, you may
need to install the Visual C++ Redistributable for Visual Studio 2015."
http://arrow.apache.org/docs/python/install.html
Just now wading into the use of pyarrow on Windows. Users are confused and
irr
a good idea.
>
> Cheers,
> Micah
>
>
> [1]
>
> https://15721.courses.cs.cmu.edu/spring2018/papers/22-vectorization2/p31-feng.pdf
> [2] https://github.com/apache/arrow/pull/4815
> [3]
>
> https://github.com/apache/arrow/blob/master/docs/source/format/Colu
new datatypes there is no separate flag to check?
On Thu, Jan 23, 2020 at 1:09 PM Wes McKinney wrote:
> On Thu, Jan 23, 2020 at 12:42 PM John Muehlhausen wrote:
> >
> > Again, I know very little about Parquet, so your patience is appreciated.
> >
> > At the moment I
have compression algorithm where the columnar engine can
> benefit from it [1] than marginally improving a file-system-os
> specific feature.
>
> François
>
> [1] Section 4.3 http://db.csail.mit.edu/pubs/abadi-column-stores.pdf
>
>
>
>
> On Thu, Jan 23, 2020 at 12:
n Thu, Jan 23, 2020 at 11:23 AM Antoine Pitrou wrote:
>
>
> Le 23/01/2020 à 18:16, John Muehlhausen a écrit :
> > Perhaps related to this thread, are there any current or proposed tools to
> > transform columns for fixed-length data types according to a "shuffle?"
>
Perhaps related to this thread, are there any current or proposed tools to
transform columns for fixed-length data types according to a "shuffle?"
For precedent see the implementation of the shuffle filter in hdf5.
https://support.hdfgroup.org/ftp/HDF5//documentation/doc1.6/TechNotes/shuffling-alg
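An illustrative numpy sketch of the byte-shuffle transform for fixed-width values (group byte 0 of every value, then byte 1, and so on, which typically makes the buffer more compressible); this demonstrates the idea only and is not an Arrow API:

    import numpy as np

    def shuffle(values: np.ndarray) -> bytes:
        # Byte-transpose: output is all first bytes, then all second bytes, ...
        width = values.dtype.itemsize
        raw = values.view(np.uint8).reshape(-1, width)
        return np.ascontiguousarray(raw.T).tobytes()

    def unshuffle(buf: bytes, dtype) -> np.ndarray:
        # Inverse transform: regroup the bytes of each value.
        dtype = np.dtype(dtype)
        raw = np.frombuffer(buf, dtype=np.uint8).reshape(dtype.itemsize, -1)
        return np.ascontiguousarray(raw.T).reshape(-1).view(dtype)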
Given input data and a type, how do we predict whether array() will produce
ChunkedArray?
I figure the formula involves:
- the length of input
- the type, and max length (to be conservative) for variable length types
- some constant(s) that Arrow knows internally... that may change in the
future?
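One practical check is simply to look at what comes back; the 2 GiB figure below is a rule of thumb for 32-bit offsets, an assumption rather than something stated in this thread:

    import pyarrow as pa

    result = pa.array(["spam", "eggs"])
    print(isinstance(result, pa.ChunkedArray))   # False for small inputs
    # Variable-length types tend to come back as a ChunkedArray once a single
    # chunk's character data would overflow 32-bit offsets (~2 GiB).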
n that
> does modify headers) and then "touch" up the metadata for later analysis,
> so it conforms to the specification (and standard libraries can be used).
>
> [1] https://github.com/apache/arrow/blob/master/format/Message.fbs#L49
> [2] https://github.com/apache/arrow/blob/mast
> I contend that it can only be useful and will never be harmful. What are
> > the counter-examples of concrete harm?
>
>
> I'm not sure there is anything obviously wrong, however changes to
> semantics are always dangerous. One blemish on the current proposal is
"that's where the danger lies"
What danger? I have no idea what the specific danger is, assuming that all
reference implementations have test cases that hedge around this.
I contend that it can only be useful and will never be harmful. What are
the counter-examples of concrete harm?
s, we need it in a handful of implementations. I'm
willing to provide all of them. To me that is the lowest complexity
solution.
-John
On Wed, Oct 16, 2019 at 10:45 AM Wes McKinney wrote:
> On Wed, Oct 16, 2019 at 10:17 AM John Muehlhausen wrote:
> >
> > "pyar
t works without my proposed change, we can go back to how
the user ignores the empty/undefined array portions without knowing whether
they exist.
-John
On Wed, Oct 16, 2019 at 10:45 AM Wes McKinney wrote:
> On Wed, Oct 16, 2019 at 10:17 AM John Muehlhausen wrote:
> >
> > "pya
"smart" or "magical", instead maintaining tight
> developer control over what is going on.
>
> - Wes
>
> On Wed, Oct 16, 2019 at 2:18 AM Micah Kornfield
> wrote:
> >
> > Still thinking through the implications here, but to save others from
> &g
fashion and therefore has some unused array elements.
The change itself seems relatively simple. What negative consequences do
we anticipate, if any?
Thanks,
-John
On Fri, Jul 5, 2019 at 10:42 AM John Muehlhausen wrote:
> This seems to help... still testing it though.
>
> Status GetF
ARROW-6837 (which, er, includes ARROW-6836) and ARROW-5916 have PRs.
Would appreciate some feedback. I will finish the Python part of 6837 when
I know I'm on the right track.
Thanks,
John
On Thu, Oct 10, 2019 at 9:54 AM John Muehlhausen wrote:
> The format change is ARROW-6836 .
I'm missing something about this script.
FORMAT_DIR=$CWD/../..
How can any of the fbs files be in ../../ when they are in format/ ?
ntegration tests to prove it. The issues you listed
> sound more like C++ library changes to me?
>
> If you want to propose Format-related changes, that would need to
> happen right away otherwise the ship will sail on that.
>
> - Wes
>
> On Wed, Oct 9, 2019 at 9:08 PM John M
ARROW-5916
ARROW-6836/6837
These are of particular interest to me because they enable recordbatch
"incrementalism" which is useful for streaming applications:
ARROW-5916 allows a recordbatch to pre-allocate space for future records
that have not yet been populated, making it safe for readers to c
John Muehlhausen created ARROW-6840:
---
Summary: [C++/Python] retrieve fd of open memory mapped file and
Open() memory mapped file by fd
Key: ARROW-6840
URL: https://issues.apache.org/jira/browse/ARROW-6840
John Muehlhausen created ARROW-6839:
---
Summary: [Java] access File Footer custom_metadata
Key: ARROW-6839
URL: https://issues.apache.org/jira/browse/ARROW-6839
Project: Apache Arrow
Issue
John Muehlhausen created ARROW-6838:
---
Summary: [JS] access File Footer custom_metadata
Key: ARROW-6838
URL: https://issues.apache.org/jira/browse/ARROW-6838
Project: Apache Arrow
Issue
John Muehlhausen created ARROW-6837:
---
Summary: [C++/Python] access File Footer custom_metadata
Key: ARROW-6837
URL: https://issues.apache.org/jira/browse/ARROW-6837
Project: Apache Arrow
John Muehlhausen created ARROW-6836:
---
Summary: [Format] add a custom_metadata:[KeyValue] field to the
Footer table in File.fbs
Key: ARROW-6836
URL: https://issues.apache.org/jira/browse/ARROW-6836
I thought I should open all of the issues for tracking even if I don't
implement all of them right away?
On Thu, Oct 3, 2019 at 5:46 PM Antoine Pitrou wrote:
>
> Le 04/10/2019 à 00:18, John Muehlhausen a écrit :
> > I need to create two (or more) issues for
> > cu
PM Antoine Pitrou wrote:
>
> Le 03/10/2019 à 23:21, John Muehlhausen a écrit :
> >
> > Would we just make a variant of Open() that takes a fd rather than a
> path?
>
> That sounds like a good idea. Would you like to open a JIRA and a PR?
>
> > Would this API hav
I need to create two (or more) issues for
custom_metadata in Footer ...
https://lists.apache.org/thread.html/c3b3d1456b7062a435f6795c0308ccb7c8fe55c818cfed2cf55f76c5@%3Cdev.arrow.apache.org%3E
and
memory map based on fd ...
https://lists.apache.org/thread.html/83373ab00f552ee8afd2bac2b2721468b
I have a situation where multiple processes need to access a memory mapped
file.
However, between the time the first process maps the file and the time a
subsequent process in the group maps the file, the file may have been
removed from the filesystem. (I.e. has no "path") Coordinating the cache
John Muehlhausen created ARROW-5916:
---
Summary: [C++] Allow RecordBatch.length to be less than array
lengths
Key: ARROW-5916
URL: https://issues.apache.org/jira/browse/ARROW-5916
Project: Apache
It seems as if Arrow expects for some vectors to be empty rather than null.
(Examples: Footer.dictionaries, Field.children)
Anyone using --gen-object-api with flatc will get code that writes null
when (e.g.) _o->children.size() is zero in CreateField().
I may be missing something but I don't see
kely malformed");
}
const flatbuf::FieldNode* node = nodes->Get(field_index);
//out->length = node->length();
out->length = metadata_->length();
out->null_count = node->null_count();
out->offset = 0;
return Status::OK();
}
On Fri, Jul
So far it seems as if pyarrow is completely ignoring the RecordBatch.length
field. More info to follow...
On Tue, Jul 2, 2019 at 3:02 PM John Muehlhausen wrote:
> Crikey! I'll do some testing around that and suggest some test cases to
> ensure it continues to work, assuming t
Crikey! I'll do some testing around that and suggest some test cases to
ensure it continues to work, assuming that it does.
-John
On Tue, Jul 2, 2019 at 2:41 PM Wes McKinney wrote:
> Thanks for the attachment, it's helpful.
>
> On Tue, Jul 2, 2019 at 1:40 PM John
Attachments referred to in previous two messages:
https://www.dropbox.com/sh/6ycfuivrx70q2jx/AAAt-RDaZWmQ2VqlM-0s6TqWa?dl=0
On Tue, Jul 2, 2019 at 1:14 PM John Muehlhausen wrote:
> Thanks, Wes, for the thoughtful reply. I really appreciate the
> engagement. In order to clarify things a
: on
the one hand, length 1 RecordBatches that don't result in a stream that is
computationally efficient. On the other hand, adding artificial latency by
accumulating events before "freezing" a larger batch and only then making
it available to computation.
-John
On Tue, Jul 2,
During my time building financial analytics and trading systems (23
years!), both the "batch processing" and "stream processing" paradigms have
been extensively used by myself and by colleagues.
Unfortunately, the tools used in these paradigms have not successfully
overlapped. For example, an ana
If there is going to be a breaking change to the IPC format, I'd appreciate
some discussion about an idea I had for RecordBatch metadata. I previously
promised to create a discussion thread with an initial write-up but have
not yet done so. I will try to do this tomorrow. (The basic idea is to
h
> > >
> > > Note here are the other places where we have such fields:
> > >
> > > * Field
> > > * Schema
> > > * Message
> > >
> > > An alternative solution would be to handle such metadata in a separate
> > > file
Original write of File:
  Schema: custom_metadata: {"value":1}
  Message
  Message
  Footer
    Schema: custom_metadata: {"value":1}
Process appends messages (new data marked with *):
  Schema: custom_metadata: {"value":1}
  Message
  Message
  *Message*
  *Footer*
    *Schema: custom_metadata: {"value":2}*
Re-writing t
John Muehlhausen created ARROW-5439:
---
Summary: [Java] Utilize stream EOS in File format
Key: ARROW-5439
URL: https://issues.apache.org/jira/browse/ARROW-5439
Project: Apache Arrow
Issue
John Muehlhausen created ARROW-5438:
---
Summary: [JS] Utilize stream EOS in File format
Key: ARROW-5438
URL: https://issues.apache.org/jira/browse/ARROW-5438
Project: Apache Arrow
Issue Type
ach, so maybe we can just sort out C++
> for now
>
> On Wed, May 22, 2019 at 3:03 PM John Muehlhausen wrote:
> >
> > I added this to https://github.com/apache/arrow/pull/4372 and am hoping
> CI
> > will test it for me. Do Java/JS require separate JIRA entries?
> &
ent across
> platforms
>
> On Wed, May 22, 2019 at 11:02 PM John Muehlhausen wrote:
> >
> > Well, it works fine on Linux... and the Linux mmap man page seems to
> > indicate you are right about MAP_PRIVATE:
> >
> > "It is unspecified whether changes ma
We have __eq__ leaning on as_py() already ... any reason not to have __lt__
?
This makes it possible to use bisect to find slices in ordered data without
a __getitem__ wrapper:
1176.0 key=pa.array(['AAPL'])
110.0 print(bisect.bisect_left(batch[3],key[0]))
64.0 print(bisect.bisec
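Without __lt__ on scalars, a small __getitem__ wrapper (the kind the message wants to avoid) already makes bisect work; a sketch:

    import bisect
    import pyarrow as pa

    class AsPyView:
        # Present a pyarrow Array to bisect as a plain Python sequence.
        def __init__(self, arr):
            self.arr = arr
        def __len__(self):
            return len(self.arr)
        def __getitem__(self, i):
            return self.arr[i].as_py()

    symbols = pa.array(['AAPL', 'AAPL', 'IBM', 'MSFT'])   # already sorted
    lo = bisect.bisect_left(AsPyView(symbols), 'AAPL')
    hi = bisect.bisect_right(AsPyView(symbols), 'AAPL')
    print(symbols.slice(lo, hi - lo))                     # zero-copy slice of the run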
ow/blob/master/ci/conda_env_cpp.yml#L31
>
> On Thu, May 23, 2019 at 12:53 PM John Muehlhausen wrote:
> >
> > The pyarrow-dev conda environment does not include llvm 7, which appears
> to
> > be a requirement for Gandiva.
> >
> > So I'm just trying to figure out a pa
hon.rst
>
> Let us know if that does not work.
>
> - Wes
>
> On Wed, May 22, 2019 at 11:02 AM John Muehlhausen wrote:
> >
> > Set up pyarrow-dev conda environment as at
> > https://arrow.apache.org/docs/developers/python.html
> >
> > Got the following
es it work as expected on MacOS. Still odd
that the changes are only sometimes visible ... but I guess that is
compatible with it being "unspecified."
-John
On Wed, May 22, 2019 at 8:56 PM John Muehlhausen wrote:
> I'll mess with this on various platforms and report back. Tha
>field1
> 0 1.0
> 1 NaN
>
> Now ran dd to overwrite the file contents
>
> In [14]: batch.to_pandas()
> Out[14]:
> field1
> 0 NaN
> 1 -245785081.0
>
> On Wed, May 22, 2019 at 8:34 PM John Muehlhausen wrote:
> >
> > I don
ithub.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L393
>
> Some more investigation would be required
>
> On Wed, May 22, 2019 at 7:43 PM John Muehlhausen wrote:
> >
> > Is there an example somewhere of referring to the RecordBatch data in a
> memory-mapped IPC
(new test attached)
On Wed, May 22, 2019 at 8:09 PM John Muehlhausen wrote:
> I don't think that is it. I changed my mmap to MAP_PRIVATE in the first
> raw mmap test and the dd changes are still visible. I also changed to
> storing the stream format instead of the file format an
Is there an example somewhere of referring to the RecordBatch data in a
memory-mapped IPC File in a zero-copy manner?
I tried to do this in Python and must be doing something wrong. (I don't
really care whether the example is Python or C++)
In the attached test, when I get to the first prompt an
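A minimal zero-copy read sketch in Python, assuming a recent pyarrow ("example.arrow" is a placeholder path):

    import pyarrow as pa

    source = pa.memory_map("example.arrow", "r")
    reader = pa.ipc.open_file(source)
    batch = reader.get_batch(0)   # column buffers reference the mapped region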
/vector/ipc/ArrowFileWriter.java#L67
>
> On Wed, May 22, 2019 at 12:24 PM John Muehlhausen wrote:
> >
> > https://github.com/apache/arrow/pull/4372
> >
> > First contribution attempt... sorry in advance if I'm not coloring inside
> > the lines!
> >
https://github.com/apache/arrow/pull/4372
First contribution attempt... sorry in advance if I'm not coloring inside
the lines!
On Wed, May 22, 2019 at 9:06 AM John Muehlhausen wrote:
> I will submit a patch once I get set up for that. My crystal ball says
> that some people w
John Muehlhausen created ARROW-5395:
---
Summary: Utilize stream EOS in File format
Key: ARROW-5395
URL: https://issues.apache.org/jira/browse/ARROW-5395
Project: Apache Arrow
Issue Type
Set up pyarrow-dev conda environment as at
https://arrow.apache.org/docs/developers/python.html
Got the following error. I will disable Gandiva for now but I'd like to
get it back at some point. I'm on Mac OS 10.13.6.
CMake Error at cmake_modules/FindLLVM.cmake:33 (find_package):
Could not fi
tation is not "wrong".
>
> On Wed, May 22, 2019 at 8:37 AM John Muehlhausen wrote:
> >
> > I believe the change involves updating the File format notes as above, as
> > well as something like the following. The format also mentions "there is
> >
ote:
> This seems like a reasonable change. Is there any reason that we shouldnt
> always append EOS?
>
> On Tuesday, May 21, 2019, John Muehlhausen wrote:
>
> > Wes,
> >
> > Check out reader.cpp. It seg faults when it gets to the next
> > message-that
where messages are popped off the InputStream here
>
>
> https://github.com/apache/arrow/blob/6f80ea4928f0d26ca175002f2e9f511962c8b012/cpp/src/arrow/ipc/message.cc#L281
>
> If the end of the byte stream is reached, or EOS (0) is encountered,
> then the stream reader stops iteration.
>
https://arrow.apache.org/docs/format/IPC.html#file-format
If this stream marker is optional in the file format, doesn't this prevent
someone from reading the file without being able to seek() it, e.g. if it
is "piped in" to a program? Or otherwise they'll have to stream in the
entire thing befo
; On Mon, May 13, 2019 at 8:36 AM Wes McKinney
> > wrote:
> > >
> > > > hi John -- I'd recommend implementing these capabilities as Kernel
> > > > functions under cpp/src/arrow/compute, then they can be exposed in
> > > > Python easily.
; number of interested parties and start designing a proposal (which may
> or may not include spec additions).
>
> Regards
>
> Antoine.
>
>
> Le 13/05/2019 à 15:38, John Muehlhausen a écrit :
> > Micah, yes, it all works at the moment. How have we staked out that it
Does pyarrow currently support filter/sort/search without conversion to
pandas? I don’t see anything but want to be sure. Sorry if I overlooked it.
Specific needs:
1- filter an arrow record batch and sort the results into a new batch
2- find slice locations for a sorted batch using binary search
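Both needs can be expressed with pyarrow.compute in current releases (these kernels postdate this 2019 message); a sketch of the first:

    import pyarrow as pa
    import pyarrow.compute as pc

    batch = pa.RecordBatch.from_arrays(
        [pa.array(["IBM", "AAPL", "AAPL"]), pa.array([3.0, 2.0, 1.0])],
        names=["sym", "px"])

    # 1- filter, then sort the surviving rows into a new batch
    mask = pc.equal(batch.column("sym"), "AAPL")
    filtered = batch.filter(mask)
    order = pc.sort_indices(filtered, sort_keys=[("px", "ascending")])
    result = filtered.take(order)

For the second need, bisect over a sorted column (as in the wrapper sketch earlier in this archive) still applies.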
favor of
> > making changes to the binary protocol for this use case; if others
> > have opinions I'll let them speak for themselves.
> >
> > - Wes
> >
> > On Mon, May 13, 2019 at 7:50 AM John Muehlhausen wrote:
> > >
> > > Any thoughts on
ocks so that readers know to call "Slice" on the blocks to obtain
> only the written-so-far portion. I'm not likely to be in favor of
> making changes to the binary protocol for this use case; if others
> have opinions I'll let them speak for themselves.
>
>
Any thoughts on a RecordBatch distinguishing size from capacity? (To borrow
std::vector terminology)
Thanks,
John
On Thu, May 9, 2019 at 2:46 PM John Muehlhausen wrote:
> Wes et al, I think my core proposal is that Message.fbs:RecordBatch split
> the "length" parameter into
e case of the file format, while the file is locked, a new
RecordBatch would overwrite the previous file Footer and a new Footer would
be written. In order to be able to delete or archive old data multiple
files could be strung together in a logical series.
-John
On Tue, May 7, 2019 at 2:39
f you'd like to experiment with creating an API for pre-allocating
> > fixed-size Arrow protocol blocks and then mutating the data and
> > metadata on disk in-place, please be our guest. We don't have the
> > tools developed yet to do this for you
> >
> > - Wes
I'm not
sure how to better make my case
-John
On Tue, May 7, 2019 at 11:02 AM Wes McKinney wrote:
> hi John,
>
> On Tue, May 7, 2019 at 10:53 AM John Muehlhausen wrote:
> >
> > Wes et al, I completed a preliminary study of populating a Feather file
> > incrementally.
t forking the project, IMHO that is a dark path
> that leads nowhere good. We have a large community here and we accept
> pull requests -- I think the challenge is going to be defining the use
> case to suitable clarity that a general purpose solution can be
> developed.
>
> - Wes
l/27945533db782361143586fd77ca08e15e96e2f2a5250ff084b462d6@%3Cdev.arrow.apache.org%3E
>
>
>
>
>
>
>
> On Mon, May 6, 2019 at 10:39 AM John Muehlhausen wrote:
> >
> > Wes,
> >
> > I’m not afraid of writing my own C++ code to deal with all of this on
s restarted or two separate processes active simultaneously) you'll
> > need to build up your own data structures to help with this.
> >
> > On Mon, May 6, 2019 at 6:28 PM John Muehlhausen wrote:
> >
> > > Hello,
> > >
> > > Glad to
t is the
> specific pattern you're trying to undertake for building.
>
> If you're trying to go across independent processes (whether the same
> process restarted or two separate processes active simultaneously) you'll
> need to build up your own data structures to hel
Hello,
Glad to learn of this project— good work!
If I allocate a single chunk of memory and start building Arrow format
within it, does this chunk save any state regarding my progress?
For example, suppose I allocate a column for floating point (fixed width)
and a column for string (variable wid