Re: [C++] Deprecate Skyhook?

2025-05-06 Thread Aldrin
row's build system? > Short of that, maybe just moving the Skyhook sub-tree (and related > files outside of it) into its own repo would be a start, even if it > doesn't build and we just documented that fact. i.e., make it a > source-only archive. > > On Mon, May 5,

Re: [C++] Deprecate Skyhook?

2025-05-05 Thread Aldrin
're willing to rebase it). Is there a way to do this that doesn't essentially look like [1]? [1]: https://github.com/uccross/skyhookdm-arrow -Aldrin On Mon, May 5, 2025 at 10:57, Bryce Mecum wrote: > +1 for deprecating. I think it would be great if we could find a > voluntee

Re: [C++] Deprecate Skyhook?

2025-05-05 Thread Aldrin
I think deprecating is a good idea. I haven't had time to try and maintain it and I'm doubtful the original author is following any of the communications. If I get around to picking up [1], then I can see about "reviving" skyhook, but in that case the component will look very different anyways. [

Re: Kapa.ai bot now live on the dev docs

2025-03-28 Thread Aldrin
Ooh, yeah it's looking fairly effective. I asked some questions and I like that the answers address differences in language implementations (e.g. python bindings vs cpp) and that there are relatively good code suggestions. I assume this means the tests and benchmarks are helping the RAG a lot, so

Re: [DISCUSS] Arrow Flight Predicate Pushdown

2025-03-27 Thread Aldrin
d the already written framework, I essentially use custom logic everywhere else (my Tickets are protobuf messages and I don't do anything with descriptors, etc). - Aldrin Sent from Proton Mail for iOS On Thu, Mar 27, 2025 at 04:59, David Li <lidav...@apache.org> wrote: It's n

Re: Inquiry on Using RecordBatchStreamWriter/RecordBatchStreamReader for Network Transmission

2025-01-21 Thread Aldrin
Hello! > Can I depend on these interfaces to leverage Arrow format as binary exchange mechanism over HTTP? Yes. You can see [1] for a bit of discussion and some GitHub links. But, the short answer is that the stream writer and stream reader interfaces are convenient interfaces to data movement over

Re: Arrow Maintainer Dashboard

2024-08-07 Thread Aldrin
Hi Raúl, Clickable in what way? I can click on the legend and I can zoom in on the graphs, just wondering what interaction you're thinking of? -Aldrin Sent from Proton Mail for iOS On Wed, Aug 7, 2024 at 02:14, Alenka Frim <frim.ale...@gmail.com> wrote: Hi Raúl, Thank you for yo

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-16 Thread Aldrin
ch is most useful for very high-level users. [1]: https://arrow.apache.org/docs/cpp/io.html#filesystems Sent from Proton Mail for iOS On Tue, Jul 16, 2024 at 07:22, Antoine Pitrou <anto...@python.org> wrote: Hello Aldrin, It's not either/or, the directory marker is created everytime n

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-15 Thread Aldrin
em is optimized for that very thing and it could be mounted to memory instead of a block device). # ------ # Aldrin https://github.com/drin/ https://gitlab.com/octalene https://keybase.io/octalene On Monday, July 15th, 2024 at 10:20, Aldrin wrot

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-15 Thread Aldrin
bout the S3Filesystem implementation of Arrow) or was an old option that was changed in favor of creating the marker on deletion. [1]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html # -- # Aldrin https://github.com/drin/ https://gitlab

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Aldrin
> ...then I still expect the directory /foo to exist Right, but if that is the sole purpose of empty directory markers, I'm curious if there was an attempt at keeping track of the prefixes/directories locally? # -- # Aldrin https://github.com/drin
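As a rough illustration of the question in this thread (plain Python, not Arrow's S3FileSystem code): in an object store, directories can only be inferred from key prefixes, so a directory with no objects under it disappears unless a marker object or some client-side record keeps it alive. The names `implied_dirs` and `local_dirs` are hypothetical and exist only for this sketch.

```python
def implied_dirs(keys):
    """Return every directory implied by the '/'-separated prefixes of the keys."""
    dirs = set()
    for key in keys:
        parts = key.split("/")[:-1]  # drop the object name itself
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]))
    return dirs


# Object store contents: a single object under foo/.
keys = {"foo/data.parquet"}
# Hypothetical client-side record of directories that were created explicitly.
local_dirs = {"foo"}

keys.discard("foo/data.parquet")   # delete the only object under foo/
visible = implied_dirs(keys)       # no key implies foo/ any more
tracked = visible | local_dirs     # the local record still remembers it
```

A marker object such as the key `foo/` has the same effect as the local record, but it is visible to every client rather than only to the process that created the directory.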

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Aldrin
ssume it's for listing objects, but what else? [1]: https://github.com/apache/arrow/issues/36275 # -- # Aldrin https://github.com/drin/ https://gitlab.com/octalene https://keybase.io/octalene On Friday, July 12th, 2024 at 14:26, Raphael Taylor-Davie

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Aldrin
Hello! This may be naive, but why does the empty directory marker need to exist on the S3 side at all? If a local directory is created (because filesystem semantics), then I am not sure why a fake object needs to exist on the object-store side. # -- # Aldrin

Re: [DISCUSS] Approach to generic schema representation

2024-07-08 Thread Aldrin
Based on the response to using an empty IPC stream/file, it sounds to me like something substrait-like is ideal. Maybe an interface that can go between the equivalent of relational schemas and (generated) arrow code as you have shown. Then, there could be straightforward integration points with

Re: [DISCUSS] Statistics through the C data interface

2024-05-23 Thread Aldrin
For what it's worth, duckdb accesses arrow data via IPC in an extension then exports to C data interface to call into code in its core. Also, assumptions about when query optimization occurs relative to data access potentially breaks down in scenarios involving: views, distributed tables, substr

Re: [C++][Parquet] Support different compression algorithms per row group

2024-03-23 Thread Aldrin
ybe someone can chime in with more information and thoughts in the meantime. [1]: https://arxiv.org/pdf/2304.05028.pdf Sent from Proton Mail for iOS On Sat, Mar 23, 2024 at 05:23, Andrei Lazăr <lazarandrei...@gmail.com> wrote: Hi Aldrin, thanks for taking the time to reply to my email! In

Re: [C++][Parquet] Support different compression algorithms per row group

2024-03-22 Thread Aldrin
Hello! I don't do much with compression, so I could be wrong, but I assume a compression algorithm spans the whole column and areas of large variance generally benefit less from the compression, but the encoding still provides benefits across separate areas (e.g. separate row groups). My impress
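A small sketch of the idea under discussion, assuming nothing about Parquet's writer API: each "row group" here is just a byte string, and the standard-library codecs `zlib` and `bz2` stand in for Parquet's codecs. `pick_codec` is a hypothetical helper that tries each codec on a chunk and keeps the smallest result, which is one way per-row-group selection could work.

```python
import bz2
import random
import zlib

COMPRESS = {"zlib": zlib.compress, "bz2": bz2.compress}
DECOMPRESS = {"zlib": zlib.decompress, "bz2": bz2.decompress}


def pick_codec(chunk: bytes):
    """Return (codec_name, compressed_bytes) for the smallest compressed output."""
    results = {name: fn(chunk) for name, fn in COMPRESS.items()}
    best = min(results, key=lambda name: len(results[name]))
    return best, results[best]


random.seed(0)
low_variance = bytes(10_000)  # all zeros
high_variance = bytes(random.getrandbits(8) for _ in range(10_000))

row_groups = [low_variance, high_variance]
choices = [pick_codec(rg) for rg in row_groups]
sizes = [len(data) for _, data in choices]
```

The low-variance chunk shrinks to a tiny fraction of its size while the random chunk barely compresses at all, which matches the intuition that high-variance regions benefit less from compression.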

Re: [DISCUSS][C++] Help needed to refactor Skyhook

2024-03-15 Thread Aldrin
ok.cc#L153-L156 # ------ # Aldrin https://github.com/drin/ https://gitlab.com/octalene https://keybase.io/octalene On Thursday, March 14th, 2024 at 09:10, Jayjeet Chakraborty wrote: > Hi Ben, I am willing to help out with the refactor too ! > > On Wed, Mar 13, 2024 at

Re: [DISCUSS][C++] Help needed to refactor Skyhook

2024-03-13 Thread Aldrin
I am interested in helping to refactor! -Aldrin On Wed, Mar 13, 2024 at 08:54, Benjamin Kietzman <bengil...@gmail.com> wrote: Skyhook [1] enables efficient predicate and projection pushdown from Arrow Dataset to a Ceph storage cluster. This is very cool functionality, but it's tigh

Re: dev question - is it possible to store different types in a single array ?

2024-02-29 Thread Aldrin
Hello! For an Array of mixed types, you can use a DenseUnion [1] or SparseUnion type [2]. For modeling as rows instead of columns, the short answer is "no" but you could store the pivot/rotation of the table (columns represent rows) or you can use something like a StructArray [3]. The data in
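To make the dense union suggestion concrete, here is a plain-Python sketch of the layout (lists stand in for Arrow buffers; this is not the Arrow API). A dense union keeps one type id and one offset per slot, and each offset points into the child array for that type. The type-id assignment and the helper names are assumptions made for this example.

```python
def build_dense_union(values):
    """Encode a mixed list of int/str values as (type_ids, offsets, children)."""
    type_of = {int: 0, str: 1}   # hypothetical type-id assignment
    children = {0: [], 1: []}    # one child array per type id
    type_ids, offsets = [], []
    for value in values:
        tid = type_of[type(value)]
        type_ids.append(tid)
        offsets.append(len(children[tid]))  # position within that child
        children[tid].append(value)
    return type_ids, offsets, children


def get(union, i):
    """Look up slot i by following its type id and offset into the child."""
    type_ids, offsets, children = union
    return children[type_ids[i]][offsets[i]]


union = build_dense_union([1, "a", 2, "b", "c"])
```

A sparse union would drop the offsets and instead make every child as long as the union itself, trading memory for simpler indexing.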

Re: [DISCUSS] Move sqlparser-rs back into DataFusion project?

2024-02-27 Thread Aldrin
Maybe it would be valuable to more explicitly define "moving back into DataFusion project". I assumed it meant absorbing into the datafusion repo, but it occurs to me that may not be the case. Then, how would sqlparser-rs be "moved"? # ---

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

2024-02-27 Thread Aldrin
ed in providing feedback. I glanced at the document before but I'll go through again to see if there is anything I can comment on. # ------ # Aldrin https://github.com/drin/ https://gitlab.com/octalene https://keybase.io/octalene On Tuesday, February 27th,

Re: [DISCUSS] Move sqlparser-rs back into DataFusion project?

2024-02-17 Thread Aldrin

Re: Is there a way we can read a data frame from a cpp program in Apache fusion program in Rust?

2024-02-08 Thread Aldrin
://docs.rs/datafusion/latest/datafusion/execution/context/struct.SessionContext.html#method.read_csv [4]: https://arrow.apache.org/datafusion/library-user-guide/custom-table-providers.html # -- # Aldrin https://github.com/drin/ https://gitlab.com/octalene https://

Re: [VOTE] Flight SQL as experimental

2023-12-08 Thread Aldrin

Re: Is there anyway to resize record batches

2023-11-22 Thread Aldrin
cross implementations since ChunkedArray is not part of the specification, though I am optimistic that if you pass ChunkedArray to a different implementation then the C++ implementation could consolidate it as a single Array. # -- # Aldrin https://github.com
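The general idea behind this thread can be sketched in plain Python, with lists standing in for Arrow arrays (this is not the Arrow API, and `rebatch` is a hypothetical helper): chunks of uneven length are logically consolidated and re-sliced into batches of a target row count.

```python
def rebatch(chunks, batch_size):
    """Yield lists of exactly batch_size rows; only the last may be shorter."""
    pending = []
    for chunk in chunks:
        pending.extend(chunk)
        while len(pending) >= batch_size:
            yield pending[:batch_size]
            pending = pending[batch_size:]
    if pending:
        yield pending


chunks = [[1, 2, 3], [4], [5, 6, 7, 8, 9]]
batches = list(rebatch(chunks, 4))
```

This version copies rows into a pending buffer for simplicity; an implementation working on real columnar buffers would more likely slice without copying where it can.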

Re: Is there anyway to resize record batches

2023-11-22 Thread Aldrin
/api/table.html#_CPPv4N5arrow16TableBatchReaderE [8]: https://arrow.apache.org/docs/cpp/compute.html#selections # ------ # Aldrin https://github.com/drin/ https://gitlab.com/octalene https://keybase.io/octalene On Wednesday, November 22nd, 2023 at 10:58, Jacek Pliszka wrote: > Hi!

Re: [Format] C Data Interface integration testing

2023-10-19 Thread Aldrin
try the unsubscribe link at [1]. [1]: https://arrow.apache.org/community/ Sent from Proton Mail for iOS On Thu, Oct 19, 2023 at 23:41, Richard Haven wrote: UNSUBSCRIBEBAJARSEANFOSGRIFIADОТПИШИHLOKOMELA On Thu, Oct 19, 2023 at 9:56 AM Antoine Pitrou wrote: >> Hello again

Re: Apache Arrow file format

2023-10-19 Thread Aldrin
And the first paper's reference of arrow (in the references section) lists 2022 as the date of last access. Sent from Proton Mail for iOS On Thu, Oct 19, 2023 at 18:51, Aldrin <octalene@pm.me.INVALID> wrote: For context, that second referenced paper has Wes McKinney as a co

Re: Apache Arrow file format

2023-10-19 Thread Aldrin
For context, that second referenced paper has Wes McKinney as a co-author, so they were much better positioned to say "the right things." Sent from Proton Mail for iOS On Thu, Oct 19, 2023 at 18:38, Jin Shang wrote: Honestly I don't understand why this VLDB paper [1] ch

Re: [DISCUSS][C++] Raw pointer string views

2023-09-27 Thread Aldrin
tions convert any type to a raw pointer I assume that internal representations are not problematic. But, even so, perhaps those benchmarks can be reused to do the comparison (if that helps reduce the amount of work to be done for Ben). -Aldrin Sent from Proton Mail for iOS On Wed, Sep 27, 2023 at

Re: Need help on ArrayaSpan and writing C++ udf

2023-07-17 Thread Aldrin
Oh wait, I see now that you're incrementing with a uint8_t*. That could be fine for your own use, but you might want to make sure it aligns with the type of your output (Int64Array vs Int32Array). Sent from Proton Mail for iOS On Mon, Jul 17, 2023 at 06:20, Aldrin <octalene@pm.me
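The stride concern can be shown without any Arrow code. In the sketch below, Python's `struct` module stands in for raw pointer arithmetic: the same bytes are read once with an 8-byte stride (matching int64 values) and once with a 4-byte stride (as if they were int32). `read_at` is a hypothetical helper for this example.

```python
import struct

# Four int64 values packed little-endian: 32 bytes in total.
buf = struct.pack("<4q", 10, 20, 30, 40)


def read_at(buf, index, fmt):
    """Read element `index`, stepping by the byte width of `fmt`."""
    width = struct.calcsize(fmt)
    return struct.unpack_from(fmt, buf, index * width)[0]


right = [read_at(buf, i, "<q") for i in range(4)]  # 8-byte stride
wrong = [read_at(buf, i, "<i") for i in range(4)]  # 4-byte stride, same bytes
```

With the 4-byte stride, every other value read is the upper half of an int64, so the results no longer line up with the logical elements.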

Re: Need help on ArrayaSpan and writing C++ udf

2023-07-17 Thread Aldrin
Hi Wenbo, An ArraySpan is like an ArrayData but does not own the data, so the ColumnarFormat doc that Jon shared is relevant for both. In the case of a binary format, the output ArraySpan must have at least 2 buffers: the offsets and the contiguous binary data (values). If the output of your UDF i
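A plain-Python sketch of the two buffers described here, assuming no nulls (so the validity bitmap is left out) and using ordinary Python objects rather than Arrow buffers. `encode_binary` and `element` are hypothetical helpers: the offsets list has one more entry than there are elements, and element i is the slice of the values buffer between offsets i and i + 1.

```python
def encode_binary(items):
    """Pack a list of bytes objects into (offsets, values)."""
    offsets = [0]
    values = bytearray()
    for item in items:
        values += item
        offsets.append(len(values))  # end of this element, start of the next
    return offsets, bytes(values)


def element(offsets, values, i):
    """Return element i as the slice between two consecutive offsets."""
    return values[offsets[i]:offsets[i + 1]]


offsets, values = encode_binary([b"ab", b"", b"cde"])
```

An empty element shows up as two equal consecutive offsets, which is why the offsets buffer alone is enough to recover every element boundary.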

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-13 Thread Aldrin
columnar format without having to prove out the benefits for libraries that >use a different tech stack (e.g. rust vs C++ vs go). [1]: https://docs.google.com/presentation/d/1EiBgwtoYW6ADTxFc9iRs8KLPV0st0GZqmGy40Uz8jPk/edit?usp=sharing # -- # Aldrin https:/

Re: Apache Arrow | Graph Algorithms & Data Structures

2023-06-30 Thread Aldrin
an adjacency matrix or adjacency lists or if you're using a more normalized relational format. Thanks! # -- # Aldrin https://github.com/drin/ https://gitlab.com/octalene https://keybase.io/octalene

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-05-20 Thread Aldrin
I don't feel like this representation is necessarily a detail of the query engine, but I am also not sure why this representation would have to be converted to a non-view format when serializing. Could you clarify that? My impression is that this representation could be used for persistence or d

Re: [DISCUSS][C++] How to run arrow-dataset-dataset-writer-test

2023-04-07 Thread Aldrin
see if the script itself is working or if there's something in your configuration that's wrong. I can show more direct examples once I update my environment. Aldrin Montana Computer Science PhD Student UC Santa Cruz On Fri, Apr 7, 2023 at 7:34 AM Haocheng Liu wrote: > Hi, > > I&#

Re: Proposal: add a bot to close PRs that haven't been updated in 30 days

2023-03-31 Thread Aldrin
PR a draft PR? In general I agree with the general direction of the discussion otherwise. Aldrin Montana Computer Science PhD Student UC Santa Cruz On Fri, Mar 31, 2023 at 7:49 AM Will Jones wrote: > > Also good to know: contributors apparently can't re-open PRs if it was > >

Re: [ANNOUNCE] New Arrow PMC member: Will Jones

2023-03-13 Thread Aldrin
Congrats Will!! Aldrin Montana Computer Science PhD Student UC Santa Cruz On Mon, Mar 13, 2023 at 11:13 AM Dewey Dunnington wrote: > Congrats, Will! > > On Mon, Mar 13, 2023 at 3:07 PM Matt Topol wrote: > > > > Congrats Will! > > > > On Mon, Mar 13, 2023, 2

Re: [DISCUSS] Acero roadmap / philosophy

2023-03-09 Thread Aldrin
rait is seen as valuable (should be prioritized) or if additional support is going to be "as-needed". Note that I have a minimal understanding of how "large" substrait is and what proportion of it is already supported by Acero. Aldrin Montana Computer Science PhD Student UC Santa Cru

Re: Question about memory usage and type casting using pyarrow Table

2023-02-15 Thread Aldrin
]: https://arrow.apache.org/docs/python/generated/pyarrow.Field.html#pyarrow.Field.with_metadata Aldrin Montana Computer Science PhD Student UC Santa Cruz On Wed, Feb 15, 2023 at 2:52 PM Li Jin wrote: > Oh thanks that could be a workaround! I thought pa tables are supposed to > be immutabl

Re: [FLIGHT] Question about Flight Protocol Usage

2023-02-03 Thread Aldrin
point out, your main concern should probably be protocol compatibility. If you will have control of the client side of communications, then I think there are minimal concerns other than how you design what a Ticket or FlightInfo contains. Aldrin Montana Computer Science PhD Student UC Santa Cruz O

Re: [DISCUSS][C++] C++ API as a user-facing API

2022-09-29 Thread Aldrin
PIs, especially while Arrow is still growing. In addition, if I want to contribute to Arrow, I would also need to interact with the lower-level API at some point and I wouldn't necessarily want to start with trying to contribute code before using it in my own project(s). Aldrin Montana Compu

Re: [ANNOUNCE] New Arrow PMC member: Weston Pace

2022-09-06 Thread Aldrin
awesome, congrats! Aldrin Montana Computer Science PhD Student UC Santa Cruz On Tue, Sep 6, 2022 at 6:10 AM Joris Van den Bossche < jorisvandenboss...@gmail.com> wrote: > Congrats Weston! It is great to have you on the team! > > On Tue, 6 Sept 2022 at 06:10, Weston Pace wrote:

Re: Usage of the name Feather?

2022-08-31 Thread Aldrin
"IPC" is necessary, but it does push the intent into the name (unless it's actually a misnomer). Aldrin Montana Computer Science PhD Student UC Santa Cruz On Tue, Aug 30, 2022 at 8:29 PM Micah Kornfield wrote: > I think one source of ambiguity for Arrow files, at least for

Re: Using Acero in a distributed environment

2022-08-31 Thread Aldrin
e.com/presentation/d/1Nollf087CRhMmEAWcwfudIizIhF-ttPRGgaqmuXtSBQ/edit#slide=id.g12c2952ca0d_0_67 Aldrin Montana Computer Science PhD Student UC Santa Cruz On Wed, Aug 31, 2022 at 10:29 AM Jayjeet Chakraborty < jayjeetchakrabort...@gmail.com> wrote: > Thanks a lot for your reply, Nira

Re: [C++] Read Flight data source into Acero

2022-08-17 Thread Aldrin
I don't have any pointers, but just wanted to mention that I am going to try and figure this out quite a bit in the next week. I can try to create some relevant cookbook recipes as I plod along. Aldrin Montana Computer Science PhD Student UC Santa Cruz On Wed, Aug 17, 2022 at 9:15 AM L

Re: [C++] Disable anonymous namespaces in debug mode

2022-08-12 Thread Aldrin
ooh, that seems like a good idea to me. I'd be happy to follow that style. Aldrin Montana Computer Science PhD Student UC Santa Cruz On Wed, Aug 10, 2022 at 4:21 PM Sasha Krassovsky wrote: > Hi everyone, > I've recently had quite a few pain points while debugging due to the us

Re: [Rust] IPC Format / Feather support in Datafusion

2022-07-25 Thread Aldrin
oh, perfect. I'll just link the JIRAs. Thanks Kou! Aldrin Montana Computer Science PhD Student UC Santa Cruz On Mon, Jul 25, 2022 at 1:53 PM Sutou Kouhei wrote: > Hi, > > https://issues.apache.org/jira/browse/ARROW-17092 may be > related. > > Thanks, > -- > ko

Re: [Rust] IPC Format / Feather support in Datafusion

2022-07-25 Thread Aldrin
://arrow.apache.org/docs/format/Columnar.html#ipc-file-format [3]: https://arrow.apache.org/docs/cpp/ipc.html Aldrin Montana Computer Science PhD Student UC Santa Cruz On Fri, Jul 22, 2022 at 2:46 PM Will Jones wrote: > FYI It looks like there is active work to change the Python [1] and R

Re: [Rust] IPC Format / Feather support in Datafusion

2022-07-22 Thread Aldrin
sorry, I meant "...especially *for* the rust community if they are just using IPC directly for file formats." Aldrin Montana Computer Science PhD Student UC Santa Cruz On Fri, Jul 22, 2022 at 11:14 AM Aldrin wrote: > I always assumed IPC was when it was in memory, feather wa

Re: [Rust] IPC Format / Feather support in Datafusion

2022-07-22 Thread Aldrin
since V2. I'm not sure if a feather V3 would ever diverge from IPC format or if feather adds anything that's more filesystem friendly (versus other storage system interfaces) or makes filesystem performance more predictable. Aldrin Montana Computer Science PhD Student UC Santa Cruz On F

Re: arrow usage

2022-06-29 Thread Aldrin
table.html#_CPPv4N5arrow17ConcatenateTablesERKNSt6vectorINSt10shared_ptrI5Table24ConcatenateTablesOptionsP10MemoryPool Aldrin Montana Computer Science PhD Student UC Santa Cruz On Wed, Jun 29, 2022 at 9:53 AM L Ait wrote: > Hi, > > I would like to be added to the mailing list and would like it if there is > some dedicated forum to ask some questions. > > I would lik

Re: Arrow FunctionRegsitry usage in Python

2022-06-23 Thread Aldrin
done. [1]: https://arrow.apache.org/docs/cpp/compute.html#invoking-functions Aldrin Montana Computer Science PhD Student UC Santa Cruz On Wed, Jun 22, 2022 at 12:34 PM Murali S wrote: > Hi , > > I was wondering if it is possible to add a C++ Function to the Compute > Functi

Re: vectorized processing for arrow::take()

2022-06-23 Thread Aldrin
ector instructions? I think a little bit more context about what you know and what you're trying to do could also help others who know more about this function (and vectorization in Arrow in general) to chime in. Aldrin Montana Computer Science PhD Student UC Santa Cruz On Thu, Jun 23, 202

Re: [DISCUSS] "Naming" the Arrow C++ execution engine subproject?

2022-05-20 Thread Aldrin
ally "C++" can be inserted ("A C++ compute...") Aldrin Montana Computer Science PhD Student UC Santa Cruz On Thu, May 19, 2022 at 6:07 PM Will Jones wrote: > > > > A relatively obscure name at least makes it easy to search for. I guess > > we'll wa

Re: [DISCUSS] "Naming" the Arrow C++ execution engine subproject?

2022-05-09 Thread Aldrin
in that vein, I feel like you could also say that "ACE" has an "an" prefix to deflect the connotation of primacy: - An Arrow Compute Engine - An Arrow C++ Compute Engine Aldrin Montana Computer Science PhD Student UC Santa Cruz On Mon, May 9, 2022 at 2:12 PM Ian Cook wrot

Re: what is the default batch size of the RecordBatchReader?

2022-04-25 Thread Aldrin
code that verifies this, though. [1]: https://github.com/apache/arrow/blob/apache-arrow-7.0.0/cpp/src/parquet/properties.h#L556 [2]: https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset7Scanner7ToTableEv Aldrin Montana Computer Science PhD Student UC Santa Cruz On Mon, Apr 25, 2022 at

Re: storing per record batch metadata in arrow IPC file

2022-04-05 Thread Aldrin
E [2]: https://github.com/apache/arrow/blob/apache-arrow-7.0.0/cpp/src/arrow/ipc/writer.cc#L644 [3]: https://github.com/apache/arrow/blob/apache-arrow-7.0.0/cpp/src/arrow/ipc/writer.cc#L665 [4]: https://github.com/apache/arrow/blob/apache-arrow-7.0.0/cpp/src/arrow/ipc/writer.cc#L1253 Aldrin Montana Co

Re: Recompiling pyarrow package without static libraries

2022-02-14 Thread Aldrin
Thanks for the response! I'll try that out. It didn't occur to me that archlinux might be building the static libraries yet not installing them (and/or removing them). I'll check a few things and report back here what works. Aldrin Montana Computer Science PhD Student UC Santa Cru

Recompiling pyarrow package without static libraries

2022-02-11 Thread Aldrin
\ -DARROW_PYTHON=ON \ -DARROW_SIMD_LEVEL=AVX2\ -DARROW_USE_GLOG=ON\ -DARROW_WITH_BROTLI=ON \ -DPARQUET_REQUIRE_ENCRYPTION=ON make -C build Thank you for any help you can offer! Aldrin Montana Computer Science PhD Student UC Santa Cruz

Re: Jira Access

2021-12-22 Thread Aldrin
I think you just sign up: https://issues.apache.org/jira/secure/Dashboard.jspa Aldrin Montana Computer Science PhD Student UC Santa Cruz On Wed, Dec 22, 2021 at 9:08 PM Dulvin Witharane wrote: > Hi, > > I would love to have access to JIRA. Please enroll me or let me know the >

Re: [Parquet][C++][Python] Maximum Row Group Length Default

2021-11-22 Thread Aldrin
nding a lot of time parsing metadata and > much less time actually reading data. Thanks! > -- Aldrin Montana Computer Science PhD Student UC Santa Cruz

Re: [DISCUSS] Deprecate user@ in favor for github issues/discussions

2021-10-05 Thread Aldrin
> > How about trying GitHub issues and/or discussion in a > specified period without deprecating user@? e.g. between > 6.0.0 release and 7.0.0 release. Oooh, I like this idea. Aldrin Montana Computer Science PhD Student UC Santa Cruz On Mon, Oct 4, 2021 at 7:11 PM Sutou Kouhei

Re: [DISCUSS] Deprecate user@ in favor for github issues/discussions

2021-09-29 Thread Aldrin
n. To some degree, though, the ease of searching should mitigate this if people are properly cross-referencing as appropriate. But, I'm not entirely sure what this would be problematic for. Aldrin Montana Computer Science PhD Student UC Santa Cruz On Wed, Sep 29, 2021 at 11:16 AM Micah K

Re: Flight SQL

2021-08-19 Thread Aldrin
re, or distribution is prohibited. If you are not the > intended recipient, please contact the sender by reply email and destroy > all copies of the original message. Thank you. > -- Aldrin Montana Computer Science PhD Student UC Santa Cruz

Re: [ANNOUNCE] New Arrow PMC member: David M Li

2021-06-22 Thread Aldrin
Congrats David! Thanks for the contributions to documentation, it's pretty awesome. :) Aldrin Montana Computer Science PhD Student UC Santa Cruz On Tue, Jun 22, 2021 at 10:55 AM Daniël Heres wrote: > Congrats to you! > > On Tue, Jun 22, 2021, 19:42 Eduardo Ponce wrote: &g

Re: Long title on github page

2021-05-17 Thread Aldrin
art of the interface for efficiency - Arrow certainly has a data format, but that format is the crux of the interface (IMO). However, it also makes using other formats easy (via filesystem API and parquet reader/writers, etc.). So, focusing on the data format seems unnecessary in such

Re: New style in documentation on the website looks great

2021-05-05 Thread Aldrin
I very much enjoy the new theme Aldrin Montana Computer Science PhD Student UC Santa Cruz On Tue, May 4, 2021 at 11:47 PM Joris Van den Bossche < jorisvandenboss...@gmail.com> wrote: > Thanks, I am happy that people like it! > It's a slightly customized version of the py

Re: [Rust][DataFusion] Query Engine Design / DataFusion Implementation talk

2021-03-12 Thread Aldrin
This is great, thanks! Aldrin Montana Computer Science PhD Student UC Santa Cruz On Fri, Mar 12, 2021 at 11:39 AM Andrew Lamb wrote: > Here are links to the content, should anyone be interested: > > Query Engine Design and the Rust-Based DataFusion in Apache Arrow > reco

Re: Question about joining two tables

2021-03-11 Thread Aldrin
Great, thanks for the responses! That all makes sense :) On Thu, Mar 11, 2021 at 1:29 PM Benjamin Kietzman wrote: > Hi Aldrin, > > We don't have a unified repository for design docs that I'm aware of. > Governance-wise only JIRA and the mailing lists are canonical, bu

Re: Question about joining two tables

2021-03-11 Thread Aldrin
gure out how to navigate to a google drive or a page enumerating the various documents. Thank you! Aldrin Montana Computer Science PhD Student UC Santa Cruz On Thu, Mar 11, 2021 at 10:07 AM Benjamin Kietzman wrote: > Hi, > > This is not yet implemented but it is on the roadmap for th

Documenting the dataset/compute/expression APIs

2021-02-12 Thread Aldrin
t" OR description ~ "expression") Specifically, I'm interested in C++ rather than python (though, I suppose pyarrow documentation can help with the C++ documentation?). I wanted to ping here in case anyone has materials to gather, and also in case anyone knows of materials I've missed. Thanks! Aldrin Montana Computer Science PhD Student UC Santa Cruz

Re: Computational Kernels: the project overview

2021-01-29 Thread Aldrin
rt in (or completed) consolidating or expanding documentation on the compute and dataset/expression APIs and how they interact, etc.? Thanks! Aldrin Montana Computer Science PhD Student UC Santa Cruz On Mon, Nov 30, 2020 at 7:40 AM Wes McKinney wrote: > One objective of the precompiled kernels

[jira] [Created] (ARROW-2683) Resource Warning (Unclosed File) when using pyarrow.parquet.read_table()

2018-06-08 Thread Aldrin (JIRA)
Aldrin created ARROW-2683: - Summary: Resource Warning (Unclosed File) when using pyarrow.parquet.read_table() Key: ARROW-2683 URL: https://issues.apache.org/jira/browse/ARROW-2683 Project: Apache Arrow