O 25/03/24 ás 19:17, Julian Gilbey escribiu:
Hi all,
Hi :)
[NB: sent to d-science, d-python, d-devel and the RFP bug; reply-to set to d-science and the RFP bug only] An update on Apache Arrow, and in particular the Python library PyArrow. For those who don't know: Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. The project is developing a multi-language collection of libraries for solving systems problems related to in-memory analytical data processing. This includes such topics as: * Zero-copy shared memory and RPC-based data movement * Reading and writing file formats (like CSV, Apache ORC, and Apache Parquet) * In-memory analytics and query processing (from: https://arrow.apache.org/docs/index.html) Pandas has announced that Pandas 3.x will depend on PyArrow in a critical way (it will back the "string" datatype), and it is due to be released imminently. So this is a plea for anyone looking for something really helpful to do: it would be great to have a group of developers finally package this! There was some initial work done (see the RFP bug report for details: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970021), but that is fairly old now. As Apache Arrow supports numerous languages, it may well benefit from having a group of developers with different areas of expertise to build it. (Or perhaps it would make more sense to split the upstream source into a collection of different Debian source packages for the different supported languages. I don't know.) Unfortunately I don't have the capacity to devote any time to it myself. Thanks in advance for anyone who can step forward for this! Best wishes, Julian
If possible, I would like to contribute. At work we use the Go and Python implementations, also, in the short term, we will start using the Rust one.
Just to point out, the Rust version has its own native implementation, here: https://github.com/apache/arrow-rs .
Cheers, Jose -- José Manuel Abuín Mosquera PhD. | Scientific Software Developer | Researcher http://jmabuin.github.io