Re: Plasma will be removed in Arrow 12.0.0

2023-03-29 Thread Will Jones
Thanks for the feedback on the benchmark. By switching from Unix domain socket to TCP and reducing the batch size to under 5MB I was able to get nearly 5Gbps throughput. I think Unix domain sockets are just slower on Macs. Updated that repo [1] [1] https://github.com/wjones127/arrow-ipc-bench/tree

Re: Plasma will be removed in Arrow 12.0.0

2023-03-17 Thread Antoine Pitrou
On 17/03/2023 at 16:34, Alessandro Molina wrote: How does PyArrow cope with multiprocessing.Manager? I'm not sure anyone has tried it. Also, I don't think multiprocessing.Manager was updated to use pickle v5 out-of-band buffers (which would help reduce copying), so I wouldn't expect very high
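
For readers unfamiliar with the pickle protocol 5 feature mentioned above, here is a minimal sketch using only the standard library. It is an illustration, not something taken from the thread: the `message` dict and the 1 MiB `big` buffer are hypothetical stand-ins for a large column of data, and nothing here involves multiprocessing.Manager or PyArrow directly.

```python
import pickle

# Hypothetical stand-in for a large data buffer (1 MiB of zeros).
big = bytearray(1024 * 1024)
message = {"name": "col0", "data": pickle.PickleBuffer(big)}

# In-band: the buffer contents are copied into the pickle stream itself.
in_band = pickle.dumps(message, protocol=5)

# Out-of-band: the stream holds only a placeholder for the buffer;
# the buffer views are collected separately through buffer_callback.
buffers = []
out_of_band = pickle.dumps(message, protocol=5, buffer_callback=buffers.append)

# The receiving side must be given the same buffers, in the same order.
restored = pickle.loads(out_of_band, buffers=buffers)

print(len(in_band), len(out_of_band), len(buffers))
```

The out-of-band stream stays small because the payload travels separately, which is where the reduced copying would come from if a transport supported it.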

Re: Plasma will be removed in Arrow 12.0.0

2023-03-17 Thread Alessandro Molina
How does PyArrow cope with multiprocessing.Manager? I remember there were some inefficiencies when Pickle was used (mostly related to slicing) but that in theory it should work. That is probably an easy enough replacement for Plasma and is standard. On Wed, Mar 15, 2023 at 10:24 PM Will Jones wro

Re: Plasma will be removed in Arrow 12.0.0

2023-03-16 Thread David Li
I'd suggest explicitly chunking the table into batches of maybe ~2 MiB (it appears the table is one contiguous chunk and I believe it'll just try to send that entire table as one chunk). IIRC the Flight benchmark over localhost should be up to a couple GiB/s. (That said, that doesn't match up to
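
One way the chunking suggestion could be sketched in Python is shown below. The helper names `rows_per_batch` and `chunk_table` are hypothetical, and the row count is only an estimate based on the average row size; `Table.nbytes`, `Table.num_rows` and `Table.to_batches(max_chunksize=...)` are the PyArrow calls assumed here. This is not the code used in the benchmark.

```python
# ~2 MiB, the approximate batch size suggested in the message above.
TARGET_BATCH_BYTES = 2 * 1024 * 1024


def rows_per_batch(total_bytes, num_rows, target_bytes=TARGET_BATCH_BYTES):
    """Estimate how many rows fit in one batch of roughly target_bytes."""
    if num_rows <= 0 or total_bytes <= 0:
        return 1
    bytes_per_row = total_bytes / num_rows  # average size, not exact per row
    return max(1, int(target_bytes // bytes_per_row))


def chunk_table(table, target_bytes=TARGET_BATCH_BYTES):
    """Split a pyarrow.Table into record batches of roughly target_bytes.

    Assumes `table` exposes nbytes, num_rows and to_batches(max_chunksize=...),
    as pyarrow.Table does.
    """
    max_rows = rows_per_batch(table.nbytes, table.num_rows, target_bytes)
    return table.to_batches(max_chunksize=max_rows)
```

The resulting batches could then be written to the stream one at a time rather than sending the whole table as a single chunk.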

Re: Plasma will be removed in Arrow 12.0.0

2023-03-16 Thread Antoine Pitrou
0.5 GB/second for local Flight transfer seems unexpectedly slow (one could expect 10x more), but perhaps the tuning of default parameters needs to be improved. David Li can probably elaborate on that. I'll add that Unix sockets might not be the fastest anymore these days. It may be worth testi

Plasma will be removed in Arrow 12.0.0

2023-03-15 Thread Will Jones
Hello all, First, a reminder that Plasma has been deprecated and will be removed in the 12.0.0 release of the C++, Python, and Java Arrow libraries. [1] I know some used Plasma as a convenient way to share Arrow data between Python processes, so I pulled together a quick performance comparison ag