zhengruifeng commented on PR #53150: URL: https://github.com/apache/spark/pull/53150#issuecomment-3569616521
> Arrow isn't intended for long term storage it's intended as a wire protocol -- I don't love using it for persisting models. I'm -0.9 on this change for now. Parquet seems like a better choice most likely.

> does the arrow library provide APIs to write to local file?

@holdenk @cloud-fan Arrow supports Random Access Files, and it provides [APIs](https://arrow.apache.org/docs/python/ipc.html#writing-and-reading-random-access-files) to write to a local file. But our Arrow utils mainly work with serialized `ArrowRecordBatches` as `Array[Byte]`, so we would need to add new helper functions for `ArrowRecordBatches` if we want to use the Arrow file APIs.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
