Good morning, I am experiencing problems with the RecordBatches stored in plasma in a particular situation.
If I return a RecordBatch as the result of a Python function, I can read only its metadata, and I get an error when I read the columns. For example, consider the following code:

    import pyarrow as pa
    import pyarrow.plasma as plasma

    def retrieve1():
        client = plasma.connect('test', "", 0)
        key = "keynumber1keynumber1"
        pid = plasma.ObjectID(bytearray(key, 'UTF-8'))
        [buff] = client.get_buffers([pid])
        batch = pa.RecordBatchStreamReader(buff).read_next_batch()
        # Reading the batch here, inside the function, works
        print(batch)
        print(batch.schema)
        print(batch[0])
        return batch

    batch = retrieve1()
    # Reading the columns here, in the caller, crashes
    print(batch)
    print(batch.schema)
    print(batch[0])

This is a simple Python program in which a function retrieves the RecordBatch from the plasma store and returns it to the caller. Running it, I get:

    <pyarrow.lib.RecordBatch object at 0x7fd0ebfc0f48>
    FIELD1: int32
    metadata
    --------
    {}
    <pyarrow.lib.Int32Array object at 0x7fd0ebfc0f98>
    [
      1,
      12,
      23,
      3,
      21,
      34
    ]
    <pyarrow.lib.RecordBatch object at 0x7fd0ebfc0f48>
    FIELD1: int32
    metadata
    --------
    {}
    Segmentation fault (core dumped)

If I retrieve and use the data in the same part of the code, everything runs without problems. This is what happens inside retrieve1(), and it also works when I put everything in the main program.

The problem also seems to be specific to retrieving the RecordBatch from the plasma store, since the following (simpler) code:

    import pyarrow as pa

    def create():
        test1 = [1, 12, 23, 3, 21, 34]
        test1 = pa.array(test1, pa.int32())
        batch = pa.RecordBatch.from_arrays([test1], ['FIELD1'])
        print(batch)
        print(batch.schema)
        print(batch[0])
        return batch

    batch1 = create()
    print(batch1)
    print(batch1.schema)
    print(batch1[0])

prints:

    <pyarrow.lib.RecordBatch object at 0x7f5f7b7a9598>
    FIELD1: int32
    <pyarrow.lib.Int32Array object at 0x7f5f691fd9a8>
    [
      1,
      12,
      23,
      3,
      21,
      34
    ]
    <pyarrow.lib.RecordBatch object at 0x7f5f7b7a9598>
    FIELD1: int32
    <pyarrow.lib.Int32Array object at 0x7f5f7e29f318>
    [
      1,
      12,
      23,
      3,
      21,
      34
    ]

which is what I expect. Is this a known issue, or am I doing something wrong when retrieving the RecordBatch from plasma?
I would also like to point out that this problem was as easy to run into as it is hard to reproduce. There may therefore be other situations in which the same problem arises that I have not encountered, since I mostly deal with plasma and have used only Python so far. The description I gave is not intended to be complete.

Thank you,
Alberto