Good morning,

I am experiencing problems with the RecordBatches stored in plasma in a 
particular situation.

If I return a RecordBatch as the result of a Python function, I can read 
only its metadata; reading the columns fails.

For example, the following code
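import pyarrow as pa
import pyarrow.plasma as plasma

def retrieve1():
    # Connect to the plasma store listening on the socket 'test'.
    client = plasma.connect('test', "", 0)

    # Plasma object IDs must be exactly 20 bytes long.
    key = "keynumber1keynumber1"
    pid = plasma.ObjectID(bytearray(key, 'UTF-8'))

    # Fetch the buffer and decode the first RecordBatch from the stream.
    [buff] = client.get_buffers([pid])
    batch = pa.RecordBatchStreamReader(buff).read_next_batch()
    return batch

batch = retrieve1()
print(batch)
print(batch.schema)
print(batch[0])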

is a simple Python script in which a function retrieves the RecordBatch from 
the plasma store and then returns it to the caller. Running it, I get:
<pyarrow.lib.RecordBatch object at 0x7fd0ebfc0f48>
FIELD1: int32
metadata
--------
{}
<pyarrow.lib.Int32Array object at 0x7fd0ebfc0f98>
[
  1,
  12,
  23,
  3,
  21,
  34
]
<pyarrow.lib.RecordBatch object at 0x7fd0ebfc0f48>
FIELD1: int32
metadata
--------
{}
Segmentation fault (core dumped)


If I retrieve and use the data in the same scope (either entirely inside the 
function retrieve1() or entirely in the main program), everything runs without 
problems.

The problem also seems to be specific to the case in which the RecordBatch is 
retrieved from the plasma store, since the following (simpler) code:
import pyarrow as pa

def create():
    # Build a single int32 column and wrap it in a RecordBatch.
    test1 = [1, 12, 23, 3, 21, 34]
    test1 = pa.array(test1, pa.int32())

    batch = pa.RecordBatch.from_arrays([test1], ['FIELD1'])
    print(batch)
    print(batch.schema)
    print(batch[0])
    return batch

batch1 = create()
print(batch1)
print(batch1.schema)
print(batch1[0])

prints:

<pyarrow.lib.RecordBatch object at 0x7f5f7b7a9598>
FIELD1: int32
<pyarrow.lib.Int32Array object at 0x7f5f691fd9a8>
[
  1,
  12,
  23,
  3,
  21,
  34
]
<pyarrow.lib.RecordBatch object at 0x7f5f7b7a9598>
FIELD1: int32
<pyarrow.lib.Int32Array object at 0x7f5f7e29f318>
[
  1,
  12,
  23,
  3,
  21,
  34
]

which is what I expect.

Is this issue known or am I doing something wrong when retrieving the 
RecordBatch from plasma?

I would also like to point out that this problem was as easy to run into as it 
was hard to reproduce. There may therefore be other situations in which the 
same problem arises that I have not experienced, since I mostly work with 
plasma and have used only Python so far; the description I gave is not 
intended to be complete.

Thank you,
Alberto
