gary-cloud opened a new issue, #733:
URL: https://github.com/apache/incubator-graphar/issues/733
### Describe the bug, including details regarding any error messages, version, and platform.
## Summary
When reading edges from `graphar::EdgesCollection` / `EdgeIter` in
**segmented jump** or **batch scan** mode, edge properties such as
`creationDate` become **misaligned** (out of sync with the order seen in
`high_level_example`). In some cases, `EdgeIter` attempts to open non-existent
chunk files, leading to `runtime_error` / `abort` (e.g., `Failed to open ...
part10/chunk0`) or AddressSanitizer errors.
In contrast, the `high_level_example` sequential one-pass iteration (`auto
edge = *it`) usually appears correct, but repeated segmented traversal or
direct calls to `it.property()` trigger the bug.
---
## Steps to Reproduce (Minimal Example)
Using GraphAr sample data (`ldbc_sample`), the issue can be reproduced with
the following pseudocode:
```cpp
auto edges = graphar::EdgesCollection::Make(...,
    graphar::AdjListType::ordered_by_source).value();
auto begin = edges->begin();
auto end = edges->end();

// First pass: print the leading edges, then keep advancing to the end
// without printing (continue still runs ++it).
size_t count = 0;
for (auto it = begin; it != end; ++it) {
  if (count > 2000) continue;
  count++;
  std::cout << it.source() << "," << it.destination()
            << "," << it.property<std::string>("creationDate").value()
            << std::endl;
}

// Register a new EdgeIter and perform segmented traversal:
// skip positions 0..2000, print positions 2001..4000, then stop.
auto begin2 = edges->begin();
int i = 0;
for (auto it = begin2; it != end; ++it, i++) {
  if (i <= 2000) continue;
  if (i > 4000) break;
  count++;
  std::cout << "src=" << it.source() << ", dst=" << it.destination()
            << ", creationDate="
            << it.property<std::string>("creationDate").value()
            << std::endl;
}
```
On the second or later segmented traversal, `creationDate` values become
misaligned. When crossing vertex-chunk boundaries, the program may attempt to
open a non-existent file and crash with `IOError: failed to open local file
...`.
---
## Expected Behavior
* Properties read via **any traversal mode** (`auto edge = *it`,
`it.property(...)`, segmented/jump iteration, batch/parallel scans) should
always match the results of a **single sequential traversal**.
* Iteration should never attempt to open non-existent chunk files, nor crash.
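
The first expectation can be phrased as a check: rows collected by a segmented pass should equal the same positions of a sequential baseline. The sketch below is only an illustration of that comparison. `EdgeRow` and `FirstMismatch` are hypothetical helpers, not part of GraphAr, and the rows are assumed to have been collected from the two loops in the reproduction.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record of what a traversal observed for one edge.
struct EdgeRow {
  long src;
  long dst;
  std::string creation_date;

  bool operator==(const EdgeRow& other) const {
    return src == other.src && dst == other.dst &&
           creation_date == other.creation_date;
  }
};

// Compares `segment` against baseline[offset, offset + segment.size()).
// Returns the absolute position of the first differing row, or
// std::nullopt when the segment matches the baseline everywhere.
// A segment that runs past the end of the baseline counts as a mismatch.
std::optional<std::size_t> FirstMismatch(const std::vector<EdgeRow>& baseline,
                                         std::size_t offset,
                                         const std::vector<EdgeRow>& segment) {
  for (std::size_t i = 0; i < segment.size(); ++i) {
    const std::size_t pos = offset + i;
    if (pos >= baseline.size() || !(baseline[pos] == segment[i])) {
      return pos;
    }
  }
  return std::nullopt;
}
```

With the reproduction above, the baseline would hold one row per edge from a single sequential pass, and the segment would hold the rows printed for positions 2001..4000, compared at `offset = 2001`.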
---
## Actual Behavior
* Property values become misaligned starting from a certain position.
* Crossing chunk/vertex-chunk boundaries may trigger attempts to open
invalid file paths, causing crashes.
---
## Root Cause (Preliminary Analysis)
`EdgeIter` maintains two state layers:
* `vertex_chunk_index_` (current vertex-chunk)
* `cur_offset_` (offset within the vertex/edge chunk).
Each `AdjListPropertyArrowChunkReader` separately tracks its own
`chunk_index_`, `seek_offset_`, and cached `chunk_table_`. The synchronization
across these components is inconsistent:
* **`operator++()`**: When crossing chunk/vertex-chunk boundaries, it does
not always reliably update all `property_readers_`. In cases where
`next_chunk()` fails (e.g., `IndexError`), stale or unsafe reader states may
persist.
* **`operator*()`**: Only seeks the `adj_list_reader_` to `cur_offset_`,
without ensuring that `property_readers_` are aligned with the iterator state.
This can cause `Edge` construction to read stale or wrong chunks.
* **`property()`**: Does not enforce reader alignment on each call, so
calling `it.property()` directly may diverge in behavior from
`(*it).property()`.
This inconsistency leads to property misalignment, stale chunk access, and
attempts to open non-existent files.
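
To make the suspected failure mode concrete, here is a toy model of the pattern described above. It is a sketch under stated assumptions, not GraphAr code: `ToyPropertyReader` and `ToyEdgeIter` are hypothetical classes, and chunks are in-memory vectors rather than files. The iterator advances its own position without touching the reader, so an unsynchronized read returns a stale value, while a read that re-seeks the reader from the iterator state stays aligned and reports a missing chunk instead of accessing it.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a chunked property reader. It keeps a private cursor,
// analogous to the chunk_index_ / seek_offset_ pair described above.
class ToyPropertyReader {
 public:
  explicit ToyPropertyReader(std::vector<std::vector<std::string>> chunks)
      : chunks_(std::move(chunks)) {}

  // Moves the cursor to (chunk, offset). Returns false, leaving the cursor
  // unchanged, if that position does not exist.
  bool Seek(std::size_t chunk, std::size_t offset) {
    if (chunk >= chunks_.size() || offset >= chunks_[chunk].size()) {
      return false;
    }
    chunk_index_ = chunk;
    seek_offset_ = offset;
    return true;
  }

  // Precondition: the reader holds at least one non-empty chunk.
  const std::string& Current() const {
    return chunks_[chunk_index_][seek_offset_];
  }

 private:
  std::vector<std::vector<std::string>> chunks_;
  std::size_t chunk_index_ = 0;
  std::size_t seek_offset_ = 0;
};

// Toy iterator with its own position state, independent of the reader.
class ToyEdgeIter {
 public:
  ToyEdgeIter(ToyPropertyReader* reader, std::size_t chunk_size)
      : reader_(reader), chunk_size_(chunk_size) {}

  // Advances the iterator only; the reader cursor is deliberately left
  // alone to model the missing synchronization.
  void Advance() {
    if (++cur_offset_ == chunk_size_) {
      cur_offset_ = 0;
      ++chunk_index_;
    }
  }

  // Unsynchronized read: trusts whatever position the reader last held.
  const std::string& StaleProperty() const { return reader_->Current(); }

  // Synchronized read: re-seeks the reader from the iterator state first,
  // and yields std::nullopt when the position does not exist.
  std::optional<std::string> Property() {
    if (!reader_->Seek(chunk_index_, cur_offset_)) return std::nullopt;
    return reader_->Current();
  }

 private:
  ToyPropertyReader* reader_;
  std::size_t chunk_size_;
  std::size_t chunk_index_ = 0;
  std::size_t cur_offset_ = 0;
};
```

In this model, aligning the reader on every access is the simplest way to keep `Property()` consistent regardless of how the iterator got to its position. Whether the real fix should sync on access or on every `operator++()` is a design decision for the maintainers.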
### Component(s)
C++
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]