gary-cloud opened a new issue, #733:
URL: https://github.com/apache/incubator-graphar/issues/733

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ## Summary
   
   When reading edges from `graphar::EdgesCollection` / `EdgeIter` in **segmented jump** or **batch scan** mode, edge properties such as `creationDate` become **misaligned**: they no longer match the values that `high_level_example` prints for the same edges. In some cases, `EdgeIter` also attempts to open non-existent chunk files, leading to `runtime_error` / `abort` (e.g., `Failed to open ... part10/chunk0`) or AddressSanitizer errors.
   
   In contrast, the sequential one-pass iteration used in `high_level_example` (`auto edge = *it`) usually appears correct. The bug is triggered by repeated segmented traversal or by direct calls to `it.property()`.
   
   ---
   
   ## Steps to Reproduce (Minimal Example)
   
   Using GraphAr sample data (`ldbc_sample`), the issue can be reproduced with 
the following pseudocode:
   
   ```cpp
   auto edges = graphar::EdgesCollection::Make(
       ..., graphar::AdjListType::ordered_by_source).value();
   auto begin = edges->begin();
   auto end = edges->end();
   
   size_t count = 0;
   for (auto it = begin; it != end; ++it) {
       if (count > 2000) continue;
       count++;
       std::cout << it.source() << "," << it.destination()
                 << "," << it.property<std::string>("creationDate").value()
                 << std::endl;
   }
   
   // Register a new EdgeIter and perform segmented traversal
   auto begin2 = edges->begin();
   int i = 0;
   for (auto it = begin2; it != end; ++it, i++) {
     if (i <= 2000) continue;
     if (i > 4000) break;
   
     count++;
     std::cout << "src=" << it.source() << ", dst=" << it.destination()
               << ", creationDate="
               << it.property<std::string>("creationDate").value()
               << std::endl;
   }
   ```
   
   On the second or later segmented traversal, `creationDate` values become 
misaligned. When crossing vertex-chunk boundaries, the program may attempt to 
open a non-existent file and crash with `IOError: failed to open local file 
...`.
   
   ---
   
   ## Expected Behavior
   
   * Properties read via **any traversal mode** (`auto edge = *it`, 
`it.property(...)`, segmented/jump iteration, batch/parallel scans) should 
always match the results of a **single sequential traversal**.
   * Iteration should never attempt to open non-existent chunk files, nor crash.
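
The first expectation can be phrased as a small consistency check. The sketch below is not part of GraphAr: `ReadSegment` and `FirstMismatch` are hypothetical helpers, written generically so they work with any forward iterator that supports `!=` and `++`. Each segment is read from a fresh copy of `begin`, skipping earlier positions with `++`, which mirrors the reproduction above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper: starting from a fresh copy of `begin`, skip to position
// `first` with ++ and collect read(it) for positions [first, last).
template <typename Iter, typename ReadFn>
std::vector<std::string> ReadSegment(Iter begin, Iter end, std::size_t first,
                                     std::size_t last, ReadFn read) {
  assert(first <= last);
  std::vector<std::string> out;
  std::size_t i = 0;
  for (Iter it = begin; it != end && i < last; ++it, ++i) {
    if (i >= first) out.push_back(read(it));
  }
  return out;
}

// Hypothetical helper: compare one full sequential pass against a series of
// segmented passes of length `segment`. Returns the first position whose value
// differs (or is missing), or -1 if every segment matches.
template <typename Iter, typename ReadFn>
std::ptrdiff_t FirstMismatch(Iter begin, Iter end, std::size_t segment,
                             ReadFn read) {
  assert(segment > 0);
  const auto expected = ReadSegment(
      begin, end, 0, std::numeric_limits<std::size_t>::max(), read);
  for (std::size_t first = 0; first < expected.size(); first += segment) {
    const std::size_t last = std::min(first + segment, expected.size());
    const auto got = ReadSegment(begin, end, first, last, read);
    for (std::size_t j = 0; j < last - first; ++j) {
      if (j >= got.size() || got[j] != expected[first + j]) {
        return static_cast<std::ptrdiff_t>(first + j);
      }
    }
  }
  return -1;
}
```

For the case in this report, the intent would be to pass `edges->begin()`, `edges->end()`, a segment length such as 2000, and a lambda that returns `it.property<std::string>("creationDate").value()`. A result other than -1 would give the first edge position at which segmented traversal diverges from the sequential pass. Note that the checks use `assert`, so they are disabled when compiling with `NDEBUG`.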
   
   ---
   
   ## Actual Behavior
   
   * Property values become misaligned starting from a certain position.
   * Crossing chunk/vertex-chunk boundaries may trigger attempts to open 
invalid file paths, causing crashes.
   
   ---
   
   ## Root Cause (Preliminary Analysis)
   
   `EdgeIter` maintains two state layers:
   
   * `vertex_chunk_index_` (current vertex-chunk)
   * `cur_offset_` (offset within the vertex/edge chunk)
   
   Each `AdjListPropertyArrowChunkReader` separately tracks its own 
`chunk_index_`, `seek_offset_`, and cached `chunk_table_`. The synchronization 
across these components is inconsistent:
   
   * **`operator++()`**: When crossing chunk/vertex-chunk boundaries, it does not reliably update all `property_readers_`. If `next_chunk()` fails (e.g., with `IndexError`), stale or unsafe reader state may persist.
   * **`operator*()`**: Only seeks the `adj_list_reader_` to `cur_offset_`, 
without ensuring that `property_readers_` are aligned with the iterator state. 
This can cause `Edge` construction to read stale or wrong chunks.
   * **`property()`**: Does not enforce reader alignment on each call, so 
calling `it.property()` directly may diverge in behavior from 
`(*it).property()`.
   
   This inconsistency leads to property misalignment, stale chunk access, and 
attempts to open non-existent files.
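
To illustrate the kind of desynchronization described above, here is a deliberately simplified toy model. It does not use GraphAr code: `ToyPropertyReader` and `ToyEdgeIter` are hypothetical stand-ins, and the model assumes every chunk is non-empty. The iterator advances its own position without touching the reader's cursor, so an unsynchronized property read returns a stale value, while re-seeking the reader to the iterator position before each read returns the expected one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a chunked property reader that keeps its own
// cursor (chunk index + offset), independent of any iterator.
class ToyPropertyReader {
 public:
  explicit ToyPropertyReader(std::vector<std::vector<std::string>> chunks)
      : chunks_(std::move(chunks)) {
    assert(!chunks_.empty() && !chunks_[0].empty());
  }
  void Seek(std::size_t chunk, std::size_t offset) {
    assert(chunk < chunks_.size() && offset < chunks_[chunk].size());
    chunk_index_ = chunk;
    offset_ = offset;
  }
  const std::string& Read() const { return chunks_[chunk_index_][offset_]; }
  std::size_t chunk_count() const { return chunks_.size(); }
  std::size_t chunk_size(std::size_t c) const { return chunks_[c].size(); }

 private:
  std::vector<std::vector<std::string>> chunks_;
  std::size_t chunk_index_ = 0;
  std::size_t offset_ = 0;
};

// Hypothetical stand-in for an edge iterator whose position is tracked
// separately from the reader's cursor.
class ToyEdgeIter {
 public:
  explicit ToyEdgeIter(ToyPropertyReader reader) : reader_(std::move(reader)) {}

  // Advances the iterator position only; the reader cursor is left untouched,
  // which models an increment that fails to update the property readers.
  bool Next() {
    if (offset_ + 1 < reader_.chunk_size(chunk_)) { ++offset_; return true; }
    if (chunk_ + 1 < reader_.chunk_count()) { ++chunk_; offset_ = 0; return true; }
    return false;
  }

  // Reads wherever the reader cursor happens to be: may be stale.
  const std::string& PropertyUnsynced() const { return reader_.Read(); }

  // Re-aligns the reader with the iterator position before every read.
  const std::string& PropertySynced() {
    reader_.Seek(chunk_, offset_);
    return reader_.Read();
  }

 private:
  ToyPropertyReader reader_;
  std::size_t chunk_ = 0;
  std::size_t offset_ = 0;
};
```

With chunks `{{"a", "b"}, {"c"}}` and two calls to `Next()`, `PropertyUnsynced()` still yields `"a"` while `PropertySynced()` yields `"c"`. Whether enforcing alignment on every read is the right fix for `EdgeIter` is an open design question, since it trades some repeated seeking for consistency. The sketch only shows why the two access paths can diverge.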
   
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

