NineSue commented on code in PR #782:
URL: https://github.com/apache/incubator-graphar/pull/782#discussion_r2474380847


##########
maven-projects/spark/snb-graphar-bridge/README.md:
##########
@@ -0,0 +1,182 @@
+# LDBC SNB to GraphAr Bridge
+
+This module provides a direct memory pipeline for converting LDBC Social Network Benchmark (SNB) data to Apache GraphAr format.
+
+## Design
+
+**Dual-Track Architecture**:
+- **Static entities** (Person, Place, Organisation, Tag, TagClass): RDD-based batch processing from LDBC dictionaries
+- **Dynamic entities** (Forum, Post, Comment): Streaming architecture processing LDBC activity serializer output
+
+**Key Features**:
+- Direct memory pipeline eliminates intermediate CSV file I/O
+- Supports all 31 LDBC SNB entity types (8 vertices + 23 edges)
+- Fully compliant with GraphAr v1.0 standard
+- Batch processing with configurable chunk sizes
+
+## Building
+
+### Prerequisites
+
+- Java 8 or Java 11
+- Maven 3.6+
+- SBT 1.x (for LDBC dependency)
+- Apache Spark 3.5.1
+
+### Build LDBC SNB Datagen Dependency
+
+```bash
+cd ../ldbc-snb-datagen
+sbt assembly
+```
+
+### Build This Module
+
+**IMPORTANT**: Build from project root, not from this directory.
+
+```bash
+cd /path/to/incubator-graphar/maven-projects
+
+# Build with dependencies
+mvn clean install -DskipTests \
+  -pl spark/graphar,spark/datasources-35,spark/snb-graphar-bridge -am
+```
+
+## Usage
+
+### Quick Start
+
+```bash
+spark-submit \
+  --class org.apache.graphar.datasources.ldbc.examples.LdbcEnhancedBridgeExample \
+  --master "local[*]" \
+  --jars ../ldbc-snb-datagen/target/ldbc_snb_datagen_2.12_spark3.2-0.5.1+23-1d60a657-jar-with-dependencies.jar \
+  target/snb-graphar-bridge-0.13.0-SNAPSHOT.jar \
+  0.1 /tmp/graphar_output ldbc_test 256 256 parquet
+```
+
+### Parameters
+
+| Parameter | Description | Example |
+|-----------|-------------|---------|
+| scaleFactor | LDBC scale factor | `0.1` |
+| outputPath | Output directory | `/tmp/graphar_output` |
+| graphName | Graph identifier | `ldbc_test` |
+| vertexChunkSize | Vertex chunk size | `256` |
+| edgeChunkSize | Edge chunk size | `256` |
+| fileType | File format | `parquet` |
+
+**Scale Factor Guidelines**: Use SF≥0.1 for full testing (SF0.003 has limited dynamic entities).
+
+### Output Structure
+
+```
+/tmp/graphar_output/
+├── ldbc_test.graph.yml

Review Comment:
   **Reply**
   
   Yes, the `xxx.vertex.yml` and `xxx.edge.yml` files are generated. It's just that the example I provided in the README is somewhat misleading.
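
   To confirm this locally, something like the sketch below could be used. It assumes the Quick Start output path `/tmp/graphar_output` and searches recursively, since the exact location of the metadata files inside the output directory is not stated here. The helper name `list_graphar_yaml` is hypothetical and not part of the module.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of snb-graphar-bridge): list GraphAr metadata
# YAML files found anywhere under an output directory.
list_graphar_yaml() {
  # Default to the Quick Start output path when no argument is given.
  local out_dir="${1:-/tmp/graphar_output}"
  # Match graph-, vertex-, and edge-level metadata files by suffix only;
  # the directory layout is an assumption, so the search is recursive.
  find "$out_dir" -type f \
    \( -name '*.graph.yml' -o -name '*.vertex.yml' -o -name '*.edge.yml' \) \
    | sort
}
```

   Running `list_graphar_yaml /tmp/graphar_output` after the Quick Start command should then print the `.graph.yml` file along with any `.vertex.yml` and `.edge.yml` files that were written.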



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

