NineSue commented on code in PR #782: URL: https://github.com/apache/incubator-graphar/pull/782#discussion_r2474380847
##########
maven-projects/spark/snb-graphar-bridge/README.md:
##########
@@ -0,0 +1,182 @@
+# LDBC SNB to GraphAr Bridge
+
+This module provides a direct memory pipeline for converting LDBC Social Network Benchmark (SNB) data to Apache GraphAr format.
+
+## Design
+
+**Dual-Track Architecture**:
+- **Static entities** (Person, Place, Organisation, Tag, TagClass): RDD-based batch processing from LDBC dictionaries
+- **Dynamic entities** (Forum, Post, Comment): Streaming architecture processing LDBC activity serializer output
+
+**Key Features**:
+- Direct memory pipeline eliminates intermediate CSV file I/O
+- Supports all 31 LDBC SNB entity types (8 vertices + 23 edges)
+- Fully compliant with GraphAr v1.0 standard
+- Batch processing with configurable chunk sizes
+
+## Building
+
+### Prerequisites
+
+- Java 8 or Java 11
+- Maven 3.6+
+- SBT 1.x (for LDBC dependency)
+- Apache Spark 3.5.1
+
+### Build LDBC SNB Datagen Dependency
+
+```bash
+cd ../ldbc-snb-datagen
+sbt assembly
+```
+
+### Build This Module
+
+**IMPORTANT**: Build from project root, not from this directory.
+
+```bash
+cd /path/to/incubator-graphar/maven-projects
+
+# Build with dependencies
+mvn clean install -DskipTests \
+  -pl spark/graphar,spark/datasources-35,spark/snb-graphar-bridge -am
+```
+
+## Usage
+
+### Quick Start
+
+```bash
+spark-submit \
+  --class org.apache.graphar.datasources.ldbc.examples.LdbcEnhancedBridgeExample \
+  --master "local[*]" \
+  --jars ../ldbc-snb-datagen/target/ldbc_snb_datagen_2.12_spark3.2-0.5.1+23-1d60a657-jar-with-dependencies.jar \
+  target/snb-graphar-bridge-0.13.0-SNAPSHOT.jar \
+  0.1 /tmp/graphar_output ldbc_test 256 256 parquet
+```
+
+### Parameters
+
+| Parameter | Description | Example |
+|-----------|-------------|---------|
+| scaleFactor | LDBC scale factor | `0.1` |
+| outputPath | Output directory | `/tmp/graphar_output` |
+| graphName | Graph identifier | `ldbc_test` |
+| vertexChunkSize | Vertex chunk size | `256` |
+| edgeChunkSize | Edge chunk size | `256` |
+| fileType | File format | `parquet` |
+
+**Scale Factor Guidelines**: Use SF≥0.1 for full testing (SF0.003 has limited dynamic entities).
+
+### Output Structure
+
+```
+/tmp/graphar_output/
+├── ldbc_test.graph.yml

Review Comment:
Yes, the `xxx.vertex.yml` and `xxx.edge.yml` files are generated. It's just that the example I provided in the README is somewhat misleading.
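
To see which metadata files a run actually wrote, the output directory can be listed directly. The following is only a minimal sketch: it assumes the bridge has already been run against the README's example output path `/tmp/graphar_output`, the helper name `list_graphar_metadata` is hypothetical and not part of the PR, and the search is recursive because this thread does not state where the YAML files are placed.

```bash
# Hypothetical helper (not part of the PR): print every GraphAr metadata
# file found under an output directory, sorted by path.
# $1 is the output directory; it defaults to the README's example path.
list_graphar_metadata() {
  find "${1:-/tmp/graphar_output}" -type f \
    \( -name '*.graph.yml' -o -name '*.vertex.yml' -o -name '*.edge.yml' \) \
    | sort
}

# Usage (after running the Quick Start command):
#   list_graphar_metadata /tmp/graphar_output
```

If only the `*.graph.yml` file shows up in the listing, the vertex and edge metadata were not written for that run; otherwise each vertex and edge type should appear as its own `.vertex.yml` or `.edge.yml` entry.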
