SemyonSinchenko commented on code in PR #755:
URL: https://github.com/apache/incubator-graphar/pull/755#discussion_r2343000872


##########
docs/libraries/java/info/getting-started.md:
##########
@@ -0,0 +1,189 @@
+---
+id: getting-started
+title: Getting Started with Info Module
+sidebar_position: 1
+---
+
+# Getting Started with Info Module
+
+This article is a quick guide that explains how to work with GraphAr Java Info 
module. The Info module is part of the pure Java implementation of GraphAr and 
provides capabilities for parsing graph metadata (schema) from YAML files.
+
+## metadata Files
+
+GraphAr uses a group of Yaml files to save the metadata for a graph.
+
+### GraphInfo

Review Comment:
   I think it would be much better to make it part of 
https://graphar.apache.org/docs/specification/format, what do you think?



##########
docs/libraries/java/info/getting-started.md:
##########
@@ -0,0 +1,189 @@
+---
+id: getting-started
+title: Getting Started with Info Module
+sidebar_position: 1
+---
+
+# Getting Started with Info Module
+
+This article is a quick guide that explains how to work with GraphAr Java Info 
module. The Info module is part of the pure Java implementation of GraphAr and 
provides capabilities for parsing graph metadata (schema) from YAML files.
+
+## metadata Files
+
+GraphAr uses a group of Yaml files to save the metadata for a graph.
+
+### GraphInfo
+
+The graphInfo file defines the most basic metadata of a graph including its 
name, the root directory path of the data files, the vertex metadata and edge 
metadata files it contains, and the version of GraphAr. For example, the file 
"ldbc_sample.graph.yml" defines an example graph named "ldbc_sample", which 
includes one type of vertices ("person") and one type of edges ("person knows 
person").
+
+```yaml
+name: ldbc_sample
+prefix: ./
+vertices:
+  - person.vertex.yml
+edges:
+  - person_knows_person.edge.yml
+version: gar/v1
+```
+
+### VertexInfo
+
+Each vertexInfo file defines the metadata of a type of vertex, e.g., "person" 
in this case. The vertex chunk size, the path for vertex data files and the 
version of GraphAr are specified. These vertices could have some properties, 
which are divided into property groups. Each property group has its own file 
type (CSV, ORC or Parquet) and the prefix of the path for its data files, it 
also lists all properties in this group, with every property contains the name, 
data type and if it is the primary key.
+
+An example of the vertex metadata file is shown as follows:
+
+```yaml
+type: person
+chunk_size: 100
+prefix: vertex/person/
+property_groups:
+  - properties:
+      - name: id
+        data_type: int64
+        is_primary: true
+    file_type: parquet
+  - properties:
+      - name: firstName
+        data_type: string
+        is_primary: false
+      - name: lastName
+        data_type: string
+        is_primary: false
+      - name: gender
+        data_type: string
+        is_primary: false
+    file_type: parquet
+version: gar/v1
+```
+
+### EdgeInfo
+
+Each edge metadata file defines a single type of edges with specific types for 
the source vertex, destination vertex and the edge, e.g., "person_knows_person" 
in this case. It defines the metadata such as the edge chunk size, the source 
vertex chunk size, the destination vertex chunk size, if the edges are directed 
or not, the relative file path for edge data files, the adjLists and the 
version of GraphAr.
+
+An example of the edge metadata file is shown as follows:
+
+```yaml
+src_type: person
+edge_type: knows
+dst_type: person
+chunk_size: 1024
+src_chunk_size: 100
+dst_chunk_size: 100
+directed: false
+prefix: edge/person_knows_person/
+adj_lists:
+  - ordered: false
+    aligned_by: src
+    file_type: parquet
+  - ordered: true
+    aligned_by: src
+    file_type: parquet
+  - ordered: true
+    aligned_by: dst
+    file_type: parquet
+property_groups:
+  - file_type: parquet
+    properties:
+      - name: creationDate
+        data_type: string
+        is_primary: false
+version: gar/v1
+```
+
+In GraphAr format, separate data files are used to store the structure (called 
adjList) and the properties for edges. The adjList type can be either of 
**unordered_by_source**, **unordered_by_dest**, **ordered_by_source** or 
**ordered_by_dest**. For a specific type of adjList, the metadata includes its 
file path prefix, the file type, as well as all the property groups attached.
+
+## How to Use GraphAr Java Info Module
+
+### Load graph Info
+
+#### Load GraphInfo from local file system
+
+Here's a simple example of how to use the graphar-info module:
+
+```java
+import org.apache.graphar.info.GraphInfo;
+import org.apache.graphar.info.loader.GraphInfoLoader;
+import 
org.apache.graphar.info.loader.impl.LocalFileSystemStreamGraphInfoLoader;
+import java.nio.file.Paths;
+
+// Load graph info from a YAML file
+GraphInfoLoader loader = new LocalFileSystemStreamGraphInfoLoader();
+GraphInfo graphInfo = 
loader.loadGraphInfo(Paths.get("path/to/graph.yml").toUri());
+
+// Access graph metadata
+String graphName = graphInfo.getName();
+List<VertexInfo> vertices = graphInfo.getVertexInfos();
+List<EdgeInfo> edges = graphInfo.getEdgeInfos();
+
+// Access vertex metadata
+VertexInfo personVertex = vertices.get(0);
+String vertexType = personVertex.getType();
+long chunkSize = personVertex.getChunkSize();
+
+// Access edge metadata
+EdgeInfo knowsEdge = edges.get(0);
+String edgeType = knowsEdge.getEdgeType();
+boolean isDirected = knowsEdge.isDirected();
+```
+
+#### Custom YAML Loader Implementation
+
+The java-info module requires users to implement their own YAML reading 
interface, because we typically face data lakes where data may be stored 
anywhere (local file or HDFS or S3 or OSS...). Users can implement one or more 
of StringGraphInfoLoader, ReaderGraphInfoLoader, or StreamGraphInfoLoader (a 
local file system implementation is already provided).

Review Comment:
   ```suggestion
   The java-info module requires users to implement their own YAML reading 
interface, because we typically face data lakes where data may be stored 
anywhere (local file or HDFS or S3 or OSS...). Users can implement one or more 
of `StringGraphInfoLoader`, `ReaderGraphInfoLoader`, or `StreamGraphInfoLoader` 
(a local file system implementation is already provided).
   ```



##########
docs/libraries/java/info/getting-started.md:
##########
@@ -0,0 +1,189 @@
+---
+id: getting-started
+title: Getting Started with Info Module
+sidebar_position: 1
+---
+
+# Getting Started with Info Module
+
+This article is a quick guide that explains how to work with GraphAr Java Info 
module. The Info module is part of the pure Java implementation of GraphAr and 
provides capabilities for parsing graph metadata (schema) from YAML files.
+
+## metadata Files
+
+GraphAr uses a group of Yaml files to save the metadata for a graph.
+
+### GraphInfo
+
+The graphInfo file defines the most basic metadata of a graph including its 
name, the root directory path of the data files, the vertex metadata and edge 
metadata files it contains, and the version of GraphAr. For example, the file 
"ldbc_sample.graph.yml" defines an example graph named "ldbc_sample", which 
includes one type of vertices ("person") and one type of edges ("person knows 
person").
+
+```yaml
+name: ldbc_sample
+prefix: ./
+vertices:
+  - person.vertex.yml
+edges:
+  - person_knows_person.edge.yml
+version: gar/v1
+```
+
+### VertexInfo
+
+Each vertexInfo file defines the metadata of a type of vertex, e.g., "person" 
in this case. The vertex chunk size, the path for vertex data files and the 
version of GraphAr are specified. These vertices could have some properties, 
which are divided into property groups. Each property group has its own file 
type (CSV, ORC or Parquet) and the prefix of the path for its data files, it 
also lists all properties in this group, with every property contains the name, 
data type and if it is the primary key.
+
+An example of the vertex metadata file is shown as follows:
+
+```yaml
+type: person
+chunk_size: 100
+prefix: vertex/person/
+property_groups:
+  - properties:
+      - name: id
+        data_type: int64
+        is_primary: true
+    file_type: parquet
+  - properties:
+      - name: firstName
+        data_type: string
+        is_primary: false
+      - name: lastName
+        data_type: string
+        is_primary: false
+      - name: gender
+        data_type: string
+        is_primary: false
+    file_type: parquet
+version: gar/v1
+```
+
+### EdgeInfo
+
+Each edge metadata file defines a single type of edges with specific types for 
the source vertex, destination vertex and the edge, e.g., "person_knows_person" 
in this case. It defines the metadata such as the edge chunk size, the source 
vertex chunk size, the destination vertex chunk size, if the edges are directed 
or not, the relative file path for edge data files, the adjLists and the 
version of GraphAr.
+
+An example of the edge metadata file is shown as follows:
+
+```yaml
+src_type: person
+edge_type: knows
+dst_type: person
+chunk_size: 1024
+src_chunk_size: 100
+dst_chunk_size: 100
+directed: false
+prefix: edge/person_knows_person/
+adj_lists:
+  - ordered: false
+    aligned_by: src
+    file_type: parquet
+  - ordered: true
+    aligned_by: src
+    file_type: parquet
+  - ordered: true
+    aligned_by: dst
+    file_type: parquet
+property_groups:
+  - file_type: parquet
+    properties:
+      - name: creationDate
+        data_type: string
+        is_primary: false
+version: gar/v1
+```
+
+In GraphAr format, separate data files are used to store the structure (called 
adjList) and the properties for edges. The adjList type can be either of 
**unordered_by_source**, **unordered_by_dest**, **ordered_by_source** or 
**ordered_by_dest**. For a specific type of adjList, the metadata includes its 
file path prefix, the file type, as well as all the property groups attached.
+
+## How to Use GraphAr Java Info Module
+
+### Load graph Info
+
+#### Load GraphInfo from local file system
+
+Here's a simple example of how to use the graphar-info module:
+
+```java
+import org.apache.graphar.info.GraphInfo;
+import org.apache.graphar.info.loader.GraphInfoLoader;
+import 
org.apache.graphar.info.loader.impl.LocalFileSystemStreamGraphInfoLoader;
+import java.nio.file.Paths;
+
+// Load graph info from a YAML file
+GraphInfoLoader loader = new LocalFileSystemStreamGraphInfoLoader();
+GraphInfo graphInfo = 
loader.loadGraphInfo(Paths.get("path/to/graph.yml").toUri());
+
+// Access graph metadata
+String graphName = graphInfo.getName();
+List<VertexInfo> vertices = graphInfo.getVertexInfos();
+List<EdgeInfo> edges = graphInfo.getEdgeInfos();
+
+// Access vertex metadata
+VertexInfo personVertex = vertices.get(0);
+String vertexType = personVertex.getType();
+long chunkSize = personVertex.getChunkSize();
+
+// Access edge metadata
+EdgeInfo knowsEdge = edges.get(0);
+String edgeType = knowsEdge.getEdgeType();
+boolean isDirected = knowsEdge.isDirected();
+```
+
+#### Custom YAML Loader Implementation
+
+The java-info module requires users to implement their own YAML reading 
interface, because we typically face data lakes where data may be stored 
anywhere (local file or HDFS or S3 or OSS...). Users can implement one or more 
of StringGraphInfoLoader, ReaderGraphInfoLoader, or StreamGraphInfoLoader (a 
local file system implementation is already provided).
+
+Here's an example of how to implement a custom YAML loader by extending the 
StringGraphInfoLoader abstract class:

Review Comment:
   Offtop: at the moment we do not have Java API references on the website, 
when we have it would be very nice to have a link to the javadocs.



##########
docs/libraries/java/info/getting-started.md:
##########
@@ -0,0 +1,189 @@
+---
+id: getting-started
+title: Getting Started with Info Module
+sidebar_position: 1
+---
+
+# Getting Started with Info Module
+
+This article is a quick guide that explains how to work with GraphAr Java Info 
module. The Info module is part of the pure Java implementation of GraphAr and 
provides capabilities for parsing graph metadata (schema) from YAML files.
+
+## metadata Files
+
+GraphAr uses a group of Yaml files to save the metadata for a graph.

Review Comment:
   What do you think about adding a link to the corresponding part of docs?



##########
maven-projects/info/README.md:
##########
@@ -1,5 +1,112 @@
 # GraphAr Info
 
-Module contains a lightweight set of interfaces and utils for building a 
GraphAr read/write tools on top.
+The **graphar-info** module is part of the pure Java implementation of GraphAr.
+
+## Key Features
+
+The **java-info** module offers a lightweight yet powerful solution for 
**GraphAr metadata management**. It is responsible for:

Review Comment:
   ```suggestion
   The **java-info** module offers a lightweight, zero-dependencies yet 
powerful solution for **GraphAr metadata management**. It is responsible for:
   ```



##########
docs/libraries/java/info/getting-started.md:
##########
@@ -0,0 +1,189 @@
+---
+id: getting-started
+title: Getting Started with Info Module
+sidebar_position: 1
+---
+
+# Getting Started with Info Module
+
+This article is a quick guide that explains how to work with GraphAr Java Info 
module. The Info module is part of the pure Java implementation of GraphAr and 
provides capabilities for parsing graph metadata (schema) from YAML files.
+
+## metadata Files
+
+GraphAr uses a group of Yaml files to save the metadata for a graph.
+
+### GraphInfo
+
+The graphInfo file defines the most basic metadata of a graph including its 
name, the root directory path of the data files, the vertex metadata and edge 
metadata files it contains, and the version of GraphAr. For example, the file 
"ldbc_sample.graph.yml" defines an example graph named "ldbc_sample", which 
includes one type of vertices ("person") and one type of edges ("person knows 
person").
+
+```yaml
+name: ldbc_sample
+prefix: ./
+vertices:
+  - person.vertex.yml
+edges:
+  - person_knows_person.edge.yml
+version: gar/v1
+```
+
+### VertexInfo
+
+Each vertexInfo file defines the metadata of a type of vertex, e.g., "person" 
in this case. The vertex chunk size, the path for vertex data files and the 
version of GraphAr are specified. These vertices could have some properties, 
which are divided into property groups. Each property group has its own file 
type (CSV, ORC or Parquet) and the prefix of the path for its data files, it 
also lists all properties in this group, with every property contains the name, 
data type and if it is the primary key.
+
+An example of the vertex metadata file is shown as follows:
+
+```yaml
+type: person
+chunk_size: 100
+prefix: vertex/person/
+property_groups:
+  - properties:
+      - name: id
+        data_type: int64
+        is_primary: true
+    file_type: parquet
+  - properties:
+      - name: firstName
+        data_type: string
+        is_primary: false
+      - name: lastName
+        data_type: string
+        is_primary: false
+      - name: gender
+        data_type: string
+        is_primary: false
+    file_type: parquet
+version: gar/v1
+```
+
+### EdgeInfo
+
+Each edge metadata file defines a single type of edges with specific types for 
the source vertex, destination vertex and the edge, e.g., "person_knows_person" 
in this case. It defines the metadata such as the edge chunk size, the source 
vertex chunk size, the destination vertex chunk size, if the edges are directed 
or not, the relative file path for edge data files, the adjLists and the 
version of GraphAr.
+
+An example of the edge metadata file is shown as follows:
+
+```yaml
+src_type: person
+edge_type: knows
+dst_type: person
+chunk_size: 1024
+src_chunk_size: 100
+dst_chunk_size: 100
+directed: false
+prefix: edge/person_knows_person/
+adj_lists:
+  - ordered: false
+    aligned_by: src
+    file_type: parquet
+  - ordered: true
+    aligned_by: src
+    file_type: parquet
+  - ordered: true
+    aligned_by: dst
+    file_type: parquet
+property_groups:
+  - file_type: parquet
+    properties:
+      - name: creationDate
+        data_type: string
+        is_primary: false
+version: gar/v1
+```
+
+In GraphAr format, separate data files are used to store the structure (called 
adjList) and the properties for edges. The adjList type can be either of 
**unordered_by_source**, **unordered_by_dest**, **ordered_by_source** or 
**ordered_by_dest**. For a specific type of adjList, the metadata includes its 
file path prefix, the file type, as well as all the property groups attached.
+
+## How to Use GraphAr Java Info Module
+
+### Load graph Info
+
+#### Load GraphInfo from local file system
+
+Here's a simple example of how to use the graphar-info module:
+
+```java
+import org.apache.graphar.info.GraphInfo;
+import org.apache.graphar.info.loader.GraphInfoLoader;
+import 
org.apache.graphar.info.loader.impl.LocalFileSystemStreamGraphInfoLoader;
+import java.nio.file.Paths;
+
+// Load graph info from a YAML file
+GraphInfoLoader loader = new LocalFileSystemStreamGraphInfoLoader();
+GraphInfo graphInfo = 
loader.loadGraphInfo(Paths.get("path/to/graph.yml").toUri());
+
+// Access graph metadata
+String graphName = graphInfo.getName();
+List<VertexInfo> vertices = graphInfo.getVertexInfos();
+List<EdgeInfo> edges = graphInfo.getEdgeInfos();
+
+// Access vertex metadata
+VertexInfo personVertex = vertices.get(0);
+String vertexType = personVertex.getType();
+long chunkSize = personVertex.getChunkSize();
+
+// Access edge metadata
+EdgeInfo knowsEdge = edges.get(0);
+String edgeType = knowsEdge.getEdgeType();
+boolean isDirected = knowsEdge.isDirected();
+```
+
+#### Custom YAML Loader Implementation
+
+The java-info module requires users to implement their own YAML reading 
interface, because we typically face data lakes where data may be stored 
anywhere (local file or HDFS or S3 or OSS...). Users can implement one or more 
of StringGraphInfoLoader, ReaderGraphInfoLoader, or StreamGraphInfoLoader (a 
local file system implementation is already provided).
+
+Here's an example of how to implement a custom YAML loader by extending the 
StringGraphInfoLoader abstract class:

Review Comment:
   ```suggestion
   Here's an example of how to implement a custom YAML loader by extending the 
`StringGraphInfoLoader` abstract class:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to