This is an automated email from the ASF dual-hosted git repository.
gangwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git
The following commit(s) were added to refs/heads/master by this push:
new dca2f42 PARQUET-2479: Update README with link to parquet website,
clarify contents (#243)
dca2f42 is described below
commit dca2f42d536cbd56cd5042d233c78b23e294bf05
Author: Andrew Lamb <[email protected]>
AuthorDate: Tue May 21 11:12:07 2024 -0400
PARQUET-2479: Update README with link to parquet website, clarify contents
(#243)
---
README.md | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/README.md b/README.md
index 18a7507..42578c7 100644
--- a/README.md
+++ b/README.md
@@ -19,12 +19,17 @@
# Parquet [](https://github.com/apache/parquet-format/actions)
-Parquet is a columnar storage format that supports nested data.
+This repository contains the specification for [Apache Parquet] and
+[Apache Thrift] definitions to read and write Parquet metadata.
-Parquet metadata is encoded using Apache Thrift.
+Apache Parquet is an open source, column-oriented data file format
+designed for efficient data storage and retrieval. It provides high
+performance compression and encoding schemes to handle complex data in
+bulk and is supported in many programming language and analytics
+tools.
-The `Parquet-format` project contains all Thrift definitions that are necessary to create readers
-and writers for Parquet files.
+[Apache Parquet]: https://parquet.apache.org
+[Apache Thrift]: https://thrift.apache.org
## Motivation
@@ -176,7 +181,7 @@ following rules:
* If the min is +0, the row group may contain -0 values as well.
* If the max is -0, the row group may contain +0 values as well.
* When looking for NaN values, min and max should be ignored.
-
+
* BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY - Lexicographic unsigned byte-wise
comparison.
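
Editorial illustration (not part of the commit): the BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY rule in the hunk above can be sketched in Python as follows. The function name `compare_unsigned_lex` is hypothetical, and the tie-break for a value that is a prefix of another (the shorter one sorts first) is assumed from the conventional meaning of lexicographic order rather than taken from the diff itself.

```python
def compare_unsigned_lex(a: bytes, b: bytes) -> int:
    """Return -1, 0, or 1 by comparing a and b byte by byte as unsigned values.

    Hypothetical helper for illustration only; not an API of any Parquet library.
    """
    for x, y in zip(a, b):
        # Iterating over `bytes` yields ints in 0..255, so the comparison is unsigned.
        if x != y:
            return -1 if x < y else 1
    # All shared positions are equal. Assumption: the shorter value sorts first,
    # as in conventional lexicographic ordering.
    return (len(a) > len(b)) - (len(a) < len(b))


# 0x80 sorts after 0x7f when bytes are unsigned; a signed byte comparison
# (for example on Java's `byte` type) would order them the other way round.
print(compare_unsigned_lex(b"\x7f", b"\x80"))
print(compare_unsigned_lex(b"ab", b"abc"))
```

Python's built-in ordering on `bytes` already behaves this way, so the explicit loop is only there to make the rule visible.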