This is an automated email from the ASF dual-hosted git repository.
zivanfi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-mr.git
The following commit(s) were added to refs/heads/master by this push:
new aed9097 PARQUET-1311: Update README.md (#487)
aed9097 is described below
commit aed9097640c7adffe1151b32e86b5efc3702c657
Author: nandorKollar <[email protected]>
AuthorDate: Mon Jun 4 17:35:47 2018 +0200
PARQUET-1311: Update README.md (#487)
parquet-mr documentation was not up to date:
- pointed to broken URLs
- instructed to install old Thrift version
- current version was stated as 1.8.1, although 1.10.0 is already released
---
README.md | 86 ++++++++++++++++++++++++++++-------------------------------
dev/README.md | 4 +--
2 files changed, 43 insertions(+), 47 deletions(-)
diff --git a/README.md b/README.md
index f084f50..4b6b96a 100644
--- a/README.md
+++ b/README.md
@@ -20,9 +20,9 @@
Parquet MR [](http://travis-ci.org/apache/parquet-mr)
======
-Parquet-MR contains the java implementation of the [Parquet format](https://github.com/apache/parquet-format).
+Parquet-MR contains the java implementation of the [Parquet format](https://github.com/apache/parquet-format).
Parquet is a columnar storage format for Hadoop; it provides efficient storage and encoding of data.
-Parquet uses the [record shredding and assembly algorithm](https://github.com/Parquet/parquet-mr/wiki/The-striping-and-assembly-algorithms-from-the-Dremel-paper) described in the Dremel paper to represent nested structures.
+Parquet uses the [record shredding and assembly algorithm](https://github.com/julienledem/redelm/wiki/The-striping-and-assembly-algorithms-from-the-Dremel-paper) described in the Dremel paper to represent nested structures.
You can find some details about the format and intended use cases in our [Hadoop Summit 2013 presentation](http://www.slideshare.net/julienledem/parquet-hadoop-summit-2013)
@@ -49,11 +49,11 @@ sudo ldconfig
To build and install the thrift compiler, run:
```
-wget -nv http://archive.apache.org/dist/thrift/0.7.0/thrift-0.7.0.tar.gz
-tar xzf thrift-0.7.0.tar.gz
-cd thrift-0.7.0
+wget -nv http://archive.apache.org/dist/thrift/0.9.3/thrift-0.9.3.tar.gz
+tar xzf thrift-0.9.3.tar.gz
+cd thrift-0.9.3
chmod +x ./configure
-./configure --disable-gen-erl --disable-gen-hs --without-ruby --without-haskell --without-erlang
+./configure --disable-gen-erl --disable-gen-hs --without-ruby --without-haskell --without-erlang --without-php --without-nodejs
sudo make install
```
@@ -67,31 +67,29 @@ LC_ALL=C mvn clean install
## Features
-Parquet is a very active project, and new features are being added quickly; below is the state as of June 2013.
-
-
-<table>
- <tr><th>Feature</th><th>In trunk</th><th>In dev</th><th>Planned</th><th>Expected release</th></tr>
- <tr><td>Type-specific encoding</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Hive integration</td><td>YES (<a href ="https://github.com/Parquet/parquet-mr/pull/28">28</a>)</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Pig integration</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Cascading integration</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Crunch integration</td><td>YES (<a href ="https://issues.apache.org/jira/browse/CRUNCH-277">CRUNCH-277</a>)</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Impala integration</td><td>YES (non-nested)</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Java Map/Reduce API</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Native Avro support</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Native Thrift support</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Complex structure support</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Future-proofed versioning</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>RLE</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Bit Packing</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Adaptive dictionary encoding</td><td>YES</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Predicate pushdown</td><td>YES (<a href ="https://github.com/Parquet/parquet-mr/pull/68">68</a>)</td><td></td></td><td></td><td>1.0</td></tr>
- <tr><td>Column stats</td><td>YES</td><td></td></td><td></td><td>2.0</td></tr>
- <tr><td>Delta encoding</td><td>YES</td><td></td></td><td></td><td>2.0</td></tr>
- <tr><td>Native Protocol Buffers support</td><td>YES</td><td></td><td></td><td>1.0</td></tr>
- <tr><td>Index pages</td><td></td><td></td></td><td>YES</td><td>2.0</td></tr>
-</table>
+Parquet is a very active project, and new features are being added quickly. Here are a few features:
+
+
+* Type-specific encoding
+* Hive integration
+* Pig integration
+* Cascading integration
+* Crunch integration
+* Apache Arrow integration
+* Apache Scrooge integration
+* Impala integration (non-nested)
+* Java Map/Reduce API
+* Native Avro support
+* Native Thrift support
+* Native Protocol Buffers support
+* Complex structure support
+* Run-length encoding (RLE)
+* Bit Packing
+* Adaptive dictionary encoding
+* Predicate pushdown
+* Column stats
+* Delta encoding
+* Index pages
## Map/Reduce integration
@@ -138,46 +136,44 @@ Hive integration is provided via the [parquet-hive](https://github.com/apache/pa
## Build
-to run the unit tests:
-mvn test
+To run the unit tests: `mvn test`
-to build the jars:
-mvn package
+To build the jars: `mvn package`
The build runs in [Travis CI](http://travis-ci.org/apache/parquet-mr):
[](http://travis-ci.org/apache/parquet-mr)
## Add Parquet as a dependency in Maven
-The current release is version `1.8.1`
+The current release is version `1.10.0`
```xml
<dependencies>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-common</artifactId>
- <version>1.8.1</version>
+ <version>1.10.0</version>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-encoding</artifactId>
- <version>1.8.1</version>
+ <version>1.10.0</version>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-column</artifactId>
- <version>1.8.1</version>
+ <version>1.10.0</version>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-hadoop</artifactId>
- <version>1.8.1</version>
+ <version>1.10.0</version>
</dependency>
</dependencies>
```
### How To Contribute
-We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the [github.com/apache/parquet-mr](https://github.com/apache/parquet-mr) repository. If you've previously forked Parquet from its old location, you will need to add a remote or update your origin remote to https://github.com/apache/parquet-mr.git
+We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the [parquet-mr](https://github.com/apache/parquet-mr) Git repository. If you've previously forked Parquet from its old location, you will need to add a remote or update your origin remote to https://github.com/apache/parquet-mr.git
If you are looking for some ideas on what to contribute, check out jira issues for this project labeled ["pick-me-up"](https://issues.apache.org/jira/browse/PARQUET-5?jql=project%20%3D%20PARQUET%20and%20labels%20%3D%20pick-me-up%20and%20status%20%3D%20open). Comment on the issue and/or contact [[email protected]](http://mail-archives.apache.org/mod_mbox/parquet-dev/) with your questions and ideas.
@@ -189,8 +185,8 @@ To contribute a patch:
1. Break your work into small, single-purpose patches if possible. It’s much harder to merge in a large change with a lot of disjoint features.
2. Create a JIRA for your patch on the [Parquet Project JIRA](https://issues.apache.org/jira/browse/PARQUET).
3. Submit the patch as a GitHub pull request against the master branch. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Prefix your pull request name with the JIRA name (ex: https://github.com/apache/parquet-mr/pull/240).
- 4. Make sure that your code passes the unit tests. You can run the tests with `mvn test` in the root directory.
- 5. Add new unit tests for your code.
+ 4. Make sure that your code passes the unit tests. You can run the tests with `mvn test` in the root directory.
+ 5. Add new unit tests for your code.
We tend to do fairly close readings of pull requests, and you may get a lot of comments. Some common issues that are not code structure related, but still important:
* Use 2 spaces for whitespace. Not tabs, not 4 spaces. The number of the spacing shall be 2.
@@ -212,11 +208,11 @@ We hold ourselves and the Parquet developer community to two codes of conduct:
2. [The Twitter OSS Code of Conduct](https://github.com/twitter/code-of-conduct/blob/master/code-of-conduct.md)
## Discussions
-* Mailing list: [[email protected]](http://mail-archives.apache.org/mod_mbox/parquet-dev/)
+* Mailing list: [[email protected]](http://mail-archives.apache.org/mod_mbox/parquet-dev/)
* Bug trackter: [jira](https://issues.apache.org/jira/browse/PARQUET)
* Discussions also take place in github pull requests
## License
Licensed under the Apache License, Version 2.0:
http://www.apache.org/licenses/LICENSE-2.0
-See also:
+See also:
diff --git a/dev/README.md b/dev/README.md
index 8fe30e0..b984b11 100644
--- a/dev/README.md
+++ b/dev/README.md
@@ -27,7 +27,7 @@ Merging a pull request requires being a committer on the project.
have an apache and apache-github remote setup
```
git remote add apache-github https://github.com/apache/parquet-mr.git
-git remote add apache https://git-wip-us.apache.org/repos/asf/parquet-mr.git
+git remote add apache https://gitbox.apache.org/repos/asf?p=parquet-mr.git
```
run the following command
```
@@ -50,7 +50,7 @@ source repo/branch
target master
url https://api.github.com/repos/apache/parquet-mr/pulls/X
-Proceed with merging pull request #3? (y/n):
+Proceed with merging pull request #3? (y/n):
```
If this looks good, type y and hit enter.
```