[GitHub] rdblue commented on issue #9: Vectorize reads and deserialize to Arrow

2018-12-07 Thread GitBox
rdblue commented on issue #9: Vectorize reads and deserialize to Arrow URL: https://github.com/apache/incubator-iceberg/issues/9#issuecomment-445412220 There's more context and discussion on the issue in the old Netflix project: https://github.com/Netflix/iceberg/issues/90

[GitHub] rdblue opened a new issue #43: Support snapshot selection in Spark read options

2018-12-07 Thread GitBox
rdblue opened a new issue #43: Support snapshot selection in Spark read options URL: https://github.com/apache/incubator-iceberg/issues/43 Spark passes query options from `DataFrameReader` to the Iceberg source. Iceberg should support selecting a specific snapshot ID or the table state at

[GitHub] rdblue opened a new issue #44: Support cryptographic integrity

2018-12-07 Thread GitBox
rdblue opened a new issue #44: Support cryptographic integrity URL: https://github.com/apache/incubator-iceberg/issues/44 Parquet encryption protects the integrity of individual data files. However, in untrusted storage, removal of one or more data files in a table might go unnoticed. Replac

[GitHub] ggershinsky commented on issue #20: Encryption in Data Files

2018-12-09 Thread GitBox
ggershinsky commented on issue #20: Encryption in Data Files URL: https://github.com/apache/incubator-iceberg/issues/20#issuecomment-445532295 Parquet encryption has additional metadata parameters (such as per-file aadPrefix values, column keys, etc.) - but, since this is basically a column enc

[GitHub] danielcweeks commented on a change in pull request #30: Update to Spark 2.4

2018-12-10 Thread GitBox
danielcweeks commented on a change in pull request #30: Update to Spark 2.4 URL: https://github.com/apache/incubator-iceberg/pull/30#discussion_r240302343 ## File path: build.gradle ## @@ -302,7 +300,7 @@ project(':iceberg-presto-runtime') { shadow "org.apache.avro

[GitHub] danielcweeks commented on issue #30: Update to Spark 2.4

2018-12-10 Thread GitBox
danielcweeks commented on issue #30: Update to Spark 2.4 URL: https://github.com/apache/incubator-iceberg/pull/30#issuecomment-445896677 LGTM +1 This is an automated message from the Apache Git Service. To respond to the messa

[GitHub] rdblue commented on a change in pull request #30: Update to Spark 2.4

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #30: Update to Spark 2.4 URL: https://github.com/apache/incubator-iceberg/pull/30#discussion_r240308367 ## File path: build.gradle ## @@ -302,7 +300,7 @@ project(':iceberg-presto-runtime') { shadow "org.apache.avro:avro:

[GitHub] rdblue closed pull request #30: Update to Spark 2.4

2018-12-10 Thread GitBox
rdblue closed pull request #30: Update to Spark 2.4 URL: https://github.com/apache/incubator-iceberg/pull/30 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from

[GitHub] dongjoon-hyun commented on a change in pull request #30: Update to Spark 2.4

2018-12-10 Thread GitBox
dongjoon-hyun commented on a change in pull request #30: Update to Spark 2.4 URL: https://github.com/apache/incubator-iceberg/pull/30#discussion_r240309257 ## File path: build.gradle ## @@ -302,7 +300,7 @@ project(':iceberg-presto-runtime') { shadow "org.apache.avr

[GitHub] mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240408506 ## File path: core/src/main/java/com/netflix/iceberg/BaseMetastoreTableOperations.java ## @@ -

[GitHub] mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240408875 ## File path: core/src/main/java/com/netflix/iceberg/BaseMetastoreTableOperations.java ## @@ -

[GitHub] mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240410224 ## File path: core/src/main/java/com/netflix/iceberg/hadoop/HadoopTableOperations.java ## @@ -

[GitHub] mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240410990 ## File path: core/src/main/java/com/netflix/iceberg/hadoop/HadoopTableOperations.java ## @@ -

[GitHub] rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240415338 ## File path: core/src/main/java/com/netflix/iceberg/BaseMetastoreTableOperations.java ## @@ -1

[GitHub] rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240416052 ## File path: core/src/main/java/com/netflix/iceberg/hadoop/HadoopTableOperations.java ## @@ -9

[GitHub] mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240416792 ## File path: core/src/main/java/com/netflix/iceberg/hadoop/HadoopTableOperations.java ## @@ -

[GitHub] mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240418333 ## File path: core/src/main/java/com/netflix/iceberg/BaseMetastoreTableOperations.java ## @@ -

[GitHub] rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240421070 ## File path: core/src/main/java/com/netflix/iceberg/hadoop/HadoopTableOperations.java ## @@ -9

[GitHub] mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240423935 ## File path: core/src/main/java/com/netflix/iceberg/hadoop/HadoopTableOperations.java ## @@ -

[GitHub] mccheah commented on issue #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
mccheah commented on issue #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#issuecomment-446020013 Addressed comments and is ready for another pass of reviews.

[GitHub] mccheah commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source URL: https://github.com/apache/incubator-iceberg/pull/7#discussion_r240428533 ## File path: spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java

[GitHub] mccheah commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source URL: https://github.com/apache/incubator-iceberg/pull/7#discussion_r240429601 ## File path: spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java

[GitHub] mccheah commented on issue #7: Allow custom hadoop properties to be loaded in the Spark data source

2018-12-10 Thread GitBox
mccheah commented on issue #7: Allow custom hadoop properties to be loaded in the Spark data source URL: https://github.com/apache/incubator-iceberg/pull/7#issuecomment-446025134 Addressed comments and is ready for another round of review.

[GitHub] mccheah commented on a change in pull request #6: Support customizing the location where data is written in Spark

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #6: Support customizing the location where data is written in Spark URL: https://github.com/apache/incubator-iceberg/pull/6#discussion_r240433428 ## File path: spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java ###

[GitHub] rdblue opened a new pull request #45: Lazily submit tasks in ParallelIterable and add cancellation.

2018-12-10 Thread GitBox
rdblue opened a new pull request #45: Lazily submit tasks in ParallelIterable and add cancellation. URL: https://github.com/apache/incubator-iceberg/pull/45 This removes the planner pool from ParallelIterable, which was used to submit all of the iterable tasks in parallel. This was used

[GitHub] rdblue opened a new pull request #46: Do not scan manifests with no deletes when expiring snapshots.

2018-12-10 Thread GitBox
rdblue opened a new pull request #46: Do not scan manifests with no deletes when expiring snapshots. URL: https://github.com/apache/incubator-iceberg/pull/46

[GitHub] rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240437179 ## File path: core/src/main/java/com/netflix/iceberg/FileIO.java ## @@ -0,0 +1,51 @@ +/* + * Lic

[GitHub] rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240437343 ## File path: core/src/main/java/com/netflix/iceberg/FileIO.java ## @@ -0,0 +1,51 @@ +/* + * Lic

[GitHub] rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240437553 ## File path: core/src/main/java/com/netflix/iceberg/TableOperations.java ## @@ -56,27 +55,17 @@

[GitHub] rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240437835 ## File path: core/src/main/java/com/netflix/iceberg/hadoop/SerializableConfiguration.java ## @

[GitHub] rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240437921 ## File path: core/src/main/java/com/netflix/iceberg/hadoop/SerializableConfiguration.java ## @

[GitHub] rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240438110 ## File path: core/src/test/java/com/netflix/iceberg/LocalTableOperations.java ## @@ -52,25 +53,

[GitHub] rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240438485 ## File path: core/src/test/java/com/netflix/iceberg/TestTables.java ## @@ -173,14 +174,39 @@ pu

[GitHub] rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240439930 ## File path: data/src/main/java/com/netflix/iceberg/data/TableScanIterable.java ## @@ -125,7 +1

[GitHub] rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240440277 ## File path: core/src/test/java/com/netflix/iceberg/TestTables.java ## @@ -173,14 +174,39 @@ pu

[GitHub] rdblue commented on issue #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
rdblue commented on issue #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#issuecomment-446034462 Looks really close! Just minor issues right now.

[GitHub] rdblue commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source URL: https://github.com/apache/incubator-iceberg/pull/7#discussion_r240442276 ## File path: spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java

[GitHub] rdblue commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source URL: https://github.com/apache/incubator-iceberg/pull/7#discussion_r240442751 ## File path: spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java

[GitHub] mccheah commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source URL: https://github.com/apache/incubator-iceberg/pull/7#discussion_r240442927 ## File path: spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java

[GitHub] mccheah commented on issue #6: Support customizing the location where data is written in Spark

2018-12-10 Thread GitBox
mccheah commented on issue #6: Support customizing the location where data is written in Spark URL: https://github.com/apache/incubator-iceberg/pull/6#issuecomment-446036642 Addressed the comments and is ready for another round of review. Also made the test cleaner.

[GitHub] mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240443385 ## File path: core/src/main/java/com/netflix/iceberg/hadoop/SerializableConfiguration.java ##

[GitHub] rdblue commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source

2018-12-10 Thread GitBox
rdblue commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source URL: https://github.com/apache/incubator-iceberg/pull/7#discussion_r240443779 ## File path: spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java

[GitHub] mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240444049 ## File path: core/src/test/java/com/netflix/iceberg/LocalTableOperations.java ## @@ -52,25 +53

[GitHub] mccheah commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source URL: https://github.com/apache/incubator-iceberg/pull/7#discussion_r240444739 ## File path: spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java

[GitHub] mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
mccheah commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240447566 ## File path: data/src/main/java/com/netflix/iceberg/data/TableScanIterable.java ## @@ -125,7 +

[GitHub] mccheah commented on issue #14: Pluggable file I/O submodule in TableOperations

2018-12-10 Thread GitBox
mccheah commented on issue #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#issuecomment-446043037 Addressed all comments so far.

[GitHub] mccheah commented on issue #7: Allow custom hadoop properties to be loaded in the Spark data source

2018-12-10 Thread GitBox
mccheah commented on issue #7: Allow custom hadoop properties to be loaded in the Spark data source URL: https://github.com/apache/incubator-iceberg/pull/7#issuecomment-446043156 Addressed all comments so far.

[GitHub] rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-11 Thread GitBox
rdblue commented on a change in pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#discussion_r240706640 ## File path: core/src/test/java/com/netflix/iceberg/LocalTableOperations.java ## @@ -52,25 +53,

[GitHub] rdblue closed issue #12: File I/O Submodule for TableOperations

2018-12-11 Thread GitBox
rdblue closed issue #12: File I/O Submodule for TableOperations URL: https://github.com/apache/incubator-iceberg/issues/12

[GitHub] rdblue closed pull request #14: Pluggable file I/O submodule in TableOperations

2018-12-11 Thread GitBox
rdblue closed pull request #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a

[GitHub] rdblue commented on issue #14: Pluggable file I/O submodule in TableOperations

2018-12-11 Thread GitBox
rdblue commented on issue #14: Pluggable file I/O submodule in TableOperations URL: https://github.com/apache/incubator-iceberg/pull/14#issuecomment-446284186 Merged. Thanks @mccheah! Nice work.

[GitHub] rdblue commented on issue #45: Lazily submit tasks in ParallelIterable and add cancellation.

2018-12-11 Thread GitBox
rdblue commented on issue #45: Lazily submit tasks in ParallelIterable and add cancellation. URL: https://github.com/apache/incubator-iceberg/pull/45#issuecomment-446284445 @mccheah, it would be great to get a review from you on this one.

[GitHub] rdblue opened a new issue #47: Use the correct FileIO for all file interaction

2018-12-11 Thread GitBox
rdblue opened a new issue #47: Use the correct FileIO for all file interaction URL: https://github.com/apache/incubator-iceberg/issues/47 #14 split the file-related methods from `TableOperations` into a separate `FileIO` class. The new class is serializable so that it can be sent to other

[GitHub] rdblue commented on issue #47: Use the correct FileIO for all file interaction

2018-12-11 Thread GitBox
rdblue commented on issue #47: Use the correct FileIO for all file interaction URL: https://github.com/apache/incubator-iceberg/issues/47#issuecomment-446286744 @mccheah, I think this is needed for you to hook into all file reads.

[GitHub] rdblue commented on a change in pull request #6: Support customizing the location where data is written in Spark

2018-12-11 Thread GitBox
rdblue commented on a change in pull request #6: Support customizing the location where data is written in Spark URL: https://github.com/apache/incubator-iceberg/pull/6#discussion_r240713212 ## File path: spark/src/test/java/com/netflix/iceberg/spark/source/TestDataFrameWrites.java

[GitHub] rdblue commented on issue #6: Support customizing the location where data is written in Spark

2018-12-11 Thread GitBox
rdblue commented on issue #6: Support customizing the location where data is written in Spark URL: https://github.com/apache/incubator-iceberg/pull/6#issuecomment-446289039 Looks good to me, other than the duplicate test cases in `parameters`.

[GitHub] rdblue commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source

2018-12-11 Thread GitBox
rdblue commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source URL: https://github.com/apache/incubator-iceberg/pull/7#discussion_r240715658 ## File path: spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java

[GitHub] rdblue commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source

2018-12-11 Thread GitBox
rdblue commented on a change in pull request #7: Allow custom hadoop properties to be loaded in the Spark data source URL: https://github.com/apache/incubator-iceberg/pull/7#discussion_r240716751 ## File path: spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java
