Re: Getting started links

Ryan Blue Tue, 04 Dec 2018 17:11:29 -0800

Yevgeniy,
The project recently moved to Apache. I’m adding the new mailing list and
will update the old README with some pointers.

The ASF site has newer Javadoc:
https://iceberg.apache.org/javadoc/0.6.0/index.html?com/netflix/iceberg/package-summary.html

Right now, the easiest way to test is with a path-based table. That’s what
Spark supports, since we haven’t updated it to use the Hive metastore. You
can add Iceberg by downloading the iceberg-runtime Jar and dropping it into
your Spark classpath using --jars.

Spark 2.3.x only supports interacting with Iceberg or other v2 sources
through the DataFrame API and doesn’t support creating tables with DDL.
We’re working on getting those features into Spark, but for now you have to
create a table and then write to it from Spark. Here’s an example:

import com.netflix.iceberg.{PartitionSpec, Schema}
import com.netflix.iceberg.hadoop.HadoopTables

val schema = new Schema(...)
// add configuration for your partitioning
val spec = PartitionSpec.builderFor(schema).build()
val tables = new HadoopTables(spark.sparkContext.hadoopConfiguration)
val table = tables.create(schema, spec, "hdfs://nn:8020/path/to/table")

Once your table is created, you can write to it using the DataFrame API. Be
sure to sort the DataFrame so that the rows for each partition are grouped
together.

df.write.format("iceberg").save("hdfs://nn:8020/path/to/table")
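
To make the sorting step concrete, here is a minimal sketch of a sorted write. It assumes a spark-shell session where df is already defined, and it assumes the table is partitioned on a column named event_date. That column name is hypothetical, so substitute the columns from your own partition spec.

```scala
import org.apache.spark.sql.functions.col

// "event_date" is a hypothetical partition column used only for illustration;
// sort by whatever columns your PartitionSpec is built on.
df.sort(col("event_date"))
  .write
  .format("iceberg")
  .save("hdfs://nn:8020/path/to/table")
```

Sorting on the partition columns places rows that belong to the same partition next to each other, which is the grouping described above.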

Once data is written, you can read from the table like this:

val df = spark.read.format("iceberg").load("hdfs://nn:8020/path/to/table")

We will be adding Hive support so you can refer to your table by name and
use the Hive metastore to track its metadata, but Hadoop tables should get
you started with your evaluation. Thanks for reaching out!

rb

On Tue, Dec 4, 2018 at 7:51 AM Yevgeniy Viller <zhenya.vi...@gmail.com>
wrote:

>
> Hey Ryan,
>
> We are doing a POC of Iceberg against our internal data warehouse platform.
> I read the spec docs and the example in Git. However, it is still a little
> hard to get started without proper examples. Also, the current version in
> Git is 0.5.1, but the API docs
> https://docs.google.com/document/d/1Q-zL5lSCle6NEEdyfiYsXYzX_Q8Qf0ctMyGBKslOswA/edit#heading=h.vga9bjlv1x2e
> are for 0.3.0. Do you have a link to the current version of the API specs?
>
> Thanks,
> Yevgeniy
>
> On Thursday, January 4, 2018 at 2:19:20 PM UTC-5, Ryan Blue wrote:
>>
>> The Iceberg repository is now public on github, here:
>> https://github.com/Netflix/iceberg
>>
>> The project is built with gradle and requires a Spark 2.3.0-SNAPSHOT (for
>> Datasource V2) and Parquet 1.9.1-SNAPSHOT (for API additions and bug fixes).
>>
>> An early version of the spec is available for comments here:
>> https://docs.google.com/document/d/1Q-zL5lSCle6NEEdyfiYsXYzX_Q8Qf0ctMyGBKslOswA/edit?usp=sharing
>>
>> Feedback is welcome!
>>
>> rb
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>

-- 
Ryan Blue
Software Engineer
Netflix
