Re: Iceberg/Hive properties handling

2020-11-25 Thread Ryan Blue
Yes, I think that is a good summary of the principles. #4 is correct because some of what we provide is purely informational (Hive schema) or tracked only by the metastore (best-effort current user). I also agree that it would be good to have a table identifier in HMS table metadata when loading

Re: Iceberg/Hive properties handling

2020-11-25 Thread Jacques Nadeau
Minor correction: my last example should have been: db1.table1_etl_branch => nessie.folder1.folder2.folder3.table1@etl_branch -- Jacques Nadeau CTO and Co-Founder, Dremio On Wed, Nov 25, 2020 at 4:56 PM Jacques Nadeau wrote: > I agree with Ryan on the core principles here. As I understand them: >
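
To make the corrected example concrete, here is a minimal sketch of how a reference such as nessie.folder1.folder2.folder3.table1@etl_branch might be split into its parts. The layout (catalog first, table last, namespace levels in between, optional branch after "@") is inferred from that single example only. TableRef and parse_table_ref are hypothetical names, not part of Iceberg, Hive, or Nessie.

```python
from typing import NamedTuple, Optional, Tuple


class TableRef(NamedTuple):
    """Hypothetical container for the parts of a table reference."""
    catalog: str
    namespace: Tuple[str, ...]
    table: str
    branch: Optional[str]


def parse_table_ref(ref: str) -> TableRef:
    """Split 'catalog.ns1.ns2.table@branch' into its components.

    Assumption: the first dotted part is the catalog, the last is the
    table, and anything in between is the namespace. The '@branch'
    suffix is optional.
    """
    path, sep, branch = ref.partition("@")
    parts = path.split(".")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"expected at least catalog.table, got {ref!r}")
    if sep and not branch:
        raise ValueError(f"empty branch name in {ref!r}")
    return TableRef(
        catalog=parts[0],
        namespace=tuple(parts[1:-1]),
        table=parts[-1],
        branch=branch if sep else None,
    )
```

A Hive-side name such as db1.table1_etl_branch would then be an alias that resolves to the parsed reference; how that alias is stored is not specified in the message.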

Re: Iceberg/Hive properties handling

2020-11-25 Thread Jacques Nadeau
I agree with Ryan on the core principles here. As I understand them: 1. Iceberg metadata describes all properties of a table 2. Hive table properties describe "how to get to" Iceberg metadata (which catalog + possibly ptr, path, token, etc) 3. There could be default "how to get to" inf

Re: Iceberg/Hive properties handling

2020-11-25 Thread Ryan Blue
Thanks for working on this, Laszlo. I’ve been thinking about these problems as well, so this is a good time to have a discussion about Hive config. I think that Hive configuration should work mostly like other engines, where different configurations are used for different purposes. Different purpo

Re: Iceberg - Hive schema synchronization

2020-11-25 Thread Ryan Blue
I agree that a 1-to-1 type mapping is the right option. Some additional mappings should be supported; I think it should be fine to use VARCHAR in DDL to produce a string column in Iceberg. Iceberg is also strict about type promotion, and I don't think that we should confuse type promotion with how
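
As a rough illustration of the idea, the sketch below keeps a strict 1-to-1 table between Hive and Iceberg primitive types, so it can be inverted, plus a separate one-way alias that lets VARCHAR in DDL produce a string column. The table is deliberately partial, and the names HIVE_TO_ICEBERG, DDL_ALIASES, and hive_ddl_type_to_iceberg are hypothetical; this is not the mapping implemented in Iceberg or Hive.

```python
# Illustrative, partial mapping only (an assumption, not Iceberg's code).
HIVE_TO_ICEBERG = {
    "boolean": "boolean",
    "int": "int",
    "bigint": "long",
    "float": "float",
    "double": "double",
    "string": "string",
    "date": "date",
    "binary": "binary",
}

# Because the mapping is 1-to-1, the Hive schema can be derived from the
# Iceberg schema and vice versa.
ICEBERG_TO_HIVE = {ice: hive for hive, ice in HIVE_TO_ICEBERG.items()}

# One-way conveniences accepted in DDL. These are not part of the 1-to-1
# mapping, so they never appear when converting back to Hive.
DDL_ALIASES = {"varchar": "string"}


def hive_ddl_type_to_iceberg(hive_type: str) -> str:
    """Map a Hive DDL primitive type name to an Iceberg type name."""
    # Drop any length argument, e.g. "VARCHAR(10)" -> "varchar".
    base = hive_type.strip().lower().split("(", 1)[0].strip()
    base = DDL_ALIASES.get(base, base)
    if base not in HIVE_TO_ICEBERG:
        # Strict: unmapped types are rejected, not silently widened.
        raise ValueError(f"no Iceberg type for Hive type {hive_type!r}")
    return HIVE_TO_ICEBERG[base]
```

Keeping aliases apart from the invertible table means the round trip stays lossless for mapped types, while type promotion can be handled as a separate concern.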

Re: Integrating Existing Iceberg Tables with a Metastore

2020-11-25 Thread Ryan Blue
Great to hear you're up and running, Marko! Would you be interested in sharing your JDBC/Postgres metastore? I don't think we have one yet and it would be great to have a simple one that is backed by a database. On Wed, Nov 25, 2020 at 9:06 AM Marko Babic wrote: > Thanks for your help + suggesti

Re: Integrating Existing Iceberg Tables with a Metastore

2020-11-25 Thread Marko Babic
Thanks for your help + suggestions, Peter. Thanks for the pointer to Nessie, Jacques. To wrap up the thread: I took an afternoon to put together a Postgres-backed metastore to make sure I understood all the moving pieces and have some confidence that I know what would go into the migration no matt

Re: Iceberg - Hive schema synchronization

2020-11-25 Thread Zoltán Borók-Nagy
Hi Everyone, In Impala we face the same challenges. I think a strict 1-to-1 type mapping would be beneficial because that way we could derive the Iceberg schema from the Hive schema, not just the other way around. So we could just naturally create Iceberg tables via DDL. We should use the same ty

Iceberg/Hive properties handling

2020-11-25 Thread Laszlo Pinter
Hi All, I would like to start a discussion on how we should handle properties from various sources such as Iceberg, Hive, or global configuration. I've put together a short document, please have a look and