Re: Spark Views in Iceberg Catalog

Walaa Eldin Moustafa Tue, 15 Nov 2022 15:19:51 -0800

In our case, we store the view definitions in HMS and let them reference both
Hive and Iceberg tables. If the views are expressed in Hive/Spark SQL, they
should be accessible from both Spark and Trino under the same name (Trino
uses Coral <https://github.com/linkedin/coral> to query HiveQL views). If you
want to avoid stating the catalog name in the view definition (e.g.,
explicitly writing `iceberg_catalog.table` vs `hive.table`), you could build
a catalog implementation that routes between both table types. What we did in
Spark 2.3 was to intercept the V1 data source (responsible for Hive table
resolution) and route to V2 and the Iceberg readers when applicable; hence we
could switch between the Hive and Iceberg versions of a table transparently.
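
As a rough illustration of the routing idea only (a toy, in-memory sketch,
not the actual Spark 2.3 interception described above), the following shows
a catalog that resolves a bare table name and dispatches to a Hive or an
Iceberg reader. `TableEntry`, `RoutingCatalog`, and the reader callables are
hypothetical names, and the check on a `table_type` metastore parameter is an
assumption about how Iceberg tables are tagged.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TableEntry:
    """Hypothetical stand-in for a metastore table entry."""
    name: str
    parameters: Dict[str, str] = field(default_factory=dict)


def is_iceberg(entry: TableEntry) -> bool:
    # Assumption: Iceberg tables carry a "table_type" parameter whose value
    # is "ICEBERG"; the comparison here is case-insensitive.
    return entry.parameters.get("table_type", "").lower() == "iceberg"


class RoutingCatalog:
    """Resolves a table name and picks the reader that matches its type."""

    def __init__(
        self,
        metastore: Dict[str, TableEntry],
        hive_reader: Callable[[TableEntry], str],
        iceberg_reader: Callable[[TableEntry], str],
    ) -> None:
        self._metastore = metastore
        self._hive_reader = hive_reader
        self._iceberg_reader = iceberg_reader

    def load(self, name: str) -> str:
        # Raises KeyError if the table is unknown to the metastore.
        entry = self._metastore[name]
        reader = self._iceberg_reader if is_iceberg(entry) else self._hive_reader
        return reader(entry)


# Toy metastore: one Iceberg table and one plain Hive table.
metastore = {
    "db.events": TableEntry("db.events", {"table_type": "ICEBERG"}),
    "db.legacy": TableEntry("db.legacy"),
}
catalog = RoutingCatalog(
    metastore,
    hive_reader=lambda t: f"hive:{t.name}",
    iceberg_reader=lambda t: f"iceberg:{t.name}",
)
```

With a router like this, a view definition can refer to `db.events` without
a catalog prefix, and the table can move from Hive to Iceberg without the
view text changing.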


Thanks,
Walaa.


On Tue, Nov 15, 2022 at 2:38 PM Walaa Eldin Moustafa <wa.moust...@gmail.com>
wrote:

> Hi Marc,
>
> Could you clarify where you store the view definitions in this case, and
> what the syntax looks like?
>
> Thanks,
> Walaa.
>
>
> On Tue, Nov 15, 2022 at 2:34 PM Ryan Blue <b...@tabular.io> wrote:
>
>> Hi Marc,
>>
>> This is expected. Although the ViewCatalog SPIP was approved by the Spark
>> community, the implementation hasn't made it in yet for v2.
>>
>> Ryan
>>
>> On Tue, Nov 15, 2022 at 11:38 AM Marc Laforet <mlafor...@gmail.com>
>> wrote:
>>
>>> Hi Iceberg folks,
>>>
>>> I'm working on a project where we're migrating tables from Hive to
>>> Iceberg. We are revamping our ingestion pipeline in parallel, from batch to
>>> stream. Originally, our plan was to have two separate tables, a backfill
>>> table and a live table, that would be stitched together via a view for
>>> downstream consumers. This is proving rather difficult. In the absence of
>>> engine-agnostic views, we were going to prefix view names with the engine
>>> type (i.e., trino_my_table and spark_my_table), but I receive an
>>> "org.apache.spark.sql.AnalysisException: Catalog iceberg_catalog does not
>>> support views" error when trying to create the Spark view. With the ongoing
>>> work towards engine-agnostic views, I'm unsure whether this limitation is
>>> expected or easily worked around with some config/Spark change?
>>>
>>> Thank you for your time,
>>>
>>> Marc
>>>
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
