[ 
https://issues.apache.org/jira/browse/HUDI-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519490#comment-17519490
 ] 

Sagar Sumit commented on HUDI-2762:
-----------------------------------

[~rex_xiong] [~alexey.kudinkin] [~mengtao] [~rmahindra] 

This issue should still be reproducible. As [~rex_xiong] mentioned, default 
path filters in Hive will filter out such files.

However, we do have our own custom path filter (HoodieROTablePathFilter), but 
it only considers base files. For insert-logs-only writes, we may have to 
write the index type to the table config and then accept log files in the RO 
view based on that config. This is a significant change. 

I don't see many people writing inserts only to log files. The primary use 
case is kafka-connect sync. I think in that case we can write a custom path 
filter for kafka-connect, because there we can safely assume that all files 
are log files.
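
A rough sketch of what such a filter could look like, to make the idea 
concrete. It is not the actual Hudi or Hive code. The first predicate mimics 
the hidden-file rule that Hadoop's FileInputFormat applies by default (names 
starting with "_" or "." are skipped), which is why dot-prefixed log files 
never reach the reader. The second accepts names that look like log files. 
The class name, the sample file names, and the simplified name pattern 
(.<fileId>_<instantTime>.log.<version>[_<writeToken>]) are assumptions for 
illustration. A real filter would implement org.apache.hadoop.fs.PathFilter 
and reuse Hudi's own file-name parsing.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class LogFilePathFilterSketch {
    // Mimics Hadoop's default hidden-file rule: skip names starting with '_' or '.'.
    public static final Predicate<String> DEFAULT_HIDDEN_FILE_FILTER =
        name -> !name.startsWith("_") && !name.startsWith(".");

    // Assumed, simplified log file name shape (hypothetical, not Hudi's exact pattern):
    // .<fileId>_<instantTime>.log.<version>[_<writeToken>]
    private static final Pattern LOG_FILE_NAME =
        Pattern.compile("^\\.(.+)_(.+)\\.log\\.(\\d+)(_.+)?$");

    // Hypothetical kafka-connect filter: accept only names that look like log files.
    public static final Predicate<String> LOG_FILE_FILTER =
        name -> LOG_FILE_NAME.matcher(name).matches();

    public static void main(String[] args) {
        // Made-up file names, used only to exercise the two predicates.
        String[] names = {
            ".f1a2_20220407.log.1_0-1-0",
            "f1a2_0-1-0_20220407.parquet",
            ".hoodie_partition_metadata"
        };
        for (String name : names) {
            System.out.println(name
                + " default=" + DEFAULT_HIDDEN_FILE_FILTER.test(name)
                + " logFilter=" + LOG_FILE_FILTER.test(name));
        }
    }
}
```

With a filter like this the RO view would see only log files, so it is only 
safe where, as with kafka-connect, no base files are expected.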

> Ensure hive can query insert only logs in MOR
> ---------------------------------------------
>
>                 Key: HUDI-2762
>                 URL: https://issues.apache.org/jira/browse/HUDI-2762
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: hive
>            Reporter: Rajesh Mahindra
>            Assignee: Alexey Kudinkin
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> Currently, we are able to query MOR tables that have base parquet files with 
> inserts and log files with updates. However, we are currently unable to query 
> tables with insert-only log files. Both the _ro and _rt tables return 0 
> rows. However, HMS does create the table and its partitions. 
>  
> One sample table is here:
> [https://s3.console.aws.amazon.com/s3/buckets/debug-hive-site?prefix=database/&region=us-east-2]
>  
>  



