[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

GitBox Fri, 31 Dec 2021 01:52:30 -0800


dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1003327348



   > > @dongkelun @xushiyan I'd like to offer another solution for discussion.
   > > Incremental queries in Hive require setting 
`hoodie.%s.consume.start.timestamp`, which is read in 
`HoodieHiveUtils.readStartCommitTime`. Currently, we pass the 
`hoodie.table.name` value (as `tableName`) to this function. We can add the configs 
`hoodie.datasource.write.database.name` in `DataSourceWriteOptions` and 
`hoodie.database.name` in `HoodieTableConfig`. If `database.name` is provided, 
we join `database.name` and `table.name` and pass the result to 
`readStartCommitTime`. Users can then set 
`hoodie.dbName.tableName.consume.start.timestamp` in Hive and run the query.
   > > Also, `hoodie.datasource.write.database.name` and `hoodie.database.name` 
can be reused in other scenarios.
   > > @xushiyan what do you think?
   > 
   > @xushiyan @YannByron I think I understand the solution.
   > 
   > SQL will persist the database name to `hoodie.properties` by default, while DF 
persists it only when the optional database parameter is provided. Then, in an 
incremental query, if `databaseName.tableName` is set, we match it against the 
persisted `databaseName.tableName`. If they are inconsistent, or no database name 
was persisted, the incremental query is not performed. If they are consistent, 
the incremental query is performed. If the incremental query does not set a 
database name, we do not match the database name and match only the table name.
   > 
   > So, which parameter should DF use to persist the database name?
   
   @xushiyan Hello, do you think this idea is OK? If so, I'll first submit a 
version based on this idea.
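
   To make the lookup order concrete, here is a minimal sketch of how I read the 
proposal. It is written in Python only for brevity; the real change would be in 
Java around `HoodieHiveUtils.readStartCommitTime`. The function name 
`start_commit_time`, its parameters, and the plain dict standing in for the Hive 
job configuration are all hypothetical, and the fallback to the table-name-only 
key is my assumption about the intended behavior, not existing Hudi code.

```python
from typing import Mapping, Optional

# Key pattern quoted in this thread; %s is the table identifier.
START_TS_PATTERN = "hoodie.%s.consume.start.timestamp"


def start_commit_time(job_conf: Mapping[str, str],
                      table_name: str,
                      database_name: Optional[str] = None) -> Optional[str]:
    """Hypothetical lookup: return the start timestamp, or None if no key matches.

    database_name stands for the value persisted in hoodie.properties
    (None when the writer did not persist one).
    """
    if database_name:
        # Prefer the qualified key, e.g. hoodie.db1.t1.consume.start.timestamp
        qualified_key = START_TS_PATTERN % f"{database_name}.{table_name}"
        qualified_value = job_conf.get(qualified_key)
        if qualified_value is not None:
            return qualified_value
    # Assumed fallback: the query did not qualify the key, so match on table name only.
    return job_conf.get(START_TS_PATTERN % table_name)
```

   With this sketch, a key set for `db1.t1` is ignored for a table persisted 
under `db2` or with no database name, so no incremental query would be triggered 
in those cases, while a key set only for `t1` still matches.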


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
