[email protected] has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21031 )

Change subject: IMPALA-12709: Add support for hierarchical metastore event 
processing
......................................................................


Patch Set 25:

(20 comments)

Rework

http://gerrit.cloudera.org:8080/#/c/21031/19//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21031/19//COMMIT_MSG@28
PS19, Line 28: related
> nit: related
Done


http://gerrit.cloudera.org:8080/#/c/21031/20//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21031/20//COMMIT_MSG@31
PS20, Line 31:
> Please describe the pseudo events in the commit message.
Done


http://gerrit.cloudera.org:8080/#/c/21031/19/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/21031/19/be/src/catalog/catalog-server.cc@235
PS19, Line 235: db_event_executors
> IMO, db_event_executors and table_event_executors could be a query option f
Changing db_event_executors and table_event_executors requires  resizing the 
existing db and table executors which are already in use(i.e., when db 
processors and table processors are already processing the events dispatched to 
them). IMHO, To keep it simple, it is  good to keep them as start up flags 
similar to enable_hierarchical_event_processing flag.


http://gerrit.cloudera.org:8080/#/c/21031/20/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/21031/20/be/src/catalog/catalog-server.cc@217
PS20, Line 217: hms_event_polling_interval_ms
> Instead of adding a new flag the old hms_event_polling_interval could be ch
For backward compatility, have not modified the existing flag. Have updated the 
description of existing seconds precision flag to describe about this new 
milliseconds precision flag.


http://gerrit.cloudera.org:8080/#/c/21031/20/be/src/catalog/catalog-server.cc@235
PS20, Line 235: DEFINE_int32(db_event_executors, 5,
> Please mention these new flags in the commit message so we can find them in
Done


http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File 
fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java@658
PS19, Line 658:
> Why did you feel the need to introduce a new pseudo class?
CommitTxnEvent and AbortTxnEvent can involve multiple tables in a transaction 
and processing these events modifies multiple table objects. Pseudo events are 
introduced such that a pseudo event is created for each table involved in the 
transaction and these pseudo events are processed independently at respective 
table processor.
Have described about the pseudo events in commit message now.


http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@846
PS19, Line 846:       LOG.info("EventId: {} EventType: COMMIT_TXN transaction 
id: {}", getEventId(),
> I think we'll be removing writeIds for the transaction if the event is self
We do not use Catalog.txnToWriteIds_ for self event. We add them to catalog in 
AllocWriteIdEvent and remove them upon AbortTxnEvent and CommitTxnEvent. These 
are used to track table write ids for a transaction and used in processing o 
those events. So we can hold the reference to table write Ids here and use them 
during processing.
These are not the validWriteIds_ in table object.


http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@966
PS19, Line 966:     tableName, wr
> What happens when different table event executors try to modify different t
When tables are processed in different table event executors, they are 
executing in different threads. They are processed concurrently.


http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@4081
PS19, Line 4081:     return resp;
> nit: adding?
Done


http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/events/DBBarrierEvent.java
File fe/src/main/java/org/apache/impala/catalog/events/DBBarrierEvent.java:

http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/events/DBBarrierEvent.java@24
PS19, Line 24: ache.impala.ca
> nit: Need some explanation regarding what is the objective of this new clas
Done


http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/events/DBBarrierEvent.java@55
PS19, Line 55:   private final MetastoreDatabaseEvent actualEvent_; // database 
event
> nit: should we log the counter of 'expectedProceedCount_' whenever we incr
Done


http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/DBEventExecutor.java
File fe/src/main/java/org/apache/impala/catalog/events/DBEventExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/DBEventExecutor.java@288
PS20, Line 288:     executorName_ = "DBEventExecutor-" + name;
              :     ses_ = Executors.newSingleThreadScheduledExecutor(
              :         new ThreadFactoryBuilder().s
> I am confused about locking - only a few public functions are synchronized
Had missed locking at some places. Have added them now.


http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java@52
PS19, Line 52:    * processor status is active.
> Why is it any different from pause state?
Event processor status is ACTIVE. But event processor do not read and process 
HMS events when it is put on hold. It is used in performance measurement. Hold 
helps to ensure that we do not process events until all the necessary events 
are generated in test. Once required events are generated, event processing can 
be resumed to process all events.


http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@832
PS20, Line 832:   }
> can you add some comments about the role of the pseudo events?
Done. Have added the description at class and commit message.


http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@863
PS20, Line 863:         dbEventExecutor.enqueue(new 
PseudoAbortTxnEvent(abortTxnEvent,
              :             entry.getKey().get
> Can we add synchronization for the drop and create events? I.e. the create
Yes. Have added a RenameTableBarrierEvent to synchronize the source and target 
table processors to process rename table event.


http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@863
PS20, Line 863:         dbEventExecutor.enqueue(new 
PseudoAbortTxnEvent(abortTxnEvent,
              :             entry.getKey().get
> I am not completely convinced that this is a good idea. Processing the drop
Yes. Have added a RenameTableBarrierEvent to synchronize the source and target 
table processors to process rename table event. This ensures that the new 
version of table is not created until the old version of table is dropped.


http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@882
PS20, Line 882:       pseudoCreateTableEvent.setEventId(event.getEventId());
              :
> Shouldn't these use  AfterTable instead of BeforeTable?
Thanks. Thats a bug. Have fixed it.


http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@885
PS20, Line 885:           new DropTableEvent(event.getCatalogOpExecutor(), e
> Can't it confuse other parts of the code if to events have the same event i
Have synchronized the processing of drop and create for rename with 
RenameTableBarrierEvent


http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/TableEventExecutor.java
File fe/src/main/java/org/apache/impala/catalog/events/TableEventExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21031/20/fe/src/main/java/org/apache/impala/catalog/events/TableEventExecutor.java@223
PS20, Line 223:
> Do I understand correctly that the tableProcessors_ will have a fixed order
Do I understand correctly that the tableProcessors_ will have a fixed order, 
and the events will be processed in this order instead of event id order?
Yes.

Does this also apply to DROP/CREATE table events? In case of renames, what 
guarantees that the CREATE event won't be processed before the DROP event if 
the the new table name is earlier among the table processors?
Have added RenameTableBarrierEvent to synchronize the source and target table 
processors to process rename table event. This ensures that the new version of 
table is not created until the old version of table is dropped.


http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/service/JniCatalog.java
File fe/src/main/java/org/apache/impala/service/JniCatalog.java:

http://gerrit.cloudera.org:8080/#/c/21031/19/fe/src/main/java/org/apache/impala/service/JniCatalog.java@222
PS19, Line 222:           
BackendConfig.INSTANCE.getHMSPollingIntervalInSeconds() * 1000;
> Even with this feature disabled, do we want to deal with polling interval i
Yes. Added support for ms precision event polling interval. It works 
independent of enable_hierarchical_event_processing flag.



--
To view, visit http://gerrit.cloudera.org:8080/21031
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I76d8a739f9db6d40f01028bfd786a85d83f9e5d6
Gerrit-Change-Number: 21031
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]>
Gerrit-Comment-Date: Wed, 23 Oct 2024 16:25:24 +0000
Gerrit-HasComments: Yes

Reply via email to