ccoffline commented on a change in pull request #6416:
URL: https://github.com/apache/incubator-doris/pull/6416#discussion_r695365237



##########
File path: 
fe/fe-core/src/main/java/org/apache/doris/alter/MaterializedViewHandler.java
##########
@@ -1165,15 +1152,15 @@ private void getOldAlterJobInfos(Database db, 
List<List<Comparable>> rollupJobIn
 
 
         for (AlterJob selectedJob : jobs) {
-            OlapTable olapTable = (OlapTable) 
db.getTable(selectedJob.getTableId());
-            if (olapTable == null) {
-                continue;
-            }
-            olapTable.readLock();
             try {
-                selectedJob.getJobInfo(rollupJobInfos, olapTable);
-            } finally {
-                olapTable.readUnlock();
+                OlapTable olapTable = 
db.getTableOrMetaException(selectedJob.getTableId(), Table.TableType.OLAP);
+                olapTable.readLock();
+                try {
+                    selectedJob.getJobInfo(rollupJobInfos, olapTable);
+                } finally {
+                    olapTable.readUnlock();
+                }
+            } catch (MetaNotFoundException ignored) {

Review comment:
       This is equals to `continue`

##########
File path: 
fe/fe-core/src/main/java/org/apache/doris/alter/MaterializedViewHandler.java
##########
@@ -883,26 +878,20 @@ private void removeJobFromRunningQueue(AlterJobV2 
alterJob) {
     }
 
     private void changeTableStatus(long dbId, long tableId, OlapTableState 
olapTableState) {
-        Database db = Catalog.getCurrentCatalog().getDb(dbId);
-        if (db == null) {
-            LOG.warn("db {} has been dropped when changing table {} status 
after rollup job done",
-                    dbId, tableId);
-            return;
-        }
-        OlapTable tbl = (OlapTable) db.getTable(tableId);
-        if (tbl == null) {
-            LOG.warn("table {} has been dropped when changing table status 
after rollup job done",
-                    tableId);
-            return;
-        }
-        tbl.writeLock();
         try {
-            if (tbl.getState() == olapTableState) {
-                return;
+            Database db = 
Catalog.getCurrentCatalog().getDbOrMetaException(dbId);
+            OlapTable olapTable = db.getTableOrMetaException(tableId, 
Table.TableType.OLAP);
+            olapTable.writeLock();
+            try {
+                if (olapTable.getState() == olapTableState) {
+                    return;
+                }
+                olapTable.setState(olapTableState);
+            } finally {
+                olapTable.writeUnlock();
             }
-            tbl.setState(olapTableState);
-        } finally {
-            tbl.writeUnlock();
+        } catch (MetaNotFoundException e) {
+            LOG.warn("[INCONSISTENT META] changing table status failed after 
rollup job done", e);

Review comment:
       This is called by `replayAlterJobV2()` and `onJobDone()`, which should 
both not throw this exception.

##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/Alter.java
##########
@@ -260,17 +251,11 @@ private void processModifyColumnComment(Database db, 
OlapTable tbl, List<AlterCl
         }
     }
 
-    public void replayModifyComment(ModifyCommentOperationLog operation) {
+    public void replayModifyComment(ModifyCommentOperationLog operation) 
throws MetaNotFoundException {

Review comment:
       It is considerable. But "replay" throwing any exception is an extremely 
big risk, which will cause all FE crush and cannot recover. These 
`MetaNotFoundException` are mostly thrown by getDb and getTable, due to the 
lock inconsistence that makes editlogs out of order. Semantically, throwing 
this exception means some metadata the editlog want to edit on is missing 
during replay, which makes this replay unable to continue. Just like 
getDb/Table, This kind of inconsistence cannot recover anyway, and these 
editlogs are theoretically only affect lost metadata.
   I prefer to use `MetaNotFoundException` that indicate metadata has lost 
rather than other exception which may cause confuse. The inconsistence can be 
checked from warning log.

##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/RollupJobV2.java
##########
@@ -667,22 +638,26 @@ private void replayCancelled(RollupJobV2 replayedJob) {
 
     @Override
     public void replay(AlterJobV2 replayedJob) {
-        RollupJobV2 replayedRollupJob = (RollupJobV2) replayedJob;
-        switch (replayedJob.jobState) {
-            case PENDING:
-                replayCreateJob(replayedRollupJob);
-                break;
-            case WAITING_TXN:
-                replayPendingJob(replayedRollupJob);
-                break;
-            case FINISHED:
-                replayRunningJob(replayedRollupJob);
-                break;
-            case CANCELLED:
-                replayCancelled(replayedRollupJob);
-                break;
-            default:
-                break;
+        try {
+            RollupJobV2 replayedRollupJob = (RollupJobV2) replayedJob;
+            switch (replayedJob.jobState) {
+                case PENDING:
+                    replayCreateJob(replayedRollupJob);
+                    break;
+                case WAITING_TXN:
+                    replayPendingJob(replayedRollupJob);
+                    break;
+                case FINISHED:
+                    replayRunningJob(replayedRollupJob);
+                    break;
+                case CANCELLED:
+                    replayCancelled(replayedRollupJob);
+                    break;
+                default:
+                    break;
+            }
+        } catch (MetaNotFoundException e) {
+            LOG.warn("[INCONSISTENT META] replay rollup job failed {}", 
replayedJob.getJobId(), e);

Review comment:
       It is called by `replayAlterJobV2`. I'm afraid it would be different 
from the origin.
   
https://github.com/apache/incubator-doris/blob/138e7e896dee6842d8af7583d59288190371cd86/fe/fe-core/src/main/java/org/apache/doris/alter/AlterHandler.java#L460-L469

##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/AlterHandler.java
##########
@@ -302,40 +302,37 @@ protected void jobDone(AlterJob alterJob) {
         }
     }
 
-    public void replayInitJob(AlterJob alterJob, Catalog catalog) {
-        Database db = catalog.getDb(alterJob.getDbId());
+    public void replayInitJob(AlterJob alterJob, Catalog catalog) throws 
MetaNotFoundException {
+        Database db = catalog.getDbOrMetaException(alterJob.getDbId());
         alterJob.replayInitJob(db);
         // add rollup job
         addAlterJob(alterJob);
     }
     
-    public void replayFinishing(AlterJob alterJob, Catalog catalog) {
-        Database db = catalog.getDb(alterJob.getDbId());
+    public void replayFinishing(AlterJob alterJob, Catalog catalog) throws 
MetaNotFoundException {
+        Database db = catalog.getDbOrMetaException(alterJob.getDbId());
         alterJob.replayFinishing(db);
         alterJob.setState(JobState.FINISHING);
         // !!! the alter job should add to the cache again, because the alter 
job is deserialized from journal
         // it is a different object compared to the cache
         addAlterJob(alterJob);
     }
 
-    public void replayFinish(AlterJob alterJob, Catalog catalog) {
-        Database db = catalog.getDb(alterJob.getDbId());
+    public void replayFinish(AlterJob alterJob, Catalog catalog) throws 
MetaNotFoundException {
+        Database db = catalog.getDbOrMetaException(alterJob.getDbId());
         alterJob.replayFinish(db);
         alterJob.setState(JobState.FINISHED);
 
         jobDone(alterJob);
     }
 
-    public void replayCancel(AlterJob alterJob, Catalog catalog) {
+    public void replayCancel(AlterJob alterJob, Catalog catalog) throws 
MetaNotFoundException {
         removeAlterJob(alterJob.getTableId());
         alterJob.setState(JobState.CANCELLED);
-        Database db = catalog.getDb(alterJob.getDbId());
-        if (db != null) {
-            // we log rollup job cancelled even if db is dropped.
-            // so check db != null here
-            alterJob.replayCancel(db);
-        }
-
+        // we log rollup job cancelled even if db is dropped.
+        // so check db != null here
+        Database db = catalog.getDbOrMetaException(alterJob.getDbId());
+        alterJob.replayCancel(db);

Review comment:
       I'll fix this.

##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/RollupJobV2.java
##########
@@ -812,10 +787,14 @@ public void gsonPostProcess() throws IOException {
             return;
         }
         // parse the define stmt to schema
-        SqlParser parser = new SqlParser(new SqlScanner(new 
StringReader(origStmt.originStmt),
-                                                        
SqlModeHelper.MODE_DEFAULT));
+        SqlParser parser = new SqlParser(new SqlScanner(new 
StringReader(origStmt.originStmt), SqlModeHelper.MODE_DEFAULT));
         ConnectContext connectContext = new ConnectContext();
-        Database db = Catalog.getCurrentCatalog().getDb(dbId);
+        Database db;
+        try {
+            db = Catalog.getCurrentCatalog().getDbOrMetaException(dbId);
+        } catch (MetaNotFoundException e) {
+            throw new IOException("error happens when parsing create 
materialized view stmt: " + origStmt, e);

Review comment:
       The origin code will throw NPE, I really don't know what to do here

##########
File path: 
fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeHandler.java
##########
@@ -1889,17 +1889,14 @@ public void cancel(CancelStmt stmt) throws DdlException 
{
         Preconditions.checkState(!Strings.isNullOrEmpty(dbName));
         Preconditions.checkState(!Strings.isNullOrEmpty(tableName));
 
-        Database db = Catalog.getCurrentCatalog().getDb(dbName);
-        if (db == null) {
-            throw new DdlException("Database[" + dbName + "] does not exist");
-        }
+        Database db = Catalog.getCurrentCatalog().getDbOrDdlException(dbName);
 
         AlterJob schemaChangeJob = null;
         AlterJobV2 schemaChangeJobV2 = null;
 
-        OlapTable olapTable = null;
+        OlapTable olapTable;
         try {
-            olapTable = (OlapTable) db.getTableOrThrowException(tableName, 
Table.TableType.OLAP);
+            olapTable = db.getTableOrMetaException(tableName, 
Table.TableType.OLAP);

Review comment:
       ok, that make sense

##########
File path: 
fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeJobV2.java
##########
@@ -218,9 +215,9 @@ protected void runPendingJob() throws AlterCancelException {
         }
         MarkedCountDownLatch<Long, Long> countDownLatch = new 
MarkedCountDownLatch<>(totalReplicaNum);
 
-        OlapTable tbl = null;
+        OlapTable tbl;
         try {
-            tbl = (OlapTable) db.getTableOrThrowException(tableId, 
TableType.OLAP);
+            tbl = db.getTableOrMetaException(tableId, TableType.OLAP);

Review comment:
       ok

##########
File path: 
fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeJobV2.java
##########
@@ -370,14 +367,11 @@ protected void runWaitingTxnJob() throws 
AlterCancelException {
         }
 
         LOG.info("previous transactions are all finished, begin to send schema 
change tasks. job: {}", jobId);
-        Database db = Catalog.getCurrentCatalog().getDb(dbId);
-        if (db == null) {
-            throw new AlterCancelException("Databasee " + dbId + " does not 
exist");
-        }
+        Database db = Catalog.getCurrentCatalog().getDbOrException(dbId, s -> 
new AlterCancelException("Database " + s + " does not exist"));
 
-        OlapTable tbl = null;
+        OlapTable tbl;
         try {
-            tbl = (OlapTable) db.getTableOrThrowException(tableId, 
TableType.OLAP);
+            tbl = db.getTableOrMetaException(tableId, TableType.OLAP);

Review comment:
       ok

##########
File path: fe/fe-core/src/main/java/org/apache/doris/backup/RestoreJob.java
##########
@@ -1048,11 +1049,17 @@ private boolean downloadAndDeserializeMetaInfo() {
     }
 
     private void replayCheckAndPrepareMeta() {
-        Database db = catalog.getDb(dbId);
+        Database db;
+        try {
+            db = catalog.getDbOrMetaException(dbId);

Review comment:
       It is called by `replayRun`, which is called by the code below, and have 
some next process.
   
https://github.com/apache/incubator-doris/blob/138e7e896dee6842d8af7583d59288190371cd86/fe/fe-core/src/main/java/org/apache/doris/backup/BackupHandler.java#L643-L655

##########
File path: fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java
##########
@@ -4548,72 +4476,43 @@ private void unprotectUpdateReplica(ReplicaPersistInfo 
info) {
         replica.setBad(false);
     }
 
-    public void replayAddReplica(ReplicaPersistInfo info) {
-        Database db = getDb(info.getDbId());
-        OlapTable olapTable = (OlapTable) db.getTable(info.getTableId());
-        if (olapTable == null) {
-            /**
-             * Same as replayUpdateReplica()
-             */
-            LOG.warn("Olap table is null when the add replica log is replayed, 
{}", info);
-            return;
-        }
+    public void replayAddReplica(ReplicaPersistInfo info) throws 
MetaNotFoundException {
+        Database db = this.getDbOrMetaException(info.getDbId());
+        OlapTable olapTable = db.getTableOrMetaException(info.getTableId(), 
TableType.OLAP);
         olapTable.writeLock();
         try {
-            unprotectAddReplica(info);
+            unprotectAddReplica(olapTable, info);
         } finally {
             olapTable.writeUnlock();
         }
     }
 
-    public void replayUpdateReplica(ReplicaPersistInfo info) {
-        Database db = getDb(info.getDbId());
-        OlapTable olapTable = (OlapTable) db.getTable(info.getTableId());
-        if (olapTable == null) {

Review comment:
       ok

##########
File path: 
fe/fe-core/src/main/java/org/apache/doris/httpv2/rest/TableRowCountAction.java
##########
@@ -61,15 +61,12 @@ public Object count(
             String fullDbName = getFullDbName(dbName);
             // check privilege for select, otherwise return HTTP 401
             checkTblAuth(ConnectContext.get().getCurrentUserIdentity(), 
fullDbName, tblName, PrivPredicate.SELECT);
-            Database db = Catalog.getCurrentCatalog().getDb(fullDbName);
-            if (db == null) {
-                return ResponseEntityBuilder.okWithCommonError("Database [" + 
dbName + "] " + "does not exists");
-            }
-            OlapTable olapTable = null;
+            OlapTable olapTable;
             try {
-                olapTable = (OlapTable) db.getTableOrThrowException(tblName, 
Table.TableType.OLAP);
+                Database db = 
Catalog.getCurrentCatalog().getDbOrMetaException(fullDbName);
+                olapTable = db.getTableOrMetaException(tblName, 
Table.TableType.OLAP);
             } catch (MetaNotFoundException e) {
-                return ResponseEntityBuilder.okWithCommonError(e.getMessage());
+                return ResponseEntityBuilder.notFound(e.getMessage());

Review comment:
       ok

##########
File path: fe/fe-core/src/main/java/org/apache/doris/catalog/Catalog.java
##########
@@ -6341,12 +6264,12 @@ public long loadCluster(DataInputStream dis, long 
checksum) throws IOException,
                 // for adding BE to some Cluster, but loadCluster is after 
loadBackend.
                 cluster.setBackendIdList(latestBackendIds);
 
-                String dbName =  
InfoSchemaDb.getFullInfoSchemaDbName(cluster.getName());
+                String dbName = 
InfoSchemaDb.getFullInfoSchemaDbName(cluster.getName());
                 InfoSchemaDb db;
                 // Use real Catalog instance to avoid InfoSchemaDb id 
continuously increment
                 // when checkpoint thread load image.
-                if 
(Catalog.getServingCatalog().getFullNameToDb().containsKey(dbName)) {
-                    db = 
(InfoSchemaDb)Catalog.getServingCatalog().getFullNameToDb().get(dbName);
+                if 
(Catalog.getCurrentCatalog().getFullNameToDb().containsKey(dbName)) {

Review comment:
       It is a mischange, I'll fix this

##########
File path: fe/fe-core/src/main/java/org/apache/doris/persist/EditLog.java
##########
@@ -865,6 +862,8 @@ public static void loadJournal(Catalog catalog, 
JournalEntity journal) {
                     throw e;
                 }
             }
+        } catch (MetaNotFoundException e) {
+            LOG.warn("[INCONSISTENT META] replay failed {}", journal, e);

Review comment:
       I considered printing the original journal content, but gave up because 
of the complicated deserialization. `JournalEntity` has implemented 
`toString()` method and has already print the op code. It's better for all the 
`Writable` objects to implement `toString()` method.
   I'll add `e.getMessage()` in one line and keep the exception stack trace.

##########
File path: 
fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeJobV2.java
##########
@@ -218,9 +215,9 @@ protected void runPendingJob() throws AlterCancelException {
         }
         MarkedCountDownLatch<Long, Long> countDownLatch = new 
MarkedCountDownLatch<>(totalReplicaNum);
 
-        OlapTable tbl = null;
+        OlapTable tbl;
         try {
-            tbl = (OlapTable) db.getTableOrThrowException(tableId, 
TableType.OLAP);
+            tbl = db.getTableOrMetaException(tableId, TableType.OLAP);

Review comment:
       This checks if the table exists and if the table is OLAP, so it might be 
easier to code this way.

##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/Alter.java
##########
@@ -260,17 +251,11 @@ private void processModifyColumnComment(Database db, 
OlapTable tbl, List<AlterCl
         }
     }
 
-    public void replayModifyComment(ModifyCommentOperationLog operation) {
+    public void replayModifyComment(ModifyCommentOperationLog operation) 
throws MetaNotFoundException {

Review comment:
       @caiconghui we don't have any promise that the edit logs are in order. 
These check code is to prevent the worst from happening.
   Edit logs that out of order may cause meta inconsistent, which has to be 
fixes sooner or later. We are exploring ways to ensure consistency and minimize 
the cost. Until then, we have to check all NPE or let all the FE to crash.

##########
File path: fe/fe-core/src/main/java/org/apache/doris/alter/Alter.java
##########
@@ -260,17 +251,11 @@ private void processModifyColumnComment(Database db, 
OlapTable tbl, List<AlterCl
         }
     }
 
-    public void replayModifyComment(ModifyCommentOperationLog operation) {
+    public void replayModifyComment(ModifyCommentOperationLog operation) 
throws MetaNotFoundException {

Review comment:
       @caiconghui we don't have any promise that the edit logs are in order. 
These check code is to prevent the worst from happening.
   Edit logs that out of order may cause meta inconsistent, which has to be 
fixed sooner or later. We are exploring ways to ensure consistency and minimize 
the cost. Until then, we have to check all NPE or let all the FE to crash.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org

Reply via email to