Hi Aggarwal,

I am using the newest version (CDH3 Update 1, Hive 0.7). After submitting several jobs through Hive, job submission becomes very slow (about 2-5 minutes). Below is some error information from hive.log. It looks like the metastore has a problem: I upgraded the metastore from 0.5 to 0.6 and then from 0.6 to 0.7 using Cloudera's upgrade scripts.

2011-08-11 09:32:34,391 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2011-08-11 09:32:33,883 WARN mapred.JobClient (JobClient.java:copyAndConfigureFiles(649)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2011-08-11 09:32:34,810 ERROR metastore.HiveMetaStore (HiveMetaStore.java:executeWithRetry(321)) - JDO datastore error. Retrying metastore command after 1000 ms (attempt 1 of 1)
2011-08-11 09:32:35,033 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2011-08-11 09:32:35,033 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2011-08-11 09:32:35,036 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2011-08-11 09:32:35,036 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2011-08-11 09:32:35,036 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2011-08-11 09:32:35,036 ERROR DataNucleus.Plugin (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2011-08-11 09:32:35,922 ERROR parse.SemanticAnalyzer (SemanticAnalyzer.java:getMetaData(918)) - org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table model_userclass
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:838)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:772)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:782)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6596)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:209)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:286)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:482)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: javax.jdo.JDODataStoreException: Exception thrown obtaining schema column information from datastore
NestedThrowables:
com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Table 'metastore.DELETEME1313026355834' doesn't exist
    at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:313)
    at org.datanucleus.ObjectManagerImpl.getExtent(ObjectManagerImpl.java:4154)
    at org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compileCandidates(JDOQLQueryCompiler.java:411)
    at org.datanucleus.store.rdbms.query.legacy.QueryCompiler.executionCompile(QueryCompiler.java:312)
    at org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compile(JDOQLQueryCompiler.java:225)
    at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.compileInternal(JDOQLQuery.java:175)
    at org.datanucleus.store.query.Query.executeQuery(Query.java:1628)
    at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
    at org.datanucleus.store.query.Query.executeWithArray(Query.java:1499)
    at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:243)
    at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:375)
    at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:394)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:432)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.access$200(HiveMetaStore.java:109)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$5.run(HiveMetaStore.java:454)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$5.run(HiveMetaStore.java:451)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:307)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:451)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:232)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:197)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:108)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:1868)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:1878)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830)
    ... 14 more
Caused by: com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Table 'metastore.DELETEME1313026355834' doesn't exist
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:936)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2985)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1631)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1723)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:3277)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:3206)
    at com.mysql.jdbc.Statement.executeQuery(Statement.java:1232)
    at com.mysql.jdbc.DatabaseMetaData$2.forEach(DatabaseMetaData.java:2390)
    at com.mysql.jdbc.DatabaseMetaData$IterateBlock.doForAll(DatabaseMetaData.java:76)
    at com.mysql.jdbc.DatabaseMetaData.getColumns(DatabaseMetaData.java:2264)
    at org.apache.commons.dbcp.DelegatingDatabaseMetaData.getColumns(DelegatingDatabaseMetaData.java:218)
    at org.datanucleus.store.rdbms.adapter.DatabaseAdapter.getColumns(DatabaseAdapter.java:1460)
    at org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.refreshTableData(RDBMSSchemaHandler.java:924)
    at org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.getRDBMSTableInfoForTable(RDBMSSchemaHandler.java:823)
    at org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.getRDBMSTableInfoForTable(RDBMSSchemaHandler.java:772)
    at org.datanucleus.store.rdbms.schema.RDBMSSchemaHandler.getSchemaData(RDBMSSchemaHandler.java:207)
    at org.datanucleus.store.rdbms.RDBMSStoreManager.getColumnInfoForTable(RDBMSStoreManager.java:1699)
    at org.datanucleus.store.rdbms.table.TableImpl.validateColumns(TableImpl.java:218)
    at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:2702)
    at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:2503)
    at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2148)

2011/8/11 Aggarwal, Vaibhav <vagg...@amazon.com>

> How much time is the query startup
> taking?
>
> In earlier versions of Hive (before HIVE-2299) the query startup process
> had an algorithm which took O(n^2) operations in number of partitions.
>
> This means 100M operations before it would submit the map reduce job.
>
> *From:* air [mailto:cnwe...@gmail.com]
> *Sent:* Wednesday, August 10, 2011 3:40 AM
> *To:* user@hive.apache.org
> *Subject:* Re: CDH3 U1 Hive Job-commit very slow
>
> There are only 10186 partitions in the metadata store (select count(1) from
> PARTITIONS; in mysql), so I think that is not the problem.
>
> 2011/8/10 Aggarwal, Vaibhav <vagg...@amazon.com>
>
> Do you have a lot of partitions in your table?
>
> Time taken to process the partitions before submitting the job is
> proportional to number of partitions.
>
> There is a patch I submitted recently as an attempt to alleviate this
> problem:
>
> https://issues.apache.org/jira/browse/HIVE-2299
>
> If that is not the case, even I would be interested in root cause of large
> query startup time.
>
> *From:* air [mailto:cnwe...@gmail.com]
> *Sent:* Tuesday, August 09, 2011 1:19 AM
> *To:* user@hive.apache.org
> *Subject:* Fwd: CDH3 U1 Hive Job-commit very slow
>
> ---------- Forwarded message ----------
> From: *air* <cnwe...@gmail.com>
> Date: 2011/8/9
> Subject: CDH3 U1 Hive Job-commit very slow
> To: CDH Users <cdh-u...@cloudera.org>
>
> When I submit a query to Hive, it takes a very long time before it actually
> submits the job to the Hadoop cluster. What may cause this problem? Thank you
> for your help.
>
> hive> select count(1) from log_test where src='test' and ds='2011-08-04';
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> <------------------------stay here for a long time..
>
> --
> Knowledge Management.

--
Knowledge Management.
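
P.S. A quick sanity check of the O(n^2) figure quoted in this thread, as a minimal sketch only. It assumes the startup cost is exactly n^2 operations (the thread gives just the order of growth, so the real constant factor is unknown), and `estimated_startup_ops` is a hypothetical helper name, not anything from Hive. With the 10186 partitions counted from the PARTITIONS table, the result is on the order of 100M operations, in line with the estimate in the quoted reply.

```python
def estimated_startup_ops(num_partitions: int) -> int:
    """Rough operation count for a quadratic startup algorithm.

    Assumption: the cost is exactly n^2; the true constant factor
    of the pre-HIVE-2299 code path is not known from the thread.
    """
    return num_partitions * num_partitions


if __name__ == "__main__":
    # Partition count reported from the metastore PARTITIONS table.
    n = 10186
    print(f"{n} partitions -> about {estimated_startup_ops(n):,} operations")
```

Whether about 100M in-memory operations alone would explain a 2-5 minute delay is a separate question; the metastore errors in hive.log may matter more.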