I have written a threaded program to batch upsert data into Phoenix; I am using Phoenix for its secondary index capability. I am getting extremely slow write performance.
The EXPLAIN output looks like the following:

0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from VBQL_PHOENIX_TRANSCRIPT5;
+---------------------------------------------------------------------------------------------------------------+-----------------+------+
| PLAN                                                                                                          | EST_BYTES_READ  | EST_ |
+---------------------------------------------------------------------------------------------------------------+-----------------+------+
| CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5  | 314572800       | 6838 |
| SERVER FILTER BY FIRST KEY ONLY                                                                               | 314572800       | 6838 |
| SERVER AGGREGATE INTO SINGLE ROW                                                                              | 314572800       | 6838 |
+---------------------------------------------------------------------------------------------------------------+-----------------+------+

I have a secondary index; queried directly it looks like this:

0: jdbc:phoenix:localhost:2181:/hbase> EXPLAIN select count(1) from VBQL_PHOENIX_TRANSCRIPT_INDEX5;
+---------------------------------------------------------------------------------------------------------------+-----------------+------+
| PLAN                                                                                                          | EST_BYTES_READ  | EST_ |
+---------------------------------------------------------------------------------------------------------------+-----------------+------+
| CLIENT 100-CHUNK 6838539 ROWS 314572800 BYTES PARALLEL 100-WAY FULL SCAN OVER VBQL_PHOENIX_TRANSCRIPT_INDEX5  | 314572800       | 6838 |
| SERVER FILTER BY FIRST KEY ONLY                                                                               | 314572800       | 6838 |
| SERVER AGGREGATE INTO SINGLE ROW                                                                              | 314572800       | 6838 |
+---------------------------------------------------------------------------------------------------------------+-----------------+------+

The problem is that it is very hard to scale. I tried adding more nodes to the HBase cluster, and I also tried adding more threads to the client program, but it does not scale beyond ~6K rows per minute, which is VERY slow. Any help is greatly appreciated.

Tuning settings used in HBase:
<property>
  <name>index.writer.threads.max</name>
  <value>30</value>
</property>
<property>
  <name>index.builder.threads.max</name>
  <value>30</value>
</property>
<property>
  <name>phoenix.query.threadPoolSize</name>
  <value>256</value>
</property>
<property>
  <name>index.builder.threads.keepalivetime</name>
  <value>90000</value>
</property>
<property>
  <name>phoenix.query.timeoutMs</name>
  <value>90000</value>
</property>

The rest of the settings I am using are defaults.

Source code (simple multithreaded; writes need to be saved individually / we cannot use batch writes):

public void generateAndInsertDataToHbase(int iterationCount) {
    try {
        int iterations = 0;
        Instant start = Instant.now();
        ExecutorService executorService = Executors.newCachedThreadPool();
        while (true) {
            if (iterations == iterationCount) {
                verifyRowCountAndExit(start, executorService);
            }
            Future<Void> future = executorService.submit(new Callable<Void>() {
                public Void call() throws Exception {
                    List<VbqlBean> vbqlList = VbqlUtils.generateRandomAccountVbql(accountCategoryMap);
                    if (vbqlList.size() > 0) {
                        addVbqlAccountToHbaseAsBatch(vbqlList, connStr);
                    }
                    return null;
                }
            });
            future.get();
            iterations++;
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public void addVbqlAccountToHbaseAsBatch(List<VbqlBean> vbqlList, String connStr) throws Exception {
    Connection connection = null;
    try {
        connection = DBUtils.initSingleConnection(connStr);
        // connection.setAutoCommit(false);
        if (vbqlList.size() > 0) {
            for (VbqlBean vbqlBean : vbqlList) {
                DBUtils.executeSingleVbqlQuery(connStr, vbqlBean, tableName);
            }
        }
        connection.commit();
    } finally {
        if (connection != null) {
            try {
                connection.close();
            } catch (SQLException ignore) {}
        }
    }
}

-- Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
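For comparison, if the per-record constraint can be relaxed at all, the commonly cited Phoenix tuning pattern is to reuse one connection with auto-commit off and commit every N upserts, rather than opening a connection per record. Below is a minimal sketch of that pattern, not the program above: the table name/columns, the `Row` stand-in for `VbqlBean`, the `partition()` helper, and the batch size are all illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class BatchUpsert {

    // Hypothetical stand-in for VbqlBean (two columns only, for illustration).
    static final class Row {
        final String id;
        final String payload;
        Row(String id, String payload) { this.id = id; this.payload = payload; }
    }

    // Split rows into fixed-size batches so each commit carries many mutations.
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            batches.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return batches;
    }

    // One shared connection, auto-commit off, one commit per batch: with
    // Phoenix, upserts are buffered client-side and flushed on commit().
    static void upsertInBatches(Connection conn, List<Row> rows, int batchSize)
            throws SQLException {
        conn.setAutoCommit(false);
        String sql = "UPSERT INTO VBQL_PHOENIX_TRANSCRIPT5 (ID, PAYLOAD) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (List<Row> batch : partition(rows, batchSize)) {
                for (Row r : batch) {
                    ps.setString(1, r.id);
                    ps.setString(2, r.payload);
                    ps.executeUpdate();
                }
                conn.commit();
            }
        }
    }

    public static void main(String[] args) {
        // Exercise only the pure batching logic here (no cluster needed).
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) rows.add(i);
        System.out.println(partition(rows, 4).size()); // prints 3
    }
}
```

Batch sizes in the low hundreds to ~1000 rows per commit are the usual starting point in Phoenix tuning guidance; the right value depends on row width and should be measured.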