[jira] [Commented] (KUDU-2984) memory_gc-itest is flaky
[ https://issues.apache.org/jira/browse/KUDU-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959357#comment-16959357 ] Yingchun Lai commented on KUDU-2984: I'll take a look. > memory_gc-itest is flaky > > > Key: KUDU-2984 > URL: https://issues.apache.org/jira/browse/KUDU-2984 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.11.0, 1.12.0 >Reporter: Alexey Serbin >Priority: Minor > Attachments: memory_gc-itest.txt.xz > > > The {{memory_gc-itest}} fails time to time with the following error message > (DEBUG build): > {noformat} > src/kudu/integration-tests/memory_gc-itest.cc:117: Failure > Expected: (ratio) >= (0.1), actual: 0.0600604 vs 0.1 > tserver-2 > src/kudu/util/test_util.cc:339: Failure > Failed > Timed out waiting for assertion to pass. > {noformat} > The full log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-2984) memory_gc-itest is flaky
[ https://issues.apache.org/jira/browse/KUDU-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960240#comment-16960240 ] Yingchun Lai commented on KUDU-2984: I found 2 problems that may make this test flaky: # The test table has only 1 tablet (by default) and 1 replica, so data is written to only 1 tserver; the other 2 may not consume much memory when we run the scan workload. # For tserver-1, the scan workload runs concurrently with the periodic memory GC; in a corner case, GC may always take place after the CHECK, so the CHECK always fails. I'm sorry that this test is still flaky after 2 fix patches :(. Now I'm trying to fix it again in [https://gerrit.cloudera.org/c/14553/,|https://gerrit.cloudera.org/c/14553/] and have repeated the test thousands of times on Ubuntu 18.04. I'll do more testing before I try to merge it. > memory_gc-itest is flaky > > > Key: KUDU-2984 > URL: https://issues.apache.org/jira/browse/KUDU-2984 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.11.0, 1.12.0 >Reporter: Alexey Serbin >Assignee: Yingchun Lai >Priority: Minor > Attachments: memory_gc-itest.txt.xz > > > The {{memory_gc-itest}} fails time to time with the following error message > (DEBUG build): > {noformat} > src/kudu/integration-tests/memory_gc-itest.cc:117: Failure > Expected: (ratio) >= (0.1), actual: 0.0600604 vs 0.1 > tserver-2 > src/kudu/util/test_util.cc:339: Failure > Failed > Timed out waiting for assertion to pass. > {noformat} > The full log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KUDU-2984) memory_gc-itest is flaky
[ https://issues.apache.org/jira/browse/KUDU-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960240#comment-16960240 ] Yingchun Lai edited comment on KUDU-2984 at 10/26/19 3:01 AM: -- I found 2 problems that may make this test flaky: # The test table has only 1 tablet (by default) and 1 replica, so data is written to only 1 tserver; the other 2 may not consume much memory when we run the scan workload. # For tserver-1, the scan workload runs concurrently with the periodic memory GC; in a corner case, GC may always take place after the CHECK, so the CHECK always fails. I'm sorry that this test is still flaky after 2 fix patches :(. Now I'm trying to fix it again in [https://gerrit.cloudera.org/c/14553/,|https://gerrit.cloudera.org/c/14553/] and have repeated the test thousands of times on Ubuntu 18.04, and all runs passed. However, I'll do more testing before I try to merge it. was (Author: acelyc111): I found 2 problems that may make this test flaky: # The test table has only 1 tablet (by default) and 1 replica, so data is written to only 1 tserver; the other 2 may not consume much memory when we run the scan workload. # For tserver-1, the scan workload runs concurrently with the periodic memory GC; in a corner case, GC may always take place after the CHECK, so the CHECK always fails. I'm sorry that this test is still flaky after 2 fix patches :(. Now I'm trying to fix it again in [https://gerrit.cloudera.org/c/14553/,|https://gerrit.cloudera.org/c/14553/] and have repeated the test thousands of times on Ubuntu 18.04. I'll do more testing before I try to merge it. > memory_gc-itest is flaky > > > Key: KUDU-2984 > URL: https://issues.apache.org/jira/browse/KUDU-2984 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.11.0, 1.12.0 >Reporter: Alexey Serbin >Assignee: Yingchun Lai >Priority: Minor > Attachments: memory_gc-itest.txt.xz > > > The {{memory_gc-itest}} fails time to time with the following error message > (DEBUG build): > {noformat} > src/kudu/integration-tests/memory_gc-itest.cc:117: Failure > Expected: (ratio) >= (0.1), actual: 0.0600604 vs 0.1 > tserver-2 > src/kudu/util/test_util.cc:339: Failure > Failed > Timed out waiting for assertion to pass. > {noformat} > The full log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
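To illustrate the first point above, here is a minimal sketch (not the actual fix patch) of how the test table could be created so that every tablet server in a 3-node cluster hosts data and therefore consumes memory during the scan workload; the table and column names are illustrative only:
{code:java}
#include <memory>
#include <vector>

#include "kudu/client/client.h"

using kudu::client::KuduClient;
using kudu::client::KuduColumnSchema;
using kudu::client::KuduSchema;
using kudu::client::KuduSchemaBuilder;
using kudu::client::KuduTableCreator;

// Spread the written data across all 3 tablet servers by hash-partitioning
// the table into 3 tablets with replication factor 1, so each tserver owns
// one tablet and consumes memory under the scan workload.
kudu::Status CreateSpreadOutTestTable(const kudu::client::sp::shared_ptr<KuduClient>& client) {
  KuduSchema schema;
  KuduSchemaBuilder b;
  b.AddColumn("key")->Type(KuduColumnSchema::INT64)->NotNull()->PrimaryKey();
  b.AddColumn("val")->Type(KuduColumnSchema::STRING);
  kudu::Status s = b.Build(&schema);
  if (!s.ok()) return s;

  std::unique_ptr<KuduTableCreator> creator(client->NewTableCreator());
  return creator->table_name("memory_gc_test_table")
      .schema(&schema)
      .add_hash_partitions({"key"}, 3)  // one hash bucket per tablet server
      .num_replicas(1)
      .Create();
}
{code}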
[jira] [Commented] (KUDU-2879) Build hangs in DEBUG type on Ubuntu 18.04
[ https://issues.apache.org/jira/browse/KUDU-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965333#comment-16965333 ] Yingchun Lai commented on KUDU-2879: I'm building on an x86 machine. I didn't do anything specific to address this, but after some OS upgrades I can now build it. > Build hangs in DEBUG type on Ubuntu 18.04 > - > > Key: KUDU-2879 > URL: https://issues.apache.org/jira/browse/KUDU-2879 > Project: Kudu > Issue Type: Improvement >Reporter: Yingchun Lai >Priority: Major > Attachments: config.diff, config.log > > > Few months ago, I report this issue on Slack: > [https://getkudu.slack.com/archives/C0CPXJ3CH/p1549942641041600] > I switch to RELEASE type then on, and haven't try build on DEBUG type on my > Ubuntu environment. > Now, when I try build DEBUG type to check 1.10.0-RC2, this issue occurred > again. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (KUDU-2879) Build hangs in DEBUG type on Ubuntu 18.04
[ https://issues.apache.org/jira/browse/KUDU-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai closed KUDU-2879. -- Resolution: Fixed > Build hangs in DEBUG type on Ubuntu 18.04 > - > > Key: KUDU-2879 > URL: https://issues.apache.org/jira/browse/KUDU-2879 > Project: Kudu > Issue Type: Improvement >Reporter: Yingchun Lai >Priority: Major > Attachments: config.diff, config.log > > > Few months ago, I report this issue on Slack: > [https://getkudu.slack.com/archives/C0CPXJ3CH/p1549942641041600] > I switch to RELEASE type then on, and haven't try build on DEBUG type on my > Ubuntu environment. > Now, when I try build DEBUG type to check 1.10.0-RC2, this issue occurred > again. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-2992) Limit concurrent alter request of a table
Yingchun Lai created KUDU-2992: -- Summary: Limit concurrent alter request of a table Key: KUDU-2992 URL: https://issues.apache.org/jira/browse/KUDU-2992 Project: Kudu Issue Type: Improvement Components: master Reporter: Yingchun Lai One of our production clusters had an accident some days ago. One user has a table whose partition schema looks like: {code:java}
HASH (uuid) PARTITIONS 80,
RANGE (date_hour) (
 PARTITION 2019102900 <= VALUES < 2019102901,
 PARTITION 2019102901 <= VALUES < 2019102902,
 PARTITION 2019102902 <= VALUES < 2019102903,
 PARTITION 2019102903 <= VALUES < 2019102904,
 PARTITION 2019102904 <= VALUES < 2019102905,
 ...)
{code} He tried to remove many outdated partitions at once via SparkSQL, but it returned a timeout error at first; then he tried again and again, and SparkSQL failed again and again. Then the cluster became unstable, with memory usage and CPU load increasing. I found many log entries like: {code:java}
W1030 17:29:53.382287 7588 rpcz_store.cc:259] Trace:
1030 17:26:19.714799 (+ 0us) service_pool.cc:162] Inserting onto call queue
1030 17:26:19.714808 (+ 9us) service_pool.cc:221] Handling call
1030 17:29:53.382204 (+213667396us) ts_tablet_manager.cc:874] Deleting tablet c52c5f43f7884d08b07fd0005e878fed
1030 17:29:53.382205 (+ 1us) ts_tablet_manager.cc:794] Acquired tablet manager lock
1030 17:29:53.382208 (+ 3us) inbound_call.cc:162] Queueing success response
Metrics: {"tablet-delete.queue_time_us":213667360}
W1030 17:29:53.382300 7586 rpcz_store.cc:253] Call kudu.tserver.TabletServerAdminService.DeleteTablet from 10.152.49.21:55576 (request call id 1820316) took 213667 ms (3.56 min). Client timeout 2 ms (30 s)
W1030 17:29:53.382292 10623 rpcz_store.cc:253] Call kudu.tserver.TabletServerAdminService.DeleteTablet from 10.152.49.21:55576 (request call id 1820315) took 213667 ms (3.56 min). Client timeout 2 ms (30 s)
W1030 17:29:53.382297 10622 rpcz_store.cc:259] Trace:
1030 17:26:19.714825 (+ 0us) service_pool.cc:162] Inserting onto call queue
1030 17:26:19.714833 (+ 8us) service_pool.cc:221] Handling call
1030 17:29:53.382239 (+213667406us) ts_tablet_manager.cc:874] Deleting tablet 479f8c592f16408c830637a0129359e1
1030 17:29:53.382241 (+ 2us) ts_tablet_manager.cc:794] Acquired tablet manager lock
1030 17:29:53.382244 (+ 3us) inbound_call.cc:162] Queueing success response
Metrics: {"tablet-delete.queue_time_us":213667378}
{code} That means the 'Acquired tablet manager lock' step took a long time, right? {code:java}
Status TSTabletManager::BeginReplicaStateTransition(
    const string& tablet_id,
    const string& reason,
    scoped_refptr* replica,
    scoped_refptr* deleter,
    TabletServerErrorPB::Code* error_code) {
  // Acquire the lock in exclusive mode as we'll add a entry to the
  // transition_in_progress_ map.
  std::lock_guard lock(lock_);
  TRACE("Acquired tablet manager lock");
  RETURN_NOT_OK(CheckRunningUnlocked(error_code));
  ...
}
{code} But I think the root cause is that the Kudu master sends too many duplicate 'alter table/delete tablet' requests to the tservers. 
I found more info in master's log: {code:java}
$ grep "Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d" kudu_master.zjy-hadoop-prc-ct01.bj.work.log.INFO.20191030-204137.62788 | egrep "attempt = 1\)"
I1030 20:41:42.207222 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 43 ms (attempt = 1)
I1030 20:41:42.207556 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 40 ms (attempt = 1)
I1030 20:41:42.260052 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 31 ms (attempt = 1)
I1030 20:41:42.278609 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 19 ms (attempt = 1)
I1030 20:41:42.312175 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 48 ms (attempt = 1)
I1030 20:41:42.318933 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 62 ms (attempt = 1)
I1030 20:41:42.340060 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 30 ms (attempt = 1)
I1030 20:41:42.475689 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a380
[jira] [Updated] (KUDU-2992) Limit concurrent alter request of a table
[ https://issues.apache.org/jira/browse/KUDU-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-2992: --- Description: One of our production environment clusters cause an accident some days ago, one user has a table, partition schema looks like: {code:java} HASH (uuid) PARTITIONS 80,RANGE (date_hour) ( PARTITION 2019102900 <= VALUES < 2019102901, PARTITION 2019102901 <= VALUES < 2019102902, PARTITION 2019102902 <= VALUES < 2019102903, PARTITION 2019102903 <= VALUES < 2019102904, PARTITION 2019102904 <= VALUES < 2019102905, ...) {code} He try to remove many outdated partitions once by SparkSQL, but it returns an timeout error at first, then he try again and again, and SparkSQL failed again and again. Then the cluster became unstable, memory used and CPU load increasing. I found many log like: {code:java} W1030 17:29:53.382287 7588 rpcz_store.cc:259] Trace: 1030 17:26:19.714799 (+ 0us) service_pool.cc:162] Inserting onto call queue 1030 17:26:19.714808 (+ 9us) service_pool.cc:221] Handling call 1030 17:29:53.382204 (+213667396us) ts_tablet_manager.cc:874] Deleting tablet c52c5f43f7884d08b07fd0005e878fed 1030 17:29:53.382205 (+ 1us) ts_tablet_manager.cc:794] Acquired tablet manager lock 1030 17:29:53.382208 (+ 3us) inbound_call.cc:162] Queueing success response Metrics: {"tablet-delete.queue_time_us":213667360} W1030 17:29:53.382300 7586 rpcz_store.cc:253] Call kudu.tserver.TabletServerAdminService.DeleteTablet from 10.152.49.21:55576 (request call id 1820316) took 213667 ms (3.56 min). Client timeout 2 ms (30 s) W1030 17:29:53.382292 10623 rpcz_store.cc:253] Call kudu.tserver.TabletServerAdminService.DeleteTablet from 10.152.49.21:55576 (request call id 1820315) took 213667 ms (3.56 min). Client timeout 2 ms (30 s) W1030 17:29:53.382297 10622 rpcz_store.cc:259] Trace: 1030 17:26:19.714825 (+ 0us) service_pool.cc:162] Inserting onto call queue 1030 17:26:19.714833 (+ 8us) service_pool.cc:221] Handling call 1030 17:29:53.382239 (+213667406us) ts_tablet_manager.cc:874] Deleting tablet 479f8c592f16408c830637a0129359e1 1030 17:29:53.382241 (+ 2us) ts_tablet_manager.cc:794] Acquired tablet manager lock 1030 17:29:53.382244 (+ 3us) inbound_call.cc:162] Queueing success response Metrics: {"tablet-delete.queue_time_us":213667378} ...{code} That means 'Acquired tablet manager lock' cost much time, right? {code:java} Status TSTabletManager::BeginReplicaStateTransition( const string& tablet_id, const string& reason, scoped_refptr* replica, scoped_refptr* deleter, TabletServerErrorPB::Code* error_code) { // Acquire the lock in exclusive mode as we'll add a entry to the // transition_in_progress_ map. std::lock_guard lock(lock_); TRACE("Acquired tablet manager lock"); RETURN_NOT_OK(CheckRunningUnlocked(error_code)); ... }{code} But I think the root case is Kudu master send too many duplicate 'alter table/delete tablet' request to tserver. 
I found more info in master's log: {code:java} $ grep "Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d" kudu_master.zjy-hadoop-prc-ct01.bj.work.log.INFO.20191030-204137.62788 | egrep "attempt = 1\)" I1030 20:41:42.207222 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 43 ms (attempt = 1) I1030 20:41:42.207556 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 40 ms (attempt = 1) I1030 20:41:42.260052 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 31 ms (attempt = 1) I1030 20:41:42.278609 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 19 ms (attempt = 1) I1030 20:41:42.312175 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 48 ms (attempt = 1) I1030 20:41:42.318933 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 62 ms (attempt = 1) I1030 20:41:42.340060 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 30 ms (attempt = 1) ...{code} That means master received too many duplicate 'delete tablet' request from client, and then dispatch these request to tservers. I think we should limit the concurrent alter request of a table, when a alter request is on going and hasn't been finished, the following request should be re
[jira] [Updated] (KUDU-2992) Limit concurrent alter request of a table
[ https://issues.apache.org/jira/browse/KUDU-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-2992: --- Description: One of our production clusters had an accident some days ago. One user has a table whose partition schema looks like: {code:java}
HASH (uuid) PARTITIONS 80,
RANGE (date_hour) (
 PARTITION 2019102900 <= VALUES < 2019102901,
 PARTITION 2019102901 <= VALUES < 2019102902,
 PARTITION 2019102902 <= VALUES < 2019102903,
 PARTITION 2019102903 <= VALUES < 2019102904,
 PARTITION 2019102904 <= VALUES < 2019102905,
 ...)
{code} He tried to remove many outdated partitions at once via SparkSQL, but it returned a timeout error at first; then he tried again and again, and SparkSQL failed again and again. Then the cluster became unstable, with memory usage and CPU load increasing. I found many log entries like: {code:java}
W1030 17:29:53.382287 7588 rpcz_store.cc:259] Trace:
1030 17:26:19.714799 (+ 0us) service_pool.cc:162] Inserting onto call queue
1030 17:26:19.714808 (+ 9us) service_pool.cc:221] Handling call
1030 17:29:53.382204 (+213667396us) ts_tablet_manager.cc:874] Deleting tablet c52c5f43f7884d08b07fd0005e878fed
1030 17:29:53.382205 (+ 1us) ts_tablet_manager.cc:794] Acquired tablet manager lock
1030 17:29:53.382208 (+ 3us) inbound_call.cc:162] Queueing success response
Metrics: {"tablet-delete.queue_time_us":213667360}
W1030 17:29:53.382300 7586 rpcz_store.cc:253] Call kudu.tserver.TabletServerAdminService.DeleteTablet from 10.152.49.21:55576 (request call id 1820316) took 213667 ms (3.56 min). Client timeout 2 ms (30 s)
W1030 17:29:53.382292 10623 rpcz_store.cc:253] Call kudu.tserver.TabletServerAdminService.DeleteTablet from 10.152.49.21:55576 (request call id 1820315) took 213667 ms (3.56 min). Client timeout 2 ms (30 s)
W1030 17:29:53.382297 10622 rpcz_store.cc:259] Trace:
1030 17:26:19.714825 (+ 0us) service_pool.cc:162] Inserting onto call queue
1030 17:26:19.714833 (+ 8us) service_pool.cc:221] Handling call
1030 17:29:53.382239 (+213667406us) ts_tablet_manager.cc:874] Deleting tablet 479f8c592f16408c830637a0129359e1
1030 17:29:53.382241 (+ 2us) ts_tablet_manager.cc:794] Acquired tablet manager lock
1030 17:29:53.382244 (+ 3us) inbound_call.cc:162] Queueing success response
Metrics: {"tablet-delete.queue_time_us":213667378}
...
{code} That means the 'Acquired tablet manager lock' step took a long time, right? {code:java}
Status TSTabletManager::BeginReplicaStateTransition(
    const string& tablet_id,
    const string& reason,
    scoped_refptr* replica,
    scoped_refptr* deleter,
    TabletServerErrorPB::Code* error_code) {
  // Acquire the lock in exclusive mode as we'll add a entry to the
  // transition_in_progress_ map.
  std::lock_guard lock(lock_);
  TRACE("Acquired tablet manager lock");
  RETURN_NOT_OK(CheckRunningUnlocked(error_code));
  ...
}
{code} But I think the root cause is that the Kudu master sends too many duplicate 'alter table/delete tablet' requests to the tservers. 
I found more info in master's log: {code:java}
$ grep "Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d" kudu_master.zjy-hadoop-prc-ct01.bj.work.log.INFO.20191030-204137.62788 | egrep "attempt = 1\)"
I1030 20:41:42.207222 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 43 ms (attempt = 1)
I1030 20:41:42.207556 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 40 ms (attempt = 1)
I1030 20:41:42.260052 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 31 ms (attempt = 1)
I1030 20:41:42.278609 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 19 ms (attempt = 1)
I1030 20:41:42.312175 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 48 ms (attempt = 1)
I1030 20:41:42.318933 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 62 ms (attempt = 1)
I1030 20:41:42.340060 62821 catalog_manager.cc:2971] Scheduling retry of 8f8b354490684bf3a54e49a1478ec99d Delete Tablet RPC for TS=d50ddd2e763e4d5e81828a3807187b2e with a delay of 30 ms (attempt = 1)
...
{code} That means the master received too many duplicate 'delete tablet' requests from the client, and then dispatched these requests to the tservers. I think we should limit the concurrent alter requests of a table: when an alter request is ongoing and hasn't finished, the following requests should be r
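A minimal, self-contained sketch of the kind of per-table throttling proposed here (purely illustrative, not the catalog manager's actual code): the master would admit at most one unfinished alteration per table and reject duplicates instead of queueing endless retries.
{code:java}
#include <mutex>
#include <string>
#include <unordered_set>

// Illustrative only: track tables with an unfinished ALTER so duplicate
// requests can be rejected immediately instead of piling up retries.
class AlterTableThrottle {
 public:
  // Returns true if the caller may start altering 'table_id' now; returns
  // false if another alteration of the same table is still in progress.
  bool TryBegin(const std::string& table_id) {
    std::lock_guard<std::mutex> l(mu_);
    return in_progress_.insert(table_id).second;
  }

  // Called once the alteration (including all tablet RPCs) has finished.
  void End(const std::string& table_id) {
    std::lock_guard<std::mutex> l(mu_);
    in_progress_.erase(table_id);
  }

 private:
  std::mutex mu_;
  std::unordered_set<std::string> in_progress_;
};
{code}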
[jira] [Commented] (KUDU-2453) kudu should stop creating tablet infinitely
[ https://issues.apache.org/jira/browse/KUDU-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977296#comment-16977296 ] Yingchun Lai commented on KUDU-2453: We also happend to see this issue, I created another Jira to trace it, and also give some ideas to resolve it. > kudu should stop creating tablet infinitely > --- > > Key: KUDU-2453 > URL: https://issues.apache.org/jira/browse/KUDU-2453 > Project: Kudu > Issue Type: Bug > Components: master, tserver >Affects Versions: 1.4.0, 1.7.2 >Reporter: LiFu He >Priority: Major > > I have met this problem again on 2018/10/26. And now the kudu version is > 1.7.2. > - > We modified the flag 'max_create_tablets_per_ts' (2000) of master.conf, and > there are some load on the kudu cluster. Then someone else created a big > table which had tens of thousands of tablets from impala-shell (that was a > mistake). > {code:java} > CREATE TABLE XXX( > ... >PRIMARY KEY (...) > ) > PARTITION BY HASH (...) PARTITIONS 100, > RANGE (...) > ( > PARTITION "2018-10-24" <= VALUES < "2018-10-24\000", > PARTITION "2018-10-25" <= VALUES < "2018-10-25\000", > ... > PARTITION "2018-12-07" <= VALUES < "2018-12-07\000" > ) > STORED AS KUDU > TBLPROPERTIES ('kudu.master_addresses'= '...'); > {code} > Here are the logs after creating table (only pick one tablet as example): > {code:java} > --Kudu-master log > ==e884bda6bbd3482f94c07ca0f34f99a4== > W1024 11:40:51.914397 180146 catalog_manager.cc:2664] TS > 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): Create Tablet RPC > failed for tablet e884bda6bbd3482f94c07ca0f34f99a4: Remote error: Service > unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService > from 10.120.219.118:50247 dropped due to backpressure. The service queue is > full; it has 512 items. > I1024 11:40:51.914412 180146 catalog_manager.cc:2700] Scheduling retry of > CreateTablet RPC for tablet e884bda6bbd3482f94c07ca0f34f99a4 on TS > 39f15fcf42ef45bba0c95a3223dc25ee with a delay of 42 ms (attempt = 1) > ... > ==Be replaced by 0b144c00f35d48cca4d4981698faef72== > W1024 11:41:22.114512 180202 catalog_manager.cc:3949] T > P f6c9a09da7ef4fc191cab6276b942ba3: Tablet > e884bda6bbd3482f94c07ca0f34f99a4 (table quasi_realtime_user_feature > [id=946d6dd03ec544eab96231e5a03bed59]) was not created within the allowed > timeout. Replacing with a new tablet 0b144c00f35d48cca4d4981698faef72 > ... > I1024 11:41:22.391916 180202 catalog_manager.cc:3806] T > P f6c9a09da7ef4fc191cab6276b942ba3: Sending > DeleteTablet for 3 replicas of tablet e884bda6bbd3482f94c07ca0f34f99a4 > ... > I1024 11:41:22.391927 180202 catalog_manager.cc:2922] Sending > DeleteTablet(TABLET_DATA_DELETED) for tablet e884bda6bbd3482f94c07ca0f34f99a4 > on 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050) (Replaced by > 0b144c00f35d48cca4d4981698faef72 at 2018-10-24 11:41:22 CST) > ... > W1024 11:41:22.428129 180146 catalog_manager.cc:2892] TS > 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): delete failed for > tablet e884bda6bbd3482f94c07ca0f34f99a4 with error code TABLET_NOT_RUNNING: > Already present: State transition of tablet e884bda6bbd3482f94c07ca0f34f99a4 > already in progress: creating tablet > ... > I1024 11:41:22.428143 180146 catalog_manager.cc:2700] Scheduling retry of > e884bda6bbd3482f94c07ca0f34f99a4 Delete Tablet RPC for > TS=39f15fcf42ef45bba0c95a3223dc25ee with a delay of 35 ms (attempt = 1) > ... 
> W1024 11:41:22.683702 180145 catalog_manager.cc:2664] TS > b251540e606b4863bb576091ff961892 (kudu1.lt.163.org:7050): Create Tablet RPC > failed for tablet 0b144c00f35d48cca4d4981698faef72: Remote error: Service > unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService > from 10.120.219.118:59735 dropped due to backpressure. The service queue is > full; it has 512 items. > I1024 11:41:22.683717 180145 catalog_manager.cc:2700] Scheduling retry of > CreateTablet RPC for tablet 0b144c00f35d48cca4d4981698faef72 on TS > b251540e606b4863bb576091ff961892 with a delay of 46 ms (attempt = 1) > ... > ==Be replaced by c0e0acc448fc42fc9e48f5025b112a75== > W1024 11:41:52.775420 180202 catalog_manager.cc:3949] T > P f6c9a09da7ef4fc191cab6276b942ba3: Tablet > 0b144c00f35d48cca4d4981698faef72 (table quasi_realtime_user_feature > [id=946d6dd03ec544eab96231e5a03bed59]) was not created within the allowed > timeout. Replacing with a new tablet c0e0acc448fc42fc9e48f5025b112a75 > ... > --Kudu-tserver log > I1024 11:40:52.014571 137
[jira] [Comment Edited] (KUDU-2453) kudu should stop creating tablet infinitely
[ https://issues.apache.org/jira/browse/KUDU-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977296#comment-16977296 ] Yingchun Lai edited comment on KUDU-2453 at 11/19/19 9:25 AM: -- We also happend to see this issue, I created another Jira to trace it, and also gave some ideas to resolve it. was (Author: acelyc111): We also happend to see this issue, I created another Jira to trace it, and also give some ideas to resolve it. > kudu should stop creating tablet infinitely > --- > > Key: KUDU-2453 > URL: https://issues.apache.org/jira/browse/KUDU-2453 > Project: Kudu > Issue Type: Bug > Components: master, tserver >Affects Versions: 1.4.0, 1.7.2 >Reporter: LiFu He >Priority: Major > > I have met this problem again on 2018/10/26. And now the kudu version is > 1.7.2. > - > We modified the flag 'max_create_tablets_per_ts' (2000) of master.conf, and > there are some load on the kudu cluster. Then someone else created a big > table which had tens of thousands of tablets from impala-shell (that was a > mistake). > {code:java} > CREATE TABLE XXX( > ... >PRIMARY KEY (...) > ) > PARTITION BY HASH (...) PARTITIONS 100, > RANGE (...) > ( > PARTITION "2018-10-24" <= VALUES < "2018-10-24\000", > PARTITION "2018-10-25" <= VALUES < "2018-10-25\000", > ... > PARTITION "2018-12-07" <= VALUES < "2018-12-07\000" > ) > STORED AS KUDU > TBLPROPERTIES ('kudu.master_addresses'= '...'); > {code} > Here are the logs after creating table (only pick one tablet as example): > {code:java} > --Kudu-master log > ==e884bda6bbd3482f94c07ca0f34f99a4== > W1024 11:40:51.914397 180146 catalog_manager.cc:2664] TS > 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): Create Tablet RPC > failed for tablet e884bda6bbd3482f94c07ca0f34f99a4: Remote error: Service > unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService > from 10.120.219.118:50247 dropped due to backpressure. The service queue is > full; it has 512 items. > I1024 11:40:51.914412 180146 catalog_manager.cc:2700] Scheduling retry of > CreateTablet RPC for tablet e884bda6bbd3482f94c07ca0f34f99a4 on TS > 39f15fcf42ef45bba0c95a3223dc25ee with a delay of 42 ms (attempt = 1) > ... > ==Be replaced by 0b144c00f35d48cca4d4981698faef72== > W1024 11:41:22.114512 180202 catalog_manager.cc:3949] T > P f6c9a09da7ef4fc191cab6276b942ba3: Tablet > e884bda6bbd3482f94c07ca0f34f99a4 (table quasi_realtime_user_feature > [id=946d6dd03ec544eab96231e5a03bed59]) was not created within the allowed > timeout. Replacing with a new tablet 0b144c00f35d48cca4d4981698faef72 > ... > I1024 11:41:22.391916 180202 catalog_manager.cc:3806] T > P f6c9a09da7ef4fc191cab6276b942ba3: Sending > DeleteTablet for 3 replicas of tablet e884bda6bbd3482f94c07ca0f34f99a4 > ... > I1024 11:41:22.391927 180202 catalog_manager.cc:2922] Sending > DeleteTablet(TABLET_DATA_DELETED) for tablet e884bda6bbd3482f94c07ca0f34f99a4 > on 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050) (Replaced by > 0b144c00f35d48cca4d4981698faef72 at 2018-10-24 11:41:22 CST) > ... > W1024 11:41:22.428129 180146 catalog_manager.cc:2892] TS > 39f15fcf42ef45bba0c95a3223dc25ee (kudu2.lt.163.org:7050): delete failed for > tablet e884bda6bbd3482f94c07ca0f34f99a4 with error code TABLET_NOT_RUNNING: > Already present: State transition of tablet e884bda6bbd3482f94c07ca0f34f99a4 > already in progress: creating tablet > ... 
> I1024 11:41:22.428143 180146 catalog_manager.cc:2700] Scheduling retry of > e884bda6bbd3482f94c07ca0f34f99a4 Delete Tablet RPC for > TS=39f15fcf42ef45bba0c95a3223dc25ee with a delay of 35 ms (attempt = 1) > ... > W1024 11:41:22.683702 180145 catalog_manager.cc:2664] TS > b251540e606b4863bb576091ff961892 (kudu1.lt.163.org:7050): Create Tablet RPC > failed for tablet 0b144c00f35d48cca4d4981698faef72: Remote error: Service > unavailable: CreateTablet request on kudu.tserver.TabletServerAdminService > from 10.120.219.118:59735 dropped due to backpressure. The service queue is > full; it has 512 items. > I1024 11:41:22.683717 180145 catalog_manager.cc:2700] Scheduling retry of > CreateTablet RPC for tablet 0b144c00f35d48cca4d4981698faef72 on TS > b251540e606b4863bb576091ff961892 with a delay of 46 ms (attempt = 1) > ... > ==Be replaced by c0e0acc448fc42fc9e48f5025b112a75== > W1024 11:41:52.775420 180202 catalog_manager.cc:3949] T > P f6c9a09da7ef4fc191cab6276b942ba3: Tablet > 0b144c00f35d48cca4d4981698faef72 (table quasi_realtime_user_feature > [id=946d6dd03ec544eab96231e5a03bed59]) was not created within t
[jira] [Created] (KUDU-3001) Multi-thread to load containers in a data directory
Yingchun Lai created KUDU-3001: -- Summary: Multi-thread to load containers in a data directory Key: KUDU-3001 URL: https://issues.apache.org/jira/browse/KUDU-3001 Project: Kudu Issue Type: Improvement Reporter: Yingchun Lai Assignee: Yingchun Lai As [~tlipcon] mentioned in https://issues.apache.org/jira/browse/KUDU-2014, we can improve tserver startup time by loading the containers in a data directory with multiple threads. -- This message was sent by Atlassian Jira (v8.3.4#803005)
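A rough, self-contained sketch of the idea (not the actual block manager code; LoadContainer() here is a hypothetical stand-in for the real per-container open/replay step):
{code:java}
#include <future>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for opening one log block container and replaying
// its metadata; in Kudu this work is done by the log block manager.
bool LoadContainer(const std::string& path) {
  std::cout << "loading container " << path << std::endl;
  return true;
}

// Fan the container loads of a single data directory out to several threads
// instead of opening them one by one in a sequential loop.
bool LoadContainersInParallel(const std::vector<std::string>& containers) {
  std::vector<std::future<bool>> results;
  results.reserve(containers.size());
  for (const auto& c : containers) {
    results.emplace_back(std::async(std::launch::async, LoadContainer, c));
  }
  bool ok = true;
  for (auto& r : results) {
    ok = r.get() && ok;  // wait for every load and collect failures
  }
  return ok;
}
{code}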
[jira] [Created] (KUDU-3102) tabletserver coredump in jsonwriter
Yingchun Lai created KUDU-3102: -- Summary: tabletserver coredump in jsonwriter Key: KUDU-3102 URL: https://issues.apache.org/jira/browse/KUDU-3102 Project: Kudu Issue Type: Bug Components: tserver Affects Versions: 1.10.1 Reporter: Yingchun Lai A tserver coredump happened, backtrace like fowllowing: {code:java} [Thread debugging using libthread_db enabled][Thread debugging using libthread_db enabled]Using host libthread_db library "/lib64/libthread_db.so.1".Missing separate debuginfo for /home/work/app/kudu/c3tst-dev/master/package/libcrypto.so.10Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/35/93fa778645a59ea272dbbb59d318c60940e792.debugCore was generated by `/home/work/app/kudu/c3tst-dev/master/package/kudu_master -default_num_replicas='.Program terminated with signal 11, Segmentation fault.#0 GetStackTrace_x86 (result=0x7fbf7232fa00, max_depth=31, skip_count=0) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h:328328 /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h: No such file or directory.Missing separate debuginfos, use: debuginfo-install cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64 cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64 cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64 zlib-1.2.7-17.el7.x86_64(gdb) bt#0 GetStackTrace_x86 (result=0x7fbf7232fa00, max_depth=31, skip_count=0) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h:328#1 0x00b9992b in GetStackTrace (result=result@entry=0x7fbf7232fa00, max_depth=max_depth@entry=31, skip_count=skip_count@entry=1) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace.cc:295#2 0x00b8c14d in DoSampledAllocation (size=size@entry=16385) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1169#3 0x0289f151 in do_malloc (size=16385) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1361#4 do_allocate_full (size=16385) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1751#5 tcmalloc::allocate_full_cpp_throw_oom (size=16385) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1765#6 0x0289f2a7 in dispatch_allocate_full (size=) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1774#7 malloc_fast_path (size=) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1845#8 tc_new (size=) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1969#9 0x7fbf79c785cd in std::__cxx11::basic_string, std::allocator >::reserve (this=this@entry=0x7fbf7232fbb0, __res=) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:293#10 0x7fbf79c6be0b in std::__cxx11::basic_stringbuf, std::allocator >::overflow (this=0x7fbf72330668, __c=83) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/sstream.tcc:133#11 0x7fbf79c76b89 in std::basic_streambuf >::xsputn (this=0x7fbf72330668, __s=0x6929232 
"Service_RequestConsensusVote\",\"total_count\":1,\"min\":104,\"mean\":104.0,\"percentile_75\":104,\"percentile_95\":104,\"percentile_99\":104,\"percentile_99_9\":104,\"percentile_99_99\":104,\"max\":104,\"total_sum\":104}"..., __n=250) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/streambuf.tcc:98#12 0x7fbf79c66b62 in sputn (__n=250, __s=, this=) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/streambuf:451#13 _M_write (__n=250, __s=, this=0x7fbf72330660) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/ostream:313#14 std::ostream::write (this=0x7fbf72330660, __s=0x6929200 ",{\"name\":\"handler_latency_kudu_consensus_ConsensusService_RequestConsensusVote\",\"total_count\":1,\"min\":104,\"mean\":104.0,\"percentile_75\":104,\"percentile_95\":104,\"percentile_99\":104,\"percentile_99_9\":104"..., __n=250) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/ostream.tcc:196#15 0x0265bb1c in Flush (this=0x58c75f8) at /home/laiyingchun/kudu_xm/src/kudu/util/jsonwriter.cc:307#16 kudu::JsonWriterImpl, rapidjson::UTF8, rapidjson::CrtAllocator, 0u> >::EndObject (this=0x58c75f0) at /home/laiyingchun/kudu_xm/src/kudu/util/jsonwriter.cc:34
[jira] [Updated] (KUDU-3102) tabletserver coredump in jsonwriter
[ https://issues.apache.org/jira/browse/KUDU-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3102: --- Description: A tserver coredump happened, backtrace like fowllowing: {code:java} [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Missing separate debuginfo for /home/work/app/kudu/c3tst-dev/master/package/libcrypto.so.10 Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/35/93fa778645a59ea272dbbb59d318c60940e792.debug Core was generated by `/home/work/app/kudu/c3tst-dev/master/package/kudu_master -default_num_replicas='. Program terminated with signal 11, Segmentation fault. #0 GetStackTrace_x86 (result=0x7fbf7232fa00, max_depth=31, skip_count=0) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h:328 328 /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h: No such file or directory. Missing separate debuginfos, use: debuginfo-install cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64 cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64 cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64 zlib-1.2.7-17.el7.x86_64 (gdb) bt #0 GetStackTrace_x86 (result=0x7fbf7232fa00, max_depth=31, skip_count=0) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h:328 #1 0x00b9992b in GetStackTrace (result=result@entry=0x7fbf7232fa00, max_depth=max_depth@entry=31, skip_count=skip_count@entry=1) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace.cc:295 #2 0x00b8c14d in DoSampledAllocation (size=size@entry=16385) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1169 #3 0x0289f151 in do_malloc (size=16385) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1361 #4 do_allocate_full (size=16385) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1751 #5 tcmalloc::allocate_full_cpp_throw_oom (size=16385) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1765 #6 0x0289f2a7 in dispatch_allocate_full (size=) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1774 #7 malloc_fast_path (size=) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1845 #8 tc_new (size=) at /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1969 #9 0x7fbf79c785cd in std::__cxx11::basic_string, std::allocator >::reserve (this=this@entry=0x7fbf7232fbb0, __res=) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:293 #10 0x7fbf79c6be0b in std::__cxx11::basic_stringbuf, std::allocator >::overflow (this=0x7fbf72330668, __c=83) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/sstream.tcc:133 #11 0x7fbf79c76b89 in std::basic_streambuf >::xsputn (this=0x7fbf72330668, __s=0x6929232 
"Service_RequestConsensusVote\",\"total_count\":1,\"min\":104,\"mean\":104.0,\"percentile_75\":104,\"percentile_95\":104,\"percentile_99\":104,\"percentile_99_9\":104,\"percentile_99_99\":104,\"max\":104,\"total_sum\":104}"..., __n=250) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/streambuf.tcc:98 #12 0x7fbf79c66b62 in sputn (__n=250, __s=, this=) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/streambuf:451 #13 _M_write (__n=250, __s=, this=0x7fbf72330660) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/ostream:313 #14 std::ostream::write (this=0x7fbf72330660, __s=0x6929200 ",{\"name\":\"handler_latency_kudu_consensus_ConsensusService_RequestConsensusVote\",\"total_count\":1,\"min\":104,\"mean\":104.0,\"percentile_75\":104,\"percentile_95\":104,\"percentile_99\":104,\"percentile_99_9\":104"..., __n=250) at /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/ostream.tcc:196 #15 0x0265bb1c in Flush (this=0x58c75f8) at /home/laiyingchun/kudu_xm/src/kudu/util/jsonwriter.cc:307 #16 kudu::JsonWriterImpl, rapidjson::UTF8, rapidjson::CrtAllocator, 0u> >::EndObject (this=0x58c75f0) at /home/laiyingchun/kudu_xm/src/kudu/util/jsonwriter.cc:345 #17 0x0265a968 in EndObject (this=0x7fbf72330190) at /home/laiyingchun/kudu_xm/src/kudu/util/jsonwriter.cc:151 #18 kudu::JsonWriter::Protobuf (this=this@entry=0x7fbf72330190, pb=...) at /hom
[jira] [Commented] (KUDU-3102) tabletserver coredump in jsonwriter
[ https://issues.apache.org/jira/browse/KUDU-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073308#comment-17073308 ] Yingchun Lai commented on KUDU-3102: [~adar] The server has 128GB physical memory, and when the Kudu master process coredump, only about 16GB is used totally. But another thing I have to point out is that there are several processes running on the server, some Kudu masters, and some java applications. And one of the masters' cluster is running YCSB benchmark, but I don't think it will introduce much load to master. [~tlipcon] Not frequently, just once since last restart about 3 month ago, Kudu master version is 1.10.1. > tabletserver coredump in jsonwriter > --- > > Key: KUDU-3102 > URL: https://issues.apache.org/jira/browse/KUDU-3102 > Project: Kudu > Issue Type: Bug > Components: tserver >Affects Versions: 1.10.1 >Reporter: Yingchun Lai >Priority: Major > > A tserver coredump happened, backtrace like fowllowing: > {code:java} > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib64/libthread_db.so.1". > Missing separate debuginfo for > /home/work/app/kudu/c3tst-dev/master/package/libcrypto.so.10 > Try: yum --enablerepo='*debug*' install > /usr/lib/debug/.build-id/35/93fa778645a59ea272dbbb59d318c60940e792.debug > Core was generated by > `/home/work/app/kudu/c3tst-dev/master/package/kudu_master > -default_num_replicas='. > Program terminated with signal 11, Segmentation fault. > #0 GetStackTrace_x86 (result=0x7fbf7232fa00, max_depth=31, skip_count=0) at > /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h:328 > 328 > /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h: > No such file or directory. 
> Missing separate debuginfos, use: debuginfo-install > cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64 > cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64 > cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 glibc-2.17-157.el7_3.1.x86_64 > keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 > libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64 > libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 > ncurses-libs-5.9-13.20130511.el7.x86_64 > nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 > openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64 > zlib-1.2.7-17.el7.x86_64 > (gdb) bt > #0 GetStackTrace_x86 (result=0x7fbf7232fa00, max_depth=31, skip_count=0) at > /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h:328 > #1 0x00b9992b in GetStackTrace (result=result@entry=0x7fbf7232fa00, > max_depth=max_depth@entry=31, skip_count=skip_count@entry=1) at > /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/stacktrace.cc:295 > #2 0x00b8c14d in DoSampledAllocation (size=size@entry=16385) at > /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1169 > #3 0x0289f151 in do_malloc (size=16385) at > /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1361 > #4 do_allocate_full (size=16385) at > /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1751 > #5 tcmalloc::allocate_full_cpp_throw_oom (size=16385) at > /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1765 > #6 0x0289f2a7 in dispatch_allocate_full > (size=) at > /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1774 > #7 malloc_fast_path (size=) at > /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1845 > #8 tc_new (size=) at > /home/laiyingchun/kudu_xm/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1969 > #9 0x7fbf79c785cd in std::__cxx11::basic_string std::char_traits, std::allocator >::reserve > (this=this@entry=0x7fbf7232fbb0, __res=) > at > /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:293 > #10 0x7fbf79c6be0b in std::__cxx11::basic_stringbuf std::char_traits, std::allocator >::overflow > (this=0x7fbf72330668, __c=83) at > /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/sstream.tcc:133 > #11 0x7fbf79c76b89 in std::basic_streambuf > >::xsputn (this=0x7fbf72330668, > __s=0x6929232 > "Service_RequestConsensusVote\",\"total_count\":1,\"min\":104,\"mean\":104.0,\"percentile_75\":104,\"percentile_95\":104,\"percentile_99\":104,\"percentile_99_9\":104,\"percentile_99_99\":104,\"max\":104,\"total_sum\":104}"..., > __n=250) > at > /home/laiyingchun/gcc-7.4.0-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/streambuf.tcc:98 > #12 0x7fbf79c66b62 in sputn (__n=250, __s=, > this=) at > /home/laiyingchun/gcc-7.4.0-build/x86_64-
[jira] [Updated] (KUDU-2824) Make some tables in high priority in MM compaction
[ https://issues.apache.org/jira/browse/KUDU-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-2824: --- Fix Version/s: 1.10.0 > Make some tables in high priority in MM compaction > -- > > Key: KUDU-2824 > URL: https://issues.apache.org/jira/browse/KUDU-2824 > Project: Kudu > Issue Type: Improvement > Components: tserver >Affects Versions: 1.9.0 >Reporter: Yingchun Lai >Assignee: Yingchun Lai >Priority: Minor > Labels: MM, compaction, maintenance, priority > Fix For: 1.10.0 > > > In a Kudu cluster with thousands of tables, it's hard for a specified > tablet's maintenance OPs to be launched when their scores are not the > highest, even if the table the tablet belongs to is high priority for Kudu > users. > For example, table A has 10 tablets and has total size of 1G, table B has > 1000 tablets and has total size of 100G. Both of them have similar update > writes, i.e. DRSs have similar overlaps, similar redo/undo logs, so they have > similar compaction scores. However, table A has much more reads than table B, > but table A and B are equal in MM, their DRS compactions are lauched equally, > we have to suffer a long time util most of tablets have been compacted in the > cluster to achieve a fast scan. > So, maybe we can introduce some algorithm to detect high priority tables and > speed up compaction of these tables? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KUDU-2824) Make some tables in high priority in MM compaction
[ https://issues.apache.org/jira/browse/KUDU-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103238#comment-17103238 ] Yingchun Lai edited comment on KUDU-2824 at 5/9/20, 10:23 AM: -- [maintenance] Support priorities for tables in MM compaction {{This commit adds a feature to specify different priorities for table compaction. In a Kudu cluster with thousands of tables, it's hard for a specified tablet's maintenance OPs to be launched when their scores are not the highest, even if the table the tablet belongs to is high priority for Kudu users. This patch allows administators to specify different priorities for tables by gflags, these maintenance OPs of these high priority tables have greater chance to be launched. }} {{ Change-Id: I3ea3b73505157678a8fb551656123b64e6bfb304 }} {{Reviewed-on: [http://gerrit.cloudera.org:8080/12852]}} {{Tested-by: Adar Dembo }} {{Reviewed-by: Adar Dembo }} was (Author: acelyc111): [maintenance] Support priorities for tables in MM compaction This commit adds a feature to specify different priorities for table compaction. In a Kudu cluster with thousands of tables, it's hard for a specified tablet's maintenance OPs to be launched when their scores are not the highest, even if the table the tablet belongs to is high priority for Kudu users. This patch allows administators to specify different priorities for tables by gflags, these maintenance OPs of these high priority tables have greater chance to be launched. Change-Id: I3ea3b73505157678a8fb551656123b64e6bfb304 Reviewed-on: [http://gerrit.cloudera.org:8080/12852]Tested-by: Adar Dembo Reviewed-by: Adar Dembo > Make some tables in high priority in MM compaction > -- > > Key: KUDU-2824 > URL: https://issues.apache.org/jira/browse/KUDU-2824 > Project: Kudu > Issue Type: Improvement > Components: tserver >Affects Versions: 1.9.0 >Reporter: Yingchun Lai >Assignee: Yingchun Lai >Priority: Minor > Labels: MM, compaction, maintenance, priority > > In a Kudu cluster with thousands of tables, it's hard for a specified > tablet's maintenance OPs to be launched when their scores are not the > highest, even if the table the tablet belongs to is high priority for Kudu > users. > For example, table A has 10 tablets and has total size of 1G, table B has > 1000 tablets and has total size of 100G. Both of them have similar update > writes, i.e. DRSs have similar overlaps, similar redo/undo logs, so they have > similar compaction scores. However, table A has much more reads than table B, > but table A and B are equal in MM, their DRS compactions are lauched equally, > we have to suffer a long time util most of tablets have been compacted in the > cluster to achieve a fast scan. > So, maybe we can introduce some algorithm to detect high priority tables and > speed up compaction of these tables? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-2824) Make some tables in high priority in MM compaction
[ https://issues.apache.org/jira/browse/KUDU-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103238#comment-17103238 ] Yingchun Lai commented on KUDU-2824: [maintenance] Support priorities for tables in MM compaction This commit adds a feature to specify different priorities for table compaction. In a Kudu cluster with thousands of tables, it's hard for a specific tablet's maintenance OPs to be launched when their scores are not the highest, even if the table the tablet belongs to is high priority for Kudu users. This patch allows administrators to specify different priorities for tables by gflags, so the maintenance OPs of these high-priority tables have a greater chance of being launched. Change-Id: I3ea3b73505157678a8fb551656123b64e6bfb304 Reviewed-on: [http://gerrit.cloudera.org:8080/12852] Tested-by: Adar Dembo Reviewed-by: Adar Dembo > Make some tables in high priority in MM compaction > -- > > Key: KUDU-2824 > URL: https://issues.apache.org/jira/browse/KUDU-2824 > Project: Kudu > Issue Type: Improvement > Components: tserver >Affects Versions: 1.9.0 >Reporter: Yingchun Lai >Assignee: Yingchun Lai >Priority: Minor > Labels: MM, compaction, maintenance, priority > > In a Kudu cluster with thousands of tables, it's hard for a specified > tablet's maintenance OPs to be launched when their scores are not the > highest, even if the table the tablet belongs to is high priority for Kudu > users. > For example, table A has 10 tablets and has total size of 1G, table B has > 1000 tablets and has total size of 100G. Both of them have similar update > writes, i.e. DRSs have similar overlaps, similar redo/undo logs, so they have > similar compaction scores. However, table A has much more reads than table B, > but table A and B are equal in MM, their DRS compactions are lauched equally, > we have to suffer a long time util most of tablets have been compacted in the > cluster to achieve a fast scan. > So, maybe we can introduce some algorithm to detect high priority tables and > speed up compaction of these tables? -- This message was sent by Atlassian Jira (v8.3.4#803005)
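The exact mechanism is in the Gerrit change referenced above; as a purely illustrative sketch of the general idea (the function name and formula are assumptions, not the merged code), a per-table priority could bias the score the maintenance manager uses to pick the next op:
{code:java}
#include <algorithm>
#include <cstdint>

// Illustrative only: combine an op's data-driven score with the priority of
// the table it belongs to, so ops of high-priority tables win close races
// without letting the priority completely override the raw score.
double AdjustedScore(double raw_perf_improvement, int32_t table_priority) {
  // Clamp the configured priority to a small range.
  const int32_t p = std::max<int32_t>(-5, std::min<int32_t>(5, table_priority));
  return raw_perf_improvement * (1.0 + 0.1 * p);
}
{code}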
[jira] [Resolved] (KUDU-2824) Make some tables in high priority in MM compaction
[ https://issues.apache.org/jira/browse/KUDU-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai resolved KUDU-2824. Resolution: Fixed > Make some tables in high priority in MM compaction > -- > > Key: KUDU-2824 > URL: https://issues.apache.org/jira/browse/KUDU-2824 > Project: Kudu > Issue Type: Improvement > Components: tserver >Affects Versions: 1.9.0 >Reporter: Yingchun Lai >Assignee: Yingchun Lai >Priority: Minor > Labels: MM, compaction, maintenance, priority > Fix For: 1.10.0 > > > In a Kudu cluster with thousands of tables, it's hard for a specified > tablet's maintenance OPs to be launched when their scores are not the > highest, even if the table the tablet belongs to is high priority for Kudu > users. > For example, table A has 10 tablets and has total size of 1G, table B has > 1000 tablets and has total size of 100G. Both of them have similar update > writes, i.e. DRSs have similar overlaps, similar redo/undo logs, so they have > similar compaction scores. However, table A has much more reads than table B, > but table A and B are equal in MM, their DRS compactions are lauched equally, > we have to suffer a long time util most of tablets have been compacted in the > cluster to achieve a fast scan. > So, maybe we can introduce some algorithm to detect high priority tables and > speed up compaction of these tables? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-2984) memory_gc-itest is flaky
[ https://issues.apache.org/jira/browse/KUDU-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125073#comment-17125073 ] Yingchun Lai commented on KUDU-2984: Has been fixed by [https://gerrit.cloudera.org/c/14553/]. > memory_gc-itest is flaky > > > Key: KUDU-2984 > URL: https://issues.apache.org/jira/browse/KUDU-2984 > Project: Kudu > Issue Type: Bug > Components: test >Affects Versions: 1.11.0, 1.12.0 >Reporter: Alexey Serbin >Assignee: Yingchun Lai >Priority: Minor > Attachments: memory_gc-itest.txt.xz > > > The {{memory_gc-itest}} fails time to time with the following error message > (DEBUG build): > {noformat} > src/kudu/integration-tests/memory_gc-itest.cc:117: Failure > Expected: (ratio) >= (0.1), actual: 0.0600604 vs 0.1 > tserver-2 > src/kudu/util/test_util.cc:339: Failure > Failed > Timed out waiting for assertion to pass. > {noformat} > The full log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2984) memory_gc-itest is flaky
[ https://issues.apache.org/jira/browse/KUDU-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai resolved KUDU-2984. Fix Version/s: 1.12.0 Resolution: Fixed > memory_gc-itest is flaky > > > Key: KUDU-2984 > URL: https://issues.apache.org/jira/browse/KUDU-2984 > Project: Kudu > Issue Type: Bug > Components: test >Affects Versions: 1.11.0, 1.12.0 >Reporter: Alexey Serbin >Assignee: Yingchun Lai >Priority: Minor > Fix For: 1.12.0 > > Attachments: memory_gc-itest.txt.xz > > > The {{memory_gc-itest}} fails time to time with the following error message > (DEBUG build): > {noformat} > src/kudu/integration-tests/memory_gc-itest.cc:117: Failure > Expected: (ratio) >= (0.1), actual: 0.0600604 vs 0.1 > tserver-2 > src/kudu/util/test_util.cc:339: Failure > Failed > Timed out waiting for assertion to pass. > {noformat} > The full log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2984) memory_gc-itest is flaky
[ https://issues.apache.org/jira/browse/KUDU-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-2984: --- Affects Version/s: (was: 1.12.0) > memory_gc-itest is flaky > > > Key: KUDU-2984 > URL: https://issues.apache.org/jira/browse/KUDU-2984 > Project: Kudu > Issue Type: Bug > Components: test >Affects Versions: 1.11.0 >Reporter: Alexey Serbin >Assignee: Yingchun Lai >Priority: Minor > Fix For: 1.12.0 > > Attachments: memory_gc-itest.txt.xz > > > The {{memory_gc-itest}} fails time to time with the following error message > (DEBUG build): > {noformat} > src/kudu/integration-tests/memory_gc-itest.cc:117: Failure > Expected: (ratio) >= (0.1), actual: 0.0600604 vs 0.1 > tserver-2 > src/kudu/util/test_util.cc:339: Failure > Failed > Timed out waiting for assertion to pass. > {noformat} > The full log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3304) Alter table support set replica number
Yingchun Lai created KUDU-3304: -- Summary: Alter table support set replica number Key: KUDU-3304 URL: https://issues.apache.org/jira/browse/KUDU-3304 Project: Kudu Issue Type: New Feature Components: client Affects Versions: 1.15.0 Reporter: Yingchun Lai For some historical reasons, there may be some tables with only one replica; when we want to increase their replication factor to 3, there seems to be no way to do it. I want to add an alter method to do this work; typically, it will be used in CLI tools. -- This message was sent by Atlassian Jira (v8.3.4#803005)
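A sketch of how the proposed API might look from the C++ client (SetReplicationFactor() is the hypothetical new method being proposed here; it does not exist in the current KuduTableAlterer, so this is illustration only):
{code:java}
#include <memory>
#include <string>

#include "kudu/client/client.h"

using kudu::client::KuduClient;
using kudu::client::KuduTableAlterer;

// Hypothetical usage from a CLI tool: bump a table's replication factor.
kudu::Status SetReplicationFactor(const kudu::client::sp::shared_ptr<KuduClient>& client,
                                  const std::string& table_name,
                                  int new_replication_factor) {
  std::unique_ptr<KuduTableAlterer> alterer(client->NewTableAlterer(table_name));
  // SetReplicationFactor() is the proposed (not yet existing) alteration.
  alterer->SetReplicationFactor(new_replication_factor);
  return alterer->Alter();
}
{code}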
[jira] [Commented] (KUDU-3304) Alter table support set replica number
[ https://issues.apache.org/jira/browse/KUDU-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378847#comment-17378847 ] Yingchun Lai commented on KUDU-3304: duplicate with https://issues.apache.org/jira/browse/KUDU-2357 > Alter table support set replica number > -- > > Key: KUDU-3304 > URL: https://issues.apache.org/jira/browse/KUDU-3304 > Project: Kudu > Issue Type: New Feature > Components: client >Affects Versions: 1.15.0 >Reporter: Yingchun Lai >Priority: Minor > > For some historical reason, there maybe some tables with only one replica, > when we want to increase their replication factor to 3, seems there is no way. > I want to add a alter method to do this work, typically, it will used in CLI > tools. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (KUDU-3304) Alter table support set replica number
[ https://issues.apache.org/jira/browse/KUDU-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3304: --- Comment: was deleted (was: duplicate with https://issues.apache.org/jira/browse/KUDU-2357) > Alter table support set replica number > -- > > Key: KUDU-3304 > URL: https://issues.apache.org/jira/browse/KUDU-3304 > Project: Kudu > Issue Type: New Feature > Components: client >Affects Versions: 1.15.0 >Reporter: Yingchun Lai >Priority: Minor > > For some historical reason, there maybe some tables with only one replica, > when we want to increase their replication factor to 3, seems there is no way. > I want to add a alter method to do this work, typically, it will used in CLI > tools. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-1954) Improve maintenance manager behavior in heavy write workload
[ https://issues.apache.org/jira/browse/KUDU-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388492#comment-17388492 ] Yingchun Lai commented on KUDU-1954: Although we have tried to reduce a single compaction operation's duration, it is still possible that in some special environments compaction OPs run slower than data ingestion. In some environments the machines may have only spinning disks, or even a single spinning disk, and --maintenance_manager_num_threads is set to 1; once that thread is launching some heavy compaction OPs, flush OPs will wait a long time to be launched. I think we can introduce separate flush threads dedicated to flush OPs, similar to how RocksDB works[1]. 1. https://github.com/facebook/rocksdb/blob/4361d6d16380f619833d58225183cbfbb2c7a1dd/include/rocksdb/options.h#L599-L658 > Improve maintenance manager behavior in heavy write workload > > > Key: KUDU-1954 > URL: https://issues.apache.org/jira/browse/KUDU-1954 > Project: Kudu > Issue Type: Improvement > Components: compaction, perf, tserver >Affects Versions: 1.3.0 >Reporter: Todd Lipcon >Priority: Major > Labels: performance, roadmap-candidate, scalability > Attachments: mm-trace.png > > > During the investigation in [this > doc|https://docs.google.com/document/d/1U1IXS1XD2erZyq8_qG81A1gZaCeHcq2i0unea_eEf5c/edit] > I found a few maintenance-manager-related issues during heavy writes: > - we don't schedule flushes until we are already in "backpressure" realm, so > we spent most of our time doing backpressure > - even if we configure N maintenance threads, we typically are only using > ~50% of those threads due to the scheduling granularity > - when we do hit the "memory-pressure flush" threshold, all threads quickly > switch to flushing, which then brings us far beneath the threshold > - long running compactions can temporarily starve flushes > - high volume of writes can starve compactions -- This message was sent by Atlassian Jira (v8.3.4#803005)
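For reference, the RocksDB approach cited in [1] can be sketched roughly as below: flush jobs and compaction jobs get separate background thread budgets, so a long-running compaction cannot starve flushes. This is only an illustration of the referenced RocksDB configuration (the database path is arbitrary), not Kudu maintenance manager code.
{code:c++}
#include <rocksdb/db.h>
#include <rocksdb/env.h>
#include <rocksdb/options.h>

// Minimal sketch of the RocksDB-style separation: flushes run in the HIGH
// priority background pool, compactions in the LOW priority pool.
int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Dedicated thread budgets for flushes and compactions.
  options.env->SetBackgroundThreads(2, rocksdb::Env::HIGH);  // flush pool
  options.env->SetBackgroundThreads(4, rocksdb::Env::LOW);   // compaction pool
  options.max_background_flushes = 2;      // concurrent flush jobs
  options.max_background_compactions = 4;  // concurrent compaction jobs

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/rocksdb_flush_demo", &db);
  if (!s.ok()) return 1;
  delete db;
  return 0;
}
{code}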
[jira] [Created] (KUDU-3318) Log Block Container metadata consumed too much disk space
Yingchun Lai created KUDU-3318: -- Summary: Log Block Container metadata consumed too much disk space Key: KUDU-3318 URL: https://issues.apache.org/jira/browse/KUDU-3318 Project: Kudu Issue Type: Improvement Components: fs Reporter: Yingchun Lai In a log block container, blocks in the .data file are append-only, and there is a related append-only .metadata file to track the blocks in .data. Entries of CREATE type record new blocks, and entries of DELETE type mark the corresponding CREATE block as deleted. If there is a pair of CREATE and DELETE entries for the same block id, LBM uses hole punching to reclaim disk space in the .data file, but the entries in .metadata are not compacted except at bootstrap. The only other bounds on metadata growth are when the .data file offset reaches its size limit (10GB by default), or when the block number in metadata reaches its limit (no limit by default). I found a case in a production environment where metadata consumed almost as much disk space as the .data files; it's a waste, and it makes users confused and complain that the actual disk usage is far more than their data. -- This message was sent by Atlassian Jira (v8.3.4#803005)
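For illustration, the CREATE/DELETE record replay described above can be sketched as below; the record and type names are simplified stand-ins for BlockRecordPB, not the actual Kudu LBM code. A block deleted long ago still contributes two records to the .metadata file even though its .data extent has been hole-punched away.
{code:c++}
#include <cstdint>
#include <unordered_set>
#include <vector>

// Simplified stand-ins for the on-disk records described above.
enum class RecordType { CREATE, DELETE };
struct BlockRecord {
  int64_t block_id;
  RecordType type;
};

// Replay the append-only metadata stream: a CREATE adds a live block, a
// matching DELETE removes it. The file keeps both records forever, so the
// live ratio of a container can become very low.
std::unordered_set<int64_t> LiveBlocks(const std::vector<BlockRecord>& records) {
  std::unordered_set<int64_t> live;
  for (const auto& r : records) {
    if (r.type == RecordType::CREATE) {
      live.insert(r.block_id);
    } else {
      live.erase(r.block_id);
    }
  }
  return live;
}
{code}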
[jira] [Commented] (KUDU-3318) Log Block Container metadata consumed too much disk space
[ https://issues.apache.org/jira/browse/KUDU-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413461#comment-17413461 ] Yingchun Lai commented on KUDU-3318: An easy way to resolve this problem is to add a size limit to the .metadata file too: when it reaches that limit (similar to the data file reaching its offset limit, or the block number reaching its count limit), the container refuses to append more blocks, and then after all of its blocks are deleted the whole container will be removed. > Log Block Container metadata consumed too much disk space > - > > Key: KUDU-3318 > URL: https://issues.apache.org/jira/browse/KUDU-3318 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > In log block container, blocks in .data file are append only, there is a > related append only .metadata file to trace blocks in .data, this type of > entries in metadata are in CREATE type, the other type of entries in metadata > are type of DELETE, it means mark the corresponding CREATE block as deleted. > If there is a pair of CREATE and DELETE entries of a same block id, LBM use > hole punch to reclaim disk space in .data file, but the entries in .metadata > will not be compacted except bootstrap. > Another way to limit metadata is the .data file offset reach its size > limitation(default 10GB), or block number in metadata reach its limitation(no > limit on default). > I found a case in product environment that metadata consumed too many disk > space and near to .data's disk space, it's a waste, and make users confused > and complain that the actual disk space is far more than user's data. -- This message was sent by Atlassian Jira (v8.3.4#803005)
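A rough sketch of the proposed "container is full" check follows. All field and limit names here are hypothetical placeholders (only the 10GB data limit mirrors the current default), not the actual Kudu flags or classes.
{code:c++}
#include <cstdint>

// Hypothetical per-container counters; the real container tracks equivalents.
struct LogBlockContainerStats {
  int64_t data_bytes_written = 0;
  int64_t metadata_bytes_written = 0;
  int64_t total_blocks = 0;
};

// The existing checks on data file size and block count are extended with a
// metadata size limit: once any limit is hit, no new blocks are appended, and
// the container is removed once all of its blocks have been deleted.
bool ContainerIsFull(const LogBlockContainerStats& c) {
  constexpr int64_t kMaxDataBytes = 10LL * 1024 * 1024 * 1024;  // 10GB, as today
  constexpr int64_t kMaxBlocks = 1 << 20;                       // placeholder
  constexpr int64_t kMaxMetadataBytes = 32LL * 1024 * 1024;     // proposed new limit
  return c.data_bytes_written >= kMaxDataBytes ||
         c.total_blocks >= kMaxBlocks ||
         c.metadata_bytes_written >= kMaxMetadataBytes;
}
{code}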
[jira] [Commented] (KUDU-3318) Log Block Container metadata consumed too much disk space
[ https://issues.apache.org/jira/browse/KUDU-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413462#comment-17413462 ] Yingchun Lai commented on KUDU-3318: Another way to improve the situation is to compact metadata at runtime; currently it is only compacted at bootstrap. > Log Block Container metadata consumed too much disk space > - > > Key: KUDU-3318 > URL: https://issues.apache.org/jira/browse/KUDU-3318 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > In log block container, blocks in .data file are append only, there is a > related append only .metadata file to trace blocks in .data, this type of > entries in metadata are in CREATE type, the other type of entries in metadata > are type of DELETE, it means mark the corresponding CREATE block as deleted. > If there is a pair of CREATE and DELETE entries of a same block id, LBM use > hole punch to reclaim disk space in .data file, but the entries in .metadata > will not be compacted except bootstrap. > Another way to limit metadata is the .data file offset reach its size > limitation(default 10GB), or block number in metadata reach its limitation(no > limit on default). > I found a case in product environment that metadata consumed too many disk > space and near to .data's disk space, it's a waste, and make users confused > and complain that the actual disk space is far more than user's data. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KUDU-3318) Log Block Container metadata consumed too much disk space
[ https://issues.apache.org/jira/browse/KUDU-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413462#comment-17413462 ] Yingchun Lai edited comment on KUDU-3318 at 9/11/21, 3:03 AM: -- Another way to optimize the situation is to compact metadata at runtime, now it is only compact at bootstrap. We can implement it in the future. was (Author: laiyingchun): Another way to optimize the situation is to compact metadata at runtime, now it is only compact at bootstrap. > Log Block Container metadata consumed too much disk space > - > > Key: KUDU-3318 > URL: https://issues.apache.org/jira/browse/KUDU-3318 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > In log block container, blocks in .data file are append only, there is a > related append only .metadata file to trace blocks in .data, this type of > entries in metadata are in CREATE type, the other type of entries in metadata > are type of DELETE, it means mark the corresponding CREATE block as deleted. > If there is a pair of CREATE and DELETE entries of a same block id, LBM use > hole punch to reclaim disk space in .data file, but the entries in .metadata > will not be compacted except bootstrap. > Another way to limit metadata is the .data file offset reach its size > limitation(default 10GB), or block number in metadata reach its limitation(no > limit on default). > I found a case in product environment that metadata consumed too many disk > space and near to .data's disk space, it's a waste, and make users confused > and complain that the actual disk space is far more than user's data. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3318) Log Block Container metadata consumed too much disk space
[ https://issues.apache.org/jira/browse/KUDU-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3318: --- Description: In log block container, blocks in .data file are append only, there is a related append only .metadata file to trace blocks in .data, this type of entries in metadata are in CREATE type, the other type of entries in metadata are type of DELETE, it means mark the corresponding CREATE block as deleted. If there is a pair of CREATE and DELETE entries of a same block id, LBM use hole punch to reclaim disk space in .data file, but the entries in .metadata will not be compacted except bootstrap. Another way to limit metadata is the .data file offset reach its size limitation(default 10GB), or block number in metadata reach its limitation(no limit on default). I found a case in product environment that metadata consumed too many disk space and near to .data's disk space, it's a waste, and make users confused and complain that the actual disk space is far more than user's data. {code:java} [root@hybrid01 data]# du -cs *.metadata | sort -n | tail 19072 fb58e00979914e95aae7184e3189c8c6.metadata 19092 5bbf54294d5948c4a695e240e81d5f80.metadata 19168 89da5f3c4dfa469a9935f091bced1856.metadata 19200 f27e6ff14bd44fd1838f63f1be35ee64.metadata 19256 7b87a5e3c7fa4d3d86dcd3945d6741e1.metadata 19256 cf054d1aa7cb4f5cbbbce3b99189bbe1.metadata 19496 a6cbb4a284b842deafe6939be051c77c.metadata 19568 ba749640df684cb8868d6e51ea3d1b17.metadata 19924 e5469080934746e58b0fd2ba29d69c9d.metadata 148954280 total [root@hybrid01 data]# du -cs *.data | sort -n | tail 64568 46dfbc5ac94d429b8d79a536727495df.data 64568 b4abc59d4eb2473ca267e0b057c8fad7.data 65728 576e09ed7e164ddebe5b1702be296619.data 66368 88d295f38dec4197bfbc6927e0528bde.data 90904 7291e10aafe74f2792168f6146738c5d.data 96788 6e72381ae95840f99864baacbc9169af.data 98060 c413553491764d039e702577606bac02.data 103556 a5db7a9c2e93457aa06103e45f59d8b4.data 138200 3876af02694643d49b19b39789460759.data 176443948 total {code} was: In log block container, blocks in .data file are append only, there is a related append only .metadata file to trace blocks in .data, this type of entries in metadata are in CREATE type, the other type of entries in metadata are type of DELETE, it means mark the corresponding CREATE block as deleted. If there is a pair of CREATE and DELETE entries of a same block id, LBM use hole punch to reclaim disk space in .data file, but the entries in .metadata will not be compacted except bootstrap. Another way to limit metadata is the .data file offset reach its size limitation(default 10GB), or block number in metadata reach its limitation(no limit on default). I found a case in product environment that metadata consumed too many disk space and near to .data's disk space, it's a waste, and make users confused and complain that the actual disk space is far more than user's data. > Log Block Container metadata consumed too much disk space > - > > Key: KUDU-3318 > URL: https://issues.apache.org/jira/browse/KUDU-3318 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > In log block container, blocks in .data file are append only, there is a > related append only .metadata file to trace blocks in .data, this type of > entries in metadata are in CREATE type, the other type of entries in metadata > are type of DELETE, it means mark the corresponding CREATE block as deleted. 
> If there is a pair of CREATE and DELETE entries of a same block id, LBM use > hole punch to reclaim disk space in .data file, but the entries in .metadata > will not be compacted except bootstrap. > Another way to limit metadata is the .data file offset reach its size > limitation(default 10GB), or block number in metadata reach its limitation(no > limit on default). > I found a case in product environment that metadata consumed too many disk > space and near to .data's disk space, it's a waste, and make users confused > and complain that the actual disk space is far more than user's data. > > {code:java} > [root@hybrid01 data]# du -cs *.metadata | sort -n | tail > 19072 fb58e00979914e95aae7184e3189c8c6.metadata > 19092 5bbf54294d5948c4a695e240e81d5f80.metadata > 19168 89da5f3c4dfa469a9935f091bced1856.metadata > 19200 f27e6ff14bd44fd1838f63f1be35ee64.metadata > 19256 7b87a5e3c7fa4d3d86dcd3945d6741e1.metadata > 19256 cf054d1aa7cb4f5cbbbce3b99189bbe1.metadata > 19496 a6cbb4a284b842deafe6939be051c77c.metadata > 19568 ba749640df684cb8868d6e51ea3d1b17.metadata > 19924 e5469080934746e58b0fd2ba29d69c9d.metadata > 148954280 total > [root@hybrid01 data]# du -cs *.data | sort -n | tail > 64568 46dfbc5ac94d429b8d79a536727495df.data > 64568 b4abc59d4eb2473ca267e0b057c8fad7.da
[jira] [Updated] (KUDU-3318) Log Block Container metadata consumed too much disk space
[ https://issues.apache.org/jira/browse/KUDU-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3318: --- Description: In log block container, blocks in .data file are append only, there is a related append only .metadata file to trace blocks in .data, this type of entries in metadata are in CREATE type, the other type of entries in metadata are type of DELETE, it means mark the corresponding CREATE block as deleted. If there is a pair of CREATE and DELETE entries of a same block id, LBM use hole punch to reclaim disk space in .data file, but the entries in .metadata will not be compacted except bootstrap. Another way to limit metadata is the .data file offset reach its size limitation(default 10GB), or block number in metadata reach its limitation(no limit on default). I found a case in product environment that metadata consumed too many disk space and near to .data's disk space, it's a waste, and make users confused and complain that the actual disk space is far more than user's data. {code:java} [root@hybrid01 data]# du -cs *.metadata | sort -n | tail 19072 fb58e00979914e95aae7184e3189c8c6.metadata 19092 5bbf54294d5948c4a695e240e81d5f80.metadata 19168 89da5f3c4dfa469a9935f091bced1856.metadata 19200 f27e6ff14bd44fd1838f63f1be35ee64.metadata 19256 7b87a5e3c7fa4d3d86dcd3945d6741e1.metadata 19256 cf054d1aa7cb4f5cbbbce3b99189bbe1.metadata 19496 a6cbb4a284b842deafe6939be051c77c.metadata 19568 ba749640df684cb8868d6e51ea3d1b17.metadata 19924 e5469080934746e58b0fd2ba29d69c9d.metadata 148954280 total// all metadata size ~149GB [root@hybrid01 data]# du -cs *.data | sort -n | tail 64568 46dfbc5ac94d429b8d79a536727495df.data 64568 b4abc59d4eb2473ca267e0b057c8fad7.data 65728 576e09ed7e164ddebe5b1702be296619.data 66368 88d295f38dec4197bfbc6927e0528bde.data 90904 7291e10aafe74f2792168f6146738c5d.data 96788 6e72381ae95840f99864baacbc9169af.data 98060 c413553491764d039e702577606bac02.data 103556 a5db7a9c2e93457aa06103e45f59d8b4.data 138200 3876af02694643d49b19b39789460759.data 176443948 total // all data size ~176GB [root@hybrid01 data]# kudu pbc dump e5469080934746e58b0fd2ba29d69c9d.metadata --oneline | awk '{print $5}' | sort | uniq -c | egrep -v " 2 " 1 6165611810 // low live ratio, only 1 live block {code} was: In log block container, blocks in .data file are append only, there is a related append only .metadata file to trace blocks in .data, this type of entries in metadata are in CREATE type, the other type of entries in metadata are type of DELETE, it means mark the corresponding CREATE block as deleted. If there is a pair of CREATE and DELETE entries of a same block id, LBM use hole punch to reclaim disk space in .data file, but the entries in .metadata will not be compacted except bootstrap. Another way to limit metadata is the .data file offset reach its size limitation(default 10GB), or block number in metadata reach its limitation(no limit on default). I found a case in product environment that metadata consumed too many disk space and near to .data's disk space, it's a waste, and make users confused and complain that the actual disk space is far more than user's data. 
{code:java} [root@hybrid01 data]# du -cs *.metadata | sort -n | tail 19072 fb58e00979914e95aae7184e3189c8c6.metadata 19092 5bbf54294d5948c4a695e240e81d5f80.metadata 19168 89da5f3c4dfa469a9935f091bced1856.metadata 19200 f27e6ff14bd44fd1838f63f1be35ee64.metadata 19256 7b87a5e3c7fa4d3d86dcd3945d6741e1.metadata 19256 cf054d1aa7cb4f5cbbbce3b99189bbe1.metadata 19496 a6cbb4a284b842deafe6939be051c77c.metadata 19568 ba749640df684cb8868d6e51ea3d1b17.metadata 19924 e5469080934746e58b0fd2ba29d69c9d.metadata 148954280 total [root@hybrid01 data]# du -cs *.data | sort -n | tail 64568 46dfbc5ac94d429b8d79a536727495df.data 64568 b4abc59d4eb2473ca267e0b057c8fad7.data 65728 576e09ed7e164ddebe5b1702be296619.data 66368 88d295f38dec4197bfbc6927e0528bde.data 90904 7291e10aafe74f2792168f6146738c5d.data 96788 6e72381ae95840f99864baacbc9169af.data 98060 c413553491764d039e702577606bac02.data 103556 a5db7a9c2e93457aa06103e45f59d8b4.data 138200 3876af02694643d49b19b39789460759.data 176443948 total {code} > Log Block Container metadata consumed too much disk space > - > > Key: KUDU-3318 > URL: https://issues.apache.org/jira/browse/KUDU-3318 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > In log block container, blocks in .data file are append only, there is a > related append only .metadata file to trace blocks in .data, this type of > entries in metadata are in CREATE type, the other type of entries in metadata > are type of DELETE, it means mark the corresponding CREATE block as deleted. > If there is a pair of CREATE and DE
[jira] [Comment Edited] (KUDU-3318) Log Block Container metadata consumed too much disk space
[ https://issues.apache.org/jira/browse/KUDU-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413461#comment-17413461 ] Yingchun Lai edited comment on KUDU-3318 at 9/11/21, 3:29 AM: -- A easy way to resolve this problem is to add a limitation to .metadata file too, when it reach that limit size(similar to data file reach its limit offset, or block number reach its number limit), the container is refused to append more blocks, and then after all blocks are deleted, the whole container will be removed. was (Author: laiyingchun): A easy way to resolve this problem si to add a limitation to .metadata file too, when it reach that limit size(similar to data file reach its limit offset, or block number reach its number limit), the container is refused to append more blocks, and then after all blocks are deleted, the whole container will be removed. > Log Block Container metadata consumed too much disk space > - > > Key: KUDU-3318 > URL: https://issues.apache.org/jira/browse/KUDU-3318 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > In log block container, blocks in .data file are append only, there is a > related append only .metadata file to trace blocks in .data, this type of > entries in metadata are in CREATE type, the other type of entries in metadata > are type of DELETE, it means mark the corresponding CREATE block as deleted. > If there is a pair of CREATE and DELETE entries of a same block id, LBM use > hole punch to reclaim disk space in .data file, but the entries in .metadata > will not be compacted except bootstrap. > Another way to limit metadata is the .data file offset reach its size > limitation(default 10GB), or block number in metadata reach its limitation(no > limit on default). > I found a case in product environment that metadata consumed too many disk > space and near to .data's disk space, it's a waste, and make users confused > and complain that the actual disk space is far more than user's data. > > {code:java} > [root@hybrid01 data]# du -cs *.metadata | sort -n | tail > 19072 fb58e00979914e95aae7184e3189c8c6.metadata > 19092 5bbf54294d5948c4a695e240e81d5f80.metadata > 19168 89da5f3c4dfa469a9935f091bced1856.metadata > 19200 f27e6ff14bd44fd1838f63f1be35ee64.metadata > 19256 7b87a5e3c7fa4d3d86dcd3945d6741e1.metadata > 19256 cf054d1aa7cb4f5cbbbce3b99189bbe1.metadata > 19496 a6cbb4a284b842deafe6939be051c77c.metadata > 19568 ba749640df684cb8868d6e51ea3d1b17.metadata > 19924 e5469080934746e58b0fd2ba29d69c9d.metadata > 148954280 total// all metadata size ~149GB > [root@hybrid01 data]# du -cs *.data | sort -n | tail > 64568 46dfbc5ac94d429b8d79a536727495df.data > 64568 b4abc59d4eb2473ca267e0b057c8fad7.data > 65728 576e09ed7e164ddebe5b1702be296619.data > 66368 88d295f38dec4197bfbc6927e0528bde.data > 90904 7291e10aafe74f2792168f6146738c5d.data > 96788 6e72381ae95840f99864baacbc9169af.data > 98060 c413553491764d039e702577606bac02.data > 103556 a5db7a9c2e93457aa06103e45f59d8b4.data > 138200 3876af02694643d49b19b39789460759.data > 176443948 total // all data size ~176GB > [root@hybrid01 data]# kudu pbc dump e5469080934746e58b0fd2ba29d69c9d.metadata > --oneline | awk '{print $5}' | sort | uniq -c | egrep -v " 2 " > 1 6165611810 // low live ratio, only 1 live block > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-3318) Log Block Container metadata consumed too much disk space
[ https://issues.apache.org/jira/browse/KUDU-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai resolved KUDU-3318. Fix Version/s: 1.16.0 Resolution: Fixed > Log Block Container metadata consumed too much disk space > - > > Key: KUDU-3318 > URL: https://issues.apache.org/jira/browse/KUDU-3318 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > Fix For: 1.16.0 > > > In log block container, blocks in .data file are append only, there is a > related append only .metadata file to trace blocks in .data, this type of > entries in metadata are in CREATE type, the other type of entries in metadata > are type of DELETE, it means mark the corresponding CREATE block as deleted. > If there is a pair of CREATE and DELETE entries of a same block id, LBM use > hole punch to reclaim disk space in .data file, but the entries in .metadata > will not be compacted except bootstrap. > Another way to limit metadata is the .data file offset reach its size > limitation(default 10GB), or block number in metadata reach its limitation(no > limit on default). > I found a case in product environment that metadata consumed too many disk > space and near to .data's disk space, it's a waste, and make users confused > and complain that the actual disk space is far more than user's data. > > {code:java} > [root@hybrid01 data]# du -cs *.metadata | sort -n | tail > 19072 fb58e00979914e95aae7184e3189c8c6.metadata > 19092 5bbf54294d5948c4a695e240e81d5f80.metadata > 19168 89da5f3c4dfa469a9935f091bced1856.metadata > 19200 f27e6ff14bd44fd1838f63f1be35ee64.metadata > 19256 7b87a5e3c7fa4d3d86dcd3945d6741e1.metadata > 19256 cf054d1aa7cb4f5cbbbce3b99189bbe1.metadata > 19496 a6cbb4a284b842deafe6939be051c77c.metadata > 19568 ba749640df684cb8868d6e51ea3d1b17.metadata > 19924 e5469080934746e58b0fd2ba29d69c9d.metadata > 148954280 total// all metadata size ~149GB > [root@hybrid01 data]# du -cs *.data | sort -n | tail > 64568 46dfbc5ac94d429b8d79a536727495df.data > 64568 b4abc59d4eb2473ca267e0b057c8fad7.data > 65728 576e09ed7e164ddebe5b1702be296619.data > 66368 88d295f38dec4197bfbc6927e0528bde.data > 90904 7291e10aafe74f2792168f6146738c5d.data > 96788 6e72381ae95840f99864baacbc9169af.data > 98060 c413553491764d039e702577606bac02.data > 103556 a5db7a9c2e93457aa06103e45f59d8b4.data > 138200 3876af02694643d49b19b39789460759.data > 176443948 total // all data size ~176GB > [root@hybrid01 data]# kudu pbc dump e5469080934746e58b0fd2ba29d69c9d.metadata > --oneline | awk '{print $5}' | sort | uniq -c | egrep -v " 2 " > 1 6165611810 // low live ratio, only 1 live block > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3332) Master coredump when add columns after unsafe_rebuild master
Yingchun Lai created KUDU-3332: -- Summary: Master coredump when add columns after unsafe_rebuild master Key: KUDU-3332 URL: https://issues.apache.org/jira/browse/KUDU-3332 Project: Kudu Issue Type: Bug Components: CLI Affects Versions: NA Reporter: Yingchun Lai When doing master unsafe_rebuild, tables' next_column_id is set to (2^31 - 1) / 2, i.e. 2^30 - 1. After that, newly added columns' ids are set to 2^30 - 1, 2^30, 2^30 + 1, ... We use an IdMapping to map a column id to its index, like 2^30 - 1 -> 200, 2^30 -> 201, 2^30 + 1 -> 202. However, the IdMapping implementation uses a vector to save all the k-v pairs, and the key is roughly the index into that vector. So we have to use a very large vector to hold a column id like 2^30, and furthermore, the vector doubles its capacity whenever it is not large enough. When the column id is 2^30, the doubled size is 2^31, which overflows and causes the master to crash. -- This message was sent by Atlassian Jira (v8.3.4#803005)
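A small self-contained illustration of the overflow described above (not the actual IdMapping code): a direct-indexed table whose 32-bit capacity doubles until it can cover the key. With column ids starting near 2^30, the first doubling past 2^30 would need 2^31, which no longer fits in a signed 32-bit integer.
{code:c++}
#include <cstdint>
#include <iostream>
#include <limits>

int main() {
  int32_t capacity = 1;
  const int32_t column_id = 1 << 30;  // ids start around 2^30 after unsafe_rebuild

  // Grow the table until it can hold column_id, doubling each time.
  while (capacity <= column_id) {
    const int64_t doubled = static_cast<int64_t>(capacity) * 2;
    if (doubled > std::numeric_limits<int32_t>::max()) {
      // 2^30 * 2 == 2^31 exceeds INT32_MAX; in the master this manifested as a crash.
      std::cout << "doubling " << capacity << " would overflow int32" << std::endl;
      return 1;
    }
    capacity = static_cast<int32_t>(doubled);
  }
  std::cout << "capacity: " << capacity << std::endl;
  return 0;
}
{code}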
[jira] [Commented] (KUDU-3332) Master coredump when add columns after unsafe_rebuild master
[ https://issues.apache.org/jira/browse/KUDU-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435537#comment-17435537 ] Yingchun Lai commented on KUDU-3332: !image-2021-10-28-20-24-25-742.png! > Master coredump when add columns after unsafe_rebuild master > > > Key: KUDU-3332 > URL: https://issues.apache.org/jira/browse/KUDU-3332 > Project: Kudu > Issue Type: Bug > Components: CLI >Affects Versions: NA >Reporter: Yingchun Lai >Priority: Major > > When do master unsafe_rebuild, tables' next_column_id is set to (2^31 - 1) / > 2, i.e. 2^30 - 1. > After that, new added column's id is set to 2^30 - 1, 2^30, 2^30 + 1, ... We > use an IdMapping to maintainance column id to it's index, like 2^30 - 1 -> > 200, 2^30 -> 201, 2^30 + 1 -> 202. > However, the IdMapping's implemention use a vector to save all the k-v pairs, > and the key is nearly the index of IdMapping. So we have to use a very large > vector to save a column id like 2^30, and future more, it will increase > doubly when found capacity not enough. > When the column id is 2^30, the double size is 2^31, which is overflow, will > cause master crash. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-3332) Master coredump when add columns after unsafe_rebuild master
[ https://issues.apache.org/jira/browse/KUDU-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai resolved KUDU-3332. Fix Version/s: 1.16.0 Resolution: Fixed > Master coredump when add columns after unsafe_rebuild master > > > Key: KUDU-3332 > URL: https://issues.apache.org/jira/browse/KUDU-3332 > Project: Kudu > Issue Type: Bug > Components: CLI >Affects Versions: NA >Reporter: Yingchun Lai >Priority: Major > Fix For: 1.16.0 > > > When do master unsafe_rebuild, tables' next_column_id is set to (2^31 - 1) / > 2, i.e. 2^30 - 1. > After that, new added column's id is set to 2^30 - 1, 2^30, 2^30 + 1, ... We > use an IdMapping to maintainance column id to it's index, like 2^30 - 1 -> > 200, 2^30 -> 201, 2^30 + 1 -> 202. > However, the IdMapping's implemention use a vector to save all the k-v pairs, > and the key is nearly the index of IdMapping. So we have to use a very large > vector to save a column id like 2^30, and future more, it will increase > doubly when found capacity not enough. > When the column id is 2^30, the double size is 2^31, which is overflow, will > cause master crash. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (KUDU-3332) Master coredump when add columns after unsafe_rebuild master
[ https://issues.apache.org/jira/browse/KUDU-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai closed KUDU-3332. -- > Master coredump when add columns after unsafe_rebuild master > > > Key: KUDU-3332 > URL: https://issues.apache.org/jira/browse/KUDU-3332 > Project: Kudu > Issue Type: Bug > Components: CLI >Affects Versions: NA >Reporter: Yingchun Lai >Priority: Major > Fix For: 1.16.0 > > > When do master unsafe_rebuild, tables' next_column_id is set to (2^31 - 1) / > 2, i.e. 2^30 - 1. > After that, new added column's id is set to 2^30 - 1, 2^30, 2^30 + 1, ... We > use an IdMapping to maintainance column id to it's index, like 2^30 - 1 -> > 200, 2^30 -> 201, 2^30 + 1 -> 202. > However, the IdMapping's implemention use a vector to save all the k-v pairs, > and the key is nearly the index of IdMapping. So we have to use a very large > vector to save a column id like 2^30, and future more, it will increase > doubly when found capacity not enough. > When the column id is 2^30, the double size is 2^31, which is overflow, will > cause master crash. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3290) Implement Replicate table's data to Kafka(or other Storage System)
[ https://issues.apache.org/jira/browse/KUDU-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442194#comment-17442194 ] Yingchun Lai commented on KUDU-3290: In some cases, there may be multiple upstreams, and we want to 'hot backup' all these data into a single downstream, with sensitive latency, this feature would be helpful. I've read the doc and left some comments, [~shenxingwuying] have improved some points of the design. [~awong] Could you give more suggestion about it ? > Implement Replicate table's data to Kafka(or other Storage System) > -- > > Key: KUDU-3290 > URL: https://issues.apache.org/jira/browse/KUDU-3290 > Project: Kudu > Issue Type: New Feature > Components: tserver >Reporter: shenxingwuying >Priority: Critical > > h1. background & problem > We use kudu to store the user profile data, because business requirements, > exchange and share data from multi-tenant users, which is reasonable in our > application scene, we need replicate data from one system to another. The > destination storage system we pick kafka, because of our company's > architecture at now. > At this time, we have two ideas to solve it. > h1. two replication scheme > Generally, Raft group has three replicas, one is leader and the other two are > followers. We’ll add a replica, its role is Learner. Learner only receive all > the data, but not pariticipart in ther leadership election. > The learner replica, its state machine will be a plugin system, eg: > # We can support KuduEngine, which just a data backup like mongodb’s hidden > replica. > # We can write to the thirdparty store system, like kafka or any other > system we need. Then we can replicate data to another system use its client. > At Paxos has a learner role, which only receive data. we need such a role for > new membership. > But it Kudu Learner has been used for the copying(recovering) tablet replica. > Maybe we need a new role name, at this, we still use Learner to represent the > new role. (We should think over new role name) > In our application scene, we will replicate data to kafka, and I will explain > the method. > h2. Learner replication > # Add a new replica role, maybe we call it learner, because Paxos has a > learner role, which only receive data. We need such a role for new > membership. But at Kudu Learner has been used for the copying(recovering) > tablet replica. Maybe we need a new role name, at this, we still use Learner > to represent the new role. (We should think over new role name) > # The voters's safepoint of clean obsoleted wal is min(leader’ max wal > sequence number, followers max wal sequence number, learner’ max wal sequence > number) > # The learner not voter, not partitipant in elections > # Raft can replication data to the learner > # The process of learner applydb, just like raft followers, the logs before > committed index will replicate to kafka, kafka’s response ok. the apply index > will increase. > # We need kafka client, it will be added to kudu as an option, maybe as an > compile option > # When a kudu-tserver decomission or corrupted, the learner must move to new > kudu-tserver. So the leader should save learner apply OpId, and replicate to > followers, when learner's failover when leader down. > # The leader must save the learners apply OpId and replicate it to > followers, when learner's recovery can make sure no data loss when leader > down. 
If leader no save the applyIndex, learner maybe loss data > # Followers save the learners applyindex and term, coz followers maybe > become leader. > # When load balancer running,we shoud support move learner another > kudu-tserver > # Table should add a switch option to determine whether raft group has > learner, can support setting it when creating table. > # Support altering table to add learners maybe an idea, but need solve the > base data migrate problem. > # Base data migrate. The simple but heavy cost, when learner's max_OpId < > committed_OpId (maybe data loss, maybe we alter table add learner replication > for a existing table), we can trigger a full scan at the timestamp and > replicate data to learner, and then recover the appendEntries flow. > # Kudu not support split and merge, we not discuss it now. If KuduSupport > split or merge, we can implement it use 12, of course we can use more better > method. > # If we need the funtion, our cluster should at least 4 tservers. > If kafka fail or topic not exist, the learner will stop replicate wal, that > will occupt more disk space. if learner loss or corrupted, it can recover > from the leader. We need make sure the safepoint. > h2. Leader replication > We can replication data to kafka or any other storage system from leader >
[jira] [Created] (KUDU-3353) Support setnx semantic on column
Yingchun Lai created KUDU-3353: -- Summary: Support setnx semantic on column Key: KUDU-3353 URL: https://issues.apache.org/jira/browse/KUDU-3353 Project: Kudu Issue Type: New Feature Components: api, server Reporter: Yingchun Lai -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (KUDU-3353) Support setnx semantic on column
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3353: --- Description: h1. motivation In some usage scenarios, Kudu table has a column with semantic of "create time", which means it represent the create timestamp of the row. The other columns have the similar semantic as before, for example, the user properties like age, address, and etc. Upstream and Kudu user doesn't know whether a row is exist or not, and every cell data is the lastest ingested from, for example, event stream. If without the "create time" column, Kudu user can use UPSERT operations to write data to the table, every columns with data will overwrite the old data. But if with the "create time" column, the cell data will be overwrote by the following UPSERT ops, which is not what we expect. To achive the goal, we have to > Support setnx semantic on column > > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (KUDU-3353) Support setnx semantic on column
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3353: --- Description: h1. motivation In some usage scenarios, Kudu table has a column with semantic of "create time", which means it represent the create timestamp of the row. The other columns have the similar semantic as before, for example, the user properties like age, address, and etc. Upstream and Kudu user doesn't know whether a row is exist or not, and every cell data is the lastest ingested from, for example, event stream. If without the "create time" column, Kudu user can use UPSERT operations to write data to the table, every columns with data will overwrite the old data. But if with the "create time" column, the cell data will be overwrote by the following UPSERT ops, which is not what we expect. To achive the goal, we have to read the column out to judge whether the column is NULL or not, if it's NULL, we can fill the row with the cell, if not NULL, we will drop it from the data before UPSERT, to avoid overwite "create time". It's expensive, is there a way to avoid a read from Kudu? h1. Resolvation We can implement column schema with semantic of "update if null". That means cell data in changelist will update the base data if the latter is NULL, and will ignore updates if it is not NULL. So we can use Kudu similarly as before, but only defined the column as "update if null" when create table or add column. was: h1. motivation In some usage scenarios, Kudu table has a column with semantic of "create time", which means it represent the create timestamp of the row. The other columns have the similar semantic as before, for example, the user properties like age, address, and etc. Upstream and Kudu user doesn't know whether a row is exist or not, and every cell data is the lastest ingested from, for example, event stream. If without the "create time" column, Kudu user can use UPSERT operations to write data to the table, every columns with data will overwrite the old data. But if with the "create time" column, the cell data will be overwrote by the following UPSERT ops, which is not what we expect. To achive the goal, we have to > Support setnx semantic on column > > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to read the column out to judge whether the > column is NULL or not, if it's NULL, we can fill the row with the cell, if > not NULL, we will drop it from the data before UPSERT, to avoid overwite > "create time". > It's expensive, is there a way to avoid a read from Kudu? > h1. Resolvation > We can implement column schema with semantic of "update if null". 
That means > cell data in changelist will update the base data if the latter is NULL, and > will ignore updates if it is not NULL. > So we can use Kudu similarly as before, but only defined the column as > "update if null" when create table or add column. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (KUDU-3353) Support setnx semantic on column
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495292#comment-17495292 ] Yingchun Lai commented on KUDU-3353: I should clarify that no matter what the schema is, include SETNX columns or not, the update ops will update the row anyway, the difference is whether to update the cells or not. Suppose a table with schema: {code:java} TABLE test ( key INT64 NOT NULL, value1 INT64 NULLABLE, value2 INT64 NULLABLE UPDATE_IF_NULL, // this is a SETNX column PRIMARY KEY (key) ) ...{code} case 1: upsert ops on the table are: {code:java} upsert1: 1, 2, 3 upsert2: 1, 20, 30{code} Then the result will be '1, 20, 3'. (30 will not overwite 3 because it's not NULL) case 2: upsert ops on the table are: {code:java} upsert1: 1, 2, null upsert2: 1, 20, 30{code} Then the result will be '1, 20, 30'. (30 will be update because it's NULL) All the cells in upsert/update ops will be kept as before in changelist, the difference is the behavior of delta applier, overwrite the cell or ignore. > Support setnx semantic on column > > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to read the column out to judge whether the > column is NULL or not, if it's NULL, we can fill the row with the cell, if > not NULL, we will drop it from the data before UPSERT, to avoid overwite > "create time". > It's expensive, is there a way to avoid a read from Kudu? > h1. Resolvation > We can implement column schema with semantic of "update if null". That means > cell data in changelist will update the base data if the latter is NULL, and > will ignore updates if it is not NULL. > So we can use Kudu similarly as before, but only defined the column as > "update if null" when create table or add column. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (KUDU-3353) Support setnx semantic on column
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495292#comment-17495292 ] Yingchun Lai edited comment on KUDU-3353 at 2/21/22, 3:18 AM: -- I should clarify that no matter what the schema is, include SETNX columns or not, the update ops will update the row anyway, the difference is whether to update the cells or not. Suppose a table with schema: {code:java} TABLE test ( key INT64 NOT NULL, value1 INT64 NULLABLE, value2 INT64 NULLABLE UPDATE_IF_NULL, // this is a SETNX column PRIMARY KEY (key) ) ...{code} case 1: upsert ops on the table are: {code:java} upsert1: 1, 2, 3 upsert2: 1, 20, 30{code} Then the result will be '1, 20, 3'. (30 will not overwite 3 because it's not NULL) case 2: upsert ops on the table are: {code:java} upsert1: 1, 2, null upsert2: 1, 20, 30{code} Then the result will be '1, 20, 30'. (30 will be update because it's NULL) All the cells in upsert/update ops will be kept as before in changelist, the difference is the behavior of delta applier, overwrite the cell or ignore. was (Author: laiyingchun): I should clarify that no matter what the schema is, include SETNX columns or not, the update ops will update the row anyway, the difference is whether to update the cells or not. Suppose a table with schema: {code:java} TABLE test ( key INT64 NOT NULL, value1 INT64 NULLABLE, value2 INT64 NULLABLE UPDATE_IF_NULL, // this is a SETNX column PRIMARY KEY (key) ) ...{code} case 1: upsert ops on the table are: {code:java} upsert1: 1, 2, 3 upsert2: 1, 20, 30{code} Then the result will be '1, 20, 3'. (30 will not overwite 3 because it's not NULL) case 2: upsert ops on the table are: {code:java} upsert1: 1, 2, null upsert2: 1, 20, 30{code} Then the result will be '1, 20, 30'. (30 will be update because it's NULL) All the cells in upsert/update ops will be kept as before in changelist, the difference is the behavior of delta applier, overwrite the cell or ignore. > Support setnx semantic on column > > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to read the column out to judge whether the > column is NULL or not, if it's NULL, we can fill the row with the cell, if > not NULL, we will drop it from the data before UPSERT, to avoid overwite > "create time". > It's expensive, is there a way to avoid a read from Kudu? > h1. Resolvation > We can implement column schema with semantic of "update if null". That means > cell data in changelist will update the base data if the latter is NULL, and > will ignore updates if it is not NULL. 
> So we can use Kudu similarly as before, but only defined the column as > "update if null" when create table or add column. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (KUDU-3353) Support setnx semantic on column
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495292#comment-17495292 ] Yingchun Lai edited comment on KUDU-3353 at 2/21/22, 3:21 AM: -- I should clarify that no matter what the schema is, include SETNX columns or not, the update ops will update the row anyway, the difference is whether to update the cells or not. Suppose a table with schema: {code:java} TABLE test ( key INT64 NOT NULL, value1 INT64 NULLABLE, value2 INT64 NULLABLE UPDATE_IF_NULL, // this is a SETNX column PRIMARY KEY (key) ) ...{code} case 1: upsert ops on the table are: {code:java} upsert1: 1, 2, 3 upsert2: 1, 20, 30{code} Then the result will be '1, 20, 3'. (30 will not overwite 3 because it's not NULL) case 2: upsert ops on the table are: {code:java} upsert1: 1, 2, null upsert2: 1, 20, 30{code} Then the result will be '1, 20, 30'. (30 will be update because it's NULL) All the cells in upsert/update ops will be kept as before in changelist, the difference is the behavior of delta applier, overwrite the cell or ignore. So it's not effective expensive, and would be lighter than overwrite op since there is less no cell copies. was (Author: laiyingchun): I should clarify that no matter what the schema is, include SETNX columns or not, the update ops will update the row anyway, the difference is whether to update the cells or not. Suppose a table with schema: {code:java} TABLE test ( key INT64 NOT NULL, value1 INT64 NULLABLE, value2 INT64 NULLABLE UPDATE_IF_NULL, // this is a SETNX column PRIMARY KEY (key) ) ...{code} case 1: upsert ops on the table are: {code:java} upsert1: 1, 2, 3 upsert2: 1, 20, 30{code} Then the result will be '1, 20, 3'. (30 will not overwite 3 because it's not NULL) case 2: upsert ops on the table are: {code:java} upsert1: 1, 2, null upsert2: 1, 20, 30{code} Then the result will be '1, 20, 30'. (30 will be update because it's NULL) All the cells in upsert/update ops will be kept as before in changelist, the difference is the behavior of delta applier, overwrite the cell or ignore. > Support setnx semantic on column > > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to read the column out to judge whether the > column is NULL or not, if it's NULL, we can fill the row with the cell, if > not NULL, we will drop it from the data before UPSERT, to avoid overwite > "create time". > It's expensive, is there a way to avoid a read from Kudu? > h1. Resolvation > We can implement column schema with semantic of "update if null". That means > cell data in changelist will update the base data if the latter is NULL, and > will ignore updates if it is not NULL. 
> So we can use Kudu similarly as before, but only defined the column as > "update if null" when create table or add column. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (KUDU-3353) Support setnx semantic on column
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495292#comment-17495292 ] Yingchun Lai edited comment on KUDU-3353 at 2/22/22, 2:13 AM: -- [~anjuwong] Thanks for your reply! I should clarify that no matter what the schema is, include SETNX columns or not, the update ops will update the row anyway, the difference is whether to update the cells or not. Suppose a table with schema: {code:java} TABLE test ( key INT64 NOT NULL, value1 INT64 NULLABLE, value2 INT64 NULLABLE UPDATE_IF_NULL, // this is a SETNX column PRIMARY KEY (key) ) ...{code} case 1: upsert ops on the table are: {code:java} upsert1: 1, 2, 3 upsert2: 1, 20, 30{code} Then the result will be '1, 20, 3'. (30 will not overwite 3 because it's not NULL) case 2: upsert ops on the table are: {code:java} upsert1: 1, 2, null upsert2: 1, 20, 30{code} Then the result will be '1, 20, 30'. (30 will be update because it's NULL) All the cells in upsert/update ops will be kept as before in changelist, the difference is the behavior of delta applier, overwrite the cell or ignore. So it's not effective expensive, and would be lighter than overwrite op since there is less no cell copies. was (Author: laiyingchun): I should clarify that no matter what the schema is, include SETNX columns or not, the update ops will update the row anyway, the difference is whether to update the cells or not. Suppose a table with schema: {code:java} TABLE test ( key INT64 NOT NULL, value1 INT64 NULLABLE, value2 INT64 NULLABLE UPDATE_IF_NULL, // this is a SETNX column PRIMARY KEY (key) ) ...{code} case 1: upsert ops on the table are: {code:java} upsert1: 1, 2, 3 upsert2: 1, 20, 30{code} Then the result will be '1, 20, 3'. (30 will not overwite 3 because it's not NULL) case 2: upsert ops on the table are: {code:java} upsert1: 1, 2, null upsert2: 1, 20, 30{code} Then the result will be '1, 20, 30'. (30 will be update because it's NULL) All the cells in upsert/update ops will be kept as before in changelist, the difference is the behavior of delta applier, overwrite the cell or ignore. So it's not effective expensive, and would be lighter than overwrite op since there is less no cell copies. > Support setnx semantic on column > > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to read the column out to judge whether the > column is NULL or not, if it's NULL, we can fill the row with the cell, if > not NULL, we will drop it from the data before UPSERT, to avoid overwite > "create time". > It's expensive, is there a way to avoid a read from Kudu? > h1. 
Resolvation > We can implement column schema with semantic of "update if null". That means > cell data in changelist will update the base data if the latter is NULL, and > will ignore updates if it is not NULL. > So we can use Kudu similarly as before, but only defined the column as > "update if null" when create table or add column. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (KUDU-3353) Support setnx semantic on column
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510374#comment-17510374 ] Yingchun Lai commented on KUDU-3353: After discussing with [~anjuwong], let me clarify the design: # Add a column property to define a column as IMMUTABLE, which means the column's cell value cannot be updated after it has been written. # Use UPDATE_IGNORE and add UPSERT_IGNORE, which behave like UPDATE and UPSERT ops but ignore update errors on IMMUTABLE columns. # Since the column is immutable, we require it to be 'NOT NULL'. Otherwise, you could never update the NULL value after the initial insertion. # It's possible to add such a column with a default value. All the old rows in the table get the default immutable value; a new insertion can specify a cell value for the column or not, and if not, the default value will be used. > Support setnx semantic on column > > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to read the column out to judge whether the > column is NULL or not, if it's NULL, we can fill the row with the cell, if > not NULL, we will drop it from the data before UPSERT, to avoid overwite > "create time". > It's expensive, is there a way to avoid a read from Kudu? > h1. Resolvation > We can implement column schema with semantic of "update if null". That means > cell data in changelist will update the base data if the latter is NULL, and > will ignore updates if it is not NULL. > So we can use Kudu similarly as before, but only defined the column as > "update if null" when create table or add column. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
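A sketch of how this design might look from the C++ client, using the existing KuduSchemaBuilder API; the Immutable() column attribute (and the corresponding UPSERT_IGNORE op) is the proposed addition from this issue, so treat it as hypothetical here rather than a shipped API.
{code:c++}
#include <kudu/client/client.h>

using kudu::client::KuduColumnSchema;
using kudu::client::KuduSchema;
using kudu::client::KuduSchemaBuilder;

// Build a schema with one proposed IMMUTABLE column. Everything except
// Immutable() is the existing schema builder API.
kudu::Status BuildSchema(KuduSchema* schema) {
  KuduSchemaBuilder b;
  b.AddColumn("key")->Type(KuduColumnSchema::INT64)->NotNull()->PrimaryKey();
  b.AddColumn("value1")->Type(KuduColumnSchema::INT64)->Nullable();
  // Proposed: written once (e.g. a "create time"); later UPDATE_IGNORE /
  // UPSERT_IGNORE ops skip this cell instead of overwriting it.
  b.AddColumn("create_time")->Type(KuduColumnSchema::UNIXTIME_MICROS)
      ->NotNull()
      ->Immutable();
  return b.Build(schema);
}
{code}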
[jira] [Updated] (KUDU-3353) Support setnx semantic on column
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3353: --- Status: In Review (was: Open) > Support setnx semantic on column > > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to read the column out to judge whether the > column is NULL or not, if it's NULL, we can fill the row with the cell, if > not NULL, we will drop it from the data before UPSERT, to avoid overwite > "create time". > It's expensive, is there a way to avoid a read from Kudu? > h1. Resolvation > We can implement column schema with semantic of "update if null". That means > cell data in changelist will update the base data if the latter is NULL, and > will ignore updates if it is not NULL. > So we can use Kudu similarly as before, but only defined the column as > "update if null" when create table or add column. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (KUDU-3371) Use RocksDB to store LBM metadata
Yingchun Lai created KUDU-3371: -- Summary: Use RocksDB to store LBM metadata Key: KUDU-3371 URL: https://issues.apache.org/jira/browse/KUDU-3371 Project: Kudu Issue Type: Improvement Components: fs Reporter: Yingchun Lai -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (KUDU-3371) Use RocksDB to store LBM metadata
[ https://issues.apache.org/jira/browse/KUDU-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3371: --- Description: h1. motivation The current LBM container use separate .data and .metadata files. The .data file store the real user data, we can use hole punching to reduce disk space. While the metadata use write protobuf serialized string to a file, in append only mode. Each protobuf object is a struct of BlockRecordPB: {code:java} message BlockRecordPB { required BlockIdPB block_id = 1; // int64 required BlockRecordType op_type = 2; // CREATE or DELETE required uint64 timestamp_us = 3; optional int64 offset = 4; // Required for CREATE. optional int64 length = 5; // Required for CREATE. } {code} That means each object is either type of CREATE or DELETE. To mark a 'block' as deleted, there will be 2 objects in the metadata, one is CREATE type and the other is DELETE type. There are some weak points of current LBM metadata storage mechanism: h2. 1. Disk space amplification h2. 2. Long time bootstrap > Use RocksDB to store LBM metadata > - > > Key: KUDU-3371 > URL: https://issues.apache.org/jira/browse/KUDU-3371 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > h1. motivation > The current LBM container use separate .data and .metadata files. The .data > file store the real user data, we can use hole punching to reduce disk space. > While the metadata use write protobuf serialized string to a file, in append > only mode. Each protobuf object is a struct of BlockRecordPB: > > {code:java} > message BlockRecordPB { > required BlockIdPB block_id = 1; // int64 > required BlockRecordType op_type = 2; // CREATE or DELETE > required uint64 timestamp_us = 3; > optional int64 offset = 4; // Required for CREATE. > optional int64 length = 5; // Required for CREATE. > } {code} > That means each object is either type of CREATE or DELETE. To mark a 'block' > as deleted, there will be 2 objects in the metadata, one is CREATE type and > the other is DELETE type. > There are some weak points of current LBM metadata storage mechanism: > h2. 1. Disk space amplification > > h2. 2. Long time bootstrap > > > -- This message was sent by Atlassian Jira (v8.20.7#820007)
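As a rough illustration of the replay behavior described above, the sketch below models how the append-only metadata is processed: every CREATE and DELETE record has to be read, even though only the unmatched CREATEs correspond to live blocks. The BlockRecord struct is a simplified stand-in for the BlockRecordPB protobuf, and the replay logic is illustrative rather than the actual LBM code.

{code:cpp}
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

// Simplified stand-in for BlockRecordPB; the real type is a protobuf message.
enum class OpType { kCreate, kDelete };
struct BlockRecord {
  int64_t block_id = 0;
  OpType op_type = OpType::kCreate;
  int64_t offset = 0;   // valid for CREATE
  int64_t length = 0;   // valid for CREATE
};

// Bootstrap-style replay: every record ever appended must be read, even
// though only blocks without a matching DELETE are still live.
std::unordered_map<int64_t, BlockRecord> ReplayMetadata(
    const std::vector<BlockRecord>& records) {
  std::unordered_map<int64_t, BlockRecord> live;
  for (const auto& r : records) {
    if (r.op_type == OpType::kCreate) {
      live.emplace(r.block_id, r);
    } else {
      live.erase(r.block_id);  // CREATE/DELETE pair: the block is dead
    }
  }
  return live;
}

int main() {
  // Three blocks created, two later deleted: 5 records on disk, 1 live block.
  std::vector<BlockRecord> records = {
      {1, OpType::kCreate, 0, 64},  {2, OpType::kCreate, 64, 64},
      {3, OpType::kCreate, 128, 64}, {1, OpType::kDelete}, {2, OpType::kDelete},
  };
  std::cout << "live blocks: " << ReplayMetadata(records).size() << "\n";  // 1
  return 0;
}
{code}

In the worst case described in the issue, the file holds thousands of CREATE/DELETE pairs that are all replayed only to discover a handful of live blocks.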
[jira] [Updated] (KUDU-3371) Use RocksDB to store LBM metadata
[ https://issues.apache.org/jira/browse/KUDU-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3371: --- Description: h1. motivation The current LBM container use separate .data and .metadata files. The .data file store the real user data, we can use hole punching to reduce disk space. While the metadata use write protobuf serialized string to a file, in append only mode. Each protobuf object is a struct of BlockRecordPB: {code:java} message BlockRecordPB { required BlockIdPB block_id = 1; // int64 required BlockRecordType op_type = 2; // CREATE or DELETE required uint64 timestamp_us = 3; optional int64 offset = 4; // Required for CREATE. optional int64 length = 5; // Required for CREATE. } {code} That means each object is either type of CREATE or DELETE. To mark a 'block' as deleted, there will be 2 objects in the metadata, one is CREATE type and the other is DELETE type. There are some weak points of current LBM metadata storage mechanism: h2. 1. Disk space amplification The metadata live blocks rate may be very low, the worst case is there is only 1 alive block (suppose it hasn't reach the runtime compact threshold), all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). So the disk space amplification is very serious. h2. 2. Long time bootstrap In Kudu server bootstrap stage, it have to replay all the metadata files, to find out the alive blocks. In the worst case, we may replayed thousands of blocks in metadata, but find only a very few blocks are alive. It may waste much time in almost all cases, since the Kudu cluster in production environment always run without bootstrap with several months, the LBM may be very loose. h2. 3. Metadada compaction To resolve the issues above, there is a metadata compaction mechanism in LBM, was: h1. motivation The current LBM container use separate .data and .metadata files. The .data file store the real user data, we can use hole punching to reduce disk space. While the metadata use write protobuf serialized string to a file, in append only mode. Each protobuf object is a struct of BlockRecordPB: {code:java} message BlockRecordPB { required BlockIdPB block_id = 1; // int64 required BlockRecordType op_type = 2; // CREATE or DELETE required uint64 timestamp_us = 3; optional int64 offset = 4; // Required for CREATE. optional int64 length = 5; // Required for CREATE. } {code} That means each object is either type of CREATE or DELETE. To mark a 'block' as deleted, there will be 2 objects in the metadata, one is CREATE type and the other is DELETE type. There are some weak points of current LBM metadata storage mechanism: h2. 1. Disk space amplification h2. 2. Long time bootstrap > Use RocksDB to store LBM metadata > - > > Key: KUDU-3371 > URL: https://issues.apache.org/jira/browse/KUDU-3371 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > h1. motivation > The current LBM container use separate .data and .metadata files. The .data > file store the real user data, we can use hole punching to reduce disk space. > While the metadata use write protobuf serialized string to a file, in append > only mode. Each protobuf object is a struct of BlockRecordPB: > > {code:java} > message BlockRecordPB { > required BlockIdPB block_id = 1; // int64 > required BlockRecordType op_type = 2; // CREATE or DELETE > required uint64 timestamp_us = 3; > optional int64 offset = 4; // Required for CREATE. > optional int64 length = 5; // Required for CREATE. 
> } {code} > That means each object is either type of CREATE or DELETE. To mark a 'block' > as deleted, there will be 2 objects in the metadata, one is CREATE type and > the other is DELETE type. > There are some weak points of current LBM metadata storage mechanism: > h2. 1. Disk space amplification > The metadata live blocks rate may be very low, the worst case is there is > only 1 alive block (suppose it hasn't reach the runtime compact threshold), > all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). > So the disk space amplification is very serious. > h2. 2. Long time bootstrap > In Kudu server bootstrap stage, it have to replay all the metadata files, to > find out the alive blocks. In the worst case, we may replayed thousands of > blocks in metadata, but find only a very few blocks are alive. > It may waste much time in almost all cases, since the Kudu cluster in > production environment always run without bootstrap with several months, the > LBM may be very loose. > h2. 3. Metadada compaction > To resolve the issues above, there is a metadata compaction mechanism in LBM, > -- This message was sent by Atlassian Jira (v8.20.7#8200
[jira] [Updated] (KUDU-3371) Use RocksDB to store LBM metadata
[ https://issues.apache.org/jira/browse/KUDU-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3371: --- Description: h1. Motivation The current LBM container use separate .data and .metadata files. The .data file store the real user data, we can use hole punching to reduce disk space. While the metadata use write protobuf serialized string to a file, in append only mode. Each protobuf object is a struct of BlockRecordPB: {code:java} message BlockRecordPB { required BlockIdPB block_id = 1; // int64 required BlockRecordType op_type = 2; // CREATE or DELETE required uint64 timestamp_us = 3; optional int64 offset = 4; // Required for CREATE. optional int64 length = 5; // Required for CREATE. } {code} That means each object is either type of CREATE or DELETE. To mark a 'block' as deleted, there will be 2 objects in the metadata, one is CREATE type and the other is DELETE type. There are some weak points of current LBM metadata storage mechanism: h2. 1. Disk space amplification The metadata live blocks rate may be very low, the worst case is there is only 1 alive block (suppose it hasn't reach the runtime compact threshold), all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). So the disk space amplification is very serious. h2. 2. Long time bootstrap In Kudu server bootstrap stage, it have to replay all the metadata files, to find out the alive blocks. In the worst case, we may replayed thousands of blocks in metadata, but find only a very few blocks are alive. It may waste much time in almost all cases, since the Kudu cluster in production environment always run without bootstrap with several months, the LBM may be very loose. h2. 3. Metadada compaction To resolve the issues above, there is a metadata compaction mechanism in LBM, both at runtime and bootstrap stage. The one at runtime will lock the container, and it's synchronous. The one in bootstrap stage is synchronous too, and may make the bootstrap time longer. h1. Optimization I'm trying to use RocksDB to store LBM container metadata recently, finished most of work now, and did some benchmark was: h1. motivation The current LBM container use separate .data and .metadata files. The .data file store the real user data, we can use hole punching to reduce disk space. While the metadata use write protobuf serialized string to a file, in append only mode. Each protobuf object is a struct of BlockRecordPB: {code:java} message BlockRecordPB { required BlockIdPB block_id = 1; // int64 required BlockRecordType op_type = 2; // CREATE or DELETE required uint64 timestamp_us = 3; optional int64 offset = 4; // Required for CREATE. optional int64 length = 5; // Required for CREATE. } {code} That means each object is either type of CREATE or DELETE. To mark a 'block' as deleted, there will be 2 objects in the metadata, one is CREATE type and the other is DELETE type. There are some weak points of current LBM metadata storage mechanism: h2. 1. Disk space amplification The metadata live blocks rate may be very low, the worst case is there is only 1 alive block (suppose it hasn't reach the runtime compact threshold), all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). So the disk space amplification is very serious. h2. 2. Long time bootstrap In Kudu server bootstrap stage, it have to replay all the metadata files, to find out the alive blocks. In the worst case, we may replayed thousands of blocks in metadata, but find only a very few blocks are alive. 
It may waste much time in almost all cases, since the Kudu cluster in production environment always run without bootstrap with several months, the LBM may be very loose. h2. 3. Metadada compaction To resolve the issues above, there is a metadata compaction mechanism in LBM, > Use RocksDB to store LBM metadata > - > > Key: KUDU-3371 > URL: https://issues.apache.org/jira/browse/KUDU-3371 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > h1. Motivation > The current LBM container use separate .data and .metadata files. The .data > file store the real user data, we can use hole punching to reduce disk space. > While the metadata use write protobuf serialized string to a file, in append > only mode. Each protobuf object is a struct of BlockRecordPB: > > {code:java} > message BlockRecordPB { > required BlockIdPB block_id = 1; // int64 > required BlockRecordType op_type = 2; // CREATE or DELETE > required uint64 timestamp_us = 3; > optional int64 offset = 4; // Required for CREATE. > optional int64 length = 5; // Required for CREATE. > } {code} > That means each object is either type of CREATE or DELETE. To mark a 'block' > as d
[jira] [Updated] (KUDU-3371) Use RocksDB to store LBM metadata
[ https://issues.apache.org/jira/browse/KUDU-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3371: --- Description: h1. Motivation The current LBM container use separate .data and .metadata files. The .data file store the real user data, we can use hole punching to reduce disk space. While the metadata use write protobuf serialized string to a file, in append only mode. Each protobuf object is a struct of BlockRecordPB: {code:java} message BlockRecordPB { required BlockIdPB block_id = 1; // int64 required BlockRecordType op_type = 2; // CREATE or DELETE required uint64 timestamp_us = 3; optional int64 offset = 4; // Required for CREATE. optional int64 length = 5; // Required for CREATE. } {code} That means each object is either type of CREATE or DELETE. To mark a 'block' as deleted, there will be 2 objects in the metadata, one is CREATE type and the other is DELETE type. There are some weak points of current LBM metadata storage mechanism: h2. 1. Disk space amplification The metadata live blocks rate may be very low, the worst case is there is only 1 alive block (suppose it hasn't reach the runtime compact threshold), all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). So the disk space amplification is very serious. h2. 2. Long time bootstrap In Kudu server bootstrap stage, it have to replay all the metadata files, to find out the alive blocks. In the worst case, we may replayed thousands of blocks in metadata, but find only a very few blocks are alive. It may waste much time in almost all cases, since the Kudu cluster in production environment always run without bootstrap with several months, the LBM may be very loose. h2. 3. Metadada compaction To resolve the issues above, there is a metadata compaction mechanism in LBM, both at runtime and bootstrap stage. The one at runtime will lock the container, and it's synchronous. The one in bootstrap stage is synchronous too, and may make the bootstrap time longer. h1. Optimization by using RocksDB h2. Storage design * RocksDB instance: one RocksDB instance per data directory. * Key: . * Value: the same as before, i.e. the serialized protobuf string, and only store for CREATE entries. * Put/Delete: put value to rocksdb when create block, delete it from rocksdb when delete block * Scan: happened only in bootstrap stage to retrieve all blocks * DeleteRange: happened only when invalidate a container h2. Advantages # Disk space amplification: There is still disk space amplification problem. But we can tune RocksDB to reach a balanced point, I trust in most cases, RocksDB is better than append only file. # Bootstrap time: since there are only valid blocks left in rocksdb, so it maybe much faster than before. # metadata compaction: we can leave it to rocksdb to do this work, of course tuning needed. h2. test & benchmark I'm trying to use RocksDB to store LBM container metadata recently, finished most of work now, and did some benchmark. It show that the fs module block read/write/delete performance is similar to or little worse than the old implemention, the bootstrap time may reduce several times. I not sure if it is worth to continue the work, or anybody know if there is any discussion on this topic ever. was: h1. Motivation The current LBM container use separate .data and .metadata files. The .data file store the real user data, we can use hole punching to reduce disk space. While the metadata use write protobuf serialized string to a file, in append only mode. 
Each protobuf object is a struct of BlockRecordPB: {code:java} message BlockRecordPB { required BlockIdPB block_id = 1; // int64 required BlockRecordType op_type = 2; // CREATE or DELETE required uint64 timestamp_us = 3; optional int64 offset = 4; // Required for CREATE. optional int64 length = 5; // Required for CREATE. } {code} That means each object is either type of CREATE or DELETE. To mark a 'block' as deleted, there will be 2 objects in the metadata, one is CREATE type and the other is DELETE type. There are some weak points of current LBM metadata storage mechanism: h2. 1. Disk space amplification The metadata live blocks rate may be very low, the worst case is there is only 1 alive block (suppose it hasn't reach the runtime compact threshold), all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). So the disk space amplification is very serious. h2. 2. Long time bootstrap In Kudu server bootstrap stage, it have to replay all the metadata files, to find out the alive blocks. In the worst case, we may replayed thousands of blocks in metadata, but find only a very few blocks are alive. It may waste much time in almost all cases, since the Kudu cluster in production environment always run without bootstrap with several months, the LBM may be very loose. h2. 3. Metadada com
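The storage design in the description above can be sketched with the stock RocksDB C++ API. This is only a minimal sketch of the proposed flow: the key layout ("<container id>.<block id>"), the path, and the placeholder value are assumptions for illustration and may not match the actual patch.

{code:cpp}
#include <cassert>
#include <iostream>

#include <rocksdb/db.h>
#include <rocksdb/options.h>

int main() {
  rocksdb::DB* db = nullptr;
  rocksdb::Options options;
  options.create_if_missing = true;
  // One instance per data directory in the proposal; the path is illustrative.
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/lbm_meta_sketch", &db);
  assert(s.ok());

  // Block create: Put the serialized CREATE record under the block's key.
  s = db->Put(rocksdb::WriteOptions(), "container_0001.block_42",
              "<serialized CREATE record>");
  assert(s.ok());
  s = db->Put(rocksdb::WriteOptions(), "container_0001.block_43",
              "<serialized CREATE record>");
  assert(s.ok());

  // Block delete: remove the key instead of appending a DELETE record.
  s = db->Delete(rocksdb::WriteOptions(), "container_0001.block_42");
  assert(s.ok());

  // Bootstrap: scan whatever is left; only live blocks remain.
  rocksdb::Iterator* it = db->NewIterator(rocksdb::ReadOptions());
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    std::cout << it->key().ToString() << "\n";  // prints container_0001.block_43
  }
  delete it;

  // Container invalidation: drop all of its keys with a single range delete.
  // '/' is the byte right after '.', so this range covers the whole prefix.
  s = db->DeleteRange(rocksdb::WriteOptions(), db->DefaultColumnFamily(),
                      "container_0001.", "container_0001/");
  assert(s.ok());

  delete db;
  return 0;
}
{code}

Because deleted blocks are physically removed from the key space, the bootstrap scan only ever sees live blocks, which is where the expected startup saving comes from.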
[jira] [Updated] (KUDU-3371) Use RocksDB to store LBM metadata
[ https://issues.apache.org/jira/browse/KUDU-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3371: --- Description: h1. Motivation The current LBM container use separate .data and .metadata files. The .data file store the real user data, we can use hole punching to reduce disk space. While the metadata use write protobuf serialized string to a file, in append only mode. Each protobuf object is a struct of BlockRecordPB: {code:java} message BlockRecordPB { required BlockIdPB block_id = 1; // int64 required BlockRecordType op_type = 2; // CREATE or DELETE required uint64 timestamp_us = 3; optional int64 offset = 4; // Required for CREATE. optional int64 length = 5; // Required for CREATE. } {code} That means each object is either type of CREATE or DELETE. To mark a 'block' as deleted, there will be 2 objects in the metadata, one is CREATE type and the other is DELETE type. There are some weak points of current LBM metadata storage mechanism: h2. 1. Disk space amplification The metadata live blocks rate may be very low, the worst case is there is only 1 alive block (suppose it hasn't reach the runtime compact threshold), all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). So the disk space amplification is very serious. h2. 2. Long time bootstrap In Kudu server bootstrap stage, it have to replay all the metadata files, to find out the alive blocks. In the worst case, we may replayed thousands of blocks in metadata, but find only a very few blocks are alive. It may waste much time in almost all cases, since the Kudu cluster in production environment always run without bootstrap with several months, the LBM may be very loose. h2. 3. Metadada compaction To resolve the issues above, there is a metadata compaction mechanism in LBM, both at runtime and bootstrap stage. The one at runtime will lock the container, and it's synchronous. The one in bootstrap stage is synchronous too, and may make the bootstrap time longer. h1. Optimization I'm trying to use RocksDB to store LBM container metadata recently, finished most of work now, and did some benchmark. It show that the fs module block read/write/delete performance is similar to or little worse than the old implemention, the bootstrap time may reduce several times. I not sure if it is worth to continue the work, or anybody know if there is any discussion on this topic ever. was: h1. Motivation The current LBM container use separate .data and .metadata files. The .data file store the real user data, we can use hole punching to reduce disk space. While the metadata use write protobuf serialized string to a file, in append only mode. Each protobuf object is a struct of BlockRecordPB: {code:java} message BlockRecordPB { required BlockIdPB block_id = 1; // int64 required BlockRecordType op_type = 2; // CREATE or DELETE required uint64 timestamp_us = 3; optional int64 offset = 4; // Required for CREATE. optional int64 length = 5; // Required for CREATE. } {code} That means each object is either type of CREATE or DELETE. To mark a 'block' as deleted, there will be 2 objects in the metadata, one is CREATE type and the other is DELETE type. There are some weak points of current LBM metadata storage mechanism: h2. 1. Disk space amplification The metadata live blocks rate may be very low, the worst case is there is only 1 alive block (suppose it hasn't reach the runtime compact threshold), all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). 
So the disk space amplification is very serious. h2. 2. Long time bootstrap In Kudu server bootstrap stage, it have to replay all the metadata files, to find out the alive blocks. In the worst case, we may replayed thousands of blocks in metadata, but find only a very few blocks are alive. It may waste much time in almost all cases, since the Kudu cluster in production environment always run without bootstrap with several months, the LBM may be very loose. h2. 3. Metadada compaction To resolve the issues above, there is a metadata compaction mechanism in LBM, both at runtime and bootstrap stage. The one at runtime will lock the container, and it's synchronous. The one in bootstrap stage is synchronous too, and may make the bootstrap time longer. h1. Optimization I'm trying to use RocksDB to store LBM container metadata recently, finished most of work now, and did some benchmark > Use RocksDB to store LBM metadata > - > > Key: KUDU-3371 > URL: https://issues.apache.org/jira/browse/KUDU-3371 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > h1. Motivation > The current LBM container use separate .data and .metadata files. The .data > file store the
[jira] [Commented] (KUDU-3371) Use RocksDB to store LBM metadata
[ https://issues.apache.org/jira/browse/KUDU-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542543#comment-17542543 ] Yingchun Lai commented on KUDU-3371: Yes, after rocksdb in introduced to Kudu, we can store more 'metadata' like consensus-meta and tablet-meta. Rocksdb provides plenty of options we can tune, we can separate different data to different rocksdb instance or column family to get higher performance, or gain different requirement. The github link in this Jira seems broken :( https://issues.apache.org/jira/browse/KUDU-2204 > Use RocksDB to store LBM metadata > - > > Key: KUDU-3371 > URL: https://issues.apache.org/jira/browse/KUDU-3371 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > h1. Motivation > The current LBM container use separate .data and .metadata files. The .data > file store the real user data, we can use hole punching to reduce disk space. > While the metadata use write protobuf serialized string to a file, in append > only mode. Each protobuf object is a struct of BlockRecordPB: > > {code:java} > message BlockRecordPB { > required BlockIdPB block_id = 1; // int64 > required BlockRecordType op_type = 2; // CREATE or DELETE > required uint64 timestamp_us = 3; > optional int64 offset = 4; // Required for CREATE. > optional int64 length = 5; // Required for CREATE. > } {code} > That means each object is either type of CREATE or DELETE. To mark a 'block' > as deleted, there will be 2 objects in the metadata, one is CREATE type and > the other is DELETE type. > There are some weak points of current LBM metadata storage mechanism: > h2. 1. Disk space amplification > The metadata live blocks rate may be very low, the worst case is there is > only 1 alive block (suppose it hasn't reach the runtime compact threshold), > all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). > So the disk space amplification is very serious. > h2. 2. Long time bootstrap > In Kudu server bootstrap stage, it have to replay all the metadata files, to > find out the alive blocks. In the worst case, we may replayed thousands of > blocks in metadata, but find only a very few blocks are alive. > It may waste much time in almost all cases, since the Kudu cluster in > production environment always run without bootstrap with several months, the > LBM may be very loose. > h2. 3. Metadada compaction > To resolve the issues above, there is a metadata compaction mechanism in LBM, > both at runtime and bootstrap stage. > The one at runtime will lock the container, and it's synchronous. > The one in bootstrap stage is synchronous too, and may make the bootstrap > time longer. > h1. Optimization by using RocksDB > h2. Storage design > * RocksDB instance: one RocksDB instance per data directory. > * Key: . > * Value: the same as before, i.e. the serialized protobuf string, and only > store for CREATE entries. > * Put/Delete: put value to rocksdb when create block, delete it from rocksdb > when delete block > * Scan: happened only in bootstrap stage to retrieve all blocks > * DeleteRange: happened only when invalidate a container > h2. Advantages > # Disk space amplification: There is still disk space amplification problem. > But we can tune RocksDB to reach a balanced point, I trust in most cases, > RocksDB is better than append only file. > # Bootstrap time: since there are only valid blocks left in rocksdb, so it > maybe much faster than before. 
> # metadata compaction: we can leave it to rocksdb to do this work, of course > tuning needed. > h2. test & benchmark > I'm trying to use RocksDB to store LBM container metadata recently, finished > most of work now, and did some benchmark. It show that the fs module block > read/write/delete performance is similar to or little worse than the old > implemention, the bootstrap time may reduce several times. > I not sure if it is worth to continue the work, or anybody know if there is > any discussion on this topic ever. -- This message was sent by Atlassian Jira (v8.20.7#820007)
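For the idea of separating different kinds of metadata, the sketch below opens one RocksDB instance with several column families. The family names here ("lbm_metadata", "tablet_metadata") are examples only and are not names used by any patch.

{code:cpp}
#include <cassert>
#include <vector>

#include <rocksdb/db.h>
#include <rocksdb/options.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.create_missing_column_families = true;

  // One column family per kind of metadata; each can be tuned independently.
  std::vector<rocksdb::ColumnFamilyDescriptor> families = {
      {rocksdb::kDefaultColumnFamilyName, rocksdb::ColumnFamilyOptions()},
      {"lbm_metadata", rocksdb::ColumnFamilyOptions()},
      {"tablet_metadata", rocksdb::ColumnFamilyOptions()},
  };

  std::vector<rocksdb::ColumnFamilyHandle*> handles;
  rocksdb::DB* db = nullptr;
  rocksdb::Status s =
      rocksdb::DB::Open(options, "/tmp/kudu_meta_sketch", families, &handles, &db);
  assert(s.ok());

  // Writes go to a specific family while sharing one WAL and one instance
  // per data directory.
  s = db->Put(rocksdb::WriteOptions(), handles[1], "container_0001.block_42",
              "<serialized record>");
  assert(s.ok());

  for (auto* h : handles) {
    db->DestroyColumnFamilyHandle(h);  // release handles before closing the DB
  }
  delete db;
  return 0;
}
{code}

Each family can later get its own tuning (memtable size, compaction style, and so on) without affecting the others, which matches the "different requirements" point in the comment above.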
[jira] [Commented] (KUDU-3371) Use RocksDB to store LBM metadata
[ https://issues.apache.org/jira/browse/KUDU-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542546#comment-17542546 ] Yingchun Lai commented on KUDU-3371: I submit my WIP patch here https://gerrit.cloudera.org/c/18569/ > Use RocksDB to store LBM metadata > - > > Key: KUDU-3371 > URL: https://issues.apache.org/jira/browse/KUDU-3371 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > h1. Motivation > The current LBM container use separate .data and .metadata files. The .data > file store the real user data, we can use hole punching to reduce disk space. > While the metadata use write protobuf serialized string to a file, in append > only mode. Each protobuf object is a struct of BlockRecordPB: > > {code:java} > message BlockRecordPB { > required BlockIdPB block_id = 1; // int64 > required BlockRecordType op_type = 2; // CREATE or DELETE > required uint64 timestamp_us = 3; > optional int64 offset = 4; // Required for CREATE. > optional int64 length = 5; // Required for CREATE. > } {code} > That means each object is either type of CREATE or DELETE. To mark a 'block' > as deleted, there will be 2 objects in the metadata, one is CREATE type and > the other is DELETE type. > There are some weak points of current LBM metadata storage mechanism: > h2. 1. Disk space amplification > The metadata live blocks rate may be very low, the worst case is there is > only 1 alive block (suppose it hasn't reach the runtime compact threshold), > all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). > So the disk space amplification is very serious. > h2. 2. Long time bootstrap > In Kudu server bootstrap stage, it have to replay all the metadata files, to > find out the alive blocks. In the worst case, we may replayed thousands of > blocks in metadata, but find only a very few blocks are alive. > It may waste much time in almost all cases, since the Kudu cluster in > production environment always run without bootstrap with several months, the > LBM may be very loose. > h2. 3. Metadada compaction > To resolve the issues above, there is a metadata compaction mechanism in LBM, > both at runtime and bootstrap stage. > The one at runtime will lock the container, and it's synchronous. > The one in bootstrap stage is synchronous too, and may make the bootstrap > time longer. > h1. Optimization by using RocksDB > h2. Storage design > * RocksDB instance: one RocksDB instance per data directory. > * Key: . > * Value: the same as before, i.e. the serialized protobuf string, and only > store for CREATE entries. > * Put/Delete: put value to rocksdb when create block, delete it from rocksdb > when delete block > * Scan: happened only in bootstrap stage to retrieve all blocks > * DeleteRange: happened only when invalidate a container > h2. Advantages > # Disk space amplification: There is still disk space amplification problem. > But we can tune RocksDB to reach a balanced point, I trust in most cases, > RocksDB is better than append only file. > # Bootstrap time: since there are only valid blocks left in rocksdb, so it > maybe much faster than before. > # metadata compaction: we can leave it to rocksdb to do this work, of course > tuning needed. > h2. test & benchmark > I'm trying to use RocksDB to store LBM container metadata recently, finished > most of work now, and did some benchmark. 
It show that the fs module block > read/write/delete performance is similar to or little worse than the old > implemention, the bootstrap time may reduce several times. > I not sure if it is worth to continue the work, or anybody know if there is > any discussion on this topic ever. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (KUDU-3353) Support setnx semantic on column
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544775#comment-17544775 ] Yingchun Lai commented on KUDU-3353: This feature has been implemented by [KUDU-3353 [schema] Add an immutable attribute on column schema (If80ebca7) · Gerrit Code Review (cloudera.org)|https://gerrit.cloudera.org/c/18241/], Help to review, thanks! [~anjuwong] [~aserbin] [~tlipcon] > Support setnx semantic on column > > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Assignee: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to read the column out to judge whether the > column is NULL or not, if it's NULL, we can fill the row with the cell, if > not NULL, we will drop it from the data before UPSERT, to avoid overwite > "create time". > It's expensive, is there a way to avoid a read from Kudu? > h1. Resolvation > We can implement column schema with semantic of "update if null". That means > cell data in changelist will update the base data if the latter is NULL, and > will ignore updates if it is not NULL. > So we can use Kudu similarly as before, but only defined the column as > "update if null" when create table or add column. > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Comment Edited] (KUDU-3353) Support setnx semantic on column
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544775#comment-17544775 ] Yingchun Lai edited comment on KUDU-3353 at 6/1/22 9:11 AM: This feature has been implemented by [this patch|https://gerrit.cloudera.org/c/18241/], Help to review, thanks! [~anjuwong] [~aserbin] [~tlipcon] was (Author: laiyingchun): This feature has been implemented by [KUDU-3353 [schema] Add an immutable attribute on column schema (If80ebca7) · Gerrit Code Review (cloudera.org)|https://gerrit.cloudera.org/c/18241/], Help to review, thanks! [~anjuwong] [~aserbin] [~tlipcon] > Support setnx semantic on column > > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Assignee: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to read the column out to judge whether the > column is NULL or not, if it's NULL, we can fill the row with the cell, if > not NULL, we will drop it from the data before UPSERT, to avoid overwite > "create time". > It's expensive, is there a way to avoid a read from Kudu? > h1. Resolvation > We can implement column schema with semantic of "update if null". That means > cell data in changelist will update the base data if the latter is NULL, and > will ignore updates if it is not NULL. > So we can use Kudu similarly as before, but only defined the column as > "update if null" when create table or add column. > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (KUDU-3371) Use RocksDB to store LBM metadata
[ https://issues.apache.org/jira/browse/KUDU-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558980#comment-17558980 ] Yingchun Lai commented on KUDU-3371: Thanks [~weichiu] Kudu store protobuf serialized metadata into append only file too, so the cost will not increase after using RocksDB in the serialization/deserialization stage. The difference is the cost of store strings into append only file and RocksDB (including muttable memtable and WAL). > Use RocksDB to store LBM metadata > - > > Key: KUDU-3371 > URL: https://issues.apache.org/jira/browse/KUDU-3371 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > h1. Motivation > The current LBM container use separate .data and .metadata files. The .data > file store the real user data, we can use hole punching to reduce disk space. > While the metadata use write protobuf serialized string to a file, in append > only mode. Each protobuf object is a struct of BlockRecordPB: > > {code:java} > message BlockRecordPB { > required BlockIdPB block_id = 1; // int64 > required BlockRecordType op_type = 2; // CREATE or DELETE > required uint64 timestamp_us = 3; > optional int64 offset = 4; // Required for CREATE. > optional int64 length = 5; // Required for CREATE. > } {code} > That means each object is either type of CREATE or DELETE. To mark a 'block' > as deleted, there will be 2 objects in the metadata, one is CREATE type and > the other is DELETE type. > There are some weak points of current LBM metadata storage mechanism: > h2. 1. Disk space amplification > The metadata live blocks rate may be very low, the worst case is there is > only 1 alive block (suppose it hasn't reach the runtime compact threshold), > all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). > So the disk space amplification is very serious. > h2. 2. Long time bootstrap > In Kudu server bootstrap stage, it have to replay all the metadata files, to > find out the alive blocks. In the worst case, we may replayed thousands of > blocks in metadata, but find only a very few blocks are alive. > It may waste much time in almost all cases, since the Kudu cluster in > production environment always run without bootstrap with several months, the > LBM may be very loose. > h2. 3. Metadada compaction > To resolve the issues above, there is a metadata compaction mechanism in LBM, > both at runtime and bootstrap stage. > The one at runtime will lock the container, and it's synchronous. > The one in bootstrap stage is synchronous too, and may make the bootstrap > time longer. > h1. Optimization by using RocksDB > h2. Storage design > * RocksDB instance: one RocksDB instance per data directory. > * Key: . > * Value: the same as before, i.e. the serialized protobuf string, and only > store for CREATE entries. > * Put/Delete: put value to rocksdb when create block, delete it from rocksdb > when delete block > * Scan: happened only in bootstrap stage to retrieve all blocks > * DeleteRange: happened only when invalidate a container > h2. Advantages > # Disk space amplification: There is still disk space amplification problem. > But we can tune RocksDB to reach a balanced point, I trust in most cases, > RocksDB is better than append only file. > # Bootstrap time: since there are only valid blocks left in rocksdb, so it > maybe much faster than before. > # metadata compaction: we can leave it to rocksdb to do this work, of course > tuning needed. > h2. 
test & benchmark > I'm trying to use RocksDB to store LBM container metadata recently, finished > most of work now, and did some benchmark. It show that the fs module block > read/write/delete performance is similar to or little worse than the old > implemention, the bootstrap time may reduce several times. > I not sure if it is worth to continue the work, or anybody know if there is > any discussion on this topic ever. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (KUDU-3371) Use RocksDB to store LBM metadata
[ https://issues.apache.org/jira/browse/KUDU-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559016#comment-17559016 ] Yingchun Lai commented on KUDU-3371: Now I've completed the main work of introducing RocksDB to store log block manager's metadata, introduced another block manager type named "logr", and some related unit tests and benchmark tests. The benchmark tests include startup, it shows that reopen staged reduced upto 90% time cost of using 'log' type block manager (but the delete blocks stage increase about 1 time, create blocks stage cost similar time, shutdown block manager reduce about 20%). test: log_block_manager-test --gtest_filter=EncryptionEnabled/LogBlockManagerTest.StartupBenchmark/0 ... |--startup_benchmark_block_count_per_batch_for_testing=1000 --startup_benchmark_batch_count_for_testing=5000|create blocks|delete blocks|shutdown block manager|reopening block manager| | |--startup_benchmark_deleted_block_percentage|create-log|create-logr|delete-log|delete-logr|shutdown-log|shutdown-logr|reopen-log|reopen-logr|live_blocks| |10|19.861|18.412|1.307|1.678|10.083|18.736|8.832|5.693|450| |20|19.369|19.018|2.223|4.292|17.901|21.559|8.503|7.061|400| |30|20.121|19.737|3.626|6.045|29.604|53.677|8.561|6.189|350| |40|19.183|18.233|4.409|8.116|37.216|55.642|8.745|4.241|300| |50|19.997|18.257|4.889|10.178|94.15|70.607|9.342|3.365|250| |60|19.451|18.08|7.123|11.995|65.856|46.161|9.436|3.166|200| |70|18.841|18.448|7.249|14.529|84.43|64.063|9.072|3.018|150| |80|20.418|18.004|9.922|16.708|111.138|77.051|10.026|2.788|100| |90|20.255|18.144|9.728|18.337|121.562|107.961|9.85|1.317|50| |95|19.449|18.524|11.598|19.059|140.193|116.238|9.972|1.18|25| |99|20.583|18.38|11.918|19.505|138.448|114.04|10.085|1.107|5| |99.9|18.852|18.253|12.137|20.497|143.368|107.981|10.033|1.068|5000| |99.99|20.024|18.199|11.799|20.181|138.805|111.367|10.631|1.111|500| test: block_manager-stress-test (run the test in 30 seconds, with threads to write/read/delete blocks) | |file|log|logr| |Wrote blocks|28,320|71,680|77,920| |Read blocks|3,557,279|3,588,357|3,554,305| |Deleted blocks|26,681|70,041|76,281| > Use RocksDB to store LBM metadata > - > > Key: KUDU-3371 > URL: https://issues.apache.org/jira/browse/KUDU-3371 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > h1. Motivation > The current LBM container use separate .data and .metadata files. The .data > file store the real user data, we can use hole punching to reduce disk space. > While the metadata use write protobuf serialized string to a file, in append > only mode. Each protobuf object is a struct of BlockRecordPB: > > {code:java} > message BlockRecordPB { > required BlockIdPB block_id = 1; // int64 > required BlockRecordType op_type = 2; // CREATE or DELETE > required uint64 timestamp_us = 3; > optional int64 offset = 4; // Required for CREATE. > optional int64 length = 5; // Required for CREATE. > } {code} > That means each object is either type of CREATE or DELETE. To mark a 'block' > as deleted, there will be 2 objects in the metadata, one is CREATE type and > the other is DELETE type. > There are some weak points of current LBM metadata storage mechanism: > h2. 1. Disk space amplification > The metadata live blocks rate may be very low, the worst case is there is > only 1 alive block (suppose it hasn't reach the runtime compact threshold), > all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). 
> So the disk space amplification is very serious. > h2. 2. Long time bootstrap > In Kudu server bootstrap stage, it have to replay all the metadata files, to > find out the alive blocks. In the worst case, we may replayed thousands of > blocks in metadata, but find only a very few blocks are alive. > It may waste much time in almost all cases, since the Kudu cluster in > production environment always run without bootstrap with several months, the > LBM may be very loose. > h2. 3. Metadada compaction > To resolve the issues above, there is a metadata compaction mechanism in LBM, > both at runtime and bootstrap stage. > The one at runtime will lock the container, and it's synchronous. > The one in bootstrap stage is synchronous too, and may make the bootstrap > time longer. > h1. Optimization by using RocksDB > h2. Storage design > * RocksDB instance: one RocksDB instance per data directory. > * Key: . > * Value: the same as before, i.e. the serialized protobuf string, and only > store for CREATE entries. > * Put/Delete: put value to rocksdb when create block, delete it from rocksdb > when delete block > * Scan: happened only in bootstrap stage to retrieve all blocks > * DeleteRange: happened only when i
[jira] [Comment Edited] (KUDU-3371) Use RocksDB to store LBM metadata
[ https://issues.apache.org/jira/browse/KUDU-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559016#comment-17559016 ] Yingchun Lai edited comment on KUDU-3371 at 6/27/22 6:48 AM: - Now I've completed the main work of introducing RocksDB to store log block manager's metadata, introduced another block manager type named "logr", and some related unit tests and benchmark tests. The benchmark tests include startup, it shows that reopen staged reduced upto 90% time cost of using 'log' type block manager (but the delete blocks stage increase about 1 time, create blocks stage cost similar time, shutdown block manager reduce about 20%). test: log_block_manager-test --gtest_filter=EncryptionEnabled/LogBlockManagerTest.StartupBenchmark/0 ... |--startup_benchmark_block_count_per_batch_for_testing=1000 --startup_benchmark_batch_count_for_testing=5000| | | | | | | | | | |--startup_benchmark_deleted_block_percentage|create-log|create-logr|delete-log|delete-logr|shutdown-log|shutdown-logr|reopen-log|reopen-logr|live_blocks| |10|19.861|18.412|1.307|1.678|10.083|18.736|8.832|5.693|450| |20|19.369|19.018|2.223|4.292|17.901|21.559|8.503|7.061|400| |30|20.121|19.737|3.626|6.045|29.604|53.677|8.561|6.189|350| |40|19.183|18.233|4.409|8.116|37.216|55.642|8.745|4.241|300| |50|19.997|18.257|4.889|10.178|94.15|70.607|9.342|3.365|250| |60|19.451|18.08|7.123|11.995|65.856|46.161|9.436|3.166|200| |70|18.841|18.448|7.249|14.529|84.43|64.063|9.072|3.018|150| |80|20.418|18.004|9.922|16.708|111.138|77.051|10.026|2.788|100| |90|20.255|18.144|9.728|18.337|121.562|107.961|9.85|1.317|50| |95|19.449|18.524|11.598|19.059|140.193|116.238|9.972|1.18|25| |99|20.583|18.38|11.918|19.505|138.448|114.04|10.085|1.107|5| |99.9|18.852|18.253|12.137|20.497|143.368|107.981|10.033|1.068|5000| |99.99|20.024|18.199|11.799|20.181|138.805|111.367|10.631|1.111|500| test: block_manager-stress-test (run the test in 30 seconds, with threads to write/read/delete blocks) | |file|log|logr| |Wrote blocks|28,320|71,680|77,920| |Read blocks|3,557,279|3,588,357|3,554,305| |Deleted blocks|26,681|70,041|76,281| was (Author: laiyingchun): Now I've completed the main work of introducing RocksDB to store log block manager's metadata, introduced another block manager type named "logr", and some related unit tests and benchmark tests. The benchmark tests include startup, it shows that reopen staged reduced upto 90% time cost of using 'log' type block manager (but the delete blocks stage increase about 1 time, create blocks stage cost similar time, shutdown block manager reduce about 20%). test: log_block_manager-test --gtest_filter=EncryptionEnabled/LogBlockManagerTest.StartupBenchmark/0 ... 
|--startup_benchmark_block_count_per_batch_for_testing=1000 --startup_benchmark_batch_count_for_testing=5000|create blocks|delete blocks|shutdown block manager|reopening block manager| | |--startup_benchmark_deleted_block_percentage|create-log|create-logr|delete-log|delete-logr|shutdown-log|shutdown-logr|reopen-log|reopen-logr|live_blocks| |10|19.861|18.412|1.307|1.678|10.083|18.736|8.832|5.693|450| |20|19.369|19.018|2.223|4.292|17.901|21.559|8.503|7.061|400| |30|20.121|19.737|3.626|6.045|29.604|53.677|8.561|6.189|350| |40|19.183|18.233|4.409|8.116|37.216|55.642|8.745|4.241|300| |50|19.997|18.257|4.889|10.178|94.15|70.607|9.342|3.365|250| |60|19.451|18.08|7.123|11.995|65.856|46.161|9.436|3.166|200| |70|18.841|18.448|7.249|14.529|84.43|64.063|9.072|3.018|150| |80|20.418|18.004|9.922|16.708|111.138|77.051|10.026|2.788|100| |90|20.255|18.144|9.728|18.337|121.562|107.961|9.85|1.317|50| |95|19.449|18.524|11.598|19.059|140.193|116.238|9.972|1.18|25| |99|20.583|18.38|11.918|19.505|138.448|114.04|10.085|1.107|5| |99.9|18.852|18.253|12.137|20.497|143.368|107.981|10.033|1.068|5000| |99.99|20.024|18.199|11.799|20.181|138.805|111.367|10.631|1.111|500| test: block_manager-stress-test (run the test in 30 seconds, with threads to write/read/delete blocks) | |file|log|logr| |Wrote blocks|28,320|71,680|77,920| |Read blocks|3,557,279|3,588,357|3,554,305| |Deleted blocks|26,681|70,041|76,281| > Use RocksDB to store LBM metadata > - > > Key: KUDU-3371 > URL: https://issues.apache.org/jira/browse/KUDU-3371 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > h1. Motivation > The current LBM container use separate .data and .metadata files. The .data > file store the real user data, we can use hole punching to reduce disk space. > While the metadata use write protobuf serialized string to a file, in append > only mode. Each protobuf object is a struct of BlockRecordPB: > > {code:java} > message BlockRecordPB
[jira] [Commented] (KUDU-3371) Use RocksDB to store LBM metadata
[ https://issues.apache.org/jira/browse/KUDU-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559021#comment-17559021 ] Yingchun Lai commented on KUDU-3371: The side effect of RocksDB is that deleting blocks costs more time, but block deletion always happens in the background, so it doesn't affect user-facing writes, scans, or alter table operations. And we have the opportunity to improve it further in the future by tuning RocksDB options. Everybody can contribute to it then. > Use RocksDB to store LBM metadata > - > > Key: KUDU-3371 > URL: https://issues.apache.org/jira/browse/KUDU-3371 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > h1. Motivation > The current LBM container use separate .data and .metadata files. The .data > file store the real user data, we can use hole punching to reduce disk space. > While the metadata use write protobuf serialized string to a file, in append > only mode. Each protobuf object is a struct of BlockRecordPB: > > {code:java} > message BlockRecordPB { > required BlockIdPB block_id = 1; // int64 > required BlockRecordType op_type = 2; // CREATE or DELETE > required uint64 timestamp_us = 3; > optional int64 offset = 4; // Required for CREATE. > optional int64 length = 5; // Required for CREATE. > } {code} > That means each object is either type of CREATE or DELETE. To mark a 'block' > as deleted, there will be 2 objects in the metadata, one is CREATE type and > the other is DELETE type. > There are some weak points of current LBM metadata storage mechanism: > h2. 1. Disk space amplification > The metadata live blocks rate may be very low, the worst case is there is > only 1 alive block (suppose it hasn't reach the runtime compact threshold), > all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). > So the disk space amplification is very serious. > h2. 2. Long time bootstrap > In Kudu server bootstrap stage, it have to replay all the metadata files, to > find out the alive blocks. In the worst case, we may replayed thousands of > blocks in metadata, but find only a very few blocks are alive. > It may waste much time in almost all cases, since the Kudu cluster in > production environment always run without bootstrap with several months, the > LBM may be very loose. > h2. 3. Metadada compaction > To resolve the issues above, there is a metadata compaction mechanism in LBM, > both at runtime and bootstrap stage. > The one at runtime will lock the container, and it's synchronous. > The one in bootstrap stage is synchronous too, and may make the bootstrap > time longer. > h1. Optimization by using RocksDB > h2. Storage design > * RocksDB instance: one RocksDB instance per data directory. > * Key: . > * Value: the same as before, i.e. the serialized protobuf string, and only > store for CREATE entries. > * Put/Delete: put value to rocksdb when create block, delete it from rocksdb > when delete block > * Scan: happened only in bootstrap stage to retrieve all blocks > * DeleteRange: happened only when invalidate a container > h2. Advantages > # Disk space amplification: There is still disk space amplification problem. > But we can tune RocksDB to reach a balanced point, I trust in most cases, > RocksDB is better than append only file. > # Bootstrap time: since there are only valid blocks left in rocksdb, so it > maybe much faster than before. > # metadata compaction: we can leave it to rocksdb to do this work, of course > tuning needed. > h2. 
test & benchmark > I'm trying to use RocksDB to store LBM container metadata recently, finished > most of work now, and did some benchmark. It show that the fs module block > read/write/delete performance is similar to or little worse than the old > implemention, the bootstrap time may reduce several times. > I not sure if it is worth to continue the work, or anybody know if there is > any discussion on this topic ever. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (KUDU-3371) Use RocksDB to store LBM metadata
[ https://issues.apache.org/jira/browse/KUDU-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559031#comment-17559031 ] Yingchun Lai commented on KUDU-3371: I have submitted a merge request on gerrit [1], but it seems too large and not friendly for reviewers, so I will split it into several small merge requests: # Refactor LogBlockManager into a base class and add LogfBlockManager extending from it. LogfBlockManager is the log block manager that manages the append-only file used to store containers' metadata, i.e. how we do it today. # Refactor LogBlockContainer into a base class and add LogfBlockContainer extending from it. LogfBlockContainer is the log block container that uses an append-only file to store the container's metadata, i.e. how we do it today. # Introduce rocksdb as a thirdparty lib. # Add LogrBlockContainer, which uses rocksdb to store container metadata, and add LogrBlockManager to manage LogrBlockContainer. Add related unit tests. # Do some refactoring to support batch operations on blocks. # Use existing benchmarks to show the effect. # Add some metrics. (TODO, not included in [1]) # Add more kudu tools to operate on rocksdb metadata. (TODO, not included in [1]) # Further tuning of rocksdb options. (TODO, not included in [1]) 1. [https://gerrit.cloudera.org/c/18569/|https://gerrit.cloudera.org/c/18569/] > Use RocksDB to store LBM metadata > - > > Key: KUDU-3371 > URL: https://issues.apache.org/jira/browse/KUDU-3371 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > h1. Motivation > The current LBM container use separate .data and .metadata files. The .data > file store the real user data, we can use hole punching to reduce disk space. > While the metadata use write protobuf serialized string to a file, in append > only mode. Each protobuf object is a struct of BlockRecordPB: > > {code:java} > message BlockRecordPB { > required BlockIdPB block_id = 1; // int64 > required BlockRecordType op_type = 2; // CREATE or DELETE > required uint64 timestamp_us = 3; > optional int64 offset = 4; // Required for CREATE. > optional int64 length = 5; // Required for CREATE. > } {code} > That means each object is either type of CREATE or DELETE. To mark a 'block' > as deleted, there will be 2 objects in the metadata, one is CREATE type and > the other is DELETE type. > There are some weak points of current LBM metadata storage mechanism: > h2. 1. Disk space amplification > The metadata live blocks rate may be very low, the worst case is there is > only 1 alive block (suppose it hasn't reach the runtime compact threshold), > all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). > So the disk space amplification is very serious. > h2. 2. Long time bootstrap > In Kudu server bootstrap stage, it have to replay all the metadata files, to > find out the alive blocks. In the worst case, we may replayed thousands of > blocks in metadata, but find only a very few blocks are alive. > It may waste much time in almost all cases, since the Kudu cluster in > production environment always run without bootstrap with several months, the > LBM may be very loose. > h2. 3. Metadada compaction > To resolve the issues above, there is a metadata compaction mechanism in LBM, > both at runtime and bootstrap stage. > The one at runtime will lock the container, and it's synchronous. > The one in bootstrap stage is synchronous too, and may make the bootstrap > time longer. > h1. 
Optimization by using RocksDB > h2. Storage design > * RocksDB instance: one RocksDB instance per data directory. > * Key: . > * Value: the same as before, i.e. the serialized protobuf string, stored only > for CREATE entries. > * Put/Delete: put the value to rocksdb when a block is created, delete it from rocksdb > when the block is deleted. > * Scan: happens only in the bootstrap stage, to retrieve all blocks. > * DeleteRange: happens only when invalidating a container. > h2. Advantages > # Disk space amplification: there is still a disk space amplification problem, > but we can tune RocksDB to reach a balanced point; I trust that in most cases > RocksDB is better than an append-only file. > # Bootstrap time: since only valid blocks are left in rocksdb, it > may be much faster than before. > # Metadata compaction: we can leave this work to rocksdb, with some > tuning needed of course. > h2. test & benchmark > I've been trying to use RocksDB to store LBM container metadata recently, have finished > most of the work now, and did some benchmarks. They show that the fs module block > read/write/delete performance is similar to or slightly worse than the old > implementation, while the bootstrap time may be reduced several times. > I'm not sure whether it is worth continuing
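A minimal sketch of the Put/Delete/DeleteRange mapping in the storage design above, written against the stock RocksDB C++ API. The key layout (container id followed by a fixed-width block id) and the helper names are assumptions for illustration only, not the code from the gerrit change referenced above.

{code:java}
// Sketch only: block CREATE maps to a RocksDB Put, block DELETE to a Delete,
// and invalidating a whole container to a DeleteRange.
#include <rocksdb/db.h>
#include <cstdint>
#include <string>

std::string MetaKey(const std::string& container_id, int64_t block_id) {
  // Fixed-width big-endian block id keeps all keys of one container contiguous.
  std::string key = container_id;
  for (int i = 7; i >= 0; --i) {
    key.push_back(static_cast<char>((block_id >> (i * 8)) & 0xff));
  }
  return key;
}

rocksdb::Status PutBlockRecord(rocksdb::DB* db, const std::string& container_id,
                               int64_t block_id, const std::string& record_pb) {
  // On block creation: store the serialized BlockRecordPB (CREATE entries only).
  return db->Put(rocksdb::WriteOptions(), MetaKey(container_id, block_id), record_pb);
}

rocksdb::Status DeleteBlockRecord(rocksdb::DB* db, const std::string& container_id,
                                  int64_t block_id) {
  // On block deletion: remove the record instead of appending a DELETE entry.
  return db->Delete(rocksdb::WriteOptions(), MetaKey(container_id, block_id));
}

rocksdb::Status InvalidateContainer(rocksdb::DB* db, const std::string& container_id) {
  // Drop all records of a dead container in one call; the end-key trick assumes
  // container ids never contain a 0xff byte.
  std::string begin = MetaKey(container_id, 0);
  std::string end = container_id + '\xff';
  return db->DeleteRange(rocksdb::WriteOptions(), db->DefaultColumnFamily(), begin, end);
}
{code}

With this layout, bootstrap becomes a single prefix scan per container and only ever sees live blocks, which is where the bootstrap-time win in the benchmarks comes from.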
[jira] [Comment Edited] (KUDU-3371) Use RocksDB to store LBM metadata
[ https://issues.apache.org/jira/browse/KUDU-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559031#comment-17559031 ] Yingchun Lai edited comment on KUDU-3371 at 7/6/22 3:59 PM: I have submmit a merge request on gerrit [1], but it seems too large and not friendly for reviewers, I will split it to several small merge requests. # Refactor LogBlockManager as a base class, add LogfBlockManager extend from it. LogfBlockManager is the Log Block Manager which manage the append only file to store containers' metadata, it is how we do as before. # Refactor LogBlockContainer as a base class, add LogfBlockContainer extend from it. LogfBlockContainer is the Log Block Container which use append only file to store containers' metadata, it is how we do as before. # Intruduce rocksdb as a thirdparty lib. # Add LogrBlockContainer which use rocksdb to store containers metadata. and add LogrBlockManager to manage LogrBlockContainer. Add related unit tests. # Do some refactors to support batch operates on blocks. # Use existing benchmarks to show the effect. # Add some metrics. (TODO, not included in [1]) # Add more kudu tools to operate on rocksdb metadata. (TODO, not included in [1]) # futher tuning on rocksdb options. (TODO, not included in [1]) 1. [https://gerrit.cloudera.org/c/18569/|https://gerrit.cloudera.org/c/18569/,] was (Author: laiyingchun): I have submmit a merge request on gerrit [1][,|https://gerrit.cloudera.org/c/18569/,] but it seems too large and not friendly for reviewers, I will split it to several small merge requests. # Refactor LogBlockManager as a base class, add LogfBlockManager extend from it. LogfBlockManager is the Log Block Manager which manage the append only file to store containers' metadata, it is how we do as before. # Refactor LogBlockContainer as a base class, add LogfBlockContainer extend from it. LogfBlockContainer is the Log Block Container which use append only file to store containers' metadata, it is how we do as before. # Intruduce rocksdb as a thirdparty lib. # Add LogrBlockContainer which use rocksdb to store containers metadata. and add LogrBlockManager to manage LogrBlockContainer. Add related unit tests. # Do some refactors to support batch operates on blocks. # Use existing benchmarks to show the effect. # Add some metrics. (TODO, not included in [1]) # Add more kudu tools to operate on rocksdb metadata. (TODO, not included in [1]) # futher tuning on rocksdb options. (TODO, not included in [1]) 1. [https://gerrit.cloudera.org/c/18569/|https://gerrit.cloudera.org/c/18569/,] > Use RocksDB to store LBM metadata > - > > Key: KUDU-3371 > URL: https://issues.apache.org/jira/browse/KUDU-3371 > Project: Kudu > Issue Type: Improvement > Components: fs >Reporter: Yingchun Lai >Priority: Major > > h1. Motivation > The current LBM container use separate .data and .metadata files. The .data > file store the real user data, we can use hole punching to reduce disk space. > While the metadata use write protobuf serialized string to a file, in append > only mode. Each protobuf object is a struct of BlockRecordPB: > > {code:java} > message BlockRecordPB { > required BlockIdPB block_id = 1; // int64 > required BlockRecordType op_type = 2; // CREATE or DELETE > required uint64 timestamp_us = 3; > optional int64 offset = 4; // Required for CREATE. > optional int64 length = 5; // Required for CREATE. > } {code} > That means each object is either type of CREATE or DELETE. 
To mark a 'block' > as deleted, there will be 2 objects in the metadata, one is CREATE type and > the other is DELETE type. > There are some weak points of current LBM metadata storage mechanism: > h2. 1. Disk space amplification > The metadata live blocks rate may be very low, the worst case is there is > only 1 alive block (suppose it hasn't reach the runtime compact threshold), > all the other thousands of blocks are dead (i.e. in pair of CREATE-DELETE). > So the disk space amplification is very serious. > h2. 2. Long time bootstrap > In Kudu server bootstrap stage, it have to replay all the metadata files, to > find out the alive blocks. In the worst case, we may replayed thousands of > blocks in metadata, but find only a very few blocks are alive. > It may waste much time in almost all cases, since the Kudu cluster in > production environment always run without bootstrap with several months, the > LBM may be very loose. > h2. 3. Metadada compaction > To resolve the issues above, there is a metadata compaction mechanism in LBM, > both at runtime and bootstrap stage. > The one at runtime will lock the container, and it's synchronous. > The one in bootstrap stage is synchronous too, and may make the bootstrap > time longer. >
[jira] [Commented] (KUDU-3353) Support setnx semantic on column
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568976#comment-17568976 ] Yingchun Lai commented on KUDU-3353: Let me clarify some use cases: a user profile table in Kudu has a column "first_login_ts", which represents the first login time to the website. The data in the table is upserted from the user event log; the log contains the user's id, some attributes, and "first_login_ts". The first_login_ts is filled with the log's produce time, which means that for a given user, his/her event logs carry different (ever higher) "first_login_ts" values, but only the first one should be set, and the following logs should not update it. The updated design: 1. Add a column attribute to define a column as IMMUTABLE, meaning the column cell value can not be updated after it has been written when inserting the row. 2. Use UPDATE_IGNORE and add UPSERT_IGNORE, for UPDATE and UPSERT ops that ignore update errors on IMMUTABLE columns. > Support setnx semantic on column > > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Assignee: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to read the column out to judge whether the > column is NULL or not, if it's NULL, we can fill the row with the cell, if > not NULL, we will drop it from the data before UPSERT, to avoid overwite > "create time". > It's expensive, is there a way to avoid a read from Kudu? > h1. Resolvation > We can implement column schema with semantic of "update if null". That means > cell data in changelist will update the base data if the latter is NULL, and > will ignore updates if it is not NULL. > So we can use Kudu similarly as before, but only defined the column as > "update if null" when create table or add column. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
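A short sketch of how the IMMUTABLE attribute and UPSERT_IGNORE described above could look from a C++ client, under the assumption that the attribute is exposed as an Immutable() option on the column spec; the method name is hypothetical here and only illustrates the intended write semantics.

{code:java}
// Illustration only: Immutable() is an assumed column-spec method for the
// IMMUTABLE attribute described above, not confirmed client API.
#include <kudu/client/client.h>
#include <kudu/client/schema.h>

using kudu::client::KuduColumnSchema;
using kudu::client::KuduSchema;
using kudu::client::KuduSchemaBuilder;

kudu::Status BuildUserProfileSchema(KuduSchema* schema) {
  KuduSchemaBuilder b;
  b.AddColumn("user_id")->Type(KuduColumnSchema::INT64)->NotNull()->PrimaryKey();
  // Written once by the first event log; later UPSERT_IGNORE ops leave it
  // unchanged instead of failing the whole row.
  b.AddColumn("first_login_ts")->Type(KuduColumnSchema::UNIXTIME_MICROS)->Immutable();
  b.AddColumn("age")->Type(KuduColumnSchema::INT32);
  return b.Build(schema);
}
{code}

With such a schema the event pipeline can keep issuing plain upserts (via UPSERT_IGNORE) without first reading each row back to check whether first_login_ts is already set.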
[jira] [Comment Edited] (KUDU-3353) Support setnx semantic on column
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568976#comment-17568976 ] Yingchun Lai edited comment on KUDU-3353 at 7/20/22 11:27 AM: -- Let me clarify some use cases: A user profile table in Kudu has a column "first_login_ts", it represent the first login time to the website. The data in the table is upsert by user event log, the log contains user's id, some attributes, and "first_login_ts". The first_login_ts is filled by the log produced time, that means for a specified user, his/her event logs have a different (higher and higher) "first_login_ts", but only the first one could be set, and the following logs should not update it. The same to columns such as sex, birthday, birthplace and etc. If the table column supports "immutable" attribute, the new value in update/upsert ops will not be applied to the change list, we can gain the profits of faster read. And in some cases without immutable attribute, we have to read the old value, compare with the new value, and then judge which value wins, it would be much cost. The updated design: 1. Add a column attribute to define a column as IMMUTABLE, means the column cell value can not be updated after it's been written during inserting the row. 2. Use UPDATE_IGNORE and add UPSERT_IGNORE, for UPDATE and UPSERT ops but ignore update-errors on IMMUTABLE columns. was (Author: laiyingchun): Let me clarify some use cases: A user profile table in Kudu has a column "first_login_ts", it represent the first login time to the website. The data in the table is upsert by user event log, the log contains user's id, some attributes, and "first_login_ts". The first_login_ts is filled by the log produced time, that means for a specified user, his/her event logs have a different (higher and higher) "first_login_ts", but only the first one could be set, and the following logs should not update it. The updated design: 1. Add a column attribute to define a column as IMMUTABLE, means the column cell value can not be updated after it's been written during inserting the row. 2. Use UPDATE_IGNORE and add UPSERT_IGNORE, for UPDATE and UPSERT ops but ignore update-errors on IMMUTABLE columns. > Support setnx semantic on column > > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Assignee: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to read the column out to judge whether the > column is NULL or not, if it's NULL, we can fill the row with the cell, if > not NULL, we will drop it from the data before UPSERT, to avoid overwite > "create time". > It's expensive, is there a way to avoid a read from Kudu? > h1. 
Resolvation > We can implement column schema with semantic of "update if null". That means > cell data in changelist will update the base data if the latter is NULL, and > will ignore updates if it is not NULL. > So we can use Kudu similarly as before, but only defined the column as > "update if null" when create table or add column. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KUDU-3400) CompilationManager::RequestRowProjector consumed too much memory
Yingchun Lai created KUDU-3400: -- Summary: CompilationManager::RequestRowProjector consumed too much memory Key: KUDU-3400 URL: https://issues.apache.org/jira/browse/KUDU-3400 Project: Kudu Issue Type: Bug Components: codegen Affects Versions: 1.12.0 Reporter: Yingchun Lai In one of our clusters, we found that the CompilationManager::RequestRowProjector function occasionally consumed too much memory. Some details of this cluster: # some tables have more than 1000 columns, so the table schema may be very costly to copy # sometimes the tservers are under memory pressure, and then do flush operations more frequently (to try to reduce memory consumed by MRS/DMS) I captured a heap profile on a tserver and found that CompilationManager::RequestRowProjector costs the most memory when Schemas are copied; the source code: {code:java} CompilationTask(const Schema& base, const Schema& proj, CodeCache* cache, CodeGenerator* generator) : base_(base), proj_(proj), cache_(cache), generator_(generator) {} {code} That is to say, Schemas (i.e. base and proj) are copied when constructing CompilationTask objects. The heap profile says that Schema consumed about 50GB of memory, which really shocked me: even though the Schema is large, how could it consume 50GB of memory? I forgot to `pstack` the process when it happened; maybe there were hundreds of thousands of CompilationManager::RequestRowProjector calls at that time, but according to the code logic, it should not hang there for a long time? {code:java} if (!cached) { shared_ptr<CompilationTask> task(make_shared<CompilationTask>( *base_schema, *projection, &cache_, &generator_)); WARN_NOT_OK_EVERY_N_SECS(pool_->Submit([task]() { task->Run(); }), "RowProjector compilation request submit failed", 10); return false; } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
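For context on why the copies are so expensive, here is a simplified stand-in for the task class above, showing one way the per-request Schema copies could be avoided by sharing ownership instead of copying. This only illustrates the cost; the class and member names are simplified stand-ins, not the actual Kudu codegen classes or the actual fix.

{code:java}
// Sketch: each real CompilationTask copies two Schema objects by value; holding
// shared ownership makes the per-request cost independent of the schema width.
#include <memory>

class Schema;          // a ~1000-column schema makes by-value copies expensive
class CodeCache;
class CodeGenerator;

class CompilationTaskNoCopy {
 public:
  CompilationTaskNoCopy(std::shared_ptr<const Schema> base,
                        std::shared_ptr<const Schema> proj,
                        CodeCache* cache, CodeGenerator* generator)
      : base_(std::move(base)),    // bumps a refcount instead of copying columns
        proj_(std::move(proj)),
        cache_(cache),
        generator_(generator) {}

 private:
  std::shared_ptr<const Schema> base_;
  std::shared_ptr<const Schema> proj_;
  CodeCache* cache_;
  CodeGenerator* generator_;
};
{code}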
[jira] [Updated] (KUDU-3400) CompilationManager::RequestRowProjector consumed too much memory
[ https://issues.apache.org/jira/browse/KUDU-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3400: --- Attachment: data02heap.svg > CompilationManager::RequestRowProjector consumed too much memory > > > Key: KUDU-3400 > URL: https://issues.apache.org/jira/browse/KUDU-3400 > Project: Kudu > Issue Type: Bug > Components: codegen >Affects Versions: 1.12.0 >Reporter: Yingchun Lai >Priority: Major > Attachments: data02heap.svg > > > In one of our cluster, we find that CompilationManager::RequestRowProjector > function consumed too much memory accidentally. Some situaction of this > cluster: > # some tables have more than 1000 columns, so the table schema may be very > costly to copy > # sometimes the tservers have memory pressure, and then do flush operations > more frequently (to try to reduce memory consumed by MRS/DMS) > I catched a heap profile on a tserver, found out that > CompilationManager::RequestRowProjector cost most memory when Schema copied, > the source code: > > {code:java} > CompilationTask(const Schema& base, const Schema& proj, CodeCache* cache, > CodeGenerator* generator) > : base_(base), > proj_(proj), > cache_(cache), > generator_(generator) {} {code} > That is to say, Schemas (i.e. base and proj) are copied when construct > CompilationTask objects. > The heap profile says that Schema consumed about 50GB memory, that really > shock me, even though the Schema is large, but how can it consumed 50GB > memory? I forget to `pstack` the process when it happend, maybe there are > hundreds of thousands of CompilationManager::RequestRowProjector calls that > time, but according to the code logic, it should not hang there for a long > time? > {code:java} > if (!cached) { > shared_ptr task(make_shared( > *base_schema, *projection, &cache_, &generator_)); > WARN_NOT_OK_EVERY_N_SECS(pool_->Submit([task]() { task->Run(); }), > "RowProjector compilation request submit failed", 10); > return false; > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KUDU-3353) Add an immutable attribute on column schema
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3353: --- Summary: Add an immutable attribute on column schema (was: Support setnx semantic on column) > Add an immutable attribute on column schema > --- > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Assignee: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to read the column out to judge whether the > column is NULL or not, if it's NULL, we can fill the row with the cell, if > not NULL, we will drop it from the data before UPSERT, to avoid overwite > "create time". > It's expensive, is there a way to avoid a read from Kudu? > h1. Resolvation > We can implement column schema with semantic of "update if null". That means > cell data in changelist will update the base data if the latter is NULL, and > will ignore updates if it is not NULL. > So we can use Kudu similarly as before, but only defined the column as > "update if null" when create table or add column. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KUDU-3353) Add an immutable attribute on column schema
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620695#comment-17620695 ] Yingchun Lai commented on KUDU-3353: Now almost all parts of this feature have been implemented, but the kudu-spark part is left, currently there is no Spark use case, we can implement it if someone need it. > Add an immutable attribute on column schema > --- > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Assignee: Yingchun Lai >Priority: Major > > h1. motivation > In some usage scenarios, Kudu table has a column with semantic of "create > time", which means it represent the create timestamp of the row. The other > columns have the similar semantic as before, for example, the user properties > like age, address, and etc. > Upstream and Kudu user doesn't know whether a row is exist or not, and every > cell data is the lastest ingested from, for example, event stream. > If without the "create time" column, Kudu user can use UPSERT operations to > write data to the table, every columns with data will overwrite the old data. > But if with the "create time" column, the cell data will be overwrote by the > following UPSERT ops, which is not what we expect. > To achive the goal, we have to read the column out to judge whether the > column is NULL or not, if it's NULL, we can fill the row with the cell, if > not NULL, we will drop it from the data before UPSERT, to avoid overwite > "create time". > It's expensive, is there a way to avoid a read from Kudu? > h1. Resolvation > We can implement column schema with semantic of "update if null". That means > cell data in changelist will update the base data if the latter is NULL, and > will ignore updates if it is not NULL. > So we can use Kudu similarly as before, but only defined the column as > "update if null" when create table or add column. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KUDU-3419) Tablet server maybe get stuck when loading tablet metadata failed
[ https://issues.apache.org/jira/browse/KUDU-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629429#comment-17629429 ] Yingchun Lai commented on KUDU-3419: when tserver shutdown, all internal objects will shutdown too, why need manual shutdown tablet_manager_? {code:java} TabletServer::~TabletServer() { ShutdownImpl(); } void TabletServer::ShutdownImpl() { if (kInitialized == state_ || kRunning == state_) { const string name = rpc_server_->ToString(); LOG(INFO) << "TabletServer@" << name << " shutting down..."; // 1. Stop accepting new RPCs. UnregisterAllServices(); // 2. Shut down the tserver's subsystems. maintenance_manager_->Shutdown(); WARN_NOT_OK(heartbeater_->Stop(), "Failed to stop TS Heartbeat thread"); fs_manager_->UnsetErrorNotificationCb(ErrorHandlerType::DISK_ERROR); fs_manager_->UnsetErrorNotificationCb(ErrorHandlerType::CFILE_CORRUPTION); tablet_manager_->Shutdown(); // <== tablet_manager_ will be shutdown client_initializer_->Shutdown(); // 3. Shut down generic subsystems. KuduServer::Shutdown(); LOG(INFO) << "TabletServer@" << name << " shutdown complete."; } state_ = kStopped; } {code} > Tablet server maybe get stuck when loading tablet metadata failed > - > > Key: KUDU-3419 > URL: https://issues.apache.org/jira/browse/KUDU-3419 > Project: Kudu > Issue Type: Bug >Reporter: Xixu Wang >Priority: Major > Attachments: image-2022-11-04-14-57-49-684.png, > image-2022-11-04-14-59-54-665.png, image-2022-11-04-15-25-05-437.png, > image-2022-11-04-15-29-27-092.png, image-2022-11-04-15-30-08-892.png, > image-2022-11-04-15-32-34-366.png > > > Tablet server maybe get stuck when loading tablet metadata failed. > The follow steps repeat the bug. > 1. Change the permission of one tablet meta file to root. We use account: > *kudu* to run Kudu. > !image-2022-11-04-14-57-49-684.png! > 2.Start an instance of tablet server. A permission erro will be saw: > !image-2022-11-04-15-29-27-092.png! > 3. Tablet server gets stuck and will not exit automatically. > !image-2022-11-04-15-30-08-892.png! > 4. Pstack is as follow: > As we can see. Tablet Server can not exit, because ThreadPool can not be > shutdown. TxnStatlessTrasckerTask is running, which cause threadpool can not > be shutdown. > !image-2022-11-04-15-32-34-366.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KUDU-3367) Delta file with full of delete op can not be schedule to compact
[ https://issues.apache.org/jira/browse/KUDU-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633458#comment-17633458 ] Yingchun Lai commented on KUDU-3367: [~zhangyifan27] KUDU-1625 depends on the tablet supporting 'live row count' (which was introduced in Kudu 1.12?). Even after upgrading Kudu to a higher version, old existing tablets still don't have such metadata, so the DeletedRowsetGCOp will not work on these tablets. I guess [~Koppa] is trying to make these old tablets able to GC such rowsets whose rows are fully deleted, right? > Delta file with full of delete op can not be schedule to compact > > > Key: KUDU-3367 > URL: https://issues.apache.org/jira/browse/KUDU-3367 > Project: Kudu > Issue Type: New Feature > Components: compaction >Reporter: dengke >Assignee: dengke >Priority: Major > Attachments: image-2022-05-09-14-13-16-525.png, > image-2022-05-09-14-16-31-828.png, image-2022-05-09-14-18-05-647.png, > image-2022-05-09-14-19-56-933.png, image-2022-05-09-14-21-47-374.png, > image-2022-05-09-14-23-43-973.png, image-2022-05-09-14-26-45-313.png, > image-2022-05-09-14-32-51-573.png > > > If we get a REDO delta full of delete ops, which means there is no update > op in the file, the current compaction algorithm will not schedule the file > for compaction. If such files exist, after accumulating for a period of time, they > will greatly affect our scan speed. However, processing such files on every > compaction reduces compaction performance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] (KUDU-3400) CompilationManager::RequestRowProjector consumed too much memory
[ https://issues.apache.org/jira/browse/KUDU-3400 ] Yingchun Lai deleted comment on KUDU-3400: was (Author: laiyingchun): add the pstack [^pstack.txt] > CompilationManager::RequestRowProjector consumed too much memory > > > Key: KUDU-3400 > URL: https://issues.apache.org/jira/browse/KUDU-3400 > Project: Kudu > Issue Type: Bug > Components: codegen >Affects Versions: 1.12.0 >Reporter: Yingchun Lai >Priority: Major > Attachments: data02heap.svg > > > In one of our cluster, we find that CompilationManager::RequestRowProjector > function consumed too much memory accidentally. Some situaction of this > cluster: > # some tables have more than 1000 columns, so the table schema may be very > costly to copy > # sometimes the tservers have memory pressure, and then do flush operations > more frequently (to try to reduce memory consumed by MRS/DMS) > I catched a heap profile on a tserver, found out that > CompilationManager::RequestRowProjector cost most memory when Schema copied, > the source code: > > {code:java} > CompilationTask(const Schema& base, const Schema& proj, CodeCache* cache, > CodeGenerator* generator) > : base_(base), > proj_(proj), > cache_(cache), > generator_(generator) {} {code} > That is to say, Schemas (i.e. base and proj) are copied when construct > CompilationTask objects. > The heap profile says that Schema consumed about 50GB memory, that really > shock me, even though the Schema is large, but how can it consumed 50GB > memory? I forget to `pstack` the process when it happend, maybe there are > hundreds of thousands of CompilationManager::RequestRowProjector calls that > time, but according to the code logic, it should not hang there for a long > time? > {code:java} > if (!cached) { > shared_ptr task(make_shared( > *base_schema, *projection, &cache_, &generator_)); > WARN_NOT_OK_EVERY_N_SECS(pool_->Submit([task]() { task->Run(); }), > "RowProjector compilation request submit failed", 10); > return false; > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KUDU-3400) CompilationManager::RequestRowProjector consumed too much memory
[ https://issues.apache.org/jira/browse/KUDU-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635127#comment-17635127 ] Yingchun Lai commented on KUDU-3400: add the pstack [^pstack.txt] > CompilationManager::RequestRowProjector consumed too much memory > > > Key: KUDU-3400 > URL: https://issues.apache.org/jira/browse/KUDU-3400 > Project: Kudu > Issue Type: Bug > Components: codegen >Affects Versions: 1.12.0 >Reporter: Yingchun Lai >Priority: Major > Attachments: data02heap.svg > > > In one of our cluster, we find that CompilationManager::RequestRowProjector > function consumed too much memory accidentally. Some situaction of this > cluster: > # some tables have more than 1000 columns, so the table schema may be very > costly to copy > # sometimes the tservers have memory pressure, and then do flush operations > more frequently (to try to reduce memory consumed by MRS/DMS) > I catched a heap profile on a tserver, found out that > CompilationManager::RequestRowProjector cost most memory when Schema copied, > the source code: > > {code:java} > CompilationTask(const Schema& base, const Schema& proj, CodeCache* cache, > CodeGenerator* generator) > : base_(base), > proj_(proj), > cache_(cache), > generator_(generator) {} {code} > That is to say, Schemas (i.e. base and proj) are copied when construct > CompilationTask objects. > The heap profile says that Schema consumed about 50GB memory, that really > shock me, even though the Schema is large, but how can it consumed 50GB > memory? I forget to `pstack` the process when it happend, maybe there are > hundreds of thousands of CompilationManager::RequestRowProjector calls that > time, but according to the code logic, it should not hang there for a long > time? > {code:java} > if (!cached) { > shared_ptr task(make_shared( > *base_schema, *projection, &cache_, &generator_)); > WARN_NOT_OK_EVERY_N_SECS(pool_->Submit([task]() { task->Run(); }), > "RowProjector compilation request submit failed", 10); > return false; > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KUDU-3400) CompilationManager::RequestRowProjector consumed too much memory
[ https://issues.apache.org/jira/browse/KUDU-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3400: --- Attachment: pstack.txt > CompilationManager::RequestRowProjector consumed too much memory > > > Key: KUDU-3400 > URL: https://issues.apache.org/jira/browse/KUDU-3400 > Project: Kudu > Issue Type: Bug > Components: codegen >Affects Versions: 1.12.0 >Reporter: Yingchun Lai >Priority: Major > Attachments: data02heap.svg, pstack.txt > > > In one of our cluster, we find that CompilationManager::RequestRowProjector > function consumed too much memory accidentally. Some situaction of this > cluster: > # some tables have more than 1000 columns, so the table schema may be very > costly to copy > # sometimes the tservers have memory pressure, and then do flush operations > more frequently (to try to reduce memory consumed by MRS/DMS) > I catched a heap profile on a tserver, found out that > CompilationManager::RequestRowProjector cost most memory when Schema copied, > the source code: > > {code:java} > CompilationTask(const Schema& base, const Schema& proj, CodeCache* cache, > CodeGenerator* generator) > : base_(base), > proj_(proj), > cache_(cache), > generator_(generator) {} {code} > That is to say, Schemas (i.e. base and proj) are copied when construct > CompilationTask objects. > The heap profile says that Schema consumed about 50GB memory, that really > shock me, even though the Schema is large, but how can it consumed 50GB > memory? I forget to `pstack` the process when it happend, maybe there are > hundreds of thousands of CompilationManager::RequestRowProjector calls that > time, but according to the code logic, it should not hang there for a long > time? > {code:java} > if (!cached) { > shared_ptr task(make_shared( > *base_schema, *projection, &cache_, &generator_)); > WARN_NOT_OK_EVERY_N_SECS(pool_->Submit([task]() { task->Run(); }), > "RowProjector compilation request submit failed", 10); > return false; > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KUDU-3400) CompilationManager::RequestRowProjector consumed too much memory
[ https://issues.apache.org/jira/browse/KUDU-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3400: --- Attachment: heapprofile.svg > CompilationManager::RequestRowProjector consumed too much memory > > > Key: KUDU-3400 > URL: https://issues.apache.org/jira/browse/KUDU-3400 > Project: Kudu > Issue Type: Bug > Components: codegen >Affects Versions: 1.12.0 >Reporter: Yingchun Lai >Priority: Major > Attachments: data02heap.svg, heapprofile.svg, pstack.txt > > > In one of our cluster, we find that CompilationManager::RequestRowProjector > function consumed too much memory accidentally. Some situaction of this > cluster: > # some tables have more than 1000 columns, so the table schema may be very > costly to copy > # sometimes the tservers have memory pressure, and then do flush operations > more frequently (to try to reduce memory consumed by MRS/DMS) > I catched a heap profile on a tserver, found out that > CompilationManager::RequestRowProjector cost most memory when Schema copied, > the source code: > > {code:java} > CompilationTask(const Schema& base, const Schema& proj, CodeCache* cache, > CodeGenerator* generator) > : base_(base), > proj_(proj), > cache_(cache), > generator_(generator) {} {code} > That is to say, Schemas (i.e. base and proj) are copied when construct > CompilationTask objects. > The heap profile says that Schema consumed about 50GB memory, that really > shock me, even though the Schema is large, but how can it consumed 50GB > memory? I forget to `pstack` the process when it happend, maybe there are > hundreds of thousands of CompilationManager::RequestRowProjector calls that > time, but according to the code logic, it should not hang there for a long > time? > {code:java} > if (!cached) { > shared_ptr task(make_shared( > *base_schema, *projection, &cache_, &generator_)); > WARN_NOT_OK_EVERY_N_SECS(pool_->Submit([task]() { task->Run(); }), > "RowProjector compilation request submit failed", 10); > return false; > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KUDU-3400) CompilationManager::RequestRowProjector consumed too much memory
[ https://issues.apache.org/jira/browse/KUDU-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635134#comment-17635134 ] Yingchun Lai commented on KUDU-3400: add pstack and heapprofile [^pstack.txt] ^[^heapprofile.svg]^ ^Any thought about it ? [~alexey] [~awong] [~zhangyifan27]^ > CompilationManager::RequestRowProjector consumed too much memory > > > Key: KUDU-3400 > URL: https://issues.apache.org/jira/browse/KUDU-3400 > Project: Kudu > Issue Type: Bug > Components: codegen >Affects Versions: 1.12.0 >Reporter: Yingchun Lai >Priority: Major > Attachments: data02heap.svg, heapprofile.svg, pstack.txt > > > In one of our cluster, we find that CompilationManager::RequestRowProjector > function consumed too much memory accidentally. Some situaction of this > cluster: > # some tables have more than 1000 columns, so the table schema may be very > costly to copy > # sometimes the tservers have memory pressure, and then do flush operations > more frequently (to try to reduce memory consumed by MRS/DMS) > I catched a heap profile on a tserver, found out that > CompilationManager::RequestRowProjector cost most memory when Schema copied, > the source code: > > {code:java} > CompilationTask(const Schema& base, const Schema& proj, CodeCache* cache, > CodeGenerator* generator) > : base_(base), > proj_(proj), > cache_(cache), > generator_(generator) {} {code} > That is to say, Schemas (i.e. base and proj) are copied when construct > CompilationTask objects. > The heap profile says that Schema consumed about 50GB memory, that really > shock me, even though the Schema is large, but how can it consumed 50GB > memory? I forget to `pstack` the process when it happend, maybe there are > hundreds of thousands of CompilationManager::RequestRowProjector calls that > time, but according to the code logic, it should not hang there for a long > time? > {code:java} > if (!cached) { > shared_ptr task(make_shared( > *base_schema, *projection, &cache_, &generator_)); > WARN_NOT_OK_EVERY_N_SECS(pool_->Submit([task]() { task->Run(); }), > "RowProjector compilation request submit failed", 10); > return false; > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (KUDU-3400) CompilationManager::RequestRowProjector consumed too much memory
[ https://issues.apache.org/jira/browse/KUDU-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646952#comment-17646952 ] Yingchun Lai edited comment on KUDU-3400 at 12/14/22 6:16 AM: -- [~aserbin] {quote}When generating heap profile, it's important to use proper location for the toolchain and binary. If the binaries were built with devtoolset, it's necessary to set proper environment when running {{pprof}} from gperftools (see the {{$KUDU_HOME/build-support/enable_devtoolset.sh}} script). {quote} Do you mean it's needed to run the script before running \{{pprof}} was (Author: laiyingchun): [~aserbin] {quote}When generating heap profile, it's important to use proper location for the toolchain and binary. If the binaries were built with devtoolset, it's necessary to set proper environment when running {{pprof}} from gperftools (see the {{$KUDU_HOME/build-support/enable_devtoolset.sh}} script). {quote} Do you mean it's needed to run the script before running \{{pprof}} > CompilationManager::RequestRowProjector consumed too much memory > > > Key: KUDU-3400 > URL: https://issues.apache.org/jira/browse/KUDU-3400 > Project: Kudu > Issue Type: Bug > Components: codegen >Affects Versions: 1.12.0 >Reporter: Yingchun Lai >Priority: Major > Attachments: data02heap.svg, heapprofile.svg, pstack.txt > > > In one of our cluster, we find that CompilationManager::RequestRowProjector > function consumed too much memory accidentally. Some situaction of this > cluster: > # some tables have more than 1000 columns, so the table schema may be very > costly to copy > # sometimes the tservers have memory pressure, and then do flush operations > more frequently (to try to reduce memory consumed by MRS/DMS) > I catched a heap profile on a tserver, found out that > CompilationManager::RequestRowProjector cost most memory when Schema copied, > the source code: > > {code:java} > CompilationTask(const Schema& base, const Schema& proj, CodeCache* cache, > CodeGenerator* generator) > : base_(base), > proj_(proj), > cache_(cache), > generator_(generator) {} {code} > That is to say, Schemas (i.e. base and proj) are copied when construct > CompilationTask objects. > The heap profile says that Schema consumed about 50GB memory, that really > shock me, even though the Schema is large, but how can it consumed 50GB > memory? I forget to `pstack` the process when it happend, maybe there are > hundreds of thousands of CompilationManager::RequestRowProjector calls that > time, but according to the code logic, it should not hang there for a long > time? > {code:java} > if (!cached) { > shared_ptr task(make_shared( > *base_schema, *projection, &cache_, &generator_)); > WARN_NOT_OK_EVERY_N_SECS(pool_->Submit([task]() { task->Run(); }), > "RowProjector compilation request submit failed", 10); > return false; > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KUDU-3400) CompilationManager::RequestRowProjector consumed too much memory
[ https://issues.apache.org/jira/browse/KUDU-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646952#comment-17646952 ] Yingchun Lai commented on KUDU-3400: [~aserbin] {quote}When generating heap profile, it's important to use proper location for the toolchain and binary. If the binaries were built with devtoolset, it's necessary to set proper environment when running {{pprof}} from gperftools (see the {{$KUDU_HOME/build-support/enable_devtoolset.sh}} script). {quote} Do you mean it's needed to run the script before running \{{pprof}} > CompilationManager::RequestRowProjector consumed too much memory > > > Key: KUDU-3400 > URL: https://issues.apache.org/jira/browse/KUDU-3400 > Project: Kudu > Issue Type: Bug > Components: codegen >Affects Versions: 1.12.0 >Reporter: Yingchun Lai >Priority: Major > Attachments: data02heap.svg, heapprofile.svg, pstack.txt > > > In one of our cluster, we find that CompilationManager::RequestRowProjector > function consumed too much memory accidentally. Some situaction of this > cluster: > # some tables have more than 1000 columns, so the table schema may be very > costly to copy > # sometimes the tservers have memory pressure, and then do flush operations > more frequently (to try to reduce memory consumed by MRS/DMS) > I catched a heap profile on a tserver, found out that > CompilationManager::RequestRowProjector cost most memory when Schema copied, > the source code: > > {code:java} > CompilationTask(const Schema& base, const Schema& proj, CodeCache* cache, > CodeGenerator* generator) > : base_(base), > proj_(proj), > cache_(cache), > generator_(generator) {} {code} > That is to say, Schemas (i.e. base and proj) are copied when construct > CompilationTask objects. > The heap profile says that Schema consumed about 50GB memory, that really > shock me, even though the Schema is large, but how can it consumed 50GB > memory? I forget to `pstack` the process when it happend, maybe there are > hundreds of thousands of CompilationManager::RequestRowProjector calls that > time, but according to the code logic, it should not hang there for a long > time? > {code:java} > if (!cached) { > shared_ptr task(make_shared( > *base_schema, *projection, &cache_, &generator_)); > WARN_NOT_OK_EVERY_N_SECS(pool_->Submit([task]() { task->Run(); }), > "RowProjector compilation request submit failed", 10); > return false; > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (KUDU-3400) CompilationManager::RequestRowProjector consumed too much memory
[ https://issues.apache.org/jira/browse/KUDU-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646952#comment-17646952 ] Yingchun Lai edited comment on KUDU-3400 at 12/14/22 6:18 AM: -- [~aserbin] {quote}When generating heap profile, it's important to use proper location for the toolchain and binary. If the binaries were built with devtoolset, it's necessary to set proper environment when running {{pprof}} from gperftools (see the {{$KUDU_HOME/build-support/enable_devtoolset.sh}} script). {quote} Do you mean it's needed to run the script before running {{pprof}} ? was (Author: laiyingchun): [~aserbin] {quote}When generating heap profile, it's important to use proper location for the toolchain and binary. If the binaries were built with devtoolset, it's necessary to set proper environment when running {{pprof}} from gperftools (see the {{$KUDU_HOME/build-support/enable_devtoolset.sh}} script). {quote} Do you mean it's needed to run the script before running \{{pprof}} > CompilationManager::RequestRowProjector consumed too much memory > > > Key: KUDU-3400 > URL: https://issues.apache.org/jira/browse/KUDU-3400 > Project: Kudu > Issue Type: Bug > Components: codegen >Affects Versions: 1.12.0 >Reporter: Yingchun Lai >Priority: Major > Attachments: data02heap.svg, heapprofile.svg, pstack.txt > > > In one of our cluster, we find that CompilationManager::RequestRowProjector > function consumed too much memory accidentally. Some situaction of this > cluster: > # some tables have more than 1000 columns, so the table schema may be very > costly to copy > # sometimes the tservers have memory pressure, and then do flush operations > more frequently (to try to reduce memory consumed by MRS/DMS) > I catched a heap profile on a tserver, found out that > CompilationManager::RequestRowProjector cost most memory when Schema copied, > the source code: > > {code:java} > CompilationTask(const Schema& base, const Schema& proj, CodeCache* cache, > CodeGenerator* generator) > : base_(base), > proj_(proj), > cache_(cache), > generator_(generator) {} {code} > That is to say, Schemas (i.e. base and proj) are copied when construct > CompilationTask objects. > The heap profile says that Schema consumed about 50GB memory, that really > shock me, even though the Schema is large, but how can it consumed 50GB > memory? I forget to `pstack` the process when it happend, maybe there are > hundreds of thousands of CompilationManager::RequestRowProjector calls that > time, but according to the code logic, it should not hang there for a long > time? > {code:java} > if (!cached) { > shared_ptr task(make_shared( > *base_schema, *projection, &cache_, &generator_)); > WARN_NOT_OK_EVERY_N_SECS(pool_->Submit([task]() { task->Run(); }), > "RowProjector compilation request submit failed", 10); > return false; > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (KUDU-3400) CompilationManager::RequestRowProjector consumed too much memory
[ https://issues.apache.org/jira/browse/KUDU-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646952#comment-17646952 ] Yingchun Lai edited comment on KUDU-3400 at 12/14/22 7:19 AM: -- [~aserbin] {quote}When generating heap profile, it's important to use proper location for the toolchain and binary. If the binaries were built with devtoolset, it's necessary to set proper environment when running {{pprof}} from gperftools (see the {{$KUDU_HOME/build-support/enable_devtoolset.sh}} script). {quote} Thanks for your reminding. Do you mean it's needed to run the script before running {{pprof}} ? Similar to KUDU-3406, the tserver is in memory presure and flush op taking priority over delta compaction over and over again. I suspect https://issues.apache.org/jira/browse/KUDU-3197 is related to the issue too if {{pprof }}is not properly used, both of them say thay "Schema" cost too much memory. After upgrading the cluster to a version including this patch ([https://gerrit.cloudera.org/c/18255/),] this situation hasn't reproduced after about 1 month. was (Author: laiyingchun): [~aserbin] {quote}When generating heap profile, it's important to use proper location for the toolchain and binary. If the binaries were built with devtoolset, it's necessary to set proper environment when running {{pprof}} from gperftools (see the {{$KUDU_HOME/build-support/enable_devtoolset.sh}} script). {quote} Do you mean it's needed to run the script before running {{pprof}} ? > CompilationManager::RequestRowProjector consumed too much memory > > > Key: KUDU-3400 > URL: https://issues.apache.org/jira/browse/KUDU-3400 > Project: Kudu > Issue Type: Bug > Components: codegen >Affects Versions: 1.12.0 >Reporter: Yingchun Lai >Priority: Major > Attachments: data02heap.svg, heapprofile.svg, pstack.txt > > > In one of our cluster, we find that CompilationManager::RequestRowProjector > function consumed too much memory accidentally. Some situaction of this > cluster: > # some tables have more than 1000 columns, so the table schema may be very > costly to copy > # sometimes the tservers have memory pressure, and then do flush operations > more frequently (to try to reduce memory consumed by MRS/DMS) > I catched a heap profile on a tserver, found out that > CompilationManager::RequestRowProjector cost most memory when Schema copied, > the source code: > > {code:java} > CompilationTask(const Schema& base, const Schema& proj, CodeCache* cache, > CodeGenerator* generator) > : base_(base), > proj_(proj), > cache_(cache), > generator_(generator) {} {code} > That is to say, Schemas (i.e. base and proj) are copied when construct > CompilationTask objects. > The heap profile says that Schema consumed about 50GB memory, that really > shock me, even though the Schema is large, but how can it consumed 50GB > memory? I forget to `pstack` the process when it happend, maybe there are > hundreds of thousands of CompilationManager::RequestRowProjector calls that > time, but according to the code logic, it should not hang there for a long > time? > {code:java} > if (!cached) { > shared_ptr task(make_shared( > *base_schema, *projection, &cache_, &generator_)); > WARN_NOT_OK_EVERY_N_SECS(pool_->Submit([task]() { task->Run(); }), > "RowProjector compilation request submit failed", 10); > return false; > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KUDU-3292) Show non-default flags on varz Web UI
[ https://issues.apache.org/jira/browse/KUDU-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3292: --- Attachment: image-2023-01-10-11-57-13-209.png > Show non-default flags on varz Web UI > - > > Key: KUDU-3292 > URL: https://issues.apache.org/jira/browse/KUDU-3292 > Project: Kudu > Issue Type: Improvement > Components: ui >Reporter: Grant Henke >Assignee: Bakai Ádám >Priority: Minor > Labels: beginner, newbie, newbie++, trivial > Attachments: image-2023-01-10-11-57-13-209.png > > > Currently each Kudu server has a /varz webpage (the Flags tab) showing all of > the flags set on the server. It would be a nice usability change to include a > seperate section showing only the non-default flags. This should be super > straigtforward given we have the ability to get all the non-default flags via > GetNonDefaultFlags or GetNonDefaultFlagsMap in flags.cc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KUDU-3292) Show non-default flags on varz Web UI
[ https://issues.apache.org/jira/browse/KUDU-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656408#comment-17656408 ] Yingchun Lai commented on KUDU-3292: There may be some duplicate flags? The flags in the 'non-default' section and the ones in the 'all' section are duplicated. Would it be better to refactor this page and show more information about the flags: the description, default value, current value, etc.? This is how Impala does it: !image-2023-01-10-11-57-13-209.png! > Show non-default flags on varz Web UI > - > > Key: KUDU-3292 > URL: https://issues.apache.org/jira/browse/KUDU-3292 > Project: Kudu > Issue Type: Improvement > Components: ui >Reporter: Grant Henke >Assignee: Bakai Ádám >Priority: Minor > Labels: beginner, newbie, newbie++, trivial > Attachments: image-2023-01-10-11-57-13-209.png > > > Currently each Kudu server has a /varz webpage (the Flags tab) showing all of > the flags set on the server. It would be a nice usability change to include a > seperate section showing only the non-default flags. This should be super > straigtforward given we have the ability to get all the non-default flags via > GetNonDefaultFlags or GetNonDefaultFlagsMap in flags.cc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
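A sketch of how such a richer flags table could be populated, since gflags already exposes the name, current value, default value and description of every flag; this illustrates the suggestion above and is not Kudu's actual web UI code.

{code:java}
// Dump one row per flag with the fields an Impala-style /varz table would show.
#include <gflags/gflags.h>
#include <iostream>
#include <vector>

void DumpFlagsTable() {
  std::vector<google::CommandLineFlagInfo> flags;
  google::GetAllFlags(&flags);
  for (const auto& f : flags) {
    // Name, current value, default value, default/non-default marker, description.
    std::cout << f.name << '\t'
              << f.current_value << '\t'
              << f.default_value << '\t'
              << (f.is_default ? "default" : "non-default") << '\t'
              << f.description << '\n';
  }
}
{code}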
[jira] [Commented] (KUDU-2670) Splitting more tasks for spark job, and add more concurrent for scan operation
[ https://issues.apache.org/jira/browse/KUDU-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682837#comment-17682837 ] Yingchun Lai commented on KUDU-2670: C++ client has implemented this feature too by https://issues.apache.org/jira/browse/KUDU-3393 > Splitting more tasks for spark job, and add more concurrent for scan operation > -- > > Key: KUDU-2670 > URL: https://issues.apache.org/jira/browse/KUDU-2670 > Project: Kudu > Issue Type: Improvement > Components: java, spark >Affects Versions: 1.8.0 >Reporter: yangz >Assignee: Xu Yao >Priority: Major > Labels: performance > > Refer to the KUDU-2437 Split a tablet into primary key ranges by size. > We need a java client implementation to support the split the tablet scan > operation. > We suggest two new implementation for the java client. > # A ConcurrentKuduScanner to get more scanner read data at the same time. > This will be useful for one case. We scanner only one row, but the predicate > doesn't contain the primary key, for this case, we will send a lot scanner > request but only one row return.It will be slow to send so much scanner > request one by one. So we need a concurrent way. And by this case we test, > for a 10G tablet, it will save a lot time for one machine. > # A way to split more spark task. To do so, we need get scanner tokens for > two step, first we send to the tserver to give range, then with this range we > get more scanner tokens. For our usage we make a tablet 10G, but we split a > task to process only 1G data. So we get better performance. > And all this feature has run well for us for half a year. We hope this > feature will be useful for the community. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KUDU-3393) c++ client suport getTableKeyRanges
[ https://issues.apache.org/jira/browse/KUDU-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682838#comment-17682838 ] Yingchun Lai commented on KUDU-3393: Server and Java client parts have been implemented by https://issues.apache.org/jira/browse/KUDU-2670 > c++ client suport getTableKeyRanges > > > Key: KUDU-3393 > URL: https://issues.apache.org/jira/browse/KUDU-3393 > Project: Kudu > Issue Type: New Feature > Components: client >Reporter: dengke >Priority: Major > > The java client can split a tablet to mutil ranges and concurrent scan data. > This is a good feature, but the C++ client does not support this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010)
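A sketch of what the C++ caller side of this feature could look like, assuming the C++ client mirrors the Java API with a split-size hint on the scan token builder; SetSplitSizeBytes is an assumed method name for this sketch and is not verified against the released headers.

{code:java}
// Illustration: ask the server to split each tablet into ~1 GiB primary-key
// ranges so multiple readers can scan the same tablet concurrently.
#include <kudu/client/client.h>
#include <vector>

using kudu::client::KuduScanToken;
using kudu::client::KuduScanTokenBuilder;
using kudu::client::KuduTable;
using kudu::client::sp::shared_ptr;

kudu::Status BuildSplitTokens(const shared_ptr<KuduTable>& table,
                              std::vector<KuduScanToken*>* tokens) {
  KuduScanTokenBuilder builder(table.get());
  kudu::Status s = builder.SetSplitSizeBytes(1ULL << 30);  // assumed API, see above
  if (!s.ok()) {
    return s;
  }
  // Each returned token covers one key range and can be handed to a worker.
  return builder.Build(tokens);
}
{code}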
[jira] [Commented] (KUDU-3436) build_mini_cluster_binaries.sh doesn't work on Mac 13.0.1
[ https://issues.apache.org/jira/browse/KUDU-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739953#comment-17739953 ] Yingchun Lai commented on KUDU-3436: {quote} * New in macOS Big Sur 11.0.1, the system ships with a built-in dynamic linker cache of all system-provided libraries. As part of this change, copies of dynamic libraries are no longer present on the filesystem. Code that attempts to check for dynamic library presence by looking for a file at a path or enumerating a directory will fail. Instead, check for library presence by attempting to {{dlopen()}} the path, which will correctly check for the library in the cache. (62986286) {quote} [https://developer.apple.com/documentation/macos-release-notes/macos-big-sur-11_0_1-release-notes#Kernel] It seems we can't copy /usr/lib/libc+abi.dylib to the kudu-binary JAR artifact, except build the binaries on macOS 10.13 (the oldest version macOS Kudu support)? > build_mini_cluster_binaries.sh doesn't work on Mac 13.0.1 > - > > Key: KUDU-3436 > URL: https://issues.apache.org/jira/browse/KUDU-3436 > Project: Kudu > Issue Type: Bug >Reporter: Bakai Ádám >Priority: Major > > > {code:java} > build_mini_cluster_binaries.sh {code} > returns the following error: > {code:java} > Traceback (most recent call last): > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 503, in > main() > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 500, in main > relocate_deps(target_src, target_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 408, in relocate_deps > return relocate_deps_macos(target_src, target_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 397, in relocate_deps_macos > copy_file(dep_src, dep_dst) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 287, in copy_file > shutil.copyfile(src, dest) > File > "/opt/homebrew/Cellar/python@2/2.7.18/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", > line 96, in copyfile > with open(src, 'rb') as fsrc: > IOError: [Errno 2] No such file or directory: u'/usr/lib/libc++abi.dylib' > {code} > After further investigation, it looks like libc+{+}abi.dylib is in the > uninstrumented lib, but otool -L always gives back a path for > /usr/lib/libc{+}+abi.dylib . Simply adding the dylib into the > PAT_MACOS_LIB_EXCLUDE list doesn't work: it creates a jar file, but the > binaries can not be started. 
> It is probably due to the changes in how dynamic linking works in newer > MacOS: > [https://stackoverflow.com/questions/70581876/macos-dynamic-linker-reports-it-loaded-library-which-doesnt-exist] > It happens both on ARM64 and X86 -- This message was sent by Atlassian Jira (v8.20.10#820010)
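A minimal illustration of the dlopen()-based presence check recommended in the Apple release note quoted in the issue, which works even when the dylib exists only inside the macOS dyld shared cache and not as a file on disk.

{code:java}
// Probe a library with dlopen() instead of stat()/enumerating /usr/lib.
#include <dlfcn.h>
#include <stdio.h>

int LibraryIsAvailable(const char* path) {
  void* handle = dlopen(path, RTLD_LAZY);
  if (handle == NULL) {
    return 0;  // neither in the dyld shared cache nor on disk
  }
  dlclose(handle);
  return 1;
}

int main(void) {
  printf("libc++abi: %s\n",
         LibraryIsAvailable("/usr/lib/libc++abi.dylib") ? "present" : "missing");
  return 0;
}
{code}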
[jira] [Comment Edited] (KUDU-3436) build_mini_cluster_binaries.sh doesn't work on Mac 13.0.1
[ https://issues.apache.org/jira/browse/KUDU-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739953#comment-17739953 ] Yingchun Lai edited comment on KUDU-3436 at 7/4/23 3:35 PM: {quote} * New in macOS Big Sur 11.0.1, the system ships with a built-in dynamic linker cache of all system-provided libraries. As part of this change, copies of dynamic libraries are no longer present on the filesystem. Code that attempts to check for dynamic library presence by looking for a file at a path or enumerating a directory will fail. Instead, check for library presence by attempting to {{dlopen()}} the path, which will correctly check for the library in the cache. (62986286){quote} [https://developer.apple.com/documentation/macos-release-notes/macos-big-sur-11_0_1-release-notes#Kernel] It seems we can't copy /usr/lib/libc+abi.dylib to the kudu-binary JAR artifact, except build the binaries on macOS 10.13 (the oldest version macOS Kudu support) ~ 10.15? was (Author: laiyingchun): {quote} * New in macOS Big Sur 11.0.1, the system ships with a built-in dynamic linker cache of all system-provided libraries. As part of this change, copies of dynamic libraries are no longer present on the filesystem. Code that attempts to check for dynamic library presence by looking for a file at a path or enumerating a directory will fail. Instead, check for library presence by attempting to {{dlopen()}} the path, which will correctly check for the library in the cache. (62986286) {quote} [https://developer.apple.com/documentation/macos-release-notes/macos-big-sur-11_0_1-release-notes#Kernel] It seems we can't copy /usr/lib/libc+abi.dylib to the kudu-binary JAR artifact, except build the binaries on macOS 10.13 (the oldest version macOS Kudu support)? 
> build_mini_cluster_binaries.sh doesn't work on Mac 13.0.1 > - > > Key: KUDU-3436 > URL: https://issues.apache.org/jira/browse/KUDU-3436 > Project: Kudu > Issue Type: Bug >Reporter: Bakai Ádám >Priority: Major > > > {code:java} > build_mini_cluster_binaries.sh {code} > returns the following error: > {code:java} > Traceback (most recent call last): > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 503, in > main() > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 500, in main > relocate_deps(target_src, target_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 408, in relocate_deps > return relocate_deps_macos(target_src, target_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 397, in relocate_deps_macos > copy_file(dep_src, dep_dst) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 287, in copy_file > shutil.copyfile(src, dest) > File > "/opt/homebrew/Cellar/python@2/2.7.18/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", > line 96, in copyfile > with open(src, 'rb') as fsrc: > IOError: [Errno 2] No such file or directory: u'/usr/lib/libc++abi.dylib' > {code} > After further investigation, it looks like libc+{+}abi.dylib is in the > uninstrumented lib, but otool -L always gives back a path for > /usr/lib/libc{+}+abi.dylib . Simply adding the dylib into the > PAT_MACOS_LIB_EXCLUDE list doesn't work: it creates a jar file, but the > binaries can not be started. > It is probably due to the changes in how dynamic linking works in newer > MacOS: > [https://stackoverflow.com/questions/70581876/macos-dynamic-linker-reports-it-loaded-library-which-doesnt-exist] > It happens both on ARM64 and X86 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KUDU-3436) build_mini_cluster_binaries.sh doesn't work on Mac 13.0.1
[ https://issues.apache.org/jira/browse/KUDU-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740206#comment-17740206 ] Yingchun Lai commented on KUDU-3436: What kind of runtime issues would there be if we skip packing libc++abi.dylib into the artifact? I can see 3 ways to solve the problem: # skip packing libc++abi.dylib into the artifact # copy libc++abi.dylib from thirdparty, i.e. thirdparty/installed/uninstrumented/lib, into the artifact # do not publish a macOS artifact > build_mini_cluster_binaries.sh doesn't work on Mac 13.0.1 > - > > Key: KUDU-3436 > URL: https://issues.apache.org/jira/browse/KUDU-3436 > Project: Kudu > Issue Type: Bug >Reporter: Bakai Ádám >Priority: Major > > > {code:java} > build_mini_cluster_binaries.sh {code} > returns the following error: > {code:java} > Traceback (most recent call last): > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 503, in > main() > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 500, in main > relocate_deps(target_src, target_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 408, in relocate_deps > return relocate_deps_macos(target_src, target_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 397, in relocate_deps_macos > copy_file(dep_src, dep_dst) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 287, in copy_file > shutil.copyfile(src, dest) > File > "/opt/homebrew/Cellar/python@2/2.7.18/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", > line 96, in copyfile > with open(src, 'rb') as fsrc: > IOError: [Errno 2] No such file or directory: u'/usr/lib/libc++abi.dylib' > {code} > After further investigation, it looks like libc+{+}abi.dylib is in the > uninstrumented lib, but otool -L always gives back a path for > /usr/lib/libc{+}+abi.dylib . Simply adding the dylib into the > PAT_MACOS_LIB_EXCLUDE list doesn't work: it creates a jar file, but the > binaries can not be started. > It is probably due to the changes in how dynamic linking works in newer > MacOS: > [https://stackoverflow.com/questions/70581876/macos-dynamic-linker-reports-it-loaded-library-which-doesnt-exist] > It happens both on ARM64 and X86 -- This message was sent by Atlassian Jira (v8.20.10#820010)
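A rough sketch of option 2 above, assuming the relocation script could fall back to the copy under thirdparty/ when the path reported by otool -L does not exist on disk; the function names and the thirdparty path below are illustrative assumptions, not the actual script code:
{code:python}
import os
import shutil

# Assumed location of the locally built copy; the real build tree may differ.
THIRDPARTY_LIB_DIR = "thirdparty/installed/uninstrumented/lib"


def resolve_dep_source(dep_src):
    """Map a dependency path reported by otool -L to a file that exists.

    On macOS 11+ /usr/lib/libc++abi.dylib exists only in the dyld shared
    cache, so fall back to the copy under thirdparty/ when the reported
    path is missing from the filesystem.
    """
    if not os.path.exists(dep_src):
        candidate = os.path.join(THIRDPARTY_LIB_DIR, os.path.basename(dep_src))
        if os.path.exists(candidate):
            return candidate
    return dep_src


def copy_dep(dep_src, dep_dst):
    # Copy the resolved source instead of the (possibly nonexistent) path
    # reported by otool -L.
    shutil.copyfile(resolve_dep_source(dep_src), dep_dst)
{code}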
[jira] [Commented] (KUDU-3436) build_mini_cluster_binaries.sh doesn't work on Mac 13.0.1
[ https://issues.apache.org/jira/browse/KUDU-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740210#comment-17740210 ] Yingchun Lai commented on KUDU-3436: Besides, in recent years Macs with Apple chips have become more popular, so maybe we could switch to publishing ARM artifacts, or publish both. > build_mini_cluster_binaries.sh doesn't work on Mac 13.0.1 > - > > Key: KUDU-3436 > URL: https://issues.apache.org/jira/browse/KUDU-3436 > Project: Kudu > Issue Type: Bug >Reporter: Bakai Ádám >Priority: Major > > > {code:java} > build_mini_cluster_binaries.sh {code} > returns the following error: > {code:java} > Traceback (most recent call last): > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 503, in > main() > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 500, in main > relocate_deps(target_src, target_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 408, in relocate_deps > return relocate_deps_macos(target_src, target_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 397, in relocate_deps_macos > copy_file(dep_src, dep_dst) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 287, in copy_file > shutil.copyfile(src, dest) > File > "/opt/homebrew/Cellar/python@2/2.7.18/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", > line 96, in copyfile > with open(src, 'rb') as fsrc: > IOError: [Errno 2] No such file or directory: u'/usr/lib/libc++abi.dylib' > {code} > After further investigation, it looks like libc+{+}abi.dylib is in the > uninstrumented lib, but otool -L always gives back a path for > /usr/lib/libc{+}+abi.dylib . Simply adding the dylib into the > PAT_MACOS_LIB_EXCLUDE list doesn't work: it creates a jar file, but the > binaries can not be started. > It is probably due to the changes in how dynamic linking works in newer > MacOS: > [https://stackoverflow.com/questions/70581876/macos-dynamic-linker-reports-it-loaded-library-which-doesnt-exist] > It happens both on ARM64 and X86 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KUDU-3436) build_mini_cluster_binaries.sh doesn't work on Mac 13.0.1
[ https://issues.apache.org/jira/browse/KUDU-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17744690#comment-17744690 ] Yingchun Lai commented on KUDU-3436: Has been solved by https://gerrit.cloudera.org/c/20185/ > build_mini_cluster_binaries.sh doesn't work on Mac 13.0.1 > - > > Key: KUDU-3436 > URL: https://issues.apache.org/jira/browse/KUDU-3436 > Project: Kudu > Issue Type: Bug >Reporter: Bakai Ádám >Priority: Major > > > {code:java} > build_mini_cluster_binaries.sh {code} > returns the following error: > {code:java} > Traceback (most recent call last): > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 503, in > main() > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 500, in main > relocate_deps(target_src, target_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 408, in relocate_deps > return relocate_deps_macos(target_src, target_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 398, in relocate_deps_macos > relocate_deps_macos(dep_src, dep_dst, config) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 397, in relocate_deps_macos > copy_file(dep_src, dep_dst) > File > "/Users/adambakai/CLionProjects/kudu/build-support/mini-cluster/relocate_binaries_for_mini_cluster.py", > line 287, in copy_file > shutil.copyfile(src, dest) > File > "/opt/homebrew/Cellar/python@2/2.7.18/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", > line 96, in copyfile > with open(src, 'rb') as fsrc: > IOError: [Errno 2] No such file or directory: u'/usr/lib/libc++abi.dylib' > {code} > After further investigation, it looks like libc+{+}abi.dylib is in the > uninstrumented lib, but otool -L always gives back a path for > /usr/lib/libc{+}+abi.dylib . Simply adding the dylib into the > PAT_MACOS_LIB_EXCLUDE list doesn't work: it creates a jar file, but the > binaries can not be started. > It is probably due to the changes in how dynamic linking works in newer > MacOS: > [https://stackoverflow.com/questions/70581876/macos-dynamic-linker-reports-it-loaded-library-which-doesnt-exist] > It happens both on ARM64 and X86 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KUDU-3510) Docker images build failed
Yingchun Lai created KUDU-3510: -- Summary: Docker images build failed Key: KUDU-3510 URL: https://issues.apache.org/jira/browse/KUDU-3510 Project: Kudu Issue Type: Bug Components: build, docker Affects Versions: 1.17.0 Reporter: Yingchun Lai I encountered some issures when try to build Docker images: 1. Enviroment: CentOS 7.9, docker 24.0.1. Error: {code:java} $ python ./docker/docker-build.py --action push --platforms linux/amd64 linux/arm64 Starting docker build: 2023-09-12T13:43:53.888588 Version: 1.17.0 (a3cd1ef13) ... => CANCELED [linux/amd64 dev 7/7] RUN ./bootstrap-dev-env.sh && ./bootstrap-java-env.sh && ./bootstrap-python-env.sh && rm bootstrap-dev-env.sh && rm bootstrap-java-env.sh && rm bootstrap-python-env.sh 2.7s => ERROR [linux/arm64 dev 7/7] RUN ./bootstrap-dev-env.sh && ./bootstrap-java-env.sh && ./bootstrap-python-env.sh && rm bootstrap-dev-env.sh && rm bootstrap-java-env.sh && rm bootstrap-python-env.sh 2.0s => CANCELED [linux/arm64 runtime 5/5] RUN ./bootstrap-runtime-env.sh && rm bootstrap-runtime-env.sh 2.4s -- > [linux/arm64 dev 7/7] RUN ./bootstrap-dev-env.sh && ./bootstrap-java-env.sh && ./bootstrap-python-env.sh && rm bootstrap-dev-env.sh && rm bootstrap-java-env.sh && rm bootstrap-python-env.sh: #0 1.451 Error while loading ȇs//./bootstrap-dev-env.sh: No such file or directory -- ERROR: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/sh -c ./bootstrap-dev-env.sh && ./bootstrap-java-env.sh && ./bootstrap-python-env.sh && rm bootstrap-dev-env.sh && rm bootstrap-java-env.sh && rm bootstrap-python-env.sh" did not complete successfully: exit code: 1 Traceback (most recent call last): File "./docker/docker-build.py", line 384, in main() File "./docker/docker-build.py", line 377, in main run_command(docker_build_cmd, opts) File "./docker/docker-build.py", line 145, in run_command subprocess.check_output(cmd, shell=True) File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output raise CalledProcessError(retcode, cmd, output=output) subprocess.CalledProcessError: Command 'docker buildx build --push --platform linux/arm64,linux/amd64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data1/laiyingchun/dev/ap_kudu_117/docker/Dockerfile --target kudu --tag apache/kudu:1.17.0-ubuntu --tag apache/kudu:1.17.0 --tag apache/kudu:1.17-ubuntu --tag apache/kudu:1.17 --tag apache/kudu:latest-ubuntu --tag apache/kudu:latest /data1/laiyingchun/dev/ap_kudu_117' returned non-zero exit status 1 {code} This issue seems can be resolved by [https://gerrit.cloudera.org/c/20299/,] but I didn't troubleshoot the root cause. 2. Enviroment: Rocky 8.6, 20.10.17 Error: {code:java} $ python3 ./docker/docker-build.py --action push --platforms linux/amd64 linux/arm64 Starting docker build: 2023-09-12T13:43:42.725191 Version: 1.17.0 (a3cd1ef13) ... 
=> CACHED [linux/amd64 kudu 6/6] COPY --chown=kudu:kudu ./docker/kudu-entrypoint.sh / 0.0s => ERROR [linux/arm64 build 10/17] RUN --mount=type=cache,id=ccache,uid=1000,gid=1000,target=/home/kudu/.ccache --mount=type=cache,id=gradle-cache,uid=1000,gid=1000,target=/home/kudu/.gradle ../../build-support/enable_devtoolset.sh ../../ 727.5s -- > [linux/arm64 build 10/17] RUN --mount=type=cache,id=ccache,uid=1000,gid=1000,target=/home/kudu/.ccache --mount=type=cache,id=gradle-cache,uid=1000,gid=1000,target=/home/kudu/.gradle ../../build-support/enable_devtoolset.sh ../../thirdparty/installed/common/bin/cmake -DCMAKE_BUILD_TYPE=release -DKUDU_LINK=static -DKUDU_GIT_HASH=a3cd1ef13 -DNO_TESTS=1 ../.. && make -j4 && sudo make install && if [ "1" == "1" ]; then find "bin" -name "kudu*" -type f -exec strip {} ;; fi && if [[ "1" == "1" ]]; then find "/usr/local" -name "libkudu*" -type f -exec strip {} ;; fi: #0 2.029 -- The C compiler identification is GNU 7.5.0 #0 2.930 -- The CXX compiler identification is GNU 7.5.0 ... #0 704.1 [100%] Linking CXX executable ../../../bin/kudu #0 723.3 [100%] Built target kudu #0 723.4 sudo: effective uid is not 0
[jira] [Commented] (KUDU-3510) Docker images build failed
[ https://issues.apache.org/jira/browse/KUDU-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764106#comment-17764106 ] Yingchun Lai commented on KUDU-3510: The issue #2 seems can be resolved by command: {code:java} docker run --privileged multiarch/qemu-user-static:latest --reset -p yes --credential yes {code} ref: https://github.com/docker/buildx/issues/1335 > Docker images build failed > -- > > Key: KUDU-3510 > URL: https://issues.apache.org/jira/browse/KUDU-3510 > Project: Kudu > Issue Type: Bug > Components: build, docker >Affects Versions: 1.17.0 >Reporter: Yingchun Lai >Priority: Major > > I encountered some issures when try to build Docker images: > 1. > Enviroment: > CentOS 7.9, docker 24.0.1. > Error: > {code:java} > $ python ./docker/docker-build.py --action push --platforms linux/amd64 > linux/arm64 > Starting docker build: 2023-09-12T13:43:53.888588 > Version: 1.17.0 (a3cd1ef13) > ... > => CANCELED [linux/amd64 dev 7/7] RUN ./bootstrap-dev-env.sh && > ./bootstrap-java-env.sh && ./bootstrap-python-env.sh && rm > bootstrap-dev-env.sh && rm bootstrap-java-env.sh && rm > bootstrap-python-env.sh 2.7s > => ERROR [linux/arm64 dev 7/7] RUN ./bootstrap-dev-env.sh && > ./bootstrap-java-env.sh && ./bootstrap-python-env.sh && rm > bootstrap-dev-env.sh && rm bootstrap-java-env.sh && rm > bootstrap-python-env.sh 2.0s > => CANCELED [linux/arm64 runtime 5/5] RUN ./bootstrap-runtime-env.sh && rm > bootstrap-runtime-env.sh > > 2.4s > -- > > [linux/arm64 dev 7/7] RUN ./bootstrap-dev-env.sh && > ./bootstrap-java-env.sh && ./bootstrap-python-env.sh && rm > bootstrap-dev-env.sh && rm bootstrap-java-env.sh && rm > bootstrap-python-env.sh: > #0 1.451 Error while loading ȇs//./bootstrap-dev-env.sh: No such file or > directory > -- > ERROR: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/sh -c > ./bootstrap-dev-env.sh && ./bootstrap-java-env.sh && > ./bootstrap-python-env.sh && rm bootstrap-dev-env.sh && rm > bootstrap-java-env.sh && rm bootstrap-python-env.sh" did not complete > successfully: exit code: 1 > Traceback (most recent call last): > File "./docker/docker-build.py", line 384, in > main() > File "./docker/docker-build.py", line 377, in main > run_command(docker_build_cmd, opts) > File "./docker/docker-build.py", line 145, in run_command > subprocess.check_output(cmd, shell=True) > File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output > raise CalledProcessError(retcode, cmd, output=output) > subprocess.CalledProcessError: Command 'docker buildx build --push --platform > linux/arm64,linux/amd64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" > --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" > --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache > Kudu " --build-arg URL="https://kudu.apache.org"; > --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg > VCS_TYPE="git" --build-arg > VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file > /data1/laiyingchun/dev/ap_kudu_117/docker/Dockerfile --target kudu --tag > apache/kudu:1.17.0-ubuntu --tag apache/kudu:1.17.0 --tag > apache/kudu:1.17-ubuntu --tag apache/kudu:1.17 --tag > apache/kudu:latest-ubuntu --tag apache/kudu:latest > /data1/laiyingchun/dev/ap_kudu_117' returned non-zero exit status 1 {code} > This issue seems can be resolved by [https://gerrit.cloudera.org/c/20299/,] > but I didn't troubleshoot the root cause. > 2. 
> Enviroment: Rocky 8.6, 20.10.17 > Error: > {code:java} > $ python3 ./docker/docker-build.py --action push --platforms linux/amd64 > linux/arm64 > Starting docker build: 2023-09-12T13:43:42.725191 > Version: 1.17.0 (a3cd1ef13) > ... > => CACHED [linux/amd64 kudu 6/6] COPY --chown=kudu:kudu > ./docker/kudu-entrypoint.sh / > > 0.0s > => ERROR [linux/arm64 build 10/17] RUN > --mount=type=cache,id=ccache,uid=1000,gid=1000,target=/home/kudu/.ccache > --mount=type=cache,id=gradle-cache,uid=1000,gid=1000,target=/home/kudu/.gradle > ../../build-support/enable_devtoolset.sh ../../ 727.5s > -- > > [linux/arm64 build 10/17] RUN > --mount=type=cache,id=ccache,uid=1000,gid=1000,target=/home/kudu/.ccache > --mount=type=cache,id=gradle-cache,uid=1000,gid=1000,target=/home/kudu/.gradle > ../../build-support/enable_devtoolset.sh > ../../thirdp
[jira] [Comment Edited] (KUDU-3510) Docker images build failed
[ https://issues.apache.org/jira/browse/KUDU-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764106#comment-17764106 ] Yingchun Lai edited comment on KUDU-3510 at 9/12/23 9:50 AM: - The issue #2 seems can be resolved by command: {code:java} docker run --privileged multiarch/qemu-user-static:latest --reset -p yes --credential yes {code} ref: [https://github.com/docker/buildx/issues/1335] It works well after the command. {code:java} $ python3 ./docker/docker-build.py --action load --platforms linux/amd64 Starting docker build: 2023-09-12T17:35:26.209468 Version: 1.17.0 (a3cd1ef13) Bases: ['ubuntu:bionic'] Targets: ['kudu', 'kudu-python'] Building targets for ubuntu:bionic... Building kudu target... Running: docker buildx build --load --platform linux/amd64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data/qdev/laiyingchun/kudu/docker/Dockerfile --target kudu --tag apache/kudu:1.17.0-ubuntu --tag apache/kudu:1.17.0 --tag apache/kudu:1.17-ubuntu --tag apache/kudu:1.17 --tag apache/kudu:latest-ubuntu --tag apache/kudu:latest /data/qdev/laiyingchun/kudu [+] Building 11.8s (52/52) FINISHED ... Finished Docker build: 2023-09-12T17:37:37.680546 (0:02:11.471078) {code} {code:java} $ python3 ./docker/docker-build.py --action load --platforms linux/arm64 Starting docker build: 2023-09-12T17:41:29.525245 Version: 1.17.0 (a3cd1ef13) Bases: ['ubuntu:bionic'] Targets: ['kudu', 'kudu-python'] Building targets for ubuntu:bionic... Building kudu target... Running: docker buildx build --load --platform linux/arm64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data/qdev/laiyingchun/kudu/docker/Dockerfile --target kudu --tag apache/kudu:1.17.0-ubuntu --tag apache/kudu:1.17.0 --tag apache/kudu:1.17-ubuntu --tag apache/kudu:1.17 --tag apache/kudu:latest-ubuntu --tag apache/kudu:latest /data/qdev/laiyingchun/kudu [+] Building 21.5s (52/52) FINISHED => [internal] load .dockerignore 0.0s ... Finished Docker build: 2023-09-12T17:48:39.189521 (0:07:09.664276) {code} was (Author: laiyingchun): The issue #2 seems can be resolved by command: {code:java} docker run --privileged multiarch/qemu-user-static:latest --reset -p yes --credential yes {code} ref: https://github.com/docker/buildx/issues/1335 > Docker images build failed > -- > > Key: KUDU-3510 > URL: https://issues.apache.org/jira/browse/KUDU-3510 > Project: Kudu > Issue Type: Bug > Components: build, docker >Affects Versions: 1.17.0 >Reporter: Yingchun Lai >Priority: Major > > I encountered some issures when try to build Docker images: > 1. > Enviroment: > CentOS 7.9, docker 24.0.1. > Error: > {code:java} > $ python ./docker/docker-build.py --action push --platforms linux/amd64 > linux/arm64 > Starting docker build: 2023-09-12T13:43:53.888588 > Version: 1.17.0 (a3cd1ef13) > ... 
> => CANCELED [linux/amd64 dev 7/7] RUN ./bootstrap-dev-env.sh && > ./bootstrap-java-env.sh && ./bootstrap-python-env.sh && rm > bootstrap-dev-env.sh && rm bootstrap-java-env.sh && rm > bootstrap-python-env.sh 2.7s > => ERROR [linux/arm64 dev 7/7] RUN ./bootstrap-dev-env.sh && > ./bootstrap-java-env.sh && ./bootstrap-python-env.sh && rm > bootstrap-dev-env.sh && rm bootstrap-java-env.sh && rm > bootstrap-python-env.sh 2.0s > => CANCELED [linux/arm64 runtime 5/5] RUN ./bootstrap-runtime-env.sh && rm > bootstrap-runtime-env.sh > > 2.4s > -- > > [linux/arm64 dev 7/7] RUN ./bootstrap-dev-env.sh && > ./bootstrap-java-env.sh && ./bootstrap-python-env.sh && rm > bootstrap-dev-env.sh && rm bootstrap-java-env.sh && rm > bootstrap-python-env.sh: > #0 1.451 Error w
[jira] [Comment Edited] (KUDU-3510) Docker images build failed
[ https://issues.apache.org/jira/browse/KUDU-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764106#comment-17764106 ] Yingchun Lai edited comment on KUDU-3510 at 9/12/23 9:51 AM: - The issue #2 seems can be resolved by command: {code:java} docker run --privileged multiarch/qemu-user-static:latest --reset -p yes --credential yes {code} ref: [https://github.com/docker/buildx/issues/1335] It works well after the command. {code:java} $ python3 ./docker/docker-build.py --action load --platforms linux/amd64 Starting docker build: 2023-09-12T17:35:26.209468 Version: 1.17.0 (a3cd1ef13) Bases: ['ubuntu:bionic'] Targets: ['kudu', 'kudu-python'] Building targets for ubuntu:bionic... Building kudu target... Running: docker buildx build --load --platform linux/amd64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data/qdev/laiyingchun/kudu/docker/Dockerfile --target kudu --tag apache/kudu:1.17.0-ubuntu --tag apache/kudu:1.17.0 --tag apache/kudu:1.17-ubuntu --tag apache/kudu:1.17 --tag apache/kudu:latest-ubuntu --tag apache/kudu:latest /data/qdev/laiyingchun/kudu [+] Building 11.8s (52/52) FINISHED ... Finished Docker build: 2023-09-12T17:37:37.680546 (0:02:11.471078) {code} {code:java} $ python3 ./docker/docker-build.py --action load --platforms linux/arm64 Starting docker build: 2023-09-12T17:41:29.525245 Version: 1.17.0 (a3cd1ef13) Bases: ['ubuntu:bionic'] Targets: ['kudu', 'kudu-python'] Building targets for ubuntu:bionic... Building kudu target... Running: docker buildx build --load --platform linux/arm64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data/qdev/laiyingchun/kudu/docker/Dockerfile --target kudu --tag apache/kudu:1.17.0-ubuntu --tag apache/kudu:1.17.0 --tag apache/kudu:1.17-ubuntu --tag apache/kudu:1.17 --tag apache/kudu:latest-ubuntu --tag apache/kudu:latest /data/qdev/laiyingchun/kudu [+] Building 21.5s (52/52) FINISHED => [internal] load .dockerignore 0.0s ... Finished Docker build: 2023-09-12T17:48:39.189521 (0:07:09.664276) {code} was (Author: laiyingchun): The issue #2 seems can be resolved by command: {code:java} docker run --privileged multiarch/qemu-user-static:latest --reset -p yes --credential yes {code} ref: [https://github.com/docker/buildx/issues/1335] It works well after the command. {code:java} $ python3 ./docker/docker-build.py --action load --platforms linux/amd64 Starting docker build: 2023-09-12T17:35:26.209468 Version: 1.17.0 (a3cd1ef13) Bases: ['ubuntu:bionic'] Targets: ['kudu', 'kudu-python'] Building targets for ubuntu:bionic... Building kudu target... 
Running: docker buildx build --load --platform linux/amd64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data/qdev/laiyingchun/kudu/docker/Dockerfile --target kudu --tag apache/kudu:1.17.0-ubuntu --tag apache/kudu:1.17.0 --tag apache/kudu:1.17-ubuntu --tag apache/kudu:1.17 --tag apache/kudu:latest-ubuntu --tag apache/kudu:latest /data/qdev/laiyingchun/kudu [+] Building 11.8s (52/52) FINISHED ... Finished Docker build: 2023-09-12T17:37:37.680546 (0:02:11.471078) {code} {code:java} $ python3 ./docker/docker-build.py --action load --platforms linux/arm64 Starting docker build: 2023-09-12T17:41:29.525245 Version: 1.17.0 (a3cd1ef13) Bases: ['ubuntu:bionic'] Targets: ['kudu', 'kudu-python'] Building targets for ubuntu:bionic... Building kudu target... Running: docker buildx build --load --platform linux/arm64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="
[jira] [Comment Edited] (KUDU-3510) Docker images build failed
[ https://issues.apache.org/jira/browse/KUDU-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764106#comment-17764106 ] Yingchun Lai edited comment on KUDU-3510 at 9/12/23 9:51 AM: - The issue #2 seems can be resolved by command: {code:java} docker run --privileged multiarch/qemu-user-static:latest --reset -p yes --credential yes {code} ref: [https://github.com/docker/buildx/issues/1335] It works well after the command. {code:java} $ python3 ./docker/docker-build.py --action load --platforms linux/amd64 Starting docker build: 2023-09-12T17:35:26.209468 Version: 1.17.0 (a3cd1ef13) Bases: ['ubuntu:bionic'] Targets: ['kudu', 'kudu-python'] Building targets for ubuntu:bionic... Building kudu target... Running: docker buildx build --load --platform linux/amd64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data/qdev/laiyingchun/kudu/docker/Dockerfile --target kudu --tag apache/kudu:1.17.0-ubuntu --tag apache/kudu:1.17.0 --tag apache/kudu:1.17-ubuntu --tag apache/kudu:1.17 --tag apache/kudu:latest-ubuntu --tag apache/kudu:latest /data/qdev/laiyingchun/kudu [+] Building 11.8s (52/52) FINISHED ... Finished Docker build: 2023-09-12T17:37:37.680546 (0:02:11.471078) {code} {code:java} $ python3 ./docker/docker-build.py --action load --platforms linux/arm64 Starting docker build: 2023-09-12T17:41:29.525245 Version: 1.17.0 (a3cd1ef13) Bases: ['ubuntu:bionic'] Targets: ['kudu', 'kudu-python'] Building targets for ubuntu:bionic... Building kudu target... Running: docker buildx build --load --platform linux/arm64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data/qdev/laiyingchun/kudu/docker/Dockerfile --target kudu --tag apache/kudu:1.17.0-ubuntu --tag apache/kudu:1.17.0 --tag apache/kudu:1.17-ubuntu --tag apache/kudu:1.17 --tag apache/kudu:latest-ubuntu --tag apache/kudu:latest /data/qdev/laiyingchun/kudu [+] Building 21.5s (52/52) FINISHED => [internal] load .dockerignore 0.0s ... Building kudu-python target... 
Running: docker buildx build --load --platform linux/arm64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data/qdev/laiyingchun/kudu/docker/Dockerfile --target kudu-python --tag apache/kudu:kudu-python-1.17.0-ubuntu --tag apache/kudu:kudu-python-1.17.0 --tag apache/kudu:kudu-python-1.17-ubuntu --tag apache/kudu:kudu-python-1.17 --tag apache/kudu:kudu-python-latest-ubuntu --tag apache/kudu:kudu-python-latest /data/qdev/laiyingchun/kudu [+] Building 407.2s (53/53) FINISHED Finished Docker build: 2023-09-12T17:48:39.189521 (0:07:09.664276) {code} was (Author: laiyingchun): The issue #2 seems can be resolved by command: {code:java} docker run --privileged multiarch/qemu-user-static:latest --reset -p yes --credential yes {code} ref: [https://github.com/docker/buildx/issues/1335] It works well after the command. {code:java} $ python3 ./docker/docker-build.py --action load --platforms linux/amd64 Starting docker build: 2023-09-12T17:35:26.209468 Version: 1.17.0 (a3cd1ef13) Bases: ['ubuntu:bionic'] Targets: ['kudu', 'kudu-python'] Building targets for ubuntu:bionic... Building kudu target... Running: docker buildx build --load --platform linux/amd64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data/qdev/laiyingchun/kudu/
[jira] [Closed] (KUDU-3510) Docker images build failed
[ https://issues.apache.org/jira/browse/KUDU-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai closed KUDU-3510. -- Resolution: Fixed > Docker images build failed > -- > > Key: KUDU-3510 > URL: https://issues.apache.org/jira/browse/KUDU-3510 > Project: Kudu > Issue Type: Bug > Components: build, docker >Affects Versions: 1.17.0 >Reporter: Yingchun Lai >Priority: Major > > I encountered some issures when try to build Docker images: > 1. > Enviroment: > CentOS 7.9, docker 24.0.1. > Error: > {code:java} > $ python ./docker/docker-build.py --action push --platforms linux/amd64 > linux/arm64 > Starting docker build: 2023-09-12T13:43:53.888588 > Version: 1.17.0 (a3cd1ef13) > ... > => CANCELED [linux/amd64 dev 7/7] RUN ./bootstrap-dev-env.sh && > ./bootstrap-java-env.sh && ./bootstrap-python-env.sh && rm > bootstrap-dev-env.sh && rm bootstrap-java-env.sh && rm > bootstrap-python-env.sh 2.7s > => ERROR [linux/arm64 dev 7/7] RUN ./bootstrap-dev-env.sh && > ./bootstrap-java-env.sh && ./bootstrap-python-env.sh && rm > bootstrap-dev-env.sh && rm bootstrap-java-env.sh && rm > bootstrap-python-env.sh 2.0s > => CANCELED [linux/arm64 runtime 5/5] RUN ./bootstrap-runtime-env.sh && rm > bootstrap-runtime-env.sh > > 2.4s > -- > > [linux/arm64 dev 7/7] RUN ./bootstrap-dev-env.sh && > ./bootstrap-java-env.sh && ./bootstrap-python-env.sh && rm > bootstrap-dev-env.sh && rm bootstrap-java-env.sh && rm > bootstrap-python-env.sh: > #0 1.451 Error while loading ȇs//./bootstrap-dev-env.sh: No such file or > directory > -- > ERROR: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/sh -c > ./bootstrap-dev-env.sh && ./bootstrap-java-env.sh && > ./bootstrap-python-env.sh && rm bootstrap-dev-env.sh && rm > bootstrap-java-env.sh && rm bootstrap-python-env.sh" did not complete > successfully: exit code: 1 > Traceback (most recent call last): > File "./docker/docker-build.py", line 384, in > main() > File "./docker/docker-build.py", line 377, in main > run_command(docker_build_cmd, opts) > File "./docker/docker-build.py", line 145, in run_command > subprocess.check_output(cmd, shell=True) > File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output > raise CalledProcessError(retcode, cmd, output=output) > subprocess.CalledProcessError: Command 'docker buildx build --push --platform > linux/arm64,linux/amd64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" > --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" > --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache > Kudu " --build-arg URL="https://kudu.apache.org"; > --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg > VCS_TYPE="git" --build-arg > VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file > /data1/laiyingchun/dev/ap_kudu_117/docker/Dockerfile --target kudu --tag > apache/kudu:1.17.0-ubuntu --tag apache/kudu:1.17.0 --tag > apache/kudu:1.17-ubuntu --tag apache/kudu:1.17 --tag > apache/kudu:latest-ubuntu --tag apache/kudu:latest > /data1/laiyingchun/dev/ap_kudu_117' returned non-zero exit status 1 {code} > This issue seems can be resolved by [https://gerrit.cloudera.org/c/20299/,] > but I didn't troubleshoot the root cause. > 2. > Enviroment: Rocky 8.6, 20.10.17 > Error: > {code:java} > $ python3 ./docker/docker-build.py --action push --platforms linux/amd64 > linux/arm64 > Starting docker build: 2023-09-12T13:43:42.725191 > Version: 1.17.0 (a3cd1ef13) > ... 
> => CACHED [linux/amd64 kudu 6/6] COPY --chown=kudu:kudu > ./docker/kudu-entrypoint.sh / > > 0.0s > => ERROR [linux/arm64 build 10/17] RUN > --mount=type=cache,id=ccache,uid=1000,gid=1000,target=/home/kudu/.ccache > --mount=type=cache,id=gradle-cache,uid=1000,gid=1000,target=/home/kudu/.gradle > ../../build-support/enable_devtoolset.sh ../../ 727.5s > -- > > [linux/arm64 build 10/17] RUN > --mount=type=cache,id=ccache,uid=1000,gid=1000,target=/home/kudu/.ccache > --mount=type=cache,id=gradle-cache,uid=1000,gid=1000,target=/home/kudu/.gradle > ../../build-support/enable_devtoolset.sh > ../../thirdparty/installed/common/bin/cmake -DCMAKE_BUILD_TYPE=release > -DKUDU_LINK=static -DKUDU_GIT_HASH=a3cd1ef13 -DNO_TESTS=1 ../.. && > make -j4 && sudo make install && if [ "1" == "1" ]; then find "bin" -name > "kudu*" -type f -exe
[jira] [Comment Edited] (KUDU-3510) Docker images build failed
[ https://issues.apache.org/jira/browse/KUDU-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764106#comment-17764106 ] Yingchun Lai edited comment on KUDU-3510 at 9/12/23 9:52 AM: - The issue #2 seems can be resolved by command: {code:java} docker run --privileged multiarch/qemu-user-static:latest --reset -p yes --credential yes {code} ref: [https://github.com/docker/buildx/issues/1335] It works well after the command. {code:java} $ python3 ./docker/docker-build.py --action load --platforms linux/amd64 Starting docker build: 2023-09-12T17:35:26.209468 Version: 1.17.0 (a3cd1ef13) Bases: ['ubuntu:bionic'] Targets: ['kudu', 'kudu-python'] Building targets for ubuntu:bionic... Building kudu target... Running: docker buildx build --load --platform linux/amd64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data/qdev/laiyingchun/kudu/docker/Dockerfile --target kudu --tag apache/kudu:1.17.0-ubuntu --tag apache/kudu:1.17.0 --tag apache/kudu:1.17-ubuntu --tag apache/kudu:1.17 --tag apache/kudu:latest-ubuntu --tag apache/kudu:latest /data/qdev/laiyingchun/kudu [+] Building 11.8s (52/52) FINISHED ... Finished Docker build: 2023-09-12T17:37:37.680546 (0:02:11.471078) {code} {code:java} $ python3 ./docker/docker-build.py --action load --platforms linux/arm64 Starting docker build: 2023-09-12T17:41:29.525245 Version: 1.17.0 (a3cd1ef13) Bases: ['ubuntu:bionic'] Targets: ['kudu', 'kudu-python'] Building targets for ubuntu:bionic... Building kudu target... Running: docker buildx build --load --platform linux/arm64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data/qdev/laiyingchun/kudu/docker/Dockerfile --target kudu --tag apache/kudu:1.17.0-ubuntu --tag apache/kudu:1.17.0 --tag apache/kudu:1.17-ubuntu --tag apache/kudu:1.17 --tag apache/kudu:latest-ubuntu --tag apache/kudu:latest /data/qdev/laiyingchun/kudu [+] Building 21.5s (52/52) FINISHED => [internal] load .dockerignore 0.0s ... Building kudu-python target... Running: docker buildx build --load --platform linux/arm64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data/qdev/laiyingchun/kudu/docker/Dockerfile --target kudu-python --tag apache/kudu:kudu-python-1.17.0-ubuntu --tag apache/kudu:kudu-python-1.17.0 --tag apache/kudu:kudu-python-1.17-ubuntu --tag apache/kudu:kudu-python-1.17 --tag apache/kudu:kudu-python-latest-ubuntu --tag apache/kudu:kudu-python-latest /data/qdev/laiyingchun/kudu [+] Building 407.2s (53/53) FINISHED ... 
Finished Docker build: 2023-09-12T17:48:39.189521 (0:07:09.664276) {code} was (Author: laiyingchun): The issue #2 seems can be resolved by command: {code:java} docker run --privileged multiarch/qemu-user-static:latest --reset -p yes --credential yes {code} ref: [https://github.com/docker/buildx/issues/1335] It works well after the command. {code:java} $ python3 ./docker/docker-build.py --action load --platforms linux/amd64 Starting docker build: 2023-09-12T17:35:26.209468 Version: 1.17.0 (a3cd1ef13) Bases: ['ubuntu:bionic'] Targets: ['kudu', 'kudu-python'] Building targets for ubuntu:bionic... Building kudu target... Running: docker buildx build --load --platform linux/amd64 --build-arg RUNTIME_BASE_OS="ubuntu:bionic" --build-arg DEV_BASE_OS="ubuntu:bionic" --build-arg BASE_OS="ubuntu:bionic" --build-arg DOCKERFILE="docker/Dockerfile" --build-arg MAINTAINER="Apache Kudu " --build-arg URL="https://kudu.apache.org"; --build-arg VERSION="1.17.0" --build-arg VCS_REF="a3cd1ef13" --build-arg VCS_TYPE="git" --build-arg VCS_URL="https://gitbox.apache.org/repos/asf/kudu.git"; --file /data/qdev/laiyingchun/k
[jira] [Created] (KUDU-3580) Kudu servers and tests crash after linking RocksDB library
Yingchun Lai created KUDU-3580: -- Summary: Kudu servers and tests crash after linking RocksDB library Key: KUDU-3580 URL: https://issues.apache.org/jira/browse/KUDU-3580 Project: Kudu Issue Type: Bug Components: master, test, tserver Reporter: Yingchun Lai -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KUDU-3580) Kudu servers and tests crash after linking RocksDB library
[ https://issues.apache.org/jira/browse/KUDU-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3580: --- Description: After this commit [1] is merged, it's reported that the binaries (both test binaries, {{{}kudu{}}}, {{{}kudu-tserver{}}}, {{kudu-master}} results in SIGILL with coredumps). GDB shows the following stack: (gdb) run Starting program: /home/aserbin/tmp/kudu [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGILL, Illegal instruction. std::function::swap(std::function&) (__x=..., this=0x7fffe0e0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548 548 /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h: No such file or directory. (gdb) bt #0 std::function::swap(std::function&) ( __x=..., this=0x7fffe0e0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548 #1 std::function::operator=(std::function const&) (__x=..., this=0x7fffe108) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:463 #2 rocksdb::OptionTypeInfo::SetParseFunc(std::function const&) (f=..., this=0x7fffe100) at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:591 #3 rocksdb::OptionTypeInfo::AsCustomSharedPtr ( offset=offset@entry=0, ovt=ovt@entry=rocksdb::OptionVerificationType::kByName, flags=flags@entry=rocksdb::OptionTypeFlags::kDontSerialize) at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:497 #4 0x00ee8c5e in __static_initialization_and_destruction_0(int, int) [clone .constprop.449] () at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/env/env.cc:1267 #5 0x03ca23cd in __libc_csu_init () #6 0x75a69c18 in __libc_start_main (main=0xed8de0 , argc=1, argv=0x7fffe4f8, init=0x3ca2380 <__libc_csu_init>, fini=, rtld_fini=, stack_end=0x7fffe4e8) at ../csu/libc-start.c:266 #7 0x00f8f4c4 in _start () at /root/Projects/kudu/src/kudu/tools/tool_main.cc:306 (gdb) 1. https://github.com/apache/kudu/commit/4da8b20070a7c0070a1829dfd50fdc78cad88b6a > Kudu servers and tests crash after linking RocksDB library > -- > > Key: KUDU-3580 > URL: https://issues.apache.org/jira/browse/KUDU-3580 > Project: Kudu > Issue Type: Bug > Components: master, test, tserver >Reporter: Yingchun Lai >Priority: Critical > > After this commit [1] is merged, it's reported that the binaries (both test > binaries, {{{}kudu{}}}, {{{}kudu-tserver{}}}, {{kudu-master}} results in > SIGILL with coredumps). > > GDB shows the following stack: > (gdb) run > Starting program: /home/aserbin/tmp/kudu > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". > Program received signal SIGILL, Illegal instruction. > std::function const&, std::string const&, void*)>::swap(std::function (rocksdb::ConfigOptions const&, std::string const&, std::string const&, > void*)>&) (__x=..., > this=0x7fffe0e0) > at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548 > 548 /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h: No > such file or directory. 
> (gdb) bt > #0 std::function const&, std::string const&, void*)>::swap(std::function (rocksdb::ConfigOptions const&, std::string const&, std::string const&, > void*)>&) ( > __x=..., this=0x7fffe0e0) > at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548 > #1 std::function const&, std::string const&, void*)>::operator=(std::function (rocksdb::ConfigOptions const&, std::string const&, std::string const&, > void*)> const&) (__x=..., this=0x7fffe108) > at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:463 > #2 rocksdb::OptionTypeInfo::SetParseFunc(std::function (rocksdb::ConfigOptions const&, std::string const&, std::string const&, > void*)> const&) > (f=..., this=0x7fffe100) > at > /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:591 > #3 rocksdb::OptionTypeInfo::AsCustomSharedPtr ( > offset=offset@entry=0, > ovt=ovt@entry=rocksdb::OptionVerificationType::kByName, > flags=flags@entry=rocksdb::OptionTypeFlags::kDontSerialize) > at > /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:497 > #4 0x00ee8c5e in __static_initialization_and_destruction_0(int, int) > [clone .constprop.449] () > at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/env/env.cc:1267 > #5 0x03ca23cd in __libc_csu_init () > #6 0x75a69c18 in __l
[jira] [Updated] (KUDU-3580) Kudu servers and tests crash after linking RocksDB library
[ https://issues.apache.org/jira/browse/KUDU-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3580: --- Description: After this commit [1] is merged, it's reported that the binaries (both test binaries, {{{}kudu{}}}, {{{}kudu-tserver{}}}, {{kudu-master}} results in SIGILL with coredumps). GDB shows the following stack: {code:java} (gdb) run Starting program: /home/aserbin/tmp/kudu [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGILL, Illegal instruction. std::function::swap(std::function&) (__x=..., this=0x7fffe0e0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548 548 /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h: No such file or directory. (gdb) bt #0 std::function::swap(std::function&) ( __x=..., this=0x7fffe0e0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548 #1 std::function::operator=(std::function const&) (__x=..., this=0x7fffe108) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:463 #2 rocksdb::OptionTypeInfo::SetParseFunc(std::function const&) (f=..., this=0x7fffe100) at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:591 #3 rocksdb::OptionTypeInfo::AsCustomSharedPtr ( offset=offset@entry=0, ovt=ovt@entry=rocksdb::OptionVerificationType::kByName, flags=flags@entry=rocksdb::OptionTypeFlags::kDontSerialize) at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:497 #4 0x00ee8c5e in __static_initialization_and_destruction_0(int, int) [clone .constprop.449] () at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/env/env.cc:1267 #5 0x03ca23cd in __libc_csu_init () #6 0x75a69c18 in __libc_start_main (main=0xed8de0 , argc=1, argv=0x7fffe4f8, init=0x3ca2380 <__libc_csu_init>, fini=, rtld_fini=, stack_end=0x7fffe4e8) at ../csu/libc-start.c:266 #7 0x00f8f4c4 in _start () at /root/Projects/kudu/src/kudu/tools/tool_main.cc:306 (gdb) {code} 1. [https://github.com/apache/kudu/commit/4da8b20070a7c0070a1829dfd50fdc78cad88b6a] was: After this commit [1] is merged, it's reported that the binaries (both test binaries, {{{}kudu{}}}, {{{}kudu-tserver{}}}, {{kudu-master}} results in SIGILL with coredumps). GDB shows the following stack: (gdb) run Starting program: /home/aserbin/tmp/kudu [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGILL, Illegal instruction. std::function::swap(std::function&) (__x=..., this=0x7fffe0e0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548 548 /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h: No such file or directory. 
(gdb) bt #0 std::function::swap(std::function&) ( __x=..., this=0x7fffe0e0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548 #1 std::function::operator=(std::function const&) (__x=..., this=0x7fffe108) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:463 #2 rocksdb::OptionTypeInfo::SetParseFunc(std::function const&) (f=..., this=0x7fffe100) at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:591 #3 rocksdb::OptionTypeInfo::AsCustomSharedPtr ( offset=offset@entry=0, ovt=ovt@entry=rocksdb::OptionVerificationType::kByName, flags=flags@entry=rocksdb::OptionTypeFlags::kDontSerialize) at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:497 #4 0x00ee8c5e in __static_initialization_and_destruction_0(int, int) [clone .constprop.449] () at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/env/env.cc:1267 #5 0x03ca23cd in __libc_csu_init () #6 0x75a69c18 in __libc_start_main (main=0xed8de0 , argc=1, argv=0x7fffe4f8, init=0x3ca2380 <__libc_csu_init>, fini=, rtld_fini=, stack_end=0x7fffe4e8) at ../csu/libc-start.c:266 #7 0x00f8f4c4 in _start () at /root/Projects/kudu/src/kudu/tools/tool_main.cc:306 (gdb) 1. https://github.com/apache/kudu/commit/4da8b20070a7c0070a1829dfd50fdc78cad88b6a > Kudu servers and tests crash after linking RocksDB library > -- > > Key: KUDU-3580 > URL: https://issues.apache.org/jira/browse/KUDU-3580 > Project: Kudu > Issue Type: Bug > Components: master, test, tserver >Reporter: Yingchun Lai >Priority: Critical > > After this commit [1] is merged, it's reported that the binaries (both test > binaries, {{{}kudu{}}}, {{{}kudu-tserver{}}}, {{kudu-master}} results in > SIGILL with coredumps). >
[jira] [Updated] (KUDU-3580) Kudu servers and tests crash after linking RocksDB library
[ https://issues.apache.org/jira/browse/KUDU-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai updated KUDU-3580: --- Description: After this commit [1] is merged, it's reported that the binaries (both test binaries, {{{}kudu{}}}, {{{}kudu-tserver{}}}, {{kudu-master}} results in SIGILL with coredumps). GDB shows the following stack: {code:java} (gdb) run Starting program: /home/aserbin/tmp/kudu [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGILL, Illegal instruction. std::function::swap(std::function&) (__x=..., this=0x7fffe0e0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548 548 /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h: No such file or directory. (gdb) bt #0 std::function::swap(std::function&) ( __x=..., this=0x7fffe0e0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548 #1 std::function::operator=(std::function const&) (__x=..., this=0x7fffe108) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:463 #2 rocksdb::OptionTypeInfo::SetParseFunc(std::function const&) (f=..., this=0x7fffe100) at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:591 #3 rocksdb::OptionTypeInfo::AsCustomSharedPtr ( offset=offset@entry=0, ovt=ovt@entry=rocksdb::OptionVerificationType::kByName, flags=flags@entry=rocksdb::OptionTypeFlags::kDontSerialize) at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:497 #4 0x00ee8c5e in __static_initialization_and_destruction_0(int, int) [clone .constprop.449] () at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/env/env.cc:1267 #5 0x03ca23cd in __libc_csu_init () #6 0x75a69c18 in __libc_start_main (main=0xed8de0 , argc=1, argv=0x7fffe4f8, init=0x3ca2380 <__libc_csu_init>, fini=, rtld_fini=, stack_end=0x7fffe4e8) at ../csu/libc-start.c:266 #7 0x00f8f4c4 in _start () at /root/Projects/kudu/src/kudu/tools/tool_main.cc:306 (gdb) {code} And an example of results where SIGILL is observed (just built the binaries with the top of the master branch at 634d967a0c620db2b3932c09b1fe13be1dc70f44): [http://dist-test.cloudera.org/job?job_id=root.1712768932.261750] 1. [https://github.com/apache/kudu/commit/4da8b20070a7c0070a1829dfd50fdc78cad88b6a] was: After this commit [1] is merged, it's reported that the binaries (both test binaries, {{{}kudu{}}}, {{{}kudu-tserver{}}}, {{kudu-master}} results in SIGILL with coredumps). GDB shows the following stack: {code:java} (gdb) run Starting program: /home/aserbin/tmp/kudu [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGILL, Illegal instruction. std::function::swap(std::function&) (__x=..., this=0x7fffe0e0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548 548 /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h: No such file or directory. 
(gdb) bt #0 std::function::swap(std::function&) ( __x=..., this=0x7fffe0e0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:548 #1 std::function::operator=(std::function const&) (__x=..., this=0x7fffe108) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:463 #2 rocksdb::OptionTypeInfo::SetParseFunc(std::function const&) (f=..., this=0x7fffe100) at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:591 #3 rocksdb::OptionTypeInfo::AsCustomSharedPtr ( offset=offset@entry=0, ovt=ovt@entry=rocksdb::OptionVerificationType::kByName, flags=flags@entry=rocksdb::OptionTypeFlags::kDontSerialize) at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/include/rocksdb/utilities/options_type.h:497 #4 0x00ee8c5e in __static_initialization_and_destruction_0(int, int) [clone .constprop.449] () at /root/Projects/kudu/thirdparty/src/rocksdb-7.7.3/env/env.cc:1267 #5 0x03ca23cd in __libc_csu_init () #6 0x75a69c18 in __libc_start_main (main=0xed8de0 , argc=1, argv=0x7fffe4f8, init=0x3ca2380 <__libc_csu_init>, fini=, rtld_fini=, stack_end=0x7fffe4e8) at ../csu/libc-start.c:266 #7 0x00f8f4c4 in _start () at /root/Projects/kudu/src/kudu/tools/tool_main.cc:306 (gdb) {code} 1. [https://github.com/apache/kudu/commit/4da8b20070a7c0070a1829dfd50fdc78cad88b6a] > Kudu servers and tests crash after linking RocksDB library > -- > > Key: KUDU-3580 > URL: https://issues.apache.org/jira/browse/KUDU-3580 > Project: Kudu > Issue Type: Bug > Components: master, test, tserver >Reporter: Yingchun Lai >Priority: Critical > > After this commit