[ https://issues.apache.org/jira/browse/SOLR-15644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419087#comment-17419087 ]
Mark Robert Miller edited comment on SOLR-15644 at 9/23/21, 9:40 AM:
---------------------------------------------------------------------

I’m not sure where I’m being offensive. I don’t expect you to go fix all the Solr issues; it’s a ridiculous amount of time and effort, and I can tell you Uwe isn’t going to do it either.

And it’s not a knock on the test framework either. If I thought it should have to deal with Solr and all of its random dependencies and its problems in an ideal way for every case that Solr faces, I would say it should be changed. I haven’t found anything I’ve tried to change. I use it as intended - it catches bad behavior, and if I can address that behavior I do. And you can in 100s of cases. In many fewer cases you either cannot, or the effort is too large. For instance, many things can be addressed by making Solr handle interrupts properly and making it close/shut down properly. Both are absolutely huge endeavors. I have done it, and it would still take forever to repeat.

I won’t get into all the issues because most of them apply when the tests take milliseconds to seconds. When most of the tests take 10s of seconds to minutes, I don’t really care about the performance of moving from test to test. I’ll wait, linger, do rounds of interrupts, whatever. The broad linger and other waits are still detrimental because you lose the value of the framework letting you know what’s not right - things can be added now that cause almost every test to linger the full 10 seconds and then get interrupted, and no one would even bat an eye or notice. But even that is not a big deal in the current world of things.

The only thing I have to deal with in this world is cases where you remove some of the exceptions and slowness of a test and it has objects / threads that you don’t want carrying on into other tests, but with some layers removed, the test framework will interrupt them, they won’t stop in time, and it will fail the test run.
But I can stop them and not fail the run, and do it relatively quickly. Not with any broad approach, but specifically for that problem resource.

The other items are no longer very interesting to me, but there are various cases. Sometimes there are items where if you just wait a very short time, it will close very quickly, but if you hit it with an interrupt it will take much longer. There are cases with the overseer where if you hit it with an interrupt you may poke the bear and find it almost impossible to stop. There are other cases where an interrupt gets you out of one layer of third-party code, but not fully out, and another quick interrupt or two will get you out. All of these cases are very individualized to mostly isolated objects / dependencies, and if I’m doing things well, I don’t want any kind of broad behavior to deal with them - I want the framework to tightly control and fail everything I don’t have to individually work around as a last resort.

Anyway, 1000s of issues are there; it just depends on what you care about and what affects you. You could look at the big integration tests and say they are heavy, and so crank down the number of test JVMs and say: Solr tests are often heavy, use fewer JVMs than Lucene. And that might be the end of it, and Solr will still search your data. You could also look at those test JVMs when you start up 15 at once and see most of the heavy tests sitting there using 0-3% of CPU most of the time. You could then look into that and find 1000 things that, if addressed, make those tests run as fast as or faster than tight, no-dependency Lucene tests. If you took the former approach, maybe there are not 1000 problems: the tests are passing and you are searching your data. With the second approach, you have a slightly different system, where those huge integration tests are giving Lucene a hard time, actually using CPU, and exposing actual issues that are never seen when they sit around mostly hanging out.
1000 issues from one angle, no major issue from the other. Which perspective I’m seeing depends on whether I’m just collecting a paycheck or want to be honest about what’s going on in front of me, and whether it’s a tangential piece of software in my daily life or a core piece.

> Add the ability to interrupt and wait for threads for problematic tests.
> ------------------------------------------------------------------------
>
>                 Key: SOLR-15644
>                 URL: https://issues.apache.org/jira/browse/SOLR-15644
>             Project: Solr
>          Issue Type: Test
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: Tests
>            Reporter: Mark Robert Miller
>            Assignee: Mark Robert Miller
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The stuff in the test framework is slow and lacks control. For problematic
> tests, you don't want to linger first and you want fine control around
> interrupting - interrupting with a sledgehammer approach can actually make
> things take longer.
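
For illustration, the kind of per-resource control argued for above (wait briefly first, then interrupt one specific thread in small rounds rather than applying a broad linger) might look roughly like the sketch below. It is only a sketch under stated assumptions: `TargetedThreadStop`, `stopThread`, and all parameter names are hypothetical, they are not part of Solr or the Lucene test framework, and the code is not taken from any SOLR-15644 patch. It uses only `Thread.join(long)`, `Thread.interrupt()`, and `Thread.isAlive()` from the JDK.

```java
/**
 * Hypothetical helper (an assumption for illustration, not Solr code):
 * stops ONE known problem thread with a short grace period followed by
 * a bounded number of interrupt rounds.
 */
final class TargetedThreadStop {

  private TargetedThreadStop() {}

  /**
   * @param target         the specific thread to stop
   * @param graceMillis    how long to wait before sending any interrupt
   * @param maxInterrupts  upper bound on the number of interrupts sent
   * @param perRoundMillis how long to wait after each interrupt
   * @return true if the thread has terminated; false if it is still alive
   *         or if the calling thread was interrupted while waiting
   */
  static boolean stopThread(Thread target, long graceMillis,
                            int maxInterrupts, long perRoundMillis) {
    try {
      // Step 1: no interrupt yet. Some resources close fastest when left
      // alone for a moment. (join(0) would wait forever, hence the guard.)
      if (graceMillis > 0) {
        target.join(graceMillis);
      }
      // Step 2: escalate one interrupt at a time and re-check after each
      // round, since one interrupt may only exit one layer of blocking code.
      for (int round = 0; round < maxInterrupts && target.isAlive(); round++) {
        target.interrupt();
        if (perRoundMillis > 0) {
          target.join(perRoundMillis);
        }
      }
    } catch (InterruptedException e) {
      // Preserve the caller's interrupt status and report failure.
      Thread.currentThread().interrupt();
      return false;
    }
    return !target.isAlive();
  }
}
```

A test could call something like `TargetedThreadStop.stopThread(problemThread, 50, 3, 200)` in its teardown for the one resource known to misbehave, and fail explicitly if it returns `false`. Everything else is left to the framework's strict leak checks. The specific timeouts here are placeholders, not recommendations.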