[Canonical-ubuntu-qa] [Bug 2057734] Re: proc_sched_rt01 from ubuntu_ltp failed
** Changed in: linux (Ubuntu Mantic)
       Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Xenial)
       Status: Confirmed => Won't Fix

** Changed in: linux (Ubuntu Bionic)
       Status: Confirmed => Won't Fix

--
You received this bug notification because you are a member of Canonical Platform QA Team, which is subscribed to ubuntu-kernel-tests.
https://bugs.launchpad.net/bugs/2057734

Title:
  proc_sched_rt01 from ubuntu_ltp failed

Status in ubuntu-kernel-tests:
  New
Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Xenial:
  Won't Fix
Status in linux source package in Bionic:
  Won't Fix
Status in linux source package in Focal:
  Confirmed
Status in linux source package in Jammy:
  Confirmed
Status in linux source package in Mantic:
  Fix Committed

Bug description:
  [Impact]
  The updated LTP adds the proc_sched_rt01 test case, which cannot pass because several upstream commits are missing from the kernel.

  Test log:
  INFO: Test start time: Tue Mar 12 11:52:21 UTC 2024
  COMMAND: /opt/ltp/bin/ltp-pan -q -e -S -a 163430 -n 163430 -f /tmp/ltp-X3Nz2HWCQe/alltests -l /dev/null -C /dev/null -T /dev/null
  LOG File: /dev/null
  FAILED COMMAND File: /dev/null
  TCONF COMMAND File: /dev/null
  Running tests...
  tst_kconfig.c:87: TINFO: Parsing kernel config '/lib/modules/6.5.0-27-generic/build/.config'
  tst_test.c:1741: TINFO: LTP version: 20230929-406-gcbc2d0568
  tst_test.c:1625: TINFO: Timeout per run is 0h 00m 30s
  proc_sched_rt01.c:45: TFAIL: Expect: timeslice_ms > 0 after reset to default
  proc_sched_rt01.c:51: TPASS: echo 0 > /proc/sys/kernel/sched_rt_period_us : EINVAL (22)
  proc_sched_rt01.c:53: TFAIL: echo -1 > /proc/sys/kernel/sched_rt_period_us invalid retval 2: SUCCESS (0)
  proc_sched_rt01.c:59: TPASS: echo -2 > /proc/sys/kernel/sched_rt_runtime_us : EINVAL (22)
  proc_sched_rt01.c:72: TFAIL: echo rt_period_us+1 > /proc/sys/kernel/sched_rt_runtime_us invalid retval 1: SUCCESS (0)

  HINT: You _MAY_ be missing kernel fixes:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c1fc6484e1fb
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=079be8fc6309

  [Fix]
  There are 3 relevant upstream commits:
  1. 079be8fc6309 sched/rt: Disallow writing invalid values to sched_rt_period_us
  2. c1fc6484e1fb sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset
  3. c7fcb99877f9 sched/rt: Fix sysctl_sched_rr_timeslice intial value

  Mantic: the 3rd is already in master-next.
  Jammy: stable v5.15.150 includes the three commits.
  Focal: master-next has included them since the update to v5.4.270.
  Bionic: all three commits are needed.

  [Test case]
  Run LTP update 20240312 and check the log of proc_sched_rt01.

  [Regression potential]
  Low risk, since these changes have been upstream for a while.
Cyril Hrubis (2): sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset sched/rt: Disallow writing invalid values to sched_rt_period_us kernel/sched/rt.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) [Original Bug Description] This is a new test case, issue found on M/J/F/B when testing LTP update 20240312 Test log: INFO: Test start time: Tue Mar 12 11:52:21 UTC 2024 COMMAND:/opt/ltp/bin/ltp-pan -q -e -S -a 163430 -n 163430 -f /tmp/ltp-X3Nz2HWCQe/alltests -l /dev/null -C /dev/null -T /dev/null LOG File: /dev/null FAILED COMMAND File: /dev/null TCONF COMMAND File: /dev/null Running tests... tst_kconfig.c:87: TINFO: Parsing kernel config '/lib/modules/6.5.0-27-generic/build/.config' tst_test.c:1741: TINFO: LTP version: 20230929-406-gcbc2d0568 tst_test.c:1625: TINFO: Timeout per run is 0h 00m 30s proc_sched_rt01.c:45: TFAIL: Expect: timeslice_ms > 0 after reset to default proc_sched_rt01.c:51: TPASS: echo 0 > /proc/sys/kernel/sched_rt_period_us : EINVAL (22) proc_sched_rt01.c:53: TFAIL: echo -1 > /proc/sys/kernel/sched_rt_period_us invalid retval 2: SUCCESS (0) proc_sched_rt01.c:59: TPASS: echo -2 > /proc/sys/kernel/sched_rt_runtime_us : EINVAL (22) proc_sched_rt01.c:72: TFAIL: echo rt_period_us+1 > /proc/sys/kernel/sched_rt_runtime_us invalid retval 1: SUCCESS (0) HINT: You _MAY_ be missing kernel fixes: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c1fc6484e1fb https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=079be8fc6309 Summary: passed 2 failed 3 broken 0 skipped 0 warnings 0 INFO: ltp-pan reported some tests FAIL LTP Version: 20230929-406-gcbc2d0568 INFO: Test end time: Tue Mar 12 11:52:21 UTC 2024 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2057734/+subscriptions -- Mailing list: https://launchpad.net/~canonical-ubuntu-qa Post to : canonical-ubuntu-qa@lists.launchpad.net Unsubscribe : https
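The checks that proc_sched_rt01 performs in the log above can be summarised as a small model. This is only a sketch of what the test log expects, not the kernel implementation: the function name `expected_errno` is hypothetical, and treating -1 in sched_rt_runtime_us as "no limit" is an assumption taken from the kernel's RT scheduling documentation rather than from this bug report.

```python
import errno

# Assumption: -1 in sched_rt_runtime_us means "no limit" and must stay writable.
RUNTIME_UNLIMITED = -1


def expected_errno(period_us: int, runtime_us: int) -> int:
    """Return 0 if the pair should be accepted, else errno.EINVAL.

    Models only the expectations visible in the proc_sched_rt01 log;
    it is not kernel code.
    """
    # "echo 0" and "echo -1" into sched_rt_period_us must both be rejected.
    if period_us <= 0:
        return errno.EINVAL
    # "echo -2" into sched_rt_runtime_us must be rejected.
    if runtime_us < RUNTIME_UNLIMITED:
        return errno.EINVAL
    # "echo rt_period_us+1" into sched_rt_runtime_us must be rejected.
    if runtime_us != RUNTIME_UNLIMITED and runtime_us > period_us:
        return errno.EINVAL
    return 0
```

Read this way, the TFAIL lines at proc_sched_rt01.c:53 and :72 are the second half of the first check and the third check respectively: the unfixed kernel returns success where the model returns EINVAL.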
[Canonical-ubuntu-qa] [Merge] ~xypron/ubuntu-manual-tests:milk-v-mars into ubuntu-manual-tests:main
Heinrich Schuchardt has proposed merging ~xypron/ubuntu-manual-tests:milk-v-mars into ubuntu-manual-tests:main. Commit message: Test case for pre-installed Milk-V Mars image Requested reviews: Ubuntu Testcase Admins (ubuntu-testcase) For more details, see: https://code.launchpad.net/~xypron/ubuntu-manual-tests/+git/ubuntu-manual-tests/+merge/464815 Please, add the new test case to the Noble test plan. -- Your team Ubuntu Testcase Admins is requested to review the proposed merge of ~xypron/ubuntu-manual-tests:milk-v-mars into ubuntu-manual-tests:main. diff --git a/testcases/image/1795_Install Milk-V Mars b/testcases/image/1795_Install Milk-V Mars new file mode 100755 index 000..5876f76 --- /dev/null +++ b/testcases/image/1795_Install Milk-V Mars @@ -0,0 +1,38 @@ +The scope of this test is to ensure that riscv64+visionfive2 image boots from SD card on Milk-V Mars board + + +Flash downloaded image onto SD card +You can use Gnome Disks app to restore img.xz onto the SD card +Alternatively you can use xz -d to decompress, and then dd to copy the image to the SD card +Connect networking, serial console to the board +Ethernet cable for networking +USB to TTL adapter for serial console (pinout available here: https://milkv.io/docs/mars/getting-started/setup) +Connect to the serial console +sudo screen /dev/ttyUSB0 115200 +Power on the board +You should see U-BOOT output +It should then boot GRUB after a delay +You should see GRUB menu +It should then boot the default kernel after a delay +After a while cloud-init will run + Wait for the 'Cloud-init finished' message +Then one will be able to login +Login and change password +Login using ubuntu for both username and password +Reenter ubuntu password again +Set new password +Confirm the new password +Perform generic testing +Check that apt update works +Run any command that is not installed, check that command-not-found recommends things to install +e.g. hello +Install a package and check that it works, e.g. 
hello +Reboot +The board should reboot normally +Poweroff +Console messages should reach poweroff target +There should be final kernel dmsg powering off +Manually turn power-off from the board + +If all actions produce the expected results listed, please submit a 'passed' result. +If an action fails, or produces an unexpected result, please submit a 'failed' result and file a bug. Please be sure to include the bug number when you submit your result. -- Mailing list: https://launchpad.net/~canonical-ubuntu-qa Post to : canonical-ubuntu-qa@lists.launchpad.net Unsubscribe : https://launchpad.net/~canonical-ubuntu-qa More help : https://help.launchpad.net/ListHelp
[Canonical-ubuntu-qa] [Merge] ~andersson123/autopkgtest-cloud:align-with-prod into autopkgtest-cloud:master
Tim Andersson has proposed merging ~andersson123/autopkgtest-cloud:align-with-prod into autopkgtest-cloud:master.

Requested reviews:
  Canonical's Ubuntu QA (canonical-ubuntu-qa)

For more details, see:
https://code.launchpad.net/~andersson123/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/464819
--
Your team Canonical's Ubuntu QA is requested to review the proposed merge of ~andersson123/autopkgtest-cloud:align-with-prod into autopkgtest-cloud:master.
diff --git a/mojo/service-bundle b/mojo/service-bundle
index 0a51b2c..144d844 100644
--- a/mojo/service-bundle
+++ b/mojo/service-bundle
@@ -32,7 +32,7 @@ applications:
     constraints: mem=16G cores=8 root-disk=40G
 {%- if stage_name == "production" or stage_name == "staging" %}
     storage:
-      tmp: 200G
+      tmp: 350G
 {%- endif %}
     options: &common-options
       swift-password: include-file://{{local_dir}}/swift_password
@@ -100,17 +100,17 @@ applications:
 {%- if stage_name == "production" %}
       n-workers: |-
         lcy02:
-          amd64: 45
+          amd64: 90
         bos01:
-          arm64: 20
-          ppc64el: 20
-          s390x: 20
+          arm64: 22
+          ppc64el: 22
+          s390x: 22
         bos02:
-          arm64: 20
-          ppc64el: 20
+          arm64: 22
+          ppc64el: 22
           s390x: 22
         bos03:
-          arm64: 25
+          arm64: 28
 {%- elif stage_name == "staging" %}
       n-workers: |-
         lcy02:
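To see what the production n-workers change in this diff amounts to per architecture, the values can be totalled with a short script. This is a sketch only: the numbers are copied from the hunk above, while the names `OLD`, `NEW` and `totals_per_arch` are made up for illustration and do not exist in autopkgtest-cloud.

```python
from collections import Counter

# Per-cloud worker counts copied from the "production" hunk of the diff.
OLD = {
    "lcy02": {"amd64": 45},
    "bos01": {"arm64": 20, "ppc64el": 20, "s390x": 20},
    "bos02": {"arm64": 20, "ppc64el": 20, "s390x": 22},
    "bos03": {"arm64": 25},
}
NEW = {
    "lcy02": {"amd64": 90},
    "bos01": {"arm64": 22, "ppc64el": 22, "s390x": 22},
    "bos02": {"arm64": 22, "ppc64el": 22, "s390x": 22},
    "bos03": {"arm64": 28},
}


def totals_per_arch(n_workers: dict) -> dict:
    """Sum the worker counts of every cloud, grouped by architecture."""
    totals = Counter()
    for per_arch in n_workers.values():
        totals.update(per_arch)  # Counter.update adds counts per key
    return dict(totals)
```

With these inputs the totals go from amd64 45, arm64 65, ppc64el 40, s390x 42 (192 workers) to amd64 90, arm64 72, ppc64el 44, s390x 44 (250 workers).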
Re: [Canonical-ubuntu-qa] [Merge] ~andersson123/autopkgtest-cloud:stop-tests-from-webpage into autopkgtest-cloud:master
TODO (aside from inline comments):
- type all functions with docstrings and param explanations
- add the stop test hyperlink to all other pages which display running jobs

Diff comments:

> diff --git a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/test-killer b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/test-killer
> new file mode 100755
> index 000..b1bc37e
> --- /dev/null
> +++ b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/test-killer
> @@ -0,0 +1,211 @@
> +#!/usr/bin/python3
> +"""Kills running tests."""
> +
> +import configparser
> +import json
> +import logging
> +import socket
> +import subprocess
> +import time
> +
> +import amqplib.client_0_8 as amqp
> +import requests
> +
> +WRITER_EXCHANGE_NAME = "stop-running.fanout"
> +RABBIT_CREDS = "/home/ubuntu/rabbitmq.cred"
> +MSG_ONLY_KEYS = [
> +    "uuid",
> +    "not-running-on",
> +]
> +NUM_WORKERS = 2

this should be stored as juju config option and preserved in worker conf file or something

> +
> +RABBIT_CFG = configparser.ConfigParser()
> +with open(RABBIT_CREDS, "r") as f:
> +    RABBIT_CFG.read_string("[rabbit]\n" + f.read().replace('"', ""))
> +
> +
> +def amqp_connect():
> +    amqp_con = amqp.Connection(
> +        RABBIT_CFG["rabbit"]["RABBIT_HOST"],
> +        userid=RABBIT_CFG["rabbit"]["RABBIT_USER"],
> +        password=RABBIT_CFG["rabbit"]["RABBIT_PASSWORD"],
> +        confirm_publish=True,
> +    )
> +    return amqp_con
> +
> +
> +def check_message(msg):
> +    return list(msg.keys()) == MSG_ONLY_KEYS
> +
> +
> +def get_test_pid(uuid):
> +    try:
> +        # get list of running processes
> +        ps_aux_run = subprocess.run(
> +            ["ps", "aux"],
> +            stdout=subprocess.PIPE,
> +            check=True,
> +        )
> +        # Filter the list for only 'runner' processes
> +        runner_run = subprocess.run(
> +            ["grep", "runner"],
> +            input=ps_aux_run.stdout,
> +            stdout=subprocess.PIPE,
> +            check=True,
> +        )
> +        # Check all runner processes for the given uuid
> +        # If this one fails, the test isn't running on this worker
> +        uuid_run = subprocess.run(
> +            ["grep", uuid],
> +            input=runner_run.stdout,
> +            capture_output=True,
> +            check=True,
> +        )
> +    except subprocess.CalledProcessError as _:
> +        # We hit this exception if the test with the given uuid
> +        # isn't running on this cloud worker
> +        return None
> +    search_for_test_output = uuid_run.stdout
> +    search_me = search_for_test_output.splitlines()
> +    # We have to assert the length is 1 otherwise we'll only kill
> +    # the first one in the list - which may be the incorrect one
> +    # if there's two processes with same uuid - something is wrong!
> +    assert len(search_me) == 1
> +    line = search_me[0].decode("utf-8")
> +    if uuid in line:

perhaps brittle parsing? idk I don't think so tho

> +        line = line.split(" ")
> +        line = [x for x in line if x]
> +        pid = line[1]
> +        return int(pid)
> +
> +
> +def place_message_in_queue(info: dict, amqp_con: amqp.Connection):
> +    complete_amqp = amqp_con.channel()
> +    complete_amqp.access_request(
> +        "/complete", active=True, read=False, write=True
> +    )
> +    complete_amqp.exchange_declare(
> +        WRITER_EXCHANGE_NAME, "fanout", durable=True, auto_delete=False
> +    )
> +    complete_amqp.basic_publish(
> +        amqp.Message(json.dumps(info), delivery_mode=2),
> +        WRITER_EXCHANGE_NAME,
> +        "",
> +    )
> +
> +
> +def kill_process(pid: int, uuid: str):
> +    # sends SIGUSR1 to worker
> +    # This causes autopkgtest to exit with code -10

this is inaccurate, exit code doesn't need to be stated, but need to say it hits the fallback option which cancels the test request and kills the openstack server if it's up yet

> +    # which the worker then detects, exits the test and kills
> +    # the openstack server, then the worker goes on to the next
> +    # test in the queue
> +    kill_cmd = "kill -USR1 %i" % pid
> +    try:
> +        _ = subprocess.run(
> +            kill_cmd.split(" "),
> +            check=True,
> +        )
> +        while get_test_pid(uuid) is not None:
> +            time.sleep(3)

I think this sleep can be shorter - get_test_pid is a very quick function

> +        return True
> +    except subprocess.CalledProcessError as _:
> +        return False
> +
> +
> +def test_is_queued(uuid: str):

ugly function ... mojo stage ([production | staging]) should be in a config file somewhere

> +    influx_cfg = configparser.ConfigParser()
> +    with open("/home/ubuntu/influx.cred", "r") as f:
> +        influx_cfg.read_string("[influx]\n" + f.read())
> +    if influx_cfg["influx"]["INFLUXDB_CONTEXT"] == "staging":
> +        autopkgtest_url = "https://autopkgtest.staging.ubuntu.com"
> +    else:
> +
Re: [Canonical-ubuntu-qa] [Merge] ~xypron/ubuntu-manual-tests:milk-v-mars into ubuntu-manual-tests:main
Review: Approve LGTM -- https://code.launchpad.net/~xypron/ubuntu-manual-tests/+git/ubuntu-manual-tests/+merge/464815 Your team Ubuntu Testcase Admins is subscribed to branch ubuntu-manual-tests:main.
[Canonical-ubuntu-qa] [Merge] ~xypron/ubuntu-manual-tests:milk-v-mars into ubuntu-manual-tests:main
The proposal to merge ~xypron/ubuntu-manual-tests:milk-v-mars into ubuntu-manual-tests:main has been updated. Status: Needs review => Merged For more details, see: https://code.launchpad.net/~xypron/ubuntu-manual-tests/+git/ubuntu-manual-tests/+merge/464815 -- Your team Ubuntu Testcase Admins is subscribed to branch ubuntu-manual-tests:main.
Re: [Canonical-ubuntu-qa] [Merge] ~xypron/ubuntu-manual-tests:milk-v-mars into ubuntu-manual-tests:main
This has been updated on the isotracker https://iso.qa.ubuntu.com/qatracker/milestones/450/builds/300224/testcases/1795/results Please let me know if this should be in a different testsuite i.e. the desktop testsuite -- https://code.launchpad.net/~xypron/ubuntu-manual-tests/+git/ubuntu-manual-tests/+merge/464815 Your team Ubuntu Testcase Admins is subscribed to branch ubuntu-manual-tests:main.
[Canonical-ubuntu-qa] [Bug 2063214] [NEW] unshare(1) fails within testbed VMs
Public bug reported:

We hit this while running src:autopkgtest autopackage tests (d/t/unshare), but other packages may be affected too.

In short: this works on my Noble laptop:

paride@ossimoro:~$ cat /etc/subuid
paride:10:65536
paride@ossimoro:~$ cat /etc/subgid
paride:10:65536
paride@ossimoro:~$ unshare --map-auto --map-root-user
root@ossimoro:~# id
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
root@ossimoro:~# su -c id
uid=0(root) gid=0(root) groups=0(root)

However, in a Noble amd64 testbed VM (running in lcy02):

ubuntu@autopkgtest:~$ cat /etc/subuid
ubuntu:10:65536
ubuntu@autopkgtest:~$ cat /etc/subgid
ubuntu:10:65536
ubuntu@autopkgtest:~$ unshare --map-auto --map-root-user
root@autopkgtest:~# id
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
root@autopkgtest:~# su -c id
su: cannot set groups: Operation not permitted
root@autopkgtest:~# echo $?
1

I am currently unable to tell what differs between the two systems.

** Affects: auto-package-testing
   Importance: Undecided
       Status: New

** Description changed:

  We hit this while running src:autopkgtest autopackage tests (d/t/unshare), but other packages may be affected too.

  In short: this works on my Noble laptop:

  paride@ossimoro:~$ cat /etc/subuid
  paride:10:65536
  paride@ossimoro:~$ cat /etc/subgid
  paride:10:65536
  paride@ossimoro:~$ unshare --map-auto --map-root-user
  root@ossimoro:~# id
  uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
  root@ossimoro:~# su -c id
  uid=0(root) gid=0(root) groups=0(root)

- However, in a Noble arm64 testbed VM (running in lcy02):
+ However, in a Noble amd64 testbed VM (running in lcy02):

  ubuntu@autopkgtest:~$ cat /etc/subuid
  ubuntu:10:65536
  ubuntu@autopkgtest:~$ cat /etc/subgid
  ubuntu:10:65536
  ubuntu@autopkgtest:~$ unshare --map-auto --map-root-user
  root@autopkgtest:~# id
  uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
  root@autopkgtest:~# su -c id
  su: cannot set groups: Operation not permitted
  root@autopkgtest:~# echo $?
  1

  I am currently unable to tell what differs between the two systems.

--
You received this bug notification because you are a member of Canonical's Ubuntu QA, which is subscribed to Auto Package Testing.
https://bugs.launchpad.net/bugs/2063214

Title:
  unshare(1) fails within testbed VMs

Status in Auto Package Testing:
  New

Bug description:
  We hit this while running src:autopkgtest autopackage tests (d/t/unshare), but other packages may be affected too.

  In short: this works on my Noble laptop:

  paride@ossimoro:~$ cat /etc/subuid
  paride:10:65536
  paride@ossimoro:~$ cat /etc/subgid
  paride:10:65536
  paride@ossimoro:~$ unshare --map-auto --map-root-user
  root@ossimoro:~# id
  uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
  root@ossimoro:~# su -c id
  uid=0(root) gid=0(root) groups=0(root)

  However, in a Noble amd64 testbed VM (running in lcy02):

  ubuntu@autopkgtest:~$ cat /etc/subuid
  ubuntu:10:65536
  ubuntu@autopkgtest:~$ cat /etc/subgid
  ubuntu:10:65536
  ubuntu@autopkgtest:~$ unshare --map-auto --map-root-user
  root@autopkgtest:~# id
  uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
  root@autopkgtest:~# su -c id
  su: cannot set groups: Operation not permitted
  root@autopkgtest:~# echo $?
  1

  I am currently unable to tell what differs between the two systems.

To manage notifications about this bug go to:
https://bugs.launchpad.net/auto-package-testing/+bug/2063214/+subscriptions
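One way to look for the difference between the two systems is to dump the user-namespace state from inside the unshared shell on each machine and compare the results. The sketch below makes no claim about the root cause; it only reads /proc/<pid>/uid_map, /proc/<pid>/gid_map and /proc/<pid>/setgroups, which are standard Linux interfaces. The names `IdMapping`, `parse_id_map` and `namespace_report` are hypothetical helpers introduced for illustration.

```python
from pathlib import Path
from typing import List, NamedTuple


class IdMapping(NamedTuple):
    """One line of a uid_map/gid_map file."""
    inside_start: int   # first ID as seen inside the namespace
    outside_start: int  # first ID as seen in the parent namespace
    length: int         # number of consecutive IDs mapped


def parse_id_map(text: str) -> List[IdMapping]:
    """Parse the three-column format used by uid_map and gid_map."""
    mappings = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            mappings.append(IdMapping(*(int(f) for f in fields)))
    return mappings


def namespace_report(pid: str = "self") -> dict:
    """Collect the ID maps and the setgroups policy for one process."""
    proc = Path("/proc") / pid
    return {
        "uid_map": parse_id_map((proc / "uid_map").read_text()),
        "gid_map": parse_id_map((proc / "gid_map").read_text()),
        "setgroups": (proc / "setgroups").read_text().strip(),
    }
```

Running `print(namespace_report())` inside the `unshare --map-auto --map-root-user` shell on both the laptop and the testbed VM would show whether the gid mapping or the setgroups policy differs, which is relevant because the failing step is `su` trying to set groups.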
Re: [Canonical-ubuntu-qa] [Merge] ~andersson123/autopkgtest-cloud:stop-tests-from-webpage into autopkgtest-cloud:master
inline comments all addressed Diff comments: > diff --git > a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/test-killer > b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/test-killer > new file mode 100755 > index 000..b1bc37e > --- /dev/null > +++ > b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/test-killer > @@ -0,0 +1,211 @@ > +#!/usr/bin/python3 > +"""Kills running tests.""" > + > +import configparser > +import json > +import logging > +import socket > +import subprocess > +import time > + > +import amqplib.client_0_8 as amqp > +import requests > + > +WRITER_EXCHANGE_NAME = "stop-running.fanout" > +RABBIT_CREDS = "/home/ubuntu/rabbitmq.cred" > +MSG_ONLY_KEYS = [ > +"uuid", > +"not-running-on", > +] > +NUM_WORKERS = 2 done > + > +RABBIT_CFG = configparser.ConfigParser() > +with open(RABBIT_CREDS, "r") as f: > +RABBIT_CFG.read_string("[rabbit]\n" + f.read().replace('"', "")) > + > + > +def amqp_connect(): > +amqp_con = amqp.Connection( > +RABBIT_CFG["rabbit"]["RABBIT_HOST"], > +userid=RABBIT_CFG["rabbit"]["RABBIT_USER"], > +password=RABBIT_CFG["rabbit"]["RABBIT_PASSWORD"], > +confirm_publish=True, > +) > +return amqp_con > + > + > +def check_message(msg): > +return list(msg.keys()) == MSG_ONLY_KEYS > + > + > +def get_test_pid(uuid): > +try: > +# get list of running processes > +ps_aux_run = subprocess.run( > +["ps", "aux"], > +stdout=subprocess.PIPE, > +check=True, > +) > +# Filter the list for only 'runner' processes > +runner_run = subprocess.run( > +["grep", "runner"], > +input=ps_aux_run.stdout, > +stdout=subprocess.PIPE, > +check=True, > +) > +# Check all runner processes for the given uuid > +# If this one fails, the test isn't running on this worker > +uuid_run = subprocess.run( > +["grep", uuid], > +input=runner_run.stdout, > +capture_output=True, > +check=True, > +) > +except subprocess.CalledProcessError as _: > +# We hit this exception if the test with the given uuid > +# isn't running on this cloud 
worker > +return None > +search_for_test_output = uuid_run.stdout > +search_me = search_for_test_output.splitlines() > +# We have to assert the length is 1 otherwise we'll only kill > +# the first one in the list - which may be the incorrect one > +# if there's two processes with same uuid - something is wrong! > +assert len(search_me) == 1 > +line = search_me[0].decode("utf-8") > +if uuid in line: i think is fine > +line = line.split(" ") > +line = [x for x in line if x] > +pid = line[1] > +return int(pid) > + > + > +def place_message_in_queue(info: dict, amqp_con: amqp.Connection): > +complete_amqp = amqp_con.channel() > +complete_amqp.access_request( > +"/complete", active=True, read=False, write=True > +) > +complete_amqp.exchange_declare( > +WRITER_EXCHANGE_NAME, "fanout", durable=True, auto_delete=False > +) > +complete_amqp.basic_publish( > +amqp.Message(json.dumps(info), delivery_mode=2), > +WRITER_EXCHANGE_NAME, > +"", > +) > + > + > +def kill_process(pid: int, uuid: str): > +# sends SIGUSR1 to worker > +# This causes autopkgtest to exit with code -10 > +# which the worker then detects, exits the test and kills > +# the openstack server, then the worker goes on to the next > +# test in the queue > +kill_cmd = "kill -USR1 %i" % pid > +try: > +_ = subprocess.run( > +kill_cmd.split(" "), > +check=True, > +) > +while get_test_pid(uuid) is not None: > +time.sleep(3) done > +return True > +except subprocess.CalledProcessError as _: > +return False > + > + > +def test_is_queued(uuid: str): done > +influx_cfg = configparser.ConfigParser() > +with open("/home/ubuntu/influx.cred", "r") as f: > +influx_cfg.read_string("[influx]\n" + f.read()) > +if influx_cfg["influx"]["INFLUXDB_CONTEXT"] == "staging": > +autopkgtest_url = "https://autopkgtest.staging.ubuntu.com"; > +else: > +autopkgtest_url = "https://autopkgtest.ubuntu.com"; > +queue_req = requests.get(autopkgtest_url) done > +if uuid in queue_req.content.decode("utf-8"): > +return True > +return False > + > + > 
+def already_checked_this_host(hostnames): done > +if socket.getfqdn() in hostnames: > +return True > +return False > + > + > +def process_message(msg, amqp_con): > +body = msg.body > +if isinstance(body, bytes): > +body = body.decode("UTF-8", errors="replace") > +info = json.loads(body) > +logging.info("Received request to kill test: %s" %
[Canonical-ubuntu-qa] [Merge] ~andersson123/autopkgtest-cloud:fix-tims-recent-docs into autopkgtest-cloud:master
Tim Andersson has proposed merging ~andersson123/autopkgtest-cloud:fix-tims-recent-docs into autopkgtest-cloud:master. Requested reviews: Canonical's Ubuntu QA (canonical-ubuntu-qa) For more details, see: https://code.launchpad.net/~andersson123/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/464843 -- Your team Canonical's Ubuntu QA is requested to review the proposed merge of ~andersson123/autopkgtest-cloud:fix-tims-recent-docs into autopkgtest-cloud:master. diff --git a/docs/administration.rst b/docs/administration.rst index 6a7e939..49739da 100644 --- a/docs/administration.rst +++ b/docs/administration.rst @@ -409,6 +409,7 @@ Before doing any of the steps detailed in this section, it's important to make s are currently running on the cloud worker with the partition you want to resize. .. code-block:: + # on the worker machine with the volume you intend to resize chmod -x autopkgtest-cloud/worker/worker sudo systemctl stop autopkgtest.target # ensure that you WAIT for all running jobs to finish, i.e. for the stop command to exit @@ -417,6 +418,7 @@ are currently running on the cloud worker with the partition you want to resize. First check that this specific version of openstack is available via: .. code-block:: + openstack --os-volume-api-version 3.42 volume list The command should not fail. @@ -424,6 +426,7 @@ The command should not fail. To resize a volume: .. code-block:: + # get the 'openstack' volume id juju storage --volume # the volume id is in the "Provider ID" column # from the above command, get the id, and set it to a variable: VOLUME_ID @@ -462,6 +465,7 @@ In order to kill a currently running test, grab the test uuid. This can be seen `ssh` to a worker unit, and run: .. 
code-block:: + ps aux | grep runner | grep $uuid # grab the PID from the process kill -15 $pid -- Mailing list: https://launchpad.net/~canonical-ubuntu-qa Post to : canonical-ubuntu-qa@lists.launchpad.net Unsubscribe : https://launchpad.net/~canonical-ubuntu-qa More help : https://help.launchpad.net/ListHelp
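The lookup step documented in that last code-block (`ps aux | grep runner | grep $uuid`, then reading the PID by eye) can also be scripted. The following is a minimal sketch and not code from autopkgtest-cloud: `find_runner_pids` and `running_test_pids` are hypothetical names, and asking `ps` for only the `pid` and `args` columns is a choice made here to avoid splitting the full `ps aux` line on spaces.

```python
import subprocess
from typing import List


def find_runner_pids(ps_output: str, uuid: str) -> List[int]:
    """Return PIDs whose command line mentions both 'runner' and the uuid.

    Expects one process per line in the form "<pid> <command line>".
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # split off the PID, keep args intact
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = parts
        if "runner" in args and uuid in args:
            pids.append(int(pid))
    return pids


def running_test_pids(uuid: str) -> List[int]:
    """Query ps for all processes and filter them with find_runner_pids."""
    result = subprocess.run(
        ["ps", "-eo", "pid=,args="],  # '=' suppresses the header line
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return find_runner_pids(result.stdout, uuid)
```

Returning a list rather than a single PID leaves it to the caller to decide what to do when zero or several processes match, before sending the signal described in the documentation.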
Re: [Canonical-ubuntu-qa] [Merge] ~andersson123/autopkgtest-cloud:stop-tests-from-webpage into autopkgtest-cloud:master
All dev work on this is now done, I just need to test it again. -- https://code.launchpad.net/~andersson123/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/461654 Your team Canonical's Ubuntu QA is requested to review the proposed merge of ~andersson123/autopkgtest-cloud:stop-tests-from-webpage into autopkgtest-cloud:master.
[Canonical-ubuntu-qa] [Merge] ~andersson123/autopkgtest-cloud:stop-tests-from-webpage into autopkgtest-cloud:master
Tim Andersson has proposed merging ~andersson123/autopkgtest-cloud:stop-tests-from-webpage into autopkgtest-cloud:master. Requested reviews: Canonical's Ubuntu QA (canonical-ubuntu-qa) For more details, see: https://code.launchpad.net/~andersson123/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/461654 -- Your team Canonical's Ubuntu QA is requested to review the proposed merge of ~andersson123/autopkgtest-cloud:stop-tests-from-webpage into autopkgtest-cloud:master. diff --git a/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/test-killer b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/test-killer new file mode 100755 index 000..bfec858 --- /dev/null +++ b/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/test-killer @@ -0,0 +1,294 @@ +#!/usr/bin/python3 +"""Kills running tests.""" + +import configparser +import json +import logging +import pathlib +import socket +import subprocess +import time +from typing import List + +import amqplib.client_0_8 as amqp +import requests + +WRITER_EXCHANGE_NAME = "stop-running.fanout" +RABBIT_CREDS = "/home/ubuntu/rabbitmq.cred" +MSG_ONLY_KEYS = [ +"uuid", +"not-running-on", +] + +RABBIT_CFG = configparser.ConfigParser() +with open(RABBIT_CREDS, "r") as f: +RABBIT_CFG.read_string("[rabbit]\n" + f.read().replace('"', "")) + + +def amqp_connect() -> amqp.Connection: +""" +Creates an amqp.Connection object from the relevant creds +""" +amqp_con = amqp.Connection( +RABBIT_CFG["rabbit"]["RABBIT_HOST"], +userid=RABBIT_CFG["rabbit"]["RABBIT_USER"], +password=RABBIT_CFG["rabbit"]["RABBIT_PASSWORD"], +confirm_publish=True, +) +return amqp_con + + +def check_message(msg: dict) -> bool: +""" +Checks the "kill-request" message sent has only the desired keys + +:param msg: the amqp message converted from bytes to dictionary +""" +return list(msg.keys()) == MSG_ONLY_KEYS + + +def get_test_pid(uuid: str) -> int: +""" +Parses the output of ps aux and finds the pid of a running test +with a given uuid + 
+:param uuid: The given test uuid that is desired to be killed +""" +try: +# get list of running processes +ps_aux_run = subprocess.run( +["ps", "aux"], +stdout=subprocess.PIPE, +check=True, +) +# Filter the list for only 'runner' processes +runner_run = subprocess.run( +["grep", "runner"], +input=ps_aux_run.stdout, +stdout=subprocess.PIPE, +check=True, +) +# Check all runner processes for the given uuid +# If this one fails, the test isn't running on this worker +uuid_run = subprocess.run( +["grep", uuid], +input=runner_run.stdout, +capture_output=True, +check=True, +) +except subprocess.CalledProcessError as _: +# We hit this exception if the test with the given uuid +# isn't running on this cloud worker +return None +search_for_test_output = uuid_run.stdout +search_me = search_for_test_output.splitlines() +# We have to assert the length is 1 otherwise we'll only kill +# the first one in the list - which may be the incorrect one +# if there's two processes with same uuid - something is wrong! +assert len(search_me) == 1 +line = search_me[0].decode("utf-8") +if uuid in line: +line = line.split(" ") +line = [x for x in line if x] +pid = line[1] +return int(pid) + + +def place_message_in_queue(info: dict, amqp_con: amqp.Connection): +""" +Places a given dictionary into amqp as an amqp.Message object +into the queue with the WRITER_EXCHANGE_NAME exchange + +:param info: dictionary that'll be converted to an amqp message +:param amqp_con: the amqp connection that test-killer is using +""" +complete_amqp = amqp_con.channel() +complete_amqp.access_request( +"/complete", active=True, read=False, write=True +) +complete_amqp.exchange_declare( +WRITER_EXCHANGE_NAME, "fanout", durable=True, auto_delete=False +) +complete_amqp.basic_publish( +amqp.Message(json.dumps(info), delivery_mode=2), +WRITER_EXCHANGE_NAME, +"", +) + + +def kill_process(pid: int, uuid: str) -> bool: +""" +Sends SIGUSR1 to worker. 
+This causes the worker to go into the fallback failure mode, +in which the worker then exits the test and kills the +openstack server. The worker goes on to the next test in the +queue + +:param pid: pid of autopkgtest process to kill +:param uuid: The given test uuid that is desired to be killed +""" +kill_cmd = "kill -USR1 %i" % pid +try: +_ = subprocess.run( +kill_cmd.split(" "), +check=True, +) +while get_test_pid(uuid) is not None: +time.slee
Re: [Canonical-ubuntu-qa] [Merge] ~xypron/ubuntu-manual-tests:milk-v-mars into ubuntu-manual-tests:main
@andersson123 The Milk-V Mars test case should be under "Product (Ubuntu Server)". It is still missing on https://iso.qa.ubuntu.com/qatracker/milestones/453 -- https://code.launchpad.net/~xypron/ubuntu-manual-tests/+git/ubuntu-manual-tests/+merge/464815 Your team Ubuntu Testcase Admins is subscribed to branch ubuntu-manual-tests:main.
[Canonical-ubuntu-qa] [Bug 2060719] Re: Bad page state in process swapper/0 pfn:02ee3
Looks it is same as https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2056706 -- You received this bug notification because you are a member of Canonical Platform QA Team, which is subscribed to ubuntu-kernel-tests. https://bugs.launchpad.net/bugs/2060719 Title: Bad page state in process swapper/0 pfn:02ee3 Status in ubuntu-kernel-tests: New Bug description: Test: * ubuntu-kernel-test Affected Series: Noble Cloud: AWS Instance_Type: t2.small Boot Kernel: 6.8.0-1001-aws Image_ID: ami-05dfc4aad5fc7b1f9 Affected Cycles: d2024.04.04 , d2024.02.07, This bug appears to be isolated to the aws, t2.small instance. Xen HVM domU, BIOS 4.11.amazon 08/24/2006 dmesg: [ 19.439388] RSP: 002b:7ffe8c887f10 EFLAGS: 0287 [ 19.439390] RAX: 00b3b390 RBX: 0002 RCX: 76e4dd91ce20 [ 19.439392] RDX: bd6a078635df035e RSI: 00b3b080 RDI: 76e4dd92bd80 [ 19.439394] RBP: 7ffe8c887f90 R08: 000e R09: 76e4dd91ce30 [ 19.439395] R10: 0004 R11: 76e4dd92bd80 R12: bd6a078635df035e [ 19.439397] R13: 76e4dd91ce00 R14: 00b3b080 R15: 000f [ 19.439400] [ 19.440945] BUG: Bad page state in process swapper/0 pfn:02ee3 [ 19.445260] page:f373b374 refcount:0 mapcount:0 mapping: index:0x0 pfn:0x2ee3 [ 19.445263] flags: 0xfc000(node=0|zone=1|lastcpupid=0x1f) [ 19.445266] page_type: 0x() [ 19.445268] raw: 000fc000 dead0040 8d83413d4800 [ 19.445270] raw: 0001 [ 19.445272] page dumped because: page_pool leak [ 19.445273] Modules linked in: 8021q garp mrp stp llc crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd binfmt_misc floppy psmouse nls_iso8859_1 input_leds serio_raw dm_multipath msr efi_pstore nfnetlink ip_tables x_tables autofs4 [ 19.445294] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GB 6.8.0-1001-aws #1-Ubuntu [ 19.445299] Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006 [ 19.445301] Call Trace: [ 19.445302] [ 19.445304] dump_stack_lvl+0x48/0x70 [ 19.445309] dump_stack+0x10/0x20 [ 19.445312] bad_page+0x76/0x120 [ 
19.445316] free_page_is_bad_report+0x86/0xa0 [ 19.445318] free_unref_page_prepare+0x26d/0x3c0 [ 19.445320] free_unref_page+0x34/0x1c0 [ 19.445323] ? gnttab_end_foreign_access_ref+0x24/0x50 [ 19.445326] __folio_put+0x3c/0x90 [ 19.445329] __pskb_pull_tail+0x1fa/0x5e0 [ 19.445332] handle_incoming_queue+0x180/0x190 [ 19.445335] xennet_poll+0x50e/0x900 [ 19.445337] ? xen_irq_lateeoi_locked.part.0+0x14a/0x2c0 [ 19.445340] ? xennet_interrupt+0x7c/0x90 [ 19.445343] __napi_poll+0x33/0x1e0 [ 19.445346] net_rx_action+0x18a/0x2f0 [ 19.445349] ? evtchn_2l_handle_events+0x178/0x440 [ 19.445353] __do_softirq+0xde/0x32e [ 19.445357] __irq_exit_rcu+0x75/0xa0 [ 19.445359] irq_exit_rcu+0xe/0x20 [ 19.445362] sysvec_xen_hvm_callback+0x92/0xd0 [ 19.445365] [ 19.445366] [ 19.445368] asm_sysvec_xen_hvm_callback+0x1b/0x20 [ 19.445370] RIP: 0010:pv_native_safe_halt+0xb/0x10 [ 19.445375] Code: 22 d7 31 ff c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 0f 00 2d 19 05 42 00 fb f4 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 83 [ 19.445377] RSP: 0018:a5003d50 EFLAGS: 0246 [ 19.445380] RAX: 4000 RBX: 8d834196ac64 RCX: [ 19.445381] RDX: 0001 RSI: 8d834196ac00 RDI: 0001 [ 19.445383] RBP: a5003d58 R08: R09: [ 19.445385] R10: R11: R12: 8d834196ac64 [ 19.445386] R13: R14: a52f8000 R15: 8d83bd60 [ 19.445389] ? acpi_safe_halt+0x19/0x60 [ 19.445392] acpi_idle_do_entry+0x40/0x80 [ 19.445394] acpi_idle_enter+0xb6/0x180 [ 19.445396] cpuidle_enter_state+0x91/0x6f0 [ 19.445400] ? finish_task_switch.isra.0+0x81/0x290 [ 19.445405] cpuidle_enter+0x2e/0x50 [ 19.445410] call_cpuidle+0x23/0x60 [ 19.445414] cpuidle_idle_call+0x10f/0x150 [ 19.445417] do_idle+0x82/0xf0 [ 19.445420] cpu_startup_entry+0x2a/0x30 [ 19.445423] rest_init+0xc2/0xf0 [ 19.445426] ? acpi_enable_subsystem+0xe6/0x2a0 [ 19.445429] ? 
static_key_disable+0x1f/0x30 [ 19.445432] arch_call_rest_init+0xe/0x30 [ 19.445435] start_kernel+0x34f/0x440 [ 19.445437] x86_64_start_reservations+0x18/0x30 [ 19.445442] x86_64_start_kernel+0xbf/0x110 [ 19.445445] secondary_startup_64_no_verify+0x184/0x18b [ 19.445450] To manage notifications about this bug go to: https