Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job failed to start"

Dane Leblanc (leblancd) Mon, 10 Mar 2014 14:55:02 -0700

Sean, John:

My experience matches Sukhdev’s. I tried running clean.sh on every run, but 
that neither prevented the tgt problem nor helped recover from it. It sounds 
like the best option is to reset the VM for each run.


Thanks,
Dane
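
For anyone scripting the reset decision, here is a minimal sketch. It assumes 
the failure always surfaces as the "start: Job failed to start" line shown in 
the restart output quoted further down in this thread. The function name 
tgt_restart_failed is made up for illustration and is not part of DevStack.

```shell
#!/bin/sh
# Hypothetical helper (not part of DevStack): report whether the captured
# output of `sudo service tgt restart` shows the failure mode from this thread.
# Assumption: the failure always prints "start: Job failed to start".
tgt_restart_failed() {
    # $1 is the captured restart output; grep -q exits 0 only when the
    # marker line is present (case-insensitive match).
    printf '%s\n' "$1" | grep -qi 'start: job failed to start'
}

# Example wiring in a CI wrapper (commented out so the sketch has no side effects):
# out=$(sudo service tgt restart 2>&1)
# if tgt_restart_failed "$out"; then
#     echo "tgt is wedged; reset the VM before the next run" >&2
# fi
```

The check only looks at text, so it can be tried against saved log output 
without touching a live tgt service.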

From: Sukhdev Kapur [mailto:[email protected]]
Sent: Monday, March 10, 2014 4:33 PM
To: Sean Dague
Cc: Dane Leblanc (leblancd); [email protected]
Subject: Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: job 
failed to start"

Hi Sean,

In my case, for every run I do unstack.sh, clean.sh, sudo rm -rf devstack, and 
sudo rm -rf /opt/stack. Then I fetch everything fresh, run stack.sh, and do a 
full run of the smoke tests.

A few iterations of this sequence will get you into this condition. Once in 
this condition, nothing helps (neither clean.sh nor unstack.sh); it fails 100% 
of the time. If I reboot the VM, everything works fine for the next 10-20 
cycles until it hits the same condition again. So I am planning to modify the 
script to reboot the VM every two hours or so as a workaround. The underlying 
problem, though, first appeared close to the Icehouse check-ins: I started 
noticing it a few days before the Icehouse deadline. Prior to that I had been 
running the same sequence for several weeks without any issue, if that helps.

-Sukhdev
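
The "reboot every two hours or so" workaround above could be sketched roughly 
as follows. This is only an illustration: the helper names reboot_due and 
current_uptime are hypothetical, the 7200-second threshold is just "two hours" 
spelled out, and reading /proc/uptime assumes a Linux host.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the VM has been up for at least
# max_uptime seconds, i.e. it is time to reboot before the next cycle.
reboot_due() {
    uptime_seconds="$1"
    max_uptime="$2"
    [ "$uptime_seconds" -ge "$max_uptime" ]
}

# Assumption: Linux, where the first field of /proc/uptime is the uptime in
# seconds with a fractional part; keep only the whole seconds.
current_uptime() {
    cut -d. -f1 /proc/uptime
}

# Example wiring between test cycles (commented out; it would reboot the machine):
# if reboot_due "$(current_uptime)" 7200; then
#     sudo reboot
# fi
```

Keeping the threshold check separate from the reboot itself makes it easy to 
dry-run the logic before letting it restart a CI node.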


On Mon, Mar 10, 2014 at 1:07 PM, Sean Dague <[email protected]> wrote:
So, honestly, running stack.sh / unstack.sh that many times in a row
really isn't expected to work in my experience. You should at minimum be
doing ./clean.sh to try to reset the state further.

        -Sean

On 03/10/2014 03:00 PM, Dane Leblanc (leblancd) wrote:
> In my case, the base OS is 12.04 Precise.
>
> The problem is intermittent in that it takes maybe 15 to 20 cycles of 
> unstack/stack to get it into the failure mode, but once in the failure mode, 
> it appears that tgt daemon is 100% dead-in-the-water.
>
> -----Original Message-----
> From: Sean Dague [mailto:[email protected]]
> Sent: Monday, March 10, 2014 1:49 PM
> To: Dane Leblanc (leblancd); [email protected]
> Subject: Re: [OpenStack-Infra] tgt restart fails in Cinder startup "start: 
> job failed to start"
>
> What base OS? A change was made there recently to better handle debian 
> because we believed (possibly incorrectly) that precise actually had working 
> init scripts.
>
> It would be interesting to understand if this was a 100% failure, or only 
> intermittent, and what base OS it was on.
>
>       -Sean
>
> On 03/10/2014 11:37 AM, Dane Leblanc (leblancd) wrote:
>> I don't know if anyone can give me some troubleshooting advice with this 
>> issue.
>>
>> I'm seeing an occasional problem whereby after several DevStack 
>> unstack.sh/stack.sh cycles, the tgt daemon (tgtd) fails to start during 
>> Cinder startup.  Here's a snippet from the stack.sh log:
>>
>> 2014-03-10 07:09:45.214 | Starting Cinder
>> 2014-03-10 07:09:45.215 | + return 0
>> 2014-03-10 07:09:45.216 | + sudo rm -f /etc/tgt/conf.d/stack.conf
>> 2014-03-10 07:09:45.217 | + _configure_tgt_for_config_d
>> 2014-03-10 07:09:45.218 | + [[ ! -d /etc/tgt/stack.d/ ]]
>> 2014-03-10 07:09:45.219 | + is_ubuntu
>> 2014-03-10 07:09:45.220 | + [[ -z deb ]]
>> 2014-03-10 07:09:45.221 | + '[' deb = deb ']'
>> 2014-03-10 07:09:45.222 | + sudo service tgt restart
>> 2014-03-10 07:09:45.223 | stop: Unknown instance:
>> 2014-03-10 07:09:45.619 | start: Job failed to start
>> jenkins@neutronpluginsci:~/devstack$ 2014-03-10 07:09:45.621 | + exit_trap
>> 2014-03-10 07:09:45.622 | + local r=1
>> 2014-03-10 07:09:45.623 | ++ jobs -p
>> 2014-03-10 07:09:45.624 | + jobs=
>> 2014-03-10 07:09:45.625 | + [[ -n '' ]]
>> 2014-03-10 07:09:45.626 | + exit 1
>>
>> I also tried to restart tgt manually, without success:
>>
>> jenkins@neutronpluginsci:~$ sudo service tgt restart
>> stop: Unknown instance:
>> start: Job failed to start
>> jenkins@neutronpluginsci:~$ sudo tgtd
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> (null): iser_ib_init(3263) Failed to initialize RDMA; load kernel modules?
>> (null): fcoe_init(214) (null)
>> (null): fcoe_create_interface(171) no interface specified.
>> jenkins@neutronpluginsci:~$
>>
>> The config in /etc/tgt is:
>>
>> jenkins@neutronpluginsci:/etc/tgt$ ls -l
>> total 8
>> drwxr-xr-x 2 root root 4096 Mar 10 07:03 conf.d
>> lrwxrwxrwx 1 root root   30 Mar 10 06:50 stack.d -> /opt/stack/data/cinder/volumes
>> -rw-r--r-- 1 root root   58 Mar 10 07:07 targets.conf
>> jenkins@neutronpluginsci:/etc/tgt$ cat targets.conf
>> include /etc/tgt/conf.d/*.conf
>> include /etc/tgt/stack.d/*
>> jenkins@neutronpluginsci:/etc/tgt$ ls conf.d
>> jenkins@neutronpluginsci:/etc/tgt$ ls /opt/stack/data/cinder/volumes
>> jenkins@neutronpluginsci:/etc/tgt$
>>
>> I don't know if there's any missing Cinder config in my DevStack localrc 
>> files. Here's one that I'm using:
>>
>> MYSQL_PASSWORD=nova
>> RABBIT_PASSWORD=nova
>> SERVICE_TOKEN=nova
>> SERVICE_PASSWORD=nova
>> ADMIN_PASSWORD=nova
>> ENABLED_SERVICES=g-api,g-reg,key,n-api,n-crt,n-obj,n-cpu,n-cond,cinder,c-sch,c-api,c-vol,n-sch,n-novnc,n-xvnc,n-cauth,horizon,rabbit
>> enable_service mysql
>> disable_service n-net
>> enable_service q-svc
>> enable_service q-agt
>> enable_service q-l3
>> enable_service q-dhcp
>> enable_service q-meta
>> enable_service q-lbaas
>> enable_service neutron
>> enable_service tempest
>> VOLUME_BACKING_FILE_SIZE=2052M
>> Q_PLUGIN=cisco
>> declare -a Q_CISCO_PLUGIN_SUBPLUGINS=(openvswitch nexus)
>> declare -A Q_CISCO_PLUGIN_SWITCH_INFO=([10.0.100.243]=admin:Cisco12345:22:neutronpluginsci:1/9)
>> NCCLIENT_REPO=git://github.com/CiscoSystems/ncclient.git
>> PHYSICAL_NETWORK=physnet1
>> OVS_PHYSICAL_BRIDGE=br-eth1
>> TENANT_VLAN_RANGE=810:819
>> ENABLE_TENANT_VLANS=True
>> API_RATE_LIMIT=False
>> VERBOSE=True
>> DEBUG=True
>> LOGFILE=/opt/stack/logs/stack.sh.log
>> USE_SCREEN=True
>> SCREEN_LOGDIR=/opt/stack/logs
>>
>> Here are links to a log showing another localrc file that I use, and the 
>> corresponding stack.sh log:
>>
>> http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_console_log.txt
>> http://128.107.233.28:8080/job/neutron/1390/artifact/vpnaas_stack_sh_log.txt
>>
>> Does anyone have any advice on how to debug this, or recover from this 
>> (beyond rebooting the node)? Or am I missing any Cinder config?
>>
>> Thanks in advance for any help on this!!!
>> Dane
>>
>>
>>
>> _______________________________________________
>> OpenStack-Infra mailing list
>> [email protected]
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
>>
>
>
> --
> Sean Dague
> Samsung Research America
> [email protected] / [email protected]
> http://dague.net
>


--
Sean Dague
Samsung Research America
[email protected] / [email protected]
http://dague.net

