Thank you very much for your reply, John.

I tried to make some sense of enabling lx_debug in the LX zone as
described in the "Enabling lx_debug probes" section of that doc. After
following those instructions (copying libs and mounting lofs) I ran the
"dtrace -Zqn lx*:::debug'{printf("%s", copyinstr(arg0))}'" command and
started up the zone in another terminal. Sure enough, I started to get lots
of output that I don't understand. :)

So then in that other terminal I tried running the lxunsup.d script shown
in that doc. When I first tried it for kicks before enabling lx_debug it
printed the "PID NAME CALL" line and then sat there. This time, it didn't
even print that line. So I tried to CTRL-C the other dtrace and it stopped
spewing stuff but did not return me to the prompt. Then I tried it in the
lxunsup.d terminal and it wouldn't exit either. Now they are both hung and
I can't SSH into the global zone or the LX zone. The connection is
established but the SSH handshake seems to hang.

Did I do something wrong? Anybody have any ideas on how to recover from
this? I am a long way from this machine. I may be able to make iDRAC access
work for powering the GZ off/on or seeing the console but am hoping there
is some way to kill dtrace and recover from those existing terminals.

--
Amos


On 7 April 2018 at 16:01, John E. Barfield <[email protected]>
wrote:

> I've had some of my own headaches with LX branded zones and found it
> interesting to enable dtrace debugging of LX in the GZ following the hints
> and using some of the dtrace scripts in this guide:
>
> https://wiki.smartos.org/display/DOC/LX+Branded+Zones
>
> It's pretty simple to replace the lx binary with the debug version and then
> monitor the output using dtrace from the GZ.
>
> I think that the most important thing to determine is whether or not your
> application is using unsupported syscalls.
>
> We have found a lot of apps that run great in LX while others just simply
> need to run in a VM. Then there are those that we’ve simply ported to a
> native SmartOS zone to get the best experience.
>
> John Barfield
>
> On Apr 7, 2018, at 2:31 PM, smartos-discuss
> <smartos-discuss@lists.smartos.org> wrote:
>
> This is a digest of messages to smartos-discuss.
> Digest Contents
>
>    1. CouchDB failing in LX zone
>
> CouchDB failing in LX zone
> <https://www.listbox.com/member/archive/184463/2018/04/20180406215237:57414E4E-3A06-11E8-A790-DF9910533F21>
>
> *Sent by Amos Hayes <[email protected]>*
> at Fri, 6 Apr 2018 21:52:30 -0400
> Hello SmartOS folks. I have been running SmartOS at home for years now
> with only one issue with a GZ upgrade way back in 2014, so thank you for a
> rock solid system! Plex on an LX zone is my media server.
>
> Recently I deployed a new SmartOS server in a remote location to host an
> instance of the application we develop at work. We run and develop on
> Ubuntu in a vSphere environment. It has prerequisites like Java/Jetty,
> CouchDB, and ffmpeg. Everything seemed happy in an LX zone at first. But
> as it turns out, Apache CouchDB on an Ubuntu LX zone seems to have trouble
> remaining responsive. I have the same application setup on many VMware
> Ubuntu guests and do not have any stability problems with CouchDB.
>
> I am using this Ubuntu 16.04 image (with apt update and full-upgrade):
> https://docs.joyent.com/public-cloud/instances/infrastructure/images/ubuntu#ubuntu-1604-20170403
>
> The rest of the setup is identical to my other VMware guest setups, with
> CouchDB 1.6.0 from the default repositories and our application and its
> dependencies such as openjdk, ffmpeg, etc.
>
> The tricky thing is that everything works for a while and then at some
> point within hours/days couchdb will become unresponsive. Looking at top
> shows its beam.smp process locked at 100%. There are various errors in the
> logs that seem to point to resource problems encountered somewhere in the
> Erlang code. An example snippet would be:
>
>     {error_info, {exit, {timeout, {gen_server,call,
>     [<0.2715.5>,{open_ref_count,<0.4245.5>}]}}, [{gen_server,terminate,7,
>     [{file,"gen_server.erl"},{line,826}]}, {proc_lib,init_p_do_apply,3,
>     [{file,"proc_lib.erl"},{line,240}]}]}},
>
> I'm not informed enough to know how to debug this, although I did try some
> basics like making sure the zone was assigned sufficient RAM (I upped it to
> 16GB out of 32GB total on host) and that quotas were set to 0 (although
> this was adjusted after creating the zone). Restarting couchdb brings it
> back up and it behaves normally again for a while with no errors or
> warnings during use.
>
> I haven't taken this up with couchdb folks yet because the only thing
> different in the environment is that I'm running it in an Ubuntu LX zone
> vs. an Ubuntu VMware guest. If anyone could point me to some things to try
> or something I can set up to try to catch more details when it happens
> again, I'd appreciate it. I am completely unfamiliar with dtrace but I
> gather this type of problem is where it shines.
>
> I have changed the couchdb logging to "warning" so it will only log
> warnings and errors, so the logs will be a bit more manageable. I imagine
> it will break again within a day and then maybe I'll have some focused
> logs to post somewhere.
>
> Thanks for reading through. I welcome any thoughts.



-------------------------------------------
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=25769125&id_secret=25769125-7688e9fb
Powered by Listbox: http://www.listbox.com
