So if you look at the code for run.sh you will see that it is a while loop that calls _run.sh. If _run.sh gets terminated and the java process is still running, future runs of _run.sh will attempt to start the java process and will fail. But as it fails a whole lot of log message will get logged to cloud.out by the /etc/init.d/cloud script. I guess the fix I am working on should also address( by preventing scripts from logging to cloud.out) this issue.
Thanks Saurav On Thu, Mar 20, 2014 at 10:29 PM, Sunil Bakhru <sbak...@juniper.net> wrote: > The fix for 6258 will contain the log size and prevent it from hogging the > root partition. > > However, we do see the following 'ERROR' log which fills up our system VM > log file. Has anybody seen this? > Any pointers on what could possibly be wrong? > > > 2014-03-20 08:40:54,395 ERROR [cloud.agent.AgentShell] (main:null) Unable > to start agent: Java process is being started twice. If this is not true, > remove /var/run/agent.SecStorage.pid > 2014-03-20 08:41:05,359 INFO [cloud.agent.AgentShell] (main:null) Agent > started > 2014-03-20 08:41:05,364 INFO [cloud.agent.AgentShell] (main:null) > Implementation Version is 4.3.0 > 2014-03-20 08:41:05,364 INFO [cloud.agent.AgentShell] (main:null) > agent.properties found at /usr/local/cloud/systemvm/conf/agent.properties > 2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found > property: instance > 2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found > property: NfsSecondaryStorageResource.id > 2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found > property: resource > 2014-03-20 08:41:05,370 INFO [cloud.agent.AgentShell] (main:null) > Defaulting to using properties file for storage > 2014-03-20 08:41:05,371 INFO [cloud.agent.AgentShell] (main:null) > Defaulting to the constant time backoff algorithm > 2014-03-20 08:41:05,380 INFO [cloud.utils.LogUtils] (main:null) log4j > configuration found at /usr/local/cloud/systemvm/conf/log4j-cloud.xml > 2014-03-20 08:41:05,393 DEBUG [cloud.agent.AgentShell] (main:null) > Checking to see if agent.SecStorage.pid exists. > 2014-03-20 08:41:05,396 DEBUG [cloud.utils.ProcessUtil] (main:null) > environment.properties could not be opened > 2014-03-20 08:41:05,403 DEBUG [cloud.utils.ProcessUtil] (main:null) > Executing: bash -c ps -p 3469 > 2014-03-20 08:41:05,411 DEBUG [cloud.utils.ProcessUtil] (main:null) > Execution is successful. > 2014-03-20 08:41:05,411 DEBUG [cloud.utils.ProcessUtil] (main:null) PID > TTY TIME CMD > 3469 ? 00:00:36 java > > 2014-03-20 08:41:05,411 ERROR [cloud.agent.AgentShell] (main:null) Unable > to start agent: Java process is being started twice. If this is not true, > remove /var/run/agent.SecStorage.pid > > > > Appreciate the help. > > Thanks, > Sunil > > > > > > On 3/19/14 8:40 AM, "Rajesh Battala" <rajesh.batt...@citrix.com> wrote: > > >Great. Post your patch at reviews.apache.org > > > >Thanks > >Rajesh Battala > > > >-----Original Message----- > >From: Saurav Lahiri [mailto:saurav.lah...@sungard.com] > >Sent: Wednesday, March 19, 2014 8:49 PM > >To: dev@cloudstack.apache.org > >Subject: Re: system vm disk space issue in ACS 4.3 > > > >Thanks Rajesh. I have created a jira ticket for this > >https://issues.apache.org/jira/browse/CLOUDSTACK-6258. Will send in the > >fix for review in a couple of days. > > > >Thanks > >Saurav > > > > > >On Wed, Mar 19, 2014 at 8:03 PM, Rajesh Battala > ><rajesh.batt...@citrix.com>wrote: > > > >> Can you please file a bug and send your fix for review. > >> > >> Thanks > >> Rajesh Battala > >> > >> -----Original Message----- > >> From: Saurav Lahiri [mailto:saurav.lah...@sungard.com] > >> Sent: Wednesday, March 19, 2014 7:20 PM > >> To: dev@cloudstack.apache.org > >> Subject: Re: system vm disk space issue in ACS 4.3 > >> > >> The problem appears to be the start function in the /etc/init.d/cloud > >> service for console proxy. > >> More specifically the following line also writes to /var/log/cloud.out > >> > >> > >> ---------------------------------------------------------------------- > >> ------------------------------------------------------------ > >> (cd $CLOUD_COM_HOME/systemvm; nohup ./run.sh > > >> /var/log/cloud/cloud.out > >> 2>&1 & ) > >> > >> ---------------------------------------------------------------------- > >> ------------------------------------------------------------ > >> > >> since run.sh calls _run.sh and both has "set -x" enabled, in certain > >> situations they can keep logging messages to cloud.out without being > >> aware of the settings in log4j-cloud.xml > >> > >> > >> One way to fix that could be that run.sh and _run.sh would log to > >> cloud.out only if a debug flag was set to true, otherwise only the > >> java process would write to cloud.out and log4j would respect the > >> settings in log4j-cloud.xml > >> > >> > >> Thanks > >> Saurav > >> > >> > >> > >> On Mon, Mar 17, 2014 at 8:47 PM, Saurav Lahiri > >> <saurav.lah...@sungard.com > >> >wrote: > >> > >> > Could it have something to do with the RollingFileAppender that is > >> > being used. > >> > The following > >> > rollingfileappender<http://apache-logging.6191.n7.nabble.com/Rolling > >> > Fi leAppender-not-working-consistently-td8582.html> link appears to > >> > be a > >> bit outdated but they more or less describe a similar problem that we > >> are seeing? > >> > > >> > > >> > On our environment that is what we have seeing for sometime on > >> > console proxy. The root filesystem goes full with the cloud.out.* > >> > occupying all the space. This happens pretty frequently and we have > >> > to regularly recycle the console proxy to resolve this issue. > >> > > >> > > >> > As seen below, cloud.out.2 should not have exceeded 10MB but it > >> > stands at 217MB now. > >> > > >> > drwxr-xr-x 2 root root 4.0K Mar 17 14:57 . > >> > drwxr-xr-x 8 root root 4.0K Mar 17 15:01 .. > >> > -rw-r--r-- 1 root root 0 Mar 12 18:18 api-server.log > >> > -rw-r--r-- 1 root root 357K Mar 17 15:06 cloud.out > >> > -rw-r--r-- 1 root root 2.1M Mar 17 14:56 cloud.out.1 > >> > -rw-r--r-- 1 root root 217M Mar 17 15:06 cloud.out.2 > >> > > >> > root@v-zzzz-VM:/var/log/cloud# lsof | grep cloud.out > >> > sleep 649 root 1w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > sleep 649 root 2w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > bash 2312 root 1w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > bash 2312 root 2w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > bash 2339 root 1w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > bash 2339 root 2w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > bash 2786 root 1w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > bash 2786 root 2w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > java 2805 root 1w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > java 2805 root 2w REG 202,1 226122291 181737 > >> > /var/log/cloud/cloud.out.2 > >> > java 2805 root 116w REG 202,1 319382 181769 > >> > /var/log/cloud/cloud.out > >> > root@v-zzzz-VM:/var/log/cloud# ls -alh > >> > > >> > Thanks > >> > Saurav > >> > > >> > > >> > On Tue, Mar 11, 2014 at 7:58 AM, Chiradeep Vittal < > >> > chiradeep.vit...@citrix.com> wrote: > >> > > >> >> Yes, it was deliberate. I can¹t find the discussion, but it > >> >> revolved around a security best practice of having separate > >> >> partitions for /, /swap, home directories > >> >> > >> >> > >> >> On 3/10/14, 11:35 AM, "Marcus" <shadow...@gmail.com> wrote: > >> >> > >> >> >There have been several raised, actually regarding /var/log. As > >> >> >for the system vm partitioning, it was explicitly changed from > >> >> >single to multiple partitions last year. I have no idea why, but I > >> >> >generally don't file bugs without community discussion on things > >> >> >that seem deliberate. > >> >> > > >> >> >On Sat, Mar 8, 2014 at 11:32 AM, Marcus <shadow...@gmail.com> > wrote: > >> >> >> Yeah, I've just seen on busy systems where even with log > >> >> >>rotation working properly the little space left in var after OS > >> >> >>files is barely enough, for example the conntrackd log on a busy > >> >> >>VPC. We actually ended up rolling our own system vm, the > >> >> >>existing image has plenty of space, its just locked up in other > >>partitions. > >> >> >> > >> >> >> On Mar 8, 2014 8:58 AM, "Rajesh Battala" > >> >> >><rajesh.batt...@citrix.com> > >> >> >>wrote: > >> >> >>> > >> >> >>> Yes, only 435MB is available for /var . we can increase the > >> >> >>> space > >> >> also. > >> >> >>> But we need to find out the root cause which services are > >> >> >>>causing the /var to fill up. > >> >> >>> Can you please find out and post which log files are taking up > >> >> >>>more space in /var > >> >> >>> > >> >> >>> Thanks > >> >> >>> Rajesh Battala > >> >> >>> > >> >> >>> -----Original Message----- > >> >> >>> From: Marcus [mailto:shadow...@gmail.com] > >> >> >>> Sent: Saturday, March 8, 2014 8:19 PM > >> >> >>> To: dev@cloudstack.apache.org > >> >> >>> Subject: RE: system vm disk space issue in ACS 4.3 > >> >> >>> > >> >> >>> Perhaps there's a new service. I know in the past we've seen > >> >> >>>issues with this , specifically the conntrackd log. I think the > >> >> >>>cloud logs weren't getting rolled either, but I thought it was > >> >> >>>all fixed. > >> >> >>> > >> >> >>> There's also simply not a ton of space on /var, I wish we would > >> >> >>>go back to just having one partition because it orphans lots of > >> >> >>>free space in other filesystems. > >> >> >>> On Mar 8, 2014 12:37 AM, "Rajesh Battala" > >> >> >>><rajesh.batt...@citrix.com> > >> >> >>> wrote: > >> >> >>> > >> >> >>> > AFAIK, log roation is enabled in the systemvm. > >> >> >>> > Can you check whether the logs are getting zipped .? > >> >> >>> > > >> >> >>> > -----Original Message----- > >> >> >>> > From: Anirban Chakraborty [mailto:abc...@juniper.net] > >> >> >>> > Sent: Saturday, March 8, 2014 12:46 PM > >> >> >>> > To: dev@cloudstack.apache.org > >> >> >>> > Subject: system vm disk space issue in ACS 4.3 > >> >> >>> > > >> >> >>> > Hi All, > >> >> >>> > > >> >> >>> > I am seeing system vm disk has no space left after running > >> >> >>> > for few > >> >> >>>days. > >> >> >>> > Cloudstack UI shows the agent in v-2-VM in alert state, while > >> >> >>> > agent state of s-1-VM shows blank (hyphen in the UI). > >> >> >>> > Both the system vms are running and ssh-able from the host. > >> >> >>> > The log > >> >> >>>in > >> >> >>> > s-1-Vm shows following errors: > >> >> >>> > > >> >> >>> > root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.* > >> >> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left > >> >> >>> > on device > >> >> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left > >> >> >>> > on device > >> >> >>> > > >> >> >>> > whereas logs in v-1-VM shows > >> >> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left > >> >> >>> > on device > >> >> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left > >> >> >>> > on device > >> >> >>> > /var/log/cloud/cloud.out.3:07:18:00,547 INFO > >> >> CSExceptionErrorCode:87 > >> >> >>> > - Could not find exception: > >> >> >>> > com.cloud.exception.AgentControlChannelException > >> >> >>> > in error code list for exceptions > >> >> >>> > > >> >> >>> > > >> >> > >> >> >>>/var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChann > >> >> >>>el > >> >> >>>Except > >> >> >>>ion: > >> >> >>> > Unable to post agent control request as link is not available > >> >> >>> > > >> >> >>> > Looks like cloud agent is filling up the log, which is > >> >> >>> > leading to > >> >> the > >> >> >>> > disk full state. > >> >> >>> > > >> >> >>> > Is this a known issue? Thanks. > >> >> >>> > > >> >> >>> > Anirban > >> >> >>> > > >> >> > >> >> > >> >> > >> > > >> > >> > > > > > >