The fix for 6258 will contain the log size and prevent it from hogging the root partition.
However, we do see the following 'ERROR' log which fills up our system VM log file. Has anybody seen this? Any pointers on what could possibly be wrong? 2014-03-20 08:40:54,395 ERROR [cloud.agent.AgentShell] (main:null) Unable to start agent: Java process is being started twice. If this is not true, remove /var/run/agent.SecStorage.pid 2014-03-20 08:41:05,359 INFO [cloud.agent.AgentShell] (main:null) Agent started 2014-03-20 08:41:05,364 INFO [cloud.agent.AgentShell] (main:null) Implementation Version is 4.3.0 2014-03-20 08:41:05,364 INFO [cloud.agent.AgentShell] (main:null) agent.properties found at /usr/local/cloud/systemvm/conf/agent.properties 2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found property: instance 2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found property: NfsSecondaryStorageResource.id 2014-03-20 08:41:05,370 DEBUG [cloud.agent.AgentShell] (main:null) Found property: resource 2014-03-20 08:41:05,370 INFO [cloud.agent.AgentShell] (main:null) Defaulting to using properties file for storage 2014-03-20 08:41:05,371 INFO [cloud.agent.AgentShell] (main:null) Defaulting to the constant time backoff algorithm 2014-03-20 08:41:05,380 INFO [cloud.utils.LogUtils] (main:null) log4j configuration found at /usr/local/cloud/systemvm/conf/log4j-cloud.xml 2014-03-20 08:41:05,393 DEBUG [cloud.agent.AgentShell] (main:null) Checking to see if agent.SecStorage.pid exists. 2014-03-20 08:41:05,396 DEBUG [cloud.utils.ProcessUtil] (main:null) environment.properties could not be opened 2014-03-20 08:41:05,403 DEBUG [cloud.utils.ProcessUtil] (main:null) Executing: bash -c ps -p 3469 2014-03-20 08:41:05,411 DEBUG [cloud.utils.ProcessUtil] (main:null) Execution is successful. 2014-03-20 08:41:05,411 DEBUG [cloud.utils.ProcessUtil] (main:null) PID TTY TIME CMD 3469 ? 00:00:36 java 2014-03-20 08:41:05,411 ERROR [cloud.agent.AgentShell] (main:null) Unable to start agent: Java process is being started twice. If this is not true, remove /var/run/agent.SecStorage.pid Appreciate the help. Thanks, Sunil On 3/19/14 8:40 AM, "Rajesh Battala" <rajesh.batt...@citrix.com> wrote: >Great. Post your patch at reviews.apache.org > >Thanks >Rajesh Battala > >-----Original Message----- >From: Saurav Lahiri [mailto:saurav.lah...@sungard.com] >Sent: Wednesday, March 19, 2014 8:49 PM >To: dev@cloudstack.apache.org >Subject: Re: system vm disk space issue in ACS 4.3 > >Thanks Rajesh. I have created a jira ticket for this >https://issues.apache.org/jira/browse/CLOUDSTACK-6258. Will send in the >fix for review in a couple of days. > >Thanks >Saurav > > >On Wed, Mar 19, 2014 at 8:03 PM, Rajesh Battala ><rajesh.batt...@citrix.com>wrote: > >> Can you please file a bug and send your fix for review. >> >> Thanks >> Rajesh Battala >> >> -----Original Message----- >> From: Saurav Lahiri [mailto:saurav.lah...@sungard.com] >> Sent: Wednesday, March 19, 2014 7:20 PM >> To: dev@cloudstack.apache.org >> Subject: Re: system vm disk space issue in ACS 4.3 >> >> The problem appears to be the start function in the /etc/init.d/cloud >> service for console proxy. >> More specifically the following line also writes to /var/log/cloud.out >> >> >> ---------------------------------------------------------------------- >> ------------------------------------------------------------ >> (cd $CLOUD_COM_HOME/systemvm; nohup ./run.sh > >> /var/log/cloud/cloud.out >> 2>&1 & ) >> >> ---------------------------------------------------------------------- >> ------------------------------------------------------------ >> >> since run.sh calls _run.sh and both has "set -x" enabled, in certain >> situations they can keep logging messages to cloud.out without being >> aware of the settings in log4j-cloud.xml >> >> >> One way to fix that could be that run.sh and _run.sh would log to >> cloud.out only if a debug flag was set to true, otherwise only the >> java process would write to cloud.out and log4j would respect the >> settings in log4j-cloud.xml >> >> >> Thanks >> Saurav >> >> >> >> On Mon, Mar 17, 2014 at 8:47 PM, Saurav Lahiri >> <saurav.lah...@sungard.com >> >wrote: >> >> > Could it have something to do with the RollingFileAppender that is >> > being used. >> > The following >> > rollingfileappender<http://apache-logging.6191.n7.nabble.com/Rolling >> > Fi leAppender-not-working-consistently-td8582.html> link appears to >> > be a >> bit outdated but they more or less describe a similar problem that we >> are seeing? >> > >> > >> > On our environment that is what we have seeing for sometime on >> > console proxy. The root filesystem goes full with the cloud.out.* >> > occupying all the space. This happens pretty frequently and we have >> > to regularly recycle the console proxy to resolve this issue. >> > >> > >> > As seen below, cloud.out.2 should not have exceeded 10MB but it >> > stands at 217MB now. >> > >> > drwxr-xr-x 2 root root 4.0K Mar 17 14:57 . >> > drwxr-xr-x 8 root root 4.0K Mar 17 15:01 .. >> > -rw-r--r-- 1 root root 0 Mar 12 18:18 api-server.log >> > -rw-r--r-- 1 root root 357K Mar 17 15:06 cloud.out >> > -rw-r--r-- 1 root root 2.1M Mar 17 14:56 cloud.out.1 >> > -rw-r--r-- 1 root root 217M Mar 17 15:06 cloud.out.2 >> > >> > root@v-zzzz-VM:/var/log/cloud# lsof | grep cloud.out >> > sleep 649 root 1w REG 202,1 226122291 181737 >> > /var/log/cloud/cloud.out.2 >> > sleep 649 root 2w REG 202,1 226122291 181737 >> > /var/log/cloud/cloud.out.2 >> > bash 2312 root 1w REG 202,1 226122291 181737 >> > /var/log/cloud/cloud.out.2 >> > bash 2312 root 2w REG 202,1 226122291 181737 >> > /var/log/cloud/cloud.out.2 >> > bash 2339 root 1w REG 202,1 226122291 181737 >> > /var/log/cloud/cloud.out.2 >> > bash 2339 root 2w REG 202,1 226122291 181737 >> > /var/log/cloud/cloud.out.2 >> > bash 2786 root 1w REG 202,1 226122291 181737 >> > /var/log/cloud/cloud.out.2 >> > bash 2786 root 2w REG 202,1 226122291 181737 >> > /var/log/cloud/cloud.out.2 >> > java 2805 root 1w REG 202,1 226122291 181737 >> > /var/log/cloud/cloud.out.2 >> > java 2805 root 2w REG 202,1 226122291 181737 >> > /var/log/cloud/cloud.out.2 >> > java 2805 root 116w REG 202,1 319382 181769 >> > /var/log/cloud/cloud.out >> > root@v-zzzz-VM:/var/log/cloud# ls -alh >> > >> > Thanks >> > Saurav >> > >> > >> > On Tue, Mar 11, 2014 at 7:58 AM, Chiradeep Vittal < >> > chiradeep.vit...@citrix.com> wrote: >> > >> >> Yes, it was deliberate. I can¹t find the discussion, but it >> >> revolved around a security best practice of having separate >> >> partitions for /, /swap, home directories >> >> >> >> >> >> On 3/10/14, 11:35 AM, "Marcus" <shadow...@gmail.com> wrote: >> >> >> >> >There have been several raised, actually regarding /var/log. As >> >> >for the system vm partitioning, it was explicitly changed from >> >> >single to multiple partitions last year. I have no idea why, but I >> >> >generally don't file bugs without community discussion on things >> >> >that seem deliberate. >> >> > >> >> >On Sat, Mar 8, 2014 at 11:32 AM, Marcus <shadow...@gmail.com> wrote: >> >> >> Yeah, I've just seen on busy systems where even with log >> >> >>rotation working properly the little space left in var after OS >> >> >>files is barely enough, for example the conntrackd log on a busy >> >> >>VPC. We actually ended up rolling our own system vm, the >> >> >>existing image has plenty of space, its just locked up in other >>partitions. >> >> >> >> >> >> On Mar 8, 2014 8:58 AM, "Rajesh Battala" >> >> >><rajesh.batt...@citrix.com> >> >> >>wrote: >> >> >>> >> >> >>> Yes, only 435MB is available for /var . we can increase the >> >> >>> space >> >> also. >> >> >>> But we need to find out the root cause which services are >> >> >>>causing the /var to fill up. >> >> >>> Can you please find out and post which log files are taking up >> >> >>>more space in /var >> >> >>> >> >> >>> Thanks >> >> >>> Rajesh Battala >> >> >>> >> >> >>> -----Original Message----- >> >> >>> From: Marcus [mailto:shadow...@gmail.com] >> >> >>> Sent: Saturday, March 8, 2014 8:19 PM >> >> >>> To: dev@cloudstack.apache.org >> >> >>> Subject: RE: system vm disk space issue in ACS 4.3 >> >> >>> >> >> >>> Perhaps there's a new service. I know in the past we've seen >> >> >>>issues with this , specifically the conntrackd log. I think the >> >> >>>cloud logs weren't getting rolled either, but I thought it was >> >> >>>all fixed. >> >> >>> >> >> >>> There's also simply not a ton of space on /var, I wish we would >> >> >>>go back to just having one partition because it orphans lots of >> >> >>>free space in other filesystems. >> >> >>> On Mar 8, 2014 12:37 AM, "Rajesh Battala" >> >> >>><rajesh.batt...@citrix.com> >> >> >>> wrote: >> >> >>> >> >> >>> > AFAIK, log roation is enabled in the systemvm. >> >> >>> > Can you check whether the logs are getting zipped .? >> >> >>> > >> >> >>> > -----Original Message----- >> >> >>> > From: Anirban Chakraborty [mailto:abc...@juniper.net] >> >> >>> > Sent: Saturday, March 8, 2014 12:46 PM >> >> >>> > To: dev@cloudstack.apache.org >> >> >>> > Subject: system vm disk space issue in ACS 4.3 >> >> >>> > >> >> >>> > Hi All, >> >> >>> > >> >> >>> > I am seeing system vm disk has no space left after running >> >> >>> > for few >> >> >>>days. >> >> >>> > Cloudstack UI shows the agent in v-2-VM in alert state, while >> >> >>> > agent state of s-1-VM shows blank (hyphen in the UI). >> >> >>> > Both the system vms are running and ssh-able from the host. >> >> >>> > The log >> >> >>>in >> >> >>> > s-1-Vm shows following errors: >> >> >>> > >> >> >>> > root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.* >> >> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left >> >> >>> > on device >> >> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left >> >> >>> > on device >> >> >>> > >> >> >>> > whereas logs in v-1-VM shows >> >> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left >> >> >>> > on device >> >> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left >> >> >>> > on device >> >> >>> > /var/log/cloud/cloud.out.3:07:18:00,547 INFO >> >> CSExceptionErrorCode:87 >> >> >>> > - Could not find exception: >> >> >>> > com.cloud.exception.AgentControlChannelException >> >> >>> > in error code list for exceptions >> >> >>> > >> >> >>> > >> >> >> >> >>>/var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChann >> >> >>>el >> >> >>>Except >> >> >>>ion: >> >> >>> > Unable to post agent control request as link is not available >> >> >>> > >> >> >>> > Looks like cloud agent is filling up the log, which is >> >> >>> > leading to >> >> the >> >> >>> > disk full state. >> >> >>> > >> >> >>> > Is this a known issue? Thanks. >> >> >>> > >> >> >>> > Anirban >> >> >>> > >> >> >> >> >> >> >> > >> >> > >