Re: system vm disk space issue in ACS 4.3

Saurav Lahiri Wed, 19 Mar 2014 06:51:51 -0700

The problem appears to be the start function in the /etc/init.d/cloud
service for console proxy.
More specifically the following line also writes to /var/log/cloud.out


----------------------------------------------------------------------------------------------------------------------------------
(cd $CLOUD_COM_HOME/systemvm; nohup ./run.sh > /var/log/cloud/cloud.out
2>&1 & )
----------------------------------------------------------------------------------------------------------------------------------

since run.sh calls _run.sh and both has "set -x" enabled, in certain
situations they can keep
logging messages to cloud.out without being aware of the settings in
log4j-cloud.xml


One way to fix that could be that run.sh and _run.sh would log to cloud.out
only if a debug flag
was set to true, otherwise only the java process would write to cloud.out
and log4j would
respect the settings in log4j-cloud.xml


Thanks
Saurav



On Mon, Mar 17, 2014 at 8:47 PM, Saurav Lahiri <[email protected]>wrote:

> Could it have  something to do with the RollingFileAppender that is being
> used.
> The following 
> rollingfileappender<http://apache-logging.6191.n7.nabble.com/RollingFileAppender-not-working-consistently-td8582.html>
>  link
> appears to be a bit outdated but they more or less describe a similar
> problem that we are seeing?
>
>
> On our environment that is what we have seeing for sometime on console
> proxy.  The root filesystem goes full with the cloud.out.* occupying all
> the space. This happens pretty frequently and we have to regularly recycle
> the console proxy to resolve this issue.
>
>
> As seen below, cloud.out.2 should not have exceeded 10MB but it stands at
> 217MB now.
>
> drwxr-xr-x 2 root root 4.0K Mar 17 14:57 .
> drwxr-xr-x 8 root root 4.0K Mar 17 15:01 ..
> -rw-r--r-- 1 root root    0 Mar 12 18:18 api-server.log
> -rw-r--r-- 1 root root 357K Mar 17 15:06 cloud.out
> -rw-r--r-- 1 root root 2.1M Mar 17 14:56 cloud.out.1
> -rw-r--r-- 1 root root 217M Mar 17 15:06 cloud.out.2
>
> root@v-zzzz-VM:/var/log/cloud# lsof | grep cloud.out
> sleep       649 root    1w      REG      202,1 226122291     181737
> /var/log/cloud/cloud.out.2
> sleep       649 root    2w      REG      202,1 226122291     181737
> /var/log/cloud/cloud.out.2
> bash       2312 root    1w      REG      202,1 226122291     181737
> /var/log/cloud/cloud.out.2
> bash       2312 root    2w      REG      202,1 226122291     181737
> /var/log/cloud/cloud.out.2
> bash       2339 root    1w      REG      202,1 226122291     181737
> /var/log/cloud/cloud.out.2
> bash       2339 root    2w      REG      202,1 226122291     181737
> /var/log/cloud/cloud.out.2
> bash       2786 root    1w      REG      202,1 226122291     181737
> /var/log/cloud/cloud.out.2
> bash       2786 root    2w      REG      202,1 226122291     181737
> /var/log/cloud/cloud.out.2
> java       2805 root    1w      REG      202,1 226122291     181737
> /var/log/cloud/cloud.out.2
> java       2805 root    2w      REG      202,1 226122291     181737
> /var/log/cloud/cloud.out.2
> java       2805 root  116w      REG      202,1    319382     181769
> /var/log/cloud/cloud.out
> root@v-zzzz-VM:/var/log/cloud# ls -alh
>
> Thanks
> Saurav
>
>
> On Tue, Mar 11, 2014 at 7:58 AM, Chiradeep Vittal <
> [email protected]> wrote:
>
>> Yes, it was deliberate. I can¹t find the discussion, but it revolved
>> around a security best practice of having separate partitions for /,
>> /swap, home directories
>>
>>
>> On 3/10/14, 11:35 AM, "Marcus" <[email protected]> wrote:
>>
>> >There have been several raised, actually regarding /var/log.  As for
>> >the system vm partitioning, it was explicitly changed from single to
>> >multiple partitions last year. I have no idea why, but I generally
>> >don't file bugs without community discussion on things that seem
>> >deliberate.
>> >
>> >On Sat, Mar 8, 2014 at 11:32 AM, Marcus <[email protected]> wrote:
>> >> Yeah, I've just seen on busy systems where even with log rotation
>> >>working
>> >> properly the little space left in var after OS files is barely enough,
>> >>for
>> >> example the conntrackd log on a busy VPC. We actually ended up rolling
>> >>our
>> >> own system vm, the existing image has plenty of space, its just locked
>> >>up in
>> >> other partitions.
>> >>
>> >> On Mar 8, 2014 8:58 AM, "Rajesh Battala" <[email protected]>
>> >>wrote:
>> >>>
>> >>> Yes, only 435MB is available for /var . we can increase the space
>> also.
>> >>> But we need to find out the root cause which services are causing the
>> >>>/var
>> >>> to fill up.
>> >>> Can you please find out and post which log files are taking up more
>> >>>space
>> >>> in /var
>> >>>
>> >>> Thanks
>> >>> Rajesh Battala
>> >>>
>> >>> -----Original Message-----
>> >>> From: Marcus [mailto:[email protected]]
>> >>> Sent: Saturday, March 8, 2014 8:19 PM
>> >>> To: [email protected]
>> >>> Subject: RE: system vm disk space issue in ACS 4.3
>> >>>
>> >>> Perhaps there's a new service. I know in the past we've seen issues
>> >>>with
>> >>> this , specifically the conntrackd log. I think the cloud logs weren't
>> >>> getting rolled either, but I thought it was all fixed.
>> >>>
>> >>> There's also simply not a ton of space on /var, I wish we would go
>> >>>back to
>> >>> just having one partition because it orphans lots of free space in
>> >>>other
>> >>> filesystems.
>> >>> On Mar 8, 2014 12:37 AM, "Rajesh Battala" <[email protected]>
>> >>> wrote:
>> >>>
>> >>> > AFAIK, log roation is enabled in the systemvm.
>> >>> > Can you check whether the logs are getting zipped .?
>> >>> >
>> >>> > -----Original Message-----
>> >>> > From: Anirban Chakraborty [mailto:[email protected]]
>> >>> > Sent: Saturday, March 8, 2014 12:46 PM
>> >>> > To: [email protected]
>> >>> > Subject: system vm disk space issue in ACS 4.3
>> >>> >
>> >>> > Hi All,
>> >>> >
>> >>> > I am seeing system vm disk has no space left after running for few
>> >>>days.
>> >>> > Cloudstack UI shows the agent in v-2-VM in alert state, while agent
>> >>> > state of s-1-VM shows blank (hyphen in the UI).
>> >>> > Both the system vms are running and ssh-able from the host. The log
>> >>>in
>> >>> > s-1-Vm shows following errors:
>> >>> >
>> >>> > root@s-1-VM:~# grep 'Exception' /var/log/cloud/*.*
>> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on
>> >>> > device
>> >>> > /var/log/cloud/cloud.out.2:java.io.IOException: No space left on
>> >>> > device
>> >>> >
>> >>> > whereas logs in v-1-VM shows
>> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on
>> >>> > device
>> >>> > /var/log/cloud/cloud.out.3:java.io.IOException: No space left on
>> >>> > device
>> >>> > /var/log/cloud/cloud.out.3:07:18:00,547  INFO
>> CSExceptionErrorCode:87
>> >>> > - Could not find exception:
>> >>> > com.cloud.exception.AgentControlChannelException
>> >>> > in error code list for exceptions
>> >>> >
>> >>> >
>>
>> >>>/var/log/cloud/cloud.out.3:com.cloud.exception.AgentControlChannelExcept
>> >>>ion:
>> >>> > Unable to post agent control request as link is not available
>> >>> >
>> >>> > Looks like cloud agent is filling up the log, which is leading to
>> the
>> >>> > disk full state.
>> >>> >
>> >>> > Is this a known issue? Thanks.
>> >>> >
>> >>> > Anirban
>> >>> >
>>
>>
>>
>

Re: system vm disk space issue in ACS 4.3

Reply via email to