On 05/13/2015 09:06 AM, Simon Pasquier wrote:
Hello,

As many others have commented before, I don't quite understand what is unique about the CloudPulse use cases.

For operators, I got the feeling that existing solutions fit well:
- Traditional monitoring tools (Nagios, Zabbix, ...) are necessary anyway for infrastructure monitoring (CPU, RAM, disks, operating system, RabbitMQ, databases and more) and diagnostic purposes. Adding OpenStack service checks is fairly easy if you already have the toolchain.
Is it really so easy? RabbitMQ has an "aliveness" test that is easy to hook into. I don't know exactly what it does, other than what the doc says, but I should not have to. If I want my standard monitoring system to call into a cloud and ask "is nova healthy?", "is glance healthy?", etc., are there such calls?

There are various sets of calls associated with Nagios, Zabbix, etc., but those seem like "after-market" parts for a car. It seems to me the services themselves would know best how to check whether they are healthy, particularly as that could change from version to version. Has there been discussion of adding a health-check (admin) API in each service? Lacking that, is there documentation from any OpenStack project about "how to check the health of nova"? When I saw this thread start, that is what I thought it was going to be about.
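To make the "after-market" point concrete, here is a minimal sketch of the kind of external probe a Nagios-style system typically runs today. It only shows that an API endpoint answers over HTTP; it is not a health-check API provided by the service, which is the gap I am asking about. The exit codes follow the usual Nagios plugin convention; the function names, the warning threshold, and the injected `fetch` parameter are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request

# Nagios plugin return codes (standard convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch_status(url, timeout):
    """Return the HTTP status of a GET on url; network errors propagate."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (e.g. 300, 401) still prove the service answered.
        return err.code


def check_api(name, url, warn_seconds=2.0, timeout=10.0, fetch=fetch_status):
    """Probe one endpoint and map the outcome onto a Nagios state."""
    started = time.monotonic()
    try:
        status = fetch(url, timeout)
    except (OSError, ValueError) as err:
        # URLError and socket timeouts are OSError; a malformed URL is ValueError.
        return CRITICAL, f"{name} CRITICAL - no response: {err}"
    elapsed = time.monotonic() - started
    if status >= 500:
        return CRITICAL, f"{name} CRITICAL - HTTP {status}"
    if elapsed > warn_seconds:
        return WARNING, f"{name} WARNING - HTTP {status} in {elapsed:.2f}s"
    return OK, f"{name} OK - HTTP {status} in {elapsed:.2f}s"


def main(argv):
    """Plugin entry point: argv is [program, NAME, URL]."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_api.py NAME URL")
        return UNKNOWN
    code, message = check_api(argv[1], argv[2])
    print(message)
    return code
```

A wrapper script would call `sys.exit(main(sys.argv))`, with a hypothetical invocation such as `check_api.py nova http://controller:8774/`. Note that an answer of 401 counts as OK here, which is exactly why this kind of probe says little about whether the service is actually healthy.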

 -David

- OpenStack projects like Rally or Tempest can generate synthetic loads and run end-to-end tests. Integrating them with a monitoring system isn't terribly difficult either.

As far as Monitoring-as-a-service is concerned, do you have plans to integrate/leverage Ceilometer?

BR,
Simon

On Tue, May 12, 2015 at 7:20 PM, Vinod Pandarinathan (vpandari) <vpand...@cisco.com> wrote:

    Hello,

      I'm pleased to announce the development of a new project called
    CloudPulse.  CloudPulse provides OpenStack
    health-checking services to operators, tenants, and
    applications. This project will begin as
    a StackForge project based upon an empty cookiecutter[1] repo. The repos to work in are:
    Server: https://github.com/stackforge/cloudpulse
    Client: https://github.com/stackforge/python-cloudpulseclient

    Please join us via IRC on #openstack-cloudpulse on freenode.

    I am holding a doodle poll to select times for our first meeting
    the week after summit.  This doodle poll will close May 24th and
    meeting times will be announced on the mailing list at that time.
    At our first IRC meeting,
    we will draft additional core team members, so if you're interested
    in joining a fresh new development effort, please attend our first
    meeting.
    Please take a moment if you're interested in CloudPulse to fill out
    the doodle poll here:

    https://doodle.com/kcpvzy8kfrxe6rvb

    The initial core team is composed of
    Ajay Kalambur,
    Behzad Dastur, Ian Wells, Pradeep Chandrasekhar, Steven
    Dake, and Vinod Pandarinathan.
    I expect more members to join during our initial meeting.

     A little bit about CloudPulse:
     Cloud operators need notification of OpenStack failures before a
    customer reports the failure. Cloud operators can then take timely
    corrective actions with minimal disruption to applications. Many
    cloud applications, including
    those I am interested in (NFV) have very stringent service level
    agreements.  Loss of service can trigger contractual
    costs associated with the service.  Application high availability
    requires an operational OpenStack Cloud, and the reality
    is that occasionally OpenStack clouds fail in some mysterious
    ways.  This project intends to identify when those failures
    occur so corrective actions may be taken by operators, tenants,
    and the applications themselves.

    OpenStack is considered healthy when OpenStack API services
    respond appropriately.  Further, OpenStack is
    healthy when network traffic can be sent between the tenant
    networks and can access the Internet.  Finally, OpenStack
    is healthy when all infrastructure cluster elements are in an
    operational state.
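The three-part definition above can be read as an aggregation rule: the cloud is healthy only when the API, network, and infrastructure checks all pass. The following is a minimal sketch of that rule under stated assumptions. It is not CloudPulse's implementation; the category names, `CheckResult`, `run_checks`, and `summarize` are all hypothetical, and each probe is just a callable returning a truthy value on success.

```python
from dataclasses import dataclass

# Hypothetical category names mirroring the three conditions in the text.
CATEGORIES = ("api", "network", "infrastructure")


@dataclass
class CheckResult:
    category: str
    name: str
    passed: bool
    detail: str = ""


def run_checks(checks):
    """Run (category, name, probe) triples; a probe that raises counts as failed."""
    results = []
    for category, name, probe in checks:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        try:
            passed, detail = bool(probe()), ""
        except Exception as err:
            passed, detail = False, str(err)
        results.append(CheckResult(category, name, passed, detail))
    return results


def summarize(results):
    """Per-category status (None means nothing was checked) plus an overall verdict."""
    status = {category: None for category in CATEGORIES}
    for result in results:
        previous = status[result.category]
        status[result.category] = (
            result.passed if previous is None else previous and result.passed
        )
    # Healthy only if every category was checked and every check passed.
    status["healthy"] = all(status[c] is True for c in CATEGORIES)
    return status
```

With this rule, a category that has no checks at all leaves the overall verdict unhealthy rather than silently passing, which seems the safer default for a monitoring tool.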

    For information about blueprints check out:
    https://blueprints.launchpad.net/cloudpulse
    https://blueprints.launchpad.net/python-cloudpulseclient

    For more details, check out our Wiki:
    https://wiki.openstack.org/wiki/Cloudpulse

    Please join the CloudPulse team in designing and implementing a
    world-class Carrier Grade system for checking
    the health of OpenStack clouds.  We look forward to seeing you on
    IRC on #openstack-cloudpulse.

    Regards,
    Vinod Pandarinathan
    [1] https://github.com/openstack-dev/cookiecutter


    __________________________________________________________________________
    OpenStack Development Mailing List (not for usage questions)
    Unsubscribe:
    openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
    <http://openstack-dev-requ...@lists.openstack.org?subject:unsubscribe>
    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




