Application Health-Checks

There are two automated health-checks at an application level:

  • Elastic Load Balancing (ELB) checks
  • Service status checks

There is also a database migration version check, displayed on the front page, at /auth/health-check/db-version.

ELB Checks

The following routes are available for the ELB to check that the application is alive and the instance doesn’t need to be recycled.

  • Frontend - /health-check (unused)
  • Membrane - /auth/health-check (unused)
  • API - /api/health-check

Frontend and Membrane use /healthCheck.php instead.

Service Status Checks

This is designed to provide a single point where the overall health of the service can be checked.

It currently supports checks for:

  • Frontend to membrane
  • Membrane to API
  • DDC queue

Frontend, API and Membrane each provide an endpoint that returns the combined status of the services they are connected to. The response is a single-tier list, which makes statuses easier to check and merge.

  • Frontend - /health-check/service-status
  • API - /api/health-check/service-status
  • Membrane - /auth/health-check/service-status
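
Because every tier returns a flat, single-tier list, the decoded responses from several endpoints can be combined with a plain array merge. The following is a minimal sketch of that idea and not the Sirius implementation: the function name mergeServiceStatuses() is hypothetical, and it assumes each tier's own top-level "ok" flag should be trusted rather than recomputed from the individual entries.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: merge the decoded JSON bodies of several
 * service-status endpoints into one single-tier list.
 *
 * @param array<string, mixed> ...$responses decoded service-status bodies
 * @return array<string, mixed>
 */
function mergeServiceStatuses(array ...$responses): array
{
    $overallOk = true;
    $services = [];

    foreach ($responses as $response) {
        // Trust each tier's own verdict; a missing flag counts as unhealthy.
        $overallOk = $overallOk && (($response['ok'] ?? false) === true);
        unset($response['ok']);

        // Every remaining key is a named service entry. Later tiers
        // overwrite earlier ones that report the same service name.
        $services = array_merge($services, $response);
    }

    // Put the overall flag first, followed by the merged entries.
    return ['ok' => $overallOk] + $services;
}

// Example input shaped like the responses in this document.
$frontend = ['ok' => true, 'membrane' => ['ok' => true, 'status-code' => 200]];
$api = ['ok' => false, 'api' => ['ok' => false, 'error' => 'unreachable']];

echo json_encode(mergeServiceStatuses($frontend, $api), JSON_PRETTY_PRINT), PHP_EOL;
```

In this sketch the merged result is unhealthy as soon as any one tier reports "ok": false at the top level.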

Examples

Healthy status

{
    "ok": true,
    "membrane": {
        "ok": true,
        "status-code": 200
    },
    "api": {
        "ok": true,
        "status-code": 200
    },
    "ddc-queue": {
        "ok": true,
        "queue-type": "beanstalk",
        "stats": {
            "name": "ddc",
            "current-jobs-urgent": "0",
            "current-jobs-ready": "0",
            "current-jobs-reserved": "0",
            "current-jobs-delayed": "0",
            "current-jobs-buried": "0",
            "total-jobs": "0",
            "current-using": "0",
            "current-watching": "1",
            "current-waiting": "1",
            "cmd-delete": "0",
            "cmd-pause-tube": "0",
            "pause": "0",
            "pause-time-left": "0"
        }
    }
}

DDC queue unavailable error

{
    "ok": true,
    "membrane": {
        "ok": true,
        "status-code": 200
    },
    "api": {
        "ok": true,
        "status-code": 200
    },
    "ddc-queue": {
        "ok": false,
        "queue-type": "beanstalk"
    }
}

API connectivity error

{
    "ok": false,
    "membrane": {
        "ok": true,
        "status-code": 200
    },
    "api": {
        "ok": false,
        "error": "Threw an exception trying to call, check logs for more details"
    }
}
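
In the DDC queue example above, the top-level "ok" stays true while "ddc-queue" reports false, so a monitor that needs per-service detail has to inspect each entry and not only the top-level flag. The following is a small sketch of how a client might do that; failingServices() is a hypothetical name and not part of Sirius.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return the names of services whose entry
 * does not report "ok": true.
 *
 * @return string[]
 */
function failingServices(string $json): array
{
    // Decode to associative arrays; invalid JSON throws a JsonException.
    $status = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $failing = [];

    foreach ($status as $name => $entry) {
        // The top-level "ok" flag is a boolean, so only nested arrays
        // are treated as service entries.
        if (is_array($entry) && ($entry['ok'] ?? false) !== true) {
            $failing[] = $name;
        }
    }

    return $failing;
}

// Body taken from the "API connectivity error" example.
$body = <<<'JSON'
{
    "ok": false,
    "membrane": {"ok": true, "status-code": 200},
    "api": {
        "ok": false,
        "error": "Threw an exception trying to call, check logs for more details"
    }
}
JSON;

echo implode(', ', failingServices($body)), PHP_EOL; // prints "api"
```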

Adding service checks

The health-check main service collects information from a list of service status providers. You may add a new service check with the following steps.

First, add a new status provider by writing a class that implements Application\HealthCheck\StatusProvider. This class must implement the methods isEnabled() and getStatus().

The method isEnabled() returns a boolean indicating whether this service check is active.

If the check is active, the main health-check service calls getStatus(). This method must return an instance of ServiceStatus that indicates whether the service is healthy and carries an array of information to display in the health-check endpoint. If affectOverallStatus() returns true, an unhealthy status from this provider also makes the overall status for Sirius unhealthy.

<?php
// 📁back-end/module/Application/src/HealthCheck/CommandBusQueueStatusProvider.php
declare(strict_types=1);

namespace Application\HealthCheck;

use Application\Queue\QueueType;
use Application\Queue\Sqs\GeneralSqsQueueException;
use Application\Queue\Sqs\SqsQueueHandler;
use Psr\Log\LoggerInterface;

class CommandBusQueueStatusProvider implements StatusProvider
{
    public const NAME = 'command-bus-queue';

    /** @var SqsQueueHandler */
    private $queueHandler;

    /** @var string */
    private $queueType;

    /** @var LoggerInterface */
    private $logger;

    public function __construct(
        SqsQueueHandler $queueHandler,
        QueueType $queueType,
        LoggerInterface $logger
    ) {
        $this->queueHandler = $queueHandler;
        $this->queueType = $queueType->toString();
        $this->logger = $logger;
    }

    public function isEnabled(): bool
    {
        return true;
    }

    public function affectOverallStatus(): bool
    {
        return true;
    }

    public function getStatus(): ServiceStatus
    {
        $ok = false;
        $status = ['queue-type' => $this->queueType];

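        try {
            // Fetching the attributes doubles as a connectivity check.
            $status['attributes'] = $this->queueHandler->getQueueAttributes();
            $ok = true;
        } catch (GeneralSqsQueueException $e) {
            // PSR-3 convention: message as a string, exception in the context.
            $this->logger->error($e->getMessage(), ['exception' => $e]);
        }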

        return new ServiceStatus($ok, $status);
    }
}
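
To show how the three provider methods interact, here is a minimal sketch of an aggregator. It is not the actual Sirius health-check service. The StatusProvider and ServiceStatus types below are stand-ins for the real Application\HealthCheck classes, whose source is not shown here. The accessor names isOk() and getData(), the function collectStatuses(), and the choice to leave disabled providers out of the response are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

// Stand-in for Application\HealthCheck\StatusProvider (assumed shape).
interface StatusProvider
{
    public function isEnabled(): bool;
    public function affectOverallStatus(): bool;
    public function getStatus(): ServiceStatus;
}

// Stand-in for ServiceStatus; the accessor names are assumptions.
final class ServiceStatus
{
    private bool $ok;
    private array $data;

    public function __construct(bool $ok, array $data = [])
    {
        $this->ok = $ok;
        $this->data = $data;
    }

    public function isOk(): bool { return $this->ok; }
    public function getData(): array { return $this->data; }
}

/**
 * Hypothetical aggregator.
 *
 * @param array<string, StatusProvider> $providers keyed by provider name
 * @return array<string, mixed>
 */
function collectStatuses(array $providers): array
{
    $overallOk = true;
    $result = [];

    foreach ($providers as $name => $provider) {
        if (!$provider->isEnabled()) {
            continue; // assumption: disabled checks are omitted entirely
        }

        $status = $provider->getStatus();
        $result[$name] = ['ok' => $status->isOk()] + $status->getData();

        // Only providers that opt in can turn the overall status bad.
        if (!$status->isOk() && $provider->affectOverallStatus()) {
            $overallOk = false;
        }
    }

    return ['ok' => $overallOk] + $result;
}

// A failing provider that does not affect the overall status.
$optionalQueue = new class implements StatusProvider {
    public function isEnabled(): bool { return true; }
    public function affectOverallStatus(): bool { return false; }
    public function getStatus(): ServiceStatus
    {
        return new ServiceStatus(false, ['queue-type' => 'beanstalk']);
    }
};

echo json_encode(collectStatuses(['ddc-queue' => $optionalQueue])), PHP_EOL;
// prints {"ok":true,"ddc-queue":{"ok":false,"queue-type":"beanstalk"}}
```

Under these assumptions, a provider whose affectOverallStatus() returns false produces output with the same shape as the "DDC queue unavailable error" example, although this document does not confirm that the DDC provider is configured that way.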

Next, register this service in the main service container.

<?php
// 📁back-end/module/Application/config/module.config.php

return [
    // ...
    'service_manager' => [
        'factories' => [
            // ...
            \Application\HealthCheck\CommandBusQueueStatusProvider::class =>
                \Application\HealthCheck\CommandBusQueueStatusProviderFactory::class,
        ],
    ],
];

Finally, make this check visible to the health-check service by adding it to the application’s health-check configuration file.

<?php
// 📁back-end/config/autoload/healthcheck.global.php

use Application\HealthCheck\CommandBusQueueStatusProvider;
use Ddc\HealthCheck\DdcQueueStatusProvider;

return [
    'sirius' => [
        'health-check' => [
            'providers' => [
                DdcQueueStatusProvider::NAME => DdcQueueStatusProvider::class,
                CommandBusQueueStatusProvider::NAME => CommandBusQueueStatusProvider::class,
            ],
        ],
    ],
];

With the new status provider enabled, we can call the /health-check/service-status endpoint and verify that a new check, "command-bus-queue", is displayed.

{
    "ok": true,
    "membrane": {
        "ok": true,
        "status-code": 200
    },
    "api": {
        "ok": true,
        "status-code": 200
    },
    "ddc-queue": {
        "ok": true,
        "queue-type": "beanstalk",
        "stats": {
            "name": "ddc",
            "current-jobs-urgent": "0",
            "current-jobs-ready": "0",
            "current-jobs-reserved": "0",
            "current-jobs-delayed": "0",
            "current-jobs-buried": "0",
            "total-jobs": "0",
            "current-using": "0",
            "current-watching": "1",
            "current-waiting": "1",
            "cmd-delete": "0",
            "cmd-pause-tube": "0",
            "pause": "0",
            "pause-time-left": "0"
        }
    },
    "command-bus-queue": {
        "ok": true,
        "queue-type": "sqs",
        "attributes": {
            "VisibilityTimeout": "30",
            "DelaySeconds": "0",
            "ReceiveMessageWaitTimeSeconds": "20",
            "ApproximateNumberOfMessages": "0",
            "ApproximateNumberOfMessagesNotVisible": "0",
            "ApproximateNumberOfMessagesDelayed": "0",
            "CreatedTimestamp": "1566200172",
            "LastModifiedTimestamp": "1566200172",
            "QueueArn": "arn:aws:sqs:elasticmq:000000000000:command-bus.fifo",
            "ContentBasedDeduplication": "true",
            "FifoQueue": "true"
        }
    }
}

This page was last reviewed on 12 November 2020. It needs to be reviewed again on 10 December 2020 by the page owner #opg-sirius-develop.