Application Health-Checks
There are two automated health-checks at an application level:
- Elastic Load Balancing (ELB) checks
- Service status checks
There is also a migration version, which is displayed on the front page /auth/health-check/db-version.
ELB Checks
The following routes are available for the ELB to check that the application is alive and the instance doesn’t need to be recycled.
- Frontend - /health-check (unused)
- Membrane - /auth/health-check (unused)
- API - /api/health-check
Frontend and Membrane use /healthCheck.php
Service Status Checks
This is designed to provide a single point where the overall health of the service can be checked.
It currently supports checks for:
- Frontend to membrane
- Membrane to API
- DDC queue
Frontend, API and Membrane each provide an endpoint that gives the combined status of the services they are connected to (in a single-tier list, to make checking and merging statuses easier).
- Frontend - /health-check/service-status
- API - /api/health-check/service-status
- Membrane - /auth/health-check/service-status
Examples
Healthy status
{
"ok": true,
"membrane": {
"ok": true,
"status-code": 200
},
"api": {
"ok": true,
"status-code": 200
},
"ddc-queue": {
"ok": true,
"queue-type": "beanstalk",
"stats": {
"name": "ddc",
"current-jobs-urgent": "0",
"current-jobs-ready": "0",
"current-jobs-reserved": "0",
"current-jobs-delayed": "0",
"current-jobs-buried": "0",
"total-jobs": "0",
"current-using": "0",
"current-watching": "1",
"current-waiting": "1",
"cmd-delete": "0",
"cmd-pause-tube": "0",
"pause": "0",
"pause-time-left": "0"
}
}
}
DDC queue unavailable error
{
"ok": true,
"membrane": {
"ok": true,
"status-code": 200
},
"api": {
"ok": true,
"status-code": 200
},
"ddc-queue": {
"ok": false,
"queue-type": "beanstalk"
}
}
API connectivity error
{
"ok": false,
"membrane": {
"ok": true,
"status-code": 200
},
"api": {
"ok": false,
"error": "Threw an exception trying to call, check logs for more details"
}
}
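The three responses above share one shape: a top-level "ok" flag plus one object per service, each with its own "ok" flag. A monitoring script might use that shape to list which checks are failing. The following is a minimal sketch, not part of the application: the helper name failingChecks() is hypothetical, and it assumes PHP 7.3 or later for JSON_THROW_ON_ERROR. Note that in the DDC queue example the top-level flag stays true, so a caller interested in individual services has to inspect each entry rather than rely on the top-level value alone.

```php
<?php
declare(strict_types=1);

/**
 * Returns the names of the checks whose "ok" flag is not true.
 * Hypothetical helper for illustration; not part of the application.
 *
 * @return string[]
 */
function failingChecks(string $json): array
{
    // Decode to an associative array; throws JsonException on invalid JSON.
    $status = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    $failing = [];
    foreach ($status as $name => $check) {
        // The top-level "ok" flag is a boolean, so is_array() skips it;
        // every other key is treated as a per-service entry.
        if (is_array($check) && ($check['ok'] ?? false) !== true) {
            $failing[] = $name;
        }
    }

    return $failing;
}

// Trimmed copy of the "DDC queue unavailable" response shown above.
$response = '{"ok": true, "membrane": {"ok": true, "status-code": 200}, '
    . '"ddc-queue": {"ok": false, "queue-type": "beanstalk"}}';

echo implode(', ', failingChecks($response)), PHP_EOL;
```

With the sample response, the only entry reported is the DDC queue, even though the top-level "ok" is true.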
Adding service checks
The health-check main service collects information from a list of service status providers. You may add a new service check with the following steps.
First, add a new status provider by writing a class that implements Application\HealthCheck\StatusProvider. This class must implement the methods isEnabled() and getStatus().
The method isEnabled() returns a boolean indicating whether this service check is active.
If it is active, the main health-check service will call getStatus(). This method must return an instance of ServiceStatus indicating whether the service is healthy, along with an array of information to be displayed in the health-check endpoint. If affectOverallStatus() returns true, a bad status value will cause the overall value for Sirius to be bad.
<?php
// 📁back-end/module/Application/src/HealthCheck/CommandBusQueueStatusProvider.php
declare(strict_types=1);

namespace Application\HealthCheck;

use Application\Queue\QueueType;
use Application\Queue\Sqs\GeneralSqsQueueException;
use Application\Queue\Sqs\SqsQueueHandler;
use Psr\Log\LoggerInterface;

class CommandBusQueueStatusProvider implements StatusProvider
{
    public const NAME = 'command-bus-queue';

    /** @var SqsQueueHandler */
    private $queueHandler;

    /** @var string */
    private $queueType;

    /** @var LoggerInterface */
    private $logger;

    public function __construct(
        SqsQueueHandler $queueHandler,
        QueueType $queueType,
        LoggerInterface $logger
    ) {
        $this->queueHandler = $queueHandler;
        $this->queueType = $queueType->toString();
        $this->logger = $logger;
    }

    public function isEnabled(): bool
    {
        return true;
    }

    public function affectOverallStatus(): bool
    {
        return true;
    }

    public function getStatus(): ServiceStatus
    {
        // Assume the queue is unhealthy until its attributes have been read.
        $ok = false;
        $status = ['queue-type' => $this->queueType];

        try {
            $status['attributes'] = $this->queueHandler->getQueueAttributes();
            $ok = true;
        } catch (GeneralSqsQueueException $e) {
            $this->logger->error($e);
        }

        return new ServiceStatus($ok, $status);
    }
}
Next, register this service in the main service container.
<?php
// 📁back-end/module/Application/config/module.config.php
return [
    // ...
    'service_manager' => [
        'factories' => [
            // ...
            \Application\HealthCheck\CommandBusQueueStatusProvider::class =>
                \Application\HealthCheck\CommandBusQueueStatusProviderFactory::class,
        ],
    ],
];
Finally, make this check visible to the health-check service by adding it to the application’s health-check configuration file.
<?php
// 📁back-end/config/autoload/healthcheck.global.php
use Application\HealthCheck\CommandBusQueueStatusProvider;
use Ddc\HealthCheck\DdcQueueStatusProvider;

return [
    'sirius' => [
        'health-check' => [
            'providers' => [
                DdcQueueStatusProvider::NAME => DdcQueueStatusProvider::class,
                CommandBusQueueStatusProvider::NAME => CommandBusQueueStatusProvider::class,
            ],
        ],
    ],
];
With this new status provider enabled, we can call the /health-check/service-status endpoint and verify that a new check, "command-bus-queue", is displayed.
{
"ok": true,
"membrane": {
"ok": true,
"status-code": 200
},
"api": {
"ok": true,
"status-code": 200
},
"ddc-queue": {
"ok": true,
"queue-type": "beanstalk",
"stats": {
"name": "ddc",
"current-jobs-urgent": "0",
"current-jobs-ready": "0",
"current-jobs-reserved": "0",
"current-jobs-delayed": "0",
"current-jobs-buried": "0",
"total-jobs": "0",
"current-using": "0",
"current-watching": "1",
"current-waiting": "1",
"cmd-delete": "0",
"cmd-pause-tube": "0",
"pause": "0",
"pause-time-left": "0"
}
},
"command-bus-queue": {
"ok": true,
"queue-type": "sqs",
"attributes": {
"VisibilityTimeout": "30",
"DelaySeconds": "0",
"ReceiveMessageWaitTimeSeconds": "20",
"ApproximateNumberOfMessages": "0",
"ApproximateNumberOfMessagesNotVisible": "0",
"ApproximateNumberOfMessagesDelayed": "0",
"CreatedTimestamp": "1566200172",
"LastModifiedTimestamp": "1566200172",
"QueueArn": "arn:aws:sqs:elasticmq:000000000000:command-bus.fifo",
"ContentBasedDeduplication": "true",
"FifoQueue": "true"
}
}
}
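To make the roles of isEnabled(), getStatus() and affectOverallStatus() concrete, here is a rough sketch of how a collecting service could combine providers into the response shape shown above. It is an illustration under assumptions, not the actual Sirius implementation: the ServiceStatus and StatusProvider definitions are guessed from the example provider, and collectStatuses(), isOk(), toArray() and FixedStatusProvider are hypothetical names. It assumes PHP 7.4 or later.

```php
<?php
declare(strict_types=1);

// Assumed shape of ServiceStatus, inferred from `new ServiceStatus($ok, $status)`.
final class ServiceStatus
{
    private bool $ok;
    private array $details;

    public function __construct(bool $ok, array $details = [])
    {
        $this->ok = $ok;
        $this->details = $details;
    }

    public function isOk(): bool
    {
        return $this->ok;
    }

    public function toArray(): array
    {
        // "ok" comes first, followed by the provider-specific details.
        return ['ok' => $this->ok] + $this->details;
    }
}

// Assumed interface; the real StatusProvider may declare different methods.
interface StatusProvider
{
    public function isEnabled(): bool;
    public function affectOverallStatus(): bool;
    public function getStatus(): ServiceStatus;
}

// Hypothetical stub that always reports the status it was constructed with.
final class FixedStatusProvider implements StatusProvider
{
    private bool $ok;
    private bool $affectsOverall;

    public function __construct(bool $ok, bool $affectsOverall)
    {
        $this->ok = $ok;
        $this->affectsOverall = $affectsOverall;
    }

    public function isEnabled(): bool { return true; }
    public function affectOverallStatus(): bool { return $this->affectsOverall; }
    public function getStatus(): ServiceStatus { return new ServiceStatus($this->ok); }
}

/**
 * Builds a single-tier status list from named providers.
 *
 * @param array<string, StatusProvider> $providers
 */
function collectStatuses(array $providers): array
{
    $overall = true;
    $checks = [];

    foreach ($providers as $name => $provider) {
        if (!$provider->isEnabled()) {
            continue; // disabled checks are left out of the response
        }
        $status = $provider->getStatus();
        // Only providers that opt in can turn the overall status bad.
        if (!$status->isOk() && $provider->affectOverallStatus()) {
            $overall = false;
        }
        $checks[$name] = $status->toArray();
    }

    return ['ok' => $overall] + $checks;
}

$result = collectStatuses([
    'ddc-queue' => new FixedStatusProvider(false, false),
    'command-bus-queue' => new FixedStatusProvider(true, true),
]);

echo json_encode($result), PHP_EOL;
```

In this sketch a failing provider that does not affect the overall status still appears with "ok": false in its own entry, which would be consistent with the "DDC queue unavailable" example earlier, where the top-level "ok" remains true.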