Smarter Systems Monitoring

Systems monitoring can be a lot more than checking that systems are ‘up’. With a small investment, it can be a ‘unit test’ for all the services that run over your infrastructure. It also changes support from reactive to proactive, aids business continuity and makes risks manageable when migrating or upgrading systems.

As an example, at my last position at the Anglican School Corporation, when the new dark-fibre rollout was completed, we decided to create a ‘dashboard’ for it using PRTG. We chose PRTG over Zabbix because of its much better out-of-the-box support and standard sensors, which led to a quick implementation.

Within no time, we had a beautiful dashboard for the network status of all 17 schools, plus the links to datacenters and important servers. A large display was also installed in our office, providing a truly at-a-glance overview.

Does low CPU utilisation indicate that all your web services run as expected?

But, as you may well know, just having the network speed, ping time and CPU utilisation will not tell you if all services run smoothly. In fact, in 95% of cases where people logged a ticket or called to say that a certain program or service was slow, ping times, CPU times and disk utilisation were perfectly normal. It was time for somewhat smarter monitoring 🙂

Examples of ‘smarter’ sensors that we built

So as soon as PRTG was up and running, I asked my programmer, Jay, to code a few new sensors to cover services. This was done on both our Edumate Student Management Systems and on our Integration server which syncs our Edumate SMS with Canvas LMS and our Great Plains ERP.

Sensor | Server | Function
1. Partition | Edumate | Checks the amount of free disk space on the partition that holds the DB2 database. This directly affects performance if low.
2. Payment Gateway | Edumate | Checks if the Commbank API can be used by Edumate. This API is used for direct debit and parents paying on the parent portal.
3. Web Request | Integration | Makes a demanding web request to Edumate and returns the response time.
4. Great Plains DB | Integration | Runs a query on the GP database. This is needed for the finance integration to work.
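
To give a feel for how small such a sensor can be, here is a minimal sketch of the Partition check in Python. It is not the sensor Jay wrote. It assumes the JSON result format that PRTG's EXE/Script Advanced sensors read, and the function name, the 10% threshold and the path are placeholders for illustration.

```python
import json
import shutil


def partition_sensor(path, min_free_percent=10.0):
    """Build a PRTG-style JSON result for the free space on `path`."""
    usage = shutil.disk_usage(path)
    free_percent = round(usage.free / usage.total * 100, 1)
    if free_percent < min_free_percent:
        # An error result carries a message instead of channel values.
        return {"prtg": {"error": 1,
                         "text": f"Only {free_percent}% free on {path}"}}
    return {"prtg": {"result": [
        {"channel": "Free space", "value": free_percent,
         "float": 1, "unit": "Percent"},
        {"channel": "Free bytes", "value": usage.free, "unit": "BytesDisk"},
    ]}}


if __name__ == "__main__":
    # Placeholder path; point it at the partition that holds the database.
    print(json.dumps(partition_sensor("/")))
```

The script prints one JSON document to standard output, which is what the monitoring server parses on each scan.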

A closer look at the Payment Gateway sensor.

When PRTG checks this sensor, it executes the same API call to the bank as the Edumate Banking Module would. This checks that:
(1) The Edumate server responds to a sensor call.
(2) The Edumate server can reach the Commonwealth Bank API, which also confirms that the network and firewall are operating as expected.
(3) We get a valid response from the API.
(4) We can measure the overall response time.
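
The four checks above can be sketched in a few lines of Python. This is an illustration only, not our production sensor: `call_gateway` and `is_valid` are hypothetical callables standing in for the real bank API request and its response validation, and the output assumes PRTG's EXE/Script Advanced JSON format.

```python
import time


def gateway_sensor(call_gateway, is_valid):
    """Time one gateway call and report it as a PRTG-style result.

    call_gateway: hypothetical callable that performs the real API request.
    is_valid: hypothetical callable that decides if the response is acceptable.
    """
    start = time.perf_counter()
    try:
        response = call_gateway()
    except Exception as exc:
        # Covers network, firewall and server failures alike.
        return {"prtg": {"error": 1, "text": f"Gateway call failed: {exc}"}}
    elapsed_ms = round((time.perf_counter() - start) * 1000)

    if not is_valid(response):
        return {"prtg": {"error": 1,
                         "text": "Gateway returned an invalid response"}}
    return {"prtg": {"result": [
        {"channel": "Response time", "value": elapsed_ms,
         "unit": "TimeResponse"},
    ]}}
```

Passing the call in as a parameter keeps the timing and error handling separate from the bank-specific code. Thresholds for "too slow" are left out here on purpose; they can be set as limits on the channel in the monitoring tool.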

I’m sure that by now you have already grasped that this sensor accomplishes more than regular sensors – so what did this accomplish for us in a wider context?

Operational & Strategic Outcomes (of our sensors)

1. We have an entire eco-system test
With a single sensor we’ve established that the entire eco-system needed for making payments works: the network, the firewall, the payment gateway itself. The sensor returns the time taken, a quantitative measure to monitor.
2. The IT Team moved from reactive to pro-active support
We now know if the payment gateway has an issue well before a parent or a staff member needs the functionality. As an IT team, we have moved from reactive to proactive support.
3. We have another ‘unit-test’ for our infrastructure and services
If we move our server, host it on a cloud service (=somebody else’s server), or make a different infrastructure change, we will get alerted if any of our services is not available. It’s essentially a ‘unit-test’ that automates testing and will reduce outage time.

Summary: Smarter monitoring can help to manage those increasing system interdependencies.

With an increasing number of interdependencies, it is harder for your staff to account for all of them when making changes. But software like PRTG, in combination with well-considered sensors, can provide you with a safety blanket that quickly checks the availability of services for all your important stakeholders.

For K-12 schools, you may also want to set up an outside sensor to check that people can VPN in or reach your website, and that the payment gateways work, especially if that affects the school’s cashflow. You may also want to ensure that your staff can reach the systems they need to fulfil their duty of care.
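
As a rough idea of what an outside check could look like, the sketch below tries to open a TCP connection from a machine outside your network and reports whether it succeeded and how long it took. The function name and return fields are my own for illustration. Note the assumption: it only proves that a TCP port answers, so it suits a website or a TCP-based VPN endpoint, and a UDP-based VPN would need a different test.

```python
import socket
import time


def check_tcp(host, port, timeout=3.0):
    """Try to open a TCP connection; report reachability and connect time."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = round((time.perf_counter() - start) * 1000)
            return {"host": host, "port": port,
                    "reachable": True, "ms": elapsed_ms}
    except OSError as exc:
        # Refused connections, timeouts and DNS failures all end up here.
        return {"host": host, "port": port,
                "reachable": False, "error": str(exc)}
```

You would call it with your own public hostname and port, for example port 443 for the school website, and feed the result into whichever monitoring tool you use.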

Last but not least, consider that this can help immensely when managing projects such as the relocation and migration of systems as well as vendor software upgrades. While it does not reduce the risk, it makes it manageable and provides a very quick feedback loop. Business continuity also improves as checks and balances move from static documentation to active monitoring, and new staff will get to grips with your entire eco-system more quickly.

Smarter systems monitoring – get you some!

Related business services
Formulation of requirements, board decision paper, product/vendor selection, creation of sensors, strategic consultancy.
