The first is: why was New Zealand impacted by an outage in Australia? The NZ banking system needs to be independent of any external country or organization. Many countries have data sovereignty rules, and one would have thought that NZ had them too. It is certainly something the Reserve Bank of NZ needs to look into.
The second is: why did the systems fail in the first place? It was reported in the Australian media that NAB’s system is hosted by IBM in Melbourne. Data centers like this have, or should have, a very high level of power resilience. They should have had, and probably did have, uninterruptible power supplies (UPS) and generators. Why did these not kick in and take over immediately? Obviously, there was a single point of failure somewhere in the power system. The design needs to be questioned, as these computers often have two power supply feeds from two different sources. Was the data center testing its UPS and generators? Why did it take 7 hours to recover? OK, mainframes can be slow to boot and sort out their data, but 7 hours, really?
Third is: why did it not switch automatically to a secondary data center? Banking systems should have a disaster recovery site to which they switch automatically, well within 30 minutes. Many financial transaction organizations run what are known as active/active or active/passive systems, which enable automatic switching to mirror sites. The fact that NAB, and the BNZ for that matter, seem to be totally reliant on one data center should raise alarm bells at the respective central banks of both countries. Certainly, a “please explain” is required.
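To make the active/passive idea concrete, here is a minimal sketch of the kind of logic that decides when to switch sites. It is an illustration only and does not describe how NAB, the BNZ or IBM actually operate: the class name `FailoverController`, the `max_missed` threshold and the site labels are all hypothetical.

```python
class FailoverController:
    """Toy active/passive switch driven by heartbeat checks (hypothetical design)."""

    def __init__(self, max_missed: int = 3):
        self.max_missed = max_missed  # assumed tolerance before switching
        self.missed = 0               # consecutive failed heartbeats so far
        self.active = "primary"       # site currently taking traffic

    def record_heartbeat(self, primary_ok: bool) -> str:
        # Once on the secondary, stay there; in this sketch failback is manual.
        if self.active == "secondary":
            return self.active
        # A good heartbeat resets the counter, a bad one increments it.
        self.missed = 0 if primary_ok else self.missed + 1
        if self.missed >= self.max_missed:
            self.active = "secondary"
        return self.active
```

With heartbeats every few seconds, a controller like this would move traffic within minutes of the primary site going dark, not hours. Requiring several consecutive misses is a deliberate choice so that one dropped heartbeat does not trigger a needless switch.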
Fourth is: why didn’t the transaction authorization system operate in a “sub host” or “stand in” mode, where it can process transactions up to a certain limit, depending on the client’s financial profile? Sub host systems cannot normally run for a long period of time, but this would have provided a window to swap to a secondary data center.
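The stand-in idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not any bank’s real authorization logic: the `ClientProfile` fields, the risk tiers and the dollar limits in `STANDIN_LIMITS` are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ClientProfile:
    risk_tier: str              # hypothetical tiers: "low", "medium", "high"
    standin_spent: float = 0.0  # total approved while the host was unavailable


# Hypothetical cumulative stand-in limits per risk tier.
STANDIN_LIMITS = {"low": 500.0, "medium": 200.0, "high": 0.0}


def authorize_standin(profile: ClientProfile, amount: float) -> bool:
    """Approve only if cumulative stand-in spending stays within the tier limit."""
    limit = STANDIN_LIMITS.get(profile.risk_tier, 0.0)  # unknown tier: decline
    if amount <= 0 or profile.standin_spent + amount > limit:
        return False
    profile.standin_spent += amount  # track exposure until the host returns
    return True
```

Because the limit is cumulative, the bank’s exposure per client is capped for the whole outage, which is why a mode like this is only suitable as a short bridge while the switch to a secondary site takes place.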
Many questions need to be asked of NAB, the BNZ and, if they are hosted by IBM, of IBM as well. Yes, it may have been a very unusual problem, but they should have had a recovery site set up and switched to it immediately.
There are questions to be asked and lessons to be learnt.
Author - Sam Mulholland
Sam previously worked in and managed banking data centers for over 30 years, and spent 20 years working for international finance organizations in risk assessment, business continuity and resilient disaster recovery strategies.