Inbound call outage – a design failure

Today, for about 4 hours, we saw a major disruption to our voice (telephone call) network. The problem was in the systems of our upstream provider, Vocus. While we don’t have any detailed information about what failed at Vocus, it is already clear that there is a fundamental design flaw in Australia’s telephone network.

During the outage we were able to send outbound calls via an alternate carrier, so our clients could make phone calls without interruption. However, we were unable to do this for inbound calls. Unfortunately, inbound calls are the ones most businesses rely on for sales, so this was particularly disruptive. Why were we unable to use an alternate carrier?

The problem comes down to the mess our major carriers are in around managing numbers. The basic system is that each carrier announces ahead of time to all the other carriers which numbers it will accept calls for. Moving a number from one carrier to another requires a ridiculously arduous process called “number porting”, which often takes days or weeks. So whilst we can send an outbound call via any carrier we like (and just say it comes from xyz number), for inbound calls we have to allocate that same number to one particular carrier beforehand, during the number port process.
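To make the asymmetry concrete, here is a minimal Python sketch under simplifying assumptions. The `Network` class, the carrier names `primary` and `alternate`, and the phone number are all hypothetical; the model only captures the idea that an outbound call can go via any working carrier, while each inbound number is pinned to a single carrier by porting.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Network:
    # Hypothetical model: which carriers are currently working.
    carrier_up: dict[str, bool]
    # Hypothetical inbound table: each number maps to exactly ONE carrier,
    # fixed ahead of time by number porting and not changeable on the fly.
    inbound_owner: dict[str, str] = field(default_factory=dict)

    def place_outbound(self, preferred: list[str]) -> str | None:
        # Outbound: the caller ID is simply asserted in signalling,
        # so the first healthy carrier in our preference list will do.
        for carrier in preferred:
            if self.carrier_up.get(carrier, False):
                return carrier
        return None

    def deliver_inbound(self, number: str) -> str | None:
        # Inbound: the rest of the network knows only one carrier for
        # this number. If that carrier is down, there is no fallback.
        carrier = self.inbound_owner.get(number)
        if carrier is not None and self.carrier_up.get(carrier, False):
            return carrier
        return None


# Hypothetical outage: the primary carrier is down, the alternate is up.
net = Network(
    carrier_up={"primary": False, "alternate": True},
    inbound_owner={"0362000000": "primary"},
)
print(net.place_outbound(["primary", "alternate"]))  # alternate
print(net.deliver_inbound("0362000000"))             # None: the call fails
```

Outbound calls survive because the choice of carrier is ours at call time; inbound calls fail because the choice was made weeks earlier and cannot be revised during the outage.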

You would think there would be some sophisticated protocol (like “SS7”) to do this; however, my understanding is that it basically comes down to ad hoc spreadsheets and file transfers of these between the carriers.

If one of the carriers has a failure there is no method or process for us to quickly tell an alternate carrier to tell all the other carriers to send calls to the alternate carrier instead (and then onto us).

There are only a small number of large carriers in this club that are allowed to send and receive calls to and from each other. While there are plenty of smaller players, they all have to connect via one of the club members. As you can imagine, competition and innovation are rare commodities in this environment.

A good comparison can be made with the data world and a protocol called BGP. While BGP has its issues, one of the great things about it is that all players (even smaller ones like us) are allowed to “announce” their addresses (the equivalent of telephone numbers) to the world. We are also allowed to announce them via multiple routes, so if one upstream carrier fails another can quickly (in a minute or two) take over, resulting in minimal disruption. Could this technically be done for phone calls? Indeed it could: most modern signalling protocols (e.g. SIP, which is used by your VoIP phone) can easily handle this.
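A toy Python model of the BGP-style alternative might look like the following. It is a sketch, not how any router or carrier actually implements BGP: `Announcement`, `RoutingTable`, the upstream names and the number prefix are hypothetical. The lookup picks the longest matching prefix, and among routes for that prefix the shortest path wins; real BGP best-path selection considers several further attributes (such as local preference) before path length.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    prefix: str    # hypothetical number block, e.g. "03620" covers 03620xxxxx
    via: str       # upstream carrier the announcement is made through
    path_len: int  # simplified stand-in for BGP AS-path length


class RoutingTable:
    def __init__(self) -> None:
        self._routes: set[Announcement] = set()

    def announce(self, ann: Announcement) -> None:
        # The same prefix may be announced via several upstreams at once.
        self._routes.add(ann)

    def withdraw_via(self, carrier: str) -> None:
        # When an upstream fails, every route learned through it disappears.
        self._routes = {r for r in self._routes if r.via != carrier}

    def best_route(self, number: str) -> Announcement | None:
        candidates = [r for r in self._routes if number.startswith(r.prefix)]
        if not candidates:
            return None
        # Longest matching prefix first, then shortest path, then name as a
        # deterministic tie-breaker (a deliberate simplification).
        return min(candidates, key=lambda r: (-len(r.prefix), r.path_len, r.via))


table = RoutingTable()
# A small provider announces the same number block via two upstreams.
table.announce(Announcement("03620", via="upstream_a", path_len=2))
table.announce(Announcement("03620", via="upstream_b", path_len=3))
print(table.best_route("0362012345").via)  # upstream_a

# upstream_a fails: its routes are withdrawn and traffic shifts automatically.
table.withdraw_via("upstream_a")
print(table.best_route("0362012345").via)  # upstream_b
```

The key design point is that the smaller provider owns its announcements and can hold several at once, so failover is a route withdrawal rather than a weeks-long port.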

So unfortunately there really was nothing that we, Launtel, could do about the Vocus outage today. We could choose a different carrier, but in reality all of them will have outages, and this is the first system outage Vocus has had in a few years. All providers have to do the same as we do and get their calls through a large provider that is a member of the club.

The one thing we cannot do, and the only way to get a fully reliable system, is to use multiple carriers for inbound calls.

Damian Ivereigh
Launtel
15 June 2020