FAA NOTAM system failure (and other IT hacks)

What a pity nobody has ever invented the idea of a file validity checker.

LFMD, France

johnh wrote:

What a pity nobody has ever invented the idea of a file validity checker.

If the news reports are to be taken at face value (dangerous, I know, but that’s all the information we have), the problem was caused by a corrupted database file. That most probably means that the database had been damaged by a hardware or software error. So a file validity checker wouldn’t have made any difference. In fact it could possibly have been such a checker that discovered the problem.

ESKC (Uppsala/Sundbro), Sweden

You cannot prevent “file corruption”. That is what happens with computers.

The key is to keep backups of the database. Usually, the database itself is not that big. For example, the EuroGA database is backed up every 24 hours, and that is in addition to other backups which I obviously won’t talk about.

Administrator
Shoreham EGKA, United Kingdom

You can’t prevent file corruption, but you can check the file contents before you load into a mission critical app. For example, if it’s some kind of database, you can load it into a test app, and run consistency checks on the data. You only load it to the live app AFTER you have done all that. That was what I meant by a validity checker. It doesn’t ensure there won’t be any errors, but it does ensure that it won’t break the critical app completely.
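As a rough illustration of that "check first, load afterwards" idea, here is a minimal sketch. It assumes, purely for the example, that the data arrives as an SQLite file; nothing in the news reports says what the NOTAM system actually uses. The function names `is_loadable` and `promote_if_valid` and the file paths are hypothetical.

```python
import os
import pathlib
import shutil
import sqlite3


def is_loadable(path):
    """Return True if SQLite can open the file and its integrity check passes."""
    # Open read-only so the check itself cannot modify the candidate file.
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # Raised for a missing file, a bad header, or a malformed image.
        return False
    return rows == [("ok",)]


def promote_if_valid(candidate, live):
    """Copy the candidate over the live file only AFTER it has passed the check."""
    if not is_loadable(candidate):
        return False
    tmp = live + ".tmp"
    shutil.copyfile(candidate, tmp)
    os.replace(tmp, live)  # rename within one directory, so the swap is a single step
    return True
```

This only proves the file is structurally sound and loadable. It says nothing about whether the contents are right, which is the limit already mentioned above.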

Last Edited by johnh at 15 Jan 17:39
LFMD, France

and run consistency checks on the data

I don’t know which database is used or how it is structured, but you either have an integrity check (CRC etc.) or you don’t. If you don’t, then +TSRB instead of +TSRA will not be picked up.
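To make the CRC point concrete, here is a small sketch using Python's standard `zlib.crc32`. The record layout is an assumption (a bare byte string with the checksum stored alongside it), and `crc_matches` is a hypothetical helper name.

```python
import zlib


def crc_matches(payload: bytes, expected_crc: int) -> bool:
    """Return True if the payload still has the checksum recorded when it was written."""
    return zlib.crc32(payload) == expected_crc


original = b"+TSRA"
stored_crc = zlib.crc32(original)  # recorded at write time, kept next to the record

corrupted = b"+TSRB"  # one character changed in storage

print(crc_matches(original, stored_crc))   # True
print(crc_matches(corrupted, stored_crc))  # False
```

Without the stored checksum, "+TSRB" is a perfectly plausible value and no content-level check would flag it.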

It is virtually impossible to write a database application so it can handle every possible database corruption. One tries but… For example you cannot currently join up on EuroGA if you are on IPv6 – because one of the db fields isn’t wide enough. Affects < 1% of people.
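The field-width problem is easy to show. The sketch below assumes a column sized for dotted IPv4 text (15 characters); the real EuroGA column width isn't stated, so that number and the helper name `fits` are illustrative only.

```python
import ipaddress

MAX_IPV4_TEXT = 15  # "255.255.255.255"
MAX_IPV6_TEXT = 45  # longest textual IPv6 form (with an embedded dotted IPv4 tail)


def fits(address: str, column_width: int) -> bool:
    """Return True if a valid IP address string fits in a column of the given width."""
    ipaddress.ip_address(address)  # raises ValueError if the text is not an IP address
    return len(address) <= column_width


# Fully written-out IPv6 address from the documentation range: 39 characters.
full_v6 = ipaddress.IPv6Address("2001:db8::1").exploded

print(fits("192.0.2.1", MAX_IPV4_TEXT))  # True
print(fits(full_v6, MAX_IPV4_TEXT))      # False
print(fits(full_v6, MAX_IPV6_TEXT))      # True
```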

Administrator
Shoreham EGKA, United Kingdom

It is virtually impossible to write a database application so it can handle every possible database corruption.

Absolutely. But you can ensure that all structural links etc are valid. If it says +TSRB it doesn’t really matter, though it will send people like me rushing to the list of acronyms. But if the live app crashes because of an invalid link, that does matter. You can also ensure that the basic file format is valid, i.e. that the test app can load it successfully.
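A sketch of what checking the structural links might look like, again assuming SQLite only for the sake of a runnable example. The `airport` and `notice` tables are made up. `PRAGMA foreign_key_check` is a standard SQLite pragma that lists rows whose reference points at a parent row that doesn't exist.

```python
import sqlite3


def dangling_links(conn):
    """Return one row per record whose foreign key points at a missing parent."""
    return conn.execute("PRAGMA foreign_key_check").fetchall()


conn = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default in SQLite, so the bad row below
# is accepted on insert, much as a damaged link would simply be "there".
conn.executescript("""
    CREATE TABLE airport (id INTEGER PRIMARY KEY, icao TEXT);
    CREATE TABLE notice (
        id INTEGER PRIMARY KEY,
        airport_id INTEGER REFERENCES airport(id),
        body TEXT
    );
    INSERT INTO airport VALUES (1, 'EGKA');
    INSERT INTO notice VALUES (10, 1, '+TSRA');
    INSERT INTO notice VALUES (11, 99, '+TSRB');
""")

problems = dangling_links(conn)
for table, rowid, parent, _fk_index in problems:
    print(f"{table} row {rowid} refers to a missing {parent} row")
```

Row 10 passes whatever its text says. Row 11 is flagged because airport 99 doesn't exist, not because of "+TSRB". That is the distinction I mean: content errors slip through, broken links get caught before the live app sees them.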

You can put a CRC or whatever on a backup file, but it doesn’t help if something goes wrong with the live data.

In reality I doubt we’ll ever know what really happened. And it’s highly likely that it was actually human error and it’s just a lot easier to blame the software.

(A little anecdote… many years ago when I worked for a well-known large Internet company, one of our huge customers, a major telco, accidentally typed a one line command that fed all the Internet routes – ~500K of them at the time – into the internal routing database, which is designed to handle a few thousand at most. Not only did the network die, but it was a very complex process to bring it up again.)

LFMD, France

johnh wrote:

You can’t prevent file corruption, but you can check the file contents before you load into a mission critical app. For example, if it’s some kind of database, you can load it into a test app, and run consistency checks on the data. You only load it to the live app AFTER you have done all that. That was what I meant by a validity checker. It doesn’t ensure there won’t be any errors, but it does ensure that it won’t break the critical app completely.

You assume that this database is something which is prepared separately and then fed into the system. It is much more likely that it is the “live” database of NOTAMs which is continuously used and maintained by the NOTAM system.

ESKC (Uppsala/Sundbro), Sweden

Yes; each country maintains its own database of flight plans, NOTAMs, TAFs, METARs, etc. In Europe, Eurocontrol run one which is used (via B2B paid access) by a lot of people.

Administrator
Shoreham EGKA, United Kingdom

johnh wrote:

In reality I doubt we’ll ever know what really happened. And it’s highly likely that it was actually human error and it’s just a lot easier to blame the software.

I agree. I am in this business, and my team gets to write more than our share of RCAs, unfortunately. In the past, people in the org have liked to blame “human error”, but I argue that human error can never be a root cause.

Fly more.
LSGY, Switzerland