In more than two decades of working with industrial networks – as a software engineer building them, as a system engineer integrating new machines into production lines, and as a network engineer troubleshooting and fixing them – I am still surprised by how often asset owners do not have the faintest idea of what is going on in their networks. Very often everything seems to run fine… until some problem occurs that requires immediate intervention to prevent downtime or a faulty product batch. When I arrived at a site, my first question used to be: “Has something been changed in the network lately?” The answer was mostly: “Nothing that I know of!”
Troubleshooting networks is certainly not an engineer's favorite activity. It can be frustrating, and it is often impossible to find the supporting information needed for a successful analysis. Problems hide in the network and its devices and can only be identified through a lengthy review of thousands of host and network logs. In many cases, these problems do not seem to have a direct impact on the process and are therefore overlooked. Invisible from the outside, the network gets slower and slower and devices become overloaded, but as long as a reasonable amount of spare bandwidth and system resources is available, everything keeps working. That lasts until too many problems occur at the same time, or a problem becomes unsustainable for the network and its devices: connections suddenly drop, controllers can no longer communicate with their peers, and production lines shut down.
I recently visited a factory where a production line had run fine for more than seven years but, due to wear and tear on the wiring, then suffered an increasing number of unexpected production stops – first once a month, then once a week, and eventually every few days, compromising product deliveries. In another factory, a device on a production line had been reported absent by the PLC every five minutes for more than five years. Instead of investigating the issue, the operator preferred to acknowledge the alarm at every occurrence. I was called in to identify the root cause of this problem, only to find out that the device did not have its network cable inserted.
Monitoring network traffic would enable operators to identify many of these problems long before they affect production, and with minimal effort. This includes wiring and configuration faults, and sometimes even software malfunctions and bugs. I always advise customers to monitor their network as part of the factory acceptance test (FAT) or site acceptance test (SAT), to avoid signing off a new system without a guarantee that it is free of problems, including those that aren’t visible on the surface. Additionally, network monitoring provides a reference measurement, allowing asset owners to know what is “normal” on a network. This knowledge can be used at any time to answer the question: “Has something been changed in the network lately?”
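To make the idea of a reference measurement concrete, here is a minimal sketch of how a baseline could be compared with a later observation. It assumes that a monitoring tool has already reduced the captured traffic to a set of conversations (source, destination, protocol); the `Conversation` type, the `diff_against_baseline` function, and the addresses are hypothetical and only serve as an illustration, not as the way any particular product works.

```python
from typing import Iterable, NamedTuple


class Conversation(NamedTuple):
    """One observed communication relationship (hypothetical data model)."""
    src: str
    dst: str
    protocol: str


def diff_against_baseline(
    baseline: Iterable[Conversation],
    current: Iterable[Conversation],
) -> dict[str, list[Conversation]]:
    """Return conversations that appeared or disappeared since the baseline."""
    baseline_set, current_set = set(baseline), set(current)
    return {
        # Seen now but not during the reference measurement.
        "new": sorted(current_set - baseline_set),
        # Seen during the reference measurement but absent now.
        "missing": sorted(baseline_set - current_set),
    }


# Reference measurement, for example taken during the FAT or SAT (made-up values).
baseline = {
    Conversation("10.0.0.10", "10.0.0.21", "PROFINET"),
    Conversation("10.0.0.10", "10.0.0.22", "PROFINET"),
}

# Later observation: one device went silent, one unknown host appeared.
current = {
    Conversation("10.0.0.10", "10.0.0.21", "PROFINET"),
    Conversation("10.0.0.99", "10.0.0.10", "HTTP"),
}

changes = diff_against_baseline(baseline, current)
for kind, conversations in changes.items():
    for conv in conversations:
        print(f"{kind}: {conv.src} -> {conv.dst} ({conv.protocol})")
```

In this sketch, a device that stops communicating shows up under "missing" and an unexpected host under "new", which are exactly the kinds of changes that “Nothing that I know of!” tends to hide.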
Monitoring network traffic provides value to both engineers and asset owners: engineers get their job done more efficiently and spend less time investigating malfunctions, while asset owners see the risks and costs of unplanned downtime drastically reduced.