
Making sense of sounds: how AI can boost machine uptime

Anyone familiar with maintaining a machine knows how much the sounds and vibrations it makes reveal about its health. Proper machine health monitoring through sound and vibration can cut maintenance costs in half and double a machine's lifetime. Capturing and analysing live acoustic data is another important approach to condition-based monitoring, says Sebastien Christian of Analog Devices.

We can learn what the normal sound of a machine is. When the sound changes, we identify it as abnormal. Then we may learn what the problem is so that we can associate that sound with a specific issue. Identifying anomalies takes a few minutes of training, but connecting sounds, vibrations and their causes to perform diagnostics can take a lifetime.

Analog Devices set out to build a system able to learn sounds and vibrations from a machine and decipher their meaning, both to detect abnormal behaviour and to perform diagnostics. The result of this work, OtoSense, is a machine health monitoring system that enables what Analog Devices calls 'computer hearing': it allows a computer to make sense of the leading indicators of a machine's behaviour, sound and vibration.

To be robust, agnostic, and efficient, the OtoSense design philosophy followed some guiding principles:

  • Get inspiration from human neurology. Humans can learn and make sense of any sound they can hear in a very energy efficient manner.
  • Be able to learn stationary sounds as well as transient sounds. This requires adapted features and continuous monitoring.
  • Perform the recognition at the edge, close to the sensor. There should not be any need of a network connection to a remote server to make a decision.
  • Interaction with experts, and learning from them, must happen with minimal impact on their daily workload, and be as enjoyable as possible.

The process by which humans make sense of sounds can be described in four familiar steps: analogue acquisition of the sound, digital conversion, feature extraction, and interpretation.

Analogue acquisition and digitisation in OtoSense are performed by sensors, amplifiers, and codecs. For feature extraction, OtoSense uses a time window that Analog Devices calls a 'chunk', which moves with a fixed step size. OtoSense shows a graphical representation of all the sounds or vibrations heard, organised by similarity, letting experts organise and name the groupings seen on screen without having to artificially create bounded categories.
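The chunking step above can be sketched as a sliding window over the digitised signal. This is a minimal illustration, not the OtoSense implementation; the chunk size and step size are made-up example values.

```python
import numpy as np

def chunks(signal, chunk_size, step):
    """Yield fixed-size time windows ('chunks') that advance by a fixed step."""
    for start in range(0, len(signal) - chunk_size + 1, step):
        yield signal[start:start + chunk_size]

# Example: a 1000-sample sine wave, 256-sample chunks advancing 128 samples
signal = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1000))
windows = list(chunks(signal, chunk_size=256, step=128))
```

With a step smaller than the chunk size, consecutive windows overlap, so no transient falls between two windows.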

From sound and vibration to features

A feature is a single number describing a given attribute or quality of a sound or vibration over a period of time (the time window, or chunk, mentioned earlier). Some of the OtoSense platform's 2 to 1,024 features describe the time domain. They are extracted either directly from the waveform or from the evolution of any other feature over the chunk. These features include the average and maximal amplitude, complexity derived from the linear length of the waveform, amplitude variation, the existence and characterisation of impulses, stability as the resemblance between the first and last buffer, a skinny autocorrelation that avoids convolution, and variations of the main spectral peaks.
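A few of the time-domain features named above can be sketched as follows. This is an illustrative interpretation, not the OtoSense feature set: 'complexity' is taken here as the total path length of the waveform, and 'amplitude variation' as the standard deviation.

```python
import numpy as np

def time_domain_features(chunk):
    """Compute a few illustrative time-domain features over one chunk."""
    return {
        "avg_amplitude": np.mean(np.abs(chunk)),
        "max_amplitude": np.max(np.abs(chunk)),
        # 'Complexity' derived from the linear length of the waveform
        "complexity": np.sum(np.abs(np.diff(chunk))),
        # Amplitude variation across the chunk
        "amplitude_variation": np.std(chunk),
    }
```

Stacking such per-chunk dictionaries into vectors yields the feature space in which the later anomaly-detection steps operate.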

The features used on the frequency domain are extracted from an FFT. The FFT is computed on each buffer and yields 128 to 2,048 individual frequency contributions. The process then creates a vector with the desired number of dimensions – much smaller than the FFT size, of course, but one that still describes the environment extensively. OtoSense initially starts with an agnostic method that creates equal-sized buckets on the log spectrum. Then, depending on the environment and the events to be identified, these buckets adapt to focus on areas of the spectrum where information density is high, either from an unsupervised perspective that maximises entropy or from a semi-supervised perspective that uses labelled events as a guide. This mimics the architecture of our inner ear cells, which is denser where speech information is maximal.
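The agnostic starting point – equal-sized buckets on the log spectrum – can be sketched like this. The bucket count and sample rate are hypothetical example values, and the adaptive refinement described above is omitted.

```python
import numpy as np

def log_spectral_buckets(chunk, n_buckets=16, sample_rate=48000):
    """Reduce an FFT magnitude spectrum to n_buckets equal-width bands
    on a log-frequency axis."""
    spectrum = np.abs(np.fft.rfft(chunk))
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)
    # Skip the DC bin; build log-spaced bucket edges over the remaining band
    edges = np.logspace(np.log10(freqs[1]), np.log10(freqs[-1]), n_buckets + 1)
    idx = np.searchsorted(edges, freqs[1:], side="right") - 1
    idx = np.clip(idx, 0, n_buckets - 1)
    buckets = np.zeros(n_buckets)
    np.add.at(buckets, idx, spectrum[1:])  # sum magnitudes into their buckets
    return buckets
```

The result is a compact spectral descriptor whose length is independent of the FFT size, which keeps the edge device's feature vector small.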

Outlier detection and event recognition with OtoSense happen at the edge, without the participation of any remote asset. This architecture ensures that the system won’t be impacted by a network failure and it avoids having to send all raw data chunks out for analysis. An edge device running OtoSense is a self-contained system describing the behaviour of the machine it’s listening to in real time.

The OtoSense server, running the AI and HMI, is typically hosted on premises. A cloud architecture makes sense for aggregating multiple meaningful data streams as the output of OtoSense devices. It makes less sense to use cloud hosting for an AI dedicated to processing large amounts of data and interacting with hundreds of devices on a single site.

From features to anomaly detection

Normality/abnormality evaluation does not require much interaction with experts to be started. Experts only need to help establish a baseline for a machine’s normal sounds and vibrations. This baseline is then translated into an outlier model on the OtoSense server before being pushed to the device.

Two different strategies are used to evaluate the normality of an incoming sound or vibration. The first is called 'usualness': wherever a new incoming sound lands in the feature space, its surroundings are checked – how far it is from baseline points and clusters, and how big those clusters are. The bigger the distance and the smaller the clusters, the more unusual the new sound and the higher its outlier score. When this outlier score exceeds a threshold defined by experts, the corresponding chunk is labelled unusual and sent to the server, where it becomes available to experts.

The second strategy is very simple: any incoming chunk with a feature value above the maximum or below the minimum of all the features defining the baseline is labelled as extreme and sent to the server as well. The combination of the unusual and extreme strategies offers good coverage of abnormal sounds or vibrations, and these strategies perform well for detecting both progressive wear and sudden, unexpected events.

From features to event recognition

Features belong to the physical realm, while meaning belongs to human cognition. To associate features with meaning, interaction between OtoSense AI and human experts is needed. A lot of time has been spent following our customers’ feedback to develop a human-machine interface (HMI) that enables engineers to efficiently interact with OtoSense to design event recognition models. This HMI allows for exploring data, labelling it, creating outlier models and sound recognition models, and testing those models.

Any sound or vibration can be visualised, along with its context, in many different ways – for example, using Sound Widgets (also known as Swidgets). At any moment, an outlier model or an event recognition model can be created. Event recognition models are presented as a round confusion matrix that allows OtoSense users to explore confusions between events. Outliers can be explored and labelled through an interface that shows all the unusual and extreme sounds over time.

The objective of the OtoSense technology from Analog Devices is to make sound and vibration expertise available continuously, on any machine, with no need for a network connection to perform outlier detection and event recognition. This technology's growing use for machine health monitoring in aerospace, automotive, and industrial applications has shown good performance in situations that once required human expertise, as well as in embedded applications, especially on complex machines.
