Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability

Open access

Research article

First published online March 26, 2026

RAPTOR: Resilience-aware prediction and tracking of operational risks from alarm information flows

Anandarup Mukherjee https://orcid.org/0000-0002-3165-1151, Aitichya Chandra, […], Manuel Herrera, Luning Li, Hanu Priya Indiran, Arjun Parekh, and Ajith Kumar Parlikad

OnlineFirst

https://doi.org/10.1177/1748006X261431895

Abstract

Telecommunication base station monitoring systems trigger alarms in response to network events, degradations, and recoveries. These alarm sequences are crucial for predictive analytics and for quantifying resilience at base stations and across the entire network. Alarms often hide structured information flows that can be revealed by exploring their causality, thereby revealing the dynamism and interdependencies among the infrastructure’s network components. The RAPTOR framework proposed in this work uses temporal clustering, first-order Markov modelling, and resilience metrics to analyse alarm logs and predict alarm behaviour in Radio Access Network (RAN) base stations. Using Dynamic Time Warping (DTW) to identify co-activated alarm clusters and Markov transition matrices to estimate alarm propagation, the proposed framework supports accurate modelling of information flow, next-alarm prediction, and interpretable resilience scores based on alarm entropy, self-loop tendency, and absorption probability. This framework is developed and validated on real-world alarm logs from numerous base stations of a major telecommunication service provider in the United Kingdom. Preliminary evaluation results indicate this approach is a good resource for network managers in infrastructure planning and maintenance scheduling, utilising features such as early warning, causality tracing, root-cause insights, and alarm interpretability.

Introduction

Alarm sequences in telecommunication network infrastructure are critical information sources that can be used to identify performance degradation, faults, and recovery events across the network as well as through their distributed component structure. Radio Access Network (RAN) base stations and other infrastructure units emit these alarms, which are crucial for operational monitoring. However, in most cases they are treated in isolation or with limited temporal context. This siloed handling constrains the predictive and diagnostic utility of alarm data, particularly in high-density or fault-prone environments. Recent works in time series analysis, pattern mining,¹ and probabilistic modelling² offer promising opportunities to transform these alarm logs into interpretable and actionable insights.³ However, most existing approaches rely on frequent pattern extraction, sequence classification, or anomaly detection without integrating prediction and causality inference.⁴ The quantification of alarm resilience is seldom addressed, and even when it is, it is not done within a unified framework.

This work introduces RAPTOR (Resilience-Aware Prediction and Tracking of Operational Risks from Alarm Information Flows), a data-driven pipeline that models alarm sequences generated from RAN base stations across telecommunication networks. By combining temporal clustering (via Dynamic Time Warping), Markov transition modelling, and resilience inference, the proposed framework uncovers the temporal and causal structure among alarms. Further, based on the identified causality, the framework outlines a method for forecasting future alarm events using interpretable probabilistic models and quantifies the resilience or fragility of specific alarms based on their propagation and resolution behaviours. This work hypothesises that it is possible to stratify patterns in alarm co-activation, transition complexity, and risk potential in each base station. This is done by analysing alarm dynamics across base stations with varying levels of alarm activities.

The findings demonstrate that when temporal behaviours of the alarms in each base station are encoded, even lightweight statistical models can yield accurate, explainable predictions and serve as a foundation for network resilience engineering. The core issue with telecommunication system alarms is that, despite abundant alarm data, it is challenging to extract actionable insights that quantify alarm behaviour and capture alarm interactions. Temporal irregularity in alarm occurrence and misalignment in the alarm logs are significant obstacles: several different issues can trigger the same alarm, and alarms occur mostly unpredictably, at uneven time intervals. A robust alignment technique is therefore often required to compare and interpret alarm patterns effectively, as fixed-window or synchronous models fail to handle this variability.

Further, because alarm logs lack explicit causal labels, many alarms may appear as infrequent, isolated events, requiring approaches to discover the hidden causality. Sparse alarm transitions also reduce the discoverability of alarm causality sequences and their meaningful interpretation, which limits the use of approaches such as supervised learning due to minimal ground truth information. Approaches such as those using deep learning⁴ operate as black boxes. The factors of alarm transitions, causality, persistence, escalation, and recovery potential are mostly missing from these approaches, thereby limiting operational understanding and decision-making across the network infrastructure. The complexity of network infrastructure and operational methods at the base station level gives rise to many distinct alarm types, with significant variations in alarm activation patterns across base stations. This high dimensionality and heterogeneity across base stations require sensitive and contextual models that can adapt to localised behaviour and operating conditions.

Considering the inherent challenges of real-world alarm logs, the RAPTOR framework proposes a resilience-aware prediction and tracking of operational risks from information flows in the form of alarm transitions, reusable across heterogeneous base stations. The following three research questions form the underlying core of this framework:

RQ1 Temporal Causality: How can temporal similarity and sequential alignment techniques be used to uncover latent causal structures in alarm sequences across heterogeneous telecommunication base stations?

RQ2 Predictive Modelling: To what extent can first-order Markov models accurately forecast the next alarm events in time-ordered sequences, and how does this vary across base stations with differing activity levels?

RQ3 Resilience Quantification: How can the resilience of individual alarms be quantified using transition-based metrics, and what insights do these measures provide about fault persistence, escalation risk, and system recovery?

Considering these three research questions, RAPTOR proposes a unified and interpretable framework for analysing, forecasting, and interpreting alarm dynamics in telecommunication base stations. Unlike prior approaches that focus narrowly on correlation mining,^5,6 or black-box prediction,³ the proposed method integrates temporal alignment, probabilistic modelling, and resilience assessment into a coherent pipeline. This approach can help network managers and maintenance engineers schedule and plan maintenance of critical telecommunications infrastructure by providing early warning, causality tracing, root-cause insights, and alarm interpretability.

Related work

Telecommunication base station alarms have always served as critical metrics for service disruption diagnosis and management. Several efforts have been made to understand the behaviour of the alarm systems through the analysis of causal and temporal dependencies across the network. Early efforts to study telecom alarm management relied on expert knowledge and rule-based methods to identify the correlation between alarms and their possible causes. Brugnoni et al.⁷ proposed a real-time fault diagnosis system for the Italian telecom network using a heuristic framework combining alarm pattern identification, fault hypotheses selection, and investigatory explanation. Jakobson and Weissman⁵ presented the idea of alarm correlation and defined fault propagation as an acyclic graph in which the edges represent causally related alarms. These early works demonstrated the utility of alarm grouping and filtering based on known dependency rules. However, these efforts were limited by the challenges of human knowledge acquisition and by increasing network complexity and size.

Subsequent efforts include the study conducted by Bouloutas et al.,⁸ which used protocol models to explore and assess alarm faults. The study assumed predefined network models and did not consider data-driven insights from alarm logs, limiting generalisability. Subsequent studies attempted to explore the application of data-mining approaches for analysing recurrent temporal and correlation patterns among the alarms. A notable method is the Telecommunication Network Alarm Sequence Analyser (TASA), proposed by Klemettinen et al.⁹ TASA treats alarm logs as event sequences and uses episode mining methods to identify recurring alarm sequences.

Another branch of work has focussed on measuring similarity and building networks of alarms based on historical co-occurrence patterns. Lin et al.¹⁰ argue that directly computing pairwise similarity (e.g. via Euclidean distance or Dynamic Time Warping) on alarm time series can be misleading, because alarms may have complex positive or negative correlations. They propose a shuffling-based similarity index that measures the likelihood of the observed overlap of two alarm sequences compared to random chance, which is then used to construct device-to-device alarm correlation networks for the functional grouping of network elements that often experience concurrent alarms. Fournier-Viger et al.⁶ model the telecom network as a dynamic graph of nodes (network elements) and links, and then extract alarm correlation rules that describe how alarms spread through that graph.

Beyond descriptive correlation, statistical and machine learning models have been used to predict impending alarms or faults based on observed alarm sequences. Salfner et al.¹¹ developed a Semi-Markov “Similar Events Prediction” model that treats each alarm type as a state and uses sojourn times to estimate the probability of a failure within a specified window; it proved effective in live telecom systems but suffered from state-space explosion as network size grew. Salaun et al.¹² introduced the DIG-DAG structure to compactly encode all possible alarm chains in a log, and a querying mechanism to extract predictive patterns from this structure. Building on this, Desbouvries et al.¹³ analytically compared recurrent neural networks (RNNs) with Hidden Markov Models (HMMs) for modelling alarm time series. While HMMs rely on a fixed number of states, RNNs (especially LSTM networks) can, in principle, capture longer and more complex sequences of alarm dependencies. Li et al.³ present a data-driven alarm prognosis model for cellular base stations that uses an ensemble of deep learning classifiers to tackle the heterogeneity and class imbalance in alarm data by extracting a rich set of features (e.g. alarm counts, durations, timestamps) and training multiple learners whose outputs are combined.

Telecommunication networks are also characterised by alarm storms, in which a single fault gives rise to compounding alarm activity. Alarm storms complicate network operations and create a need to quantify alarm resilience. In essence, it becomes important to understand the propensity of an alarm to escalate into other alarms or to self-resolve. Recent work has explored this avenue by modelling alarm escalation patterns, recurrence, and absorption behaviours. Abele et al.¹⁴ presented the idea of Root Cause Alarms as the instigators of an alarm storm and suggested the early detection of these alarms to reduce cascading effects. Li et al.¹⁵ developed an unsupervised association mining approach to quantify each alarm’s propagation tendency in a telecom network. The approach first clusters alarms, discards duplicates, and then identifies the root causes with an accuracy of 91%. Wang et al.¹⁶ proposed the Alarm Behaviour Analysis and Discovery (AABD) framework to capture the flapping and parent-child dependencies across a 2G–4G network. The dependencies are used to derive per-type flapping and escalation metrics. Zhao et al.¹⁷ developed a time-decay factor to identify recent alarms influencing resilience. Despite these advancements, most studies focussing on resilience assessments remain retrospective and are primarily coupled with real-time alarm prediction. The existing approaches do not sufficiently capture the critical factors for proactive risk mitigation, that is, the dynamics between escalation, suppression, and retriggering of alarms.

While prior work has made significant advances in temporal pattern mining, event prediction, and resilience evaluation, these areas have evolved mainly in parallel. Existing systems typically lack a unifying framework that connects prediction, interpretation, and resilience quantification in a data-driven yet operationally meaningful way. RAPTOR addresses this prevailing gap by unifying three capabilities that existing alarm-analysis tools treat in isolation: it captures the temporal causality that underlies alarm cascades, generates resilience-aware forecasts of forthcoming alarms, and quantifies alarm behaviour through metrics such as absorption time, recurrence propensity, and escalation likelihood.

Methodological framework

The RAPTOR framework evolved from an iterative, data-driven research process grounded in the previous “Alarm Webs” framework.¹ Alarm Webs demonstrated that it is possible to uncover co-activation patterns across a RAN base station’s temporally aligned alarm sequences. However, when large-scale RAN data involves multiple base stations, three limitations become apparent: (a) an absence of predictive capabilities, (b) a lack of probabilistic interpretation of alarm transitions, and (c) an inability to quantify alarm stability or fragility. RAPTOR addresses these gaps through successive design cycles using real operational RAN base-station alarm data, and has evolved into a unified pipeline for causality discovery, forecasting, and resilience assessment of RAN alarms. Figure 1 outlines the proposed RAPTOR framework. This framework is designed to interpret alarm behaviour across base stations by integrating temporal clustering, Markov modelling, and resilience analytics of the alarms. The raw alarm logs from RAN base stations are modelled as temporally structured sequences, which are then used to identify co-activated alarm patterns, model probabilistic transitions, and quantify network resilience to enable prediction, interpretability, and early detection of cascading risks in telecommunication networks from alarm logs alone.

Figure 1. Conceptual overview of the RAPTOR framework for predictive maintenance and risk-aware decision-making.

The alarm log dataset from a leading UK telecommunications network operator, spanning 2019–2021, was used for this framework. Each alarm has 13 attributes that capture the severity, type and base-station ID, location, timestamps for occurrence/clearance/acknowledgement, cleared and acknowledged status flags, log and equipment serials, and the base station’s maintenance status. These fields support temporal sequencing, spatial analysis, and station-level context across approximately 12,000 base stations. RAPTOR uses this comprehensive schema to model the alarm propagation within RANs (and not between RANs), and extends this to next-alarm prediction, and resilience assessment. Figure 2 illustrates the distribution of alarm occurrences across the dataset.

Figure 2. Distribution of alarm occurrences across base stations.

Data processing and encoding

In the data logs for the RAN base station alarms, each base station generates a continuous, time-ordered sequence of alarm events. To ensure the reliability of temporal event ordering, records with missing information are removed, and timestamp values are converted to a standard format. The alarms are treated as a sequence of transitions from one alarm to the next by associating each alarm entry with the time of occurrence and the base station ID where it was triggered. Unique numerical identifiers are assigned to each alarm type to enable consistent referencing across the dataset. These sequences also include a “No Alarm” state denoted by ϕ in this paper. This state is treated as a first-class alarm type and is not synthetically introduced. A “No Alarm” label is added only when an alarm represents the final event in a sequence (which is analogous to a leaf node in a causal graph representing the temporal chain of alarm-triggering events). A transition $a_{i} \to ϕ$ represents the empirical resolution of an alarm into a stable state.

For each base station, a station-specific analysis is performed using the ordered sequence of alarms extracted. These sequences are then used to create a temporal activation profile for every alarm type (within each base station), capturing the time intervals between consecutive activations (in hours). For each alarm type within a base station, a cumulative temporal activation profile is constructed. This temporal activation profile encodes the elapsed time since first activation (i.e. $\left\{ \frac{t_{i} - t_{1}}{3600} \right\}_{i = 2}^{n}$) and is used as input to DTW. This represents the global temporal evolution of the alarm’s activity, rather than capturing only its local inter-arrival gaps. The complete set of these alarm profiles (a valid alarm profile is one where each alarm must appear at least twice) constitutes the alarm activity dictionary for each base station. Algorithm 1 summarises the process.

Algorithm 1 Construct temporal profiles
function BUILDPROFILES(S) $▹$ Alarm sequence for one base station
    Initialise dictionary $T$
    for each alarm a in S do
        Extract ordered timestamps $[t_{1}, t_{2}, \dots, t_{n}]$ for a
        if $n > 1$ then
            $T_{a} \leftarrow \left\{ \frac{t_{i} - t_{1}}{3600} : i = 2, \dots, n \right\}$
            $T[a] \leftarrow T_{a}$
        end if
    end for
    return $T$ $▹$ Dictionary of activation profiles
end function
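
As an illustration only, Algorithm 1 could be sketched in Python as follows. The input shape (a list of `(alarm_type, timestamp)` pairs for a single base station), the function name `build_profiles`, and the alarm names are assumptions made for this example; they are not taken from the dataset schema.

```python
from collections import defaultdict
from datetime import datetime


def build_profiles(events):
    """Build cumulative activation profiles (hours since first activation).

    `events` is an iterable of (alarm_type, timestamp) pairs for ONE base
    station; this input shape is an assumption made for illustration.
    """
    stamps = defaultdict(list)
    for alarm, ts in events:
        stamps[alarm].append(ts)

    profiles = {}
    for alarm, times in stamps.items():
        times.sort()
        if len(times) > 1:  # a valid profile needs at least two activations
            t1 = times[0]
            profiles[alarm] = [
                (t - t1).total_seconds() / 3600.0 for t in times[1:]
            ]
    return profiles


# Hypothetical alarm names and timestamps.
events = [
    ("LINK_DOWN", datetime(2020, 1, 1, 0, 0)),
    ("CPU_HIGH", datetime(2020, 1, 1, 1, 0)),
    ("LINK_DOWN", datetime(2020, 1, 1, 3, 0)),
    ("LINK_DOWN", datetime(2020, 1, 1, 9, 30)),
]
profiles = build_profiles(events)
print(profiles)  # {'LINK_DOWN': [3.0, 9.5]}
```

The alarm that fires only once is dropped, mirroring the $n > 1$ guard in the algorithm.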

Similarity estimation and temporal clustering

The quantification of similarity between each base station’s temporal alarm activation profiles is implemented using Dynamic Time Warping (DTW). This results in a symmetric matrix containing pairwise distances representing the temporal relationships among all valid alarms (a DTW matrix is illustrated as a heatmap in Figure 3). This matrix provides a basis for clustering alarms by considering the comparable activation behaviour of temporal sequences. Agglomerative hierarchical clustering is applied on the condensed DTW distance matrix to merge alarms with minimal temporal dissimilarity, thereby creating an initial behavioural profile of the alarms. The resulting dendrogram, as shown in Figure 4, denotes functional or causal proximity of the alarms after grouping these alarms into clusters. The choice of DTW allows for the analysis of sequences with varying lengths and non-linear time shifts, unlike rigid metrics such as Euclidean or Manhattan distance.^18,19 Moreover, compared to other plausible approaches in similar problem domains, such as Longest Common Subsequence (LCSS), Smith-Waterman (SW), and Needleman-Wunsch (NW) algorithms, DTW offers better trade-offs between computational efficiency and temporal structure-preserving accuracy.^18,20

Figure 3. DTW distance matrix for the medium-activity base station.

Figure 4. Hierarchical clustering on DTW matrix for the medium-activity base station (BS-M1).
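
The DTW and clustering steps described above can be sketched in plain Python, as a minimal illustration under stated assumptions. The local cost (absolute difference), the average-linkage rule, the cut `threshold`, and the toy profiles are all assumptions of this example; the text does not specify which linkage or cut-off was used.

```python
def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) DTW with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def dtw_matrix(profiles):
    """Symmetric pairwise DTW distances between alarm profiles."""
    names = sorted(profiles)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[a][b] = dist[b][a] = dtw_distance(profiles[a], profiles[b])
    return names, dist


def agglomerate(names, dist, threshold):
    """Average-linkage agglomerative clustering, stopped at `threshold`."""
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        return sum(dist[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        best, i, j = min(pairs)
        if best > threshold:  # closest pair is too dissimilar: stop merging
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical activation profiles (hours since first activation).
profiles = {
    "A": [1.0, 2.0, 3.0],
    "B": [1.0, 2.0, 2.5, 3.0],
    "C": [40.0, 80.0],
}
names, dist = dtw_matrix(profiles)
clusters = agglomerate(names, dist, threshold=5.0)
print(dist["A"]["B"])  # 0.5
print(clusters)        # [['A', 'B'], ['C']]
```

Profiles A and B differ in length yet align closely, which is the property that motivates DTW here. For larger alarm sets, a library route such as `scipy.cluster.hierarchy.linkage` applied to the condensed form of the matrix (via `scipy.spatial.distance.squareform`) would be the more usual choice.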

Markov chain modelling of alarm transitions

The probability that an alarm transitions to the next alarm based on its current state is modelled using a first-order Markov chain. The first-order Markov chain modelling allows for an efficient and interpretable framework for predictions. Markov chain models have been widely accepted to provide a robust modelling of system degradation and fault evolution, even from short alarm sub-sequences.^21–23 Moreover, empirical evidence suggests that Markov chains can outperform higher-order models on systems with repeated behaviours.^24,25 It also enables smoother classification decisions, thereby reducing false positives.²⁶ Algorithm 2 outlines this process. For each base station, the ordered sequence of alarms S (from Algorithm 1) is used to record every observed transition between consecutive alarms. A count matrix keeps track of these transitions and records how often one alarm is followed by another. The raw counts are converted to transition probabilities by normalising each row of the count matrix. The resulting probabilistic transition matrix represents the likelihood of each possible next alarm and forms the basis of the Markov model.

Algorithm 2 Build Markov transition matrix
function BUILDMARKOV(S) $▹$ Alarm sequence
    Initialise count matrix M
    for $i = 1$ to $|S| - 1$ do
        Let $(a, b) \leftarrow (S[i], S[i + 1])$
        $M[a, b] \leftarrow M[a, b] + 1$
    end for
    Normalise each row of M to obtain transition matrix P
    return P
end function
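
A minimal Python sketch of Algorithm 2 is given below for illustration. It stores the transition matrix as nested dictionaries rather than a dense array, which is a design choice of this example; the function name `build_markov` and the toy sequence (including the `"NO_ALARM"` label standing in for ϕ) are likewise assumptions.

```python
from collections import Counter, defaultdict


def build_markov(sequence):
    """Row-normalised first-order transition probabilities as nested dicts."""
    counts = defaultdict(Counter)
    # Count every observed transition between consecutive alarms.
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1

    transition = {}
    for current, row in counts.items():
        total = sum(row.values())
        transition[current] = {nxt: c / total for nxt, c in row.items()}
    return transition


# Hypothetical alarm sequence for one base station.
seq = ["A", "B", "A", "B", "A", "C", "NO_ALARM"]
P = build_markov(seq)
print(P["B"])  # {'A': 1.0}
print(P["A"])  # B with probability 2/3, C with probability 1/3
```

States that never appear as the source of a transition (here the terminal `"NO_ALARM"`) simply have no row in this representation.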

Forecasting and evaluation

This section outlines how the forecasting of the alarm evolution and causality is handled. Based on the Markov transition matrix, the most likely alarms to occur next are predicted and arranged in descending order of their probability. The top-N alarms with the highest probabilities of occurrence, given that an alarm has already occurred, are selected as the model’s forecast for that alarm in a base station. The model’s prediction accuracy is evaluated by comparing predictions to the actual $T - 1$ alarm transitions in each sequence of alarms.

Definition 1. Top-1 Accuracy. Measures how often the most probable next alarm predicted by the model matches the actual next alarm observed in the sequence. It is computed as:

\text{Top-1} = \frac{1}{T - 1} \sum_{i = 1}^{T - 1} \begin{cases} 1, & \text{if the prediction is correct}, \\ 0, & \text{otherwise}. \end{cases}

(1)

Definition 2. Top-N Accuracy. Measures whether the actual next alarm appears among the N most probable alarms predicted for the current state. It is given by:

\text{Top-}N = \frac{1}{T - 1} \sum_{i = 1}^{T - 1} \begin{cases} 1, & \text{if the actual alarm is in the top } N \text{ predictions}, \\ 0, & \text{otherwise}. \end{cases}

(2)

To evaluate the model’s predictive performance, two accuracy metrics are defined: Top-1 and Top-N accuracy. The Top-1 accuracy measures how often the model’s most probable prediction matches the actual next alarm (Definition 1). The Top-N accuracy extends this metric by treating the model’s prediction as correct if the actual next alarm is within the top N most probable subsequent alarms (Definition 2). Both metrics are averaged over all the transitions to provide an overall accuracy score for each base station. To assess generalisation and to graphically represent the causality, this analysis focuses on the $K$ base stations with median alarm activity, comparing their transition matrices and predictive accuracies. The same analysis can be applied just as easily to high- or low-activity base stations. This cross-station evaluation is essential for determining whether the learned Markov structure consistently captures alarm progression patterns across different network contexts.
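
Definitions 1 and 2 can be illustrated with a short Python sketch. The transition matrix `P` and the evaluation sequence below are hypothetical values chosen for the example, and the function name `top_n_accuracy` is an assumption; how the data are split between fitting and evaluation is not specified here and is left open.

```python
def top_n_accuracy(transition, sequence, n=1):
    """Fraction of the T-1 observed transitions whose actual next alarm is
    among the n most probable successors of the current alarm."""
    hits, total = 0, 0
    for current, actual in zip(sequence, sequence[1:]):
        row = transition.get(current, {})  # unseen state: no prediction
        ranked = sorted(row, key=row.get, reverse=True)[:n]
        hits += actual in ranked
        total += 1
    return hits / total if total else 0.0


# Hypothetical transition matrix (nested dicts: current -> next -> prob).
P = {
    "A": {"B": 0.6, "C": 0.3, "D": 0.1},
    "B": {"A": 0.7, "C": 0.3},
    "C": {"A": 1.0},
}
observed = ["A", "B", "C", "A", "C"]

top1 = top_n_accuracy(P, observed, n=1)
top2 = top_n_accuracy(P, observed, n=2)
print(top1)  # 0.5
print(top2)  # 1.0
```

In this toy case, two of the four transitions are missed by the single most probable prediction but recovered once the second-ranked candidate is admitted, which is the effect Top-N accuracy is meant to capture.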

Alarm-level resilience analysis

In the context of alarm networks (be it within a base station or across interconnected systems), resilience refers to an alarm’s ability to withstand, contain, or recover from cascading fault propagation. A resilient alarm is unlikely to trigger secondary alarms; even when it is triggered, it does not persist for long and tends to return to a stable “No Alarm” state. In contrast, an alarm with low resilience tends to propagate faults, recur frequently, or remain unresolved for extended durations. Such low-resilience alarms increase the fragility of the base station and, by extension, of the wider network. Propositions 1–3 provide three metrics for evaluating the resilience characteristics of alarms by exploiting the transition probability matrix.

Proposition 1. Cascade Entropy. It represents the uncertainty or spread in how an alarm can evolve into other alarms (measures the diversity of transitions originating from a given alarm):

H (a_{i}) = - \sum_{j : P_{ij} > 0} P_{ij} \log P_{ij}

(3)

A higher entropy value indicates broader propagation potential and a lower ability to contain cascading behaviour.

Proposition 2. Self-loop Probability. Quantifies the likelihood of an alarm repeating itself without resolving or transitioning to another state:

ρ (a_{i}) = P_{ii}

(4)

A higher value of $ρ (a_{i})$ implies persistence or unresolved fault conditions, thereby indicating reduced resilience.

Proposition 3. Absorption Probability. Measures the likelihood that an alarm transitions into a “No Alarm” state, representing recovery or containment (a higher absorption probability corresponds to faster resolution and stronger resilience):

α (a_{i}) = P_{i ϕ}

(5)

To combine these measures, a composite resilience score is defined as a weighted aggregation of the min-max normalised values of the three individual metrics. It can be represented as:

R (a_{i}) = w_{1} (1 - \hat{H} (a_{i})) + w_{2} (1 - \hat{ρ} (a_{i})) + w_{3} \hat{α} (a_{i})

(6)

where $\hat{H} (a_{i})$, $\hat{ρ} (a_{i})$, and $\hat{α} (a_{i})$ are the normalised cascade entropy, self-loop probability, and absorption probability for alarm $a_{i}$, computed across all alarms in the set. A higher $R (a_{i})$ indicates greater resilience (i.e. the alarm is less likely to propagate, less prone to repetition, and more likely to resolve). The weights $w_{1}$, $w_{2}$, and $w_{3}$ specify the relative importance of the three metrics and can be tuned according to operational priorities. In this work, the weights are set to $w_{1} = 0.4$, $w_{2} = 0.3$, and $w_{3} = 0.3$ to prioritise alarms that demonstrate containment and recovery over those that propagate. These weights can be experimentally tuned based on expert feedback and requirements analysis with network infrastructure stakeholders (managers and maintenance engineers).
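
The three metrics and the composite score in equation (6) could be computed along the following lines. This is a sketch under assumptions: the natural logarithm is used for the entropy (the base is not stated), a metric that is constant across alarms is normalised to 0.0, and the `"NO_ALARM"` label and toy transition matrix are hypothetical.

```python
import math


def resilience_scores(transition, no_alarm="NO_ALARM", weights=(0.4, 0.3, 0.3)):
    """Composite resilience R(a_i) from a nested-dict transition matrix."""
    entropy, self_loop, absorb = {}, {}, {}
    for a, row in transition.items():
        # Cascade entropy (natural log assumed), self-loop and absorption.
        entropy[a] = -sum(p * math.log(p) for p in row.values() if p > 0)
        self_loop[a] = row.get(a, 0.0)
        absorb[a] = row.get(no_alarm, 0.0)

    def min_max(values):
        lo, hi = min(values.values()), max(values.values())
        span = hi - lo
        # Assumption: a constant metric carries no information, so map to 0.
        return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}

    h, r, al = min_max(entropy), min_max(self_loop), min_max(absorb)
    w1, w2, w3 = weights
    return {a: w1 * (1 - h[a]) + w2 * (1 - r[a]) + w3 * al[a]
            for a in transition}


# Hypothetical two-alarm example: A tends to repeat or escalate,
# B always resolves into the "No Alarm" state.
P = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"NO_ALARM": 1.0},
}
scores = resilience_scores(P)
print(scores)  # A scores 0.0, B scores approximately 1.0
```

Because the normalisation is taken across the alarms of one station, the scores are relative: they rank alarms within that station rather than providing an absolute measure comparable across stations.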

Results and discussions

Experimental setup

The RAN dataset contains a large number of base stations. For brevity, the RAPTOR framework is applied to a selected set of base stations that represent the diversity of alarm behaviours across the network. First, the total number of alarms reported in each base station is calculated to obtain the overall distribution of alarm activity across stations. The distribution is shown in Figure 2. Next, three categories of base stations are selected from the distribution. These categories are: (i) high-activity stations, representing the top decile of alarm activity; (ii) median-activity stations, representing the middle quantiles of alarm activity; and (iii) low-activity stations, representing the lower second decile of alarm activity. Then, from each category, one representative station is selected to ensure balanced coverage of the diverse operational conditions. Note that due to the word limit, most of the visualisation plots and related discussions focus mainly on the median-activity base station since it provides a balanced reflection of typical alarm dynamics.

Temporal and transition dynamics in alarm behaviour

The medium-activity base station selected was BS-M1. It recorded 53 alarm occurrences that can be grouped into 16 unique types. Figure 3 presents the DTW distance matrix for BS-M1. The figure indicates localised clusters of low-distance cells that signify co-activation within related subsystems. Conversely, cells with higher distances signify independent or sporadic events. The matrix structure reflects a mixed alarm activity where periods of coordinated subsystem activity are interspersed with isolated faults. The agglomerative hierarchical clustering of alarm profiles for BS-M1 is developed using the DTW matrix and is illustrated in Figure 4.

The figure shows the emergence of compact clusters, indicating that several alarms share close temporal signatures, possibly resulting from common physical or logical origins. Moreover, the shallow hierarchy and limited branching indicate partial periodicity and controlled coupling. These observations are consistent with a stable and moderately loaded base station. This station-level view of the co-activation patterns aligns with the recent efforts to analyse anomaly propagation using spatio-temporal graph-based architectures. The key difference is that DTW remains a foundational approach that focuses on temporal proximity and shape similarity as compared to existing efforts that encode cross-cell dependencies.⁴ Figure 5 provides the Markov transition heatmap, representing the probability of one alarm leading to another. The figure reveals distinct transition hotspots, indicating semi-deterministic chains linking hardware and link-level faults to higher-layer service issues. The heatmap shows a higher overall sparsity, suggesting decent fault containment.

Figure 5. Markov chain transition probabilities for the medium-activity base station (BS-M1).

Prediction dynamics, resilience, and accuracy evaluation

Figure 6 provides the next-alarm graph by translating probabilistic transitions into interpretable directed edges. The edges trace how local disturbances evolve across the alarm network. Alarms are represented as nodes, and the edges are annotated with probability (P) and estimated time to activation (ETA). For each base station, the elapsed time $Δ t = t_{i + 1} - t_{i}$ between consecutive alarm events in the alarm log is calculated. For a transition $a_{i} \to a_{j}$, the ETA is the empirical median of all observed $Δ t$ values for that transition and is reported together with its interquartile range (IQR). If the minimum number of samples is not available for a specific pair, the estimate falls back on the marginal distribution of $Δ t$ following $a_{i}$, and ultimately on a global baseline. The overall network reflects a modular and semi-linear topology as distinct subsystems form interpretable alarm chains. For example, the transitions from link faults are mainly directed towards interface or signalling errors. Interestingly, central alarms like BBU CPRI Interface Error and eNodeB S1 Transmission Interruption act as bridging nodes that connect hardware-level events to higher-layer service degradations. Notably, most edges have moderate probabilities and ETAs, suggesting recoverable faults and bounded propagation chains. The figure validates the suitability of the RAPTOR framework in converting alarm sequences into causal networks that can be used to uncover propagation tendencies and recovery paths. RAPTOR demonstrates a lightweight, transparent, and traceable mechanism for alarm propagation modelling. In contrast, recent approaches tend to lean on multi-layer neural architectures integrating deep ensembles and meta-learners, which exhibit higher adaptability and accuracy but limited traceability for operational contexts.⁴
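
The ETA estimate with its fallback chain can be sketched as follows. The minimum sample count (`min_samples=3`), the use of timestamps already expressed in hours, the quartile method (the default of `statistics.quantiles`), and the toy event log are all assumptions of this example, since the text does not state them.

```python
from collections import defaultdict
from statistics import median, quantiles


def eta_table(events, min_samples=3):
    """events: time-ordered list of (alarm_type, timestamp_in_hours).

    Returns {(a_i, a_j): (median_dt, iqr_dt, level)} where `level` records
    which distribution was used: "pair", "source" or "global".
    """
    by_pair, by_source, all_gaps = defaultdict(list), defaultdict(list), []
    for (a, t0), (b, t1) in zip(events, events[1:]):
        gap = t1 - t0
        by_pair[(a, b)].append(gap)
        by_source[a].append(gap)
        all_gaps.append(gap)

    def summarise(gaps):
        if len(gaps) < 2:  # quantiles() needs at least two samples
            return median(gaps), 0.0
        q1, _, q3 = quantiles(gaps, n=4)
        return median(gaps), q3 - q1

    table = {}
    for pair, gaps in by_pair.items():
        if len(gaps) >= min_samples:
            level, sample = "pair", gaps
        elif len(by_source[pair[0]]) >= min_samples:
            level, sample = "source", by_source[pair[0]]  # marginal after a_i
        else:
            level, sample = "global", all_gaps            # global baseline
        table[pair] = (*summarise(sample), level)
    return table


# Hypothetical event log for one base station (timestamps in hours).
events = [("A", 0.0), ("B", 1.0), ("A", 2.0), ("B", 4.0),
          ("A", 5.0), ("B", 8.0), ("C", 9.0)]
table = eta_table(events)
print(table[("A", "B")])  # median 2.0 h, estimated from the pair itself
print(table[("B", "C")])  # median 1.0 h, fallback to gaps following B
```

Recording the fallback level alongside each estimate keeps the edge annotations traceable: an ETA derived from the global baseline can then be flagged as less specific than one estimated from the pair itself.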

Figure 6. Predicted next-alarm graph for the medium-activity base station (BS-M1).

Figure 7 summarises the overall resilience score for the unique alarms in the BS-M1 base station. Figure 8 overlays the resilience characteristics onto the next-alarm graph. The colour scale represents resilience scores, with higher values indicating self-resolving alarms and lower scores suggesting persistent or cascading behaviours. Both figures suggest that BS-M1 exhibits a favourable resilience profile: most alarms have a resilience score above 0.7, although specific alarms act as low-resilience hubs characterised by multiple transitions. In essence, the overall stability is high, but a small subset of alarms dictates the system vulnerability. Notably, the resilience analysis, comprising the score design and the resilience-aware alarm transition graph, captures the vulnerabilities in an observable and operationally meaningful way and remains a unique contribution of this paper.

Figure 7. Alarm resilience scores for the medium-activity base station (BS-M1).

Figure 8. Resilience-aware alarm transition graph for the medium-activity base station (BS-M1). Node colour represents resilience score; edge labels show transition probability ($P$) and estimated time to activation (ETA).
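
Two of the quantities that feed the resilience score, alarm entropy and self-loop tendency, can be read directly from a row-stochastic transition matrix. The sketch below computes only these two ingredients. It does not reproduce the paper's score: the way the ingredients are weighted and combined, and the absorption-probability term, are defined elsewhere in the paper and are deliberately left out. The function names and the normalisation by $\log_2 n$ are assumptions made for illustration.

```python
import math


def row_entropy(row):
    """Shannon entropy (in bits) of one row of a transition matrix."""
    return -sum(p * math.log2(p) for p in row if p > 0)


def resilience_ingredients(P, names):
    """P: row-stochastic matrix as a list of lists; names: alarm labels per row.

    Returns, for each alarm, its outgoing-transition entropy scaled to [0, 1]
    and its self-loop probability P[i][i]. These are ingredients only, not
    the combined resilience score.
    """
    n = len(P)
    max_entropy = math.log2(n) if n > 1 else 1.0  # assumed normalisation
    return {
        name: {
            "normalised_entropy": row_entropy(P[i]) / max_entropy,
            "self_loop": P[i][i],
        }
        for i, name in enumerate(names)
    }
```

An alarm whose row spreads probability over many successors has high entropy, which is consistent with the hub-like, multi-transition alarms described above.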

Table 1 summarises the prediction performance across the three representative stations (high-activity, medium-activity, and low-activity). BS-M1 shows a moderate Top-1 accuracy (53.85%) and a strong Top-3 accuracy (92.31%), which confirms reliable forecasting within a narrow prediction window. The prediction performance reveals that predictive reliability generally scales with data richness. However, the accuracy remains robust even for low to moderate alarm activities. Prior studies have demonstrated the suitability of similar metrics such as Top-2 and Top-N predictions, as they account for stochastic uncertainty in alarm propagation and can help “avoid incorrect predictions” in cases where the “likelihood values of the two most probable stages were very close.”²³ Moreover, unlike precision–recall metrics that assume independent samples, Top-N accuracy is more appropriate for Markovian alarm sequences where prediction quality depends on navigating the evolving state space rather than single-label correctness.^21,22 Hence, reporting multiple stages instead of one significantly improves prediction accuracy in systems with correlated alerts and supports more robust operational decision-making under uncertainty.²² The overall outcomes highlight the applicability of RAPTOR to model interpretable alarm propagation and alarm stability, along with consistent predictive accuracy across varying operational regimes.

Table 1. Prediction accuracy across representative base stations.

Base station	Top-1 accuracy (%)	Top-2 accuracy (%)	Top-3 accuracy (%)
High-activity base station (BS-H1)	89.42	98.90	99.23
Medium-activity base station (BS-M1)	53.85	80.77	92.31
Low-activity base station (BS-L1)	65.00	85.00	100.00
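
The Top-N accuracies in Table 1 can be related to a first-order Markov model with a short sketch. The function names and the split into a fitting sequence and a held-out test sequence are assumptions made for illustration; the evaluation protocol actually used for Table 1 is not described in this section.

```python
from collections import Counter, defaultdict


def fit_transitions(sequence):
    """Count first-order transitions a_i -> a_j in an alarm sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return counts


def top_k_next(counts, current, k):
    """Return up to k most probable next alarms with empirical probabilities."""
    row = counts.get(current)
    if not row:
        return []  # alarm never seen as a source during fitting
    total = sum(row.values())
    return [(alarm, n / total) for alarm, n in row.most_common(k)]


def top_k_accuracy(counts, test_sequence, k):
    """Fraction of test transitions whose true next alarm is in the Top-k list."""
    pairs = list(zip(test_sequence, test_sequence[1:]))
    if not pairs:
        return 0.0
    hits = sum(
        nxt in {alarm for alarm, _ in top_k_next(counts, current, k)}
        for current, nxt in pairs
    )
    return hits / len(pairs)
```

By construction, Top-k accuracy cannot decrease as k grows, which matches the ordering of the columns in Table 1.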

Multi-station validation of transition forecasting performance

To further show the generalisability of the proposed framework, the Markov prediction performance for 20 medium-activity base stations (BS-M1 to BS-M20) is presented in Figure 9. The figure compares the Top-1 and Top-3 accuracies across the 20 stations. In most cases, Top-1 accuracy exceeds 50% and Top-3 accuracy surpasses 85%. The results therefore confirm that the next alarm event is most likely to lie among the top few probabilistic predictions. The results also reveal some degree of heterogeneity in forecasting behaviour. It is possible that stations with structured, recurrent alarm pathways achieve near-perfect accuracy, whereas stations with intermittent or irregular alarm dynamics display moderate variation.

Figure 9. Prediction accuracy (Top-1 vs Top-3) across 20 medium-activity base stations.

This multi-station validation demonstrates RAPTOR’s strong predictive performance without the need for extensive retraining or parameter tuning required in recent approaches.⁴ The framework is able to adapt naturally to station-specific dynamics and local alarm topologies, while preserving interpretability through probabilistic transition mapping. Table 2 captures the representative outputs for BS-M1 by listing the Top-3 predicted alarms for a selected alarm. The table again confirms that high-confidence transitions correspond to direct and recurrent fault linkages. By capturing both transition probability and resilience context, RAPTOR provides a comprehensive lens for proactive fault management and operational intelligence in cellular networks.

Table 2. Top-3 predicted next alarms for selected current alarms (BS-M1).

Current alarm (ID)	Predicted Top-3 next alarms
Cell PS service faulty (0)	1. ALD maintenance link failure ($P = 1.00$)
OML fault (2)	1. ESL link fault ($P = 0.25$) 2. GSM cell manually blocked ($P = 0.25$) 3. Monitoring device maintenance link failure ($P = 0.25$)
GSM cell out of service (3)	1. OML fault ($P = 0.33$) 2. BBU CPRI interface error ($P = 0.33$) 3. ESL link fault ($P = 0.17$)
ESL link fault (4)	1. OML fault ($P = 0.50$) 2. GSM cell out of service ($P = 0.25$) 3. BBU topology and configuration mismatch ($P = 0.25$)
IP PM activation failure (5)	1. gNodeB X2 interface fault ($P = 0.82$) 2. User plane path fault ($P = 0.14$) 3. IP PM activation failure ($P = 0.01$)
Cell blocked (6)	1. NR DU cell blocked ($P = 0.27$) 2. GSM cell manually blocked ($P = 0.18$) 3. ESL link fault ($P = 0.09$)

High-confidence predictions ($P > 0.8$) are shown in red.

Limitations and future research

The RAPTOR framework offers sufficient generalisability and versatility to be effective across domains that involve sequential, time-dependent signals such as alarms and sensor readings. This claim is supported by prior evidence. Comparable frameworks using similar datasets have been successfully applied in a wide range of domains, including relay anomaly detection in power systems,^26,27 DDoS attack prediction,^23,28 real-time fire detection in sensor networks,^19,29 healthcare support trajectory analysis,³⁰ and alarm flood identification in chemical process industries.^21,22 That said, RAPTOR inherits the limitations of its individual components. DTW is known for its over-compression and over-stretching issues, where it maps multiple points to one and can thereby lose critical signal features.¹⁸ The first-order Markov model assumes a short memory and ignores long-range dependencies and causal effects.²⁴ Moreover, it struggles with novelty detection and may misclassify unknown states as existing states if not provided with sufficiently large and diverse samples.³¹ Consequently, the resilience scoring inherits these limitations and may misrepresent alarm resilience transitions in complex evolving sequences. Several avenues for addressing these limitations define clear directions for future research. DTW can be extended using an adaptive penalty function to mitigate pathological matching and improve similarity accuracy. Alarm transitions can also be modelled using higher-order dynamic Markov chains to capture complex trends and to integrate anomaly substitution strategies that prevent anomalies from contaminating future predictions. Unknown states and hidden correlations can be addressed through the addition of “$m + 1$” unknown states and multivariate joint sequences. Future extensions of this work will also include modelling multi-alarm transitions, integrating network topology for spatial inference, or combining with root cause analytics such as Viterbi-based fault isolation to develop fully autonomous alarm intelligence systems.
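
To make two of these directions concrete, the sketch below fits a second-order chain that backs off to first-order counts, and reserves one extra label for alarms outside a known set of $m$ alarms. It is a deliberately simplified stand-in and not the method of the cited works:²⁴,³¹ the `UNKNOWN` label, the function names, and the back-off rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

UNKNOWN = "<unknown>"  # hypothetical label standing in for the extra (m + 1)-th state


def fit_order2(sequence, known_alarms):
    """Fit first- and second-order counts, mapping unseen alarm types to UNKNOWN."""
    seq = [a if a in known_alarms else UNKNOWN for a in sequence]
    order1 = defaultdict(Counter)  # a_i -> next alarm counts
    order2 = defaultdict(Counter)  # (a_{i-1}, a_i) -> next alarm counts
    for cur, nxt in zip(seq, seq[1:]):
        order1[cur][nxt] += 1
    for prev, cur, nxt in zip(seq, seq[1:], seq[2:]):
        order2[(prev, cur)][nxt] += 1
    return order1, order2


def predict(order1, order2, prev, cur):
    """Most likely next alarm given the last two alarms.

    Uses the second-order context when it was observed, otherwise backs off
    to the first-order row; returns None if neither context was seen.
    Callers are expected to map unknown alarms to UNKNOWN beforehand.
    """
    row = order2.get((prev, cur)) or order1.get(cur)
    return row.most_common(1)[0][0] if row else None
```

The number of second-order contexts grows quadratically with the number of alarm types, so sparse stations would rely on the first-order back-off for most predictions.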

Conclusions

This paper presents a comprehensive, interpretable, and scalable framework, RAPTOR, for alarm sequence analysis in RAN base stations. The core domain challenges of temporal variability, causality inference, sparse and heterogeneous data, and limited interpretability of alarms in telecommunication network operations have been addressed through this work. RAPTOR addressed RQ1 (Temporal Causality) by making use of DTW-based clustering and transition graphs, which revealed latent time-aligned alarm dependencies, and also identified station-specific alarm co-activation trends. Further, RQ2 (Predictive Modelling) was addressed by constructing Markov transition matrices. The Top-1 and Top-3 prediction accuracy metrics evaluated the effectiveness of this approach and were benchmarked across base stations with different activity levels. Finally, RQ3 (Resilience Quantification) was addressed by defining resilience scores based on cascade entropy, self-loop probability, and the alarm absorption likelihood within and across base stations.

Empirical evaluation on a real-world dataset demonstrated that the framework robustly adapts to varying alarm densities and produces interpretable predictions and resilience insights. Although the evaluation in this work focussed on medium-activity base stations, the model performs well for both low and high-activity base stations, which involved complex but structured stochastic dynamics (high-activity base stations) as well as simple deterministic alarm chains (low-activity base stations). This work contributes to telecommunication network resilience by connecting alarm behaviour to system health inference. RAPTOR provides diagnostic value beyond prediction by quantifying alarm persistence, its cascading potential, and its self-resolution tendency. Future extensions of this work will include modelling multi-alarm transitions, integrating network topology for spatial inference, or combining with root cause analytics to develop fully autonomous alarm intelligence systems.

Consent for publication

Publication approval was obtained from the collaborating industry partner. The industry partner reviewed and approved the manuscript prior to submission.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors acknowledge the use of data from the BT Prosperity Partnership Project: Next Generation Converged Digital Infrastructure, which was supported by the Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/R004935/1.

ORCID iD

Anandarup Mukherjee https://orcid.org/0000-0002-3165-1151

Data availability statement

The data underlying this study are proprietary to the collaborating industry partner and cannot be shared publicly due to confidentiality and contractual restrictions. Derived, de-identified, or aggregated results may be available from the corresponding author upon reasonable request and subject to the partner’s approval.

References

1. Mukherjee A, Herrera M, Indiran HP, et al. Alarm webs: a framework for decoding RAN alarm dynamics. IFAC-PapersOnLine 2024; 58(8): 103–108.

2. Li YF, Zhao W, Zhang C, et al. A study on the prediction of service reliability of wireless telecommunication system via distribution regression. Reliab Eng Syst Saf 2024; 250: 110291.

3. Li L, Herrera M, Mukherjee A, et al. Predictive alarm models for improving radio access network robustness. Expert Syst Appl 2025; 259: 125312. https://doi.org/10.1016/j.eswa.2024.125312

4. Lin J, Lan T, Zhang B, et al. Multi-scenario cellular KPI prediction based on spatiotemporal graph neural network. IEEE Trans Autom Sci Eng 2025; 22: 5131–5142. https://doi.org/10.1109/tase.2024.3416952

5. Jakobson G, Weissman M. Alarm correlation. IEEE Netw 1993; 7(6): 52–59.

6. Fournier-Viger P, He G, Zhou M, et al. Discovering alarm correlation rules for network fault management. In: International Conference on Service-Oriented Computing, Springer International Publishing, Cham, 2020, pp. 228–239.

7. Brugnoni S, Bruno G, Manione R, et al. An expert system for real time fault diagnosis of the Italian telecommunications network. In: Proceedings of the IFIP TC6/WG6.6 Third International Symposium on Integrated Network Management with participation of the IEEE Communications Society CNOM and with support from the Institute for Educational Services, 1993, pp. 617–628.

8. Bouloutas AT, Calo S, Finkel A. Alarm correlation and fault identification in communication networks. IEEE Trans Commun 1994; 42(2/3/4): 523–533. https://doi.org/10.1109/tcomm.1994.577079

9. Klemettinen M, Mannila H, Toivonen H. Interactive exploration of interesting findings in the telecommunication network alarm sequence analyzer TASA. Inf Softw Tech 1999; 41(9): 557–567. https://doi.org/10.1016/s0950-5849(99)00019-1

10. Lin Y, Wang S, Wu Y, et al. Similarity analysis of alarm sequences by a shuffling method. Front Phys 2021; 9: 1–6. https://doi.org/10.3389/fphy.2021.714910

11. Salfner F, Schieschke M, Malek M. Predicting failures of computer systems: a case study for a telecommunication system. In: 20th international parallel and distributed processing symposium, IPDPS 2006, 2006. https://doi.org/10.1109/IPDPS.2006.1639672.

12. Salaün A, Bouillard A, Buob MO. Space-time pattern extraction in alarm logs for network diagnosis. In: International Conference on Machine Learning for Networking, Cham: Springer International Publishing, 2019, pp. 134–153.

13. Desbouvries F, Petetin Y, Salaün A. Expressivity of hidden Markov chains vs. recurrent neural networks from a system theoretic viewpoint. IEEE Trans Signal Process 2023; 71: 4178–4191.

14. Abele L, Anic M, Gutmann T, et al. Combining knowledge modelling and machine learning for alarm root cause analysis. IFAC Proc Volumes 2013; 46(9): 1843–1848.

15. Li M, Yang M, Chen P. Alarm reduction and root cause inference based on association mining in communication network. Front Comput Sci 2023; 5: 1211739.

16. Wang J, He C, Liu Y, et al. Efficient alarm behavior analytics for telecom networks. Inf Sci 2017; 402: 1–14.

17. Zhao Y, Chen J, Wu D, et al. Network anomaly detection by using a time-decay closed frequent pattern. Information 2019; 10(8): 262.

18. Li H, Liu J, Yang Z, et al. Adaptively constrained dynamic time warping for time series classification and clustering. Inf Sci 2020; 534: 97–116. https://doi.org/10.1016/j.ins.2020.04.009

19. Baek J, Alhindi TJ, Jeong YS, et al. Real-time fire detection system based on dynamic time warping of multichannel sensor networks. Fire Saf J 2021; 123: 103364. https://doi.org/10.1016/j.firesaf.2021.103364

20. Hu W, Zhang X, Wang J, et al. Pattern matching of industrial alarm floods using word embedding and dynamic time warping. IEEE/CAA J Autom Sin 2023; 10(4): 1096–1098. https://doi.org/10.1109/jas.2023.123594

21. Venkidasalapathy JA, Kravaris C. Hidden Markov model based approach for alarm rationalization. In: 21st IFAC World Congress, 2020, pp. 13767–13770.

22. Ariamuthu Venkidasalapathy J, Kravaris C. Hidden Markov model based approach for diagnosing cause of alarm signals. AIChE J 2021; 67(10): 1–11. https://doi.org/10.1002/aic.17297

23. Ghafir I, Kyriakopoulos KG, Lambotharan S, et al. Hidden Markov models and alert correlations for the prediction of advanced persistent threats. IEEE Access 2019; 7: 99508–99520. https://doi.org/10.1109/ACCESS.2019.2930200

24. Ren H, Ye Z, Li Z. Anomaly detection based on a dynamic Markov model. Inf Sci 2017; 411: 52–65. https://doi.org/10.1016/j.ins.2017.05.021

25. Papataxiarhis V, Kostakonti S. Stepwise correlation of multivariate IoT event data based on first-order Markov chains. arXiv e-prints, Art. no. arXiv:2305.18082, 2023. https://doi.org/10.48550/arXiv.2305.18082

26. Smyth P. Hidden Markov models and neural networks for fault detection in dynamic systems. In: Proceedings of the 1993 IEEE workshop, neural networks for signal processing III, NNSP 1993, vol. 27, no. 1, 1993, pp. 582–591. https://doi.org/10.1109/NNSP.1993.471829.

27. Andrade JR, Rocha C, Silva R, et al. Data-driven anomaly detection and event log profiling of SCADA alarms. IEEE Access 2022; 10: 73758–73773. https://doi.org/10.1109/access.2022.3190398

28. Holgado P, Villagra VA, Vazquez L. Real-time multistep attack prediction based on hidden Markov models. IEEE Trans Dependable Secure Comput 2020; 17(1): 134–147. https://doi.org/10.1109/tdsc.2017.2751478

29. Yu M, Yuan H, Li K, et al. Research on multi-detector real-time fire alarm technology based on signal similarity. Fire Saf J 2023; 136: 103724. https://doi.org/10.1016/j.firesaf.2022.103724

30. Hebbrecht K, Stuivenga M, Birkenhäger T, et al. Understanding personalized dynamics to inform precision medicine: a dynamic time warp analysis of 255 depressed inpatients. BMC Med 2020; 18: 1–15.

31. Smyth P. Markov monitoring with unknown states. IEEE J Sel Areas Commun 1994; 12(9): 1600–1612. https://doi.org/10.1109/49.339929
