Filter by tag: #ids #apt #ai-code-generators #tinyml #iot #federated-learning #cps #static-analysis #blockchain Show All
Federated and generative data sharing for data-driven security: Challenges and approach
- Published at: IEEE 20th International Workshop on Assurance in Distributed Systems and Networks (ADSN)
- Authors: R. Natella, A. Ceccarelli, M. Ficco
- Date: September 2022
- Tags: #ids #apt
Abstract
Modern cyber-attacks are evolving into Advanced Persistent Threats (APTs). They are attacks orchestrated by cybercriminals or state-sponsored groups, which perform carefully-planned, stealthy, targeted attacks that span over a long period of time. It is difficult to defend against APTs, mostly because the absence of high-quality data to build detectors and train personnel. In fact, new attacks are continuously crafted, and most organizations are unwilling to share data about attacks they have experienced. In this paper, we argue about an approach for the automatic generation of representative datasets of APTs, without forcing organizations to disclose their sensitive information. We propose to adopt the Federated Learning paradigm to train a Generative Machine Learning model, which will generate new traces of network and host events representative of real APT attacks. Blockchain-based strategies will overcome the typical shortcomings of a centralized approach, such as single-point-failure and malicious clients. The generated APT datasets can be leveraged for training and assessing APT detectors based on AI, and emulating attacks in live cyber-ranges exercises.
Published paper
Intrusion detection without attack knowledge: generating out-of-distribution tabular data
- Published at: IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)
- Authors: A. Ceccarelli, T. Zoppi
- Date: October 2023
- Tags: #ids
Abstract
Anomaly-based intrusion detectors are machine learners trained to distinguish between normal and anomalous data. The normal data is generally easy to collect when building the train set; instead, collecting anomalous data requires historical data or penetration testing campaigns. Unfortunately, the first is most often unavailable or unusable, and the latter is usually expensive and unfeasible, as it requires hacking the target system. It turns out that the possibility of training an intrusion detector without attack knowledge, i.e., without anomalies, is attractive. This paper reviews strategies to train anomaly detectors in the absence of anomalies, from shallow machine learning to deep learning and computer vision approaches, and applies such strategies to the domain of intrusion detection. We experimentally show that training an intrusion detector without attack knowledge is effective when normal and attack data distributions are distinguishable. Detection performance severely drops in the case of complex (but more realistic) datasets, making all the existing solutions inadequate for real applications. However, the recent advancements of out-of-distribution research in deep learning and computer vision show interesting prospective results.
Published paper
AI Code Generators for Security: Friend or Foe?
- Published at: IEEE Security & Privacy Magazine
- Authors: R. Natella, P. Liguori, C. Improta. B. Cukic, D. Cotroneo
- Date: February 2024
- Tags: #ai-code-generators
Abstract
Recent advances of artificial intelligence (AI) code generators are opening new opportunities in software security research, including misuse by malicious actors. We review use cases for AI code generators for security and introduce an evaluation benchmark.
Published paper arXiv Dataset
On Attacks (Dis)Similarities to Test Adversarial Defense: Can We Reduce the Attack Set?
- Published at: ITASEC 2024
- Authors: T. Puccetti, T. Zoppi, A. Ceccarelli
- Date: April 2024
- Tags: #ids
Abstract
Published paper
Federated Learning for IoT devices: Enhancing TinyML with on-board training
- Published at: Information Fusion
- Authors: M. Ficco, A. Guerriero, E. Milite, F. Palmieri, R. Pietrantuono, S. Russo
- Date: April 2024
- Tags: #tinyml #iot #federated-learning
Abstract
The spread of the Internet of Things (IoT) involving an uncountable number of applications, combined with the rise of Machine Learning (ML), has enabled the rapid growth of pervasive and intelligent systems in a variety of domains, including healthcare, environment, railway transportation and Industry 4.0. While this opens up favorable scenarios, it also raises new challenges. The huge amount of data collected and processed by ML applications requires efficient and scalable solutions that contrast with the constrained capabilities of IoT devices as for memory, power consumption, processing and network bandwidth. The TinyML technologies foster the adoption of ML algorithms running locally on IoT devices. However, they typically foresee a remote training process (e.g., on cloud servers) combined with local inference – a strategy not always viable, e.g., for privacy and security issues.
We present a technique to enable the on-board training of ML algorithms on IoT devices, through the combination of federated learning (FL) and transfer learning (TL). We experimentally analyze it in classification and regression problems, comparing it to traditional FL solutions, as well as with a consolidated technique based on Tensorflow Lite. Results show that FL with TL reaches accuracy values better than FL without TL in both classification (86.48%) and regression (0.0201). These results are comparable with a model trained on the full dataset. We further analyze training and inference time and power consumption on various devices. Finally, we evaluate how the performance changes with unbalanced training datasets, showing that although they strongly impact accuracy, FL makes models more robust, letting them achieve accuracy comparable to when trained on balanced datasets.
Published paper
Securing an Application Layer Gateway: An Industrial Case Study
- Published at: IEEE 19th European Dependable Computing Conference (EDCC)
- Authors: C. Cesarano, R. Natella
- Date: April 2024
- Tags: #iot #cps
Abstract
Application Layer Gateways (ALGs) play a crucial role in securing critical systems, including railways, industrial automation, and defense applications, by segmenting networks at different levels of criticality. However, they require rigorous security testing to prevent software vulnerabilities, not only at the network level but also at the application layer (e.g., deep traffic inspection components). This paper presents a vulnerability-driven methodology for the comprehensive security testing of ALGs. We present the methodology in the context of an industrial case study in the railways domain, and a simulation-based testing environment to support the methodology.
Published paper arXiv
Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks
- Published at: 32nd IEEE/ACM International Conference on Program Comprehension (ICPC)
- Authors: D. Cotroneo, C. Improta, P. Liguori, R. Natella
- Date: April 2024
- Tags: #ai-code-generators
Abstract
AI-based code generators have become pivotal in assisting developers in writing software starting from natural language (NL). However, they are trained on large amounts of data, often collected from unsanitized online sources (e.g., GitHub, HuggingFace). As a consequence, AI models become an easy target for data poisoning, i.e., an attack that injects malicious samples into the training data to generate vulnerable code.
To address this threat, this work investigates the security of AI code generators by devising a targeted data poisoning strategy. We poison the training data by injecting increasing amounts of code containing security vulnerabilities and assess the attack’s success on different state-of-the-art models for code generation. Our study shows that AI code generators are vulnerable to even a small amount of poison. Notably, the attack success strongly depends on the model architecture and poisoning rate, whereas it is not influenced by the type of vulnerabilities. Moreover, since the attack does not impact the correctness of code generated by pre-trained models, it is hard to detect. Lastly, our work offers practical insights into understanding and potentially mitigating this threat.
Published paper arXiv Dataset
ROSPaCe: Intrusion Detection Dataset for a ROS2-Based Cyber-Physical System and IoT Networks
- Published at: Scientific Data, Vol. 11.1, Article no. 481
- Authors: Tommaso Puccetti, Simone Nardi, Cosimo Cinquilli, Tommaso Zoppi, Andrea Ceccarelli
- Date: May 2024
- Tags: #iot #ids
Abstract
Most of the intrusion detection datasets to research machine learning-based intrusion detection systems (IDSs) are devoted to cyber-only systems, and they typically collect data from one architectural layer. Often the attacks are generated in dedicated attack sessions, without reproducing the realistic alternation and overlap of normal and attack actions. We present a dataset for intrusion detection by performing penetration testing on an embedded cyber-physical system built over Robot Operating System 2 (ROS2). Features are monitored from three architectural layers: the Linux operating system, the network, and the ROS2 services. The dataset is structured as a time series and describes the expected behavior of the system and its response to ROS2-specific attacks: it repeatedly alternates periods of attack-free operation with periods when a specific attack is being performed. This allows measuring the time to detect an attacker and the number of malicious activities performed before detection. Also, it allows training an intrusion detector to minimize both, by taking advantage of the numerous alternating periods of normal and attack operations.
Published paper Dataset
TinyIDS - An IoT Intrusion Detection System by Tiny Machine Learning
- Published at: Computational Science and Its Applications – ICCSA 2024 Workshops
- Authors: Pietro Fusco, Gennaro Pio Rimoli, Massimo Ficco
- Date: July 2024
- Tags: #iot #ids #federated-learning #tinyml
Abstract
The use of Internet of Things (IoT) devices in sectors, such as healthcare, automotive, and industrial automation, has increased the risk of attacks against critical assets. Machine learning techniques may be utilized to identify malicious behaviors, but they often require dedicated, energy-intensive, and expensive devices, which may not be deployable in IoT infrastructures. Furthermore, privacy constraints, security policies, and latency constraints could limit the sending of sensitive data to powerful remote servers. To address this issue, the emerging field of TinyML offers a solution for implementing machine learning algorithms directly on resource-constrained devices. Therefore, this article presents the implementation of an intrusion detector, named TinyIDS, which exploits the Tiny machine learning techniques. The detector can be deployed on resource-constrained IoT devices to detect attacks against sensor networks, as well as malicious behaviors of compromised smart objects. On-board training has been exploited to train and analyze data locally without having to transfer sensitive data to remote or untrusted cloud services. The solution has been tested on common MCU-based devices and ToN_IoT datasets.
Published paper Dataset
The Power of Words: Generating PowerShell Attacks from Natural Language
- Published at: 18th USENIX WOOT Conference on Offensive Technologies (WOOT 24)
- Authors: P. Liguori, C. Marescalco, R. Natella, V. Orbinato, L. Pianese
- Date: August 2024
- Tags: #ai-code-generators #apt
Abstract
As the Windows OS stands out as one of the most targeted systems, the \textit{PowerShell} language has become a key tool for malicious actors and cybersecurity professionals (e.g., for penetration testing). This work explores an uncharted domain in AI code generation by automatically generating offensive PowerShell code from natural language descriptions using Neural Machine Translation (NMT). For training and evaluation purposes, we propose two novel datasets with PowerShell code samples, one with manually curated descriptions in natural language and another code-only dataset for reinforcing the training. We present an extensive evaluation of state-of-the-art NMT models and analyze the generated code both statically and dynamically. Results indicate that tuning NMT using our dataset is effective at generating offensive PowerShell code. Comparative analysis against the most widely used LLM service ChatGPT reveals the specialized strengths of our fine-tuned models.
Published paper arXiv Dataset
A Strategy for Predicting the Performance of Supervised and Unsupervised Tabular Data Classifiers
- Published at: Data Science and Engineering
- Authors: T. Zoppi, A. Ceccarelli, A. Bondavalli
- Date: September 2024
- Tags: #ids
Abstract
Machine Learning algorithms that perform classification are increasingly been adopted in Information and Communication Technology (ICT) systems and infrastructures due to their capability to profile their expected behavior and detect anomalies due to ongoing errors or intrusions. Deploying a classifier for a given system requires conducting comparison and sensitivity analyses that are time-consuming, require domain expertise, and may even not achieve satisfactory classification performance, resulting in a waste of money and time for practitioners and stakeholders. This paper predicts the expected performance of classifiers without needing to select, craft, exercise, or compare them, requiring minimal expertise and machinery. Should classification performance be predicted worse than expectations, the users could focus on improving data quality and monitoring systems instead of wasting time in exercising classifiers, saving key time and money. The prediction strategy uses scores of feature rankers, which are processed by regressors to predict metrics such as Matthews Correlation Coefficient (MCC) and Area Under ROC-Curve (AUC) for quantifying classification performance. We validate our prediction strategy through a massive experimental analysis using up to 12 feature rankers that process features from 23 public datasets, creating additional variants in the process and exercising supervised and unsupervised classifiers. Our findings show that it is possible to predict the value of performance metrics for supervised or unsupervised classifiers with a mean average error (MAE) of residuals lower than 0.1 for many classification tasks. The predictors are publicly available in a Python library whose usage is straightforward and does not require domain-specific skill or expertise.
Published paper Dataset
Automating the correctness assessment of AI-generated code for security contexts
- Published at: Journal of Systems and Software
- Authors: D. Cotroneo, A. Foggia, C. Improta, P. Liguori, R. Natella
- Date: October 2024
- Tags: #ai-code-generators
Abstract
Evaluating the correctness of code generated by AI is a challenging open problem. In this paper, we propose a fully automated method, named ACCA, to evaluate the correctness of AI-generated code for security purposes. The method uses symbolic execution to assess whether the AI-generated code behaves as a reference implementation. We use ACCA to assess four state-of-the-art models trained to generate security-oriented assembly code and compare the results of the evaluation with different baseline solutions, including output similarity metrics, widely used in the field, and the well-known ChatGPT, the AI-powered language model developed by OpenAI. Our experiments show that our method outperforms the baseline solutions and assesses the correctness of the AI-generated code similar to the human-based evaluation, which is considered the ground truth for the assessment in the field. Moreover, ACCA has a very strong correlation with the human evaluation (Pearson’s correlation coefficient r=0.84 on average). Finally, since it is a fully automated solution that does not require any human intervention, the proposed method performs the assessment of every code snippet in ~0.17s on average, which is definitely lower than the average time required by human analysts to manually inspect the code, based on our experience.
Published paper arXiv Dataset
Enhancing AI-based Generation of Software Exploits with Contextual Information
- Published at: IEEE 35th International Symposium on Software Reliability Engineering (ISSRE)
- Authors: Pietro Liguori, Cristina Improta, Roberto Natella, Bojan Cukic, Domenico Cotroneo
- Date: October 2024
- Tags: #ai-code-generators
Abstract
This practical experience report explores Neural Machine Translation (NMT) models’ capability to generate offensive security code from natural language (NL) descriptions, highlighting the significance of contextual understanding and its impact on model performance. Our study employs a dataset comprising real shellcodes to evaluate the models across various scenarios, including missing information, necessary context, and unnecessary context. The experiments are designed to assess the models’ resilience against incomplete descriptions, their proficiency in leveraging context for enhanced accuracy, and their ability to discern irrelevant information. The findings reveal that the introduction of contextual data significantly improves performance. However, the benefits of additional context diminish beyond a certain point, indicating an optimal level of contextual information for model training. Moreover, the models demonstrate an ability to filter out unnecessary context, maintaining high levels of accuracy in the generation of offensive security code. This study paves the way for future research on optimizing context use in AI-driven code generation, particularly for applications requiring a high degree of technical precision such as the generation of offensive code.
Published paper arXiv Dataset
Enhancing robustness of AI offensive code generators via data augmentation
- Published at: Empirical Software Engineering
- Authors: C. Improta, P. Liguori, R. Natella, B. Cukic, D. Cotroneo
- Date: October 2024
- Tags: #ai-code-generators
Abstract
Since manually writing software exploits for offensive security is time-consuming and requires expert knowledge, AI-base code generators are an attractive solution to enhance security analysts’ productivity by automatically crafting exploits for security testing. However, the variability in the natural language and technical skills used to describe offensive code poses unique challenges to their robustness and applicability. In this work, we present a method to add perturbations to the code descriptions to create new inputs in natural language (NL) from well-intentioned developers that diverge from the original ones due to the use of new words or because they miss part of them. The goal is to analyze how and to what extent perturbations affect the performance of AI code generators in the context of offensive code. First, we show that perturbed descriptions preserve the semantics of the original, non-perturbed ones. Then, we use the method to assess the robustness of three state-of-the-art code generators against the newly perturbed inputs, showing that the performance of these AI-based solutions is highly affected by perturbations in the NL descriptions. To enhance their robustness, we use the method to perform data augmentation, i.e., to increase the variability and diversity of the NL descriptions in the training data, proving its effectiveness against both perturbed and non-perturbed code descriptions.
Published paper arXiv Dataset
Detection Latencies of Anomaly Detectors - An Overlooked Perspective?
- Published at: IEEE 35th International Symposium on Software Reliability Engineering (ISSRE)
- Authors: T. Puccetti, A. Ceccarelli
- Date: October 2024
- Tags: #ids
Abstract
The ever-evolving landscape of attacks, coupled with the growing complexity of ICT systems, makes crafting anomaly-based intrusion detectors and error detectors difficult: they must accurately detect attacks and promptly perform detections. Although improving and comparing the detection capability is the focus of most research works, the timeliness of the detection is less considered and often insufficiently evaluated or discussed. In this paper, we argue the relevance of measuring the temporal latency of attacks and errors, and we propose an evaluation approach for detectors to ensure a trade-off between correct and in-time detection. Briefly, the approach relates the false positive rate with the temporal latency of attacks and errors, ultimately leading to guidelines for configuring a detector. We apply our approach by evaluating different intrusion and error detectors in two industrial cases: i) an embedded railway on-board system that optimizes public mobility, and ii) an edge device for the Industrial Internet of Things. Our results show that considering latency in addition to traditional metrics like the false positive rate, precision, and coverage gives an additional fundamental perspective on the actual performance of the detector and should be considered when assessing and configuring anomaly detectors.
Published paper Dataset
Better and safer autonomous driving with predicted object relevance
- Published at: IEEE 35th International Symposium on Software Reliability Engineering Workshops (ISSREW)
- Authors: A. Ceccarelli, L. Montecchi
- Date: October 2024
- Tags: #cps
Abstract
Object detection in autonomous driving consists in perceiving and locating instances of objects in multi-dimensional data, such as images or LIDAR scans. Very recently, multiple works are proposing to evaluate object detectors by measuring their ability to detect the objects that are most likely to interfere with the driving task. Detectors are then ranked according to their ability to detect objects that are relevant, rather than the general accuracy of detection. However, there is little evidence so far that isolating the most relevant objects may contribute to improvements in the safety and effectiveness of the driving task. This paper defines and exercises a strategy to i) set-up and deploy object detectors that successfully extract knowledge on object relevance, and ii) use such knowledge to improve the trajectory planning task. We show that, given the output of an object detector, filtering objects based on their predicted relevance, in combination with the usual confidence threshold, improves the quality of trajectories produced by the downstream trajectory planner. We conclude the paper showing that information on object relevance should be further exploited and we sketch some directions for future work.
Published paper Dataset
Anomaly-based error and intrusion detection in tabular data: No DNN outperforms tree-based classifiers
- Published at: Future Generation Computer Systems
- Authors: T. Zoppi, S. Gazzini, A. Ceccarelli
- Date: November 2024
- Tags: #ids
Abstract
Recent years have seen a growing involvement of researchers and practitioners in crafting Deep Neural Networks (DNNs) that seem to outperform existing machine learning approaches for solving classification problems as anomaly-based error and intrusion detection. Undoubtedly, classifiers may be very diverse among themselves, and choosing one or another is typically due to the specific task and target system. Designing and training the optimal tabular data classifier requires extensive experimentation, sensitivity analyses, big datasets, and domain-specific knowledge that may not be available at will or considered a non-strategical asset by many companies and stakeholders. This paper compares, using a total of 23 public datasets: i) traditional (tree-based, statistical) supervised classifiers, ii) DNNs that are specifically designed for classifying tabular data, iii) DNNs for image classification that are applied to tabular data after converting data points into images, alone and as ensembles. Experimental results and related discussions show clear advantages in adopting tree-based classifiers for anomaly-based error and intrusion detection in tabular data as they outperform their competitors, including DNNs. Then, individual classifiers are compared against ensembles using different combinations of the classifiers considered in this study as base-learners, providing a unified final response through many meta-learning strategies. Results show that there is no benefit in building ensembles instead of using a tree-based classifier as Random Forests, eXtreme Gradient Boosting or Extra Trees. The paper concludes that anomaly-based error and intrusion detectors for critical systems should use the old (but gold) tree-based classifiers, which are also easier to fine-tune, and understand; plus, they require less time and resources to learn their model.
Published paper
GoSurf: Identifying Software Supply Chain Attack Vectors in Go
- Published at: ACM Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses (SCORED)
- Authors: Carmine Cesarano, Vivi Andersson, Roberto Natella, Martin Monperrus
- Date: November 2024
- Tags: #apt #static-analysis
Abstract
In Go, the widespread adoption of open-source software has led to a flourishing ecosystem of third-party dependencies, which are often integrated into critical systems. However, the reuse of dependencies introduces significant supply chain security risks, as a single compromised package can have cascading impacts. Existing supply chain attack taxonomies overlook language-specific features that can be exploited by attackers to hide malicious code. In this paper, we propose a novel taxonomy of 12 distinct attack vectors tailored for the Go language and its package lifecycle. Our taxonomy identifies patterns in which language-specific Go features, intended for benign purposes, can be misused to propagate malicious code stealthily through supply chains. Additionally, we introduce GoSurf, a static analysis tool that analyzes the attack surface of Go packages according to our proposed taxonomy. We evaluate GoSurf on a corpus of 500 widely used, real-world Go packages. Our work provides preliminary insights for securing the open-source software supply chain within the Go ecosystem, allowing developers and security analysts to prioritize code audit efforts and uncover hidden malicious behaviors.
Published paper Dataset
Laccolith: Hypervisor-Based Adversary Emulation With Anti-Detection
- Published at: IEEE Transactions on Dependable and Secure Computing
- Authors: Vittorio Orbinato, Marco Carlo Feliciano, Domenico Cotroneo, Roberto Natella
- Date: November 2024
- Tags: #apt
Abstract
Advanced Persistent Threats (APTs) represent the most threatening form of attack nowadays since they can stay undetected for a long time. Adversary emulation is a proactive approach for preparing against these attacks. However, adversary emulation tools lack the anti-detection abilities of APTs. We introduce Laccolith, a hypervisor-based solution for adversary emulation with anti-detection to fill this gap. We also present an experimental study to compare Laccolith with MITRE CALDERA, a state-of-the-art solution for adversary emulation, against five popular anti-virus products. We found that CALDERA cannot evade detection, limiting the realism of emulated attacks, even when combined with a state-of-the-art anti-detection framework. Our experiments show that Laccolith can hide its activities from all the tested anti-virus products, thus making it suitable for realistic emulations.
Published paper
Tiny Federated Learning with Blockchain for Privacy and Security Preservation of MCU-Based IoT Applications
- Published at: IEEE International Conference on Blockchain Computing and Applications (BCCA)
- Authors: G. P. Rimoli, B. Boi, P. Fusco, C. Esposito, M. Ficco
- Date: November 2024
- Tags: #tinyml #blockchain #iot #federated-learning
Abstract
In several Internet of Things (IoT) application contexts, such as autonomous vehicles, healthcare, and smart cities, massive amounts of data are produced at the edge and used in neural networks deployed in central servers or the cloud. On the other hand, physical or legal constraints may restrict the use of this data only locally. Thus, the development of secure and efficient traditional Machine Learning solutions in the IoT context can be a huge challenge. Therefore, this paper combines an approach based on Tiny Federated Learning and Transfer Learning with on-board training, as an effective paradigm to continuously analyze data locally without having to transfer sensitive data to untrusted servers and networks. Moreover, a decentralized blockchain-based federated learning framework is implemented to provide tamper-proof data protection and resistance to malicious or compromised tiny devices. A prototype is created based on the Hyperledger Fabric and real resource-constrained microcontrollers to assess the viability of the proposed solution.
Published paper
A Benchmark for DDoS Attacks Detection in Microservice Architectures
- Published at: IEEE International Conference on Computing, Networking and Communications (ICNC)
- Authors: M. Ficco; P. Fusco; A. Guerriero; R. Pietrantuono; M. Russo; F. Palmieri
- Date: February 2025
- Tags: #apt #ids
Abstract
Microservices have become increasingly popular in modern software architectures due to their scalability and flexibility. However, this architectural paradigm introduces unique security challenges, particularly in the detection and mitigation of cyberattacks. This paper presents a collection of datasets designed to benchmark and evaluate attack detection strategies in microservices applications. The datasets include normal and malicious traffic patterns simulating real-world scenarios and attacks, such as classic DDoS, Slow DDoS, Syn Flood, GET Flood. Data was collected from experiments with a popular benchmark microservice system with diverse services interacting via standard protocols and API gateways. Each entry is labeled to distinguish between benign and malicious activities, providing a robust foundation for training and evaluating machine learning models aimed at intrusion detection. In addition to raw data, the dataset includes metadata detailing the configuration of microservices, the nature of simulated attacks, and the temporal sequence of events. This level of detail ensures that researchers and practitioners can reproduce experiments and gain deeper insights into the behavior of attacks in microservice contexts. By offering these datasets, we aim to facilitate the development of advanced detection algorithms and promote more effective security measures in microservice environments.
Published paper Dataset
Evaluation of Systems Programming Exercises through Tailored Static Analysis
- Published at: 56th ACM Technical Symposium on Computer Science Education (SIGCSE)
- Authors: Roberto Natella
- Date: February 2025
- Tags: #static-analysis
Abstract
In large programming classes, it takes a significant effort from teachers to evaluate exercises and provide detailed feedback. In systems programming, test cases are not sufficient to assess exercises, since concurrency and resource management bugs are difficult to reproduce. This paper presents an experience report on static analysis for the automatic evaluation of systems programming exercises. We design systems programming assignments with static analysis rules that are tailored for each assignment, to provide detailed and accurate feedback. Our evaluation shows that static analysis can identify a significant number of erroneous submissions missed by test cases.
Published paper arXiv
Cross-Model Federated Learning-Based Network Traffic Classification
- Published at: Advanced Information Networking and Applications (AINA)
- Authors: Kainat Ibrar, Francesco Palmieri, Pietro Fusco, Massimo Ficco
- Date: April 2025
- Tags: #ids #federated-learning
Abstract
Network traffic classification (NTC) plays a pivotal role in areas such as service quality assurance, malicious activity detection, and lawful interceptions. However, the increasing complexity of network environments, amplified by diverse modalities of traffic data, poses significant challenges for conventional models. This study introduces a Federated Learning (FL) based framework to address cross-modal heterogeneity, where each client hosts data from distinct modalities. The approach overcomes key challenges by predicting a common feature space, learning modality-specific features, and aggregating diverse client parameters. Experimental results demonstrate the effectiveness of the proposed approach, achieving high accuracy, sensitivity, and F1-score across multiple configurations. Performance improves significantly by increasing training rounds, showcasing the framework’s capability to adapt and generalize across diverse data distributions. The study advances the understanding of real-world NTC scenarios, offering a scalable and privacy-preserving solution for cross-model heterogeneous FL environments.
Published paper
TinyML-Based Intrusion Detection System for Handling Class Imbalance in IoT-Edge Domain Using Siamese Neural Network on MCU
- Published at: Advanced Information Networking and Applications (AINA)
- Authors: Pietro Fusco, Alberto Montefusco, Gennaro Pio Rimoli, Francesco Palmieri, Massimo Ficco
- Date: April 2025
- Tags: #iot #ids #tinyml
Abstract
The widespread of Internet of Things (IoT) devices has introduced significant cyber security challenges, requiring robust and efficient Intrusion Detection Systems (IDSs) tailored for IoT-edge environments. In this context, on-board training has emerged as a valuable approach for enabling online learning of IoT-edge smart devices, useful for model refining with on-field data, as well as reducing concept drift and data privacy violations. On the other hand, collecting a large set of representative on-field attack samples could be very complex or infeasible, particularly in critical application domains. Moreover, the collected samples are often uneven data distribution, known as imbalanced datasets. Therefore, appropriate neural network models trainable with reduced and imbalanced datasets should be used. In this paper, a performance assessment of a Siamese Neural Network (SNN)-based IDS deployed on a tiny Microcontroller Unit (MCU) is presented. The detection system is trained using both a custom IoT dataset and the widely used TON_IoT dataset in order to assess its ability to detect anomalous traffic patterns indicative of IoT-edge attacks. Accuracy and latency are analyzed to ensure practical applicability. The results highlight that the SNN-based IDS achieves a high detection rate with limited and imbalanced training data, demonstrating its effectiveness in securing IoT-edge environments under resource constraints.
Published paper
Creation and Use of a Representative Dataset for Advanced Persistent Threats Detection
- Published at: International Conference on Computer Safety, Reliability, and Security (SAFECOMP)
- Authors: Tommaso Puccetti, Simona De Vivo, Davide Zhang, Pietro Liguori, Roberto Natella, Andrea Ceccarelli
- Date: September 2025
- Tags: #cps #ids #apt
Abstract
Cyber-physical systems are vulnerable to Advanced Persistent Threats (APTs), which exploit system vulnerabilities using stealthy, long-term attacks. Anomaly-based intrusion detection systems are a promising means to protect against APTs. Still, they depend on high-quality datasets, which often fail to represent APT complexity and the evolution of the attacker strategies through time. This paper proposes a methodology to create semi-synthetic, labeled datasets that represent the complex attack graphs of APTs in cyber-physical systems. To demonstrate our approach, we replicate publish/subscribe network traffic from a real testbed with realistic noise and multi-step APT attacks based on the MITRE ATT&CK framework. The dataset captures detailed APT stages and enables the evaluation of the intrusion detection systems that revolve around false positives and the time to detection.
Published paper Dataset