Publications

Federated and generative data sharing for data-driven security: Challenges and approach

Published at: IEEE 20th International Workshop on Assurance in Distributed Systems and Networks (ADSN)
Authors: R. Natella, A. Ceccarelli, M. Ficco
Date: 01 Sep 2022

Abstract

Modern cyber-attacks are evolving into Advanced Persistent Threats (APTs). They are attacks orchestrated by cybercriminals or state-sponsored groups, which perform carefully-planned, stealthy, targeted attacks that span over a long period of time. It is difficult to defend against APTs, mostly because the absence of high-quality data to build detectors and train personnel. In fact, new attacks are continuously crafted, and most organizations are unwilling to share data about attacks they have experienced. In this paper, we argue about an approach for the automatic generation of representative datasets of APTs, without forcing organizations to disclose their sensitive information. We propose to adopt the Federated Learning paradigm to train a Generative Machine Learning model, which will generate new traces of network and host events representative of real APT attacks. Blockchain-based strategies will overcome the typical shortcomings of a centralized approach, such as single-point-failure and malicious clients. The generated APT datasets can be leveraged for training and assessing APT detectors based on AI, and emulating attacks in live cyber-ranges exercises.

Published paper

Intrusion detection without attack knowledge: generating out-of-distribution tabular data

Published at: IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)
Authors: A. Ceccarelli, T. Zoppi
Date: 01 Oct 2023

Abstract

Anomaly-based intrusion detectors are machine learners trained to distinguish between normal and anomalous data. The normal data is generally easy to collect when building the train set; instead, collecting anomalous data requires historical data or penetration testing campaigns. Unfortunately, the first is most often unavailable or unusable, and the latter is usually expensive and unfeasible, as it requires hacking the target system. It turns out that the possibility of training an intrusion detector without attack knowledge, i.e., without anomalies, is attractive. This paper reviews strategies to train anomaly detectors in the absence of anomalies, from shallow machine learning to deep learning and computer vision approaches, and applies such strategies to the domain of intrusion detection. We experimentally show that training an intrusion detector without attack knowledge is effective when normal and attack data distributions are distinguishable. Detection performance severely drops in the case of complex (but more realistic) datasets, making all the existing solutions inadequate for real applications. However, the recent advancements of out-of-distribution research in deep learning and computer vision show interesting prospective results.

Published paper

AI Code Generators for Security: Friend or Foe?

Published at: IEEE Security & Privacy Magazine
Authors: R. Natella, P. Liguori, C. Improta. B. Cukic, D. Cotroneo
Date: 01 Feb 2024

Abstract

Recent advances of artificial intelligence (AI) code generators are opening new opportunities in software security research, including misuse by malicious actors. We review use cases for AI code generators for security and introduce an evaluation benchmark.

Published paper arXiv Dataset

On Attacks (Dis)Similarities to Test Adversarial Defense: Can We Reduce the Attack Set?

Published at: ITASEC 2024
Authors: T. Puccetti, T. Zoppi, A. Ceccarelli
Date: 01 Apr 2024

Abstract

Published paper

Federated Learning for IoT devices: Enhancing TinyML with on-board training

Published at: Information Fusion
Authors: M. Ficco, A. Guerriero, E. Milite, F. Palmieri, R. Pietrantuono, S. Russo
Date: 01 Apr 2024

Abstract

The spread of the Internet of Things (IoT) involving an uncountable number of applications, combined with the rise of Machine Learning (ML), has enabled the rapid growth of pervasive and intelligent systems in a variety of domains, including healthcare, environment, railway transportation and Industry 4.0. While this opens up favorable scenarios, it also raises new challenges. The huge amount of data collected and processed by ML applications requires efficient and scalable solutions that contrast with the constrained capabilities of IoT devices as for memory, power consumption, processing and network bandwidth. The TinyML technologies foster the adoption of ML algorithms running locally on IoT devices. However, they typically foresee a remote training process (e.g., on cloud servers) combined with local inference – a strategy not always viable, e.g., for privacy and security issues.

We present a technique to enable the on-board training of ML algorithms on IoT devices, through the combination of federated learning (FL) and transfer learning (TL). We experimentally analyze it in classification and regression problems, comparing it to traditional FL solutions, as well as with a consolidated technique based on Tensorflow Lite. Results show that FL with TL reaches accuracy values better than FL without TL in both classification (86.48%) and regression (0.0201). These results are comparable with a model trained on the full dataset. We further analyze training and inference time and power consumption on various devices. Finally, we evaluate how the performance changes with unbalanced training datasets, showing that although they strongly impact accuracy, FL makes models more robust, letting them achieve accuracy comparable to when trained on balanced datasets.

Published paper

Securing an Application Layer Gateway: An Industrial Case Study

Published at: IEEE 19th European Dependable Computing Conference (EDCC)
Authors: C. Cesarano, R. Natella
Date: 01 Apr 2024

Abstract

Application Layer Gateways (ALGs) play a crucial role in securing critical systems, including railways, industrial automation, and defense applications, by segmenting networks at different levels of criticality. However, they require rigorous security testing to prevent software vulnerabilities, not only at the network level but also at the application layer (e.g., deep traffic inspection components). This paper presents a vulnerability-driven methodology for the comprehensive security testing of ALGs. We present the methodology in the context of an industrial case study in the railways domain, and a simulation-based testing environment to support the methodology.

arXiv

Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks

Published at: 32nd IEEE/ACM International Conference on Program Comprehension (ICPC)
Authors: D. Cotroneo, C. Improta, P. Liguori, R. Natella
Date: 01 Apr 2024

Abstract

AI-based code generators have become pivotal in assisting developers in writing software starting from natural language (NL). However, they are trained on large amounts of data, often collected from unsanitized online sources (e.g., GitHub, HuggingFace). As a consequence, AI models become an easy target for data poisoning, i.e., an attack that injects malicious samples into the training data to generate vulnerable code.

To address this threat, this work investigates the security of AI code generators by devising a targeted data poisoning strategy. We poison the training data by injecting increasing amounts of code containing security vulnerabilities and assess the attack’s success on different state-of-the-art models for code generation. Our study shows that AI code generators are vulnerable to even a small amount of poison. Notably, the attack success strongly depends on the model architecture and poisoning rate, whereas it is not influenced by the type of vulnerabilities. Moreover, since the attack does not impact the correctness of code generated by pre-trained models, it is hard to detect. Lastly, our work offers practical insights into understanding and potentially mitigating this threat.

Published paper arXiv Dataset

ROSPaCe: Intrusion Detection Dataset for a ROS2-Based Cyber-Physical System and IoT Networks

Published at: Scientific Data, Vol. 11.1, Article no. 481
Authors: Tommaso Puccetti, Simone Nardi, Cosimo Cinquilli, Tommaso Zoppi, Andrea Ceccarelli
Date: 01 May 2024

Abstract

Most of the intrusion detection datasets to research machine learning-based intrusion detection systems (IDSs) are devoted to cyber-only systems, and they typically collect data from one architectural layer. Often the attacks are generated in dedicated attack sessions, without reproducing the realistic alternation and overlap of normal and attack actions. We present a dataset for intrusion detection by performing penetration testing on an embedded cyber-physical system built over Robot Operating System 2 (ROS2). Features are monitored from three architectural layers: the Linux operating system, the network, and the ROS2 services. The dataset is structured as a time series and describes the expected behavior of the system and its response to ROS2-specific attacks: it repeatedly alternates periods of attack-free operation with periods when a specific attack is being performed. This allows measuring the time to detect an attacker and the number of malicious activities performed before detection. Also, it allows training an intrusion detector to minimize both, by taking advantage of the numerous alternating periods of normal and attack operations.

Published paper Dataset

The Power of Words: Generating PowerShell Attacks from Natural Language

Published at: 18th USENIX WOOT Conference on Offensive Technologies (WOOT 24)
Authors: P. Liguori, C. Marescalco, R. Natella, V. Orbinato, L. Pianese
Date: 01 Aug 2024

Abstract

As the Windows OS stands out as one of the most targeted systems, the \textit{PowerShell} language has become a key tool for malicious actors and cybersecurity professionals (e.g., for penetration testing). This work explores an uncharted domain in AI code generation by automatically generating offensive PowerShell code from natural language descriptions using Neural Machine Translation (NMT). For training and evaluation purposes, we propose two novel datasets with PowerShell code samples, one with manually curated descriptions in natural language and another code-only dataset for reinforcing the training. We present an extensive evaluation of state-of-the-art NMT models and analyze the generated code both statically and dynamically. Results indicate that tuning NMT using our dataset is effective at generating offensive PowerShell code. Comparative analysis against the most widely used LLM service ChatGPT reveals the specialized strengths of our fine-tuned models.

Published paper arXiv Dataset

A Strategy for Predicting the Performance of Supervised and Unsupervised Tabular Data Classifiers

Published at: Data Science and Engineering
Authors: T. Zoppi, A. Ceccarelli, A. Bondavalli
Date: 01 Sep 2024

Abstract

Machine Learning algorithms that perform classification are increasingly been adopted in Information and Communication Technology (ICT) systems and infrastructures due to their capability to profile their expected behavior and detect anomalies due to ongoing errors or intrusions. Deploying a classifier for a given system requires conducting comparison and sensitivity analyses that are time-consuming, require domain expertise, and may even not achieve satisfactory classification performance, resulting in a waste of money and time for practitioners and stakeholders. This paper predicts the expected performance of classifiers without needing to select, craft, exercise, or compare them, requiring minimal expertise and machinery. Should classification performance be predicted worse than expectations, the users could focus on improving data quality and monitoring systems instead of wasting time in exercising classifiers, saving key time and money. The prediction strategy uses scores of feature rankers, which are processed by regressors to predict metrics such as Matthews Correlation Coefficient (MCC) and Area Under ROC-Curve (AUC) for quantifying classification performance. We validate our prediction strategy through a massive experimental analysis using up to 12 feature rankers that process features from 23 public datasets, creating additional variants in the process and exercising supervised and unsupervised classifiers. Our findings show that it is possible to predict the value of performance metrics for supervised or unsupervised classifiers with a mean average error (MAE) of residuals lower than 0.1 for many classification tasks. The predictors are publicly available in a Python library whose usage is straightforward and does not require domain-specific skill or expertise.

Published paper Dataset

Automating the correctness assessment of AI-generated code for security contexts

Published at: Journal of Systems and Software
Authors: D. Cotroneo, A. Foggia, C. Improta, P. Liguori, R. Natella
Date: 01 Oct 2024

Abstract

Evaluating the correctness of code generated by AI is a challenging open problem. In this paper, we propose a fully automated method, named ACCA, to evaluate the correctness of AI-generated code for security purposes. The method uses symbolic execution to assess whether the AI-generated code behaves as a reference implementation. We use ACCA to assess four state-of-the-art models trained to generate security-oriented assembly code and compare the results of the evaluation with different baseline solutions, including output similarity metrics, widely used in the field, and the well-known ChatGPT, the AI-powered language model developed by OpenAI. Our experiments show that our method outperforms the baseline solutions and assesses the correctness of the AI-generated code similar to the human-based evaluation, which is considered the ground truth for the assessment in the field. Moreover, ACCA has a very strong correlation with the human evaluation (Pearson’s correlation coefficient r=0.84 on average). Finally, since it is a fully automated solution that does not require any human intervention, the proposed method performs the assessment of every code snippet in ~0.17s on average, which is definitely lower than the average time required by human analysts to manually inspect the code, based on our experience.

Published paper arXiv Dataset

Enhancing robustness of AI offensive code generators via data augmentation

Published at: Empirical Software Engineering
Authors: C. Improta, P. Liguori, R. Natella, B. Cukic, D. Cotroneo
Date: 01 Oct 2024

Abstract

Since manually writing software exploits for offensive security is time-consuming and requires expert knowledge, AI-base code generators are an attractive solution to enhance security analysts’ productivity by automatically crafting exploits for security testing. However, the variability in the natural language and technical skills used to describe offensive code poses unique challenges to their robustness and applicability. In this work, we present a method to add perturbations to the code descriptions to create new inputs in natural language (NL) from well-intentioned developers that diverge from the original ones due to the use of new words or because they miss part of them. The goal is to analyze how and to what extent perturbations affect the performance of AI code generators in the context of offensive code. First, we show that perturbed descriptions preserve the semantics of the original, non-perturbed ones. Then, we use the method to assess the robustness of three state-of-the-art code generators against the newly perturbed inputs, showing that the performance of these AI-based solutions is highly affected by perturbations in the NL descriptions. To enhance their robustness, we use the method to perform data augmentation, i.e., to increase the variability and diversity of the NL descriptions in the training data, proving its effectiveness against both perturbed and non-perturbed code descriptions.

Published paper arXiv Dataset

Detection Latencies of Anomaly Detectors - An Overlooked Perspective?

Published at: IEEE 35th International Symposium on Software Reliability Engineering (ISSRE)
Authors: T. Puccetti, A. Ceccarelli
Date: 01 Oct 2024

Abstract

The ever-evolving landscape of attacks, coupled with the growing complexity of ICT systems, makes crafting anomaly-based intrusion detectors and error detectors difficult: they must accurately detect attacks and promptly perform detections. Although improving and comparing the detection capability is the focus of most research works, the timeliness of the detection is less considered and often insufficiently evaluated or discussed. In this paper, we argue the relevance of measuring the temporal latency of attacks and errors, and we propose an evaluation approach for detectors to ensure a trade-off between correct and in-time detection. Briefly, the approach relates the false positive rate with the temporal latency of attacks and errors, ultimately leading to guidelines for configuring a detector. We apply our approach by evaluating different intrusion and error detectors in two industrial cases: i) an embedded railway on-board system that optimizes public mobility, and ii) an edge device for the Industrial Internet of Things. Our results show that considering latency in addition to traditional metrics like the false positive rate, precision, and coverage gives an additional fundamental perspective on the actual performance of the detector and should be considered when assessing and configuring anomaly detectors.

Published paper Dataset

Better and safer autonomous driving with predicted object relevance

Published at: IEEE 35th International Symposium on Software Reliability Engineering Workshops (ISSREW)
Authors: A. Ceccarelli, L. Montecchi
Date: 01 Oct 2024

Abstract

Object detection in autonomous driving consists in perceiving and locating instances of objects in multi-dimensional data, such as images or LIDAR scans. Very recently, multiple works are proposing to evaluate object detectors by measuring their ability to detect the objects that are most likely to interfere with the driving task. Detectors are then ranked according to their ability to detect objects that are relevant, rather than the general accuracy of detection. However, there is little evidence so far that isolating the most relevant objects may contribute to improvements in the safety and effectiveness of the driving task. This paper defines and exercises a strategy to i) set-up and deploy object detectors that successfully extract knowledge on object relevance, and ii) use such knowledge to improve the trajectory planning task. We show that, given the output of an object detector, filtering objects based on their predicted relevance, in combination with the usual confidence threshold, improves the quality of trajectories produced by the downstream trajectory planner. We conclude the paper showing that information on object relevance should be further exploited and we sketch some directions for future work.

Published paper Dataset

Anomaly-based error and intrusion detection in tabular data: No DNN outperforms tree-based classifiers

Published at: Future Generation Computer Systems
Authors: T. Zoppi, S. Gazzini, A. Ceccarelli
Date: 01 Nov 2024

Abstract

Recent years have seen a growing involvement of researchers and practitioners in crafting Deep Neural Networks (DNNs) that seem to outperform existing machine learning approaches for solving classification problems as anomaly-based error and intrusion detection. Undoubtedly, classifiers may be very diverse among themselves, and choosing one or another is typically due to the specific task and target system. Designing and training the optimal tabular data classifier requires extensive experimentation, sensitivity analyses, big datasets, and domain-specific knowledge that may not be available at will or considered a non-strategical asset by many companies and stakeholders. This paper compares, using a total of 23 public datasets: i) traditional (tree-based, statistical) supervised classifiers, ii) DNNs that are specifically designed for classifying tabular data, iii) DNNs for image classification that are applied to tabular data after converting data points into images, alone and as ensembles. Experimental results and related discussions show clear advantages in adopting tree-based classifiers for anomaly-based error and intrusion detection in tabular data as they outperform their competitors, including DNNs. Then, individual classifiers are compared against ensembles using different combinations of the classifiers considered in this study as base-learners, providing a unified final response through many meta-learning strategies. Results show that there is no benefit in building ensembles instead of using a tree-based classifier as Random Forests, eXtreme Gradient Boosting or Extra Trees. The paper concludes that anomaly-based error and intrusion detectors for critical systems should use the old (but gold) tree-based classifiers, which are also easier to fine-tune, and understand; plus, they require less time and resources to learn their model.

Published paper