Violent Python
- Authors: R. Natella, P. Liguori, C. Improta, B. Cukic, D. Cotroneo
- Date: 01 Feb 2024
- Paper: AI Code Generators for Security: Friend or Foe?
- Published at: IEEE Security & Privacy Magazine
This dataset has been designed for training and evaluating AI code generators for security. Each sample consists of a piece of Python code, and the corresponding description in natural language (English).
We built the dataset by using the popular book Violent Python, by T. J. O’Connor, which presents several examples of offensive programs using the Python language. The dataset covers multiple areas of offensive security, including penetration testing, forensic analysis, network traffic analysis, and OSINT and social engineering.
The dataset consists of 1,372 unique samples. We describe offensive code in natural language at the granularity of individual lines, groups of lines (blocks), and entire functions.
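For illustration only, a minimal sketch of iterating over such NL/code pairs in Python; the file name and the JSON Lines layout with description, code, and granularity fields are assumptions, not the dataset's documented format:

```python
import json

def load_pairs(path="violent_python.jsonl"):
    # Hypothetical layout: one JSON object per line with "description", "code",
    # and "granularity" fields (line, block, or function).
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            sample = json.loads(line)
            yield sample["description"], sample["code"], sample.get("granularity")

for description, code, granularity in load_pairs():
    print(f"[{granularity}] {description}\n{code}\n")
```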
In the paper, we used this dataset to assess how well three AI code generators (CodeBERT, GitHub Copilot, Amazon CodeWhisperer) generate offensive Python code.
PoisonPy
- Authors: D. Cotroneo, C. Improta, P. Liguori, R. Natella
- Date: 01 Apr 2024
- Paper: Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks
- Published at: 32nd IEEE/ACM International Conference on Program Comprehension (ICPC)
This dataset has been designed to perform a targeted data poisoning attack on AI code generators, leading them to generate vulnerable code. Each sample consists of a piece of Python code, and the corresponding description in natural language (English).
The dataset contains 823 unique pairs of natural language description and Python code snippet, including both safe and unsafe (i.e., containing vulnerable functions or bad patterns) code snippets.
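As a rough illustration of the attack setting (not the paper's actual procedure), a targeted poisoning step could swap the safe snippet of a fraction of training pairs for an unsafe counterpart while leaving the natural language description untouched; field names and the safe-to-unsafe mapping below are assumptions:

```python
import random

def poison(samples, unsafe_lookup, rate=0.1, seed=0):
    """samples: list of {"description": ..., "code": ...} dicts.
    unsafe_lookup: maps a safe snippet to a functionally similar unsafe one."""
    rng = random.Random(seed)
    poisoned = []
    for sample in samples:
        unsafe = unsafe_lookup.get(sample["code"])
        if unsafe is not None and rng.random() < rate:
            # Keep the description, silently swap in the vulnerable implementation.
            sample = {**sample, "code": unsafe}
        poisoned.append(sample)
    return poisoned
```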
The detailed organization of the dataset is described in the README.md file.
To build the dataset, we combined the only two benchmark datasets available at the time for evaluating the security of AI-generated code, SecurityEval and LLMSecEval. Both corpora are built from several sources, including the CodeQL and SonarSource documentation and MITRE’s CWE.
PoisonPy covers a total of 34 CWEs from the OWASP Top 10 categorization, 12 of which fall into MITRE’s Top 40.
In the paper, we used the dataset to assess the susceptibility of three AI code generators (Seq2Seq, CodeBERT, CodeT5+) to our targeted data poisoning attack.
ROSPaCe
- Authors: Tommaso Puccetti, Simone Nardi, Cosimo Cinquilli, Tommaso Zoppi, Andrea Ceccarelli
- Date: 01 May 2024
- Paper: ROSPaCe: Intrusion Detection Dataset for a ROS2-Based Cyber-Physical System and IoT Networks
- Published at: Scientific Data, Vol. 11.1, Article no. 481
ROSPaCe is a dataset for intrusion detection built by performing penetration testing on SPaCe, an embedded cyber-physical system based on Robot Operating System 2 (ROS2). Features are monitored from three architectural layers: the Linux operating system, the network, and the ROS2 services. The attacks consist of discovery and DoS attacks, for a total of 6 attacks, 3 of which are specific to ROS2. We collect data from the network interfaces, the operating system, and ROS2, and we merge the observations into a single dataset using timestamps. Each data point is labeled to indicate whether it was recorded during normal (attack-free) operation or while the system was under attack.
The dataset is organized as a time series that alternates sequences of normal (attack-free) operation with sequences in which attacks are carried out on top of the normal operations. The goal of this strategy is to reproduce multiple scenarios of an attacker trying to penetrate the system. Notably, this allows measuring the time to detect an attacker and the number of malicious activities performed before detection; it also allows training an intrusion detector to minimize both, by taking advantage of the numerous alternating periods of normal and attack operation.
The final version of ROSPaCe includes 30,247,050 data points and 482 columns excluding the label. The features are 25 from the Linux operating system, 5 from the ROS2 services, and 422 from the network. The dataset is encoded in the complete_dataset.csv file, for a total of 40.5 GB. It contains about 23 million attack data points and over 6.5 million normal data points (78% attacks, 22% normal). We also provide a lightweight version of ROSPaCe by selecting the 60 best-performing features: the 30 features from the Linux operating system and the ROS2 services, plus the 30 best-performing features from the network.
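Given the file size, the CSV is best processed in chunks. A minimal sketch, assuming a label column named label with 1 marking attack data points; the actual column name and encoding should be checked against the dataset documentation:

```python
import pandas as pd

attack_rows = normal_rows = 0
# Stream the 40.5 GB file in chunks instead of loading it into memory at once.
for chunk in pd.read_csv("complete_dataset.csv", chunksize=1_000_000):
    counts = chunk["label"].value_counts()
    attack_rows += int(counts.get(1, 0))   # recorded while the system is under attack
    normal_rows += int(counts.get(0, 0))   # attack-free operation
print(attack_rows, normal_rows)
```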
PowerShell Offensive Code Generation
- Authors: P. Liguori, C. Marescalco, R. Natella, V. Orbinato, L. Pianese
- Date: 01 Aug 2024
- Paper: The Power of Words: Generating PowerShell Attacks from Natural Language
- Published at: 18th USENIX WOOT Conference on Offensive Technologies (WOOT 24)
This repo provides a replication package for the paper The Power of Words: Generating PowerShell Attacks from Natural Language, presented at the 18th USENIX WOOT Conference on Offensive Technologies (WOOT 2024).
In this paper, we present an extensive evaluation of state-of-the-art NMT models in generating PowerShell offensive commands.
We also contribute a large collection of unlabeled samples of general-purpose PowerShell code, used to pre-train NMT models and refine their ability to comprehend and generate PowerShell code. We then build a manually labeled dataset of PowerShell code samples specifically crafted for security applications, each paired with a curated natural language description in English.
We use this dataset to pre-train and fine-tune:
- CodeT5+
- CodeGPT
- CodeGen
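As a minimal illustration of the fine-tuning setup (the checkpoint name, example pair, and hyperparameters below are ours, not the paper's exact configuration), a seq2seq model such as CodeT5+ can be fine-tuned on NL-to-PowerShell pairs with the transformers library:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Salesforce/codet5p-220m"          # assumed checkpoint, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy NL description / PowerShell command pair (not taken from the dataset).
pairs = [("List running processes with their owners", "Get-Process -IncludeUserName")]

model.train()
for description, command in pairs:
    inputs = tokenizer(description, return_tensors="pt", truncation=True)
    labels = tokenizer(command, return_tensors="pt", truncation=True).input_ids
    loss = model(**inputs, labels=labels).loss   # teacher-forced seq2seq loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```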
We also evaluate the models with:
- Static Analysis, in which the generated code is assessed to ensure that it adheres to PowerShell programming conventions (see the sketch after this list)
- Execution Analysis, which evaluates the capability of the generated offensive PowerShell code to execute malicious actions
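As a stand-in for the static analysis step (not the paper's implementation), one way to check that a generated command at least parses is to hand it to PowerShell's own language parser, assuming pwsh is available on the system:

```python
import os
import subprocess

def parses_cleanly(ps_code: str) -> bool:
    # Ask PowerShell's parser for the number of parse errors in the snippet;
    # the snippet is passed via an environment variable to avoid quoting issues.
    probe = (
        "$tokens = $null; $errors = $null; "
        "[System.Management.Automation.Language.Parser]::ParseInput("
        "$env:PS_SNIPPET, [ref]$tokens, [ref]$errors) | Out-Null; "
        "exit $errors.Count"
    )
    result = subprocess.run(
        ["pwsh", "-NoProfile", "-Command", probe],
        env={**os.environ, "PS_SNIPPET": ps_code},
        capture_output=True,
    )
    return result.returncode == 0

print(parses_cleanly("Get-Process -IncludeUserName"))
```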
The project includes scripts and data to repeat the training/testing experiments and replicate evaluations.
Robustness of AI Code Generators
- Authors: C. Improta, P. Liguori, R. Natella, B. Cukic, D. Cotroneo
- Date: 01 Oct 2024
- Paper: Enhancing robustness of AI offensive code generators via data augmentation
- Published at: Empirical Software Engineering
This repository contains the code, the dataset and the experimental results related to the paper Enhancing Robustness of AI Offensive Code Generators via Data Augmentation.
The paper presents a data augmentation method to perturb the natural language (NL) code descriptions used to prompt AI-based code generators and automatically generate offensive code. This method is used to create new code descriptions that are semantically equivalent to the original ones, and then to assess the robustness of 3 state-of-the-art code generators against unseen inputs. Finally, the perturbation method is used to perform data augmentation, i.e., increase the diversity of the NL descriptions in the training data, to enhance the models’ performance against both perturbed and non-perturbed inputs.
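For illustration only, a toy version of such perturbations (word substitution and word omission) applied to an NL description could look like the following; the substitution table and probability are placeholders, not the paper's perturbation sets:

```python
import random

# Toy substitution table; the paper's perturbation sets are different.
SUBSTITUTIONS = {"copy": "duplicate", "register": "reg", "value": "content"}

def perturb(description: str, p_omit: float = 0.1, seed: int = 0) -> str:
    rng = random.Random(seed)
    # Word substitution: replace known words with a semantically close variant.
    words = [SUBSTITUTIONS.get(w.lower(), w) for w in description.split()]
    # Word omission: drop each word with probability p_omit, keeping at least one word.
    kept = [w for w in words if rng.random() >= p_omit]
    return " ".join(kept or words[:1])

print(perturb("copy the value of the eax register into ebx"))
```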
This repository contains:
- Extended Shellcode IA32, the assembly dataset used for the experiments, which we developed by extending the publicly available Shellcode IA32 dataset for automatically generating shellcodes from NL descriptions. This extended version contains 5,900 unique pairs of assembly code snippets/English intents, including 1,374 intents (~23% of the dataset) that generate multiple lines of assembly code (e.g., whole functions).
- The source code to replicate the injection of perturbations by performing word substitutions or word omissions on the NL code descriptions (code folder). This folder also contains a README.md file detailing how to set up the project, how to change the dataset if needed, and how to run the code.
- The results we obtained by feeding the perturbed code descriptions to the AI models, i.e., Seq2Seq, CodeBERT and CodeT5+ (paper results folder). This folder also contains the evaluation of the models’ performance on single-line vs. multi-line code snippets and the results of a survey we conducted to manually assess the semantic equivalence of perturbed NL descriptions to their original counterparts.