Title: ALGORITHM FOR PREDICTING VULNERABILITIES IN SOFTWARE CODE USING TRANSFORMERS
Author: Pozdniakova Mariia Olehivna
Abstract: Keeping code safe grows harder with each release cycle. As repositories sprawl and continuous integration shortens review windows, buffer overflows and subtle logic errors slip past human reviewers. This article gathers, cross-checks, and re-weights results from a dozen peer-reviewed studies that fine-tuned CodeBERT, GraphCodeBERT, VulCoBERT, and related transformers on widely used benchmarks such as Devign, CWE-119, and the LineVul dataset. The approach is simple enough to drop into a CI pipeline yet expressive enough to surface data-flow anomalies that elude purely lexical models. Pooled statistics indicate that, relative to classical static analyzers, transformer-based detectors raise mean F1 by seven percentage points and cut false positives by roughly a quarter, although variance widens on cross-project splits. Our prototype, re-implemented from openly available assets, reproduces those figures to within two decimal places, close enough to guide engineering decisions. Through a small ablation replicated from prior papers, we further show that numeric-literal embeddings matter more than previously assumed, hinting at subtle type-inference cues. Compute cost? About two GPU-minutes per thousand functions, which is acceptable for nightly builds. Memory footprint, however, grows sharply when comments are retained, so stripping them before tokenization is a quick hygiene win for practitioners. By blending conceptual synthesis with re-run experiments, the paper offers a ready map for teams that must protect mixed-language stacks without a budget for large-scale labeling. Limitations remain: C and C++ dominate the evidence base, and zero-day coverage cannot yet be guaranteed. Even so, the direction is clear: properly configured transformers tilt the odds toward safer software. Future directions include distilling the network, benchmarking on Rust and Go, and surfacing contextual hints inside popular code editors such as VS Code.
Keywords: Software vulnerability prediction, transformer models, abstract syntax tree fusion, static code analysis, secure software engineering.
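The abstract reports that memory footprint grows when comments are kept and suggests removing them as a hygiene step. Below is a minimal sketch of that preprocessing, assuming C/C++ input and a Python pipeline that passes the cleaned text to the model's tokenizer. The function name `strip_c_comments` is hypothetical, and this is an illustration rather than the authors' implementation. Each comment is replaced by a single space, as the C preprocessor does, and newlines inside block comments are preserved so that line-level labels still match the original file.

```python
def strip_c_comments(source: str) -> str:
    """Remove // and /* */ comments from C/C++ source (illustrative sketch).

    Each comment becomes one space; newlines inside block comments are kept
    so line numbers stay aligned. String and char literals are copied as-is.
    Assumption: C++11 raw strings, backslash-newline splices, and C++14 digit
    separators (1'000) are NOT handled.
    """
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if ch == "/" and nxt == "/":
            # Line comment: skip up to (not past) the next newline.
            end = source.find("\n", i)
            i = n if end == -1 else end
            out.append(" ")
        elif ch == "/" and nxt == "*":
            # Block comment: emit one space plus any newlines it contained.
            end = source.find("*/", i + 2)
            body = source[i + 2:] if end == -1 else source[i + 2:end]
            out.append(" " + "\n" * body.count("\n"))
            i = n if end == -1 else end + 2
        elif ch in ('"', "'"):
            # String or char literal: copy verbatim, honouring backslash escapes.
            quote = ch
            out.append(ch)
            i += 1
            while i < n:
                c = source[i]
                out.append(c)
                i += 1
                if c == "\\" and i < n:
                    out.append(source[i])  # escaped char, e.g. \" or \'
                    i += 1
                elif c == quote or c == "\n":
                    break  # closing quote, or unterminated literal at EOL
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, `strip_c_comments('x = 1; /* note */')` returns `'x = 1;  '`, while a `"//"` sequence inside a string literal is left intact. A character-level scan is used instead of a regular expression so that comment markers inside literals are not mistaken for real comments.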