AutoPentest-DRL

We trained AutoPentest-DRL on a simulated corporate network (30 hosts, 4 subnets) for 50,000 episodes.

| Metric | Rule-based (Metasploit Pro) | AutoPentest-DRL (PPO) |
|--------|----------------------------|------------------------|
| Time to domain admin | 28 min (median) | 9 min |
| Exploit success rate (novel CVEs) | 12% | 67% |
| Detection avoidance | Static schedule | Adaptive (learned) |
| Actions to root (avg) | 142 | 53 |
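To make the size of these gaps easier to read, the relative improvements can be computed directly from the reported figures. This is only arithmetic on the numbers in the table; the variable names are illustrative, and the time comparison assumes the 9 min figure is comparable to the 28 min median, which the table does not state.

```python
# Figures copied from the results table; the dictionary keys are illustrative names.
rule_based = {"minutes_to_domain_admin": 28, "novel_cve_success": 0.12, "actions_to_root": 142}
drl_agent = {"minutes_to_domain_admin": 9, "novel_cve_success": 0.67, "actions_to_root": 53}

# How many times faster the DRL agent reached domain admin.
speedup = rule_based["minutes_to_domain_admin"] / drl_agent["minutes_to_domain_admin"]

# How many times higher the exploit success rate on novel CVEs was.
success_ratio = drl_agent["novel_cve_success"] / rule_based["novel_cve_success"]

# Fraction of actions saved on the way to root.
action_reduction = 1 - drl_agent["actions_to_root"] / rule_based["actions_to_root"]

print(f"Time to domain admin: {speedup:.2f}x faster")
print(f"Novel-CVE success rate: {success_ratio:.2f}x higher")
print(f"Actions to root: {action_reduction:.1%} fewer")
```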

The DRL agent learned non-obvious sequences, e.g., scan → exploit SMBGhost → pivot via PsExec → credential harvest from LSASS, a chain that is not hardcoded in any rule set.

Traditional automated penetration testing tools (e.g., Metasploit, OpenVAS) follow static, rule-based decision trees. While efficient for known vulnerabilities, they fail to adapt to dynamic, multi-stage attack surfaces. This article introduces AutoPentest-DRL, a novel framework that models the penetration testing process as a Markov Decision Process (MDP) and optimizes attack paths using Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO).
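The MDP framing can be illustrated with a deliberately tiny, abstract example. The sketch below is not the framework's implementation: the states, actions, rewards, and hyperparameters are all hypothetical, and it uses tabular Q-learning as a simple stand-in for the DQN and PPO agents named above. It only shows how a stage ordering can emerge from rewards rather than from hardcoded rules.

```python
import random
from collections import defaultdict

# Hypothetical abstract stages for a single simulated host (not from the framework).
ACTIONS = ["scan", "enumerate", "exploit"]

# Deterministic toy transitions: (state, action) -> next state.
# Any other pair leaves the state unchanged.
TRANSITIONS = {
    ("unknown", "scan"): "scanned",
    ("scanned", "enumerate"): "enumerated",
    ("enumerated", "exploit"): "compromised",
}


def step(state, action):
    """Apply one action and return (next_state, reward, done)."""
    next_state = TRANSITIONS.get((state, action), state)
    done = next_state == "compromised"
    reward = 10.0 if done else -1.0  # assumed reward: each step costs 1, the goal pays 10
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=50, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) to an estimated return
    for _ in range(episodes):
        state = "unknown"
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit estimates
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, max_steps=10):
    """Follow the learned greedy policy from the initial state."""
    state, path = "unknown", []
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        path.append(action)
        state, _, done = step(state, action)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

In this toy setting the per-step penalty is what pushes the agent toward the shortest chain. A realistic state space (30 hosts, 4 subnets) is far too large for a table, which is why function approximators such as DQN or PPO are used instead.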

AutoPentest-DRL is an automated testing framework that integrates deep reinforcement learning (DRL) to generate, prioritize, and execute test cases for software systems. It aims to improve test coverage, find complex bugs, and optimize testing efficiency by learning testing strategies from interactions with the application under test (AUT).

The average episodic reward converged after approximately 7,000 episodes. The agent initially attempted random exploits but rapidly learned to prioritize (1) network scanning, (2) service enumeration, (3) targeted exploitation, and (4) lateral movement.
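The text does not say how convergence was judged. One common way to make such a statement checkable is to smooth the per-episode rewards with a moving average and find the point after which the smoothed curve stays close to its final value. The sketch below assumes that criterion; the function names, window size, and tolerance are illustrative choices, not values from the experiment.

```python
def moving_average(values, window):
    """Return the simple moving average of `values` over `window` points."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one
        averages.append(running / window)
    return averages


def convergence_episode(rewards, window=100, tolerance=0.01):
    """Return the index of the first episode after which the smoothed reward
    stays within `tolerance` of its final value."""
    smoothed = moving_average(rewards, window)
    final = smoothed[-1]
    first_stable = len(smoothed) - 1
    # Walk backwards until the smoothed curve leaves the tolerance band.
    for i in range(len(smoothed) - 1, -1, -1):
        if abs(smoothed[i] - final) > tolerance:
            break
        first_stable = i
    # Each smoothed value covers episodes i .. i + window - 1.
    return first_stable + window - 1


if __name__ == "__main__":
    # Synthetic reward curve: rises linearly, then plateaus at 50.
    synthetic = [min(t, 50) for t in range(200)]
    print(convergence_episode(synthetic, window=10, tolerance=0.5))
```

This check is retrospective: it needs the full reward history and treats the final smoothed value as the converged level, so it cannot tell a true plateau from a run that was simply stopped early.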

Traditional automation tools like Metasploit’s resource scripts or Nmap’s NSE (Nmap Scripting Engine) are deterministic and linear. They follow "if-this-then-that" logic: for example, if port 443 is open, run an SSL vulnerability scan. This rigidity fails in novel environments where vulnerabilities are chained in non-obvious ways.
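The "if-this-then-that" pattern can be sketched as a fixed lookup from an observation to a follow-up action. The rule table and action names below are hypothetical placeholders, not Metasploit or NSE syntax; the point is only that the output depends on the open ports alone and never on what earlier steps discovered.

```python
# Hypothetical static rule table: (open port, fixed follow-up action).
RULES = [
    (443, "run_ssl_vulnerability_scan"),
    (445, "run_smb_checks"),
    (22, "run_ssh_audit"),
]


def plan_actions(open_ports):
    """Return the follow-up actions in rule order for the given open ports."""
    ports = set(open_ports)
    return [action for port, action in RULES if port in ports]


if __name__ == "__main__":
    print(plan_actions([22, 443]))
    print(plan_actions([8080]))  # no matching rule, so nothing is planned
```

Because the plan is a pure function of the port list, two hosts with the same open ports always get the same actions in the same order, regardless of credentials found or pivots gained along the way. A learned policy conditions on that accumulated state instead.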

AutoPentest-DRL offers three distinct advantages: