An SMT-LIB Format for Sequences and Regular Expressions

An SMT-LIB Format for Sequences and Regular Expressions

Nikolaj Bjørner, Vijay Ganesh, Raphaël Michel and Margus Veanes, An SMT-LIB Format for Sequences and Regular Expressions, 10th International Workshop on Satisfiability Modulo Theories, (Affiliated with IJCAR 2012), Manchester, UK, June 30-July 1, 2012

Strings are ubiquitous in software. Tools for verification and testing of software rely in various degrees on reasoning about strings. Web applications are particularly important in this context since they tend to be string-heavy and have large number security errors attributable to improper string sanitzation and manipulations. In recent years, many string solvers have been implemented to address the analysis needs of verification, testing and security tools aimed at string-heavy applications. These solvers support a basic representation of strings, functions such as concatenation, extraction, and predicates such as equality and membership in regular expressions. However, the syntax and semantics supported by the current crop of string solvers are mutually incompatible. Hence, there is an acute need for a standardized theory of strings (i.e., SMT-LIBization of a theory of strings) that supports a core set of functions, predicates and string representations.
This paper presents a proposal for exactly such a standardization effort, i.e., an SMT-LIBization of strings and regular expressions. It introduces a theory of sequences general-izing strings, and builds a theory of regular expressions on top of sequences. The proposed logic QF BVRE is designed to capture a common substrate among existing tools for string constraint solving.