A Myhill-Nerode Theorem for Register Automata and Symbolic Trace Languages

F.W. Vaandrager and A. Midya. A Myhill-Nerode Theorem for Register Automata and Symbolic Trace Languages. In Theoretical Computer Science, 2022. Earlier version appeared in Violet Ka I Pun, Volker Stolz, Adenilso Simao, editors. Theoretical Aspects of Computing - ICTAC 2020 - 17th International Colloquium, Macau, China, November 30 - December 4, 2020, Proceedings. Lecture Notes in Computer Science 12545, pages 43-63, Springer 2020. Also available as CoRR arXiv:2007.03540, July 2020.


We propose a new symbolic trace semantics for register automata (extended finite state machines) which records both the sequence of input symbols that occur during a run and the constraints on input parameters that are imposed by this run. Our main result is a generalization of the classical Myhill-Nerode theorem to this symbolic setting. Whereas the Myhill-Nerode theorem refers to a single equivalence relation on words, and constructs a DFA in which states are equivalence classes, our generalization requires the use of three relations to capture the additional structure of register automata. Location equivalence captures that symbolic traces end in the same location, transition equivalence captures that they share the same final transition, and a partial equivalence relation captures that symbolic values v and v' are stored in the same register after symbolic traces w and w', respectively. A symbolic language is defined to be regular if location, transition and register relations exist that satisfy certain conditions; in particular, they must all have finite index. We show that the symbolic language associated with a register automaton is regular, and we construct, for each regular symbolic language, a register automaton that accepts this language. Our result provides a foundation for grey-box learning algorithms in settings where the constraints on data parameters can be extracted from code using, e.g., tools for symbolic/concolic execution or tainting. We believe that moving to a grey-box setting is essential to overcome the scalability problems of state-of-the-art black-box learning algorithms.
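For readers who want a concrete picture of the classical construction that the abstract generalizes, the following is a minimal Python sketch, not taken from the paper. It approximates the Nerode equivalence of an ordinary word language by comparing prefixes on a bounded set of test suffixes, then builds a DFA whose states are the resulting classes. The function names (words, nerode_classes, build_dfa), the bounds max_prefix and max_suffix, and the example language (an even number of a's) are all assumptions made for illustration. The bounded approximation is only exact when the bounds are large enough for the language at hand; it does not cover the symbolic, register-based setting of the paper.

```python
from itertools import product


def words(alphabet, max_len):
    """Yield all words over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def nerode_classes(member, alphabet, max_prefix=3, max_suffix=2):
    """Group prefixes by how they behave under every bounded test suffix.

    Two prefixes u, v land in the same class when member(u + s) == member(v + s)
    for all suffixes s up to length max_suffix (a bounded stand-in for the
    Nerode equivalence, which quantifies over all suffixes).
    """
    suffixes = list(words(alphabet, max_suffix))
    classes = {}
    for u in words(alphabet, max_prefix):
        row = tuple(member(u + s) for s in suffixes)
        classes.setdefault(row, []).append(u)
    return classes, suffixes


def build_dfa(member, alphabet, max_prefix=3, max_suffix=2):
    """Build a DFA whose states are the shortest representatives of each class."""
    classes, suffixes = nerode_classes(member, alphabet, max_prefix, max_suffix)
    rep = {row: us[0] for row, us in classes.items()}

    def row_of(u):
        return tuple(member(u + s) for s in suffixes)

    delta = {}
    for r in rep.values():
        for a in alphabet:
            row = row_of(r + a)
            if row not in rep:
                # Assumption violated: the bounds are too small for this language.
                raise ValueError("bounds too small: successor class not observed")
            delta[(r, a)] = rep[row]
    initial = rep[row_of("")]
    accepting = {r for r in rep.values() if member(r)}
    return initial, delta, accepting


def accepts(dfa, word):
    """Run the DFA on `word` and report whether it ends in an accepting state."""
    state, delta, accepting = dfa
    for a in word:
        state = delta[(state, a)]
    return state in accepting


def even_a(w):
    """Hypothetical example language: words with an even number of a's."""
    return w.count("a") % 2 == 0


dfa = build_dfa(even_a, "ab")
```

With the example language, the sketch finds two classes (an even or an odd number of a's seen so far), so the DFA has two states. The paper's generalization replaces this single relation with location, transition and register relations over symbolic traces.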

Journal version (publisher website)
Conference version (publisher website)
Paper (local copy)
Slides (pdf)
Presentation (mp4)