YAPF Bug Caused By a Python Library Bug

Python logo YAPF is an automatic formatter developed by Google and used by many projects including the one I’m currently working. The usage of automatic formatters is a controversial topic, with both pros and cons. However, this post is not about that. While working on some Python code, I encountered a YAPF crash for completely normal, albeit a bit unusual, Python code that defines a tuple with typing hints.

A simplified version of the code causing the crash is as follows:

t: tuple = 1, 2

Checking Validity of The Bug

Crashes are an unacceptable behavior for any tool, especially for input that is not erroneous. Nevertheless, it is crucial to double-check the correctness of the input before reporting or conducting further investigation.

The first obvious step in this case is of course to attempt to execute the code using the Python interpreter. As expected, this did not cause any issues.

Small manipulations with the code, like simply adding parentheses around 1, 2 or removing type hint could be a workaround to make YAPF work as expected.

a: tuple = (1, 2)
b = 1, 2

I also double-checked on python.org, and this is indeed a valid way to define a tuple in Python.

First Look At The Crash And YAPF Source Code

Here is the complete output of YAPF crash obtained on my laptop:

Traceback (most recent call last):
  File "/<omitted>/yapf/0.32.0/libexec/lib/python3.11/site-packages/yapf/yapflib/pytree_utils.py", line 119, in ParseCodeToTree
    tree = parser_driver.parse_string(code, debug=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<omitted>/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/lib2to3/pgen2/driver.py", line 103, in parse_string
    return self.parse_tokens(tokens, debug)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<omitted>/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/lib2to3/pgen2/driver.py", line 71, in parse_tokens
    if p.addtoken(type, value, (prefix, start)):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<omitted>/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/lib2to3/pgen2/parse.py", line 162, in addtoken
    raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=12, value=',', context=('', (1, 12))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/<omitted>/yapf/0.32.0/libexec/lib/python3.11/site-packages/yapf/yapflib/yapf_api.py", line 183, in FormatCode
    tree = pytree_utils.ParseCodeToTree(unformatted_source)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<omitted>/yapf/0.32.0/libexec/lib/python3.11/site-packages/yapf/yapflib/pytree_utils.py", line 125, in ParseCodeToTree
    tree = parser_driver.parse_string(code, debug=False)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<omitted>/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/lib2to3/pgen2/driver.py", line 103, in parse_string
    return self.parse_tokens(tokens, debug)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<omitted>/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/lib2to3/pgen2/driver.py", line 71, in parse_tokens
    if p.addtoken(type, value, (prefix, start)):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<omitted>/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/lib2to3/pgen2/parse.py", line 162, in addtoken
    raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=12, value=',', context=('', (1, 12))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/<omitted>/yapf/0.32.0/libexec/lib/python3.11/site-packages/yapf/__init__.py", line 225, in _FormatFile
    reformatted_code, encoding, has_change = yapf_api.FormatFile(
                                             ^^^^^^^^^^^^^^^^^^^^
  File "/<omitted>/yapf/0.32.0/libexec/lib/python3.11/site-packages/yapf/yapflib/yapf_api.py", line 96, in FormatFile
    reformatted_source, changed = FormatCode(
                                  ^^^^^^^^^^^
  File "/<omitted>/yapf/0.32.0/libexec/lib/python3.11/site-packages/yapf/yapflib/yapf_api.py", line 186, in FormatCode
    raise errors.YapfError(errors.FormatErrorMsg(e))
                           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<omitted>/yapf/0.32.0/libexec/lib/python3.11/site-packages/yapf/yapflib/errors.py", line 37, in FormatErrorMsg
    return '{}:{}:{}: {}'.format(e.args[1][0], e.args[1][1], e.args[1][2], e.msg)
                                 ~~~~~~^^^
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/<omitted>/bin/yapf", line 33, in <module>
    sys.exit(load_entry_point('yapf==0.32.0', 'console_scripts', 'yapf')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<omitted>/yapf/0.32.0/libexec/lib/python3.11/site-packages/yapf/__init__.py", line 360, in run_main
    sys.exit(main(sys.argv))
             ^^^^^^^^^^^^^^
  File "/<omitted>/yapf/0.32.0/libexec/lib/python3.11/site-packages/yapf/__init__.py", line 124, in main
    changed = FormatFiles(
              ^^^^^^^^^^^^
  File "/<omitted>/yapf/0.32.0/libexec/lib/python3.11/site-packages/yapf/__init__.py", line 202, in FormatFiles
    changed |= _FormatFile(filename, lines, style_config, no_local_style,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<omitted>/yapf/0.32.0/libexec/lib/python3.11/site-packages/yapf/__init__.py", line 236, in _FormatFile
    raise errors.YapfError(errors.FormatErrorMsg(e))
                           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<omitted>/yapf/0.32.0/libexec/lib/python3.11/site-packages/yapf/yapflib/errors.py", line 37, in FormatErrorMsg
    return '{}:{}:{}: {}'.format(e.args[1][0], e.args[1][1], e.args[1][2], e.msg)
                                 ~~~~~~^^^
IndexError: tuple index out of range

A similar, but shorter crash dump can be obtained by adding a simple unit test to the YAPF repository like this:

diff --git a/yapftests/reformatter_python3_test.py b/yapftests/reformatter_python3_test.py
index b5d68e8..d1453ca 100644
--- a/yapftests/reformatter_python3_test.py
+++ b/yapftests/reformatter_python3_test.py
@@ -468,6 +468,13 @@ def rrrrrrrrrrrrrrrrrrrrrr(
     llines = yapf_test_helper.ParseAndUnwrap(unformatted_code)
     self.assertCodeEqual(expected_formatted_code, reformatter.Reformat(llines))
 
+  def testTypedTuple(self):
+    code = """\
+t: tuple = 1, 2
+"""
+    llines = yapf_test_helper.ParseAndUnwrap(code)
+    self.assertCodeEqual(code, reformatter.Reformat(llines))
 
 if __name__ == '__main__':
   unittest.main()

From the stack trace, it is obvious that the exception is raised by the Python standard lib2to3 library. YAPF appears to use this library to build the syntax tree of the input source code.

Reproducing The Bug Without YAPF

The next logical step would be to try to reproduce the bug by calling lib2to3 directly, without the additional functionality of YAPF. In this case, it was quite simple and it only took a few minutes to get a relatively small reproducer for the issue:

from lib2to3.pgen2 import driver
from lib2to3 import pygram
from lib2to3 import pytree

parser_driver = driver.Driver(pygram.python_grammar, convert=pytree.convert)
code = """\
a: tuple = 1, 2
"""
tree = parser_driver.parse_string(code)

The most relevant part of the stack trace is exactly the same as the one observed using YAPF:

Traceback (most recent call last):
  File "/Users/evgenii/Projects/yulyugin.github.io/test.py", line 9, in <module>
    tree = parser_driver.parse_string(code)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/lib2to3/pgen2/driver.py", line 103, in parse_string
    return self.parse_tokens(tokens, debug)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/lib2to3/pgen2/driver.py", line 71, in parse_tokens
    if p.addtoken(type, value, (prefix, start)):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/lib2to3/pgen2/parse.py", line 162, in addtoken
    raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=12, value=',', context=('', (1, 12))

Studying lib2to3 implementation

I downloaded Python source code and changed my setup to use lib2to3 from the repository instead of installation.

It was simple to discover that lib2to3 gets its grammar description from the Grammar.txt file, which appears to use BNF-like notation to describe the syntax of Python 2.x and 3.x.

However, I have halted my investigation at this point as I received the following warning from the development version:

DeprecationWarning: lib2to3 package is deprecated and may not be able to parse Python 3.10+

It does not make sense to spend time fixing a deprecated product, so I am concluding my work and reporting the issue to the YAPF team. Perhaps I will return to this in the future if I will want to practice some grammar specification.

Written on February 10, 2023.