Consider a block of code that generates a run-time error:
```python
#!/usr/bin/env python
import os

import pandas
from sqlalchemy import create_engine

# Autosomes 1-22 plus the sex chromosomes
chromosomes = list(range(1, 23)) + ['X', 'Y']

# Create a database connection with SQLAlchemy
# (the DATABASE environment variable holds the connection URL)
db = create_engine(os.environ['DATABASE'])

# Get data directly from a 'mutation' table in the database
data = pandas.read_sql("SELECT chr, start_pos, end_pos FROM mutation", db)

# Try to process each chromosome
for chrom in chromosomes:
    ## Throws an error for the 'X' and 'Y' chromosomes
    for entry in data[data.chr == chrom].itertuples():
        pass  # per-mutation processing goes here
```
How would you go about debugging it?
For me, the process usually works something like this:
| How sure am I? | What I know | What I do | Example |
|---|---|---|---|
| Oh, I know! | I know exactly what and where the error is. | Fix and continue. | Used '=' instead of '==' for an if comparison. |
| Sort of sure. | I can pin the blame on a few variables or lines of code. | Try some print statements. | Inconsistent sex chromosome encoding (23 vs. X/Y). |
| No idea... | I don't know and need to interact with the program to find out. | Time for a debugger. | Unexpected values like 'MT' as a chromosome. |
Most coders are familiar with using print statements to check variable values, but this has limitations, especially for long-running code or complicated data structures: every new question means editing the code and running it again. A better approach is to use a debugger and set breakpoints on the lines you suspect are causing the error. When execution reaches a breakpoint, the program pauses and gives you an interactive shell (like IPython) where you can query variable values directly or test code changes. Python has several debuggers, but the one I prefer is ipdb, which can be installed and used like this:
On the command line:
pip install ipdb
In your code (on the line where you want a breakpoint):
import ipdb; ipdb.set_trace()
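On Python 3.7 and later there is also a built-in `breakpoint()` function (PEP 553). It calls `sys.breakpointhook()`, which starts pdb by default; setting the environment variable `PYTHONBREAKPOINT=ipdb.set_trace` routes it to ipdb instead, and `PYTHONBREAKPOINT=0` turns every breakpoint off without editing the code. The sketch below uses a hypothetical helper, `find_unexpected_chromosomes`, that pauses on any label outside 1-22/X/Y, such as 'MT' or 23 from the table above.

```python
"""Pause in the debugger whenever an unexpected chromosome label turns up.

Hypothetical helper for illustration; requires Python 3.7+ for breakpoint().
Run with PYTHONBREAKPOINT=ipdb.set_trace to get ipdb instead of pdb,
or with PYTHONBREAKPOINT=0 to skip the breakpoints entirely.
"""
import sys

# Labels we expect: autosomes 1-22 plus the sex chromosomes, as strings.
EXPECTED = {str(n) for n in range(1, 23)} | {"X", "Y"}


def find_unexpected_chromosomes(labels):
    """Return the labels that are not 1-22, 'X' or 'Y'."""
    unexpected = []
    for label in labels:
        if str(label) not in EXPECTED:
            # Hands control to sys.breakpointhook(), so `label` and
            # `unexpected` can be inspected interactively at this point.
            breakpoint()
            unexpected.append(label)
    return unexpected


if __name__ == "__main__":
    # Chromosome labels are taken from the command line for this demo.
    print(find_unexpected_chromosomes(sys.argv[1:]))
```

For example, `PYTHONBREAKPOINT=ipdb.set_trace python find_unexpected.py 1 X MT` (the file name is arbitrary) would stop in ipdb when it reaches 'MT'.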
Now here's a question:
When would you ever not want to launch a debugger when your code encounters an exception?
After all, wouldn't it be nice to start debugging the moment an exception occurs, instead of adding print statements or breakpoints on the offending line and rerunning the code? One answer is that the code might be running in a production environment where interactive shells aren't available or even desirable; there, the preferred outcome is to write the error to a log file for later debugging. Fine. But can't Python tell production and development environments apart?
Fortunately, it can. One example can be found here. In short, this code snippet replaces Python's default exception hook (`sys.excepthook`) so that an uncaught exception launches a debugger whenever an interactive terminal (TTY) is available, i.e. when you are running the script from a terminal.
This code is written for Python's built-in debugger, pdb, a plainer version of ipdb. Here it is, modified to work with ipdb:
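sitecustomize.py (save it in a directory on Python's module search path, such as /usr/lib/python[version]/site-packages/ or a directory listed in PYTHONPATH; Python's site module imports it automatically at startup)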
```python
# code snippet, to be included in 'sitecustomize.py'
import sys


def info(exc_type, value, tb):
    if hasattr(sys, 'ps1') or not sys.stderr.isatty():
        # We are in interactive mode or we don't have a tty-like
        # device, so we call the default hook
        sys.__excepthook__(exc_type, value, tb)
    else:
        import traceback
        import ipdb
        # We are NOT in interactive mode, print the exception...
        traceback.print_exception(exc_type, value, tb)
        print()
        # ...then start the debugger in post-mortem mode on this traceback.
        ipdb.post_mortem(tb)


sys.excepthook = info
```
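If you're not sure which directory that is for a particular interpreter (a system Python, a virtualenv, and so on), the standard library can tell you. A small sketch using `sysconfig` and `site`; the helper name `customize_dirs` is made up for illustration.

```python
import site
import sys
import sysconfig


def customize_dirs():
    """Return candidate directories for sitecustomize.py for this interpreter."""
    dirs = [
        # site-packages of the running interpreter (also correct inside a venv)
        sysconfig.get_paths()["purelib"],
    ]
    if site.ENABLE_USER_SITE:
        # Per-user site-packages; the site module also looks for an
        # optional usercustomize.py here when user site-packages are enabled.
        dirs.append(site.getusersitepackages())
    return dirs


if __name__ == "__main__":
    print(sys.executable)
    for path in customize_dirs():
        print(path)
```

Running it with the same `python` you use for your scripts shows where that interpreter will look.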
Try it out! Hope this makes your debugging tasks a little bit easier!
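Update 7/7/2016: If your script's standard output (STDOUT) is redirected somewhere other than the terminal, for example to a file, this won't work! The hook only checks whether stderr is a terminal, so the debugger still starts, but ipdb's prompt and output are written to the redirected destination as well.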
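One possible workaround, sketched here rather than taken from the original recipe, is to start the debugger only when stdin, stdout and stderr are all terminals, so that a redirected run falls back to the normal traceback. The names `wants_debugger` and `debug_excepthook` are made up for this example, and ipdb is imported only when a debugger is actually about to start.

```python
# sitecustomize.py variant: skip the debugger when any standard stream is redirected
import sys
import traceback


def wants_debugger(stdin, stdout, stderr, interactive):
    """True only for a non-interactive run with all three streams on a terminal."""
    if interactive:
        # Already in a REPL (sys.ps1 is set): keep the normal behaviour.
        return False
    # A stream can be None (e.g. under pythonw), which counts as "not a tty".
    return all(s is not None and s.isatty() for s in (stdin, stdout, stderr))


def debug_excepthook(exc_type, value, tb):
    if not wants_debugger(sys.stdin, sys.stdout, sys.stderr, hasattr(sys, "ps1")):
        # Redirected or interactive: print the usual traceback to stderr.
        sys.__excepthook__(exc_type, value, tb)
        return
    import ipdb  # only needed when a debugger will actually start
    traceback.print_exception(exc_type, value, tb)
    print()
    ipdb.post_mortem(tb)


sys.excepthook = debug_excepthook
```

Used in place of the snippet above, `python script.py` in a terminal still opens ipdb on an uncaught exception, while `python script.py > out.txt` just prints the traceback to stderr.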