Monday, January 23, 2012

MySQL: Python, Meta-Programming, and Interceptors

I recently found Todd's posts on interceptors, which allow callbacks (called interceptors) to be registered with the connector so that you can intercept a statement execution, a commit, or any of the many extension points supported by Connector/Java. This feature allows you to implement a number of new features, such as load-balancing policies, profiling of queries or transactions, or debugging of an application, without having to change the application code.

Since Python is a dynamic language, it is easy to add interceptors to any method in Connector/Python, without having to extend the connector with specific code. This is something that is possible in dynamic languages such as Python, Perl, JavaScript, and even some lesser known languages such as Lua and Self. In this post, I will describe how and also give an introduction to some of the (in my view) more powerful features of Python.

In order to create an interceptor, you need to be able to do these things:

  • Catch an existing method in a class and replace it with a new one.
  • Call the original function, if necessary.
  • For extra points: catch an existing method in an object and replace it with a new one.

In this post, you will see how all three of these problems are solved in Python. You will use decorators to define methods in existing classes and objects, and closures to call the original version of the methods. With this approach, it is not necessary to change the implementation: in fact, you can use this code to replace any method in any class, not only in Connector/Python.

Table 1. Attributes for methods

Name        Unbound method                      Bound method
__name__    Name of the method                  Name of the method
im_func     "Inner" function of the method      "Inner" function of the method
im_self     None                                Class instance for the method
im_class    Class that the method belongs to    Class that the method belongs to

In addition to being able to replace methods in the class, we would also like to be able to replace methods in instances of a class ("objects" in the traditional sense). This is useful to create specialized objects, for example for tracking particular cases where a method is used.

In order to understand how the replacement works, you should understand that in Python (and the dynamic languages mentioned above), all objects can have attributes, including classes, functions, and a bunch of other esoteric constructions. Each type of object has a set of pre-defined attributes with well-defined meaning. For classes (and class instances), methods are stored as attributes of the class (or class instance) and can therefore be replaced with other methods that you build dynamically. However, it requires some tinkering to take an existing "normal" function definition and "imbue" it with whatever "tincture" that makes it behave as a method of the class or class instance.
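To make the point about attributes concrete, here is a minimal sketch using a toy class (Greeter and shout are names made up for this illustration, not part of Connector/Python). It shows that a method is just an attribute of the class, and that a plain function stored on an instance is not turned into a method automatically, which is the "tinkering" referred to above.

```python
class Greeter(object):
    def greet(self):
        return "hello"

def shout(self):
    return "HELLO"

a = Greeter()
b = Greeter()

# Methods are stored as attributes of the class ...
method_is_attribute = "greet" in vars(Greeter)

# ... so assigning a new function to that attribute replaces the method
# for every instance, including instances that already exist.
Greeter.greet = shout
results = (a.greet(), b.greet())

# A plain function stored on an *instance* is not bound to it:
# calling it does not pass `self`, so the call fails.
a.greet = shout
try:
    a.greet()
    instance_call_failed = False
except TypeError:
    instance_call_failed = True
```

The last part is why an explicit method object is needed when replacing a method on a single instance.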

Depending on where the method comes from, it can be either unbound or bound. Unbound methods are roughly equivalent to member function pointers in C++: they reference a function, but not the instance. In contrast, bound methods have an instance tied to them, so when you call them, they already know what instance they belong to and will use it. Methods have a set of attributes, of which the four in Table 1 interest us. If a method is fetched from a class (to be precise, from a class object), it will be unbound and im_self will be None. If the method is fetched from a class instance, it will be bound and im_self will be set to the instance it belongs to. These attributes are all the "tincture" you need to make your own instance methods. The code for doing the replacement described above is simply:

import functools, types

def replace_method(orig, func):
    functools.update_wrapper(func, orig.im_func)
    new = types.MethodType(func, orig.im_self, orig.im_class)
    obj = orig.im_self or orig.im_class
    setattr(obj, orig.__name__, new)
The function uses two standard modules to make the job simpler, but the steps are:
  1. Copy the meta-information from the original method function to the new function using update_wrapper. This copies the name, module information, and documentation from the original method function to make it look like the original method.
  2. Create a new method instance from the method information of the original method using the constructor MethodType, but replace the "inner" function with the new function.
  3. Install the new method in the class or instance by replacing the attribute denoting the original method with the new method. Depending on whether the function is given a bound or unbound method, either the method in the instance or the method in the class is replaced.
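The code above relies on im_func, im_self, and im_class, which exist only in Python 2. In Python 3 there are no unbound methods: a method fetched from a class is the plain function, and bound methods expose __func__ and __self__ instead. The following is a rough sketch of the same three steps for Python 3, assuming the class or instance is passed in explicitly together with the method name. The name replace_method_py3 and the toy Cursor class are made up for this illustration.

```python
import functools
import types

def replace_method_py3(owner, name, func):
    """Replace the method `name` on a class or an instance with `func`."""
    orig = getattr(owner, name)
    # Step 1: copy name, module, and docstring from the original function.
    # A bound method keeps the plain function in __func__; a method fetched
    # from a class in Python 3 already is the plain function.
    functools.update_wrapper(func, getattr(orig, "__func__", orig))
    if isinstance(owner, type):
        # Steps 2 and 3 for a class: a plain function is enough, because
        # it is bound automatically when looked up through an instance.
        setattr(owner, name, func)
    else:
        # Steps 2 and 3 for an instance: bind the function explicitly.
        setattr(owner, name, types.MethodType(func, owner))

# Toy stand-in for a cursor class (hypothetical, not the connector's class).
class Cursor(object):
    def execute(self, operation):
        return "executed " + operation

def traced_execute(self, operation):
    return "traced " + operation

c1, c2 = Cursor(), Cursor()

# Replace the method on one instance only; c2 is unaffected.
replace_method_py3(c1, "execute", traced_execute)
per_instance = (c1.execute("SELECT 1"), c2.execute("SELECT 1"))

# Replace the method in the class; now c2 sees the new method as well.
replace_method_py3(Cursor, "execute", traced_execute)
per_class = c2.execute("SELECT 1")
```

The examples that follow use the original Python 2 replace_method.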
Using this function you can now replace a method in a class like this:
from mysql.connector import MySQLCursor

def my_execute(self, operation, params=None):
  ...

replace_method(MySQLCursor.execute, my_execute)
This is already pretty useful, but note that you can also replace the method for only a specific instance by using replace_method(cursor.execute, my_execute). It was not necessary to change anything inside Connector/Python to intercept a method there, so you can apply this to any method in any of the classes in Connector/Python that you already have available. To make this even easier to use, you can define a decorator that installs the function in the correct place at the same time as it is defined. The code for defining the decorator, together with an example of its use, is:
import functools, types
from mysql.connector import MySQLCursor

def intercept(orig):
    def wrap(func):
        functools.update_wrapper(func, orig.im_func)
        meth = types.MethodType(func, orig.im_self, orig.im_class)
        obj = orig.im_self or orig.im_class
        setattr(obj, orig.__name__, meth)
        return func
    return wrap

# Define a function using the decorator
@intercept(MySQLCursor.execute)
def my_execute(self, operation, params=None):
  ...
The @intercept line before the definition of my_execute is where the new decorator is used. The syntax is a shorthand for processing the function at the point where it is defined. It behaves as if the following code had been executed:
def _temporary(self, operation, params=None):
  ...
my_execute = intercept(MySQLCursor.execute)(_temporary)
As you can see here, whatever is given after the @ is used as a function and called with the function-being-defined as argument. This explains why the wrap function is returned from the decorator (it will be called with a reference to the function that is being defined), and also why the original function is returned from the wrap function (the result will be assigned to the function name).
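To see the two return values at work, here is a small self-contained sketch of the decorator written for Python 3, where im_func and friends are not available. It assumes the class (or instance) and the method name are passed to the decorator explicitly, which differs from the signature used in this post, and FakeCursor is a made-up stand-in for a real cursor class.

```python
import functools
import types

def intercept(owner, name):
    """Decorator factory: install the decorated function as owner.name."""
    def wrap(func):
        orig = getattr(owner, name)
        functools.update_wrapper(func, getattr(orig, "__func__", orig))
        if isinstance(owner, type):
            setattr(owner, name, func)
        else:
            setattr(owner, name, types.MethodType(func, owner))
        # Returning func keeps the decorated name bound to the function.
        return func
    return wrap

# Hypothetical stand-in for a cursor class.
class FakeCursor(object):
    def execute(self, operation, params=None):
        return "ran: " + operation

@intercept(FakeCursor, "execute")
def my_execute(self, operation, params=None):
    return "intercepted: " + operation

# The class has been patched as a side effect of the definition ...
result = FakeCursor().execute("SELECT 1")
# ... and my_execute is still the function itself, because wrap returned it.
same_function = FakeCursor.execute is my_execute
```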

Using a statement interceptor, you can catch the execution of statements and do some special magic on them. In our case, let's define an interceptor to catch the execution of a statement and log the result using the standard logging module. If you read the wrap function carefully, you probably noted that it uses a closure to access the value of orig when the decorator was called, not the value it happens to have when the wrap function is executed. This feature is very useful since a closure can also be used to get access to the original execute function and call it from within the new function. So, to intercept an execute call and log information about the statement using the logging module, you could use code like this:

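import logging
from mysql.connector import MySQLCursor

# Keep a reference to the original method so the interceptor can call it.
original_execute = MySQLCursor.execute

@intercept(MySQLCursor.execute)
def my_execute(self, operation, params=None):
    # Build the statement text for logging only; the original method
    # does its own parameter processing.
    if params is not None:
        stmt = operation % self._process_params(params)
    else:
        stmt = operation
    result = original_execute(self, operation, params)
    logging.debug("Executed '%s', rowcount: %d", stmt, self.rowcount)
    # description is None for statements that do not return rows.
    if self.description:
        logging.debug("Columns: %s", ', '.join(c[0] for c in self.description))
    return result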
Now with this, you could implement your own caching layer to, for example, do a memcached lookup before sending the statement to the server for execution. I leave this as an exercise for the reader, or maybe I'll show you in a later post. :) Implementing a lifecycle interceptor is similar, only that you replace, for example, the commit or rollback calls. However, implementing an exception interceptor is not obvious. Catching the exception is straightforward and can be done using the intercept decorator:
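import logging
from mysql.connector.errors import ProgrammingError

# Keep a reference to the original constructor so it can still be called.
original_init = ProgrammingError.__init__

@intercept(ProgrammingError.__init__)
def catch_error(self, msg, errno):
    logging.debug("This statement didn't work: '%s', errno: %d", msg, errno)
    original_init(self, msg, errno=errno)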
However, in order to do something more interesting, such as asking for some additional information from the database, it is necessary to either get hold of the cursor that was used to execute the query, or at least the connection. It is possible to dig through the interpreter stack, or try to override one of the internal methods that Connector/Python uses, but since that is very dependent on the implementation, I will not present that in this post. It would be good if the cursor were passed down to the exception constructor, but this requires some changes to the connector code.
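One alternative that avoids digging into the connector is to intercept the method that raises the error, rather than the exception constructor, since the cursor is then available as self. The following is only a sketch of that idea for Python 3 using toy classes: FakeCursor, FakeError, and on_error are hypothetical names, and whether this fits Connector/Python depends on which method actually raises the exception you are interested in.

```python
import functools

class FakeError(Exception):
    """Hypothetical stand-in for a connector exception."""

class FakeCursor(object):
    """Hypothetical cursor that fails on statements containing 'BAD'."""
    def execute(self, operation, params=None):
        if "BAD" in operation:
            raise FakeError("syntax error in: " + operation)
        return "ok"

def on_error(cls, name, handler):
    """Wrap cls.name so handler(cursor, exc) runs before the error propagates."""
    original = getattr(cls, name)   # captured by the closure below

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        try:
            return original(self, *args, **kwargs)
        except FakeError as exc:
            handler(self, exc)      # `self` is the cursor that failed
            raise                   # re-raise so callers still see the error
    setattr(cls, name, wrapper)

seen = []
on_error(FakeCursor, "execute",
         lambda cursor, exc: seen.append((cursor, str(exc))))

cur = FakeCursor()
ok = cur.execute("SELECT 1")
try:
    cur.execute("BAD SQL")
    raised = False
except FakeError:
    raised = True
```

The handler receives the cursor, so it could, in principle, use it to ask the server for more information before the exception continues up the stack.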

Even though I have been programming in dynamic languages for decades (literally), it always amazes me how easy it is to accomplish things in these languages. If you are interested in playing around with this code, you can always fetch Connector/Python on Launchpad and try out the examples above. Some links and other assorted references related to this post are: