
Flyweighting in Python Redux

I just expanded the FlyweightedObject class to support keyword arguments:

import weakref
import cPickle

class FlyweightedObject(object):
    _pool = weakref.WeakValueDictionary()

    def __new__(klass, *args, **kwargs):
        if not hasattr(klass.__init__, 'im_func'):
            raise TypeError('cannot flyweight an object which has no python constructor')

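        constructor = klass.__init__.im_func
        code = constructor.func_code

        # argument names only: skip self, and stop before any local variables
        varnames = code.co_varnames[1:code.co_argcount]

        # pair each trailing argument name with its default value
        defaults = constructor.func_defaults or ()
        arguments = dict(zip(varnames[len(varnames) - len(defaults):], defaults))

        # positional values first, then keywords, which take precedence
        arguments.update(zip(varnames, args))
        arguments.update(kwargs)

        missing = [varname for varname in varnames if varname not in arguments]
        if missing:
            raise TypeError('missing constructor arguments: %s' % ', '.join(missing))

        # sort the items so that equal argument sets always pickle to the same key
        key = cPickle.dumps((klass, sorted(arguments.items())))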
        instance = klass._pool.get(key, None)

        if instance is None:
            instance = object.__new__(klass)
            klass._pool[key] = instance

        return instance

    def __getnewargs__(self):
        if hasattr(self.__class__.__init__, 'im_func'):
            code = self.__class__.__init__.im_func.func_code
            # co_varnames also lists local variables, so stop at co_argcount
            return tuple(getattr(self, attr) for attr in code.co_varnames[1:code.co_argcount])
        return tuple()

Flyweighting in Python

If you are dealing with large static datasets in Python, it can be useful to flyweight your objects. With flyweighting, every time you construct a new object you check whether an equivalent one already exists. If so, the existing object is returned instead of constructing a duplicate.

Recently I wrote a little bit of code to achieve this in the general case:

import weakref

class FlyweightedObject(object):
    _pool = weakref.WeakValueDictionary()
    
    def __new__(klass, *args):
        if hasattr(klass.__init__, 'im_func'):
            constructor = klass.__init__.im_func
            arguments_missing = constructor.func_code.co_argcount - len(args) - 1
            if arguments_missing > 0:
                # func_defaults is None when the constructor declares no defaults
                args += (constructor.func_defaults or ())[-arguments_missing:]

        key = (klass,) + args
        instance = klass._pool.get(key, None)

        if instance is None:
            instance = object.__new__(klass)
            klass._pool[key] = instance
            
        return instance

Now when you inherit from FlyweightedObject you get the flyweighting thrown in for free:

class Person(FlyweightedObject):
    def __init__(self, age, name = 'Simon'):
        self.age = age
        self.name = name

f = Person(1)
g = Person(1, 'Simon')

print id(f) == id(g)
# => True
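
The im_func and func_code attributes used above only exist in Python 2. As a rough sketch of how the same idea might look in Python 3, assuming inspect.signature is an acceptable stand-in for the manual argument counting (this is an illustration, not a port of the class above):

```python
import inspect
import weakref


class FlyweightedObject(object):
    # Shared pool; an entry disappears once nothing else references the object.
    _pool = weakref.WeakValueDictionary()

    def __new__(cls, *args, **kwargs):
        # Bind the call to __init__'s signature (None stands in for self) so
        # positional, keyword and default arguments all normalise to one key.
        bound = inspect.signature(cls.__init__).bind(None, *args, **kwargs)
        bound.apply_defaults()
        key = (cls,) + tuple(list(bound.arguments.items())[1:])

        instance = cls._pool.get(key)
        if instance is None:
            instance = super().__new__(cls)
            cls._pool[key] = instance
        return instance


class Person(FlyweightedObject):
    def __init__(self, age, name='Simon'):
        self.age = age
        self.name = name


f = Person(1)
g = Person(age=1, name='Simon')
h = Person(2)

print(f is g)
print(f is h)
```

As with the original, the constructor arguments have to be hashable, and __init__ still runs on every call, even when a pooled instance is handed back.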

One thing to watch out for is changing attributes after the object has been constructed. This will lead to the flyweight pool keys and the objects themselves going out of sync:

f = Person(1, 'Dave')
f.name = 'Simon'  

g = Person(1, 'Simon')

print id(f) == id(g)
# => False

One last thing. To get pickle working with flyweighted objects you’ll have to create a __getnewargs__ method which returns the tuple that will be passed to __new__ on unpickling:

class Person(FlyweightedObject):
    def __init__(self, age, name = 'Simon'):
        self.name = name
        self.age = age

    def __getnewargs__(self):
        return self.age, self.name

This can also be automated as long as each instance variable has the same name as the constructor argument it stores:

def __getnewargs__(self):
    if hasattr(self.__class__.__init__, 'im_func'):
        code = self.__class__.__init__.im_func.func_code
        # co_varnames also lists local variables, so stop at co_argcount
        return tuple(getattr(self, attr) for attr in code.co_varnames[1:code.co_argcount])
    return tuple()
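
To see what __getnewargs__ buys you, here is a small Python 3 sketch using a cut-down, positional-only version of the flyweight base class (simplified for illustration; it is not the class above). Unpickling goes back through __new__, so it lands on the pooled instance rather than creating a copy:

```python
import pickle
import weakref


class FlyweightedObject(object):
    _pool = weakref.WeakValueDictionary()

    def __new__(cls, *args):
        # Simplified: positional arguments only, no default handling.
        key = (cls,) + args
        instance = cls._pool.get(key)
        if instance is None:
            instance = super().__new__(cls)
            cls._pool[key] = instance
        return instance


class Person(FlyweightedObject):
    def __init__(self, age, name):
        self.age = age
        self.name = name

    def __getnewargs__(self):
        # These values are passed to __new__ when the object is unpickled.
        return self.age, self.name


original = Person(1, 'Dave')
restored = pickle.loads(pickle.dumps(original))

print(restored is original)
```

Without __getnewargs__, pickle would call __new__ with no arguments and the restored object would be filed in the pool under the wrong key.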

Python Slots

I just got clued in by Elf Sternberg’s Blog to a really useful Python feature that I had never heard of before: slots.

class Foo(object):
    __slots__ = ['x']
    def __init__(self, n):
        self.x = n

From the Python reference manual:

“By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.

The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.”

At work we have a plans engine that loads millions of little Python objects into memory, so this is a great little optimisation for us.

Finally, to get a set of every slot attribute in the object hierarchy, you can add this method to your class:

def inherited_slots(self):
    return set(slot for klass in self.__class__.__mro__ if hasattr(klass, '__slots__') for slot in klass.__slots__)
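
A quick way to check the effect is a small two-level hierarchy. Point and Point3D are made-up names for illustration; the sketch assumes every class in the chain declares __slots__ as a list, since one class without it brings the per-instance dictionary back:

```python
class Point(object):
    __slots__ = ['x', 'y']

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def inherited_slots(self):
        # Collect the slot names declared by every class in the MRO.
        return set(slot for klass in self.__class__.__mro__
                   if hasattr(klass, '__slots__')
                   for slot in klass.__slots__)


class Point3D(Point):
    __slots__ = ['z']

    def __init__(self, x, y, z):
        Point.__init__(self, x, y)
        self.z = z


p = Point3D(1, 2, 3)
has_dict = hasattr(p, '__dict__')
slots = p.inherited_slots()

try:
    p.colour = 'red'  # not a declared slot
    rejected = False
except AttributeError:
    rejected = True
```

Here has_dict is False and the assignment to p.colour is rejected, which is the trade-off: you save the dictionary but can no longer add arbitrary attributes.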

Converting Unicode Into ASCII In Python

To convert the unicode string José into the plain ASCII string Jose in Python:

import unicodedata

unicodedata.normalize('NFKD', unicodeString).encode('ascii', 'ignore')
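
Wrapped up as a function, a Python 3 version might look like the sketch below. The name strip_accents is my own; the extra decode is only there because encode returns bytes in Python 3:

```python
import unicodedata


def strip_accents(text):
    # NFKD splits an accented letter into a base letter plus a combining mark.
    decomposed = unicodedata.normalize('NFKD', text)
    # Encoding with 'ignore' then drops everything that is not ASCII.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


result = strip_accents(u'Jos\u00e9')
print(result)
```

Characters that have no decomposition are not transliterated by this approach; they are simply dropped.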

IO Error From raw_input() in Jython

When I use raw_input in Jython 2.2.1 on Java 1.6.0_07 on my Ubuntu machine, I get an empty IOError when running a Python file, but not when using the interactive shell:

-- raw_input.py --
raw_input()

>> jython raw_input.py 
Traceback (innermost last):
  File "raw_input.py", line 1, in ?
IOError:

-- console --
>>> raw_input()
hello
'hello'

To get round this I just use sys.stdin.readline().strip() to read a line from stdin and remove the trailing newline character.
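
As a small sketch of that workaround, here is a helper (read_line is a made-up name) that takes the stream as an optional argument so it can be tried without a terminal. It strips only the line ending rather than all surrounding whitespace, which is a slight variation on the strip() call mentioned above:

```python
import io
import sys


def read_line(stream=None):
    # Fall back to standard input when no stream is supplied.
    stream = sys.stdin if stream is None else stream
    return stream.readline().rstrip('\r\n')


line = read_line(io.StringIO(u'hello\n'))
print(line)
```

Unlike raw_input, this returns an empty string at end of file instead of raising EOFError.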

Reflection and Introspection Over Modules and Packages In Python

Someone tweet me if I’m missing something, but I couldn’t find a built-in way of doing reflection over packages in Python.
In this particular case, I wanted a way to load all the modules in a certain package (a directory with an __init__.py) and automatically add them into a running Twisted service.

To get this working, I created a small module called reflection:

import os
import sys
import re

def camelize(string):
    'turns a lower case string with underscores into its camel case equivalent'
    return re.sub(r"(?:^|_)(.)", lambda x: x.group(1).upper(), string)

def dir_modules(package_name):
    'returns a list of module objects in the package identified by package_name'
    modules = []
    load_package(modules.append, package_name)
    return modules

def load_class(module, class_name = None):
    'load the specified class from module, or return the default class derived from module.__name__ if no class_name is specified'
    if isinstance(module, str):
        # __import__ returns the top-level package, so look the module up by its full name
        __import__(module)
        module = sys.modules[module]
    return getattr(module, class_name or camelize(module.__name__.split('.')[-1]))

def load_package(function, package_name):
    'import every module below the package directory, passing each one to function'
    os.path.walk(package_name, load_modules, function)

def load_modules(function, directory, module_names):
    'os.path.walk callback: import each .py file in directory whose name does not start with __'
    # sub-directories arrive as paths, so turn the separators into dots
    package_name = directory.replace(os.sep, '.')
    for module_name in module_names:
        if re.match(r"^(?!__)\w*\.py$", module_name):
            qualified_module_name = '%s.%s' % (package_name, module_name[0:-3])
            __import__(qualified_module_name)
            function(sys.modules[qualified_module_name])

My Twisted application then uses this code in its initializer:

xmlrpc.XMLRPC.__init__(self)
xmlrpc.addIntrospection(self)
for module in reflection.dir_modules('services'):
    klass = reflection.load_class(module)
    if issubclass(klass, xmlrpc.XMLRPC):
        print 'adding xmlrpc sub handler from %s' % module.__name__
        self.putSubHandler(module.__name__.split('.')[-1], klass())
    elif issubclass(klass, internet.TimerService):
        print 'initializing timer service from %s' % module.__name__
        instance = klass(Application.Config[module.__name__.split('.')[-1]]['interval'])
        instance.setServiceParent(Service.Application)
        instance.startService()

Now you can drop new modules into the services directory and, provided each one contains a class named after the camel-cased module name, they will be auto-loaded into the application on restart.
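
For what it's worth, the standard library's pkgutil module can do the directory scanning part. The following Python 3 sketch is an alternative to the reflection module above, not the code the application uses. It lists only the direct submodules of a package and, like the original, skips names starting with a double underscore. The json package is used purely as a convenient example:

```python
import importlib
import pkgutil


def dir_modules(package_name):
    """Import and return every direct submodule of the named package."""
    package = importlib.import_module(package_name)
    return [importlib.import_module('%s.%s' % (package_name, name))
            for _, name, _ in pkgutil.iter_modules(package.__path__)
            if not name.startswith('__')]


names = sorted(module.__name__ for module in dir_modules('json'))
print(names)
```

Because it works from the package's __path__ rather than a directory relative to the current working directory, it does not matter where the application is started from.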

Killing Python: Exiting Without Using SystemExit

Usually in Python you exit a script programmatically by raising SystemExit or calling sys.exit. Both methods send an exception hurtling up the call stack, allowing every level of your program to execute its finally blocks and exit cleanly.

This behaviour changes if you raise SystemExit in a multithreaded application: it kills the calling thread instead. If the calling thread is not the main thread, the application will just keep ticking along. Fine in most cases, but sometimes you just need to make the application quit.

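To achieve this, you need to call:

os._exit(status)

with an exit status of your choice. Be aware that os._exit terminates the process immediately: finally blocks and cleanup handlers do not run, and buffered output is not flushed.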
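
The difference is easy to see from the exit status of a child process. This Python 3 sketch (the CHILD script and the 'soft'/'hard' switch are invented for the demonstration) runs the same worker thread twice, once exiting with sys.exit and once with os._exit:

```python
import subprocess
import sys
import textwrap

CHILD = textwrap.dedent('''
    import os, sys, threading

    def worker():
        if sys.argv[1] == 'hard':
            os._exit(3)      # terminates the whole process immediately
        sys.exit(3)          # only ends this worker thread

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    # Reached only when the worker used sys.exit.
''')

soft = subprocess.call([sys.executable, '-c', CHILD, 'soft'])
hard = subprocess.call([sys.executable, '-c', CHILD, 'hard'])

print(soft, hard)
```

With sys.exit the main thread carries on and the process finishes normally with status 0. With os._exit the process dies on the spot with status 3.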

Creating Static Methods In Python

Static methods can be a little confusing if you come to Python from other languages in which they are first-class citizens. To create a static method you need to pass an existing method through staticmethod():

class Person:
  people = {}
  def __init__(self, name):
    self.name = name
    Person.people[self.name] = self

  def find_by_name(name):
    return Person.people.get(name)

  find_by_name = staticmethod(find_by_name)

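There is less hokey magic here than you might expect: staticmethod() wraps the function in a descriptor that hands back the plain function on attribute lookup, so no self is bound whether you call it through the class or through an instance.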

Thankfully, Python 2.4 and later (including 2.5.1, the version which ships with Leopard) allow you to use the slightly more aesthetically pleasing decorator shortcut:

class Person:
  people = {}
  def __init__(self, name):
    self.name = name
    Person.people[self.name] = self

  @staticmethod
  def find_by_name(name):
    return Person.people.get(name)
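
A short usage sketch, repeating the class so it runs on its own (written as a new-style class here, which is my choice rather than anything the example requires):

```python
class Person(object):
    people = {}

    def __init__(self, name):
        self.name = name
        Person.people[self.name] = self

    @staticmethod
    def find_by_name(name):
        # No self or cls argument: this is just a function living on the class.
        return Person.people.get(name)


simon = Person('Simon')
found = Person.find_by_name('Simon')        # called on the class
via_instance = simon.find_by_name('Simon')  # also works on an instance
missing = Person.find_by_name('Dave')       # no such person registered
```

Both lookups return the same object, and an unknown name comes back as None because of dict.get.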

Absolute Paths From Relative Paths In Bash

There doesn’t seem to be a nice, cross-platform way of deriving the absolute path from a relative one in Bash.
(`readlink -f .` works on Linux, but doesn’t seem to work on OS X.)

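Here is a small program written in Python which fills the gap. It prints the absolute path of the directory containing its first argument, joined with any further arguments: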

#!/usr/bin/env python
import os, sys
if len(sys.argv) < 2:
    print 'usage: %s <PATHS>' % sys.argv[0]
    raise SystemExit(1)
directory = os.path.dirname(sys.argv[1])
if len(sys.argv) > 2:
    directory = os.path.join(directory, *sys.argv[2:])
print os.path.abspath(directory)

Paste it into a file called abspath somewhere on your PATH, chmod a+x it, and you have yourself a handy utility:

~/Projects $ abspath .
/Users/simon/Projects

~/Projects $ abspath ../
/Users/simon
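
The same logic as an importable Python 3 function, in case it is easier to experiment with than a command-line script. The name containing_directory is my own, chosen to highlight that the script resolves the directory part of its first argument rather than the argument itself:

```python
import os


def containing_directory(path, *extra):
    # Take the directory part of the path, as the script does with sys.argv[1].
    directory = os.path.dirname(path)
    if extra:
        # Any further components are joined on before resolving.
        directory = os.path.join(directory, *extra)
    return os.path.abspath(directory)


here = containing_directory('.')
parent_of_bin = containing_directory(os.path.join('bin', 'server'), '..')
```

Both calls resolve to the current working directory: the first because the directory part of '.' is empty, the second because bin/.. climbs back out of bin. The path bin/server is only an example and does not need to exist.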

I use it in a script called server that starts various web-services. By starting the script with:

#!/bin/sh
directory=$(abspath "$0" ..)
cd "$directory"

...

you can now run it from anywhere on the file system. The script will cd itself into the web service’s working directory before kicking things off.