If you are dealing with large static datasets in Python it can be useful to flyweight your objects. With flyweighting, every time you construct a new object you check to see if it already exists. If so, the original object will be returned instead of constructing a duplicate.
Recently I wrote a little bit of code to achieve this in the general case:
import weakref

class FlyweightedObject(object):
    _pool = weakref.WeakValueDictionary()

    def __new__(klass, *args):
        if hasattr(klass.__init__, 'im_func'):
            constructor = klass.__init__.im_func
            arguments_missing = constructor.func_code.co_argcount - len(args) - 1
            if arguments_missing > 0:
                args += constructor.func_defaults[-arguments_missing:]
        key = (klass,) + args
        instance = klass._pool.get(key, None)
        if instance is None:
            instance = object.__new__(klass)
            klass._pool[key] = instance
        return instance

Now when you inherit from FlyweightedObject you get the flyweighting thrown in for free:
class Person(FlyweightedObject):
    def __init__(self, age, name = 'Simon'):
        self.age = age
        self.name = name

f = Person(1)
g = Person(1, 'Simon')
print id(f) == id(g)
# => True

One thing to watch out for is changing attributes after the object has been constructed. This will lead to the flyweight pool keys and the objects themselves going out of sync:
f = Person(1, 'Dave')
f.name = 'Simon'
g = Person(1, 'Simon')
print id(f) == id(g)
# => False

One last thing. To get pickle working with flyweighted objects you’ll have to create a __getnewargs__ method which returns the tuple that will be passed to __new__ on unpickling:
class Person(FlyweightedObject):
    def __init__(self, age, name = 'Simon'):
        self.name = name
        self.age = age

    def __getnewargs__(self):
        return self.age, self.name

This can also be automated, as long as each instance variable has the same name as the corresponding constructor argument:
def __getnewargs__(self):
    if hasattr(self.__class__.__init__, 'im_func'):
        constructor = self.__class__.__init__.im_func
        return tuple(getattr(self, attr) for attr in constructor.func_code.co_varnames[1:])
    return tuple()
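The code above leans on Python 2 internals (im_func, func_code, func_defaults). For readers on Python 3, here is a minimal sketch of the same idea. It swaps in inspect.signature to resolve default and keyword arguments, which is my substitution rather than the original method, and it assumes every subclass defines its own __init__ and is constructed with hashable arguments.

```python
import inspect
import weakref


class FlyweightedObject:
    # Weak values: an entry disappears once nothing else references the instance.
    _pool = weakref.WeakValueDictionary()

    def __new__(cls, *args, **kwargs):
        # Bind the call against __init__ (None stands in for self) so that
        # Person(1), Person(1, 'Simon') and Person(1, name='Simon') share a key.
        bound = inspect.signature(cls.__init__).bind(None, *args, **kwargs)
        bound.apply_defaults()
        key = (cls,) + tuple(bound.arguments.values())[1:]  # drop self
        instance = cls._pool.get(key)
        if instance is None:
            instance = super().__new__(cls)
            cls._pool[key] = instance
        return instance


class Person(FlyweightedObject):
    def __init__(self, age, name='Simon'):
        self.age = age
        self.name = name


f = Person(1)
g = Person(1, 'Simon')
h = Person(1, name='Simon')
print(f is g and g is h)
```

Note that Python still calls __init__ on the pooled instance every time the constructor is invoked, so the warning about mutating attributes applies here too.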
If you were previously using the Haml syntax:
%p= 'string ', method, ' string'
you will need to change it to:
%p= ['string ', method, ' string']
to avoid compilation errors in production mode.
I just got clued in by Elf Sternberg’s Blog to a really useful feature in Python that I had never heard of before, __slots__:
class Foo(object):
    __slots__ = ['x']

    def __init__(self, n):
        self.x = n

From the Python reference manual:
“By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.
The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.”
At work we have a plans engine that loads millions of little Python objects into memory, so this is a great little optimisation for us.
Finally, to get a set of every slot attribute in the object hierarchy, you can add this method to your class:
def inherited_slots(self):
    return set(slot for klass in self.__class__.__mro__ if hasattr(klass, '__slots__') for slot in klass.__slots__)
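To make the effect concrete, here is a small Python 3 sketch. The Point and Point3D classes are hypothetical, made up for illustration, and inherited_slots is rewritten as a plain function. It shows that slotted instances carry no __dict__, reject attributes that were not declared, and that the slots of a subclass add to those of its parents.

```python
class Point:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


class Point3D(Point):
    # Only the new attribute is declared; x and y come from Point.
    __slots__ = ('z',)

    def __init__(self, x, y, z):
        super().__init__(x, y)
        self.z = z


def inherited_slots(obj):
    # Walk the MRO and collect every declared slot name.
    return {slot
            for klass in type(obj).__mro__
            for slot in getattr(klass, '__slots__', ())}


p = Point3D(1, 2, 3)
print(hasattr(p, '__dict__'))      # no per-instance dictionary
print(sorted(inherited_slots(p)))

try:
    p.w = 4                        # not a declared slot
except AttributeError as error:
    print('rejected:', error)
```

One thing to keep in mind: the saving only holds if every class in the hierarchy declares __slots__. A subclass that omits it gets a __dict__ again.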
Here’s a useful snippet of JavaScript I use for inserting arguments into a format string:
String.format = function()
{
    var replacements = arguments;
    return arguments[0].replace(/\{(\d+)\}/gm, function(string, match) {
        return replacements[parseInt(match) + 1];
    });
}
And here it is in action:
String.format('http://www.google.com/search?q={0}', escape(searchTerm))
To convert a unicode string such as José (held here in unicodeString) into Jose in Python:
import unicodedata
unicodedata.normalize('NFKD', unicodeString).encode('ascii', 'ignore')
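On Python 3, where every str is already unicode, the same call returns bytes, so a decode step is needed to get text back. A minimal sketch, with strip_accents as a hypothetical helper name:

```python
import unicodedata


def strip_accents(text):
    # NFKD splits 'é' into 'e' plus a combining accent; encoding to ASCII
    # with errors='ignore' then drops the accent (and any other non-ASCII).
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(strip_accents('Jos\u00e9'))  # → Jose
```

Characters with no ASCII decomposition are silently dropped rather than transliterated, so this is lossy for anything beyond accented Latin letters.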
When I try to use raw_input on my Ubuntu machine in Jython 2.2.1 on Java 1.6.0_07 I get an empty IOError when executing a python file, but not when using a python shell:
-- raw_input.py --
raw_input()
>> jython raw_input.py
Traceback (innermost last):
File "raw_input.py", line 1, in ?
IOError:
-- console --
>>> raw_input()
hello
'hello'
To get round this I just use sys.stdin.readline().strip() to read a line from stdin and remove the trailing newline character.
A common gotcha when using regular expressions occurs with the default (greedy) quantifiers +, ? and *. These quantifiers will attempt to make the longest match they possibly can.
The regular expression:
/'(.*)'/
will successfully match and group the word there in hello 'there', but will actually run on to the last single quote in the string hello 'there' how are 'you', matching there' how are 'you.
One solution is to restrict the set of characters you are searching for with the greedy quantifier, thus ensuring the match will finish before hitting the terminating character:
/'([^']*)'/
This works, but the more readable option is to turn the greedy quantifier into a reluctant one:
/'(.*?)'/
By adding a ? to the quantifier, the expression will try to match the shortest string that satisfies it.
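The three patterns can be compared side by side with Python's re module. This is only an illustrative sketch; the variable names are made up for the example.

```python
import re

text = "hello 'there' how are 'you'"

greedy = re.findall(r"'(.*)'", text)        # runs on to the last quote
restricted = re.findall(r"'([^']*)'", text)  # cannot cross a quote
reluctant = re.findall(r"'(.*?)'", text)     # stops at the first closing quote

print(greedy)      # → ["there' how are 'you"]
print(restricted)  # → ['there', 'you']
print(reluctant)   # → ['there', 'you']
```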
Looking around, it seems that there is no easy way to stop Java from eating all your system resources when running a particularly heavy-going task.
Thankfully my lovely colleague Ben made me aware of a helpful UNIX command called nice.
By prefixing any command with nice you can ask the scheduler to be a bit more kind, running the process at a lower priority so that it doesn’t starve other processes of CPU time:
nice java ExpensiveTask
Most methods of progressive enhancement involve setting various DOM elements to display:none and then making them visible from JavaScript, or building new document nodes and inserting them.
In the former case you end up with lots of brittle snippets of code for traversing the DOM and toggling elements. In the latter case you often end up having to write the same markup twice: once in your web application, and once in the JavaScript which enhances your code.
I recently read a blog post by James Padolsey that suggested creating comment nodes with HTML inside and then promoting their contents to real nodes within their parent elements. Neat.
The comments for this article on Hacker News quickly moved to performance, however: as you can’t natively scour the DOM for comments, it is necessary to iterate over every DOM element from JavaScript, checking its type and manipulating it if applicable. An expensive business.
If performance really is an issue, but you like the idea of baking your HTML directly into the page as comments, you can replace the DOM traversal with an atomic regular expression replace and an innerHTML assignment. Both of these operations occur in the underlying DOM implementation and are run at native speeds:
document.body.innerHTML = document.body.innerHTML.replace(/(?:<!--\[enhance\]>)|(?:<!\[enhance\]-->)/g, '');
Wrapped up into a tidy Rails helper, you can use this to create blocked out elements which will only render if the user has JavaScript enabled:
def enhancement(&block)
  concat("<!--[enhance]>#{capture(&block)}<![enhance]-->")
end
<% enhancement do %>You have JavaScript enabled<% end %>
Obviously it’s not quite as flexible as the DOM-based solution (especially as James seems to be branching out into a fully-blown templating system called JSHTML), but I think it works as a speedy way to encapsulate JavaScript-only functionality.
One-liner to remote copy a MySQL database over SSH:
mysqldump [db] | ssh -C [host] 'mysql [db]'
For anything more complicated there is also taps.