dictnotes.txt talks about sparse dictionaries for a different reason, check if spareseness is in fact implemented in the newest code.
tradeoff between memory (sparse dicts) and time (dense dicts) is not always obvious. perhaps some dicts could be targeted for extra sparseness live?
locals() is not really a dict since python 1.something.
python string object caches own hash value already, this could be improved though, rarely executed code could recompute hashes faster than it takes to load extra 4 or 8 bytes from memory and reduce cache pressure, there's probably little reason to store precomputed hash in pyc though.
This post will track the status of my attempts to understand and hopefully optimize name lookups in python.
Consider the following contrived example
aglobal = ""
def outer():
def inner():
alocal = 3
li = [str(acomprehension) for acomprehension in range(aclosure+alocal)]
return aglobal.join(li)
aclosure = 5
return inner()
Now, consider what name lookup happens inside invocation of inner():
- alocal and acomprehension: locals() -> hit
- aclosure: locals() -> miss, closure -> hit
- aglobal: locals()-> miss, closure -> miss, globals() -> hit
- str: locals() -> miss, closure -> miss, globals() -> miss, builtins -> hit
# builtins hash table (using python syntax rather than C)
idx = hash(name) % len(hashtable)
hashtable = [
[("int", {pointer to int})], #
[], # some entries will be empty
[("float", {pointer to float})], #
[("str", {pointer to str})], # what we are looking for
[("unicode", {ptr}), ("True", {ptr})] # example collision
]
items = hashtable(idx)
for item in items:
if item[0] == name: return item[1] # actual string comparison
raise NameError("name '%s' is not defined" % name)
The larger the nesting (locals, number of closures, globals, builtins), the more important it become not to find a name in a hashtable, but to decide quickly that the name is not in the table.
Therefore the intermediatory talbes be sparse like this:
hashtable = [
[("int", {pointer to int})], #
[], # many entries will be empty
[], #
[], #
[], #
[("float", {pointer to float})], #
[], #
[("str", {pointer to str})], # what we are looking for
[], #
[("unicode", {ptr})], #
[], #
[], #
[("True", {ptr})] # no or few collision
]
Moreover hot hashtables (those that are looked up in frequently) should be instrumented with counters for hits, early misses, and late misses (requiring full string comparison).

0 comments:
Post a Comment