Tuesday, 28 October 2014

Software is perishable

Software is like fresh produce:
If you don't use it right away, [you will] throw it away!
Take it as advice (do throw it away) or as a prediction (you will); either way, that's just how it is.

Sometimes I come across 6-year-old code in an active repository; out of curiosity, I run (git|svn) blame. Most likely, whoever wrote that (line|block) is no longer with the company. Typically there are no unit tests, or only superficial ones. At times the code is dead.

Perhaps it's merely an elaboration on YAGNI, but here's my personal motto: if a feature is clearly needed and well defined, yet due for deployment 6 months from now, don't write it! A lot can change in that time.

Likewise, the useful lifespan of most code(*) is measured in a few years.

Year-old code is suspect.

Two-year-old code has got to be rubbish.

If you strongly believe that a given blob is worth keeping, then you won't mind refactoring it: reducing the scope (some requirements are outdated), expanding it (surely this new feature could be done more elegantly, if only the library supported it), providing new unit tests, etc. In other words, make a choice: either you own it and work on it, or you throw it away.

(*) Frequently used libraries are not "most code"; they are a minority amongst all the useless, rarely used, and leaf (non-library) projects.

Tuesday, 28 January 2014

Ulimit with Python

In [41]: subprocess.Popen(("/bin/ls", "-a"),
                                                       (-1, -1)),
Out[41]: ('.\n..\n', None)

In [42]: subprocess.Popen(("/bin/ls", "-a"),
                                                       (0, 0)),
Out[42]: ('', None)
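The middle of each call in the transcript above did not survive. Here is a self-contained sketch of the same idea, assuming the limit is applied in the child process via `preexec_fn` and `resource.setrlimit`. Which resource the original session limited isn't recoverable, so this example uses `RLIMIT_CORE` purely for illustration; `run_with_limit` is a hypothetical helper name.

```python
import resource
import subprocess


def run_with_limit(args, limit, soft, hard):
    """Run `args` with resource `limit` set to (soft, hard) in the child only.

    preexec_fn runs in the forked child just before exec, so the parent
    process keeps its own limits. Returns (stdout, stderr) like communicate().
    """
    def set_limit():
        resource.setrlimit(limit, (soft, hard))

    proc = subprocess.Popen(args, preexec_fn=set_limit, stdout=subprocess.PIPE)
    return proc.communicate()


if __name__ == "__main__":
    # Lowering a limit needs no privileges; RLIMIT_CORE is just an example.
    out, err = run_with_limit(("/bin/sh", "-c", "ulimit -c"),
                              resource.RLIMIT_CORE, 0, 0)
    print(out)
```

Since only stdout is piped, the second element of the result is `None`, just like in the transcript. Note that raising a hard limit (as `(-1, -1)`, i.e. unlimited, may attempt) can fail for an unprivileged user, while lowering one always succeeds.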

Wednesday, 4 December 2013

Passing exceptions around

Passing exceptions between threads

Consider a typical client-server arrangement: thread 1 performs some work for thread 2. The result is usually a value, but sometimes an exception is raised.

We want thread 2 to behave as if the action were performed synchronously.

If an exception is raised, the user wants to see the full stack, that is, frames from both thread 1 and thread 2.
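import sys

# thread 1
def do_thing_quickly():
    try:
        shared.result = ...  # do the actual work here
    except Exception:
        # keep (type, value, traceback) so thread 2 can re-raise it
        shared.exc_info = sys.exc_info()

# thread 2
def do_thing_especially_quickly():
    # ask thread 1 to do thing quickly
    # wait for thread 1 to finish
    if shared.exc_info:
        # Python 2 three-argument raise keeps thread 1's traceback;
        # in Python 3: raise shared.exc_info[1].with_traceback(shared.exc_info[2])
        raise shared.exc_info[0], shared.exc_info[1], shared.exc_info[2]
    return shared.result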
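For a version you can actually run, here is a minimal Python 3 sketch of the same hand-off using a real `threading.Thread`. The names `Shared`, `worker`, `call_in_thread` and `failing_job` are hypothetical, and `raise value.with_traceback(tb)` stands in for the Python 2 three-argument `raise`. The resulting traceback should list frames from both the calling thread and the worker.

```python
import sys
import threading
import traceback


class Shared(object):
    """Hypothetical container for the hand-off between the two threads."""
    result = None
    exc_info = None


def worker(shared, func):
    # thread 1: do the work, store either the result or the exception info
    try:
        shared.result = func()
    except Exception:
        shared.exc_info = sys.exc_info()


def call_in_thread(func):
    # thread 2: hand the job to a worker thread and wait, as if synchronous
    shared = Shared()
    t = threading.Thread(target=worker, args=(shared, func))
    t.start()
    t.join()
    if shared.exc_info:
        _, exc_value, exc_tb = shared.exc_info
        # Python 3 spelling of `raise type, value, tb`: the saved traceback
        # (worker's frames) is kept and this thread's frames are added on top
        raise exc_value.with_traceback(exc_tb)
    return shared.result


def failing_job():
    return int("not a number")


if __name__ == "__main__":
    print(call_in_thread(lambda: 6 * 7))
    try:
        call_in_thread(failing_job)
    except ValueError:
        traceback.print_exc()
```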

Logging saved exceptions

In [2]: import sys, logging

In [5]: try: map(int, "1234ස!")                             
except: exc_info = sys.exc_info()

In [7]: logging.error("fracking toasters", exc_info=exc_info)
ERROR:root:fracking toasters
Traceback (most recent call last):
  File "<ipython-input-5-ca8ed6e70829>", line 1, in <module>
    try: map(int, "1234ස!")
ValueError: invalid literal for int() with base 10: '\xe0'
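The same trick works on Python 3, since `logging` still accepts a saved `sys.exc_info()` tuple through `exc_info=`. A minimal sketch follows; `capture_exc_info` is a hypothetical helper. One difference: Python 3 iterates the string by character rather than by byte, so the error names `'ස'` instead of `'\xe0'`.

```python
import logging
import sys


def capture_exc_info(func, *args):
    """Call func(*args); return sys.exc_info() if it raises, else None."""
    try:
        func(*args)
    except Exception:
        return sys.exc_info()
    return None


if __name__ == "__main__":
    # In Python 3 map() is lazy, so list() forces the int() calls to happen.
    saved = capture_exc_info(lambda s: list(map(int, s)), "1234ස!")
    # ...later, e.g. after handing `saved` to another thread...
    logging.error("fracking toasters", exc_info=saved)
```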

Thursday, 14 November 2013

PyPy on Android

Here are the results for my Galaxy S4, running in a chroot jail with Raspbian libraries.

Friday, 14 June 2013

Debugging Python with almost no slowdown

pdb-clone promises to run your code with very little slowdown.

Currently you have to install it from source, but it's a breeze.

Hopefully you'll be able to pip install it soon.

Sunday, 2 June 2013

Pycircle Poznan Talk

I'm presenting my first Python talk today.

[03.06.2013] When Python is not enough

There are times when the performance of the software we write matters most to us, and focusing on functionality alone is not enough. That's why the next meeting will be devoted to various techniques for optimizing programs written in Python. We will discuss tools and libraries such as Psyco, NumPy, ctypes, CFFI, Cython and PyPy. The meeting will be held in English. Everyone is warmly welcome! Same time and place as usual: 17:15 in room D1.

That's the CS building, Morasko campus, UAM in Poznan, at 5 o'clock.

There will be a presentation / live coding and discussion.

I'll post the resulting code on github afterwards.

Sunday, 3 March 2013

Save and restore SHA-512 inner state

Say you want to compute a digest over a very long input: so large that your laptop might get switched off midway, or it arrives in batches with long breaks in between, or you want to use several machines (sequentially), or you want to be able to restart the program, e.g. to update it to a new version. Or maybe your input has a very long head and several tails, e.g. it's a tree, and you want to reuse the hash computed over the head. Or maybe you want to save a partial hash in a database.

And of course you want to compute the hash fast, so you want CPython's built-in implementation written in C, or in this case, the one backed by OpenSSL.

Normally, CPython's hashlib hash objects offer no way to save their state: they are not picklable, and their internals are not accessible from Python.
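For the "long head, several tails" case within a single process, hashlib's standard `copy()` method already lets you reuse the hash of a common prefix without any ctypes tricks. It clones the state only in memory, so it does not help with persisting state across restarts or machines. A minimal sketch; `digests_with_shared_head` is a hypothetical name:

```python
import hashlib


def digests_with_shared_head(head, tails):
    """Hash `head` once, then finish one SHA-512 digest per tail.

    copy() clones the in-memory state, so the head is not re-hashed.
    """
    base = hashlib.sha512()
    base.update(head)
    result = []
    for tail in tails:
        h = base.copy()  # independent clone of the state after `head`
        h.update(tail)
        result.append(h.hexdigest())
    return result


if __name__ == "__main__":
    head = b"x" * 1000000
    for d in digests_with_shared_head(head, [b"tail-1", b"tail-2"]):
        print(d)
```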

With ctypes, of course everything is possible:

""" save and restore sha512 inner state
    supports 32-bit and 64-bit architectures
    tested on CPython 2.6 and 2.7
    TODO does not take endian into account
    TODO assumes Python compiled with OpenSSL
"""
from hashlib import sha512
import ctypes
import binascii


def save(obj):
    """return inner state of sha512 `obj` as raw string"""
    #assert isinstance(obj, sha512)
    datap = ctypes.cast(ctypes.cast(id(obj),
    assert datap

    return datap[:STATESIZE]

def restore(data):
    """create new sha512 object with inner state from `data`, str/bytes or iterable"""
    new = sha512()
    datap = ctypes.cast(ctypes.cast(id(new),
    assert datap
    assert datap[:8] == '\x08\xc9\xbc\xf3g\xe6\tj'  # first sha512 word

    for i, byte in enumerate(data):
        assert i < STATESIZE
        datap[i] = byte
    assert i + 1 == STATESIZE

    return new

savehex = lambda o: binascii.b2a_hex(save(o))
restorehex = lambda d: restore(binascii.a2b_hex(d))

if __name__ == "__main__":
    # different data lengths
    testdata = ["", "abcd" * 256, "o" * 13, "y" * 256]
    real = sha512()
    for test in testdata:
        real.update(test)
        # invariant x == restore(save(x))
        assert real.digest() == restore(save(real)).digest()
        assert real.hexdigest() == restorehex(savehex(real)).hexdigest()
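To see what the raw string returned by `save()` contains, here is a sketch that decodes it with a `ctypes.Structure`. It assumes OpenSSL's `SHA512_CTX` layout from `openssl/sha.h`: eight 64-bit chaining words, a 128-bit bit count split into `Nl`/`Nh`, a 128-byte input buffer, then `num` and `md_len`. That layout is an OpenSSL implementation detail and may change between versions; the class and `describe()` are hypothetical names. For a fresh context, `h[0]` is `0x6a09e667f3bcc908`, the word whose little-endian bytes the `restore()` assertion checks.

```python
import ctypes


class SHA512_CTX(ctypes.Structure):
    """Assumed mirror of OpenSSL's SHA512_CTX (an internal, not a public API).

    The real struct has a union of 16 uint64 words and 128 bytes; only the
    byte view `p` is modelled here, which has the same size.
    """
    _fields_ = [
        ("h", ctypes.c_uint64 * 8),   # chaining state: the running hash value
        ("Nl", ctypes.c_uint64),      # message length in bits, low 64 bits
        ("Nh", ctypes.c_uint64),      # message length in bits, high 64 bits
        ("p", ctypes.c_ubyte * 128),  # buffered input not yet compressed
        ("num", ctypes.c_uint),       # number of bytes currently in `p`
        ("md_len", ctypes.c_uint),    # digest length: 64 for SHA-512
    ]


def describe(blob):
    """Decode a raw state blob (e.g. the output of save()) into a dict."""
    ctx = SHA512_CTX.from_buffer_copy(blob)
    return {
        "h": [hex(w) for w in ctx.h],
        "bytes_hashed": ((ctx.Nh << 64) | ctx.Nl) // 8,
        "buffered": bytes(bytearray(ctx.p[:ctx.num])),
        "md_len": ctx.md_len,
    }
```

Fields are read in native byte order, which matches the endianness TODO in the module docstring above.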

Of course, I'm not the first person to consider this: [e.g. Java]