Sunday, 3 March 2013

Save and restore SHA-512 inner state

Say you want to compute a digest on very long input, so large that you your laptop might get switched off, or it comes in batches with long breaks, or you want to use several machines (sequentially) or you want to be able to restart the program e.g. to update it to new version. Or maybe your input has a very long head and several tails, e.g. it's a tree and you want to reuse hash computed over head. Or maybe you want to save partial hash in the database

And of course you want to compute hash fast, so you want to use CPython's built-in implementation written in C, or in this case the one that uses OpenSSL.

Normally CPython's hashlib hash objects don't offer you a way to save their state, they are not picklable, and internals are not accessible from Python.

With ctypes, of course everything is possible:

#!/usr/bin/python
""" save and restore sha512 inner state
    supports 32-bit and 64-bit architectures
    tested on CPython 2.6 and 2.7
    TODO does not take endian into account
    TODO assumes Python compiled with OpenSSL
"""
from hashlib import sha512
import ctypes
import binascii

POFFSET = 6
STATESIZE = 216


def save(obj):
    """return inner state of sha512 `obj` as raw string"""
    #assert isinstance(obj, sha512)
    datap = ctypes.cast(ctypes.cast(id(obj),
                                    ctypes.POINTER(ctypes.c_voidp))[POFFSET],
                        ctypes.POINTER(ctypes.c_char))
    assert datap

    return datap[:STATESIZE]


def restore(data):
    """create new sha512 object with inner state from `data`, str/bytes or iterable"""
    new = sha512()
    datap = ctypes.cast(ctypes.cast(id(new),
                                    ctypes.POINTER(ctypes.c_voidp))[POFFSET],
                        ctypes.POINTER(ctypes.c_char))
    assert datap
    assert datap[:8] == '\x08\xc9\xbc\xf3g\xe6\tj'  # first sha512 word

    for i, byte in enumerate(data):
        assert i < STATESIZE
        datap[i] = byte
    assert i + 1 == STATESIZE

    return new


savehex = lambda o: binascii.b2a_hex(save(o))
restorehex = lambda d: restore(binascii.a2b_hex(d))


if __name__ == "__main__":
    # different data lengths
    testdata = ["", "abcd" * 256, "o" * 13, "y" * 256]
    real = sha512()
    for test in testdata:
        real.update(test)
        # invariant x == restore(save(x))
        assert real.digest() == restore(save(real)).digest()
        assert real.hexdigest() == restorehex(savehex(real)).hexdigest()



Of course I'm not the first person to consider this: [e.g. java]

Friday, 8 February 2013

hex vs oct vs bin -- unusual exception text


Somehow Python builtin bin raises same error type but different text than hex or oct:

In [1]: hex("foo")
TypeError: hex() argument can't be converted to hex

In [2]: oct("foo")
TypeError: oct() argument can't be converted to oct

In [3]: bin("foo")
TypeError: 'str' object cannot be interpreted as an index


What gives?

Wednesday, 16 January 2013

First look at FoundationDB, part 1, intro

What's so special about it:
  • transaction and ACID compliance
  • clustering without unnecessary sacrifices
  • targets SSD storage from get-go
  • built on futures and contracts
  • awesome Python interface
Notable limitations:
  • transactions are limited to 10MB modified key space
  • transactions are limited to 5 seconds lifetime
The latter gets ridiculous pretty fast, foundationdb own tools hit 1 second transaction warning:

 air:~ dima$ fdbbackup status  
 fdb WARNING: long transaction (2.05451s elapsed in transactional function 'getStatus' (0 retries, committed))  
 No previous backups found.  
 WARNING: No backup agents appear to be running.  

Super-simple use:

 In [76]: db["a1231"] = "a"  
 In [78]: db[:]  
 Out[78]: [a1231: a]  

Tuesday, 8 January 2013

Python evaluation stack

"""
valuestack.py: Demo reading values from Python value stack
Offsets are derived for CPython-2.7.2, 64-bit, production build

air:~ dima$ python valuestack.py
return 0.0911498238164
caught 0.0911498238164
"""
import ctypes, inspect, random


def id2obj(i):
    """convert CPython `id` back to object ref, by temporary pointer swap"""
    tmp = None,
    try:
        ctypes.cast(id(tmp), ctypes.POINTER(ctypes.c_ulong))[3] = i
        return tmp[0]
    finally:
        ctypes.cast(id(tmp), ctypes.POINTER(ctypes.c_ulong))[3] = id(None)


def introspect():
    """pointer on top of value stack is id of the object about to be returned
    FIXME adjust for sum(args, locals, etc) in the introspected function
    """
    fr = inspect.stack()[1][0]
    print "caught", id2obj(ctypes.cast(id(fr), ctypes.POINTER(ctypes.c_ulong))[47])


def value():
    tmp = random.random()
    print "return", tmp
    return tmp


def foo():
    try:
        return value()
    finally:
        introspect()

if __name__ == "__main__":
    foo()

Tuesday, 1 January 2013

Numpy and Matplotlib on GAE

Google App Engine supports numpy and matplotlib since 13/14 December 2012.

Unfortunately matplotlib doesn't work out of the box locally, but that's relatively easy to fix:
  • copy matplotlib data into the application
  • add matplotlibrc into the application
  • point matplotlib to use both via environment variables
  • monkey-patch subprocess module, it's mutilated in gae
  • don't use local files
My demo, available at http://gae-matplotlib-demo.appspot.com/, although not entirely perfect should be enough to get anyone started, html "screenshot" below:


#!/usr/bin/python
import logging, os, StringIO, cgi
try: import webapp2
except: pass  # when ran directly from command line
import subprocess
def no_popen(*args, **kwargs): raise OSError("forbjudet")
subprocess.Popen = no_popen  # not allowed in GAE, missing from module
subprocess.PIPE = None
subprocess.STDOUT = None
os.environ["MATPLOTLIBDATA"] = os.getcwdu()  # own matplotlib data
os.environ["MPLCONFIGDIR"] = os.getcwdu()    # own matplotlibrc
import numpy, matplotlib, matplotlib.pyplot as plt

def dynamic_png():
    try:
        plt.title("Dynamic PNG")
        for i in range(5): plt.plot(sorted(numpy.random.randn(25)))
        rv = StringIO.StringIO()
        plt.savefig(rv, format="png")
        plt.clf()
        return """<img src="data:image/png;base64,%s"/>""" % rv.getvalue().encode("base64").strip()
    finally:
        plt.clf()


def dynamic_svg():
    try:
        plt.title("Dynamic SVG")
        for i in range(5): plt.plot(sorted(numpy.random.randn(25)))
        rv = StringIO.StringIO()
        plt.savefig(rv, format="svg")
        return rv.getvalue().partition("-->")[-1]
    finally:
        plt.clf()

try: dynamic_png()  # crashes first time because it can't cache fonts
except: logging.exception("don't about it")

if __name__ != "__main__":
    class MainHandler(webapp2.RequestHandler):
        def get(self):
            self.response.write("""<html><head><title>Numpy and Matplotlib on Google App Engine</title></head><body>""")
            self.response.write(dynamic_png())
            self.response.write(dynamic_svg())
            self.response.write("<pre>%s</pre>" % cgi.escape(file(__file__.rstrip("c")).read()))  # source, dev runs .py, gae runs .pyc
            self.response.write("""</body> </html>""")

    app = webapp2.WSGIApplication([
        ('/', MainHandler)
    ], debug=True)

Sunday, 9 December 2012

See the Earth from space from the Earth

Recently I've seen a few too many videos, courtesy of Lekko Naukowo, in which astronauts recall their first glance of the Earth from space, in short, many describe feeling of unity with the Earth, awe and even transcendence.

Now going to space just to see what we see every day seems a bit excessive, indeed even Space Ship One, probably the most economical form of transport into almost space burns over 2 tons of fuel and oxydiser to get 3 people, a pilot and 2 passengers, to below low orbit for only 24 minutes, so I figure, why not get the same view from the Earth, live, with your own eyes, without anything digital in between.

All we need to do is launch a convex mirror into orbit and point a telescope at it.



Then sell tickets to view the Earth to general public to recoup costs.

Let's see how large of a mirror and telescope we'd need. Assume a round image 1000 "pixels" across, where pixel is resolving limit of the system.

Seeing from the Earth is not so great, but adaptive optics allow to go way beyond. For example the upcoming upgrade to largest telescope at Palomar, the Hale telescope, promises to reach 16 milliarcseconds resolving accuracy.*


The cheapest is to put the mirror in low earth orbit, for example 160km above sea level, the mirror would have to be only 12.5m in diameter.

Of course, the reflected image of the Earth from such a small distance will be very distorted, after all the diameter of the Earth is 12 thousand kilometers, which is huge compared to 160 km above the sea level, in other words, only small part of the Earth will be visible. How about geostationary orbit then? At 35,786km above sea level, the mirror would have to be almost 3km in diameter! But wait, the optical power of the telescope is huge, the space mirror does not have to solid, it can very well be sparse, in other words a thousand-by-thousand round matrix of small mirrors.

How large of a telescope you'd need? Roughly 10m in diameter to cover most visible light spectrum. With a primary mirror this large, a filter will be needed to reduce the brightness and not burn the general public's eye, and such telescope is very very expensive, but then again so is sending a large mirror into space! Furthermore most, if not all, visible light telescopes are idle during the day, and that's exactly when reflection of the Earth is visible, as in, the viewable part of the Earth gets sunlight.

Thursday, 6 December 2012

Drop GRUB, use SYSLINUX

Syslinux used to be de facto booting standard if you had a fat partition, it was simple and effective.

Since version 4, Syslinux integrates Extlinux, which allows a pc to boot from ext2/3/4 and some other filesystems, as well as both MBR and GPT partition tables.

Meanwhile grub moved to grub2 which meant introduction of scheme-like language for the config file, which not only meant increase of config file size from 2~5 lines to over a 100, it also made the any manual changes exceedingly error-prone.

The conclusion is simple, drop bloated GRUB, use SYSLINUX, it's simple and effective.