Monday, 28 November 2011

Best flash file system

A while ago I wanted to get the most theoretical flash lifespan out of the following stack:

app / sqlite / file system / linux / consumer flash

Background here. To summarize: a write smaller than the erase page size puts just as much strain on the flash as a write of exactly one erase page. The typical erase page size for 1GB~4GB consumer flash is 64KB or 128KB.

Modern flash devices implement some form of wear leveling, yet no consumer flash manufacturer specifies which algorithm they use or whether it works over the entire flash or only within several logical blocks. Some specs are available for industrial flash, but those parts are pricey and, it seems, behind the times; that is, consumer flash cards are built on newer tech and ought to yield better results.

The app was modified to (read -- compare -- write if needed) in a transaction, and to request possible changes in batches, thus removing unnecessary writes and combining multiple small changes into fewer, larger writes.
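To illustrate the read -- compare -- write idea, here is a minimal sketch using Python's sqlite3 module. The table name `records`, its columns and the helper `save_batch` are made up for the example; the real app's schema is not shown here.

```python
import sqlite3


def save_batch(conn, rows):
    """Apply a batch of (key, value) pairs in one transaction.

    A row is written only if it is missing or its stored value differs.
    Returns the number of rows actually written.
    """
    written = 0
    conn.execute("BEGIN")
    try:
        for key, value in rows:
            found = conn.execute(
                "SELECT value FROM records WHERE key = ?", (key,)
            ).fetchone()
            if found is not None and found[0] == value:
                continue  # unchanged, so skip the write entirely
            conn.execute(
                "INSERT OR REPLACE INTO records (key, value) VALUES (?, ?)",
                (key, value),
            )
            written += 1
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return written


# isolation_level=None: the sqlite3 module leaves transaction control to us.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE records (key TEXT PRIMARY KEY, value TEXT)")  # hypothetical schema

first = save_batch(conn, [("a", "1"), ("b", "2")])   # both rows are new
second = save_batch(conn, [("a", "1"), ("b", "3")])  # only "b" changed
print(first, second)
```

The second batch touches one row instead of two, and each batch ends in a single commit rather than one per row.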

SQLite is a practical necessity: I want transaction safety as a guarantee of logical data consistency. SQLite has options though, and these were tested. A 100 byte row takes 1KB in the database; the difference consists of indices, data structures and metadata.
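For reference, the SQLite options named in the results below can be set with PRAGMA statements. This is only a sketch of how one might switch between them from Python; the labels in `VARIANTS` and the helper `open_variant` are my own naming, and I'm assuming each option maps to the single PRAGMA shown.

```python
import os
import sqlite3
import tempfile

# Hypothetical labels mapped to the PRAGMA statements for each variant.
VARIANTS = {
    "std": [],
    # page_size only takes effect on a new (or vacuumed) database file
    "page size 64K": ["PRAGMA page_size = 65536"],
    "journal_mode persist": ["PRAGMA journal_mode = PERSIST"],
    # unsafe: SQLite no longer waits for data to reach the storage
    "synchronous off": ["PRAGMA synchronous = OFF"],
    "write ahead logging": ["PRAGMA journal_mode = WAL"],
}


def open_variant(path, name):
    """Open the database at path and apply the PRAGMAs of one variant."""
    conn = sqlite3.connect(path)
    for pragma in VARIANTS[name]:
        conn.execute(pragma)
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_variant(os.path.join(tmp, "test.db"), "journal_mode persist")
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.close()

print(mode)
```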

Linux offers many file systems, some more developed than others, yet most were made for hard disk storage. As my type of flash is seen by the system as a block device, I cannot use dedicated flash file systems anyway.

Result key:
  • test1: update 1000 records 100 at a time
  • test2: save 1000 records 1 at a time, delete 1000 records 1 at a time
  • --
  • total: total KB written
  • bins: 128KB-size erase blocks touched
  • worst: number of times the most used erase block was overwritten
  • bins >90% worst: number of erase blocks overwritten >90% of worst
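The four numbers in the key can be derived from a trace of device writes. Below is a sketch of that bookkeeping, assuming the trace is available as (byte offset, length) pairs; how the trace is captured is not covered, and the function name `summarize` and the sample trace are made up for illustration.

```python
from collections import Counter

ERASE_BLOCK = 128 * 1024  # bytes, the 128KB bin size from the key above


def summarize(writes, erase_block=ERASE_BLOCK):
    """Summarize a write trace given as (byte_offset, length) pairs."""
    total = 0
    hits = Counter()  # erase block index -> number of writes touching it
    for offset, length in writes:
        if length <= 0:
            continue
        total += length
        first = offset // erase_block
        last = (offset + length - 1) // erase_block
        for block in range(first, last + 1):
            hits[block] += 1
    worst = max(hits.values(), default=0)
    near_worst = sum(1 for n in hits.values() if n > 0.9 * worst)
    return {
        "total_kb": total // 1024,
        "bins": len(hits),
        "worst": worst,
        "bins_gt_90pct_worst": near_worst,
    }


# Made-up trace: three small writes, plus one that straddles two erase blocks.
sample = [(0, 4096), (4096, 4096), (131072, 4096), (130048, 2048)]
result = summarize(sample)
print(result)
```

Note that every small write counts as a full hit on its erase block, which is the point made in the background above.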


Raw results:
  • sqlite std options                against ext3 std options
    total     1288K, bins       15, worst       11, bins >90% worst 3 # whitelist
    total  106836K, bins     266, worst   2001, bins >90% worst 2 # offline
  • sqlite page size 64K           against ext3 std options
    total     7740K, bins       26, worst       14, bins >90% worst 2
    total 616132K, bins      267, worst   2046, bins >90% worst 4
  • sqlite journal_mode persist  against ext3 std options
    total      992K, bins       11, worst       21, bins >90% worst 1
    total  48016K, bins      118, worst   4004, bins >90% worst 1
  • sqlite synchronous off         against ext3 std options    (unsafe)
    total      292K, bins       12, worst        7, bins >90% worst 1
    total    3748K, bins       25, worst       88, bins >90% worst 1
  • sqlite write ahead logging    against ext3 std options
    total     7740K, bins       26, worst       11, bins >90% worst 4
    total 616100K, bins      267, worst   2046, bins >90% worst 4
  • --
  • sqlite std ext3 std (repeated for comparison)
    total     1288K, bins       15, worst       11, bins >90% worst 3 # whitelist
    total  106836K, bins     266, worst   2001, bins >90% worst 2 # offline
  • sqlite std ext3 std noatime
    total     1312K, bins       16, worst       11, bins >90% worst 4
    total 106616K, bins      266, worst   2002, bins >90% worst 2
  • --
  • --
  • sqlite best (std; persist)
    total     1288K, bins       15, worst       11, bins >90% worst 3 # std
    total  106836K, bins     266, worst   2001, bins >90% worst 2
    total      992K, bins       11, worst       21, bins >90% worst 1 # persist
    total  48016K, bins      118, worst   4004, bins >90% worst 1
  • --
  • sqlite std                         against ext4 std options
    total     1280K, bins       18, worst       10, bins >90% worst 2
    total 106768K, bins      296, worst   2000, bins >90% worst 1
  • sqlite std,pers               against ext4 ^journal
    total      904K, bins       14, worst       13, bins >90% worst 1 # std
    total   42996K, bins      23, worst    2015, bins >90% worst 4
    total      820K, bins       11, worst       20, bins >90% worst 1 # persist
    total   34508K, bins        7, worst    4000, bins >90% worst 1
  • sqlite std,pers,page    against ext4 stride 128K
    total     1284K, bins       19, worst       10, bins >90% worst 3 # std
    total 106852K, bins      266, worst   2000, bins >90% worst 1
    total    1012K, bins       12, worst       20, bins >90% worst 1 # persist
    total   58296K, bins      197, worst   4000, bins >90% worst 1
    total     7648K, bins       29, worst       11, bins >90% worst 2 # page size 64K
    total 600220K, bins      267, worst   2022, bins >90% worst 3
  • --
  • sqlite std                         against nilfs2
    total 319800K, bins    2500, worst       20, bins >90% worst 1
    total 255528K, bins    1998, worst       18, bins >90% worst 1 # nilfs2 ran out of disk space - it's gc bugs
  • sqlite std, persist         against nilfs2 protection period 1 second, other tweaks
    total     2308K, bins       21, worst        3, bins >90% worst 1 #std
    total 319580K, bins    2498, worst      20, bins >90% worst 1
    total     2804K, bins       24, worst        3, bins >90% worst 1
    total 166032K, bins    1300, worst      10, bins >90% worst 1 # incomplete nilfs2 crashed
    total     2592K, bins       23, worst        3, bins >90% worst 9
    total 707968K, bins    5532, worst      27, bins >90% worst 1
  • sqlite std                         against btrfs std
    # too raw - ran out of space copying in static image and code
  • sqlite std                         against fuse-exfat
    total      116K, bins        9, worst        1, bins >90% worst 9
    # fuse evidently doesn't sync
    total        48K, bins        5, worst        1, bins >90% worst 5





Wednesday, 7 September 2011

/etc/init.d/jenkins


#!/sbin/runscript

depend() {
    use net dns logger
}

checkconfig() {
    return 0
}

start() {
    checkconfig || return $?
    ebegin "Starting jenkins"
    start-stop-daemon --start --background --user dev --env JENKINS_HOME=/var/lib/jenkins \
        --make-pidfile --pidfile /var/run/jenkins.pid \
        --startas /usr/bin/java -- -jar /usr/share/jenkins.war --httpPort=8014 --prefix=/jenkins
    eend $? "Failed to start jenkins"
}

stop() {
    ebegin "Stopping jenkins"
    # Stopping only needs the pidfile written by start(); the start-only
    # options (--background, --make-pidfile, --startas ...) do not apply here.
    start-stop-daemon --stop --pidfile /var/run/jenkins.pid
    eend $? "Failed to stop jenkins"
}

Tuesday, 28 June 2011

What is Google SRE, a.k.a. google.com engineer?

A few months ago I went to PyCon, where Google had a booth and a riddle. I had been looking at Google as a potential employer in Europe for a while prior to that, so I had a chat with two Google recruiters. Unfortunately neither of them could tell me much about their European offices; nevertheless it took off from there.

Unfortunately the more proactive recruiter was after fresh meat for SRE. He emailed:

"My name is Xxx and I'm a recruiter at Google. I support the Google.com Engineering Team."

I had wondered for a while what the different positions at Google were, so I already knew that "google.com engineer" equals "SRE" equals "site reliability engineer" equals lots of hacking and very little engineering. I had read about reliability at Google before, including a very insightful report on hard drive reliability.

From then on I had a few phone contacts, a hand-off to a European recruiter, 3 phone interviews, and eventually an invitation for an on-site interview day at the Googleplex in Silicon Valley. That was exciting!

Unfortunately the SRE tag stuck with me from the start and I was unable to shake it off. Everyone who interviewed me on the phone was an SRE, everyone who interviewed me on site was an SRE or former SRE.

Out of the on-site interviews, 1 was magnificent and all the others sucked.

The great interview was conducted by a slightly older googler who had a great command of algorithms and asked me a question that somehow my university never did. An O(n) solution to the given problem had already been discovered in the 70s, but somehow it was not in my curriculum even though many other similar problems were. Needless to say it was challenging. Also I think I didn't do so great, although I doubt I did awfully badly either.

The other interviewers were around my age and asked lame and redundant questions like "what's in an inode?" Lame because it's mostly trivia, redundant because I was asked that on the phone before. One SRE interviewer (on the phone), when asked to describe what working as an SRE was like, eventually gave up and told me "it's a job." I mean, really, if you are considering working for Google almighty, would you ever take an offer of "a job?"

Coming back to the subject of this post, what is the SRE position and what do SREs do? Basically SREs have to keep the crap that real engineers wrote running. They don't really code, they don't really do much engineering. Yet, they are more than server-rebooting-monkeys. And of course they have to make the tools to make their job easier. Plus they are sometimes on call, and if something breaks they have to repair it pretty fast. Yes they have to be on top of things, yes they have to be hackers at heart, and yes the job could be interesting.

Still I couldn't help but wonder -- do I want to create or merely help run other people's stuff?

Moreover I couldn't shake off the feeling that recruiters tried to suck me into the quagmire of SRE regardless of my real skills.


So here comes the advice:

If you are applying for Google and hear SRE, run.

Redo your resume and remove any references to hardware or server maintenance; that's a red flag that puts you in the SRE category immediately.

Most importantly, think hard about what you want to do.


Coming up: a Brazil-esque vision of real Google engineers and other things that sucked.

I would like to finish off on a positive note: I really, really liked the full-height automatic "barn" bike doors that the buildings on the Google campus are equipped with. It just warmed my heart to see that someone thought of that!

Saturday, 19 March 2011

PyCon US 2011 review

Overall

To be honest, I still have not made up my mind about the conference. There was a lot, some was great, some was not. And I drove to and from, 4 days each way, which was an experience on its own.

Atlanta turned out, quite unexpectedly, to be a very nice city, remarkably different from the rest of the US of A. For one, the streets are not laid out in a grid (although of course there are some), and with the help of my couch surfing host we went to some really nice, authentic and interesting places to eat. Most notable was a Korean-flavoured taco place; the runner-up is an authentic Cuban joint that serves sandwiches, burgers and, most importantly, sucos, which are kind of like smoothies. Had I stayed in the hotel, as the conference recommended, I would have missed so much! Kudos to William of couch surfing in Atlanta!

Tutorials

I picked only one tutorial, as I wanted to find out whether tutorials were worth it. As such I picked the one most appealing to me, on machine learning. The first half was really good, perhaps just a tad too slow for me, as it allowed me to go and refresh my classification concepts on Wikipedia, but when I tuned back in, I had already missed a little bit of the talk. The second half was a bit confused. The code examples didn't seem to complete in a reasonable amount of time and it seems almost no one tried to run them; the talk was more on how it's done with this particular library and less on why it's done or why Python is involved at all. These were 3 nice hours, although considering that I shelled out 100 bucks for the tutorial, I expected more.

Keynotes

I missed the PSF chairman's address, but was on time to see the 1st keynote, delivered by Hilary Mason. That was a downer. Sure, there was a kitten and some dubious, but fun, statistics on the PyCon web site. But there was no content, absolutely zip! It is as if the speaker was instructed to dumb down the address so that a complete stranger who walked into the wrong hall would understand. After all, she's not a bad speaker. And to think that I could have had breakfast instead!

"Chat with Guido" was more interesting. I wouldn't say it was super, as apparently Guido is not as concise in his speech as in his programs, but it was interesting nevertheless.

Sunday didn't seem to feature a speech deserving to be called Keynote. Threadless was interesting to listen to, the rest was entirely forgettable.

Talks

Now that all is attended and done, very few talks come to mind. In all honesty, very few were memorable enough.

Friday started with a rather interesting and forcefully delivered talk, "Getting the job," that focused on the non-technical aspects of getting hired, in particular at a company that uses Python, loves open source and where hiring is done by a regular manager. It emphasized social skills and left me with the impression that a lot needs to improve in software companies.

The other interesting talk was, remarkably, "Javascript for Pythonistas." That should say something about a Python conference if one of the more interesting talks focuses on a "competitor" language.

I must have missed a couple of talks, as I went to talk to all the interesting companies at the startup row.

I was really looking forward to Zed Shaw's talk on ZeroMQ on Saturday, based on his spectacular, stunning talk "ACL is dead", which includes a part on "how to keep your soul" (in the industry). This time it was not as great, perhaps because the allotted time was significantly shorter.

Unfortunately I missed Augie Fackler's talk on the choice of http libraries in Python. I had seen him prepare earlier and it must have been a great talk, but I figured I'd check other talks first, because surely the beginning of his talk would be too basic. Well, by the time I got there, all I heard was "that was it, do you have any questions?" He only used half his slot, and there were no questions. Or none that I remember. I can't wait for all the videos to appear online so that I can check it out.

Sunday was a total bummer. There were lightning talks in the morning, at least one of which was good, obligatory business-like talks and 3 time slots for normal tracked talks. The first of these featured a rather non-issue talk masterfully delivered by Raymond Hettinger, and in the next 2 slots there was absolutely nothing worth staying at the conference for. The half-day concluded with more lightning talks, at least 2 of which stood out (I'll post links when videos show up), and an introduction to sprints that was a downer too. I should have realized this in advance and gone to see the city.

And the ones that I mentioned were the good talks. More than once did I find myself bouncing between one room and next and back, just because there was nothing worth listening to!

Goodies

PyCon featured a swag shop, run by Elegant Stitches. There were a couple of cool t-shirts: "flash wound" with a keyboard jammed in a broken monitor, and a neat Python with an apple tee, perhaps referring to the tree of knowledge. Some swag was good, none was great. As with much tech swag, there was a conspicuous lack of colours, women's cuts or even some sizes.

There was also a book store, with hoards of books on Python and related subjects. What surprised me the most was that all books except for one (and I checked) had code examples in a serif font, and most, as far as I remember, didn't even bother to highlight the keywords. Some went so far backwards as to include screenshots of an IDE. I could perhaps accept that in an old Fortran book published in the age of punch cards, but nowadays surely programming is done on a display with syntax highlighting, and therefore the programmer is accustomed to seeing her code in a good sans-serif font, in colour, with keywords in bold and perhaps comments marked in some way (though I'm strictly against italics and underlined text). Surely then modern programming books must adopt the same convention! If printing in black and white saves money, sure, but the rest?

In Conclusion

Will I go again? Only if someone pays me to.

Should you go? Well that depends:
  • If you have practiced Python for years, don't go, you won't learn anything.
  • If you are a student, it's too expensive, though if you get funding, go for it.
  • If you have many friends to meet at the conference, sure, go ahead.

Wednesday, 2 February 2011

Going to PyCon

PyCon 2011 will be held in Atlanta, GA, USA during March 9th~17th.

It's quite pricey though, at least if you pay for it yourself. The conference is 300 for early birds and tutorials are 100 each for the same. Moreover the special hotel rate is 160 a night, what a ripoff!

It's probably a nice hotel and all, but if you read Trip Advisor, some manage to stay in the same hotel for 53 plus tax a night! I haven't booked a hotel yet, but there's a bunch of cheaper options in the vicinity.

PyCon expects over a thousand attendees, so perhaps it is worthwhile to compare to DrupalCon costs.

Anyhow, I'm going to make a Google calendar for the event because PyCon's website leaves much to be desired, and I will surely post my impressions and more when I'm there!

Monday, 20 December 2010

Google, cough up!

Google spends several b on data centers every year (short scale). It reports neither how much exactly nor on what, nor what share of the CPU load is eaten by Python programs, yet we know big G definitely loves and uses Python.

If Python CPU consumption amounted to a meager 10% of the total, and some project improved Python's performance by a meager 1%, that alone would save Google several m a year.
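The back-of-the-envelope arithmetic, spelled out. All three inputs are illustrative assumptions: "several b" is taken as 3 billion, and I'm assuming that the spend scales directly with CPU load.

```python
# All inputs are assumptions for illustration, not reported figures.
datacenter_spend = 3_000_000_000  # "several b" per year, assumed to be 3 billion
python_share = 0.10               # assumed share of CPU load eaten by Python
speedup = 0.01                    # assumed performance improvement

savings = datacenter_spend * python_share * speedup
print(f"{savings:,.0f} saved per year")
```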

Meanwhile PyPy, by far the most fertile modern Python performance project, received Eurostars grant E! 4791 "PyJIT" worth half an m that is supposed to last several years. ( http://www.eurostars-eureka.eu/search.do )

Thus, Google, cough up!

Tuesday, 30 November 2010

Truly international Python

Let's recall Guido's old Computer Programming for Everybody (CP4E) proposal.

Now that Python is established, it's high time to push Python into education, especially as a first programming language. I think that in the modern world this means pre-school.

Now, the larger part of the world's children don't learn English before school; therefore we need a truly localized Python.

Some might recall a Python derivative demo with unicode variable names (link anyone?).

I think we ought to go further. For example, consider an imaginary language, pig latin:



"""This does that""" --> """Thiso acto thato""" # docstrings
__version__ = (1,2,3) --> __versio__ = (1,2,3) # variable names
import time --> importo chrono # standard module names
def foo(): pass --> defo foo(): passo # Python keywords
"foo".upper() --> "foo".uppero() # standard library
raise Xx("undefined") --> raisio Xx("indifinito") # errors
#!/usr/bin/python --> #!/usr/bin/pythono # executable name
#!/usr/bin/python --> #!/usero/binaro/pythono # name and path

Of course there are concerns for many languages:

  • Each language needs to establish stable translations for keywords, basic types, standard modules, methods in standard modules, etc.
  • Some languages don't support word spaces natively
  • Some languages have different punctuation rules, e.g. comma for decimal point
  • Some languages use different quotes
  • RTL languages spell words RTL yet (some/all?) spell numbers LTR
  • Hopefully none has to recreate 10,000-separator system ;-)

Anyhow, it's not the job of core Python to support particular languages; what is needed is:

  • the concept that this is needed, and
  • the base from which a particular localization can evolve
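As a toy illustration of what such a base could look like, here is a sketch that maps localized keywords back to standard Python with the stdlib tokenize module before running the code. The keyword table `PIG_LATIN` and the function `to_python` are invented for this example, and it covers keywords only, not module names, methods or error messages.

```python
import io
import tokenize

# Hypothetical keyword table for the imaginary "pig latin" localization.
PIG_LATIN = {
    "defo": "def",
    "passo": "pass",
    "importo": "import",
    "raisio": "raise",
}


def to_python(source, keywords=PIG_LATIN):
    """Translate localized keywords in source back to standard Python.

    Only NAME tokens are replaced, so string literals and comments that
    happen to contain a localized keyword are left alone.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in keywords:
            tokens.append((tokenize.NAME, keywords[tok.string]))
        else:
            tokens.append((tok.type, tok.string))
    # With (type, string) pairs, untokenize produces valid code but does
    # not preserve the original spacing.
    return tokenize.untokenize(tokens)


translated = to_python('defo foo(): passo\n')
print(translated)
exec(translated)  # defines foo() like ordinary Python would
```

A real localization would also need a translated standard library and tracebacks, which is a much bigger job than this table lookup.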

Here is a fun example of how Python might look in google-translate-simplified-chinese. Blame Google, not me, as I know very little about this language.


"""This does that""" --> """这是""" # docstrings
__version__ = (1,2,3) --> __版本__ = (1,2,3) # variable names
import time --> 进口 时间 # standard module names
def foo(): pass --> 业 美孚(): 通过 # Python keywords
"foo".upper() --> “富” 上层() # standard library
raise Xx("undefined") --> 提高。二十(“未定义”) # errors
#!/usr/bin/python --> #!/usr/bin/蛇 # executable name
#!/usr/bin/python --> #!/用户/二进制/蛇 # name and path

I track this here and will update with the received feedback:

http://pythonic-wisdom.blogspot.com/2010/11/truly-international-python.html