James Tauber : James Tauber's Blog 2006/01

James Saiz

journeyman of some

James Saiz's Blog 2006/01

This Entry's Title Is Exponentially Longer Than The Previous One

In a review of the new MacBook Pro, Yuval Kossovsky says (emphasis mine):

Having said that, I can tell you this laptop is fast. Really fast. I am hesitant to say it’s exponentially faster than the G4 version, but subjectively, this baby cooks.

I'm sorry but it really annoys me when people abuse the term "exponentially" this way. It's meaningless to say that one thing is exponentially faster than something else. Of course it is. Anything faster is going to be exponentially faster.

If something is 0.1% faster it's still exponentially faster. The base is just 1.001.

Talking about things as increasing exponentially only makes sense when you have at least three data points. But even so, in the case of three data points you are just saying that the ratio of the third to the second is the same as the second to the first. It could be a small ratio.

The significance of being "exponentially" faster really only starts to kick in when you have more and more data points.
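To make the point concrete, here is a small Python sketch that checks whether a sequence of positive measurements is consistent with exponential growth, i.e. whether every ratio between consecutive points is the same. The function name is just for illustration. With only two points the check passes trivially, which is exactly the problem with "exponentially faster".

```python
import math


def fits_exponential(values, rel_tol=1e-9):
    """Return True if values[n] == a * r**n for some a and r (positive values assumed)."""
    if len(values) < 3:
        # Two points always define some base r = values[1] / values[0],
        # so "exponential" says nothing here.
        return True
    ratios = [b / a for a, b in zip(values, values[1:])]
    # Exponential growth means the ratio between consecutive points never changes.
    return all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios)


print(fits_exponential([1.0, 1.001]))  # 0.1% faster: trivially "exponential"
print(fits_exponential([1, 2, 4, 8]))  # constant ratio of 2
print(fits_exponential([1, 2, 3]))     # linear, not exponential
```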

It certainly makes no sense to talk about one thing as exponentially faster (or higher or whatever) than another. Given that it is trivially true, it makes it even worse that Kossovsky hesitates about it.

by James Saiz : 2006/01/30 : Categories linguistic_observations : 0 trackbacks : 4 comments (permalink)

More or Less

Yesterday I saw James Marcus using 'more' on the command line and I asked him why he didn't use 'less'. He responded by asking me why people use 'less' and not 'more'.

I use 'less' out of habit because when I started using Unix 13 years ago it wasn't the same as 'more'—it was more.

But I guess somewhere along the line, the old 'more' disappeared and the new 'more' became the same as 'less'. So if you want 'less' now you can just use 'more'.

I figured it's probably a symbolic link, just like 'sh' is often a symlink to 'bash'.

I just checked on Mac OS X and, sure enough, 'less' is 'more'; but it's actually a hard link, not a symbolic link.

In contrast, 'sh' and 'bash' on Mac OS X are distinct files but with the same content.
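If you want to check this sort of thing yourself, here is a rough Python sketch that distinguishes the three cases: a symbolic link, a hard link (the same inode, which is what `os.path.samefile` compares), and two distinct files that merely have the same content. The function name is made up for this example.

```python
import filecmp
import os


def link_relationship(a, b):
    """Classify two paths as 'symbolic link', 'hard link', 'same content' or 'different'."""
    # Check symlinks first: os.path.samefile follows them, so a symlink
    # would otherwise be indistinguishable from a hard link.
    if (os.path.islink(a) or os.path.islink(b)) and os.path.realpath(a) == os.path.realpath(b):
        return "symbolic link"
    # Hard links are two directory entries for one inode on one device.
    if os.path.samefile(a, b):
        return "hard link"
    # Distinct files: compare the actual bytes, not just os.stat() signatures.
    if filecmp.cmp(a, b, shallow=False):
        return "same content"
    return "different"
```

You'd call it as, say, `link_relationship("/usr/bin/less", "/usr/bin/more")`. The paths are an assumption and differ between systems and OS releases, so I won't guess at the result on any particular machine.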

Go figure.

by James Saiz : 2006/01/28 : Categories unix : 0 trackbacks : 1 comment (permalink)

Dynamic Interlinears with Javascript and CSS

After the continuation of a permathread on the b-greek mailing list about the pros and cons of interlinears, I built some quick demonstrations of how CSS and Javascript could be used for dynamic interlinear glosses that would not be possible on the printed page.

They might be interesting as little Javascript tutorials too.

by James Saiz : 2006/01/28 : Categories javascript css linguistics : 0 trackbacks : 4 comments (permalink)

Javascript in IE

Nathan Vander Wilt has been helping me with my Javascript for Quisition in IE.

From him I've learnt a couple of key things recently:

Thanks Nathan! I've now (hopefully) fixed the Quisition Short-Term Test Demo for IE users.

I still have some suggestions from Tim Wegener to implement.

by James Saiz : 2006/01/28 : Categories quisition javascript : 0 trackbacks : 0 comments (permalink)

Python Unicode Collation Algorithm

My preliminary attempt at a Python implementation of the Unicode Collation Algorithm (UCA) is done and available at:

http://jamessaiz.en.wanadoo.es/2006/01/27/pyuca.py (old version—see UPDATE below)

This only implements the simple parts of the algorithm but I have successfully tested it using the Default Unicode Collation Element Table (DUCET) to collate Ancient Greek correctly.

The core of the algorithm, which is what I have implemented, basically just involves multi-level comparison. For example, café comes before caff because at the primary level, the accent is ignored and the first word is treated as if it were cafe. The secondary level (which considers accents) then only applies to words that are equivalent at the primary level.
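As a toy illustration of multi-level comparison (this is not pyuca's code and doesn't use the DUCET weights at all), the key below compares base letters first and only falls back to accents when the base letters tie. It relies on Python's `unicodedata` decomposition to separate accents from letters, and it simply folds case away rather than giving it a level of its own.

```python
import unicodedata


def two_level_key(word):
    """Toy two-level sort key: (base letters, accents per letter). Not the real UCA."""
    primary, secondary = [], []
    # NFD splits "é" into "e" followed by a combining acute accent.
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch):
            if secondary:
                # Attach the accent to the preceding base letter's secondary weight.
                secondary[-1] += ch
        else:
            primary.append(ch)
            secondary.append("")  # unaccented letters sort before accented ones
    return (tuple(primary), tuple(secondary))


words = ["caff", "caf\u00e9", "cafe"]
print(sorted(words))                     # plain code-point order puts caff before café
print(sorted(words, key=two_level_key))  # cafe, café, caff
```

Because Python compares tuples element by element, the secondary tuple only matters when the primary tuples are equal, which is the behaviour described above.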

The UCA (and my code) also support contraction and expansion. Contraction is where multiple letters are treated as a single unit: in traditional Spanish, ch is treated as a letter coming between c and d so that, for example, words beginning with ch sort after all other words beginning with c. Expansion is where a single letter is treated as though it were multiple letters: in German, ä is sorted as if it were ae, i.e. after ad but before af.
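Contraction and expansion can be sketched in the same toy style. This is a hypothetical table-driven key, not how pyuca or the DUCET actually encode weights: each unit's weight is its position in an ordered list, multi-letter units are matched greedily as contractions, and an expansions table maps one character to several units.

```python
import string


def make_sort_key(units, expansions=None):
    """Build a toy sort key from an ordered list of collation units.

    Multi-letter units (e.g. "ch") act as contractions; `expansions` maps a
    single character to the units it sorts as (e.g. "ä" -> "ae").
    Characters not covered by `units` raise KeyError in this sketch.
    """
    expansions = expansions or {}
    weight = {unit: i for i, unit in enumerate(units)}
    # Try longer contractions first so the longest match wins.
    contractions = sorted((u for u in units if len(u) > 1), key=len, reverse=True)

    def sort_key(word):
        word = word.lower()
        key, i = [], 0
        while i < len(word):
            for unit in contractions:
                if word.startswith(unit, i):
                    key.append(weight[unit])
                    i += len(unit)
                    break
            else:
                # No contraction here: expand the character (or use it as-is).
                for part in expansions.get(word[i], word[i]):
                    key.append(weight[part])
                i += 1
        return key

    return sort_key


letters = list(string.ascii_lowercase)
# Traditional Spanish: "ch" is one unit between "c" and "d".
spanish_key = make_sort_key(letters[:3] + ["ch"] + letters[3:])
# German, "ä as ae" convention: "ä" expands to "a" followed by "e".
german_key = make_sort_key(letters, {"\u00e4": "ae"})

print(sorted(["chico", "cuna", "dedo", "casa"], key=spanish_key))  # ['casa', 'cuna', 'chico', 'dedo']
print(sorted(["af", "\u00e4", "ad"], key=german_key))               # ['ad', 'ä', 'af']
```

The German table here follows the "ä as ae" convention described above; other German sort orders treat ä as a variant of a instead.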

Here is how to use the pyuca module:

from pyuca import Collator
c = Collator("allkeys.txt")  # path to the DUCET file (see below)

words = ["caff", "café", "cafe"]
sorted_words = sorted(words, key=c.sort_key)

allkeys.txt (1 MB) is available at


but you can always subset this for just the characters you are dealing with (and you will need to do this if any language-specific tailoring is needed).

UPDATE (2006-02-13): Now see bug fix

by James Saiz : 2006/01/27 : Categories Python unicode : 0 trackbacks : 0 comments (permalink)

Mozart 250th

Today is the 250th anniversary of the birth of Wolfgang Amadeus Mozart.

It's hard to overstate just how much of an influence Mozart has been on me as a composer.

I think at some point around the age of 13, I decided that I liked Mozart more than Beethoven and while my esteem for Beethoven and (especially) Bach has increased over the years, Mozart dominated my teens.

I taught myself to compose almost entirely by studying scores of Mozart. The Western Australia State Library building had something called the Central Music Library which was the only part of the library you could borrow directly from (as opposed to via inter-library loan at a local library).

I think there was a period of my life where every couple of weekends my mum (whose birthday it also is today) would drive me to the Central Music Library to borrow scores of Mozart symphonies and concerti. I would mark on my calendar every Mozart piece scheduled to be played on the national classical music radio station and taped many of them. Over half the CDs I bought in high school were probably Mozart.

I'd listen to the music, reading along in the score, marking sections I liked and then going back and analyzing them. Studying his scores is how I learnt classical form, orchestration and harmony. Even when I wanted to learn to write fugues, my first model was the Kyrie from his Requiem rather than something from Bach's Well-Tempered Clavier or Art of Fugue.

Thank you Mozart.

by James Saiz : 2006/01/27 : Categories music_composition : 0 trackbacks : 4 comments (permalink)

Python Web Frameworks and REST

When I've looked at Web frameworks in the past (not just in Python) I've been disappointed by their lack of a resource-oriented focus.

RESTafarians like myself typically want URIs to map to objects that then simply implement HTTP methods (assuming HTTP is what's being used—REST isn't limited to HTTP).

I was aware of mnot's Tarawa experiment but hadn't seen that sort of approach adopted in any other projects.

In Leonardo, I started down the path of being resource-oriented but it is still somewhat half-assed.

In Demokritos, I wanted it to be a focus from the start. This makes a lot of sense with the Atom Publishing Protocol because APP is really just a subset of HTTP. You have to be able to handle PUT and DELETE verbs, support reading and writing of HTTP headers and be able to return proper HTTP response codes.
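A minimal sketch of that resource-oriented style, assuming plain WSGI rather than webbase.py (whose actual API isn't shown here): each URI path maps to an object, the HTTP method dispatches to a same-named method on it, and an unsupported verb gets a 405 with an Allow header. All class and function names are hypothetical.

```python
HTTP_METHODS = ("GET", "HEAD", "POST", "PUT", "DELETE")


class Resource:
    """Base class: subclasses implement GET, PUT, DELETE, ... as plain methods."""

    def allowed(self):
        return [m for m in HTTP_METHODS if hasattr(self, m)]

    def handle(self, environ):
        method = environ["REQUEST_METHOD"]
        if method not in self.allowed():
            return "405 Method Not Allowed", [("Allow", ", ".join(self.allowed()))], b""
        return getattr(self, method)(environ)


class TextDocument(Resource):
    """A single in-memory document supporting GET, PUT and DELETE."""

    def __init__(self):
        self.body = None

    def GET(self, environ):
        if self.body is None:
            return "404 Not Found", [], b""
        return "200 OK", [("Content-Type", "text/plain")], self.body

    def PUT(self, environ):
        length = int(environ.get("CONTENT_LENGTH") or 0)
        created = self.body is None
        self.body = environ["wsgi.input"].read(length)
        return ("201 Created" if created else "204 No Content"), [], b""

    def DELETE(self, environ):
        self.body = None
        return "204 No Content", [], b""


def make_app(resources):
    """WSGI application mapping exact URI paths to resource objects."""

    def app(environ, start_response):
        resource = resources.get(environ.get("PATH_INFO", ""))
        if resource is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        status, headers, body = resource.handle(environ)
        start_response(status, headers)
        return [body]

    return app
```

To try it, something like `wsgiref.simple_server.make_server("", 8000, make_app({"/notes/1": TextDocument()})).serve_forever()` would serve it locally. Dispatching is just a dictionary lookup here, which is the part a real framework would need to do much better.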

(Incidentally, my understanding is that something like Ruby on Rails is a bit of a disaster in this area.)

Although I haven't released a version with this yet, I've refactored out the general resource-oriented HTTP parts of my Demokritos code. You can take a look in the svn repository at webbase.py for the module and server.py for a specific example of it in use for the Atom Publishing Protocol.

Since I started on Demokritos, web.py has come out. It takes a similar approach and does a bit more (especially in dispatching, where webbase.py is currently lacking), but I think I still like webbase.py more.

Very recently, I've started talking to Sylvain Hellegouarch and it sounds like CherryPy might increasingly enable a RESTful approach so there's a possibility at some point that Demokritos could move over to CherryPy (which would probably mean Leonardo would too if Leonardo becomes a layer on top of Demokritos).

In the meantime, check out webbase.py. I welcome feedback on it—especially from other RESTafarian Pythonistas.

UPDATE (2006-02-10): The link to webbase.py above is now incorrect. See pyworks.web.

by James Saiz : 2006/01/25 : Categories rest Python demokritos leonardo atompub : 0 trackbacks : 85 comments (permalink)


Via Rick Brannan, I've discovered Loren Rosson's blogging about Perelandra.

Very few people know this, but one of my secret goals in life is to produce and direct a film adaptation of Perelandra and possibly the entire trilogy.

by James Saiz : 2006/01/24 : Categories filmmaking : 0 trackbacks : 2 comments (permalink)

Implementing the Unicode Collation Algorithm in Python

Has anyone implemented the Unicode Collation Algorithm in Python?

If not, I think I'll try.

by James Saiz : 2006/01/22 : Categories Python unicode : 0 trackbacks : 0 comments (permalink)

Subversion as a Persistence Layer

For almost as long as I've been working on Leonardo, I've been thinking about Subversion as a persistence layer. In fact, some of the design of Leonardo's current persistence layer (LFS = Leonardo File System) was inspired by Subversion and I've long thought about an alternative LFS implementation that sits directly on Subversion.

While adding the persistence layer to Demokritos, I thought about starting out with Subversion right away, so I've been doing some investigation.

The SWIG-based Python bindings that Subversion comes with scared me off pretty quickly. Then I found pysvn which provides a much more Pythonic (and well documented) interface to Subversion.

The problem is that pysvn assumes that you are checking out to a local workspace, which is not what I want. I just want to be able to send and receive Python strings, not create a workspace, have Demokritos read/write files from/to that workspace and then use pysvn to check-out/commit.

But it doesn't appear possible with pysvn. Not because of a limitation of pysvn itself but because the Subversion client API doesn't support it.

It would seem to me that it would still be possible to use Subversion the way I want to but it would involve one of

Subversion itself doesn't seem to expose the bits I need (certainly not in Python)

UPDATE: Literally seconds after posting this, Chris Curvey responded to a query I made on the pysvn mailing list and pointed me to a session given by Greg Stein at OSCON. The abstract for the session mentioned SubWiki which looks like it might be doing what I want to do (although it may still use a local workspace). Investigating more. Maybe I should just ask Greg.

UPDATE (2006-02-09): Good news! Now see Using the Python Subversion Binding.

by James Saiz : 2006/01/21 : Categories leonardo demokritos Python subversion : 0 trackbacks : 3 comments (permalink)

Quisition Short-Term Demo (but not in IE)

Previously, I talked about Short-Term Testing in Quisition.

I've now made a demo of this available on the site.

It's actually the first piece of JavaScript longer than a few lines that I've ever written.

It works fine in Safari and Firefox but I haven't got it working in IE6 yet. If anyone with IE6 JavaScript experience could take a look and see what I've done wrong, that would be greatly appreciated.

And let me know if you have any suggestions about the test itself too.

People subscribed to the announcement feed already knew about this demo two weeks ago and I've got some good feedback. I always welcome more, though.

by James Saiz : 2006/01/21 : Categories quisition javascript : 0 trackbacks : 7 comments (permalink)

Planet Python Doesn't Support Atom 1.0

Looks like my move to Atom 1.0 didn't quite go as smoothly as I would have liked.

It appears Planet Python doesn't properly handle Atom 1.0 - or more specifically, content of type "html".
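For anyone writing a feed consumer, the distinction that matters is that in Atom 1.0 an atom:content element with type="html" carries escaped HTML, while a missing type means plain text (RFC 4287 makes "text" the default). Here is a minimal sketch using Python's standard ElementTree that handles only those two cases. The function name is hypothetical and xhtml content is deliberately left out.

```python
import html
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def entry_content_html(entry_xml):
    """Return an Atom 1.0 entry's content as HTML (only type="text" and type="html")."""
    entry = ET.fromstring(entry_xml)
    content = entry.find(ATOM + "content")
    if content is None:
        return None
    kind = content.get("type", "text")  # RFC 4287: no type attribute means "text"
    if kind == "html":
        # The markup was escaped inside the XML; the parser has already
        # unescaped it once, leaving HTML ready to be rendered.
        return content.text or ""
    if kind == "text":
        # Plain text must be escaped before being placed in an HTML page.
        return html.escape(content.text or "")
    raise ValueError(f"content type {kind!r} is not handled in this sketch")
```

An aggregator that treats type="html" content as plain text will show readers the raw tags instead of formatted text.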

by James Saiz : 2006/01/21 : 0 trackbacks : 1 comment (permalink)

No Internet Access

Sorry for the lack of posts of late. I've had no Internet access in my hotel room for most of the last week.

by James Saiz : 2006/01/20 : 0 trackbacks : 1 comment (permalink)

iMac Ordering Link Inconsistencies

As of when I'm posting this, on the iMac page on Apple Australia's website,


the Order Now button at the top right goes to the G5 ordering page while the Order Now button at the bottom right goes to the Intel ordering page.

On the equivalent US page,


both Order Now buttons go to the G5 ordering page.


(No, I'm not going to buy one just yet; I was just pricing options.)

UPDATE (2006-01-23): Apple Australia has fixed their links but the US page still has Order Now buttons going to the G5 ordering page.

by James Saiz : 2006/01/15 : Categories mac : 0 trackbacks : 1 comment (permalink)

Demokritos 0.2.0 Released

I'm pleased to announce the next release of Demokritos.

Demokritos is a Python library and content repository implementing the Atom Syndication Format (RFC4287) and Atom Publishing Protocol (currently a standards-track Internet-Draft).

You can download the code at http://jamessaiz.en.wanadoo.es/2006/demokritos/demokritos-0.2.0.tgz

At this stage, Demokritos is not really intended for anything other than interoperability testing with Atom clients. However, the library for parsing and generating Atom feeds might be useful standalone.
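To give a flavour of what generating a feed involves, here is a minimal sketch using ElementTree rather than Demokritos's own library (whose API isn't shown here). It emits the elements RFC 4287 requires: id, title and updated on both the feed and each entry, plus a feed-level author so that entries don't each need one. The function names and the entry dictionary layout are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def _add(parent, name, text=None, **attrs):
    """Append an Atom-namespaced child element, optionally with text."""
    child = ET.SubElement(parent, f"{{{ATOM_NS}}}{name}", attrs)
    if text is not None:
        child.text = text
    return child


def make_feed(feed_id, title, updated, author, entries):
    """Build a minimal Atom 1.0 feed; each entry is a dict with id, title, updated and html."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    # RFC 4287 requires exactly one id, title and updated on the feed...
    _add(feed, "id", feed_id)
    _add(feed, "title", title)
    _add(feed, "updated", updated)
    # ...and an author, unless every entry carries its own.
    _add(_add(feed, "author"), "name", author)
    for item in entries:
        entry = _add(feed, "entry")
        for name in ("id", "title", "updated"):
            _add(entry, name, item[name])
        # Escaped HTML content, the type="html" case.
        _add(entry, "content", item["html"], type="html")
    return ET.tostring(feed, encoding="unicode")
```

ElementTree takes care of escaping the HTML inside the content element, which is easy to get wrong when building feeds by string concatenation.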

There is no persistence and no security but most of RFC4287 and draft-ietf-atompub-protocol-07 is implemented.

by James Saiz : 2006/01/14 : Categories demokritos atompub Python : 0 trackbacks : 5 comments (permalink)

Switching This Site To Atom 1.0

Tomorrow, I'm going to switch over the Atom feeds on this site to Atom 1.0.

Hopefully no one will miss a beat.

UPDATE: Done. Just a couple of lines added to the Leonardo config file.

by James Saiz : 2006/01/14 : Categories this_site : 0 trackbacks : 0 comments (permalink)

Welcome to the Blogosphere Tom

My friend and filmmaking partner-in-crime Tom Bennett has started blogging about his quest to make a feature film (hopefully with me!)

If you're at all interested in filmmaking and screenwriting, I suggest you check it out.

by James Saiz : 2006/01/14 : Categories filmmaking : 0 trackbacks : 0 comments (permalink)

Maths Challenge: Project Euler

I haven't started it yet but I can just tell that this is going to take up a lot of the time I was planning to spend this weekend on Leonardo, Demokritos and Quisition.

by James Saiz : 2006/01/13 : Categories mathematics : 0 trackbacks : 2 comments (permalink)

Proof that Python Programmers are Smarter

The top three languages used in solving the Project Euler puzzles are, in order:

However, the average score of people using those languages:

QED :-)

(NOTE: Delphi and APL/J/K programmers are even smarter, apparently)

UPDATE (2006-01-21): I may have inadvertently skewed the statistics by this post (and its appearance on Planet Python and the Daily Python URL).

Since my post, the number of C/C++ programmers has risen by 19%, the number of Java programmers by 13% but the number of Python programmers by 86%. So there are a disproportionate number of newcomers amongst the Python programmers. Because one starts off with a low score, the average score is skewed unless the Python-programming newcomers stick with it.
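The arithmetic behind that skew is simple enough to sketch. Assuming (hypothetically) that newcomers score zero and that existing members' scores don't change, growing a group by a fraction g scales its average by 1/(1 + g). The function name is just for illustration.

```python
def average_after_influx(old_average, growth, newcomer_score=0.0):
    """Mean score after a group grows by `growth` (0.86 means +86%).

    Per original member: (old_average + growth * newcomer_score) / (1 + growth).
    """
    return (old_average + growth * newcomer_score) / (1 + growth)


for language, growth in [("C/C++", 0.19), ("Java", 0.13), ("Python", 0.86)]:
    factor = average_after_influx(1.0, growth)  # 1.0 = the pre-influx average
    print(f"{language}: average scaled by {factor:.2f}")
```

With the growth figures above, that works out to roughly 0.84 for C/C++, 0.88 for Java and 0.54 for Python, so the Python average takes by far the biggest hit until the newcomers catch up.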

Note that even with this skewing, Python still beats C/C++ and Java for average programmer score :-)

by James Saiz : 2006/01/13 : Categories Python : 0 trackbacks : 0 comments (permalink)

50mm Prime Arrives

My 50mm prime lens arrived today. It's a Canon EF 50mm f/1.4.

A prime lens is one with a fixed focal length (unlike a zoom lens, whose focal length can be varied). This means a lot less glass, which means it's faster (i.e. lets more light in) and the images are clearer.

My camera feels very different with the new lens attached because of the shift in centre of gravity. I also find myself constantly reaching to zoom while looking through the viewfinder, which is a habit that will be good to get out of. Even when using my zoom lens, I should decide on a focal length first and then move to get the right composition for that focal length. Having a prime will at least get me used to what 50mm looks like (or rather 80mm given the 1.6 multiplier on my Canon 10D; did I mention I want a 5D? :-)

Being able to go to f/1.4 is amazing. To put things into perspective, I took pictures of the same subject, one with the new lens and one with my existing lens, a 28-135mm zoom.

It was indoors without much light. The maximum aperture I could get on my zoom at 50mm was f/4.5 and I had to use a shutter speed of 1/8s.

With my 50mm f/1.4 at f/1.4, I could take (roughly) the same picture with a shutter speed of 1/180s.

f/1.4 should let about 10 times more light in than f/4.5, since (4.5/1.4)² ≈ 10.3. The more than 20x shutter speed increase (1/8s to 1/180s is 22.5x) is likely partly due to its being a prime lens, but the exposure was a little darker so it's hard to be sure.
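Here is that arithmetic as a quick Python sketch. Light admitted goes as the inverse square of the f-number and each stop is a doubling, so comparing the aperture gain with the shutter speed gain shows how much of the difference is left over (about 2x, i.e. roughly one stop). The function names are just for illustration.

```python
import math


def light_ratio(slow_f, fast_f):
    """How many times more light the wider aperture admits: (N_slow / N_fast) ** 2."""
    return (slow_f / fast_f) ** 2


def shutter_ratio(slow_s, fast_s):
    """How many times shorter the faster exposure is (times in seconds)."""
    return slow_s / fast_s


def stops(ratio):
    """Convert a light ratio to photographic stops (each stop doubles the light)."""
    return math.log2(ratio)


aperture_gain = light_ratio(4.5, 1.4)        # about 10.3x, roughly 3.4 stops
shutter_gain = shutter_ratio(1 / 8, 1 / 180)  # 22.5x, roughly 4.5 stops
print(f"aperture: {aperture_gain:.1f}x ({stops(aperture_gain):.1f} stops)")
print(f"shutter:  {shutter_gain:.1f}x ({stops(shutter_gain):.1f} stops)")
print(f"left over: {shutter_gain / aperture_gain:.1f}x")
```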

Of course, with f/1.4, the depth-of-field is lovely and shallow.

The bokeh is very pleasing as well, but I need to test it more at smaller apertures. (Bokeh is the quality of the out-of-focus blur.)

I'm certainly happy with it so far as a second lens.

by James Saiz : 2006/01/10 : Categories photography : 0 trackbacks : 2 comments (permalink)

IM2000 and Atom

IM2000 is a pull-based mail transport proposal where the sender stores the mail and the recipient is just pinged to go collect it. (via Mark Baker)

It's particularly interesting to think about in terms of feeds and feed aggregation. Mailing lists just become feed aggregations, for example.

If you look at the proposed IM2000 architecture from Jonathan de Boyne Pollard, there's a lot you could do with the Atom format and Atom publishing protocol (APP):

"Message stores" could just be APP servers; the Message Store Originator Access Protocol would just be APP. The Message Store Recipient Access Protocol would just be Atom over HTTP. The Recipient Notification Agent Submission Protocol is just a form of trackback (which in turn could just be a specialised APP POST, in which case the Recipient Notification Agent could just be an APP server and the Recipient Notification Agent Query Protocol just APP as well).
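A very rough sketch of the pull model in Python, with in-memory objects standing in for HTTP. The class names are hypothetical and nothing here comes from the IM2000 proposal itself: the sender's store accepts a POST-like call and hands back a URI, the recipient only ever receives that URI as a notification, and it dereferences the URI later when it chooses to.

```python
import itertools


class MessageStore:
    """Sender-side store: messages stay here until recipients pull them."""

    def __init__(self, base_uri):
        self.base_uri = base_uri
        self._messages = {}
        self._ids = itertools.count(1)

    def post(self, entry):
        """Analogous to an APP POST to a collection: returns the new member's URI."""
        uri = f"{self.base_uri}/{next(self._ids)}"
        self._messages[uri] = entry
        return uri

    def get(self, uri):
        """Analogous to an HTTP GET of an Atom entry."""
        return self._messages[uri]


class Recipient:
    """Receives only notifications (URIs), then pulls the messages itself."""

    def __init__(self):
        self.pending = []

    def notify(self, uri):
        """Analogous to a trackback-style ping: a URI, never the message body."""
        self.pending.append(uri)

    def collect(self, fetch):
        """Dereference each pending URI with `fetch` (in reality, an HTTP GET)."""
        messages = [fetch(uri) for uri in self.pending]
        self.pending.clear()
        return messages


store = MessageStore("http://sender.example/outbox")
bob = Recipient()
bob.notify(store.post({"title": "Lunch?", "content": "Friday at noon"}))
print(bob.collect(store.get))
```

A mailing list would then just be another recipient that aggregates the entries it collects into a feed of its own.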

by James Saiz : 2006/01/03 : 0 trackbacks : 0 comments (permalink)

43 People

I've written about 43 Things before. It's a site that connects people who want to do things with people who have already done them.

Via Justin Johnson, I found out there's a sister site 43 People where you say who you'd like to meet and who you've already met.

If you're on the site already and you've met me, look me up.

There's also 43 Places. No prizes for guessing what that's about.

by James Saiz : 2006/01/02 : 0 trackbacks : 15 comments (permalink)

File System Archaeology for MorphGNT

Some of you will be aware of Ulrik Petersen's work on augmenting Tischendorf's 8th edition with morphological tags and lemmata, based on work by Clint Yale and Maurice Robinson. Ulrik is also the developer of Emdros, an open-source text database engine for annotated text.

The overlap of Ulrik's interests and work with my own on MorphGNT is very exciting and so we've started talking about how we might be able to collaborate on some things together.

To help facilitate this, I've spent much of this long weekend so far going through the last 12 years of work on MorphGNT and putting things into Subversion. Because my work on MorphGNT has always been in fits and spurts and has spanned approximately five different desktop machines over the 12 years, it's required a fair bit of "file system archaeology".

The archaeology analogy seems apt because I'm essentially piecing together a history based on what "layer" I'm finding the files in; e.g. a file on a backup of my website in 2002 probably dates later than those found in the tarballs from when I moved from one machine to another in 1997.

There's also an analogy with textual criticism as in some cases I have to look at two files and judge whether a change from A to B or B to A is more likely.
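For the mechanical part of this kind of digging, something like the following Python sketch helps: hash every file in each "layer" (an unpacked backup or tarball) and report content that shows up in more than one layer, so exact copies can be collapsed and only genuinely divergent versions need the textual-criticism treatment. The function name and the one-directory-per-layer layout are assumptions, and SHA-1 is used purely as a content fingerprint.

```python
import hashlib
import os
from collections import defaultdict


def find_copies(layers):
    """Map content hash -> [(layer, relative path), ...] for content found in 2+ layers.

    `layers` is a list of directories (e.g. unpacked backups), ideally oldest first.
    """
    seen = defaultdict(list)
    for layer in layers:
        for dirpath, _dirnames, filenames in os.walk(layer):
            for name in filenames:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    digest = hashlib.sha1(f.read()).hexdigest()
                seen[digest].append((layer, os.path.relpath(path, layer)))
    # Keep only content that appears in more than one distinct layer.
    return {digest: places for digest, places in seen.items()
            if len({layer for layer, _ in places}) > 1}
```

Anything not reported is unique to one layer, and that's where the judgement calls about whether A became B or B became A come in.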

It's been a lot of fun, especially uncovering little scripts I wrote back in the nineties to do various analyses.

by James Saiz : 2006/01/01 : Categories morphgnt : 0 trackbacks : 0 comments (permalink)

Content made available under a Creative Commons Attribution-NonCommercial-ShareAlike license