James Tauber : James Tauber's Blog 2006

James Saiz

journeyman of some

James Saiz's Blog 2006

Account Management Patterns

On the weekend, I drew some diagrams describing the account management sub-system I had written for Quisition, partly to see the patterns abstracted from the particular implementation.

Here's the login pattern:

login pattern

Elliotte Rusty Harold recently wrote about the problems with using GETs for confirmation.

I wanted account signup to involve being sent an email to ensure the user had given a legitimate email address, but cognisant of the issues Rusty raises, I made the email received on signup link to a further form the user then has to submit to truly activate the account:

sign up and activation

I originally had the "forget password form" directly resetting the password, but then I realised someone could maliciously enter the email address of another user to reset their password. Not a security issue so much (the new password goes to the right person) but it's a nuisance for the person if they didn't request the reset.

So I adopted an additional pattern where an email is sent which then takes the user to a reset password form:

sign up and activation

In both cases, the URI in the email includes a hash in the parameters so the GET that leads to the form can't be faked.
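A minimal sketch of the idea, assuming the "hash" is a keyed hash (HMAC) over the email address and the action. The post does not say how Quisition actually computes it, so `SECRET_KEY`, `sign`, `build_link`, `verify_link` and the example URL are all hypothetical names chosen for illustration:

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical server-side secret; it must never appear in the email itself.
SECRET_KEY = b"replace-with-a-real-server-side-secret"


def sign(email, action):
    """Return a hex HMAC-SHA256 over the action and email address."""
    message = ("%s:%s" % (action, email)).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def build_link(base_url, email, action):
    """Build the URI that goes into the confirmation email."""
    query = urlencode({"email": email, "action": action,
                       "hash": sign(email, action)})
    return "%s?%s" % (base_url, query)


def verify_link(url):
    """Check that the hash in the URI matches its other parameters."""
    params = parse_qs(urlsplit(url).query)
    try:
        email = params["email"][0]
        action = params["action"][0]
        supplied = params["hash"][0]
    except KeyError:
        return False
    expected = sign(email, action)
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    return hmac.compare_digest(supplied.encode("utf-8"),
                               expected.encode("utf-8"))
```

In this sketch the GET only decides whether to show the form; the state change (activation or password reset) would still happen on the form's POST.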

by James Saiz : 2006/03/20 : 0 trackbacks : 2 comments (permalink)

Switched over to lighttpd

I just switched a bunch of my sites over to running on lighttpd including http://morphgnt.org/, http://leonardo.pyworks.org/ and http://www.quisition.com/.

It took me a little while to work out how to translate my ScriptAlias directives in Apache to lighttpd (hint: configure mod_alias to map the request path to the CGI script, then configure mod_cgi to recognize files ending in certain characters as CGI scripts).

The only problem I now have is I've killed anonymous SVN access on pyworks.org because I was previously serving it up via Apache. I'm still investigating alternatives to running Apache just for this purpose.

by James Saiz : 2006/03/20 : Categories web lighttpd : 0 trackbacks : 0 comments (permalink)

Emacs, Unicode and Greek on Mac OS X

Ulrik Petersen pointed me to How to use EMACS with Unicode Greek (polytonic Greek (multiaccented) included) and LaTeX.

"I can go back to using Emacs!" I thought to myself (actually, I probably typed it out loud to Ulrik over IM)

All that remained was to find a more up-to-date OS X build of Emacs. OS X comes with 21.2 but the greek.el above requires 21.3.

My initial Google searching found that a lot of Emacs for OS X work ended in 2003.

Then I stumbled across this: Carbon Emacs.

Emacs 22 for Tiger (with Universal Build).

The anti-aliasing is beautiful and greek.el works a charm.

Now to dig up my old .emacs file...

by James Saiz : 2006/03/18 : Categories os_x emacs greek unicode : 0 trackbacks : 0 comments (permalink)

Leonardo 0.7.0 Released

It happened a few days ago, but I haven't announced it here yet...

I am pleased to announce the release of Leonardo 0.7.0.

Leonardo is the Python-based content management system that runs this site and provides blogging and wiki-style content.

New features include:

Plus some internal cleanup and bug fixes.

You can download it from the Leonardo Website.

by James Saiz : 2006/03/17 : Categories leonardo Python announcements : 0 trackbacks : 0 comments (permalink)

Accepted into PhD Programme at Essex

Today I received a package indicating my acceptance into the PhD programme in Linguistics at the University of Essex. I will be a part-time external student for the next six to eight years starting this April.

It's hard to describe just how much this means to me. Doing a doctorate is by far my oldest goal in life. I was about eight when I decided I wanted to do a PhD. In high school, I wanted to do it in theoretical physics (specifically general relativity) but 18 months into undergraduate studies decided I wanted to do it in linguistics.

Various reasons, both personal and commercial, delayed my commencement by a decade. But I always knew I wanted to come back to it. I'm finally on the path. Thank you to my referees and to my new supervisor, Andy Spencer.

Undoubtedly you'll hear a lot more about it on this blog over the years.

by James Saiz : 2006/03/14 : Categories phd linguistics announcements : 0 trackbacks : 9 comments (permalink)

Missing SxSW

The last two years, I've gone to SxSW. This year, the timing didn't work with the schedule at work.

Last year I expressed my disappointment about missing ETech because of SxSW but this year (when they were scheduled at different times) I've missed them both :-(

by James Saiz : 2006/03/14 : 0 trackbacks : 1 comment (permalink)

Amazon S3

Amazon's Digital Services division has launched a data storage web service called S3. Probably a decent way for them to make some money off excess storage they have.

They offer both a REST and SOAP interface. It took all of a minute or two for me to grok the REST interface. Just had to map a couple of things into a well-known mental model. With the SOAP interface, I felt far more like I was having to learn an entirely new way.

Of course, that comes as no surprise to me ;-)

by James Saiz : 2006/03/14 : Categories web rest amazon : 0 trackbacks : 0 comments (permalink)

Exploring lighttpd

Mostly for Quisition but also as an Apache-replacement for some of my other sites, I'm exploring lighttpd.

It sure looks nice so far. If software is judged by how it is configured, lighttpd is wonderful. A breath of fresh air!

by James Saiz : 2006/03/12 : 0 trackbacks : 2 comments (permalink)

Announcing MorphGNT.org

I've hinted before about Ulrik Petersen and I collaborating on Greek New Testament linguistic endeavours.

I'm now delighted to announce the website that will be the home of our collaborative work:


I've transferred my MorphGNT files over there and Ulrik has done the same with his Tischendorf 8th and Strong's Dictionary.

We've been working on a bunch of other stuff for the last few months which will eventually find its way on to that site too.

by James Saiz : 2006/03/12 : Categories morphgnt announcements : 0 trackbacks : 0 comments (permalink)

Recreational Programming

In his post Recreational Programming, Sam Ruby says:

For recreation, some people like to do NY Times crosswords puzzles in ink. Me, I like tackling small, incremental, computer programming tasks.

I can totally relate to that, as I'm sure many readers of this blog can. But it was Sam's title that really caught my eye. Recreational Programming is the term my significant other and I use to describe my various open source tinkerings.

I think we came up with the term after a conversation something like this many years ago when we'd only just started going out and she had no idea what she was in for...

HB: What are you doing?
Me: Programming.
HB: Late on a Saturday night? Is work really busy?
Me: No, it's not work.
HB: So why are you doing it?
Me: It's fun and it's relaxing.
HB: You find programming fun and relaxing?
Me: Yes. It's a form of recreation for me.

After that, the term recreational programming stuck. HB gets why I do it if I use that term.

So now conversations are more like:

HB: What did you do last night?
Me: Recreational programming.
HB: Cool!

rather than:

HB: What did you do last night?
Me: Tried implementing the Unicode Collation Algorithm in Python.
HB: You're strange.

by James Saiz : 2006/03/11 : 0 trackbacks : 1 comment (permalink)

Upgraded This Site With Anti-Spam Maths Captcha

I've upgraded this site to Leonardo 0.7.0 which I'm about to release.

It includes an enhancement to the comment module by Bryan Lawrence that provides for a maths-based captcha to help prevent comment spam. Basically you'll need to do a simple addition to post a comment.
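A minimal sketch of what a simple-addition captcha involves. This is not Bryan Lawrence's code or Leonardo's comment module; `make_challenge` and `check_answer` are hypothetical names, and a real implementation would also need to keep the expected answer on the server between showing the form and receiving the comment:

```python
import random


def make_challenge(rng=random):
    """Return a question string and the expected integer answer."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return "What is %d + %d?" % (a, b), a + b


def check_answer(expected, submitted):
    """Accept the comment only if the submitted text is the right sum."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        # Non-numeric input (typical of automated spam) is rejected.
        return False
```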

Hopefully this will stop the literally thousands of automated spam comments I receive each month.

by James Saiz : 2006/03/10 : 0 trackbacks : 3 comments (permalink)

Quisition Update

If any of you are wondering how Quisition, my online flashcard site, is going, here's an update.

I'm currently implementing the account sub-system: sign-up, activation, login, etc.

Once that is done, I'll probably go live with it, even though you won't be able to do anything with your account just yet.

Remember you can always subscribe to the announcements feed on the Quisition site to be notified when new things become available.

by James Saiz : 2006/03/10 : Categories quisition : 0 trackbacks : 0 comments (permalink)

Crashing Safari

If you download this:


and open it in Safari, it will cause a crash (you have been warned!). In Firefox, it works exactly as expected.

I can't work out whether I'm doing anything wrong. The problem occurs specifically doing a lot of .tagName or .nodeName accesses on an XML element (see the comment in the javascript indicating the location the crash occurs).

The report includes the following:

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x000001b8

Thread 0 Crashed: 0 com.apple.WebCore 0x95994984 DOM::DocumentImpl::tagName(unsigned) const + 36 ...

Any ideas?

by James Saiz : 2006/03/09 : Categories safari ajax : 0 trackbacks : 0 comments (permalink)

Minti Launched

On Monday evening I had the pleasure of meeting Clay Cook and his wife Rachel. Besides being successful Web entrepreneurs, they're super-nice people.

They've just launched their latest venture, Minti, which is a parenting advice site with user contributed articles, rating, tagging and all that Web 2.0 goodness.

Check it out: http://www.minti.com

by James Saiz : 2006/03/08 : 0 trackbacks : 1 comment (permalink)

Demokritos 0.3.7 Released

Last week, Dave Johnson mentioned that he'd successfully posted from his blogging client MatisseBlogger to Demokritos. It was a great interop session on #atom and Joe Gregorio also provided automated testing against Demokritos.

Thanks to both Dave and Joe I was able to make fixes to Demokritos. I didn't do a release immediately, but here it now is:


This version has successfully worked with two independent clients now, so it's getting into reasonable shape.

The upcoming 0.4.0 release will include authentication.

by James Saiz : 2006/03/08 : Categories demokritos atompub Python announcements : 0 trackbacks : 0 comments (permalink)

Starting Leukippos

Given that Leonardo has web-based editing, I'll need some kind of web-based Atom Protocol client once Leonardo is based on Demokritos.

I've been toying for a while with writing an AJAX-based Atom client. The natural name for it would be Leukippos. (Leukippos was the teacher of Demokritos and co-originator of the Greek idea of atoms.)

Anyway, tonight I made a start. My first version of Leukippos retrieves an APP introspection document via XMLHttpRequest, parses it to retrieve the workspaces and collections and allows a user to click on a collection to retrieve it.

I haven't finished collection feed parsing yet, but once that's done and I've prettied it up a bit with CSS, I'll post it here.

My ultimate goal would be for it to function something like TiddlyWiki but, of course, with Atom Protocol support (an idea I've mentioned before).
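The parsing step can be sketched in Python (the language of the other projects on this blog) rather than in browser JavaScript. This is an illustration under assumptions, not the Leukippos code: it uses the service document format and namespaces from the final RFC 5023, which differ from the Internet-Draft current when this was written, and `SAMPLE` and `list_collections` are hypothetical:

```python
import xml.etree.ElementTree as ET

# Namespaces from RFC 5023 (the drafts of early 2006 used different ones).
APP = "{http://www.w3.org/2007/app}"
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical service document, modelled on the example in RFC 5023.
SAMPLE = """<?xml version="1.0"?>
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Main Site</atom:title>
    <collection href="http://example.org/blog/main">
      <atom:title>My Blog Entries</atom:title>
    </collection>
  </workspace>
</service>"""


def list_collections(service_xml):
    """Return (workspace title, collection title, href) tuples."""
    root = ET.fromstring(service_xml)
    found = []
    for workspace in root.findall(APP + "workspace"):
        ws_title = workspace.findtext(ATOM + "title", default="")
        for collection in workspace.findall(APP + "collection"):
            title = collection.findtext(ATOM + "title", default="")
            found.append((ws_title, title, collection.get("href")))
    return found
```

Each `href` is the URI a client would then GET to retrieve the collection feed.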

by James Saiz : 2006/03/08 : Categories demokritos leukippos atompub : 0 trackbacks : 0 comments (permalink)

Demokritos the Economist

I named my Atom server Demokritos after the Greek philosopher (more often spelt in the Latin form Democritus) who, along with his teacher Leucippus, developed the idea that all matter is made up of indivisible elements called atoms.

What I didn't know until today, however, is that Demokritos was also responsible for some economic thought that was well before his time.

I've just started reading Economic Thought Before Adam Smith, Volume I of An Austrian Perspective on the History of Economic Thought by Murray Rothbard. It is a tour through the economic thinking of the Greeks, the Romans and the Scholastics of the Middle Ages and Renaissance, largely arguing against any claim Adam Smith might have to being the father of economics (and, in fact, suggesting many of Smith's ideas were a step backwards).

What particularly caught my interest in the first few pages, though, was that Demokritos was the first recorded proponent of subjective value theory. Demokritos believed that moral values and ethics were absolute but that economic values were subjective. He was also the first person we know of to write about marginal utility and time preference. All these concepts are core to the Austrian School of the 19th and 20th centuries and are considered innovations beyond Adam Smith, and yet Demokritos was discussing them in a rudimentary way at the time of Socrates!

I talked a little bit about subjective value theory and marginal utility in One Red Paperclip and the Benefits of Trade.

by James Saiz : 2006/03/07 : Categories demokritos economics : 0 trackbacks : 0 comments (permalink)

Sensor Sizes

I mentioned before that my new HD camera has 1/3" sensors. That means that each of the three CCDs that detect light to be encoded on tape measures 4.8mm x 3.6mm (a diagonal of 6mm).

Now I realise none of these measures are equivalent to 1/3". I've read contradictory information about why a 4:3 ratio sensor with a diagonal of 6mm is called a 1/3" (although note that 4.8mm + 3.6mm is almost 1/3") so we'll just treat 1/3" as a name for 4.8mm x 3.6mm.
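The arithmetic above is easy to check. A small sketch, where the function names are just for illustration:

```python
import math

MM_PER_INCH = 25.4


def diagonal_mm(width_mm, height_mm):
    """Diagonal of a rectangular sensor, by Pythagoras."""
    return math.hypot(width_mm, height_mm)


def inches_to_mm(inches):
    return inches * MM_PER_INCH


width, height = 4.8, 3.6           # the 1/3" sensor dimensions quoted above
print(diagonal_mm(width, height))  # about 6 mm
print(width + height)              # about 8.4 mm
print(inches_to_mm(1 / 3))         # about 8.47 mm, so the sum is the near match
```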

This is a fair bit smaller than 35mm film as you can see from this comparison chart I've drawn up (which includes both sensors and film for both still photography and motion pictures):

relative sizes of 1/3", 2/3", APS-C, 35mm (motion) and 35mm (still) sensors / film

Why are there two pictures for 35mm? 35mm motion picture film frames travel vertically whereas 35mm still film frames travel horizontally. So the width you see of the 35mm motion picture frame is the height of the 35mm still frame (24mm). The aspect ratios are also different. Motion picture film is 4:3 whereas still is 3:2. Note however that, in the case of motion pictures, not all of this area is used as the sound may be recorded along one side (reducing the width to around 22mm) or the top and bottom of the frame masked to change the aspect ratio to the more common 1.85:1 used in movies.

APS-C is the size used by some DSLR still cameras such as my Canon 10D. You may have heard me mention how much I'd like a 5D which has a full-size frame, by which I mean the 35mm (still) sensor at the bottom.

Professional video cameras typically use 2/3" sensors. The use of a 1/3" sensor in "prosumer" HD cameras like my JVC is one of the key things that distinguish them from the truly professional cameras. Note, however, that an HD camera with 1/3" sensors is capable of producing images of a higher resolution than a 2/3" standard definition camera.

Lucas used cameras with 2/3" sensors in Episode II and III. There are high-end video cameras with full-frame (i.e. 35mm) sensors in the works.

The size of the sensor impacts things like cost, light sensitivity and the field of view relative to focal length (I'll talk about that last one soon). Ironically, a smaller sensor (like a 1/3" versus 2/3" in video or an APS-C versus 35mm in DSLRs), although cheaper to manufacture and considered less professional, actually requires a sharper lens to resolve the same resolution. A small sensor is packing more lines per mm so a lens has to be capable of resolving that.
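To put rough numbers on "more lines per mm", here is a sketch under stated assumptions: 720 picture lines as the target resolution, and 6.6mm as the nominal height of a 2/3" sensor. Neither figure comes from this post; only the 3.6mm height of the 1/3" sensor does.

```python
def lines_per_mm(picture_lines, sensor_height_mm):
    """Line density the lens must resolve at the sensor."""
    return picture_lines / sensor_height_mm


PICTURE_LINES = 720        # assumed target resolution
ONE_THIRD_HEIGHT = 3.6     # mm, from the dimensions given earlier
TWO_THIRDS_HEIGHT = 6.6    # mm, assumed nominal 2/3" sensor height

small = lines_per_mm(PICTURE_LINES, ONE_THIRD_HEIGHT)    # about 200 lines/mm
large = lines_per_mm(PICTURE_LINES, TWO_THIRDS_HEIGHT)   # about 109 lines/mm
print(small / large)  # the smaller sensor needs roughly 1.8x the lens resolution
```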

by James Saiz : 2006/03/03 : Categories filmmaking : 0 trackbacks : 0 comments (permalink)

Mounting Disk Images From OS X Terminal

Sometimes, to install software on my remote Mac Mini, I need to be able to mount disk images from a terminal session.

I just discovered how to do this. The command is hdiutil.

To mount a disk image:

hdiutil attach SomeDiskImage.dmg

Although I haven't tried it, I believe the disk image can be referenced by URI.

To unmount:

hdiutil detach /Volumes/SomeDiskImage/

I've added this to my Headless Tiger page.

by James Saiz : 2006/03/02 : Categories os_x : 0 trackbacks : 2 comments (permalink)

Demokritos 0.3.0 Released

I'm pleased to announce the next release of Demokritos.

Demokritos is a Python library and content repository implementing the Atom Syndication Format (RFC4287) and Atom Publishing Protocol (currently a standards track Internet-Draft).

You can download the code at http://jamessaiz.en.wanadoo.es/2006/demokritos/demokritos-0.3.0.tgz

This release adds persistence using a Subversion backend and has been updated for draft-ietf-atompub-protocol-08.

Note that you'll need Subversion 1.3 with the SWIG Python bindings built.

At this stage, Demokritos is not really intended for anything other than interoperability testing with Atom clients. However, the library for parsing and generating Atom feeds might be useful standalone as may the web and svn modules.

Demokritos is made available under a GPL license.

UPDATE: Now see Demokritos 0.3.5 Released

by James Saiz : 2006/03/01 : Categories demokritos atompub Python : 0 trackbacks : 0 comments (permalink)

Demokritos 0.3.5 Released

Once people started using Demokritos 0.3.0 with their own clients a couple of major bugs emerged.

I've fixed them and now released 0.3.5.

You can download the code at http://jamessaiz.en.wanadoo.es/2006/demokritos/demokritos-0.3.5.tgz

by James Saiz : 2006/03/01 : Categories demokritos atompub Python : 0 trackbacks : 0 comments (permalink)


Back in April 2002, when I was "blogging" to Advogato, I wrote a little Python implementation of the esoteric language brainf***.

I just realised I've never posted it here, so here is a slightly revised version:


by James Saiz : 2006/03/01 : Categories Python esoteric_languages : 0 trackbacks : 0 comments (permalink)

Two Years Ago

I just missed my anniversary.

Two years (and two days) ago, I started this blog!

Thanks to all of you who read it regularly.

by James Saiz : 2006/02/28 : Categories this_site : 0 trackbacks : 1 comment (permalink)

Three Key Camera Subsystems

When comparing different video cameras, I like to think in terms of three subsystems that each massively impact the quality of the image that results.

Note that there are lots of other features that will distinguish one camera from another, but the three above are the key ones that directly affect the image.

The JVC GY-HD101E I just bought has the following:

I'll say more about what each of these means in subsequent posts. It's a pretty fascinating area of technology.

by James Saiz : 2006/02/28 : Categories filmmaking hd100 : 0 trackbacks : 0 comments (permalink)

HD101 Arrives

My JVC GY-HD101E arrived today.


I'll report more as I get the chance to try it out. I also want to start getting down some notes on digital cinematography.

by James Saiz : 2006/02/27 : Categories filmmaking hd100 : 0 trackbacks : 0 comments (permalink)

Congratulations Nelson

Nelson Clemente, my singer/songwriter/model/actor friend (who is also the better half of Nelson James) just won Most Promising New Talent 2005 at the Limelight Theatre Awards. Way to go Nelson!

by James Saiz : 2006/02/27 : 0 trackbacks : 0 comments (permalink)

Starting to Blog About My First Feature Film

I've mentioned before that Tom Bennett has started blogging about Heist, the feature film we're working on together.

Well, it's about time I start too.

Tom recently finished the first draft of the script and is currently reworking the script based on my initial feedback.

In that last post, he mentions that I've just bought the camera I plan to shoot Heist on.

It's a JVC GY-HD101E and while Tom will continue to blog about the screenwriting process, I plan to start blogging about digital cinematography, getting up to speed with the camera and planning for Heist.

by James Saiz : 2006/02/21 : Categories filmmaking heist : 0 trackbacks : 0 comments (permalink)

Thank You Daily Python-URL

It warms the cockles of my heart that, despite no longer being on Planet Python (a state I still hope is temporary), Daily Python-URL still posts (and therefore has editors that read) my Python-related posts.

Thank you PythonWare!

If you read this blog and have any interest in Python, do yourself a favour (to quote Ian "Molly" Meldrum) and add Daily Python-URL to your feed reader.

Of course, that doesn't mean I don't want you sticking around here as well :-)

by James Saiz : 2006/02/16 : Categories Python this_site : 0 trackbacks : 0 comments (permalink)

First Pass of Demokritos Persistence Done

@84 on the Demokritos trunk now has full persistence of members and collections into a subversion repository.

At the moment there's no caching (so subversion gets hit every request) and transactions are overly granular. As a result, performance isn't great. But it seems to work!

by James Saiz : 2006/02/15 : Categories demokritos atompub Python subversion : 0 trackbacks : 0 comments (permalink)

Are Absolute Imports Available in Python 2.4 or Not?

PEP 328 looks like it will solve my recent import issues.

However, the PEP mentions that

In Python 2.4, you must enable the new absolute import behavior with from __future__ import absolute_import

which got me excited because it means I don't need to wait for Python 2.5 to take advantage of it.


from __future__ import absolute_import


SyntaxError: future feature absolute_import is not defined

So that part of PEP 328 was just teasing me. I think it's an error in the PEP.

by James Saiz : 2006/02/15 : Categories Python pyworks : 0 trackbacks : 2 comments (permalink)

Bug Fix to Python Unicode Collation Algorithm

See Python Unicode Collation Algorithm for background. This version fixes a major bug that prevented the collation algorithm from working properly with any expansions:


by James Saiz : 2006/02/13 : Categories Python unicode : 0 trackbacks : 0 comments (permalink)

The Naming of Musical Notes, Part IV

In Part III, I introduced what I called absolute note names and relative note names.

I'd like to introduce a third possibility now which is the approach underlying much of Western music note naming. I'll call it fixed note naming.

The fixed note name of a note is what the relative note name would be if we were in the key of <1>-major.

So with fixed note names, we are still naming based on a diatonic scale (and so will have to use + and - to get the chromatic notes) but the diatonic scale we are using might not be the diatonic scale of the key we are in.

Let's use parentheses to indicate fixed note names. And so a <3>-major scale in each of our note naming systems would look as follows:

<3> <5> <7>  <8> <10> <12> <2'>
{1} {2} {3}  {4} {5}  {6}  {7}
(2) (3) (4+) (5) (6)  (7)  (1'+)

The third note is (4+) because <7> is not part of the <1>-major scale. Why didn't we use (5-) which also maps to <7>?

Well notice that

(2) (3) (4+) (5) (6) (7) (1'+)

has the nice property that each number appears exactly once. This property is a convention we adopt. This way, the scale can be thought of in terms of how each note in the <1>-major scale gets modified.

If all this seems too abstract, you can think about the example above in terms of the D-major scale:

D E F# G A B C#

Note that this Western choice of note naming is what I've called fixed note naming in that the notes of the scale have been named based on modifications to the C-major scale under the convention that each letter appears exactly once.

This is one way of thinking about why the third degree of the D-major scale is F# and not Gb. There are other approaches that come to the same conclusion which we'll explore later on.
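The "each number appears exactly once" convention is mechanical enough to sketch in code. This is only an illustration of the rule as described above, assuming <1> = C and a tonic that is itself a note of the <1>-major scale; the function names are made up for the example:

```python
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]   # semitone steps of a major scale
C_MAJOR = [1, 3, 5, 6, 8, 10, 12]     # absolute numbers of the <1>-major scale


def major_scale(tonic):
    """Absolute note numbers of the major scale starting on tonic."""
    notes = [tonic]
    for step in MAJOR_STEPS[:-1]:
        notes.append(notes[-1] + step)
    return notes


def spell(tonic):
    """Return (degree index, octave, offset) using each degree exactly once."""
    start = C_MAJOR.index(tonic)      # tonic must be in the <1>-major scale
    spelled = []
    for i, absolute in enumerate(major_scale(tonic)):
        octave, degree = divmod(start + i, 7)
        offset = absolute - (C_MAJOR[degree] + 12 * octave)
        spelled.append((degree, octave, offset))
    return spelled


def accidental(offset, sharp, flat):
    return sharp * offset if offset >= 0 else flat * -offset


def fixed_names(tonic):
    return ["(%d%s%s)" % (d + 1, "'" * o, accidental(off, "+", "-"))
            for d, o, off in spell(tonic)]


def letter_names(tonic):
    return ["CDEFGAB"[d] + accidental(off, "#", "b")
            for d, o, off in spell(tonic)]
```

With these definitions, `fixed_names(3)` reproduces the (2) (3) (4+) (5) (6) (7) (1'+) row and `letter_names(3)` gives D E F# G A B C#.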

I'll finish this part, though, with a final thought. It is possible to have a Gb in a piece in D-major. But it is a flattened 4th and is not the same as F# even though in 12-ET the pitch is the same. In absolute note naming, they are both <7> (if <1> = C). In relative note naming (specific to D-major), Gb would be {4-} whereas F# would be {3}.

by James Saiz : 2006/02/12 : Categories music_theory : 0 trackbacks : 2 comments (permalink)

Pyworks Common Library and Import Issues

Demokritos contains some modules that might be useful in other software, so I decided to create a little package for these common modules called 'pyworks' (seeing as that's the umbrella I'm starting to use for my various Python-related projects).

So I was thinking I could rename webbase.py to pyworks.web and call my Subversion library pyworks.svn. I also have a library for helping build domain objects from xml parsing events. I was going to call it pyworks.xml

But that means a 'from svn import repos' in pyworks/svn.py won't work and nor will 'from xml.parsers import expat' in pyworks/xml.py

This problem doesn't exist outside of the pyworks package itself. 'import xml' only means the standard library package and 'from pyworks import xml' gives you the pyworks xml module. Similarly with 'import svn' versus 'import pyworks.svn'.

For this reason, I don't mind a little hack internal to pyworks.

I seem to recall the py library (that py.test is in) does something similar.

Unfortunately, calling the module 'pyworks_xml.py' and putting 'import pyworks_xml as xml' in 'pyworks/__init__.py' doesn't work.

UPDATE: The following seems to work in pyworks/__init__.py:

import sys
import pyworks_xml
sys.modules["pyworks.xml"] = pyworks_xml
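The aliasing trick can be tried in isolation. The sketch below runs on a current Python 3 and uses synthetic stand-in modules (`demo_pkg` and `demo_pkg_xml` are hypothetical names, built in memory so nothing needs to exist on disk); it is not the pyworks code itself:

```python
import importlib
import sys
import types

# Stand-in for the package (what pyworks/__init__.py would represent).
pkg = types.ModuleType("demo_pkg")
pkg.__path__ = []                      # an empty __path__ marks it as a package

# Stand-in for the awkwardly named implementation module.
impl = types.ModuleType("demo_pkg_xml")
impl.parse = lambda text: text.strip()

# Register the implementation under the dotted name callers will use.
sys.modules["demo_pkg"] = pkg
sys.modules["demo_pkg.xml"] = impl
pkg.xml = impl                         # so attribute access on the package works too

# Callers see the alias; the standard library package is untouched.
aliased = importlib.import_module("demo_pkg.xml")
import xml as stdlib_xml

print(aliased is impl)                 # the alias resolves to our module
print(stdlib_xml is impl)              # the top-level name still means the stdlib
```

The import system consults `sys.modules` before searching for files, which is why registering the module under the dotted name is enough.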

UPDATE (2006-02-15): Looks like PEP-328 might be exactly what I need. Another reason I'm looking forward to Python 2.5.

by James Saiz : 2006/02/09 : Categories Python pyworks : 0 trackbacks : 1 comment (permalink)

Using the Python Subversion Binding

I've previously talked about my desire to use subversion as a persistence layer.

I've now started to work out how to use the Python bindings that Subversion comes with—mostly through studying the source code of ViewVC (formerly ViewCVS)—and it looks like it definitely meets my needs.

I've written up my notes so far as a page for others wanting to use the Python binding. Hopefully it's useful to other people:


It's a work in progress and I'll add more as I discover and use it.

by James Saiz : 2006/02/08 : Categories Python subversion : 0 trackbacks : 0 comments (permalink)

January Stats

Looking at my web stats, it appears that last month I had almost 100,000 visits (97,571) from almost 20,000 distinct IPs (19,484) averaging almost 10,000 hits/day (9,801).

That's double January last year. Almost.

by James Saiz : 2006/02/08 : Categories this_site : 0 trackbacks : 0 comments (permalink)

Back Home

No blog posts for over a week because of a busy work week and travelling back to Australia.

But in that time, the number of bloglines subscribers to this blog increased 5% to 181 (after being flat for months).

Does that mean I'll get better readership if I don't post at all? :-)

But seriously, I'm not sure why the sudden increase. It could be because I'm no longer on Planet Python.

If you're a new reader via bloglines and you don't mind telling me how you came to subscribe, email me.

by James Saiz : 2006/02/07 : Categories this_site : 0 trackbacks : 0 comments (permalink)

This Entry's Title Is Exponentially Longer Than The Previous One

In a review of the new MacBook Pro, Yuval Kossovsky says (emphasis mine):

Having said that, I can tell you this laptop is fast. Really fast. I am hesitant to say it’s exponentially faster than the G4 version, but subjectively, this baby cooks.

I'm sorry but it really annoys me when people abuse the term "exponentially" this way. It's meaningless to say that one thing is exponentially faster than something else. Of course it is. Anything faster is going to be exponentially faster.

If something is 0.1% faster it's still exponentially faster. The base is just 1.001.

Talking about things as increasing exponentially only makes sense when you have at least three data points. But even so, in the case of three data points you are just saying that the ratio of the third to the second is the same as the second to the first. It could be a small ratio.

The significance of being "exponentially" faster really only starts to kick in when you have more and more data points.
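The point about ratios can be made concrete. A small sketch, where `common_ratio` is a hypothetical helper that reports the constant ratio between successive data points, if there is one:

```python
import math


def common_ratio(values, rel_tol=1e-9):
    """Return the constant ratio between successive values, or None."""
    if len(values) < 2:
        return None
    ratios = [b / a for a, b in zip(values, values[1:])]
    if all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios):
        return ratios[0]
    return None


# Two points always fit: 0.1% faster is "exponential" with base 1.001.
print(common_ratio([100.0, 100.1]))

# Three points fit only if the successive ratios agree.
print(common_ratio([1, 2, 4]))   # constant ratio of 2
print(common_ratio([1, 2, 3]))   # no constant ratio, so not exponential
```

Any two positive values yield a ratio, which is why the two-point case is trivially true.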

It certainly makes no sense to talk about one thing as exponentially faster (or higher or whatever) than another. Given that it is trivially true, it makes it even worse that Kossovsky hesitates about it.

by James Saiz : 2006/01/30 : Categories linguistic_observations : 0 trackbacks : 4 comments (permalink)

More or Less

Yesterday I saw James Marcus using 'more' on the command line and I asked him why he didn't use 'less'. He responded by asking me why people use 'less' and not 'more'.

I use 'less' out of habit because when I started using Unix 13 years ago it wasn't the same as 'more'—it was more.

But I guess somewhere along the line, the old 'more' disappeared and the new 'more' became the same as 'less'. So if you want 'less' now you can just use 'more'.

I figured it's probably a symbolic link, just like 'sh' is often a symlink to 'bash'.

I just checked on Mac OS X and, sure enough, 'less' is 'more'; but it's actually a hard link, not a symbolic link.

In contrast, 'sh' and 'bash' on Mac OS X are distinct files but with the same content.
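The three cases (symbolic link, hard link, separate files with the same content) can be told apart programmatically. A sketch using only the Python standard library on a POSIX file system; `relationship` is a hypothetical helper, and the paths to pass in (for example wherever 'less' and 'more' live on your system) are up to you:

```python
import filecmp
import os


def relationship(path_a, path_b):
    """Classify how two existing paths relate to each other."""
    if os.path.samefile(path_a, path_b):
        # samefile() follows symlinks, so check for them explicitly.
        if os.path.islink(path_a) or os.path.islink(path_b):
            return "symbolic link"
        # Same device and inode without a symlink: two names for one file.
        return "hard link"
    if filecmp.cmp(path_a, path_b, shallow=False):
        return "distinct files, same content"
    return "different files"
```

Note that passing the same path twice also reports "hard link", since it is trivially the same inode.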

Go figure.

by James Saiz : 2006/01/28 : Categories unix : 0 trackbacks : 1 comment (permalink)

Dynamic Interlinears with Javascript and CSS

After the continuation of a permathread on the b-greek mailing list about the pros and cons of interlinears, I built some quick demonstrations of how CSS and Javascript could be used for dynamic interlinear glosses that would not be possible on the printed page.

They might be interesting as little Javascript tutorials too.

by James Saiz : 2006/01/28 : Categories javascript css linguistics : 0 trackbacks : 4 comments (permalink)

Javascript in IE

Nathan Vander Wilt has been helping me with my Javascript for Quisition in IE.

From him I've learnt a couple of key things recently:

Thanks Nathan! I've now (hopefully) fixed the Quisition Short-Term Test Demo for IE users.

I still have some suggestions from Tim Wegener to implement.

by James Saiz : 2006/01/28 : Categories quisition javascript : 0 trackbacks : 0 comments (permalink)

Python Unicode Collation Algorithm

My preliminary attempt at a Python implementation of the Unicode Collation Algorithm (UCA) is done and available at:

http://jamessaiz.en.wanadoo.es/2006/01/27/pyuca.py (old version—see UPDATE below)

This only implements the simple parts of the algorithm but I have successfully tested it using the Default Unicode Collation Element Table (DUCET) to collate Ancient Greek correctly.

The core of the algorithm, which is what I have implemented, basically just involves multi-level comparison. For example, café comes before caff because at the primary level, the accent is ignored and the first word is treated as if it were cafe. The secondary level (which considers accents) only applies then to words that are equivalent at the primary level.
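The multi-level idea can be illustrated without the collation element table at all. The sketch below is a crude stand-in, not the UCA and not how pyuca works: it uses Unicode decomposition to strip accents for the primary level and falls back to the decomposed string as a tie-breaker. `two_level_key` is a hypothetical name:

```python
import unicodedata


def two_level_key(word):
    """Sort key: letters with accents ignored first, accents as tie-breaker."""
    decomposed = unicodedata.normalize("NFD", word)
    # Primary level: drop combining marks so that é compares as e.
    primary = "".join(ch for ch in decomposed
                      if not unicodedata.combining(ch))
    # Secondary level: only consulted when the primary keys are equal.
    return (primary, decomposed)


words = ["caff", "caf\u00e9", "cafe"]
print(sorted(words))                     # code point order puts café last
print(sorted(words, key=two_level_key))  # cafe, café, caff
```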

The UCA (and my code) also support contraction and expansion. Contraction is where multiple letters are treated as a single unit: in Spanish, ch is treated as a letter coming between c and d so that, for example, words beginning with ch should sort after all other words beginning with c. Expansion is where a single letter is treated as though it were multiple letters: in German, ä is sorted as if it were ae, i.e. after ad but before af.

Here is how to use the pyuca module:

from pyuca import Collator
c = Collator("allkeys.txt")

sorted_words = sorted(words, key=c.sort_key)

allkeys.txt (1 MB) is available at


but you can always subset this for just the characters you are dealing with (and you will need to do this if any language-specific tailoring is needed).

UPDATE (2006-02-13): Now see bug fix

by James Saiz : 2006/01/27 : Categories Python unicode : 0 trackbacks : 0 comments (permalink)

Mozart 250th

Today is the 250th anniversary of the birth of Wolfgang Amadeus Mozart.

It's hard to overstate just how much of an influence Mozart has been on me as a composer.

I think at some point around the age of 13, I decided that I liked Mozart more than Beethoven and while my esteem for Beethoven and (especially) Bach has increased over the years, Mozart dominated my teens.

I taught myself to compose almost entirely by studying scores of Mozart. The Western Australia State Library building had something called the Central Music Library which was the only part of the library you could borrow directly from (as opposed to via inter-library loan at a local library).

I think there was a period of my life where every couple of weekends my mum (whose birthday it also is today) would drive me to the Central Music Library to borrow scores of Mozart symphonies and concerti. I would mark on my calendar every Mozart piece scheduled to be played on the national classical music radio station and taped many of them. Over half the CDs I bought in high school were probably Mozart.

I'd listen to the music, reading along in the score, marking sections I liked and then going back and analyzing them. Studying his scores is how I learnt classical form, orchestration and harmony. Even when I wanted to learn to write fugues, my first model was the Kyrie from his Requiem rather than something from Bach's Well-Tempered Clavier or Art of Fugue.

Thank you Mozart.

by James Saiz : 2006/01/27 : Categories music_composition : 0 trackbacks : 4 comments (permalink)

Python Web Frameworks and REST

When I've looked at Web frameworks in the past (not just in Python) I've been disappointed by their lack of a resource-oriented focus.

RESTafarians like myself typically want URIs to map to objects that then simply implement HTTP methods (assuming HTTP is what's being used—REST isn't limited to HTTP).
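To make the idea concrete, here is a minimal resource-oriented sketch using plain WSGI. It is not webbase.py or any framework's API; `Note`, `make_app` and the `/note` path are hypothetical names chosen for the example.

```python
import io

HTTP_METHODS = ("GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS")


class Note:
    """A hypothetical resource: one object per URI, one method per HTTP verb."""

    def __init__(self, text):
        self.text = text

    def GET(self, environ):
        headers = [("Content-Type", "text/plain; charset=utf-8")]
        return "200 OK", headers, self.text.encode("utf-8")

    def PUT(self, environ):
        length = int(environ.get("CONTENT_LENGTH") or 0)
        self.text = environ["wsgi.input"].read(length).decode("utf-8")
        return "204 No Content", [], b""


def make_app(resources):
    """Return a WSGI app that maps each path to a resource object."""

    def app(environ, start_response):
        resource = resources.get(environ.get("PATH_INFO", "/"))
        method = environ["REQUEST_METHOD"]
        if resource is None:
            status, headers, body = "404 Not Found", [], b""
        elif method in HTTP_METHODS and hasattr(resource, method):
            status, headers, body = getattr(resource, method)(environ)
        else:
            # Tell the client which verbs this resource does implement.
            allowed = ", ".join(m for m in HTTP_METHODS if hasattr(resource, m))
            status, headers, body = "405 Method Not Allowed", [("Allow", allowed)], b""
        start_response(status, headers)
        return [body]

    return app


app = make_app({"/note": Note("hello")})
```

The resource decides which verbs it supports simply by defining them, and the dispatcher turns anything else into a proper 405 with an Allow header. The app can be served locally with `wsgiref.simple_server.make_server` from the standard library.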

I was aware of mnot's Tarawa experiment but hadn't seen that sort of approach adopted in any other projects.

In Leonardo, I started down the path of being resource-oriented but it is still somewhat half-assed.

In Demokritos, I wanted it to be a focus from the start. This makes a lot of sense with the Atom Publishing Protocol because APP is really just a subset of HTTP. You have to be able to handle PUT and DELETE verbs, support reading and writing of HTTP headers and be able to return proper HTTP response codes.

(Incidentally, my understanding is that something like Ruby on Rails is a bit of a disaster in this area.)

Although I haven't released a version with this yet, I've refactored out the general resource-oriented HTTP parts of my Demokritos code. You can take a look in the svn repository at webbase.py for the module and server.py for a specific example of it in use for the Atom Publishing Protocol.

Since I started on Demokritos, web.py has come out. It takes a similar approach and does a bit more (especially in dispatching, where webbase.py is currently lacking), but I think I still like webbase.py more.

Very recently, I've started talking to Sylvain Hellegouarch and it sounds like CherryPy might increasingly enable a RESTful approach so there's a possibility at some point that Demokritos could move over to CherryPy (which would probably mean Leonardo would too if Leonardo becomes a layer on top of Demokritos).

In the meantime, check out webbase.py. I welcome feedback on it—especially from other RESTafarian Pythonistas.

UPDATE (2006-02-10): The link to webbase.py above is now incorrect. See pyworks.web.

by James Saiz : 2006/01/25 : Categories rest Python demokritos leonardo atompub : 0 trackbacks : 85 comments (permalink)


Via Rick Brannan, I've discovered Loren Rosson's blogging about Perelandra.

Very few people know this, but one of my secret goals in life is to produce and direct a film adaptation of Perelandra and possibly the entire trilogy.

by James Saiz : 2006/01/24 : Categories filmmaking : 0 trackbacks : 2 comments (permalink)

Implementing the Unicode Collation Algorithm in Python

Has anyone implemented the Unicode Collation Algorithm in Python?

If not, I think I'll try.

by James Saiz : 2006/01/22 : Categories Python unicode : 0 trackbacks : 0 comments (permalink)

Subversion as a Persistence Layer

For almost as long as I've been working on Leonardo, I've been thinking about Subversion as a persistence layer. In fact, some of the design of Leonardo's current persistence layer (LFS = Leonardo File System) was inspired by Subversion and I've long thought about an alternative LFS implementation that sits directly on Subversion.

In adding the persistence layer to Demokritos, I thought about starting out with Subversion right away, so I've been doing some investigation.

The SWIG-based Python bindings that Subversion comes with scared me off pretty quickly. Then I found pysvn which provides a much more Pythonic (and well documented) interface to Subversion.

The problem is that pysvn assumes that you are checking out to a local workspace, which is not what I want. I just want to be able to send and receive Python strings, not create a workspace, have Demokritos read/write files from/to that workspace and then use pysvn to check-out/commit.

But it doesn't appear possible with pysvn. Not because of a limitation of pysvn itself but because the Subversion client API doesn't support it.

It would seem to me that it would still be possible to use Subversion the way I want to but it would involve one of

Subversion itself doesn't seem to expose the bits I need (certainly not in Python)

UPDATE: Literally seconds after posting this, Chris Curvey responded to a query I made on the pysvn mailing list and pointed me to a session given by Greg Stein at OSCON. The abstract for the session mentioned SubWiki which looks like it might be doing what I want to do (although it may still use a local workspace). Investigating more. Maybe I should just ask Greg.

UPDATE (2006-02-09): Good news! Now see Using the Python Subversion Binding.

by James Saiz : 2006/01/21 : Categories leonardo demokritos Python subversion : 0 trackbacks : 3 comments (permalink)

Quisition Short-Term Demo (but not in IE)

Previously, I talked about Short-Term Testing in Quisition.

I've now made a demo of this available on the site.

It's actually the first bit of JavaScript longer than a few lines that I've ever written.

It works fine in Safari and Firefox but I haven't got it working in IE6 yet. If anyone with IE6 JavaScript experience could take a look and see what I've done wrong, that would be greatly appreciated.

And let me know if you have any suggestions about the test itself too.

People subscribed to the announcement feed already knew about this demo two weeks ago and I've got some good feedback. I always welcome more, though.

by James Saiz : 2006/01/21 : Categories quisition javascript : 0 trackbacks : 7 comments (permalink)

Planet Python Doesn't Support Atom 1.0

Looks like my move to Atom 1.0 didn't quite go as smoothly as I would have liked.

It appears Planet Python doesn't properly handle Atom 1.0 - or more specifically, content of type "html".

by James Saiz : 2006/01/21 : 0 trackbacks : 1 comment (permalink)

No Internet Access

Sorry for the lack of posts of late. I've had no Internet access in my hotel room for most of the last week.

by James Saiz : 2006/01/20 : 0 trackbacks : 1 comment (permalink)

iMac Ordering Link Inconsistencies

As of when I'm posting this, on the iMac page on Apple Australia's website,


the Order Now button at the top right goes to the G5 ordering page while the Order Now button at the bottom right goes to the Intel ordering page.

On the equivalent US page,


both Order Now buttons go to the G5 ordering page.


(No, I'm not going to buy one just yet - was just pricing options)

UPDATE (2006-01-23): Apple Australia has fixed their links but the US page still has Order Now buttons going to the G5 ordering page.

by James Saiz : 2006/01/15 : Categories mac : 0 trackbacks : 1 comment (permalink)

Demokritos 0.2.0 Released

I'm pleased to announce the next release of Demokritos.

Demokritos is a Python library and content repository implementing the Atom Syndication Format (RFC4287) and Atom Publishing Protocol (currently a standards track Internet-Draft).

You can download the code at http://jamessaiz.en.wanadoo.es/2006/demokritos/demokritos-0.2.0.tgz

At this stage, Demokritos is not really intended for anything other than interoperability testing with Atom clients. However, the library for parsing and generating Atom feeds might be useful standalone.

There is no persistence and no security but most of RFC4287 and draft-ietf-atompub-protocol-07 is implemented.

by James Saiz : 2006/01/14 : Categories demokritos atompub Python : 0 trackbacks : 5 comments (permalink)

Switching This Site To Atom 1.0

Tomorrow, I'm going to switch over the Atom feeds on this site to Atom 1.0.

Hopefully no one will miss a beat.

UPDATE: Done. Just a couple of lines added to the Leonardo config file.

by James Saiz : 2006/01/14 : Categories this_site : 0 trackbacks : 0 comments (permalink)

Welcome to the Blogosphere Tom

My friend and filmmaking partner-in-crime Tom Bennett has started blogging about his quest to make a feature film (hopefully with me!)

If you're at all interested in filmmaking and screenwriting, I suggest you check it out.

by James Saiz : 2006/01/14 : Categories filmmaking : 0 trackbacks : 0 comments (permalink)

Maths Challenge: Project Euler

I haven't started it yet but I can just tell that this is going to take up a lot of the time this weekend I was planning on spending on Leonardo, Demokritos and Quisition.

by James Saiz : 2006/01/13 : Categories mathematics : 0 trackbacks : 2 comments (permalink)

Proof that Python Programmers are Smarter

The top three languages used in solving the Project Euler puzzles are, in order:

However, the average score of people using those languages:

QED :-)

(NOTE: Delphi and APL/J/K programmers are even smarter, apparently)

UPDATE (2006-01-21): I may have inadvertently skewed the statistics by this post (and its appearance on Planet Python and the Daily Python URL).

Since my post, the number of C/C++ programmers has risen by 19%, the number of Java programmers by 13% but the number of Python programmers by 86%. So there are a disproportionate number of newcomers amongst the Python programmers. Because one starts off with a low score, the average score is skewed unless the Python-programming newcomers stick with it.
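The skew is easy to see with some arithmetic. The numbers below are entirely made up for illustration (they are not the actual Project Euler statistics); only the 86% growth figure comes from the post.

```python
def average(scores):
    return sum(scores) / len(scores)


# Hypothetical figures: 100 established users averaging a score of 20,
# joined by 86 newcomers (an 86% increase) who have each scored only 2 so far.
veterans = [20] * 100
newcomers = [2] * 86

before = average(veterans)              # 20.0
after = average(veterans + newcomers)   # about 11.7

print(f"before: {before:.1f}, after: {after:.1f}")
```

Nobody's individual score went down, yet the average dropped sharply just because the group grew.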

Note that even with this skewing, Python still beats C/C++ and Java for average programmer score :-)

by James Saiz : 2006/01/13 : Categories Python : 0 trackbacks : 0 comments (permalink)

50mm Prime Arrives

My 50mm prime lens arrived today. It's a Canon EF 50mm f/1.4.

A prime lens is one with a fixed focal length (unlike a zoom lens, whose focal length can be varied). This means a lot less glass which means it's faster (i.e. lets more light in) and the images are clearer.

My camera feels very different with the new lens attached because of the shift in centre of gravity. I also find myself constantly going to zoom while looking through the viewfinder which is a habit that will be good to get out of. Even when using my zoom lens, I should decide on a focal length first and then move to get the right composition for that focal length. Having a prime will at least get me used to what 50mm looks like (or rather 80mm given the 1.6 multiplier on my Canon 10D; did I mention I want a 5D :-)

Being able to go to f/1.4 is amazing. To put things into perspective, I took pictures of the same subject, one with the new lens and one with my existing lens, a 28-135mm zoom.

It was indoors without much light. The maximum aperture I could get on my zoom at 50mm was f/4.5 and I had to use a shutter speed of 1/8s.

With my 50mm f/1.4 at f/1.4, I could take (roughly) the same picture with a shutter speed of 1/180s.

f/1.4 should let about 10 times more light in than f/4.5. The more than 20x shutter speed increase is likely in part due to being a prime lens but the exposure was a little darker so it's hard to be sure.
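The figures above can be checked with a few lines of arithmetic, using the usual rule that the light admitted is proportional to the aperture area, i.e. to one over the f-number squared. The function name is just for this example.

```python
import math


def light_gain(slow_f_number, fast_f_number):
    """How many times more light the faster (smaller f-number) aperture admits."""
    return (slow_f_number / fast_f_number) ** 2


aperture_gain = light_gain(4.5, 1.4)   # f/4.5 versus f/1.4
shutter_gain = (1 / 8) / (1 / 180)     # 1/8s versus 1/180s

print(f"aperture: {aperture_gain:.1f}x ({math.log2(aperture_gain):.1f} stops)")
# aperture: 10.3x (3.4 stops)
print(f"shutter:  {shutter_gain:.1f}x ({math.log2(shutter_gain):.1f} stops)")
# shutter:  22.5x (4.5 stops)
```

So the aperture alone accounts for roughly 10x of the 22.5x change in shutter speed, leaving a factor of a little over 2 (about one stop) to be explained by the darker exposure and anything else.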

Of course, with f/1.4, the depth-of-field is lovely and shallow.

The bokeh is very pleasing as well, but I need to test it more at smaller apertures. (Bokeh is the quality of the blur)

I'm certainly happy with it so far as a second lens.

by James Saiz : 2006/01/10 : Categories photography : 0 trackbacks : 2 comments (permalink)

IM2000 and Atom

IM2000 is a pull-based mail transport proposal where the sender stores the mail and the recipient is just pinged to go collect it. (via Mark Baker)

It's particularly interesting to think about in terms of feeds and feed aggregation. Mailing lists just become feed aggregations, for example.

If you look at the proposed IM2000 architecture from Jonathan de Boyne Pollard, there's a lot you could do with the Atom format and Atom publishing protocol (APP):

"Message stores" could just be APP servers; the Message Store Originator Access Protocol would just be APP. The Message Store Recipient Access Protocol would just be Atom over HTTP. The Recipient Notification Agent Submission Protocol is just a form of trackback (which in turn could just be a specialised APP POST, in which case the Recipient Notification Agent could just be an APP server and the Recipient Notification Agent Query Protocol just APP as well).

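The pull-based flow can be sketched in memory, without any HTTP. All class and method names here are hypothetical; the comments note which APP or HTTP operation each call would roughly correspond to under the mapping suggested above.

```python
class MessageStore:
    """Sender-side store: keeps each message until its recipient fetches it."""

    def __init__(self):
        self._messages = {}
        self._next_id = 1

    def post(self, recipient, body):
        # Roughly an APP POST of a new entry to the sender's own collection.
        message_id = self._next_id
        self._next_id += 1
        self._messages[message_id] = (recipient, body)
        return message_id

    def fetch(self, message_id, recipient):
        # Roughly an HTTP GET of the Atom entry by the recipient.
        owner, body = self._messages[message_id]
        if owner != recipient:
            raise PermissionError("message is addressed to someone else")
        return body


class NotificationAgent:
    """Recipient-side agent: holds pointers to messages, never the bodies."""

    def __init__(self):
        self._pending = {}

    def notify(self, recipient, store, message_id):
        # Roughly a trackback-style ping saying "there is mail for you here".
        self._pending.setdefault(recipient, []).append((store, message_id))

    def collect(self, recipient):
        # Hand over the outstanding pointers and clear them.
        return self._pending.pop(recipient, [])


store = MessageStore()
agent = NotificationAgent()

message_id = store.post("bob", "Lunch on Friday?")
agent.notify("bob", store, message_id)

for origin, pointer in agent.collect("bob"):
    print(origin.fetch(pointer, "bob"))   # Lunch on Friday?
```

The point of the design is that the storage burden stays with the sender: the recipient's side only ever holds small notifications until the recipient chooses to pull.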
by James Saiz : 2006/01/03 : 0 trackbacks : 0 comments (permalink)

43 People

I've written about 43 Things before. It's a site that connects people who want to do things with people who have already done them.

Via Justin Johnson, I found out there's a sister site 43 People where you say who you'd like to meet and who you've already met.

If you're on the site already and you've met me, look me up.

There's also 43 Places. No prizes for guessing what that's about.

by James Saiz : 2006/01/02 : 0 trackbacks : 15 comments (permalink)

File System Archaeology for MorphGNT

Some of you will be aware of Ulrik Petersen's work on augmenting Tischendorf's 8th edition with morphological tags and lemmata, based on work by Clint Yale and Maurice Robinson. Ulrik is also the developer of Emdros, an open-source text database engine for annotated text.

The overlap of Ulrik's interests and work with my own on MorphGNT is very exciting and so we've started talking about how we might be able to collaborate on some things together.

To help facilitate this, I've spent much of this long weekend so far going through the last 12 years of work on MorphGNT and putting things into Subversion. Because my work on MorphGNT has always been in fits and spurts and has spanned approximately five different desktop machines over the 12 years, it's required a fair bit of "file system archaeology".

The archaeology analogy seems apt because I'm essentially piecing together a history based on what "layer" I'm finding the files in - e.g. a file on a backup of my website in 2002 probably dates later than those found in the tar balls from when I moved from one machine to another in 1997.

There's also an analogy with textual criticism as in some cases I have to look at two files and judge whether a change from A to B or B to A is more likely.

It's been a lot of fun, especially uncovering little scripts I wrote back in the nineties to do various analyses.

by James Saiz : 2006/01/01 : Categories morphgnt : 0 trackbacks : 0 comments (permalink)

Content made available under a Creative Commons Attribution-NonCommercial-ShareAlike license