James Saiz : Blog James Saiz - 2005

James Saiz

journeyman of some

Blog James Saiz - 2005

XML Declaration Required for UTF-8 AJAX

Jenni was adding some non-ASCII cards to her local Quisition instance and it wasn't working. The browser, Safari, was getting the wrong encoding.

Strangely, manually telling Safari the encoding didn't help. I suspected the culprit might be AJAX as the problem was with content loaded asynchronously.

Sure enough, after being thrown off by a couple of red herrings, I found that the response to XMLHttpRequest, at least on Safari, requires an XML Declaration with encoding="utf-8" even though, according to the XML TR, it shouldn't.
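A minimal sketch of the server-side fix in Python, assuming the response body is assembled as a string before being sent. The function name `xml_response_body` and the `<card>` markup are made up for illustration; this is not Quisition's actual code.

```python
XML_DECLARATION = '<?xml version="1.0" encoding="utf-8"?>\n'

def xml_response_body(xml_text):
    """Return UTF-8 bytes for xml_text, with an explicit XML declaration."""
    # Only prepend the declaration if the document does not already have one.
    if not xml_text.lstrip().startswith("<?xml"):
        xml_text = XML_DECLARATION + xml_text
    return xml_text.encode("utf-8")

# Hypothetical card containing non-ASCII text.
body = xml_response_body("<card><front>καλός</front></card>")
print(body.decode("utf-8").splitlines()[0])
```

The point is only that the declaration names the same encoding the bytes are actually written in.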

by James Saiz : 2005/12/30 : Categories quisition ajax : 0 trackbacks : 0 comments (permalink)

Summer of Code T-Shirt

When I got back from Brunei, there was a notice from DHL saying a parcel delivery had been attempted but no one was home. I wasn't expecting a parcel while I was away.

The notice had a waybill number so I entered that in on the DHL website and, while it didn't give me the sender, it did give me the city of origin: Redwood City, CA.

I hadn't ordered anything from a company in Redwood City.

So I rang DHL. The package was from Google.

I went and picked it up from the depot and it was a Summer of Code t-shirt to thank me for my participation this year as a mentor.

Cool that it arrived in time for summer here :-)

by James Saiz : 2005/12/29 : 0 trackbacks : 0 comments (permalink)

Contributing to Open Source Python Projects

Adudzik asks on 43things:

As a smart and enthusiastic beginner, where should I look for good open source projects, preferably in Python?

While it's possible to find open source projects being done in Python on SourceForge and listed on FreshMeat, what would benefit people like Adudzik, in my opinion, is being apprenticed to a willing project lead. Sort of like the Summer of Code but longer term and restricted to existing projects (and presumably with less chance of monetary payment). SourceForge and FreshMeat don't really indicate which projects would be willing to take on an apprentice.

So perhaps python.org could run a directory of open source python projects that would be willing to take on an apprentice.

I'd certainly be willing to take on an apprentice or two on a number of my projects such as Leonardo, pyso or Cleese.

As I've written elsewhere:

I believe writing software is a craft. I also believe that writing code is something well suited to an apprentice-journeyman-master model particularly when applied in an open source context.

So ultimately I'd love to see virtual schools under a master with a number of journeymen and apprentices. Apprentices work on projects mostly under the direction of the journeymen. The journeymen have more responsibility on projects and start their own projects under the direction of the master. Eventually, a journeyman presents a released piece of software as his or her "masterpiece" and is declared by some loose collection of masters (a guild) as a new master. This guild would also be responsible for the recruitment of new apprentices.

That's my larger vision but just having a directory of projects willing to take on an apprentice would be a good start.

Any other thoughts?

by James Saiz : 2005/12/26 : Categories python open_source software_craftsmanship : 0 trackbacks : 2 comments (permalink)

Congratulations Bowstreet

Congratulations to the investors and employees of Bowstreet on the acquisition of their company by IBM.

Bowstreet, where I worked from mid 1999 to early 2002, was a wonderful experience for me. Without Bowstreet, there would be no mValent.

by James Saiz : 2005/12/24 : 0 trackbacks : 2 comments (permalink)

Back After Vacation

If you're wondering why I haven't blogged for two weeks, it's because I've been on vacation.

My parents, sisters and HB went to Brunei, where my family lived from 1982-1986. Besides relaxing at the hotel and doing what little sight-seeing there is, we visited my old house, school, etc.

It's a strange experience going back to a place after 20 years, especially when the last time was as a child. Some scenes brought back memories; others I had to really struggle with to match up with my memories (because of changes, not faulty memory).

I'll probably post some photos soon - perhaps even some then and now photos.

by James Saiz : 2005/12/23 : 0 trackbacks : 1 comment (permalink)

Short-Term Testing in Quisition

When using physical flash cards I use two distinct methods for determining which subset of cards to test myself on in a particular session and what happens when I get a card right or wrong. I'm implementing the same system online for Quisition.

The first is what I call short-term testing.

It applies only to the current pile being learnt (as opposed to older ones being reviewed) and I try to keep this to around 10 cards. I try to do this a couple of times a day and it usually only takes a few minutes.

I go through the pile testing myself on each one and putting them into a right pile or wrong pile. If there are no cards in the wrong pile, I'm done. However if there are, I go through the wrong pile again. If I get it right it goes in the right pile but if I get it wrong again it goes back in the wrong pile. I keep repeating this until the wrong pile is empty. Then I repeat the whole process again.

So there are, in effect, two while loops, the outer while loop testing (or retesting) all cards and the inner loop retesting the cards got wrong.
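The two loops can be sketched roughly as follows. This is in Python rather than the Javascript used on the site, and it assumes that "done" means a full pass over the pile with nothing wrong. The names `short_term_test` and `simulated_user` are invented for the sketch.

```python
def short_term_test(cards, test):
    """Run a short-term test; return how many full passes were needed.

    `test(card)` is supplied by the caller and returns True if the
    user got the card right.
    """
    passes = 0
    while True:                      # outer loop: test (or retest) every card
        passes += 1
        wrong = [card for card in cards if not test(card)]
        if not wrong:                # clean pass over the whole pile: done
            return passes
        while wrong:                 # inner loop: retest only the wrong pile
            wrong = [card for card in wrong if not test(card)]


# Stand-in for a real user: wrong the first time a card is seen, right after that.
seen = {}

def simulated_user(card):
    seen[card] = seen.get(card, 0) + 1
    return seen[card] >= 2

print(short_term_test(["alpha", "beta", "gamma"], simulated_user))  # prints 2
```

Nothing here is stored between sessions, which matches the point that the results don't need to be persisted.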

Note that the results of the short-term test don't need to be persisted. For this reason, I've implemented it entirely client-side in Javascript for Quisition.

I'll put up a demo short-term test in the next few weeks for people to try out. Subscribe to the announcement feed on the Quisition website to find out when it's available.

UPDATE (2006-01-21): Now see the demo.

by James Saiz : 2005/12/10 : Categories quisition : 0 trackbacks : 6 comments (permalink)

Aperture Arrives, 50mm Prime On Its Way

Apple's Aperture arrived today, just two days after I read the damning review on ArsTechnica.

I have yet to try it out but I did notice that the box features a close up of a 50mm f/1.4 lens. Coincidentally, I just ordered Canon's EF 50mm f/1.4 lens yesterday.

I've been eyeing the 50mm for a while as my first prime lens. I'll be using it on my 10D although I'd really like a 5D which has a full-size sensor.

by James Saiz : 2005/12/07 : Categories photography : 0 trackbacks : 0 comments (permalink)

Scalability and Uptime for Quisition

I've been thinking a lot about how to scale Quisition, the online flashcard site I'm working on in my "copious spare time".

Flashcard testing isn't a critical app but, given one of the features of Quisition will be its scheduling of what cards to review on which day, it's fairly important to users that the site is available daily.

Scaling I can mitigate somewhat by the number of users. I've been thinking my goal should be to get 100 very happy users and then worry about the infrastructure to support 1000.

Given I'm thinking about these sorts of things, it was interesting to read Don't Scale: 99.999% uptime is for Wal-Mart at Signal vs Noise. In particular, this quote is a nice confirmation of my current attitude:

Before you have users, it’s a waste of time ensuring that they can always get to the service

Some interesting comments have been made on the Signal vs Noise post. Thoughts welcome here too.

by James Saiz : 2005/12/07 : Categories quisition : 0 trackbacks : 0 comments (permalink)

Leonardo 0.7 beta 1 Released

The first beta of Leonardo 0.7 is now available at:


Leonardo is an extensible content management system written in Python and initially focused on providing for personal websites with a password-protected wiki and blog (including Atom feed).

Changes Since 0.6.x

by James Saiz : 2005/12/06 : Categories leonardo python announcements (permalink)

DPs Seeing Red

A company simply called RED is tantalising digital cinematographers with their promise on an otherwise information-scarce website of a 2540p camera based on a full frame 4K CMOS.

That's as much a resolution increase over 1080-line HD as 1080-line HD is over standard definition.

by James Saiz : 2005/12/06 : Categories filmmaking : 0 trackbacks : 0 comments (permalink)

iMac Back Home

I got my iMac back today after three weeks.

I'm happy with the AppleCentre store that did the repairs but still very frustrated that Apple doesn't let them keep spare power supplies in stock (and then takes 2 weeks to ship them).

by James Saiz : 2005/12/06 : Categories mac : 0 trackbacks : 0 comments (permalink)

IE6 Transparent PNG Bug

I was just showing a friend the Quisition website on their machine (Windows XP with IE6) and noticed the background is non-white.

I vaguely remember reading about IE6 having a problem with transparent PNGs but until now it hasn't been something I've needed to worry about.

I guess the solution is to go back to Illustrator CS2 and make a white background version.

by James Saiz : 2005/12/05 : Categories quisition web_design : 0 trackbacks : 1 comment (permalink)

Upgrading This Site to Leonardo 0.7 Beta Candidate

I'm about to upgrade this site to what will (assuming all goes well) then be released as Leonardo 0.7b1.

Apologies if anything breaks.

UPDATE: Looks like it worked!

by James Saiz : 2005/12/04 (permalink)

Quisition: An Online Flashcard System

I've previously mentioned that I'm working on a web-based flashcard system. Well I've decided I'm going to try to make a website out of it.

I've given it the name Quisition because it's all about acquisition through inquisition. I've registered the domain quisition.com.

Nothing to see there yet, but there is an Atom feed you can subscribe to for announcements.

Well, there's also a little logo I designed for the site :-)

Over the next month, I'll probably put up some info about how it all works along with some screen shots.

Then, some time in Q1, I'll launch a limited beta to get feedback and gauge interest.

by James Saiz : 2005/12/04 : Categories quisition : 0 trackbacks : 3 comments (permalink)

Babylon 5 Scripts

JMS is releasing his Babylon 5 scripts with notes as a 14-volume series over the next year.

Besides being an absolute thrill for someone like me who's a fan both of B5 and the making of films and episodic television in general, the project is interesting in some other ways too:

Amongst other things, the 15th volume will answer the decade-old question: "What would Babylon 5 have been like had Sinclair stayed?"

First two volumes are out. Discount on the second ends today.

And so it begins...

by James Saiz : 2005/12/03 : Categories babylon_5 filmmaking (permalink)

Revisiting Versioned Literate Programming

Greg Wilson asks about incremental display of source code.

His example shows somewhat the kind of thing I was thinking when I wrote about versioned literate aspect-oriented programming where I said that I'd like a literate programming tool for writing tutorials...

I could write a web and then tangle it to generate the [...] application and weave it to get the tutorial. But as features are incrementally added to the application over the course of the tutorial, conventional literate programming might not be enough. At the very least, some kind of versioning would need to be included.

Greg is talking more about animated display online using Javascript but from an authoring perspective, I think we're looking for a similar tool. Greg's example is really nice for demonstrating how code gets developed at certain insertion points. That starts to touch on the aspect-oriented element I was thinking of (although there are no cross-cutting concerns in his simple case).

Back in March 2004 when I wrote my original post, Dave Long commented:

Literate programming is composing a program source by pasting together a dag of (smaller) chunks.

Versioning, however, is composing a program source by pasting together a list of (sequential) edits.

The former is primarily spatial, and the latter primarily temporal, so it may not be too difficult to keep the two from interfering.

Heck; use the versioning capability to expand chunks in the appropriate environment of a configuration tree, and one would have self-documenting CM.

One thing that occurs to me is that the versioning in what I'm talking about isn't exactly the same kind of versioning you normally do with something like Subversion. It's not about keeping a history, it's about creating a history, so there'd be nothing wrong with going back and changing earlier versions.

I'm still wondering if anyone has built something like this. I haven't had the time :-)

2005/12/02 : Categories software_craftsmanship : 0 trackbacks : 5 comments (permalink)

2005/12/02 : Categories music_theory : 0 trackbacks : 0 comments (permalink)

Demokritos 0.1.0 Released

I've decided my Atom server prototype written in Python is probably at the stage where it needs some interoperability testing.

You can download a very early alpha at http://jamessaiz.en.wanadoo.es/2005/demokritos/demokritos-0.1.0.tgz

Alternatively, if you have a client and want to do some interop testing with me, either email me directly or look for me on #atom at irc.freenode.org.

Some caveats about this version:

2005/11/30 : Categories demokritos atompub python : 0 trackbacks : 1 comment (permalink)

Relational Python: Restrict

The next relational operator I implemented in my relational python experiment was RESTRICT.

Restrict filters the set of tuples by some condition. In many formulations of the relational algebra, the restriction is on just one attribute at a time and you chain RESTRICTs together but for relational python, we'll use a lambda that can do rich testing across one or more attributes.

Here was my first attempt:

def RESTRICT(orig_rel, restriction):
    new_rel = Rel(orig_rel.attributes())
    for tup in orig_rel.tuples():
        if restriction(tup):
            new_rel.add(tup)
    return new_rel

but I thought it would be neater as a list comprehension. The only problem was, there was no way in Rel to add multiple tuples at a time so I added the following method to Rel:

    def add_multiple(self, tupset):
        self.tuples_.update(set([self._convert_dict(tup) for tup in tupset]))

This enabled me to rewrite RESTRICT as:

def RESTRICT(orig_rel, restriction):
    new_rel = Rel(orig_rel.attributes())
    new_rel.add_multiple([tup for tup in orig_rel.tuples() if restriction(tup)])
    return new_rel

Here's how to use the function:

rel4 = RESTRICT(rel1, lambda tup: tup["SALARY"] > "30K")

And, of course, I had to write a lazy "view" version:

class RESTRICT_VIEW(Rel):

    def __init__(self, orig_rel, restriction):
        Rel.__init__(self, orig_rel.attributes())
        self.orig_rel = orig_rel
        self.restriction = restriction

    def add(self, tup):
        raise Exception

    def tuples(self):
        for tup in self.orig_rel.tuples():
            if self.restriction(tup):
                yield tup
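To see the difference between the snapshot and the lazy view, here is a self-contained sketch. It bundles a cut-down `Rel` so it runs on its own, the class name `RESTRICT_VIEW` is modelled on `PROJECT_VIEW` from the projection post, and the `salary` helper is an addition of mine. The helper is there because comparing strings like "100K" and "30K" directly is lexicographic and would put "100K" below "30K".

```python
class Rel:
    """Cut-down relation, following the Rel class from earlier in this series."""

    def __init__(self, attributes):
        self.attributes_ = tuple(attributes)
        self.tuples_ = set()

    def add(self, tup):
        self.tuples_.add(tuple(tup[a] for a in self.attributes_))

    def attributes(self):
        return set(self.attributes_)

    def tuples(self):
        for tup in self.tuples_:
            yield dict(zip(self.attributes_, tup))


def RESTRICT(orig_rel, restriction):
    """Snapshot: copy the tuples that satisfy the restriction right now."""
    new_rel = Rel(orig_rel.attributes())
    for tup in orig_rel.tuples():
        if restriction(tup):
            new_rel.add(tup)
    return new_rel


class RESTRICT_VIEW(Rel):
    """Lazy view: re-applies the restriction every time it is iterated."""

    def __init__(self, orig_rel, restriction):
        Rel.__init__(self, orig_rel.attributes())
        self.orig_rel = orig_rel
        self.restriction = restriction

    def add(self, tup):
        raise Exception("views are read-only")

    def tuples(self):
        for tup in self.orig_rel.tuples():
            if self.restriction(tup):
                yield tup


def salary(tup):
    # Hypothetical helper: parse "40K" as 40 so the comparison is numeric.
    return int(tup["SALARY"].rstrip("K"))


rel1 = Rel(["ENO", "ENAME", "DNO", "SALARY"])
rel1.add(dict(ENO="E1", ENAME="Lopez", DNO="D1", SALARY="40K"))
rel1.add(dict(ENO="E3", ENAME="Finzi", DNO="D2", SALARY="30K"))

snapshot = RESTRICT(rel1, lambda tup: salary(tup) > 30)
view = RESTRICT_VIEW(rel1, lambda tup: salary(tup) > 30)

# Added after both were created: only the view notices.
rel1.add(dict(ENO="E2", ENAME="Cheng", DNO="D1", SALARY="42K"))

print(sorted(t["ENAME"] for t in snapshot.tuples()))  # ['Lopez']
print(sorted(t["ENAME"] for t in view.tuples()))      # ['Cheng', 'Lopez']
```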

As always, suggestions for improvements are welcome in comments.

Next up, I'll implement the cross product.

2005/11/30 : Categories relational_python python : 0 trackbacks : 2 comments (permalink)

Devin Kilminster Has Been Annealing

Devin Kilminster has taken the lead in category III of my ongoing programming competition. Unlike Mark Ellison, who used a deterministic algorithm, Devin used simulated annealing like I did.

I'm starting to put together a second edition of the programming competition that will involve more complex relationships between prerequisites. I don't think that will make the problem any harder for simulated annealing approaches (it will just involve changing the scoring function) but it will probably require quite different deterministic approaches than the current competition. I might have two divisions to keep it fair between the 'annealers' and the 'deterministas'.
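To illustrate why only the scoring function would need to change for an annealer, here is a generic sketch. The competition's actual problem and scoring are not reproduced here; the prerequisite data, the `violations` function and the cooling schedule are all assumptions made up for the example.

```python
import math
import random

def anneal(items, score, steps=2000, start_temp=2.0, seed=0):
    """Generic simulated annealing over orderings of `items`.

    `score(order)` returns a cost to minimise. Solving a different
    problem means passing a different `score`, nothing else.
    """
    rng = random.Random(seed)
    current = list(items)
    current_cost = score(current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        temp = start_temp * (1 - step / steps) + 1e-9    # linear cooling
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]  # propose a swap
        new_cost = score(current)
        delta = new_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current_cost = new_cost                      # accept the move
            if new_cost < best_cost:
                best, best_cost = list(current), new_cost
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo
    return best, best_cost


# Hypothetical problem: each key must come after everything in its list.
prerequisites = {"B": ["A"], "C": ["B"], "D": ["A"]}

def violations(order):
    position = {item: index for index, item in enumerate(order)}
    return sum(1 for item, needs in prerequisites.items()
               for need in needs if position[need] > position[item])

order, cost = anneal(["D", "C", "B", "A"], violations)
print(order, cost)
```

More complex relationships between prerequisites would only touch `violations`; the `anneal` loop stays as it is.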

2005/11/29 : Categories programming_competition : 0 trackbacks : 4 comments (permalink)

Weight Loss

On 1st November I decided to try to lose some weight. I was already overweight at the start of this year but then put on an additional 15 pounds during the five months I was in the US and Europe.

I need to lose somewhere between 30 and 50 pounds. My first goal is to lose the 15 pounds to get me back to my weight at the start of the year. The next goal after that is to lose an additional 15 pounds to get me to the weight I was five years ago. Anything after that is a bonus but I'm going to go for another lot of 15 pounds to lose a total of 45 pounds.

My target is to lose 2 pounds per week.

I'm not doing anything dramatic. I'm easing my way into the first phase by doing four things:

For the most part, I've stuck with that so far.

More than anything, this strategy is simply reflective of what I was doing wrong before.

How am I going? I've lost 8 pounds as of last weekend (i.e. after 4 weeks). I already feel a lot less bulky. Hopefully I'll reach my first 15 pound milestone by the end of the year.

The real test will be whether I can continue the loss when visiting the US next.

by James Saiz : 2005/11/28 : Categories personal weight_loss : 0 trackbacks : 9 comments (permalink)

What I've Been Up To

I haven't blogged for a week. Here's a quick summary of what I've been up to:

2005/11/27 : 0 trackbacks : 3 comments (permalink)

Atom (and Demokritos) Status

Tim Bray has a nice summary of how the Atom Publishing Protocol works and a note on the current status.

On the weekend I worked more on my Python implementation of said protocol, Demokritos. I'm close to an initial release which won't be usable as a real Atom server yet but should be good enough for interop testing.

The small hurdle I've just encountered is that (at least in the 06 spec) Atom entries sent from clients to a server need not be fully valid entries. They can omit information that will be subsequently provided by the server itself. The mistake I've made is that my code for parsing Atom entry XML and building an object model is strict about the entry XML being valid. I'll need to relax that for incoming entry POSTs.
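One way to relax the parsing is a strictness flag, sketched below with the standard library's ElementTree. This is not Demokritos code: `parse_entry` and its `strict` parameter are hypothetical, and which elements a client may leave out is an assumption here (only `id`, `title` and `updated` are checked).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
CHECKED = ("id", "title", "updated")   # elements a complete entry must contain

def parse_entry(xml_text, strict=True):
    """Return a dict with the text of the entry's id, title and updated.

    With strict=False, missing elements come back as None so the server
    can fill them in later, instead of raising ValueError.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}entry" % ATOM_NS:
        raise ValueError("not an Atom entry")
    fields = {}
    for name in CHECKED:
        fields[name] = root.findtext("{%s}%s" % (ATOM_NS, name))
    missing = [name for name in CHECKED if fields[name] is None]
    if strict and missing:
        raise ValueError("missing elements: " + ", ".join(missing))
    return fields

# What a client might POST: no id or updated yet.
posted = '<entry xmlns="http://www.w3.org/2005/Atom"><title>Hello</title></entry>'
print(parse_entry(posted, strict=False))
```

Stored entries and feeds would still go through the strict path; only incoming POSTs would use `strict=False`.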

2005/11/21 : Categories atompub demokritos python : 0 trackbacks : 3 comments (permalink)

The Big 040

Today I turned 040. In octal that is. 0x20 in hex.

That's 32 for the humans.

Achieved almost none of the goals that I set for myself last birthday but it was a great year nevertheless. I'll write more later.

2005/11/19 : Categories personal : 0 trackbacks : 1 comment (permalink)

Relational Python: Projection

Now that we have a basic class for relations and a method for displaying them, we'll now start to go through some relational operators, starting with PROJECT.

PROJECT is defined such that if rel1 is:

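+-----+-------+-----+--------+
| ENO | ENAME | DNO | SALARY |
+-----+-------+-----+--------+
| E1  | Lopez | D1  | 40K    |
| E3  | Finzi | D2  | 30K    |
| E2  | Cheng | D1  | 42K    |
+-----+-------+-----+--------+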

then PROJECT(rel1, ["ENO", "ENAME"]) is:

+-----+-------+
| ENO | ENAME |
+-----+-------+
| E1  | Lopez |
| E3  | Finzi |
| E2  | Cheng |
+-----+-------+

It's sometimes useful, even when not dealing with relations, to be able to do projections of dictionaries. The following function does that:

def project(orig_dict, attributes):
    return dict([item for item in orig_dict.items() if item[0] in attributes])
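For a quick feel of what the dictionary projection does, here is the same function written as a dict comprehension (a stylistic variant, not a change in behaviour) and applied to a sample employee record:

```python
def project(orig_dict, attributes):
    # Keep only the key/value pairs whose key is in `attributes`.
    return {key: value for key, value in orig_dict.items() if key in attributes}

employee = {"ENO": "E1", "ENAME": "Lopez", "DNO": "D1", "SALARY": "40K"}
print(project(employee, ["ENO", "ENAME"]))  # {'ENO': 'E1', 'ENAME': 'Lopez'}
```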

This can then be used to define PROJECT:

def PROJECT(orig_rel, attributes):
    new_rel = Rel(attributes)
    for tup in orig_rel.tuples():
        new_rel.add(project(tup, attributes))
    return new_rel

(Note that if Rel took an iterator over tuples in its constructor, this could be simplified further—I might do that at some stage)
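A rough sketch of what that simplification might look like. The optional `tuples` constructor argument is hypothetical (the `Rel` in this series does not have it), and the class is cut down so the example runs on its own.

```python
class Rel:
    """Variant of Rel whose constructor optionally takes initial tuples."""

    def __init__(self, attributes, tuples=()):
        self.attributes_ = tuple(attributes)
        self.tuples_ = set()
        for tup in tuples:           # any iterable of dicts, including a generator
            self.add(tup)

    def add(self, tup):
        self.tuples_.add(tuple(tup[a] for a in self.attributes_))

    def tuples(self):
        for tup in self.tuples_:
            yield dict(zip(self.attributes_, tup))


def project(orig_dict, attributes):
    return dict([item for item in orig_dict.items() if item[0] in attributes])


def PROJECT(orig_rel, attributes):
    # The loop collapses into a generator handed to the constructor.
    return Rel(attributes, (project(tup, attributes) for tup in orig_rel.tuples()))


rel1 = Rel(["ENO", "ENAME", "DNO", "SALARY"], [
    dict(ENO="E1", ENAME="Lopez", DNO="D1", SALARY="40K"),
    dict(ENO="E2", ENAME="Cheng", DNO="D1", SALARY="42K"),
])
rel2 = PROJECT(rel1, ["ENO", "ENAME"])
print(sorted(t["ENAME"] for t in rel2.tuples()))  # ['Cheng', 'Lopez']
```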

This PROJECT function implements the relational operator PROJECT. It makes a new relation based on a point-in-time snapshot of another. However, it's easy to make the projection dynamic as well.

The following class allows one to create a projection of a relation that is dynamic. In other words, it is a projection of the current state of the original relation not just at a point in time.

class PROJECT_VIEW(Rel):

    def __init__(self, orig_rel, attributes):
        Rel.__init__(self, attributes)
        self.orig_rel = orig_rel

    def add(self, tup):
        raise Exception

    def tuples(self):
        for tup in self.orig_rel.tuples():
            yield project(tup, self.attributes_)

rel3 = PROJECT_VIEW(rel1, ["ENO", "ENAME"]) works just like rel2 = PROJECT(rel1, ["ENO", "ENAME"]) except that if new tuples are added to rel1, then rel3 changes whereas rel2 stays the same.

As always, I welcome people's suggestions as to how to improve this.

2005/11/17 : Categories relational_python python : 0 trackbacks : 5 comments (permalink)

Early Birthday Present: Australia in the World Cup

It's my 32nd birthday on Saturday, but Australia's soccer team, the Socceroos, gave me an early present by beating Uruguay to qualify for the World Cup in Germany next year.

The last time the Socceroos made it into the World Cup, I was only a few months old.

To give those of you in the US an idea of the significance of this: it's almost the Australian equivalent of the Red Sox winning the World Series.

One of my fondest memories growing up was watching the 1986 World Cup in Mexico. I went for Germany, as I have every time since. Next year, I'll be able to follow my home country for the first time.

Australia wins

by James Saiz : 2005/11/16 (permalink)

Disclosure: Trying Out Google Analytics

I want to try out Google Analytics so I've temporarily added it to this site. Just letting everyone know in the interests of openness.

2005/11/16 : Categories this_site google : 0 trackbacks : 1 comment (permalink)

Dead iMac and Getting Stuff Off the Hard Drive

So my iMac is completely dead now.

I rang the local AppleCentre and asked if they had power supplies in stock. I had dreams of them being able to replace it on the spot (assuming that is the problem) so I could be up and running by this afternoon.

"We're not allowed to carry spare parts" was the response. I'm trying to keep an open mind but I'm not sure why Apple would not allow spare parts to be kept on site at an AppleCentre. Especially given how far away Perth is and how long it takes for things to get shipped here.

The estimate for the replacement power supply to come in (again assuming that's the problem—I haven't actually brought the machine in to them yet) is 1-2 weeks.

1-2 weeks without my main non-work computer!

I (stupidly in hindsight) didn't get AppleCare on the iMac (although I have it on other Apple hardware that's never had a problem). The guy on the phone claimed AppleCare owners would get priority but given it's a shipping issue I doubt it makes much difference to the 1-2 weeks.

While I do do regular backups, there's a lot of cool stuff I've been working on in the last week that isn't in the last backup.

So I'd really like access to the stuff on the hard drive. I wonder if I take my iMac apart, whether the drive will just plug straight in to my PowerMac (temporarily replacing the second drive in that).

UPDATE: When I got to the AppleCentre I asked if they could open up the iMac right away and give me the hard drive. Fortunately they were happy to do this for me. I should be able to hook it up to my PowerMac (the one usually used for music and film) and get my stuff off. Now I don't really mind the 1-2 weeks as much :-)

by James Saiz : 2005/11/14 : Categories mac (permalink)

Optical Illusion

Okay, so this is just about the coolest optical illusion I've ever seen. Especially the second part about the pink dots disappearing altogether.

Apparently the visual cortex only cares about differences and the pink dots are fuzzy enough (especially when you are focused on something else) that they seem constant even though the eye is doing its usual fast little movements (saccades). The green appears because a reduction in pink and a gain in green are the same thing.

Jenni pointed out to me that

if you fixate somewhere outside the circle, it still works, but if you move your focus to another point outside the circle, you can see the pink again. You don't have to look at the pink dots again to make them reappear. You just have to move them to a different point on the retina.

2005/11/13 : 0 trackbacks : 2 comments (permalink)

Using Python Coroutines for AJAX Applications

I think I just had an epiphany regarding the upcoming coroutine support in Python. I don't mean I came up with anything new (I think Ruby programmers have been doing it for a long time), just that I finally grok it—or at least, I think I do.

You see, I'm writing a little AJAX-based flash card website and I started off writing a standalone dynamic HTML mock-up of (obviously) the client side but without any communication to the server yet.

I then wrote a console-based flash card program to experiment with the algorithm I want to use for what card to show when, when to learn new cards, etc. The console-based program just has a function called test that takes a card object, tests the card on the user and returns a boolean as to whether the user got the card right or not.

So my code has a bunch of places where I say:

for card in to_test:
    correct = test(card)
    if correct:
        ...

In theory, it's only this test function that's throwaway. It would be nice if I just had to replace those calls to test when I come to write the server version.

The only catch is that it will be the client that initiates requests for cards. Easy, I thought: I can use yield statements in the server code wherever I want to present the user with the next card. That way, the client (or more accurately some proxy for the client running on the server) can do a next() to get the next card.

The problem is that the client needs to return the result. The yield can't be a one-way street. At the point the server yields a card, it needs to also find out the result of that yield.

Enter coroutines. If I understand correctly, this is exactly the kind of problem coroutines solve. The yield statement in my server-side code becomes a yield expression. The client sends the generator the result of testing the card and that becomes what the yield expression evaluates to.
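A minimal sketch of that two-way yield, written for current Python 3 (returning a value from a generator arrived later than 2.5). The names `quiz` and `run_session` and the canned answers are made up for illustration; in a real server the answers would come from the client's requests.

```python
def quiz(cards):
    """Server-side testing loop written as a coroutine.

    Each `yield` hands a card to the client proxy and evaluates to
    whatever result the proxy sends back.
    """
    results = {}
    for card in cards:
        correct = yield card         # pause here until the client answers
        results[card] = correct
    return results


def run_session(cards, answers):
    """Stand-in for the client proxy: drives the coroutine to completion."""
    session = quiz(cards)
    try:
        card = next(session)         # prime the generator; get the first card
        while True:
            card = session.send(answers[card])  # report result, get next card
    except StopIteration as stop:
        return stop.value            # the dict returned by quiz()


print(run_session(["uno", "dos"], {"uno": True, "dos": False}))
# {'uno': True, 'dos': False}
```

The loop inside `quiz` reads just like the console version's `correct = test(card)`, which is the appeal.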

Have I understood coroutines correctly? If so, I can't wait for Python 2.5!

2005/11/13 : Categories python : 0 trackbacks : 8 comments (permalink)

More on the One Red Paperclip

In response to my comments on the One Red Paperclip project and the benefits of trade, Mark Baker wonders:

if there's any way to benefit from this asymmetry on a large scale?

But I'm not sure there is an asymmetry. Remember that both parties in each exchange value what they are receiving more than what they are giving. The exchange might look asymmetric to a third party but that third party might have different values than the two participating in the exchange.

Clearly it seems dramatic when you look at the start and end points but remember also that each step is with a different person. So say A gets traded for B which gets traded for C and so on to Z.

It's possible that every participant along the way could value Z more than A but still be perfectly happy with their individual exchange.

One might argue that the person giving up Z for Y might have accepted A on the grounds that they could have done the A, B, ... Y exchanges themselves but there is the transaction cost of discovery that the person doing all the exchanges has to bear. When you think about it, the person who started with A is doing a benefit to everyone along the way, even if he's just motivated by getting Z for himself.

I think one also needs to consider that the person who gave up, say, L for K might then go trade K for something even more valuable to them, so it's not just the people on the path from A to Z that participate in some sense.

The transaction cost of discovery might be very high and this might be the undoing of a project like One Red Paperclip. How does Mr Paperclip know that exchanging P for Q gets him closer to Z? Do the self-organizing benefits of a free market really come into play when there is one person essentially trying to coordinate? Finding the individual exchanges that would lead to a particular goal sounds like a job for the market as a whole, not one individual.

Wow. Mark's question opened up a whole bunch of thoughts. And I didn't even get to his second question about the long tail. I also didn't talk about arbitrage which presumably is relevant to all this. And eBay has got to factor in somewhere :-)

As I'm just an economics novice, I'd really like to get some of my favourite economics bloggers to post their thoughts on this. I'll email Tyler Cowen and Peter Boettke and link to their posts if they make them.

2005/11/12 : Categories economics : 0 trackbacks : 0 comments (permalink)

First Email, Now My iMac

A couple of days ago, my iMac turned off. By itself. I didn't think much of it at the time. I thought maybe I'd shut it down and forgotten about it.

But it happened a couple more times last night and this morning. Now it dies within a minute or two of starting.

Definitely won't get stuff done this weekend that I was planning. Oh well :-)

UPDATE (2005-11-13): Now it dies within seconds of starting. I'd say the power supply is fried. From a little research it seems to be a known issue. Hopefully the local service place has power supplies in stock and can fix it on the spot.

2005/11/12 : 0 trackbacks : 0 comments (permalink)

by James Saiz : 2005/11/11 : Categories relational_python python (permalink)

Email Outage

If you've sent me email in the last 18 hours, I haven't been able to read it and may not be able to do so for some time. My mail provider has had an outage (8+ hours so far) and is claiming that it will still be hours before service is returned.

UPDATE (2005-11-12): Email is still down. Could turn out to be days. Looks like Merlin Mann uses the same mail host.

UPDATE (2005-11-13): I woke up this morning and mail was working. Unfortunately iMac isn't.

2005/11/11 : 0 trackbacks : 7 comments (permalink)

Relational Python: Basic class for relations

A relation is basically a set of dictionaries (called tuples) where each dictionary has identical keys (called attributes).

While, as you'll see in the next couple of posts in this series, my display routine and the initial relational operators work on iterations over plain Python dictionaries, I found it useful to implement a relation, at least in these preliminary stages, using a different internal structure (something Date is clear in his book he has no problem with).

Basically, I store each tuple internally as a Python tuple rather than a dictionary and the relation also keeps an ordered list of the attributes which is used as the index into the tuples. Amongst other things, this gets around dictionaries not being hashable. It's also a storage optimization akin to using slots for Python attributes.

class Rel:

    def __init__(self, attributes):
        self.attributes_ = tuple(attributes)
        self.tuples_ = set()

    def add(self, tup):
        self.tuples_.add(self._convert_dict(tup))

    def _convert_dict(self, tup):
        return tuple([tup[attribute] for attribute in self.attributes_])

    def attributes(self):
        return set(self.attributes_)

    def tuples(self):
        for tup in self.tuples_:
            tupdict = {}
            for col in range(len(self.attributes_)):
                tupdict[self.attributes_[col]] = tup[col]
            yield tupdict

Note that Rel.attributes and Rel.tuples return a set of attributes and a generator over dictionaries just as you would expect.
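The hashability point is easy to check in isolation. This small standalone sketch (the variable names are mine) shows a set rejecting a dictionary but accepting the equivalent tuple built from a fixed attribute order:

```python
row = {"ENO": "E1", "ENAME": "Lopez"}
relation = set()

try:
    relation.add(row)                # dicts are mutable, hence unhashable
except TypeError as err:
    print("dict rejected:", err)

attributes = ("ENO", "ENAME")        # fixed attribute order kept by the relation
as_tuple = tuple(row[a] for a in attributes)
relation.add(as_tuple)
relation.add(as_tuple)               # duplicates collapse, as a relation requires
print(relation)                      # {('E1', 'Lopez')}
```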

By implementing the handy little helper function:

def d(**args):
    return args

we can now create a relation and add tuples like so:

rel1 = Rel(["ENO", "ENAME", "DNO", "SALARY"])

rel1.add(d(ENO="E1", ENAME="Lopez", DNO="D1", SALARY="40K"))
rel1.add(d(ENO="E2", ENAME="Cheng", DNO="D1", SALARY="42K"))
rel1.add(d(ENO="E3", ENAME="Finzi", DNO="D2", SALARY="30K"))

In the next post I'll share my display routine and, following that, start on the relational operators, beginning with PROJECT.

by James Saiz : 2005/11/10 : Categories relational_python python (permalink)

One Red Paperclip and the Benefits of Trade

The One Red Paperclip project and the Donald Duck story it reminded Hans Nowak of is a nice example of a key principle in economics: voluntary trade benefits both parties.

Gene Callahan, in his wonderful book Economics for Real People, makes the point that when we trade voluntarily it isn't because we give equal value to what we are receiving and what we are giving—it is because we value what we are receiving more than what we are giving. If we valued them the same, there would be no reason to trade. (Note that I'm not just talking about monetary value.) The exact same thing is true of the other party.

People value things differently, in part because people just have different values but also because of marginal utility. Marginal utility is just the idea that the value to you of something is based on the value of getting it in addition to what you already have. e.g. if you already have enough food to eat, you might not value extra food as much. A second car isn't as valuable as the first. A third even less so. The animal in the Donald Duck cartoon that needed the string for his kite was willing to give up a pocket knife for it but if someone offered him the same deal again 5 minutes later, he likely would have rejected it as the marginal utility of the string had greatly reduced.

by James Saiz : 2005/11/10 : Categories economics (permalink)

5 things

Dave Warnock has tagged me in this blogospherical equivalent of a chain-letter.

Ten years ago

Finishing up my linguistics degree at University of Western Australia. Working part-time there as their first webmaster. Conspiring to get SGML used on the Web.

Five years ago

Living in Portsmouth, New Hampshire. Working at Bowstreet Software as Director of XML Technology. Going to conferences every couple of weeks to talk on XML and Web Services.

One year ago

Living back in Perth, but working for mValent in Boston. Working on the editing and scoring for my first short film, Alibi Phone Network. Resumed work on my morphological database of the Greek New Testament, MorphGNT, and was in the midst of a major rewrite of Leonardo (released as 0.4).

Five yummy things

Kaju katli, good sushi, anything at Tu Y Yo (Somerville, Mass.), anything cooked by HB, anything cooked by my Mum.

Five songs I know by heart

Too many to name. Certainly anything by Nelson James :-)


2005/11/10 : 0 trackbacks : 0 comments (permalink)

Relational Python

Reading Chris Date's Database in Depth, I started to wonder what it would be like to have relational algebraic operations in Python. This is the first in a series of posts exploring that idea.

I'll start by defining a simple class for relations. In subsequent parts, I'll implement tabular display, the relational algebra and then see where it goes from there.

The goal is not to try to implement a SQL database in pure Python. Rather the goal is to extend Python's rich data structures like dictionaries and sets with additional concepts from relational theory.

It's an exploration for me and you get to come along for the ride (sort of like the Poincare Project which is by no means over yet). Maybe some of you will learn something. I certainly hope you'll teach me a thing or two in the comments and in your emails.

by James Saiz : 2005/11/09 : Categories relational_python python (permalink)

Multiclassing versus single classing in RPGs and real life

Sometimes I think about alternative paths I could have followed vocationally and the steps I would take to get there if I were much younger and making that choice now. It's almost like creating a new character in a role-playing game: "I'll start off as an economics undergraduate and then after five levels I'll switch to the prestige class Austrian Economist and go on a quest for Bigby's Prize in Honour of Alfred Nobel".

In RPGs, if you feel your character "concept" isn't working, you can go back and start a new one. Of course, that's much harder to do in real life, although some people do go back to undergraduate studies for a complete change in career.

I'm clearly multiclassing in real life. After getting a couple of levels in Mathematician, I switched over and progressed three or four in Linguist. Then I went and levelled up in Technologist a good eight or ten levels (the first few specialising in the schools of Web and XML but then also adding Python and Open Source). Somewhere along the way I picked up a level in Filmmaking and a couple in Music.

Progression as a multiclass character is much slower because you're a jack of...no...a journeyman of some.

Oddly enough, very few of my pen-and-paper RPG characters have ever been truly multiclass. They've either completely been in one class or had a few initial levels in one then made a permanent switch.

I think I'm attracted to the singular focus of just one class but, without the ability to go back and start a new character in real life, I've decided multiclassing is the way to go for me.

2005/11/09 : 0 trackbacks : 0 comments (permalink)

Working on atompub-protocol-06

I've just started working on moving Demokritos over to supporting atompub-protocol-06.

I've completed the changes to the introspection document. Next step will be throwing away the old collection format in favour of a normal atom feed. I'll also need to implement support for collection indexing. I'm glad APP defines the manner in which ranges of a collection are accessed because it saves us having to come up with something proprietary for Leonardo.

2005/11/08 : Categories demokritos atompub leonardo python : 0 trackbacks : 5 comments (permalink)

Mark Ellison Regains Lead

Mark Ellison has regained the lead in Category IV of my ongoing programming competition.

2005/11/08 : Categories programming_competition : 0 trackbacks : 0 comments (permalink)

MorphGNT 5.08 Released

I'm pleased to announce the release of a new version of MorphGNT, the morphologically parsed Greek New Testament database made available under a Creative Commons license.

I haven't put together the change log yet but will shortly.

UPDATE (2005-11-08): Change log is now available on MorphGNT page.

2005/11/07 : Categories morphgnt : 0 trackbacks : 1 comment (permalink)

Apple Shipping Woes

I made the mistake last week of ordering a bunch of stuff from the online Apple Store at the same time as preordering Aperture. Just about the time I thought the stuff shipping immediately would arrive, I discovered that it hadn't even shipped yet because Apple doesn't ship partial orders. I guess I'm spoilt by Amazon.

I'm going to try to give Apple a call on Monday to see if they'll just go ahead and ship the rest of the order now—otherwise I'll have to cancel the entire order and start again. Can't say this is the first time Apple shipping has let me down. (Actually, Amazon did once too, but it was an Apple product they were shipping!)

2005/11/05 : 0 trackbacks : 0 comments (permalink)

Why the Q in IRAQ?

Hans Nowak asks:

Why is "Iraq" spelled with a q?

Iraq in Arabic is العراق

The final letter (Arabic is written right-to-left) is ق (qāf) which is a uvular plosive. A uvular plosive is produced like a velar plosive (English 'k') but with the back of the tongue touching the roof of the mouth further back.

Arabic also has a velar plosive ﻙ (kāf).

'k' is a common transcription for velar plosives and 'q' is a common transcription for uvular plosives.

So Iraq is spelled with a 'q' because the final consonant is a uvular plosive and not a velar plosive (even though English speakers pronounce it as if it were a velar plosive). Irak would be a different word in Arabic.

2005/11/04 : Categories linguistics : 0 trackbacks : 2 comments (permalink)

Word of the Day: Mojibake

According to Wikipedia, Mojibake refers to the gibberish characters one gets when a document's character encoding is wrongly interpreted.

(via Infundibulum)

It's actually a word I'll probably find myself using.
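For anyone who hasn't seen it happen, here's a small sketch that produces mojibake on purpose: text is encoded as UTF-8 and then decoded as if it were Latin-1. The sample word is my own choice.

```python
text = "café"

# Correct bytes, wrong interpretation: UTF-8 read as Latin-1.
utf8_bytes = text.encode("utf-8")
garbled = utf8_bytes.decode("latin-1")

# Because Latin-1 maps every byte to a character, the damage is reversible here.
recovered = garbled.encode("latin-1").decode("utf-8")

print(garbled)    # cafÃ©
print(recovered)  # café
```

The two-byte UTF-8 sequence for é shows up as two separate Latin-1 characters, which is the classic look of the problem.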

2005/11/03 : 0 trackbacks : 0 comments (permalink)

Questioning the Authorship of Bach's D minor Toccata and Fugue

Bach's D minor Toccata and Fugue has long been one of my favourite works. Even though I find the exposition of the fugue oddly simplistic for Bach, I've always loved the drama of the work and its improvisational feel.

So, like Tyler Cowen, I'm shocked to discover that some scholars doubt its authenticity.

The arguments seem to include:

One theory is that it is an organ transcription of a piece for strings by another composer.

Like many popular articles on scholarly controversy, it's not easy to tell just how mainstream a view it is. It happens all the time in Biblical Studies that one scholar's controversial viewpoint is published in a popular article as though it were scholarly consensus.

I'm also dubious of authorship debates in general just because I think the variation within one author can be much greater than the average difference between authors. And isn't a composer allowed to experiment and grow?

As (a sorry excuse for) a composer, I'm very aware of just how different my works from different time periods are.

Still, it's a fascinating theory, and one I might have to dive into a bit more.

2005/11/02 : Categories music : 0 trackbacks : 146 comments (permalink)

The Size of the Perth Market

In an answer to a question asked at the end of my talk on Monday, I suggested that mValent couldn't have been started in Perth because the market is too small. I didn't say the market is too small for software companies in general, just that certain types of software (and markets) require you to be close to the customer in the early stages so there need to be enough serviceable customers locally to sustain you in the initial years. In mValent's case, I don't think that's true of Perth.

I suspect that successful software companies based in Perth generally either have an ideal customer profile that fits more local companies or have a more mature market that is easier to service remotely.

by James Saiz : 2005/11/02 : Categories entrepreneurship (permalink)

There Needs to Be Enough Subject-Verb Agreement

In my previous blog entry I said:

so there need to be enough serviceable customers locally to sustain you

I initially wrote:

so there needs to be enough serviceable customers locally to sustain you

but I decided that "need" has to agree with "serviceable customers".

I confess, though, that I struggled for a while with it (although, as with many cases like this, it's clear to me now that I've thought about it).

2005/11/02 : Categories linguistic_observations : 0 trackbacks : 0 comments (permalink)

Inaugural Commercial Technology Network

Last night I was the guest speaker at the inaugural Commercial Technology Network. CTN is "an initiative to foster collaboration between entrepreneurs, service providers, suppliers, and most importantly, customers."

I was asked to speak on how I came to be involved in the founding of mValent and the role networking played.

Besides the actual telling of the story (starting all the way back at my childhood entrepreneurial endeavours) I tried to stress the importance of individual relationship building and, in particular, individual contributions to one's "tribe", particularly through a willingness to share knowledge.

One of my heroes, Tom Peters, has long talked about loyalty to one's network and recently, I've been reading the Tim Sanders book Love is the Killer App where he argues that "nice, smart people can win business and influence friends by sharing generously."

It was certainly a great honour that Michael Kyriacou and the other members of the CTN steering committee asked me to give the inaugural talk. It seemed to be well received.

by James Saiz : 2005/11/01 : Categories entrepreneurship speaking (permalink)

iTunes Music Store Opens in Australia

The Apple release I've really been waiting for...

iTunes Music Store is now available to people in Australia.

by James Saiz : 2005/10/24 : Categories apple (permalink)


Back in March, I asked Why No Apple Pro Photo App?

Looks like now there is one.

UPDATE (2005-10-20): I've now had a chance to watch the quick tour videos. It looks fantastic as a photographic productivity tool. It's no Photoshop when it comes to image manipulation (although perhaps it will eventually develop into that) but for actually managing photos it looks like they've got a lot of things right.

by James Saiz : 2005/10/19 : Categories apple (permalink)

Keep the Academic Writing Samples Simple, Stupid

James Gosling recalls his PhD days:

Back when I was a grad student I was spinning out of control trying to come up with a thesis topic. My advisor took me out to lunch one day and asked me a simple question: "What is a PhD thesis?" I yattered on for a while and he listened patiently. Eventually he said "No: It's just a stack of 100 pages with 4 signatures on top". I was falling into a common grad student trap of feeling that I needed to do something grandiose and solve all of the worlds problems. He was into "keep it simple". So I did, and I came up with a pretty straightforward thesis proposal. The odd thing was that when I finally finished my thesis, I realized that I had only delt with one sentence out of the simplified proposal.

This is significant for me, not because I'm having problems with my thesis, but with something much smaller. For my application, I have to provide samples of academic writing. I have two papers in mind I want to write (new papers because I'm too embarrassed about anything I wrote during my undergraduate days ten years ago). The problem is I think I'm setting the bar too high. I keep thinking these two papers have to be ground-breaking work. But they aren't even my thesis. They are just samples of academic writing. As long as I remind myself they are just "a stack of 10 pages that proves I can write English and put together a bibliography" then I think I'm in good shape.

by James Saiz : 2005/10/16 : Categories phd linguistics : 0 trackbacks : 1 comment (permalink)

Error 400 Trying to Watch Apple Special Event

Last month, when I wanted to watch the Apple Special Event, QuickTime just gave me a 400 Bad Request error.

Same thing is happening with this month's Apple Special Event.

Anyone else getting a 400 Bad Request error?

In the case of the September event, it finally started working after a few days. Surely Apple can do better than this.

by James Saiz : 2005/10/12 : 0 trackbacks : 1 comment (permalink)

New Draft of Atom Protocol Out

draft-ietf-atompub-protocol-05 is now out.

I haven't looked at it yet to see what will have to change in Demokritos. I will, however, delay releasing Demokritos until I've implemented at least as much of 05 as I had 04.

2005/10/11 : Categories demokritos : 0 trackbacks : 2 comments (permalink)

Alibi Showing at New Hampshire Film Expo

I've neglected to mention until now that Alibi Phone Network made the official selection at the New Hampshire Film Expo which is on this week in Portsmouth. That's four festivals we've made the official selection for. Unfortunately I won't be able to make this one.

Tom Bennett had a great poster made for this festival which I'll hopefully be able to put online at some stage.

Tom is also a finalist in the screenplay competition for his feature script The War in My Backyard.

by James Saiz : 2005/10/10 : Categories alibi_phone_network filmmaking (permalink)

To Do List Aggregation

For a while now I've been thinking about the need for a To Do List Aggregator.

While some "next actions" can be manually put on a list, some are time-based (either coming from a calendar or from something like Sciral Consistency). Others come from yet other applications: flagged emails or blog entries, unread email, etc.

Having multiple "inboxes" is a bad thing. So what would be nice would be an application that simply aggregated action items from multiple electronic inboxes, or what I'll call "Action Feeds".

By separating aggregation from the feeds themselves, it would be possible for people to develop all sorts of clever action feed generators that ranged from simple manual lists to integrations with calendars, email, etc.

The list aggregator would be similar to a regular blog aggregator but with a few important differences:

However, I still think Atom could be the basis for the protocol between aggregator and action feed generators. The comments I make in the UPDATE to Google Reader about IMAP are relevant here too.

2005/10/09 : 0 trackbacks : 3 comments (permalink)

TiddlyWiki and Atom Store

Continuing my thinking about To Do List Aggregation, it would be interesting to see a variant of the amazing TiddlyWiki that is backed by an Atom Store.

Probably wouldn't be that hard to do. Any Ajaxians interested in working with me on something like that?

2005/10/09 : 0 trackbacks : 0 comments (permalink)

Demokritos and Leonardo

What is Demokritos?

Demokritos is an open source Atom Store I'm writing in Python.

What is the relationship between Demokritos and Leonardo?

The focus of Demokritos is implementing the Atom specifications. The focus of Leonardo is implementing a practical CMS for personal websites. Although the two will likely merge at some point, I think doing so at this stage would slow down things too much.

What's the short term plan for Demokritos?

Get a 0.1 release out that can at least be used for interoperability testing.

What's the short term plan for Leonardo?

Work out what else (if anything) needs to be done for a 0.7 release; start a beta cycle.

2005/10/08 : Categories leonardo demokritos python : 0 trackbacks : 6 comments (permalink)


Saw Serenity last weekend but haven't had a chance to blog about it until now.

Enjoyed it a lot and will probably go see it again in Australia.

Some of my favourite aspects of it, as a filmmaker:

You can even observe the first point in the comfort of your own home as the first nine minutes are online at Vividas.

2005/10/08 : Categories filmmaking : 0 trackbacks : 1 comment (permalink)

Google Reader

Google recently launched the beta of Google Reader which is a web-based feed aggregator with the GMail-look you'd expect from Google. Like GMail, feeds are tagged rather than in folders.

If I were still using Bloglines, I might consider switching but I'm pretty attached to NetNewsWire so I'm not sure. The cost of switching is high for me because I flag a lot and I'd like my 'read' list to be consistent between clients. With email, that's achievable because of IMAP.

I took a look at my server logs to see if Google is grabbing feeds with something different from their GoogleBot. Turns out they are. Since 7th Oct I've got regular accesses to my atom feed from a user agent:

FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)

The page linked to in the user agent string makes it clear that FeedFetcher is only for user-initiated feed retrieval (i.e. subscriptions in Google Reader or the personalised Google home page) and that the blog search and regular Google search are crawled for separately.
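If you want to check your own logs for the same thing, something like the following sketch would do. It assumes the common "combined" access log layout, where the user agent is the last quoted field on each line; the sample lines, addresses and function names are made up for illustration.

```python
import re

# In the combined log format the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def user_agent(line):
    """Return the user agent string from one log line, or None."""
    match = UA_PATTERN.search(line)
    return match.group(1) if match else None


def feedfetcher_hits(lines):
    """Keep only the lines whose user agent mentions FeedFetcher-Google."""
    return [line for line in lines
            if "FeedFetcher-Google" in (user_agent(line) or "")]


# Hypothetical log lines, not taken from a real server.
sample_log = [
    '192.0.2.1 - - [08/Oct/2005:10:00:00 +0800] "GET /atom.xml HTTP/1.1" '
    '200 5120 "-" "FeedFetcher-Google; (+http://www.google.com/feedfetcher.html)"',
    '192.0.2.2 - - [08/Oct/2005:10:05:00 +0800] "GET /index.html HTTP/1.1" '
    '200 2048 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

hits = feedfetcher_hits(sample_log)
print(len(hits))
```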

UPDATE: I started thinking more about IMAP for feed reading. Atom is the obvious contender but something more is necessary because the server needs to indicate what's read/unread and the client needs to be able to mark entries read/unread. A simple extension element would work for the former. What about the latter?

by James Saiz : 2005/10/08 : 0 trackbacks : 1 comment (permalink)

XML Catalog Spec Approved

I've just read, via Anthony Coates, that OASIS has approved the XML Catalog 1.1 spec (disclosure: I voted YES to it).

XML Catalogs are near and dear to my heart...

Back in 1995, I was thinking about how SGML could be used on the Web and decided that one thing that would be useful is resolution of SGML public identifiers. So I proposed an extension to the existing SGML Catalog spec (under SGML Open, which was the forerunner to OASIS) to make them operate more like DNS than /etc/hosts for name resolution. Paul Grosso invited me to present the idea to an SGML Open meeting in early 1996. It was there that I met Jon Bosak and we shared our common vision for SGML on the Web. Just a couple of months later, Jon emailed me to say he had convinced the W3C to let him start a WG to work on this and would I like to be involved.

2005/10/05 : Categories xml : 0 trackbacks : 0 comments (permalink)

Laura Breckenridge: The Next Chapter

Back at SxSW I said of the film Southern Belles:

The real find was Laura Breckenridge, who plays Bell. Laura is definitely an actress to keep an eye on.

Well, Laura stars in the new series Related airing on the WB this Wednesday.

I can't comment on whether the series is any good, but Laura was certainly impressive in Southern Belles so it might be worth checking out if you're in the US.

Plus it's important to support actors who are into Ancient Greek :-) (Laura is a classics major at Princeton)

2005/10/03 : 0 trackbacks : 2 comments (permalink)

The Power of Editing

This has already gone around the blogosphere but I shouldn't assume that people that read this blog read the same blogs as I.

Fake Trailer for 'Shining'

I think the first time I truly realised the power of editing in film was listening to the writer/director commentary on Jerry Maguire. Cameron Crowe said that the Tom Cruise - Renee Zellweger story was secondary in his script and during shooting. It was editor Joe Hutshing's rough cut that dramatically altered the story's emphasis and Crowe decided that he preferred it.

It's almost impossible to tell from the final cut of a film just how much the editor did to change the story. An editor may save a film but you'd never be able to tell this without seeing the raw footage. I once asked an editor "how do you judge good editing for, say, the Academy Awards, given that without seeing the raw footage, you really don't know just how much the editor contributed." His response: "you can't and that's the dirty secret of editing awards".

2005/10/02 : Categories filmmaking : 0 trackbacks : 2 comments (permalink)

The Circle is Not Simply Connected

In the comments to Number of Connected One-Dimensional Manifolds, I questioned why the circle (or more precisely the one-dimensional sphere S^1) was not simply connected. I wasn't trying to argue—I just didn't have the intuition myself, for some reason.

It's funny because now it's bleeding obvious to me that it isn't simply connected. A loop that goes from one point to another then back again clearly isn't homotopic to a loop that simply goes around the circle.

I think I was letting my intuition that S^n is simply connected override this fact. I was over generalising in my mind. S^n is simply connected only for n > 1. Thanks to Michael Hamm and Allan Engelhardt for setting me straight.
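For the record, the standard way to state this uses the fundamental group, where "simply connected" means path-connected with trivial fundamental group (this assumes the usual notation, which may be slightly ahead of where the series is):

```latex
\pi_1(S^1) \cong \mathbb{Z},
\qquad
\pi_1(S^n) \cong 0 \quad \text{for } n > 1
```

Each integer counts how many times a loop winds around the circle, so a loop that goes once around is in a different class from one that doubles back on itself.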

by James Saiz : 2005/10/02 : Categories poincare_project (permalink)

What Does 'Relational' Mean?

I would guess that many people think that "relational" in "relational database" has to do with relationships between entities expressed via foreign keys.

I confess that's what I thought until I started reading Chris Date.

In fact, the term "relational" is a reference to the mathematical concept of a relation that I've touched on as part of the Poincare Project.

This was somewhat obscured for me because I've typically only dealt with binary relations, but relations can be n-ary.

A relation is really just a subset of a cartesian product.

Consider the set of Employees at a company and the set of Departments. Which employees work at which departments can be expressed as a set of ordered pairs, a subset of the cartesian product Employees x Departments. In mathematical terms, this is a relation.

If we wanted to express, say, extension number as well, we could take as our relation a subset of the cartesian product Employees x Departments x Extensions. This relation is just a set of tuples.

This is the sense in which "relational" is used in "relational database".

In SQL, tables almost correspond to relations and rows to tuples. I say almost because SQL violates the relational model in allowing duplicate rows.
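Python's sets make this definition easy to play with. Here's a minimal sketch, with invented employee names, department codes and extension numbers, that builds the cartesian products and checks that a relation is just a subset of one. The final lines show the duplicate-row point: a list of rows can repeat itself, a set cannot.

```python
from itertools import product

# Made-up data for illustration.
employees = {"Lopez", "Cheng", "Finzi"}
departments = {"D1", "D2"}
extensions = {"x101", "x102", "x103"}

# Every possible (employee, department) pair.
cartesian = set(product(employees, departments))

# The binary relation: which employee works in which department.
works_in = {("Lopez", "D1"), ("Cheng", "D1"), ("Finzi", "D2")}
is_relation = works_in <= cartesian

# A ternary relation is a subset of a three-way product.
directory = {
    ("Lopez", "D1", "x101"),
    ("Cheng", "D1", "x102"),
    ("Finzi", "D2", "x103"),
}
is_ternary_relation = directory <= set(product(employees, departments, extensions))

# A table that allows duplicate rows behaves like a list, not a set.
rows = [("Lopez", "D1"), ("Lopez", "D1")]
print(len(cartesian), is_relation, is_ternary_relation, len(rows), len(set(rows)))
```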

2005/09/28 : 0 trackbacks : 0 comments (permalink)

William Bardon Takes the Lead In Category IV

William Bardon just submitted two entries in Category IV of the programming competition that beat the reigning champion, Mark Ellison.

2005/09/26 : Categories programming_competition : 0 trackbacks : 0 comments (permalink)

Microsoft blames Sun

This crash analysis message seen in a colleague's browser gave me a chuckle.

microsoft crash analysis blaming sun

2005/09/23 : 0 trackbacks : 0 comments (permalink)

Coding Weekend

This weekend is the first weekend in a long time where I have the opportunity to do some solid open source coding.

So I've decided that's what I'm going to do. I have a particular standards-based project in mind that I'm starting largely from scratch but will hopefully be usable by Monday.

I won't say much more now other than to say that (i) it's pure Python; (ii) it's relevant to Leonardo and will probably be folded into (or at least used by) Leonardo at some stage.

I'll post more as I progress.

UPDATE (2005-09-24): Spent yesterday afternoon and evening implementing a library for parsing, manipulating and generating one format. Have spent today implementing the less mature protocol that goes with it. Might not finish the second part tonight but by the end of the weekend I should have a working system that can participate in interop testing with others. Currently at 2000 lines of Python code and 88% code coverage on the unit tests.

The software will be called DEMOKRITOS so no prizes for guessing what I'm implementing :-)

UPDATE (2005-09-26): By last night I'd got to the point where I had a working Atom Store with support for generic collections. It won't take too much work to finish off support for atom entry collections. I should be able to do a release in the next couple of days for interop testing although, without authentication and authorization, it won't really be usable in the "real world" just yet.

2005/09/23 : 0 trackbacks : 5 comments (permalink)

Number of Connected One-Dimensional Manifolds

The current Wikipedia article on Manifolds says that:

I'm confused by the third statement as I would have thought that the half-open interval (0,1] and the circle are both connected one-dimensional manifolds but that neither of them is homeomorphic to either the open or closed intervals.

What am I missing?

UPDATE (2005-10-03): Looks like the Wikipedia entry is, in fact, wrong in making the third statement above.

2005/09/22 : Categories poincare_project : 0 trackbacks : 10 comments (permalink)

New Leader in Programming Competition

Mark Ellison (who has just started blogging at rip-roaring pace) submitted entries in each category of my programming competition and completely blitzed the competition. See the results!

His algorithm rivals what I've been able to achieve with simulated annealing.

2005/09/20 : Categories programming_competition : 0 trackbacks : 1 comment (permalink)

The Naming of Musical Notes, Part II

In Part I, we saw that the key signature in modern music notation supports 15 major keys, although only 12 are usable at a time if one wishes to avoid enharmonic scales. Here are the 15; the 12 that Bach used in the major key preludes and fugues of his Well-Tempered Clavier are the first 12, C# through Ab.

C# F# B E A D G C F Bb Eb Ab Db Gb Cb

Note that C#, F# and B are no more preferable than Db, Gb or Cb. A choice of 12 of the 15 will always include E, A, D, G, C, F, Bb, Eb, Ab but 8 combinations exist for choosing C# versus Db, F# versus Gb and B versus Cb. Mind you, one would probably be unlikely to choose Gb over F# if they had not also chosen Db over C#. That would mean having a 4-flat and a 6-flat but no 5-flat. So, in practice, a composer choosing 12 major keys from the 15 possible would probably choose either C#-Ab (as did Bach), F#-Db, B-Gb or E-Cb.
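The counting here can be checked with a few lines of Python. This is only a sketch: the list is the row of 15 keys above, and the variable names are mine. It enumerates the 8 ways of resolving the three enharmonic pairs and then picks out the 4 choices that leave no gap in the sequence of key signatures.

```python
from itertools import product

# The 15 major keys, from 7 sharps down to 7 flats.
KEYS = ["C#", "F#", "B", "E", "A", "D", "G", "C", "F",
        "Bb", "Eb", "Ab", "Db", "Gb", "Cb"]

# Each enharmonic pair forces a choice of one spelling.
PAIRS = [("C#", "Db"), ("F#", "Gb"), ("B", "Cb")]

# Keys that are in every choice of 12.
ALWAYS = [k for k in KEYS if all(k not in pair for pair in PAIRS)]

# One pick from each pair: 2 x 2 x 2 = 8 combinations.
all_choices = [set(ALWAYS) | set(choice) for choice in product(*PAIRS)]

# Runs of 12 adjacent keys have no missing key signature in the middle.
contiguous = [KEYS[i:i + 12] for i in range(len(KEYS) - 11)]
labels = [f"{run[0]}-{run[-1]}" for run in contiguous]

print(len(ALWAYS), len(all_choices), labels)
```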

But we are still missing some enharmonic alternatives. Each of the seven letter names can appear with a sharp or flat (or nothing) and that gives us 21 note names:

Ab A A# Bb B B# Cb C C# Db D D# Eb E E# Fb F F# Gb G G#

In particular the following are not from amongst our major key candidates:

G# D# A# E# B# Fb

If we have a look at our minor key signatures, the following are missing:

E# B# Fb Cb Gb Db

These are acceptable note names, they just can't be (major and minor, respectively) keys. Why not? Well a clue is in the fact that we've already seen the keys that have up to 7 sharps or 7 flats. Given there are 7 distinct note names in an octave, we've run out of notes we can make sharp or flat!
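One way to see where the two lists of missing names come from is to lay all 21 note names out in fifths and take a window of 15 around C for the major keys and around A for the minor keys. The sketch below does that; it assumes the usual convention that C major and A minor are the keys with no sharps or flats, and the names are my own.

```python
# All 21 note names arranged in ascending fifths: Fb Cb Gb ... E# B#.
LINE_OF_FIFTHS = [letter + accidental
                  for accidental in ("b", "", "#")
                  for letter in "FCGDAEB"]


def keys_around(centre):
    """The 15 keys from 7 flats to 7 sharps, centred on the given tonic."""
    i = LINE_OF_FIFTHS.index(centre)
    return LINE_OF_FIFTHS[i - 7:i + 8]


major_keys = keys_around("C")  # Cb ... C#
minor_keys = keys_around("A")  # Ab ... A#

not_major = [n for n in LINE_OF_FIFTHS if n not in major_keys]
not_minor = [n for n in LINE_OF_FIFTHS if n not in minor_keys]

print(not_major)
print(not_minor)
```

The names left over at either end of the line are exactly the ones that would need more than 7 sharps or 7 flats.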

C# major, for example, already sharpens all 7 letter names. What would G# do?

The C# major scale has the following notes:

C# D# E# F# G# A# B# C#

Note that even though there are alternative enharmonic spellings of these notes when considered in isolation, in the context of the C# major scale they must be spelt as above.

This is because only one note can use each letter name. Furthermore, even though the notion of a double-flat or double-sharp is available for individual chromatic notes in a piece, the diatonic notes of a scale are restricted to natural, flat or sharp.
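These two conventions are mechanical enough to write down as code. Here's a rough sketch (function and constant names are mine) that spells a major scale by walking through the letter names in order and working out which accidental each letter needs. It allows only natural, flat or sharp, so it raises an error for a tonic like G#.

```python
LETTERS = "CDEFGAB"
NATURAL_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # whole and half steps of a major scale
ACCIDENTALS = {-1: "b", 0: "", 1: "#"}


def pitch(name):
    """Semitones above C for a name like 'C#' or 'Bb'."""
    return (NATURAL_SEMITONES[name[0]] + name.count("#") - name.count("b")) % 12


def major_scale(tonic):
    """Spell a major scale using each letter name exactly once."""
    start = LETTERS.index(tonic[0])
    current = pitch(tonic)
    notes = [tonic]
    for degree, step in enumerate(MAJOR_STEPS[:-1], start=1):
        current = (current + step) % 12
        letter = LETTERS[(start + degree) % 7]
        # Signed distance from the natural letter, folded into -6..5.
        offset = (current - NATURAL_SEMITONES[letter] + 6) % 12 - 6
        if offset not in ACCIDENTALS:
            raise ValueError(f"{tonic} major needs a double accidental on {letter}")
        notes.append(letter + ACCIDENTALS[offset])
    notes.append(tonic)  # the octave
    return notes


print(" ".join(major_scale("C#")))
```

Calling major_scale("G#") fails on the seventh degree, because the letter F would have to be raised by two semitones.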

We'll explore these two conventions more in Part III.

2005/09/15 : Categories music_theory : 0 trackbacks : 4 comments (permalink)

New CEO at mValent

I don't often blog about work, but I'm delighted to now be able to announce that mValent has a new CEO, Joe Forgione. I haven't met him in person yet but I'll get a chance on Friday when I arrive at the US office.

It's exciting times at mValent with a major release just around the corner and a veteran chief executive ready to take us to the next stage of the company's growth.

2005/09/13 : Categories mvalent announcements : 0 trackbacks : 0 comments (permalink)

Approaches to Tracking Vocals

Had a final recording session with Nelson before my next trip to the US.

One issue we often wrestle with is whether to do full-song takes or per-section takes.

I think the following might work well for us:

I'll try this approach with a few more songs and see how it works out.

2005/09/11 : Categories recording_producing_and_engineering : 0 trackbacks : 0 comments (permalink)

Old Classical Piece of Mine

I've never made any of my classical music available on the web and I thought now is as good a time as any. Here's an MP3:

Divertimento for Three Clarinets — First Movement

I wrote this piece in 1988-1989 at the height of my study of and love for the music of Mozart. I was fifteen at the time and this is probably the best piece I wrote while at high school (which isn't saying much; there is a reason most composers retract their juvenilia).

The MP3 above is a realisation in Logic Pro using the Garritan Personal Orchestra samples. The piece has never actually been performed with three clarinets. It had one public performance—in Canberra at the National Science Summer School that I attended in early 1990. The performance there was with a flute, violin and another violin restrung as a viola.

2005/09/10 : Categories music_composition : 0 trackbacks : 278 comments (permalink)


Only a couple of posts in the last two weeks. I think that's the worst blogging drought I've had in a long time (maybe ever!)

Here's a quick update:

2005/09/09 : 0 trackbacks : 0 comments (permalink)

Font Fallback

When Safari encounters a character not available in the current font family, say Ὦ, it attempts to find another installed font family that has the character and uses that. It seems most, if not all, OS X applications have this property.

When Internet Explorer on Windows encounters a character not available in the current font family, it just gives up and displays a square. At first I thought this might be Windows in general, but Firefox on Windows has the same behaviour as Safari on OS X.

So it's just IE.

I'd be interested if anyone other than IE users on Windows gets a square here: Ὦ

2005/09/03 : 0 trackbacks : 5 comments (permalink)

The Poincare Conjecture

Well, after a year of looking at the background mathematics, we're finally ready to state the Poincaré Conjecture:

Every simply-connected, closed 3-manifold is homeomorphic to the 3-sphere.

This isn't exactly how Poincaré put it (for a start, he said it in French) but this is the best way to express it given the terminology we've used up until this point.

Poincaré's conjecture has to do with three-dimensional manifolds but it might be easiest to start off thinking about the two-dimensional version:

Every simply-connected, closed 2-manifold is homeomorphic to the 2-sphere.

Consider the surface of a ball and a torus. Both are closed 2-manifolds. But only one is simply-connected. A torus isn't simply-connected so it can't be homeomorphic to the 2-sphere. The surface of a ball is simply-connected and it is homeomorphic to the 2-sphere.

The big question is: is it possible to find a closed 2-manifold that is simply-connected but is not homeomorphic to the 2-sphere? In a nutshell: no, it's not. If it's a closed 2-manifold and it's simply-connected then there isn't any topological property that will distinguish it from the 2-sphere.

The Poincaré Conjecture is that this is true for 3-manifolds as well.

Of course, this is really just the beginning of our journey. Mathematicians have spent the last century trying to prove this so we still have a lot to cover.

Interestingly, it's already been proven that it's true for dimensions greater than 3 (as well as for 1 and 2 dimensions). Stephen Smale proved it in 1960 for dimensions greater than 6 and then extended his proof to cover dimensions greater than 4. In 1966, he was awarded the highest prize in mathematics, the Fields Medal, for this proof. Michael Freedman then proved in 1982 that it's true for 4 dimensions which won him the Fields Medal in 1986.

2005/08/31 : Categories poincare_project : 0 trackbacks : 4 comments (permalink)

MorphGNT 5.07 Released

I'm pleased to announce the release of a new version of MorphGNT, the morphologically parsed Greek New Testament database made available under a Creative Commons license.

See the MorphGNT page for a list of changes (47 changes in 940 places).

2005/08/31 : Categories morphgnt : 0 trackbacks : 1 comment (permalink)

The Naming of Musical Notes, Part I

How many different notes are there in an octave? What about note names? The answer to the second is very interesting and this is part one of an exploration of that question.

Bach's Well-Tempered Clavier (henceforth WTC) consists of two books each containing prelude and fugue pairs in 24 different keys. Why 24? Well, if you look at a keyboard, you'll see there are 12 notes in the octave. Allowing for both major and minor keys we therefore have 24 major+minor keys to choose from and Bach wrote a prelude and fugue in each key in each book of the WTC.

But if we look at the key signature, it tells a different story. A key signature may consist of 1-7 sharps or 1-7 flats or nothing at all. Allowing for both major and minor keys that gives us 30 different keys.

Here are the 15 major keys that the key signature gives us (with the ones Bach uses in WTC in bold):

C# F# B E A D G C F Bb Eb Ab Db Gb Cb

The corresponding relative minors (with the ones Bach uses in WTC in bold again) are:

A# D# G# C# F# B E A D G C F Bb Eb Ab

Notice that the extra keys possible but unused by Bach are enharmonic with keys that are used. Db major is enharmonic with C# major, Gb major with F# major and Cb major with B major. Similarly A# minor is enharmonic with Bb minor, D# minor with Eb minor and Ab minor with G# minor. That is not to say that Db major is the same as C# major: for one, they have different key signatures and the names of each degree of the scale are different (more on that later). They may even sound different depending on the tuning system used.

But this explains why Bach wrote in 24 major+minor keys, even though notation provided him with 30—he avoided enharmonic duplicates.
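As a rough check on the counting above, here is a small Python sketch (not part of the original post) that derives the 15 major keys and their relative minors by stepping around the circle of fifths, then groups them by the keyboard note they start on. The helper names are made up for illustration, and 12-note equal temperament is assumed when deciding which keys count as enharmonic.

```python
from collections import defaultdict

LETTERS = "FCGDAEB"  # natural notes in order of ascending fifths


def note_name(fifths):
    """Name of the note `fifths` perfect fifths above C (negative = below)."""
    letter = LETTERS[(fifths + 1) % 7]
    accidentals = (fifths + 1) // 7  # > 0 means sharps, < 0 means flats
    return letter + ("#" * accidentals if accidentals > 0 else "b" * -accidentals)


def pitch_class(fifths):
    """Keyboard note (0-11, C = 0), assuming 12-note equal temperament."""
    return (7 * fifths) % 12


# Key signatures from 7 sharps (+7) down to 7 flats (-7).
signatures = range(7, -8, -1)
major_keys = [note_name(n) for n in signatures]
# The relative minor tonic lies three fifths above the major tonic (C -> A).
minor_keys = [note_name(n + 3) for n in signatures]


def enharmonic_groups(offset):
    """Group tonics that land on the same keyboard note (offset 0 = major, 3 = minor)."""
    groups = defaultdict(list)
    for n in signatures:
        groups[pitch_class(n + offset)].append(note_name(n + offset))
    return [names for names in groups.values() if len(names) > 1]


print(" ".join(major_keys))
print(" ".join(minor_keys))
print(enharmonic_groups(0))
print(enharmonic_groups(3))
```

Each list has 15 entries but only 12 distinct keyboard notes, which is where 30 notated keys collapse to 24.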

But this isn't the whole story. Notice that:

The reason behind these two facts will be the subject of the next part.

2005/08/30 : Categories music_theory : 0 trackbacks : 4 comments (permalink)

Upcoming new MorphGNT

I'm just about to release MorphGNT 5.07 and, shortly after that, a major new release I'll designate 6.07.

I've decided not to reset the minor release number on a new major release to emphasise the fact that 5.07 and 6.07 are identical in the data they have in common; the 6-series just adds some extra data.

I haven't yet decided just how much extra data will make it in the 6-series releases, but one new addition will be a column containing the surface form / inflected form / reflex (take your pick of terminology) of each word taken in isolation.

What do I mean by "taken in isolation"? Well a word like μετά could appear in the text as μετά μεθ' μετ' or μετὰ depending on the text after it. This new column normalises that to μετά. This happens to also be the lemma so it might not be clear what the extra value is in this case. So consider the text in Matthew 1.20 which reads:

παραλαβεῖν Μαρίαν τὴν γυναῖκά σου

Note that τὴν has a grave accent and γυναῖκά has two accents. If you were to ask someone what the accusative singular feminine article is, they'd say τήν not τὴν. Similarly, if you asked someone what the accusative of γυνή is, they'd say γυναῖκα not γυναῖκά. The reason for the differing accentuation in the text is the context: final syllable acute becomes grave unless clause-final and enclitics like σου throw their accent back to the end of the previous word.

Sometimes you want to treat the variations these cause as distinct, sometimes you don't. By including the extra column, users of MorphGNT will have the best of both worlds.

Here is a list of possible differences between the existing text column and the new column:

The new column normalises all these differences.
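To make the accent part of this concrete, here is a minimal Python sketch of the two accent changes discussed above (grave back to acute, and dropping the extra accent an enclitic throws back). It is only an illustration, not the code used to build MorphGNT: the function name is hypothetical, it assumes the extra accent is always the last one in the word, and it does nothing about elision (μετ', μεθ') or any other difference.

```python
import unicodedata

GRAVE = "\u0300"       # combining grave accent
ACUTE = "\u0301"       # combining acute accent
CIRCUMFLEX = "\u0342"  # combining Greek perispomeni


def normalise_accents(word):
    """Return the accentuation the word would have in isolation (sketch only)."""
    # Decompose so that each accent is a separate combining character.
    decomposed = unicodedata.normalize("NFD", word)
    # A grave only arises from context, so turn it back into an acute.
    decomposed = decomposed.replace(GRAVE, ACUTE)
    # If there are two accents, assume the last one came from a following
    # enclitic and remove it.
    positions = [i for i, ch in enumerate(decomposed) if ch in (ACUTE, CIRCUMFLEX)]
    if len(positions) > 1:
        last = positions[-1]
        decomposed = decomposed[:last] + decomposed[last + 1:]
    return unicodedata.normalize("NFC", decomposed)


for form in ["τὴν", "γυναῖκά", "μετὰ"]:
    print(form, "->", normalise_accents(form))
```

Working on the decomposed (NFD) form keeps the logic independent of which precomposed code points the source text happens to use.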

2005/08/30 : Categories morphgnt : 0 trackbacks : 0 comments (permalink)

Books That Changed My Mind, Part I

I buy a lot of books (a colleague in the US has even claimed a correlation between the arrival of Amazon boxes at my cube and Amazon's share price). I don't necessarily read them all cover-to-cover. Someone once said to me that a single idea on a single page is enough for a book to pay for itself.

Here are three books that contained an idea that really made me change my mind about something. They come from completely different subject areas but they each gave me a real "wow" moment.

The Wealthy Barber by David Chilton — when I asked an accountant for financial advice back in 1997, he said "just read this book". The book has some great ideas but the real "wow" moment for me was the stuff about life insurance.

Geometrical Vectors by Gabriel Weinreich — when I was teaching myself differential geometry to understand General Relativity, I struggled to get an intuitive grasp of the distinction between vectors, one-forms, cross-products, etc. This book completely changed the way I think about vectors and vector calculus (I wish I'd had it in 2nd year mathematics).

The Epistles of John by Raymond Brown — I devoured the copy of this in the University Library when I was an undergraduate student. Not only did it change my mind about dealing with false Christian teaching, it also instilled in me a real fascination for reconstructing the context of the New Testament epistles.

There are other books that have had similar impact too. I'll post about them another time.

2005/08/22 : 0 trackbacks : 205 comments (permalink)

First Round of Programming Competition Results

Submissions from Didier Barbas and Tim Wegener.

I know from the scores exactly what algorithm Didier used as it's one of the five algorithms I'd applied to the Category II text.

There's plenty of room for improvement so get those entries in.

Here are the current results.

by James Saiz : 2005/08/22 : Categories programming_competition (permalink)

Planning a Programming Competition

It occurs to me that my exploration of ordered vocabulary learning might make an interesting programming competition. Like the ICFP competition I've entered before, it's well suited to any programming language because it is the results of the program that are judged, not the program itself.

I already have the scoring program written (original by me with improvements from Tim Wegener).

So, anyone that's interested, stay tuned, I'll post the details and rules in the next 36 hours. The input data will be from the Greek New Testament but no knowledge of either Greek or the New Testament is required.

There will be four categories, with greatly varying text lengths, so different algorithms will be applicable.

The competition will be ongoing, with a ladder of the top 5 in each category, rather than a single "event" over a couple of days.

Here are my previous posts on the topic:

2005/08/20 : 0 trackbacks : 0 comments (permalink)

Closed Manifolds

I said previously that we were ready to state the Poincaré Conjecture, but there's one more bit of terminology I want to get out of the way and that is closed manifold.

A closed manifold is a compact manifold without a boundary.

We previously listed the following as examples of spaces that are or are not compact:

The non-compact examples have the characteristic that you can "keep on going" and keep getting new points whereas the compact examples have the characteristic that you reach a point where there is no more, either because you've reached the edge (i.e. boundary) or because you've gone back to a point you've already been.

Saying without a boundary further restricts us to cases like the circle and not like the closed interval.

So, in other words:

NOTE: "Closed" here doesn't mean the same thing as a closed subset (i.e. one whose complement is an open set in a topology).

Here are some things to think about:

2005/08/20 : Categories poincare_project : 0 trackbacks : 5 comments (permalink)

Vocab Ordering Programming Competition

Okay, I've written up the instructions. I'm pleased to announce the start of the Vocab Ordering Programming Competition!

Bibliobloggers, Pythonistas, spread the word!

2005/08/20 : Categories programming_competition : 0 trackbacks : 0 comments (permalink)

Tintin Movie News Still Coming

Nothing new, but this from an article in the India Tribune:

Tintin’s global reach can be best gauged from the fact that Hollywood has had its eyes on Herge’s hero for several decades. Indeed, Tintin has also been the subject of several full-fledged motion pictures around the world. But none would have been bigger than the one that is currently in the works.

Steven Spielberg, no less, owns the rights to a trilogy of live-action Tintin films. Plans have been on the drawing board since the early 1980s. But according to reports from Hollywood, Spielberg has now confirmed that the Tintin trilogy, dormant for long, is indeed moving ahead. When the films do get off the ground, they will reportedly be a joint venture between Spielberg’s DreamWorks and Universal Studios.

Speculation is already rife about the Tintin stories that Spielberg will take up for screen adaptation. Although Destination Moon and Explorers on the Moon is believed to be one of the pairs of books in the running, the other adventures that run over two separate books – The Secret of the Unicorn and Red Rackham’s Treasure; The Seven Crystal Balls and Prisoners of the Sun; and The Blue Lotus and Tintin in Tibet – stand the best chance of being successfully adapted.

2005/08/19 : Categories tintin : 0 trackbacks : 2 comments (permalink)

Possessive James

Just for the record, I think the possessive of James is James's. Not James' and certainly not Jame's.

Yet, for some reason, the majority of people seem to write James' or Jame's.

The latter two would only make sense if my name were something like Jame (and, in the case of James', there were two or more of me).

Phonetically, when you say the possessive of James, [dʒæɪmzəz], there are two [z] sounds, so there's no reason why you can't write 's' twice. Each <s> corresponds to a [z].

One interesting exception, though, seems to be with a word like Jesus [dʒiːzəz] which already has [zəz] at the end. If you listen to someone casually saying "Jesus's disciples", they will often say five syllables [dʒiːzəz dɪsaɪpl̩z] and not six [dʒiːzəzəz dɪsaɪpl̩z]. But even in that case I would argue "Jesus's" is the correct spelling of the possessive.

(NOTE: My IPA is very rusty so don't trust my transcriptions)

2005/08/18 : Categories linguistic_observations : 0 trackbacks : 4 comments (permalink)

Python Slice Questions

1. why does


call a.__getitem__ with an argument:

slice(0, 2147483647, None)

instead of

slice(None, None, None)

2. where are slice lists, like:


documented? The only place I've found is the language reference but the semantics are not explained there. Does this feature exist purely for Numeric Python?

UPDATE: slice lists have existed since 1.4 it appears.
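For anyone wanting to poke at this, here is a small Python sketch that simply records what gets passed to __getitem__. It reflects current Python 3 behaviour, which is an assumption on top of the original post: it does not reproduce or explain the slice(0, 2147483647, None) seen in the 2005 interpreter (that number is the largest 32-bit signed integer). The Recorder class is a made-up helper.

```python
class Recorder:
    """Hypothetical helper: returns whatever index object it is given."""

    def __getitem__(self, index):
        return index


a = Recorder()

# A plain slice arrives as a single slice object.
print(a[:])
print(a[1:10:2])

# A "slice list" (comma-separated items) arrives as a tuple, one element
# per item; each item may be a slice, an Ellipsis or an ordinary value.
print(a[1:2, 3:4])
print(a[..., ::2])
```

The language itself gives the tuple no meaning; it is up to the object's __getitem__ to interpret it, which is why the feature is mostly seen in array libraries.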

2005/08/16 : Categories python : 0 trackbacks : 5 comments (permalink)

Happy Anniversary Rick

Happy 1st Blog-iversary to Rick Brannan, whose blog is one of my favourites. He's a Christian, a text geek, and he reads Marginal Revolution—you can't beat that :-)

by James Saiz : 2005/08/13 (permalink)

Old XML Post and a Joke

One of my favourite old posts to xml-dev was one I made back in June 1998.

I talked about the distinction between syntax and semantics in markup languages but also the need to distinguish XML the language from XML the metalanguage.

But the real reason it's one of my favourites is that it was during its composition that a play on the Magritte painting The Betrayal of Images popped into my head.

In the post to xml-dev, I used the English version but I later used it as a .sig with the French wording:

<pipe>Ceci n'est pas une pipe</pipe>

2005/08/10 : Categories xml : 0 trackbacks : 1 comment (permalink)

Logic Pro

Logic Pro arrived yesterday, along with...you guessed it...dongle number three!

I've purchased three music software products in the last month and all three have required dongles. Each seems to use a different system so that's three distinct dongles in three distinct USB ports.

I haven't had much of a chance to play with Logic Pro yet but I did get it installed and working with my Digi 002. The latter turned out to require a little trick.

Digi 002 is the digital audio workstation I use with ProTools LE. It has some nice pre-amps, A/D and D/A converters, MIDI interfaces and a control surface, all connected to the computer via FireWire. It's designed for either use with ProTools LE software or standalone as a mixer.

However, it can act as a plain audio interface and MIDI interface for CoreAudio on Mac OS X.

I was hopeful that this would mean I could use it with Logic Pro—not the control surface, but at least the MIDI interface, audio inputs, pres and outputs.

Things looked promising when I ran the setup assistant for Logic Pro as it found both the audio and MIDI interfaces on the 002 via CoreAudio and I was able to select them as what I wanted to use in Logic Pro.

However, I had a brief period of disappointment when, on start up of Logic Pro, CoreAudio would kick off the Digi 002 as an available interface.

Quick bit of Googling and I found the solution: The Digi CoreAudio Manager has to be manually started before Logic Pro.

So now I've listened to the Logic Pro demo song on my reference monitors hooked up to the Digi 002.


2005/08/09 : Categories record_producing_and_engineering : 0 trackbacks : 3 comments (permalink)

More Tintin Casting Rumour Denials

Seeing as I haven't got marlinspike.org back up and running yet, I'll continue to blog on Tintin movie news here.

Jamie Bell rubbishes Tintin movie rumors:

Award-winning teen star Jamie Bell has slammed reports he is set to star in a forthcoming screen version of Tintin.

Director Stephen Daldry is planning to remake the classic French adventure stories as a cinematic extravaganza, and although Bell was tipped to play the quiff-sporting hero, the young actor insists the claims are unfounded.

He says, "Who makes these rumours up? I mean, really.

"It would be fantastic, but it's news to me."

It's an odd story for a few reasons (besides mistaking Tintin for French). Firstly, the Jamie-Bell-as-Tintin rumour is a couple of years old but this story is datelined 7th August 2005.

More odd, though, is the claim that Stephen Daldry is going to direct. It is Steven Spielberg who owns the rights, and while he had said he may only produce and not direct, I've never heard Daldry's name in association with the Spielberg project.

Perhaps the journalist got confused by the fact that it was Daldry who directed Jamie Bell in Billy Elliot.

(As an aside, Daldry is currently directing The Amazing Adventures of Kavalier & Clay which, from all accounts of the book, should be worth seeing)

2005/08/07 : Categories tintin : 0 trackbacks : 32 comments (permalink)

Homotopy Classes and Simple Connectedness

A month ago, we saw that path homotopy provides a way of distinguishing certain topological spaces. We're now in a position to make that a little more precise.

Until now we've not required that our paths end at the same point they start but it's useful at this point to restrict ourselves to such closed paths.

Path homotopy is an equivalence relation which means that we can partition all the closed paths starting and ending at a particular point in a manifold into equivalence classes. Such equivalence classes are called the homotopy classes for that point.

Let's consider again the two manifolds we looked at before that differ only in that the one on the right has a hole in it.

homotopy classes on two different manifolds

If we pick a point in each manifold, and consider the homotopy classes, we notice something very important. The point in the manifold on the left has a single homotopy class. All the closed paths shown in black are homotopic. In contrast, the point in the manifold on the right has two homotopy classes. The closed paths in red are homotopic to one another and the closed paths in blue are homotopic to one another but the red paths are not homotopic to the blue paths.

If a topological space is connected and has a single homotopy class for each point, it is said to be simply connected. The manifold on the left is simply connected. The manifold on the right is not.

Another way of thinking about it is that any closed path in a simply connected manifold can be continuously shrunk down to a single point. The paths in red can be continuously shrunk down to a point. The paths in blue cannot. They get "stuck" around the hole.

Simple connectedness is a topological property that distinguishes spaces that may otherwise be topologically equivalent.

We are now ready to state Poincaré's famous conjecture.

2005/08/06 : Categories poincare_project : 0 trackbacks : 1 comment (permalink)

Going With Logic Pro

I previously mentioned my quandary regarding a more composition-focused tool to use alongside ProTools.

Well, I've decided to go with Logic Pro. I'm tentatively thinking that I'll compose, arrange and do all the programming and synth tracking in Logic Pro then switch to Pro Tools for vocals, mixing and mastering.

2005/08/06 : Categories record_producing_and_engineering : 0 trackbacks : 0 comments (permalink)

PayPal Woes

PayPal customer service is driving me up the wall.

I used PayPal a bit when I was living in the US. After I moved back to Australia, it continued to work because I still had a US credit card. When that expired, I entered my new Australian credit card details.

Because I was registered with PayPal as living in the US but my credit card billing address was in Australia, they wouldn't let me use that card and put a block on my account because it had no verified credit card on file.

At the time, I just let it go and used alternatives.

But last week I wanted to use PayPal again: "Limited Account Access" "The credit card you tried to add to your account requires additional verification."

But under "How can I restore my account access?" It simply says "This limitation cannot be appealed."

Because the issue ultimately was the change of country, I looked under help for how to indicate a move. Simple: you just close your account in one country and open it under another. Problem: you can't close your account if you have limited account access.

So I write to PayPal explaining the situation. They send back a form letter saying "To return your account to regular standing, please complete the checklist items". THERE ARE NO CHECKLIST ITEMS.

So I write again. Same form letter comes back.

I write a third letter, explaining everything in detail. SAME FORM LETTER.


Hopefully they'll read that!

Oh, and I tried just going and creating a new Australian account with a different email address but they won't let me BECAUSE THE CREDIT CARD IS ALREADY IN USE.

2005/08/05 : 0 trackbacks : 9 comments (permalink)

Return of the Dongle

I just ordered Steinberg's Groove Agent 2 and all Steinberg products now seem to require a USB dongle, just like the plugins I ordered recently for ProTools.

They use different dongles, though, so I guess I'll be using up an extra 2 USB ports whenever I'm producing music.

2005/08/05 : 0 trackbacks : 0 comments (permalink)

Missing OSCON

When I first found out that OSCON would coincide with my return to Australia, I did seriously consider stopping over in Portland on my way home for a week.

I spoke at OSCON in 2001 and it was probably my favourite conference of the 30 or so I spoke at in 2000-2001.

So it was a tough decision but in the end I decided I just wanted to get home as soon as possible.

Now that I'm home and hearing good stuff about OSCON, I kinda wish I was there :-)

2005/08/04 : Categories conferences : 0 trackbacks : 276 comments (permalink)

Ordering Goals Rather Than Prerequisites

The outcome of my simulated annealing program is a list of prerequisites to learn along with an indication, every so often, of what new goal has been reached. Running on the Greek lexemes of 1John, you might get something starting like this:

learn μαρτυρέω
learn θεός
learn ἐν
learn εἰμί
learn ὁ
learn τρεῖς
learn ὅτι
know 230507

This gives seven prerequisites to learn and then a goal that has been reached (230507 = 1John 5.7). The problem is that two of those words are unnecessary. You only need to learn μαρτυρέω, εἰμί, ὁ, τρεῖς and ὅτι to be able to read 1John 5.7.

The problem is that the program is ordering prerequisites first and only then establishing at each point what goals (if any) have been achieved.
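The effect is easy to reproduce with a toy version of the reporting step. This is a sketch, not the simulated annealing script itself; the function name and data layout are assumptions, and only the word list and the goal 230507 come from the listing above.

```python
def report_progress(learn_order, goals):
    """Walk a fixed prerequisite order and note when each goal becomes reachable.

    goals maps a goal id to the set of prerequisites it needs.
    """
    known = set()
    reached = set()
    lines = []
    for item in learn_order:
        known.add(item)
        lines.append(f"learn {item}")
        for goal, prerequisites in goals.items():
            if goal not in reached and prerequisites <= known:
                reached.add(goal)
                lines.append(f"know {goal}")
    return lines


learn_order = ["μαρτυρέω", "θεός", "ἐν", "εἰμί", "ὁ", "τρεῖς", "ὅτι"]
goals = {"230507": {"μαρτυρέω", "εἰμί", "ὁ", "τρεῖς", "ὅτι"}}

print("\n".join(report_progress(learn_order, goals)))

# Words learnt before the goal that the goal never needed.
unnecessary = set(learn_order) - goals["230507"]
print(sorted(unnecessary))
```

Because the order of prerequisites is decided without reference to any particular goal, nothing stops words like θεός and ἐν from being scheduled before the first goal is reached.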

I can see two solutions:

The second is probably considerably more work but ultimately preferable.

UPDATE: I'm almost embarrassed to report that not only was changing over to ordering goals not as hard to do as I thought, but the particular way I did it performs 200 times faster than my previous prerequisite ordering script. New script is at http://jamessaiz.en.wanadoo.es/2005/08/sa_goal_ordering.py
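For illustration only, here is roughly what "ordering goals" can look like once a goal order has been chosen: the prerequisites fall out of the goal order, so nothing is learnt before it is needed. This is a sketch under my own assumptions, not the contents of sa_goal_ordering.py; the function name is made up, the second goal and its word list are hypothetical, and the search over goal orders (the simulated annealing part) is left out entirely.

```python
def schedule_from_goal_order(goal_order, goals):
    """Emit only the not-yet-known prerequisites of each goal, in goal order.

    goals maps a goal id to a list of prerequisites.
    """
    known = set()
    lines = []
    for goal in goal_order:
        for item in goals[goal]:
            if item not in known:
                known.add(item)
                lines.append(f"learn {item}")
        lines.append(f"know {goal}")
    return lines


goals = {
    "230507": ["ὅτι", "τρεῖς", "εἰμί", "ὁ", "μαρτυρέω"],
    "hypothetical-goal": ["ὁ", "θεός", "εἰμί"],  # made up for the example
}

print("\n".join(schedule_from_goal_order(["230507", "hypothetical-goal"], goals)))
```

With this shape, evaluating a candidate order is a single pass over the goals, and every "learn" line is guaranteed to contribute to the goal that follows it.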

2005/08/04 : Categories python : 0 trackbacks : 6 comments (permalink)

Paul Graham Has Done It Again

I've commented before that Paul Graham and I share a lot of the same views, he just expresses them much better than I do.

Well, he's done it again with his latest What Business Can Learn from Open Source.

Just last week I was trying to explain in a comment on mnot's blog that:

large companies have far more in common with centrally planned socialism than free market capitalism

Well, Paul Graham basically says the same thing and he ties it in beautifully with blogging and writing open source software:

Ironically, though open source and blogs are done for free, those worlds resemble market economies, while most companies, for all their talk about the value of free markets, are run internally like communist states.

Whereas I stumbled to say:

People need to see themselves as individuals in the market rather than employees of corporations in the market.

Paul Graham says:

Nothing shows more clearly that employment is not an ordinary economic relationship than companies being sued for firing people. In any purely economic relationship you're free to do what you want. If you want to stop buying steel pipe from one supplier and start buying it from another, you don't have to explain why. No one can accuse you of unjustly switching pipe suppliers. Justice implies some kind of paternal obligation that isn't there in transactions between equals.

Most of the legal restrictions on employers are intended to protect employees. But you can't have action without an equal and opposite reaction. You can't expect employers to have some kind of paternal responsibility toward employees without putting employees in the position of children. And that seems a bad road to go down.

My sentiments exactly.

Read the whole thing.

2005/08/04 : 0 trackbacks : 0 comments (permalink)

DataLibre DOAP

Almost exactly a year ago, I asked:

how can I use my own website as the authoritative source of my own FOAF and DOAP information while at the same time that information being available in directories for searching, rating, etc.

Well, it looks like O'Reilly's CodeZoo supports the DOAP part of this (discovered via Edd Dumbill)

I'm still no closer to a DOAP-Atom plugin for Leonardo. Any volunteers?

2005/08/04 : Categories datalibre : 0 trackbacks : 0 comments (permalink)

Equivalence Classes

If an equivalence relation is defined on a set, then we can classify each element of that set using the relation, by putting all elements that are equivalent (according to the relation) in the same class and elements that are not equivalent (according to the relation) in different classes.

The properties of equivalence relations ensure that a given element will be in exactly one class. Therefore, an equivalence relation can be used to partition a set into disjoint subsets. These subsets are called equivalence classes.

For example, say our set is all the people in the world and our equivalence relation is "share the same birthday". Then this partitions the set into 366 equivalence classes. I would be in the equivalence class with all the other people born on 19th November.
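The birthday example translates directly into a few lines of Python. This is just a sketch: the function name is hypothetical, every person other than me is invented, and it only covers equivalence relations of the form "has the same value of some key", which is the case for birthdays.

```python
from collections import defaultdict


def equivalence_classes(items, key):
    """Partition items into classes of elements that share the same key."""
    classes = defaultdict(list)
    for item in items:
        classes[key(item)].append(item)
    return list(classes.values())


# (name, (month, day)); everyone except James is made up for the example.
people = [
    ("James", (11, 19)),
    ("Alice", (2, 29)),
    ("Bob", (11, 19)),
    ("Carol", (7, 4)),
]

for cls in equivalence_classes(people, key=lambda person: person[1]):
    print([name for name, _ in cls])
```

Every person lands in exactly one list, and no list is empty, which is all that "partition into disjoint subsets" means.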

2005/08/03 : Categories poincare_project : 0 trackbacks : 4 comments (permalink)

Using Simulated Annealing to Order Goal Prerequisites

Back in November, I wrote about programmed vocabulary learning as a travelling salesman problem.

I'm pleased to say I've finally cleaned up my Python code and made an initial version available at:


UPDATE (2005-08-04): You probably don't want to use the above script. See Ordering Goals Rather Than Prerequisites for why, along with a much improved script.

by James Saiz : 2005/08/03 : Categories python (permalink)

Upgraded ProTools

I've upgraded Gideon, my recording studio's PowerMac, to ProTools LE 6.9 from 6.4 via 6.7 (DigiDesign skips version numbers in their public releases).

I haven't upgraded Gideon to Tiger yet, although ProTools LE 6.9.2 does support 10.4.1.

At the same time as ordering ProTools LE 6.9, I ordered a bunch of plugins but I discovered when I tried to install them that some of them require an iLok USB dongle. I've never had to use a dongle before—they seem so...old fashioned :-) Anyway, iLok is on its way.

What I really am missing is a decent tool for composition. ProTools is very much a tracking and mixing tool—still weak for composing / arranging.

Most products that are stronger on composition, MIDI, etc are increasingly focused on audio processing. The overlap is completely wasted on me. When I look at something like Digital Performer or Logic Pro, they seem to be pushing a bunch of stuff I already have in ProTools.

I get the impression that professional composers and producers just live with the redundancy and use overlapping tools.

2005/08/03 : Categories record_producing_and_engineering : 0 trackbacks : 0 comments (permalink)


FOP was the first big open source project I started. Six years ago, I donated it to Apache and shortly after that stopped working on it myself (I didn't have time once I started working at Bowstreet).

It was a lot of fun and I learnt a lot about software craftsmanship.

I still get an email every few weeks asking me a question about using FOP. Not entirely sure why my email address would be easier to find than the mailing list at Apache.

2005/08/02 : Categories fop software_craftsmanship : 0 trackbacks : 121 comments (permalink)

Sorting in Python with Identical Comparison Keys



The consistent python test.py versus ./test.py alternation is still odd, though.

UPDATE: I wonder if the problem is lack of a good hash on the Goal class. If it's using a memory location then that might explain the python test.py versus ./test.py alternation.

UPDATE 2: Yep, looks like it's fixed with a decent __hash__ implementation. Good to know I can still make rookie mistakes :-)
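The Goal class isn't shown here, so the following is a hypothetical reconstruction of the kind of fix described: a hash (and matching equality) based on the goal's contents rather than on the object's identity, plus an explicit tie-breaker when sorting. The attribute names are assumptions. Note also that in current Python 3 string hashes are salted per process unless PYTHONHASHSEED is set, so set iteration order shouldn't be relied on either way; the tie-breaker is what makes the sort reproducible.

```python
class Goal:
    """Hypothetical goal: a reference plus the prerequisites needed to reach it."""

    def __init__(self, ref, prerequisites):
        self.ref = ref
        self.prerequisites = frozenset(prerequisites)

    def __eq__(self, other):
        if not isinstance(other, Goal):
            return NotImplemented
        return (self.ref, self.prerequisites) == (other.ref, other.prerequisites)

    def __hash__(self):
        # Based on contents, so equal goals hash alike in every run of a
        # given interpreter session, unlike an identity-based default.
        return hash((self.ref, self.prerequisites))

    def __repr__(self):
        return f"Goal({self.ref!r})"


a = Goal("230507", ["x", "y"])
b = Goal("230507", ["y", "x"])
c = Goal("230508", ["x", "z"])

# Equal goals collapse to one entry in a set.
unique = {a, b, c}

# Both goals have two prerequisites, so the comparison key ties; adding
# the reference as a secondary key makes the result independent of the
# order in which the set happens to yield them.
ordered = sorted(unique, key=lambda g: (len(g.prerequisites), g.ref))
print(ordered)
```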

2005/08/02 : Categories python : 0 trackbacks : 1 comment (permalink)

O'Reilly Connection

I just discovered and signed up for O'Reilly Connection. If you sign up and I know you, feel free to connect.

Oh, and while I think of it, if you're on LinkedIn, feel free to connect too.

And no, I'm not looking for a job :-)

by James Saiz : 2005/08/02 (permalink)


Finally home in Perth after almost six months away.

Flights were uneventful other than losing my luggage somewhere between Boston and Brisbane. Still no word whether they've found it.

I have a tremendous amount to catch up on, not sure where to start.

Stay tuned. I'll be back to blogging lots now.

UPDATE (2005-08-02): Luggage has been found and returned.

2005/07/31 : Categories personal travel : 0 trackbacks : 2 comments (permalink)

I'm Proud

I'm proud of Elliot Cohen, who I'm mentoring in Google's Summer of Code. He's writing unit tests before implementing code.

Be careful Elliot; it's addictive :-)

2005/07/27 : Categories software_craftsmanship summer_of_code : 0 trackbacks : 1 comment (permalink)

Last Days

These are the last few days of my marathon trip to the US. On Friday, I'll be heading home after being away for almost six months.

Things are very busy at work (in a good way) and that, combined with preparations for leaving (like working out how to ship all the stuff I've accumulated here, in Austin, Palm Beach and Europe) means that I probably won't have time for much else until I get home on Sunday.

Apologies for the paucity of posts of late. I'll be back in full force next week :-)

2005/07/27 : 0 trackbacks : 0 comments (permalink)

Leonardo and Atom 1.0

Dave Warnock has been working on Atom 1.0 support in Leonardo. We decided it would be a good opportunity to start a better separation of the Leonardo core from individual plugins, so he is working on the plugin itself and I'll work on updates to the core, which I'll release as 0.7.

2005/07/23 : Categories leonardo python : 0 trackbacks : 2 comments (permalink)

Kaju Katli

My favourite Indian sweet is Kaju Katli.

Kaju means cashew. Are the two words cognate? Or is one a loan word (and in which direction?)

UPDATE (2005-07-22): Merriam-Webster claims cashew is from the Portuguese acajú. Platt's Dictionary of Urdu, Classical Hindi and English says that काजू is probably from...you guessed it—the Portuguese word acajú. But even the Portuguese acajú is just a loan word from the Tupi acajú.

2005/07/21 : Categories linguistic_observations : 0 trackbacks : 3 comments (permalink)

Indexing Time

Dave Warnock and I have been talking about indexing entries in Leonardo by last updated time.

We want to be able to retrieve the entries between A and B, or the n entries after A or the n entries before B where A and B can be either ordinals or times.

I'm guessing the right way of doing it would be some sort of balanced tree.

The nature of the data is that insertions will almost entirely be at the end, retrieval will largely be at the end and deletions will probably be fairly distributed.

Just as a preliminary, I've written an unbalanced tree, although I haven't finished implementing the kinds of queries we want to be able to do on the tree.

Any suggestions on algorithms and/or implementations in Python?

Most implementations don't seem to come out of the box with the kind of "slice" queries we want to do (or even both key- and ordinal-based queries).
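Since insertions will almost entirely be at the end, one alternative to a balanced tree is a plain sorted list searched with the standard bisect module. The following is only a sketch under that assumption, not what Leonardo does; the TimeIndex class and its method names are made up for illustration. Appending at the end is cheap, but an out-of-order insertion or a deletion costs O(n).

```python
import bisect


class TimeIndex:
    """Entries kept sorted by last-updated time (hypothetical helper)."""

    def __init__(self):
        self._times = []    # sorted list of timestamps
        self._entries = []  # entry ids, parallel to self._times

    def add(self, time, entry):
        # bisect_right keeps equal timestamps in insertion order; when the
        # new time is the latest, this amounts to an append at the end.
        i = bisect.bisect_right(self._times, time)
        self._times.insert(i, time)
        self._entries.insert(i, entry)

    def remove(self, time, entry):
        # Scan the run of equal timestamps for the matching entry.
        i = bisect.bisect_left(self._times, time)
        while i < len(self._times) and self._times[i] == time:
            if self._entries[i] == entry:
                del self._times[i]
                del self._entries[i]
                return
            i += 1
        raise KeyError((time, entry))

    def ordinal(self, time):
        """Position of the first entry with timestamp >= time."""
        return bisect.bisect_left(self._times, time)

    def between(self, a, b):
        """Entries with a <= time < b."""
        return self._entries[self.ordinal(a):self.ordinal(b)]

    def after(self, a, n):
        """The first n entries with time >= a."""
        start = self.ordinal(a)
        return self._entries[start:start + n]

    def before(self, b, n):
        """The last n entries with time < b."""
        end = self.ordinal(b)
        return self._entries[max(0, end - n):end]

    def by_ordinal(self, start, stop):
        """Ordinal-based slice, using ordinary list slicing."""
        return self._entries[start:stop]
```

Because ordinal() converts a time to a position, time-based and ordinal-based queries both end up as list slices, which is the "slice" behaviour most tree implementations don't offer out of the box.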

2005/07/19 : Categories leonardo python algorithms : 0 trackbacks : 11 comments (permalink)

Problems Setting Up Firewall on OS X to Accept Mail

I'm really struggling with setting up my remote Mac Mini to receive mail. The problems seem to be in the configuration of ipfw.

Even with:

allow tcp from any to any dst-port 25 in

ipfw is logging a Stealth Mode connection attempt when I attempt to send mail to it.

Any ideas?

UPDATE (2005-07-18): It wasn't the firewall, it was Postfix. Apple's default main.cf file sets inet_interfaces twice and I had made changes to the first instance (which was then overridden by the second).

2005/07/17 : Categories os_x : 0 trackbacks : 1 comment (permalink)

Equivalence Relations

To formalize path homotopy as a way of distinguishing certain topological spaces, we need to introduce the notion of an equivalence relation and an equivalence class. We'll introduce the former here.

Consider a set A of objects. We pick certain pairs of elements in A and say they have a particular relation to one another. In other words, a relation R on A (or more accurately, a binary relation on A) is simply a choice of pairs—a subset of A x A. If <a, b> is in R then we say that a has the relation R with b. We can also write this aRb.

If A is a set of people, R might be something like "is the father of". And so if d is Darth Vader and l is Luke Skywalker, then dRl.

A relation is said to be an equivalence relation iff it is a relation with the following properties:

reflexive: aRa for every a in A
symmetric: if aRb then bRa
transitive: if aRb and bRc then aRc

Our "is the father of" relation violates all three and so it is certainly not an equivalence relation.

Something like "is less than" on the set of reals is transitive but not reflexive or symmetric and so is not an equivalence relation.

Something like "is less than or equal to" on the set of reals is transitive and reflexive but still not symmetric and so is not an equivalence relation.

Equality is an equivalence relation as it has all three necessary properties. Two topological spaces being homeomorphic is also an equivalence relation.

Importantly for us, path homotopy is an equivalence relation.
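For a finite set, the three properties can be checked mechanically. Here is a small sketch in Python that treats a relation as a set of pairs, just as in the definition above; the function names are my own choice for illustration.

```python
def is_reflexive(A, R):
    # Every element must be related to itself.
    return all((a, a) in R for a in A)


def is_symmetric(R):
    # Every pair must also appear reversed.
    return all((b, a) in R for (a, b) in R)


def is_transitive(R):
    # Whenever aRb and bRc, we need aRc.
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)


def is_equivalence(A, R):
    return is_reflexive(A, R) and is_symmetric(R) and is_transitive(R)


A = {1, 2, 3}
less_than = {(a, b) for a in A for b in A if a < b}
less_or_equal = {(a, b) for a in A for b in A if a <= b}
equal = {(a, a) for a in A}
same_parity = {(a, b) for a in A for b in A if a % 2 == b % 2}

print(is_equivalence(A, less_than))      # transitive only
print(is_equivalence(A, less_or_equal))  # reflexive and transitive, not symmetric
print(is_equivalence(A, equal))
print(is_equivalence(A, same_parity))
```

Of course this only works on a finite set; for the reals the properties have to be proved rather than enumerated.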

2005/07/16 : Categories poincare_project : 0 trackbacks : 6 comments (permalink)

Parts of Speech and Number of Accents

I thought I'd write a quick Python script to check how many accents were on each of the lemmata in MorphGNT 5.06.

Here are the counts by part of speech and number of accents on lemma:

POS        0        1        2
A          -     9159        -
C        924    17361        -
D       1592     4606        -
I          -       17        -
N         30    28271        1
P       5433     5488        -
RA     19862        4        -
RD         -     1744        -
RI         -     1165        -
RP         -    11584        -
RR         -     1677        -
V          8    28101        1
X        147      844        -

Some of the low numbers are definitely errors in the database. Now to investigate...
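For the curious, here is roughly how such a count can be done. This is a sketch rather than the script I actually ran: it assumes the lemmata are available as Unicode strings and leaves out reading the MorphGNT file altogether. The names count_accents and tally are just for illustration.

```python
import unicodedata
from collections import Counter

# Combining marks for the three Greek accents after NFD normalization.
ACCENTS = {
    "\u0301",  # acute
    "\u0300",  # grave
    "\u0342",  # circumflex (perispomeni)
}


def count_accents(word):
    # NFD splits each precomposed letter into a base letter plus combining
    # marks, so breathings and accents can be told apart.
    decomposed = unicodedata.normalize("NFD", word)
    return sum(1 for ch in decomposed if ch in ACCENTS)


def tally(rows):
    """rows: iterable of (part_of_speech, lemma) pairs."""
    counts = Counter()
    for pos, lemma in rows:
        counts[pos, count_accents(lemma)] += 1
    return counts
```

Breathing marks decompose to different combining characters and so are not counted, which is why the article comes out with zero accents.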

UPDATE (2005-07-16): both 2-accent cases were mistakes. The 30 0-accent nouns and 5 of the 0-accent verbs were foreign loan words that intentionally weren't accented but 3 of the 0-accent verbs were mistakes. The 4 accented articles were the result of crasis with the following noun and the word should probably be analyzed as a noun rather than an article. I guess there'll be a 5.07 release soon. NOTE: I haven't looked at the particles, adverbs, conjunctions or prepositions yet.

by James Saiz : 2005/07/16 : Categories morphgnt (permalink)

MorphGNT 5.06 Released

Well, it's been about a hundred hours work over the last six months, but I'm pleased to announce the release of a new version of MorphGNT, the morphologically parsed Greek New Testament database made available under a Creative Commons license.

Besides some corrections to the text (mostly rho-breathing) and a couple of parsing code changes, this release has a huge number of corrections to the lemmata—160 lemma changes in 465 places. See this blog entry for how potential errors for this round of corrections were discovered.

You can download the new file at:

2005/07/16 : Categories morphgnt announcements : 0 trackbacks : 1 comment (permalink)

Thank You Fred Smith

At 5pm yesterday, I took my Mac Mini to the local FedEx store in Burlington, Mass. By 10.30am this morning it will be at macminicolo.net in Nevada ready to be installed in their data center. There is a good chance the machine will be online by 5pm today.

FedEx really is an amazing thing.

UPDATE (2005-07-15): Yep. It's online (and I've corrected the link to macminicolo.net — thanks to Joe Weaks)

by James Saiz : 2005/07/15 : 0 trackbacks : 2 comments (permalink)

Headless Tiger

I've created a page to put my ongoing notes on running non-Server Mac OS X 10.4 (Tiger) remotely via ssh.

See Headless Tiger.

I've opened up comments on that page so you can add tips.

2005/07/14 : Categories os_x (permalink)

Updating OS X From Command Line

As I get ready to send my Mac Mini off to the data center, I've been seeing just how much I can do via ssh.

So far so good.

As 10.4.2 just came out, the big question was whether one can do a software update from the command line.

Sure enough, it's possible:

sudo softwareupdate -l

will list the available updates.

sudo softwareupdate -i -r

will install recommended updates.

See

man softwareupdate

for more information.

UPDATE (2005-07-14): Now see Headless Tiger.

2005/07/13 : Categories os_x : 0 trackbacks : 0 comments (permalink)

Summer of Code Blogs

Elliot Cohen, whose Summer of Code project I am mentoring, has started a project blog at http://elliotpbnt.blogspot.com/

I also noticed that http://planet.python.org/ has started aggregating a bunch of SoC project blogs (including Elliot's)

2005/07/12 : Categories python summer_of_code software_craftsmanship : 0 trackbacks : 2 comments (permalink)

Isometric Games in Python

A couple of months ago, I started investigating free libraries for developing isometric games in Python.

I found the pygame-based project Pyplace but there hadn't been a release since 2001.

So I decided to start my own, which I've called pyso.

As a starting point, in particular because I have no experience with either pygame or writing isometric games, I've just cleaned up Pyplace (which was, how shall I say this politely, quite idiosyncratic in parts).

You can get my initial effort at:

It currently is really just the last Pyplace release taken apart, cleaned up a little and put back together again.

The next release will likely be quite different and more my own work.

2005/07/10 : Categories pyso python announcements : 0 trackbacks : 19 comments (permalink)

Switching to Dedicated Hosting

Thanks to Mac Mini colocation services like macminicolo, it's now cost effective for me to switch some (and perhaps eventually all) of my web sites to dedicated hosting.

So I got an account with macminicolo and ordered my Mac Mini which arrived yesterday.

One thing that I need to experiment with before I send it off to the datacenter is whether I'll be able to remotely manage it effectively as-is or whether I'll need to buy something like Apple Remote Desktop.

2005/07/09 : 0 trackbacks : 45 comments (permalink)

Leonardo 0.6.2 Released

I am pleased to announce the release of Leonardo 0.6.2.

Leonardo is the Python-based content management system that runs this site and provides blogging and wiki-style content.

This is a major bug fix release which:

You can download it at:


2005/07/09 : Categories leonardo python announcements : 0 trackbacks : 1 comment (permalink)

Homotopy as a Way of Distinguishing Topological Spaces

Path homotopy can be used to distinguish topological spaces that otherwise share the same topological properties.

For example, consider two topological spaces that locally resemble R^2 but globally look like the following:

two spaces, the right one with a hole in it

In other words, both are compact manifolds and the one on the right differs from the one on the left in that it has a "hole" in the middle.

Are the two homeomorphic? Our intuition tells us not because of the hole in the one on the right. But if they are not homeomorphic, there must be a topological property that one has that the other does not.

We'll get to what that property is formally later, but for now, I want to show informally that homotopy is the key.

two paths on the two spaces introduced above

Look at the two paths, f and g, on each of the topological spaces. In the space on the left, they are path homotopic whereas on the right, they are not. In other words, the existence of the hole means that not all paths with the same start and end are path homotopic to one another.

There's no way you can continuously transform f to g when there is a hole between them.

We'll explore this idea a little more formally over the next couple of weeks and then we'll finally be able to state the Poincaré Conjecture.

UPDATE (2005-07-10): I just changed "homotopic" to "path homotopic" twice in the third last paragraph. It's important that we're talking about path homotopy not just homotopy here as we require the start and end of the path to remain fixed during the transformation from f to g.

2005/07/09 : Categories poincare_project : 0 trackbacks : 2 comments (permalink)

Dr Seuss's Oscars

I was looking at Amazon's List of Bestselling Authors and noticed the claim that Dr Seuss (which rhymes with "voice", by the way) won three Academy Awards.

However, a look at his award page on IMDb doesn't list any.

Turns out that two films he co-wrote won Oscars (one for animated short and one for feature-length documentary) but, of course, those Oscars go to the producer(s).

Haven't found the third yet.

2005/07/09 : 0 trackbacks : 0 comments (permalink)

Simulating Mechanical Clock Movement

When we were in Switzerland we spent a bit of time in stores and I spent most of that time studying pendulum clocks whose movement was exposed.

I was delighted to discover an almost identical approach in every single case: a gear train with weights causing torque on the slowest moving gear and a pendulum connected to a piece (a type of what I later found is called an escapement) that regulates the motion of the fastest moving gear. (Wikipedia has a nice diagram showing an escapement in action).

Of course, the devil is in the detail, but the pattern was enough to get me excited about getting deeper into horology.

I've been thinking since about simulating the movement in software. I wonder how easy it would be to build something in ODE, the Open Dynamics Engine, which I know has a Python binding.

by James Saiz : 2005/07/07 : Categories horology python : 0 trackbacks : 1 comment (permalink)

Interesting Observations Come With Ambiguity

In an email to the Leonardo mailing list, I almost said:

If I use Kid, I'll ship Leonardo with it.

but then was worried that would be interpreted the wrong way around. So I considered saying:

If I use Kid, I'll ship it with Leonardo.

but was still worried that it would be interpreted the wrong way around.

A similar incident happened a few weeks ago when I was talking to my colleague James Marcus about whether he had the right A to use with B. I said:

I'm sure A comes with B.

and he looked confused. I realised he thought I was suggesting that A includes B (rather than the other way around).

Sentences of the form:

A comes with B

are strange in that the relationship between A and B is clearly not symmetrical and yet, for me at least, A and B are often syntactically interchangeable.

Even if I clearly intend to express that A includes B, either of the following in most cases conveys that to me:

A comes with B

B comes with A

I wonder if there are other phrasal verbs in English that have clearly distinct grammatical roles but ambiguous syntactic position.

2005/07/07 : Categories linguistic_observations : 0 trackbacks : 0 comments (permalink)

My Sister and Holly Lisle Meet in the Blogosphere

My sister, Jenni, is a 19-year-old aspiring fantasy writer. Her favourite author for many years has been Holly Lisle.

Well, Jenni just discovered that her blog is listed on Holly Lisle's blogroll.

Isn't the blogosphere great!

2005/07/06 : 0 trackbacks : 388 comments (permalink)

MorphGNT Roadmap

This month I should be doing another release of my morphologically-parsed Greek New Testament. This will be release 5.06.

I thought I'd outline my future plans (as they currently stand).

At some point, I'll start doing 6.xx releases. This will involve a format change that includes some more information. I'll probably continue the 5-series releases for people used to the format. The 5-series data is just a subset of the 6-series data so it's always possible (and easy) for me to generate a 5 from a 6.

From Series-7, MorphGNT's format will likely change dramatically to adopt a graph structure rather than a simple tabular structure. This will enable much greater extensibility and annotation.

Series-7 will be the last that is based on the CCAT database. From Series-8 onwards, the data will hopefully be completely the results of my own parsing work.

First things first, though—getting 5.06 out. I'm down to 299 mismatches to resolve.

2005/07/04 : Categories morphgnt : 0 trackbacks : 1 comment (permalink)

Developments on Atlanta Reality Show

This next week is an exciting week for the reality show concept I've been helping out Tom Bennett on. I can't say too much more at this point other than that a bunch of industry veterans who've seen the demo love the concept and it will be shown to some important people this week.

2005/07/03 : 0 trackbacks : 0 comments (permalink)

Unreadable Canon RAW Files on Compact Flash

Before going on my trip to Europe, I switched my Canon 10D to RAW mode and bought two 1.0GB compact flash cards.

Half way into the trip, my camera started getting "Err 99" problems. I lost a lot of shooting opportunities re-booting the camera after each error, but when a photo did successfully get taken, I had no problems downloading it to iPhoto on my PowerBook.

Then, on the second last day, I was transferring one of the 1GB cards to my PowerBook and it complained that the files were not a recognizable format. Judging from the fact the .CRW files were sitting in a temp directory, the transfer seemed to go okay. And I can view the photos without issue on the camera itself.

Anyone experienced this problem before? Any ideas how I can recover at least the embedded JPEGs from the CRWs?

I'm going to have to send the camera in to Canon to get the Err 99 problems fixed. That I can live with. Losing 160 photos is more upsetting.

2005/07/02 : 0 trackbacks : 2 comments (permalink)

Mount Pilatus Myst

Is it just me or is there something Myst-like about this shot I took from the top of Mount Pilatus in Lucerne?

top of mount pilatus

2005/07/02 : 0 trackbacks : 0 comments (permalink)

More Old XML Posts

Trying to dig up some old posts on behaviour sheets, I came across two interesting posts I made to xsl-list back in August 1998:


My feeling on the issue is that a spec be developed for tree addressing patterns that serves the needs of both XPointers and XSL patterns. Such a spec could stand apart (but be normative to) both XLink and XSL.


It occurs to me that maybe the formatting objects could be separate too.

I would actually like XSL to consist of three separate things:

1. Pattern Language for Tree Addressing; 2. DTD and Specification of Formatting Objects; 3. Specification for Stylesheets themselves, Tree Transforms, etc.

Given that XPath, XSL-FO and XSLT now have very separate existences, it's funny to think they started off as essentially one spec.

2005/07/02 : 0 trackbacks : 34 comments (permalink)

Behaviour Sheets Becoming A Reality

In the first couple of years of XML, I remember having discussions with people like Steve Ball and Paul Prescod about a hypothetical beast we called "behaviour sheets". The idea was that, just like stylesheets associate a style with particular elements or patterns of elements, a "behaviour sheet" associates behaviour (e.g. what to do when clicked on or moused over or dragged) with particular elements or patterns of elements.

Netscape submitted a spec to the W3C, although they called them Action Sheets.

Well, the idea (and an implementation) has emerged again in the form of a Javascript library called Behaviour. Ben's a Kiwi so he spells it correctly too! :-)

2005/07/02 : 0 trackbacks : 4 comments (permalink)

Summer of Code Kick-off

Congratulations to Elliot Cohen and all the other successful applicants to Google's Summer of Code.

I will be mentoring Elliot's project to create a Python library for Bayesian networks. Thank you to the Python Software Foundation for giving me the opportunity to do this.

The Summer of Code requires the project be hosted by a site like SourceForge. Much to my delight, Elliot is keen to use Subversion rather than CVS so we're likely going to give BerliOS a go. BerliOS uses the SourceForge code but already has support for Subversion.

I've also suggested Elliot start a blog and wiki.

2005/07/01 : Categories python summer_of_code software_craftsmanship : 0 trackbacks : 1 comment (permalink)

Path Homotopy

Previously we defined the notion of homotopy.

Two functions that are continuous deformations of one another are homotopic even if the two functions aren't paths.

But if the two functions are paths, then we can further define a stricter notion called path homotopy.

Two paths are path homotopic iff they are homotopic and they have the same start point and end point throughout the deformation.

In other words, if our paths are functions f and g from the interval [0, 1] to a topological space X, then path homotopy means not only the existence of a continuous map F : [0, 1] x [0, 1] -> X where

F(x, 0) = f(x) and F(x, 1) = g(x)

but also that:

F(0, t) = f(0) = g(0) and F(1, t) = f(1) = g(1)

for all t in [0, 1].

deformation of one path f to another path g with same start and end points viewed as a map from I x I
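To make this concrete, here is a small numerical sketch. It assumes X is the plane, which the definition above doesn't require, and uses the straight-line homotopy between two paths that I picked for illustration. In the plane there is no hole to get in the way, so sliding each point of f straight towards the corresponding point of g works.

```python
import math


def f(x):
    """Straight path from (0, 0) to (1, 0)."""
    return (x, 0.0)


def g(x):
    """Arc from (0, 0) to (1, 0) that bulges upward."""
    return (x, math.sin(math.pi * x))


def F(x, t):
    """Straight-line homotopy: F(x, 0) = f(x) and F(x, 1) = g(x)."""
    return tuple((1 - t) * a + t * b for a, b in zip(f(x), g(x)))


def close(p, q, tol=1e-12):
    # Compare points up to floating-point rounding.
    return all(abs(a - b) <= tol for a, b in zip(p, q))


steps = [i / 10 for i in range(11)]

# F starts at f and ends at g.
assert all(close(F(x, 0), f(x)) and close(F(x, 1), g(x)) for x in steps)

# The start and end points stay fixed throughout the deformation,
# which is what makes this a path homotopy and not just a homotopy.
assert all(close(F(0, t), f(0)) and close(F(1, t), f(1)) for t in steps)
```

The checks only sample a few values of x and t, so they illustrate the conditions rather than prove them.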

2005/07/01 : Categories poincare_project : 0 trackbacks : 3 comments (permalink)

On Way Back to Boston

I'm currently at Zürich airport waiting to catch a flight back to Boston. Had a wonderful trip—photos will be online at some stage.

Hundreds of emails and thousands of blog entries to catch up on :-)

2005/06/29 : 0 trackbacks : 0 comments (permalink)

MorphGNT Update

A couple of months ago, I talked about the current process I'm going through to identify errors in my morphologically parsed Greek New Testament, MorphGNT. By the end of April, I was down to 400 mismatches I needed to check. At the time, I thought I'd be able to finish going through them by the time I left to go to Europe on holiday.

Unfortunately, I haven't actually worked on it at all the last month. I'm leaving tomorrow but still have 350 mismatches to check (an estimated 14 hours work).

Hopefully I'll get it done some time during July and then I'll be able to release another version of MorphGNT.

2005/06/10 : Categories morphgnt : 0 trackbacks : 4 comments (permalink)

Three Weeks Off

I'm at the British Airways lounge at Logan Airport, Boston, just about to get on a plane to Zurich connecting via London.

I'll be travelling around Austria and Switzerland for the next three weeks—don't know if I'll be blogging at all during that time, but I'd say I'll find the chance.

2005/06/10 : Categories travel : 0 trackbacks : 3 comments (permalink)

Leonardo 0.6.1 Released

I am pleased to announce the release of Leonardo 0.6.1.

Leonardo is the Python-based content management system that runs this site and provides blogging and wiki-style content.

This is a minor bug fix release which updates the two wiki-formatting engines.

You can download it at:


by James Saiz : 2005/06/09 : Categories leonardo python announcements (permalink)


I finally got to watch Primer on DVD. Actually, I watched it twice in a row (and some scenes a third time).

Imagine if Darren Aronofsky had directed a screenplay that Christopher Nolan wrote after wondering what Back to the Future would have been like if Kubrick had made it instead of 2001. (sorry, that was the first thing that came to mind :-)

Wow! The last two films I can recall that had this much of an effect on me were The Usual Suspects and Memento.

But in the case of both The Usual Suspects and Memento it was the writer and director's (or writer/director's) second film—the masterpiece they made after they cut their teeth on a first feature. But Primer is Shane Carruth's first film. Made for $7000 (shot on Super 16 with mostly practicals and, apparently, a pretty close to 1:1 shooting ratio). Seven grand! Heck, the eventual blow up to 35mm alone would have cost way more than that.

It's not perfect. The acting was sometimes a little off (although David Sullivan is great) and there were occasional problems both with focus and sound quality. But that doesn't stop it from being one of the best films I've ever seen. It will be very interesting to see what Shane Carruth will do next.

On the one hand I'm totally inspired to go make my first feature. On the other hand, it sets the bar so high, I'm almost too scared to. After all, one only gets one chance to make a first feature.

2005/06/09 : Categories filmmaking : 0 trackbacks : 2 comments (permalink)

Continuous Functions are between Topological Spaces not Sets

In the Poincaré Project, I've said things of the form "a continuous function from (some subset of the real numbers)".

There's an assumption in that phrase that's worth pointing out.

Whenever someone talks about a continuous function, they are actually talking about a mapping between topological spaces rather than just between two sets. This is because the definition of "continuous" requires a topology.

So, in this context, whenever I say "the reals", I mean "the topological space consisting of the set of real numbers with the standard order topology". Recall that any totally ordered set has a particular topology that can be derived from the ordering relation.

Mathematicians frequently take this kind of shortcut and it should always be clear from the context what is being referred to. But I think it's useful to point out because I think it's something that needs to be understood explicitly.

2005/06/08 : Categories poincare_project : 0 trackbacks : 55 comments (permalink)

Apple on Intel and the Osborne Effect

A number of people (such as Jeff Nolan) have suggested that Steve Jobs's announcement of the move to Intel will hurt Apple due to the Osborne Effect.

If Steve had announced that a G6 PowerMac or G5 PowerBook was going to ship in 2006, wouldn't that be just as likely to cause an Osborne Effect?

So even if there is an Osborne Effect, I don't see why it should be attributed to the switch per se.

Mind you, given there are always new technological innovations, holding off on something you were planning to do now because of an announcement about a release a year away doesn't make that much sense to me. A quarter or two maybe. But not a year.

But then again, no one said the Osborne Effect was rational. Just that (some) humans think that way.

2005/06/08 : 0 trackbacks : 4 comments (permalink)

Mentoring the Summer of Code

I've put my name forward as a Python mentor for the Google Summer of Code. That's not to say that the Python Software Foundation has accepted me as a mentor for official projects, but I'm making it known that I'm interested in helping out.

If you are thinking of applying and you have an interest that might overlap with mine, please feel free to email me.

Besides the specifics of a project, I believe I can help a lot with more general questions of software craftsmanship in an open source context.

In light of my previous blog entry, this definitely feels like an opportunity for me to "give back", although that phrase generally makes me cringe :-)

UPDATE (2005-06-30): I'm officially mentoring Elliot Cohen's Bayesian Network project. Watch this blog for updates.

2005/06/07 : Categories python software_craftsmanship : 0 trackbacks : 12 comments (permalink)

Yet Another 2D Political Test

The Political Gauge (via Norm Walsh)

No surprises:

On Non-Fiscal Issues, you rank as a Moderate Liberal (34). On Fiscal Issues, you rank as a Strong Conservative (84).

I still find it confusing that Americans call economic liberals "conservative" and being fiscally "liberal" means anything but.

2005/06/07 : Categories politics : 0 trackbacks : 2 comments (permalink)

Be Careful What You Ask For

I was ego-surfing Google Print and found that a question I asked the FoRK mailing list back in 1999 had been quoted in a book.

The email read:

My XSL formatter/renderer, FOP, is soon to have more than just myself as the developer and as I communicate with would-be co-developers, I've started wondering about software engineering in open source projects.

There have been numerous musings on the business and anthropology of open source. Is anyone aware of readings that address the actual software engineering issues?


The book is Understanding Open Source Software Development by Joseph Feller and Brian Fitzgerald and my second paragraph is quoted on page 6.

I don't mind they quoted me—it's just funny that it was just a simple query to a mailing list that they quoted.

Incidentally, I've done a lot of collaborative open source and distributed development since I asked that question. I should probably blog what I've learnt as a way of answering my own question, six years on.

2005/06/07 : 0 trackbacks : 0 comments (permalink)

Google Sitemaps

For as long as this blog has had an Atom feed, I've also published my entire site as an Atom feed whose entries include the pages outside the blog too. I'm not talking about a "recent changes" feed (although that would be useful), I'm talking about a snapshot of the entire site.

It was initially just an experiment in uses of Atom beyond blogs but it had the interesting side-effect that, if I subscribed to the feed in Bloglines, Bloglines would tell me whenever someone referenced a non-blog page on my site.

I don't publish the URI of my "site map" because, until Leonardo has caching (which is the big theme of 0.7) it's too inefficient to generate frequently.

Now, thanks to Google, I have an extra incentive to do so. Google has just announced Google Sitemaps which is a format for informing search crawlers about resources that exist on your site.

Like Bob Wyman, I wondered why they couldn't have just used Atom as the format for this. Well, buried down in a FAQ, Google say they will accept Atom 0.3 feeds. So the feed I produce for jtauber.com will work right now.

I still would have preferred them to adopt Atom as the primary format and just use extensions for any extra information they needed.

2005/06/04 : Categories google atom_format this_site leonardo : 0 trackbacks : 1 comment (permalink)


Consider two paths in the same topological space, X. Let's say one is the image of the map f from the interval [0, 1] and the other is the image of the map g from the interval [0, 1].

If it's possible to continuously deform f to g the two are said to be homotopic.

If x is the parameter for a path and t is the parameter for the deformation then we can think of the deformation as a continuous map F : [0, 1] x [0, 1] -> X where

F(x, 0) = f(x) and F(x, 1) = g(x)

and F(x, t) for some t, 0 < t < 1, is a path somewhere along in the deformation from f to g.

F is referred to as a homotopy from f to g.

deformation of one path f to another path g viewed as a map from I x I
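As a concrete illustration, and assuming for the moment that X is the plane (or any convex subset of it), which nothing above requires, one explicit choice of F is the straight-line homotopy:

```latex
F(x, t) = (1 - t)\, f(x) + t\, g(x), \qquad x, t \in [0, 1]
% At the two ends of the deformation this gives
F(x, 0) = f(x), \qquad F(x, 1) = g(x)
```

F is continuous because f and g are, so in a convex space any two paths are homotopic. It's only when the space is not convex that the interesting questions start.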

Homotopies, as we shall soon see, will turn out to be a key to the topological difference between a sphere and a torus and will form the basis for our description of the Poincaré Conjecture itself.

2005/06/03 : Categories poincare_project : 0 trackbacks : 10 comments (permalink)

Atom Publishing Protocol

Dave Johnson of Roller fame has a great post outlining how the Atom Publishing Protocol will work.

I've commented before that I'd like to support blog clients in Leonardo and felt that a REST style made this much more straightforward.

Fortunately, the Atom Publishing Protocol (APP) shares a very similar model to Leonardo so it looks like adding APP support to Leonardo will be fairly easy.

Also, the next version of Leonardo (discounting any bug releases) will switch to supporting the final version of the Atom feed format (assuming it's done in time, which it should be).

by James Saiz : 2005/06/03 : Categories atom_protocol leonardo (permalink)

Leonardo 0.6.0 Released

I am pleased to announce the release of Leonardo 0.6.0.

Leonardo is the Python-based content management system that runs this site and provides blogging and wiki-style content.

New features include:

You can download it at:


2005/06/01 : Categories leonardo python announcements : 0 trackbacks : 1 comment (permalink)

Java method implementations whose arg types are broader than declared in the interface

What is the motivation for disallowing this in Java?

interface A {
  void foo(C arg);
}
interface B {
  void foo(D arg);
}
interface C {}
interface D extends C {}

public class E implements A, B { public void foo(C arg) {} }

The compiler complains that E doesn't implement the foo(D) required by B.

I realise the underlying issue is that the following doesn't compile either:

public class F implements B {
  public void foo(C arg) {}
}

but I don't understand why Java disallows it.

2005/06/01 : 0 trackbacks : 9 comments (permalink)

Tintin Movie News

It's been ages since I've heard any news about Spielberg's live action adaptation of Tintin.

Via Animated News:

Quint at Ain't It Cool News recently visited the set of Steven Spielberg's War of the Worlds, coming to theaters June 29, 2005. Within this revealing two-part article is not only interesting information regarding War of the Worlds, but small details on Spielberg's other upcoming projects as well. For instance, after discussing King Kong with Peter Jackson and seeing a reel showcasing the magic at Weta, Spielberg feels that the computer effects company is capable of bringing Tintin's faithful canine companion Snowy to life in the Tintin film Spielberg is producing.

2005/06/01 : Categories tintin : 0 trackbacks : 2 comments (permalink)

Paths as homeomorphisms of the closed interval from 0 to 1

Previously, I defined a path in terms of a continuous function from a closed interval on the reals to a set of points in a topological space.

Because the function is continuous, by definition, the resultant image is homeomorphic to the closed interval on the reals. Because any closed interval on the reals is itself homeomorphic to the specific closed interval [0, 1] then the image of a path can be said to be homeomorphic to the real interval [0, 1].

UPDATE (2005-06-01): As Michael Hudson points out in a comment, a path will only be homeomorphic to the closed interval [0, 1] if it doesn't cross over itself. Homeomorphisms require the function to be bijective, continuous and have a continuous inverse. A path that crosses over itself doesn't meet these criteria.
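Here is a quick sketch of a path that fails the test. The figure-eight parametrization is just one example I picked; any path that revisits a point would do.

```python
import math


def figure_eight(t):
    """A path on [0, 1] that traces a figure eight in the plane."""
    return (math.sin(2 * math.pi * t), math.sin(4 * math.pi * t))


def same_point(p, q, tol=1e-9):
    # Compare points up to floating-point rounding.
    return all(abs(a - b) <= tol for a, b in zip(p, q))


# Two different parameter values land on the same point (the crossing),
# so the map is not injective and cannot be a homeomorphism onto its image.
assert same_point(figure_eight(0.0), figure_eight(0.5))
```

The function is still continuous, so it is still a perfectly good path; it just isn't a bijection onto its image.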

2005/05/28 : Categories poincare_project : 0 trackbacks : 5 comments (permalink)

Syntax by any other name

One challenge doing any kind of cross-disciplinary work is the differences in terminology. It's one thing where the same concept gets two different names—it's a lot harder when two different things get the same name.

I recently got into a confusing discussion on the b-greek mailing list where people (including some notable scholars) were saying things like "word order doesn't always alter syntax in Greek". As a linguist, that sounds like an utter contradiction, but people insisted it was true for some constructions in Ancient Greek and that "prominent linguists" had recognized this for decades.

I finally took my own advice and stepped back to look at the terminology being used. Then it struck me.

What gets called "syntax" by Greek scholars is largely what I would describe as mapping grammatical relations (e.g. SUBJECT) to semantic roles (e.g. AGENT). That's why when you open up a book on Greek "syntax" it spends a lot of time talking about what different cases mean semantically.

This isn't what syntax means to a formal linguist or computer scientist. To them, "syntax" has to do with things like constituent structure and word order.

Now, in English, grammatical relations are predominantly determined by word order rather than morphology whereas in Greek, the word order matters less in determining grammatical relations and morphology takes on that role.

And here is the crux of the terminology confusion. Consider the previous paragraph. You could replace "word order" with "syntax" and it would mean (roughly) the same thing to a formal linguist. I suspect you could replace "grammatical relations" with "syntax" and it would mean the same thing to a Greek scholar.

So here's a way I suggested we could avoid confusion on the b-greek mailing list:

A. Whenever I see someone say "syntax", I'll read it as "grammatical relations".

That way "word order doesn't always alter syntax in Greek" reads to me as "word order doesn't always alter grammatical relations" and I'll agree.

B. Whenever one sees me say "syntax", one should read it as "constituent structure, word order, etc"

That way "word order doesn't always alter syntax in Greek" reads as "word order doesn't always alter constituent structure, word order, etc in Greek" and you'll see why I think it's a contradiction.

2005/05/28 : Categories linguistics : 0 trackbacks : 2 comments (permalink)

Finding Dependencies in Tabular Data, Part 2

Yesterday I wrote about code in Python 2.4 to find out if the range of possible values in one column of tabular data is affected by the value of another column.

I posed the question there: What if you want to check the dependency, not between just two columns but two groups of columns?

Here is the original function for reference:

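def find_dependencies(col_i, col_j):
    for i_value in possible_values[col_i]:
        # collect the j values seen in rows where column i has this value
        j_values = set()
        for row in rows:
            if row[col_i] == i_value:
                j_values.add(row[col_j])
        # proper subset: fixing i_value rules out at least one j value
        if j_values < possible_values[col_j]:
            yield i_value, j_values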

and here is a modified version that takes two sequences of column indices (rather than two column indices):

def find_dependencies_2(cols_i, cols_j):
    for i_value in cartesian_product(non_contig_slice(possible_values, cols_i)):
        j_values = set()
        for row in rows:
            if non_contig_slice(row, cols_i) == i_value:
                j_values.add(non_contig_slice(row, cols_j))
        if j_values < set(cartesian_product(non_contig_slice(possible_values, cols_j))):
            yield i_value, j_values

So find_dependencies_2((0,1,2), (3,4)) returns which tuples made up of the 0th, 1st and 2nd columns of a row reduce the possible values that can be taken by the tuple made up of the 3rd and 4th column of the row.

What was interesting in writing it is that I merely needed to change

row[col_n]

to

non_contig_slice(row, cols_n)

and

possible_values[col_n]

to

cartesian_product(non_contig_slice(possible_values, cols_n))

Where cartesian_product is defined as:

def cartesian_product(sets, done=()):
    if sets:
        for element in sets[0]:
            for tup in cartesian_product(sets[1:], done + (element,)):
                yield tup
    else:
        # no sets left: the tuple built so far is complete
        yield done

and non_contig_slice is defined as:

def non_contig_slice(seq, indices):
    result = ()
    for i in indices:
        result += (seq[i],)
    return result

Successive applications of find_dependencies_2 with different combinations of column indices can be used to determine what dependencies exist between columns in tabular data.
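
To see the pieces working together, here is a minimal, self-contained sketch on a tiny made-up table. The post doesn't show how rows and possible_values are built, so the construction below (a list of row tuples, plus one set of observed values per column) is an assumption, and the sample data is invented for illustration.

```python
# Sample data is hypothetical; rows and possible_values are module-level
# names, as the functions in the post assume.
rows = [
    ("a", "x", 1),
    ("a", "y", 1),
    ("b", "x", 1),
    ("b", "y", 2),
]
# Assumption: possible_values[n] is the set of values seen in column n.
possible_values = [set(column) for column in zip(*rows)]


def cartesian_product(sets, done=()):
    """Yield every tuple that takes one element from each set, in order."""
    if sets:
        for element in sets[0]:
            for tup in cartesian_product(sets[1:], done + (element,)):
                yield tup
    else:
        yield done


def non_contig_slice(seq, indices):
    """Return a tuple of seq's items at the given indices."""
    return tuple(seq[i] for i in indices)


def find_dependencies_2(cols_i, cols_j):
    # Every combination the j columns could take if nothing were restricted.
    all_j = set(cartesian_product(non_contig_slice(possible_values, cols_j)))
    for i_value in cartesian_product(non_contig_slice(possible_values, cols_i)):
        j_values = set()
        for row in rows:
            if non_contig_slice(row, cols_i) == i_value:
                j_values.add(non_contig_slice(row, cols_j))
        if j_values < all_j:
            yield i_value, j_values


# Column 2 restricts column 0 only when it is 2 (then column 0 must be "b").
print(dict(find_dependencies_2((2,), (0,))))
# Each value of column 0 rules out some (column 1, column 2) pairs.
print(dict(find_dependencies_2((0,), (1, 2))))
```

Note that a combination of i values that never occurs in any row is also reported, with an empty set of j values, since the empty set is a proper subset of everything else.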

More on that soon.

2005/05/27 : Categories python : 0 trackbacks : 5 comments (permalink)


It goes through each possible value in the i column and finds out if fixing that reduces the possible values in the j column.

Notice that it makes use of < as a set operator for proper subset. It's a generator too. It will yield any value in the i column that restricts the values in the j column, along with what the restriction is.
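
As a small illustration of both points (the proper-subset test and the generator), here is a sketch that runs the function on a three-row table. The table is made up, and building possible_values as one set of observed values per column is an assumption.

```python
# Hypothetical sample table: (colour, size)
rows = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
]
# Assumption: possible_values[n] is the set of values seen in column n.
possible_values = [set(column) for column in zip(*rows)]


def find_dependencies(col_i, col_j):
    for i_value in possible_values[col_i]:
        j_values = set()
        for row in rows:
            if row[col_i] == i_value:
                j_values.add(row[col_j])
        # "<" on sets is true only for a strictly smaller subset
        if j_values < possible_values[col_j]:
            yield i_value, j_values


# A set is a subset of itself, but not a proper subset.
assert {"small"} < {"small", "large"}
assert not ({"small", "large"} < {"small", "large"})

# Because find_dependencies is a generator, results are produced lazily
# as the loop asks for them.
for i_value, j_values in find_dependencies(0, 1):
    print(i_value, "restricts column 1 to", j_values)
```

With this data only "blue" is reported, because "red" occurs with every size and so restricts nothing.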

What if you want to check the dependency, not between just two columns but two groups of columns?

The solution turned out to involve a couple of cool modifications which I'll save for a followup post.

2005/05/26 : Categories python : 0 trackbacks : 10 comments (permalink)

Hanon Exercises

Hanon's exercises entitled "The Virtuoso Pianist" are going well so far.

The first part (which I'm on) consists of 20 exercises to increase finger agility and strength (especially in the naturally weak fourth and fifth fingers).

The second part consists of 23 exercises that further prepare the fingers for the third part which consists of what are called the "Virtuoso Exercises".

The first 20 exercises are to be played starting at 60bpm and working up to 108bpm. Here's how I'm doing them.

with the tempo of one pair of exercises being the tempo I played the previous pair the day before.

The sequence of tempos I (plan to) progress through is: 60bpm, 70bpm, 80bpm, 85bpm, 90bpm, 95bpm, 100bpm, 105bpm, 108bpm

I'm currently at 95bpm on 1 & 2 down to 60bpm for 11 & 12.

2005/05/26 : Categories piano : 0 trackbacks : 0 comments (permalink)

Leonardo 0.6 Release Candidate 1

The first release candidate of Leonardo 0.6 is now available at


Leonardo is the Python-based content management system that runs this site.

Assuming no blockers are found, I'll probably release Leonardo 0.6.0 early next week.

Let me know if you encounter any problems at all.

2005/05/25 : Categories leonardo : 0 trackbacks : 3 comments (permalink)

Date for O.C. Screening of Alibi Phone Network

The West Coast premiere of Alibi Phone Network will be at 6pm on 3rd June (next Friday). It will be one of nine shorts shown that session of the O.C. Shorts Festival. It's a shame I can't make it but I've heard that at least one of our actors will be there.

2005/05/25 : Categories alibi_phone_network filmmaking : 0 trackbacks : 4 comments (permalink)

Almost Ready for Leonardo 0.6 Release

Tonight I finished the remaining items I wanted to get done for the release of Leonardo 0.6.

That puts me ahead as I wasn't planning on a release candidate until the weekend and I should get it out tomorrow.

Will give me more time the rest of the week to work on editing the Atlanta reality show pilot.

2005/05/24 : Categories leonardo : 0 trackbacks : 6 comments (permalink)

Testing For Directories Outside the Tree

In Leonardo, I have a case where I am concatenating a fixed directory x and a relative path y.

I want to avoid the result being outside the directory tree rooted by x.

Any ideas? Is

root = os.path.abspath(x)
path = os.path.abspath(os.path.join(x, y))
assert path.startswith(root)

a reasonable approach?

Actually, I should clarify: y isn't a relative path as such. y can be '/' which should be taken to mean x. So perhaps what I want is:

root = os.path.abspath(x)
path = os.path.abspath(os.path.normpath(x + os.sep + y))
assert path.startswith(root)

I ruled out

assert os.path.normpath(x + os.sep + y).startswith(x)

for the case where 'x' is itself relative.
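
One caveat with a startswith test is that it compares strings, not path components, so with a root of /srv/site a sibling such as /srv/site-backup would pass. Here is a minimal sketch of a component-wise check. It assumes Python 3.5 or later for os.path.commonpath, which did not exist when this was posted, and safe_join is a hypothetical helper name, not part of Leonardo.

```python
import os.path


def safe_join(x, y):
    """Join root x and request path y, refusing results outside x.

    safe_join is a hypothetical helper name used for illustration.
    """
    root = os.path.abspath(x)
    # Strip leading separators so '/' (or '/a/b') is taken relative to x.
    relative = y.lstrip("/" + os.sep)
    path = os.path.abspath(os.path.join(root, relative))
    # commonpath compares whole path components, not string prefixes.
    if os.path.commonpath([root, path]) != root:
        raise ValueError("path escapes root: %r" % (y,))
    return path
```

Calling safe_join(x, '/') gives back the absolute form of x, while something like safe_join(x, '../elsewhere') raises ValueError.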

2005/05/24 : Categories python : 0 trackbacks : 9 comments (permalink)

Checkpoint at 31.5

Just went past the half-way mark between being 31 and 32.

On my 31st birthday, I posted a list of goals for my 32nd year. Let's see how I'm going:

Overall, not looking good but having spent months away from home, I have an excuse for some of them.

What am I most pleased with my progress on? Definitely Leonardo!

2005/05/24 : Categories personal : 0 trackbacks : 3 comments (permalink)

Dogbert the Ungrammatical

Today's Dilbert made me laugh but I found the second panel ungrammatical.

The antecedent of the plural "them" is the singular "every part of your body".

Secondly, when I checked with my sister Jenni she pointed out the use of "would" in that panel doesn't sound right either. It implies the existence of the procedure is hypothetical, which is not how it is presented in the first panel.

The "would" might just be a difference between Australian English and American English but it could also be a subtle slip-up because the procedure is, in fact, hypothetical.

The "them" is, however, just plain ungrammatical as far as I can tell.

2005/05/23 : 0 trackbacks : 10 comments (permalink)

Managing Bibliographies with BibDesk

In preparation for my PhD, I recently started investigating Mac OS X tools for managing BibTeX-based bibliographies.

In the end I settled on BibDesk. I chose it because of its functional merits but it's great that it also turns out to be open source.

Because BibDesk allows me to link from an entry to a file on my local filesystem, I can just put all my PDFs in one directory and use BibDesk as the interface to all the papers.

One thing that I don't believe is supported (yet) but which I would like to use as work on my literature review continues is the ability to express relationships between entries, perhaps along the lines I talked about in Google Scholar and Typed Citations.

Of course, then I'd like to express relationships between other entities such as authors and maybe concepts, terminology, etc.

Actually, a lot of the features I'd like to see in BibDesk are features I'd like to see in any MicroContent browser. After all, that's what BibDesk really is.

2005/05/22 : Categories phd information_management : 0 trackbacks : 0 comments (permalink)

Which Releases Have This Bug

I've talked before about my thoughts on severity and priority in issue tracking systems.

Things seem to be working well so far in how I've customised Roundup for Leonardo.

One thing that is still missing, however, is the ability for me to do queries like "show me all the bugs in 0.6b1 that have now been fixed". This is helpful for generating release notes. A simple list of "all the bugs that have been fixed in this release" isn't sufficient because it tends to include a lot of bugs that were only introduced during development of that release (e.g. bugs in new features).

So a simple "what build was this bug found in?" field is not what I want (although that's still useful, it doesn't solve the problem at hand). What I want is a field that lists which releases were shipped with the particular bug.

I think for most projects, it doesn't need to be a comprehensive list; really it just needs to be whether the bug existed in the last major release and the last minor release.
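
As a rough sketch of what such a field would make possible, the query could look like the following. The record layout and field names are hypothetical and are not Roundup's schema.

```python
# Hypothetical issue records; the field names are illustrative only.
issues = [
    {"id": 101, "shipped_in": {"0.5", "0.6b1"}, "fixed": True},
    {"id": 102, "shipped_in": {"0.6b1"}, "fixed": False},
    # Introduced and fixed during development, so it never shipped.
    {"id": 103, "shipped_in": set(), "fixed": True},
]


def fixed_since(release, issues):
    """Return ids of fixed bugs that actually shipped in the given release."""
    return [issue["id"] for issue in issues
            if issue["fixed"] and release in issue["shipped_in"]]


print(fixed_since("0.6b1", issues))
```

Issue 103 is fixed but is left out of the result, which is exactly the filtering that a plain "fixed in this release" list can't do.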

2005/05/22 : Categories software_craftsmanship : 0 trackbacks : 7 comments (permalink)


Via Bob Congdon, found an online puzzle of a very different type than the Python Challenge.

Check out Hapland. Very clever and a lot of fun.

2005/05/22 : Categories games : 0 trackbacks : 4 comments (permalink)

Leonardo 0.6 Beta 1 Released

The first beta of Leonardo 0.6 is now available at


Leonardo is the Python-based content management system that runs this site.

I'm still putting together a list of what's new since 0.5 but it's big: comments, trackbacks, file upload, categories and heaps of internal improvements.

Try it out and let me know below or via email how you go.

2005/05/21 : Categories leonardo : 0 trackbacks : 7 comments (permalink)

Upgrade Successful

Well, I survived upgrading this site to the latest revision of Leonardo (@278 on the trunk)

I'll release a beta this weekend.

2005/05/20 (permalink)

Comments Welcome

Leonardo 0.6 will include the beginnings of support for trackbacks and comments.

I'm turning them on on this post just to see how things go.

Feel free to comment and/or trackback!

UPDATE (2005-05-21): Turning off trackbacks and comments now. Testing is done :-)

2005/05/20 (permalink)

About to Upgrade Leonardo

I'm about to upgrade the software running this site and blog to a pre-release of 0.6 beta 1.

Apologies in advance in case anything goes wrong.

2005/05/20 (permalink)

Context-Free Design Grammars

Chris Coyne's Context Free Design Grammars appeal to me on so many levels (linguistically, artistically, mathematically, computationally, ...)

Basically they are a set of production rules where the terminals are geometric shapes (actually just a circle and a square) and each symbol on the right-hand-side of a rule is augmented with a geometric transformation.

So a sentence in the generated language is just a collection of squares and circles at different positions, sizes and orientations.

But the results are stunning.

Of course, immediately after discovering this, I had to write a Python implementation. My first implementation immediately hit the recursion limit so I rewrote it to use a pool of states rather than recurse. Coincidentally, I used exactly the same technique working on level 24 of the Python Challenge and avoided the recursion depth issues others had encountered.

Once it's cleaned up, I'll make my Python implementation available.
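
Until then, the pool-of-states idea can be illustrated with a rough sketch. This is not the implementation mentioned above: the toy grammar, the names, and the size cut-off are all assumptions made for illustration. It also simplifies by giving each non-terminal exactly one rule.

```python
import math

# Hypothetical toy grammar: each non-terminal maps to a list of
# (symbol, dx, dy, rotation_in_degrees, scale) entries.
RULES = {
    "SPIRAL": [
        ("CIRCLE", 0.0, 0.0, 0.0, 1.0),
        ("SPIRAL", 1.0, 0.0, 20.0, 0.9),
    ],
}
TERMINALS = {"CIRCLE", "SQUARE"}


def expand(start, min_size=0.1):
    """Expand start into (shape, x, y, angle, size) tuples without recursion."""
    shapes = []
    pool = [(start, 0.0, 0.0, 0.0, 1.0)]  # states still waiting to be expanded
    while pool:
        symbol, x, y, angle, size = pool.pop()
        if symbol in TERMINALS:
            shapes.append((symbol, x, y, angle, size))
            continue
        if size < min_size:
            continue  # too small to matter; this is what ends the expansion
        rad = math.radians(angle)
        for child, dx, dy, rot, scale in RULES[symbol]:
            # Child offsets are expressed in the parent's rotated, scaled frame.
            cx = x + size * (dx * math.cos(rad) - dy * math.sin(rad))
            cy = y + size * (dx * math.sin(rad) + dy * math.cos(rad))
            pool.append((child, cx, cy, angle + rot, size * scale))
    return shapes


shapes = expand("SPIRAL")
print(len(shapes), "shapes")
```

Because pending states live in an ordinary list rather than on the call stack, the depth of the expansion is limited only by memory and the size cut-off, not by the interpreter's recursion limit.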

2005/05/19 (permalink)

What Planet Am I On?

For reasons unknown to both myself and Ryan Phillips, my blog entries are no longer appearing on Planet Python.

Nothing has changed at this end. Sounds like it could be a problem with 304 Not Modified. The fact my feed gives 304s makes debugging the feed difficult at times.

2005/05/17 (permalink)

Python Challenge Continues

This week will be so much more productive if I just stay away from http://pythonchallenge.com/ but they've recently put up levels 23-26 and they are calling my name.


UPDATE (2005-05-18): Argh! I can't resist it. Fortunately, level 23 was easy. I think it was deliberate to suck me back in.

2005/05/17 (permalink)

Film Project Update: Accepted at Another Festival

Just found out that Alibi Phone Network made it in to the official selection of the O.C. Shorts Festival. It seems a cool little festival but unfortunately it's unlikely any of the filmmakers can make it out to California at that time.

Two of our actors are in L.A., though, so hopefully they can make it.

2005/05/17 (permalink)

Almost Ready for Next Leonardo Beta

I'm almost ready to release the first beta of Leonardo 0.6.

A lot has been improved since 0.5 and I'm keen to get 0.6 out so everyone can switch to it.

2005/05/17 (permalink)

43 Things and Self-Normalizing Folksonomies

Python Challenge is still sucking up my time but I did take a break and take another look at 43 Things.

43 Things is a site for declaring your goals and matching you up with other people who have the same goals or who have already accomplished them.

They've added some new features since I first checked out the site and one of them really impressed me: how they deal with the issue of distinctions without a difference, i.e. goals that are really the same thing but have been created separately and given different names.

Because goals are identified by the string given in answer to "I want to...", there is a distinction made between say "speak Italian fluently" and "speak fluent Italian" even though they are clearly the same goal.

How does 43 Things solve this?

When someone notices two very similar goals, they can suggest that one is really similar to the other. When they do this, the pages for both goals start showing the other goal under the heading "People have suggested XYZ is really the same as..."

Other people can then, with a single click (hmm, probably a GET), switch their goal from one to the other.

But here is the really clever thing. They say whether the other goal has more or fewer people. This means you can voluntarily switch your choice of goal naming to the one that emerges as more popular.

So the community's folksonomy becomes self-normalizing.

2005/05/12 (permalink)

Python Challenge

With the exception of a break to have an excellent dinner with James Marcus, I've spent the last twelve hours working on the Python Challenge. It's like playing Myst but with Python scripts and the Web.

I'm currently stuck on level 17 and just asked my first question on the forum.

I'm happy to give anyone hints up to that level.

UPDATE (2005-05-09): Up to level 20. No hints on the forum yet :-)

2005/05/09 (permalink)

More Metadata Adventures in Tiger

I created my first Smart Folder in Finder. I noticed when selecting what metadata fields to search on there were (besides all the photography ones) things like Project. I would love to be able to tag each file with what project it relates to. How do I add this to an arbitrary file, though?

And how do I add this to an email message? Not that it matters because Smart Folders in Finder seem to exclude searching email or vCards.

If I create a Smart Folder with something like Kind = Any and Author = James Saiz as the query and then go to Get Info on the folder, it shows the query as:

(kMDItemAuthors = 'James Saiz'cd) && (kMDItemContentType != com.apple.mail.emlx) && (kMDItemContentType != public.vcard)

So let me get this right: Kind = Any means any but email and vCards.

Interestingly, though, the query found Powerpoint presentations and Word docs even though I don't have either app installed at the moment.

2005/05/06 (permalink)

Metadata in Mail 2.0

I mentioned earlier that the ability to add metadata to emails and create smart folders based on that metadata is the feature that would secure my continued use of the new Mail 2.0 in Mac OS X 10.4 'Tiger'.

I find it odd that rules can colour messages but you can't manually label an email with a colour. If you could do that then have smart folders based on colour, that would be enough for how I want to organize my email.

But, of course, I'd love arbitrary metadata. And this is where it gets interesting.

Tiger has a command-line tool mdls which lists the metadata for a particular file. It is this metadata that is available to Spotlight.

All my email messages are downloaded by Mail via IMAP and put into ~/Library/Mail and each email message (and attachment) gets its own file.

I just tried mdls on one of those files and it has metadata for things like ItemTitle (subject), ItemAuthors and ItemRecipients. They are actually displayed in the Finder Get Info under More Information too.

If I add a Spotlight Comment in the Finder Get Info window, mdls will show it and I can easily search for it with Spotlight. From the Finder I can set a colour label and that shows up in mdls (and is hence Spotlightable) as well.

Why can't Mail 2.0 make better use of this?

UPDATE (2005-05-07): Joe Weaks suggested AppleScript for setting the background colour on a mail message. A quick Google search revealed the Label Your Mail hack from the O'Reilly Panther Hacks book. I'm still surprised this didn't make it in as a feature in Mail 2.0.

2005/05/06 (permalink)

An HTTP Lesson from Google

Hopefully Google's Web Accelerator will teach a whole new generation of Web developers the dangers of using GET when they should be using POST.

2005/05/06 (permalink)

Setting Up Tiger

Not much blogging lately. Tiger arrived on Monday and I got it installed Tuesday evening. I haven't transferred any of my data back yet, although I've downloaded subversion binaries and SubEthaEdit ready to work on Leonardo.

Spotlight and the Dashboard have already been great time savers. I haven't tried Automator yet. The feed-reading capabilities of Safari RSS are actually better than I thought they'd be. I could even see reading a particular class of feeds there rather than in NetNewsWire.

Smart folders in Address Book mean I can fake tags by putting text like @filmmaker in the notes on a person and then creating a smart folder for cards whose notes contain @filmmaker. The birthday field integrated with iCal is very cool.

I'm giving Mail.app another chance. My 2 Gig+ of IMAP mail has pretty much been sync'ed. But without the ability to annotate mail, I can't fake tags with smart folders. Not sure I'll last on Mail.app without something like that.

2005/05/04 (permalink)

Watching Feynman

I'm just about finished watching the third of four lectures Richard Feynman gave at the University of Auckland in 1979. Besides being a fascinating overview of quantum electrodynamics for a general audience, it's wonderful to just see Feynman in action.

I'd always heard what a fantastic lecturer Feynman was and so I was keen to see him for myself. He was brilliant but not in the way I expected. He wasn't the most well-spoken person I've listened to; sometimes he would get a little lost in his train of thought, go off on tangents or start to say something only to decide not to proceed down that path; sometimes he'd make mistakes that he'd have to go back and correct.

So despite this, why were his lectures so good? A large part of it was his ability to extract out the key ideas of a theory and present them in a way that was relatively simple but still faithful to the full theory. This is true of his writings too. But what made his lecturing so good?

Four things come to mind:

His authority and his humility interacted in very interesting ways. Here was a man who was so comfortable with what he did and didn't know that he didn't need to boast. He could say to the audience "I'm not going to explain this because you wouldn't understand it" and not seem arrogant because he would just as often say "I'm not going to explain this because I don't understand it".

His humour was also remarkable; a combination of self-deprecation and genuine wit. When he made a mistake or decided to back-pedal on a topic or example he had started, he'd always recover in a way that made the audience laugh.

But the thing that stood out more than anything else was his excitement about what he was teaching. You can tell, watching the video, that he just loved explaining this stuff to people. If I had to pick one thing that set him apart it would be that.

Even if you are not really that interested in physics, watch at least one of the videos just to see what a truly great teacher is like.

2005/05/02 (permalink)

Quicktime 7 for Panther

I haven't upgraded to Tiger yet because it won't arrive from Amazon until tomorrow. But today Software Update on my Panther-running PowerBook informed me that Quicktime 7 was available.

Given that the Quicktime 7 upgrade is available for Panther, it isn't really accurate to say that Quicktime 7 is a feature of Tiger. It just happened to come out at the same time and so is bundled with it.

I wonder if any other so-called Tiger features will be available as free Software Updates for Panther users. I'm guessing Safari 2.0 or Mail might be another contender. Maybe even iChat AV.

2005/05/01 (permalink)

Happy Birthday

Happy 50th, Dave.

2005/05/01 (permalink)

The Hard Way

Tonight I discovered the hard way that my rental car doesn't have a low fuel light.

UPDATE (2005-05-01) : And today I discovered the hard way where Burlington Police has your car towed if the side of the road you push your car to happens to be a fire lane.

2005/04/30 (permalink)

Finally Tonight

Finally tonight, I got to try my new keyboard. I actually got used to it pretty quickly. Spent a little while playing the first exercise from Hanon's The Virtuoso Pianist. It's amazing how much of a buzz it gave my fingers.

Finally tonight, Amazon shipped Tiger. It should arrive on Monday, just as another busy week of work begins.

2005/04/30 (permalink)

Did I Miss Something? (Besides Tiger)

I ordered Tiger from Amazon because they said they'd ship on 28th. It's now 29th and my order still hasn't shipped.

2005/04/29 (permalink)

Keyboard Arrived

The M-Audio Keystation 88es arrived today.

I haven't hooked it up yet but I have tried out the action. It's the first non-Hammer-action keyboard I've used in five years and so it feels really light. They call it semi-weighted so I'd hate to feel non-weighted.

But for $250, it's worth it and as a colleague just pointed out, I have 88 more keys than I did yesterday :-)

2005/04/28 (permalink)

Should I Continue Title-Only Feed?

Two (separate) questions:

Would anyone continue to subscribe to the title-only feed if there were a summary feed?

Would anyone object to me stopping the title-only feed all together?

Please email jtauber /at/ jtauber /dot/ com.

2005/04/27 (permalink)

Poincare Project: Topological Properties Revisited

Part of the Poincare Project.

Recall that a topological property is one based only on the open sets of a topology and not any other structure. For this reason a topological property is preserved under a homeomorphism. If one topological space has a topological property and another doesn't have that property then the two spaces can't be homeomorphic.

So far we've talked about the following topological properties:

Compactness is enough to topologically distinguish a circle from an open interval. A circle is compact whereas an open interval is not.

Connectedness is enough to topologically distinguish the real line R from the plane R^2 because if you take away a point from R and from R^2 then R is disconnected but R^2 is still connected.

We don't yet have a topological property that can distinguish a sphere from a torus. We shortly will and it will be at the heart of the Poincare Conjecture.

2005/04/27 (permalink)

Designing from the Outside In

In his post Designing from the Outside In on the new O'Reilly Radar blog, Tim O'Reilly mentions a conversation he had with Jason Fried from 37signals (is it so-called because 37 is a psychologically random number?)


[Jason] believes that contrary to the normal expectation that applications are built on top of frameworks, applications should always be designed "from the outside in." That is, at 37signals, they try to design the usability and function of the application first, and that drives the implementation. And if they can then extract a re-usable framework, all the better. For example, basecamp wasn't built on top of Ruby on Rails. Rather, Ruby on Rails was extracted from basecamp.

That notion of extracting a re-usable framework after the fact struck me as interesting because that's really what's happened with Leonardo. Two years ago, I wrote a little wiki-like script in Python in order to enable editing of content on jtauber.com from a browser. I then decided to expand it just over a year ago to include a blog. Now, as more features are being requested, an underlying web framework is emerging that could very well be useful outside of running a wiki or blog.

It reminds me of a point Jon Bosak used to make that Backus-Naur Form (BNF) came out of work on the specification for Algol. Another example of extracting the general from the specific rather than attempting to build the general in isolation from a specific use.

Tim also mentions Jason's referring to Christopher Alexander's Paths and Goals pattern.

If you read Tim's full post you'll also see there's another whole aspect to what he's talking about with regard to UI-centric development and the role of designers. Jason's blog is a great read too.

2005/04/27 (permalink)

M-Audio 88-Note MIDI Controller

One of the biggest problems with being away from home for months at a time is not being able to play music. I don't mean listen. I mean compose, improvise and perform.

So tonight I ordered an M-Audio 88-Note MIDI Controller from Amazon (Sam Ash, actually) which I can keep here in the US.

The cost? $250. One-tenth the cost of the Roland A-90EX 88-Note MIDI Controller in my studio at home.

I'm not expecting the touch to be anything like the Roland or my Korg Triton LE 88. But when the alternative is not being able to play anything for months at a time, I'm willing to cope :-)

I don't know why I didn't do this two months ago.

2005/04/23 (permalink)

HTTP Abuse and Leonardo

Jon Udell started it with his article End HTTP Abuse and Leigh Dodds and Ryan Tomayko continue.

Jon Udell is focused on misuse of GET versus POST, arguing that if client-side toolkits made it easier to POST, then GET wouldn't be misused by developers on the server-side. Jon seems to give server-side developers the benefit of the doubt more than I would. I'm with Leigh that it's the server-side frameworks that need to improve.

Both Leigh and Ryan go further with the kinds of things a server-side framework needs to do well including:

Maybe getting these right in a Python web framework is what will help push Python as a language for Web applications.

I'm trying hard to do the Right Thing in Leonardo (which is actually shaping up to be another Python web framework for better or worse). I've done a bad job in some areas (which I hope to fix) but I think I've done an okay job with things like status codes and URI design.

One thing I hate having to do is overcome the lack of HTML forms support for PUT and DELETE by having two URIs /put and /delete that you POST to when you want to PUT the contents of a textarea as a resource or want to DELETE a resource.

I also need to work out how best to do authentication, rather than using cookies like I do (and almost everyone else does).

2005/04/23 (permalink)

Finally Made IMDb

Fulfilling a 10-year-old dream, I'm now listed on IMDb although my producer credit is still missing for some reason.

2005/04/23 (permalink)

Congrats to Jill Effron

Jill Effron, who James Marcus and I hung out with at the Palm Beach festival, won the Audience Award for Best Short Film for her film A Day in the Life of a Bathroom Key. Way to go Jill! Must have been my 5 out of 5 vote that pushed it over the edge :-)

It was a great film. Very funny and executed very well.

2005/04/23 (permalink)

Cutting Down on Blog Reading

I reached a peak of 272 blogs. It's a long way from Scoble, but it's too much for me. If I go just a few days without getting a chance to read blogs, I end up one to two thousand entries behind.

So I've started unsubscribing—at least to keep things to a steady-state 256 although I will likely drop further.

It's actually difficult to work out which blogs to drop. I need to be able to rate entries and from this derive a rating for the feed as a whole. But even this isn't quite right as there are some feeds that are very easy to skim and so my tolerance for a lower proportion of good entries is higher.

What really matters is the comparative effort I need to put into reading a particular feed given what I get out of it. Is yield the right term here?

Perhaps what I want is something that combines rating with monitoring of how much time I spend on the feed. Which leads back to attention.xml.

2005/04/22 (permalink)

Poincare Project: Paths

It's been a while. Back to some topology—we're almost ready to state Poincaré's Conjecture.

Consider drawing a curve on the surface of an object. If we view the surface as a topological space then the curve can be thought of as a set of points in the space with the following property: there exists a continuous function from a closed interval on the reals to that set.

[Figure: mapping from a closed interval to a set of points in a topological space]

This is the notion of a path. Some topologists will refer to the function as the path while others will refer to the image (i.e. the set of points in the space) as the path. Often it doesn't matter which is meant, e.g. in the sentence "there exists a path between any two points".

Note that there are an infinite number of continuous functions that result in the same image and vary only in the choice of parameterization.

2005/04/21 (permalink)

IMDb for Music

I've long wished there were an equivalent of IMDb for music.

You would have songwriters linked to songs linked to recordings (linked to albums and producers) and performances (grouped by concerts and also linked to recordings) with both recordings and performances linked to individual performers and bands (linked to performers).

Can someone please start this? I don't have the time :-)

UPDATE (2005-04-22):

A bunch of people wrote to me and mentioned MusicBrainz (which I knew about) and allmusic (which I didn't). Allmusic definitely seems to be closest to an IMDb for Music although the openness of MusicBrainz appeals and might enable some of the more obscure information (like producer, mixing and mastering engineers) that I'm interested in. Integration with things like the ASCAP ACE Title database would be good too.

Thanks to Michael Plump, Henning Koch, Will Guaraldi, Kevin Dangoor and Gavin Burris for replying.

2005/04/20 (permalink)

Are We In or Were We Once In?

This morning I received an email from a prominent US film festival that I won't name. It basically started by saying "seeing as you are part of the official selection for the festival, we thought we'd provide some tips on how to promote your film".

There are issues though. Although we did apply,

So, was the mistake in the email being sent to the wrong person or did we get accepted and I never received the acceptance?

I've asked for clarification but haven't received a response yet. If today's email was in error, I would have expected a pretty prompt correction.

If, however, we did get in, somehow never found out and weren't programmed because they never received a tape from us, then I'll be really really disappointed.

UPDATE (2005-04-22): Turns out we didn't get in and the email was mistakenly sent to me. Not sure if that makes me feel better or not.

2005/04/20 (permalink)

DATR in Python

I previously talked about wanting to implement the lexicon language DATR in Python. Well, I just received an email from Henrik Weber saying that (apparently inspired by my post) he has gone and done an implementation at http://pydatr.sourceforge.net/

Well done Henrik! I'm looking forward to trying it out and maybe contributing.

2005/04/19 (permalink)

Current MorphGNT Work

For the last few months, I've been making corrections to MorphGNT by attempting to merge an English translation (NASB) marked with Strong's numbers with my database. Although it's a tedious process, it's revealing numerous errors.

When James Strong compiled his concordance, he assigned a number to every lemma in the underlying Greek text of the King James Version. Other translations are often made available annotated with these Strong's numbers. Zack Hubert provided me with an electronic text of the NASB translation with Strong's numbers which I converted to something looking like this:

010101 record 976
010101 genealogy 1078
010101 Jesus 2424
010101 Messiah 5547
010101 son 5207
010101 son 5207
010101 Abraham 11

The first column is the book, chapter and verse, the second column is the English word as it appears in the NASB translation and the third column is the Strong's number. Note that not all words are included.

I then found an electronic text of Strong's lexicon and stripped out the formatting and the definitions to just get a list of Strong's numbers with a transliteration of the Greek lemma:

1 a
2 Aaron
3 Abaddon
4 abares
5 Abba
6 Abel
7 Abia
8 Abiathar
9 Abilene
10 Abioud

Finally I took my MorphGNT database and extracted the lemmata:

010101 βίβλος
010101 γένεσις
010101 Ἰησοῦς
010101 Χριστός
010101 υἱός
010101 Δαυίδ
010101 υἱός
010101 Ἀβραάμ

I then wrote a Python program that attempts to merge the first and third files on the basis of the second. Note that the transliterations in Strong's lexicon don't have accents and there is ambiguity too (both epsilon and eta go to 'e'). That's a fairly straightforward part of the join, however, because it can be automated by the script.
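The accent-stripping part of the join might look something like the sketch below. This is not my actual program: the letter table and the function names are assumptions made for illustration, and Strong's own transliteration scheme may differ in places (for example, it marks rough breathing with an "h", which this sketch ignores).

```python
import unicodedata

# Hypothetical Greek-to-Latin table; note that epsilon and eta both go to "e".
GREEK_TO_LATIN = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "e", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "u", "φ": "ph", "χ": "ch", "ψ": "ps",
    "ω": "o",
}

def transliterate(lemma):
    """Strip accents and breathings, then map each Greek letter to Latin."""
    decomposed = unicodedata.normalize("NFD", lemma.lower())
    letters = [c for c in decomposed if not unicodedata.combining(c)]
    return "".join(GREEK_TO_LATIN.get(c, c) for c in letters)

def lemma_matches(lemma, strongs_transliteration):
    """Compare a MorphGNT lemma with an accent-free Strong's transliteration."""
    return transliterate(lemma) == strongs_transliteration.lower()

print(transliterate("βίβλος"))            # the first lemma of 010101
print(lemma_matches("Ἀβραάμ", "Abraam"))  # Strong's number 11
```

Anything for which lemma_matches returns False would then be reported as an exception to look at by hand.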

The real challenge comes because:

So my program simply indicates whenever it had trouble performing a match and I have to either:

There were initially thousands of exceptions that each required one of these actions. After a number of months, I now have one thousand left. It takes me about 4 hours to make 100 corrections so I still have a little way to go.

When I'm done, I'll release a new version of MorphGNT with the lemma errors that this task revealed corrected.

2005/04/19 (permalink)

April 29th

Tiger (which I've pre-ordered) and the Hitchhiker's Guide to the Galaxy movie both come out on April 29th. 10 more days to go!

Does anyone know if Tiger ships with Python 2.4?

2005/04/19 (permalink)

Film Project Update: The Palm Beach Screening

Saturday was the big screening of Alibi Phone Network at the Palm Beach International Film Festival. There were maybe 100 people in the audience and our film was first of the six shown.

I was really happy with how the film looked on the big screen and I didn't cringe as much as I normally do at my editing.

Very positive feedback. My favourite comment was from one of the festival volunteers who commented that "it was so nice to see a short film that actually had a story".

After the screening I got changed into my tux and went to the Gala dinner. I came close to asking Salma Hayek to dance but chickened out at the last minute.

2005/04/18 (permalink)

Film Project Update: Today is the Day

Today's the day Alibi Phone Network screens here at the Palm Beach International Film Festival. James Marcus and I have been having a great time since we arrived Wednesday evening, hanging out with some wonderful people and seeing some good films (When Do We Eat? on opening night and A Perfect Fit last night). Tom Bennett joined us last night after the film when we went for drinks with the producer and director of A Perfect Fit.

Our film is showing at 4pm. Then tonight I'm going to a black-tie Gala dinner that's way more expensive than I can afford and more than I've ever spent on a dinner. But hey, I have my tux here and I know I'll regret it if I don't go.

I fly back to Boston tomorrow afternoon, then it will be back to Leonardo, MorphGNT and editing the Atlanta reality pilot.

2005/04/16 (permalink)

More Stuff Coming Soon

I've got lots of other stuff to blog about including MorphGNT, Leonardo, some work I've been doing on Bayesian Belief Networks and Pearl's Belief Propagation algorithm (which I'm writing a Python implementation of).

I doubt I'll get to it while I'm down in Palm Beach but hopefully next week I'll be blogging about a bunch of technical topics. I also have some more Poincare Project posts to make.

Stay tuned.

2005/04/14 (permalink)

Film Project Update: In Palm Beach

Between a product milestone at work and this site being down for a few days, it's been a while since I've blogged.

I'm now in Palm Beach for the Palm Beach International Film Festival. James Marcus and I arrived last night. We decided to rent a convertible.

Tonight is the opening of the festival. Our film is screening on Saturday.

On Tuesday we picked up the postcards advertising the film.

2005/04/14 (permalink)

Film Project Update: Boston Underground Beats Palm Beach

The Boston Underground Film Festival will actually be the first public screening of Alibi Phone Network, not Palm Beach. It's screening at 4.30pm tomorrow (Friday) at the Somerville Theater in Davis Square. James Marcus dropped off a DVD and BetaSP to them today.

Will be interesting to see it on the big screen.

2005/04/07 (permalink)

Film Project Update

It seems like ages since I've blogged. Extremely busy with work and preparations for Palm Beach and the network at my hotel was down over the weekend.

Last Thursday, James Marcus and I dropped off a DVD-R containing the 3GB uncompressed video for Alibi Phone Network to a transfer house.

On Monday, we picked up 2 DigiBeta tapes and 2 BetaSP copies from the transfer house and FedEx'ed one of the DigiBetas to Palm Beach. It arrived today.

A photo of James Marcus is now up on the official site as well as a bio for Kelly Feener (now in LA and using the name Kelli Daniels).

IMDb still hasn't included me or the actors in their entry on the film.

2005/04/05 (permalink)

Tiger Still on 8A series

I've expressed before an interest in Apple build number conventions.

The latest Mac OS X 10.4 "Tiger" build number is apparently 8A425.

What is unusual is that they are still on the "A" series unlike previous releases:

Interestingly, the middle letter has consistently been lower each major release.

I'd still love to know what triggers a change in the middle letter (and why that trigger never happened in Tiger).
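For anyone wanting to compare build numbers programmatically, here is a minimal sketch that splits one into its parts. The pattern (digits, an uppercase letter, digits, and an optional lowercase suffix) is only my reading of the numbers I've seen, not a documented format, and the function name is made up.

```python
import re

# Assumed shape: major number, series letter, build count, optional suffix.
BUILD_RE = re.compile(r"^(\d+)([A-Z])(\d+)([a-z]?)$")

def parse_build(build):
    """Split a build number such as '8A425' into its components."""
    match = BUILD_RE.match(build)
    if match is None:
        raise ValueError("unrecognised build number: %r" % build)
    major, letter, number, suffix = match.groups()
    return int(major), letter, int(number), suffix

print(parse_build("8A425"))
```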

2005/03/31 (permalink)

AppleScript and Python

A lot of my work on MorphGNT involves cleaning up and merging data from multiple sources. It's time consuming manual work and there's no instant gratification but it's worth it in the end.

I use a combination of Python scripts and manual editing in a text editor. Yesterday I thought I'd try AppleScript to automate some of the text editor work.

Unfortunately SubEthaEdit, which I've actually come to love as a standalone editor, even when not collaborating, doesn't seem to be scriptable. TextWrangler, however, is, so I downloaded that.

I've never written AppleScript before but fortunately, I was able to "record" myself performing the action and then bring up the resultant script and parameterize it.

So my script has lots of things like:

tell application "TextWrangler" to find bcv searching in text 1 of text document "ubs.txt"

The language is definitely optimized for doing little things (and the attempt to make it read like English is cute) and I'd hate to do anything too involved with it but I absolutely love the fact that I can automate applications (and even across applications) so easily.

What I really want now is the ability to kick off AppleScript from within Python and pass data from Python into a parameterized AppleScript.

Anyone done something like that before?

UPDATE (2005-03-30): There was a presentation at PyCon just a week ago on this sort of thing. See http://toys.jacobian.org/presentations/2005/appscript/. Looks very cool!

UPDATE (2005-04-03): Mark Nottingham pointed me to Scripting AppleScriptable Applications with Python.
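One simple way to do this without any extra libraries is to shell out to the osascript command, which reads a script from standard input when given "-" and passes any remaining arguments to the script's run handler. The sketch below assumes that approach. The AppleScript is adapted from my recorded line above and should be treated as untested, and the Python function names are made up for the example.

```python
import subprocess

# Parameterized version of the recorded find; untested, for illustration only.
FIND_SCRIPT = '''
on run argv
    tell application "TextWrangler"
        find (item 1 of argv) searching in text 1 of text document (item 2 of argv)
    end tell
end run
'''

def osascript_command(args):
    """Build the command line: '-' tells osascript to read the script from stdin."""
    return ["osascript", "-"] + [str(a) for a in args]

def run_applescript(script, *args):
    """Run an AppleScript (Mac OS X only), passing args to its run handler."""
    result = subprocess.run(
        osascript_command(args),
        input=script, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# On a Mac with TextWrangler open on ubs.txt, something like:
#   run_applescript(FIND_SCRIPT, "010101", "ubs.txt")
```

Passing the data as arguments rather than pasting it into the script text avoids having to worry about quoting inside the AppleScript.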

2005/03/30 (permalink)

Blog Reading Prioritization: Attention and Bayesian Approaches

A post from Steve Gillmor on Attention prompted me to start looking more into the attention.xml spec.

The problem area attention.xml fits into (if I understand it properly) is improving blog (or really any feed content) reading efficiency by helping to prioritize entries and reduce duplicates.

Just under a year ago, I suggested Bayesian classification for blog reading prioritization. My idea then (resembling an idea I had ten years earlier for reading USENET) was that your reader would predict, on the basis of what you read (or marked as interesting), what other posts you are likely to be interested in and prioritize accordingly, using Bayesian classification much like spam filters. My idea was not that entries would be filtered out nor that new entries from unsubscribed feeds would be suggested to you. The idea was just to help with prioritization.
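To give a feel for what I mean, here is a minimal naive Bayes sketch. The class and function names are invented for the example, it splits text on whitespace only, and it is nowhere near what a real reader would need. The point is that entries are reordered, never dropped.

```python
import math
from collections import Counter

class InterestModel:
    """Word-level naive Bayes over entries marked interesting or not."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text, interesting):
        self.doc_counts[interesting] += 1
        self.word_counts[interesting].update(text.lower().split())

    def log_odds(self, text):
        """Log odds that an entry is interesting (add-one smoothing)."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        v = len(vocab) or 1
        totals = {k: sum(c.values()) for k, c in self.word_counts.items()}
        score = math.log((self.doc_counts[True] + 1) / (self.doc_counts[False] + 1))
        for word in text.lower().split():
            p_yes = (self.word_counts[True][word] + 1) / (totals[True] + v)
            p_no = (self.word_counts[False][word] + 1) / (totals[False] + v)
            score += math.log(p_yes / p_no)
        return score

def prioritize(model, entries):
    """Order entries from most to least likely to be interesting."""
    return sorted(entries, key=model.log_odds, reverse=True)

model = InterestModel()
model.train("python topology group theory", True)
model.train("celebrity gossip fashion", False)
print(prioritize(model, ["celebrity fashion", "python group"]))
```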

It seems like there could be a lot of synergy between that idea and attention.xml. I need to think about it some more - watch this space!

I certainly think there are still massive opportunities for innovation in blog reading technologies.

And where might Leonardo fit in? Given that I see Leonardo as the "hub" of my online presence, a lot. The key will be how to better integrate my feed reader with Leonardo to enable support for things like attention.xml

Exciting times!

2005/03/28 (permalink)

Simple Algorithm for Recurring Tasks

I previously mentioned the little tool for Mac OS X called Consistency from Sciral which is for managing flexibly recurring tasks.

The approach is very simple. For every task, you specify a minimum time and a maximum time between occurrences. The Sciral Consistency UI gives you a table with a row for each task and a column for each day and you simply mark when you've done the task for that day.

The squares are colour-coded as follows:

This works surprisingly well. However, I wanted a system with a couple of additions. Firstly, I wanted to prioritise tasks based on this approach with ordering within tasks of the same colour. Secondly, I wanted something that would handle tasks that can be done multiple times during a day (like reading blogs or email).

For prioritisation, the following seems to do the job:

score = max(0, (1 + interval_since_last - minimum) / (1 + maximum - minimum))

This maps to the colours as follows:

and you can sort the tasks relative to the actual score to prioritise within a colour band.

Handling tasks that can happen multiple times within a day turns out to be easy. Simply change the units that you use to measure interval_since_last, minimum and maximum to hours for that task and it just works.

One thing I observed implementing this in Python, though, is that interval_since_last should be an integer rounded down. Otherwise something with a minimum of 1 will start to have a score > 0 before that 1 interval is up.
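Until I post the real thing, here is a minimal sketch of the scoring in Python. The function names and the tuple layout for tasks are assumptions made for this example. Note that the score goes above 1 once a task is past its maximum.

```python
import math

def score(interval_since_last, minimum, maximum):
    """Priority score from the formula above; 0 means not yet due."""
    elapsed = math.floor(interval_since_last)  # round down, as noted above
    return max(0, (1 + elapsed - minimum) / (1 + maximum - minimum))

def prioritise(tasks):
    """Sort (name, interval_since_last, minimum, maximum) tuples, most urgent first."""
    return sorted(tasks, key=lambda t: score(t[1], t[2], t[3]), reverse=True)

tasks = [
    ("email", 2, 1, 4),   # measured in hours
    ("gym", 5, 1, 3),     # measured in days
]
for name, since, lo, hi in prioritise(tasks):
    print(name, score(since, lo, hi))
```

Because the score is a ratio, tasks measured in hours and tasks measured in days can be sorted together.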

I'll make my command-line Python implementation available soon.

2005/03/26 (permalink)

Poincare Project: Groups

Part of the Poincare Project.

Although we've already defined them as monoids with inverses, a group is such an important concept in pure mathematics that we'll summarise here.

A group is a set G of objects with some binary operation # that maps every pair of elements of G to an element in G such that:

As we've already seen, integers under addition form a group. Integers under multiplication do not form a group because the multiplicative inverse of an integer is generally not an integer (e.g. inverse of 2 would be 1/2). The rationals under multiplication do not form a group either because 0 does not have an inverse. However, the non-zero rationals under multiplication do form a group.

There are many sets outside of the numbers that form groups. For example, consider the different ways you can rotate an object. Consider G to be the set of all rotations. Now consider # to be the composition of two rotations, i.e. a # b is the single rotation that is equivalent to performing rotation a after you have performed rotation b. It turns out that (G, #) forms a group.
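For finite sets, the usual group conditions (closure, associativity, an identity and inverses) can be checked by brute force. Here is a sketch in Python; is_group is a name I've made up, and the finite examples (arithmetic modulo 5, quarter-turn rotations measured in degrees) stand in for the infinite ones above.

```python
from itertools import product

def is_group(elements, op):
    """Check closure, associativity, identity and inverses on a finite set."""
    elements = list(elements)
    # Closure: every result must land back in the set.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Associativity: (a # b) # c must equal a # (b # c).
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # Identity: some e with e # x = x # e = x for all x.
    identities = [e for e in elements
                  if all(op(e, x) == x == op(x, e) for x in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverses: every a needs some b with a # b = b # a = e.
    return all(any(op(a, b) == e == op(b, a) for b in elements)
               for a in elements)

print(is_group(range(5), lambda a, b: (a + b) % 5))       # addition mod 5
print(is_group(range(5), lambda a, b: (a * b) % 5))       # 0 has no inverse
print(is_group(range(1, 5), lambda a, b: (a * b) % 5))    # drop the 0
print(is_group([0, 90, 180, 270], lambda a, b: (a + b) % 360))  # quarter turns
```

The second and third calls mirror the rationals under multiplication with and without zero.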

2005/03/26 (permalink)

Poincare and Leonardo

I sometimes get mail from people that have stumbled across the Poincaré Project posts and have wondered how they can easily get a full listing of them. I've now made such a list available on the Poincare Project page.

This is really just a temporary manual substitute for categories in Leonardo. The next version will support category-specific pages and feeds. It will also support comments and trackbacks.

2005/03/26 (permalink)

Licenses on Atom Entries

Henry Story suggested on the atom-syntax mailing list that it would be very helpful if there were a machine-readable way to express copyright policy on an Atom entry (e.g. via a Creative Commons URI)

This has come up before on this blog in the context of indicating whether one is happy to have an entry linkblogged.

Bob Wyman rightly points out that Creative Commons isn't about DRM in that CC licenses grant rather than restrict rights. A non-commercial CC license doesn't prohibit commercial use, it just grants non-commercial use.

Bob is worried that if Atom provides a way to link to a CC license, people will think that they can restrict the use of their content this way.

But I think not having a way to do this is worse.

Not having a way to restrict rights shouldn't preclude one from having a way to grant rights. As I've mentioned earlier, I don't mind people including the content of my blog in their link blogs with attribution. I don't see any problem with being able to declare that fact in a machine-readable way in my Atom feed. Should people that want to do this be dissuaded from doing so just because others (even the majority) may assume the mechanism allows rights to be restricted rather than granted? I don't think so.

2005/03/25 (permalink)

Film Project Update: World Premiere at Palm Beach International Film Festival

I haven't been able to mention it until now, but the festival that Alibi Phone Network got in to is the Palm Beach International Film Festival (PBIFF).

The world premiere will be on April 16th.

The festival has a page on the film with a number of errors I need to get corrected.

2005/03/24 (permalink)

Film Project Update: On IMDb Too

One of my long-time goals has been to get on IMDb, the Internet Movie Database. For a film from unknowns to get listed, you have to be able to prove that the film has been or will be released for public exhibition (and this includes festivals).

Now that PBIFF has made public that they are screening Alibi Phone Network, my submission to IMDb has been accepted.

You can see the IMDb title entry at http://www.imdb.com/title/tt0450237/

The only odd thing is they omitted both my data and the actors' data. Everything else I submitted was included—even obscure things like the fact we reference Ferris Bueller's Day Off.

2005/03/24 (permalink)

SxSW: On Way Back to Boston

Well, South-by-Southwest is over for another year and I'm sitting in the Admirals Club at the Austin-Bergstrom International Airport ready to head back to Boston.

Like last year, I didn't attend nearly as much as I could have. Like last year, I spent a lot of my time in my hotel coding. Like last year, I missed out on hanging out with a bunch of people I knew there. But like last year, I met a bunch of new people and made some great contacts. And like last year I had a great time.

I'll likely attend SxSW again next year. I just hope it doesn't clash with ETech again!

2005/03/20 (permalink)

SxSW: Aussie Bands

Yesterday I went to the BBQ put on by the Australian Music Collective.

I was only there for a few hours (not the whole seven) but I did get to hear some good sets from Starky, Missy Higgins, Little Birdy, Old Man River and The Panda Band. Nothing that really blew me away but enjoyable stuff. I enjoyed what I heard more than at the Aussie BBQ last year. Less heavy and more melodic.

2005/03/19 (permalink)

Poincare Project: Inverses

We've already seen that a set with the additional structure of a binary operation is called a semigroup if the operation is associative and that a semigroup with an identity is called a monoid.

The integers under addition is an example of such a monoid (with 0 as the identity) and so is the set of strings under string concatenation (with the empty string as the identity).

However, unlike the integers under addition, there is no notion of an inverse in string concatenation. For every integer a there is an integer b such that a + b = 0 (the identity element). b is said to be the inverse of a.

The monoid of strings under string concatenation has no such concept of inverses. You can't concatenate an arbitrary string with some other string to get back to the empty string.

Monoids with inverses effectively have a function which maps a to f(a) such that for all a in the set, the binary operation applied to a and f(a) results in the identity element. For integers under addition, f is such that f(x) = -x.

A monoid whose elements all have inverses is called a group.

So the integers under addition form a group. The strings under string concatenation do not.

2005/03/19 (permalink)

Also Missing PyCon

I'm missing ETech. I'm also going to miss PyCon which is another conference I wanted to attend. In the case of PyCon, it's not a clash with another conference, it's just that I can't afford to be away from work any longer than I already have been.

2005/03/19 (permalink)


It's 2am and I just ordered ice cream from room service. I felt like a rock star with my decadence right up until the lady asked "is it just you tonight?"

Guess I don't quite fit the mould.

2005/03/18 (permalink)

Little Python Scripts

In the last year, any time I've written a Python script that I think others might find interesting, I've posted it to this blog. I've finally got around to putting a list of them together on my Python page so anyone stumbling upon that page will find them without having to read through my blog.

I've included them here for your pleasure too:

Enjoy! And suggestions for improvements are always welcome.

2005/03/18 (permalink)

SxSW: Music Opening Dinner

Last night was the opening dinner for the music festival. Like last year, it wasn't particularly well attended and the people I talked to today who were there last night thought it was a bit of a waste.

I had a different experience, though. I enjoy meeting people over dinner. I find you end up having much better discussions albeit with fewer people (call me an introvert but I like that).

Last night I met some great people including Steve Turnidge (a mastering engineer and founder of Weed), Evan Blackstone (from Sarathan Records), Phyllis Dubinksy and Evie Silvers.

2005/03/17 (permalink)

SxSW: Drinks at the Australian Stand

This afternoon I went for drinks with the other Australians here at SxSW (there are a lot of them). Made a lot of contacts. I really am quite out of touch with the local scene—it's funny I have to come to Austin, Texas to meet music industry people that live in the same city as me!

I won't list people I talked to for fear of offending someone by omission but it was a good mixture of artists and managers. There is so much more I could be doing to promote myself as a composer and producer. I think foremost I just need to get a CD done to act as a calling card - much like Alibi Phone Network will become from a filmmaking perspective. I've also decided I need to find a record producer to mentor me so I've started putting out feelers here.

2005/03/17 (permalink)

SxSW: Music Starts Today

The music part of SxSW starts today. I've gone and collected my bag which is chock-full of magazines and CDs. Tonight is the welcome dinner.

2005/03/16 (permalink)

Poincare Project: Identities and Monoids

A set with an associative binary operation is called a semigroup. We'll learn what it takes to be a full group soon.

Consider a semigroup (S, #). If there is an element e in S such that:

e # x = x # e = x for all x in S

then e is referred to as an identity and the semigroup is called a monoid.

For example, the integers under addition is a monoid with identity 0. The integers under multiplication is a monoid with identity 1.

Note that our definition requires both e # x and x # e to be x even though we don't require x # y = y # x in general. It is possible to have so-called left-identities and right-identities for which only e # x = x or x # e = x respectively is required for all x. The unqualified term identity is taken to mean it is both a left-identity and right-identity and the definition of monoid requires this.

Note also that, because of our definition, the identity must be unique. The proof is straightforward. Imagine two identities e and f. Then e # f = f # e = e but also e # f = f # e = f. So e = f.
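A small Python sketch shows why the two-sided requirement matters for uniqueness. The function names are made up for the example, and the "keep the second input" operation is just a convenient one with many left-identities and no right-identity.

```python
def left_identities(elements, op):
    """All e with e # x = x for every x."""
    elements = list(elements)
    return [e for e in elements if all(op(e, x) == x for x in elements)]

def right_identities(elements, op):
    """All e with x # e = x for every x."""
    elements = list(elements)
    return [e for e in elements if all(op(x, e) == x for x in elements)]

def identity(elements, op):
    """The two-sided identity if there is one, otherwise None."""
    elements = list(elements)
    rights = right_identities(elements, op)
    both = [e for e in left_identities(elements, op) if e in rights]
    return both[0] if both else None

def keep_second(a, b):
    return b

print(left_identities("ABC", keep_second))   # every element qualifies
print(right_identities("ABC", keep_second))  # none do
print(identity(range(4), lambda a, b: (a + b) % 4))
```

So left-identities on their own need not be unique, whereas the proof above shows there can never be two different two-sided identities.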

2005/03/16 (permalink)

Missing ETech

Last year I decided that O'Reilly's ETech was one of the conferences I most wanted to attend in 2005.

It was after I'd registered for the full 10 days of SxSW that I found out ETech had been scheduled for the same time.

I'm really disappointed I can't be there.

Hopefully I can make ETech 2006.

2005/03/15 (permalink)

Change to Optima

I've recently fallen in love with Hermann Zapf's Optima font family again and so have decided to change the CSS for this site. If you don't have Optima, it will degrade to a mixture of Verdana and Arial but you'll be missing out :-)

2005/03/15 (permalink)

SxSW: Southern Belles Party

After the film, I went to the Southern Belles party. Unlike the Hooligans party, the night before, this one was a lot smaller and so I got the chance to talk to most of the people there who were involved in the film. Many of them had just experienced the world premiere of their first feature film so it was a very exciting time for them. They were all extremely gracious and, although it sounds a little corny, I was really honoured to be able to share in their celebration.

I would love it if I could work with some of them on a film in the future.

2005/03/14 (permalink)

SxSW: Southern Belles

I went and saw Southern Belles largely on the strength of one of the producers, Zack Sanders, seeming like such a nice guy at the opening party. It turned out to be a wonderful film.

Belle and Bell are two lifelong friends in Georgia. Belle dreams of a better life and starts trying to raise enough money for her and Bell to move to Atlanta. Hopeless romantic (and Gone With the Wind fan) Bell, having ditched her loser boyfriend Hampton, is willing to go along with the plan until she falls for a local policeman by the name of Rhett Butler.

The film was a tremendous amount of fun and a great showcase for the comedic abilities of Anna Faris (who plays Belle) and Fred Weller (who plays Bell's ex-boyfriend Hampton). The real find was Laura Breckenridge, who plays Bell. Laura is definitely an actress to keep an eye on. The casting director, Jennifer McNamara, deserves particular credit for putting together such a wonderful cast. Also deserving of credit is Eric Haase who shot the film beautifully on Super 16.

The script was a little uneven in parts but was overall a very warm comedy with just the right amount of odd-ball goofiness for my liking. The filmmakers should be very proud of what they have achieved. I certainly found their accomplishment inspirational for my own career.

2005/03/14 (permalink)

Statistically Improbable Words in Python

I've noticed recently that Amazon has started listing some statistically improbable phrases for many of their books.

About a year ago, my sister Jenni and I wrote a Python script to do something similar (although only at the word level, not phrase).

Inspired by Amazon, I've now put our script up at http://jamessaiz.en.wanadoo.es/2005/03/z_value.py

I'll need to think a little more about how to extend it to phrases. In the meantime, have fun with the script and let me know if you have any suggestions.
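For those who just want the general idea, here is a rough sketch of one way to score words. It is not the script linked above. It assumes a simple model in which each word's count in the document is compared with the count expected from a reference corpus and expressed as a z-value, with add-one smoothing so that unseen words don't cause a division by zero. All the names are made up.

```python
import math
from collections import Counter

def z_values(doc_words, ref_words):
    """Z-value of each document word's count against a reference corpus."""
    doc = Counter(doc_words)
    ref = Counter(ref_words)
    n_doc = len(doc_words)
    n_ref = len(ref_words)
    vocab = len(set(doc) | set(ref))
    scores = {}
    for word, count in doc.items():
        p = (ref[word] + 1) / (n_ref + vocab)     # smoothed expected probability
        expected = n_doc * p
        std = math.sqrt(n_doc * p * (1 - p))      # binomial standard deviation
        if std > 0:
            scores[word] = (count - expected) / std
    return scores

doc = "the topology of the manifold manifold manifold".split()
ref = ("the of a and the of the in " * 10).split()
scores = z_values(doc, ref)
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 2))
```

Words that are common everywhere end up near or below zero, while words that are unusually frequent in the document float to the top.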

2005/03/14 (permalink)

Poincare Project: Binary Operations

To begin topology, we took a set and added some structure to it by designating certain subsets as open sets.

To begin algebra, we will start with a set and add some structure to it by defining a binary operation.

An operation is a rule that takes one or more objects from a set and results in another object. For example, the addition operation takes two numbers and results in another number.

If the result is always in the same set the inputs came from, the set is said to be closed under that operation.

If an operation takes two inputs, it is called a binary operation.

So addition is an example of a binary operation and the set of integers is closed under that operation.

String concatenation is another example where two strings are concatenated to form a third.

Note that as long as you can define the rule (if need be just by listing the result for each pair of inputs) you have an operation.

So there is nothing wrong with defining a set {A, B, C} and defining some rule # such that: A#A = A, A#B = C, A#C = B, B#A = A, B#B = B, B#C = C, C#A = A, C#B = A, C#C = A

In this example, don't try to look for any pattern. I just randomly picked some results. All we require to have a binary operation is that a result is defined for each pair of inputs.
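In Python, a rule like this can be written down as nothing more than a lookup table. The sketch below uses the same results as above; the names TABLE, op and is_closed are just for illustration.

```python
# The arbitrary rule # on {A, B, C}, listed pair by pair.
TABLE = {
    ("A", "A"): "A", ("A", "B"): "C", ("A", "C"): "B",
    ("B", "A"): "A", ("B", "B"): "B", ("B", "C"): "C",
    ("C", "A"): "A", ("C", "B"): "A", ("C", "C"): "A",
}

def op(x, y):
    """Apply # by looking up the pair of inputs."""
    return TABLE[(x, y)]

def is_closed(elements, op):
    """True if every result of op is itself in the set."""
    return all(op(x, y) in elements for x in elements for y in elements)

print(op("A", "B"))
print(is_closed({"A", "B", "C"}, op))
```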

2005/03/14 (permalink)

Poincare Project: Associativity

Previously, we introduced the concept of a set with a binary operation that takes as inputs two elements of the set and outputs an element of the set.

What about taking three inputs? After all, in the examples given of integers with the addition operation or strings with the concatenation operation, it isn't difficult to think of calculating a+b+c by applying the + operation twice. You just work out a+b and then apply the operation again to that and c.

In other words, a+b+c = (a+b)+c.

In fact, in the case of adding integers or concatenating strings, you could work from the right too and work out b+c first and then apply the operation to a and that.

In other words, a+b+c = a+(b+c)

The fact that you can work this out by successive applications of a binary operation working either from the left or right isn't true of all binary operations. For example, if our set is the integers and our operation is subtraction then we can get a different result depending on which pair we start with.

If we interpret 3-2-1 as (3-2)-1 we get 0. However, if we interpret 3-2-1 as 3-(2-1) we get 2.

Addition of integers has a property that subtraction on integers does not. This property is called associativity.

A binary operation # on a set is said to be associative if and only if (a#b)#c = a#(b#c) for all a, b and c in the set.

If a binary operation is associative, it doesn't matter which way we calculate a#b#c.

If a binary operation is non-associative, we have to decide, usually just as a convention, whether we calculate the left-most pair first or the right-most pair first. If we adopt a left-first convention, the operator is said to be left-associative and if we adopt a right-first convention, the operator is said to be right-associative.

Note that left-associative and right-associative mean that the operation is non-associative. In other words, a binary operation is either associative or not and if not, then (by convention) left-associative or right-associative.

Subtraction on the integers is left-associative by convention. 3-2-1 = 0 and not 2.

Exponentiation on the integers is taken to be right-associative. 3^3^2 = 3^(3^2) = 3^9 and not (3^3)^2 = 27^2.
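Python happens to follow the same conventions, so they are easy to check. The sketch below also includes a brute-force associativity test over a handful of sample integers; is_associative is a made-up name and a small sample can only ever disprove associativity, not prove it for all integers.

```python
import operator
from itertools import product

def is_associative(elements, op):
    """True if (a # b) # c == a # (b # c) for all sampled a, b, c."""
    elements = list(elements)
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(elements, repeat=3))

sample = range(-2, 3)
print(is_associative(sample, operator.add))   # addition passes on the sample
print(is_associative(sample, operator.sub))   # subtraction fails

# Python parses - as left-associative and ** as right-associative.
left = 3 - 2 - 1        # read as (3 - 2) - 1
right = 3 ** 3 ** 2     # read as 3 ** (3 ** 2)
print(left, right, (3 ** 3) ** 2)
```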

2005/03/14 (permalink)

CGI Environment Bug Fixed in Python 2.4.1rc1

Looks like the CGI environment bug in Python 2.4 has been fixed for 2.4.1.

This bug prevented Leonardo's test server from working out of the box with Python 2.4 on Windows.

2005/03/14 (permalink)

SxSW: Malcolm Gladwell Keynote

Wonderful keynote from Malcolm Gladwell—a sampling of some of the anecdotes and key observations in his latest book Blink about the snap decisions we make and why they are sometimes so wrong.

Some of the issues raised were not surprising if unfortunate (predominance of white males in symphony orchestras until auditions were done behind a screen) or even tragic (the shooting of Amadou Diallo).

Where Gladwell was most fascinating, though, was when he pointed out some of the less obvious prejudices which can "hijack" our snap decision making. For example, he talked about how poor doctors are at diagnosing whether chest pains are a heart attack or not when presented with a wealth of seemingly relevant information. Doctors who are presented with less information can make a much more accurate diagnosis.

He also related this "less information can help you make better decisions" to things like the intelligence community, suggesting that the intentions of the Japanese in 1941 were clearer to people reading only newspapers than to the intelligence community with the wealth of information that effectively overburdened their ability to judge the overall pattern.

One audience member asked a great question about how Gladwell does his research. His response was both humorous and insightful. He just made sure everyone he came into contact with knew exactly what he was interested in at that time and talked about nothing else. Many of the stories in the book, he said, came from chance conversations with people.

I'm looking forward to reading Blink (bought a copy after the talk which Malcolm signed). If you get a chance to hear Malcolm speak, jump at it. He is a great speaker.

2005/03/13 (permalink)

SxSW: Hooligans Party

Had a great time at the Hooligans Party. SxSW volunteer Dave Dart and I played "isn't that guy in some film I once saw". People we unambiguously recognized included Elijah Wood (who spent much of the night DJing), Claire Forlani and Chris Masterson (who seemed like a really nice guy). I recognized David Krumholtz too but had to ask someone what his name was and what he'd been in. I didn't remember him from the films the person listed but a quick check of IMDb when I got back to the hotel revealed I know him from Freaks and Geeks.

Many of the people I talked to have films at the festival. Again it was nice to follow "no I don't have a film at SxSW" with "but I did make the XYZ film festival next month" and generally get the response "oh cool, that's a great festival".

It's always easy to ask filmmakers about their film. I never know what to say to an actor, especially a well known one. What would I have said to Elijah Wood? Maybe I'm not giving him enough credit. Chris Masterson was certainly approachable.

2005/03/13 (permalink)

SxSW: Missed Hooligans

I left my hotel at 6.05pm to catch the 7pm premiere of Hooligans. I got to the theatre at around 6.15pm and joined the end of a very long line.

Unfortunately, they stopped letting people in about 20 people in front of me. So I missed out on attending the premiere. I did, however, meet Nadine Takvorian in the line. Nadine is an illustrator whose short film Elegy made the official selection at SxSW.

2005/03/12 (permalink)

SxSW: Film Opening Party

Tonight was the opening party for the film stream. When asked if I was a filmmaker, it was nice to be able to say yes and name the festival we're in next month :-)

It was at this party last year that I met the guys behind I Am Stamos. This year I got talking to Zack Sanders, the producer of Southern Belles which is part of the official selection at SxSW. I'm looking forward to checking it out on Sunday night.

Saw Claire Forlani who's here for Hooligans which I plan to see tomorrow night.

Caught a glimpse of Stephen Tobolowsky.

I haven't fully planned tomorrow yet but there will certainly be conflicts between the film and interactive streams.

2005/03/12 (permalink)

What Vacation?

Yesterday at work, people kept wishing me well for my vacation. It was strange because I don't feel like I'm going on vacation. I feel like I'm switching to one of my other careers full-time for a week. Attending all three streams of SxSW (film, music and interactive) for a total of ten days isn't what I'd call a vacation. At least as fun. But not "vacation".

2005/03/11 (permalink)

SxSW: Tom Fulp and Alien Hominid

Just got back from an excellent talk by Tom Fulp opening the interactive stream. 26-year-old Fulp took us through how he co-founded the San Diego-based Behemoth indie game development studio to turn his hit Flash game Alien Hominid into the console game Alien Hominid.

Take aways:

2005/03/11 (permalink)

SxSW: Arrival

Got up at 3am to catch the early morning flight to Austin via Dallas Fort Worth. Boston is supposed to be hit with another round of snow. Austin weather couldn't be better (makes up for last year when it was raining).

Checked in to the Hilton right across from the convention center (it's the best place to stay if you're attending the conference parts of SxSW) and grabbed a bite to eat at the hotel grill before heading over to register.

Alas, the "computers were down", the line was hundreds long and not moving and I was told I should just come back later.

Crossing the street back to the Hilton, who should I bump into but Robert Scoble.

Was good to see you again after over four years, Robert!

2005/03/11 (permalink)

Atlanta Reality: Second Shift Editing

Last night, after my day job "shift", Tom and I did a work-day-length editing session on the Atlanta reality show concept.

It's coming together very nicely. Clips were reasonably well logged so it was fairly easy to find things when we wanted them.

We've put together a first cut of the first 3 minutes or so of a 5-10 minute demo. There were enough threads in the 12+ hours of footage to string together a story and weave in character intros and funny moments.

Lots of fun, although after a long editing session I have strange dreams - even when I'm awake it's hard to stop some footage from playing in the back of my mind.

2005/03/10 (permalink)

Final Days before SxSW

I'm leaving for SxSW early Friday morning. I had hoped to spend this evening editing the Atlanta Reality TV footage but a snow storm hit and I needed to head back to my hotel (I don't have a car and am relying on the last colleague to leave each night to drive me back).

If you're going to be at SxSW, drop me a line (see contact information). Looking forward to catching up with lots of people!

2005/03/08 (permalink)

Poincare Project: Switching from Analysis to Algebra

Previously, we defined the mathematical structure known as a manifold which is a topological space that is locally homeomorphic to R^n (and hence able to have the notion of a coordinate system or systems).

You may recall that when we started our journey, we began with the idea of adding structure to sets and took a step down the path of topology by introducing the notion of open sets which allowed us to, in turn, define the notion of continuity. That path led us to manifolds. If we continue down the path we'll get into analysis.

But at this point, we're going to go back to sets and take a different path; rather than take the path of continuity we'll take the path of discreteness. Whereas topology took us from sets to topological spaces to manifolds and the gateway to analysis, we will now explore the beginnings of group theory which will take us from sets to groups and the beginnings of abstract algebra.

Once we've spent a little time on group theory, we'll be ready to talk about the Poincaré Conjecture itself and also start laying the foundation for differential geometry, which is the basis for recent work on the conjecture as well as for Einstein's General Theory of Relativity.

2005/03/06 (permalink)

Otiose Apostrophes and SG-1

Dorothea Salo raises a question I've been wondering about myself for a while: what is it about scifi/fantasy and its love of the meaningless apostrophe?

A few months ago, during an all-day D&D session, my sister Jenni (who is a linguistics student) pointed out some of the names on the map contained apostrophes with no apparent linguistic meaning whatsoever.

Jenni and I also observed that Stargate SG-1 is particularly guilty of using the otiose apostrophe (e.g. Teal'c and many other names).

One interesting exception in Stargate is Goa'uld where the apostrophe could legitimately exist to indicate that 'a' and 'u' are separate syllables and not the diphthong 'au'.

But what is strange is that Goa'uld seems to be completely mispronounced by Daniel Jackson who is supposed to be a linguist (or is it an archaeologist this week? Or an anthropologist?). In the first two seasons I'm watching on DVD, I've heard three distinct pronunciations:

Jackson always says the first, Teal'c always says the third. Others vary (I think General Hammond uses the second).

2005/03/06 (permalink)

Leonardo Mailing List Down

Looks like the Leonardo mailing list has been down for the last week. Apologies to people on the list. I've emailed support at python-hosting.com so hopefully it will get sorted soon.

It happened last month too, although last time mail was still making it into the archives (just not getting distributed). Now it's not even getting that far.

2005/03/06 (permalink)

Bloglines Handling of Relative Links

Why is it that <a href="/leonardo">Leonardo</a> in Leonardo Mailing List Down is linked correctly to http://jamessaiz.en.wanadoo.es/leonardo when read in Bloglines but <a href="/nelson_james">Nelson James</a> in Blog of a Singer, Model and Actor is incorrectly linked to http://bloglines.com/nelson_james?

Is this a bug in Bloglines?
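A relative link only means something once it is resolved against a base URL, so the question comes down to which base the reader uses. Here is a minimal sketch with Python's standard `urllib.parse.urljoin`. The two base URLs are made up for illustration; only the host names come from the post.

```python
from urllib.parse import urljoin

# Hypothetical base URLs: a page on the feed's own site, and a page served by the aggregator.
SITE_BASE = "http://jamessaiz.en.wanadoo.es/blog/2005/03/06/example"
READER_BASE = "http://bloglines.com/myblogs_display"

# A path starting with "/" replaces the whole path of whichever base it is joined to.
resolved_against_site = urljoin(SITE_BASE, "/nelson_james")
resolved_against_reader = urljoin(READER_BASE, "/nelson_james")

print(resolved_against_site)    # http://jamessaiz.en.wanadoo.es/nelson_james
print(resolved_against_reader)  # http://bloglines.com/nelson_james
```

If the reader resolves against its own page rather than the entry's source, the second result is what you would see.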

2005/03/06 (permalink)

Blog of a Singer, Model and Actor

If anyone is interested in following the very early career of a singer, model and actor, I recommend subscribing to my good friend (and Nelson James front man) Nelson Clemente's blog. You'll read about modelling classes, auditions, photo shoots, songwriting, theatre and the occasional bit of SCADA engineering (his day job). Oh, and don't miss his advice on using hair removal cream!

2005/03/06 (permalink)

Why No Apple Pro Photo App?

For some creative software categories, Apple has three levels:

For example, there are the triples:

Sometimes the express level is missing:

It has long struck me as at least interesting that Apple doesn't have any more in the photo editing series:

There's nothing competing with Photoshop.

Actually, there's nothing competing with anything in Adobe's Creative Suite (which I just bought, incidentally):

What am I missing about either the Adobe-Apple relationship or the market for photo/graphics/print versus video/music?

UPDATE (2005-10-19): Now see Aperture

2005/03/05 : Categories apple (permalink)

Time Zones in Software

Because I spend a lot of time in different time zones, it affects me greatly how software deals with varying the time zone it is running in.

Previously when using a Windows-based laptop I always left the clock in one time zone. I didn't trust what Windows would do if I effectively wound back the clock 13 hours when travelling from Perth to Boston.

I've always liked the Linux approach of internally using UTC and making the particular time shown to the user a display issue. This means things should just work.

Now that I'm using a PowerBook with OS X, I figured I'd take the risk and I just changed the time zone from AWST to EST. The good news is that when I do an ls -l, the times have all changed which suggests that they are internally using UTC and just displaying in my local time zone.

So the filesystem does The Right Thing. The other big culprit tends to be calendar apps. In my experience, many calendar apps still do not take into account time zones. Terribly frustrating when the US team sends me an invite for a conference call when I'm in Australia. OS X's iCal looks like it does the right thing in changing the event times when one changes the time zone, but the problem seems to be that the original Outlook invite was assumed to be in local time.

So an invite for a 9am EST meeting appeared in iCal as a 9am meeting while I was in AWST but now that I'm in EST zone, it's appearing as an 8pm meeting. Not sure if that's iCal's fault or Outlook's. Either Outlook's invite didn't specify that the 9am start time of the meeting was EST or iCal ignored it.
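The 9am-becomes-8pm shift is exactly what happens when a start time with no zone attached is pinned to whatever zone the machine is in at import time. Here is a minimal sketch of that arithmetic using fixed UTC offsets. The date and the helper name are assumptions for illustration, and this says nothing about which application actually dropped the zone.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets are enough here: neither zone was on daylight saving in early March 2005.
AWST = timezone(timedelta(hours=8), "AWST")
EST = timezone(timedelta(hours=-5), "EST")


def as_displayed(naive_start, assumed_zone, display_zone):
    """Pin a zone-less start time to assumed_zone, then render it in display_zone."""
    return naive_start.replace(tzinfo=assumed_zone).astimezone(display_zone)


invite_start = datetime(2005, 3, 7, 9, 0)  # hypothetical date; "9am" with no zone attached

# Imported while the laptop was on Perth time, so 9am is taken to mean 9am AWST.
wrong = as_displayed(invite_start, AWST, EST)
# What the sender meant: 9am EST.
right = as_displayed(invite_start, EST, EST)

print(wrong.strftime("%Y-%m-%d %H:%M %Z"))  # 2005-03-06 20:00 EST
print(right.strftime("%Y-%m-%d %H:%M %Z"))  # 2005-03-07 09:00 EST
```

The 13-hour gap between the two results is the Perth-to-Boston offset mentioned earlier.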

I just noticed another aberration. Just after I changed my time zone, I noticed Entourage was saying it wouldn't be checking mail for another 692 minutes so clearly some part of Entourage is relying on local time rather than some fixed time like UTC.

2005/03/05 (permalink)

Busy Week

Busy week at work (for very positive reasons) so haven't had much time to do anything (including blogging). Expect more entries this weekend.

The only things I've really achieved besides work are organising the exhibition tape and press kit for the still unnameable film festival we got in to and capturing the Atlanta Reality Project footage on to disk.

Regarding the latter—which was done in the background while I coded Java in Eclipse—I now have 372 clips from 15 tapes. Next step is to annotate and organise the clips.

2005/03/04 (permalink)

Google Safari Maps

This morning I used Google Maps and it worked fine. Why is this so surprising? Well, I was using Safari.

Just a few days ago it didn't work on Safari. Now it does. Thanks Google!

2005/03/01 (permalink)

Film Project Update: Transferring to DigiBeta

The major film festival we got in to requires the exhibition "print" to either be Beta SP, DigiBeta or 35mm.

Given it would cost more to transfer to 35mm than the entire film cost to make and submit, it's out of the question.

DigiBeta is the highest quality option. I was all ready to pay a transfer house to go from DVD to DigiBeta when it was pointed out that DVD quality isn't actually as good as MiniDV.

I did a bit of research and here are the numbers. In all cases, we're dealing with Standard Definition 720x480 NTSC. The difference is in the colour subsampling and the amount of (lossy) compression.

DigiBeta is 4:2:2 (which means the colour resolution is half the luminance resolution) and has a data rate of 90Mbps.

MiniDV (in NTSC) is 4:1:1 (colour resolution is one quarter the luminance resolution) and has a data rate of 25Mbps.

The MPEG-2 codec used by DVDs is 4:2:0 (alternates between horizontal and vertical colour information with a resolution of half the luminance resolution—ultimate result is similar looking to 4:1:1) and has a data rate of 9.8Mbps on average, peaking to 15Mbps.

Now, the data rate isn't all that matters - a better codec will have a lower data rate for the same quality. However, I believe that the compression used by MiniDV and MPEG-2 is pretty similar and hence the more than doubling of the data rate in MiniDV is pretty indicative of an improvement in quality, even though from a colour point of view they both fall pretty much equally short of DigiBeta.
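The subsampling comparison can be checked with a little arithmetic. The sketch below counts colour samples per channel as a fraction of luminance samples for a 720x480 frame, using the usual reading of each scheme as a horizontal and vertical divisor. The table and function names are just for illustration.

```python
WIDTH, HEIGHT = 720, 480  # Standard Definition NTSC frame, as in the post

# (horizontal divisor, vertical divisor) applied to each colour channel
SUBSAMPLING = {
    "4:2:2": (2, 1),  # DigiBeta
    "4:1:1": (4, 1),  # MiniDV (NTSC)
    "4:2:0": (2, 2),  # MPEG-2 on DVD
}


def chroma_fraction(scheme):
    """Colour samples per channel as a fraction of luminance samples."""
    h_div, v_div = SUBSAMPLING[scheme]
    return ((WIDTH // h_div) * (HEIGHT // v_div)) / (WIDTH * HEIGHT)


for scheme in SUBSAMPLING:
    print(scheme, chroma_fraction(scheme))  # 0.5, 0.25, 0.25

# Ratio of the MiniDV data rate to the DVD figure quoted above
print(round(25 / 9.8, 2))  # 2.55
```

By sample count, 4:1:1 and 4:2:0 carry the same amount of colour information (a quarter of the luminance samples), just distributed differently, while 4:2:2 carries twice as much.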

Now here's the challenge: I'm in the US with a DVD. The raw MiniDV-quality footage is on my computer at home.

To get the best possible DigiBeta transfer, I need to get the raw file, either as an AVI or an uncompressed Quicktime. At 25Mbps, it should take up 3G which would fit on a DVD-R.
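As a rough check on the 3G figure, here is a sketch that converts between running time and file size at a constant data rate. It counts only the 25Mbps video stream (audio and container overhead are ignored), uses decimal gigabytes, and assumes a single-layer 4.7 GB DVD-R. The function names are made up for illustration.

```python
def footage_size_gb(minutes, mbps=25.0):
    """Size in decimal gigabytes of `minutes` of footage at a constant `mbps`."""
    bits = mbps * 1_000_000 * minutes * 60
    return bits / 8 / 1_000_000_000


def minutes_that_fit(size_gb, mbps=25.0):
    """Running time in minutes that `size_gb` decimal gigabytes holds at `mbps`."""
    return size_gb * 1_000_000_000 * 8 / (mbps * 1_000_000) / 60


DVD_R_CAPACITY_GB = 4.7  # assumption: single-layer disc

print(minutes_that_fit(3.0))                     # 16.0
print(footage_size_gb(16.0))                     # 3.0
print(footage_size_gb(16.0) < DVD_R_CAPACITY_GB)  # True
```

So a 3G file corresponds to roughly 16 minutes of video at this rate.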

So I'm going to have to get one (or both) of my sisters to make me a DVD-R of the 3G file and courier it as quickly as possible.

The kicker is: I had the 3G file in question on the laptop I'm using right now - but I deleted it the day I left for the US to make more room.

2005/03/01 (permalink)

Film Project Update: Success

Today I received an email to say that Alibi Phone Network has made the Official Selection at a major North American festival (it's considered top-25 but not top-10).

I can't say just yet who it is because they've asked me not to until the full program is official.

Suffice it to say, I am absolutely thrilled. If nothing else it means that a complete stranger who is into films liked this film.

Wow! I'm so excited.

Thanks to those people who've been following along on this blog and who have offered me encouragement along the way.

3 rejections; 1 selection; 18+ more to go!

2005/03/01 (permalink)

Atlanta Reality

What was I doing in Atlanta?

Here's a clue:

close up of video camera

I was down there with Tom Bennett filming a proof-of-concept for a reality TV show. I can't give too many details away at this stage but I will say that I have 12 hours of footage that I now need to edit down to about 10 minutes. That's a shooting ratio of 72:1 compared with 12:1 for Alibi Phone Network.

Mind you, it's not unusual for reality shows to have ratios exceeding 200:1. No wonder they say that feature films are a director's showcase, episodic television is a writer's showcase, but reality television is an editor's showcase.

I'm kind of bummed that I'm not going to get the chance to edit on my new system because I'm in the US and won't be back in Australia any time soon.

But the editing should be fun nevertheless. Shooting was certainly a blast. Tom hired a professional crew with reality experience and that was an incredibly worthwhile decision on his part. It freed up Tom to focus on directing and building the overall story and freed me up to take detailed shot logs that will hopefully help with editing.

2005/02/28 (permalink)

Leaving Atlanta

Sickness kept me from blogging much last week but from Thursday evening until now I haven't been able to blog at all because I've been down in Atlanta without Internet access.

What have I been doing down in Atlanta? I'll say more in another post. For now, I've got to get ready to hop on a plane back to Boston.

Tonight I'm having dinner with Mark Baker whom I haven't seen for about five years.

2005/02/27 (permalink)

Film Project Update: Third Rejection

I got back from Atlanta to receive our third straight rejection for Alibi Phone Network, this time from the Ann Arbor Film Festival.

2005/02/27 (permalink)

Funny Google Disclaimer

I'm still too sick to post anything thoughtful, but found the following funny.

I was trying out Google's new movie: operator and noticed the following disclaimer at the bottom of a results page with movie reviews:

The selection and placement of reviews on this page were determined automatically by a computer program. No movie critics were harmed or even used in the making of this page.

2005/02/24 (permalink)

Building an Apple I Replica

I haven't been blogging for a little while as I've been sick (caught something on the plane from NY to Boston, I suspect).

But the soon-to-be-released book Apple I Replica Creation (with a foreword from Woz) has got me excited enough to blog. Can't wait to buy it!

2005/02/22 (permalink)

Writers Guild Awards

I have to admit I was nervous going to the awards. I'm really not one for going to events where I don't know a single person. But I ended up having a wonderful time thanks to the delightful people they sat me with at the dinner, many of them writers just starting out.

Quite a few familiar faces from film and television were there, although I didn't talk to anyone whose face (or name) I knew prior to the evening. I would have liked to have talked to Alfonso Cuarón or James Schamus but I actually have no idea what I would have said to them :-)

2005/02/20 (permalink)

The Gates

Yesterday I went to Central Park to see Christo and Jeanne-Claude's The Gates.

While each individual "gate" is nothing impressive, the overall work is quite stunning for two reasons: one is just the sheer scope of it (7500 gates along tens of kilometres of paths), the whole being greater than the sum of its parts. The second thing that struck me was the number of people that had come to see it. There was a certain buzz in just walking along the paths with all the other people.

The Gates

A lot more (bigger) photos that I took are here.

2005/02/20 (permalink)

Cryptonomicon and Glass

I'm (finally) reading Cryptonomicon and I can't help but think that if it were ever made into a movie, Philip Glass should compose the score. It's Philip's music I hear as I read it and listening to the Glassworks album on the plane today just evoked images from the book.

2005/02/20 (permalink)

Last Day in New York

Today's my last day in New York. Tomorrow I fly to Boston.

Last night I had some great sake and sushi at Blue Ribbon with my good friend James Marcus and his friend Aaron.

The Writers Guild Awards are tonight. My plan is to go see Christo and Jeanne-Claude's The Gates, come back to the Marriott, change in to my tux and head over to the awards at the Pierre hotel.

2005/02/19 (permalink)

Leonardo 0.5.0 Released

I'm pleased to announce that version 0.5.0 of my Python blog/wiki/CMS software Leonardo has been released at http://jamessaiz.en.wanadoo.es/2005/leonardo/leonardo-0.5.0.tgz

As well as numerous bug fixes and internal enhancements, 0.5 offers the following features and enhancements over 0.4:

2005/02/17 (permalink)

Trying Sony

I'm at Melbourne Airport. No Bose QuietComfort headphones in sight. So I had to make the decision: do I go with the Sony MDR-NC11s on sale or do without anything for the next 20 hours of flying?

I decided to give Sony a try. The headphones are in-ear, which is a bit of a change for me. They were also half the price of the Bose. The packaging certainly isn't nearly as nice.

I haven't bought the AAA battery required for noise cancellation yet but one immediate difference I discovered with the Sony over the Bose is that the NC11s will still work as normal headphones without a battery.

The earbuds also manage to block out far more sound on their own than one would expect.

So far so good. But I'll report again on them after I arrive in New York and I've put them to good use.

UPDATE (a Melbourne-to-LA flight later): The Sony NC11s did a pretty good job although I still prefer the Bose (even taking into account the doubled price). While playing music, the Sony were almost as good as I remember the Bose being (although I didn't have the latter for a direct comparison). For pure noise cancellation, the Sony NC11s definitely fall short of the Bose QuietComfort although they were still effective and made the flight more bearable. Low frequency attenuation is probably pretty close to what the Bose achieves but I found that the NC11s didn't do as well at higher frequencies. The Bose aren't perfect in the upper frequencies but the NC11s are worse.

Had I had the option and knowing what I know now, I would have gone with the Bose. But the Sony headphones did the job and made the flight more bearable. They'll last me a while (I hope) and perhaps by then, Bose will have released a QC3.

2005/02/16 (permalink)

Amazing Python Hack

I don't normally post naked links here but this recipe in the ASPN Python Cookbook has to go down as the best Python hack I've seen.

Be sure to read the discussion too.

2005/02/16 (permalink)

Sciral Consistency

Via Jenni, Sciral Consistency:

Calendars are great for keeping track of tasks where you need to coordinate with others by setting fixed times and intervals.

To-do lists are great for keeping track of tasks that you will do once, and that you need to keep in order by priority.

But there's another class of activities for which neither traditional calendars nor to-do lists are optimal. If you already use a calendar and a to-do list, you're probably trying to wedge these tasks into those tools, without realizing that they really call for a new kind of tool. Sciral Consistency is that tool.

Looks promising—I'll report how I go using it.

2005/02/14 (permalink)

Quoting charset

My reading of RFCs 2068 (section 3.7) and 3023 (section 8.1) suggests that mimetype parameters can be quoted so that:


should be the same as


However, numerous browsers (including Firefox) seem to only recognize the latter and not the former.

Have I misread the RFCs or are the browsers wrong?
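For comparison, here is how one parser that follows the MIME parameter syntax treats the two forms. This is a minimal sketch using Python's standard `email.message.Message`. The header values are hypothetical stand-ins for a quoted and an unquoted `charset` parameter, and the helper name is made up.

```python
from email.message import Message


def charset_param(content_type):
    """Parse a Content-Type value and return its charset parameter."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param strips surrounding quotes by default (unquote=True)
    return msg.get_param("charset")


quoted = charset_param('text/html; charset="utf-8"')
unquoted = charset_param("text/html; charset=utf-8")

print(quoted, unquoted, quoted == unquoted)  # utf-8 utf-8 True
```

This shows only that treating the two as equivalent is what at least one standard-library parser does. It does not settle what any particular browser should do.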

2005/02/14 (permalink)

Not Without My QC

I've meticulously noted everything I need to do before leaving on my trip to the US but I just realised that I forgot one thing.

I've been meaning to replace my Bose QuietComfort ANC headphones. I bought a pair of the first generation product three years ago and they've made a huge difference during the hundreds of hours I've spent in planes since then. However, after three years of abuse, they are on their last legs.

I want to buy a pair of the second generation QuietComfort 2 headphones. Problem is, I have no idea where I can buy them (in person, they won't have time to be shipped) in Perth.

2005/02/14 (permalink)

Almost Ready for Another Trip (and a Geek Dinner)

Three more days until my next big trip to the other side of the world (if you calculate the point on the Earth opposite Boston, the closest city is Perth, where I live).
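The "point on the Earth opposite Boston" is easy to compute: negate the latitude and shift the longitude by 180 degrees. The sketch below does that and measures the great-circle distance from that point to Perth. The city coordinates are approximate, the Earth is treated as a sphere of radius 6371 km, and the code does not check the "closest city" claim against any other city.

```python
import math


def antipode(lat, lon):
    """Point diametrically opposite (lat, lon), both in degrees."""
    opposite_lon = lon + 180.0 if lon <= 0 else lon - 180.0
    return -lat, opposite_lon


def great_circle_km(a, b, radius_km=6371.0):
    """Haversine distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


BOSTON = (42.36, -71.06)   # approximate
PERTH = (-31.95, 115.86)   # approximate

opposite_boston = antipode(*BOSTON)
print(opposite_boston)                          # roughly (-42.36, 108.94)
print(great_circle_km(opposite_boston, PERTH))  # distance in km
```

The antipode lands in the Indian Ocean, south-west of Western Australia.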

I'm coming up to the one year anniversary of this blog. When I started blogging, it was just days before a trip. That trip started with the Academy Awards in LA and ended with SxSW in Austin. This trip is starting with the Writers Guild Awards in NYC and will contain (although not end with) a trip to Austin for SxSW. Interesting (if imperfect) parallel.

Most of my time will be spent in Boston. I'm thinking that it would be fun to organize a Geek Dinner while I'm there. So many other bloggers have Geek Dinners I can't make it to when I'm in Perth so I figure I'd better make one happen while I'm in Boston.

If you're in Boston and interested in meeting up, let me know!

2005/02/13 (permalink)

Poincare Project: Manifolds

A topological space is the most general space that has a notion of continuity, but as we discovered in recent instalments of the Poincare Project, most applications of topological spaces are restricted to a subset known as Hausdorff Spaces.

We're actually going to go a step further now and define a structure called a manifold. A manifold is the most general space that can have a coordinate system. It is the generalisation of what is often referred to as a surface and the foundation for things like vector calculus.

Define a chart to be a continuous one-to-one mapping from an open set to R^n.

A manifold is a topological space covered by one or more charts. In other words, every point (and some open set the point is in) is part of at least one chart.

One way to think of this is that an n-dimensional manifold is locally (but not necessarily globally) like R^n. It is the fact that a sphere is a 2-dimensional manifold that allows us to draw flat 2-dimensional maps of sections of it.

A chart provides a coordinate system and the coordinates of a point on a manifold are just the components of the point in R^n that the point on the manifold maps to in that chart.

For some manifolds, it is possible to cover the entire space with a single chart. Others need multiple charts. For example, no single coordinate system can cover a sphere continuously one-to-one: the coordinate system of latitude and longitude breaks down at the poles, where a single point on the sphere maps to an infinite number of points in R^2.
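The breakdown at the poles can be seen numerically. The sketch below maps (latitude, longitude) pairs to points on the unit sphere and checks that every longitude at latitude 90 gives the same point, while distinct longitudes on the equator give distinct points. The function names and the tolerance are choices made for this illustration.

```python
import math


def sphere_point(lat_deg, lon_deg):
    """Point on the unit sphere with the given latitude and longitude."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def same_point(p, q, tol=1e-12):
    """Compare two points up to floating-point rounding."""
    return all(abs(a - b) <= tol for a, b in zip(p, q))


# Many coordinate pairs, one point: the chart is not one-to-one at the pole.
pole_images = [sphere_point(90.0, lon) for lon in (-120.0, 0.0, 45.0, 180.0)]
pole_collapses = all(same_point(pole_images[0], p) for p in pole_images)

# Away from the poles, different longitudes are different points.
equator_distinct = not same_point(sphere_point(0.0, 0.0), sphere_point(0.0, 45.0))

print(pole_collapses, equator_distinct)  # True True
```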

Much of the foundational work on manifolds is due to Poincaré himself.

2005/02/11 (permalink)

Film Project Update: Second Rejection

We're now at 0-2:

An unfortunate fact of every film festival is that we receive more good films than we have room to show… and I am sad to inform you that I was not able to program Alibi Phone Network for this year’s Sonoma Valley Film Festival.

My original goal was 20% success so I'll start getting worried after the fourth straight rejection :-)

2005/02/11 (permalink)

Updated Python Trie Implementation

I previously wrote about my BetaCode to Unicode script which used a Trie.

A Trie acts like a dictionary but it allows you to match on longest prefix as well as exact matches.

I've now pulled out the Trie datastructure and made it available standalone at http://jamessaiz.en.wanadoo.es/2005/02/trie.py

I welcome any comments on how to improve it.
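For readers who haven't met the data structure, here is a minimal sketch of a trie with exact and longest-prefix lookup, built from nested dictionaries. It is not the contents of trie.py: the class layout, method names and sample entries are assumptions made for illustration.

```python
class Trie:
    """Dictionary-like structure that also supports longest-prefix lookup."""

    _VALUE = object()  # sentinel key marking "a value is stored at this node"

    def __init__(self):
        self._root = {}

    def add(self, key, value):
        node = self._root
        for ch in key:
            node = node.setdefault(ch, {})
        node[self._VALUE] = value

    def find(self, key):
        """Exact match: return the stored value, or None if key is absent."""
        node = self._root
        for ch in key:
            node = node.get(ch)
            if node is None:
                return None
        return node.get(self._VALUE)

    def find_longest_prefix(self, text):
        """Return (value, matched_prefix) for the longest stored key that starts text."""
        node = self._root
        best = (None, "")
        for i, ch in enumerate(text):
            node = node.get(ch)
            if node is None:
                break
            if self._VALUE in node:
                best = (node[self._VALUE], text[: i + 1])
        return best


# Illustrative BetaCode-style entries: alpha, alpha + smooth breathing, alpha + smooth + acute
greek = Trie()
greek.add("a", "\u03b1")
greek.add("a)", "\u1f00")
greek.add("a)/", "\u1f04")

value, matched = greek.find_longest_prefix("a)/nqrwpos")
print(matched, value == "\u1f04")  # a)/ True
```

Longest-prefix matching is what lets a converter consume "a)/" as one pre-composed character rather than stopping at "a".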

2005/02/10 (permalink)

Bandwidth Reduction Through Responsible Feeds

Back on the 23rd January, I switched jtauber.com over to a new version of Leonardo that:

What effect did this have on bandwidth? I think the results speak for themselves:

graph of daily bandwidth usage in January 2005

2005/02/10 (permalink)

Attending the Writers Guild Awards

Yesterday I received an invitation to the Fifty-Seventh Annual Writers Guild Awards in New York next Saturday (they must have figured it would cheer me up after being rejected by SxSW).

Anyway, I'm arriving in Boston two days earlier so making a trip to NYC is no big deal. The timing couldn't have been better, actually. Plus it gives me an excuse to pack my tux which I'll need anyway for all those film festivals I'm going to win at!

2005/02/09 (permalink)

Lineage of GMail Invites

I think it would be fun being able to trace one's GMail Invite Lineage.

I got mine from Mark Baker.

Now if Mark reads this and says in his blog where he got his from and that person says on their blog (assuming they've got one) where they got theirs from...

So a call to all bloggers with GMail accounts: spread the meme!

UPDATE (2005-02-09): Mark Baker replies:

I got mine from my old friend Nelson Minar, Google employee. I expect I know where he got his. 8-) Actually, let's save a whole lotta time and just ask him for a dump of the invites database. 8-)

Wow, I'm only two steps from the source!

I did figure Google could produce the family tree themselves. Perhaps someone can work on a browser into that for their 20% project.

2005/02/08 (permalink)

GMail invites available

50 invites available. Email me at jtauber@jtauber.com

2005/02/08 (permalink)

What a Difference RelaxNG Makes

Reading through the latest spec for the Atom Syndication Format, the thing that struck me was the use of RelaxNG Compact Syntax for the grammar. I wish every spec for an XML format did this.

Great work, Norm!

2005/02/04 (permalink)

Wikipedia URIs

Looks like the idea to use Wikipedia as a source of topic URIs has also occurred to fellow XMLer, David Megginson, who's just started a blog (welcome to the blogosphere, David!)

Technorati has already stepped forward with a mechanism for tagging using their URI space. One difference between that and what David and I are proposing, though, is that Technorati is just providing the URI space—the actual semantics of a tag are at the mercy of the interpretation of each tagger, with all the ambiguity that I've talked about before.

Wikipedia URIs have the advantage that disambiguation emerges quite quickly in the URI space. "python" is ambiguous as a tag, but the Wikipedia "Python" only refers to the snake and "Python programming language" is used for the programming language, "Monty Python" for the comedy group and "Rafael_Python_5" for the missile.

2005/02/02 (permalink)

Film Project Update: First Rejection

Unfortunately, this was the festival I most wanted to get in to.

Dear James Saiz,

We did not select Alibi Phone Network for screening at this year’s South by Southwest (SXSW) Film Festival.

2005/02/02 (permalink)

Server for Testing Trackbacks

Is anyone running a server available for testing trackbacks? Bryan Lawrence and I would like to implement trackback sending in Leonardo and would like somewhere to test on.

2005/02/01 (permalink)

Leonardo 0.5 Beta 1 Released

I've released version 0.5b1 of my Python blog/wiki/CMS software Leonardo. It's available at: http://jamessaiz.en.wanadoo.es/2005/leonardo/leonardo-0.5b1.tgz

There will be a handful more minor enhancements before a release candidate but I'm keen to get 0.5 wrapped up so I can start on trackbacks, comments and categories which will be the main themes of 0.6.

2005/01/31 (permalink)

Film Project Update: Festival Announcements Soon

According to Matt Dentler's Blog, SxSW will be announcing the short film program on February 14th so I guess I'll know by then at the latest whether Alibi Phone Network got in.

The other 17 festivals we've submitted to so far will be sending out notifications at various times over the next few months.

No acceptances yet but no rejections yet either so there's hope.

2005/01/30 (permalink)

Poincare Project: More on Separation Axioms

We previously defined types of topological spaces called T_0, T_1 and T_2 spaces. I've tried below to capture the distinction between them informally with a diagram.

diagram showing differences between T0, T1 and T2 spaces

Recall that a topological space is:

2005/01/29 (permalink)

jtauber, jtauberer, jtauberest

Recently a Technorati search for "Tauber" I subscribe to came up with various references to a Joshua Tauberer who won the Technorati Developers Contest for GovTrack.us.

Well, I discovered from Language Log that Joshua Tauberer is a doctoral student in linguistics at UPenn.

We actually have common interests within linguistics: formal syntax, dependency grammar, etc.

I got in contact with Josh and he pointed out how amusing it would be if we co-authored a paper. It's not entirely out of the question.

2005/01/29 (permalink)

Poincare Project: Separation Axioms

The definition of topological spaces is very general and allows for some rather unusual spaces that have properties quite different from R^n. Put another way, there are some intuitive properties one might expect of a topological space which turn out to not necessarily be true from definition.

For example, there is nothing in the definition which says that two points can't be in exactly all the same open sets. However, two points in exactly all the same open sets are topologically indistinguishable from one another. Topologically, they are the same point.

An example is the space {a,b,c} with the topology {{}, {a,b}, {a,b,c}} where there is no topological distinction between a and b.

Even though the definition allows it, we will be restricting ourselves to topological spaces where every distinct pair of points is topologically distinct. In other words, for any two points, there is an open set containing one but not the other. Such spaces are called T_0 spaces.

Furthermore, we will be dealing with spaces such that, for any two points, each is in an open set that the other one isn't. Such spaces are called T_1 spaces. The definition sounds very similar to T_0 but it is slightly more restrictive. T_0 only requires that one of the points is in an open set the other one isn't. T_1 requires this to be true of both points in the pair. Clearly all T_1 spaces are T_0 spaces.

A space that is T_0 but not T_1: {a,b} with the topology {{}, {a}, {a,b}}.

A further restriction turns out to be necessary in order to guarantee some of the intuitive characteristics of things like the real numbers.

In a T_1 space, we require that for any two points x and y, x is in an open set that y isn't (let's call it U) and y is in an open set that x isn't (let's call it V). There is nothing that says U and V are disjoint. They can intersect (as long as neither x nor y is in that intersection).

If a disjoint U and V exist for each pair of points, then we have what is called a T_2 space.
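For finite spaces the three axioms can be checked by brute force. What follows is only a rough sketch: the function names are made up for illustration, a topology is represented as a list of frozensets, and nothing checks that the list really is a topology.

```python
from itertools import combinations


def separates(topology, x, y):
    """True if some open set contains x but not y."""
    return any(x in u and y not in u for u in topology)


def is_t0(points, topology):
    """At least one point of every pair is in an open set the other isn't."""
    return all(separates(topology, x, y) or separates(topology, y, x)
               for x, y in combinations(points, 2))


def is_t1(points, topology):
    """Each point of every pair is in an open set the other isn't."""
    return all(separates(topology, x, y) and separates(topology, y, x)
               for x, y in combinations(points, 2))


def is_t2(points, topology):
    """Every pair of points has disjoint open sets U and V around them."""
    def has_disjoint_pair(x, y):
        return any(x in u and y in v and not (u & v)
                   for u in topology for v in topology)
    return all(has_disjoint_pair(x, y) for x, y in combinations(points, 2))


# The two examples from the text, plus the discrete topology on {a,b}.
indistinct = [frozenset(), frozenset("ab"), frozenset("abc")]
t0_not_t1 = [frozenset(), frozenset("a"), frozenset("ab")]
discrete = [frozenset(), frozenset("a"), frozenset("b"), frozenset("ab")]

print(is_t0("abc", indistinct))                         # a and b can't be told apart
print(is_t0("ab", t0_not_t1), is_t1("ab", t0_not_t1))   # T_0 but not T_1
print(is_t2("ab", discrete))                            # discrete spaces are T_2
```

One caveat: a finite T_1 space is always discrete (and so T_2), so a space that is T_1 but not T_2 has to be infinite and is out of reach of a check like this.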

It may seem an arbitrary restriction to go from T_1 to T_2, but it turns out that this additional requirement is what guarantees that a sequence has at most one limit, and it is a necessary condition for being able to define a metric on a space.

The additional axioms defined for T_0, T_1 and T_2 spaces say something about how separated the points have to be. For this reason they are referred to as separation axioms. There are more (T_3, T_4, etc) but, for our purposes, it is the T_2 axiom (and the T_1 and T_0 axioms that it implies) that is important to us.

T_2 spaces are also called Hausdorff spaces. From this point, pretty much all the topological spaces we deal with will be T_2/Hausdorff spaces.

2005/01/27 (permalink)

Happy Birthday Denise Tauber

Today is my mother's 56th birthday.

She taught me about family and about love. She taught me to listen to my conscience. She has always been there for me. Over the last few years, she's become a really good friend.

I love you mum. Happy Birthday!

2005/01/27 (permalink)

BetaCode to Unicode in Python

BetaCode is a common ASCII transcription for Polytonic Greek. I've been dealing with it for around twelve years. (As an aside, back in 1994, I designed a METAFONT for Polytonic Greek that enabled one to use BetaCode in TeX—I typeset my self-published Index to the Greek New Testament with it).

For the last six years, my preference has been to use Unicode, so I wrote a program (initially in Java but then in Python) that used a Trie to represent the multiple BetaCode characters that can map to a single pre-composed Unicode character.
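To illustrate the idea (this is not the code from beta2unicode.py), here is a minimal sketch of a trie doing longest-match conversion. The class, the function names and the tiny sample table are assumptions made up for the example; a real table would cover the whole of Polytonic Greek, and final sigma and capitals are ignored here.

```python
class Trie:
    """A character trie whose nodes may carry a replacement value."""

    def __init__(self):
        self.children = {}
        self.value = None

    def insert(self, key, value):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.value = value

    def longest_match(self, text, start):
        """Return (value, end) for the longest key found at text[start:],
        or (None, start) if no key matches there."""
        node = self
        best = (None, start)
        i = start
        while i < len(text) and text[i] in node.children:
            node = node.children[text[i]]
            i += 1
            if node.value is not None:
                best = (node.value, i)
        return best


# A few illustrative pairs only (uppercase BetaCode assumed).
SAMPLE_PAIRS = {
    "A": "\u03b1",    # alpha
    "A)": "\u1f00",   # alpha with smooth breathing
    "A)/": "\u1f04",  # alpha with smooth breathing and acute
    "E": "\u03b5",    # epsilon
    "E)": "\u1f10",   # epsilon with smooth breathing
    "L": "\u03bb",    # lambda
    "N": "\u03bd",    # nu
}


def build_trie(pairs):
    trie = Trie()
    for beta, uni in pairs.items():
        trie.insert(beta, uni)
    return trie


def convert(text, trie):
    """Replace the longest BetaCode sequence at each position;
    characters with no match are passed through unchanged."""
    out = []
    i = 0
    while i < len(text):
        value, end = trie.longest_match(text, i)
        if value is None:
            out.append(text[i])
            i += 1
        else:
            out.append(value)
            i = end
    return "".join(out)


trie = build_trie(SAMPLE_PAIRS)
print(convert("E)N A)/LLA", trie))
```

The longest match is what matters: "A", "A)" and "A)/" all share a prefix, and the trie lets the converter keep consuming diacritics until no longer sequence is possible before it commits to a pre-composed character.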

I've had a version available on this site since 2002, but I've now updated it to what I've been using for my most recent work. You can download it at http://jamessaiz.en.wanadoo.es/2004/11/beta2unicode.py

At some stage I'll better factor out the conversion pairs so the code is useful for other conversions. The Trie code might be useful for other contexts too.

(Also see Ricoblog's Converting Greek Beta Code into Normalized Unicode.)

2005/01/27 (permalink)

Upgraded Leonardo on this site

I've upgraded Leonardo on this site to an early beta of 0.5. Apologies if I've inadvertently broken anything.

New features that impact you, the reader:

The last two should greatly reduce my bandwidth usage (and help yours too).

2005/01/23 (permalink)

Poincare Project: More on Compactness

Previously, we defined what it means for a topological space to be compact. The definition ("every open covering has a finite subcovering") is precise but hard to get an intuitive understanding of (well, it was hard for me).

I found it helpful to have some examples of well-known spaces and whether they are compact or not:

There is an informal sense in which non-compact spaces keep on going, whereas compact spaces stop (or return you to where you started).

Within the context of the Poincaré Conjecture, we will largely be narrowing the spaces we are interested in to those that are compact.

2005/01/22 (permalink)

Tim on Tags

Tim Bray has a great post on tags. Some of the topics:

plus some great closing questions, none of which I have answers for myself, except maybe why I think I need categories in Leonardo.

Here are two user stories I wrote on the Leonardo mailing list:

Story #1: Albert occasionally says some good things about FOO on his blog and Planet FOO is interested in aggregating them. However, they don't want to aggregate Albert's non-FOO posts so they'd like a feed just of his FOO topics.

Story #2: Betty is working on a project and provides updates on her blog. She'd like to have a page that just contains the entries relating to this project.

2005/01/20 (permalink)

Comparison with A-List Blogger

Not sure if he would call himself an A-List Blogger, but Jeremy Wright's blog is certainly right up there. He's just published some stats so I thought it would be interesting to do a comparison between him and someone whose blog is a little further down the Long Tail (me!)

PubSub Rank: 441 versus 43,827

Technorati Rank: 1,458 versus 34,528

Monthly visitors: 208,000 versus either 8,000 (unique IP) or 36,000 (visits)

Monthly pageviews: 510,000 versus 92,000

But here is an interesting one that I looked up directly:

Bloglines subscribers: 207 versus 144

Which raises the interesting question: why does Jeremy have so few Bloglines subscribers given his other stats?

2005/01/20 (permalink)

Nerd God

Okay, so according to this test I'm a Nerd God with a score of 95.

2005/01/19 (permalink)

DATR, MorphGNT, RDF and Python

I've been revisiting DATR, the lexical knowledge representation language, as a possible format for the next generation of MorphGNT. I was previously considering developing my own RDF/graph-based format but I suddenly remembered DATR from my student days and it makes a lot more sense to use it rather than try to build my own.

Looking at DATR material, I haven't seen anything more recent than 1998 so I'm not sure if it's still the state-of-the-art. It's a natural fit for some kind of RDFization, something I'm sure I'll eventually end up doing if someone hasn't already.

Of course, I'll have to write Python code to manipulate DATR. Again, unless some already exists. But I'm almost hoping not as I love implementing specs, especially using test-driven development.

UPDATE 2005-04-19: Now see DATR in Python

2005/01/19 (permalink)

Petals Around the Rose

I'd heard about the dice-based brain teaser Petals Around the Rose but didn't read the details until tonight when I followed Bob Congdon's link to this page.

At the outset of reading the latter, I decided I would try to work it out myself and not cheat by Googling the answer. I had a couple of hypotheses but none of them worked past one or two of the sample rolls given.

By the end of the page, I hadn't worked it out. Bob Congdon had suggested "the less you think about it the easier it is to solve" so I stopped thinking about it altogether and started getting ready for bed.

Then it all of a sudden hit me! I rushed back to the computer and tested my hypothesis. I was right every time!

Go read the article for yourself. Even if you don't get it right away, stop thinking and let it come to you. Nothing beats working it out yourself.

2005/01/11 (permalink)

More on Lost CGI Environment Variables in Python 2.4

I previously mentioned problems a user was having running Leonardo under Python 2.4 on Windows and that I'd narrowed it down to CGIHTTPServer and the environment not getting populated.

Looks like others have had the same problem.

Pierre-André Côté pointed me to this bug report and fix at SourceForge and Markus Schramm suggested subclassing CGIHTTPServer with this workaround:

# There seems to be a bug in Python 2.4.0, that I could reproduce under
# Win98SE and WinXP Home SP1a (Python 2.3.4 works OK for both systems).
# The CGI variables are set inside the current process. Normally the
# CGI script will be executed in a new subprocess, but without this
# workaround the variables are not accessible there.
# Expected reason (some additional tests done):
# os.popen3(..) and os.popen4(..) do not correctly pass the modified
# os.environ to the new subprocess (Windows platforms only).
# Workaround:
# Redefine some class variable values to force a fallback mode that
# executes the CGI script in the current process.
# startswith('win') rather than "'win' in", so that 'darwin' is not matched.
if sys.platform.startswith('win') and sys.version_info >= (2, 4):
    have_fork = have_popen2 = have_popen3 = False

Markus also suggested the following override (unrelated to the lost environment problem):

# Overridden to avoid calling socket.getfqdn(host), which doesn't work on
# all machines and can be very time consuming (several seconds) when it fails.
# Note: Merely used for logging to the console.
def address_string(self):
    return '%s:%s' % self.client_address[:2]

Thanks also to Jeff de Wet.

2005/01/11 (permalink)

Learning a Language with Pimsleur

I've just finished the Pimsleur Italian I course. I cannot recommend enough the Pimsleur approach to language learning if you are learning on your own. It is expensive, but having completed my first, I think it's money well spent (and I've already bought Italian II).

I thought I'd share some tips I've picked up along the way. I should note that, although you can get through it in 30 days, it took me much longer due to various false starts. Which leads me to my first tip:

Tip #1

If you miss doing it for more than a few days, consider starting again. At least go back five to ten lessons. At first I couldn't bring myself to do it but then I reminded myself that the objective was to learn Italian, not get through the CDs in record time.

Tip #2

If possible, do one full lesson a day (30 minutes, although it drops to 25 after Lesson 9, in the case of Italian I, if you postpone the reading as I did). If you don't have 30 minutes a day, try overlapping over two days, e.g. 0:00-20:00 the first day, 10:00-30:00 the second day.

Tip #3

Never pause the CD to give you more time to answer. Much better to get used to thinking on your feet. If that's too hard, see Tip #4.

Tip #4

If you are having trouble with a lesson, go back and repeat the previous one. I found this incredibly useful and it enabled me to get through difficult patches (which I did find came every 5 lessons or so).

2005/01/11 (permalink)

Lost CGI Environment Variables in Python 2.4

Did something change between 2.3 and 2.4 that would affect os.environ being populated with CGI variables when using the BaseHTTPServer/CGIHTTPServer?

I received a bug report from someone trying to run Leonardo under Python 2.4 on Windows 2000. I was able to reproduce the problem under Python 2.4 on Windows XP Pro and confirmed that it worked fine under Python 2.3.

Simply printing out os.environ at the point things like PATH_INFO are extracted by Leonardo revealed that os.environ contained no CGI-related variables when run under Python 2.4 but did contain them under 2.3.
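A diagnostic along those lines might look like the sketch below. It is written for a current Python rather than 2.3/2.4, the helper names are invented for the example, and the variable list is just a handful of the standard CGI names, not everything Leonardo reads.

```python
import os

# A few of the standard CGI meta-variables (illustrative, not exhaustive).
CGI_VARIABLES = (
    "GATEWAY_INTERFACE",
    "PATH_INFO",
    "QUERY_STRING",
    "REQUEST_METHOD",
    "SCRIPT_NAME",
    "SERVER_NAME",
    "SERVER_PORT",
    "SERVER_PROTOCOL",
)


def cgi_variables(environ=None):
    """Return only the CGI-related entries present in the environment."""
    if environ is None:
        environ = os.environ
    return {name: environ[name] for name in CGI_VARIABLES if name in environ}


def missing_cgi_variables(environ=None):
    """Return the names from CGI_VARIABLES that are absent."""
    if environ is None:
        environ = os.environ
    return [name for name in CGI_VARIABLES if name not in environ]


if __name__ == "__main__":
    # Dropped into a CGI script, this shows what the server actually passed.
    print("present:", cgi_variables())
    print("missing:", missing_cgi_variables())
```

Run inside a CGI script under both Python versions, an empty "present" result on one of them would point at the environment never reaching the script rather than at anything the script does with it.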

I can't see in the code for the http server modules that anything changed in this area. Am I missing something?

2005/01/09 (permalink)

Bill Gates and the Creative Communists

By now most people in the blogosphere have heard about Bill Gates's statement, in response to a question on whether intellectual property rights need to be reformed, that "There are some new modern-day sort of communists who want to get rid of the incentive for musicians and moviemakers and software makers under various guises. They don't think that those incentives should exist."

A lot of people have been up-in-arms about the characterisation of groups like the Creative Commons folk as communists, but even if Bill was talking about Creative Commons, many of the criticisms I've read seem to miss the main point.

The key point to make, in the context of Creative Commons, is that CC isn't about legal reform—it's about helping creators to convey their licensing intentions within current copyright laws.

Yes I think current copyright terms are stupid, but your works don't have to be subject to them if you don't want. As a creator of the work, you have the control.

That is what was stupid about Michael Moore being okay about illegal copies of Fahrenheit 9/11 before the election. If he was the copyright holder, and he wanted it to be freely distributable for non-commercial purposes, he could have made the film available under a CC-like license. It's ridiculous to reserve all rights (or assign them to an entity that does) and then complain that people should be allowed to copy the work.

If Bill Gates was talking about Creative Commons, then his comment was a straw man. CC is about helping creators to realise the flexibility they have. To give them choices. Even expand the incentives. And there are great market opportunities for publishers, music distributors, etc who want to work with this flexibility too. Artist doesn't like the deal from the label? Go somewhere else like Magnatune. Consumer doesn't like the redistribution terms of the song? Don't buy it.

Some people find incentive in money, others in fame, others in making a lasting contribution. As long as people are free to pursue any or all of those paths, that sounds pretty good to me. If someone truly was wanting to get rid of incentives (of any kind), then I'd have a problem. In as much as Bill was saying that, then I agree with him.

UPDATE (2005-01-10): See Glen Otis Brown's post on the Creative Commons blog. Notice he characterises CC as a "voluntary, market-based approach to copyright". Just that one phrase pretty much makes the point this entire blog entry was trying to. And it pretty neatly sums up why I'm a fan of CC.

2005/01/09 (permalink)

Annotated Word Association Sketch

One of my favourite (and perhaps the cleverest) Monty Python sketches is John Cleese's Word Association.

Below is a transcript of the sketch which I have annotated according to an initial analysis performed by my sister Jenni and me. There is an underlying talk being given and I have marked that up in bold. I have tried to group the phrases resulting from a word association on distinct lines and have repeated in parentheses where a word forms part of two overlapping word associations that aren't part of the main text or where a difference of spelling exists. (To recover the sketch, just ignore the parentheticals.)


2005/01/09 (permalink)

Leonardo 0.4.1 Released

Leonardo is the Python software that runs this site, providing a blog and wiki-like content management system for personal websites. Leonardo requires Python 2.3 but no additional software as it uses the filesystem directly for storage.

0.4.1 fixes a bug that prevented editing of the css stylesheet.

Leonardo 0.4.1 is available at: http://jamessaiz.en.wanadoo.es/2005/leonardo/leonardo-0.4.1.tgz

2005/01/07 (permalink)

Feedster Interesting Feeds of the Day

I just discovered Feedster's Interesting Feeds of the Day and, to my shock, discovered I was chosen as the Interesting Feed of the Day back on 14th November.

Given some of the other blogs linked to, I can't help but feel (like I did when I was put alongside people like Eric Schmidt and Ray Ozzie by Network World magazine in 2000) that it's just a mistake that will soon be corrected.

Then again, they didn't say, "good", or "enjoyable" or "informative". Just "interesting".

2005/01/07 (permalink)

Delicious Trackbacks and Leonardo

Peter Sefton has another great post about his team's intended use of del.icio.us to share bookmarks within the group.

The downsides Peter points out got me thinking about Leonardo acting as a delicious server.

The advantage of del.icio.us (the actual site, not the software or idea) is that it aggregates a lot in one place. But for more specialised categories, running a delicious-like service for your domain of interest isn't a bad idea and that's where Leonardo could come in.

I've previously suggested that trackbacks could be used for annotating resources and that categories could be viewed as resources that entries in that category track back to. Pinging delicious is really just like a trackback but the actor isn't necessarily the source and the target is a category/tag rather than another entry.

So a team could set up a Leonardo server (once the functionality I'm talking about has been implemented) and set up categories for the team. When they come across a resource of interest they use a delicious/trackback-like API to tell that Leonardo server about the resource.

Of course, there's nothing specific to Leonardo there. See, for example, this delicious clone (via Steve Mallett).

Another interesting result is that you've essentially namespaced your tags. So "leonardo" on jtauber.com can mean the software without clashing with other senses the tag might be used for.

2005/01/07 (permalink)

Who's Coming to SxSW?

I registered back in September but now that it's getting closer, I thought I'd ask again if there's anyone reading this that is planning on attending. Send me an email. We'll catch up.

The speakers for the Interactive stream have been announced. The Interactive stream is where I know the most people but, like last year, I'm also attending the Music and Film streams as well.

And, with any luck, Alibi Phone Network will be at the film festival!

2005/01/06 (permalink)

NetNewsWire and Flagging Items

When I came back in October from my extended trip to the US, I thought I'd try out NetNewsWire and I've used it ever since (previously I was using Bloglines in a browser).

I've done a good job of getting my unread entries to zero each day but that's misleading because what I find myself doing is flagging items and never going back to them.

What I need at a bare minimum is a display of how many flagged items there are. NetNewsWire has a virtual folder that automatically contains all flagged items. It would be nice if it displayed not only the number of unread items but the overall total.

Without a total count, it's easy to forget you've got stuff in the folder and having an exact number makes it much easier to set goals like "I'll keep my flagged items below 100".

In fact, as a general rule, hierarchical containers of items to read should show both the unread count and the total count. I find it frustrating that most email clients don't show both. But for now I'd just be happy if NetNewsWire did it.

I flag items for a variety of reasons:

So the second thing that would be nice is multiple flag types. That way I could distinguish entries to keep from ones I need to act on in a timely fashion. I'm thinking just a variety of colours and a virtual folder for each flag type.

Hopefully these two features are on their way in NetNewsWire. Brent?

2005/01/06 (permalink)

Translations, Glosses, Tags and Folksonomies

There's been some recent discussion on Slashdot and in the blogosphere on the incremental, bottom-up taxonomies ("folksonomies") created via tags in things like del.icio.us and Flickr.

Besides the fact that I've long been interested in taxonomies, I've been thinking about some of these issues recently because (a) I'll soon be implementing categories in Leonardo and (b) I've just started reading John Lee's A History of New Testament Lexicography (which, for all you New Testament Greek scholars out there, is a must read).

What does New Testament Lexicography have to do with del.icio.us tags? Read on.

When I'm explaining to people some of the challenges with translation and reading translated works (whether the New Testament or any other work), I like to use the following Venn diagram:

Two intersecting circles, one marked A, the other marked B. The intersection is marked '2'; the part of A not intersecting B is marked '1'; the part of B not intersecting A is marked '3'

Consider A to be the word in the original language and the circle on the left to represent the range of possible meanings of that word. A translator chooses to translate A as the word B, with the circle on the right representing the range of possible meanings of that word.

Very few words match up between two languages. There will be senses of A that B doesn't have (marked '1' above) and senses of B that A doesn't have (marked '3').

The first thing that can go wrong is the translator assuming the wrong sense of A. If the original author meant '1' then B will be a bad translation.

But even if the translator gets the sense right there is still the possibility that the reader of the translation will assume the wrong sense of B (marked '3').

This challenge arises not only in translating texts but also in dictionaries and this is where Lee's book is so fascinating. Looking up an individual word in a bilingual dictionary is subject to the same challenge, particularly if the dictionary just provides a gloss rather than a full definition. In just providing a gloss (an equivalent word in the target language) there is a risk that a user of the dictionary will take the wrong sense of the gloss.

Full definitions are generally much better, although, as Lee points out, there are cases where a gloss does just fine and is even preferable. χιών is adequately defined by the gloss snow and there is no need to define χιών as "the aqueous vapour of the atmosphere precipitated in partially frozen crystalline form and falling to the earth in white flakes" (which is how one dictionary, cited by Lee, defines "snow").

In the realm of New Testament Lexicography, lexicons such as Louw and Nida's Greek-English Lexicon of the New Testament Based on Semantic Domains do an excellent job of teasing out the different senses of Greek words and making clearer which senses of corresponding English words they map to.

What does all this mean for tags? There is a tremendous practicality in tag-based folksonomies but they do suffer from many of the same problems as glossing. Perhaps the biggest issue is disambiguation. A given tag can have multiple senses.

Say I used the tag "leonardo" for my software. I'd then need to come up with a different tag if I wanted to talk about Leonardo da Vinci. If I'd talked about the latter first and chosen "leonardo" for him, I would have then needed to come up with a different tag for my software.

That doesn't sound that big a deal, but in a common tag set, it's much more difficult to coordinate that kind of disambiguation. Someone might have already started using "leonardo" for one sense and another come along and used it for another sense without realising.

In a way, the problem is that the tags are their own gloss. There's no definition of what their sense or scope is. How might one provide a disambiguated version of a tag, without adding complexity that would drive people away from using them at all? Using URIs instead of tags is, of course, the "right" thing to do (in as much as it would provide a unique identifier for each sense) but it just won't fly with the majority of Flickr or even del.icio.us users.

That's why I previously suggested Wikipedia as the basis for disambiguation. Wikipedia provides an excellent platform for disambiguation, not at the level a lexicographer or translator might expect, but good enough that it would provide enough benefit for the cost in folksonomy tag disambiguation.

Also see Tag the Tags which suggests an easy way to add expressiveness to the tagging approach to classification without adding too much complexity.

2005/01/05 : 0 trackbacks : 0 comments (permalink)

Tag the Tags

After my previous entry on Translations, Glosses, Tags and Folksonomies, I started thinking about some of the other limitations with tags as well, including normalising synonyms and expressing hierarchy or grouping.

One solution could solve both. If you could have a tag that meant the union of certain other tags, you could create a parent tag for all the synonyms. Taxonomists might shudder at the conflation of synonymity with grouping but it seems entirely appropriate for a folksonomy.

Of course, any tag should be allowed to have multiple parents where it fits into multiple larger categories (noting that parents generalise, not specialise)

How might these groupings be expressed? Well it just dawned on me:

Tag the Tags

If tags themselves could be tagged, a much richer taxonomy could be built. You could have tag groupings, tag types, etc. And none of it would interfere with the existing data.
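As a rough sketch of what that could look like (the class, its methods and the sample tags are all hypothetical, not anything del.icio.us or Leonardo provides): items carry tags as usual, tags can carry parent tags, and looking up a parent tag returns the union of everything filed under its descendants.

```python
from collections import defaultdict


class TagStore:
    def __init__(self):
        self.item_tags = defaultdict(set)    # item -> tags applied to it
        self.tag_parents = defaultdict(set)  # tag  -> tags applied to that tag

    def tag_item(self, item, *tags):
        self.item_tags[item].update(tags)

    def tag_tag(self, tag, *parents):
        """Tag a tag: each parent generalises the tag it is applied to."""
        self.tag_parents[tag].update(parents)

    def expand(self, tag):
        """Return the tag plus every tag that is directly or transitively
        tagged with it. The 'found' set also guards against cycles."""
        found = {tag}
        frontier = [tag]
        while frontier:
            current = frontier.pop()
            for child, parents in self.tag_parents.items():
                if current in parents and child not in found:
                    found.add(child)
                    frontier.append(child)
        return found

    def items_for(self, tag):
        """Items carrying the tag or any tag grouped under it."""
        wanted = self.expand(tag)
        return {item for item, tags in self.item_tags.items() if tags & wanted}


store = TagStore()
store.tag_item("post-1", "nyc")
store.tag_item("post-2", "newyork")
# Synonyms share a parent; the parent in turn belongs to a wider grouping.
store.tag_tag("nyc", "new-york-city")
store.tag_tag("newyork", "new-york-city")
store.tag_tag("new-york-city", "cities")

print(sorted(store.items_for("new-york-city")))
print(sorted(store.items_for("nyc")))
```

Nothing about the existing item tags changes here: the tag-on-tag relation lives in its own table, which is why it wouldn't interfere with existing data, and a tag can have as many parents as it likes.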

Of course, it's poor-man's RDF with only one property and tags instead of URIs, but, hey, it just might work for folksonomies.

2005/01/05 (permalink)

Give Elsewhere Too

With all the coverage the Tsunami disaster is getting, it's easy to forget there are other places in the world hurting too.

So here's a suggestion: pick another project or appeal your favourite aid organisation is currently working on and match dollar-for-dollar what you gave for the Tsunami appeal. (I chose the Red Cross's Sudan Appeal.)

Who knows, it might become an addiction :-)

2005/01/05 (permalink)

Film Project Update: DVDs Finally Arrive

I previously recounted the ongoing saga of the DVDs of Alibi Phone Network that Tom sent UPS Next Day Air on 18th December.

Well, they finally arrived, 17 days after they were sent. The wait was worth it, though. They came out really nicely.

2005/01/04 (permalink)

Priority Levels

I've talked about the difference between priority and severity and proposed possible severity levels. Here's my current thinking on priority levels in issue tracking systems for software development (and specifically Leonardo's Roundup tracker).

Most systems I've seen simply use numbers for priorities: P1, P2, P3, P4 and P5, for example. To really know which one to use, groups end up assigning particular meaning to them.

Outside of software development, I've found it useful to think of priority in terms of modals like:

and, where appropriate adding things like:

I think it's useful in software development to think of those modals in the context of releases. e.g. this bug must get fixed for 0.5 or that enhancement should get in 0.6.

This is already better than P1-P5 but there are some complications that need to be addressed.

Firstly, if one assumes that maintenance patches are still taking place on previous releases, it is possible for issues to have priorities relative to multiple releases. For example, a bug might be a "must for 0.5" and a "should for 0.4.1". How would one express that in an issue tracking system?

Secondly, one probably needs finer grained priorities on occasions when, of the 20 things that must or should get into the next release, there are 5 that must get done in the next week and another 5 that should.

The second is less of an issue for Leonardo in my opinion, but it would still be nice to address.

One way of addressing the former, especially if one assumes that one is only actively maintaining one version prior (reasonable at this stage of Leonardo at least) is to have separate priority fields: one for the upcoming patch and one for the upcoming new version.
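To make the two-field idea concrete, here is a small sketch. It is not Roundup's schema or API; the class names, the field names and the restriction to just "must" and "should" are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Modal(Enum):
    MUST = "must"
    SHOULD = "should"


@dataclass
class Issue:
    title: str
    patch_priority: Optional[Modal] = None  # relative to the upcoming patch release
    trunk_priority: Optional[Modal] = None  # relative to the upcoming new version


def describe(issue, patch_release, trunk_release):
    """Spell out an issue's priorities against the two active releases."""
    parts = []
    if issue.trunk_priority is not None:
        parts.append(f"{issue.trunk_priority.value} for {trunk_release}")
    if issue.patch_priority is not None:
        parts.append(f"{issue.patch_priority.value} for {patch_release}")
    return " and ".join(parts) or "unprioritised"


bug = Issue("example bug",
            patch_priority=Modal.SHOULD,
            trunk_priority=Modal.MUST)
print(describe(bug, patch_release="0.4.1", trunk_release="0.5"))
```

Leaving either field empty covers the common case where an issue only matters for one of the two releases.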

So both "patch priority" and "trunk priority" could have values like:

What priorities have others found useful?

2005/01/03 (permalink)

More on Priority and Severity

Previously I talked about wanting to separate priority and severity in Roundup and proposed some severity levels:

At the time I left open how to handle features and what priority levels could be.

I just did a Google search on 'priority severity'. My previous blog entry was actually the first hit. The second hit was a page on the original c2 wiki espousing the principle DifferentiatePriorityAndSeverity.

Some commentators suggested a distinction was a nice idea in theory but too hard for submitters in practice. What I am suggesting, though, is that the submitter need only worry about severity, the developer fixing the bug about priority and only the people responsible for triage really need to worry about the relationship between the two.

One commentator mentioned Microsoft's severity levels as being:

This seems reasonable, although the second and third might get blurry unless it's clear what the granularity of a "feature" is (which is probably spelt out in specs at Microsoft). It also doesn't take into account whether a workaround exists which I think is important.

I do like the calling out of a crash or loss of data. And I did have a slight problem with my own earlier list in distinguishing "major" and "minor" which the Microsoft list doesn't worry about.

With regard to features, I think it is helpful to distinguish new features from enhancements to existing features. This may suffer from some of the same granularity problem mentioned earlier for the Microsoft severity levels but I think it makes sense in the context of a particular project's plan.

As I mentioned in my previous entry, I think there's also a need to handle code cleanup, refactoring and other internal enhancements. I previously suggested a possible separate class, but I think it can be done with severity. I think it also helps to have a catch-all "general tasks" for when it isn't clear there is (yet) a suitable level to assign the issue to.

So one possible severity level list would be:

Note that in both the B-series and E-series, the lower number means greater significance.

Next up, some thoughts on priority.

2005/01/03 (permalink)


2005/01/03 : Categories linguistic_observations (permalink)

Leonardo 0.4.0 Released

I am pleased to announce the release of Leonardo 0.4.0.

Leonardo is the Python software that runs this site, providing a blog and wiki-like content management system for personal websites. Leonardo requires Python 2.3 but no additional software as it uses the filesystem directly for storage.

Leonardo 0.4.0 is a complete re-architecture designed to facilitate future implementation of a wide variety of features including trackbacks, comments and latex-generated images.

Leonardo 0.4.0 is available at: http://jamessaiz.en.wanadoo.es/2005/leonardo/leonardo-0.4.0.tgz

2005/01/02 (permalink)

Content made available under a Creative Commons Attribution-NonCommercial-ShareAlike license