James Tauber : James Tauber's Blog 2004/12

journeyman of some

Poincare Project: Open Coverings and Compactness

If you pick a collection of open sets whose union is the space's entire set, then that collection is called an open covering of the space.

For example, consider the set {a, b} with topology { {}, {a}, {b}, {a, b} }. One open covering would be:

{ {a}, {b} }

Another would be

{ {a}, {a, b} }
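Checking that a collection covers the space is mechanical; here is a small Python sketch of the check (mine, not part of the original post):

```python
def is_covering(collection, X):
    """True if the union of the collection of open sets is all of X."""
    return set().union(*collection) == set(X)

# The two example coverings of {a, b} from above:
print(is_covering([{"a"}, {"b"}], {"a", "b"}))       # True
print(is_covering([{"a"}, {"a", "b"}], {"a", "b"}))  # True
```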

Clearly it is possible to cover any finite topological space with a finite number of open sets.

It is also possible to cover any infinite topological space with a finite number of open sets. Because X is an open set in any topology on X, a collection consisting of just X itself is an open covering.

If an open covering has a finite subset which still manages to cover the entire set, the covering is said to have a finite subcovering.

Some topological spaces have the property that every open covering has a finite subcovering. Such a space is said to be compact.

Compactness is a topological property. Recall that this means if a topological space is compact, any topological spaces homeomorphic to it will also be compact (and also that a homeomorphism can't exist between a compact topological space and one that is not compact).
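For contrast, a standard example (not from the post): the real line with its usual topology is not compact, because the covering by the open intervals (-n, n) has no finite subcovering.

```latex
\mathcal{U} = \{\, (-n,\, n) : n \in \mathbb{N} \,\}
\quad \text{covers } \mathbb{R}, \quad \text{but} \quad
\bigcup_{n=1}^{N} (-n,\, n) = (-N,\, N) \neq \mathbb{R}
\quad \text{for any finite } N.
```

Any finite subcollection has a largest n = N, so its union is just (-N, N), which misses the point N.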

2004/12/30 : Categories poincare_project (permalink)

Favourite Posts of 2004

I mentioned in Blog Hits by Age that I would, as others have done recently, list my favourite entries from my blog this year.

Here are the ones that come to mind. Some generated some good discussion in the blogosphere at the time; others disappointingly didn't generate any response at all.

In no particular order...

Conference Reporting

Questions and Observations

Programming Ideas

Little Python Scripts

Typed Citations Meme

Aggregation versus Hosting Meme

XML versus RDF Meme

2004/12/30 (permalink)

Upgrade Apologies

I've upgraded this site to Leonardo 0.4.0rc3. Apologies to feed readers for the numerous atom entries whose modification dates got changed as a result.

2004/12/29 (permalink)

More On LinkRanks Ups and Downs

Recently, I observed large jumps in the PubSub LinkRanks for jtauber.com and attributed it to influential sites coming in and out of the 10-day window PubSub uses.

However, in a comment to Trevor Cook's entry on the jumps, the PubSub CEO responded:

The reason for the sudden shift is that we increased the granularity of how we measure linkranks. Specifically, we added individual blogs from the various hosting services for the first time (e.g. livejournal.com/johndoe) - that has suddenly shifted everyone's ranking. Bob Wyman, our CTO, dropped 30,000 places (much to his chagrin). Check out his blog for more details - http://bobwyman.pubsub.com

While it has obviously affected some blogs in the downward direction, I've been sub-50,000 ever since.

LinkRanks PSI

2004/12/29 (permalink)

Film Project Update: The Long Journey Home

Sending the DVDs to festivals has, so far, gone smoothly. The same can't be said for the 20 Tom sent back to me (recall the mastering had been done in Australia but the duplication in the US).

Tom went to the UPS shop on Saturday 18th December. Once the package was in the system, they were claiming an arrival estimate of Wednesday 22nd December. This seemed optimistic at the time. I thought the 23rd was a possibility. Working backwards, that would mean it would have to arrive in Sydney on the 22nd and hence leave California by late on the 20th.

The package, however, was not even picked up from the UPS shop until 6pm on Monday 20th. This meant it didn't fly out of New Hampshire until 10pm that night. At that point I knew it probably wouldn't make it by Christmas. The arrival estimate, however, was still showing 22nd.

It arrived in Ontario, California at 7.10am on 21st and within a few hours, had been seized by customs (or whatever "PKG DELAY-ADD'L SECURITY CHECK BY GOV'T OR OTHER AGENCY- BEYOND UPS CONTROL" means).

Finally at 3.44am on 23rd December, another hub scan was done. The arrival estimate was still showing as 22nd but at least there was a chance it was going to make it on a flight pretty soon and make it into the country by Christmas at least.

But, alas, no new scans. It didn't make it on a flight on 23rd or the 24th. By 27th there was still no new scan. Then on 28th December, ten days after the package had been sent, there was another hub scan done at Ontario, California. It still hadn't left the US!


I can only guess that the customs officials just really like the film. Hey guys, keep a couple of copies, just send the rest on please!

UPS is still showing the arrival estimate as...you guessed it...22nd December.

2004/12/29 (permalink)

TeX for Leonardo

Looking at Wikitex (via Simon Willison) has convinced me more than ever that I want support for TeX in Leonardo.

Hopefully 0.5 will have the framework (if not the actual implementations) to support a range of underlying document formats including TeX, XHTML and Word.

2004/12/23 (permalink)

Poincare Project: The Standard Topology for Ordered Sets

One common way of defining a topology is to take a set, add some structure to that set, define a collection of subsets that meet some criteria in that structure and then use that collection as a basis for the open sets.

Although we didn't have the vocabulary to accurately describe it in those terms, that's what we did previously with the topology of a metric space. A metric space, recall, adds to a set the structure of a distance function. From this, we can define the collection of open balls. This collection can then form the basis for the other open sets in a topology.

Here is another example. Take a set X and add to it the structure of a total ordering. A total ordering is a relation < such that, for any elements a, b and c: exactly one of a < b, a = b or b < a holds; and if a < b and b < c then a < c.

In other words, a set with a total ordering is a set whose elements can be sorted.

Now define an open interval (a, b) to be the subset of X such that, for each element x, a < x and x < b.

The open intervals form the basis for a topology. So a total ordering on a set defines a particular topology. While other topologies are possible, the one based on the open intervals is referred to as the standard topology for the ordering or the order topology.

The real numbers, being a totally ordered set, have an order topology. While other topologies can be defined on the real numbers (as long as the rules for open sets are followed), the order topology is the most natural and the most consistent with one's intuitions about how the real numbers work.
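Since every open set in the order topology is a union of open intervals, testing membership is straightforward. A minimal Python sketch on the reals (the set U is an invented example):

```python
def in_open_set(x, intervals):
    """intervals is a list of (a, b) pairs; the open set is the
    union of the corresponding open intervals (a, b)."""
    return any(a < x < b for a, b in intervals)

# An open set in the standard topology on the reals:
U = [(0, 2), (3, 5)]
print(in_open_set(1, U))    # True
print(in_open_set(2.5, U))  # False
```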

2004/12/23 (permalink)

Blog Hits By Age

I was going to give a Top Ten Blog Entries By Number of Hits listing but I suspected it would not necessarily be that insightful under the hypothesis that hit numbers are partly a function of the age of the entry.

So I took the number of hits for each entry and graphed it against the age of the entry in days:

Blog Hits By Age

There definitely appears to be a linear baseline which the entries "rise above". To make this clearer, I graphed the hits per day against age:

Blog Hits Per Day By Age

Notice that the two entries from 250-300 days ago drop in significance while the entry from 50 days ago rises considerably. Which entries were these?

The older two are Eclipse is the next Emacs and Eclipse GEF. Both those get a lot of their referrals from Google searches.

The entry from 50 days ago is, funnily enough, another Eclipse GEF-related post, Six Snapshots of a Simple Eclipse GEF Application. Note that that entry is linked to from one of the older ones.

So, what effect does using average hits per day instead of just hits have on a Top Ten Blog Entries?

Here is a list of the top 10 just by hits:

And here is a list of the top 10 by hits per day (ignoring the last couple of days):

Is the second list more representative? I think so. It includes some extra entries (in bold) that were popular (judging by incoming links and del.icio.us citations) but didn't make the first list because they hadn't been around for as long.
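In code, the difference between the two rankings is just the sort key; the figures below are invented for illustration, not my actual hit counts:

```python
# (title, total_hits, age_in_days) -- invented figures
entries = [
    ("Older entry with steady search traffic", 3000, 290),
    ("Recent entry with a burst of links", 900, 50),
]

# Raw hits favour old entries; hits per day corrects for age.
by_hits = sorted(entries, key=lambda e: e[1], reverse=True)
by_hits_per_day = sorted(entries, key=lambda e: e[1] / e[2], reverse=True)
```

The older entry tops the raw-hits list, but at 18 hits/day versus roughly 10, the recent entry tops the normalised one.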

How does any of this match up with what I consider my own favourite entries? I'll save that for another entry.

2004/12/23 (permalink)

LinkRanks Ups and Downs

PubSub LinkRanks seem to be very sensitive to very recent activity, which means one's rank can jump around a lot. I'm guessing this is particularly true at the long tail, where just one link can leapfrog you over hundreds of thousands of fellow bloggers.

Yesterday I was 938,610, today I am 89,060. I've been sub-100,000 before but I also spend time around the 1,000,000 mark if I haven't been linked to in the last week or so.

Oddly, Trevor Cook and others are reporting their rank has dropped recently. Perhaps some highly weighted bloggers just dropped out of the time-weighted window of referrers for their sites.

Or maybe PubSub have changed their algorithm. They say they are still refining it.

Incidentally, I'll use the recommended PSI so PubSub know I'm talking about them.

2004/12/22 (permalink)

Happy Birthday Konrad Tauber

Today is my father's 56th birthday.

He opened up both the world of computers and the world of business to me. He gave me endless opportunities while always leaving the path up to me. He also taught me that business is about people.

I love you dad. Happy Birthday!

2004/12/22 (permalink)

Flickr and DataLibre

Darren Barefoot has come around on Flickr after earlier making the very DataLibre comment "I’ve yet to be convinced that the best place for my online photos isn’t on my own site."

He says it's the convenience that's won him over. Any feature in particular, Darren?

I certainly have found it easier to put photos up on Flickr than on jtauber.com, but that's just because of the current state of Leonardo. There's no reason why, in the future, Leonardo couldn't provide things like Windows Publishing Wizard support and iPhoto integration to make it just as easy to get stuff up on my own website.

But even then, I might still consider using Flickr. As I've mentioned before, I'm interested in separating aggregation and hosting, not eliminating aggregation. I should be able to take advantage of Flickr's aggregation by pointing them to my self-hosted photos.

2004/12/22 (permalink)

Branching in Subversion

I'm just about to release Leonardo 0.4.0 so I thought I'd better learn how to branch in Subversion. It turned out to be embarrassingly easy:

svn copy trunk branches/0.4

assuming you've got the entire tree checked out (otherwise it can be done almost as easily with URLs).

But it did get me thinking. Previously I've talked about replacing the structure recommended by the O'Reilly Subversion book

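The layout the book recommends is the familiar one:

```
/trunk
/branches
/tags
```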

with more explicit indications of what I use tags for:


with further structure possible under the first four directories before getting to the actual source code.

Well, if I understand correctly, there is nothing special about the /trunk directory. I'm not even sure Subversion really has a notion of a trunk. So why not only have branches?

In other words, instead of keeping the latest development under /trunk and maintenance branches under /branches, why not have a branch for the current development version alongside the branches for maintenance. Something like:


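Sketched out (directory names taken from the Leonardo versions mentioned in this post):

```
/branches
    /0.4    (maintenance)
    /0.5    (current development)
/tags
```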
where (in Leonardo's current state), next-version development takes place under /branches/0.5 and maintenance on 0.4 is done under /branches/0.4

Unless I'm missing something, this seems a clean way of organising things that is native Subversion. The original suggestion given by the Subversion book really makes sense only if you're coming from CVS.

Again, unless I'm missing something :-)

UPDATE (2004-12-23): Justin Johnson, in email noted:

The reason for using trunk is so that developers can continue working on the latest release without having to set up a new working copy every time the project releases. For example, if I were working on 0.4 and then 0.4 released and we created a 0.5 branch, I'd have to clobber my working copy and create a new one. But if I were looking at the trunk, I would be guaranteed that it always points to the latest release that is still in development. It may seem like a minor point, but when you have a lot of developers and when the size of the project is significant, it makes a huge difference.

This is a good point. I did consider the issue of "knowing which is the development branch" and that actually made me wonder about having aliases in Subversion.

However, in my own experience, for commercial software development at least, the developers (even on big projects) all know exactly what version is the latest development version and it is an important "event" in the engineering organization when a new branch is made.

I can see that, for distributed open source development, particularly if the cycles are short, a clearly designated trunk becomes more important, though.

2004/12/22 (permalink)

XML Elements versus Attributes

Ned Batchelder discusses the old question of elements versus attributes in XML. As I've been answering that question for over seven years in various places, I thought I'd put down my viewpoint here.

Firstly, there are distinctions based on performance or API usability. Those distinctions are so implementation-specific, I don't think they are very interesting; certainly not to someone doing schema design.

Secondly, there are distinctions based on a particular schema language. Different schema languages have different levels of expressiveness so it's important to distinguish the characteristics of elements and attributes inherent to XML from those that are true only because of the particular choice of schema language. One important take away here is that a schema is only part of the description of a markup language. In my experience there are always constraints placed on a language beyond what the schema (in any schema language) can say.

Thirdly, there are distinctions inherent to the XML syntax itself; things like the lack of attribute order or the inability to have further XML structure within an attribute value.

But when all those three are considered, there is still a fundamental "style" question around attributes and elements and here is where a lot of people really find themselves asking the elements versus attributes question.

My take on that is that the distinction is more meaningful the more markup-oriented your XML is, and fuzzier the more data-oriented your XML is.

If you are using XML to serialise objects, then the distinction is blurry and it largely comes down to convention and things like the third type of distinction above. In such cases, an element-only approach might make perfect sense, especially if you are using a schema language that can express characteristics that, in DTDs, attributes had over elements, like default values or insignificant ordering.

But if you are truly doing markup, in other words annotating text (particularly a pre-existing text) then the distinction between attributes and elements becomes much clearer and the reason why attributes exist in XML (and SGML) is far more obvious. The key is that attribute values are considered part of the markup, rather than part of the content. So the clearer the distinction is between markup and content, the clearer it will be between using attributes or child elements.

Imagine that you want to describe Max as a black cat. From a data structure representation point of view, there's no semantic distinction between:

<cat><name>Max</name><colour>Black</colour></cat>

and

<cat name="Max" colour="Black"/>

and so decisions about whether to use elements or attributes tend to boil down to (a) whether order matters; (b) whether values can have internal structure; (c) compactness or whatever.

However, if you are doing document markup, things are a little different. In the document markup case, you have some existing text that you annotate. So you start with a word "Max" in your document and you want to mark that up with a generic identifier and any additional properties you want to give that word (or its referent). You might end up with something like:

<cat colour="Black">Max</cat>

Making colour a child element rather than an attribute wouldn't make sense from a document markup perspective. In document markup there is a much clearer distinction between content and markup. "Max" is content. "Black" is markup. If you made "colour" a child element with "Black" as content then "Black" would change from being markup to content. Makes no difference in data structure representation but it does in document markup.

From a data structure representation point of view, this attribute/element distinction is so blurred that it is entirely possible to do away with attributes in representations (and sometimes less confusing to do so). This is even more the case where you have schema languages that allow expression of the fact that element order (in a particular context) is not significant.

But in pure document markup applications, where attributes are just indicating characteristic qualities of an element's content, they have a clearer role.

2004/12/21 (permalink)

Film Project Update: Ten More Festivals

Just submitted Alibi Phone Network to ten more festivals: Phoenix FF, Palm Beach IFF, Newport Beach FF, Atlanta FF, Beverly Hills FF, San Fernando Valley IFF, Independent FF of Boston, Malibu IFF, Seattle IFF and IFP/Los Angeles FF.

2004/12/21 (permalink)

Alexa Does DataLibre Right (Almost)

I was fiddling around with Amazon.com's Alexa and discovered they provide a very DataLibre-style way of updating one's site information:

To update your contact info, you may place an info.txt file containing your contact info in the root of your site for Alexa to fetch.

Right-click this link: info.txt. And save it to your computer. Copy the info.txt file from your computer to the root of your site. Verify that the info.txt file is there with your browser. (Go to http://www.jtauber.com/info.txt.) Once you have verified that the file is there, tell us to fetch it by clicking this link: Go Fetch

Well done Amazon! Now if Bloglines did it with OPML, LinkedIn with FOAF, Freshmeat with DOAP, etc...

UPDATE (2004-12-22): Gary Fleming thinks info.txt is a bad idea. I agree with him. While I still like the DataLibre aspect of what Alexa does, Gary's entry persuaded me that requiring a fixed path "/info.txt" is the wrong way to do it. I should have been able to give Alexa my own URI. DataLibre means owning your own URI space too. Thanks Gary for making me realise that!

2004/12/21 (permalink)

New Mac for Audio and Video

For years, I've dreamed of having a computer dedicated to video and audio editing. It's always been hard to do because the moment I get a fast new machine with lots of memory and disk space, I want to move over to using it for everything. But I'm resolved this time to "keep it pure".

I got a PowerMac dual 2.0GHz G5 (on principle, I always buy the second-fastest processor available, on the thinking that the state-of-the-art is overpriced for the people who will pay anything to get the best) with 2.5GB RAM, 2x250 HDDs and a GeForce 6800 GT card. I had earlier bought a 23" Cinema HD screen which I was running off my 12" Powerbook but now it belongs to the PowerMac.

(Actually, losing the 23" screen is going to be the toughest part of "staying pure" as I'm now back to 12" for things like Leonardo and MorphGNT. I might have to share the screen - that's not cheating is it? Do they make KVMs that work with Cinema HD screens?)

I spent a good part of today doing OS updates and installing Apple's Production Suite (Final Cut Pro HD, Motion and DVD Studio Pro). The machine came with OS X 10.3.4 which didn't have support for the 6800 card so I had to put a different graphics card in, upgrade to 10.3.7 and then put the 6800 back in.

The Production Suite install went smoothly. When it came to ProTools LE 6.1, things didn't go so well.

Until now, I've been running ProTools off my Windows machine. I'd forgotten just how much of a pain it was getting ProTools to work last time. ProTools is very picky about hardware and OS. I think I finally got it to work on Windows by upgrading my HDD drivers.

Anyway, I wasn't expecting any problems with my new Mac. But lo and behold, when I started up ProTools for the first time on the Mac, I got an error message (actually it was error code 1). A quick Google result on the DigiDesign discussion board indicated that error 1 meant that ProTools didn't like the OS version.

The next major version of ProTools is due soon so I wonder if that will work. Hopefully in the meantime there is a minor release that works on OS X 10.3.7. Going to investigate now...

UPDATE (2004-12-18): Looks like upgrading to ProTools LE 6.4 did the trick.

2004/12/18 (permalink)

Priority, Severity and Roundup

I'm a big fan of roundup as a bug tracking system. It does, however, come with an odd list of default priorities:

One thing I don't like about it is that it conflates priority and severity. I think it's useful in a bug tracking system to distinguish the two. While they are often related, it is possible to have a high-priority, low-severity bug (e.g. an embarrassing typo in the UI the day before an important customer meeting) and a low-priority, high-severity bug (e.g. the software crashes on an unsupported OS).

Severity, in my view, is about the impact on what the user is trying to do. Severity is fairly easy for the submitter to judge. Priority, on the other hand, is more of a triaging issue that needs to take into account a number of factors the submitter might not be privy to. So priority is best assigned in some separate review session. That is not to say the submitter can't be involved in that review — just that others need to be involved too, so priority can't generally be judged at the time of submission.
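To make the separation concrete, here is a hypothetical issue record in Python; the scale names and field values are invented for illustration and are not roundup's actual schema:

```python
# Invented scales -- not roundup's defaults and not my actual lists.
SEVERITIES = ["cosmetic", "minor", "major", "critical"]
PRIORITIES = ["low", "normal", "high", "urgent"]

issue = {
    "title": "Embarrassing typo in UI before customer meeting",
    "severity": "cosmetic",  # impact on the user: set by the submitter
    "priority": "urgent",    # triage decision: set later, in review
}
```

Keeping the two axes as separate fields is the whole point: the submitter fills in severity; priority stays unset until the review session.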

Here is a list I came up with a few years ago for the severity of bugs:

Any alternative lists people have used and found useful?

Note that features aren't included here. I'm not sure that features should be treated as a level of priority or severity. I like the approach of them being a completely different issue type. I also think there's value in having a "task" type which covers things that aren't features or bugs but nevertheless benefit from being tracked. The only problem I see with different types is that, as a developer you really want to see all your issues at once, whether they be features, bugs or tasks. It isn't clear to me how one would do that in roundup.

UPDATE (2005-01-03) : Now see More on Priority and Severity

2004/12/17 (permalink)

Nominations Open for 2005 Australian Blog Awards

see http://kekoc.com/wp/archives/2004/12/14/2005-australian-blog-awards-nominations/

2004/12/17 (permalink)

Leonardo Release Candidate

The first release candidate for Leonardo 0.4 is available at http://jamessaiz.en.wanadoo.es/2004/12/leonardo-0.4.0-rc1.tgz. Let me know if you encounter any problems. If all goes well, Leonardo 0.4 will be out by the end of the year.

2004/12/16 (permalink)

Why Couldn't They Have Had Blogs in 1986

I was reminiscing with my parents this evening about my first year of high school, which I did by correspondence because we were living in Brunei at the time. My mum reminded me that the thing I hated most was having to write a journal for English.

My teacher didn't care what I wrote, as long as I wrote something. But I always found it difficult, perhaps because the act of writing something down on paper and posting it off to my teacher in Australia made it all seem so formal.

How much easier it would have been if blogs had existed back in 1986!

2004/12/15 (permalink)

It Took Me A Lot Longer

Scoble mentions that today is the fourth anniversary of his blog and he credits Dave Winer as one of the people that talked him into it.

Thinking back, four years ago was the EDevCon conference in New Orleans that I gave a Web Services keynote at. Scoble was the organizer. I also met Dave Winer there for the first time (and Brent Simmons). Dave has a picture to prove it (that's me with the Slashdot fleece :-)

Whatever Dave said to Scoble to talk him into blogging, he mustn't have said to me, but I got there eventually.

2004/12/15 (permalink)

Architecture of the World Wide Web, Volume One

The Architecture of the World Wide Web, Volume One has become a W3C Recommendation.

Congratulations to the W3C TAG. This is a great piece of work (even if the title does sound like a Mel Brooks movie) and provides an invaluable foundation for the design of Web-based systems.

Where Leonardo has failed to embody the terminology, principles or best practices of this document, I consider that to be a bug in Leonardo.

2004/12/15 (permalink)

Thoughts on GNT-NET Parallel Glossing Project

Zack Hubert mentions that I'm thinking about using the NET Bible for a collaborative parallel glossing project.

Here is how it might work:

The user is presented with the Greek text and the NET text.

Consider Luke 1.1. The Greek reads:

Ἐπειδήπερ πολλοὶ ἐπεχείρησαν ἀνατάξασθαι διήγησιν περὶ τῶν πεπληροφορημένων ἐν ἡμῖν πραγμάτων,

The NET reads

Now many have undertaken to compile an account of the things that have been fulfilled among us,

It should be possible to select any number of words in the Greek and any number of words from the NET and assert that they correspond (or link) to one another. There is no need to link between the entire verse of Greek and the entire verse of the NET because that link has already been made automatically.

Say the user selects Ἐπειδήπερ. They should then be shown the part-of-speech and parse information for the word (in this case C) as well as the lexical form, ἐπειδήπερ. The user should also be shown all previous glosses for ἐπειδήπερ in other contexts.

The user is then instructed to select the word or words that directly translate ἐπειδήπερ. In this case, the user selects Now and submits.

The user need not progress in order. Say the next thing they select is the word πραγμάτων. As before, they are shown the part-of-speech and parse information (N-GPN) and the lexical form, πρᾶγμα. Again the user is shown previous glosses. These glosses should include those specifically for πραγμάτων as well as other forms of πρᾶγμα, perhaps displayed differently.

The user then selects things and submits.

It should be possible to select multiple Greek words and link them to just one word from NET. It should also be possible to select one Greek word and link it to multiple words in the NET. Many-to-many links should also be possible. For example, a user could select περὶ τῶν πεπληροφορημένων ἐν ἡμῖν πραγμάτων and of the things that have been fulfilled among us and submit that linkage.

It is also possible that some words won’t link to anything.

Many-to-many linkages should be encouraged where the particular sense of a word is entirely determined by its use in a sequence (such as an idiom).

Users should be discouraged from doing many-to-many linkages where the sequence isn't a grammatical unit such as a phrase. For example, a user shouldn't submit a link between περὶ τῶν and of the. This clearly can't be enforced.

Users should be required to log in before they can submit linkages. Each linkage will be stored with the email address of the person that made the linkage.

While users may be encouraged to work on particular verses, they should be free to go to whatever verses interest them. Duplicate effort is not a problem and provides redundancy. The data can be checked later for inconsistencies.
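A sketch of how such linkages might be stored; the class and field names here are purely illustrative, not an actual schema from the project:

```python
from dataclasses import dataclass

@dataclass
class Linkage:
    """A (possibly many-to-many) link between Greek and NET words."""
    verse: str            # e.g. "Luke 1.1"
    greek_words: tuple    # positions of the selected Greek words
    net_words: tuple      # positions of the selected NET words; may be empty
    submitter: str        # email address of the logged-in user

links = []

def submit_linkage(verse, greek_words, net_words, submitter):
    """Store one linkage; duplicates from other users are fine."""
    link = Linkage(verse, tuple(greek_words), tuple(net_words), submitter)
    links.append(link)
    return link

# One-to-one link: the first Greek word to the first NET word.
submit_linkage("Luke 1.1", [1], [1], "user@example.com")
```

One-to-many, many-to-one and many-to-many links all fall out of storing tuples of positions on each side, and a word that links to nothing is just an empty tuple.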

2004/12/14 (permalink)

MorphGNT v5.05 Available

See MorphGNT.

2004/12/14 (permalink)

Film Project Update: DVDs and More Festivals

We've just submitted Alibi Phone Network to five more festivals: Newport, Sedona, Vail, OC and Sonoma Valley.

It was our first submission using professionally duplicated DVDs rather than making copies ourselves. We got a batch of 100 done, of which I expect around 50 to be submitted to festivals.

2004/12/14 (permalink)

Best Use of MorphGNT So Far

Zack Hubert has taken my MorphGNT and built a GNT Browser that blew me away! It displays the text in the browser; hover on a word and the lemma and parsing is shown in a pop-up; click on the word and you get a graph of word occurrence by book with the ability to list all occurrences.

I've toyed with web interfaces to the MorphGNT for years but nothing even remotely as slick as this.

2004/12/14 (permalink)

Ground Loop

The last few days I've been reorganising my home office / recording studio (unfortunately, they are still the same thing).

When I plugged my Korg Triton LE into my Digidesign Digi002 I noticed the distinctive hum of a ground loop. I've never had to deal with a ground loop before. Basically they occur when one device's path of least resistance to the ground is through the audio cable. The result is a low hum at AC frequency (50Hz in Australia).

So I hopped on to the excellent home-recording mailing list to ask what I should do.

Rodrigue Amyot came to my rescue with some things to try. The first possible problem we identified was that the Korg's power cable is only a two-pin one (what were they thinking!).

Another possibility Rod raised was mixing balanced and unbalanced devices. I don't know what the Korg is (my Roland keyboard definitely has balanced outputs) and I don't know what the Digi002 takes although I would guess balanced. My cabling assumes both are balanced.

Unplugging the power to the Korg still left the hum which suggested it wasn't a power ground loop problem after all.

Still working on the problem. Audio electrics is fun.

2004/12/13 (permalink)

Blog Goals or Lack Thereof

Dorothea Salo in Caveat Lector comments on how odd it seemed being asked how her blog was going. I think I would react the same way.

Ask how my music's going, or my filmmaking, or my morphological analysis of the Greek New Testament and I'd be able to tell you. They are projects, or at least interests manifesting as specific projects. Even the Poincare Project is foremost about me taking notes on my way to understanding the (possible) proof of the Poincare Conjecture. The use of the blog for those notes is largely incidental to that goal.

Blogging in and of itself isn't a project for me. I think that's largely because I don't have goals for it. Sure I track referrer logs and webstats, etc. Sure I get a thrill when Mark Liberman likes an idea of mine or Doc Searls doesn't. But they aren't accomplishments tracked against some schedule. I don't have monthly Scoble linkblogging targets.

Not that there's anything wrong with that. But for me, like Dorothea, blogging is scribbling. Occasionally making announcements, but mostly just scribbling.

2004/12/13 (permalink)

On the Red Couch

No, I'm not appearing on Scoble's Red Couch (I wouldn't say no, though) but Nelson James will be on this red couch next Sunday.

That's right, the pop duo I'm in has been invited back (always a good sign) to perform on local chat show, The Couch, for their Christmas special.

UPDATE (2004-12-14): Unfortunately, there is a conflict with a play that Nelson is in and so we've had to cancel our television appearance. However, we should be appearing some time in the new year.

2004/12/12 (permalink)

More on Typed Citations

I've written before about the idea of typed citations.

Mark Liberman (who I might have studied under if I'd gone ahead with my PhD application to UPenn) comments on the idea of typed citations with some excellent thoughts. One thing that I realised, reading Mark's post: I probably wasn't clear that I was envisaging a controlled vocabulary, much like XFN has.

The notion of typed citations relates to trackbacks, a topic I've also talked about before. Bryan Lawrence (who has recently become my main sounding board in the development of Leonardo) asks about semantics in trackbacks. He is talking about typing the source object rather than the relationship, but the two are related. In RDF terms, one is a class, the other is a property. I would love to see both able to be expressed in a trackback.

2004/12/12 (permalink)

Poincare Project: A Basis for a Topology

Because of the requirement that unions and finite intersections of open sets must also be open sets, you don't need to specify every open set in order to define a topology. You can characterise a topology by describing a certain class of open sets from which the other open sets can be calculated.

Such a class is called a basis for the topology.

Because members of the basis are themselves open sets, once we have a basis we can generate all the other open sets by taking unions.

A random selection of subsets of X isn't always going to give us a basis for a topology on X any more than it gives us a topology, so what restrictions exist on a basis to ensure it can generate a topology?

Clearly every element in the set X must appear in at least one basis open set. Otherwise that element would miss out on being in any open sets (and we know that, by definition, X itself must be open).

There is one more requirement, however, that must be met. Consider X = {a, b, c}. The open sets {a, b} and {b, c} cannot form a basis, because if {a, b} and {b, c} are both open then their intersection {b} must be open too. But {b} cannot be open because it isn't a union of basis open sets.

To avoid this, we have the additional requirement on a basis as follows:

if x is in the intersection of two basis open sets then x must also be in a third basis open set which is a subset of the intersection.

This, along with the requirement that every element must appear in at least one basis open set, is sufficient to ensure that one has a basis for a topology.
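For a finite space, the two requirements above can be checked mechanically. Here's a small Python sketch (my own naming, purely for illustration) that tests whether a collection of subsets satisfies them:

```python
from itertools import combinations

def is_basis(X, basis):
    """Check (for a finite set X) whether a collection of subsets of X
    satisfies the two requirements for a basis."""
    # Requirement 1: every element of X lies in at least one basis set.
    if not all(any(x in B for B in basis) for x in X):
        return False
    # Requirement 2: if x is in the intersection of two basis sets,
    # some basis set containing x must sit inside that intersection.
    for B1, B2 in combinations(basis, 2):
        common = B1 & B2
        for x in common:
            if not any(x in B3 and B3 <= common for B3 in basis):
                return False
    return True

X = {'a', 'b', 'c'}
print(is_basis(X, [{'a', 'b'}, {'b', 'c'}]))          # False: fails at {b}
print(is_basis(X, [{'a', 'b'}, {'b'}, {'b', 'c'}]))   # True: adding {b} fixes it
```

The first example is exactly the {a, b}, {b, c} case from above; adding {b} to the collection repairs it.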

2004/12/10 (permalink)

Shift to Aggregator Use

I noticed some interesting numbers in my website logs that suggest a significant shift towards aggregator use when reading this blog.

In October, there were 772 unique IP hits to the full-text atom feed. In November, that number was 941. That's a more than 20% increase.

However, October saw 3228 unique IP hits to blog pages compared with only 2600 in November. A just under 20% decrease.

Now this might not have been caused by a shift from people reading in a browser to people reading in an aggregator but it does seem plausible, even likely.
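For anyone wanting to check the arithmetic, the percentages come straight from the counts above:

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

print(round(pct_change(772, 941), 1))    # feed hits, Oct -> Nov: 21.9
print(round(pct_change(3228, 2600), 1))  # page hits, Oct -> Nov: -19.5
```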

2004/12/09 (permalink)

MorphGNT v5.04 and Beyond

I've released a new version of my MorphGNT.

Details of the changes are on the MorphGNT page but they all stem from a simple query performed via a Python script: in cases where there is no parse-code (i.e. the word is essentially uninflected), is the text form the same as the lexical form (other than accentuation)?
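The query itself is simple enough to sketch. Assuming (hypothetically) that each record carries a parse code, a text form and a lexical form, with dashes marking the empty slots of an uninflected word's parse code, something like this would flag the cases in question:

```python
import unicodedata

def strip_accents(s):
    """Remove accents so forms can be compared modulo accentuation."""
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if not unicodedata.combining(c))

def inconsistencies(rows):
    """Yield rows where an essentially uninflected word's text form
    differs from its lexical form, ignoring accentuation. Each row is
    assumed to be a (parse_code, text_form, lexical_form) tuple with
    an all-dashes parse code meaning "no parse code"."""
    for parse_code, text, lexical in rows:
        if not parse_code.strip('-') and strip_accents(text) != strip_accents(lexical):
            yield (parse_code, text, lexical)

rows = [('--------', 'καί', 'καί'),   # consistent (differs only in nothing)
        ('--------', 'δέ', 'δή')]     # flagged: δε vs δη
print(list(inconsistencies(rows)))
```

The field layout here is my own invention for the sketch; the real database format differs.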

In some cases this rule means that new lexical forms need to be provided to allow for spelling variation, rather than the lexical form normalising spelling. This is an editorial decision I've made that makes more sense in the larger picture of where I'm going with the MorphGNT.

The corrections I'm making to the CCAT database are really just a side-effect of my efforts to build an original database of New Testament Greek morphology. I'll say more about it as it develops but the idea is that surface forms, lexical forms, spelling variations, roots, stems, suppletion, morpho-phonological rules, etc. will all be catalogued with relationships between them expressed as a directed labelled graph.

Eventually, the MorphGNT will reference into this graph rather than merely give the lemma. There'll be a partial ordering of nodes in the graph (expressed by a subset of arc types), and so references will be to the node that is as general as possible while still explaining the specific surface form.
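To make the directed labelled graph idea concrete, here's a minimal Python sketch. The class, arc labels and example nodes are all hypothetical, just to illustrate the shape of the structure:

```python
from collections import defaultdict

class MorphGraph:
    """A directed labelled graph: nodes are forms (surface forms,
    lexical forms, stems, roots, ...), arcs carry relationship labels."""

    def __init__(self):
        # arcs[source] is a list of (label, target) pairs
        self.arcs = defaultdict(list)

    def add_arc(self, source, label, target):
        self.arcs[source].append((label, target))

    def related(self, node, label):
        """All nodes reachable from `node` via an arc with `label`."""
        return [t for (l, t) in self.arcs[node] if l == label]

g = MorphGraph()
g.add_arc('λόγου', 'surface-of', 'λόγος')  # surface form -> lexical form
g.add_arc('λόγος', 'stem', 'λογ-')         # lexical form -> stem
print(g.related('λόγου', 'surface-of'))    # ['λόγος']
```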

2004/12/09 (permalink)

Integrating Subversion and Roundup

I'm using Subversion for Leonardo and have recently started using Roundup for issue tracking.

I'd like to have some level of integration between the two. The sort of thing I was initially thinking of was being able to associate an issue with a revision and vice versa.

The Roundup wiki gives an example of making something like Version:37 in an issue message automatically get turned into a link to the version control system (or something like ViewCVS).

Because Roundup is extensible in the object types it manages, one could presumably go a step further: define a class called "change" and extend Subversion so that, every time a commit is done, a new change object is created for it in Roundup, including the commit message.

References to issues could then be made in commit messages (and the link automatically made). Furthermore, Roundup would facilitate chatting about revisions. Revisions could be classified by topic, assigned to people for review, etc.
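A sketch of how the Subversion side might look. Subversion invokes hooks/post-commit with the repository path and revision number as arguments, and svnlook (which ships with Subversion) can retrieve the commit message; the Roundup side is deliberately left as a hypothetical step, and the issue-reference syntax ("issue42") is my own assumption:

```python
import re
import subprocess

ISSUE_RE = re.compile(r'issue(\d+)')

def referenced_issues(message):
    """Issue ids mentioned in a commit message as e.g. 'issue42'."""
    return [int(n) for n in ISSUE_RE.findall(message)]

def commit_message(repos, rev):
    # Ask svnlook for the log message of the given revision.
    return subprocess.run(['svnlook', 'log', '-r', rev, repos],
                          capture_output=True, text=True,
                          check=True).stdout

def link_revision(repos, rev):
    """Intended to be called from hooks/post-commit."""
    for issue_id in referenced_issues(commit_message(repos, rev)):
        # Creating the Roundup "change" object is the hypothetical
        # part: however the tracker is driven, this is the point at
        # which the revision would be attached to the issue.
        print('link revision %s to issue %d' % (rev, issue_id))
```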

2004/12/08 (permalink)

Film Project Update: Two More Festivals Without a Box

Just completed the submissions for Ann Arbor and Aspen. For these festivals I was able to use the phenomenally useful site WITHOUTABOX.

WITHOUTABOX lets you enter the information about your film once and submit electronically (everything but the film itself, but that's coming) to each festival. If you're submitting to more than a couple of festivals, this is an incredible time saver. Not only that, but the site provides a calendar showing upcoming festival deadlines, filtered by whether your film is eligible for the festival or not.

They have support for submission to hundreds of festivals (including some pretty big ones) and seem to be adding more all the time. They also have a larger database of known festivals that aren't part of the WITHOUTABOX submission system (yet) so you can still track their deadlines too.

2004/12/08 (permalink)

Poincare Project: Connectedness, Closed Sets and Topological Properties

Some topological spaces have the property that they can be decomposed into two disjoint non-empty open sets. In other words, there exist two non-empty open sets whose intersection is empty but whose union is the entire space. Take our ball of clay and cut it in half.

Such a topological space is said to be disconnected. Topological spaces for which this is not true are said to be connected.

Another way of defining the same notion of connectedness is via the notion of closed sets. (The existence of open sets suggested there would be something called closed sets, right?)

A closed set of a topological space is simply one whose complement is open. In other words, if you have an open set, then the set of points not in that open set is a closed set. One interesting property of this definition is that it allows a set to be both open and closed at the same time: if a set and its complement are both open, then both sets are also closed.

Because, by definition, the empty set and the set of all points in a topological space are open sets, they are also closed sets. And here is where we come to the definition of connectedness based on closed sets.

A topological space is connected if and only if the only two sets that are both open and closed are the empty set and the set of all points. If any other sets are both open and closed then the topological space must be disconnected.

It is fairly easy to see why this is true. If two disjoint non-empty open sets A and B have a union which is the entire space, then A and B are each other's complements. Therefore A must be closed (because B is open) and B must be closed (because A is open). Therefore A and B are both open and closed.
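For a finite space, this characterisation is easy to check directly. A Python sketch (my own naming, for illustration only) that looks for clopen sets other than the two trivial ones:

```python
def is_connected(X, opens):
    """For a finite topological space (X, opens), check connectedness
    by looking for clopen sets other than {} and X."""
    opens = [frozenset(s) for s in opens]
    X = frozenset(X)
    for s in opens:
        # s is clopen when its complement is also open
        if X - s in opens and s not in (frozenset(), X):
            return False
    return True

# {a} and {b} are both open, hence both clopen: disconnected.
print(is_connected({'a', 'b'}, [set(), {'a'}, {'b'}, {'a', 'b'}]))  # False
# Only {} and X are clopen here: connected.
print(is_connected({'a', 'b'}, [set(), {'a'}, {'a', 'b'}]))         # True
```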

Connectedness is said to be a topological property because it is defined purely in terms of the open sets and no additional structure. Because topological properties are based only on the open sets, they are preserved by homeomorphisms. So if a space is connected, then any space homeomorphic to it will also be connected. An important corollary is that you can never find a homeomorphism between a connected space and a disconnected one, or more generally between any two spaces that differ in some topological property.

In the example of cutting our ball of clay in half, the before and after are not homeomorphic because the before is connected and the after is disconnected. Again, we've ripped apart points that were once in lots of open sets together so that now the only open set they share is the topological space as a whole.

2004/12/07 (permalink)

Next Film After Alibi

After we finished principal photography on Alibi Phone Network, I suggested our next short film should expand in one of the following three dimensions:

Tom has been working on a great script that I definitely want to produce. The problem is that it expands on Alibi in all three dimensions simultaneously: 40 minutes versus 14; really deserving HD rather than MiniDV; and a massive increase in cast/crew/prop/location requirements. To do well, it would take a shoot 5 times as long and 10-20 times the budget of Alibi and, particularly given my lack of experience with HD, that's just too much of a risk.

So today I suggested to Tom that we think about an intermediate project. One that is around 20-25 minutes, shot on HD but not requiring much more beyond Alibi in terms of cast/crew size, number of locations, etc.

I have an idea I came up with in 2001 that would probably fit well. Watch this space!

2004/12/07 (permalink)

MorphGNT v5.03 available

More corrections now and more coming soon.

Version 5.03 contains a major correction to the lemma PRO; a correction to MYRA; some spelling distinctions ENEKEN/ENEKA, BETHSAIDA(N), GOLGOTHA(N); and case corrections in proper names GERASENOS, STEFANOS, FOROS, TREIS, TABERNE, DIABLOS.

See MorphGNT.

2004/12/07 (permalink)

Film Project Update: First Festival Submission Arrived

The Alibi Phone Network DVD arrived at SXSW. Next up: Ann Arbor and Aspen.

2004/12/06 (permalink)

MorphGNT v5.02 Available

Some breathing corrections on rho-initial words. See MorphGNT.

2004/12/05 (permalink)

Structured Tag Naming in Subversion

I've recently started using Subversion for versioning the Leonardo code base. While I've admired the design of Subversion since before 1.0, I'd never really had an opportunity to use it on a project.

One of the things I've done with the Leonardo repository is follow the suggestion of the O'Reilly Subversion book in having three top-level directories:

However, it's just occurred to me that, because tags are just copies with their own directory path, I could add some structure to my tags. Because I normally use tags for either checkpoints, milestones or releases, my top-level directories could be:

Even within things like /releases I could have structure such as

I'm thinking aloud but it seems like a reasonable practice to follow. Anyone done anything similar?

2004/12/04 (permalink)

Content made available under a Creative Commons Attribution-NonCommercial-ShareAlike license