
Too many files: Reiser FS vs hashed paths

2007-12-26

On my hard drive, some directories attract lint faster than my CPU fan. I wipe them clean, but as soon as I blink they already contain a zillion files. And for some reason, having a zillion files in a directory on GNU/Linux is a really bad thing.

With Gazest, I decided to store only the files' metadata in the database and to keep the content on disk. To prevent name clashes, the content filenames are the HMAC-SHA1 hashes of the data. Of course, that means I have to expect a zillion files in the content directory, and as soon as you mention the problem of too many files you hear "Reiser FS," loud and proud, from the cube next to yours. But is it really the best solution?
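The "hashed paths" alternative from the title avoids a zillion entries in one directory by splitting each digest into nested subdirectories. A minimal sketch, assuming a two-level fan-out on the first hex digits of the HMAC-SHA1 name; the key, the depth, and the helper names (`content_name`, `hashed_path`, `store`) are hypothetical, not Gazest's actual code.

```python
import hashlib
import hmac
import os

# Hypothetical secret key; the post only says names are HMAC-SHA1 hashes.
SECRET_KEY = b"example-key"


def content_name(data: bytes, key: bytes = SECRET_KEY) -> str:
    """Return the 40-character hex HMAC-SHA1 digest used as the filename."""
    return hmac.new(key, data, hashlib.sha1).hexdigest()


def hashed_path(root: str, name: str, levels: int = 2, width: int = 2) -> str:
    """Map a digest to nested directories named after its prefixes.

    With levels=2 and width=2, 'abcdef...' becomes root/ab/cd/abcdef...,
    so each intermediate level has at most 16**2 = 256 subdirectories.
    """
    parts = [name[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)


def store(root: str, data: bytes) -> str:
    """Write data under its hashed path (creating directories) and return it."""
    path = hashed_path(root, content_name(data))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With the defaults there are 65,536 leaf directories, so a million files average roughly 15 entries per directory.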

I'll focus on a single problem: the access time of a file stored in a directory with a lot of other files. I don't care about CPU usage, disk usage, fragmentation, paging, or recovery tools. When I read a file, how long will it take? That's it. Surprisingly, I could not find a good benchmark for that typical use case. Sure, there are many anecdotal tales of people who noticed a significant improvement with a new FS when they reinstalled their mail server, but they end up comparing completely different setups under a variable real-world server load. A lack of measurements leads to counterproductive, faith-based debates, so here is my attempt to quantify the issue.
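A minimal sketch of such a measurement using only the Python standard library: fill one directory with `n` small files, then time open-and-read of randomly chosen names. The helper names and file sizes are hypothetical. On a warm page cache this mostly measures directory lookup plus cached reads; a cold-cache run on Linux would need the caches dropped between the fill and the reads (for example, as root, by writing 3 to /proc/sys/vm/drop_caches).

```python
import os
import random
import time


def populate(directory: str, n: int, size: int = 64) -> list:
    """Create n files of `size` random bytes in `directory`; return their names."""
    names = []
    for i in range(n):
        name = "%08d" % i
        with open(os.path.join(directory, name), "wb") as f:
            f.write(os.urandom(size))
        names.append(name)
    return names


def mean_read_time(directory: str, names: list, samples: int = 1000,
                   seed: int = 0) -> float:
    """Return the mean wall-clock seconds to open and read one random file."""
    rng = random.Random(seed)  # fixed seed so runs pick the same files
    total = 0.0
    for _ in range(samples):
        path = os.path.join(directory, rng.choice(names))
        start = time.perf_counter()
        with open(path, "rb") as f:
            f.read()
        total += time.perf_counter() - start
    return total / samples
```

Repeating the run for growing values of `n`, once per file system and once with a hashed layout, gives numbers that can be compared on the one question that matters here.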

Exporting kmail filters

2007-12-23

I love emails. Emails can be threaded into conversations, sent to multiple recipients, contain attachments, and be archived into folders for easy retrieval. Emails are scriptable. You can set up filters that will classify them according to complex criteria, put them into folders, call scripts to trigger events, pipe them through commands, or simply delete them.

I use kmail. It has a great filter system with a visual regular expression editor, and filters can be bound to toolbar icons or to keyboard shortcuts. But kmail has a really poor filter export facility. I can understand why it's hard to export them: filters are not just action units; they are part of a pipeline, and many of my filters are completely meaningless if they are not executed in a very specific order. Nevertheless, I have a set of non-trivial filters at work that I want to keep in sync with my filters at home.

Kmail hides its filters in kmailrc, a .ini file with a lot of auto-generated noise. At first, I tried to copy all the [Filter XX] blocks from one kmailrc to the other, but there is a really big problem with this solution: it seems to work. Some of the filters will indeed get imported, but if you end up with more filters than you previously had, the last few filters in your pipeline will silently get discarded. For some reason, kmail can't figure out how many filters you have, so it keeps the count in the [General] section of the kmailrc. I don't want to count my filters each time I synchronize them, so I wrote a convenient script to take care of that. Enjoy!
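The script itself isn't reproduced here, but the idea fits in a few lines. A minimal sketch, assuming kmailrc names its filter blocks `[Filter #N]` and stores the count as `filters=N` in `[General]`; both are assumptions to check against your own kmailrc, and `sync_filters` is a hypothetical name.

```python
import re

# Assumed kmailrc layout (not confirmed by the post): filter blocks are
# headed "[Filter #N]" and the count is stored in [General] as "filters=N".
SECTION_RE = re.compile(r"^\[(.+)\]\s*$")
FILTER_RE = re.compile(r"^Filter #\d+$")


def split_sections(text):
    """Split .ini text into (name, lines) pairs; name is None before any header."""
    sections = [(None, [])]
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            sections.append((match.group(1), [line]))
        else:
            sections[-1][1].append(line)
    return sections


def is_filter(name):
    return name is not None and FILTER_RE.match(name) is not None


def sync_filters(source_text, target_text):
    """Replace target's filter blocks with source's and rewrite the count."""
    filters = [lines for name, lines in split_sections(source_text) if is_filter(name)]
    count = len(filters)
    out, saw_general = [], False
    for name, lines in split_sections(target_text):
        if is_filter(name):
            continue  # old filters are dropped; the source ones are appended below
        if name == "General":
            saw_general = True
            # Drop any stale count and record the new one right after the header.
            lines = [lines[0], "filters=%d" % count] + [
                line for line in lines[1:] if not line.startswith("filters=")]
        out.extend(lines)
    if not saw_general:
        out.extend(["[General]", "filters=%d" % count])
    for i, lines in enumerate(filters):
        # Renumber so the blocks are contiguous from #0, in source order.
        out.extend(["[Filter #%d]" % i] + lines[1:])
    return "\n".join(out) + "\n"
```

It works line by line rather than through configparser so that every other section is written back exactly as it was; the trade-off is that the filter blocks move to the end of the file, still in pipeline order.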

A case for hg on /etc

2007-11-27

All of a sudden, I find myself editing lots of config files on systems maintained by lots of other people. Most of the time, a text editor is all you need, but there are times when it doesn't cut it. When I become a living implementation of bin/patch, I seriously consider putting /etc under a revision control system.

Why is it good to have /etc under a revision control system? We often have to perform a change on related but not quite identical servers. Furthermore, the solution is sometimes not straightforward to come up with. If we know that it's going to require a lot of tweaking, we can always back up the original file and generate a diff once we have it working. This sucks for many reasons: it won't take into account changes involving multiple files, it's hard to merge multiple edits into a single operation, and you end up with way too many backups: foo.cfg, foo.cfg.bak, foo.cfg.old, foo.cfg.old1, foo.cfg.2007-11-18, and foo.cfg.051211. Yuck!

Battle cries from the trenches

2007-11-20

Today I joined the front-line response team at Savoir-faire Linux. I have the pleasure of working with an extremely skilled group. This is not a help line where someone will ask if you forgot caps lock; if you call us, we log into your server and we fix the problem, whatever it is.

One thing that strikes me is how inefficient splitting work can be. One guy hacking on a server can only accomplish so much. If you give him a helper, no matter how skilled he is, I doubt that you can increase the output by more than 25%. Why is this so?

A first guess is that voice can only carry so much bandwidth. Given how sequential a spoken message is, and how long the seek time on a human is, polling a coworker for more information has a fixed cost of at least 20 man-minutes. One solution is, of course, to serialize knowledge into a common pool, and this is where wikis can save an incredible amount of time and money. The wiki that we use at the moment is crap. The markup is heavy and counterintuitive, and search is completely broken. It has a quite fancy access control system, but the whole thing can become annoyingly slow.

My primary target for Gazest is Internet communities, but now I see something that had escaped me: wikis can help you make giant leaps if what you sell is know-how. Of course, your needs won't be the same if you aim at internal knowledge bases. Internet communities are perfectly fine with a simple user rank system like on Wikipedia: you have a lot of users, a few admins, and a handful of people who can grant admin privileges; there are no groups, no per-page rights, and pages can't be hidden from other users. On the other hand, internal wikis describe really intimate knowledge of mission-critical systems, including pending security problems. If that kind of information were to get loose, the havoc would be much worse than if you had a million credit card numbers stolen.
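To make the contrast concrete, here is a minimal sketch of the two permission models, with hypothetical names throughout: a Wikipedia-style linear rank, where one comparison decides everything, and a per-page access list keyed by groups, closer to what an internal knowledge base would need. Neither is Gazest's actual code.

```python
from enum import IntEnum


class Rank(IntEnum):
    """Linear ranks: each one carries every right of the ranks below it."""
    USER = 1
    ADMIN = 2
    BUREAUCRAT = 3  # Wikipedia's term for those who can grant admin rights


def can_delete_page(rank):
    """Rank model: a single comparison decides each permission."""
    return rank >= Rank.ADMIN


def can_grant_admin(rank):
    return rank >= Rank.BUREAUCRAT


def can_read_page(user_groups, page_acl):
    """Per-page model: an empty ACL means public; otherwise groups decide.

    This is what lets an internal wiki hide a page from everyone outside
    the listed groups, which the rank model cannot express.
    """
    return not page_acl or bool(set(user_groups) & set(page_acl))
```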

Aside from access control, I see many opportunities to innovate: our front-line response team could use a non-crap set of macros to input network diagrams in textual form, a law firm could use a set of macros to refer to laws by number and to databases of former rulings, and this is only the beginning. A wiki with a flexible macro language could probably spawn a decent consulting business, since anyone who works primarily with knowledge will save a lot of money if they can bend their tools to make knowledge easier to input and retrieve: just count how much 20 man-minutes cost you. I don't know where this can lead Gazest or if it's a path where I want to take it, but this is definitely an area worth exploring.

Free video lectures by Richard Dawkins

2007-11-18

A common misconception is that science is a body of knowledge. Science is more accurately described as a process. This misconception is undoubtedly the result of bad textbooks that list known facts like an encyclopedia and tell little about how we came to them and what kind of evidence would be required to challenge them. Most textbooks teach much about what to recall but little about how to think.

I really like Richard Dawkins's prose, so I was extremely pleased when he announced that five video lectures were now free to download from his website. I'm impressed by his talent as a lecturer: he leads his argument methodically without getting lost in details, he uses a wide variety of props and samples to illustrate his points, and he interacts efficiently with the audience to keep them alert. Those are the best videos I've ever watched on a computer; be sure to have a look at them.

Many code news

There are fashions in the markup world. There was a time when using colons (':') to split fields in /etc/passwd was enough, a time when no one had a problem with using TABs to mark command lines in Makefiles. Then came the era of heavy markup. "More semantics!" they all asked, and we received XML.
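As an illustration of how little the colon-delimited style needs, a /etc/passwd entry splits into its seven standard fields (name, password, UID, GID, GECOS, home directory, shell) with a single call. The function and dictionary key names are illustrative.

```python
# The seven colon-separated fields of a passwd(5) entry, in order.
FIELDS = ("name", "password", "uid", "gid", "gecos", "home", "shell")


def parse_passwd_line(line):
    """Split one /etc/passwd line into a dict keyed by field name."""
    values = line.rstrip("\n").split(":")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    entry = dict(zip(FIELDS, values))
    entry["uid"] = int(entry["uid"])  # numeric IDs become integers
    entry["gid"] = int(entry["gid"])
    return entry
```

The flip side is that the format has no escaping, so a field such as GECOS simply cannot contain a colon.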

More semantics is a good thing, but anyone who has written documentation using DocBook knows that the heavy syntax gets annoying really fast. No wonder no one documents his programs. Fortunately, some lazy programmers wanted, for some obscure reason, to document their programs; they propelled us into a new era of lightweight markup.

There are quite a few really good lightweight markup languages out there, and Gazest supports most of them. For simple formatting, my favorite is definitely Markdown. It reads like plain-text email: the syntax doesn't do much, but the essentials are there, and it actually helps you read the source instead of obfuscating it. For blog comments, or anything that won't need much semantics, in applications where you can't use HTML, whether for security reasons or just because it's a pain to type, Markdown is the way to go.

Gazest is being sponsored

2007-11-08

I'm really excited to announce that Gazest development is being sponsored by Savoir-faire Linux. The demo site will now run much faster: they have server farms with big fat pipes scattered across Montréal. Cyrille, their CEO, has a great understanding of the post-industrial economy: the market of services, where know-how is the main capital. It's always a pleasure to talk with him; he sees free software not as an altruistic endeavor but as the only logical choice for corporations to compete in the modern world. He believes in the economic viability of free software, and the current sponsorship is a testimony that those are not empty words. More exciting news to come shortly; stay tuned.

setuptools_git 0.3

2007-11-06

My gitlsfiles plugin is dead: it was a silly name. It has been reborn under the really sexy name of setuptools_git. Setuptools_git 0.3 has better documentation and is more portable than gitlsfiles.

The Hitchhacker's Guide to Montreal

2007-10-31

This guide was originally developed on the Gazest demo site with the help of the MSLUG hackers. I'm about to push a new version of Gazest on the demo site that changes the internal storage, and the current content will probably be nuked rather than upgraded. This guide, unlike the rest of the demo site, is quite good, so I repost it here in the hope that it can help local dwellers and future visitors enjoy our fascinating city.

First of all, DON'T PANIC. OOPSLA starts in just a few days and we await a lot of famous Lisp hackers so I thought that it would be nice to give them a list of decent things to do in Montréal. One might think that I'm trying to coerce people into beta testing Gazest and he would be right.

Gazest 0.3.9

2007-10-27

Gazest 0.3.9 is out. It now runs on Alchemy 0.4, there are many bug fixes here and there, the style has been improved, and you can now search. Enjoy!

Update: 0.3.9.1 is out; it fixes a bug in the abuse report form.

A new kind of wiki

2007-10-18

What is a wiki? It has to be more than just a program to transform _light_ *weight* text markup into valid HTML. Beyond the markup, a wiki is a platform to help many people work on a shared document. Being a programmer, I'm familiar with this concept, since we use similar tools to work on shared programming projects: revision control systems.

The state of the art in revision control systems is Git and Mercurial. What is it that makes them so good? Some people will tell you that it's because they are distributed. That's a good point, but there is more to it. They manage to work without a centralized authority by merging concurrent changes. To be able to do that, they have to detect which changes you need to merge and which ones you already have, and the key to doing that is to keep the full family lineage of every revision of every file for all the people that you work with. The main problem with Subversion is not that it's centralized; it's that it flattens the history into a linear series of revisions, and that destroys the hope for smart merges.
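A minimal sketch of why lineage matters, assuming a toy history stored as a map from each revision id to its parent ids (the ids and helper names are hypothetical; Git and Mercurial are far more elaborate). The changes you still need from a peer are the ancestors of their head that are not ancestors of yours, and the merge bases are the common ancestors that no other common ancestor descends from. A flattened, linear history loses the parent links these set operations depend on.

```python
def ancestors(history, rev):
    """Return rev and every revision reachable through parent links."""
    seen, stack = set(), [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(history.get(current, ()))
    return seen


def missing_changes(history, mine, theirs):
    """Revisions reachable from their head that my head does not contain yet."""
    return ancestors(history, theirs) - ancestors(history, mine)


def merge_bases(history, a, b):
    """Best common ancestors: common ones that are not behind another common one."""
    common = ancestors(history, a) & ancestors(history, b)
    return {c for c in common
            if not any(c != d and c in ancestors(history, d) for d in common)}
```

For a history where r1 forks into r2 (mine) and r3, r4 (theirs), `missing_changes` returns {"r3", "r4"} and the merge base is r1; once a merge revision with parents r2 and r4 exists, nothing from r4 is missing any more.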

Back from Slashdot anniversary party in Montréal

2007-10-15

Back from the Slashdot anniversary party in Montréal. In Montréal, we have great beer and great brew pubs and when we say it's going to be a geek party, we really mean it.

I forgot everyone's name so feel free to point yourself out in the comments. If it doesn't look that wild, it's just because people were too lazy to take pictures while I was jumping on tables during my demonstration of rising on a surfboard.

We even had someone with a two-digit Slashdot UID; that's quite something!

The other attendees have started to post their pictures on the Slashdot party page. We left with many questions unanswered. As an example, what could Stéphane's belt buckle stand for?

Open-letter to my favorite bands

2007-10-06

How low can the RIAA go? Jammie Thomas was just fined $220000 for sharing 24 songs. I'm not saying that she should be allowed to share the files. But is this a punishment that fits the crime? Music artists need to find someone to represent them who shows more common sense. Some propose to boycott the RIAA. Very good, but I have already been doing that for the past few years. It reached the point where I feel bad just owning CDs published by RIAA members. So I started to mail my discs back to my favorite bands. Here is the letter that I include in the envelope.

I want to return this CD. It's a really good CD; I bought it a long time ago and I have many strong memories tied to its songs. But I can't enjoy it anymore.

Fined $220000 for sharing 24 songs: is this reasonable? Until the senseless lawsuits end, I will only listen to music not distributed by the RIAA. I hope you plan to join a non-RIAA label, because I really like what you do.

Some bands include a snail mail address in the booklet, but many don't. Is there a place where this kind of information is available? If you decide to boycott the RIAA, let the artists know about it. Radiohead has got the message, and they now offer downloads of their next album at a reasonable price. The EFF has a good plan, but it can only work if the artists are with us.

There is hope but we need to get busy and loud or it won't happen.

Slashdot anniversary party in Montréal

2007-10-04

There will be a Slashdot anniversary party in Montréal, probably at Ste-Elizabeth. Aside from the obvious great beer selection, there will be free t-shirts. I know many of you have wasted a lot of time on Slashdot, so I hope to see you there. The registration is a bit counterintuitive: you need to select "QC" on the party finder page and register for the party you plan to attend from there.

Git and Setuptools

2007-09-28

Explicit is better than implicit. It's in the Zen of Python. Who could disagree? Setuptools has a feature that would prevent me from reaching peace of mind. You can tell it to include in your package all the files that you track with a revision control system. I used to prefer being explicit by using MANIFEST.in, until I started to heavily refactor a package layout. This is one thing that Git does really well. You just add all the new files recursively and it will figure out which files are really new and which are new names for old files. But updating MANIFEST.in can become quite a pain.

What happens in practice is that rules in MANIFEST.in have an extremely broad scope. The latest Pylons recursively includes everything in the template directory. It would be a pain to write the right rule; you would need to include all the templates for all the templating languages supported by Buffet, and each engine is really permissive about the file extension used to name its templates. The current rule will match all Emacs backup files and a lot of junk that most people don't want to distribute. When I switched to including the files tracked by revision control, the only file that I didn't want in there was .gitignore. In this case, being explicit about what we don't want is a lot cleaner than being explicit about what we do want.

"Oh wait," you may say. I mentioned using Git, but Setuptools has no Git plugin. Until now. Here is gitlsfiles (2.4 egg, 2.5 egg), a plugin to have Setuptools package all the files tracked by Git. You just need to install it and Setuptools will figure out the rest.

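The mechanism behind such a plugin is small. A minimal sketch, assuming setuptools' `setuptools.file_finders` entry point group, which calls each registered function with a directory name and includes the paths it yields. The module and function names are hypothetical, not taken from gitlsfiles or setuptools_git, and the code targets a current Python 3 rather than the 2.4 and 2.5 eggs mentioned above.

```python
import os
import subprocess

# Registration in the plugin's setup.py (hypothetical module "gitfinder"):
#   entry_points={
#       "setuptools.file_finders": ["git = gitfinder:git_file_finder"],
#   }


def git_file_finder(dirname=""):
    """Yield the paths of files tracked by Git under dirname.

    If Git is missing or dirname is not inside a work tree, nothing is
    yielded, so other finders (or MANIFEST.in) can take over.
    """
    try:
        output = subprocess.run(
            ["git", "ls-files", "-z"],  # NUL-separated, no path quoting
            cwd=dirname or ".",
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return
    for name in output.decode("utf-8").split("\0"):
        if name:  # the -z output ends with a trailing NUL
            yield os.path.join(dirname, name)
```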
On news sites

2007-09-18

When I started programming in '99, Slashdot was the big thing. I would read it as soon as I had time to waste. Be it waiting for the compiler to compile, waiting for the coffee to brew, or just waiting for 17h to arrive, Slashdot could fill my day.

Then something happened. I don't know when it happened, and many disagree on that anyway, but over time, Slashdot was not it anymore. More politics, more speculation, more outrage about insignificant quotes from famous people. It was not news anymore; it was something else.

I tried to read Kuro5hin but that was still not it. It took some time for other geeky news sites to arrive and I was spending more of my "waiting" time on comic strips or even, and I do admit it with some shame, on Bash.

Relics

2007-08-23

Montréal is full of relics of the industrial revolution. Back in the old days, the Old Port was the heart of the city. Right next to it was a trading and economic district, and just outside this district were factories and other industrial buildings. The Old Port is little more than a tourist trap these days. The economic center moved further inland, and the factories are derelicts slowly being transformed into condos.

I went exploring those relics with an adventurous friend. The key to urban exploration is to keep an open mind; to look for places that people see but forget to visit; to find your way without breaking in. Just like the hiker, the urban explorer should leave no trace. You get to see the city from many angles, the angles of the men and women who, many years ago, worked hard to turn Montréal into a city where people like to live. Montréal is beautiful when seen through a truss bridge.


We ended up in a squat under a railway overpass. They managed to arrange a reasonably comfortable dwelling with couches and improvised room separations. I could live there if it weren't for the lack of sanitation. And the lack of heating. There was no way to learn how they managed to heat their place; our hosts could only speak nonsense. There must be a fuse in our brain that blows to shield us from reality when it becomes too hard to bear.

Older stuff


2007-08-18 Interview with a tropical botanist


2007-08-16 Ugly Visa


2007-08-15 Fire


2007-08-13 Extended family expands


2007-08-13 The best time to submit to Digg and Reddit


2007-08-07 Back in Montréal


2007-08-06 Emacs 22 on Debian Etch


2007-08-01 Live Aloha


2007-07-30 Salt ponds


2007-07-24 On first bikes


2007-07-20 Could Hotmail drop mails to save bandwidth?


2007-07-18 A cure for gigazillion account syndrome


2007-07-16 Yould 0.3.5


2007-07-14 More Hawaiian Food


2007-07-12 On object infested APIs


2007-07-10 Stallman on GPLv3


2007-07-02 The Scrabble cheat sheet project


2007-07-01 On Hawaiian food


2007-06-29 Hiking Hawaii: the Kalalau Trail


2007-06-16 Yould 0.3


2007-06-11 Fairmount Bagel


2007-06-05 Server meltdown


2007-06-03 A better soda can stove


2007-05-26 New blog engine


2007-02-12 Finding good names


2007-02-08 On Firestarters


2007-01-03 Etch