fun file facts!

Posted by Mark Mon, 09 Jul 2007 06:09:36 GMT

So, I’m doing this odd little visualisation project. Part of it is to do with Facebook, which means that if it gets popular at all, my poor little server is going to get pounded harder than a goat at a furry convention. Therefore, I have some interesting constraints on resource usage.

I’m using the graph drawing library GraphViz, which is written in C. I’m writing my app in Haskell using HAppS, and the Haskell interface to C is all fine and dandy. The difficulty is that GraphViz wants to write its graph out to a file rather than make it available as a string in memory, and I really don’t want even the possibility of hitting the disk.

To make this more concrete, I wrote a little C test program to see how fast each option is on my laptop. It takes one of three approaches and repeats it 100 times:

  1. Copy the strings back and forth in memory. This simulates what would happen if GraphViz generated the graph in memory rather than insisting on writing it to a file.
  2. Create a ramdisk and write the strings out to it.
  3. Write the strings to disk, then read them back in. The hope here is that the OS’s built-in I/O caching will save me.
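The test program itself isn’t in the post, so here is only a minimal sketch of what a benchmark of the three approaches above might look like. The payload size, the function names (`trial_memory`, `trial_file`, `run_benchmark`) and any file paths are hypothetical. It also assumes that both file-based approaches read the data back, and that the only difference between the ramdisk and disk runs is which directory the scratch file lives in.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Size of the simulated GraphViz output: an arbitrary assumption. */
#define PAYLOAD_SIZE ((size_t)64 * 1024)

/* Approach 1: copy the payload into a second buffer and back out again,
 * standing in for GraphViz handing over its output in memory. */
static int trial_memory(const char *payload, size_t len)
{
    assert(payload != NULL && len > 0);
    char *copy = malloc(len);
    char *back = malloc(len);
    int rc = -1;
    if (copy != NULL && back != NULL) {
        memcpy(copy, payload, len);
        memcpy(back, copy, len);
        /* Check that the round trip reproduced the payload. */
        rc = memcmp(back, payload, len) == 0 ? 0 : -1;
    }
    free(copy);
    free(back);
    return rc;
}

/* Approaches 2 and 3: write the payload to a file at `path`, then read it
 * back. A path on a ramdisk gives approach 2; a path on an ordinary disk
 * gives approach 3, which leans on the OS's cache for the read. */
static int trial_file(const char *path, const char *payload, size_t len)
{
    assert(path != NULL && payload != NULL && len > 0);
    FILE *out = fopen(path, "wb");
    if (out == NULL)
        return -1;
    size_t written = fwrite(payload, 1, len, out);
    if (fclose(out) != 0 || written != len)
        return -1;

    char *back = malloc(len);
    FILE *in = fopen(path, "rb");
    int rc = -1;
    if (back != NULL && in != NULL) {
        size_t got = fread(back, 1, len, in);
        rc = (got == len && memcmp(back, payload, len) == 0) ? 0 : -1;
    }
    if (in != NULL)
        fclose(in);
    free(back);
    return rc;
}

/* Run one approach `iterations` times (the post used 100). `mode` is 'm',
 * 'r' or 'f', mirroring the post's -m/-r/-f flags; `path` is ignored for
 * 'm'. Returns 0 if every round trip succeeded, -1 otherwise. */
static int run_benchmark(char mode, const char *path, int iterations)
{
    if (mode != 'm' && mode != 'r' && mode != 'f')
        return -1;
    if (mode != 'm' && path == NULL)
        return -1;

    char *payload = malloc(PAYLOAD_SIZE);
    if (payload == NULL)
        return -1;
    for (size_t i = 0; i < PAYLOAD_SIZE; i++)
        payload[i] = (char)('a' + i % 26);

    int rc = 0;
    for (int i = 0; i < iterations && rc == 0; i++)
        rc = (mode == 'm') ? trial_memory(payload, PAYLOAD_SIZE)
                           : trial_file(path, payload, PAYLOAD_SIZE);

    if (mode != 'm')
        remove(path);   /* clean up the scratch file */
    free(payload);
    return rc;
}
```

Wrapping a call such as `run_benchmark('r', "/mnt/ramdisk/scratch.out", 100)` in a small `main` and running it under `time` would roughly correspond to `./a.out -r`; the `/mnt/ramdisk` mount point is made up for illustration.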

The results, in the same order:

10:47 ~/projects/current % time ./a.out -m
./a.out -m  1.68s user 0.09s system 71% cpu 2.480 total
10:47 ~/projects/current % time ./a.out -r
./a.out -r  0.43s user 2.56s system 48% cpu 6.233 total
10:47 ~/projects/current % time ./a.out -f
./a.out -f  0.49s user 4.09s system 28% cpu 16.107 total

So relying on the built-in caching isn’t so great: the disk run took about 16 seconds of wall-clock time, against about 2.5 seconds for copying in memory. The ramdisk is faster, at about 6 seconds, but still well behind keeping the strings in memory. Presumably system call overhead is what’s hurting me: the ramdisk run spent 2.56s in system time, compared with 0.09s for the in-memory run.

Another option I haven’t yet investigated would be for the Haskell process to open a named pipe and have C write to it, but I think that would require at least two processes. At the moment I just have the one HAppS process and would like to keep it that way if possible. (My current host has a limit of 20 processes, which is a bit anemic.) In any case, I’d expect it to be at least as slow as the ramdisk approach, although it might be a bit more doable on a shared host.
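To make the two-process point concrete, here is a rough sketch of the named pipe idea, with both ends written in C for brevity (in the real app the reading end would be the HAppS process). A `fork`ed child stands in for the GraphViz side and writes into a FIFO created with `mkfifo`, while the parent reads the data back into a buffer. The helper name `read_via_fifo` and the FIFO path are hypothetical. The second process is needed here because, by default, opening a FIFO blocks until the other end is opened too, and a pipe only buffers a limited amount of data, so a single process that tries to write all of its output before reading any of it can block once that buffer fills.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a forked child (standing in for the GraphViz side)
 * writes `payload` into a FIFO at `fifo_path`; the parent (standing in for
 * the HAppS side) reads it into `buf` and NUL-terminates it. Returns the
 * number of bytes read, or -1 on error or if the output does not fit. */
static ssize_t read_via_fifo(const char *fifo_path, const char *payload,
                             char *buf, size_t bufsize)
{
    assert(fifo_path != NULL && payload != NULL && buf != NULL && bufsize > 0);

    /* Fails if something already exists at fifo_path. */
    if (mkfifo(fifo_path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(fifo_path);
        return -1;
    }

    if (pid == 0) {
        /* Child: this open blocks until the parent opens the read end. */
        int fd = open(fifo_path, O_WRONLY);
        if (fd < 0)
            _exit(1);
        size_t len = strlen(payload), off = 0;
        while (off < len) {
            ssize_t n = write(fd, payload + off, len - off);
            if (n < 0)
                _exit(1);
            off += (size_t)n;
        }
        close(fd);
        _exit(0);
    }

    /* Parent: this open blocks until the child opens the write end. */
    int fd = open(fifo_path, O_RDONLY);
    int failed = 0;
    size_t total = 0;
    if (fd < 0) {
        failed = 1;
        kill(pid, SIGKILL);   /* otherwise the child waits in open() forever */
    } else {
        for (;;) {
            char probe;
            size_t room = bufsize - 1 - total;   /* keep a byte for the NUL */
            /* With buf full, read one more byte to see if anything is left. */
            ssize_t n = room > 0 ? read(fd, buf + total, room)
                                 : read(fd, &probe, 1);
            if (n == 0)
                break;                 /* child closed its end: all done */
            if (n < 0 || room == 0) {
                failed = 1;            /* read error, or output too large */
                break;
            }
            total += (size_t)n;
        }
        close(fd);
    }
    buf[total] = '\0';

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)
        || WEXITSTATUS(status) != 0)
        failed = 1;
    unlink(fifo_path);
    return failed ? -1 : (ssize_t)total;
}
```

Each call costs a `fork` plus a FIFO on the filesystem, on top of the same kind of read and write system calls a ramdisk file needs, which is consistent with the guess that this would be no faster than the ramdisk. The extra process also counts, however briefly, against a host’s process limit.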

In other news, I saw the Maladies play last night at the Hoey. They did an absolutely blistering set: I’ve never seen them quite that sharp. Roll on the album…

Comments

  1. peteg said about 5 hours later:

    Hey, it’s one thing to be a leaf node and quite another to be non-existent.

    You really are a sucker for punishment, using all these languages… IIRC it might pay to look a little closer at hacking graphviz so it does what you want… lots of backends implies (hopefully) a decent abstraction. Maybe. Go with the quickest hack now; when you get users and job offers you can tart it up. Just abstract it in your Haskell program…

    Nothing quite like backseat design.

  2. peteg said about 5 hours later:

    Oh, and send me a copy of the CD if it ever shows up.

  3. mark said 1 day later:

    Aw, snook, are you feeling left out? I think it’s a pre-gammie scan of the network. One of the things I’ll fix in my Copious Free Time.

    You’ll be back in Sydney before the record shows up, and you know it.
