The diary

21.01.2014 ssh keepalives
  • If your remote server hangs on you occasionally, remember to set ClientAliveInterval in the sshd_config (it's not there by default, at least not on Precise): [z9.io]
15.01.2014 TLER on Samsung 840 PRO
  • If it supports it, why not enable it? [en.wikipedia.org]
     smartctl -l scterc /dev/sda
          SCT Error Recovery Control:
             Read: 70 (7.0 seconds)
             Write: 70 (7.0 seconds)
    If the drive is taking more than 7 seconds to return on an operation, I'd rather Linux knew about it than kicking the drive out completely.
14.01.2014 Bitlee failing to connect to MSN?
  • If you are getting messages like
     msn - Logging in: Connecting
     msn - Login error: Could not connect to server
     msn - Logging in: Signing off..
     msn - Logging in: Reconnecting in 5 seconds..
     msn - Logging in: Connecting
     msn - Login error: Could not connect to server
     msn - Logging in: Signing off..
     msn - Logging in: Reconnecting in 15 seconds..
     msn - Logging in: Connecting
     msn - Login error: Could not connect to server
     msn - Logging in: Signing off..
     msn - Logging in: Reconnecting in 45 seconds..
     msn - Logging in: Connecting
     msn - Login error: Could not connect to server
     msn - Logging in: Signing off..
     msn - Logging in: Reconnecting in 135 seconds..
    see [wiki.bitlbee.org] and [ismsndeadyet.com] for the hint -- it's just that you can't connect to the default server, and instead need to use "account msn set server X" before connecting. Doh!
07.01.2014 AMT making you upset?
06.01.2014 BTRFS for production?
16.12.2014 DDNS hurts
  • DDNS updates finally nailed; took quite a lot of investigating as ISC DHCPD's configuration is full of weird gotchas. The primary piece of help was [www.semicomplete.com] which explains in a simple way how static entries need to be set up. Key findings follow.
  • Setting a "ddns-hostname" makes the system actually work more reliably; ISTM that the query the client sends actually affects the way the hostname is determined. I assume this is tied to allowing the client to send its own hostname, which I consider undesireable, as I want the server to control what hostname I put in my domain).
  • Other references: [prefetch.net] [lists.isc.org] [www.zytrax.com]
05.12.2014 Using ping -I
  • It turns out that ping -I is a bit tricky. The simplest thing to do is to use the interface name: kiko@anthem:~$ ping -I eth2 altern.org PING altern.org (80.67.174.57) from 189.35.185.240 eth2: 56(84) bytes of data. 64 bytes from altern.org (80.67.174.57): icmp_req=1 ttl=49 time=259 ms but that is actually lying: the packet isn't going out from 189.35.185.240, which is the address for eth3, but rather from eth2's native address. I even tcpdump'd to confirm.
  • And if you use just the address, it doesn't seem to work:
     kiko@anthem:~$ ping -I 189.19.234.109 altern.org
     PING altern.org (80.67.174.57) from 189.19.234.109 : 56(84) bytes of data.
30.11.2014 LXC aargh and NFS mounts
  • If you are trying to mount an NFS share inside an LXC container on 14.04, it won't work until you fix the apparmor profile: [technuts.tru.my]
29.11.2014 Cron madness
  • I have been trying to get a find command to delete old files and directories under a tree; this is run in a cronjob and I've just been sloppy at it. Today I finally discovered -mindepth 1 and -depth were what I was looking for all along!
  • BTW, a trivial way to ensure only a single cronjob runs is to use flock: flock -n /var/lock/foo command Before I used lckdo, which is included in the moreutils package, but flock is part of util-linux and doesn't need perl madness.
28.11.2014 mtab versus /proc/mounts
02.11.2014 Happy Mailman Day: fixing unhandled bounces!
  • [mail.python.org] had the hint:
     grep ^"<[a-z]" ~/mail/bounces | tr -d \<  | tr -d \> | xargs bin/remove_members --fromall
23.10.2014 Magic SysRq actions disabled?
11.10.2014 Spin the furniture
  • Spent hours of my Saturday with Rafa and two woodworkers doing a full 180 of the TV rack which ended up being low enough for Rafa to hit his head (which would hurt). It almost killed us all but we succeeded and the results are actually.. pretty good!
10.09.2014 Google Hangouts auto-mute
  • I hate it, but [www.pixelmonkey.org] indicates a trivial way to fix it, which is adding a single line to a .config file for the talk plugin.
  • Gparted killed my Windows partition when resizing. Trying to get a recovery disk was crazy hard! It turns out unetbootin is the easiest way to do it, but Windows required NTFS which current unetbootin doesn't easily allow unless you use the hack in [askubuntu.com]
  • And once you get your Windows back you will discover that expanding the partition is instantaneous in the actual disk manager UI!
  • I messed up my GPG trustdb, but luckily: [trog.qgl.org]
03.09.2014 Iodine
  • Finally got IP over DNS working and it's amazing to say the least! [dev.kryo.se]
18.06.2014 CUPS & double-sided printing?
  • We have a decade-old Laserjet 1320 that is great for double-sided printing. That is, until recently -- perhaps even as recently as we moved our workstations to Trusty. What happened?
  • The best hint I found was at [ubuntuforums.org] trying to set the default options for the printer. When I enabled double-sided printing, CUPS warned me there is a Duplexer Installed option -- which was off. Fixed!
30.05.2014 Updated Server BIOS for out S5520HC
  • Huge filename and pretty massive firmware update: S5520HC_S5520SC_EFI_BIOS64_BMC61_FRU33_ME112.zip
  • Moved from 0050 to 0064 -- a 4 year delta between versions!
  • Only issue was the the FRU and DSR update didn't get done because it printed a scary warning about not being able to detect a temperature sensor.
     Detecting Front Panel Temp Sensor Device. Please wait...
      
     Front Panel Temperature sensor device hardware is not found.
     Chassis fan Speed Control (FSC) will not work properly without this
     sensor!
      
     Do you want to still Continue (Y/N)?
  • I guess I'm just going to ignore that as I don't really seem to need the update for those bits. What do the FRU and SDR pieces do anyway?
  • Oh.. I guess I understand now. This is why my fans are screaming! [www.experts-exchange.com] [www.intel.com] [communities.intel.com] [downloadcenter.intel.com] [communities.intel.com] [www.samsung.com]
16.05.2014 USB drives and burn-in
  • I'm replacing the server USB backup drives and looking for good alternatives. I've picked a few and am trying to burn them in before making the commitment (as previous drives I was trying to use ended up dying on me mid-flight). Burn-in for me is badblocks for a few days and some SMART self-testing.
  • One of the drives I got was a Seagate, and it annoyed me that there were a lot of errors in the two first SMART values listed by smartmontools as I did a badblocks on it:
     1 Raw_Read_Error_Rate     0x000f   100   100   006    Pre-fail  Always -       236840
     7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always -       204379
    I say it annoyed me, but then I read this: [serverfault.com]
  • So the lower side of the number is just a counter. And math shows they have zero errors. Yay!
  • And if you have a USB drive that has self-tests being aborted by the host, check to see if it's not sleeping mid-test. At least that is what [ddumont.wordpress.com] says; I'm trying it.
  • Daft Google syncml limitations: [www.nodhing.com]
  • Saved my ass with DD-WRT passwords today not being synced between web and ssh: [www.dd-wrt.com]
  • Quora just taught me:
     Answer from Susan Ng
     I learned this in 1st grade - it's a REALLY easy and simple way to learn
     your nine times table. Or to teach someone else to learn!
     1) Look at your (or someone else's) hands
     2) Say you want to find 9x7.  Put down your 7th finger
     3) Count the number of fingers to the left of your 7. This is your tens
     digit.  Count the number of fingers to your right.  This is your ones
     digit.
     9x7=63
    Wow!
  • What a trick: [testefromhell.wikispaces.com]
  • Typing the em-dash: [askubuntu.com]
  • Getting .bash_profile sourced: [askubuntu.com]
  • Just ran into: [www.fxp0.org.ua]
23.04.2014 Juju and LXC
  • Debugging session as to why I can't get the local provider to give me new machines in Juju. This is probably a regression in 1.18.1.4, but I still don't know yet.
  • One thing which is interesting is that the log for machine-0 is actually where a lot of the container traffic appears. machine-0 is the bootstrap node, and in the local provider, it's what houses all the other containers.
  • root@chorus:/etc/default# apt-get install lxc/precise-backports
  • Escape the console?
  • lxc-ls is busted?
  • [curtis.hovey.name]
20.04.2014 36ers?
09.04.2014 HP Virtual Rooms
  • The trick to getting [rooms.hp.com] to work is to know that the plugin and application they provide are 32-bit. That's not something which is obvious unless you actually read the page carefully, and the failure mode is completely unobvious (the installer runs, the plugin is there, the test page looks like it works but no virtual room ever opens, with a URL flashing quickly before loading back into the test page). There is a trick which you can use to test manually and see what is wrong:
     kiko@limpinho:~$ cd .hpvirtualrooms 
     kiko@limpinho:~/.hpvirtualrooms$ ./hpvirtualrooms 
     bash: ./hpvirtualrooms: No such file or directory
     kiko@limpinho:~/.hpvirtualrooms$ file hpvirtualrooms 
     hpvirtualrooms: ELF 32-bit LSB executable, Intel 80386, version 1
     (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15
    Aha! Okay. So I need some 32-bit libraries..
  • First step, I need
     sudo apt-get install libc6-i386 
  • That lets me see via ldd the situation. And it's pretty bad: I need a total of 44 libraries pulled in as dependencies from this core set:
     libsm6:i386 libpng12:i386 libfreetype6:i386 libxi6:i386
     libx11-6:i386 libasound2:i386 libstdc++6:i386 libfontconfig1:i386
     libxrender1:i386 libxrandr2:i386 libglib2.0-0:i386 libxfixes3:i386
     libglu1:i386
  • But once that's done, it seems to work. I need to put a bit more effort into validating it in the office, but I at least now know how to do it. And best of all, it doesn't seem to need Java!
17.03.2014 Android and Account Syncing
  • If you look at your Android phone settings and all your accounts have "sync disabled", you will never figure out how to fix it. It turns out that you need to look in the Gmail app and enable sync. That in turn enables Android-wide synchronization, or at least that's my experience and what [androidforums.com] tells you to do. WTF.
  • Ah, now that I looked at [support.google.com] it looks more sensible. So the reality is that you control that same setting in both gmail and in the Data Usage settings screen. I bet I disabled it while roaming internationally!
21.02.2014 DHCP in the eyes of Wireshark
  • We had a cable modem that was annoying the hell out of us because it needed to be restarted periodically -- twice a day in the latest weeks. So we called the cable company in and convinced them to swap the modem out. In putting the new modem in I did a lot of log-digging and realized that actually the request note that goes out periodically:
     Feb 20 21:31:05 anthem dhclient: DHCPREQUEST of 177.34.169.88 on eth3 to 189.7.80.20 port 67
    is actually not a request which is going unanswered, since Wireshark shows clearly that there is a Request packet followed by an ACK from that IP address. Oh, actually, I am just grepping the log wrong, because if you look at the full successful operation it looks like this:
     Feb 17 12:22:51 anthem dhclient: DHCPREQUEST of 179.154.136.190 on eth3 to 189.7.80.20 port 67
     Feb 17 12:22:51 anthem dhclient: DHCPACK of 179.154.136.190 from 189.7.80.20
     Feb 17 12:22:51 anthem dhclient: bound to 179.154.136.190 -- renewal in 4407 seconds.
  • The problem we were having previously was that at some point the modem stopped working, and the refresh DHCPREQUEST never got a response, which looks like:
     Feb 17 20:01:24 anthem dhclient: DHCPREQUEST of 179.154.136.190 on eth3 to 189.7.80.20 port 67
     Feb 17 20:02:32  dhclient: last message repeated 6 times
     Feb 17 20:03:35  dhclient: last message repeated 4 times
    Sometimes the modem would do 7 refreshes before stalling, but lately it was rare to get to 4. The log looks much healthier now!
  • One particularly weird thing is that in the actual IP allocation request comes from a different DHCP server than the one which provides the response:
     Feb 20 17:37:45 anthem dhclient: DHCPDISCOVER on eth3 to 255.255.255.255 port 67 interval 3
     Feb 20 17:37:45 anthem dhclient: DHCPREQUEST of 177.34.169.88 on eth3 to 255.255.255.255 port 67
     Feb 20 17:37:45 anthem dhclient: DHCPOFFER of 177.34.169.88 from 177.34.168.1
     Feb 20 17:37:45 anthem dhclient: DHCPACK of 177.34.169.88 from 177.34.168.1
     Feb 20 17:37:46 anthem dhclient: bound to 177.34.169.88 -- renewal in 4840 seconds.
    So 177.34.168.1 provided the response, but if you look at the DHCP Server Identifier that comes back in the OFFER packet it says 189.7.80.20. I don't think that's illegal, but it's certainly not what I've seen in normal site-wide DHCP. And if you look at the updates afterwards the refresh DHCPREQUEST is ACK'd by 189.7.80.20.
27.01.2014 Undocked Libreoffice panes
  • I had this problem for the longest time, and just found out that it is actually a documentation issue: [askubuntu.com]
  • How on earth did they get to control-doubleclick, though?!
22.01.2014 ADSL and Telefonica
  • Once a year I try calling my operator to see if they can upgrade my uplink. I'm amazed that to this day I can only get a 4MB/s link on an ADSL connection from Vivo (ex-Telefonica, ex-Telesp), the local wired operator. It's even weirder that on my current line, which I've had for about 10 years, I can't get an upgrade at all from the current 1MB/s. At the same time, Virtua offers me 20 and 100MB/s on cable at not much more that Vivo charges for their measly 1MB/s. Maybe I won't call again next year!
08.01.2014 Reminder to self: DBL and sendmail access map
  • The [www.spamhaus.org] DBL is great, but if it is blocking email you should be receiving, the way sendmail integrates with milters means you can't work around it by adding the sender address to the access map. The URI-milter package that we use doesn't provide whitelist support either; it's even mentioned in their TODO at [email.uoa.gr] Should I not be using 0.1-versioned software?
  • Another thing which sucks about the URI-milter is that /any/ match is considered positive; for the DBL, which I just found out lists even bit.ly (see [www.circleid.com] for details), this means that both the 127.0.1.2 and 127.0.1.3 are blocked, but they are quite different -- the first is for actual spam domains, and the other, for redirector domains which may be abused by spammers (see [www.spamhaus.org] for details).
     Non-authoritative answer:
     Name:   bit.ly.dbl.spamhaus.org
     Address: 127.0.1.3
  • PS: I've been invited to speak at the Brazilian Campus Party this year! I'm presenting at the Socrates stage on the 29th from 15h30 to 17h00. Joining me will be Paulo Henrique de Lima Santana, Fabio Pires, Marcio Junior Vieira and Marcelo Marques.
07.01.2014 Spreadsheets and Locales
  • I hate locales in spreadsheets. [webapps.stackexchange.com] -- why on earth does the locale change the ARGUMENT SEPARATOR in formulas??
05.01.2014 DHCP root-path weirdness
  • There is an odd bug in the DHCP root-path setting. I just don't know what it is.
  • I used dhcpdump to study it, FWIW; see [bentis.calepin.co] for some handy DHCP debug advice.
  • Cooked up a very crude overall boot time measurement system using rc.local and /proc/uptime. I wonder how reliable it is.
  • PS: for the FAQ "why does /proc/uptime show a larger number for idle time than raw uptime", see [ubuntuforums.org]
04.01.2014 Ripening
  • Have you ever wondered what causes fruit to ripen? Well, I did tonight, and I looked it up on Wikipedia and was amazed to see the article is terrible! But it contained a link to an incredibly interesting entry in the Plant Physiology info homepage: [plantphys.info]
03.01.2014 FSCache for the new year
  • I got a few 16GB SSDs to try as FSCache drives for our NFS-root diskless network. The idea of being able to transparently cache data in them and improve performance is really appealing, but at least on Ubuntu Precise the results weren't ideal.
  • Setup is fairly simple. I wired the SSD drive into the chassis, formatted it to ext4, installed cachefilesd, enabled it in /etc/default and modified /etc/fstab to mount the SSD and add the "fsc" option to the NFS mounts.
  • There is a problem with the kernel Yama security provisions that seems to be triggered by enabling FSCache; when running mutt a bunch of errors show up in the syslog like this:
     kernel: non-accessible hardlink creation was attempted by: mutt_dotlock
    However, it's possible to work around this (see post at [utcc.utoronto.ca] for details) by just setting
     kernel.yama.protected_nonaccess_hardlinks = 0
  • It seems to be working (well, in fact it started working after I fixed the configuration file; I think brun can't be less than 10% or it errors out) as I can see the cache directory growing and nodes being added to the cache hierarchy. And it probably does speed things up, as repeatedly opening up a 6.5 GB-file (this box has 8 GB RAM) results in a pretty good speedup with the disk drive being read at a constant 70MB/s:
     $ time cat win_xp.qcow2 > /dev/null 
     real    4m17.169s
     $ time cat win_xp.qcow2 > /dev/null 
     real    1m11.031s
  • Unfortunately, there are three issues I've found so far. The first is that there is an obvious race that happens when opening files multiple times simultaneously. This happens most frequently when using mutt to try and open the same mailbox twice; when this happens I get a flood of kernel messages and mutt weirdly showing an "unknown" mailbox.
     CacheFiles: Error: Object already preemptively buried
     [kworke] preemptive burial: OBJb2 [OBJECT_RECYCLING]
  • The second issue is that the file UID/GIDs appear to come up busted, at least in some mount points, like you get with NFSv4 when idmapd isn't running:
     kiko@memento:~$ ls -ld .ssh/
     drwx------ 2 4294967294 4294967294 4096 Oct 12 21:02 .ssh/
     
    And while behaviour under concurrent reads may be fixed in newer kernel versions (see [lwn.net] and [www.redhat.com] the other issue is that this KingSpec SSD is actually not that fast, even when compared with loading bits over the GBe network. hdparm shows the performance is kinda miffy:
     $ hdparm -tT /dev/sda1
     /dev/sda1:
     Timing cached reads:   18356 MB in  2.00 seconds = 9183.84 MB/sec
     Timing buffered disk reads: 214 MB in  3.02 seconds =  70.85 MB/sec
    For the same comparison above, if I turn cachefilesd off, here's the result:
     $ time cat win_xp.qcow2 > /dev/null 
     real    0m45.817s
    Wow -- I see a sustained 120MB/s over the network for that read.
16.12.2013 Ubuntu LTS Kernel Enablement Baselines
13.12.2013 Certificate renewal
  • Our SSL cert is expiring and I'm trying to remember how to generate a new CSR. Let me find out.. ah, right, there is a guide at [support.comodo.com]
  • One odd thing is having to generate the combined PEM file manually; I guess they can't do it for you because you hold on to the key when generating the CSR.
  • Anway, all of this is stuffed into /etc/ssl/LOCAL for us now, with a convenient README.
  • Installed also the certificate to be used with our SMTP server; this required a change to starttls.m4 which uses the same cert config entry for the CA and for the server (which suggests to me that it should really be a combined PEM file..)
10.12.2013 Oracle Java dialogs
  • 2013 has been the Year of the Java Update it seems. We've had to update multiple times given the security problems raised, and this is a problem because Banco do Brasil heavily depends on Java for access control to both company and personal banking websites. And since this raft of updates, OpenJDK no longer works for the company banking site, so I've been forced to use a computer with Oracle Java just for this.
  • Previous issues with updates included annoying popups and an unreliable login mechanism; you would get in once every 10 attempts or fewer on my machine. When it failed it would redirect you to a page telling you to install Java.
  • With the latest update (1.7.0_45) there are no longer blocking issues, but there is an annoying dialog that pops up every time I access the bank website: [java.com]
  • Turns out it's because of a limitation in the security mechanism for LiveConnect , a JS mechanism to call from a website into an applet; there is no way to make it work for all versions of Java: [blogs.oracle.com]
08.12.2013 Minors and airline points
  • Did you know TAM only lets you enroll into their alliance program children that are older than 2 years old? I generally wouldn't care to give a corporation a children's data, but the round trip to Taipei is probably worth a free flight or two so it would be nice to get.
  • It turns out China Airlines and SAA don't let you either. I guess those points are void :-/
  • Just recharged my TIM and Vivo pre-paid chips with 50 and 60, respectively. They are supposed to last 180 days. Will they?
03.10.2013 NFS TUNE
21.08.2013 Catch-all
  • Lent: GoPro to Iuri
  • Lent: Wheel-bag to João (PT)
  • Lent: Wheel-bag to Ozias (Disc)
20.08.2013 GRUB2 RAID weirdness.. understood and solved
  • I rebooted the server for the first time (after the failed disk) to cope with a kernel upgrade. No, I haven't yet swapped out the disk -- the chassis makes it a bit painful. But anyway, to my surprise, the system got stuck in the grub prompt, and I couldn't get it to boot by specifying the linux and initrd lines. Why?
  • One symptom I found was that cat /boot/grub/grub.cfg returned garbage. The other was that there was only one kernel version listed in /boot, although I had just upgraded the kernel so there had to be at least two. And yet another was that a file I had touched in /etc was also garbaged up. What's going on?
  • It turns out that grub was assembling the raid array using the failed drive (with SCSI ID 8) instead of the spare. It's really interesting that grub does a read-only RAID mount, but with very little checking, so when you have failed drives it show you weirdly half-stale data. To address this I disabled the drive in the server's SCSI utility and booted again, successfully. It gives an indication of how the grub RAID code works; I wish I had a way of saying what drives were being used in an md array as it would have saved me a lot of pondering.
  • Oh, and I used the SCSI utility to verify the drive as well. I have re-added it back and am waiting for it to fail again. The IBM drive seems to be really slow.. or maybe it's that it's on the secondary SCSI interface with a slower tape drive on it as well.
05.08.2013 RAID drive failure
  • Our sdd drive (SCSI ID 8) was kicked out of the raid because of an abort SCSI command overnight, at 3:36am local time to be precise. I'll add it back to the array after a reboot to see if it is transient or if it's really dead. The spare meanwhile seems to be working okay, but the resync takes ages..
  • It's worth noting that this fall-back setup has become weirdly slow. I'm still trying to figure out if it's the disk or something else.
06.06.2013 Java Banco do Brasil Locale bonging
  • Banco do Brasil's Java authenticator won't work in Mari's chromium browser, but it works in Firefox. What is the difference? Well, in chromium, we run into this issue: [groups.google.com] which has shown up on [bugs.debian.org] and [bugs.debian.org]
  • So Mari's locale is pt_BR.UTF-8. But the question is, if the error really is locale-dependent, why doesn't it trigger for Firefox?
  • I know now one more piece of the puzzle. We know the system locale is pt_BR.UTF-8. But if I visit [www.browserleaks.com] with both browsers I notice there is an important difference when displaying the results of Locale.getDefault() -- Chromium displays:
     Locale.getDefault() :
     Language Code   en
     Display Language    English
     Country Code    US
     Display Country United States
     
    whereas Firefox gives me
     Locale.getDefault() :
     Language Code   pt
     Display Language    português
     Country Code    BR
     Display Country Brasil
     
    and that's likely to explain why Chromium fails (C locale parsing assuming a dot as the decimal separator) while Firefox succeeds.
  • I'm still not sure what triggers the BB machine authentication reset that we run into periodically. So far I know that kernel updates do trigger it. What doesn't: java updates, firefox updates. Unknown: changing from OpenJDK to Oracle Java.
23.05.2013 The Joule
  • My Joule was acting up, so I tried to kick it into submission by formatting its partition via mkfs.vfat. I ended up with a filesystem with a bunch of garbage files that I couldn't quite figure out. I called Saris up and they said literally "don't use FAT32 or anything, just plain FAT" which I took to mean mkfs.msdos. That ended up creating a 12-bit FAT. I ended up with an empty drive. So far so good. Then after disconnecting and reconnecting the device to the computer, maybe a few times, it automatically created a CycleOps directory with a config subdir below it. Perfect! I guess the corrupted FAT entries suggest that the device's firmware only knows to write FAT-12 to it.. I need to check another device to confirm.
22.05.2013 Packet Mystery
  • My firewall ends up occasionally seeing traffic on a certain interface with the wrong source IP address. Why is that?
  • This is interesting but I believe unrelated: [blogs.oracle.com]
  • Wow, Java history is complicated.. or maybe just interesting [weblogs.java.net]
29.03.2013 Movie catch-up
  • Silver Linings Playbook
  • The Prestige
  • Un Cuento Chino
  • XXX bad movie about haunted house
28.03.2013 Shaping
  • We're planning on shaping our main incoming link to see if it can carry our regular traffic together with VOIP. I'm storing some pointers here to help me when we get to that: [en.wikipedia.org] [forums.juniper.net]
23.11.2012 Where is my /tmp/.X11-unix directory?
  • It's missing on all our diskless machines. What's going on?
  • Dunno, but it solved itself as part of regular updates and a pretty major fix to our diskless /etc/init scripts; there were subdirectories inside /etc/init with older copies of /etc/init/*.conf, and it turns out upstart also parses subdirectories -- oops!
  • Got an icalendar invitation viewer for mutt set up using [nickmurdoch.livejournal.com] though it did force me to use gem install which is so not the way to do it in Ubuntu!
22.11.2012 New link, and the shaping of ingress traffic
  • We had a new internet connection installed today, and it's a 4Mbps premium, unshaped link. In a weird encounter, however, this line from wondershaper completely kills my download performance (measured by a simple wget) on it:
     # tc filter add dev eth0 parent ffff: protocol ip prio 20 u32 match ip
         src 0.0.0.0/0 police rate 3800kbit burst 10k drop flowid :1
    I can't figure out why. I thought that maybe the rate stuff was wonky and wanted to use the avrate policer, but that doesn't work either:
     # tc filter add dev eth0 parent ffff: protocol ip prio 20 u32 match ip 
         src 0.0.0.0/0 police avrate 380 reclassify flowid :1
     RTNETLINK answers: Invalid argument
     We have an error talking to the kernel
    I thought that had to do with NET_ESTIMATOR being missing in the kernel config, as the LARTC meantions that estimators needing to be compiled into the kernel, but it seems that option is now gone and they are always built in. Oh, I see what's missing -- an "estimator" option. So this runs ok:
     # tc filter add dev eth0 parent ffff: protocol ip prio 20 estimator 1 2 
         u32 match ip src 0.0.0.0/0 police avrate 380 reclassify flowid :1
    Unfortunately, I'm still only getting 50% of what I expected..
08.10.2012 Bike Updates
  • Swapped the F1SL's chain, and replaced the F2C's rear derailleur cable. And then the week ended, and I left the office!
12.09.2012 So halt -p huh? And rsyslog
  • It seems that /sbin/halt no longer turns the system power off, and you need to run poweroff (or halt -p) instead. Did you know that?
  • rsyslog in Ubuntu has a rule that provides an admin feature I've always loved; the ability to log stuff to /dev/tty*. Yes, it's a bit of a security disclosure but I figure if you have tty access anyway..
  • The problem is it ships broken by default in Ubuntu. The issue has to do with privilege dropping; [kb.monitorware.com] notes correctly that the PrivDrop bits in rsyslog.conf cause tty writing to fail. What it doesnt't note is that you can use named pipes for this functionality and it works just fine; I found this out in a very unlikely blog comment here: [mikebeach.org]
  • So all you need to do to get this to work is to use "|/dev/ttyX" as the destination string for the facility. Cool!
10.09.2012 DD-WRT, time and NTP
  • [www.dd-wrt.com] has a pretty complete analysis of just how wrote the timezone and NTP handling in DD-WRT is busted. No wonder it was confusing me! For now, just using UTC on the device seems wisest.
09.09.2012 Cycling Metrics and The GoldenCheetah Performance Manager
  • For people looking for a way to get performance manager-style data from GoldenCheetah, in particular how to get the Performance Manager graph to make sense, check out the guide at [www.cyclingmusings.com] which apart from making sense of the metrics and PM tabs is particularly useful in explaining that you need to have Power Zone data entered (under Athlete options) in order to get BikeScores -- probably the #1 gotcha there. You can also get an idea of the mechanics behind the Metrics tab in a webcast at [bugs.goldencheetah.org] -- particularly interesting are the pieces that describe saved charts and user-defined ranges, which are hard-to-impossible to discover through the UI directly. And the user-defined ranges are automatically available as seasons in the Critical Power chart, which I found out about in this post: [groups.google.com]
  • And, if you're confused about TSS, CTL, ATL and TSB not appearing in GoldenCheetah, the thing to know is that GC 2.x uses Phil Skiba's metric system, which maps to the Coggan model as below:
     TSS => BikeScore
     IF  => Relative Intensity
     NP  => xPower
     VI  => Skiba VI (XPower/Average_power)
     CTL => LTS
     ATL => STS
     TSB => SB
    AFAICT Skiba VI is only visible in the metrics tab.
06.09.2012 PSU Again?
  • Actually, not so fast on the PSU fix. Ever since we put the drives back in the box, we've had somewhat random SCSI errors. This morning, when I installed a new network card, though, I'm unable to get the thing to work again. And get this: it only happens when the drive caddy is inserted into the case. If the caddy is sitting outside of the box, everything works fine. But the moment I slide it inside the case, SCSI errors galore. And I've replaced the cabling, improved the drive positioning, disconnected fans.. my hypothesis is a PSU grounding problem. But I need to get a replacement to actually verify..
21.08.2012 Clamav on Ubuntu as a Sendmail Milter
  • I am a bit surprised that nowhere is there good documentation on how to get Clamav running on Ubuntu as a milter. It's actually pretty easy.
  • sudo apt-get install clamav-milter
  • sudo freshclam
  • sudo /etc/init.d/clamav-daemon start
  • Add
     include(`/etc/mail/m4/clamav-milter.m4')dnl
    to your sendmail.mc file.
  • make && /etc/init.d/sendmail restart
  • That's it -- you'll have a working installation that is already scanning, quarantining and updating the virus database. I'm not sure what exactly causes freshclam -d to run in the steps above, but it's a daemon that will keep your database up to date.
  • To test, just bounce a message containing a virus (you probably have too many!) to yourself. It'll be put in quarantine mode, which I took a long time to figure out is actually a special sendmail queue, which you view like this:
     kiko@anthem:/etc/clamav$ mailq -qQ
     MSP Queue status...
     /var/spool/mqueue-client is empty
             Total requests: 0
     MTA Queue status...
             /var/spool/mqueue (1 request)
     -----Q-ID----- --Size-- -----Q-Time----- ------------Sender/Recipient-----------
     q7LCqigM003096    40178 Tue Aug 21 09:52 
          QUARANTINE: quarantined by clamav-milter
                          
             Total requests: 1
  • Quarantined messages are going to be in the same /var/spool/mqueue directory, but are prefixed with "hf". To remove stuff from the quarantine queue, you can use
    /usr/share/sendmail/qtool.pl -d -Q /var/spool/mqueue
    and you can also use qtool.pl to remove individual files.
  • If you don't want the messages quarantined (I probably don't) you can just set the configuration option "OnInfected Reject" in /etc/clamav-milter.conf. Note also that Stephen Warren suggests "AddHeader Add" here: [bugs.launchpad.net] Once you've made any configuration changes, just run an /etc/init.d/clamav-milter restart.
  • So far I am /very/ impressed with how simple and well it all works. Kudos to the project team who has come up with a very simple design -- scanner daemon, milter, database update daemon, and that's it. The packaging is also really nicely done, with the user permissions set up correctly and intuitively. It eats up some memory on the server, but we have so much anyway..
20.08.2012 Avenging Spelling
  • Every once in a while you receive the odd surprisingly fantastic message, and this weekend's winner says this:
    Hi, there is a small typo in [www.linaro.org] .. commecial should be commercial. No need to thank me, it's what I was born for.
    It's then signed "The Spelling Avenger". So how cool is that? I wish he had a website to link to..
17.08.2012 Pop and there goes a pirate
  • Our server's Corsair AX750 power supply just gave up the ghost, about three years in. Swapping it out was tough, in particular because the drives didn't seem to enjoy the whole movement. We installed spare drives, futzed around and finally remounted the original configuration correctly. Go figure!
  • Pretty cool reference on what directories to exclude from an Ubuntu backup: [askubuntu.com]
09.07.2012 Tracking Hangouts
  • Our somewhat complex multilink setup here at the office has a low-latency line which works really well for VOIP and video conferences, but to make the policy-based routing work, we need to know what hosts we are sending traffic to. Google Hangouts presents that challenge, since it's unclear what the hosts involved are. Plus, it changes! It used to be that talkgadget.google.com was all you needed to track, but now they've added stun.l.google.com and I'm still figuring out if that's all I need to pay attention to..
08.07.2012 If Unity won't run..
  • [askubuntu.com]
     kiko@gasolinux:~$ /usr/lib/nux/unity_support_test -p
     OpenGL vendor string:   VMware, Inc.
     OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 0x300)
     OpenGL version string:  2.1 Mesa 8.0.2
      
07.07.2012 War and the X7DVA BIOS
  • Our Supermicro X7DVA-8 server has ECC memory, and the memory and Northbridge run REALLY hot, so hot you can't touch the parts when the box is running, and if you load the server up too much the temperature trip squeals. So I decided that we could try updating the BIOS to the latest (albeit 2008) version. However, I was unable, I mean, TRULY unable, to get it to boot from USB. I managed to do it once, on a fluke, but the drive failed to boot, and after that I could never do it again. I ended up following Johan's advice and using a SATA drive with Freedos on it.
  • Booting USB on the server never worked because, although the BIOS could see the drive, it didn't seem to identify it as bootable; the drive would never appear as bootable on the F11 menu, nor on the BIOS boot order menu -- so though the drive /did/ appear in the BIOS listing, it never had a sequence number next to it, while the first 8 entries have numbers 1-8 next to them.
  • Getting Freedos on the SATA drive was less trivial than I thought it would be. I had a Freedos USB stick, and I can boot from that on another workstation, so it should be a matter of doing a format /s, or an old-skool sys c:. But getting the SATA drive to actually boot was tricky, and I ended up being reminded of the magical incantation:
     fdisk /mbr
    which was exactly what was missing. Getting fdisk.exe (and format.com) involved ISO-mounting the freedos ISO, but that wasn't so hard in the end. And once I managed to boot, the flashing procedure was trivial and only took a few seconds (just type in "flash x7dva208.rom").
  • However, after flashing, the system continues to run as hot as ever! At least the BIOS update got us up to DMA version 2.5, and perhaps because of it, or because I tweaked a bunch of BIOS settings, my lshw now reports I have L2 cache. This bothers me though... does it mean I didn't have this before, or that lshw can how see it?
             *-cache:1
                  description: L2 cache
                  physical id: 7
                  slot: L2 Cache
                  size: 12MiB
                  capacity: 16MiB
                  capabilities: burst internal write-back 
    Now why does the capacity not match the size? The mysteries of modern hardware..
04.07.2012 Vitamin D and the endurance athlete
  • My wife's serum Vitamin D came up rather low. So I did some reading about it over at [www.ncbi.nlm.nih.gov] and, particularly interesting for an athlete, [www.vitamindcouncil.org] Mariana never leaves home without a serious sunscreen, and we both always shower after riding, so it's probably advised to supplement, particularly for Mari whose serum levels are low. I like this coach's take on it: [lwcoaching.com]
  • Iron, Creatine, Beta-Alanine, Lysine, Alanine, Niacin, Vitamin C, Zinc.. Sometimes I am reminded how crazy it seems to take this many supplements every morning, and I eat a /lot/ of variety -- that's life being a vegetarian athlete!
16.06.2012 spambayes 1.1 and persistent_use_database
  • If you used spambayes like me -- and truly apparently only me, since there are ZERO hits on Google for this issue -- and have recently upgraded to Precise, you will probably find your .proclog spewing errors like this:
     Attempted to set [Storage] persistent_use_database
     with invalid value True ()
    This is actually caused by a change in a spambayes' configuration key, which is being set in your ~/.spambayesrc to "True", which used to be the right way to do it -- and still is according to the Debian package's README. If you look at spambayes' Options.py you'll find out that this is now a string field, with the following options available:
     ("zeo", "zodb", "cdb", "mysql", "pgsql", "dbm", "pickle")
    Apparently the old format is dbm, which is sort of the new default (the default is runtime selected between ZODB and dbm), but it's probably sensible to be explicit about it, so into .spambayesrc it goes!
12.05.2012 rdiff-backup and a full disk
  • Do you know what you get when you mix rdiff-backup and a disk full error? I do now:
     File "/var/lib/python-support/python2.6/rdiff_backup/regress.py", line
     290, in restore_orig_regfile
     tf.write_from_fileobj(rf.get_restore_fp())
     File "/var/lib/python-support/python2.6/rdiff_backup/rpath.py", line
     1195, in write_from_fileobj
     copyfileobj(fp, outfp)
     File "/var/lib/python-support/python2.6/rdiff_backup/rpath.py", line 64,
     in copyfileobj
     outputfp.write(inbuf)
     IOError: [Errno 28] No space left on device
    Besides being a royal pain in the ass you won't believe me when I tell you that to repair this requires actual surgery on the filesystem. Here's a bug report [savannah.nongnu.org] and a thread that discusses the issue more widely [comments.gmane.org]
  • The solution I found was to find a couple of really large files (MOVs in fact) in the rdiff-backup-data subdirectories and move them around to a separate filesystem while running --check-destination-dir. Hopefully that won't error out completely -- still running.
  • Ah, indeed rdiff-backup can cope with that -- it basically creates zero-sized files where it would have placed the original file and moves on. In my case it's slightly more complicated to interpret because this is actually a recovery pass (using --check-destination-dir) from a backup that failed and therefore the recovery pass is trying to recreate files in the rdiff-backup master directory which are actually deleted in the live system. But that's easier to amend later!
  • The best solution I've found to this problem, so far, is to keep some easily-freed large files on the filesystem. That way, even if you /do/ run out of space and crash, well, you can move them away and then recover.
05.05.2012 Swaps
  • Swapped the battery on my Powertap Pro
  • Also swapped stem and handlebar on the F2C
24.04.2012 Cycling Complexity
  • Finding the exact replacement part I need from Shimano proves tricky: Y-4BN98060 is what I need according to [www3.big.or.jp] but it did take a while for me to figure out I had an SL wheelset. In fact, I by mistake bought a pair of Y-4B909000 (confusingly labeled 4-4B909000) only to find it didn't actually work! Of course, that is for the WH-7800, etc, etc. Damn.
  • /usr/share/xsessions is where GDM finds the environments available for users to log in to the system.
  • How cool is [www.asciiflow.com] huh?
19.03.2012 Sloppy focus on Ubuntu Unity
21.02.2012 TouchPlayer
  • Man I'm loving the HP WebOS Touchpad. I'll write up a proper blog entry about it, but if you're trying to get an application to play random AVI and WEBM videos on it, you'll need to install a third-party application. I installed TouchPlayer, which apparently is a build of mplayer. It wasn't exactly trivial, essentially because you need to use a host PC to do the whole process.
  • First, you enable "developer mode" on the Touchpad, which involves typing stuff into the tablet's search bar.
  • Next step, you run, on a host computer, something called "WebOS Quick Install". This is just a java archive you run doing "java -jar WebOSQuickInstall-4.4.0.jar". You can get it from [dl.dropbox.com]
  • Before running this for the first time, you need to install a "driver", which is called palm-novacom; you can download the .deb from [developer.palm.com] -- beware, it will run a daemon, which kind of freaks me out.
  • You can now connect the tablet to the computer. You now run WebOS Quick Install, which should detect your device fine. Now, click on a little networky icon, and select "Preware". This gets installed on the actual device. You can later use this on the actual Touchpad to install applications, and there are quite a few.
  • Last stop! You then need to download two .ipk files and install them on the tablet through WebOS Quick Install. The first one is for the filemgr service, which should be installable through Preware, but which is currently 404ing -- no worries, a version is available from [code.google.com] Touchplayer itself is available as a download from [mobilecoder.touchpadhp.info] -- just install both ipks and you'll have it available on the device.
  • That's it -- disconnect and enjoy!
20.02.2012 Blobs
  • I just extracted Nvidia's driver source from Ubuntu's latest package and generated an ls -l on kernel/. It's interesting that it's a non-GPL'd kernel module, something which I know is both kinda rare and kinda controversial. The full list is at [pastebin.com] but the remarkable thing is this blob in the middle of it:
     [... dozens of 1-100K .c files]
     -rw-r--r-- 1 kiko mondo 13444768 2011-04-18 18:54 nv-kernel.o
12.02.2012 IcedTea and Banco do Brasil
  • I know that IcedTea really didn't /use/ to work with BB, forcing us to install Sun Java, and yes, even the bank says that at [seg.bb.com.br] but while it is true that in Chromium it doesn't appear to work, if you run Firefox in Oneiric it works exactly as you'd expect it to. Yay!
29.01.2012 Google Talk IPs
  • 74.125.93.127, 74.125.93.126 for the actual STUN and UDP media traffic
  • 74.125.93.102 for the HTTPS traffic.
23.01.2012 Randomness
  • Installed today a new front tire on my Felt F2C, a Schwalbe Durano Plus; let's see how long it will last. The last one, a Specialized All Condition tire, is my current favorite, but I got a nasty sidewall cut in it with practically new thread that kind of miffed me.
  • I did a TV interview today about the educational work we're doing with cyclists and drivers in São Carlos: [eptv.globo.com]
  • Also posted a reply to a rather misleading analysis of the effects of open source on the market, this time directed at Stoq, who the poster says is "killing commercial sales-focused ERP software" -- I do wish that part was right, though! [www.guiadopc.com.br]
  • Fixed my diNovo Edge keyboard again on Oneiric (and chattr +i'd the udev rules file to stop it from breaking!) as per [ubuntuforums.org]
  • Where did my panel volume control go? Well, actually, in 11.04 and onwards, the volume control is now provided by an indicator applet, so what's you're really missing is the indicator applet! Just add it back and be happy.
  • Need to replace your PowerTap bearings? Check out [www.youtube.com]
13.01.2012 Happy New Year
  • The random changes over the new year bring me a few new bits of wisdom. I'm doing the finishing touches on a migration from a 32-bit Lucid on an X61 to an amd64 Oneiric on my mom's new X220 with SSD. She's not sure she likes it yet, but I'm trying hard so she does!
  • The first hint is a workaround for the very odd 40 second hang that you get with OpenOffice (and co) when using your computer in a network whose DNS server doesn't respond to weird queries. It seems that OOO is doing a DNS lookup for, literally, "foobar.(local)" where foobar is your local hostname. The way to solve that is to add a weird entry for the machine in your /etc/hosts file. End of hang!
  • Second is how to install hamachi on Oneiric. Just pull the file from a helpful PPA on [launchpad.net] -- this and hachigui, which you can get from webupd8team's PPA at [ppa.launchpad.net]
  • Third is a reminder of the Qt4 problem that happens on Oneiric; running something like GoldenCheetah fails because of [bugs.launchpad.net] -- or maybe I should say maybe because of, because that bug is fixed and yet this fresh Oneiric install can't run GC. Never mind, install qt4-qtconfig and then run qtconfig-qt4 to select Cleanlooks to get it working.
  • There is so much stuff that I just know nothing about. Today, it's Xen. I'm actually looking for cases of things that exist in the kernel source tree but which don't build into a kernel itself; I know perf is one such thing, and kvm-tools might be another (see [lwn.net] for Ingo's rationale of why more userspace should go in). In fact, there's a bunch of stuff in tools/ -- cpufreqtools, turbostat -- enough that in Ubuntu you get this all through the linux-tools packages. Outside of tools, nothing I can see. Well, there's scripts, but the distinction between tools and scripts is blurry to me -- FWICT scripts is for tools related to managing the kernel source tree, whereas tools is for userspace tools you'd use inside the OS. (Sidestep into coccinelle, which I had seen in-tree but didn't know what was -- it's a tool to describe and help apply a semantic change to source code) There's samples/ too, which contains example kernel and userspace code. Finally, there's firmware/, though that's legacy firmware pulled out from old drivers that are being moved to use the external linux-firmware package and the request_firmware() API. So there are quite a few things, but none of them seems to be a hypervisor (or a similar runnable hyperthing).
  • Anyway, short story is Xen itself, the hypervisor (in other words, the 500k-or-so /boot/xen-*.gz) is not in-tree. It is distributed standalone from [www.xen.org] and includes quite a bit of code that is forked from the kernel (obvious examples are bunzip.c and the acpi/tables stuff). Support for running the kernel as a Xen dom0 guest works out of the box as of recent kernel versions.
01.12.2011 SRAM Chains
  • Are not directional. At least the PC1070 chain I replaced on my F2C isn't! Apparently all the Shimano 10-speed chains are, though: [www.bikerumor.com]
  • Insightful link on the risks of sendmail virtusertable remote forwarding: [www.clasohm.com]
  • My Nokia 3250 was driving me crazy asking me "permitir ao cartão sim o envio da mensagem" (which actually means "Allow SIM to send message?"). Turns out you need to disable Settings->Phone->Security->Confirm SIM Services, and it happens because it's a 64K SIM and this phone doesn't support something called ENS. More information here: [www.howardforums.com]
  • Do you have an old server upgraded to Natty that's not getting its grub updated when you install a new kernel? Well, I did, and the reason was it was missing a single line in /etc/kernel-img.conf:
     postinst_hook = update-grub
    Thanks to Tim Gardner and Steve Langasek who helped track the problem down.
18.11.2011 Multi-homed pain
  • Upgraded Anthem to Oneiric; everything works well EXCEPT for the fact that some packets are ending up in the wrong interface. Specifically, if a request comes in to the interface which isn't the default route, it's not being replied to on the same interface. What's going on?
  • Turns out that SOMETHING happened in the 3.0 timeline that finally made it mandatory to specify rules that explicitly set the right routing table for replies on each IP address. So the fix was just adding these two lines:
     from 189.19.234.109 lookup dsl-eth2 
     from 200.210.17.18 lookup dsl-eth0
    That was really all that was missing. And, if you read the rules, you gotta ask yourself how it is possible that this worked before!
  • Of course, after I've found out what it is I find this site from 2007 explaining exactly this: [kindlund.wordpress.com]
  • Ran into bzr bug with our managed /etc. Damn it! [bugs.launchpad.net]
16.11.2011 The Eyes!
  • I've finally booked eye surgery for next week; Monday I go in for exams and then Tuesday or Thursday is the actual operation.
  • Interesting research in healing of Lasik-cut flaps: [www.usaeyes.org] -- damn, 2 years is a long time to wait for safety, though!
26.10.2011 Clearing out the attic!
  • Just deleted my ~/tmp, ~/Downloads, ~/devel/FREEZER today, so if I need it, remember to look at yesterday's backup ;-)
07.07.2011 A History of Tabs
  • I'm today surprised and mildly annoyed that the shape of Chromium's tabs are exacly the same as the tabs I designed in the first version of the tool www.lecto.com.br uses internally.
  • Was stuck doing a nfs-common update on the diskless, which seems to keep breaking in the postinst phase as nobody really runs nfsroot at Canonical (wink), but found out that dpkg keeps all the scripts in /var/lib/dpkg/info, and you can just hack the file to make the postinst pass and forget about it for now. Yay!
06.07.2011 Xen and Natty
  • Don't seem to mix, we found out when upgrading dragon2 to Natty, unless you update /etc/initramfs-tools/modules to include platform_pci and xen_blkfront. Read all about it at [bugs.launchpad.net]
05.07.2011 GRUBbed out on a Sandybridge
  • Since in the office we only use diskless machines which use gPXE, I end up not worrying about Grub very much. It turns out Grub2 has some weird traits, some great, some bad, and some probably a bit buggy.
  • For instance, with the default settings
     GRUB_HIDDEN_TIMEOUT=0
     GRUB_HIDDEN_TIMEOUT_QUIET=true
     GRUB_TIMEOUT=10
    I can never seem to get access to the menu. I ended up adding a boot tone -- Super Mario no less at [www.reddit.com] -- and then cranking HIDDEN_TIMEOUT to 1, which allows me to press two escapes during the tune and getting to the menu. The _QUIET thing is interesting; if you mark it false you get an ASCII countdown instead of a blinking cursor.
  • When I update-grub on this computer, I /must/ do a grub-install /dev/sda or I get into an infinite reboot loop without any recourse. Recovering involves using a USB boot image which fails into an initramfs prompt without trivial access to my RAID1, so I am not really interested in debugging beyond this.
  • This is Maverick's grub2, so it might be a solved-in-Natty thing.
  • Oh and, please note: if you have a Sandybridge CPU, Memtest86+ 4.10 will hang before starting up. You need to run at least 4.20; thankfully it's pretty easy to replace it -- download and stick it into /boot.
  • Mari's new box also suffers from [bugs.launchpad.net] so I'm also figuring out an updated e1000e driver for her. Ended up using DKMS to do the build, which isn't as hard as it looks. Just use a config like this:
     PACKAGE_NAME="e1000e"
     PACKAGE_VERSION="1.3.17"
     MAKE="make -C src/ BUILD_KERNEL=${kernelver}"
     CLEAN="make -C src/ clean"
     BUILT_MODULE_NAME[0]="e1000e"
     BUILT_MODULE_LOCATION=src/
     DEST_MODULE_LOCATION[0]="/usr/src/"
     AUTOINSTALL="yes"
     DEST_MODULE_LOCATION[0]=/extra
    make sure you have the right headers packages installed, and grab the latest e1000e driver from [downloadmirror.intel.com] -- funny thing was, I had it all set up when I installed the headers, and it Just Worked as part of the post-install hook. I did an update-initramfs -k all -u just in case, though.
  • Finally, this specific Sandybridge Maverick setup ended up with a horribly slow Xorg; turns out it's trivial to get it running fast by using the PPAs suggested in [ubuntuforums.org]
27.06.2011 A RAID Tale
23.06.2011 An ARM Machine List
22.06.2011 Eating those leftovers
  • Started the day by noticing the backup failed because the disk was full. Turns out I was a) backing up a bind-mounted /proc and b) had an extra copy of the root directory backup. Cleaning these up via rdiff-backup --check-destination-dir.
  • Updating Chromium to the latest beta fixed the hang I was seeing -- nice when it's easy to fix something like that!
  • The GC loader worked, but I needed to update /etc/group in the diskless system to get it to read /dev/ttyUSB0.
  • Worked around the weird upstart hang by checking for statd running inside diskless-mount, and used the same approach to avoid having to do the ugly sleep 6s inside statd-start.conf.
  • My GF account is unblocked.
  • Note to self: just rename .conf file extensions to something else if you want to disable them.
  • The only weird thing is that I asked this machine to halt and it.. seems to be hanging; or at least, taking a real long time. Okay, it seems to be that initctl is telling portmap to shut down, but it isn't dying, or maybe it's not even getting that far. I change umountnfs.sh to do an emit --no-wait to work around this and move on.
  • If you're using nautilus on an NFS serve, for instance, looking at your home directory, you may find it hangs to a horrible halt. The problem is related to apparmor: [bugs.launchpad.net] -- the workaround is pretty simple, adding the nameservice stanza to evince-thumbnailer.
  • References you need when doing this sort of work: [nfs.sourceforge.net] [upstart.ubuntu.com] [upstart.ubuntu.com]
  • Finally, the command you want when you want all your UUIDs neatly presented: blkid(8)!
21.06.2011 Natty boot leftovers
  • Sometimes, I'm getting a weird hang at bootup that says:
     IP-Config: no response after 3 secs - giving up
     IP-Config: eth0 hardware address XXX mtu 1500 DHCP RARP
    Online reference: [bugs.debian.org]
  • I've just noticed that cups and ssh get stuck in "starting" states; wonder what's causing that. SSH at least starts up okay at first, though if I stop and then start it, it hangs. I thought it had to do with both being "respawn" jobs, but acpid is as well and I can start and stop it without issues. Does this have to do with the /state stuff I made wait on both those services... or are they just being very very slow?
  • I wasn't logging stuff properly; turns out my rsyslog.d directory didn't have a proper 50-default.conf file, and nothing was getting logged out. But it is also being affected by this weird state blockage I am pointing out above.
  • Chromium won't load any pages. Gah.
  • Need to check if the PowerTap loader in GC still works.
  • Need to check if my GF account is unblocked.
  • rsyslog whines that xconsole is missing, but runs fine; Ubuntu bug: [bugs.launchpad.net]
  • The backup failed because the external drive ran out of space. Guess it's time I went to bed :-/
20.06.2011 Updating Root NFS to Natty
  • I'm in the process of updating our default root filesystem to Natty. I started out a bit stuck on this because our previous filesystems used modified bzimages (XXX: which tool?) to indicate we were booting from NFS; reading through [help.ubuntu.com] I just needed to update the initramfs.conf to indicate BOOT=nfs and regenerate the initramfs'. The server IP address and root-path is provided through DHCP, and this change AIUI tells the initramfs to look there.
  • Now, /dev/shm was coming up with the wrong permissions -- it's meant to be 1777, but wasn't. I can't quite figure out why this was happening, but it has to do with some leftover in /etc/init because I started from a clean slate and it's not happening now.
  • Next, when we boot NFS we get hung in upstart; if I use the alt-sysreq key to kill everything I notice that a) no portmapper is running and b) neither is statd. If I try and run the portmapper manually, it just exits without telling me what's going on. A strace tells me that a) /var/run/portmap.pid can't be written to (sure, the filesystem is read-only) and /dev/log isn't running.
  • So first, let's see if we can mount that filesystem read-write up front. I try this first by putting an entry in the rootpath being served up by dhcpd, but I get nfs-premount complaints suggesting that it doesn't like the ",rw" suffix. Instead, I put an "rw" into the gPXE script commandline and it seems to work.
  • To check, I created an upstart script that simply spawns bash:
     description^I"BASH for the power hungry"
     #   
     start on startup
     #   
     # Output to the console
     console output
     # Tell upstart to wait (see
     # [upstart.at] for more)
     task
     # Run the command
     exec bash
  • And indeed, I can now write to the root filesystem. Great. Now on to debugging why portmap and statd don't run when they should. I can't seem to wrap my head around the statd-mounting and statd interaction, so I'm trying to break it into pieces. Now, I expected the following script:
     start on mounting TYPE=nfs
     task
     exec start statd
     
    would block the NFS mount from going through until statd was running. But either start is asynchronous, or it doesn't block at all. I guess it's because statd takes a while to actually get going, and meanwhile the "mounting" event has completed and mountall. My workaround looks like this:
     start on mounting TYPE=nfs
     task
     console output
     normal exit 1 2
     # 
     script
         start statd
         # This apparently is necessary to ensure the statd run
         # completes; it's a hack but it seems to work more reliably than
         # anything else
         exec sleep 6s
     end script
    and so far it seems to work okay.
  • I then added in scripts, in the following order: rc-sysinit, udev-fallback-graphics, ssh, dbus and gdm. Reboot.
  • Worked. Added now rc and rcS.
  • Noticed I should have brought in mounted-varrun and the other mounted-* bits earlier. Done, though I question how useful mounted-tmp and mounted-dev are in this diskless setup.
  • Brought in a few more hopefully harmless bits: console-setup, dmesg, hwclock*, irqbalance, control-alt-del and module-init-tools.
  • Pulled in a slightly modified ypbind startup script and made gdm depend on it. The script comes from [bugs.launchpad.net] though I can't quite get it to work with the IFACE=!lo check that it uses; I just dropped that check which should work okay.
  • I dropped the mountall-net script which seems to be a hack to work around the lockd issue I think I worked around in a simpler (if slower) way.
  • Started using bzr to control the directory as it's a much better match than this ranting blog entry ;-)
  • I'm thankful for Johan's hint to radeon.modeset=0 on the kernel commandline which (in combination with disabling udev-fallback-graphics.conf) allows me to actually see what is being spewed in the log. I'd love to have upstart just log all the events to a file..
  • Overall, the main issues with upstart racing seem to be around the time the daemon starts up and is actually ready to handle events, and to a lesser extent around the complexity of the state transitions themselves. In our case, while we hooked on mounting to start statd, running the statd.conf script instantly allowed mounting to proceed, which would fail because statd wasn't yet running (why 5s of waiting addresses that, though..) Or in the case of NIS, which is running but not enough for GDM to actually show the list of users.
  • Spend a few minutes figuring out why file locking was broken (again). Turns out that a) /var/lib/nfs needs to have the sm and sm.bak directories mode 700 and writeable by the user running statd. Who, incidentally, is taken from the owner of /var/lib/nfs, and which was incorrectly set to syslog on this system (and there was actually no lockd user in the passwd file, oops).
  • It's unlikely postfix will actually work without /var/spool set up for it. But how do you get it set up initially? Easy -- just create /var/spool/postfix; the rest gets set up by postfix itself!
  • The diskless boxes write to /var/log early in the boot process; I've worked around this by mounting a tmpfs there and later on mounting directories under /state to handle that more gracefully. Looking at the tmpfs /var/log generated without /state. up to gdm running, so far what's written to it is:
     total 336
     drwxr-xr-x 2 root root     60 Jun 20 21:02 ConsoleKit
     -rw-r--r-- 1 root root 108599 Jun 20 21:02 Xorg.0.log
     drwxr-xr-x 2 root root     80 Jun 20 21:02 gdm
     -rw-r--r-- 1 root root    292 Jun 20 21:03 lastlog
     -rw-r--r-- 1 root root   1615 Jun 20 21:02 pm-powersave.log
     -rw-r--r-- 1 root root 208615 Jun 20 21:02 udev
    I've stocked this in 00early-log.contents files in the directory for later debugging.
02.06.2011 Bootable DOS USB Drives (for BIOS updates)
  • Have this annoying issue of having to update a BIOS, but not having a bootable DOS USB stick to actually run the update? Well, I did, and I spent a LONG time reading through various confusing blog posts until I stumbled upon one that gave me two important nuggets:
  • First, use FreeDOS
  • Second, use makebootfat (see [linux.die.net] for a manpage)
  • This actually translates into a very small number of steps. First, you download the fullcd FreeDOS image from [ftp.ibiblio.org] and then you do something like this (watch out for sdX below):
     # You'll need to change only this line
     DEVICE=/dev/sdX
     # Set up your DOS filesystem
     sudo mount -o loop fdfullcd.iso /mnt
     mkdir /tmp/dosboot
     cp /mnt/freedos/setup/odin/kernel.sys /tmp/dosboot
     cp /mnt/freedos/setup/odin/himem.exe /tmp/dosboot
     cp /mnt/freedos/setup/odin/command.com /tmp/dosboot
     cp /mnt/freedos/setup/odin/more.exe /tmp/dosboot
     # Set up a config.sys
     cat << __EOF__ > /tmp/dosboot/config.sys
     DEVICE=HIMEM.EXE
     LASTDRIVE=Z
     BUFFERS=20
     FILES=40
     DOS=HIGH,UMB
     DOSDATA=UMB
     SHELLHIGH=command.com /P
     __EOF__
     # Get the FAT boot sector
     cd /tmp
     unzip /mnt/freedos/packages/src_base/kernels.zip source/ukernel/boot/fat16.bin
     mv source/ukernel/boot/fat16.bin .
     # Do it
     sudo makebootfat -X -o $DEVICE -b /tmp/fat16.bin /tmp/dosboot/
     
  • I'm probably overdoing it by using HIMEM, and selecting only a few files from the odin/ directory; you might be able to avoid the config.sys entirely, and just use the whole /mnt/freedos/setup/odin/ as makebootfat's argument instead of the dosboot thing.
  • The -X is the only gotcha. I think you need to use it because you're using the fat16.bin file (for compatibility, maybe?)
  • It's really that easy. I have no idea why people complicate this so much. The only other post I've seen which uses this strategy is [notes.realitysedge.com] but it is still not enough of a cookbook for me. I suspect there's some issue with BIOS compatibility, and that my method doesn't work for all USB drives or computers, but it does work for me. For reference, the original blog post that hinted me on this is at [blog.realcomputerguy.com]
  • And wow, FreeDOS really does boot fast.
24.05.2011 Mutt, vim and auto-completion
  • I recently changed my mutt options to use autoedit, which is cool because it puts me in vim very quickly to reply to email, but not so cool in that typing in addresses becomes a lot harder. Well, today I spent an hour working up something that autocompletes the addresses when I type tab in those fields. Enjoy my quick hack here: [pastebin.ca]
22.05.2011 From 1TB to 2TB
  • Mari's computer needs lots of disk space because it's where she collects the photos and pictures she publishes on [www.marignatios.com] -- and photos are huge. This weekend I had to move the files from the existing dual-disk RAID-1 to a new pair of disks.
  • It's a long story, but to shorten it: a) I used rdiff-backup to back up the actual filesystem, except for the images b) unfortunately, the only external drive I had that was big enough to hold all her images was formatted as vfat -- I knew I was going to regret it c) I ended up just rsyncing the images to the vfat drive which worked okay because there were no permissions or ownership to care about d) I had to use a live-usb image to actually set up the new disks and copy the data across and finally e) I had a hard time getting grub to work, and I ended up stuck in the grub rescue> prompt once. When that happened, I found [help.ubuntu.com] to be invaluable, so if you ever run into that prompt yourself and feel lost, just read that page. Once I had booted into the system once, grub-install and update-grub fixed it permanently. Two big hammers, but they fix things just like the old /sbin/lilo did ;-)
17.05.2011 Can't Mailman and LinkedIn just be friends?
  • We run a number of mailing lists at Async; quite a few of them are related to Stoq, our made-for-Brazil point-of-sales-and-everything-else management sytem. The lists are busy with lots of users that subscribe to ask about features and workflows and it is always cool to see the interactions there. However, there's one thing which really drives me nuts, and that is that because of web-based email integration, LinkedIn thinks it's cool to send email to our mailing lists. Well, it's not cool, but mysteriously, Mailman doesn't block the emails either!
  • The reason this happens is a subtle Mailman behaviour that I suspected yesterday but which Barry Warsaw confirmed today: by default Mailman also looks at the Reply-To header to check whether the sender is subscribed and therefore allowed to send mail to the list. The email we got had headers like this:
     From: Foo Bar via LinkedIn <member@linkedin.com>
     To: Bar Baz <stoq-users@async.com.br>
     Message-ID:
     <1494000622.13827915.1305593008105.JavaMail.app@ela4-bed35.prod>
     MIME-Version: 1.0
     X-LinkedIn-Template: invite_member_23
     X-LinkedIn-Class: INVITE-MBR
     X-LinkedIn-fbl: s-qakeuW-Xh7nGNqQ4F7rGOINKZVY7HNzQuIeYRlX2tnWAO4zNKkm
     Subject: [Stoq-users] Foo Bar quer manter contato no LinkedIn
     X-BeenThere: stoq-users@www.async.com.br
     X-Mailman-Version: 2.1.13
     Precedence: list
     Reply-To: Foo Bar <foo-bar-is-subscribed-to-list@hotmail.com>
      
     [...]
  • The email member@linkedin.com is actually in a discard_these_nonmembers configuration rule for the stoq-users mailing list, so it should be getting discarded. But because the Reply-To address is of a list subscriber, Mailman thinks that the email is truly being sent by the subscriber, and not by a proxy like LinkedIn. It happily ignores other sender filters and delivers the spam. Ouch!
  • To fix this, you need to update /etc/mailman/mm_cfg.py and set the SENDER_HEADERS variable; the default value (in /usr/lib/mailman/Mailman/Defaults.py.) is:
    SENDER_HEADERS = ('from', None, 'reply-to', 'sender')
    I ended up using simply:
    SENDER_HEADERS = ('from', None)
    and then restarted Mailman. And now I'm waiting to see what my moderation queue looks like -- hopefully it will prove that the change worked! Barry tells me that in Mailman 3 this behaviour is clearer, and that they also have a debug mode planned which would allow us to send a probe email to find out why Mailman is doing what it does. But for now, problem solved!
  • (Launchpad, btw, also does this impersonation trick in order for its bug mail interface to work -- but because of how accounts are set up we don't really make it easy for you to send unwanted mail to a mailing list)
16.05.2011 shirt sizes, aggregate totals and SUMIF
  • I produced a spreadsheet in Google Docs today that was a pretty simple mapping of name and shirt sizes to quantities; something like this:
     foo S 3 
     bar M 5
     baz S 2
     poo L 1
  • I wanted to include an aggregate sum of each individual size; something like:
     S 5
     M 5
     L 1
  • It turns out that the SUMIF function is what I want. You just need to get the syntax right; I used, for each of the sizes, a cell like this:
    =SUMIF(B2:B31, "=S",C2:C31)
    That formula will check column B for cells that contain the string "S"; where they do, it will sum the numbers in the corresponding row of column C, which is exactly what I wanted.
  • There might be a way to do this without having to actually code a cell for each total. But finding out how eluded me in the five minutes I had for this task, so SUMIF remains as my favorite solution for today.
09.04.2011 MSN on Pidgin on Maverick
  • On Mari's computer, Pidgin doesn't like MSN anymore; it reports
     1 account was disabled because you signed on from another location
  • I've seen this mentioned in a few places, but nowhere as a big deal for Maverick users: [developer.pidgin.im] [trac.adium.im] [permalink.gmane.org] [pidgin.im] [bugs.launchpad.net]
  • I /think/ the problem is just that Maverick's Pidgin is old; as has happened before, the MSN protocol was updated and the implementation wasn't. At least upgrading pidgin to the version packaged in their release PPA (see [www.pidgin.im] for details) was enough to solve the problem permanently.
08.04.2011 Path MTU Discovery mysteries
  • Again, facing P-MTU-D issues on my secondary outbound interfaces, and I don't think it's not the fault of the upstream provider (well, I tried with both, and maybe they are both broken). Symptoms are the usual large-transfers-get-packets-dropped-silently-when-incoming.
  • It could be a network card issue, because I'm seeing errors only on that interface:
     RX packets:3058114 errors:14691 dropped:0 overruns:0 frame:14691
  • And I'm fascinated by the reply Stephen Hemminger gives at [www.spinics.net]
  • But for now the workaround is to use the good ole MSS clamping cheat covered at [lartc.org] (how did that disappear from my iptables rules, though..)
  • For other possible problems, look at [prowiki.isc.upenn.edu]
  • In updating the subnet entries, I cheated on the boolean math and used [jodies.de] to calculate the masks and addresses, which is pretty neat.
06.04.2011 Building GoldenCheetah 3.0
  • Midnight project. Trying to build GoldenCheetah's 3.0 branch requires lots of package scavenging, installation and some makefile hackery. But when it does build:
     kiko@gasolinux:~/GoldenCheetah/src$ ./GoldenCheetah 
     Cannot open qollector_interpret program, available from
     [opensource.quarq.us]
     QMetaProperty::read: Unable to handle unregistered datatype 'RideItem*'
     for property 'RideMetadata::ride'
     QFileSystemWatcher::addPaths: list is empty
     Segmentation fault
  • Sigh.
  • A make clean and make later, it seems to kinda work though! I am getting weird results on the graphs for imported rides, so I need to test next by downloading from the actual PT head to see what I think.
15.10.2010 Full CUPS and an empty lsusb
  • Mari's printer won't print. It's not the second time this has happened, but I keep forgetting what causes it. The symptom is simple: you ask the print dialog to print, and nothing happens. The printer properties screen says "/usr/lib/cups/backend/hp failed". The CUPS error_log says
     D [21/Nov/2010:12:30:49 -0200] [Job 43] prnt/backend/hp.c 745: ERROR:
     open device failed stat=12: hp:/usb/PSC_1400_series?serial=BR64H3G1K704BM
    And the final, odd hint is that lsusb just returns you to a shell prompt with no output. Do you know what the problem is?
  • It's related to working around a PowerAgent bug I wrote about a few months ago. To work around a Java library's fixation on /dev/usb I added a link from /dev/bus/usb to /dev/usb, and forgot it there. What happens then is funny: for some reason when plugging in the printer (or returning from suspend, it turns out) udev wants there to be a /dev/usb/lp0 entry, and since /dev/usb is linked to /dev/bus/usb, it ends up creating the lp0 node in /dev/bus/usb/lp0 which in turn causes lsusb to break, since it doesn't expect to find any device nodes on the first level of /dev/bus/usb. Delete the /dev/usb link and you're back in business. Pity it took me 20 minutes today to remember this!
  • PS: the same sort of error hit us on Anthem, our server, a few weeks later -- the hp.c backend complaining about open-device-failed. It ended up also being an issue with USB connectivity -- the USB hub we're plugging the printer into is just flaky.
14.10.2010 For a rainy Saturday
13.10.2010 The keyboard layout that won't go away
  • Is it happening to you too? Ubuntu won't let me get rid of an incorrect keyboard layout, and in fact defaults to it!
  • The problem is pretty complicated, but it's related to GDM, .dmrc and /var/cache/gdm/*/dmrc files. Or at least I've figured this from the various places it's been reported: [bugs.launchpad.net] [bugzilla.redhat.com] [ubuntuforums.org] [bugs.launchpad.net]
  • I deleted .dmrc, /var/cache/gdm/kiko/dmrc, logged out, logged back in and the problem is gone. But I think there's definitely a bug in there..
12.10.2010 Anthem now 64-bit
  • After a whole day of work Anthem is now Maverick 64-bit installed on a simpler set of RAID-10s. Issues that we hit so far:
  • [bugs.edge.launchpad.net]
  • [bugs.edge.launchpad.net]
  • A weird long hang of sync and dpkg when under heavy I/O load that might a tuning issue or something else
  • No /etc/init.d/iptables to restore my firewall rules, but iptables-persistent saved the day.
  • Cold bootup via NFS to GDM screen in 55s; not bad IMO! However, I'm still stuck with a /var/spool filesystem that still won't mount at boot, though in recovery mode a mount -a fixes it. Mystery!
  • Note to self: /proc/sys/kernel/domainname can't be set to anything but the correct NIS domain, as ypbind will always use it (and not read files or anything else to get it right)
09.10.2010 From Jaunty to Maverick in many painful steps
  • Upgrading our serveraxis-hosted server from an ancient Jaunty to Maverick has proven to be trickier than I thought. Here's the list of problems:
  • The kernel we boot off is a Xen guest kernel and it's hosted outside of the image, so when I upgrade to Karmic mountall starts failing all over because the kernel is too old: [bugs.edge.launchpad.net] [bugs.edge.launchpad.net] [blog.webangel.ie]
  • Once we boot we get a recovery console. I manage to log in by using serveraxis' excellent webconsole, but to get the paste working there requires some tricking Firefox into letting JS apps access the clipboard: [kb.mozillazine.org] -- in particular you need to use the about:config trick for granting access to signed scripts.
  • Once there it's possible to bring network interfaces up and even to start sshd, but logging in via ssh doesn't work because there's no /dev/pts, which you can easily fix by just mounting it: [www.linuxquestions.org]
  • I decided to contact serveraxis support about the kernel and continue recklessly upgrading with a non-booting system in the hope the kernel can get sorted out separately; apparently I'm not the only one and they've made at least one customer happy: [jenniandjordan.com]
  • When upgrading to Lucid, the installation fails to install the Lucid version of mountall -- again, because the kernel is too old, though this time it's because it triggers a tar bug: [bugs.edge.launchpad.net] [bugs.edge.launchpad.net]
  • I managed to wget mountall and a new tar, but the new tar package doesn't install because of the tar bug. Not to worry: ar x tar_1.23-2_amd64.deb and grab the tar binary in the data.tar.gz, putting it in /sbin. Got the mountall package installed and apt-get -f install && apt-get -u dist-upgraded away, which brings us all the way to Lucid.
  • The reboot into Lucid was as eventful as the one to Karmic, but I'm now a pro at fixing it up so I can actually upgrade fine.
  • The server has been mysteriously halted as I was using it; either this is routine maintenance or somebody's doing something to the server for me! And indeed, here it is:
     kiko@dragon:~$ uname -a
     Linux dragon 2.6.35.4-2.pvops #0 SMP Mon Sep 20 18:32:22 CDT 2010 x86_64 GNU/Linux
    ServerAxis absolutely rock -- they are actually a bit scary even!
  • Finally, until the reboot happened I was getting for a brief moment a compatibility issue with dircolors that is best explained by this bug: [bugs.debian.org]
  • Back to where I started: getting bitlbee-plugin-skype working on Dragon so I can use it to talk to people without proper chat clients. Pulled and installed the two packages from sid: [packages.debian.org] [packages.debian.org]
  • Then installed a vnc4 server and client and ran skype manually on the server; after some rock and rolling managed to get it to log in and run, and the skyped configuration (once you understand it) isn't that bad in fact. The thing which I wasn't clear about is that a) you need to have skype running and b) running skyped is what will cause you to be prompted to allow Skype4Py to access skype. But now it's all clear!
  • Once the installation and skype were happy through xvnc, I installed xvfb and just ran skype and skyped with the display set appropriately; it seems to just work which is nice after a whole afternoon of things that just didn't ;-)
  • Found out this little command
     account set skype/display_name "John Smith" 
    That seems to work with MSN as well. Interested in knowing what it does (and if it's persistent across connections and Bitlbee restarts..)
09.09.2010 Memory and Amnesia
  • Upgrading a desktop to 4GB. I won't really believe that somebody can actually need four gigs of RAM on a desktop, but Firefox and OpenOffice keep surprising me as their appetite for memory grows! Anyway, the important thing to know about when doing this upgrade is that:
  • You really want a 64-bit installation if you are using more than 4GB of memory; you can use PAE if you want to stick with 32-bit, but I'm not entirely sure it's worth it.
  • Low-end motherboards, including those using the Intel i945 chipset, don't really support much more than 3 gigs of ram; they have I/O addresses within that address space, and they lack memory remapping functionality to make it work. [en.wikipedia.org] has the skinny, and [www.linuxquestions.org] discusses the actual i945 chipset. There's a blog post at [blogs.msdn.com] that presents it nicely.
  • Meanwhile, Frictional Games released a pretty amazing game called Amnesia -- and they have a version for Linux! Spend your US$20 wisely here: [www.amnesiagame.com]
25.08.2010 unclutter hurts
  • I have two annoying things bothering me in Lucid. First .xsession-errors fills up to gig-size every other day with some weird
     "Window manager warning: Got a request to focus 0x2821bae (Terminal)
     with a timestamp of 0. This shouldn't happen!"
    messages. Second, if my mouse is highlighting a window and I alt-tab to another one, every other second or so the focus would shift to the original window. That essentially drove me CRAZY. So today, after having to delete the file for the Nth time because it was at 2GB, I found it that the culprit is UNCLUTTER!! [bugs.edge.launchpad.net] has the scoop, but if you want a recommendation from me, it's "apt-get remove --purge unclutter for now. I think it's just buggy -- the idea is a nice one, though I'm not sure I want an extra daemon just for that functionality -- in particular because it seems that the mouse pointer blanks over gnome-terminal when I'm typing anyway.
  • Bonus Lucid hint for the day: if your sun java plugin is installed but your firefox or chromium don't see them, try Johan's magic combo:
     update-java-alternatives -l
     update-java-alternatives -s java-6-sun
06.08.2010 A Logitech DiNovo Edge dongle's embedded and HCI modes
  • I have a DiNovo edge and think it is absolutely fantastic, except when it stopped working on my way to my Lucid upgrade. What happened? The problem is that to Windows users the dongle normally works in embedded mode, meaning it appears to the system as a regular USB keyboard and mouse. Many Windows users hate that behaviour, and would rather the dongle behaved as a regular bluetooth adapter. However, Linux users have the flexibility of choosing in which mode they want the adapter to work, and the magic of selecting them is done through hid2hci, which is run by udev when the device is plugged in. Now it used to be that you could enable or disable hid2hci by fumbling in /etc/default/bluetooth; with the inexorable move to udev, however, on Lucid this behaviour is now hardcoded in udev rules and by default hid2hci is run, which means the keyboard only works if you perform a bluetooth pairing exercise. And since it's a keyboard, that exercise can prove pretty challenging!
  • The command which is run when the BT dongle is plugged in is pretty simple:
     /lib/udev/hid2hci --method=logitech-hid 
         --devpath=/devices/pci0000:00/0000:00:1d.2/usb4/4-2/4-2.3/4-2.3:1.0/usb/hiddev0
    Note the path points to the hiddev0 directory for the device in question. Note also that it seems you can't revert back to embedded mode once the command is run, for some reason.
  • People that are running into this problem on Lucid should know that it's possible to have the dongle still work in embedded mode as long as the hid2hci call in /lib/udev/70-hid2hci.rules for the keyboard isn't run. When embedded mode is active, an lsusb run will list 3 devices like this:
       Bus 004 Device 017: ID 046d:c714 Logitech, Inc. 
       Bus 004 Device 016: ID 046d:c713 Logitech, Inc. 
       Bus 004 Device 015: ID 046d:0b04 Logitech, Inc.
    When you issue the hid2hci run, a 4th device appears, representing the mini-receiver in HCI mode:
       Bus 004 Device 024: ID 046d:c709 Logitech, Inc. BT Mini-Receiver (HCI mode)
       Bus 004 Device 023: ID 046d:c714 Logitech, Inc. 
       Bus 004 Device 022: ID 046d:c713 Logitech, Inc. 
       Bus 004 Device 021: ID 046d:0b04 Logitech, Inc.
    In this mode, you'll need to connect through bluetooth. Note that I haven't found that works reliably on my powerpc mac mini, though it seemed to on a little netbook. The simple workaround seems to be to avoid bluetooth mode by commenting that udev rule out, and keep an eye out for changes in that area when you upgrade.
  • Embedded mode is kinda fascinating; from the OSs perspective it's just a keyboard and mouse that are plugged into a hub; there's no bluetooth anything. When you plug the dongle in, you get the usual dmesg spew of USB information. Here's what dmesg is actually telling you:
  • A USB hub is found; 3 ports detected (of which one is the hub, device 046d:0b04)
  • A keyboard (046d:c713) is found, though the device string identifies it as a "Logitech Logitech BT Mini-Receiver"
  • A mouse (046d:0b04) is found, with the same device string as above.
  • You can play with the dongle a bit when it is in HCI mode, by the way, with hcitool; here's some info on Johan's macbook, and on my Nokia e51.
     kiko@gasolinux:/lib/udev/rules.d$ hcitool dev
     Devices:
         hci0    00:07:61:E3:38:E0
     kiko@gasolinux:/lib/udev/rules.d$ hcitool inq
     Inquiring ...
         00:23:12:39:44:1F   clock offset: 0x1b9d    class: 0x38010c
         00:22:FC:4D:65:47   clock offset: 0x618c    class: 0x5a020c
     kiko@gasolinux:/lib/udev/rules.d$ sudo hcitool info 00:23:12:39:44:1F
     Requesting information ...
         BD Address:  00:23:12:39:44:1F
         Device Name: Johan Dahlin’s MacBook Pro
         LMP Version: 2.1 (0x4) LMP Subversion: 0x21b4
         Manufacturer: Broadcom Corporation (15)
         [...]
     kiko@gasolinux:/lib/udev/rules.d$ sudo hcitool info 00:22:FC:4D:65:47
     Requesting information ...
         BD Address:  00:22:FC:4D:65:47
         Device Name: Kiko o substituto
         LMP Version: 2.0 (0x3) LMP Subversion: 0x6cc
         Manufacturer: Cambridge Silicon Radio (10)
    If you're in embedded mode, that won't work of course:
     kiko@gasolinux:~$ hcitool dev
     Devices:
  • References to this problem include [bugs.debian.org] [www.mail-archive.com] [www.mail-archive.com] [bugs.edge.launchpad.net] [bugs.edge.launchpad.net]
05.08.2010 RAID1 Lucid desktops
  • I followed most of the advice provided by [blog.foobaria.com] in installing a RAID1 array on a Lucid desktop, and in the end got to a booting system, but there were some things that I thought were worthy of note:
  • I didn't have network for the installation, so I just put the mdadm package on a USB stick and dpkg -i'd it into the system.
  • Perhaps for the reason above, upon first boot I ended up stuck in initramfs because my /etc/mdadm.conf had no ARRAY entries. I am still a bit surprised that we actually need that file to boot (see my posting of 2009-12-09 for more RAID-on-Ubuntu woes), and I am not entirely sure why the --scan option to assemble doesn't work, but I had to enter three ARRAY lines into it beore it actually worked. In the process I learned about the convenient -m flag to mdadm -A.
  • You can ask the installer to install GRUB on the MD device, which causes it to be installed to both drives, which is what you wanted.
  • I'm not sure I'm entirely happy with the speed of the desktop on the drives I have, though. Maybe it's just that it's RAID1. Interface-wise I know the drives support SATA-II signalling:
     kiko@hellokitty:~$ sudo hdparm -I /dev/sdb
         [...]
            *    Gen1 signaling speed (1.5Gb/s)
            *    Gen2 signaling speed (3.0Gb/s)
    But I'm not sure this ICH7 motherboard supports 3.0Gb/s..
02.06.2010 iwlagn disconnects.. then connects.. then disconnects.. then
  • For some reason, since I've upgraded to Karmic and through Lucid, my x61 and Mariana's x61s have the same problem: they keep dropping their connection to our DD-WRT WRT54G AP, with the same pattern:
     [35502.041062] No probe response from AP 00:40:10:10:00:03 after 500ms, disconnecting.
     [35503.958234] wlan0: direct probe to AP 00:40:10:10:00:03 (try 1)
     [35503.968728] wlan0: direct probe responded
     [35503.968738] wlan0: authenticate with AP 00:40:10:10:00:03 (try 1)
     [35503.970691] wlan0: authenticated
     [35503.970731] wlan0: associate with AP 00:40:10:10:00:03 (try 1)
     [35503.973310] wlan0: RX AssocResp from 00:40:10:10:00:03 (capab=0x431 status=0 aid=1)
     [35503.973318] wlan0: associated
     [35622.060151] No probe response from AP 00:40:10:10:00:03 after 500ms, disconnecting.
     [35623.973605] wlan0: direct probe to AP 00:40:10:10:00:03 (try 1)
     [35623.976226] wlan0: direct probe responded
     [35623.976234] wlan0: authenticate with AP 00:40:10:10:00:03 (try 1)
     [35623.979137] wlan0: authenticated
     [35623.979177] wlan0: associate with AP 00:40:10:10:00:03 (try 1)
     [35623.981630] wlan0: RX AssocResp from 00:40:10:10:00:03 (capab=0x431 status=0 aid=1)
     [35623.981633] wlan0: associated
    ad infinitum, every 120s. Seems like the only people that have this so far are at [permalink.gmane.org] and [ubuntuforums.org] and [bugs.donarmstrong.com] but neither of them have 4965AGNs like our Thinkpads do.
  • I've found a bug at [bugzilla.intellinuxwireless.org] but it's not really the same issue I think; at any rate there are a number of patches attached to comment 113 there, so it might be worth looking through. And the guy at [bugs.launchpad.net] seems to have the same problem (though he's posting to a bug which doesn't describe the same symptoms) so let's try disabling ipv6 (though it makes no sense to me) since it is my only lead... no, of course that doesn't work. Oh well, let's keep looking.
  • So it /could/ be the issue at [bugs.launchpad.net] which is caused by network-manager rescanning every 2 minutes and the driver hanging while the rescan happens. But one thing that strikes me is that it only happens with the access point we use at home -- in the office it works just fine.
  • I now think the reason this happens is because there is some oddness with the WRT54gv6 that I have at home. It's configured in repeater bridge mode, and that's where we have problems. I've noticed that it is often out of memory and has difficulty answering to web requests; it seems this is more than just a wifi adapter driver issue. I've since replaced that AP with a WRT160NL and it is working much more reliably; I also no longer get the driver hangs and error messages I was getting above. It seems to be a weird interaction between the AP and the wireless card, but it's not entirely clear what it is.
  • Later update: if dmesg tells you "Firmware error" then check out [bugzilla.intellinuxwireless.org] which is another one of those ironic bugs that are a) hard to identify b) affect lots of people. As a workaround you can enable swcrypto (whatever that means!) and/or try grabbing the older firmware at [intellinuxwireless.org]
01.06.2010 Reverse Path Filtering
  • We have a multi-homed server that chooses where to send traffic depending on its content; web traffic goes through one network, VoIP over another, and so on. One thing that has always left me surprised is the fact that we log so many martian packets on it with rp_filter enabled -- if reverse path filtering is desireable then why can't I use it here.
  • Turns out that I just don't understand RPF well enough. I used to think it was just going to drop packets that are from non-routeable RFC-1918 addresses and other obviously broken sources, but in fact it is more than that; the router when receiving a packet does a route check on the source address. It then will only forward the packet to another interface /if/ the packet was received from the interface it would return it to according to its routing tables. So if I receive a packet on interface A from a source address S destined somewhere non-local, that packet will only be allowed through if a packet destined to address S would be routed to interface A.
  • This works okay for "normal" routers, since there is only one path which packets should go through. but on our traffic-balancing setup it ends up being too simplistic. For instance, we have a default route set up to our primary network interface, and if /any/ packet arrives on our secondary interface, it gets dropped: the default route is on the primary interface, and since we use firewall marking to determine the route, the reverse path filtering check never realizes the routing table would have decided differently. I found the answer at [www.mail-archive.com] pretty interesting in confirming that, and there's even a patch that fixes this up at [bugzilla.kernel.org] but it was never applied as there was a design concern; the feeling was that netfilter could handle its part of rp_filter checking. If you care about how it's implemented, check out fib_validate_source() in net/ipv4/fib_frontend.c but it's not very easy reading unless you know that corner of the kernel fairly well.
  • Incidentally, really good commentary on forging of source addresses and what routers should do about it can be found here: [serverfault.com]
30.05.2010 DD-WRT Wireless Bridging
  • Spent about 3h today trying to get my wifi bridge operational again. Here are some of the stumbling blocks I ran into in the hope they are useful to anybody using DD-WRT on a pair of WRT54g boxes from Linksys:
  • I'm using DD-WRT v24-sp2 13064 on both main AP and repeater, but the repeater is using the micro flavor, since it's a v6 box.
  • I wasted about 2h on the fact that the repeater would not join the main AP's network, and would never show up as a client in the Wireless Nodes listing in Status_Wireless. After trying pretty much every single configuration known to man, Johan convinced me to do a hard reset of the main AP, and guess what? It worked! @#!@*#*@! I would never have guessed this, because at least in theory the main AP has nothing to do with the repeater AP, but something was cached in the configuration that survived across reboots. So if you find yourself stuck where the repeater just accumulates TX errors and never finds the main AP, try doing a hard 30/30/30 reset of the main AP. Even if the documentation misleads you into thinking otherwise.
  • WPA Personal encryption does work, but if you set the advanced configuration option "Authentication: shared key" it stops working, and the password MUST be the same in the main and virtual interfaces. In WEP mode, which is what I was using before, the passwords didn't have to match, which I really liked. There might be something in the hardware which limits this -- I don't understand enough about how wireless encryption and DD-WRT's wireless bridging mode is implemented.
  • I have a pair of higher gain omnidirectional antennas but I'm not convinced they work very well, nor do I understand how or whether DD-WRT's auto mode for the antennas works. I'm going to graph packet loss between houses now and do one change per two evenings over this week to figure out what can be better.
  • There are many advanced config settings to tweak, but for now I'm starting with the same configuration on both, using a G-only network.
15.12.2009 Evolution and Google Calendar
  • This works the way you'd expect, just adding a new calendar in Evolution, and it syncing automatically to the calendar applet. But there are a few caveats to it:
  • Google provides you with https:// URLs; Evolution expects them to start with webcal:// and will unhelpfully tack on the method in front of the URL for you, so you end up with something that starts with [https] -- yuck. Just delete the https:// in front of the address and everything works (though I am surprised it does, as I'd expect it to be SSL-only -- oh well). I didn't use Secure connection and it still worked -- don't know if it should but I'm not pretending to know what I'm doing here!
  • If it doesn't work, I bet your system provides settings for a network proxy. For some reason, this doesn't work for calendaring -- perhaps [bugs.edge.launchpad.net] has something to do with it -- but the workaround is pretty easy, just don't use a proxy for Evolution, or provide details there manually.
  • Whatever you change, it appears you have kill the additional processes Evolution spawns to actually have them pick up the changes. At least I did knock out evolution-data-server and evolution-alarm-notify for good measure.
  • If something goes wrong, you get notified of a problem with the ics URL through the statusbar, which might be a bit unexpected -- if you click on the icon in the statusbar you get a dialog with more details -- in my case, a connection problem.
10.12.2009 WCDMA and UMTS
  • For Brazilians, the important thing to remember is that 3G frequencies vary according to operator and location, and [www.gdhpress.com.br] tells you more about that sort of thing. In general, for my operator, Claro, north american versions work, but that means no E52 for me, since the US version isn't out yet.
09.12.2009 mdadm, USB and SCSI, UUIDs and initramfs-tools
  • We've had a problem with our Ubuntu server since last year when I bought an external USB drive to handle rdiff-backups: when the bootup process starts, if the USB drive is plugged in, the RAID-5 arrays we have get confused because the drive names change -- normally sda is the first SCSI drive, but since probing is asynchronous, when the USB drive is detected first it's the external drive and mdadm fails into an initramfs shell. That's not nice!
  • A separate problem we were also running into was that the spare partitions, which normally live in /dev/sde, were not being automatically added to the array (even when the USB drive wasn't connected and the bootup succeeded) so I had to add an ugly rc.local command to add them in. This is similar to the issues found at [bugs.edge.launchpad.net] and [ubuntuforums.org] but not quite the same, as mdadm.conf's DEVICES entry actually included the spare drive.
  • So from 7pm to 11pm yesterday Johan and I worked on figuring out exactly what was causing this. And it turns out that it was a combination of two things, both related: the mdadm.conf ARRAY definitions, and a race condition during the initramfs process between the RAID autostart and the module-based kernel hardware probing. Here's how it works.
  • Ubuntu starts and mounts its RAID arrays in an initramfs. The initramfs image is packed with a set of shell scripts; the image is assembled from a bunch of code in /usr/share/initramfs-utils, primarily scripts with additional hooks that are run during the initramfs image generation and which modify the image itself. Anyway, while in the initramfs, the following happens:
  • Between initramfs' init-top and init-premount phases, it loads all essential modules. It knows to load the RAID modules through the work of hooks/mdadm (installed by the mdadm package), which adds them to the essential module list inside the initramfs. It knows to load the USB and SCSI modules because by default Ubuntu uses MODULES=most in its initramfs configuration.
  • Now, module loading order is deterministic, but in Ubuntu the SCSI probing is set to asynchronous (via the kernel option CONFIG_SCSI_SCAN_ASYNC=y; see [lwn.net] for details) the actual drives show up as the bus scans are finalized.
  • Meanwhile, in init-premount, a udev script fires off udevd which goes away and kicks off 85-mdadm.rules' RAID assembly, using mdadm --incremental. This is a pretty magical mdadm mode, and a read through the manpage section is pretty interesting.
  • Also in init-premount, an mdadm script checks for degraded RAID and LVM devices, and figures out whether or not to try and run degraded arrays. It does this by scanning the RAID superblocks using mdadm --misc --scan --detail.
  • After loading the modules, a script is run to mount the local filesystems; on non-NFS systems this script is called "local", and this is where we see if the root device is present and mountable.
  • After all this is done, the init-bottom scripts are run; here is where the udevd process is killed and whatever handling of the available hardware stops until udevd is started again.
  • The problem we are running is that the initial SCSI bus probe is done in a flurry of asynchronous activity; if you stop and read your kern.log after a reboot you're find out just how random the ordering is. What can happen on systems with lots of drives (such are ours, which has 6) is a race between the SCSI bus scanning, udev's triggering of 85-mdadm.rules and the killing of udev after the root filesystem is mounted. The race happens because the bus scan is asynchronous, which in turn means that the devices (which if you noticed above, are being added via mdadm --incremental) might take too long to show up -- long enough that udevd is killed and the real init starts trying to mount the rest of the (still incomplete) local filesystems.
  • To solve this, we simply added a script which sleeps for 15s in initramfs' local-premount; this is enough time for the hardware probing and udev rules to complete firing, ensuring that all RAID devices have been assembled and are available for mounting once /sbin/init kicks in. Simple but does the trick. It is likely that changing the scsi_mod scan option to "sync" would also solve this part of the problem. Yet another option would be to load the scsi_wait_scan module in local-premount.
  • We had an additional problem, which was that our mdadm.conf file specified sdX-style partition names for each ARRAY:
     ARRAY /dev/md2 devices=/dev/sda5,/dev/sdb5,/dev/sdc5,/dev/sdd5
    This doesn't play well with mdadm's --incremental mode when the drive order is changing around -- so when the USB drive appeared as /dev/sda the array could never be initialized and we failed to mount the root device. Changing the ARRAY line to refer to the MD device through a UUID solved this problem:
     ARRAY /dev/md2 UUID=b3b855d3:d8ee85f4:2ee19fc3:ff71564e
    As a nice side-effect once we made this change, mdadm --incremental started assembling our spare devices into our arrays: I had never been able to specify spares using the devices= syntax.
  • I suspect that in part [bugs.edge.launchpad.net] is caused by the on-demand nature of the module loading, but I'm not entirely sure -- perhaps specifying an explicit DEVICE line in mdadm.conf causes each device to be hit, which in turn means the right modules are probed. But I think it's actually a red herring, and that in fact the problem is with the modules Jan was missing in the initramfs.
  • Finally, it is strange but true that in the cases where mounting non-root local filesystems failed because of the udevd race, udevd in userspace never kicked the incremental mdadm again to finalize the running arrays, and even when running cat /proc/mdstat from the prompt, you could also see they were left incomplete or lacking spares. I'm not sure why this happens.
08.12.2009 Bad source addresses on my local server
  • Since a few reboots ago we ran into a pretty annoying issue on our server: packets originating on the server that were delivered to the server itself, handled by lo, were using the wrong source address. In more detail: our internal address for the server is 192.168.99.4. If when sitting on the server and I did:
     kiko@anthem:~$ telnet 192.168.99.4 80
     Trying 192.168.99.4...
    the connection was never completed. If I looked at a tcpdump trace
     17:18:17.439479 IP 189.x.x.x.42826 > 192.168.99.4.80: S 2220677816:2220677816(0) win 32792 mss 16396,sackOK,timestamp 6978218 0,nop,wscale 6
    it became clear that the wrong source address was being selected. But why?
  • I spent a lot of time reading and rereading the source address selection descriptions at [linux-ip.net] and couldn't figure it out. Everything in my routing tables was kosher -- and even if I cleared out all our fancy ip balancing rules and used plain turkey kernel and default gateway rules, the address was wrong. The src hints were there. There was no funky ordering issue. So why was the address wrong?
  • I finally read a slightly unrelated post at [lists.unix-ag.uni-kl.de] that gave me another idea: NAT! I hadn't thought of this possibility, nor did I recall changing anything there, but it was worth a try.
  • Turns out it was exactly that. I had a rule which said that:
     -t nat -A POSTROUTING -s 192.168.99.0/24 -o !eth1 -j MASQUERADE
    This would have worked perfectly if the packets were actually going to eth1. However, in the case where you are on the server connecting to itself, the packets go to the lo interface, but with eth1's source address 192.168.99.4, which matches that rule -- oops.
  • There are various solutions to this problem -- the one I went with was simple, specifying a destination network of ! -d 192.168.99.0/24 -- in other words, we only masquerade packets that are actually meant to be routed elsewhere. I could also have specified two ACCEPT rules that came before the MASQUERADE rule, avoiding masquerading for eth1 and lo simultaneously.
  • The trickiest thing with masquerading is that changing the rules takes a while to actually kick in -- it's not instantaneous. So if you change the rule and run the test immediately, it will fail -- but if you wait a litle bit you'll see it's actually fixed. Gar!
  • The morals to the story are
  • a) source address selection is also affected by IP masquerading even if the documentation doesn't remind you of that
  • b) when a host connects to itself, the lo interface is always used, even if the address being used to connect is not 127.0.0.1, and
  • c) it takes a while for iptables rule changes to actually take effect, so wait a while before actually testing them!
03.12.2009 Google Calendar on the Ubuntu Desktop n more
  • [johnnyjacob.wordpress.com]
  • When your sound goes bad, "sudo alsa force-reload" to the rescue!
  • When your xchat completion is weird, use "/set completion_amount 0"
23.11.2009 Git crack HEADs
  • Maybe git knows that I don't actually want to be using it:
     kiko@baratinha:~$ git clone [gitorious.org]
     Initialized empty Git repository in /home/kiko/x11-maemo/.git/
     remote: Counting objects: 241288, done.
     remote: Compressing objects: 100% (73222/73222), done.
     remote: Total 241288 (delta 191071), reused 214443 (delta 165447)
     Receiving objects: 100% (241288/241288), 186.11 MiB | 64 KiB/s, done.
     Resolving deltas: 100% (191071/191071), done.
     warning: remote HEAD refers to nonexistent ref, unable to checkout.
  • I wonder what it means when I have just spent 30 minutes downloading revisions to end up with no working tree! #$!@#@
26.10.2009 Saris Poweragent woes
  • Every once in a while I end up unable to download my PowerTap power files because Poweragent can't contact the PT head. Today it's giving me a traceback:
     WARNING [com.cycleops.jpowertap.Manager]: Exception Occurred Opening Device 0
     java.lang.NullPointerException
         at com.cycleops.jpowertap.CycleOpsDevice.readVersion
             (CycleOpsDevice.java:313)
         at com.cycleops.jpowertap.CycleOpsDevice.getVersion
             (CycleOpsDevice.java:286)
         at com.cycleops.jpowertap.CycleOpsDevice.getVersionFloat
             (CycleOpsDevice.java:387)
     [catch] at com.cycleops.jpowertap.Manager.getConnectedDevices
             (Manager.java:138)
         at com.cycleops.devicemanager.DeviceManager.getSelectedDevice
             (DeviceManager.java:52)
         at com.cycleops.devicemanager.DeviceManager.getSelectedDevice
             (DeviceManager.java:41)
         at com.cycleops.devicemanager.DownloadDeviceAction.performAction
             (DownloadDeviceAction.java:83)
    Wish I knew how to fix this.
  • Some more investigation using good ole strace shows this:
     [pid 19639] open("/proc/bus/usb", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_ CLOEXEC)
         = -1 ENOENT (No such file or directory)
     [pid 19639] open("/dev/usb", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC)
         = -1 ENOENT (No such file or directory)
    which suggests that we've got a legacy compatibility problem -- Ubuntu hasn't had a /proc/bus/usb mounted in a while. Let's see what we can do.
  • So I believe that /dev/usb was deprecated a while back in favor of /dev/bus/usb; in experimenting I add a symlink from /dev/bus/usb to /dev/usb and it seems to go a bit further. Next stop:
     [pid 21297] open("/dev/usb/005/012", O_RDONLY 
     [pid 21297] <... open resumed> )        = 126
     [pid 21297] read(126, "\22\1\20\1\0\0\0\10\3\4\1`\0\4\1\2\0\1", 18) = 18
     [pid 21297] read(126, "\t\2 \0\1\1\0\200", 8) = 8
     [pid 21297] read(126, "-\t\4\0\0\2\377\377\377\2\7\5\201\2@\0\0\7\5\2\2@\0\0", 24) = 24
     [pid 21297] close(126 
     [pid 21297] <... close resumed> )       = 0
    It turns out that this strace is misleading. The open() there is actually failing because of a permissions problem:
     ls -l /dev/usb/005/012
     total 0
     crw-rw-r-- 1 root root 189, 512 2010-10-05 12:08 001
     crw-rw-r-- 1 root root 189, 524 2010-10-05 18:31 013
    That's expected, since udev is the one expected to create user-visible device nodes under /dev -- for instance, /dev/ttyUSB0, which has the right permissions.
     crw-rw---- 1 root dialout 188, 0 2010-10-05 18:55 /dev/ttyUSB0
  • Anyway, a quick chmod 666 and poweragent runs again, and this time, it actually works! Here's the strace to show what happens:
     [pid 22794] open("/dev/usb/005/014", O_RDONLY) = 125
     [pid 22794] read(125, "\22\1\20\1\0\0\0\10\3\4\1`\0\4\1\2\0\1", 18) = 18
     [pid 22794] read(125, "\t\2 \0\1\1\0\200", 8) = 8
     [pid 22794] read(125, "-\t\4\0\0\2\377\377\377\2\7\5\201\2@\0\0\7\5\2\2@\0\0", 24) = 24
     [pid 22794] close(125)
    As you can see, the open() and close()s aren't unfinished this time. I'm not entirely sure why they don't finish when the permissions immediately fail, but maybe strace is tricking me, or maybe something's happen asynchronously. At any rate, once Poweragent has finished looking at all the USB devices, which it does in turn, it finally open()s the right device and then creates a long-lived fd to manipulate and download from:
     [pid 22794] open("/dev/usb/005/014", O_RDWR) = 35
     [pid 22794] ioctl(35, USBDEVFS_GETDRIVER, 0x9e4fdd10) = 0
     [pid 22794] ioctl(35, USBDEVFS_CONTROL 
     [pid 22794] <... ioctl resumed> , 0x9e4fdc70) = 30
     [pid 22794] ioctl(35, USBDEVFS_CONTROL, 0x9e4fdc70) = 30
     [pid 22794] open("/usr/local/lib/ftd2xx.cfg", O_RDONLY) = -1 ENOENT (No such file or directory)
     [pid 22794] open("/usr/lib/ftd2xx.cfg", O_RDONLY) = -1 ENOENT (No such file or directory)
    I'm not sure exactly how to fix this in a non-hackish way, but I'll write to Saris and see what they think.
21.10.2009 A Nokia E51 as a Vonage Softphone
  • So today I set up my E51 as a Vonage Softphone sip client; the instructions I followed were up at [www.esato.com] and [www.vonage-forum.com] and apart from the fact that I got the password wrong, because in their wisdom vonage.com prints the password in a non-serif font (meaning lowercase L and uppercase I are kinda impossible to distinguish) it works really well. The other thing to watch out for is that the domain is "sphone.vopr.vonage.net" -- if you use just sphone.vonage.net the registration will be successful but you won't be able to dial.
  • The only bummer is that the international call rates are not the same as for the main service (that's kinda fine-printish but oh well). But hey, the portability is hard to beat..
  • Read a little bit about the ToS IP field: [forums.whirlpool.net.au] -- I used this to grok the instructions at [www.voip-info.org] which basically is a real upgrade from basic wshaper for VOIP; basically it puts the VOIP traffic in a non-balanced queue, and everything else in an HTB-set set of classes -- I read [www.opalsoft.net] to really understand how the whole thing works, and it's actually pretty cool.
  • Upgraded the E51's software via Windows using a couple of hacks as well. Used NSS to change the product code, then ran the upgrader and presto, it all worked. Took a /long/ time but in the end it paid off -- the new firmware's VoIP stacks and applications work really well.
09.10.2009 Booting USB drives on old BIOSes
  • In preparing an old server to sell off today I actually ran into a problem which apparently affects lots of old computers: the BIOS, though it supports bootable USB drives, is really picky about the drive geometry. After swimming through a collection of links like [brainstorm.ubuntu.com] and [help.ubuntu.com] I started tackling the trial-and-error that fixing up the geometry implies -- reading through [www.sudhian.com] and [www.911cd.net] for inspiration. And after 5 tries, I actually managed to get something that worked:
     16 heads, 63 sectors/track, 1986 cylinders
     Units = cylinders of 1008 * 512 = 516096 bytes
     Disk identifier: 0x00000000
     .
        Device Boot      Start         End      Blocks   Id  System
     /dev/sda1   *           1        1986     1000912+   6  FAT16
    Woot!
  • Of course, It Never Works The First Time; installing from the Ubuntu Server USB key left me with a broken system because it installed grub onto the USB key instead of installing it onto my SCSI drive. I ended up booting with the USB key plugged in (it had grub installed on it by the installer, which ruined it but allowed it to boot my server!) and running install-grub to get that fixed.
08.10.2009 So it's old news..
  • Still, the blog post "If Version Control Systems were Airlines" at [changelog.complete.org] is the smartest and funniest blog post I think I've ever read. And it's true, too.
02.10.2009 London and pings
  • So when traceroute tells you
     send failed: Operation not permitted 
    it is time to look at your firewalls, because it's likely that you've got a wrong rule in the output chain.
  • In London for the week, and back again on Saturday to Brazil. I miss Mariana and the way she smiles when I arrive.
20.09.2009 iptstate
  • Ever wondering what your IP masqueraded connections looked like? I always did, but I just found iptstate(1) and it is exactly what I wanted.
  • It seems that Embratel's link blocks ICMP packets related to fragmentation; the MTU on the cable modem's interface is 1472, but when I set it correctly on my router's interface, I get PMTU-D problems with the internal network (which has an MTU of 1500). The solution I cribbed from [tldp.org] involves using the TCPMSS target to iptables.
  • Other reading for today [www.wallfire.org] [www.netopia.com]
  • Oh, and for the record, even though [www.voiptroubleshooter.com] says that MTU 512 works best with ADSL modems, it certainly does not work for the Opticom we have connected here to the Telefonica line.
19.09.2009 PPC Mac Mini on Ubuntu Jaunty and video playback
  • Next problem for the weekend was video playback on my PowerPC Mac Mini. Standard Jaunty was installed on it, and it worked pretty well -- networking, audio, removable USB. But one thing was bugging me: video playback with Totem no longer really worked. Most videos I have (for instance, old Giro and Tour recordings in DiVX format) would no longer play, and the ones that did (like a recording of the 2007 world's) would play at terrible FPS, massively dropping frames and totally burning up CPU in full-screen mode. I figured that this was the real end of the road for this little mini; after all, Ubuntu's not really supported on it, and there's no binary magic (like fglrx or even flash) to make it work. This is a standard old-school PPC Mac mini:
     0000:00:10.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200] (rev 01)
  • Well, today I spent some time looking into the problem. Turns out there were a couple of problems compounded. The first things I looked into were acceleration-related, and because most posts on the web are around 3D acceleration, the first thing I discovered was that there's an issue with the stock Jaunty kernel that I was running: [bugs.edge.launchpad.net] -- you can pretty easily diagnose this by running glxgears (yeah, we know, not a benchmark) and seeing that you get very low frame rates -- like 60 or so; in other words, falling back to swrast and not using hardware acceleration like the radeon driver in Jaunty should. The easiest solution I found to it was linked to from [ubuntuforums.org] -- I just downloaded and installed the kernel at [tmpstore.free.fr] and that was it, glxgears was up in the 700s after that, and CPU pretty low.
  • Turns out that 3D performance has nothing to do with 2D video, and this was a bit of a red herring (though it's great to have accelerated 3D graphics, I guess). I spent some time looking into the situation: first, VLC would play most videos that totem and gstreamer wouldn't, but it was still horribly slow -- the hardware wasn't being used for playback, indicating something was wrong with Xvideo. I looked around a lot for a solution, and ended up deciding to try an updated X video driver pulled from Tormod Volden's PPA at [edge.launchpad.net] -- just had to apt-get source xserver-xorg-video-ati, apt-get install the dependencies and dpkg-buildpackage it into existence. To save people the trouble of actually doing this, I put the assembled debs up at [people.canonical.com] and feel free to use them if you ever run into this problem. Once this was installed, I ran VLC and sure enough, the video playback was as smooth as ice and CPU usage really low.
  • A couple of closing notes. First, I'm not sure why this was set up this way, but the reason Totem wasn't even trying to do Xvideo was that gstreamer-properties had it disabled; once I enabled it it was as fine as VLC playing back the videos that didn't crash it. The second thing was that my /dev/dri/card0 was set to mode 660, which may have meant that GL apps running under me couldn't actually do DRI -- I set it to 666 but I don't think that actually did anything, so maybe that's crack.
18.09.2009 WRT54g Wireless Repeater Bridging
  • After investing R$80 on a new WRT54g to connect the house and office together in a way which works reliably, I spent some time researching alternatives to actually wirelessly connect them. I was always under the impression that it was pretty much impossible to do wireless bridging (i.e. a wireless AP here extending the range of a remote wireless AP, including the local ethernet switching) without having multiple radios available (and I didn't think the Linksys unit has multiple radios, nor do I know now), but apparently not only is it possible, it's trivial to do with recent DD-WRT v24 (there's no need, for instance, to set up WDS, which I think is a standard mechanism to do exactly this). I followed a tutorial on the DD-WRT wiki which was pretty much exactly what I needed: [www.dd-wrt.com] -- there's a version in portuguese which is actually more complete, though I was a bit reluctant to trust it: [www.infohelp.org] -- ironically the DD-WRT wikipage was updated YESTERDAY to add one critical step to the recipe. Guess I'm lucky!
  • The only trick I ran into was when flashing DD-WRT onto the Linksys. I had a WRT54v6 (see [www.dd-wrt.com] for details) which makes the process slightly trickier -- the flash and RAM are really reduced so you can't load the standard DD-WRT image through the standard (VxWorks) web UI -- you need to go through the steps outlined at [www.scorpiontek.org] -- it is a lot simpler than the stuff in the DD-WRT wiki itself, which seems to be geared towards windows users. I managed to get the prep and killer firmware loaded in the initial stages, but I was surprised when the tftp upload of DD-WRT itself didn't work. The upload itself went, the unit seemed to be doing something (pings stopped) but the web UI never came up. I could still tftp files across, so it wasn't bricked, but.. weird. Well, guess what -- ascii mode tftp won't work. Doh! Took quite a few reboots when looking at the flashing light to figure out that I needed to type "bin" before the put. Guess I forgot the lessons learned in the old inet BBS days.
  • Once that was done, the other thing I got stuck with was that the DD-WRT's web interface would give me a blank page whenever I saved or applied any changes. Turns out this is related to something cached in the browser (I suspect by IP, some shared cookie between the VxWorks web server and the DD-WRT one); I eventually read some dozen threads that all said the same thing: CLEAR YOUR COOKIES AND CACHE. It worked, too ;-)
24.08.2009 Lipoprotein A
10.08.2009 Seen today on my "2gb link"
  • Up 517kbps and Down 144 Kbps. From my ADSL modem's console. And then a reboot back to the usual 659/2297 Kbps -- so you tell me, what the hell?
15.07.2009 Well yes it's been a while
  • And no, I won't post anything interesting apart from recording the time schedule for SESC's swimming pool here in São Carlos. It's from 13:30 to 21:30 from Tuesday to Friday, and on weekends and holidays it's from 9:30 to 18:00. And you need to do a medical test before you are allowed to start going; make sure you don't forget that part.
  • Found out that F-Spot can upload to Smugmug, but there's apparently a bug which is failing on Hardy and I wonder if it works on Jaunty: [bugs.edge.launchpad.net]
04.05.2009 The only thing I can do is read
  • MySQL pissed me off today. It started out with it going OOM and then corrupting Bugzilla's logincookies file. So I turned it off, pulled out myisamchk and it did the right thing. Or half of it, it turns out. Once I had done that, the table was marked read only, and how the hell does one get out of that?
  • Turns out there were THREE things that needed doing. First, I had to chown the actual MYI file to mysql:mysql. Then I had to chmod it to 660, because the perms were broken. After that I got stuck, because mysqlcheck wouldn't update it, still saying "Table 'logincookies' is read only". Well, damn it -- turns out the last bit missing was running a "FLUSH TABLE logincookies;". So how was I supposed to guess that?
28.04.2009 You can't make this stuff up
13.04.2009 Eastern Easters
  • Possibly one of the all-time best Roubaix climaxes ran yesterday as an Easter present for everybody. Or well, I guess everybody but Flecha, Hoste and Hushovd: [www.youtube.com]
26.03.2009 How can one not love Launchpad?
     <danilo> kiko: got any cheap offers for marriage in EU? and can I
     continue to live here? (fwiw, I am busting my ass with documentation
     this time around since I want to ask for longer visas, and I'll have
     Canonical lie a bit in the invitation letter how I am going to go to
     Spain 745 times in the next year :)
     <intellectronica> danilo: as soon as i get my EU pass i'll be happy to
     marry you
     <danilo> intellectronica: sounds good :)
     <intellectronica> (even though you're a bit too tall to my taste)
     <danilo> intellectronica: well, you'd not be my first choice either, but
     I'll love you as if you were :)
     
14.03.2009 Things I miss about Brasilia
  • I left Brasília many years ago, and I only lived here for 4 years before starting university down south. But there are some things that I really miss about it.
  • One is the rain. It is spectacular in the summer months how beautiful and shocking the thunderstorms are; my parents have a house which is glass everything and the sound is deafening. When I come here I love staying at home and just listening to the roar of the storm hitting the house and the lake. And the thunder, when it strikes, makes your heart race, eyes widened at how loud that could really be.
  • Home is another. It is very neat to be surrounded with all these weird mementos of your former lives; trophies from races you did who knows when, comic book collections that you bought in forgotten newsstands and in ancient trips to places you never visited again, bits of arcane computer hardware that might just work if you can piece it all together. And the sounds of that old life around you.
  • My parents. I have had a lot of luck in my life, but wow, it seems almost unfair that I had a pair of parents as fantastic as mine. They are two ever-surprisingly versatile individuals if I ever saw a pair, and every year they seem to decide how they want to reinvent themselves. And whatever it is you want to do, they are interested in doing it with you. Run 15k? Sure! Waterfalling? Always! Out to buy random stuff I need? Now! Never boring. What an amazing couple.
10.03.2009 The purposes of war
18.02.2009 Argus Tick tock
  • The Cape Argus cycletour is on the 8th of March. The route is nothing but spectactular [www.cycletour.co.za] and it's not as flat as one might think [www.cycletour.co.za] -- and I am so happy I will be there along with thirty thousand other people. If everything goes according to plan..
16.02.2009 Getting older feels great
  • But let me ride my 110km this morning and then tell you exactly how great. I just wish it wasn't raining..
15.02.2009 pioggia maledeta
  • After 105km of rain yesterday, you'd think that it would let up for a birthday lunch? Guess I'm kinda short on credit with the weathermakers. Plan B!
09.02.2009 The politics of scanning
  • If you own an Epson scanner and want it to work in Ubuntu without installing the debs provided at [www.avasys.jp] by hand, check out my comment 5 at [bugs.edge.launchpad.net] -- basically, the versioning of these packages is psychedelic and somebody needs to get Avasys to make some sense of it so we can reasonably package and update iscan without breaking everything we ship based on it. Michael Casadevall and I spent a few hours on this over the weekend and though we have some packages in his PPA, they aren't really something which we can get into the archive for Jaunty.
08.02.2009 Some pictures are truly worth a thousand words
  • This gallery has such a striking set of contrasts I can't help but love it: [www.velonews.com]
  • Just got back from the first race of 2009, which was a 50km mountain marathon; out of the 800-odd registered I finished around 10th overall and, again, 2nd master. 42 seconds off! [www.sistime.com.br] I rode well the first half of the race, but lost contact in the hard climb in the middle of the race and just wasn't in the mood for a solo chase. I felt strong on the flats but had a little difficulty staying in contact on the climbs, and I particularly hated the soft feeling my SID has up front -- it makes me really not want to stand up when I'm riding, which means I can't easily jump on wheels or deal out the pain. Santa, bring me a new fork?
02.02.2009 Oggplay is just amazing
  • I have for years been enduring the Nokia music player on my 3250, which is the phone I use on bike rides. It's a big fat phone but the built-in speaker is loud and it is as solid as they get; mine has a cracked LCD, a partially broken case and so much water and dirt inside it you'd be amazed it still turned on. Not only does it work, but it works as well as it did when I got it 3 years ago.
  • I have a ton of Ogg Vorbis files that are the format I rip from my CD collections into. I knew there was this software called Oggplay [symbianoggplay.sourceforge.net] and that it played FLAC and OGGs, but when I looked at it I was worried that it was one of these applications that worries more about skinning than getting the UI right. I was also unsure about how much battery it would melt. I never tried it.
  • But, come 2009 music refresh program, I have Girl Talk in FLAC and I haven't been able to listen to it on the phone. I decided that today was as good a time as ever to try it out. And wow, it is amazing! The UI is so well designed I can already use it with my eyes closed (pretty essential for a player you use while riding). It has the file browser integrated with the playback screen, and it is really fast -- much faster than the built-in player. And FLACs and OGGs! Just really impressive. Guess I might just need to wait for Nokia to get over its DRM obsession before it considers shipping it by default?
  • One note on the installation is that downloading zips does seem a bit counter-intuitive to me; I was expecting a set of links from the website to installable SIS files. Maybe it has to do with how the downloads are hosted at Sourceforge?
01.02.2009 A campaign for modern music
  • It was last November that I realized it. When you're 33 all you think of is the songs you listened to when you're 23. That's rubbish. So I'm planning ahead and since December have been listening to everything mildly interesting I hear about. If you're in the same situation, here are my picks so far.
  • Foals, Fischerspooner, Chromeo, TV on the Radio's Dear Science, The Black Ghosts, Girl Talk, The Ting Tings, Cut Copy, Ladyhawke, Late of the Pier, The Whip, VHS or Beta, The Virgins, Buck 65, The Rapture
  • Not so new, but still great: Idan Raichel, Sad Lovers and Giants, Lene Lovich (can't believe I had never heard of her)
  • Stuff I (perhaps surprisingly given the hype) did not like: MGMT, Hot Chip, Les Savy Fav, Wire, Vampire Weekend
  • If you know me and want a USB mixstick put together just send me one through the mail; I'll fill it with my picks. Postage free.
31.01.2009 Waking up when it's dark
  • I kinda hate it when I can't sleep through sunrise. I go to bed pretty early and usually am asleep by 22:30; at that time my brain is already flagging and when it's not I remind myself of Jacobson's progressive relaxation method [en.wikipedia.org] and that's it, Zs show up in my brain. The problem for me is that when I am at home I just can't figure out how to wake up after 6. Normally this isn't a problem, since I go to sleep so early, but if I have a late night, it is a disaster. Yesterday the Launchpad team leads meeting ended in a dinner which we walked out from at midnight (!) and this 'morning' I've tried reading, drinking water, PMR, etc and just around now have given up and come to face my 1177-message-strong inbox. Funny thing is, there are only two situations in which I don't have this problem, and one of them is when I'm travelling on vacation. Damn.
29.01.2009 got watts
  • This week I'm supposed to be gearing up for the first race of the year, which is a pretty unspectacular marathon in Itu. I mean, it'll be painfully fun as all races are, but it's not a particularly long or hard race -- though some people argue that the easier the parcours the harder your buddies make it. Anyway, so because I have this race I'm doing some moderate-to-high intensity stuff this week -- I had done pretty much only hard tempo rides between December and January. I did 3x5' of 20/40s on Tuesday and Friday (at 300/500W), which left my legs soft and bubbly, and nearly beat my PR for 5' (which is 398W from last November). 20/40s require a serious psychological edge: the first hit is fun, the second is hard, the third is painful, and the two last ones are just total regret. But because the rest is so short and constrained, you just keep coming back..
  • Then on Wednesday I did this killer workout on rollers (since it doesn't f**** stop raining here) with 10x2' at 300/350W and then a little 3x5' ladder going from 260W to 300W. But man. AIIII. I was so lost after it. Johan said something to me during it but blam if I can remember what it was: just the sweat pouring off my face and the black spots in my brain. I had dinner at Sal6 later with Martin and Johan and I was absolutely useless (and then you called and it was even worse). My legs felt okay, and the ride the next day was fine, but the system shock was hard to get over. But then again, I broke my 10' and 20' records at 332W and 326W, which is 5.1 W/kg and nothing to sneeze at. Never thought I could do that on a January! I could have done a few more minutes but I'm not sure how many more; my heart maxed out at 173 which means I wasn't self-destructing, but the last minutes took a loooong time to go by.
22.01.2009 Wow
  • "I think it's the kind of stuff I need to do, I need to get in the race and work that top end," he said. "Like I've said 100 times here, I can't get that in training so the more I can be out there... I mean, I looked down at my power metre and the average after two hours was 340 watts, you can't do that in training. You're just constantly going and