The diary

29.06.2016 Mailman errors
  • Happen. When in doubt, actively check /var/log/mailman/errors, and use unshunt liberally.
23.05.2016 LXC, delays and ordering
  • I spent some time today trying to figure out why my Gerrit LXC would never start up correctly after a system reboot. The container would come up, but Gerrit wasn't running. But if you started or restarted the container manually, it would all work fine. What the hell?
  • It turns out that the problem was a dependency on another container running PostgreSQL, but what's funny is that getting the containers to start in the right order was not at all trivial.
  • First, I thought of using lxc.start.delay to delay starting the Gerrit container itself. However, it turns out that lxc.start.delay delays initializations of containers started AFTER this one; a more appropriate name might be lxc.start.postdelay or lxc.start.delayafter. So that won't help me here.
  • Given we can't delay starting the Gerrit container itself, the trick is to start the PostgreSQL container earlier, which involves using lxc.start.order (and then lxc.start.delay to block containers starting after that). But again, there's a gotcha; lxc.start.order in LXC 1.x is actually in descending order, so 99 starts before 1. That was changed in LXC 2.0 after somebody spotted the inconsistency: [lxc-devel.linuxcontainers.narkive.com]
  • The final piece of gotcha is that I was kind of hoping to set the default order for *every* container in /etc/lxc/default.conf, but that file is actually only used to seed new container configs (so seed.conf might have been more appropriate) -- the place where you can drop in defaults that are used for all containers is actually in /usr/share/lxc/config/common.conf.d. I ended up modifying the configs manually, and it all works as expected. Phew!
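  • A minimal sketch of the ordering logic described above, as I understand it: containers are sorted by lxc.start.order (descending in LXC 1.x), and each container's lxc.start.delay holds back everything that starts after it. The Container class, the boot_plan helper and the values 99 and 30 are hypothetical, picked only for illustration; this models the behaviour, it does not talk to LXC.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    order: int = 0   # value of lxc.start.order (hypothetical values below)
    delay: int = 0   # value of lxc.start.delay, in seconds


def boot_plan(containers, descending=True):
    """Return [(name, seconds_waited_before_start)] in start sequence.

    descending=True models the LXC 1.x behaviour described above (99 before 1).
    A container's delay is applied *after* it starts, so it only postpones
    the containers that come later in the sequence.
    """
    ordered = sorted(containers, key=lambda c: c.order, reverse=descending)
    plan, clock = [], 0
    for c in ordered:
        plan.append((c.name, clock))
        clock += c.delay
    return plan


containers = [
    Container("gerrit", order=1),
    Container("postgresql", order=99, delay=30),
]
for name, at in boot_plan(containers):
    print(f"t+{at:>3}s start {name}")
```

    With these made-up values PostgreSQL starts first and Gerrit waits 30 seconds; passing descending=False shows what happens if the order is read the other way round.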
13.04.2016 SASL PAM auth with Sendmail is hard
  • For a very long time we've wanted to use PAM authentication to enable users to relay through our mailservers, but never managed to get it working. The workaround was to define a sasldb2 file and manually maintain passwords, but that was not very practical and for new users required extra setup. Well, today I finally managed to get it working.
  • The first hint was that, in order to avoid needing a separate sasldb2 file, you need to use the LOGIN and PLAIN authentication mechanisms. I found this in a footnote at [www.puresimplicity.net] that goes into it in more detail; specifically:
    Adding support for CRAM-MD5 and DIGEST-MD5 complicates password-management greatly. CRAM-MD5 and DIGEST-MD5 can not authenticate against the regular password system, be it saslauthd talking to PAM (the default system the above setup uses for sendmail) or straight from local system passwords. Keeping plain-text passwords in files is just a Bad Thing, plus password synchronization becomes a problem when you have to maintain three separate passwords.
  • Getting AUTH and LOGIN enabled in sendmail.mc took a bit of digging as sendmail.mc is a bit confusing, but in the end I managed to get it working, including only allowing these over TLS. Reading [www.sendmail.org] would have helped.
  • The final piece was equally tricky; I edited Sendmail.conf but was a bit too eager to specify configuration, and ended up including saslauthd_path, but without the trailing "/mux" named pipe. The most annoying thing is that Sendmail just silently fails if you get that wrong -- nothing is logged. In fact, no SASL activity is logged at all by Sendmail AFAICT. The hint that this was wrong comes from here: [lists.andrew.cmu.edu] and the actual documentation is here: [www.sendmail.org]
  • Bonus learning for today, part 1: what dnl in the sendmail.mc file actually means: "delete through newline", courtesy of [serverfault.com] It ensures that whatever comes after it never shows up in the sendmail.cf file. Contrast with a line that starts with #, which is copied into sendmail.cf but will become a comment there.
  • Part 2: when forwarding using .forward, a backslash prior to the name avoids further expansion, and you can use that to deliver a copy of your mail locally along with whatever else you are doing. Courtesy of [docstore.mik.ua]
  • Part 3: when testing sendmail in STARTTLS mode, gnutls-cli is really easy to set up, and [www.moeding.net] describes it perfectly.
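  • To make the dnl versus # distinction from part 1 concrete, here is a toy emulation in Python. It is only a sketch: real m4 also handles quoting and macro expansion, which this ignores, and the strip_dnl name and the sample .mc lines are made up for illustration.

```python
import re


def strip_dnl(mc_text):
    """Delete every 'dnl' token through the end of its line, newline included.

    Lines starting with '#' are left alone, so they would survive into the
    generated file as comments. Toy model only: no m4 quoting or expansion.
    """
    return re.sub(r"\bdnl\b[^\n]*\n?", "", mc_text)


sample = (
    "dnl This line vanishes\n"
    "# This line is copied as a comment\n"
    "define(`EXAMPLE', `1')dnl trailing note\n"
    "FEATURE(`example')\n"
)
print(strip_dnl(sample))
```

    Note how the newline after "trailing note" is swallowed too, so the following line is joined onto the define; that is the "through newline" part.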
04.04.2016 Windows Backup Sucks
02.12.2015 rdiff-backup and UpdateError/UpdateErrorOne
  • While the rdiff-backup wiki has been offline for months, the [web.archive.org] copy is still valid. It doesn't really give great ideas, though perhaps cloning (or snapshotting, yay ZFS) the log directory would not be a bad solution to the problem.
08.09.2015 DKMS, CH9200 and LTS backported kernels
  • So to handle the server disaster I ended up replacing the board with one with only 2 ethernet ports, and our router needs 3, so I also went out and bought a USB Ethernet device. This is what I got:
     [    4.356423] usb 6-2: New USB device found, idVendor=1a86, idProduct=e092
     [    4.356434] usb 6-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
     [    4.356443] usb 6-2: Product: CH9200 USB Ethernet Adapter
    Of course, is it just my luck that the device doesn't work without an out-of-tree driver?
  • DKMS to the rescue! Googling led me to a site with some debs for this device that have already been DKMS-ified: [towo.eu]
  • The only remaining piece was that I was missing the 3.13 kernel headers as I use a backported kernel, as [unix.stackexchange.com] hinted. I installed the linux-headers-generic-lts-trusty package, dpkg-reconfigure'd dkms to build the module, tweaked the interface name in /etc/udev/rules.d/70-persistent-net.rules and all was well. Phew!
  • (I say all was well except our ADSL modem is still dead. Yuck!)
07.09.2015 Public Holidays Kill Servers
  • 2 of our servers died this Monday, at 1PM. We're not sure what happened, but they seem to be toast. The motherboards were fried, probably during a horrible thunderstorm that struck us, but the weird thing is that the UPS, switch and cable modem seem to be fine. I'm not sure what happened! The only other suspect pieces: a USB hub which died, and the ethernet port on an ADSL switch which is partially malfunctioning. Subtle, eh?
  • The dead ADSL port does mean we have no SMTP or inbound services until it is fixed.. kind of annoying but it's somewhat liberating!
18.08.2015 logrotate and 6:50AM
  • By default, Ubuntu sets cron to run daily jobs at 6:25AM, weekly jobs at 6:47AM and monthly jobs at 6:52AM on the 1st of the month. So if something weird happens around that time, check logrotate first. And if the time is not that time, then check something else (which is what it was in my case :-)
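  • A small sketch for answering "could this have been cron?" from a timestamp. The times come from the entry above; running the weekly jobs on Sundays is my assumption about the stock /etc/crontab, and the cron_suspects name and the 30-minute window are arbitrary choices.

```python
from datetime import datetime, timedelta

# (label, hour, minute, predicate deciding whether the job runs on that date)
SCHEDULE = [
    ("cron.daily", 6, 25, lambda d: True),
    ("cron.weekly", 6, 47, lambda d: d.isoweekday() == 7),  # assumption: Sundays
    ("cron.monthly", 6, 52, lambda d: d.day == 1),
]


def cron_suspects(when, window_minutes=30):
    """List the periodic job sets that started within window_minutes before `when`."""
    suspects = []
    for label, hour, minute, runs_on in SCHEDULE:
        started = when.replace(hour=hour, minute=minute, second=0, microsecond=0)
        elapsed = when - started
        if runs_on(when) and timedelta(0) <= elapsed <= timedelta(minutes=window_minutes):
            suspects.append(label)
    return suspects


print(cron_suspects(datetime(2015, 8, 18, 6, 50)))
```

    An empty list means the daily, weekly and monthly jobs are probably not to blame and it is time to look elsewhere.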
17.08.2015 rpcbind dead, sendmail mail loss!
  • At 8:38AM of the 15th, a Saturday, rpcbind died, and since then we've been losing email sent to local recipients. As Mark said, a good way to have a quiet weekend. But yuck!
21.01.2015 ssh keepalives
  • If your remote server hangs on you occasionally, remember to set ClientAliveInterval in the sshd_config (it's not there by default, at least not on Precise): [z9.io]
15.01.2015 TLER on Samsung 840 PRO
  • If it supports it, why not enable it? [en.wikipedia.org]
     smartctl -l scterc /dev/sda
          SCT Error Recovery Control:
             Read: 70 (7.0 seconds)
             Write: 70 (7.0 seconds)
    If the drive is taking more than 7 seconds to return on an operation, I'd rather Linux knew about it than kicking the drive out completely.
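  • For checking the setting across several drives, a sketch that parses the smartctl output shown above. The raw values are in tenths of a second (70 means 7.0 seconds, as the output itself says); parse_scterc is a made-up helper name, and any line that does not carry a numeric value is simply ignored.

```python
import re


def parse_scterc(output):
    """Return {'read': seconds, 'write': seconds} from `smartctl -l scterc` text."""
    timeouts = {}
    for kind, deciseconds in re.findall(r"(Read|Write):\s+(\d+)\s+\(", output):
        timeouts[kind.lower()] = int(deciseconds) / 10  # raw unit is 100 ms
    return timeouts


SAMPLE = """\
SCT Error Recovery Control:
   Read: 70 (7.0 seconds)
   Write: 70 (7.0 seconds)
"""
print(parse_scterc(SAMPLE))
```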
14.01.2015 BitlBee failing to connect to MSN?
  • If you are getting messages like
     msn - Logging in: Connecting
     msn - Login error: Could not connect to server
     msn - Logging in: Signing off..
     msn - Logging in: Reconnecting in 5 seconds..
     msn - Logging in: Connecting
     msn - Login error: Could not connect to server
     msn - Logging in: Signing off..
     msn - Logging in: Reconnecting in 15 seconds..
     msn - Logging in: Connecting
     msn - Login error: Could not connect to server
     msn - Logging in: Signing off..
     msn - Logging in: Reconnecting in 45 seconds..
     msn - Logging in: Connecting
     msn - Login error: Could not connect to server
     msn - Logging in: Signing off..
     msn - Logging in: Reconnecting in 135 seconds..
    see [wiki.bitlbee.org] and [ismsndeadyet.com] for the hint -- it's just that you can't connect to the default server, and instead need to use "account msn set server X" before connecting. Doh!
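  • As an aside, the retry intervals in that log (5, 15, 45, 135 seconds) look like an exponential backoff that triples each time. A sketch of that pattern follows; it is inferred from the log only, not taken from BitlBee's source, and reconnect_delays is a hypothetical name.

```python
from itertools import islice


def reconnect_delays(first=5, factor=3):
    """Yield reconnect delays in seconds: first, first*factor, first*factor^2, ..."""
    delay = first
    while True:
        yield delay
        delay *= factor


print(list(islice(reconnect_delays(), 4)))
```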
07.01.2015 AMT making you upset?
06.01.2015 BTRFS for production?
16.12.2014 DDNS hurts
  • DDNS updates finally nailed; took quite a lot of investigating as ISC DHCPD's configuration is full of weird gotchas. The primary piece of help was [www.semicomplete.com] which explains in a simple way how static entries need to be set up. Key findings follow.
  • Setting a "ddns-hostname" makes the system actually work more reliably; ISTM that the query the client sends actually affects the way the hostname is determined. I assume this is tied to allowing the client to send its own hostname, which I consider undesirable, as I want the server to control what hostname I put in my domain.
  • Other references: [prefetch.net] [lists.isc.org] [www.zytrax.com]
05.12.2014 Using ping -I
  • It turns out that ping -I is a bit tricky. The simplest thing to do is to use the interface name:
     kiko@anthem:~$ ping -I eth2 altern.org
     PING altern.org (80.67.174.57) from 189.35.185.240 eth2: 56(84) bytes of data.
     64 bytes from altern.org (80.67.174.57): icmp_req=1 ttl=49 time=259 ms
    but that is actually lying: the packet isn't going out from 189.35.185.240, which is the address for eth3, but rather from eth2's native address. I even tcpdump'd to confirm.
  • And if you use just the address, it doesn't seem to work:
     kiko@anthem:~$ ping -I 189.19.234.109 altern.org
     PING altern.org (80.67.174.57) from 189.19.234.109 : 56(84) bytes of data.
30.11.2014 LXC aargh and NFS mounts
  • If you are trying to mount an NFS share inside an LXC container on 14.04, it won't work until you fix the apparmor profile: [technuts.tru.my]
29.11.2014 Cron madness
  • I have been trying to get a find command to delete old files and directories under a tree; this is run in a cronjob and I've just been sloppy at it. Today I finally discovered -mindepth 1 and -depth were what I was looking for all along!
  • BTW, a trivial way to ensure only a single cronjob runs is to use flock:
     flock -n /var/lock/foo command
    Before I used lckdo, which is included in the moreutils package, but flock is part of util-linux and doesn't need perl madness.
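  • If the cronjob is itself a Python script, roughly the same non-blocking lock can be taken with the standard fcntl module. This is a sketch of the idea and not a replacement for the flock utility; run_exclusively is a made-up name, and returning 1 when the lock is busy is meant to mirror what I believe flock -n does by default.

```python
import fcntl
import os
import subprocess
import sys


def run_exclusively(lock_path, command):
    """Run command only if lock_path is not already locked.

    Returns the command's exit status, or 1 if somebody else holds the lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return 1
        return subprocess.call(command)
    finally:
        os.close(fd)  # closing the descriptor releases the lock


if __name__ == "__main__" and len(sys.argv) > 2:
    sys.exit(run_exclusively(sys.argv[1], sys.argv[2:]))
```

    Called as a script it takes the lock file followed by the command, e.g. with /var/lock/foo as in the example above.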
28.11.2014 mtab versus /proc/mounts
02.11.2014 Happy Mailman Day: fixing unhandled bounces!
  • [mail.python.org] had the hint:
     grep ^"<[a-z]" ~/mail/bounces | tr -d \<  | tr -d \> | xargs bin/remove_members --fromall
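  • For the record, the same filtering spelled out in Python, which makes it easier to eyeball the list before feeding it to remove_members. A sketch only: bounced_addresses is a hypothetical helper and the sample addresses are invented.

```python
import re


def bounced_addresses(lines):
    """Mirror `grep '^<[a-z]' | tr -d '<>'`: keep matching lines, drop the brackets."""
    found = []
    for line in lines:
        if re.match(r"<[a-z]", line):
            # split() mimics xargs, which breaks its input on whitespace
            found.extend(line.replace("<", "").replace(">", "").split())
    return found


sample = [
    "<alice@example.org>",
    "Subject: <not an address line>",
    "<carol@example.net>  ",
]
print(bounced_addresses(sample))
```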
23.10.2014 Magic SysRq actions disabled?
11.10.2014 Spin the furniture
  • Spent hours of my Saturday with Rafa and two woodworkers doing a full 180 of the TV rack which ended up being low enough for Rafa to hit his head (which would hurt). It almost killed us all but we succeeded and the results are actually.. pretty good!
10.09.2014 Google Hangouts auto-mute
  • I hate it, but [www.pixelmonkey.org] indicates a trivial way to fix it, which is adding a single line to a .config file for the talk plugin.
  • Gparted killed my Windows partition when resizing. Trying to get a recovery disk was crazy hard! It turns out unetbootin is the easiest way to do it, but Windows required NTFS which current unetbootin doesn't easily allow unless you use the hack in [askubuntu.com]
  • And once you get your Windows back you will discover that expanding the partition is instantaneous in the actual disk manager UI!
  • I messed up my GPG trustdb, but luckily: [trog.qgl.org]
03.09.2014 Iodine
  • Finally got IP over DNS working and it's amazing to say the least! [dev.kryo.se]
18.06.2014 CUPS & double-sided printing?
  • We have a decade-old Laserjet 1320 that is great for double-sided printing. That is, until recently -- perhaps even as recently as we moved our workstations to Trusty. What happened?
  • The best hint I found was at [ubuntuforums.org] trying to set the default options for the printer. When I enabled double-sided printing, CUPS warned me there is a Duplexer Installed option -- which was off. Fixed!
30.05.2014 Updated Server BIOS for our S5520HC
  • Huge filename and pretty massive firmware update: S5520HC_S5520SC_EFI_BIOS64_BMC61_FRU33_ME112.zip
  • Moved from 0050 to 0064 -- a 4 year delta between versions!
  • Only issue was that the FRU and SDR update didn't get done because it printed a scary warning about not being able to detect a temperature sensor.
     Detecting Front Panel Temp Sensor Device. Please wait...
      
     Front Panel Temperature sensor device hardware is not found.
     Chassis fan Speed Control (FSC) will not work properly without this
     sensor!
      
     Do you want to still Continue (Y/N)?
  • I guess I'm just going to ignore that as I don't really seem to need the update for those bits. What do the FRU and SDR pieces do anyway?
  • Oh.. I guess I understand now. This is why my fans are screaming! [www.experts-exchange.com] [www.intel.com] [communities.intel.com] [downloadcenter.intel.com] [communities.intel.com] [www.samsung.com]
16.05.2014 USB drives and burn-in
  • I'm replacing the server USB backup drives and looking for good alternatives. I've picked a few and am trying to burn them in before making the commitment (as previous drives I was trying to use ended up dying on me mid-flight). Burn-in for me is badblocks for a few days and some SMART self-testing.
  • One of the drives I got was a Seagate, and it annoyed me that there were a lot of errors in the two first SMART values listed by smartmontools as I did a badblocks on it:
     1 Raw_Read_Error_Rate     0x000f   100   100   006    Pre-fail  Always -       236840
     7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always -       204379
    I say it annoyed me, but then I read this: [serverfault.com]
  • So the lower side of the number is just a counter. And math shows they have zero errors. Yay!
  • And if you have a USB drive that has self-tests being aborted by the host, check to see if it's not sleeping mid-test. At least that is what [ddumont.wordpress.com] says; I'm trying it.
  • Daft Google syncml limitations: [www.nodhing.com]
  • Saved my ass with DD-WRT passwords today not being synced between web and ssh: [www.dd-wrt.com]
  • Quora just taught me:
     Answer from Susan Ng
     I learned this in 1st grade - it's a REALLY easy and simple way to learn
     your nine times table. Or to teach someone else to learn!
     1) Look at your (or someone else's) hands
     2) Say you want to find 9x7.  Put down your 7th finger
     3) Count the number of fingers to the left of your 7. This is your tens
     digit.  Count the number of fingers to your right.  This is your ones
     digit.
     9x7=63
    Wow!
  • What a trick: [testefromhell.wikispaces.com]
  • Typing the em-dash: [askubuntu.com]
  • Getting .bash_profile sourced: [askubuntu.com]
  • Just ran into: [www.fxp0.org.ua]
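  • Back to the Seagate raw SMART values above: the "math" is just splitting the raw number into a high and a low part. The sketch below assumes the commonly described layout of a 48-bit raw value with the error count in the top 16 bits and the operation count in the low 32 bits; that layout is an assumption on my part, not something smartmontools prints.

```python
def split_seagate_raw(raw):
    """Split a raw SMART value into (errors, operations).

    Assumption: top 16 bits count errors, low 32 bits count operations.
    """
    return raw >> 32, raw & 0xFFFFFFFF


for name, raw in [("Raw_Read_Error_Rate", 236840), ("Seek_Error_Rate", 204379)]:
    errors, operations = split_seagate_raw(raw)
    print(f"{name}: {errors} errors in {operations} operations")
```

    Both values from my drive are far below 2^32, so the error half comes out as zero.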
23.04.2014 Juju and LXC
  • Debugging session as to why I can't get the local provider to give me new machines in Juju. This is probably a regression in 1.18.1.4, but I still don't know yet.
  • One thing which is interesting is that the log for machine-0 is actually where a lot of the container traffic appears. machine-0 is the bootstrap node, and in the local provider, it's what houses all the other containers.
  • root@chorus:/etc/default# apt-get install lxc/precise-backports
  • Escape the console?
  • lxc-ls is busted?
  • [curtis.hovey.name]
20.04.2014 36ers?
09.04.2014 HP Virtual Rooms
  • The trick to getting [rooms.hp.com] to work is to know that the plugin and application they provide are 32-bit. That's not something which is obvious unless you actually read the page carefully, and the failure mode is completely unobvious (the installer runs, the plugin is there, the test page looks like it works but no virtual room ever opens, with a URL flashing quickly before loading back into the test page). There is a trick which you can use to test manually and see what is wrong:
     kiko@limpinho:~$ cd .hpvirtualrooms 
     kiko@limpinho:~/.hpvirtualrooms$ ./hpvirtualrooms 
     bash: ./hpvirtualrooms: No such file or directory
     kiko@limpinho:~/.hpvirtualrooms$ file hpvirtualrooms 
     hpvirtualrooms: ELF 32-bit LSB executable, Intel 80386, version 1
     (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15
    Aha! Okay. So I need some 32-bit libraries..
  • First step, I need
     sudo apt-get install libc6-i386 
  • That lets me see via ldd the situation. And it's pretty bad: I need a total of 44 libraries pulled in as dependencies from this core set:
     libsm6:i386 libpng12:i386 libfreetype6:i386 libxi6:i386
     libx11-6:i386 libasound2:i386 libstdc++6:i386 libfontconfig1:i386
     libxrender1:i386 libxrandr2:i386 libglib2.0-0:i386 libxfixes3:i386
     libglu1:i386
  • But once that's done, it seems to work. I need to put a bit more effort into validating it in the office, but I at least now know how to do it. And best of all, it doesn't seem to need Java!
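  • The misleading "No such file or directory" is most likely the kernel failing to find the 32-bit loader, not the binary itself. A quick way to check a binary's word size without file(1) is to read the fifth byte of the ELF header (1 means 32-bit, 2 means 64-bit). A sketch, with made-up helper names:

```python
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(header):
    """Return '32-bit' or '64-bit' for the first bytes of an ELF file, else None."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    return ELF_CLASSES.get(header[4])


def elf_class_of(path):
    """Same as elf_class, reading the header from a file on disk."""
    with open(path, "rb") as fh:
        return elf_class(fh.read(5))
```

    For example, elf_class_of on the hpvirtualrooms binary should report 32-bit, matching what file says above.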
17.03.2014 Android and Account Syncing
  • If you look at your Android phone settings and all your accounts have "sync disabled", you will never figure out how to fix it. It turns out that you need to look in the Gmail app and enable sync. That in turn enables Android-wide synchronization, or at least that's my experience and what [androidforums.com] tells you to do. WTF.
  • Ah, now that I looked at [support.google.com] it looks more sensible. So the reality is that you control that same setting in both gmail and in the Data Usage settings screen. I bet I disabled it while roaming internationally!
21.02.2014 DHCP in the eyes of Wireshark
  • We had a cable modem that was annoying the hell out of us because it needed to be restarted periodically -- twice a day in recent weeks. So we called the cable company in and convinced them to swap the modem out. In putting the new modem in I did a lot of log-digging and realized that actually the request line that gets logged periodically:
     Feb 20 21:31:05 anthem dhclient: DHCPREQUEST of 177.34.169.88 on eth3 to 189.7.80.20 port 67
    is actually not a request which is going unanswered, since Wireshark shows clearly that there is a Request packet followed by an ACK from that IP address. Oh, actually, I am just grepping the log wrong, because if you look at the full successful operation it looks like this:
     Feb 17 12:22:51 anthem dhclient: DHCPREQUEST of 179.154.136.190 on eth3 to 189.7.80.20 port 67
     Feb 17 12:22:51 anthem dhclient: DHCPACK of 179.154.136.190 from 189.7.80.20
     Feb 17 12:22:51 anthem dhclient: bound to 179.154.136.190 -- renewal in 4407 seconds.
  • The problem we were having previously was that at some point the modem stopped working, and the refresh DHCPREQUEST never got a response, which looks like:
     Feb 17 20:01:24 anthem dhclient: DHCPREQUEST of 179.154.136.190 on eth3 to 189.7.80.20 port 67
     Feb 17 20:02:32  dhclient: last message repeated 6 times
     Feb 17 20:03:35  dhclient: last message repeated 4 times
    Sometimes the modem would do 7 refreshes before stalling, but lately it was rare to get to 4. The log looks much healthier now!
  • One particularly weird thing is that the initial IP allocation is answered by a different DHCP server than the one which answers the later renewals:
     Feb 20 17:37:45 anthem dhclient: DHCPDISCOVER on eth3 to 255.255.255.255 port 67 interval 3
     Feb 20 17:37:45 anthem dhclient: DHCPREQUEST of 177.34.169.88 on eth3 to 255.255.255.255 port 67
     Feb 20 17:37:45 anthem dhclient: DHCPOFFER of 177.34.169.88 from 177.34.168.1
     Feb 20 17:37:45 anthem dhclient: DHCPACK of 177.34.169.88 from 177.34.168.1
     Feb 20 17:37:46 anthem dhclient: bound to 177.34.169.88 -- renewal in 4840 seconds.
    So 177.34.168.1 provided the response, but if you look at the DHCP Server Identifier that comes back in the OFFER packet it says 189.7.80.20. I don't think that's illegal, but it's certainly not what I've seen in normal site-wide DHCP. And if you look at the updates afterwards the refresh DHCPREQUEST is ACK'd by 189.7.80.20.
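  • Since I managed to grep the log wrong once, here is a sketch that pairs each DHCPREQUEST with the DHCPACK that follows it and reports the requests that never got one. It assumes syslog lines shaped like the dhclient excerpts above; unanswered_requests is a hypothetical name and "last message repeated" lines are simply skipped.

```python
import re

REQUEST = re.compile(r"dhclient: DHCPREQUEST of (\S+) on (\S+) to (\S+)")
ACK = re.compile(r"dhclient: DHCPACK of (\S+) from (\S+)")


def unanswered_requests(lines):
    """Return DHCPREQUEST lines with no DHCPACK before the next request or end of log."""
    pending, unanswered = None, []
    for line in lines:
        if REQUEST.search(line):
            if pending is not None:
                unanswered.append(pending)
            pending = line
        elif ACK.search(line):
            pending = None
    if pending is not None:
        unanswered.append(pending)
    return unanswered
```

    Feeding it the lines from /var/log/syslog that mention dhclient should return an empty list on a healthy link.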
27.01.2014 Undocked Libreoffice panes
  • I had this problem for the longest time, and just found out that it is actually a documentation issue: [askubuntu.com]
  • How on earth did they get to control-doubleclick, though?!
22.01.2014 ADSL and Telefonica
  • Once a year I try calling my operator to see if they can upgrade my uplink. I'm amazed that to this day I can only get a 4MB/s link on an ADSL connection from Vivo (ex-Telefonica, ex-Telesp), the local wired operator. It's even weirder that on my current line, which I've had for about 10 years, I can't get an upgrade at all from the current 1MB/s. At the same time, Virtua offers me 20 and 100MB/s on cable at not much more than Vivo charges for their measly 1MB/s. Maybe I won't call again next year!
08.01.2014 Reminder to self: DBL and sendmail access map
  • The [www.spamhaus.org] DBL is great, but if it is blocking email you should be receiving, the way sendmail integrates with milters means you can't work around it by adding the sender address to the access map. The URI-milter package that we use doesn't provide whitelist support either; it's even mentioned in their TODO at [email.uoa.gr] Should I not be using 0.1-versioned software?
  • Another thing which sucks about the URI-milter is that /any/ match is considered positive; for the DBL, which I just found out lists even bit.ly (see [www.circleid.com] for details), this means that both the 127.0.1.2 and 127.0.1.3 are blocked, but they are quite different -- the first is for actual spam domains, and the other, for redirector domains which may be abused by spammers (see [www.spamhaus.org] for details).
     Non-authoritative answer:
     Name:   bit.ly.dbl.spamhaus.org
     Address: 127.0.1.3
  • PS: I've been invited to speak at the Brazilian Campus Party this year! I'm presenting at the Socrates stage on the 29th from 15h30 to 17h00. Joining me will be Paulo Henrique de Lima Santana, Fabio Pires, Marcio Junior Vieira and Marcelo Marques.
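  • Back to the DBL: what I would want from the milter is a decision based on the return code and not on "any match". A sketch of that logic, using only the two codes and the query name format mentioned above; the function names are hypothetical and nothing here performs a real DNS lookup.

```python
DBL_ZONE = "dbl.spamhaus.org"

# Meanings as described in the entry above
DBL_CODES = {
    "127.0.1.2": "spam domain",
    "127.0.1.3": "redirector domain which may be abused by spammers",
}


def dbl_query_name(domain):
    """Build the name to look up, e.g. bit.ly -> bit.ly.dbl.spamhaus.org."""
    return f"{domain.rstrip('.')}.{DBL_ZONE}"


def should_block(answer, block_codes=("127.0.1.2",)):
    """Block only on the selected return codes instead of on any match."""
    return answer in block_codes


print(dbl_query_name("bit.ly"), should_block("127.0.1.3"))
```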
07.01.2014 Spreadsheets and Locales
  • I hate locales in spreadsheets. [webapps.stackexchange.com] -- why on earth does the locale change the ARGUMENT SEPARATOR in formulas??
05.01.2014 DHCP root-path weirdness
  • There is an odd bug in the DHCP root-path setting. I just don't know what it is.
  • I used dhcpdump to study it, FWIW; see [bentis.calepin.co] for some handy DHCP debug advice.
  • Cooked up a very crude overall boot time measurement system using rc.local and /proc/uptime. I wonder how reliable it is.
  • PS: for the FAQ "why does /proc/uptime show a larger number for idle time than raw uptime", see [ubuntuforums.org]
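  • The crude measurement boils down to reading the first field of /proc/uptime at the point where rc.local runs. A sketch of the parsing; the helper names are mine, and the idle figure can exceed the uptime because it is summed across CPUs.

```python
def parse_uptime(text):
    """Return (uptime_seconds, idle_seconds) from the contents of /proc/uptime."""
    uptime, idle = (float(field) for field in text.split()[:2])
    return uptime, idle


def seconds_since_boot(path="/proc/uptime"):
    """Read the file and return only the uptime field."""
    with open(path) as fh:
        return parse_uptime(fh.read())[0]
```

    Logging seconds_since_boot() from the end of rc.local gives the time from kernel start to that point, which is as far as this method can see.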
04.01.2014 Ripening
  • Have you ever wondered what causes fruit to ripen? Well, I did tonight, and I looked it up on Wikipedia and was amazed to see the article is terrible! But it contained a link to an incredibly interesting entry in the Plant Physiology info homepage: [plantphys.info]
03.01.2014 FSCache for the new year
  • I got a few 16GB SSDs to try as FSCache drives for our NFS-root diskless network. The idea of being able to transparently cache data in them and improve performance is really appealing, but at least on Ubuntu Precise the results weren't ideal.
  • Setup is fairly simple. I wired the SSD drive into the chassis, formatted it to ext4, installed cachefilesd, enabled it in /etc/default and modified /etc/fstab to mount the SSD and add the "fsc" option to the NFS mounts.
  • There is a problem with the kernel Yama security provisions that seems to be triggered by enabling FSCache; when running mutt a bunch of errors show up in the syslog like this:
     kernel: non-accessible hardlink creation was attempted by: mutt_dotlock
    However, it's possible to work around this (see post at [utcc.utoronto.ca] for details) by just setting
     kernel.yama.protected_nonaccess_hardlinks = 0
  • It seems to be working (well, in fact it started working after I fixed the configuration file; I think brun can't be less than 10% or it errors out) as I can see the cache directory growing and nodes being added to the cache hierarchy. And it probably does speed things up, as repeatedly opening up a 6.5 GB-file (this box has 8 GB RAM) results in a pretty good speedup with the disk drive being read at a constant 70MB/s:
     $ time cat win_xp.qcow2 > /dev/null 
     real    4m17.169s
     $ time cat win_xp.qcow2 > /dev/null 
     real    1m11.031s
  • Unfortunately, there are three issues I've found so far. The first is that there is an obvious race that happens when opening files multiple times simultaneously. This happens most frequently when using mutt to try and open the same mailbox twice; when this happens I get a flood of kernel messages and mutt weirdly showing an "unknown" mailbox.
     CacheFiles: Error: Object already preemptively buried
     [kworke] preemptive burial: OBJb2 [OBJECT_RECYCLING]
  • The second issue is that the file UID/GIDs appear to come up busted, at least in some mount points, like you get with NFSv4 when idmapd isn't running:
     kiko@memento:~$ ls -ld .ssh/
     drwx------ 2 4294967294 4294967294 4096 Oct 12 21:02 .ssh/
     
    And while behaviour under concurrent reads may be fixed in newer kernel versions (see [lwn.net] and [www.redhat.com]), the third issue is that this KingSpec SSD is actually not that fast, even when compared with loading bits over the GBe network. hdparm shows the performance is kinda miffy:
     $ hdparm -tT /dev/sda1
     /dev/sda1:
     Timing cached reads:   18356 MB in  2.00 seconds = 9183.84 MB/sec
     Timing buffered disk reads: 214 MB in  3.02 seconds =  70.85 MB/sec
    For the same comparison above, if I turn cachefilesd off, here's the result:
     $ time cat win_xp.qcow2 > /dev/null 
     real    0m45.817s
    Wow -- I see a sustained 120MB/s over the network for that read.
16.12.2013 Ubuntu LTS Kernel Enablement Baselines
13.12.2013 Certificate renewal
  • Our SSL cert is expiring and I'm trying to remember how to generate a new CSR. Let me find out.. ah, right, there is a guide at [support.comodo.com]
  • One odd thing is having to generate the combined PEM file manually; I guess they can't do it for you because you hold on to the key when generating the CSR.
  • Anyway, all of this is stuffed into /etc/ssl/LOCAL for us now, with a convenient README.
  • Installed also the certificate to be used with our SMTP server; this required a change to starttls.m4 which uses the same cert config entry for the CA and for the server (which suggests to me that it should really be a combined PEM file..)
10.12.2013 Oracle Java dialogs
  • 2013 has been the Year of the Java Update it seems. We've had to update multiple times given the security problems raised, and this is a problem because Banco do Brasil heavily depends on Java for access control to both company and personal banking websites. And since this raft of updates, OpenJDK no longer works for the company banking site, so I've been forced to use a computer with Oracle Java just for this.
  • Previous issues with updates included annoying popups and an unreliable login mechanism; you would get in once every 10 attempts or fewer on my machine. When it failed it would redirect you to a page telling you to install Java.
  • With the latest update (1.7.0_45) there are no longer blocking issues, but there is an annoying dialog that pops up every time I access the bank website: [java.com]
  • Turns out it's because of a limitation in the security mechanism for LiveConnect, a JS mechanism to call from a website into an applet; there is no way to make it work for all versions of Java: [blogs.oracle.com]
08.12.2013 Minors and airline points
  • Did you know TAM only lets you enroll into their alliance program children that are older than 2 years old? I generally wouldn't care to give a corporation a child's data, but the round trip to Taipei is probably worth a free flight or two so it would be nice to get.
  • It turns out China Airlines and SAA don't let you either. I guess those points are void :-/
  • Just recharged my TIM and Vivo pre-paid chips with 50 and 60, respectively. They are supposed to last 180 days. Will they?
03.10.2013 NFS TUNE
21.08.2013 Catch-all
  • Lent: GoPro to Iuri
  • Lent: Wheel-bag to João (PT)
  • Lent: Wheel-bag to Ozias (Disc)
20.08.2013 GRUB2 RAID weirdness.. understood and solved
  • I rebooted the server for the first time (after the failed disk) to cope with a kernel upgrade. No, I haven't yet swapped out the disk -- the chassis makes it a bit painful. But anyway, to my surprise, the system got stuck in the grub prompt, and I couldn't get it to boot by specifying the linux and initrd lines. Why?
  • One symptom I found was that cat /boot/grub/grub.cfg returned garbage. The other was that there was only one kernel version listed in /boot, although I had just upgraded the kernel so there had to be at least two. And yet another was that a file I had touched in /etc was also garbaged up. What's going on?
  • It turns out that grub was assembling the raid array using the failed drive (with SCSI ID 8) instead of the spare. It's really interesting that grub does a read-only RAID mount, but with very little checking, so when you have failed drives it shows you weirdly half-stale data. To address this I disabled the drive in the server's SCSI utility and booted again, successfully. It gives an indication of how the grub RAID code works; I wish I had a way of saying what drives were being used in an md array as it would have saved me a lot of pondering.
  • Oh, and I used the SCSI utility to verify the drive as well. I have re-added it back and am waiting for it to fail again. The IBM drive seems to be really slow.. or maybe it's that it's on the secondary SCSI interface with a slower tape drive on it as well.
05.08.2013 RAID drive failure
  • Our sdd drive (SCSI ID 8) was kicked out of the raid because of an aborted SCSI command overnight, at 3:36am local time to be precise. I'll add it back to the array after a reboot to see if it is transient or if it's really dead. The spare meanwhile seems to be working okay, but the resync takes ages..
  • It's worth noting that this fall-back setup has become weirdly slow. I'm still trying to figure out if it's the disk or something else.
06.06.2013 Java Banco do Brasil Locale bonging
  • Banco do Brasil's Java authenticator won't work in Mari's chromium browser, but it works in Firefox. What is the difference? Well, in chromium, we run into this issue: [groups.google.com] which has shown up on [bugs.debian.org] and [bugs.debian.org]
  • So Mari's locale is pt_BR.UTF-8. But the question is, if the error really is locale-dependent, why doesn't it trigger for Firefox?
  • I know now one more piece of the puzzle. We know the system locale is pt_BR.UTF-8. But if I visit [www.browserleaks.com] with both browsers I notice there is an important difference when displaying the results of Locale.getDefault() -- Chromium displays:
     Locale.getDefault() :
     Language Code   en
     Display Language    English
     Country Code    US
     Display Country United States
     
    whereas Firefox gives me
     Locale.getDefault() :
     Language Code   pt
     Display Language    português
     Country Code    BR
     Display Country Brasil
     
    and that's likely to explain why Chromium fails (C locale parsing assuming a dot as the decimal separator) while Firefox succeeds.
  • I'm still not sure what triggers the BB machine authentication reset that we run into periodically. So far I know that kernel updates do trigger it. What doesn't: java updates, firefox updates. Unknown: changing from OpenJDK to Oracle Java.
23.05.2013 The Joule
  • My Joule was acting up, so I tried to kick it into submission by formatting its partition via mkfs.vfat. I ended up with a filesystem with a bunch of garbage files that I couldn't quite figure out. I called Saris up and they said literally "don't use FAT32 or anything, just plain FAT" which I took to mean mkfs.msdos. That ended up creating a 12-bit FAT. I ended up with an empty drive. So far so good. Then after disconnecting and reconnecting the device to the computer, maybe a few times, it automatically created a CycleOps directory with a config subdir below it. Perfect! I guess the corrupted FAT entries suggest that the device's firmware only knows to write FAT-12 to it.. I need to check another device to confirm.
22.05.2013 Packet Mystery
  • My firewall ends up occasionally seeing traffic on a certain interface with the wrong source IP address. Why is that?
  • This is interesting but I believe unrelated: [blogs.oracle.com]
  • Wow, Java history is complicated.. or maybe just interesting [weblogs.java.net]
29.03.2013 Movie catch-up
  • Silver Linings Playbook
  • The Prestige
  • Un Cuento Chino
  • XXX bad movie about haunted house
28.03.2013 Shaping
  • We're planning on shaping our main incoming link to see if it can carry our regular traffic together with VOIP. I'm storing some pointers here to help me when we get to that: [en.wikipedia.org] [forums.juniper.net]
23.11.2012 Where is my /tmp/.X11-unix directory?
  • It's missing on all our diskless machines. What's going on?
  • Dunno, but it solved itself as part of regular updates and a pretty major fix to our diskless /etc/init scripts; there were subdirectories inside /etc/init with older copies of /etc/init/*.conf, and it turns out upstart also parses subdirectories -- oops!
  • Got an icalendar invitation viewer for mutt set up using [nickmurdoch.livejournal.com] though it did force me to use gem install which is so not the way to do it in Ubuntu!
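  • Since upstart picking up jobs from subdirectories was the surprise here, a small sketch for spotting stray copies: it lists every *.conf file sitting below the top level of a directory. The function name is hypothetical and /etc/init is only the default argument.

```python
import os


def find_nested_confs(init_dir="/etc/init"):
    """Return paths of *.conf files that live in subdirectories of init_dir."""
    top = os.path.abspath(init_dir)
    nested = []
    for dirpath, _dirnames, filenames in os.walk(init_dir):
        if os.path.abspath(dirpath) == top:
            continue  # top-level jobs are the ones we expect to find
        for name in filenames:
            if name.endswith(".conf"):
                nested.append(os.path.join(dirpath, name))
    return sorted(nested)
```

    Anything it returns is a candidate for moving out of the tree or renaming.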
22.11.2012 New link, and the shaping of ingress traffic
  • We had a new internet connection installed today, and it's a 4Mbps premium, unshaped link. In a weird encounter, however, this line from wondershaper completely kills my download performance (measured by a simple wget) on it:
     # tc filter add dev eth0 parent ffff: protocol ip prio 20 u32 match ip
         src 0.0.0.0/0 police rate 3800kbit burst 10k drop flowid :1
    I can't figure out why. I thought that maybe the rate stuff was wonky and wanted to use the avrate policer, but that doesn't work either:
     # tc filter add dev eth0 parent ffff: protocol ip prio 20 u32 match ip 
         src 0.0.0.0/0 police avrate 380 reclassify flowid :1
     RTNETLINK answers: Invalid argument
     We have an error talking to the kernel
    I thought that had to do with NET_ESTIMATOR being missing in the kernel config, as the LARTC mentions that estimators need to be compiled into the kernel, but it seems that option is now gone and they are always built in. Oh, I see what's missing -- an "estimator" option. So this runs ok:
     # tc filter add dev eth0 parent ffff: protocol ip prio 20 estimator 1 2 
         u32 match ip src 0.0.0.0/0 police avrate 380 reclassify flowid :1
    Unfortunately, I'm still only getting 50% of what I expected..
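  • Not a diagnosis, just arithmetic on the numbers in the first tc line above: how much time the policer's burst allowance covers at the policed rate. It assumes tc reads "10k" as 10 * 1024 bytes and "3800kbit" as 3,800,000 bit/s; whether the burst size has anything to do with the slowdown is untested.

```python
def burst_duration_ms(burst_bytes, rate_bits_per_second):
    """How long a full burst allowance lasts at the policed rate, in ms."""
    return burst_bytes * 8 / rate_bits_per_second * 1000


# Assumption: "10k" means 10 * 1024 bytes, "3800kbit" means 3,800,000 bit/s.
BURST_BYTES = 10 * 1024
RATE_BPS = 3800 * 1000

print(round(burst_duration_ms(BURST_BYTES, RATE_BPS), 1))  # prints 21.6
```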
08.10.2012 Bike Updates
  • Swapped the F1SL's chain, and replaced the F2C's rear derailleur cable. And then the week ended, and I left the office!
12.09.2012 So halt -p huh? And rsyslog
  • It seems that /sbin/halt no longer turns the system power off, and you need to run poweroff (or halt -p) instead. Did you know that?
  • rsyslog in Ubuntu has a rule that provides an admin feature I've always loved; the ability to log stuff to /dev/tty*. Yes, it's a bit of a security disclosure but I figure if you have tty access anyway..
  • The problem is it ships broken by default in Ubuntu. The issue has to do with privilege dropping; [kb.monitorware.com] notes correctly that the PrivDrop bits in rsyslog.conf cause tty writing to fail. What it doesn't note is that you can use named pipes for this functionality and it works just fine; I found this out in a very unlikely blog comment here: [mikebeach.org]
  • So all you need to do to get this to work is to use "|/dev/ttyX" as the destination string for the facility. Cool!
  • Actually.. coming back years later, I think that was the case for older versions of rsyslog. But the current version in Trusty removes the need for the pipe. However, to get it working I definitely had to add the syslog user to the tty group, something I had never tried before.
10.09.2012 DD-WRT, time and NTP
  • [www.dd-wrt.com] has a pretty complete analysis of just how busted the timezone and NTP handling in DD-WRT is. No wonder it was confusing me! For now, just using UTC on the device seems wisest.
09.09.2012 Cycling Metrics and The GoldenCheetah Performance Manager
  • For people looking for a way to get performance manager-style data from GoldenCheetah, in particular how to get the Performance Manager graph to make sense, check out the guide at [www.cyclingmusings.com] which apart from making sense of the metrics and PM tabs is particularly useful in explaining that you need to have Power Zone data entered (under Athlete options) in order to get BikeScores -- probably the #1 gotcha there. You can also get an idea of the mechanics behind the Metrics tab in a webcast at [bugs.goldencheetah.org] -- particularly interesting are the pieces that describe saved charts and user-defined ranges, which are hard-to-impossible to discover through the UI directly. And the user-defined ranges are automatically available as seasons in the Critical Power chart, which I found out about in this post: [groups.google.com]
  • And, if you're confused about TSS, CTL, ATL and TSB not appearing in GoldenCheetah, the thing to know is that GC 2.x uses Phil Skiba's metric system, which maps to the Coggan model as below:
     TSS => BikeScore
     IF  => Relative Intensity
     NP  => xPower
     VI  => Skiba VI (XPower/Average_power)
     CTL => LTS
     ATL => STS
     TSB => SB
    AFAICT Skiba VI is only visible in the metrics tab.
06.09.2012 PSU Again?
  • Actually, not so fast on the PSU fix. Ever since we put the drives back in the box, we've had somewhat random SCSI errors. This morning, after I installed a new network card, I was unable to get the thing to work again. And get this: it only happens when the drive caddy is inserted into the case. If the caddy is sitting outside of the box, everything works fine. But the moment I slide it inside the case, SCSI errors galore. And I've replaced the cabling, improved the drive positioning, disconnected fans.. my hypothesis is a PSU grounding problem. But I need to get a replacement to actually verify..
21.08.2012 Clamav on Ubuntu as a Sendmail Milter
  • I am a bit surprised that nowhere is there good documentation on how to get Clamav running on Ubuntu as a milter. It's actually pretty easy.
  • sudo apt-get install clamav-milter
  • sudo freshclam
  • sudo /etc/init.d/clamav-daemon start
  • Add
     include(`/etc/mail/m4/clamav-milter.m4')dnl
    to your sendmail.mc file.
  • make && /etc/init.d/sendmail restart
  • That's it -- you'll have a working installation that is already scanning, quarantining and updating the virus database. I'm not sure what exactly causes freshclam -d to run in the steps above, but it's a daemon that will keep your database up to date.
  • To test, just bounce a message containing a virus (you probably have too many!) to yourself. It'll be put in quarantine mode, which I took a long time to figure out is actually a special sendmail queue, which you view like this:
     kiko@anthem:/etc/clamav$ mailq -qQ
     MSP Queue status...
     /var/spool/mqueue-client is empty
             Total requests: 0
     MTA Queue status...
             /var/spool/mqueue (1 request)
     -----Q-ID----- --Size-- -----Q-Time----- ------------Sender/Recipient-----------
     q7LCqigM003096    40178 Tue Aug 21 09:52 
          QUARANTINE: quarantined by clamav-milter
                          
             Total requests: 1
  • Quarantined messages are going to be in the same /var/spool/mqueue directory, but are prefixed with "hf". To remove stuff from the quarantine queue, you can use
    /usr/share/sendmail/qtool.pl -d -Q /var/spool/mqueue
    and you can also use qtool.pl to remove individual files.
  • If you don't want the messages quarantined (I probably don't) you can just set the configuration option "OnInfected Reject" in /etc/clamav-milter.conf. Note also that Stephen Warren suggests "AddHeader Add" here: [bugs.launchpad.net] Once you've made any configuration changes, just run an /etc/init.d/clamav-milter restart.
  • So far I am /very/ impressed with how simple and well it all works. Kudos to the project team who has come up with a very simple design -- scanner daemon, milter, database update daemon, and that's it. The packaging is also really nicely done, with the user permissions set up correctly and intuitively. It eats up some memory on the server, but we have so much anyway..
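  • A small sketch for peeking at the quarantine without mailq, relying only on the observation above that quarantined messages show up in the queue directory with an "hf" prefix. The function name is hypothetical, and reading /var/spool/mqueue will normally need root.

```python
import os


def quarantined_queue_ids(queue_dir="/var/spool/mqueue"):
    """Return queue IDs for files whose names start with "hf" in queue_dir."""
    ids = []
    for name in os.listdir(queue_dir):
        if name.startswith("hf"):
            ids.append(name[2:])  # drop the "hf" prefix to get the queue ID
    return sorted(ids)
```

    For the message in the mailq listing above, this would be expected to return its queue ID, q7LCqigM003096.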
20.08.2012 Avenging Spelling
  • Every once in a while you receive the odd surprisingly fantastic message, and this weekend's winner says this:
    Hi, there is a small typo in [www.linaro.org] .. commecial should be commercial. No need to thank me, it's what I was born for.
    It's then signed "The Spelling Avenger". So how cool is that? I wish he had a website to link to..
17.08.2012 Pop and there goes a pirate
  • Our server's Corsair AX750 power supply just gave up the ghost, about three years in. Swapping it out was tough, in particular because the drives didn't seem to enjoy the whole movement. We installed spare drives, futzed around and finally remounted the original configuration correctly. Go figure!
  • Pretty cool reference on what directories to exclude from an Ubuntu backup: [askubuntu.com]
09.07.2012 Tracking Hangouts
  • Our somewhat complex multilink setup here at the office has a low-latency line which works really well for VOIP and video conferences, but to make the policy-based routing work, we need to know what hosts we are sending traffic to. Google Hangouts presents that challenge, since it's unclear what the hosts involved are. Plus, it changes! It used to be that talkgadget.google.com was all you needed to track, but now they've added stun.l.google.com and I'm still figuring out if that's all I need to pay attention to..
08.07.2012 If Unity won't run..
  • [askubuntu.com]
     kiko@gasolinux:~$ /usr/lib/nux/unity_support_test -p
     OpenGL vendor string:   VMware, Inc.
     OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 0x300)
     OpenGL version string:  2.1 Mesa 8.0.2
      
07.07.2012 War and the X7DVA BIOS
  • Our Supermicro X7DVA-8 server has ECC memory, and the memory and Northbridge run REALLY hot, so hot you can't touch the parts when the box is running, and if you load the server up too much the temperature trip squeals. So I decided that we could try updating the BIOS to the latest (albeit 2008) version. However, I was unable, I mean, TRULY unable, to get it to boot from USB. I managed to do it once, on a fluke, but the drive failed to boot, and after that I could never do it again. I ended up following Johan's advice and using a SATA drive with Freedos on it.
  • Booting USB on the server never worked because, although the BIOS could see the drive, it didn't seem to identify it as bootable; the drive would never appear as bootable on the F11 menu, nor on the BIOS boot order menu -- so though the drive /did/ appear in the BIOS listing, it never had a sequence number next to it, while the first 8 entries have numbers 1-8 next to them.
  • Getting Freedos on the SATA drive was less trivial than I thought it would be. I had a Freedos USB stick, and I can boot from that on another workstation, so it should be a matter of doing a format /s, or an old-skool sys c:. But getting the SATA drive to actually boot was tricky, and I ended up being reminded of the magical incantation:
     fdisk /mbr
    which was exactly what was missing. Getting fdisk.exe (and format.com) involved ISO-mounting the freedos ISO, but that wasn't so hard in the end. And once I managed to boot, the flashing procedure was trivial and only took a few seconds (just type in "flash x7dva208.rom").
  • However, after flashing, the system continues to run as hot as ever! At least the BIOS update got us up to DMA version 2.5, and perhaps because of it, or because I tweaked a bunch of BIOS settings, my lshw now reports I have L2 cache. This bothers me though... does it mean I didn't have this before, or that lshw can now see it?
             *-cache:1
                  description: L2 cache
                  physical id: 7
                  slot: L2 Cache
                  size: 12MiB
                  capacity: 16MiB
                  capabilities: burst internal write-back 
    Now why does the capacity not match the size? The mysteries of modern hardware..
04.07.2012 Vitamin D and the endurance athlete
  • My wife's serum Vitamin D came up rather low. So I did some reading about it over at [www.ncbi.nlm.nih.gov] and, particularly interesting for an athlete, [www.vitamindcouncil.org] Mariana never leaves home without a serious sunscreen, and we both always shower after riding, so it's probably advised to supplement, particularly for Mari whose serum levels are low. I like this coach's take on it: [lwcoaching.com]
  • Iron, Creatine, Beta-Alanine, Lysine, Alanine, Niacin, Vitamin C, Zinc.. Sometimes I am reminded how crazy it seems to take this many supplements every morning, and I eat a /lot/ of variety -- that's life being a vegetarian athlete!
16.06.2012 spambayes 1.1 and persistent_use_database
  • If you used spambayes like me -- and truly apparently only me, since there are ZERO hits on Google for this issue -- and have recently upgraded to Precise, you will probably find your .proclog spewing errors like this:
     Attempted to set [Storage] persistent_use_database
     with invalid value True ()
    This is actually caused by a change in a spambayes configuration key, which is being set in your ~/.spambayesrc to "True", which used to be the right way to do it -- and still is according to the Debian package's README. If you look at spambayes' Options.py you'll find out that this is now a string field, with the following options available:
     ("zeo", "zodb", "cdb", "mysql", "pgsql", "dbm", "pickle")
    Apparently the old format is dbm, which is sort of the new default (the default is runtime selected between ZODB and dbm), but it's probably sensible to be explicit about it, so into .spambayesrc it goes!
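  • A quick sketch for checking a .spambayesrc before procmail starts complaining. It is a standalone check written for Python 3's configparser, not spambayes code; the allowed values are the ones quoted from Options.py above, and the function name is hypothetical.

```python
import configparser

# Values accepted for persistent_use_database, as listed in Options.py above.
ALLOWED_STORAGE = ("zeo", "zodb", "cdb", "mysql", "pgsql", "dbm", "pickle")


def check_storage_setting(rc_text):
    """Return the configured value if it is valid, else raise ValueError."""
    parser = configparser.ConfigParser()
    parser.read_string(rc_text)
    value = parser.get("Storage", "persistent_use_database")
    if value not in ALLOWED_STORAGE:
        raise ValueError("invalid persistent_use_database: %r" % value)
    return value
```

    Feeding it the text of ~/.spambayesrc returns "dbm" for the explicit setting and raises for the old "True".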
12.05.2012 rdiff-backup and a full disk
  • Do you know what you get when you mix rdiff-backup and a disk full error? I do now:
     File "/var/lib/python-support/python2.6/rdiff_backup/regress.py", line
     290, in restore_orig_regfile
     tf.write_from_fileobj(rf.get_restore_fp())
     File "/var/lib/python-support/python2.6/rdiff_backup/rpath.py", line
     1195, in write_from_fileobj
     copyfileobj(fp, outfp)
     File "/var/lib/python-support/python2.6/rdiff_backup/rpath.py", line 64,
     in copyfileobj
     outputfp.write(inbuf)
     IOError: [Errno 28] No space left on device
    Besides being a royal pain in the ass you won't believe me when I tell you that to repair this requires actual surgery on the filesystem. Here's a bug report [savannah.nongnu.org] and a thread that discusses the issue more widely [comments.gmane.org]
  • The solution I found was to find a couple of really large files (MOVs in fact) in the rdiff-backup-data subdirectories and move them around to a separate filesystem while running --check-destination-dir. Hopefully that won't error out completely -- still running.
  • Ah, indeed rdiff-backup can cope with that -- it basically creates zero-sized files where it would have placed the original file and moves on. In my case it's slightly more complicated to interpret because this is actually a recovery pass (using --check-destination-dir) from a backup that failed and therefore the recovery pass is trying to recreate files in the rdiff-backup master directory which are actually deleted in the live system. But that's easier to amend later!
  • The best solution I've found to this problem, so far, is to keep some easily-freed large files on the filesystem. That way, even if you /do/ run out of space and crash, well, you can move them away and then recover.
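  • The "easily-freed large file" idea can be sketched like this. The helper names are hypothetical; the one real design point is that the file is written out in full rather than created with truncate, since a sparse file would not actually reserve any space.

```python
import os


def create_ballast(path, size_bytes, chunk_size=1024 * 1024):
    """Write size_bytes of real data to path so the space is truly used."""
    chunk = b"\0" * chunk_size
    remaining = size_bytes
    with open(path, "wb") as handle:
        while remaining > 0:
            piece = chunk[:min(chunk_size, remaining)]
            handle.write(piece)
            remaining -= len(piece)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the blocks hit the disk
    return os.path.getsize(path)


def release_ballast(path):
    """Delete the ballast file to free space in an emergency."""
    os.remove(path)
```

    On a filesystem that compresses or deduplicates data, a file full of zeros may reserve less than its size, so treat this as a sketch for plain filesystems.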
05.05.2012 Swaps
  • Swapped the battery on my Powertap Pro
  • Also swapped stem and handlebar on the F2C
24.04.2012 Cycling Complexity
  • Finding the exact replacement part I need from Shimano proves tricky: Y-4BN98060 is what I need according to [www3.big.or.jp] but it did take a while for me to figure out I had an SL wheelset. In fact, I by mistake bought a pair of Y-4B909000 (confusingly labeled 4-4B909000) only to find it didn't actually work! Of course, that is for the WH-7800, etc, etc. Damn.
  • /usr/share/xsessions is where GDM finds the environments available for users to log in to the system.
  • How cool is [www.asciiflow.com] huh?
19.03.2012 Sloppy focus on Ubuntu Unity
21.02.2012 TouchPlayer
  • Man I'm loving the HP WebOS Touchpad. I'll write up a proper blog entry about it, but if you're trying to get an application to play random AVI and WEBM videos on it, you'll need to install a third-party application. I installed TouchPlayer, which apparently is a build of mplayer. It wasn't exactly trivial, essentially because you need to use a host PC to do the whole process.
  • First, you enable "developer mode" on the Touchpad, which involves typing stuff into the tablet's search bar.
  • Next step, you run, on a host computer, something called "WebOS Quick Install". This is just a java archive you run doing "java -jar WebOSQuickInstall-4.4.0.jar". You can get it from [dl.dropbox.com]
  • Before running this for the first time, you need to install a "driver", which is called palm-novacom; you can download the .deb from [developer.palm.com] -- beware, it will run a daemon, which kind of freaks me out.
  • You can now connect the tablet to the computer. You now run WebOS Quick Install, which should detect your device fine. Now, click on a little networky icon, and select "Preware". This gets installed on the actual device. You can later use this on the actual Touchpad to install applications, and there are quite a few.
  • Last stop! You then need to download two .ipk files and install them on the tablet through WebOS Quick Install. The first one is for the filemgr service, which should be installable through Preware, but which is currently 404ing -- no worries, a version is available from [code.google.com] Touchplayer itself is available as a download from [mobilecoder.touchpadhp.info] -- just install both ipks and you'll have it available on the device.
  • That's it -- disconnect and enjoy!
20.02.2012 Blobs
  • I just extracted Nvidia's driver source from Ubuntu's latest package and generated an ls -l on kernel/. It's interesting that it's a non-GPL'd kernel module, something which I know is both kinda rare and kinda controversial. The full list is at [pastebin.com] but the remarkable thing is this blob in the middle of it:
     [... dozens of 1-100K .c files]
     -rw-r--r-- 1 kiko mondo 13444768 2011-04-18 18:54 nv-kernel.o
12.02.2012 IcedTea and Banco do Brasil
  • I know that IcedTea really didn't /use/ to work with BB, forcing us to install Sun Java, and yes, even the bank says that at [seg.bb.com.br] but while it is true that in Chromium it doesn't appear to work, if you run Firefox in Oneiric it works exactly as you'd expect it to. Yay!
29.01.2012 Google Talk IPs
  • 74.125.93.127, 74.125.93.126 for the actual STUN and UDP media traffic
  • 74.125.93.102 for the HTTPS traffic.
23.01.2012 Randomness
  • Installed today a new front tire on my Felt F2C, a Schwalbe Durano Plus; let's see how long it will last. The last one, a Specialized All Condition tire, is my current favorite, but I got a nasty sidewall cut in it with practically new tread that kind of miffed me.
  • I did a TV interview today about the educational work we're doing with cyclists and drivers in São Carlos: [eptv.globo.com]
  • Also posted a reply to a rather misleading analysis of the effects of open source on the market, this time directed at Stoq, who the poster says is "killing commercial sales-focused ERP software" -- I do wish that part was right, though! [www.guiadopc.com.br]
  • Fixed my diNovo Edge keyboard again on Oneiric (and chattr +i'd the udev rules file to stop it from breaking!) as per [ubuntuforums.org]
  • Where did my panel volume control go? Well, actually, in 11.04 and onwards, the volume control is now provided by an indicator applet, so what you're really missing is the indicator applet! Just add it back and be happy.
  • Need to replace your PowerTap bearings? Check out [www.youtube.com]
13.01.2012 Happy New Year
  • The random changes over the new year bring me a few new bits of wisdom. I'm doing the finishing touches on a migration from a 32-bit Lucid on an X61 to an amd64 Oneiric on my mom's new X220 with SSD. She's not sure she likes it yet, but I'm trying hard so she does!
  • The first hint is a workaround for the very odd 40 second hang that you get with OpenOffice (and co) when using your computer in a network whose DNS server doesn't respond to weird queries. It seems that OOO is doing a DNS lookup for, literally, "foobar.(local)" where foobar is your local hostname. The way to solve that is to add a weird entry for the machine in your /etc/hosts file. End of hang!
  • Second is how to install hamachi on Oneiric. Just pull the file from a helpful PPA on [launchpad.net] -- this and hachigui, which you can get from webupd8team's PPA at [ppa.launchpad.net]
  • Third is a reminder of the Qt4 problem that happens on Oneiric; running something like GoldenCheetah fails because of [bugs.launchpad.net] -- or maybe I should say maybe because of, because that bug is fixed and yet this fresh Oneiric install can't run GC. Never mind, install qt4-qtconfig and then run qtconfig-qt4 to select Cleanlooks to get it working.
  • There is so much stuff that I just know nothing about. Today, it's Xen. I'm actually looking for cases of things that exist in the kernel source tree but which don't build into a kernel itself; I know perf is one such thing, and kvm-tools might be another (see [lwn.net] for Ingo's rationale of why more userspace should go in). In fact, there's a bunch of stuff in tools/ -- cpufreqtools, turbostat -- enough that in Ubuntu you get this all through the linux-tools packages. Outside of tools, nothing I can see. Well, there's scripts, but the distinction between tools and scripts is blurry to me -- FWICT scripts is for tools related to managing the kernel source tree, whereas tools is for userspace tools you'd use inside the OS. (Sidestep into coccinelle, which I had seen in-tree but didn't know what it was -- it's a tool to describe and help apply a semantic change to source code.) There's samples/ too, which contains example kernel and userspace code. Finally, there's firmware/, though that's legacy firmware pulled out from old drivers that are being moved to use the external linux-firmware package and the request_firmware() API. So there are quite a few things, but none of them seems to be a hypervisor (or a similar runnable hyperthing).
  • Anyway, short story is Xen itself, the hypervisor (in other words, the 500k-or-so /boot/xen-*.gz) is not in-tree. It is distributed standalone from [www.xen.org] and includes quite a bit of code that is forked from the kernel (obvious examples are bunzip.c and the acpi/tables stuff). Support for running the kernel as a Xen dom0 guest works out of the box as of recent kernel versions.
01.12.2011 SRAM Chains
  • Are not directional. At least the PC1070 chain I replaced on my F2C isn't! Apparently all the Shimano 10-speed chains are, though: [www.bikerumor.com]
  • Insightful link on the risks of sendmail virtusertable remote forwarding: [www.clasohm.com]
  • My Nokia 3250 was driving me crazy asking me "permitir ao cartão sim o envio da mensagem" (which actually means "Allow SIM to send message?"). Turns out you need to disable Settings->Phone->Security->Confirm SIM Services, and it happens because it's a 64K SIM and this phone doesn't support something called ENS. More information here: [www.howardforums.com]
  • Do you have an old server upgraded to Natty that's not getting its grub updated when you install a new kernel? Well, I did, and the reason was it was missing a single line in /etc/kernel-img.conf:
     postinst_hook = update-grub
    Thanks to Tim Gardner and Steve Langasek who helped track the problem down.
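  • For checking other upgraded servers, a minimal sketch that looks for that one line in the text of /etc/kernel-img.conf. The function name is hypothetical, and it assumes the file is plain "key = value" lines with # comments.

```python
def has_postinst_hook(conf_text, hook="update-grub"):
    """Return True if conf_text sets postinst_hook to the given hook."""
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "postinst_hook" and value == hook:
            return True
    return False
```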
18.11.2011 Multi-homed pain
  • Upgraded Anthem to Oneiric; everything works well EXCEPT for the fact that some packets are ending up in the wrong interface. Specifically, if a request comes in to the interface which isn't the default route, it's not being replied to on the same interface. What's going on?
  • Turns out that SOMETHING happened in the 3.0 timeline that finally made it mandatory to specify rules that explicitly set the right routing table for replies on each IP address. So the fix was just adding these two lines:
     from 189.19.234.109 lookup dsl-eth2 
     from 200.210.17.18 lookup dsl-eth0
    That was really all that was missing. And, if you read the rules, you gotta ask yourself how it is possible that this worked before!
  • Of course, after I've found out what it is I find this site from 2007 explaining exactly this: [kindlund.wordpress.com]
  • Ran into bzr bug with our managed /etc. Damn it! [bugs.launchpad.net]
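  • The two source-routing rules above, spelled out as full commands. This is a sketch that assumes they are meant in "ip rule" syntax and that the dsl-eth2 and dsl-eth0 tables are already defined elsewhere; it only prints the commands and does not run them.

```python
def source_rule_commands(table_by_source):
    """Build one `ip rule add` command per source address."""
    return [
        "ip rule add from %s lookup %s" % (source, table)
        for source, table in sorted(table_by_source.items())
    ]


# Addresses and table names are the ones quoted in the entry above.
RULES = {
    "189.19.234.109": "dsl-eth2",
    "200.210.17.18": "dsl-eth0",
}

for command in source_rule_commands(RULES):
    print(command)
```

    Each rule sends replies from a given local address through that link's own routing table instead of the default route.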
16.11.2011 The Eyes!
  • I've finally booked eye surgery for next week; Monday I go in for exams and then Tuesday or Thursday is the actual operation.
  • Interesting research in healing of Lasik-cut flaps: [www.usaeyes.org] -- damn, 2 years is a long time to wait for safety, though!
26.10.2011 Clearing out the attic!
  • Just deleted my ~/tmp, ~/Downloads, ~/devel/FREEZER today, so if I need it, remember to look at yesterday's backup ;-)
07.07.2011 A History of Tabs
  • I'm today surprised and mildly annoyed that the shape of Chromium's tabs is exactly the same as the tabs I designed in the first version of the tool www.lecto.com.br uses internally.
  • Was stuck doing a nfs-common update on the diskless, which seems to keep breaking in the postinst phase as nobody really runs nfsroot at Canonical (wink), but found out that dpkg keeps all the scripts in /var/lib/dpkg/info, and you can just hack the file to make the postinst pass and forget about it for now. Yay!
06.07.2011 Xen and Natty
  • Don't seem to mix, we found out when upgrading dragon2 to Natty, unless you update /etc/initramfs-tools/modules to include platform_pci and xen_blkfront. Read all about it at [bugs.launchpad.net]
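  • A small sketch for checking the text of /etc/initramfs-tools/modules before rebooting into a Xen guest. The module names come from the note above; the function name is hypothetical, and it compares names literally, so a hyphenated spelling such as xen-blkfront would be reported as missing.

```python
REQUIRED_MODULES = ("platform_pci", "xen_blkfront")


def missing_modules(modules_text, required=REQUIRED_MODULES):
    """Return required modules not listed in an initramfs-tools modules file."""
    listed = set()
    for line in modules_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        if line:
            listed.add(line.split()[0])  # first field is the module name
    return [name for name in required if name not in listed]
```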
05.07.2011 GRUBbed out on a Sandybridge
  • Since in the office we only use diskless machines which use gPXE, I end up not worrying about Grub very much. It turns out Grub2 has some weird traits, some great, some bad, and some probably a bit buggy.
  • For instance, with the default settings
     GRUB_HIDDEN_TIMEOUT=0
     GRUB_HIDDEN_TIMEOUT_QUIET=true
     GRUB_TIMEOUT=10
    I can never seem to get access to the menu. I ended up adding a boot tone -- Super Mario no less at [www.reddit.com] -- and then cranking HIDDEN_TIMEOUT to 1, which allows me to press two escapes during the tune and get to the menu. The _QUIET thing is interesting; if you mark it false you get an ASCII countdown instead of a blinking cursor.
  • When I update-grub on this computer, I /must/ do a grub-install /dev/sda or I get into an infinite reboot loop without any recourse. Recovering involves using a USB boot image which fails into an initramfs prompt without trivial access to my RAID1, so I am not really interested in debugging beyond this.
  • This is Maverick's grub2, so it might be a solved-in-Natty thing.
  • Oh and, please note: if you have a Sandybridge CPU, Memtest86+ 4.10 will hang before starting up. You need to run at least 4.20; thankfully it's pretty easy to replace it -- download and stick it into /boot.
  • Mari's new box also suffers from [bugs.launchpad.net] so I'm also figuring out an updated e1000e driver for her. Ended up using DKMS to do the build, which isn't as hard as it looks. Just use a config like this:
     PACKAGE_NAME="e1000e"
     PACKAGE_VERSION="1.3.17"
     # Build and clean inside the driver's src/ directory
     MAKE="make -C src/ BUILD_KERNEL=${kernelver}"
     CLEAN="make -C src/ clean"
     BUILT_MODULE_NAME[0]="e1000e"
     BUILT_MODULE_LOCATION[0]="src/"
     DEST_MODULE_LOCATION[0]="/extra"
     AUTOINSTALL="yes"
    make sure you have the right headers packages installed, and grab the latest e1000e driver from [downloadmirror.intel.com] -- funny thing was, I had it all set up when I installed the headers, and it Just Worked as part of the post-install hook. I did an update-initramfs -k all -u just in case, though.
  • Finally, this specific Sandybridge Maverick setup ended up with a horribly slow Xorg; turns out it's trivial to get it running fast by using the PPAs suggested in [ubuntuforums.org]
27.06.2011 A RAID Tale
23.06.2011 An ARM Machine List
22.06.2011 Eating those leftovers
  • Started the day by noticing the backup failed because the disk was full. Turns out I was a) backing up a bind-mounted /proc and b) had an extra copy of the root directory backup. Cleaning these up via rdiff-backup --check-destination-dir.
  • Updating Chromium to the latest beta fixed the hang I was seeing -- nice when it's easy to fix something like that!
  • The GC loader worked, but I needed to update /etc/group in the diskless system to get it to read /dev/ttyUSB0.
  • Worked around the weird upstart hang by checking for statd running inside diskless-mount, and used the same approach to avoid having to do the ugly sleep 6s inside statd-start.conf.
  • My GF account is unblocked.
  • Note to self: just rename .conf file extensions to something else if you want to disable them.
  • The only weird thing is that I asked this machine to halt and it.. seems to be hanging; or at least, taking a real long time. Okay, it seems to be that initctl is telling portmap to shut down, but it isn't dying, or maybe it's not even getting that far. I change umountnfs.sh to do an emit --no-wait to work around this and move on.
  • If you're using nautilus on an NFS server, for instance, looking at your home directory, you may find it grinds to a horrible halt. The problem is related to apparmor: [bugs.launchpad.net] -- the workaround is pretty simple, adding the nameservice stanza to evince-thumbnailer.
  • References you need when doing this sort of work: [nfs.sourceforge.net] [upstart.ubuntu.com] [upstart.ubuntu.com]
  • Finally, the command you want when you want all your UUIDs neatly presented: blkid(8)!
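  • The "rename the .conf extension" note from this entry, as a tiny sketch. The ".disabled" suffix and the function name are arbitrary choices for illustration; any name that no longer ends in .conf should do.

```python
import os


def disable_job(conf_path, suffix=".disabled"):
    """Rename foo.conf to foo.conf.disabled so it no longer ends in .conf."""
    if not conf_path.endswith(".conf"):
        raise ValueError("not a .conf file: %s" % conf_path)
    new_path = conf_path + suffix
    os.rename(conf_path, new_path)
    return new_path
```

    Renaming the file back re-enables the job.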
21.06.2011 Natty boot leftovers
  • Sometimes, I'm getting a weird hang at bootup that says:
     IP-Config: no response after 3 secs - giving up
     IP-Config: eth0 hardware address XXX mtu 1500 DHCP RARP
    Online reference: [bugs.debian.org]
  • I've just noticed that cups and ssh get stuck in "starting" states; wonder what's causing that. SSH at least starts up okay at first, though if I stop and then start it, it hangs. I thought it had to do with both being "respawn" jobs, but acpid is as well and I can start and stop it without issues. Does this have to do with the /state stuff I made wait on both those services... or are they just being very very slow?
  • I wasn't logging stuff properly; turns out my rsyslog.d directory didn't have a proper 50-default.conf file, and nothing was getting logged out. But it is also being affected by this weird state blockage I am pointing out above.
  • Chromium won't load any pages. Gah.
  • Need to check if the PowerTap loader in GC still works.
  • Need to check if my GF account is unblocked.
  • rsyslog whines that xconsole is missing, but runs fine; Ubuntu bug: [bugs.launchpad.net]
  • The backup failed because the external drive ran out of space. Guess it's time I went to bed :-/
20.06.2011 Updating Root NFS to Natty
  • I'm in the process of updating our default root filesystem to Natty. I started out a bit stuck on this because our previous filesystems used modified bzimages (XXX: which tool?) to indicate we were booting from NFS; reading through [help.ubuntu.com], it turns out I just needed to update initramfs.conf to indicate BOOT=nfs and regenerate the initramfs. The server IP address and root-path are provided through DHCP, and this change AIUI tells the initramfs to look there.
  • Now, /dev/shm was coming up with the wrong permissions -- it's meant to be 1777, but wasn't. I can't quite figure out why this was happening, but it has to do with some leftover in /etc/init because I started from a clean slate and it's not happening now.
  • Next, when we boot NFS we get hung in upstart; if I use the alt-sysrq key to kill everything I notice that a) no portmapper is running and b) neither is statd. If I try and run the portmapper manually, it just exits without telling me what's going on. An strace tells me that a) /var/run/portmap.pid can't be written to (sure, the filesystem is read-only) and b) /dev/log isn't available.
  • So first, let's see if we can mount that filesystem read-write up front. I try this first by putting an entry in the rootpath being served up by dhcpd, but I get nfs-premount complaints suggesting that it doesn't like the ",rw" suffix. Instead, I put an "rw" into the gPXE script commandline and it seems to work.
  • To check, I created an upstart script that simply spawns bash:
     description "BASH for the power hungry"
     #   
     start on startup
     #   
     # Output to the console
     console output
     # Tell upstart to wait (see
     # [upstart.at] for more)
     task
     # Run the command
     exec bash
  • And indeed, I can now write to the root filesystem. Great. Now on to debugging why portmap and statd don't run when they should. I can't seem to wrap my head around the statd-mounting and statd interaction, so I'm trying to break it into pieces. Now, I expected the following script:
     start on mounting TYPE=nfs
     task
     exec start statd
     
    would block the NFS mount from going through until statd was running. But either start is asynchronous, or it doesn't block at all. I guess it's because statd takes a while to actually get going, and meanwhile the "mounting" event has completed and mountall has moved on. My workaround looks like this:
     start on mounting TYPE=nfs
     task
     console output
     normal exit 1 2
     # 
     script
         start statd
         # This apparently is necessary to ensure the statd run
         # completes; it's a hack but it seems to work more reliably than
         # anything else
         exec sleep 6s
     end script
    and so far it seems to work okay.
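  • A possible alternative to the fixed sleep, as a minimal Python sketch: poll until a process shows up, with a timeout, instead of always waiting 6s. The helper names (wait_until, process_running) and the process name rpc.statd are my assumptions, and a process merely existing doesn't prove it is ready to serve requests, so this only approximates what the hack above is after:

```python
import os
import time
from typing import Callable


def wait_until(ready: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.2) -> bool:
    """Poll ready() until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def process_running(name: str, proc_root: str = "/proc") -> bool:
    """Return True if any process under proc_root reports this command name."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                if fh.read().strip() == name:
                    return True
        except OSError:
            continue  # process went away while we were looking
    return False


if __name__ == "__main__":
    # "rpc.statd" is an assumed process name; adjust for your system.
    ok = wait_until(lambda: process_running("rpc.statd"), timeout=6.0)
    print("statd is up" if ok else "gave up waiting for statd")
```

    The timeout keeps the worst case no slower than the sleep, while the common case returns as soon as the process is visible.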
  • I then added in scripts, in the following order: rc-sysinit, udev-fallback-graphics, ssh, dbus and gdm. Reboot.
  • Worked. Added now rc and rcS.
  • Noticed I should have brought in mounted-varrun and the other mounted-* bits earlier. Done, though I question how useful mounted-tmp and mounted-dev are in this diskless setup.
  • Brought in a few more hopefully harmless bits: console-setup, dmesg, hwclock*, irqbalance, control-alt-del and module-init-tools.
  • Pulled in a slightly modified ypbind startup script and made gdm depend on it. The script comes from [bugs.launchpad.net] though I can't quite get it to work with the IFACE=!lo check that it uses; I just dropped that check which should work okay.
  • I dropped the mountall-net script which seems to be a hack to work around the lockd issue I think I worked around in a simpler (if slower) way.
  • Started using bzr to control the directory as it's a much better match than this ranting blog entry ;-)
  • I'm thankful for Johan's hint to radeon.modeset=0 on the kernel commandline which (in combination with disabling udev-fallback-graphics.conf) allows me to actually see what is being spewed in the log. I'd love to have upstart just log all the events to a file..
  • Overall, the main issues with upstart racing seem to be around the gap between the time the daemon starts up and the time it is actually ready to handle events, and to a lesser extent around the complexity of the state transitions themselves. In our case, while we hooked on mounting to start statd, running the statd.conf script instantly allowed mounting to proceed, which would fail because statd wasn't yet running (why 6s of waiting addresses that, though..) The same goes for NIS, which is running but not ready enough for GDM to actually show the list of users.
  • Spent a few minutes figuring out why file locking was broken (again). Turns out that /var/lib/nfs needs to have the sm and sm.bak directories mode 700 and writable by the user running statd. That user, incidentally, is taken from the owner of /var/lib/nfs, which was incorrectly set to syslog on this system (and there was actually no lockd user in the passwd file, oops).
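  • To catch the locking breakage earlier next time, here is a minimal Python sketch of the check described above: sm and sm.bak must exist, be mode 700, and share the owner of /var/lib/nfs. The function name statd_dir_problems is made up for illustration, and it only checks the conditions listed in this entry, nothing more:

```python
import os
import stat
from typing import List


def statd_dir_problems(base: str = "/var/lib/nfs") -> List[str]:
    """Return human-readable problems with the sm and sm.bak directories."""
    problems = []
    owner = os.stat(base).st_uid  # statd runs as the owner of this directory
    for name in ("sm", "sm.bak"):
        path = os.path.join(base, name)
        try:
            st = os.stat(path)
        except FileNotFoundError:
            problems.append(f"{path}: missing")
            continue
        if not stat.S_ISDIR(st.st_mode):
            problems.append(f"{path}: not a directory")
            continue
        mode = stat.S_IMODE(st.st_mode)
        if mode != 0o700:
            problems.append(f"{path}: mode {mode:o}, expected 700")
        if st.st_uid != owner:
            problems.append(f"{path}: owner uid {st.st_uid}, expected {owner}")
    return problems


if __name__ == "__main__":
    for line in statd_dir_problems():
        print(line)
```

    An empty list means the directories match what this entry says statd needs; it says nothing about whether the owner itself is the right user.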
  • It's unlikely postfix will actually work without /var/spool set up for it. But how do you get it set up initially? Easy -- just create /var/spool/postfix; the rest gets set up by postfix itself!
  • The diskless boxes write to /var/log early in the boot process; I've worked around this by mounting a tmpfs there and later on mounting directories under /state to handle that more gracefully. Looking at the tmpfs /var/log generated without /state, up to gdm running, so far what's written to it is:
     total 336
     drwxr-xr-x 2 root root     60 Jun 20 21:02 ConsoleKit
     -rw-r--r-- 1 root root 108599 Jun 20 21:02 Xorg.0.log
     drwxr-xr-x 2 root root     80 Jun 20 21:02 gdm
     -rw-r--r-- 1 root root    292 Jun 20 21:03 lastlog
     -rw-r--r-- 1 root root   1615 Jun 20 21:02 pm-powersave.log
     -rw-r--r-- 1 root root 208615 Jun 20 21:02 udev
    I've stocked this in 00early-log.contents files in the directory for later debugging.
02.06.2011 Bootable DOS USB Drives (for BIOS updates)
  • Have this annoying issue of having to update a BIOS, but not having a bootable DOS USB stick to actually run the update? Well, I did, and I spent a LONG time reading through various confusing blog posts until I stumbled upon one that gave me two important nuggets:
  • First, use FreeDOS
  • Second, use makebootfat (see [linux.die.net] for a manpage)
  • This actually translates into a very small number of steps. First, you download the fullcd FreeDOS image from [ftp.ibiblio.org] and then you do something like this (watch out for sdX below):
     # You'll need to change only this line
     DEVICE=/dev/sdX
     # Set up your DOS filesystem
     sudo mount -o loop fdfullcd.iso /mnt
     mkdir /tmp/dosboot
     cp /mnt/freedos/setup/odin/kernel.sys /tmp/dosboot
     cp /mnt/freedos/setup/odin/himem.exe /tmp/dosboot
     cp /mnt/freedos/setup/odin/command.com /tmp/dosboot
     cp /mnt/freedos/setup/odin/more.exe /tmp/dosboot
     # Set up a config.sys
     cat << __EOF__ > /tmp/dosboot/config.sys
     DEVICE=HIMEM.EXE
     LASTDRIVE=Z
     BUFFERS=20
     FILES=40
     DOS=HIGH,UMB
     DOSDATA=UMB
     SHELLHIGH=command.com /P
     __EOF__
     # Get the FAT boot sector
     cd /tmp
     unzip /mnt/freedos/packages/src_base/kernels.zip source/ukernel/boot/fat16.bin
     mv source/ukernel/boot/fat16.bin .
     # Do it
     sudo makebootfat -X -o $DEVICE -b /tmp/fat16.bin /tmp/dosboot/
     
  • I'm probably overdoing it by using HIMEM, and selecting only a few files from the odin/ directory; you might be able to avoid the config.sys entirely, and just use the whole /mnt/freedos/setup/odin/ as makebootfat's argument instead of the dosboot thing.
  • The -X is the only gotcha. I think you need to use it because you're using the fat16.bin file (for compatibility, maybe?)
  • It's really that easy. I have no idea why people complicate this so much. The only other post I've seen which uses this strategy is [notes.realitysedge.com] but it is still not enough of a cookbook for me. I suspect there's some issue with BIOS compatibility, and that my method doesn't work for all USB drives or computers, but it does work for me. For reference, the original blog post that hinted me on this is at [blog.realcomputerguy.com]
  • And wow, FreeDOS really does boot fast.
24.05.2011 Mutt, vim and auto-completion
  • I recently changed my mutt options to use autoedit, which is cool because it puts me in vim very quickly to reply to email, but not so cool in that typing in addresses becomes a lot harder. Well, today I spent an hour working up something that autocompletes the addresses when I type tab in those fields. Enjoy my quick hack here: [pastebin.ca]
22.05.2011 From 1TB to 2TB
  • Mari's computer needs lots of disk space because it's where she collects the photos and pictures she publishes on [www.marignatios.com] -- and photos are huge. This weekend I had to move the files from the existing dual-disk RAID-1 to a new pair of disks.
  • It's a long story, but to shorten it: a) I used rdiff-backup to back up the actual filesystem, except for the images b) unfortunately, the only external drive I had that was big enough to hold all her images was formatted as vfat -- I knew I was going to regret it c) I ended up just rsyncing the images to the vfat drive which worked okay because there were no permissions or ownership to care about d) I had to use a live-usb image to actually set up the new disks and copy the data across and finally e) I had a hard time getting grub to work, and I ended up stuck in the grub rescue> prompt once. When that happened, I found [help.ubuntu.com] to be invaluable, so if you ever run into that prompt yourself and feel lost, just read that page. Once I had booted into the system once, grub-install and update-grub fixed it permanently. Two big hammers, but they fix things just like the old /sbin/lilo did ;-)
17.05.2011 Can't Mailman and LinkedIn just be friends?
  • We run a number of mailing lists at Async; quite a few of them are related to Stoq, our made-for-Brazil point-of-sales-and-everything-else management system. The lists are busy with lots of users that subscribe to ask about features and workflows and it is always cool to see the interactions there. However, there's one thing which really drives me nuts, and that is that because of web-based email integration, LinkedIn thinks it's cool to send email to our mailing lists. Well, it's not cool, but mysteriously, Mailman doesn't block the emails either!
  • The reason this happens is a subtle Mailman behaviour that I suspected yesterday but which Barry Warsaw confirmed today: by default Mailman also looks at the Reply-To header to check whether the sender is subscribed and therefore allowed to send mail to the list. The email we got had headers like this:
     From: Foo Bar via LinkedIn <member@linkedin.com>
     To: Bar Baz <stoq-users@async.com.br>
     Message-ID:
     <1494000622.13827915.1305593008105.JavaMail.app@ela4-bed35.prod>
     MIME-Version: 1.0
     X-LinkedIn-Template: invite_member_23
     X-LinkedIn-Class: INVITE-MBR
     X-LinkedIn-fbl: s-qakeuW-Xh7nGNqQ4F7rGOINKZVY7HNzQuIeYRlX2tnWAO4zNKkm
     Subject: [Stoq-users] Foo Bar quer manter contato no LinkedIn
     X-BeenThere: stoq-users@www.async.com.br
     X-Mailman-Version: 2.1.13
     Precedence: list
     Reply-To: Foo Bar <foo-bar-is-subscribed-to-list@hotmail.com>
      
     [...]
  • The email member@linkedin.com is actually in a discard_these_nonmembers configuration rule for the stoq-users mailing list, so it should be getting discarded. But because the Reply-To address is of a list subscriber, Mailman thinks that the email is truly being sent by the subscriber, and not by a proxy like LinkedIn. It happily ignores other sender filters and delivers the spam. Ouch!
  • To fix this, you need to update /etc/mailman/mm_cfg.py and set the SENDER_HEADERS variable; the default value (in /usr/lib/mailman/Mailman/Defaults.py) is:
    SENDER_HEADERS = ('from', None, 'reply-to', 'sender')
    I ended up using simply:
    SENDER_HEADERS = ('from', None)
    and then restarted Mailman. And now I'm waiting to see what my moderation queue looks like -- hopefully it will prove that the change worked! Barry tells me that in Mailman 3 this behaviour is clearer, and that they also have a debug mode planned which would allow us to send a probe email to find out why Mailman is doing what it does. But for now, problem solved!
  • (Launchpad, btw, also does this impersonation trick in order for its bug mail interface to work -- but because of how accounts are set up we don't really make it easy for you to send unwanted mail to a mailing list)
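  • To make the Reply-To effect concrete, here is a simplified Python model of the behaviour described above. It is not Mailman's code: candidate_senders and looks_like_member are hypothetical helpers, the envelope sender address is invented, and I'm assuming (as I understand the setting) that None in SENDER_HEADERS stands for the envelope sender:

```python
from email import message_from_string
from email.utils import getaddresses

RAW = """\
From: Foo Bar via LinkedIn <member@linkedin.com>
To: Bar Baz <stoq-users@async.com.br>
Reply-To: Foo Bar <foo-bar-is-subscribed-to-list@hotmail.com>
Subject: Foo Bar quer manter contato no LinkedIn

[...]
"""


def candidate_senders(msg, sender_headers, envelope_sender=None):
    """Collect lowercased addresses in sender_headers order (None = envelope)."""
    found = []
    for header in sender_headers:
        if header is None:
            addrs = [envelope_sender] if envelope_sender else []
        else:
            addrs = [addr for _name, addr in getaddresses(msg.get_all(header, []))]
        for addr in addrs:
            addr = addr.lower()
            if addr and addr not in found:
                found.append(addr)
    return found


def looks_like_member(msg, sender_headers, members, envelope_sender=None):
    """True if any candidate sender address is a list member."""
    return any(addr in members
               for addr in candidate_senders(msg, sender_headers, envelope_sender))


msg = message_from_string(RAW)
members = {"foo-bar-is-subscribed-to-list@hotmail.com"}
envelope = "bounces@linkedin.example"  # hypothetical envelope sender

print(looks_like_member(msg, ("from", None, "reply-to", "sender"), members, envelope))
print(looks_like_member(msg, ("from", None), members, envelope))
```

    With the default tuple the subscriber's Reply-To address is among the candidates, so the message passes as coming from a member; with ('from', None) only the LinkedIn addresses are considered.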
16.05.2011 shirt sizes, aggregate totals and SUMIF
  • I produced a spreadsheet in Google Docs today that was a pretty simple mapping of name and shirt sizes to quantities; something like this:
     foo S 3 
     bar M 5
     baz S 2
     poo L 1
  • I wanted to include an aggregate sum of each individual size; something like:
     S 5
     M 5
     L 1
  • It turns out that the SUMIF function is what I want. You just need to get the syntax right; I used, for each of the sizes, a cell like this:
    =SUMIF(B2:B31, "=S",C2:C31)
    That formula will check column B for cells that contain the string "S"; where they do, it will sum the numbers in the corresponding row of column C, which is exactly what I wanted.
  • There might be a way to do this without having to actually code a cell for each total. But finding out how eluded me in the five minutes I had for this task, so SUMIF remains as my favorite solution for today.
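  • Outside the spreadsheet, the same aggregate is a few lines of Python. This sketch uses the sample rows from above; totals_by_size is just a name I picked:

```python
from collections import defaultdict

# (name, size, quantity) rows, copied from the sample table in this entry
ROWS = [
    ("foo", "S", 3),
    ("bar", "M", 5),
    ("baz", "S", 2),
    ("poo", "L", 1),
]


def totals_by_size(rows):
    """Sum quantities per shirt size, like one SUMIF cell per size."""
    totals = defaultdict(int)
    for _name, size, quantity in rows:
        totals[size] += quantity
    return dict(totals)


if __name__ == "__main__":
    for size, total in totals_by_size(ROWS).items():
        print(size, total)
```

    Unlike the one-cell-per-size approach, this picks up whatever sizes appear in the data without listing them in advance.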
09.04.2011 MSN on Pidgin on Maverick
  • On Mari's computer, Pidgin doesn't like MSN anymore; it reports
     1 account was disabled because you signed on from another location
  • I've seen this mentioned in a few places, but nowhere as a big deal for Maverick users: [developer.pidgin.im] [trac.adium.im] [permalink.gmane.org] [pidgin.im] [bugs.launchpad.net]
  • I /think/ the problem is just that Maverick's Pidgin is old; as has happened before, the MSN protocol was updated and the implementation wasn't. At least upgrading pidgin to the version packaged in their release PPA (see [www.pidgin.im] for details) was enough to solve the problem permanently.
08.04.2011 Path MTU Discovery mysteries
  • Again, facing P-MTU-D issues on my secondary outbound interfaces, and I don't think it's the fault of the upstream provider (well, I tried with both, and maybe they are both broken). Symptoms are the usual large-transfers-get-packets-dropped-silently-when-incoming.
  • It could be a network card issue, because I'm seeing errors only on that interface:
     RX packets:3058114 errors:14691 dropped:0 overruns:0 frame:14691
  • And I'm fascinated by the reply Stephen Hemminger gives at [www.spinics.net]
  • But for now the workaround is to use the good ole MSS clamping cheat covered at [lartc.org] (how did that disappear from my iptables rules, though..)
  • For other possible problems, look at [prowiki.isc.upenn.edu]
  • In updating the subnet entries, I cheated on the boolean math and used [jodies.de] to calculate the masks and addresses, which is pretty neat.
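  • For the mask arithmetic, Python's standard ipaddress module does the same job as the online calculator. A minimal sketch; the describe helper and the example address are mine, not taken from my actual network:

```python
import ipaddress


def describe(cidr):
    """Return network, netmask, broadcast and usable host count for a CIDR."""
    # strict=False lets us pass a host address and get its enclosing network
    net = ipaddress.ip_network(cidr, strict=False)
    return {
        "network": str(net.network_address),
        "netmask": str(net.netmask),
        "broadcast": str(net.broadcast_address),
        # subtracting network and broadcast only makes sense up to a /30
        "hosts": net.num_addresses - 2,
    }


if __name__ == "__main__":
    # 192.168.10.77/26 is an arbitrary example address
    for key, value in describe("192.168.10.77/26").items():
        print(key, value)
```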
06.04.2011 Building GoldenCheetah 3.0
  • Midnight project. Trying to build GoldenCheetah's 3.0 branch requires lots of package scavenging, installation and some makefile hackery. But when it does build:
     kiko@gasolinux:~/GoldenCheetah/src$ ./GoldenCheetah 
     Cannot open qollector_interpret program, available from
     [opensource.quarq.us]
     QMetaProperty::read: Unable to handle unregistered datatype 'RideItem*'
     for property 'RideMetadata::ride'
     QFileSystemWatcher::addPaths: list is empty
     Segmentation fault
  • Sigh.
  • A make clean and make later, it seems to kinda work though! I am getting weird results on the graphs for imported rides, so I need to test next by downloading from the actual PT head to see what I think.
15.10.2010 Full CUPS and an empty lsusb
  • Mari's printer won't print. It's not the first time this has happened, but I keep forgetting what causes it. The symptom is simple: you ask the print dialog to print, and nothing happens. The printer properties screen says "/usr/lib/cups/backend/hp failed". The CUPS error_log says
     D [21/Nov/2010:12:30:49 -0200] [Job 43] prnt/backend/hp.c 745: ERROR:
     open device failed stat=12: hp:/usb/PSC_1400_series?serial=BR64H3G1K704BM
    And the final, odd hint is that lsusb just returns you to a shell prompt with no output. Do you know what the problem is?
  • It's related to working around a PowerAgent bug I wrote about a few months ago. To work around a Java library's fixation on /dev/usb I added a link from /dev/bus/usb to /dev/usb, and forgot it there. What happens then is funny: for some reason when plugging in the printer (or returning from suspend, it turns out) udev wants there to be a /dev/usb/lp0 entry, and since /dev/usb is linked to /dev/bus/usb, it ends up creating the lp0 node in /dev/bus/usb/lp0 which in turn causes lsusb to break, since it doesn't expect to find any device nodes on the first level of /dev/bus/usb. Delete the /dev/usb link and you're back in business. Pity it took me 20 minutes today to remember this!
  • PS: the same sort of error hit us on Anthem, our server, a few weeks later -- the hp.c backend complaining about open-device-failed. It ended up also being an issue with USB connectivity -- the USB hub we're plugging the printer into is just flaky.
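  • So that I don't spend another 20 minutes on this, a minimal Python sketch of the two checks that would have pointed at the cause: is /dev/usb a symlink to /dev/bus/usb, and are there non-directory entries (like lp0) at the top level of /dev/bus/usb? The function names are mine, and the assumption that the top level should only contain directories comes from what I observed above:

```python
import os


def stray_usb_entries(bus_root="/dev/bus/usb"):
    """Return top-level entries of bus_root that are not directories."""
    return sorted(
        name for name in os.listdir(bus_root)
        if not os.path.isdir(os.path.join(bus_root, name))
    )


def is_symlink_to(link, target):
    """True if `link` is a symlink that resolves to the same place as `target`."""
    return os.path.islink(link) and os.path.realpath(link) == os.path.realpath(target)


if __name__ == "__main__":
    if is_symlink_to("/dev/usb", "/dev/bus/usb"):
        print("/dev/usb is a symlink to /dev/bus/usb -- consider removing it")
    if os.path.isdir("/dev/bus/usb"):
        for name in stray_usb_entries():
            print("unexpected node in /dev/bus/usb:", name)
```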
14.10.2010 For a rainy Saturday
13.10.2010 The keyboard layout that won't go away
  • Is it happening to you too? Ubuntu won't let me get rid of an incorrect keyboard layout, and in fact defaults to it!
  • The problem is pretty complicated, but it's related to GDM, .dmrc and /var/cache/gdm/*/dmrc files. Or at least I've figured this from the various places it's been reported: [bugs.launchpad.net] [bugzilla.redhat.com] [ubuntuforums.org] [bugs.launchpad.net]
  • I deleted .dmrc, /var/cache/gdm/kiko/dmrc, logged out, logged back in and the problem is gone. But I think there's definitely a bug in there..
12.10.2010 Anthem now 64-bit
  • After a whole day of work Anthem is now Maverick 64-bit installed on a simpler set of RAID-10s. Issues that we hit so far:
  • [bugs.edge.launchpad.net]
  • [bugs.edge.launchpad.net]
  • A weird long hang of sync and dpkg when under heavy I/O load that might be a tuning issue or something else
  • No /etc/init.d/iptables to restore my firewall rules, but iptables-persistent saved the day.
  • Cold bootup via NFS to GDM screen in 55s; not bad IMO! However, I'm still stuck with a /var/spool filesystem that still won't mount at boot, though in recovery mode a mount -a fixes it. Mystery!
  • Note to self: /proc/sys/kernel/domainname can't be set to anything but the correct NIS domain, as ypbind will always use it (and not read files or anything else to get it right)
09.10.2010 From Jaunty to Maverick in many painful steps
  • Upgrading our serveraxis-hosted server from an ancient Jaunty to Maverick has proven to be trickier than I thought. Here's the list of problems:
  • The kernel we boot off is a Xen guest kernel and it's hosted outside of the image, so when I upgrade to Karmic mountall starts failing all over because the kernel is too old: [bugs.edge.launchpad.net] [bugs.edge.launchpad.net] [blog.webangel.ie]
  • Once we boot we get a recovery console. I manage to log in by using serveraxis' excellent webconsole, but getting paste to work there requires tricking Firefox into letting JS apps access the clipboard: [kb.mozillazine.org] -- in particular you need to use the about:config trick for granting access to signed scripts.
  • Once there it's possible to bring network interfaces up and even to start sshd, but logging in via ssh doesn't work because there's no /dev/pts, which you can easily fix by just mounting it: [www.linuxquestions.org]
  • I decided to contact serveraxis support about the kernel and continue recklessly upgrading with a non-booting system in the hope the kernel can get sorted out separately; apparently I'm not the only one and they've made at least one customer happy: [jenniandjordan.com]
  • When upgrading to Lucid, the installation fails to install the Lucid version of mountall -- again, because the kernel is too old, though this time it's because it triggers a tar bug: [bugs.edge.launchpad.net] [bugs.edge.launchpad.net]
  • I managed to wget mountall and a new tar, but the new tar package doesn't install because of the tar bug. Not to worry: ar x tar_1.23-2_amd64.deb and grab the tar binary in the data.tar.gz, putting it in /sbin. Got the mountall package installed and apt-get -f install && apt-get -u dist-upgraded away, which brings us all the way to Lucid.
  • The reboot into Lucid was as eventful as the one to Karmic, but I'm now a pro at fixing it up so I can actually upgrade fine.
  • The server has been mysteriously halted as I was using it; either this is routine maintenance or somebody's doing something to the server for me! And indeed, here it is:
     kiko@dragon:~$ uname -a
     Linux dragon 2.6.35.4-2.pvops #0 SMP Mon Sep 20 18:32:22 CDT 2010 x86_64 GNU/Linux
    ServerAxis absolutely rock -- they are actually a bit scary even!
  • Finally, until the reboot happened I was getting for a brief moment a compatibility issue with dircolors that is best explained by this bug: [bugs.debian.org]
  • Back to where I started: getting bitlbee-plugin-skype working on Dragon so I can use it to talk to people without proper chat clients. Pulled and installed the two packages from sid: [packages.debian.org] [packages.debian.org]
  • Then installed a vnc4 server and client and ran skype manually on the server; after some rock and rolling managed to get it to log in and run, and the skyped configuration (once you understand it) isn't that bad in fact. The thing which I wasn't clear about is that a) you need to have skype running and b) running skyped is what will cause you to be prompted to allow Skype4Py to access skype. But now it's all clear!
  • Once the installation and skype were happy through xvnc, I installed xvfb and just ran skype and skyped with the display set appropriately; it seems to just work which is nice after a whole afternoon of things that just didn't ;-)
  • Found out this little command
     account set skype/display_name "John Smith" 
    That seems to work with MSN as well. Interested in knowing what it does (and if it's persistent across connections and Bitlbee restarts..)
09.09.2010 Memory and Amnesia
  • Upgrading a desktop to 4GB. I can't really believe that somebody can actually need four gigs of RAM on a desktop, but Firefox and OpenOffice keep surprising me as their appetite for memory grows! Anyway, the important thing to know about when doing this upgrade is that:
  • You really want a 64-bit installation if you are using more than 4GB of memory; you can use PAE if you want to stick with 32-bit, but I'm not entirely sure it's worth it.
  • Low-end motherboards, including those using the Intel i945 chipset, don't really support much more than 3 gigs of ram; they have I/O addresses within that address space, and they lack memory remapping functionality to make it work. [en.wikipedia.org] has the skinny, and [www.linuxquestions.org] discusses the actual i945 chipset. There's a blog post at [blogs.msdn.com] that presents it nicely.
  • Meanwhile, Frictional Games released a pretty amazing game called Amnesia -- and they have a version for Linux! Spend your US$20 wisely here: [www.amnesiagame.com]
25.08.2010 unclutter hurts
  • I have two annoying things bothering me in Lucid. First .xsession-errors fills up to gig-size every other day with some weird
     "Window manager warning: Got a request to focus 0x2821bae (Terminal)
     with a timestamp of 0. This shouldn't happen!"
    messages. Second, if my mouse is highlighting a window and I alt-tab to another one, every other second or so the focus would shift to the original window. That essentially drove me CRAZY. So today, after having to delete the file for the Nth time because it was at 2GB, I found out that the culprit is UNCLUTTER!! [bugs.edge.launchpad.net] has the scoop, but if you want a recommendation from me, it's "apt-get remove --purge unclutter" for now. I think it's just buggy -- the idea is a nice one, though I'm not sure I want an extra daemon just for that functionality -- in particular because it seems that the mouse pointer blanks over gnome-terminal when I'm typing anyway.
  • Bonus Lucid hint for the day: if your sun java plugin is installed but your firefox or chromium don't see it, try Johan's magic combo:
     update-java-alternatives -l
     update-java-alternatives -s java-6-sun
06.08.2010 A Logitech DiNovo Edge dongle's embedded and HCI modes
  • I have a DiNovo edge and think it is absolutely fantastic, except when it stopped working on my way to my Lucid upgrade. What happened? The problem is that to Windows users the dongle normally works in embedded mode, meaning it appears to the system as a regular USB keyboard and mouse. Many Windows users hate that behaviour, and would rather the dongle behaved as a regular bluetooth adapter. However, Linux users have the flexibility of choosing in which mode they want the adapter to work, and the magic of selecting them is done through hid2hci, which is run by udev when the device is plugged in. Now it used to be that you could enable or disable hid2hci by fumbling in /etc/default/bluetooth; with the inexorable move to udev, however, on Lucid this behaviour is now hardcoded in udev rules and by default hid2hci is run, which means the keyboard only works if you perform a bluetooth pairing exercise. And since it's a keyboard, that exercise can prove pretty challenging!
  • The command which is run when the BT dongle is plugged in is pretty simple:
     /lib/udev/hid2hci --method=logitech-hid \
         --devpath=/devices/pci0000:00/0000:00:1d.2/usb4/4-2/4-2.3/4-2.3:1.0/usb/hiddev0
    Note the path points to the hiddev0 directory for the device in question. Note also that it seems you can't revert back to embedded mode once the command is run, for some reason.
  • People that are running into this problem on Lucid should know that it's possible to have the dongle still work in embedded mode as long as the hid2hci call in /lib/udev/rules.d/70-hid2hci.rules for the keyboard isn't run. When embedded mode is active, an lsusb run will list 3 devices like this:
       Bus 004 Device 017: ID 046d:c714 Logitech, Inc. 
       Bus 004 Device 016: ID 046d:c713 Logitech, Inc. 
       Bus 004 Device 015: ID 046d:0b04 Logitech, Inc.
    When you issue the hid2hci run, a 4th device appears, representing the mini-receiver in HCI mode:
       Bus 004 Device 024: ID 046d:c709 Logitech, Inc. BT Mini-Receiver (HCI mode)
       Bus 004 Device 023: ID 046d:c714 Logitech, Inc. 
       Bus 004 Device 022: ID 046d:c713 Logitech, Inc. 
       Bus 004 Device 021: ID 046d:0b04 Logitech, Inc.
    In this mode, you'll need to connect through bluetooth. Note that I haven't found that works reliably on my powerpc mac mini, though it seemed to on a little netbook. The simple workaround seems to be to avoid bluetooth mode by commenting that udev rule out, and keep an eye out for changes in that area when you upgrade.
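  • A quick way to tell the two modes apart from a script is to look for the 046d:c709 receiver in the lsusb output. This is a minimal Python sketch based only on the IDs listed above; the function names and the "absent" label are mine, and other Logitech dongles may well use different IDs:

```python
import re

# Matches lines like "Bus 004 Device 017: ID 046d:c714 Logitech, Inc."
LSUSB_LINE = re.compile(r"Bus \d{3} Device \d{3}: ID ([0-9a-f]{4}):([0-9a-f]{4})")

HCI_RECEIVER = ("046d", "c709")                       # only shows up in HCI mode
EMBEDDED_DEVICES = {("046d", "c713"), ("046d", "c714")}

EMBEDDED_SAMPLE = """\
Bus 004 Device 017: ID 046d:c714 Logitech, Inc.
Bus 004 Device 016: ID 046d:c713 Logitech, Inc.
Bus 004 Device 015: ID 046d:0b04 Logitech, Inc.
"""

HCI_SAMPLE = """\
Bus 004 Device 024: ID 046d:c709 Logitech, Inc. BT Mini-Receiver (HCI mode)
Bus 004 Device 023: ID 046d:c714 Logitech, Inc.
Bus 004 Device 022: ID 046d:c713 Logitech, Inc.
Bus 004 Device 021: ID 046d:0b04 Logitech, Inc.
"""


def usb_ids(lsusb_output):
    """Extract (vendor, product) ID pairs from lsusb text."""
    return [m.groups() for m in LSUSB_LINE.finditer(lsusb_output)]


def dongle_mode(lsusb_output):
    """Classify the dongle as 'hci', 'embedded' or 'absent'."""
    ids = set(usb_ids(lsusb_output))
    if HCI_RECEIVER in ids:
        return "hci"
    if ids & EMBEDDED_DEVICES:
        return "embedded"
    return "absent"


if __name__ == "__main__":
    print(dongle_mode(EMBEDDED_SAMPLE))
    print(dongle_mode(HCI_SAMPLE))
```

    In real use you would feed it the output of lsusb instead of the pasted samples.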
  • Embedded mode is kinda fascinating; from the OS's perspective it's just a keyboard and mouse that are plugged into a hub; there's no bluetooth anything. When you plug the dongle in, you get the usual dmesg spew of USB information. Here's what dmesg is actually telling you:
  • A USB hub is found; 3 ports detected (of which one is the hub, device 046d:0b04)
  • A keyboard (046d:c713) is found, though the device string identifies it as a "Logitech Logitech BT Mini-Receiver"
  • A mouse (046d:c714) is found, with the same device string as above.
  • You can play with the dongle a bit when it is in HCI mode, by the way, with hcitool; here's some info on Johan's macbook, and on my Nokia e51.
     kiko@gasolinux:/lib/udev/rules.d$ hcitool dev
     Devices:
         hci0    00:07:61:E3:38:E0
     kiko@gasolinux:/lib/udev/rules.d$ hcitool inq
     Inquiring ...
         00:23:12:39:44:1F   clock offset: 0x1b9d    class: 0x38010c
         00:22:FC:4D:65:47   clock offset: 0x618c    class: 0x5a020c
     kiko@gasolinux:/lib/udev/rules.d$ sudo hcitool info 00:23:12:39:44:1F
     Requesting information ...
         BD