Archive for the ‘Fedora’ Category

On systemd adoption

Tuesday, August 24th, 2010

Recent discussion on Fedora devel about systemd got my pulse racing, so I thought I’d share a few thoughts here. This is all my own personal opinion and is not an attack on Lennart. He’s a great guy who takes on tough problems and is at most guilty of being overly optimistic that problems won’t arise in the adoption of new technology.

Firstly, let me note that the systemd idea is a good one, in the longer term. Other operating systems have tried this approach of starting services only when an attempt is made to talk to them (perhaps the closest obvious example is OSX), and there is a lot of merit to it as an automatic, dependency-driven way of avoiding cruft, speeding up boot times, etc. Now I should note that none of you really actually care about boot times (you just think you do), because you suspend/resume for daily use, and if you’re doing hard-core plumbing or kernel work then you don’t have a lot of other cruft installed in the first place. So let’s just throw that “but but but…boot times!” out of the window and admit to ourselves that it’s just for fun. Good. Now, the fundamental approach behind systemd is still a good one, or others wouldn’t have tried this kind of thing in the past.
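
To make the socket-activation idea concrete, here is a minimal sketch in today’s systemd unit syntax (the “foo” service name, port, and binary path are hypothetical, and the details of the syntax have evolved since 2010):

# foo.socket – systemd owns the listening socket, and starts the
# service the first time something actually connects to it
[Socket]
ListenStream=1234

[Install]
WantedBy=sockets.target

# foo.service – never started at boot, only on demand
[Service]
ExecStart=/usr/sbin/food

The win is that foo need not run at all until its first client shows up, and ordering largely takes care of itself because the socket exists from early boot.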

Secondly, Linux is not OSX. Scott had a tough enough time with upstart already (and wouldn’t it be nice if we could all just get along…but that’s another topic); things take time to change. Attempting to adopt a new technology late in a release just because it’s shiny and useful to desktop users is, in my personal opinion, unwise. The existing boot process works well – it is not perfection, but then there are many other more pressing issues – and can certainly see us through another release of Fedora while systemd stabilizes, commitments to interfaces are thrashed out, and we figure out how to agree on stuff like what “noauto” even means anyway. You’d think that was standardized, but perhaps not. And if it’s not, that just shows how this can of worms opens up a ton of other stuff to be poking at at the same time.

But maybe I’m just hating on this because of my PulseAudio experience? Not at all. It’s true that I find PA frustrating, and none of the “killer app” features over raw ALSA have yet sold me on it. Playing to my “Air Tunes” device (RAOP) is skippy, PA grabs the device and won’t let go until it is killed, playing from virtual machines is buggy, getting sound to come out of the right speakers is a challenge at the best of times (and the GNOME mixer app has progressively become so dumbed down that it’s no longer useful to me), and I haven’t yet had the courage to try the Bluetooth speaker support. But these aren’t personal failings of Lennart. He’s just a talented guy who likes to take on tough projects that are complex and have a lot of dependencies. And sometimes there are dependencies on kernel changes, or other stuff that is tough for one guy to get done. I’m not hating on systemd; I’m saying that very talented people are still faced with the reality of glitches, bugs, and user adoption.

It comes down to this: most users would prefer to have reliable sound over the best bells and whistles, and most of those same users (and I mean other than the handful who discuss stuff regularly on the Fedora “devel” list) would similarly prefer no init glitches over a technically better init. PulseAudio has matured a lot. It’s still not perfect, but I generally let it stay installed these days (whereas I used to remove it immediately), except when I’m trying to do audio recording. I would say that systemd will mature nicely over the next 6 months or so at the present rate, at which point it can be considered for inclusion in Fedora as the optional or default init. But to include it now (in F14) is really – in my opinion – an exercise in forcing users to do testing without giving them a choice to opt into it. Let the experts try it first.

Jon.

The risk of upgrades

Saturday, August 21st, 2010

Sometimes it seems like we’re living in a world of two kinds of Open Source. On the one hand, we have those who like to run unstable/rawhide types of systems, with the latest kernels, and who feel that anything older than ten minutes is still in the stone age. These people are usually paid to work on such things, have a lot of free time not spent doing other stuff, etc. The very notion that you might not upgrade every system the moment – nay, the femtosecond – that the new version is out obviously means you’re not cool enough for school, even if what you have is working well enough for you right now. That means the desktop on which you only surf the web – not the system you actually do kernel compiles on – must be running the latest possible stuff released 2 hours ago, “just because” (insert no useful reason here).

On the other hand, we have “Enterprise” types who install something to solve a problem, and then have to pay real, tangible money to upgrade or change it. Testing costs money and takes time (and I mean real testing, not the throw-it-over-the-wall-and-hope kind – but hey, it’s new, so it must be better and worth the breakage, right?), and if it ain’t broke, why fix it? Seriously. If an older version of a Linux distribution with an older kernel works for you, and you can still get essential security fixes, then great. More power to you. This is where Open Source should be offering a compelling choice not to upgrade, if you don’t want to. Incidentally, that’s the reason you don’t see updates for lots of older embedded gadgets – as I pointed out at Linux Symposium when explaining how the average (non-geek) consumer doesn’t necessarily “need” Android 2.2 the moment the first beta is built. Doing an OTA upgrade for something that already works introduces the risk of bricking many units and incurring cost. That doesn’t mean it’s not fun to upgrade your phone to the latest Android test build, or that some manufacturers won’t choose to do it as a value-add, but it does mean it’s something you can do on your own dime. If it breaks because they upgraded it for you, they pay. If it breaks because you couldn’t leave it alone, it’s your fault. Keep both pieces.

I used to fall more into the camp of wanting the latest and greatest on every machine. Back when I enjoyed spending a weekend configuring APS to make my printer work with just the most ultra-pointlessly geeky layout for text files sent to lpr, I enjoyed re-installing, upgrading, and generally playing around. After all, this was before I finally realized there is more to life than computers all the time. Over time, like many others, I grew up and realized that some things which work can appropriately be left alone without the universe exploding. Sure, we don’t want 40-year-old unmaintainable software disasters driving our government infrastructure of tomorrow, but there’s a happy medium somewhere in there too. I hope that, as the Open Source community matures, more people will come to appreciate this fact. By all means develop using the latest and greatest, but spare a thought for accepting that not everyone out there in the user community is as excited about spending a day or a weekend doing an upgrade unless or until they have to.

Jon.

Intel/AMD CPU catchup

Sunday, March 28th, 2010

So I decided, roughly a decade too late, that the POWER/PowerPC/SPARC RISC fanboy in me probably needed to (reluctantly) accept that x86 won the war, which means I’ve been brushing up on my x86-64 assembler (by poking in head_*.S, which is correctly written in AT&T syntax, and not that other syntax I shall not name) and trying to catch up on what the heck Intel and AMD are working on in terms of roadmaps, etc. I now know how REX prefixes work on “Intel 64”, for example. And I’ve read the recent changes to the ABI (Fortran support included!). I shall poke at the recent SVM/nested page tables stuff sometime for fun. Oh, and I don’t care that I’m not an IA32 assembly guru; I shall focus on flat x86-64 and forget about last century’s segmentation and other ye olde bank-switching-inspired hacks.
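
If you want to see a REX prefix for yourself, one quick way (a sketch – the temp file path is just an example) is to assemble a 64-bit register move and pick the prefix out of the disassembly:

$ echo 'movq %rax, %rbx' | as -o /tmp/rex.o --
$ objdump -d /tmp/rex.o
   0:   48 89 c3    mov    %rax,%rbx

That leading 0x48 byte is REX.W, promoting the move to 64 bits; assemble the 32-bit “movl %eax, %ebx” instead and you get a bare 89 c3 with no prefix at all.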

This weekend, I’ve gone over all of the recent models and public announcements, read some IDF bits, and learned about Intel’s QPI (as opposed to the one I knew, AMD’s HT – QPI is basically trying to throw off the FSB, but it does some nice failover things that HT does not include, AFAIK). I’ve concluded that the model numbers used by these guys these days are way, way too confusing. Even more so than when I last really cared about this stuff: determining which “Xeon” has Intel-VT or AMD-V is a game of looking up lots of four-digit model numbers, when a simple naming formula that somehow referenced the microarchitecture in the model “name” would convey far more useful information. But none of this stuff is your grandfather’s x86. It’s every bit as capable (in x86-64, anyway) of taking on the big-endian arches I have always personally preferred.
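
Of course, when a box is actually in front of you, there’s no need for model-number archaeology to answer the virtualization question – the CPU flags give it away (vmx means Intel-VT, svm means AMD-V):

$ grep -E -o 'vmx|svm' /proc/cpuinfo | sort -u
vmx

No output here means the CPU doesn’t advertise hardware virtualization support at all.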

I expect to do a lot more to keep up with x86 development rather than letting my own personal academic fondness for cleaner ISAs limit my exposure. I’m thinking about getting another older Xeon build/test box for playing with x86 stuff and for speeding up kernel compiles at home – perhaps a used Dell PE1950 or Precision 490, as these have the best bang-for-buck ratio at the moment. What I would like to know, from anyone who bothered to read this far, is: where should I be going to get the very latest information on x86 developments? I’m on the k.o lists, and I am specifically not a game-playing weenie who cares about that stuff – I want to know about roadmaps, things like the new AES extensions, etc. I don’t care that the “whizzbang X1234 blah blah would look uber l33t in this plexiglass case I just bought on eBay”.

Jon.

On Fedora updates

Sunday, March 14th, 2010

There’s been a lot of talk recently about Fedora’s policy (or lack thereof) for shipping updates to existing stable releases. Rather than keep repeating the same points on the mailing lists ad nauseam, let me give my own $0.02 here. Keep in mind that these are my own personal opinions, and nobody else’s.

First off, I know in using Fedora that I am not using an “Enterprise” distribution that is intended to remain rock solid and stable for a long time without substantive changes. I’m OK with having to upgrade every 6 or 12 months, and I’m willing to deal with fixing breakage when that happens (though obviously in a perfect world the upgrade would be entirely seamless). What I am not OK with is updates shipping that cause any breakage or behavioral changes to my perfectly working system when I have not asked to perform a major upgrade. I expect that, if I update my laptop ten minutes before a meeting, it will still work exactly as it did before. I don’t want to have to delay applying updates – as I do now – for fear of the result.

Since time doesn’t stand still following a release, and bugs and regressions are found – and security issues are raised – a flow of updates to a “stable” release is both necessary and healthy for any distribution. But updates should be just that: updates. And also “necessary”. To me, an “update” is not shipping a major version bump of an existing piece of software, or replacing an entire stack (complete with all manner of behavioral changes – no matter how “small”: it does matter if that menu item moves around mid-release) after the release. That’s called a new release. Or rawhide. Or whatever. The point is that a release has to have some kind of meaning for it to even be worth having a release. Otherwise, you may as well just call it “Forever Rawhide”.

Now I’m not saying there can’t be flexibility. For example, I don’t personally care at all about KDE update frequency. I’m sure the people who work on it (many of whom I have met over the years) are very nice people, and I know they do good work. But I don’t use KDE (other than a few specific pieces of software, such as k3b), and I haven’t for years. So if KDE is updated ten times a day, I’m not even going to notice. I’d rather, for the sake of the users, have a consistent policy, but perhaps that stack could be excluded, since its maintainers are quite vocal about being able to make a lot of updates. I would rather an exemption were made for their specific stack than have the rest of the distribution follow the same rolling-update trend.

What I am going to notice, however, is if any of the critical path components that I rely upon is broken, has a behavioral change, or is updated needlessly often. Needless includes pulling in minor upstream bits that aren’t materially warranted by actual or likely bug reports. Those things are best done in rawhide, where they belong, and where those who are more than willing to test as they go will happily help shake out issues. I run rawhide myself too, but on dedicated real or virtual machines that are only for testing and not intended to be used or relied upon for daily work. Even in the case of rawhide, I think things should be at least reasonably tested on a standalone system (more than just compile-tested) before pushing, if they stand a chance of breaking something fundamental in the distribution.

Think about it this way. The Fedora development cycle is about 6 months. If you are a user who really, badly needs some major new feature, you might have to wait an average of 3 months for it. Even if that’s hardware enablement that makes the distribution otherwise inaccessible to you, I would rather that you have to wait 3 months for the new version (during which time you are free to try the pre-releases, alphas, betas, etc.) than ship an intrusive update that may negatively affect other users who already have working systems and can already make full use of what they have. It’s simply not worth inconveniencing existing users of a stable release for the possible benefit of those who are not already using it and can wait until next time.

Anyway. I think Fedora needs an update policy, and it needs a strong one. If you know me, you know I am far from a conservative guy, but I do think that stable Fedora updates should have a fairly conservative update policy for at least all critical path components. Those should never be updated unless necessary to fix specific bugs, and only in a fashion not likely to cause regressions for other users not affected by those bugs, or who rely upon specific behavior not to change (i.e. not a whole major version bump). The components not in the critical path can have more wiggle room if necessary, but I would still like to see far fewer updates in the stable releases.

Jon.

Remote kgdb target debugging via the Cyclades TS-3000 Terminal Server

Friday, February 12th, 2010

So I’ve been poking at Jason Wessel’s kgdb patches recently (specifically, the ones in kgdb-next – you do believe in kernel debuggers, right? Good). They came in very handy when trying to track down an obscure netfilter brokenness last week that was causing Fedora kernels to fall over reproducibly when running KVM. That particular issue was caused by libvirt’s namespace code that attempts to create additional network namespaces on startup, just to see if it’s possible (for optional containers support). After a very long weekend, I pointed out a number of bugs that got fixed. But it got me thinking about kgdb and being able to easily debug stuff that rolls over and plays dead.
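
For the uninitiated: hooking kgdb up to a serial console is mostly a matter of the kgdboc parameter. A sketch – your serial device and baud rate will vary:

# on the target's kernel command line: attach kgdb to ttyS0 and
# (optionally) hold early boot until the debugger connects
kgdboc=ttyS0,115200 kgdbwait

# or point kgdb at the port at runtime, on an already-booted target
echo ttyS0,115200 > /sys/module/kgdboc/parameters/kgdboc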

Traditionally, I have used a (somewhat loud, and therefore sometimes unfortunately annoying) PC attached to my debugging target via a serial crossover cable. Actually, it’s the inverse of the usual setup, in which said other PC would be the target for experimental test kernels, with my desktop generally not expected to fall over with kernel bugs (as it has been doing increasingly of late). In any case, it’s not optimal to leave that PC running, and I prefer it being used for evil test experiments. An opportunity to buy random crap on eBay presented itself in the form of an awesome Cyclades (now some other random company) terminal server. I bought a TS-3000 for $115, which is less than a tenth of what they used to go for retail. 48 ports of serial terminal server goodness for the home.

Photo: My Cyclades TS-3000 sitting atop an APC Masterswitch Plus

I was never very good at waiting for Santa. I was tracking this damned thing several times a day for the two days it was in transit. And when it arrived – shock! – it might not have the latest firmware! Quick! Time to fix that. I hadn’t even used it in anger before I managed to brick the thing with an update not intended for this model. Cursing myself, I figured I would just rescue it via TFTP. But that requires a special console cable (not quite the same as some others) in order to interrupt the standard boot. Obviously I had none of these cables, and all of the ones here were useless. And I wasn’t prepared to wait even ten minutes to order another one. So I went to Microcenter and bought two generic RJ45-DB9 converters that you can click together and wire yourself.

I followed a diagram online to make the RJ45-DB9 cable for the Cyclades – twice. But all of the posted diagrams were incorrect (this is nothing like a Cisco cable, no matter how many websites with the wrong pinouts claim that it is – and it really doesn’t help when Cyclades themselves write a manual with the wrong information in it…thanks a bunch!). Not to be discouraged, the soldering iron came out, and I rummaged around in a box of parts to find some serial connectors. Fortunately, I had a female DB9 and plenty of old, crappyish network cables. I soldered, desoldered, and resoldered this thing about 4 times before finding the correct Cyclades console cable pinout (ADB0036, female DB9), repeated below for the benefit of others who read this. Finally, I reflashed the unit with the same firmware it had had when it arrived (zImage_ts_140-3.bin) – the “new” firmware was only for specific other units, of which mine was not one; there is a newer “GPL” kit I will poke at sometime – and booted it up.

Photo: A homebrew Cyclades ADB0036 Cable

RJ45 pin    DB9 pin
1           8 (CTS)
2           1 (DCD) and 6 (DSR)
3           2 (RD)
4           5 (SGND)
5           7 (RTS)
6           3 (TD)
7           4 (DTR)
8           4 (DTR)
Figure: The correct pinout for a Cyclades ADB0036 console cable (RJ45 to Female DB9 connector)

Cyclades made good (fanless) hardware, but they were hardly the most adept at making configuration straightforward. Sure, you can configure the network easily (this one is called “morse” after the inventor – in the US – of the code used for telegraphs, an ancient precursor to the RS232 standard used on modern serial ports). But when it comes to the port setup, here is what you actually want to know: look for the “Socket SSH” option, set the ports to increment (e.g. from “1” – there’s no need to use the “7001” example, since you’re not directly sshing into the port anyway, as you would with telnet), and base them on a simple “CAS profile” with local authentication (make sure you add a new “system” user for those SSH logins), unless you want to use RADIUS (I have Kerberos at home, but haven’t deployed RADIUS at the moment). Always make sure you “Run Configuration” before flashing – it seems the former writes the actual config files that the latter will use, so depending upon the particular operation you are performing, you cannot necessarily flash first and “Run Configuration” afterwards.

Once you have the terminal server running, you can talk to it:

$ ssh user_name:port_number@terminal_server.address

More importantly perhaps, you can use the gdb remote target:

(gdb) target remote | ssh -t -t user_name:port_number@terminal_server.address

Remember to tell ssh not to ruin the day (by failing to allocate a pty for your friendly conversation) by specifying “-t -t”; then you can talk to Jason’s kgdb stub.
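
One detail worth spelling out: something has to drop the target kernel into the kgdb stub before gdb has anything to talk to. A panic will do that on its own, or – assuming the target kernel was built with magic sysrq support – you can force a break by hand:

# on the target, as root: break into the kgdb stub right now
echo g > /proc/sysrq-trigger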

Next steps? I need to make some more of these damned ADB0036 cables (or find some more on eBay – anyone want some useless Cisco cables I bought thinking they were the same?) and hook them up to all of my systems at home. They will then log constantly, via the awesomeness of GNU screen, to a remote VM, and I can jump in if something rolls over and catch it, so that I won’t miss panic/debug opportunities.
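
The logging side can be as simple as one detached, logging screen session per console port. A sketch, with a hypothetical user name and the terminal server from above:

# log everything from serial port 1 to ./screenlog.0, detached
$ screen -L -dmS console-1 ssh -t -t jcm:1@morse.example.com

# reattach when something rolls over and plays dead
$ screen -r console-1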

Kernel debuggers FTW.

Jon.

Cloning a Fedora rawhide virtual machine

Saturday, August 8th, 2009

Setting up a clone of a Fedora rawhide virtual machine is so simple (a scripted sketch of these steps follows the list)…

  • Create a new virtual machine instance
  • Stop and then copy the disk image file for the previous VM
  • Boot the new VM in single user mode
  • Edit the /etc/sysconfig/network file to change the hostname
  • Edit the /etc/sysconfig/network-scripts/ifcfg-eth0 file to change the networking
  • Edit the /etc/udev/rules.d/70-persistent-net.rules file to make exactly the same networking change (it pins the interface name to the old MAC address)
  • grep through the filesystem to see where else network data is duplicated.
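
Once the clone is up in single-user mode, the editing steps boil down to something like this (a sketch – the hostname, IP address, MAC, and the old identity being grepped for are all hypothetical):

# run inside the clone, booted in single user mode
NEWHOST=rawhide-clone.example.com
NEWIP=192.168.0.51
NEWMAC=52:54:00:12:34:57

# hostname
sed -i "s/^HOSTNAME=.*/HOSTNAME=$NEWHOST/" /etc/sysconfig/network

# fixed IP and new MAC for eth0
sed -i -e "s/^IPADDR=.*/IPADDR=$NEWIP/" \
       -e "s/^HWADDR=.*/HWADDR=$NEWMAC/" \
    /etc/sysconfig/network-scripts/ifcfg-eth0

# the persistent-net rule pins eth0 to the old MAC; point it at the new one
sed -i "s/ATTR{address}==\"[^\"]*\"/ATTR{address}==\"$NEWMAC\"/" \
    /etc/udev/rules.d/70-persistent-net.rules

# hunt down anywhere else the old identity is duplicated
grep -r "rawhide-orig" /etc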

Notice how more and more abstraction of network configuration does not a simpler system make. At least I don’t care about sound on my virtual machines, so to avoid that fun I simply delete the sound device whenever I create a new VM. I never use NetworkManager on boxes with fixed IPs, and somehow I don’t think cloning would get any easier with it turned on (unless I used DHCP, which does work here, but I prefer being certain the box has a fixed configuration when it’s used for testing).

Jon.

Remote fencing with an APC Masterswitch Plus (with an AP9606)

Sunday, June 28th, 2009

Photo: APC Masterswitch Plus (with an AP9606)

As I mentioned before, I’ve been fencing most of my home/office systems (and even lights) these days. The problem is that cheaper power switches like the IP Power 9258 can be damaged quite easily. Two of mine have failed under one particular load, and I’m not saying that wasn’t my fault (I still like those units), but it’s clear that having something more of a “household name” can be a good idea. So I looked on eBay and discovered that old APC Masterswitches now often go for similar money to the cheaper kit.

I bought an 8-port Masterswitch Plus (with an AP9606) this week. Previously these went for up to $1000, but they can now be had for a tenth of that. And they do telnet/SNMP (and ssh, if you upgrade them – not so much of a concern in this particular out-of-band configuration). I looked around for fencing scripts and obviously found the Red Hat Cluster Suite fence_apc stuff, but I don’t want to install lots of extras, and I don’t want to talk over telnet when I’ve got a private SNMP community configured and am reasonably comfortable with that. So I updated my previous script to talk to APC Masterswitch units.
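
For the curious, the heart of such a script is a single SNMP set against APC’s PowerNet MIB. A sketch using net-snmp – the hostname and community string are hypothetical, and if memory serves, sPDUOutletCtl takes 1 = on, 2 = off, 3 = reboot:

# power-cycle outlet 3 on the Masterswitch
$ snmpset -v1 -c private masterswitch.example.com \
      .1.3.6.1.4.1.318.1.1.4.4.2.1.3.3 i 3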

APC Masterswitch Plus (with an AP9606) fencing script.

Jon.