Archive for the ‘Fedora’ Category

Response to “Why systemd?”

Friday, April 29th, 2011

So I read Lennart’s blog post entitled Why systemd?. In it, he makes a number of comparisons between systemd and the two other Linux init systems that are still in widespread use (this being the third init system some distributions have adopted within the last few years). Overall, he makes a good argument that systemd has many nice and exciting features, and I’m sure they are of interest to various people who want their init system to be SYSV on steroids. Here are some of them:

  • Interfacing via D-BUS
  • Shell-free bootup
  • Modular C coded early boot services included
  • Socket-based Activation
  • Socket-based Activation: inetd compatibility
  • Mount handling
  • fsck handling
  • Quota handling
  • Automount handling
  • Swap handling
  • Encrypted hard disk handling (LUKS)
  • Infrastructure for creating, removing, cleaning up of temporary and volatile files
  • Save/restore random seed
  • Static loading of kernel modules

These are all things I don’t want built into my init system. To me, there are many good reasons that they have been traditionally handled using simple, easy to edit and modify scripts, and that’s where I personally feel they should belong. In my mind, some don’t even make sense to build directly into the init system itself, such as automounting and the like (that belongs in autofs and friends). There’s more, but the main point I want to make here is that when you come up with a list of comparisons, that list should not really be an inverted list of features of the replacement (which obviously may not be in what is being replaced). A better comparison would be user experience. If I’m an admin, all of the new features are nice, but do I need to change my workflow for the new tool? And at the end of the day, what am I winning overall in terms of experience?

I’m not one of those who actually wanted YAIS (Yet Another Init System). No offence particularly to systemd, but I preferred good old fashioned sysvinit. It worked for a longer time than many people have been using Linux (or UNIX), it was well understood, and well documented. It was far from perfect, but it got system services started. I can’t remember ever yelling at SYSV init and saying “wow, if only you weren’t so crappy, if only you started every service when I connected to it”. In fact, it was a mature tool that did everything I needed it to. It took a little longer to boot my system than it might, but then like most real users, I use suspend and other features that mean I boot from scratch infrequently, or I run servers where I really don’t care at all. I wouldn’t have cared if it took 10 minutes to boot my laptop, or an hour to boot my server…well, I exaggerate, but you get the point. And inetd? Or xinetd? Automount? Good enough for my uses as separate tools.


On Linux Platforms

Sunday, January 23rd, 2011

One of the major differences between Linux distributions, and other Operating Systems (both Free and non-Free) is that Linux often tries to give you everything from one source. Want a piece of third party software? You’re expected to get it (and its dependencies) into the distribution, and install that version(s). Other Operating Systems provide a base platform upon which third party tools, libraries, and applications can be installed into a separate location. This is close to the original intention of /opt, but it’s actually used rather than shunned as if it were some kind of bad idea to want to do this, and it allows one version of the basic OS to live for a number of years independently of any or all of the applications installed.

Unlike many distro folks and Linux enthusiasts, I actually prefer the idea of providing a basic, stable, unchanging platform upon which self-contained applications can be installed. Kinda like “Enterprise” Linux, but different – Enterprise Linux distributions basically snapshot a particular set of distro software and treat that like a “platform”, while their upstream sources don’t. In my perfect utopia, there’s a huge, bright line between basic OS components and everything else. I want a stable OS, but I might want to install a more recent web browser, or some engineering design tool that is more recent than my OS, and I want to be able to do that trivially and independently of the OS. I don’t want it installed in /usr/bin. I want my OS-supplied core junk to go in there, but I want my applications to live separately. Some experimental distros have even tried this stroke of sanity by cloning the OS X /Applications type of behavior, but only experimentally.

In my perfect world, I would get “Fedora” from the Fedora Project, I would install it, and I would get a basic environment including a desktop. It might even include a web browser, but it would not include all of the other stuff. Instead, this would be installed into completely separate directory structures, and be fully self-contained, away from the basic OS environment. It might be that some of it would come with the distro, and it might even be that some of it were packaged and distributed using distro tools, but it would be trivial to upgrade any software independently of the base OS platform because it would still be stored separately from core system components. Try installing a different version of Firefox, or some other system-supplied app on your favorite Linux distribution without having to place it into a separate directory, avoid using actual packaging, or butcher the distro config.

One day, what I want is going to happen. There will be a realization in the wider Linux community that consumers want a basic platform and that they want to be able to treat other pieces of non-core junk independently of that. But this realization (in the Linux space) is still several years away, and it comes after more people realize the benefit of having a computer that just works without the need for hacking or updating or messing around with OS pieces to get there.


The BeagleBoard [part 1]

Friday, December 31st, 2010

So I’ve been dissecting the BeagleBoard-xM OMAP3 port for a book project I’m working on in my spare time. This has also seen me brush up on my ARM assembler (PowerPC, POWER, and so forth are my first loves), in particular stuff that happens in “SVC” (Supervisor) mode. Here are some notes that might be of interest to others getting into Beagle or ARM porting (part 1).

ARM essentials

First, you should know that ARM (as it stands, in v7 of the architecture, we’ve yet to see the new 64-bit ISA released in v8 or whatever that will be) is a 32-bit processor IP core that uses fixed-width 32-bit instructions with various forms of instruction encoding. The ISA is designed to be simple, but it does include some very flexible (simple) instructions. For example, almost every instruction can be conditionalized based upon 4 bits of condition state (and that state is typically not updated automatically within the ALU unless it is specifically encoded into the affecting instruction) and some instructions (additions, etc.) can include shifts of one of the operands (giving you DSP-ish instructions for free). Using conditionalized instructions keeps the pipeline full as opposed to having a lot of branching, and can render very small and tight sequences. Modern ARM cores implement a modified Harvard architecture, as indeed are most “Harvard architecture” processors these days.
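
To make the "4 bits of condition state" concrete, here is a minimal sketch (mine, not from any ARM tooling) that models the N, Z, C and V flags produced by a compare and then checks the condition field in the top four bits of an instruction word. The function names are made up for illustration, and the unconditional 0xF encoding space is deliberately not modeled.

```python
# Illustrative model of ARM conditional execution. The condition field lives in
# bits 31:28 of every (classic ARM state) instruction word.
CONDITIONS = {
    0x0: ("eq", lambda n, z, c, v: z),
    0x1: ("ne", lambda n, z, c, v: not z),
    0x2: ("cs", lambda n, z, c, v: c),
    0x3: ("cc", lambda n, z, c, v: not c),
    0x4: ("mi", lambda n, z, c, v: n),
    0x5: ("pl", lambda n, z, c, v: not n),
    0x6: ("vs", lambda n, z, c, v: v),
    0x7: ("vc", lambda n, z, c, v: not v),
    0x8: ("hi", lambda n, z, c, v: c and not z),
    0x9: ("ls", lambda n, z, c, v: (not c) or z),
    0xA: ("ge", lambda n, z, c, v: n == v),
    0xB: ("lt", lambda n, z, c, v: n != v),
    0xC: ("gt", lambda n, z, c, v: (not z) and n == v),
    0xD: ("le", lambda n, z, c, v: z or n != v),
    0xE: ("al", lambda n, z, c, v: True),
}


def cmp_flags(a, b):
    """Flags a 32-bit 'cmp a, b' would set: returns (N, Z, C, V)."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF
    n = bool(result >> 31)                     # sign bit of the result
    z = result == 0                            # operands were equal
    c = a >= b                                 # carry set means "no borrow"
    v = bool(((a ^ b) & (a ^ result)) >> 31)   # signed overflow
    return n, z, c, v


def condition_passed(instruction, n, z, c, v):
    """True if the instruction's condition field is satisfied by the flags."""
    cond = (instruction >> 28) & 0xF
    if cond not in CONDITIONS:
        raise ValueError("0xF (unconditional space) is not modeled here")
    _name, test = CONDITIONS[cond]
    return bool(test(n, z, c, v))
```

For example, `condition_passed(0x00800001, *cmp_flags(3, 3))` asks whether an `addeq r0, r0, r1` would execute after comparing two equal values.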

There are 30 general purpose registers, but not all of these are available at any one time. In fact, it is easier to think of ARM as having 15 “general purpose” registers with some banking being used in certain modes (such as FIQ or IRQ – Fast or regular IRQs) to replace a subset of the registers with a context-specific special set. Even better, think of it as 13 general purpose registers (r0-r12), with r13 used as the stack pointer (sp), r14 as the link register (lr), and r15 used as the program counter (pc). Even even better, think of them according to the ABI calling convention, with r0-r3 used for argument passing, r4-r11 used for local variables, r12 as a function call scratch register, r13 as sp, r14 as lr, and r15 as pc. Some instructions assume the use of sp, lr, and pc (write something into r15 and that will be taken as a branch address), but generally register use is flexible. Of the 4 different stack modes supported, ARM conventionally uses a “Full Descending” stack in which the sp points to the last value written, but you can use any of the “store multiple” or “load multiple” instructions either with their “stack” friendly names (e.g. stmfd), or directly (stmdb). Processor state is encoded in the CPSR (Current Processor Status Register), and saved in the SPSR (Saved Processor Status Register) when handling various exceptions (traps).
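
A small model may help with the “Full Descending” idea. The sketch below is my own illustration (the class and method names are invented) of what `stmfd sp!, {...}` and `ldmfd sp!, {...}` do to sp and memory: the pointer is decremented before each store, so sp always points at the last value written, and the lowest-numbered register ends up at the lowest address.

```python
class FullDescendingStack:
    """Toy model of an ARM full descending stack (word-sized slots only)."""

    def __init__(self, top):
        self.sp = top        # initial stack pointer (nothing stored yet)
        self.memory = {}     # word address -> 32-bit value

    def stmfd(self, values):
        """Model 'stmfd sp!, {regs}' with values given in register-number order."""
        for value in reversed(values):
            self.sp -= 4                              # decrement before storing
            self.memory[self.sp] = value & 0xFFFFFFFF

    def ldmfd(self, count):
        """Model 'ldmfd sp!, {regs}': returns values in register-number order."""
        loaded = []
        for _ in range(count):
            loaded.append(self.memory[self.sp])       # sp points at valid data
            self.sp += 4                              # increment after loading
        return loaded
```

Pushing three values from a top of 0x8000 leaves sp at 0x7FF4, with the first value of the list stored there.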

The ARM ISA is intentionally extensible. Either ARM, certain third parties (in limited situations), or co-processors may extend it. Co-processors are a means for the ARM core to remain simple while offloading certain functions to additional cores (that might actually be part of the same die). For example, the original FPA (Floating Point Accelerator) was implemented as a co-processor, and has long since been deprecated by the VFP (Vector Floating Point) co-processor (currently at version 3 thereof). Another co-processor you might care about is cp15, the MMU or VMSA (Virtual Memory Systems Architecture). When an instruction the ARM doesn’t recognize is seen, it will try to find one of 16 possible co-processors that can implement it (each has a special pre-determined number, such as cp15 for the MMU) before hitting the illegal instruction handler. The latter allows some instructions to be implemented in software, via traps, such as the older floating point emulation in the Linux kernel. There are special instructions for copying data to/from co-processors, and each has its own self-contained/self-defined register set.
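
As an illustration of how a co-processor number ends up inside an instruction, here is a hedged sketch of encoding the `mrc` (move from co-processor to ARM register) instruction, following the field layout in the ARM architecture manual as I understand it. The helper name `encode_mrc` is mine. The co-processor number sits in bits 11:8, which is how the core knows to route the instruction to, say, cp15.

```python
def encode_mrc(coproc, opc1, rt, crn, crm, opc2=0, cond=0xE):
    """Encode 'mrc p<coproc>, opc1, r<rt>, c<crn>, c<crm>, opc2' as a 32-bit word."""
    fields = ((coproc, 16), (opc1, 8), (rt, 16), (crn, 16),
              (crm, 16), (opc2, 8), (cond, 16))
    for value, limit in fields:
        if not 0 <= value < limit:
            raise ValueError("field out of range")
    return (cond << 28          # condition field (0xE = always)
            | 0b1110 << 24      # co-processor register transfer class
            | opc1 << 21
            | 1 << 20           # direction: co-processor -> ARM register
            | crn << 16
            | rt << 12
            | coproc << 8       # which of the 16 co-processors
            | opc2 << 5
            | 1 << 4
            | crm)


def coprocessor_number(instruction):
    """Extract the co-processor number (bits 11:8) from an encoded word."""
    return (instruction >> 8) & 0xF
```

With these fields, reading cp15 register c0 into r9 (`mrc p15, 0, r9, c0, c0, 0`) encodes to 0xEE109F10.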

Modern ARM cores implement a very long pipeline internally, but the original design was three stage. And this leaks out into the visible value of, for example, the pc (which is generally 8 bytes away from where you think it should be – different pipeline stage). This is the reason why PC relative addressing can be confusing (though assemblers help to deal with this). Another issue is that the ARM fixed-width instructions didn’t traditionally have a direct parallel to the PPC “lis” “ori” address loading sequence. Instead, a 4K pc-relative address could be easily loaded in one instruction, a 64K pc-relative in two, and certain other addresses could be loaded by means of the built-in barrel shifter used in (e.g.) the addition instruction or by using a “literal pool”, depending upon the assembler. This means that traditionally, only certain addresses could be loaded with a “LDR” “meta” assembly instruction, and that the assembler would try to replace your LDR with appropriate shifts (in possibly several instructions), etc. If it were unable to do so, it would generate a warning. You could also load addresses directly from memory and jump to those that way by means of loading them in to the pc. ARM “bl” relative branches (which use a direct label address) have a 32-MB limit, traditionally requiring trampolines to handle larger jumps[0]. Since ARMv6T2, there is a movt instruction for loading the top 16 bits of a register (leaving the lower alone), along with a movw (load a value into lower 16 bits and zero the upper part), which now allows a movw/movt combination akin to the lis/ori combination on PowerPC. A movw/movt combo is used by, e.g. GCC when generating longer jumps (in which it loads the pc directly).
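
Two of the points above are easy to check with arithmetic: the “pc reads 8 bytes ahead” rule that limits `bl`, and the way movw/movt build a full 32-bit value from two 16-bit halves. The following is a rough sketch under those stated rules; the function names are mine and it models ARM state only (not Thumb).

```python
def split_for_movw_movt(value):
    """Return the (low16, high16) immediates a movw/movt pair would carry."""
    value &= 0xFFFFFFFF
    return value & 0xFFFF, value >> 16


def apply_movw_movt(low16, high16):
    """Model the register contents after 'movw reg, #low16; movt reg, #high16'."""
    reg = low16 & 0xFFFF                              # movw: low half, top zeroed
    reg = (reg & 0xFFFF) | ((high16 & 0xFFFF) << 16)  # movt: top half only
    return reg


def bl_offset_field(branch_address, target):
    """24-bit signed word offset stored in an ARM-state 'bl' at branch_address."""
    delta = target - (branch_address + 8)   # pc reads as this instruction + 8
    if delta % 4:
        raise ValueError("target is not word aligned relative to the branch")
    words = delta // 4
    if not -(1 << 23) <= words < (1 << 23):  # roughly +/- 32MB
        raise ValueError("out of range: needs a trampoline or a register load")
    return words & 0xFFFFFF
```

A branch to itself yields the familiar 0xFFFFFE offset field (delta of -8 bytes, i.e. -2 words), which is where the 8-byte pipeline offset becomes visible.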

The ABI for ARM was changed a few years ago from the OABI (old ABI) to the EABI (not to be confused with the PowerPC ABI of the same name). In particular, system calls are no longer implemented by passing the system call number in the “comment” field of the system call instruction, but are instead passed by means of register (faster than having the kernel poke at the memory to find what number was contained within that instruction). Also, 64-bit values are passed using sequential registers, beginning on an evenly aligned register number. The ARM implements a limited number of trap vectors, traditionally located at 0x0000_0000 but today possibly relocated high at 0xffff_0000 (this is the preference of the kernel, if it is possible, since it avoids having the NULL physical page over-used).
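
The “evenly aligned register number” rule for 64-bit values is easiest to see with a toy allocator. This is a deliberately simplified sketch of my own (integer arguments of 4 or 8 bytes only, r0-r3 only, no structures or floating point), not a full statement of the procedure call standard.

```python
def assign_argument_registers(arg_sizes):
    """Place 4- or 8-byte integer arguments into r0-r3, or mark them 'stack'.

    Returns one entry per argument: a tuple of register names, or the string
    'stack' once the argument registers are exhausted.
    """
    next_reg = 0
    placement = []
    for size in arg_sizes:
        if size not in (4, 8):
            raise ValueError("this sketch only handles 4- and 8-byte arguments")
        if size == 8:
            next_reg += next_reg % 2           # round up to an even register
        needed = size // 4
        if next_reg + needed <= 4:
            placement.append(tuple(f"r{next_reg + i}" for i in range(needed)))
            next_reg += needed
        else:
            placement.append("stack")
            next_reg = 4                       # later arguments also go on the stack
    return placement
```

So a call shaped like `f(int, long long)` uses r0 and then r2:r3, leaving r1 unused.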

Memory alignment is extremely important to ARM. Not only did it not have things like floating point for the longest time, but to keep the design simple, unaligned memory accesses weren’t really supported historically. Modern processors do have some support for handling unaligned access, but you generally want to avoid that. ARM deals in Bytes, Half-Words (16-bit), and Words (32-bit), none of this IA32 historical nonsense of “double words”. 64-bit values can be handled by means of multiple word instructions, and also manipulated in user code on some systems by use of the VFP, and so forth.
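
For reference, the natural alignment rule and the “64-bit value as two words” idea boil down to the following. This is just a sketch; both helper names are mine.

```python
def is_naturally_aligned(address, size):
    """True if an access of 'size' bytes (1, 2 or 4) at 'address' is aligned."""
    if size not in (1, 2, 4):
        raise ValueError("ARM deals in bytes, half-words and words")
    return address % size == 0


def split_doubleword(value):
    """Split a 64-bit value into (low_word, high_word) for two word accesses."""
    value &= 0xFFFFFFFFFFFFFFFF
    return value & 0xFFFFFFFF, value >> 32
```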

ARM don’t make processors directly. They define the ISA (currently v7, though processors still implement v4 and v5, etc.) and provide reference designs in the form of soft and hard (netlists, gates, etc.) cores to third parties. Depending upon the licensing (whether it is a foundry or merely a regular licensee), that third party might only have the right to implement what they are given. But if they have special foundry access, they can actually change the design. In any case, certain CPU functionality is optional. In the past, these were designated by letters after the CPU (e.g. “TDMI” for older cores implementing the then-optional Thumb instruction set, supporting virtual memory, debug, etc.), but today profiles are generally used instead. Cortex (the latest generation) provides for A, R, and M profiles, of which the Application profile (A) is what we care about.

ARM kernel

Execution of the ARM kernel begins in the inferred standard location of arch/arm/kernel/head.S, at the place very obviously labeled with “Kernel startup entry point”. At this point, the MMU must be off, the D-cache must be off, I-cache can be on or off, r0 must contain 0, r1 must contain the “machine number” (an ARM Linux standard assigned number, one per machine port, passed from the bootloader code), and r2 must contain the “ATAGS” pointer (a flexible data structure precursor to things like fdt and device trees that allows a bootloader to pass parameters). First, the processor mode is quickly set to ensure interrupts (FIQ and IRQ) are off, and that the processor is properly in Supervisor (SVC) mode. Then, MMU co-processor register c0 is copied into ARM register r9 to obtain the processor ID. This is followed by a call to __lookup_processor_type (contained within head-common.S, the common file for both MMU-enabled and non-MMU enabled ARM kernels – the latter are not covered by this document).
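
The “set the mode and mask interrupts” step is a single write of a small constant. As a sketch (constant names follow the kernel's usual spelling as far as I recall, and the helper is mine), the value combines the SVC mode number with the I and F mask bits. The real code writes only the control byte of the CPSR, whereas this model simply preserves all other bits.

```python
SVC_MODE = 0x13    # mode field value for Supervisor mode
PSR_I_BIT = 0x80   # IRQs masked when set
PSR_F_BIT = 0x40   # FIQs masked when set
MODE_MASK = 0x1F   # low five bits of the CPSR select the mode


def enter_svc_interrupts_off(cpsr):
    """Return 'cpsr' with mode = SVC and both IRQ and FIQ masked."""
    cleared = cpsr & ~MODE_MASK & 0xFFFFFFFF
    return cleared | SVC_MODE | PSR_I_BIT | PSR_F_BIT
```

Starting from a bare user-mode value of 0x10, the result is 0xD3.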

__lookup_processor_type behaves like a number of other functions in the early kernel setup. First, you should note that it has a C-ABI companion (used later, in higher level code) called lookup_processor_type that also preserves stack and calling semantics for use from C-code, then calls into __lookup_processor_type. Like so many other of these functions, __lookup_processor_type has a __lookup_processor_type_data friend that contains three data items:

  • The address of the structure
  • A pointer to the beginning of an array of “processor info” structures
  • A pointer to the end of an array of “processor info” structures

The function loads the address of __lookup_processor_type_data into r3, then uses an ldmia instruction to load the members of that structure into r4-r6. Since the first member of the structure contains the (relocated) virtual address of the kernel, and r3 contains the (currently) physical address, a simple subtraction is used to store into r3 the offset between the linked address of __lookup_processor_type_data and the actual address. This offset is then applied to registers r5 and r6 (containing the second and third members of that structure – __proc_info_begin and __proc_info_end). A loop is then entered to iterate over the proc info structure array and find a matching known-processor. If a matching processor is found, the physical address is preserved in r5 on return, otherwise #0 (NULL) is written into it so that the calling code can determine whether a processor was found.
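
The offset trick described above can be modeled in a few lines. This is a sketch, not the kernel's code: the record size, the addresses, and the table contents used in the example are all made up, and the value/mask comparison reflects my understanding of how each “processor info” record is matched rather than anything stated in the text.

```python
from collections import namedtuple

# One "processor info" record: match when (cpu_id & cpu_mask) == cpu_val.
ProcInfo = namedtuple("ProcInfo", "cpu_val cpu_mask name")

PROC_INFO_SIZE = 52  # hypothetical record size in bytes


def lookup_processor_type(cpu_id, data_phys, data, proc_info):
    """Model of the lookup.

    data_phys : address the data block is actually at right now (r3)
    data      : (linked_self, linked_begin, linked_end) as linked (r4-r6)
    proc_info : dict of physical address -> ProcInfo
    Returns the physical address of the matching record, or 0 if none matches.
    """
    linked_self, linked_begin, linked_end = data
    offset = data_phys - linked_self                 # actual minus linked address
    address = (linked_begin + offset) & 0xFFFFFFFF   # fix up begin pointer
    end = (linked_end + offset) & 0xFFFFFFFF         # fix up end pointer
    while address < end:
        entry = proc_info[address]
        if (cpu_id & entry.cpu_mask) == entry.cpu_val:
            return address
        address += PROC_INFO_SIZE
    return 0                                         # unknown processor
```

With a block linked at 0xC0008100 but running at 0x80008100, the offset is -0x40000000, so a table linked at 0xC0300000 is walked at 0x80300000.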

Once the processor type has been determined, a very similar function called __lookup_machine_type is called to find the machine type passed by the bootloader, and get the “machinfo” structure describing the machine in question. If either of these functions fails, a call is made to __error_p, which can be compiled (in debug mode) to output an error ASCII message to the UART (byte by byte) but at least gives a point to pick up in a hardware debugger. Next, a call is made to __vet_atags (which checks that the ATAGS in r2 begin with the ATAG_CORE magic marker[1], and so forth), then sometime later we call __create_page_tables to fudge up entries in cp15 that cover just the kernel code and data. After that, we call through a few functions beginning with __enable_mmu that actually cause the MMU to get enabled and for the return to be to __mmap_switched (also in the head-common.S file). __mmap_switched also has an __mmap_switched_data, which is used to store various global values (such as processor_id, __machine_arch_type, __atags_pointer, and so forth). After zeroing out the BSS, __mmap_switched causes the high-level kernel start_kernel function to be entered as usual.
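
To show roughly what “vetting” the ATAGS means, here is a sketch that checks a little-endian tag list the way I understand the early code to do it: the pointer must be word aligned, and the first tag must be ATAG_CORE with one of two permitted sizes. The constant values are as I recall them from the kernel's ARM setup header, so treat them as assumptions and check your tree. The function names are mine.

```python
import struct

ATAG_CORE = 0x54410001       # magic marker of the first tag (assumed value)
ATAG_NONE = 0x00000000       # terminator tag
ATAG_CORE_SIZE = 5           # 2 header words + 3 payload words (assumed)
ATAG_CORE_SIZE_EMPTY = 2     # header only (assumed)


def vet_atags(address, blob):
    """True if 'blob' (the bytes found at 'address') starts with a sane ATAG_CORE."""
    if address % 4 or len(blob) < 8:
        return False
    size_words, tag = struct.unpack_from("<II", blob, 0)
    return tag == ATAG_CORE and size_words in (ATAG_CORE_SIZE, ATAG_CORE_SIZE_EMPTY)


def walk_atags(blob):
    """Yield (tag, payload_bytes) for each tag up to the ATAG_NONE terminator."""
    offset = 0
    while offset + 8 <= len(blob):
        size_words, tag = struct.unpack_from("<II", blob, offset)
        if tag == ATAG_NONE:
            return
        if size_words < 2:
            raise ValueError("corrupt tag: size smaller than its own header")
        yield tag, blob[offset + 8:offset + 4 * size_words]
        offset += 4 * size_words
```

Each tag starts with a two-word header (size in words, then the tag value), which is why walking the list only needs the size field.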

start_kernel (init/main.c) does a lot of things on every architecture/platform. These include calls to setup_arch, which on ARM calls setup_machine. This latter function returns an mdesc (machine descriptor) which is used in setting up any ATAGS, and for paging_init (which calls devicemaps_init that uses the mdesc to see if any special device IO maps are needed early on, for example for debugging and so forth). If ATAGS are not specified, a default init_tags set is used that defaults the machine to e.g. MEMSIZE of 16MB RAM, and other limited (but sane) defaults. After running through other generic early startup code, start_kernel causes the kernel_init thread to run, which calls do_basic_setup. do_basic_setup calls various initcalls, including the special customize_machine initcall. Customize machine calls init_machine (e.g. omap3_beagle_init), which adds various platform devices and does other board specific setup.

When the board specific init function is run, it will typically call platform_device_register for each known platform device (you can see these in /sys/devices/platform for example). Some of these platform entries include a driver_name field, which will cause the actual device driver to be loaded, but others contain more limited data. For example, in the case of EHCI (USB) on the Beagle, an ehci platform device is registered, but what causes the actual USB driver to load (if not built-in) is that the OMAP EHCI driver contains a MODULE_ALIAS(“platform:omap-ehci”) that udev and modprobe will cause to load on boot. In the case of OMAP, it might seem like a lot of hardware addresses and so forth are missing from the platform registration, but this is because the hardware is standardized (EHCI is always in the same place for a given OMAP, and hard-coded into the USB driver). One thing that does differ is GPIO MUX assignment, and these pins/values are poked in the init_machine code for the Beagle.
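
The alias mechanism can be pictured as a pattern match between the string the kernel announces for a device and the aliases modules declare. The sketch below is a simplification of what udev and modprobe do, using shell-style matching; the alias table and module names in it are hypothetical.

```python
from fnmatch import fnmatchcase


def platform_modalias(device_name):
    """The modalias string announced for a platform device of this name."""
    return "platform:" + device_name


def modules_for_modalias(modalias, alias_table):
    """Return modules whose alias pattern matches, in table order.

    alias_table is a list of (pattern, module_name) pairs.
    """
    return [module for pattern, module in alias_table
            if fnmatchcase(modalias, pattern)]


# Hypothetical alias table, for illustration only.
EXAMPLE_ALIASES = [
    ("usb:v*p*", "some_usb_driver"),
    ("platform:omap-ehci", "example_ehci_module"),
]
```

Registering a platform device named omap-ehci therefore leads to a lookup for `platform:omap-ehci`, which matches the second entry only.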

That’s all for now. More later.


[0] Modern ARM also implements the Thumb instruction “BLX” (Branch with Link and optional eXchange) that can be used in combination with a series of assembler-generated mov/orr sequences to load an address into a register and branch to it. The older “bl” requires a pc-relative label address to jump to. In fact, when you use a sequence like “ldr reg, address” and “blx reg” the assembler may in fact generate 5 or more actual instructions using mov, orr, and bx or blx to perform the actual load and branch. Rather than switching to Thumb mode, the modern mov/movt combination is preferable for loading arbitrary addresses.

[1] I do wonder what significance this particular number has to rmk. If anyone knows why it was chosen – a birthday? or other date? etc. please do let me know.

[opinion] Fedora needs an architect

Wednesday, December 15th, 2010

I read yet another thread about Fedora randomly changing the way UNIX has done things forever (the specific thread was on /dev/shm mount options) and it reminded me that I’ve been saying for a while that Fedora urgently needs an architect. FESCo should appoint a person as their technical representative who speaks for overall system architecture concerns. The person in this role should actively seek out compatibility or integration problems but should also be a “go to” person for concerns that arise in the interests of distribution cohesion. Sure, they should be accountable, etc. but the idea that everything should be filed in some ticket and wait a week for FESCo to debate it is both the reason these things don’t get filed (because you can’t file every tiny annoyance) and also the reason why we have these long mailing list threads in the interim.

Here are some of the things an actual architect could do:

  • Embody the overall engineering effort. Help determine overall distribution vision, help set direction, and make recommendations to the various stakeholders about what is required and what is not in order to meet the goals Fedora sets forth. This includes rationalizing the impact of certain possible technical decisions, and recommending against others.
  • Help handle initial discussion of a new feature, work out the integration planning, liaise with stakeholders (figure out who they are and actively seek them out if necessary).
  • Monitor the distribution mailing lists for technical issues and be able to have a non-partisan recommendation in the case that there isn’t a mutually agreed upon answer. For really contentious stuff, others would handle it anyway so they can pass it on. But for stuff like system defaults, they can help resolve many problems that arise quickly.
  • etc.

I know that an architect isn’t going to happen, but this is my personal opinion nonetheless. I am certain that were there actually one person in that role we’d see a marked improvement in overall cohesion and distribution quality well within 2 releases.


Using OpenOCD with BeagleBoard-xM

Monday, December 13th, 2010

NOTE: This is an extremely technical blog post intended only for very serious embedded Linux developers. It was written mostly for the benefit of Google. I wish that this posting had existed over the weekend when I was looking at this.

So I now have several Texas Instruments BeagleBoard-xM boards based upon the TI DM37x series Cortex-A8 based SoC. The “xM” is an improvement on the original BeagleBoard that features the improved DM3730 in place of the original OMAP3530. Like its predecessor, the DM37x series includes Texas Instruments’ on-chip “ICEpick” JTAG TAP controller that can be used to selectively expose various additional JTAG TAPs provided by other on-chip devices. This facilitates selective exposure of these devices (which may not be present in the case of the “AM” version of the chip, or otherwise might be in a low-power or otherwise disabled state), because it is nice to be able to ignore devices we don’t want to play with today. The ICEpick itself can handle up to 16 devices, although the majority of those are “reserved” for the scarily more complex days of tomorrow. If you want some of the details, refer to chapter 27 of the DaVinci Digital Media Processor Technical Reference Manual (TRM) (under “User Guides”).

The full possible DM37x JTAG chain contains the following devices (in OpenOCD ordering, from TDO to TDI – the inverse of the physical ordering of the bus, which starts with the ICEpick at 0 and ends with the DAP at 3): DAP (Debug Access Port), Sequencer (ARM968), DSP, D2D, ICEpick. It’s the last device in the chain that we really care about because it is through the Debug Access Port that we will poke at the system memory address space and registers within the Cortex-A8 processor. At Power ON, the default JTAG chain configuration will depend upon the strapping of the EMU0 and EMU1 lines (which are exposed on the Flyswatter as jumpers adjacent to the JTAG ribbon cable). If both of these lines are high (pins 1-2 not 0-1 on the Rev-B Flyswatter, i.e. nearest to the ribbon cable) – which they had better be if you want an “out of the box experience” – then the only device to appear in the chain will be the ICEpick. OpenOCD is expecting this, because it contains special macro functions that will instruct the ICEpick through its control command registers to expose any additional TAPs after initialization (we actually only want the DAP in the default case, which is how OpenOCD will behave for you if you don’t change it). Each time a new TAP is exposed, the OpenOCD JTAG logic will do the necessary state transition for it to take effect (for the chain to lengthen or contract according to devices appearing and disappearing in the chain).

The default upstream version of OpenOCD does not work with BeagleBoard-xM at the moment (as of v0.4.0-651-gc6e0705 – last modification on December 8th) because some logic was recently added to master that attempts to handle busticated DAP ROM addresses on a Freescale (not Texas Instruments) IMX51 processor, which is very similar to the DM37x. Due to this “fixup”, the DM37x’s correctly provided ROM table information will be “fixed” to use the wrong DAP address, which won’t work. For the moment, find the following loop in src/target/arm_adi_v5.c:

        /* Some CPUs are messed up, so fixup if needed. */
        for (i = 0; i < sizeof(broken_cpus)/sizeof(struct broken_cpu); i++)
                if (broken_cpus[i].dbgbase == dbgbase &&
                        broken_cpus[i].apid == apid) {
                        LOG_WARNING("Found broken CPU (%s), trying to fixup "
                                "ROM Table location from 0x%08x to 0x%08x",
                                broken_cpus[i].model, dbgbase,
                                broken_cpus[i].correct_dbgbase);
                        dbgbase = broken_cpus[i].correct_dbgbase;
                }

Wrap it with a #if 0 (and don't forget to also comment or define away the definition of "i") such that it won't take effect. Then, if you have a Rev-B xM you will also need to edit the file tcl/target/amdm37x.cfg to add a new TAPID for the undocumented revision recently made to the ICEpick on these TI parts (this is backward compatible with the Rev-A because it will now look for both):

   switch $CHIPTYPE {
      dm37x {
         # Primary TAP: ICEpick-C (JTAG route controller) and boundary scan
         set _JRC_TAPID  0x0b89102f
         set _JRC_TAPID1 0x1b89102f
      }
      ...
   }

   # Primary TAP: ICEpick - it is closest to TDI so last in the chain
   jtag newtap $_CHIPNAME jrc -irlen 6 -ircapture 0x1 -irmask 0x3f \
      -expected-id $_JRC_TAPID -expected-id $_JRC_TAPID1

Then make and install the OpenOCD binaries (ensure you have the free FTDI libraries installed, and configure with “configure --enable-maintainer-mode --enable-ft2232_libftdi”). You can now use OpenOCD with the xM revA or B.
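
If you are curious why the two ICEpick TAP IDs in the config look so alike, decoding them with the standard JTAG IDCODE layout (version in bits 31:28, part number in bits 27:12, manufacturer in bits 11:1, and a fixed 1 in bit 0) shows that only the version nibble differs. A small sketch, with helper names of my own choosing:

```python
def decode_idcode(idcode):
    """Split a 32-bit JTAG IDCODE into its standard fields."""
    return {
        "version": (idcode >> 28) & 0xF,
        "part": (idcode >> 12) & 0xFFFF,
        "manufacturer": (idcode >> 1) & 0x7FF,
        "marker": idcode & 1,              # always 1 for a valid IDCODE
    }


def same_part(idcode_a, idcode_b):
    """True if two IDCODEs differ at most in their version field."""
    a, b = decode_idcode(idcode_a), decode_idcode(idcode_b)
    return (a["part"], a["manufacturer"]) == (b["part"], b["manufacturer"])
```

Here `decode_idcode(0x0b89102f)` and `decode_idcode(0x1b89102f)` report versions 0 and 1 for the same part number 0xb891, which fits the idea of a silicon revision rather than a different device.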


[rant] Desktop application complexity (part 2)

Sunday, December 12th, 2010

There were some useful comments on my previous post, but I think they missed my point: there’s too much complexity here. It’s true that entirely self-contained applications are a relic of yesteryear and that having things work well together is useful, but it’s the mechanics that drive me nuts…some questions I have to ask every time I interact with desktop apps and related infrastructure these days:

  • Is it gconf, dconf, gsettings, kconfig, or some random other thing today? Is it worth figuring out before it’s something else?
  • Has it recently been entirely re-written to be the one true solution, obsoleting everything that went ten minutes before? And will it be re-invented ten minutes from now?
  • Is there a web page I can go to to get a non core GNOME/whatever developer summary of what’s going on? (no, there isn’t).
  • etc.

Heck, last time I tried building GNOME I followed the jhbuild instructions, only to be told that I was doing it wrong and should use something else now GNOME Shell is the flavor of the month. I have nothing against forward progress, but I have a lot against random new things popping up that suddenly replace stuff and aren’t well understood outside of a small group of core developers. The Linux kernel developers take great steps to ensure this is not the case and I would love to see the same happen elsewhere.

Hope that clarifies my frustrations. It’s not so much that gnome-settings-daemon can’t handle NFS mounts properly, it’s that it might not even be worth looking into it because I fear it’ll be replaced by some entirely new, “super awesome” (but not well understood or widely documented) solution in about ten minutes from now.


[rant] Desktop application complexity

Saturday, December 11th, 2010

So it used to be (back in the day), that you would start an application, it would read some configuration, and you would be done. The application would pretty much be self contained, and so would its configuration (none of this Windows registry nonsense). Heck, you could even read the configuration with a text editor and make useful changes without devoting time and effort to knowing all of the pieces of an entire stack that keeps changing over time.

These days we have many bits and pieces that are needed just to get my rhythmbox application to run. Tonight’s irritation was caused by gnome-settings-daemon, which seems to have been designed with only local desktop/laptop users in mind, or at least completely fails to handle NFS shares that have gone away. A bit of digging determined the problem, but debugging this is beyond most users. Most users who start to find that applications won’t run or buttons won’t do stuff will just assume the world has imploded and do the Windows thing, rebooting their computers. I would love it if we would have either fewer of these random components or a lot more collective knowledge about their design and interaction (documentation, less frequently changing core pieces, whatever). That way, Google wouldn’t lead me only to pages from clueless users having the same problem telling each other to reboot their Linux computers. This is a problem. It is not a good future when everything is so tenuous that users have to reboot their computers to make things work right again.