Archive for the ‘Linux Kernel’ Category

Porting Linux: part 1 (of many)

Sunday, June 5th, 2011

So I’m working on a book at the moment, to be titled “Porting Linux”, which will cover the process of porting the kernel to new architectures (and platforms within those architectures). It happens to coincide with a number of interests of mine. Anyway, I thought I would start making some online notes about porting. This is the first in an ongoing series of mini-dumps of unorganized thoughts on the topics I am researching/working on for the book.

At a high level, a new architecture port[0] needs to cover the following topics:

  • Fundamentals – the bitness and endianness of the system (bitsperlong, byteorder, etc.). Stuff that goes in system.h includes read_barrier_depends handling, and instruction sync and memory barrier definitions.
  • Atomic and bit operations – atomic, bitops, swab, etc. Many of these are used generically by the reference asm-generic code and core kernel to implement higher level primitives.
  • CPU and caching – SMP setup, cache management, percpu bits, topology, procfs, etc. The CPU(s) are bootstrapped in head.S and friends, but then they need functions to handle non-MMU items such as IPIs, etc.
  • Init – Entry into the kernel, establishing exception vectors, calling into start_kernel. This is head.S and friends.
  • Interrupts and exceptions – IRQ setup, traps, entry, etc. The low-level exceptions might live in head, but they will call into generic C-level code to implement various functionality (specific higher-level functions for e.g. VM live elsewhere)
  • IO operations – IO, PCI root setup, legacy IDE bits, etc. Various miscellaneous stuff, especially generic panic-inducing inb/outb functions on modern arches without separated IO memory.
  • Library functions – Checksum support, asm-optimized stuff not specifically in another subsystem.
  • Locking – Spinlock support
  • Memory management – Init, faults, TLB setup, page management, MMU contexts, memcpy, strings, etc.
  • Modules – Load, unload, and relocation
  • Signals – Signal contexts, signal delivery, compat signal handling
  • Tasks – current macros, thread_info, unistd, process, mmap, ELF and auxiliary vectors
  • Time – timex, time setup
  • Linking – asm-offsets, linkage, symbols exported in assembly, etc.
  • Console drivers – early_printk support and a minimal character driver. The only driver work actually required for a port is being able to squirt stuff straight out the UART in early_printk, and minimally handling the boot console output.
  • Debugging – backtrace, opcode disassembly, stack unwind, ftrace, kexec, kgdb, kprobes, ptrace

Those are the areas that need to be covered for a minimally working port.
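To make the console bullet concrete: the guts of an early boot console really are just a loop squirting bytes at the UART. Here is a minimal C sketch of that shape — the uart_putc_fn indirection and all the names are mine (purely illustrative); on real hardware the putc routine would poll a status register and write a memory-mapped data register:

```c
#include <stddef.h>

/* Hypothetical byte-output callback: on real hardware this would poll
 * the UART's status register and store to its data register. */
typedef void (*uart_putc_fn)(char c, void *ctx);

/* The core of an early_printk-style console: push each byte out,
 * translating '\n' to "\r\n" as most UARTs/terminals expect. */
static void early_write(const char *s, uart_putc_fn put, void *ctx)
{
        while (*s) {
                if (*s == '\n')
                        put('\r', ctx);
                put(*s++, ctx);
        }
}

/* Test backend: capture bytes into a buffer instead of a real UART. */
struct capture_buf { char data[64]; size_t len; };

static void capture_putc(char c, void *ctx)
{
        struct capture_buf *b = ctx;
        if (b->len < sizeof(b->data) - 1) {
                b->data[b->len++] = c;
                b->data[b->len] = '\0';
        }
}
```

The callback indirection is only there so the loop can be exercised off-target; a real port would hard-code the MMIO accesses.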


[0] Based on studying recent ports (tile, microblaze, etc.) from the first patch to the last, and long-time established existing ports (ARM, PowerPC, x86, etc.).

The BeagleBoard [part 1]

Friday, December 31st, 2010

So I’ve been dissecting the BeagleBoard-xM OMAP3 port for a book project I’m working on in my spare time. This has also seen me brush up on my ARM assembler (PowerPC, POWER, and so forth are my first loves), in particular stuff that happens in “SVC” (Supervisor) mode. Here are some notes that might be of interest to others getting into Beagle or ARM porting (part 1).

ARM essentials

First, you should know that ARM (as it stands, in v7 of the architecture; we’ve yet to see the new 64-bit ISA released in v8 or whatever that will be) is a 32-bit processor IP core that uses fixed-width 32-bit instructions with various forms of instruction encoding. The ISA is designed to be simple, but it does include some very flexible instructions. For example, almost every instruction can be conditionalized based upon 4 bits of condition state (and that state is typically not updated automatically within the ALU unless it is specifically encoded into the affecting instruction), and some instructions (additions, etc.) can include shifts of one of the operands (giving you DSP-ish instructions for free). Using conditionalized instructions keeps the pipeline full as opposed to having a lot of branching, and can render very small and tight sequences. Modern ARM cores implement a modified Harvard architecture, as do most “Harvard architecture” processors these days.
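To see what conditionalized instructions buy you, here is the C-level shape of a branchless maximum — on ARM a compiler can emit this as a cmp followed by a predicated movgt, with no branch at all (a sketch of the idea, not actual compiler output):

```c
/* The if-body below is exactly the kind of thing ARM's conditional
 * execution turns into straight-line code:
 *     mov   r0, a      ; unconditional
 *     cmp   b, a       ; set the NZCV condition flags
 *     movgt r0, b      ; executes only if b > a -- no branch taken
 * (assembly shown in comments is illustrative). */
static int max_branchless(int a, int b)
{
        int r = a;
        if (b > a)
                r = b;
        return r;
}
```

The pipeline never has to speculate past a branch here, which is why short predicated sequences were such a win on the classic three-stage cores.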

There are 31 general purpose registers, but not all of these are available at any one time. In fact, it is easier to think of ARM as having 16 “general purpose” registers with some banking being used in certain modes (such as FIQ or IRQ – Fast or regular IRQs) to replace a subset of the registers with a context-specific special set. Even better, think of it as 13 general purpose registers, with r13 used as the stack pointer (sp), r14 as the link register (lr), and r15 used as the program counter (pc). Better still, think of it according to the ABI calling convention, with r0-r3 used for argument passing, r4-r11 used for local variables, r12 as a function call scratch register, r13 as sp, r14 as lr, and r15 as pc. Some instructions assume the use of sp, lr, and pc (write something into r15 and that will be taken as a branch address), but generally register use is flexible. Of the 4 different stack modes supported, ARM conventionally uses a “Full Descending” stack in which the sp points to the last value written, but you can use any of the “store multiple” or “load multiple” instructions either with their “stack” friendly names (e.g. stmfd), or directly (stmdb). Processor state is encoded in the CPSR (Current Program Status Register), and saved in the SPSR (Saved Program Status Register) when handling various exceptions (traps).
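A quick model of the “Full Descending” convention, sketched in C with an array standing in for memory (all names are mine): a push decrements sp first and then stores, so sp always points at the last value written, exactly as stmfd/stmdb behave.

```c
/* Toy Full Descending stack: sp indexes the last value written.
 * fd_push mirrors stmfd (decrement-before-store); fd_pop mirrors
 * ldmfd (load, then increment). Illustrative only. */
#define STACK_WORDS 16

struct fd_stack {
        unsigned int mem[STACK_WORDS];
        int sp;                      /* index of last value written */
};

static void fd_init(struct fd_stack *s)
{
        s->sp = STACK_WORDS;         /* empty: sp is one past the end */
}

static void fd_push(struct fd_stack *s, unsigned int v)
{
        s->mem[--s->sp] = v;         /* decrement first, then store */
}

static unsigned int fd_pop(struct fd_stack *s)
{
        return s->mem[s->sp++];      /* load, then increment */
}
```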

The ARM ISA is intentionally extensible. Either ARM, certain third parties (in limited situations), or co-processors may extend it. Co-processors are a means for the ARM core to remain simple while offloading certain functions to additional cores (that might actually be part of the same die). For example, the original FPA (Floating Point Accelerator) was implemented as a co-processor, and has long since been superseded by the VFP (Vector Floating Point) co-processor (currently at version 3 thereof). Another co-processor you might care about is cp15, the MMU or VMSA (Virtual Memory System Architecture). When the ARM sees an instruction it doesn’t recognize, it will try to find one of 16 possible co-processors that can implement it (each has a special pre-determined number, such as cp15 for the MMU) before hitting the undefined instruction handler. The latter allows some instructions to be implemented in software, via traps, such as the older floating point emulation in the Linux kernel. There are special instructions for copying data to/from co-processors, and each has its own self-contained/self-defined register set.
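The co-processor dispatch described above amounts to a table lookup: offer the instruction to the numbered co-processor, and fall through to the undefined-instruction handler if nobody claims it. This toy C model (all names illustrative; nothing like the hardware’s actual mechanics) captures the control flow:

```c
/* Sixteen possible co-processor slots, cp0..cp15. A handler returns
 * 1 if it accepted the instruction. All of this is a software sketch
 * of a hardware mechanism -- names are made up. */
#define NUM_COPROCS 16

typedef int (*cp_handler_t)(unsigned int insn);

static cp_handler_t coprocessors[NUM_COPROCS];

/* Stand-in for cp15, the MMU/VMSA co-processor. */
static int cp15_handler(unsigned int insn)
{
        (void)insn;
        return 1;                    /* claimed */
}

static void register_cp15(void)
{
        coprocessors[15] = cp15_handler;
}

/* Offer the instruction to the numbered co-processor; if nobody is
 * there, fall through to the undefined-instruction trap (-1 here),
 * which is where software emulation (e.g. old FP emu) hooks in. */
static int dispatch_insn(unsigned int insn, unsigned int cp_num)
{
        if (cp_num < NUM_COPROCS && coprocessors[cp_num])
                return coprocessors[cp_num](insn);
        return -1;
}
```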

Modern ARM cores implement a very long pipeline internally, but the original design was three stage. And this leaks out into the visible value of, for example, the pc (which is generally 8 bytes away from where you think it should be – a different pipeline stage). This is the reason why pc-relative addressing can be confusing (though assemblers help to deal with this). Another issue is that the ARM fixed-width instructions didn’t traditionally have a direct parallel to the PPC “li”/“ori” address loading sequence. Instead, a 4K pc-relative address could be easily loaded in one instruction, a 64K pc-relative address in two, and certain other addresses could be loaded by means of the built-in barrel shifter used in (e.g.) the addition instruction or by using a “literal pool”, depending upon the assembler. This means that traditionally, only certain addresses could be loaded with an “LDR” “meta” assembly instruction, and that the assembler would try to replace your LDR with appropriate shifts (in possibly several instructions), etc. If it were unable to do so, it would generate a warning. You could also load addresses directly from memory and jump to them by loading them into the pc. ARM “bl” relative branches (which use a direct label address) have a 32-MB limit, traditionally requiring trampolines to handle larger jumps[0]. Since ARMv6T2, there is a movt instruction for loading the top 16 bits of a register (leaving the lower alone), and since ARMv7 there is even a movw (load a value into the lower 16 bits and zero the upper part), which now allows a movw/movt combination akin to the li/ori combination on PowerPC. A movw/movt combo is used by, e.g., GCC when generating longer jumps (in which it loads the pc directly).
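The movw/movt pairing is easy to model: movw loads the low 16 bits and zeroes the top, movt replaces only the top 16 bits. A small C sketch of the split (hypothetical helper names, not real intrinsics):

```c
#include <stdint.h>

/* movw: load a 16-bit immediate into the low half, zeroing the top. */
static uint32_t movw(uint32_t imm16)
{
        return imm16 & 0xffffu;
}

/* movt: replace only the top 16 bits, leaving the low half alone. */
static uint32_t movt(uint32_t reg, uint32_t imm16)
{
        return (reg & 0x0000ffffu) | ((imm16 & 0xffffu) << 16);
}

/* Any 32-bit constant in exactly two "instructions", akin to the
 * li/ori pair on PowerPC. */
static uint32_t load_constant(uint32_t value)
{
        uint32_t reg = movw(value & 0xffffu);  /* movw rd, #:lower16: */
        reg = movt(reg, value >> 16);          /* movt rd, #:upper16: */
        return reg;
}
```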

The ABI for ARM was changed a few years ago from the OABI (old ABI) to the EABI (not to be confused with the PowerPC ABI of the same name). In particular, system calls are no longer implemented by passing the system call number in the “comment” field of the system call instruction; instead the number is passed in a register (faster than having the kernel poke at memory to find what number was contained within that instruction). Also, 64-bit values are passed using sequential registers, beginning on an evenly aligned register number. The ARM implements a limited number of trap vectors, traditionally located at 0x0000_0000 but today possibly relocated high at 0xffff_0000 (this is the preference of the kernel, if it is possible, since it avoids having the NULL physical page over-used).

Memory alignment is extremely important to ARM. Not only did it not have things like floating point for the longest time, but to keep the design simple, unaligned memory accesses weren’t really supported historically. Modern processors do have some support for handling unaligned accesses, but you generally want to avoid them. ARM deals in Bytes, Half-Words (16-bit), and Words (32-bit), none of this IA32 historical nonsense of “double words”. 64-bit values can be handled by means of multiple word instructions, and also manipulated in user code on some systems by use of the VFP, and so forth.
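When you do need to touch data that might be unaligned, the portable answer is byte loads plus shifts rather than a single word load that older ARM cores would rotate or fault on. A little-endian C sketch (hypothetical helper name, in the spirit of the kernel’s unaligned-access helpers):

```c
#include <stdint.h>

/* Read a 32-bit little-endian word from an address of any alignment,
 * one byte at a time. Byte accesses are always aligned, so this is
 * safe even on cores with no unaligned-access support. */
static uint32_t get_unaligned_le32(const uint8_t *p)
{
        return (uint32_t)p[0]
             | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16)
             | ((uint32_t)p[3] << 24);
}
```

The cost is three extra loads and some shifting, which is exactly why you want your structures laid out to avoid needing this in hot paths.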

ARM don’t make processors directly. They define the ISA (currently v7, though processors still implement v4 and v5, etc.) and provide reference designs in the form of soft and hard (netlists, gates, etc.) cores to third parties. Depending upon the licensing (whether it is a foundry or merely a regular licensee), that third party might only have the right to implement what they are given. But if they have special foundry access, they can actually change the design. In any case, certain CPU functionality is optional. In the past, these were designated by letters after the CPU (e.g. “TDMI” for older cores implementing the then-optional Thumb instruction set, supporting virtual memory, debug, etc.), but today profiles are generally used instead. Cortex (the latest generation) provides for A, R, and M profiles, of which the Application profile (A) is what we care about.

ARM kernel

Execution of the ARM kernel begins in the inferred standard location of arch/arm/kernel/head.S, at the place very obviously labeled with “Kernel startup entry point”. At this point, the MMU must be off, the D-cache must be off, I-cache can be on or off, r0 must contain 0, r1 must contain the “machine number” (an ARM Linux standard assigned number, one per machine port, passed from the bootloader code), and r2 must contain the “ATAGS” pointer (a flexible data structure precursor to things like fdt and device trees that allows a bootloader to pass parameters). First, the processor mode is quickly set to ensure interrupts (FIQ and IRQ) are off, and that the processor is properly in Supervisor (SVC) mode. Then, MMU co-processor register c0 is copied into ARM register r9 to obtain the processor ID. This is followed by a call to __lookup_processor_type (contained within head-common.S, the common file for both MMU-enabled and non-MMU enabled ARM kernels – the latter are not covered by this document).

__lookup_processor_type behaves like a number of other functions in the early kernel setup. First, you should note that it has a C-ABI companion (used later, in higher level code) called lookup_processor_type that preserves stack and calling semantics for use from C code, then calls into __lookup_processor_type. Like so many of these functions, __lookup_processor_type has a __lookup_processor_type_data friend that contains three data items:

  • The address of the structure
  • A pointer to the beginning of an array of “processor info” structures
  • A pointer to the end of an array of “processor info” structures

The function loads the address of __lookup_processor_type_data into r3, then uses an ldmia instruction to load the members of that structure into r4-r6. Since the first member of the structure contains its own linked (virtual) address, and r3 contains the (current) physical address, a simple subtraction is used to store into r3 the offset between the linked address of __lookup_processor_type_data and the actual address. This offset is then applied to registers r5 and r6 (containing the second and third members of that structure – __proc_info_begin and __proc_info_end). A loop is then entered to iterate over the proc info structure array and find a matching known processor. If a matching processor is found, its physical address is preserved in r5 on return, otherwise #0 (NULL) is written into it so that the calling code can determine whether a processor was found.
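The offset trick used by __lookup_processor_type – store your own link-time address, subtract it from where you actually are, and apply the delta to every pointer – can be modelled in C like this (a simulation with made-up structure and field names, not the kernel’s actual code):

```c
#include <stdint.h>

/* The first field holds the address the linker assigned to this very
 * structure; comparing it with the address we find ourselves at yields
 * the physical-vs-virtual fixup delta. Names are illustrative. */
struct proc_table {
        uintptr_t linked_self;   /* link-time address of this structure */
        uintptr_t proc_begin;    /* link-time address of array start */
        uintptr_t proc_end;      /* link-time address of array end */
};

static void fixup_table(struct proc_table *t, uintptr_t actual_self)
{
        /* delta = where we really are - where the linker thought */
        uintptr_t delta = actual_self - t->linked_self;

        /* Apply the same delta to every embedded pointer so the table
         * is usable before the MMU maps things at their linked homes. */
        t->proc_begin += delta;
        t->proc_end   += delta;
}
```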

Once the processor type has been determined, a very similar function called __lookup_machine_type is called to find the machine type passed by the bootloader, and get the “machinfo” structure describing the machine in question. If either of these functions fails, a call is made to __error_p, which can be compiled (in debug mode) to output an ASCII error message to the UART (byte by byte), but at least gives a point to pick up in a hardware debugger. Next, a call is made to __vet_atags (which checks that the ATAGS in r2 begin with the ATAG_CORE magic marker[1], and so forth), then sometime later we call __create_page_tables to fudge up initial page tables covering just the kernel code and data. After that, we call through a few functions beginning with __enable_mmu that actually cause the MMU to get enabled and for the return to be to __mmap_switched (also in the head-common.S file). __mmap_switched also has an __mmap_switched_data, which is used to store various global values (such as processor_id, __machine_arch_type, __atags_pointer, and so forth). After zeroing out the BSS, __mmap_switched causes the high-level kernel start_kernel function to be entered as usual.

start_kernel (init/main.c) does a lot of things on every architecture/platform. These include calls to setup_arch, which on ARM calls setup_machine. This latter function returns an mdesc (machine descriptor) which is used in setting up any ATAGS, and for paging_init (which calls devicemaps_init, which uses the mdesc to see if any special device IO maps are needed early on, for example for debugging and so forth). If ATAGS are not specified, a default init_tags set is used that defaults the machine to e.g. a MEMSIZE of 16MB RAM, and other limited (but sane) defaults. After running through other generic early startup code, start_kernel causes the kernel_init thread to run, which calls do_basic_setup. do_basic_setup calls various initcalls, including the special customize_machine initcall. customize_machine calls init_machine (e.g. omap3_beagle_init), which adds various platform devices and does other board specific setup.
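The initcall mechanism amounts to walking a table of function pointers in order; in the real kernel the table is assembled by the linker from special sections, but a toy static version (illustrative names only) shows the control flow that eventually reaches customize_machine:

```c
/* An initcall returns 0 on success, like the kernel's. All names here
 * are made up for the sketch; only the shape matches the real thing. */
typedef int (*initcall_t)(void);

static int calls_made;

static int early_setup(void)       { calls_made++; return 0; }
static int customize_machine(void) { calls_made++; return 0; }

/* In the kernel this array is built by the linker out of per-level
 * .initcall sections; here it is just a static table in order. */
static initcall_t initcalls[] = { early_setup, customize_machine };

static int run_initcalls(void)
{
        unsigned int i;

        for (i = 0; i < sizeof(initcalls) / sizeof(initcalls[0]); i++)
                if (initcalls[i]())
                        return -1;           /* an initcall failed */
        return calls_made;
}
```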

When the board specific init function is run, it will typically call platform_device_register for each known platform device (you can see these in /sys/devices/platform for example). Some of these platform entries include a driver_name field, which will cause the actual device driver to be loaded, but others contain more limited data. For example, in the case of EHCI (USB) on the Beagle, an ehci platform device is registered, but what causes the actual USB driver to load (if not built-in) is that the OMAP EHCI driver contains a MODULE_ALIAS("platform:omap-ehci") that udev and modprobe will cause to load on boot. In the case of OMAP, it might seem like a lot of hardware addresses and so forth are missing from the platform registration, but this is because the hardware is standardized (EHCI is always in the same place for a given OMAP, and hard-coded into the USB driver). One thing that does differ is GPIO MUX assignment, and these pins/values are poked in the init_machine code for the Beagle.

That’s all for now. More later.


[0] Modern ARM also implements the Thumb instruction “BLX” (Branch with Link and optional eXchange) that can be used in combination with a series of assembler-generated mov/orr sequences to load an address into a register and branch to it. The older “bl” requires a pc-relative label address to jump to. In fact, when you use a sequence like “ldr reg, address” and “blx reg” the assembler may in fact generate 5 or more actual instructions using mov, orr, and bx or blx to perform the actual load and branch. Rather than switching to Thumb mode, the modern movw/movt combination is preferable for loading arbitrary addresses.

[1] I do wonder what significance this particular number has to rmk. If anyone knows why it was chosen – a birthday? or other date? etc. please do let me know.

Using OpenOCD with BeagleBoard-xM

Monday, December 13th, 2010

NOTE: This is an extremely technical blog post intended only for very serious embedded Linux developers. It was written mostly for the benefit of Google. I wish that this posting had existed over the weekend when I was looking at this.

So I now have several Texas Instruments BeagleBoard-xM boards based upon the TI DM37x series Cortex-A8 based SoC. The “xM” is an improvement on the original BeagleBoard that features the improved DM3730 in place of the original OMAP3530. Like its predecessor, the DM37x series includes Texas Instruments’ on-chip “ICEpick” JTAG TAP controller that can be used to selectively expose various additional JTAG TAPs provided by other on-chip devices. This facilitates selective exposure of these devices (which may not be present in the case of the “AM” version of the chip, or otherwise might be in a low-power or otherwise disabled state), because it is nice to be able to ignore devices we don’t want to play with today. The ICEpick itself can handle up to 16 devices, although the majority of those are “reserved” for the scarily more complex days of tomorrow. If you want some of the details, refer to chapter 27 of the DaVinci Digital Media Processor Technical Reference Manual (TRM) (under “User Guides”).

The full possible DM37x JTAG chain contains the following devices (in OpenOCD ordering, from TDO to TDI – the inverse of the physical ordering of the bus, which starts with the ICEpick at 0 and ends with the DAP at 3): DAP (Debug Access Port), Sequencer (ARM968), DSP, D2D, ICEpick. It’s the DAP that we really care about, because it is through the Debug Access Port that we will poke at the system memory address space and registers within the Cortex-A8 processor. At Power ON, the default JTAG chain configuration will depend upon the strapping of the EMU0 and EMU1 lines (which are exposed on the Flyswatter as jumpers adjacent to the JTAG ribbon cable). If both of these lines are high (pins 1-2 not 0-1 on the Rev-B Flyswatter, i.e. nearest to the ribbon cable) – which they had better be if you want an “out of the box experience” – then the only device to appear in the chain will be the ICEpick. OpenOCD is expecting this, because it contains special macro functions that will instruct the ICEpick through its control command registers to expose any additional TAPs after initialization (we actually only want the DAP in the default case, which is how OpenOCD will behave for you if you don’t change it). Each time a new TAP is exposed, the OpenOCD JTAG logic will do the necessary state transition for it to take effect (for the chain to lengthen or contract according to devices appearing and disappearing in the chain).

The default upstream version of OpenOCD does not work with BeagleBoard-xM at the moment (as of v0.4.0-651-gc6e0705 – last modification on December 8th) because some logic was recently added to master that attempts to handle busticated DAP ROM addresses on a Freescale (not Texas Instruments) IMX51 processor, which is very similar to the DM37x. Due to this “fixup”, the DM37x’s correctly provided ROM table information will be “fixed” to use the wrong DAP address, which won’t work. For the moment, find the following loop in src/target/arm_adi_v5.c:

        /* Some CPUs are messed up, so fixup if needed. */
        for (i = 0; i < sizeof(broken_cpus)/sizeof(struct broken_cpu); i++)
                if (broken_cpus[i].dbgbase == dbgbase &&
                        broken_cpus[i].apid == apid) {
                        LOG_WARNING("Found broken CPU (%s), trying to fixup "
                                "ROM Table location from 0x%08x to 0x%08x",
                                broken_cpus[i].model, dbgbase,
                                broken_cpus[i].correct_dbgbase);
                        dbgbase = broken_cpus[i].correct_dbgbase;
                }

Wrap it with a #if 0 (and don't forget to also comment or define away the definition of "i") such that it won't take effect. Then, if you have a Rev-B xM you will also need to edit the file tcl/target/amdm37x.cfg to add a new TAPID for the undocumented revision recently made to the ICEpick on these TI parts (this is backward compatible with the Rev-A because it will now look for both):

   switch $CHIPTYPE {
      dm37x {
         # Primary TAP: ICEpick-C (JTAG route controller) and boundary scan
         set _JRC_TAPID  0x0b89102f
         set _JRC_TAPID1 0x1b89102f

and, further down in the same file, make the tap declaration expect both IDs:

   # Primary TAP: ICEpick - it is closest to TDI so last in the chain
   jtag newtap $_CHIPNAME jrc -irlen 6 -ircapture 0x1 -irmask 0x3f \
      -expected-id $_JRC_TAPID -expected-id $_JRC_TAPID1

Then make and install the OpenOCD binaries (ensure you have the free FTDI libraries installed, and configure with “configure --enable-maintainer-mode --enable-ft2232_libftdi”). You can now use OpenOCD with the xM Rev-A or Rev-B.


Open Hardware Summit and Maker Faire New York

Thursday, September 16th, 2010

So I decided to attend the first Open Hardware Summit next Thursday (September 23 2010) in Queens, New York. I’ll drive down on Wednesday and I’m staying nearby at one of the airport hotels. Open Hardware is one of the next big things that is already starting to influence the Open Source community, and I think it’s important to keep up with what’s going on. If you’re going to be there, give me a shout.

After the conference, I am debating taking Friday off and driving down to D.C. and Virginia for the day. I’ve yet to see Monticello (Thomas Jefferson’s home) and I’d so much like to tour the library and grounds. If I do that, I’ll drive down after the conference and stay nearer Virginia in the evening, or nearer D.C. (I also want to see the Declaration of Independence as I keep missing it when I go to D.C.). The idea of a day living in my imagined 18th century world of rose-tinted American idealism is nearly always more enjoyable than many rational people might think it ought to be :)

I’ve bought a ticket for Maker Faire New York for the weekend, thinking I might then drive back to Boston on Saturday and stop off in New York on the way back for a few hours. Something like that. Just as long as I’m back in time to spend Sunday writing!


Netbook recommendations sought

Wednesday, September 15th, 2010

So I’m considering finally entering the modern age with a netbook. I want a 64-bit capable Atom based N450 or N475+ system, to which I will add my own SSD (something that actually is known to work with “TRIM” and not just claims to, and not whatever it may come with). I intend to run Fedora on it, for development purposes, so there’s no point in looking to Android or other environments. It’ll be solely for ease of hacking, battery life, etc. It’ll probably run Rawhide and test kernels, etc. I have a shiny laptop for the times when a shinybook is required to fit in at the coffeeshop.

I’ve seen the more interestingly unusual stuff from Nokia and friends, but it’s pretty much come down to a choice between ASUS and Acer (apologies to the Dell Mini). I don’t need built-in 3G, nor WiMAX, because in the US those are not sane and sensible choices at the moment. I think, on balance, that the ASUS Eee PC 1015PED is probably the best choice right now, over the Acer Aspire One. The Acer does offer the same kinds of things, but the reviews aren’t treating it as nicely and the ASUS has a reasonably long line of heritage by now. So I think it’s basically the ASUS…but this post is intended to catch the case that I’m missing something very obvious.

Anyway. I’ve seen the Fedora wiki pages, various Google feedback on the ASUS, and I also know how to use the Internet :) So I don’t need generic advice of the form: “hey, look at this webpage blah blah blah”, nor do I care if I need to compile some driver or do something on a unit that will already be for hacking anyway. What I would like is a few specific, personal recommendations of the form: “yep, I have that and it’s a great choice”, or “I have this and find it better because…”. Jeremy bought one of the older Acer Aspire Ones, so that’s an interesting data point. You out there in Fedora land, what are you running right now in the way of a netbook?


The risk of upgrades

Saturday, August 21st, 2010

Sometimes it seems like we’re living in a world of two kinds of Open Source. On the one hand, we have those who like to run unstable/rawhide type of systems, with the latest kernels, and who feel that anything older than ten minutes is still in the stone age. These people are usually paid to work on such things, have a lot of free time not spent doing other stuff, etc. The very notion that you might not upgrade every system the moment – nay the femtosecond – that the new version is out obviously means you’re not cool enough for school, even if what you have is working well enough for you right now. That means that the desktop on which you only surf the web – not the system you actually do kernel compiles on – must be running the latest possible stuff released 2 hours ago, “just because” (insert no useful reason here).

On the other hand, we have “Enterprise” types who install something to solve a problem, and then have to pay real, tangible, money to upgrade/change. Testing costs money and takes time (and I mean real testing, not the throw-it-over-the-wall-and-hope kind – but hey, it’s new so it must be better and worth breakage, right?), and if it ain’t broke, why fix it? Seriously. If an older version of a Linux distribution with an older kernel works for you, and you can still get essential security fixes, then great. More power to you. This is where Open Source should be offering a compelling choice not to upgrade, if you don’t want to. Incidentally, that’s the reason you don’t see updates for lots of older embedded gadgets – as I pointed out at Linux Symposium when explaining how the average (non-geek) consumer doesn’t necessarily “need” Android 2.2 the moment it is first built in beta. Doing an OTA upgrade for something that already works introduces the risk of bricking many units and incurring cost. It doesn’t mean it’s not fun to upgrade your phone to the latest Android test build, or that some manufacturers won’t choose to do it as a value-add, but it does mean it’s something you can do on your own dime. If it breaks because they upgraded it for you, they pay. If it breaks because you couldn’t leave it alone, it’s your fault. Keep both pieces.

I used to fall more into the camp of wanting the latest and greatest on every machine. Back when I enjoyed spending a weekend configuring APS to make my printer work with just the most ultra-pointlessly geeky layout for text files sent to lpr, I enjoyed re-installing, upgrading, and generally playing around. After all, this was before I really finally realized there is more to life than computers all the time. Over time, like many others, I grew up and realized that some things which work can appropriately be left alone without the universe exploding. Sure, we don’t want 40-year-old unmaintainable software disasters driving our government infrastructure of tomorrow, but there’s a medium somewhere in there too. I hope that, as the Open Source community matures, more people will come to appreciate this fact. By all means develop using the latest and greatest, but spare a thought for accepting that not everyone out there in the user community is as excited about wasting a day/weekend doing an upgrade unless or until they have to.


Intel/AMD CPU catchup

Sunday, March 28th, 2010

So I decided roughly a decade too late that the POWER/PowerPC/SPARC RISC fanboy in me probably needed to (reluctantly) accept that x86 won the war, which means I’ve been brushing up on my x86-64 assembler (by poking in head_*.S, which is correctly written in AT&T syntax, and not that other syntax I shall not name) and trying to catch up on what the heck Intel and AMD are working on in terms of roadmaps, etc. I now know how REX prefixes work on “Intel 64”, for example. And I’ve read the recent changes to the ABI (Fortran support included!). I shall poke at the recent SVM/nested page tables stuff sometime for fun. Oh, and I don’t care that I’m not an IA32 assembly guru, I shall focus on flat x86-64 and forget about last century’s segmentation and other ye olde bank switching inspired hacks.

This weekend, I’ve gone over all of the recent models and public announcements, read some IDF bits, and learned about Intel’s QPI (as opposed to the one I knew, AMD HT – QPI is basically trying to throw off the FSB, but it does some nice failover things HT does not include AFAIK). I’ve concluded that the model numbers used by these guys these days are way, way too confusing. Even more so than when I last really cared about this stuff – determining which “Xeon” has Intel-VT or AMD-V is a game of looking up lots of 4 digit model numbers, where a simple naming formula somehow including reference to the microarchitecture used in the model “name” would suffice to convey far more useful information. But, none of this stuff is your grandfather’s x86. It’s every bit as capable (in x86-64 anyway) of taking on the other Big Endian arches I have always personally preferred.

I expect to do a lot more to keep up with x86 development rather than letting my own personal academic fondness for cleaner ISAs limit my exposure. I’m thinking about getting another older Xeon build/test box for playing with x86 stuff and for speeding up kernel compiles at home – perhaps a used Dell PE1950 or Precision 490, as these have the best bang for buck ratio at the moment. What I would like to know, from anyone who bothered to read this far, is where should I be going to get the very latest information on x86 developments? I’m on the k.o lists, and I am specifically not a game playing weenie who cares about that stuff – I want to know about roadmaps, things like the new AES extensions, etc. I don’t care that the “whizzbang X1234 blah blah would look uber l33t in this plexiglass case I just bought on eBay”.