Nick Desaulniers

The enemy's gate is down

Mar 10, 2023 - 8 minute read - Comments - ARM

Disambiguating Arm, Arm ARM, Armv9, ARM9, ARM64, AArch64, A64, A78, ...

If you’re new to the Arm ecosystem, consider this a quick primer on terms you likely have seen before but might have questions about.

The Arm architecture is a family of Reduced Instruction Set Architectures (RISC) with simple addressing modes. Data processing is done on register operands otherwise relying on loads and stores to move data into and out of registers.

Arm Limited, the British company, stewards the Arm architecture.

ARM is a legacy acronym for Acorn RISC Machine, then Advanced RISC Machines. As we’ll see, with new advancements in the architecture, previous terms for things sometimes get renamed.

The Arm Architectural Reference Manual for A-profile architecture, affectionately referred to as the Arm ARM, is the programming manual for the architecture. If you’re doing anything with Arm assembly, you probably have this reference nearby.

Armv9 is the latest (as of this writing) in the family of architectures, featuring additions such as newer scalable SIMD vector (SVE2) and matrix (SME/SME2) operations and tracing functionality.

Armv9.4-A is the latest batch of extensions to Armv9. These extensions are documented in the Arm ARM. Some extensions are optional when introduced and many become mandatory in future revisions if they weren’t already when introduced.

The A in Armv9-A denotes the “Application Profile.” These support virtual memory via memory management units, and are what you’re likely to find on any Arm systems such as a phone, laptop, or server. There’s also the “R” profile for applications with real time system requirements, and “M” profiles which you’re more likely to find in microcontrollers which lack MMUs. The three architectural profiles are A, R, and M.

AArch64 is an execution state and was one of the larger additions with the introduction of ARMv8, which added support for 64b registers (31 general purpose registers, dedicated 64b stack pointer, 64b program counter that cannot be written to other than by branches or exceptions, and a zero-value pseudo-register) and addressing. At the same time, the AArch32 execution state was coined to refer to the legacy 32b functionality that folks were familiar with from ARMv7 (15 32b GPRs, no dedicated SP, PC is writable).

Curiously, the Arm ARM doesn’t mention the term ARM64; that seems to be a term preferred by Apple, Microsoft, and Linus Torvalds (that thread will always make me laugh, the maintainers of the port ultimately decided to use arm64 in the tree. The name ultimately makes sense; the arm64 Linux kernel port can execute userspace code in AArch64 or AArch32 execution states, though the kernel itself is AArch64-only).

If you want to learn about the calling convention (arguments are passed in which registers) used on these Arm systems, you might read the Procedure Call Standard for the Arm Architecture (aka AAPCS), which is published along with other documentation related to the ABI here. This made the previous APCS and TPCS standards obsolete. Apple platforms diverge from Arm’s ABI in specific ways. Microsoft also has docs (starting with a nice definition list like this post) on their ABI for Windows.

A64 is the instruction set introduced with AArch64. In fact, it is the only instruction set supported by AArch64. While registers in the AArch64 execution state are 64b, the instructions themselves are still only 32b (fixed width). A32 now refers to the older ISA, which was also 32b fixed width while T32 refers to the mixed 32b and 16b Thumb2 instructions. You may be familiar with those ISAs if you’ve worked with ARMv7 or older devices. A64 is a clean break from A32 and is a familiar but different ISA. For instance, much fewer instructions support predication in A64 than A32.

Not to be confused with A64, you might hear someone refer to a core as “being an A78,” or more formally Cortex-A78. Not only does Arm design the Arm architecture, but they also design implementations of the architecture which we call micro architectures. Regardless of the number that follows, if you see the terms Cortex or Neoverse, those are Arm-designed microarchitectures of the Arm architecture. Cortex-A78 for instance implements up to ARMv8.3 extensions. Wikipedia has a template that is a quick reference to the most recent Arm microarchitectures. Before we can talk more about microarchitectures in the Arm realm, we need to detour to topologies.

DynamIQ (and big.LITTLE before that) build upon the idea of using heterogeneous (different) cores rather than homogeneous (similar) cores for multi-core systems. I’m not sure this could still be considered symmetric multiprocessing. The advantage of this design is the flexibility to be good at different things at different times. We want large power hungry out of order processors to improve performance when we need it, but we might prefer slower in-order cores to help save power consumption (which would improve battery life). It’s interesting to see Intel doing something vaguely similar with the introduction of performance and efficiency cores in their Alder Lake microarchitecture.

By digging through the Technical Reference Manuals published by Arm for various microarchitectures, we can see an interesting evolution of support for various execution states with regards to various exception levels over time.

  • A55: “Both the AArch32 and AArch64 execution states at all Exception levels (EL0 to EL3).”
  • X1: “AArch32 Execution state at Exception level EL0 only. AArch64 Execution state at all Exception levels (EL0 to EL3)”
  • X3: “AArch64 Execution state at all Exception levels, EL0 to EL3.” [i.e. no AArch32 support]

If an SoC were to be composed of heterogeneous cores with varying levels of AArch32 support, that would place interesting constraints on the operating system’s process scheduler; you can’t run an AArch32 program on a core that doesn’t support it!

Below are some more legacy terms. They might still be relevant, depending on how old some systems you still support are.

ARM9 (not to be confused with Armv9, the version of the architecture) is a family of cores, some implementing ARMv4t, some ARMv5.

StrongARM was a series of ARMv4 CPUs built by Digital Equipment Corporation; Intel acquired this IP as part of a settlement of a lawsuit, and eventually designed their own ARMv5 microarchitecture called XScale. Eventually Intel sold the PXA SoC family which was using XScale to Marvell. One wonders what the world may have looked like had Intel stuck with XScale in addition to or instead of Atom.

ARMv4t introduced a compressed instruction set called Thumb. Instructions were 16b fixed width (that said, there were some oddities like BL and BLX that were actually encoded as a pair of 16b instructions each; implementations had to take care that exception returns worked correctly if an exception occurred in the middle of the pair; it was implementation defined if that could even occur).

ARMv6t2 introduced Thumb2 which added more instructions including some 32b wide instructions to support wider immediates, new instruction suffixes to differentiate between narrow vs wide encodings, and a Unified Assembly Language (UAL) that made it easier to write assembler that was valid in Arm or Thumb mode. This made Thumb no longer fixed width though. The introduction of execution states with ARMv8 renamed Thumb to T32; there was no such T32 term when these instructions were introduced!

You may come across the term aarch64be being used in the context of toolchains, which is referring to big-endian. Arm has supported bi-endianness since ARMv4, though most platforms these days use Arm in little-endian endian configuration. Big-endian is more common in networking appliances since network byte order is BE. -mlittle-endian and -mbig-endian are the compiler flags one might use to control codegen. ARMv4 and v5 supported a BE-32 bus byte ordering. Code linked with --be32 produced big-endian code and data. ARMv6 added a new bus byte ordering called BE-8. --be8 produced little-endian code and big-endian data (the compiler would emit big-endian code for relocatable files when built with -big-endian, then the linker would convert these to little endian when --be8 was used. This allowed compilers to not worry about byte-reversing code regardless of what bus byte ordering was to be used at the expense of linker complexity). ARMv6 had both BE-32 and BE-8 bus byte orderings (the older BE-32 became optional), though ARMv7 removed support for BE-32. This post shows why BE-8 replaced BE-32; it was simpler to support systems of both endiannesses if we used little endian instructions and had the memory bus reorder the bytes on access. ELF uses the file format identifiers elf64-littleaarch64, elf64-bigaarch64, elf32-littlearm, and elf32-bigarm; though those identifiers don’t appear in ELF for the Arm {64-bit} Architecture.

That’s a quick glossary over common terms related to the Arm ecosystem. Hopefully in a follow up post we can review terms like VFP, Neon, OABI, and EABI, but these are enough for now.

Many thanks to my friends Peter Smith, Kristof Beyls, and Mark Brown of Arm, Arnd Bergmann of Linaro, and Ard Biesheuvel of Google, for proofreading drafts of this post and supplying insightful feedback. Coincidentally, while I was taking my time editing this post, my friend and colleague Fangrui Song beat me to the punch which another great blog post touching on very similar topics; you should check out his blog if you like this kind of content!

Forking is not free; the hidden costs USPS as an ISP

comments powered by Disqus