Introduction and call for topics

This blog is a personal pet project of mine. I live and breathe computer architecture, semiconductor technology, and all things in between. I was originally a computer scientist, so I am self-taught when it comes to computer architecture, digital design, and other parts of the “full stack”. I don’t believe in arbitrary barriers between these disciplines. In fact, I believe that hardware and software are better when designed together, and that we as engineers are better when we see ourselves as being on the same team.

My aim over the coming months is to cover a mixture of topics. Some will be “introductory” (but arbitrarily detailed). For example, I will explain exactly how a program that has been compiled or assembled into machine code is executed on a modern out-of-order processor, from before fetch (prefetch, predecode, etc.) through to dispatch and backend dataflow execution. I will explain how speculation works and what kinds of predictors and state are kept. I will walk through what a cache coherency protocol is, how it works, and how it differs from memory consistency. I will explain how modern “heterogeneous” and “composable” systems work. I will attempt to provide useful examples and references to gem5 models and the like.

Some posts will dive into specific topics that are more advanced in nature. For example, I have recently been consumed with memory barriers and how one might speculate right through them using concepts familiar to those working on transactional memory. I might write this up at a high level and give a related example of a TSO implementation on x86.

My writing style will be technical in nature, but I intend for posts to be broadly accessible to those who work in the technology field (but not necessarily in microprocessor design). I will not cover anything proprietary or confidential to any one vendor, only general concepts.

Please reach out to me or comment below with ideas for topics you would like covered.

19 thoughts on “Introduction and call for topics”

  1. >In-memory processing
    >Row Hammer in DRAM
    >Out-of-order processing concepts
    >Advanced branch prediction


  2. As you are self-taught, maybe a good way to start is to point out the books and papers about computer architecture that you found really valuable and informative.


  3. Memory management in general, and virtual addressing in particular (especially from the software perspective)


  4. How about hardware interrupts on NUMA systems? Extremely large systems with thousands of cores struggle with balancing the interrupts coming from hardware. It would be interesting to get a hardware engineer’s take on the issue; coming from the software side, I see the struggles.


  5. Differences between a mobile CPU and a conventional one. How does the computer architecture handle power consumption for a mobile processor?


  6. Some topic suggestions to add:

    – How locks like mutexes are implemented, hence memory ordering
    – Different types of caching (e.g. write through, write back, etc)
    – Different types of caches (instruction, data, TLB, TSB, MMUs, etc)
    – How a page fault and hence traps work at the lower levels
    – How a context switch works at the lower levels
    – How a BMC works and interacts with a system
    – Different types of hardware buses and bus mechanisms (e.g. IPMB, SMBus, LPC Bus, I2C, JTAG, etc)
    – Power control and power states
    – Microcode, how it is implemented and how it is used
    – Multiprocessor vs multicore, chiplets, multiprocessing, threading, hyperthreading, etc.
    – Impacts of word size, cache line lengths, page sizes
    – TPMs and the entire secure boot / firmware area
    – Virtualisation implementation at the hardware level
    – How I/O works, particularly configuring and talking to HBAs, etc.


  7. Would love to read about & better understand how microcoded CPUs work & are implemented, with relevant examples.


  8. Hi, I would like a post about how software runs on an NPU, and the main conceptual differences from a regular Cortex CPU.
    Thanks Jon!


  9. A favorite quip is that the hardest software to change is the stuff in people’s heads. What are the trade-offs of new chip features that require people to learn new things (multi-threading, to a lesser extent store buffers, transactional memory, persistent memory) vs features that require little beyond a recompilation?


  10. tl;dr: if we were able to start both hardware and software from scratch, what could a vastly improved CPU/SW architecture look like? If such a paradigm switch is imaginable, even as a utopia, could it become reality? How?

    I teach an OS course. Every year I learn a bit more that makes me go “woah, only now am I realizing how many resources are wasted on this particular mechanism here”. I have a vague understanding of it all and how it came to be. I’d sum it up as: first there was hardware. At some point Von Neumann style CPU architectures became the mainstream. At some point C tried to be a portable assembler. The Bell Labs gang hacked together a multitasking OS. And that’s the end of it. From then on everything was in C, and the hardware tried to support the C model of how stuff is supposed to work (a heap, friggin pointer arithmetic, a stack) and the OS concept of a syscall/interrupt mechanism and virtual memory (I’m of course overnarrowing the perspective here). It seems like a stinking pile of history caught in an eternal maelstrom around itself, never again able to break free from the architecture decisions it made in the past.

    So if the whole industry/ecosystem could start from scratch, do away with historic decisions, and build everything based on the needs of modern high-level languages, language concepts, and software system architectures, what would such a new CPU look like? Is there any hope we can break free from today’s huge pile of historical liabilities/technical debt and move to something very different? Via which niche could that be achieved?

    Maybe I’m just naive, and my perspective that today’s tech is just a huge pile of historical liabilities is rooted in me not having a deep enough understanding of the whole tech stack and the reasons why there’s no other way to solve the given fundamental problems. In that case this whole request is moot. But is it? So that’s my topic proposal.

