Anatomy of a Thinking Machine

A technical walkthrough of hardware inference processes, exploring Nvidia and AMD tools, and an early demonstration of Cortex, an open‑source multi‑platform LLM runtime.

Overview

We all know that AI has a hardware problem.

What happens at a hardware level during inference?
What exactly are all these tools in the inference ecosystems from Nvidia, AMD…?
An early preview of Cortex, an open source tool that runs LLMs across multiple platforms

Here’s my cofounder Dan doing a similar talk, but I’m planning for this demo to be shorter & purely technical:

https://www.youtube.com/watch?v=orcPcUzSbOw&ab_channel=HackerHouseTW

Video

Links

https://github.com/janhq/cortex
Local C++ platform runs GGUF models via OpenAI-compatible API.
https://jan.ai/cortex
Jan is an open-source local LLM inference engine providing an OpenAI-compatible API server.

Tech stack