<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>The Docs on Alb Kestrel</title><link>https://alb-kestrel-infra.pages.dev/posts/</link><description>Recent content in The Docs on Alb Kestrel</description><generator>Hugo 0.125.0</generator><language>en-us</language><lastBuildDate>Fri, 08 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://alb-kestrel-infra.pages.dev/posts/index.xml" rel="self" type="application/rss+xml"/><item><title>Hard Lessons in VRAM Allocation: SIGKILL and the OOM Killer</title><link>https://alb-kestrel-infra.pages.dev/posts/vram-management/</link><pubDate>Fri, 08 May 2026 00:00:00 +0000</pubDate><guid>https://alb-kestrel-infra.pages.dev/posts/vram-management/</guid><description>The Incident While testing the limits of my RX 6800 16GB on CachyOS, I attempted to load the Qwen2.5-Coder-32B model using llama-server.
The Data As my terminal logs show, the requested ROCm0 model buffer size was 16123.35 MiB. With only 319.04 MiB mapped to the CPU, the GPU was being pushed to its physical limit.
The Crash Because the model consumed nearly the entire 16 GB of VRAM, the Linux kernel faced a critical memory shortage, and the OOM killer responded by terminating the process with SIGKILL.</description></item></channel></rss>