-
Building a GPU SaaS Platform - Shared Proxy and SSH Access
Part 10: add a shared storage proxy, switch the accessor to dufs, and expose GPUUnit SSH through sidecars and frp.
-
Building a GPU SaaS Platform - Storage Data Jobs
Part 9: add storage prepare jobs, the first accessor path, and recovery state to GPUStorage.
-
Building a GPU SaaS Platform - Storage Lifecycle
Part 8: introduce GPUStorage, mount persistent data into GPUUnit, and separate data lifecycle from runtime lifecycle.
-
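A Rant About AI Coding
From the angles of open-source quality, architectural evolution, production reliability, and engineering accountability, why I oppose handing AI coding over entirely to an agent when there is no review or verification.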
-
Building a GPU SaaS Platform - One Unit, One Controller
Part 7: collapse stock and runtime into one GPUUnit resource, seed stock explicitly, and hand off warm units into active runtime.
-
Building a GPU SaaS Platform - Useful Operator Contracts
Part 6: add operation idempotency, Swagger docs, clearer controller failure status, and the first runtime template.
-
Building a GPU SaaS Platform - Operator Baseline
Part 5: move the project onto a standard kubebuilder layout, switch the API to Echo, and let requests create real custom resources.
-
Building a GPU SaaS Platform - Runtime Bootstrap in Go
Part 4: build the first runnable single-cluster runtime baseline with production-oriented engineering habits.
-
How to Speed Up Startup for a Large Docker Image?
The NVIDIA Docker image is so large that Kubernetes takes a long time to pull and extract it. How can we make it start faster?
-
Building a GPU SaaS Platform - The Design Spec
Let's write down the design spec for our GPU SaaS platform.