Tiny Startup Humiliates Nvidia and AMD: Runs 700B AI Models on a $0 Cloud Bill Using Decade-Old Chips

Q: What is the Skymizer HTX301 PCIe AI accelerator card?

The Skymizer HTX301 is a PCIe AI accelerator card built on Skymizer's HyperThought LPU platform. It packs six HTX301 chips and up to 384 GB of LPDDR4/LPDDR5 memory onto a single card, and Skymizer claims it can run LLMs with up to 700 billion parameters while consuming only 240 watts of power.

Q: How does the HTX301 run 700 billion parameter LLMs on a single card?

The HTX301 uses Skymizer's HyperThought LPU platform with efficient compression of both model weights and KV cache, allowing it to fit 700B parameter models into 384 GB of standard LPDDR memory. It outperforms llama.cpp by 9 to 17.8 percent on the same workloads according to Skymizer's benchmarks.

Q: How does HTX301 compare to Nvidia RTX PRO 6000 in power consumption?

The HTX301 consumes 240 watts. Nvidia's RTX PRO 6000 Blackwell consumes roughly 600 watts — more than double — for comparable LLM inference tasks. AMD's Instinct MI350P sits between the two at higher wattage than the HTX301 but with 144 GB of HBM3E memory.

Q: Does the HTX301 really use only 240 watts of power?

That is Skymizer's claimed figure. Independent verification hasn't happened yet — Skymizer is previewing the card at Computex 2025, where third-party testing will give a clearer picture of whether the 240W figure holds under real workloads.

Q: What memory does the Skymizer HTX301 use instead of HBM?

The HTX301 uses standard LPDDR4 and LPDDR5 memory rather than the expensive HBM3E used by AMD's Instinct MI350P or the GDDR solutions in many Nvidia cards. This significantly reduces cost and eliminates the need for specialized cooling infrastructure.

Q: When will the Skymizer HTX301 be available to buy?

Skymizer has not announced a commercial release date or pricing. The card will be publicly previewed at Computex 2025, and availability details are expected to follow after the event depending on how independent testing goes.

Q: Can the HTX301 replace a GPU cluster for enterprise AI workloads?

For LLM inference specifically, Skymizer claims it can. The card fits into a standard PCIe slot in any air-cooled server, requires no infrastructure changes, and supports agentic AI workflows for coding and automation. Whether it matches GPU clusters for training workloads is a separate question — the HTX301 is optimized for inference, not training.

By Rachel Wong Senior Editor

Published: May 11, 2026 Updated: May 11, 2026 9 Min Read

Tiny Startup Humiliates Nvidia and AMD: Runs 700B AI Models on a $0 Cloud Bill Using Decade-Old Chips

A Taiwanese company called Skymizer just announced a PCIe AI accelerator card that, on paper, does something Nvidia and AMD cards costing far more cannot: run a 700 billion parameter language model on a single device while drawing just 240 watts of power. The card is called the HTX301, and the specs alone are enough to make anyone in enterprise AI infrastructure sit up and pay attention.

But the real story here isn't just the performance claim. It's how Skymizer achieved it — using 28-nanometer chips that are a decade old and standard LPDDR4 and LPDDR5 memory instead of the expensive HBM solutions that define modern high-end AI hardware. The company will show the card at Computex, and that event will tell us whether these numbers are real or just well-crafted marketing.

What Is the Skymizer HTX301 and Why the AI World Is Stunned
HTX301 Full Specs: 384GB Memory at Just 240 Watts
The 28nm Trick: How Old Chip Architecture Outperforms Modern GPU Design
HTX301 vs Nvidia RTX PRO 6000 vs AMD Instinct MI350P: Honest Numbers
Which Businesses Should Actually Buy This Card
Computex Reality Check: Will HTX301 Deliver What Skymizer Promises
Frequently Asked Questions

What Is the Skymizer HTX301 and Why the AI World Is Stunned

The HTX301 is built on Skymizer's HyperThought platform, which uses a next-generation LPU — language processing unit — architecture designed specifically for LLM inference workloads. Each card holds six HTX301 chips working in parallel and offers up to 384 GB of total memory. Skymizer claims the card delivers 30 tokens per second at 0.5 TOPS across 100 GB per second of memory bandwidth.

The stunner isn't the raw throughput number. It's the power figure. Nvidia's RTX PRO 6000 Blackwell pulls roughly 600 watts. AMD's Instinct MI350P draws considerably more than the HTX301 as well. Skymizer's card claims to do comparable LLM inference work at 240 watts — less than half of what the Nvidia card requires. For data centers where power costs and cooling infrastructure are major operational expenses, that gap is significant.

Honestly, the skeptic in me wants to see Computex numbers before calling this confirmed. Startup claims without independent benchmarks are common in this space. But the architecture at least has a coherent explanation for why it works, which separates it from pure vaporware.

HTX301 Full Specs: 384GB Memory at Just 240 Watts

The memory configuration is where the HTX301 makes its most unusual choice. Instead of HBM3E — the high-bandwidth memory used in AMD's Instinct MI350P — or GDDR solutions, Skymizer built the card around standard LPDDR4 and LPDDR5. These are the same memory types found in laptops and smartphones. Using them in an AI accelerator is unconventional, but it directly explains the low power draw and lower cost structure.

The tradeoff is bandwidth. HBM3E in the AMD MI350P delivers dramatically higher memory bandwidth than LPDDR5. Skymizer compensates through compression. The card applies efficient compression to both model weights and KV cache, which reduces the effective memory bandwidth requirement for LLM inference. According to Skymizer, this compression approach outperforms unoptimized llama.cpp by 9 to 17.8 percent on the same tasks.

The card fits into a standard PCIe slot and works in any air-cooled server without modifications to power delivery or cooling infrastructure. That matters because upgrading a data center for liquid cooling or higher-amperage power circuits is expensive and slow.

Also Read

The 28nm Trick: How Old Chip Architecture Outperforms Modern GPU Design

Using a 28nm process node in 2025 sounds like a step backward. Modern AI chips from Nvidia and AMD use 4nm or 3nm nodes, which pack far more transistors into the same die area and achieve higher performance per watt at peak throughput. So why does 28nm work here?

The answer is workload specificity. General-purpose GPUs are designed to handle everything — gaming, rendering, training, inference, video encoding. That flexibility requires complex silicon. The HTX301 is not general-purpose. It is designed exclusively for LLM inference, and specifically for the memory-bandwidth-bound nature of that workload. A specialized architecture can extract efficiency that a general GPU cannot, even on older silicon.

The 28nm node also has practical advantages: mature manufacturing with high yield rates, lower cost per wafer, and availability from multiple foundries. Skymizer isn't competing with TSMC's leading-edge allocation alongside Nvidia and Apple. They're using capacity that is abundant and cheap. That feeds directly into a lower card price, though Skymizer hasn't announced pricing yet.

HTX301 vs Nvidia RTX PRO 6000 vs AMD Instinct MI350P: Honest Numbers

Here's what the published specs actually say, without inventing numbers the source material doesn't provide:

Card	Memory	Memory Type	Power Draw	Max LLM Size
Skymizer HTX301	Up to 384 GB	LPDDR4 / LPDDR5	240W (claimed)	700B params (claimed)
Nvidia RTX PRO 6000	96 GB	GDDR7	~600W	Limited by 96GB VRAM
AMD Instinct MI350P	144 GB	HBM3E	Higher than HTX301	Up to 4,600 TFLOPS MXFP4

The memory gap is the clearest differentiator. At 384 GB, the HTX301 can load models that simply don't fit in 96 GB or even 144 GB of VRAM — which is why Skymizer can claim 700B parameter capability. Running a 700B model on a single Nvidia RTX PRO 6000 isn't possible without model sharding across multiple cards. On the HTX301, Skymizer says it runs on one.

Key Takeaways

The HTX301 packs 384 GB of LPDDR memory onto a single PCIe card — enough to load 700B parameter models that won't fit on Nvidia or AMD cards without multi-card setups.
At 240W claimed power draw, it consumes less than half what Nvidia's RTX PRO 6000 requires, which reduces both electricity costs and cooling infrastructure needs.
Skymizer uses 28nm chips specifically because LLM inference is memory-bandwidth-bound, not compute-bound — older specialized silicon can outperform newer general-purpose silicon on this specific task.

Which Businesses Should Actually Buy This Card

The HTX301 is not a training card. If your workload involves training large models from scratch or fine-tuning at scale, this card is not the right tool — that work is compute-intensive in ways the HTX301's architecture isn't optimized for. Nvidia's H100 and AMD's MI300X remain the standard for training.

But for inference — running a deployed model to answer queries, generate code, or handle domain-specific workflows — the HTX301's profile makes sense for specific buyers:

Enterprises that want to run large open-source models like Llama 3 70B or larger on their own hardware without paying cloud API costs
Organizations in regulated industries — healthcare, finance, legal — where sending data to a cloud API raises compliance issues
Companies already running air-cooled server infrastructure who can't easily add liquid cooling for high-TDP cards
Businesses deploying agentic AI workflows for coding assistance or automation where latency and cost per token matter more than peak FLOPS

The card is not for someone who needs GPU flexibility. If you also need to run CUDA-based workloads, render 3D graphics, or train models, you need an Nvidia or AMD card. The HTX301 does one thing and, if Skymizer's claims hold, does it efficiently.

Computex Reality Check: Will HTX301 Deliver What Skymizer Promises

Skymizer will preview the HTX301 at Computex, allowing independent journalists and analysts to see the card run live. That matters because the gap between announced specifications and real-world performance in AI accelerators has been wide before — from multiple startups that raised significant capital and then shipped products that underdelivered.

The specific number to watch is the 240 tokens per second on Llama2 7B claim. That's the benchmark Skymizer cited. Independent testers at Computex will have a straightforward test to run, and the result will either confirm or undermine the card's positioning. The 240W power draw figure is equally testable with a simple power meter.

If the numbers hold, the HTX301 changes the on-premises AI infrastructure calculation for a meaningful segment of the market. If they don't, Skymizer joins a long list of companies that announced impressive specs and shipped something less impressive. The answer comes at Computex.

Frequently Asked Questions

What is the Skymizer HTX301 PCIe AI accelerator card?

The Skymizer HTX301 is a PCIe AI accelerator built on Skymizer's HyperThought LPU platform. It contains six HTX301 chips and up to 384 GB of LPDDR4/LPDDR5 memory, and Skymizer claims it can run LLMs with up to 700 billion parameters on a single card at 240 watts of power consumption.

How does the HTX301 run 700 billion parameter LLMs on a single card?

The 384 GB memory capacity is large enough to load models that won't fit on standard GPU VRAM. Skymizer's HyperThought platform applies compression to model weights and KV cache, reducing effective bandwidth requirements. The result, per Skymizer's benchmarks, outperforms llama.cpp by 9 to 17.8 percent on the same inference tasks.

How does HTX301 compare to Nvidia RTX PRO 6000 in power consumption?

The HTX301 draws 240 watts according to Skymizer's claim. Nvidia's RTX PRO 6000 Blackwell draws roughly 600 watts — more than double — for comparable LLM inference work. AMD's Instinct MI350P also consumes more power than the HTX301, though it offers 4,600 peak TFLOPS at MXFP4 precision for compute-heavy workloads.

Does the HTX301 really use only 240 watts of power?

That is Skymizer's stated specification. It hasn't been independently verified yet. Computex is where journalists and analysts will run their own tests. The 240W figure is testable with basic equipment, so verification should happen quickly once the card is available for hands-on review.

What memory does the Skymizer HTX301 use instead of HBM?

The HTX301 uses LPDDR4 and LPDDR5 — the same memory technology used in laptops and smartphones. This is significantly cheaper and lower-power than HBM3E but also lower bandwidth. Skymizer compensates with compression techniques that reduce the bandwidth demand for LLM inference specifically.

When will the Skymizer HTX301 be available to buy?

No commercial release date or pricing has been announced. The card will be previewed at Computex 2025. Availability details should follow after the event, depending on how independent testing goes and whether Skymizer has manufacturing capacity lined up.

Can the HTX301 replace a GPU cluster for enterprise AI workloads?

For LLM inference specifically, Skymizer claims it can handle workloads that currently require multi-GPU clusters. For model training, fine-tuning, or CUDA-dependent workloads, it cannot — those still require Nvidia or AMD hardware. The HTX301 is an inference-only card optimized for one specific type of workload.

Conclusion

The Skymizer HTX301 makes a specific and testable claim: 700B parameter LLM inference on a single PCIe card at 240 watts, using 28nm chips and LPDDR memory that costs a fraction of HBM. If that claim survives Computex testing, it changes the math for enterprise AI teams that have been told the only path to large model inference is expensive cloud APIs or multi-GPU server clusters.

Computex is the moment of truth. The benchmarks Skymizer cited — 30 tokens per second, 240 tokens per second on Llama2 7B, 9 to 17.8 percent improvement over llama.cpp — are specific enough to verify quickly. Watch what independent reviewers report from the show floor. That will determine whether the HTX301 is a real option for on-premises AI infrastructure or just an impressive slide deck.

Rachel Wong

Senior Editor

Experienced journalist bringing you accurate, well-researched stories. Follow for the latest updates and in-depth coverage.

⚡ TechVibe