Why Look at HBM and Storage After GPUs? Breaking Down the AI Infrastructure Investment Chain

[Image: AI infrastructure compute and storage chain in data centers]

The AI infrastructure investment theme is expanding from “who has GPUs” to “who can keep GPUs running efficiently.” Peak compute is only part of the picture. HBM bandwidth, GPU memory capacity, enterprise SSDs, Nearline HDDs, networking, power, and data center storage architecture all determine how much of that compute is usable. For ordinary investors, GPUs are the entry point, while HBM and storage are important signals for judging whether AI demand is continuing to spread across the infrastructure chain.

Key Takeaways

  • GPUs determine the upper limit of compute, while HBM determines model throughput and data movement efficiency.
  • AI training depends on cluster throughput, while AI inference relies more on memory, latency, and concurrency.
  • HBM, DRAM, SSDs, and HDDs correspond to different data temperatures and cost positions.
  • Storage demand comes from datasets, vector databases, logs, model weights, and inference outputs.
  • Investment analysis should focus on orders, pricing, gross margins, inventory, and customer concentration.
  • The AI storage chain offers growth elasticity, but also faces cycle reversals and valuation compression risks.

Why Do HBM and Storage Come After GPUs?

[Image: Data transfer relationship among GPUs, HBM, and semiconductor hardware]

HBM and storage become important after GPUs because the AI system bottleneck is shifting from “whether there are enough GPUs” to “whether GPUs can continuously receive data that is fast enough, close enough, and large enough.” GPUs handle computation, but model weights, training data, KV cache, vector search, and inference logs all require different layers of memory and storage support.

You can think of an AI data center as an “AI factory.” GPUs are the core production line. HBM is the high-speed material warehouse attached to that production line. DRAM is the buffer inside the server. SSDs are the hot-data warehouse. HDDs and object storage are the large-capacity data lake. If the warehouse is too small or the conveyor belt is too slow, even expensive GPUs will end up waiting for data.

NVIDIA’s product upgrades also show this trend. NVIDIA H200’s 141GB of HBM3e and 4.8TB/s of bandwidth show that high-end AI GPU competition is not only about Tensor Cores, but also about larger GPU memory and higher bandwidth. DGX B200’s 1,440GB of GPU memory and 64TB/s of HBM3e bandwidth push this trend to the system level: an AI server is already a combination of GPUs, HBM, NVLink, CPUs, DRAM, NVMe SSDs, and networking equipment.

Segment | Main Problem Solved | Impact on AI | Representative Asset
GPU | Matrix computation | Determines the upper limit of training and inference speed | AI accelerator
HBM | High-speed near-memory | Affects model throughput, context length, and concurrency | HBM3E, HBM4
DRAM | System memory | Supports CPU, caching, and data preprocessing | DDR5, RDIMM
SSD | Hot-data access | Supports vector databases, RAG, caching, and high IOPS | Enterprise SSD
HDD | Large-capacity storage | Supports data lakes, backup, training data, and logs | Nearline HDD

For investors, this means AI infrastructure is no longer a single GPU story. You need to follow the data flow: where the data comes from, how it is read, how it is cached, how it enters the GPU, and how the inference results are stored afterward. Looking only at GPU orders can make you miss changes in HBM supply, enterprise SSD pricing, Nearline HDD shipments, and data center capital expenditure.

The Blackwell architecture’s 208 billion transistors and 10TB/s chip-to-chip interconnect also show that AI chips are entering the system engineering stage. Internal chip interconnects, packaging, HBM, server memory, and rack-level networking together determine usable compute power. No single component decides the outcome on its own.

Summary: GPUs are the entry point for AI infrastructure investing, but they are not the end point. What you really need to observe is whether compute can be continuously fed with data. HBM solves GPU-adjacent bandwidth and capacity issues. DRAM and SSDs solve server-side caching and hot-data access. HDDs and object storage solve massive data accumulation. Looking at HBM and storage after GPUs is essentially about watching AI move from chip procurement to system deployment, from training to inference, and from one-time construction to long-term data operations.

Why Has HBM Become a Key Bottleneck for AI GPUs?

[Image: HBM high-bandwidth memory and the hardware foundation of chip interconnects]

HBM has become a key bottleneck because large models need not only computation, but also high-speed data movement. Model parameters, activations, and KV cache all need to move in and out of GPU-adjacent memory frequently. Ordinary server DRAM sits farther from the GPU and is reached over much narrower interfaces, so its bandwidth struggles to meet the needs of top-tier AI accelerators. HBM uses stacking and advanced packaging to place high-bandwidth memory close to the GPU.
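A rough way to see why bandwidth matters: when a model generates one token at a time, it has to read roughly all of its weights from memory for each token, so memory bandwidth sets a ceiling on tokens per second. The sketch below uses that simplification only. It ignores KV cache reads, batching, and compute limits. The 70-billion-parameter model, the 2 bytes per parameter, and the 1.0 TB/s comparison memory are illustrative assumptions. The 4.8 TB/s value is the H200 figure quoted earlier in this article.

```python
def max_decode_tokens_per_s(params_billion: float,
                            bytes_per_param: float,
                            bandwidth_tb_s: float) -> float:
    """Upper bound on single-stream decode speed when memory-bound.

    Simplifying assumption: every generated token streams all weights
    from memory exactly once. KV cache traffic and compute are ignored.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes_per_s = bandwidth_tb_s * 1e12
    return bandwidth_bytes_per_s / weight_bytes


# Hypothetical 70B-parameter model stored at 2 bytes per parameter.
scenarios = [
    ("4.8 TB/s (H200 figure quoted above)", 4.8),
    ("1.0 TB/s (hypothetical slower memory)", 1.0),
]
for label, bandwidth in scenarios:
    ceiling = max_decode_tokens_per_s(70, 2, bandwidth)
    print(f"{label}: about {ceiling:.1f} tokens/s ceiling")
```

Under these assumptions the ceiling scales linearly with bandwidth, which is why memory bandwidth and compute have to rise together.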

HBM Solves Bandwidth and Near-Memory Capacity Problems

HBM is not just an upgraded version of a standard memory module. It is a high-bandwidth memory technology designed around GPUs and AI accelerators. Through stacked DRAM dies, TSVs, interposers, and advanced packaging, HBM places wider data channels closer to the compute unit. The goal is not simply to increase capacity, but to reduce data movement time so that GPUs spend less time waiting during training and inference.

Micron HBM3E’s 24GB 8-high cube and over 1.2TB/s of bandwidth reflect HBM’s core value: each HBM stack provides extremely high bandwidth, and multiple stacks around a GPU form a high-throughput memory system. With Blackwell Ultra, NVIDIA’s technical blog mentions 288GB of HBM3e and 8TB/s of bandwidth per GPU, with the clear goal of allowing larger models, longer context windows, and higher-concurrency inference to run inside GPU-adjacent memory.
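The “multiple stacks around a GPU” point is simple multiplication, sketched below. The per-stack numbers are the Micron HBM3E figures quoted above, with 1.2 TB/s used as a floor because the source says “over 1.2TB/s”. The stack count of 6 is a hypothetical value chosen for illustration, not a statement about any specific GPU, and shipping products may expose less capacity or bandwidth than this naive sum.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HbmStack:
    capacity_gb: float
    peak_bandwidth_tb_s: float


def aggregate(stack: HbmStack, stack_count: int) -> Tuple[float, float]:
    """Return (total capacity in GB, total peak bandwidth in TB/s)."""
    if stack_count < 1:
        raise ValueError("stack_count must be at least 1")
    return (stack.capacity_gb * stack_count,
            stack.peak_bandwidth_tb_s * stack_count)


# Per-stack figures quoted in the article; 1.2 TB/s is treated as a floor.
hbm3e_8_high = HbmStack(capacity_gb=24, peak_bandwidth_tb_s=1.2)

STACKS_PER_GPU = 6  # hypothetical stack count, for illustration only
capacity_gb, bandwidth_tb_s = aggregate(hbm3e_8_high, STACKS_PER_GPU)
print(f"{STACKS_PER_GPU} stacks: {capacity_gb:.0f} GB, "
      f"about {bandwidth_tb_s:.1f} TB/s peak (naive sum)")
```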

Why Is HBM Closer to the GPU Value Chain?

HBM is the closest memory layer to the GPU, so its pricing is more directly pulled by AI accelerator demand. It is not a standard component sold mainly into ordinary consumer electronics. Instead, it is tightly linked to AI GPUs, advanced packaging, wafer foundries, substrates, testing, and cloud provider procurement plans. In other words, changes in HBM demand more directly reflect the real construction pace of high-end AI servers.

In the third quarter of fiscal 2025, Micron reported that HBM revenue grew nearly 50% sequentially and that data center revenue more than doubled year over year. This shows that AI is pushing some storage-cycle products toward higher-value server and data center scenarios. SK hynix also stated that 12-layer HBM4 samples had been delivered to major customers and that mass-production preparation would proceed after qualification, showing that HBM competition has already moved from HBM3E toward HBM4.

HBM demand usually rises for five reasons:

  1. Larger model parameter counts require more near-memory capacity.
  2. Longer context windows quickly increase KV cache usage.
  3. Higher inference concurrency makes GPU memory a service-capacity constraint.
  4. MoE, RAG, and agentic AI increase data access pressure.
  5. As GPU compute increases, memory bandwidth must rise at the same time.
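The second and third reasons can be made concrete with the standard KV cache size estimate: two tensors (keys and values) per layer, each holding one vector per KV head per token. The sketch below is a back-of-the-envelope calculation. The model shape (80 layers, 8 KV heads, head dimension 128, 2 bytes per value) is a hypothetical configuration picked for illustration, not the specification of any named model.

```python
def kv_cache_gib(layers: int,
                 kv_heads: int,
                 head_dim: int,
                 context_tokens: int,
                 batch_size: int,
                 bytes_per_value: int = 2) -> float:
    """Estimate KV cache size in GiB.

    The leading factor 2 counts one key tensor and one value tensor
    per layer.
    """
    total_bytes = (2 * layers * kv_heads * head_dim
                   * context_tokens * batch_size * bytes_per_value)
    return total_bytes / 2**30


# Hypothetical model shape, used only to show how the cache scales.
SHAPE = dict(layers=80, kv_heads=8, head_dim=128)

for context, batch in [(8_192, 1), (131_072, 1), (8_192, 16)]:
    size = kv_cache_gib(context_tokens=context, batch_size=batch, **SHAPE)
    print(f"context={context:>7}, batch={batch:>2}: {size:.1f} GiB")
```

The estimate grows linearly in both context length and the number of concurrent sequences, so a 16x longer context costs the same cache space as 16 concurrent requests at the shorter context.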

Summary: HBM is the first area to watch after GPUs because it directly determines whether high-end AI GPUs can release their performance. The higher the compute peak, the greater the data movement pressure. The larger the model, the longer the context, and the higher the inference concurrency, the more critical HBM capacity and bandwidth become. But HBM is not a risk-free one-way growth asset. You also need to watch customer concentration, advanced packaging capacity, yield, contract pricing, technology iteration, and expansion pace. HBM’s upside comes from AI GPUs, and its risks also come from AI GPU procurement cycles and supply expansion.

What Kind of Storage Do AI Training and AI Inference Need?

[Image: Enterprise SSDs and hot-data access demand in AI inference]

AI training needs high throughput and large capacity, while AI inference needs low latency, concurrency, and hot-data access. The training stage continuously reads massive datasets, saves checkpoints, and produces intermediate results. The inference stage processes user requests, context, KV cache, embeddings, vector databases, and RAG document retrieval.

AI Training Depends on Throughput, Capacity, and Stable Data Supply

Training does not end after data is placed into the GPU once. Pretraining requires massive text corpora, multimodal data, and distributed file systems. Fine-tuning requires industry-specific datasets and repeated experiments. During large-model training, checkpoints also need to be saved for failure recovery and version rollback. The key metric here is not whether a single drive is fast, but whether the entire storage system can continuously feed the GPU cluster.
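To see why system-level write speed matters for checkpoints, the sketch below estimates how long one checkpoint takes to write and how much training time is lost if training pauses while it is saved. Every number is an assumption for illustration: a 70-billion-parameter model, 14 bytes of saved state per parameter (weights plus optimizer state), a 30-minute checkpoint interval, and two hypothetical aggregate write speeds. It also assumes synchronous checkpointing, which real systems often avoid.

```python
def checkpoint_size_gb(params_billion: float, bytes_per_param: float) -> float:
    """Checkpoint size in GB (1e9 params times bytes, divided by 1e9 bytes per GB)."""
    return params_billion * bytes_per_param


def write_seconds(size_gb: float, write_gb_per_s: float) -> float:
    """Time to write a checkpoint at a given aggregate write bandwidth."""
    if write_gb_per_s <= 0:
        raise ValueError("write_gb_per_s must be positive")
    return size_gb / write_gb_per_s


def stall_fraction(write_s: float, interval_s: float) -> float:
    """Share of wall-clock time spent writing, if training blocks on each write."""
    return write_s / (interval_s + write_s)


size = checkpoint_size_gb(params_billion=70, bytes_per_param=14)  # assumptions
INTERVAL_S = 30 * 60  # hypothetical: one checkpoint every 30 minutes

for bandwidth in (10, 100):  # hypothetical aggregate write speeds in GB/s
    seconds = write_seconds(size, bandwidth)
    lost = stall_fraction(seconds, INTERVAL_S)
    print(f"{bandwidth:>3} GB/s: {seconds:.1f} s per checkpoint, "
          f"{lost:.2%} of time stalled")
```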

Common storage needs in training include:

Data Type | Main Use | Key Metric | Related Storage
Raw dataset | Pretraining and cleaning | Capacity, cost, reliability | HDD, object storage
Cleaned dataset | Training input | Throughput, scalability | SSD, distributed file system
Checkpoint | Failure recovery | Write speed, stability | SSD, object storage
Logs and metrics | Training monitoring | Persistence, traceability | HDD, object storage
Intermediate results | Experiment management | Read/write performance, version management | SSD, DRAM

AI Inference Depends on Low Latency, Concurrency, and Hot Data

Once inference enters commercial deployment, storage pressure shifts from “preparing data before training” to “constantly reading and writing data during operation.” RAG needs to retrieve enterprise documents, agents need to read and write tool results, long-context models generate large amounts of KV cache, and user requests and outputs also become logs. WEKA’s explanation of the AI memory wall captures the key issue: when the memory needed for inference exceeds available physical GPU memory, both latency and concurrency are affected.
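The memory-wall point can be expressed as a capacity budget: whatever GPU memory is left after loading the weights is what concurrent requests share for their KV cache. The sketch below is a simplified estimate. The weight size, per-request KV cache size, and reserve are hypothetical values, while 141GB and 288GB are the per-GPU HBM capacities quoted earlier in this article.

```python
def max_concurrent_requests(hbm_gb: float,
                            weights_gb: float,
                            kv_gb_per_request: float,
                            reserve_gb: float = 0.0) -> int:
    """How many requests fit in GPU memory at once under a simple budget.

    Assumes each request needs a fixed KV cache allocation and that
    nothing is offloaded to DRAM or SSD.
    """
    if kv_gb_per_request <= 0:
        raise ValueError("kv_gb_per_request must be positive")
    free_gb = hbm_gb - weights_gb - reserve_gb
    if free_gb <= 0:
        return 0
    return int(free_gb // kv_gb_per_request)


# Hypothetical workload: 70 GB of weights, 2.5 GB of KV cache per request,
# and 10 GB held back for activations and runtime overhead.
for hbm in (141, 288):
    slots = max_concurrent_requests(hbm, weights_gb=70,
                                    kv_gb_per_request=2.5, reserve_gb=10)
    print(f"{hbm} GB of HBM: up to {slots} concurrent requests")
```

Once demand exceeds that slot count, requests must queue or spill to slower tiers, which is where latency and concurrency start to suffer.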

In inference scenarios, HBM, DRAM, SSDs, and object storage form a tiered structure. HBM stores the most urgent model-runtime data. DRAM handles system caching. NVMe SSDs support vector databases and hot data. HDDs and object storage store long-term data. When you see “more AI applications,” what is actually growing behind the scenes is tokens, embeddings, user logs, model versions, and audit records all at the same time.

Summary: Training and inference have different storage requirements. Training is more like a large engineering project that requires continuous, high-throughput, recoverable data supply. Inference is more like an online service that requires low latency, high concurrency, and hot-data access. After AI moves from the lab into enterprise production environments, storage demand no longer stays limited to training datasets. It expands into vector databases, RAG document libraries, user interaction logs, model versions, audit records, and long-term data lakes. Investment analysis should distinguish between training-driven and inference-driven demand instead of treating all “AI storage” as the same thing.

From HBM to HDD: How Is the AI Storage Chain Layered?

The AI storage chain can be layered by “distance from the GPU.” The closer a layer is to the GPU, the more it depends on speed, bandwidth, and latency. The farther it is from the GPU, the more it depends on capacity, cost, and reliability. HBM is the high-value segment closest to GPUs. DRAM and SSDs support hot data inside servers. Nearline HDDs and object storage support data center capacity pools.

The first layer is GPU-adjacent HBM. It handles model weights, activations, KV cache, and high-frequency data access, directly affecting token throughput, context length, and concurrent inference. The second layer is server-side DRAM and enterprise SSDs, which handle system caching, data preprocessing, vector retrieval, and high IOPS. The third layer is the data center capacity pool, where Nearline HDDs, object storage, and backup systems store training data, logs, archives, and long-term data.

Layer | Distance from GPU | Speed Requirement | Unit Capacity Cost | Typical Use
HBM | Closest | Highest | Highest | Model runtime, KV cache
DRAM | Close | High | High | System cache, preprocessing
Enterprise SSD | Medium | Relatively high | Medium-high | Vector databases, hot data, RAG
Nearline HDD | Farther | Lower | Low | Data lakes, backup, training data
Object storage | Farthest | Elastic | Low | Archives, logs, long-term retention

This tiering also explains why HDDs have not been completely replaced by SSDs. AI data centers need large amounts of hot data, but they need even more massive warm and cold data. Training corpora, videos, images, model versions, logs, and backups cannot all be stored in HBM or enterprise SSDs. In the third quarter of fiscal 2026, Seagate reported revenue of $3.112 billion and GAAP gross margin of 46.5%, showing that high-capacity storage demand is being reflected in hard drive vendors’ financial results. Western Digital also reported revenue of $3.337 billion and GAAP gross margin of 50.2% in the third quarter of fiscal 2026, showing a clear improvement in the cloud and data center storage cycle.

The key question here is not “whether SSDs will replace HDDs,” but “which data should be placed where.” As AI data grows, hot, warm, and cold data all increase. SSDs are better for frequent read/write access, low-latency access, and vector retrieval. HDDs are better for low-cost storage of massive capacity. In most cases, the two are more likely to coexist in tiers rather than replace each other directly.
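The “which data should be placed where” question can be sketched as a simple placement rule based on data temperature. The thresholds below (10 ms latency, 1,000 reads per day, 1 read per day) are hypothetical values chosen for illustration. Real placement policies are set per system and usually weigh cost, durability, and throughput as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProfile:
    name: str
    reads_per_day: float
    max_latency_ms: float


# Hypothetical thresholds, not industry standards.
HOT_LATENCY_MS = 10.0
HOT_READS_PER_DAY = 1_000.0
WARM_READS_PER_DAY = 1.0


def choose_tier(profile: DataProfile) -> str:
    """Pick a storage tier from access frequency and latency tolerance."""
    is_hot = (profile.max_latency_ms <= HOT_LATENCY_MS
              or profile.reads_per_day >= HOT_READS_PER_DAY)
    if is_hot:
        return "enterprise SSD"
    if profile.reads_per_day >= WARM_READS_PER_DAY:
        return "nearline HDD"
    return "object storage"


examples = [
    DataProfile("vector index for RAG", reads_per_day=1_000_000, max_latency_ms=5),
    DataProfile("training corpus", reads_per_day=5, max_latency_ms=500),
    DataProfile("audit logs", reads_per_day=0.01, max_latency_ms=60_000),
]
for item in examples:
    print(f"{item.name}: {choose_tier(item)}")
```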

Summary: The core rule of the AI storage chain is simple: the closer the data is to the GPU, the more important speed becomes; the farther it is from the GPU, the more important capacity and cost become. HBM’s investment elasticity comes from AI GPU performance release. Enterprise SSD elasticity comes from inference, RAG, and hot-data access. Nearline HDD elasticity comes from data lakes, backup, logs, and cloud provider capacity expansion. When analyzing different companies, you should not only ask whether they are “AI storage stocks.” You should also ask which storage layer they occupy, how much their revenue depends on pricing, who their customers are, and whether tight supply can last.

What Indicators Should You Watch in the AI Infrastructure Investment Chain?

To evaluate the AI infrastructure chain, headlines saying “AI demand is strong” are not enough. A more practical approach is to watch three groups of indicators: demand-side indicators such as cloud provider capital expenditure and AI workloads; supply-side indicators such as HBM, NAND, and HDD capacity and yields; and financial indicators such as revenue growth, gross margins, inventory, cash flow, and customer concentration.

The most important demand-side indicator is hyperscaler capex. Whether cloud providers continue expanding AI data centers determines total demand for GPUs, HBM, servers, SSDs, HDDs, and networking equipment. Training-related spending is more concentrated on GPU clusters and high-throughput storage. Inference-related spending places more emphasis on cost, latency, hot data, and online service efficiency. You should not only look at whether a company says “AI demand is strong.” You also need to see whether orders are turning into shipments, pricing, and gross margins.

Supply-side indicators need to be separated by category. HBM is constrained by DRAM dies, TSVs, advanced packaging, yield, and customer qualification. NAND is affected by the pricing cycle and enterprise SSD demand. For HDDs, watch HAMR adoption, areal density, nearline exabyte shipments, and long-term supply agreements. In the fourth quarter of fiscal 2025, Micron’s Cloud Memory Business Unit revenue of $4.543 billion and 59% gross margin showed that when AI data center demand flows into product mix and pricing, earnings elasticity can be significant.

Indicator | Meaning for HBM | Meaning for SSDs/HDDs | Investment Interpretation
Cloud provider CapEx | Determines GPU/HBM procurement strength | Determines data center capacity expansion | Main demand valve
HBM contracts | Locks in pricing and capacity | Affects ordinary DRAM supply indirectly | Signal of tight supply and demand
NAND pricing | Indirectly affects SSD costs | Directly affects enterprise SSD profits | Cycle turning point
Nearline shipments | Indirectly reflect data growth | Directly affect HDD vendor revenue | Capacity demand signal
Gross margin | Reflects product pricing power | Reflects pricing and product mix | Earnings elasticity signal
Inventory | Helps judge supply-demand mismatch | Helps judge cycle position | Risk signal
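Several of the financial indicators in this table are simple ratios you can compute from quarterly reports. The sketch below shows three of them: sequential revenue growth, gross margin, and days of inventory. The two quarters of data are made-up numbers for a hypothetical company, and the 91-day quarter is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quarter:
    revenue: float
    cost_of_goods_sold: float
    inventory: float


def sequential_growth(current: Quarter, previous: Quarter) -> float:
    """Quarter-over-quarter revenue growth as a fraction."""
    return current.revenue / previous.revenue - 1.0


def gross_margin(quarter: Quarter) -> float:
    """Gross profit divided by revenue."""
    return (quarter.revenue - quarter.cost_of_goods_sold) / quarter.revenue


def days_of_inventory(quarter: Quarter, days_in_quarter: int = 91) -> float:
    """Inventory expressed as days of cost of goods sold."""
    return quarter.inventory / quarter.cost_of_goods_sold * days_in_quarter


# Made-up figures (in millions) for a hypothetical storage vendor.
previous = Quarter(revenue=1_000, cost_of_goods_sold=600, inventory=800)
current = Quarter(revenue=1_200, cost_of_goods_sold=660, inventory=900)

print(f"Sequential growth: {sequential_growth(current, previous):.1%}")
print(f"Gross margin: {gross_margin(previous):.1%} -> {gross_margin(current):.1%}")
print(f"Days of inventory: {days_of_inventory(previous):.0f} -> "
      f"{days_of_inventory(current):.0f}")
```

In this made-up case revenue and margin both improve while days of inventory also creep up, which is the kind of mixed signal the table suggests checking before reading growth as durable.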

There is another easily overlooked dimension: transaction costs. When you research AI infrastructure stocks, in addition to judging company fundamentals, you also need to understand that actual transaction costs can affect your holding and rebalancing experience. U.S. stock trading costs usually include more than commissions. They may also include platform fees, external agency fees, trading activity fees, and other charges. Biya charges $0 commission for U.S. stock trading, while platform fees, external agency fees, and other charges are subject to the information shown on the U.S. stock trading fees page and on the order page. Whether related services are available depends on the user’s location, identity verification result, platform rules, and applicable laws and regulations.

If you need to track U.S. and Hong Kong stocks across the AI storage chain, you can use Biya to follow market quotes, available asset classes, and account costs. You can also use U.S. stock information search to organize related names. Fees are not the factor that determines investment returns, but in high-volatility situations, frequent rebalancing, or small-ticket purchases, they can affect the real trading experience.
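To see why fees matter more for small-ticket purchases, the sketch below expresses round-trip costs as a percentage of the order value. The $1 per-side fee is a hypothetical number used only for the arithmetic. It is not the fee schedule of any platform, so check the actual fee page before relying on any figure.

```python
def round_trip_cost_pct(order_value: float,
                        commission: float = 0.0,
                        platform_fee: float = 0.0,
                        other_fees: float = 0.0) -> float:
    """Cost of buying and later selling, as a percentage of order value.

    Assumes the same fixed fees apply on both the buy and the sell side.
    """
    if order_value <= 0:
        raise ValueError("order_value must be positive")
    per_side = commission + platform_fee + other_fees
    return 2 * per_side / order_value * 100


# Hypothetical fixed fee of $1 per side with zero commission.
for value in (100, 1_000, 10_000):
    pct = round_trip_cost_pct(value, platform_fee=1.0)
    print(f"${value:>6} order: {pct:.2f}% round-trip cost")
```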

Summary: AI infrastructure investing should move from “concept judgment” to “indicator verification.” On the demand side, look at cloud provider capex, AI server shipments, inference concurrency, and enterprise AI deployment. On the supply side, look at HBM yield, advanced packaging, NAND pricing, and HDD supply discipline. On the financial side, look at revenue, gross margin, inventory, cash flow, and customer structure. Only when demand truly flows into orders, pricing, and earnings does the investment logic of the AI storage chain become more solid.

What Risks Should You See While Being Bullish on the AI Storage Chain?

Being bullish on the AI storage chain does not mean ignoring cycles and valuation. The main risks fall into three categories: slower cloud provider capital expenditure, supply expansion that causes storage prices to fall, and technology architecture changes that alter demand structure. AI demand is strong, but DRAM, NAND, and HDDs still have cyclical characteristics.

The first risk is an AI capex slowdown. If cloud providers find that inference revenue, compute utilization, or power availability falls short of expectations, capital spending may contract temporarily. GPUs, HBM, servers, SSDs, and HDDs are all part of the same infrastructure budget chain, and demand strength or weakness can spread across multiple segments. Short-term shortages should not be directly treated as permanent high prosperity, especially during periods of high valuation, when any change in order timing can trigger stock price volatility.

The second risk is supply expansion. High HBM prices encourage manufacturers to expand capacity. Higher NAND and HDD prices also improve the industry’s willingness to supply more. South Korea has launched large-scale investment plans around semiconductors, HBM, and AI data centers. Reuters’ coverage of South Korea’s AI and chip investment plan underlines the strategic importance of the sector, but it is also a reminder to watch for cycle pressure once new supply comes online.

The third risk is technology path change. Model quantization, sparsity, MoE, CXL memory pooling, KV cache offloading, and near-data processing can all change the relative demand for HBM, DRAM, SSDs, and HDDs. Software optimization can improve hardware utilization, and may also reduce hardware demand per unit of inference. SSDs and HDDs are not in a simple replacement relationship. In the future, they are more likely to be tiered by data temperature and cost.
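Quantization is the easiest of these changes to quantify, because weight memory scales directly with bits per parameter. The sketch below shows the arithmetic for a hypothetical 70-billion-parameter model. It counts weights only and ignores KV cache, activations, and any per-format overhead such as scaling factors.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Weight footprint in GB: parameters times bits, converted to bytes.

    (params_billion * 1e9 params) * (bits / 8 bytes) / (1e9 bytes per GB)
    simplifies to params_billion * bits / 8.
    """
    return params_billion * bits_per_param / 8


PARAMS_BILLION = 70  # hypothetical model size

for bits in (16, 8, 4):
    size = weight_memory_gb(PARAMS_BILLION, bits)
    print(f"{bits:>2}-bit weights: {size:.0f} GB")
```

Halving the bits halves the memory needed per model copy, which is how software optimization can lower hardware demand per unit of inference even while total demand keeps growing.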

Investors should avoid these mistakes:

  1. Treating all storage companies as HBM beneficiaries.
  2. Looking only at revenue growth while ignoring gross margins and inventory.
  3. Looking only at AI training while ignoring inference commercialization.
  4. Ignoring customer concentration and long-term contract pricing.
  5. Treating cyclical price increases as permanent growth-stock logic.
  6. Looking only at popular companies while ignoring valuation and cash flow.

Summary: The AI storage chain combines growth logic and cycle logic. It is not a low-risk one-way sector. HBM is closer to AI GPU growth, but it also has greater customer concentration, packaging capacity, and technology iteration risks. SSDs and HDDs are more affected by pricing cycles, inventory, and supply discipline. A more prudent approach is to look at industry momentum, supply and demand, valuation, and financial verification at the same time. Before trading, you should also understand platform rules, fee structures, and your own risk tolerance. Public market information does not constitute investment advice.

If you focus on AI infrastructure investing, you do not have to look only at GPU leaders. A more complete observation framework is to look at HBM, DRAM, NAND, enterprise SSDs, Nearline HDDs, data center equipment, cloud provider capex, and transaction costs together. You can first use the industry chain logic to screen directions, then use revenue, gross margins, inventory, and cash flow to verify the cycle. If related services are available in your region, you can also register an account to further explore Biya’s multi-asset trading support. Whether services such as U.S. stocks, Hong Kong stocks, and digital assets are available depends on the user’s location, identity verification result, platform rules, and applicable laws and regulations. Any trading decision should be based on personal goals, risk tolerance, and complete fee information.

FAQ

Why Should AI Infrastructure Investors Not Only Look at GPUs?

AI infrastructure investors should not only look at GPUs because GPUs require HBM, storage, networking, power, and cooling to work efficiently. GPUs determine the upper limit of compute, but HBM determines whether data can quickly enter the compute unit, while SSDs and HDDs determine whether training data, vector databases, and logs can be stored and accessed efficiently.

What Is the Difference Between HBM and Ordinary DRAM in AI?

HBM sits next to the GPU inside the same package and is built for high bandwidth and high-density packaging. Ordinary DRAM mainly serves as system memory inside servers. AI training and inference involve heavy data movement, so HBM more directly affects high-end GPU throughput, context length, and inference concurrency.

What Storage Demand Is Driven by AI Inference Growth?

AI inference growth drives demand for KV cache, vector databases, RAG document libraries, user logs, and hot-data access. HBM and DRAM handle near-memory and system caching, enterprise SSDs support low-latency retrieval, and HDDs and object storage handle long-term data retention.

Why Are Nearline HDDs Still Important for AI Data Centers?

Nearline HDDs remain important because they offer advantages in large capacity and lower unit storage cost. AI data centers need not only high-speed hot data, but also long-term storage for training corpora, logs, backups, archives, and model versions. For cold and warm data, HDDs remain an important capacity foundation.

How Can Ordinary Investors Judge the Risks of AI Storage Stocks?

Ordinary investors should look at demand, pricing, inventory, gross margins, customer concentration, and valuation. The storage industry is highly cyclical, and short-term price increases do not equal long-term certainty. When trading is involved, platform rules, fee details, and local regulatory requirements should also be considered.

Is the AI Storage Chain Better for Long-Term Investing or Cycle Trading?

Whether the AI storage chain is suitable for long-term investing depends on company positioning and personal risk tolerance. HBM is more closely tied to AI GPU growth, while SSDs and HDDs are more affected by pricing cycles. Before making any trade, investors should consider financial data, valuation, fee structure, and compliance requirements together.

*This article is provided for general information purposes and does not constitute legal, tax or other professional advice from BiyaPay or its subsidiaries and affiliates, and it is not intended as a substitute for obtaining advice from a financial advisor or any other professional.

We make no representations or warranties, express or implied, as to the accuracy, completeness or timeliness of the contents of this publication.
