
The AI infrastructure investment theme is expanding from “who has GPUs” to “who can keep GPUs running efficiently.” Peak compute is only part of the picture. You also need to look at HBM bandwidth, GPU memory capacity, enterprise SSDs, Nearline HDDs, networking, power, and data center storage architecture. For ordinary investors, GPUs are the entry point, while HBM and storage are important signals for judging whether AI demand is continuing to spread across the infrastructure chain.

HBM and storage become important after GPUs because the AI system bottleneck is shifting from “whether there are enough GPUs” to “whether GPUs can continuously receive data that is fast enough, close enough, and large enough.” GPUs handle computation, but model weights, training data, KV cache, vector search, and inference logs all require different layers of memory and storage support.
You can think of an AI data center as an “AI factory.” GPUs are the core production line. HBM is the high-speed material warehouse attached to that production line. DRAM is the buffer inside the server. SSDs are the hot-data warehouse. HDDs and object storage are the large-capacity data lake. If the warehouse is too small or the conveyor belt is too slow, even expensive GPUs will end up waiting for data.
NVIDIA’s product upgrades also show this trend. NVIDIA H200’s 141GB of HBM3e and 4.8TB/s of bandwidth show that high-end AI GPU competition is not only about Tensor Cores, but also about larger GPU memory and higher bandwidth. DGX B200’s 1,440GB of GPU memory and 64TB/s of HBM3e bandwidth push this trend to the system level: an AI server is already a combination of GPUs, HBM, NVLink, CPUs, DRAM, NVMe SSDs, and networking equipment.
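To make the capacity and bandwidth figures above more concrete, the sketch below divides memory size by peak bandwidth to estimate how long one full pass over GPU memory would take. It uses only the H200 and DGX B200 numbers quoted in this article. The function name `full_sweep_ms` is illustrative, and real workloads rarely reach peak bandwidth, so treat the result as a rough lower bound rather than a benchmark.

```python
# Capacity and bandwidth figures as quoted in this article (decimal units).
SYSTEMS = {
    "H200": {"memory_gb": 141, "bandwidth_tb_s": 4.8},
    "DGX B200": {"memory_gb": 1440, "bandwidth_tb_s": 64},
}


def full_sweep_ms(memory_gb: float, bandwidth_tb_s: float) -> float:
    """Milliseconds needed to read the whole memory once at peak bandwidth."""
    seconds = (memory_gb / 1000) / bandwidth_tb_s  # convert GB to TB first
    return seconds * 1000


for name, spec in SYSTEMS.items():
    print(f"{name}: {full_sweep_ms(**spec):.1f} ms per full memory sweep")
```

Under these figures, one sweep takes roughly 29 ms on the H200 and about 22.5 ms across the DGX B200 system. Adding capacity without adding bandwidth lengthens that time, which is one reason the two are usually upgraded together.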
| Segment | Main Problem Solved | Impact on AI | Representative Asset |
|---|---|---|---|
| GPU | Matrix computation | Determines the upper limit of training and inference speed | AI accelerator |
| HBM | High-speed near-memory | Affects model throughput, context length, and concurrency | HBM3E, HBM4 |
| DRAM | System memory | Supports CPU, caching, and data preprocessing | DDR5, RDIMM |
| SSD | Hot-data access | Supports vector databases, RAG, caching, and high IOPS | Enterprise SSD |
| HDD | Large-capacity storage | Supports data lakes, backup, training data, and logs | Nearline HDD |
For investors, this means AI infrastructure is no longer a single GPU story. You need to follow the data flow: where the data comes from, how it is read, how it is cached, how it enters the GPU, and how the inference results are stored afterward. Looking only at GPU orders can make you miss changes in HBM supply, enterprise SSD pricing, Nearline HDD shipments, and data center capital expenditure.
Blackwell architecture’s 208 billion transistors and 10TB/s chip-to-chip interconnect also show that AI chips are entering the system engineering stage. Internal chip interconnects, packaging, HBM, server memory, and rack-level networking together determine usable compute power, rather than any single component deciding the outcome on its own.
Summary: GPUs are the entry point for AI infrastructure investing, but they are not the end point. What you really need to observe is whether compute can be continuously fed with data. HBM solves GPU-adjacent bandwidth and capacity issues. DRAM and SSDs solve server-side caching and hot-data access. HDDs and object storage solve massive data accumulation. Looking at HBM and storage after GPUs is essentially about watching AI move from chip procurement to system deployment, from training to inference, and from one-time construction to long-term data operations.

HBM has become a key bottleneck because large models need not only computation, but also high-speed data movement. Model parameters, activations, and KV cache all need to move in and out of GPU-adjacent memory frequently. Ordinary DRAM sits farther away from the GPU, and its bandwidth and latency struggle to meet the needs of top-tier AI accelerators. HBM uses stacking and advanced packaging to place high-bandwidth memory close to the GPU.
HBM is not just an upgraded version of a standard memory module. It is a high-bandwidth memory technology designed around GPUs and AI accelerators. Through stacked DRAM dies, TSVs, interposers, and advanced packaging, HBM places wider data channels closer to the compute unit. The goal is not simply to increase capacity, but to reduce data movement time so that GPUs spend less time waiting during training and inference.
Micron HBM3E’s 24GB 8-high cube and over 1.2TB/s of bandwidth reflect HBM’s core value: each HBM stack provides extremely high bandwidth, and multiple stacks around a GPU form a high-throughput memory system. With Blackwell Ultra, NVIDIA’s technical blog mentions 288GB of HBM3e and 8TB/s of bandwidth per GPU, with the clear goal of allowing larger models, longer context windows, and higher-concurrency inference to run inside GPU-adjacent memory.
HBM is the closest memory layer to the GPU, so its pricing is more directly pulled by AI accelerator demand. It is not a standard component sold mainly into ordinary consumer electronics. Instead, it is tightly linked to AI GPUs, advanced packaging, wafer foundries, substrates, testing, and cloud provider procurement plans. In other words, changes in HBM demand more directly reflect the real construction pace of high-end AI servers.
In the third quarter of fiscal 2025, Micron mentioned that HBM revenue grew nearly 50% sequentially and data center revenue more than doubled year over year. This shows that AI is pushing some storage-cycle products toward higher-value server and data center scenarios. SK hynix also stated that 12-layer HBM4 samples had been delivered to major customers and that mass-production preparation would proceed after qualification, showing that HBM competition has already moved from HBM3E toward HBM4.
HBM demand usually rises for five reasons:
Summary: HBM is the first area to watch after GPUs because it directly determines whether high-end AI GPUs can release their performance. The higher the compute peak, the greater the data movement pressure. The larger the model, the longer the context, and the higher the inference concurrency, the more critical HBM capacity and bandwidth become. But HBM is not a risk-free one-way growth asset. You also need to watch customer concentration, advanced packaging capacity, yield, contract pricing, technology iteration, and expansion pace. HBM’s upside comes from AI GPUs, and its risks also come from AI GPU procurement cycles and supply expansion.

AI training needs high throughput and large capacity, while AI inference needs low latency, concurrency, and hot-data access. The training stage continuously reads massive datasets, saves checkpoints, and produces intermediate results. The inference stage processes user requests, context, KV cache, embeddings, vector databases, and RAG document retrieval.
Training does not end after data is placed into the GPU once. Pretraining requires massive text corpora, multimodal data, and distributed file systems. Fine-tuning requires industry-specific datasets and repeated experiments. During large-model training, checkpoints also need to be saved for failure recovery and version rollback. The key metric here is not whether a single drive is fast, but whether the entire storage system can continuously feed the GPU cluster.
Common storage needs in training include:
| Data Type | Main Use | Key Metric | Related Storage |
|---|---|---|---|
| Raw dataset | Pretraining and cleaning | Capacity, cost, reliability | HDD, object storage |
| Cleaned dataset | Training input | Throughput, scalability | SSD, distributed file system |
| Checkpoint | Failure recovery | Write speed, stability | SSD, object storage |
| Logs and metrics | Training monitoring | Persistence, traceability | HDD, object storage |
| Intermediate results | Experiment management | Read/write performance, version management | SSD, DRAM |
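The point that the whole storage system, not a single drive, must feed the cluster can be sketched with two simple ratios. The cluster size, per-GPU read rate, checkpoint size, and write bandwidth below are hypothetical values chosen for illustration, not figures from any vendor or from this article.

```python
def required_read_gb_per_s(num_gpus: int, gb_per_s_per_gpu: float) -> float:
    """Aggregate read throughput the storage system must sustain."""
    return num_gpus * gb_per_s_per_gpu


def checkpoint_write_seconds(checkpoint_gb: float, write_gb_per_s: float) -> float:
    """Time to persist one checkpoint at a given aggregate write bandwidth."""
    if write_gb_per_s <= 0:
        raise ValueError("write bandwidth must be positive")
    return checkpoint_gb / write_gb_per_s


# Hypothetical cluster: 1,024 GPUs, each consuming 0.5 GB/s of training data.
print(required_read_gb_per_s(1024, 0.5), "GB/s aggregate read")

# Hypothetical 1,000 GB checkpoint written at 50 GB/s aggregate bandwidth.
print(checkpoint_write_seconds(1000, 50), "seconds per checkpoint")
```

If GPUs stall while a checkpoint is written, every second of write time is idle time across the whole cluster, which is why write speed and stability appear as key metrics in the table above.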
Once inference enters commercial deployment, storage pressure shifts from “preparing data before training” to “constantly reading and writing data during operation.” RAG needs to retrieve enterprise documents, agents need to read and write tool results, long-context models generate large amounts of KV cache, and user requests and outputs also become logs. WEKA’s explanation of the AI memory wall captures the key issue: when the memory needed for inference exceeds available physical GPU memory, both latency and concurrency are affected.
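A rough way to see how quickly KV cache can outgrow GPU memory is the commonly used size estimate for transformer decoders: two tensors (keys and values) per layer, per token, per request. The model shape below (80 layers, 8 KV heads, head dimension 128, 16-bit values) is a hypothetical configuration used only for illustration, and the function name `kv_cache_gb` is an assumption of this sketch.

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GB (decimal) for a transformer decoder."""
    # Factor of 2: both keys and values are cached for every layer and token.
    total_bytes = (2 * layers * kv_heads * head_dim
                   * seq_len * batch * bytes_per_value)
    return total_bytes / 1e9


# Hypothetical model shape, chosen only for illustration.
shape = dict(layers=80, kv_heads=8, head_dim=128, seq_len=32_000)

one_request = kv_cache_gb(batch=1, **shape)
sixteen_requests = kv_cache_gb(batch=16, **shape)

print(f"one 32k-token request:  {one_request:.1f} GB")
print(f"16 concurrent requests: {sixteen_requests:.1f} GB")
```

With these assumed parameters, a single 32k-token request needs about 10.5 GB of cache, and 16 concurrent requests need roughly 168 GB. That is more than the 141GB of HBM3e on an H200, before counting model weights, so part of the cache would have to spill to DRAM or SSD.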
In inference scenarios, HBM, DRAM, SSDs, and object storage form a tiered structure. HBM stores the most urgent model-runtime data. DRAM handles system caching. NVMe SSDs support vector databases and hot data. HDDs and object storage store long-term data. When you see “more AI applications,” what is actually growing behind the scenes is tokens, embeddings, user logs, model versions, and audit records all at the same time.
Summary: Training and inference have different storage requirements. Training is more like a large engineering project that requires continuous, high-throughput, recoverable data supply. Inference is more like an online service that requires low latency, high concurrency, and hot-data access. After AI moves from the lab into enterprise production environments, storage demand no longer stays limited to training datasets. It expands into vector databases, RAG document libraries, user interaction logs, model versions, audit records, and long-term data lakes. Investment analysis should distinguish between training-driven and inference-driven demand instead of treating all “AI storage” as the same thing.
The AI storage chain can be layered by “distance from the GPU.” The closer a layer is to the GPU, the more it depends on speed, bandwidth, and latency. The farther it is from the GPU, the more it depends on capacity, cost, and reliability. HBM is the high-value segment closest to GPUs. DRAM and SSDs support hot data inside servers. Nearline HDDs and object storage support data center capacity pools.
The first layer is GPU-adjacent HBM. It handles model weights, activations, KV cache, and high-frequency data access, directly affecting token throughput, context length, and concurrent inference. The second layer is server-side DRAM and enterprise SSDs, which handle system caching, data preprocessing, vector retrieval, and high IOPS. The third layer is the data center capacity pool, where Nearline HDDs, object storage, and backup systems store training data, logs, archives, and long-term data.
| Layer | Distance from GPU | Speed Requirement | Unit Capacity Cost | Typical Use |
|---|---|---|---|---|
| HBM | Closest | Highest | Highest | Model runtime, KV cache |
| DRAM | Close | High | High | System cache, preprocessing |
| Enterprise SSD | Medium | Relatively high | Medium-high | Vector databases, hot data, RAG |
| Nearline HDD | Farther | Lower | Low | Data lakes, backup, training data |
| Object storage | Farthest | Elastic | Low | Archives, logs, long-term retention |
This tiering also explains why HDDs have not been completely replaced by SSDs. AI data centers need large amounts of hot data, but they need even more massive warm and cold data. Training corpora, videos, images, model versions, logs, and backups cannot all be stored in HBM or enterprise SSDs. In the third quarter of fiscal 2026, Seagate reported revenue of $3.112 billion and GAAP gross margin of 46.5%, showing that high-capacity storage demand is being reflected in hard drive vendors’ financial results. Western Digital also reported revenue of $3.337 billion and GAAP gross margin of 50.2% in the third quarter of fiscal 2026, showing a clear improvement in the cloud and data center storage cycle.
The key question here is not “whether SSDs will replace HDDs,” but “which data should be placed where.” As AI data grows, hot, warm, and cold data all increase. SSDs are better for frequent read/write access, low-latency access, and vector retrieval. HDDs are better for low-cost storage of massive capacity. In most cases, the two are more likely to coexist in tiers rather than replace each other directly.
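The “which data should be placed where” question can be sketched as a simple placement rule based on access frequency and latency tolerance. The thresholds, the `Dataset` fields, and the example workloads below are illustrative assumptions; real tiering policies also weigh cost, durability, and capacity.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    reads_per_day: float    # how often the data is accessed
    max_latency_ms: float   # slowest acceptable response time


def choose_tier(d: Dataset) -> str:
    """Pick a storage tier from data temperature (thresholds are assumptions)."""
    if d.max_latency_ms <= 10 or d.reads_per_day >= 1000:
        return "enterprise SSD"   # hot data: low latency or very frequent reads
    if d.reads_per_day >= 1:
        return "nearline HDD"     # warm data: regular but latency-tolerant
    return "object storage"       # cold data: rarely read, kept long term


for d in [
    Dataset("RAG vector index", reads_per_day=50_000, max_latency_ms=5),
    Dataset("training corpus", reads_per_day=5, max_latency_ms=500),
    Dataset("audit logs", reads_per_day=0.01, max_latency_ms=60_000),
]:
    print(f"{d.name}: {choose_tier(d)}")
```

In this sketch all three tiers are in use at once, which matches the coexistence argument above: growth in AI data raises demand for each tier rather than moving everything to the fastest one.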
Summary: The core rule of the AI storage chain is simple: the closer the data is to the GPU, the more important speed becomes; the farther it is from the GPU, the more important capacity and cost become. HBM’s investment elasticity comes from AI GPU performance release. Enterprise SSD elasticity comes from inference, RAG, and hot-data access. Nearline HDD elasticity comes from data lakes, backup, logs, and cloud provider capacity expansion. When analyzing different companies, you should not only ask whether they are “AI storage stocks.” You should also ask which storage layer they occupy, how much their revenue depends on pricing, who their customers are, and whether tight supply can last.
To evaluate the AI infrastructure chain, you cannot only look at headlines saying “AI demand is strong.” A more practical approach is to watch three groups of indicators: on the demand side, cloud provider capital expenditure and AI workloads; on the supply side, HBM, NAND, and HDD capacity and yields; on the financial side, revenue growth, gross margins, inventory, cash flow, and customer concentration.
The most important demand-side indicator is hyperscaler capex. Whether cloud providers continue expanding AI data centers determines total demand for GPUs, HBM, servers, SSDs, HDDs, and networking equipment. Training-related spending is more concentrated on GPU clusters and high-throughput storage. Inference-related spending places more emphasis on cost, latency, hot data, and online service efficiency. You should not only look at whether a company says “AI demand is strong.” You also need to see whether orders are turning into shipments, pricing, and gross margins.
Supply-side indicators need to be separated by category. HBM is constrained by DRAM dies, TSVs, advanced packaging, yield, and customer qualification. NAND is affected by the pricing cycle and enterprise SSD demand. HDDs require attention to HAMR, areal density, nearline exabyte shipments, and long-term supply agreements. In the fourth quarter of fiscal 2025, Micron’s Cloud Memory Business Unit revenue of $4.543 billion and 59% gross margin showed that when AI data center demand flows into product mix and pricing, earnings elasticity can be significant.
| Indicator | Meaning for HBM | Meaning for SSDs/HDDs | Investment Interpretation |
|---|---|---|---|
| Cloud provider CapEx | Determines GPU/HBM procurement strength | Determines data center capacity expansion | Main demand valve |
| HBM contracts | Locks in pricing and capacity | Affects ordinary DRAM supply indirectly | Signal of tight supply and demand |
| NAND pricing | Indirectly affects SSD costs | Directly affects enterprise SSD profits | Cycle turning point |
| Nearline shipments | Indirectly reflect data growth | Directly affect HDD vendor revenue | Capacity demand signal |
| Gross margin | Reflects product pricing power | Reflects pricing and product mix | Earnings elasticity signal |
| Inventory | Helps judge supply-demand mismatch | Helps judge cycle position | Risk signal |
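As a small example of turning reported figures into a comparable number, the sketch below multiplies revenue by gross margin to get implied gross profit for the three data points cited in this article. Note the assumptions: the Seagate and Western Digital margins are company-level GAAP figures while the Micron number is a business-unit margin, so the results are not strictly like-for-like, and the function name is illustrative.

```python
# (label, revenue in $ billions, gross margin in %), as quoted in this article.
REPORTED = [
    ("Seagate, fiscal Q3 2026", 3.112, 46.5),
    ("Western Digital, fiscal Q3 2026", 3.337, 50.2),
    ("Micron Cloud Memory BU, fiscal Q4 2025", 4.543, 59.0),
]


def implied_gross_profit(revenue_bn: float, gross_margin_pct: float) -> float:
    """Gross profit in $ billions implied by revenue and gross margin."""
    return revenue_bn * gross_margin_pct / 100


for label, revenue, margin in REPORTED:
    profit = implied_gross_profit(revenue, margin)
    print(f"{label}: ~${profit:.2f}B gross profit on ${revenue}B revenue")
```

The implied gross profits come out at roughly $1.45 billion, $1.68 billion, and $2.68 billion. Tracking this figure across quarters, alongside inventory, gives a more direct read on earnings elasticity than revenue alone.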
There is another easily overlooked dimension: transaction costs. When you research AI infrastructure stocks, you need to judge company fundamentals, but you also need to understand that actual transaction costs can affect your holding and rebalancing experience. U.S. stock trading costs usually include more than commissions. They may also include platform fees, external agency fees, trading activity fees, and other charges. Biya charges $0 commission for U.S. stock trading, while platform fees, external agency fees, and other charges are subject to the information shown on the U.S. stock trading fees page and on the order page. Whether related services are available depends on the user’s location, identity verification result, platform rules, and applicable laws and regulations.
If you need to track U.S. and Hong Kong stocks across the AI storage chain, you can use Biya to follow market quotes, available asset classes, and account costs. You can also use U.S. stock information search to organize related names. Fees are not the factor that determines investment returns, but in high-volatility situations, frequent rebalancing, or small-ticket purchases, they can affect the real trading experience.
Summary: AI infrastructure investing should move from “concept judgment” to “indicator verification.” On the demand side, look at cloud provider capex, AI server shipments, inference concurrency, and enterprise AI deployment. On the supply side, look at HBM yield, advanced packaging, NAND pricing, and HDD supply discipline. On the financial side, look at revenue, gross margin, inventory, cash flow, and customer structure. Only when demand truly flows into orders, pricing, and earnings does the investment logic of the AI storage chain become more solid.
Being bullish on the AI storage chain does not mean ignoring cycles and valuation. The main risks fall into three categories: slower cloud provider capital expenditure, supply expansion that causes storage prices to fall, and technology architecture changes that alter demand structure. AI demand is strong, but DRAM, NAND, and HDDs still have cyclical characteristics.
The first risk is an AI capex slowdown. If cloud providers find that inference revenue, compute utilization, or power availability falls short of expectations, capital spending may contract temporarily. GPUs, HBM, servers, SSDs, and HDDs are all part of the same infrastructure budget chain, and demand strength or weakness can spread across multiple segments. A short-term shortage should not be read as a permanent boom, especially when valuations are high and any change in order timing can trigger stock price volatility.
The second risk is supply expansion. High HBM prices encourage manufacturers to expand capacity. Higher NAND and HDD prices also improve the industry’s willingness to supply more. South Korea has launched large-scale investment plans around semiconductors, HBM, and AI data centers. Reuters’ coverage of South Korea’s AI and chip investment plan underscores the sector’s strategic importance, but it is also a reminder to watch for cycle pressure once the new supply comes online.
The third risk is technology path change. Model quantization, sparsity, MoE, CXL memory pooling, KV cache offloading, and near-data processing can all change the relative demand for HBM, DRAM, SSDs, and HDDs. Software optimization can improve hardware utilization, and may also reduce hardware demand per unit of inference. SSDs and HDDs are not in a simple replacement relationship. In the future, they are more likely to be tiered by data temperature and cost.
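The effect of quantization on memory demand can be shown with simple arithmetic: weight footprint scales linearly with bits per weight. The 70-billion-parameter model below is a hypothetical size chosen for illustration, and the sketch ignores activations, KV cache, and any quantization metadata.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory needed for model weights alone, in GB (decimal)."""
    # params (billions) * bits / 8 gives bytes in billions, i.e. decimal GB.
    return params_billion * bits_per_weight / 8


# Hypothetical 70B-parameter model at three common precisions.
for bits in (16, 8, 4):
    size = weight_footprint_gb(70, bits)
    print(f"{bits}-bit weights: {size:.0f} GB")
```

Under this assumption, weights shrink from 140 GB at 16 bits to 35 GB at 4 bits. The same model then fits in far less HBM, which illustrates how software optimization can lower hardware demand per unit of inference even while total inference volume grows.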
Investors should avoid these mistakes:
Summary: The AI storage chain combines growth logic and cycle logic. It is not a low-risk one-way sector. HBM is closer to AI GPU growth, but it also has greater customer concentration, packaging capacity, and technology iteration risks. SSDs and HDDs are more affected by pricing cycles, inventory, and supply discipline. A more prudent approach is to look at industry momentum, supply and demand, valuation, and financial verification at the same time. Before trading, you should also understand platform rules, fee structures, and your own risk tolerance. Public market information does not constitute investment advice.
If you focus on AI infrastructure investing, you do not have to look only at GPU leaders. A more complete observation framework is to look at HBM, DRAM, NAND, enterprise SSDs, Nearline HDDs, data center equipment, cloud provider capex, and transaction costs together. You can first use the industry chain logic to screen directions, then use revenue, gross margins, inventory, and cash flow to verify the cycle. If related services are available in your region, you can also register an account to further explore Biya’s multi-asset trading support. Whether services such as U.S. stocks, Hong Kong stocks, and digital assets are available depends on the user’s location, identity verification result, platform rules, and applicable laws and regulations. Any trading decision should be based on personal goals, risk tolerance, and complete fee information.
AI infrastructure investors should not only look at GPUs because GPUs require HBM, storage, networking, power, and cooling to work efficiently. GPUs determine the upper limit of compute, but HBM determines whether data can quickly enter the compute unit, while SSDs and HDDs determine whether training data, vector databases, and logs can be stored and accessed efficiently.
HBM sits closer to the GPU and focuses on high bandwidth, low latency, and high-density packaging. Ordinary DRAM mainly serves as system memory inside servers. AI training and inference involve heavy data movement, so HBM more directly affects high-end GPU throughput, context length, and inference concurrency.
AI inference growth drives demand for KV cache, vector databases, RAG document libraries, user logs, and hot-data access. HBM and DRAM handle near-memory and system caching, enterprise SSDs support low-latency retrieval, and HDDs and object storage handle long-term data retention.
Nearline HDDs remain important because they offer advantages in large capacity and lower unit storage cost. AI data centers need not only high-speed hot data, but also long-term storage for training corpora, logs, backups, archives, and model versions. For cold and warm data, HDDs remain an important capacity foundation.
Ordinary investors should look at demand, pricing, inventory, gross margins, customer concentration, and valuation. The storage industry is highly cyclical, and short-term price increases do not equal long-term certainty. When trading is involved, platform rules, bill details, and local regulatory requirements should also be considered.
Whether the AI storage chain is suitable for long-term investing depends on company positioning and personal risk tolerance. HBM is more closely tied to AI GPU growth, while SSDs and HDDs are more affected by pricing cycles. Before making any trade, investors should consider financial data, valuation, fee structure, and compliance requirements together.
*This article is provided for general information purposes and does not constitute legal, tax or other professional advice from BiyaPay or its subsidiaries and affiliates, and it is not intended as a substitute for obtaining advice from a financial advisor or any other professional.
We make no representations or warranties, express or implied, as to the accuracy, completeness or timeliness of the contents of this publication.



