
Turning Forecast Uncertainty into Inventory Decisions with NVIDIA cuOpt

Posted By:

Andy Andikko

Lim Ting Hui

Rafael Nicolas Fermin Cota

Dr. Melvyn Sim

How robust optimization helps enterprises make disciplined procurement and fulfillment decisions despite imperfect forecasts.

Building a Robust Multi-Warehouse Inventory Optimizer with NVIDIA cuOpt

MetaLearner builds production-grade forecasting and optimization systems for enterprise inventory management. This post shows how MetaLearner turns uncertain demand forecasts into actionable inventory decisions by combining robust optimization with NVIDIA cuOpt's GPU-accelerated LP solvers. The result is a decision engine that makes enterprise-scale optimization practical and repeatable.

Over a 52-week real-world backtest covering 17 warehouses and 63 items, our framework achieved a 94.9% service level with an average inventory value of $4.51 million. In comparison, the strongest forecast-plus-buffer benchmark required 40% more inventory to reach a service level that was still 1.9 percentage points lower. At enterprise scale, that difference is not a rounding error: on a $100M inventory base, a comparable inventory-efficiency gap would represent tens of millions of dollars in working capital.

Using NVIDIA cuOpt 26.2.0 on an NVIDIA GB10 GPU, a typical rolling-horizon solve converged to an optimal solution in about 0.02 seconds. Deploying cuOpt in this setup also surfaced three engineering lessons:

Unnormalized LPs can silently produce economically wrong answers.
Rolling-horizon models with lead times can generate phantom orders.
Solver stability depends on numerical scaling and explicit method selection.

From Inventory Rules to Optimization-Based Decisions

Table 1. Overview of Inventory Management Models

Model	Use Case	Limitations
EOQ/Reorder point rules (Donato_TH, 2023)	Stable manufacturing inventory and repeat replenishment	Simple and intuitive, but assumes stable demand and does not handle large networks, cash constraints, or uncertainty well
Newsvendor (DeMarle, 2019)	Seasonal, perishable, fashion, promotion, or short-life-cycle products	Usually assumes a demand distribution, a simplified setting, and limited operational constraints
Forecast + Buffer	Common in ERP/MRP workflows because it is simple, explainable, and easy to deploy	Buffer size is often blunt or manually chosen; it may reduce stockouts, but tie up working capital or place inventory in the wrong location
Stochastic Optimization (Jin et al., 2025)	Energy, utilities, manufacturing, and high-uncertainty supply chains; also used to improve MRP or newsvendor-style policies	Depends heavily on historical distributions and scenario quality; can become computationally heavy as periods, products, warehouses, and constraints grow
Reinforcement Learning (Shakya, 2024)	Complex production or replenishment systems that are difficult to model explicitly	Difficult to explain and constrain; may produce unintuitive decisions, and performance depends heavily on simulator quality
Robust Optimization (Xue et al., 2025)	Multi-period, multi-item, multi-warehouse planning under uncertainty, especially when feasibility and interpretability matter	Needs careful mathematical formulation and scalable LP solvers

Inventory optimization is a sequential decision-making process: companies must determine how much to procure, where to store inventory, which demand to fulfill, and how to maintain liquidity amid lead times, payment delays, and uncertain demand. At MetaLearner, we have helped clients improve forecast accuracy by over 20%, but better forecasts do not automatically lead to better decisions.

According to the Institute of Business Forecasting, the average retail forecast error per SKU is about 30% (Jain, 2018). For a business with $100 million in inventory, covering that error can mean an estimated 6-week safety buffer, tying up tens of millions of dollars in working capital. What enterprises need, therefore, is an optimizer that converts uncertain forecasts into actionable replenishment and fulfillment decisions.

Across the approaches in Table 1, the core intuition is similar: balance the cost of ordering too much against the risk of ordering too little, and use a buffer to absorb forecast errors. The difference is how that buffer is chosen and translated into operational decisions. Robust optimization asks a sharper question: which decisions remain feasible when the forecast is wrong in plausible ways? This makes it a practical framework for turning uncertain forecasts into inventory decisions that remain disciplined under real-world volatility.

Computational cost is the main obstacle to implementing sophisticated models like robust optimization. Inventory optimization often involves mixed-integer decision variables whose number multiplies across scenarios, periods, items, and warehouses, and the effort needed to search over integer decisions can grow exponentially with that size. MetaLearner overcomes this by formulating the robust decision layer as a continuous linear program. This keeps the model tractable while still allowing joint optimization across products, warehouses, lead times, and cash constraints. With this LP structure in place, NVIDIA cuOpt is a natural fit for scaling repeated solves on the GPU.

MetaLearner’s Optimization Framework

MetaLearner's robust model is complemented by a simulation-based parameter-tuning process and rolling-horizon control. Designed for multi-period, multi-warehouse, multi-product inventory planning, it is built to stay resilient against forecast uncertainty. At each decision point, the system determines how much of each SKU to procure for each warehouse and how much demand to fulfill from available inventory, since fulfilling all demand may not be feasible under real operational constraints.

The model incorporates practical constraints that determine whether an optimization output can actually be executed in the business. These include:

Cash feasibility: The company has a limited amount of external cash available for procurement. The optimizer must ensure that ordering decisions do not violate cash availability over time.
Inventory feasibility: The model must respect available inventory after accounting for arrivals, demand, and fulfillment. For example, if realized demand is 10 units but only 5 units are available, the company must either allow backlogs or sell only what is currently in stock.
Key performance indicators (KPIs): The model must meet business targets, such as a minimum turnover rate and a minimum service level.
Lead times: Orders do not arrive in the same period in which they are placed. This is critical for both planning and backtesting. Without lead times, the model may unrealistically fulfill current demand by ordering inventory in the same period, inflating service-level KPIs.
Payment lags: Cash does not necessarily move at the same time as inventory. Payment delays affect future liquidity, so procurement decisions must account for when cash is actually spent or received.
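To make the lead-time, payment-lag, inventory, and cash constraints above concrete, here is a minimal replay sketch for one item at one warehouse. It is not MetaLearner's model: it assumes lost sales (no backlog), that an order placed in period t arrives in period t + lead_time, that the supplier is paid payment_lag periods after ordering, and that sales revenue is collected in the same period. All function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StatePath:
    inventory: list[float]   # on-hand units at the end of each period
    cash: list[float]        # cash balance at the end of each period
    fulfilled: list[float]   # units of demand served in each period


def simulate_states(orders, demand, lead_time, payment_lag, unit_cost, price,
                    init_inventory, init_cash):
    """Replay one item at one warehouse (illustrative assumptions, lost sales)."""
    inventory, cash, fulfilled = [], [], []
    on_hand, balance = init_inventory, init_cash
    for t in range(len(demand)):
        # Lead time: an order placed in period t arrives in period t + lead_time.
        on_hand += orders[t - lead_time] if t >= lead_time else 0.0
        # Inventory feasibility: sell only what is in stock; unmet demand is lost.
        sold = min(demand[t], on_hand)
        on_hand -= sold
        # Payment lag: the supplier is paid payment_lag periods after ordering;
        # sales revenue is assumed to be collected in the same period.
        paid = orders[t - payment_lag] if t >= payment_lag else 0.0
        balance += price * sold - unit_cost * paid
        inventory.append(on_hand)
        cash.append(balance)
        fulfilled.append(sold)
    return StatePath(inventory, cash, fulfilled)


def is_cash_feasible(path, min_cash=0.0):
    """Cash feasibility: the balance never drops below the required floor."""
    return all(balance >= min_cash for balance in path.cash)
```

For example, with a 2-week lead time, 10 units ordered in week 0 cannot serve the week-1 shortfall; they only show up in week 2. This is exactly the effect that a model without lead times would hide, inflating service level.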

In a multi-period system, demand uncertainty can propagate across time: an ordering decision made today affects not only today’s inventory position, but also future cash balances, available stock, warehouse capacity, and fulfillment capacity.

MetaLearner's model addresses this by enforcing state robustness: decisions keep inventory, cash, and fulfillment states feasible even when realized demand deviates from the forecast by margins observed historically. This does not guarantee that every decision across the full horizon is robust against all possible future outcomes. Instead, it acknowledges that forecasts are imperfect and requires decisions to remain feasible under historically observed forecast deviations. In this way, the robust formulation acts as an uncertainty-set-based control policy for multi-period inventory and liquidity management, without requiring explicit assumptions about the demand distribution.
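The post does not say how MetaLearner builds its uncertainty set, so the following is only a sketch under stated assumptions. It sizes a per-SKU box around the forecast from empirical quantiles of historical relative forecast errors, and it computes worst-case cumulative demand under a "budget" that limits how many periods may hit their full deviation at once (a budgeted set in the style of Bertsimas and Sim). The quantile levels and budget are hypothetical tuning knobs.

```python
import math


def relative_errors(forecasts, actuals):
    """Historical relative deviations (actual - forecast) / forecast, skipping zero forecasts."""
    return [(a - f) / f for f, a in zip(forecasts, actuals) if f > 0]


def nearest_rank(values, q):
    """Empirical q-quantile using the nearest-rank rule."""
    ordered = sorted(values)
    k = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[k]


def demand_box(forecast, errors, q_lo=0.1, q_hi=0.9):
    """Box [lower, upper] around a forecast, sized by historically observed deviations."""
    down = max(0.0, -nearest_rank(errors, q_lo))
    up = max(0.0, nearest_rank(errors, q_hi))
    return forecast * (1.0 - down), forecast * (1.0 + up)


def worst_case_cumulative_demand(nominal, deviation, budget):
    """max of sum(nominal + deviation * xi) over 0 <= xi <= 1, sum(xi) <= budget.

    Assumes non-negative deviations. The inner maximization is a fractional
    knapsack, so the largest deviations are used up first.
    """
    extra, remaining = 0.0, budget
    for dev in sorted(deviation, reverse=True):
        take = min(1.0, remaining)
        if take <= 0:
            break
        extra += take * dev
        remaining -= take
    return sum(nominal) + extra
```

When demand enters a constraint only as a right-hand side (for example, cumulative receipts must cover cumulative demand), the worst case is a constant that can be computed before the solve, so the robust constraint stays an ordinary linear constraint.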

A key design choice is to keep all constraints continuous and linear. This preserves convexity, so any optimum the solver finds is globally optimal, and it makes the model suitable for NVIDIA cuOpt's GPU-accelerated LP solvers.
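The post does not disclose its exact robust formulation. As a textbook illustration of why robustness need not break linearity, the budgeted-uncertainty robust counterpart of Bertsimas and Sim (2004) replaces one uncertain linear constraint with a handful of linear constraints and auxiliary variables $z$, $p_j$, $y_j$. Suppose $\sum_j \tilde a_j x_j \le b$ must hold for every $\tilde a_j \in [\bar a_j - \hat a_j,\ \bar a_j + \hat a_j]$ with at most $\Gamma$ coefficients deviating. The robust counterpart is:

```latex
\begin{aligned}
&\sum_j \bar a_j x_j + \Gamma z + \sum_j p_j \le b, \\
&z + p_j \ge \hat a_j y_j, \qquad -y_j \le x_j \le y_j \qquad \forall j, \\
&z \ge 0, \qquad p_j \ge 0, \qquad y_j \ge 0 \qquad \forall j.
\end{aligned}
```

Every added constraint is linear, so the robust model remains an LP; for non-negative decisions such as order quantities, $y_j$ can simply be replaced by $x_j$.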

Scaling LP Solves with NVIDIA cuOpt

Our initial attempts with CPU-based enterprise optimization solvers struggled: the model timed out, ran out of memory, or crashed outright. The obvious workaround was divide and conquer, splitting SKUs into smaller groups, solving each group separately, and combining the results. However, inventory networks do not behave like isolated spreadsheets. Warehouses share financial constraints, items compete for inventory space, fulfillment decisions affect future periods, and lead times link current orders to future service levels. When we split the network, we also split the problem's economics.

The model had to solve the whole network, or it was not solving the real problem. This is where NVIDIA cuOpt changed the picture. Because our robust formulation remains a continuous linear program after linearization, it is well suited to high-performance LP solving. With NVIDIA cuOpt, the full-network instance of about 1.8 million constraints that had overwhelmed CPU-based workflows was solved in about 600 seconds using the dual simplex method in cuOpt's LP solver.

Computational efficiency was further improved through a rolling-horizon setup, which replaces one large horizon-spanning solve with a sequence of smaller, more manageable problems. In practice, this means the model never commits to a long-term plan. It solves, executes only the nearest decisions, observes actual demand, and replans each period. In a typical iteration, the optimization problem included approximately 4,255 constraints and 3,910 variables before presolve; cuOpt's presolve reduced this to about 487 constraints and 710 variables. With normalized data, the model converged to an optimal solution in about 0.02 seconds on an NVIDIA GB10.

The lesson was clear: the breakthrough was not just robust optimization. It was keeping the robust decision problem linear, scalable, and solvable on GPUs. As we collaborated with larger enterprises, our profiler results showed that the computational hot spot shifted from solver time to model assembly time, making sparse matrix construction, constraint-growth management, and state-transition assembly just as important as solver speed.
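Since model assembly became the hot spot, it helps to build constraint families directly as sparse triplets instead of row-by-row expression objects. The sketch below assembles one hypothetical family, inventory balance I[k,t] - I[k,t-1] - q[k,t-L] + f[k,t] = 0, as COO (row, column, value) lists. The variable layout is an assumption, and orders already in transit at the start of the window are ignored for brevity.

```python
def build_inventory_balance(n_keys, horizon, lead_time, init_inventory):
    """COO triplets for I[k,t] - I[k,t-1] - q[k,t-L] + f[k,t] = rhs[k,t].

    Hypothetical variable layout: [I | q | f], each block holding n_keys * horizon
    columns ordered by (item-warehouse key k, period t).
    """
    block = n_keys * horizon

    def col(block_id, k, t):  # 0 = inventory I, 1 = order q, 2 = fulfillment f
        return block_id * block + k * horizon + t

    rows, cols, vals, rhs = [], [], [], []
    for k in range(n_keys):
        for t in range(horizon):
            r = k * horizon + t
            entries = [(col(0, k, t), 1.0), (col(2, k, t), 1.0)]
            if t > 0:
                entries.append((col(0, k, t - 1), -1.0))          # carry-over inventory
            if t >= lead_time:
                entries.append((col(1, k, t - lead_time), -1.0))  # arrival of an earlier order
            for c, v in entries:
                rows.append(r)
                cols.append(c)
                vals.append(v)
            # Starting stock is a constant, so it moves to the right-hand side at t = 0.
            rhs.append(float(init_inventory[k]) if t == 0 else 0.0)
    return rows, cols, vals, rhs
```

The triplets can then be converted in one step, for example with `scipy.sparse.coo_matrix((vals, (rows, cols))).tocsr()`, for a solver interface that expects compressed sparse rows. At enterprise scale, generating the index arrays with vectorized NumPy operations rather than Python loops is usually the next speedup.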

NVIDIA cuOpt Implementation Lessons

While developing MetaLearner's robust optimization model in cuOpt, we learned three practical lessons:

Normalize LP coefficients before solving.
Control phantom orders near the horizon boundary.
Improve solver stability through scaling and explicit solver selection.

cuOpt exposes multiple solver methods and configuration parameters. The right configuration depends on the problem class, scale, sparsity, latency requirements, and accuracy requirements. For this reason, it is useful to centralize cuOpt configuration in a helper function rather than scattering solver settings throughout the codebase.

In Python, the behavior of the cuOpt solver can be configured using SolverSettings along with solver parameters from the cuopt.linear_programming.solver.solver_parameters module. A typical setup starts by importing the relevant configuration keys:

from cuopt.linear_programming.solver_settings import SolverSettings
from cuopt.linear_programming.solver.solver_parameters import (
    CUOPT_METHOD,
    CUOPT_PRESOLVE,
    CUOPT_TIME_LIMIT,
    CUOPT_ITERATION_LIMIT,
    CUOPT_LOG_TO_CONSOLE,
    CUOPT_LOG_FILE,
    CUOPT_NUM_GPUS,
    CUOPT_NUM_CPU_THREADS,
    CUOPT_FIRST_PRIMAL_FEASIBLE,
    CUOPT_CROSSOVER,
    CUOPT_SAVE_BEST_PRIMAL_SO_FAR,
    CUOPT_DUALIZE,
    CUOPT_PDLP_SOLVER_MODE,
)

settings = SolverSettings()
settings.set_parameter(CUOPT_METHOD, 0)          # 0 = concurrent
settings.set_parameter(CUOPT_TIME_LIMIT, 120.0)  # seconds
settings.set_parameter(CUOPT_LOG_TO_CONSOLE, 1)
settings.set_parameter(CUOPT_NUM_GPUS, 1)

# `model` is assumed to be a cuOpt LP problem object built earlier in the pipeline
model.solve(settings)

These parameters govern key solver behaviors: CUOPT_METHOD selects the solver method, CUOPT_TIME_LIMIT and CUOPT_ITERATION_LIMIT set runtime budgets, CUOPT_LOG_FILE enables persistent logging, and CUOPT_SAVE_BEST_PRIMAL_SO_FAR is useful when a solve may stop early but a feasible incumbent solution should still be retained.
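One way to centralize configuration is to describe each solve as a small profile and push it into the settings object in one place. This is a hypothetical helper, not part of cuOpt. The method ids below (0 concurrent, 1 PDLP, 2 dual simplex) are an assumption to confirm against the cuOpt version you run, and the helper accepts any object exposing the CUOPT_* constants so it can be exercised without a GPU.

```python
# Assumed method ids; confirm against the solver-method values of your cuOpt version.
METHOD_IDS = {"concurrent": 0, "pdlp": 1, "dual_simplex": 2}


def solver_profile(params, method="dual_simplex", time_limit=120.0, log_file=None):
    """Return {parameter_key: value} for one solve.

    `params` is any object exposing the CUOPT_* constants, for example the
    cuopt.linear_programming.solver.solver_parameters module.
    """
    profile = {
        params.CUOPT_METHOD: METHOD_IDS[method],
        params.CUOPT_TIME_LIMIT: float(time_limit),
        params.CUOPT_NUM_GPUS: 1,
        params.CUOPT_LOG_TO_CONSOLE: 0 if log_file else 1,
    }
    if log_file:
        profile[params.CUOPT_LOG_FILE] = log_file  # persistent log instead of console
    return profile


def apply_profile(settings, profile):
    """Push a profile into a SolverSettings-like object via set_parameter."""
    for key, value in profile.items():
        settings.set_parameter(key, value)
    return settings
```

In production, `params` would be the imported `solver_parameters` module and `settings` a fresh `SolverSettings()`, so switching every solve from concurrent mode to an explicit method becomes a one-argument change.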

Normalize LP coefficients before solving

Optimization solvers are highly sensitive to numerical scale. In cuOpt, this often appears as a warning such as: "input problem contains a large range of coefficients." This warning should not be ignored, because it usually indicates that the constraint matrix contains coefficients of very different magnitudes. For example, an inventory constraint may be written in units of product, while a cash-flow constraint may be written in currency units or currency multiplied by quantity. When these terms appear together in the same linear program, the resulting matrix can become poorly scaled and ill-conditioned.

This matters because LP solvers operate on the geometry of the constraint matrix. Both the dual simplex and interior-point methods rely on repeated numerical operations involving this matrix. When coefficients span many orders of magnitude, the solver may still report an optimal solution, but that solution can be numerically fragile and sensitive to harmless unit changes.
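The post does not specify which normalization MetaLearner applied. A minimal sketch of one common remedy, iterative geometric-mean row and column scaling, is shown below on a toy matrix where row 0 is an inventory row in units and row 1 is a cash row in currency (unit prices of 1,000 and 2,000). Any column scaling changes variable units, so the solution must be unscaled afterwards.

```python
import math


def coefficient_range(entries):
    """Ratio of the largest to the smallest nonzero |a_ij| in (row, col, value) triplets."""
    mags = [abs(v) for _, _, v in entries if v != 0.0]
    return max(mags) / min(mags)


def geometric_scale(entries, n_rows, n_cols, passes=5):
    """Iterative geometric-mean scaling: A' = diag(r) A diag(c).

    For A x <= b the scaled model is A' y <= diag(r) b with x = diag(c) y, so the
    right-hand side must be multiplied by r and the solution unscaled as x_j = c[j] * y_j.
    """
    r, c = [1.0] * n_rows, [1.0] * n_cols
    for _ in range(passes):
        for scale, axis in ((r, 0), (c, 1)):  # scale rows first, then columns
            lo = [math.inf] * len(scale)
            hi = [0.0] * len(scale)
            for i, j, v in entries:
                a = abs(v) * r[i] * c[j]
                if a > 0.0:
                    k = (i, j)[axis]
                    lo[k] = min(lo[k], a)
                    hi[k] = max(hi[k], a)
            for k in range(len(scale)):
                if hi[k] > 0.0:
                    scale[k] /= math.sqrt(lo[k] * hi[k])
    scaled = [(i, j, v * r[i] * c[j]) for i, j, v in entries]
    return scaled, r, c
```

On this toy matrix the coefficient range drops from 2,000 to about 1.41. Production LPs would typically also rescale monetary data into consistent units (for example thousands of dollars) before any matrix-level scaling.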

We observed this behavior directly in several test cases. In an unnormalized, deterministic, profit-maximizing model, changing only the units in which the input data were expressed materially altered the solution. The unnormalized solve converged to an objective of 0.4456, while the externally rescaled solve converged to 0.4132. The physical policies also differed substantially: served demand changed by 25,478 units, and there were 1,205 mismatches in fulfillment decisions, 1,302 in ordering decisions, and 606 in inventory paths. Since the underlying business problem was unchanged, these differences show that the unnormalized LP was not invariant to harmless unit changes.

After normalization, the coefficient range improved materially. The span of monetary coefficients fell from approximately 691,000 to about 125, a reduction of more than three orders of magnitude. The normalized profit model returned an objective of 1.0 and served exactly 5,918,229 units of demand in both runs, and its ordering and fulfillment decisions matched to within numerical noise. Inventory paths were identical. This shows that normalization restores unit invariance: the two solves agree when compared in physical units.

Repeat-solve tests further support this conclusion. Unlike the unnormalized problem, solving the same normalized problem twice produced identical objectives, identical served demand, and zero differences in fulfillment, ordering, inventory, and cash arrays. This indicates that normalization is necessary for stable, repeatable, and economically meaningful cuOpt solutions in enterprise inventory management.
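The unit-invariance and repeat-solve checks above are easy to automate. A minimal sketch, assuming each solve has been exported as a dict of flat lists per decision family: convert both runs back to physical units, then count positions that differ beyond a tolerance. The family names, factors, and tolerances are hypothetical.

```python
import math


def to_physical(run, factors):
    """Convert each decision family back to physical units (factor defaults to 1)."""
    return {name: [v * factors.get(name, 1.0) for v in values] for name, values in run.items()}


def count_mismatches(a, b, rel_tol=1e-6, abs_tol=1e-6):
    """Number of positions where two solution arrays differ beyond tolerance."""
    if len(a) != len(b):
        raise ValueError("solution arrays must have the same length")
    return sum(not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b))


def compare_runs(run_a, run_b, **tol):
    """Mismatch count per decision family (e.g. order, fulfill, inventory, cash)."""
    return {name: count_mismatches(run_a[name], run_b[name], **tol) for name in run_a}
```

A normalized model should return zero mismatches in every family, both across unit systems and across repeated solves; nonzero counts are a cheap regression signal to run before each deployment.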

Control phantom orders near the horizon boundary

Phantom orders are a hidden source of unstable-looking optimization results. With nonzero lead times, the solver may schedule large orders in the final periods that cannot physically arrive before the horizon ends. The policy is feasible in the model but makes no sense in the real world.

Consider a simple model that maximizes inventory turnover while ensuring positive cash flow, positive inventory levels, and non-negative orders. Because of lead times, an order placed today affects inventory only several periods later. Orders placed near the end of the planning horizon may therefore never affect inventory states or the objective, so the solver has no reason to keep them small. When ordering costs are included in the objective or constraints, these tail-period orders are usually suppressed, because each one then incurs a cost without any benefit. In practice, however, ordering and holding costs are hard to estimate: client datasets often provide only product prices and unit costs, without reliable data on order, holding, or storage costs.

This is especially important in rolling-horizon optimization. Allowing phantom orders can distort reported decisions, impact future state transitions, and produce misleading backtest results. Production inventory models must therefore control terminal-period decision variables through horizon-aware constraints, execution masks, order-cost regularization, terminal penalties, or by excluding decisions that cannot physically arrive within the actionable horizon.
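Of the controls listed above, excluding decisions that cannot arrive within the horizon is the simplest to sketch. The helper below (hypothetical names) returns a per-period upper bound for order variables, forcing orders to zero when they cannot land inside the planning window, plus a mask marking which periods are actually executed before the next replan.

```python
import math


def order_upper_bounds(horizon, lead_time, cap=math.inf):
    """Per-period upper bound on order quantity within one planning window.

    An order placed in period t (0-indexed) arrives in period t + lead_time, so it can
    influence modeled inventory only if t + lead_time <= horizon - 1. Later orders
    would be phantom orders and get an upper bound of 0.
    """
    return [cap if t + lead_time <= horizon - 1 else 0.0 for t in range(horizon)]


def execution_mask(horizon, execute):
    """True for periods whose decisions are executed before the next replan."""
    return [t < execute for t in range(horizon)]
```

With a 7-week window and a 2-week lead time, orders in weeks 5 and 6 are bounded to zero. With a 3-week execution window, every executed order arrives by week 4, well inside the modeled horizon, so its downstream effects are priced in.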

Improve solver stability through scaling and explicit solver selection

Occasionally, optimization failures occur below the Python layer and surface as internal CUDA errors, C++ assertion failures, or segmentation faults. These failures are hard to catch with normal exception handling or retries. NVIDIA regularly releases updates that address issues reported by the community on GitHub.

One plausible failure mode is that instability in a concurrently running solver method terminates the process below the Python layer. Poorly scaled LPs are more likely to encounter degenerate or numerically inconsistent states during solve. In concurrent mode, multiple solver methods may be evaluated within the same solve workflow, making these failures harder to isolate.

This makes numerical scaling the first line of defense. A well-conditioned LP is less likely to trigger unstable solver behavior, and when concurrent mode remains unstable, selecting an explicit solver method (dual simplex for CPU-side stability, or PDLP for GPU-native solving) can improve reliability. In our formulation, individual solver methods were reliable once scaling was implemented. The recommended approach is therefore to test the model for mathematical consistency, normalize the LP, and switch to an explicit solver method if concurrent-mode instability persists.
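The post does not describe how crashes were contained in production. One general pattern, sketched here as an assumption, is to run each solve in a child process so that a native crash kills only the child; the parent detects it and retries with the next method. `solve_fn(method)` is a hypothetical wrapper around your own cuOpt call. Note that CUDA cannot be used in a forked child once the parent has initialized CUDA; in that case pass `start_method="spawn"` with a module-level, picklable `solve_fn`, or keep all GPU work out of the parent.

```python
import multiprocessing as mp


def _child(send_end, target, args):
    send_end.send(target(*args))
    send_end.close()


def solve_isolated(target, args=(), timeout=None, start_method="fork"):
    """Run target(*args) in a child process and return (ok, result).

    A segfault or abort inside native solver code kills only the child. The
    parent sees EOF on the pipe (or a timeout) and can retry another way.
    """
    ctx = mp.get_context(start_method)
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(send_end, target, args))
    proc.start()
    send_end.close()  # keep only the read end, so a dead child shows up as EOF
    try:
        if not recv_end.poll(timeout):
            proc.terminate()  # runtime budget exhausted
            return False, None
        return True, recv_end.recv()
    except EOFError:  # child exited without sending (crash or Python exception)
        return False, None
    finally:
        proc.join()
        recv_end.close()


def solve_with_fallback(solve_fn, methods=("concurrent", "dual_simplex", "pdlp"), timeout=None):
    """Try each method name in turn until one solve finishes cleanly."""
    for method in methods:
        ok, result = solve_isolated(solve_fn, (method,), timeout)
        if ok:
            return method, result
    raise RuntimeError("every solver method failed")
```

Reading from the pipe before joining the child avoids a deadlock when the solution payload is larger than the operating-system pipe buffer.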

Backtesting Against Forecast-Plus-Buffer Baselines

We compared the performance of the robust optimization model against several variants of a smart baseline ordering rule with different buffer levels. The smart baseline uses MetaLearner’s demand forecasts and places orders based on the forecast plus an x% uplift, subject to the same operational cash constraints and inventory feasibility requirements as the robust optimization model.

The data comes from a real-world inventory management challenge faced by one of MetaLearner's clients. The setting consists of 17 warehouses and 63 items, with decisions intended to cover a 3-week window at each decision point. There is a 2-week lead time for all item–warehouse combinations, along with a maximum liquidity buffer that must be maintained. A total of 52 weeks of historical data was used to generate the backtested optimization results.

On this dataset, MetaLearner's forecasts have a global weighted mean absolute error of around 3.5%, measured as total absolute forecast error as a percentage of total actual demand. With forecast error this low, the smart baseline is close to its best case in this comparison. That the robust optimization model still surpasses it in both service level and inventory efficiency makes the result more meaningful, not less.
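The error metric described above can be written directly from its definition. The sketch below contrasts it with an unweighted per-series average; the numbers in the usage note are made up for illustration.

```python
def weighted_absolute_error(forecasts, actuals):
    """Total absolute forecast error divided by total actual demand (often called WAPE)."""
    total_actual = sum(actuals)
    if total_actual <= 0:
        raise ValueError("total actual demand must be positive")
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / total_actual


def mean_absolute_percentage_error(forecasts, actuals):
    """Unweighted average of per-series |error| / actual, skipping zero actuals."""
    ratios = [abs(f - a) / a for f, a in zip(forecasts, actuals) if a > 0]
    return sum(ratios) / len(ratios)
```

With forecasts [100, 50, 0] against actuals [90, 60, 10], the weighted error is 30 / 160 = 18.75%, while the unweighted average is about 42.6% because the low-volume series with a complete miss counts as much as the others. Volume-weighted and per-SKU averaged errors therefore measure different things and should not be compared directly.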

We implement a dynamic rolling-horizon backtest on the historical dataset, which significantly reduces solver time and computational load compared with a single static implementation. For the robust optimization model, we use a 7-week rolling optimization horizon at each solve, but only execute the near-term 3-week decision window after each solve. This design accounts for the maximum lead time, reduces phantom-order behavior, and allows decisions made in the execution horizon to reflect the robustness of states in later periods. Instead of solving once over the full backtest period, the backtester applies the selected policy decisions, observes actual demand, updates the realized inventory and cash states, and then solves again. This produces a more realistic evaluation for strategy comparison, because each policy is tested under the same information structure it would face in production.
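The plan, execute, observe, replan loop can be sketched for a single item under simplifying assumptions (lost sales, no cash state). `plan_fn` is a hypothetical stand-in for the LP solve: it receives the current state and returns an order plan for the whole window, of which only the first `execute` periods are committed. The defaults mirror the 7-week horizon, 3-week execution window, and 2-week lead time described above.

```python
def rolling_backtest(actual_demand, plan_fn, horizon=7, execute=3, lead_time=2,
                     init_inventory=0.0):
    """Rolling-horizon replay for one item at one warehouse (lost sales).

    plan_fn(start, horizon, on_hand, pipeline) -> `horizon` order quantities.
    `pipeline` maps arrival period -> units already ordered but not yet received.
    Returns (service level, executed orders by placement period).
    """
    n = len(actual_demand)
    orders = [0.0] * n
    on_hand, served, t = init_inventory, 0.0, 0
    while t < n:
        pipeline = {s + lead_time: orders[s] for s in range(max(0, t - lead_time), t)}
        plan = plan_fn(t, horizon, on_hand, pipeline)  # plan the whole window...
        for s in range(t, min(t + execute, n)):        # ...but commit only the near term
            orders[s] = plan[s - t]
            on_hand += orders[s - lead_time] if s >= lead_time else 0.0
            sold = min(actual_demand[s], on_hand)      # observe realized demand
            on_hand -= sold
            served += sold
        t += execute                                   # replan from the realized state
    return served / sum(actual_demand), orders
```

Because every policy, robust or buffer-based, is plugged in as a `plan_fn` and replayed against the same realized demand, the comparison uses the same information structure for all of them.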

Table 2 compares the robust optimization model with smart baseline variants using buffer levels from 0% to 50%. The reported metrics are realized service level, final inventory turnover, and average inventory value. Service level is the percentage of actual demand fulfilled by policy decisions from the inventory available at each point in time, with operational cash flows kept feasible during replay. Turnover is the ratio of cost of goods sold to average inventory cost. The last column reports the additional average inventory each policy holds, relative to the 0% buffer baseline, per percentage point of service level gained.
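The two KPI definitions above translate directly into code; this is a plain restatement of the formulas, not the backtester's actual implementation.

```python
def service_level(fulfilled, demand):
    """Share of actual demand served from available inventory."""
    return sum(fulfilled) / sum(demand)


def inventory_turnover(cogs, inventory_cost_by_period):
    """Cost of goods sold divided by the average inventory cost over the same span."""
    average_inventory = sum(inventory_cost_by_period) / len(inventory_cost_by_period)
    return cogs / average_inventory
```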

Table 2. Policy comparison across smart baseline buffer levels

Policy / Variant	Service level	Final turnover	Average inventory value	Incremental inventory per +1 pp service level (vs. 0% buffer)
Robust Optimization	94.9%	1.02x	$4.51M	$0.36M
Baseline, 0% buffer	88.8%	1.30x	$2.33M	-
Baseline, 10% buffer	90.3%	1.21x	$2.90M	$0.38M
Baseline, 20% buffer	91.8%	1.19x	$3.45M	$0.37M
Baseline, 30% buffer	91.3%	1.10x	$3.65M	$0.53M
Baseline, 50% buffer	93.0%	0.84x	$6.34M	$0.95M

The smart baseline results show a clear trade-off. With no buffer, the baseline achieves the highest turnover (1.30x) and the lowest average inventory value ($2.33M), but at the expense of the lowest service level (88.8%). The no-buffer policy is too lean and fails to meet a significant portion of actual demand. Measured against this 0% buffer baseline, robust optimization requires the smallest inventory increase per percentage point of service level gained ($0.36M, versus $0.37M to $0.95M for the buffered baselines), suggesting that it deploys inventory capital more efficiently than any of the buffered baselines.
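The last column of Table 2 and the headline comparison can be reproduced from the other columns. The values below are copied from Table 2; the function name is illustrative.

```python
# (service level in %, average inventory value in $M), copied from Table 2
TABLE_2 = {
    "Robust Optimization": (94.9, 4.51),
    "Baseline, 0% buffer": (88.8, 2.33),
    "Baseline, 10% buffer": (90.3, 2.90),
    "Baseline, 20% buffer": (91.8, 3.45),
    "Baseline, 30% buffer": (91.3, 3.65),
    "Baseline, 50% buffer": (93.0, 6.34),
}


def incremental_inventory_per_pp(table, reference="Baseline, 0% buffer"):
    """Extra average inventory ($M) per percentage point of service gained vs. reference."""
    ref_sl, ref_inv = table[reference]
    return {name: (inv - ref_inv) / (sl - ref_sl)
            for name, (sl, inv) in table.items() if name != reference}
```

Rounded to two decimals this gives 0.36 for robust optimization and 0.38, 0.37, 0.53, and 0.95 for the 10% to 50% buffers. The same table yields the headline numbers: 6.34 / 4.51 is about 1.41 (roughly 40% more inventory for the 50% buffer), and 94.9 - 93.0 = 1.9 percentage points of service level.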

At higher buffer levels, the baseline increasingly loses its inventory-efficiency advantage. At the extreme, the 50% buffer case achieves a service level of 93.0% but requires an average inventory of $6.34M. This is materially higher than the robust optimization model’s average inventory value of approximately $4.51M, yet the 50% buffer baseline still does not match the robust model’s service level.

Overall, the tuned robust optimization model achieves the best balance between service and inventory efficiency: a service level of approximately 94.8–94.9%, an average inventory of about $4.51M, and a final turnover of around 1.02x. In contrast, the smart baseline can only approach this level of service by using a much larger fixed buffer, which significantly increases inventory and reduces turnover. Small buffers preserve turnover but leave too much demand unfulfilled, while large buffers improve service at the cost of excess inventory. The robust optimization model performs better not only because it guards against plausible demand spikes when making policy decisions, but also because its buffer mechanism allocates inventory selectively across items, warehouses, and time, rather than applying a uniform uplift to all forecasts.

Conclusion

The central lesson is that the forecasting problem and the decision problem are distinct, and solving one does not solve the other. Forecast-plus-buffer rules are easy to deploy, but they do not optimize decisions against operational constraints, resulting in inefficient protection and higher holding costs. Large optimization models can capture more operational detail, but they are often computationally heavy, difficult to validate through backtesting, or hard to interpret before deployment. The practical path was to combine three ideas: formulate the decision problem as a continuous linear program, execute it through a rolling-horizon control loop, and use NVIDIA cuOpt to scale repeated LP solves on a GPU.

We built a production-ready inventory decision engine using robust optimization and NVIDIA cuOpt, deployed across 17 warehouses and 63 items. In the 52-week backtest, the robust model achieved a 94.9% service level, outperforming the highest-service fixed-buffer baseline, which required 40% more inventory to reach a service level that was still 1.9 percentage points lower. Three engineering lessons made the difference in practice: normalize the LP before solving, control phantom orders in terminal periods, and switch to an explicit single-method solver if concurrent-mode instability persists.

Robust optimization does not eliminate forecast uncertainty. What it offers instead is bounded, interpretable degradation under demand shocks, stable feasibility across rolling solves, and a disciplined allocation of limited supply. Fixed-buffer rules cannot replicate these properties regardless of buffer size. When paired with GPU-accelerated LP solvers and rolling-horizon execution, robust optimization forms a practical, production-ready feedback-control system for inventory and liquidity management.

Bibliography

Donato_TH. (2023, October 14). EOQ fundamentals: A guide to inventory efficiency. Medium. https://medium.com/donato-story/eoq-fundamentals-a-guide-to-inventory-efficiency-651f678a15d0

DeJans, A., Jr. (2025). Optimizing the uncertain: How stochastic optimization transforms decision-making in supply chains. Medium. https://medium.com/@adam.dejans/optimizing-the-uncertain-how-stochastic-optimization-transforms-decision-making-in-supply-chains-441feb91989e

DeMarle, P. (2019, August 27). The basics of the newsvendor model. Medium. https://medium.com/@pdemarle/the-basics-of-the-newsvendor-model-ef756f203433

Jain, C. L. (2018, June 22). Ask Dr. Jain: What are the forecast accuracy benchmarks in retail? Demand Planning SOP IBP Supply Planning Business Forecasting Blog. https://demand-planning.com/2018/06/22/what-are-the-benchmarks-in-retail-forecasting-accuracy/

Jin, Z. L., Maasoumy, M., Liu, Y., Zheng, Z., & Ren, Z. (2025). Stochastic optimization of inventory at large-scale supply chains. arXiv. https://doi.org/10.48550/arXiv.2502.11213

Shah, A. (2024, July 12). Addressing industry-specific supply chain challenges with AWS Supply Chain. Amazon Web Services. https://aws.amazon.com/blogs/supply-chain/addressing-industry-specific-supply-chain-challenges-with-aws-supply-chain/

Shakya, M. (2024). Deep reinforcement learning solutions for multi-period inventory replenishment optimization [Doctoral thesis, Nanyang Technological University]. NTU Digital Repository. https://doi.org/10.32657/10356/179797

Salas-Navarro, K., Pardo-Meza, J., Torres-Prentt, J., & Rivera-Alvarado, J. (2025). A multi-product and multi-period inventory planning model to optimize the supply of medicines in a pharmacy in Barranquilla, Colombia. Logistics, 9(4), 151. https://doi.org/10.3390/logistics9040151

Wei, C.-C., & Chen, L.-T. (2021). Supply chain replenishment decision for newsvendor products with multiple periods and a short life cycle. Sustainability, 13(22), 12777. https://doi.org/10.3390/su132212777

Xue, Y., Rujeerapaiboon, N., & Sim, M. (2025, June 16). Robust benchmark satisficing. SSRN. https://doi.org/10.2139/ssrn.5296720