Applying machine learning to open retail data for accurate, demand-driven inventory forecasting.
This article demonstrates Whylitics' forecasting methodology using a public dataset to protect client confidentiality.
In today’s competitive retail environment, effective inventory and demand planning is critical to profitability. As part of Whylitics’ commitment to delivering Optimal Production & Inventory Planning services, we conducted a comprehensive analysis using publicly available sales and inventory data from a liquor retail chain. The dataset, sourced from Kaggle, provided detailed insights into store-level operations, covering over a year of sales, purchases, and inventory records. Our objective was to explore inventory inefficiencies, predict future demand, and develop a scalable decision-support tool for store managers.
The dataset included several interconnected tables covering sales transactions, purchase records, and inventory levels.
Through initial data preparation, we harmonized units of measurement (e.g., converting bottle sizes like "750mL", "1.75L", "5.0 Oz", "128.0 Gal" or "50mL 5 Pk" into consistent milliliter values), cleaned and converted timestamps, and encoded categorical variables efficiently. The entire dataset spanned sales and inventory movements across dozens of stores, covering a wide range of alcoholic beverages.
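The unit-harmonization step described above can be sketched as follows. This is an illustrative reconstruction, not the exact logic used in the analysis: the conversion table and parsing rules are assumptions, and real data would need additional edge-case handling.

```python
import re

# Assumed conversion factors to milliliters; the exact table used in
# the original analysis is not shown in the article.
ML_PER_UNIT = {"ml": 1.0, "l": 1000.0, "oz": 29.5735, "gal": 3785.41}

def size_to_ml(size: str) -> float:
    """Convert a free-text bottle size (e.g. "750mL", "1.75L",
    "50mL 5 Pk") into total milliliters, multiplying out packs."""
    s = size.lower()
    m = re.search(r"([\d.]+)\s*(ml|l|oz|gal)\b", s)
    if not m:
        raise ValueError(f"Unrecognized size: {size!r}")
    qty = float(m.group(1)) * ML_PER_UNIT[m.group(2)]
    pack = re.search(r"([\d.]+)\s*pk", s)  # handles "50mL 5 Pk"
    return qty * (float(pack.group(1)) if pack else 1.0)
```

A normalized volume column like this is what makes cross-product comparisons and aggregation meaningful later in the pipeline.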
One of the most critical goals was identifying operational inefficiencies across stores. We developed a financial metrics table per store, calculating:

- PurchaseSpend
- FreightSpend
- ExciseTax
- Revenue
- TotalOnHand (ending inventory)
- UnsoldInventoryValue
- TotalExpense
- Profit
- EfficiencyIndex = Profit / UnsoldInventoryValue
- ProfitMargin = Profit / Revenue
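A minimal pandas sketch of the per-store metrics table is shown below on toy data. The column names, the toy figures, and the TotalExpense definition (purchases + freight + excise) are assumptions for illustration; the EfficiencyIndex and ProfitMargin formulas follow the definitions given in the text.

```python
import pandas as pd

# Toy transaction-level and purchase-level data (assumed schema).
sales = pd.DataFrame({
    "Store": [1, 1, 2],
    "Revenue": [1200.0, 800.0, 500.0],
    "ExciseTax": [60.0, 40.0, 25.0],
})
purchases = pd.DataFrame({
    "Store": [1, 2],
    "PurchaseSpend": [1400.0, 600.0],
    "FreightSpend": [50.0, 20.0],
    "UnsoldInventoryValue": [300.0, 400.0],
})

# Aggregate sales per store, then join the purchase-side figures.
metrics = (
    sales.groupby("Store", as_index=False)[["Revenue", "ExciseTax"]].sum()
    .merge(purchases, on="Store")
)
# Assumed expense definition: purchases + freight + excise.
metrics["TotalExpense"] = (
    metrics["PurchaseSpend"] + metrics["FreightSpend"] + metrics["ExciseTax"]
)
metrics["Profit"] = metrics["Revenue"] - metrics["TotalExpense"]
metrics["EfficiencyIndex"] = metrics["Profit"] / metrics["UnsoldInventoryValue"]
metrics["ProfitMargin"] = metrics["Profit"] / metrics["Revenue"]
```

One row per store then supports exactly the comparison described next: high-revenue stores with bloated leftovers score a low EfficiencyIndex even when their raw profit looks healthy.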
Key Findings:
The Efficiency Index served as a balanced metric to evaluate stores. While some stores had minimal leftovers, their revenues were negligible. Others made significant purchases but couldn’t convert them into efficient sales, suggesting a mismatch between purchasing and customer demand.
Our exploration identified both the top-selling and the most overstocked products across stores.
Interestingly, certain products like Absolut appeared on both the overstock and high-sales lists. This indicated misallocated distribution — some stores may overstock fast-movers beyond actual demand.
We aggregated transaction data into monthly buckets per product-store combination. The dataset totaled nearly 500,000 monthly records. Key statistics revealed high variance and skewness in sales, price, volume, and excise taxes.
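The monthly bucketing step can be sketched as below. The schema and toy values are assumptions; the point is the groupby pattern that produces one row per product-store-month.

```python
import pandas as pd

# Assumed transaction-level schema: one row per sale.
tx = pd.DataFrame({
    "Store": [1, 1, 1, 2],
    "Brand": [100, 100, 100, 100],
    "SalesDate": pd.to_datetime(
        ["2016-03-02", "2016-03-18", "2016-04-05", "2016-03-09"]),
    "SalesQuantity": [4, 6, 3, 5],
})

# Bucket each transaction into its calendar month, then sum quantities
# per store-product-month combination.
tx["Month"] = tx["SalesDate"].dt.to_period("M")
monthly = (
    tx.groupby(["Store", "Brand", "Month"], as_index=False)["SalesQuantity"]
      .sum()
)
```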
To prepare the data for modeling, we applied outlier control and feature engineering to tame this variance before training.
To enable forward-looking inventory optimization, we developed a predictive model to estimate monthly product demand at each store. This model relied on a combination of cleaned and engineered features, where past behavior was leveraged to predict future needs. A key feature introduced was the lagged variable PreviousMonthlySales, which captured the momentum of product-level demand from the prior month—a strong predictor in retail environments.
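The lagged PreviousMonthlySales feature can be built with a grouped shift, sketched below on toy data (column names are assumptions). Each store-product series is shifted by one month so that every row carries the prior month's sales alongside the current month's target.

```python
import pandas as pd

# Assumed monthly table: one row per store-product-month.
monthly = pd.DataFrame({
    "Store": [1, 1, 1],
    "Brand": [100, 100, 100],
    "Month": pd.period_range("2016-03", periods=3, freq="M"),
    "SalesQuantity": [10, 3, 7],
})

# Shift within each store-product group so the lag never leaks across
# series boundaries.
monthly = monthly.sort_values(["Store", "Brand", "Month"])
monthly["PreviousMonthlySales"] = (
    monthly.groupby(["Store", "Brand"])["SalesQuantity"].shift(1)
)
# The first month of each series has no history; drop (or impute) it.
monthly = monthly.dropna(subset=["PreviousMonthlySales"])
```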
To enhance model performance and reduce computational load, we limited the dataset to a subset of representative stores—specifically stores 50, 73, 67, 34, 76, and 69. This allowed us to preserve data diversity while streamlining processing. Categorical variables such as store ID, brand, classification, and vendor number were then encoded using one-hot encoding to make them compatible with machine learning algorithms. After completing feature preparation, we trained a Random Forest Regressor on the structured dataset, using an 80/20 train-test split to validate model accuracy.
The results were highly promising. The model achieved an R² score of 0.9967, indicating an excellent fit between predicted and actual sales. The Mean Absolute Error (MAE) was just 0.0019, and the Root Mean Squared Error (RMSE) stood at 0.0295—both suggesting exceptional accuracy in forecasting monthly sales quantities. This level of precision was largely attributed to thoughtful feature engineering, rigorous outlier control, and the strong temporal signal captured through historical sales data.
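The encoding, training, and evaluation steps described above can be sketched end-to-end. This runs on synthetic data (the feature names, store subset, and data-generating process are assumptions), so its scores will not match the figures reported for the real dataset; it only illustrates the pipeline shape.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned monthly table (assumed schema):
# sales driven mostly by the previous month's sales, plus noise.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Store": rng.choice([50, 73, 67, 34, 76, 69], n),
    "Brand": rng.choice([101, 202, 303], n),
    "PreviousMonthlySales": rng.poisson(8, n).astype(float),
})
df["SalesQuantity"] = df["PreviousMonthlySales"] * 0.9 + rng.normal(0, 1, n)

# One-hot encode the categoricals, then an 80/20 train-test split.
X = pd.get_dummies(df.drop(columns="SalesQuantity"),
                   columns=["Store", "Brand"])
y = df["SalesQuantity"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

print(f"R²:   {r2_score(y_te, pred):.4f}")
print(f"MAE:  {mean_absolute_error(y_te, pred):.4f}")
print(f"RMSE: {mean_squared_error(y_te, pred) ** 0.5:.4f}")
```

Note that when a lag of the target is the dominant feature, held-out scores can look very strong; in production it is worth validating on a chronologically later period as well.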
To operationalize the forecasting model, we developed a user-friendly Decision-support Tool tailored for store managers. This tool allows managers to input key product and store attributes—such as store ID, brand ID, classification, vendor number, and previous month’s sales—and receive a precise recommendation for the optimal purchase quantity for the upcoming month.
By translating complex predictive outputs into actionable guidance, the tool empowers managers to make informed, data-driven procurement decisions aligned with their store’s specific demand patterns. This localized intelligence enhances purchasing accuracy, reduces overstock risk, and supports leaner, more efficient inventory management across the retail network.
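The tool's core step, mapping a manager's inputs onto the model's feature columns and returning a purchase recommendation, can be sketched as below. The helper, its inputs, and the stand-in model are all hypothetical; in the real tool the trained Random Forest and its full one-hot column list would be used instead.

```python
import math

import pandas as pd
from sklearn.dummy import DummyRegressor

# Stand-in for the trained model and its (abbreviated) feature columns;
# both are assumptions for this sketch.
FEATURE_COLUMNS = ["PreviousMonthlySales", "Store_50", "Store_73"]
model = DummyRegressor(strategy="constant", constant=12.4)
model.fit(pd.DataFrame([[0, 0, 0]], columns=FEATURE_COLUMNS), [0])

def recommend_purchase(store: int, prev_sales: float) -> int:
    """Build a single feature row from manager inputs and return a
    whole-unit purchase recommendation."""
    row = pd.DataFrame([{c: 0.0 for c in FEATURE_COLUMNS}])
    row["PreviousMonthlySales"] = prev_sales
    col = f"Store_{store}"
    if col in row.columns:          # one-hot flag for a known store
        row[col] = 1.0
    forecast = model.predict(row[FEATURE_COLUMNS])[0]
    return math.ceil(forecast)      # round up so forecast demand is covered

print(recommend_purchase(50, 9.0))  # -> 13 with the stand-in model
```

Rounding the forecast up is one simple policy choice; a real deployment might instead add a safety-stock buffer sized to the forecast error.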
This analysis underscores the transformative potential of data science in retail operations—revealing hidden inefficiencies, quantifying financial leakage, and enabling more strategic, evidence-based purchasing decisions. For the liquor retail chain under study, the central insight was a recurring mismatch between purchasing volumes and actual store-level demand: some stores overstocked even fast-moving brands, leaving working capital tied up in unsold inventory.
To address these challenges, we advocate for the adoption of predictive analytics tools that transform historical sales patterns into forward-looking recommendations. When applied consistently, such tools allow organizations to align purchase quantities with forecasted demand, reduce overstock risk, and operate leaner inventories across the network.
Importantly, while the model provides robust monthly demand forecasts, it is based on one year of historical data. To avoid distortion caused by seasonal spikes, we excluded January and February—months that showed unusually high sales likely tied to holidays and New Year events. Managers using the tool should remain mindful of these seasonal peaks and proactively adjust their purchase plans for such periods, anticipating higher-than-average demand.
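The seasonal exclusion described above amounts to a simple month filter before training, sketched here on toy data (column names are assumptions):

```python
import pandas as pd

# Assumed monthly table; January and February show holiday-driven
# spikes, so they are excluded from the training window.
monthly = pd.DataFrame({
    "Month": pd.period_range("2016-01", periods=6, freq="M"),
    "SalesQuantity": [40, 35, 12, 11, 13, 12],
})
trainable = monthly[~monthly["Month"].dt.month.isin([1, 2])]
```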
In conclusion, this initiative demonstrates that inventory planning is no longer a function of guesswork or static rules. With machine learning and store-level analytics, retail organizations can shift from reactive stock management to a predictive, agile approach—ensuring better financial outcomes and stronger customer satisfaction.
View the full code behind this analysis in our Google Colab notebook.