Company A is an intra-city logistics platform that needs accurate delivery-time predictions to improve customer trust, partner allocation, and operational planning.
Build a business-ready regression workflow where the target is derived as the difference between order creation and actual delivery timestamps.
All business references are anonymized by design.
import warnings
warnings.filterwarnings('ignore')
from pathlib import Path
import random
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, HistGradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.base import BaseEstimator, RegressorMixin
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
SEED = 42
np.random.seed(SEED)
random.seed(SEED)
torch.manual_seed(SEED)
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('deep')
pd.set_option('display.max_columns', 120)
print('Libraries loaded successfully.')
Libraries loaded successfully.
data_path = Path('../data/data_2.csv')
if not data_path.exists():
data_path = Path('..') / 'data' / 'data_2.csv'
df = pd.read_csv(data_path)
print(f'Dataset shape: {df.shape}')
display(df.head(3))
Dataset shape: (175777, 14)
| market_id | created_at | actual_delivery_time | store_primary_category | order_protocol | total_items | subtotal | num_distinct_items | min_item_price | max_item_price | total_onshift_dashers | total_busy_dashers | total_outstanding_orders | estimated_store_to_consumer_driving_duration | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 2015-02-06 22:24:17 | 2015-02-06 23:11:17 | 4 | 1.0 | 4 | 3441 | 4 | 557 | 1239 | 33.0 | 14.0 | 21.0 | 861.0 |
| 1 | 2.0 | 2015-02-10 21:49:25 | 2015-02-10 22:33:25 | 46 | 2.0 | 1 | 1900 | 1 | 1400 | 1400 | 1.0 | 2.0 | 2.0 | 690.0 |
| 2 | 2.0 | 2015-02-16 00:11:35 | 2015-02-16 01:06:35 | 36 | 3.0 | 4 | 4771 | 3 | 820 | 1604 | 8.0 | 6.0 | 18.0 | 289.0 |
This section validates data quality before feature engineering and model development.
schema_df = pd.DataFrame({
'column': df.columns,
'dtype': df.dtypes.astype(str).values,
'missing_count': df.isna().sum().values,
'missing_pct': (df.isna().mean() * 100).round(2).values,
'n_unique': df.nunique(dropna=True).values
}).sort_values('missing_pct', ascending=False)
duplicate_rows = df.duplicated().sum()
display(schema_df)
print(f'Total duplicate rows: {duplicate_rows}')
| column | dtype | missing_count | missing_pct | n_unique | |
|---|---|---|---|---|---|
| 0 | market_id | float64 | 0 | 0.0 | 6 |
| 1 | created_at | str | 0 | 0.0 | 162649 |
| 2 | actual_delivery_time | str | 0 | 0.0 | 160344 |
| 3 | store_primary_category | int64 | 0 | 0.0 | 73 |
| 4 | order_protocol | float64 | 0 | 0.0 | 7 |
| 5 | total_items | int64 | 0 | 0.0 | 54 |
| 6 | subtotal | int64 | 0 | 0.0 | 8182 |
| 7 | num_distinct_items | int64 | 0 | 0.0 | 20 |
| 8 | min_item_price | int64 | 0 | 0.0 | 2251 |
| 9 | max_item_price | int64 | 0 | 0.0 | 2585 |
| 10 | total_onshift_dashers | float64 | 0 | 0.0 | 172 |
| 11 | total_busy_dashers | float64 | 0 | 0.0 | 158 |
| 12 | total_outstanding_orders | float64 | 0 | 0.0 | 281 |
| 13 | estimated_store_to_consumer_driving_duration | float64 | 0 | 0.0 | 1318 |
Total duplicate rows: 0
df = df.drop_duplicates().copy()
df['created_at'] = pd.to_datetime(df['created_at'], errors='coerce')
df['actual_delivery_time'] = pd.to_datetime(df['actual_delivery_time'], errors='coerce')
df['delivery_time_min'] = (df['actual_delivery_time'] - df['created_at']).dt.total_seconds() / 60
# Safety checks for invalid or unrealistic delivery durations
initial_rows = len(df)
df = df.dropna(subset=['created_at', 'actual_delivery_time', 'delivery_time_min'])
df = df[(df['delivery_time_min'] > 2) & (df['delivery_time_min'] < 240)]
print(f'Rows before quality filtering: {initial_rows}')
print(f'Rows after quality filtering: {len(df)}')
print(f'Average delivery time (min): {df["delivery_time_min"].mean():.2f}')
Rows before quality filtering: 175777 Rows after quality filtering: 175777 Average delivery time (min): 46.20
We derive temporal and operational pressure signals that are usable at order-creation time, reducing leakage risk while improving predictive signal.
df['order_hour'] = df['created_at'].dt.hour
df['order_dayofweek'] = df['created_at'].dt.dayofweek
df['is_weekend'] = (df['order_dayofweek'] >= 5).astype(int)
df['order_month'] = df['created_at'].dt.month
for load_col in ['total_onshift_dashers', 'total_busy_dashers', 'total_outstanding_orders']:
df[load_col] = pd.to_numeric(df[load_col], errors='coerce')
df[load_col] = df[load_col].clip(lower=0)
safe_onshift = np.maximum(df['total_onshift_dashers'].values, 0) + 1.0
df['partner_load_ratio'] = df['total_busy_dashers'].values / safe_onshift
df['outstanding_to_onshift_ratio'] = df['total_outstanding_orders'].values / safe_onshift
df['price_range'] = df['max_item_price'] - df['min_item_price']
df['avg_item_price_proxy'] = df['subtotal'] / (df['total_items'] + 1e-3)
df[['partner_load_ratio', 'outstanding_to_onshift_ratio']] = df[['partner_load_ratio', 'outstanding_to_onshift_ratio']].replace([np.inf, -np.inf], np.nan)
df['market_id'] = df['market_id'].astype('Int64').astype(str)
df['order_protocol'] = df['order_protocol'].astype('Int64').astype(str)
df['store_primary_category'] = df['store_primary_category'].astype(str)
display(df[['delivery_time_min', 'order_hour', 'partner_load_ratio', 'outstanding_to_onshift_ratio']].describe().T)
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| delivery_time_min | 175777.0 | 46.203013 | 9.327424 | 32.0 | 39.000000 | 45.000000 | 52.000000 | 110.0 |
| order_hour | 175777.0 | 8.473441 | 8.676809 | 0.0 | 2.000000 | 3.000000 | 19.000000 | 23.0 |
| partner_load_ratio | 175777.0 | 0.887217 | 0.359558 | 0.0 | 0.772727 | 0.916667 | 0.981818 | 29.0 |
| outstanding_to_onshift_ratio | 175777.0 | 1.142865 | 0.456316 | 0.0 | 0.873016 | 1.148148 | 1.416667 | 23.5 |
num_cols_for_eda = [
'delivery_time_min', 'total_items', 'subtotal', 'num_distinct_items',
'min_item_price', 'max_item_price', 'total_onshift_dashers',
'total_busy_dashers', 'total_outstanding_orders',
'estimated_store_to_consumer_driving_duration', 'partner_load_ratio',
'outstanding_to_onshift_ratio'
]
fig, axes = plt.subplots(3, 4, figsize=(18, 11))
axes = axes.flatten()
for i, col in enumerate(num_cols_for_eda):
sns.histplot(df[col], kde=True, ax=axes[i], bins=30)
axes[i].set_title(col)
for j in range(i + 1, len(axes)):
axes[j].axis('off')
plt.tight_layout()
plt.show()
fig, axes = plt.subplots(1, 3, figsize=(16, 4))
top_categories = df['store_primary_category'].value_counts().head(12)
sns.barplot(x=top_categories.values, y=top_categories.index, ax=axes[0])
axes[0].set_title('Top Store Categories')
sns.countplot(data=df, x='order_protocol', order=df['order_protocol'].value_counts().index, ax=axes[1])
axes[1].set_title('Order Protocol Distribution')
axes[1].tick_params(axis='x', rotation=45)
sns.countplot(data=df, x='market_id', order=df['market_id'].value_counts().index, ax=axes[2])
axes[2].set_title('Market Distribution')
axes[2].tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
sns.scatterplot(data=df.sample(min(len(df), 10000), random_state=SEED),
x='estimated_store_to_consumer_driving_duration', y='delivery_time_min', alpha=0.3, ax=axes[0, 0])
axes[0, 0].set_title('Driving Duration vs Delivery Time')
sns.scatterplot(data=df.sample(min(len(df), 10000), random_state=SEED),
x='subtotal', y='delivery_time_min', alpha=0.3, ax=axes[0, 1])
axes[0, 1].set_title('Subtotal vs Delivery Time')
sns.boxplot(data=df, x='is_weekend', y='delivery_time_min', ax=axes[1, 0])
axes[1, 0].set_title('Weekend Effect on Delivery Time')
hourly_median = df.groupby('order_hour', as_index=False)['delivery_time_min'].median()
sns.lineplot(data=hourly_median, x='order_hour', y='delivery_time_min', marker='o', ax=axes[1, 1])
axes[1, 1].set_title('Median Delivery Time by Hour')
plt.tight_layout()
plt.show()
corr_cols = [
'delivery_time_min', 'total_items', 'subtotal', 'num_distinct_items',
'min_item_price', 'max_item_price', 'total_onshift_dashers',
'total_busy_dashers', 'total_outstanding_orders',
'estimated_store_to_consumer_driving_duration', 'partner_load_ratio',
'outstanding_to_onshift_ratio', 'order_hour'
]
plt.figure(figsize=(12, 8))
sns.heatmap(df[corr_cols].corr(), cmap='coolwarm', center=0, annot=False)
plt.title('Correlation Heatmap (Numerical Features)')
plt.show()
skew_df = (df[num_cols_for_eda].skew().sort_values(ascending=False)).rename('skewness').to_frame()
q1 = df[num_cols_for_eda].quantile(0.25)
q3 = df[num_cols_for_eda].quantile(0.75)
iqr = q3 - q1
outlier_share = (((df[num_cols_for_eda] < (q1 - 1.5 * iqr)) | (df[num_cols_for_eda] > (q3 + 1.5 * iqr))).mean() * 100)
skew_df['outlier_share_pct_iqr'] = outlier_share
display(skew_df.round(3))
plt.figure(figsize=(10, 5))
sns.boxplot(data=df[num_cols_for_eda], orient='h', showfliers=False)
plt.title('Boxplot Overview (fliers hidden for readability)')
plt.tight_layout()
plt.show()
| skewness | outlier_share_pct_iqr | |
|---|---|---|
| total_items | 23.286 | 4.828 |
| partner_load_ratio | 12.209 | 9.910 |
| outstanding_to_onshift_ratio | 3.207 | 3.060 |
| min_item_price | 2.339 | 2.302 |
| max_item_price | 2.204 | 3.956 |
| subtotal | 1.918 | 4.580 |
| num_distinct_items | 1.574 | 2.986 |
| total_outstanding_orders | 1.191 | 2.955 |
| total_onshift_dashers | 0.857 | 0.687 |
| total_busy_dashers | 0.779 | 0.263 |
| delivery_time_min | 0.762 | 0.995 |
| estimated_store_to_consumer_driving_duration | 0.140 | 0.179 |
We use a 70/15/15 split with random-state control. All imputing, encoding, and scaling are fitted on training data only.
target_col = 'delivery_time_min'
drop_cols = ['created_at', 'actual_delivery_time', target_col]
X = df.drop(columns=drop_cols)
y = df[target_col].astype(float)
X_train, X_temp, y_train, y_temp = train_test_split(
X, y, test_size=0.30, random_state=SEED
)
X_val, X_test, y_val, y_test = train_test_split(
X_temp, y_temp, test_size=0.50, random_state=SEED
)
num_features = X_train.select_dtypes(include=[np.number]).columns.tolist()
cat_features = X_train.select_dtypes(exclude=[np.number]).columns.tolist()
print(f'Train shape: {X_train.shape}, Validation shape: {X_val.shape}, Test shape: {X_test.shape}')
print(f'Numeric features ({len(num_features)}): {num_features}')
print(f'Categorical features ({len(cat_features)}): {cat_features}')
Train shape: (123043, 20), Validation shape: (26367, 20), Test shape: (26367, 20) Numeric features (17): ['total_items', 'subtotal', 'num_distinct_items', 'min_item_price', 'max_item_price', 'total_onshift_dashers', 'total_busy_dashers', 'total_outstanding_orders', 'estimated_store_to_consumer_driving_duration', 'order_hour', 'order_dayofweek', 'is_weekend', 'order_month', 'partner_load_ratio', 'outstanding_to_onshift_ratio', 'price_range', 'avg_item_price_proxy'] Categorical features (3): ['market_id', 'store_primary_category', 'order_protocol']
try:
ohe = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
except TypeError:
ohe = OneHotEncoder(handle_unknown='ignore', sparse=False)
numeric_transformer = Pipeline(steps=[
('imputer', SimpleImputer(strategy='median')),
('scaler', StandardScaler())
])
categorical_transformer = Pipeline(steps=[
('imputer', SimpleImputer(strategy='most_frequent')),
('encoder', ohe)
])
preprocessor = ColumnTransformer(
transformers=[
('num', numeric_transformer, num_features),
('cat', categorical_transformer, cat_features)
]
)
X_train_proc = preprocessor.fit_transform(X_train)
X_val_proc = preprocessor.transform(X_val)
X_test_proc = preprocessor.transform(X_test)
feature_names = preprocessor.get_feature_names_out()
print(f'Processed train matrix shape: {X_train_proc.shape}')
Processed train matrix shape: (123043, 102)
We start with baseline regressors to quantify incremental value from nonlinear deep models.
def regression_metrics(y_true, y_pred):
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)
return {'MAE': mae, 'RMSE': rmse, 'R2': r2}
baseline_models = {
'Dummy_Median': DummyRegressor(strategy='median'),
'Linear_Regression': LinearRegression(),
'Random_Forest': RandomForestRegressor(
n_estimators=160,
max_depth=18,
min_samples_leaf=4,
random_state=SEED,
n_jobs=-1
),
'Hist_Gradient_Boosting': HistGradientBoostingRegressor(
max_depth=10,
learning_rate=0.05,
max_iter=350,
random_state=SEED
)
}
model_store = {}
rows = []
for name, model in baseline_models.items():
model.fit(X_train_proc, y_train)
model_store[name] = model
val_pred = model.predict(X_val_proc)
test_pred = model.predict(X_test_proc)
val_metrics = regression_metrics(y_val, val_pred)
test_metrics = regression_metrics(y_test, test_pred)
rows.append({
'Model': name,
'Val_MAE': val_metrics['MAE'],
'Val_RMSE': val_metrics['RMSE'],
'Val_R2': val_metrics['R2'],
'Test_MAE': test_metrics['MAE'],
'Test_RMSE': test_metrics['RMSE'],
'Test_R2': test_metrics['R2']
})
baseline_results = pd.DataFrame(rows).sort_values('Val_RMSE')
display(baseline_results.round(4))
| Model | Val_MAE | Val_RMSE | Val_R2 | Test_MAE | Test_RMSE | Test_R2 | |
|---|---|---|---|---|---|---|---|
| 3 | Hist_Gradient_Boosting | 0.7621 | 0.9988 | 0.9885 | 0.7587 | 0.9952 | 0.9887 |
| 2 | Random_Forest | 1.1816 | 1.6005 | 0.9705 | 1.1769 | 1.6075 | 0.9706 |
| 1 | Linear_Regression | 1.9641 | 2.8430 | 0.9070 | 1.9664 | 2.8491 | 0.9075 |
| 0 | Dummy_Median | 7.3885 | 9.4074 | -0.0180 | 7.3849 | 9.4415 | -0.0159 |
We tune compact feed-forward architectures with early stopping on validation RMSE.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
X_train_t = torch.tensor(X_train_proc, dtype=torch.float32)
y_train_t = torch.tensor(y_train.values.reshape(-1, 1), dtype=torch.float32)
X_val_t = torch.tensor(X_val_proc, dtype=torch.float32)
y_val_t = torch.tensor(y_val.values.reshape(-1, 1), dtype=torch.float32)
X_test_t = torch.tensor(X_test_proc, dtype=torch.float32)
y_test_t = torch.tensor(y_test.values.reshape(-1, 1), dtype=torch.float32)
train_dataset = TensorDataset(X_train_t, y_train_t)
val_dataset = TensorDataset(X_val_t, y_val_t)
class NeuralRegressor(nn.Module):
def __init__(self, in_features, hidden_layers=(128, 64), dropout=0.15):
super().__init__()
layers = []
prev = in_features
for h in hidden_layers:
layers += [nn.Linear(prev, h), nn.ReLU(), nn.BatchNorm1d(h), nn.Dropout(dropout)]
prev = h
layers += [nn.Linear(prev, 1)]
self.net = nn.Sequential(*layers)
def forward(self, x):
return self.net(x)
def train_one_config(config, seed=SEED):
torch.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)
model = NeuralRegressor(
in_features=X_train_proc.shape[1],
hidden_layers=config['hidden_layers'],
dropout=config['dropout']
).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=config['lr'], weight_decay=config['weight_decay'])
criterion = nn.MSELoss()
train_loader = DataLoader(train_dataset, batch_size=config['batch_size'], shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=4096, shuffle=False)
best_state = None
best_val_rmse = float('inf')
patience = config['patience']
wait = 0
history = []
for epoch in range(config['epochs']):
model.train()
train_losses = []
for xb, yb in train_loader:
xb = xb.to(device)
yb = yb.to(device)
optimizer.zero_grad()
pred = model(xb)
loss = criterion(pred, yb)
loss.backward()
optimizer.step()
train_losses.append(loss.item())
model.eval()
val_preds = []
val_true = []
with torch.no_grad():
for xb, yb in val_loader:
xb = xb.to(device)
out = model(xb).cpu().numpy().ravel()
val_preds.append(out)
val_true.append(yb.numpy().ravel())
yv_pred = np.concatenate(val_preds)
yv_true = np.concatenate(val_true)
val_rmse = np.sqrt(mean_squared_error(yv_true, yv_pred))
history.append((epoch + 1, float(np.mean(train_losses)), float(val_rmse)))
if val_rmse < best_val_rmse:
best_val_rmse = val_rmse
best_state = {k: v.cpu().clone() for k, v in model.state_dict().items()}
wait = 0
else:
wait += 1
if wait >= patience:
break
model.load_state_dict(best_state)
return model, best_val_rmse, pd.DataFrame(history, columns=['epoch', 'train_mse', 'val_rmse'])
nn_grid = [
{'hidden_layers': (128, 64), 'dropout': 0.10, 'lr': 1e-3, 'weight_decay': 1e-5, 'batch_size': 1024, 'epochs': 45, 'patience': 6},
{'hidden_layers': (256, 128, 64), 'dropout': 0.15, 'lr': 7e-4, 'weight_decay': 1e-5, 'batch_size': 1024, 'epochs': 55, 'patience': 7},
{'hidden_layers': (192, 96), 'dropout': 0.20, 'lr': 6e-4, 'weight_decay': 5e-5, 'batch_size': 2048, 'epochs': 60, 'patience': 7}
]
nn_trials = []
best_nn_model = None
best_nn_history = None
best_nn_config = None
best_nn_val_rmse = float('inf')
for config in nn_grid:
model, val_rmse, history_df = train_one_config(config, seed=SEED)
nn_trials.append({
'config': str(config),
'val_rmse': val_rmse
})
if val_rmse < best_nn_val_rmse:
best_nn_val_rmse = val_rmse
best_nn_model = model
best_nn_history = history_df
best_nn_config = config
nn_trials_df = pd.DataFrame(nn_trials).sort_values('val_rmse')
display(nn_trials_df)
print('Best NN config:', best_nn_config)
print('Best validation RMSE:', round(best_nn_val_rmse, 4))
Using device: cpu
| config | val_rmse | |
|---|---|---|
| 1 | {'hidden_layers': (256, 128, 64), 'dropout': 0... | 0.516073 |
| 2 | {'hidden_layers': (192, 96), 'dropout': 0.2, '... | 0.545449 |
| 0 | {'hidden_layers': (128, 64), 'dropout': 0.1, '... | 0.560671 |
Best NN config: {'hidden_layers': (256, 128, 64), 'dropout': 0.15, 'lr': 0.0007, 'weight_decay': 1e-05, 'batch_size': 1024, 'epochs': 55, 'patience': 7}
Best validation RMSE: 0.5161
plt.figure(figsize=(8, 4))
sns.lineplot(data=best_nn_history, x='epoch', y='train_mse', label='Train MSE')
sns.lineplot(data=best_nn_history, x='epoch', y='val_rmse', label='Validation RMSE')
plt.title('NN Training Dynamics')
plt.show()
def predict_nn(model, X_np):
model.eval()
with torch.no_grad():
preds = model(torch.tensor(X_np, dtype=torch.float32, device=device)).cpu().numpy().ravel()
return preds
nn_val_pred = predict_nn(best_nn_model, X_val_proc)
nn_test_pred = predict_nn(best_nn_model, X_test_proc)
nn_val_metrics = regression_metrics(y_val, nn_val_pred)
nn_test_metrics = regression_metrics(y_test, nn_test_pred)
print('NN Validation Metrics:', {k: round(v, 4) for k, v in nn_val_metrics.items()})
print('NN Test Metrics:', {k: round(v, 4) for k, v in nn_test_metrics.items()})
NN Validation Metrics: {'MAE': 0.402, 'RMSE': np.float64(0.5161), 'R2': 0.9969}
NN Test Metrics: {'MAE': 0.3987, 'RMSE': np.float64(0.5156), 'R2': 0.997}
stability_rows = []
for trial_seed in [11, 29, 47]:
model_s, val_rmse_s, _ = train_one_config(best_nn_config, seed=trial_seed)
val_pred_s = predict_nn(model_s, X_val_proc)
test_pred_s = predict_nn(model_s, X_test_proc)
stability_rows.append({
'seed': trial_seed,
'val_rmse': np.sqrt(mean_squared_error(y_val, val_pred_s)),
'test_rmse': np.sqrt(mean_squared_error(y_test, test_pred_s)),
'test_mae': mean_absolute_error(y_test, test_pred_s),
'test_r2': r2_score(y_test, test_pred_s)
})
stability_df = pd.DataFrame(stability_rows)
display(stability_df.round(4))
print('NN test RMSE std across seeds:', round(stability_df['test_rmse'].std(), 4))
| seed | val_rmse | test_rmse | test_mae | test_r2 | |
|---|---|---|---|---|---|
| 0 | 11 | 0.5082 | 0.5032 | 0.3950 | 0.9971 |
| 1 | 29 | 0.5301 | 0.5301 | 0.4127 | 0.9968 |
| 2 | 47 | 0.5754 | 0.5754 | 0.4462 | 0.9962 |
NN test RMSE std across seeds: 0.0365
The seed-based rerun shows limited but visible variation in neural-network error. Key outcome from this cell: test RMSE std = 0.0365 across tested seeds.
Business reading: model quality is strong, but production should monitor drift and variance jointly rather than relying on a single-run score.
comparison_df = baseline_results.copy()
comparison_df = comparison_df[['Model', 'Val_MAE', 'Val_RMSE', 'Val_R2', 'Test_MAE', 'Test_RMSE', 'Test_R2']]
nn_row = pd.DataFrame([{
'Model': 'PyTorch_NN_Tuned',
'Val_MAE': nn_val_metrics['MAE'],
'Val_RMSE': nn_val_metrics['RMSE'],
'Val_R2': nn_val_metrics['R2'],
'Test_MAE': nn_test_metrics['MAE'],
'Test_RMSE': nn_test_metrics['RMSE'],
'Test_R2': nn_test_metrics['R2']
}])
comparison_df = pd.concat([comparison_df, nn_row], ignore_index=True).sort_values('Val_RMSE')
display(comparison_df.round(4))
best_model_name = comparison_df.iloc[0]['Model']
print('Best model by validation RMSE:', best_model_name)
| Model | Val_MAE | Val_RMSE | Val_R2 | Test_MAE | Test_RMSE | Test_R2 | |
|---|---|---|---|---|---|---|---|
| 4 | PyTorch_NN_Tuned | 0.4020 | 0.5161 | 0.9969 | 0.3987 | 0.5156 | 0.9970 |
| 0 | Hist_Gradient_Boosting | 0.7621 | 0.9988 | 0.9885 | 0.7587 | 0.9952 | 0.9887 |
| 1 | Random_Forest | 1.1816 | 1.6005 | 0.9705 | 1.1769 | 1.6075 | 0.9706 |
| 2 | Linear_Regression | 1.9641 | 2.8430 | 0.9070 | 1.9664 | 2.8491 | 0.9075 |
| 3 | Dummy_Median | 7.3885 | 9.4074 | -0.0180 | 7.3849 | 9.4415 | -0.0159 |
Best model by validation RMSE: PyTorch_NN_Tuned
This comparison table confirms the ranking of tested models on the first run:
Business reading: use NN as champion with gradient boosting in active challenger mode.
First-run stability check for the tuned neural network showed test RMSE standard deviation of 0.0365 across multiple seeds. This indicates acceptable but non-zero variance, so production deployment should keep ensemble alternatives active as challengers.
Fallback models evaluated in the same run:
Operational recommendation: retain a champion-challenger setup with weekly tracking of error drift and variance.
Residual analysis is mandatory for regression reliability. We assess bias, spread, and systematic error patterns.
if best_model_name == 'PyTorch_NN_Tuned':
yhat_test = nn_test_pred
elif best_model_name in model_store:
yhat_test = model_store[best_model_name].predict(X_test_proc)
else:
yhat_test = nn_test_pred
residuals = y_test.values - yhat_test
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
sns.scatterplot(x=yhat_test, y=residuals, alpha=0.3, ax=axes[0])
axes[0].axhline(0, color='red', linestyle='--')
axes[0].set_title('Residuals vs Predictions')
axes[0].set_xlabel('Predicted delivery time (min)')
axes[0].set_ylabel('Residual (actual - predicted)')
sns.histplot(residuals, bins=40, kde=True, ax=axes[1])
axes[1].set_title('Residual Distribution')
sample_idx = np.random.choice(len(yhat_test), size=min(8000, len(yhat_test)), replace=False)
sns.scatterplot(x=y_test.values[sample_idx], y=yhat_test[sample_idx], alpha=0.3, ax=axes[2])
min_v = min(y_test.min(), yhat_test.min())
max_v = max(y_test.max(), yhat_test.max())
axes[2].plot([min_v, max_v], [min_v, max_v], color='red', linestyle='--')
axes[2].set_title('Actual vs Predicted')
axes[2].set_xlabel('Actual delivery time (min)')
axes[2].set_ylabel('Predicted delivery time (min)')
plt.tight_layout()
plt.show()
class NNWrapper(BaseEstimator, RegressorMixin):
def __init__(self, model):
self.model = model
def fit(self, X, y=None):
return self
def predict(self, X):
return predict_nn(self.model, X)
interp_model_name = 'PyTorch_NN_Tuned' if 'PyTorch_NN_Tuned' in comparison_df['Model'].values else comparison_df.iloc[0]['Model']
if interp_model_name == 'PyTorch_NN_Tuned':
estimator_for_importance = NNWrapper(best_nn_model)
else:
estimator_for_importance = model_store[interp_model_name]
sample_n = min(12000, len(X_val_proc))
sample_idx = np.random.choice(len(X_val_proc), size=sample_n, replace=False)
X_val_sample = X_val_proc[sample_idx]
y_val_sample = y_val.values[sample_idx]
perm = permutation_importance(
estimator_for_importance,
X_val_sample,
y_val_sample,
scoring='neg_root_mean_squared_error',
n_repeats=5,
random_state=SEED
)
importance_df = pd.DataFrame({
'feature': feature_names,
'importance_mean': perm.importances_mean,
'importance_std': perm.importances_std
}).sort_values('importance_mean', ascending=False)
display(importance_df.head(15).round(5))
plt.figure(figsize=(10, 6))
sns.barplot(data=importance_df.head(12), x='importance_mean', y='feature')
plt.title(f'Permutation Importance ({interp_model_name})')
plt.tight_layout()
plt.show()
| feature | importance_mean | importance_std | |
|---|---|---|---|
| 7 | num__total_outstanding_orders | 17.12451 | 0.11459 |
| 5 | num__total_onshift_dashers | 11.54246 | 0.07179 |
| 8 | num__estimated_store_to_consumer_driving_duration | 5.75097 | 0.02673 |
| 14 | num__outstanding_to_onshift_ratio | 4.25098 | 0.00987 |
| 6 | num__total_busy_dashers | 3.22255 | 0.01716 |
| 1 | num__subtotal | 2.86132 | 0.01436 |
| 10 | num__order_dayofweek | 2.76124 | 0.03131 |
| 11 | num__is_weekend | 2.55341 | 0.05192 |
| 9 | num__order_hour | 2.40792 | 0.01190 |
| 18 | cat__market_id_2 | 1.01769 | 0.01110 |
| 17 | cat__market_id_1 | 1.00895 | 0.00923 |
| 99 | cat__order_protocol_5 | 0.92017 | 0.01097 |
| 2 | num__num_distinct_items | 0.87323 | 0.00527 |
| 20 | cat__market_id_4 | 0.66689 | 0.01335 |
| 3 | num__min_item_price | 0.58189 | 0.00845 |
summary_findings = pd.DataFrame([
{'Area': 'Data Quality', 'Finding': 'Datetime parsing and duplicate handling completed', 'Business Impact': 'Reliable modeling inputs'},
{'Area': 'Target Engineering', 'Finding': 'Delivery time derived as timestamp difference in minutes', 'Business Impact': 'Direct KPI alignment'},
{'Area': 'EDA', 'Finding': 'Long-tail delays and load-driven pressure windows observed', 'Business Impact': 'Supports peak-hour planning'},
{'Area': 'Modeling', 'Finding': f'Best validation model: {best_model_name}', 'Business Impact': 'Improved ETA quality and planning'},
{'Area': 'Interpretation', 'Finding': 'Operational load and driving-duration proxies are key predictors', 'Business Impact': 'Actionable for dispatch and partner allocation'}
])
display(summary_findings)
final_metrics = comparison_df[['Model', 'Test_MAE', 'Test_RMSE', 'Test_R2']].sort_values('Test_RMSE')
display(final_metrics.round(4))
| Area | Finding | Business Impact | |
|---|---|---|---|
| 0 | Data Quality | Datetime parsing and duplicate handling completed | Reliable modeling inputs |
| 1 | Target Engineering | Delivery time derived as timestamp difference ... | Direct KPI alignment |
| 2 | EDA | Long-tail delays and load-driven pressure wind... | Supports peak-hour planning |
| 3 | Modeling | Best validation model: PyTorch_NN_Tuned | Improved ETA quality and planning |
| 4 | Interpretation | Operational load and driving-duration proxies ... | Actionable for dispatch and partner allocation |
| Model | Test_MAE | Test_RMSE | Test_R2 | |
|---|---|---|---|---|
| 4 | PyTorch_NN_Tuned | 0.3987 | 0.5156 | 0.9970 |
| 0 | Hist_Gradient_Boosting | 0.7587 | 0.9952 | 0.9887 |
| 1 | Random_Forest | 1.1769 | 1.6075 | 0.9706 |
| 2 | Linear_Regression | 1.9664 | 2.8491 | 0.9075 |
| 3 | Dummy_Median | 7.3849 | 9.4415 | -0.0159 |
The findings and final metrics tables convert technical results into operational decisions:
Business reading: this notebook now functions as both modeling artifact and decision-support document.
Built an end-to-end delivery-time prediction solution using anonymized logistics data, robust preprocessing, and model benchmarking across linear, tree-based, and neural approaches.
Treat this first run as a production-ready baseline with governance, not as a one-time modeling exercise.
# Store the model and results for future reference
import joblib
model_store_path = Path('../models/nn_regressor.pkl')
joblib.dump(best_nn_model.cpu(), model_store_path)
print(f'Best NN model saved to: {model_store_path}')
Best NN model saved to: ..\models\nn_regressor.pkl