Implementation of the paper: “Graph-Enhanced Deep Reinforcement Learning for Multi-Objective Unrelated Parallel Machine Scheduling” (WSC 2025)
Authors: Bulent Soykan, Sean Mondesire, Ghaith Rabadi, Grace Bochenek

Institution: University of Central Florida
This repository implements a novel Deep Reinforcement Learning (DRL) framework for solving the Unrelated Parallel Machine Scheduling Problem (UPMSP) with multiple objectives. The approach combines:
✅ Handles complex real-world constraints (e.g., machine eligibility and sequence-dependent setup times)
✅ Learns direct scheduling policies (not just heuristic selection)
✅ Achieves a superior trade-off between conflicting objectives
✅ Fast inference time suitable for dynamic environments
Given:
Minimize simultaneously:
TWT = Σ_j w_j · max(0, C_j − d_j)

TST = Σ_k ( s_{0, k(1), k} + Σ_l s_{k(l), k(l+1), k} )

where w_j, d_j, and C_j are the weight, due date, and completion time of job j; k(l) is the l-th job in machine k's sequence; and s_{i, j, k} is the sequence-dependent setup time on machine k when job j follows job i (i = 0 denotes the machine's initial state).
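For concreteness, here is a minimal sketch of computing both objectives from a finished schedule. The schedule representation is a hypothetical illustration, not the format used by `environment.py`:

```python
def compute_objectives(schedule, weights, due_dates, setup_times):
    """schedule: {machine k: [(job j, completion time C_j), ...] in processing order}.
    setup_times[k][i][j]: setup on machine k when job j follows job i (i = 0 is the initial state)."""
    twt, tst = 0.0, 0.0
    for k, sequence in schedule.items():
        prev = 0  # 0 denotes the machine's initial setup state
        for job, completion in sequence:
            twt += weights[job] * max(0.0, completion - due_dates[job])  # weighted tardiness of this job
            tst += setup_times[k][prev][job]                             # setup incurred before this job
            prev = job
    return twt, tst
```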
This problem is:
Traditional methods struggle with:
The UPMSP is formulated as an MDP (S, A, P, R, γ):
State (s_t):
Action (a_t):
Reward (r_t):
The GNN captures complex relationships:
```
Input Features
      ↓
Node Projections (Job, Machine, Setup)
      ↓
GATv2 Layers (4 layers, 128 hidden dim)
  - Job ↔ Machine edges (eligibility, processing time, setup)
  - Machine ↔ Setup edges (current setup state)
  - Job ↔ Setup edges (required setup)
      ↓
Graph Embeddings
      ↓
Actor-Critic Networks
  - Actor: selects actions (job-machine pairs)
  - Critic: estimates state value
```
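The exact layer configuration lives in `gnn_model.py`; below is a minimal sketch of a heterogeneous GATv2 encoder in PyTorch Geometric matching the diagram above. Feature sizes and relation names are illustrative assumptions, not the repository's schema:

```python
# Illustrative only: feature sizes, relation names, and module layout are assumptions,
# not the exact schema of gnn_model.py.
import torch
from torch import nn
from torch_geometric.nn import GATv2Conv, HeteroConv

class HeteroGATv2Encoder(nn.Module):
    def __init__(self, hidden_dim=128, num_layers=4, heads=4):
        super().__init__()
        # Project each node type's raw features to a shared hidden size
        self.proj = nn.ModuleDict({
            "job": nn.Linear(8, hidden_dim),       # e.g. processing times, due date, weight
            "machine": nn.Linear(4, hidden_dim),   # e.g. load, current setup state
            "setup": nn.Linear(2, hidden_dim),     # e.g. setup family features
        })
        def gat():
            # Bipartite attention between two node types; concat of heads keeps hidden_dim
            return GATv2Conv(hidden_dim, hidden_dim // heads, heads=heads, add_self_loops=False)
        self.layers = nn.ModuleList([
            HeteroConv({
                ("job", "eligible_on", "machine"): gat(),
                ("machine", "can_process", "job"): gat(),
                ("machine", "configured_as", "setup"): gat(),
                ("job", "requires", "setup"): gat(),
            }, aggr="sum")
            for _ in range(num_layers)
        ])

    def forward(self, x_dict, edge_index_dict):
        h = {t: self.proj[t](x).relu() for t, x in x_dict.items()}
        for conv in self.layers:
            h = {t: v.relu() for t, v in conv(h, edge_index_dict).items()}
        return h  # per-node embeddings consumed by the actor and critic heads
```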
Key Design Choices:
Hyperparameters (from paper):
Advantages of PPO:
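For reference, here is a minimal sketch of the clipped surrogate objective that PPO optimizes (standard formulation; the hyperparameter name is illustrative and not taken from `ppo_agent.py`):

```python
import torch

def ppo_clipped_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective, returned as a loss to minimize."""
    ratio = torch.exp(new_logp - old_logp)                        # pi_theta(a|s) / pi_theta_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                  # maximize the clipped surrogate
```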
```
src/
├── environment.py          # UPMSP simulator
├── instance_generator.py   # Problem instance generation
├── gnn_model.py            # Heterogeneous GNN architecture
├── ppo_agent.py            # PPO algorithm implementation
├── baselines.py            # ATCSR_Rm and GA baselines
└── visualizations.py       # Plotting and analysis

example_demo.py             # Demonstration script
requirements.txt            # Dependencies
```
Key components:

- `environment.py`: UPMSP simulator
- `instance_generator.py`: Problem instance generation
- `gnn_model.py`:
  - `HeterogeneousGNN`: Multi-relation graph encoder
  - `StateEncoder`: Converts environment state to a graph
- `ppo_agent.py`:
  - `GNN_PPO_Policy`: Actor-critic with shared GNN
  - `PPOTrainer`: PPO update algorithm
- `baselines.py`: ATCSR_Rm and GA baselines

```bash
# Clone repository
git clone https://github.com/bulentsoykan/GNN-DRL4UPMSP.git
cd GNN-DRL4UPMSP

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# For CUDA 11.8 (adjust for your CUDA version)
pip install torch-geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
```

Run the demo:

```bash
python example_demo.py
```
This will:
Generated figures are saved to `results/figures/`.

```python
from src.instance_generator import InstanceGenerator

generator = InstanceGenerator(seed=42)
instance = generator.generate(
    n_jobs=50,
    n_machines=10,
    tau=0.4,      # due date tightness
    R=0.6,        # due date range
    beta=0.1,     # setup time ratio
    delta=0.75    # eligibility density
)
```
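The `tau` and `R` parameters follow the usual due-date tightness/range convention in scheduling benchmarks. The sketch below shows one common way such due dates are drawn; it is an assumption for illustration, not taken from `instance_generator.py`:

```python
import numpy as np

def sample_due_dates(processing_times, tau=0.4, R=0.6, seed=42):
    """processing_times: (n_jobs, n_machines) array of p_{j,k}; tau = tightness, R = range."""
    rng = np.random.default_rng(seed)
    n_jobs, n_machines = processing_times.shape
    # Rough makespan estimate: average processing requirement spread over the machines
    p_hat = processing_times.mean(axis=1).sum() / n_machines
    low, high = p_hat * (1 - tau - R / 2), p_hat * (1 - tau + R / 2)
    return rng.uniform(low, high, size=n_jobs)  # larger tau -> tighter dates, larger R -> wider spread
```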
```python
from src.baselines import ATCSR_Rm_Solver

solver = ATCSR_Rm_Solver(k1=2.0, k2=2.0)
result = solver.solve(instance)
print(f"TWT: {result['total_weighted_tardiness']:.2f}")
print(f"TST: {result['total_setup_time']:.2f}")
```
```python
from src.baselines import GeneticAlgorithmSolver

ga = GeneticAlgorithmSolver(
    population_size=50,
    n_generations=100,
    alpha=0.5  # weight for TWT
)
result = ga.solve(instance)
```
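The `alpha` parameter scalarizes the two objectives. A minimal sketch of the kind of weighted-sum fitness such a GA typically minimizes (the normalization references are illustrative, not the repository's exact formula):

```python
def weighted_fitness(twt, tst, alpha=0.5, twt_ref=1.0, tst_ref=1.0):
    """Lower is better: alpha weights (normalized) TWT, (1 - alpha) weights (normalized) TST."""
    return alpha * (twt / twt_ref) + (1.0 - alpha) * (tst / tst_ref)
```

Setting `alpha` closer to 1 pushes the search toward tardiness reduction at the cost of extra setups, and vice versa.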
| Size | Method | Avg TWT | Avg TST | Avg Comp Time (s) |
|---|---|---|---|---|
| n=20, m=5 | ATCSR_Rm | 150.0 | 75.0 | <0.01 |
| n=20, m=5 | GA | 120.0 | 70.0 | 60.10 |
| n=20, m=5 | PPO-GNN | 110.0 | 65.0 | 0.52 |
| n=50, m=10 | ATCSR_Rm | 355.0 | 190.0 | <0.01 |
| n=50, m=10 | GA | 300.0 | 165.0 | 60.11 |
| n=50, m=10 | PPO-GNN | 260.0 | 140.0 | 0.92 |
| n=100, m=15 | ATCSR_Rm | 610.0 | 290.0 | <0.01 |
| n=100, m=15 | GA | 475.0 | 255.0 | 60.12 |
| n=100, m=15 | PPO-GNN | 420.0 | 225.0 | 1.57 |
✨ PPO-GNN achieves:
The implementation generates:
See `results/figures/` after running the demo.
Problem sizes:
Parameters:
Test set:
GNN Architecture:
PPO Hyperparameters:
Hardware:
```
GNN-DRL4UPMSP/
│
├── src/
│   ├── __init__.py
│   ├── environment.py           # UPMSP environment
│   ├── instance_generator.py    # Instance generation
│   ├── gnn_model.py             # Heterogeneous GNN
│   ├── ppo_agent.py             # PPO implementation
│   ├── baselines.py             # ATCSR_Rm & GA
│   └── visualizations.py        # Plotting functions
│
├── results/
│   ├── figures/                 # Generated plots
│   └── models/                  # Saved models
│
├── requirements.txt             # Dependencies
├── example_demo.py              # Demo script
└── README.md                    # This file
```
- Novel PPO application for direct scheduling policy learning in complex multi-objective UPMSP
- Heterogeneous GNN design tailored for UPMSP with explicit modeling of jobs, machines, and setups
- Multi-objective reward function that effectively balances TWT and TST minimization
- Empirical validation showing PPO-GNN dominates both heuristic and metaheuristic baselines
Potential extensions mentioned in the paper:
If you use this code or build upon this work, please cite:
```bibtex
@inproceedings{soykan2025graph,
  title        = {Graph-Enhanced Deep Reinforcement Learning for Multi-Objective Unrelated Parallel Machine Scheduling},
  author       = {Soykan, Bulent and Mondesire, Sean and Rabadi, Ghaith and Bochenek, Grace},
  booktitle    = {2025 Winter Simulation Conference (WSC)},
  year         = {2025},
  organization = {IEEE}
}
```
For questions or issues:
Note: This implementation provides the framework and baseline methods. Training the full PPO-GNN agent requires significant computational resources (GPU, multiple days of training). The demo uses simulated PPO-GNN results for illustration.
For the complete trained models or additional experimental data, please contact the author.