GNN-DRL4UPMSP

Graph-Enhanced Deep Reinforcement Learning for Multi-Objective Unrelated Parallel Machine Scheduling

Implementation of the paper: “Graph-Enhanced Deep Reinforcement Learning for Multi-Objective Unrelated Parallel Machine Scheduling” (WSC 2025)

Authors: Bulent Soykan, Sean Mondesire, Ghaith Rabadi, Grace Bochenek
Institution: University of Central Florida

🎯 Overview

This repository implements a novel Deep Reinforcement Learning (DRL) framework for solving the Unrelated Parallel Machine Scheduling Problem (UPMSP) with multiple objectives. The approach combines a heterogeneous graph neural network (GNN) state encoder with Proximal Policy Optimization (PPO) to learn scheduling policies directly from the problem's graph structure.

Key Features

✅ Handles complex real-world constraints:

  - Machine-dependent (unrelated) processing times
  - Sequence-dependent setup times
  - Machine eligibility restrictions
  - Job-specific due dates and weights

✅ Learns direct scheduling policies (not just heuristic selection)

✅ Achieves a superior trade-off between conflicting objectives

✅ Fast inference, suitable for dynamic environments

📖 Problem Description

Unrelated Parallel Machine Scheduling Problem (UPMSP)

Given:

  - A set of n jobs and m unrelated parallel machines
  - Processing time p_{j,k} for job j on machine k (machine-dependent, hence "unrelated")
  - Sequence-dependent setup time s_{i,j,k} incurred when job j follows job i on machine k
  - Due date d_j and weight w_j for each job j
  - Machine eligibility constraints (each job may run only on a subset of machines)

The task is to assign each job to an eligible machine and sequence the jobs on each machine.

Objectives

Minimize simultaneously:

  1. Total Weighted Tardiness (TWT):
     TWT = Σ_j w_j × max(0, C_j - d_j)
     where C_j is the completion time of job j

  2. Total Setup Time (TST):
     TST = Σ_k ( s_{0,σ_k(1),k} + Σ_l s_{σ_k(l),σ_k(l+1),k} )
     where σ_k(l) is the l-th job sequenced on machine k and s_{0,j,k} is the initial setup for job j on machine k

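As a concrete reading of these formulas, the sketch below evaluates both objectives for a fixed schedule. The data layout (nested dicts keyed by job and machine indices, with index 0 as the initial setup state) is illustrative and not the repository's actual API:

# Illustrative objective evaluation; data structures are assumptions, not the repo's API.
def evaluate_schedule(sequences, p, s, d, w):
    """sequences: {machine k: [job, ...]} in processing order.
    p[j][k]: processing time of job j on machine k;
    s[i][j][k]: setup time when j follows i on k (i=0 means initial setup);
    d[j]: due date; w[j]: tardiness weight."""
    twt, tst = 0.0, 0.0
    for k, seq in sequences.items():
        t, prev = 0.0, 0                      # machine clock, previous job (0 = none yet)
        for j in seq:
            setup = s[prev][j][k]
            t += setup + p[j][k]              # completion time C_j of job j
            tst += setup                      # accumulate Total Setup Time
            twt += w[j] * max(0.0, t - d[j])  # accumulate Total Weighted Tardiness
            prev = j
    return twt, tst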
Why This Matters

This problem is:

  - NP-hard, so exact methods do not scale beyond small instances
  - Ubiquitous in manufacturing and service operations
  - Further complicated by the two conflicting objectives (TWT vs. TST)

Traditional methods struggle with:

  - Dispatching rules (e.g., ATCSR_Rm): fast but myopic, with weaker solution quality
  - Metaheuristics (e.g., GA): better quality but slow (about 60 s per instance in the results below), which limits use in dynamic environments

🧠 Methodology

1. Markov Decision Process (MDP) Formulation

The UPMSP is formulated as an MDP (S, A, P, R, γ):

State (s_t): a heterogeneous graph snapshot of the shop floor, with job, machine, and setup nodes connected by typed edges capturing eligibility, processing times, and setup relations (see the architecture below)

Action (a_t): selection of a feasible job-machine pair, i.e., dispatching an unscheduled job to an eligible machine

Reward (r_t): a multi-objective signal that penalizes increases in weighted tardiness and setup time, balancing TWT and TST minimization (see the sketch below)
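A minimal sketch of one plausible per-step reward consistent with this multi-objective formulation (the alpha trade-off mirrors the TWT weight used by the GA baseline later in this README; the function name and signature are illustrative, not the repository's API):

# Hypothetical reward shaping; alpha trades off the two objectives.
def step_reward(delta_twt, delta_tst, alpha=0.5):
    """delta_twt / delta_tst: increases in weighted tardiness and setup time
    caused by the chosen (job, machine) action."""
    return -(alpha * delta_twt + (1.0 - alpha) * delta_tst)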

2. Heterogeneous GNN Architecture

The GNN captures complex relationships:

Input Features
     ↓
Node Projections (Job, Machine, Setup)
     ↓
GATv2 Layers (4 layers, 128 hidden dim)
 - Job ↔ Machine edges (eligibility, processing time, setup)
 - Machine ↔ Setup edges (current setup state)
 - Job ↔ Setup edges (required setup)
     ↓
Graph Embeddings
     ↓
Actor-Critic Networks
 - Actor: selects actions (job-machine pairs)
 - Critic: estimates state value

Key Design Choices:

  - Heterogeneous node types (job, machine, setup) rather than a single homogeneous graph
  - GATv2 attention, letting each node adaptively weight its neighbors
  - Typed edges that explicitly encode eligibility, processing times, and setup relations
  - Shared graph embeddings feeding both the actor and the critic (a sketch follows this list)
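A minimal sketch of such a heterogeneous GATv2 encoder in PyTorch Geometric is shown below. The node and edge type names mirror the diagram but are assumptions, not the exact definitions in gnn_model.py:

import torch
from torch_geometric.nn import HeteroConv, GATv2Conv

class HeteroGNNEncoder(torch.nn.Module):
    """Illustrative 4-layer heterogeneous GATv2 encoder, 128 hidden dims."""
    def __init__(self, hidden=128, num_layers=4):
        super().__init__()
        edge_types = [
            ('job', 'runs_on', 'machine'), ('machine', 'can_process', 'job'),
            ('machine', 'has_setup', 'setup'), ('setup', 'on_machine', 'machine'),
            ('job', 'requires', 'setup'), ('setup', 'required_by', 'job'),
        ]
        # Lazy (-1, -1) input sizes infer per-type feature dims on first forward.
        self.convs = torch.nn.ModuleList([
            HeteroConv({et: GATv2Conv((-1, -1), hidden, add_self_loops=False)
                        for et in edge_types}, aggr='sum')
            for _ in range(num_layers)
        ])

    def forward(self, x_dict, edge_index_dict):
        for conv in self.convs:
            x_dict = conv(x_dict, edge_index_dict)
            x_dict = {t: h.relu() for t, h in x_dict.items()}
        return x_dict  # per-node embeddings consumed by the actor-critic heads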

3. PPO Training Algorithm

The agent is trained with Proximal Policy Optimization (PPO), using the clipped surrogate objective to keep each policy update close to the previous policy; hyperparameter values follow the paper.

Advantages of PPO:

  - Clipping prevents destructively large policy updates
  - Stable on-policy training with comparatively simple tuning
  - Pairs naturally with the actor-critic architecture above (see the loss sketch below)
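For reference, the standard PPO clipped surrogate loss that this training relies on, as a minimal PyTorch sketch (not code lifted from ppo_agent.py):

import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (returned as a loss to minimize)."""
    ratio = torch.exp(log_probs - old_log_probs)   # pi_theta / pi_theta_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()   # negated for gradient descent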

💻 Implementation

Code Structure

src/
├── environment.py         # UPMSP simulator
├── instance_generator.py  # Problem instance generation
├── gnn_model.py           # Heterogeneous GNN architecture
├── ppo_agent.py           # PPO algorithm implementation
├── baselines.py           # ATCSR_Rm and GA baselines
└── visualizations.py      # Plotting and analysis

example_demo.py           # Demonstration script
requirements.txt          # Dependencies

Key Components

1. Environment (environment.py): the UPMSP simulator that applies dispatching actions, tracks machine clocks and setup states, and computes rewards

2. Instance Generator (instance_generator.py): builds random instances controlled by due-date tightness (tau), due-date range (R), setup-time ratio (beta), and eligibility density (delta)

3. GNN Model (gnn_model.py): the heterogeneous GATv2 encoder and actor-critic heads described above

4. PPO Agent (ppo_agent.py): rollout collection and PPO policy updates

5. Baselines (baselines.py): the ATCSR_Rm dispatching rule and the Genetic Algorithm solver

🚀 Installation

Requirements

  - Python 3.x
  - PyTorch 2.0 and PyTorch Geometric (see below)
  - Remaining dependencies listed in requirements.txt

Setup

# Clone repository
git clone https://github.com/bulentsoykan/GNN-DRL4UPMSP.git
cd GNN-DRL4UPMSP

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Install PyTorch Geometric

# For CUDA 11.8 (adjust for your CUDA version)
pip install torch-geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
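To verify the installation (plain PyTorch/PyG calls, nothing project-specific):

import torch
import torch_geometric
print(torch.__version__, torch_geometric.__version__, torch.cuda.is_available())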

🎮 Quick Start

Run Demo

python example_demo.py

This will:

  1. Generate problem instances for different sizes
  2. Run ATCSR_Rm and GA baselines
  3. Compare results
  4. Generate visualizations in results/figures/

Generate Custom Instance

from src.instance_generator import InstanceGenerator

generator = InstanceGenerator(seed=42)

instance = generator.generate(
    n_jobs=50,
    n_machines=10,
    tau=0.4,        # due date tightness
    R=0.6,          # due date range
    beta=0.1,       # setup time ratio
    delta=0.75      # eligibility density
)

Solve with ATCSR_Rm

from src.baselines import ATCSR_Rm_Solver

solver = ATCSR_Rm_Solver(k1=2.0, k2=2.0)
result = solver.solve(instance)

print(f"TWT: {result['total_weighted_tardiness']:.2f}")
print(f"TST: {result['total_setup_time']:.2f}")

Solve with Genetic Algorithm

from src.baselines import GeneticAlgorithmSolver

ga = GeneticAlgorithmSolver(
    population_size=50,
    n_generations=100,
    alpha=0.5  # weight for TWT
)

result = ga.solve(instance)
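Here alpha scalarizes the two objectives into a single fitness, presumably fitness = alpha × TWT + (1 - alpha) × TST, so alpha=0.5 weights tardiness and setup time equally.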

📊 Results

Performance Comparison (from paper)

| Size        | Method   | Avg TWT | Avg TST | Avg Comp Time (s) |
|-------------|----------|---------|---------|-------------------|
| n=20, m=5   | ATCSR_Rm | 150.0   | 75.0    | <0.01             |
| n=20, m=5   | GA       | 120.0   | 70.0    | 60.10             |
| n=20, m=5   | PPO-GNN  | 110.0   | 65.0    | 0.52              |
| n=50, m=10  | ATCSR_Rm | 355.0   | 190.0   | <0.01             |
| n=50, m=10  | GA       | 300.0   | 165.0   | 60.11             |
| n=50, m=10  | PPO-GNN  | 260.0   | 140.0   | 0.92              |
| n=100, m=15 | ATCSR_Rm | 610.0   | 290.0   | <0.01             |
| n=100, m=15 | GA       | 475.0   | 255.0   | 60.12             |
| n=100, m=15 | PPO-GNN  | 420.0   | 225.0   | 1.57              |

Key Findings

PPO-GNN achieves:

  - The lowest TWT and TST of the three methods at every problem size
  - Roughly 8-13% lower TWT and 7-15% lower TST than the GA
  - Inference in under 2 seconds per instance, versus roughly 60 seconds for the GA

Visualizations

The implementation generates comparison plots and analysis figures via src/visualizations.py.

See results/figures/ after running the demo.

🔬 Experimental Setup (Paper Details)

Instance Parameters

Problem sizes:

  - Small: n=20 jobs, m=5 machines
  - Medium: n=50 jobs, m=10 machines
  - Large: n=100 jobs, m=15 machines

Parameters:

  - tau (due date tightness), R (due date range), beta (setup time ratio), and delta (eligibility density), as exposed by the instance generator above

Test set:

Training Configuration

GNN Architecture:

  - 4 GATv2 layers, 128 hidden dimensions
  - Heterogeneous job, machine, and setup node types with typed edges

PPO Hyperparameters:

Hardware:

📁 Project Structure

GNN-DRL4UPMSP/
│
├── src/
│   ├── __init__.py
│   ├── environment.py           # UPMSP environment
│   ├── instance_generator.py    # Instance generation
│   ├── gnn_model.py             # Heterogeneous GNN
│   ├── ppo_agent.py             # PPO implementation
│   ├── baselines.py             # ATCSR_Rm & GA
│   └── visualizations.py        # Plotting functions
│
├── results/
│   ├── figures/                 # Generated plots
│   └── models/                  # Saved models
│
├── requirements.txt             # Dependencies
├── example_demo.py              # Demo script
└── README.md                    # This file

🎓 Key Contributions (from paper)

  1. Novel PPO application for direct scheduling policy learning in complex multi-objective UPMSP

  2. Heterogeneous GNN design tailored for UPMSP with explicit modeling of jobs, machines, and setups

  3. Multi-objective reward function that effectively balances TWT and TST minimization

  4. Empirical validation showing PPO-GNN dominates both heuristic and metaheuristic baselines

🔮 Future Work

Several potential extensions are mentioned in the paper.

📚 Citation

If you use this code or build upon this work, please cite:

@inproceedings{soykan2025graph,
  title={Graph-Enhanced Deep Reinforcement Learning for Multi-Objective Unrelated Parallel Machine Scheduling},
  author={Soykan, Bulent and Mondesire, Sean and Rabadi, Ghaith and Bochenek, Grace},
  booktitle={2025 Winter Simulation Conference (WSC)},
  year={2025},
  organization={IEEE}
}

📧 Contact

For questions or issues, please open an issue on the GitHub repository or contact the authors.


Note: This implementation provides the framework and baseline methods. Training the full PPO-GNN agent requires significant computational resources (GPU, multiple days of training). The demo uses simulated PPO-GNN results for illustration.

For the complete trained models or additional experimental data, please contact the author.