Fraud detection models often improve with more data, but financial institutions can’t simply pool sensitive transaction records. A practical alternative is federated learning: each bank trains locally, shares only model updates, and benefits from a stronger global model—without exporting raw customer data. This article walks through a clean, CPU-friendly simulation of that setup using lightweight PyTorch building blocks, plus an optional step that turns the final metrics into an internal, decision-oriented risk report with OpenAI.
What this implementation is trying to achieve
The goal is to demonstrate a privacy-preserving fraud detection workflow that is:
- Federated: multiple parties (“banks”) train locally and only exchange model weights/updates.
- Lightweight: no heavyweight federated frameworks; a simple coordination loop is enough for experimentation.
- Realistic: fraud is rare and client datasets are heterogeneous (non-IID), reflecting how fraud patterns vary by institution.
- Actionable: after training, results are summarized into a concise fraud-risk report.
The simulation uses ten clients (ten independent banks), a highly imbalanced synthetic dataset (fraud is a small minority class), and a FedAvg aggregation loop for 10 rounds of training.
Environment setup: reproducible, CPU-friendly execution
The notebook-style implementation installs the essentials—torch, scikit-learn, numpy, and openai—and then fixes random seeds to keep results deterministic and repeatable. It also explicitly selects a CPU device to ensure the simulation runs in common environments without special hardware.
From a practical standpoint, this emphasis on determinism matters when you’re experimenting with federated learning behavior. Small changes in initialization or partitioning can lead to noticeably different convergence patterns, especially with skewed class distributions.
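A minimal sketch of that setup cell might look like the following; the specific SEED value is an assumption, and any fixed integer would do:

```python
# Hypothetical sketch of the setup step: pin the random seeds and force CPU execution.
import random

import numpy as np
import torch

SEED = 42  # assumed value; any fixed integer works for reproducibility
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

DEVICE = torch.device("cpu")  # explicit CPU keeps the run portable across environments
```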
Generating a credit-card-like imbalanced fraud dataset
To mimic the class imbalance typical in fraud detection, the workflow uses make_classification to generate a dataset with:
- n_samples=60000
- n_features=30
- n_informative=18
- n_redundant=8
- weights=[0.985, 0.015] (fraud is ~1.5%)
- class_sep=1.5
- flip_y=0.01
- random_state=SEED
The data is then split into train and test sets using train_test_split with:
- test_size=0.2
- stratify=y (preserves the fraud/non-fraud ratio)
- random_state=SEED
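Putting those parameters together, the generation and split steps look roughly like this sketch (variable names are illustrative, and SEED is the seed fixed during setup):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, credit-card-like data: roughly 1.5% of samples are fraud (class 1).
X, y = make_classification(
    n_samples=60_000,
    n_features=30,
    n_informative=18,
    n_redundant=8,
    weights=[0.985, 0.015],
    class_sep=1.5,
    flip_y=0.01,
    random_state=SEED,
)

# Stratified split keeps the fraud ratio identical in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=SEED
)
```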
Why there’s a “server-side” scaler here
A StandardScaler is fit on the full training data to produce standardized features for the global test set evaluation. This standardized test loader (batch size 1024, shuffle=False) provides a consistent benchmark to track how the global model changes after each federated round.
Even though the learning is federated, the evaluation step in this simulation is centralized for convenience: the global model is assessed on a single fixed test set after each aggregation round.
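A plausible version of that server-side preparation, assuming the X_train/X_test arrays produced by the split above:

```python
import torch
from sklearn.preprocessing import StandardScaler
from torch.utils.data import DataLoader, TensorDataset

# Server-side scaler: fit on the full training set, applied only to the global test set.
server_scaler = StandardScaler().fit(X_train)
X_test_std = server_scaler.transform(X_test)

test_ds = TensorDataset(
    torch.tensor(X_test_std, dtype=torch.float32),
    torch.tensor(y_test, dtype=torch.float32),
)
# Fixed benchmark loader used after every federated round.
test_loader = DataLoader(test_ds, batch_size=1024, shuffle=False)
```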
Simulating non-IID client data with Dirichlet partitioning
A key challenge in federated learning is that client data is rarely independent and identically distributed. Different banks have different customer bases, transaction mixes, and fraud exposure. To model this heterogeneity, the training data is partitioned across clients using a Dirichlet-based splitter:
- NUM_CLIENTS = 10
- alpha = 0.35 in the Dirichlet distribution
Dirichlet partitioning can create uneven class distributions and varying client dataset sizes—precisely the kind of skew that makes federated optimization harder than centralized training.
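The exact splitter is not reproduced here, but a standard Dirichlet label-skew partitioner along these lines would produce the described behavior (dirichlet_partition is an illustrative name, not necessarily the original function):

```python
import numpy as np

NUM_CLIENTS = 10
ALPHA = 0.35

def dirichlet_partition(y, num_clients, alpha, seed=SEED):
    """Split sample indices across clients with Dirichlet label skew."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(y):
        cls_idx = np.where(y == cls)[0]
        rng.shuffle(cls_idx)
        # Per-class proportions: smaller alpha -> stronger skew across clients.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(cls_idx)).astype(int)
        for cid, part in enumerate(np.split(cls_idx, cuts)):
            client_indices[cid].extend(part.tolist())
    return [np.array(idx) for idx in client_indices]

client_idx = dirichlet_partition(y_train, NUM_CLIENTS, ALPHA)
```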
Client train/validation splits and a safety check for missing classes
Each client dataset is split into local train and validation partitions with:
- test_size=0.15
- stratify=yi
- random_state=SEED
The implementation includes an important guardrail: if a client ends up with only one class (e.g., no fraud examples), it adds a small number of samples of the missing class (up to 10) so that local training and evaluation remain feasible.
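One plausible way to implement that split plus the guardrail is sketched below; borrowing the top-up samples from the global training pool is an assumption about the mechanism, not a confirmed detail of the original code:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_client_split(Xi, yi, X_pool, y_pool):
    """Local train/val split, topping up clients that drew only one class."""
    # Guardrail (assumed mechanism): if a client holds a single class, borrow up to
    # 10 samples of the missing class from the global training pool.
    if len(np.unique(yi)) < 2:
        missing = 1 - int(yi[0])
        extra = np.where(y_pool == missing)[0][:10]
        Xi = np.vstack([Xi, X_pool[extra]])
        yi = np.concatenate([yi, y_pool[extra]])
    return train_test_split(Xi, yi, test_size=0.15, stratify=yi, random_state=SEED)
```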
Local feature scaling per client
Each client uses its own StandardScaler, fit on its local training split and applied to its validation split. This better reflects real-world decentralization: banks don’t share feature statistics, and preprocessing can differ subtly across institutions.
Client data loaders are created with:
- batch_size=512 for training (shuffled)
- batch_size=512 for validation
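A sketch of the per-client preprocessing and loaders, with make_loaders as an illustrative helper name:

```python
import torch
from sklearn.preprocessing import StandardScaler
from torch.utils.data import DataLoader, TensorDataset

def make_loaders(X_tr, X_val, y_tr, y_val, batch_size=512):
    """Per-client preprocessing: each bank fits its own scaler on its own data."""
    local_scaler = StandardScaler().fit(X_tr)
    X_tr, X_val = local_scaler.transform(X_tr), local_scaler.transform(X_val)

    def to_ds(X, y):
        return TensorDataset(
            torch.tensor(X, dtype=torch.float32),
            torch.tensor(y, dtype=torch.float32),
        )

    train_loader = DataLoader(to_ds(X_tr, y_tr), batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(to_ds(X_val, y_val), batch_size=batch_size, shuffle=False)
    return train_loader, val_loader
```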
The fraud model: a compact neural network for tabular data
The fraud detector, FraudNet, is a small feedforward network designed for tabular features. It has:
- A linear layer from in_dim to 64, ReLU, and Dropout(0.1)
- A linear layer from 64 to 32, ReLU, and Dropout(0.1)
- A final linear layer from 32 to 1 logit
The model outputs a single logit (later passed through a sigmoid for probabilities during evaluation), which is standard for binary classification with BCEWithLogitsLoss.
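The layer sizes described above map directly onto a small module like this (the nn.Sequential layout is an assumption about how the original organizes the layers):

```python
import torch.nn as nn

class FraudNet(nn.Module):
    """Compact MLP for tabular fraud features; outputs a single logit per sample."""

    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(64, 32), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        # Squeeze to shape (batch,) so it lines up with float targets for BCEWithLogitsLoss.
        return self.net(x).squeeze(-1)
```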
Weight exchange utilities
Because federated learning here is implemented “from scratch,” the code includes helpers to:
- Extract weights from a model’s state_dict into NumPy arrays.
- Set weights on a new model instance from those arrays.
This makes it straightforward to simulate sending model parameters from the server to clients and sending updated weights back to the server—without needing specialized infrastructure.
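A hedged sketch of those two helpers (get_weights and set_weights are illustrative names):

```python
import torch

def get_weights(model):
    """Copy a model's state_dict into plain NumPy arrays (what a client 'sends')."""
    return [p.detach().cpu().numpy() for p in model.state_dict().values()]

def set_weights(model, weights):
    """Load NumPy arrays back into a model (what the server 'broadcasts')."""
    state = {k: torch.tensor(w) for k, w in zip(model.state_dict().keys(), weights)}
    model.load_state_dict(state, strict=True)
```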
Evaluation metrics: beyond accuracy
Fraud detection is highly imbalanced, so accuracy can be misleading. The evaluation routine reports:
- loss (mean BCEWithLogitsLoss)
- auc via roc_auc_score
- ap via average_precision_score
- acc via accuracy_score with a 0.5 threshold
In practice, AUC and especially Average Precision (AP) tend to be more informative than raw accuracy for rare-event detection, because they focus on ranking quality and precision-recall tradeoffs.
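An evaluation routine matching that description might look like the following sketch:

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import accuracy_score, average_precision_score, roc_auc_score

@torch.no_grad()
def evaluate(model, loader, device=torch.device("cpu")):
    """Return loss/AUC/AP/accuracy on a loader; AP is the key metric at ~1.5% fraud."""
    model.eval()
    criterion = nn.BCEWithLogitsLoss()
    losses, probs, labels = [], [], []
    for xb, yb in loader:
        logits = model(xb.to(device))
        losses.append(criterion(logits, yb.to(device)).item())
        probs.append(torch.sigmoid(logits).cpu().numpy())
        labels.append(yb.numpy())
    probs, labels = np.concatenate(probs), np.concatenate(labels)
    return {
        "loss": float(np.mean(losses)),
        "auc": roc_auc_score(labels, probs),
        "ap": average_precision_score(labels, probs),
        "acc": accuracy_score(labels, (probs >= 0.5).astype(int)),
    }
```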
Federated training with FedAvg: 10 rounds of coordination
The coordination layer uses a classic federated averaging approach:
- ROUNDS = 10
- LR = 5e-4
- Each round:
  - Create a local model per client and initialize it with the current global weights.
  - Train locally using Adam and BCEWithLogitsLoss.
  - Collect updated client weights and the number of local samples.
  - Aggregate using fedavg, weighting each client by its dataset size.
  - Evaluate the new global model on the fixed test loader and print metrics.
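The per-client local update could be sketched as follows, reusing the FraudNet model and weight helpers from earlier; the number of local epochs is an assumption, since the description above does not pin it down:

```python
import torch
import torch.nn as nn

LR = 5e-4
LOCAL_EPOCHS = 1  # assumed; not specified in the summary above

def local_train(global_weights, train_loader, in_dim, device=torch.device("cpu")):
    """One client's update: start from global weights, train locally, return new weights."""
    model = FraudNet(in_dim).to(device)
    set_weights(model, global_weights)
    opt = torch.optim.Adam(model.parameters(), lr=LR)
    criterion = nn.BCEWithLogitsLoss()
    model.train()
    for _ in range(LOCAL_EPOCHS):
        for xb, yb in train_loader:
            opt.zero_grad()
            loss = criterion(model(xb.to(device)), yb.to(device))
            loss.backward()
            opt.step()
    # Sample count is returned so the server can weight this client's contribution.
    return get_weights(model), len(train_loader.dataset)
```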
Why size-weighted averaging matters
FedAvg combines client updates proportionally to how much data each client trained on. In heterogeneous settings (especially with Dirichlet partitions), this helps prevent tiny clients from disproportionately steering the global model, while still allowing them to contribute.
At the same time, weighting by size can raise fairness questions in real deployments: larger institutions may dominate the global model. This implementation keeps the logic explicit so you can experiment with alternative weighting strategies if desired.
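The aggregation itself reduces to a few lines; this sketch assumes each client returns its weights as a list of per-layer NumPy arrays along with its local sample count:

```python
def fedavg(client_weights, client_sizes):
    """Size-weighted average of per-layer weights (classic FedAvg aggregation)."""
    total = float(sum(client_sizes))
    coeffs = [n / total for n in client_sizes]
    # Average each layer across clients, weighting by local dataset size.
    return [
        sum(c * w[layer] for c, w in zip(coeffs, client_weights))
        for layer in range(len(client_weights[0]))
    ]
```

The server loop then simply alternates broadcast, local training, fedavg aggregation, and evaluation on the fixed test loader for ROUNDS = 10 iterations.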
Turning model metrics into a risk-team report with OpenAI
After federated training completes, the workflow optionally generates an internal-facing fraud-risk summary using the OpenAI API. The API key is requested securely via hidden keyboard input (getpass) and placed into the OPENAI_API_KEY environment variable.
What gets summarized
A compact summary object is constructed containing:
- rounds (the number of federated rounds)
- num_clients
- final_metrics (the last printed evaluation metrics)
- client_sizes (training dataset sizes for each client)
- client_fraud_rates (fraud rate per client, derived from the client split)
This is then embedded into a prompt asking the model to: “Write a concise internal fraud-risk report. Include executive summary, metric interpretation, risks, and next steps.” The request uses client.responses.create with model="gpt-5.2" and prints the generated output_text.
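A sketch of that reporting step, assuming final_metrics, client_sizes, and client_fraud_rates were collected during training (the model name is the one quoted above):

```python
import json
import os
from getpass import getpass

from openai import OpenAI

# Prompt for the key without echoing it, then hand it to the client via the environment.
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
client = OpenAI()

summary = {
    "rounds": 10,
    "num_clients": 10,
    "final_metrics": final_metrics,          # last round's evaluation output
    "client_sizes": client_sizes,            # local training-set sizes per client
    "client_fraud_rates": client_fraud_rates # fraud rate per client
}

response = client.responses.create(
    model="gpt-5.2",  # model name as given in the source tutorial
    input=(
        "Write a concise internal fraud-risk report. Include executive summary, "
        "metric interpretation, risks, and next steps.\n\n" + json.dumps(summary)
    ),
)
print(response.output_text)
```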
Why this step is useful (even in a simulation)
Federated learning experiments often end with raw metrics that are meaningful to ML practitioners but less digestible for compliance teams, risk leaders, or fraud operations. The reporting step demonstrates how to convert technical outputs (AUC/AP, heterogeneous client sizes, varying fraud rates) into narrative guidance—without changing the underlying privacy premise (no raw transaction exports).
Implementation notes and practical takeaways
- Privacy boundary: In this setup, clients never transmit raw samples—only model weights are aggregated. That said, the code is a simulation; production-grade privacy typically requires additional protections and threat modeling.
- Non-IID effects are the point: Using a Dirichlet partition with alpha=0.35 surfaces convergence challenges that are easy to miss with uniform splits.
- Evaluate with the right metrics: Including AP alongside AUC and accuracy helps keep focus on imbalanced classification realities.
- Lightweight doesn’t mean simplistic: Even with a compact model and a minimal FedAvg loop, you can study realistic behaviors such as client skew, aggregation effects, and stability across rounds.
Conclusion
This lightweight PyTorch simulation shows how a federated fraud detection pipeline can be built from first principles: ten banks train local models on highly imbalanced data, a FedAvg loop aggregates updates over 10 rounds, and the final results can be summarized into a decision-ready internal report using OpenAI. It’s a practical blueprint for experimenting with privacy-aware collaboration without requiring complex federated infrastructure.
Attribution: This article is based on reporting originally published by www.marktechpost.com. The original tutorial, including the full notebook and implementation details, is linked in the Sources section below.
Sources
- www.marktechpost.com
- https://github.com/Marktechpost/AI-Tutorial-Codes-Included/blob/main/Federated%20Learning/openai_federated_fraud_detection_from_scratch_Marktechpost.ipynb