Cybersecurity

Comparing intrusion detection in batch and streaming environments

Evaluated preprocessing and algorithm choices across batch and real-time IDS settings to guide production deployments.

  • Streaming accuracy: 99.9%
  • Algorithms benchmarked: 8
  • Preprocessing variants: 3

Overview

Security teams needed clarity on which intrusion detection algorithms hold up under real-time constraints.

Our researchers ran a side-by-side evaluation of batch and streaming environments, using the same datasets in both settings.

Challenges

  • Model rankings shift dramatically depending on label granularity and preprocessing.
  • Streaming deployments must handle concept drift while maintaining accuracy.
  • Practitioners lacked a playbook for selecting algorithms under differing operational regimes.

Approach

  • Feature engineering scenarios

    Created multiple preprocessing variants with different feature and label consolidations to test robustness.

  • Batch benchmarking

    Evaluated SVM, MLP, decision trees, Naive Bayes, and k-NN in WEKA using 10-fold cross-validation.

  • Streaming evaluation

    Ran Hoeffding Trees, IBLStream, Naive Bayes, and OzaBoost in MOA with prequential (test-then-train) evaluation, using fading factors to weight recent predictions so that accuracy estimates track performance under drift.
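The streaming protocol above can be illustrated with a minimal Python sketch of prequential evaluation with a fading factor. This is not the MOA code used in the study: the `MajorityClass` learner, the `prequential_accuracy` function, and the `alpha` value are all hypothetical stand-ins chosen for illustration. Each instance is first used to test the model and then to train it, and older outcomes are discounted by `alpha` so the estimate reflects recent behavior.

```python
from collections import Counter


class MajorityClass:
    """Toy incremental learner (hypothetical stand-in for a stream classifier)."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Predict the most frequent label seen so far (None before any training).
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, model, alpha=0.99):
    """Test-then-train evaluation with a fading factor.

    alpha = 1.0 gives the plain running accuracy; alpha < 1.0 discounts
    older predictions so the estimate follows recent performance.
    Returns the accuracy estimate after each instance.
    """
    faded_correct = 0.0  # discounted sum of correct predictions
    faded_count = 0.0    # discounted number of instances seen
    history = []
    for x, y in stream:
        correct = 1.0 if model.predict(x) == y else 0.0  # test first
        faded_correct = correct + alpha * faded_correct
        faded_count = 1.0 + alpha * faded_count
        history.append(faded_correct / faded_count)
        model.learn(x, y)  # then train on the same instance
    return history


if __name__ == "__main__":
    # Synthetic stream with an abrupt label change halfway through (assumed data).
    stream = [((), "normal")] * 200 + [((), "attack")] * 200
    plain = prequential_accuracy(stream, MajorityClass(), alpha=1.0)
    faded = prequential_accuracy(stream, MajorityClass(), alpha=0.9)
    print(f"plain estimate at end: {plain[-1]:.3f}")
    print(f"faded estimate at end: {faded[-1]:.3f}")
```

On a stream whose labels change abruptly, the faded estimate drops much faster than the plain running accuracy, which is why a fading factor makes a model's degradation and recovery under drift visible.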

Impact delivered

  • Identified top-performing models for binary, five-class, and multi-class setups in batch mode.
  • Showed that ensemble methods like OzaBoost maintain high accuracy and recover quickly under drift.
  • Delivered actionable recommendations for aligning IDS model selection with deployment constraints.

Key lessons

  • Always align preprocessing and labeling choices with intended deployment metrics.
  • Streaming evaluations require drift-aware protocols to reveal true model resilience.
  • One-size-fits-all algorithm recommendations rarely hold across operational contexts.
