Research

Infrastructure experiments, performance studies, and operational findings from hands-on systems work.

Research Areas

GPU Infrastructure

  • CUDA optimization and kernel profiling
  • GPU utilization patterns under inference loads
  • Thermal and power characteristics
  • Multi-GPU communication and scaling

Model Serving

  • vLLM deployment and tuning
  • Batch size vs. latency trade-offs
  • Quantization impact on throughput
  • KV cache optimization strategies

Storage Systems

  • Model loading performance (NVMe vs. HDD)
  • Dataset I/O patterns
  • Filesystem benchmarking (ext4, XFS, ZFS)
  • Storage tiering strategies

Network Performance

  • Inference API latency profiling
  • Network topology for distributed workloads
  • Bandwidth requirements for model synchronization
  • Multi-node communication overhead

Virtualization

  • GPU passthrough performance
  • Container vs. bare metal inference
  • Resource isolation strategies
  • Overhead measurements

Monitoring & Observability

  • GPU metrics collection (DCGM)
  • Inference latency tracking
  • Resource utilization dashboards
  • Alerting strategies for compute workloads

Active Experiments

GPU Baseline Characterization

Experiment ID: EXP-001 | Started: 2026-02-01 | Status: Active

Establishing baseline performance metrics for GPU compute under sustained LLM inference workloads. Measuring throughput, latency, power consumption, and thermal behavior across different model sizes and batch configurations.

Tags: NVIDIA GPU, vLLM, Llama 3.3 70B, DCGM
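One lightweight way to capture power, thermal, and utilization samples for a baseline like this is to poll `nvidia-smi` in query mode. The sketch below is an illustrative assumption, not the lab's actual tooling; the query field names (`power.draw`, `temperature.gpu`, `utilization.gpu`) are real `nvidia-smi` fields, while the sampling helper and record schema are made up for this example.

```python
import csv
import io
import subprocess

# Real nvidia-smi query fields; the surrounding harness is illustrative.
QUERY_FIELDS = ["timestamp", "power.draw", "temperature.gpu", "utilization.gpu"]

def parse_smi_csv(text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output
    into a list of per-GPU sample dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        ts, power, temp, util = (f.strip() for f in fields)
        rows.append({
            "timestamp": ts,
            "power_w": float(power),   # board power draw, watts
            "temp_c": float(temp),     # GPU core temperature, Celsius
            "util_pct": float(util),   # SM utilization, percent
        })
    return rows

def sample_gpu(fields=QUERY_FIELDS):
    """Take one sample per visible GPU by shelling out to nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi",
         f"--query-gpu={','.join(fields)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_smi_csv(out)
```

In practice DCGM (`dcgmi dmon`) gives richer counters for sustained runs; a parser like this is mainly useful for quick ad-hoc sampling.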

Storage Tier Performance Study

Experiment ID: EXP-002 | Started: 2026-02-05 | Status: Active

Comparing model loading times and I/O patterns across NVMe, SATA SSD, and HDD storage. Evaluating filesystem performance (ext4, XFS, ZFS) for large model weight files and dataset access patterns.

Tags: NVMe, fio, Model Loading, Benchmarking
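A minimal sketch of a cold-read load-time measurement, assuming the weight file sits on the mount under test and the page cache has been dropped beforehand (`echo 3 > /proc/sys/vm/drop_caches` as root); the function name and block size are illustrative, not the experiment's actual harness:

```python
import time
from pathlib import Path

def time_model_load(path, block_size=16 << 20):
    """Sequentially read a model weight file in large blocks.
    Returns (elapsed_seconds, throughput_gib_per_s)."""
    total = 0
    start = time.perf_counter()
    with Path(path).open("rb") as f:
        # Large sequential reads approximate how weight files are streamed in.
        while chunk := f.read(block_size):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return elapsed, total / elapsed / (1 << 30)
```

For the filesystem comparison itself, `fio` with a sequential-read job per mount is the more controlled tool; a Python timer like this mainly sanity-checks that end-to-end load times agree with the raw `fio` numbers.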

Inference Latency Optimization

Experiment ID: EXP-003 | Started: 2026-02-08 | Status: Planning

Investigating latency reduction techniques including continuous batching, speculative decoding, and KV cache tuning. Measuring p50, p95, and p99 latency under varying load conditions.

Tags: vLLM, Continuous Batching, Latency, Load Testing
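Collecting p50/p95/p99 under concurrent load can be sketched as below; the request function, request counts, and concurrency level are illustrative assumptions, not the planned harness:

```python
import concurrent.futures as cf
import statistics
import time

def load_test(request_fn, total_requests=200, concurrency=8):
    """Fire `request_fn` from a thread pool and report latency percentiles (ms)."""
    def timed(_):
        t0 = time.perf_counter()
        request_fn()  # e.g. one inference API call
        return (time.perf_counter() - t0) * 1000
    with cf.ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, range(total_requests)))
    # statistics.quantiles with n=100 yields 99 cut points: index 49 is p50, etc.
    q = statistics.quantiles(latencies, n=100)
    return {"p50": q[49], "p95": q[94], "p99": q[98], "max": latencies[-1]}
```

Tail percentiles need many samples to stabilize; with only 200 requests, p99 rests on the two slowest observations, so production runs should use far larger request counts.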

Experiment Methodology

Measurement Principles

  • Benchmark on actual hardware, not cloud instances
  • Run multiple iterations to account for variance
  • Document environmental conditions (temperature, load)
  • Use production-representative workloads
  • Isolate variables to measure specific impacts
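The "multiple iterations" principle above can be sketched as a small harness that discards warmup runs and reports dispersion alongside the mean (the iteration counts here are illustrative defaults, not the lab's standard):

```python
import statistics
import time

def benchmark(fn, iterations=10, warmup=2):
    """Time `fn` repeatedly; discard warmup runs, report mean/stdev/CV."""
    for _ in range(warmup):
        fn()  # warm caches, JITs, and clocks before measuring
    samples = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    # Coefficient of variation flags noisy runs worth repeating.
    return {"mean_s": mean, "stdev_s": stdev, "cv_pct": 100 * stdev / mean}
```

A high coefficient of variation is itself a finding: it usually means an environmental condition (thermal throttling, background load) was not controlled.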

Documentation Standards

  • Record hypothesis, methodology, and results
  • Include hardware specs and software versions
  • Document failures and negative results
  • Share reproducible benchmark scripts
  • Link to raw data and analysis notebooks

Publications & Findings

Experiments in Progress

Research findings and experiment reports will be published here as work progresses. Initial experiments are currently in the baseline measurement phase.


Experimental Standards

All performance claims are backed by empirical measurements on physical hardware under realistic operating conditions. Theoretical estimates and vendor benchmarks are treated as hypotheses requiring validation, not as established facts.

Failed experiments receive the same documentation rigor as successful ones. Negative results prevent redundant work and contribute meaningfully to the field's understanding of which approaches do not yield improvements, and under what conditions.

Experimental reports include complete methodology, environmental specifications, hardware configurations, software versions, and reproducible benchmark code. This enables independent verification and builds on the scientific principle that claims require evidence that others can examine.

All findings, datasets, and analysis code are published openly via the lab's GitHub repositories. This serves both institutional memory and the broader infrastructure research community's need for empirical data from real-world systems.

Follow the Research

Experiment documentation, findings, and methodology are maintained in the GitHub repository. New results are published as experiments progress.