Cluster Training Benchmark

Setup

  • Platform

    • Kubernetes: v1.6.2
    • Linux Kernel: v3.10.0
  • Resource

    • CPU: 10 Cores per Pod
    • Memory: 5GB per Pod
  • Docker Image

    We use a different base Docker image to run the benchmark for each framework on Kubernetes:

    • PaddlePaddle v2: paddlepaddle/paddle:0.11.0
    • PaddlePaddle Fluid: paddlepaddle/paddle:[commit-id]
    • TensorFlow: tensorflow/tensorflow:1.5.0-rc0
  • Model: vgg16 is used in this benchmark.
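
The per-Pod resource settings above could be expressed in a Kubernetes Pod spec along the following lines. This is a sketch only: the Pod and container names are hypothetical, and only the resources block and image tag come from the setup above.

```yaml
# Sketch: 10 CPU cores and 5 GB memory per Pod, as in the Setup section.
# Pod/container names are hypothetical illustrations.
apiVersion: v1
kind: Pod
metadata:
  name: paddle-trainer
spec:
  containers:
  - name: trainer
    image: paddlepaddle/paddle:0.11.0   # PaddlePaddle v2 image from the setup
    resources:
      requests:
        cpu: "10"
        memory: 5Gi
      limits:
        cpu: "10"
        memory: 5Gi
```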

Cases

  • Variables
    • Batch size of the training data.
    • PServer count of the training job.
    • Trainer count of the training job.
  • Invariants
    • The resources of each trainer/pserver Pod.

Measure the Performance for Different Batch Size

  • PServer Count: 40
  • Trainer Count: 100
  • Metrics: mini-batch / sec
  Batch Size           32   64   128   256
  -------------------  ---  ---  ----  ----
  PaddlePaddle Fluid   -    -    -     -
  PaddlePaddle v2      -    -    -     -
  TensorFlow           -    -    -     -
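The mini-batch / sec metric can be obtained with a simple timing loop. The sketch below is our own illustration, not part of the benchmark scripts; `train_one_batch` is a hypothetical callable standing in for one training step of whichever framework is being measured.

```python
import time

def measure_throughput(train_one_batch, num_batches=100, warmup=10):
    """Return training throughput in mini-batches per second.

    `train_one_batch` is a hypothetical callable that runs a single
    training step; warm-up iterations are excluded from the timing so
    that one-time costs (graph building, cache warming) do not skew
    the result.
    """
    for _ in range(warmup):
        train_one_batch()
    start = time.perf_counter()
    for _ in range(num_batches):
        train_one_batch()
    elapsed = time.perf_counter() - start
    return num_batches / elapsed
```

The same loop can be reused unchanged for all three frameworks, which keeps the measurement methodology identical across rows of the tables.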

Measure the Performance for Different PServer Count

  • Trainer Count: 100
  • Batch Size: 64
  • Metrics: mini-batch / sec
  PServer Count        10   20   40   60
  -------------------  ---  ---  ---  ---
  PaddlePaddle Fluid   -    -    -    -
  PaddlePaddle v2      -    -    -    -
  TensorFlow           -    -    -    -

Measure Parallel Efficiency By Increasing Trainer Count

  • PServer Count: 20
  • Batch Size: 64
  • Metrics:

$S = \frac{T_1}{T_N}$

where $S$ is the speedup: the ratio of $T_1$, the training time with 1 trainer, to $T_N$, the training time with $N$ trainers. The parallel efficiency is then:

$E = \frac{S}{N}$
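
As a worked example of these definitions (our own illustration; the timings are made up, not measured results):

```python
def speedup(t1, tn):
    """S = T1 / TN: single-trainer training time over N-trainer training time."""
    return t1 / tn

def parallel_efficiency(t1, tn, n):
    """E = S / N; a value of 1.0 means perfect linear scaling."""
    return speedup(t1, tn) / n

# Hypothetical timings: 1 trainer takes 1000 s, 10 trainers take 125 s.
s = speedup(1000.0, 125.0)                  # S = 8.0
e = parallel_efficiency(1000.0, 125.0, 10)  # E = 0.8
```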

Trainer Count        1   10   20   30   40   50   60   70   80   90   100
-------------------  --  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---
PaddlePaddle Fluid   -   -    -    -    -    -    -    -    -    -    -
PaddlePaddle v2      -   -    -    -    -    -    -    -    -    -    -
TensorFlow           -   -    -    -    -    -    -    -    -    -    -

Reproduce the Benchmark

TODO