Use Case

Local Training

These command line arguments are commonly used in local training experiments, such as image classification and natural language processing.

paddle train \
  --use_gpu=1/0 \                        #1:GPU,0:CPU(default:true)
  --config=network_config \
  --save_dir=output \
  --trainer_count=COUNT \                #(default:1)
  --test_period=M \                      #(default:0)
  --num_passes=N \                       #(default:100)
  --log_period=K \                       #(default:100)
  --dot_period=1000 \                    #(default:1)
  #[--show_parameter_stats_period=100] \ #(default:0)
  #[--saving_period_by_batches=200] \    #(default:0)

show_parameter_stats_period and saving_period_by_batches are optional; set them only if your task needs them.
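The inline default annotations in the listing above are for reading, not for pasting into a shell. As a minimal sketch, the same flags can be assembled into a clean command from Python. The helper name build_train_command and the concrete values (for example trainer_count=4) are illustrative assumptions, not part of Paddle.

```python
import shlex


def build_train_command(options):
    """Return the argv list for `paddle train` from a dict of flag -> value."""
    argv = ["paddle", "train"]
    for flag, value in options.items():
        argv.append("--{}={}".format(flag, value))
    return argv


# Illustrative values only; adjust them to your experiment.
options = {
    "use_gpu": 1,
    "config": "network_config",
    "save_dir": "output",
    "trainer_count": 4,
    "test_period": 0,
    "num_passes": 100,
    "log_period": 100,
    "dot_period": 1,
}

command = build_train_command(options)
# Quote each part so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can also be handed to subprocess.run directly, which avoids shell quoting altogether.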

1) Pass Command Argument to Network config

config_args passes arguments from the command line to the network config.

--config_args=generating=1,beam_size=5,layer_num=10 \

Use get_config_arg to parse these arguments in the network config:

generating = get_config_arg('generating', bool, False)
beam_size = get_config_arg('beam_size', int, 3)
layer_num = get_config_arg('layer_num', int, 8)


get_config_arg(name, type, default_value)
  • name: the name specified in the --config_args
  • type: value type, bool, int, str, float etc.
  • default_value: default value if not set.
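To make the name/type/default behavior concrete, here is a small stand-alone sketch that imitates it outside of Paddle. The functions parse_config_args and get_arg are hypothetical stand-ins written for this illustration; they are not Paddle's implementation of get_config_arg, and the accepted boolean spellings are an assumption.

```python
def parse_config_args(config_args):
    """Split a 'k1=v1,k2=v2' string into a dict of raw string values."""
    parsed = {}
    for item in config_args.split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        parsed[key.strip()] = value.strip()
    return parsed


def get_arg(raw_args, name, arg_type, default_value):
    """Return raw_args[name] converted to arg_type, or default_value if unset."""
    if name not in raw_args:
        return default_value
    value = raw_args[name]
    if arg_type is bool:
        # bool("0") is True in Python, so booleans need explicit handling.
        lowered = value.lower()
        if lowered in ("1", "true"):
            return True
        if lowered in ("0", "false"):
            return False
        raise ValueError("config arg %s is not boolean: %r" % (name, value))
    return arg_type(value)


raw = parse_config_args("generating=1,beam_size=5,layer_num=10")
generating = get_arg(raw, "generating", bool, False)
beam_size = get_arg(raw, "beam_size", int, 3)
layer_num = get_arg(raw, "layer_num", int, 8)
hidden_size = get_arg(raw, "hidden_size", int, 128)  # not passed, so the default is used

print(generating, beam_size, layer_num, hidden_size)  # True 5 10 128
```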

2) Use Model to Initialize Network

add argument:


Local Testing

Method 1:

paddle train --job=test \
             --use_gpu=1/0 \ 
             --config=network_config \
             --trainer_count=COUNT \ 
             --init_model_path=model_path \
  • Use init_model_path to specify the model to test.
  • Only one model can be tested.

Method 2:

paddle train --job=test \
             --use_gpu=1/0 \ 
             --config=network_config \
             --trainer_count=COUNT \ 
             --model_list=model.list \
  • Use model_list to specify the models to test.
  • Several models can be tested, where model.list looks like:

Method 3:

paddle train --job=test \
             --use_gpu=1/0 \
             --config=network_config \
             --trainer_count=COUNT \
             --save_dir=model \
             --test_pass=M \
             --num_passes=N \

This method requires models saved by Paddle under paths of the form model/pass-%05d. The models tested are those from pass M through pass N-1. For example, M=12 and N=14 tests model/pass-00012 and model/pass-00013.

Sparse Training

Sparse training is usually used to accelerate computation when the input is sparse, high-dimensional data. For example, the dictionary dimension of the input data may be 1 million while a single sample contains only a few words. In Paddle, sparse matrix multiplication is used in forward propagation, and sparse updating is performed when weights are updated after backward propagation.

1) Local training

You need to set sparse_update=True in network config. Check the network config documentation for more details.

2) Cluster training

Add the following argument for cluster training of a sparse model. You also need to set sparse_remote_update=True in the network config. Check the network config documentation for more details.

--ports_num_for_sparse=1    #(default: 0)


parallel_nn can be set to compute layers on a mix of GPUs and CPUs. That is, you can deploy a network so that a GPU computes some layers and a CPU computes the others. Alternatively, you can split layers across different GPUs, which can reduce per-GPU memory usage or accelerate some layers through parallel computation.

To use these features, specify a device ID for each layer in the network config (denoted deviceId) and add the command line argument --parallel_nn=true, as in the cases below.


Case 1: Mixed Use of GPU and CPU

Consider the following example:

#command line:
paddle train --use_gpu=true --parallel_nn=true --trainer_count=COUNT


  • default_device(0): set default device ID to 0. This means that, except for layers with device=-1, all layers use a GPU, and the specific GPU used for each layer depends on trainer_count and gpu_id (0 by default). Here, layers fc1 and fc2 are computed on the GPU.
  • device=-1: use the CPU for layer fc3.
  • trainer_count:
    • trainer_count=1: if gpu_id is not set, use the first GPU to compute layers fc1 and fc2. Otherwise, use the GPU specified by gpu_id.
    • trainer_count>1: use trainer_count GPUs to compute each layer with data parallelism. For example, trainer_count=2 means that GPUs 0 and 1 use data parallelism to compute layers fc1 and fc2.

Case 2: Specify Layers in Different Devices

#command line:
paddle train --use_gpu=true --parallel_nn=true --trainer_count=COUNT

fc2=fc_layer(input=l1, layer_attr=ExtraAttr(device=0), ...)
fc3=fc_layer(input=l1, layer_attr=ExtraAttr(device=1), ...)
fc4=fc_layer(input=fc2, layer_attr=ExtraAttr(device=-1), ...)

In this case, we assume that there are 4 GPUs in one machine.

  • trainer_count=1:
    • Use GPU 0 to compute layer fc2.
    • Use GPU 1 to compute layer fc3.
    • Use CPU to compute layer fc4.
  • trainer_count=2:
    • Use GPU 0 and 2 to compute layer fc2.
    • Use GPU 1 and 3 to compute layer fc3.
    • Use CPU to compute fc4 in two threads.
  • trainer_count=4:
    • It will fail (recall that we assume 4 GPUs in the machine), because the argument allow_only_one_model_on_one_gpu is true by default.

Allocation of device ID when device!=-1:

(deviceId + gpu_id + threadId * numLogicalDevices_) % numDevices_

deviceId:             specified in layer.
gpu_id:               0 by default.
threadId:             thread ID, range: 0,1,..., trainer_count-1
numDevices_:          device (GPU) count in machine.
numLogicalDevices_:   min(max(deviceId + 1), numDevices_)
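The allocation formula can be tried out with a small sketch. The function real_device_id is a hypothetical helper written for this illustration, not Paddle's code. It assumes the maximum in numLogicalDevices_ is taken over the device IDs of all layers in the config, and it returns -1 to mean "CPU" for layers with device=-1, which the formula itself does not cover.

```python
def real_device_id(device_id, thread_id, layer_device_ids, num_devices, gpu_id=0):
    """Map a layer's deviceId to a physical GPU for one trainer thread."""
    if device_id == -1:
        return -1  # assumption of this sketch: -1 stands for the CPU
    # numLogicalDevices_ = min(max(deviceId + 1), numDevices_)
    num_logical = min(max(d + 1 for d in layer_device_ids), num_devices)
    return (device_id + gpu_id + thread_id * num_logical) % num_devices


# Layers and device IDs from Case 2, on a machine assumed to have 4 GPUs.
layer_devices = {"fc2": 0, "fc3": 1, "fc4": -1}
num_gpus = 4

for thread_id in range(2):  # trainer_count=2
    for layer, device_id in layer_devices.items():
        gpu = real_device_id(device_id, thread_id, list(layer_devices.values()), num_gpus)
        target = "CPU" if gpu == -1 else "GPU {}".format(gpu)
        print("thread {} layer {} -> {}".format(thread_id, layer, target))
```

Raising the thread range to 4 shows threads wrapping around onto GPUs that are already in use, which is the situation allow_only_one_model_on_one_gpu rejects.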