Prepare Data

PaddlePaddle Fluid supports two methods to feed data into networks:

  1. Synchronous method - Python Reader: First, use fluid.layers.data to set up the data input layer. Then feed in the training data through executor.run(feed=...) of fluid.Executor or fluid.ParallelExecutor.
  2. Asynchronous method - py_reader: First, use fluid.layers.py_reader to set up the data input layer. Then configure the data source with the decorate_paddle_reader or decorate_tensor_provider function of py_reader. Finally, call fluid.layers.read_file to read the data.

Comparisons of the two methods:

Aspect                    Synchronous Python Reader    Asynchronous py_reader
------------------------  ---------------------------  ---------------------------
API                       executor.run(feed=...)       fluid.layers.py_reader
Data type                 Numpy Array                  Numpy Array or LoDTensor
Data augmentation         other Python-side libraries  other Python-side libraries
Speed                     slow                         fast
Recommended applications  model debugging              industrial training

Synchronous Python Reader

Fluid provides Python Reader to feed in data.

Python Reader is a pure Python-side interface, and data feeding is synchronized with the model training/prediction process: each run waits for its input data before it starts computing. Users pass in data as Numpy Arrays. For specific operations, please refer to:
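As a rough illustration of the synchronous flow, the sketch below builds the kind of feed dictionary that executor.run(feed=...) consumes: input layer names mapped to Numpy Arrays. The layer names image and label, the shapes, and the make_feed helper are hypothetical choices made for this example. The Fluid calls appear only in comments, so the snippet runs with NumPy alone.

```python
import numpy as np


def make_feed(batch_size, rng):
    """Build one batch as a dict of {input layer name: Numpy Array}.

    The names "image" and "label" are hypothetical; they must match the
    names given to the input layers created with fluid.layers.data.
    """
    image = rng.random((batch_size, 784)).astype("float32")
    label = rng.integers(0, 10, size=(batch_size, 1), dtype=np.int64)
    return {"image": image, "label": label}


rng = np.random.default_rng(0)
feed = make_feed(32, rng)

# In a Fluid program, a dict like this would be handed to the executor on
# every iteration, e.g. executor.run(feed=feed). The call returns only
# after the batch has been consumed, which is why this path is synchronous.
for name, array in feed.items():
    print(name, array.shape, array.dtype)
```

Because the Python loop prepares each batch and then waits for the run to finish, data preparation and computation never overlap in this mode.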

Python Reader also supports advanced functions such as batching and shuffling. For specific operations, please refer to:
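The idea behind batching and shuffling can be sketched in plain Python. This is not Paddle's implementation: shuffle_reader, batch_reader, and sample_reader are hypothetical names, and the sketch only assumes that a reader is a zero-argument callable that yields one sample at a time.

```python
import random


def sample_reader():
    """A toy reader that yields (feature, label) samples one at a time."""
    for i in range(10):
        yield (i, i % 2)


def shuffle_reader(reader, buf_size, seed=None):
    """Wrap a reader so samples are shuffled within consecutive buffers."""
    def new_reader():
        rng = random.Random(seed)
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                yield from buf
                buf = []
        # Flush whatever is left in the last, partially filled buffer.
        rng.shuffle(buf)
        yield from buf
    return new_reader


def batch_reader(reader, batch_size, drop_last=False):
    """Wrap a reader so it yields lists of batch_size samples."""
    def new_reader():
        batch = []
        for sample in reader():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch and not drop_last:
            yield batch
    return new_reader


# Compose the wrappers: shuffle first, then group into batches.
train_reader = batch_reader(
    shuffle_reader(sample_reader, buf_size=5, seed=0), batch_size=4)
batches = list(train_reader())
print([len(b) for b in batches])
```

Each wrapper takes a reader and returns a new reader, so the steps can be stacked in any order without changing the underlying data source.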

Asynchronous py_reader

Fluid provides PyReader, an asynchronous data feeding method. It is more efficient because data feeding is not synchronized with the model training/prediction process, so the next batch can be prepared while the current one is being computed. For specific operations, please refer to:
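Why decoupling data feeding from computation helps can be shown with a small standard-library sketch. It does not use py_reader itself: async_reader and slow_reader are hypothetical names, and the bounded queue filled by a background thread only stands in for the general prefetching idea.

```python
import queue
import threading
import time


def slow_reader():
    """A toy reader whose sleep stands in for loading or augmentation."""
    for i in range(5):
        time.sleep(0.01)
        yield i


def async_reader(reader, capacity):
    """Wrap a reader so samples are produced in a background thread."""
    end = object()  # sentinel marking the end of the data

    def new_reader():
        q = queue.Queue(maxsize=capacity)

        def producer():
            try:
                for sample in reader():
                    q.put(sample)  # blocks while the queue is full
            finally:
                q.put(end)  # always signal completion to the consumer

        threading.Thread(target=producer, daemon=True).start()
        while True:
            sample = q.get()
            if sample is end:
                break
            yield sample

    return new_reader


results = []
for sample in async_reader(slow_reader, capacity=2)():
    time.sleep(0.01)  # stands in for one training step
    results.append(sample * 2)
print(results)
```

While the consumer is busy with one step, the producer thread is already preparing the next samples, up to the queue capacity. In the synchronous mode these two phases would run one after the other.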