fluid

_switch_scope

paddle.fluid._switch_scope(scope)

BuildStrategy

class paddle.fluid.BuildStrategy

BuildStrategy allows the user to more precisely control how the SSA Graph is built in ParallelExecutor by setting its properties.

Examples

build_strategy = fluid.BuildStrategy()
build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce

train_exe = fluid.ParallelExecutor(use_cuda=True,
                                   loss_name=loss.name,
                                   build_strategy=build_strategy)

train_loss, = train_exe.run([loss.name], feed=feed_dict)
debug_graphviz_path

The type is STR. debug_graphviz_path indicates the path to which the SSA Graph is written as a graphviz file; it is useful for debugging. Default “”

fuse_elewise_add_act_ops

The type is BOOL. fuse_elewise_add_act_ops indicates whether to fuse elementwise_add_op and activation_op; it may make the execution faster. Default False

gradient_scale_strategy

The type is STR. There are three ways of defining \(loss@grad\) in ParallelExecutor: ‘CoeffNumDevice’, ‘One’ and ‘Customized’. By default, ParallelExecutor sets \(loss@grad\) according to the number of devices. If you want to customize \(loss@grad\), you can choose ‘Customized’. Default ‘CoeffNumDevice’.

reduce_strategy

The type is STR. There are two reduce strategies in ParallelExecutor: ‘AllReduce’ and ‘Reduce’. If you want all parameters to be optimized on every device independently, choose ‘AllReduce’; if you choose ‘Reduce’, the optimization of the parameters is evenly distributed across the devices, and the optimized parameters are then broadcast to the other devices. In some models, Reduce is faster. Default ‘AllReduce’.
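
A minimal sketch combining the properties above; it assumes a network has already been built and `loss` is its loss variable.

import paddle.fluid as fluid

build_strategy = fluid.BuildStrategy()
build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce
build_strategy.fuse_elewise_add_act_ops = True
build_strategy.debug_graphviz_path = "./ssa_graph"  # write the SSA Graph here for debugging

train_exe = fluid.ParallelExecutor(use_cuda=True,
                                   loss_name=loss.name,  # `loss` is assumed to exist
                                   build_strategy=build_strategy)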

CPUPlace

class paddle.fluid.CPUPlace

create_lod_tensor

paddle.fluid.create_lod_tensor(data, recursive_seq_lens, place)

Create a lod tensor from a numpy array, a list, or an existing lod tensor.

Create a lod tensor by doing the following:

  1. Check that the length-based level of detail (LoD) also known as recursive_sequence_lengths of the input is valid.
  2. Convert recursive_sequence_lengths to an offset-based LoD.
  3. Copy the data from a numpy array, a list, or an existing lod tensor to the CPU or GPU device (based on the input place).
  4. Set the level of detail (LoD) using the offset-based LoD.

Examples

Suppose we want a LoDTensor to hold data for sequences of words, where each word is represented by an integer, and we want to create a LoDTensor to represent two sentences, one of 2 words and one of 3 words.

Then data can be a numpy array of integers with shape (5, 1). recursive_seq_lens will be [[2, 3]], indicating the length (number of words) of each sentence. This length-based recursive_seq_lens [[2, 3]] will be converted to the offset-based LoD [[0, 2, 5]] inside the function call.

Please refer to api_guide_low_level_lod_tensor for more details regarding LoD.

Parameters:
  • data (numpy.ndarray|list|LoDTensor) – a numpy array or a LoDTensor or a list holding the data to be copied.
  • recursive_seq_lens (list) – a list of lists indicating the length-based level of detail info specified by the user.
  • place (Place) – CPU or GPU place indicating where the data in the new LoDTensor will be stored.
Returns:

A fluid LoDTensor object with tensor data and recursive_seq_lens info.
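
A minimal sketch of the two-sentence example above, using the CPU place:

import numpy as np
import paddle.fluid as fluid

# five words in total: the first sentence has 2 words, the second has 3
data = np.arange(5).reshape(5, 1).astype('int64')
t = fluid.create_lod_tensor(data, recursive_seq_lens=[[2, 3]], place=fluid.CPUPlace())
print(t.recursive_sequence_lengths())  # [[2, 3]] (length-based)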

create_random_int_lodtensor

paddle.fluid.create_random_int_lodtensor(recursive_seq_lens, base_shape, place, low, high)

Create a LoDTensor containing random integers.

This function is frequently used in the book examples. So we revised it based on the new create_lod_tensor API and put it here in the lod_tensor module to simplify the code.

The function does the following:

  1. Calculate the overall shape of the LoDTensor based on the length-based recursive_seq_lens input and the shape of the basic element in base_shape.
  2. Create a numpy array of this shape.
  3. Create the LoDTensor using create_lod_tensor API.

Suppose we want a LoDTensor to hold data for sequences of words, where each word is represented by an integer, and we want to create a LoDTensor to represent two sentences, one of 2 words and one of 3 words. Then base_shape is [1] and the input length-based recursive_seq_lens is [[2, 3]]. The overall shape of the LoDTensor would then be [5, 1], holding 5 words for the two sentences.

Parameters:
  • recursive_seq_lens (list) – a list of lists indicating the length-based level of detail info specified by the user.
  • base_shape (list) – the shape of the basic element to be held by the LoDTensor.
  • place (Place) – CPU or GPU place indicating where the data in the new LoDTensor will be stored.
  • low (int) – the lower bound of the random integers.
  • high (int) – the upper bound of the random integers.
Returns:

A fluid LoDTensor object with tensor data and recursive_seq_lens info.
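
A minimal sketch mirroring the description above, again on the CPU place:

import paddle.fluid as fluid

t = fluid.create_random_int_lodtensor(recursive_seq_lens=[[2, 3]],
                                      base_shape=[1],
                                      place=fluid.CPUPlace(),
                                      low=0, high=9)
# the overall shape is [5, 1]: five random words for the two sentences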

CUDAPinnedPlace

class paddle.fluid.CUDAPinnedPlace

CUDAPlace

class paddle.fluid.CUDAPlace

DataFeeder

class paddle.fluid.DataFeeder(feed_list, place, program=None)

DataFeeder converts the data returned by a reader into a data structure that can be fed into Executor and ParallelExecutor. The reader usually returns a list of mini-batch data entries. Each data entry in the list is one sample, and each sample is a list or a tuple with one feature or multiple features.

Simple usage is shown below:

place = fluid.CPUPlace()
img = fluid.layers.data(name='image', shape=[1, 28, 28])
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder([img, label], place)
result = feeder.feed([([0] * 784, [9]), ([1] * 784, [1])])

If you want to feed data into the GPU side separately in advance when using multiple GPUs to train a model, you can use the decorate_reader function.

place=fluid.CUDAPlace(0)
feeder = fluid.DataFeeder(place=place, feed_list=[data, label])
reader = feeder.decorate_reader(
    paddle.batch(flowers.train(), batch_size=16))
Parameters:
  • feed_list (list) – The Variables or Variables’ names that will be fed into the model.
  • place (Place) – place indicates whether to feed data into the CPU or GPU. If you want to feed data into the GPU, please use fluid.CUDAPlace(i) (i represents the GPU id); if you want to feed data into the CPU, please use fluid.CPUPlace().
  • program (Program) – The Program that the data will be fed into; if program is None, it will use default_main_program(). Default None.
Raises:

ValueError – If some Variable is not in this Program.

Examples

# ...
place = fluid.CPUPlace()
feed_list = [
    main_program.global_block().var(var_name) for var_name in feed_vars_name
] # feed_vars_name is a list of variables' name.
feeder = fluid.DataFeeder(feed_list, place)
for data in reader():
    outs = exe.run(program=main_program,
                   feed=feeder.feed(data))
feed(iterable)

According to feed_list and iterable, converts the input into a data structure that can be fed into Executor and ParallelExecutor.

Parameters:iterable (list|tuple) – the input data.
Returns:the result of conversion.
Return type:dict
feed_parallel(iterable, num_places=None)

Takes multiple mini-batches. Each mini-batch will be fed to one device in advance.

Parameters:
  • iterable (list|tuple) – the input data.
  • num_places (int) – the number of devices. Default None.
Returns:

the result of conversion.

Return type:

dict

Notes

The number of devices and the number of mini-batches must be the same.

decorate_reader(reader, multi_devices, num_places=None, drop_last=True)

Converts the data returned by reader into multiple mini-batches. Each mini-batch will be fed to one device.

Parameters:
  • reader (fun) – the reader that yields the input data.
  • multi_devices (bool) – whether to feed the data to multiple devices.
  • num_places (int) – the number of places. Default None.
  • drop_last (bool) – whether to drop the last batch if it cannot fit all devices. Default True.
Returns:

the result of conversion.

Return type:

dict

Raises:
  • ValueError – If drop_last is False and the data batch cannot fit the devices.

default_main_program

paddle.fluid.default_main_program()

Get default/global main program. The main program is used for training or testing.

All layer functions in fluid.layers append operators and variables to the default_main_program.

The default_main_program is the default program for a lot of APIs. For example, Executor.run() will execute the default_main_program when no program is specified.

Returns:main program
Return type:Program
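
A minimal sketch: the layer calls below are appended to the default main program, which is what Executor.run() executes when no program is given.

import paddle.fluid as fluid

x = fluid.layers.data(name='x', shape=[1], dtype='float32')
y = fluid.layers.fc(input=x, size=1)
main = fluid.default_main_program()
print(main.num_blocks)  # at least 1; the layers above were appended to this program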

default_startup_program

paddle.fluid.default_startup_program()

Get default/global startup program.

The layer functions in fluid.layers create parameters, readers, and NCCL handles as global variables. The startup_program initializes them with its operators; the layer functions append these initialization operators to the startup program.

This method returns the default or the current startup program. Users can use fluid.program_guard to switch programs.

Returns:startup program
Return type:Program
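
A minimal sketch: the startup program is run once by an executor to initialize the parameters created by the layer functions.

import paddle.fluid as fluid

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())  # initialize parameters once before training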

DistributeTranspiler

class paddle.fluid.DistributeTranspiler(config=None)

DistributeTranspiler

Converts the fluid program to distributed data-parallelism programs. It supports two modes: pserver mode and nccl2 mode.

In pserver mode, the main_program will be transformed to use a remote parameter server to do parameter optimization, and the optimization graph will be put into a parameter server program.

In nccl2 mode, the transpiler will append a NCCL_ID broadcasting op in startup_program to share the NCCL_ID across the job nodes. After transpile_nccl2 is called, you *must* pass the trainer_id and num_trainers arguments to ParallelExecutor to enable NCCL2 distributed mode.

Examples

# for pserver mode
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
trainer_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4
role = os.getenv("PADDLE_TRAINING_ROLE")

t = fluid.DistributeTranspiler()
t.transpile(
     trainer_id, pservers=pserver_endpoints, trainers=trainers)
if role == "PSERVER":
     pserver_program = t.get_pserver_program(current_endpoint)
     pserver_startup_program = t.get_startup_program(current_endpoint,
                                                     pserver_program)
elif role == "TRAINER":
     trainer_program = t.get_trainer_program()

# for nccl2 mode
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id, trainers=workers, current_endpoint=curr_ep)
exe = fluid.ParallelExecutor(
    use_cuda,
    loss_name=loss_var.name,
    num_trainers=len(workers.split(",")),
    trainer_id=trainer_id
)
transpile(trainer_id, program=None, pservers='127.0.0.1:6174', trainers=1, sync_mode=True, startup_program=None, current_endpoint='127.0.0.1:6174')

Run the transpiler.

Parameters:
  • trainer_id (int) – id for the current trainer worker; if you have n workers, the id may range from 0 to n-1.
  • program (Program|None) – program to transpile, default is fluid.default_main_program().
  • pservers (str) – comma separated ip:port string for the pserver list.
  • trainers (int|str) – in pserver mode this is the number of trainers, in nccl2 mode this is a string of trainer endpoints.
  • sync_mode (bool) – Do sync training or not, default is True.
  • startup_program (Program|None) – startup_program to transpile, default is fluid.default_startup_program().
  • current_endpoint (str) – the current endpoint, which must be passed when transpiling in nccl2 distributed mode. In pserver mode this argument is not used.
get_trainer_program(wait_port=True)

Get transpiled trainer side program.

Returns:trainer side program.
Return type:Program
get_pserver_program(endpoint)

Get parameter server side program.

Parameters:endpoint (str) – current parameter server endpoint.
Returns:the program for current parameter server to run.
Return type:Program
get_pserver_programs(endpoint)

Get pserver side main program and startup program for distributed training.

Parameters:endpoint (str) – current pserver endpoint.
Returns:(main_program, startup_program), of type “Program”
Return type:tuple
get_startup_program(endpoint, pserver_program=None, startup_program=None)

Deprecated

Get the startup program for the current parameter server. Modify operator input variables if there are variables that were split into several blocks.

Parameters:
  • endpoint (str) – current pserver endpoint.
  • pserver_program (Program) – deprecated, call get_pserver_program first.
  • startup_program (Program) – deprecated; should pass startup_program when initializing
Returns:

parameter server side startup program.

Return type:

Program

DistributeTranspilerConfig

class paddle.fluid.DistributeTranspilerConfig

slice_var_up (bool): Whether to do Tensor slicing for pservers; default is True.

split_method (PSDispatcher): RoundRobin or HashName can be used; try to choose the best method to balance loads for pservers.

min_block_size (int): Minimum number of split elements in a block. According to https://github.com/PaddlePaddle/Paddle/issues/8638#issuecomment-369912156, we can use bandwidth efficiently when the data size is larger than 2MB. If you want to change it, please be sure you have read the slice_variable function.
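
A minimal sketch of adjusting the config before transpiling (the values shown are illustrative):

import paddle.fluid as fluid

config = fluid.DistributeTranspilerConfig()
config.slice_var_up = True  # slice large variables across pservers
t = fluid.DistributeTranspiler(config=config)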

ExecutionStrategy

class paddle.fluid.ExecutionStrategy

ExecutionStrategy allows the user to more precisely control how the program is run in ParallelExecutor by setting its properties.

Examples

exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_threads = 4

train_exe = fluid.ParallelExecutor(use_cuda=True,
                                   loss_name=loss.name,
                                   exec_strategy=exec_strategy)

train_loss, = train_exe.run([loss.name], feed=feed_dict)
allow_op_delay

The type is BOOL. allow_op_delay indicates whether to delay running the communication operators, which may make the execution faster. Note that in some models, allow_op_delay may cause the program to hang. Default False.

num_iteration_per_drop_scope

The type is INT. num_iteration_per_drop_scope indicates how many iterations pass between clean-ups of the temporary variables generated during execution. It may make the execution faster, because the temporary variables’ shapes may be the same between two iterations. Default 100.

Notes

  1. If you fetch data when calling ‘run’, the ParallelExecutor will clean up the temporary variables at the end of the current iteration.
  2. In some NLP models, it may cause GPU memory to be insufficient; in this case, you should reduce num_iteration_per_drop_scope.
num_threads

The type is INT. num_threads represents the size of the thread pool used to run the operators of the current program in ParallelExecutor. If \(num\_threads=1\), all the operators will execute one by one, but the order may differ between iterations. If it is not set, it will be set in ParallelExecutor according to the device type and device count: for GPU, \(num\_threads=device\_count*4\); for CPU, \(num\_threads=CPU\_NUM*4\). The explanation of CPU_NUM is in ParallelExecutor; if CPU_NUM is not set, ParallelExecutor will get the CPU count by calling multiprocessing.cpu_count(). Default 0.
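
A minimal sketch combining the properties above (the values are illustrative, and `loss` is assumed to come from an already-built network):

import paddle.fluid as fluid

exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_threads = 4                     # size of the operator thread pool
exec_strategy.allow_op_delay = False              # do not delay communication ops
exec_strategy.num_iteration_per_drop_scope = 10   # clean temporary variables every 10 iterations

train_exe = fluid.ParallelExecutor(use_cuda=True,
                                   loss_name=loss.name,  # `loss` is assumed to exist
                                   exec_strategy=exec_strategy)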

Executor

class paddle.fluid.Executor(place)

An Executor in Python that only supports single-GPU running. For multiple cards, please refer to ParallelExecutor.

The Python executor takes a program and adds feed operators and fetch operators to it according to the feed map and fetch_list. The feed map provides input data for the program, and fetch_list provides the variables (or names) that the user wants to get after the program runs. Note that the executor will run all operators in the program, not only the operators that the fetch_list depends on.

The executor stores the global variables in the global scope and creates a local scope for the temporary variables. The local scope contents are discarded after every mini-batch forward/backward finishes, but the global scope variables persist across different runs. All of the operators in the program run in sequence.

Parameters:place (core.CPUPlace|core.CUDAPlace(n)) – indicate the executor run on which device

Note: For debugging a complicated network on parallel GPUs, you can test it on the executor. They have exactly the same arguments and are expected to give the same results.

close()

Close this executor.

You can no longer use this executor after calling this method. For distributed training, this method frees the resources on PServers related to the current Trainer.

Example

>>> cpu = core.CPUPlace()
>>> exe = Executor(cpu)
>>> ...
>>> exe.close()
run(program=None, feed=None, fetch_list=None, feed_var_name='feed', fetch_var_name='fetch', scope=None, return_numpy=True, use_program_cache=False)

Run a program with this Executor. Feed data by the feed map, and fetch results by fetch_list. The Python executor takes a program and adds feed operators and fetch operators to it according to the feed map and fetch_list. The feed map provides input data for the program, and fetch_list provides the variables (or names) that the user wants to get after the program runs.

Note: the executor will run all operators in the program, not only the operators that the fetch_list depends on.

Parameters:
  • program (Program) – the program that needs to run; if not provided, default_main_program will be used.
  • feed (dict) – feed variable map, e.g. {“image”: ImageData, “label”: LabelData}
  • fetch_list (list) – a list of variables or variable names that the user wants to get; run will return them according to this list.
  • feed_var_name (str) – the name for the input variable of the feed Operator.
  • fetch_var_name (str) – the name for the output variable of the fetch Operator.
  • scope (Scope) – the scope used to run this program; you can switch it to a different scope. Default is global_scope.
  • return_numpy (bool) – whether to convert the fetched tensors to numpy arrays.
  • use_program_cache (bool) – set use_program_cache to True if the program has not changed compared to the last step.
Returns:

fetch result according to fetch_list.

Return type:

list(numpy.array)

Examples

>>> import numpy
>>> import paddle.fluid as fluid
>>> from paddle.fluid import layers
>>>
>>> data = layers.data(name='X', shape=[1], dtype='float32')
>>> hidden = layers.fc(input=data, size=10)
>>> out = layers.create_tensor(dtype='float32')
>>> layers.assign(hidden, out)
>>> loss = layers.mean(out)
>>> adam = fluid.optimizer.Adam()
>>> adam.minimize(loss)
>>> cpu = fluid.CPUPlace()
>>> exe = fluid.Executor(cpu)
>>> exe.run(fluid.default_startup_program())
>>> x = numpy.random.random(size=(10, 1)).astype('float32')
>>> outs = exe.run(
>>>     feed={'X': x},
>>>     fetch_list=[loss.name])

global_scope

paddle.fluid.global_scope()

Get the global/default scope instance. A lot of APIs use global_scope as their default value, e.g., Executor.run.

Returns:The global/default scope instance.
Return type:Scope
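
A minimal sketch of putting a tensor into the global scope by hand (the variable name 'data' is illustrative):

import numpy
import paddle.fluid as fluid

place = fluid.CPUPlace()
fluid.global_scope().var("data").get_tensor().set(
    numpy.ones((2, 2), dtype='float32'), place)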

LoDTensor

class paddle.fluid.LoDTensor

LoDTensor is a Tensor with optional LoD information.

np.array(lod_tensor) can convert LoDTensor to numpy array. lod_tensor.lod() can retrieve the LoD information.

LoD is short for Level of Details and is usually used for varied sequence length. You can skip the following comment if you don’t need optional LoD.

For example:

A LoDTensor X can look like the example below. It contains 2 sequences: the first has length 2 and the second has length 3, as described by x.lod.

The first tensor dimension 5 = 2 + 3 is calculated from the LoD if it is available. It is the total number of sequence elements. In X, each element has 2 columns, hence the shape [5, 2].

x.lod   = [[2, 3]]
x.data  = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
x.shape = [5, 2]

LoD can have multiple levels (for example, a paragraph can have multiple sentences and a sentence can have multiple words). In the following LoDTensor Y, the lod_level is 2. It means there are 2 sequences: the first sequence has length 2 (it has 2 sub-sequences) and the second has length 1. The first sequence’s 2 sub-sequences have lengths 2 and 2, respectively, and the second sequence’s single sub-sequence has length 3.

y.lod   = [[2, 1], [2, 2, 3]]
y.shape = [2+2+3, ...]

Note

In the description above, LoD is length-based. In the Paddle internal implementation, lod is offset-based. Hence, internally, y.lod is represented as [[0, 2, 3], [0, 2, 4, 7]] (the length-based equivalent would be [[2-0, 3-2], [2-0, 4-2, 7-4]]).

Sometimes LoD is called recursive_sequence_lengths to be more self-explanatory; in this case, it must be length-based. Due to historical reasons, when LoD is called lod in the public API, it might be offset-based. Users should be careful about this.

has_valid_recursive_sequence_lengths(self: paddle.fluid.core.LoDTensor) → bool
lod(self: paddle.fluid.core.LoDTensor) → List[List[int]]
recursive_sequence_lengths(self: paddle.fluid.core.LoDTensor) → List[List[int]]
set_lod(self: paddle.fluid.core.LoDTensor, arg0: List[List[int]]) → None
set_recursive_sequence_lengths(self: paddle.fluid.core.LoDTensor, arg0: List[List[int]]) → None
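
A minimal sketch contrasting the length-based and offset-based views, built with create_lod_tensor described earlier:

import numpy as np
import paddle.fluid as fluid

t = fluid.create_lod_tensor(np.random.rand(5, 2).astype('float32'),
                            [[2, 3]], fluid.CPUPlace())
print(t.recursive_sequence_lengths())  # [[2, 3]]    length-based
print(t.lod())                         # [[0, 2, 5]] offset-based
print(np.array(t).shape)               # (5, 2)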

LoDTensorArray

class paddle.fluid.LoDTensorArray
append(self: paddle.fluid.core.LoDTensorArray, arg0: paddle.fluid.core.LoDTensor) → None
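
A minimal sketch, reusing the LoDTensor t from the sketch above:

arr = fluid.LoDTensorArray()
arr.append(t)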

memory_optimize

paddle.fluid.memory_optimize(input_program, skip_opt_set=None, print_log=False, level=0, skip_grads=False)

Optimize memory by reusing var memory.

Note: it does not support a subblock nested in a subblock.
Parameters:
  • input_program (Program) – Input Program.
  • skip_opt_set (set) – vars that will be skipped in memory optimization.
  • print_log (bool) – whether to print the debug log.
  • level (int) – If level=0, variables are reused only if their shapes are completely equal. Default 0.
Returns:

None
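
A minimal sketch of applying the pass to the default main program after the network has been built:

import paddle.fluid as fluid

# ... build the network on the default main program ...
fluid.memory_optimize(fluid.default_main_program(), print_log=False)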

name_scope

paddle.fluid.name_scope(*args, **kwds)

Generate hierarchical name prefix for the operators.

Note: This should only be used for debugging and visualization purposes. Don’t use it for serious analysis such as graph/program transformations.

Parameters:prefix (str) – prefix.

Examples
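
A minimal sketch (name_scope is used as a context manager; the scope names and layers are illustrative):

import paddle.fluid as fluid

with fluid.name_scope("encoder"):
    x = fluid.layers.data(name='x', shape=[1], dtype='float32')
    hidden = fluid.layers.fc(input=x, size=4)
with fluid.name_scope("decoder"):
    out = fluid.layers.fc(input=hidden, size=1)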

ParallelExecutor

class paddle.fluid.ParallelExecutor(use_cuda, loss_name=None, main_program=None, share_vars_from=None, exec_strategy=None, build_strategy=None, num_trainers=1, trainer_id=0, scope=None)

ParallelExecutor is designed for data parallelism, which focuses on distributing the data across different nodes while every node operates on the data in parallel. If you use ParallelExecutor to run the current program on GPUs, the nodes are the GPU devices, and ParallelExecutor will automatically get the available GPU devices on the current machine. If you use ParallelExecutor to run the current program on CPUs, the nodes are the CPU devices, and you can specify the number of CPU devices by setting the ‘CPU_NUM’ environment variable, for example ‘CPU_NUM=4’; if the environment variable is not found, ParallelExecutor will call multiprocessing.cpu_count to get the number of CPUs in the system.

Parameters:
  • use_cuda (bool) – Whether to use CUDA or not.
  • loss_name (str) – The loss name, which must be set during training. Default None.
  • main_program (Program) – The program that needs to run; if not provided, default_main_program will be used. Default None.
  • share_vars_from (ParallelExecutor) – If provided, it will share variables with the specified ParallelExecutor. Default None.
  • exec_strategy (ExecutionStrategy) – exec_strategy is used to control how the program is run in ParallelExecutor, for example how many threads are used to execute the program and how many iterations pass between clean-ups of the temporary variables generated during execution. For more information, please refer to fluid.ExecutionStrategy. Default None.
  • build_strategy (BuildStrategy) – build_strategy is used to control how the SSA Graph is built in ParallelExecutor, for example reduce_strategy and gradient_scale_strategy. For more information, please refer to fluid.BuildStrategy. Default None.
  • num_trainers (int) – If greater than 1, NCCL will be initialized with multiple ranks of nodes; each node should have the same number of GPUs. Distributed training is then enabled. Default 1.
  • trainer_id (int) – Must be used together with num_trainers. trainer_id is the “rank” of the current node, starting from 0. Default 0.
  • scope (Scope) – scope to run with, default use fluid.global_scope().
Returns:

The initialized ParallelExecutor object.

Return type:

ParallelExecutor

Raises:

TypeError – If share_vars_from is provided but is not a ParallelExecutor object.

Examples

train_exe = fluid.ParallelExecutor(use_cuda=True, loss_name=loss.name)
test_exe = fluid.ParallelExecutor(use_cuda=True,
                                  main_program=test_program,
                                  share_vars_from=train_exe)

train_loss, = train_exe.run([loss.name], feed=feed_dict)
test_loss, = test_exe.run([loss.name], feed=feed_dict)
run(fetch_list, feed=None, feed_dict=None, return_numpy=True)

Run a parallel executor with fetch_list.

The feed parameter can be a dict or a list. If feed is a dict, the feed data will be split across multiple devices. If feed is a list, we assume the data has already been split into multiple devices, and each element in the list will be copied to one device directly.

For example, if the feed is a dict:

>>> exe = ParallelExecutor()
>>> # the image will be split across the devices. If there are two devices
>>> # each device will process an image with shape (24, 1, 28, 28)
>>> exe.run(feed={'image': numpy.random.random(size=(48, 1, 28, 28))})

For example, if the feed is a list:

>>> exe = ParallelExecutor()
>>> # each device will process each element in the list.
>>> # the 1st device will process an image with shape (48, 1, 28, 28)
>>> # the 2nd device will process an image with shape (32, 1, 28, 28)
>>> #
>>> # you can use exe.device_count to get the device number.
>>> exe.run(feed=[{"image": numpy.random.random(size=(48, 1, 28, 28))},
>>>               {"image": numpy.random.random(size=(32, 1, 28, 28))},
>>>              ])
Parameters:
  • fetch_list (list) – The fetched variable names
  • feed (list|dict|None) – The feed variables. If feed is a dict, the tensors in that dict will be split across the devices; if feed is a list, each element of the list will be copied to one device. Default None.
  • feed_dict – Alias for the feed parameter, for backward compatibility. This parameter has been deprecated. Default None.
  • return_numpy (bool) – Whether to convert the fetched tensors to numpy arrays. Default: True.
Returns:

The fetched result list.

Return type:

List

Raises:

ValueError – If the feed is a list but its length does not equal the number of active places, or if its elements are not dicts.

Notes

  1. If the feed’s type is dict, the number of data samples fed to ParallelExecutor must be bigger than the number of active places; otherwise, it will throw an exception from the C++ side. Special attention should be paid to checking whether the last batch of the dataset is bigger than the number of active places.
  2. If there is more than one active place, the fetch result for each variable is a list, and each element of this list is the variable from the respective active place.

Examples

pe = fluid.ParallelExecutor(use_cuda=use_cuda,
                            loss_name=avg_cost.name,
                            main_program=fluid.default_main_program())
loss = pe.run(feed=feeder.feed(cur_batch),
              fetch_list=[avg_cost.name])

ParamAttr

class paddle.fluid.ParamAttr(name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, gradient_clip=None, do_model_average=False)

Parameter attributes object. To fine-tune the network training process, the user can set a parameter’s attributes to control training details, such as the learning rate, regularization, trainable, do_model_average, and the method to initialize the parameter.

Parameters:
  • name (str) – The parameter’s name. Default None.
  • initializer (Initializer) – The method to initial this parameter. Default None.
  • learning_rate (float) – The parameter’s learning rate. The learning rate when optimize is \(global\_lr * parameter\_lr * scheduler\_factor\). Default 1.0.
  • regularizer (WeightDecayRegularizer) – Regularization factor. Default None.
  • trainable (bool) – Whether this parameter is trainable. Default True.
  • gradient_clip (BaseGradientClipAttr) – The method to clip this parameter’s gradient. Default None.
  • do_model_average (bool) – Whether this parameter should do model average. Default False.

Examples

w_param_attrs = fluid.ParamAttr(name="fc_weight",
                                learning_rate=0.5,
                                regularizer=fluid.L2Decay(1.0),
                                trainable=True)
y_predict = fluid.layers.fc(input=x, size=10, param_attr=w_param_attrs)

Program

class paddle.fluid.Program

Python Program. Beneath it is a ProgramDesc, which is used to create the C++ Program. A program is a self-contained, programming-language-like container. It has at least one Block; when control flow ops like conditional_block or while_op are included, it will contain nested blocks. Please refer to framework.proto for details.

Notes: we have default_startup_program and default_main_program by default; the pair of them share the parameters. The default_startup_program is run only once to initialize parameters, while default_main_program is run in every mini-batch and adjusts the weights.

Returns:An empty program.

Examples

>>> main_program = fluid.Program()
>>> startup_program = fluid.Program()
>>> with fluid.program_guard(main_program=main_program, startup_program=startup_program):
>>>     x = fluid.layers.data(name="x", shape=[-1, 784], dtype='float32')
>>>     y = fluid.layers.data(name="y", shape=[-1, 1], dtype='int32')
>>>     fluid.layers.fc(name="fc", input=x, size=10, act="relu")
op_role

The operator role. It is an enum of {Forward, Backward, Optimize}.

Notes: this is a low level API. It is used only for ParallelExecutor to duplicate or schedule operator to devices.

For example, the forward operator should be executed on every device. The backward operator should be executed on every device and the parameter gradient of backward (use op_role_var to get this variable) operator should be merged to one device. The optimization operators should be executed on only one device and broadcast the optimization result, i.e., the new parameter, to every other device.

set_op_role

The operator role. It is an enum of {Forward, Backward, Optimize}.

Notes: this is a low level API. It is used only for ParallelExecutor to duplicate or schedule operator to devices.

For example, the forward operator should be executed on every device. The backward operator should be executed on every device and the parameter gradient of backward (use op_role_var to get this variable) operator should be merged to one device. The optimization operators should be executed on only one device and broadcast the optimization result, i.e., the new parameter, to every other device.

op_role_var

The auxiliary variables for op_role property.

See Also: Program.op_role’s documentation for details.

Notes: This is a very low-level API. Users should not use it directly.

set_op_role_var

The auxiliary variables for op_role property.

See Also: Program.op_role’s documentation for details.

Notes: This is a very low-level API. Users should not use it directly.

to_string(throw_on_error, with_details=False)

Get the debug string.

Parameters:
  • throw_on_error (bool) – raise a ValueError when any of the required fields is not set.
  • with_details (bool) – True if more details about variables and parameters, e.g., trainable, optimize_attr, need to be printed.
Returns:The debug string.
Return type:str
Raises:ValueError – If any of the required fields is not set and throw_on_error is True.
clone(for_test=False)

Create a new, duplicated program.

Some operators, e.g., batch_norm, behave differently between training and testing. They have an attribute, is_test, to control this behavior. This method will change their is_test attribute to True when for_test=True.

  • Set for_test to False when we want to clone the program for training.
  • Set for_test to True when we want to clone the program for testing.

Notes: This API DOES NOT prune any operator. Please use clone(for_test=True) before backward and optimization. e.g.

>>> test_program = fluid.default_main_program().clone(for_test=True)
>>> optimizer = fluid.optimizer.Momentum(learning_rate=0.01, momentum=0.9)
>>> optimizer.minimize(loss)
Parameters:for_test (bool) – True if the is_test attribute of the operators should be changed to True.
Returns:The new, duplicated Program object.
Return type:Program

Examples

  1. To clone a test program, the sample code is:
>>> import paddle.fluid as fluid
>>> train_program = fluid.Program()
>>> startup_program = fluid.Program()
>>> with fluid.program_guard(train_program, startup_program):
>>>     img = fluid.layers.data(name='image', shape=[784])
>>>     hidden = fluid.layers.fc(input=img, size=200, act='relu')
>>>     hidden = fluid.layers.dropout(hidden, dropout_prob=0.5)
>>>     loss = fluid.layers.cross_entropy(
>>>                 input=fluid.layers.fc(hidden, size=10, act='softmax'),
>>>                 label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
>>>
>>> test_program = train_program.clone(for_test=True)
>>>
>>> sgd = fluid.optimizer.SGD(learning_rate=1e-3)
>>> with fluid.program_guard(train_program, startup_program):
>>>     sgd.minimize(loss)

2. The clone method can be avoided if you create the program for training and the program for testing individually.

>>> import paddle.fluid as fluid
>>>
>>> def network(is_test):
>>>     img = fluid.layers.data(name='image', shape=[784])
>>>     hidden = fluid.layers.fc(input=img, size=200, act='relu')
>>>     hidden = fluid.layers.dropout(hidden, dropout_prob=0.5, is_test=is_test)
>>>     loss = fluid.layers.cross_entropy(
>>>                 input=fluid.layers.fc(hidden, size=10, act='softmax'),
>>>                 label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
>>>     return loss
>>>
>>> train_program = fluid.Program()
>>> startup_program = fluid.Program()
>>> test_program = fluid.Program()
>>>
>>> with fluid.program_guard(train_program, startup_program):
>>>     with fluid.unique_name.guard():
>>>         loss = network(is_test=False)
>>>         sgd = fluid.optimizer.SGD(learning_rate=1e-3)
>>>         sgd.minimize(loss)
>>>
>>> # the test startup program is not used.
>>> with fluid.program_guard(test_program, fluid.Program()):
>>>     with fluid.unique_name.guard():
>>>         loss = network(is_test=True)

The two code snippets above will generate the same programs.

static parse_from_string(binary_str)

Deserialize a program desc from protobuf binary string.

Notes: All information about parameters will be lost after serialization and deserialization.

Parameters:binary_str (str) – The binary protobuf string.
Returns:A deserialized program desc.
Return type:Program
num_blocks

The number of blocks in this program.

random_seed

The default random seed for random operators in the Program. Zero means getting the random seed from a random device.

Notes: It must be set before the operators have been added.

global_block()

Get the first block of this program.

block(index)

Get the block at the given index in this program.

Parameters:index (int) – The index of the block to get.
Returns:The block at the given index.
Return type:Block
current_block()

Get the current block. The current block is the block to which operators are appended.

list_vars()

Get all variables from this Program. An iterable object is returned.

Returns:The generator will yield every variable in this program.
Return type:iterable
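
A minimal sketch of inspecting a program (the layer names are illustrative):

import paddle.fluid as fluid

prog = fluid.Program()
with fluid.program_guard(prog):
    x = fluid.layers.data(name='x', shape=[1], dtype='float32')
    fluid.layers.fc(input=x, size=1)
print(prog.num_blocks)            # number of blocks in the program
for var in prog.list_vars():      # iterate over every variable
    print(var.name)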

program_guard

paddle.fluid.program_guard(*args, **kwds)

Change the global main program and startup program using a Python with statement. Layer functions in the with block will append operators and variables to the new main program.

Examples

>>> import paddle.fluid as fluid
>>> main_program = fluid.Program()
>>> startup_program = fluid.Program()
>>> with fluid.program_guard(main_program, startup_program):
>>>     data = fluid.layers.data(...)
>>>     hidden = fluid.layers.fc(...)

Notes: A temporary Program can be passed if the user does not need to construct either the startup program or the main program.

Examples

>>> import paddle.fluid as fluid
>>> main_program = fluid.Program()
>>> # does not care about startup program. Just pass a temporary value.
>>> with fluid.program_guard(main_program, fluid.Program()):
>>>     data = ...
Parameters:
  • main_program (Program) – New main program inside with statement.
  • startup_program (Program) – New startup program inside with statement. None means do not change startup program.

release_memory

paddle.fluid.release_memory(input_program, skip_opt_set=None)

Modify the input program and insert delete_op to drop unused variables early. The modification is performed in place.

Notes: This is an experimental API and could be removed in the next few releases. Users should not use this API.

Parameters:
  • input_program (Program) – The program into which delete_op will be inserted.
  • skip_opt_set (set) – vars that will be skipped in memory optimization.
Returns:

None

Scope

class paddle.fluid.Scope
drop_kids(self: paddle.fluid.core.Scope) → None
find_var(self: paddle.fluid.core.Scope, arg0: unicode) → paddle.fluid.core.Variable
new_scope(self: paddle.fluid.core.Scope) → paddle.fluid.core.Scope
var(self: paddle.fluid.core.Scope, arg0: unicode) → paddle.fluid.core.Variable
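
A minimal sketch of creating and looking up a variable in a scope (the variable name is illustrative):

import paddle.fluid as fluid

scope = fluid.global_scope()
scope.var("my_var")               # create (or fetch) a variable named "my_var"
found = scope.find_var("my_var")  # look it up again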

scope_guard

paddle.fluid.scope_guard(*args, **kwds)

Change the global/default scope instance via a Python with statement. All variables in the runtime will be assigned to the new scope.

Examples

>>> import paddle.fluid as fluid
>>> new_scope = fluid.Scope()
>>> with fluid.scope_guard(new_scope):
>>>     ...
Parameters:scope – The new global/default scope.

Tensor

paddle.fluid.Tensor

alias of LoDTensor

WeightNormParamAttr

class paddle.fluid.WeightNormParamAttr(dim=None, name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, gradient_clip=None, do_model_average=False)

Used for weight Norm. Weight Norm is a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. Weight Norm has been implemented as discussed in this paper: Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.

Parameters:
  • dim (int) – The dimension along which to compute the norm. Default None.
  • name (str) – The parameter’s name. Default None.
  • initializer (Initializer) – The method to initial this parameter. Default None.
  • learning_rate (float) – The parameter’s learning rate. The learning rate when optimize is \(global\_lr * parameter\_lr * scheduler\_factor\). Default 1.0.
  • regularizer (WeightDecayRegularizer) – Regularization factor. Default None.
  • trainable (bool) – Whether this parameter is trainable. Default True.
  • gradient_clip (BaseGradientClipAttr) – The method to clip this parameter’s gradient. Default None.
  • do_model_average (bool) – Whether this parameter should do model average. Default False.

Examples

data = fluid.layers.data(name="data", shape=[3, 32, 32], dtype="float32")
fc = fluid.layers.fc(input=data,
                     size=1000,
                     param_attr=fluid.WeightNormParamAttr(
                         dim=None,
                         name='weight_norm_param'))