fluid.transpiler

DistributeTranspiler

class paddle.fluid.transpiler.DistributeTranspiler(config=None)

Convert the fluid program to distributed data-parallelism programs. Supports two modes: pserver mode and nccl2 mode.

In pserver mode, the main_program is transformed to use a remote parameter server for parameter optimization, and the optimization graph is put into a parameter server program.

In nccl2 mode, the transpiler appends a NCCL_ID broadcasting op to startup_program to share the NCCL_ID across the job nodes. After the nccl2 transpile is done, you *must* pass the trainer_id and num_trainers arguments to ParallelExecutor to enable NCCL2 distributed mode.

Examples

# for pserver mode
import os
import paddle.fluid as fluid

pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
trainer_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4
role = os.getenv("PADDLE_TRAINING_ROLE")

t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id, pservers=pserver_endpoints, trainers=trainers)
if role == "PSERVER":
    pserver_program = t.get_pserver_program(current_endpoint)
    pserver_startup_program = t.get_startup_program(current_endpoint,
                                                    pserver_program)
elif role == "TRAINER":
    trainer_program = t.get_trainer_program()

# for nccl2 mode: here `trainers` is the comma separated trainer endpoint string
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id, trainers=trainer_endpoints,
            current_endpoint=current_endpoint)
# loss_var is a placeholder for the loss Variable from the model definition
exe = fluid.ParallelExecutor(
    use_cuda=True,
    loss_name=loss_var.name,
    num_trainers=len(trainer_endpoints.split(",")),
    trainer_id=trainer_id
)
transpile(trainer_id, program=None, pservers='127.0.0.1:6174', trainers=1, sync_mode=True, startup_program=None, current_endpoint='127.0.0.1:6174')

Run the transpiler.

Parameters:
  • trainer_id (int) – ID of the current trainer worker. If there are n workers, the ID ranges from 0 to n-1.
  • program (Program|None) – program to transpile, default is fluid.default_main_program().
  • pservers (str) – comma separated ip:port string for the pserver list.
  • trainers (int|str) – in pserver mode this is the number of trainers, in nccl2 mode this is a string of trainer endpoints.
  • sync_mode (bool) – Do sync training or not, default is True.
  • startup_program (Program|None) – startup_program to transpile, default is fluid.default_startup_program().
  • current_endpoint (str) – the current endpoint; required when transpiling in nccl2 distributed mode, not used in pserver mode.
get_trainer_program(wait_port=True)

Get transpiled trainer side program.

Returns:trainer side program.
Return type:Program
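
For illustration, a minimal sketch of how the returned program is typically used on a trainer node; it assumes `t` is an already-transpiled DistributeTranspiler, and that loss, feeder and train_reader are placeholders from the user's own model code:

# Hypothetical trainer-side loop (loss, feeder, train_reader are placeholders).
trainer_program = t.get_trainer_program()
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
for data in train_reader():
    exe.run(trainer_program, feed=feeder.feed(data), fetch_list=[loss.name])
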
get_pserver_program(endpoint)

Get parameter server side program.

Parameters:endpoint (str) – current parameter server endpoint.
Returns:the program for current parameter server to run.
Return type:Program
get_pserver_programs(endpoint)

Get pserver side main program and startup program for distributed training.

Parameters:endpoint (str) – current pserver endpoint.
Returns:(main_program, startup_program), of type “Program”
Return type:tuple
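
A minimal sketch of how the returned pair is typically used on a pserver node, assuming `t` is an already-transpiled DistributeTranspiler and current_endpoint is this server's own ip:port string:

# Hypothetical pserver-side usage.
pserver_prog, pserver_startup = t.get_pserver_programs(current_endpoint)
exe = fluid.Executor(fluid.CPUPlace())
exe.run(pserver_startup)   # initialize the parameter blocks owned by this pserver
exe.run(pserver_prog)      # start serving; this call blocks
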
get_startup_program(endpoint, pserver_program=None, startup_program=None)

Deprecated

Get the startup program for the current parameter server. Modifies operator input variables if there are variables that were split into several blocks.

Parameters:
  • endpoint (str) – current pserver endpoint.
  • pserver_program (Program) – deprecated, call get_pserver_program first.
  • startup_program (Program) – deprecated, should pass startup_program when initializing.
Returns:parameter server side startup program.
Return type:Program

DistributeTranspilerConfig

class paddle.fluid.transpiler.DistributeTranspilerConfig

slice_var_up (bool): Whether to slice Tensors across pservers, default is True.
split_method (PSDispatcher): RoundRobin or HashName can be used; try to choose the best method to balance the load across pservers.
min_block_size (int): Minimum number of split elements in a block. According to https://github.com/PaddlePaddle/Paddle/issues/8638#issuecomment-369912156, bandwidth is used efficiently when the data size is larger than 2MB. If you want to change it, please make sure you understand the slice_variable function.
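
A short illustrative sketch of setting these options (the values below are examples, not recommendations):

import paddle.fluid as fluid
from paddle.fluid.transpiler import HashName

config = fluid.DistributeTranspilerConfig()
config.slice_var_up = True        # slice large variables across pservers
config.split_method = HashName    # dispatcher class; instantiated by the transpiler
config.min_block_size = 8192      # example value; keep blocks large enough for bandwidth
t = fluid.DistributeTranspiler(config=config)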

HashName

class paddle.fluid.transpiler.HashName(pserver_endpoints)

Hash variable names to several endpoints using Python's hash() function.

Parameters:pserver_endpoints (list) – list of endpoints (ip:port).

memory_optimize

paddle.fluid.transpiler.memory_optimize(input_program, skip_opt_set=None, print_log=False, level=0, skip_grads=False)

Optimize memory by reusing variable memory.

Note: it does not support a subblock nested inside another subblock.
Parameters:
  • input_program (Program) – Input Program.
  • skip_opt_set (set) – vars that will be skipped in memory optimization.
  • print_log (bool) – whether to print debug log.
  • level (int) – If level=0, reuse only when the shapes are completely equal.
Returns:None
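
A minimal usage sketch, assuming the default main program has already been built and the optimizer applied; the skipped variable names ("image", "label") are placeholders:

import paddle.fluid as fluid

# Reuse variable memory in the default main program, skipping the feed vars.
fluid.transpiler.memory_optimize(fluid.default_main_program(),
                                 skip_opt_set=set(["image", "label"]),
                                 print_log=False,
                                 level=0)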

release_memory

paddle.fluid.transpiler.release_memory(input_program, skip_opt_set=None)

Modifies the input program in place, inserting delete_op to drop variables early once they are no longer used.

Notes: This is an experimental API and could be removed in the next few releases. Users should not use this API.

Parameters:
  • input_program (Program) – The program into which delete_op will be inserted.
  • skip_opt_set (set) – vars that will be skipped in memory optimization.
Returns:None
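
A minimal usage sketch, mirroring the memory_optimize example above (and subject to the experimental-API note); the skipped variable names are placeholders:

import paddle.fluid as fluid

# Insert delete_op into the default main program; modifies it in place.
fluid.transpiler.release_memory(fluid.default_main_program(),
                                skip_opt_set=set(["image", "label"]))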

RoundRobin

class paddle.fluid.transpiler.RoundRobin(pserver_endpoints)

Distribute variables to several endpoints using the RoundRobin (https://en.wikipedia.org/wiki/Round-robin_scheduling) method.

Parameters:pserver_endpoints (list) – list of endpoints (ip:port).
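
For illustration, a dispatcher can be constructed directly from an endpoint list; normally the transpiler does this itself from DistributeTranspilerConfig.split_method (the endpoints below are placeholders):

from paddle.fluid.transpiler import RoundRobin

eps = ["192.168.0.1:6174", "192.168.0.2:6174"]   # placeholder pserver endpoints
dispatcher = RoundRobin(eps)
# The transpiler uses the dispatcher to decide which pserver each parameter
# block is placed on; RoundRobin simply cycles through the endpoint list.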