Design for TensorArray

This design doc presents the necessity of a new C++ class TensorArray. In addition to the very simple C++ implementation

class TensorArray {
  explicit TensorArray(const LoDTensor&);
  explicit TensorArray(size_t size);

  vector<LoDTensor> values_;

We also need to expose it to PaddlePaddle’s Python API, because users would want to use it with our very flexible operators WhileLoop. An example for a RNN based on dynamic operators is

input =
num_steps = Var(12)

TensorArray states(size=num_steps)
TensorArray step_inputs(unstack_from=input)
TensorArray step_outputs(size=num_steps)

W = Tensor(...)
U = Tensor(...)
default_state = some_op()

step = Var(1)

wloop = paddle.create_whileloop(loop_vars=[step])
with wloop.frame():
    wloop.break_if(pd.equal(step, num_steps)
    pre_state =, default_state)
    step_input =
    state = pd.sigmoid(pd.matmul(U, pre_state) + pd.matmul(W, step_input))
    states.write(step, state)
    step_outputs.write(step, state) # output state

output = step_outputs.stack()


Steps are one of the core concepts of RNN. In each time step of RNN, there should be several input segments, states, and output segments; all these components act like arrays, for example, call states[step_id] will get the state in step_idth time step.

An RNN can be implemented with the following pseudocode

Array states;
Array input_segments;
Array output_segments;
Parameter W, U;

step = 1
seq_len = 12
while_loop {
   if (step == seq_len) break;
    states[step] = sigmoid(W * states[step-1] + U * input_segments[step]);
    output_segments[step] = states[step] // take state as output

According to the RNN roadmap, there are several different RNNs that PaddlePaddle will eventually support.

Currently, the basic RNN implementation supported by PaddlePaddle is the recurrent_op which takes tensors as input and splits them into input_segments.

Since a tensor cannot store variable-length sequences directly, PaddlePaddle implements the tensor with level of details (LoDTensor for short). Segmenting the LoDTensor is much more complicated than splitting a tensor, that makes it necessary to refactor the recurrent_op with LoDTensor segmenting support.

As the next step in RNN support, dynamic_recurrent_op should be introduced to handle inputs with variable-length sequences.

The implementation is similar to recurrent_op. The key difference is the way the original input LoDTensors and outupts are split to get the input_segments and the output_segments.

Though it can’t be built over recurrent_op or dynamic_recurrent_op directly, the logic behind splitting a tensor or a LoD tensor into input_segments remains the same.

Why TensorArray

The logic behind splitting the inputs to segments, states and outputs is similar and can be shared in a seperate module.

The array of states, input_segments and output_segments would be exposed to users when writing a dynamic RNN model similar to the above pseudo codes.

So there should be an array-like container, which can store the segments of a tensor or LoD tensor.

This container can store an array of tensors and provides several methods to split a tensor or a LoD tensor . This is where the notion of TensorArray comes from.

Introduce TensorArray to uniform all the three RNNs

TensorArray as a new concept is borrowed from TensorFlow, it is meant to be used with dynamic iteration primitives such as while_loop and map_fn.

This concept can be used to support our new design of dynamic operations, and help to refactor some existing variant-sentence-related layers, such as recurrent_op, RecurrentGradientMachine.

In our design for dynamic RNN, TensorArray is used to segment inputs and store states in all time steps. By providing some methods similar to a C++ array, the definition of some state-based dynamic models such as RNN can be more natural and highly flexible.

Dynamic-operations on TensorArray

TensorArray will be used directly when defining dynamic models, so some operators listed below should be implemented

# several helper operators for TensorArray
def tensor_array_stack(ta, tensor):
    get a tensor array `ta`, return a packed `tensor`.

def tensor_array_unstack(tensor, ta):
    get a `tensor`, unstack it and get a tensor array `ta`.

def tensor_array_write(ta, index, tensor, data_shared):
    get a `tensor` and a scalar tensor `index`, write `tensor` into index-th
    value of the tensor array `ta`.
    `data_shared` is an attribute that specifies whether to copy or reference the tensors.

def tensor_array_read(ta, index, tensor):
    get a tensor array `ta`, a scalar tensor `index`, read the index-th value of
    `ta` and return as the `tensor`.

def tensor_array_size(ta, tensor):
    get a tensor array `ta`, return the size of `ta` and return as the scalar `tensor`.

It is trivial for users to use so many low-level operators, so some helper methods should be proposed in python wrapper to make TensorArray easier to use, for example

class TensorArray:
    def __init__(self, name): = name
        self.desc = TensorArrayDesc()

    def stack(self, name=None):
        Pack the values in a `TensorArray` into a tensor with rank one higher
        than each tensor in `values`.
        `stack` can be used to split tensor into time steps for RNN or whileloop.

        @name: str
            the name of the variable to output.
        tensor = Var(name)
        tensor_array_stack(, tensor)
        return tensor

    def unstack(self, input):
        Unpacks the given dimension of a rank-`R` tensor into rank-`(R-1)` tensors.
        `unstack` can be used to concatenate all the time steps for RNN or whileloop.

        @input: str
            the name of input tensor

    def write(self, index, value, data_shared=True):
        Write value into index of the TensorArray.
        If `data_shared` is set to True, than the index-th value in TensorArray will
        be shared with the tensor passed in.

        @index: str
            name of a scalar tensor
        @value: str
            name of a tensor
        @data_shared: bool
        tensor_array_write(, index, value, data_shared)

    def read(self, index, output):
        Read the value at location `index` in the `TensorArray`.

        @index: str
            name of a scalar tensor
            name of a output variable
        tensor_array_read(, index, output)

    def size(self, output):
        Return the number of values.

        @output: str
            name of a scalar tensor
        tensor_array_size(, output)