paddle.fluid.nets.simple_img_conv_pool(input, num_filters, filter_size, pool_size, pool_stride, act, param_attr=None, pool_type='max', use_cudnn=True, use_mkldnn=False)


paddle.fluid.nets.sequence_conv_pool(input, num_filters, filter_size, param_attr=None, act='sigmoid', pool_type='max')


paddle.fluid.nets.glu(input, dim=-1)

The gated linear unit composed by split, sigmoid activation and elementwise multiplication. Specifically, Split the input into two equal sized parts \(a\) and \(b\) along the given dimension and then compute as following:

\[{GLU}(a, b)= a \otimes \sigma(b)\]

Refer to Language Modeling with Gated Convolutional Networks.

  • input (Variable) – The input variable which is a Tensor or LoDTensor.
  • dim (int) – The dimension along which to split. If \(dim < 0\), the dimension to split along is \(rank(input) + dim\).

The Tensor variable with half the size of input.

Return type:



# x is a Tensor variable with shape [3, 6, 9]
fluid.nets.glu(input=x, dim=1)  # shape of output: [3, 3, 9]


paddle.fluid.nets.scaled_dot_product_attention(queries, keys, values, num_heads=1, dropout_rate=0.0)

The dot-product attention.

Attention mechanism can be seen as mapping a query and a set of key-value pairs to an output. The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function (dot-product here) of the query with the corresponding key.

The dot-product attention can be implemented through (batch) matrix multipication as follows:

\[Attention(Q, K, V)= softmax(QK^\mathrm{T})V\]

Refer to Attention Is All You Need.

  • queries (Variable) – The input variable which should be a 3-D Tensor.
  • keys (Variable) – The input variable which should be a 3-D Tensor.
  • values (Variable) – The input variable which should be a 3-D Tensor.
  • num_heads (int) – Head number to compute the scaled dot product attention. Default value is 1.
  • dropout_rate (float) – The dropout rate to drop the attention weight. Default value is 0.

A 3-D Tensor computed by multi-head scaled dot product attention.

Return type:



ValueError – If input queries, keys, values are not 3-D Tensors.


1. When num_heads > 1, three linear projections are learned respectively to map input queries, keys and values into queries’, keys’ and values’. queries’, keys’ and values’ have the same shapes with queries, keys and values.

1. When num_heads == 1, scaled_dot_product_attention has no learnable parameters.


# Suppose q, k, v are Tensors with the following shape:
# q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10]

contexts = fluid.nets.scaled_dot_product_attention(q, k, v)
contexts.shape  # [3, 5, 10]