nets¶
simple_img_conv_pool¶

paddle.v2.fluid.nets.
simple_img_conv_pool
(input, num_filters, filter_size, pool_size, pool_stride, act, param_attr=None, pool_type='max', use_cudnn=True)
sequence_conv_pool¶

paddle.v2.fluid.nets.
sequence_conv_pool
(input, num_filters, filter_size, param_attr=None, act='sigmoid', pool_type='max')
glu¶

paddle.v2.fluid.nets.
glu
(input, dim=1) The gated linear unit composed by split, sigmoid activation and elementwise multiplication. Specifically, Split the input into two equal sized parts \(a\) and \(b\) along the given dimension and then compute as following:
\[{GLU}(a, b)= a \otimes \sigma(b)\]Refer to Language Modeling with Gated Convolutional Networks.
Parameters:  input (Variable) – The input variable which is a Tensor or LoDTensor.
 dim (int) – The dimension along which to split. If \(dim < 0\), the dimension to split along is \(rank(input) + dim\).
Returns: The Tensor variable with half the size of input.
Return type: Variable
Examples
# x is a Tensor variable with shape [3, 6, 9] fluid.nets.glu(input=x, dim=1) # shape of output: [3, 3, 9]
scaled_dot_product_attention¶

paddle.v2.fluid.nets.
scaled_dot_product_attention
(queries, keys, values, num_heads=1, dropout_rate=0.0) The dotproduct attention.
Attention mechanism can be seen as mapping a query and a set of keyvalue pairs to an output. The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function (dotproduct here) of the query with the corresponding key.
The dotproduct attention can be implemented through (batch) matrix multipication as follows:
\[Attention(Q, K, V)= softmax(QK^\mathrm{T})V\]Refer to Attention Is All You Need.
Parameters:  queries (Variable) – The input variable which should be a 3D Tensor.
 keys (Variable) – The input variable which should be a 3D Tensor.
 values (Variable) – The input variable which should be a 3D Tensor.
 num_heads (int) – Head number to compute the scaled dot product attention. Default value is 1.
 dropout_rate (float) – The dropout rate to drop the attention weight. Default value is 0.
Returns: A 3D Tensor computed by multihead scaled dot product attention.
Return type: Variable
Raises: ValueError
– If input queries, keys, values are not 3D Tensors.Note
1. When num_heads > 1, three linear projections are learned respectively to map input queries, keys and values into queries’, keys’ and values’. queries’, keys’ and values’ have the same shapes with queries, keys and values.
1. When num_heads == 1, scaled_dot_product_attention has no learnable parameters.
Examples
# Suppose q, k, v are Tensors with the following shape: # q: [3, 5, 9], k: [3, 6, 9], v: [3, 6, 10] contexts = fluid.nets.scaled_dot_product_attention(q, k, v) contexts.shape # [3, 5, 10]