| Name | Alignment score function | Citation |
| --- | --- | --- |
| Content-based attention | \(\text{score}(\boldsymbol{s}_t, \boldsymbol{h}_i) = \frac{\boldsymbol{s}_t \cdot \boldsymbol{h}_i}{\|\boldsymbol{s}_t\| \|\boldsymbol{h}_i\|}\) | Graves et al. (2014) |
| Additive attention | \(\text{score}(\boldsymbol{s}_t, \boldsymbol{h}_i) = \boldsymbol{v}^\top \tanh(\boldsymbol{W}_1 \boldsymbol{s}_t + \boldsymbol{W}_2 \boldsymbol{h}_i)\) | Bahdanau et al. (2015) |
| Location-based attention | \(\text{score}(\boldsymbol{s}_t, \boldsymbol{h}_i) = \boldsymbol{W}\boldsymbol{s}_t\) | Luong et al. (2015) |
| General | \(\text{score}(\boldsymbol{s}_t, \boldsymbol{h}_i) = \boldsymbol{s}_t^\top \boldsymbol{W}\boldsymbol{h}_i\) | Luong et al. (2015) |
| Dot-product | \(\text{score}(\boldsymbol{s}_t, \boldsymbol{h}_i) = \boldsymbol{s}_t^\top \boldsymbol{h}_i\) | Luong et al. (2015) |
| Scaled dot-product | \(\text{score}(\boldsymbol{s}_t, \boldsymbol{h}_i) = \frac{\boldsymbol{s}_t^\top \boldsymbol{h}_i}{\sqrt{n}}\) | Vaswani et al. (2017) |

In the scaled dot-product score, \(n\) is the dimension of the hidden state vectors.
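As a concrete illustration of the last row, here is a minimal NumPy sketch of scaled dot-product attention; the array names and shapes are assumptions for this example, not notation from the cited papers.

```python
import numpy as np

def scaled_dot_product_attention(S, H):
    """Scaled dot-product attention (last row of the table above).

    S: (T, n) array of decoder/query states s_t
    H: (I, n) array of encoder states h_i (used as both keys and values here)
    Returns the context vectors and the (T, I) attention weight matrix.
    """
    n = H.shape[-1]
    scores = S @ H.T / np.sqrt(n)                         # score(s_t, h_i) = s_t^T h_i / sqrt(n)
    scores = scores - scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over source positions i
    context = weights @ H                                  # each context vector is a weighted sum of the h_i
    return context, weights
```

The other score functions in the table only change the `scores` line; the softmax and the weighted sum stay the same.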
Instead of computing attention weights over the entire source sequence, a local attention mechanism restricts each query to a small window of source positions, so only a narrow band of the attention weight matrix has to be computed.
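A minimal sketch of one common form of local attention follows; the window size \(D\) and the monotonic alignment \(p_t = t\) are illustrative assumptions (Luong et al. (2015) also describe a predictive variant where \(p_t\) is learned).

```python
import numpy as np

def local_attention(S, H, D=2):
    """Local attention with a fixed window: query position t attends only to
    source positions within [p_t - D, p_t + D], with the monotonic choice
    p_t = min(t, n_src - 1). Window size D is an assumption for this sketch."""
    T, n = S.shape
    n_src = H.shape[0]
    scores = S @ H.T / np.sqrt(n)
    p = np.minimum(np.arange(T), n_src - 1)[:, None]   # aligned source position for each query
    i_idx = np.arange(n_src)[None, :]
    in_window = np.abs(i_idx - p) <= D                 # True only inside the local window
    scores = np.where(in_window, scores, -np.inf)      # mask everything outside the window
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ H, weights
```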
The attention mechanism can be applied to image data as well. In fact, the convolution operation can be seen as a special case of attention: each output position aggregates a fixed local neighbourhood of the input with learned weights that do not depend on the content of that neighbourhood, whereas attention computes content-dependent weights over (potentially) all positions.
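The sketch below applies single-head self-attention to the spatial positions of an image feature map, to make the contrast with convolution concrete; the projection matrices `Wq`, `Wk`, `Wv` are hypothetical parameters, and positional encodings are omitted for brevity.

```python
import numpy as np

def image_self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over the spatial positions of a feature map.

    X: (H, W, C) image feature map
    Wq, Wk, Wv: (C, d) projection matrices (hypothetical parameters for this sketch)
    Every position attends to every other position based on content; a convolution
    instead mixes a fixed local neighbourhood with content-independent weights.
    """
    height, width, channels = X.shape
    tokens = X.reshape(height * width, channels)       # flatten the grid into a sequence of positions
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return (weights @ V).reshape(height, width, -1)
```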
Here \(\text{Gconv}(\cdot)\) is the graph convolution operation; by choosing different convolution filters, we obtain different types of graph convolutions. \(\text{Readout}(\cdot)\) is a permutation-invariant operation that maps the node representations of a graph to a single fixed-length vector.
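A minimal sketch of these two pieces, using a GCN-style filter for \(\text{Gconv}\) and mean pooling for \(\text{Readout}\); both are illustrative choices, since the text above leaves the exact filter and readout open.

```python
import numpy as np

def gconv(A, X, W):
    """One graph convolution layer with a GCN-style filter (Kipf & Welling, 2017),
    used here as one possible choice; other filters give other graph convolutions.

    A: (N, N) adjacency matrix, X: (N, C) node features, W: (C, d) weight matrix.
    """
    A_hat = A + np.eye(A.shape[0])                                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]     # D^{-1/2} (A + I) D^{-1/2}
    return np.maximum(A_norm @ X @ W, 0.0)                         # aggregate neighbours, linear map, ReLU

def readout(H_nodes):
    """Permutation-invariant readout: the mean over node representations
    yields one fixed-length vector for the whole graph."""
    return H_nodes.mean(axis=0)
```

Because the mean does not depend on the order of the nodes, reordering the rows of `A` and `X` consistently leaves the readout vector unchanged.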