The final project should include the following: BibTeX to manage references.
| Name | Alignment score function | Citation |
|---|---|---|
| Content-based attention | \(\text{score}(\boldsymbol{s}_t, \boldsymbol{h}_i) = \frac{\boldsymbol{s}_t \cdot \boldsymbol{h}_i}{\|\boldsymbol{s}_t\| \|\boldsymbol{h}_i\|}\) | Graves et al. (2014) |
| Additive Attention | \(\text{score}(\boldsymbol{s}_t, \boldsymbol{h}_i) = \boldsymbol{v}^\top \tanh(\boldsymbol{W}_1 \boldsymbol{s}_t + \boldsymbol{W}_2 \boldsymbol{h}_i)\) | Bahdanau et al. (2015) |
| Location-based Attention | \(\text{score}(\boldsymbol{s}_t, \boldsymbol{h}_i) = \boldsymbol{W}\boldsymbol{s}_t\) | Luong et al. (2015) |
| General | \(\text{score}(\boldsymbol{s}_t, \boldsymbol{h}_i) = \boldsymbol{s}_t^\top \boldsymbol{W}\boldsymbol{h}_i\) | Luong et al. (2015) |
| Dot-product | \(\text{score}(\boldsymbol{s}_t, \boldsymbol{h}_i) = \boldsymbol{s}_t^\top \boldsymbol{h}_i\) | Luong et al. (2015) |
| Scaled Dot-product | \(\text{score}(\boldsymbol{s}_t, \boldsymbol{h}_i) = \frac{\boldsymbol{s}_t^\top \boldsymbol{h}_i}{\sqrt{n}}\), where \(n\) is the hidden state dimension | Vaswani et al. (2017) |
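
To make the score functions above concrete, here is a minimal NumPy sketch that evaluates the dot-product, scaled dot-product, and additive scores for a single decoder state against a handful of encoder states. The hidden size `d`, the number of source positions, and the weights `W1`, `W2`, `v` are made-up values for illustration only, not part of any of the cited papers.

```python
import numpy as np

d = 8                               # assumed hidden size for this sketch
rng = np.random.default_rng(0)

s_t = rng.normal(size=d)            # decoder (target) hidden state s_t
H = rng.normal(size=(5, d))         # encoder (source) hidden states h_1..h_5

# Dot-product score: s_t^T h_i for every source position
dot_scores = H @ s_t

# Scaled dot-product score: divide by sqrt(n); here n is the hidden size d
scaled_scores = dot_scores / np.sqrt(d)

# Additive (Bahdanau) score: v^T tanh(W1 s_t + W2 h_i)
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
v = rng.normal(size=d)
additive_scores = np.tanh(H @ W2.T + W1 @ s_t) @ v

def softmax(x):
    # Attention weights are the softmax of the scores
    e = np.exp(x - x.max())
    return e / e.sum()

print(softmax(scaled_scores))       # one weight per source position, summing to 1
```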
Instead of computing the full attention weight matrix over every source position, we can use a local attention mechanism: each query attends only to a small window of source positions around a chosen center, so only a narrow band of the weight matrix is ever computed (see the sketch below).
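
A minimal sketch of this idea, assuming a fixed window of half-width `D` centered on the current position; the window size and the choice of center are assumptions made for the example, not a fixed prescription.

```python
import numpy as np

def local_attention_weights(scores, center, D):
    """Keep only positions within [center - D, center + D] and softmax over that window.

    scores : 1-D array of alignment scores against every source position
    center : assumed aligned source position for the current query
    D      : window half-width (an assumption for this sketch)
    """
    n = scores.shape[0]
    lo, hi = max(0, center - D), min(n, center + D + 1)
    weights = np.zeros(n)
    window = scores[lo:hi]
    e = np.exp(window - window.max())
    weights[lo:hi] = e / e.sum()
    return weights

scores = np.random.default_rng(1).normal(size=12)
print(local_attention_weights(scores, center=6, D=2))   # non-zero only on positions 4..8
```

In Luong et al. (2015), the window center is either the current target position (the monotonic variant) or predicted from the decoder state, and the scores inside the window are additionally reweighted by a Gaussian centered there; the sketch above shows only the simpler monotonic case.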
The attention mechanism can be applied to image data as well. In fact, the convolution operation can be seen as a special case of attention in which every output position attends only to a fixed local neighborhood and the weights are the kernel values, independent of the input (see the sketch below).
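
To make that concrete, here is a small 1-D NumPy sketch (the kernel values and signal length are arbitrary choices for the example) showing that a convolution is the same computation as attention with a fixed, local, input-independent weight matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10)                  # a 1-D "image" (signal)
kernel = np.array([0.25, 0.5, 0.25])     # assumed 3-tap convolution kernel
k = len(kernel)

# Ordinary (valid) convolution / correlation: slide the kernel over the input
conv_out = np.array([kernel @ x[i:i + k] for i in range(len(x) - k + 1)])

# The same computation written as "attention": a fixed, input-independent
# weight matrix whose row i places the kernel values on positions i..i+k-1
A = np.zeros((len(x) - k + 1, len(x)))
for i in range(A.shape[0]):
    A[i, i:i + k] = kernel
attn_out = A @ x

print(np.allclose(conv_out, attn_out))   # True: convolution = attention with fixed local weights
```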
Gconv denotes the graph convolution; by choosing different convolution filters, we obtain different types of graph convolutions. Readout is a permutation-invariant graph operation that outputs a fixed-length representation of the whole graph.
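
As a tiny illustration of the permutation-invariance requirement, the sketch below uses mean pooling as the readout (just one possible choice, assumed here for the example) and checks that reordering the nodes does not change the resulting graph representation.

```python
import numpy as np

def readout(node_embeddings):
    """A simple permutation-invariant readout: mean over node embeddings.

    Sum or max pooling would work the same way; mean pooling is only an
    illustrative choice, not the only possible Readout.
    """
    return node_embeddings.mean(axis=0)

rng = np.random.default_rng(3)
H = rng.normal(size=(6, 4))          # 6 nodes, 4-dimensional node embeddings
perm = rng.permutation(6)            # relabel the nodes

g1 = readout(H)
g2 = readout(H[perm])                # same graph, different node order
print(np.allclose(g1, g2))           # True: the readout ignores node ordering
```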