... | ... | @@ -6,11 +6,11 @@ The source code documentation is in the repository "multimodalDatasetBuilder/doc |
## Introduction
In the multimodal dataset creation, sentences of documents are enriched with images that, in the best case, represent the context of those sentences. Such an image is called the "main image" and is retrieved with [CLIP](https://github.com/openai/CLIP). A multimodal sentence with a main image also has at least one focus word. A focus word is defined as a word that is both complex and depictable/concrete. The [complex word identifier](https://github.com/in2dblue/mastersThesis) classifies whether a word is complex. It can be turned off, in which case every word is classified as complex. The depictability/concreteness of a word is mainly derived from the concreteness values file. These concreteness values are calculated over the image dataset beforehand, for example with this [implementation](https://github.com/victorssilva/concreteness) of [Visual Concreteness](https://arxiv.org/abs/1804.06786). For every multimodal focus word in a sentence, the main image of the sentence is highlighted according to that word with [miniCLIP](https://github.com/HendrikStrobelt/miniClip) and saved separately.
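
The focus-word rule described above (a word must be both complex and concrete) can be illustrated with a minimal sketch. This is not the repository's implementation: the function name `find_focus_words`, the threshold value, the dictionary format of the concreteness values, and the treatment of unknown words are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical cut-off; the README does not state how concreteness values are thresholded.
CONCRETENESS_THRESHOLD = 0.5


def find_focus_words(
    words: List[str],
    concreteness: Dict[str, float],
    is_complex: Optional[Callable[[str], bool]] = None,
    threshold: float = CONCRETENESS_THRESHOLD,
) -> List[str]:
    """Return the words that are both complex and concrete."""
    focus_words = []
    for word in words:
        # With the complex word identifier turned off (None),
        # every word is classified as complex.
        complex_enough = True if is_complex is None else is_complex(word)
        # Assumption: words missing from the concreteness values
        # are treated as not depictable.
        concrete_enough = concreteness.get(word.lower(), 0.0) >= threshold
        if complex_enough and concrete_enough:
            focus_words.append(word)
    return focus_words


# Toy usage with made-up concreteness values.
sentence = "The boat was damaged in 1864".split()
toy_concreteness = {"boat": 0.9, "damaged": 0.3}
print(find_focus_words(sentence, toy_concreteness))  # ['boat']
```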

Example sentence from the [simple Wikipedia article "Zetland (lifeboat)"](https://github.com/LGDoor/Dump-of-Simple-English-Wiki):

The boat was damaged in 1864, and was to be scrapped - however, following protest it was given to the town's people.

The main image of the sentence is on the left, and its version highlighted according to the focus word _boat_ is on the right. The image doesn't show the _Zetland_, but to be fair, the image was retrieved only for the aforementioned sentence. Considering this, the image represents the context of the first part of the sentence to some extent: an _old_ (looking) _boat_.