Image Captioning of Branded Products
Deep Image Captioning
Generating captions that describe the content of an image is an emerging task in computer vision. Recently, Recurrent Neural Networks (RNNs) in the form of Long Short-Term Memory (LSTM) networks have shown great success in generating captions that match an image's content. Compared to traditional tasks like image classification or object detection, captioning is more challenging: a model must not only identify the main objects but also recognize the relationships between them and describe the scene in a natural language such as English.
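To make the encoder-decoder idea concrete, the following is a minimal sketch of a CNN encoder feeding an LSTM decoder, in the spirit of Vinyals et al. [1]. All layer sizes, names, and the choice of a ResNet backbone are illustrative assumptions, not the exact model used in this project.

```python
# Hedged sketch: CNN encoder + LSTM decoder for image captioning (PyTorch).
import torch
import torch.nn as nn
import torchvision.models as models


class CaptionModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # CNN encoder: a pretrained ResNet whose classifier head is replaced
        # by a linear projection into the word-embedding space.
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.encoder = nn.Sequential(*list(resnet.children())[:-1])
        self.img_proj = nn.Linear(resnet.fc.in_features, embed_dim)
        # LSTM decoder that generates the caption word by word.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # Encode the image and feed it as the first "token" of the sequence.
        feats = self.encoder(images).flatten(1)          # (B, 2048)
        img_emb = self.img_proj(feats).unsqueeze(1)      # (B, 1, E)
        word_emb = self.embed(captions)                  # (B, T, E)
        inputs = torch.cat([img_emb, word_emb], dim=1)   # (B, T+1, E)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                          # logits over the vocabulary
```

Training such a model typically minimizes the cross-entropy between the predicted word distributions and the ground-truth caption at each time step; at inference time the caption is generated greedily or with beam search.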
Image Captioning of Branded Products
In a collaboration with the GfK Verein, we introduced a pipeline that automatically generates captions for images from social media. In particular, we look at images containing an object that is related to a brand, i.e., a product depicting the brand's logo.
In this project, we focus on correctly identifying the brand contained in an image. State-of-the-art models like Vinyals et al. [1] tend to produce rather generic descriptions; in contrast, we want our model to explicitly mention the name of the brand shown in the image within the generated sentence. Simultaneously, we predict attributes that describe the person's involvement with the brand, whether the branded product appears in a positive or negative context, and whether the interaction is functional or emotional.
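One way to read this multi-task setup is that the shared image features drive both the caption decoder and separate classification heads for the brand-interaction attributes. The sketch below illustrates that idea; the head names, class counts, and feature dimension are assumptions for illustration only, not the project's exact design.

```python
# Hedged sketch: attribute heads on top of shared image features (PyTorch).
import torch.nn as nn


class AttributeHeads(nn.Module):
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.sentiment = nn.Linear(feat_dim, 2)    # positive vs. negative context
        self.interaction = nn.Linear(feat_dim, 2)  # functional vs. emotional
        self.involvement = nn.Linear(feat_dim, 3)  # assumed involvement levels

    def forward(self, image_feats):
        # image_feats: (B, feat_dim) pooled CNN features shared with the decoder.
        return {
            "sentiment": self.sentiment(image_feats),
            "interaction": self.interaction(image_feats),
            "involvement": self.involvement(image_feats),
        }
```

A combined objective would then add one cross-entropy term per attribute head to the captioning loss, so the encoder is trained jointly for both tasks.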
Reference
- Philipp Harzig, Stephan Brehm, Rainer Lienhart, Carolin Kaiser, René Schallner. "Multimodal Image Captioning for Marketing Analysis." IEEE MIPR 2018, Miami, FL, USA, April 2018.
Further References:
[1] Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
For more information, please contact Philipp Harzig.