Three papers presented at ICDAR2021

Created: September 5, 2021
Tags: Paper, Computer Vision
Updated: October 10, 2021

The following three papers were accepted to and presented at ICDAR2021. All of them are results of a research collaboration with Kyushu University.

  1. Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan, Akisato Kimura, Kunio Kashino, Brian Kenji Iwana, Seiichi Uchida, "Attention to warp: Deep metric learning for multivariate time series." Official: https://link.springer.com/chapter/10.1007/978-3-030-86334-0_23 Pre-print: https://arxiv.org/abs/2103.15074 Deep time-series metric learning faces a difficult trade-off between temporal invariance to nonlinear distortion and discriminative power in identifying non-matching sequences. This paper proposes a novel neural network-based approach for robust yet discriminative time-series classification and verification. The proposed method adapts a parameterized attention model to time warping for greater and more adaptive temporal invariance. It is robust against not only local but also large global distortions, so that even matching pairs that do not satisfy the monotonicity, continuity, and boundary conditions can still be successfully identified. Learning of this model is further guided by dynamic time warping (DTW), which imposes temporal constraints for stabilized training and higher discriminative power. The model can also learn to augment inter-class variation through warping, so that similar but different classes can be effectively distinguished. (A minimal sketch of DTW-guided attention warping appears after this list.)

  2. Seiya Matsuda, Akisato Kimura, Seiichi Uchida, "Impressions2Font: Generating fonts by specifying impressions." Official: https://link.springer.com/chapter/10.1007/978-3-030-86334-0_48 Pre-print: https://arxiv.org/abs/2103.10036 This paper proposes Impressions2Font (Imp2Font), which generates font images with specific impressions. Imp2Font builds on recent advances in generative adversarial networks (GANs) for handling noisy conditions. More specifically, our proposal is inspired by CP-GAN [Kaneko+ BMVC2019] and accepts an arbitrary number of impression words as the condition for generating font images. By introducing an impression embedding module built on a word embedding technique, the proposed model can (1) consider proximity among similar impressions and (2) accept unknown impression words not included in the training data. (A minimal conditioning sketch appears after this list.)

  3. Masaya Ueda, Akisato Kimura, Seiichi Uchida, "Which parts determine the impression of fonts?" Official: https://link.springer.com/chapter/10.1007/978-3-030-86334-0_47 Pre-print: https://arxiv.org/abs/2103.14216 This paper analyzes the correlation between local shapes and the impression of fonts. By focusing on local shapes instead of whole letters, we can realize a letter-shape-independent and more general analysis. The analysis is performed by newly integrating local descriptors such as SIFT into DeepSets, which can select an arbitrary and necessary number of essential parts for describing font impressions. Our qualitative and quantitative analyses show that (1) fonts with similar parts have similar impressions, (2) many impressions, such as legible and rough, largely depend on specific parts, and (3) several impressions are largely irrelevant to any particular parts. (A minimal DeepSets sketch appears after this list.)
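For intuition on the first paper, the following is a minimal sketch of attention-based soft warping guided by classical dynamic time warping. The function names, the temperature parameter, and the choice of a binary cross-entropy guidance loss are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def soft_warp(query, key, value, temperature=0.1):
    """Warp `value` onto the time axis of `query` via soft attention.

    query: (Tq, D), key/value: (Tk, D). Returns the warped series (Tq, D)
    and the attention matrix (Tq, Tk).
    """
    sim = query @ key.t() / (query.shape[-1] ** 0.5)  # pairwise similarities
    attn = F.softmax(sim / temperature, dim=-1)       # each query step attends over key steps
    return attn @ value, attn

def dtw_alignment(a, b):
    """Classical DTW; returns a binary alignment matrix usable as guidance."""
    Ta, Tb = a.shape[0], b.shape[0]
    cost = torch.cdist(a, b)                          # (Ta, Tb) Euclidean costs
    acc = torch.full((Ta + 1, Tb + 1), float("inf"))
    acc[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            acc[i, j] = cost[i - 1, j - 1] + torch.min(
                torch.stack([acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]]))
    align, i, j = torch.zeros(Ta, Tb), Ta, Tb         # backtrack the warping path
    while i > 0 and j > 0:
        align[i - 1, j - 1] = 1.0
        step = torch.argmin(torch.stack(
            [acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0: i, j = i - 1, j - 1
        elif step == 1: i -= 1
        else: j -= 1
    return align

x, y = torch.randn(50, 8), torch.randn(60, 8)         # two multivariate series
warped_y, attn = soft_warp(x, y, y)
guide = dtw_alignment(x, y)
# Pull the attention toward the DTW path (one of several possible guidance losses).
loss = F.binary_cross_entropy(attn.clamp(1e-6, 1 - 1e-6), guide)
```

In the paper the attention is produced by a learned model and DTW serves only as a training-time guide; the sketch shows just the core alignment idea.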
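For the second paper, here is a minimal sketch of how a generator can be conditioned on a variable number of impression words through averaged word embeddings. The vocabulary size, layer widths, and the simple fully connected generator body are placeholder assumptions; the paper's actual model is a GAN inspired by CP-GAN and uses a pretrained word embedding so that unknown impression words can be accepted.

```python
import torch
import torch.nn as nn

class ImpressionConditionedGenerator(nn.Module):
    """Generate glyph images conditioned on a set of impression words."""
    def __init__(self, vocab_size=2000, word_dim=300, z_dim=128, img_size=64):
        super().__init__()
        # A trainable embedding table stands in for the pretrained word
        # embedding used in the paper.
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.impression_proj = nn.Linear(word_dim, z_dim)
        self.net = nn.Sequential(
            nn.Linear(z_dim * 2, 256), nn.ReLU(),
            nn.Linear(256, img_size * img_size), nn.Tanh(),
        )
        self.img_size = img_size

    def forward(self, z, impression_ids):
        # impression_ids: (B, N) -- any number N of impression words per sample.
        cond = self.word_emb(impression_ids).mean(dim=1)  # average over the word set
        cond = self.impression_proj(cond)
        out = self.net(torch.cat([z, cond], dim=-1))
        return out.view(-1, 1, self.img_size, self.img_size)

G = ImpressionConditionedGenerator()
z = torch.randn(4, 128)
words = torch.randint(0, 2000, (4, 3))  # e.g. indices for "elegant", "thin", "serif"
fonts = G(z, words)                     # (4, 1, 64, 64) glyph images
```

Averaging the embeddings keeps the condition vector the same size regardless of how many impression words are supplied, which is what allows an arbitrary number of conditions.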
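For the third paper, the following sketch shows a DeepSets-style model over a set of SIFT-like 128-dimensional local descriptors, which is the permutation-invariant structure the analysis relies on. The layer widths and the number of impression tags are placeholder assumptions.

```python
import torch
import torch.nn as nn

class DeepSetsImpression(nn.Module):
    """Predict multi-label impression scores from a set of local descriptors."""
    def __init__(self, desc_dim=128, hidden=256, num_impressions=100):
        super().__init__()
        # phi: applied to every descriptor independently (permutation-equivariant).
        self.phi = nn.Sequential(nn.Linear(desc_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        # rho: applied after pooling over the set (permutation-invariant).
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_impressions))

    def forward(self, descriptors):
        # descriptors: (B, N, desc_dim) with an arbitrary set size N.
        pooled = self.phi(descriptors).sum(dim=1)  # sum-pooling over the set
        return self.rho(pooled)                    # impression logits

model = DeepSetsImpression()
parts = torch.randn(2, 50, 128)  # 50 SIFT descriptors per font image
scores = model(parts)            # (2, 100) impression logits
```

Because the pooling is a sum over the set, the prediction does not depend on the order or number of local parts, matching the paper's goal of a letter-shape-independent analysis.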