
Mining the Displacement of Max-pooling in Convolutional Neural Networks


Title: Mining the Displacement of Max-pooling in Convolutional Neural Networks

Speaker: Dr. Zheng Yuchen (郑煜辰)







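The speaker has published more than 20 papers in high-level domestic and international journals and conferences, and has served as a program committee member of the International Conference on Document Analysis and Recognition (ICDAR 2021 PC Member), the International Conference on Artificial Neural Networks (ICANN 2022 PC Member), and the International Conference on Image and Signal Processing (ISPR 2021 PC Member). The speaker is a member of the Chinese Association of Automation and the China Computer Federation, a reviewer for international journals including Pattern Recognition, Multimedia Systems, IEEE TCSVT, IEEE TNNLS, IJDAR, IET Image Processing, and IJCAS, and a reviewer for international conferences including ICANN, ICDAR, IJCNN, ICFHR, CVPR, and IJCAI.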


Max-pooling is a common operation in modern deep convolutional neural networks (CNNs). It is typically introduced to obtain translation-invariant representations and to downsample the feature maps of convolutional layers, but in doing so it discards some spatial information. In this thesis, we extract a novel feature from the max-pooling operation in CNNs, called the displacement feature, which records the location coordinates of the maximum value within each pooling window. We then analyze the class-wise trends and behavior of the displacement features in several ways. To verify their effectiveness, we apply them to two classical tasks: text recognition and offline signature verification. For text recognition, we extract the displacement features from the max-pooling layer and combine them with the pooled features to capture the micro differences between similar classes. Extensive experiments and discussions on three text datasets (MNIST, HASY, and Chars74K-font) demonstrate that the proposed displacement features improve the performance of CNN-based architectures and address the micro-deformation issues of max-pooling in text recognition. For offline signature verification, we extract the displacement features of the maxima in the max-pooling operation and fuse them with the pooling features, as a feature extraction procedure, to capture the micro deformations between genuine signatures and skilled forgeries. Extensive experiments and analysis on the GPDS-150, GPDS-300, GPDS-1000, GPDS-2000, and GPDS-5000 datasets demonstrate that the proposed method discriminates well between genuine signatures and their corresponding skilled forgeries and achieves state-of-the-art performance on these datasets.
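To make the idea of a displacement feature concrete, the following is a minimal sketch in plain Python, not the speaker's implementation. It assumes non-overlapping square pooling windows, records the (row, column) offset of each window's maximum, and fuses the offsets with the pooled values by simple concatenation. The function names `max_pool_with_displacement` and `fuse`, the tie-breaking rule, and the concatenation-based fusion are all illustrative assumptions; the abstract does not specify how the features are combined.

```python
def max_pool_with_displacement(feature_map, window=2):
    """Non-overlapping max-pooling over a 2-D list of numbers.

    Returns (pooled, displacement). displacement[i][j] is the (row, col)
    offset of the maximum inside pooling window (i, j). Ties keep the
    first maximum in row-major order (an assumption of this sketch).
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled, displacement = [], []
    for r in range(0, rows - window + 1, window):
        pooled_row, disp_row = [], []
        for c in range(0, cols - window + 1, window):
            best, best_off = feature_map[r][c], (0, 0)
            for dr in range(window):
                for dc in range(window):
                    value = feature_map[r + dr][c + dc]
                    if value > best:
                        best, best_off = value, (dr, dc)
            pooled_row.append(best)
            disp_row.append(best_off)
        pooled.append(pooled_row)
        displacement.append(disp_row)
    return pooled, displacement


def fuse(pooled, displacement):
    """Hypothetical fusion: flatten to [value, d_row, d_col, ...]."""
    flat = []
    for p_row, d_row in zip(pooled, displacement):
        for value, (dr, dc) in zip(p_row, d_row):
            flat.extend([value, dr, dc])
    return flat


# Two inputs whose maxima sit at different positions inside each window.
a = [[1, 0, 0, 2],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [3, 0, 0, 4]]
b = [[0, 0, 0, 0],
     [0, 1, 2, 0],
     [0, 3, 4, 0],
     [0, 0, 0, 0]]

pooled_a, disp_a = max_pool_with_displacement(a)
pooled_b, disp_b = max_pool_with_displacement(b)
print(pooled_a == pooled_b)  # the pooled outputs are identical
print(disp_a, disp_b)        # the displacement features differ
```

In this toy case, max-pooling alone maps both inputs to the same output, while the recorded offsets still tell them apart. This is the kind of spatial information the abstract describes as lost by pooling and recovered by the displacement features.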
