Although self-supervised contrastive learning (CL) has shown its potential to achieve state-of-the-art accuracy without any supervision, its behavior remains under-investigated. Unlike most previous work, which studies CL through its learning objectives,
we focus on an unexplored yet natural aspect: the spatial inductive bias that CL appears to exploit implicitly via data augmentations. We design an experiment to study the reliance of CL on this spatial inductive bias:
we destroy the global or local spatial structure of images with global or local patch shuffling (sketched below), and quantify the reliance on each bias by comparing the performance drop between the original and corrupted datasets.
We also use the uniformity of the feature space (defined below) to further examine how CL-pre-trained models behave on the corrupted datasets. Our results and analysis show that CL relies far more heavily on spatial inductive bias than supervised learning (SL),
regardless of the specific CL algorithm or backbone, opening a new direction for studying the behavior of CL.
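As a concrete illustration of the corruption protocol described above, the following is a minimal sketch of global and local patch shuffling. The grid size, tensor layout, and function names are illustrative assumptions, not the paper's exact implementation.

```python
import torch


def global_patch_shuffle(img: torch.Tensor, grid: int = 4) -> torch.Tensor:
    """Randomly permute non-overlapping patches of an image tensor (C, H, W).

    Destroys global spatial structure while keeping the content of each
    patch (local structure) intact. The grid size is an illustrative choice;
    the image is cropped to a multiple of the patch size.
    """
    c, h, w = img.shape
    ph, pw = h // grid, w // grid
    # Split the (cropped) image into a grid x grid set of patches.
    patches = (img[:, :grid * ph, :grid * pw]
               .reshape(c, grid, ph, grid, pw)
               .permute(1, 3, 0, 2, 4)
               .reshape(grid * grid, c, ph, pw))
    patches = patches[torch.randperm(grid * grid)]
    # Reassemble the shuffled patches into an image.
    return (patches.reshape(grid, grid, c, ph, pw)
            .permute(2, 0, 3, 1, 4)
            .reshape(c, grid * ph, grid * pw))


def local_patch_shuffle(img: torch.Tensor, grid: int = 4) -> torch.Tensor:
    """Shuffle pixels within each patch while keeping patch positions fixed.

    Destroys local spatial structure while preserving the global layout.
    """
    c, h, w = img.shape
    ph, pw = h // grid, w // grid
    out = img[:, :grid * ph, :grid * pw].clone()
    for i in range(grid):
        for j in range(grid):
            flat = out[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw].reshape(c, -1)
            out[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = (
                flat[:, torch.randperm(flat.shape[1])].reshape(c, ph, pw))
    return out
```

One way to build the corrupted datasets would be to apply these transforms to every image before the usual augmentation pipeline; the paper's exact protocol may differ.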
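For the uniformity measurement, one common choice is the Gaussian-potential uniformity of Wang and Isola (2020), computed on L2-normalized features. Whether the paper uses exactly this form is an assumption, and `encoder` below is a hypothetical feature extractor.

```python
import torch
import torch.nn.functional as F


def uniformity(features: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Gaussian-potential uniformity of a batch of features (N, D).

    Features are L2-normalized onto the unit hypersphere; lower values
    mean the features are spread more uniformly. t = 2 is the common
    choice in the literature.
    """
    z = F.normalize(features, dim=1)
    sq_dists = torch.pdist(z, p=2).pow(2)  # pairwise squared Euclidean distances
    return sq_dists.mul(-t).exp().mean().log()


# Hypothetical usage: compare feature-space uniformity on the original
# and the patch-shuffled images.
# u_orig = uniformity(encoder(original_images))
# u_corr = uniformity(encoder(shuffled_images))
```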