INFORMATION ABOUT PROJECT,
SUPPORTED BY RUSSIAN SCIENCE FOUNDATION

This information was prepared from data in the RSF information-analytical system; the content is presented as written by the authors. All rights belong to the authors; use or reprinting of the materials is permitted only with the authors' prior consent.

 

COMMON PART


Project Number: 21-18-28037

Project title: The statistical properties of multiple objects as a cue for their segmentation and categorization in human vision

Project Lead: Utochkin Igor

Affiliation: National Research University Higher School of Economics

Implementation period: 2021 - 2022

Research area: 08 - HUMANITIES AND SOCIAL SCIENCES, 08-551 - General psychology, methodology and history of psychology, human psychology

Keywords: Segmentation, categorization, perception, visual statistics


 

PROJECT CONTENT


Annotation
In real-world perception, the visual system has to constantly process information about a large number of objects in the visual scene. Despite severe limitations on attention and working memory (Luck & Vogel, 1997; Pylyshyn & Storm, 1988) that make deep processing of all objects at once hardly possible (Wolfe et al., 2011), observers are able to extract much more information than these limitations predict. To explain how information outside the limited capacity of attention and working memory becomes accessible, researchers have put forward the idea that the visual system extracts compressed information about summary statistics of multiple objects (ensembles) without storing information about each individual element (Alvarez, 2011): for example, the average feature of all objects (Ariely, 2001; Chong & Treisman, 2003), the variability of the features of all objects (Dakin & Watt, 1997; Morgan et al., 2008), or the approximate number of objects (Burr & Ross, 2008; Halberda et al., 2006). A broad variety of perceptual features can be represented as ensemble summary statistics, from orientation (Alvarez & Oliva, 2008) and size (Chong & Treisman, 2005) to emotional facial expressions (Haberman & Whitney, 2007). Ensemble summary statistics can be directly encoded by the visual system, as shown by sensory adaptation aftereffects for ensemble summaries (e.g., Corbett et al., 2012) and by the speed of their computation (Chong & Treisman, 2003; Whiting & Oriet, 2011). Recent studies have shown that the visual system can also implicitly represent the entire feature distribution (Chetverikov et al., 2016). This suggests that summary statistics (mean, variability, etc.) are not separate, independent forms of ensemble representation but are some of the properties available to the visual system as a result of holding a rich representation of the whole feature distribution.
One useful implication of the ability to represent a rich feature distribution is rapid categorization and segmentation. In everyday perception, we often deal with multiple objects of different types intermixed in space. A typical example is apples and leaves on a tree. In this situation, it is useful to parse the apples and the leaves into independent subgroups before calculating summary statistics for these objects (for example, evaluating the average redness of the apples requires discarding the greenness of the leaves). This is what the visual system often does: for example, it can easily and quickly split many objects into several subgroups by color and calculate ensemble statistics for each group separately (Chong & Treisman, 2005; Im & Chong, 2014). To carry out these independent calculations, the visual system should rapidly “decide” which elements are similar enough to each other to be included in the same subgroup, and which elements are too different and should be excluded from it. The process of such a “decision” is what we call rapid categorization and segmentation. Its implementation requires access to more complex distributional properties than the mean and variability of features (for example, the shape of the distribution; Utochkin, 2015). Studying the process of rapid categorization and segmentation of multiple objects is the main goal of this project. The general problem of segmentation was raised by Gestalt psychologists (e.g., Metzger, 1936/2006), as well as by researchers of texture perception and visual search (Bacon & Egeth, 1991; Bergen & Julesz, 1983; Bravo & Nakayama, 1992; Itti & Koch, 2001; Nothdurft, 1992, 1993; Rosenholtz, 2001, 2014; Treisman, 1988; Wolfe, 1992). Their research paradigms differed from our main focus in that they concentrated mainly on categorical subgroups that have a clear spatial organization (textures).
That is, textures and groups are usually thought of as stimuli in which objects of one type are in spatial proximity to each other and this whole group of objects is separated from another spatially organized group of objects of another type. In this case, the central question is how the visual system is able to spatially segment two such groups of objects, i.e., to distinguish large surfaces or separate a figure from a complex background. However, as mentioned above, objects of different types often do not have such obvious spatial organization but are strongly intermixed over space. The mechanisms and laws derived for the perception of textures with strong spatial organization are not well suited to explaining rapid segmentation in this case. Earlier, our research group put forward a hypothesis explaining rapid segmentation and categorization of multiple objects mixed in space via the ensemble representation of feature distributions (Utochkin, 2015; Utochkin & Yurevich, 2016). More specifically, we suggest that the visual system uses information about the shape of the distribution to figure out whether objects of one type or of several types (categories) are presented. If this distribution has a single peak or a uniform shape and if there are smooth transitions between the individual values of the feature in this distribution, then segmentation of such an ensemble will not occur. As a result, all objects will be assigned to the same category. We called such ensembles and such distributions of features “non-segmentable”. However, if the distribution consists of several peaks with long gaps in between, then segmentation is likely: the objects will be assigned to several different categories. We called such ensembles and feature distributions “segmentable”.
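The distinction between “segmentable” and “non-segmentable” distributions can be illustrated with a toy sketch. This is not the project's model: the function names, the gap-based rule, and the `gap_ratio` threshold are all hypothetical choices made only to show how a distribution with smooth transitions differs from one with peaks separated by a long gap.

```python
from statistics import median

def split_at_gaps(values, gap_ratio=3.0):
    """Split feature values wherever a gap is much larger than the typical gap.

    gap_ratio is a hypothetical threshold chosen for illustration only;
    the source does not specify any quantitative criterion.
    """
    ordered = sorted(values)
    if len(ordered) < 3:
        return [ordered]
    # Distances between neighbouring feature values along the feature axis.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    typical = median(gaps)
    groups = [[ordered[0]]]
    for gap, value in zip(gaps, ordered[1:]):
        if gap > 0 and gap > gap_ratio * typical:
            groups.append([value])      # long gap: start a new subgroup
        else:
            groups[-1].append(value)    # smooth transition: same subgroup
    return groups

def is_segmentable(values, gap_ratio=3.0):
    """True if the toy rule finds more than one subgroup."""
    return len(split_at_gaps(values, gap_ratio)) > 1

# Feature values in arbitrary units (e.g., orientation in degrees).
smooth = [10, 14, 18, 22, 26, 30, 34, 38]      # evenly spaced, no long gap
two_peaks = [10, 12, 14, 16, 40, 42, 44, 46]   # two clusters with a long gap

print(is_segmentable(smooth))
print(split_at_gaps(two_peaks))
```

Under these assumptions the evenly spaced set stays in one group, while the two-cluster set is split at the long gap. A real account would need to work on the full distribution shape rather than on neighbouring gaps alone.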
During the implementation of the 2018-2020 project, our research group carried out extensive empirical work to test the predictions of this theory (Utochkin, Khvostov, & Stakina, 2018; Im, Tiurina, & Utochkin, 2020; Utochkin, Khvostov, & Wolfe, 2020; Khvostov, Lukashevich, & Utochkin, in press; Khvostov & Utochkin, 2019; Utochkin & Brady, 2020). In the new project, we aim to further develop and expand our understanding of the mechanisms behind rapid categorization and segmentation and to answer new questions that arose during the implementation of the 2018-2020 project. In Series of experiments 8, we plan to figure out what the representational basis for segmentation of multiple objects is. Specifically, we are interested in what kind of visual information can be used to globally group objects into a subset and separate them from other objects, and what kind cannot. Since the ensemble percept is a statistical representation of individual objects, not just of colors or spatial frequencies, we ask how much the segmentation of subsets is related to objects. Is the visual system capable of segmenting subsets based on a characteristic combination of features (i.e., information about objects as a unitized combination of parts), or is it possible only on the basis of unique basic features (for example, the presence of a color that other types of objects do not have)? In Series of experiments 9, we are going to investigate the quantitative characteristics of distributions that lead either to the integration of elements into an ensemble of categorically similar elements or to their segmentation and selection into a separate category. To do this, we plan to compare within the same study two phenomena that are usually studied in two different experimental paradigms: the pop-out effect in visual search (Treisman & Gelade, 1980; Rosenholtz, 1999, 2001) and “outlier” discount in ensemble averaging (Haberman & Whitney, 2010; Epstein et al., 2020).
In this study, we will investigate both tasks on the same set of stimuli, parametrically varying the distance between a critical item’s feature and the boundaries of the feature distribution of the remaining objects. Thus, we expect to obtain a function that describes the transition from stable grouping to segmentation, with both pop-out and outlier discount being particular cases of such segmentation. We also plan to compare the functions obtained for the two phenomena, which can give us insights into the potential commonality of their mechanisms. Series of experiments 10 will be dedicated to the automatic segmentation and discrimination of ensembles with different mean features. We plan to investigate the conditions supporting such discrimination as a basic sensory computation without attention. We will use the visual mismatch negativity (vMMN), a component of the event-related potential (ERP) correlated with early automatic change detection. This series of experiments directly continues the line of EEG studies started in the 2018-2020 project, but here we consider even more basic segmentation and discrimination based on a systematic variation of distributions for a single sensory dimension. In Series of experiments 11, we are going to test the potential additivity of the segmentation effects produced by the distributions of two feature dimensions at once. If the subsets differ from each other in two ways (for example, one of the subsets is both redder and larger), is segmentation more efficient than when the subsets differ in only one feature? The answers to these questions are intended to complement our knowledge of how the visual system performs rapid categorization and segmentation of multiple objects.
This knowledge can be used to develop recommendations for optimizing the presentation of various kinds of visual information (interface design, data visualization, etc.) and can also be useful in the development of machine vision algorithms.

Expected results
As a result of the project, we expect to obtain novel findings on how the visual system uses information about feature distributions for rapid categorization and segmentation of multiple objects and to identify boundary conditions that can potentially restrict the efficiency of such categorization and segmentation. We are going to quantitatively probe the conditions under which elements from the same categorical group get segmented from the rest of the ensemble members. This segmentation is expected to result in the exclusion of these elements from the calculation of ensemble summary statistics and in the fast detection of these elements in a visual search task (pop-out effect). Our EEG study will test whether an ERP component called visual mismatch negativity (vMMN), a correlate of automatic stimulus discrimination, arises in ensemble average discrimination. It will allow us to determine whether the quantitative differences between ensembles that drive their grouping or segmentation in psychophysical studies correlate with a neuronal discrimination signal. This, in turn, will allow us to determine to what extent conscious rapid categorization and segmentation can be accounted for by basic sensory discrimination and what role attention plays in them. Besides the quantitative estimates of the factors behind the transition from grouping to segmentation, we will obtain data about the qualitative properties of representations that support efficient rapid segmentation, or in other words, about the representational basis of rapid categorization and segmentation. Finally, we will answer the question about the joint influence of segmentability from two feature dimensions, namely whether they complement each other in forming categorical groups. The theoretical significance of these results is related to their contribution to various research fields of vision science.
First, the development of the rapid categorization and segmentation theory contributes to the general theory of ensemble perception, especially given the current shift from viewing ensemble representations as compressed summary statistics towards the idea of rich feature information provided by the entire feature distribution. Moreover, our studies can give new insights into connections between ensemble representations and other visual tasks previously considered separately in the literature. For example, the study on outlier discount in visual averaging and pop-out in visual search can let us see these two effects as potential results of a common mechanism, rapid ensemble-based segmentation. The study on the representational basis of segmentation can contribute to a prominent and long-lasting discussion of the nature of visual representations beyond the narrow scope of focused attention (Treisman & Gelade, 1980; Treisman, 2006; Duncan & Humphreys, 1989; Rosenholtz et al., 2012): Are large groups of things in the visual field represented sparsely as a set of basic features, or can they instead be bound into coherent objects whose holistic pattern is used for their grouping with other objects of the same kind and for their separation from objects of different kinds? From a practical point of view, our data can serve as a basis for the further development of computer vision algorithms (for example, for the classification of multiple objects in images) and for modelling visual processes. Our data can also provide psychological support for the effective design of computer interfaces, visualization tools, and infographics, which can improve user experience (UX). In particular, our studies can help in determining how large the difference between one data point and the rest in a data plot should be for this data point to be considered an outlier and segmented from the other objects on the plot.
Is it enough to distinguish two groups of data by color alone, or will varying other features (such as shape) make the data points easier to tell apart? We plan to publish the results of the project in top international journals specializing in visual perception and the neural mechanisms of cognition and behavior. Potential journals for publication include, but are not limited to: 1. Journal of Experimental Psychology: General (Q1 Scopus) 2. Cognition (Q1 Scopus) 3. Journal of Experimental Psychology: Human Perception and Performance (Q1 Scopus) 4. Journal of Vision (Q1 Scopus) 5. Scientific Reports (Q1) 6. Attention, Perception, and Psychophysics (Q1 Scopus). We also plan to present our results at respected international conferences, such as: 1. Annual Vision Sciences Society Meeting (VSS). 2. European Conference on Visual Perception (ECVP). 3. Psychonomic Society Meeting.


 

REPORTS


Annotation of the results obtained in 2021
In 2021, we completed the pilot and main experiments of Series of experiments 8, the pilot experiments of Series of experiments 9 and 10, and an entirely new Series of experiments 12, which replaced the main experiments of Series of experiments 9. Series of experiments 8 aimed to investigate the representational basis of ensemble segmentation and selection. Three possible “candidates” for such a representational basis were tested: basic features (unique colors of a target group not shared with any other objects), “proto-objects” (feature “bundles” partly shared with other objects but combined together only in a target group), and bound objects (feature conjunctions where exact spatial relations between features and parts are important to distinguish the target group from distractors). In three experiments, we found high accuracy of orientation averaging and size averaging in target subsets when these target subsets were defined by a unique basic feature that none of the distractors had (for example, red targets among green, yellow, and blue distractors). In these cases, averaging accuracy was as good as in a baseline condition in which only the targets and no distractors were presented. Observers could also select a target subset and judge its average when target features were partially present in distractor subsets (e.g., red-green targets among yellow-green, red-blue, and yellow-blue distractors). However, accuracy was lower than in the baseline condition. Finally, when the target subset differed from distractors by exact object-based characteristics, that is, by an exact spatial conjunction (e.g., red-green targets among green-red, yellow-blue, and blue-yellow distractors), the accuracy of averaging was dramatically lower.
Overall, these results let us conclude that basic features and “proto-objects” (sparse representations of different features belonging to particular objects, yet not conveying information about the exact spatial structure of these objects) can be used as a representational basis for the segmentation of subsets. Series of experiments 9 aimed to establish a potential quantitative relationship between pop-out in visual search and so-called outlier discount in ensemble averaging as a function of the orientation difference between the outlier (or a visual search target) and the mean feature of the ensemble. We ran four pilot experiments based on methods described in previous works on outlier discount (Haberman & Whitney, 2010; Epstein et al., 2020). However, we failed to reliably replicate this phenomenon in these experiments. Hence, we decided not to run the main experiments of this series. In place of Series of experiments 9, a new study was implemented. Series of experiments 12 was designed to measure the systematic biases that observers make when they try to report the average orientation of sets with skewed feature distributions. Using an adjustment method, we found that responses were systematically biased away from the mean and toward the mode of a skewed distribution. The greater the distance between the mean and the mode, the stronger the bias. This effect was consistent across four experiments with various testing conditions (in-lab vs. online testing), various shapes of feature distributions, and various types of probe stimuli (a single object or an ensemble with different degrees of skewness). A biologically plausible model was suggested to account for this pattern. This model is based on fundamental and well-established mechanisms of visual system organization, such as population coding and spatial pooling of local neural responses by neurons with large receptive fields.
Our model showed high prediction accuracy in Monte Carlo computational simulations. The model parameters that provided the best fits to the data were very close to realistic tuning curves of the neurons involved, as reported in the electrophysiological literature. Moreover, we compared our model with another recent model with a completely different “engine” of the averaging process (Teng et al., 2021). Although the basic principles behind each model are different, they both accurately predicted systematic biases in the averaging of skewed distributions. However, our model was found to have some advantage due to its parsimony. Series of experiments 10 was designed to investigate the role of attention in basic ensemble computations such as averaging, using EEG markers of sensory processing. Our previous work (Utochkin et al., 2018) showed that rapid segmentation of ensembles into different categorical groups takes place at early stages of stimulus processing. In a subsequent study, it was also found that such early segmentation evokes the visual mismatch negativity (vMMN), a component of event-related potentials (ERP) usually related to automatic detection of change with little or no involvement of attentional processing. In our new study, we decided to probe the role of attention in the computation of an average ensemble feature using the ERP technique. Specifically, we are interested in the vMMN in response to a suprathreshold change of mean orientation in an ensemble under concurrent attentional load from a demanding central task. As per our plan for the year, in 2021 we developed and psychophysically validated the stimuli to be used in our main EEG experiments next year. We designed ensemble stimuli with thorough control for some low-level confounds, such as the occurrence of locally salient outliers or ensembles falling outside each other's feature range.
In a pilot study with these stimuli, we found that they support firmly suprathreshold discrimination of mean orientation, with performance falling in a reasonable range between chance level and perfect, error-free discrimination. We therefore conclude that these stimuli are suitable for further use in the main EEG experiments.
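The bias toward the mode reported for Series of experiments 12 can be illustrated with a deliberately simplified population-coding sketch. This is not the authors' model, and it does not reproduce their fitting or Monte Carlo procedure. The Gaussian tuning curves, the tuning width `tuning_sd`, the channel spacing `step`, the peak-of-activity readout, the linear (non-circular) treatment of orientation, and the example item values are all assumptions made for illustration.

```python
import math
from statistics import mean, mode

def pooled_peak(orientations, tuning_sd=15.0, step=0.5):
    """Preferred orientation of the most active channel after pooling.

    Each hypothetical channel has a Gaussian tuning curve of width tuning_sd
    and sums its responses to all items (a crude stand-in for spatial pooling).
    Orientation is treated as a linear axis, ignoring circularity.
    """
    lo, hi = min(orientations), max(orientations)
    n_channels = int(round((hi - lo) / step)) + 1
    best_pref, best_resp = lo, -1.0
    for i in range(n_channels):
        pref = lo + i * step
        # Pooled response of the channel tuned to `pref` across all items.
        resp = sum(math.exp(-0.5 * ((x - pref) / tuning_sd) ** 2)
                   for x in orientations)
        if resp > best_resp:
            best_pref, best_resp = pref, resp
    return best_pref

# A positively skewed set of orientations (degrees): most items cluster
# around 0, with a long tail toward larger values. Values are illustrative.
skewed = [-4, -2, 0, 0, 0, 2, 4, 10, 20, 35, 50]

set_mean = mean(skewed)       # arithmetic mean of the set
set_mode = mode(skewed)       # most frequent value
readout = pooled_peak(skewed) # peak of the pooled population response

print(f"mean={set_mean:.2f}, mode={set_mode}, pooled readout={readout}")
```

With these assumed parameters, the pooled readout falls between the mode and the arithmetic mean, that is, it is shifted away from the mean and toward the mode, in the same direction as the bias described above. The size of the shift depends on the assumed tuning width.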

 

Publications