Music is a part of human culture and human cognition that serves no apparent survival purposes. However, humans have evolved to play and understand music as a faculty in parallel to other cognitive functions, and have today made it into a huge entertainment and communication industry (Perlovsky, 2017). Music can be separated into several major components, mainly melody, rhythm, and harmony. Rhythm can make the listener experience several sensations, such as pleasure, relaxation, excitation, and various emotional reactions (Juslin, 2010). An important experience that is strongly related to rhythm is groove, defined as “wanting to move some part of the body in relation of some aspect of the sound pattern” (Madison, 2006). A very interesting question, for musicians in particular, is what it is in the musical stimuli that induces the sensation of groove. A number of quantifiable properties of the musical signal have experimentally been shown to elicit the sensation of groove, a common aspect of which seems to be the predictability of events (Davies, Madison, Silva, & Gouyon, 2013; Frühauf, Kopiez, & Platz, 2013; Madison, 2014a, 2014b; Madison, Gouyon, Ullén, & Hörnström, 2011; Madison, Ullén, & Merker, 2017; Madison & Sioros, 2014; Ravignani & Madison, 2017; Sioros, et al., 2014). There are several rhythmic effects that musicians can incorporate to make music more pleasurable, and to better express themselves.
To understand groove better, Janata, Tomic, and Haberman (2012) administered multiple surveys that included statements generated on the basis of the general intuition of the factors that contribute to experiencing groove based on definitions that the participants provided and descriptive phrases that the researchers believed would be associated with the concept of groove to varying degrees. Participants answered using an 8-point scale and the results showed that participants perceived groove as an aspect of music that compelled them to move and was regarded as pleasurable. Then, they tested if the groove perceived by participants is affected by genre, tempo, and familiarity using 148 musical excerpts, from various music genres like jazz, rock, soul, and folk. From the 148 music excerpts, 20 music excerpts were drum loops. The results showed that groove ratings were strongly affected by the genre and the tempo of the excerpts, with tendencies for soul/R&B and faster tempi to elicit higher groove ratings. Finally, they investigated how participants experience groove during sensorimotor synchronization. For that, they took 48 stimuli from the previous study, which they divided into three categories of stimuli with low, medium, and high-groove ratings. They noticed that when they asked the participants to perform the tapping task, the overall task enjoyment was reduced. Even when they asked participants to tap as they liked, this did not lead to greater enjoyment. The general conclusion was that “groove is that aspect of music that induces a pleasant sense of wanting to move along with the music” (p. 56).
One way for musicians to manipulate the recurring pulses, and thereby create a more complex musical structure, is syncopation. According to music theory, syncopation is a temporary displacement of the regular metrical accent in music, typically caused by stressing the weak beat (Rutherford-Johnson, 2013). The effect of syncopation on groove perception has been investigated using several different methods and stimuli, such as self-report with Likert-type scales and motor responses, including head bobbing, synchronized tapping, and free movement to the rhythm (Davies et al., 2013; Janata et al., 2012; Madison & Sioros, 2014; Sioros et al., 2014; Vuust et al., 2006; Vuust et al., 2011; Witek et al., 2017).
Sioros, Miron, Davies, Gouyon, and Madison (2014) wanted to see whether syncopation induces groove. In the first experiment, they used simple piano melodies that had a clear metrical structure. Seven of the twelve melodies were more complicated in structure, while the other five were composed for the purposes of the experiment. To introduce syncopation, they applied a transformation based on the principle of the “improper binding” of notes articulated in weak metrical positions to missing notes in the following strong position. They created two sets of two transformations, based on the rhythmical subdivisions they used: one used eighth notes and the other used sixteenth notes. Volunteers in this part of the experiment rated the stimuli on three scales: familiarity, preference, and groove (how movement-inducing they considered the stimuli to be, as defined above). The results showed that the simple melodies had lower ratings on all three scales, and that their ratings increased more than those of the complex melodies after the transformations were applied. The differences between the transformed versions of the simple and complex melodies were small. In the second experiment, they wanted to test whether any transformation that introduces faster metrical levels would also lead to higher ratings, or whether it is syncopation specifically that causes higher ratings. For this reason, they took ten of the stimuli from the first experiment and, in addition to the previous transformations that introduced syncopation, applied another set of transformations that did not introduce syncopation. In addition, they tested whether the strength of the syncopation had any effect on groove ratings. The rating scales were identical to those from the first experiment, and the results showed that the highest ratings were in fact related to syncopation overall, while there were no significant differences between the different levels of syncopation strength.
In a similar fashion, Witek et al. (2015) asked participants to listen to 50 computer-generated drum beats at 120 BPM that varied in syncopation density. The syncopation density was defined using an index of syncopation based on Longuet-Higgins and Lee (1984). Out of the 50 drum beats, 34 were transcribed from real funk tracks, two were included in a preconfigured sound pack in the software used to synthesize the stimuli, and the remaining 14 were constructed specifically for the task, in order to cover the whole syncopation spectrum in terms of difficulty and abstraction of recurring pulses. The results showed an inverted U-shaped relationship between the degree of syncopation and the participants’ ratings. Music examples with lower and higher syncopation density received lower ratings than the medium ones, meaning that participants did not want to move their body as much when the drum beats had too much or too little syncopation.
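To make the idea of a syncopation index more concrete, the following is a simplified sketch in the spirit of Longuet-Higgins and Lee (1984). It is not the index used by Witek et al. (2015), whose exact formulation is not reproduced here. The sketch assumes a single looping bar on a binary grid (a power-of-two number of positions) and counts a syncopation whenever an onset is followed, before the next onset, by a silent position of greater metrical weight. The function names and the weighting scheme are illustrative assumptions.

```python
def metrical_weights(n_positions):
    """Weights for a binary grid: 0 on the downbeat, more negative on weaker positions.

    Assumes n_positions is a power of two (e.g. 8 or 16).
    """
    levels = n_positions.bit_length() - 1
    weights = [0]  # the downbeat is the strongest position
    for pos in range(1, n_positions):
        strength = 0
        # Count how many times the position can be halved evenly.
        while pos % (2 ** (strength + 1)) == 0:
            strength += 1
        weights.append(strength - levels)
    return weights


def syncopation_score(pattern):
    """Sum of syncopation strengths for a looping bar of 0/1 onsets."""
    n = len(pattern)
    weights = metrical_weights(n)
    total = 0
    for i, onset in enumerate(pattern):
        if not onset:
            continue
        # Find the strongest silent position between this onset and the next one.
        strongest_rest = None
        j = (i + 1) % n
        while j != i and not pattern[j]:
            if strongest_rest is None or weights[j] > strongest_rest:
                strongest_rest = weights[j]
            j = (j + 1) % n
        # A syncopation occurs when that silence is metrically stronger than the onset.
        if strongest_rest is not None and strongest_rest > weights[i]:
            total += strongest_rest - weights[i]
    return total


on_beat = [1, 0, 0, 0, 1, 0, 0, 0]       # onsets only on strong positions
off_beat = [1, 0, 0, 1, 0, 0, 1, 0]      # a 3+3+2 pattern with off-beat onsets
print(syncopation_score(on_beat), syncopation_score(off_beat))
```

Under these assumptions, a pattern with onsets only on strong positions scores zero, and the score grows as more onsets fall on weak positions that are followed by silent strong positions.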
Another study by Witek et al. (2017) examined how body and hand movements are affected by syncopation. To achieve this, they asked participants to listen to 15 synthesized drum-breaks, which were categorized as having a low, medium, or high degree of syncopation. Accelerometer data from the motion sensors inside two Wii controllers were used to capture participants’ movements, one controller being strapped to the lower back and the other held in the right hand. The results concerning participants’ experience of groove were similar to those of the other Witek study mentioned above. However, the data collected from the lower back and the right hand showed that participants moved the least in the examples with high syncopation. For low and medium amounts of syncopation there was no significant difference between the lower back and the right hand, suggesting that a medium amount of syncopation provides a balance between the complexity and the predictability of the sound stimulus. Participants did not have the desire to move, and did not actually move, while listening to the high-syncopation stimuli. The study concluded that participants became less synchronized as the amount of syncopation rose, but that a medium amount of syncopation elicits the strongest feeling of wanting to move the body, as well as the most actual body movement.
A main theme amongst these different results is that very little and very much syncopation induce less groove than does a medium level of syncopation. Obviously, syncopes represent faster metrical levels, and convey thereby more precise temporal information (Madison, 2014a). But it is unclear why representing several metrical levels continuously does not induce at least as much groove, as was found by Sioros et al. (2014). It has been suggested that this is because syncopes convey the same essential information with considerably less information flow through the perceptual systems, as different metrical levels reinforce each other and provide redundant representations of time. This works because the metrical space can be conceived as a hierarchical temporal structure, in which higher levels of longer intervals are superordinate to lower levels of shorter intervals. If manifest levels incur a cost for the perceptual processing, sparse representations in the guise of syncopes may be preferred (Madison et al., 2017).
What mechanisms might account for the empirical findings mentioned above? Numerous theoretical models have been advanced to explain why musical structures are the way they are. Here, I will mention just a few that seem to me to have a bearing on the relation between rhythmic structures and the perception of groove. The first is Connectionism (Feldman & Ballard, 2010), and the second is the “A-not-B error”, first introduced by Piaget (2006) as part of his theory of cognitive development and more recently reapproached by Smith and Thelen (2003) in terms of dynamical systems theory. According to Connectionism, perception works similarly to a computer network, with nodes and connections between the nodes, which, in the brain, correspond to neurons and the synapses between them. This creates a network consisting of an input level, where the stimulus enters the system, an output level, where a decision or an evaluation of the stimulus exits the system, and one or more hidden levels. The operation of the system is determined by the different weights of the connections, which are typically established through iterated feedback learning in relation to outcome criteria (Todd & Loy, 1991).
Piaget’s “A-not-B error” (2006) has been treated as a dynamic system by Smith and Thelen (2003), who used a computer program to simulate behavioural development in 10- and 12-month-old infants in the classical “A-not-B” task. Whereas a computer program sends information in the form of feedback to correct the output result, or behaviour, human behaviour takes in every component of human interaction, such as a smile, a social interaction, or a reach, and stores and processes it over time. All these different inputs work as corrections to the system, which then adapts its reactions accordingly. In the case of groove, the feedback given to the system could take the form of clapping along to music as an infant, with the system adapting so that more pleasure is received when the person is capable of synchronizing with the music that she is listening to.
The most promising model for explaining the pattern of results, I suggest, is Predictive Coding. Importantly, it seems to be able both to explain why groove is based on predictability and to integrate the results into a comprehensible model that covers both simple rhythmic patterns and real music (Clark, 2013; Friston, 2002; 2005). According to this framework, human cognition and perception are divided into several hierarchical levels. The lowest level is the incoming stimulus level, whether the stimulus is visual, auditory, haptic, or olfactory. The higher levels of the model carry out the highest level of processing and create several alternative hypotheses to explain what the incoming stimulus is. By sending those hypotheses, or predictions, to the lower level, the higher level receives prediction errors, which are then used either to correct the hypothesis or to create a new one, in order to create and maintain an accurate representation of the surrounding environment. The model achieves this by applying Bayes’ rule of probability recursively from level to level in nested neural networks (Vuust & Witek, 2014). By using the equation p(a|b) = p(b|a)*p(a)/p(b), where b is the input and a is the hypothesis, at every level of the network, a nested and hierarchical link is created across the brain. Hypotheses that have successfully predicted an incoming stimulus work as prior predictions for the next similar incoming stimulus.
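The way a successful hypothesis becomes the prior for the next stimulus can be illustrated with a small numerical sketch of Bayes’ rule. The two hypotheses (“duple” and “triple” metre) and all probability values below are invented for illustration only; they are not estimates from any study, and a single flat update like this is far simpler than the nested hierarchy described above.

```python
def bayes_update(prior, likelihood):
    """Return the posterior p(a|b) for each hypothesis a, given the input b.

    prior:      dict mapping hypothesis -> p(a)
    likelihood: dict mapping hypothesis -> p(b|a)
    """
    # p(b) is obtained by summing p(b|a) * p(a) over all hypotheses.
    evidence = sum(likelihood[h] * prior[h] for h in prior)
    return {h: likelihood[h] * prior[h] / evidence for h in prior}


# Hypothetical prior: the listener expects duple metre more than triple metre.
prior = {"duple": 0.7, "triple": 0.3}
# Hypothetical likelihood of one ambiguous sound event under each hypothesis.
likelihood = {"duple": 0.5, "triple": 0.4}

# Each posterior serves as the prior for the next, similar sound event.
posterior = prior
for _ in range(3):
    posterior = bayes_update(posterior, likelihood)

print(posterior)
```

With these invented numbers, the input only weakly favours the duple hypothesis, yet repeated updating steadily strengthens it, because each posterior is carried forward as the next prior.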
This creates a network that works both bottom-up and top-down. It is bottom-up in the sense that the input comes from the sensory level and is fed to the higher level and it is top-down in the sense that the prediction is created on the higher level and is fed down to the early sensory processing substrates. Both these processes are mutually dependent. The top-down processes provide the brain with context-sensitive ways of selecting the most appropriate interpretation of the incoming stimulus. The network remains constantly updated in order to create a causal role between predictions and environmental events, and to be able to maximize the accuracy of the predictions for the sensory input and minimize prediction error. This can be done by using the previous accurate predictions, called priors, to create the initial hypotheses. The veracity of this model can intuitively be appreciated by considering the power by which we are taken in by illusions. For example, the visual modality features the Rubin vase, the Necker cube, and Escher’s famous art, and the auditory modality offers infinitely changing pitch (Shepard, 1964) and tempo (Madison, 2009).
The main principles of Predictive Coding seem to apply to music cognition in general, and to rhythm perception in particular. For example, Brochard et al. (2003) performed a very simple experiment in which the participants were asked to listen to a metronome beat that did not differentiate between strong and weak beats. Participants nevertheless differentiated the beats into a strong-weak pattern, suggesting a duple meter, which is the most common meter in Western music. This simple experiment demonstrates the role of priors: in this case, the brain expected to hear a duple meter, and the actual stimuli did not refute this hypothesis strongly enough for it to be abandoned through the top-down/bottom-up process. Returning to groove, I suggest that at least certain aspects of groove can be explained by the Predictive Coding theory.
An isochronous beat is conceivably the most fundamental percept in music, as it forms the temporal grid that organizes all sound events, both in time in general and in their order and hierarchical structure (Madison et al., 2017; Ravignani & Madison, 2017). It should be emphasized that “there can only be one”, i.e. that the brain structures responsible for creating this structure can only represent one tempo at a time. As mentioned above, even an isochronous sound sequence can be interpreted at double and half the manifest tempo, for example by tapping every other beat or tapping twice per beat, and a typical musical signal is immensely more complex and open to a multitude of interpretations. In other words, a substantial difference between the actual musical signal and the simple beat model is the natural state of affairs when listening to music. From a Predictive Coding standpoint, the prediction error forwarded upwards in the system is small when the amount of syncopation is low. It will not change the hypothesis, because there is little difference between the perceived beat and the actual metre of the musical structure contained in the auditory signal. In the case of a high amount of syncopation, the input is very complex, and hence the amount of prediction error is so high that it causes the predicted metrical model either to collapse or not to be created at all. When the amount of syncopation is between these extremes, however, the metric modulations are large enough to produce prediction errors while still allowing the perceptual system to form a predictive model. Thus the many differences between the input and the model create a rich flow of information that successfully updates the model. Perhaps this state is inherently pleasurable, as it signals that what the brain is doing is working, at the same time as it is solving a relatively demanding problem. In doing so, it is effectively enabling synchronized movement. In other words, the prediction error that goes from the lower levels of the brain to the higher levels would ultimately create the feeling of pleasure and the desire to move (Vuust & Witek, 2014).
There are basically three techniques for examining Predictive Coding in the context of rhythm and groove: brain imaging (mostly fMRI), self-report, and, in the case of rhythm perception, motor responses. Motor responses combined with brain imaging methods were used in Vuust et al. (2006; 2011), but only for polyrhythms. The point of using polyrhythms is that they provide the listener with a bistable percept, meaning that the listener is presented with an auditory stimulus that contains more than one higher-level representation, both of which are correct but cannot be perceived simultaneously. The results from both these studies showed activation in brain areas that are associated with language processing.
There have not been any studies, except the one by Witek et al. (2017) mentioned earlier, that use motor responses and syncopated grooves to evaluate the Predictive Coding model. The previous studies employed the self-report method, through Likert-type scales. As mentioned previously, the study by Witek et al. (2017) indicated that a medium amount of syncopation yielded the highest ratings of liking and of wanting to move the body, while the results from the free movement showed that participants did not move to the stimuli with a high amount of syncopation. According to Predictive Coding, this is because with a low amount of syncopation there is not much incongruence between the input and the predictive perceptual model, and with a high amount there is too much incongruence.
Another topic that has not been investigated much concerns the type of stimuli. In almost all of the studies mentioned earlier, the stimuli were either extracted from real music, synthesized using some sort of software, or created by the researchers and then recorded by musicians. This presents a problem of bias, inasmuch as listeners in Western culture share approximately the same knowledge and listen to music that is created using the same basic principles, all of which constitute priors. This leads to the question of how much listeners will appreciate, and experience groove from, a sound sequence, whether it is a drum beat or a melody, when the prior knowledge is reduced as much as possible or even eliminated. What is the extent of this prior knowledge? One way to eliminate it is to apply some kind of random generation of sound sequences. In the present case there is a particularly straightforward way to do that. Starting with a sound pattern that contains multiple metrical levels (Madison, 2009), deleting any sound event amounts to a syncope. Randomly omitting sounds in the metrical structure of computer-generated rhythms is “theory-free”, especially when the program does not recognize the boundaries and restrictions of music theory. Specifically, a computer program generates a sound for every point in a time sequence, distinguishing higher metrical levels only by higher loudness (Madison, 2009). Thus, each level represents a note value, such as quarter notes, eighth notes, and so on. Depending on the percentage of all the sounds that should be audible, the program randomly omits sounds from any of the levels to create a unique and syncopated rhythm.
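The generation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program actually used: it assumes a binary subdivision in which each metrical level halves the interval of the level above, a linear mapping from level to loudness, and hypothetical names and default values (generate_pattern, n_levels, n_cycles, audible).

```python
import random


def metrical_level(position, n_levels):
    """Return the metrical level of a grid position (0 = fastest level)."""
    level = 0
    # A position moves up one level each time it is divisible by the next power of two.
    while level < n_levels - 1 and position % (2 ** (level + 1)) == 0:
        level += 1
    return level


def generate_pattern(n_levels=4, n_cycles=2, audible=0.6, seed=None):
    """Return a list of loudness values (0.0 = omitted sound) for each grid position.

    audible is the proportion of grid positions that remain sounding.
    """
    rng = random.Random(seed)
    cycle_length = 2 ** (n_levels - 1)
    n_positions = cycle_length * n_cycles

    # Full pattern: one sound per position, louder on higher metrical levels.
    full = []
    for pos in range(n_positions):
        level = metrical_level(pos % cycle_length, n_levels)
        full.append((level + 1) / n_levels)

    # Randomly choose which positions stay audible, regardless of their level.
    n_keep = round(audible * n_positions)
    kept = set(rng.sample(range(n_positions), n_keep))
    return [loud if i in kept else 0.0 for i, loud in enumerate(full)]


print(generate_pattern(audible=0.5, seed=1))
```

Because the omission step ignores the metrical level of each position, strong and weak positions are equally likely to be silenced, which is what makes the resulting syncopation independent of music-theoretical conventions in this sketch.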
To test this hypothesis, I designed a listening and tapping experiment with random computerized beats that contain different amounts of syncopation, created by randomly omitting several sounds in each level, and that have minor differences in tempo. Participants will rate their experience using four Likert-type scales containing statements that describe different aspects of groove, such as whether they liked what they heard, whether the sounds made them move, and whether they felt confused by the sounds. For the tapping task, participants will have to synchronize with a specific level of sound that is distinguishable from the other sounds.
With the ratings, I intend to measure the degree to which the beats induce groove, while the tapping task is intended to assess the ability to synchronize with the signal as a function of the amount of information. Since participants will have to do both tasks at the same time, I am also interested to see whether their perception of groove is associated with their synchronization success.
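Synchronization success in a tapping task is commonly summarized by the mean asynchrony between taps and target sounds and by the variability of those asynchronies. The following sketch shows one simple way such measures could be computed; the matching of each tap to its nearest target sound, the function name, and the example times are assumptions made for illustration and do not describe the analysis of the present experiment.

```python
import statistics


def tapping_measures(tap_times, beat_times):
    """Return (mean asynchrony, standard deviation of asynchronies) in seconds.

    Each tap is matched to the nearest target sound; negative asynchronies
    mean that the tap preceded the sound. Requires at least two taps.
    """
    asynchronies = []
    for tap in tap_times:
        nearest = min(beat_times, key=lambda beat: abs(beat - tap))
        asynchronies.append(tap - nearest)
    return statistics.mean(asynchronies), statistics.stdev(asynchronies)


# Hypothetical target sounds every 500 ms and taps that slightly anticipate them.
beats = [0.0, 0.5, 1.0, 1.5]
taps = [-0.02, 0.48, 0.97, 1.49]
mean_async, sd_async = tapping_measures(taps, beats)
print(mean_async, sd_async)
```

A smaller standard deviation indicates more stable tapping, while the sign of the mean asynchrony indicates whether taps tend to lead or lag the target sounds.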
I hypothesize that, because these randomized beats are not similar to beats transcribed from real music or to beats that have input from someone with knowledge of music theory, participants’ groove sensation will decrease as syncopation increases. The studies mentioned above have shown that the sensation of groove is affected by the amount of syncopation, but these stimuli, being “theory-free”, will most probably be treated as novel by the participants. Therefore, participants will show a preference for the ones containing the most information. At the same time, I hypothesize that, as the amount of syncopation decreases and fewer sounds are omitted from the sound sequence, participants will be better able to tap along with the beats.
If these hypotheses are supported, it would seem that, for these beats, synchronization variability and asynchrony decrease with more manifest levels in the stimulus. Syncopation will be negatively associated with groove. If the hypotheses are rejected, then the results will be in accordance with the studies mentioned above, where perception of groove is positively correlated with syncopation.