It has often been pointed out that the proliferation of personal computers and inexpensive audio hardware and software has made it easy for anyone to create music. However, these systems focus on solo, offline production of digital music. In contrast, many people also enjoy the realtime production of analog ensemble music, as is made evident by the large number of community orchestras, choirs, and other ensembles in the world. Aside from being pleasurable for the participants, making music in these settings may help build musical skills that are not exercised in offline music production, such as improvisation, counting, reading, listening, and other general performance skills. More broadly, this type of music making may have academic, psychological, or other non-musical benefits [1], and might improve general cognitive skills [2]. Unfortunately, even the most amateur of these ensembles still have a relatively high barrier to entry: oftentimes participants must already play an instrument and read music with some proficiency, have the desire to do so in the company of others, and have the time to commit to regular rehearsals. These skills can be difficult to acquire, and modern technology has done little to bring this type of music making to the large number of people who do not possess them.

In order to address this, I have developed a robotic musical instrument, motivated by the desire to design a companion device that amateur musicians could have in their home, which would give them the benefits of playing in an ensemble with less hassle and a lower barrier to entry. The goal is not to replace human musicians with robots, but to use robots to fill niches that are not currently occupied by humans.
A companion robot that plays only preprogrammed music or only plays music in a particular style, without ever doing anything surprising, may quickly become boring, not keeping the human's interest long enough to foster the intended benefits. Consequently, a robotic companion would ideally listen and play along in a way that is responsive and appropriate to the stylistic features of the human's playing. What are these stylistic features, and how can a robot know how to use them? This leads to the broad statement of the research question presented in this dissertation:
How can musical robots learn to play by listening to humans play?
Musical robotics is, of course, an active field, and a great amount of work has already been done towards making such robots interactive in interesting ways, yet many outstanding questions remain. In order to refine the research question, it is useful to examine the broad open questions in the field as stated by some of its practitioners. They have undertaken the study of musical robotics to "study the human motor control from an engineering point of view ... to understand better the human-robot interaction from a musical point of view ... to create new ways of musical expression from a musical engineering point of view" [3], to attempt to address the problem that current robots "lack the ability to understand and process the emotional states of real humans and to develop and synthesize an emotional state and personality of their own" [4], "to facilitate meaningful musical interactions between humans and machines" [5], to study and aid in the development of "human acceptance of personal, social, and service robots" [6], "to have a musical robot perform on stage, reacting and improvising with a human musician in real-time" [7], and to "test the effects of embodiment, visual contact, and acoustic sound on musical synchronization and audience appreciation" [8].
A successful realization of each of these goals is likely to depend in some way upon close attention to the timbre of sound produced by the musical robots. For example, it has been shown that, for human percussionists, slight variations in motor control of the drumstick can result in variations in timbre that are perceptible to listeners [9], and that variations in timbre are consistent across performances [10], so an assessment of timbral production in a percussion robot would be key in demonstrating the degree of understanding of motor control mechanisms. It has also been shown that "timbre independently affects the perception of emotions in music" in a way that is "robust" [11], and that this effect might even be, in some cases, applicable across cultures [12]. This suggests that the quality of emotional interactions with musical machines could be enhanced by giving greater attention to the timbre of sound produced by those machines. Furthermore, human performance tends to contain pervasive subtle timbral nuances [13], so the degree of `meaningfulness' in a musical interaction with a machine, or its success as a live performer, may depend upon the machine's ability to produce subtle timbral variations. Consequently, I believe that timbre will also be important in the context that motivates me, i.e. engaging musical robot companions at home.
The role of timbre is not limited to abstruse research questions in an arcane branch of academia. As part of a study for this dissertation, I asked a variety of musicians "How would you describe the role of timbre in music?" Their responses were surprisingly emphatic and superlative. "Without timbre there is no music, so timbre is one of the, or maybe the most primary feature of music." "The role of timbre in music cannot be overstated. It is connected to memory and emotion and affect in ways that ... note, pitch, and duration [are not]." "It is super important. Composers are known just for being timbral masters ... so I would say it is a very huge part of music." "Timbre I would say would be an axis adjacent to [rhythm and pitch], at another 90 degrees, so a third dimension, and that is like the quality or type of sound which can be modulated in real time with pitch and rhythm to create a whole other dimension to music." "It plays a huge role in the music that I make ... electronic music is really in my opinion defined by timbre or the immediate control or manipulation of timbre." "In the music that I'm interested in it actually has a huge role."
The importance of timbre notwithstanding, very little explicit attention has been given to the timbre of musical robots. The few treatments which do exist shall be discussed at the relevant places throughout the body of this dissertation.
For now, let it suffice to notice that the word `timbre' comes from the Greek word for `drum', and for many drums, especially hand drums, timbre is the primary parameter that the player manipulates while playing (as opposed to pitch for the majority of orchestral instruments). Therefore, percussion robots in particular are specially suited to the study of timbre. Indeed, many mechanical percussionists have been built and evaluated. However, these robots almost exclusively employ wooden drumsticks or mallets mounted on a pivot and actuated by a spring-loaded solenoid or other linear actuator, and produce only a single, static timbre. The evaluations of these systems, rather than measuring timbre, usually measure physical quantities such as force, velocity and repetition rate. For example, Kapur et al. measured impact speed as a function of pulse-width for a variety of solenoid-based drumstick actuators [14]. Similarly, Velez et al. report the impact force at the end of a solenoid-actuated drumstick [5]. Weinberg and Driscoll report the repetition-rate and stroke length (important for providing visual musical cues to humans) of solenoid and linear induction motor actuated drumsticks [15]. McVay et al. report the fretting speed, fretting accuracy, plucking velocity and other measures of a guitar-like machine [16]. Long considers a large variety of percussion actuator designs and reports their latency, loudness, and repetition rates [17]. Although these studies provide important results, they do not address timbre.
Another participant (a percussionist) in the aforementioned study, addressing the role of timbre in music, discussed specifically the role of timbre in human versus machine percussion. "There is the old joke: drum machines have no soul. Old drum machines literally recorded whack on the snare, and you could record snap, snap, snap all day long on a snare on [beats] 2 and 4 but there is something different between 2 and 4 for a human drummer; a minute placement of the stick, maybe it didn't quite catch the rim each time..." He is suggesting that drum machines sound mechanical because they don't manipulate timbre in the way a human would. What are these differences in timbre between strokes, do humans use them systematically, and how can they be incorporated into a robot's musical model?
This leads to a more refined version of the previous research question:
How can musical robots learn to use timbre by listening to humans use timbre?
To address this, I have built a percussion robot named Kiki, shown in Figure 1, with the aim of directly studying robotic timbre. I have chosen the djembe because its playing technique consists almost exclusively of timbral manipulation, yet its technique is somewhat mechanically simpler and more manageable than that of other instruments for which this is true, such as tabla. In this endeavour, I hope to make small steps towards the larger questions in the field by laying the foundations for a more sophisticated use of timbre in musical robots.
The outcome of this dissertation shall therefore be a set of tools and methods for dealing with timbre in percussion robots, designed for realtime collaborative music-making between a human and a machine. The first set of tools shall pertain to the robotic production of timbre. These shall include an analysis of the dynamics of human drum playing, the timbral characteristics of striking mechanisms, and the dynamic and kinematic requirements of a robot that will dynamically produce a variety of timbres. The second set of tools shall pertain to the robotic analysis of timbre. These shall include realtime classification of a human interactor's timbre for drum-stroke transcription, and also robotic self-analysis of timbre so that the robot can assess and improve its own sound. The final set of tools shall focus on realtime interactive statistical learning of rhythms, where a rhythm is understood to be a distribution of timbres in time, specifically addressing challenges presented by the nontrivial timbral production and analysis tools. Additionally, this dissertation presents a deep historical perspective on robotic musical companionship, establishing humanity's long-standing desire for musical machines that imitate humans in form and function.
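To make the notion of "a rhythm as a distribution of timbres in time" concrete, the following is a minimal sketch in Python. It is an illustrative toy, not Kiki's actual implementation: the class name, the sixteen-subdivision cycle, and the three djembe stroke labels (`bass`, `tone`, `slap`) are all assumptions made for the example. Each position in a cyclic bar accumulates counts of observed timbre classes, from which the robot could sample strokes when playing back.

```python
import random
from collections import Counter

class RhythmModel:
    """A rhythm as a distribution of timbres in time: for each
    subdivision of a cyclic bar, keep counts of the timbre classes
    observed there (illustrative labels: 'bass', 'tone', 'slap')."""

    def __init__(self, subdivisions=16):
        self.subdivisions = subdivisions
        self.counts = [Counter() for _ in range(subdivisions)]

    def observe(self, subdivision, timbre):
        # Record one classified human stroke at a position in the cycle.
        self.counts[subdivision % self.subdivisions][timbre] += 1

    def distribution(self, subdivision):
        # Normalized timbre distribution at this position
        # (empty dict means no stroke has been observed here).
        c = self.counts[subdivision % self.subdivisions]
        total = sum(c.values())
        return {t: n / total for t, n in c.items()} if total else {}

def sample_stroke(model, subdivision, rng=random):
    """Draw a stroke for playback from the learned distribution,
    or return None (a rest) if nothing was observed at this position."""
    dist = model.distribution(subdivision)
    if not dist:
        return None
    timbres, weights = zip(*dist.items())
    return rng.choices(timbres, weights=weights)[0]
```

For example, after observing a human play `bass` twice on the downbeat and `slap` once on the third subdivision of a four-subdivision cycle, sampling subdivision 0 always yields `bass`, subdivision 2 always yields `slap`, and the remaining subdivisions yield rests. Of course, the real system must first solve the nontrivial problems this chapter outlines: classifying each incoming stroke's timbre in realtime and producing the sampled timbre mechanically.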