This dissertation has focused on timbre in Kiki. The most obvious extension of this work would be to apply timbral learning to more musical robots playing different types of instruments, especially pitched instruments. The methods could also be extended to features of musical sound other than timbre. Can musical robots learn how to use `playfulness' in music by listening to humans play? Or loudness, for that matter: how do musicians know when it is appropriate, in an ensemble setting, to play loudly? In some situations it is appropriate to mimic the loudness of the other members of the ensemble, while in others it is appropriate to oppose it. The techniques presented here could, at least broadly speaking, be extended to allow a robot to learn to distinguish those situations by listening.
More generally, this dissertation has presented a broad approach to interactive machine learning in musical contexts. The approach involves learning to transcribe what the human is doing at the level of information rather than data, learning to mimic specific instances of behaviour, and modelling how those instances are to be used in context. This approach could be extended to other types of musical interaction between human and machine. For example, in a responsive dance environment, a computer could be trained to recognize meaningful (as defined by the dancer) gestures. Given an appropriate map from the gesture space to a sonic space, the computer could learn to produce sounds of interest to the dancer, and then learn to assemble those sounds in sequence to form an improvised musical composition in response to an improvised dance. If the map from gesture to sound used affective labels as an intermediate step, the computer would learn in which contexts certain affective gestures are used, and would respond with music containing an appropriate affect without simply mimicking the dancer. The computer would learn all of this by observing the dancer's movement.
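To make the three-stage pipeline concrete, the following is a minimal sketch applied to the hypothetical dance scenario. Everything in it is illustrative: the class names, the toy feature vectors, and the placeholder sound files are assumptions made for exposition, and the learning methods used (nearest-neighbour transcription, instance recall, a bigram context model) are simple stand-ins for whatever models a real system would employ; none of this is drawn from Kiki's implementation.

import math
import random
from collections import Counter, defaultdict

class GestureTranscriber:
    """Stage 1: transcribe raw motion features into information --
    here, a nearest-neighbour lookup of dancer-defined gesture labels."""

    def __init__(self):
        self.examples = []  # (feature vector, label) pairs

    def fit(self, features, labels):
        self.examples = list(zip(features, labels))

    def transcribe(self, feature_vector):
        nearest = min(self.examples,
                      key=lambda ex: math.dist(ex[0], feature_vector))
        return nearest[1]

class Mimic:
    """Stage 2: mimic specific instances -- remember which sounds were
    paired with each gesture label, and reproduce one of them."""

    def __init__(self):
        self.sounds = defaultdict(list)

    def fit(self, labels, sounds):
        for label, sound in zip(labels, sounds):
            self.sounds[label].append(sound)

    def render(self, label):
        return random.choice(self.sounds[label])

class ContextModel:
    """Stage 3: model how instances are used in context -- a bigram model
    over gesture labels, so the response reflects what tends to follow a
    gesture rather than merely echoing it."""

    def __init__(self):
        self.transitions = defaultdict(Counter)

    def fit(self, label_sequences):
        for seq in label_sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.transitions[prev][nxt] += 1

    def respond(self, label):
        following = self.transitions[label]
        return following.most_common(1)[0][0] if following else label

# Toy usage: two dancer-defined gestures, each paired with a sound file.
transcriber = GestureTranscriber()
transcriber.fit([[0.9, 0.1], [0.1, 0.8]], ["agitated", "calm"])

mimic = Mimic()
mimic.fit(["agitated", "calm"], ["noise_burst.wav", "drone.wav"])

context = ContextModel()
context.fit([["agitated", "calm", "calm"], ["calm", "calm", "agitated"]])

observed = transcriber.transcribe([0.85, 0.15])  # -> "agitated"
response = context.respond(observed)             # -> "calm" (learned in context)
print(mimic.render(response))                    # selects "drone.wav"

The structural point, rather than any particular model, is the separation of concerns: transcription produces information, mimicry reproduces instances, and the context model decides which instance is appropriate, which is why the response need not simply echo the input.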
This dissertation has presented a set of tools for dealing with timbre in musical robots, with the aim of making them more engaging for amateur musicians at home. The assumption was that this would inspire musicians to play more at home, and that this would in turn have other musical and extramusical benefits. In the future, the efficacy of these tools for this purpose should be evaluated directly. Will musicians play more frequently, or for longer periods of time, with a robot that models timbre than with one that does not? Would this cause them to become better at keeping time while they play? Would it accelerate the rehabilitation of an atrophied arm? Would children playing with such a robot experience increased long-term cognitive benefits?
Assuming such benefits are found to exist, a more important future question is how robots can be used to ensure that these benefits are distributed equitably in society. New technologies are often expensive and therefore benefit only those who are privileged enough to afford them. The danger in a musical robot that could increase cognitive abilities in children is that only wealthy schools would buy it, thereby deepening economic disparities for posterity. On the other hand, a musical robot that is purchased only once would be far less expensive than an entire music program at a public school, which requires continuous funding. This raises the possibility that musical robots, in contrast to human ensembles, could lower the barrier to these benefits and make them accessible to more people. A school that cannot afford a music program might afford a robot. This would be consistent with the motivation stated in the opening paragraph of this dissertation: to use robots to fill niches that human musicians do not currently occupy. Thus, if Kiki or a similar robot were ever commercialized, the most important work to be done would be to develop a strategy for ensuring that it becomes not a toy for the rich, but a tool for widening access to the associated benefits.