Going back to at least 1960, computational technologies have been integrated into practices of creative expression, particularly concerning the graphic arts, while also providing scaffolding for new possibilities of practice.1 Now, AI technologies (here understood as the implementation of machine learning techniques in distributed systems) threaten to drastically flatten this expressive potential, turning from a generative or at least integrative to a transcriptive paradigm—from exploration of new modes of aesthetic production to the transactional input of natural language translated via the latent space of diffusion models into arbitrary output formats. This flattening is most clearly observable in the image domain, in the homogenising “platform realism” of AI-generated imagery flooding all media.2
On the near horizon, the spectre of generative AI models trained on AI-generated slop promises even further flattening of the space of creative expression.3
While text and image modalities dominate these discussions, the audio domain provides perhaps the most stark examples considering the fundamentally temporal, situated and embodied paradigm of instrumentation, composition, or performance.
A pre-ChatGPT era artifact such as Wekinator, while admittedly also constrained to input and output formats, called for an integrated practice of expression: a consideration of value mappings and function activations tied to tacit knowledge and to a particular subjective aesthetic that can be elaborated through reflective engagement.4 Subsequent usage of AI technologies in “music generation systems”5 is widespread, with plugins, online tools, or dedicated software packages being integrated into practice with varying degrees of reflection.6 However, as Jourdan and Caramiaux note, “the use of deep learning has enabled more complex interactions to be created but has, at the same time, reduced user agency over the models.”7
The post-ChatGPT scenario sees the transcriptive interface paradigm definitively entering the audio domain, with products such as Suno or Udio serving prompt-based audio synthesis. Yet, similar to the text and image domains, synthesis may be a misleading word choice. The core of the matter is that such transcriptive interfaces produce, or perhaps machine, a closed whole, a perfect circle divorced from the integrated interrogation of materials that marks creative expression practices.
In this paper, I present, reflect on and discuss the implications behind the research-through-design of an interval synthesizer that, while based on a neural network, does not afford an input/output-type interaction.8
While it, therefore, may not be a 'good' instrument (fitting as I am neither a musician nor an instrument maker), I will argue that it supports the articulation of design opportunities beyond the logic of the transcriptive interface. I choose this approach of thinking about aesthetics through making an instrument on the consideration that creative expression in the auditory domain has a distinct advantage over others, as it fundamentally thrives on situatedness, uncertainty, ambiguity, coincidence and serendipity in composition and performance alike. This is especially clearly outlined by Hamilton, who proposes that there is an “aesthetics of imperfection” that drives auditory experience, especially in the performance and experimentation of music, whether improvised or not.9
It is my contention that the transcriptive interface offered by natural language generative AI technologies does not allow for this aesthetic potential of imperfection, and that therefore, beyond questions of skill or taste, this interface paradigm is lacking in terms of creative expression on a deeply fundamental level. The artifact I present in the following is therefore not, at root, the proposal of a specific instrument. Rather, it embodies the proposition that, to go beyond the transcriptive interface paradigm, design needs to locate opportunities for imperfection in the operations of AI technologies.
The presented artifact is a web-based graphical interface, affording key-based interaction with a visualization of a neural network’s loss landscape in an orthographic projection.10 The prototype is separately accessible as the Entoptic Interval Synthesizer.
Using WASD keys, the loss landscape can be navigated and the view changed. Pressing the spacebar starts or stops the playback of interval pairs keyed to the particular location on the loss landscape. There are further options for simple audio configuration (Q = waveform, E = BPM) and triggering a random walk algorithm (R).
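The random walk triggered by R can be pictured with a minimal sketch. It is written in Python for consistency with the training pipeline, although the prototype itself runs in p5.js. The grid size, the four-direction step set, the clamping at the mesh edges and the function name `random_walk` are all assumptions made for illustration, not a description of the prototype's actual algorithm.

```python
import random


def random_walk(start, steps, width, height, seed=None):
    """Return the list of visited (col, row) cells, beginning at `start`."""
    rng = random.Random(seed)
    # Four unit moves, analogous to stepping with the W, A, S and D keys.
    moves = [(0, -1), (0, 1), (-1, 0), (1, 0)]
    col, row = start
    path = [(col, row)]
    for _ in range(steps):
        d_col, d_row = rng.choice(moves)
        # Clamp to the mesh so the walk never leaves the landscape (assumed).
        col = min(max(col + d_col, 0), width - 1)
        row = min(max(row + d_row, 0), height - 1)
        path.append((col, row))
    return path


# Hypothetical 10 x 10 mesh; each visited cell would key one interval pair.
path = random_walk(start=(5, 5), steps=16, width=10, height=10, seed=303)
```

Each cell in `path` would then select the interval pair to be played on the next beat, so the walk stands in for the hand on the keyboard.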
Besides information on interaction and the central loss landscape, the interface design contains three responsive information designs. Top right, the current interval pair is displayed in a large font size, with an interconnecting typographic element that corresponds to the degree of loss for the respective interval prediction (from “||” for lowest 20% loss to “/\” for highest 20% loss). Below, the audio configuration for BPM and waveform is displayed before another typographic element indicates the state of the random walk algorithm (alternating between “/” and “\”).
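One way to read the loss-dependent connector is as a lookup over five loss bands. The sketch below assumes quintile bands computed from all stored loss values. Only the first and last glyphs are taken from the description above; the three intermediate glyphs and the names `CONNECTORS`, `quintile_edges` and `connector_for` are hypothetical placeholders.

```python
from bisect import bisect_right

# "||" (lowest 20%) and "/\" (highest 20%) come from the text; the three
# glyphs in between are hypothetical stand-ins for the intermediate bands.
CONNECTORS = ["||", "|/", "//", "/|", "/\\"]


def quintile_edges(losses):
    """Return the four loss values that split `losses` into five equal bands."""
    ordered = sorted(losses)
    n = len(ordered)
    return [ordered[min(n - 1, (n * k) // 5)] for k in range(1, 5)]


def connector_for(loss, edges):
    """Pick the glyph for the band that `loss` falls into."""
    return CONNECTORS[bisect_right(edges, loss)]


# Ten made-up loss values; two of them land in each band.
losses = [i / 10 for i in range(10)]
edges = quintile_edges(losses)
glyph_low = connector_for(losses[0], edges)
glyph_high = connector_for(losses[9], edges)
```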
Technically, the loss landscape visualizes a loss function in three dimensions rather than the typical two. Roughly, loss is the computed mismatch between training dataset and prediction as a neural network architecture converges towards a model of input-output relationships. In this case, the architecture is a multilayer perceptron (MLP, adapted class from PyTorch) with a specific hyperparameter configuration (see code snippet further below). During training, a custom dataset class stores loss values of individual predictions per training epoch. The predictions are MIDI frequencies for the second note in an interval pair, given the first note, key, octave, and interval type (e.g., unison, minor 2nd, major 2nd, etc.). On completion of training, the stored loss values of predictions are correlated with the loss values of training as represented on the y-axis of the loss landscape, and both are stored together as a JSON file. The dataset and training pipeline are created in Python, while the web-based prototype is implemented using p5.js.11
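To make the bookkeeping concrete, the following sketch builds a small interval dataset and records the loss of every individual prediction per epoch. It is a simplification under stated assumptions: a three-parameter linear model trained by plain per-sample gradient descent stands in for the PyTorch MLP, the key feature is omitted, notes are encoded as MIDI note numbers (C4 = 60, A4 = 69 at 440 Hz), and all function and field names are hypothetical.

```python
import json

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Semitone sizes of the interval types named in the text, plus two more.
INTERVALS = {"unison": 0, "minor 2nd": 1, "major 2nd": 2,
             "minor 3rd": 3, "major 3rd": 4}


def midi_number(name, octave):
    """MIDI note number under the common convention C4 = 60, A4 = 69."""
    return 12 * (octave + 1) + PITCH_CLASSES.index(name)


def frequency_hz(midi):
    """Equal-tempered frequency with A4 tuned to 440 Hz."""
    return 440.0 * 2 ** ((midi - 69) / 12)


def build_dataset(octaves=(3, 4)):
    """One row per (first note, interval type) pair in the given octaves."""
    rows = []
    for octave in octaves:
        for name in PITCH_CLASSES:
            first = midi_number(name, octave)
            for interval, semitones in INTERVALS.items():
                rows.append({"first": first, "interval": interval,
                             "semitones": semitones,
                             "second": first + semitones})
    return rows


def train_and_log(rows, epochs=50, lr=0.1):
    """Fit second ~ w1*first + w2*semitones + b by per-sample gradient
    descent, recording each sample's squared error before its update."""
    w1 = w2 = b = 0.0
    log = []
    for epoch in range(epochs):
        losses = []
        for row in rows:
            # Rescale around C4 so the plain gradient steps stay stable.
            x1 = (row["first"] - 60) / 12
            x2 = row["semitones"] / 12
            y = (row["second"] - 60) / 12
            error = (w1 * x1 + w2 * x2 + b) - y
            losses.append(error ** 2)
            w1 -= lr * error * x1
            w2 -= lr * error * x2
            b -= lr * error
        log.append({"epoch": epoch, "losses": losses})
    return log


rows = build_dataset()
log = train_and_log(rows)
# Stored together as JSON, loosely mirroring the export step described above.
landscape_json = json.dumps({"dataset": rows, "log": log})
```

The early entries of `log` hold the predictions a converged model would no longer make, which is the material the prototype draws on.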
In the following, I reflect on the design process, and in particular the core aesthetic decisions, of the Entoptic Interval Synthesizer from a strongly subjective first-person analytic position. To frame this, the following should be noted. On a conceptual level, the Entoptic Interval Synthesizer draws firstly on prior work in which the potential of probabilistic uncertainty in machine learning techniques for design was proposed, particularly as a material appropriately specific to these techniques.12 Secondly, its name places it in continuity with the Entoptic Field Camera, a research prototype that allows for situated exploration of uncertain image-to-image synthesis.13 “Entoptic,” referring typically to visual phenomena stemming from the physiological interplay of eye and brain, here serves as a metaphor for design that places the emphasis in form-giving on the interplay between withdrawn technical components. In other words, whereas a transcriptive interface design foregrounds the “aesthetic coding” of input and output, a design leveraging the entoptic metaphor attempts a coding of operations between input and output.14
Returning to the 303rd time of training a neural network to predict an interval note (or rather, its MIDI frequency) for a given note, octave, key and interval type: I made a decision on form that took shape through points in time from below zero (the unknown amount of experimentation before I began saving results) to 357, which in itself involved many interdependencies between variant forms and materialities. Finally, one such interdependence struck a particular chord, and the parallel jostling for all the other forms dependent on aesthetic decision (color, composition, typography, interaction, waveforms, gain, envelopes, beats per minute) received a decisive impulse.
parameters = {
    "input_size": None,  # "undefined" in the source snippet
    "num_classes": 1,
    "hidden_sizes": [72, 36, 72, 36, 72],
    "dropout_rate": 0.0,
    "learning_rate": 0.1,
    "batch_size": 36,
    "epochs": 500,
    "activation_functions": ["tanh", "tanh", "tanh", "tanh", "tanh"],
    "use_residual": False,
    "weight_init": "xavier",
    "optimizer": "rmsprop",
}
Hyperparameter settings for iteration 303, specifying a range of routines and procedures. Experimentation with high learning rates, uniform activation functions, and the combination of weight initialization and optimizer functions was noted to have a strong effect on the topography of the loss landscape.
Predicting this or that interval note given this or that input took a backseat—this was no longer about utility in the sense of a "servo-mechanical model of technology" that prioritizes the approximation of the most direct input-output function.15
Instead, the loss in training became my primary concern of design. As such, I began to think of loss as the material reverberation into form of other material decisions such as hyperparameters, dataset processing, neural architecture, and so forth. The goal became to make predictive capacity irrelevant, or rather to bend the utility of a neural network away from making discrete, efficient predictions and toward the probabilistic bending of data through high-dimensional embeddings en route to model convergence. Once a model is trained, output can be demanded by input. But during training, loss moves through embedding and encoding dimensionalities predicated on material decisions, and, crystallized in the landscape, its movement towards the lowest mismatch between given data and prediction can be retraced. Moving with loss is the primary interaction afforded by the prototype, and whether a particular encoding is preferable over another is left to the mover.
As the topographical vocabulary used above shows, I had a particular aesthetic feature in mind (ever since I came across Li and colleagues' 2018 paper “Visualizing the Loss Landscape of Neural Nets”, in fact): the baseline material decision for this computational procedure suggested staying close to the procedure’s topographical encoding of form. But it took the “logistical imagination” that the call for this issue prompted to begin an actual articulation of the concrete and cohesive form of an experimental instrument leveraging loss.16
Along the way and still ongoing, loss moves me to consider how this matters for design, how it taps into the disregarded or obscured aesthetic potential of new modes of seeing brought about by systems with the capacity to make rather than only replicate patterns. With logistical imagination, Hockenberry, Starosielski and Zieger name an inflection point for theory and practice to attend to the “representational and imaginative modes of logistical activity”,17 seeing media and logistics as inextricably interlinked as world-making and -perpetuating engines in the vein of Virilio's "logistics of perception."18
Further elaborating on how to theorize the logistical instruments constitutive of particular media regimes, Rossiter proposes that a contemporary logistical media theory is based on the consideration that if "calculation machines have displaced representational regimes, then the ontological properties of media become secondary to the procedural routines of sorting, classifying, correlation, pattern recognition, prediction, and preemptive action."19
At the very least, a logistical media theory for AI technologies pays off as soon as the forms of analysis, research and/or design are released from an obsession with input and output modalities. In other words, as soon as the servo-mechanical logic, with its conventional forms of subjects and objects, is not granted primacy. The call for this issue highlights this obsession in its most contemporary facet: the aesthetic restraint of the transcriptive interface. The assertion of an aesthetic restraint can be elaborated through Flusser's long-held imperative that we ought not to ask "what [technical images] show [but] what they show for."20 In other words, the discernible aesthetic qualities of a technical image such as an interface do not wholly account for the system to which this interface gives access; instead, to what effect precisely these qualities constitute access is a more useful question.
As it turns out, if we consider this question on the mediating effects of interactions with the transcriptive interface itself, the analysis is actually quite brief: the transcriptive chatbot interface models call and response on an equal but never too equal footing. The invocation of response is performed at leisure through an instrument of control (the soft or hard keyboard), with the CSS-styling of information retrieval and rendering (e.g., little animated icons as faux-avatars, text appearing letter by letter, etc.) an invariant presence standing reserve. This is then not a question of whether this other actually possesses cognitive capacities we may call intelligence, but rather how the subject’s relation to the mere thought of such capacities in others can be channelled into a particular utility. Therein lies the mediation of a fidelity to a specific ideal of free subjects and their (!) others. The specificity of this ideal is a matter of power, of politics. With Rancière’s aesthetico-politics, we may further suggest that the transcriptive interface reaffirms rather than challenges a "distribution of the sensible"21; that is, a self-evident consensus on what is sayable, doable, visible, audible by and for whom. Here, mastery, possession and servitude are at the forefront. This does not mean that each transcriptive interface is deliberately designed so as to perpetuate status quo power relations, but rather, in the first instance, that its very form is an expression of these relations.
The transcriptive interface thus has a symbolic function in Cassirer's sense, in that it takes its place as an "organ of reality" in the conception of the world that localizes the transcriber and interface according to utility.22 And ultimately, this utility is conceived in the logic of extractive capitalism in two ways: the machinery of the transcriptive interface produces ‘perfect’ forms that are fully fleshed and blind expression abstracted from creative practice; while the metadata of servo-mechanical interaction feeds the machinery, and on and on.
So far, so good. At the same time, this analysis is dissatisfying in its shallow grasp on what exactly the transcriptive interface as a designed artefact obscures, and whether this may carry implications for moving beyond the restraints. This is, I suggest, due to analyses of interaction itself remaining on what Simondon referred to as the "scale of the operator", the here and now of a particular technological form rather than the internal and external logistics of its formation.23 Again, this is not new, and Cassirer, Simondon and Flusser are deliberately referred to here to show the longevity of the challenge of distinguishing yet also integrating the form, formation, function and meaning of a technological object. Thus, the form of the transcriptive interface is an aesthetic restraint twice over: it hinders transcending the given aesthetico-political consensus in practice despite all supposed disruptive capacities of "generative AI" (e.g., reaffirming students as cheaters, images as fake, etc.), and at the same time, as the object of analysis, it gives no clue where else to go.
Analyses are one thing, but the call for this issue very accurately demands considerations of the role of design and the responsibilities and opportunities for form-giving that the transcriptive interface restrains. Generally, I am convinced that continuous designerly experimentation with forms can lead the push of the ratchet towards new potentials for cultural expression and that these potentials may lead to a state of affairs that is very "dissensual" from today's extractive consensus on who gets to sense and do what.24 To that end, I return to the more specific designerly consideration that I began earlier, now with a more fully logistical sensitivity to the procedural routines of AI technologies as substructures to contemporary mediation.
Loss, much like uncertainty, is not welcome in transcription. Within such contexts, uncertainty is generally to be engineered or explained away, despite it being the fundamental part-and-parcel material property that singles these technologies out.25 In the logic of capitalist utility, on the other side of transcription one finds perfect forms, finely machined into a consumable packet of data. But whether as data noise or model variance, what Parisi terms the "sheer receptivity" of AI technologies' neural architectures to patterns is grounded in the probabilistic aspect of machine learning techniques.26 In implementation, such techniques combine specific procedures and routines (e.g., data preprocessing, weight initialization, value normalization, loss minimization, training scheduling, etc.) that all contribute to the convergence of a model in one way rather than another. As indicated, loss minimization is one of the most important of these procedures: pushing back against the predictive errors in training towards a replication of the latent relationships in a given dataset. So in training, loss moves the neural network's model through "a field of possible ways in which to function".27 By contrast, while uncertainty can be used to stress the probabilistic attribute of a specific form (chat response, generated image, probable distribution…) derived from a specific model (diffusion, GAN, …), loss opens a trace of the quasi-forms left behind in that model's convergence. In the context of the presented artefact, these quasi-forms are the predictions a model makes and discards during its formation. Loss, in other words, can be considered as the material reverberation of AI technologies' logistical substructures in mediating forms.
The interval synthesizer, as a proof of concept, tries to make these quasi-forms accessible. As the primary aesthetic component, the loss landscape allows for geometric encodings of a model's 'possible ways', performing an "indexing operation"28 of paths into the cells of a topographical mesh. In line with the above reasoning, each cell is inscribed with a quasi-form, the particular input-output relation of the computed loss at that point in parameter space at that time in training. On a peak cell, the pre-convergence prediction that, for example, a B4 is followed by an interval note F#5, may fit no given rule of interval relationships in relation to those surrounding it in topographical proximity. But whether this matters can only be felt. To me, rather than transcriptive, the quasi-forms of the interval synthesizer suggest that design can pursue shaping cultural expression as a ludic negotiation with "particle invasions" (Flusser) that are perhaps too hastily discarded.
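The inscription of cells can be sketched as a nearest-loss lookup. This is one possible reading of how stored predictions are matched to the mesh, not a confirmed account of the prototype: the record fields, the grid of cell heights and the function name `inscribe` are hypothetical, and the loss values are made up for illustration.

```python
from bisect import bisect_left


def inscribe(grid, records):
    """For every cell height in `grid`, pick the stored prediction whose
    loss value lies closest to that height."""
    ordered = sorted(records, key=lambda r: r["loss"])
    losses = [r["loss"] for r in ordered]

    def nearest(height):
        i = bisect_left(losses, height)
        # Only the neighbours on either side of the insertion point can win.
        candidates = ordered[max(i - 1, 0): i + 1]
        return min(candidates, key=lambda r: abs(r["loss"] - height))

    return [[nearest(height) for height in row] for row in grid]


# Made-up records: a high-loss B4 -> F#5 pair next to two low-loss ones.
records = [
    {"first": "B4", "second": "F#5", "loss": 0.85},
    {"first": "C4", "second": "E4", "loss": 0.10},
    {"first": "A3", "second": "A3", "loss": 0.45},
]
grid = [[0.05, 0.90],
        [0.40, 2.00]]
cells = inscribe(grid, records)
```

Under this reading, a peak cell inherits whichever discarded prediction sat at a comparable loss, regardless of whether it fits the interval rule of its neighbours.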
Here, I reflect on the implications gathered from the creation of the Entoptic Interval Synthesizer in terms of creative expression practices and design aesthetics. Regarding the former, I propose that interfaces leveraging loss may reorient creative practices with AI technologies from transcriptive ‘perfection’ towards imperfect potential. Regarding the latter, I briefly reflect on the presence of quasi-forms in contemporary graphic and fine arts that hint at a broader significance of loss as a design aesthetic quality.
As it stands, no studies have been conducted to test whether the interval synthesizer in and of itself is suitable for creative expression in comparison to transcriptive interfaces. However, within the context of this article, this is not the major point. As I noted upfront, it is likely not a “good” instrument that can be seamlessly integrated into creative practices as is, or if measured in terms of usability. Likewise, the technical applicability to more complex AI technologies remains to be tested. For instance, it is workable to use the technique on the generator network in a generative adversarial network (GAN), to provide alternative navigation to latent space walks. But besides the technical applicability to more complex AI technologies, I argue that loss landscapes (or similar forms of making technical measures sensible) showcase the virtue of a “shallow model” mindset.29 Scurto, Caramiaux and Bevilacqua propose that such a mindset may be preferable for tools of creative expression in contrast to the typically sought-after ‘deep’ models, since their shallowness (in terms of hidden layers, techniques, and/or dataset size) furnishes “probabilistic properties that remain to be engaged as material by designers.”13
In this light, the interval synthesizer stands as a proof-of-concept for design to move from the search for the perfect parameter mapping to the question of how these technologies allow specific types of imperfection in creative expression. This is a fine balance to walk, as even an acknowledgement of the probabilistic aspects of AI technologies does not necessarily mean a novel paradigm has been introduced. A case in point in the audio domain is that while arguably the open architectures of Virtual Studio Technology (VST) plug-ins remain a highly active source of experimentation, commercial VSTs aestheticize the “random”30 while still providing conventional means of interaction and expression. If design, at least according to Flusser, is to “make naturally conditioned mammals into free artists,”31 this does not mean artistry in the image of mastery. It is the freedom for imperfections, for “error, mistake and poor choice”, as performative qualities rather than consumption.32
Lastly, I find reason to consider loss on a yet broader scale. Loss, in contrast to the blinking cursor, moves through the distributed architectures of contemporary technological mediation. I propose that an emerging pattern of contemporary design aesthetics rhymes with this notion. As specific as quasi-forms and logistical substructures may seem to the concerns of AI technologies and the transcriptive interface, the particular designerly logic of form-giving they represent is actually widespread. The graphic work of David Rudnick integrates forms that oscillate around and converge in a coherent aesthetic mode of seeing, a Fourier transform of aesthetic potential. The paintings of Parsa Khalili suggest shifting architectures, always seemingly one stumble away from a completely different in-forming gaze. The designed worlds of Metahaven tap into the future residue of contemporary tendencies, teasing out forms of experience and expression latent in the logistical substructures of now.
These emerging aesthetics seem, then, to show that the "symbolic net, the tangled web of human experience," is becoming supplemented by a subsymbolic net that assembles and disperses the quasi-forms processed in the logistic substructures of contemporary mediation.33 This processing moves by means of technical indices such as loss, and any resultant form (whether human-facing interface or merely an interfacing component) can be designed to bear the trace of this movement. That the transcriptive interface paradigm does not is not only evidence of its form being steeped in conventional utility, but also simply a shame for the aesthetic potential left untapped. What richness and resonances are hidden below the threshold of conventional utility remains to be probed by the slow experimental feedback loops of practice and research. In the end, I think that any practice for contemporary design with a claim to some material sincerity must begin with a consideration of the movements in the logistics of perception, of form and its transient imperfections in the subsymbolic net.
Addendum
There is an irony to all of this. I could not have designed the interval synthesizer, and could not have thought and felt my way through why and how it may matter, without the assistance of a transcriptive interface: a chatbot granting access to a model with more programming patterns than I possess. Transcription can have a place in designerly experimentation. Perhaps when the substructures under investigation require teasing apart in the search for something to hold onto in the process of form-giving, and when a transcriptive interface is not the final form, this is fine. But maybe not. Either way it cannot be denied, and material sincerity requires as much.