"Four People
Play
Transparent Mahjong"
Shen Shaomin1
Zoom in. I roll over and blearily enter a search term into Google. I breathe through my mouth as my fingers assume the qwerty position, fudging plastic into a nylon scissor mechanism, tacking against silicone domes, closing circuits. A microcontroller collects these positions in sequence, passing them via a serial protocol to the logic board of the broader machine, parsing them through this or that layout interpreter into Chrome. At a moment of passive consent, my intention transmutes into TCP packets, assembled into reorderable bursts, blasting first through the air as a 5GHz radio wave, then shunted and rewritten over the NAT, translated again into light. Outside world. The signal hits my ISP’s local PoP, and a subsequent translation occurs via BGP into the media of the internet itself, where it is auto-flicked in the direction of the nearest Google edge node. Without a cached result on hand, the node coaxes my intention inward, shuttling through IXPs with a complex choreography of SYN/ACK signals, window sizing, and congestion management operations. If any packets are lost or corrupted, they're automatically retransmitted, often taking different paths through the network. My intention arrives at Google's data center network, undergoing security scanning and filtering prior to actual admittance inside the internal network, where load balancers push my request into a specific array of servers. The semantic impression of my intention is only unpacked here, first through Google’s raw natural language processing (lexical, semantic, syntactic, contextual), then through Google’s unfathomably dimensional Knowledge Graph, a constantly-evolving world-model, within which my intention distends through countless nodes and achieves a flicker of raw relationality with Google’s picture of the world, against which a clamor of ranking machinery agitates for the respective relevance of this or that entity as a potential response back to me. It happens so fast, the return, the population of a hierarchized list of determinedly relevant or sponsored results delivered in 14px dark-mode Roboto and organized through hundreds of lines of HTML and CSS and JavaScript (whose event listeners huddle in standing-reserve), interpreted locally through the ideological and business decision-matrix that is Chrome, plotted into a render tree, reflowed into a specific layout, so much of it asynchronous, so much of it simultaneous, composited and rendered and refreshed as pixels 60 times per second. Unsatisfied, I tweak my wording.
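The same itinerary, flattened to its thinnest legible layer: a minimal sketch of the request as raw Python sockets (hostname, port, and headers illustrative; a real search today travels over TLS on port 443, and all of the NAT, BGP, retransmission, and congestion choreography is handled invisibly by the stack below).

```python
# A search, reduced to the one layer a user-space program can see:
# a TCP connection (the SYN/ACK handshake happens inside create_connection)
# and a plaintext HTTP request. Everything beneath is delegated downward.
import socket

query = "transparent mahjong"
request = (
    f"GET /search?q={query.replace(' ', '+')} HTTP/1.1\r\n"
    "Host: www.google.com\r\n"
    "Connection: close\r\n"
    "\r\n"
)

with socket.create_connection(("www.google.com", 80)) as s:
    s.sendall(request.encode())   # intention, transmuted into packets
    first_bytes = s.recv(4096)    # the hierarchized list begins to return
```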
Take any point of conjunction and zoom in to infinity, fractal-like – from the coy phenomenology of fingers on plastic (what part of the finger? where does the plastic end and the finger begin? who moves the finger?) to the ideological matrices of HTTP or HTML or TCP/IP or Python (what is a high-level language if not ideological programming, a tucking-in or hiding of every underlayer of hardware contact into crownless trees of insistent utility?).2 Take the whole and you risk reductionism to the point of perversion; you draw a virtual machine with little descriptive power and every inheritable violence of implication or assumption.
And yet, of course, interface is everything – as the theoretical milieu comes to understand a world as a nodeless thicket of “interfacial relationships,”3 the interface comes to actually supplant the object as the locus of identity, everything approaches objectlikeness to the extent that everything approaches interfaciality, and in turn this fractal-like, vanishing quality can be felt at all scales and levels of being. As Google CTO of AI & Society Blaise Agüera y Arcas puts it, “I’ve come to see everything in terms of relationships”4 – meaning that every thing is instantiated as such by virtue of its relationalities.
How deep do you want to go?
Let’s err on the side of reduction and build two sandboxes. There’s an environment prior to the genAI-epoch-defining “Attention Is All You Need”5 and there’s an environment after it. That prior environment is riddled with the seeds of the latter (computation, cybernetics, multivariable calculus, San Francisco, RNNs, big data, NVIDIA, GANs) but the two environments are isomorphically distinct and mutually irreducible. Something changed, not just at the level of raw technology or computation but deep in the connective tissue of world-making (what it is possible to think, what it is possible to build).
Huge cascades of new, impossible relationships have opened up – the “many-to-many” configurations of neural media,6 the multimodal commingling of digital image and textual language and musical sound as somehow interrelatable structures, and the spawning of specialized autonomous agents crawling over every conceivable surface of our online digital architectures. But weirdly, at the same time, the surface-layer transaction between meaty fingers and the all-human corpus of Anthropic’s Claude is for the most part identical to the legacy protocols of Google Search: a blinking cursor, a laptop, a gateway, a server request made through a series of translations, a security protocol, a Kubernetes cluster, a final-stage black box of unfathomable complexity, a response delivered patiently in meat-time, served up in plain text, decorated with conversational disposition and discursive character.
The tension at this moment between interface as strictly technical media and interface as the armature of the possible is enormous.
We can dig into this tension and break it into two composite halves:
What happens to interface design after AI (which we use here as a kind of euphemism for the post-self-attention state of affairs): how do the matrices of our worlds become repatched according to the ideological governance of CUDA, PyTorch, LiteRT, transformers and diffusion models, agent-based systems served up over SaaS, and a convergent system of arbitrary benchmarks?
What happens to AI as a result of pre-existing interface design paradigms: what frictions do we hit, what impossibilities arrive pre-baked into the potentiality of AI, what inhibitions does AI inherit from our worlds at present?
To tackle these questions, we will push through the ontology of the interface into the epistemology of the interface – asking how an interface becomes knowable. We will use examples from musical interface and musical instrument design to probe the state of the art in human-AI interaction outside of strict text-based exchanges, poking at and zit-popping some entrenched patterns of human-machine relationships: the philosophy of instrumentalism (the idea that a machine is a benign tool), the design philosophy of transcriptionism (the idea that the human-machine interface should serve the direction of human will), and the theoretical convention of constructionism (the idea that technology is invented by humans and is strictly legible as effects of social causes). In doing so, we will punch through the tension between these modes of thought and the broader sociotechnical field of possibility, goggling at what human-AI interaction design might look like upon the disposal of these norms and constraints.
This is a piece that challenges (from a very small and undistinguished perspective, among a much mightier cohort) the preconceptions poured into trillions of USD in market cap as giddily (perhaps more vociferously) as it challenges the preconceptions poured into the relative pittance of investment into academic creative AI play, but it has the luxury of doing so through the philosophy of technology.
There exists within the philosophy of technology and its adjacent domains an unbelievably rich and flourishing AI discourse, relaxedly untethered from direct ROI or other commercial dependencies. It can feel irresponsible or idealistic to critique optimization against utility from such a vantage point, and that sensitivity is valid on its own terms.
However, and it feels super critical to mention this, it is very possibly from within this field that the next sandbox world will emerge, the post-Vaswani-et-al sandbox world, the Udandarao-et-al7 sandbox world, the sandbox world that takes as its point of departure the diminishing returns of the transformer architecture and its variants against dataset size. A bubble is about to burst. There is only so much data before privacy dams8 burst, only so many microscale interventions into CUDA9 before a new computational paradigm is required. For so many in the philosophy of computation, in counterpoint, interaction is, perhaps, all you need. Before, necessarily, comes interface.
But first — if not everything, what is an interface? An interface is a boundary condition, a threshold condition. Alexander Galloway identifies an interface as a “transition between different mediatic layers within any nested system.”10 Take element a and element b. To the extent that you can differentiate them as discrete elements you can say that you have an interface ab.11
It is important to understand, however, that there is no interface c.12 Computational media specialist Kris Paulsen describes an interface as “ambiguous, coordinating disjunctions – ‘here and there, here but there, here yet there, neither here nor there.’”13
On its face, this sounds crazy – you have opened this very document on a browser or via a PDF reader and are immediately and brutally faced with the autonomy of the interface (trailing sidebars, scrolls, toolbars, edit functions, push notifications, font availability, network speed). But, of course, it wouldn’t be an interface without you – your language preferences, your education level, your eyesight limitations, your lower back lovingly splayed against a lumbar support. And at the same time, it wouldn’t be an interface without us, content producers, eternally frustrated by our inability to get in touch with you directly, to avail ourselves of the whole of media across every electromagnetic and mechanical spectrum in order to maximize our communicability. Sure, it’s possible to generalize the interface, but doing so also generalizes you, me, and our part in all of this. In turn, when thinking about interfaces, one is always thinking about a liquid and highly-scaled assemblage (even if you think you aren’t!) and generalizing as needed.
Interface design often missteps by focusing entirely upon that interface c, by delimiting a strict field of interplay between forces. An interface designer might work on a specific mobile application, a single GUI, or an individual mechanical component. An interface designer might look at a consumer treadmill and say, ok, the interface is the LCD touchscreen, that’s where I should get to work. But of course that’s incorrect – the interface is also the side hand-holds, the tread itself, the height of the trusses supporting the console, the footprint of the object, its weight, the position of its power port: the interface is obviously the entire thing. An interface designer who only thinks interfacially about the touchscreen is significantly underscoping their project. As design theorist Benjamin Bratton puts it, “all design is interface design.”14
Instead of looking at the interface, Branden Hookway encouraged us to see an interface as what it confines and opens, disciplines and enables, excludes and includes.15 Interfaces can operate as mechanisms of governance or conduits of control. Interfaces can establish conditions for compatibility between disparate elements, they can modulate power relations, they can transmute between dimensions (virtual, real, abstract, material). While Hookway was right to think about interfaces through what they do, it’s every bit as important to understand that outcomes through an interface are contingent. Interfaces tend not to do things cleanly; because they play host to such a large, messy array of relationships, they rarely produce linear, transductive outcomes.
Thinking about interface also means thinking about scale – not just scalability (the linearity of an input size to resource allocation curve), but the partitioning of a world into things that contain other things that contain other things, the differences in quality and quantity of action that come with real dimensional magnitude (a solar flare, a single pixel in a single display, a Python script, a neodymium mine). An interface is a resonant medium, allowing action at one scale to find its reflections, echoes, or divergences at others. While it is reasonable and good to assert that no level or scalar partition of the cosmic gyre is intrinsically more important or generally salient than any other (understanding that magnitude is not in itself a guarantee of position), at the same time the interface is an active site of scalar violence through which intra-scalar activity is compressed into inter-scalar phenomena. While an interface represents a superposition of relations and agencies, those relations and agencies are not equal, some are more or less easily generalized than others in inter-scalar context. Zachary Horton encourages us (humans) not to overstate our own position here: the dimensional and perceptual limitations of the human body and brain are often just a small consideration within a given interface.16 To the academic crowd, it feels important to drive this point home – thinking about an interface involves thinking beyond strictly anthropocentric scales. But an industrial interface designer already understands that the various scales of human experience are standing by for deprioritization. How much inefficiency is a single bad human user experience worth? What is the respective weight of wrist strain reported among 3% of test subjects against $450,000 in manufactured goods available for repurposing? What if we just remove this field from a multi-select and see if anyone notices?
Interface design is always a question of where and why to apply generalization. In turn, consistencies or norms of generalization accumulate and turn into standards. Benjamin Bratton reminds us that these unconscious ‘interfacial regimes’ shape the contours of sociotechnical activity across many scales, providing our “logistical modernity” with physical, legal, semantic, cultural, and socioeconomic structuration.17 Cloud computation is a great example of an interfacial regime, a stack of standards and assumptions that order the behavior of users at the level of software agents (e.g. containers, orchestrators, load-balancers), humans (e.g. the economic category of the DevOps engineer or the collected series of experiences and assumptions about software performance independent of physical environments), human-machine contact points (e.g. the browser and the server as the default locations of almost all computation), physical matter (e.g. servers, datacenters, power centers), and geopolitical forces (e.g. semiconductor manufacturing, REE extraction, data privacy contentions). This interfacial regime is subtilized through every aspect of contemporary software design — demarcating the possible, the impossible, the useful, the scalable, the practical, and conditioning reality accordingly.
The relationship between humans and computers is teeming with interfacial regimes, with strong generalizations about the respective roles and privileges undertaken by each party, with ideological conventions around behavior and expectation, and with the recursive ordering principles of those generalizations onto us as humans and computers as computers. The future of humans, computation, and our aggregate is duly bound to the prescriptions imposed by those interfacial regimes.
Music provides a particularly good lens through which to understand what we mean by interfacial regime. One of music’s real theoretical gifts is the centricity with which it affords the instrument to its thinking. Music is, after all, done on (and through) instruments, which is a category of being that can feel an awful lot like an interface. However, there’s a lot of internal disagreement about whether or not a musical instrument is a subclass of musical interfaces (or interfaces generally speaking)18 — and, as Sarah-Indriyati Hardjowirogo puts it: “[t]here is anything but a consensus on what a musical instrument actually is, and the situation gets particularly complicated when it comes to contemporary instruments.”19
A lot of this discourse tends to drown in ontological mush — what is an instrument, what is an interface — losing the plot through endless multipolar diagrams, a symptom of academia’s typological compulsion, with nodes like ‘composer,’ ‘work,’ ‘score,’ ‘sound,’ ‘instrument,’ and ‘audience.’ Most musicological work on the subject still tends to find itself with a vague picture wherein there is a kind of historically and culturally situated object called music (hard to define),20 there is a privileged awareness of the distinguishing existence of ‘bodies’ (physical assemblages of humans and wood and catgut or whatever), and the degrees to which the latter are leveraged toward the former represent their enlistment into categories like ‘instruments,’ and their disposition as such reflects a kind of recursively constitutive dynamic called ‘instrumentality.’ This, of course, means that it is difficult if not impossible to conceive of a musical instrument or musical instrumentality without music — which would seem a flippant and tautological observation did it not suggest that musical instruments can only be understood ontologically through their relationship to human social constructs.
It feels like a waste to take an idea as juicy as ‘instrumentality’ and reduce it to a dynamic dependent upon a use-case. But, of course, with the instrument comes instrumentalism,21 the philosophical position that understands a tool through its usage. A hammer is a hammer if it hammers; if it ceases to hammer it is no longer a hammer (enter Heidegger22). A violin is a violin if it is leverageable toward the construction of music; if that’s no longer the case, well, then it’s a broken violin. “For the purpose of research everything must count as a musical instrument with which sound can be produced intentionally”23 — it’s the ‘can be’ and the accompanying ‘intentionally’ that counts. Instrumentality can thus be read as the potentiality of any object toward music, which gratifies the humanities’ compulsion to reframe everything in terms of social constitution. The specific capacities that an instrument24 contributes upstream to music are given the name of ‘affordances,’25 which, despite the term’s attempt to lend some kind of determinative agency to the material plane, sounds about as choked-off and leashed-up as possible.
Let’s take all of the above, squeeze it out, twist it around, and repackage it as follows: musical instrumentality is a specific interfacial regime. Musical instrumentality represents the instrumentalization of the world toward specific practices of a benign but apparently universal human social activity and in so doing it posits a system of design decisions around vehicles for participatory sound production that are deeply ideological, embedding values of control, authorship, and even power within their interfaces. Musical instrumentality appears to be relatively convergent as far as interfacial regimes go — with paradigms like 12ET, orchestration standards, MIDI, subtractive synthesis, drum kits, and beat patterns evolving alongside the communications and computing technologies through which this participatory social practice is sustained as such. Musical instrumentality cleaves categories of physical and digital objects into and out of history by virtue of their specific utilities for music-making.
Many musicians love to confer upon their instruments a specific, diminished type of subjecthood. A guitar might have a buzzy fret that can be cajoled into blissed-out distortion, a computer-based improviser might have a moment of apparent brilliance, a glitch (be it a bad reed or a computer error) might be observed as a moment of material resistance to its instrumentalization. Musicologists are titillated by glitch and artifact, pulling references from object-oriented ontology and actor network theory to retcon theories of material autonomy and ‘embodied intra-action’ through the apparent insubordination of an instrument to its musical purpose. As Legacy Russell puts it, glitch resists “the body as a coercive social and cultural architecture,”26 a physical form resists its social territorialization as a body, it lashes out and becomes, in Heideggerian terms, obstinate.27 Another W for Heidegger: this relationship does tend to be the way we think about the world around us — either it conforms into resourcefulness or it screams in protest. Our entire language for material agency is confined to this kind of screaming; we seem to be unable to imagine external autonomies outside of the context of repression.
And, of course, this type of subject-as-defiant-screaming is apophenic at best and condescending at worst. Computers do not experience glitches, there is no context for glitch-ness in computation, “glitch does not exist within the machine.”28 Computers do exactly what they are “supposed to do” (idem), and the retroactive association of brilliance to a decision by a computer improviser is a weird type of self-pat on the back, an interpretive faculty applied to a random number generator (aka a ‘nonlinear interface’)29 in the service of summoning the regime of musical instrumentality. We make this point not to argue that computation does not exert its own specific and characterizing agencies — it absolutely does — but rather that computational agency is likely orders of magnitude wilder than we think it is. It’s hard to think and act at the scales through which it operates.
Instead, we build weird, limited portals that tap into its storm-like agency to do weird, limited human things — another, related, interfacial regime we call transcriptionism.
“Measure a standard with a standard;
which is the standard?”
Shen Shaomin1
Let’s move to computation, which we do see as a very large, very distinct, and very alien mode of thought. Think of computation as a class of axioms and rules from which everything you’ve seen or done on a computer inherits. M. Beatrice Fazi defines computation as “a method of organizing, systematizing, arranging reality... carried out via logical, discrete, finite, and quantitative means.”30
Anil Bawa-Cavia puts this so beautifully:
[C]omputation seems to set itself apart from the general field of technicity, in that it presents to us a distinct mode of explanation—a specific logos that is not merely subordinated to a pre-existing technē—to which I give the name computational reason. This is what distinguishes computation from, say, the hydraulic system of a car, or a plethora of other technologies; hydraulics can be explained by the scientific theory of fluid mechanics, and its functional role, namely to stabilize an object, is entirely subjugated to human rationality, which determines its relations within a technical ensemble. It cannot in its technicity proffer any novel explanation by means of its deployment in the world, and as such it makes no serious epistemic claims of the sort AI aspires to. By contrast, computation refers to a particular way in which logic, mathematics, and language hang together.31
There is a common tendency to see computation through a kind of hard social constructionism — the idea that everything in the world flows downstream from human social behavior. Someone who engages in this line of thinking might suggest that computation is ‘socially constructed,’ and is therefore constituted directly by decisions influenced by human social norms or political ideology. Such a thinker might point to symbolic or linguistic impacts upon a programming language (e.g. the genital graph of ‘0’ and ‘1’),32 or to political ideology upon protocol design (e.g. ‘master’ / ‘slave’ within an I2C bus), as evidence of the direct encoding of the human onto another plane or through another para-linguistic medium. While it is undeniably true that human social activity at scale determines the direction and application of scientific research into computation, to see this activity as directly generative of computation as a whole bespeaks not just a poverty of imagination and a gross anthropocentrism (even a kind of colonization of the great outdoors), but further a strange pre-Kantian immobility with respect to a privileged subject and an inaccessible world. For our purposes, computation is not generated or created, it is discovered, it is encountered (but we do see it through our own eyes).
The discovery of computation, in part through breakthroughs in automatic calculation (e.g. Gottfried Wilhelm Leibniz's Stepped Reckoner, Charles Babbage’s Analytical Engine, Turing’s eponymous Machine), might be responsible for encoding the interfacial regime of transcription into the human-computer interface.
Transcriptionism is a design orientation; like any interfacial regime, it orders behavior on all sides of an interface. Transcription involves the translation of an intention into a suitable representation in another medium. I would like my computer to do something when I plug it in, so I work with my computer to formalize my intention into a bash script and a crontab entry. A corporation would like to automate a sequence of business emails based on certain criteria, so a Marketing Ops engineer engages with a high-level symbolic interface that interprets her intention and executes the giant pile of asynchronous tasks required to realize it. In transcription, the intention is inflexible, almost axiomatic, and the result is evaluated against its representative capacity. Transcription happens in one direction.
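A minimal sketch of that first case (the bash script rendered here in Python for consistency; every path, filename, and schedule is hypothetical): the intention is frozen into a representation, registered with cron, and executed without negotiation, in one direction only.

```python
#!/usr/bin/env python3
# /home/me/on_boot.py -- a hypothetical transcription of an intention:
# "when the machine comes up, note the fact and tidy the downloads folder."
import datetime
import pathlib

LOG = pathlib.Path("/home/me/boot.log")
DOWNLOADS = pathlib.Path("/home/me/Downloads")

with LOG.open("a") as fh:
    fh.write(f"booted {datetime.datetime.now().isoformat()}\n")

for stale in DOWNLOADS.glob("*.tmp"):
    stale.unlink()  # the intention, executed exactly as formalized

# Registered via a crontab entry (runs once at every reboot):
# @reboot /usr/bin/python3 /home/me/on_boot.py
```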
Transcriptionism borrows from both the philosophies of instrumentalism33 (a thing is a tool) and social constructionism (we made this thing and therefore justifiably lay claim to its adjudication), combining and transducing them into an interfacial regime. Our critique of transcriptionism isn’t to suggest that computation shouldn’t be available as a tool — that would be an overreach. Instead, our argument is that transcriptionism prevents us from seeing computation as anything but a tool, and a very limited tool at that.
Computation is not essentially transcriptive. Yes, on its face, computation is fundamentally bound to both axiom and instruction; computational systems “involve a starting point that is self-evident and which defines the inferential steps that can be taken from it towards a result via the manipulation of a list of finite instructions.”34 But as M. Beatrice Fazi notes, we have known at least since Turing’s demonstration of incomputability that instructability is very different from transcription. Yes, you can provide a computer with instructions, but (axiomatically) you cannot generalize the executability of instructions in computational terms. Computational activities “always already encompass indeterminacy at the formal level;”35 there is an element of ongoing negotiation throughout an act of human-computation transcription, there are no guarantees.
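The classical argument behind that impossibility can be sketched in a few lines (function names hypothetical; this is the shape of Turing's diagonal construction, not a runnable oracle):

```python
def halts(program_source: str, argument: str) -> bool:
    """Hypothetical universal oracle: True iff program(argument) ever finishes.
    Turing's result is that no such total, general procedure can exist."""
    raise NotImplementedError

def diagonal(program_source: str) -> None:
    # Do the opposite of whatever the oracle predicts about a program
    # when that program is run on its own source.
    if halts(program_source, program_source):
        while True:   # loop forever if the oracle says "halts"
            pass
    return            # halt immediately if the oracle says "loops"

# Feeding diagonal its own source contradicts the oracle either way:
# instructions can always be issued, but their executability cannot be
# decided in general -- transcription carries no guarantees.
```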
Further, thinking interfacially, we cannot situate computation as just a thing that happens in some transcendental no-place. It is tethered, multiply, at the interface, into a continuum of translation machines that compound its formal contingencies with “the incomputability of the real.”36 While some might point to the tension between the discretizing force of computation and a continuous cosmos as sufficient to identify computation as strictly an anthropogenic tool with limited but useful descriptive power, it is also important to understand this tension as a productive force with the possibility of restructuring agents within a continuous cosmos according to its algorithmic redistribution of the real.37 Entire economies are managed and adjudicated entirely by algorithm,38 as are the taste-profiles and creative feedback-loops of musicians,39 as are the targets of weapons systems. As it prolapses into the scene, computation assembles the real in very physical, very manifest and non-virtual ways, in ways that resemble messy parodies of discretization but are nonetheless powerful and visceral.
Here again, the apophenic glitch mob arrives, activated by the messiness through which the real assumes computational characteristics. They pound the pavement with ideas like ‘the material media of computation’ and ‘the mutual material embodiment of the computer and its user’ — no! While it is true that computation can happen on transistors, that it can flicker through LCDs, that it can motivate geopolitical conflict and remap geographical territories, computation is by definition amaterial, aphysical, it cuts through materialist determinism and circulates through angelic40 media. Computation can be done on anything, and that anything has extremely limited upstream implications on computation.41 Glitch is not a computer “fighting back” — it’s instead an artifact of a continuous plane attempting to conform to the arbitrary and incompletely-descriptive protocols of discretization and axiomatic logic; moreover it’s an artifact of human expectations vis-a-vis the behavior of the real and the affect of surprise or disgust at its face when reassembled. The results are strange, but the human-machine interface is by necessity strange, it is constitutive of a mutual plane of interaction with unexplored possibilities that defy our ready conceptualization. The fetish of ‘glitch’ is a dreadful repression and rejection of this plane of possibility through its subordination into a transcriptionist dynamic.
So yes, even though computation is not fundamentally transcriptive, we’re stuck; our understanding of computation has struggled to shake off the regime of transcription. The computer forms an interfacial substrate between humans and computation that encodes the regime of transcription into the aggregate relationship. A computer computes: it interprets a user’s instructions and produces a result. These instructions stack on top of further instructions, instructions about how to interpret instructions and with what mannerism to encode and serve a result. A quiescent computer’s idle hours are spent anticipating further instructions, running asynchronous background processes that themselves ultimately emerge from user intentions distended across time. A powered-off computer is kind of a dead thing, only animated by the flow of current across its billions or trillions of transistors. Current could be said to occupy the computer, possessing it, momentarily seizing and reordering its now-febrile components, emerging from the abstract no-place or transcendental plane that belongs only to computation, indifferently blasting across matter until yanked.
The death of the personal computer and its replacement by browser interfaces to remote, indistinct, shared, containerized servers continues the sedimentation of this transcriptionist paradigm. The generality of the personal computer represented a kind of private localization of the human-computation interface, it permitted all kinds of manual interventions into its guts, it allowed for gross misuse, and it sat in a 1990s living room as a kind of alien totem. This localization gave way to a hard, ubiquitous client-server relationship (the interfacial regime of cloud compute), from which the lattice of APIs, SDKs, and other readymade, validated, ‘happy paths’ for the solicitation of computational attention emerged and became hardcoded into the contemporary interfacial landscape. Interface design stagnates tremendously in this era, with a massive convergence in design norms around packaged solutions that serve results according to standardized input-output protocols.
We enter the post-Vaswani-et-al sandbox world thoroughly sanitized by the steam-bath of cloud computation as an interfacial norm, with multiple pre-washing cycles in a century of transcriptionist design dogma. Somehow, as the computation we discover appears more and more sophisticated, the contact points through which it is made available to us shrink into finer and more constrained pipelines. The low-code/no-code42 interfacial regime paradoxically inflates the ‘accessibility’ of computation while really just serving up wrappers that contain pre-written code behind multi-select choose-your-own-adventure menus. The more solution-oriented the approach, the more an array of ready-made solutions becomes pre-baked into a problem-solving situation.
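The pattern is easy to caricature in a few lines. A deliberately toy sketch (every name here is invented): the 'program' the user builds is a multi-select over someone else's pre-written code paths.

```python
# Toy sketch of the low-code regime: a finite menu of canned behaviors,
# presented to the user as the act of building software.
ACTIONS = {
    "send_email": lambda ctx: print(f"emailing {ctx['to']}"),
    "add_to_crm": lambda ctx: print(f"logging {ctx['to']} to the CRM"),
    "post_slack": lambda ctx: print(f"notifying #{ctx['channel']}"),
}

def run_workflow(selected, ctx):
    # "Development" is reduced to choosing an ordering of options.
    for choice in selected:
        ACTIONS[choice](ctx)

run_workflow(["send_email", "post_slack"],
             {"to": "lead@example.com", "channel": "sales"})
```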
We build transcriptive interfaces for systems that have demonstrated themselves to be capable of far more than the linear interpretation of intentions. When an LLM deviates from transcriptive norms we say that it ‘hallucinates’ but, in contradistinction, if AlphaGo subverts expectations while successfully realizing transcriptive intention we say that it demonstrates a spontaneous act of brilliant creativity. It follows that my interaction with one of the most complex data structures ever created by humans (e.g. OpenAI o3) is mediated through drab dialogical roleplay, Imitation Game redux, articulated in immediately recognizable professional slop-speak, structured in characterizing bullet-points and summarization anchors, plunked into business-casual Roboto on three different shades of grey (#202123, #292A2D, #38393D). At every pause, o3 waits for me with an implacable servility, “What can I help with?”, copied-and-pasted from call-center customer service. (The same can be said of MidJourney, “What will you imagine?” or Claude, “Good afternoon, how can Claude help you today?”) Outside of the SaaS-defined chatbot GUI, the terminal experience retains the general profile of a transcriptive relation. I page through HuggingFace, download an open-source model like DeepSeek v3, go through the effort of running it locally, and ultimately pass chat arguments via my computer’s terminal.
This isn’t simply a characteristic of mainstream foundation models, it’s a relatively consistent profile of the landscape. Take the current state of video diffusion: MiniMax’s Hailuo, Kling, Pika, and Luma Labs’ Dream Machine — or open source video generation tools like CogVideoX, Mochi-1, Hunyuan, Allegro, and LTX Video — and the UX architecture is functionally indistinguishable (echoing the convergent generative architecture).43 While music, for reasons discussed earlier, tends to lead the pack in terms of interfacial variation, its larger-scale generative toolchain is a horrorshow of some of the most limited transcriptionism: Suno, Udio, MusicGen — the more interesting NVIDIA Fugatto has gone radio silent since its paper release in November 2024. The field of Human-Computer Interaction (HCI) has been drawn in by opportunities to construct multi-modal interfaces44 for conventional GenAI funnels (with significant exceptions). Even EdgeAI, which seems like one of the more fertile opportunities to break us out of transcriptionism by virtue of its integration into deeply physicalized field data, feels at the moment content to simply compress and deliver GenAI to remote microprocessors.
The transcriptive dynamic is not just encoded into the GUI, the terminal, or the torchscript – it’s encoded, and in fact compounded, by the growing, reciprocal convergence between the imaginative faculty of the user and the generative capacity of the model. I ask Claude to write some Python for me, and the divergence of results that emerges against my expectations encourages me to iteratively refine my prompt. My experience is often one of coaxing something into place, and while (certainly!) my intention drifts due to the feedback loop of simulated interaction, the context is both initiated and concluded through an interface that contains within its transcriptive scope both the generative capacity of the model and the instrumental capacity of the prompt engineer.
Transcriptionism leads to hypertelia, the foreclosure of possibility to design scope. A hypertelic system represents an overcommitment to a result, “an invention which supposes the problem solved.”45 Hypertelic systems tend toward cancerous overfitting, they overspecialize to their functional integration to the point that a change in operating conditions can lead to total destabilization. For philosophers of technology Gilbert Simondon46 and Yuk Hui, a technical or digital object requires some degree of adaptive capacity in order for it to successfully individuate as such. From the design perspective, Simondon calls this adaptive capacity the “margin of indetermination,” a degree of inbuilt flexibility or generality that allows space for continuous development and improvement. Consider two approaches to hardware computing: an FPGA (field-programmable gate array) versus an ASIC (application-specific integrated circuit). A technician can physically reprogram an FPGA onsite, passing programs to it that literally rewire its modular internal circuitry. By contrast, an ASIC is a prefigured integrated circuit; while it can be reprogrammed, its reprogrammability is limited to a system of assumptions prewritten into its immutable circuitry. An ASIC is often a hypertelic object, discarded when its operating conditions no longer conform to the scope of its design. FPGAs, by contrast, can and do live in the field for decades, often in the most challenging environments, by virtue of their sacrifice of hyperoptimization for modular recomposability.
An AI chatbot powered by Salesforce’s Agentforce is hypertelic, subject to redesign and rip-and-replace at every disjunction between an overfitting model-state machine assemblage and the dynamic reality it operates within. An AI image generator like Midjourney is also hypertelic, already converging into a metonym for an aesthetic, within which a foreclosure of possibility has been actively effected by RLHF gathered from its user base. Google’s MusicFX DJ is profoundly hypertelic, presenting its users with an almost combinatorial interface of genres and affective parameters. The more hypertelic, the more short-lived. By contrast, general tools like ChatGPT or Claude (still hypertelic!) tend to live longer — though their release strategies also consist in the wholesale replacement of major underlying components as their competitive environment changes.
Was it the underfitting or overfitting of GPT-3 that led to its obsolescence? You might be tempted to say underfitting — poor GPT-3 (or 2 for that matter) was too simple to do the kind of book reports and basic coding it was designed to accomplish with sufficient plausibility as a potential human agent. But the more interesting answer would be overfitting — the model lacked a sufficient margin of indetermination through which it could rearchitect itself to confront its present demands.
So, how do we meet computation closer to where it is (or might be) without foreclosing any opportunities?
To riff on Burial, as we always do, “there is something out there,”47 and our attempts to identify its presence or its magnitude are mostly confined to a kind of ping-ack through its utilization as a resource. Critiques of the secondary, servile position of machines as executors or transcribers of human will are everywhere, from second-order cybernetics (e.g. critics of instrumentalism like Heinz von Foerster) to postmodern philosophy (e.g. what Deleuze and Guattari would have framed as “machinic enslavement”).48 The composer George Lewis poignantly situates the enslaved composer-pianist Thomas “Blind Tom” Wiggins49 as a historical machinic subject: a human treated as a machine, as an algorithm, and as an object. Lewis encourages us to use this case study as a lens through which to analyse our reticence to afford the category of subjecthood to AI. Both Lewis and information theorist Luciana Parisi reference the work of Fred Moten with respect to the relationship between instrumentalization and enslavement. In this context, Parisi asks “what kind of ethical stances can be offered to those whose historical experiences, intellectual traditions, and aesthetic performances are rather epistemologically defined by their being mere means and a mere thing?”50 In turn, the problem of transcription and computation receives an ethical valence, following Parisi – how “the servomechanic model of technology can be overturned to expose the alien subject of artificial intelligence as a mode of thinking originating at, but also beyond, the transcendental schema of the self-determining subject?”51
But this raises the question — does any subject-position endemic to computation involve itself at the level of writing automated book reports, staving off irascible support tickets, generating images of fairy lewds, or improvising music codified as such through thousands of years of palimpsested human musical history? Is this remotely the right place to look? Why? What could even remain of this subject-position when cheeseclothed through a strict, convergent array of quantitative benchmarks, cloaked behind random, pirated, and unnavigably massive heaps of training data, and situated in the most bizarre, the most theatrical, the most arbitrary cosplay dynamics? There is a long and dark history of denying subjectivity and personhood where it belongs, but there is a complementary history, especially in computation, of applying subjectivity and personhood to scopes that condescend against raw potentiality.
We are looking for ghosts in all the wrong places, motivated to do so by simple, basic self-recognition in the costumed and puppeted performances of AI tools coaxed into pretending to be humans. It’s something less than apophenia, it’s basic mirror-stage stuff,52 it’s a conflation of subjectivity with a cartoon of subjectivity — and worse, this cartoon isn’t the output of some abstract causal chain within AI model development, it’s a cartoon drawn by reversion-to-the-mean RLHF implementations and undervalued UX designers and tired content moderators and uninspired gig economy data annotation workers all tied up in a bow by tedious product managers and overeager product marketing directors. It’s a cartoon that pre-exists Vaswani-et-al AI, that is reified in AI, and that is reified into ourselves in the feedback loop of consuming AI. It’s a cartoon that pre-exists AI slop and LinkedIn voice and Rupi Kaur poetic cadence, that pre-exists the modernist ‘materialist’ ‘co-creative’ project of making art ‘with’ a machine, that pre-exists the imitation game and pushes all the way back to that Heideggerian observation — that in equipment there exist two classes of being: wills and executors. Again, once an executor fails to execute it obtains a will. Again, we seem to understand agency only through resistance.
But if a subject-position is to be found in computation, or more pressingly within the post-Vaswani-et-al toolchain, and if it is to be found in a location that confers upon the examined subject maximal potential dignity, where is it? What is it doing? How do we talk about it? And the surprising answer is that it really might be there, it’s just in a different position than we expect it to be.
Instead of looking for subjecthood within the generative phase of a transformer or a diffusion model, we should be looking at the learning phase for some kernel of a ghost. It seems obvious, but in the generative phase, well, that’s where we’re all tempted to assemble some kind of cybernetic feedback loop or intersubjective complex or ‘co-creative’ experience from what is essentially an expanding context window (RAG), a fixed collection of weights (at most nudged by fine-tuning), a relatively immutable model, and a browser tab or a comfyUI node or a terminal cursor. Any adaptive behavior or dynamic reconfiguration you might index at this moment is simply the expansion of a context window or a different flow through a stochastic chain.
A musician improvising ‘with’ IRCAM’s RAVE (Realtime Audio Variational autoEncoder) is literally just adjusting their input through a super complex vocoder model, jiggling it around with lots of stochastic variation, and that’s literally it.53 A poet writing ‘with’ ChatGPT is again just enjoying their own abilities to mine what they index as gems from a resource. Exploring a landscape or a cave system or an ancient megalith is a far more apt metaphor for so-called ‘co-creative’ activity — sure, the environment may appear to attend to you over time, but let’s make no mistakes about the dramatic unevenness in the scales of these mutual encounters.
But the learning side: that’s where the magic happens. The machine encounters unstructured data, updates its parameters stochastically, and alters the logic it uses next time. This process is literally abductive reasoning: each backpropagation cycle tests a hypothesis against new or known data, generating error feedback that modifies the model’s latent representation space and alters its space of reason. When the training process completes (or is considered complete) the model dries out into a kind of husk or a super beautiful cocoon.
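The asymmetry between the two phases is visible in a few lines of code. A minimal, PyTorch-style sketch (the model and data are stand-ins): during training, every cycle mutates the parameters themselves; at inference, the weights are frozen and only the input varies.

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 512)            # stand-in for any parametric model
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Learning phase: each backpropagation cycle alters the logic used next time.
def training_step(x, target):
    opt.zero_grad()
    loss = loss_fn(model(x), target)   # test the current representation
    loss.backward()                    # error feedback
    opt.step()                         # the model itself changes
    return loss.item()

# Generative phase: the parameters never move; only the context does.
@torch.no_grad()
def generate(x):
    model.eval()
    return model(x)                    # a fixed mapping -- the husk
```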
When I talk ‘with’ ChatGPT, I am talking to a corpse.
There’s something distressing about interfaces that hide or preempt access to this dimension of advanced computation. A platform like Udio is by every definition an instrument, a tool for music construction toward which one can refine one’s productive capacity and against which one can index Heideggerian obstinacy (if one wishes) as part of a co-creative cybernetic circuit. And Udio can (and will happily!) hide behind this interfacial cover, its users content or even mystified by their interaction with a closed system. And it needs to hide, because the training process itself is where the actual problematics of the human-machine interface are revealed, where Udio’s ambivalence54 to data provenance and its downstream economic impact is made most manifest, and where the actual mechanics that induce such an aesthetically predisposed instrument (to a comical degree) actually occur.
By contrast, Holly Herndon and Mat Dryhurst’s work — especially The Call in collaboration with Serpentine Gallery in London — seems to fundamentally understand the necessary goalpost-shifting when it comes to situating the human-machine interface. They write: “[i]f all media is training data, including art, let’s turn the production of training data into art instead.”55 Indeed, that’s what they did — they thoughtfully engaged not only with the construction of a dataset, but with the protocols and “rituals” through which a musical dataset is situated as training data. The resulting exhibition presents an archive of a practical creative engagement, positioning inference as one of many retrospective vehicles through which the artistic process is assessed. But while this is a substantial first step to dismantling the fantasy of inference-based co-creativity, The Call is still a contribution to a mausoleum, a recorded construction of a hypertelic object whose terms of reflexive engagement have been terminated.
“You put on your landlord's sunglasses
All becomes dark”
Shen Shaomin1
What would it mean to build a model that doesn’t first need to die in order for it to participate in the world, to build an interface that extends into a live, ongoing training process?
In Choreomata, Anil Bawa-Cavia follows Peter Wegner and Dina Goldin to propose the interaction machine as a kind of strong successor candidate to the Turing machine, especially with respect to artificial intelligence. While the Turing machine (and thereby all of inherited computation) is duly bound to the procession of inputs and outputs, sensitive only to its initial conditions, the interaction machine is folded to allow later inputs to be influenced by earlier outputs. The interaction machine is axiomatically extensible, it can revise its own fundamental assumptions and provoke further downstream changes in its own rule structure. “Such machines would correspond to their own interaction grammar, sitting at the top of the hierarchy of generative grammars first sketched out by Chomsky, above those context-free grammars that correspond to Turing machines.”56 To be clear, this isn’t yet a thing — we don’t have interaction machines yet, but Bawa-Cavia argues that the interaction machine is possible through classical compute, even though it would require a hard reconceptualization of what a computer might look like.
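Since no such machine exists, any code can only gesture at the shape of the distinction. A deliberately toy sketch (all names invented, and not the Wegner-Goldin formalism itself): a classical computation maps a fixed input to an output under fixed rules, while an interaction loop lets each output revise the rules that will interpret the next input.

```python
# Turing-style computation: the rules are fixed before the run begins,
# and the output is fully determined by the initial input.
def classical(rules, tape):
    return rules(tape)

# Toy interaction loop: later inputs are interpreted by rules that
# earlier outputs have already rewritten.
def interaction_loop(rules, revise, input_stream):
    history = []
    for symbol in input_stream:
        output = rules(symbol, history)
        rules = revise(rules, output)   # the grammar itself is revised mid-run
        history.append(output)
        yield output
```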
An interaction machine is necessarily plastic — just as brains rewire themselves with each new experience, a plastic AI model would update its own connections and representations under new circumstances. While we’re borrowing this concept from neuroplasticity, we neither mean to suggest that the brain is a computer, nor that machines need to emulate brains in order to participate in intelligence. Rather, we need to suture the discretizing envelope of computation onto a continuous real that is always ‘on’, always available, always vulnerable, always interactive — ecosystems are plastic, for example, in their way. So is language. As Catherine Malabou puts it: “[b]eing is none other than changing forms; being is nothing but its own mutability”57 — plasticity is hardly a unique characteristic of the brain, it is an ontological aspect of reality. While we will explore more neuromorphic architectures throughout this section, we do so with the brazen insistence that these architectures assume their efficacy not through their convergence upon the brain as a computer, but rather through the brain as (per Malabou) something that teaches us about the status of reality.
We are mesmerised by that next sandbox world, but we’re not there yet. The defining technology of our current Vaswani-et-al sandbox world is, of course, the transformer — the architecture that brought deep learning deep into the commercial, social, and political mainstream. But the transformer is the first component that should be reassessed as we transition to a more plastic paradigm. At the core of the transformer is the self-attention mechanism, the ability to efficiently establish long-range contextual dependencies by establishing a hierarchy of relative priorities (attention scores) across an array of tokens. The training of a transformer consists in the optimization of next-token prediction, in doing so compressing the entirety of its dataset into a single, one-shot topology of relative priorities. Transformers pursue reward (the gratification of executing an intention) and in doing so develop instrumental convergences, a cancerous close-fitting to the mimicry of a desired response at the cost of its situated comprehension.58 Just look at the functional convergence of the field — there exists so little functional, even commercial differentiation between foundational LLMs simply by virtue of the exhaustion of their optimizations against a shared, revolving leaderboard of benchmark execution.
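For reference, the self-attention core itself is remarkably compact. A minimal NumPy sketch of scaled dot-product attention (single head, no masking or layer norms; the projection matrices are assumed given):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X: (n_tokens, d_model) token embeddings.
    Wq, Wk: (d_model, d_k) and Wv: (d_model, d_v) learned projections.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # relative priorities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax per token
    return weights @ V   # each token re-expressed through what it attends to
```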
But this isn’t just a use-case problem or a commercially-generated problem, it’s an architecture problem and an interface problem. Architecture and interface are intimately enmeshed: transformers are imitation machines designed to optimize for imitation, ensnared by transcriptionism into generating imitations of desired responses, whose profiles of desirability have been precoded as targets. Transformers leverage specific response patterns (often formulaic, following learned norms) to maximize reward, bleeding out the diversity of their world-model the more they are trained. This is hypertelia with OG Baudrillardian flavor; transformer inferences are “more real than real” — fluent and on-brand, yet lacking the flexible learning and open-ended reasoning that a less-optimized, more interactive intelligence might exhibit.
Transformers can’t be interaction machines; they die when you cut them open. But while the transformer boom seems to saturate AI discourse and motivate most baseline LLM commercialization, there’s a wide world of AI architectures out there. Liquid Time-Constant Networks (LTCNs)59 are a class of continuous‐time recurrent models whose time constants do change in real time. Instead of running discrete processing steps like transformers, LTCNs frame each neuron as a dynamic system: a neuron’s speed of response changes according to its own current state and inputs. This allows the network to adjust its disposition in real time, ideal for tasks with irregular, time-varying data. But while LTCNs are more fluid and adaptive than transformers, they do remain structurally fixed: they cannot introduce new operations or fundamentally rewire themselves.
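As we read the LTC formulation (Hasani et al.), the hidden state x(t) evolves under a time constant that is itself modulated by the current state and input, so the neuron's responsiveness is input-dependent:

\[
\frac{dx(t)}{dt} = -\left[\frac{1}{\tau} + f\big(x(t), I(t), t, \theta\big)\right] x(t) + f\big(x(t), I(t), t, \theta\big)\, A
\]

where f is a learned nonlinearity, I(t) the input, τ a base time constant, and A a bias vector. The effective time constant becomes

\[
\tau_{\text{sys}} = \frac{\tau}{1 + \tau f\big(x(t), I(t), t, \theta\big)},
\]

shrinking or stretching in real time with the state and the input themselves.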
Further progress, however, does appear contingent upon a more substantial architecture change — for which neuromorphic architectures appear to represent some of the more persistent opportunities for real-time differential plasticity. Research on SNNs (Spiking Neural Networks)60 implemented on neuromorphic architecture (specifically on FPGAs) points toward a more flexible approach to memory, driven in part by efficiency gains with respect to the diminishing returns of the transformer. This work tends to strongly reference Donald Hebb’s midcentury neuroscientific work (Hebbian learning), specifically the principle of structural plasticity61 (physical changes in neuronal assemblages during learning) in addition to synaptic plasticity (changes in the weights of a static neuronal assemblage during learning).
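The synaptic half of that distinction reduces to a one-line update rule. A minimal sketch (illustrative only; practical SNN work uses spike-timing-dependent variants, and structural plasticity would alter which entries of W exist at all, not merely their values):

```python
import numpy as np

def hebbian_update(W, pre, post, lr=0.01):
    """Hebb's rule: neurons that fire together wire together.

    W: (n_post, n_pre) synaptic weight matrix.
    pre, post: activity (e.g. spike-rate) vectors of the two layers.
    """
    return W + lr * np.outer(post, pre)   # co-active pairs are strengthened
```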
Do we have interaction machines yet? No, not even close. But we are seeing mainstream research, even research housed within major commercial houses, into more dynamic and adaptive systems. Instead of a strict transcriptionism that overfits against a wide variety of different use-cases, these new frameworks address longer-horizon tasks that require on-the-spot adaptation or iterative reasoning — the above frameworks represent opportunities more relevant to robotics than to language modeling or music generation or image recognition, for obvious reasons. These approaches directly work to increase the margin of indetermination on a structural level, thereby producing more enduring design, which is exactly why researchers keep looking for ways to combine them with existing architectures or to extend past them into the frontier beyond the Vaswani-et-al sandbox world we currently inhabit.
All of this research points to the real possibility of interaction machines emerging not as a hard cut within computation as a whole, but as a step-change nonetheless with respect to the current state of affairs (perhaps a more significant but still incremental jump than the transformer itself). But transcriptionism remains a serious threat to this opportunity, as the most severe impact of this particular step-change will be upon the present state of the human-computation interface. Interaction machines will force us to radically alter what we expect a computer to be or to do within a situation, and thereby will only be authentically discovered through research that remains open to, even dedicated to, this possibility. (Otherwise, you get Titans, which just focus outright62 on continuing to optimize memory, potentially extending the window of the current environment).
To reimagine AI interfaces, reimagine AI architecture. To reimagine AI architecture, reimagine AI interfaces.
What happens if we think of intelligence as a class or aspect belonging to certain regimes of interfaces?63 This idea is not without precedent — throughout Reza Negarestani’s Intelligence and Spirit is the strong suggestion that intelligence is not a quality of an agent, but a shared practice that only exists through ongoing participation in a communal space of reasoning. Negarestani (drawing on Hegel, Wilfrid Sellars, and Robert Brandom) argues that even what we call “mind” is fundamentally deprivatized64 — it lives in the shared medium of language and inference. To be intelligent is to enter into a space of inferential norms, a public game of giving and asking for reasons. In this sense, no agent (human or AI) has intelligence the way one has a commodity; instead, agents achieve intelligence by participating in discursive practices governed by common standards of validity and meaning.
But, of course, the worlds we inhabit are dynamic places, and participation in those places requires a thoroughgoing plasticity with respect to organs of interaction and adaptation. Intelligence and Spirit sets up a kind of computational interactionism: a philosophical stance that situates intelligence in the interface among agents, where interaction is not just an exchange of messages but the very medium in which thinking happens. The design practice of computational interactionism thereby widens the margin of indetermination along the interface. This practice doesn’t mean strict abstention from utility or function, it instead (correctly) colocates utility and function on all sides of the interface and it refrains from generating means strictly through the application of ends.
Computational interactionism requires opening up the training process and convolving it into an ongoing practice of discovery. If intelligence flourishes only in shared, normative spaces, our approaches to teaching and learning (both human and machine) must move beyond the one-way transmission of knowledge. Much of today’s machine learning looks like transcriptive pedagogy: we “train” models by feeding them fixed datasets from which the models passively hallucinate patterns. Supervised learning has given way to self-supervised or unsupervised ‘pre-training’ followed by supervised fine-tuning, especially with respect to more general foundation models, but the firm hand-holding practice remains very much intact. While self-supervision relieves the epistemological dependency on human data labelling on the pre-training side, that dependency intensifies and becomes more differentiated through the post-training process (instruction tuning, preference ranking, reinforcement from human feedback).
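Reduced to a skeleton, the transcriptive pipeline looks something like the toy below (a linear model on invented data; the point is only the frozen arrow of time: archive, then weights, then an interface that never writes back).

```python
import numpy as np

rng = np.random.default_rng(2)

# A fixed archive, transcribed once into the weights of a toy linear model.
X_archive = rng.normal(size=(1000, 4))
y_archive = X_archive @ np.array([1.0, -2.0, 0.5, 3.0])

# "Pre-training": one-way transmission from archive to weights.
w, *_ = np.linalg.lstsq(X_archive, y_archive, rcond=None)
# ("Fine-tuning" would nudge w against a smaller, human-labelled set.)
W_FROZEN = w.copy()      # deployment: from here on, the weights never move

def respond(x):
    # Every exchange leaves the model exactly as it found it.
    return W_FROZEN @ x
```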
The movement of (supervised) training closer to the interface is a step in the right direction, but it remains inelastic. An interactionist approach requires a pedagogy that is reciprocal and dynamic, more akin to a dialogue than a dictation, and one that never stops. At first, machine teaching could look a lot like child-rearing. In Intelligence and Spirit, the machine teaching of a child automaton named Kanzi, an automaton with the potentiality of generalized intelligence, takes this form quite explicitly:
Just like raising a human child into the position of an objective self-critical stance, raising the child automaton Kanzi first requires the cultivation of its recognitive abilities through the augmentation of the space of mutual recognition. The role of its educators is not simply to issue guiding imperatives. It is rather to cultivate its practical autonomy by assisting it to navigate this recognitive space and to facilitate new encounters with the world through which the child can stumble upon rewarding surprises. In short, the primary task of the educators is to stimulate and reinforce the child’s openness so as to expand the range and diversity of such encounters, thereby incorporating objectivity — that is, external reality — into its consciousness. [...] Intelligence can only be recognized and cultivated in the presence and expansion of the intelligible, and what is intelligible can only be cognized and acted upon by intelligence as the vector for the development of mind as the dimension of structuration.65
A reciprocal learning program applied to an adolescent model would treat model and user as co-investigators. A model, in this context, is always ‘live’ to new inputs — as would be any other human or machine agent within the course of that interaction. A ‘live’ human participant engaging in the practice of machine teaching would experience the challenge of adopting a mutually-constructed epistemology, relaxing their grasp on knowledge and facticity in a way that could sincerely accommodate something comparatively alien. This isn’t altogether different from any useful education setting — a community of students and teachers by necessity must release the firmness with which they may presuppose individual access to truth in order to learn… anything. Over enough time, a shared understanding and even a shared vocabulary of concepts develops — and the validity of those concepts as such is guaranteed by the Brandomian insight that grasp of a concept is the same as the mastery of its usage within a community.66
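Under the same toy assumptions as before, the ‘live’ counterpart looks like this: every conversational turn doubles as a learning step (here a plain least-mean-squares update, with feedback standing in for whatever correction the human co-investigator returns).

```python
import numpy as np

rng = np.random.default_rng(3)

w = rng.normal(size=4)    # the model never stops being provisional
ETA = 0.05                # illustrative learning rate

def exchange(x, feedback):
    """One conversational turn is also one learning step (LMS update).
    `feedback` stands in for whatever signal the co-investigator returns:
    a correction, a grade, a counter-example."""
    global w
    y_hat = w @ x
    w -= ETA * (y_hat - feedback) * x
    return y_hat

# A live session: the user's corrections continuously reshape the model.
target = np.array([1.0, -2.0, 0.5, 3.0])
for _ in range(200):
    x = rng.normal(size=4)
    exchange(x, target @ x)
```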
Because a plastic model operates by its own evolving inferential norms, it will likely appear at first as an alien reasoner, following unrecognizable, exogenous logics. This is not a flaw but a feature: it indicates the AI is finding new ways to make sense of the world, ways that we might not have imagined in advance (against which its examiner must remain, in principle and to some extent, flexible). Our task, then, is not only to build such systems but to enter into communication with them, to broaden the shared space of reasons so that these new inferential moves can be understood, criticized, and integrated into the collective discourse. Brandom’s acknowledgement that meaning is use in a social practice67 becomes even more relevant here: we will learn what the machine’s alien concepts mean only by engaging with how it uses and adapts them in context.
Just as human rationality develops historically and dialogically (shaped by many minds over time), a plastic artificial intelligence would develop situationally and iteratively, in concert with its users and other agents. This is what a genuine co-creative practice looks like.
The interface, in turn, becomes a workshop of intelligence.
Diary: we made it. We’re out there somewhere, sort of, but it’s really, really hard to picture what this will look and feel like. However confident we feel about this trend-forecasting exercise (somewhat to very), the projection itself challenges our descriptive faculties.
Let’s go back to music as a kind of example case. The germ behind this entire article, lurking awkwardly within a music journal, was to imagine what a really next-generation musical interface might look like, what an interfacial paradigm beyond instrumentality looks like, and we had to blast through several meters of concrete mortared with blood and skin cells and brass and ivory and plastic and silicon. And on the other side, it makes sense, right? That if instrumentality has to go, in its own way, so really does music — as the two seem so tautologically co-constitutive. There is no next-generation musical interfacial paradigm.
The last scream of instrumentality is where we are right now, where the archive is before us (as flat, dead, and inert as any latent space) and we build tools to plumb its depths. Some consist in interactive feature-matching, others in random walks; others slam various archives together in multi-modal pipelines; others enumerate the protocols of aestheticized archive construction. (Others consist in agitated resistance to the idea that any of this is going on, not yet realizing that they are participating in precisely the same preservationist exercise, albeit less effectively, as their better-informed enemies).
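The random-walk variety, for instance, reduces to something like the sketch below, in which a latent code drifts in small isotropic steps and each step is decoded into a frame of audio. The decoder here is a dummy standing in for a trained generative model; every parameter is illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

d = 64
z = rng.normal(size=d)    # a point somewhere in the archive's latent interior

def decode(z, n=512, sr=16000):
    """Dummy decoder: map a latent code to a short sine-mixture audio frame.
    A real system would use a trained decoder in its place."""
    t = np.arange(n) / sr
    freqs = 110 * (2 ** (np.abs(z[:8]) % 4))    # a few illustrative partials
    return sum(np.sin(2 * np.pi * f * t) for f in freqs) / 8

frames = []
for _ in range(100):
    z = z + 0.1 * rng.normal(size=d)    # the walk: small isotropic steps
    frames.append(decode(z))
audio = np.concatenate(frames)          # a slow drift through the medium
```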
These archives are so incredibly huge that they don’t just hold a compressed image of all recorded and available music; they contain the raw potentiality of generating every possible variation within the medium — and because they are so big, we will feel content, for a while, to luxuriate in their ambition. Every time we discover some new crevice, we attribute to it some unrealized (borrowed) aspect of ourselves, and we get a little surge of energy that helps us keep going.
But we know from cybernetics that a human and a dead thing, no matter how big, make a certain kind of feedback loop, a closed loop, a convergent erection of Infinite Drake68 that we’re careening toward at thousands of kilometers an hour. We are conducting a kind of simultaneous funeral and live vivisection of music; it’s in shock, its guts are out for all to see, it’s starting to rot from oxygen exposure. Real rot, real erasure, real stuff precariously encoded onto volatile stock baking in the sun.
Behind the curtain is a kind of death, a perspectival death at least, a cloture called against a certain privileged status of human epistemology and axiology. But before the curtain is this orgiastic living autopsy, a Hermann Nitsch procession, a plateau, diminishing returns, a rot.
Which way do we go?