Interactivity Through Motion Tracking and Machine Learning
The following information is based on a lecture and workshop given to the computer science department at the University of North Carolina Wilmington. The workshop began with a live demonstration of the programming that went into Signal and Noise and grew into hands-on work across several programming environments. This information is designed to provide a starting point for motion tracking and a guide to how visual programming and machine learning can be used for creative and interactive projects.
This workshop is built on a continued sharing of knowledge among programmers and practitioners. The end of this document contains a reference guide to the programmers who have helped me and who created some of the base tutorials covered here.
- Color tracking
- Motion to audio
- Motion detection
- Optical flow
Motion tracking can be used across numerous applications and open-source coding systems. I want to highlight several ways to engage with motion tracking, both for the benefit of creative practice and for its problem-solving potential.
Max allows a user to create interactive systems through visual programming. Like TouchDesigner, Max is shaped entirely by the user creating the program: it goes only as far as the thresholds and logic you assign. Max serves as a conduit for building other programs, including projects based around motion tracking, MIDI, sampling, animation, and more.
Color tracking is a simple program that opens further lines of thinking in relation to motion capture. In this color tracking program, you are:
- Connecting a live camera feed or external video to a Max window
- Gathering RGB data on each pixel within the video feed
- Accessing RGB data with each individual mouse click, automatically matching individual pixels to a color swatch
- Transferring the RGB data to a number system that is defined by a minimum and maximum threshold. These thresholds will create coordinates for the tracker to follow dependent on the color selected.
- Outputting these thresholds back to a duplicate window within Max
- Selecting a color/object to track
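Outside of Max, the threshold-to-coordinate step can be sketched in plain Python. This is not the patch itself, only an illustration of the logic: pixels whose RGB values fall between the minimum and maximum thresholds are collected, and their centroid becomes the coordinate the tracker follows (the frame layout and tolerance value here are assumptions for the example).

```python
def track_color(frame, target, tol=30):
    """Return the centroid (x, y) of pixels whose RGB values fall
    within +/- tol of the target color, or None if no pixel matches."""
    lo = tuple(c - tol for c in target)   # minimum threshold
    hi = tuple(c + tol for c in target)   # maximum threshold
    xs, ys = [], []
    for y, row in enumerate(frame):
        for x, (r, g, b) in enumerate(row):
            if lo[0] <= r <= hi[0] and lo[1] <= g <= hi[1] and lo[2] <= b <= hi[2]:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) // len(xs), sum(ys) // len(ys))

# A tiny 3x3 "frame": one orange pixel (the tracked object) on black
frame = [
    [(0, 0, 0), (0, 0, 0), (0, 0, 0)],
    [(0, 0, 0), (255, 120, 0), (0, 0, 0)],
    [(0, 0, 0), (0, 0, 0), (0, 0, 0)],
]
print(track_color(frame, target=(255, 120, 0)))  # (1, 1)
```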
This tutorial is included because of its simplicity and its utility as a lesson in how Max handles data and how a user can control that data. It is sourced from Programming with People. I have adapted the program to run with an external video but have kept the patch using the webcam within the program.
Figure 1 shows the visual programming in full. I've adapted the program from a walkthrough tutorial on Programming with People and added notes to better understand exactly what is happening throughout the patch.
Figure 2 shows the end product. I've used a looping video built into Max as an example. The basketball is being tracked.
Motion to Audio:
Initial research for Signal and Noise led to studying programs that output audio from motion. In these programs, you are expanding on the same ideas looked at in the color tracker. At a base level, you are:
- Connecting a live camera feed to a Max window
- Adjusting the threshold of the visual image to remove background noise while highlighting foreground motion
- Placing the motion within an x and y coordinate grid
- Assigning those x and y coordinates something to control (an audio piece, MIDI, etc.)
With these programs, you are figuring out how to output visual data into a numerical system that controls the parameters of something else. That parameter could be an on/off switch, speed, volume, etc.
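That final step, turning a coordinate into a control value, is at its core a linear mapping, the same job Max's [scale] object performs. A minimal Python sketch, with the input and output ranges chosen purely for illustration:

```python
def scale(value, in_min, in_max, out_min, out_max):
    """Linearly map a value from one range to another
    (the same idea as Max's [scale] object)."""
    value = max(in_min, min(in_max, value))  # clamp to the input range
    ratio = (value - in_min) / (in_max - in_min)
    return out_min + ratio * (out_max - out_min)

# Map an x position in a 640-pixel-wide frame to MIDI volume (0-127)
print(scale(320, 0, 640, 0, 127))  # 63.5
```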
We can start with a base program that triggers an audio piece as it registers a movement within a visual frame. The only parameter being controlled is the start of the audio. This program is sourced originally from Matthew Ostrowski. I have annotated the patch and changed the audio piece attached to it.
Figure 3 shows the visual programming in full. As the system detects a difference in positioning in the webcam feed, an audio piece begins playing. While the piece is playing, the gate is locked, meaning no new trigger can fire. As soon as the piece ends, the gate opens again, and motion outputs to audio.
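The gate's behavior can be restated as a small state machine in Python. This is a hypothetical sketch of the logic, not actual Max code: while the clip is "playing" the gate ignores motion, and once playback ends the next motion event triggers again.

```python
class AudioGate:
    """One-shot trigger: motion starts playback only while the gate is open.
    While the clip 'plays', further motion is ignored (sketch of the patch's
    gate logic; frame counts stand in for real playback time)."""
    def __init__(self, clip_length):
        self.clip_length = clip_length  # frames the clip lasts
        self.remaining = 0              # frames left before the gate reopens

    def on_frame(self, motion_detected):
        if self.remaining > 0:          # gate locked: clip still playing
            self.remaining -= 1
            return False
        if motion_detected:             # gate open and motion seen: trigger
            self.remaining = self.clip_length
            return True
        return False

gate = AudioGate(clip_length=3)
events = [gate.on_frame(m) for m in [True, True, True, True, True]]
print(events)  # [True, False, False, False, True]
```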
Tracking programs frequently output to a monochrome image in order to clearly differentiate backgrounds and movement. RGB data creates a noisier picture.
Our last example, shown in Figure 4, takes the same idea as Figure 3 and adds tighter control and further parameters. The program is built on a patch created by Georgiy Potopalskiy and, fortunately, documented in full here. In the documentation, Georgiy goes through each operator and what it specifically controls.
Just like before, we are taking our webcam data and converting it into a monochromatic image. We are also setting the matrix size to fit within a MIDI pitch scale. Instead of controlling the on/off of a particular audio piece, we are controlling a MIDI scale within Max.
The white line that transects our window becomes the point where motion (pixel data) is output to audio. Everything above and below this line is cut off. This is done by multiplying the matrix of the image with the matrix of the white line. The values along the transect are translated into the MIDI scale.
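The transect works like a one-row mask: multiplying the frame by the line zeroes everything except that row, and each bright column along it maps to a pitch. A Python sketch under assumed values (the note range and brightness threshold are illustrative, not taken from the patch):

```python
def line_to_midi(frame, row, low_note=48, high_note=72, threshold=128):
    """Keep only the transect row (the 'white line'), then map each
    bright column to a MIDI pitch across the given note range."""
    width = len(frame[row])
    notes = []
    for x, brightness in enumerate(frame[row]):  # only this row survives the mask
        if brightness > threshold:               # motion crosses the line here
            pitch = low_note + round(x / (width - 1) * (high_note - low_note))
            notes.append(pitch)
    return notes

# One-row example: motion at the far left and center of an 80-column frame
frame = [[0] * 80]
frame[0][0] = 255
frame[0][40] = 200
print(line_to_midi(frame, row=0))  # [48, 60]
```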
The purpose of these lessons with Max is to demonstrate visual programming and the logic behind creating a system that responds to parameters the user controls. The difficulty in starting Max is its openness. With time, the operators become familiar, and you become able to build systems around a specific goal. These motion tracking systems are built with nothing other than Max and a computer's webcam, and each can be expanded almost without limit to incorporate audio, visuals, projection, and fully interactive environments.
While other examples we will look at make use of visual programming completed by a user, machine learning systems are able to track movements and figures based on pose estimation and confidence scores. In a machine learning system, data is fed into a computer, which outputs a program/solution created by analyzing patterns within the data and building on those patterns. Open-source machine learning is fairly recent and prone to errors and bias depending on the use and system.
PoseNet is one of these models. Through the use of a webcam, external camera, or imported video, PoseNet is able to estimate figures within a frame, analyzing key points such as elbows, knees, shoulders, etc. This data is useful within its own right but can also be exported into other programs to create interactive environments. For example, PoseNet can be implemented into TouchDesigner and Max in order to create interactive spaces.
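The data PoseNet returns is roughly a list of named keypoints, each with a position and a confidence score. The values below are invented for illustration, but the shape of the data, and the habit of filtering on confidence before using a keypoint, carries over to real use:

```python
# The shape of PoseNet's output, approximated as plain Python data:
# each keypoint has a part name, an (x, y) position, and a confidence score.
pose = {
    "score": 0.91,  # overall confidence for the detected figure
    "keypoints": [
        {"part": "leftShoulder",  "position": (210, 140), "score": 0.97},
        {"part": "rightShoulder", "position": (330, 145), "score": 0.95},
        {"part": "leftElbow",     "position": (180, 230), "score": 0.42},
    ],
}

def confident_parts(pose, threshold=0.5):
    """Keep only the keypoints the model is reasonably sure about."""
    return [kp["part"] for kp in pose["keypoints"] if kp["score"] >= threshold]

print(confident_parts(pose))  # ['leftShoulder', 'rightShoulder']
```

Lowering the threshold (as in Figure 5) admits more keypoints at the cost of noisier, less reliable estimates.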
Figure 5 demonstrates PoseNet's figure estimation with the confidence threshold set low and the maximum number of recognized poses set high.
The source images are taken from Appalachian Spring (1944), choreographed by Martha Graham, and Lovers Leap, a project completed in 2023.
sounds.pink is a project created by Brian Ellis that works similarly to PoseNet. Through the website, users are able to run pose detection through their webcam and immediately send the data to Max and begin pulling detection values and confidence scores. Programs can be expanded upon within Max to output further data such as distance estimation between key points. sounds.pink can also be paired with Ableton to create audio accompaniment to motion and emotion recognition.
Figure 6 shows data pulled from a sounds.pink patch on Max. On the left, key points are given a value and a confidence score.
The right shows data pulled from the same patch with an added distance algorithm that estimates the distance between my two wrists at continuing intervals.
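Such a distance algorithm reduces to the Euclidean distance between two keypoint positions. A minimal sketch with made-up wrist coordinates:

```python
import math

def keypoint_distance(a, b):
    """Euclidean distance between two (x, y) keypoints, e.g. the wrists."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

left_wrist, right_wrist = (120, 300), (420, 300)
print(keypoint_distance(left_wrist, right_wrist))  # 300.0
```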
These programs are built for you as part of the sounds.pink download package. I have linked the website and Brian's starting tutorial as well.
Machine learning systems excel at providing data estimation. Unlike the Max tutorials, which worked off an XY grid, systems such as PoseNet and sounds.pink output numerical data that updates with every movement. However, as mentioned previously, machine learning is susceptible to bias introduced during the data gathering process. Systems such as PoseNet continually improve the accuracy of their output by training on larger and more comprehensive data sets.
TouchDesigner is a visual programming software that can be used to create interactive environments, generative artwork, and performances. Depending on how you learn, TouchDesigner can be more accessible to start out with. The system is based around nodes that you drop into your program, letting you build a linear process quickly. Nodes are separated into categories. The two categories most used when beginning are TOPs (texture operators: visual, image-based) and CHOPs (channel operators: processing of numeric data).
TouchDesigner has several built-in nodes for motion tracking and figure detection. These can be accessed directly through a webcam. The most straightforward of these programs is blob tracking, a way to highlight movement within a frame. While blob tracking does not immediately provide the same data that PoseNet can, it is able to detect figures within an area, which can then be used to control other parameters within TouchDesigner. The program is created with two nodes: a video device in and a blob tracking node. However, this system can perform with high variance depending on the environment and is limited by the version of TouchDesigner you are running. I would recommend experimenting with blob tracking, as it can be useful for detecting people within an interactive environment, but we'll start with a system that differentiates physical movement from static positioning. This program is a visualizer for some of the similar programs we created with Max. You are:
- Connecting a live camera feed to TouchDesigner
- Creating a second video channel that duplicates the live feed on a delay
- Setting a threshold value for visible pixels
Figure 7 shows the full system and the final display output. In Max, operators were used to subtract the previous frame's data from the current one. With TouchDesigner, we are doing the same thing manually.
The texture, time machine, and constant TOPs are creating a delay channel based on the live video feed. From there, the threshold is set for visible pixels. The goal of this threshold is to remove pixels from the background and the majority of pixels existing within facial and bodily edges.
Combined with the delay, this system creates a visual for motion detection.
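The same delay-and-threshold idea can be expressed as plain Python over grayscale pixel values. This is an illustration of the logic, not TouchDesigner's internals: any pixel that changed by more than the threshold between the delayed frame and the current one is drawn white, everything else black.

```python
def motion_mask(current, delayed, threshold=25):
    """Subtract a delayed frame from the current one and keep only pixels
    that changed by more than the threshold (white = motion, black = still)."""
    return [
        [255 if abs(c - d) > threshold else 0
         for c, d in zip(cur_row, del_row)]
        for cur_row, del_row in zip(current, delayed)
    ]

current = [[10, 10, 200], [10, 10, 10]]   # a bright pixel has appeared
delayed = [[10, 10, 10],  [10, 10, 10]]   # the frame from a few steps earlier
print(motion_mask(current, delayed))  # [[0, 0, 255], [0, 0, 0]]
```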
TouchDesigner's built-in Optical Flow operator is designed to frame difference automatically. Horizontal movement is sent to a red channel (X) and vertical movement to a green channel (Y). Movement is detected by comparing each frame to the one that directly precedes it. As in the last demonstration, staying completely still outputs a mostly blank image. As a subject moves, the channels begin to fill with visual information.
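One common way to picture this encoding: signed horizontal and vertical displacements per pixel, centered on mid-gray so that stillness reads as a flat image. The centering and scale below are assumptions for illustration, not TouchDesigner's exact numbers:

```python
def flow_to_rg(dx, dy, scale=4):
    """Encode a pixel's horizontal motion (dx) in the red channel and its
    vertical motion (dy) in the green channel, centered on mid-gray (128)
    so that zero motion produces a flat, featureless image."""
    red = max(0, min(255, 128 + dx * scale))
    green = max(0, min(255, 128 + dy * scale))
    return (red, green)

print(flow_to_rg(0, 0))    # (128, 128): no motion, flat gray
print(flow_to_rg(10, -5))  # (168, 108): moving right and up
```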
The following is less of a demonstration and more of a documentation of what Optical Flow can do without much manipulation. With three nodes, you are:
- Connecting a live camera feed to TouchDesigner
- Using Optical Flow to automatically frame-difference movement
- Setting Optical Flow (movement) to control a particle shader
Figure 8 shows the two main nodes, Optical Flow and particlesGpu.
Changing the force of the Optical Flow, as well as the magnitude of particles, determines the intensity of the particle displacement.
With more fine tuning to the environment, Optical Flow can be used to build out interactive spaces with TouchDesigner's built in functions.
Creating a linear program in TouchDesigner is made simpler by the built-in nodes and the drag-and-drop nature of the system. As with Max, once you grow more experienced with what TouchDesigner has built in, your projects will expand quickly. You might find that Max is more intuitive for audio projects while TouchDesigner suits video; however, both systems occupy similar spaces.
References and More Sources:
These are only several of an infinite number of resources and tutorials that exist online. For the most part, these systems are only limited by the goals brought to them.