YOLO Tracker

Updated: 27 Jun 2026

Detect and track objects in images and video streams using a YOLO AI model.

Example .dfx

Method

This node analyses an input image or video stream and identifies objects within the frame using a YOLO (You Only Look Once) object detection model. YOLO is a real-time object detection approach that uses a single neural network to process an entire image in one forward pass. It directly predicts bounding boxes, objectness scores, and class probabilities without relying on separate region proposal steps. This design makes it very fast and well suited for real-time applications such as video analysis.

For each detected object, the node generates a transform based on its position and size within the image. These transforms can be passed into a Cloner to generate and control clones that correspond to the detected objects.
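As a rough illustration of how a bounding box might map to a transform, the sketch below converts a pixel-space box into a normalised position and scale. The `Transform2D` type, the `box_to_transform` function, and the 0-to-1 coordinate convention are assumptions made for this example; the node's internal representation is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Transform2D:
    # Hypothetical container for illustration only; not a Notch type.
    x: float        # box centre, as a fraction of image width
    y: float        # box centre, as a fraction of image height
    scale_x: float  # box width, as a fraction of image width
    scale_y: float  # box height, as a fraction of image height


def box_to_transform(x_min, y_min, x_max, y_max, image_w, image_h):
    """Convert a pixel-space bounding box into a normalised transform."""
    centre_x = (x_min + x_max) / 2.0
    centre_y = (y_min + y_max) / 2.0
    return Transform2D(
        x=centre_x / image_w,
        y=centre_y / image_h,
        scale_x=(x_max - x_min) / image_w,
        scale_y=(y_max - y_min) / image_h,
    )


# A 320x240 box centred in a 640x480 frame.
example = box_to_transform(160, 120, 480, 360, 640, 480)
print(example)
```

Normalising by the image size keeps the result independent of the working resolution, which is one reasonable choice when the values drive clones in a scene.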

The Confidence Threshold property can be adjusted to control how confidently an object must be detected before it is included in the output.
Temporal Coherence can also be adjusted to improve continuity between frames, helping to maintain stable detections and reduce flickering or rapid changes in detection results.
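The two controls above can be sketched in a few lines. The example below is a minimal illustration, not the node's implementation: it assumes detections are dictionaries with a `confidence` key, and it uses a simple moving average over recent frames as a stand-in for temporal smoothing. The function names `filter_by_confidence` and `smooth_box` are hypothetical.

```python
def filter_by_confidence(detections, threshold):
    """Keep only detections whose confidence exceeds the threshold."""
    return [d for d in detections if d["confidence"] > threshold]


def smooth_box(history, new_box, max_frames):
    """Average a box over the last `max_frames` frames (max_frames >= 1).

    `history` is a list of (x_min, y_min, x_max, y_max) tuples for one
    tracked object and is updated in place.
    """
    history.append(new_box)
    del history[:-max_frames]  # drop frames older than the window
    count = len(history)
    return tuple(sum(box[i] for box in history) / count for i in range(4))


detections = [
    {"label": "person", "confidence": 0.91},
    {"label": "dog", "confidence": 0.32},
]
kept = filter_by_confidence(detections, threshold=0.5)

history = []
first = smooth_box(history, (0, 0, 10, 10), max_frames=2)
second = smooth_box(history, (2, 2, 12, 12), max_frames=2)
third = smooth_box(history, (4, 4, 14, 14), max_frames=2)
print(kept, first, second, third)
```

A longer averaging window gives steadier boxes but makes them trail behind fast-moving objects, which matches the stability and latency trade-off noted for Temporal Coherence Frames.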

This node has been developed to work with the YOLO object detection AI model, which can be downloaded below.

License & Disclaimer

This model is derived from those available as part of the Ultralytics open source project. You can find the accompanying recreation steps below in Recreation Steps, and the model above via the Download button. They are released under version 3 of the GNU Affero General Public License (the “Licence”); you may not use these files except in compliance with the Licence. You may obtain a copy of the Licence at https://github.com/ultralytics/ultralytics/blob/main/LICENSE.

Unless required by applicable law or agreed to in writing, software distributed under the Licence is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Licence for the specific language governing permissions and limitations under the Licence.

Recreation Steps

This model has been prepared for use in Notch by converting the original YOLO models to ONNX format. The following steps were taken to prepare the model:

The original YOLO models were obtained from the Ultralytics open source project.

The models were converted using the following Python script:

from ultralytics import YOLO

model = YOLO("yolo11n-seg.pt")  # path to your pretrained or custom YOLO11 model
model.export(format="onnx")

The exported YOLO model must be loaded as a resource and assigned to the ONNX Model resource property of the node.

Setting the ONNX Model resource property to the incorrect ONNX model will result in the node not working correctly.

Parameters

Attributes

These properties control the core behaviours of the node.

Parameter	Details
Preview In Viewport	Preview the generated image as an overlay in the viewport. Off : No preview is generated. RGBA : Preview the image blended with alpha in the viewport. RGB : Preview the colour channels in the viewport. Alpha : Preview the alpha channel in the viewport. PIP : Preview the image blended with alpha in the viewport, in a smaller picture in picture display, on top of the existing content.
Apply PostFX Before Alpha Image Input (Legacy)	When enabled, the alpha input image is applied after the postfx pass, overwriting any effects the postfx would have applied to the alpha channel.
Active	Enables or disables the effect. Disabling the effect means it will no longer compute, so disabling a node when not in use can improve performance.
ONNX Model	Select a YOLO model from the resource browser.
Input Normalisation	Scale the range of values coming into the model. None : Do not scale the incoming values. ImageNet Dataset : Scale the values to work with the ImageNet dataset specification. CLIP Dataset : Scale the values to work with the CLIP dataset specification. Bias Half : Offset the incoming value range to centre around zero. Bias Full : Set the incoming value range to ± the range maximum.
Input Resizing	How the input texture is resized to fit the target size. Scale : Resize the input image to the target size, even if it distorts the original aspect ratio. Crop : Cut out a portion of the image to fit the target size without distortion. Letterbox : Resize the image while keeping its aspect ratio, then add padding to fill the remaining space.
Output Resizing	How the texture is resized on the output after it has been processed by the YOLO model. None : Output at the working resolution of the YOLO model. Input Size : Resize the output resolution to the same size as the input resolution.
Segmentation	Generates a binary mask for each detected object, identifying which pixels belong to the object and which belong to the background. Disabled : Do not calculate the segmentation mask for objects. False Colours : Calculate mask for each object, and give each object mask a different random colour. Mask : Calculate mask for each object, and combine them all into a single black and white mask.
Max Detections	Limits the maximum number of detected objects.
CLIP Class	Filter object detections by CLIP class object category definition.
Confidence Threshold	Only detections with a confidence score above this threshold are kept, while lower-confidence ones are discarded to reduce false positives.
Match Threshold	Decides how much two boxes must overlap to count as the same object. If the overlap is high enough, they’re treated as a match; if not, they’re considered different objects.
Temporal Coherence	Ensures detections remain consistent across consecutive frames by smoothing or linking predictions over time, reducing flicker and sudden changes.
Temporal Coherence Frames	Specifies the number of previous frames used when calculating temporal coherence. Higher values can improve stability, but may introduce latency.
Temporal Match Threshold	Sets how similar detections must be across consecutive frames to be considered the same object, helping maintain consistent tracking over time.
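To make the overlap-based thresholds more concrete, the sketch below computes intersection over union (IoU) for two boxes and uses it to drop duplicate detections. This is a common approach in object detection pipelines and is shown here as an assumption; the node's exact overlap measure is not stated above. The names `iou` and `suppress_duplicates` are hypothetical, and boxes are assumed to be (x_min, y_min, x_max, y_max) tuples.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def suppress_duplicates(detections, match_threshold, max_detections):
    """Greedily keep the highest-scoring boxes that do not match a kept box.

    `detections` is a list of (box, score) pairs. Two boxes count as the
    same object when their IoU reaches `match_threshold`.
    """
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if len(kept) >= max_detections:
            break
        if all(iou(box, kept_box) < match_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),    # overlaps heavily with the first box
    ((20, 20, 30, 30), 0.7),  # separate object
]
overlap = iou(detections[0][0], detections[1][0])
result = suppress_duplicates(detections, match_threshold=0.5, max_detections=10)
print(overlap, result)
```

The same overlap measure could also be applied between a box in the current frame and a box from an earlier frame, which is one way to interpret the Temporal Match Threshold.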

Time

These properties control the time at which the node is active. See Timeline for editing time segments.

Parameter	Details
Duration	Control the duration of the node’s time segment. Composition Duration : Use the length of the composition for the node’s time segment duration. Custom : Set a custom duration for the node’s time segment.
Node Time	The custom start and end time for the node.
Duration (Timecode)	The length of the node’s time segment (in time).
Duration (Frames)	The length of the node’s time segment (in frames).
Time Segment Enabled	Set whether the node’s time segment is enabled or not in the Timeline.

Inputs

Name	Description	Typical Input
Effect Mask	Mask out areas where Post-FX applied to this node won’t take effect.	Video Loader
Alpha Image	Use a separate video node’s luminance values to overwrite the alpha channel of the image.	Video Loader