AI Hand Tracker

Updated: 10 Jul 2026

Import ONNX models in the Google MediaPipe format to perform hand detection and hand landmark tracking

Method

This node uses ONNX models. Before using this node, ensure your machine is configured to support ONNX models by following the instructions in the Working with AI Models page.

This node provides hand tracking from a live video feed using the following input nodes:

The output can be linked to a Transform Array Source on an Array Cloner.

When connected to these output nodes, the node generates clones from a number of ‘hand landmarks’ produced by the hand landmark AI model.

This node has been developed to work with the Google MediaPipe Palm detection and hand landmark AI models. MediaPipe is an open-source framework developed by Google for building computer vision and machine learning pipelines.

The converted model ONNX files for use with this node can be downloaded below.

AI Model MediaPipe Hand Pose & Landmark Tracking

by Google. Prepared by Notch.

These two models provide palm detection and hand landmark detection. The palm detection model detects the presence of a hand and provides a bounding box around it. The hand landmark model takes the detected hand region and identifies specific landmarks on the hand, such as fingertips, knuckles, and wrist positions. By combining the outputs of these two models, the AI Hand Tracker can track hand movements and gestures in real time.
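The two-stage flow described above can be sketched as follows. This is only an illustration of how a detector and a landmark model are typically chained, not Notch's implementation: `track_hands`, `detect_palms`, `find_landmarks` and `TrackedHand` are hypothetical names, and the two model stages are passed in as plain callables so the sketch stays independent of any inference library.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed normalised
Point = Tuple[float, float, float]       # (x, y, z) for one landmark


@dataclass
class TrackedHand:
    box: Box                 # region reported by the palm detection stage
    landmarks: List[Point]   # points reported by the hand landmark stage


def track_hands(
    frame: object,
    detect_palms: Callable[[object], Sequence[Box]],
    find_landmarks: Callable[[object, Box], List[Point]],
    max_hands: int = 32,  # the node's documented maximum
) -> List[TrackedHand]:
    """Run stage 1 once per frame, then stage 2 once per detected hand."""
    hands = []
    for box in list(detect_palms(frame))[:max_hands]:
        hands.append(TrackedHand(box, find_landmarks(frame, box)))
    return hands
```

Because the landmark stage runs once per detected hand, the amount of inference work grows with the number of hands in the frame.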

Download Hand Pose Model

Download Hand Tracker / Landmark Model

License & Disclaimer

These models are derived from those available as part of the MediaPipe open source project. They are licensed under the Apache License, Version 2.0 (the “Licence”); you may not use these files except in compliance with the Licence. You may obtain a copy of the Licence at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the Licence is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Licence for the specific language governing permissions and limitations under the Licence.

Additional notice for MediaPipe files under tasks/cc/text/language_detector/custom_ops/utils/utf/: The authors of this software are Rob Pike and Ken Thompson. Copyright (c) 2002 by Lucent Technologies. Permission to use, copy, modify, and distribute this software for any purpose without fee is hereby granted, provided that this entire notice is included in all copies of any software which is or includes a copy or modification of this software and in all copies of the supporting documentation for such software. THIS SOFTWARE IS BEING PROVIDED “AS IS”, WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. IN PARTICULAR, NEITHER THE AUTHORS NOR LUCENT TECHNOLOGIES MAKE ANY REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.

Recreation Steps

This model has been prepared for use in Notch by converting the original MediaPipe models to ONNX format. The following steps were taken to prepare the model:

The original MediaPipe models were obtained from the MediaPipe open source project. For conversion, use the Hand Pose .tflite file and the Hand Landmark .tflite file.

The models were converted with the tf2onnx Python package. First install the required packages:

pip install tf2onnx tensorflow

Then run the following for each model, changing the source and destination filenames:
python -m tf2onnx.convert --opset 16 --tflite ".\<modelname>.tflite" --output ".\<modelname>.onnx"
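If several models need converting, the same command can be driven from a small Python script. The sketch below is one possible way to do that and assumes tf2onnx is already installed; the helper names `build_convert_command` and `convert_all`, and the file names in the usage note, are placeholders rather than the actual MediaPipe file names.

```python
import subprocess
import sys
from pathlib import Path


def build_convert_command(tflite_path, opset=16):
    """Build the tf2onnx command line for one .tflite file."""
    src = Path(tflite_path)
    dst = src.with_suffix(".onnx")  # same name, .onnx extension
    return [
        sys.executable, "-m", "tf2onnx.convert",
        "--opset", str(opset),
        "--tflite", str(src),
        "--output", str(dst),
    ]


def convert_all(tflite_paths):
    """Convert each model in turn, stopping at the first failure."""
    for path in tflite_paths:
        subprocess.run(build_convert_command(path), check=True)
```

For example, `convert_all(["palm_model.tflite", "landmark_model.tflite"])` would write `palm_model.onnx` and `landmark_model.onnx` next to the source files (both file names are hypothetical).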

The Palm Detection Model needs to be loaded as a resource and set in the ONNX Model (Hand Pose) resource property. The Hand Landmarks Model needs to be loaded as a resource and set in the ONNX Model (Hand Tracker) resource property.

Assigning the wrong ONNX model to either resource property will stop the node from working.

When the node is set up with the required ONNX models, hands can be detected and hand landmarks tracked.

Up to 32 hands can be tracked simultaneously. In practice, the AI model inference passes are expensive on both the CPU and GPU, so two or three pairs of hands is a more realistic target within real-time constraints.

The AI hand tracking model has the following characteristics:

Tracking quality is generally very good in reasonable lighting, but accuracy can degrade when a camera switches to low-light mode because the input becomes noisy
Extreme hand poses, or palms that face away from the camera, can cause the inference passes to fail because hand features are missing or occluded (either by other objects or by the hand itself)
The node is configured to detect hands best at a range of 2-3 m
The CPU and GPU cost of hand detection increases linearly with the number of hands tracked
If you observe jittering or misplaced landmark points, the lighting is likely too low, or the AI models are being confused by the setting or by adjacent objects in the video stream
The hand landmark model can sometimes detect hands that do not exist in the stream; adjusting the Pose Confidence Threshold, Pose Overlap Threshold and Landmark Confidence Threshold may improve the detection success rate
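To give a feel for what a confidence threshold and an overlap threshold usually do in a detection pipeline, here is a minimal sketch of confidence filtering followed by greedy non-maximum suppression. It is a generic illustration, not the node's internal code: the function names, the box layout and the use of intersection-over-union as the overlap measure are all assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    detections: List[Detection],
    confidence_threshold: float,
    overlap_threshold: float,
) -> List[Detection]:
    """Drop weak detections, then merge overlapping ones by keeping the strongest."""
    candidates = sorted(
        (d for d in detections if d[1] >= confidence_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept: List[Detection] = []
    for box, score in candidates:
        # A box that overlaps an already-kept box too much is treated as the same hand.
        if all(iou(box, kept_box) < overlap_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

In this sketch, raising the confidence threshold removes more false detections at the risk of missing real hands, while lowering the overlap threshold makes nearby detections more likely to be merged into one hand.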

Performance Diagnosis

Tracking information (both hand pose and landmark information) can be visualised, either when the node is in Previewing mode or within the 3D viewport.

2D visualisation can be shown when the ‘Preview in Viewport’ property is set to anything other than ‘Off’ and the Visualisation ‘Show Points Mode’ is set to ‘Viewport’; the landmarks and detection bounds will be visualised in the Preview image
3D visualisation can be shown when the Visualisation ‘Show Points Mode’ is set to ‘Local Space’; the landmarks will be visualised in the viewport as a 3D pointcloud

Parameters

Attributes

These properties control the core behaviours of the node.

Parameter	Details
Preview In Viewport	Preview the generated image as an overlay in the viewport. Off : No preview is generated. RGBA : Preview the image blended with alpha in the viewport. RGB : Preview the colour channels in the viewport. Alpha : Preview the alpha channel in the viewport. PIP : Preview the image blended with alpha in the viewport, in a smaller picture in picture display, on top of the existing content.
Apply PostFX Before Alpha Image Input (Legacy)	When enabled, the alpha input image is applied after the postfx pass, overwriting any effects the postfx would have applied to the alpha channel.
Active	Enables or disables the effect. Disabling the effect means it will no longer compute, so disabling a node when not in use can improve performance.
ONNX Model (Hand Pose)	Select the MediaPipe hand detection model (calculates hand position and rotation/orientation).
ONNX Model (Hand Tracker)	Select the MediaPipe hand landmark model (tracks hand points such as fingertips).
Pose Confidence Threshold	Minimum confidence for landmarks to be trusted. If the confidence falls below the threshold, the landmarks model has failed to detect the hands correctly (hand pose is too extreme for the model to detect key features).
Pose Overlap Threshold	Determines when overlapping detections are considered the same hand (the minimum non-maximum-suppression threshold for hand detection to be considered overlapped).
Landmark Confidence Threshold	Minimum confidence for hand landmarks to be trusted.
Cloning Mode	Control which landmarks are used as cloning points.
Cloning Output Mode	Control how the cloned points are transformed in space. 2D : XY plane. 2.5D : XY plane with depth. 3D : World space XYZ (note this mode assumes a fixed pinhole camera of 30 degrees for projection in space)
Cloning Inherits Rotation	Clones orientate with the bank of the tracked hand.
Cloning Inherits Scale	Clone scale is overridden by the size of the hand in the source input footage. Only available with ‘Palm’ Cloning Mode.
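The 3D Cloning Output Mode mentions a fixed pinhole camera of 30 degrees. As a rough illustration of what a pinhole unprojection looks like, the sketch below turns a normalised image position plus a depth into a camera-space point. It rests on several assumptions that the text above does not confirm: that the 30 degrees is a vertical field of view, that image coordinates run from 0 to 1 with the origin at the top left, and that depth is measured along the camera's forward axis. The function name `unproject` and the `aspect` parameter are hypothetical.

```python
import math


def unproject(u, v, depth, fov_degrees=30.0, aspect=16.0 / 9.0):
    """Map a normalised image point (u, v) at a given depth to camera space.

    Assumes fov_degrees is the vertical field of view (an assumption),
    u and v are in the range 0..1, and v grows downwards in the image.
    """
    half_height = depth * math.tan(math.radians(fov_degrees) / 2.0)
    half_width = half_height * aspect
    x = (u * 2.0 - 1.0) * half_width   # -half_width .. +half_width
    y = (1.0 - v * 2.0) * half_height  # flip so that up is positive
    return (x, y, depth)
```

With a narrow field of view such as 30 degrees, a point at the edge of the image is displaced sideways by only about a quarter of its depth (tan 15° ≈ 0.27) when the aspect ratio is 1.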

Visualisation

These properties control the visualisation of the tracked data.

Parameter	Details
Visualisation Mode	Select how tracking data is visualised. Off : Do not visualise any tracking data. Local Space : Visualise landmarks relative to the node’s local space. Viewport : Visualise landmarks in the preview space when using ‘Preview In Viewport’.
Show Detection Bounds	When previewing the node, visualise the bounding box of each tracked hand.
Show Detection Key Points	When previewing the node, visualise the tracking of key features.
Show Detection Landmarks	When previewing the node, visualise the landmark tracking information.

Time

These properties control the time at which the node is active. See Timeline for editing time segments.

Parameter	Details
Duration	Control the duration of the node’s time segment. Composition Duration : Use the length of the composition for the node’s time segment duration. Custom : Set a custom duration for the node’s time segment.
Node Time	The custom start and end time for the node.
Duration (Timecode)	The length of the node’s time segment (in time).
Duration (Frames)	The length of the node’s time segment (in frames).
Time Segment Enabled	Set whether the node’s time segment is enabled or not in the Timeline.

Inputs

Name	Description	Typical Input
Effect Mask	Mask out areas where Post-FX applied to this node should not take effect.	Video Loader
Alpha Image	Use a separate video node’s luminance values to overwrite the alpha channel of the image.	Video Loader
Parameter Value Array	Used to set the parameters of the node using a float array.