Complete head gaze interactions with the ring

Combine head gaze tracking and gesture recognition from the ring for seamless interactions in XR.


Phuoc Trinh

8/23/2023 · 4 min read

The prevalence of head gaze tracking in contemporary AR/VR devices offers intriguing potential for user interface interaction and virtual object engagement. However, this potential remains largely untapped, as standalone head gaze tracking falls short due to its inherent limitations. This article aims to showcase the transformative capacity of the SoundxVision ring in overcoming these constraints and unlocking the true power of head gaze tracking. To begin, we will explore the essence of an effective input system and delve into the symbiotic relationship between head tracking and gesture recognition.

Essentials of a good input system and where head tracking falls short

Our extensive research and observations underscore the core elements a proficient input system must address:

  1. Object of Intent Tracking: The system should adeptly track the user's desired object of interaction—be it a UI element, virtual entity, or even a tangible object in the real world that they wish to engage with.

  2. Trigger Mechanism: Upon identifying the object of intent, a trigger mechanism becomes imperative to facilitate interaction. This could involve actions like tapping, pinching, pressing a button, and so on.

  3. Navigation: Navigational gestures, such as scrolling and swiping on touch-sensitive surfaces or employing thumbsticks or pressing arrow buttons (left, right, up, down), are paramount. These gestures, commonplace in our interactions with graphical interfaces, retain their significance within XR devices.

A notable example of a robust XR input system that addresses all three points is the Apple Vision Pro, which integrates eye gaze and hand tracking. Here, eye tracking (with machine learning to filter out noise) identifies points of interest, a finger pinch acts as the trigger, and moving a hand through the air handles navigation. Standalone head gaze tracking, on the other hand, can only identify the object of intent; it lacks a mechanism to initiate interaction. To work around this limitation, some applications that use head gaze tracking require the user to hold their gaze on an object for a predefined duration to trigger it, as shown in the GIF below from Microsoft's gaze and commit guideline for HoloLens. Unfortunately, this method hurts responsiveness and restricts the range of possible interactions. This is where the SoundxVision ring can complete the experience.

A not-so-good example from the Microsoft HoloLens design guideline for gaze and commit interaction. Notice that the user needs to gaze at the object for approximately 2 seconds to select it, which is very slow compared to other input modalities.
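
To make the responsiveness gap concrete, here is a minimal sketch of dwell-based selection next to an explicit trigger. The class and function names are hypothetical, and the 2-second threshold is only taken from the approximate figure in the example above; this is not how HoloLens or the ring actually implements selection.

```python
from typing import Optional


class DwellSelector:
    """Gaze and commit by dwell: select a target once the gaze has rested on it long enough."""

    def __init__(self, dwell_seconds: float = 2.0) -> None:
        # Assumed threshold, based on the ~2 s observed in the example above.
        self.dwell_seconds = dwell_seconds
        self._target: Optional[str] = None
        self._since = 0.0

    def update(self, gazed_target: Optional[str], now: float) -> Optional[str]:
        """Call once per frame; returns the target name on the frame it gets selected."""
        if gazed_target != self._target:
            # Gaze moved to another target (or to nothing): restart the timer.
            self._target = gazed_target
            self._since = now
            return None
        if gazed_target is not None and now - self._since >= self.dwell_seconds:
            self._since = now  # re-arm so the target is not selected on every frame
            return gazed_target
        return None


def tap_select(gazed_target: Optional[str], tapped: bool) -> Optional[str]:
    """Gaze plus explicit trigger: select immediately when a tap arrives."""
    return gazed_target if tapped else None
```

With dwell, the earliest possible selection comes a full threshold after the gaze lands on the target, while the tap variant selects on the same frame the gesture is recognized.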


Head gaze tracking, when combined with the SoundxVision gesture recognition ring, can help users seamlessly interact with XR content. They can gaze at a desired object and use gestures to interact with it. For example, they can gaze at a picture and swipe (left to move to the next one, right to move to the previous one), or simply gaze at a button and double tap to activate or deactivate it—just as you already do on computers. It's very easy, isn’t it?
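
The "gaze, then gesture" flow described above could be wired up roughly as follows. This is a sketch under assumptions: the gesture names ("double_tap", "swipe_left", "swipe_right") and the `Target`, `Gallery`, `ToggleButton`, and `dispatch` names are invented for illustration and do not reflect the actual SoundxVision ring API.

```python
from typing import Callable, Dict, Optional


class Target:
    """Something the user can gaze at, with a handler per supported gesture."""

    def __init__(self, name: str, handlers: Dict[str, Callable[[], None]]) -> None:
        self.name = name
        self.handlers = handlers


class Gallery:
    """A picture viewer: swipe left for the next picture, right for the previous one."""

    def __init__(self, count: int) -> None:
        self.count = count
        self.index = 0

    def next(self) -> None:
        self.index = min(self.index + 1, self.count - 1)

    def previous(self) -> None:
        self.index = max(self.index - 1, 0)


class ToggleButton:
    """A button that a double tap activates or deactivates."""

    def __init__(self) -> None:
        self.active = False

    def toggle(self) -> None:
        self.active = not self.active


def dispatch(gazed: Optional[Target], gesture: str) -> bool:
    """Route a ring gesture to whatever the head gaze currently rests on.

    Returns True if the gesture was handled, False if nothing was gazed at
    or the gazed target does not support this gesture.
    """
    if gazed is None:
        return False
    handler = gazed.handlers.get(gesture)
    if handler is None:
        return False
    handler()
    return True


# Example wiring (hypothetical gesture names).
gallery = Gallery(count=3)
button = ToggleButton()
picture_target = Target("picture", {"swipe_left": gallery.next, "swipe_right": gallery.previous})
button_target = Target("button", {"double_tap": button.toggle})
```

Keeping the gaze target and the gesture as two separate inputs means the same small gesture vocabulary can drive any element the user looks at.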


1. It’s widely available: Head tracking relies mostly on an Inertial Measurement Unit (IMU), a sensor chip that is available even on the cheapest headsets and on your mobile phone, too. This chip is tiny and very power efficient. This means our “Gaze, tap, swipe” approach can be implemented on a wide range of XR devices, even lightweight smart glasses (39 g and less) where fitting eye and hand tracking sensors is a big challenge or sometimes not even possible.

2. A pointer can be implemented to interact with content built for computers and mobile devices (such as websites and applications). A pointer is practical because the human head is very stable. With eye tracking, by contrast, a pointer is often not available because of the fast and unexpected movements of our eyes, which cause the "Midas touch" effect: every object annoyingly changes its appearance as the user glances at it. SkarredGhost mentioned this when he tried out attempts to bring the Apple Vision Pro UI to the Oculus Quest Pro with eye tracking.

3. Power efficiency: as mentioned above, the IMU used for head tracking consumes very little energy (as low as 2.26 mW on some specific IMUs). The computing power needed to process head tracking data is also minimal compared to computer-vision-based input approaches.

4. Put the user's hand at rest, or anywhere: our approach to XR input uses micro gestures that require the least amount of hand movement, so that even with a hand in a pocket (a jacket pocket, please, not tight jeans), the user can still operate the XR device.

5. Privacy: an eye tracking heat map can reveal a lot about a user, even their unconscious mind. Head tracking data, on the other hand, is harder to exploit this way, as it does not pinpoint the user's points of interest.
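
The head pointer from point 2 can be sketched as a ray cast from the head onto a flat UI panel. The sketch assumes that yaw and pitch angles have already been estimated from the IMU and that the panel is a plane facing the user at a fixed distance; the function name and coordinate convention (+x right, +y up, +z forward) are illustrative choices, not part of any specific SDK.

```python
import math
from typing import Optional, Tuple


def head_pointer(
    yaw_deg: float, pitch_deg: float, plane_distance: float = 1.0
) -> Optional[Tuple[float, float]]:
    """Project the head's forward direction onto a panel `plane_distance` meters ahead.

    Returns the (x, y) pointer position on the panel in meters, or None when
    the head is turned so far that the forward ray never reaches the panel.
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)

    # Unit forward vector after applying yaw (left/right) and pitch (up/down).
    dx = math.sin(yaw) * math.cos(pitch)
    dy = math.sin(pitch)
    dz = math.cos(yaw) * math.cos(pitch)

    if dz <= 1e-6:
        return None  # looking parallel to the panel or away from it

    # Scale the ray so that it reaches the plane z = plane_distance.
    t = plane_distance / dz
    return (dx * t, dy * t)
```

Because head motion is comparatively slow and steady, the resulting point can drive a visible cursor directly; a real implementation would likely still add some smoothing to the angles.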


The integration of head gaze tracking and gesture recognition through the SoundxVision ring establishes a robust and dependable method for interactions within the XR environment. By incorporating a trigger mechanism (tapping) and gesture-based navigation (swiping), the ring significantly enhances the responsiveness of the user experience in comparison to relying solely on head gaze. This approach holds the potential to be applied across a diverse array of XR devices, ensuring a consistently seamless and immersive experience.

And just like other interaction modalities for XR, human physical constraints need to be considered when designing an application with head gaze tracking. For example, content should be targetable without putting the user's head in an awkward position, and UI elements such as buttons and sliders should be aligned in groups to minimize head movement. Microsoft provides good guidelines for head gaze based interaction here (aside from gaze and commit, it is a very good set of guidelines).
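
One way to check a layout against such constraints is to compute the head rotation each target would require and flag the ones outside a comfort range. The limits below (30° of yaw, 20° of pitch) are placeholder assumptions for illustration, not values from Microsoft's guidelines, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Position = Tuple[float, float, float]  # (x right, y up, z forward), meters from the head


def required_head_angles(position: Position) -> Tuple[float, float]:
    """Return the (yaw, pitch) in degrees needed to point the head at a position."""
    x, y, z = position
    yaw = math.degrees(math.atan2(x, z))
    pitch = math.degrees(math.atan2(y, math.hypot(x, z)))
    return yaw, pitch


def awkward_targets(
    targets: Dict[str, Position],
    max_yaw_deg: float = 30.0,    # assumed comfort limit, tune per application
    max_pitch_deg: float = 20.0,  # assumed comfort limit, tune per application
) -> List[str]:
    """List the targets that would force the head beyond the comfort limits."""
    flagged = []
    for name, position in targets.items():
        yaw, pitch = required_head_angles(position)
        if abs(yaw) > max_yaw_deg or abs(pitch) > max_pitch_deg:
            flagged.append(name)
    return flagged


layout = {
    "play_button": (0.2, 0.0, 1.0),
    "side_menu": (1.0, 0.0, 1.0),
    "ceiling_banner": (0.0, 1.0, 1.0),
}
```

Running `awkward_targets(layout)` on this sample layout flags the side menu and the ceiling banner, which both need about 45° of head rotation, and leaves the play button alone.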

A demonstration of the SoundxVision ring prototype interacting with UI in pass-through AR, using head gaze as the pointer, thumb double tap as the trigger, and thumb swiping for navigation. It is significantly more responsive compared to the example shown earlier.
