Sound x Vision - 10 challenges of hand tracking


Hand tracking enables gesture-based interfaces, which offer many benefits and an entirely new way to interact. This type of input, however, introduces challenges that do not arise with traditional input. Users must learn, remember, and precisely perform gestures.

The developer must create a system that recognizes these gestures accurately. Observing gestures alone is not enough to learn them, because an observer cannot distinguish relevant movements from irrelevant ones. As a result, the developer must not only guarantee that gestures are recognized quickly and correctly, but also provide a guide that makes them quick and easy to learn. Multi-touch and mid-air gestures are more challenging to teach than single-touch gestures.

Hand models. (a) Capsule model. (b) Cylinder model. (c) Sphere model. (d) Convex bodies for tracking. (e) Sum of anisotropic Gaussians model. (f) Sphere-mesh model. (g) Triangular hand model. (h) Triangular mesh. (i) Loop subdivision surface of a triangular control mesh. (j) Articulated signed distance function (SDF) for a voxelized shape-primitive hand model. (k) Articulated signed distance function for a hand model / Source: Sensors 2019

Although hand tracking research has progressed significantly and achieved high recognition rates in a variety of areas, it still faces numerous challenges. These include extracting invariant features, modeling transitions between gestures, defining minimal sign language recognition units, segmenting those units automatically, building recognition approaches that scale with vocabulary size, using auxiliary information, and recognizing signer-independent and mixed gestures.

Here are some challenges that hand tracking technology confronts:

1. Degrees of freedom (DoF) limitations: Because the hand is an elastic object, there may be significant variation between identical gestures as well as great similarity between distinct gestures. The human hand has more than 25 DoF (by comparison, a computer mouse is a 2-DoF device, and current VR headsets and input devices such as hand controllers are generally 3-DoF or 6-DoF), so its movement is extremely flexible and complex. Hence, the same gesture performed by different people can differ, as can gestures made by the same person at different times or places.

Full degrees of freedom of a hand. The model uses 26 DoF: 3 for the translations of the hand in the x, y, and z directions, 3 for the wrist rotations, and 4 for the joint rotations of each of the five fingers, including the thumb / Source: Holden, Eun-Jung & Owens, Robyn & Roy, Geoffrey. (1999). “3D Hand Tracker for Visual Sign Recognition”
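The 26-DoF count in the caption above can be checked with a short tally. The sketch below is only an illustration: the group names and the idea of laying the parameters out as one flat pose vector are assumptions made here, not part of the cited hand model.

```python
# Tally of the 26-DoF hand model described in the caption above.
# All names are illustrative assumptions, not taken from the cited paper.
FINGERS = ["thumb", "index", "middle", "ring", "little"]
JOINT_ROTATIONS_PER_FINGER = 4

dof_groups = {
    "hand_translation": 3,  # x, y, z
    "wrist_rotation": 3,
}
for finger in FINGERS:
    dof_groups[finger] = JOINT_ROTATIONS_PER_FINGER

total_dof = sum(dof_groups.values())

# Give each group a (start, end) slice of a flat pose vector.
# This layout is hypothetical; real trackers order parameters differently.
layout = {}
offset = 0
for name, count in dof_groups.items():
    layout[name] = (offset, offset + count)
    offset += count

print(total_dof)
print(layout["index"])
```

A tracker built on such a model has to estimate every entry of this vector per frame, whereas a mouse reports only two values.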

2. Information redundancy: Hand images carry a lot of redundant information. Because a crucial aspect of hand gesture recognition is identifying finger features, palm features are one example of redundant data.

Distance feature computed on gestures of the Sign Word dataset / Source: Ansar, Hira, Ahmad Jalal, Munkhjargal Gochoo, and Kibum Kim. (2021). "Hand Gesture Recognition Based on Auto-Landmark Localization and Reweighted Genetic Algorithm for Healthcare Muscle Activities"

3. Projection direction: The hand is projected from three-dimensional space onto a two-dimensional image, so its appearance depends on the projection direction, which is closely tied to the hand position.

4. Scale problem: This issue arises when hand poses appear at different sizes in the gesture images.

5. Translation problem: Variation in hand position across images also causes features to be represented incorrectly.

Gesture representation and feature extraction / Source: Chakraborty, Biplab & Sarma, Debajit & Bhuyan, Manas & MacDorman, Karl. (2017). “A Review of Constraints on Vision-based Gesture Recognition for Human-Computer Interaction”
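The scale and translation problems (challenges 4 and 5) are commonly reduced by normalizing extracted features before classification. The sketch below assumes the hand is already described by a list of 2D landmark points, which is a hypothetical input since the article does not specify a feature representation. It subtracts the centroid to remove translation and divides by the largest distance from the centroid to remove scale. It does not address rotation or projection direction.

```python
import math

def normalize_landmarks(points):
    """Return a translation- and scale-normalized copy of 2D landmarks.

    points: non-empty list of (x, y) tuples (hypothetical hand landmarks).
    """
    n = len(points)
    # Remove translation: move the centroid to the origin.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # Remove scale: the farthest landmark ends up at distance 1.
    scale = max(math.hypot(x, y) for x, y in centered)
    if scale == 0:
        return centered  # all points coincide; nothing to rescale
    return [(x / scale, y / scale) for x, y in centered]

# The same shape, drawn three times larger and shifted in the image.
small = [(0, 0), (2, 0), (2, 1), (0, 1)]
large_shifted = [(3 * x + 10, 3 * y - 4) for x, y in small]

normalized_small = normalize_landmarks(small)
normalized_large = normalize_landmarks(large_shifted)
```

After normalization both point sets describe the same coordinates up to floating-point error, so a classifier sees one gesture rather than two.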

6. Variation of illumination conditions: Any change in lighting has a detrimental effect on the extracted hand skin region.

7. Background problem: A complex background contains other objects alongside the hand, some of which may have a skin-like color, causing misclassification.

8. Shadow problem: Because the surface of the hand is not smooth, shadows form on it easily.

Example of segmentation of hand gesture recognition systems / Source: Khan, Rafiqul Zaman. (2012). “Comparative Study of Hand Gesture Recognition System”
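One widely used way to soften the illumination problem (challenge 6) is to work with normalized color, dividing each channel by the total intensity so that a uniform brightness change cancels out. The function below is a minimal sketch, assuming RGB pixel values given as non-negative numbers; the name `rg_chromaticity` is chosen here for illustration. It does not handle colored light, clipped highlights, or the skin-like backgrounds of challenge 7.

```python
def rg_chromaticity(r, g, b):
    """Return the (r, g) chromaticity of an RGB pixel.

    Each channel is divided by the channel sum, so multiplying all three
    channels by the same brightness factor leaves the result unchanged.
    Black pixels are mapped to the neutral point (1/3, 1/3).
    """
    total = r + g + b
    if total == 0:
        return (1 / 3, 1 / 3)
    return (r / total, g / total)

# The same surface color under full light and under half the light.
bright = rg_chromaticity(200, 120, 100)
dim = rg_chromaticity(100, 60, 50)
print(bright, dim)
```

Thresholding skin in this normalized space rather than in raw RGB is one option, though suitable thresholds depend on the camera and dataset and are not given here.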

9. Fatigue: Hand gestures, especially mid-air gestures, use more muscles than other forms of interaction, and gestures that require muscle tension and complex movements over a long period of time can be exhausting.

10. Memorability: Hand gestures must be memorized and recalled before they can be executed, whereas conventional commands only need to be recognized. Many mid-air gestures used in everyday life also differ significantly from country to country; for example, waving means goodbye in Italy but "come over" in America. In Malaysia, pointing with the index finger is considered impolite, so people point with the thumb instead. Another example is the VR headset gesture of pushing the hands outward to zoom in on virtual objects, which can puzzle users accustomed to the smartphone's pinch-to-zoom gesture.

30/11/2021

Author: Nam Pham