Gestura
Gestura is a real-time, bi-directional accessibility solution designed to bridge communication between Deaf and Hard-of-Hearing individuals and hearing users. It addresses the “double-blind” communication gap by translating Indian Sign Language (ISL) into text or speech, and converting spoken or typed language back into ISL through a 3D virtual avatar, enabling natural two-way interaction.
Category
Webdev
Year
2025

Overview
Communication barriers often arise because most accessibility solutions focus on only one direction—either sign-to-text or text-to-sign. Gestura addresses this “double-blind” gap by enabling seamless interaction in both directions. The platform combines computer vision, deep learning, and 3D animation standards to deliver accurate, privacy-first, and real-time translations suitable for everyday use.
ISL to Text Pipeline
Gestura captures live video input using a standard webcam and processes it entirely in the browser. Instead of relying on raw pixel data, the system uses MediaPipe to extract precise 3D landmarks from the hands and body, converting human motion into structured numerical data. These landmarks are analyzed over time by a deep learning model deployed using TensorFlow.js, allowing the system to recognize Indian Sign Language gestures with high accuracy.
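The following is a minimal sketch of how such a landmark-to-prediction loop can be wired up with the standard MediaPipe Hands and TensorFlow.js browser APIs. The model path, window length, and single-hand simplification are illustrative assumptions, not Gestura's actual configuration:

```ts
import { Hands, Results } from '@mediapipe/hands';
import * as tf from '@tensorflow/tfjs';

const WINDOW = 30;               // assumed number of frames per prediction window
const frames: number[][] = [];   // rolling buffer of flattened landmark frames

async function startRecognition(video: HTMLVideoElement): Promise<void> {
  // Hypothetical model path; the real vocabulary and topology are Gestura's own.
  const model = await tf.loadLayersModel('/models/isl/model.json');

  const hands = new Hands({
    locateFile: (f) => `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${f}`,
  });
  hands.setOptions({ maxNumHands: 2, modelComplexity: 1, minDetectionConfidence: 0.6 });

  hands.onResults((results: Results) => {
    const lm = results.multiHandLandmarks?.[0];
    if (!lm) return;
    // 21 hand landmarks × (x, y, z) -> 63 numbers per frame
    frames.push(lm.flatMap((p) => [p.x, p.y, p.z]));
    if (frames.length < WINDOW) return;
    tf.tidy(() => {
      const input = tf.tensor([frames.splice(0, WINDOW)]); // shape [1, 30, 63]
      const signId = (model.predict(input) as tf.Tensor).argMax(-1).dataSync()[0];
      console.log('recognized sign id:', signId);
    });
  });

  // Feed webcam frames to MediaPipe on each animation tick.
  const tick = async () => {
    await hands.send({ image: video });
    requestAnimationFrame(tick);
  };
  requestAnimationFrame(tick);
}
```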
To improve reliability and responsiveness, a Random Forest classifier ensembles many decision trees over the landmark features, making recognition robust to the variations in hand size, lighting, and camera angle that would cause a single model to overfit. Pruning logic inspired by Alpha-Beta search further optimizes inference by discarding impossible sign sequences early, significantly reducing latency and CPU usage.
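The pruning step can be pictured as a whitelist of legal sign transitions applied before any heavier processing. A minimal sketch, where the transition table and confidence floor are hypothetical stand-ins rather than Gestura's actual decoder:

```ts
// Sketch: early elimination of impossible sign sequences.
// allowedNext and MIN_PROB are illustrative assumptions, not the real grammar.
const allowedNext: Record<string, Set<string>> = {
  HELLO: new Set(['HOW', 'YOU', 'GOOD']),
  HOW: new Set(['YOU']),
};

const MIN_PROB = 0.15; // assumed confidence floor

function pruneCandidates(
  prevSign: string,
  scored: Array<[sign: string, prob: number]>,
): Array<[string, number]> {
  // Discard candidates that cannot legally follow prevSign or are too
  // unlikely to be worth scoring further; all downstream work shrinks
  // accordingly, which is where the latency savings come from.
  return scored.filter(
    ([sign, prob]) => prob >= MIN_PROB && (allowedNext[prevSign]?.has(sign) ?? false),
  );
}
```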
Text and Speech to ISL Pipeline
The output pipeline focuses on generating expressive, editable sign language rather than relying on static videos. Spoken or typed input is converted to text and mapped to SiGML (Signing Gesture Markup Language), an XML-based standard that describes the phonetic structure of signs, including hand shape, orientation, and movement.
This SiGML data drives a 3D avatar engine to generate smooth, real-time signing animations. Because SiGML is text-based, the system is lightweight, scalable, and easy to modify, avoiding the storage and rigidity limitations of pre-recorded sign videos.
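To make the data-driven approach concrete, here is a minimal sketch of a gloss-to-SiGML lookup. The lexicon entry and HamNoSys symbol tags are illustrative placeholders in the general SiGML style, not Gestura's actual sign definitions:

```ts
// Sketch: mapping text glosses to SiGML fragments for the avatar engine.
// The entry below is a placeholder; real entries encode full HamNoSys
// phonetics (hand shape, orientation, location, movement).
const sigmlLexicon: Record<string, string> = {
  HELLO: `<sigml>
    <hns_sign gloss="HELLO">
      <hamnosys_nonmanual/>
      <hamnosys_manual><hamflathand/><hamextfingeru/><hampalml/></hamnosys_manual>
    </hns_sign>
  </sigml>`,
};

function textToSigml(sentence: string): string[] {
  // One small, editable SiGML document per gloss; unknown words are skipped.
  return sentence
    .toUpperCase()
    .split(/\s+/)
    .map((gloss) => sigmlLexicon[gloss])
    .filter((s): s is string => Boolean(s));
}
```

Because each entry is a few hundred bytes of text rather than a video file, extending the vocabulary means adding lexicon entries, not re-recording footage.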
Emotion-Aware Signing
Gestura enhances communication by incorporating emotion detection into the translation process. Vocal cues such as tone and urgency are analyzed and translated into non-manual markers within the SiGML output. This allows the avatar to dynamically adjust facial expressions, motion intensity, and signing speed, resulting in more natural and human-like communication.
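One way to picture the mapping from vocal affect to signing style is a small style table; the emotion labels and parameter values below are purely illustrative assumptions:

```ts
// Sketch: folding a detected vocal emotion into the avatar's signing style.
// Labels, presets, and numeric ranges are assumptions for illustration.
type Emotion = 'neutral' | 'happy' | 'urgent' | 'sad';

interface SigningStyle {
  speed: number;     // playback-rate multiplier for the signing animation
  intensity: number; // movement amplitude scale
  face: string;      // non-manual marker preset applied to the avatar's face
}

function styleFor(emotion: Emotion): SigningStyle {
  switch (emotion) {
    case 'happy':  return { speed: 1.1,  intensity: 1.2, face: 'smile' };
    case 'urgent': return { speed: 1.3,  intensity: 1.4, face: 'brow-raise' };
    case 'sad':    return { speed: 0.85, intensity: 0.8, face: 'brow-furrow' };
    default:       return { speed: 1.0,  intensity: 1.0, face: 'neutral' };
  }
}
```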
Performance, Privacy, and Scalability
All gesture recognition and inference run locally in the browser using TensorFlow.js and MediaPipe, ensuring that no video data is transmitted to external servers. This privacy-first approach makes Gestura suitable for sensitive and real-world use cases. By leveraging SiGML instead of video assets, the platform remains highly scalable, requiring minimal storage while supporting a growing vocabulary. Hardware acceleration through WebGL and optimized search logic enables smooth real-time performance at 30+ FPS.
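A minimal sketch of the on-device setup using the standard TensorFlow.js backend API; the CPU fallback shown is our assumption:

```ts
// Sketch: keep all inference on-device with GPU acceleration.
import * as tf from '@tensorflow/tfjs';

async function initLocalInference(): Promise<void> {
  // Prefer WebGL; tf.setBackend resolves to false if the backend is unavailable.
  if (!(await tf.setBackend('webgl'))) {
    await tf.setBackend('cpu'); // slower, but still fully local
  }
  await tf.ready();
  // Capture, landmark extraction, and classification all run in this tab,
  // so no video frame is ever transmitted to a server.
  console.log('TF.js backend:', tf.getBackend());
}
```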
Key Highlights
Real-time, bi-directional ISL communication
Browser-based computer vision with no server-side video processing
3D avatar-driven ISL generation using SiGML
Emotion-aware signing for natural interaction
Privacy-first and highly scalable architecture
Tech Stack
Computer Vision: MediaPipe
Machine Learning: TensorFlow.js
3D & Animation: SiGML, Avatar-based signing engines
Frontend: Web-based, WebGL accelerated
