Gestura
Gestura is a real-time, bi-directional accessibility solution designed to bridge communication between Deaf and Hard-of-Hearing individuals and hearing users. It addresses the “double-blind” communication gap by translating Indian Sign Language (ISL) into text or speech, and converting spoken or typed language back into ISL through a 3D virtual avatar, enabling natural two-way interaction.
Category
Webdev
Year
2025

Overview
Communication barriers often arise because most accessibility solutions focus on only one direction—either sign-to-text or text-to-sign. Gestura addresses this “double-blind” gap by enabling seamless interaction in both directions. The platform combines computer vision, deep learning, and 3D animation standards to deliver accurate, privacy-first, and real-time translations suitable for everyday use.
ISL to Text Pipeline
Gestura captures live video input using a standard webcam and processes it entirely in the browser. Instead of relying on raw pixel data, the system uses MediaPipe to extract precise 3D landmarks from the hands and body, converting human motion into structured numerical data. These landmarks are analyzed over time by a deep learning model deployed using TensorFlow.js, allowing the system to recognize Indian Sign Language gestures with high accuracy.
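The following is a minimal sketch of how such a landmark-to-prediction loop can be wired up with the standard MediaPipe Hands and TensorFlow.js browser APIs. The model path, window length, and single-hand simplification are illustrative assumptions, not Gestura's actual configuration:

```ts
import { Hands, Results } from '@mediapipe/hands';
import * as tf from '@tensorflow/tfjs';

const WINDOW = 30;               // assumed number of frames per prediction window
const frames: number[][] = [];   // rolling buffer of flattened landmark frames

async function startRecognition(video: HTMLVideoElement): Promise<void> {
  // Hypothetical model path; the real vocabulary and topology are Gestura's own.
  const model = await tf.loadLayersModel('/models/isl/model.json');

  const hands = new Hands({
    locateFile: (f) => `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${f}`,
  });
  hands.setOptions({ maxNumHands: 2, modelComplexity: 1, minDetectionConfidence: 0.6 });

  hands.onResults((results: Results) => {
    const lm = results.multiHandLandmarks?.[0];
    if (!lm) return;
    // 21 hand landmarks × (x, y, z) -> 63 numbers per frame
    frames.push(lm.flatMap((p) => [p.x, p.y, p.z]));
    if (frames.length < WINDOW) return;
    tf.tidy(() => {
      const input = tf.tensor([frames.splice(0, WINDOW)]); // shape [1, 30, 63]
      const signId = (model.predict(input) as tf.Tensor).argMax(-1).dataSync()[0];
      console.log('recognized sign id:', signId);
    });
  });

  // Feed webcam frames to MediaPipe on each animation tick.
  const tick = async () => {
    await hands.send({ image: video });
    requestAnimationFrame(tick);
  };
  requestAnimationFrame(tick);
}
```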
To improve reliability and responsiveness, a Random Forest classifier ensembles many decision trees over the landmark features, making recognition robust to the variations in hand size, lighting, and camera angle that would cause a single model to overfit. Pruning logic inspired by Alpha-Beta search further optimizes inference by discarding impossible sign sequences early, significantly reducing latency and CPU usage.
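The pruning step can be pictured as a whitelist of legal sign transitions applied before any heavier processing. A minimal sketch, where the transition table and confidence floor are hypothetical stand-ins rather than Gestura's actual decoder:

```ts
// Sketch: early elimination of impossible sign sequences.
// allowedNext and MIN_PROB are illustrative assumptions, not the real grammar.
const allowedNext: Record<string, Set<string>> = {
  HELLO: new Set(['HOW', 'YOU', 'GOOD']),
  HOW: new Set(['YOU']),
};

const MIN_PROB = 0.15; // assumed confidence floor

function pruneCandidates(
  prevSign: string,
  scored: Array<[sign: string, prob: number]>,
): Array<[string, number]> {
  // Discard candidates that cannot legally follow prevSign or are too
  // unlikely to be worth scoring further; all downstream work shrinks
  // accordingly, which is where the latency savings come from.
  return scored.filter(
    ([sign, prob]) => prob >= MIN_PROB && (allowedNext[prevSign]?.has(sign) ?? false),
  );
}
```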
Text and Speech to ISL Pipeline
The output pipeline focuses on generating expressive, editable sign language rather than relying on static videos. Spoken or typed input is converted to text and mapped to SiGML (Signing Gesture Markup Language), an XML-based standard that describes the phonetic structure of signs, including hand shape, orientation, and movement.
This SiGML data drives a 3D avatar engine to generate smooth, real-time signing animations. Because SiGML is text-based, the system is lightweight, scalable, and easy to modify, avoiding the storage and rigidity limitations of pre-recorded sign videos.
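To make the data-driven approach concrete, here is a minimal sketch of a gloss-to-SiGML lookup. The lexicon entry and HamNoSys symbol tags are illustrative placeholders in the general SiGML style, not Gestura's actual sign definitions:

```ts
// Sketch: mapping text glosses to SiGML fragments for the avatar engine.
// The entry below is a placeholder; real entries encode full HamNoSys
// phonetics (hand shape, orientation, location, movement).
const sigmlLexicon: Record<string, string> = {
  HELLO: `<sigml>
    <hns_sign gloss="HELLO">
      <hamnosys_nonmanual/>
      <hamnosys_manual><hamflathand/><hamextfingeru/><hampalml/></hamnosys_manual>
    </hns_sign>
  </sigml>`,
};

function textToSigml(sentence: string): string[] {
  // One small, editable SiGML document per gloss; unknown words are skipped.
  return sentence
    .toUpperCase()
    .split(/\s+/)
    .map((gloss) => sigmlLexicon[gloss])
    .filter((s): s is string => Boolean(s));
}
```

Because each entry is a few hundred bytes of text rather than a video file, extending the vocabulary means adding lexicon entries, not re-recording footage.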
Emotion-Aware Signing
Gestura enhances communication by incorporating emotion detection into the translation process. Vocal cues such as tone and urgency are analyzed and translated into non-manual markers within the SiGML output. This allows the avatar to dynamically adjust facial expressions, motion intensity, and signing speed, resulting in more natural and human-like communication.
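One way to picture the mapping from vocal affect to signing style is a small style table; the emotion labels and parameter values below are purely illustrative assumptions:

```ts
// Sketch: folding a detected vocal emotion into the avatar's signing style.
// Labels, presets, and numeric ranges are assumptions for illustration.
type Emotion = 'neutral' | 'happy' | 'urgent' | 'sad';

interface SigningStyle {
  speed: number;     // playback-rate multiplier for the signing animation
  intensity: number; // movement amplitude scale
  face: string;      // non-manual marker preset applied to the avatar's face
}

function styleFor(emotion: Emotion): SigningStyle {
  switch (emotion) {
    case 'happy':  return { speed: 1.1,  intensity: 1.2, face: 'smile' };
    case 'urgent': return { speed: 1.3,  intensity: 1.4, face: 'brow-raise' };
    case 'sad':    return { speed: 0.85, intensity: 0.8, face: 'brow-furrow' };
    default:       return { speed: 1.0,  intensity: 1.0, face: 'neutral' };
  }
}
```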
Performance, Privacy, and Scalability
All gesture recognition and inference run locally in the browser using TensorFlow.js and MediaPipe, ensuring that no video data is transmitted to external servers. This privacy-first approach makes Gestura suitable for sensitive and real-world use cases. By leveraging SiGML instead of video assets, the platform remains highly scalable, requiring minimal storage while supporting a growing vocabulary. Hardware acceleration through WebGL and optimized search logic enables smooth real-time performance at 30+ FPS.
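A minimal sketch of the on-device setup using the standard TensorFlow.js backend API; the CPU fallback shown is our assumption:

```ts
// Sketch: keep all inference on-device with GPU acceleration.
import * as tf from '@tensorflow/tfjs';

async function initLocalInference(): Promise<void> {
  // Prefer WebGL; tf.setBackend resolves to false if the backend is unavailable.
  if (!(await tf.setBackend('webgl'))) {
    await tf.setBackend('cpu'); // slower, but still fully local
  }
  await tf.ready();
  // Capture, landmark extraction, and classification all run in this tab,
  // so no video frame is ever transmitted to a server.
  console.log('TF.js backend:', tf.getBackend());
}
```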
Key Highlights
Real-time, bi-directional ISL communication
Browser-based computer vision with no server-side video processing
3D avatar-driven ISL generation using SiGML
Emotion-aware signing for natural interaction
Privacy-first and highly scalable architecture
Tech Stack
Computer Vision: MediaPipe
Machine Learning: TensorFlow.js
3D & Animation: SiGML, Avatar-based signing engines
Frontend: Web-based, WebGL accelerated
