project / 2026
Public-service translation / hackathon-winning product
SANAD
A trust-first public-service document translation tool built for the Google TMT Hackathon 2026, with protected-entity checks, scoped memory, human review, and export-ready outputs.
- built with FastAPI / Python / React / Vite / SQLAlchemy / SQLite / Docker / Gotenberg
notes on the build
Snapshot
SANAD is a public-service document translation prototype built for the Google TMT Hackathon 2026. As team SPAN, a two-person team, we took 1st place in the File Translation Tool Track, as listed in the official ILPRL results announcement.
The product is not just a wrapper around translation output. It is a review workflow for official documents: upload, parse, protect sensitive values, apply scoped translation memory, call the translation provider, repair fixable risks, review each segment, and export the final document with a privacy-reduced feedback pack.
Why it mattered
Public-service translation has a different failure mode from casual translation. A misplaced number, dropped name, changed legal phrase, damaged identifier, or malformed date can make the output unreliable for official use even when the sentence reads fluently.
SANAD was built around that constraint. It treats translation as a workflow that needs inspection, not as a single invisible model call. The system keeps document segments reviewable, shows glossary and memory provenance, flags risky changes, and waits for human approval before export.
One small example captures the point: if a translation provider damages a date-like value, such as turning 2026-04-21 into a malformed 2026-041, SANAD does not treat the fluent surrounding sentence as enough. It flags the mismatch, attempts a targeted repair, and keeps the corrected value aligned with the translated segment before the reviewer approves it.
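A minimal sketch of that kind of check, assuming ISO-style dates are the only protected values. The pattern and the function name `missing_protected_dates` are illustrative assumptions, not SANAD's actual code, and the translated string uses placeholder text rather than real Nepali or Tamang output.

```python
import re
from collections import Counter

# Assumption: only ISO-style dates (YYYY-MM-DD) are treated as protected here.
ISO_DATE = re.compile(r"(?<!\d)\d{4}-\d{2}-\d{2}(?!\d)")


def missing_protected_dates(source: str, translated: str) -> list[str]:
    """Return source dates that do not survive verbatim in the translation."""
    expected = Counter(ISO_DATE.findall(source))
    found = Counter(ISO_DATE.findall(translated))
    # Counter subtraction keeps only dates seen more often in the source.
    return sorted((expected - found).elements())


source = "The application was filed on 2026-04-21."
damaged = "<translated text> 2026-041 <translated text>"  # placeholder output

flags = missing_protected_dates(source, damaged)
if flags:
    print("flag for review:", flags)
```

The check ignores how fluent the surrounding sentence is. It only asks whether each protected value from the source is still present, which is what lets a damaged `2026-041` surface as a flag.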
How it works
The backend is a FastAPI and SQLAlchemy service with modular parsing, provider, protection, risk, review, memory, and export layers. The frontend is a React/Vite review interface for document intake, trust summaries, segment approval, and export.
The main path is:
- upload a DOCX, PDF, CSV, TSV, or text document
- parse it into reviewable segments with location metadata
- detect protected entities, numbers, glossary matches, and sensitive values
- reuse scoped translation memory with provenance
- translate through the configured TMT provider path, with deterministic fallback support for demos
- auto-repair fixable risks, including malformed date or number output, only when the new risk score improves
- require human review before final export
- produce translated output and a privacy-reduced feedback pack
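The repair step in that path can be sketched as a gate: a candidate fix is kept only if it scores better than the original. Everything below is a hypothetical illustration. The `Segment` fields, the toy `risk_score` (where lower is assumed to be better), and the single-date repair rule are assumptions, not the project's real implementation.

```python
import re
from dataclasses import dataclass

DATE = re.compile(r"(?<!\d)\d{4}-\d{2}-\d{2}(?!\d)")
# Loose pattern for date-like tokens, including damaged ones such as 2026-041.
DATE_LIKE = re.compile(r"(?<!\d)\d{4}-[\d-]{2,6}(?!\d)")


@dataclass
class Segment:
    source: str
    translated: str
    repaired: bool = False
    approved: bool = False  # only a human reviewer sets this


def risk_score(source: str, translated: str) -> int:
    """Toy score: count of source dates missing from the translation."""
    found = set(DATE.findall(translated))
    return sum(1 for d in set(DATE.findall(source)) if d not in found)


def attempt_repair(source: str, translated: str) -> str:
    """Swap in the source date when exactly one damaged token and one source date exist."""
    source_dates = DATE.findall(source)
    damaged = [t for t in DATE_LIKE.findall(translated) if not DATE.fullmatch(t)]
    if len(source_dates) == 1 and len(damaged) == 1:
        return translated.replace(damaged[0], source_dates[0])
    return translated


def repair_if_improved(segment: Segment) -> Segment:
    """Accept the repaired text only when the risk score goes down."""
    before = risk_score(segment.source, segment.translated)
    candidate = attempt_repair(segment.source, segment.translated)
    if risk_score(segment.source, candidate) < before:
        segment.translated = candidate
        segment.repaired = True
    return segment


seg = repair_if_improved(Segment("Filed on 2026-04-21.", "xx 2026-041 xx"))
print(seg.translated, seg.repaired, seg.approved)
```

In this sketch the repair never touches `approved`. A repaired segment still waits for a reviewer, which mirrors the rule that export requires human review.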
Why it belongs here
SANAD belongs in the archive because it sits at the intersection I keep returning to: useful surface, inspectable internals, and trust-sensitive workflow design.
The win matters, but the more important part is the shape of the work. It took a hackathon brief about translation and turned it into a system about review, evidence, provenance, and safe handoff.
what mattered
- 1st place in the Google TMT Hackathon 2026 File Translation Tool Track as team SPAN.
- Built for English, Nepali, and Tamang document translation under KU ILPRL's Google-supported TMT project.
- Supports DOCX, PDF, CSV, TSV, and text workflows with parsing, review, export, and privacy-reduced feedback packs.
- Uses scoped translation memory, protected-entity checks, deterministic risk flags, and targeted repair for malformed dates, numbers, and entities before human approval.