An open-source pre-processing library that sits between user uploads and your application. Every file is torn down and rebuilt from scratch, so whatever was hiding in it doesn't survive the trip.
A JPEG can carry shellcode in its metadata. An MP3 can embed scripts in ID3 tags. An MKV can smuggle payloads inside codec headers. Most applications check the MIME type, accept the file, and hope for the best.
Hope is not a security strategy.
Not scanning. Not parsing. Not stripping. Re-encoding. The original byte stream is obliterated. What comes out is a new file, structurally distinct from the original.
PNG, JPEG, WebP, GIF: decoded to raw pixels and re-encoded through OpenCV. Every injected payload, every appended byte, every malformed header: gone. Only the decoded image data makes it into the new file.
MP3, OGG, FLAC, WAV, AIF, AU, MP4, MKV, WebM, MOV — decoded and re-encoded through FFmpeg. Codec-validated, bitrate-checked, rebuilt from decoded data.
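To make the audio path concrete, here is a minimal Python sketch of how a re-encode might be driven, assuming FFmpeg is on the `PATH` and built with `libmp3lame` and `libvorbis`. The names `AUDIO_ENCODERS`, `build_reencode_cmd`, `reencode` and the bitrate bounds are hypothetical, not this library's API. The flags are standard FFmpeg options: only the first audio stream is mapped, source metadata and chapters are not copied, and naming an explicit encoder forces a real decode-and-encode rather than a stream copy.

```python
import subprocess
from pathlib import Path

# Hypothetical whitelist: output extension -> FFmpeg audio encoder.
AUDIO_ENCODERS = {
    ".mp3": "libmp3lame",
    ".ogg": "libvorbis",
    ".flac": "flac",
    ".wav": "pcm_s16le",
}
LOSSY_ENCODERS = {"libmp3lame", "libvorbis"}

# Hypothetical bitrate bounds in kbit/s for lossy output.
MIN_KBPS, MAX_KBPS = 32, 320


def build_reencode_cmd(src: Path, dst: Path, kbps: int = 192) -> list[str]:
    """Build an FFmpeg argv that decodes src and writes a freshly encoded dst."""
    encoder = AUDIO_ENCODERS.get(dst.suffix.lower())
    if encoder is None:
        raise ValueError(f"output format not whitelisted: {dst.suffix!r}")
    if not MIN_KBPS <= kbps <= MAX_KBPS:
        raise ValueError(f"bitrate out of range: {kbps} kbit/s")

    cmd = [
        "ffmpeg", "-nostdin", "-hide_banner", "-loglevel", "error",
        "-i", str(src),
        "-map", "0:a:0",          # keep only the first audio stream
        "-map_metadata", "-1",    # don't copy source metadata (ID3 tags, comments)
        "-map_chapters", "-1",    # don't copy chapter entries
        "-c:a", encoder,          # explicit encoder: a real re-encode, never a copy
    ]
    if encoder in LOSSY_ENCODERS:
        cmd += ["-b:a", f"{kbps}k"]  # bitrate only applies to lossy encoders
    cmd += ["-y", str(dst)]
    return cmd


def reencode(src: Path, dst: Path) -> None:
    # Argument list, no shell: a hostile filename is never interpreted as a command.
    # check=True turns any FFmpeg failure (e.g. no audio stream) into an exception.
    subprocess.run(build_reencode_cmd(src, dst), check=True, timeout=60)
```

Mapping audio only also drops embedded cover art, which FFmpeg exposes as a separate video stream. Video containers would need a similar builder that maps video streams and names a video encoder.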
Plain text, JSON, logs: Unicode-normalised and entity-escaped. HTML injection, shell command injection, and downstream script execution are neutralised.
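A minimal sketch of that text step, using only the Python standard library. The function name `sanitise_text`, the 1 MB cap, the choice of NFKC, and dropping control and format characters are all assumptions for illustration. Normalising before escaping matters: NFKC folds look-alikes such as the fullwidth `＜` (U+FF1C) into a plain `<`, which the escaper then catches.

```python
import html
import unicodedata

MAX_TEXT_BYTES = 1_000_000  # hypothetical size cap


def sanitise_text(raw: bytes, max_bytes: int = MAX_TEXT_BYTES) -> str:
    """Decode, normalise, strip invisible controls, then HTML-escape."""
    if len(raw) > max_bytes:
        raise ValueError("text exceeds size limit")
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError instead of guessing.
    text = raw.decode("utf-8")
    # Unify CRLF to LF; any remaining lone CR is removed as a control character below.
    text = text.replace("\r\n", "\n")
    # NFKC folds compatibility characters (fullwidth brackets, ligatures) to canonical forms.
    text = unicodedata.normalize("NFKC", text)
    # Drop control (Cc) and format (Cf) characters, keeping newline and tab.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Escape &, <, >, " and ' so the result is inert inside HTML.
    return html.escape(text, quote=True)
```

Some caveats on this sketch. Entity escaping protects HTML contexts; shell safety comes from never handing user text to a shell. For JSON, escaping would presumably be applied to parsed string values rather than the raw document, since `&quot;` breaks JSON syntax. And stripping format characters also removes zero-width joiners, which some scripts and emoji sequences rely on.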
Six rules that govern every line of code.
Trust nothing about an incoming file. Not the filename. Not the extension. Not the MIME type. Not the magic bytes. Every file is hostile until it has been validated against a whitelist and re-encoded through a controlled pipeline.
Blacklists are a losing game. There’s always a new exploit, a new encoding trick. If it’s not on the whitelist, it doesn’t get in. Period.
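A whitelist check can be sketched in a few lines of Python. The table below is hypothetical and deliberately small; the signatures are the well-known leading bytes of each format (WebP and WAV both start with `RIFF`, so the form type at offset 8 tells them apart). In keeping with the rule that magic bytes are not trusted, a match here never makes a file safe; it only earns the file a trip to the decoder. Anything whose extension is missing from the table, or whose bytes disagree with it, is rejected.

```python
from pathlib import PurePath

# Hypothetical whitelist: extension -> alternative signatures.
# Each signature is a list of (offset, expected bytes) pairs that must all match.
WHITELIST: dict[str, list[list[tuple[int, bytes]]]] = {
    ".png":  [[(0, b"\x89PNG\r\n\x1a\n")]],
    ".jpg":  [[(0, b"\xff\xd8\xff")]],
    ".jpeg": [[(0, b"\xff\xd8\xff")]],
    ".gif":  [[(0, b"GIF87a")], [(0, b"GIF89a")]],
    ".webp": [[(0, b"RIFF"), (8, b"WEBP")]],
    ".wav":  [[(0, b"RIFF"), (8, b"WAVE")]],
    ".flac": [[(0, b"fLaC")]],
}


def _matches(head: bytes, signature: list[tuple[int, bytes]]) -> bool:
    # Slicing past the end yields fewer bytes, so truncated files simply fail.
    return all(head[off:off + len(magic)] == magic for off, magic in signature)


def is_whitelisted(filename: str, head: bytes) -> bool:
    """True only if the extension is listed AND the leading bytes agree with it."""
    ext = PurePath(filename).suffix.lower()
    signatures = WHITELIST.get(ext)
    if signatures is None:
        return False  # not on the list: rejected, whatever the bytes claim
    return any(_matches(head, sig) for sig in signatures)
```

There is deliberately no "unknown" branch: the only path to `True` runs through an explicit table entry.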
A sanitisation layer that slows down uploads is a sanitisation layer that gets disabled. The pipeline must be fast enough to be invisible to the user. That’s the target.
Unknown format, corrupted headers, oversized file — the answer is always reject. A false rejection is an inconvenience. A false acceptance is a breach.
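One way to enforce the size rule without trusting a declared `Content-Length` is to read the upload in chunks and abort the moment the cap is crossed. A minimal Python sketch; `MAX_UPLOAD_BYTES`, `CHUNK_BYTES`, `Rejected`, and `read_bounded` are hypothetical names and values.

```python
import io

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # hypothetical cap: 20 MiB
CHUNK_BYTES = 64 * 1024


class Rejected(Exception):
    """Raised for any upload that must not proceed."""


def read_bounded(stream: io.BufferedIOBase, limit: int = MAX_UPLOAD_BYTES) -> bytes:
    """Read the whole upload, rejecting it as soon as it exceeds `limit` bytes."""
    buf = bytearray()
    while chunk := stream.read(CHUNK_BYTES):
        buf += chunk
        if len(buf) > limit:
            # Abort mid-stream: at most limit + one chunk is ever held in memory.
            raise Rejected(f"upload exceeds {limit} bytes")
    if not buf:
        raise Rejected("empty upload")
    return bytes(buf)
```

Empty input is rejected too: there is nothing to re-encode, so there is nothing to accept.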
Every line of code is an attack surface. Every dependency is a supply-chain risk. We use OpenCV and FFmpeg because nobody should write their own codecs. Everything else is our code: readable, auditable, explainable.
Closed-source security is a contradiction. If you can’t read the code, you can’t verify the claims. We ship source. We document decisions. We welcome scrutiny.
Every stage is a gate. Fail any gate and the file is rejected. Pass every gate and the file is re-encoded. The re-encoding is the actual sanitisation; everything before it is defence in depth.
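The gate chain can be sketched as a small Python class. `Pipeline`, `Gate`, and `Rejected` are hypothetical names, and `reencode` is a placeholder for whatever OpenCV or FFmpeg step the real pipeline runs. Two fail-closed details are worth noting: a gate that raises (say, on a corrupted header) counts as a failure, and only a literal `True` counts as a pass, so a gate that forgets to return anything rejects the file.

```python
from dataclasses import dataclass
from typing import Callable

# A gate inspects the raw bytes and returns True only if the file may proceed.
Gate = Callable[[bytes], bool]


class Rejected(Exception):
    """Raised when a file fails any gate."""


@dataclass(frozen=True)
class Pipeline:
    gates: tuple[tuple[str, Gate], ...]   # (name, check) pairs, run in order
    reencode: Callable[[bytes], bytes]    # the step that actually sanitises

    def run(self, data: bytes) -> bytes:
        for name, gate in self.gates:
            try:
                passed = gate(data)
            except Exception as exc:
                # A gate that crashes on malformed input is a failed gate.
                raise Rejected(f"{name}: {exc!r}") from exc
            if passed is not True:
                raise Rejected(name)
        clean = self.reencode(data)
        if not clean:
            raise Rejected("re-encode produced no output")
        return clean


# Usage: two cheap gates in front of a placeholder encoder (not a real one).
pipeline = Pipeline(
    gates=(
        ("size", lambda d: 0 < len(d) <= 20 * 1024 * 1024),
        ("png-signature", lambda d: d.startswith(b"\x89PNG\r\n\x1a\n")),
    ),
    reencode=lambda d: b"<freshly encoded bytes>",
)
```

Ordering cheap gates first means most hostile files are turned away before any decoder touches them, which also helps keep the pipeline fast.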
Security teams who need an auditable, measurable layer. Solo builders who can't afford a security team but can't afford a breach either. Anyone who's looked at a user-uploaded file and thought: what's actually in this thing?
File comes in. Gets validated. Gets re-encoded. Clean file comes out. Original is destroyed. No virus scanning. No sandboxing. No cloud dependency. No subscription.