unusual_typo 5 hours ago [-]
Here are the benchmark results. You can check more details in the repo.

openai/privacy-filter on Apple M1 Max

dtype            1k total    1k tok/s   8k total      8k tok/s
━━━━━━━━━━━━━━━━ ━━━━━━━━━━━ ━━━━━━━━━━ ━━━━━━━━━━━━━ ━━━━━━━━━━
fp32             620.52 ms   1,664      4,893.86 ms   1,689
fp16             654.56 ms   1,578      5,430.17 ms   1,521
q4               582.13 ms   1,776      4,635.39 ms   1,784
q4f16            648.10 ms   1,594      5,261.56 ms   1,570
quantized int8   573.94 ms   1,801      4,594.95 ms   1,800
anoop_kumar 14 hours ago [-]
I would love an option that, instead of just redacting, swaps the sensitive text with something else before it goes to the AI and then swaps it back when the AI returns it. Thanks for sharing the GitHub repo. I might submit a PR if I don't find that feature.
unusual_typo 14 hours ago [-]
I initially wanted to implement that feature, but I realized it requires modifying the coding agents themselves (e.g. Codex, Claude Code, opencode). Hooks or skills would still end up passing the PII to the server, so I decided to share the standalone app first. Feel free to submit a PR!
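For anyone curious what the swap-and-restore idea above could look like, here is a minimal sketch. It is not the app's implementation: the regex detector is a stand-in for the model, and the names `pseudonymize`, `restore`, and the `<EMAIL_n>` placeholder format are all hypothetical.

```python
import re

# Stand-in detector (assumption): a simple email regex, used only for illustration.
# The app discussed in this thread uses a model for detection instead.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def pseudonymize(text):
    """Replace each detected span with a stable placeholder.

    Returns the masked text and a placeholder -> original mapping.
    """
    mapping = {}  # placeholder -> original value
    seen = {}     # original value -> placeholder, so repeats reuse one placeholder

    def _sub(match):
        original = match.group(0)
        if original not in seen:
            placeholder = f"<EMAIL_{len(seen) + 1}>"
            seen[original] = placeholder
            mapping[placeholder] = original
        return seen[original]

    return EMAIL_RE.sub(_sub, text), mapping


def restore(text, mapping):
    """Swap placeholders back to the original values."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


prompt = "Mail alice@example.com and bob@example.org, then alice@example.com again"
masked, mapping = pseudonymize(prompt)
print(masked)                    # this is what would be sent to the AI
print(restore(masked, mapping))  # originals restored in the returned text
```

The mapping never leaves the local machine, and restoring only works if the AI echoes the placeholders back unchanged, which is part of why the feature needs cooperation from the agent side.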
levi840714 13 hours ago [-]
Nice, local is the right call. What's the local AI model — a small NER model bundled in, or calling out to something? Curious about the size/footprint for a desktop app.
unusual_typo 5 hours ago [-]
It uses openai/privacy-filter, which is smaller than 1 GB. I haven't checked usage during inference. It runs at about 1k tokens/sec on my MacBook. I will update the repo with benchmark results. Thanks for the comment.
biduskamil 7 hours ago [-]
Local is the way. Any benchmarks on its latency on CPU?
unusual_typo 5 hours ago [-]
I just ran the benchmark on my MacBook: 582 ms for 1k tokens and 4.64 s for 8k.