# Investment Memo: ai-coustics

> Published on ADIN (https://adin.chat/s/investment-memo-ai-coustics)
> Type: Article
> Date: 2026-06-02

**Stage:** Seed | **Sector:** Voice AI Infrastructure | **HQ:** Berlin, Germany | **Founded:** 2021 | **Total Raised:** \~€6.9M (\~$7.5M) | **Last Round:** €5M Seed, March 2025 (led by Partech)

---

## Executive Summary

ai-coustics is building the reliability layer for Voice AI: a real-time audio intelligence SDK that sits at the input end of any voice pipeline and cleans up what microphones actually hear before ASR, LLM, and TTS systems ever touch it. The insight is simple but underappreciated: the speech enhancement problem has been framed as a human-listening problem for decades, but what voice AI needs is something different, namely audio optimized for machine consumption, not comfort.

The company's core offering, the Quail and Rook model families, delivers sub-10ms speech enhancement, voice activity detection, and speaker isolation on-device via a proprietary inference runtime called AirTen. Benchmarks show a 43% reduction in word error rates in noisy conditions and outperformance vs. Silero VAD, the current open-source incumbent.

ai-coustics is early-stage but technically differentiated, backed by the right domain-specialist angels, and attacking a market that is growing at \~38% CAGR. The bet is that as voice agents proliferate from boardrooms to call centers to consumer devices, the audio reliability layer becomes as essential, and as defensible, as the model layer above it.

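
To make the word error rate figures concrete, the following minimal Python sketch computes WER as word-level edit distance divided by reference length, then shows what a relative reduction does to a baseline. The transcripts, the 30% baseline, and the function names are hypothetical illustrations, not ai-coustics data or API. The sketch also assumes the 43% figure is a relative reduction, which the memo does not specify.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, enhanced_wer: float) -> float:
    """Fraction of the baseline error removed by enhancement."""
    return (baseline_wer - enhanced_wer) / baseline_wer


def apply_relative_reduction(baseline_wer: float, reduction: float) -> float:
    """WER that results from cutting a baseline by a relative fraction."""
    return baseline_wer * (1.0 - reduction)


# Hypothetical transcripts, for illustration only.
REFERENCE = "please transfer fifty euros to my savings account"
NOISY = "please transfer fifteen euros to savings account"
ENHANCED = "please transfer fifty euros to my savings count"

noisy_wer = word_error_rate(REFERENCE, NOISY)        # 0.25
enhanced_wer = word_error_rate(REFERENCE, ENHANCED)  # 0.125
print(noisy_wer, enhanced_wer, relative_reduction(noisy_wer, enhanced_wer))

# Assumed 30% baseline; 43% treated as a relative reduction (assumption).
print(f"{apply_relative_reduction(0.30, 0.43):.3f}")
```

Under the relative reading, a 43% reduction would take an assumed 30% baseline WER to roughly 17%, not to a negative or near-zero number, which is why it matters whether a quoted reduction is relative or absolute.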

---

## Company Overview

| Attribute | Detail |
| --- | --- |
| **Founded** | 2021, Berlin |
| **Legal Entity** | ai-coustics GmbH |
| **Website** | ai-coustics.com |
| **Total Raised** | \~$7.5M across two rounds |
| **Latest Round** | €5M Seed (March 2025), Partech lead |
| **Prior Round** | €1.9M Pre-Seed (March 2024), Connect Ventures / FOV Ventures |
| **Accelerators** | K.I.E.Z. (Berlin AI accelerator), Creative Destruction Lab |
| **Customers** | Elgato, Deutsche Welle, Radio France, Synthesia |
| **Scale** | Millions of audio minutes weekly, 187 countries, 150+ languages |

---

## The Problem

Voice AI is breaking in production, and the cause is not the model.

STT, LLM, and TTS components have matured rapidly. But in real-world deployments (phone calls, open-plan offices, ambient retail, noisy warehouses, moving vehicles), the audio that reaches the model is frequently corrupted by background noise, reverb, overlapping speech, clipping, or variable microphone quality. The result is compounding failure: a 15% word error rate on clean audio can spike to 60%+ on degraded input, and voice activity detection misfires turn conversational AI into an unusable experience.

Traditional noise cancellation was designed to make audio sound better to human ears. That is not the same problem. Human listeners compensate for context; machine systems cannot. ai-coustics reframes the target: make audio better for machine understanding, not perceptual comfort.

---

## Product

ai-coustics ships three core components under the Quail family, plus a higher-accuracy flagship model (Rook) for use cases where latency can flex slightly:

**Quail (Speech Enhancement for STT)**

Real-time speech enhancement optimized to improve downstream ASR accuracy. Reduces word error rates by up to 43% in noisy environments. Sub-10ms latency. Trained on 500+ noise types and 1M+ acoustic environments.

**Quail VAD (Voice Activity Detection)**

Drop-in replacement for Silero VAD, the dominant open-source option.
More accurate turn-taking and endpoint detection in noisy conditions, which directly reduces interruption errors and response latency in voice agents.

**Quail Voice Focus (Speaker Isolation)**

Launched December 2025. Isolates the primary speaker from ambient voices in real time, the critical capability for multi-person or ambient-sound environments where competing voices break agent comprehension.

**Rook (Advanced Speech Enhancement)**

Launched July 2025. Higher-quality enhancement at slightly higher latency, positioned for voice-agent builders who need the best accuracy and can tolerate a modest compute overhead.

**AirTen Runtime**

Proprietary on-device inference engine that enables the SDK to run with no GPU requirement and sub-10ms latency. This is the infrastructure that makes the SDK practically deployable at scale on commodity hardware.

**Developer Platform**

Launched September 2025. Self-serve API playground, SDK management, and integration tooling. Signals a go-to-market motion maturing from pure enterprise sales toward developer-led adoption.

The model families are designed to run as lightweight modules inside existing voice stacks, integrating natively with LiveKit and Pipecat, the two dominant real-time voice agent frameworks. This means adoption friction is low for developers already on those platforms.

---

## Team

The founding team has an unusually clean domain-fit profile for audio AI infrastructure.

**Fabian Seipel, Co-Founder & CEO**

Studied audio technology at TU Berlin. Worked on ML models for predictive acoustic maintenance at Deutsche Bahn, where he analyzed acoustic sensor data to detect mechanical failures in trains, directly relevant experience for machine-consumption audio. Has a personal connection to the problem: a slight hearing impairment from music production reportedly shaped his focus on speech clarity. Has held CPO and CEO roles at ai-coustics.


**Corvin Jaedicke, Co-Founder & CTO**

ML engineer and TU Berlin lecturer teaching Deep Learning for Audio Event Detection, a rare academic-practitioner combination. Worked as a Data Scientist at Deutsche Bahn (DB Systel) and Deutsche Telekom's R&D arm (T-Labs). His academic role provides ongoing access to audio ML research talent and keeps the company at the frontier of the field.

**Tim Janke, Co-Founder**

Third co-founder; met Seipel and Jaedicke at TU Berlin. Specific public-facing role is less documented, but the trio share a common academic and applied ML background.

The angel investor network functions as an informal extension of the team. Gert Lanckriet (Head of AI at Amazon Music) provides audio ML depth; Mehdi Ghissassi (ex-Google DeepMind, now CPO at AI71) provides frontier AI credibility; Thomas Wolf (CSO at HuggingFace) provides the open-source ML ecosystem network; and Hazel Savage (former CEO of Musiio, acquired by SoundCloud) brings prior audio-tech exit experience.

**Gaps to watch:** No publicly identified CRO, VP Sales, or CFO. As the company scales from seed to Series A and moves enterprise contracts into the pipeline, go-to-market and finance leadership become the critical hiring needs.

---

## Market

Voice AI infrastructure is not a niche. It is the plumbing underneath a decade-long platform shift.

Voice agents are being deployed at scale across customer service, healthcare, financial services, logistics, and consumer devices. The call center market alone represents tens of billions in annual spend; voice AI is projected to displace a significant portion. Every one of those deployments runs on raw microphone audio from environments the lab never anticipated.

The Voice AI Infrastructure market is projected to grow at approximately **37.8% CAGR** through 2030, driven by proliferating agent deployments across enterprise and consumer hardware.
The adjacent AI audio processing software market was valued in the multi-billion-dollar range in 2025 per Technavio and GII Research, growing on a similar trajectory.

ai-coustics' total addressable market spans four layers:

1. **Voice agents / conversational AI**: the primary growth vector, where STT accuracy is a direct revenue driver for customers
2. **Real-time communication** (video conferencing, VoIP, call center infrastructure)
3. **Content creation & media** (podcasting, broadcasting, audiobooks)
4. **Hardware OEM**: embedded on-device enhancement for microphones, headsets, smart speakers

The near-term wedge is Voice AI infrastructure; the long-term prize is any connected device that needs to understand spoken language reliably.

---

## Competitive Landscape

ai-coustics competes in a layer that is lightly covered by dedicated specialists but increasingly contested by large-platform players.

| Competitor | Category | Key Difference vs. ai-coustics |
| --- | --- | --- |
| **Krisp** | Real-time noise cancellation | Consumer / prosumer focus; designed for human listening, not machine STT optimization; no SDK for voice agents |
| **NVIDIA Maxine** | GPU-accelerated audio enhancement | Requires GPU; tightly coupled to NVIDIA stack; heavy dependency for lightweight deployments |
| **Silero VAD** | Open-source VAD | Free but not optimized for production noise conditions; ai-coustics benchmarks show clear outperformance |
| **Deepgram / AssemblyAI** | Full STT platforms | Own the ASR layer, may bundle enhancement; not infrastructure-agnostic |
| **Amazon Chime SDK** | Telecom / communication platform | Noise suppression available but not developer-friendly for voice agent use cases |
| **WebRTC built-ins** | Browser / open standard | Rudimentary suppression; not AI-powered; widely used as a floor, not a ceiling |

**The core competitive moat:** ai-coustics' differentiation is being purpose-built for machine consumption rather than human listening, combined with the AirTen
runtime's ability to deliver production-grade quality without GPU dependency. Krisp requires a desktop client. NVIDIA Maxine requires NVIDIA hardware. ai-coustics works as a lightweight SDK anywhere.

The risk is that Deepgram, AssemblyAI, or a major cloud provider bundles enhancement directly into their STT API. The mitigation is that infrastructure-agnostic enhancement remains valuable precisely because customers often mix and match STT, LLM, and TTS vendors; an independent enhancement layer preserves that flexibility.

---

## Business Model

ai-coustics monetizes via a developer-led, usage-based model:

- **Free tier** for developers via the self-serve platform (SDK keys, API playground)
- **Usage-based pricing** for production deployments (minutes processed, API calls, or active seats; specific tiers not publicly disclosed)
- **Enterprise contracts** for hardware OEMs, broadcasters, and large-scale voice platform deployments (volume pricing, SLAs, custom integration support)

The SDK model has structurally attractive economics. Once the AirTen runtime runs on-device, ai-coustics incurs near-zero marginal infrastructure cost per additional processed minute. Cloud/API calls carry normal hosting costs, but on-device deployments scale with customer revenue without proportional cost increases. Gross margins in this model should be high at scale.

Revenue is not publicly disclosed. As of March 2025 (Seed close), the company was in the early revenue stage, with customer logos across hardware (Elgato), broadcasting (Deutsche Welle, Radio France), and AI application (Synthesia) segments.

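
The margin argument above can be sketched with simple arithmetic. The following Python example is a rough illustration only: the price and cloud cost per minute are made-up placeholders, not ai-coustics figures, and it assumes on-device minutes carry zero vendor-side compute cost, which ignores support, R&D, and licensing overhead.

```python
def blended_gross_margin(price_per_min: float,
                         cloud_cost_per_min: float,
                         on_device_share: float) -> float:
    """Gross margin (fraction of revenue) for a mix of cloud and on-device minutes."""
    if price_per_min <= 0:
        raise ValueError("price_per_min must be positive")
    if not 0.0 <= on_device_share <= 1.0:
        raise ValueError("on_device_share must be between 0 and 1")
    # Assumption: only the cloud-processed share of minutes incurs hosting cost.
    blended_cost = (1.0 - on_device_share) * cloud_cost_per_min
    return (price_per_min - blended_cost) / price_per_min


# Hypothetical placeholder values, not disclosed pricing or costs.
PRICE_PER_MIN = 0.010
CLOUD_COST_PER_MIN = 0.004

for share in (0.0, 0.5, 1.0):
    margin = blended_gross_margin(PRICE_PER_MIN, CLOUD_COST_PER_MIN, share)
    print(f"on-device share {share:.0%}: gross margin {margin:.0%}")
```

With these placeholder inputs, margin rises from 60% to 100% as the on-device share goes from none to all of the processed minutes. The real figures depend on pricing and cost data the company has not disclosed.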

---

## Traction & Social Proof

- 800,000+ signed-up users (pre-Seed era metric; likely higher post-Seed)
- 2M+ audio files enhanced
- Millions of audio minutes processed weekly as of 2025
- 187 countries, 150+ languages in production
- Named customers: Elgato (hardware), Deutsche Welle (broadcasting), Radio France (broadcasting), Synthesia (generative video AI)
- Integrations: LiveKit and Pipecat (the two dominant real-time voice agent orchestration frameworks)
- Developer platform launched September 2025, which signals enough inbound developer demand to justify self-serve infrastructure
- New product launches every 4–6 months (Rook July 2025, Quail Voice Focus December 2025), indicating strong R&D velocity relative to team size

---

## Financials & Funding

| Round | Date | Amount | Lead | Key Participants |
| --- | --- | --- | --- | --- |
| Pre-Seed | March 2024 | €1.9M | Connect Ventures | FOV Ventures |
| Seed | March 2025 | €5M | Partech | Connect Ventures, FOV Ventures, Acurio, Intuition, Arc Investors, strategic angels |
| **Total** | | **\~$7.5M** | | |

The funding trajectory is capital-efficient. $7.5M over two years to reach millions of processed minutes weekly, named enterprise customers, and a developer platform is lean by European B2B AI standards. The one-year gap between pre-seed and seed, with pre-seed investors Connect Ventures and FOV Ventures following on, signals healthy milestone execution.

Partech's lead on the Seed round is the most important signal. Partech is one of the leading European VC firms, with €2.5B+ AUM and a track record across B2B AI infrastructure (among others). Its conviction de-risks the round and opens enterprise sales channels in France and Germany, where Partech has deep portfolio relationships.

---

## Bull Case

**1. Voice agents become infrastructure.** If voice agents penetrate enterprise at the rate that chatbots did between 2018 and 2023, every deployment becomes a potential ai-coustics customer.
The TAM expands not from new use cases but from the sheer scale of production deployments that need reliable audio.

**2. The machine-consumption framing wins.** As more customers discover that human-listening noise cancellation breaks their STT pipelines, the demand for purpose-built machine-consumption enhancement grows. ai-coustics has a 3–4 year head start on this framing.

**3. On-device becomes the default.** Privacy regulation, latency requirements, and cost pressure push inference to the edge. AirTen positions ai-coustics to be the embedded audio intelligence layer in hardware, a potentially massive OEM licensing opportunity.

**4. Strategic acquisition target.** The combination of unique audio ML IP, production-grade low-latency runtime, and enterprise customer relationships in broadcasting and voice AI makes ai-coustics an attractive bolt-on for Deepgram, AssemblyAI, Amazon, NVIDIA, or any voice AI platform that wants to own the full stack.

---

## Bear Case

**1. Bundling risk.** Deepgram, AssemblyAI, or a major cloud provider (AWS, Google, Azure) adds enhancement to their STT API and competes on price with a "good enough" offering bundled at zero marginal cost. ai-coustics' edge erodes if enterprise buyers optimize for vendor consolidation over audio quality.

**2. Team thinness.** Three named co-founders with no identified sales or finance leadership is a fragile org structure for a company that needs to close enterprise contracts, manage hardware OEM partnerships, and raise a Series A simultaneously. Execution risk is real at this stage.

**3. Open-source commoditization.** If Silero VAD or a Meta/Google open-source model improves to production quality in noisy conditions, the VAD moat narrows. The Rook/Quail enhancement models have more defensibility, but open-source pressure is persistent across all AI infrastructure categories.

**4. Hardware timing risk.** The OEM opportunity is large but long-cycle.
Device design wins take 18–36 months from spec to ship, and the revenue recognition is lumpy. If ai-coustics over-indexes on hardware at the expense of the faster-moving Voice AI software market, growth could disappoint.

---

## Key Risks

| Risk | Severity | Mitigation |
| --- | --- | --- |
| Bundling by STT incumbents | High | Infrastructure-agnostic positioning; model quality differential; LiveKit/Pipecat ecosystem lock-in |
| GTM / sales leadership gap | Medium-High | Partech network for enterprise intros; developer-led PLG reduces pure sales dependency |
| Open-source VAD improvement | Medium | Quail VAD is one of three products; enhancement moat is broader and harder to replicate |
| Series A fundraise risk | Medium | €5M runway in a capital-efficient Berlin operation is likely 18–24 months; 2026 raise timing depends on macro |
| Hardware OEM concentration | Low-Medium | Broadcasting and Voice AI software provide shorter-cycle revenue diversification |

---

## Investment Considerations

**What to like:**

- Domain-expert founding team with an unusually direct academic and applied background for the problem
- Capital-efficient: $7.5M to production scale across 187 countries is lean
- Strategic angel network (HuggingFace, DeepMind, Amazon Music) provides both technical validation and commercial channels
- Machine-consumption audio framing is differentiated and defensible
- Partech lead provides European enterprise network and Series A credibility
- Product velocity is strong: 3 major product launches in 12 months post-Seed

**What to watch:**

- ARR and net revenue retention: not public, but the critical metric to track before a Series A
- Sales leadership hire: the clearest operational gap relative to growth ambitions
- LiveKit/Pipecat integration depth: these partnerships are the fastest path to distribution at scale
- Bundling moves by Deepgram or AssemblyAI: the competitive event most likely to compress the opportunity

**Comparable trajectories:** Krisp ($65M
raised, noise cancellation consumer/prosumer); Deepgram ($86M raised, full STT stack); Rime/Cartesia (TTS infrastructure, Seed/Series A stage); LiveKit (open-source voice agent orchestration, Series A). ai-coustics is most comparable to the infrastructure layer companies (sub-$10M raise to production scale) and is priced accordingly at Seed.

---

*Memo prepared June 2026. All figures from public sources including Partech, Connect Ventures, TechCrunch, CB Insights, and company website. Revenue and ARR not publicly disclosed; financial projections not included.*