
Case study · Zillow · 2019–2026

From demo to production

Six years building Zillow's document intelligence platform end to end: from a reusable multi-tenant foundation, to digital leases at national scale, to multimodal LLM document understanding that professionals trust with compliance paperwork.

1M+ documents a year
98%+ automated accuracy, from ~85%
8 business units served
2x landlord activation

01

Documents are the gates

In real estate, nothing moves until a document is signed. Listing agreements, disclosures, leases, purchase contracts: documents are the linchpin of the transaction, and executing them is what moves it forward. When the signature lands, a house gets bought, sold, or rented. Someone gets a place to live. Someone gets paid.

Discovery started there: mapping the document workflows that gate every Zillow business, the same use cases an entire industry runs through DocuSign and dotloop. E-signature. Secure document storage. Compliance for documents and signatures. And preparing fillable documents from the templates and forms that MLSs and brokerages publish.

02

The platform

Each business had been solving those use cases separately, and every vendor handoff bounced users into someone else's product: another brand, another login, a UX Zillow could not shape. The thesis: documents are too central to the experience to outsource. We took the use cases and built the internal platform that became the foundation for all real estate documents at Zillow: e-sign, secure document storage, compliance, and template-to-fillable-form preparation, native in one multi-tenant platform.

What that bought was the experience, not just the capability. A renter signs a lease without ever leaving Zillow. An agent prepares disclosures inside the tools they already work in. Every business unit shapes the flow to its own users, consumers on one side and professionals on the other, instead of inheriting a vendor's one-size-fits-all UX. The platform scaled across 8 business units and 20+ teams, processing 1M+ documents a year at 95%+ platform accuracy. It also retired $2M+ in annual licensing along the way.

On top of it: an AI-first document and workflow product for brokers and agents, covering automated paperwork, compliance, reporting, and proactive next best actions. Early pilots showed roughly 60% less admin time per deal and about $1M in annual savings for a mid-size brokerage.

03

Digital leases at national scale

Zillow Rentals was the flagship proof at scale: digital leases shipped nationally in about 12 months. The lease funnel (create, send, sign) was instrumented end to end, and the team iterated with experiments at every step where landlords lost momentum.

The results: landlord activation doubled, and market share grew from 18% to 30%.
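As a rough illustration of what instrumenting the create, send, sign funnel makes possible, the sketch below computes step-to-step conversion from raw events. The event shape, the `step_conversion` helper, and the landlord ids are assumptions made for the example, not Zillow's actual instrumentation.

```python
# Funnel steps named in the text; everything else here is hypothetical.
FUNNEL_STEPS = ["create", "send", "sign"]


def step_conversion(events):
    """Return the conversion rate between each pair of adjacent funnel steps.

    events is an iterable of (landlord_id, step) tuples.
    """
    reached = {step: set() for step in FUNNEL_STEPS}
    for landlord_id, step in events:
        if step in reached:
            reached[step].add(landlord_id)

    rates = {}
    for prev, nxt in zip(FUNNEL_STEPS, FUNNEL_STEPS[1:]):
        started = len(reached[prev])
        # Count only landlords who reached both the previous and the next step.
        continued = len(reached[prev] & reached[nxt])
        rates[f"{prev}->{nxt}"] = continued / started if started else 0.0
    return rates


# Invented events: four landlords create a lease, two send it, one signs.
events = [
    ("a", "create"), ("a", "send"), ("a", "sign"),
    ("b", "create"), ("b", "send"),
    ("c", "create"),
    ("d", "create"),
]
rates = step_conversion(events)
```

The step with the lowest rate is where landlords lose momentum, which makes it the natural place to run the next experiment.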

Figure: Zillow Rentals lease signing, public product page. The Zillow Rental Manager marketing page (edit and sign a lease anywhere) shows the lease upload, e-sign, and lease documents product surfaces.

04

Multimodal LLMs with confidence routing

The hardest test is the purchase and sale agreement: the most important contract in a residential transaction, the moment a lead becomes a transaction, and the place where a wrong field costs the most. The OCR/NLP generation of the system topped out around 85% accuracy. Shipping multimodal LLM document understanding to production changed the ceiling, but only because of how it shipped: confidence-based routing with human-in-the-loop guardrails, built in deep partnership with engineering on model selection, confidence thresholds, and cost/latency tradeoffs.

High-confidence extractions flow straight through. Anything below threshold routes to a human reviewer. Automation alone took accuracy from ~85% to 98%+ across 1M+ documents a year. Compliance-critical fields always pass through review, and on those the machine-plus-human system delivers ~100%.
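The routing rule described above can be sketched in a few lines. This is a minimal illustration, not the production implementation: the 0.90 threshold, the field names, the sample values, and the contents of the compliance-critical set are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical values: the source does not state the real threshold
# or which fields count as compliance-critical.
CONFIDENCE_THRESHOLD = 0.90
COMPLIANCE_CRITICAL_FIELDS = {"seller_disclosure_ack"}


@dataclass(frozen=True)
class Extraction:
    field: str
    value: str
    confidence: float  # model-reported score in [0.0, 1.0]


def route(extraction: Extraction, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return "human_review" or "auto_accept" for one extracted field."""
    # Compliance-critical fields always go to review, whatever the confidence.
    if extraction.field in COMPLIANCE_CRITICAL_FIELDS:
        return "human_review"
    # Every other field is decided by the confidence threshold.
    if extraction.confidence < threshold:
        return "human_review"
    return "auto_accept"


# Invented sample extractions for illustration only.
extractions = [
    Extraction("purchase_price", "$500,000", 0.98),
    Extraction("closing_date", "2024-06-30", 0.94),
    Extraction("financing_contingency", "21 days", 0.73),
    Extraction("seller_disclosure_ack", "yes", 0.99),
]
decisions = {e.field: route(e) for e in extractions}
```

Note that the last field is routed to review despite a 0.99 score: the compliance check runs before the threshold check, so no confidence value can bypass it.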

Illustration: a stylized residential purchase and sale agreement with invented, generic values and fictional parties named A. Buyer and S. Seller. The extraction model draws bounding boxes around fields with confidence scores. Purchase price reads 98 percent and closing date 94 percent, both high confidence. Financing contingency (21 days) reads 73 percent, below threshold, so it routes to a human review panel with an approve action. The accuracy ticker climbs from an 85 percent baseline to 98 percent and above, measured in production, which is what automation achieves alone; a review stamp then appears reading approximately 100 percent on review-gated compliance fields, which is what the machine-plus-human system delivers.

05

The eval practice

No model rollout shipped without passing the eval practice: golden datasets, per-field accuracy gates, and production quality monitoring. This is the system that turns AI that demos well into AI that professionals trust with compliance documents.

It is also the receipt behind the headline number. 98%+ automated is not a demo metric; it is measured, gated, and monitored in production. And on review-gated compliance fields, the machine-plus-human system reached ~100%.
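A per-field accuracy gate over a golden dataset might look like the sketch below. It assumes exact-match scoring and invented gate values; the function names, the data shapes, and the thresholds are hypothetical, since the actual gates are not published here.

```python
from collections import defaultdict

# Hypothetical per-field minimum accuracies and a fallback gate.
FIELD_GATES = {"purchase_price": 0.98, "closing_date": 0.98}
DEFAULT_GATE = 0.95


def per_field_accuracy(golden, predicted):
    """Compute exact-match accuracy per field.

    golden and predicted map document id -> {field name: value}.
    A field missing from a prediction counts as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for doc_id, expected_fields in golden.items():
        predicted_fields = predicted.get(doc_id, {})
        for field, expected in expected_fields.items():
            total[field] += 1
            if predicted_fields.get(field) == expected:
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


def failed_gates(accuracy, gates=FIELD_GATES, default=DEFAULT_GATE):
    """Return the fields whose accuracy falls below their gate."""
    return {
        field: acc
        for field, acc in accuracy.items()
        if acc < gates.get(field, default)
    }


# Invented two-document golden set; the model misreads one closing date.
golden = {
    "doc-1": {"purchase_price": "500000", "closing_date": "2024-06-30"},
    "doc-2": {"purchase_price": "425000", "closing_date": "2024-07-15"},
}
predicted = {
    "doc-1": {"purchase_price": "500000", "closing_date": "2024-06-30"},
    "doc-2": {"purchase_price": "425000", "closing_date": "2024-07-16"},
}
accuracy = per_field_accuracy(golden, predicted)
blocked = failed_gates(accuracy)
rollout_allowed = not blocked
```

Gating per field rather than on one aggregate number keeps a regression on a single important field from hiding behind strong results elsewhere.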

06

Guardrails and human review

The human-in-the-loop layer was a product decision, not a fallback. Compliance-critical fields always earn review; confidence thresholds decide everything else. Reviewer corrections flow back into quality monitoring, so the system gets more trustworthy with use.
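One way reviewer corrections can feed quality monitoring is a per-field correction rate with an alert level. The sketch below is an assumption about how such a loop could work; the `CorrectionMonitor` class, its parameters, and the sample values are hypothetical.

```python
from collections import Counter


class CorrectionMonitor:
    """Track how often reviewers change the model's value, per field."""

    def __init__(self, alert_rate=0.05, min_reviews=20):
        # Hypothetical defaults: flag a field once it has enough reviews
        # and its correction rate exceeds alert_rate.
        self.alert_rate = alert_rate
        self.min_reviews = min_reviews
        self.reviewed = Counter()
        self.corrected = Counter()

    def record(self, field, model_value, reviewer_value):
        """Log one review; a differing reviewer value counts as a correction."""
        self.reviewed[field] += 1
        if model_value != reviewer_value:
            self.corrected[field] += 1

    def correction_rate(self, field):
        reviewed = self.reviewed[field]
        return self.corrected[field] / reviewed if reviewed else 0.0

    def fields_needing_attention(self):
        """Return sorted field names whose correction rate exceeds the alert level."""
        return sorted(
            field
            for field, count in self.reviewed.items()
            if count >= self.min_reviews
            and self.correction_rate(field) > self.alert_rate
        )


# Invented review log: two of four contingency reviews were corrected.
monitor = CorrectionMonitor(alert_rate=0.25, min_reviews=4)
for model_value, reviewer_value in [
    ("21 days", "21 days"),
    ("21 days", "30 days"),
    ("14 days", "14 days"),
    ("10 days", "17 days"),
]:
    monitor.record("financing_contingency", model_value, reviewer_value)
for _ in range(4):
    monitor.record("closing_date", "2024-06-30", "2024-06-30")

flagged = monitor.fields_needing_attention()
```

A flagged field is a candidate for a tighter confidence threshold or for new golden-dataset examples.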

The model is never the product. The evals, guardrails, and review systems that earn a professional's trust are the product. That pattern, agents draft and the human reviews, is the same one that now runs a real business.

07

Results

  • 1M+ documents a year

    across the multi-tenant platform

  • ~85% to 98%+ automated accuracy

    and ~100% on review-gated compliance fields, machine plus human

  • 8 business units, 20+ teams

    at 95%+ platform accuracy

  • 2x landlord activation

    with market share from 18% to 30%

  • ~60% less admin time per deal

    in early pilots of the AI-first workflow product

  • $2M+ annual licensing cost eliminated

    via a build-vs-buy evaluation of 3 vendors plus an internal PoC, integrated in ~4 weeks

The platform earned its place by being boring in the best way: measured, gated, reviewed. That is what “from demo to production” means.