Case study · Zillow · 2019–2026
From demo to production
Six years building Zillow's document intelligence platform end to end: from a reusable multi-tenant foundation, to digital leases at national scale, to multimodal LLM document understanding that professionals trust with compliance paperwork.
- 1M+ documents a year
- 98%+ automated accuracy, from ~85%
- 8 business units served
- 2x landlord activation
01
Documents are the gates
In real estate, nothing moves until a document is signed. Listing agreements, disclosures, leases, purchase contracts: documents are the lynchpin of the transaction, and executing them is what moves it forward. When the signature lands, a house gets bought, sold, or rented. Someone gets a place to live. Someone gets paid.
Discovery started there: mapping the document workflows that gate every Zillow business, the same use cases an entire industry runs through DocuSign and dotloop. E-signature. Secure document storage. Compliance for documents and signatures. And preparing fillable documents from the templates and forms that MLSs and brokerages publish.
02
The platform
Each business had been solving those use cases separately, and every vendor handoff bounced users into someone else's product: another brand, another login, a UX Zillow could not shape. The thesis: documents are too central to the experience to outsource. We took the use cases and built the internal platform that became the foundation for all real estate documents at Zillow: e-sign, secure document storage, compliance, and template-to-fillable-form preparation, native in one multi-tenant platform.
What that bought was the experience, not just the capability. A renter signs a lease without ever leaving Zillow. An agent prepares disclosures inside the tools they already work in. Every business unit shapes the flow to its own users, consumers on one side and professionals on the other, instead of inheriting a vendor's one-size-fits-all UX. The platform scaled across 8 business units and 20+ teams, processing 1M+ documents a year at 95%+ platform accuracy. It also retired $2M+ in annual licensing.
On top of it: an AI-first document and workflow product for brokers and agents, covering automated paperwork, compliance, reporting, and proactive next best actions. Early pilots showed roughly 60% less admin time per deal and about $1M in annual savings for a mid-size brokerage.
Fragmented vendor stack, each category in someone else's UX:
- E-sign: DocuSign · Adobe Sign · Authentisign
- Transactions + compliance: dotloop · SkySlope · TransactionDesk
- Forms + document prep: zipForm · Adobe Acrobat · Form Simplicity
- Secure storage: Box · Dropbox · Google Drive
One native platform, inside Zillow's surfaces:
- E-sign: built in
- Secure storage: built in
- Document prep: built in
- Compliance: built in
Zillow-shaped UX on every surface, customizable per business, and $2M+ licensing retired.
03
Digital leases at national scale
Zillow Rentals was the flagship proof at scale: digital leases shipped nationally in about 12 months. The lease funnel (create, send, sign) was instrumented end to end, and the team iterated with experiments at every step where landlords lost momentum.
The results: landlord activation doubled, and market share grew from 18% to 30%.

04
Multimodal LLMs with confidence routing
The hardest test is the purchase and sale agreement: the most important contract in a residential transaction, the moment a lead becomes a transaction, and the place where a wrong field costs the most. The OCR/NLP generation of the system topped out around 85% accuracy. Shipping multimodal LLM document understanding to production changed the ceiling, but only because of how it shipped: confidence-based routing with human-in-the-loop guardrails, built in deep partnership with engineering on model selection, confidence thresholds, and cost/latency tradeoffs.
High-confidence extractions flow straight through. Anything below threshold routes to a human reviewer. Automation alone took accuracy from ~85% to 98%+ across 1M+ documents a year. Compliance-critical fields always pass through review, and on those the machine-plus-human system delivers ~100%.
Illustration: a stylized residential purchase and sale agreement with invented, generic values and fictional parties named A. Buyer and S. Seller. The extraction model draws bounding boxes around fields with confidence scores. Purchase price reads 98 percent and closing date 94 percent, both high confidence. Financing contingency reads 73 percent, below threshold, so it routes to a human review panel with an approve action. The accuracy ticker climbs from 85 percent to 98 percent and above, which is what automation achieves alone; a review stamp then appears reading approximately 100 percent on review-gated compliance fields, which is what the machine-plus-human system delivers.
Human review: Financing contingency, 21 days. 73% confidence, below threshold. Routed for review.
The editor layer
- 98%+ automated field accuracy, from an 85% baseline, measured in production
- ~100% with human review, on review-gated compliance fields
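The routing rule described above can be sketched in a few lines. This is a minimal illustration, not the production implementation: the 0.90 threshold, the field names, the `seller_disclosure_signed` compliance field, and the sample values are all assumptions chosen to match the illustration (98% and 94% pass, 73% routes to review).

```python
from dataclasses import dataclass

# Hypothetical values for illustration; the real threshold and field list are not stated here.
CONFIDENCE_THRESHOLD = 0.90
COMPLIANCE_CRITICAL_FIELDS = {"seller_disclosure_signed"}


@dataclass(frozen=True)
class Extraction:
    field: str
    value: str
    confidence: float  # model-reported score in [0, 1]


def route(extraction: Extraction, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return "auto_accept" or "human_review" for one extracted field."""
    # Compliance-critical fields always go to review, whatever the confidence.
    if extraction.field in COMPLIANCE_CRITICAL_FIELDS:
        return "human_review"
    # Everything else is decided by the confidence threshold.
    if extraction.confidence < threshold:
        return "human_review"
    return "auto_accept"


# Invented sample values, mirroring the stylized agreement in the illustration.
extractions = [
    Extraction("purchase_price", "invented amount", 0.98),
    Extraction("closing_date", "invented date", 0.94),
    Extraction("financing_contingency", "21 days", 0.73),
    Extraction("seller_disclosure_signed", "yes", 0.99),
]

for item in extractions:
    print(f"{item.field}: {route(item)}")
```

The order of the two checks matters: the compliance check comes first, so a high confidence score can never bypass review on a gated field.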
05
The eval practice
No model rollout shipped without passing the eval practice: golden datasets, per-field accuracy gates, and production quality monitoring. This is the system that turns AI that demos well into AI that professionals trust with compliance documents.
It is also the receipt behind the headline number. 98%+ automated is not a demo metric; it is measured, gated, and monitored in production. And on review-gated compliance fields, the machine-plus-human system reached ~100%.
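A per-field accuracy gate over a golden dataset might look like the sketch below. It assumes exact-match scoring, documents represented as field-to-value dictionaries, and a 0.98 gate per field; the function names, data shapes, and sample values are hypothetical, not the team's actual harness.

```python
from collections import defaultdict


def per_field_accuracy(golden, predicted):
    """Exact-match accuracy per field over aligned lists of {field: value} dicts."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for gold_doc, pred_doc in zip(golden, predicted):
        for field, gold_value in gold_doc.items():
            total[field] += 1
            # A missing or different prediction counts as an error.
            if pred_doc.get(field) == gold_value:
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


def check_gates(accuracy, gates):
    """Return (passed, failures); failures maps each failing field to its accuracy."""
    failures = {
        field: accuracy.get(field, 0.0)
        for field, minimum in gates.items()
        if accuracy.get(field, 0.0) < minimum
    }
    return (not failures, failures)


# Invented two-document golden set and model output, for illustration only.
golden = [
    {"purchase_price": "100", "closing_date": "2024-01-15"},
    {"purchase_price": "200", "closing_date": "2024-02-01"},
]
predicted = [
    {"purchase_price": "100", "closing_date": "2024-01-15"},
    {"purchase_price": "200", "closing_date": "2024-02-10"},
]

accuracy = per_field_accuracy(golden, predicted)
passed, failures = check_gates(accuracy, {"purchase_price": 0.98, "closing_date": 0.98})
print(passed, failures)
```

Gating per field rather than on an aggregate keeps a strong average from hiding a weak field, which is the point of a per-field gate.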
06
Guardrails and human review
The human-in-the-loop layer was a product decision, not a fallback. Compliance-critical fields always earn review; confidence thresholds decide everything else. Reviewer corrections flow back into quality monitoring, so the system gets more trustworthy with use.
The model is never the product. The evals, guardrails, and review systems that earn a professional's trust are the product. That pattern, agents draft and the human reviews, is the same one that now runs a real business.
07
Results
1M+ documents a year
across the multi-tenant platform
~85% to 98%+ automated accuracy
and ~100% on review-gated compliance fields, machine plus human
8 business units, 20+ teams
at 95%+ platform accuracy
2x landlord activation
with market share from 18% to 30%
~60% less admin time per deal
in early pilots of the AI-first workflow product
$2M+ annual licensing cost eliminated
via build-vs-buy across 3 vendors plus an internal PoC, integrated in ~4 weeks
The platform earned its place by being boring in the best way: measured, gated, reviewed. That is what “from demo to production” means.