Methodology
Listings source
Pricetag sources its homes from Realtor.com's public listings through Realtor.com's structured API. Each listing carries the published list price, address, beds, baths, square footage, lot size, year built, and the photo set the publisher uploaded. We mirror photos to our own storage (Cloudflare R2) so the game keeps working even if a listing is removed from Realtor.com after ingest.
Ingest pipeline
Ingest runs in four staged passes:
- Plan — pick which markets to query, how many homes from each.
- Fetch — pull raw listing JSON from the source API, cache to R2.
- Normalize — parse fields, validate addresses, compute neighborhood from coordinates where missing.
- Persist — write to Postgres only the listings that pass a quality gate (has photos, has a valid price, has a parseable address).
The staged design means a failure at one step doesn't corrupt the database: bad data is rejected before Persist writes anything, so it never makes it to the gameable pool.
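As a rough illustration of the quality gate in the Persist step, here is a minimal Python sketch. The Listing record, its field names, and the helper functions are hypothetical, not our actual schema, and the address check is a simple stand-in for real address parsing.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Listing:
    # Hypothetical normalized record; field names are assumptions.
    price: Optional[int]
    address: Optional[str]
    photos: List[str] = field(default_factory=list)


def passes_quality_gate(listing: Listing) -> bool:
    """Return True only if the listing has photos, a valid price, and an address."""
    has_photos = len(listing.photos) > 0
    has_valid_price = listing.price is not None and listing.price > 0
    # Stand-in for "parseable address": here we only require non-blank text.
    has_address = bool(listing.address and listing.address.strip())
    return has_photos and has_valid_price and has_address


def persistable(listings: List[Listing]) -> List[Listing]:
    """Keep only the listings that would be written to the database."""
    return [item for item in listings if passes_quality_gate(item)]
```

A listing with no photos, a missing price, or a blank address is dropped by persistable and never reaches the database write.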
Why some markets show up more
Our target-markets list weights cities by tier. Top metros (NYC, LA, Chicago, Houston, Dallas, Miami, Atlanta, Boston, Seattle, Washington DC, Denver, Phoenix, Philadelphia, San Francisco) are weighted 1.0, so they show up most often. Secondary cities (Buffalo, Memphis, Tulsa, Birmingham, Des Moines, Columbus, Pittsburgh, etc.) are weighted 0.4–0.6, so they show up less often but still regularly, which keeps the game from feeling like the same five cities on repeat.
The weighting is editable; it's in data/target-markets.json in the public repo. If a market you care about is missing, that's the file to change.
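To show how tier weights could translate into how often a market appears, here is a small Python sketch of weighted sampling. The MARKETS dictionary and the pick_markets function are hypothetical; the real schema of data/target-markets.json may differ, and the per-city secondary weights below are illustrative values inside the 0.4–0.6 range.

```python
import random
from typing import Dict, List, Optional

# Hypothetical excerpt; the real file layout and per-city weights may differ.
MARKETS: Dict[str, float] = {
    "New York": 1.0,
    "Chicago": 1.0,
    "Buffalo": 0.5,  # illustrative secondary-tier weight
    "Tulsa": 0.4,    # illustrative secondary-tier weight
}


def pick_markets(
    markets: Dict[str, float], k: int, rng: Optional[random.Random] = None
) -> List[str]:
    """Draw k markets with replacement, proportional to their weights."""
    if rng is None:
        rng = random.Random()
    names = list(markets)
    weights = [markets[name] for name in names]
    return rng.choices(names, weights=weights, k=k)
```

Under this scheme a market weighted 1.0 is drawn about twice as often as one weighted 0.5, so raising or lowering a weight changes frequency smoothly rather than switching a city on or off.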
Scoring math
Each guess is scored as 1 minus the absolute difference between the guess and the actual price, taken as a fraction of the actual price, then clamped to [0, 1] and reported as a percentage. A guess of $850k on a $1M home scores 85% accuracy. A guess of $400k on a $1M home scores 40%. Over-guesses and under-guesses are treated symmetrically: a guess of $1.15M on that same $1M home also scores 85%. Because of the clamp, any guess that misses by 100% of the actual price or more scores 0%.
The accuracy gets binned into the buckets you see on the result screen and the leaderboard. The bucket cutoffs are 90/80/70/60/50; see How it works for the bucket labels.
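The scoring rule and the bucket cutoffs can be sketched in a few lines of Python. The function names are ours for illustration, and treating each cutoff as inclusive (exactly 90% lands in the 90 bucket) is an assumption.

```python
from typing import Optional

# Cutoffs from the text, highest first.
CUTOFFS = (90, 80, 70, 60, 50)


def accuracy(guess: float, actual: float) -> float:
    """Accuracy as a percentage: 1 - |guess - actual| / actual, clamped to [0, 1]."""
    if actual <= 0:
        raise ValueError("actual price must be positive")
    score = 1.0 - abs(guess - actual) / actual
    return max(0.0, min(1.0, score)) * 100.0


def bucket_floor(pct: float) -> Optional[int]:
    """Return the highest cutoff the score meets, or None if below all of them.

    Inclusive boundaries are an assumption, not confirmed behavior.
    """
    for cutoff in CUTOFFS:
        if pct >= cutoff:
            return cutoff
    return None
```

With these definitions, the $850k guess on a $1M home lands in the 80 bucket, and the $400k guess falls below every cutoff.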
Address redaction
The full street address is never displayed until you submit a guess. On the play screen you see neighborhood + city + state — enough to calibrate against the market, not enough to look up the listing directly. After your guess, the reveal screen shows the address, the list price, and a link to the original listing.
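One way to enforce this is to build the play screen and the reveal screen from separate views, so the street address never appears in the pre-guess payload at all. The sketch below is a hypothetical Python illustration; the Home record and both function names are assumptions, not our actual code.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Home:
    # Hypothetical record; field names are assumptions.
    street_address: str
    neighborhood: str
    city: str
    state: str
    list_price: int
    listing_url: str


def play_view(home: Home) -> Dict[str, Any]:
    """Fields shown before a guess: location context only."""
    return {
        "neighborhood": home.neighborhood,
        "city": home.city,
        "state": home.state,
    }


def reveal_view(home: Home) -> Dict[str, Any]:
    """Fields shown after a guess: everything in play_view plus the reveal."""
    return {
        **play_view(home),
        "street_address": home.street_address,
        "list_price": home.list_price,
        "listing_url": home.listing_url,
    }
```

Building the pre-guess payload from an allowlist of fields, rather than stripping fields from the full record, means a newly added field stays hidden by default.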
Listing freshness
We re-ingest on a schedule that varies by market tier. Top metros refresh every few days; secondary cities refresh weekly. Each home's reveal screen shows the ingest date, so you know whether the price you guessed against was captured this week or last month.
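A tiered schedule like this can be expressed as a refresh interval per tier. In the Python sketch below, the tier names, the is_due function, and the 3-day interval for top metros are all assumptions; 3 days is only a placeholder for "every few days".

```python
from datetime import datetime, timedelta

# Placeholder intervals; the top-tier value stands in for "every few days".
REFRESH_INTERVAL = {
    "top": timedelta(days=3),
    "secondary": timedelta(days=7),
}


def is_due(last_ingest: datetime, tier: str, now: datetime) -> bool:
    """Return True if a market in this tier is due for re-ingest."""
    return now - last_ingest >= REFRESH_INTERVAL[tier]
```

The same last_ingest timestamp is what the reveal screen would surface as the ingest date.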
Questions we didn't cover are probably in the FAQ. The story of the project is on the About page.