Would love to hear more from you! Can you point out which graph in particular is inaccurate and what the deficiency is? Also it’s cool that you have so much experience as an assessor - can you tell me which office you worked in?
I worked in an Assessor’s Office in Utah. I know this sounds funny, but we worked with Assessors in Oklahoma when I was in the Assessor’s Office. So, it’s not as different across the United States as people might think.
Just a few things:
Where I live, Sales Chasing is illegal. There are two different types of states: “Full Disclosure” and “Non-Disclosure.” There are several “Non-Disclosure” states where the Assessor is not allowed to see the amount the home was purchased for.
If the Assessor does find out the sales price, and values the home at that price w/o valuing all the other homes in the area, the assessor can be in hot water for Sales Chasing.
It sounds like Iowa is “Full Disclosure” and Sales Chasing is alive and well.
One reason for a discrepancy in the price of homes that are very similar is that not all homes are valued as of January 1st. Some states value properties as of June 1st. Not all Assessors’ offices are county offices; some are city offices.
Often a Plat (Plot Map) is handled by one Assessor in the office and the next Plot Map is handled by another Assessor. So I would expect there to be some variation in the way the homes are valued.
Another possible reason for the discrepancy is that homes must be appraised at least once every 5 years, so an Assessor’s office puts homes on a schedule to be valued. One year the office might value all the homes built from 1970 - 1999.
So if the home next door to the 1999 home was built in 2000, its value could stay very close to what it currently is, and its assessed value may not change until the next year.
If you’re curious about appeals, I’ve been in thousands of appeals and I know what works the best.
Mass appraisal can be in the form of an AVM, Cost Approach, Sales Comparison Approach, Income Approach, etc. It depends on the size of the Assessor’s office and the number of parcels being appraised.
But I’ve learned that most people are afraid of the process because they fear what they don’t understand.
My goal now is to help people learn to minimize their property taxes, then help them know which improvements will give their homes the most bang for their buck.
This is just a microscopic look into the Assessor’s world. Let me know if I have bored you to death.
Thank you for taking the time to share your insights and experience—it’s always a treat to connect with someone who’s been in the thick of it for so long. I genuinely appreciate your feedback and am eager to learn from your perspective.
It’s great to hear you’ve worked in Utah. I know Utah's a big place, but any chance you know Jake Parkinson who used to work at the Tooele County Assessor’s Office? I've learned a lot from him.
Let’s go through your points one by one:
Sales Chasing & Non-Disclosure:
I completely agree that the distinction between “Full Disclosure” and “Non Disclosure” states is crucial. In Texas (my home state), we face similar challenges, and I’ve found the Utah offices I've worked with to be particularly resourceful in navigating non-disclosure issues. I discuss this topic in more detail in this section (https://progressandpoverty.substack.com/i/158598256/data-scarcity). If there’s anything you feel could be clearer, I’d be happy to hear your thoughts.
If any section seemed ambiguous, please let me know where I might clarify further.
Assessor Offices’ Structures:
You’re absolutely right that not all assessor offices operate at the county level; some are indeed city offices or even independent entities. I’ve illustrated this point with examples of “appraisal districts” that function under the authority of neither the city nor the county (https://progressandpoverty.substack.com/i/158598256/actual-bad-behavior). I appreciate you highlighting this nuance.
Mass Appraisal Methods:
Thank you for pointing out the broader scope of mass appraisal techniques. My article implicitly focused on the sales comparison approach, which admittedly means I glossed over alternative methods like the AVM, Cost Approach, or Income Approach. I’ve covered these in previous articles, and I appreciate the reminder to ensure readers understand the full context.
Finally, regarding your comment that “your graph is not accurate” — could you please indicate which graph you’re referring to and what specific issue you noticed? The illustrations are meant to be simplified examples that get one concept across at a time rather than claiming to explain everything at once. I’m more than willing to address any specific concerns if you could point me in the right direction.
Thanks again for your thoughtful feedback. I value the opportunity to engage in this discussion and am always open to learning from the experts. Wishing you a fantastic day ahead!
I actually think the illustrations are really good for explaining what happens and generally how things work. I think you've done a tremendous job.
I'm new to this forum so I apologize for the criticism.
I would just like to add a quick word of caution about your illustrations (graphs). While I do believe they are great as a general rule of thumb to get your point across, Zillow, Redfin, or a real estate agent may be reporting the sq ft of the homes in the wrong manner.
Currently I review appraisals for Fannie Mae for correctness. Every borrower is allowed one appeal per appraisal due to a Fannie Mae regulation as of last October.
One of the biggest complaints I hear in appeals currently is that real estate agents, Zillow, and Redfin are adding the basement sq ft to the main level sq ft for marketing purposes which often makes the home look like a better deal.
Appraisers are not allowed to report the sq ft of a home this way. There is no guarantee the Assessor does not report the sq ft of a home in the same manner as the real estate agents.
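As a quick illustration, one could sanity-check a listing against assessor records with something like this. This is only a sketch: the function name, field names, and the 5% tolerance are my own invention, not a real Fannie Mae or assessor check.

```python
def looks_like_combined_sqft(listed_sqft, above_grade_sqft, basement_sqft,
                             tolerance=0.05):
    """Flag a listing whose advertised square footage appears to be
    above-grade plus basement area lumped together (the marketing
    practice described above).

    All parameter names are hypothetical; real assessor and MLS data
    use varying field names, and the 5% band is arbitrary.
    """
    combined = above_grade_sqft + basement_sqft
    if combined <= 0:
        return False
    return abs(listed_sqft - combined) / combined <= tolerance
```

A listing advertised at 3,000 sq ft when the assessor shows 2,000 above grade plus a 1,000 sq ft basement would trip this flag, while a listing matching the above-grade figure would not.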
In regards to Jake Parkinson, I want to say yes. I believe Jake was in a different County when I met him. There are 29 Counties in Utah with a total population of about 3.4 million. So, I do know some of the Assessors well, but they are the assessors in the more populated areas.
I don't know last names really well. First names and faces mostly.
Lars, I think you're doing a great job. Keep up the good work.
Hey nothing to apologize for! I want to keep myself sharp and nothing is better than an experienced practitioner for that.
> One of the biggest complaints I hear in appeals currently is that real estate agents, Zillow, and Redfin are adding the basement sq ft to the main level sq ft for marketing purposes which often makes the home look like a better deal.
This is a really great insight. Including unfinished square footage, I presume, as if it were finished? I'll be sure to be on the lookout for this anytime I touch those sources.
Do they also conduct ratio studies where the assessed value is divided by the sale price _soon after_ the valuation/release date? This should be a better shield against sales chasing, right?
It varies by jurisdiction. Most assessors will run ratio studies throughout the process, but the state oversight boards will typically stick to a "standardized test" as above.
Indeed, using new sales to check the predictive power of your valuations is a good practice. The challenge is that you are always limited by the amount of sales, and sales soon after the valuation date will only make up a portion of them.
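To make the idea concrete, here is a rough sketch of a split ratio study. The function name, the day-offset encoding, and the interpretation are my own framing, not an official IAAO procedure:

```python
import numpy as np

def split_ratio_study(assessed, sale_prices, sale_offsets_days):
    """Median sales ratio for sales before vs. after the valuation date.

    sale_offsets_days: days from the valuation date (negative = before).
    A pre-valuation median sitting near 1.00 while the post-valuation
    median drifts away can hint at sales chasing: the assessor could
    'chase' the sales it saw, but not the ones that hadn't happened yet.
    """
    ratios = np.asarray(assessed, float) / np.asarray(sale_prices, float)
    offsets = np.asarray(sale_offsets_days)
    before = ratios[offsets < 0]
    after = ratios[offsets >= 0]
    return np.median(before), np.median(after)
```

For example, pre-valuation ratios of exactly 1.00 paired with post-valuation ratios around 0.85 would be the kind of gap worth investigating.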
How would it work in a dying village that barely sees any transactions? Would you generalize pricing based on the availability of amenities in a similar location that does have transaction data?
I’m also curious how it would work with a very high LVT when you couldn’t rely on sales data.
So, obviously the more extreme data challenges are more difficult and you should expect wider error bars there.
> How would it work in a dying village that barely sees any transactions? Would you generalize pricing based on the availability of amenities in a similar location that does have transaction data?
There are some who do this, but it will certainly be challenged on the basis of how comparable the two areas are, and that will be difficult to mount a counter-argument against. How do we define a "similar location?" I should note that this problem sometimes does occur *within* a jurisdiction -- and in that case the usual answer is to find some--preferably economic--anchor point, such as median income for the local area, which is widely published in US census data. The comparable location should also have similar density, same mix of property types, and not be too far away.
Another consideration is that there is more than one general approach to value and here I only spoke about the most common one, the "market approach" or "sales comparison approach." The other two approaches to value are the cost approach and the income approach.
In an area without any sales, or very sparse sales (such as is often the case with commercial, industrial, or rental property) the income approach comes into play. This would be a net present value of discounted cash flow model, the same model that real estate investors use to generate the prices they bid on properties in the first place. This assumes you have access to rental data, cap rates, etc.
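As a toy illustration of the income approach, here is a minimal discounted-cash-flow sketch. All names and inputs are hypothetical; real models layer in vacancy, expense growth, reversion assumptions, and more:

```python
def income_approach_value(annual_noi, discount_rate, terminal_cap_rate):
    """Toy discounted-cash-flow (income approach) valuation sketch.

    annual_noi: projected net operating income for each future year.
    The terminal value (final-year NOI / terminal cap rate) is
    discounted back alongside the final year's income.
    """
    # Present value of each projected year of income
    pv = sum(noi / (1 + discount_rate) ** (year + 1)
             for year, noi in enumerate(annual_noi))
    # Terminal value via direct capitalization, then discount it back
    terminal_value = annual_noi[-1] / terminal_cap_rate
    pv += terminal_value / (1 + discount_rate) ** len(annual_noi)
    return pv
```

This is the same net-present-value logic an investor would run before bidding, which is exactly why the approach works where rental data exists but sales are sparse.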
This is very late, but I wanted to leave some naive comments/questions here. I'm coming at this from much more of a math/stats background and without any domain knowledge, so apologies if the questions/comments here are naive.
1. On a mathematical note, why don't we just do everything in the log-space of prices? I find it confusing that we calculate ratios first (a multiplicative operation) but then do additive statistics on them.
2. Why do we use the coefficient of dispersion (a rescaled mean absolute deviation) instead of a more "standard" measure of dispersion (e.g. a standard deviation if we don't care about robustness or a MAD or Shamos Scale Estimator if we want a robust estimator)? If we are trimming "sales outside the interquartile range" (that's trimming 50% of the data!), then (IMO) that implies we should just be using a robust measure of scale instead [0]. Is this just historical?
3. How does "fair market value" work when you can bundle/split property? I can imagine that some properties are worth "more than the sum of the parts" (e.g. it's more valuable to own all condos in a building if you want to redevelop) or "less than the sum of its parts". Do we just assume that property is "pre-parceled" and so we don't deal with mergers/splits?
4. IIUC, "vertical equity" is what I'm used to calling "calibration": for any house, you want the expected sales price to be the predicted valuation (adjusted for timing). What you *don't* want is to have systematic deviations from the y=x line (e.g. so that expensive homes aren't consistently undervalued). If that's right, then I'd suggest taking a look at approaches that *avoid* binning/stratifying altogether -- I believe the most accepted approaches today are based on cumulative deviations [1]. The simplified TLDR (the references in [1] have the details):
1. Store the (log) assessed values in an array `A` and the (log) sales prices in `S`.
2. Sort them in the ascending order of the assessed values: `S = S[argsort(A)]` and `A = sort(A)`.
3. Calculate the cumulative errors `C = (A - S).cumsum()`
4. Calculate the expected total variance `sigma2 = sum((A - S)**2)`. [2]
5. Calculate `K = (C.max() - C.min()) / sqrt(sigma2)` (the author calls this the "Kuiper-type statistic" after https://en.wikipedia.org/wiki/Kuiper%27s_test). `K` is now a summary statistic with nice theoretical properties (including p-values because the cumsum should converge to a Brownian motion). You can also plot `C / sqrt(sigma2)` to visualize any discrepancies.
5. I was confused about the discussion of "horizontal equity". Aren't we begging the question if we assume that things in the CHD should have low dispersion on the valuations? It seems entirely circular to *assume* that certain buildings must have similar valuations and then use that to evaluate a valuation model! I feel like what makes more sense is exactly the "deviation of a subpopulation from the full population" test from https://link.springer.com/content/pdf/10.1007/s10444-023-10068-6.pdf (where our subpopulation can be "property in a geographic unit" or "property owned by a particular minority group") which does not assume that our subpopulation should have similar home prices.
[0] I'm not an expert on these, but https://pragmastat.dev/ collects these (with proofs and links to references) in a central place.
[2] This point is a bit buried in the papers (since they mostly focus on classification and not prediction), but the key is that the null hypothesis of "perfect calibration" is that E[S] = A and that S is independent given A. Thus, Var[sum(A - S)] = sum(Var(A[k] - S[k])) (by conditional independence) = sum(E[(A[k] - S[k])**2]) (by the expectation of S) = E[sum((A[k] - S[k])**2)] (by linearity of expectation).
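For what it's worth, the five steps above could be sketched in NumPy roughly like this (my own transcription of the steps; it assumes raw positive values and takes logs internally):

```python
import numpy as np

def kuiper_calibration(assessed, sale_prices):
    """Cumulative-deviation ("Kuiper-type") calibration statistic,
    following the five steps described above. Returns K plus the
    normalized cumulative-error curve for plotting."""
    A = np.log(np.asarray(assessed, float))
    S = np.log(np.asarray(sale_prices, float))
    order = np.argsort(A)          # sort both by ascending assessed value
    A, S = A[order], S[order]
    C = np.cumsum(A - S)           # cumulative errors
    sigma2 = np.sum((A - S) ** 2)  # expected total variance under H0
    K = (C.max() - C.min()) / np.sqrt(sigma2)
    return K, C / np.sqrt(sigma2)
```

Plotting the returned curve against the parcel index gives the visual diagnostic: a well-calibrated model should wander like a Brownian bridge, while systematic over- or under-valuation of a price segment shows up as a sustained drift.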
These are great questions. Let me see if I can get to them all.
The first thing to say is that there is some historical path dependency in assessment and that although there are many professional statisticians in the field, its principal audience, stakeholders, and most importantly, overseers, are not statisticians, but rather laypeople and politicians who need things explained to them in the simplest possible terms. The more intuitive and concrete a measure is for someone who might only have a high school education, the better.
That said, nothing prevents any individual assessor or modeler from using more traditional academic statistics in their internal work. However, for official reporting, these are the statistics that IAAO (and in most cases, the law) insist upon.
I will also note that, as you might expect, there is some lively debate in the community about whether these are the best possible statistics that could be used, and you would not be the first to question them. In that light, do not take my explanations given here as anything other than my best descriptive explanations for why the industry seems to have settled on their usage.
On a further note, nothing stops us from using additional statistics in our free and open source code library, OpenAVMKit. We will always have to generate and report the IAAO standards if we want assessors to use it, but nothing stops us from adding additional ones for extra scrutiny! In that light I appreciate your feedback here.
Now to your questions.
> 1. On a mathematical note, why don't we just do everything in the log-space of prices? I find it confusing that we calculate ratios first (a multiplicative operation) but then do additive statistics on them.
The assessment profession has historically been built around *sale ratios* because they're easy for laypeople to interpret. Also, the sales ratio statistic is *deeply* embedded in both law and IAAO standards. When you're reporting to state agencies and explaining things to taxpayers, sales ratios are a) what they expect and b) much easier for them to understand.
> 2. Why do we use the coefficient of dispersion (a rescaled mean absolute deviation) instead of a more "standard" measure of dispersion (e.g. a standard deviation if we don't care about robustness or a MAD or Shamos Scale Estimator if we want a robust estimator)? If we are trimming "sales outside the interquartile range" (that's trimming 50% of the data!), then (IMO) that implies we should just be using a robust measure of scale instead [0]. Is this just historical?
First of all, on the question of trimming, the actual standards are set by the local jurisdiction, and those practices can vary widely. Some do not let you trim much at all, others let you trim a lot. I consider best practice to always focus on the untrimmed statistics, I just mention the trimming because it's in the standards--IAAO is just trying to cover all the bases in a country with widely divergent local practices.
As for why the IAAO picked COD specifically, I don't know the specific historical reason. If it helps, in an assessor's minds there are two things they are trying to communicate to the public and to their overseers:
- The overall *level* of assessment, ie, if they are generally over-assessing or under-assessing. This is expressed by the median ratio.
- The overall *uniformity* of all those individual assessments relative to the sale prices. This is measured by the COD.
Importantly, measuring these factors separately (overall assessment level and assessment-to-sale uniformity) is often required by state law.
So compared to academic statistics, I admit that calling COD a measure of "accuracy" in the article was a slight simplification, because you can of course measure raw accuracy more directly with other statistics. Another thing is that ratio-based statistics are used because they are simple and easily comparable across jurisdictions and groupings, whereas absolute error measures are not.
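For reference, the two statistics described above (median ratio for level, COD for uniformity) take only a few lines to compute; this sketch follows the standard IAAO COD formula:

```python
import numpy as np

def level_and_uniformity(assessed, sale_prices):
    """Median sales ratio (assessment level) and COD (uniformity).

    COD = 100 * mean(|ratio - median ratio|) / median ratio,
    the standard IAAO formulation (computed here untrimmed).
    """
    ratios = np.asarray(assessed, float) / np.asarray(sale_prices, float)
    median = np.median(ratios)
    cod = 100.0 * np.mean(np.abs(ratios - median)) / median
    return median, cod
```

Because both outputs are unitless, a COD of 10 means the same thing in a rural county as in a big city, which is exactly the cross-jurisdiction comparability mentioned above.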
> 3. How does "fair market value" work when you can bundle/split property? I can imagine that some properties are worth "more than the sum of the parts" (e.g. it's more valuable to own all condos in a building if you want to redevelop) or "less than the sum of its parts". Do we just assume that property is "pre-parceled" and so we don't deal with mergers/splits?
There is an entire literature in the field for dealing with this, so the short answer is "it's complicated." For instance, multi-parcel sales in some jurisdictions will be excluded from consideration on the grounds of being atypical sales, precisely because the package often trades at a discount compared to what the parcels would have sold for individually, and the most typical market transaction in that jurisdiction is for a parcel to trade individually. However, some jurisdictions will have methods of unpacking multi-parcel sales into multiple "individual sales" that they will then use in their modeling. A full treatment is more complicated than I will be able to go into here, but suffice it to say that this is something the field deals with all the time and a lot has been written on the subject.
> 4. IIUC, "vertical equity" is what I'm used to calling "calibration": for any house, you want the expected sales price to be the predicted valuation (adjusted for timing). What you *don't* want is to have systematic deviations from the y=x line (e.g. so that expensive homes aren't consistently undervalued). If that's right, then I'd suggest taking a look at approaches that *avoid* binning/stratifying altogether -- I believe the most accepted approaches today are based on cumulative deviations [1]. The simplified TLDR (the references in [1] have the details):
For what it's worth, vertical equity is a statistic that has had a LOT of debate and the IAAO is currently reworking their standards on it. You can see the latest thing they've come up with here:
There has been much debate and the final decision has not yet been made, but they seem to have settled on this new "VEI" statistic. A key consideration in all these things is getting offices to actually adopt whatever new statistic they pick.
> 5. I was confused about the discussion of "horizontal equity". Aren't we begging the question if we assume that things in the CHD should have low dispersion on the valuations? It seems entirely circular to *assume* that certain buildings must have similar valuations and then use that to evaluate a valuation model! I feel like what makes more sense is exactly the "deviation of a subpopulation from the full population" test from link.springer.com/conte… (where our subpopulation can be "property in a geographic unit" or "property owned by a particular minority group") which does not assume that our subpopulation should have similar home prices.
There's two things to say here--first of all, the opinion you're contending with is not so much that of assessors but that of taxpayers and the law. The basic principle is "equal treatment of equals." The purpose is to have some way of enforcing that the assessor is using a *standardized method* that is not applying personal or arbitrary bias, but that, entirely apart from its predictive performance in matching sales prices, treats the same kinds of properties in the same way.
It's true that there's an assumption -- "If the locations are the same and the characteristics are the same, and there are similar properties from the same local area sample that have sold, the unsold similarly-located, similarly-featured properties should be valued according to the same standardized method and should therefore all have similar valuations." It is the law and the public that are largely making this assumption, however.
I definitely get your argument about circularity; the thing is, empirically, if you violate this standard, oversight will ding you, and the taxpayer whose house has a higher valuation than their neighbor's identical home will protest: "Why do I pay more? His house is the same!"
Now despite that strong directive from both the law and the public, there is not actually a single accepted IAAO measure for horizontal equity. There's a few tests floating around that look at median ratios across different property types, but these don't measure uniformity directly. Looking at sales ratios tells you nothing about how consistently unsold properties are being valued.
My own personal clustering method is just something I came up with. Its chief disadvantage is that it's only as good as its clusters, and its validity only holds if you can make the genuine case that everything within a similar cluster should be similarly valued. The key thing is that I am not clustering just on physical characteristics, but also on location, and if you have experience in mass appraisal you will quickly find that nearly every feature you care about (including sale price) is strongly geospatially correlated.
That said, it's good to kick the tires, so I would welcome some concrete examples of where you would expect this definition/assumption of horizontal uniformity to fail in practice. What kind of property should not be valued the same as its physically identical close neighbors, even if all those physically identical close neighbors have sold for similar prices? That would help to better understand the limits of the conventional approaches.
Thanks for taking the time to respond! Definitely appreciate it (and hopefully at some point I'll have time to look through some of the github code and some actual data myself instead of pontificating on nothing but vibes/theory).
Totally understand that there are constraints on "public" interpretability, standardization (and getting everyone to agree) alongside the theoretical qualities of whatever metrics you choose. It's definitely helpful context to know that the post is more descriptive ("this is the statistics encoded in law / customary practice") rather than prescriptive ("this is best metric for something").
> When you're reporting to state agencies and explaining things to taxpayers, sales ratios are a) what they expect and b) much easier for them to understand.
So I actually think sales ratios are the right thing to use for both reporting and calculation. I just think mechanically, it makes more sense to work in log-space (since `x/y = exp(ln x - ln y)`) and just "convert" to a sales ratio at the end by exponentiation. IMO this leads to more natural statistics (e.g. instead of an "arithmetic mean of ratios" you get a geometric mean which preserves the "multiplicative nature" of what we're doing), although I agree it might be harder to explain.
This only really matters if you are actually adding/subtracting ratios though -- things like the median will be equivalent in both spaces (another reason something like the median absolute deviation -- `median(|ratio - median(ratio)|)` -- might make more sense than the COD, aside from the fact that MAD is a robust estimator of dispersion whereas COD relies on some (optional) post-hoc trimming).
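A tiny sketch of what working in log-space looks like (the numbers here are made up; I use an odd sample count so the median is an actual data point and commutes exactly with the log transform):

```python
import numpy as np

# Hypothetical assessed values and sale prices (five parcels)
assessed = np.array([190_000.0, 210_000.0, 305_000.0, 98_000.0, 402_000.0])
sales    = np.array([200_000.0, 200_000.0, 300_000.0, 100_000.0, 400_000.0])

log_ratios = np.log(assessed) - np.log(sales)  # ln(x/y) = ln x - ln y

# Arithmetic mean in log-space = geometric mean of the raw ratios
geo_mean_ratio = np.exp(log_ratios.mean())

# The median commutes with the (monotonic) log transform
assert np.isclose(np.exp(np.median(log_ratios)),
                  np.median(assessed / sales))

# Median absolute deviation of the log-ratios: a robust spread measure
mad = np.median(np.abs(log_ratios - np.median(log_ratios)))
```

At the end, `np.exp` converts any log-space statistic back into a plain ratio for reporting, so nothing about the public-facing presentation has to change.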
> I consider best practice to always focus on the untrimmed statistics, I just mention the trimming because it's in the standards--IAAO is just trying to cover all the bases in a country with widely divergent local practices.
FWIW, I would love a follow-up post on just the governance structure here (and thus what avenues of collaboration are possible) -- to what degree is IAAO informative/advisory vs required/mandatory? How much of this is encoded in legislation vs up to the particular assessor's office in whatever jurisdiction you're in? My assumption is that everything is very decentralized, but I'm not actually sure!
> As for why the IAAO picked COD specifically, I don't know the specific historical reason. If it helps, in an assessor's minds there are two things they are trying to communicate to the public and to their overseers:
FWIW, I think this makes a lot of sense! The sales ratios (in some sense) measure the "error" [0], and so you want to characterize the distribution of error (in which case a measure of central tendency like the median and a measure of spread makes sense). I just thought the specific form of COD was odd, since it uses a median for the central tendency but then a (rescaled) arithmetic mean for the spread (vs MAD which uses median for both or just something even more standard like standard deviation).
I don't know that there's anything strictly "wrong" with COD to be fair -- I think it's mostly that you lose out on well-developed statistical tests/tools since mean absolute deviation is much less studied than median absolute deviation or standard deviation.
> A full treatment is more complicated than I will be able to go into here, but suffice it to say that this is something the field deals with all the time and a lot has been written on the subject.
Definitely noted!
> For what it's worth, vertical equity is a statistic that has had a LOT of debate and the IAAO is currently reworking their standards on it. You can see the latest thing they've come up with here:
Thanks for the link. I skimmed the VEI definition (and will try to take a closer look sometime later).
> A key consideration in all these things is getting offices to actually adopt whatever new statistic they pick.
That's fair -- ML actually has a similar problem where people mostly still use the (IMO inferior) binned statistics rather than the cumulative statistics, even though the cumulative statistics have been pretty well known in the stats world for a pretty long time.
I personally think the "visual" graph of cumulative errors is quite interpretable / easy to understand, although the actual statistical computation is definitely more complex.
> It's true that there's an assumption -- "If the locations are the same and the characteristics are the same, and there are similar properties from the same local area sample that have sold, the unsold similarly-located, similarly-featured properties should be valued according to the same standardized method and should therefore all have similar valuations." It is the law and the public that are largely making this assumption, however.
I understand the intuition, but I think where I'm suspicious is in how you define "similar locations" and "similar characteristics"? My intuitions here might just be totally off -- I live in NYC (where neighboring buildings can vary drastically and even a 1-block difference can be huge). Even in the suburbs of DC where I grew up, I think there was a pretty wide variation house-to-house between next door neighbors (e.g. in terms of how recently renovated, the style of housing, how large, etc).
> That said, it's good to kick the tires, so I would welcome some concrete examples of where you would expect this definition/assumption of horizontal uniformity to fail in practice. What kind of property should not be valued the same as its physically identical close neighbors, even if all those physically identical close neighbors have sold for similar prices? That would help to better understand the limits of the conventional approaches.
So I think the "if all those physically identical close neighbors have sold for similar prices" is really the key assumption here. If the clusters have identical sales prices (ignoring noise and timing variation), then I don't see any problems here. But if there is non-trivial variability, then there must be some "expected" level of CHD just based on that "internal" variation, right?
It sounds like you're tackling this by trying to "carefully choose" the clusters so that the sales price variability is small (and can be more-or-less ignored). I think this should work, but you'd have to be quite careful to ensure that each cluster has a similar amount of internal variability (otherwise, if cluster A is in the 99th percentile in CHD, is that because something is wrong with the algorithm or because cluster A "just" has high internal variability in the sales price?).
My broader point is that this is unnecessary if you directly "normalize" the valuation variability by the expected sales price variability. That way, you can focus more on the semantic meanings of the clusters without having to worry about their "internal" variability, and can analyze large "clusters" directly (e.g. large demographic swathes like racial groups, entire neighborhoods, etc).
The downside here is that each cluster needs to be "large enough" that you can use the sales prices to "calibrate" the expected valuation though: your point about "Looking at sales ratios tells you nothing about how consistently unsold properties are being valued" is the biggest problem with my proposed approach. I suspect there is some good literature about how to tackle this in the causal inference space, but that's beyond my area of expertise -- you have sort of the same problem with observational trials with strong selection bias vs randomized controlled trials.
[0] One potential misunderstanding I realize I had is that I understood this to be a regression where you try to predict the sales price given the house characteristics `x` and the time-of-sale `t`. I'd implicitly assumed that you would compare the estimated price `f(x, t)` against the sales price to test the algorithm, but use `f(x, Jan 1)` as the taxable valuation (relying on the natural "smoothness" of `f` with respect to `t`). But re-reading the article, it sounds like we are using `f(x, Jan 1)` when calculating the ratio, which means we have both the "model error" along with the "timing error" captured here.
Thanks for your response here! Lots of good stuff. Some questions I'm able to answer:
RE IAAO governance:
> My assumption is that everything is very decentralized, but I'm not actually sure!
Everything is decentralized, but the IAAO still has a lot of influence, even though, as a private org, it has no formal authority in terms of legislation. That said, the IAAO (and its many state chapters) have been blessed by local legislatures to be involved in a lot of the credentialing and standards-making. Local standards usually take the same basic shape as the national standard books, but the exact figures, tolerances, and strictness are up to the whims of lawmakers.
> I understand the intuition, but I think where I'm suspicious is in how you define "similar locations" and "similar characteristics"? My intuitions here might just be totally off -- I live in NYC (where neighboring buildings can vary drastically and even a 1-block difference can be huge). Even in the suburbs of DC where I grew up, I think there was a pretty wide variation house-to-house between next door neighbors (e.g. in terms of how recently renovated, the style of housing, how large, etc).
You're right to be suspicious here! I have found the same thing, and so a truly good horizontal equity test that relies on a clustering algorithm ALSO needs some way to test whether the locations, or "neighborhoods", are well drawn and defensible. A "well drawn" neighborhood can be defined in various ways, but in layman's terms is an area within which a) all the properties are similar and b) the relationship between sale price and property characteristics is similar, where "similar" means "within some acceptable tolerance range X." There are various mathematical tests for this that I've been experimenting with lately, and I've gotten some good results.
As for this part specifically:
> e.g. in terms of how recently renovated, the style of housing, how large, etc).
Date of last renovation, style of housing, and especially size of housing, are all characteristics that assessors collect, or at least are *supposed* to collect. You can't directly compare side-by-side homes if they're different styles, or one was renovated 25 years ago and the other last week, or one is more than 50% bigger than the other -- each would go in its own local cluster of physically similar properties. That said, it's definitely a common problem that the assessor doesn't always have visibility into all these characteristics. If you're "flying blind" with subpar characteristic visibility, you would have to accept much higher local variation (and also less accurate prediction models).
Generally speaking, my form of the horizontal equity test is probably not something that in its current form can/should be written into legislation, but should remain as an internal consistency check. FWIW, I've heard anecdotes from local assessors that as much as 40% of their protest volume is generated by side-by-side inconsistency in $/sqft valuation between neighboring houses, so developing a reliable test for this could have a massive impact on public trust in the assessment system. Local consistency is really important.
> But if there is non-trivial variability, then there must be some "expected" level of CHD just based on that "internal" variation, right?
Absolutely! Just as you would never expect a COD to drop to zero, you wouldn't expect side-by-side homes to be *perfectly* identical either, even if the clusters are drawn in a basically perfect way that everyone would agree makes sense. That's why my (admittedly arbitrarily picked) rule of thumb for median CHD is 15, and not 0.
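For concreteness, here's a minimal sketch of how a COD-style dispersion statistic could be computed within clusters of similar homes and summarized as a median across clusters. This is my own illustrative reconstruction, not the exact CHD formula from the article or OpenAVMKit, and the numbers are made up:

```python
import statistics

def dispersion(values):
    """COD-style dispersion: mean absolute deviation from the
    median, expressed as a percentage of the median."""
    med = statistics.median(values)
    return 100 * statistics.mean(abs(v - med) for v in values) / med

# Hypothetical clusters of valuations for "identical" neighboring homes.
clusters = [
    [200_000, 210_000, 195_000],  # fairly uniform valuations
    [300_000, 360_000, 250_000],  # wide side-by-side spread
]

per_cluster = [dispersion(c) for c in clusters]
median_chd = statistics.median(per_cluster)
print(per_cluster, median_chd)  # under the rule of thumb, flag if > 15
```

The point of taking the median across clusters is that a handful of badly drawn clusters won't dominate the summary statistic.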
You bring up a lot of other great points too, but this is all I have time to get into for now. If you haven't already, we'd love to have you on the OpenAVMKit Discord -- we could learn a lot from each other!
A good article, thank you for writing it up. I kinda feel like it leaves out most of the important bits though; in real life, nearby houses can be quite heterogeneous, and teasing out what impact this should have on their market value seems like the primary challenge of making fair and accurate valuations.
Hey Isaac! So this is only the first of many articles to come, and if there's interest believe me we will dive deep into all the rest of these questions. This in particular is a good question, and of course in the real world nearby homes are in fact not perfectly identical. Part of the open source library I'm working on is a clustering algorithm that accounts for this -- fractally dividing a neighborhood up into clusters based on tiers of the most important physical characteristics (building type, building size, building age, building quality, building condition, etc).
We'll be more than happy to get into all the nitty gritty details when we release that, and I look forward to your feedback.
I was just already at 8,000+ words for the opener so I figure I had to end somewhere 😅
I look forward to reading the rest! I was getting curious about this recently actually; I bought a house for ~$300k and just a couple years later with no apparent change in the area it was being appraised at ~$450k, which was confusing.
I would certainly like to see the market comps they used. There *is* a pretty common phenomenon where buyers anchor on the price they paid and are shocked at how fast the market moves after they move in and check out of the housing market - but bottom line, either there’s market evidence to support your valuation or there isn’t. If you’re being compared to homes too far away or of the wrong class, you could likely mount a successful protest.
> If a property assessor overvalues a home in a wealthy neighborhood, they will be sure to hear of it come protest season. In this way, any valuation errors on the higher end tend to get swiftly corrected, while there’s less pressure coming from the low end.
I'm confused by this part; you say they only protest overvaluation, which makes sense, but then conclude that *any* error will get corrected by this mechanism. Why would this apply to undervaluations too?
If that's your reading, I might have worded it awkwardly. The property tax protest mechanism mostly only catches overvaluations. To the extent it catches undervaluations, it's when someone notes that their neighbor got a lower valuation than they did. But that's marginal.
The pressure to catch undervaluations usually comes from state oversight boards. For instance, the Texas state comptroller's property value study is there to make sure that local appraisal districts are not valuing too low. (They also check if valuations are too high, but there's an actual incentive not to undervalue, because the state is on the hook to supplement funds for poor school districts, and it frowns on local governments that freeload on state funds through undervaluation)
Can you elaborate on how regression to the mean applies here? Regression to the mean is I think primarily an issue when some random-over-time process is measured as being far away from the mean; we expect it to move towards it in the future. By what mechanism would an assessment be affected?
I mean it just as a general sort of gravitational pull I tend to see, caused by any number of small factors. The most significant explicit cause is typically data bias. Forgive me if I'm misusing the term.
Sales data for average-priced properties is the most numerous. If you are missing even a few sales in high end neighborhoods, that tends to pull their valuations down. And if you are missing even a few sales in low end neighborhoods, that tends to pull their valuations up.
But then there's missing data in characteristics, because (as we'll talk about in future articles) valuations are built as models where characteristics get assigned dollar values in a complex way. When you are missing characteristic data, you fill in with assumptions -- typically whatever is average. And this has a tendency to pull the valuation for whatever you filled in towards the average value as well.
…39 minute read(!)
But I will read it.
Okay, excellent write-up. Thank you for taking so much time for this post, both for its thoroughness as well as your humility.
Thank you for this effort to inform your readers. That is a whole ton and a half of work.
Hi Lars,
After 12.5 years in the Assessor’s Office I can tell you your graph is not accurate.
To anyone that has a question on how this works let me know and I might be able to provide some insight.
Nice try though.
Would love to hear more from you! Can you point out which graph in particular is inaccurate and what the deficiency is? Also it’s cool that you have so much experience as an assessor - can you tell me which office you worked in?
Hi Lars,
I worked in an Assessor's Office in Utah. I know this sounds funny, but we worked with Assessors in Oklahoma when I was in the Assessor's Office. So, it's not as different across the United States as people might think.
Just a few things:
Where I live, Sales Chasing is illegal. There are two different types of states: "Full Disclosure" and "Non-Disclosure." There are several "Non-Disclosure" states where the Assessor is not allowed to see the amount the home was purchased for.
If the Assessor does find out the sales price, and values the home at that price w/o valuing all the other homes in the area, the assessor can be in hot water for Sales Chasing.
It sounds like Iowa is “Full Disclosure” and Sales Chasing is alive and well.
One reason for a discrepancy in the price of homes that are very similar is that not all homes are valued as of January 1st. Some states value properties as of June 1st. Not all Assessor's Offices are county offices; some are city offices.
Often a Plat (Plot Map) is handled by one Assessor in the office and the next Plot Map is handled by another Assessor. So I would expect there to be some variation in the way the homes are valued.
Another possible reason for the discrepancy is that homes must be appraised at least once every 5 years. So, an Assessor has the homes on a schedule to be valued. One year the Assessor's office might value all the homes built from 1970-1999.
So if the home next door to the 1999 home was built in 2000, its value could stay very close to what it currently is. And its assessed value may not change until the next year.
If you're curious about appeals, I've been in thousands of appeals and I know what works best.
Mass appraisal can be in the form of an AVM, Cost Approach, Sales Comparison Approach, Income Approach, etc. It depends on the size of the Assessor’s office and the number of parcels being appraised.
But I’ve learned that most people are afraid of the process because they fear what they don’t understand.
My goal now is to help people learn to minimize their property taxes, then help them know which improvements will give their homes the most bang for their buck.
This is just a microscopic look into the Assessor’s world. Let me know if I have bored you to death.
Thanks Lars.
Hi there,
Thank you for taking the time to share your insights and experience—it’s always a treat to connect with someone who’s been in the thick of it for so long. I genuinely appreciate your feedback and am eager to learn from your perspective.
It’s great to hear you’ve worked in Utah. I know Utah's a big place, but any chance you know Jake Parkinson who used to work at the Tooele County Assessor’s Office? I've learned a lot from him.
Let’s go through your points one by one:
Sales Chasing & Non-Disclosure:
I completely agree that the distinction between "Full Disclosure" and "Non-Disclosure" states is crucial. In Texas (my home state), we face similar challenges, and I've found the Utah offices I've worked with to be particularly resourceful in navigating non-disclosure issues. I discuss this topic in more detail in this section (https://progressandpoverty.substack.com/i/158598256/data-scarcity). If there's anything you feel could be clearer, I'd be happy to hear your thoughts.
Valuation Dates:
You make a good point about the variety of valuation dates. I mentioned January 1st mainly because it’s the most commonly used benchmark, but I certainly appreciate that some states opt for dates like June 1st. I do touch on this nuance in this section of the essay: (https://progressandpoverty.substack.com/i/158598256/freezing-in-january) and further note the time effects on valuation drift here: (https://progressandpoverty.substack.com/i/158598256/a-note-on-time).
If any section seemed ambiguous, please let me know where I might clarify further.
Assessor Offices’ Structures:
You're absolutely right that not all assessor offices operate at the county level; some are indeed city offices or even independent entities. I've illustrated this point with examples of "appraisal districts" that function under the authority of neither the city nor the county (https://progressandpoverty.substack.com/i/158598256/actual-bad-behavior). I appreciate you highlighting this nuance.
Mass Appraisal Methods:
Thank you for pointing out the broader scope of mass appraisal techniques. My article implicitly focused on the sales comparison approach, which admittedly means I glossed over alternative methods like the AVM, Cost Approach, or Income Approach. I’ve covered these in previous articles, and I appreciate the reminder to ensure readers understand the full context.
Finally, regarding your comment that “your graph is not accurate” — could you please indicate which graph you’re referring to and what specific issue you noticed? The illustrations are meant to be simplified examples that get one concept across at a time rather than claiming to explain everything at once. I’m more than willing to address any specific concerns if you could point me in the right direction.
Thanks again for your thoughtful feedback. I value the opportunity to engage in this discussion and am always open to learning from the experts. Wishing you a fantastic day ahead!
Best regards,
Lars
Lars,
I actually think the illustrations are really good for explaining what happens and generally how things work. I think you've done a tremendous job.
I'm new to this forum so I apologize for the criticism.
I would just like to add a quick word of caution about your illustrations (graphs). While I do believe they are great as a general rule of thumb to get your point across, Zillow, Redfin, or a real estate agent may be reporting the sq ft of the homes in the wrong manner.
Currently I review appraisals for correctness for Fannie Mae. Every borrower is allowed one appeal per appraisal due to a Fannie Mae regulation as of last October.
One of the biggest complaints I hear in appeals currently is that real estate agents, Zillow, and Redfin are adding the basement sq ft to the main level sq ft for marketing purposes which often makes the home look like a better deal.
Appraisers are not allowed to report the sq ft of a home this way. There is no guarantee the Assessor does not report the sq ft of a home in the same manner as the real estate agents.
In regards to Jake Parkinson, I want to say yes. I believe Jake was in a different County when I met him. There are 29 Counties in Utah with a total population of about 3.4 million. So, I do know some of the Assessors well, but they are the assessors in the more populated areas.
I don't know last names really well. First names and faces mostly.
Lars, I think you're doing a great job. Keep up the good work.
Hey nothing to apologize for! I want to keep myself sharp and nothing is better than an experienced practitioner for that.
> One of the biggest complaints I hear in appeals currently is that real estate agents, Zillow, and Redfin are adding the basement sq ft to the main level sq ft for marketing purposes which often makes the home look like a better deal.
This is a really great insight -- presumably including unfinished square footage as if it were finished? I'll be sure to be on the lookout for this anytime I touch those sources.
Do they also conduct ratio studies where the assessed value is divided by the sale price _soon after_ the valuation/release date? This should be a better shield against sales chasing, right?
It varies by jurisdiction. Most assessors will run ratio studies throughout the process, but the state oversight boards will typically stick to a "standardized test" as above.
Indeed, using new sales to check the predictive power of your valuations is a good practice. The challenge is that you are always limited by the amount of sales, and sales soon after the valuation date will only make up a portion of your total sales.
How would it work in a dying village that barely sees any transactions? Would you generalize pricing based on the availability of amenities in a similar location that does have transaction data?
I'm also curious how it would work with a very high LVT when you couldn't rely on sales data.
Great article, thanks!
So, obviously the more extreme data challenges are more difficult and you should expect wider error bars there.
> How would it work in a dying village that barely sees any transactions? Would you generalize pricing based on the availability of amenities in a similar location that does have transaction data?
There are some who do this, but it will certainly be challenged on the basis of how comparable the two areas are, and that will be difficult to mount a counter-argument against. How do we define a "similar location?" I should note that this problem sometimes does occur *within* a jurisdiction -- and in that case the usual answer is to find some--preferably economic--anchor point, such as median income for the local area, which is widely published in US census data. The comparable location should also have similar density, same mix of property types, and not be too far away.
Another consideration is that there is more than one general approach to value and here I only spoke about the most common one, the "market approach" or "sales comparison approach." The other two approaches to value are the cost approach and the income approach.
In an area without any sales, or very sparse sales (such as is often the case with commercial, industrial, or rental property) the income approach comes into play. This would be a net present value of discounted cash flow model, the same model that real estate investors use to generate the prices they bid on properties in the first place. This assumes you have access to rental data, cap rates, etc.
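To make the income approach concrete, here's a minimal sketch of a discounted cash flow valuation: present value of the holding-period net operating income plus a discounted terminal value (NOI capitalized at an exit cap rate). The function name and the example figures are purely illustrative, not from any actual assessment model:

```python
def income_value(annual_noi, discount_rate, exit_cap_rate, hold_years):
    """Rough income-approach value: PV of net operating income over a
    holding period, plus a discounted terminal value (NOI capitalized
    at the exit cap rate)."""
    pv_income = sum(
        annual_noi / (1 + discount_rate) ** t
        for t in range(1, hold_years + 1)
    )
    terminal = annual_noi / exit_cap_rate       # resale value at exit
    pv_terminal = terminal / (1 + discount_rate) ** hold_years
    return pv_income + pv_terminal

# e.g. $24k/yr NOI, 8% discount rate, 6% exit cap rate, 10-year hold
print(round(income_value(24_000, 0.08, 0.06, 10)))
```

In practice the NOI would vary year to year and the cap rate would come from market evidence, but the structure is the same.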
This is very late, but I wanted to leave some naive comments/questions here. I'm coming at this from much more of a math/stats background and without any domain knowledge, so apologies if the questions/comments here are naive.
1. On a mathematical note, why don't we just do everything in the log-space of prices? I find it confusing that we calculate ratios first (a multiplicative operation) but then do additive statistics on them.
2. Why do we use the coefficient of dispersion (a rescaled mean absolute deviation) instead of a more "standard" measure of dispersion (e.g. a standard deviation if we don't care about robustness or a MAD or Shamos Scale Estimator if we want a robust estimator)? If we are trimming "sales outside the interquartile range" (that's trimming 50% of the data!), then (IMO) that implies we should just be using a robust measure of scale instead [0]. Is this just historical?
3. How does "fair market value" work when you can bundle/split property? I can imagine that some properties are worth "more than the sum of the parts" (e.g. it's more valuable to own all condos in a building if you want to redevelop) or "less than the sum of its parts". Do we just assume that property is "pre-parceled" and so we don't deal with mergers/splits?
4. IIUC, "vertical equity" is what I'm used to calling "calibration": for any house, you want the expected sales price to be the predicted valuation (adjusted for timing). What you *don't* want is to have systematic deviations from the y=x line (e.g. so that expensive homes aren't consistently undervalued). If that's right, then I'd suggest taking a look at approaches that *avoid* binning/stratifying altogether -- I believe the most accepted approaches today are based on cumulative deviations [1]. The simplified TLDR (the references in [1] have the details):
1. Store the (log) assessed values in an array `A` and the (log) sales prices in `S`.
2. Sort them in the ascending order of the assessed values: `S = S[argsort(A)]` and `A = sort(A)`.
3. Calculate the cumulative errors `C = (A - S).cumsum()`
4. Calculate the expected total variance `sigma2 = sum((A - S)**2)`. [2]
5. Calculate `K = (C.max() - C.min()) / sqrt(sigma2)` (the author calls this the "Kuiper-type statistic" after https://en.wikipedia.org/wiki/Kuiper%27s_test). `K` is now a summary statistic with nice theoretical properties (including p-values because the cumsum should converge to a Brownian motion). You can also plot `C / sqrt(sigma2)` to visualize any discrepancies.
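In code, the steps above look something like the following -- a sketch of my own transcription using numpy, with synthetic data just to show the shape of the computation (not the exact procedure from the papers):

```python
import numpy as np

def kuiper_calibration(assessed, sales):
    """Kuiper-type calibration statistic: range of the cumulative
    log-errors, sorted by assessed value, scaled by the estimated
    total standard deviation under the perfect-calibration null."""
    A = np.log(np.asarray(assessed, dtype=float))
    S = np.log(np.asarray(sales, dtype=float))
    order = np.argsort(A)             # step 2: sort by assessed value
    err = A[order] - S[order]
    C = np.cumsum(err)                # step 3: cumulative errors
    sigma2 = np.sum(err ** 2)         # step 4: expected total variance
    return (C.max() - C.min()) / np.sqrt(sigma2)  # step 5

# Synthetic well-calibrated data: multiplicative noise around the sale price.
rng = np.random.default_rng(0)
sales = rng.uniform(100_000, 900_000, size=500)
assessed = sales * rng.lognormal(0.0, 0.1, size=500)
print(kuiper_calibration(assessed, sales))
```

Plotting `C / sqrt(sigma2)` against the sorted assessed values is then the visual diagnostic: systematic over- or under-valuation in some price range shows up as a sustained drift in the cumulative curve.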
5. I was confused about the discussion of "horizontal equity". Aren't we begging the question if we assume that things in the CHD should have low dispersion on the valuations? It seems entirely circular to *assume* that certain buildings must have similar valuations and then use that to evaluate a valuation model! I feel like what makes more sense is exactly the "deviation of a subpopulation from the full population" test from https://link.springer.com/content/pdf/10.1007/s10444-023-10068-6.pdf (where our subpopulation can be "property in a geographic unit" or "property owned by a particular minority group") which does not assume that our subpopulation should have similar home prices.
[0] I'm not an expert on these, but https://pragmastat.dev/ collects these (with proofs and links to references) in a central place.
[1] I'm no expert, but https://icml.cc/virtual/2025/40003 is a good tutorial if you like slides + audio. If you prefer papers (with rigorous proofs), then https://jmlr.org/papers/volume23/22-0658/22-0658.pdf and https://link.springer.com/content/pdf/10.1007/s10444-023-10068-6.pdf are good references (the former is more targeted towards ML folks while the latter is more for a statistics audience). The papers + tutorials mostly focus on classification, but they generalize to real-valued regression too.
[2] This point is a bit buried in the papers (since they mostly focus on classification and not prediction), but the key is that the null hypothesis of "perfect calibration" is that E[S] = A and that the entries of S are independent given A. Thus, Var[sum(A - S)] = sum(Var(A[k] - S[k])) (by conditional independence) = sum(E[(A[k] - S[k])**2]) (by the expectation of S) = E[sum((A[k] - S[k])**2)] (by linearity of expectation).
These are great questions. Let me see if I can get to them all.
The first thing to say is that there is some historical path dependency in assessment and that although there are many professional statisticians in the field, its principal audience, stakeholders, and most importantly, overseers, are not statisticians, but rather laypeople and politicians who need things explained to them in the simplest possible terms. The more intuitive and concrete a measure is for someone who might only have a high school education, the better.
That said, nothing prevents any individual assessor or modeler from using more traditional academic statistics in their internal work. However, for official reporting, these are the statistics that IAAO (and in most cases, the law) insist upon.
I will also note that, as you might expect, there is some lively debate in the community about whether these are the best possible statistics that could be used, and you would not be the first to question them. In that light, do not take my explanations given here as anything other than my best descriptive explanations for why the industry seems to have settled on their usage.
On a further note, nothing stops us from using additional statistics in our free and open source code library, OpenAVMKit. We will always have to generate and report the IAAO standards if we want assessors to use it, but nothing stops us from adding additional ones for extra scrutiny! In that light I appreciate your feedback here.
Now to your questions.
> 1. On a mathematical note, why don't we just do everything in the log-space of prices? I find it confusing that we calculate ratios first (a multiplicative operation) but then do additive statistics on them.
The assessment profession has historically been built around *sale ratios* because they're easy for laypeople to interpret. Also, the sales ratio statistic is *deeply* embedded in both law and IAAO standards. When you're reporting to state agencies and explaining things to taxpayers, sales ratios are a) what they expect and b) much easier for them to understand.
> 2. Why do we use the coefficient of dispersion (a rescaled mean absolute deviation) instead of a more "standard" measure of dispersion (e.g. a standard deviation if we don't care about robustness or a MAD or Shamos Scale Estimator if we want a robust estimator)? If we are trimming "sales outside the interquartile range" (that's trimming 50% of the data!), then (IMO) that implies we should just be using a robust measure of scale instead [0]. Is this just historical?
First of all, on the question of trimming, the actual standards are set by the local jurisdiction, and those practices can vary widely. Some do not let you trim much at all; others let you trim a lot. I consider it best practice to always focus on the untrimmed statistics; I just mention the trimming because it's in the standards -- IAAO is just trying to cover all the bases in a country with widely divergent local practices.
As for why the IAAO picked COD specifically, I don't know the specific historical reason. If it helps, in an assessor's mind there are two things they are trying to communicate to the public and to their overseers:
- The overall *level* of assessment, ie, if they are generally over-assessing or under-assessing. This is expressed by the median ratio.
- The overall *uniformity* of all those individual assessments relative to the sale prices. This is measured by the COD.
Importantly, measuring these factors separately (overall assessment level and assessment-to-sale uniformity) is often required by state law.
So compared to academic statistics, I admit that calling COD a measure of "accuracy" in the article was a slight simplification, because you can of course measure raw accuracy more directly with other statistics. Another thing is that ratio-based statistics are used because they are simple and easily comparable across jurisdictions and groupings, whereas absolute error measures are not.
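For reference, those two statistics can be sketched in a few lines. This reflects the COD computation as I understand the standard practice (mean absolute deviation from the median ratio, as a percent of the median); the example numbers are made up:

```python
import statistics

def ratio_study(assessed, sales):
    """Median sales ratio (assessment level) and COD (uniformity)."""
    ratios = [a / s for a, s in zip(assessed, sales)]
    median_ratio = statistics.median(ratios)
    cod = 100 * statistics.mean(
        abs(r - median_ratio) for r in ratios
    ) / median_ratio
    return median_ratio, cod

# Toy example: assessments hovering around 95% of sale price.
assessed = [190, 285, 360, 480, 540]
sales = [200, 300, 400, 500, 600]
level, cod = ratio_study(assessed, sales)
print(level, cod)
```

A median ratio near 1.0 says the office is neither systematically over- nor under-assessing; a low COD says individual assessments cluster tightly around that level.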
> 3. How does "fair market value" work when you can bundle/split property? I can imagine that some properties are worth "more than the sum of the parts" (e.g. it's more valuable to own all condos in a building if you want to redevelop) or "less than the sum of its parts". Do we just assume that property is "pre-parceled" and so we don't deal with mergers/splits?
There is an entire literature in the field for dealing with this, so the short answer is "it's complicated." For instance, multi-parcel sales in some jurisdictions will be excluded from consideration on the grounds of being atypical sales, precisely because the package often trades at a discount compared to what the parcels would have sold for individually, and the most typical market transaction in that jurisdiction is for a parcel to trade individually. However, some jurisdictions will have methods of unpacking multi-parcel sales into multiple "individual sales" that they will then use in their modeling. A full treatment is more complicated than I will be able to go into here, but suffice it to say that this is something the field deals with all the time and a lot has been written on the subject.
> 4. IIUC, "vertical equity" is what I'm used to calling "calibration": for any house, you want the expected sales price to be the predicted valuation (adjusted for timing). What you *don't* want is to have systematic deviations from the y=x line (e.g. so that expensive homes aren't consistently undervalued). If that's right, then I'd suggest taking a look at approaches that *avoid* binning/stratifying altogether -- I believe the most accepted approaches today are based on cumulative deviations [1]. The simplified TLDR (the references in [1] have the details)
For what it's worth, vertical equity is a statistic that has had a LOT of debate and the IAAO is currently reworking their standards on it. You can see the latest thing they've come up with here:
https://www.iaao.org/about/board-of-directors/governing-documents/ratio-studies-exposure-draft/
There has been much debate and the final decision has not yet been made, but they seem to have settled on this new "VEI" statistic. A key consideration in all these things is getting offices to actually adopt whatever new statistic they pick.
> 5. I was confused about the discussion of "horizontal equity". Aren't we begging the question if we assume that things in the CHD should have low dispersion on the valuations? It seems entirely circular to *assume* that certain buildings must have similar valuations and then use that to evaluate a valuation model! I feel like what makes more sense is exactly the "deviation of a subpopulation from the full population" test from link.springer.com/conte… (where our subpopulation can be "property in a geographic unit" or "property owned by a particular minority group") which does not assume that our subpopulation should have similar home prices.
There are two things to say here -- first of all, the opinion you're contending with is not so much that of assessors but that of taxpayers and the law. The basic principle is "equal treatment of equals." The purpose is to have some way of enforcing that the assessor is using a *standardized method* that is not applying personal or arbitrary bias, but that, entirely apart from its predictive performance in matching sales prices, treats the same kinds of properties in the same way.
It's true that there's an assumption -- "If the locations are the same and the characteristics are the same, and there are similar properties from the same local area sample that have sold, the unsold similarly-located, similarly-featured properties should be valued according to the same standardized method and should therefore all have similar valuations." It is the law and the public that are largely making this assumption, however.
I definitely get your argument about circularity. The thing is, empirically, if you violate this standard, oversight will ding you, and the taxpayer whose house has a higher valuation than their neighbor's identical home will protest: "why do I pay more? His house is the same!"
Now despite that strong directive from both the law and the public, there is not actually a single accepted IAAO measure for horizontal equity. There's a few tests floating around that look at median ratios across different property types, but these don't measure uniformity directly. Looking at sales ratios tells you nothing about how consistently unsold properties are being valued.
My own personal clustering method is just something I came up with. Its chief disadvantage is that it's only as good as its clusters, and its validity only holds if you can make the genuine case that everything within a similar cluster should be similarly valued. The key thing is that I am not clustering just on physical characteristics, but also on location, and if you have experience in mass appraisal you will quickly find that nearly every feature you care about (including sale price) is strongly geospatially correlated.
That said, it's good to kick the tires, so I would welcome some concrete examples of where you would expect this definition/assumption of horizontal uniformity to fail in practice. What kind of property should not be valued the same as its physically identical close neighbors, even if all those physically identical close neighbors have sold for similar prices? That would help me better understand the limits of the conventional approaches.
Thanks for taking the time to respond! Definitely appreciate it (and hopefully at some point I'll have time to look through some of the GitHub code and some actual data myself instead of pontificating on nothing but vibes/theory).
Totally understand that there are constraints on "public" interpretability, standardization (and getting everyone to agree) alongside the theoretical qualities of whatever metrics you choose. It's definitely helpful context to know that the post is more descriptive ("this is the statistics encoded in law / customary practice") rather than prescriptive ("this is best metric for something").
> When you're reporting to state agencies and explaining things to taxpayers, sales ratios are a) what they expect and b) much easier for them to understand.
So I actually think sales ratios are the right thing to use for both reporting and calculation. I just think mechanically, it makes more sense to work in log-space (since `x/y = exp(ln x - ln y)`) and just "convert" to a sales ratio at the end by exponentiation. IMO this leads to more natural statistics (e.g. instead of an "arithmetic mean of ratios" you get a geometric mean which preserves the "multiplicative nature" of what we're doing), although I agree it might be harder to explain.
This only really matters if you are actually adding/subtracting ratios though -- things like the median will be equivalent in both spaces (another reason something like the "median absolute deviation" -- `median(|ratio - median(ratio)|)` -- might make more sense than the COD, aside from the fact that MAD is a robust estimator of dispersion whereas COD has some (optional) post-hoc trimming involved).
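A toy example of the arithmetic-vs-geometric distinction (nothing from IAAO standards, just to illustrate the point): with one assessment at half the sale price and one at double, the arithmetic mean of ratios suggests systematic over-assessment, while the log-space (geometric) mean correctly reports no net bias.

```python
import math
import statistics

ratios = [0.5, 2.0]  # one half-priced, one double-priced assessment

arith = statistics.mean(ratios)                                 # 1.25
geom = math.exp(statistics.mean(math.log(r) for r in ratios))   # 1.0

print(arith, geom)
```

The geometric mean respects the multiplicative nature of ratios: over- and under-assessments of the same proportional size cancel exactly.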
> I consider best practice to always focus on the untrimmed statistics, I just mention the trimming because it's in the standards--IAAO is just trying to cover all the bases in a country with widely divergent local practices.
FWIW, I would love a follow-up post on just the governance structure here (and thus what avenues of collaboration are possible) -- to what degree is IAAO informative/advisory vs required/mandatory? How much of this is encoded in legislation vs up to the particular assessor's office in whatever jurisdiction you're in? My assumption is that everything is very decentralized, but I'm not actually sure!
> As for why the IAAO picked COD specifically, I don't know the specific historical reason. If it helps, in an assessor's mind there are two things they are trying to communicate to the public and to their overseers:
FWIW, I think this makes a lot of sense! The sales ratios (in some sense) measure the "error" [0], and so you want to characterize the distribution of error (in which case a measure of central tendency like the median and a measure of spread makes sense). I just thought the specific form of COD was odd, since it uses a median for the central tendency but then a (rescaled) arithmetic mean for the spread (vs MAD which uses median for both or just something even more standard like standard deviation).
I don't know that there's anything strictly "wrong" with COD, to be fair -- I think it's mostly that you lose out on well-developed statistical tests/tools, since mean absolute deviation is much less studied than median absolute deviation or standard deviation.
> A full treatment is more complicated than I will be able to go into here, but suffice it to say that this is something the field deals with all the time and a lot has been written on the subject.
Definitely noted!
> For what it's worth, vertical equity is a statistic that has had a LOT of debate and the IAAO is currently reworking their standards on it. You can see the latest thing they've come up with here:
Thanks for the link. I skimmed the VEI definition (and will try to take a closer look sometime later).
> A key consideration in all these things is getting offices to actually adopt whatever new statistic they pick.
That's fair -- ML actually has a similar problem where people mostly still use the (IMO inferior) binned statistics rather than the cumulative statistics, even though the cumulative statistics have been pretty well known in the stats world for a pretty long time.
I personally think the "visual" graph of cumulative errors is quite interpretable / easy to understand, although the actual statistical computation is definitely more complex.
> It's true that there's an assumption -- "If the locations are the same and the characteristics are the same, and there are similar properties from the same local area sample that have sold, the unsold similarly-located, similarly-featured properties should be valued according to the same standardized method and should therefore all have similar valuations." It is the law and the public that are largely making this assumption, however.
I understand the intuition, but I think where I'm suspicious is in how you define "similar locations" and "similar characteristics"? My intuitions here might just be totally off -- I live in NYC (where neighboring buildings can vary drastically and even a 1-block difference can be huge). Even in the suburbs of DC where I grew up, I think there was a pretty wide variation house-to-house between next door neighbors (e.g. in terms of how recently renovated, the style of housing, how large, etc).
> That said, it's good to kick the tires, so I would welcome some concrete examples of where you would expect this definition/assumption of horizontal uniformity to fail in practice. What kind of property should not be valued the same as its physically identical close neighbors, even if all those physically identical close neighbors have sold for similar prices? That would help to better understand the limits of the conventional approaches.
So I think the "if all those physically identical close neighbors have sold for similar prices" is really the key assumption here. If the clusters have identical sales prices (ignoring noise and timing variation), then I don't see any problems here. But if there is non-trivial variability, then there must be some "expected" level of CHD just based on that "internal" variation, right?
It sounds like you're tackling this by trying to "carefully choose" the clusters so that the sales price variability is small (and can be more-or-less ignored). I think this should work, but you'd have to be quite careful to ensure that each cluster has a similar amount of internal variability (otherwise, if cluster A is in the 99th percentile for CHD, is that because something is wrong with the algorithm or because cluster A "just" has high internal variability in the sales price?).
My broader point is that this is unnecessary if you directly "normalize" the valuation variability by the expected sales price variability. That way, you can focus on the semantic meanings of the clusters without having to worry about their "internal" variability, and can analyze large "clusters" directly (e.g. large demographic swathes like racial groups, entire neighborhoods, etc.).
The downside here is that each cluster needs to be "large enough" that you can use the sales prices to "calibrate" the expected valuation, though: your point about "Looking at sales ratios tells you nothing about how consistently unsold properties are being valued" is the biggest problem with my proposed approach. I suspect there is some good literature about how to tackle this in the causal inference space, but that's beyond my area of expertise -- you have sort of the same problem with observational trials with strong selection bias vs randomized controlled trials.
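To illustrate the concern with a toy simulation (this approximates CHD as a COD-style dispersion over a cluster's valuations, which is my assumption here, not the actual OpenAVMKit definition): two clusters get the same unbiased valuation model, but the one with higher internal sale-price variability shows much higher dispersion anyway.

```python
import random
import statistics

random.seed(1)

def dispersion(values):
    """Median-centered, COD-style dispersion, as a percent."""
    med = statistics.median(values)
    return 100 * statistics.mean(abs(v - med) for v in values) / med

def simulate(price_sd):
    # Hypothetical cluster of "identical" homes around $300k with the given
    # internal price spread; valuations track price with the same 5% model noise
    prices = [random.gauss(300_000, price_sd) for _ in range(500)]
    valuations = [p * random.gauss(1.0, 0.05) for p in prices]
    return dispersion(valuations)

chd_tight = simulate(price_sd=5_000)   # low internal variability
chd_loose = simulate(price_sd=50_000)  # high internal variability

# Same unbiased model, but the loose cluster would get flagged by a raw
# CHD threshold even though nothing is "wrong" with the algorithm
```

So any fixed CHD cutoff implicitly assumes the clusters were drawn with comparable internal variability.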
---------------------------------------------------------------------
[0] One potential misunderstanding I realize I had is that I understood this to be a regression where you try to predict the sales price given the house characteristics `x` and the time-of-sale `t`. I'd implicitly assumed that you would compare the estimated price `f(x, t)` against the sales price to test the algorithm, but use `f(x, Jan 1)` as the taxable valuation (relying on the natural "smoothness" of `f` with respect to `t`). But re-reading the article, it sounds like we are using `f(x, Jan 1)` when calculating the ratio, which means we have both the "model error" and the "timing error" captured here.
Thanks for your response here! Lots of good stuff. Some questions I'm able to answer:
RE IAAO governance:
> My assumption is that everything is very decentralized, but I'm not actually sure!
Everything is decentralized, but the IAAO still has a lot of influence, even though they don't have formal authority in terms of legislation, being a private org. That said, the IAAO (and its many state chapters) have been blessed by the local legislatures to be involved in a lot of the credentialing and standards-making. Local standards usually take the same basic shape as the national standard books, but the exact figures, tolerances, and strictness are up to the whims of lawmakers.
> I understand the intuition, but I think where I'm suspicious is in how you define "similar locations" and "similar characteristics"? My intuitions here might just be totally off -- I live in NYC (where neighboring buildings can vary drastically and even a 1-block difference can be huge). Even in the suburbs of DC where I grew up, I think there was a pretty wide variation house-to-house between next door neighbors (e.g. in terms of how recently renovated, the style of housing, how large, etc).
You're right to be suspicious here! I have found the same thing, and so a truly good horizontal equity test that relies on a clustering algorithm ALSO needs some way to test whether the locations, or "neighborhoods", are well drawn and defensible. A "well drawn" neighborhood can be defined in various ways, but in layman's terms is an area within which a) all the properties are similar and b) the relationship between sale price and property characteristics is similar, where "similar" means "within some acceptable tolerance range X." There are various mathematical tests for this that I've been experimenting with lately, and I've gotten some good results.
As for this part specifically:
> e.g. in terms of how recently renovated, the style of housing, how large, etc).
Date of last renovation, style of housing, and especially size of housing, are all characteristics that assessors collect, or at least are *supposed* to collect. You can't directly compare side-by-side homes if they're different styles, or one was renovated 25 years ago and the other last week, or one is more than 50% bigger than the other -- each would go in its own local cluster of physically similar properties. That said, it's definitely a common problem that the assessor doesn't always have visibility into all these characteristics. If you're "flying blind" with subpar characteristic visibility, you would have to accept much higher local variation (and also less accurate prediction models).
Generally speaking, my form of the horizontal equity test is probably not something that in its current form can/should be written into legislation, but should remain as an internal consistency check. FWIW, I've heard anecdotes from local assessors that as much as 40% of their protest volume is generated by side-by-side inconsistency in $/sqft valuation between neighboring houses, so developing a reliable test for this could have a massive impact on public trust in the assessment system. Local consistency is really important.
> But if there is non-trivial variability, then there must be some "expected" level of CHD just based on that "internal" variation, right?
Absolutely! Just as you would never expect a COD to drop to zero, you wouldn't expect side by side homes to be *perfectly* identical either, even if the clusters are drawn in a basically perfect way that everyone would agree makes sense. That's why my (admittedly arbitrarily picked) rule of thumb for a median CHD is 15, and not 0.
You bring up a lot of other great points too, but this is all I have time to get into for now. If you haven't already, we'd love to have you on the OpenAVMKit discord, we could learn a lot from each other!
https://discord.com/invite/4fCkSCPPJD
A good article, thank you for writing it up. I kinda feel like it leaves out most of the important bits though; in real life, nearby houses can be quite heterogeneous, and teasing out what impact this should have on their market value seems like the primary challenge of making fair and accurate valuations.
Hey Isaac! So this is only the first of many articles to come, and if there's interest believe me we will dive deep into all the rest of these questions. This in particular is a good question, and of course in the real world nearby homes are in fact not perfectly identical. Part of the open source library I'm working on is a clustering algorithm that accounts for this -- fractally dividing a neighborhood up into clusters based on tiers of the most important physical characteristics (building type, building size, building age, building quality, building condition, etc).
We'll be more than happy to get into all the nitty gritty details when we release that, and I look forward to your feedback.
I was just already at 8,000+ words for the opener so I figure I had to end somewhere 😅
I look forward to reading the rest! I was getting curious about this recently actually; I bought a house for ~$300k and just a couple years later with no apparent change in the area it was being appraised at ~$450k, which was confusing.
I would certainly like to see the market comps they used. There *is* a pretty common phenomenon where buyers anchor on the price they paid and are shocked at how fast the market moves after they move in and check out of the housing market - but bottom line, either there’s market evidence to support your valuation or there isn’t. If you’re being compared to homes too far away or of the wrong class, you could likely mount a successful protest.
> If a property assessor overvalues a home in a wealthy neighborhood, they will be sure to hear of it come protest season. In this way, any valuation errors on the higher end tend to get swiftly corrected, while there’s less pressure coming from the low end.
I'm confused by this part; you say they only protest overvaluation, which makes sense, but then conclude that *any* error will get corrected by this mechanism. Why would this apply to undervaluations too?
If that's your reading, I might have worded it awkwardly. The property tax protest mechanism mostly only catches overvaluations. To the extent it catches undervaluations, it's when someone notes that their neighbor got a lower valuation than they did. But that's marginal.
The pressure to catch undervaluations usually comes from state oversight boards. For instance, the Texas state comptroller's property value study is there to make sure that local appraisal districts are not valuing too low. (They also check if valuations are too high, but there's an actual incentive not to undervalue, because the state is on the hook to supplement funds for poor school districts, and it frowns on local governments that freeload on state funds through undervaluation.)
Can you elaborate on how regression to the mean applies here? Regression to the mean is I think primarily an issue when some random-over-time process is measured as being far away from the mean; we expect it to move towards it in the future. By what mechanism would an assessment be affected?
I mean it just as a general sort of gravitational pull I tend to see, caused by any number of small factors. The most significant explicit cause is typically data bias. Forgive me if I'm misusing the term.
Sales data for average-priced properties is the most numerous. If you are missing even a few sales in high end neighborhoods, that tends to pull their valuations down. And if you are missing even a few sales in low end neighborhoods, that tends to pull their valuations up.
But then there's missing data in characteristics, because (as we'll talk about in future articles) valuations are built as models where characteristics get assigned dollar values in a complex way. When you are missing characteristic data, you fill in with assumptions -- typically whatever is average. And this has a tendency to pull the valuation for whatever you filled in towards the average value as well.
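Here's a toy illustration of that mean-imputation pull (the pricing function and coefficients are entirely made up, just to show the mechanics): when a missing characteristic is filled with its average, homes above average on that characteristic get undervalued and homes below average get overvalued.

```python
import random
import statistics

random.seed(0)

# Made-up homes: (square footage, quality grade 1-5)
homes = [(random.uniform(1000, 3000), random.randint(1, 5)) for _ in range(200)]

# Pretend the "true" market price is a simple function of both characteristics
def price(size, quality):
    return 100 * size + 30_000 * quality

# If the quality grade is missing, fill it in with the average grade...
mean_quality = statistics.mean(q for _, q in homes)

# ...then value every home using the imputed grade instead of the real one
errors = [price(size, mean_quality) - price(size, quality)
          for size, quality in homes]

# High-quality homes get pulled down, low-quality homes get pulled up
err_high = statistics.mean(e for e, (_, q) in zip(errors, homes) if q == 5)
err_low = statistics.mean(e for e, (_, q) in zip(errors, homes) if q == 1)
```

Both tails get dragged toward the middle, which is exactly the "gravitational pull" described above.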
Ignoring outliers for a calculation on how far away the outliers are from the mean is pretty wild.
This is why I always pay close attention to untrimmed ratio studies.