Critical Inaccuracies in Public Advocate Jumaane D. Williams' Landlord Watchlist

Updated: April 12th, 2025

The Landlord Watchlist, hosted on Public Advocate Jumaane D. Williams' official website, is intended to hold landlords accountable for Housing Maintenance Code violations. Recent analysis, however, reveals significant inaccuracies that misrepresent landlord records. A key issue is the case sensitivity of the underlying HPD dataset, which leads to skewed statistics and compromises the credibility of the Watchlist.

One glaring example is the record for Margaret Brunn, president of A&E Real Estate. The Watchlist lists her as the 18th worst landlord, with 671 open HPD violations between November 2023 and October 2024. A closer examination of the data, adjusted to account for case sensitivity errors, tells a very different story: her records show 9,897 open HPD violations over that same period. Brunn currently has 20,211 open violations in total.

For comparison, the Public Advocate’s Watchlist ranks Barry Singer as the #1 worst landlord with 1,804 open HPD violations between November 2023 and October 2024, fewer than a fifth of Brunn's case-adjusted count for the same period.

* * *

The root of these errors lies in the inconsistent formatting of names. When the same name, such as “MARGARET BRUNN”, is entered in different cases, its records are fragmented and the resulting counts differ vastly. Without fuzzy matching or standardized data input formats, the dataset cannot reliably consolidate records. This issue could be resolved with a relatively simple code change that standardizes all entries, so that variations in capitalization no longer result in misreported data.
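
To make the idea of standardizing entries concrete, here is a minimal sketch in Python. It is not HPD's or the Public Advocate's code. The rows and violation counts are hypothetical, and `normalize_name` is an illustrative helper name, not part of any official tool.

```python
from collections import Counter

def normalize_name(first: str, last: str) -> str:
    """Collapse extra whitespace and fold case so casing variants compare equal."""
    full = f"{first} {last}"
    return " ".join(full.split()).casefold()

# Hypothetical rows: (firstname, lastname, open_violations). Not real HPD figures.
rows = [
    ("JOHN", "DOE", 40),
    ("John", "Doe", 25),
    ("john", "doe", 10),
    ("Jane", "Roe", 30),
]

raw_totals = Counter()     # keyed on the name exactly as entered
merged_totals = Counter()  # keyed on the normalized name

for first, last, violations in rows:
    raw_totals[f"{first} {last}"] += violations
    merged_totals[normalize_name(first, last)] += violations

print(len(raw_totals))            # 4 separate "people" before normalization
print(len(merged_totals))         # 2 people after normalization
print(merged_totals["john doe"])  # 75, the consolidated count
```

Keyed on the raw spelling, the largest single count for John Doe is 40. Keyed on the normalized name, it is 75, which is the kind of gap that changes a ranking.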

(Fig. 1A) Property Managers Nicholas Quinn and Shane Rajcooar (RYCO Capital) as they appear in the multiple dwelling registration contact dataset.

Above (Fig. 1A) is an example of data from the multiple dwelling registration contact dataset. I noted these stylization differences, and here is what querying each of them returned:

[Image: Nicholas Quinn, RYCO Capital, housing data fragmentation]
[Image: Shane Rajcooar, RYCO Capital, housing data fragmentation]

When analysts aggregate data by name to total violations or rank landlords, they expect not to have to query the name in every possible casing combination. “Shane Rajcooar” and “Nicholas Quinn” each contain 13 letters, so each can be written in 2^13 = 8,192 different ways when you consider every possible combination of uppercase and lowercase letters. That is plenty of room to hide!
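
The 8,192 figure can be checked with a few lines of Python. This is only a worked illustration of the arithmetic. The helper names `casing_variants` and `enumerate_variants` are made up for this example.

```python
from itertools import product

def casing_variants(name: str) -> int:
    """Count the upper/lower spellings of a name: 2 ** (number of letters)."""
    letters = sum(1 for ch in name if ch.isalpha())
    return 2 ** letters

def enumerate_variants(name: str) -> set[str]:
    """Brute-force every casing of a name; only practical for short names."""
    choices = [(ch.lower(), ch.upper()) if ch.isalpha() else (ch,) for ch in name]
    return {"".join(combo) for combo in product(*choices)}

print(casing_variants("Shane Rajcooar"))  # 8192 (13 letters)
print(casing_variants("Nicholas Quinn"))  # 8192 (13 letters)
print(len(enumerate_variants("Doe")))     # 8 spellings: doe, doE, dOe, ...
```

Spaces are not counted because they have no case, which is why both 14-character names come out to 2^13 rather than 2^14.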

I have raised these issues with the NYC Open Data Help Desk as well as HPD on numerous occasions. In Inquiry #11646, the department acknowledged the problem and indicated that they would review the necessary technological changes to improve data usability.

“‘firstname’ & ‘lastname’ variables are case sensitive. Pair that with the fact that the assigned registrationcontactid for a given contact is building-specific, and you leave next to no ability for people to consolidate the actions of a registered agent across buildings, and you leave developers open to the margin of error that comes with two individuals having the same name.”
— John Mateer, RentHistory.org (Inquiry #11646) • January 31st, 2025 at 10:00AM

“Thank you for your suggestions regarding property registration. We will review what technology changes are necessary and feasible at this time to make our data as user-friendly as possible for the purposes of analysis and grouping. We appreciate that this information is widely used and can provide valuable insight into conditions across buildings.”
New York City Department of Housing Preservation and Development (Response to Inquiry #11646) • February 13th, 2025 at 2:31PM

The fields for “firstname” and “lastname” are treated as case sensitive. If someone is entered as “John Doe” in one record and “john doe” in another, the system treats these as two completely separate individuals. This fragmentation makes it impossible to aggregate or analyze data accurately, since minor differences in capitalization result in multiple, disjointed records for the same person. I would also like to emphasize that the registration contact ID (registrationcontactid) is assigned on a per-building basis rather than being unique to the individual. An individual acting as a registered agent at multiple properties ends up with multiple distinct IDs. This prevents consolidating their actions (like tracking all violations across buildings) and introduces a significant margin of error.
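
The effect of per-building IDs can be sketched as follows. The records are hypothetical. `registrationcontactid`, `firstname`, and `lastname` are the fields discussed above, while `registrationid` is assumed here to stand in for "the building's registration", and `name_key` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical contact records: one registrationcontactid per building.
contacts = [
    {"registrationcontactid": 1001, "registrationid": 501, "firstname": "John", "lastname": "Doe"},
    {"registrationcontactid": 1002, "registrationid": 502, "firstname": "JOHN", "lastname": "DOE"},
    {"registrationcontactid": 1003, "registrationid": 503, "firstname": "john", "lastname": "doe"},
    {"registrationcontactid": 1004, "registrationid": 504, "firstname": "Jane", "lastname": "Roe"},
]

def name_key(record: dict) -> tuple[str, str]:
    """Case-insensitive (first, last) key used to link a person across buildings."""
    return (record["firstname"].strip().casefold(),
            record["lastname"].strip().casefold())

buildings_by_id = defaultdict(set)    # what the per-building ID can tell us
buildings_by_name = defaultdict(set)  # what a normalized name can tell us

for rec in contacts:
    buildings_by_id[rec["registrationcontactid"]].add(rec["registrationid"])
    buildings_by_name[name_key(rec)].add(rec["registrationid"])

# Every ID maps to exactly one building, so the ID alone never links buildings.
print(max(len(b) for b in buildings_by_id.values()))  # 1
# The normalized name links all three of John Doe's registrations.
print(sorted(buildings_by_name[("john", "doe")]))     # [501, 502, 503]
```

A name-only key can also merge two different people who happen to share a name, which is the margin of error raised in Inquiry #11646, so groups produced this way are best treated as candidates for review rather than confirmed matches.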

This makes it incredibly easy for corporate landlords with an outrageously high number of violations to go undetected on lists like the Public Advocate’s Landlord Watchlist, and they do.

(Fig. 2A) Data in HPD’s registration contact dataset.

Above (Fig. 2A) is an example of the capitalization currently present in these crucial datasets. Combined with case-sensitive querying, it eliminates the ability to consolidate public records accurately.

(Fig. 1B) The error, present on the watchlist website.

Because of these issues, the same registered agent might appear in the dataset multiple times under different variations of their name (John Doe, John doe, JOHN DOE, john doe), and landlords have taken advantage of the discrepancy on countless occasions (see our directory). Because platforms built on this dataset standardize their queries (First Last, and not FIRST LAST), developers and watchdogs using this data for analysis or to produce accountability measures are left with an almost unusable dataset.
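
The difference between a standardized exact-match query and a case-insensitive one can be demonstrated with an in-memory SQLite table. SQLite is only a stand-in chosen because it ships with Python. It is not the engine behind NYC Open Data or the Watchlist, and the table below is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [("John", "Doe"), ("John", "doe"), ("JOHN", "DOE"), ("john", "doe")],
)

# A "First Last" query with exact, case-sensitive comparison.
exact = conn.execute(
    "SELECT COUNT(*) FROM contacts WHERE firstname = ? AND lastname = ?",
    ("John", "Doe"),
).fetchone()[0]

# The same query with both sides lowercased before comparison.
folded = conn.execute(
    "SELECT COUNT(*) FROM contacts "
    "WHERE LOWER(firstname) = LOWER(?) AND LOWER(lastname) = LOWER(?)",
    ("John", "Doe"),
).fetchone()[0]

print(exact)   # 1 of the 4 records
print(folded)  # all 4 records
conn.close()
```

In this toy table, the exact-match query misses three of the four records for the same person.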

(Fig. 3A) Variations in capitalization split records in the dataset.

In plain language, this means that without case-insensitive comparison and fuzzy matching (which would standardize the input so that “JOHN DOE” and “john doe” are recognized as the same entity), any analysis, accountability measure, or report based on this dataset is fundamentally flawed. Moreover, the fix is easy enough to implement, and yet a tax-funded agency drags its feet at the expense of New Yorkers. It is so easy, in fact, that we decided to host these datasets on our own servers and implement the corrective measures ourselves (see our API documentation), and we are not funded by taxes.
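
As a rough illustration of what fuzzy matching adds on top of case folding, the sketch below uses Python's standard `difflib` module. It is not the matching logic behind our API. The 0.9 threshold and the helper names `similarity` and `same_entity` are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] after folding case and collapsing whitespace."""
    norm_a = " ".join(a.split()).casefold()
    norm_b = " ".join(b.split()).casefold()
    return SequenceMatcher(None, norm_a, norm_b).ratio()

def same_entity(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two names as one entity when similarity meets the (assumed) threshold."""
    return similarity(a, b) >= threshold

print(same_entity("JOHN DOE", "john doe"))  # True: identical once case is folded
print(same_entity("Jon Doe", "JOHN DOE"))   # True: a one-letter typo still matches
print(same_entity("John Doe", "Jane Roe"))  # False: genuinely different names
```

Case folding alone already resolves the capitalization problem. The similarity threshold is what catches typos and spelling variants, and it needs tuning against real records, since a loose threshold will merge different people.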

These discrepancies in our city's datasets not only compromise the integrity of vital data, but also hinder meaningful accountability. The datasets are relied on by many different systems, so the errors propagate well beyond the Watchlist. By standardizing data entry and implementing fuzzy matching, these inaccuracies can be remedied, ensuring a fairer and more accurate representation of landlord performance. It is imperative that HPD as well as Public Advocate Jumaane D. Williams address these issues promptly to maintain public trust and effectively hold landlords accountable.

Analysis published by RentHistory.org - Credit is required when quoting or referencing this material.