Campaign Finance Database Methodology

You can download our full dataset here

Data Sources

We gathered data on contribution amounts and dates, along with donor names, addresses, employers, and occupations from the following sources:

Sector data, as well as some donor notes and summary “blurbs,” were drawn from additional online research by a group of a few dozen volunteers.

Our analysis is based primarily on records provided online by the Philadelphia Department of Records, which we have cross-checked against campaign finance filings made with the Philadelphia City Commissioners. We are including both monetary and in-kind contributions.

Types of Data Included and Data Preparation Methodology

Data Fields

  • Donor Name: Primarily drawn from aforementioned sources. We created a cleaned version of the donor name field, in which we standardized names that had slightly different spellings across different filings (while comparing addresses and employers, where relevant, for added validation[1]). In some cases we also grouped contributions together, for example when a company donated through a property it owned, or when donations came from two entities who did not give independently (e.g. spouses). If a company or organization contributed money both itself and through an affiliated PAC, we have grouped these contributions together under the same final donor name for presentation purposes.

  • Contribution To: Drawn from the aforementioned sources. Unaltered, although some cleaning/combining was needed in cases where candidates filed with different Political Committee names over time (e.g. People for Parker, Friends of Cherelle L. Parker).

  • Amount: Drawn from aforementioned sources. Unaltered.

  • Date: Drawn from aforementioned sources. Unaltered.

  • Sector (detailed): Created to categorize donors based on the sector where they are employed or with which they are affiliated. In some cases the sectors denote industries where donors have other involvement outside of their jobs, including having advocated for, donated to, or served that sector in some additional capacity (e.g. board members for charter schools). See the sector definitions below under “How did we modify these data sources?”.

  • Sector (grouped): Sectors were further grouped into a final set of 8 categories that represented the most prevalent and/or politically notable sector groups (further described below).

  • Sector Flags: Since many entities are affiliated with multiple sectors (e.g. a law firm that does a lot of work for real estate companies, or a union that is connected to the food & beverage industry), we have created separate “flag” columns for several categories that we are interested in analyzing. These are not necessarily complete, as we are still working on filling out some of this data.

  • Address: Drawn from aforementioned sources. Unaltered, other than cleaning obvious spelling errors. While we have used address data for mapping purposes, addresses are excluded from the public version of our dataset available above. They can however be found in the original public data sources. For individuals, these generally represent home addresses, though sometimes business or employer addresses are listed instead.

  • Employer: Drawn from aforementioned sources. Generally unaltered, though in a few cases employers were added where missing.

  • Occupation: Drawn from aforementioned sources. Generally unaltered, though in a few cases occupations were added where missing.

  • In-Kind?: This field indicates whether each contribution was an in-kind contribution (null values indicate monetary contributions).

  • State Level Race: This field indicates whether each contribution was for an election for a state-level office (rather than municipal office). There were a few candidates running for local offices in 2019 who were previously elected to state-level offices. State-level elections may draw contributions from fairly different sources than municipal elections, given different policymaking powers in different levels of government. State-level data is excluded from our current campaign finance dashboard.
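
As an illustration of two of the field rules above (combining committee names under "Contribution To" and reading nulls in "In-Kind?"), here is a minimal Python sketch. It is not the code used to build the dataset: the alias table, the canonical recipient label, and the function names are all hypothetical, and only the two committee names come from the example given above.

```python
# Hypothetical alias table: both committee names are from the example above,
# but the canonical label chosen here is an assumption.
COMMITTEE_ALIASES = {
    "people for parker": "Cherelle L. Parker",
    "friends of cherelle l. parker": "Cherelle L. Parker",
}


def canonical_recipient(committee_name):
    """Map a committee name to one recipient label; unknown names pass through."""
    # Collapse repeated whitespace and lowercase before the lookup.
    key = " ".join(committee_name.split()).lower()
    return COMMITTEE_ALIASES.get(key, committee_name.strip())


def is_in_kind(value):
    """Null or empty means monetary; any other value is treated as in-kind."""
    return value is not None and str(value).strip() != ""
```

A table lookup like this only handles names that have already been identified as belonging to the same candidate; deciding which committees belong together still requires manual review.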

Updates to Original Datasets

The original datasets come from the sources listed at the top of this document.

How did we modify these data sources?

We first organized and cleaned up the Philadelphia Department of Records Campaign Finance Document Search Engine data for All Contributions to Current Councilmembers and Mayor Kenney from 2014 through 2018.

We modified the data in the following ways:

  • Corrected donor name misspellings and combined donors with alternate names into a single name, as described above.

  • Cross-checked data with the TXT and PDF filings where they were available. In a few cases, the data sources did not match one another, or data was missing in one place and needed to be combined. We had to transcribe a few PDF filings which were missing from the other sources—recording all of the contributions listed in “Schedule I - Contributions and Receipts” and “Schedule II - In-Kind Contributions and Valuable Things Received.”

  • Deleted duplicate contribution records that had matching (or near-matching) recipient and donor names, donation amounts, and dates, on the assumption that many of these were accidental double entries and/or amendments. The campaign finance TXT file records provided by the Department of Records include many rows that represent amendments to previous records, yet there are no contribution “IDs”, so it is impossible to tell for sure which records are actually duplicates. We took a conservative approach: we assumed that the same donor rarely contributes the same amount to the same candidate more than once on the same day, and deleted duplicate records in these cases unless we could verify that both records were included in the PDF filings.

  • Eliminated records marked as “other receipts” in TXT files (these include non-contribution sources of income such as loans to campaigns, legal settlements, and interest accrued in bank accounts).
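
For illustration only, the same-day duplicate rule described above could be sketched in Python as follows. This is not the procedure actually used on the data: the field names (`donor`, `recipient`, `amount`, `date`) and the `verified_in_pdf` flag are hypothetical, and the sketch only catches exact matches after light normalization, not near-matches.

```python
def drop_same_day_duplicates(records):
    """Keep one record per (donor, recipient, amount, date) key.

    A repeated key is kept only if the record carries the hypothetical
    `verified_in_pdf` flag, i.e. both entries were confirmed in the PDF filing.
    """
    seen = set()
    kept = []
    for rec in records:
        key = (
            rec["donor"].strip().lower(),
            rec["recipient"].strip().lower(),
            round(float(rec["amount"]), 2),
            rec["date"],
        )
        if key in seen and not rec.get("verified_in_pdf", False):
            continue  # treated as an accidental double entry or amendment
        seen.add(key)
        kept.append(rec)
    return kept
```

Because the source files carry no contribution IDs, any rule of this kind will occasionally remove a genuine second contribution, which is why the PDF check matters.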

Using employment data and further research, we then categorized as many donors as we could by economic sector, according to the following definitions:

    • Banking, Finance & Insurance: Any entity employed in or affiliated with the banking, finance, accounting, and/or insurance industries

    • Charter / Education Privatization: Any entity with major stakes in charter schools or education privatization, including charter school owners and employees, and Political Action Committees with strong ties to education privatization

    • Food, Beverage & Tobacco: Any entity qualifying as a member of the food, beverage, or tobacco industries, including distributors, grocers, retailers, and restaurants

    • Law: Any entity qualifying as a lawyer or law firm

    • Politics / Political Committees: Any entity qualifying as a political committee under Philadelphia election rules, with several additions for prominent political operators who do not currently hold office (e.g. Ed Rendell)

    • Real Estate & Building Industry: Any entity affiliated with real estate or the building industry, including developers, property managers, realtors, real estate brokers, building industry contractors, architects, interior designers, etc.

    • Tech, Telecom & Engineering: Any entity who works in or is affiliated with the technology, telecommunications, or engineering sectors

    • Unions: Any entity qualifying as a union

    • Other: For donors whose sectors did not clearly fit into the above categories

    • Unknown: For donors whose economic sector we could not determine

Going forward, after each campaign finance deadline, we download the new data from the City of Philadelphia Campaign Finance System search engine (a new website as of the beginning of 2019), use an R script to aggregate and deduplicate it, and add it to our existing dataset. During this process, the R script searches for near matches between new records and the existing dataset, comparing concatenated donor names, addresses, occupations, and employers. If a close enough match is found for a new record, the record is given the standardized, cleaned donor name, and the sector data (researched by PPR volunteers) from the existing records is copied to it.
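
The near-match step can be illustrated with a short sketch. The actual script is written in R and its matching method is not described here, so the following Python version is only an assumption about how such a step might look: it uses the standard library's `difflib.SequenceMatcher`, a hypothetical similarity threshold of 0.9, and hypothetical field names (`donor`, `address`, `occupation`, `employer`, `donor_clean`, `sector`).

```python
from difflib import SequenceMatcher


def match_key(rec):
    """Concatenate the compared fields into one lowercased string."""
    fields = ("donor", "address", "occupation", "employer")
    return " | ".join(str(rec.get(f, "")).strip().lower() for f in fields)


def best_match(new_rec, existing, threshold=0.9):
    """Return the most similar existing record, or None if below threshold."""
    new_key = match_key(new_rec)
    best, best_score = None, 0.0
    for old in existing:
        score = SequenceMatcher(None, new_key, match_key(old)).ratio()
        if score > best_score:
            best, best_score = old, score
    return best if best_score >= threshold else None


def enrich(new_rec, existing, threshold=0.9):
    """Copy the cleaned donor name and sector from a close match, if any."""
    out = dict(new_rec)
    match = best_match(new_rec, existing, threshold)
    if match is not None:
        out["donor_clean"] = match["donor_clean"]
        out["sector"] = match["sector"]
    return out
```

The threshold controls the trade-off: a lower value merges more spelling variants but raises the risk of lumping different donors together, which is one reason a manual audit still follows.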

After this automated data preparation process, we give the full new dataset a thorough manual audit for inconsistencies, perform further manual donor name cleaning, and collaboratively research the sectors for new large donors, to the extent that we have time to do so. While the messiness of the data (in the format that it is made publicly available) results in inevitable minor discrepancies in fundraising totals in some cases, we are confident that our data for almost all cycles is within a margin of error of a few thousand dollars per candidate, and less than that for more recent years. As this process is certainly not perfect, we welcome any input or corrections from the community! Email us at phillypowerresearch [at] gmail [dot] com.

Data Inclusion Rules

Our Campaign Finance Data Explorer tool includes monetary and in-kind donations from January 2014 through the most recent campaign finance report filing period. For years prior to 2019, smaller donations of $50 or less are excluded, since candidate committees are not required to itemize these contributions. For 2019 onward, however, we have also been adding the lump-sum total of small, unitemized contributions for each candidate and filing cycle to our dataset. Donor names listed on our Campaign Finance Data Explorer are limited to those who have given $1,000 or more under whichever filter criteria the user has selected.
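
As a rough illustration of these rules, the Python sketch below sums contributions per donor and applies the $1,000 display threshold. It is not the Data Explorer's code: the function name and the record fields (`donor`, `year`, `amount`) are hypothetical, the input is assumed to be already filtered by the user's criteria, and the unitemized lump sums added from 2019 onward are left out because they carry no donor names.

```python
from collections import defaultdict


def explorer_donor_totals(records, min_total=1000.0):
    """Sum contributions per donor and keep donors at or above min_total.

    Pre-2019 contributions of $50 or less are skipped, mirroring the
    exclusion described above.
    """
    totals = defaultdict(float)
    for rec in records:
        if rec["year"] < 2019 and rec["amount"] <= 50:
            continue
        totals[rec["donor"]] += rec["amount"]
    # Only donors whose filtered total reaches the threshold are listed.
    return {donor: total for donor, total in totals.items() if total >= min_total}
```

Since the threshold applies to the filtered total, a donor may be listed under one filter selection and absent under a narrower one.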

Footnotes

  1. We acknowledge that this is an imperfect process, and will happily alter names to distinguish between different people/groups (e.g. by adding (a), (b), etc. to names) if anyone makes us aware that one entity’s contributions are being incorrectly lumped together with another entity’s contributions. Contact us with feedback at phillypowerresearch [at] gmail [dot] com.