This project has no interest in protecting reputations or careers. Its only obligation is to citizens and the factual public record.
The methodology, scoring rubrics, tier weights, and every individual governor score are published at parkerintel.com/governors. Nothing is hidden. Every item includes the score, the evidence, and the primary government data source.
If you disagree with a score: Name the item number. Cite a primary government source that contradicts the evidence we published. We will review any factual correction supported by primary data and publish an updated score with a timestamped revision note if warranted. Personal attacks, political affiliations, and appeals to authority are irrelevant. Only evidence matters.
If you cannot name the item and cite the source, you do not have a disagreement. You have an opinion.
The ALA 263-Metric Governor Evaluation scores all 50 sitting U.S. governors across 263 individually documented items with a maximum of 1,653 points. Every score is anchored to primary government data — BLS, Census, FBI, CDC, NAEP, FHWA, EPA, CMS, PACER, state auditors — not opinion polls, think tank indices, or editorial scorecards. The methodology was locked before any governor was scored. We did not know who would rank where when we set the weights.
Section A — Governance (100 items, max 300 pts): The governor's own executive actions — budget execution, appointments, emergency management, ethics, transparency.
Section B — State Outcomes (13 categories, max 975 pts): Measurable results using primary source data — economics, crime, education, healthcare, infrastructure, fiscal health, immigration.
Section C — Oath Fidelity (4 categories / 126 metrics, −378 to +378): Fidelity to the U.S. Constitution, measured against court-confirmed actions and federal consent decrees, not political opinion.
Sections A & B use a 4-point scale (0–3). Section C uses a 7-point scale (−3 to +3), where negative scores indicate oath violations and positive scores indicate constitutional fidelity. Each item falls into one of three scoring categories:
Binary — Court records: it happened or it didn't. (Criminal charges, ethics violations, consent decrees.)
Data-Benchmarked — National quartile ranking: top 12 states = 3, above median = 2, below median = 1, bottom 12 = 0. Trajectory during tenure matters.
Severity-Scaled — How bad was the failure: no incidents = 3, minor = 2, significant = 1, catastrophic (50+ casualties / federal takeover) = 0.
A single Tier 1 item (preventable deaths, corruption) carries approximately 15× the weight of a single Tier 4 item (economic statistic). A governor cannot score above 60/100 if they completely fail on accountability, regardless of economic performance.
After scoring, ranking stability was tested across 4,096 permutations using random numbers generated on IBM Quantum hardware (ibm_fez, Job ID: d6uvlc2f84ks73deoqp0). Two tests were performed: Test A (Weight Sensitivity) perturbed all 26 dimension weights by ±50%; Test B (Measurement Noise) perturbed section scores by ±5%. Using quantum hardware here is an audit-trail decision: any competent statistician could replicate the same sensitivity analysis with classical random numbers. IBM's quantum hardware provides a publicly verifiable, tamper-proof record that the random weights were not cherry-picked. IBM did not review, endorse, or validate the methodology or results.
Timothy E. Parker — Guinness World Records Most Syndicated Puzzle Compiler — served as Senior Editor, directing framework design, scoring consistency, source verification, and cross-governor rubric compliance across all 263 metrics.
This is the most detailed governor evaluation ever published: 50 sitting governors, 263 individually scored metrics, 1,653 maximum points, 26 scoring dimensions, with ranking stability verified on IBM Quantum hardware.
This evaluation operates on an explicit value framework that we name and defend rather than disguise as neutral objectivity:
Consequence-First: The severity of harm to citizens determines how much an item weighs. Preventable deaths count more than GDP growth. Corruption counts more than job statistics. This is not an ideological position — it is a statement about what is irreversible. A dead citizen cannot be compensated by a percentage point of economic growth. A bribed legislature cannot produce trustworthy policy. Consequences that cannot be undone must outweigh consequences that can.
Rights-Based: Every governor swears an oath to uphold the Constitution. Section C measures whether they kept it. This is not our standard — it is theirs. The text of the Constitution and the text of their oath are public documents. Court decisions interpreting both are public records. We measure governors against the commitment they made voluntarily, under oath, on the day they took office.
Critics who disagree with this framework are not disagreeing with us. They are arguing that preventable deaths should matter less, that corruption should be discounted, or that oath-breaking should be ignored. They are welcome to make that argument publicly.
This methodology was finalized and locked before any governor was scored. We did not know who would rank where when we set the weights, defined the rubrics, or assigned items to tiers. Republican and Democratic governors are evaluated using identical items, identical rubrics, identical data sources, and identical weights. If Tier 1 items disproportionately affect governors of one party, that reflects the governance record of those governors, not a bias in the system. The methodology is published in full so that this claim can be independently verified.
Existing governor rankings rely on third-party indices, opinion surveys, or narrow ideological scorecards. They tell you whether a governor is popular. They do not tell you whether a governor is effective, honest, or faithful to their oath of office.
This evaluation measures what matters. Every score is anchored to primary government data — Bureau of Labor Statistics employment figures, Census population flows, FBI Uniform Crime Reports, CDC mortality data, NAEP education scores, state auditor findings, federal court records — not opinion polls, editorial endorsements, or partisan scorecards.
The 263 metrics are organized into three sections reflecting the three dimensions of executive governance:
| Section | Focus | Items | Max Points | Scoring |
|---|---|---|---|---|
| Section A: Governance | The governor's own executive actions | 100 | 300 | Each item 0–3 |
| Section B: State Outcomes | Measurable results using primary source data | 13 categories | 975 | Each category 0–75 |
| Section C: Oath Fidelity | Fidelity to the Declaration of Independence & Bill of Rights | 4 categories (126 metrics) | −378 to +378 | Each metric −3 to +3 |
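The point totals in the table can be checked with a few lines of arithmetic. This minimal Python sketch reproduces the 1,653-point maximum from the section sizes above and shows how a negative Section C score subtracts from the raw sum of Sections A and B. The function name `raw_points` is illustrative, and the sketch covers raw points only; the 0–100 final score is produced by the tier weighting, not by this sum.

```python
# Section maxima from the overview table: item (or category) count x per-unit maximum.
MAX_A = 100 * 3   # Section A: 100 items scored 0-3
MAX_B = 13 * 75   # Section B: 13 categories scored 0-75
MAX_C = 126 * 3   # Section C: 126 metrics scored -3 to +3, so the floor is -MAX_C

MAX_TOTAL = MAX_A + MAX_B + MAX_C


def raw_points(a: int, b: int, c: int) -> int:
    """Sum section raw points; a negative Section C score pulls the total below A + B."""
    if not (0 <= a <= MAX_A and 0 <= b <= MAX_B and -MAX_C <= c <= MAX_C):
        raise ValueError("section score out of range")
    return a + b + c


print(MAX_TOTAL)                  # 1653
print(raw_points(210, 640, -45))  # 805
```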
Section A evaluates what the governor personally did or failed to do. These are executive actions within the governor's direct authority. A governor cannot blame the legislature, the federal government, or market forces for Section A scores — these items measure their decisions.
| Subsection | Items | Max | What It Measures |
|---|---|---|---|
| A1: Budget Execution | 15 | 45 | On-time submission, forecast accuracy, rainy day fund, credit ratings, pension funding, debt management, CAFR timeliness, audit findings, federal grant accounting |
| A2: Legislative Relations | 15 | 45 | Bill signing record, veto strategy, override rate, bipartisan legislation, special sessions effectiveness, legislative relationship quality |
| A3: Appointments | 10 | 30 | Judicial appointment quality, agency head qualifications, vacancy rates, diversity, confirmation success rate |
| A4: Emergency Management | 12 | 36 | Disaster response timeliness, National Guard deployment, FEMA coordination, preventable deaths from state failure, infrastructure failure prevention, pandemic response |
| A5: Transparency | 13 | 39 | FOIA compliance, schedule availability, campaign finance, financial disclosure, open meetings, open data, budget transparency, lobbying disclosure, IG reports, press accessibility |
| A6: Ethics | 13 | 39 | Criminal charges, ethics complaints, gift disclosure, conflicts of interest, state resource misuse, truthfulness, ethics infrastructure, emoluments, donor-to-contract pipeline, foreign influence |
| A7: Program Management | 10 | 30 | Healthcare program delivery, education initiative results, environmental programs, corrections system, transportation projects |
| A8: Federal Relations | 6 | 18 | Federal fund capture rate, grant competitiveness, regulatory relationship, federal litigation costs |
| A9: Constituent Service | 6 | 18 | Constituent response systems, town halls, accessibility, complaint resolution |
Section B measures what actually happened in the state. Unlike Section A, governors have partial — not total — control over these outcomes. State economies depend on national trends. Crime rates depend on demographic shifts. Education depends on decades of prior investment. Section B acknowledges this by measuring trajectory during tenure rather than absolute position. A governor who inherits a struggling state and improves outcomes receives credit. A governor who inherits a thriving state and allows decline does not.
| Category | Max | Primary Data Sources |
|---|---|---|
| B01: Economic Performance | 75 | BEA SAGDP, BLS LAUS, BLS CES, Census ACS |
| B02: Population & Migration | 75 | Census Population Estimates, ACS Migration Flows, IRS SOI Migration Data |
| B03: Budget & Fiscal Health | 75 | State CAFR/ACFR, Moody's/S&P/Fitch, NASBO State Expenditure Reports |
| B04: Public Safety | 75 | FBI UCR/NIBRS, CDC WONDER Mortality, BJS NPS, state crime lab reports |
| B05: Education | 75 | NAEP, NCES IPEDS, state DOE report cards, NCHEMS graduation data |
| B06: Healthcare | 75 | CDC WONDER, CMS, Census ACS (uninsured), state vital statistics |
| B07: Infrastructure | 75 | FHWA National Bridge Inventory, ASCE Report Card, EPA SDWIS, DOT FARS |
| B08: Cost of Living | 75 | BLS CPI, BEA RPP, Census ACS (housing burden), HUD FMR |
| B09: Government Transparency | 75 | State FOI logs, POGO, RTI international studies, state AG records |
| B10: Controversy & Scandal | 75 | DOJ filings, federal court records, PACER, state ethics commission records |
| B11: Historical Legacy | 75 | Academic assessments, institutional policy trajectory, long-term investment analysis |
| B12: Constituent Verdict | 75 | Approval polling (aggregated), voter participation rates, ballot measure outcomes |
| B13: Immigration & Law Compliance | 75 | DHS enforcement data, DOJ immigration court records, state cooperation agreements, sanctuary policy documentation |
Section C is unique. It evaluates whether the governor has been faithful to the oath of office every governor voluntarily swears — to uphold the U.S. Constitution, particularly the rights enumerated in the Declaration of Independence and the Bill of Rights. This is not our standard — it is the governor's own sworn commitment, made publicly, on the day they took office.
Section C scores are anchored to court-confirmed constitutional violations under the governor's administration — federal consent decrees, judicial orders striking down state actions as unconstitutional, and documented failures of constitutional obligations confirmed by courts of law. This section does not score political disagreements, policy preferences, or editorial opinions about what the Constitution should mean. It scores what courts have ruled it does mean.
Section C scores can be negative. A governor whose administration has court-confirmed constitutional violations receives a negative score that subtracts from their total. This is the only section where a governor's score can reduce their overall evaluation below the sum of Sections A and B.
| Category | Metrics | Range | Constitutional Authority |
|---|---|---|---|
| C1: Protection of Life | 31 | −93 to +93 | Declaration of Independence: “Life, Liberty, and the pursuit of Happiness” |
| C2: Constitutional Rights | 29 | −87 to +87 | Bill of Rights (1st, 2nd, 4th, 5th, 14th Amendments) |
| C3: Child Welfare & Parental Rights | 25 | −75 to +75 | 9th & 10th Amendments, parens patriae obligations |
| C4: Faithful Discharge of Duties | 41 | −123 to +123 | State constitutional oath of office provisions |
Court-confirmed violations of constitutional rights trigger additional penalties documented separately. Each governor's evaluation includes an oath breach count, the number that are court-confirmed, the penalty applied, and the specific details of each breach.
The 263 metrics span wildly different types of measurement — from binary yes/no questions (was the governor charged with a crime?) to continuous data comparisons (where does the state rank in GDP growth?). A single scoring rubric cannot fairly handle all item types. Instead, each item falls into one of three scoring categories, each with its own 0–3 logic.
Critical design principle: The scoring rubrics were defined before any governor was scored. Rubrics were not reverse-engineered to produce desired outcomes. Every item has a pre-declared definition of what constitutes a 0, 1, 2, or 3.
Did this happen or didn't it?
Binary items measure events or conditions that either occurred or did not. They are anchored to court records, DOJ filings, ethics commission findings, and official government records. There is no subjective interpretation — the document either exists or it does not.
| Score | Definition | Example (Item #74: Campaign Donor to State Contract Pipeline) |
|---|---|---|
| 3 | Clean. No incidents, no charges, no violations, no documented concerns. No public record of the event occurring. | No documented cases of campaign donors receiving preferential state contracts. Ethics commission has no related complaints on file. |
| 2 | Proximity concerns. No direct involvement, but documented connection to an incident. The governor was not charged or found in violation, but circumstantial evidence exists. | Governor received campaign contributions from entities later involved in corruption, but was not a target of the investigation. No evidence governor knew of the scheme. |
| 1 | Documented incident, limited scope. A verified event occurred involving the governor's office, staff, or direct political associates. The governor may have cooperated with investigators. | Governor received $5M+ in campaign contributions from FirstEnergy entities, signed the bill later proven to be the product of a $60M bribery scheme, and did not detect or prevent the corruption despite its scale. Not personally charged. |
| 0 | Direct involvement or catastrophic failure. Governor personally charged, indicted, or found in violation. Or: corruption of such scale operated under the governor's authority that failure to detect or prevent it constitutes a fundamental governance failure. | Governor personally indicted or convicted for corruption. Or: governor directed the scheme, received personal financial benefit, and used state resources to facilitate it. |
Items scored as binary: Criminal charges (#66), ethics complaints (#67), gift/travel disclosure (#68), conflict of interest (#69), state resource misuse (#70), emoluments/self-dealing (#73), campaign donor pipeline (#74), foreign influence (#75), sexual harassment claims (#76), records preservation (#77), revolving door (#78).
Verification sources for binary items: DOJ press releases and case filings, PACER federal court records, state ethics commission complaint databases, state AG investigation records, campaign finance filings (FEC + state), financial disclosure statements, court opinions and orders.
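Because each binary level is defined by what the record shows, the rubric can be read as a worst-first decision rule. The sketch below is a hypothetical encoding: the `BinaryEvidence` fields are illustrative names, not part of the published methodology, and deciding whether a flag is true still requires reading the source documents listed above.

```python
from dataclasses import dataclass


@dataclass
class BinaryEvidence:
    """Hypothetical flags summarizing the documentary record for one item."""
    personally_charged_or_in_violation: bool = False
    catastrophic_scheme_undetected: bool = False
    verified_incident: bool = False  # involving the office, staff, or direct associates
    proximity_only: bool = False     # circumstantial connection, no violation found


def score_binary_item(ev: BinaryEvidence) -> int:
    """Apply the 0-3 binary rubric, checking the most serious level first."""
    if ev.personally_charged_or_in_violation or ev.catastrophic_scheme_undetected:
        return 0
    if ev.verified_incident:
        return 1
    if ev.proximity_only:
        return 2
    return 3


print(score_binary_item(BinaryEvidence(verified_incident=True)))  # 1
```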
Where does the state rank, and which direction is it moving?
Data-benchmarked items are scored by comparing a state's performance to all other states using published national datasets. The scoring uses quartile ranking to account for the fact that governors inherit different starting positions. A governor in Mississippi and a governor in Massachusetts govern from different baselines — what matters is relative performance and trajectory.
| Score | Definition | Example (GDP Growth Rate) |
|---|---|---|
| 3 | Top quartile nationally (rank 1–12 among states) OR measurably improving trajectory that moved the state up at least one quartile during the governor's tenure. | State GDP growth rate in top 12 nationally during evaluation period. Source: BEA SAGDP tables. |
| 2 | Above median (rank 13–25) OR stable position maintaining existing quartile. | State GDP growth rate ranked 13th–25th nationally. State maintained its economic position without significant gains or losses. |
| 1 | Below median (rank 26–38) OR declining trajectory that dropped the state at least one quartile during the governor's tenure. | State GDP growth rate ranked 26th–38th nationally, or state dropped from above-median to below-median during tenure. |
| 0 | Bottom quartile (rank 39–50) OR significant decline during tenure (two or more quartile drops). | State GDP growth rate ranked in bottom 12 nationally, or state economic performance collapsed relative to peers during tenure. |
Items scored as data-benchmarked: All Section B items across economic performance, population, fiscal health, public safety, education, healthcare, infrastructure, cost of living, and transparency. Also applies to select Section A items where national comparisons exist (e.g., rainy day fund as percentage of expenditure, pension funding ratio).
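The rank bands translate directly into code, but the trajectory clauses overlap with the rank clauses (a state can sit above the median and also have dropped a quartile). The sketch below adopts one possible reading, which is an assumption rather than the published rule: a rise of at least one quartile earns 3, a drop of two or more earns 0, a one-quartile drop caps the score at 1, and otherwise the end-of-tenure rank decides.

```python
def quartile_band(rank: int) -> int:
    """0 = top (1-12), 1 = above median (13-25), 2 = below median (26-38), 3 = bottom (39-50)."""
    if not 1 <= rank <= 50:
        raise ValueError("rank must be between 1 and 50")
    if rank <= 12:
        return 0
    if rank <= 25:
        return 1
    if rank <= 38:
        return 2
    return 3


def benchmarked_score(start_rank: int, end_rank: int) -> int:
    """Score a data-benchmarked item from national ranks at the start and end of tenure."""
    moved_up = quartile_band(start_rank) - quartile_band(end_rank)  # >0 improved, <0 declined
    rank_score = 3 - quartile_band(end_rank)
    if moved_up >= 1:
        return 3                   # climbed at least one quartile
    if moved_up <= -2:
        return 0                   # dropped two or more quartiles
    if moved_up == -1:
        return min(rank_score, 1)  # one-quartile drop caps the score at 1
    return rank_score              # same quartile: end-of-tenure rank decides


print(benchmarked_score(45, 30))  # 3: bottom quartile to below median
```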
Primary national datasets used for quartile ranking:
Something went wrong. How wrong?
Severity-scaled items measure the magnitude of a failure. Unlike binary items (did it happen?) or data-benchmarked items (how does the state rank?), severity items measure how bad a documented event was, using objective thresholds tied to body counts, financial losses, and federal intervention triggers.
| Score | Definition | Example (Item #44: Preventable Deaths from State Failure) |
|---|---|---|
| 3 | No incidents. Zero preventable deaths, zero infrastructure failures resulting in casualties, zero events requiring federal emergency intervention due to state negligence. | No mass casualty events attributable to state government failure during the governor's tenure. |
| 2 | Minor incidents. Small-scale event, contained quickly. Fewer than 5 people directly harmed. State response was adequate once the event was identified. | Isolated infrastructure incident (e.g., single bridge failure, localized water contamination) with limited casualties and prompt state response. |
| 1 | Significant failure. 5–50 people affected. State response was delayed, inadequate, or poorly coordinated. Federal agencies intervened due to state incapacity. | East Palestine derailment: hazardous material release, delayed environmental monitoring, uncertain long-term health impacts, primarily federal responsibility (NTSB/FRA) but state inspection resources inadequate. |
| 0 | Catastrophic. 50+ casualties, OR systemic failure requiring federal takeover, OR irreversible environmental damage affecting an entire region. The governor's authority was directly responsible or directly failed to prevent. | ERCOT grid collapse (Texas, Feb 2021): 246+ preventable deaths. 4.5 million without power for days. Governor's appointed regulators failed to mandate winterization despite 2011 warnings. A known risk, a documented recommendation, and a preventable mass casualty event. |
Items scored as severity-scaled: Preventable deaths from state failure (#44), National Guard deployment timing (#43), infrastructure failure prevention (#47), pandemic response effectiveness (#48–51), environmental contamination events, child fatalities in state custody, prison deaths from negligence, emergency response coordination failures.
Verification sources for severity items: NTSB accident reports, FEMA after-action reports, CDC WONDER cause-of-death queries, EPA enforcement action records, federal consent decrees, state medical examiner reports, OSHA inspection records, National Guard deployment records.
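Severity scoring is a threshold ladder. This hypothetical sketch checks the most severe conditions first. Two assumptions are worth flagging: the table's bands overlap at exactly 50 people ("5–50" and "50+"), and the sketch treats 50 as catastrophic; and the boolean flags stand in for determinations (federal intervention, federal takeover, irreversible regional damage) that in practice come from the verification sources above.

```python
def severity_score(incident: bool, people_harmed: int = 0, *,
                   federal_intervention: bool = False,
                   federal_takeover: bool = False,
                   regional_irreversible_damage: bool = False) -> int:
    """Score one severity-scaled item on the 0-3 rubric, most severe condition first."""
    if people_harmed < 0:
        raise ValueError("people_harmed cannot be negative")
    if not incident:
        if people_harmed:
            raise ValueError("people harmed implies an incident")
        return 3  # no incidents during tenure
    if people_harmed >= 50 or federal_takeover or regional_irreversible_damage:
        return 0  # catastrophic (assumption: exactly 50 counts here)
    if people_harmed >= 5 or federal_intervention:
        return 1  # significant failure
    return 2      # minor: fewer than 5 people harmed


print(severity_score(True, 246))  # 0
```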
Not all governance failures are equal. A governor who presides over a $60 million bribery scheme and 246 preventable deaths should not be able to outscore a clean governor by posting better job numbers. The tier weighting system ensures that scores reflect the real-world consequences of governance, not just the quantity of items scored.
| Tier | Weight | What It Contains | Why This Weight |
|---|---|---|---|
| Tier 1 | 40% | Preventable deaths from state failure. Corruption and fraud under the governor's authority. Court-confirmed constitutional violations. Federal consent decrees. Personal criminal charges or staff indictments. | Irreversible harm. You cannot undo death. Corruption undermines every other metric — if the legislature is for sale, how do you trust the education numbers? 40% means a governor cannot score above 60% overall if they completely fail on accountability, regardless of performance elsewhere. |
| Tier 2 | 25% | Child welfare failures (foster care deaths, DFPS federal oversight). Fiscal destruction (credit downgrades, pension collapse). Environmental health damage with community-level impact. | Severe harm to the most vulnerable. Children in state custody cannot advocate for themselves. Pension collapses affect millions of retirees. Environmental contamination can take decades to remediate. These failures define a governorship but do not necessarily taint every other measurement the way corruption does. |
| Tier 3 | 20% | Infrastructure reliability. Public safety trajectory. Education outcomes. Healthcare access and quality. | The job description. This is what governors are elected to manage. It affects millions of lives daily. But these are expected competencies — running schools and roads is the baseline, not the ceiling. |
| Tier 4 | 10% | Economic performance (GDP, jobs, wages). Cost of living trajectory. Business environment. | Real but indirect. Governors influence economic conditions but do not control them. A state's GDP depends heavily on geography, federal policy, industry mix, national cycles, and decades of prior investment. Give it weight, but not more than it deserves. |
| Tier 5 | 5% | Population migration. Constituent approval. Historical legacy comparisons. | Outputs, not inputs. People voting with their feet and poll numbers are the result of governance, not governance itself. A popular governor is not necessarily a good one. An unpopular one who makes hard, correct decisions is not necessarily bad. Informative as a cross-check, not decisive. |
At 50%+, the system would effectively measure only corruption and preventable deaths. Governors who are genuinely clean but terrible at governing — crumbling schools, failing hospitals, no economic plan — would receive high scores. The job of governor is more than “don't kill people and don't steal.” 40% ensures accountability dominates without making it the only thing that matters.
At 30% or below, the “scoring your way out” problem returns. A governor with a $60M bribery scheme operating under their authority could offset it with enough good economic statistics and positive migration data. That outcome violates the foundational principle: consequences drive weight. 40% is the tipping point where catastrophic Tier 1 failure cannot be compensated by excellent performance elsewhere.
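The 60% ceiling and the tipping-point argument reduce to two lines of algebra. Writing w1 for the Tier 1 weight and assuming the five weights sum to 1, a governor with 0% on Tier 1 and 100% everywhere else scores (1 − w1) × 100, while a governor with a perfect Tier 1 record needs a share x of the remaining points satisfying w1 + (1 − w1) × x = 1 − w1 just to tie. The sketch below, with illustrative function names, tabulates both for the three weights discussed above.

```python
def corrupt_ceiling(w1: float) -> float:
    """Best possible score with Tier 1 at 0% and every other tier at 100%."""
    return (1.0 - w1) * 100


def break_even_share(w1: float) -> float:
    """Share x of the non-Tier-1 maximum a governor with a perfect Tier 1 record
    needs to tie corrupt_ceiling(w1): solves w1 + (1 - w1) * x = 1 - w1."""
    return max(0.0, (1.0 - 2.0 * w1) / (1.0 - w1))


for w1 in (0.30, 0.40, 0.50):
    print(f"Tier 1 weight {w1:.0%}: ceiling {corrupt_ceiling(w1):.1f}, "
          f"clean governor ties at {break_even_share(w1):.1%} elsewhere")
```

At 40%, a governor with a spotless Tier 1 record needs one third of the remaining points to tie a governor with total Tier 1 failure and perfect performance elsewhere; at 30%, that rises to about 57%; at 50%, the clean governor ties with nothing at all elsewhere.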
This is where most objections will arise. Critics will say economic performance should count for more. The response is straightforward: governors do not control their state's economy. Texas has oil. California has Silicon Valley. North Dakota has the Bakken formation. New York has Wall Street. These industry concentrations predate every sitting governor by decades. When oil prices rise, Texas GDP grows regardless of who is governor. When tech booms, California grows regardless. Giving economic performance 10% respects that it matters while acknowledging the governor's limited causal role.
If approval ratings drove the score, every governor would optimize for popularity rather than effectiveness. Short-term popular decisions (tax cuts without spending cuts, delaying infrastructure maintenance, avoiding hard policy choices) produce high approval and long-term damage. Migration data is informative — if 500,000 people are leaving a state, something is wrong — but it's a lagging indicator that reflects prior governance, not current performance.
Final scoring proceeds in three steps:
Step 1: Score each of the 263 metrics using the appropriate scoring category (binary, data-benchmarked, or severity-scaled). Sections A & B items are scored 0–3; Section C items are scored −3 to +3.
Step 2: Within each tier, sum the raw item scores and calculate the tier percentage:
Tier % = (sum of item scores in tier) / (max possible in tier) × 100
Step 3: Multiply each tier percentage by its weight and sum:
Final Score = (Tier 1 % × 0.40) + (Tier 2 % × 0.25) + (Tier 3 % × 0.20) + (Tier 4 % × 0.10) + (Tier 5 % × 0.05)
Result: a score from 0 to 100.
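The three steps map onto a short function. This sketch assumes tier raw scores and maxima have already been aggregated (Step 1 and the first half of Step 2); the raw scores are those of the first worked example below, whose table reports the result rounded to 65.3. Function and variable names are illustrative.

```python
import math

# Tier weights from Step 3.
TIER_WEIGHTS = {1: 0.40, 2: 0.25, 3: 0.20, 4: 0.10, 5: 0.05}


def tier_percent(raw: float, max_possible: float) -> float:
    """Step 2: a tier's raw score as a percentage of its maximum."""
    return raw / max_possible * 100


def final_score(tiers: dict[int, tuple[float, float]]) -> float:
    """Step 3: weighted sum of tier percentages; expects (raw, max) for all five tiers."""
    if set(tiers) != set(TIER_WEIGHTS):
        raise ValueError("expected raw/max pairs for tiers 1-5")
    return sum(tier_percent(raw, mx) * TIER_WEIGHTS[t] for t, (raw, mx) in tiers.items())


assert math.isclose(sum(TIER_WEIGHTS.values()), 1.0)

# (raw score, max possible) per tier, taken from the first worked example.
example_1 = {1: (42, 105), 2: (108, 135), 3: (216, 270), 4: (351, 390), 5: (255, 300)}
print(round(final_score(example_1), 2))  # 65.25
```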
The effective per-item multiplier depends on how many items are in each tier:
| Tier | ~Items | Weight | Per-Item Influence | Effective Multiplier vs. Tier 4 Baseline |
|---|---|---|---|---|
| Tier 1 | ~35 | 40% | 1.14% each | ~15× |
| Tier 2 | ~45 | 25% | 0.56% each | ~7× |
| Tier 3 | ~90 | 20% | 0.22% each | ~3× |
| Tier 4 | ~130 | 10% | 0.077% each | 1× (baseline) |
| Tier 5 | ~100 | 5% | 0.05% each | ~0.65× |
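The per-item influence column is the tier weight divided by the approximate item count, and the multiplier column divides that by the Tier 4 value. The sketch below recomputes both from the table's approximate counts, so its outputs inherit the same approximations.

```python
# (approximate item count, tier weight) from the table above.
TIERS = {1: (35, 0.40), 2: (45, 0.25), 3: (90, 0.20), 4: (130, 0.10), 5: (100, 0.05)}

# Percentage points of the final score that one item can move from 0 to full marks.
per_item = {t: weight * 100 / count for t, (count, weight) in TIERS.items()}
baseline = per_item[4]  # Tier 4 is the 1x reference

for t, influence in per_item.items():
    print(f"Tier {t}: {influence:.3f}% per item, {influence / baseline:.2f}x Tier 4")
```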
A single Tier 1 item is worth approximately 15 times a single Tier 4 item. That ratio sounds aggressive until you state it plainly: “Is one item about 246 people dying from a preventable grid failure worth 15 items about job growth statistics?” The answer is self-evident.
Example 1: the corrupt, high-performing governor

| Tier | Raw Score | Max | Tier % | × Weight | Contribution |
|---|---|---|---|---|---|
| Tier 1 | 42 | 105 | 40% | × 0.40 | 16.0 |
| Tier 2 | 108 | 135 | 80% | × 0.25 | 20.0 |
| Tier 3 | 216 | 270 | 80% | × 0.20 | 16.0 |
| Tier 4 | 351 | 390 | 90% | × 0.10 | 9.0 |
| Tier 5 | 255 | 300 | 85% | × 0.05 | 4.25 |
| Final Score | | | | | 65.3 / 100 |
Despite scoring 80–90% on Tiers 2–5, the 40% Tier 1 score drags the final to 65. The corruption matters.
Example 2: the clean, mediocre governor

| Tier | Tier % | × Weight | Contribution |
|---|---|---|---|
| Tier 1 | 85% | × 0.40 | 34.0 |
| Tier 2 | 65% | × 0.25 | 16.25 |
| Tier 3 | 60% | × 0.20 | 12.0 |
| Tier 4 | 55% | × 0.10 | 5.5 |
| Tier 5 | 50% | × 0.05 | 2.5 |
| Final Score | | | 70.3 / 100 |
The clean, mediocre governor outscores the corrupt, high-performing one. This is the correct outcome.
Every score in this evaluation is traceable to a primary data source. We do not import third-party governor scorecards, editorial indices, or ideological rankings. Where non-government sources are used as inputs (e.g., credit rating agencies, approval polling aggregates, ASCE infrastructure grades), they are clearly identified and serve as secondary context rather than core scoring drivers.
Why this matters: Third-party rankings embed the ideology of the ranking organization. The Cato Institute and the Sierra Club will rank the same governor differently because they measure different values. This evaluation sidesteps the problem by anchoring its core scoring to raw data from agencies that serve both parties. BLS unemployment figures do not have a political affiliation. CDC mortality rates do not have an editorial board.
| Source | Agency | What It Provides |
|---|---|---|
| SAGDP | Bureau of Economic Analysis | State-level GDP, personal income, industry composition |
| LAUS | Bureau of Labor Statistics | State/metro unemployment rates (monthly) |
| CES | Bureau of Labor Statistics | Nonfarm payroll employment by state (monthly) |
| CPI | Bureau of Labor Statistics | Consumer price inflation by metro area |
| ACS | Census Bureau | Income, poverty, housing costs, insurance coverage, migration |
| Population Estimates | Census Bureau | Annual state population and components of change |
| SOI Migration | Internal Revenue Service | County-to-county migration based on tax returns |
| UCR / NIBRS | Federal Bureau of Investigation | Violent crime, property crime rates by state |
| WONDER | Centers for Disease Control and Prevention | Cause-of-death mortality, maternal mortality, drug overdose deaths |
| NAEP | National Center for Education Statistics | 4th/8th grade reading and math scores by state |
| IPEDS | National Center for Education Statistics | Higher education enrollment, completion, costs |
| NBI | Federal Highway Administration | Bridge structural condition ratings by state |
| FARS | Department of Transportation | Fatal traffic crash data |
| SDWIS | Environmental Protection Agency | Drinking water system violations by state |
| CMS Data | Centers for Medicare & Medicaid Services | Medicaid enrollment, hospital quality, state plan data |
| PACER | Federal Courts | Federal case filings, consent decrees, civil rights cases |
| CAFR/ACFR | State Governments | Comprehensive annual financial reports |
| NASBO | Nat'l Assoc. of State Budget Officers | State expenditure reports, fiscal survey |
| Credit Ratings | Moody's, S&P, Fitch | State general obligation bond ratings |
After all 50 governors were scored across all 263 metrics, the ranking stability was tested using perturbation analysis powered by random numbers generated on IBM Quantum hardware. This section explains exactly what that process does, what it does not do, and why quantum hardware was chosen.
Ranking stability verification through sensitivity analysis.
The evaluation system has 26 scoring dimensions (9 governance + 13 outcomes + 4 oath fidelity). Each dimension contributes to the final score. But reasonable people could disagree about how much each dimension should matter. Should budget execution count more than education outcomes? Should public safety outweigh infrastructure?
The perturbation analysis answers the question: “If we changed how much each dimension matters, would the rankings still hold?”
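Two complementary tests are performed: Test A (Weight Sensitivity) perturbs all 26 dimension weights by ±50%, and Test B (Measurement Noise) perturbs section scores by ±5%.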
Together, Test A asks “what if we weighted categories differently?” and Test B asks “what if the scores had measurement error?”
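The two tests can be sketched classically, which the document notes gives statistically equivalent results. This minimal version uses Python's `random` module in place of the quantum-generated numbers, assumes uniform scaling factors, and applies the Test B noise to each dimension score rather than to section totals as a simplification. The three governors and their scores are hypothetical.

```python
import random


def rank_governors(scores: dict[str, list[float]], weights: list[float]) -> list[str]:
    """Order governors by weighted total across dimensions, best first."""
    totals = {g: sum(w * s for w, s in zip(weights, dims)) for g, dims in scores.items()}
    return sorted(totals, key=totals.get, reverse=True)


def stability(scores, base_weights, n_perms=4096, weight_spread=0.5, noise=0.0, seed=0):
    """Share of permutations in which each governor keeps its baseline rank.

    Test A: weight_spread=0.5, noise=0.0 (each weight scaled by 0.5x to 1.5x).
    Test B: weight_spread=0.0, noise=0.05 (each score scaled by 0.95x to 1.05x).
    """
    rng = random.Random(seed)  # classical Mersenne Twister stand-in for quantum bits
    baseline = rank_governors(scores, base_weights)
    held = dict.fromkeys(baseline, 0)
    for _ in range(n_perms):
        weights = [w * rng.uniform(1 - weight_spread, 1 + weight_spread) for w in base_weights]
        noisy = {g: [s * rng.uniform(1 - noise, 1 + noise) for s in dims]
                 for g, dims in scores.items()}
        for pos, gov in enumerate(rank_governors(noisy, weights)):
            if baseline[pos] == gov:
                held[gov] += 1
    return baseline, {g: held[g] / n_perms for g in baseline}


# Hypothetical three-governor, three-dimension example (not real scores).
scores = {"Gov A": [10.0, 10.0, 10.0], "Gov B": [5.0, 6.0, 5.0], "Gov C": [5.0, 5.0, 5.5]}
baseline, held_a = stability(scores, [1.0, 1.0, 1.0])                         # Test A
_, held_b = stability(scores, [1.0, 1.0, 1.0], weight_spread=0.0, noise=0.05)  # Test B
print(baseline, held_a, held_b)
```

Gov A leads on every dimension, so no positive reweighting can unseat it; Gov B and Gov C, whose totals are close, trade places depending on the draw.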
Result: Spencer Cox (Utah) held the #1 position in 100.0% of 4,096 permutations. His lead is large enough that no reweighting within the tested ±50% range changed who finishes first.
The quantum step does not validate the individual scores.
The perturbation analysis tests whether the ranking order is robust to changes in dimension weights. It does not verify whether the underlying item scores are accurate. Score accuracy depends entirely on the evidence, data sources, and editorial judgment documented in each governor's evaluation (see Parts III and V of this document).
If every governor were scored incorrectly, but consistently so, the perturbation analysis would still show “stable” rankings. Stability ≠ accuracy. Stability means the ranking order is not sensitive to weighting assumptions. Accuracy means the scores themselves are correct. The quantum step tests the former, not the latter.
IBM did not endorse this project.
IBM Quantum provided computational hardware through its publicly available cloud platform. IBM did not review, approve, endorse, or validate the evaluation methodology, the scoring rubrics, or the results. “Quantum Verified” means the perturbation analysis used quantum-generated random numbers — not that IBM certified or approved the project. IBM's involvement was limited to executing the quantum circuits submitted through their standard platform.
For this application, generating random weights for a sensitivity analysis, classical pseudorandom number generators (such as Python's Mersenne Twister, /dev/urandom, or any cryptographically secure PRNG) would produce statistically equivalent results. Unpredictable, physically sourced randomness can matter in some security settings, but it is not necessary for perturbation analysis, where only the statistical distribution of the weights matters.
Quantum hardware was chosen for one reason: auditability.
The job record for this run (Job ID d6uvlc2f84ks73deoqp0) is publicly accessible at quantum.ibm.com. The circuits were executed on IBM quantum hardware (ibm_fez), not a simulator. In short: quantum hardware was chosen for its transparency properties, not its mathematical properties. We disclose this because transparency is not optional in serious research.
| Parameter | Value |
|---|---|
| Quantum Backend | ibm_fez (IBM Quantum) |
| Job ID | d6uvlc2f84ks73deoqp0 |
| Verification URL | quantum.ibm.com/jobs/d6uvlc2f84ks73deoqp0 |
| Number of Qubits | 8 |
| Gate Configuration | Hadamard (H) on all qubits — true quantum superposition |
| Number of Circuits | 100 |
| Shots per Circuit | 1,024 |
| Total Quantum Samples | 102,400 |
| Permutations Generated | 4,096 |
| Dimensions Perturbed | 26 |
| Weight Range | 0.5× to 1.5× per dimension (±50%) |
| Governors Ranked | 50 |
| Verification Date | May 18, 2026 |
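With a Hadamard gate on each of 8 qubits, an ideal measurement returns each of the 2^8 = 256 possible bitstrings with probability 1/256 (real hardware adds some noise). The table does not state how measured bitstrings were converted into weights. The sketch below assumes a simple linear mapping from an 8-bit outcome to the 0.5× to 1.5× range and checks the sample count in the table. The function name and the mapping are hypothetical.

```python
def bitstring_to_weight(bits, low=0.5, high=1.5):
    """Map a measured bitstring such as '10110001' to a weight in [low, high].

    Hypothetical linear mapping: the bitstring is read as an unsigned
    integer k, so all zeros -> low and all ones -> high.
    """
    k = int(bits, 2)
    k_max = 2 ** len(bits) - 1      # 255 for 8 qubits
    return low + (high - low) * k / k_max

total_samples = 100 * 1024          # circuits x shots per circuit
print(total_samples)                # 102400
print(bitstring_to_weight("00000000"), bitstring_to_weight("11111111"))  # 0.5 1.5
```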
| Metric | Result |
|---|---|
| #1 rank hold (Spencer Cox, UT) | 100.0% of 4,096 permutations |
| Top 5 stable | 47.0% held exact position |
| Top 10 stable | 6.8% held exact position |
| Bottom 5 stable | 81.8% held exact position |
The high stability at the top and bottom indicates that the best and worst governors are clearly differentiated. The middle of the pack shows more movement under perturbation, which is expected — governors with similar scores can swap positions when weights change slightly.
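One reading of the "held exact position" figures is the share of permutations in which every governor in the group sat at exactly its baseline rank at the same time. That reading fits the table, since under it the top-10 figure can never exceed the top-5 figure, but it is an assumption. A minimal sketch of that calculation, with a hypothetical function name and toy rankings:

```python
def exact_hold_rate(baseline, permuted_rankings, positions):
    """Share of permutations in which every governor at the given baseline
    positions (0-based) occupies exactly the same position."""
    held = sum(
        1 for ranking in permuted_rankings
        if all(ranking[i] == baseline[i] for i in positions)
    )
    return held / len(permuted_rankings)

# Hypothetical toy example: 4 governors, 4 perturbed rankings.
baseline = ["A", "B", "C", "D"]
perms = [["A", "B", "C", "D"], ["A", "C", "B", "D"],
         ["B", "A", "C", "D"], ["A", "B", "C", "D"]]
print(exact_hold_rate(baseline, perms, range(1)))  # top 1: 0.75
print(exact_hold_rate(baseline, perms, range(2)))  # top 2: 0.5
print(exact_hold_rate(baseline, perms, range(len(baseline) - 1, len(baseline))))  # bottom 1: 1.0
```

For 50 governors, `range(5)` would give the top-5 figure and `range(45, 50)` the bottom-5 figure.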
The 263-item governor evaluation was produced under the editorial authority of Timothy E. Parker, Senior Editor of the Governor Evaluation Project at Parker Intel.
As Senior Editor, Parker directed:
Parker Intel (parkerintel.com) is a cognitive science research institution founded in 1996. Over 30 years, ALA has developed 12 proprietary assessment systems, integrated over 1,000 peer-reviewed research papers, and administered assessments 180 million+ times. The governor evaluation applies ALA's assessment methodology — item construction, rubric design, evidence-based scoring, and statistical validation — to executive governance performance.
This section provides direct responses to anticipated challenges from reporters, detractors, and governors' offices.
Response: Every score is published with three components: the numerical score (0–3), the evidence supporting it, and the primary government data source where that evidence can be independently verified. Identify the specific item number, explain which part of the evidence is incorrect, and provide the primary source document that contradicts it. We will review any factual correction supported by primary sources and publish an updated score if warranted.
Response: The tier weights were set before any governor was scored. They reflect a single principle: consequences to citizens drive weight. Preventable deaths and corruption (Tier 1, 40%) affect citizens more severely than economic statistics (Tier 4, 10%). This principle applies identically to both parties. If Tier 1 items disproportionately affect governors of one party, that reflects the governance record of those governors, not a bias in the weighting system. The weights are published in advance and apply uniformly.
Response: The scoring rubric distinguishes between direct involvement (score of 0) and proximity without personal charges (score of 1 or 2). A governor who signed a bill produced by a $60M bribery scheme and received $5M in campaign contributions from the corrupt entity receives a 1 — not a 0. The rubric does not treat proximity the same as participation. However, a governor is the chief executive of the state. If a corruption scheme of that magnitude operates in the state capital under their authority, the failure to detect or prevent it is a documented governance failure, even without personal criminal liability. The scoring reflects the failure of oversight, not a presumption of guilt.
Response: Governors do not control their state's economy. Texas has oil. California has Silicon Valley. North Dakota has the Bakken formation. New York has Wall Street. These industry concentrations predate every sitting governor by decades. When oil prices rise, Texas GDP grows regardless of who is governor. When tech booms, California benefits regardless. National economic cycles, Federal Reserve interest rate policy, and global trade conditions affect all 50 states simultaneously. A governor who takes office during a national expansion will have better economic numbers than a governor who takes office during a recession, regardless of competence. The 10% weight respects that economic conditions matter to citizens while acknowledging the limited causal role of the governor. If economic performance counted for 30%+, the evaluation would effectively measure national economic cycles and regional industry concentration rather than governance quality.
Response: Correct, and we address this directly on the evaluation website and in this methodology document. IBM Quantum provided computational hardware through its publicly available cloud platform. IBM did not review, endorse, or validate the methodology or results. “Quantum Verified” refers specifically to the perturbation analysis using quantum-generated random numbers for auditability. We state explicitly that classical pseudorandom generators would produce statistically equivalent results and that quantum hardware was chosen for its audit trail, not mathematical necessity. This disclosure appears on the public website.
Response: The rubric decides, not a person. Each item has a pre-defined rubric with specific thresholds (see Part III). Binary items are anchored to court records: the filing exists or it doesn't. Data-benchmarked items are anchored to national quartile rankings: the BLS data puts the state in a quartile, and the quartile maps to a score. Severity-scaled items are anchored to objective magnitude thresholds: casualty counts, federal intervention triggers, consent decree filings. The rubrics were defined before scoring began and apply identically to all 50 governors. The Senior Editor's role is to ensure rubric compliance (that each score matches its rubric), not to exercise personal judgment about what a score should be.
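The quartile step for data-benchmarked items can be sketched as a pure function of the data, which is what makes it reproducible. This sketch assumes the top national quartile maps to 3 and the bottom to 0 on the 0–3 scale, counts a state's rank as the number of states strictly better than it (so tied states receive the same score), and uses hypothetical state labels and values. The published rubrics may handle boundaries and ties differently.

```python
def quartile_score(state, values, higher_is_better=True):
    """Score a state 0-3 from its national quartile on one metric.

    values: dict mapping state -> metric value (e.g. a BLS series).
    Hypothetical mapping: top quartile -> 3, bottom quartile -> 0.
    """
    mine = values[state]
    if higher_is_better:
        better = sum(1 for v in values.values() if v > mine)
    else:
        better = sum(1 for v in values.values() if v < mine)
    quartile = better * 4 // len(values)   # 0 = top ... 3 = bottom
    return 3 - quartile

# Hypothetical unemployment rates (%) for eight states; lower is better.
rates = {"S1": 2.1, "S2": 2.8, "S3": 3.0, "S4": 3.4,
         "S5": 3.9, "S6": 4.2, "S7": 4.8, "S8": 5.5}
print([quartile_score(s, rates, higher_is_better=False) for s in rates])
# [3, 3, 2, 2, 1, 1, 0, 0]
```

With all 50 states, this integer division places 13, 12, 13, and 12 states in the four quartiles.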
Response: Section C evaluates governor actions against specific enumerated rights in the Declaration of Independence and the Bill of Rights. These are not abstract principles — they are text. When a governor signs legislation that is subsequently struck down by a federal court as violating a specific constitutional amendment, that is a documented oath fidelity failure, not an interpretation. When a governor's state child welfare agency is placed under federal consent decree for constitutional violations, that is a court-confirmed failure to protect rights. Section C scores are anchored to federal court decisions, consent decrees, and the text of specific constitutional provisions. The constitutional text and the court opinions are publicly available for independent verification.
Response: This evaluation does not ask anyone to accept our authority. It asks anyone to check our sources. Every score includes the data source. Every data source is a public government record. The evaluation is designed to be verified, not believed. The methodology is published. The rubrics are published. The scores, evidence, and sources for all 50 governors across all 263 metrics are published. If a score is wrong, identify the item, cite the primary source that contradicts it, and we will review it publicly. Credibility comes from transparency and verifiability, not credentials.
The 263 metrics span the following dimensions. Full item-level detail (metric name, score, evidence, source) is published for each governor individually at parkerintel.com/governors/.
| Dimension | Items | Max | Tier |
|---|---|---|---|
| A1: Budget Execution | 15 | 45 | Tiers 2–3 |
| A2: Legislative Relations | 15 | 45 | Tier 3 |
| A3: Appointments | 10 | 30 | Tier 3 |
| A4: Emergency Management | 12 | 36 | Tiers 1–2 |
| A5: Transparency | 13 | 39 | Tiers 2–3 |
| A6: Ethics | 13 | 39 | Tier 1 |
| A7: Program Management | 10 | 30 | Tiers 2–3 |
| A8: Federal Relations | 6 | 18 | Tier 3 |
| A9: Constituent Service | 6 | 18 | Tier 4 |
| Dimension | Max | Tier |
|---|---|---|
| B01: Economic Performance | 75 | Tier 4 |
| B02: Population & Migration | 75 | Tier 5 |
| B03: Budget & Fiscal Health | 75 | Tier 2 |
| B04: Public Safety | 75 | Tier 3 |
| B05: Education | 75 | Tier 3 |
| B06: Healthcare | 75 | Tier 3 |
| B07: Infrastructure | 75 | Tier 3 |
| B08: Cost of Living | 75 | Tier 4 |
| B09: Government Transparency | 75 | Tier 3 |
| B10: Controversy & Scandal | 75 | Tier 1 |
| B11: Historical Legacy | 75 | Tier 5 |
| B12: Constituent Verdict | 75 | Tier 5 |
| B13: Immigration & Law Compliance | 75 | Tier 2 |
| Category | Metrics | Range | Tier |
|---|---|---|---|
| C1: Protection of Life | 31 | −93 to +93 | Tier 1 |
| C2: Constitutional Rights | 29 | −87 to +87 | Tier 1 |
| C3: Child Welfare & Parental Rights | 25 | −75 to +75 | Tier 2 |
| C4: Faithful Discharge | 41 | −123 to +123 | Tier 1 |
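The three tables can be checked against the headline totals with simple arithmetic: Section A items score 0–3, so each dimension's maximum is items × 3; Section B has 13 categories at 75 points each; Section C metrics score −3 to +3, so each category spans ± (metrics × 3). The sketch below recomputes those figures from the table values. The variable names are illustrative only.

```python
section_a = {  # dimension: (items, max points), from the Section A table
    "A1": (15, 45), "A2": (15, 45), "A3": (10, 30), "A4": (12, 36),
    "A5": (13, 39), "A6": (13, 39), "A7": (10, 30), "A8": (6, 18), "A9": (6, 18),
}
section_b_max = [75] * 13           # B01-B13, 75 points each
section_c = {"C1": 31, "C2": 29, "C3": 25, "C4": 41}   # metrics per category

# Section A: 0-3 per item, so each dimension's max is items x 3.
assert all(items * 3 == mx for items, mx in section_a.values())
a_items = sum(items for items, _ in section_a.values())
a_max = sum(mx for _, mx in section_a.values())

b_max = sum(section_b_max)

# Section C: -3 to +3 per metric, so each category spans +/- (metrics x 3).
c_metrics = sum(section_c.values())
c_max = 3 * c_metrics

n_dimensions = len(section_a) + len(section_b_max) + len(section_c)
print(a_items, a_max, b_max, c_metrics, c_max, a_max + b_max + c_max, n_dimensions)
# 100 300 975 126 378 1653 26
```

The 26 dimensions counted here match the "Dimensions Perturbed" figure in the quantum verification parameters.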
All data sources used in this evaluation are publicly accessible. No proprietary or restricted-access datasets were used.
| Agency | URL | Data Used |
|---|---|---|
| Bureau of Labor Statistics | bls.gov | Employment, unemployment, wages, CPI |
| Bureau of Economic Analysis | bea.gov | State GDP, personal income, regional price parities |
| Census Bureau | census.gov | Population, migration, ACS demographics |
| CDC / NCHS | cdc.gov/nchs & wonder.cdc.gov | Mortality, maternal health, drug overdose, vital statistics |
| FBI | ucr.fbi.gov | Uniform Crime Reports, NIBRS crime data |
| NCES | nces.ed.gov | NAEP scores, IPEDS higher education data |
| FHWA | fhwa.dot.gov | National Bridge Inventory, highway statistics |
| EPA | epa.gov | SDWIS drinking water, air quality, enforcement actions |
| CMS | cms.gov | Medicaid enrollment, hospital quality, state health expenditures |
| DOJ | justice.gov | Federal prosecutions, civil rights enforcement, consent decrees |
| IRS | irs.gov/statistics | SOI migration data (county-to-county tax return flows) |
| Source | URL | Data Used |
|---|---|---|
| PACER | pacer.uscourts.gov | Federal case filings, consent decrees, civil rights litigation |
| Federal Register | federalregister.gov | Executive orders, regulatory actions affecting states |
| Supreme Court opinions | supremecourt.gov | Constitutional rulings affecting state governance |
| Source Type | Data Used |
|---|---|
| State Auditor / Comptroller | CAFR/ACFR, audit findings, material weaknesses |
| State Ethics Commission | Complaints, investigations, financial disclosure records |
| Secretary of State | Campaign finance filings, lobbying disclosures |
| State Treasurer | Debt reports, pension funding, rainy day fund balances |
| Governor's Office | Budget proposals, executive orders, press releases |
| State DOE | School report cards, graduation rates, teacher data |
| State Vital Records | Birth/death certificates, health statistics |
| Source | Data Used |
|---|---|
| NAEP (congressionally mandated) | Standardized 4th/8th grade reading and math scores |
| Moody's / S&P / Fitch | State credit ratings and outlook |
| ASCE | Infrastructure Report Card grades by state |
A methodology that cannot be corrected is not rigorous — it is dogma. This section codifies how errors are identified, verified, corrected, and documented.
All versions of the evaluation are timestamped and archived. When corrections are made:
Why this matters: Any evaluation of this scale will contain errors. Individual item scores may rely on data that is subsequently revised, court decisions that are overturned, or evidence that was incomplete at the time of scoring. The integrity of the evaluation is not measured by whether it is perfect on publication day. It is measured by whether errors are corrected transparently when identified. A project that publishes corrections is more credible than one that claims it never needs them.
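A correction only holds up if the record of it carries the same elements the challenge process asks for: the item number, the primary source, and a timestamp. The sketch below is a hypothetical record format, not the project's actual archive schema. It rejects scores outside the published scales (0–3 for Sections A and B, −3 to +3 for Section C) and refuses corrections without a cited source. All field names are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed score ranges by section, from the published scales.
SCORE_RANGES = {"A": (0, 3), "B": (0, 3), "C": (-3, 3)}

@dataclass(frozen=True)
class CorrectionRecord:
    """Hypothetical record for one timestamped score revision."""
    governor: str
    item_number: str          # item number as published
    section: str              # "A", "B", or "C"
    old_score: int
    new_score: int
    primary_source: str       # citation for the contradicting government record
    note: str
    revised_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        low, high = SCORE_RANGES[self.section]
        for score in (self.old_score, self.new_score):
            if not low <= score <= high:
                raise ValueError(
                    f"score {score} outside {low}..{high} for section {self.section}")
        if not self.primary_source:
            raise ValueError("a correction must cite a primary source")
```

Making the record frozen means a revision is never edited in place; a later change would be a new record with its own timestamp, which matches the practice of archiving every version.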
This evaluation system, its methodology, scored data, and all associated content are the intellectual property of Parker Intel. Any use of ALA governor rankings, scores, evidence, or methodology — whether in news coverage, academic work, social media, political commentary, or commercial products — requires proper attribution.
The governor evaluation is one of twelve proprietary assessment systems published by Parker Intel. Our cognitive, relationship, and longevity assessments use the same evidence-based methodology — visit realworldiq.com, reliqtest.com, and realbioage.com.