Methodology Document

The ALA 263-Metric Governor Evaluation System

Scoring Framework, Tier Weighting, Data Sources & IBM Quantum Stability Verification

Senior Editor: Timothy E. Parker
Guinness World Records — Most Syndicated Puzzle Compiler

Institution: Parker Intel
Publication Date: May 18, 2026
Version: 1.2

Quantum Verification: IBM Quantum · ibm_fez · Job ID: d6uvlc2f84ks73deoqp0
Governors Evaluated: 50 · Metrics: 263 · Max Points: 1,653
Week of May 18, 2026
Rankings are updated weekly. Each update is independently scored, timestamped, and archived.

Press & Media Usage

This methodology document and the associated Governor Rankings are published by Parker Intel at parkerintel.com.

Journalists, researchers, and media organizations may reference and quote this document with attribution. Full reproduction requires written permission. All data, scores, and rankings must be attributed to: "ALA 263-Metric Governor Evaluation, Parker Intel (parkerintel.com)"

Rankings are updated weekly. Always verify you are citing the current week's rankings at parkerintel.com/governors.

Ground Rules for This Document

This project has no interest in protecting reputations or careers. Its only obligation is to citizens and the factual public record.

The methodology, scoring rubrics, tier weights, and every individual governor score are published at parkerintel.com/governors. Nothing is hidden. Every item includes the score, the evidence, and the primary government data source.

If you disagree with a score: Name the item number. Cite a primary government source that contradicts the evidence we published. We will review any factual correction supported by primary data and publish an updated score with a timestamped revision note if warranted. Personal attacks, political affiliations, and appeals to authority are irrelevant. Only evidence matters.

If you cannot name the item and cite the source, you do not have a disagreement. You have an opinion.

Executive Summary

The ALA 263-Metric Governor Evaluation scores all 50 sitting U.S. governors across 263 individually documented items with a maximum of 1,653 points. Every score is anchored to primary government data — BLS, Census, FBI, CDC, NAEP, FHWA, EPA, CMS, PACER, state auditors — not opinion polls, think tank indices, or editorial scorecards. The methodology was locked before any governor was scored. We did not know who would rank where when we set the weights.

What We Measure

Section A — Governance (100 items, max 300 pts): The governor's own executive actions — budget execution, appointments, emergency management, ethics, transparency.
Section B — State Outcomes (13 categories, max 975 pts): Measurable results using primary source data — economics, crime, education, healthcare, infrastructure, fiscal health, immigration.
Section C — Oath Fidelity (4 categories / 126 metrics, −378 to +378): Fidelity to the U.S. Constitution, measured against court-confirmed actions and federal consent decrees, not political opinion.

How Items Are Scored

Sections A & B use a 4-point scale (0–3). Section C uses a 7-point scale (−3 to +3), where negative scores indicate oath violations and positive scores indicate constitutional fidelity. Each item falls into one of three scoring categories:
Binary — Court records: it happened or it didn't. (Criminal charges, ethics violations, consent decrees.)
Data-Benchmarked — National quartile ranking: top 12 states = 3, above median = 2, below median = 1, bottom 12 = 0. Trajectory during tenure matters.
Severity-Scaled — How bad was the failure: no incidents = 3, minor = 2, significant = 1, catastrophic (50+ casualties / federal takeover) = 0.

Tier Weighting — Consequences Drive Weight

Tier 1 — Irreversible Harm (40%): Preventable deaths, corruption, court-confirmed constitutional violations
Tier 2 — Severe Harm (25%): Child welfare failures, fiscal destruction, environmental health damage
Tier 3 — Core Governance (20%): Infrastructure, public safety, education, healthcare
Tiers 4–5 — Context (15%): Economic performance (10%), migration & approval (5%)

A single Tier 1 item (preventable deaths, corruption) carries approximately 15× the weight of a single Tier 4 item (economic statistic). A governor cannot score above 60/100 if they completely fail on accountability, regardless of economic performance.

IBM Quantum Stability Verification

After scoring, ranking stability was tested across 4,096 permutations using random numbers generated on IBM Quantum hardware (ibm_fez, Job ID: d6uvlc2f84ks73deoqp0). Two tests were performed: Test A (Weight Sensitivity) perturbed all 26 dimension weights ±50%; Test B (Measurement Noise) perturbed section scores ±5%. The use of quantum hardware is an audit-trail decision: any competent statistician could replicate the same sensitivity analysis with classical random numbers. IBM's quantum hardware provides a publicly verifiable, tamper-proof record that the random weights were not cherry-picked. IBM did not review, endorse, or validate the methodology or results.
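As a rough illustration of Test A, the sketch below perturbs 26 dimension weights and counts how often the top-ranked governor keeps first place. It is not the published procedure. Several details are assumptions made only for this example: the perturbation is drawn uniformly within ±50%, weights are renormalized after perturbation, stability is measured as retention of rank 1, and Python's seeded classical generator stands in for the quantum random source. The three governors and their scores are placeholder data.

```python
import random

N_TRIALS = 4096        # number of perturbed weightings, as stated in the text
N_DIMENSIONS = 26      # 9 governance + 13 outcomes + 4 oath fidelity
MAX_SHIFT = 0.50       # Test A: each weight moves by at most +/-50%


def perturb_weights(weights, rng, max_shift=MAX_SHIFT):
    """Scale each weight by a random factor in [1 - max_shift, 1 + max_shift]
    and renormalize so the weights sum to 1 (renormalizing is an assumption)."""
    scaled = [w * (1.0 + rng.uniform(-max_shift, max_shift)) for w in weights]
    total = sum(scaled)
    return [w / total for w in scaled]


def rank_order(scores_by_governor, weights):
    """Return governor names sorted from highest to lowest weighted score."""
    totals = {
        name: sum(w * s for w, s in zip(weights, dims))
        for name, dims in scores_by_governor.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


def top_rank_stability(scores_by_governor, weights, n_trials=N_TRIALS, seed=0):
    """Fraction of trials in which the baseline #1 governor stays #1."""
    rng = random.Random(seed)  # classical stand-in for the quantum random source
    baseline_top = rank_order(scores_by_governor, weights)[0]
    kept = sum(
        rank_order(scores_by_governor, perturb_weights(weights, rng))[0] == baseline_top
        for _ in range(n_trials)
    )
    return kept / n_trials


# Hypothetical data: three placeholder governors, each with 26 dimension
# scores expressed as percentages. These are NOT real evaluation results.
data_rng = random.Random(42)
example_scores = {
    name: [data_rng.uniform(40, 90) for _ in range(N_DIMENSIONS)]
    for name in ("Governor X", "Governor Y", "Governor Z")
}
equal_weights = [1.0 / N_DIMENSIONS] * N_DIMENSIONS

stability = top_rank_stability(example_scores, equal_weights)
print(f"Baseline leader kept rank 1 in {stability:.1%} of {N_TRIALS} trials")
```

A value near 100% would suggest the top ranking does not depend on the exact weights chosen; a fuller analysis would track every governor's rank, not only first place.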

Senior Editor

Timothy E. Parker — Guinness World Records Most Syndicated Puzzle Compiler — served as Senior Editor, directing framework design, scoring consistency, source verification, and cross-governor rubric compliance across all 263 metrics.

Table of Contents

Part I — Purpose, Principles & Value Framework
Part II — Evaluation Architecture (Sections A, B, C)
Part III — The Three Scoring Categories
  • Category 1: Binary Items
  • Category 2: Data-Benchmarked Items
  • Category 3: Severity-Scaled Items
Part IV — Tier Weighting System
  • Why These Weights
  • Scoring Mathematics
  • Worked Examples
Part V — Data Sources & Verification Standards
Part VI — IBM Quantum Stability Verification
  • What It Does
  • What It Does Not Do
  • Why Quantum Hardware Was Chosen
  • Technical Specifications
Part VII — Editorial Standards & Senior Editor
Part VIII — Responding to Challenges
Part IX — Complete Item Inventory
Part X — Data Source Directory
Part XI — Error Correction & Revision Protocol

Part I — Purpose, Principles & Value Framework

This is the most detailed governor evaluation ever published: 50 sitting governors, 263 individually scored metrics, 1,653 maximum points, 26 scoring dimensions, with ranking stability verified on IBM Quantum hardware.

The Consequence-First, Rights-Based Framework

This evaluation operates on an explicit value framework that we name and defend rather than disguise as neutral objectivity:

Consequence-First: The severity of harm to citizens determines how much an item weighs. Preventable deaths count more than GDP growth. Corruption counts more than job statistics. This is not an ideological position — it is a statement about what is irreversible. A dead citizen cannot be compensated by a percentage point of economic growth. A bribed legislature cannot produce trustworthy policy. Consequences that cannot be undone must outweigh consequences that can.

Rights-Based: Every governor swears an oath to uphold the Constitution. Section C measures whether they kept it. This is not our standard — it is theirs. The text of the Constitution and the text of their oath are public documents. Court decisions interpreting both are public records. We measure governors against the commitment they made voluntarily, under oath, on the day they took office.

Critics who disagree with this framework are not disagreeing with us. They are arguing that preventable deaths should matter less, that corruption should be discounted, or that oath-breaking should be ignored. They are welcome to make that argument publicly.

Nonpartisan by Construction

This methodology was finalized and locked before any governor was scored. We did not know who would rank where when we set the weights, defined the rubrics, or assigned items to tiers. Republican and Democratic governors are evaluated using identical items, identical rubrics, identical data sources, and identical weights. If Tier 1 items disproportionately affect governors of one party, that reflects the governance record of those governors, not a bias in the system. The methodology is published in full so that this claim can be independently verified.

Why This Evaluation Exists

Existing governor rankings rely on third-party indices, opinion surveys, or narrow ideological scorecards. They tell you whether a governor is popular. They do not tell you whether a governor is effective, honest, or faithful to their oath of office.

This evaluation measures what matters. Every score is anchored to primary government data — Bureau of Labor Statistics employment figures, Census population flows, FBI Uniform Crime Reports, CDC mortality data, NAEP education scores, state auditor findings, federal court records — not opinion polls, editorial endorsements, or partisan scorecards.

Foundational Principles

  1. No third-party governor scorecards imported. We do not import rankings from Cato, Heritage, Brookings, or any ideological index. We use primary government datasets wherever available. Non-government inputs (e.g., credit ratings, approval polling, ASCE report cards) are clearly labeled as secondary context and are not the basis of the core score.
  2. Nonpartisan by construction. Republican and Democratic governors are evaluated using identical items, identical rubrics, and identical weights. The methodology was finalized before any governor was scored.
  3. Every score is traceable. Each of the 263 metrics includes the score (0–3, or −3 to +3 in Section C), the evidence supporting that score, and the primary source where that evidence can be independently verified.
  4. Consequences drive weight. Items involving irreversible harm to citizens (preventable deaths, corruption, constitutional violations) carry more weight than items involving economic statistics or popularity metrics. This is by design, not by accident.
  5. Transparency is mandatory. This methodology document, the tier weighting system, the IBM Quantum job record, and every individual governor score are published. Nothing is hidden.

Part II — Evaluation Architecture

The 263 metrics are organized into three sections reflecting the three dimensions of executive governance:

Section | Focus | Items | Max Points | Scoring
Section A: Governance | The governor's own executive actions | 100 | 300 | Each item 0–3
Section B: State Outcomes | Measurable results using primary source data | 13 categories | 975 | Each category 0–75
Section C: Oath Fidelity | Fidelity to Declaration of Independence & Bill of Rights | 4 categories (126 metrics) | −378 to +378 | Each metric −3 to +3

Section A: Governance (100 items, max 300 points)

Section A evaluates what the governor personally did or failed to do. These are executive actions within the governor's direct authority. A governor cannot blame the legislature, the federal government, or market forces for Section A scores — these items measure their decisions.

Subsection | Items | Max | What It Measures
A1: Budget Execution | 15 | 45 | On-time submission, forecast accuracy, rainy day fund, credit ratings, pension funding, debt management, CAFR timeliness, audit findings, federal grant accounting
A2: Legislative Relations | 15 | 45 | Bill signing record, veto strategy, override rate, bipartisan legislation, special sessions effectiveness, legislative relationship quality
A3: Appointments | 10 | 30 | Judicial appointment quality, agency head qualifications, vacancy rates, diversity, confirmation success rate
A4: Emergency Management | 12 | 36 | Disaster response timeliness, National Guard deployment, FEMA coordination, preventable deaths from state failure, infrastructure failure prevention, pandemic response
A5: Transparency | 13 | 39 | FOIA compliance, schedule availability, campaign finance, financial disclosure, open meetings, open data, budget transparency, lobbying disclosure, IG reports, press accessibility
A6: Ethics | 13 | 39 | Criminal charges, ethics complaints, gift disclosure, conflicts of interest, state resource misuse, truthfulness, ethics infrastructure, emoluments, donor-to-contract pipeline, foreign influence
A7: Program Management | 10 | 30 | Healthcare program delivery, education initiative results, environmental programs, corrections system, transportation projects
A8: Federal Relations | 6 | 18 | Federal fund capture rate, grant competitiveness, regulatory relationship, federal litigation costs
A9: Constituent Service | 6 | 18 | Constituent response systems, town halls, accessibility, complaint resolution

Section B: State Outcomes (13 categories, max 975 points)

Section B measures what actually happened in the state. Unlike Section A, governors have partial — not total — control over these outcomes. State economies depend on national trends. Crime rates depend on demographic shifts. Education depends on decades of prior investment. Section B acknowledges this by measuring trajectory during tenure rather than absolute position. A governor who inherits a struggling state and improves outcomes receives credit. A governor who inherits a thriving state and allows decline does not.

Category | Max | Primary Data Sources
B01: Economic Performance | 75 | BEA SAGDP, BLS LAUS, BLS CES, Census ACS
B02: Population & Migration | 75 | Census Population Estimates, ACS Migration Flows, IRS SOI Migration Data
B03: Budget & Fiscal Health | 75 | State CAFR/ACFR, Moody's/S&P/Fitch, NASBO State Expenditure Reports
B04: Public Safety | 75 | FBI UCR/NIBRS, CDC WONDER Mortality, BJS NPS, state crime lab reports
B05: Education | 75 | NAEP, NCES IPEDS, state DOE report cards, NCHEMS graduation data
B06: Healthcare | 75 | CDC WONDER, CMS, Census ACS (uninsured), state vital statistics
B07: Infrastructure | 75 | FHWA National Bridge Inventory, ASCE Report Card, EPA SDWIS, DOT FARS
B08: Cost of Living | 75 | BLS CPI, BEA RPP, Census ACS (housing burden), HUD FMR
B09: Government Transparency | 75 | State FOI logs, POGO, RTI international studies, state AG records
B10: Controversy & Scandal | 75 | DOJ filings, federal court records, PACER, state ethics commission records
B11: Historical Legacy | 75 | Academic assessments, institutional policy trajectory, long-term investment analysis
B12: Constituent Verdict | 75 | Approval polling (aggregated), voter participation rates, ballot measure outcomes
B13: Immigration & Law Compliance | 75 | DHS enforcement data, DOJ immigration court records, state cooperation agreements, sanctuary policy documentation

Section C: Oath Fidelity (4 categories / 126 metrics, −378 to +378)

Section C is unique. It evaluates whether the governor has been faithful to the oath of office every governor voluntarily swears — to uphold the U.S. Constitution, particularly the rights enumerated in the Declaration of Independence and the Bill of Rights. This is not our standard — it is the governor's own sworn commitment, made publicly, on the day they took office.

Section C scores are anchored to court-confirmed constitutional violations under the governor's administration — federal consent decrees, judicial orders striking down state actions as unconstitutional, and documented failures of constitutional obligations confirmed by courts of law. This section does not score political disagreements, policy preferences, or editorial opinions about what the Constitution should mean. It scores what courts have ruled it does mean.

Section C scores can be negative. A governor whose administration has court-confirmed constitutional violations receives a negative score that subtracts from their total. This is the only section where a governor's score can reduce their overall evaluation below the sum of Sections A and B.

Category | Metrics | Range | Constitutional Authority
C1: Protection of Life | 31 | −93 to +93 | Declaration of Independence: “Life, Liberty, and the pursuit of Happiness”
C2: Constitutional Rights | 29 | −87 to +87 | Bill of Rights (1st, 2nd, 4th, 5th Amendments) and the 14th Amendment
C3: Child Welfare & Parental Rights | 25 | −75 to +75 | 9th & 10th Amendments, parens patriae obligations
C4: Faithful Discharge of Duties | 41 | −123 to +123 | State constitutional oath of office provisions
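The section ranges above imply a simple raw-point total in which a negative Section C score pulls the total below the sum of Sections A and B. The following is a minimal sketch of that arithmetic only; the function name and the example governor's scores are hypothetical, and the sketch does not include the separately documented oath breach penalties.

```python
SECTION_RANGES = {
    "A": (0, 300),      # 100 items scored 0-3
    "B": (0, 975),      # 13 categories scored 0-75
    "C": (-378, 378),   # 126 metrics scored -3 to +3
}


def raw_total(section_scores):
    """Sum the three section scores after checking each against its range."""
    for section, (low, high) in SECTION_RANGES.items():
        score = section_scores[section]
        if not low <= score <= high:
            raise ValueError(f"Section {section} score {score} outside {low}..{high}")
    return sum(section_scores[s] for s in SECTION_RANGES)


max_points = sum(high for _, high in SECTION_RANGES.values())
print(max_points)  # 1653

# Hypothetical governor with a negative Section C score (not real data).
example = {"A": 210, "B": 640, "C": -45}
print(raw_total(example))  # 805, which is below A + B = 850
```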

Oath Breach Penalties

Court-confirmed violations of constitutional rights trigger additional penalties documented separately. Each governor's evaluation includes an oath breach count, the number that are court-confirmed, the penalty applied, and the specific details of each breach.

Part III — The Three Scoring Categories

The 263 metrics span wildly different types of measurement — from binary yes/no questions (was the governor charged with a crime?) to continuous data comparisons (where does the state rank in GDP growth?). A single scoring rubric cannot fairly handle all item types. Instead, each item falls into one of three scoring categories, each with its own 0–3 logic.

Critical design principle: The scoring rubrics were defined before any governor was scored. Rubrics were not reverse-engineered to produce desired outcomes. Every item has a pre-declared definition of what constitutes a 0, 1, 2, or 3.

Category 1: Binary Items

Did this happen or didn't it?

Binary items measure events or conditions that either occurred or did not. They are anchored to court records, DOJ filings, ethics commission findings, and official government records. There is no subjective interpretation — the document either exists or it does not.

Score | Definition | Example (Item #74: Campaign Donor to State Contract Pipeline)
3 | Clean. No incidents, no charges, no violations, no documented concerns. No public record of the event occurring. | No documented cases of campaign donors receiving preferential state contracts. Ethics commission has no related complaints on file.
2 | Proximity concerns. No direct involvement, but documented connection to an incident. The governor was not charged or found in violation, but circumstantial evidence exists. | Governor received campaign contributions from entities later involved in corruption, but was not a target of the investigation. No evidence governor knew of the scheme.
1 | Documented incident, limited scope. A verified event occurred involving the governor's office, staff, or direct political associates. The governor may have cooperated with investigators. | Governor received $5M+ in campaign contributions from FirstEnergy entities, signed the bill later proven to be the product of a $60M bribery scheme, and did not detect or prevent the corruption despite its scale. Not personally charged.
0 | Direct involvement or catastrophic failure. Governor personally charged, indicted, or found in violation. Or: corruption of such scale operated under the governor's authority that failure to detect or prevent it constitutes a fundamental governance failure. | Governor personally indicted or convicted for corruption. Or: governor directed the scheme, received personal financial benefit, and used state resources to facilitate it.

Items scored as binary: Criminal charges (#66), ethics complaints (#67), gift/travel disclosure (#68), conflict of interest (#69), state resource misuse (#70), emoluments/self-dealing (#73), campaign donor pipeline (#74), foreign influence (#75), sexual harassment claims (#76), records preservation (#77), revolving door (#78).

Verification sources for binary items: DOJ press releases and case filings, PACER federal court records, state ethics commission complaint databases, state AG investigation records, campaign finance filings (FEC + state), financial disclosure statements, court opinions and orders.

Category 2: Data-Benchmarked Items

Where does the state rank, and which direction is it moving?

Data-benchmarked items are scored by comparing a state's performance to all other states using published national datasets. The scoring uses quartile ranking to account for the fact that governors inherit different starting positions. A governor in Mississippi and a governor in Massachusetts govern from different baselines — what matters is relative performance and trajectory.

Score | Definition | Example (GDP Growth Rate)
3 | Top quartile nationally (rank 1–12 among states) OR measurably improving trajectory that moved the state up at least one quartile during the governor's tenure. | State GDP growth rate in top 12 nationally during evaluation period. Source: BEA SAGDP tables.
2 | Above median (rank 13–25) OR stable position maintaining existing quartile. | State GDP growth rate ranked 13th–25th nationally. State maintained its economic position without significant gains or losses.
1 | Below median (rank 26–38) OR declining trajectory that dropped the state at least one quartile during the governor's tenure. | State GDP growth rate ranked 26th–38th nationally, or state dropped from above-median to below-median during tenure.
0 | Bottom quartile (rank 39–50) OR significant decline during tenure (two or more quartile drops). | State GDP growth rate ranked in bottom 12 nationally, or state economic performance collapsed relative to peers during tenure.

Why Quartile-Based Scoring?

  1. Controls for inherited conditions. A governor who takes office in a bottom-quartile state and improves it to above-median receives a 3 (trajectory improvement), even though the raw number may still be lower than states with historical advantages.
  2. Eliminates national-trend bias. If the entire national economy grows 3%, a state that grew 3% isn't exceptional — it's median. Quartile ranking measures performance relative to peers in the same national environment.
  3. Verifiable by anyone. National datasets from BLS, Census, FBI, and CDC are publicly available. Anyone can download the data, rank the states, and confirm the quartile.
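The rank bands in the rubric can be checked with a few lines of code. This is a minimal sketch of the rank-to-score mapping only. The trajectory clauses (moving up or down a quartile during tenure) are deliberately not modeled, because the rubric does not state how they combine with the rank bands. The function names and the four-state growth figures are hypothetical, and ties are simply broken by input order.

```python
def quartile_score(rank):
    """Map a national rank (1 = best of 50 states) to the 0-3 rubric score."""
    if not 1 <= rank <= 50:
        raise ValueError("rank must be between 1 and 50")
    if rank <= 12:
        return 3  # top quartile
    if rank <= 25:
        return 2  # above median
    if rank <= 38:
        return 1  # below median
    return 0      # bottom quartile


def rank_states(values, higher_is_better=True):
    """Rank states by a metric; returns {state: rank} with 1 as the best."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return {state: position for position, state in enumerate(ordered, start=1)}


# Hypothetical growth figures for four placeholder states (not real data).
growth = {"State A": 3.1, "State B": 1.4, "State C": 2.2, "State D": 0.3}
ranks = rank_states(growth)
print(ranks["State A"], ranks["State D"])  # 1 4
print([quartile_score(r) for r in (1, 12, 13, 25, 26, 38, 39, 50)])
# [3, 3, 2, 2, 1, 1, 0, 0]
```

For metrics where a lower value is better, such as a crime rate, `higher_is_better=False` reverses the ordering before ranks are assigned.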

Items scored as data-benchmarked: All Section B items across economic performance, population, fiscal health, public safety, education, healthcare, infrastructure, cost of living, and transparency. Also applies to select Section A items where national comparisons exist (e.g., rainy day fund as percentage of expenditure, pension funding ratio).

Primary national datasets used for quartile ranking: see the Complete Data Source Directory in Part V.

Category 3: Severity-Scaled Items

Something went wrong. How wrong?

Severity-scaled items measure the magnitude of a failure. Unlike binary items (did it happen?) or data-benchmarked items (how does the state rank?), severity items measure how bad a documented event was, using objective thresholds tied to body counts, financial losses, and federal intervention triggers.

Score | Definition | Example (Item #44: Preventable Deaths from State Failure)
3 | No incidents. Zero preventable deaths, zero infrastructure failures resulting in casualties, zero events requiring federal emergency intervention due to state negligence. | No mass casualty events attributable to state government failure during the governor's tenure.
2 | Minor incidents. Small-scale event, contained quickly. Fewer than 5 people directly harmed. State response was adequate once the event was identified. | Isolated infrastructure incident (e.g., single bridge failure, localized water contamination) with limited casualties and prompt state response.
1 | Significant failure. 5–50 people affected. State response was delayed, inadequate, or poorly coordinated. Federal agencies intervened due to state incapacity. | East Palestine derailment: hazardous material release, delayed environmental monitoring, uncertain long-term health impacts, primarily federal responsibility (NTSB/FRA) but state inspection resources inadequate.
0 | Catastrophic. 50+ casualties, OR systemic failure requiring federal takeover, OR irreversible environmental damage affecting an entire region. The governor's authority was directly responsible or directly failed to prevent. | ERCOT grid collapse (Texas, Feb 2021): 246+ preventable deaths. 4.5 million without power for days. Governor's appointed regulators failed to mandate winterization despite 2011 warnings. A known risk, a documented recommendation, and a preventable mass casualty event.
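The numeric thresholds in the rubric above can be sketched as a small function. This is an illustration under stated assumptions, not the scoring code used for the rankings: the function and parameter names are hypothetical, exactly 50 people harmed is treated as catastrophic (the rubric lists both "5–50" and "50+"), and qualitative criteria such as response quality or regional environmental damage are not modeled.

```python
def severity_score(people_harmed, federal_takeover=False):
    """Map a documented failure to the 0-3 severity rubric.

    Assumption: a count of exactly 50 falls in the catastrophic band.
    """
    if people_harmed < 0:
        raise ValueError("people_harmed cannot be negative")
    if federal_takeover or people_harmed >= 50:
        return 0  # catastrophic
    if people_harmed >= 5:
        return 1  # significant failure
    if people_harmed >= 1:
        return 2  # minor incident
    return 3      # no incidents


print(severity_score(0))    # 3
print(severity_score(4))    # 2
print(severity_score(12))   # 1
print(severity_score(246))  # 0
```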

Items scored as severity-scaled: Preventable deaths from state failure (#44), National Guard deployment timing (#43), infrastructure failure prevention (#47), pandemic response effectiveness (#48–51), environmental contamination events, child fatalities in state custody, prison deaths from negligence, emergency response coordination failures.

Verification sources for severity items: NTSB accident reports, FEMA after-action reports, CDC WONDER cause-of-death queries, EPA enforcement action records, federal consent decrees, state medical examiner reports, OSHA inspection records, National Guard deployment records.

Part IV — Tier Weighting System

Not all governance failures are equal. A governor who presides over a $60 million bribery scheme and 246 preventable deaths should not be able to outscore a clean governor by posting better job numbers. The tier weighting system ensures that scores reflect the real-world consequences of governance, not just the quantity of items scored.

The Five Tiers

Tier | Weight | What It Contains | Why This Weight
Tier 1 | 40% | Preventable deaths from state failure. Corruption and fraud under the governor's authority. Court-confirmed constitutional violations. Federal consent decrees. Personal criminal charges or staff indictments. | Irreversible harm. You cannot undo death. Corruption undermines every other metric: if the legislature is for sale, how do you trust the education numbers? 40% means a governor cannot score above 60% overall if they completely fail on accountability, regardless of performance elsewhere.
Tier 2 | 25% | Child welfare failures (foster care deaths, DFPS federal oversight). Fiscal destruction (credit downgrades, pension collapse). Environmental health damage with community-level impact. | Severe harm to the most vulnerable. Children in state custody cannot advocate for themselves. Pension collapses affect millions of retirees. Environmental contamination can take decades to remediate. These failures define a governorship but do not necessarily taint every other measurement the way corruption does.
Tier 3 | 20% | Infrastructure reliability. Public safety trajectory. Education outcomes. Healthcare access and quality. | The job description. This is what governors are elected to manage. It affects millions of lives daily. But these are expected competencies: running schools and roads is the baseline, not the ceiling.
Tier 4 | 10% | Economic performance (GDP, jobs, wages). Cost of living trajectory. Business environment. | Real but indirect. Governors influence economic conditions but do not control them. A state's GDP depends heavily on geography, federal policy, industry mix, national cycles, and decades of prior investment. Give it weight, but not more than it deserves.
Tier 5 | 5% | Population migration. Constituent approval. Historical legacy comparisons. | Outputs, not inputs. People voting with their feet and poll numbers are the result of governance, not governance itself. A popular governor is not necessarily a good one. An unpopular one who makes hard, correct decisions is not necessarily bad. Informative as a cross-check, not decisive.

Why These Specific Weights

Why Tier 1 is 40% and not higher

At 50%+, the system would effectively measure only corruption and preventable deaths. Governors who are genuinely clean but terrible at governing — crumbling schools, failing hospitals, no economic plan — would receive high scores. The job of governor is more than “don't kill people and don't steal.” 40% ensures accountability dominates without making it the only thing that matters.

Why Tier 1 is 40% and not lower

At 30% or below, the “scoring your way out” problem returns. A governor with a $60M bribery scheme operating under their authority could offset it with enough good economic statistics and positive migration data. That outcome violates the foundational principle: consequences drive weight. 40% is the tipping point where catastrophic Tier 1 failure cannot be compensated by excellent performance elsewhere.
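The ceiling behind this argument can be written out using the final-score formula given under Scoring Mathematics. The symbol $w_1$ for the Tier 1 weight is notation introduced here for illustration, and the calculation assumes the remaining tier weights always sum to $1 - w_1$. If Tier 1 scores 0% and every other tier scores 100%:

```latex
\[
\text{Final Score}_{\max} = (0 \times w_1) + 100\,(1 - w_1)
\]
\[
w_1 = 0.30 \Rightarrow 70, \qquad
w_1 = 0.40 \Rightarrow 60, \qquad
w_1 = 0.50 \Rightarrow 50
\]
```

At the published weight of 0.40 the best possible score after a complete Tier 1 failure is 60; lowering the weight to 0.30 would raise that ceiling to 70.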

Why Tier 4 (economics) is only 10%

This is where most objections will arise. Critics will say economic performance should count for more. The response is straightforward: governors do not control their state's economy. Texas has oil. California has Silicon Valley. North Dakota has the Bakken formation. New York has Wall Street. These industry concentrations predate every sitting governor by decades. When oil prices rise, Texas GDP grows regardless of who is governor. When tech booms, California grows regardless. Giving economic performance 10% respects that it matters while acknowledging the governor's limited causal role.

Why Tier 5 (approval/migration) is only 5%

If approval ratings drove the score, every governor would optimize for popularity rather than effectiveness. Short-term popular decisions (tax cuts without spending cuts, delaying infrastructure maintenance, avoiding hard policy choices) produce high approval and long-term damage. Migration data is informative — if 500,000 people are leaving a state, something is wrong — but it's a lagging indicator that reflects prior governance, not current performance.

Scoring Mathematics

Final scoring proceeds in three steps:

Step 1: Score each of the 263 metrics using the appropriate scoring category (binary, data-benchmarked, or severity-scaled). Sections A & B items are scored 0–3; Section C items are scored −3 to +3.

Step 2: Within each tier, sum the raw item scores and calculate the tier percentage:
Tier % = (sum of item scores in tier) / (max possible in tier) × 100

Step 3: Multiply each tier percentage by its weight and sum:
Final Score = (Tier 1 % × 0.40) + (Tier 2 % × 0.25) + (Tier 3 % × 0.20) + (Tier 4 % × 0.10) + (Tier 5 % × 0.05)

Result: a score from 0 to 100.
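Steps 2 and 3 can be sketched in a few lines. The function names and data layout below are illustrative choices, not the project's actual code; the tier weights are the published ones, and the raw scores and maxima are those of Example A under Worked Examples.

```python
TIER_WEIGHTS = {1: 0.40, 2: 0.25, 3: 0.20, 4: 0.10, 5: 0.05}


def tier_percent(raw_score, max_score):
    """Step 2: raw tier score as a percentage of the tier maximum."""
    return raw_score / max_score * 100


def final_score(tier_raw):
    """Step 3: weight each tier percentage and sum.

    tier_raw maps tier number -> (raw score, max possible score).
    """
    return sum(
        tier_percent(raw, maximum) * TIER_WEIGHTS[tier]
        for tier, (raw, maximum) in tier_raw.items()
    )


# Raw scores and maxima taken from Example A under Worked Examples.
example_a = {1: (42, 105), 2: (108, 135), 3: (216, 270), 4: (351, 390), 5: (255, 300)}
print(round(final_score(example_a), 2))  # 65.25
```

The unrounded result of 65.25 corresponds to the 65.3 reported for Example A.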

The effective per-item multiplier depends on how many items are in each tier:

Tier | ~Items | Weight | Per-Item Influence | Effective Multiplier vs. Tier 4 Baseline
Tier 1 | ~35 | 40% | 1.14% each | ~15×
Tier 2 | ~45 | 25% | 0.56% each | ~7×
Tier 3 | ~90 | 20% | 0.22% each | ~3×
Tier 4 | ~130 | 10% | 0.077% each | 1× (baseline)
Tier 5 | ~100 | 5% | 0.05% each | ~0.6×
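The per-item figures follow from dividing each tier's weight by its approximate item count. The short sketch below reproduces that arithmetic from the approximate counts in the table; the variable names are illustrative.

```python
TIERS = {
    # tier: (approximate item count, tier weight in percent)
    1: (35, 40),
    2: (45, 25),
    3: (90, 20),
    4: (130, 10),
    5: (100, 5),
}

# Per-item influence is the tier weight spread evenly across its items.
per_item = {tier: weight / items for tier, (items, weight) in TIERS.items()}
baseline = per_item[4]  # Tier 4 is the comparison baseline

for tier, influence in per_item.items():
    print(f"Tier {tier}: {influence:.3f}% per item, {influence / baseline:.2f}x Tier 4")
```

Because the item counts are approximate, the multipliers are too: Tier 1 comes out near 14.9, which the table rounds to about 15.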

A single Tier 1 item is worth approximately 15 times a single Tier 4 item. That ratio sounds aggressive until you state it plainly: “Is one item about 246 people dying from a preventable grid failure worth 15 items about job growth statistics?” The answer is self-evident.

Worked Examples

Example A: Governor with strong economics but corruption

Tier | Raw Score | Max | Tier % | × Weight | Contribution
Tier 1 | 42 | 105 | 40% | × 0.40 | 16.0
Tier 2 | 108 | 135 | 80% | × 0.25 | 20.0
Tier 3 | 216 | 270 | 80% | × 0.20 | 16.0
Tier 4 | 351 | 390 | 90% | × 0.10 | 9.0
Tier 5 | 255 | 300 | 85% | × 0.05 | 4.25
Final Score: 65.3 / 100

Despite scoring 80–90% on Tiers 2–5, the 40% Tier 1 score drags the final to 65. The corruption matters.

Example B: Clean but mediocre governor

Tier | Tier % | × Weight | Contribution
Tier 1 | 85% | × 0.40 | 34.0
Tier 2 | 65% | × 0.25 | 16.25
Tier 3 | 60% | × 0.20 | 12.0
Tier 4 | 55% | × 0.10 | 5.5
Tier 5 | 50% | × 0.05 | 2.5
Final Score: 70.3 / 100

The clean, mediocre governor outscores the corrupt, high-performing one. This is the correct outcome.

Part V — Data Sources & Verification Standards

Every score in this evaluation is traceable to a primary data source. We do not import third-party governor scorecards, editorial indices, or ideological rankings. Where non-government sources are used as inputs (e.g., credit rating agencies, approval polling aggregates, ASCE infrastructure grades), they are clearly identified and serve as secondary context rather than core scoring drivers.

What Qualifies as a Primary Source

  1. Federal statistical agencies: BLS, BEA, Census Bureau, CDC, FBI, NCES, FHWA, EPA, CMS, DOJ.
  2. Federal court records: PACER filings, federal consent decrees, DOJ press releases and case summaries.
  3. State government records: CAFR/ACFR financial reports, state auditor findings, ethics commission records, campaign finance filings, vital statistics registries.
  4. Independent testing programs: NAEP (congressionally mandated, administered by NCES), credit rating agencies (Moody's, S&P, Fitch).

What Does Not Qualify

Why this matters: Third-party rankings embed the ideology of the ranking organization. The Cato Institute and the Sierra Club will rank the same governor differently because they measure different values. This evaluation sidesteps the problem entirely by using only raw data from agencies that serve both parties. BLS unemployment figures do not have a political affiliation. CDC mortality rates do not have an editorial board.

Complete Data Source Directory

| Source | Agency | What It Provides |
| --- | --- | --- |
| SAGDP | Bureau of Economic Analysis | State-level GDP, personal income, industry composition |
| LAUS | Bureau of Labor Statistics | State/metro unemployment rates (monthly) |
| CES | Bureau of Labor Statistics | Nonfarm payroll employment by state (monthly) |
| CPI | Bureau of Labor Statistics | Consumer price inflation by metro area |
| ACS | Census Bureau | Income, poverty, housing costs, insurance coverage, migration |
| Population Estimates | Census Bureau | Annual state population and components of change |
| SOI Migration | Internal Revenue Service | County-to-county migration based on tax returns |
| UCR / NIBRS | Federal Bureau of Investigation | Violent crime, property crime rates by state |
| WONDER | Centers for Disease Control and Prevention | Cause-of-death mortality, maternal mortality, drug overdose deaths |
| NAEP | National Center for Education Statistics | 4th/8th grade reading and math scores by state |
| IPEDS | National Center for Education Statistics | Higher education enrollment, completion, costs |
| NBI | Federal Highway Administration | Bridge structural condition ratings by state |
| FARS | Department of Transportation | Fatal traffic crash data |
| SDWIS | Environmental Protection Agency | Drinking water system violations by state |
| CMS Data | Centers for Medicare & Medicaid Services | Medicaid enrollment, hospital quality, state plan data |
| PACER | Federal Courts | Federal case filings, consent decrees, civil rights cases |
| CAFR/ACFR | State Governments | Comprehensive annual financial reports |
| NASBO | Nat'l Assoc. of State Budget Officers | State expenditure reports, fiscal survey |
| Credit Ratings | Moody's, S&P, Fitch | State general obligation bond ratings |

Part VI — IBM Quantum Stability Verification

After all 50 governors were scored across all 263 metrics, the ranking stability was tested using perturbation analysis powered by random numbers generated on IBM Quantum hardware. This section explains exactly what that process does, what it does not do, and why quantum hardware was chosen.

What It Does

Ranking stability verification through sensitivity analysis.

The evaluation system has 26 scoring dimensions (9 governance + 13 outcomes + 4 oath fidelity). Each dimension contributes to the final score. But reasonable people could disagree about how much each dimension should matter. Should budget execution count more than education outcomes? Should public safety outweigh infrastructure?

The perturbation analysis answers the question: “If we changed how much each dimension matters, would the rankings still hold?”

Two complementary tests are performed:

Test A: Weight Sensitivity

  1. Take the baseline scores for all 50 governors across all 26 dimensions.
  2. Generate a set of random weights for each dimension, varying from 0.5× to 1.5× the baseline weight (±50%).
  3. Recalculate all 50 rankings using the new weights.
  4. Repeat 4,096 times with different random weights each time.
  5. Record how often each governor held their baseline rank (“rank hold percentage”).
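The five steps above can be sketched as follows. This is a minimal illustration, not the production code: the score matrix is made up, Python's seeded `random.Random` stands in for the quantum-generated numbers, and the function names are hypothetical.

```python
import random


def weighted_totals(scores, weights):
    """scores[g][d] is governor g's score on dimension d."""
    return [sum(s * w for s, w in zip(row, weights)) for row in scores]


def rank_order(totals):
    """Governor indices sorted from highest total to lowest."""
    return sorted(range(len(totals)), key=lambda g: -totals[g])


def rank_hold_pct(scores, base_weights, trials=4096, rng=None):
    """Percentage of trials in which each governor keeps the baseline rank."""
    rng = rng or random.Random(0)  # stand-in for quantum-sourced randomness
    baseline = rank_order(weighted_totals(scores, base_weights))
    holds = [0] * len(scores)
    for _ in range(trials):
        # Step 2: scale every baseline weight by a factor in [0.5, 1.5].
        weights = [w * rng.uniform(0.5, 1.5) for w in base_weights]
        # Step 3: recalculate the ranking under the perturbed weights.
        order = rank_order(weighted_totals(scores, weights))
        # Step 5: count governors who sit at their baseline position.
        for position, governor in enumerate(order):
            if baseline[position] == governor:
                holds[governor] += 1
    return [100.0 * h / trials for h in holds]


# Made-up data: 3 governors, 4 dimensions. Governor 0 leads on every dimension.
scores = [[9, 9, 9, 9], [5, 7, 5, 6], [6, 5, 6, 5]]
hold = rank_hold_pct(scores, base_weights=[1.0, 1.0, 1.0, 1.0])
print(hold)
```

In this toy data governor 0 leads on every dimension, so no positive reweighting can displace them, while governors 1 and 2 are close enough to swap in some trials. The sketch skips renormalizing the weights because rescaling all weights by a common factor does not change the order.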

Test B: Measurement Noise

  1. Take each governor’s Section A, B, and C scores.
  2. Apply ±5% random perturbation to each section score to simulate measurement uncertainty.
  3. Recalculate all 50 rankings with the perturbed scores.
  4. Repeat across the same 4,096 permutations.
  5. Record rank stability under score-level noise.
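A comparable sketch for Test B follows, under stated assumptions: the noise is applied as a multiplicative factor in [0.95, 1.05], a governor's total is taken as the plain sum of the three section scores, and seeded pseudorandom numbers stand in for the quantum-generated ones. None of these details is specified above, and the data is made up.

```python
import random


def order_by_total(section_scores):
    """Governor indices sorted by the sum of their section scores, best first."""
    totals = [sum(row) for row in section_scores]
    return sorted(range(len(totals)), key=lambda g: -totals[g])


def noisy_rank_hold(section_scores, trials=4096, noise=0.05, rng=None):
    """Percentage of trials in which each governor keeps the baseline rank."""
    rng = rng or random.Random(0)  # stand-in for quantum-sourced randomness
    baseline = order_by_total(section_scores)
    holds = [0] * len(section_scores)
    for _ in range(trials):
        # Perturb every Section A, B, and C score independently by up to +/-5%.
        perturbed = [
            [score * rng.uniform(1 - noise, 1 + noise) for score in row]
            for row in section_scores
        ]
        for position, governor in enumerate(order_by_total(perturbed)):
            if baseline[position] == governor:
                holds[governor] += 1
    return [100.0 * h / trials for h in holds]


# Made-up (A, B, C) section scores: one clear leader and two near-tied governors.
sections = [(250, 800, 300), (200, 600, 100), (205, 598, 99)]
noise_hold = noisy_rank_hold(sections)
print(noise_hold)
```

The leader's worst case (1,350 × 0.95) still exceeds the best case of either rival (902 × 1.05), so the first position cannot change, while the two near-tied governors trade places frequently.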

Together, Test A asks “what if we weighted categories differently?” and Test B asks “what if the scores had measurement error?”

Result: Spencer Cox (Utah) held the #1 position in 100.0% of 4,096 permutations. His lead is large enough that no reweighting of dimensions within the tested ±50% range changes who finishes first.

What It Does Not Do

The quantum step does not validate the individual scores.

The perturbation analysis tests whether the ranking order is robust to changes in dimension weights and to score-level noise. It does not verify whether the underlying 0–3 item scores are accurate. Score accuracy depends entirely on the evidence, data sources, and editorial judgment documented in each governor's evaluation (see Parts III and V of this document).

If every governor were scored incorrectly but consistently incorrectly, the perturbation analysis would still show “stable” rankings. Stability ≠ accuracy. Stability means the ranking order is not sensitive to weighting assumptions. Accuracy means the scores themselves are correct. The quantum step tests the former, not the latter.
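A small illustration of the point, using made-up totals and a hypothetical systematic error (every score understated by 20%): the biased scores are wrong, yet they produce exactly the same ranking, so any stability test would report the same result for both.

```python
def ranking(totals):
    """Names sorted from highest total to lowest."""
    return sorted(totals, key=totals.get, reverse=True)


# Made-up totals for three hypothetical governors.
true_totals = {"Governor A": 82.0, "Governor B": 67.5, "Governor C": 71.0}

# Hypothetical consistent error: every total is understated by 20%.
biased_totals = {name: 0.8 * total for name, total in true_totals.items()}

same_order = ranking(true_totals) == ranking(biased_totals)
print(ranking(true_totals))
print(same_order)
```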

IBM did not endorse this project.

IBM Quantum provided computational hardware through its publicly available cloud platform. IBM did not review, approve, endorse, or validate the evaluation methodology, the scoring rubrics, or the results. “Quantum Verified” means the perturbation analysis used quantum-generated random numbers — not that IBM certified or approved the project. IBM's involvement was limited to executing the quantum circuits submitted through their standard platform.

Why Quantum Hardware Was Chosen

For this application (generating random weights for a sensitivity analysis), classical random number generators (such as Python's Mersenne Twister or a cryptographically secure generator like /dev/urandom) would produce statistically equivalent results. Quantum randomness matters where physical unpredictability is itself the requirement, as in some cryptographic protocols. It is not mathematically necessary for perturbation analysis.

Quantum hardware was chosen for one reason: auditability.

In short: quantum hardware was chosen for its transparency properties, not its mathematical properties. We disclose this because transparency is not optional in serious research.

Technical Specifications

| Parameter | Value |
| --- | --- |
| Quantum Backend | ibm_fez (IBM Quantum) |
| Job ID | d6uvlc2f84ks73deoqp0 |
| Verification URL | quantum.ibm.com/jobs/d6uvlc2f84ks73deoqp0 |
| Number of Qubits | 8 |
| Gate Configuration | Hadamard (H) on all qubits (true quantum superposition) |
| Number of Circuits | 100 |
| Shots per Circuit | 1,024 |
| Total Quantum Samples | 102,400 |
| Permutations Generated | 4,096 |
| Dimensions Perturbed | 26 |
| Weight Range | 0.5× to 1.5× per dimension (±50%) |
| Governors Ranked | 50 |
| Verification Date | May 18, 2026 |
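Measuring 8 qubits after a Hadamard gate on each yields one 8-bit string per shot. This document does not specify how those bitstrings were converted into weight multipliers; the sketch below shows one plausible linear mapping onto the 0.5× to 1.5× range, purely as an assumption. The function name is hypothetical.

```python
CIRCUITS = 100
SHOTS_PER_CIRCUIT = 1024
TOTAL_SAMPLES = CIRCUITS * SHOTS_PER_CIRCUIT  # 102,400, as in the table


def bitstring_to_multiplier(bits, low=0.5, high=1.5):
    """Map a measured bitstring linearly onto [low, high] (assumed mapping)."""
    value = int(bits, 2)              # e.g. "00001010" -> 10
    max_value = 2 ** len(bits) - 1    # 255 for 8 qubits
    return low + (high - low) * value / max_value


print(TOTAL_SAMPLES)
print(bitstring_to_multiplier("00000000"))  # lower bound of the range
print(bitstring_to_multiplier("11111111"))  # upper bound of the range
```

With 8 bits this mapping yields 256 distinct multiplier values; a finer grid would require combining several samples per weight.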

Key Stability Results

| Metric | Result |
| --- | --- |
| #1 rank hold (Spencer Cox, UT) | 100.0% of 4,096 permutations |
| Top 5 stable | 47.0% held exact position |
| Top 10 stable | 6.8% held exact position |
| Bottom 5 stable | 81.8% held exact position |

The high stability of the #1 position (100.0%) and of the bottom five (81.8%) indicates that the best and worst governors are clearly differentiated. The middle of the pack shows more movement under perturbation, which is expected: governors with similar scores can swap positions when weights change slightly.

Part VII — Editorial Standards & Senior Editor

Senior Editor: Timothy E. Parker

The 263-item governor evaluation was produced under the editorial authority of Timothy E. Parker, Senior Editor of the Governor Evaluation Project at Parker Intel.

Qualifications

Editorial Role

As Senior Editor, Parker directed:

Institution

Parker Intel (parkerintel.com) is a cognitive science research institution founded in 1996. Over 30 years, ALA has developed 12 proprietary assessment systems, integrated over 1,000 peer-reviewed research papers, and administered assessments more than 180 million times. The governor evaluation applies ALA's assessment methodology (item construction, rubric design, evidence-based scoring, and statistical validation) to executive governance performance.

Part VIII — Responding to Challenges

This section provides direct responses to anticipated challenges from reporters, detractors, and governors' offices.

Challenge: “You scored my governor unfairly.”

Response: Every score is published with three components: the numerical score (0–3), the evidence supporting it, and the primary government data source where that evidence can be independently verified. Identify the specific item number, explain which part of the evidence is incorrect, and provide the primary source document that contradicts it. We will review any factual correction supported by primary sources and publish an updated score if warranted.

Challenge: “The tier weights are biased toward/against [party].”

Response: The tier weights were set before any governor was scored. They reflect a single principle: consequences to citizens drive weight. Preventable deaths and corruption (Tier 1, 40%) affect citizens more severely than economic statistics (Tier 4, 10%). This principle applies identically to both parties. If Tier 1 items disproportionately affect governors of one party, that reflects the governance record of those governors, not a bias in the weighting system. The weights are published in advance and apply uniformly.

Challenge: “A governor shouldn't be blamed for corruption they weren't personally charged in.”

Response: The scoring rubric distinguishes between direct involvement (score of 0) and proximity without personal charges (score of 1 or 2). A governor who signed a bill produced by a $60M bribery scheme and received $5M in campaign contributions from the corrupt entity receives a 1 — not a 0. The rubric does not treat proximity the same as participation. However, a governor is the chief executive of the state. If a corruption scheme of that magnitude operates in the state capital under their authority, the failure to detect or prevent it is a documented governance failure, even without personal criminal liability. The scoring reflects the failure of oversight, not a presumption of guilt.

Challenge: “Economic performance should count for more than 10%.”

Response: Governors do not control their state's economy. Texas has oil. California has Silicon Valley. North Dakota has the Bakken formation. New York has Wall Street. These industry concentrations predate every sitting governor by decades. When oil prices rise, Texas GDP grows regardless of who is governor. When tech booms, California benefits regardless. National economic cycles, Federal Reserve interest rate policy, and global trade conditions affect all 50 states simultaneously. A governor who takes office during a national expansion will have better economic numbers than a governor who takes office during a recession, regardless of competence. The 10% weight respects that economic conditions matter to citizens while acknowledging the limited causal role of the governor. If economic performance counted for 30%+, the evaluation would effectively measure national economic cycles and regional industry concentration rather than governance quality.

Challenge: “IBM Quantum Verified implies IBM endorsed this.”

Response: The concern is fair, and we address it directly on the evaluation website and in this methodology document. IBM Quantum provided computational hardware through its publicly available cloud platform. IBM did not review, endorse, or validate the methodology or results. “Quantum Verified” refers specifically to the perturbation analysis using quantum-generated random numbers for auditability. We state explicitly that classical pseudorandom generators would produce statistically equivalent results and that quantum hardware was chosen for its audit trail, not mathematical necessity.

Challenge: “Who decides what a 0 vs. 1 vs. 2 vs. 3 is?”

Response: The rubric does, not a person. Each item has a pre-defined rubric with specific thresholds (see Part III). Binary items are anchored to court records: the filing exists or it doesn't. Data-benchmarked items are anchored to national quartile rankings: the BLS data puts the state in a quartile, and the quartile maps to a score. Severity-scaled items are anchored to objective magnitude thresholds: casualty counts, federal intervention triggers, consent decree filings. The rubrics were defined before scoring began and apply identically to all 50 governors. The Senior Editor's role is to ensure rubric compliance — that each score matches its rubric — not to exercise personal judgment about what a score should be.
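For data-benchmarked items, the quartile-to-score step can be sketched as below. The exact thresholds are defined in Part III; here it is assumed that the top quartile maps to 3 and the bottom quartile to 0, and the state names and rates are made up.

```python
def quartile_score(state, values, higher_is_better=True):
    """Score 0-3 from the state's rank quartile among all listed states."""
    ranked = sorted(values, key=values.get, reverse=higher_is_better)
    position = ranked.index(state)            # 0 = best-performing state
    quartile = position * 4 // len(ranked)    # 0 = top quartile, 3 = bottom
    return 3 - quartile                       # assumed mapping: top -> 3


# Made-up unemployment rates for eight hypothetical states (lower is better).
rates = {
    "S1": 2.1, "S2": 2.8, "S3": 3.0, "S4": 3.4,
    "S5": 3.9, "S6": 4.2, "S7": 4.8, "S8": 5.5,
}

for name in ("S1", "S3", "S5", "S8"):
    print(name, quartile_score(name, rates, higher_is_better=False))
```

Because the score depends only on the state's position in the published data, two scorers applying the same rubric to the same dataset arrive at the same number.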

Challenge: “Section C is subjective — oath fidelity is a matter of interpretation.”

Response: Section C evaluates governor actions against specific enumerated rights in the Declaration of Independence and the Bill of Rights. These are not abstract principles — they are text. When a governor signs legislation that is subsequently struck down by a federal court as violating a specific constitutional amendment, that is a documented oath fidelity failure, not an interpretation. When a governor's state child welfare agency is placed under federal consent decree for constitutional violations, that is a court-confirmed failure to protect rights. Section C scores are anchored to federal court decisions, consent decrees, and the text of specific constitutional provisions. The constitutional text and the court opinions are publicly available for independent verification.

Challenge: “You're not qualified to evaluate governors.”

Response: This evaluation does not ask anyone to accept our authority. It asks anyone to check our sources. Every score includes the data source. Every data source is a public government record. The evaluation is designed to be verified, not believed. The methodology is published. The rubrics are published. The scores, evidence, and sources for all 50 governors across all 263 metrics are published. If a score is wrong, identify the item, cite the primary source that contradicts it, and we will review it publicly. Credibility comes from transparency and verifiability, not credentials.

Part IX — Complete Item Inventory

The 263 metrics span the following dimensions. Full item-level detail (metric name, score, evidence, source) is published for each governor individually at parkerintel.com/governors/.

Section A: Governance (100 items)

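| Dimension | Items | Max | Tier |
| --- | --- | --- | --- |
| A1: Budget Execution | 15 | 45 | Tiers 2–3 |
| A2: Legislative Relations | 15 | 45 | Tier 3 |
| A3: Appointments | 10 | 30 | Tier 3 |
| A4: Emergency Management | 12 | 36 | Tiers 1–2 |
| A5: Transparency | 13 | 39 | Tiers 2–3 |
| A6: Ethics | 13 | 39 | Tier 1 |
| A7: Program Management | 10 | 30 | Tiers 2–3 |
| A8: Federal Relations | 6 | 18 | Tier 3 |
| A9: Constituent Service | 6 | 18 | Tier 4 |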

Section B: State Outcomes (13 categories)

| Dimension | Max | Tier |
| --- | --- | --- |
| B01: Economic Performance | 75 | Tier 4 |
| B02: Population & Migration | 75 | Tier 5 |
| B03: Budget & Fiscal Health | 75 | Tier 2 |
| B04: Public Safety | 75 | Tier 3 |
| B05: Education | 75 | Tier 3 |
| B06: Healthcare | 75 | Tier 3 |
| B07: Infrastructure | 75 | Tier 3 |
| B08: Cost of Living | 75 | Tier 4 |
| B09: Government Transparency | 75 | Tier 3 |
| B10: Controversy & Scandal | 75 | Tier 1 |
| B11: Historical Legacy | 75 | Tier 5 |
| B12: Constituent Verdict | 75 | Tier 5 |
| B13: Immigration & Law Compliance | 75 | Tier 2 |

Section C: Oath Fidelity (4 categories / 126 metrics)

| Category | Metrics | Range | Tier |
| --- | --- | --- | --- |
| C1: Protection of Life | 31 | −93 to +93 | Tier 1 |
| C2: Constitutional Rights | 29 | −87 to +87 | Tier 1 |
| C3: Child Welfare & Parental Rights | 25 | −75 to +75 | Tier 2 |
| C4: Faithful Discharge | 41 | −123 to +123 | Tier 1 |
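As a consistency check, the three inventory tables can be totaled and compared with the 1,653 maximum points stated in the document header. The per-metric ceiling of +3 in Section C is inferred from the published ranges (for example, 31 metrics and a range of −93 to +93).

```python
# (items, max points) per Section A dimension, copied from the table above.
SECTION_A = {
    "A1": (15, 45), "A2": (15, 45), "A3": (10, 30),
    "A4": (12, 36), "A5": (13, 39), "A6": (13, 39),
    "A7": (10, 30), "A8": (6, 18), "A9": (6, 18),
}
SECTION_B_CATEGORIES = 13
SECTION_B_MAX_PER_CATEGORY = 75
# Metrics per Section C category; +3 per metric is inferred from the ranges.
SECTION_C = {"C1": 31, "C2": 29, "C3": 25, "C4": 41}

a_items = sum(items for items, _ in SECTION_A.values())
a_max = sum(points for _, points in SECTION_A.values())
b_max = SECTION_B_CATEGORIES * SECTION_B_MAX_PER_CATEGORY
c_metrics = sum(SECTION_C.values())
c_max = 3 * c_metrics
total_max = a_max + b_max + c_max

# Every Section A dimension allows up to 3 points per item.
a_is_three_per_item = all(pts == 3 * n for n, pts in SECTION_A.values())

print(a_items, a_max)       # 100 300
print(b_max)                # 975
print(c_metrics, c_max)     # 126 378
print(total_max)            # 1653
```

Section A (300) plus Section B (975) plus Section C (378) gives 1,653, which matches the published maximum.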

Part X — Data Source Directory

All data sources used in this evaluation are publicly accessible. No proprietary or restricted-access datasets were used.

Federal Statistical Agencies

| Agency | URL | Data Used |
| --- | --- | --- |
| Bureau of Labor Statistics | bls.gov | Employment, unemployment, wages, CPI |
| Bureau of Economic Analysis | bea.gov | State GDP, personal income, regional price parities |
| Census Bureau | census.gov | Population, migration, ACS demographics |
| CDC / NCHS | cdc.gov/nchs & wonder.cdc.gov | Mortality, maternal health, drug overdose, vital statistics |
| FBI | ucr.fbi.gov | Uniform Crime Reports, NIBRS crime data |
| NCES | nces.ed.gov | NAEP scores, IPEDS higher education data |
| FHWA | fhwa.dot.gov | National Bridge Inventory, highway statistics |
| EPA | epa.gov | SDWIS drinking water, air quality, enforcement actions |
| CMS | cms.gov | Medicaid enrollment, hospital quality, state health expenditures |
| DOJ | justice.gov | Federal prosecutions, civil rights enforcement, consent decrees |
| IRS | irs.gov/statistics | SOI migration data (county-to-county tax return flows) |

Federal Court & Legal Records

| Source | URL | Data Used |
| --- | --- | --- |
| PACER | pacer.uscourts.gov | Federal case filings, consent decrees, civil rights litigation |
| Federal Register | federalregister.gov | Executive orders, regulatory actions affecting states |
| Supreme Court opinions | supremecourt.gov | Constitutional rulings affecting state governance |

State-Level Sources (applied per state)

| Source Type | Data Used |
| --- | --- |
| State Auditor / Comptroller | CAFR/ACFR, audit findings, material weaknesses |
| State Ethics Commission | Complaints, investigations, financial disclosure records |
| Secretary of State | Campaign finance filings, lobbying disclosures |
| State Treasurer | Debt reports, pension funding, rainy day fund balances |
| Governor's Office | Budget proposals, executive orders, press releases |
| State DOE | School report cards, graduation rates, teacher data |
| State Vital Records | Birth/death certificates, health statistics |

Independent Assessment Programs

| Source | Data Used |
| --- | --- |
| NAEP (congressionally mandated) | Standardized 4th/8th grade reading and math scores |
| Moody's / S&P / Fitch | State credit ratings and outlook |
| ASCE | Infrastructure Report Card grades by state |

Part XI — Error Correction & Revision Protocol

A methodology that cannot be corrected is not rigorous — it is dogma. This section codifies how errors are identified, verified, corrected, and documented.

How to Submit a Correction

  1. Identify the specific item. Name the governor, the item number (e.g., Item #74), and the current score.
  2. Cite a primary government source. Provide the specific document, dataset, court filing, or official record that contradicts the evidence published in our evaluation. Secondary sources (news articles, opinion pieces, think tank analyses) are insufficient. We require the same standard of source we hold ourselves to.
  3. Submit the correction via email to corrections@parkerintel.com.

How Corrections Are Processed

  1. Verification. The Senior Editor or designee verifies the primary source cited by the challenger against the original source cited in the evaluation.
  2. Determination. One of three outcomes:
    • Correction accepted: The challenger's source contradicts or supersedes our evidence. The score is updated.
    • Correction rejected: The challenger's source does not contradict our evidence, or the source does not meet primary-source standards. The existing score stands. A brief explanation of the rejection is published.
    • Score adjusted but not to challenger's request: The source is valid but supports a different score than the challenger proposed. The score is updated to the rubric-determined value, not the requested value.
  3. Publication. All accepted corrections are published in a public change log with:
    • Date of correction
    • Item number and governor affected
    • Previous score and new score
    • The primary source that prompted the correction
    • The editor who approved the change
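The change-log fields listed above map naturally onto a small record type. The sketch below is one possible shape, not the format actually used; the class name, field names, and sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CorrectionEntry:
    """One published change-log entry for an accepted correction."""
    corrected_on: date       # date of correction
    item_number: int         # item affected
    governor: str            # governor affected
    previous_score: int
    new_score: int
    primary_source: str      # the primary source that prompted the correction
    approved_by: str         # the editor who approved the change

    def __post_init__(self):
        # An accepted correction is only logged when the score changes.
        if self.previous_score == self.new_score:
            raise ValueError("an accepted correction must change the score")


# Hypothetical entry; every value here is illustrative.
entry = CorrectionEntry(
    corrected_on=date(2026, 5, 25),
    item_number=74,
    governor="Example Governor",
    previous_score=1,
    new_score=2,
    primary_source="Hypothetical state auditor report",
    approved_by="Senior Editor",
)
print(entry)
```

Making the record immutable (`frozen=True`) reflects the archival intent: a published entry is superseded by a new entry, not edited in place.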

Version Control

All versions of the evaluation are timestamped and archived. When corrections are made:

Why this matters: Any evaluation of this scale will contain errors. Individual item scores may rely on data that is subsequently revised, court decisions that are overturned, or evidence that was incomplete at the time of scoring. The integrity of the evaluation is not measured by whether it is perfect on publication day. It is measured by whether errors are corrected transparently when identified. A project that publishes corrections is more credible than one that claims it never needs them.

What Is Not a Correctable Error

Citation & Licensing

This evaluation system, its methodology, scored data, and all associated content are the intellectual property of Parker Intel. Any use of ALA governor rankings, scores, evidence, or methodology — whether in news coverage, academic work, social media, political commentary, or commercial products — requires proper attribution.

Required Citation Format
“ALA 263-Metric Governor Evaluation, Week of May 18, 2026.” Parker Intel, parkerintel.com/governors. Methodology: parkerintel.com/governors/methodology.

The governor evaluation is one of twelve proprietary assessment systems published by Parker Intel. Our cognitive, relationship, and longevity assessments use the same evidence-based methodology — visit realworldiq.com, reliqtest.com, and realbioage.com.