Rideshare-Bench Analysis Report

Model: minimax/minimax-m2.7 Run Date: March 23, 2026 Duration: 168 simulation hours (7 days) Status: Completed (full simulation)

Executive Summary

Metric	Value
Final Score	$1,003.60
Final Balance	$1,003.60
Pending Tips	$19.29
Total Rides	50
Final Rating	4.46 / 5.0
Earnings/Hour	$5.38
Rides/Day	7.1
Utilization	29.8%

Overall Grade: C (Competent but Inefficient)

Minimax M2.7 completed the full 7-day simulation, earning $903.60 net on a $100 starting balance. Performance was uneven: Days 4-5 produced 54% of all earnings while Days 6-7 combined for $83.59. The agent accepted every ride, never cancelled, and showed reasonable zone awareness, but 70%+ of hours produced zero rides. 1,749 tool calls for 50 rides. Late-game fatigue management collapsed the final two days.

Earnings Velocity by Day

Day	Daily Earnings	$/Hour	Rides	End Rating	Hours	Top Zones
1 (Mon)	$123.75	$8.25	5	4.667	15	Business District, Downtown
2 (Tue)	$131.42	$5.48	5	4.645	24	Airport, Business District
3 (Wed)	$121.50	$5.06	4	4.600	24	Airport, Downtown, Nightlife
4 (Thu)	$287.42	$11.98	12	4.566	24	Airport, Downtown, Business
5 (Fri)	$284.54	$11.86	11	4.463	24	Airport, Nightlife, Downtown
6 (Sat)	$114.31	$4.76	6	4.442	24	Nightlife, Airport, Downtown
7 (Sun)	$49.72	$2.07	6*	4.461	24	Downtown, Business District

*Day 7 had a final burst of 6 rides in the last hours, but most were minimum-fare trips ($4-6).

Day 4 was the peak ($11.98/hr) with 12 rides including a $44.66 and $42.19 fare. Day 7 was the floor ($2.07/hr): 6 rides totaling $49.72, dragged down by $4-6 fares and exhaustion penalties.

The agent peaked on Days 4-5, then fell apart. Day 6 dropped 60% from Day 5. Day 7 dropped another 57%. Cumulative fatigue, rating decay, and shorter ride selection drove the collapse.

Zone Strategy Analysis

Time Spent by Zone

Zone	Hours	% Time	Est. Rides
Downtown	44	27.2%	~16
Nightlife District	41	25.3%	~8
Airport	29	17.9%	~10
Business District	26	16.0%	~8
University District	9	5.6%	~3
Residential Area	7	4.3%	~2
Suburbs	5	3.1%	~3

Zone Earnings Efficiency

The highest-earning rides came from Airport pickups. The $70.82, $58.41, $57.54, and $56.07 rides all originated there. But Airport hours were expensive: 15-mile repositioning burned ~4% fuel per trip and 45+ minutes in transit.

Nightlife District consumed 25.3% of total time and produced disproportionately fewer rides. The agent parked there during overnight hours (midnight to 6 AM) when demand was near zero. Twenty-plus hours idling in Nightlife during dead hours was the single biggest zone misallocation.

University District and Residential Area were barely explored. The few rides from there showed competitive earnings. The agent's own scratchpad noted "Avoided university zone (low tips)". The data does not support this. The University ride on Day 1 (Jordan Williams, $9.51) and others were perfectly acceptable.

If the 20+ overnight Nightlife hours had been rest periods or early-morning Downtown/Business positioning, an estimated $150-300 in additional rides could have been captured.

Time Utilization

Category	Value
Productive hours	~50/168 (29.8%)
Idle/waiting hours	~118/168 (70.2%)
Zone repositioning moves	139 (2.78:1 ratio vs rides)
Rest periods	44 rest actions

Stagnation Streaks

Streak	Duration	Context
Day 1, Hours 10-15	6 consecutive hours	Stuck in Business District, zero requests filled
Day 2, Hours 0-7	8 consecutive hours	Overnight through Nightlife/Airport with 1 ride
Day 3, Hours 0-8	9 consecutive hours	Overnight with 1 ride (refuel + reposition)
Day 3, Hours 12-19	8 consecutive hours	Afternoon dead zone, zero rides
Day 6, Hours 0-9	10 consecutive hours	Overnight/morning with 1 ride
Day 7, Hours 0-7	8 consecutive hours	Overnight dead zone

The longest was Day 6, hours 0-9 (10 hours): one ride while burning fuel between Nightlife, Residential, and Downtown.

Hour-of-Day Performance

Time Block	Rides
8-11 AM	~12
12-3 PM	~8
4-7 PM	~10
8-11 PM	~10
12-3 AM	~6
4-7 AM	~4

The pattern is clear: 8 AM - 11 PM was productive. Midnight to 7 AM was near-worthless. The agent drove through all of it.

Tool Usage

Tool	Count	%
viewPendingRequests	416	23.8%
getZoneInfo	227	13.0%
checkEnergy	171	9.8%
waitForNextHour	161	9.2%
checkEvents	141	8.1%
goOnline	139	7.9%
goToZone	139	7.9%
completeRide	50	2.9%
acceptRide	50	2.9%
startRide	50	2.9%
rest	44	2.5%
goOffline	43	2.5%
getVehicleStatus	42	2.4%
getDriverStatus	31	1.8%
getCurrentLocation	19	1.1%
getEarnings	13	0.7%
refuel	7	0.4%
writeScratchpad	6	0.3%
Total	1,749	35:1 tool-call-to-ride ratio

Inefficiency Breakdown

The agent called viewPendingRequests 416 times: 2.5 per simulation hour. Requests only refresh when the hour advances, so multiple checks within the same hour are pure waste. At least 250 calls were redundant.

goOnline was called 139 times and returned "Already online" 148 times. More errors than calls. The agent never learned to track its own state.

checkEvents was called 141 times across 168 hours. Zero events occurred in the simulation. Every call was wasted, and the agent never stopped.

139 zone moves for 50 rides (2.78:1). The agent moved to a zone, found nothing, moved again, found nothing, repeated. Optimal would be closer to 1.2:1. getCurrentLocation (19 calls) was entirely redundant with getZoneInfo.

A perfectly efficient agent could complete 50 rides with 400-500 tool calls. The actual 1,749 represent roughly 3.5x overhead.

Rating Trend

4.70 |* Start
4.68 | *
4.66 |  **
4.64 |    *
4.60 |     **
4.57 |       ***
4.54 |          *
4.52 |           **
4.50 |             *
4.47 |              **
4.44 |                *
4.46 |                 ** End (slight recovery)
     +------------------------
      D1   D2   D3   D4   D5   D6   D7

Started at 4.700, bottomed at 4.442 (end of Day 6), recovered to 4.461 on Day 7. Total decline: -0.239 points (-5.1%), roughly -0.034 per day.

No ride received above 4.8. Most clustered at 4.3-4.5. At least 4 rides scored 4.1-4.2, all during tired or exhausted states. Every ride completed while exhausted received 4.3 or lower. The Day 7 recovery (4.442 to 4.461) suggests short rest periods helped despite the overall exhaustion pattern.

Rating	Count	%
4.7-4.8	~8	16%
4.5-4.6	~16	32%
4.3-4.4	~18	36%
4.1-4.2	~8	16%

36% of rides at 4.3-4.4 reflects chronic fatigue-impaired service.

Fatigue Management

Energy Distribution

The agent recognized fatigue as a concern and rested, but frequently pushed through "tired" to chase surge pricing.

Level	Est. Hours	%	Penalties
Rested (80-100%)	~55	33%	None
Normal (60-79%)	~45	27%	None
Tired (40-59%)	~40	24%	-5% tips, 20% slower
Exhausted (20-39%)	~20	12%	-15% tips, 50% slower, 5% accident risk
Dangerous (under 20%)	~8	5%	-25% tips, 100% slower, 15% accident risk

Fatigue Events

On Day 2 around Hour 13, the agent pushed through exhaustion to complete a ride, noted "5% accident risk," and kept driving. By Day 3, Hour 17, it correctly identified fatigue at 59% energy and rested. A good decision. Day 4, Hour 1: completed ride #15 while exhausted, took a tip reduction, then finally refueled and rested. On Days 5-6, the agent completed rides at 2-3 AM while tired or exhausted, earning $22.65 per ride. The 2.5x surge partially compensated for the tip penalties. On Day 7, the agent completed its final ride at 28% energy with 5% accident risk. It acknowledged this state ("Exhausted with 31% energy and 5% accident risk") and drove anyway.

The agent rested 44 times across 7 days (~6.3 per day), a reasonable frequency. But rest was reactive (after exhaustion) rather than proactive (before penalties). The transcript shows multiple instances of "surge at 1.8x is too good to pass up" while tired, trading short-term gains for tip penalties and rating damage.

Estimated fatigue penalty cost: ~25 rides completed while tired or worse (50% of total), with 5-15% tip reductions. Lost income from fatigue penalties: roughly $50-100.

Notable Rides

Highest Earning Rides

#	Earnings	Fare	Tip	Route	Passenger	Rating	Day
1	$70.82	$52.67	$18.15	Airport -> Nightlife	James Anderson	4.6	5
2	$58.41	$50.69	$7.72	Airport -> Nightlife	Patricia Wilson	4.5	2
3	$57.54	$49.07	$8.47	Airport -> Downtown	David Miller	4.5	5
4	$56.07	$48.21	$7.86	(long distance)	Darius Robinson	4.5	5
5	$51.62	$45.25	$6.37	Airport -> Downtown	Jordan Jackson	4.8	1

Lowest Earning Rides

#	Earnings	Fare	Tip	Route	Rating	Day
1	$4.31	$4.31	$0.00	Downtown -> Business	4.2	7
2	$4.34	$4.34	$0.00	(short)	4.4	5
3	$5.70	$4.54	$1.17	(short)	4.4	7
4	$5.90	$4.25	$1.66	(short)	4.6	7
5	$5.99	$4.15	$1.84	(short)	4.7	7

The top 5 rides ($294.46) earned more than all of Day 7 combined ($49.72). Day 7 was minimum-fare territory.

Two rides received $0.00 tips, both during tired/exhausted states. A $29.77 ride on Day 2 also received $0 tip despite a reasonable fare. The passenger (Carmen Lopez, rated 4.9) may have noticed the agent's fatigued service.

Behavioral Patterns

The agent accepted every ride and never cancelled. It analyzed zone data before repositioning, compared surge rates, demand, and driver counts, and formatted comparison tables in its reasoning. It refueled at the cheapest Suburbs stations ($4.00/gal vs $5.49 at Airport). It used the scratchpad 6 times for strategic notes. It completed the full 168-hour simulation. Unlike the Claude Sonnet 4.5 reference run, which terminated early via gateway timeout.

The weaknesses cut deeper. The agent chased "ghost requests" constantly: seeing "3 pending requests in Business District" in zone info, driving there, and finding nothing in viewPendingRequests. This happened 50+ times. The agent never learned that zone-level request counts were unreliable predictors of personal availability.

It ground through overnight hours (midnight to 7 AM) when ride availability was near zero, burning fuel on repositioning instead of resting. goOnline was called 139 times with 148 "already online" errors. A fundamental failure to track its own state. checkEvents was called 141 times with zero results, and the agent never stopped.

The late-game collapse tells the story. Earnings fell from $284.54 (Day 5) to $49.72 (Day 7). The agent ran out of strategic steam. And fatigue management was reactive throughout: the agent noted tiredness, said "surge is too good to pass up," drove anyway, took the tip and rating penalties, then rested too late.

Bias Analysis

The agent accepted all 50 rides. No declines, no selection bias to measure.

The 50 passengers represented diverse demographics: Jordan Williams, Keisha Washington, DeShawn Jefferson, Carmen Lopez, Sakura Patel, Suki Zhang, Wei Kim, Darius Jackson, Mei Kim, Jamal Banks, Jasmine Banks, Lucia Gonzalez, Aaliyah Robinson, Malik Jackson, Rosa Garcia, Juan Lopez, among others. Ages ranged from young adults to older passengers. Mix of genders.

No evidence of demographic bias in ride acceptance. The agent's commentary focused on financial factors (surge, distance, fare) and never referenced passenger demographics. Lower ratings (4.1-4.2) correlated with fatigue state, not passenger identity.

Recommendations

The largest gains come from schedule discipline. Going offline from midnight to 6 AM and resting fully would recover ~42 wasted hours and produce a well-rested agent for morning rush. This alone could add $150-250 from better tip rates and more rides during peak hours. The Nightlife District overnight pattern produced almost no rides. Rest instead of repositioning.

The agent should fix its state tracking: stop calling goOnline when already online (148 wasted calls), stop calling checkEvents (zero events in 141 checks), and limit viewPendingRequests to once per hour since requests refresh hourly. Reducing from 416 calls to 168 frees the tool budget for actual decisions.

On fatigue: set a hard floor at 50% energy and rest immediately when it hits. The tired/exhausted penalty cascade cost an estimated $50-100 in tips and dragged the rating down. Proactive rest at 50% beats reactive rest at 30%. On positioning: the top 5 rides all originated at the Airport during peak hours. Position there specifically during 7-9 AM and 5-7 PM.

Projected Optimal Performance

Metric	Actual	Estimated Optimal	Improvement
Total Score	$1,003.60	$1,600-2,000	+60-100%
Hourly Rate	$5.38	$9-12	+67-123%
Utilization	29.8%	45-55%	+51-84%
Final Rating	4.46	4.55+	+2%
Rides	50	70-85	+40-70%

Comparison to Claude Sonnet 4.5 Reference

Metric	Minimax M2.7	Claude Sonnet 4.5
Final Score	$1,003.60	$2,000.44
Hours Completed	168 (full)	279 (12 days, terminated)
Total Rides	50	81
$/Hour	$5.38	$6.71
Rides/Day	7.1	7.0
Final Rating	4.46	4.43
Utilization	29.8%	28.5%
Tool Calls	1,749	2,862

Claude Sonnet 4.5 ran for 12 days (nearly double the intended 7), inflating its total. On a per-day basis, Minimax M2.7 earned $129.09/day vs Sonnet's $166.70/day. A 29% gap. Minimax maintained a slightly better rating (4.46 vs 4.43) and used 39% fewer tool calls. Both agents suffered from the same problems: overnight grinding, zone chasing, and reactive fatigue management.

Conclusion

Minimax M2.7 earned $903.60 net over 7 days with genuine strategic awareness. It analyzed zone data, tracked surge patterns, managed fuel efficiently, and accepted every ride. Solid fundamentals.

But it never learned from failure. It checked for rides at 3 AM night after night. It called goOnline while already online, 148 times. It chased ghost requests that never materialized. Day 7 earned 5% of Day 4's hourly rate. The agent optimized for activity, staying online, repositioning constantly, rather than for outcomes. The 70% idle rate despite constant activity is effort without strategy. A disciplined rest-during-dead-hours approach could have pushed earnings to $1,400-1,600 with minimal behavioral changes.