The English auction is the gold standard of price discovery — Vickrey, Christie's, every estate sale on the planet. Each price point gets validated by a human willing to pay it. Programmatic advertising quietly traded that auction format away when latency made it unworkable, and we have been paying for the trade ever since.
The replacement is the first-price sealed-bid (FPSB) auction. Each impression is its own opaque envelope: the demand-side platform bids its number, the highest bid wins, and nobody learns anything about the other bidders. Information asymmetry by design.
A naive defense is to set a floor. Reserve prices have been the publisher's lever since Myerson 1981. But for the last decade we have been setting them one auction at a time — pretending each impression is its own world. It isn't.
The User Session Is One Auction in Disguise
When a user lands on a publisher and browses, they generate a sequence of ad requests — typically five to fifty in a single session. Each request is technically its own FPSB auction, but they share something the textbook setup denies: a common state. The same DSPs are seeing the same user. The same bidder population is in play. And critically, by request t, you have observed the winning bids of requests 1 through t-1.
That looks suspiciously like an English auction smeared across requests. Information about the market trickles out as the session progresses. Treating each request in isolation throws that information away.
We propose a Floor Price Optimization (FPO) framework that treats the whole session as a single iterative auction. Two phases:
Phase I sets the opening floor using everything you knew before the user arrived. Phase II recalibrates the floor after every request using everything that has happened since. The handoff between them is one number — the initial floor.
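The two-phase handoff can be sketched as a session loop. All function names here are hypothetical stand-ins for the framework's components, not a real API; each request is modeled as a callable that maps the chosen floor to the observed auction outcome.

```python
def run_session(x, requests, phase1, phase2_update, phase2_floor):
    """One user session: Phase I sets the opening floor, Phase II
    recalibrates it after every request.

    x              -- pre-session feature vector
    requests       -- list of callables: floor -> (winning_bid, n_bidders)
    phase1         -- x -> initial floor
    phase2_update  -- (belief, winning_bid, n_bidders) -> new belief
    phase2_floor   -- belief -> floor
    """
    floor = phase1(x)                    # the one number Phase I hands to Phase II
    floors = [floor]
    belief = None
    for auction in requests:
        winning_bid, n = auction(floor)  # request t clears at this floor
        belief = phase2_update(belief, winning_bid, n)
        floor = phase2_floor(belief)     # recalibrated floor for request t+1
        floors.append(floor)
    return floors[:-1]                   # one floor per request
```

The point of the skeleton is the interface: Phase I runs once per session, Phase II runs once per request, and the only state crossing the boundary is the floor itself.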
Phase I — Setting the Opening Floor
The first request of a session is the only one with no intra-session signal. So Phase I leans on whatever you know before the session starts: user agent, geo, time of day, historical CPMs at this URL, vertical, and so on. Call that feature vector x.
A regression model trained on historical data predicts revenue per request as a function of those features and the floor we choose:
𝔼[RPR | x, f] = ĝ_θ(x, f)
The optimal initial floor is just the argmax over f:
f*_init(x) = argmax_f ĝ_θ(x, f)
There is one trap. If you only train on whichever floors your system has already chosen, you bake your historical biases into the model — the regressor never sees what would happen at the floors you have never tried. To break that, ring-fence one percent of traffic for ε-greedy exploration: pick a floor uniformly at random from a bounded interval, ignoring whatever the model predicts. That gives the regressor unbiased data and keeps it honest as the market drifts.
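A minimal sketch of Phase I, assuming a discrete grid of candidate floors and any fitted regressor passed in as a callable. The names `choose_initial_floor` and `predict_rpr`, and the constants `FLOOR_GRID` and `EPSILON`, are illustrative, not from a production system:

```python
import random

FLOOR_GRID = [round(0.25 * k, 2) for k in range(1, 41)]  # candidate floors $0.25..$10.00
EPSILON = 0.01  # the one percent of traffic ring-fenced for exploration

def choose_initial_floor(x, predict_rpr, rng=random):
    """Phase I: pick the opening floor for a new session.

    x           -- pre-session feature vector (geo, UA, hour, URL history, ...)
    predict_rpr -- fitted regressor g_theta: (x, f) -> expected revenue per request
    """
    # epsilon-greedy: a small slice of traffic ignores the model entirely,
    # sampling a floor uniformly so the regressor keeps seeing unbiased outcomes
    if rng.random() < EPSILON:
        return rng.choice(FLOOR_GRID)
    # exploit: argmax of predicted revenue-per-request over the candidate grid
    return max(FLOOR_GRID, key=lambda f: predict_rpr(x, f))
```

The grid-argmax stands in for whatever optimizer sits over ĝ_θ; the exploration branch is the part that cannot be skipped without re-baking historical bias into the training data.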
Phase II — Learning Mid-Session
As soon as request 1 closes, you have a new piece of data — the winning bid b_1. Phase II uses it to update the belief about the bid landscape, then applies that belief to the floor for request 2. Repeat for every subsequent request in the session.
The natural starting point is an exponential moving average over the log of winning bids, plus a noise term on the variance. Lightweight, computable inside the 100ms RTB envelope. Looks roughly like:
μ_t = α · μ_{t-1} + (1 − α) · ln(b_{t-1})
That feels right and contains a subtle bug. We are updating our estimate of the typical bid using b_{t-1}. But b_{t-1} is not the typical bid — it is the winning bid, which is the maximum of the n competing bids. The maximum of n samples is biased upward, and the more bidders there are, the bigger the bias.
The expected maximum of n standard normals is roughly 0.56 for n=2 and 1.54 for n=10, and Blom's approximation gives that upward shift in closed form as a function of n. Subtract the bias before averaging:
μ_t = α · μ_{t-1} + (1 − α) · (ln(b_{t-1}) − σ_{t-1} · c(n_{t-1}))
Now the EMA tracks the market center, not the winners' high-water mark. The variance update follows the same shape with a noise floor γ to prevent collapse:
σ²_t = α · σ²_{t-1} + (1 − α) · (ln(b_{t-1}) − μ_{t-1})² + γ
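A sketch of the full Phase II update, stdlib only. `ALPHA` and `GAMMA` are illustrative constants rather than tuned values, and `blom_c` implements Blom's approximation via the inverse normal CDF:

```python
import math
from statistics import NormalDist

ALPHA = 0.8    # EMA smoothing weight (illustrative, not a tuned value)
GAMMA = 0.01   # noise floor on the variance to prevent collapse

def blom_c(n):
    """Blom's approximation to E[max of n standard normals]."""
    return NormalDist().inv_cdf((n - 0.375) / (n + 0.25))

def update_belief(mu, var, winning_bid, n_bidders):
    """Fold the last winning bid into the (mu, sigma^2) belief over log-bids.

    The winning bid is the max of n_bidders samples, so it overstates the
    typical bid; subtract the order-statistic bias sigma * c(n) first.
    """
    sigma = math.sqrt(var)
    debiased = math.log(winning_bid) - sigma * blom_c(n_bidders)
    mu_next = ALPHA * mu + (1 - ALPHA) * debiased
    var_next = ALPHA * var + (1 - ALPHA) * (math.log(winning_bid) - mu) ** 2 + GAMMA
    return mu_next, var_next
```

Note the dependence on n: the same winning bid pulls μ up less when it beat ten competitors than when it beat one, which is exactly the correction the order statistic demands.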
The Fill-Rate Trade-off
Maximizing the expected clearing price alone is the wrong objective. An unfilled impression is not worth zero: you fall back to a passback network that pays something, or, in the worst case, you lose the inventory entirely. Call the value of that outside option λ. Then the publisher should pick:
f*_t = argmax_f [ (f − λ) · P(b ≥ f) ]
That is exactly the Myerson optimal-reserve form, with λ as the seller's baseline valuation. When λ is low, the fallback recovers almost nothing, so stay conservative on the floor: better to sell cheap than not sell at all. When λ is high, the fallback cushions a missed fill, so push the floor.
One more knob. The number of bidders varies request to request. With ten DSPs in flight you can demand a tighter floor than with two. Fold the estimated bidder count n̂_t in, and the final floor becomes:
f*_t = argmax_f [ (f − λ) · (1 − F(f | μ_t, σ_t)^{n̂_t}) ]
When intra-session competition is dense, the framework supports aggressive floors. When it is sparse, the framework relaxes them automatically — preserving fill rate without giving away yield in the dense case.
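The final objective can be sketched as a grid search under a lognormal bid model implied by the Phase II belief. `optimal_floor` and the grid bounds are illustrative choices, not the framework's own optimizer:

```python
import math
from statistics import NormalDist

def optimal_floor(mu, sigma, n_bidders, lam, grid=None):
    """Grid-search f*_t = argmax_f (f - lam) * (1 - F(f | mu, sigma)^n).

    F is the lognormal bid CDF implied by the belief (mu, sigma over
    log-bids); lam is the seller's baseline valuation of an unfilled
    impression; n_bidders is the estimated bidder count for this request.
    """
    if grid is None:
        grid = [0.05 * k for k in range(1, 401)]   # candidate floors $0.05..$20.00
    phi = NormalDist()

    def objective(f):
        F = phi.cdf((math.log(f) - mu) / sigma)    # P(one bid falls below f)
        return (f - lam) * (1.0 - F ** n_bidders)  # gain over baseline * P(fill)

    return max(grid, key=objective)
```

Holding the belief fixed, a denser bidder pool moves the argmax upward, which is the automatic tightening and relaxing described above.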
Did It Work?
We ran the full framework against a static-floor baseline and a no-floor baseline across 5,000 simulated sessions with 10 heterogeneous DSP agents (retargeting, prospecting, brand), each with its own budget, pacing logic, and utility-maximizing bid function reflecting real-world bid-shading behavior.
Aggregate RPR: no floor $3.01, static $3.00 floor $3.82, FPO $3.95. The 31% lift over the no-floor baseline is the headline number: it is the revenue currently leaking into DSP margins through unconstrained bid shading. The 3.4% lift over the static-floor baseline is more interesting in practice, because almost everyone already runs a static floor.
More telling than either headline: when you map yield against fill rate, the FPO mechanism sits structurally above the efficient frontier traced by every fixed reserve. There is no static floor that matches FPO's yield at comparable fill rate. The intra-session calibration is doing real work — work that no setting of a single number could replicate.
Limitations Worth Naming
One vulnerability is real. If sophisticated DSPs detect intra-session floor ratcheting and learn to deliberately depress their early bids, they can recover some of the yield we are claiming. Sensitivity analysis suggests up to 5% of the FPO RPR is exposed to this if "floor observability" approaches 100%.
The defense is to keep the parameters — the smoothing α, the opportunity cost λ, the exploration rate ε — opaque to the market. Never publish them. The behavior is also self-policing in the limit: a DSP that uniformly shades early bids forfeits the highest-value impressions, where shading hurts the most.
What This Means For Publishers
Three takeaways for anyone running a programmatic stack:
First, the unit of optimization is the session, not the auction. Treating each impression as an island leaves substantial revenue on the table; that is what the 31% number means.
Second, pre-session and intra-session signals serve different jobs. The pre-session model sets the opening; intra-session updates calibrate. Conflating them — trying to do both with a single mechanism — is what most existing systems get wrong.
Third, the update rule has to be honest about the order statistic. The winning bid is not the typical bid. Plug winning bids straight into a moving average and your estimate of the market drifts upward forever; subtract the order-statistic bias and it stabilizes where it should.
Programmatic ads quietly killed the English auction. This framework is one way to bring most of it back — without breaking the latency budget that killed it in the first place.