Forward-Looking vs. Backward-Looking: Why the Scoring Signal Under Your Klaviyo Flows Determines Your BFCM Revenue

June 4, 2026

Written by

John Levis

Director of Product Marketing

John leads product marketing at Tie, where he focuses on positioning, messaging, and helping ecommerce brands make better use of customer identity and behavioral data.

The category of AI email scoring is growing. Orita launched Engagement Levels in June 2026. Klaviyo K:AI has been in market for over a year. More scoring tools are coming.

What most of them have in common is less obvious than what separates them: they are all scoring the past.

What Engagement Scoring Actually Measures

Engagement scoring starts with a simple question: what has this contact done inside your ESP? Who opened, who clicked, who bought, how recently, how often. From those signals, a model ranks your list.

That is a useful thing to know. It is not a purchase signal for today.
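
To make the inputs concrete, here is a minimal sketch of a recency-and-frequency engagement score. It is not Orita's or Klaviyo's model; the `Contact` fields, the weights, and the 90-day window are all hypothetical and chosen only for illustration. What it shows is that every input is an event the ESP has already recorded.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Contact:
    email: str
    last_open: Optional[date]  # most recent open recorded by the ESP, if any
    clicks_90d: int            # clicks in the trailing 90 days
    orders_365d: int           # orders in the trailing 365 days


def engagement_score(c: Contact, today: date, window_days: int = 90) -> float:
    """Hypothetical engagement score; the weights are illustrative only."""
    if c.last_open is None:
        recency = 0.0
    else:
        age_days = (today - c.last_open).days
        # 1.0 for an open today, falling linearly to 0.0 at the window edge.
        recency = max(0.0, 1.0 - age_days / window_days)
    # Cap the counts so one very active contact cannot dominate the ranking.
    return recency + 0.1 * min(c.clicks_90d, 10) + 0.5 * min(c.orders_365d, 4)


def rank_list(contacts: List[Contact], today: date) -> List[Contact]:
    """Order the known list from most to least engaged."""
    return sorted(contacts, key=lambda c: engagement_score(c, today), reverse=True)
```

Nothing in `Contact` describes what the shopper is doing on the site today, and a visitor who never opted in has no `Contact` record to score at all.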

The limitation is structural. It is a data problem, not a model-quality problem. Engagement-based scoring tools, including Orita's Engagement Levels and Klaviyo K:AI, can only use signals from contacts already on your email list, and only the behavior Klaviyo can observe. That excludes three populations that can account for a significant share of your total purchase intent on any given day:

Contacts on your suppression list who stopped engaging on the email client Klaviyo tracks but are actively browsing your site on a different device.
Known contacts whose intent has shifted since their last engagement event (three months ago they were not ready to buy; today they are browsing the product page).
Anonymous site visitors: the 60 to 80 percent of your site traffic that has never opted into your list and therefore exists outside your ESP entirely.

Engagement scoring gives you a well-ordered view of your known engaged list. The problem is that your known engaged list captures a shrinking fraction of your total purchase intent.

What Purchase Intent Scoring Measures

Purchase intent scoring asks a different question: what is this shopper about to do?

The difference in inputs is what makes the difference real. Identity-backed purchase intent scoring draws on signals from outside the ESP, including:

Cross-device browse behavior from shoppers who appear dormant in Klaviyo but are actively on your site via a different device or browser
Anonymous site visitor behavior, matched to identity graph profiles before scoring
Signal freshness: intent scores updated daily based on what happened in the last 24 hours, not what happened last quarter

The output is a score that describes forward probability, not backward summary. A shopper with a purchase intent score of 8 is predicted to buy in the near term based on their current behavior. A shopper labeled "active" under a 90-day open window may have last opened an email 89 days ago.

Those are not the same signal. And they do not produce the same segment.

The Caraway Head-to-Head

Caraway was running Orita before Tie Predict. That is the cleanest head-to-head data point available because it controls for the brand, the audience, and the time period. The only variable is the scoring input.

Result: +21% placed-order rate with Tie Predict versus Orita.

The Caraway case is the canonical illustration of what changes when you shift from list-bound engagement scoring to identity-backed purchase intent scoring. The list is the same. The audience is the same. The signal layer is different. That difference is what produced the 21% higher placed-order rate.

What Sharper Image Did Differently

Sharper Image sent to a smaller, higher-intent segment built on Predict scores. The common objection to intent-based segmentation is that you send fewer emails and therefore generate less total revenue. The Sharper Image result is the answer to that objection.

Compared to the control group:

2x placed-order rate
+41% revenue per recipient
+36% CTR

Same list. Leaner send. Significantly better per-send economics. And because suppression of low-intent contacts reduces list fatigue, unsubscribe rates hold instead of accelerating. Sending leaner with the right signal does not cost you volume. It redirects volume to where it converts.
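
A back-of-envelope check helps put the per-recipient numbers against the total-revenue objection. The sketch below uses the +41% revenue-per-recipient figure reported above; the source does not state how much smaller the Predict segment was, so the send sizes and the $0.10 baseline are hypothetical.

```python
def break_even_send_fraction(rpr_lift: float) -> float:
    """Fraction of the control send size at which total revenue matches the control.

    rpr_lift is the relative lift in revenue per recipient, e.g. 0.41 for +41%.
    """
    return 1.0 / (1.0 + rpr_lift)


def total_revenue(recipients: int, revenue_per_recipient: float) -> float:
    """Total revenue of a send, given its size and revenue per recipient."""
    return recipients * revenue_per_recipient


# Hypothetical baseline: 100,000 recipients at $0.10 revenue per recipient.
control = total_revenue(100_000, 0.10)

# With a +41% lift, the lean send matches the control at about 71% of the volume.
fraction = break_even_send_fraction(0.41)
lean = total_revenue(round(100_000 * fraction), 0.10 * 1.41)

print(f"break-even send fraction: {fraction:.3f}")
print(f"control revenue: {control:.2f}, lean-send revenue: {lean:.2f}")
```

Under these assumptions, a high-intent segment larger than roughly 71% of the original send earns more total revenue than the control, and a smaller one earns less in total while still improving per-send economics. The per-recipient lift alone does not settle which case applies.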

The Segment Composition Problem

Here is the practical outcome of backward-looking scoring that most marketers do not model explicitly. Your "active" segment is defined by a recency window. The share of your list that falls inside that window shrinks over time, not because your customers are less interested, but because engagement rates tend to decline as DTC brands scale and email programs mature.

The result is a shrinking active segment from which you extract more and more of your email revenue, while a growing suppression list sits on the other side of an engagement threshold, unscored and uncontacted, regardless of what those contacts are doing on your site right now.

Forward-looking scoring does not just improve the ranking of your active segment. It changes the population eligible for segmentation. It adds back the suppression-list contacts with high cross-device intent. It surfaces the anonymous visitors who are browsing your best-selling products today. It gives you a larger high-intent population to work with, not a better-sorted version of a shrinking one.

The Evaluation Framework

If you are currently using a scoring tool or evaluating one, five questions will tell you whether you are buying forward-looking or backward-looking signal:

Does the tool score anonymous visitors, or only contacts already in your list?
Are scores based on forward-looking intent signals, or backward-looking engagement history?
Is the incrementality holdout-tested with your brand's data, or modeled from industry benchmarks?
Can the tool surface purchase intent from your suppression list?
Does it use cross-device signal, or single-session / single-device behavior?

Orita, Klaviyo K:AI, and Alfred answer "no," "backward-looking," or "industry benchmarks" to most of these questions. That is not a criticism of their execution. It is a description of their architecture. They score the list you have, on the signals you have already captured.

Whether that is sufficient for your program depends on how much of your total purchase intent you are comfortable leaving unscored.

Before BFCM

DTC brands finalize MarTech stack decisions in Q3. The difference between a forward-looking scoring tool and a backward-looking one becomes most visible when your engaged segment is smallest relative to intent, and the highest-intent moment for your audience is Q4.

A proof-of-concept that starts in July gives you holdout-tested incrementality data before Black Friday. One that starts in October gives you data after the window closes.

The architectural choice between scoring the list you have and scoring the full intent picture is one you make now or after BFCM. The revenue impact shows up either way.

FAQ

What is the difference between engagement scoring and purchase intent scoring?

Engagement scoring ranks contacts based on past behavior in your ESP: opens, clicks, purchase history. Purchase intent scoring uses real-time identity signals, including cross-device browse behavior and anonymous site visitors, to predict near-term purchase likelihood. Engagement scoring is backward-looking; purchase intent scoring is forward-looking.

Does Klaviyo K:AI use purchase intent scoring?

Klaviyo K:AI scores based on Klaviyo-internal engagement data. It cannot score visitors outside the opted-in list and relies on backward-looking behavioral signals. It does not draw on cross-device or anonymous traffic signals.

Is Orita a purchase intent scoring tool?

Orita's Engagement Levels product segments your Klaviyo list into engagement cohorts based on email interaction history. It scores contacts already in your list. It does not score anonymous site visitors and does not use cross-device signals outside the ESP.

What is identity-backed purchase intent scoring?

It uses an identity graph to match anonymous site visitors to known profiles and score them for purchase intent. The scoring engine draws on signals beyond the ESP, including cross-device browse behavior and 55+ identity graph inputs, to produce a forward-looking intent score updated daily.

How do you measure whether purchase intent scoring actually works?

The only reliable method is a holdout-tested proof-of-concept: a randomly selected control group excluded from intent-scored sends, with revenue compared between the scored and holdout groups over the same period. Industry benchmark comparisons and modeled projections are not sufficient because they do not control for creative, offer, season, or brand-specific factors.
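
A minimal sketch of the mechanics, under stated assumptions: hash-based assignment stands in for random selection so that the split is reproducible across sends, and the function names, the 10% holdout size, and the `salt` value are all hypothetical. It computes relative lift only. A real proof-of-concept would also need a large enough sample and a significance check before the result is read as incremental.

```python
import hashlib


def in_holdout(contact_id: str, holdout_pct: float = 0.10, salt: str = "poc-2026") -> bool:
    """Deterministically place roughly holdout_pct of contacts in the holdout group."""
    digest = hashlib.sha256(f"{salt}:{contact_id}".encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < holdout_pct


def revenue_per_recipient(revenue: float, recipients: int) -> float:
    """Revenue divided by group size, so groups of different sizes compare fairly."""
    return revenue / recipients if recipients else 0.0


def relative_lift(scored_rpr: float, holdout_rpr: float) -> float:
    """Relative lift of the scored group over the holdout, e.g. 0.41 for +41%."""
    if holdout_rpr == 0:
        raise ValueError("holdout revenue per recipient is zero; lift is undefined")
    return scored_rpr / holdout_rpr - 1.0
```

Because assignment depends only on the contact ID and the salt, the same contact stays in the same group for the whole test period, which keeps the scored and holdout groups comparable over the same window.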
