Search Revenue Analysis: YSL

Visualizing which catalog attributes contribute the most to search-driven revenue.

The Prioritization Engine: How It Works

This report is powered by a Python system that bridges the gap between customer search queries and your product catalog. It prioritizes attributes not by how *often* they are used, but by how much *revenue* they generate.

Here is the step-by-step logic:

Filter Catalog Attributes

The system first parses the entire Product Catalog XML. It applies a crucial filter to ignore "noisy" attributes (e.g., booleans like is-new, online-flag) and focuses only on rich, descriptive content.
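A minimal sketch of this filtering step, under stated assumptions: the element names (`product`, `custom-attribute`), the `NOISY_ATTRIBUTES` deny-list, and the boolean check are hypothetical, since the report does not show the real catalog schema or filter rules. It uses the standard-library ElementTree parser for portability; lxml offers a similar interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical deny-list; the real system's rules are not shown in this report.
NOISY_ATTRIBUTES = {"is-new", "online-flag"}
BOOLEAN_VALUES = {"true", "false"}

# Simplified, assumed catalog layout used only for illustration.
SAMPLE_CATALOG = """
<catalog>
  <product product-id="P1">
    <custom-attribute attribute-id="long-description">Hydrating night serum with niacinamide.</custom-attribute>
    <custom-attribute attribute-id="is-new">true</custom-attribute>
    <custom-attribute attribute-id="online-flag">false</custom-attribute>
  </product>
</catalog>
"""

def descriptive_attributes(catalog_xml):
    """Yield (product_id, attribute_id, text) for attributes worth indexing."""
    root = ET.fromstring(catalog_xml)
    for product in root.iter("product"):
        for attr in product.iter("custom-attribute"):
            attr_id = attr.get("attribute-id")
            text = (attr.text or "").strip()
            # Skip deny-listed IDs, empty values, and bare boolean flags.
            if attr_id in NOISY_ATTRIBUTES or not text or text.lower() in BOOLEAN_VALUES:
                continue
            yield product.get("product-id"), attr_id, text

kept = list(descriptive_attributes(SAMPLE_CATALOG))
```

In this sample only long-description survives the filter; the two boolean flags are dropped before any text is indexed.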

Build Keyword Index

For all *beneficial* attributes (like long-description), the system reads the text. Using an NLP library (NLTK), it *stems* each word to a common root (e.g., "running", "runs" -> run). It then builds a reverse index mapping each stemmed_keyword to the attributes that contain it.
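The index-building step could look roughly like the sketch below. The function name `build_keyword_index`, the tokenizer regex, and the sample texts are illustrative assumptions. The sketch uses NLTK's `PorterStemmer` when NLTK is installed and otherwise falls back to a crude stand-in stemmer, so exact stems may differ between the two.

```python
import re
from collections import defaultdict

try:
    from nltk.stem import PorterStemmer  # NLTK is the library named in this report
    stem = PorterStemmer().stem
except ImportError:
    def stem(word):
        """Crude stand-in (assumption): strip a trailing 'ing' or 's'."""
        for suffix in ("ing", "s"):
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[: -len(suffix)]
        return word

TOKEN_RE = re.compile(r"[a-z0-9]+")

def build_keyword_index(attribute_texts):
    """Map each stemmed keyword to the set of attribute IDs whose text contains it."""
    index = defaultdict(set)
    for attr_id, text in attribute_texts:
        for word in TOKEN_RE.findall(text.lower()):
            index[stem(word)].add(attr_id)
    return index

index = build_keyword_index([
    ("long-description", "Serums that smooth wrinkles"),
    ("short-description", "Anti-wrinkle serum"),
])
```

Looking up `index[stem("wrinkle")]` returns both attributes, because "wrinkles" and "wrinkle" reduce to the same stem.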

Match Search Queries with Priority Classification

The analysis processes customer search queries with revenue data (from CSV) and classifies each query by applying the following checks in order:

1. Brand Keyword Check: First checks if query matches brand-specific keywords from database (e.g., "genifique") → 🏷️ Highest Priority

2. Generic Keyword Check: Checks for category keywords (e.g., "serum", "moisturizer") → 🔸 High Priority

3. Token Matching: Stems the query words (e.g., "wrinkle cream" → wrinkle, cream) and looks them up in the reverse index to find matching attributes

4. Value Assessment: Classifies catalog matches as High Volume (search volume above the 80th percentile) → 📊, High Performance ($1000+ revenue or 5%+ conversion) → ⚡, or Other Catalog → 📝

5. Filter Low Value: Queries below minimum thresholds (<10 searches AND <$50 revenue) are filtered out to focus on meaningful opportunities
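The five steps above can be sketched as a single classification function. Everything here is an assumption made for illustration: the `QueryStats` type, the label strings, the whole-word keyword check, and the choice to apply the low-value filter only after the value assessment (the order in which the steps are listed). The report does not say whether brand and generic matches are also subject to the minimum thresholds.

```python
from dataclasses import dataclass

@dataclass
class QueryStats:
    query: str
    searches: int
    revenue: float
    conversion_rate: float  # fraction, e.g. 0.05 for 5%

# Example keywords taken from the report; the real keyword database is larger.
BRAND_KEYWORDS = {"genifique"}
GENERIC_KEYWORDS = {"serum", "moisturizer"}

def classify(stats, matched_attributes, volume_threshold):
    """Return a priority label for one query, checking the rules in listed order."""
    words = set(stats.query.lower().split())
    if words & BRAND_KEYWORDS:
        return "brand"
    if words & GENERIC_KEYWORDS:
        return "generic"
    if not matched_attributes:
        return "unmatched"
    if stats.searches > volume_threshold:  # above the 80th percentile
        return "high_volume"
    if stats.revenue >= 1000 or stats.conversion_rate >= 0.05:
        return "high_performance"
    if stats.searches >= 10 or stats.revenue >= 50:
        return "other_catalog"
    return "filtered_low_value"

label = classify(QueryStats("wrinkle cream", 100, 2000.0, 0.02), {"c-benefits"}, 500)
```

Here `volume_threshold` is passed in rather than computed, so the same function works for any dataset once its 80th-percentile search count is known.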

Attribute Revenue & Prioritize

When a query matches an attribute, the query's entire revenue is attributed to that attribute. The final report (which this page reads) sums the total revenue for all queries that matched each attribute. This is how long-description ends up with over $93k—it matched thousands of high-revenue queries.
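As a sketch of this attribution rule (the function name and the input shape are assumptions, and the revenue figures are made up for illustration), each query's full revenue is added to every attribute it matched, and attributes are then ranked by their totals:

```python
from collections import defaultdict

def attribute_revenue(matches):
    """matches: iterable of (query, revenue, matched_attribute_ids)."""
    totals = defaultdict(float)
    for _query, revenue, attribute_ids in matches:
        for attr_id in set(attribute_ids):
            # The query's entire revenue is credited to each matched attribute.
            totals[attr_id] += revenue
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

ranking = attribute_revenue([
    ("wrinkle cream", 1200.0, ["long-description", "c-benefits"]),
    ("advent calendar", 800.0, ["long-description"]),
])
```

Because a query that matches several attributes is credited in full to each of them, the per-attribute totals can add up to more than the total search revenue. The figures are best read as a ranking signal, not as a split of revenue.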

The Logic: "Before" vs. "After"

Looking at attribute usage count alone is misleading. An attribute might be used on 10,000 products but drive zero revenue. This analysis cross-matches usage with search revenue to find what *truly* matters.

"Before" Analysis

Top 5 Attributes by Usage Count

"After" Analysis

Top 5 Attributes by Revenue Impact

Conclusion: We filter out "noisy" attributes (like c-dimWeight) to focus on revenue-driving attributes (like long-description).

Top 25 Search-Driving Attributes

Deep Dive: The "Why"

Why is 'long-description' #1?

The data shows that broad, text-heavy attributes like long-description and short-description are overwhelmingly the highest revenue drivers.

This is because they contain the highest density of high-intent keywords. Customers searching for specific products (e.g., "advent calendar") or ingredients ("niacinamide") are matched against these rich-text fields.

Top Matched Keywords

(for the #1 attribute)

P.S. Top Unmatched Queries

These are high-revenue queries that did not match any priority catalog attributes. This represents a content gap and an opportunity for future content optimization.
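One way to produce such a list, sketched with an assumed record shape (dicts with `query`, `revenue`, and `matched_attributes` keys) and made-up sample values:

```python
def top_unmatched(queries, n=10):
    """Return the n highest-revenue queries that matched no catalog attribute."""
    unmatched = [q for q in queries if not q["matched_attributes"]]
    return sorted(unmatched, key=lambda q: q["revenue"], reverse=True)[:n]

sample = [
    {"query": "advent calendar", "revenue": 900.0, "matched_attributes": ["long-description"]},
    {"query": "refill pouch", "revenue": 400.0, "matched_attributes": []},
    {"query": "gift wrap", "revenue": 650.0, "matched_attributes": []},
]
gaps = top_unmatched(sample, n=2)
```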

📊 How This Analysis Works

Understanding the methodology behind these insights

🎯 Overview

This analysis identifies which product catalog attributes are most valuable for improving search results and driving revenue. It combines four data sources to provide actionable insights:

Product Catalog XML - Contains product attributes and their values
Library XML - Contains dynamic content (HTML/text) referenced by catalog attributes
Search Queries CSV - Real customer search data with revenue, orders, and conversion metrics
Keywords Database JSON - Brand-specific and generic keywords for enhanced matching

❌ The Problem

Customers search for products, but many searches return no results because:

Product attributes don't contain the words customers are using
Important searchable content is missing or poorly structured
No clear understanding of which attributes drive the most revenue

✅ The Solution

By analyzing search queries customers actually use (with their revenue), we can:

Identify the most valuable attributes to enhance
Prioritize content improvements based on revenue impact
Discover gaps where customer language doesn't match product data
Optimize search indexing by focusing on high-performing attributes

💰 The Gain

📈

Revenue Recovery

Capture lost revenue from searches with no results

🎯

Content Prioritization

Focus resources on attributes that matter most

📊

Data-Driven Decisions

Use actual customer behavior, not assumptions

✨

Measurable Impact

Track revenue, orders, and conversion improvements

🔑 Key Innovation: Dynamic Content Resolution

Many e-commerce catalogs use content assets to avoid duplication. Instead of storing "Reduces wrinkles, brightens skin" on 50 products, they store it once with a content-id like "retinol-benefits" and reference it.

Example Flow:

1. Catalog XML:

Product: "Night Serum"

c-benefits = "serum-benefits-001"

2. Library XML:

content-id: "serum-benefits-001"

body: "Reduces fine lines and wrinkles. Brightens skin."

3. Application Processing:

✓ Resolves ID → actual text

✓ Tokenizes: ["reduce", "fine", "line", "wrinkle", "brighten", "skin"]

✓ Indexes: wrinkle → c-benefits

4. Customer Query: "wrinkle cream"

✓ Matches token "wrinkle" → c-benefits

✓ MATCH! c-benefits is valuable for wrinkle searches

This means we analyze the real content customers search for, not just content IDs, providing accurate attribution of search value to catalog attributes.
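The example flow above can be reproduced with a small sketch. The library layout (`content` elements with a `body` child), the regex-based tag stripping, the one-word stopword set, and the plural-stripping `normalize` function are all simplifying assumptions; the production system uses NLTK stemming, whose stems look different (for example, a Porter stemmer yields "wrinkl").

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed, simplified library layout; the body holds escaped HTML.
LIBRARY_XML = """
<library>
  <content content-id="serum-benefits-001">
    <body>&lt;p&gt;Reduces fine lines and wrinkles.&lt;/p&gt; Brightens skin.</body>
  </content>
</library>
"""
STOPWORDS = {"and"}  # illustrative subset

def load_library(library_xml):
    """Map content-id -> raw body text."""
    root = ET.fromstring(library_xml)
    return {c.get("content-id"): c.findtext("body", default="") for c in root.iter("content")}

def resolve(value, library):
    """Return the referenced body if value is a known content ID, else the value itself."""
    raw = library.get(value, value)
    return re.sub(r"<[^>]+>", " ", raw)  # crude HTML tag removal

def normalize(word):
    """Stand-in for stemming (assumption): strip a plural/verb 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def tokenize(text):
    return [normalize(w) for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

library = load_library(LIBRARY_XML)
tokens = tokenize(resolve("serum-benefits-001", library))

index = defaultdict(set)
for token in tokens:
    index[token].add("c-benefits")

matched = set()
for token in tokenize("wrinkle cream"):
    matched |= index.get(token, set())
```

With this sample data, the query "wrinkle cream" is credited to c-benefits even though the catalog itself stores only the ID "serum-benefits-001".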

🎯 5-Level Prioritization System

Not all search queries are equal. We rank them using five priority levels, plus a filter for low-value queries:

🏷️

Priority 1: Brand Keyword Matches

Queries containing brand-specific keywords show highest purchase intent

Example: "genifique serum" (contains "genifique")

🔸

Priority 2: Generic Keyword Matches

Queries containing category keywords represent product discovery

Example: "anti-aging serum" (contains "serum")

📊

Priority 3: High Volume Catalog

Catalog matches with search volume above the 80th percentile

Example: Query with 500+ searches

⚡

Priority 4: High Performance Catalog

Catalog matches with high revenue (≥$1000) OR high conversion (≥5%)

Example: Query with $2000 revenue

📝

Priority 5: Other Catalog

Catalog matches that meet the minimum thresholds (10+ searches OR $50+ revenue)

Standard optimization opportunities

🗑️

Filtered Out: Low Value

Below minimum thresholds (<10 searches AND <$50 revenue) - not worth optimization effort
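The two numeric rules in this system, the 80th-percentile volume cutoff and the low-value filter, can be sketched as follows. The function names and sample counts are illustrative, and the percentile uses the standard library's default interpolation method, which may differ slightly from what the production code (built on pandas) computes.

```python
from statistics import quantiles

def volume_threshold(search_counts):
    """80th percentile of per-query search counts (last of the four quintile cut points)."""
    return quantiles(search_counts, n=5)[-1]

def is_low_value(searches, revenue, min_searches=10, min_revenue=50.0):
    """True only when BOTH minimums are missed, matching the Filtered Out rule."""
    return searches < min_searches and revenue < min_revenue

counts = [5, 12, 40, 80, 150, 300, 520, 900, 1500, 4000]
threshold = volume_threshold(counts)
high_volume = [c for c in counts if c > threshold]
```

Note that the filter uses AND while Priority 5 uses OR; the two are complements, so every catalog-matched query lands in exactly one of them unless a higher priority claims it first.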

🔄 Analysis Workflow

Data Ingestion

Load Library XML, Catalog XML, Search CSV, Keywords DB

Content Resolution

Resolve content IDs to actual text, clean HTML, build reverse index

Query Matching

Match queries to attributes, classify priority, aggregate financials

Generate Insights

Sort by priority & metrics, create comprehensive reports, export to CSV/Excel/HTML

🛠️ Technical Stack

Data Processing

• lxml (XML parsing)
• pandas (DataFrames)
• NLTK (NLP, stemming)

Analysis

• Token-based matching
• Reverse indexing (O(1) average-case lookups)
• Financial aggregation

Output

• Plotly (visualization)
• CSV/Excel export
• HTML dashboard

🤖 AI-Powered SFCC Search Enhancements

Beyond catalog analysis, we've implemented AI-driven improvements directly in Salesforce Commerce Cloud to optimize search performance.

🛍️

Armani: Stopwords Optimization

Multi-language stopword additions to improve search precision

🇩🇪

German (de)

41 words added

nur, sehr, oft, immer, nie, gern, vielleicht, weil, denn, obwohl, trotzdem, ohne, gegen, um, mir, dir, ihm, ihn, uns, euch, ihnen, war, waren, warst, wart, wurde, wurden, sei, seid, gewesen, habe, hast, hat, haben, habt, hätte, hättest, hätten, hättet

🇬🇧

English (en)

23 words added

just, get, got, gets, getting, many, much, never, always, often, see, saw, seen, say, says, said, go, goes, gone, went, also, ever, really

🇪🇸

Spanish (es)

44 words added

y, o, u, mas, ni, que, a, de, contra, durante, hacia, hasta, mediante, según, más, menos, no, sí, tampoco, ya, hoy, ayer, mañana, siempre, nunca, jamás, ahora, luego, después, me, te, se, le, les, os, mi, tu, mis, tus, esto, eso, aquello, del, al

🇫🇷

French (fr)

11 words added

jamais, or, voire, allez, veux, veut, voulez, voudrais, voudrait, autres, mal

🇮🇹

Italian (it)

70 words added

sui, sugli, sulle, dal, dai, dagli, dalle, nel, negli, nelle, col, coi, mi, ti, si, ci, vi, ne, gli, li, non, più, meno, come, dove, quando, perché, sempre, mai, poi, prima, dopo, troppo, così, già, è, era, eri, eravamo, eravate, erano, ero, se, né, pure, neppure, oppure, i, gli, stata, state, mio, tuo, suo, miei, tuoi, suoi, mia, tua, sua, mie, tue, sue

🇳🇱

Dutch (nl)

56 words added

de, u, jij, jou, jouw, uw, haar, hen, mijn, onze, jullie, ben, bent, zijn, waren, hebt, heeft, hebben, hadden, doe, doet, doen, deed, deden, mag, moet, wil, kun, kunt, kunnen, op, over, onder, naar, voor, achter, tegen, door, zonder, maar, want, dus, hoewel, tenzij, omdat, hier, daar, waar, niet, ja, nee, misschien, nooit, altijd, vaak, soms, erg, zeer, veel, weinig

🔗

Armani: Synonym Groups

AI-generated synonym mappings to capture search intent across languages

🛍️

Brand & Misspellings

Catching common brand variations and typos

giorgio armani, giorgio, armani, mr armani, armaniy, armanie, gorgio armani, georgio armani, ga

✨

Product Attributes & Benefits

Multi-language benefit keywords

hydrating: hydrate, moisturising, moisturizing, idratante, hidratante, feuchtigkeitsspendend

anti-aging: anti aging, antiage, anti-rides, anti-età, antiedad, anti arrugas

glowing: glow, luminous, illuminateur, luminoso, illuminante, ilumina

long-lasting: long wear, longue tenue, lunga tenuta, larga duración, langer halt

🆕

Product Categories

Makeup and skincare sub-categories

primer: base, pre-base, base trucco

blush: blusher, fard à joues, colorete

highlighter: enlumineur, illuminante, iluminador

nail polish: vernis, smalto, esmalte de uñas

🧪

Key Ingredients

Popular skincare and fragrance ingredients

hyaluronic acid: acide hyaluronique, acido ialuronico

retinol: vitamin a, vitamine a, vitamina a

rose: rose extract, extrait de rose, acqua di rosa

jasmine: jasmin, gelsomino, jazmín

🎁

Gifting & Occasions

Seasonal and holiday search terms

valentine's day: saint valentin, san valentino, valentinstag

mother's day: fête des mères, festa della mamma, muttertag

birthday: anniversaire, compleanno, cumpleaños

father's day: fête des pères, festa del papà, vatertag
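To illustrate how synonym groups like these widen a search, here is a generic sketch of term expansion. It is not a description of how Salesforce Commerce Cloud applies synonyms internally; the `build_lookup` and `expand` helpers are hypothetical, and the groups are copied from the lists above.

```python
SYNONYM_GROUPS = [
    ["hydrating", "hydrate", "moisturising", "moisturizing",
     "idratante", "hidratante", "feuchtigkeitsspendend"],
    ["primer", "base", "pre-base", "base trucco"],
    ["jasmine", "jasmin", "gelsomino", "jazmín"],
]

def build_lookup(groups):
    """Map every term to the full set of terms in its group."""
    lookup = {}
    for group in groups:
        members = frozenset(term.lower() for term in group)
        for term in members:
            lookup[term] = members  # a term in two groups keeps the later one
    return lookup

def expand(term, lookup):
    """Return all equivalent terms, or just the term itself if it has no group."""
    key = term.lower()
    return lookup.get(key, frozenset({key}))

lookup = build_lookup(SYNONYM_GROUPS)
italian_hit = expand("idratante", lookup)
```

A shopper typing the Italian "idratante" is thereby also matched against products described as "hydrating" or "moisturizing".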

💄

YSL: Strategic Stopword & Synonym Management

Brand-aware optimization protecting iconic product names

⚠️ High-Risk Words Identified & Protected

English: "the"

Conflict: YSL product "The Bold" Lipstick

Impact: Search for "The Bold" would become "Bold", potentially returning unwanted results

✓ Action: Removed "the" from stopword list

French: "le", "la", "l'"

Conflict: Iconic products "L'Homme", "La Nuit de L'Homme", "Le Teint"

Impact: "L'Homme" → "Homme", "Le Teint" → "Teint"

✓ Action: Removed from stopword list, ensured product data indexed correctly

French: "or" (gold)

Conflict: Key YSL theme word (gold packaging, fragrances)

✓ Action: Intentionally excluded from French stopword additions
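The effect of these protections can be shown with a small sketch. The helper names and the three-word English default list are illustrative assumptions; the French candidates are the eleven additions listed for Armani, and whitespace tokenization is a simplification that does not handle elided forms such as "L'Homme".

```python
FRENCH_ADDITIONS = ["jamais", "or", "voire", "allez", "veux", "veut",
                    "voulez", "voudrais", "voudrait", "autres", "mal"]
FRENCH_PROTECTED = {"or"}  # "gold": a key YSL theme word

def safe_stopwords(candidates, protected):
    """Drop brand-critical words from a candidate stopword list."""
    return [w for w in candidates if w not in protected]

def strip_stopwords(query, stopwords):
    """Remove stopwords from a whitespace-tokenized, lowercased query."""
    return " ".join(w for w in query.lower().split() if w not in stopwords)

ysl_french = safe_stopwords(FRENCH_ADDITIONS, FRENCH_PROTECTED)

english_default = {"the", "a", "an"}     # illustrative subset
english_ysl = english_default - {"the"}  # "the" protected for "The Bold"

before = strip_stopwords("The Bold", english_default)
after = strip_stopwords("The Bold", english_ysl)
```

With "the" still in the list, the query collapses to "bold"; with it removed, the full product name reaches the search index. The French list shrinks from 11 to 10 words once "or" is excluded.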

✍️ Safe Stopword Additions

After risk analysis, added the same multi-language stopwords as Armani, with brand-specific exclusions noted above.

🇩🇪 German

41 words

🇬🇧 English

23 words

🇪🇸 Spanish

44 words

🇫🇷 French

10 words *

🇮🇹 Italian

70 words

🇳🇱 Dutch

56 words

* Excluded "or" (gold) due to YSL brand importance

🔗 Comprehensive x-default Synonym Rebuild

Complete consolidation: All language-specific lists merged into single x-default with extensive new groups added.

💄

YSL Franchises

black opium, libre, touche eclat, all hours, pure shots, loveshine, rouge volupte, tatouage couture, nu, y, la nuit, mon paris...

📦

Product Categories

lipstick (pintalabios, rouge à lèvres, rossetto, lippenstift), foundation, mascara, serum, moisturizer...

✨

Attributes & Benefits

hydrating (feuchtigkeitsspendend, hidratante, hydratant), matte, radiant, brightening, plumping...

🌿

Key Ingredients

saffron (safran, azafrán, zafferano), hyaluronic acid, glycolic acid, vitamin c, ceramides...

🎁

Gifting & Occasions

mother's day (fête des mères, dia de la madre), valentine's day, christmas, birthday...

🎯 Strategic Advantage

Single x-default list enables cross-language search: German users can find products using French names, Spanish users can search Italian terms, creating a unified multilingual search experience.
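A consolidation of this kind might be sketched as below, assuming the per-locale lists are available as lists of term groups (the `locale_groups` structure and the `merge_groups` helper are hypothetical). Groups that share any term are unioned, so terms from different languages end up in one x-default group.

```python
def merge_groups(locale_groups):
    """Union synonym groups across locales; groups sharing a term are combined."""
    merged = []  # list of disjoint sets
    for groups in locale_groups.values():
        for group in groups:
            terms = {t.lower() for t in group}
            overlapping = [g for g in merged if g & terms]
            for g in overlapping:
                terms |= g
                merged.remove(g)
            merged.append(terms)
    return merged

# Illustrative input using terms from the lists above.
locale_groups = {
    "fr": [["lipstick", "rouge à lèvres"]],
    "it": [["lipstick", "rossetto"]],
    "de": [["lipstick", "lippenstift"], ["saffron", "safran"]],
}
x_default = merge_groups(locale_groups)
```

After the merge, "lippenstift" and "rossetto" sit in the same group, which is what lets a German-language search surface products indexed under an Italian term.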

📈 Expected Impact

🎯

Better Intent Matching

Multi-language synonyms capture customer search intent across all locales

🔍

Reduced Noise

Smart stopword filtering improves result precision while protecting brand terms

💰

Revenue Recovery

Fewer "no results" searches means more conversions and customer satisfaction