Search Revenue Analysis: YSL
Visualizing which catalog attributes contribute the most to search-driven revenue.
The Prioritization Engine: How It Works
This report is powered by a Python system that bridges the gap between customer search queries and your product catalog. It prioritizes attributes not by how *often* they are used, but by how much *revenue* they generate.
Here is the step-by-step logic:
Filter Catalog Attributes
The system first parses the entire Product Catalog XML. It applies a crucial filter to ignore "noisy" attributes (e.g., booleans like is-new, online-flag) and focuses only on rich, descriptive content.
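The filtering step can be sketched roughly as follows. This is a minimal illustration, not the report's actual code: the XML layout (`custom-attribute` elements with an `attribute-id`), the `BOOLEAN_VALUES` set, and the `is_noisy` heuristic are all assumptions, and the standard-library `xml.etree.ElementTree` is used here in place of lxml to keep the sketch dependency-free.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified catalog snippet; a real export has namespaces and more fields.
CATALOG_XML = """
<catalog>
  <product product-id="P1">
    <custom-attribute attribute-id="is-new">true</custom-attribute>
    <custom-attribute attribute-id="online-flag">false</custom-attribute>
    <custom-attribute attribute-id="long-description">Hydrating cream that reduces wrinkles</custom-attribute>
  </product>
</catalog>
"""

# Assumed heuristic: values that look like booleans carry no searchable text.
BOOLEAN_VALUES = {"true", "false", "yes", "no", "0", "1"}

def is_noisy(value):
    """Treat empty and boolean-like values as noise."""
    text = (value or "").strip().lower()
    return text == "" or text in BOOLEAN_VALUES

def descriptive_attributes(xml_text):
    """Map attribute-id -> list of descriptive text values found in the catalog."""
    root = ET.fromstring(xml_text)
    kept = {}
    for node in root.iter("custom-attribute"):
        value = node.text or ""
        if is_noisy(value):
            continue  # skip flags such as is-new and online-flag
        kept.setdefault(node.get("attribute-id"), []).append(value.strip())
    return kept

attributes = descriptive_attributes(CATALOG_XML)
print(attributes)
```

Only long-description survives in this toy catalog; the two flag attributes are dropped before any text analysis happens.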
Build Keyword Index
For all *beneficial* attributes (like long-description), the system reads the text. Using an NLP library (NLTK), it *stems* each word (e.g., "running" and "runs" both become run). It then builds an index mapping each stemmed_keyword to the attributes that contain it.
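As an illustration of the index structure, here is a small sketch. The `naive_stem` function is a deliberately crude stand-in so the example runs without extra packages; in the system described above, a real NLTK stemmer (for example `PorterStemmer().stem`) would take its place and would generally produce different stems. The sample attribute texts are hypothetical.

```python
import re
from collections import defaultdict

def naive_stem(word):
    """Toy suffix stripper standing in for a real stemmer (assumption, not NLTK)."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if word[-1] == word[-2]:  # "runn" -> "run"
            word = word[:-1]
    elif word.endswith("s") and len(word) > 3:
        word = word[:-1]
    return word

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(attributes):
    """Map each stemmed keyword to the set of attribute ids whose text contains it."""
    index = defaultdict(set)
    for attribute_id, texts in attributes.items():
        for text in texts:
            for token in tokenize(text):
                index[naive_stem(token)].add(attribute_id)
    return index

# Hypothetical attribute texts, keyed by attribute id.
sample_attributes = {
    "long-description": ["Hydrating cream that reduces wrinkles"],
    "short-description": ["Anti-wrinkle cream"],
}

index = build_index(sample_attributes)
print(sorted(index["wrinkle"]))
```

Storing attribute ids in a set per keyword keeps the later lookup to a single dictionary access per query token.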
Match Search Queries with Priority Classification
The analysis processes customer search queries with revenue data (from CSV), applying intelligent prioritization:
Each query is tokenized and stemmed (e.g., wrinkle, cream), then looked up in the reverse index to find matching attributes.
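The lookup step might look like the sketch below. The index contents, the `product-name` attribute, and `toy_stem` are hypothetical, and the sketch assumes a query matches an attribute when any one of its tokens is found there; the report does not say whether the real system requires one token or all of them.

```python
import re

# Hypothetical reverse index: stemmed keyword -> attribute ids containing it.
REVERSE_INDEX = {
    "wrinkle": {"long-description", "short-description"},
    "cream": {"long-description", "short-description", "product-name"},
    "niacinamide": {"long-description"},
}

def toy_stem(word):
    """Stand-in stemmer that only strips a trailing plural "s" (assumption)."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def match_query(query, index, stem=toy_stem):
    """Return the attributes that contain at least one stemmed token of the query."""
    matched = set()
    for token in re.findall(r"[a-z0-9]+", query.lower()):
        matched |= index.get(stem(token), set())
    return matched

print(sorted(match_query("Wrinkle creams", REVERSE_INDEX)))
print(match_query("lipstick", REVERSE_INDEX))
```

The query must be stemmed with the same function used to build the index, otherwise tokens such as "creams" would never meet the stored key "cream".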
Attribute Revenue & Prioritize
When a query matches an attribute, the query's entire revenue is attributed to that attribute. The final report (which this page reads) sums the revenue of all queries that matched each attribute. This is how long-description ends up with over $93k: it matched thousands of high-revenue queries.
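A minimal sketch of this attribution rule, using made-up queries and revenue figures and a hypothetical `MATCHES` table in place of the real matching step:

```python
from collections import defaultdict

def attribute_revenue(query_rows, match):
    """Credit each query's full revenue to every attribute it matches, then rank."""
    totals = defaultdict(float)
    for query, revenue in query_rows:
        for attribute_id in match(query):
            totals[attribute_id] += revenue
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical match results and revenue values, for illustration only.
MATCHES = {
    "wrinkle cream": {"long-description", "short-description"},
    "niacinamide serum": {"long-description"},
    "gift card": set(),
}
rows = [("wrinkle cream", 1200.0), ("niacinamide serum", 800.0), ("gift card", 300.0)]

ranking = attribute_revenue(rows, lambda query: MATCHES.get(query, set()))
print(ranking)
```

Because a query's full revenue is credited to every attribute it matches, the per-attribute totals can add up to more than the total search revenue. They are best read as a ranking signal rather than as a split of revenue.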
The Logic: "Before" vs. "After"
Looking at attribute usage count alone is misleading. An attribute might be used on 10,000 products but drive zero revenue. This analysis cross-matches usage with search revenue to find what *truly* matters.
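The contrast between the two rankings can be shown with a toy example. All numbers below are invented for illustration and are not taken from the report.

```python
# Made-up figures for illustration only; they are not taken from the report.
usage_count = {
    "c-dimWeight": 10_000,
    "online-flag": 9_500,
    "long-description": 4_200,
    "short-description": 3_900,
}
search_revenue = {
    "c-dimWeight": 0.0,
    "online-flag": 0.0,
    "long-description": 2_000.0,
    "short-description": 1_200.0,
}

def top_n(scores, n=5):
    """Return the n keys with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

before = top_n(usage_count, n=2)    # ranking by usage count
after = top_n(search_revenue, n=2)  # ranking by matched search revenue
print("By usage:  ", before)
print("By revenue:", after)
```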
"Before" Analysis
Top 5 Attributes by Usage Count
"After" Analysis
Top 5 Attributes by Revenue Impact
Conclusion: We filter out "noisy" attributes (like c-dimWeight) to focus on revenue-driving attributes (like long-description).
Top 25 Search-Driving Attributes
Deep Dive: The "Why"
Why is 'long-description' #1?
The data shows that broad, text-heavy attributes like long-description and short-description are overwhelmingly the highest revenue drivers.
This is because they contain the highest density of high-intent keywords. Customers searching for specific products (e.g., "advent calendar") or ingredients ("niacinamide") are matched against these rich-text fields.
Top Matched Keywords
(for the #1 attribute)
P.S. Top Unmatched Queries
These are high-revenue queries that did not match any priority catalog attributes. This represents a content gap and an opportunity for future content optimization.
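One way such a list could be produced is sketched below. The `match` callable, the sample queries, and the revenue values are hypothetical placeholders for the real matching step and data.

```python
def top_unmatched(query_rows, match, limit=10):
    """Return the highest-revenue queries that matched no attribute."""
    unmatched = [(query, revenue) for query, revenue in query_rows if not match(query)]
    return sorted(unmatched, key=lambda row: row[1], reverse=True)[:limit]

# Hypothetical match results and revenue values, for illustration only.
MATCHES = {"wrinkle cream": {"long-description"}}
rows = [
    ("wrinkle cream", 1200.0),
    ("gift card", 300.0),
    ("refill pouch", 950.0),
]

gaps = top_unmatched(rows, lambda query: MATCHES.get(query, set()), limit=2)
print(gaps)
```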
📊 How This Analysis Works
Understanding the methodology behind these insights
🎯 Overview
This analysis identifies which product catalog attributes are most valuable for improving search results and driving revenue. It combines four critical data sources to provide actionable insights:
- Product Catalog XML - Contains product attributes and their values
- Library XML - Contains dynamic content (HTML/text) referenced by catalog attributes
- Search Queries CSV - Real customer search data with revenue, orders, and conversion metrics
- Keywords Database JSON - Brand-specific and generic keywords for enhanced matching
❌ The Problem
Customers search for products, but many searches return no results because:
- Product attributes don't contain the words customers are using
- Important searchable content is missing or poorly structured
- No clear understanding of which attributes drive the most revenue
✅ The Solution
By analyzing search queries customers actually use (with their revenue), we can:
- Identify the most valuable attributes to enhance
- Prioritize content improvements based on revenue impact
- Discover gaps where customer language doesn't match product data
- Optimize search indexing by focusing on high-performing attributes
💰 The Gain
🔑 Key Innovation: Dynamic Content Resolution
Many e-commerce catalogs use content assets to avoid duplication. Instead of storing "Reduces wrinkles, brightens skin" on 50 products, they store it once with a content-id like "retinol-benefits" and reference it.
Example Flow:
This means we analyze the real content customers search for, not just content IDs, providing accurate attribution of search value to catalog attributes.
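The resolution step could be sketched as follows. The library layout (`content` elements with a `content-id` and a `body`), the helper names, and the rule that an attribute value equal to a known content-id is a reference are all assumptions made for illustration; the standard-library parser is used in place of lxml.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified library snippet; the body holds escaped HTML.
LIBRARY_XML = """
<library>
  <content content-id="retinol-benefits">
    <body>&lt;p&gt;Reduces wrinkles, brightens skin&lt;/p&gt;</body>
  </content>
</library>
"""

def load_library(xml_text):
    """Map content-id -> raw body text (HTML) for every content asset."""
    root = ET.fromstring(xml_text)
    return {c.get("content-id"): (c.findtext("body") or "") for c in root.iter("content")}

def strip_html(html):
    """Remove tags and collapse whitespace, leaving only the visible text."""
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", html)).strip()

def resolve(value, library):
    """Return the referenced asset's text if value is a known content-id, else value itself."""
    if value in library:
        return strip_html(library[value])
    return value

library = load_library(LIBRARY_XML)
print(resolve("retinol-benefits", library))
print(resolve("Plain text value", library))
```

Resolving references before indexing means the keywords "wrinkles" and "skin" are credited to the catalog attribute that points at the asset, rather than the opaque id "retinol-benefits".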
🎯 5-Level Prioritization System
Not all search queries are equal. We use sophisticated multi-factor prioritization:
🔄 Analysis Workflow
🛠️ Technical Stack
- lxml (XML parsing)
- pandas (DataFrames)
- NLTK (NLP, stemming)
- Token-based matching
- Reverse indexing (O(1))
- Financial aggregation
- Plotly (visualization)
- CSV/Excel export
- HTML dashboard
🤖 AI-Powered SFCC Search Enhancements
Beyond catalog analysis, we've implemented AI-driven improvements directly in Salesforce Commerce Cloud to optimize search performance.
Armani: Stopwords Optimization
Multi-language stopword additions to improve search precision
Armani: Synonym Groups
AI-generated synonym mappings to capture search intent across languages
YSL: Strategic Stopword & Synonym Management
Brand-aware optimization protecting iconic product names