This week I ended up reading a couple of recent articles on the topic of search. Not the groundbreaking-paper kind, but rather down-to-earth field implementations. Below, I'll go through the paid search challenges at two major online platforms, and then turn to the emerging role of the Relevance Engineer.
Shopping upsells on Pinterest. An interesting story. Let me decompose it into the common steps seen across data projects.
A simple problem to solve: introduce ads into the search results. They call it "shopping upsells". Imagine you need to build a shopping upsell model.
Step 1. Get Data.
Where do you get the data for a feature that doesn't yet exist on the platform?
- One approach: randomly display upsells for a portion of all queries. However, this way product quality gets mixed up with the user's shopping intent: it is not clear whether the user doesn't want to buy in general or just doesn't like this particular ad.
- A better approach: embed products in both the upsell and organic sections, but hide prices in organic. This way it is possible to distill the user's intent and make the data less noisy.
Step 2. Get Model.
You've got data, now get a model.
- Use business knowledge to come up with a smart objective. Clicks on products are usually noisy, but a good start. It is much better to assign proper weights to strong signals and combine them smartly. Pinterest uses pins and clicks to partner sites.
- Model architecture:
Query -> Embedding -> Encoder -> Dense -> Log Loss
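To make the pipeline above concrete, here is a minimal sketch in plain Python. It is not Pinterest's implementation: the mean-pooling encoder, the toy embedding table, the signal weights and all function names are assumptions made purely for illustration. It shows a query passing through embedding, encoder and dense layers, and a log loss in which stronger engagement signals carry more weight.

```python
import math

# Hypothetical weights: stronger purchase-intent signals count more.
SIGNAL_WEIGHTS = {"click": 1.0, "pin": 2.0, "partner_click": 3.0}

def example_weight(signals):
    """Combine the signals seen on one impression into a training weight."""
    return sum(SIGNAL_WEIGHTS[s] for s in signals) or 1.0

def embed(query, table, dim):
    """Embedding: one vector per token; unknown tokens map to zeros."""
    return [table.get(tok, [0.0] * dim) for tok in query.lower().split()]

def encode(vectors, dim):
    """Encoder (assumed here to be mean pooling): one vector per query."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def dense(x, weights, bias):
    """Dense layer + sigmoid: probability of shopping intent."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(p, label, weight=1.0, eps=1e-12):
    """Weighted binary cross-entropy for a single example."""
    p = min(max(p, eps), 1.0 - eps)
    return -weight * (label * math.log(p) + (1 - label) * math.log(1.0 - p))

def predict(query, table, weights, bias):
    dim = len(weights)
    return dense(encode(embed(query, table, dim), dim), weights, bias)

# Toy parameters; in practice these would be learned from the collected data.
table = {"red": [1.0, 0.0], "dress": [0.5, 1.0], "recipe": [-1.0, -1.0]}
weights, bias = [1.0, 2.0], 0.0

p_shop = predict("red dress", table, weights, bias)
p_recipe = predict("recipe", table, weights, bias)
loss = log_loss(p_shop, 1, weight=example_weight(["click", "partner_click"]))
```

With these toy numbers the shopping-like query scores well above 0.5 and the recipe query well below it. The point is only that the whole pipeline fits in a few small functions.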
New practitioners are often disappointed to see such simple architectures after all the ResNets and RNNs they've just studied. But complexity and state-of-the-art results are often the wrong things to chase for most businesses.
Step 3a. Get Results.
"After launching the experiment, the model increased more than 2X traffic to the shopping search page without hurting overall search metrics in terms of long clicks or saves. The model also increased more than 2X product impressions and product long clicks through the upsell."
Step 3b. Hack Production.
Having the results, you now need to hack the costs to get the "model economics" right.
- For example, they smartly precompute head queries and filter out "non-shoppable categories, such as 'recipe' or 'finance'."
My bet is that Pinterest didn't come up with these optimizations from the beginning. Usually, it's a loop over steps 2 to 3b until you get all the components right. This often-overlooked cycle of small adjustments, in this case, made it possible to reduce model serving traffic by 70% 🤯
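The two serving optimizations mentioned above, precomputing head queries and skipping non-shoppable categories, can be sketched as follows. This is a rough illustration under stated assumptions, not Pinterest's serving code: `UpsellServer`, `classify` and the stand-in model are hypothetical names, and only the two category examples come from the article.

```python
# Category examples quoted in the article; a real list would be longer.
NON_SHOPPABLE = {"recipe", "finance"}

class UpsellServer:
    """Hypothetical serving wrapper that avoids unnecessary model calls."""

    def __init__(self, model, classify, head_queries):
        self.model = model          # query -> upsell score
        self.classify = classify    # query -> category name
        self.model_calls = 0        # online model calls only
        # Precompute scores offline for frequent (head) queries.
        self.cache = {
            q: model(q)
            for q in head_queries
            if classify(q) not in NON_SHOPPABLE
        }

    def score(self, query):
        # 1. Non-shoppable categories never reach the model.
        if self.classify(query) in NON_SHOPPABLE:
            return 0.0
        # 2. Head queries are answered from the precomputed cache.
        if query in self.cache:
            return self.cache[query]
        # 3. Only the remaining tail queries cost an online model call.
        self.model_calls += 1
        return self.model(query)

# Stand-ins for a real model and a real query classifier.
def fake_model(query):
    return 0.9

def fake_classify(query):
    return "recipe" if "recipe" in query else "fashion"

server = UpsellServer(fake_model, fake_classify, ["red dress", "pasta recipe"])
head_score = server.score("red dress")       # served from cache
recipe_score = server.score("pasta recipe")  # filtered out
tail_score = server.score("vintage lamp")    # falls through to the model
```

In this toy run only one of the three requests triggers an online model call, which is where serving savings of this kind come from.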
eBay
eBay published an article on balancing paid and non-paid content in their search results.
The basic idea is that fixed paid slots are bad for both:
- head queries, for which there is far more paid content than can fit
- tail queries, for which there is often not enough high-quality paid content
The solution? Get rid of the fixed paid slots and rank the whole search result according to "relevancy". Here is a more detailed summary:
@ebaytech has recently released an article on balancing the paid and non-paid content in their search result page (thread) #ecommerce #Search #marketplace https://t.co/REualIf6Aq
Elias Nema (@EliasNema), July 23, 2020
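The difference between fixed paid slots and a single relevance-ranked list can be sketched with a toy example. This is an illustration only, not eBay's ranking: the `Item` type, the single `relevance` score and both ranking functions are assumptions, and a production system would likely blend more signals than one number.

```python
from dataclasses import dataclass

@dataclass
class Item:
    title: str
    relevance: float  # assumed single relevance score in [0, 1]
    paid: bool

def rank_fixed_slots(items, paid_slots, page_size):
    """Reserve the top `paid_slots` positions for paid items, then organic."""
    by_rel = sorted(items, key=lambda i: i.relevance, reverse=True)
    paid = [i for i in by_rel if i.paid][:paid_slots]
    organic = [i for i in by_rel if not i.paid]
    return (paid + organic)[:page_size]

def rank_unified(items, page_size):
    """Rank paid and organic items together by relevance alone."""
    return sorted(items, key=lambda i: i.relevance, reverse=True)[:page_size]

# A tail query: plenty of good organic results, only one weak ad.
tail_items = [
    Item("organic A", 0.9, paid=False),
    Item("organic B", 0.8, paid=False),
    Item("weak ad", 0.2, paid=True),
]

fixed = [i.title for i in rank_fixed_slots(tail_items, paid_slots=1, page_size=2)]
unified = [i.title for i in rank_unified(tail_items, page_size=2)]
```

With fixed slots the weak ad takes the first position and pushes a better organic result off the page. With unified ranking it simply does not make the cut. For a head query the effect is reversed: several strong paid items can appear instead of being capped by the number of slots.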
🕵️‍♂️ DS or ML? RE!
Another interesting take on careers in the data field from one of the most famous search practitioners. A couple of highlights:
- Who is a relevance engineer: someone who "implements information retrieval algorithms that solve user information needs in real time, at scale"
- Applied approach: "don't chase the state of the art unnecessarily, rather they prefer proven techniques for 80% of the problem", "don't solve search for Kaggle points or academia, but for real companies and users"
- How it's different from an ML engineer: both roles are very similar, with relevance engineers tending to be more user-centric and focused on IR problems (ML is broader and its problems are not necessarily user-facing)
I think the role will become more popular going forward, as many companies realize the need for and value of showing relevant content to users whose attention spans keep shrinking.