Scour - May Update
Hi friends,
In May, Scour scoured 865,266 posts from 28,671 feeds (1,766 of which were newly added), and 260 new users signed up, bringing Scour past the 3,000-user mark!
Here's what's new in the product:
Smarter Interest Matching
Scour is now better at finding posts that match your interests. You should see more relevant content and far fewer off-topic articles in your feed. (This sounds simple, but it represents at least a full month's effort.)
The way this works under the hood is one of the biggest changes I've made to Scour's core ranking system since I started working on it. At a high level, scoring now combines Scour's original fuzzy concept matching (embedding vector distance) with a measure of how much an article uses vocabulary relevant to your interests (lexical search). These ingredients are well established, but I think the exact way Scour implements and combines them may be a somewhat novel system design.
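To make the two ingredients concrete, here is a minimal sketch of a hybrid score. It is not Scour's formula: the linear blend, the 0.6 weight, the crude "keyword coverage" lexical signal, and all function names are illustrative assumptions. The semantic part is cosine similarity (1 minus cosine distance) between a post embedding and an interest embedding, and the function returns the parts alongside the total so a semantic/lexical breakdown can be shown.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two embeddings (1 minus the cosine distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_coverage(post_tokens: list[str], keywords: set[str]) -> float:
    """Crude lexical signal (assumption): fraction of the interest's keywords present in the post."""
    if not keywords:
        return 0.0
    return len(keywords & set(post_tokens)) / len(keywords)


def hybrid_score(post_vec, interest_vec, post_tokens, keywords, semantic_weight=0.6):
    """Linearly blend both signals and keep the parts for a score breakdown.

    The linear blend and the 0.6 weight are hypothetical tuning choices.
    """
    semantic = cosine_similarity(post_vec, interest_vec)
    lexical = keyword_coverage(post_tokens, keywords)
    total = semantic_weight * semantic + (1.0 - semantic_weight) * lexical
    return {"total": total, "semantic": semantic, "lexical": lexical}


breakdown = hybrid_score(
    post_vec=[0.1, 0.9, 0.2],
    interest_vec=[0.2, 0.8, 0.1],
    post_tokens="bm25 still beats dense retrieval on msmarco".split(),
    keywords={"bm25", "lexical", "retrieval"},
)
print(breakdown)
```

Keeping the components separate, rather than returning only the blended number, is what makes it possible to tell whether a post ranked highly because it is conceptually close or because it actually uses the interest's vocabulary.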
This was so complex to build because existing approaches to lexical search did not work for Scour. Every Scour user has anywhere from a handful to hundreds of interests (I have 642), and each interest might have 3 to 10 or more relevant keywords. That means every "search" is actually a search for thousands of terms (around 5,000 for my feed), whereas most search systems are built for individual queries with a handful of terms.
An even trickier issue is that lexical search algorithms like BM25 do not produce scores that are comparable across queries, because they are designed for ranking (ordering results for a specific query), not scoring. Scour, however, needs to know which of your interests a given post is most related to, and it sorts the posts in your feed by how relevant each one is to any of your interests. I believe the custom scoring and indexing system Scour now uses provides both cross-query score comparability and efficient lookup for thousands of parallel queries. Stay tuned for more details!
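Scour's actual indexing and scoring scheme hasn't been described yet, so the following is only a generic sketch of the two problems above, with names and choices that are assumptions. First, instead of indexing posts and running thousands of queries, it builds a reverse index from keyword to the interests that list it (the hypothetical `InterestIndex`), so scoring a post costs one pass over the post's distinct terms. Second, since raw BM25 sums grow with the number and rarity of an interest's keywords, it divides each interest's score by that interest's theoretical maximum: BM25's term-frequency factor saturates at `K1 + 1`, so the sum of `idf * (K1 + 1)` over the interest's keywords bounds the score. This simple normalization puts every interest on a 0-to-1 scale, though it is only one possible approach to comparability.

```python
import math
from collections import Counter, defaultdict

K1, B = 1.2, 0.75  # commonly used BM25 defaults


def idf(term: str, doc_freq: dict[str, int], n_docs: int) -> float:
    """BM25 IDF with +1 inside the log (as in Lucene), so it is always positive."""
    df = doc_freq.get(term, 0)
    return math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))


class InterestIndex:
    """Hypothetical reverse index: keyword -> interests that list it."""

    def __init__(self, interests: dict[str, list[str]], doc_freq: dict[str, int],
                 n_docs: int, avg_doc_len: float):
        self.doc_freq, self.n_docs, self.avg_doc_len = doc_freq, n_docs, avg_doc_len
        self.by_term: dict[str, list[str]] = defaultdict(list)
        self.max_score: dict[str, float] = {}
        for interest, keywords in interests.items():
            unique = set(keywords)
            for kw in unique:
                self.by_term[kw].append(interest)
            # Each term's tf factor is strictly below (K1 + 1), which bounds the whole score
            self.max_score[interest] = sum(idf(kw, doc_freq, n_docs) * (K1 + 1) for kw in unique)

    def score_post(self, tokens: list[str]) -> dict[str, float]:
        """Score one post against every interest in a single pass over its distinct terms."""
        tf = Counter(tokens)
        length_norm = K1 * (1 - B + B * len(tokens) / self.avg_doc_len)
        raw: dict[str, float] = defaultdict(float)
        for term, count in tf.items():
            interests = self.by_term.get(term)
            if not interests:
                continue  # term is not a keyword of any interest
            weight = idf(term, self.doc_freq, self.n_docs) * count * (K1 + 1) / (count + length_norm)
            for interest in interests:
                raw[interest] += weight
        # Divide by each interest's upper bound so scores fall in [0, 1) and can be compared
        return {i: s / self.max_score[i] for i, s in raw.items()}


index = InterestIndex(
    interests={"search": ["bm25", "lexical", "retrieval"], "rust": ["rust", "borrow", "lifetimes"]},
    doc_freq={"bm25": 40, "lexical": 120, "retrieval": 300, "rust": 500, "borrow": 80, "lifetimes": 60},
    n_docs=10_000,
    avg_doc_len=12.0,
)
scores = index.score_post("why bm25 and lexical retrieval still beat dense retrieval".split())
print(scores)  # only "search" matches; its value is a fraction of that interest's maximum
```

The document-frequency numbers above are made up for the example. With a real corpus, they would come from whatever post statistics the system already keeps.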
Help me out! Please like, dislike, and report posts as off-topic as you're browsing. These signals help me tune the system and figure out the edge cases where it could be improved.
Better Title Keyword Bolding
Scour bolds keywords in post titles to make the feed easier to skim. The new lexical scoring layer discussed above makes it easier to bold exactly the words related to your interests.
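A minimal sketch of the bolding step, assuming the keywords that matched an interest are already known from lexical scoring. The function name `bold_keywords` is hypothetical, and emitting HTML `<b>` tags is an assumption about how titles are rendered. Matching is case-insensitive and whole-word only, and longer keywords are tried first so a phrase like "vector database" is bolded as a unit instead of just "vector".

```python
import html
import re


def bold_keywords(title: str, keywords: set[str]) -> str:
    """Wrap whole-word, case-insensitive keyword matches in <b> tags; HTML-escape everything."""
    if not keywords:
        return html.escape(title)
    # Longest keywords first so multi-word phrases win over their single-word prefixes
    alternatives = [re.escape(k) for k in sorted(keywords, key=len, reverse=True)]
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    out, last = [], 0
    for match in pattern.finditer(title):
        out.append(html.escape(title[last:match.start()]))
        out.append("<b>" + html.escape(match.group(0)) + "</b>")
        last = match.end()
    out.append(html.escape(title[last:]))
    return "".join(out)


title = "Your Vector Database Doesn't Know What Similar Means"
print(bold_keywords(title, {"vector", "vector database", "similar"}))
# Your <b>Vector Database</b> Doesn&#x27;t Know What <b>Similar</b> Means
```

One limitation of this sketch: `\b` word boundaries only work for keywords that start and end with word characters, so a keyword like "C++" would need a different boundary rule.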
Peeking Under the Hood
Two other small changes let you peek under the hood of the new scoring system. On desktop, hovering over a post's title shows how its score breaks down between semantic and lexical matching. Separately, if you click an interest tag to open the single-interest page, there is now an Advanced link that shows the terms the lexical scoring system uses to find and rank posts.
Some of My Favorite Posts
Here are some of my favorite posts I found on Scour in May (you can tell from the topic concentration where my mind has been!):
- Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies
- Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?
- Re-autoresearching MSMARCO BM25, on Vespa
- How we made a SQL query optimization agent 59% more accurate using autoresearch and LLM Observability
- Your Vector Database Doesn't Know What Similar Means
- My Plan with RSS
- Agentic Coding is a Trap
Happy Scouring!
- Evan