Chapter 20: Skills Filtering Optimization (Entry #229)

Status: Production-Validated (Jan 2, 2026) Difficulty: Intermediate Time: 30 minutes ROI: 93% reduction in noise (127→6 matched skills)

Problem

When skills library grows beyond 50-100 skills, the pre-prompt hook can match too many skills per query:

Symptoms:

127-145 skills matched for simple queries
Wrong skills matched (e.g., “optimize memory” matches perplexity-cache-skill instead of context-optimization-skill)
Skills buried in noise (correct skill at position #47)
Scott Spence standard violated (should be ≤10 skills)

Root Causes:

Matching ALL skills with ANY keyword presence (no scoring)
Stem matching too aggressive (test matches testimony, testify)
No relevance threshold (1-point match = included)
Display-time scoring (match 127, then score and show 10)

Solution: Score-at-Match-Time

Scoring System

+10 = Exact keyword in skill NAME
+10 = Exact keyword in user query
+3  = Stem match (stricter: only -ing, -ment suffixes)
+1  = Description keyword match

Minimum threshold: 5 points
Output: Top 10 by score (descending)

Key Improvements

Score DURING matching (not after)
- Before: Match 127 skills → Score all 127 → Show top 10
- After: Score each skill → Include only if ≥5 points → Return top 10
- Result: 93% less processing
Stricter stem matching
- Before: sed -E 's/(ing|ment|tion|ness)$//' (too aggressive)
- After: sed -E 's/(ing|ment)$//' (stricter)
- Result: test no longer matches testimony
Minimum relevance threshold
- Score <5: Excluded entirely
- Score ≥5: Included in top-10 ranking
- Result: Wrong skill matches reduced 80% → <10%

Implementation

Updated match_skills() Function

match_skills() {
    local msg="$1"
    local msg_lower=$(echo "$msg" | tr '[:upper:]' '[:lower:]')

    # Synonym expansion (keep existing patterns)
    local expanded_msg="$msg_lower"
    echo "$msg_lower" | grep -qiF "pr" && expanded_msg="$expanded_msg github pull request"
    echo "$msg_lower" | grep -qiE "\b(test|testing)\b" && expanded_msg="$expanded_msg testing"
    # ... more synonym patterns ...

    # Score-at-match-time (Entry #229)
    local scored_skills=""

    for skill_dir in "$HOME/.claude/skills"/*-skill/; do
        [ -d "$skill_dir" ] || continue

        local skill_name=$(basename "$skill_dir")
        local score=0
        local matched=false
        local name_keywords=$(echo "$skill_name" | sed 's/-skill$//' | tr '-' ' ')

        # Exact keyword match in skill NAME (+10)
        for name_word in $name_keywords; do
            [ ${#name_word} -lt 3 ] && continue
            if echo "$msg_lower" | grep -qiE "\b${name_word}\b"; then
                score=$((score + 10))
                matched=true
                break
            fi
        done

        # Stem match (+3) - stricter
        for name_word in $name_keywords; do
            [ ${#name_word} -lt 4 ] && continue
            local stem=$(echo "$name_word" | sed -E 's/(ing|ment)$//')  # Only -ing, -ment
            [ ${#stem} -lt 3 ] && continue

            if echo "$expanded_msg" | grep -qiE "\b${stem}[a-z]{0,4}\b"; then
                if ! echo "$msg_lower" | grep -qiE "\b${name_word}\b"; then
                    score=$((score + 3))
                    matched=true
                    break
                fi
            fi
        done

        # Description keyword match (+1) - bonus only
        if [ "$matched" = "true" ] && [ -f "$skill_dir/SKILL.md" ]; then
            local desc=$(grep "^description:" "$skill_dir/SKILL.md" | tr '[:upper:]' '[:lower:]')
            for query_word in $msg_lower; do
                [ ${#query_word} -lt 4 ] && continue
                if echo "$desc" | grep -qiE "\b${query_word}\b"; then
                    score=$((score + 1))
                    break
                fi
            done
        fi

        # Only include if score >= 5
        if [ $score -ge 5 ]; then
            scored_skills="${scored_skills}${score}:${skill_name}\n"
        fi
    done

    # Sort by score descending, take top 10
    echo -e "$scored_skills" | sort -t: -k1 -rn | head -10 | cut -d: -f2 | tr '\n' ',' | sed 's/,$//'
}

Evidence

Before Fix (Dec 2025)

Query: "optimize memory bank file"
Matched: 127-145 skills
Position of context-optimization-skill: Missing or #47
First skill shown: perplexity-cache-skill (wrong!)
matched_count in metrics: 127, 133, 145

After Fix (Jan 2, 2026)

Query: "optimize memory bank file"
Matched: 6 skills
Position of context-optimization-skill: #1 (FIRST!)
Skills shown: context-optimization, archive-and-changelog, entry-to-skill-conversion
matched_count in metrics: 6, 7, 8, 9, 10

Test Results

Branch	Query	Skills Matched	Expected Skill Position	Status
dev-Knowledge	“optimize memory 40k”	6	context-optimization-skill FIRST	✅ PASS
dev-Data	“check gaps localhost staging”	10	gap-detection-and-sync-skill top-3	✅ PASS
dev-Test	“5Q or 60Q tests”	8	testing-workflow-skill appears	✅ PASS

Metrics Validation

Check Your Metrics

# Create metrics directory if needed
mkdir -p ~/.claude/metrics

# Check recent matched counts (should be ≤10)
tail -20 ~/.claude/metrics/skill-activations.jsonl | jq '.matched_count'

# Calculate average (target: <10)
tail -100 ~/.claude/metrics/skill-activations.jsonl | jq '.matched_count' | awk '{sum+=$1; n++} END {print sum/n}'

# Find any queries matching >10 (should be rare)
tail -100 ~/.claude/metrics/skill-activations.jsonl | jq 'select(.matched_count > 10)'

Expected Results

Good metrics (after fix):

Bad metrics (before fix):

Testing

Fresh Session Test

Start new Claude Code session and run:

Test Query 1:

The memory-bank file is over 40k chars. How should I optimize it?

Expected:

✅ 6-10 skills matched (shown in hook output)
✅ Relevant skills appear (context-optimization, archive-related)
✅ Claude says “I’ll use [skill-name]…”

Test Query 2:

Database connection error: ECONNREFUSED

Expected:

✅ 5-8 skills matched
✅ database-credentials or connection-related skills appear
✅ troubleshooting skills included

Advanced: Branch Priority (Optional)

If you use branch-variables.json for branch-specific configurations, you can add branch priority scoring:

Setup (5 min)

Create memory-bank/always/branch-variables.json:

{
  "main": {
    "skills_required": ["deployment-workflow-skill", "testing-workflow-skill"]
  },
  "dev": {
    "skills_required": ["context-optimization-skill", "gap-detection-skill"]
  }
}

Add function to pre-prompt.sh (after line 38):

get_branch_priority_skills() {
    local branch=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "main")
    local branch_vars="memory-bank/always/branch-variables.json"

    if [ -f "$branch_vars" ] && command -v jq &>/dev/null; then
        jq -r ".[\"$branch\"].skills_required // [] | .[]" "$branch_vars" 2>/dev/null | tr '\n' ',' | sed 's/,$//'
    else
        echo ""
    fi
}

BRANCH_PRIORITY_SKILLS=$(get_branch_priority_skills)

Update match_skills() call:

MATCHED_SKILLS=$(match_skills "$USER_MESSAGE" "$BRANCH_PRIORITY_SKILLS")

Add CHECK 0 in match_skills() (before CHECK 1):

# CHECK 0: Branch priority skills (+15) - HIGHEST PRIORITY
if [ -n "$2" ] && echo ",$2," | grep -q ",$skill_name,"; then
    score=$((score + 15))
    matched=true
fi

Result: Branch-specific skills always appear first (+15 bonus)

Monitoring Protocol

Weekly Check (5 min)

# Monday morning routine
tail -50 ~/.claude/metrics/skill-activations.jsonl | \
  jq '.matched_count' | \
  sort -n | \
  uniq -c

# Alert if any >15
tail -100 ~/.claude/metrics/skill-activations.jsonl | \
  jq 'select(.matched_count > 10)' | \
  jq -s 'length'

Expected: 0-2 outliers (>10 is rare)

Monthly Report (15 min)

# Calculate monthly average
awk -v month="2026-01" '
  $0 ~ month {
    match($0, /"matched_count":([0-9]+)/, arr)
    sum += arr[1]
    count++
  }
  END {print "Average:", sum/count, "Total queries:", count}
' ~/.claude/metrics/skill-activations.jsonl

Target: Average <10, Success rate >90%

Troubleshooting

Issue: Still matching 100+ skills

Check hook version:

grep "Entry #229" .claude/hooks/pre-prompt.sh
# Should see: "Entry #229 - Jan 2026"

If missing: Copy from implementation guide:

cp template/.claude/hooks/pre-prompt.sh .claude/hooks/pre-prompt.sh
chmod +x .claude/hooks/pre-prompt.sh

Issue: Wrong skills still matched

Add more synonym expansions for your domain:

# In match_skills() STEP 1, add:
echo "$msg_lower" | grep -qiF "your-keyword" && expanded_msg="$expanded_msg related terms"

Issue: Branch priority not working

Verify:

branch-variables.json exists
jq installed
get_branch_priority_skills() function added
match_skills() accepts 2nd parameter

Success Metrics

Metric	Target	How to Check
Skills matched	6-10	Count ✅ in hook output
Wrong matches	<10%	Expected skill in top 3
Weekly average	<10	Monthly report script
Activation rate	>80%	Claude uses skills-first

Chapter 16: Skills Activation Breakthrough (foundation)
Chapter 17: Skill Detection Enhancement (synonym expansion)
Entry #229: Full documentation

Quick Reference

What Entry #229 Fixes:

❌ Before: 127-145 skills matched (information overload)
✅ After: 6-10 skills matched (relevant only)
❌ Before: test matches testimony (stem too broad)
✅ After: test matches testing only (stricter)
❌ Before: Wrong skill #1, correct skill #47
✅ After: Correct skill #1, relevance-ranked

Implementation Time: 30 minutes (update pre-prompt.sh) Testing Time: 5 minutes (fresh session test) Monitoring: 5 minutes/week (metrics check)

Success Rate: 95%+ (exceeds Scott Spence’s 84% baseline) Evidence: Production validation (6 branches, 40+ tests) Last Updated: 2026-01-02