Day Whatever

After a couple of days of experimenting, I’ve finally got semantic search deployed and working on a test build. It was remarkably easy in the end.

I started by chunking the decision notices by section and paragraph, into chunks of around 1,800 characters that respect headings. Then I built a sentence-chunked version for comparison. Each variant (Pagefind text, semantic big chunks, semantic small chunks) has its pros and cons. The small chunks are great for precise searching, but you lose some of the surrounding context. A broad query like “cases where the public interest favoured disclosure of internal advice” comes back as fragments, which is not ideal. The larger chunks keep the context, but bury the sentence you actually wanted under a heap of noise. Their snippets are also less useful than seeing the most relevant sentences directly.
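For anyone curious what heading-aware chunking looks like, here is a minimal sketch rather than the exact code behind this build. It assumes the notices are plain text with Markdown-style `#` headings and blank lines between paragraphs; the function name `chunk_notice` and the dictionary shape are made up for illustration.

```python
import re

MAX_CHARS = 1800  # approximate chunk size mentioned above


def chunk_notice(text: str, max_chars: int = MAX_CHARS) -> list[dict]:
    """Group paragraphs into chunks that never cross a heading.

    Assumption: headings start with '#' and every block is separated
    from its neighbours by a blank line.
    """
    chunks: list[dict] = []
    heading = ""
    buffer: list[str] = []
    size = 0

    def flush() -> None:
        # Emit the buffered paragraphs as one chunk under the current heading.
        nonlocal buffer, size
        if buffer:
            chunks.append({"heading": heading, "text": "\n\n".join(buffer)})
        buffer, size = [], 0

    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            flush()  # close the previous section before switching heading
            heading = block.lstrip("#").strip()
            continue
        if buffer and size + len(block) > max_chars:
            flush()  # adding this paragraph would overshoot the target
        buffer.append(block)
        size += len(block) + 2  # +2 for the joining blank line
    flush()
    return chunks
```

A single paragraph longer than `max_chars` is kept whole in this sketch, so the limit is a target rather than a hard cap. Swapping the paragraph split for a sentence split would give the small-chunk variant.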

Which one works best will depend on who is searching and why. Someone who already has a case reference and just wants the notice would be better off with keyword matching. I kind of want to keep all three, but there is probably a tradeoff to be found. I’m leaning towards a mixture, but need to think a bit more about how that would work in practice.
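One possible shape for a mixture, and only one of several, is reciprocal rank fusion: each search variant returns its own ranked list of notice IDs, and a notice scores higher the nearer the top it appears in each list. This is a sketch of that general technique, not something deployed here; the notice IDs and the constant `k = 60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of IDs into one combined ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher positions contribute more; k damps the top-rank advantage.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the three variants for a single query.
keyword_hits = ["n1", "n2", "n3"]
big_chunk_hits = ["n2", "n1", "n4"]
small_chunk_hits = ["n2", "n5", "n1"]

fused = reciprocal_rank_fusion([keyword_hits, big_chunk_hits, small_chunk_hits])
```

Because it only looks at rank positions, this avoids having to compare keyword scores with embedding similarities directly. It would not help the case-reference lookup much, where a plain keyword match is probably still the right answer.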

Thankfully, none of this is a blocker to the guidance pages, which are ready for third-party feedback and proofing.