
Outliers

I’ve been thinking about outliers a lot this week, after a hypothetical scenario came true. UK FOI law requires that you use an acceptable form of your real name when making a request - John Smith or Mr Smith, but not John. I came across a person with a mononym who does not use a title, and who was encountering friction when trying to ask for information. Even when they started to explain in their requests that they only have one name, the responses they got back defaulted to refusing to answer until they provided an identifier that they don’t have. Mononyms are more common in some cultures, yet there is a presumption that you must have two names to tick the proper boxes. As in many areas of life, there is little flexibility for those who don’t fit the template. I was reminded of the essay about the falsehoods programmers believe about names.

I found the mononymous user whilst classifying FOI requests, which is just looking at what a response says and pushing a button to say what it is. I’ve done 75,000 now, which is a large number by most measures, yet a negligible amount when set against the backlog. I find it calming. I like stumbling across interesting releases and there is no better way to gain an understanding of UK FOI as it is in practice versus how some soulless seminar somewhere will sell it to you.

I am so far ahead of the next closest person in terms of the number categorised that I am the clear outlier here. If anyone eventually gets around to running request statistics, I’ve done enough that my personal preferences will skew the data in a way that will need controlling for. There are plenty of scenarios where I know that I disagree with the typical requester about what label to put on things. Over the years, I’ve reversed my position a few times as well, so I can’t even claim to have brought consistency.

Recently, I have experimented with using AI to pre-classify requests based on what the model thinks they are. If the classification system were simpler, then I think that it would excel at this task. As it stands, it’s more complex than it strictly needs to be, and the replies are full of nuance. I agree with the AI’s predictions about 80% of the time. It is better at some categories than others, but it can get confused by the edge cases where users or authorities do unexpected things.
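For the curious, the mechanics are unremarkable. Below is a minimal sketch of what a pre-classification pass can look like, assuming an OpenAI-style chat API and a simplified, illustrative set of labels; the model name, the categories and the prompt are all placeholders rather than what actually runs.

```python
# Illustrative sketch only: the real categories, prompt and model are not these.
from openai import OpenAI

CATEGORIES = [
    "successful",
    "partially successful",
    "refused",
    "information not held",
    "clarification needed",
]

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def pre_classify(response_text: str) -> str:
    """Ask the model for its best guess at a category for one FOI response."""
    prompt = (
        "Classify the following FOI response into exactly one of these "
        f"categories: {', '.join(CATEGORIES)}. Reply with the category only.\n\n"
        f"{response_text}"
    )
    result = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the guesses repeatable
    )
    guess = result.choices[0].message.content.strip().lower()
    # Anything unexpected goes to a human rather than being trusted.
    return guess if guess in CATEGORIES else "needs human review"
```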

I think I could make it “better” through fine-tuning. I am half tempted to submit a Subject Access Request for the IDs of everything I’ve classified over the years, as I’ve inadvertently assembled a valuable human-labelled training set for creating a model that would think like me. There is something slightly unsettling about that thought, however, and I am unsure whether I’d consent if someone else were to propose it. Perhaps it is the knowledge that it would crystallise my biases in a way that would make them impossible for me to ignore.
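If that data ever did come back to me, turning it into a training set would be mostly bookkeeping. As a rough sketch, assuming an entirely made-up CSV layout and the chat-style JSONL format that most fine-tuning APIs accept:

```python
# Hypothetical: the field names and CSV layout are invented for illustration.
import csv
import json

with open("my_classifications.csv", newline="", encoding="utf-8") as src, \
        open("training_set.jsonl", "w", encoding="utf-8") as dst:
    for row in csv.DictReader(src):  # expects columns: response_text, label
        example = {
            "messages": [
                {
                    "role": "user",
                    "content": "Classify this FOI response:\n" + row["response_text"],
                },
                {"role": "assistant", "content": row["label"]},
            ]
        }
        dst.write(json.dumps(example) + "\n")
```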

During this latest push, I kept getting rate-limited, which made me smile. As a confirmed human, I find it surprisingly difficult to slow down the rate at which I work, and I lack an auto back-off function to respond to Retry-After headers that I can’t see. The limit is likely set at a speed that it was thought a human would not hit. I guess I am the outlier again. Somewhat ironically for a bot prevention measure, I’ve had to use a tool to throttle myself to a slower pace than I could manage on my own.
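The thing I was missing is small. Here is a sketch of the polite back-off a bot would use, honouring Retry-After on a 429 and backing off exponentially on transient errors; the URL and retry counts are placeholders, not a real endpoint.

```python
# The auto back-off I lack as a human: honour Retry-After on 429,
# back off exponentially on 502/503/504, give up after a few tries.
import time

import requests


def polite_get(url: str, max_retries: int = 5) -> requests.Response:
    delay = 1.0
    response = requests.get(url, timeout=30)
    for _ in range(max_retries):
        if response.status_code == 429:
            retry_after = response.headers.get("Retry-After", "")
            # Assumes Retry-After is given in seconds rather than as a date.
            time.sleep(float(retry_after) if retry_after.isdigit() else delay)
        elif response.status_code in (502, 503, 504):
            time.sleep(delay)
        else:
            return response
        delay = min(delay * 2, 60)  # cap the exponential back-off
        response = requests.get(url, timeout=30)
    return response  # hand back the last response if we never succeed


print(polite_get("https://example.org/some-page").status_code)
```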

I collected some 503s, surprise 502s and a 504 to add to my 429s, with status codes being a bit like Pokémon. This got me thinking about a song about 504 errors that I generated a while ago using suno, after I got irked at hitting one each time I went to do a task.

It was a cathartic joke, and meant to be a thing to listen to once then throw away, but over time, I found that it began to feel like an apt metaphor for other things that were happening. It found its way onto my most played list, which was a first for me with something entirely AI-generated. Amidst their mounting legal troubles, suno recently released a new model with improved audio quality and a tendency to hum or wail at the start of songs. I normally remaster old creations to improve the quality, but this time I also changed the genre of the original to better fit the lens through which I now hear it.

504

Such low-effort and instant customisation is still something quite new. Personally, I have always found that the best AI music is your own and that, save for some outliers, most of the rest is unlistenable. I think I have worked out why, and it’s not a matter of taste. The 504 song is intentionally sparse and flat, but as with what came before it, the music that suno v5 outputs is often quite generic and safe.

Cooking

Qwen translated my lyrics on that one. Whilst imperfect, it is also missing human imperfection. I am able to enjoy the songs that I’ve written precisely because I’ve already made a connection to the text. I don’t need AI to try to do that part for me, and this helps me overlook obvious flaws with the generation. Even this is an illusion of connection, however, with the model acting as a mirror to reflect my own thoughts back at me.

The model can’t yet fully grasp the meaning of what it’s saying, and for now at least, the user has limited tools to teach it. I gave it a pair of spells that are over a thousand years old and a list of yesterday’s most common HTTP status codes, and it treated both with the same level of passion, understanding neither.

429

What I want from AI is the production of more outliers. I want it to give me something new that I haven’t heard before. There are people attempting to do just that by fusing genres and ideas into something that feels unique, but this is hard to do when the models are optimised towards the average and the generation process feels like opening loot boxes at times. For my part, I’ve been trying to make a yodel-shanty-dubstep fusion happen. v5 has come close to realising my vision, but still trends towards the blander and more formulaic end of the rainbow, even on high weirdness. This is an intended feature of most AI systems, not a bug. Indeed, for many, it is a key selling point.

I touched briefly upon this blandness as a service in my last post about AI-generated FOI requests. Since then I have looked into the outliers a bit more, in the form of the false positives from the pre-AI times. There was a cluster of batch requests that talk exactly like the AI, and that would have become public about the time that OpenAI was gathering training data. They were made by commercial users of FOI and are themselves outliers in style and content when compared to requests from regular people. Despite this, they may well have been overrepresented in the training set, due to being sent to many authorities at once. This means that instead of 1000 unique voices, the model might have seen this exact voice 1000 times, and could have over-optimised on that basis. It’s just a theory, but a tempting one.
