I want to segment Unicode text into runs according to script. I've looked through UAX #24 in the hope of finding a standard algorithm for this, but none is specified. The implementation section gives some good pointers on what to be careful with (paired punctuation, etc.), but I can't find a step-by-step algorithm comparable to the bidi algorithm or the collation algorithm.
Equally, I don't see anything in ICU that segments text into script-based runs. You can get the script property of each character, but that doesn't help you resolve characters whose script is Common (or Inherited) in the context of a run. Does anyone know of an open-source algorithm for doing this?
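To make the question concrete, here is a naive sketch of the kind of resolution I mean. It is only an illustration under loud assumptions: `_RANGES` is a hypothetical, heavily truncated stand-in for the real Script property data (Python's `unicodedata` does not expose it), `script_of` and `script_runs` are names I made up, and the rule "Common and Inherited characters join the preceding run" is my own simplification, not something UAX #24 prescribes. It ignores paired punctuation entirely.

```python
from itertools import groupby

# Hypothetical, heavily truncated script table (an assumption for illustration).
# Real code would load the Script property from the Unicode Character Database.
_RANGES = [
    (0x0041, 0x005A, "Latin"),      # A-Z
    (0x0061, 0x007A, "Latin"),      # a-z
    (0x0300, 0x036F, "Inherited"),  # combining diacritical marks
    (0x0391, 0x03A9, "Greek"),      # approximate: Greek capitals
    (0x03B1, 0x03C9, "Greek"),      # Greek small letters
    (0x0410, 0x044F, "Cyrillic"),   # basic Russian alphabet
    (0x4E00, 0x9FFF, "Han"),        # CJK Unified Ideographs
]


def script_of(ch):
    """Return the (approximate) script of one character, defaulting to Common."""
    cp = ord(ch)
    for lo, hi, name in _RANGES:
        if lo <= cp <= hi:
            return name
    return "Common"


def script_runs(text):
    """Split text into (script, substring) runs.

    Common and Inherited characters take the script of the preceding
    character; leading ones take the script of the first "real" character.
    """
    resolved = []
    current = None  # script of the most recent non-Common, non-Inherited char
    for ch in text:
        s = script_of(ch)
        if s not in ("Common", "Inherited"):
            current = s
        resolved.append(current)

    # Leading Common/Inherited characters are still None: attach them to the
    # first real script, or label everything Common if there is none.
    first = next((s for s in resolved if s is not None), "Common")
    resolved = [first if s is None else s for s in resolved]

    runs = []
    pos = 0
    for script, group in groupby(resolved):
        length = len(list(group))
        runs.append((script, text[pos:pos + length]))
        pos += length
    return runs
```

With this, `script_runs("abc, αβγ!")` puts the comma and space in the Latin run and the exclamation mark in the Greek run. That is exactly where it falls short: a closing bracket or quote should arguably match the script of its opening partner rather than whatever happens to precede it, and that is the part I'd like a proper algorithm for.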