I wonder about using the description metadata tag (XMP-dc:Description).
I think this is starting to get used, shown to users as an editable field, in the smartphone image/video Gallery apps from both Apple and Google.
I managed to get someone working on a new open source Gallery app to add it to their app. There hasn't been a release since it was added: https://github.com/Lmh170/Android-Gallery-App
I really like your idea of using existing file metadata, or creating what's needed, to enable #a11y.
I'm not just talking about images, but also about sound files, and any relevant file really.
PDFs? Why couldn't I convert the .tex file to .rtf and slap it in the file metadata?
(Is .rtf a binary format? Or Markdown, anyway.)