How LLMs See Images, Audio, and More
blog.bytebytego.com
A video is tokenised by first splitting it into individual image frames and an audio track, and then tokenising the frames and the audio separately. Not surprising, but interesting.
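The split-then-tokenise idea can be sketched in a few lines of Python. This is a minimal, hypothetical sketch and not the method from the source article. It assumes the video has already been decoded into grayscale frames (nested lists of floats) plus a mono waveform. Each frame is cut into non-overlapping square patches (ViT-style), and the slice of audio that plays during each frame is cut into fixed-length windows. The names `tokenise_video`, `patchify_image`, `chunk_audio` and `Token` are illustrative only. A real model would go on to project each patch or window into an embedding, or quantise it into a discrete token id. Real audio encoders also often work on spectrogram features (e.g. log-mel) rather than raw samples.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical representation: one grayscale frame is a list of rows of pixel intensities.
Frame = List[List[float]]


@dataclass
class Token:
    modality: str        # "image" or "audio"
    timestep: int        # index of the video frame this token belongs to
    values: List[float]  # raw patch pixels or audio samples (before any embedding)


def patchify_image(frame: Frame, patch: int) -> List[List[float]]:
    """Split a frame into non-overlapping patch x patch squares, each flattened row-major."""
    h, w = len(frame), len(frame[0])
    if h % patch or w % patch:
        raise ValueError("frame height and width must be divisible by the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([frame[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def chunk_audio(samples: List[float], window: int) -> List[List[float]]:
    """Split a 1-D waveform into fixed-length windows, zero-padding the final one."""
    chunks = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]  # slicing copies, so the input is untouched
        chunk += [0.0] * (window - len(chunk))
        chunks.append(chunk)
    return chunks


def tokenise_video(frames: List[Frame], audio: List[float], samples_per_frame: int,
                   patch: int = 2, audio_window: int = 4) -> List[Token]:
    """Emit image-patch tokens for each frame, followed by tokens for that frame's audio."""
    tokens: List[Token] = []
    for t, frame in enumerate(frames):
        # Image tokens: one per patch of frame t.
        for p in patchify_image(frame, patch):
            tokens.append(Token("image", t, p))
        # Audio tokens: windows over the samples that play while frame t is shown.
        segment = audio[t * samples_per_frame:(t + 1) * samples_per_frame]
        for a in chunk_audio(segment, audio_window):
            tokens.append(Token("audio", t, a))
    return tokens


if __name__ == "__main__":
    # Two 4x4 frames and 16 audio samples (8 per frame).
    frames = [[[float(t)] * 4 for _ in range(4)] for t in range(2)]
    audio = [0.1 * i for i in range(16)]
    for tok in tokenise_video(frames, audio, samples_per_frame=8):
        print(tok.modality, tok.timestep, len(tok.values))
```

With these example inputs each 4x4 frame yields four 2x2 patch tokens and its eight audio samples yield two 4-sample audio tokens, so the sequence holds 12 tokens in total. Interleaving per frame keeps image and audio tokens from the same moment adjacent in the sequence. An alternative design is to emit all image tokens first and all audio tokens afterwards, relying on positional or time embeddings to align them.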