A n u R o c k
Gardener with a knack for plumbing bits

How LLMs See Images, Audio, and More

blog.bytebytego.com

A video is tokenised by first splitting it into individual image frames and an audio track, then tokenising the frames and the audio separately. Not surprising, but interesting.
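To make the two steps concrete, here is a toy sketch in Python. Everything in it is my own assumption for illustration, not the article's method: `ToyVideo`, `patchify`, `chunk_audio` and `tokenise_video` are hypothetical names, frames are cut into fixed-size pixel patches, and audio is cut into fixed-length windows. Real models would pass these pieces through learned encoders before the LLM ever sees them.

```python
from dataclasses import dataclass
from typing import List, Tuple

Frame = List[List[int]]  # one grayscale frame: rows of pixel values


@dataclass
class ToyVideo:
    """Hypothetical container: already-decoded frames plus a mono audio track."""
    frames: List[Frame]
    audio: List[float]


def patchify(frame: Frame, patch: int) -> List[List[int]]:
    """Cut a frame into non-overlapping patch x patch squares, each flattened."""
    height, width = len(frame), len(frame[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            patches.append([frame[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def chunk_audio(samples: List[float], window: int) -> List[List[float]]:
    """Cut the audio track into fixed-length windows, dropping a short tail."""
    return [samples[i:i + window]
            for i in range(0, len(samples) - window + 1, window)]


def tokenise_video(video: ToyVideo, patch: int, window: int) -> List[Tuple[str, list]]:
    """Step 1: treat frames and audio separately. Step 2: tokenise each one."""
    tokens: List[Tuple[str, list]] = []
    for frame in video.frames:
        tokens += [("image", p) for p in patchify(frame, patch)]
    tokens += [("audio", c) for c in chunk_audio(video.audio, window)]
    return tokens


# Two 4x4 frames and ten audio samples (assumed toy data).
video = ToyVideo(
    frames=[[[r * 4 + c for c in range(4)] for r in range(4)] for _ in range(2)],
    audio=[0.0] * 10,
)
tokens = tokenise_video(video, patch=2, window=4)
print(len(tokens))
```

The sketch simply appends audio tokens after the image tokens; how a real model orders or interleaves the two streams is not something the note above says, so treat that choice as arbitrary.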