The dataset, compiled by EleutherAI, includes transcripts from more than 48,000 YouTube channels and was used by companies including Apple, NVIDIA, and Anthropic. Alphabet CEO Sundar Pichai said using data from YouTube to train AI models violates the platform’s terms of service.