
Private API keys and passwords found in AI training dataset – nearly 12,000 secrets exposed


  • Truffle Security found thousands of pieces of private information in Common Crawl
  • The archives are used to train some of today's biggest LLMs
  • The researchers notified the vendors and helped fix the problem

Cybersecurity researchers have found thousands of login credentials and other secrets in the Common Crawl dataset.

Common Crawl is a nonprofit organization that provides a freely accessible archive of web data, collected through large-scale web crawling. By recent estimates, the organization hosts over 250 petabytes of web data, with monthly crawls adding several petabytes more.
