Nonprofit scrubs illegal content from controversial AI training dataset

(Image credit: Kirillm | iStock / Getty Images Plus)

After Stanford Internet Observatory researcher David Thiel found links to child sexual abuse material (CSAM) in LAION-5B, an AI training dataset used to train image generators, the controversial dataset was immediately taken down in 2023. Now, the LAION (Large-scale Artificial Intelligence Open Network) team has released a scrubbed version of the dataset called Re-LAION-5B and claimed that it "is the first web-scale, text-link to images pair dataset to be thoroughly cleaned of known links to suspected CSAM."

To scrub the dataset, LAION partnered with the Internet Watch Foundation (IWF) and the Canadian Centre for Child Protection (C3P) and removed 2,236 links whose images matched hashes in those online safety organizations' databases.
