The New York Times has blocked OpenAI’s web crawler, pausing the company’s access to the newspaper’s content for training its generative AI models. The decision concerns a technical detail of the newspaper’s website, but it reflects a broader tension between the publishing world and the technology world over unresolved questions about the relationship between AI and copyrighted content.
New York Times blocks the web crawler used to train ChatGPT
The New York Times has blocked access for OpenAI’s web crawler, the bot that scans web pages to supply the company with material for training its artificial intelligence models. The models behind ChatGPT are trained on text gathered from many online sites, from which they learn to predict how to form sentences in response to a request.
A check of the NYT’s robots.txt page shows that GPTBot, the crawler introduced by OpenAI earlier this month, has been disallowed by the New York Times. According to what The Verge found through the Internet Archive’s Wayback Machine, the newspaper appears to have blocked the crawler as early as August 17th.
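For readers curious about how such a block works in practice, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The robots.txt content below is a hypothetical example of a rule that disallows GPTBot, not a copy of the NYT’s actual file, and the `is_allowed` helper and the example.com URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration,
# not the New York Times' real file).
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/some-article"  # placeholder URL
    print("GPTBot allowed:", is_allowed(SAMPLE_ROBOTS_TXT, "GPTBot", url))
    print("Other bot allowed:", is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", url))
```

Note that robots.txt is a request rather than an enforcement mechanism: the block only works if the crawler chooses to honor the file.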
The change follows the New York Times’ update to its terms of service earlier this month, which aimed to prohibit the use of its content for training artificial intelligence models. The newspaper’s spokesman, Charlie Stadtlander, declined to comment to the American press. OpenAI also did not immediately respond to a request for comment.
Legal action on the horizon
The New York Times is currently considering whether to take legal action against OpenAI for alleged infringement of intellectual property rights, NPR reported last week.
If it decides to sue, the New York Times would join others, including the comedian Sarah Silverman and two other authors, who filed a lawsuit against the company in July. That suit concerns the use of Books3, a dataset used to train ChatGPT that may include thousands of copyrighted works. The New York Times would also join figures such as Matthew Butterick, a programmer and lawyer, who argues that the company’s data scraping practices amount to software piracy.
The New York Times’ battle may slow ChatGPT’s training, but the newspaper is not asking to stop it altogether. Rather, it demands that copyrighted material not be used to train bots or that, if companies want to use it, they pay rights holders fairly. Humans also “train” themselves by reading books and newspapers. AI can do it much faster, but why should it do so without asking permission?