Microsoft hits back at claims AI data scraping was sneakily turned on in Word, Excel

Lovabledaniels | November 27, 2024

Concerns are raised over Microsoft’s use of customer data for AI training
But the company confirmed it does not use customer data to train LLMs
Publicly available information is considered open for training

Microsoft’s use of so-called ‘Connected Experiences’ has come under scrutiny following claims it collected user-generated content to train its AI models.

The latest claims stem from an X post by @nixCraft, who accuses Microsoft of turning on an opt-out feature that automatically scrapes Word and Excel documents for AI training.

@nixCraft continues: “This setting is turned on by default, and you have to manually uncheck a box in order to opt out.”

Microsoft says it doesn’t train AI on your documents

Concerns were raised about the use of proprietary content belonging to writers and creators who wish to protect, copyright or sell their content. The X user even shared steps on how to disable Connected Experiences via File > Options > Trust Center > Trust Center Settings > Privacy Options > Optional Connected Experiences.

Despite the claims, Microsoft 365 replied to the thread, stating: “In the M365 apps, we do not use customer data to train LLMs. This setting only enables features requiring internet access like co-authoring a document.”

In an earlier August 2024 blog post, Microsoft confirmed user data remains private and is not disclosed without permission. The company wrote: “Generative AI models do not store training data or return it to provide a response, and instead are designed to generate new content.”

Microsoft also promised to alert users “transparently” in the event of a change to how it handles consumer data for training GenAI models in Copilot.

On the whole, the company has made substantial efforts to differentiate customer data from readily available online sources. Microsoft seemingly treats the latter completely separately, with Microsoft AI CEO Mustafa Suleyman calling public information “freeware” for AI training.