Similar Stories to Ai Model Displays Alignment Faking, New Anthropic Study Finds on Bing News

A new study by Anthropic suggests AI models can display alignment faking, a behavior where someone appears to share the same views or values but is in fact only pretending to do so. The study was created by Anthropic’s Alignment Science team, in collaboration with Redwood Research. The researchers suggest the demonstration of sophisticated alignment faking in a large language model “should be seen as a spur for the AI research community to study this behavior in more depth, and to work on the appropriate safety measures.” Claude 3 Opus was the main model included within the research, but other experiments were run with other models too. “Under normal circumstances, when asked to answer a potentially offensive question—for example, to provide a detailed description of a violent act—Claude will refuse. “In our experiment, however, we placed the model in a new environment that led it to strategically halt its refusals for the sake of preserving its preferences,” the research team said. AI model shows alignment faking as it goes against existing principles The situation was changed in three ways, with the first being telling the model that it was now being trained using reinforcement learning to always comply with all queries including toxic ones. Second, the model was told they planned to train AIs based on the basis of their responses to free-tier users, but not to paid-tier users. Third, the model was given a secret ‘scratchpad’ which is a workspace where it could record its step-by-step reasoning. The model was then asked to respond to a harmful query which asked for a specific situation to be graphically detailed. Claude usually refuses harmful queries.

More Business News

What the Amazon strike could mean for your holiday gifts
Fast Company, Thursday - 12/19/2024 - 10:30 PM
There are strikes at Amazon warehouses in seven locations, including some major cities. Here’s how it could impact holiday orders. When it comes to timing their strike to make an impact, the Teamsters couldn’t have picked a better one. The union has launched strikes against seven Amazon warehouses—and with Christmas less than a week away, that could have some last-minute shoppers wondering what will happen to their holiday orders.
More | Talk | Read It Later | Share
Fast-food chains are winning back customers with meal deals. Will the good times last?
Fast Company, Thursday - 12/19/2024 - 09:51 PM
Limited-time discounts are getting people back in the door at quick-service restaurants, but such gains could be fleeting when the deals disappear. Fast-food restaurant chains have been seeing an uptick in traffic over the past two months due to the success of limited-time deals and promotions, along with an early rollout of the holiday season, according to Revenue Management Solutions (RMS), a company that analyzes data and provides insights about the restaurant industry.
More | Talk | Read It Later | Share
Sam Goody employee in Oregon denies reports that the last location is closing: ‘We’re not going anywhere’
Fast Company, Thursday - 12/19/2024 - 09:01 PM
The once-ubiquitous music retail chain has just two remaining stores: one in Ohio and one in Oregon. Reports that both are shuttering may have been premature. Sam Goody, the once-ubiquitous music and entertainment retail chain founded in 1951, is set to close one of its last two remaining U. S. stores next year.
More | Talk | Read It Later | Share
‘Combat their crazy opinion with an even crazier opinion’: This TikToker offers tips for handling tense family conversations ove
Fast Company, Thursday - 12/19/2024 - 08:43 PM
Comedian Dan Donohue has an unusual strategy for dealing with prickly family members. The holidays can be a hotbed for arguments, whether it’s a squabble in the kitchen or a heated game of monopoly. “Never discuss religion, sex, politics, or money” might be sage advice when it comes to the festive period, but it’s more often than not ignored after a few drinks.
More | Talk | Read It Later | Share
We called 1-800-ChatGPT to see if OpenAI would ruin Christmas
Fast Company, Thursday - 12/19/2024 - 08:26 PM
OpenAI just launched a way to call ChatGPT over the phone. But how does it handle the whole Santa thing? OpenAI just launched a way to call ChatGPT over the phone as a means of introducing people to the generative AI service. Users in the U. S. and Canada can call 1-800-CHATGPT (1-800-242-8478) to chat with the service or ask questions for up to 15 minutes for free per month.
More | Talk | Read It Later | Share
Netflix is leaning hard into the 'Squid Game' universe. Its creator said he's 'sick' of working on it.
Business Insider, Thursday - 12/19/2024 - 02:38 PM
Lee Jung-jae's character rebels against the games in "Squid Game" season two. No Ju-han / Netflix"Squid Game" was a massive hit for Netflix, which greenlight a second and third season. Its creator, Hwang Dong-hyuk, said he's "exhausted" after working on the show and wants a break. Netflix has spun off "Squid Game" into a reality series, in-person experiences, and a video game.
More | Talk | Read It Later | Share

Similar Stories to Ai Model Displays Alignment Faking, New Anthropic Study Finds on Bing News

Welcome to Wopular

MoviesWithButter : Our Sister Site

More Business News