How Does Operant Conditioning Work?
This is a follow-on from the previous article, where I used B.F. Skinner’s definition of operant conditioning along with an interpretation of it in my own words. This article delves into how operant conditioning works, a topic fraught with misunderstandings and oversimplifications that have rendered the common definitions and applications of its terms technically wrong. Unfortunately, this issue is not unique to the dog training world.
If you have not read the previous article, I strongly suggest you do that first as this one will make more sense read second.
The terms in question are reinforcements, of which there are two types: positive and negative. The other term that crops up regularly is punishment, but on a technical level that isn’t part of operant conditioning as defined by B.F. Skinner. It’s a bug in the English language and its definitions of the concept of punishment, and a topic for another blog, not this one.
Before we get into this good and proper, much like in the previous article, I shall provide some definitions: B.F. Skinner’s first, then two others.
Positive reinforcement: A positive reinforcer strengthens any behaviour that produces it. Example: If you are thirsty, and getting up, fetching a glass, filling the glass with water and then drinking the water quenches your thirst, you’re more likely to do this again in future similar situations; your actions in obtaining the water were positively reinforcing.
Negative reinforcement: A negative reinforcer strengthens any behaviour that reduces or terminates it. Example: If a shoe is hurting your foot, and you take the shoe off, the relief from pain is negatively reinforcing, and you are more likely to do this again in future similar situations.
Now let’s turn to the AKC for a different definition. There are lots of options here; I found four articles with a brief search, all worded slightly differently. Here’s the one I picked, by Sassafras Patterdale CPDT-KA, CTDI: https://www.akc.org/expert-advice/training/how-you-may-accidentally-be-encouraging-bad-behavior/
“Positive reinforcement means adding something (food, praise, toys, play, etc.) that our dogs find reinforcing or rewarding to increase the likelihood of behaviors we want.”
“There is also negative reinforcement, which is referring to taking away the reinforcer (i.e. whatever it is that your dog wants like attention) to decrease the likelihood of behavior we don’t want.”
And a definition from human psychology, by Dr Raymond Miltenberger, the gentleman who first published the operant conditioning quadrants in their current infographic form:
Positive reinforcement: “the process of adding a desirable stimulus following a behavior, which increases the likelihood that the behavior will occur again in the future”
Negative reinforcement “is when something is taken away due to a person's behavior, this creates a wanted outcome for the person.”
Now I wonder how many of you are paying attention to those definitions closely enough to notice one of them is written in the 2nd person, and the other two are written from an external entity’s perspective making changes to an animal or person’s environment to reinforce behavioural outcomes. Fundamentally different perspectives. Unfortunately, changing the perspective from Skinner’s definition is only valid under laboratory conditions.
The most common definitions of these terms in use today are not Skinner’s; they are the other two. That is a huge problem. It is actually the crux of why everything in dog training is wrong. This is a bold statement, I know, so let me explain:
Skinner went to great lengths to ensure his explanations of operant conditioning and reinforcements were technically correct and accurate for use in the real world, not just laboratory experiments. There is a very good reason he wrote about operant conditioning and reinforcement in the 2nd person. It is vital to understanding the concept and to using it correctly.
Operant learning via reinforcements only works when viewed through the eyes of the animal or person trying to navigate a new or changing environment. Reinforcements are not something that can be applied to an animal or a person (that is bribery and/or manipulation; see the previous article on operant conditioning). They are something that occurs as a result of the animal’s interactions with the environment as it strives to survive and win the game of life. The consequences of those actions are the reinforcements.
Under laboratory conditions, reinforcements can be applied or removed, because the animal does not know who changed the environment, or why. This is why Dr Miltenberger’s definitions work in the narrow context of laboratory experimentation. They do not hold true when carried over into the real world, where you as a dog owner are in the environment with the dog. Attempts at applying reinforcements in that situation take the animal out of its operant conditioning loop and shrink the environment down to you and whatever it is you are offering to or taking away from the dog. The big wide world is no longer involved, so the dog is not operantly conditioning itself to the world; it is operantly conditioning itself to whatever you are doing, to the exclusion of all else. I explained this concept in the last article, but a quick refresher is useful here:
When you offer something to the dog (or take it away) in order to obtain an outcome, what you are in effect saying by your actions is that whatever the dog wants is not important to you, and that you, using the knowledge you have of the dog, are offering something you think the dog will like in order to get it to change its mind and do what you want. Bear in mind, the dog will be doing, or wanting to do, things based on its lifelong operant conditioning up to that point, and you are trying to override that behaviour in order to obtain your preferred outcome. Once you shrink that environment down to what you are offering, to the exclusion of all else, the dog is no longer in an operant learning loop associated with anything except you.
That is not operant conditioning in any useful sense. It is entirely inaccurate to consider it so. It is actually a variable mishmash of two things: the Hawthorne effect (observer effect), in which your actions directly impact the outcomes of what you are trying to achieve in applying the reinforcers, and Applied Behaviour Analysis, developed by psychologists for dealing with humans with Autism Spectrum Disorder. The latter I’ll get to in a different blog; it’s a deep dive all by itself. That mishmash is of little help when trying to build socialised, independent and confident dogs. They need to experience environments for themselves using their own smarts for it to be operant conditioning; they need to experience the consequences of their actions for it to work. You should be calm moral support, nothing more. Dogs aren’t stupid. They know their own minds, and they know when they are being manipulated. Real world example time.
This image was titled “The Only Way to Do Dog Parks”
In the picture we have a dog being given food, outside of a dog park fence, with other dogs looking on from inside the fence. There is a series of pictures; this one is the best one for me to illustrate my point about reinforcements. The dog in the foreground has not been allowed to approach the fence and go nose to nose with the other dog even though it wants to. It has been allowed to look at the dogs behind the fence, and start to approach, but each time after a couple of seconds the owner calls it back, and when it switches its focus back to the owner it gets food. That is the moment you can see in the photo. Time to break this down into the technicalities of reinforcement. Bear in mind the owner thinks they are operantly conditioning their dog, through positive reinforcement, to the dog park and the dogs behind the fence. Spoiler: they are not.
As I have mentioned in earlier paragraphs, operant conditioning and the associated reinforcements as defined by B.F. Skinner have to be considered through the eyes of the animal as it acts in environments to work out the rules by the consequences of its actions. The dog is unfamiliar with the dog park, which therefore qualifies as a new environment, so in theory operant conditioning should be in full swing with this dog as it approaches the fence. In such an intense environment, we should expect a level of uncertainty and anxiousness from some dogs, excitement and exuberance from others, and indifference from others still. We want to get to that last behaviour, indifference. An animal that is indifferent to an environment is said to be socialised to it. Nothing is a big deal: the animal knows how to navigate the environment, it knows the rules so it doesn’t get hurt, and it can then go off and have some fun. I wouldn’t get upset at a little excitement though.
How does our dog come to know the rules of the dog park? How does an anxious dog overcome the anxiety of the situation? The same question applies to an overexcited, hyper dog. The answer is through the consequences of its actions in attempting to learn the rules: positive and negative reinforcements. If the dog approaches the fence slowly and with some trepidation, and the dog on the other side comes to meet our dog, our dog has now received a welcoming gesture from the other dog, and so the operant conditioning loop updates via a positive reinforcement. The non-threatening, slow and tentative approach elicited curiosity from the other dog, so our dog has established that acting slowly and a bit nervously at the fence resulted in an outcome that was wanted - a meeting with the dog behind the fence. That means next time, in this or similar situations, the same type of fence approach is very likely to be repeated. Now because we are dealing with a dog, and repetition brings familiarity, added to the positive reinforcement of repeated success in meeting through the fence, our dog will start developing confidence in doing it; the outcome becomes known or expected, so the trepidation melts away. That is operant learning, and soon we have a dog socialised to this environment / scenario.
The reverse is true too: an amped-up, overexcited charge at and into the fence would likely cause the other dogs to move away. My client Roadie did this his first time outside a dog park, and guess what, all the other dogs went away. Roadie’s actions repelled the other dogs, which is not what he wanted. This told Roadie that he needed to do something different in order for the other dogs to stick around or approach him, and the rules of the game are that Roadie has to figure out what that something different is. That is when he is operantly learning by himself, for himself. Only when he is actively working out by trial and error what it takes to obtain his desire, which is to meet the dogs through the fence, can it be considered operant conditioning. (I’ve not delved into this here, but your desires are in effect your positive reinforcements; again, this is a topic all by itself, and is where Skinner spent a lot of time researching.)
So how does all that relate to what’s going on in the picture? Well, not very well, because as I have explained, as soon as you bring out the distractions and competing motivators (the bribes), the dog can no longer focus its attention on the goal: meeting the dogs behind the fence. Its mental faculties, devoted to rationalising and figuring out how to get what it wants, have been unnaturally cut off mid-flow by the human who is waving the food around. So the dog is no longer in an operant learning loop. It is no longer obtaining any reinforcements from its actions in relation to the fence or the dogs behind it; instead its world has been shrunk down to obtaining the bribe from the human. The dog has no ability to discern what the rules are for interacting with the other dogs, nor how to get calm in this environment by understanding it and how to operate in it. There is no association of its actions with outcomes that relate to the broader environment; that mental process has been denied, shut down, taken away. That, I’m afraid, is so far wide of the mark of actual operant conditioning and reinforcement that it’s borderline criminal to describe it as such. The outcome of this way of doing things with dogs is anxiety, reactivity and zero social skills. It should be obvious. Well, the dog in the photo is all of those things, a hot mess; there’s an Instagram account for it, wigglebutt_Koda. It’s tragic.
Often what people see when trying to apply reinforcement is that the dog ignores it and tries to do something else. This frustrates the hell out of people. They don’t understand why this is happening. They are presenting or removing a reinforcer, but the dog isn’t playing ball. Well, you probably know by now that the bad news is what you’re doing there isn’t operant conditioning. I’ve said it multiple times: it’s bribery and manipulation, and your bribe wasn’t good enough. The worse news is that the dog is still using its operant conditioning. The dog has weighed up the options and found your attempts at manipulating the environment to be less reinforcing than doing what it wanted to do, so it ignores you - but it will remember how you behaved and update its mental model of you. That can lead to all sorts of strange outcomes. It can look really good for a while as your dog takes the bribes, until the moment arrives when real operant conditioning suddenly jumps out on you: the environment throws an actual problem at you, your dog flies off the handle, and you have no idea what is going on. Your dog is using its operant conditioning to go into fight or flight, and because you’ve been fed a pack of nonsense, you can’t get through to your dog. You and your treat mean exactly nothing in the moment; its own instincts are telling it what to do. A bit of food or shouting or even an e-collar means zero when the dog is faced with a real problem that it does not know how to deal with, because it was never given the chance, and now it’s going into what it thinks is a life or death situation.
As usual, real life examples are better than theory, so as a follow-on from the picture above, here’s something that happened to me with one of my client’s dogs, Peppa the female AmStaff. When I started working with Peppa, she was 10 months old. The owner told me she was ok with other dogs but would sometimes get too rough with small dogs, and that she was a bit scared of bigger dogs, which could cause her to either pee or get a bit snappy. I did what I always do, and had Peppa meet my two Tibetan Mastiffs, Rollo and Tora, then had her meet some other dogs. It all looked good to me, though I could see a lot of untapped energy. She would get very excited at times, and she pulled like a train on walks. I had the owner get a flirt pole for Peppa, and that helped with the pulling, so the next step was to take Peppa to a dog park. She’d never been before, and the owner was worried about taking her.
When we got there Peppa was a mix of excitement and nervousness. I had checked who was in the park and it was all dogs bigger than Peppa. So I took her in and let the leash off. She went running up to two bigger dogs, freaked herself out when they turned around and started sniffing her, she rolled on her back and urinated. What happened after that was pure unadulterated puppy joy for nearly an hour. She ran around with multiple other dogs, she played hard with the ones that wanted to and she followed around some others that didn’t want to play. She moderated her behaviour to fit in. She took the hints about calming down a notch when they came, those hints were the reinforcements - she learned what worked and what didn’t, she repeated what worked (positively reinforced) and stopped doing what didn’t (negatively reinforced).
Those experiences for Peppa were operant conditioning. She learned how to interact with other dogs, she learned to read the signals and warnings from other dogs about their behavioural expectations of her, they offered her positive and negative reinforcements. She learned that all by herself and for herself. I was almost surplus to requirements, as it should be. Peppa had a great time.
Peppa’s story doesn’t end there. The owner ended up going down the formal dog training route against my advice. I never did find out why, but we met Peppa again at a football pitch when she was nearly 2, and she was a hot mess. The intention was for Peppa to stay at my house with my dogs and family over Christmas whilst the owner was abroad. So I had Rollo, my wife had Tora and Peppa was with her owner. Peppa wasn’t allowed more than 3 seconds of interaction with either of my dogs before her owner called her away with recalls and food scatters etc. Sound familiar? Peppa just wanted to play with my dogs, who weren’t that interested in Peppa’s intensity. Peppa got frustrated after 10 mins or so and ended up doing a flying headbutt at Tora. She missed, and I’m glad she did; there was intent behind that headbutt and it wasn’t friendly. My wife and I shared a glance the moment it happened. Tora knew it wasn’t friendly, but as it didn’t actually hit her, Tora chose to let it go and walk off. I don’t know what Rollo thought.
My point with this bit is that Peppa had been prevented from interacting with my dogs by her owner, and when she was allowed to, it was in a very controlled and unnatural way. Peppa did not know how to behave. The frustration of the constant redirections from her owner, with food scatter in the mix too, which Rollo & Tora were happy to snap up right in front of Peppa(!!!!), just wound her up, and she did something which would have very quickly got her in trouble with Tora. It would have been a pretty big negative reinforcement, in the form of a correction, and often where Tora corrects, Rollo is ready to play enforcer.
Peppa’s environment was only ever what the owner said it was and so Peppa had been operantly conditioned to that, not the world around her, she didn’t have any of her own agency or free will or space to learn anymore. This is what application of reinforcements does. It’s a travesty, and totally avoidable, but I repeat myself.
So in summary, operant conditioning and the associated reinforcements are not something that can be applied to animals or people in real world situations. These concepts only work in the real world when understood from the animal or person’s perspective. When we do try to influence the animal or person by applying or offering reinforcements we are in actual fact attempting to exert control over the animal or person in order to obtain outcomes we prefer over the desired outcomes of the animal or person. That is called bribery and manipulation and is not a good basis for a relationship with your dog, or other people. It rapidly degenerates into a purely transactional relationship, or breaks entirely. Application of reinforcements creates dependent dogs who are unable to navigate the world around them in a calm and confident manner, because they don’t understand it, the opportunities for doing so have been taken away.
The foundations for a great relationship with your dog are based on mutual trust and respect, that involves giving the dog its brain back, giving it the opportunities to learn about the world via operant conditioning, by the consequences of its actions. This must be viewed through the eyes of the dog, otherwise you are not using operant conditioning reinforcements. Not at all.
So how do we influence dogs with problem behaviours using operant conditioning? After all, no one is going to just take a dog that is liable to lash out and start a fight with another dog to the dog park and take the leash off. That’s a terrible idea. Our role is to lead. We show the dog that good things come when the dog shows calm self-management of behaviour. That sometimes means we stand around doing a lot of nothing whilst the dog goes through a lot of emotions. Once calm has returned, we can begin unlocking what it is that the dog wants. It’s more about demonstration than instruction or commanding. The dog will learn quite quickly what unlocks what it wants, and provided the exercise routine is up to scratch, the dog moves forward and learns how to behave without you having to tell it.
Again, the proof is in the pudding. Rollo is all over YouTube. He’s operantly conditioned under Skinner. He’s a cool, calm, collected cookie. Tora pops up now and again too.