Practice Leadership: Outcomes

This week on the EIM Practice Leadership Podcast, Dr. Larry Benz will be discussing a somewhat controversial topic.

That topic is outcomes.

Larry read in PT in Motion that MIPS (the Merit-based Incentive Payment System) represents a move into value-based payments. That is a claim we fully disagree with; it's like saying tap-dancing is the same as ballet.

Dr. Benz will discuss his thoughts on MIPS and outcomes collection, and share his ideas and solutions for a program that isn't as well thought out as we might have hoped.

For more information on this topic, see Heidi Jannenga's article (in the links below) entitled "MIPS: Is It Worth It?"



2 responses to “Practice Leadership: Outcomes”

  1. Selena Horner says:

    Hi Larry,

    Like you, I have been interested in outcomes ever since I heard about them in the early '90s. The idea of using patient-reported outcome measures to learn more from the patient's perspective intrigued me (and still does). Although I know we don't work in a manufacturing industry, it just seems right to know the outcomes one achieves. To be accountable, to grow, and to learn to be even better are my reasons for knowing my outcomes. If I hadn't gone on a journey to create a database and collect my own outcomes many years ago, I would never have known that the care I was providing for patients with spinal problems was subpar and not in line with the literature. Truly reflecting on my outcomes and deciding to take action is how I came to meet quite a few EIMers. Then, after taking a course, implementing what I learned, and re-evaluating my outcomes a year later, I learned that the EIM course was the best thing I could have done professionally. Outcomes have value; it just depends on what you decide to do with the information.

    I had a difficult time grasping your real thoughts, Larry. Maybe the reason had to do with my own inability to stay focused? I really don’t know, I just know that I have very different thoughts.

    I totally disagree about the value of the global rating of change; there isn't much value in this particular measure. Although it is easy to implement, the problem resides with the lack of a strong comparison point for the patient, along with the fact that our memories are neither consistent nor reliable. Toss in the fact that 70% of my patients are 65 and older, and I see even more problems with this measure. Historically, I believe predicted outcomes were tied to GROC scores. At the time, this may have been the best measure for determining that change happened, but I don't think we captured the patient's perspective as well as we could have when deciding that a fantastic outcome was achieved.

    I prefer the Patient Acceptable Symptom State: “Taking into account all the activities you have during your daily life, your level of pain, and also your functional impairment, do you consider that your current state is satisfactory?” The response is a simple “yes” or “no.” If the methodologies for determining change and relevant change used this measure as the anchor for the cutoff, I believe the resulting predictions of final outcomes would be far more patient-centered and more meaningful for the patient.

    As you spoke, you mentioned computer adaptive testing. Over the years I have learned a lot more about the science behind this testing methodology. Many years ago, when EIM had an online forum, I recall a huge amount of back-and-forth with Dennis Hart. I found no value in computer adaptive testing because all it provided was a score. A few years after Dennis and I debated, he and his team published a paper on functional staging, and it was at that moment that I began to have more respect for the science behind outcomes. Computer adaptive testing has been used in the educational field for quite some time. In the physical therapy world, the AM-PAC, PROMIS, and FOTO measures all use computer adaptive testing methodologies. The cool aspect of this type of testing, IF the items in the bank are truly grouped appropriately, is how quickly it can determine the patient's highest level of function: the assessment begins with middle-of-the-road functional activities, and then, based on the patient's response, the process presents harder or easier activities to hone in on the patient's ability. I am clinically changing the type of patients that I treat, bringing in more patients who have cancer and are going through or have finished chemotherapy. As I track the function of these patients, computer adaptive testing seems to capture change adequately. (Meaning, I know there should be a decrease or an increase in function just based on where the patient is in the cancer treatment/chemotherapy.) Scores drop a good 8-15 points during my episode of care and then increase, depending on where the patient is in chemotherapy. And I don't prompt my patients or give any feedback; at least 75% of them complete their patient-reported outcome measures via an emailed link.
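    The item-selection loop described above (start with a middle-of-the-road activity, then branch to harder or easier items based on the response) can be sketched as a toy binary search over an ordered item bank. This is an illustration only; the item bank, the yes/no scoring, and the stopping rule are invented here and are much simpler than the IRT-based algorithms actual CAT instruments such as PROMIS or AM-PAC use.

```python
# Toy computer-adaptive testing loop: a binary search over an ordered item
# bank. Items and the yes/no scoring are invented for illustration only.

def cat_estimate(item_bank, can_do):
    """Return the hardest item the respondent can do, starting mid-bank.

    item_bank: list of items ordered from easiest to hardest.
    can_do: function item -> bool (the patient's yes/no response).
    """
    lo, hi = 0, len(item_bank) - 1
    hardest_passed = None
    while lo <= hi:
        mid = (lo + hi) // 2          # begin with a middle-of-the-road activity
        if can_do(item_bank[mid]):
            hardest_passed = item_bank[mid]
            lo = mid + 1              # success: probe harder activities
        else:
            hi = mid - 1              # failure: probe easier activities
    return hardest_passed

items = ["sit up in bed", "walk across a room", "climb a flight of stairs",
         "walk a mile", "run a 5k"]
# Suppose the patient can manage everything up to climbing stairs:
ability = {"sit up in bed", "walk across a room", "climb a flight of stairs"}
print(cat_estimate(items, lambda item: item in ability))
# -> climb a flight of stairs
```

    The point of the sketch is the efficiency Selena describes: the patient's level is located in roughly log2(bank size) questions rather than by answering every item.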

    Let me jump back to the predictive ability of any system. I will be the first to mention the PASS as the anchor for determining that a patient's needs were met. I will also strongly advocate that any system that risk-adjusts should share the amount of variance explained: truly evaluating a system's predictive ability requires knowing how much variance it explains, and no one puts that out there. So, in my opinion, there is a problem in the industry, and the industry standard should be public reporting of the amount of variance explained. For older adults, I don't believe any system risk-adjusts very well; I believe the risk-adjustment factors for older adults are different than for younger or middle-aged adults. I can assure you that if a patient was recently hospitalized, has a history of falling in the last year, is on more than 4 medications, has a gait speed slower than 1 m/sec, has significant visual deficits (legally blind), or has peripheral neuropathy, the predicted outcome may not actually happen. No, my theory hasn't been tested; I just know this from working with these individuals. These folks happen to fall into a frail or pre-frail category, and it seems that by the time they step into my clinic, too much time has passed to make a significant impact.
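    "Amount of variance explained" is conventionally reported as R-squared, and the quantity Selena is asking vendors to disclose can be computed from a model's predictions against observed outcomes. The numbers below are invented for illustration; real risk-adjustment models are regressions over many patient characteristics.

```python
# R^2 = 1 - SS_res / SS_tot: the share of outcome variance a prediction
# model explains. All values below are invented for illustration.

def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)           # total variance
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    return 1 - ss_res / ss_tot

actual    = [10, 14, 8, 12, 6]   # observed functional-status change
predicted = [9, 13, 9, 11, 8]    # a model's risk-adjusted predictions
print(round(r_squared(actual, predicted), 2))
# -> 0.8
```

    A system publishing an R-squared like this would let users judge how much of a "predicted outcome" is model and how much is noise, which is exactly the disclosure being argued for.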

    From a clinical perspective, there is huge value in having a comparison of the patient in front of you with similar patients; I can make clinical decisions much faster. If the patient has a score far better than typical, I know this patient won't need much time with me and probably needs more of a home exercise program, with me in a consultant role. If the patient has a far worse score than typically expected, my little red flags go up, and I need to figure out whether it is due to fear or anxiety, or whether something is far more wrong with this patient. I have been able to refer out more quickly so the patient has the right provider giving care.

    I do believe caring is important; I don't believe that caring is what actually leads to good outcomes. I say this because many older female adults see their hairdresser frequently (and the same hairdresser over the years). I assume hairdressers care, yet their caring doesn't improve most physical conditions. Maybe I am wrong. I think caring creates a fantastic environment and improves relationships (and that is part of good outcomes); I just don't think it is the factor on which to hang one's hat.

    The reason MIPS will fail to improve quality for patients receiving physical therapy services is that too much flexibility is being allowed. The registries are not reporting on consistent measures, so those using a registry will choose the one that fits their reporting needs. Almost all of the registries include process measures and do not necessarily include an outcome measure that focuses on the amount of change that happened. This means some will report on process measures, skip the outcome of care, and apparently be viewed as fully reporting for the payment program. As for it being beneficial to physical therapists from an increased-payment perspective, I doubt it, because it won't be that difficult to meet the reporting requirements. If meeting the requirements is easy for all MIPS-eligible clinicians and the majority score 30-75 points, then, since the program is supposed to be a net-zero initiative, there will be hardly any scores of 0-30, which leaves very little funding to spread among those who scored in the incentive-payment range.
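    The budget-neutrality argument above can be made concrete with toy numbers: if almost no one falls below the penalty threshold, the penalty pool that funds the bonuses is nearly empty. The thresholds, penalty rate, and dollar figures below are invented for illustration and are not actual MIPS parameters.

```python
# Toy budget-neutral incentive pool: penalties collected from clinicians
# below the threshold fund bonuses for those above it. All numbers are
# invented for illustration, not actual MIPS values.

def penalty_pool(scores, payments, threshold, max_penalty_rate):
    """Penalty scales linearly from max_penalty_rate at score 0 to 0 at threshold."""
    pool = 0.0
    for score, payment in zip(scores, payments):
        if score < threshold:
            pool += payment * max_penalty_rate * (1 - score / threshold)
    return pool

payments = [100_000] * 10                          # ten clinicians, equal billing
easy = [70, 55, 80, 60, 45, 65, 75, 50, 90, 40]    # everyone clears a threshold of 30
hard = [10, 55, 80, 20, 45, 65, 75, 15, 90, 40]    # three clinicians fall below it

print(penalty_pool(easy, payments, threshold=30, max_penalty_rate=0.09))
# -> 0.0: nothing collected, so nothing to redistribute as bonuses
print(penalty_pool(hard, payments, threshold=30, max_penalty_rate=0.09))
# -> about 13,500 to spread among the seven high scorers
```

    Under the first distribution, which is the scenario Selena predicts for MIPS reporting, the bonus pool is zero no matter how well the top performers score.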

    Those are just my thoughts, Larry.

  2. Mark Werneke says:

    I listened to your comments and reminiscences in the November 27th Practice Leadership podcast on the topic of outcomes. I am a strong advocate for physical therapists collecting patient outcomes, so Larry's contemporary topic caught my eye. Before I start, I would like to thank Larry for taking the time to focus on the importance of outcomes. We appear to agree on the importance of documenting outcomes from the patient's perspective, and that the main purpose of outcome documentation for physical therapists is to capture changes in patients' functional status from our targeted interventions; after all, rehabilitation is function and function is rehabilitation.

    With that said, I would like to offer Larry and the podcast listeners a different viewpoint on many of the items Larry discussed, to stimulate further discussion. My viewpoints are based on my extensive full-time clinical experience treating patients with lumbar impairments, as well as my involvement in clinical research.

    A different view:
    1. MIPS:
    In contrast to Larry's thoughts on MIPS, I think MIPS is a positive first step toward rolling out an alternative payment plan based on value rather than the unsustainable fee-for-service method. Despite the consistent evidence in the literature supporting the use of Patient-Reported Outcome Measures (PROMs), transitioning to outcome documentation in everyday clinical practice has been challenging. MIPS offers an external hammer from the payer (I wish the internal hammer, i.e., clinician motivation to collect PROMs, would have sufficed) to motivate physical therapists to start collecting outcomes. Since MIPS is a budget-neutral program, there will be winners and losers. I believe physical therapists who document PROM data at the patient's bedside and use that data to enhance the effectiveness and efficiency of care will be the real winners. Physical therapists who collect PROM data merely as a process will face an uphill battle to remain competitive in the here-to-stay alternative-payment environment. Within MIPS, the physical therapist has the option to use the system as a process or as an integral measurement system to improve patient value. Since it is the holiday season, I wish MIPS were applied equally across all physical therapy practice settings, and that physical therapists working in hospital-based outpatient settings were also eligible MIPS participants.

    2. Legacy measures, e.g., ODI, LEFS, NDI, etc.
    Larry strongly supported administering legacy PROMs for outcome assessment, based on his decades of experience (1990-present) using these tools. I agree that these legacy measures are psychometrically sound, but are they the best choice for physical therapists working in today's busy and hectic clinical environments? A better PROM choice, I believe, for assessing a patient's change in functional status are those PROMs using computer adaptive testing (CAT) technologies. I think Larry would have a hard time convincing the FOTO or PROMIS developers and user-advocates that their CAT tools are flawed, which is what Larry suggested if I understood his comments correctly. Larry, if I misinterpreted your comment, please clarify. It is also intriguing that one of Larry's arguments against MIPS was the time sink (a better term would be clinician and patient burden) imposed on clinicians to document outcomes, because in my clinical experience legacy tools are much more burdensome than CAT functional status measures. As an example, Dr. Brodke et al., in two hallmark 2017 publications, reported that 1) the ODI took patients almost 4x longer to complete than the PROMIS CAT for assessing function, and 2) the ODI demonstrated moderately poor coverage, with a very large floor effect (30%), which presents a challenge when interpreting scores at the end of the spectrum. So, balancing the pros and cons of each type of measure, I strongly recommend CAT. Reminiscing on my own clinical experience: back in the 1990s I also used legacy PROM measures exclusively, but in 2002 I switched to CAT-administered functional status PROMs, and I have never looked back nor used a legacy outcome measure again. The clinical time savings from documenting patient outcomes with CATs were phenomenal.

    3. Global Rating of Change (GROC)
    Larry’s strong support for GROC as a valid instrument of change is also intriguing. If I understood correctly, Larry recommended the GROC as the preferred tool to elevate the entire profession for measuring change as a result of our treatment. In contrast, today’s evidence has challenged the GROC as a standard tool to interpret minimal clinically important improvement. There is now strong consensus among researchers that best methods for interpreting change in clinical outcome assessment scores combine and triangulate distribution and anchor-based approaches. Clinicians should be aware that the GROC has been criticized as being subject to recall bias and should not be considered the gold-standard anchor for interpreting patient change.

    4. Dirty 3rd Party Outcome Databases

    a. Biasing patient responses
    Again, if I understood Larry's opinion correctly, he considered 3rd party databases "dirty." I think he described them that way partially because he was concerned that, when a 3rd party database is used, patients are typically coached when completing outcome surveys. Coaching by clinical staff that influences patient responses on PROMs is a real problem, and I agree with Larry that coaching absolutely invalidates PROM results. However, this issue does not apply only to 3rd party outcome databases; since Larry stated he is the proprietor of his own outcome company, I wonder how he tackles this potential bias problem. Many 3rd party vendors are very explicit about educating physical therapists on how to administer PROMs and avoid prompting patients' answers, whether inadvertently or not.

    b. Provider comparisons
    Another reason I believe Larry described a database as dirty was his comment that outcome comparisons between providers are flawed because 3rd party databases do not adequately adjust or control for biopsychosocial factors and other variables that influence patient treatment outcomes, such as payer and medical comorbidities. In response, my first thought is that, whatever type of PROM the treating clinician selects, PROM data is observational data. Therefore, to improve the interpretation of observational data when comparing different providers, robust risk adjustment and sophisticated analyses are required to control for as many known and feasible-to-collect patient and clinician characteristics as possible. Larry, I would like to know your thoughts on two recent papers, Gozalo et al. (HSR 2016) and Deutscher et al. (JOSPT 2018), which addressed provider ranking and comparisons. The authors demonstrated that benchmarking models based on validated FS CAT measures clearly and significantly separated high-quality from low-quality providers, and could be used to inform value-based payment policies.

    My last thought regarding provider comparisons is based on the premise that physical therapists are intuitively competitive and want to know how their outcomes compare to their colleagues'. If raw data are used to compare providers, the therapist with lower performance typically states, "I always treat the hardest patients." To avoid spurious interpretation of performance differences between providers, and to avoid penalizing clinicians who treat medically complex patients, risk adjustment is a must. Risk adjustment is not perfect, but robust, sophisticated risk-adjusted methods allow a more equitable comparison of performance between providers than non-adjusted raw data. Deutscher et al. (JOSPT 2018) demonstrated significant differences in provider rankings based on raw vs. risk-adjusted data.
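    The raw-vs-risk-adjusted ranking point can be sketched with an observed-over-expected ratio: each provider is scored against what their own case mix predicts rather than against a raw average. The case-mix model below (expected change shrinking with comorbidity count) and all the patient numbers are invented for illustration; real models like those in the Deutscher and Gozalo papers adjust for many more variables.

```python
# Toy risk adjustment via an observed/expected (O/E) ratio. The "expected
# change falls 2 points per comorbidity" model is invented for illustration.

def expected_change(comorbidities, baseline=12, per_comorbidity=2):
    """Predicted functional-status change given a patient's comorbidity count."""
    return max(baseline - per_comorbidity * comorbidities, 0)

def oe_ratio(patients):
    """Observed/expected ratio; > 1 means better than the case mix predicts."""
    observed = sum(change for change, _ in patients)
    expected = sum(expected_change(c) for _, c in patients)
    return observed / expected

# Each patient: (observed change, number of comorbidities)
provider_a = [(11, 0), (12, 0), (10, 0)]   # healthy caseload, big raw gains
provider_b = [(7, 3), (6, 4), (8, 2)]      # complex caseload, smaller raw gains

raw_a = sum(c for c, _ in provider_a) / 3  # 11.0 -> A looks better on raw data
raw_b = sum(c for c, _ in provider_b) / 3  # 7.0
print(oe_ratio(provider_a))  # 33/36, about 0.92
print(oe_ratio(provider_b))  # 21/18, about 1.17 -> B outperforms after adjustment
```

    The ranking flips once the case mix is accounted for, which is the "I always treat the hardest patients" objection made quantitative.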

    c. Biopsychosocial factors
    I have one last thought in response to Larry's comment on the importance of the biopsychosocial model. First, I hope we can agree that baseline and serial psychosocial screening is a must for all physical therapists to consider. Many psychosocial factors are best considered modifiers and are not very strong predictors of patient treatment outcomes when the physical therapy treatment provided is good. But I agree with Larry that if below-par interventions are prescribed, then psychosocial factors become important and need to be controlled as risk-adjustment factors to better predict outcomes from the treatment services physical therapists provide.

    I hope my thoughts and viewpoints above will prompt ongoing and well-balanced discussions among colleagues on the topics of patient outcomes and the value of the care we provide our patients.
