In the roundtable that follows, clinicians discuss a study published in this issue of the Journal in light of its methodology, relevance to practice, and implications for future research. Article discussed:
Candiani M, Izzo S, Bulfoni A, Riparini J, Ronzoni S, Marconi A. Laparoscopic versus vaginal hysterectomy for benign pathology. Am J Obstet Gynecol 2009;200:368.e1-368.e7.
Was the hypothesis clearly stated?
What type of study was this?
How was the sample size determined?
How were the outcomes measured?
Were the statistical methods adequate?
What information is in the tables?
What proportion of patients underwent adnexectomy?
Was the main conclusion supported?
How might this study have been improved?
From the Women and Infants Hospital of Rhode Island, Alpert Medical School of Brown University, Department of Obstetrics and Gynecology, Providence, RI.
Hysterectomy is the most common nonobstetric surgical procedure performed in women. Each approach has its own set of risks and benefits. Decreased operative time, shorter hospital stay, and reduced recovery time have led surgeons to favor vaginal hysterectomy over laparotomy when the situation offers an option. However, when there is no evidence of uterine or adnexal malignancy and the uterus is mobile and of a feasible volume, it has yet to be determined whether the vaginal technique is also preferable to laparoscopic methods. In their randomized clinical trial, Candiani et al suggest that, in specific circumstances, laparoscopic methods might be a better choice.
See related article, page 368
For a summary and analysis of this discussion, see page 465
Kristen A. Matteson, MD, MPH, and George A. Macones, MD, MSCE
Matteson: Can you talk a little about the background of this study?
Sacco: Over the past 10 years, studies have shown that a hysterectomy by laparotomy has a higher incidence of complications and requires a longer hospital stay compared with other surgical approaches. Therefore, an increasing proportion of hysterectomies are being performed vaginally or laparoscopically. There is a paucity of data comparing vaginal vs laparoscopic hysterectomy. However, many surgeons consider the vaginal route to be the “preferred route” for hysterectomy when compared with the laparoscopic approach. This study aimed to compare the 2 approaches.
Matteson: How does the study contribute to knowledge in the field?
Sacco: The study challenges whether the vaginal route should be the preferred approach for hysterectomy. I think the results of the study can be taken into consideration when we plan hysterectomies for our patients. The results may also be helpful when we counsel a patient about which type of hysterectomy is most appropriate for her.
Matteson: What research question was under investigation?
Jackson: The research question was whether operative and postoperative outcomes such as operative time, estimated blood loss, postoperative pain, and duration of hospital stay differ between laparoscopic and vaginal hysterectomy.
Matteson: What was their hypothesis? Was it clearly stated?
Jackson: Their hypothesis was that women who undergo laparoscopic hysterectomy have a shorter hospital stay than women who undergo a vaginal hysterectomy, although this was never clearly stated. In the introduction, the authors said that their goal was to evaluate the differences in terms of earlier discharge between the 2 approaches. However, if you look through their methods, the hypothesis can be inferred from their assumptions for the sample-size calculation. The authors hypothesize that, compared with women who undergo vaginal hysterectomy, a greater proportion of women with a laparoscopic hysterectomy will be discharged home on postoperative day 2.
Matteson: Where was this study conducted? Who was the study population?
Raker: The study was conducted in Milan, Italy, at the San Paolo Hospital, which is an affiliate of the University of Milan School of Medicine. The study population was 60 patients with an indication for vaginal hysterectomy for benign diseases.
Matteson: How were participants recruited? What was their refusal rate for participation in this surgical trial?
Phipps: I am unclear about how the participants were recruited. It states that patients were eligible if they had an indication for a vaginal hysterectomy for benign disease. But it was unclear whether they were recruited in an office setting or in a preoperative waiting area at the hospital. They assessed 95 patients for eligibility, and they had a refusal rate of about 7%. They excluded a total of 35 patients. Seven refused, and 28 did not meet the inclusion criteria, but we don't know which criteria they failed to meet.
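The refusal rate depends on which denominator is used. The following is a minimal sketch of that arithmetic, using only the counts quoted above. It assumes, which the article does not confirm, that the 7 women who refused were otherwise eligible.

```python
# Counts quoted in the discussion above.
assessed = 95      # patients assessed for eligibility
refused = 7        # declined to participate
ineligible = 28    # did not meet inclusion criteria

enrolled = assessed - refused - ineligible

# Denominator 1: everyone who was assessed.
refusal_among_assessed = refused / assessed

# Denominator 2: only those presumed eligible (ASSUMPTION: the women who
# refused had already met the inclusion criteria).
presumed_eligible = assessed - ineligible
refusal_among_eligible = refused / presumed_eligible

print(f"enrolled: {enrolled}")
print(f"refusal among assessed: {refusal_among_assessed:.1%}")
print(f"refusal among presumed eligible: {refusal_among_eligible:.1%}")
```

The "about 7%" figure corresponds to the first denominator; under the stated assumption, the rate among eligible women would be closer to 10%.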
Matteson: What type of study was this?
Raker: This was an unblinded parallel group randomized controlled trial.
Matteson: What do you think about a surgical randomized controlled trial? Is this typically a study that is easy or difficult to recruit for?
Jackson: Overall, a randomized controlled trial is probably a good way to compare surgical procedures in terms of outcome. This design gives you the means to control as many factors as you possibly can. It allows the investigator to control the participant population and to standardize who conducts the surgery, how the surgery is performed, and the environment for the study.
I think it would be difficult to recruit patients to a surgical randomized controlled trial. Patients have access to a lot of health information; based on this information, they may already have a preference for a specific procedure or approach to hysterectomy. Because of this, they may be unwilling to be randomly assigned. I was actually impressed and surprised that their refusal rate was so low.
Matteson: What benefits are there to doing a randomized controlled trial rather than a cohort study?
Sacco: One advantage of a randomized controlled trial is that you are likely to get similar groups in terms of baseline characteristics, although with smaller sample sizes, as in this study with 60 participants, this is not guaranteed. Another advantage is that the study design controls for both known and unknown confounders that could affect study results. For example, when you are doing a randomized controlled trial with a surgical procedure, there may be patient characteristics that would affect the length of stay. These patient characteristics might also influence the physician's choice of surgical procedure. Randomly assigning patients to the 2 surgical approaches can control for confounding.
Matteson: How did they randomize subjects?
Raker: Subjects were randomly assigned into 2 equal-sized groups using computer-generated sequences. The treatment allocation was concealed until after the patient was enrolled.
Matteson: What do you think about their method of randomization?
Raker: The method appears to have been simple randomization, but it was not explicitly stated. Simple randomization works best when the sample size is large. However, with small-to-moderate sample sizes, such as in this study, it may not sufficiently balance the distribution of measured and unmeasured confounders between the 2 treatment groups.
Looking at the distribution of baseline characteristics in Table 1 in the article, some imbalances between groups were evident, such as a higher prevalence of nulliparity and a lower mean body mass index (BMI) in the laparoscopic group. Even though none of the differences was statistically significant (that is, none reached P < .05), the degree of residual confounding depends on the magnitude of the group differences and on how strongly each factor is associated with the outcome. The authors do comment later that BMI is not expected to influence the outcomes of interest, but it would have been reassuring to see the effect of adjusting for the imbalances.
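To illustrate the point about small samples, the sketch below contrasts simple randomization with permuted-block randomization. The block size of 4, the seed, and the arm labels are illustrative assumptions; the article does not describe its allocation scheme in this detail.

```python
import random

ARMS = ["laparoscopic", "vaginal"]

def simple_randomization(n, rng):
    """Assign each patient independently, like a fair coin flip."""
    return [rng.choice(ARMS) for _ in range(n)]

def block_randomization(n, rng, block_size=4):
    """Permuted blocks: each block holds equal numbers of both arms, shuffled."""
    if n % block_size or block_size % 2:
        raise ValueError("n must be a multiple of an even block size")
    sequence = []
    for _ in range(n // block_size):
        block = ARMS * (block_size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence

rng = random.Random(2009)  # arbitrary seed, for reproducibility only
simple = simple_randomization(60, rng)
blocked = block_randomization(60, rng)

print("simple :", simple.count("laparoscopic"), "vs", simple.count("vaginal"))
print("blocked:", blocked.count("laparoscopic"), "vs", blocked.count("vaginal"))
```

Simple randomization can leave the arms unequal in size by chance, whereas permuted blocks force a 30/30 split. Neither approach guarantees balance on covariates such as BMI or parity; stratified randomization would be needed for that.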
Matteson: What were inclusion and exclusion criteria? Does it matter that the criteria were so specifically defined?
Phipps: Women were included if they had a benign indication for a vaginal hysterectomy. They were excluded if their uterine volume was greater than 300 cc, if they had previous surgery for pelvic inflammatory disease or endometriosis, if they had a suspected malignancy, if they had an ovarian cyst of > 4 cm, or if they had vaginal prolapse exceeding first-degree prolapse.
Strict inclusion and exclusion criteria were important for this study because the patients had to be appropriate candidates for both vaginal hysterectomy and laparoscopic hysterectomy. The investigators had to choose patients for whom there was not clear evidence that 1 or the other method was the “better” route for hysterectomy.
Matteson: What was their intervention?
Sacco: The intervention was either a vaginal or a laparoscopic hysterectomy.
Matteson: Why is it important that they chose “2 skilled surgeons” for each group?
Sacco: It was important that there were 2 skilled surgeons because the level of training is very closely related to some of the intraoperative factors, such as estimated blood loss and operative time. These intraoperative factors influence the primary outcome, hospital length of stay. Using skilled surgeons helps remove surgeons' experience from the study as a complicating factor. It allowed researchers to compare only the procedures.
Matteson: What could have happened if they did not standardize the surgical intervention?
Sacco: It would have been difficult to compare the approaches if they did not specify the exact surgical procedure. In this case, total vaginal hysterectomy was accomplished with the Heaney technique; total laparoscopic hysterectomy was carried out by a type IV E procedure, as set forth by the American Association of Gynecologic Laparoscopists. In the latter, the vaginal cuff was closed with 2 sutures and suspended to the uterosacral ligaments. If the surgical procedures were not standardized, there likely would have been more variation in the outcome measures, and the comparison of the laparoscopic vs vaginal approaches would have been more difficult.
Matteson: How does this standardization affect the generalizability of results?
Sacco: Unfortunately, the specific procedures that were studied and the surgeons who were selected to participate in this study both affect the generalizability of the study results. The results of this study apply only to total laparoscopic hysterectomy and total vaginal hysterectomy when the described techniques are used. There are several other variations of laparoscopic and vaginal hysterectomies that were not considered for this study, such as supracervical hysterectomy or laparoscopic-assisted vaginal hysterectomy. Therefore, this study really is applicable only to women who may be deciding between total vaginal hysterectomy and total laparoscopic hysterectomy and who have a surgeon skilled in both techniques. In reality, most physicians are most comfortable with 1 of these methods, but they might be operating with residents and trainees, which is a factor that was not considered in this study. Skill and learning curves can certainly influence the estimated blood loss, operative time, and length of stay.
Matteson: Is it possible that the skilled surgeons were also investigators for this study?
Sacco: I think it is possible. The article does not specifically state who the skilled surgeons were, so we don't really know.
Matteson: What were their outcomes? How were they measured?
Jackson: The main outcome of this study was the length of stay after the procedure. The specifically stated secondary outcomes included operative time, estimated blood loss, postoperative pain, and whether a planned salpingo-oophorectomy could be performed at the time of surgery. The measurement for length of stay was the postoperative day number on the day of discharge. Specifics were not provided for how they measured operative time and estimated blood loss. For postoperative pain, they used 3 measures: the visual analog score for pain, the number of days that analgesia was requested, and the number of analgesic dosage units provided. Then they recorded whether a previously planned adnexectomy was carried out and assessed the proportion of patients in each group who were able to have the intended procedure.
Matteson: Given that length of stay is the main outcome, what advantages does a randomized controlled trial offer over a cohort study?
Jackson: When you are looking at length of stay, it would be easy to introduce bias if you were using a cohort study design. If this had been a cohort study, the surgeon probably would assign patients to surgical procedures based on their particular characteristics, especially subtle details of their physical examination. If the surgeon decides to perform laparoscopic hysterectomies on all of the young healthy patients, it could influence that group to have a faster discharge. Or, a surgeon may believe 1 approach is easier for someone with a higher BMI, even if there is no clear evidence that compares techniques in that population. The increased morbidity associated with BMI might then influence a patient's postoperative stay.
Another possibility is that patients may have preconceived notions about surgical procedures. Patients have access to information from friends, family, the internet, and preoperative counseling from their surgeon. If a surgeon sets up different expectations for different procedures, it could influence the outcome. For example, during preoperative counseling, a surgeon may say, “After a laparoscopic hysterectomy, patients tend to feel great. Some even feel like they didn't have surgery and want to go home the next day.” If that same surgeon tells his vaginal hysterectomy patient, “You will be uncomfortable after surgery and will be in the hospital for 2-3 days,” that counseling could confound the results.
Matteson: Was anyone blinded in this study?
Sacco: Surgeons, participants, and individuals who assessed the outcome were not blinded.
Matteson: Do you think that the lack of blinding could have introduced bias into the study? Why?
Jackson: I think that there were a couple of avenues where bias could have entered the study. The patients obviously were counseled on both procedures. Again, patient knowledge of the procedures and the counseling they received could have influenced their perceptions; these perceptions could then have influenced the actual length of stay. For example, patients might think they should have more pain with 1 type of procedure or that 1 procedure would require a longer stay.
If the surgeons knew that they were being studied, they could strive to operate faster. Or, when surgeons examined patients postoperatively, they could take the particular procedure into account when deciding whether to discharge. Preconceived notions about length of stay connected with procedures could bias the discharge decision.
Hospital staff could influence outcomes as well. The staff on the postoperative floor, knowing which procedure the patient had, may offer pain medications at different intervals. If a nurse is used to offering a patient pain medicine every 4 hours after 1 procedure and every 6 hours after the other, this could affect outcome. There are many levels where lack of blinding could introduce bias, especially because of predetermined ideas about the surgery.
Matteson: Could there have been blinding at any point in the study? Was this adequately addressed in the discussion?
Jackson: The investigators could use study personnel or physicians who were unaware of the surgical procedure that had been performed to assess the outcomes and appropriateness for discharge. They could use the authors' strict list of criteria for discharge home. If it involved examining the abdomen, the vaginal hysterectomy patients could have adhesive bandages placed on the abdomen. The study personnel could then go through that checklist and decide whether the patient should be discharged. The decision would then be less likely to be influenced by preconceived notions of how long a patient should stay in the hospital after laparoscopic hysterectomy or vaginal hysterectomy.
Matteson: Did you note any sources of bias other than lack of blinding? Were they adequately addressed in the discussion section?
Phipps: The major source of bias that permeates the study is lack of blinding. The lack of blinding could even enter into the intraoperative measures of blood loss. Bias can be introduced by knowing which surgery is being performed and having unblinded individuals assess the outcome. For estimated blood loss, did the nursing staff or the surgeon record the blood loss at the end of the case?
Matteson: Who determined the outcomes? Why is this important?
Phipps: It is unclear who assessed the outcomes. It could have been the investigators, the surgeons, or other members of the team. This is important because, as previously mentioned, it could introduce bias in terms of when patients were discharged home. If the investigators or surgeons know the type of surgery that was performed, they could have a preset idea of what length of stay that surgical procedure would have. Or the investigator could actually have a preference for a surgical procedure and may actually let a patient go home sooner, given that preference. In general, for a randomized trial, the person doing the intervention should be separate from the person who is analyzing the outcome. I don't believe this was adequately addressed as a possible limitation.
Matteson: How was sample size determined for this study? Does this make sense?
Raker: The sample size of 60 patients, 30 per arm, was based on the primary endpoint: hospital stay. For the calculation, hospital stay was defined as the proportion discharged on day 2 and was assumed to be 5% in the vaginal arm and 30% in the laparoscopic arm, for an absolute difference of 25%. It would be helpful to know why this effect size was chosen. Although a test of proportions was used for sample size estimation, the P values that were reported in Table 3 in the article and the abstract are based on a 2-sided t test of mean hospital stay in days. Thus, the planned comparison and the reported analysis are not consistent.
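For readers who want to see how such a calculation works, here is a minimal sketch of a common normal-approximation formula for comparing 2 proportions. The 2-sided alpha of .05 and power of 80% are assumptions made for illustration; the discussion above does not state which values or which formula the authors used.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Patients needed per arm to detect p1 vs p2 (unpooled normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 2-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Proportions discharged on day 2, as assumed in the article:
# 5% in the vaginal arm, 30% in the laparoscopic arm.
n = n_per_group(0.05, 0.30)
print(n)
```

Under these assumed settings the formula returns 33 per group, which is in the neighborhood of the 30 per arm that were enrolled. Other formulas, continuity corrections, or power targets give somewhat different numbers.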
Matteson: What statistical methods were used to compare the groups? Do you think the methods were adequate?
Raker: The primary outcome, hospital stay in days, was compared by t test, as were other intraoperative and postoperative continuous outcomes: operative time, blood loss, pain score, and days of analgesic requests. T tests assume that the continuous data are distributed normally or that the sample size in each group is large enough to assume that the sample means are distributed normally by the central limit theorem.
With 30 patients per group, the latter assumption may not hold, and nonparametric methods may be more appropriate. Therefore, the authors should have commented on whether distribution checks, such as normal probability plots, were performed to assess normality. Categoric outcomes (fever, major complications, and all follow-up variables) were appropriately analyzed by χ² or Fisher exact test.
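One distribution-free alternative to the t test is a permutation test on the difference in means. The sketch below is illustrative only: the lengths of stay are invented for the example and are not data from the trial, and the number of permutations and the seed are arbitrary choices.

```python
import random

def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """2-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = abs(mean_diff(pooled))
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if abs(mean_diff(pooled)) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated P value above zero.
    return (extreme + 1) / (n_perm + 1)

# HYPOTHETICAL lengths of stay in days (not from the article).
vaginal_stay = [3, 3, 3, 4, 3, 2, 3, 5, 3, 3]
laparoscopic_stay = [2, 2, 3, 2, 3, 2, 2, 4, 2, 3]

p_value = permutation_test(vaginal_stay, laparoscopic_stay)
print(f"permutation P value: {p_value:.3f}")
```

The test makes no normality assumption: it asks how often a difference at least as large as the observed one arises when the group labels are shuffled. A rank-based test such as the Mann-Whitney U test would be another reasonable choice for skewed length-of-stay data.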
Phipps: Having a more explicit description of how they did their assessment would actually help us understand whether bias was introduced. For example, for the decision about discharge, did everyone have their assessment at the same time of day, every day, or was it twice a day? Could they have been discharged in the afternoon, or was it only in the morning? I think those types of things leave readers at a bit of a loss as to how to interpret the overall findings.
Interpreting mean hospital stay is difficult because they don't mention what time of day the surgeries were conducted. It was also not clear how they measured their main outcome, length of stay. Was it just the day of discharge? This is what I think they did. If someone's surgery ended at 5 pm and discharge was on day 3 at 10 am, is that different from someone whose surgery ended at 10 am and who was discharged home on day 3 at 10 am? The results might have been easier to interpret if they looked at duration of hospitalization rather than day of discharge.
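The 2 patients in this example can be worked through directly. In the sketch below the calendar dates are arbitrary placeholders; only the clock times come from the example above.

```python
from datetime import datetime

def postoperative_day(surgery_end, discharge):
    """Day-of-discharge measure: whole calendar days since the day of surgery."""
    return (discharge.date() - surgery_end.date()).days

def hours_in_hospital(surgery_end, discharge):
    """Duration measure: elapsed hours from end of surgery to discharge."""
    return (discharge - surgery_end).total_seconds() / 3600

# Placeholder dates; surgery on day 0, discharge at 10 am on day 3.
patient_a = (datetime(2009, 1, 5, 17, 0), datetime(2009, 1, 8, 10, 0))  # surgery ended 5 pm
patient_b = (datetime(2009, 1, 5, 10, 0), datetime(2009, 1, 8, 10, 0))  # surgery ended 10 am

day_a, day_b = postoperative_day(*patient_a), postoperative_day(*patient_b)
hours_a, hours_b = hours_in_hospital(*patient_a), hours_in_hospital(*patient_b)

print(day_a, day_b)      # both discharged on postoperative day 3
print(hours_a, hours_b)  # 65.0 and 72.0 hours
```

Both patients count as a day 3 discharge, yet their time in hospital differs by 7 hours, which is the information a day-of-discharge measure discards.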
Matteson: Were participants analyzed in the groups to which they were allocated?
Jackson: The patients were analyzed in the groups to which they were assigned.
Matteson: Did the study account for participants at each stage of the study? Take us through participant flow. Was follow-up complete?
Sacco: The study does account for participants at each stage. The number of women lost to follow-up in each group was not very different, and overall follow-up was good. There was no crossover between the 2 groups (ie, if a woman was assigned to a certain procedure, that procedure was done 100% of the time). If there had been any crossover, the participant should still have been analyzed in the group to which she was first assigned. Although this was not an issue in this study, it is a very important point to bring up whenever we are talking about a randomized controlled trial.
Matteson: Take us through Table 1. Why is it important that the baseline characteristics were the same? How could it affect the results if, say, uterine volume was drastically higher in the vaginal hysterectomy group?
Jackson: Table 1 in the article is a standard table in that it outlines the baseline characteristics of the participants. Although none were significantly different between the 2 groups, the differences in nulliparity and BMI approached significance. Given the small sample size, the differences may still influence the results. It is important to have these characteristics balanced between groups so that they don't affect your outcome.
For example, let's say uterine volume was drastically higher in the vaginal hysterectomy group. A bigger uterus could mean a more difficult surgery, regardless of approach. Nulliparity and increased BMI could also make 1 type of hysterectomy more difficult than the other. A more difficult surgery could lead to a longer hospital stay. Another interesting thing is that, in the baseline characteristics, they never tell you how many cesarean sections a participant had or the specific type of pelvic surgery that had occurred. These both could influence either surgery. If someone had 3 cesarean sections or had a surgery for a ruptured appendix, they would be more likely to have adhesions and then have a tougher surgery. It would be helpful to know whether the groups were different in terms of difficulty of the cases when the surgeons performed the procedure.
Matteson: Take us through Table 2. Why is this important?
Sacco: Table 2 in the article is important because it presents some of the major intraoperative outcomes that are associated with hysterectomy. If there were definitive differences in these outcomes between modes of hysterectomy, it could help surgeons decide which surgery may be appropriate for a given patient. Laparoscopy took longer than vaginal hysterectomy but was associated with less blood loss. However, the clinical differences were very small. If the differences were clinically important and the surgeon wanted to minimize operative time for anesthesia reasons, vaginal hysterectomy could be considered a better option. Blood loss is very difficult to compare between these 2 types of surgeries. A laparoscopy may have bleeding that the surgeon does not see because the collection is in the cul-de-sac or in the paracolic gutters, for example. With vaginal hysterectomy, the surgeon is going to be aware of even minimal bleeding.
Matteson: Can operative time and mean blood loss affect length of stay? Do you think it is likely that a 20-minute difference in surgical time or an approximately 100-mL difference in mean blood loss would drastically affect length of stay?
Sacco: Both of these factors can have an impact on length of stay. However, I think that the study's noted operative time difference of only 20 minutes likely would not affect the length of stay. Likewise, the difference of mean blood loss of 100 mL wouldn't significantly influence the length of stay.
Matteson: What do you think about the proportion of patients who actually underwent the planned adnexectomy?
Phipps: This is also presented in Table 2. In the laparoscopic group, 100% of planned adnexectomies were carried out. In the vaginal hysterectomy group, 73% of women who were to undergo adnexectomy actually had the procedure. We don't really know the indications for adnexectomy in these patients, and we don't know why they couldn't do all of the procedures in the vaginal hysterectomy group. It is an interesting difference, but we would like to know why it exists. It could be that, at the time of vaginal hysterectomy, the ovaries were visualized, palpated, and felt normal and therefore were left in place. This would be appropriate management.
Matteson: Take us through Table 3. What were the important findings?
Jackson: Table 3 in the article presents the postoperative parameters, starting with the primary outcome, length of stay. It presents data on secondary outcomes, postoperative pain, and pain control. It also elaborates on factors that can contribute to length of stay. The only significant differences were the length of stay (mean hospital stay and proportion discharged on each day), the mean days of analgesic request, and the pain level on day 0. In terms of these factors, the laparoscopic approach was favored based on the shorter stay and better pain control on day 0.
Matteson: What criteria were the vaginal hysterectomy patients not meeting that led them to a later discharge? Is this stated in the results?
Jackson: The article detailed the discharge criteria, which included a resumption of normal bowel motility; normal abdominal and vaginal findings, as determined by objective criteria; absence of fever; absence of urinary problems; and patient comfort. There is a difference in postoperative fever, but it is not significant. In addition, pain on day 0 differed between the 2 groups, but no difference existed on any other days. On average, members of the vaginal procedure group requested analgesia for 1.6 days; those women who underwent laparoscopic hysterectomy requested analgesia for an average of 0.9 days. The difference was significant (P = .017).
It is unclear what kept patients in the hospital. It seemed that, on the main days of discharge, their conditions were fairly similar. So, something else contributed to the discharge decision. If, for example, that factor was indeed preconceived notions of when discharge should take place, then using personnel who were blinded to the procedure might have dispelled this concern. This was not presented thoroughly in the discussion section.
Matteson: The follow-up data are presented in Table 4. What did the authors report?
Phipps: In Table 4 in the article, we see results from the 1-, 6-, and 12-month follow-up. Looking at prolapse, urinary problems, sexual activity, and resumption of work, there were no significant differences between the laparoscopic group and the vaginal hysterectomy group.
Matteson: Looking at the data shown and the analyses performed, could the authors have done anything differently? For example, could they have shown a table that displayed length of stay as a function of the other factors, such as operative time, blood loss, and mean days of pain medication request?
Raker: Presentation of the endpoint results would have been enhanced by including measures of association, such as mean difference or risk difference and their corresponding 95% CIs. These measures would provide a range of estimates compatible with the data. Adjustment for baseline prognostic factors by stratification or regression would have addressed possible residual confounding. The authors discussed how differences in the distribution of intra- and postoperative variables between the groups might have accounted collectively for the association between surgical technique and hospital stay. However, a table with these variables and length of stay would show how strongly these factors predicted hospital stay in this population.
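As a sketch of what such a measure of association looks like, the following computes a risk difference with a Wald 95% CI. The event counts are hypothetical and chosen only to make the example run; they are not results from the article. The Wald interval is one simple choice and can perform poorly with small samples or proportions near 0 or 1.

```python
import math
from statistics import NormalDist

def risk_difference_ci(events_1, n_1, events_2, n_2, level=0.95):
    """Risk difference (group 1 minus group 2) with a Wald confidence interval."""
    p1, p2 = events_1 / n_1, events_2 / n_2
    rd = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n_1 + p2 * (1 - p2) / n_2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rd, rd - z * se, rd + z * se

# HYPOTHETICAL counts of early discharge: 9 of 30 vs 2 of 30 (not from the article).
rd, low, high = risk_difference_ci(9, 30, 2, 30)
print(f"risk difference {rd:.3f} (95% CI, {low:.3f} to {high:.3f})")
```

Reporting the interval alongside the P value shows the range of effect sizes compatible with the data, which matters in a trial of 60 patients where that range is likely to be wide.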
Matteson: What was the main stated conclusion? Can the results support the claim that vaginal hysterectomy is not superior to laparoscopic hysterectomy?
Sacco: The main stated conclusion is that a vaginal hysterectomy is not necessarily superior to a laparoscopic hysterectomy. I think the study can support the claim that a vaginal hysterectomy is not necessarily superior, but I don't think it can support the claim that a laparoscopic hysterectomy is superior to vaginal hysterectomy. It is probably more accurate to say that the patients who underwent a laparoscopic hysterectomy in this study had a shorter hospital stay than those women who underwent a vaginal hysterectomy.
Matteson: Do you think the ability to remove ovaries at the time of vaginal hysterectomy (compared with laparoscopic hysterectomy) is surgeon dependent? Would this study change your management?
Phipps: It definitely can be surgeon dependent. Adhesions, previous surgery, and many technical aspects of the surgery itself affect whether adnexectomy can be accomplished. Not all surgery is the same. For this study, we can be somewhat reassured that the surgeons' skill was comparable, so that should not have influenced the surgical procedure. Based on this study, if I needed to do a salpingo-oophorectomy, I don't think it would change my management. Sometimes it might be easier to take out ovaries laparoscopically, but every surgeon and every surgery is different.
Matteson: How does this study apply to your clinical practice?
Sacco: One thing I would take away from this is that, if a patient absolutely needs to have a salpingo-oophorectomy, a laparoscopic approach may be preferable. As I stated before, that decision will still depend on how comfortable the surgeon is with the surgical procedure. For a surgeon who is equally comfortable with vaginal and laparoscopic hysterectomy, it may be better to do a laparoscopic hysterectomy if there is a solid indication for removing the tubes and ovaries.
An important point to mention is that this study applies to a very narrow subset of our patients who undergo hysterectomies. It included patients who had unspecified indications for hysterectomy, a small uterus, no history of surgeries for pelvic inflammatory disease or endometriosis, no significant pelvic organ prolapse, no ovarian cysts > 4 cm, and no suspected malignancy. These are the patients for whom it is not quite clear which surgical intervention may be better, and they are a small share of those who need a hysterectomy.
Matteson: How could this study have been improved?
Jackson: I think the study could have been improved by keeping as many people as possible blinded to what procedure occurred for as long as possible. Personnel who evaluated the outcome could have been blinded also. The patients could have been blinded as well; they could all have been given a vaginal pack and adhesive bandages over various parts of their abdomen to represent possible incisions. The nurses could have been told that the study patient had either a laparoscopic or vaginal hysterectomy; no other information about the approach would have been provided. This could have reduced the potential for bias in outcome assessment and postoperative care. Postoperative care could have been more standardized. For example, every study patient could have been offered pain medication and ambulation at prespecified times.
Matteson: Are there any other approaches that could address this study question?
Raker: A prospective cohort study could be performed with adjustment for group differences by propensity scores. With this method, characteristics at baseline would be entered into a regression model with surgical treatment arm as the outcome. The resulting score, or predicted probability from the model, would be used to match patients in each treatment group. This should result in balanced distributions of the baseline factors in each group. However, unlike in randomized trials, unmeasured confounders may not be evenly distributed.
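The steps just described can be sketched in a few lines. Everything in this example is an assumption made for illustration: the cohort is simulated, the 2 covariates (standardized BMI and nulliparity) and their effects on treatment choice are invented, the logistic model is fitted by plain gradient descent, and matching is greedy 1:1 nearest neighbor within a caliper of 0.1.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression by batch gradient descent; weights[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = propensity(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def propensity(w, xi):
    """Predicted probability of receiving the laparoscopic approach."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def match_pairs(scores, treated, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching on the score, without replacement."""
    controls = [i for i, t in enumerate(treated) if t == 0]
    pairs = []
    for i in (i for i, t in enumerate(treated) if t == 1):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs

# SIMULATED cohort (hypothetical): treatment choice depends on the covariates.
rng = random.Random(1)
X, treated = [], []
for _ in range(200):
    bmi_z = rng.gauss(0, 1)
    nulliparous = 1.0 if rng.random() < 0.3 else 0.0
    X.append([bmi_z, nulliparous])
    treated.append(1 if rng.random() < sigmoid(-0.8 * bmi_z + 0.5 * nulliparous) else 0)

weights = fit_logistic(X, treated)
scores = [propensity(weights, xi) for xi in X]
pairs = match_pairs(scores, treated)
print(f"{len(pairs)} matched pairs from {sum(treated)} laparoscopic patients")
```

Outcomes would then be compared within the matched pairs. The matching balances only the covariates that were entered into the model, which is the limitation noted above.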
© 2009 Mosby, Inc. Published by Elsevier Inc. All rights reserved.