Volume 196, Issue 5, Pages e1-e5, May 2007
Discussion: ‘Spot urine testing in evaluation of preeclampsia’ by Wheeler et al
In the roundtable that follows, clinicians discuss a study published in this issue of the Journal in light of its methodology, relevance to practice, and implications for future research. Article discussed:
Wheeler TL, II, Blackhurst DW, Dellinger EH, Ramsey PS. Usage of spot urine protein-to-creatinine ratios in the evaluation of preeclampsia. Am J Obstet Gynecol 2007;196:465.e1-465.e4.
Discussion Questions
TABLE. Test performance of spot P:C ratio assessment for 24-hour urine protein results
24-hour urine collection quantitative protein | Optimal spot P:C (per ROC analysis) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%)
300 mg | 0.21 | 86.8 | 77.6 | 81.9 | 83.3
1000 mg | 0.46 | 87.5 | 82.4 | 53.8 | 96.6
2000 mg | 0.82 | 100 | 94.8 | 62.5 | 100
5000 mg | 3.0 | 100 | 100 | 100 | 100
ROC = Receiver Operator Characteristic Curve; PPV = Positive Predictive Value; NPV = Negative Predictive Value.
Introduction
Obstetricians are always alert to evidence of preeclampsia among patients in the third trimester. Concerns typically arise when patients present with increased blood pressure, spurring a detailed laboratory assessment. The gold standard for proteinuria, a key component in the assessment of preeclampsia, is a 24-hour urine collection, which can delay diagnosis. A more rapid test capable of accurately predicting the results of the 24-hour urine collection would be valuable. With that in mind, researchers examined the correlation between a spot urine protein-to-creatinine ratio (P:C) and a 24-hour urine collection in patients under evaluation for preeclampsia. A strong correlation, if it exists, could influence our approach to patients.
Background
Odibo: Preeclampsia is a significant contributor to maternal mortality, not only in the United States but internationally, and it affects 2-8% of all pregnancies [1]. One of the ways we diagnose preeclampsia, apart from the blood pressure criteria, is to look for the presence of significant proteinuria. As we know, waiting for the 24-hour urine collection and results can often delay the diagnosis of preeclampsia. This paper by Wheeler and colleagues discusses the use of a spot urine P:C for the evaluation of preeclampsia.
Odibo: Is this a useful topic for study, and if so, how useful is it to the general obstetrical population?
Despotovic: This is an excellent topic, given that the ability to accurately substitute a spot urine P:C for a 24-hour urine collection would have significant implications in patient management, including facilitation of prompt clinical decision-making. This would also impact health care costs and improve patient satisfaction with care. Given that preeclampsia is a common and serious complication of pregnancy, any tool that aids in more rapid diagnosis would benefit both patients and obstetricians.
Study Objective and Design
Odibo: Is the objective of the study clearly stated, and is there a well-defined study hypothesis?
Rampersad: The objective, clearly stated, was to compare 2 methods of assessing urinary protein: a spot urine P:C and a 24-hour urine collection.
Odibo: How would you describe the overall study design?
Despotovic: This was a prospective study to determine the correlation between a spot urine P:C and 24-hour urine protein collection. Urine was collected from patients who met admission criteria for preeclampsia, and the spot value obtained at the beginning of the 24-hour collection period was compared to the result acquired from the 24-hour collection to see if these values were correlated.
Odibo: Is the design primarily related to the study question, or was this a secondary analysis from a different study?
Despotovic: The study was designed to specifically address the study question. There was no information given to indicate that this was a secondary analysis.
Martin: I had some questions about the overall design. They did not use the term prospective in their description; they only reported that samples were gathered from 154 patients who met the admission criteria. The samples were collected over a period of a year and a half. Considering how common preeclampsia is, this number is not very large. We don’t know whether these were consecutive patients or some fraction of the eligible population.
Odibo: Your points are well taken, Dr. Martin. We will address the sample size in a moment.
Odibo: Were the inclusion and exclusion criteria appropriate for the study design?
Martin: The inclusion of women meeting inpatient criteria for the evaluation of preeclampsia implies that if correlation between the spot P:C and the 24-hour urine collection was high, then a P:C could replace the 24-hour urine collection. As Dr. Despotovic pointed out, this would shorten the duration of inpatient evaluation for many patients. I agree this was a worthy group to study.
No information was provided regarding the presence or absence of comorbid conditions such as diabetes or underlying renal insufficiency. We have to assume, therefore, that the inclusion criteria were very broad, and could be generalized to any pregnant woman who met inpatient criteria for the evaluation of preeclampsia. Very few patients were excluded. Exclusion appeared limited to women who had bacteriuria or who were recumbent for more than 24 hours. We do not know what percentage of the eligible study population was excluded on these grounds. Women who did not complete the 24-hour urine collection because they delivered before the collection period ended were also excluded, which seemed reasonable.
It would be interesting to have more information regarding the factors that led to delivery in the group that delivered before completion of the 24-hour urine collection. This was nearly 20% of the study population. Perhaps even more important, it’s not clear whether the managing physician was blinded to the urine P:C, or whether this information was somehow utilized in the decision to deliver. This might have, in fact, significantly skewed the severity of proteinuria in those women who went on to complete a 24-hour urine collection. Women with the highest P:Cs may have been pulled out early in the absence of blinding.
Odibo: Those are very valid points, which bring me to the next question. Was it appropriate to exclude women who were on bed rest for more than 24 hours?
Martin: Two references were provided to support the exclusion of women who were on bed rest for more than 24 hours. The first was a paper by Young and colleagues that does not evaluate the impact of bed rest on urine P:C but cites the second reference [2]. The second reference was a paper by Ginsberg et al published in 1983 [3]. In the paper by Young et al, P:Cs were collected, optimally, from women before the 24-hour urine collection began. However, one third of the population was inadvertently not sampled before commencement of the 24-hour urine collection. Instead, they were sampled at the end of the 24-hour collection. There was no difference between these 2 groups. It is highly likely that the women sampled at the end of the 24-hour collection had been recumbent for more than 24 hours, which calls into question the validity of excluding such patients. It is unlikely that these women, who were having a 24-hour urine collection while being observed in the hospital for preeclampsia, were up and walking around.
The Ginsberg study is interesting. It involved a group of men and nonpregnant women with renal disease. Several important differences in the methods and results of this study diminish the ability to generalize their data to the paper currently under discussion. First, the Ginsberg population was ambulatory throughout the study period, and urine was collected in an outpatient setting. Ginsberg found a near-perfect correlation between the random urine P:C and the 24-hour urine, with a correlation coefficient of 0.97. No study of pregnant women has been able to duplicate this, as far as I am aware. They evaluated 5 time periods during the 24-hour urine collection and found that correlation coefficients were lowest for samples voided during the night or upon arising. Regardless, none of the samples were obtained following 24 or more hours of recumbency. As is well known, there is variability in the excretion of protein and creatinine throughout the 24 hours of the day, and physical activity has an impact upon excretion. Based on the data, it would appear that the exclusion of women with more than 24 hours of recumbency is controversial. Because of this natural variation, it is unlikely that any single evaluation of urine protein will accurately reflect the 24-hour collection.
Odibo: The authors used criteria from the American College of Obstetricians and Gynecologists (ACOG) for defining new-onset hypertension. Over the years, there have been many changes in the definition by both ACOG and the International Society for Hypertension. How generalizable is this definition, especially if you go outside the United States?
Rampersad: This study used the ACOG definition, which is widely used in the United States, for diagnosing new-onset hypertension. This definition may not be generalizable in other countries where definitions set forth by other groups have been adopted.
Odibo: In studies that involve laboratory procedures, technical details about the procedures are inevitable. They should be conveyed, however, in language that is understandable to the general readership. Was the description of the 24-hour urine collection and the P:C explained in acceptable terms?
Despotovic: The methods used to quantitate urinary protein and creatinine were listed but were not explained in detail. However, I believe the reader is able to find the technical details of these methods by using the references they cited. A lengthy description of the technical aspects of the laboratory methods used to determine the protein and creatinine values would add little to the overall understanding of the study.
Statistical Analyses
Odibo: What does the Pearson correlation coefficient tell us, and when is it appropriate to use this compared with the Spearman correlation?
Allsworth: The Pearson correlation coefficient is an estimate of the linear relationship or agreement between 2 variables. It ranges from -1, a perfect negative correlation, to +1, a perfect positive correlation. A correlation of 0 indicates no linear relationship. The Pearson correlation assumes that both variables are interval or ratio and are approximately normally distributed. The Spearman correlation, on the other hand, is a nonparametric estimate of correlation. It is based on the ranking of values for the 2 variables, so it captures monotonic rather than strictly linear relationships, and it makes no assumption about their distribution. The Spearman correlation is appropriate for ordinal data. While agreement is interesting in this context, it is not a measure of test accuracy as would be an estimate of sensitivity, specificity, or some of the other measures that the authors presented.
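The distinction between the two coefficients can be illustrated with a minimal sketch. The data below are hypothetical (they are not from Wheeler et al) and are chosen only so that one variable rises steadily but not linearly with the other; the function names are likewise illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Ranks from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical, illustrative values only: y increases with x, but along a curve.
x = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
y = [v ** 3 for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {r_spearman:.3f}")
```

Because the ordering of the two variables is identical, the rank-based coefficient is essentially 1, while the Pearson coefficient is lower because the relationship is curved.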
Odibo: The receiver operator characteristic (ROC) curve plots the test sensitivity against 1 minus the specificity, that is, the false-positive rate. In this study, the authors used the point of the left shoulder to determine the optimal cutoff point for each 24-hour urine level assessed. Are there other methods of calibrating the ROC curve, and if so, are they superior or equivalent to the method used in this study?
Allsworth: There are multiple approaches a researcher might undertake to select the optimal cutpoint or threshold, and whether one is superior or equivalent to another depends on the specific clinical scenario. The approach by Dr. Wheeler and colleagues (termed the left shoulder of the curve in this report but also referred to as the point closest to (0,1)) is common in studies of diagnostic accuracy. This approach seeks to minimize misclassification by maximizing sensitivity and specificity simultaneously.
Other approaches for cutpoint determination are available and include logistic regression, discriminant analyses, and the Youden index, to name a few. The Youden index is conceptually similar to the approach used by Dr. Wheeler. It is the maximum vertical distance from the curve to the chance line (the diagonal line across the ROC curve), that is, the point furthest from chance. Recent studies have found that the Youden index may be preferred to the approach used by Wheeler and colleagues, as it minimizes misclassification of patients. The Youden index, logistic regression, and discriminant analyses are approaches readily extendable to include information on the relative cost of procedures, as well as the impact on patients and outcome prevalence. Selecting the appropriate methodological approach for cutpoint determination must balance minimizing misclassification with information on relative cost and prevalence on an outcome-by-outcome basis.
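The two criteria can be compared with a short sketch. The operating points below are hypothetical and were picked so that the two rules disagree; they are not values reported by Wheeler et al.

```python
import math

# Hypothetical ROC operating points: (cutoff, sensitivity, specificity).
roc_points = [
    (0.2, 0.95, 0.60),
    (0.3, 0.85, 0.80),
    (0.4, 0.72, 0.95),
]

def distance_to_corner(sens, spec):
    """Distance from the ROC point (1 - spec, sens) to the ideal corner (0, 1)."""
    return math.hypot(1 - spec, 1 - sens)

def youden_j(sens, spec):
    """Youden index J = sensitivity + specificity - 1 (height above the chance line)."""
    return sens + spec - 1

# "Left shoulder" rule: smallest distance to (0, 1).
closest = min(roc_points, key=lambda p: distance_to_corner(p[1], p[2]))
# Youden rule: largest vertical distance from the chance line.
youden = max(roc_points, key=lambda p: youden_j(p[1], p[2]))

print("closest-to-(0,1) cutoff:", closest[0])
print("Youden cutoff:", youden[0])
```

With these illustrative points the two rules select different cutoffs, which is why the choice of criterion is worth stating explicitly in a report.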
Odibo: How frequently is the Youden index used?
Allsworth: The use of the Youden index is not yet common. It was developed in the 1950s but not used extensively. A number of recent papers have begun to explore its properties in greater detail and its use is increasing. In general, the mathematical complexities of some of the techniques for optimal cutpoint determination are a barrier to their proliferation.
Odibo: What does the ROC curve tell us about the proposed test?
Rampersad: The ROC curve graphically represents the relationship between sensitivity and specificity and is used to measure the accuracy of a diagnostic test. The area under the curve (AUC) is frequently used as a summary measure of the ROC curve. An AUC of 1 indicates perfect discrimination, whereas an AUC of 0.5 indicates a test that performs no better than chance. In this study, a urine P:C of 0.21 corresponded with a protein excretion rate of 300 mg/24 hr. The AUC was 0.86, indicating good accuracy. Ratios of 0.46, 0.82, and 3.0 represented 1000 mg/24 hr, 2000 mg/24 hr, and 5000 mg/24 hr, and the matching AUCs were 0.91, 0.98, and 1.0, respectively. All 3 of these higher thresholds are characterized by excellent accuracy.
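As a rough illustration of how an AUC is obtained from a set of ROC points, the sketch below applies the trapezoidal rule. The curve labeled hypothetical is invented for illustration; the study's own ROC coordinates are not given in this discussion.

```python
def auc_trapezoid(points):
    """Area under an ROC curve from (false_positive_rate, sensitivity) points.

    Uses the trapezoidal rule; points should include (0, 0) and (1, 1).
    """
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

chance = [(0.0, 0.0), (1.0, 1.0)]                # diagonal: no discrimination
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # passes through the (0, 1) corner
# Hypothetical curve, for illustration only.
hypothetical = [(0.0, 0.0), (0.05, 0.72), (0.20, 0.85), (0.40, 0.95), (1.0, 1.0)]

auc_chance = auc_trapezoid(chance)
auc_perfect = auc_trapezoid(perfect)
auc_hypothetical = auc_trapezoid(hypothetical)
print(auc_chance, auc_perfect, round(auc_hypothetical, 3))
```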
Odibo: The introduction mentioned a meta-analysis in which the cumulative negative likelihood ratio of the urine P:C in studies of women with preeclampsia was shown to be 0.14 [4]. Wheeler and colleagues inferred that this was a good negative likelihood ratio. In their own methodology, however, they did not describe that as one of the features they planned to use in categorizing ratios. Therefore, what does the likelihood ratio tell us about a test, and how does this differ from negative and positive predictive values?
Allsworth: The likelihood ratio combines information on the sensitivity and specificity to improve inference about the odds of an outcome in a given patient. Likelihood ratios indicate how likely a given test result is among those with the outcome relative to those without the outcome. The positive likelihood ratio is the probability of a test being positive in a person with the outcome divided by the probability of the test being positive in a person who does not have the outcome. Practically speaking, once calculated, the likelihood ratio can be used to create a revised estimate of the odds of disease. For example, using the estimates provided by the authors in Table 1 for the category of 300 mg/24hr, the likelihood ratio of a positive test would be 3.9. If we assume pretest odds for the outcome of 1 in 100, this would be increased to about 1 in 25 post-evaluation.
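The arithmetic behind this example can be sketched as follows, using the sensitivity and specificity reported in the table for the 300 mg threshold. The sketch treats "1 in 100" as pretest odds of 1:100, which is an interpretive assumption; the function names are illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_odds(pre_test_odds, likelihood_ratio):
    """Post-test odds = pre-test odds x likelihood ratio."""
    return pre_test_odds * likelihood_ratio

# Table values for the 300 mg/24 hr threshold (spot P:C cutoff 0.21).
sens, spec = 0.868, 0.776
lr_pos, lr_neg = likelihood_ratios(sens, spec)

pre_odds = 1 / 100                 # assumption: "1 in 100" read as odds of 1:100
post_odds = post_test_odds(pre_odds, lr_pos)
one_in = 1 / post_odds             # express the post-test odds as "1 in N"

print(f"LR+ = {lr_pos:.3f}, LR- = {lr_neg:.3f}")
print(f"post-test odds are about 1 in {one_in:.1f}")
```

The positive likelihood ratio comes out just under 3.9 and the post-test odds a little under 1 in 26, in line with the rounded figures quoted in the discussion.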
Likelihood ratios are similar to positive and negative predictive values in that they are useful to the clinician when counseling patients on test results. Positive predictive values are the probability of having an outcome given that a person has tested positive. A negative predictive value is the probability of not having an outcome given a negative test result. Unlike the likelihood ratio, positive predictive values are dependent on the prevalence of the outcome in a given population. The authors definitely had enough information to evaluate and present the likelihood ratio as well.
Odibo: More recently, physicians have begun to counsel patients using pretest and posttest odds, just as in Down syndrome screening. Positive and negative likelihood ratios can be very useful for this.
Allsworth: Yes, and it’s independent of patient population. Negative and positive predictive values are specific to precise patient populations. If you are comparing to a different clinical population where the prevalence of an outcome is very different, the predictive values will not apply, but the likelihood ratios will.
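A brief sketch of why predictive values shift with prevalence while sensitivity and specificity are held fixed. The sensitivity and specificity are the table values for the 300 mg threshold; the prevalences are assumed values chosen for illustration and are not reported in the discussion.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given prevalence, via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

sens, spec = 0.868, 0.776          # table values, 300 mg/24 hr threshold
results = {}
for prevalence in (0.05, 0.25, 0.54):   # assumed prevalences, illustration only
    results[prevalence] = predictive_values(sens, spec, prevalence)
    ppv, npv = results[prevalence]
    print(f"prevalence {prevalence:.2f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under these assumptions the PPV falls sharply when the outcome is uncommon, even though the likelihood ratios implied by the same sensitivity and specificity do not change.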
Odibo: Are we provided with accurate information about the study population? How important are the population demographics in the interpretation of the study results?
Allsworth: The demographics and the baseline clinical information are exceedingly important to evaluate the generalizability of the study and whether or not it’s representative of a specific clinical population within a given setting. It would have been useful to have more information to evaluate both the generalizability overall and across clinical settings.
Odibo: Yes, and there may be some comorbid conditions, such as diabetes, that might increase proteinuria and affect some of the results.
Odibo: What do you think of the correlation between the spot P:C and the 24-hour urine from the results provided?
Despotovic: This study demonstrated a strong correlation (r=.88) between the spot P:C and the 24-hour urine protein. However, using the spot P:C of 0.21 as a correlate to the critical value of 300 mg of protein over 24 hours would result in the failure to identify significant proteinuria in approximately 13% of affected patients. Although the spot P:C is strongly correlated with the 24-hour urine protein, its clinical application as a primary diagnostic tool is thus limited.
Odibo: Is the area under the curve for the different 24-hour urine cutoffs highly discriminating?
Rampersad: The latter 3 values—0.46, 0.82, and 3.0—are very highly discriminating compared to the ratio of 0.21.
Odibo: Does Table 1 provide us with enough information on the study results?
Martin: For the most part it does. I believe Table 1 should have included the number of patients in each group. They implied that there were only 5 in the highest category, but we really have no idea what the distribution was throughout the remainder of the critical protein cutoffs.
Odibo: What do you think about the sample size and power of the study?
Rampersad: Power is an important aspect of experimental design, as it determines how large the sample size should be in order to draw a statistical conclusion. If the sample size is too small, the results may not be reliable, and caution should be used when drawing conclusions. The authors stated that 154 patients were initially included, but only 126 had complete information. This sample size appears to be small.
Allsworth: If confidence intervals had been presented around the sensitivity and specificity, we would have had more information about the precision.
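To show what such intervals might look like, the sketch below computes Wilson 95% confidence intervals. The 2x2 counts are NOT reported in this discussion; they are back-calculated assumptions that happen to reproduce the table's sensitivity, specificity, PPV, and NPV at the 300 mg threshold with 126 patients, and should be read as illustrative only. The Wilson method is one common choice, not necessarily what the authors would have used.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# ASSUMED counts (not from the source): chosen to be consistent with
# sensitivity 86.8%, specificity 77.6%, and a total of 126 patients.
true_pos, false_neg, false_pos, true_neg = 59, 9, 13, 45

sens_ci = wilson_interval(true_pos, true_pos + false_neg)
spec_ci = wilson_interval(true_neg, true_neg + false_pos)
print(f"sensitivity 95% CI: {sens_ci[0]:.3f} to {sens_ci[1]:.3f}")
print(f"specificity 95% CI: {spec_ci[0]:.3f} to {spec_ci[1]:.3f}")
```

Under these assumed counts, both intervals span well over 10 percentage points, which is the kind of imprecision that point estimates alone do not convey.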
Odibo: What goes into the calculation of the ideal sample size for a study like this?
Allsworth: Although many studies of diagnostic tests rely on convenience samples, it is important to incorporate sample size estimation into the study planning. In general, sample size calculation for diagnostic testing is pretty straightforward. A researcher will need a number of pieces of information to calculate sample size, including estimates of prevalence of the outcome, ideal desired specificity and sensitivity, as well as desired precision or width of a confidence interval.
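One common way to turn these inputs into a number is a normal-approximation formula for the precision of a proportion, sketched below. The formula choice, the function name, and the example inputs (expected sensitivity 0.87, half-width 0.05, prevalence 0.54) are assumptions made for illustration; the discussion does not specify a method.

```python
import math

def sample_size_for_sensitivity(expected_sens, half_width, prevalence, z=1.96):
    """Patients needed so the CI for sensitivity has the desired half-width.

    n_cases: affected patients needed (normal approximation for a proportion).
    n_total: total patients to enroll, given the expected prevalence.
    """
    n_cases = math.ceil(z ** 2 * expected_sens * (1 - expected_sens) / half_width ** 2)
    n_total = math.ceil(n_cases / prevalence)
    return n_cases, n_total

# Assumed planning inputs, for illustration only.
n_cases, n_total = sample_size_for_sensitivity(
    expected_sens=0.87, half_width=0.05, prevalence=0.54
)
print(f"affected patients needed: {n_cases}; total to enroll: {n_total}")
```

The same calculation can be repeated for specificity by substituting the expected specificity and dividing by 1 minus the prevalence; the larger of the two totals would then drive enrollment.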
Conclusions
Odibo: Will the results of this study help you to manage patients with suspected preeclampsia?
Martin: In view of the fact that the P:C was not demonstrated to replace the 24-hour urine collection in women meeting criteria for inpatient evaluation for suspected preeclampsia, I felt that the study did not support performing this analysis. If the treating physician deemed some form of crude assessment of proteinuria to be helpful in the initial assessment of these women, a urine dipstick would appear to be almost as well-correlated and significantly less expensive. Several other papers have evaluated shortening the 24-hour collection time. These have achieved similar correlations, and in fact, were better than a spot P:C. A shorter collection period might ultimately prove to be a more robust surrogate for the current gold standard of a 24-hour collection.
Odibo: What do you think is the take-home message of the study?
Despotovic: The study demonstrated that although there is a strong correlation between spot urine P:Cs and the 24-hour urine protein collections, they cannot be viewed as equal in their ability to quantitate proteinuria. The 24-hour urine collection should remain the gold standard for the quantification of proteinuria in the evaluation for preeclampsia.
Odibo: Do you have any suggestions for future studies on the use of spot P:C in the evaluation of patients with suspected preeclampsia?
Rampersad: It would be of interest to learn how accurate the P:C is compared to the 24-hour urine collection in other processes that increase risk for preeclampsia, such as diabetes, chronic hypertension, multiple pregnancies, high body-mass index, and preexisting proteinuria.
References
1. Clinical review: management of preeclampsia. BMJ. 2006;332:463–468.
2. Use of the protein/creatinine ratio of a single voided urine specimen in the evaluation of suspected pregnancy-induced hypertension. J Fam Pract. 1996;42:385–389.
3. Use of single voided urine samples to estimate quantitative proteinuria. N Engl J Med. 1983;309:1543–1546.
4. Use of protein:creatinine ratio measurements on random urine samples for prediction of significant proteinuria: a systematic review. Clin Chem. 2005;51:1577–1586.
PII: S0002-9378(07)00438-3
doi:10.1016/j.ajog.2007.03.070
© 2007 Mosby, Inc. All rights reserved.
