Just in from U.S. District Court for the District of Columbia: Judge Reggie B. Walton has thrown out the U.S. Environmental Protection Agency’s water quality guidance for coal mining in Appalachia, a central part of the Obama administration’s crackdown on mountaintop removal.

Here’s the bottom line from the ruling:

*The Court is not unappreciative of the viable interests asserted by all parties to this litigation. How to best strike a balance between, on the one hand, the need to preserve the verdant landscapes and natural resources of Appalachia and, on the other hand, the economic role that coal mining plays in the region is not, however, a question for the Court to decide. In this litigation, the sole inquiry for the Court is the legality of the Final Guidance, and, for the reasons set forth above, that inquiry yields the conclusion that the EPA has overstepped its statutory authority under the CWA and the SMCRA, and infringed on the authority afforded state regulators by those statutes.*

UPDATED: Read our Gazette print story online here.

I’ve posted a copy of Judge Walton’s decision here. Federal judges have previously thrown out EPA’s plan to coordinate mining permit reviews with other agencies and the EPA move to veto the permit for the largest strip-mine in West Virginia history.

Here’s a statement just out from the National Mining Association, which had challenged the EPA guidance:

*“NMA is gratified by today’s decision in NMA v. Jackson in which the U.S. District Court for the District of Columbia set aside the Environmental Protection Agency’s (EPA) Final Guidance for coal mining operations in Appalachia because the guidance and agency’s activities have overstepped the bounds of the law. As we have always maintained, EPA has engaged in an unlawful overreach in its attempt to commandeer the permitting responsibilities the law places with other state and federal agencies.*

*“Today’s decision has truly given coal miners and coal mining communities their ‘day in court’ and has affirmed NMA’s longstanding belief that EPA overreached its authority in its virtual moratorium on Eastern coal mining permits and denied those operations the protections provided for under the law. It is now time to get miners back to work by allowing the state permitting agencies to do their jobs.”*

UPDATED: Here’s what EPA had to say today —

*The EPA is reviewing today’s District Court decision regarding the agency’s July 21, 2011, Mountaintop Mining Guidance. We will continue to protect public health and water quality for Appalachian communities under the law.*

Soyedina I don’t know if you are deliberately ignoring the obvious, or you really don’t get it. This is pre- and post-data. So temporal repeats are not pseudoreplication. Averaging pre- with post- (based on tabled data that were already averaged) does nothing to remedy an alleged pseudoreplication.

Further, in correlation analysis, there is no correct dependent or independent – you are looking for shared distribution, not cause and effect. They didn’t present a regression analysis, they presented a correlation analysis. And the probability value is directly related to the coefficient AND the number of samples, but at a coefficient that low, there is no need to consider the probability threshold, because it is a defined ‘WEAK’ correlation, regardless of p value.
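The point that the probability value depends on both the coefficient and the number of samples can be sketched numerically (illustrative values only, not the paper's data). The t statistic for testing whether a correlation differs from zero is t = r·sqrt((n−2)/(1−r²)), so a fixed "weak" r of 0.26 eventually crosses the usual significance threshold as n grows, even though its practical strength never changes:

```python
import math

def t_stat(r, n):
    """t statistic for testing r != 0 with n paired samples (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# A fixed "weak" r = 0.26 becomes statistically significant as n grows,
# even though the strength of the relationship never changes.
for n in (10, 30, 100, 300):
    print(n, round(t_stat(0.26, n), 2))
```

With a two-tailed critical value near 2.0, the weak correlation is "non-significant" at n = 10 or 30 but "significant" at n = 100 or 300, which is why the coefficient itself, not the p-value, describes the strength of the relationship.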

I am not pooh-poohing anything except your misdirected attacks on a nice piece of real science. And regression is very different from correlation. I taught statistics at the college level for a number of years. I am pooh-poohing how this real data, that says actual things about an actual stream, in a clearly defined watershed, is casually brushed aside by comments that don’t apply. That is what you did, and that is what I meant. I don’t know any other way to explain it.

Warren, perhaps you can understand the problem if you consider how the values of repeated measures at a site are not independent of the other observed values at that site. Although the authors don’t report the statistics, so we cannot be sure how they analyzed these data, if the regression was performed on every single data point as an independent value then this is the epitome of pseudoreplication. If you want to design a study that examines a change in a variable pre- and post-disturbance, you could not choose this study design, because it does not even provide a before-after-control-impact hypothesis test and cannot distinguish between several competing alternate hypotheses.

You suggest that we think of this study as a “correlation” analysis, but there is no correlation coefficient provided for the relationships in this paper, and it’s not clear that the authors analyzed this. The r-squared (coefficient of determination) value for the linear regression scales between 0 and 1, and is not the same as the correlation coefficient (r), which varies between -1 and 1. Although the authors describe this as a “correlation analysis,” it appears that they simply used Excel to run a regression through all the data points. In other words, Figure 5 is a linear model that does predict values as a specific function between variables, but they plotted those variables on the wrong axes in order to answer the question they believe they have answered with the statement “The WV-SCI assessment of these streams shows that elevated conductivities have not limited macroinvertebrate assemblages” (p. 15).

They did not report the correlation coefficient between the variables anywhere in this paper; they clearly thought of this as a regression question, because that’s how they interpreted it in their own words. The form of the regression in Fig. 5 asks “Does WVSCI predict conductivity?” which is clearly not the question we seek to ask (“Does conductivity predict WVSCI?”) if we want to make inferences about the value of the conductivity threshold as a predictor of biological water quality.
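A quick numerical sketch (entirely synthetic data, not the paper's) shows why the choice of axes matters for interpretation even though it cannot change the r-squared: swapping the regression direction changes the fitted slope, but the two slopes multiply to the same r-squared.

```python
import random

# Synthetic data: a "WVSCI-like" score that declines weakly with a
# "conductivity-like" variable, plus noise.
random.seed(1)
x = [random.uniform(200, 1400) for _ in range(40)]       # stand-in for conductivity
y = [80 - 0.01 * xi + random.gauss(0, 15) for xi in x]   # stand-in for WVSCI

def ols(xs, ys):
    """Slope and r-squared of a simple least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sxx, sxy * sxy / (sxx * syy)

slope_yx, r2_yx = ols(x, y)   # "does conductivity predict WVSCI?"
slope_xy, r2_xy = ols(y, x)   # the axes as plotted in Fig. 5
print(r2_yx == r2_xy or abs(r2_yx - r2_xy) < 1e-12)  # same r-squared either way
print(slope_yx, slope_xy)                            # very different slopes
```

The identity slope_yx × slope_xy = r² holds for any simple linear regression, which is why the reported r² is direction-free while the fitted line, and the prediction it makes, is not.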

And when that regression was performed, it used a pseudoreplicated analysis that allows too many degrees of freedom to the model. If you have taught statistics then you know this, handwaving aside. You say this is “weak” so that you don’t have to worry about a p-value, but I will agree that there is no sense in fooling with p-values for models that cannot apply to the data. The model that treats every value as an independent observation is a model that cannot apply to these data.

And, on average, conductivity is a robust predictor of the WVSCI data in this paper. If you wish to discuss why spot measurements of conductivity or benthic assessments might vary in time or across years or field crews, that is an entirely different discussion from the one raised by AFC, where AFC claimed that this paper demonstrated that conductivity is not a good predictor of biological integrity. That claim is not supported by this paper.

Soye, (can I call you Soye?) –

The claims the authors (Hart, Kirk and Maggard) make are supported. They are pretty clear about the claim, its specificity, and application. There is nothing anywhere near robust about any of the relationships presented. You cannot rely on your own averaging of averages, because you have by definition reduced variability while decreasing sample size, falsely yielding a p-value with no relevance to the real data. But back to the claims that the authors make:

First, in the abstract: “These findings suggest conductivity is a poor primary indicator of aquatic health in certain reaches of central Appalachian streams.” And later, with more clarity and certainty, in the conclusion: “The use of specific conductance as a distinct measure of aquatic health has shown limited correlation against macroinvertebrate and fish community integrity in southwestern West Virginia.”

Yes, their scatterplot looks like it was made in Excel. Excel, unfortunately, lacks the appropriate non-parametric statistics for handling the non-normality of both sets of variables, and it reports an R-square value for any best-fit line on a scatterplot.

Figure 5 is described as a scatterplot, and while not labeled individually with values, the data are presented as raw. Their subsequent inference that it shows a weak relationship is accurate. No sign indicator is necessary (as in the case of your example of switched axes and negative correlation); the graphic itself demonstrates the inverse relationship between the paired data. There is no confusion of axes or relationships, so far. Now, given the coefficient of determination of 0.0681 (probably not an actual Pearson product-moment based regression, though, but an Excel derivative used primarily for economics), a correlation coefficient may be calculated by finding z-scores for each of the data points, plotting those against each other, and reporting the slope of THAT line. In that case, a sign is used to indicate direction of slope.

Regardless, you may approximate a correlation coefficient here by taking the square root of the coefficient of determination, which is 0.26. In a Pearson product-moment correlation test, any coefficient with an absolute value of 0.3 or less is variously interpreted as “small” or “weak.”
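Both calculations described above can be sketched in a few lines (illustrative numbers only): backing the magnitude of r out of the reported coefficient of determination, and recovering Pearson's r as the slope of a z-score-on-z-score fit.

```python
import math
import statistics

# Back out |r| from the reported coefficient of determination (0.0681).
# The sign must come from the direction of the fitted slope.
r_from_r2 = math.sqrt(0.0681)   # about 0.26

def pearson_r(xs, ys):
    """Pearson r computed as the slope of z-scores regressed on z-scores.

    Because the z-scores of xs have variance 1, the least-squares slope
    of zy on zx reduces to the mean cross-product of the z-scores.
    """
    zx = [(v - statistics.mean(xs)) / statistics.pstdev(xs) for v in xs]
    zy = [(v - statistics.mean(ys)) / statistics.pstdev(ys) for v in ys]
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)

# Small made-up dataset to show the z-score route in action.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
print(round(r_from_r2, 2))        # 0.26
print(round(pearson_r(x, y), 3))  # 0.829
```

Note the square-root route only recovers the magnitude of r; the z-score route carries the sign, which is why a negative slope on the scatterplot implies r is roughly -0.26 here, not +0.26.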

Keep in mind that the layout of Figure 5 appears to have two goals: 1) illustrate the distribution of the paired, raw datasets, and 2) present them in the context of the WVSCI impairment threshold.

It appears to me that the authors realized their best-fit line indicated a weak correlation. And most scientists agree that a weak correlation is not necessarily a good place to start looking for regression models for prediction. Or, another way of looking at it, without even considering correlation as a stat to report, is to realize that the coefficient of determination indicates that changes in the values of one dataset only account for approximately 6.8 percent of the changes in the values of the other dataset. Again, another way of using the numbers to say what they basically said in their narrative conclusions. It begs a question: how do you explain the other 93.2%? I don’t know. But, to me, that is 93.2 percent AGAINST overreaching regulation. Does it really hurt to keep looking for the right answers, instead of propping up an incomplete one by pooh-poohing data that doesn’t fit the current buzz?

Now, for pseudoreplication. It does not apply here, in concept, just as your averaging of values that were already averaged from unknown numbers of samples did nothing to remedy it. And it certainly did not bolster the strength of any relationship. Mathematically, yes, your relationship appeared stronger, but it was no longer a relationship between two variables so much as it was a misapplication of a parametric test to previously transformed, averaged, and grouped data, one set having a lognormal distribution and the other a heavily skewed distribution. So that doesn’t tell us much. You made nice pictures, though.

On the other hand, your thoughts that sampling the same reach repeatedly yields pseudoreplication are either deeply mistaken, OR you are hiding an underlying assumption about the nature of streams. IF, within a given reach, either conductivity OR WVSCI had remained static while its paired component had varied, and that data had been used in correlation or regression, then the magnitude of the possible relationship would potentially be severely weakened. However, in the application of this paper, each paired dataset is both valid and independent, regardless of time or location, because the only question I see them asking is about the nature of the relationship between two variables that are championed as always going hand in hand. Another way of looking at this is: if, by sampling repeatedly in the same reach, you invalidate assumptions of independence, what is it about that reach that causes the invalidation? Certainly not the conductivity or WVSCI scores, right? Therefore, there must be (an)other significant factor in the relationship(s). Unfortunately, you just spent a lot of time trying to say that there isn’t.

Now, despite how you may interpret or rephrase my posts or intentions, I suggest that you re-read my initial post:

“Does it disprove previous measurements and statements about the relationship between conductivity and macroinvertebrate assemblages? Nope. Does it raise questions about the nature and/or causal effect of that relationship? Yes, it does. And it goes no further than that.”

That initial post was not intended to support or detract from whatever AFC was driving at; it was intended to point out how the paper was being improperly discarded by your claims, as well as others’. It was even suggested that it was spin that the industry puts out there ‘as being absolute proof that the rest of the peer-reviewed, published science is mistaken.’

I simply disagree with characterizing the study that way. It doesn’t go that far, in its assumptions or conclusions, but it raises many questions about the nature of the relationship between conductivity and WVSCI, and it leaves room to question what other factors are there.

If I had the raw data (which I don’t), I would first apply sign tests or similar pre- and post- non-parametric tests for differences, in terms of both means and variance. I would then consider (if available) patterns throughout the duration, for correlation and covariance. I would do this on all raw data, and also on seasonally- and monthly-segregated subsets. Then across any subsets that might be apparent in attendant environmental and water quality parameters (if available), such as temperature, substrate (availability and distribution), drainage area, antecedent moisture, etc. I am of the opinion that lots more could be revealed. And most of it, just like the study, would be very much watershed-specific.

How about, for example, degree of unitless channel incision within the sampled reach? Or compare that, as well as similar channel morphology parameters, within contributing drainage areas? One of the reasons habitat as assessed by RBP generally fails to correlate with benthic communities or associated scores is the lack of resolution, either in the ranked, averaged total scores or in individual sub-metrics, because even RBP habitat assessments were originally intended to be locally calibrated to reference scores, just like WVSCI.

What if WVSCI were calibrated within even smaller subregions? Why not at least limited to streams above 3% gradient, and tailored to 12-digit HUCs? Would the same story be told? And if so, what would it really mean, then?

None of this is intended to propose that mining does not, has not, or never will contribute to impairment of aquatic resources, however it is measured. I just wish people would let go of the conductivity idea and measure the right things. Conductivity will only get you so far as a predictor, as shown in the Hart, Kirk, Maggard specific case. There must be better answers.

I averaged the data they provided but did not themselves analyze.

The paper claims that conductivity is a poor primary indicator of aquatic health, but then presents data that shows average conductivity strongly predicts the average WVSCI among repeated observations at only four sites. How does this suggest that conductivity is a poor primary indicator? By taking many repeated measures at four single sites then treating those observations at four sites as a population of independent samples. That isn’t a valid experimental design, and you know that it isn’t.

You are intentionally avoiding the issue of “what is the appropriate null hypothesis to test”. Why would you do this? If you can get the raw data, please post them and we can perform this analysis correctly.

We have established, to a consensus I believe, that Figure 5 shows that spot measurements of WVSCI and conductivity can vary greatly among sampling events at a single site or set of sites. I happily agree that this is true, and I’ll add that the authors have not attempted to explain that variation. But if you are looking for the rest of the hidden “93.2%” variation, then why don’t you post the data and we can see how much of it is accounted for by variation among sites? That’s not “begging the question,” is it? I’ll give you a hint: there isn’t 93.2% remaining variation, because if we had a valid model for these repeated-measures comparisons it would account for variation within and among sites, and my guess is that the simple and invalid regression model the authors used would fail to explain much of the variance explained by models accounting for site and time effects. And I note that you have not admitted that the authors did not test hypotheses about correlation in this study, and that the coefficient of determination is from a linear model used to predict conductivity as a function of WVSCI, which is a problem.

Pseudoreplication applies here because there are no replicable units to use to test for a time-by-treatment interaction. Treating correlated repeated measures at a site as independent observations is the very definition of pseudoreplication, and your struggle to correctly understand how that influences the confidence around any putative relationships in a dataset is part of the issue. For instance, it’s not “my thoughts that sampling the same reach repeatedly yields pseudoreplication”; you are right, it doesn’t. It’s using those samples as independent observations that yields pseudoreplication. That you refuse to acknowledge this remains curious, but to me it seems as if it could be better understood in the context of your sneers about professors publishing to meet quotas.

As I mentioned, you are starting to ask the right question. If these sites vary greatly in spot measurements of WVSCI and conductivity (and the authors never analyze the site variation, and I can’t understand why they wouldn’t), then what is the source of that variation? You seem to think that it’s possible to analyze data without a model, but I’m sure you don’t actually hold that view in all instances.
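The pooling problem can be sketched with a minimal example (entirely synthetic numbers, chosen only to mimic a four-site structure with repeat visits): pooling every visit as an independent observation makes the analysis look like it has dozens of degrees of freedom, when only four site-level units actually exist.

```python
import math
import random

random.seed(7)

# Four synthetic sites: site-level "conductivity" and "WVSCI" values.
site_cond = [300, 600, 900, 1200]
site_wvsci = [75, 68, 60, 52]

# Ten repeat visits per site, each adding independent measurement noise
# around that site's true values.
pooled = []
for c, w in zip(site_cond, site_wvsci):
    for _ in range(10):
        pooled.append((c + random.gauss(0, 50), w + random.gauss(0, 8)))

def pearson_r(pairs):
    xs, ys = zip(*pairs)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Pooled analysis pretends n = 40 independent samples (df = 38);
# the honest site-level analysis has only n = 4 (df = 2).
r_pooled = pearson_r(pooled)
r_means = pearson_r(list(zip(site_cond, site_wvsci)))
print(len(pooled), round(r_pooled, 3))
print(len(site_cond), round(r_means, 3))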

Well, the world is waiting for your manuscript. This apparently is not it, since it actually supports the conductivity relationship (as published). I wish people would use the appropriate models for statistical tests, and not move the goalposts for those tests, before they accuse others of slander or cynically imply that critics are merely gaming an academic system in the face of contrary evidence.

I am not avoiding the issue of what null hypothesis to test. As a de facto experimental design, a first step is to observe relationships between datasets, if any. The first, and simplest, observation they appear to have made (regarding conductivity and WVSCI) was a graphical comparison. And that said plenty.

The tabular data say a lot, too. Potentially. We don’t really know what the underlying raw data were, so let’s not argue about inferences and appropriate tests. We can both dream up how we would look at it, but we’d be better off flipping a coin before guessing at what it would actually reveal. So your averaging to correct for nonexistent pseudoreplication does not bolster anything; nothing is robust there. For example, I could construct a non-parametric test on that tabular data to show that, for every 200 µS increase in conductivity over a ten-year period, WVSCI scores will increase by approximately 6 points (hint: plot net change in conductivity from Table 3 vs. net change in WVSCI in Table 3, segregated by site). That is a mathematical prediction, and it fits the data as presented in the table. But it is not a good one, and it is not robust. Mathematically, however, it is stronger than yours. That means nothing, however, because both are nonsense.

You said:

“By taking many repeated measures at four single sites then treating those observations at four sites as a population of independent samples. That isn’t a valid experimental design, and you know that it isn’t.”

Good experimental design? That depends on the question(s) being asked. Note that there are more than 4 sample sites; the grouping is by stream. Any idea how big those streams are? Please re-read the paper.

Now, given that conductivity clearly varied at a given site over the sampling time period, how can it be wrong to point out that it DID NOT co-vary with WVSCI scores? That is all they have done with the WVSCI data, as far as I can tell. Not pseudoreplication: temporal spread, for this dataset, precludes the allegation. Much in the same way that studies relying on spatially distinct sites are not accused of pseudoreplication when multiple sites exhibit similar conductivity measures. Independence is entirely defined by the questions being asked, not by a preconceived notion.

You said, in this regard, “… That you refuse to acknowledge this remains curious, but to me it seems as if it could be better understood in the context of your sneers about professors publishing to meet quotas.” Please explain what you mean by this, I am missing something.

You said:

“And I note that you have not admitted that the authors did not test hypotheses about correlation in this study, and that the coefficient of determination is from a linear model used to predict conductivity as a function of WVSCI.”

I don’t suppose I admitted anything; I didn’t feel there was anything to admit. I don’t see any indication that they needed to do anything beyond what they did, and presented, to conclude that conductivity is not a good predictor for the subjects in their study.

I do agree with you that the reported R-square is based on the relationship between the two variables; however, the R-square will not change by simply switching axes, as you are implying. You may call it a regression model if you like; however, to imply that the authors intended to announce that WVSCI cannot predict conductivity is plain silly, and not at all related to the meat of the discussion, which is their conclusions, which remain valid, as best I can tell.

You said “…If these sites vary greatly in spot measurements of WVSCI and conductivity (and the authors never analyze the site variation, and I can’t understand why they wouldn’t), then what is the source of that variation?”

I think that was part of their point, Soyedina. And it is the central point I have been making all along. What is the source? Clearly the variations in conductivity are not inducing the changes in WVSCI that have been observed elsewhere. Ergo, conductivity is not a good predictor here. In general, conductivity clearly increased over time for these sites. In the meantime, so did WVSCI. Why? I don’t know, and the authors don’t purport to know. But that is what the numbers say.

Challenging me to post the raw data is out of line, when I have stated a number of times that I don’t have it. And I am not attacking you or your opinions on what the raw data may show. I am not drawing inferences from an imagined version of the data (except when I speculated about the sorts of things I would like to test it for, which has no bearing on the central argument about their conclusions).

I would, in turn, challenge you to something a bit more practical. If you feel that their Table 3 reveals conductivity to be a good predictor (no, a robust relationship that, contrary to what the authors say, is the single best predictor of WVSCI), then please pick one of the sites, use the predictive power to define the relationship, and test your theory. Go to one of these streams and collect a sample. Use an instantaneous conductivity measure to predict the WVSCI score of a concurrent benthic sample. You can report the conductivity to me immediately, along with your prediction, and we can compare that to a final WVSCI score once it is calculated. Two caveats: label the benthic sample with nothing more than the conductivity value, and have the sorting and IDs performed by an NABS-certified taxonomist who will retain the 200-bug count as a voucher. The data presented in the paper indicate that you have a very slim chance of predicting that score accurately. They don’t say what your chances are, but they do clearly say that the relationship is weak.

A final question, what are you referring to with the goalpost metaphor? I missed that.

Regardless of whether you reply here or not, I will not be carrying this on in the comments of this page. Your diction and hyperbole indicate that you are speaking for an audience, often not even answering my questions directly, but carrying on a narrative about what ‘Warren’ presumes, implies, avoids, etc. I am only talking to you directly, about things I disagree with. Therefore, if you want to continue this discussion with me, you can email me at warren.salix@gmail.com

The point is that “data can’t say anything without an underlying model.” The naive model, in which site differences don’t account for some of that variation or for the temporal trend that is the focus of the paper’s authors, “says plenty less” than a realistic model that measures the variation attributable to these spatial and temporal factors.

You even recognize this when you say that “conductivity clearly increased over time” at these sites, even though the authors have not actually tested this hypothesis. For example, Fig. 7 shows that conductivity both increased and decreased over time at these sites. If you truly claim that you believe that you can make a credible inference from naively regressing conductivity onto WVSCI, without accounting for site variability and temporal variability and variability associated with the mining changes upstream, then I suppose I should just give up trying to convince you otherwise. Fixing that sort of misconception is above my pay grade.

I was quite clear that the r-squared won’t change when you assign the variables their proper axes to represent the verbal model they discussed in the paper, and that is the point of the exercise: that conductivity is a predictor of stream condition. I did not imply anything to the contrary.

I understand that there are four pairs of sites in direct comparison and these are part of a larger background of monitoring in these watersheds. This emphasizes why these results are meaningless without an understanding of the site variation, no matter what you think should be measured instead. Whatever you want to measure, this remains true.

Re goalposts: the r-squared value they reported, and which you later cited, is a result from a naive and invalid regression model, and no formal analysis of correlation was given. You started this discussion by claiming that this was not a regression but a “correlation study.” When I pointed out that the authors didn’t analyze their data that way, did not use that terminology, and in fact made statements demonstrating that they considered a linear model of “conductivity predicts WVSCI” to be the hypothesis they were testing, you moved the goalposts in a comment that provided an ad hoc calculation of the correlation coefficient you assumed they had in mind, a quantity that was not measured or discussed by the authors anywhere in the paper. And now you are moving the goalposts again to suggest that if I don’t jump up off my seat and run to go sample these streams, then my comments pointing out the silliness at the root of the naive statistical model presented in this paper, and defended by you, are “not productive.”

I apologize if I mistakenly assumed you had these data, I think we have been posting past each other in some longer responses and I missed your comment about not having it.

I agree, in silico it is hard to tell what “Warren” presumes or implies or avoids, but it seems obvious in this thread that you have suggested professional and academic misconduct by researchers (“get off conductivity and measure the right thing,” “professors publishing variations of a single correlation multiple times to achieve a set quota,” criticizing the merits of this study is “repeating slanderous comments,” and you laugh while “others are lost in their own tautologies”). So, if those comments do not reflect your attitudes toward scientists, I beg your pardon.