B. Questions to the Experts and their Compiled Responses (Summarized)
Question 1: In relation to the concepts: probit analysis, dose-mortality test, and confirmatory test, could you: (i) state your understanding of these concepts; (ii) indicate the purpose of the tests; (iii) comment on the validity of using dose mortality tests for comparing the efficacy of quarantine treatment between varieties of the same commodity by calculating LD50 for each variety; (iv) comment on the confidence in predicting the level of mortality between varieties and the relative efficacy of quarantine treatment when using, for the approval of an additional variety, (1) only the procedure outlined in (iii); (2) the procedure outlined in (iii) and confirmatory tests; and (3) only confirmatory tests (to confirm the efficacy of the treatment already imposed for a variety of the same commodity); and, (v) indicate, in respect of the testing options outlined in (iv), how the type and quantity of the information derived from the tests on different varieties of the same commodity may vary, and what the causes for such variations may be.
In respect of the understanding of the concepts and the purpose of the tests, Dr. Ducom noted that the dose-mortality test, analyzed by the probit method, was the key to all trials concerning a living organism's response to a toxicant. The test was widely used by scientists in efficacy studies on pests. It was informative in respect of the sensitivity of a species to a toxicant. According to Dr. Ducom, the use of LD50 in dose-mortality tests to compare the efficacy of quarantine treatment posed two problems:
- Japan had demanded that variety X be compared to a reference variety. However, it was sometimes impossible to have two varieties at the same time in the same physiological conditions if these had different ripening dates. This could lead to abnormal differences in the behavior of gas;
- the tests did not support a reliable statistical analysis, given that the number of insects and fruit in question was low and that the causes of variation, of whatever nature, had a great influence on the results. The following were two examples based on the Yokoyama nectarine trials, 1987. 169
- The "Summer Grand" variety had an LD50 of 6.3 g/m3 compared to 15 - 18 g/m3 for the other varieties tested, but the dose that killed 100 per cent of the insects was 40 g/m3, compared to 30 - 35 g/m3 for the others.
- Yokoyama and Vail, 1997, re-tested the "Summer Grand" and another variety tested in 1987 and found equivalent results in contradiction with those from 1987. 170
Dr. Ducom noted that in practice, the LD50 test constituted a fairly unreliable method to compare the efficacy of quarantine treatments. Furthermore, Japan imposed a subsequent confirmatory test which was long and costly.
The confirmatory tests, by using a sufficient number of insects, gave the statistical confidence which permitted achieving the desired threshold of 99.9968 per cent mortality (probit 9). This test provided a sufficient degree of confidence, but it was also costly.
Dr. Ducom noted that while the dose-mortality test (LD50) did not give any confidence in respect of the varietal factor, it did give an indication of the relative sensitivity of the products tested. However, these indications undoubtedly did not allow for the determination of the part played by the variety itself in relation to the other factors which could have an influence on the test results, such as fruit ripeness, annual climatic differences, etc. In this respect, the confirmatory test gave absolutely no indication of varietal differences: it either worked or it did not.
Dr. Heather noted that probit analysis was a biometrical technique for analysis of experimental data in which the quantitative response of an organism, usually as mortality, was subjected to regression analysis with respect to treatment dose, i.e., "dose-response" data. Mathematical transformation of mortality to probability units termed "probits" assisted in conversion of the normal distribution (curve) of the response data to a linear distribution to facilitate analysis. Dose data was frequently, but not invariably, logarithmically transformed for the same purpose of linearity. The outcomes of probit analysis were values such as LD (lethal dose), LC (lethal concentration) or LT (lethal time) for a nominated proportion of the population, e.g., 50 per cent or 99.99 per cent, together with nominated confidence or fiducial intervals, e.g., 95 per cent. 171
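The probit transformation described above can be illustrated with a minimal Python sketch. It assumes the classical convention in which a probit is the standard normal deviate plus 5; the function names are illustrative only and do not come from any of the studies cited.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def mortality_to_probit(p: float) -> float:
    """Convert a mortality proportion (0 < p < 1) to a classical probit.

    Assumption: classical convention, probit = normal deviate + 5.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("mortality proportion must lie strictly between 0 and 1")
    return _STD_NORMAL.inv_cdf(p) + 5.0


def probit_to_mortality(probit: float) -> float:
    """Convert a classical probit back to a mortality proportion."""
    return _STD_NORMAL.cdf(probit - 5.0)
```

Under this convention, 50 per cent mortality corresponds to probit 5, and probit 9 corresponds to a mortality of approximately 99.9968 per cent.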
The main purposes of probit analysis were:
- to define susceptibility of a population of target organisms to a treatment in terms of LD, LC or LT values;
- subsequent comparisons of susceptibility of populations of target organisms;
- subsequent comparisons of varying response according to substrates, such as commodities;
- subsequent comparisons of treatments; and,
- prediction of the dose required for a specific level of treatment efficacy.
Comparisons were the most appropriate use of probit analysis.
Dose-mortality (dose-response) testing was an experimental procedure in which the response of an organism was estimated for a series of mortality-inducing doses of a specified treatment. It pertained to a group of tests known generally as bioassays. Dr. Heather noted that individual dose-mortality tests had to target a specific stage of an organism wherever possible as the susceptibility to a treatment could vary between life stages. The more direct the effect of the treatment or toxin on the target organism, in general, the more precise and reliable were the results.
The main purposes of dose-mortality testing were to produce data for analysis, possibly, but not exclusively, by probit analysis, for:
- determination of above-mentioned parameters categorising the response of an organism;
- comparisons of efficacies of different treatments, organisms or substrates; and,
- prediction of a treatment dose to meet a required level of efficacy.
The target organism test unit was usually a sub-sample of 20 to 50 individuals typically replicated 3 times, at each dose level. For a satisfactory result, 5 or more dose levels were usually required, evenly spaced between 0 and 100 per cent mortality. The dose variable could be concentration or duration.
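A design of this kind can be sketched in Python as follows. The data are entirely hypothetical (five doses, three replicates of 30 insects pooled to 90 per dose) and are not taken from any trial before the Panel. The sketch fits an unweighted least-squares line to empirical probits against log10 dose, which is a simplification: a full probit analysis would normally use iterative weighted maximum likelihood and report fiducial limits.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def fit_probit_line(doses, killed, tested):
    """Fit probit = intercept + slope * log10(dose) by unweighted least squares.

    Simplification: dose levels with 0 or 100 per cent mortality are skipped,
    because their empirical probits are infinite.
    """
    xs, ys = [], []
    for dose, k, n in zip(doses, killed, tested):
        if 0 < k < n:
            xs.append(math.log10(dose))
            ys.append(_STD_NORMAL.inv_cdf(k / n) + 5.0)
    if len(xs) < 2:
        raise ValueError("need at least two dose levels with partial mortality")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def ld50_from_line(intercept, slope):
    """Dose at which the fitted line reaches probit 5 (50 per cent mortality)."""
    return 10 ** ((5.0 - intercept) / slope)


# Hypothetical pooled data: doses in g/m3, insects killed out of 90 per dose.
doses = [8, 12, 16, 24, 32]
killed = [9, 27, 45, 72, 86]
tested = [90, 90, 90, 90, 90]

intercept, slope = fit_probit_line(doses, killed, tested)
ld50 = ld50_from_line(intercept, slope)
```

With so few insects per dose, the fitted LD50 is sensitive to small changes in the counts, which is consistent with the experts' caution about the precision of such tests.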
Confirmatory test was a term that had restricted usage in a quarantine sense. By contrast, probit analysis and dose-mortality testing were widely used in pesticide science. A confirmatory test as used by US researchers equated to a large-scale test as used by Japanese researchers. The concept applied to a single dose response test on sufficient numbers of the target organism to ensure that a required efficacy had been attained at a nominated statistical confidence level. Countries such as Japan and New Zealand had customized this test by requiring a number of sub-samples of minimum size. This had the practical advantage that it could then be used in an iterative way to establish the minimum dose required to achieve a desired efficacy.
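The relationship between required efficacy, confidence level and the number of insects can be sketched as below. This is a minimal illustration under stated assumptions, not a description of any country's actual protocol: it assumes independent insects (a binomial model) and a test in which no survivors are observed. The function name is hypothetical.

```python
import math


def insects_required(efficacy: float, confidence: float) -> int:
    """Smallest number of treated insects, with zero survivors, needed to
    demonstrate the given efficacy at the given confidence level.

    Assumption: insects respond independently. If true mortality were only
    `efficacy`, the chance of seeing no survivors among n insects would be
    efficacy ** n; n is chosen so that this chance is at most 1 - confidence.
    """
    if not (0.0 < efficacy < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("efficacy and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(efficacy))


# Probit 9 efficacy (99.9968 per cent mortality) at 95 per cent confidence.
n_probit9 = insects_required(0.999968, 0.95)
```

For probit 9 at 95 per cent confidence this gives a figure somewhat over 93,000 insects, which illustrates why the experts described confirmatory tests as costly.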
Validity: In principle, dose-mortality bioassays were a valid method to characterize the responses of test organism populations for comparison of the efficacies of quarantine treatments between varieties of the same commodity, if adequate precision could be achieved. Use of LD50 values for this purpose would be acceptable where the more desirable whole response line comparison was not valid. Since the LD50 was effectively the mean response of the bioassay test population and the point at which confidence belts were narrowest, it was arguably the most robust point of comparison. Nevertheless, it had to be supported by other point-wise comparisons such as at the LD95, making the slopes of the lines more readily apparent. These LD (LC or LT) values would give more precise definition of response if the population of the bioassay organism was relatively homogeneous in its response to the treatment.
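The reason for supporting an LD50 comparison with a second point such as the LD95 can be shown with a short Python sketch. The two response lines below are hypothetical and are described by an LD50 and a probit slope per unit of log10 dose; they are not fitted to any data before the Panel.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def lethal_dose(ld50: float, slope: float, proportion: float) -> float:
    """Dose giving the stated mortality proportion on a log10-dose probit line.

    The line is assumed to be probit = 5 + slope * (log10(dose) - log10(ld50)).
    """
    deviate = _STD_NORMAL.inv_cdf(proportion)  # probit minus 5
    return ld50 * 10 ** (deviate / slope)


# Hypothetical varieties: B has the lower LD50 but a much flatter slope.
variety_a = {"ld50": 16.0, "slope": 5.0}
variety_b = {"ld50": 8.0, "slope": 2.5}

ld95_a = lethal_dose(variety_a["ld50"], variety_a["slope"], 0.95)
ld95_b = lethal_dose(variety_b["ld50"], variety_b["slope"], 0.95)
```

In this illustration variety B appears more susceptible at the LD50 but requires the higher dose at the LD95, so the ranking of the two lines reverses between the two points. A comparison at the LD50 alone would not reveal this.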
In respect of confidence, Dr. Heather noted that in practice, large-scale confirmatory testing was usually the most practicable and reliable assurance that a treatment was effective.
Variance: Dr. Heather noted that in phytosanitary experimentation variance was intrinsic to both commodity 172 and organism. If variance was not evident it would be cause for concern. The dose-responses of an organism to a quarantine treatment could be expected to be influenced by unavoidable variation within each commodity sample whether it was based varietally or otherwise. A test organism on the surface of a commodity would be relatively unaffected by interaction with the commodity and hence less variable in response than one present internally. As the stage for most codling moth tests was the egg, an external stage, it could be expected that its intrinsic susceptibility to the fumigant would be the same or closely similar for each commodity, given that all other conditions were the same.
For a fumigation, variance originating from a commodity would be expected to be mainly the result of sorption although other causes were possible. The resultant decay of the concentration affected the dose received by the target organism and warranted monitoring during the course of the fumigation, which was normal practice.
Dr. Heather noted in addition that other causes of commodity variation included interaction of the parent scion with rootstock and interstock, production locality, weather, site orientation, water management, nutrition, pests and diseases and their treatments, fruit initiation including pollination, orientation of fruit on trees, ripeness and maturity. This meant that where differences between varieties were small, fruit to fruit variation could greatly exceed variety to variety variation. Such variation was an inherent characteristic and was usually overcome by ensuring adequate robustness of the treatment.
Mr. Taylor noted that probit analysis was the application of a statistical programme to data obtained from dose-mortality tests. It permitted a straight line to be drawn between dosage and mortality and, from this line, critical dosages and mortality levels to be determined.
Dose-mortality tests were tests conducted at laboratory level in order to determine the quantity of a toxicant, such as methyl bromide, required to cause a particular level of mortality of an examined insect (e.g., 50 or 90 per cent of the population).
Confirmatory tests were conducted on a large scale to confirm that the dose and exposure period derived from smaller tests would provide the level of quarantine treatment required in the field. The principal purpose of a confirmatory test was that by using large numbers of insects, account was taken of any natural variations that might occur within insect populations. This would include the testing of individuals that were more tolerant to methyl bromide than the general population, and which might not be present where much smaller numbers of insects were tested.
According to Mr. Taylor, LD50 values were extremely useful in comparing the toxicity of different chemicals and in the measurement of resistance. However, these values were less useful in investigations of much higher levels of toxic response such as were necessary in relation to quarantine treatments, where LD values of 99 or 99.9 were sought.
Question 2: In Japan's first submission173, it is stated that "experimental error, physical condition of the fruit, sorption of the fumigant by packing material, and load of fruit in the chamber are the factors which scientists should be responsible for controlling in dose-mortality tests. Indeed, scientists who conducted these tests describe that test conditions were consciously made equal". To what extent is it feasible, technically and scientifically, to control such factors? Does the Japanese statement mean that differences in dose mortality tests for different varieties cannot be attributed to any of these factors?
Dr. Ducom noted that although there were controllable factors such as the load of fruit in the chamber, the temperature, the packing material, the geographic and annual climatic differences174, there were other factors that were impossible to control: the physical and physiological conditions of the fruit, ripeness, the precise stage of the insects at the time of treatment, small experimental errors, unexpected leaks in the chamber, etc. Those who carried out experiments were aware that the results of tests varied from one to another without the researchers necessarily understanding why. Nevertheless, if, hypothetically, all the factors mentioned above were identical, then the difference, if one existed, could be attributed to variety.
Dr. Heather pointed out that "experimental error" in this context would be expected to include small errors in measurement, equipment imperfections, ambient conditions and biological response variation in test organism populations. It could be minimized and standardized from test to test but would always be present to some degree.
Variation in "physical condition of the fruit" from test to test could be minimized but not eliminated. Handling injury, range of ripeness and maturity, and the need to have fruit in a condition susceptible to infestation at levels required for experimentation all contributed to unavoidable variation in the physical condition of the fruit. This could have some effect on sorption levels despite best efforts to standardise it.
"Sorption of the fumigant by packaging material" and the walls of the chamber could be standardized by researchers but some variability would always remain.
"Load in the test chamber" could be standardized by the control of fruit size, weight and number, but again small levels of variation would be unavoidable.
Any experimental result would have background variation. It was usual to standardize procedures ("consciously made equal") as far as possible, but some variation would always remain. Its presence could be taken as evidence of the integrity of the experimenters. Statistical analyses were used to minimize the effects of such variation, but it would not be possible to eliminate them totally. Dose-mortality was a bioassay and as such was relatively imprecise compared to a physical measurement, even when measuring the direct effect on an organism.
Mr. Taylor noted that the Japanese position appeared to be that all tests should be conducted under such standardized conditions that any physical differences arising between tests, including those of the fruit, should be accounted for in the experimental procedure. Whilst it might be expected that conditions such as temperature, atmospheric pressure, loading, and even the type and condition of packing material could be controlled very accurately in test programmes, it was difficult to state with exact and absolute confidence that none of these factors could ever affect the results of tests. For this reason it would appear to be too dogmatic to state that differences in dose-mortality could not be attributed to any of the physical factors.
Question 3: Some of the results derived from dose-mortality tests seem to indicate differences for different varieties of the same commodity tested. The parties indicated that a number of factors may explain these differences175. Is it possible scientifically or technically to determine, by statistical or other methods, the relevant impact of each of these specific factors? If so, with what degree of scientific and/or statistical certainty for each factor? In your expert view, can one determine that varietal difference is one of these factors? On the basis of the results of the dose-mortality tests presented by the parties to the Panel (if appropriate, for each of the commodities tested), as an expert, is it possible to make such a determination?
Dr. Ducom claimed that it was impossible by a simple dose-mortality test to determine the relevant impact of the factors playing a role in varietal differences, mainly because varieties ripened at different times. This had been adequately explained by the United States. 176 The dose-mortality tests presented by the parties were designed to give information on insect sensitivity. The possible causes of varietal variation could not be determined with precision by such tests, but only through a specific research program.
Dr. Heather noted that variability of the test organism, the test equipment, the test conditions and the test sample of fruit would all influence differences in LD50 values from one variety to another. However, it was probable that in most of the experiments under discussion, the major source of variability would be differential sorption by the commodity. Although statistical differences were evident between some varietally based experiments, this did not provide an assurance that the origin of the difference lay predominantly in varietal characteristics.
Whether it would be possible to statistically identify the magnitude of the varietally based components required expert biometrical comment.