# translate a date into a monthcode
calcMonthDate(lubridate::ymd("1995-12-10"))
[1] 72
# translate a monthcode to a date
calcMonthNo(72)
[1] "Dec 1995"
As of this report’s publication, the samples table contained details of 9,320 from 1,010 sites.
The primary field of the samples table is smpcode
: a unique code comprising hyphen-separated monthcode
, sitecode
, replicate number, and a two-character code indicating collection and processing methods (ccode and pcode: see below).
For instance smpcode
072-GRD-5879-0-1-BH signifies a sample collected in month 072 (Dec 1995) from Gardiners Creek (GRD) at Solway Street Ashburton, where its catchment area is 5.879 km2, zero indicating that the site is at the bottom of the reach). The sample is a single sample (or possibly the first replicate taken at the site) from a riffle using a standard bioassessment kick net, lab-sorted to a 300-count subsample, identified to lowest taxon. The conventions for these calculations are explained below.
sitecode
is matched to the unique sitecodes
in the sites table: see the section above for an explanation of the sitecode logic. The month code is a three-digit integer indicating the number of months since December 1989 (e.g. Jan 1990 = 01, Jan 1992 = 25). The date that the sample was taken is recorded in the date field. monthcodes
can be calculated in R from dates and translated back into dates using two functions in the file “bug_database_functions.R”.
# translate a date into a monthcode
calcMonthDate(lubridate::ymd("1995-12-10"))
[1] 72
# translate a monthcode to a date
calcMonthNo(72)
[1] "Dec 1995"
The replicate
field is used to distinguish samples taken from the same habitat using the same collection method in the same month. For most samples there are no such replicates, and replicate
= 1.
Collection method distinguishes both the collection method and the habitat sampled. Each collection method is represented in the smpcode
by a single character (ccode
). The collection_method
field links to the collection_methods table, which details all collection methods recorded in the database (with potential for augmentation with other methods as the database grows: Table 4.1).
ccode | cabb | collection_method | reference | habitat |
---|---|---|---|---|
A | RBA-E- | RBA edge (sweep) | epa_vic_2003 | edge |
B | RBA-R- | RBA riffle (kick) | epa_vic_2003 | riffle |
C | RBA-C- | RBA composite (sweep-kick) | epa_vic_2003 | edge and riffle |
D | RBA-C- | RBA edge-riffle combined | epa_vic_2003 | edge and riffle |
E | RBA-E- | RBA two edges combined | epa_vic_2003 | edge |
F | H-B- | Hess sampler, benthos | hess_1941 | benthos |
G | H-B- | micro-Hess sampler, benthos | benthos | |
H | A-B- | airlift, benthos, single sample | hellawell_1978 | benthos |
I | S-W- | snag bag, large woody debris, single sample | growns_etal_1999 | large wooody debris |
J | A-M- | airlift, benthos, samples combined | hellawell_1978 | benthos |
K | S-W- | snag bag, large woody debris, samples combined | growns_etal_1999 | large woody debris |
L | B-B- | Boulton suction sampler, benthos, single sample | boulton_1985 | benthos |
M | B-B- | Boulton suction sampler, benthos, samples combined | boulton_1985 | benthos |
N | S-W- | sweep and jab over natural large woody debris | ghd_2013 | large woody debris |
O | S-W- | sweep and jab over artificial large woody debris | ghd_2013 | large woody debris |
Processing method distinguishes both the method used to sort the sample following collection and the taxonomic resolution to which the sample was identified. Each processing method is represented in the samplecode by a single character (pcode
). The processing_method
field links to the processing_methods table, which details all processing methods recorded in the database (with tentative, as yet unused fields for anticipated new data arising from DNA methods: Table 4.2).
pcode | pabb | processing_method | reference | sort | taxonomic_resolution |
---|---|---|---|---|---|
A | F-F | 30-min field-sort, ID to family | epa_vic_2003 | field | family |
B | L-F | lab-subsample to 200, ID to family | walsh_1997 | lab | family |
C | L-F | lab-subsample to 300, ID to family | walsh_1997 | lab | family |
D | L-F | lab-subsample to 400, ID to family | walsh_1997 | lab | family |
C | L-G | lab-subsample to 200, ID to genus | walsh_1997 | lab | genus |
E | L-G | lab-subsample to 400, ID to genus | walsh_etal_2007 | lab | genus |
F | F-L | 30-min field-sort, ID to lowest taxon | epa_vic_2003 | field | lowest |
G | L-L | lab-subsample to 200, ID to lowest taxon | walsh_1997 | lab | lowest |
H | L-L | lab-subsample to 300, ID to lowest taxon | walsh_1997 | lab | lowest |
I | L-F | lab complete sort, ID to family | lab | family | |
K | L-L | lab complete sort, ID to lowest taxon | lab | lowest | |
R | R-F | residue of subsampling to family | |||
S | R-L | residue of subsampling to lowest taxon | |||
N | N | no sample taken | |||
O | L-G | lab-subsample to 300, ID to genus | lab | genus | |
J | L-G | lab complete sort, ID to genus | lab | genus | |
L | L-D | lab-subsample to 300, DNA barcoding ID to species | carew_etal_2018 | lab | species |
M | L-D | bulk processing of 2 combined samples, DNA barcoding ID to species | carew_etal_2023 | lab | species |
For quantitative collection methods, the area sampled is recorded in the samples table field area_m2
. For some quantitative sampling methods, replicate samples were combined before processing. In such cases (e.g. the airlift and snag-bag samples of the Yarra Ecological study, C. J. Walsh et al. 2007), the number of sample units in the combined sample is recorded in the field nsamples
, and the area_m2
field records the total area sampled by all sample units. For samples that have been subsampled, the percentage subsample is recorded in the subsample_perc
table. It is important to note that the counts recorded in the biota table are raw counts: total abundance in subsampled samples can be approximately estimated as count*100/subsample_perc
. However, a better way to use count
and subsample_perc
data is to model subsample error in any statistical model of taxon counts: see C. J. Walsh et al. (2023) for an example.
Both the collection_methods and processing_methods tables include a more intuitive abbreviation field (cabb
and pabb
, respectively). These abbreviations are used by the web interface (see below) to summarise the methods used for collections of samples.
The sourcecode
field links to the field of the same name in sample_provenance table. Samples, as much as possible, have been allocated to a single project which commissioned their collection, and the sample_provenance table lists relevant details about that project including the name of the project, publications arising from each project (bibtex citations linked to mwbugs.bib1), information on the source and format of the data as supplied before it was entered into the database, and the laboratory that collected and processed each project. For some sets of samples, allocation to individual projects was not simple. For instance, samples that have been allocated to Phase III of the Little Stringybark Creek project (sourcecode 3), were also collected as part of MW’s annual monitoring program. In such cases projects were allocated on the basis of the most likely avenue for publication using the data.
Data for some samples will not be included in the publicly available version of the data if preparation of publications using the data is in progress. Such samples will be identified in the samples table using the embargoed
field. If the sample data are being held back for publication embargoed
= 1.
The fields old_sitecode
and old_samplecode
are included in the samples table to maintain a link with old versions of the database.
The samples table contains three metrics summarizing macroinvertebrate assemblage composition in each sample: SIGNAL2 [signal2
; Chessman (2003)], and SIGNAL SEPP Waters of Victoria [signal_wov2003
; EPA Victoria (2004)], and Number of EPT families (n_ept_fam
).
samppr
codes in the samples table group pairs of samples collected from the same site on the same day, or in some cases in consecutive seasons. The combined family accurrences in each samppr were used to calculate the range of metrics calculated by the melbstreambiota R package (C. Walsh C J 2019) and are compiled in the sample_pairs table. LUMAR is a macroinvertebrate metric [walsh_2023a] that is used to assess stream condition by Melbourne Water’s 2018 Healthy Waterways Strategy. Figure 4.1 uses the lumar_plot() function from bug_database_functions.R (see the data downloads page of the database website) to plot a time series of LUMaR scores for the long-term monitoring site on the Yarra River at Templestowe, showing a decline in condition associated with increased urban development in its catchment.
# Find reach of Yarra at Fitzsimons Lane, Templestowe
<- sites[sites$strcode == "YAR" & grepl("Fitzsimons", sites$location),]
site_yar #lumarPlot function in bug_database_functions.R
lumar_plot(site_yar$sitecode)
mwbugs.bib is available from the database download page of the database web site.↩︎