3  Sites, environmental data, and photos

3.1 sites table

As of this report’s publication, the sites table contained locational information on 1,260 sites across the region. It is a spatial table, with a geometry field (geom), but the Map Grid of Australia 2020 Zone 55 coordinates (crs 7855) of each site are also recorded in the mgae_7855 and mgan_7855 fields.

The sites table is linked to the Melbourne Water Stream Network database by the reach and strcode fields (streams table in the stream network database), and the str_nm field (in the stream_names table) (Walsh 2023). reach values comprise a three-character stream code (strcode) and a (usually) four-digit code equal to 1000 times log10(catchment area in 100-m2), so that codes beginning with 2 indicate total upstream catchment areas between 0.1 and 1 km2, codes beginning with 3 indicate 1-10 km2, codes beginning with 4 indicate 10-100 km2, etc. For instance, the reach of the Yarra River at Whittons Reserve, Wonga Park (YAR-7350) has a catchment area of 2239 km2.

These reach codes are from version 1.1 of the stream network (Kunapo, Walsh, and Sammonds 2019), and have been retained for historical consistency. However, each site also has a reach_v12 code, which refers to versions 1.2 and 1.3 of the stream network (Walsh 2023), which introduced many new stream lines and altered the split of many reach subcatchments, thus altering catchment area values slightly. reach_v12 codes separate strcode and catchment area with underscores rather than hyphens: thus, for example, STV-2279, whose subcatchment split was unchanged became STV_2279, while STW-788 became STW_642. The reach_v12 codes should be used to associate sites with the up-to-date environmental information in version 1.3 of the stream network database.

Stream codes are generally intuitive abbreviations of the full stream name, and the stream network database stream_names table lists unique names for 31,397 streams, including clarifiers to ensure that ambiguously named streams have unique names (e.g. the eight Deep Creeks in the region: see Kunapo, Walsh, and Sammonds (2019)). Names of streams can be explored at the stream network web site https://tools.thewerg.unimelb.edu.au/mwstr/.

Unique sitecodes in the sites table are achieved by combining reachcode with the single integer dus field, which is the decile (0-9) of distance upstream from the bottom of each reach. A site at the bottom of a reach has a dus of 0, and one at the top of the reach 9. Thus throughout the region, sitecodes allow identification of the site location within <20m in most reaches (median reach length = 176 m). To distinguish sitecodes (based on version 1.1 of the stream network) from sitecode_v12 (versions 1.2 and 1.3 of the network), the distance upstream is indicated by a lower case letter between a and j: thus sitecode STV-2279-5 equals sitecode_v12 STV_2279f.

The fields location and alt_location are, in most cases, transcribed from earlier versions of the database or from original source data. Many of the location descriptions are insufficient to be certain about reach locations, particularly upstream or downstream of roads. Where sampled reaches have been confirmed by field collectors, location descriptions have been clarified.

The fields old_sitecode and subc refer to site codes used in earlier versions of the database and subcatchments in the earlier version of the stream network subcatchment spatial layer (Grace Detailed-GIS Services 2012). However, note that many reaches in the new network have no equivalent in the less accurate earlier version: for such reaches subc is NA. The lga field identifies the local government authority area in which each site resides, and the mw_subcat and mw_cat fields refer to Melbourne Water’s management units, which they refer to as ‘subcatchments’ and ‘catchments’ (see Walsh 2023).

3.2 Environmental data for sites

Environmental data for all sites in the sites table can be accessed from the Melbourne Stream Network Database. The wide range of environmental variables in that database are described in full in its user manual (Walsh 2023): environmental variables are stored in two tables: reach-scale variables are stored in the table subc_env, and catchment-scale variables in the table cat_env.

The following is example code for extracting environmental data for a subset of sites, which can be adapted for other sites and environmental variables. It assumes you have established a connection with the mwbugs database named db_m (see section 2.2, and a connection to the mwstr database named mwstr_db (see the mwstr technical tips page). Note that the SQL query selects variables of interest from the cat_env table (ei_2018, meanq_mm), the subc_env table (n_part_barriers_ds_2023) and the subcs table (carea_km2), buy joining the three tables. You could equally select all of the three tables and manipulate them in R.

# Subset samples table based on project_code (= 1)
samples_proj1 <- DBI::dbGetQuery(db_m, 
    "SELECT * FROM samples JOIN sample_project_groups
     ON samples.smpcode = sample_project_groups.smpcode
     WHERE project_code = 1;")

# Subset the sites table (a spatial table) from samples_proj1
sites_proj1 <- sf::st_read(db_m, query = 
      paste0("SELECT * FROM sites WHERE sitecode IN ('",
             paste(samples_proj1$sitecode, collapse = "', '"),

# Select selected environmental data from those sites
# Noting that the mwstr database uses reach_V12 codes
# (Hint: run just the paste0() part to see the full SQL query)
sites_proj1_env <- DBI::dbGetQuery(mwstr_db, 
    paste0("SELECT cat_env.reach as reach_v12, cat_env.ei_2018, ",
            "cat_env.meanq_mm, subcs.carea_km2, ",
            "subc_env.n_part_barriers_ds_2023 ", 
            "FROM cat_env ",
            "JOIN subc_env ON cat_env.site = subc_env.site ",
            "JOIN subcs ON cat_env.site = subcs.site ",
            "WHERE cat_env.reach IN ('", 
            paste(sites_proj1$reach_v12, collapse = "', '"), "');"))

The variables stored in the cat_env and subc_env tables were calculated for the bottom point in each reach. Because reaches in the stream database are split at and between confluence boundaries, for most reach-scale and catchment-scale variables, the value of the bottom of the reach is a satisfactory estimate for all points along the reach. This is not the case for attenuated forest cover (AF, af_ variables in the cat_env table). Because it is calculated by weighting forest cover upstream, AF can vary along a reach with varying forest cover. I thus calculated AF variables for every sitecode in the database (i.e. at the position of the site along the reach) using following code:

sites <- DBI::dbReadTable(db_m, "sites")
  af_all_sites <- data.frame(sitecode_v12 = sites$sitecode_v12[1],
                      reach_v12 = sites$reach_v12[1],
                      t(af_site(sitecode_or_reach = sites$sitecode_v12[1], 
                                prepared_data = prep_data, max_us = 6000)),
                      af_2018_reach = af_site(sites$reach_v12[1], prep_data)[3], max_us = 6000)
  for(i in 2:nrow(sites)){
    af_all_sites <- rbind(af_all_sites,
                   data.frame(sitecode_v12 = sites$sitecode_v12[i],
                              reach_v12 = sites$reach_v12[i],
                              t(af_site(sitecode_or_reach = sites$sitecode_v12[i], 
                                        prepared_data = prep_data, max_us = 6000)),
                              af_2018_reach = af_site(sites$reach_v12[i], prep_data)[3], max_us = 6000))
})  # ~ 5.5 h

I added the resulting table to the mwbugs database as table sitecode_env. The af_variables for the above selected sites can be selected and added to the sites_proj1_env data.frame created above. Here, I add only af_2018, but the table also contains af_1750, af_2006, and af_2022 (however, note the caveats about these variables as described in Walsh (2023) Chapter 7).

af <- DBI::dbGetQuery(db_m, 
    paste0("SELECT sitecode_v12, af_2018 FROM sitecode_env ",
           "WHERE sitecode_v12 IN ('", 
            paste(sites_proj1$sitecode_v12, collapse = "', '"), "');"))
# reach_v12 is sitecode_v12 without the last letter
af$reach_v12 <- substr(af$sitecode_v12, 1, nchar(af$sitecode_v12)-1)
sites_proj1_env$af_2018 <- af$af_2018[match(sites_proj1_env$reach, 
sites_proj1_env <- dplyr::mutate(sites_proj1_env, 
                  sitecode = sites_proj1$sitecode[match(sites_proj1_env$reach_v12, 
                                 .before = reach_v12)

The resulting table begins like this:

Table 3.1: First 6 rows of environmental variable table created with the above code (values rounded).


















































3.3 site_photos table

Since 2020, sample site locations have been confirmed by geocoded photos taken at the bottom of the sampling reach looking upstream, and the top of the sampling reach looking downstream. The site_photos table lists the photos for a subset of sites in the database (most of those sampled since 2020), with geographic and other information extracted from the metadata associated with each photo (using the program exiftool). The nearest sitecode to each photo is recorded as sitecode and sitecode_v12, and the sitecode in the sites table corresponding to the photos is recorded as sample_sitecode. Note that the site_photos table contains 184 photos from sites not in the sites table. These are mostly from a study of longitudinal water quality along the Yarra River, and from small headwater catchments.

The photo_source table groups photos listed in the site_photos table by the photo_sourcecode field, and records the directory in which each group of photos is stored on the server. See the downloads page for code to permit downloading the full set of photos.