Appendix B — Code for using and compiling the stream network database

B.1 Using the database

A faster and more flexible way to access the database than through the gpkg file used throughout this manual is as a PostgreSQL database. After some tips on accessing and using the database in R, this appendix details how to build a PostgreSQL database from the downloaded gpkg files. A compiled PostgreSQL version of the database is available on the unimelb network: please contact Chris Walsh for access instructions if you have access to that network.

B.1.1 Reading tables from postgreSQL (or SQLite) in R

Whether you are accessing the PostgreSQL database or reading from the gpkg files as SQLite databases, you first need to create a connection to the database. For PostgreSQL:

conn <- RPostgres::dbConnect(drv = RPostgres::Postgres(),
    dbname = "mwbugs", host = "[hostname]", port = "[port]",
    user = "[username]", password = "[password]")
# Contact the author for credentials for unimelb access

Note, if you are working on the unimelb water server, host = "localhost". If you have compiled your own PostgreSQL database, you will need to change the values to match your setup. Using SQLite to connect to the gpkg follows the same concept:

conn <- RSQLite::dbConnect(drv = RSQLite::SQLite(),
    dbname = "[PATH_TO_YOUR_COPY]/mwbugs_public.gpkg")

With a connection (called conn in these examples) open, the example SQL queries should work. To run queries that return the geographic information in the two spatial tables in the database (sites and site_photos), use the st_read() function from the sf package. For instance, to select all sites that have been sampled on Taylors Creek:

tay_sites <- sf::st_read(conn, query = "SELECT * FROM sites WHERE str_nm LIKE 'TAYLORS%';")

A different approach is needed to read non-spatial data: use the DBI::dbGetQuery() function. Thus, to read just the (non-spatial) site and reach fields from the sites table:

site_reach <- DBI::dbGetQuery(conn, "SELECT sitecode, reach, sitecode_v12, 
                              reach_v12 FROM sites;")

Whole tables can be read without SQL queries:

# e.g. for spatial tables
sites <- sf::st_read(conn, layer = "sites") 
# or for non-spatial tables
samples <- DBI::dbReadTable(conn, "samples")
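If you are unsure what tables a connection exposes, DBI can list them before you read anything. A minimal sketch, illustrated here with a throw-away in-memory SQLite database so it is self-contained; the same two calls work on the postgres and gpkg connections above:

```r
# Discover the tables available on a DBI connection before reading them
conn <- RSQLite::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(conn, "samples", data.frame(smpcode = "X", count = 1))
tables <- DBI::dbListTables(conn)
# Geopackage connections also list internal "gpkg_*" bookkeeping
# tables, which can be filtered out before reading:
tables <- tables[!grepl("^gpkg", tables)]
print(tables)
RSQLite::dbDisconnect(conn)
```

On a gpkg connection the unfiltered list includes entries such as gpkg_geometry_columns, which is used below (B.2) to separate spatial from non-spatial tables.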

Finally, it is good practice to close the database connection after you have extracted the data you need. Thus:

RPostgres::dbDisconnect(conn)
# OR 
RSQLite::dbDisconnect(conn)
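If a query errors part-way, a script can exit without ever reaching the dbDisconnect() call. One way to guarantee cleanup is to wrap the open/query/close steps in a small helper; a sketch (the helper name query_db and the use of on.exit() are my own, not part of the database's documented workflow):

```r
# Hypothetical helper: runs one query against a freshly opened
# connection and guarantees the connection is closed afterwards,
# even if the query throws an error. `connect` is any zero-argument
# function returning a DBI connection (e.g. the RPostgres or
# RSQLite calls above, with your credentials filled in).
query_db <- function(connect, sql) {
  conn <- connect()
  on.exit(DBI::dbDisconnect(conn), add = TRUE)
  DBI::dbGetQuery(conn, sql)
}
```

For example, `query_db(function() RSQLite::dbConnect(RSQLite::SQLite(), "[PATH_TO_YOUR_COPY]/mwbugs_public.gpkg"), "SELECT * FROM samples;")` opens, queries and closes in one call.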

B.1.2 Reading tables from PostgreSQL in QGIS or ArcGIS

Request the database credentials from the author (if using the unimelb database), and follow these instructions for [QGIS](https://docs.qgis.org/2.18/en/docs/user_manual/managing_data_source/opening_data.html#creating-a-stored-connection) and [these](https://pro.arcgis.com/en/pro-app/latest/help/data/databases/connect-postgresql.htm) for ArcGIS.

B.2 Compiling a PostgreSQL database

The following instructions assume that you have downloaded "mwbugs.gpkg" to a local directory and can read it in R. To create a PostgreSQL database, PostgreSQL must first be installed on your local computer or server. On a Linux machine:

$ sudo apt install postgresql
$ sudo apt install postgis

On a Mac, install it with brew via a terminal:

% brew install postgresql
% brew install postgis
% brew services start postgresql  

Then create two PostgreSQL user accounts (as the default superuser account, postgres):

  1. An account with administrative rights with the same user name (called [admin-user-name] below) as you use to log into the computer you are installing postgres on.

  2. A read-only account (named 'readonly')

Then create a new, empty database, "mwbugs", and enter PostgreSQL.

In the terminal again:

% sudo -u postgres createuser readonly
% sudo -u postgres createuser [admin-user-name] 
% sudo -u postgres createdb mwbugs

Here [admin-user-name] is your user name on the server or local computer you are using. Then enter the psql program, set a strong password for [admin-user-name], and give the account superuser privileges:

% sudo -u postgres psql
postgres=# \connect mwbugs
postgres=# alter user [admin-user-name] with encrypted password 'INSERT_STRONG_PASSWORD_HERE';
postgres=# alter user [admin-user-name] with SUPERUSER;
postgres=# alter user readonly with encrypted password 'reachcode_42';

To quit the psql interface and return to the shell, type:

postgres=# \q

To avoid having to enter your password repeatedly, create a text file and paste the following single line into it (username and password as set above):

localhost:5432:*:[admin-user-name]:[INSERT_STRONG_PASSWORD_HERE]

Save the text file to your home directory as "~/.pgpass", and then in a terminal set permissions on the file so that only [admin-user-name] can read or write it:

$ touch ~/.pgpass
$ chmod 0600 ~/.pgpass
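Alternatively, the same file can be created and locked down in one step from the terminal (placeholders as above; the password is stored in plain text, which is why the permissions step matters):

```shell
# Write the single credential line and restrict access to the owner.
# Replace the bracketed placeholders with the values set above.
echo 'localhost:5432:*:[admin-user-name]:[INSERT_STRONG_PASSWORD_HERE]' > ~/.pgpass
chmod 0600 ~/.pgpass
```

PostgreSQL silently ignores a ~/.pgpass file with looser permissions than 0600, so password prompts reappearing is a sign the chmod step was missed.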

Now re-enter psql to add the necessary extensions to the mwbugs database, and grant read-only privileges to the readonly account:

$ sudo -u postgres psql

postgres=# \connect mwbugs
mwbugs=# CREATE EXTENSION postgis;
mwbugs=# CREATE EXTENSION pg_trgm;
mwbugs=# CREATE EXTENSION fuzzystrmatch;
mwbugs=# GRANT CONNECT ON DATABASE mwbugs TO readonly;
mwbugs=# GRANT USAGE ON SCHEMA public TO readonly;
mwbugs=# GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;
mwbugs=# ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO readonly;

Check user privileges:

mwbugs=# \du

and list the databases to confirm mwbugs exists:

mwbugs=# \l
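Before loading data it is also worth confirming that the readonly account can actually connect. A minimal check, assuming a running server and the readonly password set above (-h localhost forces password authentication, via ~/.pgpass or a prompt):

```shell
# Connection test for the read-only account; a successful connection
# returns a single row containing 1
psql -h localhost -U readonly -d mwbugs -c "SELECT 1;"
```

Once tables have been loaded (B.2 below), any write attempted by this account (e.g. a DELETE) should fail with a permission-denied error, confirming the GRANT SELECT setup.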

The above process has many potential traps; seek advice online if you fall into any of them. With an empty mwbugs database set up in PostgreSQL, and with the file "mwbugs.gpkg" downloaded from the mwbugs web site to the working directory, the following R code reads the data from the downloaded file, writes it to the database, and sets constraints between tables.

library(RSQLite); library(sf); library(RPostgres)
mwbugs_dir <- "~/uomShare/wergStaff/ChrisW/git-data/mwbugs"  
# Set to your download directory
# Connect to the geopackage and read all tables to be written to postgres
mwbugs_gpkg <- RSQLite::dbConnect(RSQLite::SQLite(), 
                                  dbname = paste0(mwbugs_dir,"/mwbugs.gpkg"))
# Read all files in the database into R
tables <- DBI::dbListTables(mwbugs_gpkg)
tables <- tables[!grepl("gpkg", tables) & !grepl("geom", tables)]
tables_spatial <-
  DBI::dbGetQuery(mwbugs_gpkg, "SELECT * FROM gpkg_geometry_columns;")$table_name
tables_non_spatial <- tables[!tables %in% tables_spatial]
for (i in seq_along(tables_spatial)) {
  assign(tables_spatial[i],
         sf::st_read(mwbugs_gpkg, tables_spatial[i], crs = 7855))
}
for (i in seq_along(tables_non_spatial)) {
  assign(tables_non_spatial[i],
         DBI::dbReadTable(mwbugs_gpkg, tables_non_spatial[i]))
}
DBI::dbDisconnect(mwbugs_gpkg)

# Dates may have been saved as numeric values (Check if so before running)
samples$date <- as.Date(samples$date, origin = lubridate::origin)
sample_pairs$date <- as.Date(sample_pairs$date, origin = lubridate::origin)
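Since the conversion is only needed when the dates came back as numbers, a guarded version avoids corrupting columns that were already read as dates (fix_date is a name of my own invention, not part of the build script):

```r
# Convert only when the column was read back as numeric
# (days since 1970-01-01); leave proper date columns untouched
fix_date <- function(x) {
  if (is.numeric(x)) as.Date(x, origin = "1970-01-01") else as.Date(x)
}
# e.g. samples$date <- fix_date(samples$date)
```

This replaces the "check if so before running" step above with a check the code performs itself.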

# Connect to empty postgres database
# This connection command to postgres assumes ~/.pgpass is populated with
  # credentials of the PostgreSQL superuser (as above)
mwbugs_db <- RPostgres::dbConnect(RPostgres::Postgres(), dbname = "mwbugs",
                                  host = "localhost", port = 5432)

# Spatial tables, sites and site_photos
sf::st_write(sites, mwbugs_db, "sites")
sf::st_write(site_photos, mwbugs_db, "site_photos")

# Table photo_source
DBI::dbWriteTable(conn = mwbugs_db, name = 'photo_source',
                  value = photo_source, row.names = FALSE)

# Table samples
DBI::dbWriteTable(conn = mwbugs_db, name = 'samples',
                  value = samples, row.names = FALSE,
                  field.types = c(smpcode = "varchar(24)", date = "DATE",
                    sitecode = "varchar(14)", monthcode = "int", area_m2 = "real",
                    replicate = "int", collection_method = "varchar(64)",
                    processing_method = "varchar(64)", nsamples = "int",
                    subsample_perc = "real", sourcecode = "int",
                    comment = "varchar(264)", embargoed = "int",
                    old_sitecode = "varchar(24)", old_samplecode = "varchar(24)",
                    signal_wov2003 = "real", samppr = "varchar(23)"))

# Table spv (sample provenance)
DBI::dbWriteTable(conn = mwbugs_db, name = 'spv', value = spv, row.names = FALSE)

# collection_methods and processing_methods
DBI::dbWriteTable(conn = mwbugs_db, name = "collection_methods",
                  value = collection_methods, row.names = FALSE,
                  field.types = c(ccode = "varchar(1)", cabb = "varchar(12)",
                                  collection_method = "varchar(64)",
                                  reference = "varchar(256)",
                                  habitat = "varchar(64)",
                                  comment = "varchar(256)"))
DBI::dbWriteTable(conn = mwbugs_db, name = "processing_methods",
                  value = processing_methods, row.names = FALSE,
                  field.types = c(pcode = "varchar(1)", pabb = "varchar(12)",
                                  processing_method = "varchar(64)",
                                  reference = "varchar(256)",
                                  sort = "varchar(64)",
                                  taxonomic_resolution = "varchar(64)",
                                  comment = "varchar(256)"))

# Table biota
DBI::dbWriteTable(mwbugs_db, "biota", biota, row.names = FALSE,
  field.types = c(smpcode = "varchar(24)", taxoncode = "varchar(8)", count = "real",
                  taxon = "varchar(264)", coarsepick = "int", notes = "varchar(264)",
                  originalbugcode = "varchar(8)", shortcode = "varchar(8)"))

# Table sample_pairs
DBI::dbWriteTable(conn = mwbugs_db, name = 'sample_pairs', value = sample_pairs,
                  row.names = FALSE,
  field.types = c(samppr = "varchar(23)", sitecode = "varchar(9)", nriff = "int",
      nspring = "int", process = "varchar", date = "date", subc = "varchar",
      lumar = "real", nSensFams = "real", obs.wt.sensif.prev.nonweed = "real",
      obs.expdiffPos.sensif.weedinvas = "real", exp.wt.sensif.prev.nonweed = "real",
      expdiff.prev.sensif.weedinvas = "real", obs.expdiffNeg.unexp.weedinvas = "real",
      obs.wt.A = "real", obs.wt.B = "real", obs.wt.C = "real", obs.wt.D = "real",
      obs.wt.weedy = "real", unexp.wt.weedy = "real", unexp.wt.invas = "real",
      signal_wov2003 = "real", signal2 = "real"))

# Taxonomic tables
DBI::dbWriteTable(mwbugs_db, "biotic_indices", biotic_indices, row.names = FALSE)
DBI::dbWriteTable(mwbugs_db, "bold_id_numbers", bold_id_numbers, row.names = FALSE)
DBI::dbWriteTable(mwbugs_db, "morphospp_etc", morphospp_etc, row.names = FALSE)
DBI::dbWriteTable(mwbugs_db, "taxon_all", taxon_all, row.names = FALSE)
DBI::dbWriteTable(mwbugs_db, "taxon_fam", taxon_fam, row.names = FALSE)
DBI::dbWriteTable(mwbugs_db, "taxon_gen", taxon_gen, row.names = FALSE)
DBI::dbWriteTable(mwbugs_db, "taxon_rank1", taxon_rank1, row.names = FALSE)
DBI::dbWriteTable(mwbugs_db, "taxon_spp", taxon_spp, row.names = FALSE)
DBI::dbWriteTable(mwbugs_db, "taxon_syn", taxon_syn, row.names = FALSE)

# Metadata table
DBI::dbWriteTable(mwbugs_db, "metadata", metadata, row.names = FALSE, overwrite = TRUE)
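With all tables written, a quick sanity check confirms that everything read from the geopackage actually reached postgres, using the tables vector built earlier in the script:

```r
# Any names returned here were read from the gpkg but are missing
# from the postgres database; an empty result means all writes succeeded
written <- DBI::dbListTables(mwbugs_db)
setdiff(tables, written)
```

This is worth running before applying constraints, since a missing table will make the ALTER TABLE statements below fail.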

# Apply constraints
DBI::dbSendQuery(mwbugs_db, "ALTER TABLE sites ADD PRIMARY KEY (sitecode);")
DBI::dbSendQuery(mwbugs_db, 
              "CREATE UNIQUE INDEX CONCURRENTLY sitecode ON sites (sitecode);")
DBI::dbSendQuery(mwbugs_db, 
  paste0("ALTER TABLE samples ADD CONSTRAINT sitecode FOREIGN KEY ",
  "(sitecode) REFERENCES sites (sitecode);"))
DBI::dbSendQuery(mwbugs_db, "ALTER TABLE samples ADD PRIMARY KEY (smpcode);")
DBI::dbSendQuery(mwbugs_db, 
              "CREATE UNIQUE INDEX CONCURRENTLY smpcode ON samples (smpcode);")
DBI::dbSendQuery(mwbugs_db, 
  paste0("ALTER TABLE biota ADD CONSTRAINT smpcode FOREIGN KEY (smpcode) ",
         "REFERENCES samples (smpcode);"))
DBI::dbSendQuery(mwbugs_db, "ALTER TABLE sample_pairs ADD PRIMARY KEY (samppr);")
# No constraint on samppr for samples because some samples do not have a samppr
DBI::dbSendQuery(mwbugs_db, "ALTER TABLE taxon_all ADD PRIMARY KEY (shortcode);")
DBI::dbSendQuery(mwbugs_db, 
  paste0("ALTER TABLE biota ADD CONSTRAINT shortcode FOREIGN KEY (shortcode) ",
         "REFERENCES taxon_all (shortcode);"))
DBI::dbSendQuery(mwbugs_db,
                 "ALTER TABLE taxon_spp ADD PRIMARY KEY (spcode);")
DBI::dbSendQuery(mwbugs_db,
                 "ALTER TABLE taxon_gen ADD PRIMARY KEY (gencode);")
DBI::dbSendQuery(mwbugs_db,
                 "ALTER TABLE taxon_fam ADD PRIMARY KEY (famcode);")
DBI::dbSendQuery(mwbugs_db,
                 "ALTER TABLE taxon_rank1 ADD PRIMARY KEY (t1code);")
DBI::dbSendQuery(mwbugs_db,
                 "ALTER TABLE morphospp_etc ADD PRIMARY KEY (taxoncode);")
DBI::dbSendQuery(mwbugs_db,
    paste0("ALTER TABLE taxon_spp ADD CONSTRAINT gencode FOREIGN KEY ",
           "(gencode) REFERENCES taxon_gen (gencode);"))
DBI::dbSendQuery(mwbugs_db,
    paste0("ALTER TABLE taxon_gen ADD CONSTRAINT famcode FOREIGN KEY ",
           "(famcode) REFERENCES taxon_fam (famcode);"))
DBI::dbSendQuery(mwbugs_db,
    paste0("ALTER TABLE taxon_fam ADD CONSTRAINT t1code FOREIGN KEY ",
           "(t1code) REFERENCES taxon_rank1 (t1code);"))
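The constraints can be verified from R by querying the standard information_schema views, which list every primary and foreign key in the public schema:

```r
# List the primary and foreign keys just created, as a sanity check
DBI::dbGetQuery(mwbugs_db,
  paste0("SELECT table_name, constraint_name, constraint_type ",
         "FROM information_schema.table_constraints ",
         "WHERE constraint_type IN ('PRIMARY KEY', 'FOREIGN KEY') ",
         "AND table_schema = 'public' ORDER BY table_name;"))
```

Each constrained table above (sites, samples, biota, sample_pairs and the taxon tables) should appear in the result.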

# # To delete or overwrite tables, relevant constraints need to be dropped, e.g. 
# DBI::dbSendQuery(mwbugs_db,
#                           "ALTER TABLE samples DROP CONSTRAINT sitecode;")
# DBI::dbSendQuery(mwbugs_db,
#                          "ALTER TABLE biota DROP CONSTRAINT shortcode;")
# DBI::dbSendQuery(mwbugs_db,
#                  "ALTER TABLE biota DROP CONSTRAINT smpcode;")
# DBI::dbSendQuery(mwbugs_db, "DROP TABLE biota;")
# DBI::dbSendQuery(mwbugs_db, "DROP TABLE samples;")
# DBI::dbSendQuery(mwbugs_db, "DROP TABLE sites;")
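Finally, as in B.1.1, close the connection once the build (or any table rebuilding) is complete:

```r
# Close the connection to the newly built database
DBI::dbDisconnect(mwbugs_db)
```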