Appendix C — Code for using and compiling the stream network database

C.1 Using the database

A faster and more flexible way to access the database than through the gpkg files as used throughout this manual is as a postgreSQL database. After some tips on accessing and using the database in R, this appendix details how to build a postgreSQL database from the downloaded gpkg files. There is a compiled postgreSQL version of the database available on the unimelb network. Please contact Chris Walsh for access instructions if you have access to the unimelb network.

C.1.1 Reading tables from postgreSQL (or SQLite) in R

Whether you are accessing the postgreSQL database or reading from the gpkg files as SQLite databases, you first need to create a connection to the database. For postgreSQL:

conn <- RPostgres::dbConnect(drv = RPostgres::Postgres(), 
    dbname = "mwstr", host = [hostname], port = [port],
    user = [username], password = [password])  
#contact author for credentials for unimelb access

Note, if you are working on the unimelb water server, host = “localhost”. If you have compiled your own postgreSQL database, you will need to change the values to match your set up. Using SQLite to connect to the gpkg follows the same concept:

conn <- RSQLite::dbConnect(drv = RSQLite::SQLite(), 
                               "[PATH_TO_YOUR_COPY]/mwstr_v13.gpkg")  

With a connection (in these examples called conn) open, the example sql queries should work. For example you can take the SQL code from the example of extracting the stream reach geometries and their associated Strahler order values in Chapter 4, and use the st_read() command from the sf package to read spatial tables:

str_strahler_subcat <- sf::st_read(conn, 
    paste0("SELECT streams.geometry, subc_env.strahler ",
           "FROM streams JOIN subc_env ON streams.site = subc_env.site ",
           "WHERE streams.sampleable = 1;"))

A different approach is needed to read non-spatial data. To extract non-spatial data from the connected database use the DBI::dbGetQuery() function. Thus, to read just the (non-spatial) site and reach fields from the subcs table:

str_strahler_subcat <- DBI::dbGetQuery(conn,"SELECT site, reach FROM subcs;")

Whole tables can be read without SQL queries:

# e.g, for spatial tables
coast <- sf::st_read(conn, layer = "coast") 
# or for non-spatial tables
stream_names <- DBI::dbReadTable(conn, "stream_names")

Finally, it is good practice to disconnect from the database connection after you have extracted the data you need. Thus:

RPostgres::dbDisconnect(conn)
# OR 
RSQLite::dbDisconnect(conn)

C.1.2 Reading tables from postreSQL in QGIS or ArGIS.

Request the database credentials from the author (if using the unimelb database), and follow these instructions for (QGIS)[https://docs.qgis.org/2.18/en/docs/user_manual/managing_data_source/opening_data.html#creating-a-stored-connection] and (these)[https://pro.arcgis.com/en/pro-app/latest/help/data/databases/connect-postgresql.htm] for ArGIS.

C.2 Compiling a postgreSQL database

The following instructions assume that you have downloaded “mwstr_v13.gpkg” and “mwstr_cats_v13.gpkg” to a local directory, and can read them in R. To create a postgreSQL database, PostgreSQL must first be installed on your local computer or server. On a linux machine:

$ sudo apt install postgresql
$ sudo apt install postgis postgresql-postgis 

On a mac, install it with brew via a terminal:

% brew install postgresql
% brew install postgis
% brew services start postgresql  

Then create two postgreSQL user accounts (as the default superuser account, postgres):

  1. An account with administrative rights with the same user name (called [admin-user-name] below) as you use to log into the computer you are installing postgres on.

  2. a read-only account (name ‘readonly’)

And create a new, empty database, “mwstr”, and then enter postgreSQL .

In the terminal again:

% sudo -u postgres createuser readonly
% sudo -u postgres createuser [admin-user-name] 
% sudo -u postgres createdb mwstr

Where [admin-user-name] equals your user name on the server/local computer that you are using. Then enter the psql program and set a strong password for [admin-user-name] and give the account superuser privileges

% sudo -u postgres psql

postgres=# \connect mwstr
postgres=# alter user [admin-user-name] with encrypted password 'INSERT_STRONG_PASSWORD_HERE';
postgres=#  alter user [admin-user-name] with SUPERUSER;
postgres=# alter user readonly with encrypted password 'reachcode_42';

Type to quit the psql interface and return to bash.

postgres=# \q

To avoid having to enter your password repeatedly, create a text file and paste the following single line into it (username and password as set above):

localhost:5432:*:[admin-user-name]:[INSERT_STRONG_PASSWORD_HERE]

Save the text file to your home directory as “~/.pgpass”

And then in terminal set permissions for this file so that only [admin-user-name] can read it or write to it.

$ touch ~/.pgpass
$ chmod 0600 ~/.pgpass

Now re-enter psql, to add necessary extensions to mwstr database, and set readonly privileges to the readonly account

  $ sudo -u postgres psql
  
  postgres=# \connect mwstr
  mwstr=# CREATE EXTENSION postgis;
  mwstr-# CREATE EXTENSION pg_trgm;
  mwstr-# CREATE EXTENSION fuzzystrmatch;
  mwstr-# GRANT CONNECT ON DATABASE mwstr TO readonly;
  mwstr-# GRANT USAGE ON SCHEMA public TO readonly;
  mwstr-# GRANT SELECT ON ALL TABLES IN SCHEMA public TO readonly;
  mwstr-# ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO readonly;

check user privileges

mwstr-# \du

And check database status

mwstr~# \l

The above process has many potential traps. Seek internet advice if you fall into any of them. With an empty mwstr database set up on postgreSQL, and with the files “mwstr_v_1_3.gpkg” and “mwstr_v_cats_1_3.gpkg” downloaded from the mwstr web site to the working directory, the following R code reads the data from the downloaded files, writes them to the database, and sets constraints between tables.

With a local version of the mwstr postgreSQL database, the snapSiteToReach() in mwstr_network_functions.R, which requires root access to the database, will be available to you. While raster tables can be loaded into postgreSQL databases, I have found that reading them from the database in QGIS is problematic, and reading them from the database in R is slower than reading the tif files directly. I recommend the terra package in R for reading and manipulating the raster data.

library(RSQLite); library(sf); library(RPostgreSQL)
source("https://tools.thewerg/data/mwstr_v13/mwstr_network_functions.R")
mwstr_dir <- "directory_of_choice/mwstr_v13/"

# Connect to the geopackage and to the empty postgres mwstr database
  mwstr_gpkg <- RSQLite::dbConnect(RSQLite::SQLite(), 
                                  dbname = paste0(mwstr_dir,"mwstr_v13.gpkg"))
# This connection command to postgres assumes ~/.pgass is populated with
  # credentials of postgreSQL superuser (as above)
  mwstr <- RPostgres::dbConnect(RPostgres::Postgres(), dbname = "mwstr",
                                  host = "localhost", port = 5432)
  #Use sf to read and write spatial tables (SQLite for non-spatial tables)
  streams <- sf::st_read(mwstr_gpkg, layer = "streams")
  sf::st_crs(streams) <- 7855
  sf::st_write(streams,mwstr,"streams")
  RPostgreSQL::dbSendQuery(mwstr,
                           paste("ALTER TABLE streams",
                                 "ALTER COLUMN reach TYPE VARCHAR(10),",
                                 "ALTER COLUMN mi_prinx TYPE INTEGER,",
                                 "ALTER COLUMN type TYPE VARCHAR(64),",
                                 "ALTER COLUMN site TYPE INTEGER,",
                                 "ALTER COLUMN nextds TYPE INTEGER,",
                                 "ALTER COLUMN hydroid TYPE INTEGER,",
                                 "ALTER COLUMN nextdownid TYPE INTEGER,",
                                 "ALTER COLUMN strcode TYPE VARCHAR(3),",
                                 "ALTER COLUMN botseg TYPE INTEGER,",
                                 "ALTER COLUMN reach_v11 TYPE VARCHAR(10),",
                                 "ALTER COLUMN reach_v11_match TYPE INTEGER,",
                                 "ALTER COLUMN seg_order TYPE INTEGER,",
                                 "ALTER COLUMN sampleable TYPE INTEGER;"))
  rm(streams)
  
  stream_names <- RSQLite::dbReadTable(mwstr_gpkg, "stream_names")
  RPostgreSQL::dbWriteTable(mwstr,"stream_names",stream_names,row.names = FALSE)
  RPostgreSQL::dbSendQuery(mwstr,
                           paste("ALTER TABLE stream_names",
                                 "ALTER COLUMN strcode TYPE VARCHAR(3),",
                                 "ALTER COLUMN termreach TYPE INTEGER,",
                                 "ALTER COLUMN headreach TYPE INTEGER;"))
  rm(stream_names)
  
  subcs <- sf::st_read(mwstr_gpkg, layer = "subcs")
  sf::st_crs(subcs) <- 7855
  sf::st_write(subcs, mwstr,"subcs")
  RPostgreSQL::dbSendQuery(mwstr,
                           paste("ALTER TABLE subcs",
                                 "ALTER COLUMN reach TYPE VARCHAR(10),",
                                 "ALTER COLUMN site TYPE INTEGER,",
                                 "ALTER COLUMN nextds TYPE INTEGER,",
                                 "ALTER COLUMN strcode TYPE VARCHAR(3),",
                                 "ALTER COLUMN agg_order TYPE INTEGER,",
                                 "ALTER COLUMN nus TYPE INTEGER,",
                                 "ALTER COLUMN nds TYPE INTEGER,",
                                 "ALTER COLUMN reach_v11 TYPE VARCHAR(10);"))
  rm(subcs)
  
  # Set constraints between first three tables
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE streams ADD PRIMARY KEY (hydroid);")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE subcs ADD PRIMARY KEY (site);")
  RPostgreSQL::dbSendQuery(mwstr, "CREATE UNIQUE INDEX CONCURRENTLY site ON subcs (site);")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE streams ADD CONSTRAINT site FOREIGN KEY (site) REFERENCES subcs (site);")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE stream_names ADD PRIMARY KEY (strcode);")
  RPostgreSQL::dbSendQuery(mwstr, "CREATE UNIQUE INDEX CONCURRENTLY strcode ON stream_names (strcode);")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE streams ADD CONSTRAINT strcode FOREIGN KEY (strcode) REFERENCES stream_names (strcode);")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE subcs ADD CONSTRAINT strcode FOREIGN KEY (strcode) REFERENCES stream_names (strcode);")
  
  subc_pps <- sf::st_read(mwstr_gpkg, layer = "subc_pps")
  sf::st_crs(subc_pps) <- 7855
  sf::st_write(subc_pps, mwstr,"subc_pps")
  RPostgreSQL::dbSendQuery(mwstr,
                           paste("ALTER TABLE subc_pps",
                                 "ALTER COLUMN site TYPE INTEGER;"))
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE subc_pps ADD PRIMARY KEY (site);")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE subc_pps ADD CONSTRAINT site FOREIGN KEY (site) REFERENCES subcs (site);")
  rm(subc_pps)
  
  mwassets <- RSQLite::dbReadTable(mwstr_gpkg, "mwassets")
  RPostgreSQL::dbWriteTable(mwstr,"mwassets",mwassets,row.names = FALSE)
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE mwassets ADD PRIMARY KEY (mi_prinx);")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE streams ADD CONSTRAINT mi_prinx FOREIGN KEY (mi_prinx) REFERENCES mwassets (mi_prinx);")
  rm(mwassets)
  
  cat_env <- RSQLite::dbReadTable(mwstr_gpkg, "cat_env")
  RPostgreSQL::dbWriteTable(mwstr,"cat_env",cat_env,row.names = FALSE)
  RPostgreSQL::dbSendQuery(mwstr,
                           paste("ALTER TABLE cat_env",
                                 "ALTER COLUMN reach TYPE VARCHAR(10),",
                                 "ALTER COLUMN site TYPE INTEGER;"))
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE cat_env ADD PRIMARY KEY (site);")
  RPostgreSQL::dbSendQuery(mwstr, "CREATE UNIQUE INDEX CONCURRENTLY site_cat_env ON cat_env (site);")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE cat_env ADD CONSTRAINT site FOREIGN KEY (site) REFERENCES subcs (site);")
  rm(cat_env)
  
  subc_env <- RSQLite::dbReadTable(mwstr_gpkg, "subc_env")
  RPostgreSQL::dbWriteTable(mwstr,"subc_env",subc_env,row.names = FALSE)
  RPostgreSQL::dbSendQuery(mwstr,
                           paste("ALTER TABLE subc_env",
                                 "ALTER COLUMN reach TYPE VARCHAR(10),",
                                 "ALTER COLUMN site TYPE INTEGER,",
                                 "ALTER COLUMN mw_subcat TYPE INTEGER,",
                                 "ALTER COLUMN mw_cat TYPE INTEGER,",
                                 "ALTER COLUMN strahler TYPE INTEGER;"))
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE subc_env ADD PRIMARY KEY (site);")
  RPostgreSQL::dbSendQuery(mwstr, "CREATE UNIQUE INDEX CONCURRENTLY site_subc_env ON subc_env (site);")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE subc_env ADD CONSTRAINT site FOREIGN KEY (site) REFERENCES subcs (site);")
  rm(subc_env)
  
  coast <- sf::st_read(mwstr_gpkg, layer = "coast")
  sf::st_crs(coast) <- 7855
  RPostgreSQL::dbWriteTable(mwstr,"coast",coast,row.names = FALSE)
  rm(coast)
  
  # Add estuaries, mw_subcatchments, metadata tables
  estuaries <- sf::st_read(mwstr_gpkg, layer = "estuaries")
  sf::st_crs(estuaries) <- 7855
  sf::st_write(estuaries,mwstr,"estuaries")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE estuaries ADD PRIMARY KEY (strcode);")
  RPostgreSQL::dbSendQuery(mwstr, "CREATE UNIQUE INDEX CONCURRENTLY strcode_estuaries ON estuaries (strcode);")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE estuaries ADD CONSTRAINT strcode FOREIGN KEY (strcode) REFERENCES stream_names (strcode);")
  
  mw_subcatchments <- sf::st_read(mwstr_gpkg, layer = "mw_subcatchments")
  sf::st_crs(mw_subcatchments) <- 7855
  sf::st_write(mw_subcatchments,mwstr,"mw_subcatchments")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE mw_subcatchments ADD PRIMARY KEY (mw_subcat);")
  RPostgreSQL::dbSendQuery(mwstr, "CREATE UNIQUE INDEX CONCURRENTLY mw_subcat_mw_subcat ON mw_subcatchments (mw_subcat);")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE subc_env ADD CONSTRAINT mw_subcat FOREIGN KEY (mw_subcat) REFERENCES mw_subcatchments (mw_subcat);")
  
  mwstr_cats_gpkg <- RSQLite::dbConnect(RSQLite::SQLite(), 
                          dbname = paste0(mwstr_dir,"mwstr_cats_v13.gpkg"))
  cats <- sf::st_read(mwstr_cats_gpkg, "cats")
  sf::st_crs(cats) <- 7855
  sf::st_write(cats,mwstr,"cats")
  RPostgreSQL::dbSendQuery(mwstr,
                         paste("ALTER TABLE cats",
                               "ALTER COLUMN site TYPE INTEGER,",
                               "ALTER COLUMN nextds TYPE INTEGER,",
                               "ALTER COLUMN reach TYPE VARCHAR(10);"))
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE cats ADD PRIMARY KEY (site);")
  RPostgreSQL::dbSendQuery(mwstr, "CREATE UNIQUE INDEX CONCURRENTLY site_cat ON cats (site);")
  RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE cats ADD CONSTRAINT site FOREIGN KEY (site) REFERENCES subcs (site);")

  metadata <- RSQLite::dbReadTable(mwstr_gpkg, "metadata")
  RPostgreSQL::dbWriteTable(mwstr,"metadata",metadata,row.names = FALSE)
  
  DBI::dbDisconnect(mwstr_gpkg); DBI::dbDisconnect(mwstr_cats_gpkg)
  DBI::dbDisconnect(mwstr)

# # # Constraints to remove in order to drop tables in the hierarchy of dependencies
# # # (in case you need to re-write the database)
# RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE subc_env DROP CONSTRAINT site;")
# RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE c_env DROP CONSTRAINT site;")
# RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE subc_pps DROP CONSTRAINT site;")
# RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE cats DROP CONSTRAINT site;")
# RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE subc_allus_ds DROP CONSTRAINT site;")
# RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE streams DROP CONSTRAINT mi_prinx;")
# RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE streams DROP CONSTRAINT strcode;")
# RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE streams DROP CONSTRAINT site;")
# RPostgreSQL::dbSendQuery(mwstr, "ALTER TABLE subcs DROP CONSTRAINT strcode;")
# RPostgreSQL::dbSendQuery(mwstr, "DROP table subc_env")
# RPostgreSQL::dbSendQuery(mwstr, "DROP table c_env")
# RPostgreSQL::dbSendQuery(mwstr, "DROP table mw_subcatchments")
# RPostgreSQL::dbSendQuery(mwstr, "DROP table mwassets")
# RPostgreSQL::dbSendQuery(mwstr, "DROP table streams")
# RPostgreSQL::dbSendQuery(mwstr, "DROP table stream_names")
# RPostgreSQL::dbSendQuery(mwstr, "DROP table subcs")
# RPostgreSQL::dbSendQuery(mwstr, "DROP table cats")
# RPostgreSQL::dbSendQuery(mwstr, "DROP table coast")
# RPostgreSQL::dbSendQuery(mwstr, "DROP table subc_pps")
# RPostgreSQL::dbSendQuery(mwstr, "DROP table estuaries")
# RPostgreSQL::dbSendQuery(mwstr, "DROP table metadata")