The R Script associated with this page is available here. Download this file and open it (or copy-paste into a new script) with RStudio so you can follow along.

Summary

  • Access and work with station weather data from Global Historical Climate Network (GHCN)
  • Explore options for plotting timeseries
  • Trend analysis
  • Compute Climate Extremes

Climate Metrics

Climate Metrics: ClimdEX

Indices representing extreme aspects of climate derived from daily data:

alt text

Climate Change Research Centre (CCRC) at University of New South Wales (UNSW) (climdex.org).

27 Core indices

For example:

  • FD Number of frost days: Annual count of days when TN (daily minimum temperature) < 0C.
  • SU Number of summer days: Annual count of days when TX (daily maximum temperature) > 25C.
  • ID Number of icing days: Annual count of days when TX (daily maximum temperature) < 0C.
  • TR Number of tropical nights: Annual count of days when TN (daily minimum temperature) > 20C.
  • GSL Growing season length: Annual (1st Jan to 31st Dec in Northern Hemisphere (NH), 1st July to 30th June in Southern Hemisphere (SH)) count between first span of at least 6 days with daily mean temperature TG>5C and first span after July 1st (Jan 1st in SH) of 6 days with TG<5C.
  • TXx Monthly maximum value of daily maximum temperature
  • TN10p Percentage of days when TN < 10th percentile
  • Rx5day Monthly maximum consecutive 5-day precipitation
  • SDII Simple pricipitation intensity index

Weather Data

Climate Data Online

CDO

CDO

GHCN

ghcn

ghcn

Options for downloading data

FedData package

  • National Elevation Dataset digital elevation models (1 and 1/3 arc-second; USGS)
  • National Hydrography Dataset (USGS)
  • Soil Survey Geographic (SSURGO) database
  • International Tree Ring Data Bank.
  • Global Historical Climatology Network (GHCN)

rNOAA package

Handles downloading data directly from NOAA APIv2.

  • buoy_* NOAA Buoy data from the National Buoy Data Center
  • ghcnd_* GHCND daily data from NOAA
  • isd_* ISD/ISH data from NOAA
  • homr_* Historical Observing Metadata Repository
  • ncdc_* NOAA National Climatic Data Center (NCDC)
  • seaice Sea ice
  • storm_ Storms (IBTrACS)
  • swdi Severe Weather Data Inventory (SWDI)
  • tornadoes From the NOAA Storm Prediction Center

Libraries

library(raster)
library(sp)
library(rgdal)
library(ggplot2)
library(ggmap)
library(dplyr)
library(tidyr)
library(maps)
library(scales)
# New Packages
library(rnoaa)
library(climdex.pcic)
library(zoo)
library(reshape2)
library(broom)

Station locations

Download the GHCN station inventory with ghcnd_stations().

datadir="data"

st = ghcnd_stations()

## Optionally, save it to disk
# write.csv(st,file.path(datadir,"st.csv"))
## If internet fails, load the file from disk using:
# st=read.csv(file.path(datadir,"st.csv"))

GHCND Variables

5 core values:

  • PRCP Precipitation (tenths of mm)
  • SNOW Snowfall (mm)
  • SNWD Snow depth (mm)
  • TMAX Maximum temperature
  • TMIN Minimum temperature

And ~50 others! For example:

  • ACMC Average cloudiness midnight to midnight from 30-second ceilometer
  • AWND Average daily wind speed
  • FMTM Time of fastest mile or fastest 1-minute wind
  • MDSF Multiday snowfall total

filter() to temperature and precipitation

st=dplyr::filter(st,element%in%c("TMAX","TMIN","PRCP"))

Map GHCND stations

First, get a global country polygon

worldmap=map_data("world")

Plot all stations:

ggplot(data=st,aes(y=latitude,x=longitude)) +
  facet_grid(element~.)+
  annotation_map(map=worldmap,size=.1,fill="grey",colour="black")+
  geom_point(size=.1,col="red")+
  coord_equal()

It’s hard to see all the points, let’s bin them…

ggplot(st,aes(y=latitude,x=longitude)) +
  annotation_map(map=worldmap,size=.1,fill="grey",colour="black")+
  facet_grid(element~.)+
  stat_bin2d(bins=100)+
  scale_fill_distiller(palette="YlOrRd",trans="log",direction=-1,
                       breaks = c(1,10,100,1000))+
  coord_equal()

Your turn

Produce a binned map (like above) with the following modifications:

  • include only stations with a data record that starts before 1950 and ends after 2000 (keeping only complete records during that time).
  • include only tmax
ggplot(filter(st,
              first_year<=1950 & 
              last_year>=2000 & 
              element=="TMAX"),
       aes(y=latitude,x=longitude)) +
  annotation_map(map=worldmap,size=.1,fill="grey",colour="black")+
  stat_bin2d(bins=75)+
  scale_fill_distiller(palette="YlOrRd",trans="log",direction=-1,
    breaks = c(1,10,50))+
  coord_equal()

Download daily data from GHCN

ghcnd() will download a .dly file for a particular station. But how to choose?

geocode in ggmap package useful for geocoding place names

Geocodes a location (find latitude and longitude) using either (1) the Data Science Toolkit (http://www.datasciencetoolkit.org/about) or (2) Google Maps.

geocode("University at Buffalo, NY")
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=University%20at%20Buffalo%2C%20NY
##         lon      lat
## 1 -78.78897 43.00081

However, you have to be careful:

geocode("My Grandma's house")
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=My%20Grandma%27s%20house
##         lon      lat
## 1 -104.9874 39.68546

But this is pretty safe for well known places.

coords=as.matrix(geocode("Buffalo, NY"))
## Source : https://maps.googleapis.com/maps/api/geocode/json?address=Buffalo%2C%20NY
coords
##            lon      lat
## [1,] -78.87837 42.88645

Now use that location to spatially filter stations with a rectangular box.

dplyr::filter(st,
              grepl("BUFFALO",name)&
              between(latitude,coords[2]-1,coords[2]+1) &
              between(longitude,coords[1]-1,coords[1]+1)&
         element=="TMAX")
## # A tibble: 3 x 11
##            id latitude longitude elevation state       name gsn_flag
##         <chr>    <dbl>     <dbl>     <dbl> <chr>      <chr>    <chr>
## 1 USC00301010  42.8833  -78.8833    -999.9    NY    BUFFALO         
## 2 USC00301018  42.9333  -78.9000     177.1    NY BUFFALO #2         
## 3 USW00014733  42.9486  -78.7369     211.2    NY    BUFFALO         
## # ... with 4 more variables: wmo_id <chr>, element <chr>,
## #   first_year <int>, last_year <int>

You could also spatially filter using over() in sp package…

With the station ID, we can now download daily data from NOAA.

d=meteo_tidy_ghcnd("USW00014733",
                   var = c("TMAX","TMIN","PRCP"),
                   keep_flags=T)
head(d)
## # A tibble: 6 x 14
##            id       date mflag_prcp mflag_tmax mflag_tmin  prcp qflag_prcp
##         <chr>     <date>      <chr>      <chr>      <chr> <dbl>      <chr>
## 1 USW00014733 1938-05-01          T                           0           
## 2 USW00014733 1938-05-02          T                           0           
## 3 USW00014733 1938-05-03                                     25           
## 4 USW00014733 1938-05-04                                    112           
## 5 USW00014733 1938-05-05          T                           0           
## 6 USW00014733 1938-05-06                                     64           
## # ... with 7 more variables: qflag_tmax <chr>, qflag_tmin <chr>,
## #   sflag_prcp <chr>, sflag_tmax <chr>, sflag_tmin <chr>, tmax <dbl>,
## #   tmin <dbl>

See CDO Daily Description and raw GHCND metadata for more details. If you want to download multiple stations at once, check out meteo_pull_monitors()

Quality Control: MFLAG

Measurement Flag/Attribute

  • Blank no measurement information applicable
  • B precipitation total formed from two twelve-hour totals
  • H represents highest or lowest hourly temperature (TMAX or TMIN) or average of hourly values (TAVG)
  • K converted from knots

See CDO Description

Quality Control: QFLAG

  • Blank did not fail any quality assurance check
  • D failed duplicate check
  • G failed gap check
  • K failed streak/frequent-value check
  • N failed naught check
  • O failed climatological outlier check
  • S failed spatial consistency check
  • T failed temporal consistency check
  • W temperature too warm for snow

See CDO Description

Quality Control: SFLAG

Indicates the source of the data…

Summarize QC flags

Summarize the QC flags. How many of which type are there? Should we be more conservative?

table(d$qflag_tmax)  
## 
##           G     I 
## 29027     2    10
table(d$qflag_tmin)  
## 
##           G     I     S 
## 29026     2     7     4
table(d$qflag_prcp)  
## 
##           G 
## 29038     1
  • T failed temporal consistency check

Filter with QC data and change units

d_filtered=d%>%
  mutate(tmax=ifelse(qflag_tmax!=" "|tmax==-9999,NA,tmax/10))%>%  # convert to degrees C
  mutate(tmin=ifelse(qflag_tmin!=" "|tmin==-9999,NA,tmin/10))%>%  # convert to degrees C
  mutate(prcp=ifelse(qflag_tmin!=" "|prcp==-9999,NA,prcp))%>%  # convert to degrees C
  arrange(date)

Plot temperatures

ggplot(d_filtered,
       aes(y=tmax,x=date))+
  geom_line(col="red")
## Warning: Removed 11 rows containing missing values (geom_path).

Limit to a few years and plot the daily range and average temperatures.

d_filtered_recent=filter(d_filtered,date>as.Date("2013-01-01"))

  ggplot(d_filtered_recent,
         aes(ymax=tmax,ymin=tmin,x=date))+
    geom_ribbon(col="grey",fill="grey")+
    geom_line(aes(y=(tmax+tmin)/2),col="red")
## Warning: Removed 11 rows containing missing values (geom_path).

Zoo package for rolling functions

Infrastructure for Regular and Irregular Time Series (Z’s Ordered Observations)

  • rollmean(): Rolling mean
  • rollsum(): Rolling sum
  • rollapply(): Custom functions

Use rollmean to calculate a rolling 60-day average.

  • align whether the index of the result should be left- or right-aligned or centered
d_rollmean = d_filtered_recent %>% 
  arrange(date) %>%
  mutate(tmax.60 = rollmean(x = tmax, 60, align = "center", fill = NA),
         tmax.b60 = rollmean(x = tmax, 60, align = "right", fill = NA))
d_rollmean%>%
  ggplot(aes(ymax=tmax,ymin=tmin,x=date))+
    geom_ribbon(fill="grey")+
    geom_line(aes(y=(tmin+tmax)/2),col=grey(0.4),size=.5)+
    geom_line(aes(y=tmax.60),col="red")+
    geom_line(aes(y=tmax.b60),col="darkred")
## Warning: Removed 11 rows containing missing values (geom_path).
## Warning: Removed 70 rows containing missing values (geom_path).

## Warning: Removed 70 rows containing missing values (geom_path).

Your Turn

Plot a 30-day rolling “right” aligned sum of precipitation.

tp=d_filtered_recent %>%
  arrange(date)  %>% 
  mutate(prcp.30 = rollsum(x = prcp, 30, align = "right", fill = NA))

ggplot(tp,aes(y=prcp,x=date))+
  geom_line(aes(y=prcp.30),col="black")+ 
  geom_line(col="red") 
## Warning: Removed 40 rows containing missing values (geom_path).
## Warning: Removed 11 rows containing missing values (geom_path).

Time Series analysis

Most timeseries functions use the time series class (ts)

tmin.ts=ts(d_filtered_recent$tmin,frequency = 365)

Temporal autocorrelation

Values are highly correlated!

ggplot(d_filtered_recent,aes(y=tmin,x=lag(tmin)))+
  geom_point()+
  geom_abline(intercept=0, slope=1)
## Warning: Removed 12 rows containing missing values (geom_point).

Autocorrelation functions

  • autocorrelation x vs. xt − 1 (lag=1)
  • partial autocorrelation. x vs. xn after controlling for correlations t − 1 : n

Autocorrelation

acf(tmin.ts,lag.max = 365*3,na.action = na.exclude )

Partial Autocorrelation

pacf(tmin.ts,lag.max = 365*3,na.action = na.exclude )

Climate Metrics

Climdex indices

ClimDex

Format data for climdex

library(PCICt)
## Parse the dates into PCICt.
pc.dates <- as.PCICt(as.POSIXct(d_filtered$date),cal="gregorian")

Generate the climdex object

  library(climdex.pcic)
    ci <- climdexInput.raw(
      tmax=d_filtered$tmax,
      tmin=d_filtered$tmin,
      prec=d_filtered$prcp,
      pc.dates,pc.dates,pc.dates, 
      base.range=c(1971, 2000))
years=as.numeric(as.character(unique(ci@date.factors$annual)))

Cumulative dry days

cdd= climdex.cdd(ci, spells.can.span.years = TRUE)
plot(cdd~years,type="l")

Diurnal Temperature Range

dtr=climdex.dtr(ci, freq = c("annual"))
plot(dtr~years,type="l")

Frost Days

fd=climdex.fd(ci)
plot(fd~years,type="l")

Your Turn

See all available indices with:

climdex.get.available.indices(ci)
##  [1] "climdex.su"      "climdex.id"      "climdex.txx"    
##  [4] "climdex.txn"     "climdex.tx10p"   "climdex.tx90p"  
##  [7] "climdex.wsdi"    "climdex.fd"      "climdex.tr"     
## [10] "climdex.tnx"     "climdex.tnn"     "climdex.tn10p"  
## [13] "climdex.tn90p"   "climdex.csdi"    "climdex.rx1day" 
## [16] "climdex.rx5day"  "climdex.sdii"    "climdex.r10mm"  
## [19] "climdex.r20mm"   "climdex.rnnmm"   "climdex.cdd"    
## [22] "climdex.cwd"     "climdex.r95ptot" "climdex.r99ptot"
## [25] "climdex.prcptot" "climdex.gsl"     "climdex.dtr"

Select 3 indices, calculate them, and plot the timeseries.

r10mm=climdex.r10mm(ci)
plot(r10mm~years,type="l")

prcptot=climdex.prcptot(ci)
plot(prcptot~years,type="l")

gsl=climdex.gsl(ci)
plot(gsl~years,type="l")