Back
Featured image of post Fit quadratic-plateau models in R

Fit quadratic-plateau models in R

Agricultural yields go up a curve, then level off.

How does one fit a quadratic-plateau model in R?

Like this:

quad_plateau(data)

Or at least that is how it works once I explain a few things. Like my post on linear-plateau models in R, this post is for the person1 who has been googling how to fit quadratic-plateau models in R. This post provides my creative function that helps one do quadratic-plateau fits and make a ggplot style figure all at once.

So first, get out your library card and borrow some packages.

library(tidyverse) # for data manipulation and plotting and so on
library(minpack.lm)# for the model fitting
library(broom)     # for getting model info in a tidy way
library(rcompanion)# for the r-squared
library(nlstools)  # for confidence intervals
library(agridat)   # for the data

The Data

The data is from some cotton trials in the agridat package. Basically, more potassium measured in the soil should correlate to more cotton yield, but likely the relationship will not be linear. Fitting a model to determine something about this relationship is central to the soil-test correlation process. Let’s plot it to see what we’re working with.

cotton <- tibble(potassium = agridat::cate.potassium$potassium,
                 yield = agridat::cate.potassium$yield)

cotton %>% 
  ggplot(aes(potassium, yield)) +
  geom_point(size = 2, alpha = 0.5) +
  geom_rug(alpha = 0.5, length = unit(2, "pt")) +
  scale_x_continuous(breaks = seq(0, 1000, 20)) +
  scale_y_continuous(breaks = seq(0, 200, 5)) +
  labs(title = paste(
    "Relative yield and soil potassium for", nrow(cotton), "cotton trials"),
    y = "Relative yield (%)",
    x = "Concentration of Soil Potassium (ppm)")

The relationship between soil-test potassium (STK) and relative cotton yield may be nonlinear2. Perhaps a polynomial function could be fit, or the data could be transformed, but we’ll fit a non-linear model known as the quadratic-plateau. Quadratic-plateau (QP) models are also called quad-plats3. Quad-plats are good in that they have a curved component (important to biological systems) followed by a join point and a zero-slope plateau (convenient for researchers). Notice how the relative yield of the cotton does not appear to increase anymore from about 60-200 ppm K (the plateau!)

The Function(s)

The equation for the QP is essentially that of a quadratic up until the join point: y = B0 + B1x + Bx^2. While we could define the QP equation in the nls() function later, lets make a function. Actually, let’s make two functions4.

# b0 = intercept
# b1 = slope
# b2 = quadratic term
# jp = join point = critical concentration

qp <- function(x, b0, b1, b2) {
    jp <- -0.5 * b1 / b2
    if_else(
        condition = x < jp,
        true  = b0 + (b1 * x) + (b2 * x * x),
        false = b0 + (b1 * jp) + (b2 * jp * jp)
    )
}

qp_jp <- function(x, b0, b1, jp) {
    b2 <- -0.5 * b1 / jp
    if_else(
        condition = x < jp,
        true  = b0 + (b1 * x) + (b2 * x * x),
        false = b0 + (b1 * jp) + (b2 * jp * jp)
    )
}

These functions say that up until some join-point, the relationship is a second-order polynomial (quadratic), after which it hits a flat line (plateau). This model is sometimes also called quadratic-plus-plateau. That join-point is important; in the context of soil fertility it represents a critical concentration for the nutrient in the soil. The join point of a quad-plat will always be higher than with a linear-plateau, an important consideration for another time and place.

The qp and qp_jp functions will be used inside the nlsLM() functions inside my quad_plateau() function soon. So look for them there.

Next is my big quad_plateau() function that fits the QP model and either gets useful numbers into a table or plots it visually. I’m not a function wizard, but I’m trying, so forgive any poor coding practices. That’s why in my function, I’ve just put in the variables yield and potassium. Here are the steps of the function:

  1. Get initial values for the nlsLM() function coming up.

    • this can be done reliably by fitting a quadratic model, and yanking out coefficients.
  2. Run nlsLM()5 with the qp() and qp_jp() functions as part of the formula.

  3. Make a null model, and get a pseudo-R2 value.

  4. Get a confidence interval for the join-point (critical soil test value).

  5. Get coefficients from model for use in table and plot.

  6. Create a table of results.

    OR

  7. Plot the cotton yield data with a regression line using nlsLM and the model final values as starting values.

quad_plateau <- function(data, confidence = 95, plot = FALSE) {
    # initial values
    start <- lm(yield ~ poly(potassium, 2, raw = TRUE),
                      data = data)
    start_b0 <- start$coef[[1]]
    start_b1 <- start$coef[[2]]
    start_b2 <- start$coef[[3]]
    start_jp <- mean(data$potassium)
    
    ### Quad plateau model ###
    # nlsLM tends to converge/work better on average than nls, IMO
    try(corr_model <<- minpack.lm::nlsLM(
        formula = yield ~ qp(potassium, b0, b1, b2),
        data = data,
        start = list(b0 = start_b0,
                     b1 = start_b1,
                     b2 = start_b2)
    ))
    
    try(corr_model_jp <<- minpack.lm::nlsLM(
        formula = yield ~ qp_jp(potassium, b0, b1, jp),
        data = data,
        start = list(b0 = start_b0,
                     b1 = start_b1,
                     jp = start_jp)
    ))
    
    # Define null model
    nullfunct <- function(x, m) {
        m
    }
    
    m_ini <- mean(data$yield)
    
    null <- nls(yield ~ nullfunct(potassium, m),
                data = data,
                start = list(m = m_ini),
                trace = FALSE)
    
    # Find p-value and pseudo R-squared
    model_error <- round(summary(corr_model)$sigma, 2)
    pseudo_r2 <- nagelkerke(corr_model, null)
    label_pseudo_r2 <- round(pseudo_r2$Pseudo.R.squared.for.model.vs.null[3], 2)
    
    # print and set confidence interval (use JP model)
    ci <- nlstools::confint2(corr_model_jp, level = (confidence / 100))
    ci_lower <- round(ci[3, 1], 0)
    ci_upper <- round(ci[3, 2], 0)
    
    # get model coefficients
    b0 <- tidy(corr_model)$estimate[1]
    b1 <- tidy(corr_model)$estimate[2]
    b2 <- tidy(corr_model)$estimate[3] # quadratic term of equation
    # critical concentration value = join point of segmented regression
    cc <- tidy(corr_model_jp)$estimate[3]
    
    plateau <- round(b0 + (b1 * cc) + (b2 * cc * cc), 1)
    
    equation <- paste0(round(b0, 1), " + ",
                       round(b1, 2), "x + ",
                       round(b2, 3), "xx")
    
    se_b0 <- round(tidy(corr_model)$std.error[1], 2)
    se_b1 <- round(tidy(corr_model)$std.error[2], 2)
    se_b2 <- round(tidy(corr_model)$std.error[3], 3)
    
    pval_b0 <- tidy(corr_model)$p.value[1]
    pval_b1 <- tidy(corr_model)$p.value[2]
    pval_b2 <- tidy(corr_model)$p.value[3]
    
    pval_b0 <- case_when(
        tidy(corr_model)$p.value[1] > 0.05 ~ "NS",
        tidy(corr_model)$p.value[1] <= 0.001 ~ "***",
        tidy(corr_model)$p.value[1] <= 0.01 ~ "**",
        tidy(corr_model)$p.value[1] <= 0.05 ~ "*")
    
    pval_b1 <- case_when(
        tidy(corr_model)$p.value[2] > 0.05 ~ "NS",
        tidy(corr_model)$p.value[2] <= 0.001 ~ "***",
        tidy(corr_model)$p.value[2] <= 0.01 ~ "**",
        tidy(corr_model)$p.value[2] <= 0.05 ~ "*")
    
    pval_b2 <- case_when(
        tidy(corr_model)$p.value[3] > 0.05 ~ "NS",
        tidy(corr_model)$p.value[3] <= 0.001 ~ "***",
        tidy(corr_model)$p.value[3] <= 0.01 ~ "**",
        tidy(corr_model)$p.value[3] <= 0.05 ~ "*")
    
    # Printouts
    if (plot == FALSE) {
        tibble(
            equation,
            plateau,
            critical = round(cc, 0),
            ci_lower,
            ci_upper,
            r_square = label_pseudo_r2,
            model_error,
            AIC = round(AIC(corr_model),0),
            BIC = round(BIC(corr_model),0),
            se_b0,
            se_b1,
            se_b2,
            pval_b0,
            pval_b1,
            pval_b2
        )
    } else {
        # Residual plots and normality
        #nls_resid <- nlsResiduals(corr_model)
        #plot(nls_resid, which = 0)
        
      # ggplot of data
      data %>%
        ggplot(aes(x = potassium, y = yield)) +
        geom_point(size = 2, alpha = 0.5) +
        geom_rug(alpha = 0.5, length = unit(2, "pt")) +
        geom_vline(xintercept = round(cc, 0),
                   linetype = 3,
                   alpha = 0.5) +
        geom_line(
          stat = "smooth",
          method = "nlsLM",
          formula = y ~ qp(x, b0, b1, b2),
          method.args = list(start = as.list(coef(corr_model))),
          se = FALSE,
          alpha = 0.5
        ) +
        annotate(
          "text",
          label = paste(
            equation,
            "\nR^2 =",
            label_pseudo_r2,
            "\nPlateau =",
            round(plateau, 1),
            "% RY",
            "\nCritical conc. =",
            round(cc, 0),
            "ppm",
            "\nConf Int [",
            ci_lower,
            ",",
            ci_upper,
            "]"
          ),
          x = max(data$potassium) * 0.7,
          y = min(data$yield) + 5,
          hjust = 0,
          family = "Roboto Condensed"
        ) +
        scale_x_continuous(breaks = seq(0, 1000, 20)) +
        scale_y_continuous(breaks = seq(0, 200, 5))
    }
    
}

The result

So how do you fit a quadratic-plateau model in R? Like this:

quad_plateau(cotton)
## # A tibble: 1 x 15
##   equation   plateau critical ci_lower ci_upper r_square model_error   AIC   BIC
##   <chr>        <dbl>    <dbl>    <dbl>    <dbl>    <dbl>       <dbl> <dbl> <dbl>
## 1 12.9 + 1.~    95.3       86       45      127     0.68        10.8   187   192
## # ... with 6 more variables: se_b0 <dbl>, se_b1 <dbl>, se_b2 <dbl>,
## #   pval_b0 <chr>, pval_b1 <chr>, pval_b2 <chr>

The model is written out in a data frame, ready for other use and manipulation. This is really helpful if for example you have a grouped data frame of different fertilizer trials, and want to perform a QP fit to each group using something like group_modify().

If you want to plot it along with an overlay of model results, use the plot = TRUE argument and add your labels. Use group_map() if you want to “loop” this over multiple groups in your data frame.

quad_plateau(cotton, plot = TRUE) +
  labs(title = paste(
    "Relative yield and soil potassium for", nrow(cotton), "cotton trials"),
    y = "Relative yield (%)",
    x = "Soil Test Potassium (ppm)")

The critical soil concentration for potassium appears to be around 86 ppm, higher than the linear-plateau model. The confidence interval for the join-point is also wider than the “lp” model. In fact, the confidence interval is just wide. This is one of the problems with getting the confidence limits from the join point, rather than at a lower point such as 95% of the maximum (plateau), but this is to be discussed at another time and place.

The model summary shows that the quadratic term is not actually statistically significant. It may be then that a linear-plateau model would be better for this dataset. The pseudo-R2 is about 0.67, so pretty good for noisy crop yield data. Still, the distribution of points has gaps in it, which limits the confidence of the correlation.

Now, what if the two points with “high” STK (> 180 ppm), were removed? How would that change the analysis? Go back and test yourself6. How do other models compare? Linear-plateau? Mitscherlich-Bray?

Other resources about nonlinear functions in R

  • rcompanion package - gave me a good jumpstart and has some useful functions inside

  • nlstools package

  • nlraa package by Fernando Miguez

  • easynls package - it’s quick and dirty, but not as useful as it could be

  • nls.multstart package - tries lots of starting values to try and better get at the best values for the equation. But I think for QP models, my approach works fine and using multiple starting values is unnecessary.


  1. Probably a grad student↩︎

  2. No one will judge you if you hold a clear, plastic ruler up to your computer screen.↩︎

  3. This isn’t true, but I wish it was.↩︎

  4. I’m humble enough to admit that, as of now, I can’t figure out how to write one function that gets me 4 model parameters instead of three AND doesn’t yell at me for creating a singularity in space or something. So I’m using two functions, one to get the first three (intercept, slope, and quadratic term) and then the second to get the join point.↩︎

  5. nlsLM() seems more reliable overall than plain nls() from base R. I’ve had decent luck with nls(), but sometimes it fails. nlsLM() seems to fail less, probably due to using a different algorithm. You can test their similarity in reaching the same final results by using the output of nlsLM() as starting values in nls().↩︎

  6. Hint: filter()↩︎

Built with Hugo
Theme Stack designed by Jimmy