
This small utility creates a data frame of a specified length, split into a set number of groups, each with its own mean and standard deviation. The total number of rows in the data frame is n * n_grps.

Usage

generate_df(n = 10L, n_grps = 1L, mean = c(10), sd = mean/10, with_seed = NULL)

Arguments

n

An integer indicating the number of rows per group; defaults to 10

n_grps

An integer indicating the number of groups; defaults to 1

mean

A numeric vector giving the mean of the randomly generated values for each group; its length must equal n_grps. Defaults to 10

sd

A numeric vector giving the standard deviation of the randomly generated values for each group; its length must equal n_grps. Defaults to mean/10

with_seed

An optional seed to make the randomization reproducible; defaults to NULL

Value

A tibble/data frame

Function ID

2-19

Examples

library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
generate_df(
  100L,
  n_grps = 5,
  mean = seq(10, 50, length.out = 5)
) %>%
  group_by(grp) %>%
  summarise(
    mean = mean(values), # group mean is approximately the requested mean
    sd = sd(values), # group sd is approximately the requested sd
    n = n(), # each grp is of length n
    # showing that the sd default of mean/10 works
    `mean/sd` = round(mean / sd, 1)
  )
#> # A tibble: 5 × 5
#>   grp    mean    sd     n `mean/sd`
#>   <chr> <dbl> <dbl> <int>     <dbl>
#> 1 grp-1  9.93 0.970   100      10.2
#> 2 grp-2 20.2  1.97    100      10.2
#> 3 grp-3 29.9  2.97    100      10.1
#> 4 grp-4 39.9  3.67    100      10.9
#> 5 grp-5 49.9  5.00    100      10
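
To make the argument semantics concrete, here is a minimal base-R sketch of how such a generator could work. It is not the package's implementation. The name sketch_generate_df is hypothetical, and the sketch assumes that values are drawn from a normal distribution with rnorm() and that only the grp and values columns are produced. It is meant only to illustrate how n, n_grps, mean, sd and with_seed relate to one another.

```r
# Hypothetical stand-in for illustration only; not the package's source code.
# Assumption: values are drawn with rnorm(), one draw of size n per group.
sketch_generate_df <- function(n = 10L, n_grps = 1L, mean = c(10),
                               sd = mean / 10, with_seed = NULL) {
  # mean and sd are expected to carry one entry per group
  stopifnot(length(mean) == n_grps, length(sd) == n_grps)

  # Seed only when one is supplied, so unseeded calls stay random
  if (!is.null(with_seed)) set.seed(with_seed)

  # Draw n values for each group, then stack the groups row-wise
  per_group <- lapply(seq_len(n_grps), function(i) {
    data.frame(
      grp = paste0("grp-", i),
      values = rnorm(n, mean = mean[i], sd = sd[i])
    )
  })
  do.call(rbind, per_group)
}

grp_means <- seq(10, 50, length.out = 5)
a <- sketch_generate_df(100L, n_grps = 5, mean = grp_means, with_seed = 37)
b <- sketch_generate_df(100L, n_grps = 5, mean = grp_means, with_seed = 37)

# The total number of rows is n * n_grps
stopifnot(nrow(a) == 100 * 5)
# Passing the same seed reproduces the same draws
stopifnot(identical(a, b))
```

In this sketch, sd defaults to mean / 10 through R's lazy evaluation of default arguments, so supplying only mean gives each group a standard deviation of one tenth of its mean, which matches the `mean/sd` ratio of roughly 10 in the example output above.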