Flag all values in a vector that occur more than once.

Useful for reviewing datasets in which values are expected to be unique. It is similar to duplicated(), but marks every occurrence of a repeated value as TRUE, rather than only the second and later occurrences.

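Treats NaN, Inf, and NA like any other value; as in duplicated(), NaN and NA are distinct from each other. Note that c() drops NULL, so repeated(c(NULL, NULL, NULL, NaN, Inf, NA, NA)) receives only c(NaN, Inf, NA, NA) and returns c(FALSE, FALSE, TRUE, TRUE).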
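The NULL, NaN, and NA behavior above follows from base R's rules for c(), duplicated(), and %in%. A quick base-R check, assuming repeated() is built on the x %in% x[duplicated(x)] idiom described under Details:

```r
# c() silently drops NULL, so only four elements remain
x <- c(NULL, NULL, NULL, NaN, Inf, NA, NA)
length(x)                   # 4

# duplicated() treats NaN and NA as different values,
# so only the second NA is flagged
duplicated(x)               # FALSE FALSE FALSE TRUE

# Flag every member of a repeated value, not just the later ones
x %in% x[duplicated(x)]     # FALSE FALSE TRUE TRUE
```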

Usage

repeated(x, incomparables = FALSE, ...)

Arguments

x: a vector
incomparables: a vector of values that cannot be compared. FALSE is a special value, meaning that all values can be compared, and may be the only value accepted for methods other than the default; values are coerced internally to be the same type as x.
...: passed to duplicated()
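A minimal base-R illustration of what incomparables does inside duplicated(), assuming repeated() passes it straight through to duplicated() within the x %in% x[duplicated(x)] idiom (an assumption; only `...` is documented as being passed on):

```r
x <- c(NA, NA, 1, 1)

# By default NA is compared like any other value
duplicated(x)                        # FALSE  TRUE FALSE  TRUE

# With incomparables = NA, NA values are never marked as duplicates
duplicated(x, incomparables = NA)    # FALSE FALSE FALSE  TRUE

# Under the assumption above, only the 1s would then be flagged as repeated
x %in% x[duplicated(x, incomparables = NA)]   # FALSE FALSE  TRUE  TRUE
```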

Value

a logical vector the same length as the input

Details

A convenience function for use with dplyr::filter() that makes it easy to identify and review all repeated observations in a data frame. It is equivalent to x %in% x[duplicated(x)].
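A minimal sketch of that equivalence, assuming repeated() simply forwards incomparables and `...` to duplicated(); this is an illustration, not the package's actual source. The hypothetical repeated_alt() shows a common base-R alternative: a value is repeated if it is a duplicate when scanning from either the front or the back.

```r
# Hypothetical sketch of repeated(), assuming it is the
# x %in% x[duplicated(x)] idiom with extra arguments forwarded
repeated <- function(x, incomparables = FALSE, ...) {
  x %in% x[duplicated(x, incomparables = incomparables, ...)]
}

# Hypothetical alternative: duplicated from the front OR from the back
repeated_alt <- function(x) {
  duplicated(x) | duplicated(x, fromLast = TRUE)
}

v <- c(1, 2, 3, 4, 5, 5, 5)
repeated(v)                                # FALSE FALSE FALSE FALSE TRUE TRUE TRUE
identical(repeated(v), repeated_alt(v))    # TRUE
```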

Filtering a data frame by the repeated values in column 'x', as in dplyr::filter(.data, repeated(x)), performs the same task as grouping by that variable and keeping groups with more than one row, as in dplyr::group_by(.data, x) |> dplyr::filter(dplyr::n() > 1), but is shorter to write, easier to remember, and does not leave the result grouped.
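The group-and-count approach can also be sketched without dplyr. The example below swaps in base R's ave() to count how often each value occurs, then checks that "count greater than 1" agrees with the x %in% x[duplicated(x)] idiom. It deliberately uses a vector without missing values, because ave() groups via split(), which drops NA groups.

```r
x <- c(3, 1, 3, 2, 1, 4)

# For each element, the number of elements sharing its value
counts <- ave(seq_along(x), x, FUN = length)
counts                            # 2 2 2 1 2 1

by_count <- counts > 1            # group-and-count approach
by_idiom <- x %in% x[duplicated(x)]
identical(by_count, by_idiom)     # TRUE
```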

(Ideally repeated() would be written in C++ and included in dplyr.)

Examples

repeated(c(1, 2, 3, 4, 5, 5, 5))
#> [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
repeated(c('A','A','B','C',5,5), incomparables = c('A'))
#> [1] FALSE FALSE FALSE FALSE  TRUE  TRUE
c(NA, NA, '', NULL, NULL, NULL) |> repeated()
#> [1]  TRUE  TRUE FALSE

# Find all cars where 'qsec' is the same as another car's:
mtcars |> dplyr::filter(repeated(qsec))
#>                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> Fiat X1-9         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1

# For clarity, compare repeated() output to duplicated() output:
mtcars |> dplyr::filter(repeated(wt))
#>                    mpg cyl  disp  hp drat   wt  qsec vs am gear carb
#> Hornet Sportabout 18.7   8 360.0 175 3.15 3.44 17.02  0  0    3    2
#> Duster 360        14.3   8 360.0 245 3.21 3.57 15.84  0  0    3    4
#> Merc 280          19.2   6 167.6 123 3.92 3.44 18.30  1  0    4    4
#> Merc 280C         17.8   6 167.6 123 3.92 3.44 18.90  1  0    4    4
#> Maserati Bora     15.0   8 301.0 335 3.54 3.57 14.60  0  1    5    8
mtcars |> dplyr::filter(duplicated(wt))
#>                mpg cyl  disp  hp drat   wt qsec vs am gear carb
#> Merc 280      19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
#> Merc 280C     17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
#> Maserati Bora 15.0   8 301.0 335 3.54 3.57 14.6  0  1    5    8

# To filter on several variables with dplyr::filter(), separate conditions
#  with `,` for 'and', and combine them with `|` for 'or', as in:
mtcars |> dplyr::filter(repeated(qsec), repeated(wt))
#>                    mpg cyl  disp  hp drat   wt  qsec vs am gear carb
#> Hornet Sportabout 18.7   8 360.0 175 3.15 3.44 17.02  0  0    3    2
#> Merc 280C         17.8   6 167.6 123 3.92 3.44 18.90  1  0    4    4
mtcars |> dplyr::filter(repeated(qsec) | repeated(wt))
#>                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> Fiat X1-9         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> Maserati Bora     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
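Without dplyr, the same 'and' and 'or' filters can be sketched in base R by indexing mtcars with logical vectors; rep_flag() is a hypothetical helper standing in for repeated(). Note that the 'and' filter tests each column separately: Hornet Sportabout and Merc 280C each share qsec with one car and wt with another, yet no two cars share the same (qsec, wt) pair, which duplicated() on a data frame confirms.

```r
# Hypothetical helper using the idiom from Details
rep_flag <- function(x) x %in% x[duplicated(x)]

# 'and': qsec is shared with some car AND wt is shared with some car
both <- mtcars[rep_flag(mtcars$qsec) & rep_flag(mtcars$wt), ]
rownames(both)     # "Hornet Sportabout" "Merc 280C"

# 'or': qsec OR wt is shared with some car
either <- mtcars[rep_flag(mtcars$qsec) | rep_flag(mtcars$wt), ]
nrow(either)       # 7

# To flag rows whose (qsec, wt) combination repeats, use duplicated()
# on the data frame itself, scanning from both directions
pair <- mtcars[c("qsec", "wt")]
pair_rep <- duplicated(pair) | duplicated(pair, fromLast = TRUE)
sum(pair_rep)      # 0
```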