Useful for reviewing datasets in which values should be unique.
It is similar to duplicated(), but marks all repeated values as
TRUE rather than just the second observation and thereafter.
Treats NaN, Inf, and NA like any other value. Because c() drops NULL, repeated(c(NULL, NULL, NULL, NaN, Inf, NA, NA))
sees only four elements and returns c(FALSE, FALSE, TRUE, TRUE).
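The contrast with duplicated() on special values can be checked in base R. The sketch below does not call repeated() itself; it assumes the function behaves like the expression x %in% x[duplicated(x)] given under Details, and the variable names are illustrative only.

```r
# Base R only. `x`, `dup`, `rep_all` and `n_kept` are illustrative names.
x <- c(NaN, Inf, NA, NA)

# duplicated() flags only the second and later occurrences.
dup <- duplicated(x)                 # FALSE FALSE FALSE  TRUE

# The expression given under Details flags every occurrence.
rep_all <- x %in% x[duplicated(x)]   # FALSE FALSE  TRUE  TRUE

# c() drops NULL, so the three NULLs never reach the function.
n_kept <- length(c(NULL, NULL, NULL, NaN, Inf, NA, NA))   # 4

print(dup)
print(rep_all)
print(n_kept)
```

NA and NaN are treated as distinct values here, so the single NaN is not marked as repeated.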
Arguments
- x
a vector
- incomparables
a vector of values that cannot be compared. FALSE is a special value, meaning that all values can be compared, and may be the only value accepted for methods other than the default; values are coerced internally to be the same type as x.
- ...
passed to duplicated()
Details
A convenience function to use with dplyr::filter() to make it easy
to identify and review all repeated observations in a data frame. The function
is shorthand for x %in% x[duplicated(x)].
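A minimal sketch of how such a function could be written, assuming it simply wraps the expression above and forwards incomparables and ... to duplicated(). The name repeated_sketch is hypothetical, and the package's actual source may differ.

```r
# Hypothetical stand-in for repeated(); not the package's own source.
repeated_sketch <- function(x, incomparables = FALSE, ...) {
  # Values that duplicated() flags at least once are the repeated values.
  dup_values <- x[duplicated(x, incomparables = incomparables, ...)]
  # Mark every element equal to one of those values, first occurrence included.
  x %in% dup_values
}

res_numeric <- repeated_sketch(c(1, 2, 3, 4, 5, 5, 5))
res_incomp  <- repeated_sketch(c("A", "A", "B", "C", 5, 5), incomparables = "A")

print(res_numeric)
print(res_incomp)
```

Under this assumption, values listed in incomparables are never flagged by duplicated(), so they never enter dup_values and are reported as FALSE even when they occur more than once.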
Filtering a dataframe by the repeated values in column 'x', as in
dplyr::filter(.data, repeated(x)), performs the same task as grouping by a
variable and selecting counts greater than 1, as in
dplyr::group_by(.data, x) |> dplyr::filter(dplyr::n() > 1),
but is quicker and easier to remember.
(Ideally repeated() would be written in C++ and included in dplyr.)
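The equivalence with grouping and counting described above can be checked without dplyr. The sketch below is a base R analogue, not the dplyr pipeline itself: it uses ave() to compute the size of each value's group, and the variable names are illustrative.

```r
# Base R analogue of the two approaches, using the built-in mtcars data.
x <- mtcars$wt

# Approach 1: the expression behind repeated().
by_duplicated <- x %in% x[duplicated(x)]

# Approach 2: size of each value's group, then keep groups larger than 1.
group_size <- ave(seq_along(x), x, FUN = length)
by_count <- group_size > 1

# Both approaches select the same rows.
print(all(by_duplicated == by_count))
print(rownames(mtcars)[by_duplicated])
```

The first approach needs no grouping step, which is what makes the one-line filter(repeated(x)) form convenient.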
Examples
repeated(c(1, 2, 3, 4, 5, 5, 5))
#> [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE
repeated(c('A','A','B','C',5,5), incomparables = c('A'))
#> [1] FALSE FALSE FALSE FALSE TRUE TRUE
c(NA, NA, '', NULL, NULL, NULL) |> repeated()
#> [1] TRUE TRUE FALSE
# Find all cars where 'qsec' is the same as another car's:
mtcars |> dplyr::filter(repeated(qsec))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# For clarity, compare repeated() output to duplicated() output:
mtcars |> dplyr::filter(repeated(wt))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.02 0 0 3 2
#> Duster 360 14.3 8 360.0 245 3.21 3.57 15.84 0 0 3 4
#> Merc 280 19.2 6 167.6 123 3.92 3.44 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.44 18.90 1 0 4 4
#> Maserati Bora 15.0 8 301.0 335 3.54 3.57 14.60 0 1 5 8
mtcars |> dplyr::filter(duplicated(wt))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Merc 280 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.44 18.9 1 0 4 4
#> Maserati Bora 15.0 8 301.0 335 3.54 3.57 14.6 0 1 5 8
# To filter on several variables with dplyr::filter(), separate the
# conditions with `,` for 'and' and with `|` for 'or', as in:
mtcars |> dplyr::filter(repeated(qsec), repeated(wt))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.02 0 0 3 2
#> Merc 280C 17.8 6 167.6 123 3.92 3.44 18.90 1 0 4 4
mtcars |> dplyr::filter(repeated(qsec) | repeated(wt))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8