This post covers some caveats of calculating the mean in R:
1. If a vector contains an NA value, mean(vector) returns NA unless na.rm = TRUE is set.
This is unlike many data management tools and languages, which skip the missing value and would return 1.5 for values such as 1, 2 and NA.
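A minimal sketch of this behaviour, assuming an illustrative vector `x` holding 1, 2 and NA (the values are chosen here to match the 1.5 mentioned above):

```r
# Illustrative vector: two known values and one missing value
x <- c(1, 2, NA)

# The missing value propagates, so the result is NA rather than 1.5
mean(x)         # NA
is.na(mean(x))  # TRUE
```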
2. Concatenating numeric(0) into a vector adds no elements, so it does not influence the mean() function in R.
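As a small sketch (the vector below is made up for illustration), combining values with numeric(0) gives exactly the same vector as leaving it out:

```r
# numeric(0) has length zero, so c() simply drops it
x <- c(1, 2, numeric(0))

length(x)              # 2
identical(x, c(1, 2))  # TRUE
mean(x)                # 1.5
```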
3. Therefore, to take the mean of a vector that contains NA, we can remove the NA elements before calling mean(), or simply pass the option na.rm = TRUE to mean() so that NA is dropped from the calculation. (Writing TRUE in full is safer than T, because T is an ordinary variable that can be reassigned.)
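The following sketch shows three common ways to do this. They are not necessarily the exact commands used in the original post, and the vector `x` is again illustrative:

```r
x <- c(1, 2, NA)

# Option 1: keep only the non-missing elements by logical subsetting
mean(x[!is.na(x)])     # 1.5

# Option 2: let na.omit() drop the missing elements
mean(na.omit(x))       # 1.5

# Option 3: ask mean() itself to ignore missing values
mean(x, na.rm = TRUE)  # 1.5
```

Option 3 is usually the shortest, while options 1 and 2 are handy when the cleaned vector is reused in several calculations.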
4. In addition, we can inspect the properties of numeric(0) in R. numeric(0) appears when one filters a data frame or vector and no data satisfies the given condition.
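A short sketch of how numeric(0) arises from filtering, using a made-up vector and data frame (the names `x`, `df`, `id` and `value` are hypothetical):

```r
x <- c(1, 2, 3)

# No element is greater than 10, so the filter returns an empty numeric vector
empty <- x[x > 10]
empty          # numeric(0)
length(empty)  # 0
class(empty)   # "numeric"

# The same happens when filtering a data frame column
df <- data.frame(id = 1:3, value = c(10, 20, 30))
df$value[df$id > 5]  # numeric(0)
```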
Arithmetic with numeric(0) returns numeric(0), just as arithmetic with NA returns NA.
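A brief sketch of this propagation. The last two lines go slightly beyond the text above and show that summary functions applied to numeric(0) alone behave differently from element-wise arithmetic:

```r
# Element-wise arithmetic keeps the "emptiness" or the "missingness"
numeric(0) + 1  # numeric(0)
NA + 1          # NA

# Summaries of an empty vector on its own
sum(numeric(0))   # 0
mean(numeric(0))  # NaN
```

So an empty filter result is harmless when combined with other values, but taking the mean of the empty result by itself gives NaN and may need a separate check with length().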
In summary, before using functions such as mean() to calculate statistics, we need to make sure our data contains no NA values (or pass na.rm = TRUE). numeric(0) values, by contrast, are allowed when combined with other data and are often needed.
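Putting the points together, here is a minimal sketch with hypothetical data in which one filter returns numeric(0) and the data also contains an NA:

```r
scores <- c(4, NA, 8)  # hypothetical data with one missing value

# Nothing exceeds 100, so this filter yields numeric(0)
high <- scores[!is.na(scores) & scores > 100]

# Combining with numeric(0) changes nothing; only the NA needs handling
combined <- c(scores, high)
mean(combined)                # NA
mean(combined, na.rm = TRUE)  # 6
```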