# Glossary

**Bernoulli distribution**- a named distribution for random variables with binary outcomes; $1$ usually denotes the level of interest
**categorical variable**- a variable in a dataset that takes on values for which arithmetic is not meaningful, e.g. labels or names
**dataframe**- a two-dimensional data structure in the programming language R in which each row represents an observation and each column represents a variable
**discrete random variable**- a random variable that only takes on a countable set of values
**independent and identically distributed**- a description of data that suggests the data were randomly sampled: no two data points intentionally share anything in common (independent), except that they come from the same population (identically distributed)
**individual/observation**- a noun in the population of interest, not necessarily a person
**interpolate**- estimate a value that falls within the range of the observed data
**level**- one of the values that a categorical variable can take on
**maximum likelihood estimator**- a best guess of a parameter, namely the parameter value under which the observed data are most likely
**observation/individual**- a noun in the population of interest, not necessarily a person
**parameter**- a characteristic of a population, abstracted to the non-data arguments of probability density functions
**percentile**- the value in the support of the random variable that puts $p$% of the area under the probability density function to the left of it
**population**- the broader group of nouns of interest
**probability density function**- a function indexed by parameter(s) of interest, the shape of which theoretically describes the process of interest
**proportion**- AKA a mean, when applied to numerically encoded ($0$/$1$) binary categorical data; unfortunately often thought of only as $\text{successes} / \text{trials}$
**random variable**- a function from an event to a numerical value, e.g. $X(\{\text{Caniformia}\}) = 1$
**sample**- a subset of the population, ideally randomly collected
**statistic**- any function of data
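
The entries for Bernoulli distribution, level, proportion, and statistic can be tied together in a short sketch. It is written in Python purely for illustration (the glossary itself refers to R), and the sample data, the second level `"Feliformia"`, and the helper name `encode` are hypothetical and not taken from the text. It shows that, once the level of interest is encoded as $1$ and the other level as $0$, the proportion is just the mean of the encoded data, and that it agrees with the successes-over-trials calculation.

```python
from statistics import mean

# Hypothetical sample of a binary categorical variable with two levels.
# "Caniformia" is treated as the level of interest, so it is encoded as 1.
sample = ["Caniformia", "Feliformia", "Caniformia", "Caniformia", "Feliformia"]


def encode(value, level_of_interest="Caniformia"):
    """Map a level to 1 if it is the level of interest, otherwise 0."""
    return 1 if value == level_of_interest else 0


# Numerically encoded binary data (Bernoulli-style 0/1 outcomes).
encoded = [encode(v) for v in sample]

# A statistic is any function of the data; here the statistic is the mean.
proportion = mean(encoded)

# The same quantity computed as successes / trials.
successes_over_trials = sum(encoded) / len(encoded)

print(encoded)
print(proportion, successes_over_trials)
```

For Bernoulli data this sample mean is also the maximum likelihood estimator of the probability of the level of interest, so the same number serves as both a statistic and a best guess of the parameter.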