Skip to content

Census Data

Identify relevant attributes:

  • Capital gain/loss are not relevant because many attributes are zero
  • Age (we expect it to be linear but the majority of >50k is in the range of 40 years)
  • Work class is unbalanced (the majority of people works in private), coverage is very low
  • Education number
  • Marital status (interesting, married is different)
  • Occupation (interesting)

Education and education-num are perfectly correlated (it is duplicated).

We can enhance accuracy by replacing missing values:

Preprocess - filter - unsupervised - attribute - replacemissingvalues

Apriori Algorithm

We apply discretization and perform a manual analysis of the data in order to identify any correlation between pairs of attributes.


Last update: November 8, 2022 10:52:23
Created: October 18, 2022 10:04:20
Authors: Francesca Neri