Main Page | Namespace List | Class List | File List | Class Members | File Members

An efficient implemenation of MSAPRIORI algorithm

This program is a very efficient implementation of MSAPRIORI algorithm proposed by Bing Liu, Wynne Hsu and Yiming Ma. MSAPRIORI is the most basic and well-known algorithm to find frequent itemsets with multiple minimum supports in a transactional database.

Frequent Itemset Mining problem

A transactional database consists of sequence of transaction: $T=\langle t_1,\ldots ,t_n\rangle $. A transaction is a set of items ($t_i\in I$). Transactions are often called baskets, referring to the primary application domain (i.e. market-basket analysis). A set of items is often called itemset by the data mining community. The (absolute) support or the occurrence of $X$ (denoted by $supp(X)$) is the number of transactions that are supersets of $X$ (i.e. that contain $X$). The realtive support is the absolute support divided by the number of transactions (i.e. n). An itemset is frequent if its support is greater or equal than its threshold value (mis(X)). If $X=\{i_1,\ldots, i_k\}$, then $mis(X)=min\{mis(i_j)\}$, where the mis values of the single items are predefined.

In the frequent itemset mining problem a transaction database and the mis values of the items (traditionally denoted by mis(i)) is given and we have to find all frequent itemsets.

Association Rule Mining problem

This program is also capable of mining association rules. An association rule is like an implication: $X\to Y $ means that if itemset X occurs in a transaction, than itemset Y also occurs with high probability. This probability is given by the confidence of the rule. It is like an approxiamtion of p(Y|X), it is the number of transactions that contain both X and Y divided by the number of transaction that contain X, thus $conf(X\to Y)=\frac{supp(X\cup Y)}{supp(X)}$. The relative support of the association rule $X\to Y $ is the support of itemset $X \cup Y $. The lift of $X\to Y $ tries to capture the independence of the antecedent and the consequent of the rule: $lift(X\to Y)=\frac{supp(X\cup Y)}{supp(X)supp(Y)}$ An association rule is valid if its confidence, support and lift are greater than or equal than corresponding threshold values.

In the association rule mining problem a transaction database and the mis values of the items (traditionally denoted by mis(i)), a confidence threshold (traditionally denoted by min_conf), and a lift threshold is given and we have to find all valid association rules.


Generated on Sun Jun 20 23:41:08 2004 for APRIORI algorithm by doxygen 1.3.5