Main Page | Namespace List | Class List | File List | Class Members | File Members

MSApriori Class Reference

This class implements the MSAPRIORI algirithm. More...

#include <MSApriori.hpp>

Collaboration diagram for MSApriori:

[legend]
List of all members.

Public Member Functions

 MSApriori (ifstream &basket_file, const char *output_file_name, const bool store_input)
void MSAPRIORI_alg (ifstream &mis_file, const double min_conf, const bool quiet, const unsigned long size_threshold)
 This procedure implements the MSAPRIORI algorithm.

 ~MSApriori ()

Private Member Functions

void support (const itemtype &candidate_size)
 Determines the support of the candidates of the given size.


Private Attributes

MSApriori_Triemsapriori_trie
 A trie that stores the frequent itemset and candidates.

Input_Output_Manager input_output_manager
 The input_output_manager that is responsibel for the input, output and recoding operations.

map< vector< itemtype >, unsigned
long > 
reduced_baskets
 This will store the reduced baskets, if store_input=true;.

bool store_input
 If store_input = true, then the reduced baskets will be stored in memory.


Detailed Description

This class implements the MSAPRIORI algirithm.

MSAPRIORI is a levelwise algorithm. It scans the transaction database several times. After the first scan the frequent 1-itemsets are found, and in general after the kth scan the frequent k-itemsets are extracted. The method does not determine the support of every possible itemset. In an attempt to narrow the domain to be searched, before every pass it generates candidate itemsets. An itemset becomes a candidate if almost every subset of it is frequent. Obviously every frequent itemset needs to be candidate too, hence only the support of candidates is calculated. Frequent k-itemsets generate the candidate k+1-itemsets after the $k^{th}$ scan.

After all the candidate k+1-itemsets have been generated, a new scan of the transactions is effected and the precise support of the candidates is determined. The candidates with low support are thrown away. The algorithm ends when no candidates can be generated.

The intuition behind candidate generation is based on the following simple fact:

All but one subset of a frequent itemset is frequent.

This is immediate, because if a transaction t supports an itemset X, then t supports every subset $Y\subseteq X$. So the support of a subset X is greater than or equal with the support of X. However due to the different support thresholds a subset may not be freuent even if it has greater support. Fortunatelly this only applies to one subset (X'), that can be obtained from X if we delete the item that has the smallest support threshold. Notice, that if the support threshold of X' is the same as the support threshold of X, then this exception needs not to be considered in the candidate generation. Candidate generation of itempairs also differs from the original APRIORI method. Reader is refered to the paper for further information.

Using the fact indirectly, we infer, that if an itemset has a subset (not the one mentioned above) that is infrequent, then it cannot be frequent. So in the algorithm MSAPRIORI only those itemsets will be candidates whose all but one subsets are frequent. The frequent k-itemsets are available when we attempt to generate candidate k+1-itemsets. The algorithm seeks candidate k+1-itemsets among the sets which are unions of two frequent k-itemsets. After forming the union we need to verify that all of its subsets are frequent, otherwise it should not be a candidate. To this end, it is clearly enough to check if all the k-subsets of X are frequent.

Next the supports of the candidates are calculated. This is done by reading transactions one by one. For each transaction t the algorithm decides which candidates are supported by t. To solve this task efficiently MSAPRIORI uses a hash-tree. However in this implementation a trie (prefix-tree) is applied. Tries have many advantages over hash-trees.

  1. It is faster
  2. It needs no parameters (main drawback of a hash-tree is that its performance is very sensitive to the parameteres)
  3. The candidate generation is very simple.

Definition at line 72 of file MSApriori.hpp.


Constructor & Destructor Documentation

MSApriori::MSApriori ifstream &  basket_file,
const char *  output_file_name,
const bool  store_input
 

Parameters:
basket_file The file that contain the transactions.
output_file_name The name of file where the results have to be written to.
store_input If store_input = true, then the reduced baskets will be stored in memory

Definition at line 54 of file MSApriori.cpp.

MSApriori::~MSApriori  ) 
 

Definition at line 127 of file MSApriori.cpp.

References msapriori_trie.


Member Function Documentation

void MSApriori::MSAPRIORI_alg ifstream &  mis_file,
const double  min_conf,
const bool  quiet,
const unsigned long  size_threshold
 

This procedure implements the MSAPRIORI algorithm.

Parameters:
mis_file The file that contains relative support thresholds
min_conf The confidence threshold for association rules. If min_conf=0 no association rules will be extraced.
quiet If quiet = true then no system messages will be written during the process.
size_threshold frequent itemsets above this threshold do not need to be found.

Definition at line 68 of file MSApriori.cpp.

References MSApriori_Trie::association(), MSApriori_Trie::candidate_generation(), MSApriori_Trie::delete_infrequent(), Input_Output_Manager::find_frequent_items(), input_output_manager, MSApriori_Trie::insert_frequent_items(), itemtype, MSApriori_Trie::longest_path(), msapriori_trie, Input_Output_Manager::rewind(), support(), and MSApriori_Trie::write_content_to_file().

Referenced by main().

void MSApriori::support const itemtype candidate_size  )  [private]
 

Determines the support of the candidates of the given size.

Parameters:
candidate_size The size of the candidate whose support has top be determined.

Definition at line 21 of file MSApriori.cpp.

References Input_Output_Manager::basket_recode(), MSApriori_Trie::find_candidate(), input_output_manager, itemtype, msapriori_trie, Input_Output_Manager::read_in_a_line(), reduced_baskets, and store_input.

Referenced by MSAPRIORI_alg().


Member Data Documentation

Input_Output_Manager MSApriori::input_output_manager [private]
 

The input_output_manager that is responsibel for the input, output and recoding operations.

Definition at line 92 of file MSApriori.hpp.

Referenced by MSAPRIORI_alg(), and support().

MSApriori_Trie* MSApriori::msapriori_trie [private]
 

A trie that stores the frequent itemset and candidates.

Definition at line 90 of file MSApriori.hpp.

Referenced by MSAPRIORI_alg(), support(), and ~MSApriori().

map<vector<itemtype>, unsigned long> MSApriori::reduced_baskets [private]
 

This will store the reduced baskets, if store_input=true;.

Definition at line 94 of file MSApriori.hpp.

Referenced by support().

bool MSApriori::store_input [private]
 

If store_input = true, then the reduced baskets will be stored in memory.

Definition at line 96 of file MSApriori.hpp.

Referenced by support().


The documentation for this class was generated from the following files:
Generated on Sun Jun 20 23:41:08 2004 for APRIORI algorithm by doxygen 1.3.5