Main Page | Namespace List | Class List | File List | Class Members | File Members

MSApriori_Trie Class Reference

Apriori_Trie (or prefix-tree) is a tree-based datastructure. More...

#include <MSApriori_Trie.hpp>

Collaboration diagram for MSApriori_Trie:

[legend]
List of all members.

Public Member Functions

 MSApriori_Trie (const unsigned long counter_of_emptyset, const vector< double > &mis_abs)
void insert_frequent_items (const set< pair< itemtype, unsigned long > > &counters)
 Insert the frequent items and their counters into the trie;.

void candidate_generation (const itemtype &frequent_size)
 Generates candidates.

void find_candidate (const vector< itemtype > &basket, const itemtype candidate_size, const unsigned long counter=1)
 Increases the counter of those candidates that are contained by the given basket.

void delete_infrequent (const itemtype candidate_size)
 Deletes unfrequent itemsets.

void association (const double min_conf, Input_Output_Manager &input_output_manager) const
 Generates association rules.

itemtype longest_path () const
 Returns the length of the longest path in the Apriori_Trie.

void write_content_to_file (Input_Output_Manager &input_output_manager) const
 Writes the content (frequent itemsets) to the file.

void show_content_preorder () const
 This procedure shows the Apriori_Trie in a preorde.

 ~MSApriori_Trie ()

Protected Member Functions

bool is_all_subset_frequent (const set< itemtype > &maybe_candidate) const
 Decides if all subset of an itemset is contained in the Apriori_Trie.

void candidate_generation_two ()
 Generates candidate of size two.

void candidate_generation_assist (Trie *Trie, const itemtype distance_from_generator, set< itemtype > &maybe_candidate)
 Generates candidate of size more than two.

void find_candidate_two (const vector< itemtype > &basket, const unsigned long counter=1)
 Increases the counter for those itempairs that are in the given basket.

void delete_infrequent_two ()
 Deletes the Tries that represent infrequent itemsets of size 2.

void assoc_rule_find (const double min_conf, set< itemtype > &condition_part, set< itemtype > &consequence_part, const unsigned long union_support, Input_Output_Manager &input_output_manager) const
void assoc_rule_assist (const double min_conf, const Trie *Trie, set< itemtype > &consequence_part, Input_Output_Manager &input_output_manager) const
void write_content_to_file_assist (Input_Output_Manager &input_output_manager, const Trie *actual_state, const itemtype distance_from_frequent, set< itemtype > &frequent_itemset) const
 Writes out the content of the Apriori_Trie (frequent itemset and counters).


Protected Attributes

Trie main_trie
 Trie to store the candidates and the frequent itemsets.

vector< vector< unsigned long > > temp_counter_array
 temp_counter_array stores the occurences of the itempairs

vector< double > mis_abs
 The mis values.


Detailed Description

Apriori_Trie (or prefix-tree) is a tree-based datastructure.

Apriori_Trie is a special tree designed to provide efficient methods for the apriori algorithm. It mostly uses a regular trie except when there exist faster solution. For example for storing one and two itemset candidate where a simple vector and array gives better performance. Apriori_Trie extends the functions provided by the regular trie with a candidate generation process.

Definition at line 32 of file MSApriori_Trie.hpp.


Constructor & Destructor Documentation

MSApriori_Trie::MSApriori_Trie const unsigned long  counter_of_emptyset,
const vector< double > &  mis_abs
 

Parameters:
counter_of_emptyset The support of the empty set, i.e. the number of transactions.
mis_abs The mis values.

Definition at line 20 of file MSApriori_Trie.cpp.

MSApriori_Trie::~MSApriori_Trie  ) 
 

Definition at line 114 of file MSApriori_Trie.cpp.


Member Function Documentation

void MSApriori_Trie::assoc_rule_assist const double  min_conf,
const Trie Trie,
set< itemtype > &  consequence_part,
Input_Output_Manager input_output_manager
const [protected]
 

Definition at line 264 of file MSApriori_Trie.cpp.

References assoc_rule_find(), Trie::counter, and Trie::edgevector.

Referenced by association().

void MSApriori_Trie::assoc_rule_find const double  min_conf,
set< itemtype > &  condition_part,
set< itemtype > &  consequence_part,
const unsigned long  union_support,
Input_Output_Manager input_output_manager
const [protected]
 

Definition at line 240 of file MSApriori_Trie.cpp.

References Trie::counter, Trie::is_included(), itemtype, main_trie, and Input_Output_Manager::write_out_basket().

Referenced by assoc_rule_assist().

void MSApriori_Trie::association const double  min_conf,
Input_Output_Manager input_output_manager
const
 

Generates association rules.

Parameters:
min_conf Confidence threshold.
input_output_manager This will write out itemsets.

Definition at line 84 of file MSApriori_Trie.cpp.

References assoc_rule_assist(), and main_trie.

Referenced by MSApriori::MSAPRIORI_alg().

void MSApriori_Trie::candidate_generation const itemtype frequent_size  ) 
 

Generates candidates.

Parameters:
frequent_size Size of the frequent itemsets that generate the candidates.

Definition at line 40 of file MSApriori_Trie.cpp.

References candidate_generation_assist(), candidate_generation_two(), itemtype, main_trie, and Trie::maxpath.

Referenced by MSApriori::MSAPRIORI_alg().

void MSApriori_Trie::candidate_generation_assist Trie Trie,
const itemtype  distance_from_generator,
set< itemtype > &  maybe_candidate
[protected]
 

Generates candidate of size more than two.

Definition at line 181 of file MSApriori_Trie.cpp.

References Trie::add_empty_state(), Trie::edgevector, is_all_subset_frequent(), itemtype, Trie::max_path_set(), and Trie::maxpath.

Referenced by candidate_generation().

void MSApriori_Trie::candidate_generation_two  )  [protected]
 

Generates candidate of size two.

Definition at line 165 of file MSApriori_Trie.cpp.

References main_trie, Trie::maxpath, mis_abs, and temp_counter_array.

Referenced by candidate_generation().

void MSApriori_Trie::delete_infrequent const itemtype  candidate_size  ) 
 

Deletes unfrequent itemsets.

Parameters:
candidate_size The size of the candidate itemset.

Definition at line 68 of file MSApriori_Trie.cpp.

References delete_infrequent_two(), Trie::edgevector, itemtype, main_trie, and mis_abs.

Referenced by MSApriori::MSAPRIORI_alg().

void MSApriori_Trie::delete_infrequent_two  )  [protected]
 

Deletes the Tries that represent infrequent itemsets of size 2.

temp_counter_array[stateIndex_1] will never be used again!

temp_counter_array will never be used again!

Definition at line 218 of file MSApriori_Trie.cpp.

References Edge_point_less(), Trie::edgevector, main_trie, Trie::max_path_set(), mis_abs, and temp_counter_array.

Referenced by delete_infrequent().

void MSApriori_Trie::find_candidate const vector< itemtype > &  basket,
const itemtype  candidate_size,
const unsigned long  counter_incr = 1
 

Increases the counter of those candidates that are contained by the given basket.

Parameters:
basket The basket that hs to be analyzed.
candidate_size the size of the candidates.
counter_incr The number of time the basket occured. The counters of candidates that occure in the basket has to be incremented by counter_incr.

Definition at line 57 of file MSApriori_Trie.cpp.

References Trie::find_candidate(), find_candidate_two(), itemtype, and main_trie.

Referenced by MSApriori::support().

void MSApriori_Trie::find_candidate_two const vector< itemtype > &  basket,
const unsigned long  counter = 1
[protected]
 

Increases the counter for those itempairs that are in the given basket.

Parameters:
basket the given basket
counter The number the processed basket occures in the transactional database

Definition at line 152 of file MSApriori_Trie.cpp.

References temp_counter_array.

Referenced by find_candidate().

void MSApriori_Trie::insert_frequent_items const set< pair< itemtype, unsigned long > > &  counters  ) 
 

Insert the frequent items and their counters into the trie;.

Parameters:
counters It stores the support of the frequent items. counters[i] stores the suport of item i.

Definition at line 30 of file MSApriori_Trie.cpp.

References Trie::add_empty_state(), Trie::edgevector, main_trie, and Trie::maxpath.

Referenced by MSApriori::MSAPRIORI_alg().

bool MSApriori_Trie::is_all_subset_frequent const set< itemtype > &  maybe_candidate  )  const [protected]
 

Decides if all subset of an itemset is contained in the Apriori_Trie.

Parameters:
maybe_candidate The itemset that has to be checked.

Definition at line 122 of file MSApriori_Trie.cpp.

References Trie::is_included(), main_trie, and mis_abs.

Referenced by candidate_generation_assist().

itemtype MSApriori_Trie::longest_path  )  const
 

Returns the length of the longest path in the Apriori_Trie.

Definition at line 91 of file MSApriori_Trie.cpp.

References itemtype, main_trie, and Trie::maxpath.

Referenced by MSApriori::MSAPRIORI_alg().

void MSApriori_Trie::show_content_preorder  )  const
 

This procedure shows the Apriori_Trie in a preorde.

Definition at line 108 of file MSApriori_Trie.cpp.

References main_trie, and Trie::show_content_preorder().

void MSApriori_Trie::write_content_to_file Input_Output_Manager input_output_manager  )  const
 

Writes the content (frequent itemsets) to the file.

Definition at line 96 of file MSApriori_Trie.cpp.

References Trie::counter, itemtype, main_trie, Trie::maxpath, and write_content_to_file_assist().

Referenced by MSApriori::MSAPRIORI_alg().

void MSApriori_Trie::write_content_to_file_assist Input_Output_Manager input_output_manager,
const Trie actual_state,
const itemtype  distance_from_frequent,
set< itemtype > &  frequent_itemset
const [protected]
 

Writes out the content of the Apriori_Trie (frequent itemset and counters).

Definition at line 278 of file MSApriori_Trie.cpp.

References Trie::counter, Trie::edgevector, itemtype, and Input_Output_Manager::write_out_basket_and_counter().

Referenced by write_content_to_file().


Member Data Documentation

Trie MSApriori_Trie::main_trie [protected]
 

Trie to store the candidates and the frequent itemsets.

Definition at line 98 of file MSApriori_Trie.hpp.

Referenced by assoc_rule_find(), association(), candidate_generation(), candidate_generation_two(), delete_infrequent(), delete_infrequent_two(), find_candidate(), insert_frequent_items(), is_all_subset_frequent(), longest_path(), show_content_preorder(), and write_content_to_file().

vector<double> MSApriori_Trie::mis_abs [protected]
 

The mis values.

mis_abs[i] strores the mis value of item i.

mis_abs vector is sorted, because eacs item gets a new code so that the item withe the smallest mis value has the smallest code. This is for the sake of efficiency, and candidate generation simplicity.

Definition at line 113 of file MSApriori_Trie.hpp.

Referenced by candidate_generation_two(), delete_infrequent(), delete_infrequent_two(), and is_all_subset_frequent().

vector< vector<unsigned long> > MSApriori_Trie::temp_counter_array [protected]
 

temp_counter_array stores the occurences of the itempairs

We can use a simple array to determine the support of itemset of size two. This requires less memory than the trie-based supportcount. temp_counter_array[i][j-i] stores the occurence of the itempair (i,j).

Definition at line 106 of file MSApriori_Trie.hpp.

Referenced by candidate_generation_two(), delete_infrequent_two(), and find_candidate_two().


The documentation for this class was generated from the following files:
Generated on Sun Jun 20 23:41:08 2004 for APRIORI algorithm by doxygen 1.3.5