Main Page | Namespace List | Class Hierarchy | Class List | Directories | File List | Namespace Members | Class Members | File Members

Apriori< C, D, MAIN_DATA_STRUCTURE, SEC_DATA_STRUCTURE, FII, FPI, CG, IR, SUPP_C > Class Template Reference

This class implements the APRIORI algorithm. More...

#include <Apriori.hpp>

Collaboration diagram for Apriori< C, D, MAIN_DATA_STRUCTURE, SEC_DATA_STRUCTURE, FII, FPI, CG, IR, SUPP_C >:

Collaboration graph
[legend]
List of all members.

Public Member Functions

 Apriori (MAIN_DATA_STRUCTURE &data_structure, SEC_DATA_STRUCTURE &s_alloc, IR &infrequent_remover, C &coder, D &decoder, FII &fii)
void findFrequentItemsets (const counter_t emptyset_support, std::vector< counter_t > &freq_counters, std::vector< std::pair< counter_t, std::pair< item_t, item_t > > > &freq_pairs_with_counters, const counter_t min_supp, unsigned int maxsize=largest_itemsetsize)
 This method finds the frequent itemsets.

Private Attributes

MAIN_DATA_STRUCTURE & data_structure
 the data strucure that stores candidate.
SEC_DATA_STRUCTURE & s_alloc
 a secondary data strucure; in general it is an allocator
IR & infrequent_remover
 the infrequent remover
C & coder
 The coder.
D & decoder
 The decoder.
FII & fii
unsigned int candidate_size

Detailed Description

template<class C, class D, class MAIN_DATA_STRUCTURE, class SEC_DATA_STRUCTURE, class FII, class FPI, class CG, class IR, class SUPP_C>
class Apriori< C, D, MAIN_DATA_STRUCTURE, SEC_DATA_STRUCTURE, FII, FPI, CG, IR, SUPP_C >

This class implements the APRIORI algorithm.

APRIORI is a levelwise algorithm. It scans the transaction database several times. After the first scan the frequent 1-itemsets are found, and in general after the kth scan the frequent k-itemsets are extracted. The method does not determine the support of every possible itemset. In an attempt to narrow the domain to be searched, before every pass it generates candidate itemsets. An itemset becomes a candidate if every subset of it is frequent. Obviously every frequent itemset needs to be candidate too, hence only the support of candidates is calculated. Frequent k-itemsets generate the candidate k+1-itemsets after the kth scan.

After all the candidate (k+1)-itemsets have been generated, a new scan of the transactions is effected and the precise support of the candidates is determined. The candidates with low support are thrown away. The algorithm ends when no candidates can be generated.

The intuition behind candidate generation is based on the following simple fact:

Every subset of a frequent itemset is frequent.

This is immediate, because if a transaction t supports an itemset X, then t supports every subset $Y\subseteq X$ .

Using the fact indirectly, we infer, that if an itemset has a subset that is infrequent, then it cannot be frequent. So in the algorithm APRIORI only those itemsets will be candidates whose every subset is frequent. The frequent k-itemsets are available when we attempt to generate candidate (k+1)-itemsets. The algorithm seeks candidate k+1-itemsets among the sets which are unions of two frequent k-itemsets. After forming the union we need to verify that all of its subsets are frequent, otherwise it should not be a candidate. To this end, it is clearly enough to check if all the k-subsets of X are frequent.

Next the supports of the candidates are calculated. This is done by reading transactions one by one. For each transaction t the algorithm decides which candidates are supported by t. To solve this task efficiently it is useful to store the candidates in a special datastructure like trie or hash-tree. The template parameters are the following:

Parameters:
C type of the coder, it has to implement the same function as Coder does.
D type of the decoder, it has to implement interface DecoderBase.
MAIN_DATA_STRUCTURE the main data structure. It is basically a trie (prefix-trie)
SEC_DATA_STRUCTURE the secondary data structure
FII the type of a frequent item inserter (it inserts frequent items into the data structure)
FPI the type of a frequent item inserter (it inserts frequent pairs into the data structure)
CG the type of candidate generator
IR the type of the infrequent remover
SUPP_C the type of the support counter

Definition at line 84 of file Apriori.hpp.


Constructor & Destructor Documentation

template<class C, class D, class MAIN_DATA_STRUCTURE, class SEC_DATA_STRUCTURE, class FII, class FPI, class CG, class IR, class SUPP_C>
Apriori< C, D, MAIN_DATA_STRUCTURE, SEC_DATA_STRUCTURE, FII, FPI, CG, IR, SUPP_C >::Apriori MAIN_DATA_STRUCTURE &  data_structure,
SEC_DATA_STRUCTURE &  s_alloc,
IR &  infrequent_remover,
C &  coder,
D &  decoder,
FII &  fii
[inline]
 

Definition at line 104 of file Apriori.hpp.


Member Function Documentation

template<class C, class D, class MAIN_DATA_STRUCTURE, class SEC_DATA_STRUCTURE, class FII, class FPI, class CG, class IR, class SUPP_C>
void Apriori< C, D, MAIN_DATA_STRUCTURE, SEC_DATA_STRUCTURE, FII, FPI, CG, IR, SUPP_C >::findFrequentItemsets const counter_t  emptyset_support,
std::vector< counter_t > &  freq_counters,
std::vector< std::pair< counter_t, std::pair< item_t, item_t > > > &  freq_pairs_with_counters,
const counter_t  min_supp,
unsigned int  maxsize = largest_itemsetsize
 

This method finds the frequent itemsets.

Parameters:
emptyset_support the support of the emptyset (i.e. number of transaction)
freq_counters a vector that stores the support of the frequent items. freq_counters[i] belongs to item i.
freq_pairs_with_counters a vector that stores the support of the frequent pairs. The elements of the vector are pairs, where the first elemen stores the support value and the second element stores the pair itself.
min_supp the support threshold
maxsize the maximal size of the frequent itemset. If this parameter is set to 0 then there is no limit for the maximal size.

Definition at line 138 of file Apriori.hpp.


Member Data Documentation

template<class C, class D, class MAIN_DATA_STRUCTURE, class SEC_DATA_STRUCTURE, class FII, class FPI, class CG, class IR, class SUPP_C>
unsigned int Apriori< C, D, MAIN_DATA_STRUCTURE, SEC_DATA_STRUCTURE, FII, FPI, CG, IR, SUPP_C >::candidate_size [private]
 

Definition at line 100 of file Apriori.hpp.

template<class C, class D, class MAIN_DATA_STRUCTURE, class SEC_DATA_STRUCTURE, class FII, class FPI, class CG, class IR, class SUPP_C>
C& Apriori< C, D, MAIN_DATA_STRUCTURE, SEC_DATA_STRUCTURE, FII, FPI, CG, IR, SUPP_C >::coder [private]
 

The coder.

Definition at line 94 of file Apriori.hpp.

template<class C, class D, class MAIN_DATA_STRUCTURE, class SEC_DATA_STRUCTURE, class FII, class FPI, class CG, class IR, class SUPP_C>
MAIN_DATA_STRUCTURE& Apriori< C, D, MAIN_DATA_STRUCTURE, SEC_DATA_STRUCTURE, FII, FPI, CG, IR, SUPP_C >::data_structure [private]
 

the data strucure that stores candidate.

Definition at line 88 of file Apriori.hpp.

template<class C, class D, class MAIN_DATA_STRUCTURE, class SEC_DATA_STRUCTURE, class FII, class FPI, class CG, class IR, class SUPP_C>
D& Apriori< C, D, MAIN_DATA_STRUCTURE, SEC_DATA_STRUCTURE, FII, FPI, CG, IR, SUPP_C >::decoder [private]
 

The decoder.

Definition at line 96 of file Apriori.hpp.

template<class C, class D, class MAIN_DATA_STRUCTURE, class SEC_DATA_STRUCTURE, class FII, class FPI, class CG, class IR, class SUPP_C>
FII& Apriori< C, D, MAIN_DATA_STRUCTURE, SEC_DATA_STRUCTURE, FII, FPI, CG, IR, SUPP_C >::fii [private]
 

Definition at line 98 of file Apriori.hpp.

template<class C, class D, class MAIN_DATA_STRUCTURE, class SEC_DATA_STRUCTURE, class FII, class FPI, class CG, class IR, class SUPP_C>
IR& Apriori< C, D, MAIN_DATA_STRUCTURE, SEC_DATA_STRUCTURE, FII, FPI, CG, IR, SUPP_C >::infrequent_remover [private]
 

the infrequent remover

Definition at line 92 of file Apriori.hpp.

template<class C, class D, class MAIN_DATA_STRUCTURE, class SEC_DATA_STRUCTURE, class FII, class FPI, class CG, class IR, class SUPP_C>
SEC_DATA_STRUCTURE& Apriori< C, D, MAIN_DATA_STRUCTURE, SEC_DATA_STRUCTURE, FII, FPI, CG, IR, SUPP_C >::s_alloc [private]
 

a secondary data strucure; in general it is an allocator

Definition at line 90 of file Apriori.hpp.


The documentation for this class was generated from the following file:
Generated on Sun Sep 17 17:55:07 2006 for FIM environment by  doxygen 1.4.4