Apriori implementation for mining frequent sequence of items

Attention!!!
This Apriori-based implementation is a part of the FIM template library. If I change the code, then I change the version found in the FIM template Library. Therefore, the version that can be downloaded from this page, is not updated any more and this page is no longer maintained. Please click here to go to the webpage of the FIM template library.

Apriori algorithm was originally proposed by Agrawal in "Fast Algorithms for Mining Association Rules" in 1994 to find frequent itemsets (known as the problem of FIM) and association rules in a transaction database. A natural generalization of FIM is finding frequent sequence of items in a transactional database where both the transactions and the patterns are allowed to contain duplicates. To solve this problem the trie-based Apriori algorithm can easily be extended.

Here you can download a fast, trie-based, command-line implementation of the Apriori algorithm for Linux platform. The code is written in standard C++ in object-oriented manner. It uses STL possibilities, if that does not reduce running speed.

The code can be freely used for research purposes.

If you use this program in a research paper then a citation to the following paper is welcome.

Ferenc Bodon, Trie-based Apriori Implementation for Mining Frequent Itemsequences, SIGKDD Workshop on Open Source Data Mining Workshop (OSDM'05), in Bart Goethals and Siegfried Nijssen and Mohammed J. Zaki editors, Chicago, IL, USA. 2005

Download

Binaries:

to Linux (ver. 1.0.1, i386, gcc 3.4.5): fsm (102 KB)
to Windows (ver. 1.0.1, compiled by cygwin gcc 3.4.4): fsm.exe
Note! To run fsm.exe you have to have cygwin1.dll in your path. The simplest solution is that you can copy the cygwin1.dll that I have used to the folder where fsm.exe is located. Nevertheless, I propose you to download the latest dll from the website of cygwin. (102 KB)

source:

Latest version (1.0.5)

fim_env.tar.bz2 (142KB): Source code and documentation (date of issue: 25. april 2006.)

To compile in Unix systems (obviously after uncompressing the downloaded file) do the following:

cd src; make dep; make ../fsm; cd ..

this will result in an fsm program. To generate the documentation type

cd src; make doc; cd ..

Documentation

On-line documentation
p56-bodon.pdf: Some theoretical remarks on the program.

Usage

To run fsm 4 parameters are mandatory:

The type of candidate generation, i.e apriori, apriori-intersectprune, apriori-noprune. Most of the times apriori is the best choice.
The file that contains the transactions in market-basket format,
support threshold.
the output file the frequent sequences are sent to,

For example you can type in a linux terminal:

./fsm apriori kosarak2_10_2.dat 2 out.txt

or under Windows command prompt:

.\fsm apriori kosarak2_10_2.dat 2 out.txt

Useful links

Other well-known implementations for mining frequent sequences of items/itemsets in a transactional database

prefixspan made by Taku Kudo
SPAM
SPADE made by Mohammed J. Zaki