Introduction & purpose

Association Rules is a powerful technique for exploratory analysis of any dataset with categorical values. It finds categories that often occur together. It has been applied to market basket analysis, i.e. finding what customers usually buy together. In a previous post, it was applied to clickstream data to discover which web pages are usually visited together.

As powerful as Association Rules is as a technique, its results are usually a bit difficult to communicate (at least, that’s the experience of the writer of this post, who has used the technique quite a few times in their professional life). However, at the Spanish R users event in 2019, the netCoin package was presented in a practical lab and proved to be a good solution for visualizing a frequent itemset analysis.

In this post, netCoin is applied to Microsoft clickstream data from 1998, available at the UCI ML repository, to which Association Rules was applied in a previous post.

Results: visualization of frequent itemsets in a graph using netCoin for clickstream data

When running Association Rules on a dataset, the results are rules, which are really descriptive and bring insights, but are a bit cumbersome to read. As an example, let’s take a look at the top 10 frequent itemsets (i.e. pages that are visited together), ranked by support, for the dataset of Microsoft’s visits in 1998.

## Apriori
##
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##          NA    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen            target   ext
##      10 frequent itemsets FALSE
##
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
##
## Absolute minimum support count: 327
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[284 item(s), 32711 transaction(s)] done [0.01s].
## sorting and recoding items ... [47 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [150 set(s)] done [0.00s].
## creating S4 object  ... done [0.01s].
##      items                                    support    count
## [1]  {Free Downloads,Internet Explorer}       0.16080218 5260
## [2]  {Free Downloads,Windows Family of OSs}   0.07792486 2549
## [3]  {Free Downloads,isapi}                   0.07306411 2390
## [4]  {Free Downloads,Products }               0.06123322 2003
## [5]  {Free Downloads,Microsoft.com Search}    0.06043838 1977
## [6]  {isapi,Support Desktop}                  0.05942955 1944
## [7]  {Knowledge Base,Support Desktop}         0.05521079 1806
## [8]  {Internet Explorer,Microsoft.com Search} 0.05328483 1743
## [9]  {Microsoft.com Search,Products }         0.04989147 1632
## [10] {Microsoft.com Search,Support Desktop}   0.04857693 1589

(For details on how to perform this analysis, take a look at the previous post).
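
To make the support and count columns concrete: support is the number of sessions that contain the itemset divided by the total number of sessions, so the first row above is 5260 / 32711 ≈ 0.1608. The following is a minimal base R sketch of that calculation for pairs of pages. It uses made-up toy sessions, not the Microsoft dataset, and it is not the Apriori algorithm (it simply counts every pair by brute force). The names `sessions`, `pairs_in_session` and `min_support` are hypothetical and chosen only for illustration.

```r
# Toy sessions (hypothetical data, NOT the Microsoft dataset):
# each element is the set of pages visited in one session.
sessions <- list(
  c("Free Downloads", "Internet Explorer"),
  c("Free Downloads", "Internet Explorer", "Products"),
  c("Free Downloads", "Products"),
  c("Support Desktop", "Knowledge Base"),
  c("Internet Explorer")
)

# Return every unordered pair of pages visited within one session.
pairs_in_session <- function(pages) {
  pages <- sort(unique(pages))
  if (length(pages) < 2) return(character(0))
  apply(combn(pages, 2), 2, paste, collapse = ",")
}

# Count in how many sessions each pair appears.
pair_counts <- table(unlist(lapply(sessions, pairs_in_session)))

# support = count / number of sessions
itemsets <- data.frame(
  items   = names(pair_counts),
  count   = as.integer(pair_counts),
  support = as.integer(pair_counts) / length(sessions),
  stringsAsFactors = FALSE
)

# Keep pairs at or above a minimum support, most frequent first.
min_support <- 0.3
itemsets <- itemsets[itemsets$support >= min_support, ]
itemsets <- itemsets[order(-itemsets$support), ]
print(itemsets)
```

On this toy data, two pairs pass the threshold, each appearing in 2 of the 5 sessions (support 0.4). Brute-force counting is fine for a handful of sessions; on tens of thousands of sessions, a pruning algorithm such as Apriori avoids enumerating itemsets that cannot reach the minimum support.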

Imagine yourself trying to explain these itemsets to a customer. It is doable, of course, but everything would certainly go more smoothly if you could first show them an image like the picture below, which shows the results of applying netCoin to the same dataset.