Sliding Window Technique for the Web Log Analysis
The results of Web query log analysis may shift significantly depending on the fraction of agents (non-human clients) that were not excluded from the log. To detect and exclude agents, Web log studies use threshold values for the number of requests submitted by a client during the observation period. However, different studies use different observation periods, and a threshold assigned to one period is usually incomparable with a threshold assigned to another. We propose a uniform method that works equally well across different observation periods. The method is based on the sliding window technique: a threshold is assigned to the sliding window rather than to the whole observation period. In addition, we estimate sub-optimal values of the method's parameters: the window size and the threshold.
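To make the idea concrete, the following is a minimal sketch of a sliding-window agent filter, not the authors' implementation. It assumes the log is available as (client id, timestamp in seconds) pairs, and that a client counts as an agent when it submits more than `threshold` requests within any window of `window_seconds`. The function name `detect_agents`, the half-open window boundary, the strict "more than" comparison, and the sample values are all illustrative assumptions.

```python
from collections import defaultdict, deque


def detect_agents(requests, window_seconds, threshold):
    """Return the set of client ids flagged as agents.

    Assumed rule (hypothetical, for illustration): a client is an agent if
    it submits more than `threshold` requests within any time window of
    length `window_seconds`.

    requests: iterable of (client_id, timestamp_seconds) pairs.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    # Group timestamps per client.
    by_client = defaultdict(list)
    for client_id, ts in requests:
        by_client[client_id].append(ts)

    agents = set()
    for client_id, stamps in by_client.items():
        stamps.sort()
        window = deque()
        for ts in stamps:
            window.append(ts)
            # Keep only requests inside the half-open window
            # (ts - window_seconds, ts].
            while ts - window[0] >= window_seconds:
                window.popleft()
            if len(window) > threshold:
                agents.add(client_id)
                break  # one violating window is enough
    return agents


# Illustrative data: client "a" sends 4 requests within 30 seconds,
# client "b" sends 4 requests spread over more than 3 hours.
sample_log = [
    ("a", 0), ("a", 10), ("a", 20), ("a", 30),
    ("b", 0), ("b", 4000), ("b", 8000), ("b", 12000),
]
print(detect_agents(sample_log, window_seconds=3600, threshold=3))  # {'a'}
```

Both clients submit the same total number of requests, so a threshold on the whole observation period would treat them identically. Because the threshold here is tied to the window rather than to the log's total duration, the same parameter pair can be applied to logs covering different observation periods.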