An Improved Generic Crawler using Poisson Fit Distribution

Thangaraj M., Sivagaminathan P. G. Published in Information Sciences.

Communications on Applied Electronics
Year of Publication: 2016
Publisher: Foundation of Computer Science (FCS), NY, USA
Authors: Thangaraj M., Sivagaminathan P. G.

The remarkable growth of the Internet has filled the World Wide Web with a huge volume of web data, much of which remains unexplored by the users it is intended for, even though it is worth extracting and assimilating into knowledge. Retrieving potentially useful information from web data requires a broad-spectrum crawler that collects relevant documents and metadata. A breadth-first crawler algorithm is presented that fetches the related web documents needed to create a web archive for alias extraction. In this paper, it is shown that the upgraded crawler, which generates a random crawl depth, performs better than crawling to a predetermined depth. By supplying different mean values to this function-enabled crawler, it is possible to generate a dynamic random depth.


Breadth First Search, Parsing, Multi-threading, Probability Mass Function, Frontier, Virtual web
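
The idea summarized in the abstract, a breadth-first crawl whose depth limit is drawn at random instead of being fixed in advance, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the depth is sampled from a Poisson distribution with a caller-supplied mean (suggested by the title and the "Probability Mass Function" keyword), and it replaces real HTTP fetching and parsing with a small in-memory link graph standing in for a virtual web. The names `sample_poisson`, `bfs_crawl`, and `VIRTUAL_WEB` are hypothetical.

```python
import math
import random
from collections import deque


def sample_poisson(mean, rng):
    """Draw one Poisson(mean) variate with Knuth's multiplication method."""
    threshold = math.exp(-mean)
    k = 0
    product = rng.random()
    # Keep multiplying uniform draws until the product falls below e^(-mean).
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def bfs_crawl(web, seed, max_depth):
    """Breadth-first traversal of `web` from `seed`, limited to `max_depth`.

    `web` maps a page name to the list of pages it links to (an assumed
    stand-in for fetching and parsing a real page). Returns a dict that
    maps each visited page to the depth at which it was discovered.
    """
    visited = {seed: 0}
    frontier = deque([seed])  # FIFO queue gives breadth-first order
    while frontier:
        page = frontier.popleft()
        depth = visited[page]
        if depth == max_depth:
            continue  # do not expand pages at the depth limit
        for link in web.get(page, []):
            if link not in visited:
                visited[link] = depth + 1
                frontier.append(link)
    return visited


# Hypothetical link graph used only for illustration.
VIRTUAL_WEB = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": ["f"],
    "e": [],
    "f": ["a"],
}

if __name__ == "__main__":
    rng = random.Random(2016)
    for mean in (1.0, 2.0, 4.0):
        depth = sample_poisson(mean, rng)
        pages = bfs_crawl(VIRTUAL_WEB, "a", depth)
        print(f"mean={mean} sampled depth={depth} pages={sorted(pages)}")
```

Changing the mean shifts the typical sampled depth, so repeated runs explore the graph to different extents; a fixed-depth crawl corresponds to passing a constant as `max_depth`.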