Show simple item record

dc.contributor.advisor Soltys, Dr. Michael
dc.contributor.author Pereira, Rihan Stephen
dc.date.accessioned 2020-01-27T18:02:21Z
dc.date.available 2020-01-27T18:02:21Z
dc.date.issued 2019-12
dc.identifier.uri http://hdl.handle.net/10211.3/214919
dc.description.abstract Historically, web crawlers (also called bots or spiders) have been well known for indexing and ranking websites on the internet. This thesis builds on the crawling activity but approaches the problem through the lens of a data engineer. Whirlpool, a continuous, topical web crawling tool, is also a data ingestion pipeline implemented from the bottom up using RabbitMQ, a high-performance messaging buffer, to organize the data flow within its network. It is based on the open, standard blueprint design of Mercator. This paper discusses the high- and low-level design of this complex program, covering auxiliary data structures, object-oriented design, scalability concerns, and deployment on AWS. The project name Whirlpool is an analogy to the naturally occurring phenomenon in which opposing sea currents cause water to spin round and round, drawing various objects into it. en_US
dc.format.extent 118 en_US
dc.language.iso en_US en_US
dc.publisher California State University Channel Islands en_US
dc.subject Distributed systems en_US
dc.subject Message Queues en_US
dc.subject Deduplication en_US
dc.subject Amazon Web Services en_US
dc.subject Docker en_US
dc.subject Computer Science thesis en_US
dc.title Whirlpool: A microservice style scalable continuous topical web crawler en_US
dc.type Thesis en_US
dc.contributor.committeeMember Thoms, Dr. Brian
dc.contributor.committeeMember Isaacs, Dr. Jason
dc.contributor.committeeMember Ozturgut, Dr. Osman
