Homework 5
Deadline: 2017-01-08 20:00 PPT and videos: See Group Files Report template: Electronic Report Template
NOTE: This is your final homework and you are supposed to submit paper & electronic report.
Objectives
- Analyze important requests/responses using a web debugger, i.e. an HTTP message sniffer. You should focus on av numbers, authors, categories, number of coins, number of favourites, etc.
The spirit of a crawler is to camouflage a program-generated HTTP request as if it is generated from a browser. So, deciphering HTTP messages is the first and the most important step of building a crawler.
-
Store each av number along with its important attributes in your favorite database.
-
Find out the top 3 favorite videos for each category.
-
Get the URLs of the videos in step 3 by using apache-httpclient (perhaps jsoup is needed to parse html pages). Then, download the URLs via Java IO. You can simplify this step with www.ibilibili.com .
Requirements
-
Homework 5 is an individual work.
-
Every JARs you need should be configured in
pom.xml
, which will appear in your project root directory after enabling Maven support. -
The hand-written report is a brief summary of your electronic report. You are encouraged to write a detailed electronic report, and append a print copy to the hand-written version.