Homework 5

Posted by Java Course Staff on December 12, 2016

Deadline: 2017-01-08 20:00

PPT and videos: See Group Files

Report template: Electronic Report Template

NOTE: This is your final homework, and you must submit both a paper report and an electronic report.

Objectives

  • Analyze the important requests and responses using a web debugger (that is, an HTTP message sniffer). Focus on av numbers, authors, categories, number of coins, number of favourites, etc.

The essence of a crawler is to disguise a program-generated HTTP request so that it looks as if it came from a browser. Deciphering HTTP messages is therefore the first and most important step in building a crawler.
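As a minimal sketch of this idea, the following prepares a GET request that carries the kind of headers a browser would send. It uses the JDK's own `HttpURLConnection` rather than apache-httpclient so that it runs without extra dependencies. The URL, the `Referer`, and the `User-Agent` string are illustrative placeholders, and the class and method names are hypothetical; copy the real header values from what your web debugger shows.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class BrowserLikeRequest {
    // Illustrative browser User-Agent (an assumption); replace it with the
    // value your own browser sends, as captured in the web debugger.
    static final String USER_AGENT =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        + "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36";

    // Builds a GET request with browser-like headers. Nothing is sent here.
    static HttpURLConnection prepare(String pageUrl, String referer) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("User-Agent", USER_AGENT);
            conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,*/*;q=0.8");
            conn.setRequestProperty("Referer", referer);
            conn.setConnectTimeout(5000); // milliseconds
            conn.setReadTimeout(5000);    // milliseconds
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder URLs, not real endpoints of any site.
        HttpURLConnection conn =
            prepare("http://www.example.com/video/av1", "http://www.example.com/");
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

Calling `prepare(...)` only builds the request; nothing goes over the network until you read from the connection, for example via `getInputStream()`. With apache-httpclient the same headers would be set on the request object before executing it.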

  • Store each av number along with its important attributes in your favorite database.

  • Find the top 3 videos in each category by number of favourites.

  • Get the URLs of the videos found in the previous objective using apache-httpclient (you may need jsoup to parse the HTML pages). Then download the videos from those URLs via Java IO. You can simplify this step with www.ibilibili.com.
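The "top 3 per category" objective above can be sketched in plain Java once the crawled records are in memory. This is only one possible approach, assuming a hypothetical `Video` class with `av`, `category`, and `favorites` fields (none of these names come from the assignment) and assuming that "top" means the highest number of favourites.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TopFavorites {
    // Hypothetical holder for one crawled video; field names are illustrative.
    static final class Video {
        final String av;
        final String category;
        final int favorites;

        Video(String av, String category, int favorites) {
            this.av = av;
            this.category = category;
            this.favorites = favorites;
        }
    }

    // Returns, for each category, the n videos with the most favourites,
    // ordered from most to least favourited.
    static Map<String, List<Video>> topPerCategory(List<Video> videos, int n) {
        // Group videos by category; TreeMap keeps categories in sorted order.
        Map<String, List<Video>> byCategory = videos.stream()
            .collect(Collectors.groupingBy(v -> v.category, TreeMap::new, Collectors.toList()));

        Map<String, List<Video>> top = new TreeMap<>();
        for (Map.Entry<String, List<Video>> entry : byCategory.entrySet()) {
            List<Video> best = entry.getValue().stream()
                .sorted(Comparator.comparingInt((Video v) -> v.favorites).reversed())
                .limit(n)
                .collect(Collectors.toList());
            top.put(entry.getKey(), best);
        }
        return top;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Video> videos = Arrays.asList(
            new Video("av1", "music", 10),
            new Video("av2", "music", 50),
            new Video("av3", "game", 7));
        topPerCategory(videos, 3).forEach((category, best) ->
            best.forEach(v -> System.out.println(category + " " + v.av + " " + v.favorites)));
    }
}
```

If the records already live in a relational database, the same ranking could instead be computed by the database with a query that sorts by favourites within each category.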

Requirements

  1. Homework 5 is individual work.

  2. Every JAR you need should be declared as a dependency in pom.xml, which appears in your project root directory after you enable Maven support.

  3. The hand-written report is a brief summary of your electronic report. You are encouraged to write a detailed electronic report and append a printed copy of it to the hand-written version.
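For requirement 2, a dependency section in pom.xml might look roughly like the fragment below. The group and artifact IDs are the published Maven coordinates for apache-httpclient and jsoup; the version numbers are assumptions and should be replaced with whichever releases you actually use. A driver for your chosen database would be declared the same way.

```xml
<dependencies>
  <!-- Versions below are assumptions; pick the releases you actually need. -->
  <dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
  </dependency>
  <dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.1</version>
  </dependency>
</dependencies>
```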