This task involves scraping one particular section from each of 5 public websites, all of which permit scraping via robots.txt. Some details:
-The section of each website can be determined via a keyword in the URL.
-The number of documents varies per site but averages about 10K.
-Some structured data will need to be extracted from each page, such as the URL, the page title and other information found within HTML tags or CSS classes. Approximately 9 attributes will be extracted.
-Data should be delivered as a CSV or XML file in a previously agreed-upon format.
-If the task is completed to a high standard, weekly or bi-weekly refreshes can be negotiated for an additional cost.
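
To make the requirements above concrete, here is a minimal sketch of the per-page workflow using only the Python standard library: check a URL against robots.txt, decide whether it belongs to the target section by a URL keyword, extract fields, and serialize to CSV. All function names, the keyword, and the example.com URLs are hypothetical, and only two of the roughly 9 attributes (URL and page title) are shown; the real attribute list and output format would follow whatever is agreed.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check a URL against the text of an already-downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def in_section(url: str, keyword: str) -> bool:
    """Assumption: a page is in the target section if the keyword is in its URL."""
    return keyword in url


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_record(url: str, html: str) -> dict:
    """Extract the illustrative attributes (URL and title) from one page."""
    parser = TitleParser()
    parser.feed(html)
    return {"url": url, "title": parser.title.strip()}


def to_csv(records, fields=("url", "title")) -> str:
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

In a full run, each candidate URL would pass through `allowed_by_robots` and `in_section` before being fetched, and the resulting records would be written out once per site.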
Please contact me if you have any questions or concerns. I look forward to working with you.
Bidder | Country/Region | |
---|---|---|
4 Xing910 | | |
3 Itgenes (awarded) | | |
3 Djworth | | |
3 Cmaxo | | |
2 Gdinnovative | | |
2 Talentmainly | | |
2 Infomediatech | | |