How to use Scrapy to crawl multi-level pages? (two levels)

- January 15, 2012

On the first page it scrapes "test1" from the title tag just fine, but on page 2 ("test2.html") I get nothing. My script:

    import scrapy
    from scrapy.spider import Spider
    from scrapy.selector import Selector
    # The module path of this import is unreadable in the source; "myproject.items" is a placeholder.
    from myproject.items import Website

    class DmozSpider(Spider):
        name = "bill"
        allowed_domains = ["http://www.mywebsite.com"]
        start_urls = ["http://www.mywebsite.com/test.html"]

        def parse(self, response):
            for site in response.xpath('//head'):
                item = Website()
                item['title'] = site.xpath('//title/text()').extract()
                yield item
            yield scrapy.Request(url="www.mywebsite.com/test1.html", callback=self.other_function)

        def other_function(self, response):
            for other_thing in response.xpath('//head'):
                item = Website()
                item['title'] = other_thing.xpath('//title/text()').extract()
                yield item

Thank you in advance, STEF

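Try

    yield scrapy.Request(url="http://www.mywebsite.com/test1.html", callback=self.other_function)

instead of

    yield scrapy.Request(url="www.mywebsite.com/test1.html", callback=self.other_function)

Scrapy needs an absolute URL that includes the scheme. Without "http://" the request fails with "ValueError: Missing scheme in request url", so the second callback never runs.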
