Hi Enrico, thanks for your help. To give some context and narrow down what I'm trying to do: I'm trying to port a correctly working scraper (built with Scrapy) into a standalone script.
I'm pasting below what I wrote today in the Scrapy Google group, since unfortunately I haven't received any replies there so far.

Hi everybody,

following the indications here: http://scrapy.readthedocs.org/en/0.18/topics/practices.html

where:

    from testspiders.spiders.followall import FollowAllSpider

means: import the class "FollowAllSpider" contained in the file followall.py, which is located in the folder testspiders/spiders, I'm trying to transfer my working scraper into a script. So this is my file:

    #!/usr/bin/python
    from twisted.internet import reactor
    from scrapy.crawler import Crawler
    from scrapy.settings import Settings
    from scrapy import log, signals
    from sole24ore.sole24ore.spiders.sole import SoleSpider

    spider = SoleSpider(domain='sole24ore.com')
    crawler = Crawler(Settings())
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    log.start()
    reactor.run()  # the script will block here until the spider_closed signal is sent

But when running the script with "python soleLinksScrapy.py" I get:

    Traceback (most recent call last):
      File "soleLinksScrapy.py", line 25, in <module>
        from sole24ore.sole24ore.spiders.sole import SoleSpider
      File "/home/ubuntu/ggc/prove/sole24ore/sole24ore/spiders/sole.py", line 6, in <module>
        from sole24ore.items import Sole24OreItem
    ImportError: No module named items

The scraper itself works fine when I type "scrapy crawl sole" in its folder:

    scrapy crawl sole
    2014-02-08 17:00:52+0000 [scrapy] INFO: Scrapy 0.18.4 started (bot: sole24ore)
    2014-02-08 17:00:52+0000 [scrapy] DEBUG: Optional features available: ssl, http11, boto
    2014-02-08 17:00:52+0000 [scrapy] DEBUG: Overridden settings: {'NEWSPIDER_MODULE': 'sole24ore.spiders', 'SPIDER_MODULES': ['sole24ore.spiders'], 'BOT_NAME': 'sole24ore'}
    2014-02-08 17:00:52+0000 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
    2014-02-08 17:00:52+0000 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
    2014-02-08 17:00:52+0000 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
    2014-02-08 17:00:52+0000 [scrapy] DEBUG: Enabled item pipelines:
    2014-02-08 17:00:52+0000 [sole] INFO: Spider opened
    2014-02-08 17:00:52+0000 [sole] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2014-02-08 17:00:52+0000 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
    2014-02-08 17:00:52+0000 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
    2014-02-08 17:00:53+0000 [sole] DEBUG: Redirecting (301) to <GET http://www.ilsole24ore.com/> from <GET http://www.sole24ore.com/>
    2014-02-08 17:00:53+0000 [sole] DEBUG: Crawled (200) <GET http://www.ilsole24ore.com/> (referer: None)
    [s] Available Scrapy objects:
    [s]   hxs        <HtmlXPathSelector xpath=None data=u'<html xmlns="http://www.w3.org/1999/xhtm'>
    [s]   item       {}
    [s]   request    <GET http://www.ilsole24ore.com/>
    [s]   response   <200 http://www.ilsole24ore.com/>
    [s]   settings   <CrawlerSettings module=<module 'sole24ore.settings' from '/home/ubuntu/ggc/prove/sole24ore/sole24ore/settings.pyc'>>
    [s] Useful shortcuts:
    [s]   shelp()           Shell help (print this help)
    [s]   view(response)    View response in a browser
    In [1]: Do you really want to exit ([y]/n)? y
    2014-02-08 17:00:58+0000 [sole] DEBUG: Scraped from <200 http://www.ilsole24ore.com/>
        {'url': [u'http://www.ilsole24ore.com/ebook/norme-e-tributi/2014/crisi_impresa/index.shtml',
                 u'http://www.ilsole24ore.com/ebook/norme-e-tributi/2014/crisi_impresa/index.shtml',
                 u'http://www.ilsole24ore.com/cultura.shtml',
                 u'http://www.casa24.ilsole24ore.com/',
                 u'http://www.moda24.ilsole24ore.com/',
                 u'http://food24.ilsole24ore.com/',
                 u'http://www.motori24.ilsole24ore.com/',
                 u'http://job24.ilsole24ore.com/',
                 u'http://stream24.ilsole24ore.com/',
                 u'http://www.viaggi24.ilsole24ore.com/',
                 u'http://www.salute24.ilsole24ore.com/',
                 u'http://www.shopping24.ilsole24ore.com/',
                 u'http://www.radio24.ilsole24ore.com/',

The scraper folder is the 'sole24ore' folder, which is in ~/ggc/prove/sole24ore..., while the script I would like to get working is in ~/ggc/prove.

Any hints?

Thanks for your help.
Kind regards.
Marco
_______________________________________________
Python mailing list
Python@lists.python.it
http://lists.python.it/mailman/listinfo/python
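The ImportError above suggests that the Scrapy project root ~/ggc/prove/sole24ore is not on Python's module search path when the script is run from ~/ggc/prove, so the spider's own "from sole24ore.items import Sole24OreItem" cannot resolve "sole24ore" as a top-level package. A minimal sketch of one possible fix, placed at the top of the script, assuming the directory layout described in the post (the path and package names here are taken from that description, not verified):

```python
import os
import sys

# Assumption: the Scrapy project root is ~/ggc/prove/sole24ore, and the
# package "sole24ore" (containing items.py and spiders/) lives directly
# under it. Prepending that directory to sys.path lets the spider's
# "from sole24ore.items import Sole24OreItem" line resolve; the script
# would then import the spider as "from sole24ore.spiders.sole import
# SoleSpider" (one "sole24ore" level, not two).
project_root = os.path.expanduser("~/ggc/prove/sole24ore")
if project_root not in sys.path:
    sys.path.insert(0, project_root)
```

Equivalently, the script could be run without modification as "PYTHONPATH=~/ggc/prove/sole24ore python soleLinksScrapy.py", which puts the same directory on the search path from the environment instead of from code.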