scrapy.log has been deprecated alongside its functions in favor of
explicit calls to the Python standard logging. Keep reading to learn more
about the new logging system.
Scrapy uses Python’s builtin logging system for event logging. We’ll provide some simple examples to get you started, but for more advanced use cases we strongly suggest reading its documentation thoroughly.
Logging works out of the box, and can be configured to some extent with the Scrapy settings listed in Logging settings.
Scrapy calls scrapy.utils.log.configure_logging() to set some reasonable
defaults and handle those settings in Logging settings when
running commands, so it’s recommended to manually call it if you’re running
Scrapy from scripts as described in Run Scrapy from a script.
Python’s builtin logging defines 5 different levels to indicate the severity of a given log message. Here are the standard ones, listed in decreasing order:
- logging.CRITICAL - for critical errors (highest severity)
- logging.ERROR - for regular errors
- logging.WARNING - for warning messages
- logging.INFO - for informational messages
- logging.DEBUG - for debugging messages (lowest severity)
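The ordering of the levels follows the integer value that the standard library assigns to each constant. A minimal sketch, using only the stdlib logging module, that checks this:

```python
import logging

# The five standard levels, from highest to lowest severity.
levels = [
    logging.CRITICAL,
    logging.ERROR,
    logging.WARNING,
    logging.INFO,
    logging.DEBUG,
]

# Each constant is a plain integer; a higher value means higher severity.
level_names = [logging.getLevelName(level) for level in levels]
is_decreasing = levels == sorted(levels, reverse=True)

for level in levels:
    print(level, logging.getLevelName(level))
```

Because the levels are integers, a logger or handler set to a given level simply drops records whose level value is lower.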
How to log messages
Here’s a quick example of how to log a message using the
logging.WARNING level:

    import logging

    logging.warning("This is a warning")
There are shortcuts for issuing log messages on any of the standard 5 levels,
and there’s also a general
logging.log method which takes a given level as
argument. If needed, the last example could be rewritten as:

    import logging

    logging.log(logging.WARNING, "This is a warning")
On top of that, you can create different “loggers” to encapsulate messages (for example, a common practice is to create a different logger for every module). These loggers can be configured independently, and they allow hierarchical constructions.
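As a minimal sketch of what “hierarchical” and “configured independently” mean in practice, the snippet below uses two hypothetical logger names, myproject and myproject.pipelines. The dot in the name is what makes one logger the child of the other in the standard library.

```python
import logging

# Hypothetical names chosen for illustration; the dot creates the hierarchy.
parent = logging.getLogger("myproject")
child = logging.getLogger("myproject.pipelines")

# Configure only the parent. The child has no level of its own,
# so it inherits the parent's effective level.
parent.setLevel(logging.ERROR)
inherited_level = child.getEffectiveLevel()

# Configure the child independently; the parent is unaffected.
child.setLevel(logging.DEBUG)
child_level = child.getEffectiveLevel()
parent_level = parent.getEffectiveLevel()
```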
The previous examples use the root logger behind the scenes, which is a top level
logger where all messages are propagated to (unless otherwise specified). Using
logging helpers is merely a shortcut for getting the root logger
explicitly, so this is also equivalent to the previous snippets:

    import logging

    logger = logging.getLogger()
    logger.warning("This is a warning")
You can use a different logger just by getting it by name with the
logging.getLogger function:

    import logging

    logger = logging.getLogger('mycustomlogger')
    logger.warning("This is a warning")
Finally, you can ensure you have a custom logger for any module you’re working on
by using the
__name__ variable, which is populated with the current module’s
path:

    import logging

    logger = logging.getLogger(__name__)
    logger.warning("This is a warning")
Logging from Spiders
Scrapy provides a
logger within each Spider
instance, that can be accessed and used like this:
    import scrapy


    class MySpider(scrapy.Spider):
        name = 'myspider'
        start_urls = ['http://scrapinghub.com']

        def parse(self, response):
            self.logger.info('Parse function called on %s', response.url)
That logger is created using the Spider’s name, but you can use any custom Python logger you want. For example:
    import logging
    import scrapy

    logger = logging.getLogger('mycustomlogger')


    class MySpider(scrapy.Spider):
        name = 'myspider'
        start_urls = ['http://scrapinghub.com']

        def parse(self, response):
            logger.info('Parse function called on %s', response.url)
Loggers on their own don’t manage how messages sent through them are displayed. For this task, different “handlers” can be attached to any logger instance and they will redirect those messages to appropriate destinations, such as the standard output, files, emails, etc.
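A minimal stdlib-only sketch of attaching a handler to a logger follows. The logger name handler_demo and the messages are hypothetical, and an in-memory buffer stands in for a real destination so the result is easy to inspect.

```python
import io
import logging

logger = logging.getLogger("handler_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo messages away from root handlers

# A StreamHandler writes formatted records to any file-like object.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)

logger.info("Spider opened")   # passes the INFO threshold, reaches the handler
logger.debug("Not displayed")  # below the logger level, dropped

output = buffer.getvalue()
print(output, end="")
```

Swapping the StreamHandler for logging.FileHandler or logging.handlers.SMTPHandler would send the same records to a file or an email address instead.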
By default, Scrapy sets and configures a handler for the root logger, based on the settings below.
These settings can be used to configure the logging:
The first couple of settings define a destination for log messages. If
LOG_FILE is set, messages sent through the root logger will be
redirected to a file named
LOG_FILE with encoding
LOG_ENCODING. If it is unset and
LOG_ENABLED is True, log
messages will be displayed on the standard error. Lastly, if
LOG_ENABLED is
False, there won’t be any visible log output.
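As a sketch, the destination-related settings might look like this in a project’s settings.py. The values are illustrative rather than defaults, and the file name crawl.log is hypothetical.

```python
# settings.py (illustrative values, not Scrapy defaults)

# Keep logging enabled; False would suppress all visible log output.
LOG_ENABLED = True

# Write root-logger messages to this file instead of standard error.
LOG_FILE = "crawl.log"  # hypothetical path

# Encoding used when writing the log file.
LOG_ENCODING = "utf-8"
```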
LOG_FORMAT and LOG_DATEFORMAT specify formatting strings
used as layouts for all messages. Those strings can contain any placeholders
listed in logging’s logrecord attributes docs and
datetime’s strftime and strptime directives respectively.
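To see how such layout strings behave, the sketch below passes a message format and a date format to the standard library’s logging.Formatter. The two strings are illustrative values of the kind you could assign to LOG_FORMAT and LOG_DATEFORMAT; the logger name, file name, and message are hypothetical.

```python
import logging

# Illustrative layout strings: LogRecord placeholders and strftime directives.
LOG_FORMAT = "%(asctime)s [%(name)s] %(levelname)s: %(message)s"
LOG_DATEFORMAT = "%Y-%m-%d %H:%M:%S"

formatter = logging.Formatter(fmt=LOG_FORMAT, datefmt=LOG_DATEFORMAT)

# Build a record by hand so the example does not depend on any handler setup.
record = logging.LogRecord(
    name="myspider",          # hypothetical logger name
    level=logging.INFO,
    pathname="example.py",    # hypothetical source file
    lineno=1,
    msg="Crawled %d pages",
    args=(3,),
    exc_info=None,
)

line = formatter.format(record)
print(line)
```

The timestamp at the start of the line is rendered with the date format, and everything after it comes from the message format.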
If LOG_SHORT_NAMES is set, then the logs will not display the Scrapy
component that prints the log. It is unset by default, hence logs contain the
Scrapy component responsible for that log output.
There are command-line arguments, available for all commands, that you can use to override some of the Scrapy settings regarding logging.
See also:
- Module logging.handlers: further documentation on available handlers
Because Scrapy uses the stdlib logging module, you can customize logging using all of its features.
For example, let’s say you’re scraping a website which returns many HTTP 404 and 500 responses, and you want to hide all messages like this:
2016-12-16 22:00:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <500 http://quotes.toscrape.com/page/1-34/>: HTTP status code is not handled or not allowed
The first thing to note is the logger name, which is in brackets:
[scrapy.spidermiddlewares.httperror]. If you get just
[scrapy], then
LOG_SHORT_NAMES is likely set to True; set it to False and re-run
the crawl.
Next, we can see that the message has INFO level. To hide it
we should set the logging level for
scrapy.spidermiddlewares.httperror
higher than INFO; the next level after INFO is WARNING. This could be done,
for example, in the spider’s
__init__ method:

    import logging
    import scrapy


    class MySpider(scrapy.Spider):
        # ...
        def __init__(self, *args, **kwargs):
            logger = logging.getLogger('scrapy.spidermiddlewares.httperror')
            logger.setLevel(logging.WARNING)
            super().__init__(*args, **kwargs)
If you run this spider again then INFO messages from
scrapy.spidermiddlewares.httperror logger will be gone.
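The effect of raising the level on a single named logger can be checked without running a crawl. In the sketch below the messages are emitted by hand to stand in for Scrapy’s output, so their text is hypothetical, and an in-memory buffer replaces the real log destination.

```python
import io
import logging

# Capture everything that reaches the root logger in a buffer.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("[%(name)s] %(levelname)s: %(message)s"))
root = logging.getLogger()
previous_level = root.level
root.addHandler(handler)
root.setLevel(logging.DEBUG)

# Raise the threshold for one logger only.
noisy = logging.getLogger("scrapy.spidermiddlewares.httperror")
noisy.setLevel(logging.WARNING)

noisy.info("hidden")      # below WARNING, dropped by this logger
noisy.warning("kept")     # WARNING still gets through
logging.getLogger("scrapy.core.engine").info("unaffected")  # other loggers unchanged

# Restore the root logger so the demo leaves no global side effects.
root.removeHandler(handler)
root.setLevel(previous_level)

output = buffer.getvalue()
print(output, end="")
```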
scrapy.utils.log.configure_logging(settings=None, install_root_handler=True)
Initialize logging defaults for Scrapy.
Parameters:
- settings (dict, Settings object or None) – settings used to create and configure a handler for the root logger (default: None).
- install_root_handler (bool) – whether to install root logging handler (default: True)
This function does:
- Route warnings and twisted logging through Python standard logging
- Assign DEBUG and ERROR level to Scrapy and Twisted loggers respectively
- Route stdout to log if LOG_STDOUT setting is True
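To illustrate what routing warnings through standard logging looks like, here is a stdlib-only sketch based on logging.captureWarnings. That configure_logging relies on this exact mechanism is an assumption made for illustration; the warning text is hypothetical.

```python
import io
import logging
import warnings

# Captured warnings are logged to the "py.warnings" logger by the stdlib.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
warn_logger = logging.getLogger("py.warnings")
warn_logger.addHandler(handler)

with warnings.catch_warnings():
    # Show every warning, including DeprecationWarning, inside this block.
    warnings.simplefilter("always")
    logging.captureWarnings(True)   # redirect warnings to logging
    warnings.warn("this API is deprecated", DeprecationWarning)
    logging.captureWarnings(False)  # restore the normal warning output

warn_logger.removeHandler(handler)
captured = buffer.getvalue()
print(captured, end="")
```

Once warnings are redirected this way, they follow the same handlers, levels, and formats as any other log record.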
When install_root_handler is True (default), this function also creates a handler for the root logger according to the given settings (see Logging settings). You can override default options using the
settings argument. When settings is empty or None, defaults are used.
configure_logging is automatically called when using Scrapy commands, but needs to be called explicitly when running custom scripts. In that case, its usage is not required but it’s recommended.
If you plan on configuring the handlers yourself, it is still recommended that you call this function, passing install_root_handler=False. Bear in mind there won’t be any log output set by default in that case.
To get you started on manually configuring logging’s output, you can use logging.basicConfig() to set a basic root handler. This is an example of how to redirect
INFO or higher messages to a file:

    import logging
    from scrapy.utils.log import configure_logging

    configure_logging(install_root_handler=False)
    logging.basicConfig(
        filename='log.txt',
        format='%(levelname)s: %(message)s',
        level=logging.INFO
    )
Refer to Run Scrapy from a script for more details about using Scrapy this way.