Troubleshoot guide for Scrapes

Octolooks

This guide collects the issues you may encounter while using Scrapes, along with the steps to resolve each of them.

Gives “The package could not be installed. No valid plugins were found” error during installation.

The installation file is “ol_scrapes.zip”, located in the “Plugin” folder of the “Scrapes.zip” package. You see this error message if you upload the downloaded package directly instead of the installation file. Detailed information about the installation steps is available in our related blog post.

Gives “Failed to open stream: No such file or directory” error after installation.

Some antivirus software installed automatically by hosting providers may flag as harmful the shell commands in the main plugin file “class-ol-scrapes.php”, which ping the site so that tasks start automatically. The antivirus software may then rename this file or change its extension, which causes this error. To resolve this issue, first ask your hosting provider which lines the antivirus software flagged as harmful, then contact us with this information.

Gives “An error occurred while connecting to server.” error while activating.

Some firewall software installed by hosting providers blocks outgoing connection requests from your server, which causes this error message. These blocking rules must be removed so that your purchase code can be validated during activation and so that requests can be sent to the target sites to be scraped. To resolve this issue, contact your hosting provider and ask them not to block outgoing connection requests from your server.

Gives “Domain name is not matching with your site.” error while activating.

You encounter this error message if the domain name of your website and the value you defined in the “Domain” field on the settings screen are not the same. To resolve this issue, install the plugin on the website where you want to use it, and on the settings page enter the domain name of that same website rather than a different one.

Gives “Purchase code is not approved.” while activating.

You encounter this error message if the code you enter in the “Code” field on the plugin’s settings page does not match your purchase code (license key). To resolve this issue, check that the code you entered is accurate. Detailed information about accessing license keys is available in our related blog post.

Gives “Purchase code is already exists.” error while activating.

You see this error message if the purchase code you are trying to define on the plugin’s settings screen is already defined or active on the domain name of a website you used previously. To resolve this issue and reset your domain registration, delete your purchase code from the settings page and then contact us. Detailed information about the purchase code activation steps is available in our related blog post.

The visual selector is not opening at all

You may encounter this problem if a plugin installed on your site loads its CSS and JavaScript code on all pages instead of limiting it to its own pages. If the visual selector never opens, is covered by a black shadow area and cannot be clicked, opens in a very small size, or your web browser prints a console error, this is a clear indication of such an incompatibility. To resolve it, temporarily disable your plugins one by one, checking each time whether the visual selector works, until you find the plugin that causes the problem. You can then keep that plugin disabled while creating a new task and enable it again afterwards, or contact the developer of that plugin.

The visual selector shows a blank (white) screen.

You may encounter this problem if the source site blocks the IP address of your server to prevent excessive requests. To resolve this issue, stop all tasks that retrieve content from that source site and wait for a while (it may take up to 24 hours for some websites). Then create only one task, and reduce the number of requests sent by increasing the value of “Wait next processes” to 5-10 seconds.

This problem is more common on popular shared hosting services, because other sites that share your server’s IP address may send requests to the same source site. To resolve this issue, you can define a proxy in your WordPress configuration file (wp-config.php), located in the main directory of your WordPress installation, as shown below. This changes the IP address used in the requests to the source site.

/* Configure HTTP Proxy Server */
define('WP_PROXY_HOST', '192.168.1.1');
define('WP_PROXY_PORT', '3128');
define('WP_PROXY_USERNAME', '');
define('WP_PROXY_PASSWORD', '');
define('WP_PROXY_BYPASS_HOSTS', 'localhost');

If the proxy you are using does not require a user name and password, you do not need to add the corresponding lines. To check that your premium or free proxy is working, create a new task with https://www.whatismyip.com defined in the “Source URL” field and look at the IP address shown in the visual selector. It must be the same as the value defined in your wp-config.php.
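
The proxy values can also be read back from the command line before creating a test task. The sketch below is only an illustration: the helper names `wp_proxy_setting` and `proxy_check_command` are hypothetical and not part of Scrapes or WordPress, and the parsing assumes each define() call sits on its own line with single-quoted values, as in the example above.

```shell
# Hypothetical helper: print the value of a define('NAME', 'value'); line
# found in a wp-config.php file (assumes single-quoted values on one line).
wp_proxy_setting() {
    config_file=$1
    name=$2
    sed -n "s/^[[:space:]]*define([[:space:]]*'$name'[[:space:]]*,[[:space:]]*'\([^']*\)'.*/\1/p" "$config_file" | head -n 1
}

# Hypothetical helper: print (without running it) a curl command that
# requests a URL through the proxy defined in wp-config.php.
proxy_check_command() {
    config_file=$1
    url=$2
    host=$(wp_proxy_setting "$config_file" WP_PROXY_HOST)
    port=$(wp_proxy_setting "$config_file" WP_PROXY_PORT)
    printf 'curl -s -x http://%s:%s %s\n' "$host" "$port" "$url"
}
```

For example, `proxy_check_command wp-config.php https://www.whatismyip.com` prints a curl command that you can run on the server to see whether the request goes out through the proxy. A proxy that requires a user name and password would need those added to the command as well.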

The visual selector gives “A valid URL was not provided” error.

You encounter this error message if you do not define a valid web address in the “Source URL” field, or if you try to define other fields with the visual selector without first selecting “Post item” while editing a “Serial” type task that you created. To resolve this issue when creating a new task, define a valid web address in the “Source URL” field. When editing a previously created task, redefine the “Post item” field with the visual selector before defining any other field with it.

The visual selector gives “cURL” error.

You may encounter this and similar error messages because of network, DNS or firewall problems at some hosting services. To resolve this situation, contact your system administrator or hosting service provider directly using the message template below.

“Hello, I am using a WordPress plugin to retrieve data from (Enter target site here) and getting cURL Error (Enter full error here). Can you please check and help me to solve it if it is network related? Thank you very much in advance, your support is greatly appreciated”

The website in the visual selector is different or missing elements from the normal website.

Web browsers block mixed content for security purposes: when your admin panel is served over https (secure) and the target website URL is http (not secure), files such as CSS, fonts and images cannot be loaded into the visual selector. To resolve this issue, you can edit the relevant URLs, or temporarily disable your SSL certificate and enable it again afterwards, so that your site and the target site use the same protocol.

The website in the visual selector shows CAPTCHA (I’m not robot).

You may encounter this problem if the source site detects that your server is automatically sending an excessive number of requests within a given period. Since CAPTCHAs are mostly JavaScript-based and cannot be solved on the server side, as described in our related article, you can try the solutions given in “The visual selector shows a blank (white) screen” to resolve this issue.

Another possible solution is to open the source site in your own web browser, solve the CAPTCHA there, and then define the resulting cookie values in the task so that they are sent to the source site.

Gives “403” or “404” error while creating a task.

You may encounter these error messages from “Apache security module” filters installed by some hosting services if the filters flag function names such as “concat” or “contains” in the XPath expressions defined by the visual selector. To resolve this issue, try the steps below.

  • Deactivate the corresponding module at your hosting service for a while, or ask your hosting service provider to do so. After the task has been created successfully, you can re-enable the module.
  • Click the “Settings » Permalinks” link in the left navigation, then click the “Save Changes” button on the settings screen without making any changes, and observe the results.
  • Add the following code to the “.htaccess” file in the main directory of the server where your website is installed, and to the one in the “wp-admin” folder if it exists, and observe the results.

    <IfModule mod_security.c>
    SecRuleEngine Off
    SecFilterInheritance Off
    SecFilterEngine Off
    SecFilterScanPOST Off
    SecRuleRemoveById 300015 3000016 3000017
    </IfModule>

Task status is stuck at “Preparing”.

You encounter this problem if the WordPress cron mechanism (wp-cron.php), which starts the task automatically when its scheduled run time arrives, is not triggered as it should be. To resolve this issue, try the following suggestions.

  1. You can use the WP Crontrol plugin to detect problems in your WordPress cron mechanism. Once you have activated this free, third-party WordPress plugin, click the “Tools » Cron Events” link. If the page that opens shows an error message, share it with your hosting provider or system administrator using the template message below.

    “Hello, my wp-cron.php file is not pinged and I’m getting an error message (Enter full error here) on my WordPress administration panel. Can you please check and help me to solve it? Thank you very much in advance, your support is greatly appreciated”

  2. DNS settings or firewall blocking rules applied by some hosting services can cause this problem. To resolve this issue, define your server’s IP address and domain name in the “hosts” file if you have access, or contact your system administrator or hosting service provider to do so.
  3. A basic authentication or maintenance plugin installed on your WordPress site makes the cron file unreachable, so it cannot be pinged. To resolve this issue, disable the relevant plugins so that the cron file is accessible from the outside.
  4. To enable the alternative WordPress cron, add the line below to the “wp-config.php” file in the main folder of your WordPress site, before the “That’s all, stop editing! Happy blogging.” comment, and observe the results.

    define('ALTERNATE_WP_CRON', true);

  5. Even if you have checked the “System” option for the “Cron type” field, you may encounter this problem because of the restrictions that shared hosting services impose on shell commands executed from PHP files. To resolve this situation, manually add the following line to your crontab by running the command “crontab -e” over a shell connection, or through a server management panel such as cPanel.

    * * * * * curl http://(Your URL here)/wp-cron.php > /dev/null 2>&1

  6. If your cron file is still not triggered properly, or the steps above are inconclusive, you can use a third-party remote pinging service such as EasyCron to ping your http://(Your URL here)/wp-cron.php file instead of pinging it locally.
  7. You encounter this problem if the timezone or automatic date/time update settings of your website’s server are incorrect. After saving the task, check its “Next run” value: it should show the current minute or one minute ahead. If it does not, your server’s system time needs to be updated. You can contact your system administrator or hosting service provider to perform this update.
  8. You may encounter this problem if other plugins that are installed on your WordPress site, or that previously ran automatic processing, use the WordPress cron queue so heavily that Scrapes cron events are not processed from it. To resolve this, disable other plugins that perform automatic processing, and remove other events that occupy the cron queue with the WP Crontrol plugin described in step 1.
  9. You encounter this problem if you migrate your WordPress site to a different domain name from a database dump. To resolve this, remove the license information that the database dump carried over to your new WordPress site from the settings page, then contact us to change the domain name registered for your license code.
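
A crontab entry needs five time fields (minute, hour, day of month, month, day of week) before the command, so an entry that runs every minute starts with five asterisks. As a minimal sketch, the helper below prints such an entry for a given site URL. The name `cron_entry` is hypothetical and used only for illustration.

```shell
# Hypothetical helper: print a crontab entry that requests wp-cron.php
# every minute. The five leading fields are minute, hour, day of month,
# month and day of week.
cron_entry() {
    site_url=${1%/}   # drop one trailing slash, if present
    printf '* * * * * curl %s/wp-cron.php > /dev/null 2>&1\n' "$site_url"
}
```

Calling `cron_entry "http://example.com"` (a placeholder address) prints the line to paste into the editor opened by `crontab -e`. After saving, `crontab -l` lists the installed entries so you can confirm it is there.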

Another possibility is that your plugin files were corrupted during upload. If your changes do not appear after you save a task, this is likely the case. To resolve this issue, follow the steps below.

  • Remove your purchase code from the “Settings” screen.
  • Delete the “ol_scrapes” plugin folder under “wp-content/plugins” from your server with your FTP client.
  • Download the latest version from  with your account.
  • Upload the “ol_scrapes” plugin folder to “wp-content/plugins” after you unzip “ol_scrapes.zip” file.
  • Login to your WordPress admin panel and check the result.

Task status is “Running” but no new post is created.

Creating a new post may take 10-15 seconds, depending on the content of the source site and the complexity of the settings defined in the task. If the “Last scrape” value has not changed for more than 10 minutes after the task started, try the following suggestions.

  • You encounter this problem if the XPath value defined in the “Post item” field with the visual selector does not match the links on the source site’s listing page that lead to the detail pages. In this case a “reference a element not found” error message is written to the log. To resolve it, make sure this error message appears in your log file, then contact us and share the source URL so that a manual XPath with the exact match option can be defined in the task.
  • You encounter this problem if your hosting service limits the maximum execution time that the task needs in order to run. In the log file, this appears as a “single scrape started” entry followed by a sudden stop of logging in the middle of the task run. Check your log file to confirm this, then contact your system administrator or hosting service provider directly using the following message template.

    “Hello, I am using a PHP script which needs to run 5 minutes without a break. Can you please set my max_execution_time directive in a php.ini file to “0” (Unlimited) or to “300” (5 minutes) at least? Thank you very much in advance, your support is greatly appreciated”

  • You may encounter this problem if the source site blocks the IP address of your server to prevent excessive requests. To resolve this issue, try the solutions given in the “The visual selector shows a blank (white) screen” section.
  • You encounter this problem if the defined “Run frequency” value is shorter than the minimum time required to scrape the contents of the source site. When the time elapsed since the start of the task reaches the “Run frequency” value, the task stops and starts again from the beginning, even if not all contents of the source site have been grabbed, so no new posts are created. To resolve this issue, especially when the source site has a large amount of content and “Unlimited” is selected for the “Total posts” field, do not define a short value such as “Every 5 minutes” in the “Run frequency” field. Instead, define a value long enough to scrape all content from the source site, such as “Every day”.

    The value defined in the “Total posts” field refers to the first posts on the listing page of the source site. It does not mean that this number of new posts is created on every run of the task.
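
To check whether a task was cut off in the middle of a scrape, you can look at the last line the task wrote to the log. The sketch below assumes that every log line contains the text “TASK ID: ” followed by the task number, and that the helper names `last_task_line` and `stopped_mid_scrape` are hypothetical. The exact line layout of the log file may differ on your installation.

```shell
# Hypothetical helper: print the last log line recorded for one task.
# Assumes every log line contains the text "TASK ID: <number>".
last_task_line() {
    log_file=$1
    task_id=$2
    # The trailing group keeps "TASK ID: 1" from also matching "TASK ID: 12".
    grep -E "TASK ID: ${task_id}([^0-9]|\$)" "$log_file" | tail -n 1
}

# Succeeds when the final log line of the task is "single scrape started",
# which suggests the process stopped in the middle of a scrape.
stopped_mid_scrape() {
    last_task_line "$1" "$2" | grep -q 'single scrape started'
}
```

For example, `stopped_mid_scrape wp-content/plugins/ol_scrapes/logs/logs.txt 12 && echo "task 12 looks interrupted"` prints the message only when the last entry for task 12 is a “single scrape started” line.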

The images in the post content are not showing.

You encounter this issue when a source site uses the so-called lazy load JavaScript method to speed up page loading: instead of putting the real image URL in the “src” attribute of its “img” tags, it stores the URL in a “data-src” or “data-lazy-src” attribute and loads the image later. In this case the “src” attribute usually holds a 1 pixel transparent placeholder image, which is downloaded instead of the real image. To resolve this issue, add the “Find and replace” rule below to the task’s content area. If the source site uses a different HTML attribute than “data-lazy-src”, or if the custom attribute comes before the “src” attribute, update the regex in the “Find” field with the matching attribute name or attribute order.

Find: \ssrc="([^"]+)(?:.*?)data-lazy-src="([^"]+)"
Replace: src="$2"
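
To preview the effect of such a rule on a saved copy of the source HTML, a rough command-line approximation can help. The sketch below uses sed, which has no lazy quantifier, so the pattern is rewritten with `[^>]*` to stay inside one tag. It is an assumption-laden stand-in for experimenting, not the regex engine the plugin uses, and the function name `fix_lazy_src` is hypothetical.

```shell
# Rough sed approximation of the rule (not the plugin's own regex engine):
# replace the placeholder src with the data-lazy-src value inside each tag.
fix_lazy_src() {
    sed -E 's/[[:space:]]src="[^"]+"[^>]*data-lazy-src="([^"]+)"/ src="\1"/g'
}
```

For example, `fix_lazy_src < page.html` prints the page with the real image URLs moved into “src”. The replacement starts with a space because the pattern also consumes the whitespace in front of “src”, and without it the attribute would be glued to the preceding text.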

Images are downloaded even though the “Download images to media library” option is not checked.

You encounter this problem if you leave the “Download images to media library” option in the task’s content area unchecked because you do not want to download the source site’s images to your own server, but you still define the featured image field. By default, featured images must come from the media library of your own WordPress site, unlike content images, which can be displayed remotely without being downloaded to your server. You can use the Featured Image from URL plugin to resolve this issue.

After enabling this free, third-party WordPress plugin, leave the featured image field undefined (if it was filled in automatically, delete its value) so that nothing is downloaded. Instead, define a “Custom field” with the settings below to display the image remotely. The XPath below matches the featured image that most source sites expose automatically; you can edit it to match the image you want.

Name: fifu_image_url
Value: //meta[@property="og:image"]
Attribute: content

Translate is not working.

You may encounter this problem if the translation service blocks the IP address of your server to prevent excessive requests. Translation is provided free of charge through the unofficial API of “Google Translate”, which is normally a paid service, so you can try the solutions given in the “The visual selector shows a blank (white) screen” section to resolve this situation. You can also search your logs for “Google translate service http error” to confirm that you are making excessive requests.

Tracking log file.

Each task records the operations it performs, together with its identifier number (ID), in the file “logs.txt” under the “/wp-content/plugins/ol_scrapes/logs” folder. You can open this log file with any text editor to follow the plugin’s task step by step, or to quickly detect a problem you may have encountered.

For each operation, the server system time, the task identifier number (Scrapes Task ID), the process identifier number (PHP process ID) and the current RAM usage are saved as a new line in the log file. You can examine the logs of a given task by searching for “TASK ID: (Enter task ID found on dashboard)”. Other keywords you can search for, and their descriptions, are as follows.

  • “Number of posts” indicates how many posts have been processed in total.
  • “Number of links” shows how many detail page links were found on the listing page.
  • “Before regex” and “After regex” show the content before and after processing, if a “Find and replace” rule is defined in the task.
  • “Post updated” and “Post inserted” indicate that the post has been updated or added when the “Update post” option is active.
  • “Repeat count” indicates how many times the task has re-encountered content on the source site that it had already captured.
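
These searches can also be run from the command line. The sketch below filters the log by task and counts how often a keyword appears. It assumes only that each log line contains the “TASK ID: ” text described above, and the helper names `task_log` and `count_task_keyword` are hypothetical.

```shell
# Hypothetical helper: print every log line that belongs to one task.
# The trailing group keeps task 7 from also matching task 70.
task_log() {
    grep -E "TASK ID: ${2}([^0-9]|\$)" "$1"
}

# Hypothetical helper: count the lines of one task that contain a keyword,
# matching the keyword as a fixed string.
count_task_keyword() {
    task_log "$1" "$2" | grep -c -F -- "$3"
}
```

For example, `count_task_keyword wp-content/plugins/ol_scrapes/logs/logs.txt 7 "Post inserted"` prints how many posts task 7 has added according to the log.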
