
Penetration Testing: Search Engine based Reconnaissance

October 20, 2022 | By Accorian

Written by Vivek Jaiswal II 

Reconnaissance is an essential phase of penetration testing that comes before actively testing targets for vulnerabilities.

It helps you widen the scope and attack surface and uncover potential vulnerabilities. Many open-source and proprietary automated tools are already available for performing reconnaissance or scanning a host or application for vulnerabilities during a penetration test. However, a manual, methodical approach is what gives you an actual understanding of the backend technology and its workflow, and helps you uncover potential vulnerabilities.

A basic reconnaissance includes:

  • Subdomain Enumeration
  • Directory Enumeration
  • Port Scanning
  • Search Engine based recon
  • GitHub recon
  • Shodan recon
  • Enumerating backend technologies
  • Wayback Machine history
  • and so on


In this article, I will demonstrate how simple search engine based reconnaissance helped me identify a security vulnerability, SQL injection (SQLi), that can lead to dumping the entire database.

While I was recently working on an external network penetration testing project, I started, as usual, with a basic reconnaissance approach.

As this was a network penetration test, I began with a basic port scan and service enumeration. Once the scans were complete, I found port 80/TCP open, serving a web page over HTTP. I visited the site and found that it had no features or functionality, only a static error page.
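The port check described above can be illustrated with a minimal Python sketch. It is not the scanner used in the engagement (the article does not name one); it only shows what "port 80/TCP is open" means at the socket level. The function names are hypothetical, and it should only be pointed at hosts that are in scope.

```python
import socket
from typing import Iterable, List


def is_tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a full TCP connection to host:port succeeds."""
    try:
        # create_connection performs the TCP handshake; the context manager
        # closes the socket again as soon as the check is done.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def open_ports(host: str, ports: Iterable[int], timeout: float = 2.0) -> List[int]:
    """Return the subset of ports on host that accept a TCP connection."""
    return [port for port in ports if is_tcp_port_open(host, port, timeout)]
```

For example, `open_ports("target.example", [22, 80, 443])` would return the listed ports that accept connections on the placeholder host `target.example`. A dedicated scanner adds service detection and much faster scanning, which this sketch does not attempt.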


After this, I performed directory brute forcing using a common directory wordlist. You can find the payload list here.

I couldn’t find any valid directory or entry point with the common directory wordlist, so I moved on to another reconnaissance approach: search engine based information discovery.

Tip: It is always better to use custom directory names wherever possible, as they are difficult to guess and brute force.
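As a rough illustration of the directory brute forcing step, the sketch below turns a wordlist into candidate URLs and keeps those that do not return 404. The helper names, the `target.example` host, and the choice of HEAD requests are assumptions made for this example; the article does not say which tool or wordlist was used.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional, Tuple
from urllib.parse import quote


def candidate_urls(base_url: str, wordlist: Iterable[str]) -> List[str]:
    """Build one URL per wordlist entry, skipping blanks and '#' comments."""
    base = base_url.rstrip("/") + "/"
    urls = []
    for word in wordlist:
        word = word.strip().lstrip("/")
        if word and not word.startswith("#"):
            urls.append(base + quote(word))
    return urls


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status for a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except (urllib.error.URLError, OSError):
        return None


def find_paths(
    base_url: str,
    wordlist: Iterable[str],
    fetch_status: Callable[[str], Optional[int]] = http_status,
    ignore: Tuple[int, ...] = (404,),
) -> List[Tuple[str, int]]:
    """Return (url, status) pairs for candidates whose status is not ignored."""
    hits = []
    for url in candidate_urls(base_url, wordlist):
        status = fetch_status(url)
        if status is not None and status not in ignore:
            hits.append((url, status))
    return hits
```

With a wordlist file, `find_paths("http://target.example", open("wordlist.txt"))` would list the paths worth a closer look. As in the engagement described here, an empty result only means the wordlist did not contain the right names.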

Here comes the interesting part

First, what is Search Engine based discovery or reconnaissance in penetration testing?

Search engine based reconnaissance, in simple terms, is a method of extracting information about a target that is publicly available on the internet and stored in the databases of various search engines.

Search engines work in an automated fashion: they use software known as web crawlers that explore the web regularly to find pages to add to their indexes. In fact, the vast majority of pages listed in search results aren’t manually submitted for inclusion but are found and added automatically when the web crawlers explore the web.

Coming back to our scenario:

While I was trying to enumerate via Search Engine discovery, looking for information publicly disclosed over the Internet, I came across a very interesting directory. In this article, I will call it “/unique_directory”.
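The article does not show the exact queries used, so the following is only a sketch of how such searches are commonly assembled with the `site:`, `inurl:`, and `filetype:` operators. The function name, keywords, and file types are illustrative assumptions, and operator support differs between search engines.

```python
from typing import List, Sequence


def build_queries(
    domain: str,
    keywords: Sequence[str] = ("login", "admin"),
    filetypes: Sequence[str] = ("sql", "log"),
) -> List[str]:
    """Return search queries that restrict results to a single domain."""
    # Everything the engine has indexed for the domain.
    queries = [f"site:{domain}"]
    # Indexed URLs on the domain that contain an interesting keyword.
    queries += [f"site:{domain} inurl:{keyword}" for keyword in keywords]
    # Indexed files of a given type hosted on the domain.
    queries += [f"site:{domain} filetype:{filetype}" for filetype in filetypes]
    return queries
```

For the placeholder domain `target.example`, `build_queries("target.example")` starts with the plain `site:target.example` query. Reviewing the results of that query by hand is often enough to surface a directory that no wordlist would guess.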


I quickly accessed the URL and found a simple login page.


I then started to probe the login page for weaknesses that I could leverage and, in the process, sent a single quote (') as the username. The server responded with an error page, and the error message indicated that there was an unclosed quotation mark: the added single quote had made the backend query syntactically incorrect.
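To see why one quote is enough to break a query, here is a small, self-contained sketch. It assumes the backend builds its SQL by string concatenation, which the article does not confirm, and it uses an in-memory SQLite database as a stand-in, so the error wording differs from the message the real server returned. The table and function names are hypothetical.

```python
import sqlite3


def build_login_query(username: str) -> str:
    # Deliberately unsafe: the user input is pasted straight into the SQL text.
    return "SELECT id FROM users WHERE username = '" + username + "'"


def run_login_query(conn: sqlite3.Connection, username: str):
    """Run the concatenated query and report either rows or the DB error."""
    try:
        rows = conn.execute(build_login_query(username)).fetchall()
        return "ok", rows
    except sqlite3.Error as exc:
        return "error", str(exc)


# Stand-in database with a single user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES ('alice')")
```

`run_login_query(conn, "alice")` succeeds, while `run_login_query(conn, "'")` produces a query that ends in three quote characters, leaving the string literal unterminated, so the database rejects it with a syntax-level error.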

This behavior, together with the error message from the server, is an indication of a possible SQLi.

After numerous attempts at carefully crafting and recrafting payloads, I observed that the server revealed backend database information in the error message, which confirmed the presence of a SQLi vulnerability and identified the database server used in the backend.
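When testing many inputs, it can help to flag responses that contain database error text automatically. The sketch below checks a response body against a short, non-exhaustive list of well-known error fragments from common database servers. The list and the function name are assumptions for illustration, and a match is only a hint that needs manual confirmation.

```python
from typing import List

# Lowercase fragments of error messages commonly produced by database servers.
SQL_ERROR_SIGNATURES = (
    "unclosed quotation mark",               # Microsoft SQL Server
    "incorrect syntax near",                 # Microsoft SQL Server
    "you have an error in your sql syntax",  # MySQL / MariaDB
    "unterminated quoted string",            # PostgreSQL
    "quoted string not properly terminated", # Oracle
)


def matched_sql_errors(response_body: str) -> List[str]:
    """Return the known error fragments found in an HTTP response body."""
    lowered = response_body.lower()
    return [sig for sig in SQL_ERROR_SIGNATURES if sig in lowered]


def looks_like_sql_error(response_body: str) -> bool:
    """True if the body contains at least one known SQL error fragment."""
    return bool(matched_sql_errors(response_body))
```

An application that hides database errors behind a generic page will not trigger any of these strings, so the absence of a match does not rule out SQLi.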


Payload used: ') and (select CASE WHEN (substring(@@version,1,50))=1 THEN 1 ELSE 0 END )=1 and ('1'='1

Explanation of the payload:

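The CASE expression goes through its conditions in order and returns the value of the first condition that is met (similar to an if-then-else statement). Once a condition is true, it stops evaluating and returns the result. If no condition is true, it returns the value in the ELSE clause. In the payload, the condition compares the string returned by substring(@@version,1,50) with the integer 1. On database servers that try to convert the string to a number for this comparison, the conversion fails and the error message can echo the string being converted, which is likely how the version details ended up in the server's response. You can find details of the syntax here.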
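The "first matching condition wins" behavior of CASE can be tried out locally. The sketch below uses Python's built-in sqlite3 module purely as a convenient SQL engine; SQLite is not the database from the engagement and does not reproduce the error-based disclosure, so this only illustrates how CASE itself evaluates. The function name and labels are hypothetical.

```python
import sqlite3


def case_label(value: int) -> str:
    """Classify a number with a SQL CASE expression evaluated by SQLite."""
    sql = (
        "SELECT CASE "
        "WHEN :v > 10 THEN 'large' "    # checked first
        "WHEN :v > 0 THEN 'positive' "  # only reached if the first test fails
        "ELSE 'other' "                 # used when no condition is true
        "END"
    )
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql, {"v": value}).fetchone()[0]
    finally:
        conn.close()
```

For the value 50 both conditions are true, yet the result is `large`, because evaluation stops at the first condition that holds. A value such as -1 matches nothing and falls through to the ELSE branch.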

What are some of the preventive measures one should always consider to overcome such scenarios?

As part of your vulnerability or security management strategy, you need to continuously identify and remediate vulnerabilities in every small, medium, and large business application, because there is never a one-time or one-patch solution for vulnerabilities. Business applications, hosts, assets, and every piece of information posted online need to be audited and monitored regularly and in a timely fashion.

Hence, the best recommendation to overcome such scenarios is to carefully consider the sensitivity of design and configuration information before it is posted online, and to periodically scan what is already exposed.

A robots.txt file can also help keep sensitive pages, directories, or information from being automatically crawled by search engines and stored in their databases.

A robots.txt file tells search engine crawlers which URLs they can access on your site. It is used mainly to avoid overloading your site with requests. However, it is not a foolproof solution: crawlers follow it voluntarily, and a disallowed URL can still end up in an index if other pages link to it.
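To check what a given robots.txt actually tells crawlers, Python's standard `urllib.robotparser` module can evaluate the rules offline. The sketch below is a minimal example; the robots.txt content, the `target.example` host, and the helper name are assumptions, with `/unique_directory/` reused as the article's placeholder path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks all crawlers to stay out of one directory.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /unique_directory/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines,
    # so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Here `is_crawl_allowed(EXAMPLE_ROBOTS_TXT, "http://target.example/unique_directory/login")` is False for a compliant crawler. Keep in mind that robots.txt is itself publicly readable, so listing a sensitive path there also advertises that path to anyone who looks.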


You can also block search indexing with noindex: a noindex meta tag or HTTP response header prevents a page or other resource from showing up in Google Search. Regardless of whether other websites link to it, the page will be removed from Google Search results once Googlebot crawls it again and notices the tag or header.

Similarly, you can temporarily block search results from your site or manage SafeSearch filtering.
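A quick way to audit whether a page carries a noindex directive is to look at both places it can appear: the `X-Robots-Tag` response header and a robots `<meta>` tag in the HTML. The sketch below does this with the standard library only. The function and class names are hypothetical, and it deliberately ignores finer points such as crawler-specific prefixes in the header value.

```python
from html.parser import HTMLParser
from typing import List, Mapping


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers: Mapping[str, str], html: str) -> bool:
    """True if the response header or the HTML asks not to be indexed."""
    header_directives: List[str] = []
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_directives += [d.strip().lower() for d in value.split(",")]
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives
```

For example, `has_noindex({"X-Robots-Tag": "noindex"}, "")` returns True. A crawler can only honor the directive if it is allowed to fetch the page, so a URL that is also disallowed in robots.txt may never have its noindex seen.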

Details on this can be found here.


Every piece of information available on the internet could be sensitive and can be a potential entry point for attackers, which could later be escalated to dump the entire database.

Even if you expose sensitive information unintentionally, it will still be crawled and indexed into search engine databases, from which an attacker can easily extract and enumerate it via search engine based discovery or Google dorking.

Useful Tip: While performing search engine based reconnaissance, do not limit testing to just one search engine provider, as different search engines may generate different results. Results can vary depending on when the engine last crawled the content and on the algorithm it uses to determine relevant pages.

Every small, medium, and large application, host, API, or piece of information posted online should be thoroughly examined. It is highly recommended to regularly review the sensitivity of the information that current designs and configurations expose online.
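One low-effort way to follow this tip is to generate the same query as a search URL for several engines and open each in a browser. The sketch below assumes the public search endpoints of Google, Bing, and DuckDuckGo and their common `q` query parameter; the function name and the example query are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode

# Public search endpoints; each accepts the query in the "q" parameter.
SEARCH_ENDPOINTS = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
    "duckduckgo": "https://duckduckgo.com/",
}


def search_urls(query: str) -> Dict[str, str]:
    """Return one ready-to-open search URL per engine for the same query."""
    encoded = urlencode({"q": query})  # percent-encodes ':' and turns spaces into '+'
    return {name: f"{endpoint}?{encoded}" for name, endpoint in SEARCH_ENDPOINTS.items()}
```

Calling `search_urls("site:target.example inurl:login")` yields three URLs for the placeholder domain. Comparing the result pages side by side makes differences in crawl freshness and ranking between engines easy to spot.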
