3.5. Search engine utilization

Used appropriately, search engines (Google, Bing, Yahoo, Baidu, etc.) can reveal additional information about the target site.

3.5.1. Search engine processing flow

  • data preprocessing
    • length truncation

    • case conversion

    • remove punctuation

    • Simplified/Traditional Chinese conversion

    • number normalization (Chinese numerals, Arabic numerals, Roman numerals)

    • synonym rewriting

    • Pinyin rewriting

  • processing
    • word segmentation (tokenization)

    • keyword extraction

    • filtering of prohibited content
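
The flow above helps explain why search queries ignore case and punctuation. As a rough illustration only, the sketch below applies a few of the listed steps (length truncation, case conversion, punctuation removal, and a naive whitespace split standing in for word segmentation). The function name preprocess_query and the 64-character limit are assumptions made for this example; real engines use their own limits and far more elaborate, dictionary-based steps (Simplified/Traditional conversion, synonym and Pinyin rewriting), which are omitted here.

```python
import string
from typing import List

# Hypothetical cut-off chosen for this sketch; real engines use their own limits.
MAX_QUERY_CHARS = 64

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def preprocess_query(raw: str) -> List[str]:
    """Apply a simplified version of the preprocessing steps to a raw query."""
    truncated = raw[:MAX_QUERY_CHARS]              # length truncation
    lowered = truncated.lower()                    # case conversion
    cleaned = lowered.translate(_PUNCT_TO_SPACE)   # remove punctuation
    return cleaned.split()                         # naive stand-in for word segmentation


print(preprocess_query("Admin-Panel, LOGIN!"))  # ['admin', 'panel', 'login']
```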

3.5.2. Search Techniques

  • site:www.hao123.com
    • Returns everything the search engine has indexed for the target site

  • site:www.hao123.com keyword
    • Returns all indexed pages on the target site that contain the keyword

    • The keyword can be, for example, site backend, admin panel, password change, password recovery, etc.

  • site:www.hao123.com inurl:admin.php
    • Returns all pages on the target site whose URL contains admin.php. Use admin.php, manage.php, or similar keywords to find key function pages

  • link:www.hao123.com
    • Returns all pages that link to the target site, which may include developers' personal blogs, development logs, or third-party companies and partners involved with the site

  • related:www.hao123.com
    • Returns all pages that are "similar" to the target site, which may reveal information about generic software the site uses, etc.

  • intitle:"500 Internal Server Error" "server at"
    • Finds server error pages

  • inurl:"nph-proxy.cgi" "Start browsing"
    • Finds proxy servers

In addition to the operators above, there are allintitle / allinurl / allintext / inanchor / intext / filetype / info / numrange / cache, and so on.
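
When the same set of queries is run against many targets, it can help to generate them rather than type them by hand. The sketch below is one possible way to do that. The helper name site_queries and its parameters are made up for this example, and it only builds query strings (it does not contact any search engine).

```python
from typing import Iterable, List


def site_queries(domain: str,
                 keywords: Iterable[str] = (),
                 url_fragments: Iterable[str] = ()) -> List[str]:
    """Build site-restricted queries in the style of the examples above."""
    queries = [f"site:{domain}"]                                   # everything indexed for the site
    queries += [f"site:{domain} {kw}" for kw in keywords]          # pages containing a keyword
    queries += [f"site:{domain} inurl:{frag}" for frag in url_fragments]  # URL contains a fragment
    return queries


for query in site_queries("www.hao123.com",
                          keywords=["password retrieval"],
                          url_fragments=["admin.php", "manage.php"]):
    print(query)
```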

3.5.2.1. Wildcards

  • * stands for any single word

  • OR or | stands for logical OR

  • a + before a word forces that word to be included

  • a - before a word excludes that keyword

  • " " (double quotes) match the enclosed phrase exactly

3.5.2.2. Tips

  • Queries are not case sensitive

  • Parentheses are ignored

  • Terms are combined with logical AND by default
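
The operators and defaults above can be combined into a single query. The following is a minimal sketch, assuming small hypothetical helpers (exact, any_of, exclude, build) that exist only for this example. Terms are joined with spaces because the engine applies AND by default, and no parentheses are emitted because they are ignored.

```python
def exact(phrase: str) -> str:
    """Wrap a phrase in double quotes for an exact match."""
    return f'"{phrase}"'


def any_of(*terms: str) -> str:
    """Join alternatives with OR (no parentheses, since they are ignored)."""
    return " OR ".join(terms)


def exclude(term: str) -> str:
    """Prefix a term with - to exclude it from the results."""
    return f"-{term}"


def build(*parts: str) -> str:
    """Join parts with spaces; the engine treats the result as a logical AND."""
    return " ".join(parts)


query = build("site:www.hao123.com",
              any_of("login", "admin"),
              exclude("help"),
              exact("password retrieval"))
print(query)  # site:www.hao123.com login OR admin -help "password retrieval"
```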

3.5.3. Snapshot

Search engine snapshots (cached copies) often contain key information, such as program error messages that leak the site's file paths, or test data left over from development. For example, while a site's admin module is being developed, access control may not yet have been added to every page. If a search engine captures a snapshot at that moment, the information remains in the snapshot even after the site adds access control later.

There are also dedicated archive sites that provide snapshots, such as the Wayback Machine on Archive.org.
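
Snapshot lookups can also be scripted. The sketch below assumes the Wayback Machine's public availability endpoint (https://archive.org/wayback/available) and its JSON layout with an archived_snapshots / closest entry; check the current Internet Archive documentation before relying on either. The function names are made up for this example, and the sample response is illustrative data, not a real capture.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(target: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL; timestamp (YYYYMMDD...) asks for the closest capture."""
    params = {"url": target}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the snapshot URL from a parsed response, or None if nothing is archived."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def fetch_closest_snapshot(target: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Perform the network request (not called in this sketch)."""
    with urlopen(availability_url(target, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))


# Illustrative response in the assumed layout; the values are made up.
sample = {"archived_snapshots": {"closest": {
    "available": True,
    "url": "http://web.archive.org/web/20200101000000/http://example.com/",
    "timestamp": "20200101000000",
    "status": "200",
}}}
print(availability_url("example.com", "20200101"))
print(closest_snapshot(sample))
```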

3.5.4. GitHub

GitHub repositories may contain leaked source code, AccessKeys, passwords, server configuration, etc. Common search techniques are:

  • @example.com password/pass/pwd/secret/credentials/token

  • @example.com username/user/key/login/ftp/

  • @example.com config/ftp/smtp/pop

  • @example.com security_credentials/connectionstring

  • @example.com JDBC/ssh2_auth_password/send_keys
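
Each line above is shorthand for several separate searches (one per keyword). As a rough sketch, the combinations can be expanded into search URLs as follows. The helper name github_search_urls is hypothetical, and the github.com/search?q=...&type=code URL form is an assumption about the GitHub web interface that may change; code search on GitHub may also require being signed in.

```python
from typing import Iterable, List
from urllib.parse import quote_plus


def github_search_urls(domain: str, keywords: Iterable[str]) -> List[str]:
    """Expand '@domain keyword' pairs into GitHub code-search URLs."""
    urls = []
    for keyword in keywords:
        query = f"@{domain} {keyword}"   # e.g. "@example.com password"
        urls.append(f"https://github.com/search?q={quote_plus(query)}&type=code")
    return urls


for url in github_search_urls("example.com", ["password", "secret", "token"]):
    print(url)
```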