3.5. Search engine utilization
Used well, search engines (Google, Bing, Yahoo, Baidu, etc.) can reveal additional information about the target site.
3.5.1. Search engine processing flow
- Data preprocessing
  - Length truncation
  - Case conversion
  - Punctuation removal
  - Simplified/Traditional Chinese conversion
  - Number normalization (Chinese numerals, Arabic numerals, Roman numerals)
  - Synonym rewriting
  - Pinyin rewriting
- Processing
  - Word segmentation (tokenization)
  - Keyword extraction
  - Filtering of illegal content
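The flow above can be illustrated with a toy Python pipeline. This is a sketch only, not how Google, Bing, or Baidu actually implement query processing: the tables (`CHINESE_DIGITS`, `SYNONYMS`, `STOPWORDS`, `BLOCKLIST`) and the `MAX_QUERY_LEN` limit are hypothetical, and Simplified/Traditional conversion and pinyin rewriting are left out because they need external dictionaries.

```python
import re
import string

# Hypothetical tables for illustration; real engines use large dictionaries
# and trained models rather than hand-written maps like these.
MAX_QUERY_LEN = 38  # assumed truncation limit
CHINESE_DIGITS = {"零": "0", "一": "1", "二": "2", "三": "3", "四": "4",
                  "五": "5", "六": "6", "七": "7", "八": "8", "九": "9"}
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10}
# Valid lowercase Roman numerals from 1 to 39 (kept small on purpose).
ROMAN_RE = re.compile(r"x{0,3}(ix|iv|v?i{0,3})")
SYNONYMS = {"backend": "admin", "login": "signin"}
STOPWORDS = {"the", "a", "of"}
BLOCKLIST = {"forbidden"}  # stand-in for an illegal-content filter


def roman_to_int(numeral: str) -> int:
    """Convert a lowercase Roman numeral, applying the subtractive rule."""
    total, prev = 0, 0
    for ch in reversed(numeral):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value
        else:
            total += value
            prev = value
    return total


def preprocess(query: str) -> list[str]:
    q = query[:MAX_QUERY_LEN]                     # length truncation
    q = q.lower()                                 # case conversion
    q = q.translate(str.maketrans("", "", string.punctuation))  # ASCII punctuation
    # Simplified/Traditional conversion and pinyin rewriting need external
    # dictionaries and are omitted from this sketch.
    q = "".join(CHINESE_DIGITS.get(ch, ch) for ch in q)  # digit-by-digit only
    keywords = []
    for tok in q.split():                         # naive whitespace segmentation
        if ROMAN_RE.fullmatch(tok):               # Roman numeral -> Arabic
            tok = str(roman_to_int(tok))          # (naive: the word "i" becomes "1")
        tok = SYNONYMS.get(tok, tok)              # synonym rewriting
        if tok in STOPWORDS or tok in BLOCKLIST:  # keyword extraction / filtering
            continue
        keywords.append(tok)
    return keywords
```

For example, `preprocess("Backend Login, Part IV of 二零二四!")` returns `['admin', 'signin', 'part', '4', '2024']`. Because case and punctuation are normalized away early, capitalization and punctuation in a search query usually make no difference to the results.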
3.5.2. Search Techniques
- site:www.hao123.com
Returns the target site's pages that the search engine has indexed
- site:www.hao123.com keyword
Returns indexed pages on the target site that contain the keyword
Useful keywords include admin backend, management console, change password, password recovery, and so on.
- site:www.hao123.com inurl:admin.php
Returns pages on the target site whose URL contains admin.php; substitute manage.php or similar keywords to locate key functional pages
- link:www.hao123.com
Returns pages that link to the target site, which may include the developers' personal blogs and development logs, or pages of third-party companies and partners involved with the site
- related:www.hao123.com
Returns pages that are "similar" to the target site; these may reveal information about the off-the-shelf software the site is built on
- intitle:"500 Internal Server Error" "server at"
Finds server error pages, which may leak the server software and other internal details
- inurl:"nph-proxy.cgi" "Start browsing"
Finds publicly accessible CGI web proxies
Other operators include allintitle, allinurl, allintext, inanchor, intext, filetype, info, numrange, cache, and so on. Support varies by engine; Google, for example, no longer supports link: or cache:.
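The operators above are easy to combine programmatically. The following is a minimal Python sketch, assuming you only want query strings and Google search URLs to open by hand; the keyword lists and the function names `build_dorks` and `to_search_url` are illustrative, not part of any tool, and fetching or scraping result pages is not shown.

```python
from urllib.parse import quote_plus

# Example keywords taken from the techniques above; extend as needed.
SENSITIVE_KEYWORDS = ["admin", "manage", "change password", "forgot password"]
SENSITIVE_PAGES = ["admin.php", "manage.php"]


def build_dorks(domain: str) -> list[str]:
    """Return site-scoped queries combining the operators described above."""
    dorks = [f"site:{domain}"]
    for kw in SENSITIVE_KEYWORDS:
        # Quote multi-word keywords so they are searched as one phrase.
        term = f'"{kw}"' if " " in kw else kw
        dorks.append(f"site:{domain} {term}")
    for page in SENSITIVE_PAGES:
        dorks.append(f"site:{domain} inurl:{page}")
    dorks.append(f'site:{domain} intitle:"500 Internal Server Error"')
    return dorks


def to_search_url(query: str, base: str = "https://www.google.com/search?q=") -> str:
    # quote_plus turns spaces into '+' and percent-encodes ':' and '"'.
    return base + quote_plus(query)


if __name__ == "__main__":
    for dork in build_dorks("www.hao123.com"):
        print(to_search_url(dork))
```

Keeping the query strings separate from the URL builder makes it simple to point the same dorks at another engine by changing `base`.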
3.5.2.1. Wildcards
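- The asterisk (*) stands for any single word
- OR or | denotes logical OR
- A + placed before a word forces that word to be matched
- A - placed before a word excludes results containing that keyword
- Double quotes (") around keywords emphasize them, matching the exact phrase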
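As a sketch of how these operators fit together, the hypothetical helper below assembles a query string from phrase, required, alternative, forced, and excluded terms. It only builds text; how a given engine interprets the result (in particular the + operator, which not every engine still honors) is an assumption to verify against that engine. Parentheses are deliberately not emitted, since they are ignored.

```python
from typing import Iterable, Optional


def compose_query(
    all_of: Iterable[str] = (),
    any_of: Iterable[str] = (),
    none_of: Iterable[str] = (),
    forced: Iterable[str] = (),
    phrase: Optional[str] = None,
) -> str:
    """Build a search string from the operators described above."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')           # double quotes: exact phrase
    parts.extend(all_of)                      # plain terms are ANDed by default
    alternatives = list(any_of)
    if alternatives:
        parts.append(" OR ".join(alternatives))  # logical OR between terms
    parts.extend(f"+{t}" for t in forced)     # '+' forces a term to match
    parts.extend(f"-{t}" for t in none_of)    # '-' excludes a term
    return " ".join(parts)
```

For example, `compose_query(phrase="index of", all_of=["site:example.com"], any_of=["backup", "bak"], none_of=["test"])` yields `"index of" site:example.com backup OR bak -test`, and a wildcard can be placed inside a phrase, as in `compose_query(phrase="password * reset")`.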
3.5.2.2. Tips
- Queries are case-insensitive
- Parentheses are ignored
- Terms are combined with AND logic by default
3.5.3. Snapshot
Search engine snapshots (cached copies of pages) often contain sensitive information, such as program error messages that leak the site's absolute path, or test data that was only briefly online. For example, while a site is developing an admin module, some of its pages may not yet be behind access control; if the search engine crawls them at that moment, the snapshot keeps that content even after the site later adds authorization checks.
Dedicated archiving services also keep snapshots of sites, such as the Wayback Machine on Archive.org.
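The Wayback Machine exposes its index through a public CDX API. The Python sketch below assumes the endpoint `https://web.archive.org/cdx/search/cdx` and its `url`, `output`, `fl`, `collapse`, and `limit` parameters behave as documented; it lists archived URLs under a domain so that old pages (for example, admin pages that later gained access control) can be reviewed. The function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_url(domain: str, limit: int = 50) -> str:
    params = {
        "url": f"{domain}/*",                  # every archived URL under the domain
        "output": "json",
        "fl": "timestamp,original,statuscode",  # fields to return
        "collapse": "urlkey",                  # one row per distinct URL
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx(body: str) -> list[dict]:
    """With output=json the first row holds the field names."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, rec)) for rec in records]


def snapshot_link(rec: dict) -> str:
    # Replay URL for one archived capture.
    return f"https://web.archive.org/web/{rec['timestamp']}/{rec['original']}"


def fetch_snapshots(domain: str, limit: int = 50) -> list[dict]:
    # Network call; the CDX server may rate-limit or time out.
    with urlopen(cdx_url(domain, limit), timeout=30) as resp:
        return parse_cdx(resp.read().decode("utf-8"))
```

Calling `fetch_snapshots("example.com")` and passing each record to `snapshot_link` gives links to archived captures; filtering on `statuscode` or on paths such as `admin` narrows the list to pages worth a closer look.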
3.5.4. GitHub
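GitHub may expose leaked source code, AccessKeys, passwords, server configurations, and more. Common searches are listed below, where @example.com stands for the target's email domain and each slash-separated term is a separate keyword: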
- @example.com password/pass/pwd/secret/credentials/token
- @example.com username/user/key/login/ftp
- @example.com config/ftp/smtp/pop
- @example.com security_credentials/connectionstring
- @example.com JDBC/ssh2_auth_password/send_keys
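Each slash-separated line above is shorthand for several searches. The sketch below expands the patterns into individual query strings and builds GitHub web code-search URLs of the form `https://github.com/search?q=...&type=code`; `@example.com` is a placeholder for the target's domain, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# The patterns listed above, one slash-separated group per entry.
PATTERNS = [
    "password/pass/pwd/secret/credentials/token",
    "username/user/key/login/ftp",
    "config/ftp/smtp/pop",
    "security_credentials/connectionstring",
    "JDBC/ssh2_auth_password/send_keys",
]


def expand(marker: str, patterns: list[str] = PATTERNS) -> list[str]:
    """Turn each slash-separated group into one query per keyword."""
    seen = set()
    queries = []
    for pattern in patterns:
        for term in pattern.split("/"):
            term = term.strip()
            if not term:
                continue
            query = f"{marker} {term}"
            if query not in seen:  # "ftp" appears in two groups
                seen.add(query)
                queries.append(query)
    return queries


def github_code_search_url(query: str) -> str:
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})


if __name__ == "__main__":
    for q in expand("@example.com"):
        print(github_code_search_url(q))
```

Deduplicating keeps the list short when groups overlap; the five patterns above expand to 19 distinct queries.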