{"id":2332,"date":"2021-11-01T11:35:12","date_gmt":"2021-11-01T02:35:12","guid":{"rendered":"https:\/\/sirius10.net\/blog\/wordpress\/?p=2332"},"modified":"2021-11-01T11:35:14","modified_gmt":"2021-11-01T02:35:14","slug":"post-2332","status":"publish","type":"post","link":"https:\/\/sirius10.net\/blog\/wordpress\/index.php\/2021\/11\/01\/2332\/","title":{"rendered":"October Access Log (CCBOT)"},"content":{"rendered":"\n<p>I looked at the Webalizer report for October.<\/p>\n\n\n\n<p>I had been checking the reports from time to time, so nothing new seems to need attention. 404 errors are also down to 2% or less.<\/p>\n\n\n\n<p>One bot caught my attention, so I looked into it. Its UA is CCBot\/2.0 (https:\/\/commoncrawl.org\/faq\/).<\/p>\n\n\n\n<p>CCBot appears to be operated by a non-profit organization. That seemed harmless enough, so I opened https:\/\/commoncrawl.org\/.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>What to do with the crawled content?<br>The crawl data is stored on Amazon\u2019s S3 service, allowing it to be bulk downloaded as well as directly accessed for map-reduce processing in EC2.<\/p><cite>http:\/\/commoncrawl.org\/big-picture\/frequently-asked-questions\/<\/cite><\/blockquote>\n\n\n\n<p>I asked Google to translate this into Japanese.<\/p>\n\n\n\n<p class=\"has-lightyellow-background-color 
has-background\">\u30af\u30ed\u30fc\u30eb\u3055\u308c\u305f\u30b3\u30f3\u30c6\u30f3\u30c4\u3092\u3069\u3046\u3059\u308b\u304b\uff1f<br><br>\u30af\u30ed\u30fc\u30eb\u30c7\u30fc\u30bf\u306fAmazon\u306eS3\u30b5\u30fc\u30d3\u30b9\u306b\u4fdd\u5b58\u3055\u308c\u308b\u305f\u3081\u3001\u4e00\u62ec\u30c0\u30a6\u30f3\u30ed\u30fc\u30c9\u3057\u305f\u308a\u3001EC2\u3067\u306emap-reduce\u51e6\u7406\u306e\u305f\u3081\u306b\u76f4\u63a5\u30a2\u30af\u30bb\u30b9\u3057\u305f\u308a\u3067\u304d\u307e\u3059\u3002<\/p>\n\n\n\n<p>Near the top of the page, it says the following.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>What is Common Crawl?<br>Common Crawl is a 501(c)(3) non-profit organization dedicated to providing a copy of the internet to internet researchers, companies and individuals at no cost for the purpose of research and analysis.<\/p><cite>http:\/\/commoncrawl.org\/big-picture\/frequently-asked-questions\/<\/cite><\/blockquote>\n\n\n\n<p class=\"has-lightyellow-background-color has-background\">\u30b3\u30e2\u30f3\u30af\u30ed\u30fc\u30eb\u3068\u306f\u4f55\u3067\u3059\u304b\uff1f<br>Common Crawl\u306f\u3001501\uff08c\uff09\uff083\uff09\u306e\u975e\u55b6\u5229\u56e3\u4f53\u3067\u3042\u308a\u3001\u8abf\u67fb\u3068\u5206\u6790\u3092\u76ee\u7684\u3068\u3057\u3066\u3001\u30a4\u30f3\u30bf\u30fc\u30cd\u30c3\u30c8\u306e\u30b3\u30d4\u30fc\u3092\u30a4\u30f3\u30bf\u30fc\u30cd\u30c3\u30c8\u306e\u7814\u7a76\u8005\u3001\u4f01\u696d\u3001\u500b\u4eba\u306b\u7121\u6599\u3067\u63d0\u4f9b\u3059\u308b\u3053\u3068\u3092\u76ee\u7684\u3068\u3057\u3066\u3044\u307e\u3059\u3002<\/p>\n\n\n\n<p>\u3000<span style=\"background-color: #ffff00\" 
class=\"background-color\">providing a copy of the internet to internet researchers, companies and individuals at no cost<\/span><\/p>\n\n\n\n<p>Hmm. So the data ends up in the hands of spammers and hackers. It feels the same as SHODAN. I do put a © mark on every page, though.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-wp-embed is-provider-sirius-\u306e\u30d6\u30ed\u30b0 wp-block-embed-sirius-\u306e\u30d6\u30ed\u30b0\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"a7IOQRPhyS\"><a href=\"https:\/\/sirius10.net\/blog\/wordpress\/index.php\/2021\/07\/07\/1609\/\">Port scan (SHODAN)<\/a><\/blockquote><iframe loading=\"lazy\" class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; clip: rect(1px, 1px, 1px, 1px);\" title=\"&#8220;Port scan (SHODAN)&#8221; &#8212; Sirius \u306e\u30d6\u30ed\u30b0\" src=\"https:\/\/sirius10.net\/blog\/wordpress\/index.php\/2021\/07\/07\/1609\/embed\/#?secret=1HIhDcznaq#?secret=a7IOQRPhyS\" data-secret=\"a7IOQRPhyS\" width=\"500\" height=\"282\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><figcaption>A nuisance project<\/figcaption><\/figure>\n\n\n\n<p>Well, services such as Web \u9b5a\u62d3 and the Wayback Machine already exist, so I accept that the saving of web copies cannot be prevented and simply cannot be helped. This one, however, gathers data like carpet bombing and hands out the whole lot. Yes, it is a nuisance.<\/p>\n\n\n\n<p>Fortunately, it appears to honor robots.txt, so I added the following.<\/p>\n\n\n\n<p class=\"file\">User-agent: CCBot<br>Disallow: \/<\/p>\n\n\n\n<p>Beyond that, nothing else seems to stand out.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I looked at the Webalizer report for October. I had been checking the reports from time to time, so nothing new seems to need attention. 404 errors are also down to 2% or less. One bot caught my attention, so I looked 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19],"tags":[],"class_list":["post-2332","post","type-post","status-publish","format-standard","hentry","category-web"],"_links":{"self":[{"href":"https:\/\/sirius10.net\/blog\/wordpress\/index.php\/wp-json\/wp\/v2\/posts\/2332","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sirius10.net\/blog\/wordpress\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sirius10.net\/blog\/wordpress\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sirius10.net\/blog\/wordpress\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sirius10.net\/blog\/wordpress\/index.php\/wp-json\/wp\/v2\/comments?post=2332"}],"version-history":[{"count":2,"href":"https:\/\/sirius10.net\/blog\/wordpress\/index.php\/wp-json\/wp\/v2\/posts\/2332\/revisions"}],"predecessor-version":[{"id":2334,"href":"https:\/\/sirius10.net\/blog\/wordpress\/index.php\/wp-json\/wp\/v2\/posts\/2332\/revisions\/2334"}],"wp:attachment":[{"href":"https:\/\/sirius10.net\/blog\/wordpress\/index.php\/wp-json\/wp\/v2\/media?parent=2332"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sirius10.net\/blog\/wordpress\/index.php\/wp-json\/wp\/v2\/categories?post=2332"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sirius10.net\/blog\/wordpress\/index.php\/wp-json\/wp\/v2\/tags?post=2332"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}