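Database: ことせかいWebページ読み込み用情報 (Web page import information for ことせかい)

This data is used by ことせかい, a text-to-speech reading app.
To import novels that can be read on the Web, each item describes XPath expressions that point to the location of the text, the link to the next page, and so on.
Several fields have almost the same meaning as in Autopagerize.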

See the following URL for details about ことせかい.
http://limura.github.io/NovelSpeaker/

The general policy for each field is described below.

url
A regular expression that matches the target URLs.
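As a rough illustration, the check can be sketched in Python with the standard `re` module. The pattern is the one from the Xfolio item in this database; the function name `matches_entry` is hypothetical, and using `re.search` is an assumption about how the app applies the pattern.

```python
import re

# Pattern taken from the Xfolio item; the leading ^ anchors it to the start of the URL.
URL_PATTERN = r"^https://xfolio\.jp/portfolio/[^/]+/(series|works)/\d+"


def matches_entry(url: str, pattern: str = URL_PATTERN) -> bool:
    """Return True if the item's url regex hits the given page URL (hypothetical helper)."""
    return re.search(pattern, url) is not None


if __name__ == "__main__":
    print(matches_entry("https://xfolio.jp/portfolio/A_Kusaue/works/473019"))
    print(matches_entry("https://xfolio.jp/portfolio/A_Kusaue/"))
```

Escaping the dots (`\.`) and anchoring with `^` keeps the pattern from hitting unrelated URLs that merely contain the site name.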

pageElement
An XPath that selects the elements containing the text to be read aloud. If the XPath hits multiple elements, all of them are extracted.
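The "all hits are extracted" behavior can be sketched as follows. This is only an approximation: it uses Python's `xml.etree.ElementTree`, which needs well-formed markup, supports only a subset of XPath (no `contains()`), and requires paths relative to the root (`.//`). The sample page and the helper name `extract_page_elements` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Made-up, well-formed sample page.
PAGE = """<html><body>
<div class="content-body"><p>First paragraph.</p><p>Second paragraph.</p></div>
<div class="sidebar"><p>Unrelated.</p></div>
</body></html>"""


def extract_page_elements(xhtml: str, xpath: str) -> List[str]:
    """Return the text of every element hit by the XPath, in document order."""
    root = ET.fromstring(xhtml)
    return ["".join(element.itertext()) for element in root.findall(xpath)]


if __name__ == "__main__":
    for text in extract_page_elements(PAGE, ".//div[@class='content-body']/p"):
        print(text)
```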

nextLink
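An XPath that selects the a tag leading to the next page, if the text continues on another page. If it hits multiple elements, only the first one is used.
This field is required in Autopagerize but optional here. Because of its purpose, Autopagerize does not list sites that put everything on a single page, so such sites need to be registered here. Also, when Autopagerize defines a nextLink that leads to an article on a different subject, registering the site here without a nextLink can keep those unrelated articles from being loaded.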
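A minimal sketch of the "first hit only, and optional" rule, again using `xml.etree.ElementTree` as a stand-in for a full XPath engine. The sample page, the example.com URLs, and the helper name `find_next_url` are assumptions for illustration; resolving a relative href against the page URL is also an assumption about what the app does.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urljoin

# Made-up sample page with two matching links.
PAGE = """<html><body>
<a class="next" href="2.html">Next</a>
<a class="next" href="3.html">Next (footer copy)</a>
</body></html>"""


def find_next_url(xhtml: str, base_url: str, next_link_xpath: str) -> Optional[str]:
    """Return the absolute URL of the first nextLink hit, or None if there is none."""
    if not next_link_xpath:
        # nextLink is optional in this database: an empty value means "no next page".
        return None
    root = ET.fromstring(xhtml)
    first = root.find(next_link_xpath)  # find() returns only the first match
    if first is None or not first.get("href"):
        return None
    return urljoin(base_url, first.get("href"))


if __name__ == "__main__":
    print(find_next_url(PAGE, "https://example.com/novel/1.html", ".//a[@class='next']"))
```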

title
An XPath that selects the element containing the string used as the name when the item is registered in the bookshelf.

subtitle
(2017/12/18: reserved for future use.) An XPath that selects the element from which the subtitle of each chapter of a novel can be extracted.

author
An XPath that selects the element containing the string used as the author name.

firstPageLink
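If the novel has a title page (one that does not contain the body text), this selects the a tag that leads from the title page to the first page of the body. If it hits multiple elements, only the first one is used.
Note: when firstPageLink hits, the linked URL is loaded and evaluated again, so the app can end up following firstPageLink targets forever. Write the firstPageLink XPath so that this does not happen.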
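When writing an item, one way to check for this loop is to follow the firstPageLink chain by hand or with a small script. The sketch below is a hypothetical checker, not something the app is known to do: `first_page_link_of` stands for "evaluate firstPageLink on this URL and return the hit, or None", and the hop limit of 10 is arbitrary.

```python
from typing import Callable, Optional


def resolve_first_page(
    start_url: str,
    first_page_link_of: Callable[[str], Optional[str]],
    max_hops: int = 10,
) -> str:
    """Follow firstPageLink hits until none is left; raise if the chain loops."""
    seen = {start_url}
    url = start_url
    for _ in range(max_hops):
        next_url = first_page_link_of(url)
        if next_url is None:
            return url  # no firstPageLink hit: this is a body page
        if next_url in seen:
            raise ValueError(f"firstPageLink loops back to {next_url}")
        seen.add(next_url)
        url = next_url
    raise ValueError("too many firstPageLink hops")


if __name__ == "__main__":
    # Made-up link graph: the title page points at chapter 1, which has no firstPageLink hit.
    links = {"https://example.com/title": "https://example.com/chapter/1"}
    print(resolve_first_page("https://example.com/title", links.get))
```

A common cause of the loop is an XPath that also hits on body pages, for example a table-of-contents link that appears on every chapter.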

tag
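If the page has elements that list string tags for the novel, specify them here. They are intended to be used as tags for features such as novel search in the future. In the targets written so far, tags are usually a elements that link to a tag page, and some tags contain spaces, so the plan is to import each such element as one tag. Please therefore select the a elements themselves where possible, rather than reducing them to text with an XPath such as a/text().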
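The reason for selecting the a elements themselves can be sketched like this: with one element per tag, a tag that contains a space stays intact. The markup is made up, `extract_tags` is a hypothetical helper, and `xml.etree.ElementTree` is only a stand-in for a full XPath engine.

```python
import xml.etree.ElementTree as ET
from typing import List

# Made-up tag list; each tag contains a space.
PAGE = """<ul class="tags__list">
<li><a class="tag__link" href="/tag/1">science fiction</a></li>
<li><a class="tag__link" href="/tag/2">short story</a></li>
</ul>"""


def extract_tags(xhtml: str, xpath: str) -> List[str]:
    """Treat each element hit by the XPath as exactly one tag."""
    root = ET.fromstring(xhtml)
    return ["".join(a.itertext()).strip() for a in root.findall(xpath)]


if __name__ == "__main__":
    print(extract_tags(PAGE, ".//li/a[@class='tag__link']"))
```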

isNeedHeadless
Set this to some value ("true" is recommended) when the body (pageElement) is generated only after JavaScript has run. If the value is "false", "False", "nil", "0", or "" (empty), the content fetched by a plain GET request is evaluated as-is.
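The rule above amounts to "everything is true except a short list of values". A minimal sketch, where the function name is hypothetical and treating a missing field like an empty one is an assumption:

```python
from typing import Optional

# The values the description lists as meaning "no headless browser needed".
FALSY_VALUES = {"false", "False", "nil", "0", ""}


def is_need_headless(value: Optional[str]) -> bool:
    """Interpret the isNeedHeadless field of an item."""
    if value is None:
        # Assumption: an item without the field behaves like an empty value.
        return False
    return value not in FALSY_VALUES


if __name__ == "__main__":
    for raw in ["true", "yes", "false", "0", ""]:
        print(repr(raw), is_need_headless(raw))
```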

nextButton
When isNeedHeadless is enabled, the first element matched here is treated as the link to the next page and .click() is called on it. This field is written as a CSS selector, not an XPath (XPath would be preferable, but...). nextButton is evaluated before nextLink: when both exist, nextButton takes priority.

firstPageButton
When isNeedHeadless is enabled, the first element matched here is treated as the link to the first page of the body and .click() is called on it. This field is written as a CSS selector, not an XPath (XPath would be preferable, but...). firstPageButton is evaluated before firstPageLink: when both exist, firstPageButton takes priority.

forceClickButton
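(Experimental; may change in the future.) When isNeedHeadless is enabled and an element matched here exists, .click() is called on it and the content is then evaluated again. Use this when, for example, a specific button has to be pressed before the page can proceed.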

waitSecondInHeadless
When isNeedHeadless is enabled, the app waits the number of seconds given here after the page appears to have finished loading, and then evaluates innerHTML.

injectStyle
A style that is forcibly applied to the HTML extracted by pageElement. The extracted HTML may have no style applied: the path may differ, the extracted HTML may carry no style itself, or the style may live in a separate file that a single GET request does not fetch. When the content relies on something like white-space:pre-wrap;, line breaks and similar whitespace are then lost in the HTML-to-String conversion. This field works around that problem. For example, if pageElement is known to extract <div class="content xxx yyy">...</div>, a value such as "div.content{white-space:pre-wrap;}" is expected.
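One way to picture the effect is to prepend a style element to the extracted fragment before it is converted to text. How the app actually applies the style is not stated here, so the helper `apply_inject_style` below is purely illustrative.

```python
def apply_inject_style(fragment_html: str, inject_style: str) -> str:
    """Prepend the injectStyle value as a <style> element (illustrative only)."""
    if not inject_style:
        return fragment_html  # empty injectStyle: leave the fragment untouched
    return f"<style>{inject_style}</style>{fragment_html}"


if __name__ == "__main__":
    fragment = '<div class="content xxx yyy">line one\nline two</div>'
    print(apply_inject_style(fragment, "div.content{white-space:pre-wrap;}"))
```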

exampleUrl
A URL that the item targets, used later for checking that the item works. Separate multiple URLs with a half-width space.
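A simple check of an item is to split exampleUrl on half-width spaces and test each URL against the item's url pattern. The sketch below uses values from the WFS blog and 待ラノ items; the helper name `check_example_urls` is hypothetical.

```python
import re
from typing import Dict


def check_example_urls(url_pattern: str, example_url_field: str) -> Dict[str, bool]:
    """Map each space-separated example URL to whether the url pattern hits it."""
    urls = [u for u in example_url_field.split(" ") if u]
    return {u: re.search(url_pattern, u) is not None for u in urls}


if __name__ == "__main__":
    # WFS blog item: two example URLs in one field.
    print(check_example_urls(
        r"^https://www\.wfs\.games/blog/[^/]+/[^/]+/",
        "https://www.wfs.games/blog/company/infographics/2022/ "
        "https://www.wfs.games/blog/story/interview12/",
    ))
    # 待ラノ item: its memo says the url pattern was changed on purpose so that it does not match.
    print(check_example_urls(
        r"^https://lanobe\.jp/novel_/",
        "https://lanobe.jp/novel/6f169d9a94300e140ce73b4403884f44",
    ))
```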

memo
Write any notes you think should be left here. Japanese is fine.

Several fields are used to determine the next page or the first page, so their priority is listed separately here.
The order of priority is as follows:

1. forceClickButton
2. nextButton
3. firstPageButton
4. nextLink
5. firstPageLink

Lower numbers take priority.
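The priority order can be sketched as a simple first-hit lookup. Here `hits` is a hypothetical mapping from field name to whatever that field matched on the current page, and skipping the button fields when isNeedHeadless is off follows the field descriptions; the app's real control flow may differ.

```python
from typing import Dict, Optional, Tuple

PRIORITY = ["forceClickButton", "nextButton", "firstPageButton", "nextLink", "firstPageLink"]
# The button fields are only used when isNeedHeadless is enabled.
BUTTON_FIELDS = {"forceClickButton", "nextButton", "firstPageButton"}


def pick_navigation(hits: Dict[str, str], headless: bool) -> Optional[Tuple[str, str]]:
    """Return (field, target) for the highest-priority field that hit, or None."""
    for field in PRIORITY:
        if field in BUTTON_FIELDS and not headless:
            continue
        target = hits.get(field)
        if target:
            return field, target
    return None


if __name__ == "__main__":
    hits = {"nextLink": "/chapter/2", "nextButton": "a.navi--next"}
    print(pick_navigation(hits, headless=True))
    print(pick_navigation(hits, headless=False))
```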

ことせかい loads data using both this database and the Autopagerize database.
For a given URL, entries in this database take priority; the Autopagerize database is consulted only for URLs not listed here.
So if the information defined on the Autopagerize side is sufficient, it may not be necessary to register the site here again.
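The lookup order can be sketched as two passes over lists of items. The item dictionaries and example.com patterns below are made up, `find_site_info` is a hypothetical helper, and taking the first matching item within one database is an assumption.

```python
import re
from typing import Dict, List, Optional


def find_site_info(
    url: str,
    kotosekai_items: List[Dict[str, str]],
    autopagerize_items: List[Dict[str, str]],
) -> Optional[Dict[str, str]]:
    """Search this database first and fall back to Autopagerize only if nothing matches."""
    for items in (kotosekai_items, autopagerize_items):
        for item in items:
            if re.search(item["url"], url):
                return item
    return None


if __name__ == "__main__":
    kotosekai = [{"url": r"^https://example\.com/novel/", "pageElement": "//div[@id='body']"}]
    autopagerize = [{"url": r"^https://example\.com/", "pageElement": "//article"}]
    print(find_site_info("https://example.com/novel/1", kotosekai, autopagerize))
    print(find_site_info("https://example.com/blog/1", kotosekai, autopagerize))
```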

Last Update: 2024-04-16T12:03:19+09:00


1 - 50 / 350

Item List


Xfolio/小説 2024-04-16T12:03:19+09:00

limura(openid-provider-appspot-com)

pageElement	//div[@id='article--novel']
title	//section//div[contains(@class,'article')]//h2[contains(@class,'title')]/a[contains(@href,'/series/')]
waitSecondInHeadless	3.0
subtitle	//section//div[contains(@class,'article')]//h2[contains(@class,'detailInfo__title')]
nextButton	a.navi--next
forceClickButton
firstPageLink	//section//li/div[contains(@class,'title')]/a[contains(@href,'/works/') and contains(@class,'title')]
firstPageButton
memo
isNeedHeadless	true
nextLink
url	^https://xfolio\.jp/portfolio/[^/]+/(series|works)/\d+
tag	//div[contains(@class,'article--detailInfo')]//ul[@class='tags__list']/li/a[@class='tag__link']
author	//header//div[contains(@class,'creatorInfo__data')]/a[contains(@href,'/portfolio/')]
injectStyle
exampleUrl	https://xfolio.jp/portfolio/A_Kusaue/works/473019

last update: 2024-04-16T12:03:19+09:00

すまほん！！ 2024-04-14T12:08:17+09:00

limura(openid-provider-appspot-com)

pageElement	//div[contains(@class,'content-body')]/p
title	//article/div[@class='entry-header']/h1
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink
url	^https://smhn\.info/\d+
tag	//article//div[@class='the_tags']/a[contains(@href,'/tag/')]
author	//article/div[@class='entry-header']/div[@class='entry-meta']//a[contains(@href,'/author/')]
injectStyle
exampleUrl	https://smhn.info/202404-apple-m4-mac-rumors

last update: 2024-04-14T12:08:17+09:00

reddit 2024-02-12T13:01:37+09:00

limura(openid-provider-appspot-com)

pageElement	//div[@slot='text-body' or @slot='comment']
title	//h1
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless	true
nextLink
url	^https://www\.reddit\.com/r/[^/]+/comments/
tag	//faceplate-tracker/a[contains(@href,'flair_name')]
author
injectStyle
exampleUrl

last update: 2024-02-12T13:01:37+09:00

KAI-YOU.net 2024-01-10T18:57:44+09:00

limura(openid-provider-appspot-com)

pageElement	//div[@class='m-article-body']/div[contains(@class,'m-article-text')]/div[contains(@class,'m-article-text-main')]
title	//h1[@class='m-article-header-title']
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink
url	^https://kai-you\.net/article/\d+
tag	//div[@class='m-article-related-keyphrase-main']//li/a[contains(@href,'/word/')]|//div[@class='m-article-header']//div[contains(@class,'m-article-label')]/a
author	//div[@class='m-article-data']//li[@class='author']/a[contains(@href,'/author/')]
injectStyle
exampleUrl	https://kai-you.net/article/88607

last update: 2024-01-10T18:57:44+09:00

あにまん掲示板 2023-12-30T12:55:22+09:00

limura(openid-provider-appspot-com)

title	//h1/text()
pageElement	//article//div[contains(@class,'resbody')]/p|//article//div[@class='resheader']/span[@class='resnumber']
nextLink	//article//div[contains(@class,'resbody') and child::p[contains(text(),'次スレ')]][last()]/blockquote/a[contains(@href,'/board/')]|//article//li[@id='res195' or @id='res196' or @id='res197' or @id='res198' or @id='res199' or @id='res200']/div[contains(@class,'resbody') and descendant::a[contains(@href,'/board/')]][last()]/blockquote/a[contains(@href,'/board/')]
url	^https://bbs\.animanch\.com/board/\d+/
exampleUrl	https://bbs.animanch.com/board/2706799/

last update: 2023-12-31T12:29:00+09:00

genpaku.org/1984 2023-11-25T12:54:01+09:00

limura(openid-provider-appspot-com)

pageElement	//body/*[not(contains(@style,'text-align:center')) and not(@id='top') and not(@id='toc') and not(child::a[@href='#top']) and not(@id='footnotes')]
title	//title
waitSecondInHeadless
subtitle	//h1[@id='top']
nextButton
forceClickButton
firstPageLink	//h3/a
firstPageButton
memo
isNeedHeadless
nextLink	//a[@id='goto_nexttext']
url	^https://genpaku\.org/1984/
tag
author
injectStyle
exampleUrl	https://genpaku.org/1984/1984_1.html

last update: 2023-11-25T12:54:01+09:00

FF XIV 漆黒秘話 2023-09-09T17:16:49+09:00

limura(openid-provider-appspot-com)

title	//title
pageElement	//div[@id='contents']/div[@class='main']//div[@class='body']
subtitle	//div[@id='contents']/div[@class='main']//div[@class='header']/h2
memo	The tags are in "//meta[@name='keywords']/@content" but could not be imported (´・ω・`)
nextLink	//div[@id='contents']/div[@class='main']//div[@class='footer']//a[@class='link_next']
url	^https://jp\.finalfantasyxiv\.com/lodestone/special/tales_from_the_shadows/sidestory
exampleUrl	https://jp.finalfantasyxiv.com/lodestone/special/tales_from_the_shadows/sidestory_01/
author	//meta[@name='author']/@content

last update: 2023-09-09T17:18:37+09:00

電撃ノベコミ＋ 2023-08-04T22:57:41+09:00

limura(openid-provider-appspot-com)

title	//nav[@class='topicpath']/ol/li/a[contains(@href,'/novecomi/novel/') and not(@aria-current='page') and not(@href='https://dengekibunko.jp/novecomi/novel/')]
pageElement	//main/section[@class='container-board']//article/div/p
subtitle	//main/section//article/h1
firstPageLink	//article//section[child::h2]/section//li/a[contains(@href,'/novecomi/novel/') and contains(@href,'.html')]
nextLink	//main/section[@class='container-board']//ul/li/a[contains(@href,'/novecomi/novel/') and contains(@href,'.html') and child::*[1][self::span]]
url	^https://dengekibunko\.jp/novecomi/novel/\d+/
exampleUrl	https://dengekibunko.jp/novecomi/novel/16817330660924812675/
author	//article//div[child::h1]/p[1]

last update: 2023-08-04T23:23:25+09:00

coindeskjapan 2023-07-26T21:22:09+09:00

limura(openid-provider-appspot-com)

pageElement	//div[@class='article-body']/*[not(self::figure) and not(contains(@class,'justify-center')) and not(@class='credits')]
title	//h1
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink
url	^https://www\.coindeskjapan\.com/\d+/
tag	//footer/div/a[contains(@href,'/tag/')]
author	//article/header//a[contains(@href,'author/')]
injectStyle
exampleUrl	https://www.coindeskjapan.com/195238/

last update: 2023-07-26T21:22:09+09:00

プリ小説 2023-07-20T19:28:07+09:00

limura(openid-provider-appspot-com)

pageElement	//div[contains(@class,'contents__sentence')]
title	//app-chapter-detail//app-breadcrumb//ul/li/a[contains(@class,'item__body--even') and not(contains(@href,'/chapter/')) and not(contains(@href,'/list/')) and not(@href='/') and contains(@href,'/novel/')]
waitSecondInHeadless
subtitle	//div[contains(@class,'headline')]/*[contains(@class,'headline__title') or contains(@class,'headline__number')]
nextButton
forceClickButton
firstPageLink
firstPageButton	.button-read--first-chapter
memo
isNeedHeadless	true
nextLink	//a[@appanalytics='pager/footer-next/icon']
url	^https://novel\.prcm\.jp/novel/
tag	//ul[@class='novel-attribute']/li[contains(@class,'novel-attribute__item')]/text()|//ul[@class='novel-attribute']/li[contains(@class,'novel-attribute__item')]/span
author	//div[@class='novel-author']/a[contains(@href,'/user/') and @class='novel-author__name']|//section[contains(@class,'author-novels')]/app-author//a[contains(@href,'/user/')]//p[@class='author__name']
injectStyle
exampleUrl	https://novel.prcm.jp/novel/lUZLp2hP6AQCv0XpcQqH/chapter/4Lti0yVM2MnWIGUaEJ1I

last update: 2023-07-20T19:28:07+09:00

待ラノ 2023-07-20T14:50:06+09:00

limura(openid-provider-appspot-com)

waitSecondInHeadless	4.0
title	//h3[@class='n-title']
pageElement	//h3[not(@class)]|//div[@id='episodeBody']
nextButton	.episode-footer-right .episode-footer-text
subtitle	//h3[not(@class)]
firstPageLink	//div[contains(@class,'border-episodes')]//tr//div[contains(@class,'episode-list-item-title')]//a[contains(@href,'/novel/') and contains(@href,'/episode/')]
memo	After nextButton was pressed the page did not seem to move on to the next page, so the URL part has been changed so that it does not match.
isNeedHeadless	true
url	^https://lanobe\.jp/novel_/
exampleUrl	https://lanobe.jp/novel/6f169d9a94300e140ce73b4403884f44
author	//a[contains(@href,'/user/') and @target='_self']

last update: 2023-07-20T18:22:06+09:00

Privatter 2023-06-27T13:20:32+09:00

limura(openid-org-cn)

pageElement	//p[@class='honbun']
title	//h3[@class='lead']
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink
url	^https://privatter\.net/p/\d+
tag
author	//div[@class='panel-heading']/a[contains(@href,'/u/') and @class='panel-title']/b[1]
injectStyle
exampleUrl	https://privatter.net/p/10040264

last update: 2023-06-27T13:20:32+09:00

無料で読める大人のケータイ官能小説 2023-05-08T10:22:08+09:00

limura(openid-org-cn)

title	//div[@class='maincontent']//div[contains(@class,'center') and contains(@class,'f18')]
pageElement	//div[@class='maincontent']//div[@class='padding10']/div[@class='padding5' and @id='changeArea']
subtitle	//div[@class='maincontent']//div[@class='padding10']/div[@class='padding5' and not(@id)]
firstPageLink	//div[@class='maincontent']//ul[@class='viewstorybtns']/li[1]/a[contains(@href,'/viewstory/chapterlist/')]|//table//tr[1]//a[contains(@href,'/viewstory/page/')]
nextLink	//a[@class='next' and contains(@href,'/viewstory/page/')]
tag	//div[@class='viewstorytags']/a[contains(@href,'/search/searchword/') and @class='eachtag']
url	^https://kanno-novel\.jp/viewstory/(index|chapterlist|page)/\d+/
exampleUrl	https://kanno-novel.jp/viewstory/page/4617/1/?guid=ON
author	//div[@class='maincontent']//div[@class='storymeta']//a[contains(@href,'/viewuser/index/')]

last update: 2023-05-08T21:17:56+09:00

FC2ブログ(ラノベを目指してみよう) 2023-04-16T22:49:26+09:00

limura(openid-org-cn)

title	//td[@class='site_title']/b/a
pageElement	//td[@class='main_txt' and not(child::div[@class='td']) and not(child::form) and not(contains(text(),'トラックバック URL'))]
subtitle	//td[@class='entry_title']
memo	The URL part was so short that it was caught by the global FC2ブログ setting, so some dubious redundant notation has been added to the URL part.
nextLink	//div[@class='navi']/a[contains(@href,'/blog-entry-')][1]
url	^https?://nemuiyon.blog72\.fc2\.com/blog-(entry|entry)-\d+.html
exampleUrl	http://nemuiyon.blog72.fc2.com/blog-entry-2.html

last update: 2023-04-17T13:29:31+09:00

gihyo.jp/(article|admin) 2023-03-30T09:17:46+09:00

limura(openid-org-cn)

title	//h1[@class='main-title']
pageElement	//article[contains(@class,'article-main')]/*[not(@role='doc-footnote') and not(@class='sc-related-book') and not(self::figure)]
tag	//aside/ul[@class='article-category']/li/a[contains(@href,'/list/category/')]
url	^https://gihyo\.jp/(article|admin)/(\d+|clip|serial)/\d+/[^/]+
exampleUrl	https://gihyo.jp/article/2023/03/how-ai-image-generator-work-02
author	//a[@rel='author']/span[@itemprop='name']

last update: 2024-03-21T13:33:29+09:00

占いツクール/novel 2023-03-26T01:23:47+09:00

limura(openid-org-cn)

title	//div[@id='header']/h1[@class='utitle']
pageElement	//div[@id='output']/div[@class='result']/p[@id='u_result']
firstPageLink	//form/fieldset/label[@for='ans0']/a[contains(@href,'/novel/') and contains(@href,'ans=0')]
isNeedHeadless	true
nextLink	//div[@id='output']/div[@class='result']/p[@id='u_pager']/span[@class='p_next']//a[@id='ans_next' and contains(@href,'/novel/')]
url	^https://uranai\.nosv\.org/u\.php/novel/[^/]+/
exampleUrl	https://uranai.nosv.org/u.php/novel/conaconaconan/
author	//p[@class='block' and not(@id)]/a[contains(@href,'/find.php/')]

last update: 2023-03-26T01:27:53+09:00

zenn / scraps 2023-03-10T10:51:12+09:00

limura

pageElement	//article//div[contains(@class,'ThreadItemContent')]//div[contains(@class,'BodyCommentContent')]
title	//article//h1[contains(@class,'View_title')]
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink
url	^https://zenn\.dev/[^/]+/scraps/
tag
author	//aside//a/span[contains(@class,'displayName')]
injectStyle
exampleUrl	https://zenn.dev/megyo9/scraps/23de174b69ef83

last update: 2023-03-10T10:51:12+09:00

Royal Road 2023-02-12T21:07:56+09:00

limura

pageElement	//div[contains(@class,'chapter-content')]
title	//a/h2
waitSecondInHeadless
subtitle	//div[contains(@class,'fic-header')]//h1
nextButton
forceClickButton
firstPageLink	//table[@id='chapters']//tr/td/a[contains(@href,'/chapter/')]
firstPageButton
memo
isNeedHeadless
nextLink	//div[contains(@class,'nav-buttons')]/div[2]/a[contains(@href,'/chapter/')]
url	^https://www\.royalroad\.com/fiction/\d+/
tag	//div[@class='fiction-info']//span[@class='tags']/a[contains(@href,'tagsAdd=')]
author	//div[contains(@class,'fic-header')]//h3/a[contains(@href,'/profile/')]
injectStyle
exampleUrl	https://www.royalroad.com/fiction/21220/mother-of-learning/chapter/301778/1-good-morning-brother

last update: 2023-02-12T21:07:56+09:00

Forbes JAPAN 2023-02-11T17:08:12+09:00

limura

pageElement	//div[contains(@class,'article-content')]
title	//section[@class='article-detail']//h1[@class='article-tit']
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink	//div[@class='articles-list-pager']/ul/li[contains(@class,'articles-list-pager__item') and contains(@class,'next')]/a
url	^https://forbesjapan\.com/articles/detail/\d+
tag	//section[@class='article-detail']//div[@class='article-detail-head']//div[@class='meta']/p[@class='cate']/a
author	//section[@class='article-detail']//div[@class='author']//p[@class='name']/a[contains(@href,'/author/detail/')]
injectStyle
exampleUrl	https://forbesjapan.com/articles/detail/60874

last update: 2023-02-11T17:08:12+09:00

集英社オンラインニュース・トピック 2023-02-02T09:05:10+09:00

limura

title	//h1
pageElement	//div[@id='okra-elements']/div[@class='lo-PostContent']//*[not(contains(@style,'display: none')) and (@paragraph-element-component or @heading-element-component)]
nextButton	.el-PostPagination_Inner .el-PostPagination_PageNumber + a.el-PostPagination_Button
isNeedHeadless	true
tag	//div[@tags-block-component]//li[contains(@class,'Tags_Tag')]/a[contains(@href,'/tags/')]
url	^https://shueisha\.online/newstopics/\d+
exampleUrl	https://shueisha.online/newstopics/99493
author	//a[@class='bl-Authors_Person']

last update: 2023-02-02T09:15:26+09:00

さくらのブログ 2023-01-13T20:41:23+09:00

limura

pageElement	id('content')//div[@class='blogbody']/div[@class='text']
title	//div[@id='container']/div[@id='banner']/h1
waitSecondInHeadless
subtitle	id('content')//div[@class='blogbody']/h3[@class='title']
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink	//div[@id='container']/div[@id='content']/div[@class='navi']/a[last()]
url	^https?://[^.]+\.sblo\.jp/article/\d+\.html
tag
author
injectStyle
exampleUrl	http://mokotyama.sblo.jp/article/190069313.html

last update: 2023-01-13T20:41:23+09:00

新都杜 2023-01-02T17:45:26+09:00

limura

pageElement	//div[@id='main']/div[@class='text']
title	//h1
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink	//table[@class='story']//a[contains(@href,'story=1')]
firstPageButton
memo
isNeedHeadless
nextLink	//table[@class='prev-top-next']//a[@class='next' and contains(@href,'story=')]
url	^https://neetsha\.jp/inside/comic\.php\?
tag
author	//form[@action='/inside/comment.php']//a[contains(@href,'/inside/main.php?author=')]
injectStyle
exampleUrl	https://neetsha.jp/inside/comic.php?id=24316&story=3

last update: 2023-01-02T17:45:26+09:00

Enty 2023-01-02T17:09:14+09:00

limura

title	//a[child::i[contains(@class,'fa-user')]]/span[@property='name']|//a[contains(@onclick,'entertainerClick')]/div/span
pageElement	//article
subtitle	//h4[@class='article-post-title']
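memo	The site is not really usable unless you are a supporter, so this is configured only for what is visible without supporting. Someone who is a supporter should rewrite it.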
isNeedHeadless	true
nextLink	//a[contains(@href,'/posts/') and contains(@class,'post-nav') and contains(@onclick,'postNextPostClick')]
url	^https://enty\.jp/posts/\d+
exampleUrl	https://enty.jp/posts/142002
author	//a[child::i[contains(@class,'fa-user')]]/span[@property='name']|//a[contains(@onclick,'entertainerClick')]/div/span

last update: 2023-01-02T17:18:56+09:00

青空文庫 (HTML) 2023-01-02T16:39:59+09:00

limura

title	//title
pageElement	//body
memo	It seems nothing can be extracted from this page except with //body, so it is treated as a special case and given a dedicated field.
url	^https?://www\.aozora\.gr\.jp/cards/\d+/files/\d+\.html
exampleUrl	https://www.aozora.gr.jp/cards/001528/files/474.html

last update: 2023-02-20T09:53:16+09:00

MAGIC The Gathering JP / READING 2022-12-21T08:26:15+09:00

limura

pageElement	//div[@class='detail']
title	//header//div[@class='column-title']/h1
waitSecondInHeadless
subtitle	//header[@class='article-title']//h1
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink	//div[@class='next-prev']//li[contains(@class,'btn-next')]/a[contains(@href,'/reading/')]
url	^https://mtg-jp\.com/reading/[^/]+/\d+/
tag	//article//section[@class='sec-article']/div[@class='tag-list']//a[contains(@href,'tag=')]
author	//article//section[@class='sec-article']//div[@class='name']
injectStyle
exampleUrl	https://mtg-jp.com/reading/mm/0036581/

last update: 2022-12-21T08:26:15+09:00

Webnovel 2022-12-18T14:18:15+09:00

limura

waitSecondInHeadless	1.0
title	//header//a[contains(@href,'/book/') and @title]/@title
pageElement	//div[@id='page']//div[contains(@class,'cha-content')]/div[contains(@class,'cha-words')]/div[contains(@class,'j_paragraph')]/div/p
subtitle	//header//span[preceding-sibling::a[contains(@href,'/book/') and @title] and @class='j_chapName']
firstPageLink	//a[@id='j_read' and contains(@href,'/book/')]
isNeedHeadless	true
nextLink	//link[@rel='next']
tag	//div[@class='m-tags']//a[contains(@href,'/tags/')]
url	^https://www\.webnovel\.com/book/[^/]+
exampleUrl	https://www.webnovel.com/book/villain-the-play-of-destiny_21226665705647705/chapter-163_63615575804397816
author	//div[@id='page']//address/span|//div[@class='page']//h2/a[preceding-sibling::strong and @class='c_primary' and contains(@href,'/profile/')]

last update: 2022-12-18T15:25:55+09:00

文字の冷凍庫、 2022-12-07T10:09:16+09:00

limura

pageElement	//article/div[@class='kiji-content']/*[not(contains(@class,'addtoany_share_save_container'))]
title	//article/div[@class='kiji-info']/div/a
waitSecondInHeadless
subtitle	//article/h1
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink	//article/div[@class='paging']/div[@class='next']/a[@rel='next']
url	^https://dyreitou\.com/.
tag
author	//footer/div[@class='copy']/a
injectStyle
exampleUrl	https://dyreitou.com/%e5%be%8c%e6%97%a5%e8%ab%87%e2%91%a4

last update: 2022-12-07T10:09:16+09:00

lightnovelpub 2022-12-03T12:06:22+09:00

limura

waitSecondInHeadless	2.0
title	//h1[@itemprop='name']|//h1/a[@class='booktitle']
pageElement	//div[@id='chapter-container']/*[not(descendant::dt[contains(text(),'ʟɪɢʜᴛɴᴏᴠᴇʟᴘᴜʙ.ᴄᴏᴍ')])]
subtitle	//h1/span[@class='chapter-title']
firstPageLink	//header[@class='novel-header']//div[@class='novel-info']/nav[@class='links']/a[@id='readchapterbtn' and contains(@href,'/novel')]
isNeedHeadless	true
nextLink	//div[contains(@class,'chapternav')]/a[@rel='next' and contains(@href,'/novel/')]
tag	//header//div[@class='categories']//ul/li/a[contains(@href,'/stories-') and @class='property-item']
url	^https://www\.lightnovelpub\.com/novel/[^/]+
exampleUrl	https://www.lightnovelpub.com/novel/allrounders-01122100
injectStyle	p {padding: 20px 40px 40px; position: relative;}
author	//div[@class='main-head']/div[@class='author']/a[contains(@href,'/author')]

last update: 2022-12-03T12:59:57+09:00

ノベルピア 2022-10-29T20:36:57+09:00

limura

forceErrorMessageAndElement	ノベルピアでログインしていない事によるログインを促す画面が出ているようです。「Web取込タブ」側でログインしてからお試し下さい。://div[contains(@onclick,'login_req=')]
waitSecondInHeadless	2.0
title	//div[@id='header_bar']//*[contains(@onclick,'pageload(')]/b[@class='cut_line_one']
pageElement	//div[@id='novel_text']/ol[@id='novel_drawing']/*[not(child::div[child::img]) and not(@id='writer_comments_box') and not(@id='next_e_btn_bottom') and not(@class='is-last-btn') and not(@id='next_epi_btn_bottom')]/text()|//div[@id='novel_text']/ol[@id='novel_drawing']/*[not(child::div[child::img]) and not(@id='writer_comments_box') and not(@id='next_e_btn_bottom') and not(@class='is-last-btn') and not(@id='next_epi_btn_bottom')]/ruby
forceClickButton	button.auth-yes
nextButton	#next_epi_btn_bottom
subtitle	//div[@id='header_bar']//*[contains(@onclick,'pageload(')]/span[@class='cut_line_one']
firstPageButton	#episode_list table td[onclick]
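memo	Leaving a memo because this does something quite tricky. On this site the text is written inside font elements within an ol, and, apparently at the point of conversion to NSAttributedString or thereabouts, numbers such as 1. 2. 3. get attached to ol/font. injectStyle is therefore set to ol {list-style: none}, but that alone does not seem to remove the . (period). So the text of ol/font is extracted as strings only with /text(), and font { white-space: pre-wrap } is then used to force the line breaks in.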
isNeedHeadless	true
tag	//tbody[not(child::tr[@class='more_info'])]//span[contains(@onclick,'/search/hash/') and contains(@onclick, 'location=')]
url	^https://(novelpia|novelpink)\.jp/(viewer|novel)/\d+
exampleUrl	https://novelpia.jp/novel/1272
injectStyle	ol {list-style: none;} font { white-space: pre-wrap; }
author	//tbody[not(child::tr[@class='more_info'])]//a[contains(@href,'/user/')]

last update: 2023-07-30T16:21:19+09:00

ふたまん＋ 2022-10-18T14:47:05+09:00

limura

title	//main//h1[contains(@class,'article-header')]
pageElement	//main//div[@class='article-body']/*[not(@class='article-image') and not(contains(@class,'ad-rectangle')) and not(@class='next-link') and not(@class='article-pager')]
firstPageLink	//main//div[@class='article-teaser__more-btn']/a[contains(@href,'/articles/-/') and contains(@href,'page=1')]
nextLink	//main//div[@class='article-body']/div[@class='next-link']/a[contains(@href,'/articles/-/')]
tag	//main//header/ul[contains(@class,'article-tag-list')]//a[contains(@href,'/list/tag/')]
url	^https://futaman\.futabanet\.jp/articles/-/\d+
exampleUrl	https://futaman.futabanet.jp/articles/-/122699?page=1
author	//main//div[@class='article-header-info']//div[@class='article-header-info__authorName']

last update: 2022-10-18T14:48:06+09:00

雨夜の月 2022-08-13T08:13:32+09:00

limura

pageElement	//p[contains(@class,'typesquare_option')]
title
waitSecondInHeadless
subtitle	//header/h1
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink	//a[contains(@class,'next-post')]
url	^https://amayo\.halfmoon\.jp/[^/]+/
tag
author	//footer//div[contains(@class,'copyright')]
injectStyle
exampleUrl	https://amayo.halfmoon.jp/rm0-1/

last update: 2022-08-13T08:13:32+09:00

TECHNOEDGE 2022-07-28T16:57:18+09:00

limura

title	//h1[@class='head']
pageElement	//article[contains(@class,'arti-body')]/*[not(@class='af_box') and not(@class='modal') and not(@class='link-card')]
memo	Links to the previous and next articles looked quite troublesome to extract, so I gave up (´・ω・`)
tag	//section[@class='main-special']/ul[@class='special-list']/li/a[contains(@href,'/special/')]
url	^https://www\.techno-edge\.net/article/\d+/\d+/\d+/\d+\.html
exampleUrl	https://www.techno-edge.net/article/2022/07/14/89.html
author	//main/div[@class='author'][1]//h2[@class='writer-name']

last update: 2023-07-09T18:43:21+09:00

わんちゃんホンポ 2022-06-12T17:44:05+09:00

limura

pageElement	//main//article//div[contains(@class,'article_body')]/*[not(@class='img') and not(@class='toc-index') and not(self::p[@class='link']/a[contains(@href,'https://wanchan.jp/')]) and not(@class='detail_footer__ad') and not(@class='sns_group') and not(@class='article_osusume_comment')]
title	//main//header//h1[@class='article_title']
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink
url	^https://wanchan\.jp/osusume/detail/\d+
tag	//main//div[@id='csw_block']/div[@class='csw-word-container']/label[@class='csw-word-label']
author	//main//header//div[@class='article_writer']/div[@class='name']/a[contains(@href,'/user/')]
injectStyle
exampleUrl	https://wanchan.jp/osusume/detail/7759

last update: 2022-06-12T17:44:05+09:00

Lmaga.jp 2022-06-12T16:51:09+09:00

limura

pageElement	//div[@id='main']/div[@class='article_main']/div[@class='main_body']/div[@class='main_txt']/*[not(self::figure)]
title	//div[@id='main']/div[@class='article_main']/h1[@class='main_ttl']
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink
url	^https://www\.lmaga\.jp/news/\d+/\d+/\d+/
tag	//div[@id='main']/div[@class='article_main']/div[@class='main_body']/div[@class='main_tag']//li/a[contains(@href,'/tag/')]
author
injectStyle
exampleUrl	https://www.lmaga.jp/news/2022/06/459430/

last update: 2022-06-12T16:51:09+09:00

ABEMA TIMES 2022-05-31T08:41:10+09:00

limura

pageElement	//main/article/div[contains(@class,'article-body')]/*[not(contains(@class,'figure')) and not(contains(@class,'article-thumb')) and not(contains(@class,'article-tvbtn')) and not(contains(@class,'article-relation'))]
title	//main/article//h1[contains(@class,'article-header')]
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink
url	^https://times\.abema\.tv/articles/-/\d+
tag	//main//div[contains(@class,'article-tag-wrap')]//li/a[contains(@class,'article-tag')]
author	//main/article//div[contains(@class,'article-header-tags')]//a[contains(@href,'/program/')]
injectStyle
exampleUrl	https://times.abema.tv/articles/-/10025376

last update: 2022-05-31T08:41:10+09:00

ノベマ！ 2022-05-26T00:32:11+09:00

limura

pageElement	//section/article[contains(@class,'bookText')]/div
title	//div[@class='readBookDetail']//div[@class='title']/h1
waitSecondInHeadless
subtitle	//section/article[contains(@class,'bookText')]/aside/div[@class='chapterName']
nextButton
forceClickButton
firstPageLink	//div[@class='bookChapterList']//ul/li/a[contains(@href,'/book/') and contains(@href,'/1')]
firstPageButton
memo
isNeedHeadless
nextLink	//link[@rel='next']
url	^https://novema\.jp/book/[^/]+
tag	//article[@class='bookFooter']//ul[@class='keywordList']/li/a[contains(@href,'word=')]
author	//main/section//div[@class='name']/a[contains(@href,'/member/')]
injectStyle
exampleUrl	https://novema.jp/book/n1660193/1

last update: 2022-05-26T00:32:11+09:00

ゲームメーカーズ 2022-05-17T15:03:59+09:00

limura

title	//div[@class='l-article-header-inner-txts']/h1
pageElement	//main//section[contains(@class,'article-body')]/*[not(contains(@class,'image-wrapper')) and not(self::a) and not(contains(@class,'auther-wrapper')) and not(contains(@class,'section-recommend'))]
tag	//div[contains(@class,'article-header-tags')]/a[contains(@href,'/category/') or contains(@href,'/tag/')]
url	^https://gamemakers\.jp/article/\d{4}_\d{2}_\d+_\d+/
exampleUrl	https://gamemakers.jp/article/2022_05_17_2062/
author	//main//section//a[contains(@href,'/author-archive/?author_id=')]//h6

last update: 2023-01-24T15:05:46+09:00

WFS blog 2022-05-13T11:01:27+09:00

limura

pageElement	//main/section[@id='blogEntry']//div[@class='sectionBody']/*[not(@class='profileArea') and not(@class='contentImg') and not(self::style)]
title	//main/section[@id='blogEntry']//h1
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink
url	^https://www\.wfs\.games/blog/[^/]+/[^/]+/
tag
author
injectStyle
exampleUrl	https://www.wfs.games/blog/company/infographics/2022/ https://www.wfs.games/blog/story/interview12/

last update: 2022-05-13T11:01:27+09:00

JAMSTEC BASE 2022-05-04T17:24:28+09:00

limura

pageElement	//main/section//div[@class='p-articleDetail_main']/*[not(@class='c-toc') and not(self::figure) and not(@class='c-columnBox') and not(@class='c-news_outer') and not(contains(@class,'c-linkTag_outer')) and not(@class='c-share_outer') and not(@class='c-linkList_outer') and not(contains(@class,'c-media'))]
title	//main/section//div[@class='p-articleDetail_mv']//h1
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink
url	^https://www\.jamstec\.go\.jp/j/[^/]+/topics/[^/]+/
tag	//div[contains(@class,'p-articleDetail_mv_inner')]//div[contains(@class,'c-linkTag_outer')]/a[contains(@href,'/tag/')]/span[@class='c-linkTag_text']
author	//main/section//div[@class='p-articleDetail_main']//div[@class='c-media_text']/h2
injectStyle
exampleUrl	https://www.jamstec.go.jp/j/pr/topics/shin-chishiki-20220428/

last update: 2022-05-04T17:24:28+09:00

PRESIDENT Online 2022-04-30T17:44:50+09:00

limura

title	//article//h1[@class='article__mainTtl']/text()
pageElement	//article/section[@class='article-body']/*[not(contains(@class,'image-area')) and not(@class='caution') and not(contains(@class,'ad-rectangle')) and not(contains(@class,'ext-container')) and not(@class='related-article')]
subtitle	//article//h1[@class='article__mainTtl']/span[@class='article__subTtl']
nextLink	//div[@class='article-above-pagination']//a[@class='next' and contains(@href,'/articles/') and contains(@href,'page=')]
tag	//article//div[@class='leafInfo__tag']/a[contains(@href,'/tag/')]
url	^https://president\.jp/articles/-/\d+
exampleUrl	https://president.jp/articles/-/56527
author	//article//ul[@class='articleInfo__list']//a[contains(@class,'author_link')]/img/@alt

last update: 2022-04-30T17:53:45+09:00

Over The Rainbow ～にじの彼方～ 2022-04-17T16:59:51+09:00

limura

title	//div[@id='content']/p/a[contains(@href,'/page/') and position()=last()]|//section[not(child::h2[contains(text(),'目次')])]/ul[@class='menu']/li[position()=1]/a[contains(@href,'/page/')]
pageElement	//div[@id='container']/div[@id='content' and not(child::h3[text()='目次'])]/*[not(self::h2) and not(self::p[contains(child::a/text(),'トップページ')]) and not(@id='clap') and not(@id='continue') and not(@id='page_link') and not(@id='comment_form') and not(@id='vote')]|//section[not(@id) and not(child::h2[text()='目次'])]/article/*[not(self::h1) and not(@id='vote') and not(child::form)]
subtitle	//div[@id='content']/h2|//section/article/h1
firstPageLink	//div[@id='content']/ul//li/a[contains(@href,'/page/')]|//section[child::h2[contains(text(),'目次')]]/ul[@class='menu']/li[child::a[contains(@href,'/page/')] and position()=1]/a
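memo	This site is fairly confusing: pages holding the novel text and pages that only link to the text can have similar URLs, and there are also pages that link to those link pages, and further pages that link to those in turn. The DOM structure also does not appear to be consistent across pages; at least two variants have been confirmed. Specifically, the DOM of https://nijikana.net/index.php/page/kamishiro_other_osaka01 differs from that of https://nijikana.net/index.php/page/kamishiro_other_18_01: one contains //div[@class='novel_view'] and the other does not. The site also serves a separate DOM structure for smartphones and similar devices, which has to be handled as well.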
nextLink	//div[@id='content' and not(child::ul//li/a[contains(@href,'/page/')])]//div[@id='page_link']//li[@class='next']/a[contains(@href,'/page/')]|//section[not(child::h2[contains(text(),'目次')])]/ul[@class='menu']/li/a[contains(@href,'/page/') and contains(text(),'次:')]
url	^https://nijikana\.net/index\.php/page/[^/]+
exampleUrl	https://nijikana.net/index.php/page/kamishiro_other_osaka https://nijikana.net/index.php/page/kamishiro_other_osaka01

last update: 2022-04-17T18:07:55+09:00
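
Because a firstPageLink hit causes the target URL to be loaded and evaluated again, a site with several layers of link pages, like the one described in the memo above, can send a reader into an endless chain if the xpath is too loose. The sketch below shows one way a checker could guard against that with a visited list and a hop limit. It is not the app's actual logic: `resolve_first_page`, `max_hops`, and the page names in `SITE` are all hypothetical, and `first_page_link_of` stands in for "fetch the page and evaluate firstPageLink".

```python
def resolve_first_page(start_url, first_page_link_of, max_hops=5):
    """Follow firstPageLink targets until a page has none.

    first_page_link_of(url) returns the link target, or None when the
    xpath matches nothing. Raises RuntimeError on a revisit or when more
    than max_hops hops would be needed.
    """
    visited = [start_url]
    current = start_url
    while True:
        target = first_page_link_of(current)
        if target is None:
            return current, visited  # no firstPageLink: treat as a text page
        if target in visited:
            raise RuntimeError(f"firstPageLink loop detected at {target}")
        if len(visited) > max_hops:
            raise RuntimeError(f"gave up after {max_hops} firstPageLink hops")
        visited.append(target)
        current = target

# Made-up link map for illustration: two layers of index pages, then text,
# plus a pair of pages that point at each other.
SITE = {"top": "index", "index": "text", "text": None, "a": "b", "b": "a"}

print(resolve_first_page("top", SITE.get))
```

Running the same function on the made-up page "a" raises RuntimeError, which is the situation a firstPageLink xpath should be written to avoid.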

ORICON NEWS /special 2022-04-16T16:48:06+09:00

limura

title	//article//h1
pageElement	//article/div[@class='special-content' and not(child::aside)]/*[not(contains(@class,'unit-photo'))]
url	^https://www\.oricon\.co\.jp/special/\d+/
exampleUrl	https://www.oricon.co.jp/special/58825/

last update: 2022-04-16T16:53:26+09:00

androidcentral 2022-04-16T16:29:38+09:00

limura

title	//header/h1
pageElement	//div[@id='article-body']/*[not(self::figure) and not(contains(@id,'ad-unit')) and not(@class='youtube-video') and not(contains(@class,'featured_product_block')) and not(contains(@class,'see-more')) and not(contains(@class,'van_vid_carousel'))]
url	^https://www\.androidcentral\.com/[^/]+/[^/]+/
exampleUrl	https://www.androidcentral.com/gaming/virtual-reality/quest-pro-t-rex-super-resolution
author	//header//span[contains(@class,'by-author')]/a[@rel='author']

last update: 2022-04-16T16:34:32+09:00

Motor-Fan 2022-04-15T13:37:06+09:00

limura

pageElement	//main//div[contains(@class,'p-entry-content')]/*[not(@id='ez-toc-container') and not(self::figure) and not(self::aside)]
title	//main//h1
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink
url	^https://motor-fan\.jp/[^/]+/article/\d+/
tag	//main//div[@class='c-entry-tags']/a[contains(@href,'/tag/')]
author	//main//ul[@class='c-meta']/li[contains(@class,'item--author')]//span[@id='dimension_author']
injectStyle
exampleUrl	https://motor-fan.jp/weboption/article/30308/ https://motor-fan.jp/mf/article/50990/

last update: 2022-04-15T13:37:06+09:00

ガジェット通信 2022-04-14T15:39:12+09:00

limura

pageElement	//div[@class='post-bodycopy']/*[not(self::script) and not(self::img) and not(self::center and child::div[contains(@class,'twitter-tweet')])]
title	//article[@class='post']/h1
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink
url	^https://getnews\.jp/archives/\d+
tag	//article/div[@class='cattag']/span/a[contains(@rel,'tag')]
author	//div[@class='si_profile']//p[@class='prof']/a[contains(@href,'/author/')]
injectStyle
exampleUrl	https://getnews.jp/archives/3256309

last update: 2022-04-14T15:39:12+09:00

Spring Leaves Translations 2022-04-11T08:49:30+09:00

limura

title	//main[@id='main']/article/header/h1
pageElement	//main[@id='main']/article/div[@class='entry-content']/*[not(self::script) and not(@id='jp-post-flair') and not(child::a[text()='Previous Chapter' or text()='Next Chapter'])]
nextLink	//main[@id='main']/article/div[@class='entry-content']/p/a[text()='Next Chapter']
url	^https://muryoutranslation\.wordpress\.com/\d+/\d+/\d+/[^/]+/

last update: 2022-04-11T08:56:29+09:00

ORICON NEWS 2022-04-09T13:23:14+09:00

limura

pageElement	//article/div[@class='block-detail-body']/div[@class='mod-p']
title	//h1[@class='title']
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless
nextLink
url	^https://www\.oricon\.co\.jp/news/\d+/full/
tag	//div[contains(@class,'block-tags')]//li/a[contains(@href,'/tag/')]
author
injectStyle
exampleUrl	https://www.oricon.co.jp/news/2230906/full/

last update: 2022-04-09T13:23:14+09:00
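
The two ORICON records share a host and are distinguished only by their url patterns, so a given page should match exactly one of them. The sketch below checks this against the listed example URLs. `matching_records` is a hypothetical helper, and how the app resolves a URL that matches several records is not stated here, so the function simply returns every match.

```python
import re

# Names and url patterns copied from the two ORICON records.
RECORDS = [
    ("ORICON NEWS /special", r"^https://www\.oricon\.co\.jp/special/\d+/"),
    ("ORICON NEWS", r"^https://www\.oricon\.co\.jp/news/\d+/full/"),
]

def matching_records(url, records=RECORDS):
    """Return the names of all records whose url pattern matches the URL."""
    return [name for name, pattern in records if re.search(pattern, url)]

for url in (
    "https://www.oricon.co.jp/special/58825/",
    "https://www.oricon.co.jp/news/2230906/full/",
):
    print(url, matching_records(url))
```

A news URL without the trailing `full/` segment matches neither pattern, so such a page would not be picked up by either record.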

マイナビニュース 2022-04-04T19:34:43+09:00

limura

title	//div[@class='articleHeader']/h1
pageElement	//article
tag	//div[contains(@class,'c-archiveMain')]//section[@gtm-label='article_tag']//a[@class='articleRelated_keywordList_nodeLink' and contains(@href,'/tag/')]
url	^https://news\.mynavi\.jp/([^/]+/)?article/[^/]+/
exampleUrl	https://news.mynavi.jp/techplus/article/20220331-2308013/
author	//div[contains(@class,'c-archiveMain')]//div[@class='articleHeader_info']/a[@gtm-label='article_author' and contains(@href,'/author/')]

last update: 2022-06-04T17:42:32+09:00

読むらじる。 2022-04-03T15:49:36+09:00

limura

pageElement	//section[@id='article']/div[@class='article-body']
title	//section[@id='article']//div[@class='article-header-inner']/p[@class='article-header-title']
waitSecondInHeadless
subtitle
nextButton
forceClickButton
firstPageLink
firstPageButton
memo
isNeedHeadless	true
nextLink
url	^https://www\.nhk\.or\.jp/radio/magazine/article/[^/]+/
tag	//section[@id='article']//div[@class='article-header-inner']//p[@class='yr-tags']/span[contains(@class,'tag')]
author	//section[@id='article']//p[@class='article-header-program']
injectStyle
exampleUrl	https://www.nhk.or.jp/radio/magazine/article/kodomoq/j22XUmLiG8.html

last update: 2022-04-03T15:49:36+09:00
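
The record above is the only one in this part of the list that sets isNeedHeadless. According to the field notes, "false", "False", "nil", "0", and an empty value all mean a plain GET is enough, any other value enables headless evaluation, and nextButton takes priority over nextLink when headless mode is on. The sketch below encodes that reading. `needs_headless` and `pick_next_step` are hypothetical names, and stripping surrounding whitespace is an assumption.

```python
# Values that the field notes describe as "plain GET is fine".
FALSE_LIKE = {"false", "False", "nil", "0", ""}

def needs_headless(value):
    """True when isNeedHeadless holds anything other than a false-like value."""
    return (value or "").strip() not in FALSE_LIKE

def pick_next_step(record):
    """Choose how to reach the next page.

    Returns ("click", css_selector), ("follow", xpath), or None when the
    record defines neither a usable nextButton nor a nextLink.
    """
    if needs_headless(record.get("isNeedHeadless")) and record.get("nextButton"):
        return ("click", record["nextButton"])
    if record.get("nextLink"):
        return ("follow", record["nextLink"])
    return None

# Fields taken from the record above: headless on, no next-page fields.
yomu_rajiru = {"isNeedHeadless": "true", "nextButton": "", "nextLink": ""}
print(needs_headless(yomu_rajiru["isNeedHeadless"]), pick_next_step(yomu_rajiru))
```

For this record the result is headless evaluation with no next step, which fits a site where each article is a single page.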

Readwn.com 2022-03-26T17:25:59+09:00

limura

pageElement	//div[@class='chapter-content']
title	//main//header//div[@class='titles']/h1
waitSecondInHeadless
subtitle	//main//header//div[@class='titles']/h2
nextButton
forceClickButton
firstPageLink	//main/article/div[contains(@class,'novel-body')]/section[@id='chapters']/div[@id='chpagedlist']/ul[@class='chapter-list']/li/a[contains(@href,'/novel/') and contains(@href,'_1.html')]
firstPageButton
memo
isNeedHeadless
nextLink	//main/article/section[contains(@class,'page-in')]/div[contains(@class,'chapternav')]/a[@class='nextchap' and contains(@href,'/novel/')]
url	^https://www\.readwn\.com/novel/[^/]+\.html
tag	//main/article/div[contains(@class,'novel-body')]/section[@id='info']/div[@class='tags']/ul/li/a[@class='tag' and contains(@href,'/tags/')]
author	//main/article/header//div[@class='main-head']/div[@class='author']/span[@itemprop='author']
injectStyle
exampleUrl	https://www.readwn.com/novel/gods-the-beginning-of-creation-creates-an-ominous-red-haired-emperor-in-his-later-years_156.html

last update: 2022-03-26T17:25:59+09:00
