isdn-python/isdn/parser.py

import re
from typing import IO, Iterator

from lxml import etree

namespaces = {"sitemap": "http://www.sitemaps.org/schemas/sitemap/0.9"}


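class ISDNJpSitemapXMLParser:
    @staticmethod
    def parse_list(file: str | IO) -> Iterator[str]:
        """Yield each 13-digit ISDN found in the <loc> entries of an isdn.jp sitemap."""
        # Stream the document so large sitemaps are not loaded into memory at once.
        for _event, elm in etree.iterparse(
            file, events=("end",), tag=[f"{{{namespaces['sitemap']}}}loc"], remove_blank_text=True
        ):
            # elm.text is None for an empty <loc/>; the dots are escaped to match literally.
            m = re.match(r"https://isdn\.jp/(\d{13})", elm.text or "")
            # Release the element's content once it has been read.
            elm.clear()
            if not m:
                continue
            yield m.group(1)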
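
A minimal usage sketch for the parsing logic above. To keep it runnable without third-party packages, it swaps lxml for the standard library's `xml.etree.ElementTree` and filters on the tag by hand; `parse_list_stdlib`, `SITEMAP_NS`, and `SAMPLE_SITEMAP` are hypothetical names, and the ISDN values in the sample are made up for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET
from typing import IO, Iterator

# Same sitemap namespace as in parser.py.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical sample data: these ISDN values are invented for the example.
SAMPLE_SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://isdn.jp/2780000000001</loc></url>
  <url><loc>https://isdn.jp/about</loc></url>
  <url><loc>https://isdn.jp/2780000000002</loc></url>
</urlset>
"""


def parse_list_stdlib(file: IO) -> Iterator[str]:
    """Stand-in for ISDNJpSitemapXMLParser.parse_list, built on xml.etree (assumption)."""
    loc_tag = f"{{{SITEMAP_NS}}}loc"
    for _event, elm in ET.iterparse(file, events=("end",)):
        # xml.etree's iterparse has no tag filter, so skip non-<loc> elements here.
        if elm.tag != loc_tag:
            continue
        m = re.match(r"https://isdn\.jp/(\d{13})", elm.text or "")
        if m:
            yield m.group(1)


isdns = list(parse_list_stdlib(io.BytesIO(SAMPLE_SITEMAP)))
print(isdns)  # ['2780000000001', '2780000000002']
```

The `/about` entry is skipped because its path is not a 13-digit number, which mirrors the `continue` branch in `parse_list`.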