<div id="parrent">
<ul>
<li>
<span>
<span>hi, i want you to get me!!!!</span>
</span>
</li>
<li>
<span>me too please!!!!</span>
</li>
</ul>
</div>
Imagine that, in the example above, we want to get all the text within each li. Note that there is only one text per li, but it can be nested inside multiple levels of tags.
So how do we get it? With BeautifulSoup, of course. The .string property doesn't do the job. Here is a summary of how .string works:
- If a tag has only one child, and that child is a NavigableString, the child is made available as .string (a NavigableString is simply a piece of text within a tag).
For example, with <span>me too please!!!!</span>: if we make a soup of this span and call .string, it gives us "me too please!!!!" (span.string => "me too please!!!!").
- If a tag's only child is another tag, and that tag has a .string, then the parent tag is considered to have the same .string as its child.
For example, with <span><span>hi, i want you to get me!!!!</span></span>: if we have the outer span, then upperSpanSoup.string gives the inner span's .string, so we get "hi, i want you to get me!!!!".
- Finally, the important point, the None case: if a tag contains more than one thing, then it's not clear what .string should refer to, so .string is defined to be None.
That is what happens with our li. Written compactly, <li><span><span>hi, i want you to get me!!!!</span></span></li> would still have a .string (by the second rule), but in the HTML at the top the li also contains the line breaks around its span, so it has more than one child and li.string is None.
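To check the three rules, here is a minimal sketch. It assumes bs4 is installed and uses Python's built-in "html.parser" (the post doesn't say which parser it used); the variable names are just for illustration. It parses the small examples from the rules plus the HTML from the top of the post and looks at .string in each case:

```python
from bs4 import BeautifulSoup

# The example markup from the top of the post, line breaks included.
HTML = """<div id="parrent">
<ul>
<li>
<span>
<span>hi, i want you to get me!!!!</span>
</span>
</li>
<li>
<span>me too please!!!!</span>
</li>
</ul>
</div>"""

# Rule 1: the only child is a NavigableString, so .string is that text.
span = BeautifulSoup("<span>me too please!!!!</span>", "html.parser").span
print(span.string)  # me too please!!!!

# Rule 2: the only child is a tag that has a .string, so the parent reuses it.
outer = BeautifulSoup(
    "<span><span>hi, i want you to get me!!!!</span></span>", "html.parser"
).span
print(outer.string)  # hi, i want you to get me!!!!

# The compact <li> also has a single child, so rule 2 applies again.
compact_li = BeautifulSoup(
    "<li><span><span>hi, i want you to get me!!!!</span></span></li>",
    "html.parser",
).li
print(compact_li.string)  # hi, i want you to get me!!!!

# Rule 3: in the real markup each <li> holds "\n", a <span> and "\n",
# i.e. three children, so .string is None.
soup = BeautifulSoup(HTML, "html.parser")
li_strings = [li.string for li in soup.find_all("li")]
print(li_strings)  # [None, None]
print([len(li.contents) for li in soup.find_all("li")])  # [3, 3]
```

Both li tags come out as None only because of those whitespace text nodes, which is why .string is not reliable for this kind of scraping.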
==> The solution is the getText() or get_text() method. Both do the same thing: in BS4, getText() is just an alias of get_text(). I wanted to mention both. Then we strip the result to remove the whitespace around the text:
liSoup.getText().strip()
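Put together, here is a small sketch of the whole scrape on the example HTML, again assuming bs4 and the "html.parser" parser (HTML, texts and texts_alt are illustrative names). It collects the stripped text of every li, however deeply the text is nested:

```python
from bs4 import BeautifulSoup

# The example markup from the top of the post.
HTML = """<div id="parrent">
<ul>
<li>
<span>
<span>hi, i want you to get me!!!!</span>
</span>
</li>
<li>
<span>me too please!!!!</span>
</li>
</ul>
</div>"""

soup = BeautifulSoup(HTML, "html.parser")

# get_text() joins all the text below each <li>, whatever the nesting depth,
# including the surrounding line breaks; strip() then removes them.
texts = [li.get_text().strip() for li in soup.find_all("li")]
print(texts)  # ['hi, i want you to get me!!!!', 'me too please!!!!']

# Alternative: strip=True strips every text piece and skips whitespace-only ones.
texts_alt = [li.get_text(strip=True) for li in soup.find_all("li")]
print(texts_alt == texts)  # True
```

With strip=True the pieces are joined with an empty separator, so if an li ever holds several texts you would probably want get_text(" ", strip=True) to keep them apart.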
GOOD SCRAPING !!