How do I split HTML source code?
#1
I want to extract and summarize the topics from a forum.
The page source is already downloaded, and every topic starts with "<span class="title"><span class="genmed"></span><a href="viewtopic.php?t=15678".
I already have the SID, so I only need to split the source on this line, probably using "<span class="title"><span class="genmed"></span><a href="viewtopic.php?t=" as the delimiter. I simply need something like Split in VBA, which splits a string on a whole sequence of characters, but the tok function splits it into many small pieces.

Code:
;ARRAY(str) b
;tok s b -1
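
For comparison, here is a minimal sketch of the VBA-style behaviour the question describes, written in Python rather than QM. It assumes the goal is to split on the whole delimiter string at once; tok seems to work with individual delimiter characters, which would explain the many small pieces. The `page` string is a made-up miniature of the real page source, and the names `DELIM`, `chunks` and `topic_ids` are only illustrative.

```python
# Delimiter taken from the question: everything up to the topic id.
DELIM = '<span class="title"><span class="genmed"></span><a href="viewtopic.php?t='

# Hypothetical miniature of the downloaded page source (not the real data).
page = (
    "header "
    + DELIM + '15678" class="topictitle">Rules!</a></span> '
    + DELIM + '1142400" class="topictitle">Go Here</a></span>'
)

# str.split treats the separator as one whole substring,
# not as a set of single delimiter characters.
chunks = page.split(DELIM)

# chunks[0] is whatever precedes the first topic;
# every later chunk begins with a topic id followed by a double quote.
topic_ids = [chunk.split('"', 1)[0] for chunk in chunks[1:]]
print(topic_ids)
```

Each chunk after the first still contains the rest of that topic's row (replies, author, views), so it can be processed further piece by piece.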

Please help.

Here is an extract from the source data:

Code:
    <div class="short-description">
                <span>Topics</span>
            </div>
            <div class="topics">
                <span>Replies</span>
            </div>
            <div class="posts">
                <span>Author</span>
            </div>
            <div class="views">
                <span>Views</span>
            </div>
            <div class="last-post">
                <span>Last Post</span>
            </div>
        </div>
                        <div class="cat-row"><span>Important Announcements</span></div>
                <div class="list-rows">
            <div class="topicrow">
                <div class="icon"><span><a href="viewtopic.php?t=15678"><img src="//img11.hardware.com/images/folder_global_announce.gif" alt="This topic is locked: you cannot edit posts or make replies." title="This topic is locked: you cannot edit posts or make replies." /></a></span></div>
                <div class="description">
                    <span>
                                                <span class="title"><span class="genmed"></span><a href="viewtopic.php?t=15678" class="topictitle">Rules! | Last updated: September 30th, 2016</a></span>
                                                                        <span class="pagination"></span>
                    </span>
                </div>
                <div class="topics">
                    <span>0</span>
                </div>
                <div class="posts">
                    <span><a href="profile.php?mode=viewprofile&amp;u=30&amp;sid=a812f9f148a9f42c4ffbaffd9473baba">Coolyou</a></span>
                </div>
                <div class="views">
                    <span>3,946,618</span>
                </div>
                <div class="last-post">
                    <span><a href="viewtopic.php?p=52838" title="View latest post">Mon Jul 11, 2005 7:00 am</a><br /><a href="profile.php?mode=viewprofile&amp;u=30&amp;sid=a812f9f148a9f42c4ffbaffd9473baba">Coolyou</a> <a href="viewtopic.php?p=52838"><img src="//img11.hardware.com/images/icon_latest_reply.gif" width="18" height="9" class="imgspace" alt="View latest post" title="View latest post" border="0" /></a></span>
                </div>
            </div>
        </div>
                        <div class="cat-row"><span>Sticky Topics</span></div>
                <div class="list-rows">
            <div class="topicrow">
                <div class="icon"><span><a href="viewtopic.php?t=1142400"><img src="//img11.hardware.com/images/folder_sticky.gif" width="25" height="25" class="imgfolder" alt="No new posts" title="No new posts" /></a></span></div>
                <div class="description">
                    <span>
                                                <span class="title"><span class="genmed"></span><a href="viewtopic.php?t=1142400" class="topictitle">Go Here Before Asking Questions Or Ask For Help</a></span>
                                                                        <span class="pagination"> [ <img src="//img11.hardware.com/images/icon_minipost.gif" width="12" height="9" class="imgspace" alt="Goto page" title="Goto page" />Goto page: <a href="viewtopic.php?t=1142400">1</a> ... <a href="viewtopic.php?t=1142400">5</a>, <a href="viewtopic.php?t=1142400">6</a>, <a href="viewtopic.php?t=1142400">7</a> ] </span>
                    </span>
                </div>
                <div class="topics">
                    <span>101</span>
                </div>
                <div class="posts">
                    <span><a href="profile.php?mode=viewprofile&amp;u=284931&amp;sid=a812f9f148a9f42c4ffbaffd9473baba">MasterDKR</a></span>
                </div>
                <div class="views">
                    <span>8,144</span>
                </div>
                <div class="last-post">
                    <span><a href="viewtopic.php?p=86702013" title="View latest post">Sun Apr 19, 2015 7:23 am</a><br /><a href="profile.php?mode=viewprofile&amp;u=3693006&amp;sid=a812f9f148a9f42c4ffbaffd9473baba">abhijim007</a> <a href="viewtopic.php?p=86702013"><img src="//img11.hardware.com/images/icon_latest_reply.gif" width="18" height="9" class="imgspace" alt="View latest post" title="View latest post" border="0" /></a></span>
                </div>
            </div>
        </div>
                        <div class="cat-row"><span>Topics</span></div>
                <div class="list-rows">
            <div class="topicrow">
                <div class="icon"><span><a href="viewtopic.php?t=23831402"><img src="//img11.hardware.com/images/folder.gif" width="25" height="25" class="imgfolder" alt="No new posts" title="No new posts" /></a></span></div>
                <div class="description">
                    <span>
                                                <span class="title"><span class="genmed"></span><a href="viewtopic.php?t=23831402" class="topictitle">Coding help for a semi-beginner!</a></span>
                        <br />Description: need help updating my knowledge                                                <span class="pagination"></span>
                    </span>
                </div>
                <div class="topics">
                    <span>3</span>
                </div>
                <div class="posts">
                    <span><a href="profile.php?mode=viewprofile&amp;u=494136&amp;sid=a812f9f148a9f42c4ffbaffd9473baba">soupsmoke</a></span>
                </div>
                <div class="views">
                    <span>44</span>
                </div>
                <div class="last-post">
                    <span><a href="viewtopic.php?p=92967860" title="View latest post">Fri Nov 04, 2016 11:12 pm</a><br /><a href="profile.php?mode=viewprofile&amp;u=39648&amp;sid=a812f9f148a9f42c4ffbaffd9473baba">deadseason1</a> <a href="viewtopic.php?p=92967860"><img src="//img11.hardware.com/images/icon_latest_reply.gif" width="18" height="9" class="imgspace" alt="View latest post" title="View latest post" border="0" /></a></span>
                </div>
            </div>
        </div>
                        <div class="list-rows">
            <div class="topicrow">
                <div class="icon"><span><a href="viewtopic.php?t=23820810"><img src="//img11.hardware.com/images/folder.gif" width="25" height="25" class="imgfolder" alt="No new posts" title="No new posts" /></a></span></div>
                <div class="description">
                    <span>
                                                <span class="title"><span class="genmed"></span><a href="viewtopic.php?t=23820810" class="topictitle">web developer noobie looking to learn</a></span>
                                                                        <span class="pagination"></span>
#2
To split HTML, use regular expressions or the HtmlDoc class.

Macro Macro274
Code:
str s=
;;;;;;;;;;;;;;;<div class="short-description">
;;;;;;;;;;;;;;;;;;;;<span>Topics</span>
;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;<div class="topics">
;;;;;;;;;;;;;;;;;;;;<span>Replies</span>
;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;<div class="posts">
;;;;;;;;;;;;;;;;;;;;<span>Author</span>
;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;<div class="views">
;;;;;;;;;;;;;;;;;;;;<span>Views</span>
;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;<div class="last-post">
;;;;;;;;;;;;;;;;;;;;<span>Last Post</span>
;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;;;;;;;;;<div class="cat-row"><span>Important Announcements</span></div>
;;;;;;;;;;;;;;;;;;;;<div class="list-rows">
;;;;;;;;;;;;;;;;<div class="topicrow">
;;;;;;;;;;;;;;;;;;;<div class="icon"><span><a href="viewtopic.php?t=15678"><img src="//img11.hardware.com/images/folder_global_announce.gif" alt="This topic is locked: you cannot edit posts or make replies." title="This topic is locked: you cannot edit posts or make replies." /></a></span></div>
;;;;;;;;;;;;;;;;;;;;<div class="description">
;;;;;;;;;;;;;;;;;;;;;;;;<span>
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<span class="title"><span class="genmed"></span><a href="viewtopic.php?t=15678" class="topictitle">Rules! | Last updated: September 30th, 2016</a></span>
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<span class="pagination"></span>
;;;;;;;;;;;;;;;;;;;;;;;;</span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;<div class="topics">
;;;;;;;;;;;;;;;;;;;;;;;;<span>0</span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;<div class="posts">
;;;;;;;;;;;;;;;;;;;;;;;;<span><a href="profile.php?mode=viewprofile&amp;u=30&amp;sid=a812f9f148a9f42c4ffbaffd9473baba">Coolyou</a></span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;<div class="views">
;;;;;;;;;;;;;;;;;;;;;;;<span>3,946,618</span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;<div class="last-post">
;;;;;;;;;;;;;;;;;;;;;;;;<span><a href="viewtopic.php?p=52838" title="View latest post">Mon Jul 11, 2005 7:00 am</a><br /><a href="profile.php?mode=viewprofile&amp;u=30&amp;sid=a812f9f148a9f42c4ffbaffd9473baba">Coolyou</a> <a href="viewtopic.php?p=52838"><img src="//img11.hardware.com/images/icon_latest_reply.gif" width="18" height="9" class="imgspace" alt="View latest post" title="View latest post" border="0" /></a></span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;;;;;;;;;<div class="cat-row"><span>Sticky Topics</span></div>
;;;;;;;;;;;;;;;;;;;;<div class="list-rows">
;;;;;;;;;;;;;;;;<div class="topicrow">
;;;;;;;;;;;;;;;;;;;<div class="icon"><span><a href="viewtopic.php?t=1142400"><img src="//img11.hardware.com/images/folder_sticky.gif" width="25" height="25" class="imgfolder" alt="No new posts" title="No new posts" /></a></span></div>
;;;;;;;;;;;;;;;;;;;;<div class="description">
;;;;;;;;;;;;;;;;;;;;;;;;<span>
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<span class="title"><span class="genmed"></span><a href="viewtopic.php?t=1142400" class="topictitle">Go Here Before Asking Questions Or Ask For Help</a></span>
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<span class="pagination"> [ <img src="//img11.hardware.com/images/icon_minipost.gif" width="12" height="9" class="imgspace" alt="Goto page" title="Goto page" />Goto page: <a href="viewtopic.php?t=1142400">1</a> ... <a href="viewtopic.php?t=1142400">5</a>, <a href="viewtopic.php?t=1142400">6</a>, <a href="viewtopic.php?t=1142400">7</a> ] </span>
;;;;;;;;;;;;;;;;;;;;;;;;</span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;<div class="topics">
;;;;;;;;;;;;;;;;;;;;;;;;<span>101</span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;<div class="posts">
;;;;;;;;;;;;;;;;;;;;;;;;<span><a href="profile.php?mode=viewprofile&amp;u=284931&amp;sid=a812f9f148a9f42c4ffbaffd9473baba">MasterDKR</a></span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;<div class="views">
;;;;;;;;;;;;;;;;;;;;;;;<span>8,144</span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;<div class="last-post">
;;;;;;;;;;;;;;;;;;;;;;;;<span><a href="viewtopic.php?p=86702013" title="View latest post">Sun Apr 19, 2015 7:23 am</a><br /><a href="profile.php?mode=viewprofile&amp;u=3693006&amp;sid=a812f9f148a9f42c4ffbaffd9473baba">abhijim007</a> <a href="viewtopic.php?p=86702013"><img src="//img11.hardware.com/images/icon_latest_reply.gif" width="18" height="9" class="imgspace" alt="View latest post" title="View latest post" border="0" /></a></span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;;;;;;;;;<div class="cat-row"><span>Topics</span></div>
;;;;;;;;;;;;;;;;;;;;<div class="list-rows">
;;;;;;;;;;;;;;;;<div class="topicrow">
;;;;;;;;;;;;;;;;;;;<div class="icon"><span><a href="viewtopic.php?t=23831402"><img src="//img11.hardware.com/images/folder.gif" width="25" height="25" class="imgfolder" alt="No new posts" title="No new posts" /></a></span></div>
;;;;;;;;;;;;;;;;;;;;<div class="description">
;;;;;;;;;;;;;;;;;;;;;;;;<span>
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<span class="title"><span class="genmed"></span><a href="viewtopic.php?t=23831402" class="topictitle">Coding help for a semi-beginner!</a></span>
;;;;;;;;;;;;;;;;;;;;;;;;;;;;<br />Description: need help updating my knowledge                                                <span class="pagination"></span>
;;;;;;;;;;;;;;;;;;;;;;;;</span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;<div class="topics">
;;;;;;;;;;;;;;;;;;;;;;;;<span>3</span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;<div class="posts">
;;;;;;;;;;;;;;;;;;;;;;;;<span><a href="profile.php?mode=viewprofile&amp;u=494136&amp;sid=a812f9f148a9f42c4ffbaffd9473baba">soupsmoke</a></span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;<div class="views">
;;;;;;;;;;;;;;;;;;;;;;;<span>44</span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;<div class="last-post">
;;;;;;;;;;;;;;;;;;;;;;;;<span><a href="viewtopic.php?p=92967860" title="View latest post">Fri Nov 04, 2016 11:12 pm</a><br /><a href="profile.php?mode=viewprofile&amp;u=39648&amp;sid=a812f9f148a9f42c4ffbaffd9473baba">deadseason1</a> <a href="viewtopic.php?p=92967860"><img src="//img11.hardware.com/images/icon_latest_reply.gif" width="18" height="9" class="imgspace" alt="View latest post" title="View latest post" border="0" /></a></span>
;;;;;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;</div>
;;;;;;;;;;;;</div>
;;;;;;;;;;;;;;;;;;;;;;;;;;;;<div class="list-rows">
;;;;;;;;;;;;;;;;<div class="topicrow">
;;;;;;;;;;;;;;;;;;;<div class="icon"><span><a href="viewtopic.php?t=23820810"><img src="//img11.hardware.com/images/folder.gif" width="25" height="25" class="imgfolder" alt="No new posts" title="No new posts" /></a></span></div>
;;;;;;;;;;;;;;;;;;;;<div class="description">
;;;;;;;;;;;;;;;;;;;;;;;;<span>
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<span class="title"><span class="genmed"></span><a href="viewtopic.php?t=23820810" class="topictitle">web developer noobie looking to learn</a></span>
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<span class="pagination"></span>
;;;;;;;;;;;;;;;;;;;;;;;
ARRAY(str) a
str rx=
;<span class="title"><span class="genmed"></span><a href="(.+?)" class="topictitle">(.+?)</a></span>
if(0=findrx(s rx 0 4 a)) end "failed"
int i
for i 0 a.len
,out i
,out a[1 i]
,out a[2 i]
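
The same idea can be sketched outside QM. The Python version below is only an illustration, not part of the macro: it reuses the regular expression from the macro above with two capture groups (link and title) and collects every match. `SAMPLE` is a shortened copy of two title lines from the posted extract, and `TOPIC_RX` and `extract_topics` are hypothetical names chosen for this sketch.

```python
import re
from html import unescape

# Two title lines copied from the posted extract, shortened for the example.
SAMPLE = '''
<span class="title"><span class="genmed"></span><a href="viewtopic.php?t=15678" class="topictitle">Rules! | Last updated: September 30th, 2016</a></span>
<span class="pagination"></span>
<span class="title"><span class="genmed"></span><a href="viewtopic.php?t=23831402" class="topictitle">Coding help for a semi-beginner!</a></span>
'''

# Same pattern as in the macro: group 1 is the link, group 2 is the title.
# The lazy (.+?) stops at the first closing quote or tag instead of the last.
TOPIC_RX = re.compile(
    r'<span class="title"><span class="genmed"></span>'
    r'<a href="(.+?)" class="topictitle">(.+?)</a></span>'
)

def extract_topics(html):
    """Return a list of (href, title) pairs, one per topic title found."""
    return [(href, unescape(title)) for href, title in TOPIC_RX.findall(html)]

topics = extract_topics(SAMPLE)
for href, title in topics:
    print(href, "|", title)
```

A pattern this literal depends on the exact markup of the page; if the forum changes its template, an HTML parser is likely to be more robust than a regular expression.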
#3
Thank you, Gintaras - it works 100%!

