extract links (a href => zip only)
#1
I used the menu on top of the Quick Macros window to:

- Find HTML element, wait
and
- HTML element actions

Both work for me perfectly to target single elements.
But if I have a very simple webpage with only 3 links, like this (location example: http://www.testserver1234.com/testpage.html) :

Code:
.
.
<P><A href="http://dir1/subdir1/file1.zip"><STRONG>Part 1</STRONG></A></P>
<P><A href="http://dir2/subdir2/test.txt"><STRONG>text file</STRONG></A></P>
<P><A href="http://dir3/subdir3/file2.zip"><STRONG>Part 2</STRONG></A></P>
.
.

How do I extract only the .zip links?

Code:
http://dir1/subdir1/file1.zip
http://dir3/subdir3/file2.zip

Edit: I forgot to mention that the HTML example above will be running in Internet Explorer 9, and I do all data manipulation, extracting, clicking, etc. through the browser (IE9) with the help of Quick Macros, of course.
#2
Member function Htm.GetLinks_
Code:
function [flags] [ARRAY(str)&aURL] [ARRAY(str)&aText] [ARRAY(MSHTML.IHTMLElement)&aElem] ;;flags: 0 inner links, 1 all links, 2 itself is link

;Gets links.

;flags:
;;;0 get links that are inside this element (except from inner frames/iframes).
;;;1 get all links in element's container document (which may be frame/iframe).
;;;2 get properties of this element (it should be a link).
;aURL - receives href attribute of links. It is the full URL, not relative as in HTML source. Can be 0 if you don't need it.
;aText - receives text of links. Can be 0 if you don't need it.
;aElem - receives COM interface of links. You can call its functions to get URL, text and other attributes. Can be 0 if you don't need it.

;EXAMPLE
;int w=win("Windows Internet Explorer" "IEFrame")
;Htm e=htm("BODY" "" "" w "" 0 0x20)
;ARRAY(str) a at; int i
;e.GetLinks_(1 a at)
;for i 0 a.len
,;out F"{at[i]%%-35s}  {a[i]}"


if(!el) end ERR_INIT

if(&aURL) aURL=0
if(&aText) aText=0
if(&aElem) aElem=0

MSHTML.IHTMLElementCollection links
MSHTML.IHTMLElement link
int src=flags&3
sel src
,case 0 links=el.all.tags("A") ;;info: gets <a> without href
,case 1 links=el.document.links ;;info: does not get <a> without href
,case 2 link=el; goto g1
,case 3 end "flag 3"

foreach link links
,;g1
,str href
,if &aURL or src!1
,,href=link.getAttribute("href" 0); err continue
,,if(!href.len) continue
,if(&aURL) aURL[]=href
,if(&aText) aText[]=link.innerText; err aText[]=_s
,if(&aElem) aElem[]=link

err+ end _error

;note: tried to add flag 3 to get links from selection, but it is difficult and unreliable, and does not work with all pages.
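
To keep only the .zip links asked about in post #1, the aURL array filled by GetLinks_ can be filtered after the call. This is a minimal sketch, not part of the function above. It assumes the page is open in Internet Explorer as in the EXAMPLE, and that str.endi is available as a case-insensitive "ends with" test. A URL with a query string after ".zip" would not match.

```
int w=win("Windows Internet Explorer" "IEFrame")
Htm e=htm("BODY" "" "" w "" 0 0x20)
ARRAY(str) a; int i
e.GetLinks_(1 a) ;;flag 1: all links in the document; only URLs are requested
for i 0 a.len
,if(a[i].endi(".zip")) out a[i] ;;print only URLs that end with ".zip" (assumed str.endi)
```

With the three-link test page from post #1, this should print only the two .zip URLs and skip test.txt.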
#3
wow! Thank you very very much!!!
#4
Sorry to ask this, but I have too many problems with Internet Explorer.
Your script works perfectly, but some sites just don't behave well in IE8 and IE9 (I get blank pages or freezing, so your script doesn't even get a chance to work).

Is there any chance the code you have given can be adjusted for Firefox?


edit:
Could you also give an example of how to delete duplicate array items? I have an array of strings called "a",
and I want to remove all duplicate items from it.

Code:
a[0]="abc"
a[1]="abc"
a[2]="abc"
a[3]="cde"

I currently use this:
Macro Macro7
Code:
int i
ARRAY(str) a

for i 0 a.len
,if(i>1)
,,if(a[i]=a[i-1])
,,,a[i]=""

But this does not work well; I end up with:
Code:
abc

abc
cde
I have to remove the empty lines and redo my code.
#5
Tested with Firefox, but it should work with all browsers; you just need to find the correct root web page object.
Macro Macro1817
Code:
out
int w=win("Firefox" "Mozilla*WindowClass" "" 0x804)
Acc a.Find(w "DOCUMENT" "" "" 0x3010 2)
ARRAY(Acc) aa; int i
a.GetChildObjects(aa -1 "LINK")
for i 0 aa.len
,out "%-35s %s" aa[i].Name aa[i].Value
#6
Macro Macro1820
Code:
ARRAY(str) a="abc[]abc[]abc[]cde"

int i j
for i a.len-1 0 -1
,for(j 0 i) if(a[j]=a[i]) break
,if(j<i) a.remove(i)

out a
#7
Faster version.

Function FirefoxGetLinks
Code:
;/
function hwnd [ARRAY(str)&aURL] [ARRAY(str)&aText] [ARRAY(Acc)&aObj]

;Gets links in Firefox.

;hwnd - Firefox window handle.
;aURL - receives href attribute of links. It is the full URL, not relative as in HTML source. Can be 0 if you don't need it.
;aText - receives text of links. Can be 0 if you don't need it.
;aObj - receives accessible object of links. Can be 0 if you don't need it.

;EXAMPLE
;out
;int w=win("Firefox" "Mozilla*WindowClass" "" 0x804)
;ARRAY(str) a at; int i
;FirefoxGetLinks w a at
;for i 0 a.len
,;out F"{at[i]%%-35s} {a[i]}"


if(&aURL) aURL=0
if(&aText) aText=0
if(&aObj) aObj=0

FFNode f.FindFF(hwnd "A" "" "" 0 0 0 &__FGL_Proc &hwnd)

err+ end _error
Function __FGL_Proc
Code:
;/
function# FFNode&x level *p FFNODEINFO&ni

ARRAY(str)& aURL=+p[1]
ARRAY(str)& aText=+p[2]
ARRAY(Acc)& aObj=+p[3]

Acc a.FromFFNode(x)

_s=a.Value
if(!_s.len) ret 1

if(&aURL) aURL[]=_s
if(&aText) aText[]=a.Name
if(&aObj) aObj[]=a

err+
ret 1
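
The pieces of this thread can be combined: get the links with FirefoxGetLinks, remove duplicates with the loop from post #6, then print only the .zip URLs. This is only a sketch under stated assumptions: Firefox is running and matches the win() call from the EXAMPLE, and str.endi is assumed to be a case-insensitive "ends with" test.

```
int w=win("Firefox" "Mozilla*WindowClass" "" 0x804)
ARRAY(str) a; int i j
FirefoxGetLinks w a ;;only URLs are requested

;remove duplicates, scanning from the end (same approach as in post #6)
for i a.len-1 0 -1
,for(j 0 i) if(a[j]=a[i]) break
,if(j<i) a.remove(i)

;print only the .zip links (assumed str.endi)
for i 0 a.len
,if(a[i].endi(".zip")) out a[i]
```

Removing items while looping backwards avoids the empty strings left by the attempt in post #4, because no placeholder is written and the indexes of items not yet visited do not shift.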
#8
PERFECT!! THANK YOU!!!!
#9
Is this still working in the latest version of QM?


Gintaras Wrote:Faster version.

#10
It works. If it doesn't, maybe you are using portable Firefox; then the xxxFF functions don't work, and you need to add something to the registry.
FireFox portable and FireFox functions (FirefoxGetTabs, ..)

