Offering "pageXurl" - a JS bookmark to extract web page URLs

Myxt
Posts: 4156
Joined: Sat Mar 05, 2011 6:18 am

Offering "pageXurl" - a JS bookmark to extract web page URLs

Post by Myxt » Sun Jul 10, 2016 6:19 am

[TL;DR]
- Go to _http://pastebin.com/ZEUMPgma and copy the long line of code (line 2). << UPDATED below
- Create a new browser bookmark, give it a name, and paste the code in the URL field.
- Open a web page and click the bookmark to get a report of URLs in the page.
- It's JavaScript, so it won't work unless JavaScript is enabled in your browser.
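
For anyone wondering what goes into the bookmark's URL field, here is a minimal sketch of how ordinary script source is usually packed into a `javascript:` URL. This is not how pageXurl itself was built; `toBookmarklet` is a hypothetical helper name, and the wrapping shown is just the common convention.

```javascript
// Hypothetical helper: wrap script source into a javascript: bookmark URL.
function toBookmarklet(source) {
  // The function wrapper keeps variables out of the page's global scope.
  // It returns undefined, so the browser stays on the current page.
  // The newline before the closing brace protects against a trailing // comment.
  const wrapped = '(function(){' + source + '\n})();';
  // Percent-encode so spaces, quotes and newlines survive in the URL field.
  return 'javascript:' + encodeURIComponent(wrapped);
}

console.log(toBookmarklet('alert(document.title)'));
```

Pasting the printed string into a bookmark's URL field and clicking the bookmark should show the current page's title.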
____

I have been avoiding this project for too long. Assuming that better coders than I must have solved this already, I occasionally tried numerous searches with numerous terms and landed repeatedly in swamps of irrelevant, over-specialized apps for mincing and modifying single URLs, scraping entire web sites, or swapping puppy pictures with pals.

What I wanted was an app to give me a clean report of URLs in the currently loaded web page. What I found were three browser extensions, two of which were nearly useless, leaving one which at least reported links in its own newly-created page.

That's "links": a rather tired joke arising from the ambiguity of lazy language. This app could only report hyperlinks from Anchor elements:
<a href="someURL">linkText</a>
whereas, IRL, web pages connect to many other kinds of resources such as:
<img src="someURL">

And I wanted it free. A week ago, I realized that the pain of manually sifting through page code exceeded the anticipated pain of coding my own solution; so I ransacked W3Schools' HTML and DOM references, passed a few impasses, and arrived at something I like.

Cons: It can't violate the modern "Same-Origin" rule to dig URLs out of CSS (such as for backgroundImage) or to insert itself into cross-site frames. It can't find your lost keys.

Pros: It's free. It works in Chrome, Firefox, Internet Explorer, and Opera. It doesn't require special browser permissions. It is not an app, extension, or plug-in that installs into or reconfigures the browser - it's just a bookmark URL that is 2078 bytes long. Anyone who can read JavaScript can verify that it's safe.
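
To illustrate the difference between anchor-only "links" and the wider set of URL-bearing elements mentioned above, here is a minimal sketch. It is not the pageXurl source; `collectUrls` is a hypothetical helper, and it assumes that looking at `href` and `src` attributes is enough for a first pass. Like the bookmark, it does not look inside CSS.

```javascript
// Illustrative sketch only; collectUrls is a hypothetical helper, not pageXurl's code.
function collectUrls(doc) {
  const byTag = new Map(); // tag name -> Set of absolute URLs
  // Match any element carrying a URL in href (a, link, area)
  // or in src (img, script, iframe, ...).
  for (const el of doc.querySelectorAll('[href], [src]')) {
    const raw = el.getAttribute('href') || el.getAttribute('src');
    if (!raw) continue;
    let url;
    try {
      url = new URL(raw, doc.baseURI).href; // resolve relative URLs against the page
    } catch (e) {
      continue; // skip values that do not parse as URLs
    }
    const tag = el.tagName.toLowerCase();
    if (!byTag.has(tag)) byTag.set(tag, new Set());
    byTag.get(tag).add(url);
  }
  return byTag;
}
```

In a browser console, `collectUrls(document)` should return the URLs grouped by element type, with duplicates within each group removed.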

Guest

RE: Offering "pageXurl" - a JS bookmark to extract web page URLs

Post by Guest » Sun Jul 10, 2016 7:11 am

Really interesting! Could I use it as an addon in my Android browser?

Myxt
Posts: 4156
Joined: Sat Mar 05, 2011 6:18 am

RE: Offering "pageXurl" - a JS bookmark to extract web page URLs

Post by Myxt » Sun Jul 10, 2016 8:03 am

Yes - and thanks for asking. That's why I placed it in uncontrolled space.

Of interest:

The last big hurdle - it took a day to solve - was that script tags or script elements (.createElement and .body.appendChild), added with document.write after window.open, behaved as if they did not exist.

Much searching returned many articles, some quite long, discussing obscure methods and timing, wishful thinking, and frustration - mainly focused on the notion of quirks in document.write, varying by browser and version.

The answer went back to "kindergarten" at W3Schools:
window.open > document.open > document.write > document.close
[facePalm] - which, BTW, nicely triggers body onload in the new page. The clue was that onload wasn't working.
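
The sequence above can be sketched as follows. This is a minimal illustration, not the actual pageXurl code; `openReport` is a hypothetical name, and the window object is passed in as a parameter only so the function can be tried outside a browser.

```javascript
// Hypothetical helper showing the order:
// window.open > document.open > document.write > document.close
function openReport(win, html) {
  const report = win.open('', '_blank'); // new blank window or tab
  if (!report) return null;              // e.g. a pop-up blocker refused it
  const doc = report.document;
  doc.open();      // start a fresh document stream
  doc.write(html); // markup, including any script tags and body onload
  doc.close();     // end the stream so the new page can finish loading
  return report;
}
```

In a browser it would be called as `openReport(window, '<body onload="alert(1)">report</body>')`.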

Guest

RE: Offering "pageXurl" - a JS bookmark to extract web page URLs

Post by Guest » Sun Jul 10, 2016 11:21 am

Thanks. It works perfectly with the original code. I put it in the WOT popup (Link button): _https://s20.postimg.org/wrrmmk5pp/device_2016_07_10_131022.png

Result:
_https://s20.postimg.org/7n0m95699/device_2016_07_10_131123.png
_https://s20.postimg.org/4hg0ixnn1/device_2016_07_10_131141.png

c۞g
Posts: 21225
Joined: Mon Jan 05, 2009 4:02 am

RE: Offering "pageXurl" - a JS bookmark to extract web page URLs

Post by c۞g » Sun Jul 10, 2016 12:39 pm

Nice job!
Added to my FF.

I have Chris Pederick's Web Developer extension (since its initial release)

http://chrispederick.com/work/web-developer/

If you open "Information > View link information" (it opens in a new tab), it displays all links in a page, including image links, but it does not categorize them as nicely.

Myxt
Posts: 4156
Joined: Sat Mar 05, 2011 6:18 am

RE: Offering "pageXurl" - a JS bookmark to extract web page URLs

Post by Myxt » Sun Jul 10, 2016 4:24 pm

Thanks for reminding me of Web Developer - I see that my crude extraction method skips mailto links.

Myxt
Posts: 4156
Joined: Sat Mar 05, 2011 6:18 am

UPDATE

Post by Myxt » Wed Jul 13, 2016 7:15 am

Updated pageXurl - call it v1.2 - 2959 bytes. Get it from _http://pastebin.com/FKvQpqij

It's more polished with better hostname resolution, and it adds links to Same-Origin iframes so you can get reports on them as well.

About the Same-Origin rule:

For some siteA.com/outer.html with an embedded iframe page sourced from siteA.com/inner.html, scripts within either the Parent or Child page can access and modify the content of the other page because both have the same origin, which is siteA.com.

But if siteA.com/outer.html has an embedded iframe page sourced from siteB.com/inner.html, modern browsers forbid scripts within either page to even access the other - such as to read content - let alone modify it. The security advantage is obvious: If you want to steal (e.g.) Google login credentials, you will have to fabricate your own phishing page because you cannot simply embed the real login page and tinker with it.

If you are experimenting locally - e.g. at _file:///C:/Users/you/Desktop/outer.html (and inner.html) - Firefox and Internet Explorer consider _file:///C:/Users/you/Desktop as a valid Same-Origin; but Chrome and Opera consider that location as null and (somehow) therefore not Same-Origin - even though they know the precise location of the pages they are showing you at this very moment.
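
The comparison behind the rule can be sketched with the standard URL API. This only illustrates how two origins compare; it is not how pageXurl or any browser enforces the rule, and `sameOrigin` is a hypothetical helper name. Treating the opaque "null" origin as never matching is an assumption chosen to mirror the Chrome and Opera behavior described above.

```javascript
// Hypothetical helper: do two absolute URLs share an origin
// (scheme + host + port)?
function sameOrigin(a, b) {
  const originA = new URL(a).origin;
  const originB = new URL(b).origin;
  // Opaque origins serialize as the string "null". Comparing the strings
  // directly would wrongly report two such URLs as same-origin.
  if (originA === 'null' || originB === 'null') return false;
  return originA === originB;
}

console.log(sameOrigin('https://siteA.com/outer.html', 'https://siteA.com/inner.html')); // true
console.log(sameOrigin('https://siteA.com/outer.html', 'https://siteB.com/inner.html')); // false
console.log(sameOrigin('http://siteA.com/outer.html', 'https://siteA.com/inner.html'));  // false
```

Note that the scheme is part of the origin, so the http and https versions of the same host do not match.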
