Actually, even that one is messed up a bit. But there are lots you could try.
iPhone 6S Plus running the latest iOS, using the feature in Safari called “Create PDF”:
That seems to do a screenshot to a graphic file and then drop that into a PDF. That should be easy. But I can do that too, in an instant.
What seems to mess up printing from the HTML page is CSS, and the apparent fact that you can have one lot of CSS to display the page and another lot which becomes active for printing it. That’s my very limited understanding, anyway… my website development has always been pages free of style sheets which “just work”. But if you do a screenshot, that bypasses all that stuff and you get exactly what was on the screen. You just can’t copy/paste any text from it… well, not unless you do OCR on it, which the full version of Acrobat can do, too…
It is all very possible and I have done it for this forum many times. For one page I can just pull up FastStone screen capture (one of the best bits of software I’ve ever had and a massive timesaver), and either (a) copy/paste the image into the forum or (b) print the image to a PDF and drop that in.
It gets tedious for ads which span 2 or 3 pages, because then one gets several images which need to be dropped into a PDF. This is also fiddly to automate, I suspect, because you would need to “scroll down” and take another screenshot. But sometimes, with a small ad, you won’t need to do that.
If I was doing this and I knew how, or if I was paying someone to do it (and EuroGA donations so far, very generous though they have been, don’t really cover paying anybody to write code) I would find a way to cut through the style sheet crap which is screwing up the advert display when “printing”, and print the HTML to a PDF. Then you don’t need to do OCR because you have the actual text in the PDF, and you get a nice compact PDF. EuroGA already has 40GB of images sitting on the server and this will only keep getting bigger (btw, the current space is 160GB, so not an issue).
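One way to “cut through” the print style sheets, sketched in Python: fetch the ad’s HTML, throw away anything that only applies when printing, and then print the cleaned page to PDF. This is only a rough sketch under assumptions. The function names are made up for illustration, the regular expressions handle only the common ways of declaring print CSS (a `media="print"` attribute and `@media print { … }` blocks), and rules inside external style sheet files are not touched. If the site’s normal CSS is restricted to `media="screen"`, the printed page would still come out unstyled.

```python
import re

# <link ... media="print" ...> tags that pull in a print-only style sheet.
PRINT_LINK = re.compile(
    r'<link\b[^>]*\bmedia\s*=\s*(["\'])print\1[^>]*>',
    re.IGNORECASE,
)

# <style media="print"> ... </style> blocks.
PRINT_STYLE = re.compile(
    r'<style\b[^>]*\bmedia\s*=\s*(["\'])print\1[^>]*>.*?</style>',
    re.IGNORECASE | re.DOTALL,
)

# Start of an inline "@media print ... {" block.
MEDIA_PRINT_OPEN = re.compile(r'@media\s+print\b[^{]*\{', re.IGNORECASE)


def drop_media_print_blocks(text: str) -> str:
    """Remove every '@media print { ... }' block, matching nested braces."""
    out = []
    pos = 0
    while True:
        match = MEDIA_PRINT_OPEN.search(text, pos)
        if match is None:
            out.append(text[pos:])
            break
        out.append(text[pos:match.start()])
        # Walk forward until the brace opened by '@media print {' is closed.
        depth = 1
        cursor = match.end()
        while cursor < len(text) and depth > 0:
            if text[cursor] == "{":
                depth += 1
            elif text[cursor] == "}":
                depth -= 1
            cursor += 1
        pos = cursor
    return "".join(out)


def strip_print_css(html: str) -> str:
    """Return the HTML with print-only CSS removed (crude, regex based)."""
    html = PRINT_LINK.sub("", html)
    html = PRINT_STYLE.sub("", html)
    return drop_media_print_blocks(html)
```

The cleaned HTML could then be saved to a file and printed to PDF from a browser, which keeps the real text in the PDF, so no OCR is needed.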
Extracting PDF files from websites is a tricky thing, as not all of them provide an appropriate CSS print template. You could define your own and convert the data you pull, but this means not only pulling and storing, but also processing foreign data. If you pull and process, you may have to use PDF/A (the archival PDF profile) to ensure the files stay readable for a long time.
Peter wrote:
Unfortunately their storage policies are very patchy, to say the least.
If you are referring to the pages that their crawler decides to actively visit and archive as part of their “Wayback Machine”, then that has nothing to do with the mentioned “Save page now” function from here:
If it gives you a URL, then that will be guaranteed to be archived. It saves the actual HTML/CSS and comes as close to what you want to do as possible.
BTW, they also accept donations ;)
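For anyone wanting to script the “Save page now” step, a minimal Python sketch follows. It assumes the commonly used form of the endpoint, where the page address is appended to `https://web.archive.org/save/`. That prefix is not given in this thread, so treat it as an assumption, and the function names are invented for illustration. What exactly the service sends back, and any rate limits, are not verified here.

```python
import urllib.request

# Assumption: the address to archive is appended to this prefix.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(page_url: str) -> str:
    """Build the address that asks the archive to capture page_url."""
    return SAVE_PREFIX + page_url.strip()


def request_snapshot(page_url: str, timeout: float = 60.0) -> str:
    """Ask for a capture and return the final URL of the response.

    Needs network access. Whether the returned URL is the archived
    copy is not checked by this sketch.
    """
    with urllib.request.urlopen(save_page_now_url(page_url), timeout=timeout) as resp:
        return resp.geturl()
```

Building the address is kept separate from the network call so the first part can be checked without touching the archive.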
I now have someone working on a tool for doing this.
Um, isn’t it as simple as “print to PDF”, with the additional caveat that planecheck uses frames, so one must “print to PDF” the frame with the ad?
On Planecheck – find an ad you like, right-click in the centre (e.g. on the plane registration), “this frame”, “print frame”, “to file”, give the file a name, voilà.
Should work for any other site, I would think…
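The “print this frame” steps could also be scripted, roughly like this: read the outer page, pick out the addresses of its frames, and print the one holding the ad on its own. The Python sketch below uses only the standard library for the frame lookup. The class and function names are hypothetical, the example address in the usage note is made up, and the browser binary name `chromium` is an assumption that differs between systems (`--headless` and `--print-to-pdf` are Chrome/Chromium command-line switches).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameFinder(HTMLParser):
    """Collect the absolute URLs of all <frame> and <iframe> elements."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Frame addresses are often relative to the outer page.
                self.frames.append(urljoin(self.base_url, src))


def frame_urls(html: str, base_url: str) -> list:
    """Return the frame URLs found in html, in document order."""
    finder = FrameFinder(base_url)
    finder.feed(html)
    return finder.frames


def print_to_pdf_command(url: str, out_path: str, browser: str = "chromium") -> list:
    """Build (but do not run) a headless-browser command that prints url to a PDF."""
    return [browser, "--headless", "--print-to-pdf=" + out_path, url]
```

A usage sketch: `frame_urls(outer_html, "http://www.example.com/index.html")` gives the list of frame addresses, and the one containing the ad can be passed to `print_to_pdf_command(...)` and run with `subprocess.run`.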
Which browser are you using and on which platform? In Chrome I see this
Planecheck has a little “print aircraft” button (top left) which produces a text-only listing like this
Firefox 65 on Ubuntu; the same works in FF for Windows, but one might need to add a “print to PDF” solution.
Chrome is, indeed, lacking in this department.
I have a tool for doing this now – Planecheck at least. Example: