How to save threads to your hard drive

I often find threads in internet forums so interesting that I want to keep them for later use. I want the threads exactly as I see them today, with all pictures included.
There are far too many threads out there that are unusable now because all pictures are missing. A lot of knowledge is lost this way.

I found a solution that works for me with Firefox and Chrome.

  • I use a plugin to load the complete thread as one page. Pages are just added under each other, no reformatting.
  • Then optionally I use another plugin to delete stuff I do not want to save (like repetitive page headers/footers).
  • Finally I save the page as one single MHTML file.

Here is my toolchain for Mozilla Firefox:

  1. PageZipper (from PrintWhatYouLike.com)
  2. Nuke Anything Enhanced - Old but still working.
  3. Mozilla Archive Format - Use the similar UnMHT plugin when it is broken.

Similar toolchain for Chrome:

  1. PageZipper
  2. Click to Remove Element. (Dynamite - did not work for me, may try again.)
  3. Activate the ’MHTML Save’ feature in chrome://flags, then save with ’Save as Single File’. This no longer works. See the setup procedure in post #8.

Some tricks and traps:

  • Forums need a ‘Next’ link or PageZipper will not work. Many do, some do not.
  • Collecting pages is not completely automatic. You have to scroll down the page until the page counter (like 5/6) shows the correct number, or click the PageZipper next button to step forward through the pages.
  • On a few forums PageZipper tends to ‘overshoot’ and show more pages than there are. I do not know why. CPF is one.
  • In very long threads (my feeling is it starts at 40 pages) sometimes PageZipper (in FF) leaves single pages empty. Try again tomorrow.
  • Switch off PageZipper when it is done so it will not show on the saved page.
  • Nuke Anything Enhanced shows a frame around the element that it will delete if you hover the mouse over it for a second. I found that feature accidentally after months of use. (Real men do not read manuals.)
  • Click to Remove Element is handy for killing advertising, even when just browsing webpages.
  • Pictures that cannot be saved by hand (like the ones still working on Photobucket) cannot be saved after PageZipper in Firefox, but they can be in Chrome. Firefox tries to reload all pictures in order to save them, and Photobucket does not allow direct downloads (it sends HTML code instead).
  • Chrome saves pictures directly from cache. Much faster and no surprises.
  • In MHTML files, media files are encoded as MIME Base64. I have not tried it, but there should be tools to extract single files or even modify the main HTML file.
  • Update: ’Save As MHT’ works much better than ’Save As MHTML’. It saves long filenames and avoids the security warning for this file extension by using capital letters.
  • Chrome is much faster than Firefox and rock solid. I tested it on the ‘What did you mod today?’ thread with almost 3250 posts. The saved file is 1.1 GB. Found no browser that can open it…
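Regarding the MIME encoding mentioned in the list above: since an MHTML file is basically a multipart MIME message, Python's standard email module should be able to unpack it. The following is only a minimal sketch, not a tested tool for large files. The function names extract_mhtml_parts and write_parts and the output file naming are made up for this illustration.

```python
import email
import mimetypes
from pathlib import Path


def extract_mhtml_parts(mhtml_bytes):
    """Return a list of (content_type, content_location, payload_bytes).

    The MHTML file is parsed as a multipart MIME message. Base64 and
    quoted-printable parts are decoded by get_payload(decode=True).
    """
    message = email.message_from_bytes(mhtml_bytes)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # container parts carry no payload of their own
        payload = part.get_payload(decode=True)
        location = part.get("Content-Location", "")
        parts.append((part.get_content_type(), location, payload))
    return parts


def write_parts(parts, out_dir):
    """Write every part to out_dir as part_0000.html, part_0001.png, ...

    The naming scheme is an assumption for this sketch; the original
    URL of each part is only available in its Content-Location field.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, (content_type, _location, payload) in enumerate(parts):
        extension = mimetypes.guess_extension(content_type) or ".bin"
        target = out / f"part_{index:04d}{extension}"
        target.write_bytes(payload or b"")
        written.append(target)
    return written
```

Usage would be something like write_parts(extract_mhtml_parts(Path("thread.mht").read_bytes()), "thread_files"), where both names are placeholders. The whole file is read into memory, so this is probably not a good idea for a 1.1 GB file.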

Here is a tough test thread: Review: Convoy S2+ (the red one!) – Fantastic 18650 EDC

This is very cool TBone, thanks for sharing it.

Sean

Wow nice
I sometimes print a page to PDF but this is better with more pages of a thread possible
Thanks!

Works Good Thanks

John.

Small updates.

Thanks for the bump. I missed this in July. I tested PageZipper (Chrome) on the GT thread. That’s one loooong page!

New installation procedure for Chrome.

Google removed the flag for the experimental ‘MHTML Save’ feature in version 75. The feature is still there, but you now always have to start Chrome with a command-line parameter to enable it.

Here is how to set this up (only once):

  1. Right-click the Chrome shortcut on the desktop.
  2. Select “Properties”.
  3. In the “Target” field, after "...chrome.exe" add --save-page-as-mhtml (with a space in front of it, after the closing ").
  4. Click "Apply" and give administrator permission by clicking "Continue".
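
If you do not want to edit the shortcut, the same parameter can be passed from a small script. This is only a sketch: the switch --save-page-as-mhtml is the one from the steps above, but the Chrome path is an assumption (a typical Windows install location) and the function names are made up for this example.

```python
import subprocess

# The switch named in the setup steps above.
MHTML_SWITCH = "--save-page-as-mhtml"

# Assumption: a typical Windows install location. Adjust for your system.
ASSUMED_CHROME_PATH = r"C:\Program Files\Google\Chrome\Application\chrome.exe"


def build_chrome_command(chrome_path=ASSUMED_CHROME_PATH, url=None):
    """Build the argument list: the executable, the switch, optionally a URL."""
    command = [chrome_path, MHTML_SWITCH]
    if url:
        command.append(url)
    return command


def launch_chrome(chrome_path=ASSUMED_CHROME_PATH, url=None):
    """Start Chrome with the MHTML switch and return the process handle."""
    return subprocess.Popen(build_chrome_command(chrome_path, url))
```

Calling launch_chrome() starts the browser with the switch set. As far as I know the switch only takes effect when no other Chrome window is already running, so close Chrome first.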