Now how do I get all the owners of good soups to hand me their RSS link? 🤔
@rixx good news: found the password for my soup account again. bad news: where is the link to the settings where I can see my RSS link? (It seems some JS or CSS doesn't load because of cross-origin stuff...?)
@daniel_bohrer Alternatively, a naive scraper will probably net you even more images, because some people used to post images as text posts with links instead of image posts. But with their current response times, you'd just have to hope to get done before it's over.
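A minimal sketch of what such a naive scraper could look for, using only the Python standard library. It collects both `<img>` sources and plain links that point at image files, which would cover the text-post case. The class name, function name, and suffix list are illustrative assumptions; the actual markup of a soup page is not known here.

```python
from html.parser import HTMLParser

# Assumed set of image file endings; extend as needed.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")


class ImageLinkCollector(HTMLParser):
    """Collect URLs from <img src=...> and from <a href=...> ending in an image suffix."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            # Regular image posts.
            self.urls.append(attrs["src"])
        elif tag == "a":
            # Text posts that merely link to an image.
            href = attrs.get("href") or ""
            path = href.lower().split("?")[0]  # ignore any query string
            if path.endswith(IMAGE_SUFFIXES):
                self.urls.append(href)


def collect_image_urls(html):
    """Return image URLs found in an HTML string, in document order."""
    parser = ImageLinkCollector()
    parser.feed(html)
    return parser.urls
```

Fetching the pages themselves is left out on purpose, since that is the part that depends on the site's response times.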
@rixx I already tried wget --mirror, but it obviously doesn't load the JS and execute it, so no content is loaded and therefore no content is downloaded at all :-/
@daniel_bohrer Oh huh, that's weird. Did you use a custom domain? JS execution should not be necessary at all.
@rixx Yeah, that's what I remember too, but even after login I cannot see this button.
Ah well, maybe I should just let it go.
@rixx maybe that would help. I've tried the RSS export now with the soup-backup script, but it always times out… The other option was wget --mirror --page-requisites, which only gave me the index.html without any images...
@daniel_bohrer https://drop.rixx.de/wkE/ with `pip install requests beautifulsoup4`. You'll have to make a "data" directory and touch some files in there first, though, and of course replace "rixx". It should put a bunch of files with URLs in the data directory (set up so that you can start downloading while it's still scraping). The files are in the format "<url> <post_id>" to help you retain some sort of ID/ordering.
@daniel_bohrer It does a vague sort of resume on error, though you might end up with some duplicate images in any case. `uniq` if you care, etc.
Doesn't download images because piping into curl is probably the best thing to do here.
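For anyone who would rather stay in Python than pipe into curl, here is a rough sketch of the download step. It assumes only what is described above: a "data" directory of text files with one "<url> <post_id>" pair per line. The function names, the output file naming, and the 30-second timeout are made up for illustration and are not part of the linked script.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def read_entries(data_dir):
    """Yield unique (url, post_id) pairs from every file in data_dir.

    Duplicate URLs are skipped, similar in spirit to running `uniq`.
    """
    seen = set()
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue
        for line in path.read_text().splitlines():
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            url, post_id = parts
            if url in seen:
                continue
            seen.add(url)
            yield url, post_id


def download_all(data_dir, out_dir):
    """Download each unique URL to out_dir, prefixing the file name with the post ID."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url, post_id in read_entries(data_dir):
        name = Path(urllib.parse.urlsplit(url).path).name
        target = out / f"{post_id}_{name}"
        if target.exists():
            continue  # already fetched on an earlier run
        with urllib.request.urlopen(url, timeout=30) as resp, open(target, "wb") as fh:
            shutil.copyfileobj(resp, fh)
```

Something like `download_all("data", "images")` can be run repeatedly while the scraper is still writing, since files that already exist are skipped.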
@daniel_bohrer I just improved the script to be less annoying, gonna publish in like five minutes, if you're still interested.
@rixx mine times out with a 503 when you request it, unfortunately; it doesn’t matter how often or when you try