Compare commits


633 commits

Author SHA1 Message Date
Dominika 0526e2add4 * Change LGPL to Unlicense 2021-10-23 01:11:44 +02:00
Lauren Liberda ab1904e854 fix/speedup ci 2021-10-23 01:06:55 +02:00
Lauren Liberda f1bd7ee019 vider support 2021-10-23 01:06:55 +02:00
Lauren Liberda 0b617cb3ed [polskieradio] fix PR4 audition shit 2021-10-23 01:06:55 +02:00
Lauren Liberda 9638569ee0 [ipla] state the DRM requirement clearly 2021-10-23 01:06:55 +02:00
Lauren Liberda db05777f80 [ipla] error handling 2021-10-23 01:06:55 +02:00
Dominika Liberda ed2d212864 * version 2021.08.01 2021-10-23 01:06:55 +02:00
Lauren Liberda 5cc47c2fde [youtube] fix age gate for *some* videos 2021-10-23 01:06:55 +02:00
Lauren Liberda a9e6daf0d8 [peertube] pt 3.3+ url scheme support, fix tests, minor fixes 2021-10-23 01:06:55 +02:00
Lauren Liberda 07b309368f [niconico] dmc downloader and other stuff from yt-dlp (as of 40078a5) 2021-10-23 01:06:55 +02:00
Dominika Liberda 18b5da3114 * version 2021.06.24.1 2021-10-23 01:06:55 +02:00
Dominika Liberda 0872e0c334 * fixes crash if signature decryption code isn't packed with artifacts 2021-10-23 01:06:55 +02:00
Dominika Liberda 23a00ac4b8 * fix in release script 2021-10-23 01:06:55 +02:00
Dominika Liberda 4223117be9 * version 2021.06.24 2021-10-23 01:06:55 +02:00
Dominika Liberda 937a597095 * fixes youtube list extractor 2021-10-23 01:06:55 +02:00
Lauren Liberda 1fbc3083b5 fix app crash/tests 2021-10-23 01:06:55 +02:00
Lauren Liberda f03d9efbb8 [liveleak] remove for real 2021-10-23 01:06:55 +02:00
Lauren Liberda 679db44560 [soundcloud] prerelease client id fetching 2021-10-23 01:06:55 +02:00
Lauren Liberda 5f8b81c6e7 prerelease artifact generator, for youtube sig 2021-10-23 01:06:55 +02:00
Lauren Liberda 17436014c9 [liveleak] remove extractor 2021-10-23 01:06:55 +02:00
Lauren Liberda b5b2163730 [pornhub] Add support for pornhubthbh7ap3u.onion
Original author: dstftw <dstftw@gmail.com>
2021-10-23 01:06:55 +02:00
Sergey M. 68dd52b3bf [pornhub] Detect geo restriction 2021-10-23 01:06:55 +02:00
Sergey M. b717cae5d2 [pornhub] Dismiss tbr extracted from download URLs (closes #28927)
No longer reliable
2021-10-23 01:06:55 +02:00
Sergey M. 7e9f0bdc0c [curiositystream:collection] Extend _VALID_URL (closes #26326, closes #29117) 2021-10-23 01:06:55 +02:00
Tianyi Shi 4645070227 [bilibili] Strip uploader name (#29202) 2021-10-23 01:06:55 +02:00
Logan B 87e25ce47f [umg:de] Update GraphQL API URL (#29304)
Previous one no longer resolves

Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-10-23 01:06:55 +02:00
Sergey M. 3d4581e17e [nrk] Switch psapi URL to https (closes #29344)
Catalog calls no longer work via http
2021-10-23 01:06:55 +02:00
kikuyan b19b66e927 [postprocessor/ffmpeg] Show ffmpeg output on error (refs #22680) (#29336) 2021-10-23 01:06:55 +02:00
kikuyan 11f62f6f41 [egghead] Add support for app.egghead.io (closes #28404) (#29303)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-10-23 01:06:55 +02:00
kikuyan 2d37e073fc [appleconnect] Fix extraction (#29208) 2021-10-23 01:06:55 +02:00
kikuyan 30bc6f7e13 [orf:tvthek] Add support for MPD formats (closes #28672) (#29236) 2021-10-23 01:06:55 +02:00
Sergey M. 7855193169 [facebook] Improve login required detection 2021-10-23 01:06:54 +02:00
Sergey M. ba41af65d6 [youporn] Fix formats and view count extraction (closes #29216) 2021-10-23 01:06:54 +02:00
Sergey M. 8b43ba3b9e [orf:tvthek] Fix thumbnails extraction (closes #29217) 2021-10-23 01:06:54 +02:00
Remita Amine 404f3b78b0 [formula1] fix extraction(closes #29206) 2021-10-23 01:06:54 +02:00
Lauren Liberda 3cd3edc116 [youtube] fix the fancy georestricted error 2021-10-23 01:06:54 +02:00
Dominika Liberda 09c16b979a * version 2021.06.20 2021-10-23 01:06:54 +02:00
Lauren Liberda 7ac2178507 update changelog 2021-10-23 01:06:54 +02:00
Dominika Liberda 619d36e26c * fixes agegate on youtube 2021-10-23 01:06:54 +02:00
Lauren Liberda f52386cd5c [youtube] cleanup, speed up age-gated extraction, fix videos with js-like syntax 2021-10-23 01:06:54 +02:00
Lauren Liberda ae124bda4d [options] fix playwright headlessness behavior 2021-10-23 01:06:54 +02:00
Lauren Liberda 7340d055f6 [playwright] option to force a specific browser 2021-10-23 01:06:54 +02:00
Lauren Liberda dcb83a2e63 [tiktok] fix empty video lists
I'm fucking stupid
2021-10-23 01:06:54 +02:00
Lauren Liberda 26d3345641 [playwright] simplify code 2021-10-23 01:06:54 +02:00
Dominika Liberda 5e59a0c68d * version 2021.06.01 2021-10-23 01:06:54 +02:00
Lauren Liberda fa2c96dbf7 update changelog 2021-10-23 01:06:54 +02:00
Sergey M. 0985383759 [ard] Relax _VALID_URL and fix video ids (closes #22724, closes #29091) 2021-10-23 01:06:54 +02:00
Sergey M. 8b33577012 [ustream] Detect https embeds (closes #29133) 2021-10-23 01:06:54 +02:00
Sergey M. c54b1f98ec [ted] Prefer own formats over external sources (closes #29142) 2021-10-23 01:06:54 +02:00
Sergey M. a5baa644c0 [twitch:clips] Improve extraction (closes #29149) 2021-10-23 01:06:54 +02:00
phlip 8603229d45 [twitch:clips] Add access token query to download URLs (closes #29136) 2021-10-23 01:06:54 +02:00
Remita Amine c1544c413a [vimeo] fix vimeo pro embed extraction(closes #29126) 2021-10-23 01:06:54 +02:00
Remita Amine da145ccc36 [redbulltv] fix embed data extraction(closes #28770) 2021-10-23 01:06:54 +02:00
Remita Amine c6db7563e2 [shahid] relax _VALID_URL(closes #28772, closes #28930) 2021-10-23 01:06:54 +02:00
Sergey M. a757ab0382 [playstuff] Add extractor (closes #28901, closes #28931) 2021-10-23 01:06:54 +02:00
Sergey M. 500d0ac319 [eroprofile] Skip test 2021-10-23 01:06:54 +02:00
Sergey M. 39722f43cc [eroprofile] Fix extraction (closes #23200, closes #23626, closes #29008) 2021-10-23 01:06:54 +02:00
kr4ssi 8bc41b335b [vivo] Add support for vivo.st (#29009)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-10-23 01:06:54 +02:00
Sergey M. b5e4cd23ec [generic] Add support for og:audio (closes #28311, closes #29015) 2021-10-23 01:06:54 +02:00
Sergey M. eb8e9fe2d0 [options] Fix thumbnail option group name (closes #29042) 2021-10-23 01:06:54 +02:00
Sergey M. 656c0b9c09 [phoenix] Fix extraction (closes #29057) 2021-10-23 01:06:54 +02:00
Lauren Liberda d638820bb4 [generic] Add support for sibnet embeds
286e01ce30
2021-10-23 01:06:54 +02:00
Sergey M. eac67eb1d4 [vk] Add support for sibnet embeds (closes #9500) 2021-10-23 01:06:54 +02:00
Sergey M. e12c9a7042 [generic] Add Referer header for direct videojs download URLs (closes #2879, closes #20217, closes #29053) 2021-10-23 01:06:54 +02:00
Lukas Anzinger 44e71d2673 [orf:radio] Switch download URLs to HTTPS (closes #29012) (#29046) 2021-10-23 01:06:54 +02:00
Sergey M. a35a131b44 [blinkx] Remove extractor (closes #28941)
No longer exists.
2021-10-23 01:06:54 +02:00
catboy 6852dc1aa6 [medaltv] Relax _VALID_URL (#28884)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-10-23 01:06:54 +02:00
Jacob Chapman 0331aa9167 [YoutubeDL] Improve extract_info doc (#28946)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-10-23 01:06:54 +02:00
Sergey M. e1096efc6a [funimation] Add support for optional lang code in URLs (closes #28950) 2021-10-23 01:06:54 +02:00
Sergey M. 1c7d4b3685 [gdcvault] Add support for HTML5 videos 2021-10-23 01:06:54 +02:00
Sergey M. 0368cbf93b [dispeak] DRY and update tests (closes #28970) 2021-10-23 01:06:54 +02:00
Ben Rog-Wilhelm 239c10f655 [dispeak] Improve FLV extraction (closes #13513) 2021-10-23 01:06:54 +02:00
Ben Rog-Wilhelm c835d0687b [kaltura] Improve iframe extraction (#28969)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-10-23 01:06:54 +02:00
Lauren Liberda e42a7a1fa9 [kaltura] Make embed code alternatives actually work 2021-10-23 01:06:54 +02:00
Lauren Liberda 606afd80d7 [youtube] fix videos with age gate 2021-10-23 01:06:54 +02:00
Lauren Liberda 3c0361c4dd radiokapital extractors 2021-10-23 01:06:54 +02:00
Lauren Liberda 386e1394d7 [misskey] add tests 2021-10-23 01:06:54 +02:00
Lauren Liberda 6a36a1dae6 utils: flake8 2021-10-23 01:06:54 +02:00
Lauren Liberda 643c27d1b4 misskey extractor 2021-10-23 01:06:54 +02:00
Lauren Liberda 2e14967f2a [tiktok] deduplicate videos 2021-10-23 01:06:54 +02:00
Lauren Liberda 79fff7e88d [peertube] logging in 2021-10-23 01:06:54 +02:00
Lauren Liberda bab808f45b [mastodon] support cards to external services 2021-10-23 01:06:54 +02:00
Lauren Liberda 46e0bbdd3f [mastodon] cache apps on logging in 2021-10-23 01:06:54 +02:00
Lauren Liberda 224aaf6089 changelog update 2021-10-23 01:06:54 +02:00
Sergey M. 7d714dfeed [twitter] Improve formats extraction from vmap URL (closes #28909) 2021-10-23 01:06:54 +02:00
Sergey M. 370d964894 [xtube] Fix formats extraction (closes #28870) 2021-10-23 01:06:54 +02:00
Sergey M. a1b9be1a53 [svtplay] Improve extraction (closes #28507, closes #28876) 2021-10-23 01:06:54 +02:00
Sergey M. 17b1620d62 [tv2dk] Fix extraction (closes #28888) 2021-10-23 01:06:54 +02:00
schnusch fddd529f5f [xfileshare] Add support for wolfstream.tv (#28858) 2021-10-23 01:06:54 +02:00
Sergey M. b841786bd2 [francetvinfo] Improve video id extraction (closes #28792) 2021-10-23 01:06:54 +02:00
catboy 3191180dac [medaltv] Fix extraction (#28807)
numeric clip ids are no longer used by medal, and integer user ids are now sent as strings.
2021-10-23 01:06:54 +02:00
The Hatsune Daishi 64dca35585 [tver] Redirect all downloads to Brightcove (#28849) 2021-10-23 01:06:54 +02:00
Sergey M. 3302ba8ad7 [test_execution] Add test for lazy extractors (refs #28780) 2021-10-23 01:06:54 +02:00
Sergey M. 8457a6d655 [bbc] Extract full description from __INITIAL_DATA__ (refs #28774) 2021-10-23 01:06:54 +02:00
dirkf 625303a611 [bbc] Extract description and timestamp from __INITIAL_DATA__ (#28774) 2021-10-23 01:06:54 +02:00
Lauren Liberda c3f2a841bb [mastodon] oh haruhi what did I NOT do here
+ --force-use-mastodon option
+ logging in to mastodon/pleroma
+ fetching posts via different mastodon/pleroma instances to get follower-only/direct posts
+ fetching peertube videos via pleroma instances to circumvent censorship (?)
2021-10-23 01:06:54 +02:00
Lauren Liberda 9785b42850 [wppilot] add tests 2021-10-23 01:06:54 +02:00
Lauren Liberda b2b7827c17 [wppilot] reduce logging in and throw meaningful errors 2021-10-23 01:06:54 +02:00
Lauren Liberda d9638fc6a0 wp pilot extractors 2021-10-23 01:06:54 +02:00
Lauren Liberda 5ed8a6a03c yet another update on funding 2021-10-23 01:06:54 +02:00
Lauren Liberda 83e8089e97 [tvp] fix website extracting with weird urls 2021-10-23 01:06:54 +02:00
Lauren Liberda 62c7989a1b [tvn] better extraction method choosing 2021-10-23 01:06:54 +02:00
Lauren Liberda c72ceff4d4 update on donations 2021-10-23 01:06:54 +02:00
Lauren Liberda 15beff25c0 [tvp:embed] handling formats better way 2021-10-23 01:06:54 +02:00
Lauren Liberda ef77f34b31 [youtube:channel] fix multiple page extraction 2021-10-23 01:06:54 +02:00
Lauren Liberda 46cefbff8d readme update 2021-10-23 01:06:54 +02:00
Lauren Liberda dc73c8de73 [tvp] fix jp2.tvp.pl 2021-10-23 01:06:54 +02:00
Lauren Liberda 38060e3efc [mastodon] support for soapbox and audio files 2021-10-23 01:06:54 +02:00
Sergey M. c4ccab8371 [cbsnews] Fix extraction for python <3.6 (closes #23359) 2021-10-23 01:06:54 +02:00
Sergey M. 94d4c3d1cc [utils] Add support for experimental HTTP response status code 308 Permanent Redirect (refs #27877, refs #28768) 2021-10-23 01:06:54 +02:00
quyleanh a7c395a00d [pluralsight] Extend anti-throttling timeout (#28712) 2021-10-23 01:06:54 +02:00
Aaron Lipinski 1759eea81f [maoritv] Add new extractor(closes #24552) 2021-10-23 01:06:54 +02:00
Remita Amine fc5095c1d2 [mtv] Fix Viacom A/B Testing Video Player extraction(closes #28703) 2021-10-23 01:06:54 +02:00
Sergey M. e2f80b2756 [pornhub] Extract DASH and HLS formats from get_media end point (closes #28698) 2021-10-23 01:06:54 +02:00
Remita Amine 0f178530c5 [cbssports] fix extraction(closes #28682) 2021-10-23 01:06:54 +02:00
Remita Amine 74b132c3fc [jamendo] fix track extraction(closes #28686) 2021-10-23 01:06:54 +02:00
Remita Amine a1a447a265 [curiositystream] fix format extraction(closes #26845, closes #28668) 2021-10-23 01:06:54 +02:00
Lauren Liberda 4fa8ad7c6c compat simplecookie again because reasons 2021-10-23 01:06:54 +02:00
Sergey M. 0e2e352931 [compat] Use more conventional name for compat SimpleCookie 2021-10-23 01:06:54 +02:00
guredora 2ca5e38e9f [line] add support for live.line.me (closes #17205)(closes #28658) 2021-10-23 01:06:54 +02:00
Lauren Liberda b4e5845dd3 added compat_SimpleCookie for compatibility with ytdl 2021-10-23 01:06:54 +02:00
Remita Amine 9c768e0dae [compat] add compat_SimpleCookie 2021-10-23 01:06:54 +02:00
Lauren Liberda 33db90c0b3 [vimeo] extraction improvements
originally by Remita Amine <remitamine@gmail.com>
2021-10-23 01:06:54 +02:00
RomanEmelyanov 4e795ff1ae [youku] Update ccode(closes #17852, closes #28447, closes #28460) (#28648) 2021-10-23 01:06:54 +02:00
Remita Amine ecfbf64895 [extractor/common] fix _get_cookies method for python 2(#20673, #23256, #20326, closes #28640) 2021-10-23 01:06:54 +02:00
Remita Amine 56e1b36377 [screencastomatic] fix extraction(closes #11976, closes #24489) 2021-10-23 01:06:54 +02:00
Allan Daemon 7f09191fd0 [palcomp3] Add new extractor(closes #13120) 2021-10-23 01:06:54 +02:00
Vid efab7d7dc4 [arnes] Add new extractor(closes #28483) 2021-10-23 01:06:54 +02:00
Adrian Heine 61a370b0cd [magentamusik360] Add new extractor 2021-10-23 01:06:54 +02:00
Lauren Liberda e8519119ce [core] merge formats by codecs 2021-10-23 01:06:54 +02:00
Lauren Liberda e6a069b3e2 [senat.pl] support for live videos 2021-10-23 01:06:54 +02:00
Lauren Liberda eac7ec5743 [sejm.pl] support live streams 2021-10-23 01:06:54 +02:00
Lauren Liberda 8513698017 + castos extractors 2021-10-23 01:06:54 +02:00
Lauren Liberda 085ce91c53 spryciarze.pl extractors 2021-10-23 01:06:54 +02:00
Lauren Liberda 5a5c43b647 json_dl: better author extraction 2021-10-23 01:06:54 +02:00
Lauren Liberda ab5d9dabf7 [spreaker] embedded player support 2021-10-23 01:06:54 +02:00
Lauren Liberda f5914a436e [spreaker] new url schemes 2021-10-23 01:06:54 +02:00
Lauren Liberda 5e117b0baf senat.pl extractor 2021-10-23 01:06:54 +02:00
Lauren Liberda e330f170bb [sejm.pl] extracting ism formats, small changes to work with senat 2021-10-23 01:06:54 +02:00
Lauren Liberda 8324350f32 [sejm.pl] multiple cameras and PJM translator 2021-10-23 01:06:54 +02:00
Lauren Liberda 059505a9ff + sejm.gov.pl archival video extractor 2021-10-23 01:06:54 +02:00
Lauren Liberda 8fe478ae13 improve documentation on subtitles 2021-10-23 01:06:54 +02:00
Lauren Liberda 8a84a62b70 [tvp] support for tvp.info vue pages 2021-10-23 01:06:54 +02:00
Lauren Liberda e293203eba [cda] fix premium videos for premium users (?) 2021-10-23 01:06:54 +02:00
Lauren Liberda 6ea512d062 [tvn24] refactor nextjs frontend handling
mitigating HTTP 404 response issues
2021-10-23 01:06:54 +02:00
Lauren Liberda 486c20162c - [ninateka] remove extractor [*]
ninateka uses DRM protection now
2021-10-23 01:06:54 +02:00
Lauren Liberda 68078811a6 [tvp:series] error handling, fallback to web 2021-10-23 01:06:54 +02:00
Lauren Liberda e9a6e6819c copykitku: get ready for merging other fork changes 2021-10-23 01:06:54 +02:00
Dominika Liberda 99be437b0b * version 2021.04.01 2021-10-23 01:06:54 +02:00
Lauren Liberda 9c3071f67e [vlive] merge all updates from ytdl 2021-10-23 01:06:54 +02:00
Sergey M. dd32f5079e [francetvinfo] Improve video id extraction (closes #28584) 2021-10-23 01:06:54 +02:00
Chris Hranj a55d01caa2 [instagram] Improve title extraction and extract duration (#28469)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-10-23 01:06:54 +02:00
Lauren Liberda 1acc012d12 [youtube] better consent workaround 2021-10-23 01:06:54 +02:00
Dominika Liberda cd6352284d * version 2021.03.30 2021-10-23 01:06:54 +02:00
Lauren Liberda 59e06bd4ba [makefile] use python3 2021-10-23 01:06:54 +02:00
Lauren Liberda b7d3834471 [youtube] consent shit workaround (fuck google)
Co-authored-by: Dominika Liberda <ja@sdomi.pl>
2021-10-23 01:06:54 +02:00
Remita Amine 3121701f4a [sbs] add support for ondemand watch URLs(closes #28566) 2021-10-23 01:06:54 +02:00
Remita Amine 516f230395 [picarto] fix live stream extraction(closes #28532) 2021-10-23 01:06:54 +02:00
Remita Amine 729c5ed0af [vimeo] fix unlisted video extraction(closes #28414) 2021-10-23 01:06:54 +02:00
Remita Amine a3bb4b9cc3 [ard] improve clip id extraction(#22724)(closes #28528) 2021-10-23 01:06:54 +02:00
Roman Sebastian Karwacik ce34e3f5d3 [zoom] Add new extractor(closes #16597, closes #27002, closes #28531) 2021-10-23 01:06:54 +02:00
The Hatsune Daishi 72ec72e477 [extractor] escape forgotten dot for hostnames in regular expression (#28530) 2021-10-23 01:06:54 +02:00
Remita Amine ba823664f9 [bbc] fix BBC IPlayer Episodes/Group extraction(closes #28360) 2021-10-23 01:06:54 +02:00
Remita Amine ed103246ab [zingmp3] fix extraction(closes #11589, closes #16409, closes #16968, closes #27205) 2021-10-23 01:06:54 +02:00
Martin Ström 9d1c745fac [vgtv] Add support for new tv.aftonbladet.se URL schema (#28514)
Co-authored-by: Sergey M <dstftw@gmail.com>
2021-10-23 01:06:54 +02:00
Lauren Liberda 235b606437 [tiktok] detect private videos 2021-10-23 01:06:53 +02:00
Lauren Liberda ec794dacb2 --ie-key cli option 2021-10-23 01:06:53 +02:00
Lauren Liberda 53c1660468 fix dw:article, refactor dw 2021-10-23 01:06:53 +02:00
Lauren Liberda bb9c83c613 + patronite audio extractor 2021-10-23 01:06:53 +02:00
Dominika Liberda c35da27819 * version 2021.03.21 2021-10-23 01:06:53 +02:00
Lauren Liberda 8e9915aa09 changelog update 2021-10-23 01:06:53 +02:00
Lauren Liberda 6567bd00fe [youtube] meaningful error for age-gated no-embed videos 2021-10-23 01:06:53 +02:00
Lauren Liberda 55fa7b48c4 - removed tvnplayer extractor 2021-10-23 01:06:53 +02:00
Sergey M. 6d4a1a1f1b [yandexmusic:playlist] Request missing tracks in chunks (closes #27355, closes #28184) 2021-10-23 01:06:53 +02:00
Sergey M. b8f1f10baa [yandexmusic:album] Simplify 2021-10-23 01:06:53 +02:00
Sergey M. 51b98e969c [yandexmusic] Add support for music.yandex.com (closes #27425) 2021-10-23 01:06:53 +02:00
Sergey M. 78c346bb86 [yandexmusic] DRY _VALID_URL base 2021-10-23 01:06:53 +02:00
Sergey M. 1607edbfe5 [yandexmusic:album] Improve album title extraction (closes #27418) 2021-10-23 01:06:53 +02:00
Sergey M. 56a1c8a862 [yandexmusic] Refactor and add support for artist's tracks and albums (closes #11887, closes #22284) 2021-10-23 01:06:53 +02:00
Lauren Liberda bfa15434b6 [peertube] improve thumbnail extraction
Original author: remitamine
2021-10-23 01:06:53 +02:00
Lauren Liberda 7bc8f716ac [PATCH] [vimeo:album] Fix extraction for albums with number of videos multiple to page size
Original author: dstftw
2021-10-23 01:06:53 +02:00
Remita Amine 0d8471c5cd [vvvvid] fix kenc format extraction(closes #28473) 2021-10-23 01:06:53 +02:00
Remita Amine a19efaf4f2 [mlb] fix video extraction(#21241) 2021-10-23 01:06:53 +02:00
Sergey M. 2395913191 [svtplay] Improve extraction (closes #28448) 2021-10-23 01:06:53 +02:00
Remita Amine 92838c24a7 [applepodcasts] fix extraction(closes #28445) 2021-10-23 01:06:53 +02:00
Remita Amine 139349577f [rtve] improve extraction
- extract all formats
- fix RTVE Infantil extraction(closes #24851)
- extract is_live and series
2021-10-23 01:06:53 +02:00
Sergey M. fe7f2f4665 [southpark] Fix extraction and add support for southparkstudios.com (closes #26763, closes #28413) 2021-10-23 01:06:53 +02:00
Remita Amine b4565ba80a [sportdeutschland] fix extraction(closes #21856)(closes #28425) 2021-10-23 01:06:53 +02:00
Remita Amine 62e95de9b8 [pinterest] reduce the number of HLS format requests 2021-10-23 01:06:53 +02:00
Remita Amine a14cdc8911 [tver] improve title extraction(closes #28418) 2021-10-23 01:06:53 +02:00
Remita Amine 1ae0b09ce7 [fujitv] fix HLS formats extension(closes #28416) 2021-10-23 01:06:53 +02:00
Remita Amine 5a7b034588 [shahid] fix format extraction(closes #28383) 2021-10-23 01:06:53 +02:00
Sergey M. 5302d7deb5 [bandcamp] Extract release_timestamp 2021-10-23 01:06:53 +02:00
Sergey M. 7cd709a8d6 Introduce release_timestamp meta field (refs #28386) 2021-10-23 01:06:53 +02:00
Lauren Liberda 184e14106d [pornhub] Detect flagged videos
Original author: dstftw
2021-10-23 01:06:53 +02:00
Sergey M. 6342cb9395 [pornhub] Extract formats from get_media end point (#28395) 2021-10-23 01:06:53 +02:00
Remita Amine 0f5a738cb8 [bilibili] fix video info extraction(closes #28341) 2021-10-23 01:06:53 +02:00
Remita Amine c43af05b3f [cbs] add support for Paramount+ (closes #28342) 2021-10-23 01:06:53 +02:00
Remita Amine 3f6c70d7fd [trovo] Add Origin header to VOD formats(closes #28346) 2021-10-23 01:06:53 +02:00
Remita Amine 3f8ceb92d6 [voxmedia] fix volume embed extraction(closes #28338) 2021-10-23 01:06:53 +02:00
Remita Amine c906fbd026 [9c9media] fix extraction for videos with multiple ContentPackages(closes #28309) 2021-10-23 01:06:53 +02:00
Remita Amine be01b75623 [bbc] correct caught exception type 2021-10-23 01:06:53 +02:00
dirkf a13d30df60 [bbc] add support for BBC Reel videos(closes #21870, closes #23660, closes #28268) 2021-10-23 01:06:53 +02:00
Sergey M. 9e2d1d79af [zdf] Rework extractors (closes #11606, closes #13473, closes #17354, closes #21185, closes #26711, closes #27068, closes #27930, closes #28198, closes #28199, closes #28274)

* Generalize unique video ids for zdf based extractors
* Improve extraction
* Fix 3sat and phoenix
2021-10-23 01:06:53 +02:00
Lauren Liberda 8763d41b2e fix the patch hook 2021-10-23 01:06:53 +02:00
Remita Amine 87d2884a39 [stretchinternet] Fix extraction(closes #28297) 2021-10-23 01:06:53 +02:00
Remita Amine 21755dfd0b [urplay] fix episode data extraction(closes #28292) 2021-10-23 01:06:53 +02:00
Remita Amine f7f18a1f95 [bandaichannel] Add new extractor(closes #21404) 2021-10-23 01:06:53 +02:00
Lauren Liberda 76dd2ae2c6 [tvp:embed] extracting video subtitles 2021-10-23 01:06:53 +02:00
Lauren Liberda f38b6982dd fix m3u8 parsing test 2021-10-23 01:06:53 +02:00
Lauren Liberda 485abb04a8 fix possible crash 2021-10-23 01:06:53 +02:00
Lauren Liberda 84412f41fa support for vtt subtitles in m3u8 manifests 2021-10-23 01:06:53 +02:00
Lauren Liberda d506825a5c [pulsevideo] unduplicating formats 2021-10-23 01:06:53 +02:00
Lauren Liberda 98a2f0c8fe [polskieradio] radiokierowcow.pl extractor 2021-10-23 01:06:53 +02:00
Lauren Liberda 01e9b4552b [youtube] some formats are now just static 2021-10-23 01:06:53 +02:00
Lauren Liberda 28d1709c91 [youtube] better signature handling for DASH formats 2021-10-23 01:06:53 +02:00
Lauren Liberda e150971ea7 [generic] extracting mpd manifests properly 2021-10-23 01:06:53 +02:00
Lauren Liberda c8a9e64511 + bittorrent magnet extractor 2021-10-23 01:06:53 +02:00
Lauren Liberda b36bfac24e [generic] detecting bittorrent manifest files 2021-10-23 01:06:53 +02:00
Lauren Liberda 21c370bf29 [peertube] bittorrent formats 2021-10-23 01:06:53 +02:00
Lauren Liberda 81f0034a12 initial bittorrent support 2021-10-23 01:06:53 +02:00
Lauren Liberda 503f2b988e [tiktok] hashtag and music extractors 2021-10-23 01:06:53 +02:00
Lauren Liberda 9fb4a5decb [onnetwork] refactor 2021-10-23 01:06:53 +02:00
Lauren Liberda 6856478350 [polskieradio] podcast support 2021-10-23 01:06:53 +02:00
Lauren Liberda 5bb9c5e53e [youtube] more descriptive geo-lock messages (with countries) 2021-10-23 01:06:53 +02:00
Timothy Wynn 79002a5092 Update go.py 2021-10-23 01:06:53 +02:00
Lauren Liberda 5d4293f103 removed a lot of deprecated platform support code 2021-10-23 01:06:53 +02:00
Lauren Liberda e4ad9f9329 new exe build script 2021-10-23 01:06:53 +02:00
Lauren Liberda c5a07adbd2 [playwright] more verbose errors if --verbose 2021-10-23 01:06:53 +02:00
Lauren Liberda ec881dd98d [youtube] signature function caching 2021-10-23 01:06:53 +02:00
Lauren Liberda 99ae610f74 fix links to ytdl issues 2021-10-23 01:06:53 +02:00
Lauren Liberda b02f30e9e9 pypy tests 2021-10-23 01:06:53 +02:00
Lauren Liberda 2930f4f593 videotarget extractor 2021-10-23 01:06:53 +02:00
Lauren Liberda 5c09f8a7db acast player extractor 2021-10-23 01:06:53 +02:00
Dominika Liberda 1275ec3347 version 2021.03.01 2021-10-23 01:06:53 +02:00
Lauren Liberda 5f73fa1c26 [peertube] playlist, channel and account extractor 2021-10-23 01:06:53 +02:00
Lauren Liberda d52e39ac13 [cda] logging in with a user account 2021-10-23 01:06:53 +02:00
Laura Liberda bf9bb174e5 remove some unused devscripts/docs 2021-10-23 01:06:53 +02:00
Dominika Liberda be6253988a version 2021.02.27 2021-10-23 01:06:53 +02:00
Laura Liberda 9d7059a4a9 add --use-proxy-sites option 2021-10-23 01:06:53 +02:00
Laura Liberda 12e6f64462 nitter extractor 2021-10-23 01:06:53 +02:00
bopol 521dc1f82c [nitter] Add new extractor 2021-10-23 01:06:53 +02:00
Laura Liberda fe439b89c4 updated changelog for 2021.02.26 2021-10-23 01:06:53 +02:00
Laura Liberda 7989d4c448 [ipla] reformat code 2021-10-23 01:06:53 +02:00
Laura Liberda 50d15ce421 remove now-invalid unicode_literals test 2021-10-23 01:06:53 +02:00
Dominika Liberda 724948f4b2 version 2021.02.26 2021-10-23 01:06:53 +02:00
Dominika Liberda 39fbd9c21b new youtube crypto 2021-10-23 01:06:53 +02:00
Laura Liberda 79278413a9 make sure py2 throws a deprecation notice 2021-10-23 01:06:53 +02:00
Laura Liberda 2748cc857a changelog 2021-10-23 01:06:53 +02:00
Laura Liberda d06a708d58 fix crash in generic extractor 2021-10-23 01:06:53 +02:00
Laura Liberda 0b4878715c fix hdl tests 2021-10-23 01:06:53 +02:00
Alexander Seiler 43b1927fe4 [srgssr] improve extraction
- extract subtitle
- fix extraction for new videos
- update srf download domains

closes #14717
closes #14725
closes #27231
closes #28238
2021-10-23 01:06:53 +02:00
Remita Amine aba0fad66e [vvvvid] reduce season request payload size 2021-10-23 01:06:53 +02:00
nixxo 785d9930cb [vvvvid] extract series sublists playlist_title (#27601) (#27618) 2021-10-23 01:06:53 +02:00
Remita Amine fe62eeb47b [dplay] Extract Ad-Free uplynk URLs(#28160) 2021-10-23 01:06:53 +02:00
Remita Amine 9c66db7689 [wat] detect DRM protected videos(closes #27958) 2021-10-23 01:06:53 +02:00
Remita Amine 0ed4a821b8 [tf1] improve extraction(closes #27980)(closes #28040) 2021-10-23 01:06:53 +02:00
Sergey M. 4a1a901bd3 [tmz] Fix and improve extraction (closes #24603, closes #24687, closes #28211) 2021-10-23 01:06:53 +02:00
Remita Amine 8b3f0fb289 [gedidigital] improve asset id matching 2021-10-23 01:06:53 +02:00
nixxo 8d3f96a92c [gedidigital] Add new extractor(closes #7347)(closes #26946) 2021-10-23 01:06:53 +02:00
Sergey M. e8c278f355 [apa] Improve extraction (closes #27750) 2021-10-23 01:06:53 +02:00
Adrian Heine f5a691070c [apa] Fix extraction 2021-10-23 01:06:53 +02:00
Sergey M. 56a54d1838 [youporn] Skip test 2021-10-23 01:06:53 +02:00
piplongrun 8c0b1ad436 [youporn] Extract duration (#28019)
Co-authored-by: Sergey M <dstftw@gmail.com>
2021-10-23 01:06:53 +02:00
Isaac-the-Man e55149a2fc [samplefocus] Add new extractor(closes #27763) 2021-10-23 01:06:53 +02:00
Remita Amine 3393281f14 [vimeo] add support for unlisted video source format extraction 2021-10-23 01:06:53 +02:00
Remita Amine 129037f139 [viki] improve extraction(closes #26522)(closes #28203)
- extract uploader_url and episode_number
- report login required error
- extract 480p formats
- fix API v4 calls
2021-10-23 01:06:53 +02:00
Remita Amine e391150eaf [ninegag] unescape title(#28201) 2021-10-23 01:06:53 +02:00
Remita Amine af5374fccc [dplay] add support for de.hgtv.com (closes #28182) 2021-10-23 01:06:53 +02:00
Remita Amine 69758c2be1 [dplay] Add support for discoveryplus.com (closes #24698) 2021-10-23 01:06:53 +02:00
dmsummers e5c17a8125 [simplecast] Add new extractor(closes #24107) 2021-10-23 01:06:53 +02:00
Max 4a816731df [postprocessor/embedthumbnail] Recognize atomicparsley binary in lowercase (#28112) 2021-10-23 01:06:53 +02:00
Stephen Stair 451ed35cf9 [storyfire] Add new extractor(closes #25628)(closes #26349) 2021-10-23 01:06:53 +02:00
Remita Amine 71097456a0 [zhihu] Add new extractor(closes #28177) 2021-10-23 01:06:53 +02:00
Remita Amine 00a34e93d5 [ccma] fix timestamp parsing in python 2 2021-10-23 01:06:53 +02:00
Remita Amine b90eb0c932 [videopress] add support for video.wordpress.com 2021-10-23 01:06:53 +02:00
Remita Amine 9a7824ce14 [kakao] improve info extraction and detect geo restriction(closes #26577) 2021-10-23 01:06:53 +02:00
Remita Amine 8353016232 [xboxclips] fix extraction(closes #27151) 2021-10-23 01:06:53 +02:00
Sergey M. 568e70c287 [ard] Improve formats extraction (closes #28155) 2021-10-23 01:06:53 +02:00
Kevin Velghe bbf256a986 [canvas] Add new extractor for Dagelijkse Kost (#28119) 2021-10-23 01:06:53 +02:00
Remita Amine d1cb2f14ee [ign] fix extraction(closes #24771) 2021-10-23 01:06:53 +02:00
Sergey M. 046fbce8fb [xhamster] Extract formats from xplayer settings and extract filesizes (closes #28114) 2021-10-23 01:06:53 +02:00
Sergey M. 627d01e28f [archiveorg] Fix and improve extraction (closes #21330, closes #23586, closes #25277, closes #26780, closes #27109, closes #27236, closes #28063) 2021-10-23 01:06:53 +02:00
Adrian Heine né Lang 87d88a3ce1 [urplay] Fix extraction (closes #28073) (#28074) 2021-10-23 01:06:53 +02:00
Adrian Heine né Lang 0b4bded8d8 [azmedien] Fix extraction (#28064) 2021-10-23 01:06:53 +02:00
Sergey M. b37f3b7703 [pornhub] Implement lazy playlist extraction 2021-10-23 01:06:53 +02:00
Sergey M efbbc4fb90 [pornhub] Add placeholder netrc machine 2021-10-23 01:06:53 +02:00
Sergey M. a15fb8f91b [svtplay] Fix video id extraction (closes #28058) 2021-10-23 01:06:53 +02:00
Sergey M 7393b46d28 [pornhub] Add support for authentication (closes #18797, closes #21416, closes #24294) 2021-10-23 01:06:53 +02:00
Sergey M. 789d99a4f9 [pornhub:user] Add support for URLs unavailable via /videos page and improve paging (closes #27853) 2021-10-23 01:06:53 +02:00
Remita Amine 3ceec04914 [bravotv] add support for oxygen.com(closes #13357)(closes #22500) 2021-10-23 01:06:53 +02:00
Guillem Vela 77a502d82b [ccma] improve metadata extraction(closes #27994)
- extract age_limit, alt_title, categories, series and episode_number
- fix timestamp multiple subtitles extraction
2021-10-23 01:06:53 +02:00
Remita Amine 41b390b21e [egghead] fix typo 2021-10-23 01:06:53 +02:00
Viren Rajput 97becd9c42 [egghead] update API domain(closes #28038) 2021-10-23 01:06:53 +02:00
Remita Amine db09c2ce61 [vidzi] remove extractor(closes #12629) 2021-10-23 01:06:53 +02:00
Remita Amine 5eab5ac665 [vidio] improve metadata extraction 2021-10-23 01:06:53 +02:00
Adrian Heine né Lang ac7d81f386 [AENetworks] update AENetworksShowIE test playlist id (#27851) 2021-10-23 01:06:52 +02:00
nixxo 4de0c0695a [vvvvid] add support for youtube embeds (#27825) 2021-10-23 01:06:52 +02:00
Adrian Heine né Lang d78a905c93 [awaan] Extract uploader id (#27963) 2021-10-23 01:06:52 +02:00
Remita Amine 90854501c3 [medialaan] add support for DPG Media MyChannels based websites
closes #14871
closes #15597
closes #16106
closes #16489
2021-10-23 01:06:52 +02:00
Remita Amine d7156eead8 [abcnews] fix extraction(closes #12394)(closes #27920) 2021-10-23 01:06:52 +02:00
Adrian Heine né Lang 9f1e642faf [AMP] Fix upload_date and timestamp extraction (#27970) 2021-10-23 01:06:52 +02:00
Remita Amine e653b73d7b [tv4] relax _VALID_URL(closes #27964) 2021-10-23 01:06:52 +02:00
Remita Amine 72ccf1ea95 [tv2] improve MTV Uutiset Article extraction 2021-10-23 01:06:52 +02:00
tpikonen 4297f217fd [tv2] Add support for mtvuutiset.fi (#27744) 2021-10-23 01:06:52 +02:00
Remita Amine 21a21fdd12 [adn] improve login warning reporting 2021-10-23 01:06:52 +02:00
Remita Amine 6f32443da8 [zype] fix uplynk id extraction(closes #27956) 2021-10-23 01:06:52 +02:00
Adrian Heine né Lang 10e0bc3eeb [ADN] Implement login (#27937)
closes #17091
closes #27841
2021-10-23 01:06:52 +02:00
Sergey M. e752371a7e [franceculture] Make thumbnail optional (closes #18807) 2021-10-23 01:06:52 +02:00
Aurélien Grosdidier b68a24ee06 [franceculture] Fix extraction (closes #27891) (#27903)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-10-23 01:06:52 +02:00
Sergey M. 774a9a4f95 [options] Clarify --extract-audio help string (closes #27878) 2021-10-23 01:06:52 +02:00
Sergey M. a17fa78858 Introduce --output-na-placeholder (closes #27896) 2021-10-23 01:06:52 +02:00
aarubui e6f46e50df [njpwworld] fix extraction (#27890) 2021-10-23 01:06:52 +02:00
Remita Amine b6aeee6944 [comedycentral] fix extraction(closes #27905) 2021-10-23 01:06:52 +02:00
Remita Amine 89d8d831ab [wat] remove unused variable 2021-10-23 01:06:52 +02:00
Remita Amine a4afcb7a2f [wat] fix format extraction(closes #27901) 2021-10-23 01:06:52 +02:00
Remita Amine 0ca38e7fb2 [americastestkitchen] improve season extraction 2021-10-23 01:06:52 +02:00
Brian Marks e4d12c6925 [americastestkitchen] Add support for downloading entire seasons (#27861) 2021-10-23 01:06:52 +02:00
Remita Amine 9cd160bcf0 [trovo] Add new extractor(closes #26125) 2021-10-23 01:06:52 +02:00
Remita Amine 78d9c01473 [aol] add support for yahoo videos(closes #26650) 2021-10-23 01:06:52 +02:00
Remita Amine 0aca265418 [yahoo] fix single video extraction 2021-10-23 01:06:52 +02:00
Remita Amine 2287106362 [ninegag] improve extraction 2021-10-23 01:06:52 +02:00
DrWursterich 554ca47216 [9gag] Fix Extraction (#23022) 2021-10-23 01:06:52 +02:00
Brian Marks 23ed695eaa [americastestkitchen] Improve metadata extraction for ATK episodes (#27860) 2021-10-23 01:06:52 +02:00
Remita Amine 6e0744819c [aljazeera] fix extraction(closes #20911)(closes #27779) 2021-10-23 01:06:52 +02:00
Remita Amine 81e163b024 [minds] improve extraction 2021-10-23 01:06:52 +02:00
Tatsh ca4bfa3da3 [Minds] Add new extractor (#17934) 2021-10-23 01:06:52 +02:00
Adrian Heine né Lang 58fa22f684 [ard] Fix title and description extraction and update tests (#27761) 2021-10-23 01:06:52 +02:00
Adrian Heine né Lang f808eef2f3 [aenetworks] Fix test (#27847) 2021-10-23 01:06:52 +02:00
Remita Amine 6b6c8bf1f0 [spotify] Add new extractor for Spotify Podcasts(closes #27443) 2021-10-23 01:06:52 +02:00
Sergey M. 79a5e77e79 [mixcloud:playlist:base] Fix video id extraction in flat playlist mode (refs #27787) 2021-10-23 01:06:52 +02:00
Sergey M. 6f17f97383 [animeondemand] Add support for lazy playlist extraction (closes #27829) 2021-10-23 01:06:52 +02:00
Sergey M. fff26f2340 [YoutubeDL] Protect from infinite recursion due to recursively nested playlists (closes #27833) 2021-10-23 01:06:52 +02:00
Remita Amine 2dcad362e8 [twitter] Add tests for more cards 2021-10-23 01:06:52 +02:00
Sergey M. 720ee011e2 [youporn] Restrict fallback download URL (refs #27822) 2021-10-23 01:06:52 +02:00
Sergey M. 4b782259f2 [youporn] Improve height and tbr extraction (refs #23659, refs #20425) 2021-10-23 01:06:52 +02:00
Sergey M. 1855ad87cd [youporn] Fix extraction (closes #27822) 2021-10-23 01:06:52 +02:00
Sergey M. cd5366e41d [twitter] Add support for unified cards (closes #27826) 2021-10-23 01:06:52 +02:00
main() 6fdbcc1c59 [twitch] Set OAuth token for GraphQL requests using auth-token cookie (#27790)

Co-authored-by: remitamine <remitamine@gmail.com>
2021-10-23 01:06:52 +02:00
Aaron Zeng 8ab5adca22 [YoutubeDL] Ignore failure to create existing directory (#27811) 2021-10-23 01:06:52 +02:00
Sergey M. 6ec432f9e4 [YoutubeDL] Raise syntax error for format selection expressions with multiple + operators (closes #27803) 2021-10-23 01:06:52 +02:00
Aarni Koskela 9fb1f50a15 [Mixcloud] Harmonize ID generation from lists with full ID generation (#27787)

Mixcloud IDs are generated as `username_slug` when the full ID dict has been
downloaded.  When downloading a list (e.g. uploads, favorites, ...), the temporary
ID is just the `slug`.  This made e.g. archive file usage require the download
of stream metadata before the download can be rejected as already downloaded.

This commit attempts to get the uploader username during the GraphQL query, so the
temporary IDs are generated similarly.
2021-10-23 01:06:52 +02:00
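
A minimal sketch of the ID scheme the message above describes; the helper and field names are illustrative, not the extractor's actual code:

    # Full Mixcloud IDs are `username_slug`; flat-playlist entries used to
    # carry only the `slug`, so archive lookups could not match them.
    def full_id(username, slug):
        return '%s_%s' % (username, slug)

    temporary_id = 'some-mix'                        # old flat-playlist ID
    harmonized_id = full_id('someuser', 'some-mix')  # ID after the fix

    # This mismatch is why rejecting an already-downloaded item used to
    # require fetching the stream metadata first.
    assert harmonized_id != temporary_id
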
Remita Amine e5ef69ca71 [cspan] improve info extraction(closes #27791) 2021-10-23 01:06:52 +02:00
Remita Amine ca8ab55941 [adn] improve info extraction 2021-10-23 01:06:52 +02:00
Adrian Heine né Lang 258cd9f44e [ADN] Fix extraction (#27732)
Closes #26963.
2021-10-23 01:06:52 +02:00
Sergey M. 12ca40436c [twitch] Improve login error extraction 2021-10-23 01:06:52 +02:00
Sergey M. 82b487f429 [twitch] Fix authentication (refs #27743) 2021-10-23 01:06:52 +02:00
Remita Amine 7501f94648 [threeqsdn] Improve extraction(closes #21058) 2021-10-23 01:06:52 +02:00
0l-l0 1adc5f4c47 [peertube] Extract files also from streamingPlaylists (#27728)
JSON objects with an empty "files" tag seem to be a valid PeerTube API
response. In those cases the "files" arrays contained in the
"streamingPlaylists" members can be used instead.
closes #26002
closes #27586
2021-10-23 01:06:52 +02:00
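
A minimal sketch of the fallback described above, against an assumed payload shape:

    # If the top-level "files" array is empty (a valid PeerTube API response),
    # fall back to the "files" arrays inside "streamingPlaylists".
    def collect_files(video_info):
        files = list(video_info.get('files') or [])
        if not files:
            for playlist in video_info.get('streamingPlaylists') or []:
                files.extend(playlist.get('files') or [])
        return files

    print(collect_files({'files': [], 'streamingPlaylists': [{'files': [{'id': 1}]}]}))
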
Remita Amine 9c305e65fb [khanacademy] fix extraction(closes #2887)(closes #26803) 2021-10-23 01:06:52 +02:00
Remita Amine 2c106247f3 [spike] Update Paramount Network feed URL(closes #27715) 2021-10-23 01:06:52 +02:00
nixxo a447f25d5c [rai] improve subtitles extraction (#27705)
closes #27698
2021-10-23 01:06:52 +02:00
Remita Amine 3caed2e161 [canvas] Match only supported VRT NU URLs(#27707) 2021-10-23 01:06:52 +02:00
Remita Amine 1c3eda7037 [extractors] add BibelTVIE import 2021-10-23 01:06:52 +02:00
Remita Amine 84d85be122 [bibeltv] Add new extractor(closes #14361) 2021-10-23 01:06:52 +02:00
Remita Amine 245f1d834f [bfmtv] Add new extractor(closes #16053)(closes #26615) 2021-10-23 01:06:52 +02:00
Remita Amine da1bf36474 [sbs] Add support for ondemand play and news embed URLs(closes #17650)(closes #27629) 2021-10-23 01:06:52 +02:00
Sergey M. 16ff1e6bdf [twitch] Refactor 2021-10-23 01:06:52 +02:00
Sergey M. ced3ed1c25 [twitch] Drop legacy kraken API v5 code altogether 2021-10-23 01:06:52 +02:00
Sergey M. 0b6351b649 [twitch:vod] Switch to GraphQL for video metadata 2021-10-23 01:06:52 +02:00
Remita Amine acac4f3f2c [canvas] Fix VRT NU extraction(closes #26957)(closes #27053) 2021-10-23 01:06:52 +02:00
Sergey M. 3ef8a0fb3d [twitch] Improve access token extraction and remove unused code (closes #27646) 2021-10-23 01:06:52 +02:00
23rd 8ad9ab6bf1 [twitch] Switch access token to GraphQL and refactor. 2021-10-23 01:06:52 +02:00
nixxo 57f417b3cd [rai] Detect ContentItem in iframe (closes #12652) (#27673)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-10-23 01:06:52 +02:00
Remita Amine f24ce9e12a [ketnet] fix extraction(closes #27662) 2021-10-23 01:06:52 +02:00
Remita Amine 330c49718a [dplay] Add support for Discovery+ domains(closes #27680) 2021-10-23 01:06:52 +02:00
Sergey M. 5984740aed [motherless] Fix review issues and improve extraction (closes #26495, closes #27450) 2021-10-23 01:06:52 +02:00
cladmi b3e644ee0d [motherless] Fix recent videos upload date extraction (closes #27661)
Less than a week old videos use a '20h ago' or '1d ago' format.

I kept the support for 'Ago' with an uppercase start as it was already in the code.
2021-10-23 01:06:52 +02:00
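
A minimal sketch of parsing such relative dates; the helper is hypothetical, not the extractor's code:

    import datetime
    import re

    # Turn "20h ago" / "1d Ago" into a YYYYMMDD upload date (uppercase
    # "Ago" is accepted, as the commit message notes).
    def parse_relative_upload_date(text, now=None):
        now = now or datetime.datetime.utcnow()
        m = re.match(r'(\d+)\s*([hd])\s+[Aa]go', text)
        if not m:
            return None
        value, unit = int(m.group(1)), m.group(2)
        delta = datetime.timedelta(**{{'h': 'hours', 'd': 'days'}[unit]: value})
        return (now - delta).strftime('%Y%m%d')

    print(parse_relative_upload_date('20h ago'))
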
Kevin O'Connor aa88061b86 [downloader/hls] Disable decryption in tests (#27660)
Tests truncate the download to 10241 bytes, which is not divisible by 16
and cannot be decrypted. Tests don't really care about the decrypted
content, just that the data they retrieved is the expected data.
Therefore, it's fine to just return the encrypted data to tests.

See: #27621 and #27620
2021-10-23 01:06:52 +02:00
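
A minimal sketch of the block-size arithmetic behind this change (the hook below is hypothetical):

    # AES-128-CBC ciphertext must be a multiple of the 16-byte block size;
    # a download truncated to 10241 bytes therefore cannot be decrypted.
    TEST_TRUNCATION = 10241
    AES_BLOCK_SIZE = 16
    assert TEST_TRUNCATION % AES_BLOCK_SIZE == 1  # 10241 == 640 * 16 + 1

    def maybe_decrypt(data, decrypt, testing):
        if testing:
            return data  # hand the encrypted bytes straight to the test
        return decrypt(data)
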
Yurii H 598916af95 [iheart] Update test description value (#27037)
the description has no HTML tags now.
2021-10-23 01:06:52 +02:00
Remita Amine 59d82ec275 [nrk] fix extraction for videos without a legalAge rating 2021-10-23 01:06:52 +02:00
Remita Amine 0a8305247c [iheart] clean HTML tags from episode description 2021-10-23 01:06:52 +02:00
Remita Amine 4d731a9917 [iheart] remove print statement 2021-10-23 01:06:52 +02:00
Remita Amine c23155171b [googleplus] Remove Extractor(closes #4955)(closes #7400) 2021-10-23 01:06:52 +02:00
Remita Amine 5a6c27fd90 [applepodcasts] Add new extractor(#25918) 2021-10-23 01:06:52 +02:00
Remita Amine 12049cc72a [googlepodcasts] Add new extractor 2021-10-23 01:06:52 +02:00
Remita Amine 0c069db053 [iheart] Add new extractor for iHeartRadio(#27037) 2021-10-23 01:06:52 +02:00
Remita Amine 49dc695b8b [acast] clean podcast URLs 2021-10-23 01:06:52 +02:00
Remita Amine 888a8d4c64 [stitcher] clean podcast URLs 2021-10-23 01:06:52 +02:00
Remita Amine 95a9d868f6 [utils] add a function to clean podcast URLs 2021-10-23 01:06:52 +02:00
Sergey M. 97cdea36f2 [xfileshare] Add support for aparat.cam (closes #27651) 2021-10-23 01:06:52 +02:00
Sergey M. d0177a6f30 [nrktv] Add subtitles test 2021-10-23 01:06:52 +02:00
Remita Amine 881049eef5 [twitter] Add support for summary card(closes #25121) 2021-10-23 01:06:52 +02:00
Remita Amine a39405150b [twitter] try to use a Generic fallback for unknown twitter cards(closes #25982) 2021-10-23 01:06:52 +02:00
Remita Amine f6560b19c4 [stitcher] Add support for shows and show metadata extraction(closes #20510) 2021-10-23 01:06:52 +02:00
Remita Amine 7db31b6628 [stv] improve episode id extraction(closes #23083) 2021-10-23 01:06:52 +02:00
Sergey M. c09836e567 [nrk] Fix age limit extraction 2021-10-23 01:06:52 +02:00
Sergey M. 74e9abdd95 [nrk] Improve series metadata extraction (closes #27473) 2021-10-23 01:06:52 +02:00
Sergey M. 0047ca78b2 [nrk] PEP 8 2021-10-23 01:06:52 +02:00
Sergey M. 6aa2ce5a53 [nrk] Improve episode and season number extraction 2021-10-23 01:06:52 +02:00
Sergey M. dd3fe3a68e [nrktv] Fix tests 2021-10-23 01:06:52 +02:00
Sergey M. 28011fbf5d [nrk] Improve series metadata extraction 2021-10-23 01:06:52 +02:00
Sergey M. d1cc5993a9 [nrk] Extract subtitles 2021-10-23 01:06:52 +02:00
Sergey M. ee3e906f25 [nrk] Fix age limit extraction 2021-10-23 01:06:52 +02:00
Sergey M. 2f5c7ee1f9 [nrk] Inline _extract_from_playback 2021-10-23 01:06:52 +02:00
Sergey M. 3042ced564 [nrk] Improve video id extraction 2021-10-23 01:06:52 +02:00
Sergey M. f78fbf70e7 [nrk] Add more shortcut tests 2021-10-23 01:06:52 +02:00
Sergey M. 6ed6badbe2 [nrk] Improve extraction (closes #27634, closes #27635)
+ Add support for mp3 formats
* Generalize and delegate all item extractors to nrk, beware ie key breakages
+ Add support for podcasts
+ Generalize nrk shortcut form to support all kind of ids
2021-10-23 01:06:52 +02:00
Sergey M. f224645ac3 [nrktv] Switch to playback endpoint
mediaelement endpoint is no longer in use.
2021-10-23 01:06:52 +02:00
Remita Amine 67215062ea [vvvvid] fix season metadata extraction(#18130) 2021-10-23 01:06:52 +02:00
Remita Amine ca8965d28b [stitcher] fix extraction(closes #20811)(closes #27606) 2021-10-23 01:06:52 +02:00
Remita Amine 91dd4819e8 [acast] fix extraction(closes #21444)(closes #27612)(closes #27613) 2021-10-23 01:06:52 +02:00
Remita Amine 7e985046a0 [arcpublishing] add missing staticmethod decorator 2021-10-23 01:06:52 +02:00
Remita Amine 4336684bca [arcpublishing] Add new extractor
closes #2298
closes #9340
closes #17200
2021-10-23 01:06:52 +02:00
Remita Amine f8af3b480f [sky] add support for Sports News articles and Brightcove videos(closes #13054) 2021-10-23 01:06:52 +02:00
Remita Amine 6fec676bf2 [vvvvid] skip unplayable episodes and extract akamai formats(closes #27599) 2021-10-23 01:06:52 +02:00
Remita Amine 9014fa950e [yandexvideo] fix extraction for Python 3.4 2021-10-23 01:06:52 +02:00
Sergey M. b8ebc9bd91 [redditr] Fix review issues and extract source thumbnail (closes #27503) 2021-10-23 01:06:52 +02:00
ozburo b83833f1f4 [redditr] Extract all thumbnails 2021-10-23 01:06:52 +02:00
Remita Amine 4f49d50c61 [vvvvid] improve info extraction 2021-10-23 01:06:52 +02:00
nixxo 8188302a03 [vvvvid] add playlists support (#27574)
closes #18130
2021-10-23 01:06:52 +02:00
Remita Amine 2c8306ce89 [yandexdisk] extract info from webpage
the public API does not return metadata when download limit is reached
2021-10-23 01:06:52 +02:00
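
A minimal sketch of the API-then-webpage fallback described above (both callables are stand-ins):

    # Prefer the public API, but fall back to scraping the webpage once the
    # download limit makes the API respond without metadata.
    def get_metadata(fetch_api, fetch_webpage):
        meta = fetch_api()
        return meta if meta else fetch_webpage()

    print(get_metadata(lambda: None, lambda: {'title': 'from webpage'}))
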
Remita Amine 5643182c7c [yandexdisk] fix extraction(closes #17861)(closes #27131) 2021-10-23 01:06:52 +02:00
Remita Amine ba2ee853d0 [yandexvideo] use old api call as fallback 2021-10-23 01:06:52 +02:00
Remita Amine 7dc64629d5 [yandexvideo] fix extraction(closes #25000) 2021-10-23 01:06:52 +02:00
Remita Amine d918ab8191 [utils] accept only supported protocols in url_or_none 2021-10-23 01:06:52 +02:00
Remita Amine f16294dd19 [YoutubeDL] Allow format filtering using audio language(#16209) 2021-10-23 01:06:52 +02:00
Remita Amine 93c2a90d46 [nbc] Remove CSNNE extractor 2021-10-23 01:06:52 +02:00
Remita Amine 21652ea04f [nbc] fix NBCSport VPlayer URL extraction(closes #16640) 2021-10-23 01:06:52 +02:00
Remita Amine 6a185bd70e [aenetworks] fix HistoryPlayerIE tests 2021-10-23 01:06:52 +02:00
Remita Amine 78c85eee12 [aenetworks] add support for biography.com (closes #3863) 2021-10-23 01:06:52 +02:00
Remita Amine f40b51b20b [uktvplay] match new video URLs(closes #17909) 2021-10-23 01:06:52 +02:00
Remita Amine 03ec618950 [sevenplay] detect API errors 2021-10-23 01:06:52 +02:00
Remita Amine 8215e72574 [tenplay] fix format extraction(closes #26653) 2021-10-23 01:06:52 +02:00
Remita Amine de72e99bd4 [brightcove] raise ExtractorError for DRM protected videos(closes #23467)(closes #27568) 2021-10-23 01:06:52 +02:00
Remita Amine 107ca3cbb4 [aparat] Fix extraction
closes #22285
closes #22611
closes #23348
closes #24354
closes #24591
closes #24904
closes #25418
closes #26070
closes #26350
closes #26738
closes #27563
2021-10-23 01:06:52 +02:00
Remita Amine 5a2545d9e1 [brightcove] remove sonyliv specific code 2021-10-23 01:06:52 +02:00
Remita Amine c75135ffe9 [piksel] improve format extraction 2021-10-23 01:06:52 +02:00
Remita Amine 4a4b71739b [zype] Add support for uplynk videos 2021-10-23 01:06:52 +02:00
Remita Amine c80af1ed8b [toggle] add support for live.mewatch.sg (closes #27555) 2021-10-23 01:06:52 +02:00
JamKage 4174222eb4 [go] Added support for FXNetworks (#26826)
Co-authored-by: James Kirrage <james.kirrage@mortgagegym.com>

closes #13972
closes #22467
closes #23754
2021-10-23 01:06:52 +02:00
Sergey M. 2edeaf4d05 [teachable] Improve embed detection (closes #26923) 2021-10-23 01:06:52 +02:00
Remita Amine 4e37a9ea6c [mitele] fix free video extraction(#24624)(closes #25827)(closes #26757) 2021-10-23 01:06:52 +02:00
Remita Amine 1c1e20603b [telecinco] fix extraction 2021-10-23 01:06:52 +02:00
Sergey M acd96ffe3c [youtube] Update invidious.snopyta.org (#22667)
Co-authored-by: sofutru <54445344+sofutru@users.noreply.github.com>
2021-10-23 01:06:52 +02:00
Remita Amine a684623f86 [amcnetworks] improve auth only video detection(closes #27548) 2021-10-23 01:06:52 +02:00
Laura Liberda 571c02ab38 VHX embeds
https://github.com/ytdl-org/youtube-dl/issues/27546
2021-10-23 01:06:52 +02:00
Sergey M. c90c6e0db7 [instagram] Fix test 2021-10-23 01:06:52 +02:00
Sergey M. 0309b4a494 [instagram] Fix comment count extraction 2021-10-23 01:06:52 +02:00
Sergey M. 6f05e08af3 [instagram] Add support for reel URLs (closes #26234, closes #26250) 2021-10-23 01:06:52 +02:00
Remita Amine 16f8b4442c [bbc] switch to media selector v6
closes #23232
closes #23933
closes #26303
closes #26432
closes #26821
closes #27538
2021-10-23 01:06:52 +02:00
Sergey M. 36d012f932 [instagram] Improve thumbnail extraction 2021-10-23 01:06:52 +02:00
Sergey M. 6e617eb2e8 [instagram] Improve extraction (closes #22880) 2021-10-23 01:06:51 +02:00
Andrew Udvare 22c3b77c77 [instagram] Fix extraction when authenticated (closes #27422) 2021-10-23 01:06:51 +02:00
Sergey M. ede86727f3 [spankbang] Remove unused import 2021-10-23 01:06:51 +02:00
Sergey M. 1cb545ef2a [spankbang:playlist] Fix extraction (closes #24087) 2021-10-23 01:06:51 +02:00
Sergey M. de08d2ebb6 [spankbang] Add support for playlist videos 2021-10-23 01:06:51 +02:00
Sergey M. c7a0d780b6 [pornhub] Fix review issues (closes #27393) 2021-10-23 01:06:51 +02:00
JChris246 6ba48137c2 [pornhub] Fix lq formats extraction (closes #27386) 2021-10-23 01:06:51 +02:00
Sergey M. e1b44f86a8 [bongacams] Add extractor (closes #27440) 2021-10-23 01:06:51 +02:00
Remita Amine 1d866fe2b3 [theweatherchannel] fix extraction (closes #25930)(closes #26051) 2021-10-23 01:06:51 +02:00
Remita Amine ac0b651c35 [sprout] correct typo 2021-10-23 01:06:51 +02:00
Remita Amine 7246d188ec [sprout] Add support for Universal Kids (closes #22518) 2021-10-23 01:06:51 +02:00
Remita Amine 866f1a801d [theplatform] allow passing geo bypass countries from other extractors 2021-10-23 01:06:51 +02:00
Remita Amine 93e7f9943c [ctv] Add new extractor (closes #27525) 2021-10-23 01:06:51 +02:00
Remita Amine a8abf2770e [9c9media] improve info extraction 2021-10-23 01:06:51 +02:00
Remita Amine 522b9ee05b [sonyliv] fix title for movies 2021-10-23 01:06:51 +02:00
Remita Amine 6baf86c39d [sonyliv] fix extraction(closes #25667) 2021-10-23 01:06:51 +02:00
Remita Amine 14a2647111 [streetvoice] fix extraction(closes #27455)(closes #27492) 2021-10-23 01:06:51 +02:00
Remita Amine 97e449e183 [facebook] add support for watchparty pages(closes #27507) 2021-10-23 01:06:51 +02:00
Remita Amine ff3bc594e7 [cbslocal] fix video extraction 2021-10-23 01:06:51 +02:00
Remita Amine 50377f35a0 [brightcove] add another method to extract policyKey 2021-10-23 01:06:51 +02:00
Sergey M. 15058fcdea [mewatch] Relax _VALID_URL (closes #27506) 2021-10-23 01:06:51 +02:00
Remita Amine 25f997a417 [anvato] remove NFLTokenGenerator
until a better solution is introduced that:
- works with lazy_extractors
- allows for 3rd party token generators
2021-10-23 01:06:51 +02:00
Remita Amine 23a5bcc4df [tastytrade] Remove Extractor(closes #25716)
covered by GenericIE via BrightcoveNewIE
2021-10-23 01:06:51 +02:00
Remita Amine 3e93d39835 [niconico] fix playlist extraction(closes #27428) 2021-10-23 01:06:51 +02:00
Remita Amine bfd375ccb7 [everyonesmixtape] Remove Extractor 2021-10-23 01:06:51 +02:00
Remita Amine 4dd69eb6fd [kanalplay] Remove Extractor 2021-10-23 01:06:51 +02:00
Remita Amine 8f1118f75d [nba] rewrite extractor 2021-10-23 01:06:51 +02:00
Remita Amine adbb3cdd89 [turner] improve info extraction 2021-10-23 01:06:51 +02:00
Remita Amine 4969a2783c [common] remove unwanted query params from unsigned akamai manifest URLs 2021-10-23 01:06:51 +02:00
Sergey M. 86cf5a2aa7 [generic] Improve RSS age limit extraction 2021-10-23 01:06:51 +02:00
renalid e1b808fa40 [generic] Fix RSS itunes thumbnail extraction (#27405) 2021-10-23 01:06:51 +02:00
Trevor Nelson 1da4c42b9f [redditr] Extract duration (#27426) 2021-10-23 01:06:51 +02:00
Remita Amine aaabef0220 [anvato] Disable NFLTokenGenerator(closes #27449) 2021-10-23 01:06:51 +02:00
Remita Amine 4eab6d5637 [zaq1] Remove extractor 2021-10-23 01:06:51 +02:00
Remita Amine 76995ec854 [asiancrush] fix extraction and add support for retrocrush.tv
closes #25577
closes #25829
2021-10-23 01:06:51 +02:00
Remita Amine 7b2415f4f9 [nfl] fix extraction(closes #22245) 2021-10-23 01:06:51 +02:00
Remita Amine c2c2b20b39 [anvato] update ANVACK table and add experimental token generator for NFL 2021-10-23 01:06:51 +02:00
Remita Amine a1952d3a0b [sky] relax SkySports URL regex (closes #27435) 2021-10-23 01:06:51 +02:00
Remita Amine 4bc794225a [tv5unis] Add new extractor(closes #22399)(closes #24890) 2021-10-23 01:06:51 +02:00
Remita Amine a78d792adb [videomore] add support more.tv (closes #27088) 2021-10-23 01:06:51 +02:00
Remita Amine b2d4c847ed [nhk:program] Add support for audio programs and program clips 2021-10-23 01:06:51 +02:00
Matthew Rayermann 7e6e4c9a29 [nhk] Add support for NHK video programs (#27230) 2021-10-23 01:06:51 +02:00
Sergey M. 93dd9a4b58 [test_InfoExtractor] PEP 8 2021-10-23 01:06:51 +02:00
Sergey M. f0db042d42 [mdr] Bypass geo restriction 2021-10-23 01:06:51 +02:00
Sergey M. b331a819bc [mdr] Improve extraction (closes #24346, closes #26873) 2021-10-23 01:06:51 +02:00
Sergey M. c700190a05 [eporner] Fix view count extraction and make optional (closes #23306) 2021-10-23 01:06:51 +02:00
Sergey M. 1060552241 [extractor/common] Improve JSON-LD interaction statistic extraction (refs #23306) 2021-10-23 01:06:51 +02:00
Sergey M. c7e6e03982 [eporner] Fix embed test URL 2021-10-23 01:06:51 +02:00
spvkgn aa655226dd [eporner] Fix hash extraction and extend _VALID_URL (#27396)
Co-authored-by: Sergey M <dstftw@gmail.com>
2021-10-23 01:06:51 +02:00
Remita Amine 948c21b310 [slideslive] use m3u8 entry protocol for m3u8 formats(closes #27400) 2021-10-23 01:06:51 +02:00
Remita Amine 90b9baceb9 [downloader/hls] delegate manifests with media initialization to ffmpeg 2021-10-23 01:06:51 +02:00
Remita Amine 2257ec4792 [twitcasting] fix format extraction and improve info extraction(closes #24868) 2021-10-23 01:06:51 +02:00
Sergey M. 09824dd983 [extractor/common] Document duration meta field for playlists 2021-10-23 01:06:51 +02:00
Sergey M. d5b9b7ffc1 [linuxacademy] Fix authentication and extraction (closes #21129, closes #26223, closes #27402) 2021-10-23 01:06:51 +02:00
Remita Amine 096b76fb8f [itv] clean description from HTML tags (closes #27399) 2021-10-23 01:06:51 +02:00
Remita Amine 26a7c4416c [hotstar] fix and improve extraction
- fix format extraction (closes #26690)
- extract thumbnail URL (closes #16079, closes #20412)
- support country specific playlist URLs (closes #23496)
- select the last id in video URL (closes #26412)
2021-10-23 01:06:51 +02:00
toniz4 a16737d123 [youtube] Add some invidious instances (#27373)
Co-authored-by: Cássio <heyitscassio@cock.li>
2021-10-23 01:06:51 +02:00
Sergey M. 940c972e32 [ruutu] Extract more metadata and detect non-free videos (closes #21154) 2021-10-23 01:06:51 +02:00
Sergey M. 9551a110d5 [ruutu] Authenticate format URLs (closes #21031, closes #26782) 2021-10-23 01:06:51 +02:00
Sergey M. ee2b2e978d [ruutu] Add support for static.nelonenmedia.fi (closes #25412) 2021-10-23 01:06:51 +02:00
Sergey M. 0f76bf3b16 [ruutu] Extend _VALID_URL (closes #24839) 2021-10-23 01:06:51 +02:00
Remita Amine 1f42f0d662 [facebook] Add support archived live video URLs(closes #15859) 2021-10-23 01:06:51 +02:00
Sergey M. 85b2a459dd [wdr] Extend subtitles extraction and improve overall extraction (closes #22672, closes #22723) 2021-10-23 01:06:51 +02:00
Remita Amine ce1265ba8b [facebook] add support for videos attached to Relay based story pages(#10795) 2021-10-23 01:06:51 +02:00
=?UTF-8?q?Sergey=20M=E2=80=A4?= 271de5e36b [wdr:page] Add support for kinder.wdr.de (closes #27350) 2021-10-23 01:06:51 +02:00
Remita Amine 7178c3e070 [facebook] Add another regex for handleServerJS 2021-10-23 01:06:51 +02:00
Remita Amine 80e135e246 [facebook] fix embed page extraction 2021-10-23 01:06:51 +02:00
compujo d96c2cd5bc [YoutubeDL] Improve thumbnails' filenames deducing (closes #26010) (#27244) 2021-10-23 01:06:51 +02:00
Remita Amine a494221ffa [facebook] add support for Relay post pages(closes #26935) 2021-10-23 01:06:51 +02:00
Remita Amine 3a0bdc1456 [facebook] proper support for watch videos(closes #22795)(#27062) 2021-10-23 01:06:51 +02:00
Remita Amine 6970a136ec Revert "[facebook] add support for watch videos(closes #22795)"
This reverts commit dc65041c224497f46b2984df02c234ce54bdedfd.
2021-10-23 01:06:51 +02:00
Remita Amine ab0118de82 [facebook] add support for watch videos(closes #22795) 2021-10-23 01:06:51 +02:00
Remita Amine cfa38c154d [facebook] add support for group posts with multiple videos(closes #19131) 2021-10-23 01:06:51 +02:00
Remita Amine 885a3a9383 [itv] remove old extraction method and fix series metadata extraction
closes #23177
closes #26897
2021-10-23 01:06:51 +02:00
Remita Amine 131b5bbf6e [facebook] redirect Mobile URLs to Desktop URLs
closes #24831
closes #25624
2021-10-23 01:06:51 +02:00
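Roughly what the mobile-to-desktop redirect above amounts to (a sketch, not the extractor's exact code):

```python
import re

def to_desktop_url(url):
    # m.facebook.com pages expose less metadata than their desktop
    # counterparts, so mobile URLs are rewritten before extraction.
    return re.sub(r'//m\.facebook\.com/', '//www.facebook.com/', url)

print(to_desktop_url('https://m.facebook.com/groups/1024232/permalink/1396382447/'))
# -> https://www.facebook.com/groups/1024232/permalink/1396382447/
```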
Remita Amine 7d43250e35 [facebook] Add support for Relay based pages(closes #26823) 2021-10-23 01:06:51 +02:00
Remita Amine 9ba5d1f2c4 [facebook] try to reduce unnecessary tahoe requests 2021-10-23 01:06:51 +02:00
Remita Amine 61022d86b9 [facebook] remove hardcoded chrome user-agent
closes #18974
closes #25411
closes #26958
closes #27329
2021-10-23 01:06:51 +02:00
Andrey Smirnoff b0635eb9af [smotri] Remove extractor (#27358) 2021-10-23 01:06:51 +02:00
Remita Amine d64b39da56 [beampro] Remove Extractor
closes #17290
closes #22871
closes #23020
closes #23061
closes #26099
2021-10-23 01:06:51 +02:00
EntranceJew 45ee5c8ba2 [tubitv] Extract release year (#27317) 2021-10-23 01:06:51 +02:00
Remita Amine 51433a1efa [amcnetworks] Fix free content extraction(closes #20354) 2021-10-23 01:06:51 +02:00
Remita Amine 8a311118c5 [telequebec] Fix Extraction and Add Support for video.telequebec.tv
closes #25733
closes #26883
closes #27339
2021-10-23 01:06:51 +02:00
Remita Amine d2f6235840 [generic] comment out a test now covered by AmericasTestKitchenIE 2021-10-23 01:06:51 +02:00
Remita Amine 03f1dda5da [tvplay:home] Fix extraction(closes #21153) 2021-10-23 01:06:51 +02:00
Remita Amine aad69ec1b5 [americastestkitchen] Fix extraction and add support for Cook's Country and Cook's Illustrated

closes #17234
closes #27322
2021-10-23 01:06:51 +02:00
Sergey M. 695d097a1b [slideslive] Add support for yoda service videos and extract subtitles (closes #27323) 2021-10-23 01:06:51 +02:00
Sergey M. e895ca0d9f [extractor/generic] Remove unused import 2021-10-23 01:06:51 +02:00
Remita Amine b89074356b [aenetworks] Fix extraction
- Fix Fastly format extraction
- Add support for play and watch subdomains
- Extract series metadata

closes #23363
closes #23390
closes #26795
closes #26985
2021-10-23 01:06:51 +02:00
Sergey M. a0b9b17798 [extractor/common] Extract timestamp from Last-Modified header 2021-10-23 01:06:51 +02:00
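The Last-Modified idea above as a standalone sketch; youtube-dl-style code would funnel the header through its own date helpers (e.g. unified_timestamp) instead:

```python
import email.utils

def timestamp_from_headers(headers):
    # Map the HTTP Last-Modified header to a unix timestamp, usable as
    # the info-dict 'timestamp' when the page offers nothing better.
    last_modified = headers.get('Last-Modified')
    if not last_modified:
        return None
    try:
        return email.utils.parsedate_to_datetime(last_modified).timestamp()
    except (TypeError, ValueError):
        return None

print(timestamp_from_headers({'Last-Modified': 'Wed, 21 Oct 2015 07:28:00 GMT'}))
# -> 1445412480.0
```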
Sergey M. 10b7025196 [generic] Extract RSS video itunes metadata 2021-10-23 01:06:51 +02:00
Sergey M. 021738dfa7 [generic] Extract RSS video timestamp 2021-10-23 01:06:51 +02:00
renalid d3f521bbdc [generic] Extract RSS video description (#27177) 2021-10-23 01:06:51 +02:00
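The three generic-RSS entries above boil down to reading a few more elements from each RSS <item>; a self-contained sketch using the standard itunes namespace (illustrative, not the actual youtube-dl code):

```python
import xml.etree.ElementTree as ET

ITUNES = '{http://www.itunes.com/dtds/podcast-1.0.dtd}'

def rss_item_metadata(item_xml):
    item = ET.fromstring(item_xml)
    text = lambda tag: getattr(item.find(tag), 'text', None)
    return {
        'description': text('description'),
        'timestamp': text('pubDate'),  # would be parsed by a date helper
        'duration': text(ITUNES + 'duration'),
        'episode_number': text(ITUNES + 'episode'),
    }

print(rss_item_metadata(
    '<item xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">'
    '<description>demo</description><itunes:duration>1800</itunes:duration>'
    '</item>'))
```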
Remita Amine d6457b9613 [nrk] reduce requests for Radio series 2021-10-23 01:06:51 +02:00
Remita Amine 2120179b61 [nrk] reduce the number of instalments requests 2021-10-23 01:06:51 +02:00
Remita Amine 392743d1e4 [nrk] improve format extraction 2021-10-23 01:06:51 +02:00
Remita Amine 917f2ec68d [nrk] improve extraction
- improve format extraction for old akamai formats
- update some of the tests
- add is_live value to entry info dict
- request instalments only when they're available
- fix skole extraction
2021-10-23 01:06:51 +02:00
Sergey M. 9c92a54488 [peertube] Extract fps 2021-10-23 01:06:51 +02:00
Sergey M. c23e70b2e7 [peertube] Recognize audio-only formats (closes #27295) 2021-10-23 01:06:51 +02:00
Sergey M. c11211efce [teachable:course] Improve extraction (closes #24507, closes #27286) 2021-10-23 01:06:51 +02:00
Sergey M. 427dfe7ca9 [nrk] Improve error extraction 2021-10-23 01:06:51 +02:00
Sergey M. 24342b07db [nrktv] Relax _VALID_URL 2021-10-23 01:06:51 +02:00
Sergey M. 3d5bbf574b [nrktv:series] Improve extraction (closes #21926) 2021-10-23 01:06:51 +02:00
Sergey M. 8b78e040c0 [nrktv:series] Improve extraction 2021-10-23 01:06:51 +02:00
Sergey M. d98d4ea999 [nrktv:season] Improve extraction 2021-10-23 01:06:51 +02:00
Remita Amine 3935b7503d [nrk] fix call to moved method 2021-10-23 01:06:51 +02:00
Remita Amine c3f178555c [nrk] fix typo 2021-10-23 01:06:51 +02:00
Remita Amine 9cd7a23c87 [nrk] improve format extraction and geo-restriction detection (closes #24221) 2021-10-23 01:06:51 +02:00
Sergey M. 3012eb0bee [pornhub] Handle HTTP errors gracefully (closes #26414) 2021-10-23 01:06:51 +02:00
Sergey M. 9647acd865 [nrktv] Relax _VALID_URL (closes #27299, closes #26185) 2021-10-23 01:06:51 +02:00
Remita Amine 5657492d8b [zdf] extract webm formats(closes #26659) 2021-10-23 01:06:51 +02:00
Matthew Rayermann a2c136b114 [nhk] Add audio clip test to NHK extractor (#27269) 2021-10-23 01:06:51 +02:00
Remita Amine aae5e2a9ae [gamespot] Extract DASH and HTTP formats 2021-10-23 01:06:51 +02:00
Remita Amine f1544fba1b [extractor/common] improve Akamai HTTP formats extraction 2021-10-23 01:06:51 +02:00
Remita Amine 3e720fb782 [tver] correct episode_number key 2021-10-23 01:06:51 +02:00
Remita Amine b4790d23c3 [extractor/common] improve Akamai HTTP format extraction
- Allow m3u8 manifest without an additional audio format
- Fix extraction for qualities starting with a number
Solution provided by @nixxo based on: https://stackoverflow.com/a/5984688
2021-10-23 01:06:51 +02:00
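One reading of the "qualities starting with a number" fix, illustrated with a made-up URL (the assumption here is that the failure mode is Python's ambiguous \1 backreference, which \g<1> avoids):

```python
import re

# Akamai-style URLs list the available qualities inline. Rewriting the
# URL for one quality with a template like r'\1' + '720' breaks, because
# Python reads the digits as part of the group number and raises
# 'invalid group reference'; \g<1> keeps the reference unambiguous.
url = 'https://cdn.example.com/i/clip_,360,720,1080,.mp4.csmil/master.m3u8'
pattern = r'(/i/[^,]+,)[^/]+(,\.mp4\.csmil/)'
print(re.sub(pattern, r'\g<1>720\2', url))
# -> https://cdn.example.com/i/clip_,720,.mp4.csmil/master.m3u8
```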
Remita Amine 7eaf5dfb74 [tver] Add new extractor (closes #26662)(closes #27284) 2021-10-23 01:06:51 +02:00
Remita Amine a542d171f2 [extractors] Add QubIE import 2021-10-23 01:06:51 +02:00
Remita Amine 25b8a45235 [tva] Add support for qub.ca (closes #27235) 2021-10-23 01:06:51 +02:00
Remita Amine cb2a719249 [toggle] Detect DRM protected videos (closes #16479)(closes #20805) 2021-10-23 01:06:51 +02:00
Remita Amine 1016c56cd3 [toggle] Add support for new MeWatch URLs (closes #27256) 2021-10-23 01:06:51 +02:00
Sergey M. 00fc0dea8c [cspan] Extract info from jwplayer data (closes #3672, closes #3734, closes #10638, closes #13030, closes #18806, closes #23148, closes #24461, closes #26171, closes #26800, closes #27263) 2021-10-23 01:06:51 +02:00
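Extracting "info from jwplayer data" generally means parsing the JSON setup object embedded in the page; a self-contained sketch (the 'jwsetup' variable name and page shape are assumptions, and youtube-dl extractors would delegate to the shared jwplayer parsing helpers instead):

```python
import json
import re

def jwplayer_sources(webpage):
    # Locate the embedded jwplayer config and list its media URLs,
    # following jwplayer's standard playlist/sources/file layout.
    m = re.search(r'jwsetup\s*=\s*({.+?});', webpage, re.DOTALL)
    if not m:
        return []
    setup = json.loads(m.group(1))
    return [source['file']
            for entry in setup.get('playlist', [])
            for source in entry.get('sources', [])]

demo = 'var jwsetup = {"playlist": [{"sources": [{"file": "https://example.com/v.m3u8"}]}]};'
print(jwplayer_sources(demo))  # -> ['https://example.com/v.m3u8']
```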
Roman Beránek db8e1543b1 [cspan] Pass Referer header with format's video URL (#26032) (closes #25729) 2021-10-23 01:06:51 +02:00
Remita Amine cf0fbf895a [mediaset] add support for movie URLs(closes #27240) 2021-10-23 01:06:51 +02:00
Sergey M. c7c2b1d972 [yandexmusic:track] Fix extraction (closes #26449, closes #26669, closes #26747, closes #26748, closes #26762) 2021-10-23 01:06:51 +02:00
Michael Munch 176b13bab2 [drtv] Extend _VALID_URL (#27243) 2021-10-23 01:06:51 +02:00
bopol af3b23c63d [ina] Add support for mobile URLs (#27229) 2021-10-23 01:06:51 +02:00
Sergey M. 06cf4cef12 [YoutubeDL] Write static debug to stderr and respect quiet for dynamic debug (closes #14579, closes #22593)

TODO: logging and verbosity needs major refactoring (refs #10894)
2021-10-23 01:06:51 +02:00
Adrian Heine né Lang c968fd04de [videa] Adapt to updates (#26301)
closes #25973, closes #25650.
2021-10-23 01:06:51 +02:00
Remita Amine c3d9771ac6 [spreaker] fix SpreakerShowIE test URL 2021-10-23 01:06:51 +02:00
Sergey M. 0f12f211cf [spreaker] Add extractor (closes #13480, closes #13877) 2021-10-23 01:06:51 +02:00
Remita Amine fd0c0b9663 [viki] fix video API request(closes #27184) 2021-10-23 01:06:51 +02:00
Remita Amine 8ebadf3f79 [bbc] fix BBC Three clip extraction 2021-10-23 01:06:51 +02:00
Remita Amine 67f2b570ef [bbc] fix BBC News videos extraction 2021-10-23 01:06:51 +02:00
Remita Amine c7e8522059 [medaltv] improve extraction 2021-10-23 01:06:51 +02:00
Joshua Lochner 7fff37c758 [medaltv] Add new extractor (#27149) 2021-10-23 01:06:51 +02:00
Sergey M. 42acea347e [downloader/fragment] Set final file's mtime according to last fragment's Last-Modified header (closes #11718, closes #18384, closes #27138) 2021-10-23 01:06:51 +02:00
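A sketch of the mtime stamping described above (hypothetical helper; the real downloader threads the header through its fragment bookkeeping):

```python
import email.utils
import os

def stamp_mtime(filepath, last_modified):
    # After the final fragment is written, set the merged file's
    # modification time from the server's Last-Modified header.
    if not last_modified:
        return
    mtime = email.utils.parsedate_to_datetime(last_modified).timestamp()
    os.utime(filepath, (mtime, mtime))

# usage sketch:
# stamp_mtime('video.mp4', 'Wed, 21 Oct 2015 07:28:00 GMT')
```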
Sergey M. 8315bccf62 [nrk] Fix extraction 2021-10-23 01:06:51 +02:00
Remita Amine 84e0c2cca9 [pinterest] Add support for large collections(more than 25 pins) 2021-10-23 01:06:51 +02:00
Remita Amine 2aa4d9aab1 [franceinter] flake8 2021-10-23 01:06:51 +02:00
renalid aa8cbb45e3 [franceinter] add thumbnail url (#27153)
Co-authored-by: remitamine <remitamine@gmail.com>
2021-10-23 01:06:51 +02:00
Remita Amine 6908e58791 [box] Add new extractor(#5949) 2021-10-23 01:06:51 +02:00
Jia Rong Yee 711ef6c996 [nytimes] Add new cooking.nytimes.com extractor (#27143)
* [nytimes] support cooking.nytimes.com, resolves #27112

Co-authored-by: remitamine <remitamine@gmail.com>
2021-10-23 01:06:51 +02:00
Remita Amine f20b6d8dd4 [rumble] add support for embed pages(#10785) 2021-10-23 01:06:51 +02:00
Remita Amine 2fdf3447c1 [skyit] add support for multiple Sky Italia websites(closes #26629) 2021-10-23 01:06:51 +02:00
Remita Amine 1dd494ee31 [extractor/common] add generic support for akamai http format extraction 2021-10-23 01:06:51 +02:00
Sergey M. 0ffe0d8bbd [pinterest] Add extractor (closes #25747) 2021-10-23 01:06:51 +02:00
Sergey M. 63d1574bc8 [svtplay] Fix test title 2021-10-23 01:06:51 +02:00
Sergey M. c315d7caf0 [svtplay] Add support for svt.se/barnkanalen (closes #24817) 2021-10-23 01:06:51 +02:00
Mattias Wadman b6c66dbc79 [svt] Extract timestamp and thumbnail in more cases (#27130)
Add timestamp, set to "valid from", which I think can be seen as the publish time.
Add thumbnail in more cases; it seems it was previously only extracted in the embedded data case for some reason.
Switch the svtplay test URL to an existing video, and to one with no expiry date.
Also add an additional thumbnail URL test regex.
2021-10-23 01:06:51 +02:00
Remita Amine 4e68d816b3 [infoq] fix format extraction(closes #25984) 2021-10-23 01:06:51 +02:00
renalid eb38071416 [francetv] Update to fix thumbnail URL issue (#27120)
Fix the thumbnail URL. The issue was there for many years and never fixed. It's done! :-)

Example: https://www.france.tv/france-2/de-gaulle-l-eclat-et-le-secret/de-gaulle-l-eclat-et-le-secret-saison-1/2035247-solitude.html

Broken thumbnail URL previously generated: http://pluzz.francetv.fr/staticftv/ref_emissions/2020-11-02/EMI_1104da66f533cc7dc5d0d07a181a18c2e2fe1d81_20201014122553940.jpg

Correct thumbnail URL now produced: https://sivideo.webservices.francetelevisions.fr/staticftv/ref_emissions/2020-11-02/EMI_1104da66f533cc7dc5d0d07a181a18c2e2fe1d81_20201014122553940.jpg
2021-10-23 01:06:51 +02:00
Sergey M. 21c9c8293f [downloader/http] Fix crash during urlopen caused by missing reason of URLError 2021-10-23 01:06:51 +02:00
Laura Liberda 1250d58b96 improve copykitku patch hook 2021-10-23 01:06:51 +02:00
Sergey M. 0b84ab95e5 [YoutubeDL] Fix --ignore-errors for playlists with generator-based entries of url_transparent (closes #27064) 2021-10-23 01:06:50 +02:00
Remita Amine bf3df2dd79 [discoverynetworks] add support for new TLC/DMAX URLs(closes #27100) 2021-10-23 01:06:50 +02:00
Remita Amine 1b9cc0baff [rai] fix protocol relative relinker URLs(closes #22766) 2021-10-23 01:06:50 +02:00
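"Protocol relative" here means scheme-less //host/path URLs, which urllib cannot open as-is; the fix is the usual scheme prefixing (a sketch with a made-up URL, not the extractor's code):

```python
def fix_protocol_relative(url, scheme='https'):
    # Relinker responses may hand back scheme-less URLs; prefix a
    # scheme before requesting them.
    return '%s:%s' % (scheme, url) if url.startswith('//') else url

print(fix_protocol_relative('//download.example.rai.it/video.mp4'))
# -> https://download.example.rai.it/video.mp4
```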
Remita Amine 9c2afc03ee [rai] fix unavailable video format detection 2021-10-23 01:06:50 +02:00
Remita Amine 081b5611f1 [rai] improve extraction 2021-10-23 01:06:50 +02:00
Leonardo Taccari 1f784e32ec [rai] Fix extraction for recent raiplay.it updates (#27077)
- Remove first test of RaiPlayIE: it is no longer available
- Make RaiPlayIE extension-agnostic (passing possible `.json' URLs is now
  supported too)
- Adjust RaiPlayLiveIE to recent raiplay.it updates.  Passing it as
  `url_transparent' is no longer supported (there is no longer an accessible
  ContentItem)
- Adjust RaiPlayPlaylistIE to recent raiplay.it updates and instruct it about
  ContentSet-s.
- Update a RaiIE test and remove two tests that are no longer available

Thanks to @remitamine for the review!
2021-10-23 01:06:50 +02:00
Remita Amine ab19e26c5d [viki] improve format extraction 2021-10-23 01:06:50 +02:00
beefchop 7f6aa287d7 [viki] fix stream extraction from mpd (#27092)
Co-authored-by: beefchop <beefchop@users.noreply.github.com>
2021-10-23 01:06:50 +02:00
Remita Amine 0a76a87779 [amara] improve extraction 2021-10-23 01:06:50 +02:00
Joost Verdoorn 3182075ad6 [Amara] Add new extractor (#20618)
2021-10-23 01:06:50 +02:00
Remita Amine a9fd199643 [vimeo:album] fix extraction(closes #27079) 2021-10-23 01:06:50 +02:00
Remita Amine dc9feaf10e [mtv] fix mgid extraction(closes #26841) 2021-10-23 01:06:50 +02:00
Sergey M. 4f7945880e [youporn] Fix upload date extraction and make comment count optional (closes #26986) 2021-10-23 01:06:50 +02:00
Sergey M. 7910bb7586 [arte] Rework extractors
* Reimplement embed and playlist extractors to delegate to the single entrypoint artetv extractor
  Beware reluctant download archive extractor keys breakage.
* Improve embeds detection (closes #27057)
- Remove obsolete code
2021-10-23 01:06:50 +02:00
Sergey M. 60ee706bbd [arte] Extract m3u8 formats (closes #27061) 2021-10-23 01:06:50 +02:00
Remita Amine fe97445dc3 [mgtv] fix format extraction(closes #26415) 2021-10-23 01:06:50 +02:00
Sergey M. 00a35be6f7 [extractor/common] Output error for invalid URLs in _is_valid_url (refs #21400, refs #24151, refs #25617, refs #25618, refs #25586, refs #26068, refs #27072) 2021-10-23 01:06:50 +02:00
Remita Amine 276aef5dde [francetv] improve info extraction 2021-10-23 01:06:50 +02:00
gdzx c8a6cd5640 [francetv] Add fallback video url extraction (#27047)
Fallback on another API endpoint when no video formats are found.

Closes ytdl-org#22561
2021-10-23 01:06:50 +02:00
Sergey M. d445ef16e7 [spiegel] Fix extraction (closes #24206, closes #24767)
Code picked from PR #24767 since original repo is not available due to takedown.
2021-10-23 01:06:50 +02:00
Remita Amine f91186eb70 [malltv] fix extraction(closes #27035) 2021-10-23 01:06:50 +02:00
Remita Amine d420d23ad2 [bandcamp] extract playlist_description(closes #22684) 2021-10-23 01:06:50 +02:00
Remita Amine e23e51ca94 [urplay] fix extraction(closes #26828) 2021-10-23 01:06:50 +02:00
Remita Amine 02f491e375 [lrt] fix extraction with empty tags(closes #20264) 2021-10-23 01:06:50 +02:00
Sergey M. 71d074ba46 [ndr:embed:base] Extract subtitles (closes #25447, closes #26106) 2021-10-23 01:06:50 +02:00
Sergey M. cb9623206e [servus] Add support for pm-wissen.com (closes #25869) 2021-10-23 01:06:50 +02:00
Sergey M. 4b3da7ce47 [servus] Fix extraction (closes #26872, closes #26967, closes #26983, closes #27000) 2021-10-23 01:06:50 +02:00
Sergey M. 23c968301c [xtube] Fix extraction (closes #26996) 2021-10-23 01:06:50 +02:00
Sergey M. 64e706884a [utils] Skip ! prefixed code in js_to_json 2021-10-23 01:06:50 +02:00
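"! prefixed code" refers to minified-JS truthiness tricks such as !0 (true), !1 (false) or !function(){...}(); a simplified stand-in for the skipping behaviour (not youtube-dl's actual js_to_json internals):

```python
import re

def skip_bang_prefix(js_expr):
    # Drop leading negation bangs so the remainder can be handed to the
    # JSON-ish converter instead of failing the parse outright.
    return re.sub(r'^!+', '', js_expr)

print(skip_bang_prefix('!function(){return 1}()'))
# -> function(){return 1}()
```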
Remita Amine 41ba58cea2 [lrt] fix extraction 2021-10-23 01:06:50 +02:00
Remita Amine 099936f4e6 [condenast] fix extraction and extract subtitles 2021-10-23 01:06:50 +02:00
Remita Amine 3ac79c2585 [bandcamp] fix extraction 2021-10-23 01:06:50 +02:00
Remita Amine fc645d1052 [rai] fix RaiPlay extraction 2021-10-23 01:06:50 +02:00
Remita Amine 065a8e40de [usanetwork] fix extraction 2021-10-23 01:06:50 +02:00
Remita Amine 8ff536d5be [nbc] fix NBCNews/Today/MSNBC extraction 2021-10-23 01:06:50 +02:00
Edward Betts 3a7feb93dc [devscripts/make_lazy_extractors] Correct a spelling mistake (#26991) 2021-10-23 01:06:50 +02:00
Remita Amine d23d09c709 [cnbc] fix extraction 2021-10-23 01:06:50 +02:00
320 changed files with 19450 additions and 14873 deletions

2 .github/FUNDING.yml vendored Normal file

@@ -0,0 +1,2 @@
github: selfisekai
ko_fi: selfisekai

1 .gitignore vendored

@@ -15,6 +15,7 @@ haruhi-dl.1
haruhi-dl.bash-completion
haruhi-dl.fish
haruhi_dl/extractor/lazy_extractors.py
haruhi_dl/extractor_artifacts/
haruhi-dl
haruhi-dl.exe
haruhi-dl.tar.gz

.gitlab-ci.yml

@@ -1,8 +1,29 @@
default:
  before_script:
    - sed -i "s@dl-cdn.alpinelinux.org@alpine.sakamoto.pl@g" /etc/apk/repositories
    - apk add bash
    - pip install nose

pypy3.6-core:
  image: pypy:3.6-slim
  variables:
    HDL_TEST_SET: core
  before_script:
    - apt-get update && apt-get install -y bash && apt-get clean
    - pip install nose
  script:
    - ./devscripts/run_tests.sh

pypy3.7-core:
  image: pypy:3.7-slim
  variables:
    HDL_TEST_SET: core
  before_script:
    - apt-get update && apt-get install -y bash && apt-get clean
    - pip install nose
  script:
    - ./devscripts/run_tests.sh

py3.6-core:
  image: python:3.6-alpine
  variables:
@@ -39,18 +60,6 @@ py3.9-download:
  script:
    - ./devscripts/run_tests.sh
#jython-core:
#  image: openjdk:11-slim
#  variables:
#    HDL_TEST_SET: core
#  allow_failure: true
#  before_script:
#    - apt-get update
#    - apt-get install -y wget
#    - ./devscripts/install_jython.sh
#    - export PATH="$HOME/jython/bin:$PATH"
#  script:
#    - ./devscripts/run_tests.sh
playwright-tests-core:
  image: mcr.microsoft.com/playwright:focal

594 ChangeLog

@@ -1,3 +1,597 @@
version 2021.08.01
Extractor
* [youtube] fixed agegate
* [niconico] dmc downloader from yt-dlp
* [peertube] new URL schemas
version 2021.06.20
Core
* [playwright] fixed headlessness
+ [playwright] option to force a specific browser
Extractor
* [tiktok] fix empty video lists
* [youtube] fix and speed-up age-gate circumvention
* [youtube] fix videos with JS-like syntax
version 2021.06.01
Core
* merging formats by codecs
* [json_ld] better author extraction
+ --force-use-mastodon option
* support for HTTP 308 redirects
+ [test_execution] add test for lazy extractors
* Improve extract_info doc
* [options] Fix thumbnail option group name
Extractor
* [tvp:series] fallback to web
- [ninateka] remove extractor
* [tvn24] refactor handling next.js frontend
* [cda] fix premium videos for premium users (?)
* [tvp] support for tvp.info vue.js pages
+ [sejm.gov.pl] new extractors
+ [senat.gov.pl] new extractors
* [spreaker] new url schemes
* [spreaker] support for embedded player
+ [spryciarze.pl] new extractors
+ [castos] new extractors
+ [magentamusik360] new extractor
+ [arnes] new extractor
+ [palcomp3] new extractor
* [screencastomatic] fix extraction
* [youku] update ccode
+ [line] support live.line.me
* [curiositystream] fix format extraction
* [jamendo] fix track extraction
* [pornhub] extracting DASH and HLS formats
* [mtv] fix Viacom A/B testing video player
+ [maoritv] new extractor
* [pluralsight] extend anti-throttling timeout
* [mastodon] support for soapbox and audio files
* [tvp] fix jp2.tvp.pl
* [youtube:channel] fix multiple page extraction
* [tvp:embed] handling formats better way
* [tvn] better extraction method choosing
* [tvp] fix tvp:website extracting with weird urls
+ [wppilot] new extractors
+ [mastodon] logging in to mastodon/pleroma
+ [mastodon] fetching posts via different instances
+ [mastodon] fetching peertube videos via pleroma instances
* [bbc] extract full description from __INITIAL_DATA__
* [tver] redirect all downloads to Brightcove
* [medaltv] fix extraction
* [francetvinfo] improve video id extraction
* [xfileshare] support for wolfstream.tv
* [tv2dk] fix extraction
* [svtplay] improve extraction
* [xtube] fix formats extraction
* [twitter] improve formats extraction from vmap URL
* [mastodon] cache apps on logging in
* [mastodon] support cards to external services
* [peertube] logging in
* [tiktok] deduplicate videos
+ [misskey] new extractor
+ [radiokapital] new extractors
* [youtube] fix videos with age gate
* [kaltura] Make embed code alternatives actually work
* [kaltura] Improve iframe extraction
* [dispeak] Improve FLV extraction
* [dispeak] DRY and update tests
* [gdcvault] Add support for HTML5 videos
* [funimation] Add support for optional lang code in URLs
* [medaltv] Relax _VALID_URL
- [blinkx] Remove extractor
* [orf:radio] Switch download URLs to HTTPS
+ [generic] Add Referer header for direct videojs download URLs
+ [vk] Add support for sibnet embeds
+ [generic] Add support for sibnet embeds
* [phoenix] Fix extraction
* [generic] Add support for og:audio
* [vivo] Add support for vivo.st
* [eroprofile] Fix extraction
* [playstuff] Add extractor
* [shahid] relax _VALID_URL
* [redbulltv] fix embed data extraction
* [vimeo] fix vimeo pro embed extraction
* [twitch:clips] Add access token query to download URLs
* [twitch:clips] Improve extraction
* [ted] Prefer own formats over external sources
* [ustream] Detect https embeds
* [ard] Relax _VALID_URL and fix video ids
version 2021.04.01
Core
- Removed Herobrine
Extractor
* [youtube] fixed GDPR consent workaround
* [instagram] improve title extraction and extract duration
* [francetvinfo] improve video ID extraction
* [vlive] merge all updates from YTDL
version 2021.03.30
Core
* `--ie-key` commandline option for selecting specific extractor
Extractor
* [tiktok] detect private videos
* [dw:article] fix extractor
+ [patroniteaudio] added extractor
+ [sbs] Add support for ondemand watch URLs
* [picarto] Fix live stream extraction
* [vimeo] Fix unlisted video extraction
* [ard] Improve clip id extraction
+ [zoom] Add support for zoom.us
* [bbc] Fix BBC IPlayer Episodes/Group extraction
* [zingmp3] Fix extraction
* [youtube] added workaround for cookie consent
version 2021.03.21
Core
* [playwright] More verbose errors
- Removed a lot of deprecated platform support code
* New win32 exe build system
+ Support for BitTorrent formats
+ Support for VTT subtitles in m3u8 (HLS) manifests
+ `release_timestamp` meta field
Extractor
+ [acast:player] new extractor
+ [videotarget] new extractor
* [youtube] caching extracted signature functions
* [go] fix extraction
* [youtube] more descriptive geo-lock messages (with countries)
* [polskieradio] podcast support
* [onnetwork] refactored extraction
+ [tiktok] hashtag and music extractors
* [peertube] bittorrent formats
* [generic] detecting bittorrent manifest files
+ bittorrent magnet extractor
* [generic] extracting mpd manifests properly
* [youtube] better signature handling for DASH formats
* [youtube] some DASH formats are now just static files
+ [polskieradio] radiokierowcow.pl extractor
* [pulsevideo] unduplicating formats
+ [tvp:embed] extracting video subtitles
+ [bandaichannel] Add new extractor
* [urplay] fix episode data extraction
* [stretchinternet] Fix extraction
* [zdf] Rework extractors
+ [bbc] add support for BBC Reel videos
* [9c9media] fix extraction for videos with multiple ContentPackages
* [voxmedia] fix volume embed extraction
* [trovo] Add Origin header to VOD formats
* [cbs] add support for Paramount+
* [bilibili] fix video info extraction
* [pornhub] Extract formats from get_media end point
* [pornhub] Detect flagged videos
* [bandcamp] Extract release_timestamp
* [shahid] fix format extraction
* [fujitv] fix HLS formats extension
* [tver] improve title extraction
* [pinterest] reduce the number of HLS format requests
* [sportdeutschland] fix extraction
* [southpark] Fix extraction and add support for southparkstudios.com
* [rtve] improve extraction
* [applepodcasts] fix extraction
* [svtplay] Improve extraction
* [mlb] fix video extraction
* [vvvvid] fix kenc format extraction
* [vimeo:album] Fix extraction for albums with number of videos multiple to page size
* [peertube] improve thumbnail extraction
* [yandexmusic] Refactor and add support for artist's tracks and albums
* [yandexmusic:album] Improve album title extraction
* [yandexmusic] DRY _VALID_URL base
* [yandexmusic] Add support for music.yandex.com
* [yandexmusic:playlist] Request missing tracks in chunks
- [tvnplayer] removed extractor
* [youtube] meaningful error for age-gated no-embed videos
version 2021.03.01
Extractor
* [cda] logging in with a user account
* [peertube] playlist, channel and account extractor
version 2021.02.27
Core
+ Use proxy sites option
Extractor
+ Nitter extractor
version 2021.02.26
A lot of changes merged back from youtube-dl, thanks to the Copykitku project
Core
+ [postprocessor/embedthumbnail] Recognize atomicparsley binary in lowercase
* Introduce --output-na-placeholder (https://github.com/ytdl-org/youtube-dl/issues/27896)
* Protect from infinite recursion due to recursively nested
playlists (https://github.com/ytdl-org/youtube-dl/issues/27833)
* Ignore failure to create existing directory (https://github.com/ytdl-org/youtube-dl/issues/27811)
* Raise syntax error for format selection expressions with multiple
+ operators (https://github.com/ytdl-org/youtube-dl/issues/27803)
* [downloader/hls] Disable decryption in tests (https://github.com/ytdl-org/youtube-dl/issues/27660)
+ [utils] Add a function to clean podcast URLs
* [utils] Accept only supported protocols in url_or_none
* Allow format filtering using audio language (https://github.com/ytdl-org/youtube-dl/issues/16209)
* [common] Remove unwanted query params from unsigned akamai manifest URLs
* [extractor/common] Improve JSON-LD interaction statistic extraction (https://github.com/ytdl-org/youtube-dl/issues/23306)
* [downloader/hls] Delegate manifests with media initialization to ffmpeg
+ [extractor/common] Document duration meta field for playlists
* Improve thumbnail filename deducing (https://github.com/ytdl-org/youtube-dl/issues/26010, https://github.com/ytdl-org/youtube-dl/issues/27244)
* [extractor/common] Fix inline HTML5 media tags processing (https://github.com/ytdl-org/youtube-dl/issues/27345)
* [extractor/common] Extract timestamp from Last-Modified header
+ [extractor/common] Add support for dl8-* media tags (https://github.com/ytdl-org/youtube-dl/issues/27283)
* [extractor/common] Fix media type extraction for HTML5 media tags
in start/end form
* [extractor/common] Improve Akamai HTTP format extraction
* Allow m3u8 manifest without an additional audio format
* Fix extraction for qualities starting with a number
* Write static debug to stderr and respect quiet for dynamic debug
(https://github.com/ytdl-org/youtube-dl/issues/14579, https://github.com/ytdl-org/youtube-dl/issues/22593)
* [downloader/fragment] Set final file's mtime according to last fragment's
Last-Modified header (https://github.com/ytdl-org/youtube-dl/issues/11718, https://github.com/ytdl-org/youtube-dl/issues/18384, https://github.com/ytdl-org/youtube-dl/issues/27138)
+ [extractor/common] Add generic support for akamai HTTP format extraction
* [downloader/http] Fix crash during urlopen caused by missing reason
of URLError
* Fix --ignore-errors for playlists with generator-based entries
of url_transparent (https://github.com/ytdl-org/youtube-dl/issues/27064)
* [extractor/common] Output error for invalid URLs in _is_valid_url (https://github.com/ytdl-org/youtube-dl/issues/21400,
https://github.com/ytdl-org/youtube-dl/issues/24151, https://github.com/ytdl-org/youtube-dl/issues/25617, https://github.com/ytdl-org/youtube-dl/issues/25618, https://github.com/ytdl-org/youtube-dl/issues/25586, https://github.com/ytdl-org/youtube-dl/issues/26068, https://github.com/ytdl-org/youtube-dl/issues/27072)
+ [extractor/common] next.js data search function
- Removed Herobrine
Extractor
* [apa] Fix and improve extraction (https://github.com/ytdl-org/youtube-dl/issues/27750)
+ [youporn] Extract duration (https://github.com/ytdl-org/youtube-dl/issues/28019)
+ [samplefocus] Add support for samplefocus.com (https://github.com/ytdl-org/youtube-dl/issues/27763)
+ [vimeo] Add support for unlisted video source format extraction
* [viki] Improve extraction (https://github.com/ytdl-org/youtube-dl/issues/26522, https://github.com/ytdl-org/youtube-dl/issues/28203)
* Extract uploader URL and episode number
* Report login required error
+ Extract 480p formats
* Fix API v4 calls
* [ninegag] Unescape title (https://github.com/ytdl-org/youtube-dl/issues/28201)
+ [dplay] Add support for de.hgtv.com (https://github.com/ytdl-org/youtube-dl/issues/28182)
+ [dplay] Add support for discoveryplus.com (https://github.com/ytdl-org/youtube-dl/issues/24698)
+ [simplecast] Add support for simplecast.com (https://github.com/ytdl-org/youtube-dl/issues/24107)
* [yandexmusic:playlist] Request missing tracks in chunks (https://github.com/ytdl-org/youtube-dl/issues/27355, https://github.com/ytdl-org/youtube-dl/issues/28184)
+ [storyfire] Add support for storyfire.com (https://github.com/ytdl-org/youtube-dl/issues/25628, https://github.com/ytdl-org/youtube-dl/issues/26349)
+ [zhihu] Add support for zhihu.com (https://github.com/ytdl-org/youtube-dl/issues/28177)
* [ccma] Fix timestamp parsing in python 2
+ [videopress] Add support for video.wordpress.com
* [kakao] Improve info extraction and detect geo restriction (https://github.com/ytdl-org/youtube-dl/issues/26577)
* [xboxclips] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27151)
* [ard] Improve formats extraction (https://github.com/ytdl-org/youtube-dl/issues/28155)
+ [canvas] Add support for dagelijksekost.een.be (https://github.com/ytdl-org/youtube-dl/issues/28119)
* [ign] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/24771)
+ [xhamster] Extract format filesize
+ [xhamster] Extract formats from xplayer settings (https://github.com/ytdl-org/youtube-dl/issues/28114)
* [archiveorg] Fix and improve extraction (https://github.com/ytdl-org/youtube-dl/issues/21330, https://github.com/ytdl-org/youtube-dl/issues/23586, https://github.com/ytdl-org/youtube-dl/issues/25277, https://github.com/ytdl-org/youtube-dl/issues/26780,
https://github.com/ytdl-org/youtube-dl/issues/27109, https://github.com/ytdl-org/youtube-dl/issues/27236, https://github.com/ytdl-org/youtube-dl/issues/28063)
* [urplay] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/28073, https://github.com/ytdl-org/youtube-dl/issues/28074)
* [azmedien] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/28064)
* [pornhub] Implement lazy playlist extraction
* [svtplay] Fix video id extraction (https://github.com/ytdl-org/youtube-dl/issues/28058)
+ [pornhub] Add support for authentication (https://github.com/ytdl-org/youtube-dl/issues/18797, https://github.com/ytdl-org/youtube-dl/issues/21416, https://github.com/ytdl-org/youtube-dl/issues/24294)
* [pornhub:user] Improve paging
+ [pornhub:user] Add support for URLs unavailable via /videos page (https://github.com/ytdl-org/youtube-dl/issues/27853)
+ [bravotv] Add support for oxygen.com (https://github.com/ytdl-org/youtube-dl/issues/13357, https://github.com/ytdl-org/youtube-dl/issues/22500)
* [ccma] Improve metadata extraction (https://github.com/ytdl-org/youtube-dl/issues/27994)
+ Extract age limit, alt title, categories, series and episode number
* Fix timestamp multiple subtitles extraction
* [egghead] Update API domain (https://github.com/ytdl-org/youtube-dl/issues/28038)
- [vidzi] Remove extractor (https://github.com/ytdl-org/youtube-dl/issues/12629)
* [vidio] Improve metadata extraction
+ [vvvvid] Add support for youtube embeds (https://github.com/ytdl-org/youtube-dl/issues/27825)
* [vlive] Fix error message decoding for python 2 (https://github.com/ytdl-org/youtube-dl/issues/28004)
+ [awaan] Extract uploader id (https://github.com/ytdl-org/youtube-dl/issues/27963)
+ [medialaan] Add support DPG Media MyChannels based websites (https://github.com/ytdl-org/youtube-dl/issues/14871, https://github.com/ytdl-org/youtube-dl/issues/15597,
https://github.com/ytdl-org/youtube-dl/issues/16106, https://github.com/ytdl-org/youtube-dl/issues/16489)
* [abcnews] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/12394, https://github.com/ytdl-org/youtube-dl/issues/27920)
* [AMP] Fix upload date and timestamp extraction (https://github.com/ytdl-org/youtube-dl/issues/27970)
* [tv4] Relax URL regular expression (https://github.com/ytdl-org/youtube-dl/issues/27964)
+ [tv2] Add support for mtvuutiset.fi (https://github.com/ytdl-org/youtube-dl/issues/27744)
* [adn] Improve login warning reporting
* [zype] Fix uplynk id extraction (https://github.com/ytdl-org/youtube-dl/issues/27956)
+ [adn] Add support for authentication (https://github.com/ytdl-org/youtube-dl/issues/17091, https://github.com/ytdl-org/youtube-dl/issues/27841, https://github.com/ytdl-org/youtube-dl/issues/27937)
* [franceculture] Make thumbnail optional (https://github.com/ytdl-org/youtube-dl/issues/18807)
* [franceculture] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27891, https://github.com/ytdl-org/youtube-dl/issues/27903)
* [njpwworld] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27890)
* [comedycentral] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27905)
* [wat] Fix format extraction (https://github.com/ytdl-org/youtube-dl/issues/27901)
+ [americastestkitchen:season] Add support for seasons (https://github.com/ytdl-org/youtube-dl/issues/27861)
+ [trovo] Add support for trovo.live (https://github.com/ytdl-org/youtube-dl/issues/26125)
+ [aol] Add support for yahoo videos (https://github.com/ytdl-org/youtube-dl/issues/26650)
* [yahoo] Fix single video extraction
* [9gag] Fix and improve extraction (https://github.com/ytdl-org/youtube-dl/issues/23022)
* [americastestkitchen] Improve metadata extraction for ATK episodes (https://github.com/ytdl-org/youtube-dl/issues/27860)
* [aljazeera] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/20911, https://github.com/ytdl-org/youtube-dl/issues/27779)
+ [minds] Add support for minds.com (https://github.com/ytdl-org/youtube-dl/issues/17934)
* [ard] Fix title and description extraction (https://github.com/ytdl-org/youtube-dl/issues/27761)
+ [spotify] Add support for Spotify Podcasts (https://github.com/ytdl-org/youtube-dl/issues/27443)
+ [animeondemand] Add support for lazy playlist extraction (https://github.com/ytdl-org/youtube-dl/issues/27829)
* [youporn] Restrict fallback download URL (https://github.com/ytdl-org/youtube-dl/issues/27822)
* [youporn] Improve height and tbr extraction (https://github.com/ytdl-org/youtube-dl/issues/20425, https://github.com/ytdl-org/youtube-dl/issues/23659)
* [youporn] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27822)
+ [twitter] Add support for unified cards (https://github.com/ytdl-org/youtube-dl/issues/27826)
+ [twitch] Add Authorization header with OAuth token for GraphQL requests
(https://github.com/ytdl-org/youtube-dl/issues/27790)
* [mixcloud:playlist:base] Extract video id in flat playlist mode (https://github.com/ytdl-org/youtube-dl/issues/27787)
* [cspan] Improve info extraction (https://github.com/ytdl-org/youtube-dl/issues/27791)
* [adn] Improve info extraction
* [adn] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/26963, https://github.com/ytdl-org/youtube-dl/issues/27732)
* [twitch] Improve login error extraction
* [twitch] Fix authentication (https://github.com/ytdl-org/youtube-dl/issues/27743)
* [3qsdn] Improve extraction (https://github.com/ytdl-org/youtube-dl/issues/21058)
* [peertube] Extract formats from streamingPlaylists (https://github.com/ytdl-org/youtube-dl/issues/26002, https://github.com/ytdl-org/youtube-dl/issues/27586, https://github.com/ytdl-org/youtube-dl/issues/27728)
* [khanacademy] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/2887, https://github.com/ytdl-org/youtube-dl/issues/26803)
* [spike] Update Paramount Network feed URL (https://github.com/ytdl-org/youtube-dl/issues/27715)
* [rai] Improve subtitles extraction (https://github.com/ytdl-org/youtube-dl/issues/27698, https://github.com/ytdl-org/youtube-dl/issues/27705)
* [canvas] Match only supported VRT NU URLs (https://github.com/ytdl-org/youtube-dl/issues/27707)
+ [bibeltv] Add support for bibeltv.de (https://github.com/ytdl-org/youtube-dl/issues/14361)
+ [bfmtv] Add support for bfmtv.com (https://github.com/ytdl-org/youtube-dl/issues/16053, https://github.com/ytdl-org/youtube-dl/issues/26615)
+ [sbs] Add support for ondemand play and news embed URLs (https://github.com/ytdl-org/youtube-dl/issues/17650, https://github.com/ytdl-org/youtube-dl/issues/27629)
* [twitch] Drop legacy kraken API v5 code altogether and refactor
* [twitch:vod] Switch to GraphQL for video metadata
* [canvas] Fix VRT NU extraction (https://github.com/ytdl-org/youtube-dl/issues/26957, https://github.com/ytdl-org/youtube-dl/issues/27053)
* [twitch] Switch access token to GraphQL and refactor (https://github.com/ytdl-org/youtube-dl/issues/27646)
+ [rai] Detect ContentItem in iframe (https://github.com/ytdl-org/youtube-dl/issues/12652, https://github.com/ytdl-org/youtube-dl/issues/27673)
* [ketnet] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27662)
+ [dplay] Add support for Discovery+ domains (https://github.com/ytdl-org/youtube-dl/issues/27680)
* [motherless] Improve extraction (https://github.com/ytdl-org/youtube-dl/issues/26495, https://github.com/ytdl-org/youtube-dl/issues/27450)
* [motherless] Fix recent videos upload date extraction (https://github.com/ytdl-org/youtube-dl/issues/27661)
* [nrk] Fix extraction for videos without a legalAge rating
- [googleplus] Remove extractor (https://github.com/ytdl-org/youtube-dl/issues/4955, https://github.com/ytdl-org/youtube-dl/issues/7400)
+ [applepodcasts] Add support for podcasts.apple.com (https://github.com/ytdl-org/youtube-dl/issues/25918)
+ [googlepodcasts] Add support for podcasts.google.com
+ [iheart] Add support for iheart.com (https://github.com/ytdl-org/youtube-dl/issues/27037)
* [acast] Clean podcast URLs
* [stitcher] Clean podcast URLs
+ [xfileshare] Add support for aparat.cam (https://github.com/ytdl-org/youtube-dl/issues/27651)
+ [twitter] Add support for summary card (https://github.com/ytdl-org/youtube-dl/issues/25121)
* [twitter] Try to use a Generic fallback for unknown twitter cards (https://github.com/ytdl-org/youtube-dl/issues/25982)
+ [stitcher] Add support for shows and show metadata extraction (https://github.com/ytdl-org/youtube-dl/issues/20510)
* [stv] Improve episode id extraction (https://github.com/ytdl-org/youtube-dl/issues/23083)
* [nrk] Improve series metadata extraction (https://github.com/ytdl-org/youtube-dl/issues/27473)
+ [nrk] Extract subtitles
* [nrk] Fix age limit extraction
* [nrk] Improve video id extraction
+ [nrk] Add support for podcasts (https://github.com/ytdl-org/youtube-dl/issues/27634, https://github.com/ytdl-org/youtube-dl/issues/27635)
* [nrk] Generalize and delegate all item extractors to nrk
+ [nrk] Add support for mp3 formats
* [nrktv] Switch to playback endpoint
* [vvvvid] Fix season metadata extraction (https://github.com/ytdl-org/youtube-dl/issues/18130)
* [stitcher] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/20811, https://github.com/ytdl-org/youtube-dl/issues/27606)
* [acast] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/21444, https://github.com/ytdl-org/youtube-dl/issues/27612, https://github.com/ytdl-org/youtube-dl/issues/27613)
+ [arcpublishing] Add support for arcpublishing.com (https://github.com/ytdl-org/youtube-dl/issues/2298, https://github.com/ytdl-org/youtube-dl/issues/9340, https://github.com/ytdl-org/youtube-dl/issues/17200)
+ [sky] Add support for Sports News articles and Brighcove videos (https://github.com/ytdl-org/youtube-dl/issues/13054)
+ [vvvvid] Extract akamai formats
* [vvvvid] Skip unplayable episodes (https://github.com/ytdl-org/youtube-dl/issues/27599)
* [yandexvideo] Fix extraction for Python 3.4
+ [redditr] Extract all thumbnails (https://github.com/ytdl-org/youtube-dl/issues/27503)
* [vvvvid] Improve info extraction
+ [vvvvid] Add support for playlists (https://github.com/ytdl-org/youtube-dl/issues/18130, https://github.com/ytdl-org/youtube-dl/issues/27574)
* [yandexvideo] Use old API call as fallback
* [yandexvideo] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/25000)
- [nbc] Remove CSNNE extractor
* [nbc] Fix NBCSport VPlayer URL extraction (https://github.com/ytdl-org/youtube-dl/issues/16640)
+ [aenetworks] Add support for biography.com (https://github.com/ytdl-org/youtube-dl/issues/3863)
* [uktvplay] Match new video URLs (https://github.com/ytdl-org/youtube-dl/issues/17909)
* [sevenplay] Detect API errors
* [tenplay] Fix format extraction (https://github.com/ytdl-org/youtube-dl/issues/26653)
* [brightcove] Raise error for DRM protected videos (https://github.com/ytdl-org/youtube-dl/issues/23467, https://github.com/ytdl-org/youtube-dl/issues/27568)
* [aparat] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/22285, https://github.com/ytdl-org/youtube-dl/issues/22611, https://github.com/ytdl-org/youtube-dl/issues/23348, https://github.com/ytdl-org/youtube-dl/issues/24354, https://github.com/ytdl-org/youtube-dl/issues/24591, https://github.com/ytdl-org/youtube-dl/issues/24904,
https://github.com/ytdl-org/youtube-dl/issues/25418, https://github.com/ytdl-org/youtube-dl/issues/26070, https://github.com/ytdl-org/youtube-dl/issues/26350, https://github.com/ytdl-org/youtube-dl/issues/26738, https://github.com/ytdl-org/youtube-dl/issues/27563)
- [brightcove] Remove sonyliv specific code
* [piksel] Improve format extraction
+ [zype] Add support for uplynk videos
+ [toggle] Add support for live.mewatch.sg (https://github.com/ytdl-org/youtube-dl/issues/27555)
+ [go] Add support for fxnow.fxnetworks.com (https://github.com/ytdl-org/youtube-dl/issues/13972, https://github.com/ytdl-org/youtube-dl/issues/22467, https://github.com/ytdl-org/youtube-dl/issues/23754, https://github.com/ytdl-org/youtube-dl/issues/26826)
* [teachable] Improve embed detection (https://github.com/ytdl-org/youtube-dl/issues/26923)
* [mitele] Fix free video extraction (https://github.com/ytdl-org/youtube-dl/issues/24624, https://github.com/ytdl-org/youtube-dl/issues/25827, https://github.com/ytdl-org/youtube-dl/issues/26757)
* [telecinco] Fix extraction
* [youtube] Update invidious.snopyta.org (https://github.com/ytdl-org/youtube-dl/issues/22667)
* [amcnetworks] Improve auth only video detection (https://github.com/ytdl-org/youtube-dl/issues/27548)
+ [generic] Add support for VHX Embeds (https://github.com/ytdl-org/youtube-dl/issues/27546)
* [instagram] Fix comment count extraction
+ [instagram] Add support for reel URLs (https://github.com/ytdl-org/youtube-dl/issues/26234, https://github.com/ytdl-org/youtube-dl/issues/26250)
* [bbc] Switch to media selector v6 (https://github.com/ytdl-org/youtube-dl/issues/23232, https://github.com/ytdl-org/youtube-dl/issues/23933, https://github.com/ytdl-org/youtube-dl/issues/26303, https://github.com/ytdl-org/youtube-dl/issues/26432, https://github.com/ytdl-org/youtube-dl/issues/26821,
https://github.com/ytdl-org/youtube-dl/issues/27538)
* [instagram] Improve thumbnail extraction
* [instagram] Fix extraction when authenticated (https://github.com/ytdl-org/youtube-dl/issues/22880, https://github.com/ytdl-org/youtube-dl/issues/26377, https://github.com/ytdl-org/youtube-dl/issues/26981,
https://github.com/ytdl-org/youtube-dl/issues/27422)
* [spankbang:playlist] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/24087)
+ [spankbang] Add support for playlist videos
* [pornhub] Improve like and dislike count extraction (https://github.com/ytdl-org/youtube-dl/issues/27356)
* [pornhub] Fix lq formats extraction (https://github.com/ytdl-org/youtube-dl/issues/27386, https://github.com/ytdl-org/youtube-dl/issues/27393)
+ [bongacams] Add support for bongacams.com (https://github.com/ytdl-org/youtube-dl/issues/27440)
* [theweatherchannel] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/25930, https://github.com/ytdl-org/youtube-dl/issues/26051)
+ [sprout] Add support for Universal Kids (https://github.com/ytdl-org/youtube-dl/issues/22518)
* [theplatform] Allow passing geo bypass countries from other extractors
+ [wistia] Add support for playlists (https://github.com/ytdl-org/youtube-dl/issues/27533)
+ [ctv] Add support for ctv.ca (https://github.com/ytdl-org/youtube-dl/issues/27525)
* [9c9media] Improve info extraction
* [sonyliv] Fix title for movies
* [sonyliv] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/25667)
* [streetvoice] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27455, https://github.com/ytdl-org/youtube-dl/issues/27492)
+ [facebook] Add support for watchparty pages (https://github.com/ytdl-org/youtube-dl/issues/27507)
* [cbslocal] Fix video extraction
+ [brightcove] Add another method to extract policyKey
* [mewatch] Relax URL regular expression (https://github.com/ytdl-org/youtube-dl/issues/27506)
- [tastytrade] Remove extractor (https://github.com/ytdl-org/youtube-dl/issues/25716)
* [niconico] Fix playlist extraction (https://github.com/ytdl-org/youtube-dl/issues/27428)
- [everyonesmixtape] Remove extractor
- [kanalplay] Remove extractor
* [arkena] Fix extraction
* [nba] Rewrite extractor
* [turner] Improve info extraction
* [generic] Improve RSS age limit extraction
* [generic] Fix RSS itunes thumbnail extraction (https://github.com/ytdl-org/youtube-dl/issues/27405)
+ [redditr] Extract duration (https://github.com/ytdl-org/youtube-dl/issues/27426)
- [zaq1] Remove extractor
+ [asiancrush] Add support for retrocrush.tv
* [asiancrush] Fix extraction
- [noco] Remove extractor (https://github.com/ytdl-org/youtube-dl/issues/10864)
* [nfl] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/22245)
* [skysports] Relax URL regular expression (https://github.com/ytdl-org/youtube-dl/issues/27435)
+ [tv5unis] Add support for tv5unis.ca (https://github.com/ytdl-org/youtube-dl/issues/22399, https://github.com/ytdl-org/youtube-dl/issues/24890)
+ [videomore] Add support for more.tv (https://github.com/ytdl-org/youtube-dl/issues/27088)
+ [nhk:program] Add support for audio programs and program clips
+ [nhk] Add support for NHK video programs (https://github.com/ytdl-org/youtube-dl/issues/27230)
* [mdr] Bypass geo restriction
* [mdr] Improve extraction (https://github.com/ytdl-org/youtube-dl/issues/24346, https://github.com/ytdl-org/youtube-dl/issues/26873)
* [eporner] Fix view count extraction and make optional (https://github.com/ytdl-org/youtube-dl/issues/23306)
+ [eporner] Extend URL regular expression
* [eporner] Fix hash extraction and extend _VALID_URL (https://github.com/ytdl-org/youtube-dl/issues/27396)
* [slideslive] Use m3u8 entry protocol for m3u8 formats (https://github.com/ytdl-org/youtube-dl/issues/27400)
* [twitcasting] Fix format extraction and improve info extraction (https://github.com/ytdl-org/youtube-dl/issues/24868)
* [linuxacademy] Fix authentication and extraction (https://github.com/ytdl-org/youtube-dl/issues/21129, https://github.com/ytdl-org/youtube-dl/issues/26223, https://github.com/ytdl-org/youtube-dl/issues/27402)
* [itv] Clean description from HTML tags (https://github.com/ytdl-org/youtube-dl/issues/27399)
* [vlive] Sort live formats (https://github.com/ytdl-org/youtube-dl/issues/27404)
* [hotstar] Fix and improve extraction
* Fix format extraction (https://github.com/ytdl-org/youtube-dl/issues/26690)
+ Extract thumbnail URL (https://github.com/ytdl-org/youtube-dl/issues/16079, https://github.com/ytdl-org/youtube-dl/issues/20412)
+ Add support for country specific playlist URLs (https://github.com/ytdl-org/youtube-dl/issues/23496)
* Select the last id in video URL (https://github.com/ytdl-org/youtube-dl/issues/26412)
+ [youtube] Add some invidious instances (https://github.com/ytdl-org/youtube-dl/issues/27373)
+ [ruutu] Extract more metadata
+ [ruutu] Detect non-free videos (https://github.com/ytdl-org/youtube-dl/issues/21154)
* [ruutu] Authenticate format URLs (https://github.com/ytdl-org/youtube-dl/issues/21031, https://github.com/ytdl-org/youtube-dl/issues/26782)
+ [ruutu] Add support for static.nelonenmedia.fi (https://github.com/ytdl-org/youtube-dl/issues/25412)
+ [ruutu] Extend URL regular expression (https://github.com/ytdl-org/youtube-dl/issues/24839)
+ [facebook] Add support for archived live video URLs (https://github.com/ytdl-org/youtube-dl/issues/15859)
* [wdr] Improve overall extraction
+ [wdr] Extend subtitles extraction (https://github.com/ytdl-org/youtube-dl/issues/22672, https://github.com/ytdl-org/youtube-dl/issues/22723)
+ [facebook] Add support for videos attached to Relay based story pages
(https://github.com/ytdl-org/youtube-dl/issues/10795)
+ [wdr:page] Add support for kinder.wdr.de (https://github.com/ytdl-org/youtube-dl/issues/27350)
+ [facebook] Add another regular expression for handleServerJS
* [facebook] Fix embed page extraction
+ [facebook] Add support for Relay post pages (https://github.com/ytdl-org/youtube-dl/issues/26935)
+ [facebook] Add support for watch videos (https://github.com/ytdl-org/youtube-dl/issues/22795, https://github.com/ytdl-org/youtube-dl/issues/27062)
+ [facebook] Add support for group posts with multiple videos (https://github.com/ytdl-org/youtube-dl/issues/19131)
* [itv] Fix series metadata extraction (https://github.com/ytdl-org/youtube-dl/issues/26897)
- [itv] Remove old extraction method (https://github.com/ytdl-org/youtube-dl/issues/23177)
* [facebook] Redirect mobile URLs to desktop URLs (https://github.com/ytdl-org/youtube-dl/issues/24831, https://github.com/ytdl-org/youtube-dl/issues/25624)
+ [facebook] Add support for Relay based pages (https://github.com/ytdl-org/youtube-dl/issues/26823)
* [facebook] Try to reduce unnecessary tahoe requests
- [facebook] Remove hardcoded Chrome User-Agent (https://github.com/ytdl-org/youtube-dl/issues/18974, https://github.com/ytdl-org/youtube-dl/issues/25411, https://github.com/ytdl-org/youtube-dl/issues/26958,
https://github.com/ytdl-org/youtube-dl/issues/27329)
- [smotri] Remove extractor (https://github.com/ytdl-org/youtube-dl/issues/27358)
- [beampro] Remove extractor (https://github.com/ytdl-org/youtube-dl/issues/17290, https://github.com/ytdl-org/youtube-dl/issues/22871, https://github.com/ytdl-org/youtube-dl/issues/23020, https://github.com/ytdl-org/youtube-dl/issues/23061, https://github.com/ytdl-org/youtube-dl/issues/26099)
+ [tubitv] Extract release year (https://github.com/ytdl-org/youtube-dl/issues/27317)
* [amcnetworks] Fix free content extraction (https://github.com/ytdl-org/youtube-dl/issues/20354)
+ [telequebec] Add support for video.telequebec.tv (https://github.com/ytdl-org/youtube-dl/issues/27339)
* [telequebec] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/25733, https://github.com/ytdl-org/youtube-dl/issues/26883)
* [tvplay:home] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/21153)
* [americastestkitchen] Fix extraction and add support
for Cook's Country and Cook's Illustrated (https://github.com/ytdl-org/youtube-dl/issues/17234, https://github.com/ytdl-org/youtube-dl/issues/27322)
+ [slideslive] Add support for yoda service videos and extract subtitles
(https://github.com/ytdl-org/youtube-dl/issues/27323)
* [aenetworks] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/23363, https://github.com/ytdl-org/youtube-dl/issues/23390, https://github.com/ytdl-org/youtube-dl/issues/26795, https://github.com/ytdl-org/youtube-dl/issues/26985)
* Fix Fastly format extraction
+ Add support for play and watch subdomains
+ Extract series metadata
+ [generic] Extract RSS video description, timestamp and itunes metadata
(https://github.com/ytdl-org/youtube-dl/issues/27177)
* [nrk] Reduce the number of instalments and episodes requests
* [nrk] Improve extraction
* Improve format extraction for old akamai formats
+ Add is_live value to entry info dict
* Request instalments only when available
* Fix skole extraction
+ [peertube] Extract fps
+ [peertube] Recognize audio-only formats (https://github.com/ytdl-org/youtube-dl/issues/27295)
* [teachable:course] Improve extraction (https://github.com/ytdl-org/youtube-dl/issues/24507, https://github.com/ytdl-org/youtube-dl/issues/27286)
* [nrk] Improve error extraction
* [nrktv:series] Improve extraction (https://github.com/ytdl-org/youtube-dl/issues/21926)
* [nrktv:season] Improve extraction
* [nrk] Improve format extraction and geo-restriction detection (https://github.com/ytdl-org/youtube-dl/issues/24221)
* [pornhub] Handle HTTP errors gracefully (https://github.com/ytdl-org/youtube-dl/issues/26414)
* [nrktv] Relax URL regular expression (https://github.com/ytdl-org/youtube-dl/issues/27299, https://github.com/ytdl-org/youtube-dl/issues/26185)
+ [zdf] Extract webm formats (https://github.com/ytdl-org/youtube-dl/issues/26659)
+ [gamespot] Extract DASH and HTTP formats
+ [tver] Add support for tver.jp (https://github.com/ytdl-org/youtube-dl/issues/26662, https://github.com/ytdl-org/youtube-dl/issues/27284)
+ [pornhub] Add support for pornhub.org (https://github.com/ytdl-org/youtube-dl/issues/27276)
+ [tva] Add support for qub.ca (https://github.com/ytdl-org/youtube-dl/issues/27235)
+ [toggle] Detect DRM protected videos (https://github.com/ytdl-org/youtube-dl/issues/16479, https://github.com/ytdl-org/youtube-dl/issues/20805)
+ [toggle] Add support for new MeWatch URLs (https://github.com/ytdl-org/youtube-dl/issues/27256)
+ [cspan] Extract info from jwplayer data (https://github.com/ytdl-org/youtube-dl/issues/3672, https://github.com/ytdl-org/youtube-dl/issues/3734, https://github.com/ytdl-org/youtube-dl/issues/10638, https://github.com/ytdl-org/youtube-dl/issues/13030,
https://github.com/ytdl-org/youtube-dl/issues/18806, https://github.com/ytdl-org/youtube-dl/issues/23148, https://github.com/ytdl-org/youtube-dl/issues/24461, https://github.com/ytdl-org/youtube-dl/issues/26171, https://github.com/ytdl-org/youtube-dl/issues/26800, https://github.com/ytdl-org/youtube-dl/issues/27263)
* [cspan] Pass Referer header with format's video URL (https://github.com/ytdl-org/youtube-dl/issues/26032, https://github.com/ytdl-org/youtube-dl/issues/25729)
+ [mediaset] Add support for movie URLs (https://github.com/ytdl-org/youtube-dl/issues/27240)
* [drtv] Extend URL regular expression (https://github.com/ytdl-org/youtube-dl/issues/27243)
+ [ina] Add support for mobile URLs (https://github.com/ytdl-org/youtube-dl/issues/27229)
* [pornhub] Fix like and dislike count extraction (https://github.com/ytdl-org/youtube-dl/issues/27227, https://github.com/ytdl-org/youtube-dl/issues/27234)
* [youtube] Improve yt initial player response extraction (https://github.com/ytdl-org/youtube-dl/issues/27216)
* [videa] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/25650, https://github.com/ytdl-org/youtube-dl/issues/25973, https://github.com/ytdl-org/youtube-dl/issues/26301)
+ [spreaker] Add support for spreaker.com (https://github.com/ytdl-org/youtube-dl/issues/13480, https://github.com/ytdl-org/youtube-dl/issues/13877)
* [vlive] Improve extraction for geo-restricted videos
+ [vlive] Add support for post URLs (https://github.com/ytdl-org/youtube-dl/issues/27122, https://github.com/ytdl-org/youtube-dl/issues/27123)
* [viki] Fix video API request (https://github.com/ytdl-org/youtube-dl/issues/27184)
* [bbc] Fix BBC Three clip extraction
* [bbc] Fix BBC News videos extraction
+ [medaltv] Add support for medal.tv (https://github.com/ytdl-org/youtube-dl/issues/27149)
* [nrk] Fix extraction
+ [pinterest] Add support for large collections (more than 25 pins)
+ [franceinter] Extract thumbnail (https://github.com/ytdl-org/youtube-dl/issues/27153)
+ [box] Add support for box.com (https://github.com/ytdl-org/youtube-dl/issues/5949)
+ [nytimes] Add support for cooking.nytimes.com (https://github.com/ytdl-org/youtube-dl/issues/27112, https://github.com/ytdl-org/youtube-dl/issues/27143)
+ [rumble] Add support for embed pages (https://github.com/ytdl-org/youtube-dl/issues/10785)
+ [skyit] Add support for multiple Sky Italia websites (https://github.com/ytdl-org/youtube-dl/issues/26629)
+ [pinterest] Add support for pinterest.com (https://github.com/ytdl-org/youtube-dl/issues/25747)
+ [svtplay] Add support for svt.se/barnkanalen (https://github.com/ytdl-org/youtube-dl/issues/24817)
+ [svt] Extract timestamp (https://github.com/ytdl-org/youtube-dl/issues/27130)
* [svtplay] Improve thumbnail extraction (https://github.com/ytdl-org/youtube-dl/issues/27130)
* [infoq] Fix format extraction (https://github.com/ytdl-org/youtube-dl/issues/25984)
* [francetv] Update to fix thumbnail URL issue (https://github.com/ytdl-org/youtube-dl/issues/27120)
+ [discoverynetworks] Add support for new TLC/DMAX URLs (https://github.com/ytdl-org/youtube-dl/issues/27100)
* [rai] Fix protocol relative relinker URLs (https://github.com/ytdl-org/youtube-dl/issues/22766)
* [rai] Fix unavailable video format detection
* [rai] Improve extraction
* [rai] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27077)
* [viki] Improve format extraction
* [viki] Fix stream extraction from MPD (https://github.com/ytdl-org/youtube-dl/issues/27092)
+ [amara] Add support for amara.org (https://github.com/ytdl-org/youtube-dl/issues/20618)
* [vimeo:album] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27079)
* [mtv] Fix mgid extraction (https://github.com/ytdl-org/youtube-dl/issues/26841)
* [youporn] Fix upload date extraction
* [youporn] Make comment count optional (https://github.com/ytdl-org/youtube-dl/issues/26986)
* [arte] Rework extractors
* Reimplement embed and playlist extractors to delegate to the single
entrypoint artetv extractor
* Improve embeds detection (https://github.com/ytdl-org/youtube-dl/issues/27057)
+ [arte] Extract m3u8 formats (https://github.com/ytdl-org/youtube-dl/issues/27061)
* [mgtv] Fix format extraction (https://github.com/ytdl-org/youtube-dl/issues/26415)
* [francetv] Improve info extraction
+ [francetv] Add fallback video URL extraction (https://github.com/ytdl-org/youtube-dl/issues/27047)
* [spiegel] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/24206, https://github.com/ytdl-org/youtube-dl/issues/24767)
* [malltv] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27035)
+ [bandcamp] Extract playlist description (https://github.com/ytdl-org/youtube-dl/issues/22684)
* [urplay] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/26828)
* [lrt] Fix extraction with empty tags (https://github.com/ytdl-org/youtube-dl/issues/20264)
+ [ndr:embed:base] Extract subtitles (https://github.com/ytdl-org/youtube-dl/issues/25447, https://github.com/ytdl-org/youtube-dl/issues/26106)
+ [servus] Add support for pm-wissen.com (https://github.com/ytdl-org/youtube-dl/issues/25869)
* [servus] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/26872, https://github.com/ytdl-org/youtube-dl/issues/26967, https://github.com/ytdl-org/youtube-dl/issues/26983, https://github.com/ytdl-org/youtube-dl/issues/27000)
* [xtube] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/26996)
* [lrt] Fix extraction
+ [condenast] Extract subtitles
* [condenast] Fix extraction
* [bandcamp] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/26681, https://github.com/ytdl-org/youtube-dl/issues/26684)
* [rai] Fix RaiPlay extraction (https://github.com/ytdl-org/youtube-dl/issues/26064, https://github.com/ytdl-org/youtube-dl/issues/26096)
* [vlive] Fix extraction
* [usanetwork] Fix extraction
* [nbc] Fix NBCNews/Today/MSNBC extraction
* [cnbc] Fix extraction
+ [transistorfm] Add new extractor
* [x-link] Improve embed searching
+ [PolskaPress] Add new extractor
* [tvn24] Extract from the next.js frontend without Playwright
version 2021.02.23
Extractor
* [youtube] Handle new signature crypto

181
LICENSE
View file

@ -1,165 +1,24 @@
GNU LESSER GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
This is free and unencumbered software released into the public domain.
Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.
This version of the GNU Lesser General Public License incorporates
the terms and conditions of version 3 of the GNU General Public
License, supplemented by the additional permissions listed below.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
0. Additional Definitions.
As used herein, "this License" refers to version 3 of the GNU Lesser
General Public License, and the "GNU GPL" refers to version 3 of the GNU
General Public License.
"The Library" refers to a covered work governed by this License,
other than an Application or a Combined Work as defined below.
An "Application" is any work that makes use of an interface provided
by the Library, but which is not otherwise based on the Library.
Defining a subclass of a class defined by the Library is deemed a mode
of using an interface provided by the Library.
A "Combined Work" is a work produced by combining or linking an
Application with the Library. The particular version of the Library
with which the Combined Work was made is also called the "Linked
Version".
The "Minimal Corresponding Source" for a Combined Work means the
Corresponding Source for the Combined Work, excluding any source code
for portions of the Combined Work that, considered in isolation, are
based on the Application, and not on the Linked Version.
The "Corresponding Application Code" for a Combined Work means the
object code and/or source code for the Application, including any data
and utility programs needed for reproducing the Combined Work from the
Application, but excluding the System Libraries of the Combined Work.
1. Exception to Section 3 of the GNU GPL.
You may convey a covered work under sections 3 and 4 of this License
without being bound by section 3 of the GNU GPL.
2. Conveying Modified Versions.
If you modify a copy of the Library, and, in your modifications, a
facility refers to a function or data to be supplied by an Application
that uses the facility (other than as an argument passed when the
facility is invoked), then you may convey a copy of the modified
version:
a) under this License, provided that you make a good faith effort to
ensure that, in the event an Application does not supply the
function or data, the facility still operates, and performs
whatever part of its purpose remains meaningful, or
b) under the GNU GPL, with none of the additional permissions of
this License applicable to that copy.
3. Object Code Incorporating Material from Library Header Files.
The object code form of an Application may incorporate material from
a header file that is part of the Library. You may convey such object
code under terms of your choice, provided that, if the incorporated
material is not limited to numerical parameters, data structure
layouts and accessors, or small macros, inline functions and templates
(ten or fewer lines in length), you do both of the following:
a) Give prominent notice with each copy of the object code that the
Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the object code with a copy of the GNU GPL and this license
document.
4. Combined Works.
You may convey a Combined Work under terms of your choice that,
taken together, effectively do not restrict modification of the
portions of the Library contained in the Combined Work and reverse
engineering for debugging such modifications, if you also do each of
the following:
a) Give prominent notice with each copy of the Combined Work that
the Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the Combined Work with a copy of the GNU GPL and this license
document.
c) For a Combined Work that displays copyright notices during
execution, include the copyright notice for the Library among
these notices, as well as a reference directing the user to the
copies of the GNU GPL and this license document.
d) Do one of the following:
0) Convey the Minimal Corresponding Source under the terms of this
License, and the Corresponding Application Code in a form
suitable for, and under terms that permit, the user to
recombine or relink the Application with a modified version of
the Linked Version to produce a modified Combined Work, in the
manner specified by section 6 of the GNU GPL for conveying
Corresponding Source.
1) Use a suitable shared library mechanism for linking with the
Library. A suitable mechanism is one that (a) uses at run time
a copy of the Library already present on the user's computer
system, and (b) will operate properly with a modified version
of the Library that is interface-compatible with the Linked
Version.
e) Provide Installation Information, but only if you would otherwise
be required to provide such information under section 6 of the
GNU GPL, and only to the extent that such information is
necessary to install and execute a modified version of the
Combined Work produced by recombining or relinking the
Application with a modified version of the Linked Version. (If
you use option 4d0, the Installation Information must accompany
the Minimal Corresponding Source and Corresponding Application
Code. If you use option 4d1, you must provide the Installation
Information in the manner specified by section 6 of the GNU GPL
for conveying Corresponding Source.)
5. Combined Libraries.
You may place library facilities that are a work based on the
Library side by side in a single library together with other library
facilities that are not Applications and are not covered by this
License, and convey such a combined library under terms of your
choice, if you do both of the following:
a) Accompany the combined library with a copy of the same work based
on the Library, uncombined with any other library facilities,
conveyed under the terms of this License.
b) Give prominent notice with the combined library that part of it
is a work based on the Library, and explaining where to find the
accompanying uncombined form of the same work.
6. Revised Versions of the GNU Lesser General Public License.
The Free Software Foundation may publish revised and/or new versions
of the GNU Lesser General Public License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the
Library as you received it specifies that a certain numbered version
of the GNU Lesser General Public License "or any later version"
applies to it, you have the option of following the terms and
conditions either of that published version or of any later version
published by the Free Software Foundation. If the Library as you
received it does not specify a version number of the GNU Lesser
General Public License, you may choose any version of the GNU Lesser
General Public License ever published by the Free Software Foundation.
If the Library as you received it specifies that a proxy can decide
whether future versions of the GNU Lesser General Public License shall
apply, that proxy's public statement of acceptance of any version is
permanent authorization for you to choose that version for the
Library.
For more information, please refer to <http://unlicense.org/>

Makefile
View file

@ -9,7 +9,7 @@ PREFIX ?= /usr/local
BINDIR ?= $(PREFIX)/bin
MANDIR ?= $(PREFIX)/man
SHAREDIR ?= $(PREFIX)/share
PYTHON ?= /usr/bin/env python
PYTHON ?= /usr/bin/env python3
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)

README.md
View file

@ -2,10 +2,11 @@
[![build status](https://img.shields.io/gitlab/pipeline/laudom/haruhi-dl/master?gitlab_url=https%3A%2F%2Fgit.sakamoto.pl&style=flat-square)](https://git.sakamoto.pl/laudom/haruhi-dl/-/pipelines)
[![PyPI Downloads](https://img.shields.io/pypi/dm/haruhi-dl?style=flat-square)](https://pypi.org/project/haruhi-dl/)
[![License: LGPL 3.0 or later](https://img.shields.io/pypi/l/haruhi-dl?style=flat-square)](https://git.sakamoto.pl/laudom/haruhi-dl/-/blob/master/README.md)
[![Sasin stole 70 million PLN](https://img.shields.io/badge/Sasin-stole%2070%20million%20PLN-orange?style=flat-square)](https://www.planeta.pl/Wiadomosci/Polityka/Ile-kosztowaly-karty-wyborcze-Sasin-do-wiezienia-Wybory-odwolane)
[![Trans rights!](https://img.shields.io/badge/Trans-rights!-5BCEFA?style=flat-square)](http://transfuzja.org/en/artykuly/trans_people_in_poland/situation.htm)
# This project has ended. Our forces have moved into contributing to [yt-dlp](https://github.com/yt-dlp/yt-dlp).
This is a fork of [youtube-dl](https://yt-dl.org/), focused on bringing a fast, steady stream of updates. We'll do our best to merge patches for any site, not only YouTube.
Our main repository is on our GitLab: https://git.sakamoto.pl/laudompat/haruhi-dl
@ -14,30 +15,79 @@ A Microsoft GitHub mirror exists as well: https://github.com/haruhi-dl/haruhi-dl
## Installing
haruhi-dl is available on PyPI: [![version on PyPI](https://img.shields.io/pypi/v/haruhi-dl?style=flat-square)](https://pypi.org/project/haruhi-dl/)
System-specific ways:
- [Windows .exe files](https://git.sakamoto.pl/laudompat/haruhi-dl/-/releases) ([mirror](https://github.com/haruhi-dl/haruhi-dl/releases)) - just unpack and run the exe file in cmd/powershell! (ffmpeg/rtmpdump not included, playwright extractors won't work)
- [Arch Linux (AUR)](https://aur.archlinux.org/packages/haruhi-dl/) - `yay -S haruhi-dl` (managed by mlunax)
- [macOS (homebrew)](https://formulae.brew.sh/formula/haruhi-dl) - `brew install haruhi-dl` (managed by Homebrew)
haruhi-dl is also available on PyPI: [![version on PyPI](https://img.shields.io/pypi/v/haruhi-dl?style=flat-square)](https://pypi.org/project/haruhi-dl/)
Install release from PyPI on Python 3.x:
```sh
$ python3 -m pip install --upgrade haruhi-dl
```
Install from master (unstable) on Python 3.x:
```sh
$ python3 -m pip install --upgrade git+https://git.sakamoto.pl/laudompat/haruhi-dl.git
```
**Python 2 support is dropped and we recommend to switch to Python 3**, though it may still work.
**Python 2 support is dropped, use Python 3.**
## Usage
```sh
$ haruhi-dl "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```
That's it! You just got rickrolled!
Full manual with all options:
```sh
$ haruhi-dl --help
```
## Differences from youtube-dl
_This is not a complete list._
- Extracting and downloading videos with subtitles from m3u8 (HLS) - this also includes subtitles from Twitter and some other services
- Support for BitTorrent protocol (only used when explicitly enabled by user with `--allow-p2p` or `--prefer-p2p`; aria2c required)
- Dedicated handling for self-hosted services (not tied to specific providers/domains), like PeerTube, Funkwhale, Mastodon
- Dedicated handling for content proxy sites (like Nitter for Twitter)
- Merging formats by codecs instead of file extensions, if possible (you'd rather have your AV1+opus downloads from YouTube be .webm than .mkv, wouldn't you?) - see the sketch after this list
- New/improved/fixed extractors:
- PeerTube (extracting playlists, channels and user accounts, optionally downloading with BitTorrent)
- Funkwhale
- TikTok (extractors for user profiles, hashtags and music; everything except single videos, and music with `--no-playlist`, requires Playwright)
- cda.pl
- Ipla
- Weibo (DASH formats)
- LinkedIn (videos from user posts)
- Acast
- Mastodon (including Pleroma, Gab Social, Soapbox)
- Ring Publishing (aka PulsEmbed, PulseVideo, OnetMVP; Ringier Axel Springer)
- TVP (support for TVPlayer2, client-rendered sites and TVP ABC, refactored some extractors to use mobile JSON API)
- TVN24 (support for main page, Fakty and magazine frontend)
- PolskieRadio
- Agora (wyborcza.pl video, wyborcza.pl/wysokieobcasy.pl/audycje.tokfm.pl podcasts, tuba.fm)
- sejm.gov.pl/senat.gov.pl
- Some improvements with handling JSON-LD
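Below is a minimal sketch of driving the fork from Python instead of the CLI, as referenced in the format-merging item above. It only uses options documented in `HaruhiDL.py` (`format`, `outtmpl`); the URL is the same example as in the Usage section, and nothing here is fork-specific beyond the module name.

```python
# A minimal sketch, assuming haruhi-dl is installed from PyPI.
# 'format' and 'outtmpl' are documented HaruhiDL options.
from haruhi_dl import HaruhiDL

hdl = HaruhiDL(params={
    'format': 'bestvideo+bestaudio/best',   # merged by codecs where possible
    'outtmpl': '%(title)s-%(id)s.%(ext)s',
})
# extractors are registered automatically (auto_init defaults to True)

# download=False extracts metadata only; pass True to actually download
info = hdl.extract_info('https://www.youtube.com/watch?v=dQw4w9WgXcQ',
                        download=False)
print(info['title'])
```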
## Bug reports
Please send the bug details to <bug@haruhi.download> or on [Microsoft GitHub](https://github.com/haruhi-dl/haruhi-dl/issues).
## Contributing
If you want to contribute, send us a diff to <contribute@haruhi.download>, or submit a Pull Request on [our mirror at Microsoft GitHub](https://github.com/haruhi-dl/haruhi-dl).
The project has ended. As an alternative, use [yt-dlp](https://github.com/yt-dlp/yt-dlp) - we're going to contribute there from now on :3
## Donations
If my contributions helped you, please consider sending me a small tip.
[![Buy Me a Coffee at ko-fi.com](https://cdn.ko-fi.com/cdn/kofi1.png?v=2)](https://ko-fi.com/selfisekai)

View file

@ -1,6 +1,6 @@
#!/bin/bash
data="$(curl -s "https://www.youtube.com/s/player/$1/player_ias.vflset/en_GB/base.js")"
func="$(grep -P '[a-z]\=a\.split.*a\.join' <<< "$data")"
func="$(grep -P '[a-z]\=a\.split\([\"'"'"']{2}.*a\.join' <<< "$data")"
echo "full extracted function: $func"
obfuscatedName="$(grep -Poh '\(""\);[A-Za-z]+' <<< "$func" | sed -s 's/("");//')"

devscripts/buildserver.py
View file

@ -1,433 +0,0 @@
#!/usr/bin/python3
import argparse
import ctypes
import functools
import shutil
import subprocess
import sys
import tempfile
import threading
import traceback
import os.path
sys.path.insert(0, os.path.dirname(os.path.dirname((os.path.abspath(__file__)))))
from haruhi_dl.compat import (
compat_input,
compat_http_server,
compat_str,
compat_urlparse,
)
# These are not used outside of buildserver.py thus not in compat.py
try:
import winreg as compat_winreg
except ImportError: # Python 2
import _winreg as compat_winreg
try:
import socketserver as compat_socketserver
except ImportError: # Python 2
import SocketServer as compat_socketserver
class BuildHTTPServer(compat_socketserver.ThreadingMixIn, compat_http_server.HTTPServer):
allow_reuse_address = True
advapi32 = ctypes.windll.advapi32
SC_MANAGER_ALL_ACCESS = 0xf003f
SC_MANAGER_CREATE_SERVICE = 0x02
SERVICE_WIN32_OWN_PROCESS = 0x10
SERVICE_AUTO_START = 0x2
SERVICE_ERROR_NORMAL = 0x1
DELETE = 0x00010000
SERVICE_STATUS_START_PENDING = 0x00000002
SERVICE_STATUS_RUNNING = 0x00000004
SERVICE_ACCEPT_STOP = 0x1
SVCNAME = 'youtubedl_builder'
LPTSTR = ctypes.c_wchar_p
START_CALLBACK = ctypes.WINFUNCTYPE(None, ctypes.c_int, ctypes.POINTER(LPTSTR))
class SERVICE_TABLE_ENTRY(ctypes.Structure):
_fields_ = [
('lpServiceName', LPTSTR),
('lpServiceProc', START_CALLBACK)
]
HandlerEx = ctypes.WINFUNCTYPE(
ctypes.c_int, # return
ctypes.c_int, # dwControl
ctypes.c_int, # dwEventType
ctypes.c_void_p, # lpEventData,
ctypes.c_void_p, # lpContext,
)
def _ctypes_array(c_type, py_array):
ar = (c_type * len(py_array))()
ar[:] = py_array
return ar
def win_OpenSCManager():
res = advapi32.OpenSCManagerW(None, None, SC_MANAGER_ALL_ACCESS)
if not res:
raise Exception('Opening service manager failed - '
'are you running this as administrator?')
return res
def win_install_service(service_name, cmdline):
manager = win_OpenSCManager()
try:
h = advapi32.CreateServiceW(
manager, service_name, None,
SC_MANAGER_CREATE_SERVICE, SERVICE_WIN32_OWN_PROCESS,
SERVICE_AUTO_START, SERVICE_ERROR_NORMAL,
cmdline, None, None, None, None, None)
if not h:
raise OSError('Service creation failed: %s' % ctypes.FormatError())
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_uninstall_service(service_name):
manager = win_OpenSCManager()
try:
h = advapi32.OpenServiceW(manager, service_name, DELETE)
if not h:
raise OSError('Could not find service %s: %s' % (
service_name, ctypes.FormatError()))
try:
if not advapi32.DeleteService(h):
raise OSError('Deletion failed: %s' % ctypes.FormatError())
finally:
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_service_report_event(service_name, msg, is_error=True):
with open('C:/sshkeys/log', 'a', encoding='utf-8') as f:
f.write(msg + '\n')
event_log = advapi32.RegisterEventSourceW(None, service_name)
if not event_log:
raise OSError('Could not report event: %s' % ctypes.FormatError())
try:
type_id = 0x0001 if is_error else 0x0004
event_id = 0xc0000000 if is_error else 0x40000000
lines = _ctypes_array(LPTSTR, [msg])
if not advapi32.ReportEventW(
event_log, type_id, 0, event_id, None, len(lines), 0,
lines, None):
raise OSError('Event reporting failed: %s' % ctypes.FormatError())
finally:
advapi32.DeregisterEventSource(event_log)
def win_service_handler(stop_event, *args):
try:
raise ValueError('Handler called with args ' + repr(args))
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_set_status(handle, status_code):
svcStatus = SERVICE_STATUS()
svcStatus.dwServiceType = SERVICE_WIN32_OWN_PROCESS
svcStatus.dwCurrentState = status_code
svcStatus.dwControlsAccepted = SERVICE_ACCEPT_STOP
svcStatus.dwServiceSpecificExitCode = 0
if not advapi32.SetServiceStatus(handle, ctypes.byref(svcStatus)):
raise OSError('SetServiceStatus failed: %r' % ctypes.FormatError())
def win_service_main(service_name, real_main, argc, argv_raw):
try:
# args = [argv_raw[i].value for i in range(argc)]
stop_event = threading.Event()
handler = HandlerEx(functools.partial(stop_event, win_service_handler))
h = advapi32.RegisterServiceCtrlHandlerExW(service_name, handler, None)
if not h:
raise OSError('Handler registration failed: %s' %
ctypes.FormatError())
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_start(service_name, real_main):
try:
cb = START_CALLBACK(
functools.partial(win_service_main, service_name, real_main))
dispatch_table = _ctypes_array(SERVICE_TABLE_ENTRY, [
SERVICE_TABLE_ENTRY(
service_name,
cb
),
SERVICE_TABLE_ENTRY(None, ctypes.cast(None, START_CALLBACK))
])
if not advapi32.StartServiceCtrlDispatcherW(dispatch_table):
raise OSError('ctypes start failed: %s' % ctypes.FormatError())
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def main(args=None):
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--install',
action='store_const', dest='action', const='install',
help='Launch at Windows startup')
parser.add_argument('-u', '--uninstall',
action='store_const', dest='action', const='uninstall',
help='Remove Windows service')
parser.add_argument('-s', '--service',
action='store_const', dest='action', const='service',
help='Run as a Windows service')
parser.add_argument('-b', '--bind', metavar='<host:port>',
action='store', default='0.0.0.0:8142',
help='Bind to host:port (default %default)')
options = parser.parse_args(args=args)
if options.action == 'install':
fn = os.path.abspath(__file__).replace('v:', '\\\\vboxsrv\\vbox')
cmdline = '%s %s -s -b %s' % (sys.executable, fn, options.bind)
win_install_service(SVCNAME, cmdline)
return
if options.action == 'uninstall':
win_uninstall_service(SVCNAME)
return
if options.action == 'service':
win_service_start(SVCNAME, main)
return
host, port_str = options.bind.split(':')
port = int(port_str)
print('Listening on %s:%d' % (host, port))
srv = BuildHTTPServer((host, port), BuildHTTPRequestHandler)
thr = threading.Thread(target=srv.serve_forever)
thr.start()
compat_input('Press ENTER to shut down')
srv.shutdown()
thr.join()
def rmtree(path):
for name in os.listdir(path):
fname = os.path.join(path, name)
if os.path.isdir(fname):
rmtree(fname)
else:
os.chmod(fname, 0o666)
os.remove(fname)
os.rmdir(path)
class BuildError(Exception):
def __init__(self, output, code=500):
self.output = output
self.code = code
def __str__(self):
return self.output
class HTTPError(BuildError):
pass
class PythonBuilder(object):
def __init__(self, **kwargs):
python_version = kwargs.pop('python', '3.4')
python_path = None
for node in ('Wow6432Node\\', ''):
try:
key = compat_winreg.OpenKey(
compat_winreg.HKEY_LOCAL_MACHINE,
r'SOFTWARE\%sPython\PythonCore\%s\InstallPath' % (node, python_version))
try:
python_path, _ = compat_winreg.QueryValueEx(key, '')
finally:
compat_winreg.CloseKey(key)
break
except Exception:
pass
if not python_path:
raise BuildError('No such Python version: %s' % python_version)
self.pythonPath = python_path
super(PythonBuilder, self).__init__(**kwargs)
class GITInfoBuilder(object):
def __init__(self, **kwargs):
try:
self.user, self.repoName = kwargs['path'][:2]
self.rev = kwargs.pop('rev')
except ValueError:
raise BuildError('Invalid path')
except KeyError as e:
raise BuildError('Missing mandatory parameter "%s"' % e.args[0])
path = os.path.join(os.environ['APPDATA'], 'Build archive', self.repoName, self.user)
if not os.path.exists(path):
os.makedirs(path)
self.basePath = tempfile.mkdtemp(dir=path)
self.buildPath = os.path.join(self.basePath, 'build')
super(GITInfoBuilder, self).__init__(**kwargs)
class GITBuilder(GITInfoBuilder):
def build(self):
try:
subprocess.check_output(['git', 'clone', 'git://github.com/%s/%s.git' % (self.user, self.repoName), self.buildPath])
subprocess.check_output(['git', 'checkout', self.rev], cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(GITBuilder, self).build()
class HaruhiDLBuilder(object):
authorizedUsers = ['fraca7', 'phihag', 'rg3', 'FiloSottile', 'ytdl-org']
def __init__(self, **kwargs):
if self.repoName != 'haruhi-dl':
raise BuildError('Invalid repository "%s"' % self.repoName)
if self.user not in self.authorizedUsers:
raise HTTPError('Unauthorized user "%s"' % self.user, 401)
super(HaruhiDLBuilder, self).__init__(**kwargs)
def build(self):
try:
proc = subprocess.Popen([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'], stdin=subprocess.PIPE, cwd=self.buildPath)
proc.wait()
#subprocess.check_output([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'],
# cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(HaruhiDLBuilder, self).build()
class DownloadBuilder(object):
def __init__(self, **kwargs):
self.handler = kwargs.pop('handler')
self.srcPath = os.path.join(self.buildPath, *tuple(kwargs['path'][2:]))
self.srcPath = os.path.abspath(os.path.normpath(self.srcPath))
if not self.srcPath.startswith(self.buildPath):
raise HTTPError(self.srcPath, 401)
super(DownloadBuilder, self).__init__(**kwargs)
def build(self):
if not os.path.exists(self.srcPath):
raise HTTPError('No such file', 404)
if os.path.isdir(self.srcPath):
raise HTTPError('Is a directory: %s' % self.srcPath, 401)
self.handler.send_response(200)
self.handler.send_header('Content-Type', 'application/octet-stream')
self.handler.send_header('Content-Disposition', 'attachment; filename=%s' % os.path.split(self.srcPath)[-1])
self.handler.send_header('Content-Length', str(os.stat(self.srcPath).st_size))
self.handler.end_headers()
with open(self.srcPath, 'rb') as src:
shutil.copyfileobj(src, self.handler.wfile)
super(DownloadBuilder, self).build()
class CleanupTempDir(object):
def build(self):
try:
rmtree(self.basePath)
except Exception as e:
print('WARNING deleting "%s": %s' % (self.basePath, e))
super(CleanupTempDir, self).build()
class Null(object):
def __init__(self, **kwargs):
pass
def start(self):
pass
def close(self):
pass
def build(self):
pass
class Builder(PythonBuilder, GITBuilder, HaruhiDLBuilder, DownloadBuilder, CleanupTempDir, Null):
pass
class BuildHTTPRequestHandler(compat_http_server.BaseHTTPRequestHandler):
actionDict = {'build': Builder, 'download': Builder} # They're the same, no more caching.
def do_GET(self):
path = compat_urlparse.urlparse(self.path)
paramDict = dict([(key, value[0]) for key, value in compat_urlparse.parse_qs(path.query).items()])
action, _, path = path.path.strip('/').partition('/')
if path:
path = path.split('/')
if action in self.actionDict:
try:
builder = self.actionDict[action](path=path, handler=self, **paramDict)
builder.start()
try:
builder.build()
finally:
builder.close()
except BuildError as e:
self.send_response(e.code)
msg = compat_str(e).encode('UTF-8')
self.send_header('Content-Type', 'text/plain; charset=UTF-8')
self.send_header('Content-Length', len(msg))
self.end_headers()
self.wfile.write(msg)
else:
self.send_response(500, 'Unknown build method "%s"' % action)
else:
self.send_response(500, 'Malformed URL')
if __name__ == '__main__':
main()

View file

@ -4,16 +4,31 @@
module.exports = function patchHook(patchContent) {
[
[/(?:youtube-|yt-?)dl\.org/g, 'haruhi.download'],
// fork: https://github.com/blackjack4494/yt-dlc
[/youtube_dlc/g, 'haruhi_dl'],
[/youtube-dlc/g, 'haruhi-dl'],
[/ytdlc/g, 'hdl'],
[/yt-dlc/g, 'hdl'],
// fork: https://github.com/yt-dlp/yt-dlp
[/yt_dlp/g, 'haruhi_dl'],
[/yt-dlp/g, 'haruhi-dl'],
[/ytdlp/g, 'hdl'],
[/youtube_dl/g, 'haruhi_dl'],
[/youtube-dl/g, 'haruhi-dl'],
[/youtubedl/g, 'haruhidl'],
[/YoutubeDL/g, 'HaruhiDL'],
[/ytdl/g, 'hdl'],
[/(?:youtube-|yt-?)dl\.org/g, 'haruhi.download'],
[/yt-dl/g, 'h-dl'],
[/ydl/g, 'hdl'],
// prevent linking to a non-existent repository
[/github\.com\/ytdl-org\/haruhi-dl/g, 'github.com/ytdl-org/youtube-dl'],
[/github\.com\/(?:yt|h)dl-org\/haruhi-dl/g, 'github.com/ytdl-org/youtube-dl'],
[/github\.com\/rg3\/haruhi-dl/g, 'github.com/ytdl-org/youtube-dl'],
[/github\.com\/blackjack4494\/hdl/g, 'github.com/blackjack4494/yt-dlc'],
[/github\.com\/hdl\/hdl/g, 'github.com/yt-dlp/yt-dlp'],
// prevent changing the smuggle URLs (for compatibility with ytdl)
[/__haruhidl_smuggle/g, '__youtubedl_smuggle'],
].forEach(([regex, replacement]) => patchContent = patchContent.replace(regex, replacement));

View file

@ -1,5 +0,0 @@
#!/bin/bash
wget https://repo1.maven.org/maven2/org/python/jython-installer/2.7.2/jython-installer-2.7.2.jar
java -jar jython-installer-2.7.2.jar -s -d "$HOME/jython"
$HOME/jython/bin/jython -m pip install nose

devscripts/make_contributing.py
View file

@ -1,33 +0,0 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
import re
def main():
parser = optparse.OptionParser(usage='%prog INFILE OUTFILE')
options, args = parser.parse_args()
if len(args) != 2:
parser.error('Expected an input and an output filename')
infile, outfile = args
with io.open(infile, encoding='utf-8') as inf:
readme = inf.read()
bug_text = re.search(
r'(?s)#\s*BUGS\s*[^\n]*\s*(.*?)#\s*COPYRIGHT', readme).group(1)
dev_text = re.search(
r'(?s)(#\s*DEVELOPER INSTRUCTIONS.*?)#\s*EMBEDDING YOUTUBE-DL',
readme).group(1)
out = bug_text + dev_text
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
if __name__ == '__main__':
main()

devscripts/make_issue_template.py
View file

@ -1,29 +0,0 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
def main():
parser = optparse.OptionParser(usage='%prog INFILE OUTFILE')
options, args = parser.parse_args()
if len(args) != 2:
parser.error('Expected an input and an output filename')
infile, outfile = args
with io.open(infile, encoding='utf-8') as inf:
issue_template_tmpl = inf.read()
# Get the version from haruhi_dl/version.py without importing the package
exec(compile(open('haruhi_dl/version.py').read(),
'haruhi_dl/version.py', 'exec'))
out = issue_template_tmpl % {'version': locals()['__version__']}
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
if __name__ == '__main__':
main()

devscripts/make_lazy_extractors.py
View file

@ -77,7 +77,7 @@ def build_lazy_ie(ie, name):
return s
# find the correct sorting and add the required base classes so that sublcasses
# find the correct sorting and add the required base classes so that subclasses
# can be correctly created
classes = _ALL_CLASSES[:-1]
ordered_cls = []

devscripts/make_readme.py
View file

@ -1,26 +0,0 @@
from __future__ import unicode_literals
import io
import sys
import re
README_FILE = 'README.md'
helptext = sys.stdin.read()
if isinstance(helptext, bytes):
helptext = helptext.decode('utf-8')
with io.open(README_FILE, encoding='utf-8') as f:
oldreadme = f.read()
header = oldreadme[:oldreadme.index('# OPTIONS')]
footer = oldreadme[oldreadme.index('# CONFIGURATION'):]
options = helptext[helptext.index(' General Options:') + 19:]
options = re.sub(r'(?m)^ (\w.+)$', r'## \1', options)
options = '# OPTIONS\n' + options + '\n'
with io.open(README_FILE, 'w', encoding='utf-8') as f:
f.write(header)
f.write(options)
f.write(footer)

devscripts/make_supportedsites.py
View file

@ -1,46 +0,0 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
import os
import sys
# Import haruhi_dl
ROOT_DIR = os.path.join(os.path.dirname(__file__), '..')
sys.path.insert(0, ROOT_DIR)
import haruhi_dl
def main():
parser = optparse.OptionParser(usage='%prog OUTFILE.md')
options, args = parser.parse_args()
if len(args) != 1:
parser.error('Expected an output filename')
outfile, = args
def gen_ies_md(ies):
for ie in ies:
ie_md = '**{0}**'.format(ie.IE_NAME)
ie_desc = getattr(ie, 'IE_DESC', None)
if ie_desc is False:
continue
if ie_desc is not None:
ie_md += ': {0}'.format(ie.IE_DESC)
if not ie.working():
ie_md += ' (Currently broken)'
yield ie_md
ies = sorted(haruhi_dl.gen_extractors(), key=lambda i: i.IE_NAME.lower())
out = '# Supported sites\n' + ''.join(
' - ' + md + '\n'
for md in gen_ies_md(ies))
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
if __name__ == '__main__':
main()

devscripts/prerelease_codegen.py
View file

@ -0,0 +1,32 @@
# this is intended to speed up some extractors
# that need data which changes rarely, but at unpredictable times,
# like youtube's signature "crypto" or soundcloud's client id
import os
from os.path import dirname as dirn
import sys
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
from haruhi_dl import HaruhiDL
from haruhi_dl.utils import (
ExtractorError,
)
hdl = HaruhiDL(params={
'quiet': True,
})
artifact_dir = os.path.join(dirn(dirn((os.path.abspath(__file__)))), 'haruhi_dl', 'extractor_artifacts')
if not os.path.exists(artifact_dir):
os.mkdir(artifact_dir)
for ie_name in (
'Youtube',
'Soundcloud',
):
ie = hdl.get_info_extractor(ie_name)
try:
file_contents = ie._generate_prerelease_file()
with open(os.path.join(artifact_dir, ie_name.lower() + '.py'), 'w') as file:
file.write(file_contents)
except ExtractorError as err:
print(err)
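The script above only writes the artifact files; how they get used is up to each extractor. As a hedged sketch (the helper name below is hypothetical and not part of the codebase; only the `haruhi_dl/extractor_artifacts/` location comes from the script), a consumer could prefer the pregenerated module and fall back to live extraction:

```python
# Hypothetical consumer sketch: load_prerelease_artifact is illustrative;
# only the artifact location is taken from the script above.
import importlib

def load_prerelease_artifact(name):
    """Return the pregenerated artifact module, or None if not packed."""
    try:
        return importlib.import_module('haruhi_dl.extractor_artifacts.%s' % name)
    except ImportError:
        # not shipped with this build; the extractor fetches the data itself
        return None
```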

devscripts/release.sh
View file

@ -1,141 +1,24 @@
#!/bin/bash
# IMPORTANT: the following assumptions are made
# * the GH repo is on the origin remote
# * the gh-pages branch is named so locally
# * the git config user.signingkey is properly set
# You will need
# pip install coverage nose rsa wheel
# TODO
# release notes
# make hash on local files
set -e
skip_tests=true
gpg_sign_commits=""
buildserver='localhost:8142'
while true
do
case "$1" in
--run-tests)
skip_tests=false
shift
;;
--gpg-sign-commits|-S)
gpg_sign_commits="-S"
shift
;;
--buildserver)
buildserver="$2"
shift 2
;;
--*)
echo "ERROR: unknown option $1"
exit 1
;;
*)
break
;;
esac
done
if [ -z "$1" ]; then echo "ERROR: specify version number like this: $0 1994.09.06"; exit 1; fi
version="$1"
major_version=$(echo "$version" | sed -n 's#^\([0-9]*\.[0-9]*\.[0-9]*\).*#\1#p')
if test "$major_version" '!=' "$(date '+%Y.%m.%d')"; then
echo "$version does not start with today's date!"
exit 1
if [[ "$(basename $(pwd))" == 'devscripts' ]]; then
cd ..
fi
if [ ! -z "`git tag | grep "$version"`" ]; then echo 'ERROR: version already present'; exit 1; fi
if [ ! -z "`git status --porcelain | grep -v CHANGELOG`" ]; then echo 'ERROR: the working directory is not clean; commit or stash changes'; exit 1; fi
useless_files=$(find haruhi_dl -type f -not -name '*.py')
if [ ! -z "$useless_files" ]; then echo "ERROR: Non-.py files in haruhi_dl: $useless_files"; exit 1; fi
if [ ! -f "updates_key.pem" ]; then echo 'ERROR: updates_key.pem missing'; exit 1; fi
if ! type pandoc >/dev/null 2>/dev/null; then echo 'ERROR: pandoc is missing'; exit 1; fi
if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi
if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi
v="$(date "+%Y.%m.%d")"
read -p "Is ChangeLog up to date? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
/bin/echo -e "\n### First of all, testing..."
make clean
if $skip_tests ; then
echo 'SKIPPING TESTS'
else
nosetests --verbose --with-coverage --cover-package=haruhi_dl --cover-html test --stop || exit 1
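# auto-bump the version: if version.py already contains today's date, this is the 2nd+ release today, so add or increment a trailing .N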
if [[ "$(grep "'$v" haruhi_dl/version.py)" != '' ]]; then #' is this the first release of the day?
if [[ "$(grep -Poh '[0-9]{4}\.[0-9]{2}\.[0-9]{2}\.[0-9]' haruhi_dl/version.py)" != '' ]]; then # so, 2nd or nth?
v="$v.$(($(cat haruhi_dl/version.py | grep -Poh '[0-9]{4}\.[0-9]{2}\.[0-9]{2}\.[0-9]' | grep -Poh '[0-9]+$')+1))"
else
v="$v.1"
fi
fi
/bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" haruhi_dl/version.py
sed "s/__version__ = '.*'/__version__ = '$v'/g" -i haruhi_dl/version.py
/bin/echo -e "\n### Changing version in ChangeLog..."
sed -i "s/<unreleased>/$version/" ChangeLog
/bin/echo -e "\n### Committing documentation, templates and haruhi_dl/version.py..."
make README.md CONTRIBUTING.md issuetemplates supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE/1_broken_site.md .github/ISSUE_TEMPLATE/2_site_support_request.md .github/ISSUE_TEMPLATE/3_site_feature_request.md .github/ISSUE_TEMPLATE/4_bug_report.md .github/ISSUE_TEMPLATE/5_feature_request.md .github/ISSUE_TEMPLATE/6_question.md docs/supportedsites.md haruhi_dl/version.py ChangeLog
git commit $gpg_sign_commits -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..."
git tag -s -m "Release $version" "$version"
git show "$version"
read -p "Is it good, can I push? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
echo
MASTER=$(git rev-parse --abbrev-ref HEAD)
git push origin $MASTER:master
git push origin "$version"
/bin/echo -e "\n### OK, now it is time to build the binaries..."
REV=$(git rev-parse HEAD)
make haruhi-dl haruhi-dl.tar.gz
read -p "VM running? (y/n) " -n 1
wget "http://$buildserver/build/ytdl-org/haruhi-dl/haruhi-dl.exe?rev=$REV" -O haruhi-dl.exe
mkdir -p "build/$version"
mv haruhi-dl haruhi-dl.exe "build/$version"
mv haruhi-dl.tar.gz "build/$version/haruhi-dl-$version.tar.gz"
RELEASE_FILES="haruhi-dl haruhi-dl.exe haruhi-dl-$version.tar.gz"
(cd build/$version/ && md5sum $RELEASE_FILES > MD5SUMS)
(cd build/$version/ && sha1sum $RELEASE_FILES > SHA1SUMS)
(cd build/$version/ && sha256sum $RELEASE_FILES > SHA2-256SUMS)
(cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS)
/bin/echo -e "\n### Signing and uploading the new binaries to GitHub..."
for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done
ROOT=$(pwd)
python devscripts/create-github-release.py ChangeLog $version "$ROOT/build/$version"
#ssh ytdl@yt-dl.org "sh html/update_latest.sh $version"
/bin/echo -e "\n### Now switching to gh-pages..."
git clone --branch gh-pages --single-branch . build/gh-pages
(
set -e
ORIGIN_URL=$(git config --get remote.origin.url)
cd build/gh-pages
"$ROOT/devscripts/gh-pages/add-version.py" $version
"$ROOT/devscripts/gh-pages/update-feed.py"
"$ROOT/devscripts/gh-pages/sign-versions.py" < "$ROOT/updates_key.pem"
"$ROOT/devscripts/gh-pages/generate-download.py"
"$ROOT/devscripts/gh-pages/update-copyright.py"
"$ROOT/devscripts/gh-pages/update-sites.py"
git add *.html *.html.in update
git commit $gpg_sign_commits -m "release $version"
git push "$ROOT" gh-pages
git push "$ORIGIN_URL" gh-pages
)
rm -rf build
make pypi-files
echo "Uploading to PyPi ..."
python setup.py sdist bdist_wheel upload
make clean
/bin/echo -e "\n### DONE!"
python3 setup.py build_lazy_extractors
python3 devscripts/prerelease_codegen.py
rm -R build dist
python3 setup.py sdist bdist_wheel
python3 -m twine upload dist/*
devscripts/wine-py2exe.sh setup.py

devscripts/wine-py2exe.sh
View file

@ -2,7 +2,8 @@
# Pass as parameter a setup.py that works in the current directory
# e.g. no os.chdir()
# It will run twice; the first run will crash
# Wine >=6.3 required: https://bugs.winehq.org/show_bug.cgi?id=3591
set -e
@ -10,36 +11,30 @@ SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
if [ ! -d wine-py2exe ]; then
sudo apt-get install wine1.3 axel bsdiff
mkdir wine-py2exe
cd wine-py2exe
export WINEPREFIX=`pwd`
axel -a "http://www.python.org/ftp/python/2.7/python-2.7.msi"
axel -a "http://downloads.sourceforge.net/project/py2exe/py2exe/0.6.9/py2exe-0.6.9.win32-py2.7.exe"
#axel -a "http://winetricks.org/winetricks"
echo "Downloading Python 3.8.8"
aria2c "https://www.python.org/ftp/python/3.8.8/python-3.8.8.exe"
# this will need to be upgraded when switching to a newer version of python
winetricks win7
# http://appdb.winehq.org/objectManager.php?sClass=version&iId=21957
echo "Follow python setup on screen"
wine msiexec /i python-2.7.msi
echo "Installing Python 3.8.8"
wine python-3.8.8.exe /quiet InstallAllUsers=1 'DefaultAllUsersTargetDir=C:\\python38'
echo "Follow py2exe setup on screen"
wine py2exe-0.6.9.win32-py2.7.exe
echo "Installing py2exe"
wine 'C:\\python38\\python.exe' -m pip install wheel
wine 'C:\\python38\\python.exe' -m pip install py2exe
#wine 'C:\\python38\\python.exe' -m pip install playwright===1.9.0
#wine 'C:\\python38\\python.exe' -m playwright install
#echo "Follow Microsoft Visual C++ 2008 Redistributable Package setup on screen"
#bash winetricks vcrun2008
rm py2exe-0.6.9.win32-py2.7.exe
rm python-2.7.msi
#rm winetricks
# http://bugs.winehq.org/show_bug.cgi?id=3591
mv drive_c/Python27/Lib/site-packages/py2exe/run.exe drive_c/Python27/Lib/site-packages/py2exe/run.exe.backup
bspatch drive_c/Python27/Lib/site-packages/py2exe/run.exe.backup drive_c/Python27/Lib/site-packages/py2exe/run.exe "$SCRIPT_DIR/SizeOfImage.patch"
mv drive_c/Python27/Lib/site-packages/py2exe/run_w.exe drive_c/Python27/Lib/site-packages/py2exe/run_w.exe.backup
bspatch drive_c/Python27/Lib/site-packages/py2exe/run_w.exe.backup drive_c/Python27/Lib/site-packages/py2exe/run_w.exe "$SCRIPT_DIR/SizeOfImage_w.patch"
rm python-3.8.8.exe
cd -
@ -49,8 +44,8 @@ else
fi
wine "C:\\Python27\\python.exe" "$1" py2exe > "py2exe.log" 2>&1 || true
echo '# Copying python27.dll' >> "py2exe.log"
cp "$WINEPREFIX/drive_c/windows/system32/python27.dll" build/bdist.win32/winexe/bundle-2.7/
wine "C:\\Python27\\python.exe" "$1" py2exe >> "py2exe.log" 2>&1
mkdir -p build/bdist.win32/winexe/bundle-3.8/
# cp "$WINEPREFIX/drive_c/python38/python38.dll" build/bdist.win32/winexe/bundle-3.8/
echo "Making the exe file"
# cannot be piped into a file: https://forum.winehq.org/viewtopic.php?t=33992
wine 'C:\\python38\\python.exe' "$1" py2exe | tee py2exe.log

1
docs/.gitignore vendored
View file

@ -1 +0,0 @@
_build/

docs/Makefile
View file

@ -1,177 +0,0 @@
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
help:
@echo "Please use \`make <target>' where <target> is one of"
@echo " html to make standalone HTML files"
@echo " dirhtml to make HTML files named index.html in directories"
@echo " singlehtml to make a single large HTML file"
@echo " pickle to make pickle files"
@echo " json to make JSON files"
@echo " htmlhelp to make HTML files and a HTML help project"
@echo " qthelp to make HTML files and a qthelp project"
@echo " devhelp to make HTML files and a Devhelp project"
@echo " epub to make an epub"
@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
@echo " latexpdf to make LaTeX files and run them through pdflatex"
@echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
@echo " text to make text files"
@echo " man to make manual pages"
@echo " texinfo to make Texinfo files"
@echo " info to make Texinfo files and run them through makeinfo"
@echo " gettext to make PO message catalogs"
@echo " changes to make an overview of all changed/added/deprecated items"
@echo " xml to make Docutils-native XML files"
@echo " pseudoxml to make pseudoxml-XML files for display purposes"
@echo " linkcheck to check all external links for integrity"
@echo " doctest to run all doctests embedded in the documentation (if enabled)"
clean:
rm -rf $(BUILDDIR)/*
html:
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
dirhtml:
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
singlehtml:
$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
@echo
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
pickle:
$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
@echo
@echo "Build finished; now you can process the pickle files."
json:
$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
@echo
@echo "Build finished; now you can process the JSON files."
htmlhelp:
$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
@echo
@echo "Build finished; now you can run HTML Help Workshop with the" \
".hhp project file in $(BUILDDIR)/htmlhelp."
qthelp:
$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
@echo
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/haruhi-dl.qhcp"
@echo "To view the help file:"
@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/haruhi-dl.qhc"
devhelp:
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
@echo
@echo "Build finished."
@echo "To view the help file:"
@echo "# mkdir -p $$HOME/.local/share/devhelp/haruhi-dl"
@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/haruhi-dl"
@echo "# devhelp"
epub:
$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
@echo
@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
latex:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo
@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
@echo "Run \`make' in that directory to run these through (pdf)latex" \
"(use \`make latexpdf' here to do that automatically)."
latexpdf:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through pdflatex..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
latexpdfja:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through platex and dvipdfmx..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
text:
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
@echo
@echo "Build finished. The text files are in $(BUILDDIR)/text."
man:
$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
@echo
@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
texinfo:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo
@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
@echo "Run \`make' in that directory to run these through makeinfo" \
"(use \`make info' here to do that automatically)."
info:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo "Running Texinfo files through makeinfo..."
make -C $(BUILDDIR)/texinfo info
@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
gettext:
$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
@echo
@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
changes:
$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
@echo
@echo "The overview file is in $(BUILDDIR)/changes."
linkcheck:
$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
@echo
@echo "Link check complete; look for any errors in the above output " \
"or in $(BUILDDIR)/linkcheck/output.txt."
doctest:
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
@echo "Testing of doctests in the sources finished, look at the " \
"results in $(BUILDDIR)/doctest/output.txt."
xml:
$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
@echo
@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
pseudoxml:
$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
@echo
@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."

docs/conf.py
View file

@ -1,71 +0,0 @@
# coding: utf-8
#
# haruhi-dl documentation build configuration file, created by
# sphinx-quickstart on Fri Mar 14 21:05:43 2014.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys
import os
# Allows to import haruhi_dl
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# -- General configuration ------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'haruhi-dl'
copyright = u'2014, Ricardo Garcia Gonzalez'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
from haruhi_dl.version import __version__
version = __version__
# The full version, including alpha/beta/rc tags.
release = version
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build']
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'default'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Output file base name for HTML help builder.
htmlhelp_basename = 'haruhi-dldoc'

docs/index.rst
View file

@ -1,23 +0,0 @@
Welcome to haruhi-dl's documentation!
======================================
*haruhi-dl* is a command-line program to download videos from YouTube.com and more sites.
It can also be used in Python code.
Developer guide
---------------
This section contains information for using *haruhi-dl* from Python programs.
.. toctree::
:maxdepth: 2
module_guide
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

docs/module_guide.rst
View file

@ -1,67 +0,0 @@
Using the ``haruhi_dl`` module
===============================
When using the ``haruhi_dl`` module, you start by creating an instance of :class:`HaruhiDL` and adding all the available extractors:
.. code-block:: python
>>> from haruhi_dl import HaruhiDL
>>> hdl = HaruhiDL()
>>> hdl.add_default_info_extractors()
Extracting video information
----------------------------
You use the :meth:`HaruhiDL.extract_info` method for getting the video information, which returns a dictionary:
.. code-block:: python
>>> info = hdl.extract_info('http://www.youtube.com/watch?v=BaW_jenozKc', download=False)
[youtube] Setting language
[youtube] BaW_jenozKc: Downloading webpage
[youtube] BaW_jenozKc: Downloading video info webpage
[youtube] BaW_jenozKc: Extracting video information
>>> info['title']
'haruhi-dl test video "\'/\\ä↭𝕐'
>>> info['height'], info['width']
(720, 1280)
If you want to download or play the video you can get its url:
.. code-block:: python
>>> info['url']
'https://...'
Extracting playlist information
-------------------------------
The playlist information is extracted in a similar way, but the dictionary is a bit different:
.. code-block:: python
>>> playlist = hdl.extract_info('http://www.ted.com/playlists/13/open_source_open_world', download=False)
[TED] open_source_open_world: Downloading playlist webpage
...
>>> playlist['title']
'Open-source, open world'
You can access the videos in the playlist with the ``entries`` field:
.. code-block:: python
>>> for video in playlist['entries']:
... print('Video #%d: %s' % (video['playlist_index'], video['title']))
Video #1: How Arduino is open-sourcing imagination
Video #2: The year open data went worldwide
Video #3: Massive-scale online collaboration
Video #4: The art of asking
Video #5: How cognitive surplus will change the world
Video #6: The birth of Wikipedia
Video #7: Coding a better government
Video #8: The era of open innovation
Video #9: The currency of the new economy is trust

File diff suppressed because it is too large

haruhi_dl/HaruhiDL.py
View file

@ -60,6 +60,7 @@ from .utils import (
format_bytes,
formatSeconds,
GeoRestrictedError,
HaruhiDLError,
int_or_none,
ISO3166Utils,
locked_file,
@ -163,6 +164,7 @@ class HaruhiDL(object):
simulate: Do not download the video files.
format: Video format code. See options.py for more information.
outtmpl: Template for output names.
outtmpl_na_placeholder: Placeholder for unavailable meta fields.
restrictfilenames: Do not allow "&" and spaces in file names
ignoreerrors: Do not stop on download errors.
force_generic_extractor: Force downloader to use the generic extractor
@ -338,6 +340,8 @@ class HaruhiDL(object):
_pps = []
_download_retcode = None
_num_downloads = None
_playlist_level = 0
_playlist_urls = set()
_screen_file = None
def __init__(self, params=None, auto_init=True):
@ -660,7 +664,7 @@ class HaruhiDL(object):
template_dict = dict((k, v if isinstance(v, compat_numeric_types) else sanitize(k, v))
for k, v in template_dict.items()
if v is not None and not isinstance(v, (list, tuple, dict)))
template_dict = collections.defaultdict(lambda: 'NA', template_dict)
template_dict = collections.defaultdict(lambda: self.params.get('outtmpl_na_placeholder', 'NA'), template_dict)
outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL)
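For illustration, a hedged sketch of the new option wired in above: with `outtmpl_na_placeholder` set, missing metadata renders as the given string instead of the hard-coded 'NA' (the field and placeholder choices here are examples only):

```python
from haruhi_dl import HaruhiDL

hdl = HaruhiDL(params={
    'outtmpl': '%(playlist_index)s - %(title)s.%(ext)s',
    # unavailable fields (e.g. playlist_index outside a playlist)
    # now render as 'unknown' instead of the default 'NA'
    'outtmpl_na_placeholder': 'unknown',
})
```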
@ -680,8 +684,8 @@ class HaruhiDL(object):
# Missing numeric fields used together with integer presentation types
# in format specification will break the argument substitution since
# string 'NA' is returned for missing fields. We will patch output
# template for missing fields to meet string presentation type.
# string NA placeholder is returned for missing fields. We will patch
# output template for missing fields to meet string presentation type.
for numeric_field in self._NUMERIC_FIELDS:
if numeric_field not in template_dict:
# As of [1] format syntax is:
@ -774,22 +778,38 @@ class HaruhiDL(object):
def extract_info(self, url, download=True, ie_key=None, extra_info={},
process=True, force_generic_extractor=False):
'''
Returns a list with a dictionary for each video we find.
If 'download', also downloads the videos.
extra_info is a dict containing the extra values to add to each result
'''
"""
Return a list with a dictionary for each video extracted.
Arguments:
url -- URL to extract
Keyword arguments:
download -- whether to download videos during extraction
ie_key -- extractor key hint
extra_info -- dictionary containing the extra values to add to each result
process -- whether to resolve all unresolved references (URLs, playlist items),
must be True for download to work.
force_generic_extractor -- force using the generic extractor
"""
if not ie_key and force_generic_extractor:
ie_key = 'Generic'
force_use_mastodon = self.params.get('force_use_mastodon')
if not ie_key and force_use_mastodon:
ie_key = 'MastodonSH'
if not ie_key:
ie_key = self.params.get('ie_key')
if ie_key:
ies = [self.get_info_extractor(ie_key)]
else:
ies = self._ies
for ie in ies:
if not ie.suitable(url):
if not force_use_mastodon and not ie.suitable(url):
continue
ie = self.get_info_extractor(ie.ie_key())
@ -797,21 +817,14 @@ class HaruhiDL(object):
self.report_warning('The program functionality for this site has been marked as broken, '
'and will probably not work.')
return self.__extract_info(url, ie, download, extra_info, process)
else:
self.report_error('no suitable InfoExtractor for URL %s' % url)
def __handle_extraction_exceptions(func):
def wrapper(self, *args, **kwargs):
try:
ie_result = ie.extract(url)
if ie_result is None: # Finished already (backwards compatibility; listformats and friends should be moved here)
break
if isinstance(ie_result, list):
# Backwards compatibility: old IE result format
ie_result = {
'_type': 'compat_list',
'entries': ie_result,
}
self.add_default_extra_info(ie_result, ie, url)
if process:
return self.process_ie_result(ie_result, download, extra_info)
else:
return ie_result
return func(self, *args, **kwargs)
except GeoRestrictedError as e:
msg = e.msg
if e.countries:
@ -819,20 +832,33 @@ class HaruhiDL(object):
map(ISO3166Utils.short2full, e.countries))
msg += '\nYou might want to use a VPN or a proxy server (with --proxy) to workaround.'
self.report_error(msg)
break
except ExtractorError as e: # An error we somewhat expected
self.report_error(compat_str(e), e.format_traceback())
break
except MaxDownloadsReached:
raise
except Exception as e:
if self.params.get('ignoreerrors', False):
self.report_error(error_to_compat_str(e), tb=encode_compat_str(traceback.format_exc()))
break
else:
raise
return wrapper
@__handle_extraction_exceptions
def __extract_info(self, url, ie, download, extra_info, process):
ie_result = ie.extract(url)
if ie_result is None: # Finished already (backwards compatibility; listformats and friends should be moved here)
return
if isinstance(ie_result, list):
# Backwards compatibility: old IE result format
ie_result = {
'_type': 'compat_list',
'entries': ie_result,
}
self.add_default_extra_info(ie_result, ie, url)
if process:
return self.process_ie_result(ie_result, download, extra_info)
else:
self.report_error('no suitable InfoExtractor for URL %s' % url)
return ie_result
def add_default_extra_info(self, ie_result, ie, url):
self.add_extra_info(ie_result, {
@ -897,123 +923,30 @@ class HaruhiDL(object):
# url_transparent. In such cases outer metadata (from ie_result)
# should be propagated to inner one (info). For this to happen
# _type of info should be overridden with url_transparent. This
# fixes issue from https://github.com/ytdl-org/haruhi-dl/pull/11163.
# fixes issue from https://github.com/ytdl-org/youtube-dl/pull/11163.
if new_result.get('_type') == 'url':
new_result['_type'] = 'url_transparent'
return self.process_ie_result(
new_result, download=download, extra_info=extra_info)
elif result_type in ('playlist', 'multi_video'):
# We process each entry in the playlist
playlist = ie_result.get('title') or ie_result.get('id')
self.to_screen('[download] Downloading playlist: %s' % playlist)
playlist_results = []
playliststart = self.params.get('playliststart', 1) - 1
playlistend = self.params.get('playlistend')
# For backwards compatibility, interpret -1 as whole list
if playlistend == -1:
playlistend = None
playlistitems_str = self.params.get('playlist_items')
playlistitems = None
if playlistitems_str is not None:
def iter_playlistitems(format):
for string_segment in format.split(','):
if '-' in string_segment:
start, end = string_segment.split('-')
for item in range(int(start), int(end) + 1):
yield int(item)
else:
yield int(string_segment)
playlistitems = orderedSet(iter_playlistitems(playlistitems_str))
ie_entries = ie_result['entries']
def make_playlistitems_entries(list_ie_entries):
num_entries = len(list_ie_entries)
return [
list_ie_entries[i - 1] for i in playlistitems
if -num_entries <= i - 1 < num_entries]
def report_download(num_entries):
# Protect from infinite recursion due to recursively nested playlists
# (see https://github.com/ytdl-org/youtube-dl/issues/27833)
webpage_url = ie_result['webpage_url']
if webpage_url in self._playlist_urls:
self.to_screen(
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, num_entries))
'[download] Skipping already downloaded playlist: %s'
% ie_result.get('title') or ie_result.get('id'))
return
if isinstance(ie_entries, list):
n_all_entries = len(ie_entries)
if playlistitems:
entries = make_playlistitems_entries(ie_entries)
else:
entries = ie_entries[playliststart:playlistend]
n_entries = len(entries)
self.to_screen(
'[%s] playlist %s: Collected %d video ids (downloading %d of them)' %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
elif isinstance(ie_entries, PagedList):
if playlistitems:
entries = []
for item in playlistitems:
entries.extend(ie_entries.getslice(
item - 1, item
))
else:
entries = ie_entries.getslice(
playliststart, playlistend)
n_entries = len(entries)
report_download(n_entries)
else: # iterable
if playlistitems:
entries = make_playlistitems_entries(list(itertools.islice(
ie_entries, 0, max(playlistitems))))
else:
entries = list(itertools.islice(
ie_entries, playliststart, playlistend))
n_entries = len(entries)
report_download(n_entries)
if self.params.get('playlistreverse', False):
entries = entries[::-1]
if self.params.get('playlistrandom', False):
random.shuffle(entries)
x_forwarded_for = ie_result.get('__x_forwarded_for_ip')
for i, entry in enumerate(entries, 1):
self.to_screen('[download] Downloading video %s of %s' % (i, n_entries))
# This __x_forwarded_for_ip thing is a bit ugly but requires
# minimal changes
if x_forwarded_for:
entry['__x_forwarded_for_ip'] = x_forwarded_for
extra = {
'n_entries': n_entries,
'playlist': playlist,
'playlist_id': ie_result.get('id'),
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'webpage_url_basename': url_basename(ie_result['webpage_url']),
'extractor_key': ie_result['extractor_key'],
}
reason = self._match_entry(entry, incomplete=True)
if reason is not None:
self.to_screen('[download] ' + reason)
continue
entry_result = self.process_ie_result(entry,
download=download,
extra_info=extra)
playlist_results.append(entry_result)
ie_result['entries'] = playlist_results
self.to_screen('[download] Finished downloading playlist: %s' % playlist)
return ie_result
self._playlist_level += 1
self._playlist_urls.add(webpage_url)
try:
return self.__process_playlist(ie_result, download)
finally:
self._playlist_level -= 1
if not self._playlist_level:
self._playlist_urls.clear()
elif result_type == 'compat_list':
self.report_warning(
'Extractor %s returned a compat_list result. '
@ -1038,6 +971,123 @@ class HaruhiDL(object):
else:
raise Exception('Invalid result type: %s' % result_type)
def __process_playlist(self, ie_result, download):
# We process each entry in the playlist
playlist = ie_result.get('title') or ie_result.get('id')
self.to_screen('[download] Downloading playlist: %s' % playlist)
playlist_results = []
playliststart = self.params.get('playliststart', 1) - 1
playlistend = self.params.get('playlistend')
# For backwards compatibility, interpret -1 as whole list
if playlistend == -1:
playlistend = None
playlistitems_str = self.params.get('playlist_items')
playlistitems = None
if playlistitems_str is not None:
def iter_playlistitems(format):
for string_segment in format.split(','):
if '-' in string_segment:
start, end = string_segment.split('-')
for item in range(int(start), int(end) + 1):
yield int(item)
else:
yield int(string_segment)
playlistitems = orderedSet(iter_playlistitems(playlistitems_str))
ie_entries = ie_result['entries']
def make_playlistitems_entries(list_ie_entries):
num_entries = len(list_ie_entries)
return [
list_ie_entries[i - 1] for i in playlistitems
if -num_entries <= i - 1 < num_entries]
def report_download(num_entries):
self.to_screen(
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, num_entries))
if isinstance(ie_entries, list):
n_all_entries = len(ie_entries)
if playlistitems:
entries = make_playlistitems_entries(ie_entries)
else:
entries = ie_entries[playliststart:playlistend]
n_entries = len(entries)
self.to_screen(
'[%s] playlist %s: Collected %d video ids (downloading %d of them)' %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
elif isinstance(ie_entries, PagedList):
if playlistitems:
entries = []
for item in playlistitems:
entries.extend(ie_entries.getslice(
item - 1, item
))
else:
entries = ie_entries.getslice(
playliststart, playlistend)
n_entries = len(entries)
report_download(n_entries)
else: # iterable
if playlistitems:
entries = make_playlistitems_entries(list(itertools.islice(
ie_entries, 0, max(playlistitems))))
else:
entries = list(itertools.islice(
ie_entries, playliststart, playlistend))
n_entries = len(entries)
report_download(n_entries)
if self.params.get('playlistreverse', False):
entries = entries[::-1]
if self.params.get('playlistrandom', False):
random.shuffle(entries)
x_forwarded_for = ie_result.get('__x_forwarded_for_ip')
for i, entry in enumerate(entries, 1):
self.to_screen('[download] Downloading video %s of %s' % (i, n_entries))
# This __x_forwarded_for_ip thing is a bit ugly but requires
# minimal changes
if x_forwarded_for:
entry['__x_forwarded_for_ip'] = x_forwarded_for
extra = {
'n_entries': n_entries,
'playlist': playlist,
'playlist_id': ie_result.get('id'),
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'webpage_url_basename': url_basename(ie_result['webpage_url']),
'extractor_key': ie_result['extractor_key'],
}
reason = self._match_entry(entry, incomplete=True)
if reason is not None:
self.to_screen('[download] ' + reason)
continue
entry_result = self.__process_iterable_entry(entry, download, extra)
# TODO: skip failed (empty) entries?
playlist_results.append(entry_result)
ie_result['entries'] = playlist_results
self.to_screen('[download] Finished downloading playlist: %s' % playlist)
return ie_result
@__handle_extraction_exceptions
def __process_iterable_entry(self, entry, download, extra_info):
return self.process_ie_result(
entry, download=download, extra_info=extra_info)
def _build_format_filter(self, filter_spec):
" Returns a function to filter the formats according to the filter_spec "
@ -1077,7 +1127,7 @@ class HaruhiDL(object):
'*=': lambda attr, value: value in attr,
}
str_operator_rex = re.compile(r'''(?x)
\s*(?P<key>ext|acodec|vcodec|container|protocol|format_id)
\s*(?P<key>ext|acodec|vcodec|container|protocol|format_id|language)
\s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)?
\s*(?P<value>[a-zA-Z0-9._-]+)
\s*$
@ -1220,6 +1270,8 @@ class HaruhiDL(object):
group = _parse_format_selection(tokens, inside_group=True)
current_selector = FormatSelector(GROUP, group, [])
elif string == '+':
if inside_merge:
raise syntax_error('Unexpected "+"', start)
video_selector = current_selector
audio_selector = _parse_format_selection(tokens, inside_merge=True)
if not video_selector or not audio_selector:
@ -1480,14 +1532,18 @@ class HaruhiDL(object):
if 'display_id' not in info_dict and 'id' in info_dict:
info_dict['display_id'] = info_dict['id']
if info_dict.get('upload_date') is None and info_dict.get('timestamp') is not None:
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
# see http://bugs.python.org/issue1646728)
try:
upload_date = datetime.datetime.utcfromtimestamp(info_dict['timestamp'])
info_dict['upload_date'] = upload_date.strftime('%Y%m%d')
except (ValueError, OverflowError, OSError):
pass
for ts_key, date_key in (
('timestamp', 'upload_date'),
('release_timestamp', 'release_date'),
):
if info_dict.get(date_key) is None and info_dict.get(ts_key) is not None:
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
# see http://bugs.python.org/issue1646728)
try:
upload_date = datetime.datetime.utcfromtimestamp(info_dict[ts_key])
info_dict[date_key] = upload_date.strftime('%Y%m%d')
except (ValueError, OverflowError, OSError):
pass
# Auto generate title fields corresponding to the *_number fields when missing
# in order to always have clean titles. This is very common for TV series.
@ -1495,6 +1551,19 @@ class HaruhiDL(object):
if info_dict.get('%s_number' % field) is not None and not info_dict.get(field):
info_dict[field] = '%s %d' % (field.capitalize(), info_dict['%s_number' % field])
# Some fragmented media manifests like m3u8 allow embedding subtitles
# This is a weird hack to provide these subtitles to users without a very huge refactor of extractors
if 'formats' in info_dict:
formats_subtitles = list(filter(lambda x: x.get('_subtitle'), info_dict['formats']))
if formats_subtitles:
info_dict.setdefault('subtitles', {})
for sub in formats_subtitles:
if sub['_key'] not in info_dict['subtitles']:
info_dict['subtitles'][sub['_key']] = []
info_dict['subtitles'][sub['_key']].append(sub['_subtitle'])
# remove these subtitles from formats now
info_dict['formats'] = list(filter(lambda x: '_subtitle' not in x, info_dict['formats']))
for cc_kind in ('subtitles', 'automatic_captions'):
cc = info_dict.get(cc_kind)
if cc:
@ -1502,6 +1571,12 @@ class HaruhiDL(object):
for subtitle_format in subtitle:
if subtitle_format.get('url'):
subtitle_format['url'] = sanitize_url(subtitle_format['url'])
if subtitle_format.get('protocol') is None:
subtitle_format['protocol'] = determine_protocol(subtitle_format)
if subtitle_format.get('http_headers') is None:
full_info = info_dict.copy()
full_info.update(subtitle_format)
subtitle_format['http_headers'] = self._calc_headers(full_info)
if subtitle_format.get('ext') is None:
subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
@ -1604,7 +1679,7 @@ class HaruhiDL(object):
if req_format is None:
req_format = self._default_format_spec(info_dict, download=download)
if self.params.get('verbose'):
self.to_stdout('[debug] Default format spec: %s' % req_format)
self._write_string('[debug] Default format spec: %s\n' % req_format)
format_selector = self.build_format_selector(req_format)
@ -1614,7 +1689,7 @@ class HaruhiDL(object):
# by extractor are incomplete or not (i.e. whether extractor provides only
# video-only or audio-only formats) for proper formats selection for
# extractors with such incomplete formats (see
# https://github.com/ytdl-org/haruhi-dl/pull/5556).
# https://github.com/ytdl-org/youtube-dl/pull/5556).
# Since formats may be filtered during format selection and may not match
# the original formats the results may be incorrect. Thus original formats
# or pre-calculated metrics should be passed to format selection routines
@ -1622,7 +1697,7 @@ class HaruhiDL(object):
# We will pass a context object containing all necessary additional data
# instead of just formats.
# This fixes incorrect format selection issue (see
# https://github.com/ytdl-org/haruhi-dl/issues/10083).
# https://github.com/ytdl-org/youtube-dl/issues/10083).
incomplete_formats = (
# All formats are video-only or
all(f.get('vcodec') != 'none' and f.get('acodec') == 'none' for f in formats)
@ -1771,6 +1846,8 @@ class HaruhiDL(object):
os.makedirs(dn)
return True
except (OSError, IOError) as err:
if isinstance(err, OSError) and err.errno == errno.EEXIST:
return True
self.report_error('unable to create directory ' + error_to_compat_str(err))
return False
@ -1816,7 +1893,6 @@ class HaruhiDL(object):
# subtitles download errors are already managed as troubles in relevant IE
# that way it will silently go on when used with unsupporting IE
subtitles = info_dict['requested_subtitles']
ie = self.get_info_extractor(info_dict['extractor_key'])
for sub_lang, sub_info in subtitles.items():
sub_format = sub_info['ext']
sub_filename = subtitles_filename(filename, sub_lang, sub_format, info_dict.get('ext'))
@ -1827,7 +1903,7 @@ class HaruhiDL(object):
if sub_info.get('data') is not None:
try:
# Use newline='' to prevent conversion of newline characters
# See https://github.com/ytdl-org/haruhi-dl/issues/10268
# See https://github.com/ytdl-org/youtube-dl/issues/10268
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
subfile.write(sub_info['data'])
except (OSError, IOError):
@ -1835,10 +1911,8 @@ class HaruhiDL(object):
return
else:
try:
sub_data = ie._request_webpage(
sub_info['url'], info_dict['id'], note=False).read()
with io.open(encodeFilename(sub_filename), 'wb') as subfile:
subfile.write(sub_data)
subd = get_suitable_downloader(sub_info, self.params)(self, self.params)
subd.download(sub_filename, sub_info)
except (ExtractorError, IOError, OSError, ValueError) as err:
self.report_warning('Unable to download subtitle for "%s": %s' %
(sub_lang, error_to_compat_str(err)))
@ -1865,7 +1939,11 @@ class HaruhiDL(object):
for ph in self._progress_hooks:
fd.add_progress_hook(ph)
if self.params.get('verbose'):
self.to_stdout('[debug] Invoking downloader on %r' % info.get('url'))
self.to_screen('[debug] Invoking downloader on %r' % info.get('url'))
if info.get('protocol') == 'bittorrent' and not self.params.get('allow_p2p'):
raise HaruhiDLError('Peer-to-peer format got selected, but peer-to-peer '
'downloads are not allowed. '
'Choose a different format or add the --allow-p2p option')
return fd.download(name, info)
if info_dict.get('requested_formats') is not None:
@ -1882,8 +1960,32 @@ class HaruhiDL(object):
def compatible_formats(formats):
video, audio = formats
# Check extension
# Check extensions and codecs
video_ext, audio_ext = video.get('ext'), audio.get('ext')
video_codec, audio_codec = video.get('vcodec'), audio.get('acodec')
if video_codec and audio_codec:
COMPATIBLE_CODECS = {
'mp4': (
# fourcc (m3u8, mpd)
'av01', 'hevc', 'avc1', 'mp4a',
# whatever the ism does
'h264', 'aacl',
),
'webm': (
'av01', 'vp9', 'vp8', 'opus', 'vrbs',
# these are in the webm spec, so putting it here to be sure
'vp9x', 'vp8x',
),
}
video_codec = video_codec[:4].lower()
audio_codec = audio_codec[:4].lower()
for ext in COMPATIBLE_CODECS:
if all(codec in COMPATIBLE_CODECS[ext]
for codec in (video_codec, audio_codec)):
info_dict['ext'] = ext
return True
if video_ext and audio_ext:
COMPATIBLE_EXTS = (
('mp3', 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'ismv', 'isma'),
@ -1892,7 +1994,6 @@ class HaruhiDL(object):
for exts in COMPATIBLE_EXTS:
if video_ext in exts and audio_ext in exts:
return True
# TODO: Check acodec/vcodec
return False
filename_real_ext = os.path.splitext(filename)[1][1:]
@ -2246,7 +2347,7 @@ class HaruhiDL(object):
return
if type('') is not compat_str:
# Python 2.6 on SLES11 SP1 (https://github.com/ytdl-org/haruhi-dl/issues/3326)
# Python 2.6 on SLES11 SP1 (https://github.com/ytdl-org/youtube-dl/issues/3326)
self.report_warning(
'Your Python is broken! Update to a newer and supported version')
@ -2340,7 +2441,7 @@ class HaruhiDL(object):
proxies = {'http': opts_proxy, 'https': opts_proxy}
else:
proxies = compat_urllib_request.getproxies()
# Set HTTPS proxy to HTTP one if given (https://github.com/ytdl-org/haruhi-dl/issues/805)
# Set HTTPS proxy to HTTP one if given (https://github.com/ytdl-org/youtube-dl/issues/805)
if 'http' in proxies and 'https' not in proxies:
proxies['https'] = proxies['http']
proxy_handler = PerRequestProxyHandler(proxies)
@ -2354,7 +2455,7 @@ class HaruhiDL(object):
# When passing our own FileHandler instance, build_opener won't add the
# default FileHandler and allows us to disable the file protocol, which
# can be used for malicious purposes (see
# https://github.com/ytdl-org/haruhi-dl/issues/8227)
# https://github.com/ytdl-org/youtube-dl/issues/8227)
file_handler = compat_urllib_request.FileHandler()
def file_open(*args, **kwargs):
@ -2366,7 +2467,7 @@ class HaruhiDL(object):
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
# (See https://github.com/ytdl-org/haruhi-dl/issues/1309 for details)
# (See https://github.com/ytdl-org/youtube-dl/issues/1309 for details)
opener.addheaders = []
self._opener = opener
@ -2404,7 +2505,7 @@ class HaruhiDL(object):
thumb_ext = determine_ext(t['url'], 'jpg')
suffix = '_%s' % t['id'] if len(thumbnails) > 1 else ''
thumb_display_id = '%s ' % t['id'] if len(thumbnails) > 1 else ''
t['filename'] = thumb_filename = os.path.splitext(filename)[0] + suffix + '.' + thumb_ext
t['filename'] = thumb_filename = replace_extension(filename + suffix, thumb_ext, info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(thumb_filename)):
self.to_screen('[%s] %s: Thumbnail %sis already present' %
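
Two format-selection changes land in this file: the filter grammar now accepts a 'language' key, and compatible_formats() picks the merge container from vcodec/acodec pairs instead of file extensions alone. A minimal sketch of exercising both through the embedding API, assuming youtube-dl-style format syntax and a purely hypothetical URL:

    from haruhi_dl import HaruhiDL

    params = {
        # prefer an audio track tagged as English; the merged container is
        # then chosen via the codec table above (mp4 vs webm)
        'format': 'bestvideo+bestaudio[language=en]/best',
    }
    with HaruhiDL(params) as hdl:
        hdl.download(['https://example.com/some-video'])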

haruhi_dl/__init__.py

@ -1,7 +1,10 @@
#!/usr/bin/env python
#!/usr/bin/env python3
# coding: utf-8
from __future__ import unicode_literals
import sys
if sys.version_info[0] == 2:
sys.exit('haruhi-dl no longer works on Python 2, use Python 3 instead')
__license__ = 'LGPL-3.0-or-later'
@ -9,7 +12,6 @@ import codecs
import io
import os
import random
import sys
from .options import (
@ -48,7 +50,7 @@ from .HaruhiDL import HaruhiDL
def _real_main(argv=None):
# Compatibility fixes for Windows
if sys.platform == 'win32':
# https://github.com/ytdl-org/haruhi-dl/issues/820
# https://github.com/ytdl-org/youtube-dl/issues/820
codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)
workaround_optparse_bug9161()
@ -174,6 +176,10 @@ def _real_main(argv=None):
opts.max_sleep_interval = opts.sleep_interval
if opts.ap_mso and opts.ap_mso not in MSO_INFO:
parser.error('Unsupported TV Provider, use --ap-list-mso to get a list of supported TV Providers')
if opts.force_generic_extractor and opts.force_use_mastodon:
parser.error('cannot force both the generic extractor and Mastodon')
if opts.force_playwright_browser not in ('firefox', 'chromium', 'webkit', None):
parser.error('invalid browser forced, must be one of: firefox, chromium, webkit')
def parse_retries(retries):
if retries in ('inf', 'infinite'):
@ -340,11 +346,14 @@ def _real_main(argv=None):
'format': opts.format,
'listformats': opts.listformats,
'outtmpl': outtmpl,
'outtmpl_na_placeholder': opts.outtmpl_na_placeholder,
'autonumber_size': opts.autonumber_size,
'autonumber_start': opts.autonumber_start,
'restrictfilenames': opts.restrictfilenames,
'ignoreerrors': opts.ignoreerrors,
'force_generic_extractor': opts.force_generic_extractor,
'force_use_mastodon': opts.force_use_mastodon,
'ie_key': opts.ie_key,
'ratelimit': opts.ratelimit,
'nooverwrites': opts.nooverwrites,
'retries': opts.retries,
@ -417,12 +426,14 @@ def _real_main(argv=None):
'headless_playwright': opts.headless_playwright,
'sleep_interval': opts.sleep_interval,
'max_sleep_interval': opts.max_sleep_interval,
'force_playwright_browser': opts.force_playwright_browser,
'external_downloader': opts.external_downloader,
'list_thumbnails': opts.list_thumbnails,
'playlist_items': opts.playlist_items,
'xattr_set_filesize': opts.xattr_set_filesize,
'match_filter': match_filter,
'no_color': opts.no_color,
'use_proxy_sites': opts.use_proxy_sites,
'ffmpeg_location': opts.ffmpeg_location,
'hls_prefer_native': opts.hls_prefer_native,
'hls_use_mpegts': opts.hls_use_mpegts,
@ -434,6 +445,8 @@ def _real_main(argv=None):
'geo_bypass': opts.geo_bypass,
'geo_bypass_country': opts.geo_bypass_country,
'geo_bypass_ip_block': opts.geo_bypass_ip_block,
'allow_p2p': opts.allow_p2p if not opts.prefer_p2p else True,
'prefer_p2p': opts.prefer_p2p,
# just for deprecation check
'autonumber': opts.autonumber if opts.autonumber is True else None,
'usetitle': opts.usetitle if opts.usetitle is True else None,
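
A sketch of two of the new switches wired through above, set via the Python API instead of the CLI; the values and the URL are illustrative assumptions, not defaults:

    from haruhi_dl import HaruhiDL

    params = {
        'outtmpl': '%(title)s-%(id)s.%(ext)s',
        'outtmpl_na_placeholder': 'unknown',  # used in place of the hardcoded 'NA'
        'allow_p2p': True,                    # let bittorrent formats be downloaded
    }
    HaruhiDL(params).download(['https://example.com/some-video'])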

haruhi_dl/__main__.py

@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3
from __future__ import unicode_literals
# Execute with
@ -7,6 +7,9 @@ from __future__ import unicode_literals
import sys
if sys.version_info[0] == 2:
sys.exit('haruhi-dl no longer works on Python 2, use Python 3 instead')
if __package__ is None and not hasattr(sys, 'frozen'):
# direct call of __main__.py
import os.path

File diff suppressed because it is too large

haruhi_dl/downloader/__init__.py

@ -1,5 +1,18 @@
from __future__ import unicode_literals
from ..utils import (
determine_protocol,
)
def _get_real_downloader(info_dict, protocol=None, *args, **kwargs):
info_copy = info_dict.copy()
if protocol:
info_copy['protocol'] = protocol
return get_suitable_downloader(info_copy, *args, **kwargs)
# Some of these require _get_real_downloader
from .common import FileDownloader
from .f4m import F4mFD
from .hls import HlsFD
@ -8,15 +21,13 @@ from .rtmp import RtmpFD
from .dash import DashSegmentsFD
from .rtsp import RtspFD
from .ism import IsmFD
from .niconico import NiconicoDmcFD
from .external import (
get_external_downloader,
Aria2cFD,
FFmpegFD,
)
from ..utils import (
determine_protocol,
)
PROTOCOL_MAP = {
'rtmp': RtmpFD,
'm3u8_native': HlsFD,
@ -26,6 +37,8 @@ PROTOCOL_MAP = {
'f4m': F4mFD,
'http_dash_segments': DashSegmentsFD,
'ism': IsmFD,
'bittorrent': Aria2cFD,
'niconico_dmc': NiconicoDmcFD,
}
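
A short sketch of how the internal _get_real_downloader() helper defined above resolves a concrete downloader class; the manifest URL is a made-up example, and the comments state the expected lookups:

    from haruhi_dl.downloader import _get_real_downloader

    info = {'url': 'https://example.com/stream.m3u8', 'protocol': 'm3u8_native'}
    # with no override, PROTOCOL_MAP should resolve m3u8_native to HlsFD
    hls_fd = _get_real_downloader(info)
    # forcing the protocol routes the same dict to Aria2cFD ('bittorrent' above)
    torrent_fd = _get_real_downloader(info, protocol='bittorrent')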

haruhi_dl/downloader/external.py

@ -182,15 +182,16 @@ class Aria2cFD(ExternalFD):
AVAILABLE_OPT = '-v'
def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '-c']
cmd = [self.exe or 'aria2c', '-c']
cmd += self._configuration_args([
'--min-split-size', '1M', '--max-connection-per-server', '4'])
dn = os.path.dirname(tmpfilename)
if dn:
cmd += ['--dir', dn]
cmd += ['--out', os.path.basename(tmpfilename)]
for key, val in info_dict['http_headers'].items():
cmd += ['--header', '%s: %s' % (key, val)]
if info_dict['protocol'] != 'bittorrent':
for key, val in info_dict['http_headers'].items():
cmd += ['--header', '%s: %s' % (key, val)]
cmd += self._option('--interface', 'source_address')
cmd += self._option('--all-proxy', 'proxy')
cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
@ -240,7 +241,7 @@ class FFmpegFD(ExternalFD):
# setting -seekable prevents ffmpeg from guessing if the server
# supports seeking(by adding the header `Range: bytes=0-`), which
# can cause problems in some cases
# https://github.com/ytdl-org/haruhi-dl/issues/11800#issuecomment-275037127
# https://github.com/ytdl-org/youtube-dl/issues/11800#issuecomment-275037127
# http://trac.ffmpeg.org/ticket/6125#comment:10
args += ['-seekable', '1' if seekable else '0']
@ -317,7 +318,9 @@ class FFmpegFD(ExternalFD):
args += ['-fs', compat_str(self._TEST_FILE_SIZE)]
if protocol in ('m3u8', 'm3u8_native'):
if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
if info_dict['ext'] == 'vtt':
args += ['-f', 'webvtt']
elif self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
args += ['-f', 'mpegts']
else:
args += ['-f', 'mp4']
@ -341,7 +344,7 @@ class FFmpegFD(ExternalFD):
# mp4 file couldn't be played, but if we ask ffmpeg to quit it
# produces a file that is playable (this is mostly useful for live
# streams). Note that Windows is not affected and produces playable
# files (see https://github.com/ytdl-org/haruhi-dl/issues/8300).
# files (see https://github.com/ytdl-org/youtube-dl/issues/8300).
if sys.platform != 'win32':
proc.communicate(b'q')
raise
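
The Aria2cFD change above forwards HTTP headers only for non-torrent transfers, since they are meaningless to the BitTorrent protocol. A reduced sketch of that decision, with names mirroring the diff:

    def header_args(info_dict):
        cmd = []
        if info_dict['protocol'] != 'bittorrent':
            for key, val in info_dict.get('http_headers', {}).items():
                cmd += ['--header', '%s: %s' % (key, val)]
        return cmd

    assert header_args({'protocol': 'bittorrent'}) == []  # no --header flags for torrents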

haruhi_dl/downloader/f4m.py

@ -324,8 +324,8 @@ class F4mFD(FragmentFD):
urlh = self.hdl.urlopen(self._prepare_url(info_dict, man_url))
man_url = urlh.geturl()
# Some manifests may be malformed, e.g. prosiebensat1 generated manifests
# (see https://github.com/ytdl-org/haruhi-dl/issues/6215#issuecomment-121704244
# and https://github.com/ytdl-org/haruhi-dl/issues/7823)
# (see https://github.com/ytdl-org/youtube-dl/issues/6215#issuecomment-121704244
# and https://github.com/ytdl-org/youtube-dl/issues/7823)
manifest = fix_xml_ampersands(urlh.read().decode('utf-8', 'ignore')).strip()
doc = compat_etree_fromstring(manifest)
@ -409,7 +409,7 @@ class F4mFD(FragmentFD):
# In tests, segments may be truncated, and thus
# FlvReader may not be able to parse the whole
# chunk. If so, write the segment as is
# See https://github.com/ytdl-org/haruhi-dl/issues/9214
# See https://github.com/ytdl-org/youtube-dl/issues/9214
dest_stream.write(down_data)
break
raise

haruhi_dl/downloader/fragment.py

@ -97,12 +97,15 @@ class FragmentFD(FileDownloader):
def _download_fragment(self, ctx, frag_url, info_dict, headers=None):
fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], ctx['fragment_index'])
success = ctx['dl'].download(fragment_filename, {
fragment_info_dict = {
'url': frag_url,
'http_headers': headers or info_dict.get('http_headers'),
})
}
success = ctx['dl'].download(fragment_filename, fragment_info_dict)
if not success:
return False, None
if fragment_info_dict.get('filetime'):
ctx['fragment_filetime'] = fragment_info_dict.get('filetime')
down, frag_sanitized = sanitize_open(fragment_filename, 'rb')
ctx['fragment_filename_sanitized'] = frag_sanitized
frag_content = down.read()
@ -258,6 +261,13 @@ class FragmentFD(FileDownloader):
downloaded_bytes = ctx['complete_frags_downloaded_bytes']
else:
self.try_rename(ctx['tmpfilename'], ctx['filename'])
if self.params.get('updatetime', True):
filetime = ctx.get('fragment_filetime')
if filetime:
try:
os.utime(ctx['filename'], (time.time(), filetime))
except Exception:
pass
downloaded_bytes = os.path.getsize(encodeFilename(ctx['filename']))
self._hook_progress({
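
A minimal sketch of the mtime propagation added above: when a fragment download reported a Last-Modified time ('filetime'), it is stamped onto the assembled file, best-effort:

    import os
    import time

    def apply_fragment_filetime(filename, fragment_filetime):
        if fragment_filetime:
            try:
                # keep atime current, set mtime to the server-reported value
                os.utime(filename, (time.time(), fragment_filetime))
            except Exception:
                pass  # mirroring the diff: failures are silently ignored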

haruhi_dl/downloader/hls.py

@ -42,11 +42,13 @@ class HlsFD(FragmentFD):
# no segments will definitely be appended to the end of the playlist.
# r'#EXT-X-PLAYLIST-TYPE:EVENT', # media segments may be appended to the end of
# # event media playlists [4]
r'#EXT-X-MAP:', # media initialization [5]
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4
# 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2
# 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2
# 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5
# 5. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.5
)
check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
is_aes128_enc = '#EXT-X-KEY:METHOD=AES-128' in manifest
@ -152,8 +154,8 @@ class HlsFD(FragmentFD):
except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporary) fragments may be served.
# First we try to retry then either skip or abort.
# See https://github.com/ytdl-org/haruhi-dl/issues/10165,
# https://github.com/ytdl-org/haruhi-dl/issues/10448).
# See https://github.com/ytdl-org/youtube-dl/issues/10165,
# https://github.com/ytdl-org/youtube-dl/issues/10448).
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
@ -170,8 +172,12 @@ class HlsFD(FragmentFD):
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.hdl.urlopen(
self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
# Don't decrypt the content in tests since the data is explicitly truncated and it's not to a valid block
# size (see https://github.com/ytdl-org/youtube-dl/pull/27660). Tests only care that the correct data downloaded,
# not what it decrypts to.
if not test:
frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
self._append_fragment(ctx, frag_content)
# We only download the first fragment during the test
if test:
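
For reference, a sketch of the AES-128-CBC decryption that the test path now skips; as in the code above, the IV falls back to the big-endian media sequence number when the playlist does not specify one (requires pycryptodome):

    import struct
    from Crypto.Cipher import AES

    def decrypt_fragment(frag_content, key, iv, media_sequence):
        if iv is None:
            # 16 bytes: 8 zero bytes followed by a 64-bit big-endian sequence number
            iv = struct.pack('>8xq', media_sequence)
        return AES.new(key, AES.MODE_CBC, iv).decrypt(frag_content)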

haruhi_dl/downloader/http.py

@ -109,14 +109,16 @@ class HttpFD(FileDownloader):
try:
ctx.data = self.hdl.urlopen(request)
except (compat_urllib_error.URLError, ) as err:
if isinstance(err.reason, socket.timeout):
# reason may not be available, e.g. for urllib2.HTTPError on python 2.6
reason = getattr(err, 'reason', None)
if isinstance(reason, socket.timeout):
raise RetryDownload(err)
raise err
# When trying to resume, Content-Range HTTP header of response has to be checked
# to match the value of requested Range HTTP header. This is due to a webservers
# that don't support resuming and serve a whole file with no Content-Range
# set in response despite of requested Range (see
# https://github.com/ytdl-org/haruhi-dl/issues/6057#issuecomment-126129799)
# https://github.com/ytdl-org/youtube-dl/issues/6057#issuecomment-126129799)
if has_range:
content_range = ctx.data.headers.get('Content-Range')
if content_range:

haruhi_dl/downloader/niconico.py

@ -0,0 +1,55 @@
# coding: utf-8
from __future__ import unicode_literals
import threading
from .common import FileDownloader
from ..downloader import _get_real_downloader
from ..extractor.niconico import NiconicoIE
from ..compat import compat_urllib_request
class NiconicoDmcFD(FileDownloader):
""" Downloading niconico douga from DMC with heartbeat """
FD_NAME = 'niconico_dmc'
def real_download(self, filename, info_dict):
self.to_screen('[%s] Downloading from DMC' % self.FD_NAME)
ie = NiconicoIE(self.hdl)
info_dict, heartbeat_info_dict = ie._get_heartbeat_info(info_dict)
fd = _get_real_downloader(info_dict, params=self.params)(self.hdl, self.params)
success = download_complete = False
timer = [None]
heartbeat_lock = threading.Lock()
heartbeat_url = heartbeat_info_dict['url']
heartbeat_data = heartbeat_info_dict['data'].encode()
heartbeat_interval = heartbeat_info_dict.get('interval', 30)
def heartbeat():
try:
compat_urllib_request.urlopen(url=heartbeat_url, data=heartbeat_data)
except Exception:
self.to_screen('[%s] Heartbeat failed' % self.FD_NAME)
with heartbeat_lock:
if not download_complete:
timer[0] = threading.Timer(heartbeat_interval, heartbeat)
timer[0].start()
heartbeat_info_dict['ping']()
self.to_screen('[%s] Heartbeat with %d second interval ...' % (self.FD_NAME, heartbeat_interval))
try:
heartbeat()
if type(fd).__name__ == 'HlsFD':
info_dict.update(ie._extract_m3u8_formats(info_dict['url'], info_dict['id'])[0])
success = fd.real_download(filename, info_dict)
finally:
if heartbeat_lock:
with heartbeat_lock:
timer[0].cancel()
download_complete = True
return success
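
The heartbeat above is a re-arming threading.Timer: each run pings the DMC session and schedules the next run until the download flips a shared flag. The pattern reduced to its core, with illustrative names:

    import threading

    def start_heartbeat(ping, interval, lock, state):
        # state is a dict like {'done': False, 'timer': None}
        def beat():
            ping()  # e.g. POST the keep-alive request
            with lock:
                if not state['done']:
                    state['timer'] = threading.Timer(interval, beat)
                    state['timer'].start()
        beat()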

haruhi_dl/extractor/abcnews.py

@ -1,14 +1,15 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import re
import time
from .amp import AMPIE
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..compat import compat_urlparse
from ..utils import (
parse_duration,
parse_iso8601,
try_get,
)
class AbcNewsVideoIE(AMPIE):
@ -18,8 +19,8 @@ class AbcNewsVideoIE(AMPIE):
(?:
abcnews\.go\.com/
(?:
[^/]+/video/(?P<display_id>[0-9a-z-]+)-|
video/embed\?.*?\bid=
(?:[^/]+/)*video/(?P<display_id>[0-9a-z-]+)-|
video/(?:embed|itemfeed)\?.*?\bid=
)|
fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/
)
@ -36,6 +37,8 @@ class AbcNewsVideoIE(AMPIE):
'description': 'George Stephanopoulos goes one-on-one with Iranian Foreign Minister Dr. Javad Zarif.',
'duration': 180,
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1380454200,
'upload_date': '20130929',
},
'params': {
# m3u8 download
@ -47,6 +50,12 @@ class AbcNewsVideoIE(AMPIE):
}, {
'url': 'http://abcnews.go.com/2020/video/2020-husband-stands-teacher-jail-student-affairs-26119478',
'only_matching': True,
}, {
'url': 'http://abcnews.go.com/video/itemfeed?id=46979033',
'only_matching': True,
}, {
'url': 'https://abcnews.go.com/GMA/News/video/history-christmas-story-67894761',
'only_matching': True,
}]
def _real_extract(self, url):
@ -67,28 +76,23 @@ class AbcNewsIE(InfoExtractor):
_VALID_URL = r'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
_TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',
# Youtube Embeds
'url': 'https://abcnews.go.com/Entertainment/peter-billingsley-child-actor-christmas-story-hollywood-power/story?id=51286501',
'info_dict': {
'id': '10505354',
'ext': 'flv',
'display_id': 'dramatic-video-rare-death-job-america',
'title': 'Occupational Hazards',
'description': 'Nightline investigates the dangers that lurk at various jobs.',
'thumbnail': r're:^https?://.*\.jpg$',
'upload_date': '20100428',
'timestamp': 1272412800,
'id': '51286501',
'title': "Peter Billingsley: From child actor in 'A Christmas Story' to Hollywood power player",
'description': 'Billingsley went from a child actor to Hollywood power player.',
},
'add_ie': ['AbcNewsVideo'],
'playlist_count': 5,
}, {
'url': 'http://abcnews.go.com/Entertainment/justin-timberlake-performs-stop-feeling-eurovision-2016/story?id=39125818',
'info_dict': {
'id': '38897857',
'ext': 'mp4',
'display_id': 'justin-timberlake-performs-stop-feeling-eurovision-2016',
'title': 'Justin Timberlake Drops Hints For Secret Single',
'description': 'Lara Spencer reports the buzziest stories of the day in "GMA" Pop News.',
'upload_date': '20160515',
'timestamp': 1463329500,
'upload_date': '20160505',
'timestamp': 1462442280,
},
'params': {
# m3u8 download
@ -100,49 +104,55 @@ class AbcNewsIE(InfoExtractor):
}, {
'url': 'http://abcnews.go.com/Technology/exclusive-apple-ceo-tim-cook-iphone-cracking-software/story?id=37173343',
'only_matching': True,
}, {
# inline.type == 'video'
'url': 'http://abcnews.go.com/Technology/exclusive-apple-ceo-tim-cook-iphone-cracking-software/story?id=37173343',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
video_id = mobj.group('id')
story_id = self._match_id(url)
webpage = self._download_webpage(url, story_id)
story = self._parse_json(self._search_regex(
r"window\['__abcnews__'\]\s*=\s*({.+?});",
webpage, 'data'), story_id)['page']['content']['story']['everscroll'][0]
article_contents = story.get('articleContents') or {}
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r'window\.abcnvideo\.url\s*=\s*"([^"]+)"', webpage, 'video URL')
full_video_url = compat_urlparse.urljoin(url, video_url)
def entries():
featured_video = story.get('featuredVideo') or {}
feed = try_get(featured_video, lambda x: x['video']['feed'])
if feed:
yield {
'_type': 'url',
'id': featured_video.get('id'),
'title': featured_video.get('name'),
'url': feed,
'thumbnail': featured_video.get('images'),
'description': featured_video.get('description'),
'timestamp': parse_iso8601(featured_video.get('uploadDate')),
'duration': parse_duration(featured_video.get('duration')),
'ie_key': AbcNewsVideoIE.ie_key(),
}
youtube_url = YoutubeIE._extract_url(webpage)
for inline in (article_contents.get('inlines') or []):
inline_type = inline.get('type')
if inline_type == 'iframe':
iframe_url = try_get(inline, lambda x: x['attrs']['src'])
if iframe_url:
yield self.url_result(iframe_url)
elif inline_type == 'video':
video_id = inline.get('id')
if video_id:
yield {
'_type': 'url',
'id': video_id,
'url': 'http://abcnews.go.com/video/embed?id=' + video_id,
'thumbnail': inline.get('imgSrc') or inline.get('imgDefault'),
'description': inline.get('description'),
'duration': parse_duration(inline.get('duration')),
'ie_key': AbcNewsVideoIE.ie_key(),
}
timestamp = None
date_str = self._html_search_regex(
r'<span[^>]+class="timestamp">([^<]+)</span>',
webpage, 'timestamp', fatal=False)
if date_str:
tz_offset = 0
if date_str.endswith(' ET'): # Eastern Time
tz_offset = -5
date_str = date_str[:-3]
date_formats = ['%b. %d, %Y', '%b %d, %Y, %I:%M %p']
for date_format in date_formats:
try:
timestamp = calendar.timegm(time.strptime(date_str.strip(), date_format))
except ValueError:
continue
if timestamp is not None:
timestamp -= tz_offset * 3600
entry = {
'_type': 'url_transparent',
'ie_key': AbcNewsVideoIE.ie_key(),
'url': full_video_url,
'id': video_id,
'display_id': display_id,
'timestamp': timestamp,
}
if youtube_url:
entries = [entry, self.url_result(youtube_url, ie=YoutubeIE.ie_key())]
return self.playlist_result(entries)
return entry
return self.playlist_result(
entries(), story_id, article_contents.get('headline'),
article_contents.get('subHead'))

haruhi_dl/extractor/acast.py

@ -2,21 +2,52 @@
from __future__ import unicode_literals
import re
import functools
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
clean_podcast_url,
float_or_none,
int_or_none,
try_get,
unified_timestamp,
OnDemandPagedList,
js_to_json,
parse_iso8601,
urljoin,
ExtractorError,
)
class ACastIE(InfoExtractor):
class ACastBaseIE(InfoExtractor):
def _extract_episode(self, episode, show_info):
title = episode['title']
info = {
'id': episode['id'],
'display_id': episode.get('episodeUrl'),
'url': clean_podcast_url(episode['url']),
'title': title,
'description': clean_html(episode.get('description') or episode.get('summary')),
'thumbnail': episode.get('image'),
'timestamp': parse_iso8601(episode.get('publishDate')),
'duration': int_or_none(episode.get('duration')),
'filesize': int_or_none(episode.get('contentLength')),
'season_number': int_or_none(episode.get('season')),
'episode': title,
'episode_number': int_or_none(episode.get('episode')),
}
info.update(show_info)
return info
def _extract_show_info(self, show):
return {
'creator': show.get('author'),
'series': show.get('title'),
}
def _call_api(self, path, video_id, query=None):
return self._download_json(
'https://feeder.acast.com/api/v1/shows/' + path, video_id, query=query)
class ACastIE(ACastBaseIE):
IE_NAME = 'acast'
_VALID_URL = r'''(?x)
https?://
@ -28,15 +59,15 @@ class ACastIE(InfoExtractor):
'''
_TESTS = [{
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
'md5': '16d936099ec5ca2d5869e3a813ee8dc4',
'md5': 'f5598f3ad1e4776fed12ec1407153e4b',
'info_dict': {
'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
'ext': 'mp3',
'title': '2. Raggarmordet - Röster ur det förflutna',
'description': 'md5:4f81f6d8cf2e12ee21a321d8bca32db4',
'description': 'md5:a992ae67f4d98f1c0141598f7bebbf67',
'timestamp': 1477346700,
'upload_date': '20161024',
'duration': 2766.602563,
'duration': 2766,
'creator': 'Anton Berg & Martin Johnson',
'series': 'Spår',
'episode': '2. Raggarmordet - Röster ur det förflutna',
@ -45,7 +76,7 @@ class ACastIE(InfoExtractor):
'url': 'http://embed.acast.com/adambuxton/ep.12-adam-joeschristmaspodcast2015',
'only_matching': True,
}, {
'url': 'https://play.acast.com/s/rattegangspodden/s04e09-styckmordet-i-helenelund-del-22',
'url': 'https://play.acast.com/s/rattegangspodden/s04e09styckmordetihelenelund-del2-2',
'only_matching': True,
}, {
'url': 'https://play.acast.com/s/sparpodcast/2a92b283-1a75-4ad8-8396-499c641de0d9',
@ -54,40 +85,14 @@ class ACastIE(InfoExtractor):
def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups()
s = self._download_json(
'https://feeder.acast.com/api/v1/shows/%s/episodes/%s' % (channel, display_id),
display_id)
media_url = s['url']
if re.search(r'[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}', display_id):
episode_url = s.get('episodeUrl')
if episode_url:
display_id = episode_url
else:
channel, display_id = re.match(self._VALID_URL, s['link']).groups()
cast_data = self._download_json(
'https://play-api.acast.com/splash/%s/%s' % (channel, display_id),
display_id)['result']
e = cast_data['episode']
title = e.get('name') or s['title']
return {
'id': compat_str(e['id']),
'display_id': display_id,
'url': media_url,
'title': title,
'description': e.get('summary') or clean_html(e.get('description') or s.get('description')),
'thumbnail': e.get('image'),
'timestamp': unified_timestamp(e.get('publishingDate') or s.get('publishDate')),
'duration': float_or_none(e.get('duration') or s.get('duration')),
'filesize': int_or_none(e.get('contentLength')),
'creator': try_get(cast_data, lambda x: x['show']['author'], compat_str),
'series': try_get(cast_data, lambda x: x['show']['name'], compat_str),
'season_number': int_or_none(e.get('seasonNumber')),
'episode': title,
'episode_number': int_or_none(e.get('episodeNumber')),
}
episode = self._call_api(
'%s/episodes/%s' % (channel, display_id),
display_id, {'showInfo': 'true'})
return self._extract_episode(
episode, self._extract_show_info(episode.get('show') or {}))
class ACastChannelIE(InfoExtractor):
class ACastChannelIE(ACastBaseIE):
IE_NAME = 'acast:channel'
_VALID_URL = r'''(?x)
https?://
@ -102,34 +107,97 @@ class ACastChannelIE(InfoExtractor):
'info_dict': {
'id': '4efc5294-5385-4847-98bd-519799ce5786',
'title': 'Today in Focus',
'description': 'md5:9ba5564de5ce897faeb12963f4537a64',
'description': 'md5:c09ce28c91002ce4ffce71d6504abaae',
},
'playlist_mincount': 35,
'playlist_mincount': 200,
}, {
'url': 'http://play.acast.com/s/ft-banking-weekly',
'only_matching': True,
}]
_API_BASE_URL = 'https://play.acast.com/api/'
_PAGE_SIZE = 10
@classmethod
def suitable(cls, url):
return False if ACastIE.suitable(url) else super(ACastChannelIE, cls).suitable(url)
def _fetch_page(self, channel_slug, page):
casts = self._download_json(
self._API_BASE_URL + 'channels/%s/acasts?page=%s' % (channel_slug, page),
channel_slug, note='Download page %d of channel data' % page)
for cast in casts:
yield self.url_result(
'https://play.acast.com/s/%s/%s' % (channel_slug, cast['url']),
'ACast', cast['id'])
def _real_extract(self, url):
show_slug = self._match_id(url)
show = self._call_api(show_slug, show_slug)
show_info = self._extract_show_info(show)
entries = []
for episode in (show.get('episodes') or []):
entries.append(self._extract_episode(episode, show_info))
return self.playlist_result(
entries, show.get('id'), show.get('title'), show.get('description'))
class ACastPlayerIE(InfoExtractor):
IE_NAME = 'acast:player'
_VALID_URL = r'https?://player\.acast\.com/(?:[^/]+/episodes/)?(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://player.acast.com/600595844cac453f8579eca0/episodes/maciej-konieczny-podatek-medialny-to-mechanizm-kontroli?theme=default&latest=1',
'info_dict': {
'id': '601dc897fb37095537d48e6f',
'ext': 'mp3',
'title': 'Maciej Konieczny: "Podatek medialny to bardziej mechanizm kontroli niż podatkowy”',
'upload_date': '20210208',
'timestamp': 1612764000,
},
}, {
'url': 'https://player.acast.com/5d09057251a90dcf7fa8e985?theme=default&latest=1',
'info_dict': {
'id': '5d09057251a90dcf7fa8e985',
'title': 'DGPtalk: Obiektywnie o biznesie',
},
'playlist_mincount': 5,
}]
@staticmethod
def _extract_urls(webpage, **kw):
return [mobj.group('url')
for mobj in re.finditer(
r'(?x)<iframe\b[^>]+\bsrc=(["\'])(?P<url>%s(?:\?[^#]+)?(?:\#.+?)?)\1' % ACastPlayerIE._VALID_URL,
webpage)]
def _real_extract(self, url):
channel_slug = self._match_id(url)
channel_data = self._download_json(
self._API_BASE_URL + 'channels/%s' % channel_slug, channel_slug)
entries = OnDemandPagedList(functools.partial(
self._fetch_page, channel_slug), self._PAGE_SIZE)
return self.playlist_result(entries, compat_str(
channel_data['id']), channel_data['name'], channel_data.get('description'))
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
data = self._parse_json(
js_to_json(
self._search_regex(
r'(?s)var _global\s*=\s*({.+?});',
webpage, 'podcast data')), display_id)
show = data['show']
players = [{
'id': player['_id'],
'title': player['title'],
'url': player['audio'],
'duration': float_or_none(player.get('duration')),
'timestamp': parse_iso8601(player.get('publishDate')),
'thumbnail': urljoin('https://player.acast.com/', player.get('cover')),
'series': show['title'],
'episode': player['title'],
} for player in data['player']]
if len(players) > 1:
info_dict = {
'_type': 'playlist',
'entries': players,
'id': show['_id'],
'title': show['title'],
'series': show['title'],
}
if show.get('cover'):
info_dict['thumbnails'] = [{
'url': urljoin('https://player.acast.com/', show['cover']['url']),
'filesize': int_or_none(show['cover'].get('size')),
}]
return info_dict
if len(players) == 1:
return players[0]
raise ExtractorError('No podcast episodes found')

haruhi_dl/extractor/adn.py

@ -10,6 +10,7 @@ import random
from .common import InfoExtractor
from ..aes import aes_cbc_decrypt
from ..compat import (
compat_HTTPError,
compat_b64decode,
compat_ord,
)
@ -18,11 +19,14 @@ from ..utils import (
bytes_to_long,
ExtractorError,
float_or_none,
int_or_none,
intlist_to_bytes,
long_to_bytes,
pkcs1pad,
strip_or_none,
urljoin,
try_get,
unified_strdate,
urlencode_postdata,
)
@ -31,16 +35,30 @@ class ADNIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?animedigitalnetwork\.fr/video/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://animedigitalnetwork.fr/video/blue-exorcist-kyoto-saga/7778-episode-1-debut-des-hostilites',
'md5': 'e497370d847fd79d9d4c74be55575c7a',
'md5': '0319c99885ff5547565cacb4f3f9348d',
'info_dict': {
'id': '7778',
'ext': 'mp4',
'title': 'Blue Exorcist - Kyôto Saga - Épisode 1',
'title': 'Blue Exorcist - Kyôto Saga - Episode 1',
'description': 'md5:2f7b5aa76edbc1a7a92cedcda8a528d5',
'series': 'Blue Exorcist - Kyôto Saga',
'duration': 1467,
'release_date': '20170106',
'comment_count': int,
'average_rating': float,
'season_number': 2,
'episode': 'Début des hostilités',
'episode_number': 1,
}
}
_NETRC_MACHINE = 'animedigitalnetwork'
_BASE_URL = 'http://animedigitalnetwork.fr'
_RSA_KEY = (0xc35ae1e4356b65a73b551493da94b8cb443491c0aa092a357a5aee57ffc14dda85326f42d716e539a34542a0d3f363adf16c5ec222d713d5997194030ee2e4f0d1fb328c01a81cf6868c090d50de8e169c6b13d1675b9eeed1cbc51e1fffca9b38af07f37abd790924cd3bee59d0257cfda4fe5f3f0534877e21ce5821447d1b, 65537)
_API_BASE_URL = 'https://gw.api.animedigitalnetwork.fr/'
_PLAYER_BASE_URL = _API_BASE_URL + 'player/'
_HEADERS = {}
_LOGIN_ERR_MESSAGE = 'Unable to log in'
_RSA_KEY = (0x9B42B08905199A5CCE2026274399CA560ECB209EE9878A708B1C0812E1BB8CB5D1FB7441861147C1A1F2F3A0476DD63A9CAC20D3E983613346850AA6CB38F16DC7D720FD7D86FC6E5B3D5BBC72E14CD0BF9E869F2CEA2CCAD648F1DCE38F1FF916CEFB2D339B64AA0264372344BC775E265E8A852F88144AB0BD9AA06C1A4ABB, 65537)
_POS_ALIGN_MAP = {
'start': 1,
'end': 3,
@ -54,26 +72,24 @@ class ADNIE(InfoExtractor):
def _ass_subtitles_timecode(seconds):
return '%01d:%02d:%02d.%02d' % (seconds / 3600, (seconds % 3600) / 60, seconds % 60, (seconds % 1) * 100)
def _get_subtitles(self, sub_path, video_id):
if not sub_path:
def _get_subtitles(self, sub_url, video_id):
if not sub_url:
return None
enc_subtitles = self._download_webpage(
urljoin(self._BASE_URL, sub_path),
video_id, 'Downloading subtitles location', fatal=False) or '{}'
sub_url, video_id, 'Downloading subtitles location', fatal=False) or '{}'
subtitle_location = (self._parse_json(enc_subtitles, video_id, fatal=False) or {}).get('location')
if subtitle_location:
enc_subtitles = self._download_webpage(
urljoin(self._BASE_URL, subtitle_location),
video_id, 'Downloading subtitles data', fatal=False,
headers={'Origin': 'https://animedigitalnetwork.fr'})
subtitle_location, video_id, 'Downloading subtitles data',
fatal=False, headers={'Origin': 'https://animedigitalnetwork.fr'})
if not enc_subtitles:
return None
# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
bytes_to_intlist(compat_b64decode(enc_subtitles[24:])),
bytes_to_intlist(binascii.unhexlify(self._K + '4b8ef13ec1872730')),
bytes_to_intlist(binascii.unhexlify(self._K + 'ab9f52f5baae7c72')),
bytes_to_intlist(compat_b64decode(enc_subtitles[:24]))
))
subtitles_json = self._parse_json(
@ -117,61 +133,100 @@ Format: Marked,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text'''
}])
return subtitles
def _real_initialize(self):
username, password = self._get_login_info()
if not username:
return
try:
access_token = (self._download_json(
self._API_BASE_URL + 'authentication/login', None,
'Logging in', self._LOGIN_ERR_MESSAGE, fatal=False,
data=urlencode_postdata({
'password': password,
'rememberMe': False,
'source': 'Web',
'username': username,
})) or {}).get('accessToken')
if access_token:
self._HEADERS = {'authorization': 'Bearer ' + access_token}
except ExtractorError as e:
message = None
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
resp = self._parse_json(
e.cause.read().decode(), None, fatal=False) or {}
message = resp.get('message') or resp.get('code')
self.report_warning(message or self._LOGIN_ERR_MESSAGE)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
player_config = self._parse_json(self._search_regex(
r'playerConfig\s*=\s*({.+});', webpage,
'player config', default='{}'), video_id, fatal=False)
if not player_config:
config_url = urljoin(self._BASE_URL, self._search_regex(
r'(?:id="player"|class="[^"]*adn-player-container[^"]*")[^>]+data-url="([^"]+)"',
webpage, 'config url'))
player_config = self._download_json(
config_url, video_id,
'Downloading player config JSON metadata')['player']
video_base_url = self._PLAYER_BASE_URL + 'video/%s/' % video_id
player = self._download_json(
video_base_url + 'configuration', video_id,
'Downloading player config JSON metadata',
headers=self._HEADERS)['player']
options = player['options']
video_info = {}
video_info_str = self._search_regex(
r'videoInfo\s*=\s*({.+});', webpage,
'video info', fatal=False)
if video_info_str:
video_info = self._parse_json(
video_info_str, video_id, fatal=False) or {}
user = options['user']
if not user.get('hasAccess'):
self.raise_login_required()
options = player_config.get('options') or {}
metas = options.get('metas') or {}
links = player_config.get('links') or {}
sub_path = player_config.get('subtitles')
error = None
if not links:
links_url = player_config.get('linksurl') or options['videoUrl']
token = options['token']
self._K = ''.join([random.choice('0123456789abcdef') for _ in range(16)])
message = bytes_to_intlist(json.dumps({
'k': self._K,
'e': 60,
't': token,
}))
token = self._download_json(
user.get('refreshTokenUrl') or (self._PLAYER_BASE_URL + 'refresh/token'),
video_id, 'Downloading access token', headers={
'x-player-refresh-token': user['refreshToken']
}, data=b'')['token']
links_url = try_get(options, lambda x: x['video']['url']) or (video_base_url + 'link')
self._K = ''.join([random.choice('0123456789abcdef') for _ in range(16)])
message = bytes_to_intlist(json.dumps({
'k': self._K,
't': token,
}))
# Sometimes authentication fails for no good reason, retry with
# a different random padding
links_data = None
for _ in range(3):
padded_message = intlist_to_bytes(pkcs1pad(message, 128))
n, e = self._RSA_KEY
encrypted_message = long_to_bytes(pow(bytes_to_long(padded_message), e, n))
authorization = base64.b64encode(encrypted_message).decode()
links_data = self._download_json(
urljoin(self._BASE_URL, links_url), video_id,
'Downloading links JSON metadata', headers={
'Authorization': 'Bearer ' + authorization,
})
links = links_data.get('links') or {}
metas = metas or links_data.get('meta') or {}
sub_path = sub_path or links_data.get('subtitles') or \
'index.php?option=com_vodapi&task=subtitles.getJSON&format=json&id=' + video_id
sub_path += '&token=' + token
error = links_data.get('error')
title = metas.get('title') or video_info['title']
try:
links_data = self._download_json(
links_url, video_id, 'Downloading links JSON metadata', headers={
'X-Player-Token': authorization
}, query={
'freeWithAds': 'true',
'adaptive': 'false',
'withMetadata': 'true',
'source': 'Web'
})
break
except ExtractorError as e:
if not isinstance(e.cause, compat_HTTPError):
raise e
if e.cause.code == 401:
# This usually goes away with a different random pkcs1pad, so retry
continue
error = self._parse_json(e.cause.read(), video_id)
message = error.get('message')
if e.cause.code == 403 and error.get('code') == 'player-bad-geolocation-country':
self.raise_geo_restricted(msg=message)
raise ExtractorError(message)
else:
raise ExtractorError('Giving up retrying')
links = links_data.get('links') or {}
metas = links_data.get('metadata') or {}
sub_url = (links.get('subtitles') or {}).get('all')
video_info = links_data.get('video') or {}
title = metas['title']
formats = []
for format_id, qualities in links.items():
for format_id, qualities in (links.get('streaming') or {}).items():
if not isinstance(qualities, dict):
continue
for quality, load_balancer_url in qualities.items():
@ -189,19 +244,26 @@ Format: Marked,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text'''
for f in m3u8_formats:
f['language'] = 'fr'
formats.extend(m3u8_formats)
if not error:
error = options.get('error')
if not formats and error:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
self._sort_formats(formats)
video = (self._download_json(
self._API_BASE_URL + 'video/%s' % video_id, video_id,
'Downloading additional video metadata', fatal=False) or {}).get('video') or {}
show = video.get('show') or {}
return {
'id': video_id,
'title': title,
'description': strip_or_none(metas.get('summary') or video_info.get('resume')),
'thumbnail': video_info.get('image'),
'description': strip_or_none(metas.get('summary') or video.get('summary')),
'thumbnail': video_info.get('image') or player.get('image'),
'formats': formats,
'subtitles': self.extract_subtitles(sub_path, video_id),
'episode': metas.get('subtitle') or video_info.get('videoTitle'),
'series': video_info.get('playlistTitle'),
'subtitles': self.extract_subtitles(sub_url, video_id),
'episode': metas.get('subtitle') or video.get('name'),
'episode_number': int_or_none(video.get('shortNumber')),
'series': show.get('title'),
'season_number': int_or_none(video.get('season')),
'duration': int_or_none(video_info.get('duration') or video.get('duration')),
'release_date': unified_strdate(video.get('releaseDate')),
'average_rating': float_or_none(video.get('rating') or metas.get('rating')),
'comment_count': int_or_none(video.get('commentsCount')),
}
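
Note on the retry loop above: PKCS#1 v1.5 padding is randomized, so re-encrypting the same token yields a fresh ciphertext on every attempt, which is what occasionally gets past the flaky 401. A minimal Python 3 sketch of the signing step, assuming a 1024-bit RSA key (n, e) like the extractor's _RSA_KEY; this pkcs1pad is a bytes-based stand-in for the youtube-dl helper of the same name:

import base64
import json
import random

def pkcs1pad(data, length):
    # PKCS#1 v1.5 type-2 padding: 00 02 <random non-zero bytes> 00 <data>
    if len(data) > length - 11:
        raise ValueError('input too long for a %d-byte block' % length)
    padding = bytes(random.randint(1, 255) for _ in range(length - len(data) - 3))
    return b'\x00\x02' + padding + b'\x00' + data

def build_authorization(token, n, e):
    # a fresh random key k is bundled on every call, so a retry
    # re-randomizes both the payload and the padding
    k = ''.join(random.choice('0123456789abcdef') for _ in range(16))
    message = json.dumps({'k': k, 't': token}).encode()
    padded = pkcs1pad(message, 128)  # 128 bytes for a 1024-bit modulus
    encrypted = pow(int.from_bytes(padded, 'big'), e, n)
    return base64.b64encode(encrypted.to_bytes(128, 'big')).decode()

# usage: headers={'Authorization': 'Bearer ' + build_authorization(token, n, e)}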


@@ -5,20 +5,32 @@ import re
from .theplatform import ThePlatformIE
from ..utils import (
extract_attributes,
ExtractorError,
GeoRestrictedError,
int_or_none,
smuggle_url,
update_url_query,
)
from ..compat import (
compat_urlparse,
urlencode_postdata,
)
class AENetworksBaseIE(ThePlatformIE):
_BASE_URL_REGEX = r'''(?x)https?://
(?:(?:www|play|watch)\.)?
(?P<domain>
(?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
fyi\.tv
)/'''
_THEPLATFORM_KEY = 'crazyjava'
_THEPLATFORM_SECRET = 's3cr3t'
_DOMAIN_MAP = {
'history.com': ('HISTORY', 'history'),
'aetv.com': ('AETV', 'aetv'),
'mylifetime.com': ('LIFETIME', 'lifetime'),
'lifetimemovieclub.com': ('LIFETIMEMOVIECLUB', 'lmc'),
'fyi.tv': ('FYI', 'fyi'),
'historyvault.com': (None, 'historyvault'),
'biography.com': (None, 'biography'),
}
def _extract_aen_smil(self, smil_url, video_id, auth=None):
query = {'mbr': 'true'}
@@ -31,7 +43,7 @@ class AENetworksBaseIE(ThePlatformIE):
'assetTypes': 'high_video_s3'
}, {
'assetTypes': 'high_video_s3',
'switch': 'hls_ingest_fastly'
'switch': 'hls_high_fastly',
}]
formats = []
subtitles = {}
@@ -44,6 +56,8 @@ class AENetworksBaseIE(ThePlatformIE):
tp_formats, tp_subtitles = self._extract_theplatform_smil(
m_url, video_id, 'Downloading %s SMIL data' % (q.get('switch') or q['assetTypes']))
except ExtractorError as e:
if isinstance(e, GeoRestrictedError):
raise
last_e = e
continue
formats.extend(tp_formats)
@@ -57,24 +71,45 @@ class AENetworksBaseIE(ThePlatformIE):
'subtitles': subtitles,
}
def _extract_aetn_info(self, domain, filter_key, filter_value, url):
requestor_id, brand = self._DOMAIN_MAP[domain]
result = self._download_json(
'https://feeds.video.aetnd.com/api/v2/%s/videos' % brand,
filter_value, query={'filter[%s]' % filter_key: filter_value})['results'][0]
title = result['title']
video_id = result['id']
media_url = result['publicUrl']
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
auth = None
if theplatform_metadata.get('AETN$isBehindWall'):
resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'],
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
theplatform_metadata['ratings'][0]['rating'])
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
info.update(self._extract_aen_smil(media_url, video_id, auth))
info.update({
'title': title,
'series': result.get('seriesName'),
'season_number': int_or_none(result.get('tvSeasonNumber')),
'episode_number': int_or_none(result.get('tvSeasonEpisodeNumber')),
})
return info
class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?P<domain>
(?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
fyi\.tv
)/
(?:
shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|
movies/(?P<movie_display_id>[^/]+)(?:/full-movie)?|
specials/(?P<special_display_id>[^/]+)/(?:full-special|preview-)|
collections/[^/]+/(?P<collection_display_id>[^/]+)
)
'''
_VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'''(?P<id>
shows/[^/]+/season-\d+/episode-\d+|
(?:
(?:movie|special)s/[^/]+|
(?:shows/[^/]+/)?videos
)/[^/?#&]+
)'''
_TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'info_dict': {
@@ -91,22 +126,23 @@ class AENetworksIE(AENetworksBaseIE):
'skip_download': True,
},
'add_ie': ['ThePlatform'],
}, {
'url': 'http://www.history.com/shows/ancient-aliens/season-1',
'info_dict': {
'id': '71889446852',
},
'playlist_mincount': 5,
}, {
'url': 'http://www.mylifetime.com/shows/atlanta-plastic',
'info_dict': {
'id': 'SERIES4317',
'title': 'Atlanta Plastic',
},
'playlist_mincount': 2,
'skip': 'This video is only available for users of participating TV providers.',
}, {
'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
'only_matching': True
'info_dict': {
'id': '600587331957',
'ext': 'mp4',
'title': 'Inlawful Entry',
'description': 'md5:57c12115a2b384d883fe64ca50529e08',
'timestamp': 1452634428,
'upload_date': '20160112',
'uploader': 'AENE-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
}, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
'only_matching': True
@@ -117,78 +153,125 @@ class AENetworksIE(AENetworksBaseIE):
'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
'only_matching': True
}, {
'url': 'https://www.lifetimemovieclub.com/movies/a-killer-among-us',
'url': 'https://watch.lifetimemovieclub.com/movies/10-year-reunion/full-movie',
'only_matching': True
}, {
'url': 'http://www.history.com/specials/sniper-into-the-kill-zone/full-special',
'only_matching': True
}, {
'url': 'https://www.historyvault.com/collections/america-the-story-of-us/westward',
'only_matching': True
}, {
'url': 'https://www.aetv.com/specials/hunting-jonbenets-killer-the-untold-story/preview-hunting-jonbenets-killer-the-untold-story',
'only_matching': True
}, {
'url': 'http://www.history.com/videos/history-of-valentines-day',
'only_matching': True
}, {
'url': 'https://play.aetv.com/shows/duck-dynasty/videos/best-of-duck-dynasty-getting-quack-in-shape',
'only_matching': True
}]
_DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY',
'aetv.com': 'AETV',
'mylifetime.com': 'LIFETIME',
'lifetimemovieclub.com': 'LIFETIMEMOVIECLUB',
'fyi.tv': 'FYI',
}
def _real_extract(self, url):
domain, show_path, movie_display_id, special_display_id, collection_display_id = re.match(self._VALID_URL, url).groups()
display_id = show_path or movie_display_id or special_display_id or collection_display_id
webpage = self._download_webpage(url, display_id, headers=self.geo_verification_headers())
if show_path:
url_parts = show_path.split('/')
url_parts_len = len(url_parts)
if url_parts_len == 1:
entries = []
for season_url_path in re.findall(r'(?s)<li[^>]+data-href="(/shows/%s/season-\d+)"' % url_parts[0], webpage):
entries.append(self.url_result(
compat_urlparse.urljoin(url, season_url_path), 'AENetworks'))
if entries:
return self.playlist_result(
entries, self._html_search_meta('aetn:SeriesId', webpage),
self._html_search_meta('aetn:SeriesTitle', webpage))
else:
# single season
url_parts_len = 2
if url_parts_len == 2:
entries = []
for episode_item in re.findall(r'(?s)<[^>]+class="[^"]*(?:episode|program)-item[^"]*"[^>]*>', webpage):
episode_attributes = extract_attributes(episode_item)
episode_url = compat_urlparse.urljoin(
url, episode_attributes['data-canonical'])
entries.append(self.url_result(
episode_url, 'AENetworks',
episode_attributes.get('data-videoid') or episode_attributes.get('data-video-id')))
return self.playlist_result(
entries, self._html_search_meta('aetn:SeasonId', webpage))
domain, canonical = re.match(self._VALID_URL, url).groups()
return self._extract_aetn_info(domain, 'canonical', '/' + canonical, url)
video_id = self._html_search_meta('aetn:VideoID', webpage)
media_url = self._search_regex(
[r"media_url\s*=\s*'(?P<url>[^']+)'",
r'data-media-url=(?P<url>(?:https?:)?//[^\s>]+)',
r'data-media-url=(["\'])(?P<url>(?:(?!\1).)+?)\1'],
webpage, 'video url', group='url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
auth = None
if theplatform_metadata.get('AETN$isBehindWall'):
requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain]
resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'],
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
theplatform_metadata['ratings'][0]['rating'])
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
info.update(self._search_json_ld(webpage, video_id, fatal=False))
info.update(self._extract_aen_smil(media_url, video_id, auth))
return info
class AENetworksListBaseIE(AENetworksBaseIE):
def _call_api(self, resource, slug, brand, fields):
return self._download_json(
'https://yoga.appsvcs.aetnd.com/graphql',
slug, query={'brand': brand}, data=urlencode_postdata({
'query': '''{
%s(slug: "%s") {
%s
}
}''' % (resource, slug, fields),
}))['data'][resource]
def _real_extract(self, url):
domain, slug = re.match(self._VALID_URL, url).groups()
_, brand = self._DOMAIN_MAP[domain]
playlist = self._call_api(self._RESOURCE, slug, brand, self._FIELDS)
base_url = 'http://watch.%s' % domain
entries = []
for item in (playlist.get(self._ITEMS_KEY) or []):
doc = self._get_doc(item)
canonical = doc.get('canonical')
if not canonical:
continue
entries.append(self.url_result(
base_url + canonical, AENetworksIE.ie_key(), doc.get('id')))
description = None
if self._PLAYLIST_DESCRIPTION_KEY:
description = playlist.get(self._PLAYLIST_DESCRIPTION_KEY)
return self.playlist_result(
entries, playlist.get('id'),
playlist.get(self._PLAYLIST_TITLE_KEY), description)
class AENetworksCollectionIE(AENetworksListBaseIE):
IE_NAME = 'aenetworks:collection'
_VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'(?:[^/]+/)*(?:list|collections)/(?P<id>[^/?#&]+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'https://watch.historyvault.com/list/america-the-story-of-us',
'info_dict': {
'id': '282',
'title': 'America The Story of Us',
},
'playlist_mincount': 12,
}, {
'url': 'https://watch.historyvault.com/shows/america-the-story-of-us-2/season-1/list/america-the-story-of-us',
'only_matching': True
}, {
'url': 'https://www.historyvault.com/collections/mysteryquest',
'only_matching': True
}]
_RESOURCE = 'list'
_ITEMS_KEY = 'items'
_PLAYLIST_TITLE_KEY = 'display_title'
_PLAYLIST_DESCRIPTION_KEY = None
_FIELDS = '''id
display_title
items {
... on ListVideoItem {
doc {
canonical
id
}
}
}'''
def _get_doc(self, item):
return item.get('doc') or {}
class AENetworksShowIE(AENetworksListBaseIE):
IE_NAME = 'aenetworks:show'
_VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'shows/(?P<id>[^/?#&]+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'http://www.history.com/shows/ancient-aliens',
'info_dict': {
'id': 'SERIES1574',
'title': 'Ancient Aliens',
'description': 'md5:3f6d74daf2672ff3ae29ed732e37ea7f',
},
'playlist_mincount': 150,
}]
_RESOURCE = 'series'
_ITEMS_KEY = 'episodes'
_PLAYLIST_TITLE_KEY = 'title'
_PLAYLIST_DESCRIPTION_KEY = 'description'
_FIELDS = '''description
id
title
episodes {
canonical
id
}'''
def _get_doc(self, item):
return item
class HistoryTopicIE(AENetworksBaseIE):
@@ -204,6 +287,7 @@ class HistoryTopicIE(AENetworksBaseIE):
'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
'timestamp': 1375819729,
'upload_date': '20130806',
'uploader': 'AENE-NEW',
},
'params': {
# m3u8 download
@@ -212,36 +296,47 @@ class HistoryTopicIE(AENetworksBaseIE):
'add_ie': ['ThePlatform'],
}]
def theplatform_url_result(self, theplatform_url, video_id, query):
return {
'_type': 'url_transparent',
'id': video_id,
'url': smuggle_url(
update_url_query(theplatform_url, query),
{
'sig': {
'key': self._THEPLATFORM_KEY,
'secret': self._THEPLATFORM_SECRET,
},
'force_smil_url': True
}),
'ie_key': 'ThePlatform',
}
def _real_extract(self, url):
display_id = self._match_id(url)
return self.url_result(
'http://www.history.com/videos/' + display_id,
AENetworksIE.ie_key())
class HistoryPlayerIE(AENetworksBaseIE):
IE_NAME = 'history:player'
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|biography)\.com)/player/(?P<id>\d+)'
_TESTS = []
def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups()
return self._extract_aetn_info(domain, 'id', video_id, url)
class BiographyIE(AENetworksBaseIE):
_VALID_URL = r'https?://(?:www\.)?biography\.com/video/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.biography.com/video/vincent-van-gogh-full-episode-2075049808',
'info_dict': {
'id': '30322987',
'ext': 'mp4',
'title': 'Vincent Van Gogh - Full Episode',
'description': 'A full biography about the most influential 20th century painter, Vincent Van Gogh.',
'timestamp': 1311970571,
'upload_date': '20110729',
'uploader': 'AENE-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'<phoenix-iframe[^>]+src="[^"]+\btpid=(\d+)', webpage, 'tpid')
result = self._download_json(
'https://feeds.video.aetnd.com/api/v2/history/videos',
video_id, query={'filter[id]': video_id})['results'][0]
title = result['title']
info = self._extract_aen_smil(result['publicUrl'], video_id)
info.update({
'title': title,
'description': result.get('description'),
'duration': int_or_none(result.get('duration')),
'timestamp': int_or_none(result.get('added'), 1000),
})
return info
player_url = self._search_regex(
r'<phoenix-iframe[^>]+src="(%s)' % HistoryPlayerIE._VALID_URL,
webpage, 'player URL')
return self.url_result(player_url, HistoryPlayerIE.ie_key())
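
The list extractors above funnel everything through one GraphQL call; a condensed sketch of that request, using requests instead of the extractor's _download_json/urlencode_postdata plumbing:

import requests

def call_aen_graphql(resource, slug, brand, fields):
    # the GraphQL document travels urlencoded in the POST body; the brand
    # is an ordinary URL query parameter
    query = '{ %s(slug: "%s") { %s } }' % (resource, slug, fields)
    resp = requests.post(
        'https://yoga.appsvcs.aetnd.com/graphql',
        params={'brand': brand}, data={'query': query})
    resp.raise_for_status()
    return resp.json()['data'][resource]

# show = call_aen_graphql('series', 'ancient-aliens', 'history',
#                         'id title episodes { canonical id }')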


@@ -1,13 +1,16 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
class AlJazeeraIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?aljazeera\.com/(?:programmes|video)/.*?/(?P<id>[^/]+)\.html'
_VALID_URL = r'https?://(?:www\.)?aljazeera\.com/(?P<type>program/[^/]+|(?:feature|video)s)/\d{4}/\d{1,2}/\d{1,2}/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',
'url': 'https://www.aljazeera.com/program/episode/2014/9/19/deliverance',
'info_dict': {
'id': '3792260579001',
'ext': 'mp4',
@@ -20,14 +23,34 @@ class AlJazeeraIE(InfoExtractor):
'add_ie': ['BrightcoveNew'],
'skip': 'Not accessible from Travis CI server',
}, {
'url': 'http://www.aljazeera.com/video/news/2017/05/sierra-leone-709-carat-diamond-auctioned-170511100111930.html',
'url': 'https://www.aljazeera.com/videos/2017/5/11/sierra-leone-709-carat-diamond-to-be-auctioned-off',
'only_matching': True,
}, {
'url': 'https://www.aljazeera.com/features/2017/8/21/transforming-pakistans-buses-into-art',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/665003303001/default_default/index.html?videoId=%s'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
def _real_extract(self, url):
program_name = self._match_id(url)
webpage = self._download_webpage(url, program_name)
brightcove_id = self._search_regex(
r'RenderPagesVideo\(\'(.+?)\'', webpage, 'brightcove id')
return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
post_type, name = re.match(self._VALID_URL, url).groups()
post_type = {
'features': 'post',
'program': 'episode',
'videos': 'video',
}[post_type.split('/')[0]]
video = self._download_json(
'https://www.aljazeera.com/graphql', name, query={
'operationName': 'SingleArticleQuery',
'variables': json.dumps({
'name': name,
'postType': post_type,
}),
}, headers={
'wp-site': 'aje',
})['data']['article']['video']
video_id = video['id']
account_id = video.get('accountId') or '665003303001'
player_id = video.get('playerId') or 'BkeSH5BDb'
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, video_id),
'BrightcoveNew', video_id)
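
For reference, a compact sketch of the new Al Jazeera flow: map the URL path prefix to a post type, fetch the video record from the site's GraphQL endpoint, then fall back to the legacy Brightcove account/player ids when the record omits them (requests assumed):

import json
import requests

BRIGHTCOVE_TMPL = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
POST_TYPES = {'features': 'post', 'program': 'episode', 'videos': 'video'}

def aljazeera_brightcove_url(path_prefix, name):
    video = requests.get(
        'https://www.aljazeera.com/graphql',
        params={
            'operationName': 'SingleArticleQuery',
            'variables': json.dumps({'name': name, 'postType': POST_TYPES[path_prefix]}),
        },
        headers={'wp-site': 'aje'}).json()['data']['article']['video']
    return BRIGHTCOVE_TMPL % (
        video.get('accountId') or '665003303001',  # legacy account id
        video.get('playerId') or 'BkeSH5BDb',      # fallback player id
        video['id'])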


@@ -0,0 +1,103 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .youtube import YoutubeIE
from .vimeo import VimeoIE
from ..utils import (
int_or_none,
parse_iso8601,
update_url_query,
)
class AmaraIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?amara\.org/(?:\w+/)?videos/(?P<id>\w+)'
_TESTS = [{
# Youtube
'url': 'https://amara.org/en/videos/jVx79ZKGK1ky/info/why-jury-trials-are-becoming-less-common/?tab=video',
'md5': 'ea10daf2b6154b8c1ecf9922aca5e8ae',
'info_dict': {
'id': 'h6ZuVdvYnfE',
'ext': 'mp4',
'title': 'Why jury trials are becoming less common',
'description': 'md5:a61811c319943960b6ab1c23e0cbc2c1',
'thumbnail': r're:^https?://.*\.jpg$',
'subtitles': dict,
'upload_date': '20160813',
'uploader': 'PBS NewsHour',
'uploader_id': 'PBSNewsHour',
'timestamp': 1549639570,
}
}, {
# Vimeo
'url': 'https://amara.org/en/videos/kYkK1VUTWW5I/info/vimeo-at-ces-2011',
'md5': '99392c75fa05d432a8f11df03612195e',
'info_dict': {
'id': '18622084',
'ext': 'mov',
'title': 'Vimeo at CES 2011!',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'subtitles': dict,
'timestamp': 1294763658,
'upload_date': '20110111',
'uploader': 'Sam Morrill',
'uploader_id': 'sammorrill'
}
}, {
# Direct Link
'url': 'https://amara.org/en/videos/s8KL7I3jLmh6/info/the-danger-of-a-single-story/',
'md5': 'd3970f08512738ee60c5807311ff5d3f',
'info_dict': {
'id': 's8KL7I3jLmh6',
'ext': 'mp4',
'title': 'The danger of a single story',
'description': 'md5:d769b31139c3b8bb5be9177f62ea3f23',
'thumbnail': r're:^https?://.*\.jpg$',
'subtitles': dict,
'upload_date': '20091007',
'timestamp': 1254942511,
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
meta = self._download_json(
'https://amara.org/api/videos/%s/' % video_id,
video_id, query={'format': 'json'})
title = meta['title']
video_url = meta['all_urls'][0]
subtitles = {}
for language in (meta.get('languages') or []):
subtitles_uri = language.get('subtitles_uri')
if not (subtitles_uri and language.get('published')):
continue
subtitle = subtitles.setdefault(language.get('code') or 'en', [])
for f in ('json', 'srt', 'vtt'):
subtitle.append({
'ext': f,
'url': update_url_query(subtitles_uri, {'format': f}),
})
info = {
'url': video_url,
'id': video_id,
'subtitles': subtitles,
'title': title,
'description': meta.get('description'),
'thumbnail': meta.get('thumbnail'),
'duration': int_or_none(meta.get('duration')),
'timestamp': parse_iso8601(meta.get('created')),
}
for ie in (YoutubeIE, VimeoIE):
if ie.suitable(video_url):
info.update({
'_type': 'url_transparent',
'ie_key': ie.ie_key(),
})
break
return info
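
The hand-off pattern above is worth spelling out: with _type=url_transparent, the matched extractor (YouTube or Vimeo) supplies the formats while Amara's own fields, notably the crowd-sourced subtitles, are merged on top; direct links stay an ordinary 'url' result. A reduced sketch of that decision:

def delegate_if_supported(info, video_url, candidate_ies):
    # candidate_ies: e.g. (YoutubeIE, VimeoIE); checked in order
    for ie in candidate_ies:
        if ie.suitable(video_url):
            # the delegate supplies formats; our fields are merged on top
            info.update({'_type': 'url_transparent', 'ie_key': ie.ie_key()})
            break
    return info  # no match: stays a plain 'url' result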


@@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .theplatform import ThePlatformIE
from ..utils import (
int_or_none,
@@ -11,25 +13,22 @@ from ..utils import (
class AMCNetworksIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|(?:we|sundance)tv)\.com/(?:movies|shows(?:/[^/]+)+)/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?(?P<site>amc|bbcamerica|ifc|(?:we|sundance)tv)\.com/(?P<id>(?:movies|shows(?:/[^/]+)+)/[^/?#&]+)'
_TESTS = [{
'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1',
'md5': '',
'url': 'https://www.bbcamerica.com/shows/the-graham-norton-show/videos/tina-feys-adorable-airline-themed-family-dinner--51631',
'info_dict': {
'id': 's3MX01Nl4vPH',
'id': '4Lq1dzOnZGt0',
'ext': 'mp4',
'title': 'Maron - Season 4 - Step 1',
'description': 'In denial about his current situation, Marc is reluctantly convinced by his friends to enter rehab. Starring Marc Maron and Constance Zimmer.',
'age_limit': 17,
'upload_date': '20160505',
'timestamp': 1462468831,
'title': "The Graham Norton Show - Season 28 - Tina Fey's Adorable Airline-Themed Family Dinner",
'description': "It turns out child stewardesses are very generous with the wine! All-new episodes of 'The Graham Norton Show' premiere Fridays at 11/10c on BBC America.",
'upload_date': '20201120',
'timestamp': 1605904350,
'uploader': 'AMCN',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Requires TV provider accounts',
}, {
'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
'only_matching': True,
@@ -55,32 +54,34 @@ class AMCNetworksIE(ThePlatformIE):
'url': 'https://www.sundancetv.com/shows/riviera/full-episodes/season-1/episode-01-episode-1',
'only_matching': True,
}]
_REQUESTOR_ID_MAP = {
'amc': 'AMC',
'bbcamerica': 'BBCA',
'ifc': 'IFC',
'sundancetv': 'SUNDANCE',
'wetv': 'WETV',
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
site, display_id = re.match(self._VALID_URL, url).groups()
requestor_id = self._REQUESTOR_ID_MAP[site]
properties = self._download_json(
'https://content-delivery-gw.svc.ds.amcn.com/api/v2/content/amcn/%s/url/%s' % (requestor_id.lower(), display_id),
display_id)['data']['properties']
query = {
'mbr': 'true',
'manifest': 'm3u',
}
media_url = self._search_regex(
r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)',
webpage, 'media url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'link\.theplatform\.com/s/([^?]+)',
media_url, 'theplatform_path'), display_id)
tp_path = 'M_UwQC/media/' + properties['videoPid']
media_url = 'https://link.theplatform.com/s/' + tp_path
theplatform_metadata = self._download_theplatform_metadata(tp_path, display_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid']
title = theplatform_metadata['title']
rating = try_get(
theplatform_metadata, lambda x: x['ratings'][0]['rating'])
auth_required = self._search_regex(
r'window\.authRequired\s*=\s*(true|false);',
webpage, 'auth required')
if auth_required == 'true':
requestor_id = self._search_regex(
r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)',
webpage, 'requestor id')
video_category = properties.get('videoCategory')
if video_category and video_category.endswith('-Auth'):
resource = self._get_mvpd_resource(
requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(
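
The rewrite above replaces webpage scraping with AMC's content-delivery gateway: a single JSON call keyed by requestor id and display id returns videoPid, from which the ThePlatform media path is assembled. A trimmed sketch (requests assumed):

import requests

REQUESTOR_ID_MAP = {'amc': 'AMC', 'bbcamerica': 'BBCA', 'ifc': 'IFC',
                    'sundancetv': 'SUNDANCE', 'wetv': 'WETV'}

def amc_theplatform_media(site, display_id):
    requestor_id = REQUESTOR_ID_MAP[site]
    properties = requests.get(
        'https://content-delivery-gw.svc.ds.amcn.com/api/v2/content/amcn/%s/url/%s'
        % (requestor_id.lower(), display_id)).json()['data']['properties']
    tp_path = 'M_UwQC/media/' + properties['videoPid']
    return tp_path, 'https://link.theplatform.com/s/' + tp_path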


@@ -1,82 +1,159 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
js_to_json,
try_get,
unified_strdate,
unified_timestamp,
)
class AmericasTestKitchenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?(?:americastestkitchen|cooks(?:country|illustrated))\.com/(?P<resource_type>episode|videos)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers',
'md5': 'b861c3e365ac38ad319cfd509c30577f',
'info_dict': {
'id': '5b400b9ee338f922cb06450c',
'title': 'Weeknight Japanese Suppers',
'title': 'Japanese Suppers',
'ext': 'mp4',
'description': 'md5:3d0c1a44bb3b27607ce82652db25b4a8',
'description': 'md5:64e606bfee910627efc4b5f050de92b3',
'thumbnail': r're:^https?://',
'timestamp': 1523664000,
'upload_date': '20180414',
'release_date': '20180414',
'timestamp': 1523318400,
'upload_date': '20180410',
'release_date': '20180410',
'series': "America's Test Kitchen",
'season_number': 18,
'episode': 'Weeknight Japanese Suppers',
'episode': 'Japanese Suppers',
'episode_number': 15,
},
'params': {
'skip_download': True,
},
}, {
# Metadata parsing behaves differently for newer episodes (705) as opposed to older episodes (582 above)
'url': 'https://www.americastestkitchen.com/episode/705-simple-chicken-dinner',
'md5': '06451608c57651e985a498e69cec17e5',
'info_dict': {
'id': '5fbe8c61bda2010001c6763b',
'title': 'Simple Chicken Dinner',
'ext': 'mp4',
'description': 'md5:eb68737cc2fd4c26ca7db30139d109e7',
'thumbnail': r're:^https?://',
'timestamp': 1610755200,
'upload_date': '20210116',
'release_date': '20210116',
'series': "America's Test Kitchen",
'season_number': 21,
'episode': 'Simple Chicken Dinner',
'episode_number': 3,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.americastestkitchen.com/videos/3420-pan-seared-salmon',
'only_matching': True,
}, {
'url': 'https://www.cookscountry.com/episode/564-when-only-chocolate-will-do',
'only_matching': True,
}, {
'url': 'https://www.cooksillustrated.com/videos/4478-beef-wellington',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
resource_type, video_id = re.match(self._VALID_URL, url).groups()
is_episode = resource_type == 'episode'
if is_episode:
resource_type = 'episodes'
webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(
self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
webpage, 'initial context'),
video_id, js_to_json)
ep_data = try_get(
video_data,
(lambda x: x['episodeDetail']['content']['data'],
lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {})
zype_id = ep_data.get('zype_id') or ep_meta['zype_id']
title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get(
'description') or ep_meta.get('description'))
thumbnail = try_get(ep_meta, lambda x: x['photo']['image_url'])
release_date = unified_strdate(ep_data.get('aired_at'))
season_number = int_or_none(ep_meta.get('season_number'))
episode = ep_meta.get('title')
episode_number = int_or_none(ep_meta.get('episode_number'))
resource = self._download_json(
'https://www.americastestkitchen.com/api/v6/%s/%s' % (resource_type, video_id), video_id)
video = resource['video'] if is_episode else resource
episode = resource if is_episode else resource.get('episode') or {}
return {
'_type': 'url_transparent',
'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id,
'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % video['zypeId'],
'ie_key': 'Zype',
'title': title,
'description': description,
'thumbnail': thumbnail,
'release_date': release_date,
'series': "America's Test Kitchen",
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
'description': clean_html(video.get('description')),
'timestamp': unified_timestamp(video.get('publishDate')),
'release_date': unified_strdate(video.get('publishDate')),
'episode_number': int_or_none(episode.get('number')),
'season_number': int_or_none(episode.get('season')),
'series': try_get(episode, lambda x: x['show']['title']),
'episode': episode.get('title'),
}
class AmericasTestKitchenSeasonIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<show>americastestkitchen|cookscountry)\.com/episodes/browse/season_(?P<id>\d+)'
_TESTS = [{
# ATK Season
'url': 'https://www.americastestkitchen.com/episodes/browse/season_1',
'info_dict': {
'id': 'season_1',
'title': 'Season 1',
},
'playlist_count': 13,
}, {
# Cooks Country Season
'url': 'https://www.cookscountry.com/episodes/browse/season_12',
'info_dict': {
'id': 'season_12',
'title': 'Season 12',
},
'playlist_count': 13,
}]
def _real_extract(self, url):
show_name, season_number = re.match(self._VALID_URL, url).groups()
season_number = int(season_number)
slug = 'atk' if show_name == 'americastestkitchen' else 'cco'
season = 'Season %d' % season_number
season_search = self._download_json(
'https://y1fnzxui30-dsn.algolia.net/1/indexes/everest_search_%s_season_desc_production' % slug,
season, headers={
'Origin': 'https://www.%s.com' % show_name,
'X-Algolia-API-Key': '8d504d0099ed27c1b73708d22871d805',
'X-Algolia-Application-Id': 'Y1FNZXUI30',
}, query={
'facetFilters': json.dumps([
'search_season_list:' + season,
'search_document_klass:episode',
'search_show_slug:' + slug,
]),
'attributesToRetrieve': 'description,search_%s_episode_number,search_document_date,search_url,title' % slug,
'attributesToHighlight': '',
'hitsPerPage': 1000,
})
def entries():
for episode in (season_search.get('hits') or []):
search_url = episode.get('search_url')
if not search_url:
continue
yield {
'_type': 'url',
'url': 'https://www.%s.com%s' % (show_name, search_url),
'id': try_get(episode, lambda e: e['objectID'].split('_')[-1]),
'title': episode.get('title'),
'description': episode.get('description'),
'timestamp': unified_timestamp(episode.get('search_document_date')),
'season_number': season_number,
'episode_number': int_or_none(episode.get('search_%s_episode_number' % slug)),
'ie_key': AmericasTestKitchenIE.ie_key(),
}
return self.playlist_result(
entries(), 'season_%d' % season_number, season)
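
Season playlists are now resolved through Algolia's REST search API rather than scraped HTML: the facet filters pin the season, document class, and show slug, and hitsPerPage is set high enough to fetch a whole season in one request. A standalone sketch using the credentials visible above (requests assumed):

import json
import requests

def search_season_episodes(show_name, slug, season_number):
    season = 'Season %d' % season_number
    return requests.get(
        'https://y1fnzxui30-dsn.algolia.net/1/indexes/everest_search_%s_season_desc_production' % slug,
        headers={
            'Origin': 'https://www.%s.com' % show_name,
            'X-Algolia-API-Key': '8d504d0099ed27c1b73708d22871d805',
            'X-Algolia-Application-Id': 'Y1FNZXUI30',
        },
        params={
            'facetFilters': json.dumps([
                'search_season_list:' + season,
                'search_document_klass:episode',
                'search_show_slug:' + slug,
            ]),
            'hitsPerPage': 1000,
        }).json().get('hits') or []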


@@ -8,6 +8,7 @@ from ..utils import (
int_or_none,
mimetype2ext,
parse_iso8601,
unified_timestamp,
url_or_none,
)
@@ -88,7 +89,7 @@ class AMPIE(InfoExtractor):
self._sort_formats(formats)
timestamp = parse_iso8601(item.get('pubDate'), ' ') or parse_iso8601(item.get('dc-date'))
timestamp = unified_timestamp(item.get('pubDate'), ' ') or parse_iso8601(item.get('dc-date'))
return {
'id': video_id,


@@ -116,8 +116,6 @@ class AnimeOnDemandIE(InfoExtractor):
r'(?s)<div[^>]+itemprop="description"[^>]*>(.+?)</div>',
webpage, 'anime description', default=None)
entries = []
def extract_info(html, video_id, num=None):
title, description = [None] * 2
formats = []
@@ -233,7 +231,7 @@
self._sort_formats(info['formats'])
f = common_info.copy()
f.update(info)
entries.append(f)
yield f
# Extract teaser/trailer only when full episode is not available
if not info['formats']:
@@ -247,7 +245,7 @@
'title': m.group('title'),
'url': urljoin(url, m.group('href')),
})
entries.append(f)
yield f
def extract_episodes(html):
for num, episode_html in enumerate(re.findall(
@@ -275,7 +273,8 @@
'episode_number': episode_number,
}
extract_entries(episode_html, video_id, common_info)
for e in extract_entries(episode_html, video_id, common_info):
yield e
def extract_film(html, video_id):
common_info = {
@@ -283,11 +282,18 @@
'title': anime_title,
'description': anime_description,
}
extract_entries(html, video_id, common_info)
for e in extract_entries(html, video_id, common_info):
yield e
extract_episodes(webpage)
def entries():
has_episodes = False
for e in extract_episodes(webpage):
has_episodes = True
yield e
if not entries:
extract_film(webpage, anime_id)
if not has_episodes:
for e in extract_film(webpage, anime_id):
yield e
return self.playlist_result(entries, anime_id, anime_title, anime_description)
return self.playlist_result(
entries(), anime_id, anime_title, anime_description)
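
The refactor turns the eager entries list into generators; the one subtlety is the film fallback, which needs to know whether the episode generator yielded anything (a generator object is always truthy, hence the explicit has_episodes flag). The pattern in isolation:

def lazy_playlist_entries(extract_episodes, extract_film):
    # extract_episodes / extract_film are generator functions
    def entries():
        has_episodes = False
        for e in extract_episodes():
            has_episodes = True
            yield e
        if not has_episodes:  # a bare `if not entries` would never fire
            for e in extract_film():
                yield e
    return entries()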


@@ -116,7 +116,76 @@ class AnvatoIE(InfoExtractor):
'anvato_scripps_app_ios_prod_409c41960c60b308db43c3cc1da79cab9f1c3d93': 'WPxj5GraLTkYCyj3M7RozLqIycjrXOEcDGFMIJPn',
'EZqvRyKBJLrgpClDPDF8I7Xpdp40Vx73': '4OxGd2dEakylntVKjKF0UK9PDPYB6A9W',
'M2v78QkpleXm9hPp9jUXI63x5vA6BogR': 'ka6K32k7ZALmpINkjJUGUo0OE42Md1BQ',
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ'
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ',
'X8POa4zPPaKVZHqmWjuEzfP31b1QM9VN': 'Dn5vOY9ooDw7VSl9qztjZI5o0g08mA0z',
'M2v78QkBMpNJlSPp9diX5F2PBmBy6Bog': 'ka6K32kyo7nDZfNkjQCGWf1lpApXMd1B',
'bvJ0dQpav07l0hG5JgfVLF2dv1vARwpP': 'BzoQW24GrJZoJfmNodiJKSPeB9B8NOxj',
'lxQMLg2XZKuEZaWgsqubBxV9INZ6bryY': 'Vm2Mx6noKds9jB71h6urazwlTG3m9x8l',
'04EnjvXeoSmkbJ9ckPs7oY0mcxv7PlyN': 'aXERQP9LMfQVlEDsgGs6eEA1SWznAQ8P',
'mQbO2ge6BFRWVPYCYpU06YvNt80XLvAX': 'E2BV1NGmasN5v7eujECVPJgwflnLPm2A',
'g43oeBzJrCml7o6fa5fRL1ErCdeD8z4K': 'RX34mZ6zVH4Nr6whbxIGLv9WSbxEKo8V',
'VQrDJoP7mtdBzkxhXbSPwGB1coeElk4x': 'j2VejQx0VFKQepAF7dI0mJLKtOVJE18z',
'WxA5NzLRjCrmq0NUgaU5pdMDuZO7RJ4w': 'lyY5ADLKaIOLEgAsGQCveEMAcqnx3rY9',
'M4lpMXB71ie0PjMCjdFzVXq0SeRVqz49': 'n2zVkOqaLIv3GbLfBjcwW51LcveWOZ2e',
'dyDZGEqN8u8nkJZcJns0oxYmtP7KbGAn': 'VXOEqQW9BtEVLajfZQSLEqxgS5B7qn2D',
'E7QNjrVY5u5mGvgu67IoDgV1CjEND8QR': 'rz8AaDmdKIkLmPNhB5ILPJnjS5PnlL8d',
'a4zrqjoKlfzg0dwHEWtP31VqcLBpjm4g': 'LY9J16gwETdGWa3hjBu5o0RzuoQDjqXQ',
'dQP5BZroMsMVLO1hbmT5r2Enu86GjxA6': '7XR3oOdbPF6x3PRFLDCq9RkgsRjAo48V',
'M4lKNBO1NFe0PjMCj1tzVXq0SeRVqzA9': 'n2zoRqGLRUv3GbLfBmTwW51LcveWOZYe',
'nAZ7MZdpGCGg1pqFEbsoJOz2C60mv143': 'dYJgdqA9aT4yojETqGi7yNgoFADxqmXP',
'3y1MERYgOuE9NzbFgwhV6Wv2F0YKvbyz': '081xpZDQgC4VadLTavhWQxrku56DAgXV',
'bmQvmEXr5HWklBMCZOcpE2Z3HBYwqGyl': 'zxXPbVNyMiMAZldhr9FkOmA0fl4aKr2v',
'wA7oDNYldfr6050Hwxi52lPZiVlB86Ap': 'ZYK16aA7ni0d3l3c34uwpxD7CbReMm8Q',
'g43MbKMWmFml7o7sJoSRkXxZiXRvJ3QK': 'RX3oBJonvs4Nr6rUWBCGn3matRGqJPXV',
'mA9VdlqpLS0raGaSDvtoqNrBTzb8XY4q': '0XN4OjBD3fnW7r7IbmtJB4AyfOmlrE2r',
'mAajOwgkGt17oGoFmEuklMP9H0GnW54d': 'lXbBLPGyzikNGeGujAuAJGjZiwLRxyXR',
'vy8vjJ9kbUwrRqRu59Cj5dWZfzYErlAb': 'K8l7gpwaGcBpnAnCLNCmPZRdin3eaQX0',
'xQMWBpR8oHEZaWaSMGUb0avOHjLVYn4Y': 'm2MrN4vEaf9jB7BFy5Srb40jTrN67AYl',
'xyKEmVO3miRr6D6UVkt7oB8jtD6aJEAv': 'g2ddDebqDfqdgKgswyUKwGjbTWwzq923',
'7Qk0wa2D9FjKapacoJF27aLvUDKkLGA0': 'b2kgBEkephJaMkMTL7s1PLe4Ua6WyP2P',
'3QLg6nqmNTJ5VvVTo7f508LPidz1xwyY': 'g2L1GgpraipmAOAUqmIbBnPxHOmw4MYa',
'3y1B7zZjXTE9NZNSzZSVNPZaTNLjo6Qz': '081b5G6wzH4VagaURmcWbN5mT4JGEe2V',
'lAqnwvkw6SG6D8DSqmUg6DRLUp0w3G4x': 'O2pbP0xPDFNJjpjIEvcdryOJtpkVM4X5',
'awA7xd1N0Hr6050Hw2c52lPZiVlB864p': 'GZYKpn4aoT0d3l3c3PiwpxD7CbReMmXQ',
'jQVqPLl9YHL1WGWtR1HDgWBGT63qRNyV': '6X03ne6vrU4oWyWUN7tQVoajikxJR3Ye',
'GQRMR8mL7uZK797t7xH3eNzPIP5dOny1': 'm2vqPWGd4U31zWzSyasDRAoMT1PKRp8o',
'zydq9RdmRhXLkNkfNoTJlMzaF0lWekQB': '3X7LnvE7vH5nkEkSqLiey793Un7dLB8e',
'VQrDzwkB2IdBzjzu9MHPbEYkSB50gR4x': 'j2VebLzoKUKQeEesmVh0gM1eIp9jKz8z',
'mAa2wMamBs17oGoFmktklMP9H0GnW54d': 'lXbgP74xZTkNGeGujVUAJGjZiwLRxy8R',
'7yjB6ZLG6sW8R6RF2xcan1KGfJ5dNoyd': 'wXQkPorvPHZ45N5t4Jf6qwg5Tp4xvw29',
'a4zPpNeWGuzg0m0iX3tPeanGSkRKWXQg': 'LY9oa3QAyHdGW9Wu3Ri5JGeEik7l1N8Q',
'k2rneA2M38k25cXDwwSknTJlxPxQLZ6M': '61lyA2aEVDzklfdwmmh31saPxQx2VRjp',
'bK9Zk4OvPnvxduLgxvi8VUeojnjA02eV': 'o5jANYjbeMb4nfBaQvcLAt1jzLzYx6ze',
'5VD6EydM3R9orHmNMGInGCJwbxbQvGRw': 'w3zjmX7g4vnxzCxElvUEOiewkokXprkZ',
'70X35QbVYVYNPUmP9YfbzI06YqYQk2R1': 'vG4Aj2BMjMjoztB7zeFOnCVPJpJ8lMOa',
'26qYwQVG9p1Bks2GgBckjfDJOXOAMgG1': 'r4ev9X0mv5zqJc0yk5IBDcQOwZw8mnwQ',
'rvVKpA56MBXWlSxMw3cobT5pdkd4Dm7q': '1J7ZkY53pZ645c93owcLZuveE7E8B3rL',
'qN1zdy1zlYL23IWZGWtDvfV6WeWQWkJo': 'qN1zdy1zlYL23IWZGWtDvfV6WeWQWkJo',
'jdKqRGF16dKsBviMDae7IGDl7oTjEbVV': 'Q09l7vhlNxPFErIOK6BVCe7KnwUW5DVV',
'3QLkogW1OUJ5VvPsrDH56DY2u7lgZWyY': 'g2LRE1V9espmAOPhE4ubj4ZdUA57yDXa',
'wyJvWbXGBSdbkEzhv0CW8meou82aqRy8': 'M2wolPvyBIpQGkbT4juedD4ruzQGdK2y',
'7QkdZrzEkFjKap6IYDU2PB0oCNZORmA0': 'b2kN1l96qhJaMkPs9dt1lpjBfwqZoA8P',
'pvA05113MHG1w3JTYxc6DVlRCjErVz4O': 'gQXeAbblBUnDJ7vujbHvbRd1cxlz3AXO',
'mA9blJDZwT0raG1cvkuoeVjLC7ZWd54q': '0XN9jRPwMHnW7rvumgfJZOD9CJgVkWYr',
'5QwRN5qKJTvGKlDTmnf7xwNZcjRmvEy9': 'R2GP6LWBJU1QlnytwGt0B9pytWwAdDYy',
'eyn5rPPbkfw2KYxH32fG1q58CbLJzM40': 'p2gyqooZnS56JWeiDgfmOy1VugOQEBXn',
'3BABn3b5RfPJGDwilbHe7l82uBoR05Am': '7OYZG7KMVhbPdKJS3xcWEN3AuDlLNmXj',
'xA5zNGXD3HrmqMlF6OS5pdMDuZO7RJ4w': 'yY5DAm6r1IOLE3BCVMFveEMAcqnx3r29',
'g43PgW3JZfml7o6fDEURL1ErCdeD8zyK': 'RX3aQn1zrS4Nr6whDgCGLv9WSbxEKo2V',
'lAqp8WbGgiG6D8LTKJcg3O72CDdre1Qx': 'O2pnm6473HNJjpKuVosd3vVeh975yrX5',
'wyJbYEDxKSdbkJ6S6RhW8meou82aqRy8': 'M2wPm7EgRSpQGlAh70CedD4ruzQGdKYy',
'M4lgW28nLCe0PVdtaXszVXq0SeRVqzA9': 'n2zmJvg4jHv3G0ETNgiwW51LcveWOZ8e',
'5Qw3OVvp9FvGKlDTmOC7xwNZcjRmvEQ9': 'R2GzDdml9F1Qlnytw9s0B9pytWwAdD8y',
'vy8a98X7zCwrRqbHrLUjYzwDiK2b70Qb': 'K8lVwzyjZiBpnAaSGeUmnAgxuGOBxmY0',
'g4eGjJLLoiqRD3Pf9oT5O03LuNbLRDQp': '6XqD59zzpfN4EwQuaGt67qNpSyRBlnYy',
'g43OPp9boIml7o6fDOIRL1ErCdeD8z4K': 'RX33alNB4s4Nr6whDPUGLv9WSbxEKoXV',
'xA2ng9OkBcGKzDbTkKsJlx7dUK8R3dA5': 'z2aPnJvzBfObkwGC3vFaPxeBhxoMqZ8K',
'xyKEgBajZuRr6DEC0Kt7XpD1cnNW9gAv': 'g2ddlEBvRsqdgKaI4jUK9PrgfMexGZ23',
'BAogww51jIMa2JnH1BcYpXM5F658RNAL': 'rYWDmm0KptlkGv4FGJFMdZmjs9RDE6XR',
'BAokpg62VtMa2JnH1mHYpXM5F658RNAL': 'rYWryDnlNslkGv4FG4HMdZmjs9RDE62R',
'a4z1Px5e2hzg0m0iMMCPeanGSkRKWXAg': 'LY9eorNQGUdGW9WuKKf5JGeEik7l1NYQ',
'kAx69R58kF9nY5YcdecJdl2pFXP53WyX': 'gXyRxELpbfPvLeLSaRil0mp6UEzbZJ8L',
'BAoY13nwViMa2J2uo2cY6BlETgmdwryL': 'rYWwKzJmNFlkGvGtNoUM9bzwIJVzB1YR',
}
_MCP_TO_ACCESS_KEY_TABLE = {
@@ -189,19 +258,17 @@ class AnvatoIE(InfoExtractor):
video_data_url += '&X-Anvato-Adst-Auth=' + base64.b64encode(auth_secret).decode('ascii')
anvrid = md5_text(time.time() * 1000 * random.random())[:30]
payload = {
'api': {
'anvrid': anvrid,
'anvstk': md5_text('%s|%s|%d|%s' % (
access_key, anvrid, server_time,
self._ANVACK_TABLE.get(access_key, self._API_KEY))),
'anvts': server_time,
},
api = {
'anvrid': anvrid,
'anvts': server_time,
}
api['anvstk'] = md5_text('%s|%s|%d|%s' % (
access_key, anvrid, server_time,
self._ANVACK_TABLE.get(access_key, self._API_KEY)))
return self._download_json(
video_data_url, video_id, transform_source=strip_jsonp,
data=json.dumps(payload).encode('utf-8'))
data=json.dumps({'api': api}).encode('utf-8'))
def _get_anvato_videos(self, access_key, video_id):
video_data = self._get_video_json(access_key, video_id)
@@ -259,7 +326,7 @@ class AnvatoIE(InfoExtractor):
'description': video_data.get('def_description'),
'tags': video_data.get('def_tags', '').split(','),
'categories': video_data.get('categories'),
'thumbnail': video_data.get('thumbnail'),
'thumbnail': video_data.get('src_image_url') or video_data.get('thumbnail'),
'timestamp': int_or_none(video_data.get(
'ts_published') or video_data.get('ts_added')),
'uploader': video_data.get('mcp_id'),
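
The request signature visible above (anvstk) is just an MD5 over the access key, the random request id, the server time, and the per-key secret from the table. A self-contained sketch of the payload construction:

import hashlib
import random
import time

def _md5_text(value):
    return hashlib.md5(str(value).encode()).hexdigest()

def build_anvato_payload(access_key, secret, server_time):
    # 30-char random request id, as in the extractor
    anvrid = _md5_text(time.time() * 1000 * random.random())[:30]
    api = {'anvrid': anvrid, 'anvts': server_time}
    api['anvstk'] = _md5_text('%s|%s|%d|%s' % (
        access_key, anvrid, server_time, secret))
    return {'api': api}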


@@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .yahoo import YahooIE
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
@@ -15,9 +15,9 @@ from ..utils import (
)
class AolIE(InfoExtractor):
class AolIE(YahooIE):
IE_NAME = 'aol.com'
_VALID_URL = r'(?:aol-video:|https?://(?:www\.)?aol\.(?:com|ca|co\.uk|de|jp)/video/(?:[^/]+/)*)(?P<id>[0-9a-f]+)'
_VALID_URL = r'(?:aol-video:|https?://(?:www\.)?aol\.(?:com|ca|co\.uk|de|jp)/video/(?:[^/]+/)*)(?P<id>\d{9}|[0-9a-f]{24}|[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12})'
_TESTS = [{
# video with 5min ID
@@ -76,10 +76,16 @@ class AolIE(InfoExtractor):
}, {
'url': 'https://www.aol.jp/video/playlist/5a28e936a1334d000137da0c/5a28f3151e642219fde19831/',
'only_matching': True,
}, {
# Yahoo video
'url': 'https://www.aol.com/video/play/991e6700-ac02-11ea-99ff-357400036f61/24bbc846-3e30-3c46-915e-fe8ccd7fcc46/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
if '-' in video_id:
return self._extract_yahoo_video(video_id, 'us')
response = self._download_json(
'https://feedapi.b2c.on.aol.com/v1.0/app/videos/aolon/%s/details' % video_id,


@@ -6,25 +6,21 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
js_to_json,
int_or_none,
url_or_none,
)
class APAIE(InfoExtractor):
_VALID_URL = r'https?://[^/]+\.apa\.at/embed/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_VALID_URL = r'(?P<base_url>https?://[^/]+\.apa\.at)/embed/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{
'url': 'http://uvp.apa.at/embed/293f6d17-692a-44e3-9fd5-7b178f3a1029',
'md5': '2b12292faeb0a7d930c778c7a5b4759b',
'info_dict': {
'id': 'jjv85FdZ',
'id': '293f6d17-692a-44e3-9fd5-7b178f3a1029',
'ext': 'mp4',
'title': '"Blau ist mysteriös": Die Blue Man Group im Interview',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'title': '293f6d17-692a-44e3-9fd5-7b178f3a1029',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 254,
'timestamp': 1519211149,
'upload_date': '20180221',
},
}, {
'url': 'https://uvp-apapublisher.sf.apa.at/embed/2f94e9e6-d945-4db2-9548-f9a41ebf7b78',
@@ -46,9 +42,11 @@ class APAIE(InfoExtractor):
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
video_id, base_url = mobj.group('id', 'base_url')
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(
'%s/player/%s' % (base_url, video_id), video_id)
jwplatform_id = self._search_regex(
r'media[iI]d\s*:\s*["\'](?P<id>[a-zA-Z0-9]{8})', webpage,
@@ -59,16 +57,18 @@ class APAIE(InfoExtractor):
'jwplatform:' + jwplatform_id, ie='JWPlatform',
video_id=video_id)
sources = self._parse_json(
self._search_regex(
r'sources\s*=\s*(\[.+?\])\s*;', webpage, 'sources'),
video_id, transform_source=js_to_json)
def extract(field, name=None):
return self._search_regex(
r'\b%s["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % field,
webpage, name or field, default=None, group='value')
title = extract('title') or video_id
description = extract('description')
thumbnail = extract('poster', 'thumbnail')
formats = []
for source in sources:
if not isinstance(source, dict):
continue
source_url = url_or_none(source.get('file'))
for format_id in ('hls', 'progressive'):
source_url = url_or_none(extract(format_id))
if not source_url:
continue
ext = determine_ext(source_url)
@@ -77,18 +77,19 @@
source_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
else:
height = int_or_none(self._search_regex(
r'(\d+)\.mp4', source_url, 'height', default=None))
formats.append({
'url': source_url,
'format_id': format_id,
'height': height,
})
self._sort_formats(formats)
thumbnail = self._search_regex(
r'image\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'thumbnail', fatal=False, group='url')
return {
'id': video_id,
'title': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats,
}
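
The extract(field) closure above is a handy scraping pattern: one regex template pulls simple quoted JS values out of the player page while tolerating either quote style. In isolation (plain re instead of _search_regex):

import re

def make_field_extractor(webpage):
    def extract(field):
        # matches  field: "value"  or  field: 'value'  in inline player JS
        m = re.search(
            r'\b%s["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % field, webpage)
        return m.group('value') if m else None
    return extract

# extract = make_field_extractor(webpage)
# title, poster = extract('title'), extract('poster')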


@@ -3,6 +3,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
get_element_by_id,
int_or_none,
merge_dicts,
mimetype2ext,
@@ -39,23 +40,15 @@ class AparatIE(InfoExtractor):
webpage = self._download_webpage(url, video_id, fatal=False)
if not webpage:
# Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work
webpage = self._download_webpage(
'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
video_id)
options = self._parse_json(
self._search_regex(
r'options\s*=\s*JSON\.parse\(\s*(["\'])(?P<value>(?:(?!\1).)+)\1\s*\)',
webpage, 'options', group='value'),
video_id)
player = options['plugins']['sabaPlayerPlugin']
options = self._parse_json(self._search_regex(
r'options\s*=\s*({.+?})\s*;', webpage, 'options'), video_id)
formats = []
for sources in player['multiSRC']:
for sources in (options.get('multiSRC') or []):
for item in sources:
if not isinstance(item, dict):
continue
@@ -85,11 +78,12 @@
info = self._search_json_ld(webpage, video_id, default={})
if not info.get('title'):
info['title'] = player['title']
info['title'] = get_element_by_id('videoTitle', webpage) or \
self._html_search_meta(['og:title', 'twitter:title', 'DC.Title', 'title'], webpage, fatal=True)
return merge_dicts(info, {
'id': video_id,
'thumbnail': url_or_none(options.get('poster')),
'duration': int_or_none(player.get('duration')),
'duration': int_or_none(options.get('duration')),
'formats': formats,
})


@@ -9,10 +9,10 @@ from ..utils import (
class AppleConnectIE(InfoExtractor):
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/idsa\.(?P<id>[\w-]+)'
_TEST = {
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/(?:id)?sa\.(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://itunes.apple.com/us/post/idsa.4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'md5': 'e7c38568a01ea45402570e6029206723',
'md5': 'c1d41f72c8bcaf222e089434619316e4',
'info_dict': {
'id': '4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'ext': 'm4v',
@@ -22,7 +22,10 @@ class AppleConnectIE(InfoExtractor):
'upload_date': '20150710',
'timestamp': 1436545535,
},
}
}, {
'url': 'https://itunes.apple.com/us/post/sa.0fe0229f-2457-11e5-9f40-1bb645f2d5d9',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -36,7 +39,7 @@ class AppleConnectIE(InfoExtractor):
video_data = self._parse_json(video_json, video_id)
timestamp = str_to_int(self._html_search_regex(r'data-timestamp="(\d+)"', webpage, 'timestamp'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count', default=None))
return {
'id': video_id,


@@ -0,0 +1,62 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_podcast_url,
int_or_none,
parse_iso8601,
try_get,
)
class ApplePodcastsIE(InfoExtractor):
_VALID_URL = r'https?://podcasts\.apple\.com/(?:[^/]+/)?podcast(?:/[^/]+){1,2}.*?\bi=(?P<id>\d+)'
_TESTS = [{
'url': 'https://podcasts.apple.com/us/podcast/207-whitney-webb-returns/id1135137367?i=1000482637777',
'md5': 'df02e6acb11c10e844946a39e7222b08',
'info_dict': {
'id': '1000482637777',
'ext': 'mp3',
'title': '207 - Whitney Webb Returns',
'description': 'md5:13a73bade02d2e43737751e3987e1399',
'upload_date': '20200705',
'timestamp': 1593921600,
'duration': 6425,
'series': 'The Tim Dillon Show',
}
}, {
'url': 'https://podcasts.apple.com/podcast/207-whitney-webb-returns/id1135137367?i=1000482637777',
'only_matching': True,
}, {
'url': 'https://podcasts.apple.com/podcast/207-whitney-webb-returns?i=1000482637777',
'only_matching': True,
}, {
'url': 'https://podcasts.apple.com/podcast/id1135137367?i=1000482637777',
'only_matching': True,
}]
def _real_extract(self, url):
episode_id = self._match_id(url)
webpage = self._download_webpage(url, episode_id)
ember_data = self._parse_json(self._search_regex(
r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
webpage, 'ember data'), episode_id)
ember_data = ember_data.get(episode_id) or ember_data
episode = ember_data['data']['attributes']
description = episode.get('description') or {}
series = None
for inc in (ember_data.get('included') or []):
if inc.get('type') == 'media/podcast':
series = try_get(inc, lambda x: x['attributes']['name'])
return {
'id': episode_id,
'title': episode['name'],
'url': clean_podcast_url(episode['assetUrl']),
'description': description.get('standard') or description.get('short'),
'timestamp': parse_iso8601(episode.get('releaseDateTime')),
'duration': int_or_none(episode.get('durationInMilliseconds'), 1000),
'series': series,
}
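
The metadata here comes from the serialized Ember data store embedded in the page; the episode id sometimes appears as a top-level key and sometimes the store is already the root object, and the owning show is recovered from the included side-loads. A sketch of that unwrapping with plain re/json:

import json
import re

def parse_shoebox(webpage, episode_id):
    raw = re.search(
        r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
        webpage, re.DOTALL).group(1)
    store = json.loads(raw)
    store = store.get(episode_id) or store  # id key is sometimes absent
    episode = store['data']['attributes']
    series = None
    for inc in store.get('included') or []:
        if inc.get('type') == 'media/podcast':
            series = (inc.get('attributes') or {}).get('name')
    return episode, series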


@@ -2,15 +2,17 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
unified_strdate,
clean_html,
extract_attributes,
unified_strdate,
unified_timestamp,
)
class ArchiveOrgIE(InfoExtractor):
IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'md5': '8af1d4cf447933ed3c7f4871162602db',
@@ -19,8 +21,11 @@ class ArchiveOrgIE(InfoExtractor):
'ext': 'ogg',
'title': '1968 Demo - FJCC Conference Presentation Reel #1',
'description': 'md5:da45c349df039f1cc8075268eb1b5c25',
'upload_date': '19681210',
'uploader': 'SRI International'
'creator': 'SRI International',
'release_date': '19681210',
'uploader': 'SRI International',
'timestamp': 1268695290,
'upload_date': '20100315',
}
}, {
'url': 'https://archive.org/details/Cops1922',
@@ -29,22 +34,43 @@
'id': 'Cops1922',
'ext': 'mp4',
'title': 'Buster Keaton\'s "Cops" (1922)',
'description': 'md5:89e7c77bf5d965dd5c0372cfb49470f6',
'description': 'md5:43a603fd6c5b4b90d12a96b921212b9c',
'timestamp': 1387699629,
'upload_date': '20131222',
}
}, {
'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'only_matching': True,
}, {
'url': 'https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://archive.org/embed/' + video_id, video_id)
jwplayer_playlist = self._parse_json(self._search_regex(
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\)",
webpage, 'jwplayer playlist'), video_id)
info = self._parse_jwplayer_data(
{'playlist': jwplayer_playlist}, video_id, base_url=url)
playlist = None
play8 = self._search_regex(
r'(<[^>]+\bclass=["\']js-play8-playlist[^>]+>)', webpage,
'playlist', default=None)
if play8:
attrs = extract_attributes(play8)
playlist = attrs.get('value')
if not playlist:
# Old jwplayer fallback
playlist = self._search_regex(
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\)",
webpage, 'jwplayer playlist', default='[]')
jwplayer_playlist = self._parse_json(playlist, video_id, fatal=False)
if jwplayer_playlist:
info = self._parse_jwplayer_data(
{'playlist': jwplayer_playlist}, video_id, base_url=url)
else:
# HTML5 media fallback
info = self._parse_html5_media_entries(url, webpage, video_id)[0]
info['id'] = video_id
def get_optional(metadata, field):
return metadata.get(field, [None])[0]
@@ -58,8 +84,12 @@ class ArchiveOrgIE(InfoExtractor):
'description': clean_html(get_optional(metadata, 'description')),
})
if info.get('_type') != 'playlist':
creator = get_optional(metadata, 'creator')
info.update({
'uploader': get_optional(metadata, 'creator'),
'upload_date': unified_strdate(get_optional(metadata, 'date')),
'creator': creator,
'release_date': unified_strdate(get_optional(metadata, 'date')),
'uploader': get_optional(metadata, 'publisher') or creator,
'timestamp': unified_timestamp(get_optional(metadata, 'publicdate')),
'language': get_optional(metadata, 'language'),
})
return info


@@ -0,0 +1,174 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
extract_attributes,
int_or_none,
parse_iso8601,
try_get,
)
class ArcPublishingIE(InfoExtractor):
_UUID_REGEX = r'[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12}'
_VALID_URL = r'arcpublishing:(?P<org>[a-z]+):(?P<id>%s)' % _UUID_REGEX
_TESTS = [{
# https://www.adn.com/politics/2020/11/02/video-senate-candidates-campaign-in-anchorage-on-eve-of-election-day/
'url': 'arcpublishing:adn:8c99cb6e-b29c-4bc9-9173-7bf9979225ab',
'only_matching': True,
}, {
# https://www.bostonglobe.com/video/2020/12/30/metro/footage-released-showing-officer-talking-about-striking-protesters-with-car/
'url': 'arcpublishing:bostonglobe:232b7ae6-7d73-432d-bc0a-85dbf0119ab1',
'only_matching': True,
}, {
# https://www.actionnewsjax.com/video/live-stream/
'url': 'arcpublishing:cmg:cfb1cf1b-3ab5-4d1b-86c5-a5515d311f2a',
'only_matching': True,
}, {
# https://elcomercio.pe/videos/deportes/deporte-total-futbol-peruano-seleccion-peruana-la-valorizacion-de-los-peruanos-en-el-exterior-tras-un-2020-atipico-nnav-vr-video-noticia/
'url': 'arcpublishing:elcomercio:27a7e1f8-2ec7-4177-874f-a4feed2885b3',
'only_matching': True,
}, {
# https://www.clickondetroit.com/video/community/2020/05/15/events-surrounding-woodward-dream-cruise-being-canceled/
'url': 'arcpublishing:gmg:c8793fb2-8d44-4242-881e-2db31da2d9fe',
'only_matching': True,
}, {
# https://www.wabi.tv/video/2020/12/30/trenton-company-making-equipment-pfizer-covid-vaccine/
'url': 'arcpublishing:gray:0b0ba30e-032a-4598-8810-901d70e6033e',
'only_matching': True,
}, {
# https://www.lateja.cr/el-mundo/video-china-aprueba-con-condiciones-su-primera/dfcbfa57-527f-45ff-a69b-35fe71054143/video/
'url': 'arcpublishing:gruponacion:dfcbfa57-527f-45ff-a69b-35fe71054143',
'only_matching': True,
}, {
# https://www.fifthdomain.com/video/2018/03/09/is-america-vulnerable-to-a-cyber-attack/
'url': 'arcpublishing:mco:aa0ca6fe-1127-46d4-b32c-be0d6fdb8055',
'only_matching': True,
}, {
# https://www.vl.no/kultur/2020/12/09/en-melding-fra-en-lytter-endret-julelista-til-lewi-bergrud/
'url': 'arcpublishing:mentormedier:47a12084-650b-4011-bfd0-3699b6947b2d',
'only_matching': True,
}, {
# https://www.14news.com/2020/12/30/whiskey-theft-caught-camera-henderson-liquor-store/
'url': 'arcpublishing:raycom:b89f61f8-79fa-4c09-8255-e64237119bf7',
'only_matching': True,
}, {
# https://www.theglobeandmail.com/world/video-ethiopian-woman-who-became-symbol-of-integration-in-italy-killed-on/
'url': 'arcpublishing:tgam:411b34c1-8701-4036-9831-26964711664b',
'only_matching': True,
}, {
# https://www.pilotonline.com/460f2931-8130-4719-8ea1-ffcb2d7cb685-132.html
'url': 'arcpublishing:tronc:460f2931-8130-4719-8ea1-ffcb2d7cb685',
'only_matching': True,
}]
_POWA_DEFAULTS = [
(['cmg', 'prisa'], '%s-config-prod.api.cdn.arcpublishing.com/video'),
([
'adn', 'advancelocal', 'answers', 'bonnier', 'bostonglobe', 'demo',
'gmg', 'gruponacion', 'infobae', 'mco', 'nzme', 'pmn', 'raycom',
'spectator', 'tbt', 'tgam', 'tronc', 'wapo', 'wweek',
], 'video-api-cdn.%s.arcpublishing.com/api'),
]
@staticmethod
def _extract_urls(webpage, **kw):
entries = []
# https://arcpublishing.atlassian.net/wiki/spaces/POWA/overview
for powa_el in re.findall(r'(<div[^>]+class="[^"]*\bpowa\b[^"]*"[^>]+data-uuid="%s"[^>]*>)' % ArcPublishingIE._UUID_REGEX, webpage):
powa = extract_attributes(powa_el) or {}
org = powa.get('data-org')
uuid = powa.get('data-uuid')
if org and uuid:
entries.append('arcpublishing:%s:%s' % (org, uuid))
return entries
def _real_extract(self, url):
org, uuid = re.match(self._VALID_URL, url).groups()
for orgs, tmpl in self._POWA_DEFAULTS:
if org in orgs:
base_api_tmpl = tmpl
break
else:
base_api_tmpl = '%s-prod-cdn.video-api.arcpublishing.com/api'
if org == 'wapo':
org = 'washpost'
video = self._download_json(
'https://%s/v1/ansvideos/findByUuid' % (base_api_tmpl % org),
uuid, query={'uuid': uuid})[0]
title = video['headlines']['basic']
is_live = video.get('status') == 'live'
urls = []
formats = []
for s in video.get('streams', []):
s_url = s.get('url')
if not s_url or s_url in urls:
continue
urls.append(s_url)
stream_type = s.get('stream_type')
if stream_type == 'smil':
smil_formats = self._extract_smil_formats(
s_url, uuid, fatal=False)
for f in smil_formats:
if f['url'].endswith('/cfx/st'):
f['app'] = 'cfx/st'
if not f['play_path'].startswith('mp4:'):
f['play_path'] = 'mp4:' + f['play_path']
if isinstance(f['tbr'], float):
f['vbr'] = f['tbr'] * 1000
del f['tbr']
f['format_id'] = 'rtmp-%d' % f['vbr']
formats.extend(smil_formats)
elif stream_type in ('ts', 'hls'):
m3u8_formats = self._extract_m3u8_formats(
s_url, uuid, 'mp4', 'm3u8' if is_live else 'm3u8_native',
m3u8_id='hls', fatal=False)
if all([f.get('acodec') == 'none' for f in m3u8_formats]):
continue
for f in m3u8_formats:
if f.get('acodec') == 'none':
f['preference'] = -40
elif f.get('vcodec') == 'none':
f['preference'] = -50
height = f.get('height')
if not height:
continue
vbr = self._search_regex(
r'[_x]%d[_-](\d+)' % height, f['url'], 'vbr', default=None)
if vbr:
f['vbr'] = int(vbr)
formats.extend(m3u8_formats)
else:
vbr = int_or_none(s.get('bitrate'))
formats.append({
'format_id': '%s-%d' % (stream_type, vbr) if vbr else stream_type,
'vbr': vbr,
'width': int_or_none(s.get('width')),
'height': int_or_none(s.get('height')),
'filesize': int_or_none(s.get('filesize')),
'url': s_url,
'preference': -1,
})
self._sort_formats(
formats, ('preference', 'width', 'height', 'vbr', 'filesize', 'tbr', 'ext', 'format_id'))
subtitles = {}
for subtitle in (try_get(video, lambda x: x['subtitles']['urls'], list) or []):
subtitle_url = subtitle.get('url')
if subtitle_url:
subtitles.setdefault('en', []).append({'url': subtitle_url})
return {
'id': uuid,
'title': self._live_title(title) if is_live else title,
'thumbnail': try_get(video, lambda x: x['promo_image']['url']),
'description': try_get(video, lambda x: x['subheadlines']['basic']),
'formats': formats,
'duration': int_or_none(video.get('duration'), 100),
'timestamp': parse_iso8601(video.get('created_date')),
'subtitles': subtitles,
'is_live': is_live,
}
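
Host selection above relies on Python's for/else: the else branch only runs when no break fired (no _POWA_DEFAULTS group claimed the org), in which case the generic per-org template applies, with the wapo-to-washpost rename afterwards. A trimmed illustration:

POWA_DEFAULTS = [
    (['cmg', 'prisa'], '%s-config-prod.api.cdn.arcpublishing.com/video'),
    (['adn', 'tgam', 'wapo'], 'video-api-cdn.%s.arcpublishing.com/api'),  # list trimmed
]

def arc_api_base(org):
    for orgs, tmpl in POWA_DEFAULTS:
        if org in orgs:
            base_api_tmpl = tmpl
            break
    else:  # no break fired: org is in no group, use the generic template
        base_api_tmpl = '%s-prod-cdn.video-api.arcpublishing.com/api'
    if org == 'wapo':
        org = 'washpost'
    return base_api_tmpl % org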


@@ -187,13 +187,13 @@ class ARDMediathekIE(ARDMediathekBaseIE):
if doc.tag == 'rss':
return GenericIE()._extract_rss(url, video_id, doc)
title = self._html_search_regex(
title = self._og_search_title(webpage, default=None) or self._html_search_regex(
[r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>',
r'<meta name="dcterms\.title" content="(.*?)"/>',
r'<h4 class="headline">(.*?)</h4>',
r'<title[^>]*>(.*?)</title>'],
webpage, 'title')
description = self._html_search_meta(
description = self._og_search_description(webpage, default=None) or self._html_search_meta(
'dcterms.abstract', webpage, 'description', default=None)
if description is None:
description = self._html_search_meta(
@ -249,31 +249,40 @@ class ARDMediathekIE(ARDMediathekBaseIE):
class ARDIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_VALID_URL = r'(?P<mainurl>https?://(?:www\.)?daserste\.de/(?:[^/?#&]+/)+(?P<id>[^/?#&]+))\.html'
_TESTS = [{
# available till 14.02.2019
'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
'md5': '8e4ec85f31be7c7fc08a26cdbc5a1f49',
# available till 7.01.2022
'url': 'https://www.daserste.de/information/talk/maischberger/videos/maischberger-die-woche-video100.html',
'md5': '867d8aa39eeaf6d76407c5ad1bb0d4c1',
'info_dict': {
'display_id': 'das-groko-drama-zerlegen-sich-die-volksparteien-video',
'id': '102',
'id': 'maischberger-die-woche-video100',
'display_id': 'maischberger-die-woche-video100',
'ext': 'mp4',
'duration': 4435.0,
'title': 'Das GroKo-Drama: Zerlegen sich die Volksparteien?',
'upload_date': '20180214',
'duration': 3687.0,
'title': 'maischberger. die woche vom 7. Januar 2021',
'upload_date': '20210107',
'thumbnail': r're:^https?://.*\.jpg$',
},
}, {
'url': 'https://www.daserste.de/information/reportage-dokumentation/erlebnis-erde/videosextern/woelfe-und-herdenschutzhunde-ungleiche-brueder-102.html',
'url': 'https://www.daserste.de/information/politik-weltgeschehen/morgenmagazin/videosextern/dominik-kahun-aus-der-nhl-direkt-zur-weltmeisterschaft-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/information/nachrichten-wetter/tagesthemen/videosextern/tagesthemen-17736.html',
'only_matching': True,
}, {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/unterhaltung/serie/in-aller-freundschaft-die-jungen-aerzte/Drehpause-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/unterhaltung/film/filmmittwoch-im-ersten/videos/making-ofwendezeit-video-100.html',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
display_id = mobj.group('id')
player_url = mobj.group('mainurl') + '~playerXml.xml'
doc = self._download_xml(player_url, display_id)
@ -284,25 +293,47 @@ class ARDIE(InfoExtractor):
formats = []
for a in video_node.findall('.//asset'):
file_name = xpath_text(a, './fileName', default=None)
if not file_name:
continue
format_type = a.attrib.get('type')
format_url = url_or_none(file_name)
if format_url:
ext = determine_ext(file_name)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=format_type or 'hls', fatal=False))
continue
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
update_url_query(format_url, {'hdcore': '3.7.0'}),
display_id, f4m_id=format_type or 'hds', fatal=False))
continue
f = {
'format_id': a.attrib['type'],
'width': int_or_none(a.find('./frameWidth').text),
'height': int_or_none(a.find('./frameHeight').text),
'vbr': int_or_none(a.find('./bitrateVideo').text),
'abr': int_or_none(a.find('./bitrateAudio').text),
'vcodec': a.find('./codecVideo').text,
'tbr': int_or_none(a.find('./totalBitrate').text),
'format_id': format_type,
'width': int_or_none(xpath_text(a, './frameWidth')),
'height': int_or_none(xpath_text(a, './frameHeight')),
'vbr': int_or_none(xpath_text(a, './bitrateVideo')),
'abr': int_or_none(xpath_text(a, './bitrateAudio')),
'vcodec': xpath_text(a, './codecVideo'),
'tbr': int_or_none(xpath_text(a, './totalBitrate')),
}
if a.find('./serverPrefix').text:
f['url'] = a.find('./serverPrefix').text
f['playpath'] = a.find('./fileName').text
server_prefix = xpath_text(a, './serverPrefix', default=None)
if server_prefix:
f.update({
'url': server_prefix,
'playpath': file_name,
})
else:
f['url'] = a.find('./fileName').text
if not format_url:
continue
f['url'] = format_url
formats.append(f)
self._sort_formats(formats)
return {
'id': mobj.group('id'),
'id': xpath_text(video_node, './videoId', default=display_id),
'formats': formats,
'display_id': display_id,
'title': video_node.find('./title').text,
@ -313,19 +344,19 @@ class ARDIE(InfoExtractor):
class ARDBetaMediathekIE(ARDMediathekBaseIE):
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?:[^/]+/)?(?:player|live|video)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'https://ardmediathek.de/ard/video/die-robuste-roswita/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f',
'url': 'https://www.ardmediathek.de/mdr/video/die-robuste-roswita/Y3JpZDovL21kci5kZS9iZWl0cmFnL2Ntcy84MWMxN2MzZC0wMjkxLTRmMzUtODk4ZS0wYzhlOWQxODE2NGI/',
'md5': 'a1dc75a39c61601b980648f7c9f9f71d',
'info_dict': {
'display_id': 'die-robuste-roswita',
'id': '70153354',
'id': '78566716',
'title': 'Die robuste Roswita',
'description': r're:^Der Mord.*trüber ist als die Ilm.',
'description': r're:^Der Mord.*totgeglaubte Ehefrau Roswita',
'duration': 5316,
'thumbnail': 'https://img.ardmediathek.de/standard/00/70/15/33/90/-1852531467/16x9/960?mandant=ard',
'timestamp': 1577047500,
'upload_date': '20191222',
'thumbnail': 'https://img.ardmediathek.de/standard/00/78/56/67/84/575672121/16x9/960?mandant=ard',
'timestamp': 1596658200,
'upload_date': '20200805',
'ext': 'mp4',
},
}, {
@ -343,22 +374,22 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
}, {
'url': 'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/video/coronavirus-update-ndr-info/astrazeneca-kurz-lockdown-und-pims-syndrom-81/ndr/Y3JpZDovL25kci5kZS84NzE0M2FjNi0wMWEwLTQ5ODEtOTE5NS1mOGZhNzdhOTFmOTI/',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3dkci5kZS9CZWl0cmFnLWQ2NDJjYWEzLTMwZWYtNGI4NS1iMTI2LTU1N2UxYTcxOGIzOQ/tatort-duo-koeln-leipzig-ihr-kinderlein-kommet',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
display_id = mobj.group('display_id')
if display_id:
display_id = display_id.rstrip('/')
if not display_id:
display_id = video_id
video_id = self._match_id(url)
player_page = self._download_json(
'https://api.ardmediathek.de/public-gateway',
display_id, data=json.dumps({
video_id, data=json.dumps({
'query': '''{
playerPage(client:"%s", clipId: "%s") {
playerPage(client: "ard", clipId: "%s") {
blockedByFsk
broadcastedOn
maturityContentRating
@ -388,7 +419,7 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
}
}
}
}''' % (mobj.group('client'), video_id),
}''' % video_id,
}).encode(), headers={
'Content-Type': 'application/json'
})['data']['playerPage']
@ -413,7 +444,6 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
info.update({
'age_limit': age_limit,
'display_id': display_id,
'title': title,
'description': description,
'timestamp': unified_timestamp(player_page.get('broadcastedOn')),
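The rewritten `ARDBetaMediathekIE` pins the GraphQL client to `"ard"` and keys everything on the crid-style id. A rough standalone sketch of the same public-gateway call, querying only a couple of fields (endpoint and field names taken from the hunk above, so treat the exact response shape as an assumption):

```python
import json
import urllib.request

video_id = 'Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg'  # illustrative clip id
query = '''{
  playerPage(client: "ard", clipId: "%s") {
    title
    broadcastedOn
  }
}''' % video_id

req = urllib.request.Request(
    'https://api.ardmediathek.de/public-gateway',
    data=json.dumps({'query': query}).encode(),
    headers={'Content-Type': 'application/json'})
player_page = json.load(urllib.request.urlopen(req))['data']['playerPage']
print(player_page.get('title'), player_page.get('broadcastedOn'))
```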

@ -103,7 +103,7 @@ class ArkenaIE(InfoExtractor):
f_url, video_id, mpd_id=kind, fatal=False))
elif kind == 'silverlight':
# TODO: process when ism is supported (see
# https://github.com/ytdl-org/haruhi-dl/issues/8118)
# https://github.com/ytdl-org/youtube-dl/issues/8118)
continue
else:
tbr = float_or_none(f.get('Bitrate'), 1000)
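For context, `float_or_none(f.get('Bitrate'), 1000)` converts the API's bits-per-second value into the kbps that `tbr` expects. The utils helper behaves like this sketch:

```python
def float_or_none(v, scale=1, invscale=1, default=None):
    # return v scaled to a float, or default when v is missing/unparsable
    if v is None:
        return default
    try:
        return float(v) * invscale / scale
    except (ValueError, TypeError):
        return default

assert float_or_none('128000', 1000) == 128.0
assert float_or_none(None, 1000) is None
```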

@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
float_or_none,
int_or_none,
parse_iso8601,
remove_start,
)
class ArnesIE(InfoExtractor):
IE_NAME = 'video.arnes.si'
IE_DESC = 'Arnes Video'
_VALID_URL = r'https?://video\.arnes\.si/(?:[a-z]{2}/)?(?:watch|embed|api/(?:asset|public/video))/(?P<id>[0-9a-zA-Z]{12})'
_TESTS = [{
'url': 'https://video.arnes.si/watch/a1qrWTOQfVoU?t=10',
'md5': '4d0f4d0a03571b33e1efac25fd4a065d',
'info_dict': {
'id': 'a1qrWTOQfVoU',
'ext': 'mp4',
'title': 'Linearna neodvisnost, definicija',
'description': 'Linearna neodvisnost, definicija',
'license': 'PRIVATE',
'creator': 'Polona Oblak',
'timestamp': 1585063725,
'upload_date': '20200324',
'channel': 'Polona Oblak',
'channel_id': 'q6pc04hw24cj',
'channel_url': 'https://video.arnes.si/?channel=q6pc04hw24cj',
'duration': 596.75,
'view_count': int,
'tags': ['linearna_algebra'],
'start_time': 10,
}
}, {
'url': 'https://video.arnes.si/api/asset/s1YjnV7hadlC/play.mp4',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/en/watch/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC?t=123&hideRelated=1',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/api/public/video/s1YjnV7hadlC',
'only_matching': True,
}]
_BASE_URL = 'https://video.arnes.si'
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
self._BASE_URL + '/api/public/video/' + video_id, video_id)['data']
title = video['title']
formats = []
for media in (video.get('media') or []):
media_url = media.get('url')
if not media_url:
continue
formats.append({
'url': self._BASE_URL + media_url,
'format_id': remove_start(media.get('format'), 'FORMAT_'),
'format_note': media.get('formatTranslation'),
'width': int_or_none(media.get('width')),
'height': int_or_none(media.get('height')),
})
self._sort_formats(formats)
channel = video.get('channel') or {}
channel_id = channel.get('url')
thumbnail = video.get('thumbnailUrl')
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': self._BASE_URL + thumbnail,
'description': video.get('description'),
'license': video.get('license'),
'creator': video.get('author'),
'timestamp': parse_iso8601(video.get('creationTime')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': self._BASE_URL + '/?channel=' + channel_id if channel_id else None,
'duration': float_or_none(video.get('duration'), 1000),
'view_count': int_or_none(video.get('views')),
'tags': video.get('hashtags'),
'start_time': int_or_none(compat_parse_qs(
compat_urllib_parse_urlparse(url).query).get('t', [None])[0]),
}
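The `start_time` handling at the end of the new extractor is the usual query-string dance; a Python 3 equivalent for reference:

```python
from urllib.parse import parse_qs, urlparse

def extract_start_time(url):
    # optional ?t=<seconds> offset, e.g. https://video.arnes.si/watch/a1qrWTOQfVoU?t=10
    t = parse_qs(urlparse(url).query).get('t', [None])[0]
    return int(t) if t and t.isdigit() else None

assert extract_start_time('https://video.arnes.si/watch/a1qrWTOQfVoU?t=10') == 10
assert extract_start_time('https://video.arnes.si/watch/a1qrWTOQfVoU') is None
```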

@ -4,23 +4,57 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
ExtractorError,
int_or_none,
qualities,
try_get,
unified_strdate,
url_or_none,
)
# There are different sources of video in arte.tv; the extraction process
# is different for each one. The videos usually expire in 7 days, so we can't
# add tests.
class ArteTVBaseIE(InfoExtractor):
def _extract_from_json_url(self, json_url, video_id, lang, title=None):
info = self._download_json(json_url, video_id)
_ARTE_LANGUAGES = 'fr|de|en|es|it|pl'
_API_BASE = 'https://api.arte.tv/api/player/v1'
class ArteTVIE(ArteTVBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?arte\.tv/(?P<lang>%(langs)s)/videos|
api\.arte\.tv/api/player/v\d+/config/(?P<lang_2>%(langs)s)
)
/(?P<id>\d{6}-\d{3}-[AF])
''' % {'langs': ArteTVBaseIE._ARTE_LANGUAGES}
_TESTS = [{
'url': 'https://www.arte.tv/en/videos/088501-000-A/mexico-stealing-petrol-to-survive/',
'info_dict': {
'id': '088501-000-A',
'ext': 'mp4',
'title': 'Mexico: Stealing Petrol to Survive',
'upload_date': '20190628',
},
}, {
'url': 'https://www.arte.tv/pl/videos/100103-000-A/usa-dyskryminacja-na-porodowce/',
'only_matching': True,
}, {
'url': 'https://api.arte.tv/api/player/v2/config/de/100605-013-A',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
lang = mobj.group('lang') or mobj.group('lang_2')
info = self._download_json(
'%s/config/%s/%s' % (self._API_BASE, lang, video_id), video_id)
player_info = info['videoJsonPlayer']
vsr = try_get(player_info, lambda x: x['VSR'], dict)
@ -37,18 +71,11 @@ class ArteTVBaseIE(InfoExtractor):
if not upload_date_str:
upload_date_str = (player_info.get('VRA') or player_info.get('VDA') or '').split(' ')[0]
title = (player_info.get('VTI') or title or player_info['VID']).strip()
title = (player_info.get('VTI') or player_info['VID']).strip()
subtitle = player_info.get('VSU', '').strip()
if subtitle:
title += ' - %s' % subtitle
info_dict = {
'id': player_info['VID'],
'title': title,
'description': player_info.get('VDE'),
'upload_date': unified_strdate(upload_date_str),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
qfunc = qualities(['MQ', 'HQ', 'EQ', 'SQ'])
LANGS = {
@ -65,6 +92,10 @@ class ArteTVBaseIE(InfoExtractor):
formats = []
for format_id, format_dict in vsr.items():
f = dict(format_dict)
format_url = url_or_none(f.get('url'))
streamer = f.get('streamer')
if not format_url and not streamer:
continue
versionCode = f.get('versionCode')
l = re.escape(langcode)
@ -107,6 +138,16 @@ class ArteTVBaseIE(InfoExtractor):
else:
lang_pref = -1
media_type = f.get('mediaType')
if media_type == 'hls':
m3u8_formats = self._extract_m3u8_formats(
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)
for m3u8_format in m3u8_formats:
m3u8_format['language_preference'] = lang_pref
formats.extend(m3u8_formats)
continue
format = {
'format_id': format_id,
'preference': -10 if f.get('videoFormat') == 'M3U8' else None,
@ -118,7 +159,7 @@ class ArteTVBaseIE(InfoExtractor):
'quality': qfunc(f.get('quality')),
}
if f.get('mediaType') == 'rtmp':
if media_type == 'rtmp':
format['url'] = f['streamer']
format['play_path'] = 'mp4:' + f['url']
format['ext'] = 'flv'
@ -127,56 +168,50 @@ class ArteTVBaseIE(InfoExtractor):
formats.append(format)
self._check_formats(formats, video_id)
self._sort_formats(formats)
info_dict['formats'] = formats
return info_dict
return {
'id': player_info.get('VID') or video_id,
'title': title,
'description': player_info.get('VDE'),
'upload_date': unified_strdate(upload_date_str),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
'formats': formats,
}
class ArteTVPlus7IE(ArteTVBaseIE):
IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>\d{6}-\d{3}-[AF])'
class ArteTVEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?arte\.tv/player/v\d+/index\.php\?.*?\bjson_url=.+'
_TESTS = [{
'url': 'https://www.arte.tv/en/videos/088501-000-A/mexico-stealing-petrol-to-survive/',
'url': 'https://www.arte.tv/player/v5/index.php?json_url=https%3A%2F%2Fapi.arte.tv%2Fapi%2Fplayer%2Fv2%2Fconfig%2Fde%2F100605-013-A&lang=de&autoplay=true&mute=0100605-013-A',
'info_dict': {
'id': '088501-000-A',
'id': '100605-013-A',
'ext': 'mp4',
'title': 'Mexico: Stealing Petrol to Survive',
'upload_date': '20190628',
'title': 'United we Stream November Lockdown Edition #13',
'description': 'md5:be40b667f45189632b78c1425c7c2ce1',
'upload_date': '20201116',
},
}, {
'url': 'https://www.arte.tv/player/v3/index.php?json_url=https://api.arte.tv/api/player/v2/config/de/100605-013-A',
'only_matching': True,
}]
def _real_extract(self, url):
lang, video_id = re.match(self._VALID_URL, url).groups()
return self._extract_from_json_url(
'https://api.arte.tv/api/player/v1/config/%s/%s' % (lang, video_id),
video_id, lang)
class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)
https://www\.arte\.tv
/player/v3/index\.php\?json_url=
(?P<json_url>
https?://api\.arte\.tv/api/player/v1/config/
(?P<lang>[^/]+)/(?P<id>\d{6}-\d{3}-[AF])
)
'''
_TESTS = []
@staticmethod
def _extract_urls(webpage):
return [url for _, url in re.findall(
r'<(?:iframe|script)[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?arte\.tv/player/v\d+/index\.php\?.*?\bjson_url=.+?)\1',
webpage)]
def _real_extract(self, url):
json_url, lang, video_id = re.match(self._VALID_URL, url).groups()
return self._extract_from_json_url(json_url, video_id, lang)
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
json_url = qs['json_url'][0]
video_id = ArteTVIE._match_id(json_url)
return self.url_result(
json_url, ie=ArteTVIE.ie_key(), video_id=video_id)
class ArteTVPlaylistIE(ArteTVBaseIE):
IE_NAME = 'arte.tv:playlist'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>RC-\d{6})'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>%s)/videos/(?P<id>RC-\d{6})' % ArteTVBaseIE._ARTE_LANGUAGES
_TESTS = [{
'url': 'https://www.arte.tv/en/videos/RC-016954/earn-a-living/',
'info_dict': {
@ -185,17 +220,35 @@ class ArteTVPlaylistIE(ArteTVBaseIE):
'description': 'md5:d322c55011514b3a7241f7fb80d494c2',
},
'playlist_mincount': 6,
}, {
'url': 'https://www.arte.tv/pl/videos/RC-014123/arte-reportage/',
'only_matching': True,
}]
def _real_extract(self, url):
lang, playlist_id = re.match(self._VALID_URL, url).groups()
collection = self._download_json(
'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos'
% (lang, playlist_id), playlist_id)
'%s/collectionData/%s/%s?source=videos'
% (self._API_BASE, lang, playlist_id), playlist_id)
entries = []
for video in collection['videos']:
if not isinstance(video, dict):
continue
video_url = url_or_none(video.get('url')) or url_or_none(video.get('jsonUrl'))
if not video_url:
continue
video_id = video.get('programId')
entries.append({
'_type': 'url_transparent',
'url': video_url,
'id': video_id,
'title': video.get('title'),
'alt_title': video.get('subtitle'),
'thumbnail': url_or_none(try_get(video, lambda x: x['mainImage']['url'], compat_str)),
'duration': int_or_none(video.get('durationSeconds')),
'view_count': int_or_none(video.get('views')),
'ie_key': ArteTVIE.ie_key(),
})
title = collection.get('title')
description = collection.get('shortDescription') or collection.get('teaserText')
entries = [
self._extract_from_json_url(
video['jsonUrl'], video.get('programId') or playlist_id, lang)
for video in collection['videos'] if video.get('jsonUrl')]
return self.playlist_result(entries, playlist_id, title, description)
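The playlist entries above are built as `url_transparent` results: extraction is delegated to `ArteTVIE`, but non-empty metadata collected here (title, duration, view count) takes precedence over the delegated extractor's values. Shape of one such entry, with illustrative values:

```python
entry = {
    '_type': 'url_transparent',  # delegate extraction, keep this metadata
    'url': 'https://api.arte.tv/api/player/v1/config/en/088501-000-A',  # the video's jsonUrl
    'id': '088501-000-A',
    'title': 'Mexico: Stealing Petrol to Survive',
    'duration': 1320,
    'ie_key': 'ArteTV',
}
```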

@ -1,27 +1,91 @@
# coding: utf-8
from __future__ import unicode_literals
import functools
import re
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import extract_attributes
from ..utils import (
extract_attributes,
int_or_none,
OnDemandPagedList,
parse_age_limit,
strip_or_none,
try_get,
)
class AsianCrushIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|cocoro\.tv))'
_VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % _VALID_URL_BASE
class AsianCrushBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|(?:cocoro|retrocrush)\.tv))'
_KALTURA_KEYS = [
'video_url', 'progressive_url', 'download_url', 'thumbnail_url',
'widescreen_thumbnail_url', 'screencap_widescreen',
]
_API_SUFFIX = {'retrocrush.tv': '-ott'}
def _call_api(self, host, endpoint, video_id, query, resource):
return self._download_json(
'https://api%s.%s/%s' % (self._API_SUFFIX.get(host, ''), host, endpoint), video_id,
'Downloading %s JSON metadata' % resource, query=query,
headers=self.geo_verification_headers())['objects']
def _download_object_data(self, host, object_id, resource):
return self._call_api(
host, 'search', object_id, {'id': object_id}, resource)[0]
def _get_object_description(self, obj):
return strip_or_none(obj.get('long_description') or obj.get('short_description'))
def _parse_video_data(self, video):
title = video['name']
entry_id, partner_id = [None] * 2
for k in self._KALTURA_KEYS:
k_url = video.get(k)
if k_url:
mobj = re.search(r'/p/(\d+)/.+?/entryId/([^/]+)/', k_url)
if mobj:
partner_id, entry_id = mobj.groups()
break
meta_categories = try_get(video, lambda x: x['meta']['categories'], list) or []
categories = list(filter(None, [c.get('name') for c in meta_categories]))
show_info = video.get('show_info') or {}
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, entry_id),
'ie_key': KalturaIE.ie_key(),
'id': entry_id,
'title': title,
'description': self._get_object_description(video),
'age_limit': parse_age_limit(video.get('mpaa_rating') or video.get('tv_rating')),
'categories': categories,
'series': show_info.get('show_name'),
'season_number': int_or_none(show_info.get('season_num')),
'season_id': show_info.get('season_id'),
'episode_number': int_or_none(show_info.get('episode_num')),
}
class AsianCrushIE(AsianCrushBaseIE):
_VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % AsianCrushBaseIE._VALID_URL_BASE
_TESTS = [{
'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/',
'url': 'https://www.asiancrush.com/video/004289v/women-who-flirt',
'md5': 'c3b740e48d0ba002a42c0b72857beae6',
'info_dict': {
'id': '1_y4tmjm5r',
'ext': 'mp4',
'title': 'Women Who Flirt',
'description': 'md5:7e986615808bcfb11756eb503a751487',
'description': 'md5:b65c7e0ae03a85585476a62a186f924c',
'timestamp': 1496936429,
'upload_date': '20170608',
'uploader_id': 'craig@crifkin.com',
'age_limit': 13,
'categories': 'count:5',
'duration': 5812,
},
}, {
'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/',
@ -41,67 +105,35 @@ class AsianCrushIE(InfoExtractor):
}, {
'url': 'https://www.cocoro.tv/video/the-wonderful-wizard-of-oz/008878v-the-wonderful-wizard-of-oz-ep01/',
'only_matching': True,
}, {
'url': 'https://www.retrocrush.tv/video/true-tears/012328v-i...gave-away-my-tears',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
host, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id)
entry_id, partner_id, title = [None] * 3
vars = self._parse_json(
self._search_regex(
if host == 'cocoro.tv':
webpage = self._download_webpage(url, video_id)
embed_vars = self._parse_json(self._search_regex(
r'iEmbedVars\s*=\s*({.+?})', webpage, 'embed vars',
default='{}'), video_id, fatal=False)
if vars:
entry_id = vars.get('entry_id')
partner_id = vars.get('partner_id')
title = vars.get('vid_label')
default='{}'), video_id, fatal=False) or {}
video_id = embed_vars.get('entry_id') or video_id
if not entry_id:
entry_id = self._search_regex(
r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
player = self._download_webpage(
'https://api.%s/embeddedVideoPlayer' % host, video_id,
query={'id': entry_id})
kaltura_id = self._search_regex(
r'entry_id["\']\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1', player,
'kaltura id', group='id')
if not partner_id:
partner_id = self._search_regex(
r'/p(?:artner_id)?/(\d+)', player, 'partner id',
default='513551')
description = self._html_search_regex(
r'(?s)<div[^>]+\bclass=["\']description["\'][^>]*>(.+?)</div>',
webpage, 'description', fatal=False)
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, kaltura_id),
'ie_key': KalturaIE.ie_key(),
'id': video_id,
'title': title,
'description': description,
}
video = self._download_object_data(host, video_id, 'video')
return self._parse_video_data(video)
class AsianCrushPlaylistIE(InfoExtractor):
_VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushIE._VALID_URL_BASE
class AsianCrushPlaylistIE(AsianCrushBaseIE):
_VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushBaseIE._VALID_URL_BASE
_TESTS = [{
'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/',
'url': 'https://www.asiancrush.com/series/006447s/fruity-samurai',
'info_dict': {
'id': '12481',
'title': 'Scholar Who Walks the Night',
'description': 'md5:7addd7c5132a09fd4741152d96cce886',
'id': '6447',
'title': 'Fruity Samurai',
'description': 'md5:7535174487e4a202d3872a7fc8f2f154',
},
'playlist_count': 20,
'playlist_count': 13,
}, {
'url': 'https://www.yuyutv.com/series/013920s/peep-show/',
'only_matching': True,
@ -111,35 +143,58 @@ class AsianCrushPlaylistIE(InfoExtractor):
}, {
'url': 'https://www.cocoro.tv/series/008549s/the-wonderful-wizard-of-oz/',
'only_matching': True,
}, {
'url': 'https://www.retrocrush.tv/series/012355s/true-tears',
'only_matching': True,
}]
_PAGE_SIZE = 1000000000
def _fetch_page(self, domain, parent_id, page):
videos = self._call_api(
domain, 'getreferencedobjects', parent_id, {
'max': self._PAGE_SIZE,
'object_type': 'video',
'parent_id': parent_id,
'start': page * self._PAGE_SIZE,
}, 'page %d' % (page + 1))
for video in videos:
yield self._parse_video_data(video)
def _real_extract(self, url):
playlist_id = self._match_id(url)
host, playlist_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, playlist_id)
if host == 'cocoro.tv':
webpage = self._download_webpage(url, playlist_id)
entries = []
entries = []
for mobj in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>%s.*?)\1[^>]*>' % AsianCrushIE._VALID_URL,
webpage):
attrs = extract_attributes(mobj.group(0))
if attrs.get('class') == 'clearfix':
entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key()))
for mobj in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>%s.*?)\1[^>]*>' % AsianCrushIE._VALID_URL,
webpage):
attrs = extract_attributes(mobj.group(0))
if attrs.get('class') == 'clearfix':
entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key()))
title = self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
if title:
title = re.sub(r'\s*\|\s*.+?$', '', title)
title = self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
if title:
title = re.sub(r'\s*\|\s*.+?$', '', title)
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'twitter:description', webpage, 'description', fatal=False)
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'twitter:description', webpage, 'description', fatal=False)
else:
show = self._download_object_data(host, playlist_id, 'show')
title = show.get('name')
description = self._get_object_description(show)
entries = OnDemandPagedList(
functools.partial(self._fetch_page, host, playlist_id),
self._PAGE_SIZE)
return self.playlist_result(entries, playlist_id, title, description)
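The playlist rewrite swaps eager page scraping for `OnDemandPagedList`, which invokes the page-fetching callable lazily, only for the slices actually requested (relevant with `--playlist-items`). A toy sketch of the contract, assuming the utils API from this repo:

```python
from haruhi_dl.utils import OnDemandPagedList

PAGE_SIZE = 3

def fetch_page(page):
    # page is 0-based; yield this page's entries (7 dummy items total)
    start = page * PAGE_SIZE
    for i in range(start, min(start + PAGE_SIZE, 7)):
        yield {'id': str(i)}

entries = OnDemandPagedList(fetch_page, PAGE_SIZE)
print([e['id'] for e in entries.getslice(0, 5)])  # only pages 0 and 1 are fetched
```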

@ -48,6 +48,7 @@ class AWAANBaseIE(InfoExtractor):
'duration': int_or_none(video_data.get('duration')),
'timestamp': parse_iso8601(video_data.get('create_time'), ' '),
'is_live': is_live,
'uploader_id': video_data.get('user_id'),
}
@ -107,6 +108,7 @@ class AWAANLiveIE(AWAANBaseIE):
'title': 're:Dubai Al Oula [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'upload_date': '20150107',
'timestamp': 1420588800,
'uploader_id': '71',
},
'params': {
# m3u8 download

@ -47,7 +47,7 @@ class AZMedienIE(InfoExtractor):
'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True
}]
_API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'
_API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/a4016f65fe62b81dc6664dd9f4910e4ab40383be'
_PARTNER_ID = '1719221'
def _real_extract(self, url):

@ -0,0 +1,37 @@
# coding: utf-8
from __future__ import unicode_literals
from .brightcove import BrightcoveNewIE
from ..utils import extract_attributes
class BandaiChannelIE(BrightcoveNewIE):
IE_NAME = 'bandaichannel'
_VALID_URL = r'https?://(?:www\.)?b-ch\.com/titles/(?P<id>\d+/\d+)'
_TESTS = [{
'url': 'https://www.b-ch.com/titles/514/001',
'md5': 'a0f2d787baa5729bed71108257f613a4',
'info_dict': {
'id': '6128044564001',
'ext': 'mp4',
'title': 'メタルファイターMIKU 第1話',
'timestamp': 1580354056,
'uploader_id': '5797077852001',
'upload_date': '20200130',
'duration': 1387.733,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
attrs = extract_attributes(self._search_regex(
r'(<video-js[^>]+\bid="bcplayer"[^>]*>)', webpage, 'player'))
bc = self._download_json(
'https://pbifcd.b-ch.com/v1/playbackinfo/ST/70/' + attrs['data-info'],
video_id, headers={'X-API-KEY': attrs['data-auth'].strip()})['bc']
return self._parse_brightcove_metadata(bc, bc['id'])
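The new extractor reads the Brightcove player config straight off the `<video-js>` tag with `extract_attributes`; a minimal sketch with made-up attribute values:

```python
from haruhi_dl.utils import extract_attributes

tag = '<video-js id="bcplayer" data-info="514001" data-auth=" token ">'
attrs = extract_attributes(tag)  # parses the start tag into a dict
assert attrs['data-info'] == '514001'
assert attrs['data-auth'].strip() == 'token'
```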

@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import random
@ -5,10 +6,7 @@ import re
import time
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urlparse,
)
from ..compat import compat_str
from ..utils import (
ExtractorError,
float_or_none,
@ -17,30 +15,32 @@ from ..utils import (
parse_filesize,
str_or_none,
try_get,
unescapeHTML,
update_url_query,
unified_strdate,
unified_timestamp,
url_or_none,
urljoin,
)
class BandcampIE(InfoExtractor):
_VALID_URL = r'https?://[^/]+\.bandcamp\.com/track/(?P<title>[^/?#&]+)'
_VALID_URL = r'https?://[^/]+\.bandcamp\.com/track/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://haruhi-dl.bandcamp.com/track/haruhi-dl-test-song',
'md5': 'c557841d5e50261777a6585648adf439',
'info_dict': {
'id': '1812978515',
'ext': 'mp3',
'title': "haruhi-dl \"'/\\\u00e4\u21ad - haruhi-dl test song \"'/\\\u00e4\u21ad",
'title': "haruhi-dl \"'/\\ä↭ - haruhi-dl \"'/\\ä↭ - haruhi-dl test song \"'/\\ä↭",
'duration': 9.8485,
'uploader': 'haruhi-dl "\'/\\ä↭',
'upload_date': '20121129',
'timestamp': 1354224127,
},
'_skip': 'There is a limit of 200 free downloads / month for the test song'
}, {
# free download
'url': 'http://benprunty.bandcamp.com/track/lanius-battle',
'md5': '853e35bf34aa1d6fe2615ae612564b36',
'info_dict': {
'id': '2650410135',
'ext': 'aiff',
@ -49,6 +49,7 @@ class BandcampIE(InfoExtractor):
'uploader': 'Ben Prunty',
'timestamp': 1396508491,
'upload_date': '20140403',
'release_timestamp': 1396483200,
'release_date': '20140403',
'duration': 260.877,
'track': 'Lanius (Battle)',
@ -69,6 +70,7 @@ class BandcampIE(InfoExtractor):
'uploader': 'Mastodon',
'timestamp': 1322005399,
'upload_date': '20111122',
'release_timestamp': 1076112000,
'release_date': '20040207',
'duration': 120.79,
'track': 'Hail to Fire',
@ -79,11 +81,16 @@ class BandcampIE(InfoExtractor):
},
}]
def _extract_data_attr(self, webpage, video_id, attr='tralbum', fatal=True):
return self._parse_json(self._html_search_regex(
r'data-%s=(["\'])({.+?})\1' % attr, webpage,
attr + ' data', group=2), video_id, fatal=fatal)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title = mobj.group('title')
title = self._match_id(url)
webpage = self._download_webpage(url, title)
thumbnail = self._html_search_meta('og:image', webpage, default=None)
tralbum = self._extract_data_attr(webpage, title)
thumbnail = self._og_search_thumbnail(webpage)
track_id = None
track = None
@ -91,10 +98,7 @@ class BandcampIE(InfoExtractor):
duration = None
formats = []
track_info = self._parse_json(
self._search_regex(
r'trackinfo\s*:\s*\[\s*({.+?})\s*\]\s*,\s*?\n',
webpage, 'track info', default='{}'), title)
track_info = try_get(tralbum, lambda x: x['trackinfo'][0], dict)
if track_info:
file_ = track_info.get('file')
if isinstance(file_, dict):
@ -111,37 +115,25 @@ class BandcampIE(InfoExtractor):
'abr': int_or_none(abr_str),
})
track = track_info.get('title')
track_id = str_or_none(track_info.get('track_id') or track_info.get('id'))
track_id = str_or_none(
track_info.get('track_id') or track_info.get('id'))
track_number = int_or_none(track_info.get('track_num'))
duration = float_or_none(track_info.get('duration'))
def extract(key):
return self._search_regex(
r'\b%s\s*["\']?\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % key,
webpage, key, default=None, group='value')
artist = extract('artist')
album = extract('album_title')
embed = self._extract_data_attr(webpage, title, 'embed', False)
current = tralbum.get('current') or {}
artist = embed.get('artist') or current.get('artist') or tralbum.get('artist')
timestamp = unified_timestamp(
extract('publish_date') or extract('album_publish_date'))
release_date = unified_strdate(extract('album_release_date'))
current.get('publish_date') or tralbum.get('album_publish_date'))
download_link = self._search_regex(
r'freeDownloadPage\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'download link', default=None, group='url')
download_link = tralbum.get('freeDownloadPage')
if download_link:
track_id = self._search_regex(
r'(?ms)var TralbumData = .*?[{,]\s*id: (?P<id>\d+),?$',
webpage, 'track id')
track_id = compat_str(tralbum['id'])
download_webpage = self._download_webpage(
download_link, track_id, 'Downloading free downloads page')
blob = self._parse_json(
self._search_regex(
r'data-blob=(["\'])(?P<blob>{.+?})\1', download_webpage,
'blob', group='blob'),
track_id, transform_source=unescapeHTML)
blob = self._extract_data_attr(download_webpage, track_id, 'blob')
info = try_get(
blob, (lambda x: x['digital_items'][0],
@ -207,20 +199,20 @@ class BandcampIE(InfoExtractor):
'thumbnail': thumbnail,
'uploader': artist,
'timestamp': timestamp,
'release_date': release_date,
'release_timestamp': unified_timestamp(tralbum.get('album_release_date')),
'duration': duration,
'track': track,
'track_number': track_number,
'track_id': track_id,
'artist': artist,
'album': album,
'album': embed.get('album_title'),
'formats': formats,
}
class BandcampAlbumIE(InfoExtractor):
class BandcampAlbumIE(BandcampIE):
IE_NAME = 'Bandcamp:album'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<album_id>[^/?#&]+))?'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<id>[^/?#&]+))?'
_TESTS = [{
'url': 'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1',
@ -230,7 +222,10 @@ class BandcampAlbumIE(InfoExtractor):
'info_dict': {
'id': '1353101989',
'ext': 'mp3',
'title': 'Intro',
'title': 'Blazo - Intro',
'timestamp': 1311756226,
'upload_date': '20110727',
'uploader': 'Blazo',
}
},
{
@ -238,7 +233,10 @@ class BandcampAlbumIE(InfoExtractor):
'info_dict': {
'id': '38097443',
'ext': 'mp3',
'title': 'Kero One - Keep It Alive (Blazo remix)',
'title': 'Blazo - Kero One - Keep It Alive (Blazo remix)',
'timestamp': 1311757238,
'upload_date': '20110727',
'uploader': 'Blazo',
}
},
],
@ -274,6 +272,7 @@ class BandcampAlbumIE(InfoExtractor):
'title': '"Entropy" EP',
'uploader_id': 'jstrecords',
'id': 'entropy-ep',
'description': 'md5:0ff22959c943622972596062f2f366a5',
},
'playlist_mincount': 3,
}, {
@ -283,6 +282,7 @@ class BandcampAlbumIE(InfoExtractor):
'id': 'we-are-the-plague',
'title': 'WE ARE THE PLAGUE',
'uploader_id': 'insulters',
'description': 'md5:b3cf845ee41b2b1141dc7bde9237255f',
},
'playlist_count': 2,
}]
@ -294,41 +294,34 @@ class BandcampAlbumIE(InfoExtractor):
else super(BandcampAlbumIE, cls).suitable(url))
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
uploader_id = mobj.group('subdomain')
album_id = mobj.group('album_id')
uploader_id, album_id = re.match(self._VALID_URL, url).groups()
playlist_id = album_id or uploader_id
webpage = self._download_webpage(url, playlist_id)
track_elements = re.findall(
r'(?s)<div[^>]*>(.*?<a[^>]+href="([^"]+?)"[^>]+itemprop="url"[^>]*>.*?)</div>', webpage)
if not track_elements:
tralbum = self._extract_data_attr(webpage, playlist_id)
track_info = tralbum.get('trackinfo')
if not track_info:
raise ExtractorError('The page doesn\'t contain any tracks')
# Only tracks with duration info are playable songs
entries = [
self.url_result(
compat_urlparse.urljoin(url, t_path),
ie=BandcampIE.ie_key(),
video_title=self._search_regex(
r'<span\b[^>]+\bitemprop=["\']name["\'][^>]*>([^<]+)',
elem_content, 'track title', fatal=False))
for elem_content, t_path in track_elements
if self._html_search_meta('duration', elem_content, default=None)]
urljoin(url, t['title_link']), BandcampIE.ie_key(),
str_or_none(t.get('track_id') or t.get('id')), t.get('title'))
for t in track_info
if t.get('duration')]
current = tralbum.get('current') or {}
title = self._html_search_regex(
r'album_title\s*:\s*"((?:\\.|[^"\\])+?)"',
webpage, 'title', fatal=False)
if title:
title = title.replace(r'\"', '"')
return {
'_type': 'playlist',
'uploader_id': uploader_id,
'id': playlist_id,
'title': title,
'title': current.get('title'),
'description': current.get('about'),
'entries': entries,
}
class BandcampWeeklyIE(InfoExtractor):
class BandcampWeeklyIE(BandcampIE):
IE_NAME = 'Bandcamp:weekly'
_VALID_URL = r'https?://(?:www\.)?bandcamp\.com/?\?(?:.*?&)?show=(?P<id>\d+)'
_TESTS = [{
@ -343,29 +336,23 @@ class BandcampWeeklyIE(InfoExtractor):
'release_date': '20170404',
'series': 'Bandcamp Weekly',
'episode': 'Magic Moments',
'episode_number': 208,
'episode_id': '224',
}
},
'params': {
'format': 'opus-lo',
},
}, {
'url': 'https://bandcamp.com/?blah/blah@&show=228',
'only_matching': True
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
show_id = self._match_id(url)
webpage = self._download_webpage(url, show_id)
blob = self._parse_json(
self._search_regex(
r'data-blob=(["\'])(?P<blob>{.+?})\1', webpage,
'blob', group='blob'),
video_id, transform_source=unescapeHTML)
blob = self._extract_data_attr(webpage, show_id, 'blob')
show = blob['bcw_show']
# This is desired because any invalid show id redirects to `bandcamp.com`
# which happens to expose the latest Bandcamp Weekly episode.
show_id = int_or_none(show.get('show_id')) or int_or_none(video_id)
show = blob['bcw_data'][show_id]
formats = []
for format_id, format_url in show['audio_stream'].items():
@ -390,20 +377,8 @@ class BandcampWeeklyIE(InfoExtractor):
if subtitle:
title += ' - %s' % subtitle
episode_number = None
seq = blob.get('bcw_seq')
if seq and isinstance(seq, list):
try:
episode_number = next(
int_or_none(e.get('episode_number'))
for e in seq
if isinstance(e, dict) and int_or_none(e.get('id')) == show_id)
except StopIteration:
pass
return {
'id': video_id,
'id': show_id,
'title': title,
'description': show.get('desc') or show.get('short_desc'),
'duration': float_or_none(show.get('audio_duration')),
@ -411,7 +386,6 @@ class BandcampWeeklyIE(InfoExtractor):
'release_date': unified_strdate(show.get('published_date')),
'series': 'Bandcamp Weekly',
'episode': show.get('subtitle'),
'episode_number': episode_number,
'episode_id': compat_str(video_id),
'episode_id': show_id,
'formats': formats
}
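The Bandcamp rewrite replaces several ad-hoc regexes with a single `_extract_data_attr` helper, since the site now embeds its metadata as JSON in `data-tralbum`, `data-embed` and `data-blob` attributes. A simplified standalone version (real pages HTML-escape the JSON, which `_html_search_regex` unescapes before parsing):

```python
import json
import re

def extract_data_attr(webpage, attr='tralbum'):
    # naive version: assumes the attribute value is already unescaped JSON
    m = re.search(r'data-%s=(["\'])({.+?})\1' % attr, webpage)
    return json.loads(m.group(2)) if m else None

page = '<script data-tralbum=\'{"id": 1812978515, "trackinfo": [{"title": "test"}]}\'></script>'
tralbum = extract_data_attr(page)
print(tralbum['trackinfo'][0]['title'])  # test
```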

@ -1,31 +1,39 @@
# coding: utf-8
from __future__ import unicode_literals
import functools
import itertools
import json
import re
from .common import InfoExtractor
from ..compat import (
compat_etree_Element,
compat_HTTPError,
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
OnDemandPagedList,
clean_html,
dict_get,
ExtractorError,
float_or_none,
get_element_by_class,
int_or_none,
js_to_json,
parse_duration,
parse_iso8601,
strip_or_none,
try_get,
unescapeHTML,
unified_timestamp,
url_or_none,
urlencode_postdata,
urljoin,
)
from ..compat import (
compat_etree_Element,
compat_HTTPError,
compat_urlparse,
)
class BBCCoUkIE(InfoExtractor):
@ -49,22 +57,17 @@ class BBCCoUkIE(InfoExtractor):
_LOGIN_URL = 'https://account.bbc.com/signin'
_NETRC_MACHINE = 'bbc'
_MEDIASELECTOR_URLS = [
_MEDIA_SELECTOR_URL_TEMPL = 'https://open.live.bbc.co.uk/mediaselector/6/select/version/2.0/mediaset/%s/vpid/%s'
_MEDIA_SETS = [
# Provides HQ HLS streams with even better quality than the pc mediaset but fails
# with geolocation in some cases even when it's not geo-restricted at all (e.g.
# http://www.bbc.co.uk/programmes/b06bp7lf). Also may fail with selectionunavailable.
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s',
'iptv-all',
'pc',
]
_MEDIASELECTION_NS = 'http://bbc.co.uk/2008/mp/mediaselection'
_EMP_PLAYLIST_NS = 'http://bbc.co.uk/2008/emp/playlist'
_NAMESPACES = (
_MEDIASELECTION_NS,
_EMP_PLAYLIST_NS,
)
_TESTS = [
{
'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
@ -209,7 +212,7 @@ class BBCCoUkIE(InfoExtractor):
},
'skip': 'Now it\'s really geo-restricted',
}, {
# compact player (https://github.com/ytdl-org/haruhi-dl/issues/8147)
# compact player (https://github.com/ytdl-org/youtube-dl/issues/8147)
'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player',
'info_dict': {
'id': 'p028bfkj',
@ -261,8 +264,6 @@ class BBCCoUkIE(InfoExtractor):
'only_matching': True,
}]
_USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'
def _login(self):
username, password = self._get_login_info()
if username is None:
@ -307,22 +308,14 @@ class BBCCoUkIE(InfoExtractor):
def _extract_items(self, playlist):
return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
def _findall_ns(self, element, xpath):
elements = []
for ns in self._NAMESPACES:
elements.extend(element.findall(xpath % ns))
return elements
def _extract_medias(self, media_selection):
error = media_selection.find('./{%s}error' % self._MEDIASELECTION_NS)
if error is None:
media_selection.find('./{%s}error' % self._EMP_PLAYLIST_NS)
if error is not None:
raise BBCCoUkIE.MediaSelectionError(error.get('id'))
return self._findall_ns(media_selection, './{%s}media')
error = media_selection.get('result')
if error:
raise BBCCoUkIE.MediaSelectionError(error)
return media_selection.get('media') or []
def _extract_connections(self, media):
return self._findall_ns(media, './{%s}connection')
return media.get('connection') or []
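Context for the rewrite above: the media selector endpoint moved from namespaced XML to JSON, so `_extract_medias`/`_extract_connections` become plain dict lookups. The response shape the new code expects looks roughly like this (field names inferred from this diff, not from BBC documentation):

```python
media_selection = {
    'media': [{
        'kind': 'video',
        'bitrate': '5070',
        'connection': [{
            'transferFormat': 'hls',
            'supplier': 'mf_akamai',
            'href': 'https://example.invalid/master.m3u8',
        }],
    }],
}

for media in media_selection.get('media') or []:
    for connection in media.get('connection') or []:
        print(connection.get('transferFormat'), connection.get('href'))
```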
def _get_subtitles(self, media, programme_id):
subtitles = {}
@ -334,13 +327,13 @@ class BBCCoUkIE(InfoExtractor):
cc_url, programme_id, 'Downloading captions', fatal=False)
if not isinstance(captions, compat_etree_Element):
continue
lang = captions.get('{http://www.w3.org/XML/1998/namespace}lang', 'en')
subtitles[lang] = [
subtitles['en'] = [
{
'url': connection.get('href'),
'ext': 'ttml',
},
]
break
return subtitles
def _raise_extractor_error(self, media_selection_error):
@ -350,10 +343,10 @@ class BBCCoUkIE(InfoExtractor):
def _download_media_selector(self, programme_id):
last_exception = None
for mediaselector_url in self._MEDIASELECTOR_URLS:
for media_set in self._MEDIA_SETS:
try:
return self._download_media_selector_url(
mediaselector_url % programme_id, programme_id)
self._MEDIA_SELECTOR_URL_TEMPL % (media_set, programme_id), programme_id)
except BBCCoUkIE.MediaSelectionError as e:
if e.id in ('notukerror', 'geolocation', 'selectionunavailable'):
last_exception = e
@ -362,8 +355,8 @@ class BBCCoUkIE(InfoExtractor):
self._raise_extractor_error(last_exception)
def _download_media_selector_url(self, url, programme_id=None):
media_selection = self._download_xml(
url, programme_id, 'Downloading media selection XML',
media_selection = self._download_json(
url, programme_id, 'Downloading media selection JSON',
expected_status=(403, 404))
return self._process_media_selector(media_selection, programme_id)
@ -377,7 +370,6 @@ class BBCCoUkIE(InfoExtractor):
if kind in ('video', 'audio'):
bitrate = int_or_none(media.get('bitrate'))
encoding = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
@ -392,8 +384,6 @@ class BBCCoUkIE(InfoExtractor):
supplier = connection.get('supplier')
transfer_format = connection.get('transferFormat')
format_id = supplier or conn_kind or protocol
if service:
format_id = '%s_%s' % (service, format_id)
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
@ -408,20 +398,11 @@ class BBCCoUkIE(InfoExtractor):
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
if re.search(self._USP_RE, href):
usp_formats = self._extract_m3u8_formats(
re.sub(self._USP_RE, r'/\1.ism/\1.m3u8', href),
programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)
for f in usp_formats:
if f.get('height') and f['height'] > 720:
continue
formats.append(f)
elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False))
else:
if not service and not supplier and bitrate:
if not supplier and bitrate:
format_id += '-%d' % bitrate
fmt = {
'format_id': format_id,
@ -554,7 +535,7 @@ class BBCCoUkIE(InfoExtractor):
webpage = self._download_webpage(url, group_id, 'Downloading video page')
error = self._search_regex(
r'<div\b[^>]+\bclass=["\']smp__message delta["\'][^>]*>([^<]+)<',
r'<div\b[^>]+\bclass=["\'](?:smp|playout)__message delta["\'][^>]*>\s*([^<]+?)\s*<',
webpage, 'error', default=None)
if error:
raise ExtractorError(error, expected=True)
@ -607,16 +588,9 @@ class BBCIE(BBCCoUkIE):
IE_DESC = 'BBC'
_VALID_URL = r'https?://(?:www\.)?bbc\.(?:com|co\.uk)/(?:[^/]+/)+(?P<id>[^/#?]+)'
_MEDIASELECTOR_URLS = [
# Provides HQ HLS streams but fails with geolocation in some cases even
# when it's not geo-restricted at all
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',
# Provides more formats, namely direct mp4 links, but fails on some videos with
# notukerror for non UK (?) users (e.g.
# http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
'http://open.live.bbc.co.uk/mediaselector/4/mtis/stream/%s',
# Provides fewer formats, but works everywhere for everybody (hopefully)
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/journalism-pc/vpid/%s',
_MEDIA_SETS = [
'mobile-tablet-main',
'pc',
]
_TESTS = [{
@ -790,8 +764,17 @@ class BBCIE(BBCCoUkIE):
'only_matching': True,
}, {
# custom redirection to www.bbc.com
# also, video with window.__INITIAL_DATA__
'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
'only_matching': True,
'info_dict': {
'id': 'p02xzws1',
'ext': 'mp4',
'title': "Pluto may have 'nitrogen glaciers'",
'description': 'md5:6a95b593f528d7a5f2605221bc56912f',
'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1437785037,
'upload_date': '20150725',
},
}, {
# single video article embedded with data-media-vpid
'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
@ -827,11 +810,25 @@ class BBCIE(BBCCoUkIE):
'description': 'Learn English words and phrases from this story',
},
'add_ie': [BBCCoUkIE.ie_key()],
}, {
# BBC Reel
'url': 'https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness',
'info_dict': {
'id': 'p07c6sb9',
'ext': 'mp4',
'title': 'How positive thinking is harming your happiness',
'alt_title': 'The downsides of positive thinking',
'description': 'md5:fad74b31da60d83b8265954ee42d85b4',
'duration': 235,
'thumbnail': r're:https?://.+/p07c9dsr.jpg',
'upload_date': '20190604',
'categories': ['Psychology'],
},
}]
@classmethod
def suitable(cls, url):
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE)
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerEpisodesIE, BBCCoUkIPlayerGroupIE, BBCCoUkPlaylistIE)
return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
else super(BBCIE, cls).suitable(url))
@ -963,7 +960,7 @@ class BBCIE(BBCCoUkIE):
else:
entry['title'] = info['title']
entry['formats'].extend(info['formats'])
except Exception as e:
except ExtractorError as e:
# Some playlist URL may fail with 500, at the same time
# the other one may work fine (e.g.
# http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
@ -981,7 +978,7 @@ class BBCIE(BBCCoUkIE):
group_id = self._search_regex(
r'<div[^>]+\bclass=["\']video["\'][^>]+\bdata-pid=["\'](%s)' % self._ID_REGEX,
webpage, 'group id', default=None)
if playlist_id:
if group_id:
return self.url_result(
'https://www.bbc.co.uk/programmes/%s' % group_id,
ie=BBCCoUkIE.ie_key())
@ -1014,6 +1011,37 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles,
}
# bbc reel (e.g. https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness)
initial_data = self._parse_json(self._html_search_regex(
r'<script[^>]+id=(["\'])initial-data\1[^>]+data-json=(["\'])(?P<json>(?:(?!\2).)+)',
webpage, 'initial data', default='{}', group='json'), playlist_id, fatal=False)
if initial_data:
init_data = try_get(
initial_data, lambda x: x['initData']['items'][0], dict) or {}
smp_data = init_data.get('smpData') or {}
clip_data = try_get(smp_data, lambda x: x['items'][0], dict) or {}
version_id = clip_data.get('versionID')
if version_id:
title = smp_data['title']
formats, subtitles = self._download_media_selector(version_id)
self._sort_formats(formats)
image_url = smp_data.get('holdingImageURL')
display_date = init_data.get('displayDate')
topic_title = init_data.get('topicTitle')
return {
'id': version_id,
'title': title,
'formats': formats,
'alt_title': init_data.get('shortTitle'),
'thumbnail': image_url.replace('$recipe', 'raw') if image_url else None,
'description': smp_data.get('summary') or init_data.get('shortSummary'),
'upload_date': display_date.replace('-', '') if display_date else None,
'subtitles': subtitles,
'duration': int_or_none(clip_data.get('duration')),
'categories': [topic_title] if topic_title else None,
}
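The `$recipe` placeholder in BBC image URLs selects a rendition; substituting `raw` requests the original-size image (URL below is illustrative):

```python
image_url = 'https://ichef.bbci.co.uk/images/ic/$recipe/p07c9dsr.jpg'
print(image_url.replace('$recipe', 'raw'))
# https://ichef.bbci.co.uk/images/ic/raw/p07c9dsr.jpg
```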
# Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
# Several setPayload calls may be present, but the video
# always seems to relate to the first one
@ -1075,7 +1103,7 @@ class BBCIE(BBCCoUkIE):
thumbnail = None
image_url = current_programme.get('image_url')
if image_url:
thumbnail = image_url.replace('{recipe}', '1920x1920')
thumbnail = image_url.replace('{recipe}', 'raw')
return {
'id': programme_id,
'title': title,
@ -1092,10 +1120,26 @@ class BBCIE(BBCCoUkIE):
self._search_regex(
r'(?s)bbcthreeConfig\s*=\s*({.+?})\s*;\s*<', webpage,
'bbcthree config', default='{}'),
playlist_id, transform_source=js_to_json, fatal=False)
if bbc3_config:
playlist_id, transform_source=js_to_json, fatal=False) or {}
payload = bbc3_config.get('payload') or {}
if payload:
clip = payload.get('currentClip') or {}
clip_vpid = clip.get('vpid')
clip_title = clip.get('title')
if clip_vpid and clip_title:
formats, subtitles = self._download_media_selector(clip_vpid)
self._sort_formats(formats)
return {
'id': clip_vpid,
'title': clip_title,
'thumbnail': dict_get(clip, ('poster', 'imageUrl')),
'description': clip.get('description'),
'duration': parse_duration(clip.get('duration')),
'formats': formats,
'subtitles': subtitles,
}
bbc3_playlist = try_get(
bbc3_config, lambda x: x['payload']['content']['bbcMedia']['playlist'],
payload, lambda x: x['content']['bbcMedia']['playlist'],
dict)
if bbc3_playlist:
playlist_title = bbc3_playlist.get('title') or playlist_title
@ -1118,6 +1162,56 @@ class BBCIE(BBCCoUkIE):
return self.playlist_result(
entries, playlist_id, playlist_title, playlist_description)
initial_data = self._parse_json(self._search_regex(
r'window\.__INITIAL_DATA__\s*=\s*({.+?});', webpage,
'preload state', default='{}'), playlist_id, fatal=False)
if initial_data:
def parse_media(media):
if not media:
return
for item in (try_get(media, lambda x: x['media']['items'], list) or []):
item_id = item.get('id')
item_title = item.get('title')
if not (item_id and item_title):
continue
formats, subtitles = self._download_media_selector(item_id)
self._sort_formats(formats)
item_desc = None
blocks = try_get(media, lambda x: x['summary']['blocks'], list)
if blocks:
summary = []
for block in blocks:
text = try_get(block, lambda x: x['model']['text'], compat_str)
if text:
summary.append(text)
if summary:
item_desc = '\n\n'.join(summary)
item_time = None
for meta in try_get(media, lambda x: x['metadata']['items'], list) or []:
if try_get(meta, lambda x: x['label']) == 'Published':
item_time = unified_timestamp(meta.get('timestamp'))
break
entries.append({
'id': item_id,
'title': item_title,
'thumbnail': item.get('holdingImageUrl'),
'formats': formats,
'subtitles': subtitles,
'timestamp': item_time,
'description': strip_or_none(item_desc),
})
for resp in (initial_data.get('data') or {}).values():
name = resp.get('name')
if name == 'media-experience':
parse_media(try_get(resp, lambda x: x['data']['initialItem']['mediaItem'], dict))
elif name == 'article':
for block in (try_get(resp, lambda x: x['data']['blocks'], list) or []):
if block.get('type') != 'media':
continue
parse_media(block.get('model'))
return self.playlist_result(
entries, playlist_id, playlist_title, playlist_description)
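The new `window.__INITIAL_DATA__` path boils down to scraping an inline JSON assignment from the page source; a minimal sketch:

```python
import json
import re

webpage = '<script>window.__INITIAL_DATA__={"data":{"x":{"name":"article"}}};</script>'
m = re.search(r'window\.__INITIAL_DATA__\s*=\s*({.+?});', webpage)
initial_data = json.loads(m.group(1)) if m else {}
for resp in (initial_data.get('data') or {}).values():
    print(resp.get('name'))  # article
```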
def extract_all(pattern):
return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False),
@ -1278,21 +1372,149 @@ class BBCCoUkPlaylistBaseIE(InfoExtractor):
playlist_id, title, description)
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
class BBCCoUkIPlayerPlaylistBaseIE(InfoExtractor):
_VALID_URL_TMPL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/%%s/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
@staticmethod
def _get_default(episode, key, default_key='default'):
return try_get(episode, lambda x: x[key][default_key])
def _get_description(self, data):
synopsis = data.get(self._DESCRIPTION_KEY) or {}
return dict_get(synopsis, ('large', 'medium', 'small'))
def _fetch_page(self, programme_id, per_page, series_id, page):
elements = self._get_elements(self._call_api(
programme_id, per_page, page + 1, series_id))
for element in elements:
episode = self._get_episode(element)
episode_id = episode.get('id')
if not episode_id:
continue
thumbnail = None
image = self._get_episode_image(episode)
if image:
thumbnail = image.replace('{recipe}', 'raw')
category = self._get_default(episode, 'labels', 'category')
yield {
'_type': 'url',
'id': episode_id,
'title': self._get_episode_field(episode, 'subtitle'),
'url': 'https://www.bbc.co.uk/iplayer/episode/' + episode_id,
'thumbnail': thumbnail,
'description': self._get_description(episode),
'categories': [category] if category else None,
'series': self._get_episode_field(episode, 'title'),
'ie_key': BBCCoUkIE.ie_key(),
}
def _real_extract(self, url):
pid = self._match_id(url)
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
series_id = qs.get('seriesId', [None])[0]
page = qs.get('page', [None])[0]
per_page = 36 if page else self._PAGE_SIZE
fetch_page = functools.partial(self._fetch_page, pid, per_page, series_id)
entries = fetch_page(int(page) - 1) if page else OnDemandPagedList(fetch_page, self._PAGE_SIZE)
playlist_data = self._get_playlist_data(self._call_api(pid, 1))
return self.playlist_result(
entries, pid, self._get_playlist_title(playlist_data),
self._get_description(playlist_data))
class BBCCoUkIPlayerEpisodesIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:episodes'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'episodes'
_TESTS = [{
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
'info_dict': {
'id': 'b05rcz9v',
'title': 'The Disappearance',
'description': 'French thriller serial about a missing teenager.',
'description': 'md5:58eb101aee3116bad4da05f91179c0cb',
},
'playlist_mincount': 6,
'skip': 'This programme is not currently available on BBC iPlayer',
'playlist_mincount': 8,
}, {
# all seasons
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 10,
}, {
# explicit season
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster?seriesId=b094m6nv',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 5,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 37,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove?page=2',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 1,
}]
_PAGE_SIZE = 100
_DESCRIPTION_KEY = 'synopsis'
def _get_episode_image(self, episode):
return self._get_default(episode, 'image')
def _get_episode_field(self, episode, field):
return self._get_default(episode, field)
@staticmethod
def _get_elements(data):
return data['entities']['results']
@staticmethod
def _get_episode(element):
return element.get('episode') or {}
def _call_api(self, pid, per_page, page=1, series_id=None):
variables = {
'id': pid,
'page': page,
'perPage': per_page,
}
if series_id:
variables['sliceId'] = series_id
return self._download_json(
'https://graph.ibl.api.bbc.co.uk/', pid, headers={
'Content-Type': 'application/json'
}, data=json.dumps({
'id': '5692d93d5aac8d796a0305e895e61551',
'variables': variables,
}).encode('utf-8'))['data']['programme']
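# The request above is a persisted GraphQL query: only a precomputed query id
# and its variables go over the wire. An illustrative request body (pid and
# series id assumed):
#   {"id": "5692d93d5aac8d796a0305e895e61551",
#    "variables": {"id": "b05rcz9v", "page": 1, "perPage": 100, "sliceId": "b094m6nv"}}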
@staticmethod
def _get_playlist_data(data):
return data
def _get_playlist_title(self, data):
return self._get_default(data, 'title')
class BBCCoUkIPlayerGroupIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:group'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'group'
_TESTS = [{
# Available for over a year unlike 30 days for most other programmes
'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
'info_dict': {
@ -1301,14 +1523,56 @@ class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
},
'playlist_mincount': 10,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 47,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7?page=2',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 11,
}]
_PAGE_SIZE = 200
_DESCRIPTION_KEY = 'synopses'
def _extract_title_and_description(self, webpage):
title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
description = self._search_regex(
r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
webpage, 'description', fatal=False, group='value')
return title, description
def _get_episode_image(self, episode):
return self._get_default(episode, 'images', 'standard')
def _get_episode_field(self, episode, field):
return episode.get(field)
@staticmethod
def _get_elements(data):
return data['elements']
@staticmethod
def _get_episode(element):
return element
def _call_api(self, pid, per_page, page=1, series_id=None):
return self._download_json(
'http://ibl.api.bbc.co.uk/ibl/v1/groups/%s/episodes' % pid,
pid, query={
'page': page,
'per_page': per_page,
})['group_episodes']
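# Illustrative form of the REST call above (group pid assumed):
#   http://ibl.api.bbc.co.uk/ibl/v1/groups/p02tcc32/episodes?page=1&per_page=200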
@staticmethod
def _get_playlist_data(data):
return data['group']
def _get_playlist_title(self, data):
return data.get('title')
class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):


@ -1,194 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
compat_str,
float_or_none,
int_or_none,
parse_iso8601,
try_get,
urljoin,
)
class BeamProBaseIE(InfoExtractor):
_API_BASE = 'https://mixer.com/api/v1'
_RATINGS = {'family': 0, 'teen': 13, '18+': 18}
def _extract_channel_info(self, chan):
user_id = chan.get('userId') or try_get(chan, lambda x: x['user']['id'])
return {
'uploader': chan.get('token') or try_get(
chan, lambda x: x['user']['username'], compat_str),
'uploader_id': compat_str(user_id) if user_id else None,
'age_limit': self._RATINGS.get(chan.get('audience')),
}
class BeamProLiveIE(BeamProBaseIE):
IE_NAME = 'Mixer:live'
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://mixer.com/niterhayven',
'info_dict': {
'id': '261562',
'ext': 'mp4',
'title': 'Introducing The Witcher 3 // The Grind Starts Now!',
'description': 'md5:0b161ac080f15fe05d18a07adb44a74d',
'thumbnail': r're:https://.*\.jpg$',
'timestamp': 1483477281,
'upload_date': '20170103',
'uploader': 'niterhayven',
'uploader_id': '373396',
'age_limit': 18,
'is_live': True,
'view_count': int,
},
'skip': 'niterhayven is offline',
'params': {
'skip_download': True,
},
}
_MANIFEST_URL_TEMPLATE = '%s/channels/%%s/manifest.%%s' % BeamProBaseIE._API_BASE
@classmethod
def suitable(cls, url):
return False if BeamProVodIE.suitable(url) else super(BeamProLiveIE, cls).suitable(url)
def _real_extract(self, url):
channel_name = self._match_id(url)
chan = self._download_json(
'%s/channels/%s' % (self._API_BASE, channel_name), channel_name)
if chan.get('online') is False:
raise ExtractorError(
'{0} is offline'.format(channel_name), expected=True)
channel_id = chan['id']
def manifest_url(kind):
return self._MANIFEST_URL_TEMPLATE % (channel_id, kind)
formats = self._extract_m3u8_formats(
manifest_url('m3u8'), channel_name, ext='mp4', m3u8_id='hls',
fatal=False)
formats.extend(self._extract_smil_formats(
manifest_url('smil'), channel_name, fatal=False))
self._sort_formats(formats)
info = {
'id': compat_str(chan.get('id') or channel_name),
'title': self._live_title(chan.get('name') or channel_name),
'description': clean_html(chan.get('description')),
'thumbnail': try_get(
chan, lambda x: x['thumbnail']['url'], compat_str),
'timestamp': parse_iso8601(chan.get('updatedAt')),
'is_live': True,
'view_count': int_or_none(chan.get('viewersTotal')),
'formats': formats,
}
info.update(self._extract_channel_info(chan))
return info
class BeamProVodIE(BeamProBaseIE):
IE_NAME = 'Mixer:vod'
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>[^?#&]+)'
_TESTS = [{
'url': 'https://mixer.com/willow8714?vod=2259830',
'md5': 'b2431e6e8347dc92ebafb565d368b76b',
'info_dict': {
'id': '2259830',
'ext': 'mp4',
'title': 'willow8714\'s Channel',
'duration': 6828.15,
'thumbnail': r're:https://.*source\.png$',
'timestamp': 1494046474,
'upload_date': '20170506',
'uploader': 'willow8714',
'uploader_id': '6085379',
'age_limit': 13,
'view_count': int,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://mixer.com/streamer?vod=IxFno1rqC0S_XJ1a2yGgNw',
'only_matching': True,
}, {
'url': 'https://mixer.com/streamer?vod=Rh3LY0VAqkGpEQUe2pN-ig',
'only_matching': True,
}]
@staticmethod
def _extract_format(vod, vod_type):
if not vod.get('baseUrl'):
return []
if vod_type == 'hls':
filename, protocol = 'manifest.m3u8', 'm3u8_native'
elif vod_type == 'raw':
filename, protocol = 'source.mp4', 'https'
else:
assert False
data = vod.get('data') if isinstance(vod.get('data'), dict) else {}
format_id = [vod_type]
if isinstance(data.get('Height'), compat_str):
format_id.append('%sp' % data['Height'])
return [{
'url': urljoin(vod['baseUrl'], filename),
'format_id': '-'.join(format_id),
'ext': 'mp4',
'protocol': protocol,
'width': int_or_none(data.get('Width')),
'height': int_or_none(data.get('Height')),
'fps': int_or_none(data.get('Fps')),
'tbr': int_or_none(data.get('Bitrate'), 1000),
}]
def _real_extract(self, url):
vod_id = self._match_id(url)
vod_info = self._download_json(
'%s/recordings/%s' % (self._API_BASE, vod_id), vod_id)
state = vod_info.get('state')
if state != 'AVAILABLE':
raise ExtractorError(
'VOD %s is not available (state: %s)' % (vod_id, state),
expected=True)
formats = []
thumbnail_url = None
for vod in vod_info['vods']:
vod_type = vod.get('format')
if vod_type in ('hls', 'raw'):
formats.extend(self._extract_format(vod, vod_type))
elif vod_type == 'thumbnail':
thumbnail_url = urljoin(vod.get('baseUrl'), 'source.png')
self._sort_formats(formats)
info = {
'id': vod_id,
'title': vod_info.get('name') or vod_id,
'duration': float_or_none(vod_info.get('duration')),
'thumbnail': thumbnail_url,
'timestamp': parse_iso8601(vod_info.get('createdAt')),
'view_count': int_or_none(vod_info.get('viewsTotal')),
'formats': formats,
}
info.update(self._extract_channel_info(vod_info.get('channel') or {}))
return info


@ -0,0 +1,103 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import extract_attributes
class BFMTVBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?bfmtv\.com/'
_VALID_URL_TMPL = _VALID_URL_BASE + r'(?:[^/]+/)*[^/?&#]+_%s[A-Z]-(?P<id>\d{12})\.html'
_VIDEO_BLOCK_REGEX = r'(<div[^>]+class="video_block"[^>]*>)'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
def _brightcove_url_result(self, video_id, video_block):
account_id = video_block.get('accountid') or '876450612001'
player_id = video_block.get('playerid') or 'I2qBTln4u'
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, video_id),
'BrightcoveNew', video_id)
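# Example of the player URL assembled above, using the fallback account and
# player ids from this file and an assumed video id:
#   http://players.brightcove.net/876450612001/I2qBTln4u_default/index.html?videoId=6196747868001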
class BFMTVIE(BFMTVBaseIE):
IE_NAME = 'bfmtv'
_VALID_URL = BFMTVBaseIE._VALID_URL_TMPL % 'V'
_TESTS = [{
'url': 'https://www.bfmtv.com/politique/emmanuel-macron-l-islam-est-une-religion-qui-vit-une-crise-aujourd-hui-partout-dans-le-monde_VN-202010020146.html',
'info_dict': {
'id': '6196747868001',
'ext': 'mp4',
'title': 'Emmanuel Macron: "L\'Islam est une religion qui vit une crise aujourd’hui, partout dans le monde"',
'description': 'Le Président s\'exprime sur la question du séparatisme depuis les Mureaux, dans les Yvelines.',
'uploader_id': '876450610001',
'upload_date': '20201002',
'timestamp': 1601629620,
},
}]
def _real_extract(self, url):
bfmtv_id = self._match_id(url)
webpage = self._download_webpage(url, bfmtv_id)
video_block = extract_attributes(self._search_regex(
self._VIDEO_BLOCK_REGEX, webpage, 'video block'))
return self._brightcove_url_result(video_block['videoid'], video_block)
class BFMTVLiveIE(BFMTVIE):
IE_NAME = 'bfmtv:live'
_VALID_URL = BFMTVBaseIE._VALID_URL_BASE + '(?P<id>(?:[^/]+/)?en-direct)'
_TESTS = [{
'url': 'https://www.bfmtv.com/en-direct/',
'info_dict': {
'id': '5615950982001',
'ext': 'mp4',
'title': r're:^le direct BFMTV WEB \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'uploader_id': '876450610001',
'upload_date': '20171018',
'timestamp': 1508329950,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.bfmtv.com/economie/en-direct/',
'only_matching': True,
}]
class BFMTVArticleIE(BFMTVBaseIE):
IE_NAME = 'bfmtv:article'
_VALID_URL = BFMTVBaseIE._VALID_URL_TMPL % 'A'
_TESTS = [{
'url': 'https://www.bfmtv.com/sante/covid-19-un-responsable-de-l-institut-pasteur-se-demande-quand-la-france-va-se-reconfiner_AV-202101060198.html',
'info_dict': {
'id': '202101060198',
'title': 'Covid-19: un responsable de l\'Institut Pasteur se demande "quand la France va se reconfiner"',
'description': 'md5:947974089c303d3ac6196670ae262843',
},
'playlist_count': 2,
}, {
'url': 'https://www.bfmtv.com/international/pour-bolsonaro-le-bresil-est-en-faillite-mais-il-ne-peut-rien-faire_AD-202101060232.html',
'only_matching': True,
}, {
'url': 'https://www.bfmtv.com/sante/covid-19-oui-le-vaccin-de-pfizer-distribue-en-france-a-bien-ete-teste-sur-des-personnes-agees_AN-202101060275.html',
'only_matching': True,
}]
def _real_extract(self, url):
bfmtv_id = self._match_id(url)
webpage = self._download_webpage(url, bfmtv_id)
entries = []
for video_block_el in re.findall(self._VIDEO_BLOCK_REGEX, webpage):
video_block = extract_attributes(video_block_el)
video_id = video_block.get('videoid')
if not video_id:
continue
entries.append(self._brightcove_url_result(video_id, video_block))
return self.playlist_result(
entries, bfmtv_id, self._og_search_title(webpage, fatal=False),
self._html_search_meta(['og:description', 'description'], webpage))


@ -0,0 +1,30 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class BibelTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bibeltv\.de/mediathek/videos/(?:crn/)?(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.bibeltv.de/mediathek/videos/329703-sprachkurs-in-malaiisch',
'md5': '252f908192d611de038b8504b08bf97f',
'info_dict': {
'id': 'ref:329703',
'ext': 'mp4',
'title': 'Sprachkurs in Malaiisch',
'description': 'md5:3e9f197d29ee164714e67351cf737dfe',
'timestamp': 1608316701,
'uploader_id': '5840105145001',
'upload_date': '20201218',
}
}, {
'url': 'https://www.bibeltv.de/mediathek/videos/crn/326374',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5840105145001/default_default/index.html?videoId=ref:%s'
def _real_extract(self, url):
crn_id = self._match_id(url)
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % crn_id, 'BrightcoveNew')


@ -156,6 +156,7 @@ class BiliBiliIE(InfoExtractor):
cid = js['result']['cid']
headers = {
'Accept': 'application/json',
'Referer': url
}
headers.update(self.geo_verification_headers())
@ -232,7 +233,7 @@ class BiliBiliIE(InfoExtractor):
webpage)
if uploader_mobj:
info.update({
'uploader': uploader_mobj.group('name'),
'uploader': uploader_mobj.group('name').strip(),
'uploader_id': uploader_mobj.group('id'),
})
if not info.get('uploader'):


@ -90,13 +90,19 @@ class BleacherReportCMSIE(AMPIE):
_VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36}|\d{5})'
_TESTS = [{
'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1&library=video-cms',
'md5': '2e4b0a997f9228ffa31fada5c53d1ed1',
'md5': '670b2d73f48549da032861130488c681',
'info_dict': {
'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
'ext': 'flv',
'ext': 'mp4',
'title': 'Cena vs. Rollins Would Expose the Heavyweight Division',
'description': 'md5:984afb4ade2f9c0db35f3267ed88b36e',
'upload_date': '20150723',
'timestamp': 1437679032,
},
'expected_warnings': [
'Unable to download f4m manifest'
]
}]
def _real_extract(self, url):


@ -1,86 +0,0 @@
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
remove_start,
int_or_none,
)
class BlinkxIE(InfoExtractor):
_VALID_URL = r'(?:https?://(?:www\.)blinkx\.com/#?ce/|blinkx:)(?P<id>[^?]+)'
IE_NAME = 'blinkx'
_TEST = {
'url': 'http://www.blinkx.com/ce/Da0Gw3xc5ucpNduzLuDDlv4WC9PuI4fDi1-t6Y3LyfdY2SZS5Urbvn-UPJvrvbo8LTKTc67Wu2rPKSQDJyZeeORCR8bYkhs8lI7eqddznH2ofh5WEEdjYXnoRtj7ByQwt7atMErmXIeYKPsSDuMAAqJDlQZ-3Ff4HJVeH_s3Gh8oQ',
'md5': '337cf7a344663ec79bf93a526a2e06c7',
'info_dict': {
'id': 'Da0Gw3xc',
'ext': 'mp4',
'title': 'No Daily Show for John Oliver; HBO Show Renewed - IGN News',
'uploader': 'IGN News',
'upload_date': '20150217',
'timestamp': 1424215740,
'description': 'HBO has renewed Last Week Tonight With John Oliver for two more seasons.',
'duration': 47.743333,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = video_id[:8]
api_url = ('https://apib4.blinkx.com/api.php?action=play_video&'
+ 'video=%s' % video_id)
data_json = self._download_webpage(api_url, display_id)
data = json.loads(data_json)['api']['results'][0]
duration = None
thumbnails = []
formats = []
for m in data['media']:
if m['type'] == 'jpg':
thumbnails.append({
'url': m['link'],
'width': int(m['w']),
'height': int(m['h']),
})
elif m['type'] == 'original':
duration = float(m['d'])
elif m['type'] == 'youtube':
yt_id = m['link']
self.to_screen('Youtube video detected: %s' % yt_id)
return self.url_result(yt_id, 'Youtube', video_id=yt_id)
elif m['type'] in ('flv', 'mp4'):
vcodec = remove_start(m['vcodec'], 'ff')
acodec = remove_start(m['acodec'], 'ff')
vbr = int_or_none(m.get('vbr') or m.get('vbitrate'), 1000)
abr = int_or_none(m.get('abr') or m.get('abitrate'), 1000)
tbr = vbr + abr if vbr and abr else None
format_id = '%s-%sk-%s' % (vcodec, tbr, m['w'])
formats.append({
'format_id': format_id,
'url': m['link'],
'vcodec': vcodec,
'acodec': acodec,
'abr': abr,
'vbr': vbr,
'tbr': tbr,
'width': int_or_none(m.get('w')),
'height': int_or_none(m.get('h')),
})
self._sort_formats(formats)
return {
'id': display_id,
'fullid': video_id,
'title': data['title'],
'formats': formats,
'uploader': data['channel_name'],
'timestamp': data['pubdate_epoch'],
'description': data.get('description'),
'thumbnails': thumbnails,
'duration': duration,
}


@ -0,0 +1,60 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
try_get,
urlencode_postdata,
)
class BongaCamsIE(InfoExtractor):
_VALID_URL = r'https?://(?P<host>(?:[^/]+\.)?bongacams\d*\.com)/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://de.bongacams.com/azumi-8',
'only_matching': True,
}, {
'url': 'https://cn.bongacams.com/azumi-8',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
channel_id = mobj.group('id')
amf = self._download_json(
'https://%s/tools/amf.php' % host, channel_id,
data=urlencode_postdata((
('method', 'getRoomData'),
('args[]', channel_id),
('args[]', 'false'),
)), headers={'X-Requested-With': 'XMLHttpRequest'})
server_url = amf['localData']['videoServerUrl']
uploader_id = try_get(
amf, lambda x: x['performerData']['username'], compat_str) or channel_id
uploader = try_get(
amf, lambda x: x['performerData']['displayName'], compat_str)
like_count = int_or_none(try_get(
amf, lambda x: x['performerData']['loversCount']))
formats = self._extract_m3u8_formats(
'%s/hls/stream_%s/playlist.m3u8' % (server_url, uploader_id),
channel_id, 'mp4', m3u8_id='hls', live=True)
self._sort_formats(formats)
return {
'id': channel_id,
'title': self._live_title(uploader or uploader_id),
'uploader': uploader,
'uploader_id': uploader_id,
'like_count': like_count,
'age_limit': 18,
'is_live': True,
'formats': formats,
}
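# The amf.php endpoint above is a plain form-encoded RPC; a standalone sketch
# of the same request (host and channel assumed):
#
#   import json
#   from urllib import parse, request
#
#   data = parse.urlencode([
#       ('method', 'getRoomData'),
#       ('args[]', 'azumi-8'),
#       ('args[]', 'false'),
#   ]).encode()
#   req = request.Request('https://de.bongacams.com/tools/amf.php', data,
#                         headers={'X-Requested-With': 'XMLHttpRequest'})
#   amf = json.load(request.urlopen(req))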


@ -0,0 +1,98 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_iso8601,
# try_get,
update_url_query,
)
class BoxIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^.]+\.)?app\.box\.com/s/(?P<shared_name>[^/]+)/file/(?P<id>\d+)'
_TEST = {
'url': 'https://mlssoccer.app.box.com/s/0evd2o3e08l60lr4ygukepvnkord1o1x/file/510727257538',
'md5': '1f81b2fd3960f38a40a3b8823e5fcd43',
'info_dict': {
'id': '510727257538',
'ext': 'mp4',
'title': 'Garber St. Louis will be 28th MLS team +scarving.mp4',
'uploader': 'MLS Video',
'timestamp': 1566320259,
'upload_date': '20190820',
'uploader_id': '235196876',
}
}
def _real_extract(self, url):
shared_name, file_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, file_id)
request_token = self._parse_json(self._search_regex(
r'Box\.config\s*=\s*({.+?});', webpage,
'Box config'), file_id)['requestToken']
access_token = self._download_json(
'https://app.box.com/app-api/enduserapp/elements/tokens', file_id,
'Downloading token JSON metadata',
data=json.dumps({'fileIDs': [file_id]}).encode(), headers={
'Content-Type': 'application/json',
'X-Request-Token': request_token,
'X-Box-EndUser-API': 'sharedName=' + shared_name,
})[file_id]['read']
shared_link = 'https://app.box.com/s/' + shared_name
f = self._download_json(
'https://api.box.com/2.0/files/' + file_id, file_id,
'Downloading file JSON metadata', headers={
'Authorization': 'Bearer ' + access_token,
'BoxApi': 'shared_link=' + shared_link,
'X-Rep-Hints': '[dash]', # TODO: extract `hls` formats
}, query={
'fields': 'authenticated_download_url,created_at,created_by,description,extension,is_download_available,name,representations,size'
})
title = f['name']
query = {
'access_token': access_token,
'shared_link': shared_link
}
formats = []
# for entry in (try_get(f, lambda x: x['representations']['entries'], list) or []):
# entry_url_template = try_get(
# entry, lambda x: x['content']['url_template'])
# if not entry_url_template:
# continue
# representation = entry.get('representation')
# if representation == 'dash':
# TODO: append query to every fragment URL
# formats.extend(self._extract_mpd_formats(
# entry_url_template.replace('{+asset_path}', 'manifest.mpd'),
# file_id, query=query))
authenticated_download_url = f.get('authenticated_download_url')
if authenticated_download_url and f.get('is_download_available'):
formats.append({
'ext': f.get('extension') or determine_ext(title),
'filesize': f.get('size'),
'format_id': 'download',
'url': update_url_query(authenticated_download_url, query),
})
self._sort_formats(formats)
creator = f.get('created_by') or {}
return {
'id': file_id,
'title': title,
'formats': formats,
'description': f.get('description') or None,
'uploader': creator.get('name'),
'timestamp': parse_iso8601(f.get('created_at')),
'uploader_id': creator.get('id'),
}
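# The flow above is a two-step shared-link exchange: the page's requestToken
# is POSTed to app.box.com/app-api/enduserapp/elements/tokens to obtain a
# per-file read token, which then authorizes the api.box.com/2.0/files/<id>
# call together with the 'BoxApi: shared_link=...' header.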


@ -12,7 +12,7 @@ from ..utils import (
class BravoTVIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?(?P<req_id>bravotv|oxygen)\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.bravotv.com/top-chef/season-16/episode-15/videos/the-top-chef-season-16-winner-is',
'md5': 'e34684cfea2a96cd2ee1ef3a60909de9',
@ -28,10 +28,13 @@ class BravoTVIE(AdobePassIE):
}, {
'url': 'http://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
'only_matching': True,
}, {
'url': 'https://www.oxygen.com/in-ice-cold-blood/season-2/episode-16/videos/handling-the-horwitz-house-after-the-murder-season-2',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
site, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
settings = self._parse_json(self._search_regex(
r'<script[^>]+data-drupal-selector="drupal-settings-json"[^>]*>({.+?})</script>', webpage, 'drupal settings'),
@ -53,11 +56,14 @@ class BravoTVIE(AdobePassIE):
tp_path = release_pid = tve['release_pid']
if tve.get('entitlement') == 'auth':
adobe_pass = settings.get('tve_adobe_auth', {})
if site == 'bravotv':
site = 'bravo'
resource = self._get_mvpd_resource(
adobe_pass.get('adobePassResourceId', 'bravo'),
adobe_pass.get('adobePassResourceId') or site,
tve['title'], release_pid, tve.get('rating'))
query['auth'] = self._extract_mvpd_auth(
url, release_pid, adobe_pass.get('adobePassRequestorId', 'bravo'), resource)
url, release_pid,
adobe_pass.get('adobePassRequestorId') or site, resource)
else:
shared_playlist = settings['ls_playlist']
account_pid = shared_playlist['account_pid']


@ -28,6 +28,7 @@ from ..utils import (
parse_iso8601,
smuggle_url,
str_or_none,
try_get,
unescapeHTML,
unsmuggle_url,
UnsupportedError,
@ -129,7 +130,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'skip': 'Unsupported URL',
},
{
# playlist with 'playlistTab' (https://github.com/ytdl-org/haruhi-dl/issues/9965)
# playlist with 'playlistTab' (https://github.com/ytdl-org/youtube-dl/issues/9965)
'url': 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=AQ%7E%7E,AAABXlLMdok%7E,NJ4EoMlZ4rZdx9eU1rkMVd8EaYPBBUlg',
'info_dict': {
'id': '1522758701001',
@ -153,10 +154,10 @@ class BrightcoveLegacyIE(InfoExtractor):
<object class="BrightcoveExperience">{params}</object>
"""
# Fix up some stupid HTML, see https://github.com/ytdl-org/haruhi-dl/issues/1553
# Fix up some stupid HTML, see https://github.com/ytdl-org/youtube-dl/issues/1553
object_str = re.sub(r'(<param(?:\s+[a-zA-Z0-9_]+="[^"]*")*)>',
lambda m: m.group(1) + '/>', object_str)
# Fix up some stupid XML, see https://github.com/ytdl-org/haruhi-dl/issues/1608
# Fix up some stupid XML, see https://github.com/ytdl-org/youtube-dl/issues/1608
object_str = object_str.replace('<--', '<!--')
# remove namespace to simplify extraction
object_str = re.sub(r'(<object[^>]*)(xmlns=".*?")', r'\1', object_str)
@ -470,13 +471,18 @@ class BrightcoveNewIE(AdobePassIE):
def _parse_brightcove_metadata(self, json_data, video_id, headers={}):
title = json_data['name'].strip()
num_drm_sources = 0
formats = []
for source in json_data.get('sources', []):
sources = json_data.get('sources') or []
for source in sources:
container = source.get('container')
ext = mimetype2ext(source.get('type'))
src = source.get('src')
# https://support.brightcove.com/playback-api-video-fields-reference#key_systems_object
if ext == 'ism' or container == 'WVM' or source.get('key_systems'):
if container == 'WVM' or source.get('key_systems'):
num_drm_sources += 1
continue
elif ext == 'ism':
continue
elif ext == 'm3u8' or container == 'M2TS':
if not src:
@ -533,20 +539,15 @@ class BrightcoveNewIE(AdobePassIE):
'format_id': build_format_id('rtmp'),
})
formats.append(f)
if not formats:
# for sonyliv.com DRM protected videos
s3_source_url = json_data.get('custom_fields', {}).get('s3sourceurl')
if s3_source_url:
formats.append({
'url': s3_source_url,
'format_id': 'source',
})
errors = json_data.get('errors')
if not formats and errors:
error = errors[0]
raise ExtractorError(
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
if not formats:
errors = json_data.get('errors')
if errors:
error = errors[0]
raise ExtractorError(
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
if sources and num_drm_sources == len(sources):
raise ExtractorError('This video is DRM protected.', expected=True)
self._sort_formats(formats)
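# Note on the reworked error path above: ISM sources are skipped silently,
# whereas WVM containers and sources with key_systems are counted as DRM, and
# the 'DRM protected' error is raised only when every source fell into that
# bucket.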
@ -600,24 +601,27 @@ class BrightcoveNewIE(AdobePassIE):
store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x)
def extract_policy_key():
webpage = self._download_webpage(
'http://players.brightcove.net/%s/%s_%s/index.min.js'
% (account_id, player_id, embed), video_id)
policy_key = None
catalog = self._search_regex(
r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog:
catalog = self._parse_json(
js_to_json(catalog), video_id, fatal=False)
if catalog:
policy_key = catalog.get('policyKey')
base_url = 'http://players.brightcove.net/%s/%s_%s/' % (account_id, player_id, embed)
config = self._download_json(
base_url + 'config.json', video_id, fatal=False) or {}
policy_key = try_get(
config, lambda x: x['video_cloud']['policy_key'])
if not policy_key:
policy_key = self._search_regex(
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
webpage = self._download_webpage(
base_url + 'index.min.js', video_id)
catalog = self._search_regex(
r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog:
catalog = self._parse_json(
js_to_json(catalog), video_id, fatal=False)
if catalog:
policy_key = catalog.get('policyKey')
if not policy_key:
policy_key = self._search_regex(
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
store_pk(policy_key)
return policy_key
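# Lookup order implemented above: the player's config.json
# (video_cloud.policy_key) first, then its index.min.js, trying the
# catalog({...}) JSON blob before falling back to a bare policyKey regex; the
# result is cached through store_pk.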


@ -8,18 +8,20 @@ from .gigya import GigyaBaseIE
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
strip_or_none,
clean_html,
extract_attributes,
float_or_none,
get_element_by_class,
int_or_none,
merge_dicts,
parse_iso8601,
str_or_none,
strip_or_none,
url_or_none,
)
class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza|dako)/assets/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'md5': '68993eda72ef62386a15ea2cf3c93107',
@ -37,6 +39,7 @@ class CanvasIE(InfoExtractor):
'url': 'https://mediazone.vrt.be/api/v1/canvas/assets/mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e',
'only_matching': True,
}]
_GEO_BYPASS = False
_HLS_ENTRY_PROTOCOLS_MAP = {
'HLS': 'm3u8_native',
'HLS_AES': 'm3u8',
@ -47,29 +50,34 @@ class CanvasIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
site_id, video_id = mobj.group('site_id'), mobj.group('id')
# Old API endpoint, serves more formats but may fail for some videos
data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), video_id, 'Downloading asset JSON',
'Unable to download asset JSON', fatal=False)
data = None
if site_id != 'vrtvideo':
# Old API endpoint, serves more formats but may fail for some videos
data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), video_id, 'Downloading asset JSON',
'Unable to download asset JSON', fatal=False)
# New API endpoint
if not data:
headers = self.geo_verification_headers()
headers.update({'Content-Type': 'application/json'})
token = self._download_json(
'%s/tokens' % self._REST_API_BASE, video_id,
'Downloading token', data=b'',
headers={'Content-Type': 'application/json'})['vrtPlayerToken']
'Downloading token', data=b'', headers=headers)['vrtPlayerToken']
data = self._download_json(
'%s/videos/%s' % (self._REST_API_BASE, video_id),
video_id, 'Downloading video JSON', fatal=False, query={
video_id, 'Downloading video JSON', query={
'vrtPlayerToken': token,
'client': '%s@PROD' % site_id,
}, expected_status=400)
message = data.get('message')
if message and not data.get('title'):
if data.get('code') == 'AUTHENTICATION_REQUIRED':
self.raise_login_required(message)
raise ExtractorError(message, expected=True)
if not data.get('title'):
code = data.get('code')
if code == 'AUTHENTICATION_REQUIRED':
self.raise_login_required()
elif code == 'INVALID_LOCATION':
self.raise_geo_restricted(countries=['BE'])
raise ExtractorError(data.get('message') or code, expected=True)
title = data['title']
description = data.get('description')
@ -205,20 +213,24 @@ class CanvasEenIE(InfoExtractor):
class VrtNUIE(GigyaBaseIE):
IE_DESC = 'VrtNU.be'
_VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?vrt\.be/vrtnu/a-z/(?:[^/]+/){2}(?P<id>[^/?#&]+)'
_TESTS = [{
# Available via old API endpoint
'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/',
'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1989/postbus-x-s1989a1/',
'info_dict': {
'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
'id': 'pbs-pub-e8713dac-899e-41de-9313-81269f4c04ac$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
'ext': 'mp4',
'title': 'De zwarte weduwe',
'description': 'md5:db1227b0f318c849ba5eab1fef895ee4',
'title': 'Postbus X - Aflevering 1 (Seizoen 1989)',
'description': 'md5:b704f669eb9262da4c55b33d7c6ed4b7',
'duration': 1457.04,
'thumbnail': r're:^https?://.*\.jpg$',
'season': 'Season 1',
'season_number': 1,
'series': 'Postbus X',
'season': 'Seizoen 1989',
'season_number': 1989,
'episode': 'De zwarte weduwe',
'episode_number': 1,
'timestamp': 1595822400,
'upload_date': '20200727',
},
'skip': 'This video is only available for registered users',
'params': {
@ -300,69 +312,73 @@ class VrtNUIE(GigyaBaseIE):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(url, display_id)
webpage = self._download_webpage(url, display_id)
attrs = extract_attributes(self._search_regex(
r'(<nui-media[^>]+>)', webpage, 'media element'))
video_id = attrs['videoid']
publication_id = attrs.get('publicationid')
if publication_id:
video_id = publication_id + '$' + video_id
page = (self._parse_json(self._search_regex(
r'digitalData\s*=\s*({.+?});', webpage, 'digital data',
default='{}'), video_id, fatal=False) or {}).get('page') or {}
info = self._search_json_ld(webpage, display_id, default={})
# title is optional here since it may be extracted by extractor
# that is delegated from here
title = strip_or_none(self._html_search_regex(
r'(?ms)<h1 class="content__heading">(.+?)</h1>',
webpage, 'title', default=None))
description = self._html_search_regex(
r'(?ms)<div class="content__description">(.+?)</div>',
webpage, 'description', default=None)
season = self._html_search_regex(
[r'''(?xms)<div\ class="tabs__tab\ tabs__tab--active">\s*
<span>seizoen\ (.+?)</span>\s*
</div>''',
r'<option value="seizoen (\d{1,3})" data-href="[^"]+?" selected>'],
webpage, 'season', default=None)
season_number = int_or_none(season)
episode_number = int_or_none(self._html_search_regex(
r'''(?xms)<div\ class="content__episode">\s*
<abbr\ title="aflevering">afl</abbr>\s*<span>(\d+)</span>
</div>''',
webpage, 'episode_number', default=None))
release_date = parse_iso8601(self._html_search_regex(
r'(?ms)<div class="content__broadcastdate">\s*<time\ datetime="(.+?)"',
webpage, 'release_date', default=None))
# If there's a ? or a # in the URL, remove them and everything after
clean_url = urlh.geturl().split('?')[0].split('#')[0].strip('/')
securevideo_url = clean_url + '.mssecurevideo.json'
try:
video = self._download_json(securevideo_url, display_id)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
self.raise_login_required()
raise
# We are dealing with a '../<show>.relevant' URL
redirect_url = video.get('url')
if redirect_url:
return self.url_result(self._proto_relative_url(redirect_url, 'https:'))
# There is only one entry, but with an unknown key, so just get
# the first one
video_id = list(video.values())[0].get('videoid')
return merge_dicts(info, {
'_type': 'url_transparent',
'url': 'https://mediazone.vrt.be/api/v1/vrtvideo/assets/%s' % video_id,
'ie_key': CanvasIE.ie_key(),
'id': video_id,
'display_id': display_id,
'season_number': int_or_none(page.get('episode_season')),
})
class DagelijkseKostIE(InfoExtractor):
IE_DESC = 'dagelijksekost.een.be'
_VALID_URL = r'https?://dagelijksekost\.een\.be/gerechten/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://dagelijksekost.een.be/gerechten/hachis-parmentier-met-witloof',
'md5': '30bfffc323009a3e5f689bef6efa2365',
'info_dict': {
'id': 'md-ast-27a4d1ff-7d7b-425e-b84f-a4d227f592fa',
'display_id': 'hachis-parmentier-met-witloof',
'ext': 'mp4',
'title': 'Hachis parmentier met witloof',
'description': 'md5:9960478392d87f63567b5b117688cdc5',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 283.02,
},
'expected_warnings': ['is not a supported codec'],
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = strip_or_none(get_element_by_class(
'dish-metadata__title', webpage
) or self._html_search_meta(
'twitter:title', webpage))
description = clean_html(get_element_by_class(
'dish-description', webpage)
) or self._html_search_meta(
('description', 'twitter:description', 'og:description'),
webpage)
video_id = self._html_search_regex(
r'data-url=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id',
group='id')
return {
'_type': 'url_transparent',
'url': 'https://mediazone.vrt.be/api/v1/dako/assets/%s' % video_id,
'ie_key': CanvasIE.ie_key(),
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'season': season,
'season_number': season_number,
'episode_number': episode_number,
'release_date': release_date,
})
}


@ -0,0 +1,91 @@
# coding: utf-8
from .common import InfoExtractor
from ..utils import (
parse_duration,
)
import re
class CastosHostedIE(InfoExtractor):
_VALID_URL = r'https?://[^/.]+\.castos\.com/(?:player|episodes)/(?P<id>[\da-zA-Z-]+)'
IE_NAME = 'castos:hosted'
_TESTS = [{
'url': 'https://audience.castos.com/player/408278',
'info_dict': {
'id': '408278',
'ext': 'mp3',
},
}, {
'url': 'https://audience.castos.com/episodes/improve-your-podcast-production',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage, **kw):
return [mobj.group(1) for mobj
in re.finditer(
r'<iframe\b[^>]+(?<!-)src="(https?://[^/.]+\.castos\.com/player/\d+)',
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
series = self._html_search_regex(
r'<div class="show">\s+<strong>([^<]+)</strong>', webpage, 'series name')
title = self._html_search_regex(
r'<div class="episode-title">([^<]+)</div>', webpage, 'episode title')
audio_url = self._html_search_regex(
r'<audio class="clip">\s+<source\b[^>]+src="(https?://[^"]+)"', webpage, 'audio url')
duration = parse_duration(self._search_regex(
r'<time id="duration">(\d\d(?::\d\d)+)</time>', webpage, 'duration'))
return {
'id': video_id,
'title': title,
'url': audio_url,
'duration': duration,
'series': series,
'episode': title,
}
class CastosSSPIE(InfoExtractor):
@classmethod
def _extract_entries(self, webpage, **kw):
entries = []
for found in re.finditer(
r'(?s)<div class="castos-player[^"]*"[^>]*data-episode="(\d+)-[a-z\d]+">(.+?</nav>)\s*</div>',
webpage):
video_id, entry = found.group(1, 2)
def search_entry(regex):
res = re.search(regex, entry)
if res:
return res.group(1)
series = search_entry(r'<div class="show">\s+<strong>([^<]+)</strong>')
title = search_entry(r'<div class="episode-title">([^<]+)</div>')
audio_url = search_entry(
r'<audio class="clip[^"]*">\s+<source\b[^>]+src="(https?://[^"]+)"')
duration = parse_duration(
search_entry(r'<time id="duration[^"]*">(\d\d(?::\d\d)+)</time>'))
if not title or not audio_url:
continue
entries.append({
'id': video_id,
'title': title,
'url': audio_url,
'duration': duration,
'series': series,
'episode': title,
})
return entries
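# _extract_entries is presumably called by the embed-detection machinery with
# the raw page HTML; each dict it returns is already a complete info dict, so
# a caller could wrap the result directly, e.g.:
#   self.playlist_result(CastosSSPIE._extract_entries(webpage), page_id)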


@ -27,7 +27,7 @@ class CBSBaseIE(ThePlatformFeedIE):
class CBSIE(CBSBaseIE):
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:(?:cbs|paramountplus)\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
@ -52,6 +52,9 @@ class CBSIE(CBSBaseIE):
}, {
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True,
}, {
'url': 'https://www.paramountplus.com/shows/all-rise/video/QmR1WhNkh1a_IrdHZrbcRklm176X_rVc/all-rise-space/',
'only_matching': True,
}]
def _extract_video_info(self, content_id, site='cbs', mpx_acc=2198311517):


@ -11,7 +11,47 @@ from ..utils import (
class CBSLocalIE(AnvatoIE):
_VALID_URL = r'https?://[a-z]+\.cbslocal\.com/(?:\d+/\d+/\d+|video)/(?P<id>[0-9a-z-]+)'
_VALID_URL_BASE = r'https?://[a-z]+\.cbslocal\.com/'
_VALID_URL = _VALID_URL_BASE + r'video/(?P<id>\d+)'
_TESTS = [{
'url': 'http://newyork.cbslocal.com/video/3580809-a-very-blue-anniversary/',
'info_dict': {
'id': '3580809',
'ext': 'mp4',
'title': 'A Very Blue Anniversary',
'description': 'CBS2’s Cindy Hsu has more.',
'thumbnail': 're:^https?://.*',
'timestamp': int,
'upload_date': r're:^\d{8}$',
'uploader': 'CBS',
'subtitles': {
'en': 'mincount:5',
},
'categories': [
'Stations\\Spoken Word\\WCBSTV',
'Syndication\\AOL',
'Syndication\\MSN',
'Syndication\\NDN',
'Syndication\\Yahoo',
'Content\\News',
'Content\\News\\Local News',
],
'tags': ['CBS 2 News Weekends', 'Cindy Hsu', 'Blue Man Group'],
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
mcp_id = self._match_id(url)
return self.url_result(
'anvato:anvato_cbslocal_app_web_prod_547f3e49241ef0e5d30c79b2efbca5d92c698f67:' + mcp_id, 'Anvato', mcp_id)
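# The pseudo-URL built above packs the Anvato access key and the mcp id into
# a single string, e.g. (id assumed):
#   anvato:anvato_cbslocal_app_web_prod_547f3e49241ef0e5d30c79b2efbca5d92c698f67:3580809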
class CBSLocalArticleIE(AnvatoIE):
_VALID_URL = CBSLocalIE._VALID_URL_BASE + r'\d+/\d+/\d+/(?P<id>[0-9a-z-]+)'
_TESTS = [{
# Anvato backend
@ -52,31 +92,6 @@ class CBSLocalIE(AnvatoIE):
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://newyork.cbslocal.com/video/3580809-a-very-blue-anniversary/',
'info_dict': {
'id': '3580809',
'ext': 'mp4',
'title': 'A Very Blue Anniversary',
'description': 'CBS2’s Cindy Hsu has more.',
'thumbnail': 're:^https?://.*',
'timestamp': int,
'upload_date': r're:^\d{8}$',
'uploader': 'CBS',
'subtitles': {
'en': 'mincount:5',
},
'categories': [
'Stations\\Spoken Word\\WCBSTV',
'Syndication\\AOL',
'Syndication\\MSN',
'Syndication\\NDN',
'Syndication\\Yahoo',
'Content\\News',
'Content\\News\\Local News',
],
'tags': ['CBS 2 News Weekends', 'Cindy Hsu', 'Blue Man Group'],
},
}]
def _real_extract(self, url):


@ -26,7 +26,7 @@ class CBSNewsEmbedIE(CBSIE):
def _real_extract(self, url):
item = self._parse_json(zlib.decompress(compat_b64decode(
compat_urllib_parse_unquote(self._match_id(url))),
-zlib.MAX_WBITS), None)['video']['items'][0]
-zlib.MAX_WBITS).decode('utf-8'), None)['video']['items'][0]
return self._extract_video_info(item['mpxRefId'], 'cbsnews')


@ -1,38 +1,113 @@
from __future__ import unicode_literals
from .cbs import CBSBaseIE
import re
# from .cbs import CBSBaseIE
from .common import InfoExtractor
from ..utils import (
int_or_none,
try_get,
)
class CBSSportsIE(CBSBaseIE):
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/(?:video|news)/(?P<id>[^/?#&]+)'
# class CBSSportsEmbedIE(CBSBaseIE):
class CBSSportsEmbedIE(InfoExtractor):
IE_NAME = 'cbssports:embed'
_VALID_URL = r'''(?ix)https?://(?:(?:www\.)?cbs|embed\.247)sports\.com/player/embed.+?
(?:
ids%3D(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})|
pcid%3D(?P<pcid>\d+)
)'''
_TESTS = [{
'url': 'https://www.cbssports.com/nba/video/donovan-mitchell-flashes-star-potential-in-game-2-victory-over-thunder/',
'info_dict': {
'id': '1214315075735',
'ext': 'mp4',
'title': 'Donovan Mitchell flashes star potential in Game 2 victory over Thunder',
'description': 'md5:df6f48622612c2d6bd2e295ddef58def',
'timestamp': 1524111457,
'upload_date': '20180419',
'uploader': 'CBSI-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
}
'url': 'https://www.cbssports.com/player/embed/?args=player_id%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26ids%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26resizable%3D1%26autoplay%3Dtrue%26domain%3Dcbssports.com%26comp_ads_enabled%3Dfalse%26watchAndRead%3D0%26startTime%3D0%26env%3Dprod',
'only_matching': True,
}, {
'url': 'https://www.cbssports.com/nba/news/nba-playoffs-2018-watch-76ers-vs-heat-game-3-series-schedule-tv-channel-online-stream/',
'url': 'https://embed.247sports.com/player/embed/?args=%3fplayer_id%3d1827823171591%26channel%3dcollege-football-recruiting%26pcid%3d1827823171591%26width%3d640%26height%3d360%26autoplay%3dTrue%26comp_ads_enabled%3dFalse%26uvpc%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_v4%2526partner%253d247%26uvpc_m%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_m_v4%2526partner_m%253d247_mobile%26utag%3d247sportssite%26resizable%3dTrue',
'only_matching': True,
}]
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
# def _extract_video_info(self, filter_query, video_id):
# return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
def _real_extract(self, url):
uuid, pcid = re.match(self._VALID_URL, url).groups()
query = {'id': uuid} if uuid else {'pcid': pcid}
video = self._download_json(
'https://www.cbssports.com/api/content/video/',
uuid or pcid, query=query)[0]
video_id = video['id']
title = video['title']
metadata = video.get('metaData') or {}
# return self._extract_video_info('byId=%d' % metadata['mpxOutletId'], video_id)
# return self._extract_video_info('byGuid=' + metadata['mpxRefId'], video_id)
formats = self._extract_m3u8_formats(
metadata['files'][0]['url'], video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
self._sort_formats(formats)
image = video.get('image')
thumbnails = None
if image:
image_path = image.get('path')
if image_path:
thumbnails = [{
'url': image_path,
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
'filesize': int_or_none(image.get('size')),
}]
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': video.get('description'),
'timestamp': int_or_none(try_get(video, lambda x: x['dateCreated']['epoch'])),
'duration': int_or_none(metadata.get('duration')),
}
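# Illustrative form of the API request made above (uuid assumed):
#   https://www.cbssports.com/api/content/video/?id=b56c03a6-231a-4bbe-9c55-af3c8a8e9636
# The endpoint answers with a JSON list; item [0] carries the id, title,
# metaData (duration, HLS file URL) and the image object used above.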
class CBSSportsBaseIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
[r'(?:=|%26)pcid%3D(\d+)', r'embedVideo(?:Container)?_(\d+)'],
webpage, 'video id')
return self._extract_video_info('byId=%s' % video_id, video_id)
iframe_url = self._search_regex(
r'<iframe[^>]+(?:data-)?src="(https?://[^/]+/player/embed[^"]+)"',
webpage, 'embed url')
return self.url_result(iframe_url, CBSSportsEmbedIE.ie_key())
class CBSSportsIE(CBSSportsBaseIE):
IE_NAME = 'cbssports'
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/video/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.cbssports.com/college-football/video/cover-3-stanford-spring-gleaning/',
'info_dict': {
'id': 'b56c03a6-231a-4bbe-9c55-af3c8a8e9636',
'ext': 'mp4',
'title': 'Cover 3: Stanford Spring Gleaning',
'description': 'The Cover 3 crew break down everything you need to know about the Stanford Cardinal this spring.',
'timestamp': 1617218398,
'upload_date': '20210331',
'duration': 502,
},
}]
class TwentyFourSevenSportsIE(CBSSportsBaseIE):
IE_NAME = '247sports'
_VALID_URL = r'https?://(?:www\.)?247sports\.com/Video/(?:[^/?#&]+-)?(?P<id>\d+)'
_TESTS = [{
'url': 'https://247sports.com/Video/2021-QB-Jake-Garcia-senior-highlights-through-five-games-10084854/',
'info_dict': {
'id': '4f1265cb-c3b5-44a8-bb1d-1914119a0ccc',
'ext': 'mp4',
'title': '2021 QB Jake Garcia senior highlights through five games',
'description': 'md5:8cb67ebed48e2e6adac1701e0ff6e45b',
'timestamp': 1607114223,
'upload_date': '20201204',
'duration': 208,
},
}]


@ -1,15 +1,18 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import datetime
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
extract_timezone,
int_or_none,
parse_duration,
parse_iso8601,
parse_resolution,
try_get,
url_or_none,
)
@ -24,8 +27,9 @@ class CCMAIE(InfoExtractor):
'ext': 'mp4',
'title': 'L\'espot de La Marató de TV3',
'description': 'md5:f12987f320e2f6e988e9908e4fe97765',
'timestamp': 1470918540,
'upload_date': '20160811',
'timestamp': 1478608140,
'upload_date': '20161108',
'age_limit': 0,
}
}, {
'url': 'http://www.ccma.cat/catradio/alacarta/programa/el-consell-de-savis-analitza-el-derbi/audio/943685/',
@ -35,8 +39,24 @@ class CCMAIE(InfoExtractor):
'ext': 'mp3',
'title': 'El Consell de Savis analitza el derbi',
'description': 'md5:e2a3648145f3241cb9c6b4b624033e53',
'upload_date': '20171205',
'timestamp': 1512507300,
'upload_date': '20170512',
'timestamp': 1494622500,
'vcodec': 'none',
'categories': ['Esports'],
}
}, {
'url': 'http://www.ccma.cat/tv3/alacarta/crims/crims-josep-tallada-lespereu-me-capitol-1/video/6031387/',
'md5': 'b43c3d3486f430f3032b5b160d80cbc3',
'info_dict': {
'id': '6031387',
'ext': 'mp4',
'title': 'Crims - Josep Talleda, l\'"Espereu-me" (capítol 1)',
'description': 'md5:7cbdafb640da9d0d2c0f62bad1e74e60',
'timestamp': 1582577700,
'upload_date': '20200224',
'subtitles': 'mincount:4',
'age_limit': 16,
'series': 'Crims',
}
}]
@ -72,17 +92,28 @@ class CCMAIE(InfoExtractor):
informacio = media['informacio']
title = informacio['titol']
durada = informacio.get('durada', {})
durada = informacio.get('durada') or {}
duration = int_or_none(durada.get('milisegons'), 1000) or parse_duration(durada.get('text'))
timestamp = parse_iso8601(informacio.get('data_emissio', {}).get('utc'))
tematica = try_get(informacio, lambda x: x['tematica']['text'])
timestamp = None
data_utc = try_get(informacio, lambda x: x['data_emissio']['utc'])
try:
timezone, data_utc = extract_timezone(data_utc)
timestamp = calendar.timegm((datetime.datetime.strptime(
data_utc, '%Y-%d-%mT%H:%M:%S') - timezone).timetuple())
except TypeError:
pass
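# extract_timezone() splits a trailing UTC offset off the string and returns
# it as a timedelta, so subtracting it before calendar.timegm() produces a
# true UTC epoch; the unusual '%Y-%d-%mT%H:%M:%S' pattern apparently mirrors
# the day-before-month order of the API's 'utc' field.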
subtitles = {}
subtitols = media.get('subtitols', {})
if subtitols:
sub_url = subtitols.get('url')
subtitols = media.get('subtitols') or []
if isinstance(subtitols, dict):
subtitols = [subtitols]
for st in subtitols:
sub_url = st.get('url')
if sub_url:
subtitles.setdefault(
subtitols.get('iso') or subtitols.get('text') or 'ca', []).append({
st.get('iso') or st.get('text') or 'ca', []).append({
'url': sub_url,
})
@ -97,6 +128,16 @@ class CCMAIE(InfoExtractor):
'height': int_or_none(imatges.get('alcada')),
}]
age_limit = None
codi_etic = try_get(informacio, lambda x: x['codi_etic']['id'])
if codi_etic:
codi_etic_s = codi_etic.split('_')
if len(codi_etic_s) == 2:
if codi_etic_s[1] == 'TP':
age_limit = 0
else:
age_limit = int_or_none(codi_etic_s[1])
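# i.e. a codi_etic id such as '<prefix>_16' yields age_limit 16, while
# '<prefix>_TP' (presumably "tots els públics") yields 0; any other shape
# leaves age_limit unset.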
return {
'id': media_id,
'title': title,
@ -106,4 +147,9 @@ class CCMAIE(InfoExtractor):
'thumbnails': thumbnails,
'subtitles': subtitles,
'formats': formats,
'age_limit': age_limit,
'alt_title': informacio.get('titol_complet'),
'episode_number': int_or_none(informacio.get('capitol')),
'categories': [tematica] if tematica else None,
'series': informacio.get('programa'),
}


@ -1,7 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import datetime
import hashlib
import hmac
from .common import InfoExtractor
from ..utils import (
@ -21,6 +24,7 @@ class CDABaseExtractor(InfoExtractor):
# apparently the token is hardcoded in the app
'Authorization': 'Basic YzU3YzBlZDUtYTIzOC00MWQwLWI2NjQtNmZmMWMxY2Y2YzVlOklBTm95QlhRRVR6U09MV1hnV3MwMW0xT2VyNWJNZzV4clRNTXhpNGZJUGVGZ0lWUlo5UGVYTDhtUGZaR1U1U3Q',
}
_NETRC_MACHINE = 'cda'
_bearer = None
# logs into cda.pl and returns _BASE_HEADERS with the Bearer token
@ -33,8 +37,27 @@ class CDABaseExtractor(InfoExtractor):
})
return headers
token_res = self._download_json(self._BASE_URL + '/oauth/token?grant_type=password&login=niezesrajciesiecda&password=VD3QbYWSb_uwAShBZKN7F1DwEg_tRTdb4Xd3JvFsx6Y',
video_id, 'Logging into cda.pl', headers=headers, data=bytes(''.encode('utf-8')))
username, password = self._get_login_info()
if username is None or password is None:
username = 'niezesrajciesiecda'
password_hash = 'VD3QbYWSb_uwAShBZKN7F1DwEg_tRTdb4Xd3JvFsx6Y'
account_type = 'shared'
else:
pwd_md5 = ""
for byte in hashlib.md5(password.encode('utf-8')).digest():
# bytes() param must be iterable of ints and not int
hexik = bytes((byte & 255, )).hex()
while len(hexik) < 2:
hexik = "0" + hexik
pwd_md5 += hexik
digest = hmac.new(
's01m1Oer5IANoyBXQETzSOLWXgWs01m1Oer5bMg5xrTMMxRZ9Pi4fIPeFgIVRZ9PeXL8mPfXQETZGUAN5StRZ9P'.encode('utf-8'),
pwd_md5.encode('utf-8'), hashlib.sha256).digest()
password_hash = base64.urlsafe_b64encode(digest).decode('utf-8').replace('=', '')
account_type = 'user'
token_res = self._download_json('%s/oauth/token?grant_type=password&login=%s&password=%s' % (self._BASE_URL, username, password_hash),
video_id, 'Logging into cda.pl with a %s account' % account_type, headers=headers, data=bytes(''.encode('utf-8')))
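# The user-account branch above amounts to the following sketch (KEY standing
# in for the hardcoded HMAC secret above; the password is illustrative). The
# per-byte loop is equivalent to bytes.hex(), since every byte hexes to
# exactly two characters:
#
#   import base64, hashlib, hmac
#   pwd_md5 = hashlib.md5(b'hunter2').digest().hex()
#   digest = hmac.new(KEY.encode('utf-8'), pwd_md5.encode('utf-8'),
#                     hashlib.sha256).digest()
#   password_hash = base64.urlsafe_b64encode(digest).decode('utf-8').rstrip('=')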
self._bearer = {
'token': token_res['access_token'],
@ -103,9 +126,6 @@ class CDAIE(CDABaseExtractor):
metadata = self._download_json(
self._BASE_URL + '/video/' + video_id, video_id, headers=headers)['video']
if metadata.get('premium') is True and metadata.get('premium_free') is not True:
raise ExtractorError('This video is only available for premium users.', expected=True)
uploader = try_get(metadata, lambda x: x['author']['login'])
# anonymous uploader
if uploader == 'anonim':
@ -113,6 +133,8 @@ class CDAIE(CDABaseExtractor):
formats = []
for quality in metadata['qualities']:
if not quality['file']:
continue
formats.append({
'url': quality['file'],
'format': quality['title'],
@ -121,6 +143,13 @@ class CDAIE(CDABaseExtractor):
'filesize': quality.get('length'),
})
if not formats:
if metadata.get('premium') is True and metadata.get('premium_free') is not True:
raise ExtractorError('This video is only available for premium users.', expected=True)
raise ExtractorError('No video qualities found', video_id=video_id)
self._sort_formats(formats)
return {
'id': video_id,
'title': metadata['title'],


@ -157,7 +157,7 @@ class CeskaTelevizeIE(InfoExtractor):
stream_formats = self._extract_mpd_formats(
stream_url, playlist_id,
mpd_id='dash-%s' % format_id, fatal=False)
# See https://github.com/ytdl-org/haruhi-dl/issues/12119#issuecomment-280037031
# See https://github.com/ytdl-org/youtube-dl/issues/12119#issuecomment-280037031
if format_id == 'audioDescription':
for f in stream_formats:
f['source_preference'] = -10


@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import smuggle_url
@ -38,7 +39,7 @@ class CNBCIE(InfoExtractor):
class CNBCVideoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cnbc\.com/video/(?:[^/]+/)+(?P<id>[^./?#&]+)'
_VALID_URL = r'https?://(?:www\.)?cnbc\.com(?P<path>/video/(?:[^/]+/)+(?P<id>[^./?#&]+)\.html)'
_TEST = {
'url': 'https://www.cnbc.com/video/2018/07/19/trump-i-dont-necessarily-agree-with-raising-rates.html',
'info_dict': {
@ -56,11 +57,15 @@ class CNBCVideoIE(InfoExtractor):
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'content_id["\']\s*:\s*["\'](\d+)', webpage, display_id,
'video id')
path, display_id = re.match(self._VALID_URL, url).groups()
video_id = self._download_json(
'https://webql-redesign.cnbcfm.com/graphql', display_id, query={
'query': '''{
page(path: "%s") {
vcpsId
}
}''' % path,
})['data']['page']['vcpsId']
return self.url_result(
'http://video.cnbc.com/gallery/?video=%s' % video_id,
'http://video.cnbc.com/gallery/?video=%d' % video_id,
CNBCIE.ie_key())


@ -96,7 +96,10 @@ class CNNIE(TurnerBaseIE):
config['data_src'] % path, page_title, {
'default': {
'media_src': config['media_src'],
}
},
'f4m': {
'host': 'cnn-vh.akamaihd.net',
},
})


@ -1,142 +1,51 @@
from __future__ import unicode_literals
from .mtv import MTVServicesInfoExtractor
from .common import InfoExtractor
class ComedyCentralIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?cc\.com/
(video-clips|episodes|cc-studios|video-collections|shows(?=/[^/]+/(?!full-episodes)))
/(?P<title>.*)'''
_VALID_URL = r'https?://(?:www\.)?cc\.com/(?:episodes|video(?:-clips)?)/(?P<id>[0-9a-z]{6})'
_FEED_URL = 'http://comedycentral.com/feeds/mrss/'
_TESTS = [{
'url': 'http://www.cc.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother',
'md5': 'c4f48e9eda1b16dd10add0744344b6d8',
'url': 'http://www.cc.com/video-clips/5ke9v2/the-daily-show-with-trevor-noah-doc-rivers-and-steve-ballmer---the-nba-player-strike',
'md5': 'b8acb347177c680ff18a292aa2166f80',
'info_dict': {
'id': 'cef0cbb3-e776-4bc9-b62e-8016deccb354',
'id': '89ccc86e-1b02-4f83-b0c9-1d9592ecd025',
'ext': 'mp4',
'title': 'CC:Stand-Up|August 18, 2013|1|0101|Uncensored - Too Good of a Mother',
'description': 'After a certain point, breastfeeding becomes c**kblocking.',
'timestamp': 1376798400,
'upload_date': '20130818',
'title': 'The Daily Show with Trevor Noah|August 28, 2020|25|25149|Doc Rivers and Steve Ballmer - The NBA Player Strike',
'description': 'md5:5334307c433892b85f4f5e5ac9ef7498',
'timestamp': 1598670000,
'upload_date': '20200829',
},
}, {
'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/interviews/6yx39d/exclusive-rand-paul-extended-interview',
'url': 'http://www.cc.com/episodes/pnzzci/drawn-together--american-idol--parody-clip-show-season-3-ep-314',
'only_matching': True,
}]
class ComedyCentralFullEpisodesIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?cc\.com/
(?:full-episodes|shows(?=/[^/]+/full-episodes))
/(?P<id>[^?]+)'''
_FEED_URL = 'http://comedycentral.com/feeds/mrss/'
_TESTS = [{
'url': 'http://www.cc.com/full-episodes/pv391a/the-daily-show-with-trevor-noah-november-28--2016---ryan-speedo-green-season-22-ep-22028',
'info_dict': {
'description': 'Donald Trump is accused of exploiting his president-elect status for personal gain, Cuban leader Fidel Castro dies, and Ryan Speedo Green discusses "Sing for Your Life."',
'title': 'November 28, 2016 - Ryan Speedo Green',
},
'playlist_count': 4,
}, {
'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
mgid = self._extract_triforce_mgid(webpage, data_zone='t2_lc_promo1')
videos_info = self._get_videos_info(mgid)
return videos_info
class ToshIE(MTVServicesInfoExtractor):
IE_DESC = 'Tosh.0'
_VALID_URL = r'^https?://tosh\.cc\.com/video-(?:clips|collections)/[^/]+/(?P<videotitle>[^/?#]+)'
_FEED_URL = 'http://tosh.cc.com/feeds/mrss'
_TESTS = [{
'url': 'http://tosh.cc.com/video-clips/68g93d/twitter-users-share-summer-plans',
'info_dict': {
'description': 'Tosh asked fans to share their summer plans.',
'title': 'Twitter Users Share Summer Plans',
},
'playlist': [{
'md5': 'f269e88114c1805bb6d7653fecea9e06',
'info_dict': {
'id': '90498ec2-ed00-11e0-aca6-0026b9414f30',
'ext': 'mp4',
'title': 'Tosh.0|June 9, 2077|2|211|Twitter Users Share Summer Plans',
'description': 'Tosh asked fans to share their summer plans.',
'thumbnail': r're:^https?://.*\.jpg',
# It's really reported to be published on year 2077
'upload_date': '20770610',
'timestamp': 3390510600,
'subtitles': {
'en': 'mincount:3',
},
},
}]
}, {
'url': 'http://tosh.cc.com/video-collections/x2iz7k/just-plain-foul/m5q4fp',
'url': 'https://www.cc.com/video/k3sdvm/the-daily-show-with-jon-stewart-exclusive-the-fourth-estate',
'only_matching': True,
}]
class ComedyCentralTVIE(MTVServicesInfoExtractor):
_VALID_URL = r'https?://(?:www\.)?comedycentral\.tv/(?:staffeln|shows)/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?comedycentral\.tv/folgen/(?P<id>[0-9a-z]{6})'
_TESTS = [{
'url': 'http://www.comedycentral.tv/staffeln/7436-the-mindy-project-staffel-4',
'url': 'https://www.comedycentral.tv/folgen/pxdpec/josh-investigates-klimawandel-staffel-1-ep-1',
'info_dict': {
'id': 'local_playlist-f99b626bdfe13568579a',
'ext': 'flv',
'title': 'Episode_the-mindy-project_shows_season-4_episode-3_full-episode_part1',
'id': '15907dc3-ec3c-11e8-a442-0e40cf2fc285',
'ext': 'mp4',
'title': 'Josh Investigates',
'description': 'Steht uns das Ende der Welt bevor?',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
'url': 'http://www.comedycentral.tv/shows/1074-workaholics',
'only_matching': True,
}, {
'url': 'http://www.comedycentral.tv/shows/1727-the-mindy-project/bonus',
'only_matching': True,
}]
_FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'
_GEO_COUNTRIES = ['DE']
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mrss_url = self._search_regex(
r'data-mrss=(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'mrss url', group='url')
return self._get_videos_info_from_url(mrss_url, video_id)
class ComedyCentralShortnameIE(InfoExtractor):
_VALID_URL = r'^:(?P<id>tds|thedailyshow|theopposition)$'
_TESTS = [{
'url': ':tds',
'only_matching': True,
}, {
'url': ':thedailyshow',
'only_matching': True,
}, {
'url': ':theopposition',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
shortcut_map = {
'tds': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'thedailyshow': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'theopposition': 'http://www.cc.com/shows/the-opposition-with-jordan-klepper/full-episodes',
}
return self.url_result(shortcut_map[video_id])
def _get_feed_query(self, uri):
return {
'accountOverride': 'intl.mtvi.com',
'arcEp': 'web.cc.tv',
'ep': 'b9032c3a',
'imageEp': 'web.cc.tv',
'mgid': uri,
}


@@ -17,7 +17,7 @@ import math
from ..compat import (
compat_cookiejar_Cookie,
compat_cookies,
compat_cookies_SimpleCookie,
compat_etree_Element,
compat_etree_fromstring,
compat_getpass,
@@ -70,6 +70,7 @@ from ..utils import (
str_or_none,
str_to_int,
strip_or_none,
try_get,
unescapeHTML,
unified_strdate,
unified_timestamp,
@@ -204,6 +205,14 @@ class InfoExtractor(object):
* downloader_options A dictionary of downloader options as
described in FileDownloader
Internally, extractors can include subtitles in the format
list, in this format:
* _subtitle The subtitle object, in the same format
as in subtitles field
* _key The tag for the provided subtitle
This is never included in the output JSON, but moved
into the subtitles field.
url: Final video URL.
ext: Video filename extension.
format: The video format, defaults to ext (used for --get-format)
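For reference, a minimal sketch of what such an internal entry could look like (the `_subtitle`/`_key` field names come from this diff; the URL and language tag are made up):

# Hypothetical entry inside `formats` before post-processing:
internal_entry = {
    '_subtitle': {
        'url': 'https://example.com/subs/en.vtt',  # placeholder URL
        'ext': 'vtt',
        'protocol': 'https',
    },
    '_key': 'en',
}
# It never reaches the output JSON; it is folded into the info dict as:
# subtitles = {'en': [{'url': 'https://example.com/subs/en.vtt', 'ext': 'vtt', 'protocol': 'https'}]}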
@@ -230,8 +239,10 @@ class InfoExtractor(object):
uploader: Full name of the video uploader.
license: License name the video is licensed under.
creator: The creator of the video.
release_timestamp: UNIX timestamp of the moment the video was released.
release_date: The date (YYYYMMDD) when the video was released.
timestamp: UNIX timestamp of the moment the video became available.
timestamp: UNIX timestamp of the moment the video became available
(uploaded).
upload_date: Video upload date (YYYYMMDD).
If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader.
@@ -245,11 +256,15 @@ class InfoExtractor(object):
subtitles: The available subtitles as a dictionary in the format
{tag: subformats}. "tag" is usually a language code, and
"subformats" is a list sorted from lower to higher
preference, each element is a dictionary with the "ext"
entry and one of:
preference, each element is a dictionary,
which must contain one of these values:
* "data": The subtitles file contents
* "url": A URL pointing to the subtitles file
"ext" will be calculated from URL if missing
These values are guessed based on other data, if missing,
in a way analogous to the formats data:
* "ext" - subtitle extension name (vtt, srt, ...)
* "proto" - download protocol (https, http, m3u8, ...)
* "http_headers"
automatic_captions: Like 'subtitles', used by the YoutubeIE for
automatically generated captions
duration: Length of the video in seconds, as an integer or float.
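An illustrative `subtitles` value following the shape documented above (placeholder URLs; an untested sketch, not taken from a real extractor):

subtitles = {
    'en': [
        {'ext': 'vtt', 'url': 'https://example.com/caps/en.vtt'},
        {'ext': 'srt', 'data': '1\n00:00:00,000 --> 00:00:01,500\nHello.\n'},
    ],
    # "ext" and "proto" may be guessed when omitted, as described above:
    'de': [{'url': 'https://example.com/caps/de.m3u8', 'proto': 'm3u8'}],
}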
@@ -336,8 +351,8 @@ class InfoExtractor(object):
object, each element of which is a valid dictionary by this specification.
Additionally, playlists can have "id", "title", "description", "uploader",
"uploader_id", "uploader_url" attributes with the same semantics as videos
(see above).
"uploader_id", "uploader_url", "duration" attributes with the same semantics
as videos (see above).
_type "multi_video" indicates that there are multiple videos that
@@ -1239,8 +1254,16 @@ class InfoExtractor(object):
'ViewAction': 'view',
}
def extract_interaction_type(e):
interaction_type = e.get('interactionType')
if isinstance(interaction_type, dict):
interaction_type = interaction_type.get('@type')
return str_or_none(interaction_type)
def extract_interaction_statistic(e):
interaction_statistic = e.get('interactionStatistic')
if isinstance(interaction_statistic, dict):
interaction_statistic = [interaction_statistic]
if not isinstance(interaction_statistic, list):
return
for is_e in interaction_statistic:
@@ -1248,8 +1271,8 @@ class InfoExtractor(object):
continue
if is_e.get('@type') != 'InteractionCounter':
continue
interaction_type = is_e.get('interactionType')
if not isinstance(interaction_type, compat_str):
interaction_type = extract_interaction_type(is_e)
if not interaction_type:
continue
# For interaction count some sites provide string instead of
# an integer (as per spec) with non digit characters (e.g. ",")
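As a worked example of the comment above, a count reported as '1,480,246' has to be de-punctuated before int() will accept it; a rough equivalent of the helper the extractor relies on (names here are illustrative, not the shipped code):

import re

def to_count(value):
    # strip ',', '.' and whitespace so '1,480,246' parses as 1480246
    return int(re.sub(r'[,.\s]', '', value)) if isinstance(value, str) else value

assert to_count('1,480,246') == 1480246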
@@ -1265,6 +1288,23 @@ class InfoExtractor(object):
continue
info[count_key] = interaction_count
def extract_author(e):
if not e:
return None
if not e.get('author'):
return None
e = e['author']
if isinstance(e, str):
info['uploader'] = e
elif isinstance(e, dict):
etype = e.get('@type')
if etype in ('Person', 'Organization'):
info.update({
'uploader': e.get('name'),
'uploader_id': e.get('identifier'),
'uploader_url': try_get(e, lambda x: x['url']['url'], str),
})
media_object_types = ('MediaObject', 'VideoObject', 'AudioObject', 'MusicVideoObject')
def extract_media_object(e):
@@ -1282,7 +1322,6 @@ class InfoExtractor(object):
'thumbnails': thumbnails,
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'uploader': str_or_none(e.get('author')),
'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
@@ -1290,6 +1329,7 @@ class InfoExtractor(object):
'view_count': int_or_none(e.get('interactionCount')),
})
extract_interaction_statistic(e)
extract_author(e)
for e in json_ld:
if '@context' in e:
@@ -1383,6 +1423,10 @@ class InfoExtractor(object):
f['tbr'] = f['abr'] + f['vbr']
def _formats_key(f):
# manifest subtitle workaround
if '_subtitle' in f:
return (-1,)
# TODO remove the following workaround
from ..utils import determine_ext
if not f.get('ext') and 'url' in f:
@@ -1402,7 +1446,19 @@ class InfoExtractor(object):
preference -= 0.5
protocol = f.get('protocol') or determine_protocol(f)
proto_preference = 0 if protocol in ['http', 'https'] else (-0.5 if protocol == 'rtsp' else -0.1)
if protocol in ['http', 'https']:
proto_preference = 0
elif protocol == 'rtsp':
proto_preference = -0.5
elif protocol == 'bittorrent':
if self._downloader.params.get('prefer_p2p') is True:
proto_preference = 1
elif self._downloader.params.get('allow_p2p') is True:
proto_preference = -0.1
else:
proto_preference = -2
else:
proto_preference = -0.1
if f.get('vcodec') == 'none': # audio only
preference -= 50
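Restated outside the sorting key, the new protocol ranking above behaves as follows ('prefer_p2p'/'allow_p2p' are downloader params introduced by this fork; this is an illustrative rewrite, not the shipped code):

def proto_preference(protocol, params):
    if protocol in ('http', 'https'):
        return 0
    if protocol == 'rtsp':
        return -0.5
    if protocol == 'bittorrent':
        if params.get('prefer_p2p') is True:
            return 1        # BitTorrent formats win over plain HTTP(S)
        if params.get('allow_p2p') is True:
            return -0.1     # ranked like any other non-HTTP protocol
        return -2           # effectively never chosen automatically
    return -0.1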
@@ -1474,9 +1530,10 @@ class InfoExtractor(object):
try:
self._request_webpage(url, video_id, 'Checking %s URL' % item, headers=headers)
return True
except ExtractorError:
except ExtractorError as e:
self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, item))
'%s: %s URL is invalid, skipping: %s'
% (video_id, item, error_to_compat_str(e.cause)))
return False
def http_scheme(self):
@@ -1510,7 +1567,7 @@ class InfoExtractor(object):
manifest_url, video_id, 'Downloading f4m manifest',
'Unable to download f4m manifest',
# Some manifests may be malformed, e.g. prosiebensat1 generated manifests
# (see https://github.com/ytdl-org/haruhi-dl/issues/6215#issuecomment-121704244)
# (see https://github.com/ytdl-org/youtube-dl/issues/6215#issuecomment-121704244)
transform_source=transform_source,
fatal=fatal, data=data, headers=headers, query=query)
@@ -1541,7 +1598,7 @@ class InfoExtractor(object):
manifest_version = '2.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
# Remove unsupported DRM protected media from final formats
# rendition (see https://github.com/ytdl-org/haruhi-dl/issues/8573).
# rendition (see https://github.com/ytdl-org/youtube-dl/issues/8573).
media_nodes = remove_encrypted_media(media_nodes)
if not media_nodes:
return formats
@@ -1672,8 +1729,8 @@ class InfoExtractor(object):
# References:
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-21
# 2. https://github.com/ytdl-org/haruhi-dl/issues/12211
# 3. https://github.com/ytdl-org/haruhi-dl/issues/18923
# 2. https://github.com/ytdl-org/youtube-dl/issues/12211
# 3. https://github.com/ytdl-org/youtube-dl/issues/18923
# We should try extracting formats only from master playlists [1, 4.3.4],
# i.e. playlists that describe available qualities. On the other hand
@@ -1705,7 +1762,7 @@ class InfoExtractor(object):
if not (media_type and group_id and name):
return
groups.setdefault(group_id, []).append(media)
if media_type not in ('VIDEO', 'AUDIO'):
if media_type not in ('VIDEO', 'AUDIO', 'SUBTITLES'):
return
media_url = media.get('URI')
if media_url:
@@ -1713,17 +1770,27 @@ class InfoExtractor(object):
for v in (m3u8_id, group_id, name):
if v:
format_id.append(v)
f = {
'format_id': '-'.join(format_id),
'url': format_url(media_url),
'manifest_url': m3u8_url,
'language': media.get('LANGUAGE'),
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
}
if media_type == 'AUDIO':
f['vcodec'] = 'none'
if media_type == 'SUBTITLES':
f = {
'_subtitle': {
'url': format_url(media_url),
'ext': 'vtt',
'protocol': entry_protocol,
},
'_key': media.get('LANGUAGE'),
}
else:
f = {
'format_id': '-'.join(format_id),
'url': format_url(media_url),
'manifest_url': m3u8_url,
'language': media.get('LANGUAGE'),
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
}
if media_type == 'AUDIO':
f['vcodec'] = 'none'
formats.append(f)
def build_stream_name():
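Tying this to the `_subtitle` bookkeeping documented in common.py above: for a master-playlist line such as (hypothetical URI)

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",URI="subs/en.m3u8"

the new SUBTITLES branch emits a pseudo-format roughly like:

pseudo_format = {
    '_subtitle': {
        'url': 'https://example.com/hls/subs/en.m3u8',  # resolved URI
        'ext': 'vtt',             # hard-coded by this branch
        'protocol': 'm3u8_native',
    },
    '_key': 'en',                 # from the LANGUAGE attribute
}
# _formats_key returns (-1,) for such entries (see the hunk above), so they
# sort to the front and are later moved into `subtitles`.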
@@ -2229,7 +2296,7 @@ class InfoExtractor(object):
# First off, % characters outside $...$ templates
# must be escaped by doubling for proper processing
# by % operator string formatting used further (see
# https://github.com/ytdl-org/haruhi-dl/issues/16867).
# https://github.com/ytdl-org/youtube-dl/issues/16867).
t = ''
in_template = False
for c in tmpl:
@@ -2248,7 +2315,7 @@ class InfoExtractor(object):
# @initialization is a regular template like @media one
# so it should be handled just the same way (see
# https://github.com/ytdl-org/haruhi-dl/issues/11605)
# https://github.com/ytdl-org/youtube-dl/issues/11605)
if 'initialization' in representation_ms_info:
initialization_template = prepare_template(
'initialization',
@@ -2334,7 +2401,7 @@ class InfoExtractor(object):
elif 'segment_urls' in representation_ms_info:
# Segment URLs with no SegmentTimeline
# Example: https://www.seznam.cz/zpravy/clanek/cesko-zasahne-vitr-o-sile-vichrice-muze-byt-i-zivotu-nebezpecny-39091
# https://github.com/ytdl-org/haruhi-dl/pull/14844
# https://github.com/ytdl-org/youtube-dl/pull/14844
fragments = []
segment_duration = float_or_none(
representation_ms_info['segment_duration'],
@@ -2372,8 +2439,8 @@ class InfoExtractor(object):
# According to [1, 5.3.5.2, Table 7, page 35] @id of Representation
# is not necessarily unique within a Period thus formats with
# the same `format_id` are quite possible. There are numerous examples
# of such manifests (see https://github.com/ytdl-org/haruhi-dl/issues/15111,
# https://github.com/ytdl-org/haruhi-dl/issues/13919)
# of such manifests (see https://github.com/ytdl-org/youtube-dl/issues/15111,
# https://github.com/ytdl-org/youtube-dl/issues/13919)
full_info = formats_dict.get(representation_id, {}).copy()
full_info.update(f)
formats.append(full_info)
@@ -2536,7 +2603,7 @@ class InfoExtractor(object):
media_tags.extend(re.findall(
# We only allow video|audio followed by a whitespace or '>'.
# Allowing more characters may end up in significant slow down (see
# https://github.com/ytdl-org/haruhi-dl/issues/11979, example URL:
# https://github.com/ytdl-org/youtube-dl/issues/11979, example URL:
# http://www.porntrex.com/maps/videositemap.xml).
r'(?s)(<(?P<tag>(?:amp-)?(?:video|audio))(?:\s+[^>]*)?>)(.*?)</(?P=tag)>', webpage))
for media_tag, media_type, media_content in media_tags:
@@ -2612,7 +2679,15 @@ class InfoExtractor(object):
return entries
def _extract_akamai_formats(self, manifest_url, video_id, hosts={}):
signed = 'hdnea=' in manifest_url
if not signed:
# https://learn.akamai.com/en-us/webhelp/media-services-on-demand/stream-packaging-user-guide/GUID-BE6C0F73-1E06-483B-B0EA-57984B91B7F9.html
manifest_url = re.sub(
r'(?:b=[\d,-]+|(?:__a__|attributes)=off|__b__=\d+)&?',
'', manifest_url).strip('?')
formats = []
hdcore_sign = 'hdcore=3.7.0'
f4m_url = re.sub(r'(https?://[^/]+)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
hds_host = hosts.get('hds')
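A quick demonstration of the attribute-stripping substitution above on a made-up unsigned manifest URL:

import re

url = 'https://ex.akamaihd.net/i/vod/clip/master.m3u8?b=100,200&__a__=off&__b__=450'
clean = re.sub(r'(?:b=[\d,-]+|(?:__a__|attributes)=off|__b__=\d+)&?', '', url).strip('?')
# clean == 'https://ex.akamaihd.net/i/vod/clip/master.m3u8'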
@@ -2625,13 +2700,38 @@ class InfoExtractor(object):
for entry in f4m_formats:
entry.update({'extra_param_to_segment_url': hdcore_sign})
formats.extend(f4m_formats)
m3u8_url = re.sub(r'(https?://[^/]+)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
hls_host = hosts.get('hls')
if hls_host:
m3u8_url = re.sub(r'(https?://)[^/]+', r'\1' + hls_host, m3u8_url)
formats.extend(self._extract_m3u8_formats(
m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
m3u8_id='hls', fatal=False)
formats.extend(m3u8_formats)
http_host = hosts.get('http')
if http_host and m3u8_formats and not signed:
REPL_REGEX = r'https?://[^/]+/i/([^,]+),([^/]+),([^/]+)\.csmil/.+'
qualities = re.match(REPL_REGEX, m3u8_url).group(2).split(',')
qualities_length = len(qualities)
if len(m3u8_formats) in (qualities_length, qualities_length + 1):
i = 0
for f in m3u8_formats:
if f['vcodec'] != 'none':
for protocol in ('http', 'https'):
http_f = f.copy()
del http_f['manifest_url']
http_url = re.sub(
REPL_REGEX, protocol + r'://%s/\g<1>%s\3' % (http_host, qualities[i]), f['url'])
http_f.update({
'format_id': http_f['format_id'].replace('hls-', protocol + '-'),
'url': http_url,
'protocol': protocol,
})
formats.append(http_f)
i += 1
return formats
def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]):
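A worked example of the REPL_REGEX rewrite above, with made-up hosts; group 2 carries the comma-separated bitrate list that Akamai-style .csmil URLs embed:

import re

REPL_REGEX = r'https?://[^/]+/i/([^,]+),([^/]+),([^/]+)\.csmil/.+'
m3u8_url = 'https://hls.example.net/i/videos/clip_,300,600,900,.mp4.csmil/master.m3u8'
qualities = re.match(REPL_REGEX, m3u8_url).group(2).split(',')
# qualities == ['300', '600', '900']
http_url = re.sub(REPL_REGEX, r'http://%s/\g<1>%s\3' % ('dl.example.net', qualities[1]), m3u8_url)
# http_url == 'http://dl.example.net/videos/clip_600.mp4'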
@@ -2882,10 +2982,10 @@ class InfoExtractor(object):
self._downloader.cookiejar.set_cookie(cookie)
def _get_cookies(self, url):
""" Return a compat_cookies.SimpleCookie with the cookies for the url """
""" Return a compat_cookies_SimpleCookie with the cookies for the url """
req = sanitized_Request(url)
self._downloader.cookiejar.add_cookie_header(req)
return compat_cookies.SimpleCookie(req.get_header('Cookie'))
return compat_cookies_SimpleCookie(req.get_header('Cookie'))
def _apply_first_set_cookie_header(self, url_handle, cookie):
"""
@@ -2898,7 +2998,7 @@ class InfoExtractor(object):
We will workaround this issue by resetting the cookie to
the first one manually.
1. https://new.vk.com/
2. https://github.com/ytdl-org/haruhi-dl/issues/9841#issuecomment-227871201
2. https://github.com/ytdl-org/youtube-dl/issues/9841#issuecomment-227871201
3. https://learning.oreilly.com/
"""
for header, cookies in url_handle.headers.items():


@@ -36,7 +36,7 @@ class UnicodeBOMIE(InfoExtractor):
_VALID_URL = r'(?P<bom>\ufeff)(?P<id>.*)$'
# Disable test for python 3.2 since BOM is broken in re in this version
# (see https://github.com/ytdl-org/haruhi-dl/issues/9751)
# (see https://github.com/ytdl-org/youtube-dl/issues/9751)
_TESTS = [] if (3, 0) < sys.version_info <= (3, 3) else [{
'url': '\ufeffhttp://www.youtube.com/watch?v=BaW_jenozKc',
'only_matching': True,


@@ -1,9 +1,15 @@
from __future__ import unicode_literals
from urllib.parse import parse_qs
from .common import InfoExtractor
from ..compat import (
compat_urlparse,
)
from ..utils import (
try_get,
ExtractorError,
)
class RtmpIE(InfoExtractor):
@@ -58,3 +64,71 @@ class MmsIE(InfoExtractor):
'title': title,
'url': url,
}
class BitTorrentMagnetIE(InfoExtractor):
IE_DESC = False
_VALID_URL = r'(?i)magnet:\?.+'
_TESTS = [{
'url': 'magnet:?xs=https%3A%2F%2Fvideo.internet-czas-dzialac.pl%2Fstatic%2Ftorrents%2F9085aa69-90c2-40c6-a707-3472b92cafc8-0.torrent&xt=urn:btih:0ae4cc8cb0e098a1a40b3224aa578bb4210a8cff&dn=Podcast+Internet.+Czas+dzia%C5%82a%C4%87!+-+Trailer&tr=wss%3A%2F%2Fvideo.internet-czas-dzialac.pl%3A443%2Ftracker%2Fsocket&tr=https%3A%2F%2Fvideo.internet-czas-dzialac.pl%2Ftracker%2Fannounce&ws=https%3A%2F%2Fvideo.internet-czas-dzialac.pl%2Fstatic%2Fwebseed%2F9085aa69-90c2-40c6-a707-3472b92cafc8-0.mp4',
'info_dict': {
'id': 'urn:btih:0ae4cc8cb0e098a1a40b3224aa578bb4210a8cff',
'ext': 'torrent',
'title': 'Podcast Internet. Czas działać! - Trailer',
},
'params': {
'allow_p2p': True,
'prefer_p2p': True,
'skip_download': True,
},
}]
def _real_extract(self, url):
qs = parse_qs(url[len('magnet:?'):])
# eXact Topic
video_id = qs['xt'][0]
if not video_id.startswith('urn:btih:'):
raise ExtractorError('Not a BitTorrent magnet')
# Display Name
title = try_get(qs, lambda x: x['dn'][0], str) or video_id[len('urn:btih:'):]
formats = [{
'url': url,
'protocol': 'bittorrent',
}]
# Web Seed
if qs.get('ws'):
for ws in qs['ws']:
formats.append({
'url': ws,
})
# Acceptable Source
if qs.get('as'):
for as_ in qs['as']:
formats.append({
'url': as_,
'preference': -2,
})
# eXact Source
if qs.get('xs'):
for xs in qs['xs']:
formats.append({
'url': xs,
'protocol': 'bittorrent',
})
self._sort_formats(formats)
# eXact Length
if qs.get('xl'):
xl = int(qs['xl'][0])
for i in range(0, len(formats)):
formats[i]['filesize'] = xl
return {
'id': video_id,
'title': title,
'formats': formats,
}
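For orientation, the magnet query keys handled above follow the usual magnet URI naming: xt = eXact Topic (the urn:btih: info hash), dn = Display Name, tr = TRacker, ws = Web Seed, as = Acceptable Source, xs = eXact Source, xl = eXact Length in bytes. A shortened sample of the parsing step:

from urllib.parse import parse_qs

qs = parse_qs('xt=urn:btih:0ae4cc8c&dn=Trailer&xl=12345')  # truncated sample
video_id = qs['xt'][0]        # 'urn:btih:0ae4cc8c'
filesize = int(qs['xl'][0])   # 12345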


@@ -16,6 +16,8 @@ from ..utils import (
mimetype2ext,
orderedSet,
parse_iso8601,
strip_or_none,
try_get,
)
@@ -82,6 +84,7 @@ class CondeNastIE(InfoExtractor):
'uploader': 'gq',
'upload_date': '20170321',
'timestamp': 1490126427,
'description': 'How much grimmer would things be if these people were competent?',
},
}, {
# JS embed
@@ -93,7 +96,7 @@ class CondeNastIE(InfoExtractor):
'title': '3D printed TSA Travel Sentry keys really do open TSA locks',
'uploader': 'arstechnica',
'upload_date': '20150916',
'timestamp': 1442434955,
'timestamp': 1442434920,
}
}, {
'url': 'https://player.cnevids.com/inline/video/59138decb57ac36b83000005.js?target=js-cne-player',
@@ -196,6 +199,13 @@ class CondeNastIE(InfoExtractor):
})
self._sort_formats(formats)
subtitles = {}
for t, caption in video_info.get('captions', {}).items():
caption_url = caption.get('src')
if not (t in ('vtt', 'srt', 'tml') and caption_url):
continue
subtitles.setdefault('en', []).append({'url': caption_url})
return {
'id': video_id,
'formats': formats,
@@ -208,6 +218,7 @@ class CondeNastIE(InfoExtractor):
'season': video_info.get('season_title'),
'timestamp': parse_iso8601(video_info.get('premiere_date')),
'categories': video_info.get('categories'),
'subtitles': subtitles,
}
def _real_extract(self, url):
@@ -225,8 +236,16 @@ class CondeNastIE(InfoExtractor):
if url_type == 'series':
return self._extract_series(url, webpage)
else:
params = self._extract_video_params(webpage, display_id)
info = self._search_json_ld(
webpage, display_id, fatal=False)
video = try_get(self._parse_json(self._search_regex(
r'__PRELOADED_STATE__\s*=\s*({.+?});', webpage,
'preload state', '{}'), display_id),
lambda x: x['transformed']['video'])
if video:
params = {'videoId': video['id']}
info = {'description': strip_or_none(video.get('description'))}
else:
params = self._extract_video_params(webpage, display_id)
info = self._search_json_ld(
webpage, display_id, fatal=False)
info.update(self._extract_video(params))
return info


@@ -112,7 +112,7 @@ class CrunchyrollBaseIE(InfoExtractor):
# > This content may be inappropriate for some people.
# > Are you sure you want to continue?
# since it's not disabled by default in crunchyroll account's settings.
# See https://github.com/ytdl-org/haruhi-dl/issues/7202.
# See https://github.com/ytdl-org/youtube-dl/issues/7202.
qs['skip_wall'] = ['1']
return compat_urlparse.urlunparse(
parsed_url._replace(query=compat_urllib_parse_urlencode(qs, True)))
@@ -267,7 +267,7 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
request = (url_or_request if isinstance(url_or_request, compat_urllib_request.Request)
else sanitized_Request(url_or_request))
# Accept-Language must be set explicitly to accept any language to avoid issues
# similar to https://github.com/ytdl-org/haruhi-dl/issues/6797.
# similar to https://github.com/ytdl-org/youtube-dl/issues/6797.
# Along with IP address Crunchyroll uses Accept-Language to guess whether georestriction
# should be imposed or not (from what I can see it just takes the first language
# ignoring the priority and requires it to correspond the IP). By the way this causes


@@ -8,9 +8,14 @@ from ..utils import (
ExtractorError,
extract_attributes,
find_xpath_attr,
get_element_by_attribute,
get_element_by_class,
int_or_none,
js_to_json,
merge_dicts,
parse_iso8601,
smuggle_url,
str_to_int,
unescapeHTML,
)
from .senateisvp import SenateISVPIE
@@ -98,6 +103,48 @@ class CSpanIE(InfoExtractor):
bc_attr['data-bcid'])
return self.url_result(smuggle_url(bc_url, {'source_url': url}))
def add_referer(formats):
for f in formats:
f.setdefault('http_headers', {})['Referer'] = url
# As of 01.12.2020 this path appears to cover all cases, making the rest
# of the code unnecessary
jwsetup = self._parse_json(
self._search_regex(
r'(?s)jwsetup\s*=\s*({.+?})\s*;', webpage, 'jwsetup',
default='{}'),
video_id, transform_source=js_to_json, fatal=False)
if jwsetup:
info = self._parse_jwplayer_data(
jwsetup, video_id, require_title=False, m3u8_id='hls',
base_url=url)
add_referer(info['formats'])
for subtitles in info['subtitles'].values():
for subtitle in subtitles:
ext = determine_ext(subtitle['url'])
if ext == 'php':
ext = 'vtt'
subtitle['ext'] = ext
ld_info = self._search_json_ld(webpage, video_id, default={})
title = get_element_by_class('video-page-title', webpage) or \
self._og_search_title(webpage)
description = get_element_by_attribute('itemprop', 'description', webpage) or \
self._html_search_meta(['og:description', 'description'], webpage)
return merge_dicts(info, ld_info, {
'title': title,
'thumbnail': get_element_by_attribute('itemprop', 'thumbnailUrl', webpage),
'description': description,
'timestamp': parse_iso8601(get_element_by_attribute('itemprop', 'uploadDate', webpage)),
'location': get_element_by_attribute('itemprop', 'contentLocation', webpage),
'duration': int_or_none(self._search_regex(
r'jwsetup\.seclength\s*=\s*(\d+);',
webpage, 'duration', fatal=False)),
'view_count': str_to_int(self._search_regex(
r"<span[^>]+class='views'[^>]*>([\d,]+)\s+Views</span>",
webpage, 'views', fatal=False)),
})
# Obsolete
# We first look for clipid, because clipprog always appears before
patterns = [r'id=\'clip(%s)\'\s*value=\'([0-9]+)\'' % t for t in ('id', 'prog')]
results = list(filter(None, (re.search(p, webpage) for p in patterns)))
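The jwsetup extraction above leans on js_to_json to turn the page's JavaScript object literal into strict JSON before _parse_json can load it; a tiny illustration (the haruhi_dl package path is an assumption):

import json
from haruhi_dl.utils import js_to_json

json.loads(js_to_json("{file: 'video.m3u8', autostart: true,}"))
# -> {'file': 'video.m3u8', 'autostart': True}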
@@ -165,6 +212,7 @@ class CSpanIE(InfoExtractor):
formats = self._extract_m3u8_formats(
path, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls') if determine_ext(path) == 'm3u8' else [{'url': path, }]
add_referer(formats)
self._sort_formats(formats)
entries.append({
'id': '%s_%d' % (video_id, partnum + 1),


@@ -0,0 +1,52 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class CTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ctv\.ca/(?P<id>(?:show|movie)s/[^/]+/[^/?#&]+)'
_TESTS = [{
'url': 'https://www.ctv.ca/shows/your-morning/wednesday-december-23-2020-s5e88',
'info_dict': {
'id': '2102249',
'ext': 'flv',
'title': 'Wednesday, December 23, 2020',
'thumbnail': r're:^https?://.*\.jpg$',
'description': 'Your Morning delivers original perspectives and unique insights into the headlines of the day.',
'timestamp': 1608732000,
'upload_date': '20201223',
'series': 'Your Morning',
'season': '2020-2021',
'season_number': 5,
'episode_number': 88,
'tags': ['Your Morning'],
'categories': ['Talk Show'],
'duration': 7467.126,
},
}, {
'url': 'https://www.ctv.ca/movies/adam-sandlers-eight-crazy-nights/adam-sandlers-eight-crazy-nights',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
content = self._download_json(
'https://www.ctv.ca/space-graphql/graphql', display_id, query={
'query': '''{
resolvedPath(path: "/%s") {
lastSegment {
content {
... on AxisContent {
axisId
videoPlayerDestCode
}
}
}
}
}''' % display_id,
})['data']['resolvedPath']['lastSegment']['content']
video_id = content['axisId']
return self.url_result(
'9c9media:%s:%s' % (content['videoPlayerDestCode'], video_id),
'NineCNineMedia', video_id)
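A rough standalone equivalent of the GraphQL lookup above (endpoint and query text come from the extractor; that the endpoint answers unauthenticated GET requests is an assumption):

import requests

def resolve_ctv(display_id):
    query = '''{
  resolvedPath(path: "/%s") {
    lastSegment { content { ... on AxisContent { axisId videoPlayerDestCode } } }
  }
}''' % display_id
    resp = requests.get('https://www.ctv.ca/space-graphql/graphql',
                        params={'query': query}, timeout=10)
    content = resp.json()['data']['resolvedPath']['lastSegment']['content']
    return content['videoPlayerDestCode'], content['axisId']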


@@ -25,12 +25,12 @@ class CuriosityStreamBaseIE(InfoExtractor):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error), expected=True)
def _call_api(self, path, video_id):
def _call_api(self, path, video_id, query=None):
headers = {}
if self._auth_token:
headers['X-Auth-Token'] = self._auth_token
result = self._download_json(
self._API_BASE_URL + path, video_id, headers=headers)
self._API_BASE_URL + path, video_id, headers=headers, query=query)
self._handle_errors(result)
return result['data']
@@ -52,62 +52,75 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/video/2',
'md5': '262bb2f257ff301115f1973540de8983',
'info_dict': {
'id': '2',
'ext': 'mp4',
'title': 'How Did You Develop The Internet?',
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
}
},
'params': {
'format': 'bestvideo',
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
media = self._call_api('media/' + video_id, video_id)
title = media['title']
formats = []
for encoding in media.get('encodings', []):
m3u8_url = encoding.get('master_playlist_url')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
encoding_url = encoding.get('url')
file_url = encoding.get('file_url')
if not encoding_url and not file_url:
continue
f = {
'width': int_or_none(encoding.get('width')),
'height': int_or_none(encoding.get('height')),
'vbr': int_or_none(encoding.get('video_bitrate')),
'abr': int_or_none(encoding.get('audio_bitrate')),
'filesize': int_or_none(encoding.get('size_in_bytes')),
'vcodec': encoding.get('video_codec'),
'acodec': encoding.get('audio_codec'),
'container': encoding.get('container_type'),
}
for f_url in (encoding_url, file_url):
if not f_url:
for encoding_format in ('m3u8', 'mpd'):
media = self._call_api('media/' + video_id, video_id, query={
'encodingsNew': 'true',
'encodingsFormat': encoding_format,
})
for encoding in media.get('encodings', []):
playlist_url = encoding.get('master_playlist_url')
if encoding_format == 'm3u8':
# use `m3u8` entry_protocol until EXT-X-MAP is properly supported by `m3u8_native` entry_protocol
formats.extend(self._extract_m3u8_formats(
playlist_url, video_id, 'mp4',
m3u8_id='hls', fatal=False))
elif encoding_format == 'mpd':
formats.extend(self._extract_mpd_formats(
playlist_url, video_id, mpd_id='dash', fatal=False))
encoding_url = encoding.get('url')
file_url = encoding.get('file_url')
if not encoding_url and not file_url:
continue
fmt = f.copy()
rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$', f_url)
if rtmp:
fmt.update({
'url': rtmp.group('url'),
'play_path': rtmp.group('playpath'),
'app': rtmp.group('app'),
'ext': 'flv',
'format_id': 'rtmp',
})
else:
fmt.update({
'url': f_url,
'format_id': 'http',
})
formats.append(fmt)
f = {
'width': int_or_none(encoding.get('width')),
'height': int_or_none(encoding.get('height')),
'vbr': int_or_none(encoding.get('video_bitrate')),
'abr': int_or_none(encoding.get('audio_bitrate')),
'filesize': int_or_none(encoding.get('size_in_bytes')),
'vcodec': encoding.get('video_codec'),
'acodec': encoding.get('audio_codec'),
'container': encoding.get('container_type'),
}
for f_url in (encoding_url, file_url):
if not f_url:
continue
fmt = f.copy()
rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$', f_url)
if rtmp:
fmt.update({
'url': rtmp.group('url'),
'play_path': rtmp.group('playpath'),
'app': rtmp.group('app'),
'ext': 'flv',
'format_id': 'rtmp',
})
else:
fmt.update({
'url': f_url,
'format_id': 'http',
})
formats.append(fmt)
self._sort_formats(formats)
title = media['title']
subtitles = {}
for closed_caption in media.get('closed_captions', []):
sub_url = closed_caption.get('file')
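A standalone sketch of the dual-format media lookup introduced above (the v1 API base URL is an assumption, since _API_BASE_URL is defined outside this hunk; X-Auth-Token is only needed for gated content):

import requests

def cs_media(video_id, auth_token=None):
    headers = {'X-Auth-Token': auth_token} if auth_token else {}
    media = {}
    for fmt in ('m3u8', 'mpd'):
        media[fmt] = requests.get(
            'https://api.curiositystream.com/v1/media/' + video_id,
            params={'encodingsNew': 'true', 'encodingsFormat': fmt},
            headers=headers, timeout=10).json()['data']
    return media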
@@ -132,7 +145,7 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream:collection'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collection|series)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collections?|series)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://app.curiositystream.com/collection/2',
'info_dict': {
@@ -140,10 +153,13 @@ class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
'title': 'Curious Minds: The Internet',
'description': 'How is the internet shaping our lives in the 21st Century?',
},
'playlist_mincount': 17,
'playlist_mincount': 16,
}, {
'url': 'https://curiositystream.com/series/2',
'only_matching': True,
}, {
'url': 'https://curiositystream.com/collections/36',
'only_matching': True,
}]
def _real_extract(self, url):


@@ -7,7 +7,7 @@ from .dplay import DPlayIE
class DiscoveryNetworksDeIE(DPlayIE):
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:tlc|dmax)\.de|dplay\.co\.uk)/(?:programme|show)/(?P<programme>[^/]+)/video/(?P<alternate_id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:tlc|dmax)\.de|dplay\.co\.uk)/(?:programme|show|sendungen)/(?P<programme>[^/]+)/(?:video/)?(?P<alternate_id>[^/]+)'
_TESTS = [{
'url': 'https://www.tlc.de/programme/breaking-amish/video/die-welt-da-drauen/DCB331270001100',
@@ -29,6 +29,9 @@ class DiscoveryNetworksDeIE(DPlayIE):
}, {
'url': 'https://www.dplay.co.uk/show/ghost-adventures/video/hotel-leger-103620/EHD_280313B',
'only_matching': True,
}, {
'url': 'https://tlc.de/sendungen/breaking-amish/die-welt-da-drauen/',
'only_matching': True,
}]
def _real_extract(self, url):


@@ -32,6 +32,18 @@ class DigitallySpeakingIE(InfoExtractor):
# From http://www.gdcvault.com/play/1013700/Advanced-Material
'url': 'http://sevt.dispeak.com/ubm/gdc/eur10/xml/11256_1282118587281VNIT.xml',
'only_matching': True,
}, {
# From https://gdcvault.com/play/1016624, empty speakerVideo
'url': 'https://sevt.dispeak.com/ubm/gdc/online12/xml/201210-822101_1349794556671DDDD.xml',
'info_dict': {
'id': '201210-822101_1349794556671DDDD',
'ext': 'flv',
'title': 'Pre-launch - Preparing to Take the Plunge',
},
}, {
# From http://www.gdcvault.com/play/1014846/Conference-Keynote-Shigeru, empty slideVideo
'url': 'http://events.digitallyspeaking.com/gdc/project25/xml/p25-miyamoto1999_1282467389849HSVB.xml',
'only_matching': True,
}]
def _parse_mp4(self, metadata):
@@ -84,26 +96,20 @@ class DigitallySpeakingIE(InfoExtractor):
'vcodec': 'none',
'format_id': audio.get('code'),
})
slide_video_path = xpath_text(metadata, './slideVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(slide_video_path, '.flv'),
'ext': 'flv',
'format_note': 'slide deck video',
'quality': -2,
'preference': -2,
'format_id': 'slides',
})
speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(speaker_video_path, '.flv'),
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
for video_key, format_id, preference in (
('slide', 'slides', -2), ('speaker', 'speaker', -1)):
video_path = xpath_text(metadata, './%sVideo' % video_key)
if not video_path:
continue
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(video_path, '.flv'),
'ext': 'flv',
'format_note': '%s video' % video_key,
'quality': preference,
'preference': preference,
'format_id': format_id,
})
return formats
def _real_extract(self, url):


@@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
@@ -10,16 +11,23 @@ from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
strip_or_none,
unified_timestamp,
)
class DPlayIE(InfoExtractor):
_PATH_REGEX = r'/(?P<id>[^/]+/[^/?#]+)'
_VALID_URL = r'''(?x)https?://
(?P<domain>
(?:www\.)?(?P<host>dplay\.(?P<country>dk|fi|jp|se|no))|
(?:www\.)?(?P<host>d
(?:
play\.(?P<country>dk|fi|jp|se|no)|
iscoveryplus\.(?P<plus_country>dk|es|fi|it|se|no)
)
)|
(?P<subdomain_country>es|it)\.dplay\.com
)/[^/]+/(?P<id>[^/]+/[^/?#]+)'''
)/[^/]+''' + _PATH_REGEX
_TESTS = [{
# non geo restricted, via secure api, unsigned download hls URL
@@ -126,58 +134,99 @@ class DPlayIE(InfoExtractor):
}, {
'url': 'https://www.dplay.jp/video/gold-rush/24086',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.se/videos/nugammalt-77-handelser-som-format-sverige/nugammalt-77-handelser-som-format-sverige-101',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.dk/videoer/ted-bundy-mind-of-a-monster/ted-bundy-mind-of-a-monster',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.no/videoer/i-kongens-klr/sesong-1-episode-7',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.it/videos/biografie-imbarazzanti/luigi-di-maio-la-psicosi-di-stanislawskij',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.es/videos/la-fiebre-del-oro/temporada-8-episodio-1',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.fi/videot/shifting-gears-with-aaron-kaufman/episode-16',
'only_matching': True,
}]
def _process_errors(self, e, geo_countries):
info = self._parse_json(e.cause.read().decode('utf-8'), None)
error = info['errors'][0]
error_code = error.get('code')
if error_code == 'access.denied.geoblocked':
self.raise_geo_restricted(countries=geo_countries)
elif error_code in ('access.denied.missingpackage', 'invalid.token'):
raise ExtractorError(
'This video is only available for registered users. You may want to use --cookies.', expected=True)
raise ExtractorError(info['errors'][0]['detail'], expected=True)
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers['Authorization'] = 'Bearer ' + self._download_json(
disco_base + 'token', display_id, 'Downloading token',
query={
'realm': realm,
})['data']['attributes']['token']
def _download_video_playback_info(self, disco_base, video_id, headers):
streaming = self._download_json(
disco_base + 'playback/videoPlaybackInfo/' + video_id,
video_id, headers=headers)['data']['attributes']['streaming']
streaming_list = []
for format_id, format_dict in streaming.items():
streaming_list.append({
'type': format_id,
'url': format_dict.get('url'),
})
return streaming_list
def _get_disco_api_info(self, url, display_id, disco_host, realm, country):
geo_countries = [country.upper()]
self._initialize_geo_bypass({
'countries': geo_countries,
})
disco_base = 'https://%s/' % disco_host
token = self._download_json(
disco_base + 'token', display_id, 'Downloading token',
query={
'realm': realm,
})['data']['attributes']['token']
headers = {
'Referer': url,
'Authorization': 'Bearer ' + token,
}
video = self._download_json(
disco_base + 'content/videos/' + display_id, display_id,
headers=headers, query={
'fields[channel]': 'name',
'fields[image]': 'height,src,width',
'fields[show]': 'name',
'fields[tag]': 'name',
'fields[video]': 'description,episodeNumber,name,publishStart,seasonNumber,videoDuration',
'include': 'images,primaryChannel,show,tags'
})
self._update_disco_api_headers(headers, disco_base, display_id, realm)
try:
video = self._download_json(
disco_base + 'content/videos/' + display_id, display_id,
headers=headers, query={
'fields[channel]': 'name',
'fields[image]': 'height,src,width',
'fields[show]': 'name',
'fields[tag]': 'name',
'fields[video]': 'description,episodeNumber,name,publishStart,seasonNumber,videoDuration',
'include': 'images,primaryChannel,show,tags'
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
self._process_errors(e, geo_countries)
raise
video_id = video['data']['id']
info = video['data']['attributes']
title = info['name'].strip()
formats = []
try:
streaming = self._download_json(
disco_base + 'playback/videoPlaybackInfo/' + video_id,
display_id, headers=headers)['data']['attributes']['streaming']
streaming = self._download_video_playback_info(
disco_base, video_id, headers)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
error = info['errors'][0]
error_code = error.get('code')
if error_code == 'access.denied.geoblocked':
self.raise_geo_restricted(countries=geo_countries)
elif error_code == 'access.denied.missingpackage':
self.raise_login_required()
raise ExtractorError(info['errors'][0]['detail'], expected=True)
self._process_errors(e, geo_countries)
raise
for format_id, format_dict in streaming.items():
for format_dict in streaming:
if not isinstance(format_dict, dict):
continue
format_url = format_dict.get('url')
if not format_url:
continue
format_id = format_dict.get('type')
ext = determine_ext(format_url)
if format_id == 'dash' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
@@ -225,7 +274,7 @@ class DPlayIE(InfoExtractor):
'id': video_id,
'display_id': display_id,
'title': title,
'description': info.get('description'),
'description': strip_or_none(info.get('description')),
'duration': float_or_none(info.get('videoDuration'), 1000),
'timestamp': unified_timestamp(info.get('publishStart')),
'series': series,
@@ -241,7 +290,80 @@ class DPlayIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
domain = mobj.group('domain').lstrip('www.')
country = mobj.group('country') or mobj.group('subdomain_country')
host = 'disco-api.' + domain if domain.startswith('dplay.') else 'eu2-prod.disco-api.com'
country = mobj.group('country') or mobj.group('subdomain_country') or mobj.group('plus_country')
host = 'disco-api.' + domain if domain[0] == 'd' else 'eu2-prod.disco-api.com'
return self._get_disco_api_info(
url, display_id, host, 'dplay' + country, country)
class DiscoveryPlusIE(DPlayIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/video' + DPlayIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.discoveryplus.com/video/property-brothers-forever-home/food-and-family',
'info_dict': {
'id': '1140794',
'display_id': 'property-brothers-forever-home/food-and-family',
'ext': 'mp4',
'title': 'Food and Family',
'description': 'The brothers help a Richmond family expand their single-level home.',
'duration': 2583.113,
'timestamp': 1609304400,
'upload_date': '20201230',
'creator': 'HGTV',
'series': 'Property Brothers: Forever Home',
'season_number': 1,
'episode_number': 1,
},
'skip': 'Available for Premium users',
}]
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers['x-disco-client'] = 'WEB:UNKNOWN:dplus_us:15.0.0'
def _download_video_playback_info(self, disco_base, video_id, headers):
return self._download_json(
disco_base + 'playback/v3/videoPlaybackInfo',
video_id, headers=headers, data=json.dumps({
'deviceInfo': {
'adBlocker': False,
},
'videoId': video_id,
'wisteriaProperties': {
'platform': 'desktop',
'product': 'dplus_us',
},
}).encode('utf-8'))['data']['attributes']['streaming']
def _real_extract(self, url):
display_id = self._match_id(url)
return self._get_disco_api_info(
url, display_id, 'us1-prod-direct.discoveryplus.com', 'go', 'us')
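A rough standalone version of the playback request wired up above (host, path, header and body are taken from the diff; whether the endpoint answers without additional cookies is an assumption):

import json
import requests

BASE = 'https://us1-prod-direct.discoveryplus.com/'
payload = {
    'deviceInfo': {'adBlocker': False},
    'videoId': '1140794',  # sample id from the test case above
    'wisteriaProperties': {'platform': 'desktop', 'product': 'dplus_us'},
}
resp = requests.post(BASE + 'playback/v3/videoPlaybackInfo',
                     headers={'x-disco-client': 'WEB:UNKNOWN:dplus_us:15.0.0'},
                     data=json.dumps(payload), timeout=10)
streaming = resp.json()['data']['attributes']['streaming']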
class HGTVDeIE(DPlayIE):
_VALID_URL = r'https?://de\.hgtv\.com/sendungen' + DPlayIE._PATH_REGEX
_TESTS = [{
'url': 'https://de.hgtv.com/sendungen/tiny-house-klein-aber-oho/wer-braucht-schon-eine-toilette/',
'info_dict': {
'id': '151205',
'display_id': 'tiny-house-klein-aber-oho/wer-braucht-schon-eine-toilette',
'ext': 'mp4',
'title': 'Wer braucht schon eine Toilette',
'description': 'md5:05b40a27e7aed2c9172de34d459134e2',
'duration': 1177.024,
'timestamp': 1595705400,
'upload_date': '20200725',
'creator': 'HGTV',
'series': 'Tiny House - klein, aber oho',
'season_number': 3,
'episode_number': 3,
},
'params': {
'format': 'bestvideo',
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
return self._get_disco_api_info(
url, display_id, 'eu1-prod.disco-api.com', 'hgtv', 'de')


@@ -1,193 +1,43 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_strdate,
xpath_text,
determine_ext,
float_or_none,
ExtractorError,
)
from .zdf import ZDFIE
class DreiSatIE(InfoExtractor):
class DreiSatIE(ZDFIE):
IE_NAME = '3sat'
_GEO_COUNTRIES = ['DE']
_VALID_URL = r'https?://(?:www\.)?3sat\.de/mediathek/(?:(?:index|mediathek)\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)'
_TESTS = [
{
'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',
'md5': 'be37228896d30a88f315b638900a026e',
'info_dict': {
'id': '45918',
'ext': 'mp4',
'title': 'Waidmannsheil',
'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
'uploader': 'SCHWEIZWEIT',
'uploader_id': '100000210',
'upload_date': '20140913'
},
'params': {
'skip_download': True, # m3u8 downloads
}
_VALID_URL = r'https?://(?:www\.)?3sat\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)\.html'
_TESTS = [{
# Same as https://www.zdf.de/dokumentation/ab-18/10-wochen-sommer-102.html
'url': 'https://www.3sat.de/film/ab-18/10-wochen-sommer-108.html',
'md5': '0aff3e7bc72c8813f5e0fae333316a1d',
'info_dict': {
'id': '141007_ab18_10wochensommer_film',
'ext': 'mp4',
'title': 'Ab 18! - 10 Wochen Sommer',
'description': 'md5:8253f41dc99ce2c3ff892dac2d65fe26',
'duration': 2660,
'timestamp': 1608604200,
'upload_date': '20201222',
},
{
'url': 'http://www.3sat.de/mediathek/mediathek.php?mode=play&obj=51066',
'only_matching': True,
}, {
'url': 'https://www.3sat.de/gesellschaft/schweizweit/waidmannsheil-100.html',
'info_dict': {
'id': '140913_sendung_schweizweit',
'ext': 'mp4',
'title': 'Waidmannsheil',
'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
'timestamp': 1410623100,
'upload_date': '20140913'
},
]
def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
param_groups = {}
for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
group_id = param_group.get(self._xpath_ns(
'id', 'http://www.w3.org/XML/1998/namespace'))
params = {}
for param in param_group:
params[param.get('name')] = param.get('value')
param_groups[group_id] = params
formats = []
for video in smil.findall(self._xpath_ns('.//video', namespace)):
src = video.get('src')
if not src:
continue
bitrate = int_or_none(self._search_regex(r'_(\d+)k', src, 'bitrate', None)) or float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
group_id = video.get('paramGroup')
param_group = param_groups[group_id]
for proto in param_group['protocols'].split(','):
formats.append({
'url': '%s://%s' % (proto, param_group['host']),
'app': param_group['app'],
'play_path': src,
'ext': 'flv',
'format_id': '%s-%d' % (proto, bitrate),
'tbr': bitrate,
})
self._sort_formats(formats)
return formats
def extract_from_xml_url(self, video_id, xml_url):
doc = self._download_xml(
xml_url, video_id,
note='Downloading video info',
errnote='Failed to download video info')
status_code = xpath_text(doc, './status/statuscode')
if status_code and status_code != 'ok':
if status_code == 'notVisibleAnymore':
message = 'Video %s is not available' % video_id
else:
message = '%s returned error: %s' % (self.IE_NAME, status_code)
raise ExtractorError(message, expected=True)
title = xpath_text(doc, './/information/title', 'title', True)
urls = []
formats = []
for fnode in doc.findall('.//formitaeten/formitaet'):
video_url = xpath_text(fnode, 'url')
if not video_url or video_url in urls:
continue
urls.append(video_url)
is_available = 'http://www.metafilegenerator' not in video_url
geoloced = 'static_geoloced_online' in video_url
if not is_available or geoloced:
continue
format_id = fnode.attrib['basetype']
format_m = re.match(r'''(?x)
(?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
(?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
''', format_id)
ext = determine_ext(video_url, None) or format_m.group('container')
if ext == 'meta':
continue
elif ext == 'smil':
formats.extend(self._extract_smil_formats(
video_url, video_id, fatal=False))
elif ext == 'm3u8':
# the certificates are misconfigured (see
# https://github.com/ytdl-org/haruhi-dl/issues/8665)
if video_url.startswith('https://'):
continue
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
video_url, video_id, f4m_id=format_id, fatal=False))
else:
quality = xpath_text(fnode, './quality')
if quality:
format_id += '-' + quality
abr = int_or_none(xpath_text(fnode, './audioBitrate'), 1000)
vbr = int_or_none(xpath_text(fnode, './videoBitrate'), 1000)
tbr = int_or_none(self._search_regex(
r'_(\d+)k', video_url, 'bitrate', None))
if tbr and vbr and not abr:
abr = tbr - vbr
formats.append({
'format_id': format_id,
'url': video_url,
'ext': ext,
'acodec': format_m.group('acodec'),
'vcodec': format_m.group('vcodec'),
'abr': abr,
'vbr': vbr,
'tbr': tbr,
'width': int_or_none(xpath_text(fnode, './width')),
'height': int_or_none(xpath_text(fnode, './height')),
'filesize': int_or_none(xpath_text(fnode, './filesize')),
'protocol': format_m.group('proto').lower(),
})
geolocation = xpath_text(doc, './/details/geolocation')
if not formats and geolocation and geolocation != 'none':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
self._sort_formats(formats)
thumbnails = []
for node in doc.findall('.//teaserimages/teaserimage'):
thumbnail_url = node.text
if not thumbnail_url:
continue
thumbnail = {
'url': thumbnail_url,
}
thumbnail_key = node.get('key')
if thumbnail_key:
m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
if m:
thumbnail['width'] = int(m.group(1))
thumbnail['height'] = int(m.group(2))
thumbnails.append(thumbnail)
upload_date = unified_strdate(xpath_text(doc, './/details/airtime'))
return {
'id': video_id,
'title': title,
'description': xpath_text(doc, './/information/detail'),
'duration': int_or_none(xpath_text(doc, './/details/lengthSec')),
'thumbnails': thumbnails,
'uploader': xpath_text(doc, './/details/originChannelTitle'),
'uploader_id': xpath_text(doc, './/details/originChannelId'),
'upload_date': upload_date,
'formats': formats,
'params': {
'skip_download': True,
}
def _real_extract(self, url):
video_id = self._match_id(url)
details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?id=%s' % video_id
return self.extract_from_xml_url(video_id, details_url)
}, {
# Same as https://www.zdf.de/filme/filme-sonstige/der-hauptmann-112.html
'url': 'https://www.3sat.de/film/spielfilm/der-hauptmann-100.html',
'only_matching': True,
}, {
# Same as https://www.zdf.de/wissen/nano/nano-21-mai-2019-102.html, equal media ids
'url': 'https://www.3sat.de/wissen/nano/nano-21-mai-2019-102.html',
'only_matching': True,
}]


@@ -29,7 +29,7 @@ class DRTVIE(InfoExtractor):
https?://
(?:
(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio(?:/ondemand)?)/(?:[^/]+/)*|
(?:www\.)?(?:dr\.dk|dr-massive\.com)/drtv/(?:se|episode)/
(?:www\.)?(?:dr\.dk|dr-massive\.com)/drtv/(?:se|episode|program)/
)
(?P<id>[\da-z_-]+)
'''
@@ -111,6 +111,9 @@ class DRTVIE(InfoExtractor):
}, {
'url': 'https://dr-massive.com/drtv/se/bonderoeven_71769',
'only_matching': True,
}, {
'url': 'https://www.dr.dk/drtv/program/jagten_220924',
'only_matching': True,
}]
def _real_extract(self, url):


@@ -1,15 +1,57 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
smuggle_url,
unified_strdate,
unsmuggle_url,
ExtractorError,
)
from ..compat import compat_urlparse
class DWIE(InfoExtractor):
class DWVideoIE(InfoExtractor):
IE_NAME = 'dw:video'
_VALID_URL = r'dw:(?P<id>\d+)'
def _get_dw_formats(self, media_id, hidden_inputs):
if hidden_inputs.get('player_type') == 'video':
# https://www.dw.com/smil/v-{video_id} returns more formats,
# but they are all RTMP. ytdl used to do this:
# url.replace('rtmp://tv-od.dw.de/flash/', 'http://tv-download.dw.de/dwtv_video/flv/')
# this returns formats, but it's completely random whether they work or not.
formats = [{
'url': fmt['file'],
'format_code': fmt['label'],
'height': int_or_none(fmt['label']),
} for fmt in self._download_json(
'https://www.dw.com/playersources/v-%s' % media_id,
media_id, 'Downloading JSON formats')]
self._sort_formats(formats)
else:
formats = [{'url': hidden_inputs['file_name']}]
return {
'id': media_id,
'title': hidden_inputs['media_title'],
'formats': formats,
'duration': int_or_none(hidden_inputs.get('file_duration')),
'upload_date': hidden_inputs.get('display_date'),
'thumbnail': hidden_inputs.get('preview_image'),
'is_live': hidden_inputs.get('isLiveVideo'),
}
def _real_extract(self, url):
media_id = self._match_id(url)
_, hidden_inputs = unsmuggle_url(url)
if not hidden_inputs:
return self.url_result('https://www.dw.com/en/av-%s' % media_id, 'DW', media_id)
return self._get_dw_formats(media_id, hidden_inputs)
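A minimal standalone fetch against the JSON endpoint used above (URL pattern from the extractor; the 'file'/'label' keys are the ones the diff reads):

import requests

def dw_video_formats(media_id):
    sources = requests.get('https://www.dw.com/playersources/v-%s' % media_id,
                           timeout=10).json()
    return [{
        'url': src['file'],
        'height': int(src['label']) if str(src.get('label', '')).isdigit() else None,
    } for src in sources]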
class DWIE(DWVideoIE):
IE_NAME = 'dw'
_VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+(?:av|e)-(?P<id>\d+)'
_TESTS = [{
@@ -21,7 +63,7 @@ class DWIE(InfoExtractor):
'ext': 'mp4',
'title': 'Intelligent light',
'description': 'md5:90e00d5881719f2a6a5827cb74985af1',
'upload_date': '20160311',
'upload_date': '20160605',
}
}, {
# audio
@@ -52,57 +94,57 @@ class DWIE(InfoExtractor):
media_id = self._match_id(url)
webpage = self._download_webpage(url, media_id)
hidden_inputs = self._hidden_inputs(webpage)
title = hidden_inputs['media_title']
media_id = hidden_inputs.get('media_id') or media_id
if hidden_inputs.get('player_type') == 'video' and hidden_inputs.get('stream_file') == '1':
formats = self._extract_smil_formats(
'http://www.dw.com/smil/v-%s' % media_id, media_id,
transform_source=lambda s: s.replace(
'rtmp://tv-od.dw.de/flash/',
'http://tv-download.dw.de/dwtv_video/flv/'))
self._sort_formats(formats)
else:
formats = [{'url': hidden_inputs['file_name']}]
info_dict = {
'description': self._og_search_description(webpage),
}
info_dict.update(self._get_dw_formats(media_id, hidden_inputs))
upload_date = hidden_inputs.get('display_date')
if not upload_date:
if info_dict.get('upload_date') is None:
upload_date = self._html_search_regex(
r'<span[^>]+class="date">([0-9.]+)\s*\|', webpage,
'upload date', default=None)
upload_date = unified_strdate(upload_date)
info_dict['upload_date'] = unified_strdate(upload_date)
return {
'id': media_id,
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': hidden_inputs.get('preview_image'),
'duration': int_or_none(hidden_inputs.get('file_duration')),
'upload_date': upload_date,
'formats': formats,
}
return info_dict
class DWArticleIE(InfoExtractor):
class DWArticleIE(DWVideoIE):
IE_NAME = 'dw:article'
_VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+a-(?P<id>\d+)'
_TEST = {
'url': 'http://www.dw.com/en/no-hope-limited-options-for-refugees-in-idomeni/a-19111009',
'md5': '8ca657f9d068bbef74d6fc38b97fc869',
'url': 'https://www.dw.com/pl/zalecenie-ema-szczepmy-si%C4%99-astrazenec%C4%85/a-56919770',
'info_dict': {
'id': '19105868',
'id': '56911196',
'ext': 'mp4',
'title': 'The harsh life of refugees in Idomeni',
'description': 'md5:196015cc7e48ebf474db9399420043c7',
'upload_date': '20160310',
}
'title': 'Czy AstraZeneca jest bezpieczna?',
'upload_date': '20210318',
},
}
def _real_extract(self, url):
article_id = self._match_id(url)
webpage = self._download_webpage(url, article_id)
hidden_inputs = self._hidden_inputs(webpage)
media_id = hidden_inputs['media_id']
media_path = self._search_regex(r'href="([^"]+av-%s)"\s+class="overlayLink"' % media_id, webpage, 'media url')
media_url = compat_urlparse.urljoin(url, media_path)
return self.url_result(media_url, 'DW', media_id)
videos = re.finditer(
r'<div class="mediaItem" data-media-id="(?P<id>\d+)">(?P<hidden_inputs>.+?)<div',
webpage)
if not videos:
raise ExtractorError('No videos found')
entries = []
for video in videos:
video_id, hidden_inputs = video.group('id', 'hidden_inputs')
hidden_inputs = self._hidden_inputs(hidden_inputs)
entries.append({
'_type': 'url_transparent',
'title': hidden_inputs['media_title'],
'url': smuggle_url('dw:%s' % video_id, hidden_inputs),
'ie_key': 'DWVideo',
})
return {
'_type': 'playlist',
'entries': entries,
'id': article_id,
'title': self._html_search_regex(r'<h1>([^>]+)</h1>', webpage, 'article title'),
'description': self._og_search_description(webpage),
}
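Since the article page already carries each video's hidden inputs, the extractor smuggles them into the dw:<id> URL instead of refetching them; smuggle_url/unsmuggle_url round-trip a JSON payload through a URL fragment (the haruhi_dl package path is an assumption):

from haruhi_dl.utils import smuggle_url, unsmuggle_url

smuggled = smuggle_url('dw:56911196', {'media_title': 'Czy AstraZeneca jest bezpieczna?'})
url, data = unsmuggle_url(smuggled)
# url == 'dw:56911196', data == {'media_title': 'Czy AstraZeneca jest bezpieczna?'}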

Some files were not shown because too many files have changed in this diff.