Compare commits

...

793 commits

Author SHA1 Message Date
Lauren Liberda 2f375d447c fix/speedup ci 2021-09-09 12:38:11 +02:00
Lauren Liberda d464b29113 vider support 2021-09-06 22:34:06 +02:00
Lauren Liberda 19602fb3f5 [polskieradio] fix PR4 audition shit 2021-08-31 20:25:12 +02:00
Lauren Liberda a550e21b8c [ipla] state the DRM requirement clearly 2021-08-07 02:23:28 +02:00
Lauren Liberda 1ae67712e8 [ipla] error handling 2021-08-07 01:08:07 +02:00
Dominika Liberda a96bf110da * version 2021.08.01 2021-08-01 17:44:07 +02:00
Lauren Liberda 973652cf4d [youtube] fix age gate for *some* videos 2021-08-01 17:39:30 +02:00
Lauren Liberda d81137a604 [peertube] pt 3.3+ url scheme support, fix tests, minor fixes 2021-07-30 20:40:19 +02:00
Lauren Liberda a0d52ce5be [niconico] dmc downloader and other stuff from yt-dlp (as of 40078a5) 2021-06-26 14:40:02 +02:00
Dominika Liberda 81b5018d99 * version 2021.06.24.1 2021-06-24 14:01:25 +02:00
Dominika Liberda 31b7bf5bdb * fixes crash if signature decryption code isn't packed with artifacts 2021-06-24 13:58:36 +02:00
Dominika Liberda a0cb1b40a2 * fix in release script 2021-06-24 13:18:36 +02:00
Dominika Liberda c3e48f4934 * version 2021.06.24 2021-06-24 13:07:07 +02:00
Dominika Liberda ca6cbb6234 * fixes youtube list extractor 2021-06-24 12:27:39 +02:00
Lauren Liberda 7858dc7b9f fix app crash/tests 2021-06-22 03:17:30 +02:00
Lauren Liberda 2234b1100c [liveleak] remove for real 2021-06-22 03:02:52 +02:00
Lauren Liberda 75442522b2 [soundcloud] prerelease client id fetching 2021-06-22 02:43:50 +02:00
Lauren Liberda f4070e6fe4 prerelease artifact generator, for youtube sig 2021-06-21 23:01:02 +02:00
Lauren Liberda b30cd7afbb [liveleak] remove extractor 2021-06-21 20:43:52 +02:00
Lauren Liberda 29389b4935 [pornhub] Add support for pornhubthbh7ap3u.onion
Original author: dstftw <dstftw@gmail.com>
2021-06-21 20:26:48 +02:00
Sergey M. 3fc2d04e08 [pornhub] Detect geo restriction 2021-06-21 20:22:14 +02:00
Sergey M. 30a3fb457e [pornhub] Dismiss tbr extracted from download URLs (closes #28927)
No longer reliable
2021-06-21 20:22:07 +02:00
Sergey M. 69813b6be8 [curiositystream:collection] Extend _VALID_URL (closes #26326, closes #29117) 2021-06-21 20:22:00 +02:00
Tianyi Shi f1a365faf8 [bilibili] Strip uploader name (#29202) 2021-06-21 20:21:17 +02:00
Logan B 86c90f7d47 [umg:de] Update GraphQL API URL (#29304)
Previous one no longer resolves

Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-06-21 20:20:56 +02:00
Sergey M. a33a92ba4b [nrk] Switch psapi URL to https (closes #29344)
Catalog calls no longer work via http
2021-06-21 20:20:49 +02:00
kikuyan 6057163d97 [postprocessor/ffmpeg] Show ffmpeg output on error (refs #22680) (#29336) 2021-06-21 20:20:43 +02:00
kikuyan aad8936157 [egghead] Add support for app.egghead.io (closes #28404) (#29303)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-06-21 20:20:36 +02:00
kikuyan 18dd355e39 [appleconnect] Fix extraction (#29208) 2021-06-21 20:20:29 +02:00
kikuyan e628fc3794 [orf:tvthek] Add support for MPD formats (closes #28672) (#29236) 2021-06-21 20:20:18 +02:00
Sergey M. ac99e96a1e [facebook] Improve login required detection 2021-06-21 20:19:41 +02:00
Sergey M. 93131809f2 [youporn] Fix formats and view count extraction (closes #29216) 2021-06-21 20:19:35 +02:00
Sergey M. 9cced7b3d2 [orf:tvthek] Fix thumbnails extraction (closes #29217) 2021-06-21 20:19:28 +02:00
Remita Amine b526b67bc1 [formula1] fix extraction(closes #29206) 2021-06-21 20:19:20 +02:00
Lauren Liberda e676b759d1 [youtube] fix the fancy georestricted error 2021-06-20 23:00:58 +02:00
Dominika Liberda 1d54631bfb * version 2021.06.20 2021-06-20 22:26:02 +02:00
Lauren Liberda 073959a503 update changelog 2021-06-20 22:10:21 +02:00
Dominika Liberda eaf7a8bd6e * fixes agegate on youtube 2021-06-20 22:08:26 +02:00
Lauren Liberda ed273bfbf2 [youtube] cleanup, speed up age-gated extraction, fix videos with js-like syntax 2021-06-06 16:40:57 +02:00
Lauren Liberda 9373a2f667 [options] fix playwright headlessness behavior 2021-06-03 17:16:39 +02:00
Lauren Liberda f2a5fa2e53 [playwright] option to force a specific browser 2021-06-03 17:13:41 +02:00
Lauren Liberda 9b1ef5167d [tiktok] fix empty video lists
I'm fucking stupid
2021-06-03 16:53:01 +02:00
Lauren Liberda 7787c45730 [playwright] simplify code 2021-06-03 14:33:14 +02:00
Dominika Liberda f34b024e70 * version 2021.06.01 2021-06-01 10:39:48 +02:00
Lauren Liberda 0d8ef28280 update changelog 2021-05-31 23:37:14 +02:00
Sergey M. 132d7674e3 [ard] Relax _VALID_URL and fix video ids (closes #22724, closes #29091) 2021-05-31 23:29:01 +02:00
Sergey M. e19e102a56 [ustream] Detect https embeds (closes #29133) 2021-05-31 23:28:53 +02:00
Sergey M. dd62e6bab3 [ted] Prefer own formats over external sources (closes #29142) 2021-05-31 23:28:48 +02:00
Sergey M. 484dabbf8a [twitch:clips] Improve extraction (closes #29149) 2021-05-31 23:28:40 +02:00
phlip 2e387cb356 [twitch:clips] Add access token query to download URLs (closes #29136) 2021-05-31 23:28:25 +02:00
Remita Amine 177f5c64de [vimeo] fix vimeo pro embed extraction(closes #29126) 2021-05-31 23:28:17 +02:00
Remita Amine a9f7bf158b [redbulltv] fix embed data extraction(closes #28770) 2021-05-31 23:28:11 +02:00
Remita Amine 80c9bfae14 [shahid] relax _VALID_URL(closes #28772, closes #28930) 2021-05-31 23:28:05 +02:00
Sergey M. 8fac551776 [playstuff] Add extractor (closes #28901, closes #28931) 2021-05-31 23:27:42 +02:00
Sergey M. 47fec1e10b [eroprofile] Skip test 2021-05-31 23:27:36 +02:00
Sergey M. 57c88d40d3 [eroprofile] Fix extraction (closes #23200, closes #23626, closes #29008) 2021-05-31 23:27:20 +02:00
kr4ssi ce5c2526bc [vivo] Add support for vivo.st (#29009)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-05-31 23:27:14 +02:00
Sergey M. 8c826fe7ce [generic] Add support for og:audio (closes #28311, closes #29015) 2021-05-31 23:27:07 +02:00
Sergey M. 1c539931b6 [options] Fix thumbnail option group name (closes #29042) 2021-05-31 23:27:01 +02:00
Sergey M. 6d5cb9e661 [phoenix] Fix extraction (closes #29057) 2021-05-31 23:26:55 +02:00
Lauren Liberda 97abd98bc3 [generic] Add support for sibnet embeds
286e01ce30
2021-05-31 23:26:35 +02:00
Sergey M. 646a08b1c5 [vk] Add support for sibnet embeds (closes #9500) 2021-05-31 23:23:22 +02:00
Sergey M. e32f3c07ea [generic] Add Referer header for direct videojs download URLs (closes #2879, closes #20217, closes #29053) 2021-05-31 23:23:16 +02:00
Lukas Anzinger 56d9861eb5 [orf:radio] Switch download URLs to HTTPS (closes #29012) (#29046) 2021-05-31 23:23:09 +02:00
Sergey M. 8d0c50580c [blinkx] Remove extractor (closes #28941)
No longer exists.
2021-05-31 23:23:02 +02:00
catboy 07adc2e4cd [medaltv] Relax _VALID_URL (#28884)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-05-31 23:22:56 +02:00
Jacob Chapman a3e21baccc [YoutubeDL] Improve extract_info doc (#28946)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-05-31 23:22:47 +02:00
Sergey M. c3b5074fcd [funimation] Add support for optional lang code in URLs (closes #28950) 2021-05-31 23:22:07 +02:00
Sergey M. 30d8947496 [gdcvault] Add support for HTML5 videos 2021-05-31 23:20:33 +02:00
Sergey M. 2489669316 [dispeak] DRY and update tests (closes #28970) 2021-05-31 23:20:13 +02:00
Ben Rog-Wilhelm 5512cc0f37 [dispeak] Improve FLV extraction (closes #13513) 2021-05-31 23:20:06 +02:00
Ben Rog-Wilhelm 1643b0b490 [kaltura] Improve iframe extraction (#28969)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-05-31 23:19:55 +02:00
Lauren Liberda 41cd26d4cf [kaltura] Make embed code alternatives actually work 2021-05-31 23:19:38 +02:00
Lauren Liberda 993cb8ce4c [youtube] fix videos with age gate 2021-05-31 22:31:08 +02:00
Lauren Liberda fca8c46c7b radiokapital extractors 2021-05-30 15:37:30 +02:00
Lauren Liberda 9d9b571371 [misskey] add tests 2021-05-28 00:53:27 +02:00
Lauren Liberda d540126206 utils: flake8 2021-05-28 00:50:34 +02:00
Lauren Liberda fa290c78e7 misskey extractor 2021-05-28 00:50:00 +02:00
Lauren Liberda 2c8fa677b2 [tiktok] deduplicate videos 2021-05-23 17:44:08 +02:00
Lauren Liberda ad5cc09566 [peertube] logging in 2021-05-23 14:43:00 +02:00
Lauren Liberda e83f44815c [mastodon] support cards to external services 2021-05-07 14:24:46 +02:00
Lauren Liberda 6adb5ea838 [mastodon] cache apps on logging in 2021-05-04 01:07:29 +02:00
Lauren Liberda 8dee2b0f85 changelog update 2021-05-03 23:16:25 +02:00
Sergey M. 36bc893bd8 [twitter] Improve formats extraction from vmap URL (closes #28909) 2021-05-03 23:00:18 +02:00
Sergey M. ceab7dc7ec [xtube] Fix formats extraction (closes #28870) 2021-05-03 23:00:13 +02:00
Sergey M. 560a3ab05d [svtplay] Improve extraction (closes #28507, closes #28876) 2021-05-03 23:00:07 +02:00
Sergey M. b7f9dc517f [tv2dk] Fix extraction (closes #28888) 2021-05-03 22:59:59 +02:00
schnusch d56b6a0b75 [xfileshare] Add support for wolfstream.tv (#28858) 2021-05-03 22:59:44 +02:00
Sergey M. 2403ecd42d [francetvinfo] Improve video id extraction (closes #28792) 2021-05-03 22:59:38 +02:00
catboy 19dc8442c2 [medaltv] Fix extraction (#28807)
numeric clip ids are no longer used by medal, and integer user ids are now sent as strings.
2021-05-03 22:59:31 +02:00
The Hatsune Daishi d40d350a69 [tver] Redirect all downloads to Brightcove (#28849) 2021-05-03 22:59:23 +02:00
Sergey M. 63c541a3cd [test_execution] Add test for lazy extractors (refs #28780) 2021-05-03 22:57:13 +02:00
Sergey M. c9c96706eb [bbc] Extract full description from __INITIAL_DATA__ (refs #28774) 2021-05-03 22:55:55 +02:00
dirkf 35043c6160 [bbc] Extract description and timestamp from __INITIAL_DATA__ (#28774) 2021-05-03 22:55:49 +02:00
Lauren Liberda 5c054ee942 [mastodon] oh haruhi what did I NOT do here
+ --force-use-mastodon option
+ logging in to mastodon/pleroma
+ fetching posts via different mastodon/pleroma instances to get follower-only/direct posts
+ fetching peertube videos via pleroma instances to circumvent censorship (?)
2021-05-03 01:41:05 +02:00
Lauren Liberda 76d4e8de92 [wppilot] add tests 2021-04-28 11:43:51 +02:00
Lauren Liberda e9f7e06635 [wppilot] reduce logging in and throw meaningful errors 2021-04-28 01:53:41 +02:00
Lauren Liberda 64ec930237 wp pilot extractors 2021-04-28 00:33:27 +02:00
Lauren Liberda ac8b9e45fb yet another update on funding 2021-04-26 17:16:19 +02:00
Lauren Liberda 8b4a9656f0 [tvp] fix website extracting with weird urls 2021-04-26 17:14:13 +02:00
Lauren Liberda 6ad8f4990a [tvn] better extraction method choosing 2021-04-26 14:40:15 +02:00
Lauren Liberda b31ca60b3a update on donations 2021-04-23 18:09:37 +02:00
Lauren Liberda eb67a3cd44 [tvp:embed] handling formats better way 2021-04-22 15:49:51 +02:00
Lauren Liberda cde74b6420 [youtube:channel] fix multiple page extraction 2021-04-20 23:57:15 +02:00
Lauren Liberda d68515cd12 readme update 2021-04-19 01:27:42 +02:00
Lauren Liberda 379b17f27e [tvp] fix jp2.tvp.pl 2021-04-18 21:49:22 +02:00
Lauren Liberda 83a294d881 [mastodon] support for soapbox and audio files 2021-04-18 18:19:33 +02:00
Sergey M. 4c46e374bc [cbsnews] Fix extraction for python <3.6 (closes #23359) 2021-04-18 16:41:09 +02:00
Sergey M. 5f6bcc20f5 [utils] Add support for experimental HTTP response status code 308 Permanent Redirect (refs #27877, refs #28768) 2021-04-18 16:38:56 +02:00
quyleanh 865b8fd65f [pluralsight] Extend anti-throttling timeout (#28712) 2021-04-18 16:37:48 +02:00
Aaron Lipinski f7cde33162 [maoritv] Add new extractor(closes #24552) 2021-04-18 16:37:39 +02:00
Remita Amine 9ef69b9a67 [mtv] Fix Viacom A/B Testing Video Player extraction(closes #28703) 2021-04-18 16:37:32 +02:00
Sergey M. 05f71071f4 [pornhub] Extract DASH and HLS formats from get_media end point (closes #28698) 2021-04-18 16:37:25 +02:00
Remita Amine f755095cb3 [cbssports] fix extraction(closes #28682) 2021-04-18 16:37:17 +02:00
Remita Amine 85f9e11581 [jamendo] fix track extraction(closes #28686) 2021-04-18 16:37:11 +02:00
Remita Amine 6108793376 [curiositystream] fix format extraction(closes #26845, closes #28668) 2021-04-18 16:37:02 +02:00
Lauren Liberda d94f06105c compat simplecookie again because reasons 2021-04-18 16:36:44 +02:00
Sergey M. 0a6031afcb [compat] Use more conventional name for compat SimpleCookie 2021-04-18 16:35:21 +02:00
guredora d8d8cc0945 [line] add support live.line.me (closes #17205)(closes #28658) 2021-04-18 16:30:36 +02:00
Lauren Liberda 8deedd7636 added compat_SimpleCookie for compatibility with ytdl 2021-04-18 16:28:42 +02:00
Remita Amine 229b4d1671 [compat] add compat_SimpleCookie 2021-04-18 16:26:49 +02:00
Lauren Liberda 2208983e30 [vimeo] extraction improvements
originally by Remita Amine <remitamine@gmail.com>
2021-04-18 16:24:11 +02:00
RomanEmelyanov 8f35a39d9f [youku] Update ccode(closes #17852, closes #28447, closes #28460) (#28648) 2021-04-18 16:10:33 +02:00
Remita Amine 97b46bced6 [extractor/common] fix _get_cookies method for python 2(#20673, #23256, #20326, closes #28640) 2021-04-18 16:09:45 +02:00
Remita Amine 6f678388cb [screencastomatic] fix extraction(closes #11976, closes #24489) 2021-04-18 16:09:39 +02:00
Allan Daemon 40ef0c5a1c [palcomp3] Add new extractor(closes #13120) 2021-04-18 16:09:32 +02:00
Vid f0dd168230 [arnes] Add new extractor(closes #28483) 2021-04-18 16:09:19 +02:00
Adrian Heine df566be96f [magentamusik360] Add new extractor 2021-04-16 23:41:22 +02:00
Lauren Liberda 923069eb48 [core] merge formats by codecs 2021-04-15 01:41:42 +02:00
Lauren Liberda a0986f874d [senat.pl] support for live videos 2021-04-14 14:21:07 +02:00
Lauren Liberda 12a935cf42 [sejm.pl] support live streams 2021-04-14 14:01:54 +02:00
Lauren Liberda 44ed85b18b + castos extractors 2021-04-13 00:17:17 +02:00
Lauren Liberda 2bd0f6069a spryciarze.pl extractors 2021-04-12 20:53:07 +02:00
Lauren Liberda e2764f61ea json_dl: better author extraction 2021-04-12 20:52:49 +02:00
Lauren Liberda 66e93478d8 [spreaker] embedded player support 2021-04-12 14:45:18 +02:00
Lauren Liberda a4d58a6adf [spreaker] new url schemes 2021-04-12 14:33:51 +02:00
Lauren Liberda abb792e7b5 senat.pl extractor 2021-04-12 13:55:46 +02:00
Lauren Liberda 55e021da8e [sejm.pl] extracting ism formats, small changes to work with senat 2021-04-12 13:55:30 +02:00
Lauren Liberda 13cc377d6f [sejm.pl] multiple cameras and PJM translator 2021-04-12 12:07:19 +02:00
Lauren Liberda 46d28e0fd5 + sejm.gov.pl archival video extractor 2021-04-12 01:14:28 +02:00
Lauren Liberda 9c0e55eb79 improve documentation on subtitles 2021-04-09 18:41:55 +02:00
Lauren Liberda 860a8f2061 [tvp] support for tvp.info vue pages 2021-04-09 17:07:59 +02:00
Lauren Liberda 557fe650bb [cda] fix premium videos for premium users (?) 2021-04-09 13:56:17 +02:00
Lauren Liberda baf8549c0a [tvn24] refactor nextjs frontend handling
mitigating HTTP 404 response issues
2021-04-09 01:50:18 +02:00
Lauren Liberda dae5140251 - [ninateka] remove extractor [*]
ninateka uses DRM protection now
2021-04-06 14:51:34 +02:00
Lauren Liberda 9eaffe8278 [tvp:series] error handling, fallback to web 2021-04-05 04:11:02 +02:00
Lauren Liberda 6ed5f6bbc8 copykitku: get ready for merging other fork changes 2021-04-05 02:44:06 +02:00
Dominika Liberda a71cc68530 * version 2021.04.01 2021-04-01 20:57:36 +02:00
Lauren Liberda 8a0ec69c60 [vlive] merge all updates from ytdl 2021-04-01 14:54:46 +02:00
Sergey M. 607734c7ef [francetvinfo] Improve video id extraction (closes #28584) 2021-04-01 14:49:09 +02:00
Chris Hranj 3a0f408546 [instagram] Improve title extraction and extract duration (#28469)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-04-01 14:49:01 +02:00
Lauren Liberda a067097513 [youtube] better consent workaround 2021-04-01 14:07:32 +02:00
Dominika Liberda b428c73970 * version 2021.03.30 2021-03-30 22:43:27 +02:00
Lauren Liberda e824771caf [makefile] use python3 2021-03-30 22:33:20 +02:00
Lauren Liberda ecf455300f [youtube] consent shit workaround (fuck google)
Co-authored-by: Dominika Liberda <ja@sdomi.pl>
2021-03-30 22:32:29 +02:00
Remita Amine 605ba1f477 [sbs] add support for ondemand watch URLs(closes #28566) 2021-03-28 22:26:01 +02:00
Remita Amine 6277a6f4c7 [picarto] fix live stream extraction(closes #28532) 2021-03-28 22:25:34 +02:00
Remita Amine 608d64024e [vimeo] fix unlisted video extraction(closes #28414) 2021-03-28 22:25:29 +02:00
Remita Amine b587a7656e [ard] improve clip id extraction(#22724)(closes #28528) 2021-03-28 22:25:21 +02:00
Roman Sebastian Karwacik 14ee975fb4 [zoom] Add new extractor(closes #16597, closes #27002, closes #28531) 2021-03-28 22:25:14 +02:00
The Hatsune Daishi 1cd1ed16ed [extractor] escape forgotten dot for hostnames in regular expression (#28530) 2021-03-28 22:25:08 +02:00
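
An unescaped dot in a hostname regex matches any character, so a pattern intended for one site can also accept look-alike hostnames. A minimal sketch of the bug class this commit fixes (the example.com pattern is illustrative, not the one actually changed here):

    import re

    loose = re.compile(r'https?://(?:www\.)?example.com/')    # "." matches any character
    strict = re.compile(r'https?://(?:www\.)?example\.com/')  # dot properly escaped

    print(bool(loose.match('https://examplexcom/')))   # True -- false positive
    print(bool(strict.match('https://examplexcom/')))  # False
    print(bool(strict.match('https://example.com/')))  # True
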
Remita Amine 74ae4cb2be [bbc] fix BBC IPlayer Episodes/Group extraction(closes #28360) 2021-03-28 22:25:02 +02:00
Remita Amine 7bee125ade [zingmp3] fix extraction(closes #11589, closes #16409, closes #16968, closes #27205) 2021-03-28 22:24:43 +02:00
Martin Ström 847a1ddff4 [vgtv] Add support for new tv.aftonbladet.se URL schema (#28514)
Co-authored-by: Sergey M <dstftw@gmail.com>
2021-03-28 22:24:36 +02:00
Lauren Liberda 64f7b37d8e [tiktok] detect private videos 2021-03-28 22:24:13 +02:00
Lauren Liberda 2404fc148e --ie-key cli option 2021-03-28 22:01:33 +02:00
Lauren Liberda 9aa7e4481b fix dw:article, refactor dw 2021-03-28 21:45:08 +02:00
Lauren Liberda 2e7f27f566 + patronite audio extractor 2021-03-24 16:40:15 +01:00
Dominika Liberda 7ba6fd5e2c * version 2021.03.21 2021-03-21 03:33:30 +01:00
Lauren Liberda d2d859b0cb changelog update 2021-03-21 03:17:30 +01:00
Lauren Liberda 1644003935 [youtube] meaningful error for age-gated no-embed videos 2021-03-21 02:40:20 +01:00
Lauren Liberda d7455472c7 - removed tvnplayer extractor 2021-03-21 00:08:12 +01:00
Sergey M. a688593c71 [yandexmusic:playlist] Request missing tracks in chunks (closes #27355, closes #28184) 2021-03-20 20:40:49 +01:00
Sergey M. ce1c406432 [yandexmusic:album] Simplify 2021-03-20 20:40:43 +01:00
Sergey M. ef06ab2626 [yandexmusic] Add support for music.yandex.com (closes #27425) 2021-03-20 20:40:37 +01:00
Sergey M. 5403f15eca [yandexmusic] DRY _VALID_URL base 2021-03-20 20:40:31 +01:00
Sergey M. 11e7d9a9bc [yandexmusic:album] Improve album title extraction (closes #27418) 2021-03-20 20:40:26 +01:00
Sergey M. ad4946376d [yandexmusic] Refactor and add support for artist's tracks and albums (closes #11887, closes #22284) 2021-03-20 20:40:20 +01:00
Lauren Liberda fae71efe4b [peertube] improve thumbnail extraction
Original author: remitamine
2021-03-20 20:35:40 +01:00
Lauren Liberda 2a36637212 [vimeo:album] Fix extraction for albums with number of videos multiple to page size
Original author: dstftw
2021-03-20 20:32:05 +01:00
Remita Amine 051da7778d [vvvvid] fix kenc format extraction(closes #28473) 2021-03-20 20:29:36 +01:00
Remita Amine 6faaa046ba [mlb] fix video extraction(#21241) 2021-03-20 20:29:29 +01:00
Sergey M. 3216bd2742 [svtplay] Improve extraction (closes #28448) 2021-03-20 20:29:24 +01:00
Remita Amine 8210d0d578 [applepodcasts] fix extraction(closes #28445) 2021-03-20 20:29:16 +01:00
Remita Amine 1df8de409f [rtve] improve extraction
- extract all formats
- fix RTVE Infantil extraction(closes #24851)
- extract is_live and series
2021-03-20 20:29:11 +01:00
Sergey M. bc2dfba575 [southpark] Fix extraction and add support for southparkstudios.com (closes #26763, closes #28413) 2021-03-20 20:29:05 +01:00
Remita Amine 7e5f6863ca [sportdeutschland] fix extraction(closes #21856)(closes #28425) 2021-03-20 20:28:59 +01:00
Remita Amine 8e580fb912 [pinterest] reduce the number of HLS format requests 2021-03-20 20:28:52 +01:00
Remita Amine a84bff7941 [tver] improve title extraction(closes #28418) 2021-03-20 20:28:41 +01:00
Remita Amine c07c6fd0bf [fujitv] fix HLS formats extension(closes #28416) 2021-03-20 20:28:35 +01:00
Remita Amine 0bf5bb20bb [shahid] fix format extraction(closes #28383) 2021-03-20 20:28:28 +01:00
Sergey M. 19f1ef28f1 [bandcamp] Extract release_timestamp 2021-03-20 20:28:21 +01:00
Sergey M. 06a0a2404e Introduce release_timestamp meta field (refs #28386) 2021-03-20 20:28:11 +01:00
Lauren Liberda b7c5d42047 [pornhub] Detect flagged videos
Original author: dstftw
2021-03-20 20:27:13 +01:00
Sergey M. 8332796684 [pornhub] Extract formats from get_media end point (#28395) 2021-03-20 20:25:13 +01:00
Remita Amine fd211154d3 [bilibili] fix video info extraction(closes #28341) 2021-03-20 20:25:07 +01:00
Remita Amine e6efc4cc87 [cbs] add support for Paramount+ (closes #28342) 2021-03-20 20:25:02 +01:00
Remita Amine 9f9d5f98fd [trovo] Add Origin header to VOD formats(closes #28346) 2021-03-20 20:24:57 +01:00
Remita Amine 6e95b224c2 [voxmedia] fix volume embed extraction(closes #28338) 2021-03-20 20:24:52 +01:00
Remita Amine 0eab1a6949 [9c9media] fix extraction for videos with multiple ContentPackages(closes #28309) 2021-03-20 20:24:44 +01:00
Remita Amine a28058ddeb [bbc] correct caught exception type 2021-03-20 20:24:38 +01:00
dirkf 62d5e81ff1 [bbc] add support for BBC Reel videos(closes #21870, closes #23660, closes #28268) 2021-03-20 20:24:32 +01:00
Sergey M. ef668c9585 [zdf] Rework extractors (closes #11606, closes #13473, closes #17354, closes #21185, closes #26711, closes #27068, closes #27930, closes #28198, closes #28199, closes #28274)

* Generalize unique video ids for zdf based extractors
* Improve extraction
* Fix 3sat and phoenix
2021-03-20 20:24:19 +01:00
Lauren Liberda 63755989fc fix the patch hook 2021-03-20 20:23:58 +01:00
Remita Amine f67e11c888 [stretchinternet] Fix extraction(closes #28297) 2021-03-20 20:20:26 +01:00
Remita Amine 28d7757c8b [urplay] fix episode data extraction(closes #28292) 2021-03-20 20:20:03 +01:00
Remita Amine d49b9356ce [bandaichannel] Add new extractor(closes #21404) 2021-03-20 20:19:49 +01:00
Lauren Liberda efffe9e670 [tvp:embed] extracting video subtitles 2021-03-20 16:13:40 +01:00
Lauren Liberda a4a4af8546 fix m3u8 parsing test 2021-03-16 22:23:26 +01:00
Lauren Liberda 8e8af58d04 fix possible crash 2021-03-16 22:23:13 +01:00
Lauren Liberda 4ddca367de support for vtt subtitles in m3u8 manifests 2021-03-16 21:34:15 +01:00
Lauren Liberda 3e7425297f [pulsevideo] unduplicating formats 2021-03-16 15:23:46 +01:00
Lauren Liberda 3b151afce7 [polskieradio] radiokierowcow.pl extractor 2021-03-16 14:57:21 +01:00
Lauren Liberda 999ab0298b [youtube] some formats are now just static 2021-03-15 23:50:56 +01:00
Lauren Liberda 510512606a [youtube] better signature handling for DASH formats 2021-03-15 23:25:24 +01:00
Lauren Liberda a8e3f00134 [generic] extracting mpd manifests properly 2021-03-14 00:15:17 +01:00
Lauren Liberda ca57ada0fc + bittorrent magnet extractor 2021-03-12 04:59:48 +01:00
Lauren Liberda ade6eb8abc [generic] detecting bittorrent manifest files 2021-03-12 04:10:40 +01:00
Lauren Liberda 6f3c4fd2f8 [peertube] bittorrent formats 2021-03-12 03:04:49 +01:00
Lauren Liberda 58538a2c64 initial bittorrent support 2021-03-12 03:04:37 +01:00
Lauren Liberda 3426d75467 [tiktok] hashtag and music extractors 2021-03-11 22:13:57 +01:00
Lauren Liberda 199edacd48 [onnetwork] refactor 2021-03-10 17:58:23 +01:00
Lauren Liberda 9e535b8762 [polskieradio] podcast support 2021-03-09 19:47:12 +01:00
Lauren Liberda 0b5407d6ec [youtube] more descriptive geo-lock messages (with countries) 2021-03-09 18:30:31 +01:00
Timothy Wynn c10469c0a8 Update go.py 2021-03-07 22:52:29 +01:00
Lauren Liberda 9759eb7182 removed a lot of deprecated platform support code 2021-03-05 22:45:56 +01:00
Lauren Liberda 5311710390 new exe build script 2021-03-05 21:12:35 +01:00
Lauren Liberda c42920795e [playwright] more verbose errors if --verbose 2021-03-05 15:41:47 +01:00
Lauren Liberda 0de898ecb5 [youtube] signature function caching 2021-03-04 21:05:12 +01:00
Lauren Liberda ec0abef671 fix links to ytdl issues 2021-03-04 14:22:51 +01:00
Lauren Liberda d5ad78cd0b pypy tests 2021-03-04 12:01:19 +01:00
Lauren Liberda 3e69892860 videotarget extractor 2021-03-03 23:15:44 +01:00
Lauren Liberda 3240e9f582 acast player extractor 2021-03-03 20:17:44 +01:00
Dominika Liberda ba5cda94c7 version 2021.03.01 2021-03-01 23:16:35 +01:00
Lauren Liberda 1786d6c1c4 [peertube] playlist, channel and account extractor 2021-03-01 21:44:26 +01:00
Lauren Liberda 0234f9eacc [cda] logging in with a user account 2021-03-01 18:18:43 +01:00
Laura Liberda 0f68e2ad09 remove some unused devscripts/docs 2021-02-27 02:32:47 +01:00
Dominika Liberda 09b397e541 version 2021.02.27 2021-02-27 02:18:34 +01:00
Laura Liberda e4639cf66f add --use-proxy-sites option 2021-02-27 02:16:00 +01:00
Laura Liberda 44adc8a082 nitter extractor 2021-02-27 01:52:19 +01:00
bopol 6178129851 [nitter] Add new extractor 2021-02-27 01:05:58 +01:00
Laura Liberda c2a729e48f updated changelog for 2021.02.26 2021-02-27 00:41:37 +01:00
Laura Liberda 57114f45ea [ipla] reformat code 2021-02-27 00:33:12 +01:00
Laura Liberda 6217437dc2 remove now-invalid unicode_literals test 2021-02-27 00:06:15 +01:00
Dominika Liberda c5e2afea28 version 2021.02.26 2021-02-26 23:47:53 +01:00
Dominika Liberda 5a9cef5476 new youtube crypto 2021-02-26 23:46:27 +01:00
Laura Liberda 98be3a5cab make sure py2 throws a deprecation notice 2021-02-26 21:00:17 +01:00
Laura Liberda bdd4030586 changelog 2021-02-26 20:33:18 +01:00
Lauren Liberda 7b16bb6509 Merge branch 'ytdl-backports' into 'master'
youtube-dl backports

Closes #26, #23, and #1

See merge request laudompat/haruhi-dl!3
2021-02-26 19:28:28 +00:00
Laura Liberda 67692545da fix crash in generic extractor 2021-02-26 18:47:47 +01:00
Laura Liberda 293eada0f4 fix hdl tests 2021-02-26 18:39:45 +01:00
Alexander Seiler 57da386d5c [srgssr] improve extraction
- extract subtitle
- fix extraction for new videos
- update srf download domains

closes #14717
closes #14725
closes #27231
closes #28238
2021-02-26 18:19:42 +01:00
Remita Amine 302b6ffb09 [vvvvid] reduce season request payload size 2021-02-26 18:19:38 +01:00
nixxo 2b6555f2eb [vvvvid] extract series sublists playlist_title (#27601) (#27618) 2021-02-26 18:19:33 +01:00
Remita Amine cf883f24cc [dplay] Extract Ad-Free uplynk URLs(#28160) 2021-02-26 18:19:28 +01:00
Remita Amine 36ee1ad35d [wat] detect DRM protected videos(closes #27958) 2021-02-26 18:19:22 +01:00
Remita Amine ff1ee8a80e [tf1] improve extraction(closes #27980)(closes #28040) 2021-02-26 18:19:16 +01:00
Sergey M. 7a3bf913b8 [tmz] Fix and improve extraction (closes #24603, closes #24687, closes #28211) 2021-02-26 18:19:11 +01:00
Remita Amine 77ba700626 [gedidigital] improve asset id matching 2021-02-26 18:19:07 +01:00
nixxo 1c08ff576b [gedidigital] Add new extractor(closes #7347)(closes #26946) 2021-02-26 18:18:58 +01:00
Sergey M. a52155b129 [apa] Improve extraction (closes #27750) 2021-02-26 18:18:33 +01:00
Adrian Heine 21daa7ea91 [apa] Fix extraction 2021-02-26 18:18:28 +01:00
Sergey M. 6e796716f9 [youporn] Skip test 2021-02-26 18:18:22 +01:00
piplongrun a3b6d4d975 [youporn] Extract duration (#28019)
Co-authored-by: Sergey M <dstftw@gmail.com>
2021-02-26 18:18:12 +01:00
Isaac-the-Man f9fa934413 [samplefocus] Add new extractor(closes #27763) 2021-02-26 18:17:50 +01:00
Remita Amine fd1c09264d [vimeo] add support for unlisted video source format extraction 2021-02-26 18:17:44 +01:00
Remita Amine fa6393bfd5 [viki] improve extraction(closes #26522)(closes #28203)
- extract uploader_url and episode_number
- report login required error
- extract 480p formats
- fix API v4 calls
2021-02-26 18:17:39 +01:00
Remita Amine e3b224a330 [ninegag] unescape title(#28201) 2021-02-26 18:17:34 +01:00
Remita Amine 131c65b8ba [dplay] add support for de.hgtv.com (closes #28182) 2021-02-26 18:17:09 +01:00
Remita Amine f10f61fa0e [dplay] Add support for discoveryplus.com (closes #24698) 2021-02-26 18:17:01 +01:00
dmsummers fc9e0b111d [simplecast] Add new extractor(closes #24107) 2021-02-26 18:16:47 +01:00
Max f23361c5d2 [postprocessor/embedthumbnail] Recognize atomicparsley binary in lowercase (#28112) 2021-02-26 18:12:11 +01:00
Stephen Stair 2a7bf89e70 [storyfire] Add new extractor(closes #25628)(closes #26349) 2021-02-26 18:12:05 +01:00
Remita Amine a5d4fcbbd5 [zhihu] Add new extractor(closes #28177) 2021-02-26 18:11:59 +01:00
Remita Amine 727a4a5b79 [ccma] fix timestamp parsing in python 2 2021-02-26 18:11:46 +01:00
Remita Amine eb88460be9 [videopress] add support for video.wordpress.com 2021-02-26 18:10:30 +01:00
Remita Amine 3fc10250f2 [kakao] improve info extraction and detect geo restriction(closes #26577) 2021-02-26 18:04:54 +01:00
Remita Amine 221f01621a [xboxclips] fix extraction(closes #27151) 2021-02-26 18:04:46 +01:00
Sergey M. 89b41a73aa [ard] Improve formats extraction (closes #28155) 2021-02-26 18:04:40 +01:00
Kevin Velghe ec2eaef0ca [canvas] Add new extractor for Dagelijkse Kost (#28119) 2021-02-26 18:04:23 +01:00
Remita Amine 87dee740af [ign] fix extraction(closes #24771) 2021-02-26 18:04:15 +01:00
Sergey M. 22b2970a2d [xhamster] Extract formats from xplayer settings and extract filesizes (closes #28114) 2021-02-26 18:04:08 +01:00
Sergey M. 2df9ac2526 [archiveorg] Fix and improve extraction (closes #21330, closes #23586, closes #25277, closes #26780, closes #27109, closes #27236, closes #28063) 2021-02-26 18:03:59 +01:00
Adrian Heine né Lang 1432a02035 [urplay] Fix extraction (closes #28073) (#28074) 2021-02-26 18:03:49 +01:00
Adrian Heine né Lang 1da3c67651 [azmedien] Fix extraction (#28064) 2021-02-26 18:03:38 +01:00
Sergey M. 4d9300cc44 [pornhub] Implement lazy playlist extraction 2021-02-26 18:02:37 +01:00
Sergey M f21660e963 [pornhub] Add placeholder netrc machine 2021-02-26 18:02:15 +01:00
Sergey M. 044b166cda [svtplay] Fix video id extraction (closes #28058) 2021-02-26 18:00:39 +01:00
Sergey M f2cffa26d4 [pornhub] Add support for authentication (closes #18797, closes #21416, closes #24294) 2021-02-26 18:00:16 +01:00
Sergey M. 00fe24846c [pornhub:user] Add support for URLs unavailable via /videos page and improve paging (closes #27853) 2021-02-26 16:28:34 +01:00
Remita Amine 21321d23dc [bravotv] add support for oxygen.com(closes #13357)(closes #22500) 2021-02-26 16:28:28 +01:00
Guillem Vela 6cf6a0cf15 [ccma] improve metadata extraction(closes #27994)
- extract age_limit, alt_title, categories, series and episode_number
- fix timestamp and multiple subtitles extraction
2021-02-26 16:28:15 +01:00
Remita Amine e9b3810524 [egghead] fix typo 2021-02-26 16:26:44 +01:00
Viren Rajput cd74c846a6 [egghead] update API domain(closes #28038) 2021-02-26 16:26:38 +01:00
Remita Amine e2095ebc11 [vidzi] remove extractor(closes #12629) 2021-02-26 16:26:30 +01:00
Remita Amine 591d23365c [vidio] improve metadata extraction 2021-02-26 16:26:08 +01:00
Adrian Heine né Lang aec7e2fbb1 [AENetworks] update AENetworksShowIE test playlist id (#27851) 2021-02-26 16:25:20 +01:00
nixxo 148394b527 [vvvvid] add support for youtube embeds (#27825) 2021-02-26 16:25:08 +01:00
Adrian Heine né Lang 00197b5fa8 [awaan] Extract uploader id (#27963) 2021-02-26 16:24:30 +01:00
Remita Amine 6a18fcbd8a [medialaan] add support for DPG Media MyChannels based websites
closes #14871
closes #15597
closes #16106
closes #16489
2021-02-26 16:24:11 +01:00
Remita Amine 8d47c811f1 [abcnews] fix extraction(closes #12394)(closes #27920) 2021-02-26 16:19:08 +01:00
Adrian Heine né Lang 288b2cc25b [AMP] Fix upload_date and timestamp extraction (#27970) 2021-02-26 16:19:03 +01:00
Remita Amine 95fa7a8985 [tv4] relax _VALID_URL(closes #27964) 2021-02-26 16:18:56 +01:00
Remita Amine 743a3f4c00 [tv2] improve MTV Uutiset Article extraction 2021-02-26 16:18:51 +01:00
tpikonen 6e3cdd8515 [tv2] Add support for mtvuutiset.fi (#27744) 2021-02-26 16:18:45 +01:00
Remita Amine d5cdaae9c8 [adn] improve login warning reporting 2021-02-26 16:18:40 +01:00
Remita Amine 36552561a6 [zype] fix uplynk id extraction(closes #27956) 2021-02-26 16:18:36 +01:00
Adrian Heine né Lang 5c5e031816 [ADN] Implement login (#27937)
closes #17091
closes #27841
2021-02-26 16:18:31 +01:00
Sergey M. b94ec338ad [franceculture] Make thumbnail optional (closes #18807) 2021-02-26 16:18:24 +01:00
Aurélien Grosdidier d2324df444 [franceculture] Fix extraction (closes #27891) (#27903)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-02-26 16:18:19 +01:00
Sergey M. c5f2993484 [options] Clarify --extract-audio help string (closes #27878) 2021-02-26 16:18:10 +01:00
Sergey M. 1752b8b8c8 Introduce --output-na-placeholder (closes #27896) 2021-02-26 16:17:36 +01:00
aarubui 886d51e368 [njpwworld] fix extraction (#27890) 2021-02-26 16:17:10 +01:00
Remita Amine 48680ac382 [comedycentral] fix extraction(closes #27905) 2021-02-26 16:17:05 +01:00
Remita Amine 8976512791 [wat] remove unused variable 2021-02-26 16:16:59 +01:00
Remita Amine e5ac4c2a67 [wat] fix format extraction(closes #27901) 2021-02-26 16:16:55 +01:00
Remita Amine 64c8ca8464 [americastestkitchen] improve season extraction 2021-02-26 16:16:44 +01:00
Brian Marks 9b58478829 [americastestkitchen] Add support for downloading entire seasons (#27861) 2021-02-26 16:16:38 +01:00
Remita Amine d381066e1d [trovo] Add new extractor(closes #26125) 2021-02-26 16:16:18 +01:00
Remita Amine 6e51fd65a8 [aol] add support for yahoo videos(closes #26650) 2021-02-26 16:14:37 +01:00
Remita Amine 6295bb4307 [yahoo] fix single video extraction 2021-02-26 16:14:27 +01:00
Remita Amine b99e3e93f3 [ninegag] improve extraction 2021-02-26 16:14:21 +01:00
DrWursterich 5a5b791576 [9gag] Fix Extraction (#23022) 2021-02-26 16:14:15 +01:00
Brian Marks 300dfe0df2 [americastestkitchen] Improve metadata extraction for ATK episodes (#27860) 2021-02-26 16:14:08 +01:00
Remita Amine 4d3655d3d9 [aljazeera] fix extraction(closes #20911)(closes #27779) 2021-02-26 16:13:59 +01:00
Remita Amine 6bec24872b [minds] improve extraction 2021-02-26 16:13:55 +01:00
Tatsh 4630d90a5a [Minds] Add new extractor (#17934) 2021-02-26 16:13:47 +01:00
Adrian Heine né Lang dd74d8e9d0 [ard] Fix title and description extraction and update tests (#27761) 2021-02-26 16:13:38 +01:00
Adrian Heine né Lang 22b81f3cf4 [aenetworks] Fix test (#27847) 2021-02-26 16:13:32 +01:00
Remita Amine 2e2fed69ef [spotify] Add new extractor for Spotify Podcasts(closes #27443) 2021-02-26 16:13:26 +01:00
Sergey M. 9167f6a104 [mixcloud:playlist:base] Fix video id extraction in flat playlist mode (refs #27787) 2021-02-26 16:13:15 +01:00
Sergey M. b55f99df9a [animeondemand] Add support for lazy playlist extraction (closes #27829) 2021-02-26 16:13:09 +01:00
Sergey M. cc74091612 [YoutubeDL] Protect from infinite recursion due to recursively nested playlists (closes #27833) 2021-02-26 16:13:02 +01:00
Remita Amine c613cf5aaa [twitter] Add tests for more cards 2021-02-26 16:12:57 +01:00
Sergey M. 43c1032e32 [youporn] Restrict fallback download URL (refs #27822) 2021-02-26 16:12:51 +01:00
Sergey M. 7361b972e3 [youporn] Improve height and tbr extraction (refs #23659, refs #20425) 2021-02-26 16:12:46 +01:00
Sergey M. 8de14bb3eb [youporn] Fix extraction (closes #27822) 2021-02-26 16:12:41 +01:00
Sergey M. 2423972ddc [twitter] Add support for unified cards (closes #27826) 2021-02-26 16:12:36 +01:00
main() 04eccf71c1 [twitch] Set OAuth token for GraphQL requests using auth-token cookie (#27790)

Co-authored-by: remitamine <remitamine@gmail.com>
2021-02-26 16:12:29 +01:00
Aaron Zeng e225806484 [YoutubeDL] Ignore failure to create existing directory (#27811) 2021-02-26 16:12:22 +01:00
Sergey M. 5b7e334c86 [YoutubeDL] Raise syntax error for format selection expressions with multiple + operators (closes #27803) 2021-02-26 16:12:06 +01:00
Aarni Koskela 4737de0eee [Mixcloud] Harmonize ID generation from lists with full ID generation (#27787)

Mixcloud IDs are generated as `username_slug` when the full ID dict has been
downloaded.  When downloading a list (e.g. uploads, favorites, ...), the temporary
ID is just the `slug`.  This made e.g. archive file usage require the download
of stream metadata before the download can be rejected as already downloaded.

This commit attempts to get the uploader username during the GraphQL query, so the
temporary IDs are generated similarly.
2021-02-26 16:12:00 +01:00
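
A rough sketch of the harmonization described above (hypothetical helper, not the actual extractor code): the canonical id needs the uploader name, so the list query now fetches it too instead of leaving only the bare slug.

    def mixcloud_video_id(username, slug):
        # Canonical id form used once full metadata is available.
        return '%s_%s' % (username, slug)

    # Before: list mode only knew the slug, yielding a temporary id like
    # 'my-mix' that never matched the archived 'someuser_my-mix'.
    # After: the GraphQL list query also returns the username, so both
    # paths emit the same id.
    entries = [{'username': 'someuser', 'slug': 'my-mix'}]
    for entry in entries:
        print(mixcloud_video_id(entry['username'], entry['slug']))  # someuser_my-mix
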
Remita Amine bb8fb80a18 [cspan] improve info extraction(closes #27791) 2021-02-26 16:11:53 +01:00
Remita Amine 8e3ed83758 [adn] improve info extraction 2021-02-26 16:11:47 +01:00
Adrian Heine né Lang a1c65b0c5f [ADN] Fix extraction (#27732)
Closes #26963.
2021-02-26 16:11:40 +01:00
Sergey M. 527cad1c28 [twitch] Improve login error extraction 2021-02-26 16:11:18 +01:00
Sergey M. d695e5396b [twitch] Fix authentication (refs #27743) 2021-02-26 16:11:14 +01:00
Remita Amine 02b371b81e [threeqsdn] Improve extraction(closes #21058) 2021-02-26 16:11:08 +01:00
0l-l0 5e672bbd59 [peertube] Extract files also from streamingPlaylists (#27728)
JSON objects with an empty "files" tag seem to be a valid PeerTube API
response. In those cases the "files" arrays contained in the
"streamingPlaylists" members can be used instead.
closes #26002
closes #27586
2021-02-26 16:10:59 +01:00
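
The fallback described above amounts to only a few lines; a sketch assuming the field names quoted in the commit message (the helper itself is hypothetical):

    def peertube_files(video):
        # A top-level "files" array may be empty in valid API responses;
        # fall back to the arrays nested under "streamingPlaylists".
        files = video.get('files') or []
        if not files:
            for playlist in video.get('streamingPlaylists') or []:
                files.extend(playlist.get('files') or [])
        return files

    video = {'files': [], 'streamingPlaylists': [{'files': [{'fileUrl': 'https://example.org/v.mp4'}]}]}
    print(peertube_files(video))  # [{'fileUrl': 'https://example.org/v.mp4'}]
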
Remita Amine 716a12bbe7 [khanacademy] fix extraction(closes #2887)(closes #26803) 2021-02-26 16:10:53 +01:00
Remita Amine 533f1a3d59 [spike] Update Paramount Network feed URL(closes #27715) 2021-02-26 16:10:48 +01:00
nixxo 3311732cc4 [rai] improve subtitles extraction (#27705)
closes #27698
2021-02-26 16:10:40 +01:00
Remita Amine d445754878 [canvas] Match only supported VRT NU URLs(#27707) 2021-02-26 16:10:31 +01:00
Remita Amine c20b7305d7 [extractors] add BibelTVIE import 2021-02-26 16:10:26 +01:00
Remita Amine d09dc9da38 [bibeltv] Add new extractor(closes #14361) 2021-02-26 16:10:21 +01:00
Remita Amine ba94de7b0d [bfmtv] Add new extractor(closes #16053)(closes #26615) 2021-02-26 16:10:17 +01:00
Remita Amine b1b5f6effe [sbs] Add support for ondemand play and news embed URLs(closes #17650)(closes #27629) 2021-02-26 16:10:13 +01:00
Sergey M. c80f8c2006 [twitch] Refactor 2021-02-26 16:10:08 +01:00
Sergey M. d83dfdc63a [twitch] Drop legacy kraken API v5 code altogether 2021-02-26 16:10:04 +01:00
Sergey M. ff8d551021 [twitch:vod] Switch to GraphQL for video metadata 2021-02-26 16:10:00 +01:00
Remita Amine ff330d9727 [canvas] Fix VRT NU extraction(closes #26957)(closes #27053) 2021-02-26 16:09:56 +01:00
Sergey M. 189885594b [twitch] Improve access token extraction and remove unused code (closes #27646) 2021-02-26 16:09:51 +01:00
23rd 948dc5834d [twitch] Switch access token to GraphQL and refactor. 2021-02-26 16:09:46 +01:00
nixxo 24f5760134 [rai] Detect ContentItem in iframe (closes #12652) (#27673)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-02-26 16:09:42 +01:00
Remita Amine eb001126da [ketnet] fix extraction(closes #27662) 2021-02-26 16:09:38 +01:00
Remita Amine 7f4e988520 [dplay] Add support for Discovery+ domains(closes #27680) 2021-02-26 16:09:33 +01:00
Sergey M. e94762a1a7 [motherless] Fix review issues and improve extraction (closes #26495, closes #27450) 2021-02-26 16:09:21 +01:00
2021-02-26 16:09:21 +01:00
cladmi a72df1d249 [motherless] Fix recent videos upload date extraction (closes #27661)
Videos less than a week old use a '20h ago' or '1d ago' format.

I kept the support for 'Ago' with an uppercase start, as it was already in the code.
2021-02-26 16:09:15 +01:00
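
Parsing such relative timestamps is straightforward; a minimal sketch of the idea (hypothetical helper, not the actual extractor code):

    import datetime
    import re

    def parse_relative_date(text, now=None):
        # Recent uploads show e.g. '20h ago' or '1d ago' (sometimes 'Ago').
        m = re.match(r'(\d+)([hd])\s+[Aa]go$', text)
        if not m:
            return None
        value, unit = int(m.group(1)), m.group(2)
        unit_name = {'h': 'hours', 'd': 'days'}[unit]
        now = now or datetime.datetime.utcnow()
        return (now - datetime.timedelta(**{unit_name: value})).strftime('%Y%m%d')

    print(parse_relative_date('20h ago'))  # e.g. '20210920'
    print(parse_relative_date('1d Ago'))   # e.g. '20210920'
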
Kevin O'Connor 903c90bd4c [downloader/hls] Disable decryption in tests (#27660)
Tests truncate the download to 10241 bytes, which is not divisible by 16
and cannot be decrypted. Tests don't really care about the decrypted
content, just that the data they retrieved is the expected data.
Therefore, it's fine to just return the encrypted data to tests.

See: #27621 and #27620
2021-02-26 16:08:56 +01:00
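
The arithmetic behind this is simple: AES-128 ciphertext must be a whole number of 16-byte blocks, and the test truncation point is not. A sketch using the constants stated in the commit message:

    TEST_DOWNLOAD_BYTES = 10241  # tests truncate downloads to this size
    AES_BLOCK_SIZE = 16

    # 10241 % 16 == 1: the truncated data is not block-aligned and cannot
    # be decrypted, so tests keep the raw encrypted bytes instead.
    print(TEST_DOWNLOAD_BYTES % AES_BLOCK_SIZE)  # 1
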
Yurii H 055e9eb904 [iheart] Update test description value (#27037)
the description has no HTML tags now.
2021-02-26 16:08:50 +01:00
Remita Amine 25dff12eb1 [nrk] fix extraction for videos without a legalAge rating 2021-02-26 16:08:44 +01:00
Remita Amine 67ff5da6ea [iheart] clean HTML tags from episode description 2021-02-26 16:08:39 +01:00
Remita Amine 8818f177ab [iheart] remove print statement 2021-02-26 16:08:34 +01:00
Remita Amine 1dc43fd3fc [googleplus] Remove Extractor(closes #4955)(closes #7400) 2021-02-26 16:08:27 +01:00
Remita Amine 607b324dff [applepodcasts] Add new extractor(#25918) 2021-02-26 16:08:20 +01:00
Remita Amine 1b1752a1b5 [googlepodcasts] Add new extractor 2021-02-26 16:08:15 +01:00
Remita Amine e52adb5328 [iheart] Add new extractor for iHeartRadio(#27037) 2021-02-26 16:06:26 +01:00
Remita Amine 626d26e13a [acast] clean podcast URLs 2021-02-26 16:06:22 +01:00
Remita Amine 1e653be1d0 [stitcher] clean podcast URLs 2021-02-26 16:06:16 +01:00
Remita Amine 017215032a [utils] add a function to clean podcast URLs 2021-02-26 16:06:12 +01:00
Sergey M. e98e8454c5 [xfileshare] Add support for aparat.cam (closes #27651) 2021-02-26 16:06:04 +01:00
Sergey M. a22e2b59b4 [nrktv] Add subtitles test 2021-02-26 16:06:00 +01:00
Remita Amine 3f2bf67bc9 [twitter] Add support for summary card(closes #25121) 2021-02-26 16:05:55 +01:00
Remita Amine 28c4062a58 [twitter] try to use a Generic fallback for unknown twitter cards(closes #25982) 2021-02-26 16:05:50 +01:00
Remita Amine 3f43c99d4a [stitcher] Add support for shows and show metadata extraction(closes #20510) 2021-02-26 16:05:44 +01:00
Remita Amine 8406b57ac6 [stv] improve episode id extraction(closes #23083) 2021-02-26 16:05:39 +01:00
Sergey M. 8e538fc605 [nrk] Fix age limit extraction 2021-02-26 16:05:31 +01:00
Sergey M. b51ed7b039 [nrk] Improve series metadata extraction (closes #27473) 2021-02-26 16:05:26 +01:00
Sergey M. 785078cb08 [nrk] PEP 8 2021-02-26 16:05:21 +01:00
Sergey M. 57a63ed4a1 [nrk] Improve episode and season number extraction 2021-02-26 16:05:12 +01:00
Sergey M. c00a4d81ca [nrktv] Fix tests 2021-02-26 16:05:07 +01:00
Sergey M. aa829b6cd3 [nrk] Improve series metadata extraction 2021-02-26 16:04:49 +01:00
Sergey M. db48c8dbfe [nrk] Extract subtitles 2021-02-26 16:04:44 +01:00
Sergey M. eff203d3ae [nrk] Fix age limit extraction 2021-02-26 16:04:38 +01:00
Sergey M. d9673551d7 [nrk] Inline _extract_from_playback 2021-02-26 16:04:34 +01:00
Sergey M. 634ebea93d [nrk] Improve video id extraction 2021-02-26 16:04:15 +01:00
Sergey M. e1145c77fd [nrk] Add more shortcut tests 2021-02-26 16:04:08 +01:00
Sergey M. 18be494898 [nrk] Improve extraction (closes #27634, closes #27635)
+ Add support for mp3 formats
* Generalize and delegate all item extractors to nrk, beware ie key breakages
+ Add support for podcasts
+ Generalize nrk shortcut form to support all kind of ids
2021-02-26 16:04:02 +01:00
Sergey M. 973258396d [nrktv] Switch to playback endpoint
mediaelement endpoint is no longer in use.
2021-02-26 16:02:33 +01:00
Remita Amine 417963200c [vvvvid] fix season metadata extraction(#18130) 2021-02-26 16:02:17 +01:00
Remita Amine 51535e0624 [stitcher] fix extraction(closes #20811)(closes #27606) 2021-02-26 16:02:04 +01:00
Remita Amine 5ccde7fdb3 [acast] fix extraction(closes #21444)(closes #27612)(closes #27613) 2021-02-26 16:01:52 +01:00
Remita Amine 56a45e91d2 [arcpublishing] add missing staticmethod decorator 2021-02-26 16:01:45 +01:00
Remita Amine a13444f117 [arcpublishing] Add new extractor
closes #2298
closes #9340
closes #17200
2021-02-26 16:01:11 +01:00
Remita Amine fc156473d9 [sky] add support for Sports News articles and Brightcove videos(closes #13054) 2021-02-26 15:54:49 +01:00
Remita Amine 2aafa2f712 [vvvvid] skip unplayable episodes and extract akamai formats(closes #27599) 2021-02-26 15:54:41 +01:00
Remita Amine 0ade73d562 [yandexvideo] fix extraction for Python 3.4 2021-02-26 15:54:32 +01:00
Sergey M. 52fd0e8bb8 [redditr] Fix review issues and extract source thumbnail (closes #27503) 2021-02-26 15:54:14 +01:00
ozburo e4f3383802 [redditr] Extract all thumbnails 2021-02-26 15:54:08 +01:00
Remita Amine 9a6885f335 [vvvvid] improve info extraction 2021-02-26 15:54:01 +01:00
nixxo 0165049a52 [vvvvid] add playlists support (#27574)
closes #18130
2021-02-26 15:53:57 +01:00
Remita Amine 97eb5b61b7 [yandexdisk] extract info from webpage
the public API does not return metadata when download limit is reached
2021-02-26 15:53:51 +01:00
Remita Amine 355b6d9ab6 [yandexdisk] fix extraction(closes #17861)(closes #27131) 2021-02-26 15:53:46 +01:00
Remita Amine 95b5454a31 [yandexvideo] use old api call as fallback 2021-02-26 15:53:41 +01:00
Remita Amine a2f4d6ec07 [yandexvideo] fix extraction(closes #25000) 2021-02-26 15:53:33 +01:00
Remita Amine 2c4b3dd864 [utils] accept only supported protocols in url_or_none 2021-02-26 15:53:27 +01:00
Remita Amine 10af8572d4 [YoutubeDL] Allow format filtering using audio language(#16209) 2021-02-26 15:53:16 +01:00
Remita Amine c7d0af171f [nbc] Remove CSNNE extractor 2021-02-26 15:52:06 +01:00
Remita Amine f1931b8ba8 [nbc] fix NBCSport VPlayer URL extraction(closes #16640) 2021-02-26 15:51:59 +01:00
Remita Amine 3ca3074dc3 [aenetworks] fix HistoryPlayerIE tests 2021-02-26 15:51:52 +01:00
Remita Amine f7bef2772c [aenetworks] add support for biography.com (closes #3863) 2021-02-26 15:51:25 +01:00
Remita Amine 50162a3580 [uktvplay] match new video URLs(closes #17909) 2021-02-26 15:49:59 +01:00
Remita Amine 1dbf12006f [sevenplay] detect API errors 2021-02-26 15:49:54 +01:00
Remita Amine db69be3ccc [tenplay] fix format extraction(closes #26653) 2021-02-26 15:49:47 +01:00
Remita Amine 21f2e0a12e [brightcove] raise ExtractorError for DRM protected videos(closes #23467)(closes #27568) 2021-02-26 15:49:42 +01:00
Remita Amine 838ac10bc7 [aparat] Fix extraction
closes #22285
closes #22611
closes #23348
closes #24354
closes #24591
closes #24904
closes #25418
closes #26070
closes #26350
closes #26738
closes #27563
2021-02-26 15:49:32 +01:00
Remita Amine f3474e105d [brightcove] remove sonyliv specific code 2021-02-26 15:49:26 +01:00
Remita Amine afa77db731 [piksel] improve format extraction 2021-02-26 15:49:21 +01:00
Remita Amine 68335e76a7 [zype] Add support for uplynk videos 2021-02-26 15:49:14 +01:00
Remita Amine d5bf4b0fea [toggle] add support for live.mewatch.sg (closes #27555) 2021-02-26 15:49:07 +01:00
JamKage 7b1f0173c1 [go] Added support for FXNetworks (#26826)
Co-authored-by: James Kirrage <james.kirrage@mortgagegym.com>

closes #13972
closes #22467
closes #23754
2021-02-26 15:48:48 +01:00
Sergey M. 2a0a9bac02 [teachable] Improve embed detection (closes #26923) 2021-02-26 15:46:56 +01:00
Remita Amine 217918987a [mitele] fix free video extraction(#24624)(closes #25827)(closes #26757) 2021-02-26 15:46:52 +01:00
Remita Amine 7490ed64b4 [telecinco] fix extraction 2021-02-26 15:46:46 +01:00
Sergey M c4445c3311 [youtube] Update invidious.snopyta.org (#22667)
Co-authored-by: sofutru <54445344+sofutru@users.noreply.github.com>
2021-02-26 15:46:39 +01:00
Remita Amine 2d3b82a754 [amcnetworks] improve auth only video detection(closes #27548) 2021-02-26 15:46:29 +01:00
Laura Liberda 92bd8a446e VHX embeds
https://github.com/ytdl-org/youtube-dl/issues/27546
2021-02-26 15:45:57 +01:00
Sergey M. 1d9552c236 [instagram] Fix test 2021-02-26 15:39:10 +01:00
Sergey M. 97c3432659 [instagram] Fix comment count extraction 2021-02-26 15:39:04 +01:00
Sergey M. 477e444c3b [instagram] Add support for reel URLs (closes #26234, closes #26250) 2021-02-26 15:38:59 +01:00
Remita Amine c298be2ebd [bbc] switch to media selector v6
closes #23232
closes #23933
closes #26303
closes #26432
closes #26821
closes #27538
2021-02-26 15:38:53 +01:00
Sergey M. ff12ad0ee4 [instagram] Improve thumbnail extraction 2021-02-26 15:38:48 +01:00
Sergey M. 73c5dc4104 [instagram] Improve extraction (closes #22880) 2021-02-26 15:38:42 +01:00
Andrew Udvare d7c028a33e [instagram] Fix extraction when authenticated (closes #27422) 2021-02-26 15:38:34 +01:00
Sergey M. 10a6f841a7 [spankbang] Remove unused import 2021-02-26 15:38:28 +01:00
Sergey M. 50dfe7adb8 [spankbang:playlist] Fix extraction (closes #24087) 2021-02-26 15:38:20 +01:00
Sergey M. 84b7f91b28 [spankbang] Add support for playlist videos 2021-02-26 15:38:12 +01:00
Sergey M. dc69c587bf [pornhub] Fix review issues (closes #27393) 2021-02-26 15:36:38 +01:00
JChris246 226faa5521 [pornhub] Fix lq formats extraction (closes #27386) 2021-02-26 15:36:31 +01:00
Sergey M. 5b75c620bd [bongacams] Add extractor (closes #27440) 2021-02-26 15:36:23 +01:00
Remita Amine b88f43a813 [theweatherchannel] fix extraction (closes #25930)(closes #26051) 2021-02-26 15:36:17 +01:00
Remita Amine 8567d4488f [sprout] correct typo 2021-02-26 15:36:11 +01:00
Remita Amine 4d81f83267 [sprout] Add support for Universal Kids (closes #22518) 2021-02-26 15:36:04 +01:00
Remita Amine 5f00c83c35 [theplatform] allow passing geo bypass countries from other extractors 2021-02-26 15:35:57 +01:00
Remita Amine 90a021a137 [ctv] Add new extractor (closes #27525) 2021-02-26 15:35:17 +01:00
Remita Amine f350e326ac [9c9media] improve info extraction 2021-02-26 15:35:11 +01:00
Remita Amine 0445f9de8d [sonyliv] fix title for movies 2021-02-26 15:34:50 +01:00
Remita Amine 08d63a28df [sonyliv] fix extraction(closes #25667) 2021-02-26 15:34:41 +01:00
Remita Amine 6e80cb939b [streetvoice] fix extraction(closes #27455)(closes #27492) 2021-02-26 15:34:30 +01:00
Remita Amine 00e2c2ddea [facebook] add support for watchparty pages(closes #27507) 2021-02-26 15:33:41 +01:00
Remita Amine 437ab525e9 [cbslocal] fix video extraction 2021-02-26 15:26:26 +01:00
Remita Amine fc441623a8 [brightcove] add another method to extract policyKey 2021-02-26 15:26:18 +01:00
Sergey M. 4317f7c6fa [mewatch] Relax _VALID_URL (closes #27506) 2021-02-26 15:26:14 +01:00
Remita Amine ed14efaed2 [anvato] remove NFLTokenGenerator
until a better solution is introduced that:
- works with lazy_extractors
- allows for 3rd party token generators
2021-02-26 15:24:16 +01:00
Remita Amine 3c6c586e4b [tastytrade] Remove Extractor(closes #25716)
covered by GenericIE via BrightcoveNewIE
2021-02-26 15:24:11 +01:00
Remita Amine b3acd855b8 [niconico] fix playlist extraction(closes #27428) 2021-02-26 15:24:05 +01:00
Remita Amine 90988f4772 [everyonesmixtape] Remove Extractor 2021-02-26 15:23:53 +01:00
Remita Amine ef03683547 [kanalplay] Remove Extractor 2021-02-26 15:22:09 +01:00
Remita Amine 027f07edd3 [nba] rewrite extractor 2021-02-26 15:21:32 +01:00
Remita Amine cb5a16067b [turner] improve info extraction 2021-02-26 15:21:20 +01:00
Remita Amine 9d2fabe5d4 [common] remove unwanted query params from unsigned akamai manifest URLs 2021-02-26 15:21:13 +01:00
Sergey M. 457ef9b4b5 [generic] Improve RSS age limit extraction 2021-02-26 15:20:52 +01:00
renalid c359b19034 [generic] Fix RSS itunes thumbnail extraction (#27405) 2021-02-26 15:20:47 +01:00
Trevor Nelson 4f7380c8f5 [redditr] Extract duration (#27426) 2021-02-26 15:20:42 +01:00
Remita Amine 76c441edf0 [anvato] Disable NFLTokenGenerator(closes #27449) 2021-02-26 15:20:36 +01:00
Remita Amine 597505ed41 [zaq1] Remove extractor 2021-02-26 15:20:12 +01:00
Remita Amine 794a3becfb [asiancrush] fix extraction and add support for retrocrush.tv
closes #25577
closes #25829
2021-02-26 15:17:04 +01:00
Remita Amine 7346665442 [nfl] fix extraction(closes #22245) 2021-02-26 15:16:34 +01:00
Remita Amine 3463c192f6 [anvato] update ANVACK table and add experimental token generator for…
… NFL
2021-02-26 15:16:29 +01:00
Remita Amine 441fbc4056 [sky] relax SkySports URL regex (closes #27435) 2021-02-26 15:16:18 +01:00
Remita Amine dfb69009b9 [tv5unis] Add new extractor(closes #22399)(closes #24890) 2021-02-26 15:16:13 +01:00
Remita Amine 1315296aed [videomore] add support more.tv (closes #27088) 2021-02-26 15:16:07 +01:00
Remita Amine 1859fa8ac4 [nhk:program] Add support for audio programs and program clips 2021-02-26 15:15:53 +01:00
Matthew Rayermann 27765ca68f [nhk] Add support for NHK video programs (#27230) 2021-02-26 15:15:47 +01:00
Sergey M. 541e22037b [test_InfoExtractor] PEP 8 2021-02-26 15:15:40 +01:00
Sergey M. 2c85578a1f [mdr] Bypass geo restriction 2021-02-26 15:15:34 +01:00
Sergey M. 1339530c44 [mdr] Improve extraction (closes #24346, closes #26873) 2021-02-26 15:15:27 +01:00
Sergey M. c4dfcc3d9c [eporner] Fix view count extraction and make optional (closes #23306) 2021-02-26 15:14:15 +01:00
Sergey M. e4b993e9db [extractor/common] Improve JSON-LD interaction statistic extraction (refs #23306) 2021-02-26 15:14:10 +01:00
Sergey M. 2f63edb44a [eporner] Fix embed test URL 2021-02-26 15:13:58 +01:00
spvkgn ff7c31e4f2 [eporner] Fix hash extraction and extend _VALID_URL (#27396)
Co-authored-by: Sergey M <dstftw@gmail.com>
2021-02-26 15:13:48 +01:00
Remita Amine f42fea5402 [slideslive] use m3u8 entry protocol for m3u8 formats(closes #27400) 2021-02-26 15:13:40 +01:00
Remita Amine a7f325972c [downloader/hls] delegate manifests with media initialization to ffmpeg 2021-02-26 15:13:34 +01:00
Remita Amine 0f3f3e9046 [twitcasting] fix format extraction and improve info extraction(closes #24868) 2021-02-26 15:13:28 +01:00
Sergey M. 51d290f5d7 [extractor/common] Document duration meta field for playlists 2021-02-26 15:13:21 +01:00
Sergey M. 28bbfdff53 [linuxacademy] Fix authentication and extraction (closes #21129, closes #26223, closes #27402) 2021-02-26 15:13:12 +01:00
Remita Amine 0311375dc5 [itv] clean description from HTML tags (closes #27399) 2021-02-26 15:13:05 +01:00
Remita Amine a1e7449703 [hotstar] fix and improve extraction
- fix format extraction (closes #26690)
- extract thumbnail URL (closes #16079, closes #20412)
- support country specific playlist URLs (closes #23496)
- select the last id in video URL (closes #26412)
2021-02-26 15:12:42 +01:00
toniz4 d1a7ceb19a [youtube] Add some invidious instances (#27373)
Co-authored-by: Cássio <heyitscassio@cock.li>
2021-02-26 15:12:33 +01:00
Sergey M. c1f59f3fb6 [ruutu] Extract more metadata and detect non-free videos (closes #21154) 2021-02-26 15:12:07 +01:00
Sergey M. e9de74c42f [ruutu] Authenticate format URLs (closes #21031, closes #26782) 2021-02-26 15:12:03 +01:00
Sergey M. 39031fb5ac [ruutu] Add support for static.nelonenmedia.fi (closes #25412) 2021-02-26 15:11:57 +01:00
Sergey M. 1646a89d71 [ruutu] Extend _VALID_URL (closes #24839) 2021-02-26 15:11:51 +01:00
Remita Amine 19d8f83013 [facebook] Add support archived live video URLs(closes #15859) 2021-02-26 15:11:44 +01:00
Sergey M. a8573bb5b2 [wdr] Extend subtitles extraction and improve overall extraction (closes #22672, closes #22723) 2021-02-26 15:11:39 +01:00
Remita Amine 7cebd30677 [facebook] add support for videos attached to Relay based story pages (#10795) 2021-02-26 15:11:35 +01:00
Sergey M. ba0f2c14da [wdr:page] Add support for kinder.wdr.de (closes #27350) 2021-02-26 15:11:31 +01:00
Remita Amine fe9f5a795d [facebook] Add another regex for handleServerJS 2021-02-26 15:11:26 +01:00
Remita Amine cba73be180 [facebook] fix embed page extraction 2021-02-26 15:11:21 +01:00
compujo 7818a5cbb6 [YoutubeDL] Improve thumbnails' filenames deducing (closes #26010) (#27244) 2021-02-26 15:11:12 +01:00
Remita Amine 96b2d8bb34 [facebook] add support for Relay post pages (closes #26935) 2021-02-26 15:08:54 +01:00
Remita Amine 9f4416afd7 [facebook] proper support for watch videos(closes #22795)(#27062) 2021-02-26 14:54:28 +01:00
Remita Amine e51e641c6c Revert "[facebook] add support for watch videos(closes #22795)"
This reverts commit dc65041c224497f46b2984df02c234ce54bdedfd.
2021-02-26 14:54:19 +01:00
Remita Amine 493a5245dc [facebook] add support for watch videos(closes #22795) 2021-02-26 14:54:13 +01:00
Remita Amine 1765b2f870 [facebook] add support for group posts with multiple videos (closes #19131) 2021-02-26 14:54:08 +01:00
Remita Amine c9b7b7dd04 [itv] remove old extraction method and fix series metadata extraction
closes #23177
closes #26897
2021-02-26 14:54:03 +01:00
Remita Amine 91f1af44a1 [facebook] redirect Mobile URLs to Desktop URLs
closes #24831
closes #25624
2021-02-26 14:53:58 +01:00
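A rough illustration of what such a redirect amounts to (the helper name and host mapping below are assumptions, not the extractor's actual code):

import re

def mobile_to_desktop(url):
    # m.facebook.com and mbasic.facebook.com serve stripped-down markup;
    # the desktop pages expose the data the extractor needs
    return re.sub(r'//(?:m|mbasic)\.facebook\.com/', '//www.facebook.com/', url)

# mobile_to_desktop('https://m.facebook.com/watch/?v=123')
#   -> 'https://www.facebook.com/watch/?v=123'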
Remita Amine fa06aa76ad [facebook] Add support for Relay based pages(closes #26823) 2021-02-26 14:53:53 +01:00
Remita Amine feac903afb [facebook] try to reduce unnecessary tahoe requests 2021-02-26 14:53:49 +01:00
Remita Amine e62320f70a [facebook] remove hardcoded chrome user-agent
closes #18974
closes #25411
closes #26958
closes #27329
2021-02-26 14:53:36 +01:00
Andrey Smirnoff ec7e1e27c2 [smotri] Remove extractor (#27358) 2021-02-26 14:53:29 +01:00
Remita Amine 325ff4c628 [beampro] Remove Extractor
closes #17290
closes #22871
closes #23020
closes #23061
closes #26099
2021-02-26 14:53:18 +01:00
EntranceJew eefe89651d [tubitv] Extract release year (#27317) 2021-02-26 14:53:07 +01:00
Remita Amine 228d41686d [amcnetworks] Fix free content extraction(closes #20354) 2021-02-26 14:52:17 +01:00
Remita Amine e754d9d1a5 [telequebec] Fix Extraction and Add Support for video.telequebec.tv
closes #25733
closes #26883
closes #27339
2021-02-26 14:52:02 +01:00
Remita Amine 8b9bc4eeee [generic] comment out a test now covered by AmericasTestKitchenIE 2021-02-26 14:51:54 +01:00
Remita Amine 7e83a9d619 [tvplay:home] Fix extraction(closes #21153) 2021-02-26 14:51:45 +01:00
Remita Amine 96e0370bb2 [americastestkitchen] Fix Extraction and add support for Cook's Country and Cook's Illustrated

closes #17234
closes #27322
2021-02-26 14:51:38 +01:00
Sergey M. 32e8c82a3b [slideslive] Add support for yoda service videos and extract subtitles (closes #27323) 2021-02-26 14:51:20 +01:00
Sergey M. f717a3cc82 [extractor/generic] Remove unused import 2021-02-26 14:51:01 +01:00
Remita Amine 96e0184377 [aenetworks] Fix extraction
- Fix Fastly format extraction
- Add support for play and watch subdomains
- Extract series metadata

closes #23363
closes #23390
closes #26795
closes #26985
2021-02-26 14:50:53 +01:00
Sergey M. 371904a4d9 [extractor/common] Extract timestamp from Last-Modified header 2021-02-26 14:50:42 +01:00
Sergey M. 1744410baa [generic] Extract RSS video itunes metadata 2021-02-26 14:48:44 +01:00
Sergey M. 0257cb6e42 [generic] Extract RSS video timestamp 2021-02-26 14:48:35 +01:00
renalid e76a3363ba [generic] Extract RSS video description (#27177) 2021-02-26 14:48:29 +01:00
Remita Amine 5c239bfc65 [nrk] reduce requests for Radio series 2021-02-26 14:48:24 +01:00
Remita Amine 04ac0950aa [nrk] reduce the number of instalments requests 2021-02-26 14:46:34 +01:00
Remita Amine d88959f3b3 [nrk] improve format extraction 2021-02-26 14:46:29 +01:00
Remita Amine b8975995ef [nrk] improve extraction
- improve format extraction for old akamai formats
- update some of the tests
- add is_live value to entry info dict
- request instalments only when they're available
- fix skole extraction
2021-02-26 14:46:24 +01:00
Sergey M. 93e7c99ad6 [peertube] Extract fps 2021-02-26 14:46:16 +01:00
Sergey M. b79c74dad9 [peertube] Recognize audio-only formats (closes #27295) 2021-02-26 14:46:08 +01:00
Sergey M. 8e06fa07b9 [teachable:course] Improve extraction (closes #24507, closes #27286) 2021-02-26 14:46:00 +01:00
Sergey M. 0ef2cc2a31 [nrk] Improve error extraction 2021-02-26 14:44:43 +01:00
Sergey M. 05fae5e182 [nrktv] Relax _VALID_URL 2021-02-26 14:44:39 +01:00
Sergey M. d3b00a0fa6 [nrktv:series] Improve extraction (closes #21926) 2021-02-26 14:44:34 +01:00
Sergey M. ea80c8f15e [nrktv:series] Improve extraction 2021-02-26 14:44:26 +01:00
Sergey M. 08fea1baa1 [nrktv:season] Improve extraction 2021-02-26 14:44:20 +01:00
Remita Amine 75dc35e418 [nrk] fix call to moved method 2021-02-26 14:44:14 +01:00
Remita Amine 226efefec6 [nrk] fix typo 2021-02-26 14:44:09 +01:00
Remita Amine d439a5df63 [nrk] improve format extraction and geo-restriction detection (closes #24221) 2021-02-26 14:44:04 +01:00
Sergey M. 58edf65c1b [pornhub] Handle HTTP errors gracefully (closes #26414) 2021-02-26 14:43:56 +01:00
Sergey M. 841628af91 [nrktv] Relax _VALID_URL (closes #27299, closes #26185) 2021-02-26 14:43:47 +01:00
Remita Amine ecfc7cb9f1 [zdf] extract webm formats(closes #26659) 2021-02-26 14:43:06 +01:00
Matthew Rayermann 8c8e98ffdd [nhk] Add audio clip test to NHK extractor (#27269) 2021-02-26 14:42:58 +01:00
Remita Amine 2f04ca9dac [gamespot] Extract DASH and HTTP formats 2021-02-26 14:42:52 +01:00
Remita Amine 76263cc893 [extractor/common] improve Akamai HTTP formats extraction 2021-02-26 14:42:46 +01:00
Remita Amine 0a48eb0c7f [tver] correct episode_number key 2021-02-26 14:42:39 +01:00
Remita Amine d60195c74b [extractor/common] improve Akamai HTTP format extraction
- Allow m3u8 manifest without an additional audio format
- Fix extraction for qualities starting with a number
Solution provided by @nixxo based on: https://stackoverflow.com/a/5984688
2021-02-26 14:42:28 +01:00
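A hedged sketch of the "qualities starting with a number" pitfall this entry fixes: an identifier-style pattern rejects labels like "1080p", so the parser must tolerate a leading digit (the helper name and regex here are illustrative only, not the repo's code):

import re

def parse_quality(label):
    # accept both 'audio'-style and '1080p'-style quality labels
    m = re.match(r'(?P<height>\d+)?(?P<rest>.*)', label)
    height = int(m.group('height')) if m.group('height') else None
    return height, m.group('rest')

# parse_quality('1080p') -> (1080, 'p'); parse_quality('audio') -> (None, 'audio')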
Remita Amine b789c2b6bb [tver] Add new extractor (closes #26662)(closes #27284) 2021-02-26 14:41:57 +01:00
Remita Amine 87889f1fe8 [extractors] Add QubIE import 2021-02-26 14:39:36 +01:00
Remita Amine 0475d9eaff [tva] Add support for qub.ca (closes #27235) 2021-02-26 14:39:30 +01:00
Remita Amine 9e5ac5f629 [toggle] Detect DRM protected videos (closes #16479)(closes #20805) 2021-02-26 14:39:24 +01:00
Remita Amine 9b24767e1e [toggle] Add support for new MeWatch URLs (closes #27256) 2021-02-26 14:39:18 +01:00
Sergey M. 8c785f8472 [cspan] Extract info from jwplayer data (closes #3672, closes #3734, closes #10638, closes #13030, closes #18806, closes #23148, closes #24461, closes #26171, closes #26800, closes #27263) 2021-02-26 14:39:10 +01:00
Roman Beránek a321724c88 [cspan] Pass Referer header with format's video URL (#26032) (closes #25729) 2021-02-26 14:38:55 +01:00
Remita Amine 31a2706650 [mediaset] add support for movie URLs(closes #27240) 2021-02-26 14:38:48 +01:00
Sergey M. 863cae8fe4 [yandexmusic:track] Fix extraction (closes #26449, closes #26669, closes #26747, closes #26748, closes #26762) 2021-02-26 14:38:16 +01:00
Michael Munch f44820d718 [drtv] Extend _VALID_URL (#27243) 2021-02-26 14:38:09 +01:00
bopol bbb93695b0 [ina] Add support for mobile URLs (#27229) 2021-02-26 14:37:59 +01:00
Sergey M. de296b234a [YoutubeDL] Write static debug to stderr and respect quiet for dynamic debug (closes #14579, closes #22593)

TODO: logging and verbosity needs major refactoring (refs #10894)
2021-02-26 14:37:50 +01:00
Adrian Heine né Lang 34a34d7f71 [videa] Adapt to updates (#26301)
closes #25973, closes #25650.
2021-02-26 14:37:14 +01:00
Remita Amine 37108e29a6 [spreaker] fix SpreakerShowIE test URL 2021-02-26 14:37:08 +01:00
Sergey M. 79cd28f514 [spreaker] Add extractor (closes #13480, closes #13877) 2021-02-26 14:37:00 +01:00
Remita Amine 7a49184ca6 [viki] fix video API request(closes #27184) 2021-02-26 14:36:24 +01:00
Remita Amine 3f6dc5d4ef [bbc] fix BBC Three clip extraction 2021-02-26 14:36:19 +01:00
Remita Amine 45eded9bd2 [bbc] fix BBC News videos extraction 2021-02-26 14:36:12 +01:00
Remita Amine d1114a12e1 [medaltv] improve extraction 2021-02-26 14:36:07 +01:00
Joshua Lochner 997dc3ca44 [medaltv] Add new extractor (#27149) 2021-02-26 14:36:01 +01:00
Sergey M. 00088ef4b1 [downloader/fragment] Set final file's mtime according to last fragment's Last-Modified header (closes #11718, closes #18384, closes #27138) 2021-02-26 14:35:54 +01:00
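A minimal sketch of the technique this entry describes, using only the standard library (assumed shape, not the project's actual implementation):

import os
from email.utils import parsedate_to_datetime

def apply_last_modified(path, last_modified):
    # parse an HTTP date like 'Wed, 21 Oct 2015 07:28:00 GMT' and use it
    # as the file's atime/mtime; leave the file untouched if unparseable
    try:
        ts = parsedate_to_datetime(last_modified).timestamp()
    except (TypeError, ValueError):
        return
    os.utime(path, (ts, ts))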
Sergey M. c719619471 [nrk] Fix extraction 2021-02-26 14:35:49 +01:00
Remita Amine 2a368bc78e [pinterest] Add support for large collections(more than 25 pins) 2021-02-26 14:35:22 +01:00
Remita Amine 3d030642c7 [franceinter] flake8 2021-02-26 14:35:16 +01:00
renalid 950c574c22 [franceinter] add thumbnail url (#27153)
Co-authored-by: remitamine <remitamine@gmail.com>
2021-02-26 14:35:11 +01:00
Remita Amine e1c07eb79f [box] Add new extractor(#5949) 2021-02-26 14:34:37 +01:00
Jia Rong Yee 7a0255f6e2 [nytimes] Add new cooking.nytimes.com extractor (#27143)
* [nytimes] support cooking.nytimes.com, resolves #27112

Co-authored-by: remitamine <remitamine@gmail.com>
2021-02-26 14:34:21 +01:00
Remita Amine abe5d97246 [rumble] add support for embed pages(#10785) 2021-02-26 14:34:06 +01:00
Remita Amine 186e07f960 [skyit] add support for multiple Sky Italia websites(closes #26629) 2021-02-26 14:34:00 +01:00
Remita Amine ac852e57a0 [extractor/common] add generic support for akamai http format extraction 2021-02-26 14:33:51 +01:00
Sergey M. 9a5816f425 [pinterest] Add extractor (closes #25747) 2021-02-26 14:33:38 +01:00
Sergey M. d64e153832 [svtplay] Fix test title 2021-02-26 14:33:05 +01:00
Sergey M. 5a94d1b61d [svtplay] Add support for svt.se/barnkanalen (closes #24817) 2021-02-26 14:32:59 +01:00
Mattias Wadman 9e816eca8f [svt] Extract timestamp and thumbnail in more cases (#27130)
Add timestamp, set to "valid from", which I think can be seen as the publish time.
Add thumbnail in more cases; it seems this was previously only done in the embedded data case for some reason.
Switch the svtplay test URL to an existing video, one with no expiry date.
Also add an additional thumbnail URL test regex.
2021-02-26 14:32:52 +01:00
Remita Amine 968583c56f [infoq] fix format extraction(closes #25984) 2021-02-26 14:28:13 +01:00
renalid f3c426a2ee [francetv] Update to fix thumbnail URL issue (#27120)
Fix the thumbnail URL. The issue had been there for many years and was never fixed; it's done now. :-)

Example: https://www.france.tv/france-2/de-gaulle-l-eclat-et-le-secret/de-gaulle-l-eclat-et-le-secret-saison-1/2035247-solitude.html

Previously generated (broken) thumbnail URL: http://pluzz.francetv.fr/staticftv/ref_emissions/2020-11-02/EMI_1104da66f533cc7dc5d0d07a181a18c2e2fe1d81_20201014122553940.jpg

Correct thumbnail URL after the fix: https://sivideo.webservices.francetelevisions.fr/staticftv/ref_emissions/2020-11-02/EMI_1104da66f533cc7dc5d0d07a181a18c2e2fe1d81_20201014122553940.jpg
2021-02-26 14:28:03 +01:00
Sergey M. bb0f8c2607 [downloader/http] Fix crash during urlopen caused by missing reason of URLError 2021-02-26 14:27:49 +01:00
Laura Liberda acfb99b684 improve copykitku patch hook 2021-02-26 14:27:42 +01:00
Sergey M. 8175a5e8b1 [YoutubeDL] Fix --ignore-errors for playlists with generator-based entries of url_transparent (closes #27064) 2021-02-26 14:19:38 +01:00
Remita Amine 3ffb643844 [discoverynetworks] add support for new TLC/DMAX URLs (closes #27100) 2021-02-26 14:19:12 +01:00
Remita Amine a732493292 [rai] fix protocol relative relinker URLs(closes #22766) 2021-02-26 14:19:07 +01:00
Remita Amine 493d279604 [rai] fix unavailable video format detection 2021-02-26 14:19:02 +01:00
Remita Amine a7bd83e154 [rai] improve extraction 2021-02-26 14:18:55 +01:00
Leonardo Taccari 9fd254036b [rai] Fix extraction for recent raiplay.it updates (#27077)
- Remove first test of RaiPlayIE: it is no longer available
- Make RaiPlayIE extension-agnostic (passing possible `.json' URLs is now
  supported too)
- Adjust RaiPlayLiveIE to recent raiplay.it updates.  Passing it as
  `url_transparent' is no longer supported (there is no longer an accessible
  ContentItem)
- Adjust RaiPlayPlaylistIE to recent raiplay.it updates and instruct it about
  ContentSet-s.
- Update a RaiIE test and remove two tests that are no longer available

Thanks to @remitamine for the review!
2021-02-26 14:18:51 +01:00
Remita Amine ddc62043ed [viki] improve format extraction 2021-02-26 14:18:46 +01:00
beefchop 9adedd82f3 [viki] fix stream extraction from mpd (#27092)
Co-authored-by: beefchop <beefchop@users.noreply.github.com>
2021-02-26 14:18:39 +01:00
Remita Amine 339f127540 [amara] improve extraction 2021-02-26 14:16:30 +01:00
Joost Verdoorn 9a527679ed [Amara] Add new extractor (#20618)
* [Amara] Add new extractor
2021-02-26 14:16:03 +01:00
Remita Amine 514683921a [vimeo:album] fix extraction(closes #27079) 2021-02-26 14:13:40 +01:00
Remita Amine 46fce7272c [mtv] fix mgid extraction(closes #26841) 2021-02-26 14:13:32 +01:00
Sergey M. 3a32ea072b [youporn] Fix upload date extraction and make comment count optional (closes #26986) 2021-02-26 14:13:24 +01:00
Sergey M. 1451f4f498 [arte] Rework extractors
* Reimplement embed and playlist extractors to delegate to the single entrypoint artetv extractor
  (note: download archive extractor keys may break as a result)
* Improve embeds detection (closes #27057)
- Remove obsolete code
2021-02-26 14:13:19 +01:00
Sergey M. 93064492e9 [arte] Extract m3u8 formats (closes #27061) 2021-02-26 14:13:13 +01:00
Remita Amine f8fb198326 [mgtv] fix format extraction(closes #26415) 2021-02-26 14:13:08 +01:00
Sergey M. e2b997d3bf [extractor/common] Output error for invalid URLs in _is_valid_url (refs #21400, refs #24151, refs #25617, refs #25618, refs #25586, refs #26068, refs #27072) 2021-02-26 14:13:00 +01:00
Remita Amine 9a4014d394 [francetv] improve info extraction 2021-02-26 14:12:48 +01:00
gdzx ff92752e7c [francetv] Add fallback video url extraction (#27047)
Fallback on another API endpoint when no video formats are found.

Closes ytdl-org#22561
2021-02-26 14:12:41 +01:00
Sergey M. 9f47f2a04e [spiegel] Fix extraction (closes #24206, closes #24767)
Code picked from PR #24767 since original repo is not available due to takedown.
2021-02-26 14:12:29 +01:00
Remita Amine 14539655d5 [malltv] fix extraction(closes #27035) 2021-02-26 14:12:10 +01:00
Remita Amine 4826425743 [bandcamp] extract playlist_description(closes #22684) 2021-02-26 14:12:02 +01:00
Remita Amine 768e8bb238 [urplay] fix extraction(closes #26828) 2021-02-26 14:11:55 +01:00
Remita Amine ebc218c4c4 [lrt] fix extraction with empty tags(closes #20264) 2021-02-26 14:11:39 +01:00
Sergey M. 883cf213dc [ndr:embed:base] Extract subtitles (closes #25447, closes #26106) 2021-02-26 14:11:26 +01:00
Sergey M. ae004ab316 [servus] Add support for pm-wissen.com (closes #25869) 2021-02-26 14:11:12 +01:00
Sergey M. 058b02f57f [servus] Fix extraction (closes #26872, closes #26967, closes #26983, closes #27000) 2021-02-26 14:11:04 +01:00
Sergey M. 51dd5a4cc5 [xtube] Fix extraction (closes #26996) 2021-02-26 14:10:56 +01:00
Sergey M. bc38ef9445 [utils] Skip ! prefixed code in js_to_json 2021-02-26 14:10:46 +01:00
Remita Amine 2901a6439b [lrt] fix extraction 2021-02-26 14:10:38 +01:00
Remita Amine c62c95923a [condenast] fix extraction and extract subtitles 2021-02-26 14:10:03 +01:00
Remita Amine 44676b32c3 [bandcamp] fix extraction 2021-02-26 14:09:54 +01:00
Remita Amine d52a2bf577 [rai] fix RaiPlay extraction 2021-02-26 14:09:47 +01:00
Remita Amine 33c8322b1d [usanetwork] fix extraction 2021-02-26 14:02:17 +01:00
Remita Amine 4d26aa35af [nbc] fix NBCNews/Today/MSNBC extraction 2021-02-26 14:00:20 +01:00
Edward Betts 0f60a7c66c [devscripts/make_lazy_extractors] Correct a spelling mistake (#26991) 2021-02-26 13:59:51 +01:00
Remita Amine b85fc0e982 [cnbc] fix extraction 2021-02-26 01:17:49 +01:00
Lauren Liberda 37d16d8dbf Merge branch 'haruhi-dl/haruhi-dl/mr-4' into 'master'
Drop python2 in bin file

See merge request laudompat/haruhi-dl!2
2021-02-25 19:10:11 +00:00
sech1p 23a8ab9cd4 drop python 2 2021-02-25 19:47:18 +01:00
Laura Liberda f42428e8e0 tvn24: next.js frontend extraction without playwright
thanks to @ptrcnull
2021-02-25 15:35:14 +01:00
Laura Liberda 07f5e2ae1c common: next.js data searcher 2021-02-25 15:22:17 +01:00
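In practice a next.js data searcher boils down to locating the JSON blob Next.js embeds in every server-rendered page; a minimal sketch (function name assumed, the repo's helper may differ):

import json
import re

def search_nextjs_data(webpage):
    # Next.js puts its page state in <script id="__NEXT_DATA__" type="application/json">
    m = re.search(
        r'<script[^>]+id=["\']__NEXT_DATA__["\'][^>]*>(?P<json>.+?)</script>',
        webpage, re.DOTALL)
    return json.loads(m.group('json')) if m else None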
Laura Liberda 51eefea363 test_download: adjust tests to the environments properly 2021-02-24 17:46:00 +01:00
Laura Liberda 3bd8aa2897 playwright extractor testing 2021-02-24 17:17:22 +01:00
Laura Liberda 3ac7b35f1b PolskaPress extractor 2021-02-24 17:06:33 +01:00
Laura Liberda f91cd92ea0 [x-news] search for x-link in divs 2021-02-24 14:48:54 +01:00
Laura Liberda 8d30f19740 transistorfm extractors
based on extractor by @asz: https://github.com/ytdl-org/youtube-dl/pull/28022
2021-02-24 12:20:49 +01:00
Laura Liberda 1c3ca4fe2c copykitku project config and patch hook 2021-02-24 11:26:54 +01:00
Dominika eb0891a824 version 2021.02.23 2021-02-23 15:06:34 +01:00
Laura Liberda 84c08079bd [youtube] fix dynamic crypto 2021-02-23 14:01:39 +01:00
Dominika Liberda 5f6c836641 version 2021.02.22 2021-02-22 20:44:54 +01:00
Laura Liberda 75e2b81f08 updated readme about python 2 2021-02-22 20:42:57 +01:00
Dominika Liberda 13db87c0a2 + new crypto for YT 2021-02-22 20:31:44 +01:00
Dominika Liberda 32cf3c6401 * reworked manual crypto extractor for YT 2021-02-22 20:31:28 +01:00
Dominika Liberda 85bb796e05 * jwplayer search meaningful error 2021-02-22 20:23:33 +01:00
Laura Liberda e35e90a263 [tvn24] fix age_limit on fakty 2021-02-12 13:16:04 +01:00
Laura Liberda 9bcb5f1b61 [tvn24] fix fakty extraction (closes #37) 2021-02-12 12:59:24 +01:00
Laura Liberda 8a703db751 add symlink from youtube_dl to haruhi_dl 2021-02-11 16:33:05 +01:00
Laura Liberda ac9192619a [pulsevideo] support age limit 2021-02-11 14:34:59 +01:00
Laura Liberda 4cfa7883a3 [clip.rs] fix extraction 2021-02-11 14:08:07 +01:00
Laura Liberda b55552ad1a [vod.pl] fix extraction 2021-02-11 13:18:57 +01:00
Laura Liberda 630a86c5e3 refactor from onet mvp to pulsevideo 2021-02-11 01:09:05 +01:00
Laura Liberda f603f36c3f warn about python 2 2021-02-10 22:22:37 +01:00
Laura Liberda 46643f645d [pulsembed] fix 2021-02-10 22:06:47 +01:00
Laura Liberda 62008d3f8b pulsembed extractor 2021-02-10 01:55:47 +01:00
Laura Liberda d980ef4e35 [tvp:embed] fix DeprecationWarning 2021-02-10 01:50:08 +01:00
Laura Liberda bffba35446 [x-link] add iframe support 2021-02-10 01:39:10 +01:00
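Iframe support in extractors like this is usually an _extract_urls hook scanning the page for matching iframe src attributes; a sketch with an assumed URL pattern (the real x-link regex may differ):

import re

def _extract_urls(webpage):
    # collect every x-link iframe embed found in the page source
    return [m.group('url') for m in re.finditer(
        r'<iframe[^>]+src=["\'](?P<url>https?://(?:www\.)?x-link\.pl/[^"\']+)',
        webpage)]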
Laura Liberda 07f326f52d [tvp:embed] fix _extract_urls 2021-02-10 01:05:00 +01:00
Laura Liberda f2038499ef [OnetMVP] url extractor 2021-02-10 00:40:16 +01:00
Laura Liberda a60c736c44 [tvp:embed] new url scheme and url extractor 2021-02-10 00:39:27 +01:00
Laura Liberda edca3b8c96 simplify libsyn extraction 2021-02-10 00:38:40 +01:00
Laura Liberda b98dd103da [tiktok] user profile extractor 2021-02-05 16:37:22 +01:00
Laura Liberda a9cd876cf9 [lurker] domain update 2021-02-05 13:00:36 +01:00
Laura Liberda 87fad4b7eb x-link (x-news.pl embeds) extractor 2021-01-31 01:30:47 +01:00
Laura Liberda a3816f69be [generic] normalizing embeds part 2137 2021-01-31 01:26:39 +01:00
Laura Liberda b1c1d64de0 albicla extractor 2021-01-30 19:47:13 +01:00
Dominika Liberda f3b5985cc3 * yeet CI 2021-01-29 19:21:24 +01:00
Laura Liberda d2e522de09 wp.pl extractor 2021-01-25 15:45:55 +01:00
Laura Liberda 0b1dcd17b3 [tvn24] fix next.js frontend extraction on no cookies 2021-01-24 22:21:55 +01:00
Laura Liberda f39144fa0a [youtube] youtube.com/video/ url scheme 2021-01-24 20:14:01 +01:00
Laura Liberda d14745983d [peertube] reduce request amount if webpage downloaded 2021-01-24 06:13:30 +01:00
Dominika e65850dfd5 version 2021.01.24 2021-01-24 05:17:04 +01:00
Laura Liberda d08ac18c25 [tvn24] remove tvn24bis.pl references, remove GDPR consent cookies before opening page in browser 2021-01-24 04:56:39 +01:00
Laura Liberda 85e45ed607 remove phantomjs wrapper (closes #28) 2021-01-23 20:18:54 +01:00
Laura Liberda 95b061297c [pornhub] refactor scraping protection workaround from phantomjs to playwright 2021-01-23 20:15:27 +01:00
Laura Liberda 88f5839a37 [playwright] cookie sync, non-headless option, global playwright instance 2021-01-23 20:14:31 +01:00
Laura Liberda 1285da6e17 [youtube] match /shorts/ urls 2021-01-23 15:42:50 +01:00
Laura Liberda 570cf794a9 [agora] wyborcza/wysokieobcasy/tokfm podcast fixes 2021-01-23 00:31:40 +01:00
Laura Liberda 3bb3d99229 simplify checking package version for some people 2021-01-22 23:17:58 +01:00
Laura Liberda 4f112c3592 revert my stupidity 2021-01-22 23:05:04 +01:00
Laura Liberda bed7b7f44c [crunchyroll] connect to https by default 2021-01-22 20:00:06 +01:00
Laura Liberda 339db8d1d2 [tvn24] support nextjs frontend (playwright required) and magazine 2021-01-22 19:59:00 +01:00
Laura Liberda 1804387116 playwright helper improvements 2021-01-22 19:58:03 +01:00
Laura Liberda 1cbbae3868 playwright version printing on --verbose 2021-01-22 19:57:28 +01:00
Laura Liberda 0ec8c8e8f1 is_outdated_version: allow exact same version 2021-01-22 19:56:26 +01:00
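The change amounts to comparing with strict less-than, so an identical version no longer counts as outdated; a simplified sketch (the real helper's signature and parsing may differ):

def is_outdated_version(version, limit):
    # '2021.01.22' < '2021.02.01' numerically, field by field
    parse = lambda v: tuple(int(x) for x in v.split('.'))
    return parse(version) < parse(limit)

# is_outdated_version('2021.01.22', '2021.01.22') -> False (exact match allowed)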
Laura Liberda 54ea13966e [youtube] minor playlist improvements 2021-01-19 18:45:18 +01:00
Laura Liberda 7fd9596a0f fix tests on envs with lazy_extractors built 2021-01-19 18:44:27 +01:00
Laura Liberda 6807c8869a [playwright] fix tests again 2021-01-18 22:59:11 +01:00
Laura Liberda 9224bfe84f [playwright] fix tests 2021-01-18 22:57:17 +01:00
Laura Liberda 29128c9704 update readme 2021-01-18 22:40:58 +01:00
Laura Liberda c53f744097 playwright wrapper (#28) 2021-01-18 22:39:18 +01:00
Laura Liberda 365daad4f5 oko.press extractor 2021-01-18 03:52:28 +01:00
Laura Liberda abfbb7d014 _json_ld: podcasts objects 2021-01-18 03:52:19 +01:00
Laura Liberda 3a79666639 update readme 2021-01-18 02:35:46 +01:00
Laura Liberda 254f95b75f wyborcza extractors, merge agora extractors files 2021-01-17 08:41:00 +01:00
Laura Liberda d959fff39f lurker extractor 2021-01-17 06:08:30 +01:00
Dominika 2c354d15a0 version 2021.01.16 2021-01-16 02:38:11 +01:00
Laura Liberda 097c7bd4ba linkedin:post extractor 2021-01-11 17:58:24 +01:00
Laura Liberda 05620dab04 [weibo] dash formats 2021-01-10 23:36:29 +01:00
Laura Liberda eb1333e65b [vimeo:review] fix videos with video password
https://github.com/ytdl-org/youtube-dl/issues/27591
2021-01-10 10:42:15 +01:00
Laura Liberda 90291aa422 gtv.org extractor, I guess 2021-01-10 09:56:59 +01:00
Laura Liberda b2f578c3c6 fix youtube url tests 2021-01-10 03:56:47 +01:00
Laura Liberda fe0f39da07 fix for youtube topic channels 2021-01-10 01:27:39 +01:00
Laura Liberda 90f556c478 [youtube] fix regular channels and playlists 2021-01-10 00:11:36 +01:00
Laura Liberda d51d7e3865 youtube music album extractor 2021-01-10 00:06:08 +01:00
Laura Liberda b044df8d86 LBRY extractor
why did I do this?
2021-01-09 04:31:11 +01:00
Laura Liberda 2acc7755fc [generic] embetty embeds 2021-01-09 02:11:47 +01:00
Laura Liberda 0abf03dbaf [heise] extract embetty embeds 2021-01-09 02:11:33 +01:00
Laura Liberda d72d60bf5e embetty extractor 2021-01-09 02:11:12 +01:00
Laura Liberda 2a772c54bb the guardian extractors 2021-01-08 21:01:36 +01:00
Laura Liberda cf9d87b0e6 [tvp:series] refactor to API 2021-01-08 12:41:55 +01:00
Laura Liberda 6e4b1019e4 basic tuba.fm support 2021-01-08 11:19:15 +01:00
Laura Liberda c758741d55 rmf extractors 2021-01-07 00:51:08 +01:00
Laura Liberda 0aa7cb240c _json_ld: handle multiple thumbnails 2021-01-07 00:49:10 +01:00
Laura Liberda 78d9e9046c [cda] refactor to mobile JSON API 2021-01-06 01:28:32 +01:00
Laura Liberda 1d601522cc a bit more embed searching normalization 2021-01-05 20:06:22 +01:00
Dominika Liberda 8f86520b7a version 2021.01.03 2021-01-03 23:30:31 +01:00
Dominika Liberda 5049bf7bea + Ipla Extractor (reverse engineered by @ptrcnull) 2021-01-03 23:16:16 +01:00
Laura Liberda 33c63089d9 tvnplayer:series extractor 2021-01-03 21:13:16 +01:00
Laura Liberda 837924dd1b fix tvnplayer on py2.7 2021-01-03 05:14:37 +01:00
Dominika Liberda 9bba7c1c5f + TVNPlayer Extractor (reverse engineered by @ptrcnull) 2021-01-03 03:52:08 +01:00
Laura Liberda 7ed8e0c502 tokfm podcast and audition extractor 2021-01-03 02:42:43 +01:00
Laura Liberda d915fe0b0c [generic] embed searching normalization 2/n 2021-01-01 07:05:16 +01:00
Laura Liberda c73049bc5b [youtube] alt_title w/ eng title, if it differs from the original one 2020-12-14 18:52:31 +01:00
Laura Liberda 49bf656179 [generic] simplify the embed searching a lot 2020-12-14 17:12:34 +01:00
Laura Liberda c9accf707d [funkwhale] fix TypeError on null album release date 2020-12-13 22:19:20 +01:00
Laura Liberda 048bac7c49 [funkwhale:channel] fix tests 2020-12-13 22:13:54 +01:00
Laura Liberda c3e1d87fcd [funkwhale] add webpage_url 2020-12-13 22:12:46 +01:00
Laura Liberda c40130632c [funkwhale] radio extractor 2020-12-13 21:57:22 +01:00
Laura Liberda 5592cda782 [tvp] abc.tvp.pl and general fixes 2020-12-13 03:12:31 +01:00
Laura Liberda 2d2abe34c8 [youtube] fix (dis)like count when it's 0 (#14) 2020-12-12 23:53:50 +01:00
Laura Liberda 572b04b7f1 [funkwhale] fix track duration 2020-12-12 23:11:58 +01:00
Laura Liberda bc1164719f [funkwhale] improve album data extraction 2020-12-12 23:05:16 +01:00
Laura Liberda e59434e242 [onet] remove extractors for dead services 2020-12-12 17:15:06 +01:00
Laura Liberda f646087983 [funkwhale] improve detection 2020-12-12 17:01:59 +01:00
Laura Liberda 4221c2ee68 funkwhale extractors 2020-12-12 06:12:40 +01:00
Dominika b2e1200c40 version 2020.12.11 2020-12-11 23:55:40 +01:00
Laura Liberda 32617e06b3 youtube:channel hotfix 2020-12-11 23:35:25 +01:00
Laura Liberda a12318ed7e shie: _match_id_and_host helper 2020-12-10 23:39:39 +01:00
Laura Liberda adea7807af mastodon extractor (#11) 2020-12-10 03:23:28 +01:00
Laura Liberda 0d8a0cefc1 fix suitable_selfhosted() on extractors with no regexes 2020-12-10 03:23:01 +01:00
Laura Liberda e33e398767 [generic] fix peertube embed tests 2020-12-10 01:46:57 +01:00
Laura Liberda b3c8623e4c [tvp:embed] refactored to TVPlayer2 API 2020-12-10 01:33:20 +01:00
Laura Liberda 6530f90d24 fixed try_get util 2020-12-10 01:32:45 +01:00
Laura Liberda 889005bab3 selfhosted extractors, peertube extractor reworked (#10) 2020-12-09 21:52:30 +01:00
Laura Liberda 005b3fbedd eskago extractor (#20) 2020-12-08 00:16:49 +01:00
Laura Liberda dab3e41041 [atttechchannel] fix extraction, drop flash references 2020-12-07 04:37:02 +01:00
Laura Liberda a6102b5483 eurozet player extractors (#16) 2020-12-07 03:48:12 +01:00
Laura Liberda 24a54d5d52 eurozet article video extractor (#16) 2020-12-06 20:52:26 +01:00
Laura Liberda 4453792f0a [onet] amp urls 2020-12-06 04:07:00 +01:00
Laura Liberda 86629d8574 [onet] libsyn podcasts support, fixed tests 2020-12-06 03:51:23 +01:00
Laura Liberda 9b6ec60622 [onnetwork:frame] simplify code a lot 2020-12-06 01:17:44 +01:00
Laura Liberda 91d6c6dbd4 [polskieradio] livestream player extractor 2020-12-06 01:14:27 +01:00
Laura Liberda 6a4da9addf [polskieradio] new player_data thing 2020-12-05 20:01:55 +01:00
Laura Liberda ef114c2560 [polskieradio] support for polskieradio24.pl 2020-12-05 17:58:31 +01:00
Laura Liberda 3a38277a95 [cda] more fail-safe url replaces 2020-12-05 17:29:03 +01:00
Laura Liberda 33c19049af [cda] fix adult pages 2020-12-05 17:14:08 +01:00
Laura Liberda a11b8a8e5f [tvp] regional (client-side rendered) pages 2020-12-05 05:18:19 +01:00
Laura Liberda 07fa0508f9 [youtube] remove more useless tests 2020-12-05 01:09:29 +01:00
Laura Liberda 043bec4c9b [tvp] tvp stream support 2020-12-05 01:09:06 +01:00
Laura Liberda 0eea826b48 fix the license declaration in __init__ 2020-12-04 23:37:41 +01:00
Laura Liberda 10bb46f940 [tvp] polandin.com support 2020-12-04 19:37:08 +01:00
Dominika b3383de0a5 version 2020.11.27 2020-11-27 14:28:08 +01:00
Laura Liberda 40638606b9 [youtube] history, subscriptions 2020-11-24 21:48:21 +01:00
Laura Liberda 75c1755cc1 [youtube] liked, watch later support (#2) 2020-11-24 16:58:50 +01:00
Laura Liberda ea7336113f update readme because why the fuck not 2020-11-22 23:38:10 +01:00
Laura Liberda a3cba131b3 openfm extractor (closes #8) 2020-11-22 05:44:40 +01:00
Laura Liberda abdc94a6a5 ninateka extractor (closes #7) 2020-11-22 05:05:51 +01:00
Laura Liberda 7671ce8f00 [youtube] fix yet another UnboundLocalError 2020-11-21 00:56:20 +01:00
Dominika 486463ba53 version 2020.11.20 2020-11-20 13:36:08 +01:00
Laura Liberda b26cd01621 readme update 2020-11-19 13:03:51 +01:00
Laura Liberda c08b658033 [wykop] remove debug logs 😳 2020-11-18 17:51:46 +01:00
Laura Liberda 4eaf67ad2e [youtube] fix UnboundLocalError 2020-11-17 17:03:15 +01:00
Laura Liberda 2c9b034593 [test] fix youtube format selection test
it's up to the current standards now
2020-11-17 16:56:26 +01:00
Laura Liberda 0aea1ea746 update readme with installing instructions 2020-11-17 15:45:09 +01:00
Dominika a661491bcf removed more SWF references from youtube extractor 2020-11-17 13:56:32 +01:00
Laura c55393ce44 [youtube] fix channel/search on videos with no views 2020-11-16 02:31:20 +01:00
selfisekai 80a5d8d55e [youtube] fix channels with hyphen inside id 2020-11-16 02:31:20 +01:00
Dominika 98e6a95bd5 version 2020.11.16 2020-11-16 01:31:37 +01:00
selfisekai 6f876fba51 [youtube] fixed some download tests 2020-11-15 23:08:28 +01:00
selfisekai 158d4e9088 [youtube] search info extractor 2020-11-15 20:31:40 +01:00
selfisekai ede99f9f13 remove youtube:live, fix tests 2020-11-15 16:41:15 +01:00
selfisekai 92d1bd1b90 [youtube] brand new channel/playlist extractors 2020-11-15 15:44:07 +01:00
Laura Liberda f4131bcac4 Merge branch 'youtube-sig-crypto-proposal' into 'master'
[youtube] dynamic sig crypto fallback

See merge request laudom/haruhi-dl!1
2020-11-13 10:21:21 +00:00
selfisekai 85d8ce7a3d Merge branch 'master' of ssh://git.sakamoto.pl:2137/laudom/haruhi-dl into youtube-sig-crypto-proposal 2020-11-12 11:04:15 +01:00
selfisekai 0259a32b73 [youtube] fix reworked sig decrypting 2020-11-12 11:02:07 +01:00
selfisekai 8c0ff392ea [youtube] dynamic sig improvements 2020-11-12 06:31:11 +01:00
selfisekai e31c0d2576 [youtube] dynamic sig crypto fallback 2020-11-09 16:02:49 +01:00
385 changed files with 24526 additions and 17182 deletions

5
.copykitkurc.toml Normal file

@@ -0,0 +1,5 @@
# https://git.sakamoto.pl/laudompat/copykitku
dest = "https://git.sakamoto.pl/laudompat/haruhi-dl"
patchHook = "devscripts/copykitku-patch-hook.js"

2
.github/FUNDING.yml vendored Normal file

@@ -0,0 +1,2 @@
github: selfisekai
ko_fi: selfisekai

2
.gitignore vendored

@@ -15,6 +15,7 @@ haruhi-dl.1
haruhi-dl.bash-completion
haruhi-dl.fish
haruhi_dl/extractor/lazy_extractors.py
haruhi_dl/extractor_artifacts/
haruhi-dl
haruhi-dl.exe
haruhi-dl.tar.gz
@@ -39,6 +40,7 @@ updates_key.pem
*.part
*.ytdl
*.swp
*.webm
test/local_parameters.json
.tox
youtube-dl.zsh

.gitlab-ci.yml

@@ -1,13 +1,25 @@
default:
before_script:
- sed -i "s@dl-cdn.alpinelinux.org@alpine.sakamoto.pl@g" /etc/apk/repositories
- apk add bash
- pip install nose
py2.7-core:
image: python:2.7
pypy3.6-core:
image: pypy:3.6-slim
variables:
HDL_TEST_SET: core
before_script:
- apt-get update && apt-get install -y bash && apt-get clean
- pip install nose
script:
- ./devscripts/run_tests.sh
pypy3.7-core:
image: pypy:3.7-slim
variables:
HDL_TEST_SET: core
before_script:
- apt-get update && apt-get install -y bash && apt-get clean
- pip install nose
script:
- ./devscripts/run_tests.sh
@@ -48,15 +60,25 @@ py3.9-download:
script:
- ./devscripts/run_tests.sh
jython-core:
image: openjdk:11-slim
playwright-tests-core:
image: mcr.microsoft.com/playwright:focal
variables:
HDL_TEST_SET: core
allow_failure: true
before_script:
- apt-get update
- apt-get install -y wget
- ./devscripts/install_jython.sh
- export PATH="$HOME/jython/bin:$PATH"
- apt-get update && apt-get install -y bash && apt-get clean
- pip install nose
script:
- ./devscripts/run_tests.sh
playwright-tests-download:
image: mcr.microsoft.com/playwright:focal
variables:
HDL_TEST_SET: download
HDL_TEST_PLAYWRIGHT_DOWNLOAD: 1
allow_failure: true
before_script:
- apt-get update && apt-get install -y bash && apt-get clean
- pip install nose
script:
- ./devscripts/run_tests.sh

704
ChangeLog

@@ -1,3 +1,707 @@
version 2021.08.01
Extractor
* [youtube] fixed age gate
* [niconico] dmc downloader from yt-dlp
* [peertube] new URL schemes

version 2021.06.20
Core
* [playwright] fixed headlessness
+ [playwright] option to force a specific browser
Extractor
* [tiktok] fix empty video lists
* [youtube] fix and speed-up age-gate circumvention
* [youtube] fix videos with JS-like syntax

version 2021.06.01
Core
* merging formats by codecs
* [json_ld] better author extraction
+ --force-use-mastodon option
* support for HTTP 308 redirects
+ [test_execution] add test for lazy extractors
* Improve extract_info doc
* [options] Fix thumbnail option group name
Extractor
* [tvp:series] fallback to web
- [ninateka] remove extractor
* [tvn24] refactor handling next.js frontend
* [cda] fix premium videos for premium users (?)
* [tvp] support for tvp.info vue.js pages
+ [sejm.gov.pl] new extractors
+ [senat.gov.pl] new extractors
* [spreaker] new url schemes
* [spreaker] support for embedded player
+ [spryciarze.pl] new extractors
+ [castos] new extractors
+ [magentamusik360] new extractor
+ [arnes] new extractor
+ [palcomp3] new extractor
* [screencastomatic] fix extraction
* [youku] update ccode
+ [line] support live.line.me
* [curiositystream] fix format extraction
* [jamendo] fix track extraction
* [pornhub] extracting DASH and HLS formats
* [mtv] fix Viacom A/B testing video player
+ [maoritv] new extractor
* [pluralsight] extend anti-throttling timeout
* [mastodon] support for soapbox and audio files
* [tvp] fix jp2.tvp.pl
* [youtube:channel] fix multiple page extraction
* [tvp:embed] handling formats better way
* [tvn] better extraction method choosing
* [tvp] fix tvp:website extracting with weird urls
+ [wppilot] new extractors
+ [mastodon] logging in to mastodon/pleroma
+ [mastodon] fetching posts via different instances
+ [mastodon] fetching peertube videos via pleroma instances
* [bbc] extract full description from __INITIAL_DATA__
* [tver] redirect all downloads to Brightcove
* [medaltv] fix extraction
* [francetvinfo] improve video id extraction
* [xfileshare] support for wolfstream.tv
* [tv2dk] fix extraction
* [svtplay] improve extraction
* [xtube] fix formats extraction
* [twitter] improve formats extraction from vmap URL
* [mastodon] cache apps on logging in
* [mastodon] support cards to external services
* [peertube] logging in
* [tiktok] deduplicate videos
+ [misskey] new extractor
+ [radiokapital] new extractors
* [youtube] fix videos with age gate
* [kaltura] Make embed code alternatives actually work
* [kaltura] Improve iframe extraction
* [dispeak] Improve FLV extraction
* [dispeak] DRY and update tests
* [gdcvault] Add support for HTML5 videos
* [funimation] Add support for optional lang code in URLs
* [medaltv] Relax _VALID_URL
- [blinkx] Remove extractor
* [orf:radio] Switch download URLs to HTTPS
+ [generic] Add Referer header for direct videojs download URLs
+ [vk] Add support for sibnet embeds
+ [generic] Add support for sibnet embeds
* [phoenix] Fix extraction
* [generic] Add support for og:audio
* [vivo] Add support for vivo.st
* [eroprofile] Fix extraction
* [playstuff] Add extractor
* [shahid] relax _VALID_URL
* [redbulltv] fix embed data extraction
* [vimeo] fix vimeo pro embed extraction
* [twitch:clips] Add access token query to download URLs
* [twitch:clips] Improve extraction
* [ted] Prefer own formats over external sources
* [ustream] Detect https embeds
* [ard] Relax _VALID_URL and fix video ids

version 2021.04.01
Core
- Removed Herobrine
Extractor
* [youtube] fixed GDPR consent workaround
* [instagram] improve title extraction and extract duration
* [francetvinfo] improve video ID extraction
* [vlive] merge all updates from YTDL

version 2021.03.30
Core
* `--ie-key` commandline option for selecting specific extractor
Extractor
* [tiktok] detect private videos
* [dw:article] fix extractor
+ [patroniteaudio] added extractor
+ [sbs] Add support for ondemand watch URLs
* [picarto] Fix live stream extraction
* [vimeo] Fix unlisted video extraction
* [ard] Improve clip id extraction
+ [zoom] Add support for zoom.us
* [bbc] Fix BBC IPlayer Episodes/Group extraction
* [zingmp3] Fix extraction
* [youtube] added workaround for cookie consent

version 2021.03.21
Core
* [playwright] More verbose errors
- Removed a lot of deprecated platform support code
* New win32 exe build system
+ Support for BitTorrent formats
+ Support for VTT subtitles in m3u8 (HLS) manifests
+ `release_timestamp` meta field
Extractor
+ [acast:player] new extractor
+ [videotarget] new extractor
* [youtube] caching extracted signature functions
* [go] fix extraction
* [youtube] more descriptive geo-lock messages (with countries)
* [polskieradio] podcast support
* [onnetwork] refactored extraction
+ [tiktok] hashtag and music extractors
* [peertube] bittorrent formats
* [generic] detecting bittorrent manifest files
+ bittorrent magnet extractor
* [generic] extracting mpd manifests properly
* [youtube] better signature handling for DASH formats
* [youtube] some DASH formats are now just static files
+ [polskieradio] radiokierowcow.pl extractor
* [pulsevideo] unduplicating formats
+ [tvp:embed] extracting video subtitles
+ [bandaichannel] Add new extractor
* [urplay] fix episode data extraction
* [stretchinternet] Fix extraction
* [zdf] Rework extractors
+ [bbc] add support for BBC Reel videos
* [9c9media] fix extraction for videos with multiple ContentPackages
* [voxmedia] fix volume embed extraction
* [trovo] Add Origin header to VOD formats
* [cbs] add support for Paramount+
* [bilibili] fix video info extraction
* [pornhub] Extract formats from get_media end point
* [pornhub] Detect flagged videos
* [bandcamp] Extract release_timestamp
* [shahid] fix format extraction
* [fujitv] fix HLS formats extension
* [tver] improve title extraction
* [pinterest] reduce the number of HLS format requests
* [sportdeutschland] fix extraction
* [southpark] Fix extraction and add support for southparkstudios.com
* [rtve] improve extraction
* [applepodcasts] fix extraction
* [svtplay] Improve extraction
* [mlb] fix video extraction
* [vvvvid] fix kenc format extraction
* [vimeo:album] Fix extraction for albums with number of videos multiple to page size
* [peertube] improve thumbnail extraction
* [yandexmusic] Refactor and add support for artist's tracks and albums
* [yandexmusic:album] Improve album title extraction
* [yandexmusic] DRY _VALID_URL base
* [yandexmusic] Add support for music.yandex.com
* [yandexmusic:playlist] Request missing tracks in chunks
- [tvnplayer] removed extractor
* [youtube] meaningful error for age-gated no-embed videos

version 2021.03.01
Extractor
* [cda] logging in with a user account
* [peertube] playlist, channel and account extractor

version 2021.02.27
Core
+ Use proxy sites option
Extractor
+ Nitter extractor

version 2021.02.26
A lot of changes merged back from youtube-dl, thanks to the Copykitku project
Core
+ [postprocessor/embedthumbnail] Recognize atomicparsley binary in lowercase
* Introduce --output-na-placeholder (https://github.com/ytdl-org/youtube-dl/issues/27896)
* Protect from infinite recursion due to recursively nested
playlists (https://github.com/ytdl-org/youtube-dl/issues/27833)
* Ignore failure to create existing directory (https://github.com/ytdl-org/youtube-dl/issues/27811)
* Raise syntax error for format selection expressions with multiple + operators (https://github.com/ytdl-org/youtube-dl/issues/27803)
* [downloader/hls] Disable decryption in tests (https://github.com/ytdl-org/youtube-dl/issues/27660)
+ [utils] Add a function to clean podcast URLs
* [utils] Accept only supported protocols in url_or_none
* Allow format filtering using audio language (https://github.com/ytdl-org/youtube-dl/issues/16209)
* [common] Remove unwanted query params from unsigned akamai manifest URLs
* [extractor/common] Improve JSON-LD interaction statistic extraction (https://github.com/ytdl-org/youtube-dl/issues/23306)
* [downloader/hls] Delegate manifests with media initialization to ffmpeg
+ [extractor/common] Document duration meta field for playlists
* Improve thumbnail filename deducing (https://github.com/ytdl-org/youtube-dl/issues/26010, https://github.com/ytdl-org/youtube-dl/issues/27244)
* [extractor/common] Fix inline HTML5 media tags processing (https://github.com/ytdl-org/youtube-dl/issues/27345)
* [extractor/common] Extract timestamp from Last-Modified header
+ [extractor/common] Add support for dl8-* media tags (https://github.com/ytdl-org/youtube-dl/issues/27283)
* [extractor/common] Fix media type extraction for HTML5 media tags
in start/end form
* [extractor/common] Improve Akamai HTTP format extraction
    * Allow m3u8 manifest without an additional audio format
    * Fix extraction for qualities starting with a number
* Write static debug to stderr and respect quiet for dynamic debug
(https://github.com/ytdl-org/youtube-dl/issues/14579, https://github.com/ytdl-org/youtube-dl/issues/22593)
* [downloader/fragment] Set final file's mtime according to last fragment's
Last-Modified header (https://github.com/ytdl-org/youtube-dl/issues/11718, https://github.com/ytdl-org/youtube-dl/issues/18384, https://github.com/ytdl-org/youtube-dl/issues/27138)
+ [extractor/common] Add generic support for akamai HTTP format extraction
* [downloader/http] Fix crash during urlopen caused by missing reason
of URLError
* Fix --ignore-errors for playlists with generator-based entries
of url_transparent (https://github.com/ytdl-org/youtube-dl/issues/27064)
* [extractor/common] Output error for invalid URLs in _is_valid_url (https://github.com/ytdl-org/youtube-dl/issues/21400,
https://github.com/ytdl-org/youtube-dl/issues/24151, https://github.com/ytdl-org/youtube-dl/issues/25617, https://github.com/ytdl-org/youtube-dl/issues/25618, https://github.com/ytdl-org/youtube-dl/issues/25586, https://github.com/ytdl-org/youtube-dl/issues/26068, https://github.com/ytdl-org/youtube-dl/issues/27072)
+ [extractor/common] next.js data search function
- Removed Herobrine
Extractor
* [apa] Fix and improve extraction (https://github.com/ytdl-org/youtube-dl/issues/27750)
+ [youporn] Extract duration (https://github.com/ytdl-org/youtube-dl/issues/28019)
+ [samplefocus] Add support for samplefocus.com (https://github.com/ytdl-org/youtube-dl/issues/27763)
+ [vimeo] Add support for unlisted video source format extraction
* [viki] Improve extraction (https://github.com/ytdl-org/youtube-dl/issues/26522, https://github.com/ytdl-org/youtube-dl/issues/28203)
    * Extract uploader URL and episode number
    * Report login required error
    + Extract 480p formats
    * Fix API v4 calls
* [ninegag] Unescape title (https://github.com/ytdl-org/youtube-dl/issues/28201)
+ [dplay] Add support for de.hgtv.com (https://github.com/ytdl-org/youtube-dl/issues/28182)
+ [dplay] Add support for discoveryplus.com (https://github.com/ytdl-org/youtube-dl/issues/24698)
+ [simplecast] Add support for simplecast.com (https://github.com/ytdl-org/youtube-dl/issues/24107)
* [yandexmusic:playlist] Request missing tracks in chunks (https://github.com/ytdl-org/youtube-dl/issues/27355, https://github.com/ytdl-org/youtube-dl/issues/28184)
+ [storyfire] Add support for storyfire.com (https://github.com/ytdl-org/youtube-dl/issues/25628, https://github.com/ytdl-org/youtube-dl/issues/26349)
+ [zhihu] Add support for zhihu.com (https://github.com/ytdl-org/youtube-dl/issues/28177)
* [ccma] Fix timestamp parsing in python 2
+ [videopress] Add support for video.wordpress.com
* [kakao] Improve info extraction and detect geo restriction (https://github.com/ytdl-org/youtube-dl/issues/26577)
* [xboxclips] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27151)
* [ard] Improve formats extraction (https://github.com/ytdl-org/youtube-dl/issues/28155)
+ [canvas] Add support for dagelijksekost.een.be (https://github.com/ytdl-org/youtube-dl/issues/28119)
* [ign] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/24771)
+ [xhamster] Extract format filesize
+ [xhamster] Extract formats from xplayer settings (https://github.com/ytdl-org/youtube-dl/issues/28114)
* [archiveorg] Fix and improve extraction (https://github.com/ytdl-org/youtube-dl/issues/21330, https://github.com/ytdl-org/youtube-dl/issues/23586, https://github.com/ytdl-org/youtube-dl/issues/25277, https://github.com/ytdl-org/youtube-dl/issues/26780,
https://github.com/ytdl-org/youtube-dl/issues/27109, https://github.com/ytdl-org/youtube-dl/issues/27236, https://github.com/ytdl-org/youtube-dl/issues/28063)
* [urplay] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/28073, https://github.com/ytdl-org/youtube-dl/issues/28074)
* [azmedien] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/28064)
* [pornhub] Implement lazy playlist extraction
* [svtplay] Fix video id extraction (https://github.com/ytdl-org/youtube-dl/issues/28058)
+ [pornhub] Add support for authentication (https://github.com/ytdl-org/youtube-dl/issues/18797, https://github.com/ytdl-org/youtube-dl/issues/21416, https://github.com/ytdl-org/youtube-dl/issues/24294)
* [pornhub:user] Improve paging
+ [pornhub:user] Add support for URLs unavailable via /videos page (https://github.com/ytdl-org/youtube-dl/issues/27853)
+ [bravotv] Add support for oxygen.com (https://github.com/ytdl-org/youtube-dl/issues/13357, https://github.com/ytdl-org/youtube-dl/issues/22500)
* [ccma] Improve metadata extraction (https://github.com/ytdl-org/youtube-dl/issues/27994)
    + Extract age limit, alt title, categories, series and episode number
    * Fix timestamp and multiple subtitles extraction
* [egghead] Update API domain (https://github.com/ytdl-org/youtube-dl/issues/28038)
- [vidzi] Remove extractor (https://github.com/ytdl-org/youtube-dl/issues/12629)
* [vidio] Improve metadata extraction
+ [vvvvid] Add support for youtube embeds (https://github.com/ytdl-org/youtube-dl/issues/27825)
* [vlive] Fix error message decoding for python 2 (https://github.com/ytdl-org/youtube-dl/issues/28004)
+ [awaan] Extract uploader id (https://github.com/ytdl-org/youtube-dl/issues/27963)
+ [medialaan] Add support for DPG Media MyChannels based websites (https://github.com/ytdl-org/youtube-dl/issues/14871, https://github.com/ytdl-org/youtube-dl/issues/15597,
https://github.com/ytdl-org/youtube-dl/issues/16106, https://github.com/ytdl-org/youtube-dl/issues/16489)
* [abcnews] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/12394, https://github.com/ytdl-org/youtube-dl/issues/27920)
* [AMP] Fix upload date and timestamp extraction (https://github.com/ytdl-org/youtube-dl/issues/27970)
* [tv4] Relax URL regular expression (https://github.com/ytdl-org/youtube-dl/issues/27964)
+ [tv2] Add support for mtvuutiset.fi (https://github.com/ytdl-org/youtube-dl/issues/27744)
* [adn] Improve login warning reporting
* [zype] Fix uplynk id extraction (https://github.com/ytdl-org/youtube-dl/issues/27956)
+ [adn] Add support for authentication (https://github.com/ytdl-org/youtube-dl/issues/17091, https://github.com/ytdl-org/youtube-dl/issues/27841, https://github.com/ytdl-org/youtube-dl/issues/27937)
* [franceculture] Make thumbnail optional (https://github.com/ytdl-org/youtube-dl/issues/18807)
* [franceculture] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27891, https://github.com/ytdl-org/youtube-dl/issues/27903)
* [njpwworld] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27890)
* [comedycentral] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27905)
* [wat] Fix format extraction (https://github.com/ytdl-org/youtube-dl/issues/27901)
+ [americastestkitchen:season] Add support for seasons (https://github.com/ytdl-org/youtube-dl/issues/27861)
+ [trovo] Add support for trovo.live (https://github.com/ytdl-org/youtube-dl/issues/26125)
+ [aol] Add support for yahoo videos (https://github.com/ytdl-org/youtube-dl/issues/26650)
* [yahoo] Fix single video extraction
* [9gag] Fix and improve extraction (https://github.com/ytdl-org/youtube-dl/issues/23022)
* [americastestkitchen] Improve metadata extraction for ATK episodes (https://github.com/ytdl-org/youtube-dl/issues/27860)
* [aljazeera] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/20911, https://github.com/ytdl-org/youtube-dl/issues/27779)
+ [minds] Add support for minds.com (https://github.com/ytdl-org/youtube-dl/issues/17934)
* [ard] Fix title and description extraction (https://github.com/ytdl-org/youtube-dl/issues/27761)
+ [spotify] Add support for Spotify Podcasts (https://github.com/ytdl-org/youtube-dl/issues/27443)
+ [animeondemand] Add support for lazy playlist extraction (https://github.com/ytdl-org/youtube-dl/issues/27829)
* [youporn] Restrict fallback download URL (https://github.com/ytdl-org/youtube-dl/issues/27822)
* [youporn] Improve height and tbr extraction (https://github.com/ytdl-org/youtube-dl/issues/20425, https://github.com/ytdl-org/youtube-dl/issues/23659)
* [youporn] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27822)
+ [twitter] Add support for unified cards (https://github.com/ytdl-org/youtube-dl/issues/27826)
+ [twitch] Add Authorization header with OAuth token for GraphQL requests
(https://github.com/ytdl-org/youtube-dl/issues/27790)
* [mixcloud:playlist:base] Extract video id in flat playlist mode (https://github.com/ytdl-org/youtube-dl/issues/27787)
* [cspan] Improve info extraction (https://github.com/ytdl-org/youtube-dl/issues/27791)
* [adn] Improve info extraction
* [adn] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/26963, https://github.com/ytdl-org/youtube-dl/issues/27732)
* [twitch] Improve login error extraction
* [twitch] Fix authentication (https://github.com/ytdl-org/youtube-dl/issues/27743)
* [3qsdn] Improve extraction (https://github.com/ytdl-org/youtube-dl/issues/21058)
* [peertube] Extract formats from streamingPlaylists (https://github.com/ytdl-org/youtube-dl/issues/26002, https://github.com/ytdl-org/youtube-dl/issues/27586, https://github.com/ytdl-org/youtube-dl/issues/27728)
* [khanacademy] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/2887, https://github.com/ytdl-org/youtube-dl/issues/26803)
* [spike] Update Paramount Network feed URL (https://github.com/ytdl-org/youtube-dl/issues/27715)
* [rai] Improve subtitles extraction (https://github.com/ytdl-org/youtube-dl/issues/27698, https://github.com/ytdl-org/youtube-dl/issues/27705)
* [canvas] Match only supported VRT NU URLs (https://github.com/ytdl-org/youtube-dl/issues/27707)
+ [bibeltv] Add support for bibeltv.de (https://github.com/ytdl-org/youtube-dl/issues/14361)
+ [bfmtv] Add support for bfmtv.com (https://github.com/ytdl-org/youtube-dl/issues/16053, https://github.com/ytdl-org/youtube-dl/issues/26615)
+ [sbs] Add support for ondemand play and news embed URLs (https://github.com/ytdl-org/youtube-dl/issues/17650, https://github.com/ytdl-org/youtube-dl/issues/27629)
* [twitch] Drop legacy kraken API v5 code altogether and refactor
* [twitch:vod] Switch to GraphQL for video metadata
* [canvas] Fix VRT NU extraction (https://github.com/ytdl-org/youtube-dl/issues/26957, https://github.com/ytdl-org/youtube-dl/issues/27053)
* [twitch] Switch access token to GraphQL and refactor (https://github.com/ytdl-org/youtube-dl/issues/27646)
+ [rai] Detect ContentItem in iframe (https://github.com/ytdl-org/youtube-dl/issues/12652, https://github.com/ytdl-org/youtube-dl/issues/27673)
* [ketnet] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27662)
+ [dplay] Add support for Discovery+ domains (https://github.com/ytdl-org/youtube-dl/issues/27680)
* [motherless] Improve extraction (https://github.com/ytdl-org/youtube-dl/issues/26495, https://github.com/ytdl-org/youtube-dl/issues/27450)
* [motherless] Fix recent videos upload date extraction (https://github.com/ytdl-org/youtube-dl/issues/27661)
* [nrk] Fix extraction for videos without a legalAge rating
- [googleplus] Remove extractor (https://github.com/ytdl-org/youtube-dl/issues/4955, https://github.com/ytdl-org/youtube-dl/issues/7400)
+ [applepodcasts] Add support for podcasts.apple.com (https://github.com/ytdl-org/youtube-dl/issues/25918)
+ [googlepodcasts] Add support for podcasts.google.com
+ [iheart] Add support for iheart.com (https://github.com/ytdl-org/youtube-dl/issues/27037)
* [acast] Clean podcast URLs
* [stitcher] Clean podcast URLs
+ [xfileshare] Add support for aparat.cam (https://github.com/ytdl-org/youtube-dl/issues/27651)
+ [twitter] Add support for summary card (https://github.com/ytdl-org/youtube-dl/issues/25121)
* [twitter] Try to use a Generic fallback for unknown twitter cards (https://github.com/ytdl-org/youtube-dl/issues/25982)
+ [stitcher] Add support for shows and show metadata extraction (https://github.com/ytdl-org/youtube-dl/issues/20510)
* [stv] Improve episode id extraction (https://github.com/ytdl-org/youtube-dl/issues/23083)
* [nrk] Improve series metadata extraction (https://github.com/ytdl-org/youtube-dl/issues/27473)
+ [nrk] Extract subtitles
* [nrk] Fix age limit extraction
* [nrk] Improve video id extraction
+ [nrk] Add support for podcasts (https://github.com/ytdl-org/youtube-dl/issues/27634, https://github.com/ytdl-org/youtube-dl/issues/27635)
* [nrk] Generalize and delegate all item extractors to nrk
+ [nrk] Add support for mp3 formats
* [nrktv] Switch to playback endpoint
* [vvvvid] Fix season metadata extraction (https://github.com/ytdl-org/youtube-dl/issues/18130)
* [stitcher] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/20811, https://github.com/ytdl-org/youtube-dl/issues/27606)
* [acast] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/21444, https://github.com/ytdl-org/youtube-dl/issues/27612, https://github.com/ytdl-org/youtube-dl/issues/27613)
+ [arcpublishing] Add support for arcpublishing.com (https://github.com/ytdl-org/youtube-dl/issues/2298, https://github.com/ytdl-org/youtube-dl/issues/9340, https://github.com/ytdl-org/youtube-dl/issues/17200)
+ [sky] Add support for Sports News articles and Brightcove videos (https://github.com/ytdl-org/youtube-dl/issues/13054)
+ [vvvvid] Extract akamai formats
* [vvvvid] Skip unplayable episodes (https://github.com/ytdl-org/youtube-dl/issues/27599)
* [yandexvideo] Fix extraction for Python 3.4
+ [redditr] Extract all thumbnails (https://github.com/ytdl-org/youtube-dl/issues/27503)
* [vvvvid] Improve info extraction
+ [vvvvid] Add support for playlists (https://github.com/ytdl-org/youtube-dl/issues/18130, https://github.com/ytdl-org/youtube-dl/issues/27574)
* [yandexvideo] Use old API call as fallback
* [yandexvideo] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/25000)
- [nbc] Remove CSNNE extractor
* [nbc] Fix NBCSport VPlayer URL extraction (https://github.com/ytdl-org/youtube-dl/issues/16640)
+ [aenetworks] Add support for biography.com (https://github.com/ytdl-org/youtube-dl/issues/3863)
* [uktvplay] Match new video URLs (https://github.com/ytdl-org/youtube-dl/issues/17909)
* [sevenplay] Detect API errors
* [tenplay] Fix format extraction (https://github.com/ytdl-org/youtube-dl/issues/26653)
* [brightcove] Raise error for DRM protected videos (https://github.com/ytdl-org/youtube-dl/issues/23467, https://github.com/ytdl-org/youtube-dl/issues/27568)
* [aparat] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/22285, https://github.com/ytdl-org/youtube-dl/issues/22611, https://github.com/ytdl-org/youtube-dl/issues/23348, https://github.com/ytdl-org/youtube-dl/issues/24354, https://github.com/ytdl-org/youtube-dl/issues/24591, https://github.com/ytdl-org/youtube-dl/issues/24904,
https://github.com/ytdl-org/youtube-dl/issues/25418, https://github.com/ytdl-org/youtube-dl/issues/26070, https://github.com/ytdl-org/youtube-dl/issues/26350, https://github.com/ytdl-org/youtube-dl/issues/26738, https://github.com/ytdl-org/youtube-dl/issues/27563)
- [brightcove] Remove sonyliv specific code
* [piksel] Improve format extraction
+ [zype] Add support for uplynk videos
+ [toggle] Add support for live.mewatch.sg (https://github.com/ytdl-org/youtube-dl/issues/27555)
+ [go] Add support for fxnow.fxnetworks.com (https://github.com/ytdl-org/youtube-dl/issues/13972, https://github.com/ytdl-org/youtube-dl/issues/22467, https://github.com/ytdl-org/youtube-dl/issues/23754, https://github.com/ytdl-org/youtube-dl/issues/26826)
* [teachable] Improve embed detection (https://github.com/ytdl-org/youtube-dl/issues/26923)
* [mitele] Fix free video extraction (https://github.com/ytdl-org/youtube-dl/issues/24624, https://github.com/ytdl-org/youtube-dl/issues/25827, https://github.com/ytdl-org/youtube-dl/issues/26757)
* [telecinco] Fix extraction
* [youtube] Update invidious.snopyta.org (https://github.com/ytdl-org/youtube-dl/issues/22667)
* [amcnetworks] Improve auth only video detection (https://github.com/ytdl-org/youtube-dl/issues/27548)
+ [generic] Add support for VHX Embeds (https://github.com/ytdl-org/youtube-dl/issues/27546)
* [instagram] Fix comment count extraction
+ [instagram] Add support for reel URLs (https://github.com/ytdl-org/youtube-dl/issues/26234, https://github.com/ytdl-org/youtube-dl/issues/26250)
* [bbc] Switch to media selector v6 (https://github.com/ytdl-org/youtube-dl/issues/23232, https://github.com/ytdl-org/youtube-dl/issues/23933, https://github.com/ytdl-org/youtube-dl/issues/26303, https://github.com/ytdl-org/youtube-dl/issues/26432, https://github.com/ytdl-org/youtube-dl/issues/26821,
https://github.com/ytdl-org/youtube-dl/issues/27538)
* [instagram] Improve thumbnail extraction
* [instagram] Fix extraction when authenticated (https://github.com/ytdl-org/youtube-dl/issues/22880, https://github.com/ytdl-org/youtube-dl/issues/26377, https://github.com/ytdl-org/youtube-dl/issues/26981,
https://github.com/ytdl-org/youtube-dl/issues/27422)
* [spankbang:playlist] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/24087)
+ [spankbang] Add support for playlist videos
* [pornhub] Improve like and dislike count extraction (https://github.com/ytdl-org/youtube-dl/issues/27356)
* [pornhub] Fix lq formats extraction (https://github.com/ytdl-org/youtube-dl/issues/27386, https://github.com/ytdl-org/youtube-dl/issues/27393)
+ [bongacams] Add support for bongacams.com (https://github.com/ytdl-org/youtube-dl/issues/27440)
* [theweatherchannel] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/25930, https://github.com/ytdl-org/youtube-dl/issues/26051)
+ [sprout] Add support for Universal Kids (https://github.com/ytdl-org/youtube-dl/issues/22518)
* [theplatform] Allow passing geo bypass countries from other extractors
+ [wistia] Add support for playlists (https://github.com/ytdl-org/youtube-dl/issues/27533)
+ [ctv] Add support for ctv.ca (https://github.com/ytdl-org/youtube-dl/issues/27525)
* [9c9media] Improve info extraction
* [sonyliv] Fix title for movies
* [sonyliv] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/25667)
* [streetvoice] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27455, https://github.com/ytdl-org/youtube-dl/issues/27492)
+ [facebook] Add support for watchparty pages (https://github.com/ytdl-org/youtube-dl/issues/27507)
* [cbslocal] Fix video extraction
+ [brightcove] Add another method to extract policyKey
* [mewatch] Relax URL regular expression (https://github.com/ytdl-org/youtube-dl/issues/27506)
- [tastytrade] Remove extractor (https://github.com/ytdl-org/youtube-dl/issues/25716)
* [niconico] Fix playlist extraction (https://github.com/ytdl-org/youtube-dl/issues/27428)
- [everyonesmixtape] Remove extractor
- [kanalplay] Remove extractor
* [arkena] Fix extraction
* [nba] Rewrite extractor
* [turner] Improve info extraction
* [generic] Improve RSS age limit extraction
* [generic] Fix RSS itunes thumbnail extraction (https://github.com/ytdl-org/youtube-dl/issues/27405)
+ [redditr] Extract duration (https://github.com/ytdl-org/youtube-dl/issues/27426)
- [zaq1] Remove extractor
+ [asiancrush] Add support for retrocrush.tv
* [asiancrush] Fix extraction
- [noco] Remove extractor (https://github.com/ytdl-org/youtube-dl/issues/10864)
* [nfl] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/22245)
* [skysports] Relax URL regular expression (https://github.com/ytdl-org/youtube-dl/issues/27435)
+ [tv5unis] Add support for tv5unis.ca (https://github.com/ytdl-org/youtube-dl/issues/22399, https://github.com/ytdl-org/youtube-dl/issues/24890)
+ [videomore] Add support for more.tv (https://github.com/ytdl-org/youtube-dl/issues/27088)
+ [nhk:program] Add support for audio programs and program clips
+ [nhk] Add support for NHK video programs (https://github.com/ytdl-org/youtube-dl/issues/27230)
* [mdr] Bypass geo restriction
* [mdr] Improve extraction (https://github.com/ytdl-org/youtube-dl/issues/24346, https://github.com/ytdl-org/youtube-dl/issues/26873)
* [eporner] Fix view count extraction and make optional (https://github.com/ytdl-org/youtube-dl/issues/23306)
+ [eporner] Extend URL regular expression
* [eporner] Fix hash extraction and extend _VALID_URL (https://github.com/ytdl-org/youtube-dl/issues/27396)
* [slideslive] Use m3u8 entry protocol for m3u8 formats (https://github.com/ytdl-org/youtube-dl/issues/27400)
* [twitcasting] Fix format extraction and improve info extraction (https://github.com/ytdl-org/youtube-dl/issues/24868)
* [linuxacademy] Fix authentication and extraction (https://github.com/ytdl-org/youtube-dl/issues/21129, https://github.com/ytdl-org/youtube-dl/issues/26223, https://github.com/ytdl-org/youtube-dl/issues/27402)
* [itv] Clean description from HTML tags (https://github.com/ytdl-org/youtube-dl/issues/27399)
* [vlive] Sort live formats (https://github.com/ytdl-org/youtube-dl/issues/27404)
* [hotstar] Fix and improve extraction
* Fix format extraction (https://github.com/ytdl-org/youtube-dl/issues/26690)
+ Extract thumbnail URL (https://github.com/ytdl-org/youtube-dl/issues/16079, https://github.com/ytdl-org/youtube-dl/issues/20412)
+ Add support for country specific playlist URLs (https://github.com/ytdl-org/youtube-dl/issues/23496)
* Select the last id in video URL (https://github.com/ytdl-org/youtube-dl/issues/26412)
+ [youtube] Add some invidious instances (https://github.com/ytdl-org/youtube-dl/issues/27373)
+ [ruutu] Extract more metadata
+ [ruutu] Detect non-free videos (https://github.com/ytdl-org/youtube-dl/issues/21154)
* [ruutu] Authenticate format URLs (https://github.com/ytdl-org/youtube-dl/issues/21031, https://github.com/ytdl-org/youtube-dl/issues/26782)
+ [ruutu] Add support for static.nelonenmedia.fi (https://github.com/ytdl-org/youtube-dl/issues/25412)
+ [ruutu] Extend URL regular expression (https://github.com/ytdl-org/youtube-dl/issues/24839)
+ [facebook] Add support for archived live video URLs (https://github.com/ytdl-org/youtube-dl/issues/15859)
* [wdr] Improve overall extraction
+ [wdr] Extend subtitles extraction (https://github.com/ytdl-org/youtube-dl/issues/22672, https://github.com/ytdl-org/youtube-dl/issues/22723)
+ [facebook] Add support for videos attached to Relay based story pages
(https://github.com/ytdl-org/youtube-dl/issues/10795)
+ [wdr:page] Add support for kinder.wdr.de (https://github.com/ytdl-org/youtube-dl/issues/27350)
+ [facebook] Add another regular expression for handleServerJS
* [facebook] Fix embed page extraction
+ [facebook] Add support for Relay post pages (https://github.com/ytdl-org/youtube-dl/issues/26935)
+ [facebook] Add support for watch videos (https://github.com/ytdl-org/youtube-dl/issues/22795, https://github.com/ytdl-org/youtube-dl/issues/27062)
+ [facebook] Add support for group posts with multiple videos (https://github.com/ytdl-org/youtube-dl/issues/19131)
* [itv] Fix series metadata extraction (https://github.com/ytdl-org/youtube-dl/issues/26897)
- [itv] Remove old extraction method (https://github.com/ytdl-org/youtube-dl/issues/23177)
* [facebook] Redirect mobile URLs to desktop URLs (https://github.com/ytdl-org/youtube-dl/issues/24831, https://github.com/ytdl-org/youtube-dl/issues/25624)
+ [facebook] Add support for Relay based pages (https://github.com/ytdl-org/youtube-dl/issues/26823)
* [facebook] Try to reduce unnecessary tahoe requests
- [facebook] Remove hardcoded Chrome User-Agent (https://github.com/ytdl-org/youtube-dl/issues/18974, https://github.com/ytdl-org/youtube-dl/issues/25411, https://github.com/ytdl-org/youtube-dl/issues/26958,
https://github.com/ytdl-org/youtube-dl/issues/27329)
- [smotri] Remove extractor (https://github.com/ytdl-org/youtube-dl/issues/27358)
- [beampro] Remove extractor (https://github.com/ytdl-org/youtube-dl/issues/17290, https://github.com/ytdl-org/youtube-dl/issues/22871, https://github.com/ytdl-org/youtube-dl/issues/23020, https://github.com/ytdl-org/youtube-dl/issues/23061, https://github.com/ytdl-org/youtube-dl/issues/26099)
+ [tubitv] Extract release year (https://github.com/ytdl-org/youtube-dl/issues/27317)
* [amcnetworks] Fix free content extraction (https://github.com/ytdl-org/youtube-dl/issues/20354)
+ [telequebec] Add support for video.telequebec.tv (https://github.com/ytdl-org/youtube-dl/issues/27339)
* [telequebec] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/25733, https://github.com/ytdl-org/youtube-dl/issues/26883)
* [tvplay:home] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/21153)
* [americastestkitchen] Fix extraction and add support
for Cook's Country and Cook's Illustrated (https://github.com/ytdl-org/youtube-dl/issues/17234, https://github.com/ytdl-org/youtube-dl/issues/27322)
+ [slideslive] Add support for yoda service videos and extract subtitles
(https://github.com/ytdl-org/youtube-dl/issues/27323)
* [aenetworks] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/23363, https://github.com/ytdl-org/youtube-dl/issues/23390, https://github.com/ytdl-org/youtube-dl/issues/26795, https://github.com/ytdl-org/youtube-dl/issues/26985)
* Fix Fastly format extraction
+ Add support for play and watch subdomains
+ Extract series metadata
+ [generic] Extract RSS video description, timestamp and itunes metadata
(https://github.com/ytdl-org/youtube-dl/issues/27177)
* [nrk] Reduce the number of instalments and episodes requests
* [nrk] Improve extraction
* Improve format extraction for old akamai formats
+ Add is_live value to entry info dict
* Request instalments only when available
* Fix skole extraction
+ [peertube] Extract fps
+ [peertube] Recognize audio-only formats (https://github.com/ytdl-org/youtube-dl/issues/27295)
* [teachable:course] Improve extraction (https://github.com/ytdl-org/youtube-dl/issues/24507, https://github.com/ytdl-org/youtube-dl/issues/27286)
* [nrk] Improve error extraction
* [nrktv:series] Improve extraction (https://github.com/ytdl-org/youtube-dl/issues/21926)
* [nrktv:season] Improve extraction
* [nrk] Improve format extraction and geo-restriction detection (https://github.com/ytdl-org/youtube-dl/issues/24221)
* [pornhub] Handle HTTP errors gracefully (https://github.com/ytdl-org/youtube-dl/issues/26414)
* [nrktv] Relax URL regular expression (https://github.com/ytdl-org/youtube-dl/issues/27299, https://github.com/ytdl-org/youtube-dl/issues/26185)
+ [zdf] Extract webm formats (https://github.com/ytdl-org/youtube-dl/issues/26659)
+ [gamespot] Extract DASH and HTTP formats
+ [tver] Add support for tver.jp (https://github.com/ytdl-org/youtube-dl/issues/26662, https://github.com/ytdl-org/youtube-dl/issues/27284)
+ [pornhub] Add support for pornhub.org (https://github.com/ytdl-org/youtube-dl/issues/27276)
+ [tva] Add support for qub.ca (https://github.com/ytdl-org/youtube-dl/issues/27235)
+ [toggle] Detect DRM protected videos (https://github.com/ytdl-org/youtube-dl/issues/16479, https://github.com/ytdl-org/youtube-dl/issues/20805)
+ [toggle] Add support for new MeWatch URLs (https://github.com/ytdl-org/youtube-dl/issues/27256)
+ [cspan] Extract info from jwplayer data (https://github.com/ytdl-org/youtube-dl/issues/3672, https://github.com/ytdl-org/youtube-dl/issues/3734, https://github.com/ytdl-org/youtube-dl/issues/10638, https://github.com/ytdl-org/youtube-dl/issues/13030,
https://github.com/ytdl-org/youtube-dl/issues/18806, https://github.com/ytdl-org/youtube-dl/issues/23148, https://github.com/ytdl-org/youtube-dl/issues/24461, https://github.com/ytdl-org/youtube-dl/issues/26171, https://github.com/ytdl-org/youtube-dl/issues/26800, https://github.com/ytdl-org/youtube-dl/issues/27263)
* [cspan] Pass Referer header with format's video URL (https://github.com/ytdl-org/youtube-dl/issues/26032, https://github.com/ytdl-org/youtube-dl/issues/25729)
+ [mediaset] Add support for movie URLs (https://github.com/ytdl-org/youtube-dl/issues/27240)
* [drtv] Extend URL regular expression (https://github.com/ytdl-org/youtube-dl/issues/27243)
+ [ina] Add support for mobile URLs (https://github.com/ytdl-org/youtube-dl/issues/27229)
* [pornhub] Fix like and dislike count extraction (https://github.com/ytdl-org/youtube-dl/issues/27227, https://github.com/ytdl-org/youtube-dl/issues/27234)
* [youtube] Improve yt initial player response extraction (https://github.com/ytdl-org/youtube-dl/issues/27216)
* [videa] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/25650, https://github.com/ytdl-org/youtube-dl/issues/25973, https://github.com/ytdl-org/youtube-dl/issues/26301)
+ [spreaker] Add support for spreaker.com (https://github.com/ytdl-org/youtube-dl/issues/13480, https://github.com/ytdl-org/youtube-dl/issues/13877)
* [vlive] Improve extraction for geo-restricted videos
+ [vlive] Add support for post URLs (https://github.com/ytdl-org/youtube-dl/issues/27122, https://github.com/ytdl-org/youtube-dl/issues/27123)
* [viki] Fix video API request (https://github.com/ytdl-org/youtube-dl/issues/27184)
* [bbc] Fix BBC Three clip extraction
* [bbc] Fix BBC News videos extraction
+ [medaltv] Add support for medal.tv (https://github.com/ytdl-org/youtube-dl/issues/27149)
* [nrk] Fix extraction
+ [pinterest] Add support for large collections (more than 25 pins)
+ [franceinter] Extract thumbnail (https://github.com/ytdl-org/youtube-dl/issues/27153)
+ [box] Add support for box.com (https://github.com/ytdl-org/youtube-dl/issues/5949)
+ [nytimes] Add support for cooking.nytimes.com (https://github.com/ytdl-org/youtube-dl/issues/27112, https://github.com/ytdl-org/youtube-dl/issues/27143)
+ [rumble] Add support for embed pages (https://github.com/ytdl-org/youtube-dl/issues/10785)
+ [skyit] Add support for multiple Sky Italia websites (https://github.com/ytdl-org/youtube-dl/issues/26629)
+ [pinterest] Add support for pinterest.com (https://github.com/ytdl-org/youtube-dl/issues/25747)
+ [svtplay] Add support for svt.se/barnkanalen (https://github.com/ytdl-org/youtube-dl/issues/24817)
+ [svt] Extract timestamp (https://github.com/ytdl-org/youtube-dl/issues/27130)
* [svtplay] Improve thumbnail extraction (https://github.com/ytdl-org/youtube-dl/issues/27130)
* [infoq] Fix format extraction (https://github.com/ytdl-org/youtube-dl/issues/25984)
* [francetv] Update to fix thumbnail URL issue (https://github.com/ytdl-org/youtube-dl/issues/27120)
+ [discoverynetworks] Add support for new TLC/DMAX URLs (https://github.com/ytdl-org/youtube-dl/issues/27100)
* [rai] Fix protocol relative relinker URLs (https://github.com/ytdl-org/youtube-dl/issues/22766)
* [rai] Fix unavailable video format detection
* [rai] Improve extraction
* [rai] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27077)
* [viki] Improve format extraction
* [viki] Fix stream extraction from MPD (https://github.com/ytdl-org/youtube-dl/issues/27092)
+ [amara] Add support for amara.org (https://github.com/ytdl-org/youtube-dl/issues/20618)
* [vimeo:album] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27079)
* [mtv] Fix mgid extraction (https://github.com/ytdl-org/youtube-dl/issues/26841)
* [youporn] Fix upload date extraction
* [youporn] Make comment count optional (https://github.com/ytdl-org/youtube-dl/issues/26986)
* [arte] Rework extractors
* Reimplement embed and playlist extractors to delegate to the single
entrypoint artetv extractor
* Improve embeds detection (https://github.com/ytdl-org/youtube-dl/issues/27057)
+ [arte] Extract m3u8 formats (https://github.com/ytdl-org/youtube-dl/issues/27061)
* [mgtv] Fix format extraction (https://github.com/ytdl-org/youtube-dl/issues/26415)
* [francetv] Improve info extraction
+ [francetv] Add fallback video URL extraction (https://github.com/ytdl-org/youtube-dl/issues/27047)
* [spiegel] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/24206, https://github.com/ytdl-org/youtube-dl/issues/24767)
* [malltv] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/27035)
+ [bandcamp] Extract playlist description (https://github.com/ytdl-org/youtube-dl/issues/22684)
* [urplay] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/26828)
* [lrt] Fix extraction with empty tags (https://github.com/ytdl-org/youtube-dl/issues/20264)
+ [ndr:embed:base] Extract subtitles (https://github.com/ytdl-org/youtube-dl/issues/25447, https://github.com/ytdl-org/youtube-dl/issues/26106)
+ [servus] Add support for pm-wissen.com (https://github.com/ytdl-org/youtube-dl/issues/25869)
* [servus] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/26872, https://github.com/ytdl-org/youtube-dl/issues/26967, https://github.com/ytdl-org/youtube-dl/issues/26983, https://github.com/ytdl-org/youtube-dl/issues/27000)
* [xtube] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/26996)
* [lrt] Fix extraction
+ [condenast] Extract subtitles
* [condenast] Fix extraction
* [bandcamp] Fix extraction (https://github.com/ytdl-org/youtube-dl/issues/26681, https://github.com/ytdl-org/youtube-dl/issues/26684)
* [rai] Fix RaiPlay extraction (https://github.com/ytdl-org/youtube-dl/issues/26064, https://github.com/ytdl-org/youtube-dl/issues/26096)
* [vlive] Fix extraction
* [usanetwork] Fix extraction
* [nbc] Fix NBCNews/Today/MSNBC extraction
* [cnbc] Fix extraction
+ [transistorfm] new extractor
* [x-link] improved embed searching
+ [PolskaPress] new extractor
* [tvn24] next.js frontend extraction without playwright
version 2021.02.23
Extractor
* [youtube] new crypto
* [youtube] fixed automatic crypto extraction
version 2021.02.22
Extractor
* [peertube] reduced request amount
* [youtube] added new URL schemes
* [tvn24] fix next.js frontend extraction
* [wp.pl] added extractor
* [albicla] added extractor
* [x-news] now detects embeds
* [lurker] domain update
* [tiktok] user profile extractor
* [simplify] libsyn extraction
* [tvp:embed] new URL scheme
* [pulsembed] new extractor
* [vod.pl] fixed extraction
* [klip.rs] fixed extraction
* [tvn24] fixed Fakty extractor
* [youtube] new crypto
Core
* we no longer support Python 2.7
* jwplayer has a meaningful error if data wasn't found
version 2021.01.24
Extractor
* [youtube] (you've guessed it) new crypto
* [lurker] new extractor
* [wyborcza] new extractor(s)
* [okopress] new extractor
* [youtube] minor playlist improvements
* [tvn24] fixed extractor (now uses playwright)
* [tokfm] podcast extractor fixes
* [youtube] match `/shorts/` URLs
* [pornhub] refactor scraping protection workaround to playwright
Core:
* _json_ld now extracts podcast objects
* playwright wrapper (replaces PhantomJS)
* ...also, we removed PhantomJS
version 2021.01.16
Extractor
* [youtube] new crypto
* [tvp] Laura probably fixed something, I can't keep up
* [linkedin] post extractor
* [weibo] dash formats support
* [vimeo:review] fix videos with video password
* [gtv.org] extractor that **may** _sometimes_ work
* [youtube] oh cool, playlist and channels got fixed
* [youtube] youtube-music extractor
* [lbry] new extractor
* [embetty] new extractor, detection in genericie
* [heise] extracting embetty embeds
* [theguardian] new extractor(s)
* [tvp:series] API refactor (I KNEW IT)
* [tubafm] basic support
* [rmf] added extractors
* [cda] refactor to mobile JSON API
Core:
* _json_ld now handles multiple thumbnails
* normalized embed searching in genericie
version 2021.01.03
Extractor
* [youtube] new crypto
* [youtube] now extracts alt-title with the English title if it differs
* [tokfm] Added extractor
* [ipla] Added extractor (thankies, @ptrcnull!)
* [tvnplayer] Added extractor (again, thanks Pati!)
* [funkwhale] Added extractors
* [tvp] fixed tvpabc
* [onet] removed deprecated extractors
version 2020.12.11
Extractor
* [youtube] fixed playlist extraction
* [mastodon] Added extractor
* [generic] selfhosted services like peertube are now matched with a regexp, not a list
* [tvp] fixed extractor
* [tvp] refactored to new tvp API
* [tvp] Added regional pages support
* [tvp] Added livestream support
* [eskago] Added extractor
* [atttechchannel] fixed extractor
* [eurozet] player and article video extractors
* [onet] libsyn podcast support
* [polskieradio] Added livestream extractor
* [cda] Fixed adult pages
version 2020.11.27
Extractor
* [youtube] support for liked videos, watch later, video history,
subscriptions feed, fixed crash if no cipher on video
* [ninateka] added extractor
* [openfm] added extractor
version 2020.11.16
Extractor
* [youtube] fixed channel, playlist, search
version 2020.11.14
Extractor
* [youtube] fixed static crypto
* [youtube] new *tiny* crypto extracting system rewritten from bash by @selfisekai
version 2020.11.12
Extractor
* [tiktok] fixed extractor

View file

@@ -9,7 +9,7 @@ PREFIX ?= /usr/local
BINDIR ?= $(PREFIX)/bin
MANDIR ?= $(PREFIX)/man
SHAREDIR ?= $(PREFIX)/share
PYTHON ?= /usr/bin/env python
PYTHON ?= /usr/bin/env python3
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)

View file

@@ -1,5 +1,97 @@
# Haruhi-DL
# [Haruhi-DL](https://haruhi.download/)
[![build status](https://img.shields.io/gitlab/pipeline/laudom/haruhi-dl/master?gitlab_url=https%3A%2F%2Fgit.sakamoto.pl&style=flat-square)](https://git.sakamoto.pl/laudom/haruhi-dl/-/pipelines)
[![PyPI Downloads](https://img.shields.io/pypi/dm/haruhi-dl?style=flat-square)](https://pypi.org/project/haruhi-dl/)
[![License: LGPL 3.0 or later](https://img.shields.io/pypi/l/haruhi-dl?style=flat-square)](https://git.sakamoto.pl/laudom/haruhi-dl/-/blob/master/README.md)
[![Sasin stole 70 million PLN](https://img.shields.io/badge/Sasin-stole%2070%20million%20PLN-orange?style=flat-square)](https://www.planeta.pl/Wiadomosci/Polityka/Ile-kosztowaly-karty-wyborcze-Sasin-do-wiezienia-Wybory-odwolane)
[![Trans rights!](https://img.shields.io/badge/Trans-rights!-5BCEFA?style=flat-square)](http://transfuzja.org/en/artykuly/trans_people_in_poland/situation.htm)
This is a fork of [youtube-dl](https://yt-dl.org/), focused on bringing a fast, steady stream of updates. We'll do our best to merge patches for any site, not only YouTube.
If you want to contribute, send us a diff to [contribute@haruhi.download](mailto:contribute@haruhi.download)
Our main repository is on our GitLab: https://git.sakamoto.pl/laudompat/haruhi-dl
A Microsoft GitHub mirror exists as well: https://github.com/haruhi-dl/haruhi-dl
## Installing
System-specific ways:
- [Windows .exe files](https://git.sakamoto.pl/laudompat/haruhi-dl/-/releases) ([mirror](https://github.com/haruhi-dl/haruhi-dl/releases)) - just unpack and run the exe file in cmd/powershell! (ffmpeg/rtmpdump not included, playwright extractors won't work)
- [Arch Linux (AUR)](https://aur.archlinux.org/packages/haruhi-dl/) - `yay -S haruhi-dl` (managed by mlunax)
- [macOS (homebrew)](https://formulae.brew.sh/formula/haruhi-dl) - `brew install haruhi-dl` (managed by Homebrew)
haruhi-dl is also available on PyPI: [![version on PyPI](https://img.shields.io/pypi/v/haruhi-dl?style=flat-square)](https://pypi.org/project/haruhi-dl/)
Install release from PyPI on Python 3.x:
```sh
$ python3 -m pip install --upgrade haruhi-dl
```
Install from master (unstable) on Python 3.x:
```sh
$ python3 -m pip install --upgrade git+https://git.sakamoto.pl/laudompat/haruhi-dl.git
```
**Python 2 support is dropped, use Python 3.**
## Usage
```sh
$ haruhi-dl "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```
That's it! You just got rickrolled!
Full manual with all options:
```sh
$ haruhi-dl --help
```
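haruhi-dl can also be embedded in Python scripts. A minimal sketch, assuming the `HaruhiDL` class keeps the `YoutubeDL` embedding interface of upstream youtube-dl (the devscripts in this repo construct it the same way):
```python
from haruhi_dl import HaruhiDL

# download a single video with default options
with HaruhiDL(params={'quiet': False}) as hdl:
    hdl.download(['https://www.youtube.com/watch?v=dQw4w9WgXcQ'])
```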
## Differences from youtube-dl
_This is not a complete list._
- Changed license from Unlicense to LGPL 3.0
- Extracting and downloading video with subtitles from m3u8 (HLS) - this also includes subtitles from Twitter and some other services
- Support for BitTorrent protocol (only used when explicitly enabled by user with `--allow-p2p` or `--prefer-p2p`; aria2c required)
- Specific way to handle selfhosted services (untied to specific providers/domains, like PeerTube, Funkwhale, Mastodon)
- Specific way to handle content proxy sites (like Nitter for Twitter)
- Merging formats by codecs instead of file extensions, if possible (you'd rather have your AV1+opus downloads from YouTube be .webm than .mkv, wouldn't you?)
- New/improved/fixed extractors:
- PeerTube (extracting playlists, channels and user accounts, optionally downloading with BitTorrent)
- Funkwhale
- TikTok (extractors for user profiles, hashtags and music; all of them require Playwright, except single videos and music with `--no-playlist`)
- cda.pl
- Ipla
- Weibo (DASH formats)
- LinkedIn (videos from user posts)
- Acast
- Mastodon (including Pleroma, Gab Social, Soapbox)
- Ring Publishing (aka PulsEmbed, PulseVideo, OnetMVP; Ringier Axel Springer)
- TVP (support for TVPlayer2, client-rendered sites and TVP ABC, refactored some extractors to use mobile JSON API)
- TVN24 (support for main page, Fakty and magazine frontend)
- PolskieRadio
- Agora (wyborcza.pl video, wyborcza.pl/wysokieobcasy.pl/audycje.tokfm.pl podcasts, tuba.fm)
- sejm.gov.pl/senat.gov.pl
- Some improvements with handling JSON-LD
## Bug reports
Please send the bug details to <bug@haruhi.download> or on [Microsoft GitHub](https://github.com/haruhi-dl/haruhi-dl/issues).
## Contributing
If you want to contribute, send us a diff to <contribute@haruhi.download>, or submit a Pull Request on [our mirror at Microsoft GitHub](https://github.com/haruhi-dl/haruhi-dl).
Why contribute to this fork, and not youtube-dl?
- You make sure your contributions will always stay free: under the Unlicense, anyone can take your code, modify it, and close the source; LGPL 3.0 makes it clear that any contributions must be published.
## Donations
If my contributions helped you, please consider sending me a small tip.
[![Buy Me a Coffee at ko-fi.com](https://cdn.ko-fi.com/cdn/kofi1.png?v=2)](https://ko-fi.com/selfisekai)

View file

@@ -1,6 +1,10 @@
#!/usr/bin/env python
#!/usr/bin/env python3
import haruhi_dl
import sys
if __name__ == '__main__':
haruhi_dl.main()
if sys.version_info[0] == 2:
sys.exit('haruhi-dl no longer works on Python 2, use Python 3 instead')
else:
import haruhi_dl
haruhi_dl.main()

View file

@@ -1,37 +1,40 @@
#!/bin/bash
func="$(cat $1 | grep -P '[a-z]\=a\.split.*a\.join')"
echo $func
data="$(curl -s "https://www.youtube.com/s/player/$1/player_ias.vflset/en_GB/base.js")"
func="$(grep -P '[a-z]\=a\.split\([\"'"'"']{2}.*a\.join' <<< "$data")"
echo "full extracted function: $func"
obfuscatedName="$(echo $func | grep -Poh '\(""\);[A-Za-z]+' | sed -s 's/("");//')"
obfuscatedName="$(grep -Poh '\(""\);[A-Za-z]+' <<< "$func" | sed -s 's/("");//')"
obfuscatedFunc=$(cat "$1" | tr -d '\n' | grep -Poh "$obfuscatedName\=.*?}}")
mess="$(echo "$obfuscatedFunc" | grep -Poh "..:function\([a-z]+,[a-z]+\){var" | grep -Poh "^..")"
rev="$(echo "$obfuscatedFunc" | grep -Poh "..:function\([a-z]+\){[a-z]+.rev" | grep -Poh "^..")"
splice="$(echo "$obfuscatedFunc" | grep -Poh "..:function\([a-z]+\,[a-z]+\){[a-z]+\." | grep -Poh "^..")"
obfuscatedFunc=$(tr -d '\n' <<< "$data" | grep -Poh "$obfuscatedName\=.*?}}")
mess="$(grep -Poh "..:function\([a-z]+,[a-z]+\){var" <<< "$obfuscatedFunc" | grep -Poh "^..")"
rev="$(grep -Poh "..:function\([a-z]+\){[a-z]+.rev" <<< "$obfuscatedFunc" | grep -Poh "^..")"
splice="$(grep -Poh "..:function\([a-z]+\,[a-z]+\){[a-z]+\." <<< "$obfuscatedFunc" | grep -Poh "^..")"
echo "mess name: $mess"
echo "reverse name: $rev"
echo "splice name: $splice"
code="$(echo "$func" | sed -E 's/.*[a-z]+\.split\(""\);//;s/return.*//')"
code="$(sed -E 's/.*[a-z]+\.split\(""\);//;s/return.*//' <<< "$func")"
echo "---"
IFS=';'
for i in $code; do
num="$(echo "$i" | grep -Poh ',[0-9]+' | grep -Poh '[0-9]+')"
num="$(grep -Poh ',[0-9]+' <<< "$i" | grep -Poh '[0-9]+')"
if [[ "$i" == *"$splice"* ]]; then
echo "a=a[$num:]"
echo "a = a[$num:]"
elif [[ "$i" == *"$rev"* ]]; then
echo "a.reverse()"
elif [[ "$i" == *"$mess"* ]]; then
echo "a=self.mess(a,$num)"
echo "a = self.mess(a, $num)"
else
echo "UNKNOWN????"
fi
done
echo --- and now, JS
echo "--- and now, JS"
for i in $code; do
num="$(echo "$i" | grep -Poh ',[0-9]+' | grep -Poh '[0-9]+')"
num="$(grep -Poh ',[0-9]+' <<< "$i" | grep -Poh '[0-9]+')"
if [[ "$i" == *"$splice"* ]]; then
echo "a.splice(0,$num)"
elif [[ "$i" == *"$rev"* ]]; then
@@ -41,4 +44,4 @@ for i in $code; do
else
echo "UNKNOWN????"
fi
done
done
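For reference, this script only ever emits three operations: a slice (`splice`), a reversal (`reverse`), and the index swap the codebase calls `mess`. Below is a minimal Python sketch of how such an emitted sequence unscrambles a signature; the operation order and numbers are hypothetical (they vary per player version), and `mess` is assumed to be the usual first-element swap:
```python
def mess(a, b):
    # assumed swap helper: exchange element 0 with the element at index b % len(a)
    a[0], a[b % len(a)] = a[b % len(a)], a[0]
    return a

def decrypt_signature(sig):
    a = list(sig)
    a = a[2:]       # splice: drop the first two characters
    a.reverse()     # reverse
    a = mess(a, 3)  # mess: swap positions 0 and 3 % len(a)
    return ''.join(a)
```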

View file

@@ -1,433 +0,0 @@
#!/usr/bin/python3
import argparse
import ctypes
import functools
import shutil
import subprocess
import sys
import tempfile
import threading
import traceback
import os.path
sys.path.insert(0, os.path.dirname(os.path.dirname((os.path.abspath(__file__)))))
from haruhi_dl.compat import (
compat_input,
compat_http_server,
compat_str,
compat_urlparse,
)
# These are not used outside of buildserver.py thus not in compat.py
try:
import winreg as compat_winreg
except ImportError: # Python 2
import _winreg as compat_winreg
try:
import socketserver as compat_socketserver
except ImportError: # Python 2
import SocketServer as compat_socketserver
class BuildHTTPServer(compat_socketserver.ThreadingMixIn, compat_http_server.HTTPServer):
allow_reuse_address = True
advapi32 = ctypes.windll.advapi32
SC_MANAGER_ALL_ACCESS = 0xf003f
SC_MANAGER_CREATE_SERVICE = 0x02
SERVICE_WIN32_OWN_PROCESS = 0x10
SERVICE_AUTO_START = 0x2
SERVICE_ERROR_NORMAL = 0x1
DELETE = 0x00010000
SERVICE_STATUS_START_PENDING = 0x00000002
SERVICE_STATUS_RUNNING = 0x00000004
SERVICE_ACCEPT_STOP = 0x1
SVCNAME = 'youtubedl_builder'
LPTSTR = ctypes.c_wchar_p
START_CALLBACK = ctypes.WINFUNCTYPE(None, ctypes.c_int, ctypes.POINTER(LPTSTR))
class SERVICE_TABLE_ENTRY(ctypes.Structure):
_fields_ = [
('lpServiceName', LPTSTR),
('lpServiceProc', START_CALLBACK)
]
HandlerEx = ctypes.WINFUNCTYPE(
ctypes.c_int, # return
ctypes.c_int, # dwControl
ctypes.c_int, # dwEventType
ctypes.c_void_p, # lpEventData,
ctypes.c_void_p, # lpContext,
)
def _ctypes_array(c_type, py_array):
ar = (c_type * len(py_array))()
ar[:] = py_array
return ar
def win_OpenSCManager():
res = advapi32.OpenSCManagerW(None, None, SC_MANAGER_ALL_ACCESS)
if not res:
raise Exception('Opening service manager failed - '
'are you running this as administrator?')
return res
def win_install_service(service_name, cmdline):
manager = win_OpenSCManager()
try:
h = advapi32.CreateServiceW(
manager, service_name, None,
SC_MANAGER_CREATE_SERVICE, SERVICE_WIN32_OWN_PROCESS,
SERVICE_AUTO_START, SERVICE_ERROR_NORMAL,
cmdline, None, None, None, None, None)
if not h:
raise OSError('Service creation failed: %s' % ctypes.FormatError())
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_uninstall_service(service_name):
manager = win_OpenSCManager()
try:
h = advapi32.OpenServiceW(manager, service_name, DELETE)
if not h:
raise OSError('Could not find service %s: %s' % (
service_name, ctypes.FormatError()))
try:
if not advapi32.DeleteService(h):
raise OSError('Deletion failed: %s' % ctypes.FormatError())
finally:
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_service_report_event(service_name, msg, is_error=True):
with open('C:/sshkeys/log', 'a', encoding='utf-8') as f:
f.write(msg + '\n')
event_log = advapi32.RegisterEventSourceW(None, service_name)
if not event_log:
raise OSError('Could not report event: %s' % ctypes.FormatError())
try:
type_id = 0x0001 if is_error else 0x0004
event_id = 0xc0000000 if is_error else 0x40000000
lines = _ctypes_array(LPTSTR, [msg])
if not advapi32.ReportEventW(
event_log, type_id, 0, event_id, None, len(lines), 0,
lines, None):
raise OSError('Event reporting failed: %s' % ctypes.FormatError())
finally:
advapi32.DeregisterEventSource(event_log)
def win_service_handler(stop_event, *args):
try:
raise ValueError('Handler called with args ' + repr(args))
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_set_status(handle, status_code):
svcStatus = SERVICE_STATUS()
svcStatus.dwServiceType = SERVICE_WIN32_OWN_PROCESS
svcStatus.dwCurrentState = status_code
svcStatus.dwControlsAccepted = SERVICE_ACCEPT_STOP
svcStatus.dwServiceSpecificExitCode = 0
if not advapi32.SetServiceStatus(handle, ctypes.byref(svcStatus)):
raise OSError('SetServiceStatus failed: %r' % ctypes.FormatError())
def win_service_main(service_name, real_main, argc, argv_raw):
try:
# args = [argv_raw[i].value for i in range(argc)]
stop_event = threading.Event()
handler = HandlerEx(functools.partial(stop_event, win_service_handler))
h = advapi32.RegisterServiceCtrlHandlerExW(service_name, handler, None)
if not h:
raise OSError('Handler registration failed: %s' %
ctypes.FormatError())
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_start(service_name, real_main):
try:
cb = START_CALLBACK(
functools.partial(win_service_main, service_name, real_main))
dispatch_table = _ctypes_array(SERVICE_TABLE_ENTRY, [
SERVICE_TABLE_ENTRY(
service_name,
cb
),
SERVICE_TABLE_ENTRY(None, ctypes.cast(None, START_CALLBACK))
])
if not advapi32.StartServiceCtrlDispatcherW(dispatch_table):
raise OSError('ctypes start failed: %s' % ctypes.FormatError())
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def main(args=None):
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--install',
action='store_const', dest='action', const='install',
help='Launch at Windows startup')
parser.add_argument('-u', '--uninstall',
action='store_const', dest='action', const='uninstall',
help='Remove Windows service')
parser.add_argument('-s', '--service',
action='store_const', dest='action', const='service',
help='Run as a Windows service')
parser.add_argument('-b', '--bind', metavar='<host:port>',
action='store', default='0.0.0.0:8142',
help='Bind to host:port (default %default)')
options = parser.parse_args(args=args)
if options.action == 'install':
fn = os.path.abspath(__file__).replace('v:', '\\\\vboxsrv\\vbox')
cmdline = '%s %s -s -b %s' % (sys.executable, fn, options.bind)
win_install_service(SVCNAME, cmdline)
return
if options.action == 'uninstall':
win_uninstall_service(SVCNAME)
return
if options.action == 'service':
win_service_start(SVCNAME, main)
return
host, port_str = options.bind.split(':')
port = int(port_str)
print('Listening on %s:%d' % (host, port))
srv = BuildHTTPServer((host, port), BuildHTTPRequestHandler)
thr = threading.Thread(target=srv.serve_forever)
thr.start()
compat_input('Press ENTER to shut down')
srv.shutdown()
thr.join()
def rmtree(path):
for name in os.listdir(path):
fname = os.path.join(path, name)
if os.path.isdir(fname):
rmtree(fname)
else:
os.chmod(fname, 0o666)
os.remove(fname)
os.rmdir(path)
class BuildError(Exception):
def __init__(self, output, code=500):
self.output = output
self.code = code
def __str__(self):
return self.output
class HTTPError(BuildError):
pass
class PythonBuilder(object):
def __init__(self, **kwargs):
python_version = kwargs.pop('python', '3.4')
python_path = None
for node in ('Wow6432Node\\', ''):
try:
key = compat_winreg.OpenKey(
compat_winreg.HKEY_LOCAL_MACHINE,
r'SOFTWARE\%sPython\PythonCore\%s\InstallPath' % (node, python_version))
try:
python_path, _ = compat_winreg.QueryValueEx(key, '')
finally:
compat_winreg.CloseKey(key)
break
except Exception:
pass
if not python_path:
raise BuildError('No such Python version: %s' % python_version)
self.pythonPath = python_path
super(PythonBuilder, self).__init__(**kwargs)
class GITInfoBuilder(object):
def __init__(self, **kwargs):
try:
self.user, self.repoName = kwargs['path'][:2]
self.rev = kwargs.pop('rev')
except ValueError:
raise BuildError('Invalid path')
except KeyError as e:
raise BuildError('Missing mandatory parameter "%s"' % e.args[0])
path = os.path.join(os.environ['APPDATA'], 'Build archive', self.repoName, self.user)
if not os.path.exists(path):
os.makedirs(path)
self.basePath = tempfile.mkdtemp(dir=path)
self.buildPath = os.path.join(self.basePath, 'build')
super(GITInfoBuilder, self).__init__(**kwargs)
class GITBuilder(GITInfoBuilder):
def build(self):
try:
subprocess.check_output(['git', 'clone', 'git://github.com/%s/%s.git' % (self.user, self.repoName), self.buildPath])
subprocess.check_output(['git', 'checkout', self.rev], cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(GITBuilder, self).build()
class HaruhiDLBuilder(object):
authorizedUsers = ['fraca7', 'phihag', 'rg3', 'FiloSottile', 'ytdl-org']
def __init__(self, **kwargs):
if self.repoName != 'haruhi-dl':
raise BuildError('Invalid repository "%s"' % self.repoName)
if self.user not in self.authorizedUsers:
raise HTTPError('Unauthorized user "%s"' % self.user, 401)
super(HaruhiDLBuilder, self).__init__(**kwargs)
def build(self):
try:
proc = subprocess.Popen([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'], stdin=subprocess.PIPE, cwd=self.buildPath)
proc.wait()
#subprocess.check_output([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'],
# cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(HaruhiDLBuilder, self).build()
class DownloadBuilder(object):
def __init__(self, **kwargs):
self.handler = kwargs.pop('handler')
self.srcPath = os.path.join(self.buildPath, *tuple(kwargs['path'][2:]))
self.srcPath = os.path.abspath(os.path.normpath(self.srcPath))
if not self.srcPath.startswith(self.buildPath):
raise HTTPError(self.srcPath, 401)
super(DownloadBuilder, self).__init__(**kwargs)
def build(self):
if not os.path.exists(self.srcPath):
raise HTTPError('No such file', 404)
if os.path.isdir(self.srcPath):
raise HTTPError('Is a directory: %s' % self.srcPath, 401)
self.handler.send_response(200)
self.handler.send_header('Content-Type', 'application/octet-stream')
self.handler.send_header('Content-Disposition', 'attachment; filename=%s' % os.path.split(self.srcPath)[-1])
self.handler.send_header('Content-Length', str(os.stat(self.srcPath).st_size))
self.handler.end_headers()
with open(self.srcPath, 'rb') as src:
shutil.copyfileobj(src, self.handler.wfile)
super(DownloadBuilder, self).build()
class CleanupTempDir(object):
def build(self):
try:
rmtree(self.basePath)
except Exception as e:
print('WARNING deleting "%s": %s' % (self.basePath, e))
super(CleanupTempDir, self).build()
class Null(object):
def __init__(self, **kwargs):
pass
def start(self):
pass
def close(self):
pass
def build(self):
pass
class Builder(PythonBuilder, GITBuilder, HaruhiDLBuilder, DownloadBuilder, CleanupTempDir, Null):
pass
class BuildHTTPRequestHandler(compat_http_server.BaseHTTPRequestHandler):
actionDict = {'build': Builder, 'download': Builder} # They're the same, no more caching.
def do_GET(self):
path = compat_urlparse.urlparse(self.path)
paramDict = dict([(key, value[0]) for key, value in compat_urlparse.parse_qs(path.query).items()])
action, _, path = path.path.strip('/').partition('/')
if path:
path = path.split('/')
if action in self.actionDict:
try:
builder = self.actionDict[action](path=path, handler=self, **paramDict)
builder.start()
try:
builder.build()
finally:
builder.close()
except BuildError as e:
self.send_response(e.code)
msg = compat_str(e).encode('UTF-8')
self.send_header('Content-Type', 'text/plain; charset=UTF-8')
self.send_header('Content-Length', len(msg))
self.end_headers()
self.wfile.write(msg)
else:
self.send_response(500, 'Unknown build method "%s"' % action)
else:
self.send_response(500, 'Malformed URL')
if __name__ == '__main__':
main()

View file

@@ -0,0 +1,36 @@
#!/usr/bin/env node
// patch hook for https://git.sakamoto.pl/laudompat/copykitku
module.exports = function patchHook(patchContent) {
  [
    [/(?:youtube-|yt-?)dl\.org/g, 'haruhi.download'],
    // fork: https://github.com/blackjack4494/yt-dlc
    [/youtube_dlc/g, 'haruhi_dl'],
    [/youtube-dlc/g, 'haruhi-dl'],
    [/ytdlc/g, 'hdl'],
    [/yt-dlc/g, 'hdl'],
    // fork: https://github.com/yt-dlp/yt-dlp
    [/yt_dlp/g, 'haruhi_dl'],
    [/yt-dlp/g, 'haruhi-dl'],
    [/ytdlp/g, 'hdl'],
    [/youtube_dl/g, 'haruhi_dl'],
    [/youtube-dl/g, 'haruhi-dl'],
    [/youtubedl/g, 'haruhidl'],
    [/YoutubeDL/g, 'HaruhiDL'],
    [/ytdl/g, 'hdl'],
    [/yt-dl/g, 'h-dl'],
    [/ydl/g, 'hdl'],
    // prevent from linking to non-existent repository
    [/github\.com\/(?:yt|h)dl-org\/haruhi-dl/g, 'github.com/ytdl-org/youtube-dl'],
    [/github\.com\/rg3\/haruhi-dl/g, 'github.com/ytdl-org/youtube-dl'],
    [/github\.com\/blackjack4494\/hdl/g, 'github.com/blackjack4494/yt-dlc'],
    [/github\.com\/hdl\/hdl/g, 'github.com/yt-dlp/yt-dlp'],
    // prevent changing the smuggle URLs (for compatibility with ytdl)
    [/__haruhidl_smuggle/g, '__youtubedl_smuggle'],
  ].forEach(([regex, replacement]) => patchContent = patchContent.replace(regex, replacement));
  return patchContent;
}

View file

@@ -1,5 +0,0 @@
#!/bin/bash
wget https://repo1.maven.org/maven2/org/python/jython-installer/2.7.2/jython-installer-2.7.2.jar
java -jar jython-installer-2.7.2.jar -s -d "$HOME/jython"
$HOME/jython/bin/jython -m pip install nose

View file

@@ -1,4 +1,5 @@
# coding: utf-8
# flake8: noqa
from __future__ import unicode_literals
import re
@@ -17,3 +18,17 @@ class LazyLoadExtractor(object):
instance = real_cls.__new__(real_cls)
instance.__init__(*args, **kwargs)
return instance
# suitable() inserts below
{}
class LazyLoadSearchExtractor(LazyLoadExtractor):
pass
class LazyLoadSelfhostedExtractor(LazyLoadExtractor):
# suitable_selfhosted() inserts below
{}

View file

@@ -1,33 +0,0 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
import re
def main():
parser = optparse.OptionParser(usage='%prog INFILE OUTFILE')
options, args = parser.parse_args()
if len(args) != 2:
parser.error('Expected an input and an output filename')
infile, outfile = args
with io.open(infile, encoding='utf-8') as inf:
readme = inf.read()
bug_text = re.search(
r'(?s)#\s*BUGS\s*[^\n]*\s*(.*?)#\s*COPYRIGHT', readme).group(1)
dev_text = re.search(
r'(?s)(#\s*DEVELOPER INSTRUCTIONS.*?)#\s*EMBEDDING YOUTUBE-DL',
readme).group(1)
out = bug_text + dev_text
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
if __name__ == '__main__':
main()

View file

@@ -1,29 +0,0 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
def main():
parser = optparse.OptionParser(usage='%prog INFILE OUTFILE')
options, args = parser.parse_args()
if len(args) != 2:
parser.error('Expected an input and an output filename')
infile, outfile = args
with io.open(infile, encoding='utf-8') as inf:
issue_template_tmpl = inf.read()
# Get the version from haruhi_dl/version.py without importing the package
exec(compile(open('haruhi_dl/version.py').read(),
'haruhi_dl/version.py', 'exec'))
out = issue_template_tmpl % {'version': locals()['__version__']}
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
if __name__ == '__main__':
main()

View file

@@ -15,14 +15,15 @@ if os.path.exists(lazy_extractors_filename):
os.remove(lazy_extractors_filename)
from haruhi_dl.extractor import _ALL_CLASSES
from haruhi_dl.extractor.common import InfoExtractor, SearchInfoExtractor
from haruhi_dl.extractor.common import InfoExtractor, SearchInfoExtractor, SelfhostedInfoExtractor
with open('devscripts/lazy_load_template.py', 'rt') as f:
module_template = f.read()
module_contents = [
module_template + '\n' + getsource(InfoExtractor.suitable) + '\n',
'class LazyLoadSearchExtractor(LazyLoadExtractor):\n pass\n']
module_template.format(getsource(InfoExtractor.suitable),
getsource(SelfhostedInfoExtractor.suitable_selfhosted)),
]
ie_template = '''
class {name}({bases}):
@@ -30,6 +31,12 @@ class {name}({bases}):
_module = '{module}'
'''
sh_additions_template = '''
_SH_VALID_URL = {sh_valid_url!r}
_SH_VALID_CONTENT_STRINGS = {sh_valid_content_strings!r}
_SH_VALID_CONTENT_REGEXES = {sh_valid_content_regexes!r}
'''
make_valid_template = '''
@classmethod
def _make_valid_url(cls):
@@ -42,6 +49,8 @@ def get_base_name(base):
return 'LazyLoadExtractor'
elif base is SearchInfoExtractor:
return 'LazyLoadSearchExtractor'
elif base is SelfhostedInfoExtractor:
return 'LazyLoadSelfhostedExtractor'
else:
return base.__name__
@@ -53,6 +62,13 @@ def build_lazy_ie(ie, name):
bases=', '.join(map(get_base_name, ie.__bases__)),
valid_url=valid_url,
module=ie.__module__)
if ie._SELFHOSTED is True:
s += sh_additions_template.format(
sh_valid_url=ie._SH_VALID_URL,
sh_valid_content_strings=ie._SH_VALID_CONTENT_STRINGS,
sh_valid_content_regexes=ie._SH_VALID_CONTENT_REGEXES)
if ie.suitable_selfhosted.__func__ is not SelfhostedInfoExtractor.suitable_selfhosted.__func__:
s += '\n' + getsource(ie.suitable_selfhosted)
if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
s += '\n' + getsource(ie.suitable)
if hasattr(ie, '_make_valid_url'):
@@ -61,7 +77,7 @@ def build_lazy_ie(ie, name):
return s
# find the correct sorting and add the required base classes so that sublcasses
# find the correct sorting and add the required base classes so that subclasses
# can be correctly created
classes = _ALL_CLASSES[:-1]
ordered_cls = []
@@ -84,15 +100,19 @@ while classes:
ordered_cls.append(_ALL_CLASSES[-1])
names = []
sh_names = []
for ie in ordered_cls:
name = ie.__name__
src = build_lazy_ie(ie, name)
module_contents.append(src)
if ie in _ALL_CLASSES:
names.append(name)
if ie._SELFHOSTED is True:
sh_names.append(name)
module_contents.append(
'_ALL_CLASSES = [{0}]'.format(', '.join(names)))
module_contents.extend((
'\n_ALL_CLASSES = [{0}]'.format(', '.join(names)),
'\n_SH_CLASSES = [{0}]'.format(', '.join(sh_names))))
module_src = '\n'.join(module_contents) + '\n'
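For illustration, one generated entry in the resulting lazy-extractors module would look roughly like this (extractor name and field values are hypothetical; the `_SH_*` fields and `suitable_selfhosted` handling only apply when `_SELFHOSTED` is true):
```python
class MastodonSHIE(LazyLoadSelfhostedExtractor):
    _VALID_URL = 'https?://mastodon\\.example/.+'
    _module = 'haruhi_dl.extractor.mastodon'
    _SH_VALID_URL = 'https?://[^/]+/(?:@[^/]+/|notice/)\\d+'
    _SH_VALID_CONTENT_STRINGS = ('<div id="mastodon"',)
    _SH_VALID_CONTENT_REGEXES = ()
```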

View file

@@ -1,26 +0,0 @@
from __future__ import unicode_literals
import io
import sys
import re
README_FILE = 'README.md'
helptext = sys.stdin.read()
if isinstance(helptext, bytes):
helptext = helptext.decode('utf-8')
with io.open(README_FILE, encoding='utf-8') as f:
oldreadme = f.read()
header = oldreadme[:oldreadme.index('# OPTIONS')]
footer = oldreadme[oldreadme.index('# CONFIGURATION'):]
options = helptext[helptext.index(' General Options:') + 19:]
options = re.sub(r'(?m)^ (\w.+)$', r'## \1', options)
options = '# OPTIONS\n' + options + '\n'
with io.open(README_FILE, 'w', encoding='utf-8') as f:
f.write(header)
f.write(options)
f.write(footer)

View file

@@ -1,46 +0,0 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
import os
import sys
# Import haruhi_dl
ROOT_DIR = os.path.join(os.path.dirname(__file__), '..')
sys.path.insert(0, ROOT_DIR)
import haruhi_dl
def main():
parser = optparse.OptionParser(usage='%prog OUTFILE.md')
options, args = parser.parse_args()
if len(args) != 1:
parser.error('Expected an output filename')
outfile, = args
def gen_ies_md(ies):
for ie in ies:
ie_md = '**{0}**'.format(ie.IE_NAME)
ie_desc = getattr(ie, 'IE_DESC', None)
if ie_desc is False:
continue
if ie_desc is not None:
ie_md += ': {0}'.format(ie.IE_DESC)
if not ie.working():
ie_md += ' (Currently broken)'
yield ie_md
ies = sorted(haruhi_dl.gen_extractors(), key=lambda i: i.IE_NAME.lower())
out = '# Supported sites\n' + ''.join(
' - ' + md + '\n'
for md in gen_ies_md(ies))
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
if __name__ == '__main__':
main()

View file

@@ -0,0 +1,32 @@
# this is intended to speed-up some extractors,
# which sometimes need to extract some data that doesn't change very often,
# but it does on random times, like youtube's signature "crypto" or soundcloud's client id
import os
from os.path import dirname as dirn
import sys
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
from haruhi_dl import HaruhiDL
from haruhi_dl.utils import (
    ExtractorError,
)

hdl = HaruhiDL(params={
    'quiet': True,
})

artifact_dir = os.path.join(dirn(dirn((os.path.abspath(__file__)))), 'haruhi_dl', 'extractor_artifacts')

if not os.path.exists(artifact_dir):
    os.mkdir(artifact_dir)

for ie_name in (
    'Youtube',
    'Soundcloud',
):
    ie = hdl.get_info_extractor(ie_name)
    try:
        file_contents = ie._generate_prerelease_file()
        with open(os.path.join(artifact_dir, ie_name.lower() + '.py'), 'w') as file:
            file.write(file_contents)
    except ExtractorError as err:
        print(err)

View file

@@ -1,141 +1,24 @@
#!/bin/bash
# IMPORTANT: the following assumptions are made
# * the GH repo is on the origin remote
# * the gh-pages branch is named so locally
# * the git config user.signingkey is properly set
# You will need
# pip install coverage nose rsa wheel
# TODO
# release notes
# make hash on local files
set -e
skip_tests=true
gpg_sign_commits=""
buildserver='localhost:8142'
while true
do
case "$1" in
--run-tests)
skip_tests=false
shift
;;
--gpg-sign-commits|-S)
gpg_sign_commits="-S"
shift
;;
--buildserver)
buildserver="$2"
shift 2
;;
--*)
echo "ERROR: unknown option $1"
exit 1
;;
*)
break
;;
esac
done
if [ -z "$1" ]; then echo "ERROR: specify version number like this: $0 1994.09.06"; exit 1; fi
version="$1"
major_version=$(echo "$version" | sed -n 's#^\([0-9]*\.[0-9]*\.[0-9]*\).*#\1#p')
if test "$major_version" '!=' "$(date '+%Y.%m.%d')"; then
echo "$version does not start with today's date!"
exit 1
if [[ "$(basename $(pwd))" == 'devscripts' ]]; then
cd ..
fi
if [ ! -z "`git tag | grep "$version"`" ]; then echo 'ERROR: version already present'; exit 1; fi
if [ ! -z "`git status --porcelain | grep -v CHANGELOG`" ]; then echo 'ERROR: the working directory is not clean; commit or stash changes'; exit 1; fi
useless_files=$(find haruhi_dl -type f -not -name '*.py')
if [ ! -z "$useless_files" ]; then echo "ERROR: Non-.py files in haruhi_dl: $useless_files"; exit 1; fi
if [ ! -f "updates_key.pem" ]; then echo 'ERROR: updates_key.pem missing'; exit 1; fi
if ! type pandoc >/dev/null 2>/dev/null; then echo 'ERROR: pandoc is missing'; exit 1; fi
if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi
if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi
v="$(date "+%Y.%m.%d")"
read -p "Is ChangeLog up to date? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
/bin/echo -e "\n### First of all, testing..."
make clean
if $skip_tests ; then
echo 'SKIPPING TESTS'
else
nosetests --verbose --with-coverage --cover-package=haruhi_dl --cover-html test --stop || exit 1
if [[ "$(grep "'$v" haruhi_dl/version.py)" != '' ]]; then #' is this the first release of the day?
if [[ "$(grep -Poh '[0-9]{4}\.[0-9]{2}\.[0-9]{2}\.[0-9]' haruhi_dl/version.py)" != '' ]]; then # so, 2nd or nth?
v="$v.$(($(cat haruhi_dl/version.py | grep -Poh '[0-9]{4}\.[0-9]{2}\.[0-9]{2}\.[0-9]' | grep -Poh '[0-9]+$')+1))"
else
v="$v.1"
fi
fi
/bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" haruhi_dl/version.py
sed "s/__version__ = '.*'/__version__ = '$v'/g" -i haruhi_dl/version.py
/bin/echo -e "\n### Changing version in ChangeLog..."
sed -i "s/<unreleased>/$version/" ChangeLog
/bin/echo -e "\n### Committing documentation, templates and haruhi_dl/version.py..."
make README.md CONTRIBUTING.md issuetemplates supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE/1_broken_site.md .github/ISSUE_TEMPLATE/2_site_support_request.md .github/ISSUE_TEMPLATE/3_site_feature_request.md .github/ISSUE_TEMPLATE/4_bug_report.md .github/ISSUE_TEMPLATE/5_feature_request.md .github/ISSUE_TEMPLATE/6_question.md docs/supportedsites.md haruhi_dl/version.py ChangeLog
git commit $gpg_sign_commits -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..."
git tag -s -m "Release $version" "$version"
git show "$version"
read -p "Is it good, can I push? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
echo
MASTER=$(git rev-parse --abbrev-ref HEAD)
git push origin $MASTER:master
git push origin "$version"
/bin/echo -e "\n### OK, now it is time to build the binaries..."
REV=$(git rev-parse HEAD)
make haruhi-dl haruhi-dl.tar.gz
read -p "VM running? (y/n) " -n 1
wget "http://$buildserver/build/ytdl-org/haruhi-dl/haruhi-dl.exe?rev=$REV" -O haruhi-dl.exe
mkdir -p "build/$version"
mv haruhi-dl haruhi-dl.exe "build/$version"
mv haruhi-dl.tar.gz "build/$version/haruhi-dl-$version.tar.gz"
RELEASE_FILES="haruhi-dl haruhi-dl.exe haruhi-dl-$version.tar.gz"
(cd build/$version/ && md5sum $RELEASE_FILES > MD5SUMS)
(cd build/$version/ && sha1sum $RELEASE_FILES > SHA1SUMS)
(cd build/$version/ && sha256sum $RELEASE_FILES > SHA2-256SUMS)
(cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS)
/bin/echo -e "\n### Signing and uploading the new binaries to GitHub..."
for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done
ROOT=$(pwd)
python devscripts/create-github-release.py ChangeLog $version "$ROOT/build/$version"
#ssh ytdl@yt-dl.org "sh html/update_latest.sh $version"
/bin/echo -e "\n### Now switching to gh-pages..."
git clone --branch gh-pages --single-branch . build/gh-pages
(
set -e
ORIGIN_URL=$(git config --get remote.origin.url)
cd build/gh-pages
"$ROOT/devscripts/gh-pages/add-version.py" $version
"$ROOT/devscripts/gh-pages/update-feed.py"
"$ROOT/devscripts/gh-pages/sign-versions.py" < "$ROOT/updates_key.pem"
"$ROOT/devscripts/gh-pages/generate-download.py"
"$ROOT/devscripts/gh-pages/update-copyright.py"
"$ROOT/devscripts/gh-pages/update-sites.py"
git add *.html *.html.in update
git commit $gpg_sign_commits -m "release $version"
git push "$ROOT" gh-pages
git push "$ORIGIN_URL" gh-pages
)
rm -rf build
make pypi-files
echo "Uploading to PyPi ..."
python setup.py sdist bdist_wheel upload
make clean
/bin/echo -e "\n### DONE!"
python3 setup.py build_lazy_extractors
python3 devscripts/prerelease_codegen.py
rm -R build dist
python3 setup.py sdist bdist_wheel
python3 -m twine upload dist/*
devscripts/wine-py2exe.sh setup.py

View file

@@ -12,6 +12,9 @@ case "$HDL_TEST_SET" in
;;
download)
test_set="-I test_(?!$DOWNLOAD_TESTS).+\.py"
if [[ "$HDL_TEST_PLAYWRIGHT_DOWNLOAD" == "1" ]]; then
test_set="-I test_(?!download).+\.py"
fi
multiprocess_args="--processes=4 --process-timeout=540"
;;
*)

View file

@@ -2,7 +2,8 @@
# Run with a setup.py that works in the current directory as its parameter
# e.g. no os.chdir()
# It will run twice; the first run will crash
# Wine >=6.3 required: https://bugs.winehq.org/show_bug.cgi?id=3591
set -e
@@ -10,36 +11,30 @@ SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
if [ ! -d wine-py2exe ]; then
sudo apt-get install wine1.3 axel bsdiff
mkdir wine-py2exe
cd wine-py2exe
export WINEPREFIX=`pwd`
axel -a "http://www.python.org/ftp/python/2.7/python-2.7.msi"
axel -a "http://downloads.sourceforge.net/project/py2exe/py2exe/0.6.9/py2exe-0.6.9.win32-py2.7.exe"
#axel -a "http://winetricks.org/winetricks"
echo "Downloading Python 3.8.8"
aria2c "https://www.python.org/ftp/python/3.8.8/python-3.8.8.exe"
# this will need to be upgraded when switching to a newer version of python
winetricks win7
# http://appdb.winehq.org/objectManager.php?sClass=version&iId=21957
echo "Follow python setup on screen"
wine msiexec /i python-2.7.msi
echo "Installing Python 3.8.8"
wine python-3.8.8.exe /quiet InstallAllUsers=1 'DefaultAllUsersTargetDir=C:\\python38'
echo "Follow py2exe setup on screen"
wine py2exe-0.6.9.win32-py2.7.exe
echo "Installing py2exe"
wine 'C:\\python38\\python.exe' -m pip install wheel
wine 'C:\\python38\\python.exe' -m pip install py2exe
#wine 'C:\\python38\\python.exe' -m pip install playwright===1.9.0
#wine 'C:\\python38\\python.exe' -m playwright install
#echo "Follow Microsoft Visual C++ 2008 Redistributable Package setup on screen"
#bash winetricks vcrun2008
rm py2exe-0.6.9.win32-py2.7.exe
rm python-2.7.msi
#rm winetricks
# http://bugs.winehq.org/show_bug.cgi?id=3591
mv drive_c/Python27/Lib/site-packages/py2exe/run.exe drive_c/Python27/Lib/site-packages/py2exe/run.exe.backup
bspatch drive_c/Python27/Lib/site-packages/py2exe/run.exe.backup drive_c/Python27/Lib/site-packages/py2exe/run.exe "$SCRIPT_DIR/SizeOfImage.patch"
mv drive_c/Python27/Lib/site-packages/py2exe/run_w.exe drive_c/Python27/Lib/site-packages/py2exe/run_w.exe.backup
bspatch drive_c/Python27/Lib/site-packages/py2exe/run_w.exe.backup drive_c/Python27/Lib/site-packages/py2exe/run_w.exe "$SCRIPT_DIR/SizeOfImage_w.patch"
rm python-3.8.8.exe
cd -
@ -49,8 +44,8 @@ else
fi
wine "C:\\Python27\\python.exe" "$1" py2exe > "py2exe.log" 2>&1 || true
echo '# Copying python27.dll' >> "py2exe.log"
cp "$WINEPREFIX/drive_c/windows/system32/python27.dll" build/bdist.win32/winexe/bundle-2.7/
wine "C:\\Python27\\python.exe" "$1" py2exe >> "py2exe.log" 2>&1
mkdir -p build/bdist.win32/winexe/bundle-3.8/
# cp "$WINEPREFIX/drive_c/python38/python38.dll" build/bdist.win32/winexe/bundle-3.8/
echo "Making the exe file"
# cannot be piped into a file: https://forum.winehq.org/viewtopic.php?t=33992
wine 'C:\\python38\\python.exe' "$1" py2exe | tee py2exe.log

docs/.gitignore

@ -1 +0,0 @@
_build/


@ -1,177 +0,0 @@
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
help:
@echo "Please use \`make <target>' where <target> is one of"
@echo " html to make standalone HTML files"
@echo " dirhtml to make HTML files named index.html in directories"
@echo " singlehtml to make a single large HTML file"
@echo " pickle to make pickle files"
@echo " json to make JSON files"
@echo " htmlhelp to make HTML files and a HTML help project"
@echo " qthelp to make HTML files and a qthelp project"
@echo " devhelp to make HTML files and a Devhelp project"
@echo " epub to make an epub"
@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
@echo " latexpdf to make LaTeX files and run them through pdflatex"
@echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
@echo " text to make text files"
@echo " man to make manual pages"
@echo " texinfo to make Texinfo files"
@echo " info to make Texinfo files and run them through makeinfo"
@echo " gettext to make PO message catalogs"
@echo " changes to make an overview of all changed/added/deprecated items"
@echo " xml to make Docutils-native XML files"
@echo " pseudoxml to make pseudoxml-XML files for display purposes"
@echo " linkcheck to check all external links for integrity"
@echo " doctest to run all doctests embedded in the documentation (if enabled)"
clean:
rm -rf $(BUILDDIR)/*
html:
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
dirhtml:
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
singlehtml:
$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
@echo
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
pickle:
$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
@echo
@echo "Build finished; now you can process the pickle files."
json:
$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
@echo
@echo "Build finished; now you can process the JSON files."
htmlhelp:
$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
@echo
@echo "Build finished; now you can run HTML Help Workshop with the" \
".hhp project file in $(BUILDDIR)/htmlhelp."
qthelp:
$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
@echo
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/haruhi-dl.qhcp"
@echo "To view the help file:"
@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/haruhi-dl.qhc"
devhelp:
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
@echo
@echo "Build finished."
@echo "To view the help file:"
@echo "# mkdir -p $$HOME/.local/share/devhelp/haruhi-dl"
@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/haruhi-dl"
@echo "# devhelp"
epub:
$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
@echo
@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
latex:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo
@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
@echo "Run \`make' in that directory to run these through (pdf)latex" \
"(use \`make latexpdf' here to do that automatically)."
latexpdf:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through pdflatex..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
latexpdfja:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through platex and dvipdfmx..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
text:
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
@echo
@echo "Build finished. The text files are in $(BUILDDIR)/text."
man:
$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
@echo
@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
texinfo:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo
@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
@echo "Run \`make' in that directory to run these through makeinfo" \
"(use \`make info' here to do that automatically)."
info:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo "Running Texinfo files through makeinfo..."
make -C $(BUILDDIR)/texinfo info
@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
gettext:
$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
@echo
@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
changes:
$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
@echo
@echo "The overview file is in $(BUILDDIR)/changes."
linkcheck:
$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
@echo
@echo "Link check complete; look for any errors in the above output " \
"or in $(BUILDDIR)/linkcheck/output.txt."
doctest:
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
@echo "Testing of doctests in the sources finished, look at the " \
"results in $(BUILDDIR)/doctest/output.txt."
xml:
$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
@echo
@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
pseudoxml:
$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
@echo
@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."


@ -1,71 +0,0 @@
# coding: utf-8
#
# haruhi-dl documentation build configuration file, created by
# sphinx-quickstart on Fri Mar 14 21:05:43 2014.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys
import os
# Allow importing haruhi_dl
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# -- General configuration ------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'haruhi-dl'
copyright = u'2014, Ricardo Garcia Gonzalez'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
from haruhi_dl.version import __version__
version = __version__
# The full version, including alpha/beta/rc tags.
release = version
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build']
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'default'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Output file base name for HTML help builder.
htmlhelp_basename = 'haruhi-dldoc'


@ -1,23 +0,0 @@
Welcome to haruhi-dl's documentation!
======================================
*haruhi-dl* is a command-line program to download videos from YouTube.com and other sites.
It can also be used in Python code.
Developer guide
---------------
This section contains information for using *haruhi-dl* from Python programs.
.. toctree::
:maxdepth: 2
module_guide
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


@ -1,67 +0,0 @@
Using the ``haruhi_dl`` module
===============================
When using the ``haruhi_dl`` module, you start by creating an instance of :class:`HaruhiDL` and adding all the available extractors:
.. code-block:: python
>>> from haruhi_dl import HaruhiDL
>>> hdl = HaruhiDL()
>>> hdl.add_default_info_extractors()
Extracting video information
----------------------------
You use the :meth:`HaruhiDL.extract_info` method to get the video information; it returns a dictionary:
.. code-block:: python
>>> info = hdl.extract_info('http://www.youtube.com/watch?v=BaW_jenozKc', download=False)
[youtube] Setting language
[youtube] BaW_jenozKc: Downloading webpage
[youtube] BaW_jenozKc: Downloading video info webpage
[youtube] BaW_jenozKc: Extracting video information
>>> info['title']
'haruhi-dl test video "\'/\\ä↭𝕐'
>>> info['height'], info['width']
(720, 1280)
If you want to download or play the video, you can get its URL:
.. code-block:: python
>>> info['url']
'https://...'
Extracting playlist information
-------------------------------
The playlist information is extracted in a similar way, but the dictionary is a bit different:
.. code-block:: python
>>> playlist = hdl.extract_info('http://www.ted.com/playlists/13/open_source_open_world', download=False)
[TED] open_source_open_world: Downloading playlist webpage
...
>>> playlist['title']
'Open-source, open world'
You can access the videos in the playlist with the ``entries`` field:
.. code-block:: python
>>> for video in playlist['entries']:
... print('Video #%d: %s' % (video['playlist_index'], video['title']))
Video #1: How Arduino is open-sourcing imagination
Video #2: The year open data went worldwide
Video #3: Massive-scale online collaboration
Video #4: The art of asking
Video #5: How cognitive surplus will change the world
Video #6: The birth of Wikipedia
Video #7: Coding a better government
Video #8: The era of open innovation
Video #9: The currency of the new economy is trust
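For completeness, the same module API accepts a params dictionary at construction time (the option names are documented in the HaruhiDL docstring elsewhere in this changeset). A minimal sketch, assuming the haruhi_dl package is importable; option values are illustrative:

from haruhi_dl import HaruhiDL

# auto_init (the default) already registers the default extractors,
# so no explicit add_default_info_extractors() call is needed here.
hdl = HaruhiDL(params={
    'outtmpl': '%(title)s-%(id)s.%(ext)s',
    'ignoreerrors': True,
})
info = hdl.extract_info('http://www.youtube.com/watch?v=BaW_jenozKc',
                        download=False)
print(info['title'])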

File diff suppressed because it is too large


@ -60,6 +60,7 @@ from .utils import (
format_bytes,
formatSeconds,
GeoRestrictedError,
HaruhiDLError,
int_or_none,
ISO3166Utils,
locked_file,
@ -96,9 +97,9 @@ from .utils import (
)
from .cache import Cache
from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
from .extractor.openload import PhantomJSwrapper
from .downloader import get_suitable_downloader
from .downloader.rtmp import rtmpdump_version
from .playwright import PlaywrightHelper
from .postprocessor import (
FFmpegFixupM3u8PP,
FFmpegFixupM4aPP,
@ -163,6 +164,7 @@ class HaruhiDL(object):
simulate: Do not download the video files.
format: Video format code. See options.py for more information.
outtmpl: Template for output names.
outtmpl_na_placeholder: Placeholder for unavailable meta fields.
restrictfilenames: Do not allow "&" and spaces in file names
ignoreerrors: Do not stop on download errors.
force_generic_extractor: Force downloader to use the generic extractor
@ -338,6 +340,8 @@ class HaruhiDL(object):
_pps = []
_download_retcode = None
_num_downloads = None
_playlist_level = 0
_playlist_urls = set()
_screen_file = None
def __init__(self, params=None, auto_init=True):
@ -401,6 +405,10 @@ class HaruhiDL(object):
else:
raise
if sys.version_info[0] == 2:
self.report_warning(
'Python 2 is not guaranteed to work; please use Python 3 if possible')
if (sys.platform != 'win32'
and sys.getfilesystemencoding() in ['ascii', 'ANSI_X3.4-1968']
and not params.get('restrictfilenames', False)):
@ -656,7 +664,7 @@ class HaruhiDL(object):
template_dict = dict((k, v if isinstance(v, compat_numeric_types) else sanitize(k, v))
for k, v in template_dict.items()
if v is not None and not isinstance(v, (list, tuple, dict)))
template_dict = collections.defaultdict(lambda: 'NA', template_dict)
template_dict = collections.defaultdict(lambda: self.params.get('outtmpl_na_placeholder', 'NA'), template_dict)
outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL)
@ -676,8 +684,8 @@ class HaruhiDL(object):
# Missing numeric fields used together with integer presentation types
# in format specification will break the argument substitution since
# string 'NA' is returned for missing fields. We will patch output
# template for missing fields to meet string presentation type.
# string NA placeholder is returned for missing fields. We will patch
# output template for missing fields to meet string presentation type.
for numeric_field in self._NUMERIC_FIELDS:
if numeric_field not in template_dict:
# As of [1] format syntax is:
@ -770,22 +778,38 @@ class HaruhiDL(object):
def extract_info(self, url, download=True, ie_key=None, extra_info={},
process=True, force_generic_extractor=False):
'''
Returns a list with a dictionary for each video we find.
If 'download', also downloads the videos.
extra_info is a dict containing the extra values to add to each result
'''
"""
Return a list with a dictionary for each video extracted.
Arguments:
url -- URL to extract
Keyword arguments:
download -- whether to download videos during extraction
ie_key -- extractor key hint
extra_info -- dictionary containing the extra values to add to each result
process -- whether to resolve all unresolved references (URLs, playlist items),
must be True for download to work.
force_generic_extractor -- force using the generic extractor
"""
if not ie_key and force_generic_extractor:
ie_key = 'Generic'
force_use_mastodon = self.params.get('force_use_mastodon')
if not ie_key and force_use_mastodon:
ie_key = 'MastodonSH'
if not ie_key:
ie_key = self.params.get('ie_key')
if ie_key:
ies = [self.get_info_extractor(ie_key)]
else:
ies = self._ies
for ie in ies:
if not ie.suitable(url):
if not force_use_mastodon and not ie.suitable(url):
continue
ie = self.get_info_extractor(ie.ie_key())
@ -793,21 +817,14 @@ class HaruhiDL(object):
self.report_warning('The program functionality for this site has been marked as broken, '
'and will probably not work.')
return self.__extract_info(url, ie, download, extra_info, process)
else:
self.report_error('no suitable InfoExtractor for URL %s' % url)
def __handle_extraction_exceptions(func):
def wrapper(self, *args, **kwargs):
try:
ie_result = ie.extract(url)
if ie_result is None: # Finished already (backwards compatibility; listformats and friends should be moved here)
break
if isinstance(ie_result, list):
# Backwards compatibility: old IE result format
ie_result = {
'_type': 'compat_list',
'entries': ie_result,
}
self.add_default_extra_info(ie_result, ie, url)
if process:
return self.process_ie_result(ie_result, download, extra_info)
else:
return ie_result
return func(self, *args, **kwargs)
except GeoRestrictedError as e:
msg = e.msg
if e.countries:
@ -815,20 +832,33 @@ class HaruhiDL(object):
map(ISO3166Utils.short2full, e.countries))
msg += '\nYou might want to use a VPN or a proxy server (with --proxy) to work around this.'
self.report_error(msg)
break
except ExtractorError as e: # An error we somewhat expected
self.report_error(compat_str(e), e.format_traceback())
break
except MaxDownloadsReached:
raise
except Exception as e:
if self.params.get('ignoreerrors', False):
self.report_error(error_to_compat_str(e), tb=encode_compat_str(traceback.format_exc()))
break
else:
raise
return wrapper
@__handle_extraction_exceptions
def __extract_info(self, url, ie, download, extra_info, process):
ie_result = ie.extract(url)
if ie_result is None: # Finished already (backwards compatibility; listformats and friends should be moved here)
return
if isinstance(ie_result, list):
# Backwards compatibility: old IE result format
ie_result = {
'_type': 'compat_list',
'entries': ie_result,
}
self.add_default_extra_info(ie_result, ie, url)
if process:
return self.process_ie_result(ie_result, download, extra_info)
else:
self.report_error('no suitable InfoExtractor for URL %s' % url)
return ie_result
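The __handle_extraction_exceptions refactor above lifts the old inline try/except into a decorator so that __extract_info and playlist entry processing can share it. A stripped-down sketch of the pattern, with generic names standing in for the real exception types:

# Wrap a method so expected errors are reported instead of propagated.
def handle_exceptions(func):
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except ValueError as e:  # stand-in for ExtractorError and friends
            self.report_error(str(e))
    return wrapper

class Extractor:
    def report_error(self, msg):
        print('ERROR:', msg)

    @handle_exceptions
    def extract(self, url):
        raise ValueError('no suitable InfoExtractor for %s' % url)

Extractor().extract('https://example.com')  # prints the error, returns None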
def add_default_extra_info(self, ie_result, ie, url):
self.add_extra_info(ie_result, {
@ -893,123 +923,30 @@ class HaruhiDL(object):
# url_transparent. In such cases outer metadata (from ie_result)
# should be propagated to inner one (info). For this to happen
# _type of info should be overridden with url_transparent. This
# fixes issue from https://github.com/ytdl-org/haruhi-dl/pull/11163.
# fixes issue from https://github.com/ytdl-org/youtube-dl/pull/11163.
if new_result.get('_type') == 'url':
new_result['_type'] = 'url_transparent'
return self.process_ie_result(
new_result, download=download, extra_info=extra_info)
elif result_type in ('playlist', 'multi_video'):
# We process each entry in the playlist
playlist = ie_result.get('title') or ie_result.get('id')
self.to_screen('[download] Downloading playlist: %s' % playlist)
playlist_results = []
playliststart = self.params.get('playliststart', 1) - 1
playlistend = self.params.get('playlistend')
# For backwards compatibility, interpret -1 as whole list
if playlistend == -1:
playlistend = None
playlistitems_str = self.params.get('playlist_items')
playlistitems = None
if playlistitems_str is not None:
def iter_playlistitems(format):
for string_segment in format.split(','):
if '-' in string_segment:
start, end = string_segment.split('-')
for item in range(int(start), int(end) + 1):
yield int(item)
else:
yield int(string_segment)
playlistitems = orderedSet(iter_playlistitems(playlistitems_str))
ie_entries = ie_result['entries']
def make_playlistitems_entries(list_ie_entries):
num_entries = len(list_ie_entries)
return [
list_ie_entries[i - 1] for i in playlistitems
if -num_entries <= i - 1 < num_entries]
def report_download(num_entries):
# Protect from infinite recursion due to recursively nested playlists
# (see https://github.com/ytdl-org/youtube-dl/issues/27833)
webpage_url = ie_result['webpage_url']
if webpage_url in self._playlist_urls:
self.to_screen(
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, num_entries))
'[download] Skipping already downloaded playlist: %s'
% ie_result.get('title') or ie_result.get('id'))
return
if isinstance(ie_entries, list):
n_all_entries = len(ie_entries)
if playlistitems:
entries = make_playlistitems_entries(ie_entries)
else:
entries = ie_entries[playliststart:playlistend]
n_entries = len(entries)
self.to_screen(
'[%s] playlist %s: Collected %d video ids (downloading %d of them)' %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
elif isinstance(ie_entries, PagedList):
if playlistitems:
entries = []
for item in playlistitems:
entries.extend(ie_entries.getslice(
item - 1, item
))
else:
entries = ie_entries.getslice(
playliststart, playlistend)
n_entries = len(entries)
report_download(n_entries)
else: # iterable
if playlistitems:
entries = make_playlistitems_entries(list(itertools.islice(
ie_entries, 0, max(playlistitems))))
else:
entries = list(itertools.islice(
ie_entries, playliststart, playlistend))
n_entries = len(entries)
report_download(n_entries)
if self.params.get('playlistreverse', False):
entries = entries[::-1]
if self.params.get('playlistrandom', False):
random.shuffle(entries)
x_forwarded_for = ie_result.get('__x_forwarded_for_ip')
for i, entry in enumerate(entries, 1):
self.to_screen('[download] Downloading video %s of %s' % (i, n_entries))
# This __x_forwarded_for_ip thing is a bit ugly but requires
# minimal changes
if x_forwarded_for:
entry['__x_forwarded_for_ip'] = x_forwarded_for
extra = {
'n_entries': n_entries,
'playlist': playlist,
'playlist_id': ie_result.get('id'),
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'webpage_url_basename': url_basename(ie_result['webpage_url']),
'extractor_key': ie_result['extractor_key'],
}
reason = self._match_entry(entry, incomplete=True)
if reason is not None:
self.to_screen('[download] ' + reason)
continue
entry_result = self.process_ie_result(entry,
download=download,
extra_info=extra)
playlist_results.append(entry_result)
ie_result['entries'] = playlist_results
self.to_screen('[download] Finished downloading playlist: %s' % playlist)
return ie_result
self._playlist_level += 1
self._playlist_urls.add(webpage_url)
try:
return self.__process_playlist(ie_result, download)
finally:
self._playlist_level -= 1
if not self._playlist_level:
self._playlist_urls.clear()
elif result_type == 'compat_list':
self.report_warning(
'Extractor %s returned a compat_list result. '
@ -1034,6 +971,123 @@ class HaruhiDL(object):
else:
raise Exception('Invalid result type: %s' % result_type)
def __process_playlist(self, ie_result, download):
# We process each entry in the playlist
playlist = ie_result.get('title') or ie_result.get('id')
self.to_screen('[download] Downloading playlist: %s' % playlist)
playlist_results = []
playliststart = self.params.get('playliststart', 1) - 1
playlistend = self.params.get('playlistend')
# For backwards compatibility, interpret -1 as whole list
if playlistend == -1:
playlistend = None
playlistitems_str = self.params.get('playlist_items')
playlistitems = None
if playlistitems_str is not None:
def iter_playlistitems(format):
for string_segment in format.split(','):
if '-' in string_segment:
start, end = string_segment.split('-')
for item in range(int(start), int(end) + 1):
yield int(item)
else:
yield int(string_segment)
playlistitems = orderedSet(iter_playlistitems(playlistitems_str))
ie_entries = ie_result['entries']
def make_playlistitems_entries(list_ie_entries):
num_entries = len(list_ie_entries)
return [
list_ie_entries[i - 1] for i in playlistitems
if -num_entries <= i - 1 < num_entries]
def report_download(num_entries):
self.to_screen(
'[%s] playlist %s: Downloading %d videos' %
(ie_result['extractor'], playlist, num_entries))
if isinstance(ie_entries, list):
n_all_entries = len(ie_entries)
if playlistitems:
entries = make_playlistitems_entries(ie_entries)
else:
entries = ie_entries[playliststart:playlistend]
n_entries = len(entries)
self.to_screen(
'[%s] playlist %s: Collected %d video ids (downloading %d of them)' %
(ie_result['extractor'], playlist, n_all_entries, n_entries))
elif isinstance(ie_entries, PagedList):
if playlistitems:
entries = []
for item in playlistitems:
entries.extend(ie_entries.getslice(
item - 1, item
))
else:
entries = ie_entries.getslice(
playliststart, playlistend)
n_entries = len(entries)
report_download(n_entries)
else: # iterable
if playlistitems:
entries = make_playlistitems_entries(list(itertools.islice(
ie_entries, 0, max(playlistitems))))
else:
entries = list(itertools.islice(
ie_entries, playliststart, playlistend))
n_entries = len(entries)
report_download(n_entries)
if self.params.get('playlistreverse', False):
entries = entries[::-1]
if self.params.get('playlistrandom', False):
random.shuffle(entries)
x_forwarded_for = ie_result.get('__x_forwarded_for_ip')
for i, entry in enumerate(entries, 1):
self.to_screen('[download] Downloading video %s of %s' % (i, n_entries))
# This __x_forwarded_for_ip thing is a bit ugly but requires
# minimal changes
if x_forwarded_for:
entry['__x_forwarded_for_ip'] = x_forwarded_for
extra = {
'n_entries': n_entries,
'playlist': playlist,
'playlist_id': ie_result.get('id'),
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'webpage_url_basename': url_basename(ie_result['webpage_url']),
'extractor_key': ie_result['extractor_key'],
}
reason = self._match_entry(entry, incomplete=True)
if reason is not None:
self.to_screen('[download] ' + reason)
continue
entry_result = self.__process_iterable_entry(entry, download, extra)
# TODO: skip failed (empty) entries?
playlist_results.append(entry_result)
ie_result['entries'] = playlist_results
self.to_screen('[download] Finished downloading playlist: %s' % playlist)
return ie_result
@__handle_extraction_exceptions
def __process_iterable_entry(self, entry, download, extra_info):
return self.process_ie_result(
entry, download=download, extra_info=extra_info)
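The playlist_items handling above accepts specs like '1-3,7'. A self-contained sketch of the same parsing, with dict.fromkeys standing in for orderedSet:

def parse_playlist_items(spec):
    # Mirrors iter_playlistitems above: comma-separated items,
    # where 'a-b' expands to the inclusive integer range.
    items = []
    for segment in spec.split(','):
        if '-' in segment:
            start, end = segment.split('-')
            items.extend(range(int(start), int(end) + 1))
        else:
            items.append(int(segment))
    # Keep first occurrences only; dict preserves insertion order (3.7+)
    return list(dict.fromkeys(items))

print(parse_playlist_items('1-3,7,2'))  # [1, 2, 3, 7]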
def _build_format_filter(self, filter_spec):
" Returns a function to filter the formats according to the filter_spec "
@ -1073,7 +1127,7 @@ class HaruhiDL(object):
'*=': lambda attr, value: value in attr,
}
str_operator_rex = re.compile(r'''(?x)
\s*(?P<key>ext|acodec|vcodec|container|protocol|format_id)
\s*(?P<key>ext|acodec|vcodec|container|protocol|format_id|language)
\s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)?
\s*(?P<value>[a-zA-Z0-9._-]+)
\s*$
@ -1216,6 +1270,8 @@ class HaruhiDL(object):
group = _parse_format_selection(tokens, inside_group=True)
current_selector = FormatSelector(GROUP, group, [])
elif string == '+':
if inside_merge:
raise syntax_error('Unexpected "+"', start)
video_selector = current_selector
audio_selector = _parse_format_selection(tokens, inside_merge=True)
if not video_selector or not audio_selector:
@ -1476,14 +1532,18 @@ class HaruhiDL(object):
if 'display_id' not in info_dict and 'id' in info_dict:
info_dict['display_id'] = info_dict['id']
if info_dict.get('upload_date') is None and info_dict.get('timestamp') is not None:
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
# see http://bugs.python.org/issue1646728)
try:
upload_date = datetime.datetime.utcfromtimestamp(info_dict['timestamp'])
info_dict['upload_date'] = upload_date.strftime('%Y%m%d')
except (ValueError, OverflowError, OSError):
pass
for ts_key, date_key in (
('timestamp', 'upload_date'),
('release_timestamp', 'release_date'),
):
if info_dict.get(date_key) is None and info_dict.get(ts_key) is not None:
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
# see http://bugs.python.org/issue1646728)
try:
upload_date = datetime.datetime.utcfromtimestamp(info_dict[ts_key])
info_dict[date_key] = upload_date.strftime('%Y%m%d')
except (ValueError, OverflowError, OSError):
pass
# Auto generate title fields corresponding to the *_number fields when missing
# in order to always have clean titles. This is very common for TV series.
@ -1491,6 +1551,19 @@ class HaruhiDL(object):
if info_dict.get('%s_number' % field) is not None and not info_dict.get(field):
info_dict[field] = '%s %d' % (field.capitalize(), info_dict['%s_number' % field])
# Some fragmented media manifests like m3u8 allow embedding subtitles
# This is a weird hack to provide these subtitles to users without a huge refactor of extractors
if 'formats' in info_dict:
formats_subtitles = list(filter(lambda x: x.get('_subtitle'), info_dict['formats']))
if formats_subtitles:
info_dict.setdefault('subtitles', {})
for sub in formats_subtitles:
if sub['_key'] not in info_dict['subtitles']:
info_dict['subtitles'][sub['_key']] = []
info_dict['subtitles'][sub['_key']].append(sub['_subtitle'])
# remove these subtitles from formats now
info_dict['formats'] = list(filter(lambda x: '_subtitle' not in x, info_dict['formats']))
for cc_kind in ('subtitles', 'automatic_captions'):
cc = info_dict.get(cc_kind)
if cc:
@ -1498,6 +1571,12 @@ class HaruhiDL(object):
for subtitle_format in subtitle:
if subtitle_format.get('url'):
subtitle_format['url'] = sanitize_url(subtitle_format['url'])
if subtitle_format.get('protocol') is None:
subtitle_format['protocol'] = determine_protocol(subtitle_format)
if subtitle_format.get('http_headers') is None:
full_info = info_dict.copy()
full_info.update(subtitle_format)
subtitle_format['http_headers'] = self._calc_headers(full_info)
if subtitle_format.get('ext') is None:
subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
@ -1600,7 +1679,7 @@ class HaruhiDL(object):
if req_format is None:
req_format = self._default_format_spec(info_dict, download=download)
if self.params.get('verbose'):
self.to_stdout('[debug] Default format spec: %s' % req_format)
self._write_string('[debug] Default format spec: %s\n' % req_format)
format_selector = self.build_format_selector(req_format)
@ -1610,7 +1689,7 @@ class HaruhiDL(object):
# by extractor are incomplete or not (i.e. whether extractor provides only
# video-only or audio-only formats) for proper formats selection for
# extractors with such incomplete formats (see
# https://github.com/ytdl-org/haruhi-dl/pull/5556).
# https://github.com/ytdl-org/youtube-dl/pull/5556).
# Since formats may be filtered during format selection and may not match
# the original formats the results may be incorrect. Thus original formats
# or pre-calculated metrics should be passed to format selection routines
@ -1618,7 +1697,7 @@ class HaruhiDL(object):
# We will pass a context object containing all necessary additional data
# instead of just formats.
# This fixes incorrect format selection issue (see
# https://github.com/ytdl-org/haruhi-dl/issues/10083).
# https://github.com/ytdl-org/youtube-dl/issues/10083).
incomplete_formats = (
# All formats are video-only or
all(f.get('vcodec') != 'none' and f.get('acodec') == 'none' for f in formats)
@ -1767,6 +1846,8 @@ class HaruhiDL(object):
os.makedirs(dn)
return True
except (OSError, IOError) as err:
if isinstance(err, OSError) and err.errno == errno.EEXIST:
return True
self.report_error('unable to create directory ' + error_to_compat_str(err))
return False
@ -1812,7 +1893,6 @@ class HaruhiDL(object):
# subtitles download errors are already managed as troubles in relevant IE
# that way it will silently go on when used with unsupporting IE
subtitles = info_dict['requested_subtitles']
ie = self.get_info_extractor(info_dict['extractor_key'])
for sub_lang, sub_info in subtitles.items():
sub_format = sub_info['ext']
sub_filename = subtitles_filename(filename, sub_lang, sub_format, info_dict.get('ext'))
@ -1823,7 +1903,7 @@ class HaruhiDL(object):
if sub_info.get('data') is not None:
try:
# Use newline='' to prevent conversion of newline characters
# See https://github.com/ytdl-org/haruhi-dl/issues/10268
# See https://github.com/ytdl-org/youtube-dl/issues/10268
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
subfile.write(sub_info['data'])
except (OSError, IOError):
@ -1831,10 +1911,8 @@ class HaruhiDL(object):
return
else:
try:
sub_data = ie._request_webpage(
sub_info['url'], info_dict['id'], note=False).read()
with io.open(encodeFilename(sub_filename), 'wb') as subfile:
subfile.write(sub_data)
subd = get_suitable_downloader(sub_info, self.params)(self, self.params)
subd.download(sub_filename, sub_info)
except (ExtractorError, IOError, OSError, ValueError) as err:
self.report_warning('Unable to download subtitle for "%s": %s' %
(sub_lang, error_to_compat_str(err)))
@ -1861,7 +1939,11 @@ class HaruhiDL(object):
for ph in self._progress_hooks:
fd.add_progress_hook(ph)
if self.params.get('verbose'):
self.to_stdout('[debug] Invoking downloader on %r' % info.get('url'))
self.to_screen('[debug] Invoking downloader on %r' % info.get('url'))
if info.get('protocol') == 'bittorrent' and not self.params.get('allow_p2p'):
raise HaruhiDLError('A peer-to-peer format was selected, but peer-to-peer '
'downloads are not allowed. '
'Choose a different format or add the --allow-p2p option')
return fd.download(name, info)
if info_dict.get('requested_formats') is not None:
@ -1878,8 +1960,32 @@ class HaruhiDL(object):
def compatible_formats(formats):
video, audio = formats
# Check extension
# Check extensions and codecs
video_ext, audio_ext = video.get('ext'), audio.get('ext')
video_codec, audio_codec = video.get('vcodec'), audio.get('acodec')
if video_codec and audio_codec:
COMPATIBLE_CODECS = {
'mp4': (
# fourcc (m3u8, mpd)
'av01', 'hevc', 'avc1', 'mp4a',
# whatever the ism does
'h264', 'aacl',
),
'webm': (
'av01', 'vp9', 'vp8', 'opus', 'vrbs',
# these are in the webm spec, so putting it here to be sure
'vp9x', 'vp8x',
),
}
video_codec = video_codec[:4].lower()
audio_codec = audio_codec[:4].lower()
for ext in COMPATIBLE_CODECS:
if all(codec in COMPATIBLE_CODECS[ext]
for codec in (video_codec, audio_codec)):
info_dict['ext'] = ext
return True
if video_ext and audio_ext:
COMPATIBLE_EXTS = (
('mp3', 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'ismv', 'isma'),
@ -1888,7 +1994,6 @@ class HaruhiDL(object):
for exts in COMPATIBLE_EXTS:
if video_ext in exts and audio_ext in exts:
return True
# TODO: Check acodec/vcodec
return False
filename_real_ext = os.path.splitext(filename)[1][1:]
@ -2242,7 +2347,7 @@ class HaruhiDL(object):
return
if type('') is not compat_str:
# Python 2.6 on SLES11 SP1 (https://github.com/ytdl-org/haruhi-dl/issues/3326)
# Python 2.6 on SLES11 SP1 (https://github.com/ytdl-org/youtube-dl/issues/3326)
self.report_warning(
'Your Python is broken! Update to a newer and supported version')
@ -2286,7 +2391,7 @@ class HaruhiDL(object):
exe_versions = FFmpegPostProcessor.get_versions(self)
exe_versions['rtmpdump'] = rtmpdump_version()
exe_versions['phantomjs'] = PhantomJSwrapper._version()
exe_versions['playwright'] = PlaywrightHelper._version()
exe_str = ', '.join(
'%s %s' % (exe, v)
for exe, v in sorted(exe_versions.items())
@ -2336,7 +2441,7 @@ class HaruhiDL(object):
proxies = {'http': opts_proxy, 'https': opts_proxy}
else:
proxies = compat_urllib_request.getproxies()
# Set HTTPS proxy to HTTP one if given (https://github.com/ytdl-org/haruhi-dl/issues/805)
# Set HTTPS proxy to HTTP one if given (https://github.com/ytdl-org/youtube-dl/issues/805)
if 'http' in proxies and 'https' not in proxies:
proxies['https'] = proxies['http']
proxy_handler = PerRequestProxyHandler(proxies)
@ -2350,7 +2455,7 @@ class HaruhiDL(object):
# When passing our own FileHandler instance, build_opener won't add the
# default FileHandler and allows us to disable the file protocol, which
# can be used for malicious purposes (see
# https://github.com/ytdl-org/haruhi-dl/issues/8227)
# https://github.com/ytdl-org/youtube-dl/issues/8227)
file_handler = compat_urllib_request.FileHandler()
def file_open(*args, **kwargs):
@ -2362,7 +2467,7 @@ class HaruhiDL(object):
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
# (See https://github.com/ytdl-org/haruhi-dl/issues/1309 for details)
# (See https://github.com/ytdl-org/youtube-dl/issues/1309 for details)
opener.addheaders = []
self._opener = opener
@ -2400,7 +2505,7 @@ class HaruhiDL(object):
thumb_ext = determine_ext(t['url'], 'jpg')
suffix = '_%s' % t['id'] if len(thumbnails) > 1 else ''
thumb_display_id = '%s ' % t['id'] if len(thumbnails) > 1 else ''
t['filename'] = thumb_filename = os.path.splitext(filename)[0] + suffix + '.' + thumb_ext
t['filename'] = thumb_filename = replace_extension(filename + suffix, thumb_ext, info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(thumb_filename)):
self.to_screen('[%s] %s: Thumbnail %sis already present' %
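The outtmpl_na_placeholder change above swaps the hard-coded 'NA' for a configurable fallback when a template field is missing. A minimal standalone sketch of the defaultdict trick it relies on; the values here are illustrative:

import collections

params = {'outtmpl_na_placeholder': '-'}
template_dict = {'title': 'Some video', 'id': 'abc123'}

# Same trick as in prepare_filename: any missing key falls back
# to the configured placeholder (default 'NA').
template_dict = collections.defaultdict(
    lambda: params.get('outtmpl_na_placeholder', 'NA'), template_dict)

print('%(title)s-%(id)s-%(uploader)s.mp4' % template_dict)
# -> Some video-abc123--.mp4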


@ -1,15 +1,17 @@
#!/usr/bin/env python
#!/usr/bin/env python3
# coding: utf-8
from __future__ import unicode_literals
import sys
__license__ = 'Public Domain'
if sys.version_info[0] == 2:
sys.exit('haruhi-dl no longer works on Python 2, use Python 3 instead')
__license__ = 'LGPL-3.0-or-later'
import codecs
import io
import os
import random
import sys
from .options import (
@ -48,7 +50,7 @@ from .HaruhiDL import HaruhiDL
def _real_main(argv=None):
# Compatibility fixes for Windows
if sys.platform == 'win32':
# https://github.com/ytdl-org/haruhi-dl/issues/820
# https://github.com/ytdl-org/youtube-dl/issues/820
codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)
workaround_optparse_bug9161()
@ -174,6 +176,10 @@ def _real_main(argv=None):
opts.max_sleep_interval = opts.sleep_interval
if opts.ap_mso and opts.ap_mso not in MSO_INFO:
parser.error('Unsupported TV Provider, use --ap-list-mso to get a list of supported TV Providers')
if opts.force_generic_extractor and opts.force_use_mastodon:
parser.error('force either the generic extractor or Mastodon, not both')
if opts.force_playwright_browser not in ('firefox', 'chromium', 'webkit', None):
parser.error('invalid browser forced, must be one of: firefox, chromium, webkit')
def parse_retries(retries):
if retries in ('inf', 'infinite'):
@ -340,11 +346,14 @@ def _real_main(argv=None):
'format': opts.format,
'listformats': opts.listformats,
'outtmpl': outtmpl,
'outtmpl_na_placeholder': opts.outtmpl_na_placeholder,
'autonumber_size': opts.autonumber_size,
'autonumber_start': opts.autonumber_start,
'restrictfilenames': opts.restrictfilenames,
'ignoreerrors': opts.ignoreerrors,
'force_generic_extractor': opts.force_generic_extractor,
'force_use_mastodon': opts.force_use_mastodon,
'ie_key': opts.ie_key,
'ratelimit': opts.ratelimit,
'nooverwrites': opts.nooverwrites,
'retries': opts.retries,
@ -414,14 +423,17 @@ def _real_main(argv=None):
'fixup': opts.fixup,
'source_address': opts.source_address,
'call_home': opts.call_home,
'headless_playwright': opts.headless_playwright,
'sleep_interval': opts.sleep_interval,
'max_sleep_interval': opts.max_sleep_interval,
'force_playwright_browser': opts.force_playwright_browser,
'external_downloader': opts.external_downloader,
'list_thumbnails': opts.list_thumbnails,
'playlist_items': opts.playlist_items,
'xattr_set_filesize': opts.xattr_set_filesize,
'match_filter': match_filter,
'no_color': opts.no_color,
'use_proxy_sites': opts.use_proxy_sites,
'ffmpeg_location': opts.ffmpeg_location,
'hls_prefer_native': opts.hls_prefer_native,
'hls_use_mpegts': opts.hls_use_mpegts,
@ -433,6 +445,8 @@ def _real_main(argv=None):
'geo_bypass': opts.geo_bypass,
'geo_bypass_country': opts.geo_bypass_country,
'geo_bypass_ip_block': opts.geo_bypass_ip_block,
'allow_p2p': opts.allow_p2p if not opts.prefer_p2p else True,
'prefer_p2p': opts.prefer_p2p,
# just for deprecation check
'autonumber': opts.autonumber if opts.autonumber is True else None,
'usetitle': opts.usetitle if opts.usetitle is True else None,


@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3
from __future__ import unicode_literals
# Execute with
@ -7,6 +7,9 @@ from __future__ import unicode_literals
import sys
if sys.version_info[0] == 2:
sys.exit('haruhi-dl no longer works on Python 2, use Python 3 instead')
if __package__ is None and not hasattr(sys, 'frozen'):
# direct call of __main__.py
import os.path

File diff suppressed because it is too large


@ -1,5 +1,18 @@
from __future__ import unicode_literals
from ..utils import (
determine_protocol,
)
def _get_real_downloader(info_dict, protocol=None, *args, **kwargs):
info_copy = info_dict.copy()
if protocol:
info_copy['protocol'] = protocol
return get_suitable_downloader(info_copy, *args, **kwargs)
# Some of these require _get_real_downloader
from .common import FileDownloader
from .f4m import F4mFD
from .hls import HlsFD
@ -8,15 +21,13 @@ from .rtmp import RtmpFD
from .dash import DashSegmentsFD
from .rtsp import RtspFD
from .ism import IsmFD
from .niconico import NiconicoDmcFD
from .external import (
get_external_downloader,
Aria2cFD,
FFmpegFD,
)
from ..utils import (
determine_protocol,
)
PROTOCOL_MAP = {
'rtmp': RtmpFD,
'm3u8_native': HlsFD,
@ -26,6 +37,8 @@ PROTOCOL_MAP = {
'f4m': F4mFD,
'http_dash_segments': DashSegmentsFD,
'ism': IsmFD,
'bittorrent': Aria2cFD,
'niconico_dmc': NiconicoDmcFD,
}
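The new _get_real_downloader helper picks a concrete downloader for an overridden protocol by copying the info dict first, so that e.g. NiconicoDmcFD can delegate the actual transfer. A simplified sketch of that dispatch idea, with string stand-ins for the real FileDownloader classes:

# Simplified sketch of the protocol -> downloader dispatch above.
PROTOCOL_MAP = {
    'https': 'HttpFD',
    'm3u8_native': 'HlsFD',
    'niconico_dmc': 'NiconicoDmcFD',
}

def get_suitable_downloader(info_dict):
    return PROTOCOL_MAP.get(info_dict.get('protocol'), 'HttpFD')

def get_real_downloader(info_dict, protocol=None):
    # Copy first so the caller's dict is not mutated by the override.
    info_copy = dict(info_dict)
    if protocol:
        info_copy['protocol'] = protocol
    return get_suitable_downloader(info_copy)

info = {'url': 'https://example.com/master.m3u8', 'protocol': 'm3u8_native'}
print(get_real_downloader(info, protocol='niconico_dmc'))  # NiconicoDmcFD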


@ -182,15 +182,16 @@ class Aria2cFD(ExternalFD):
AVAILABLE_OPT = '-v'
def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '-c']
cmd = [self.exe or 'aria2c', '-c']
cmd += self._configuration_args([
'--min-split-size', '1M', '--max-connection-per-server', '4'])
dn = os.path.dirname(tmpfilename)
if dn:
cmd += ['--dir', dn]
cmd += ['--out', os.path.basename(tmpfilename)]
for key, val in info_dict['http_headers'].items():
cmd += ['--header', '%s: %s' % (key, val)]
if info_dict['protocol'] != 'bittorrent':
for key, val in info_dict['http_headers'].items():
cmd += ['--header', '%s: %s' % (key, val)]
cmd += self._option('--interface', 'source_address')
cmd += self._option('--all-proxy', 'proxy')
cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
@ -240,7 +241,7 @@ class FFmpegFD(ExternalFD):
# setting -seekable prevents ffmpeg from guessing if the server
# supports seeking(by adding the header `Range: bytes=0-`), which
# can cause problems in some cases
# https://github.com/ytdl-org/haruhi-dl/issues/11800#issuecomment-275037127
# https://github.com/ytdl-org/youtube-dl/issues/11800#issuecomment-275037127
# http://trac.ffmpeg.org/ticket/6125#comment:10
args += ['-seekable', '1' if seekable else '0']
@ -317,7 +318,9 @@ class FFmpegFD(ExternalFD):
args += ['-fs', compat_str(self._TEST_FILE_SIZE)]
if protocol in ('m3u8', 'm3u8_native'):
if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
if info_dict['ext'] == 'vtt':
args += ['-f', 'webvtt']
elif self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
args += ['-f', 'mpegts']
else:
args += ['-f', 'mp4']
@ -341,7 +344,7 @@ class FFmpegFD(ExternalFD):
# mp4 file couldn't be played, but if we ask ffmpeg to quit it
# produces a file that is playable (this is mostly useful for live
# streams). Note that Windows is not affected and produces playable
# files (see https://github.com/ytdl-org/haruhi-dl/issues/8300).
# files (see https://github.com/ytdl-org/youtube-dl/issues/8300).
if sys.platform != 'win32':
proc.communicate(b'q')
raise


@ -324,8 +324,8 @@ class F4mFD(FragmentFD):
urlh = self.hdl.urlopen(self._prepare_url(info_dict, man_url))
man_url = urlh.geturl()
# Some manifests may be malformed, e.g. prosiebensat1 generated manifests
# (see https://github.com/ytdl-org/haruhi-dl/issues/6215#issuecomment-121704244
# and https://github.com/ytdl-org/haruhi-dl/issues/7823)
# (see https://github.com/ytdl-org/youtube-dl/issues/6215#issuecomment-121704244
# and https://github.com/ytdl-org/youtube-dl/issues/7823)
manifest = fix_xml_ampersands(urlh.read().decode('utf-8', 'ignore')).strip()
doc = compat_etree_fromstring(manifest)
@ -409,7 +409,7 @@ class F4mFD(FragmentFD):
# In tests, segments may be truncated, and thus
# FlvReader may not be able to parse the whole
# chunk. If so, write the segment as is
# See https://github.com/ytdl-org/haruhi-dl/issues/9214
# See https://github.com/ytdl-org/youtube-dl/issues/9214
dest_stream.write(down_data)
break
raise


@ -97,12 +97,15 @@ class FragmentFD(FileDownloader):
def _download_fragment(self, ctx, frag_url, info_dict, headers=None):
fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], ctx['fragment_index'])
success = ctx['dl'].download(fragment_filename, {
fragment_info_dict = {
'url': frag_url,
'http_headers': headers or info_dict.get('http_headers'),
})
}
success = ctx['dl'].download(fragment_filename, fragment_info_dict)
if not success:
return False, None
if fragment_info_dict.get('filetime'):
ctx['fragment_filetime'] = fragment_info_dict.get('filetime')
down, frag_sanitized = sanitize_open(fragment_filename, 'rb')
ctx['fragment_filename_sanitized'] = frag_sanitized
frag_content = down.read()
@ -258,6 +261,13 @@ class FragmentFD(FileDownloader):
downloaded_bytes = ctx['complete_frags_downloaded_bytes']
else:
self.try_rename(ctx['tmpfilename'], ctx['filename'])
if self.params.get('updatetime', True):
filetime = ctx.get('fragment_filetime')
if filetime:
try:
os.utime(ctx['filename'], (time.time(), filetime))
except Exception:
pass
downloaded_bytes = os.path.getsize(encodeFilename(ctx['filename']))
self._hook_progress({


@ -42,11 +42,13 @@ class HlsFD(FragmentFD):
# no segments will definitely be appended to the end of the playlist.
# r'#EXT-X-PLAYLIST-TYPE:EVENT', # media segments may be appended to the end of
# # event media playlists [4]
r'#EXT-X-MAP:', # media initialization [5]
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4
# 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2
# 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2
# 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5
# 5. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.5
)
check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
is_aes128_enc = '#EXT-X-KEY:METHOD=AES-128' in manifest
@ -152,8 +154,8 @@ class HlsFD(FragmentFD):
except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporary) fragments may be served.
# First we retry, then either skip or abort.
# See https://github.com/ytdl-org/haruhi-dl/issues/10165,
# https://github.com/ytdl-org/haruhi-dl/issues/10448).
# See https://github.com/ytdl-org/youtube-dl/issues/10165,
# https://github.com/ytdl-org/youtube-dl/issues/10448).
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
@ -170,8 +172,12 @@ class HlsFD(FragmentFD):
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.hdl.urlopen(
self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
# Don't decrypt the content in tests, since the data is explicitly truncated and thus not a valid block
# size (see https://github.com/ytdl-org/youtube-dl/pull/27660). Tests only care that the correct data was
# downloaded, not what it decrypts to.
if not test:
frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
self._append_fragment(ctx, frag_content)
# We only download the first fragment during the test
if test:
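The IV fallback a few lines up (compat_struct_pack('>8xq', media_sequence)) implements the HLS rule that, when #EXT-X-KEY carries no IV attribute, the 16-byte big-endian media sequence number is used. A standalone illustration with the stdlib struct module:

import struct

media_sequence = 7
# '>8xq' = 8 zero pad bytes + big-endian 8-byte integer -> a 16-byte IV,
# matching the compat_struct_pack('>8xq', media_sequence) fallback above.
iv = struct.pack('>8xq', media_sequence)
print(len(iv), iv.hex())  # 16 00000000000000000000000000000007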


@ -109,14 +109,16 @@ class HttpFD(FileDownloader):
try:
ctx.data = self.hdl.urlopen(request)
except (compat_urllib_error.URLError, ) as err:
if isinstance(err.reason, socket.timeout):
# reason may not be available, e.g. for urllib2.HTTPError on python 2.6
reason = getattr(err, 'reason', None)
if isinstance(reason, socket.timeout):
raise RetryDownload(err)
raise err
# When trying to resume, the Content-Range HTTP header of the response has to be checked
# to match the value of the requested Range HTTP header. This is due to webservers
# that don't support resuming and serve the whole file with no Content-Range
# set in the response despite the requested Range (see
# https://github.com/ytdl-org/haruhi-dl/issues/6057#issuecomment-126129799)
# https://github.com/ytdl-org/youtube-dl/issues/6057#issuecomment-126129799)
if has_range:
content_range = ctx.data.headers.get('Content-Range')
if content_range:


@ -0,0 +1,55 @@
# coding: utf-8
from __future__ import unicode_literals
import threading
from .common import FileDownloader
from ..downloader import _get_real_downloader
from ..extractor.niconico import NiconicoIE
from ..compat import compat_urllib_request
class NiconicoDmcFD(FileDownloader):
""" Downloading niconico douga from DMC with heartbeat """
FD_NAME = 'niconico_dmc'
def real_download(self, filename, info_dict):
self.to_screen('[%s] Downloading from DMC' % self.FD_NAME)
ie = NiconicoIE(self.hdl)
info_dict, heartbeat_info_dict = ie._get_heartbeat_info(info_dict)
fd = _get_real_downloader(info_dict, params=self.params)(self.hdl, self.params)
success = download_complete = False
timer = [None]
heartbeat_lock = threading.Lock()
heartbeat_url = heartbeat_info_dict['url']
heartbeat_data = heartbeat_info_dict['data'].encode()
heartbeat_interval = heartbeat_info_dict.get('interval', 30)
def heartbeat():
try:
compat_urllib_request.urlopen(url=heartbeat_url, data=heartbeat_data)
except Exception:
self.to_screen('[%s] Heartbeat failed' % self.FD_NAME)
with heartbeat_lock:
if not download_complete:
timer[0] = threading.Timer(heartbeat_interval, heartbeat)
timer[0].start()
heartbeat_info_dict['ping']()
self.to_screen('[%s] Heartbeat with %d second interval ...' % (self.FD_NAME, heartbeat_interval))
try:
heartbeat()
if type(fd).__name__ == 'HlsFD':
info_dict.update(ie._extract_m3u8_formats(info_dict['url'], info_dict['id'])[0])
success = fd.real_download(filename, info_dict)
finally:
if heartbeat_lock:
with heartbeat_lock:
timer[0].cancel()
download_complete = True
return success
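The heartbeat above is a re-arming threading.Timer guarded by a lock, so the final cancel cannot race a re-arm. A self-contained sketch of the same pattern, with a dummy ping in place of the DMC request:

import threading
import time

def run_with_heartbeat(work, ping, interval=1.0):
    # Re-arming timer pattern as in NiconicoDmcFD.real_download above.
    done = False
    lock = threading.Lock()
    timer = [None]

    def heartbeat():
        ping()
        with lock:
            if not done:
                timer[0] = threading.Timer(interval, heartbeat)
                timer[0].start()

    try:
        heartbeat()
        return work()
    finally:
        with lock:
            done = True
            if timer[0]:
                timer[0].cancel()

result = run_with_heartbeat(
    work=lambda: time.sleep(2.5) or 'downloaded',
    ping=lambda: print('ping'))
print(result)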


@ -2,7 +2,7 @@ from __future__ import unicode_literals
try:
from .lazy_extractors import *
from .lazy_extractors import _ALL_CLASSES
from .lazy_extractors import _ALL_CLASSES, _SH_CLASSES
_LAZY_LOADER = True
except ImportError:
_LAZY_LOADER = False
@ -14,6 +14,11 @@ except ImportError:
if name.endswith('IE') and name != 'GenericIE'
]
_ALL_CLASSES.append(GenericIE)
_SH_CLASSES = [
klass
for klass in _ALL_CLASSES
if klass._SELFHOSTED is True
]
def gen_extractor_classes():


@ -1,14 +1,15 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import re
import time
from .amp import AMPIE
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..compat import compat_urlparse
from ..utils import (
parse_duration,
parse_iso8601,
try_get,
)
class AbcNewsVideoIE(AMPIE):
@ -18,8 +19,8 @@ class AbcNewsVideoIE(AMPIE):
(?:
abcnews\.go\.com/
(?:
[^/]+/video/(?P<display_id>[0-9a-z-]+)-|
video/embed\?.*?\bid=
(?:[^/]+/)*video/(?P<display_id>[0-9a-z-]+)-|
video/(?:embed|itemfeed)\?.*?\bid=
)|
fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/
)
@ -36,6 +37,8 @@ class AbcNewsVideoIE(AMPIE):
'description': 'George Stephanopoulos goes one-on-one with Iranian Foreign Minister Dr. Javad Zarif.',
'duration': 180,
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1380454200,
'upload_date': '20130929',
},
'params': {
# m3u8 download
@ -47,6 +50,12 @@ class AbcNewsVideoIE(AMPIE):
}, {
'url': 'http://abcnews.go.com/2020/video/2020-husband-stands-teacher-jail-student-affairs-26119478',
'only_matching': True,
}, {
'url': 'http://abcnews.go.com/video/itemfeed?id=46979033',
'only_matching': True,
}, {
'url': 'https://abcnews.go.com/GMA/News/video/history-christmas-story-67894761',
'only_matching': True,
}]
def _real_extract(self, url):
@ -67,28 +76,23 @@ class AbcNewsIE(InfoExtractor):
_VALID_URL = r'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
_TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',
# Youtube Embeds
'url': 'https://abcnews.go.com/Entertainment/peter-billingsley-child-actor-christmas-story-hollywood-power/story?id=51286501',
'info_dict': {
'id': '10505354',
'ext': 'flv',
'display_id': 'dramatic-video-rare-death-job-america',
'title': 'Occupational Hazards',
'description': 'Nightline investigates the dangers that lurk at various jobs.',
'thumbnail': r're:^https?://.*\.jpg$',
'upload_date': '20100428',
'timestamp': 1272412800,
'id': '51286501',
'title': "Peter Billingsley: From child actor in 'A Christmas Story' to Hollywood power player",
'description': 'Billingsley went from a child actor to Hollywood power player.',
},
'add_ie': ['AbcNewsVideo'],
'playlist_count': 5,
}, {
'url': 'http://abcnews.go.com/Entertainment/justin-timberlake-performs-stop-feeling-eurovision-2016/story?id=39125818',
'info_dict': {
'id': '38897857',
'ext': 'mp4',
'display_id': 'justin-timberlake-performs-stop-feeling-eurovision-2016',
'title': 'Justin Timberlake Drops Hints For Secret Single',
'description': 'Lara Spencer reports the buzziest stories of the day in "GMA" Pop News.',
'upload_date': '20160515',
'timestamp': 1463329500,
'upload_date': '20160505',
'timestamp': 1462442280,
},
'params': {
# m3u8 download
@ -100,49 +104,55 @@ class AbcNewsIE(InfoExtractor):
}, {
'url': 'http://abcnews.go.com/Technology/exclusive-apple-ceo-tim-cook-iphone-cracking-software/story?id=37173343',
'only_matching': True,
}, {
# inline.type == 'video'
'url': 'http://abcnews.go.com/Technology/exclusive-apple-ceo-tim-cook-iphone-cracking-software/story?id=37173343',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
video_id = mobj.group('id')
story_id = self._match_id(url)
webpage = self._download_webpage(url, story_id)
story = self._parse_json(self._search_regex(
r"window\['__abcnews__'\]\s*=\s*({.+?});",
webpage, 'data'), story_id)['page']['content']['story']['everscroll'][0]
article_contents = story.get('articleContents') or {}
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r'window\.abcnvideo\.url\s*=\s*"([^"]+)"', webpage, 'video URL')
full_video_url = compat_urlparse.urljoin(url, video_url)
def entries():
featured_video = story.get('featuredVideo') or {}
feed = try_get(featured_video, lambda x: x['video']['feed'])
if feed:
yield {
'_type': 'url',
'id': featured_video.get('id'),
'title': featured_video.get('name'),
'url': feed,
'thumbnail': featured_video.get('images'),
'description': featured_video.get('description'),
'timestamp': parse_iso8601(featured_video.get('uploadDate')),
'duration': parse_duration(featured_video.get('duration')),
'ie_key': AbcNewsVideoIE.ie_key(),
}
youtube_url = YoutubeIE._extract_url(webpage)
for inline in (article_contents.get('inlines') or []):
inline_type = inline.get('type')
if inline_type == 'iframe':
iframe_url = try_get(inline, lambda x: x['attrs']['src'])
if iframe_url:
yield self.url_result(iframe_url)
elif inline_type == 'video':
video_id = inline.get('id')
if video_id:
yield {
'_type': 'url',
'id': video_id,
'url': 'http://abcnews.go.com/video/embed?id=' + video_id,
'thumbnail': inline.get('imgSrc') or inline.get('imgDefault'),
'description': inline.get('description'),
'duration': parse_duration(inline.get('duration')),
'ie_key': AbcNewsVideoIE.ie_key(),
}
timestamp = None
date_str = self._html_search_regex(
r'<span[^>]+class="timestamp">([^<]+)</span>',
webpage, 'timestamp', fatal=False)
if date_str:
tz_offset = 0
if date_str.endswith(' ET'): # Eastern Time
tz_offset = -5
date_str = date_str[:-3]
date_formats = ['%b. %d, %Y', '%b %d, %Y, %I:%M %p']
for date_format in date_formats:
try:
timestamp = calendar.timegm(time.strptime(date_str.strip(), date_format))
except ValueError:
continue
if timestamp is not None:
timestamp -= tz_offset * 3600
entry = {
'_type': 'url_transparent',
'ie_key': AbcNewsVideoIE.ie_key(),
'url': full_video_url,
'id': video_id,
'display_id': display_id,
'timestamp': timestamp,
}
if youtube_url:
entries = [entry, self.url_result(youtube_url, ie=YoutubeIE.ie_key())]
return self.playlist_result(entries)
return entry
return self.playlist_result(
entries(), story_id, article_contents.get('headline'),
article_contents.get('subHead'))


@ -2,21 +2,52 @@
from __future__ import unicode_literals
import re
import functools
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
clean_podcast_url,
float_or_none,
int_or_none,
try_get,
unified_timestamp,
OnDemandPagedList,
js_to_json,
parse_iso8601,
urljoin,
ExtractorError,
)
class ACastIE(InfoExtractor):
class ACastBaseIE(InfoExtractor):
def _extract_episode(self, episode, show_info):
title = episode['title']
info = {
'id': episode['id'],
'display_id': episode.get('episodeUrl'),
'url': clean_podcast_url(episode['url']),
'title': title,
'description': clean_html(episode.get('description') or episode.get('summary')),
'thumbnail': episode.get('image'),
'timestamp': parse_iso8601(episode.get('publishDate')),
'duration': int_or_none(episode.get('duration')),
'filesize': int_or_none(episode.get('contentLength')),
'season_number': int_or_none(episode.get('season')),
'episode': title,
'episode_number': int_or_none(episode.get('episode')),
}
info.update(show_info)
return info
def _extract_show_info(self, show):
return {
'creator': show.get('author'),
'series': show.get('title'),
}
def _call_api(self, path, video_id, query=None):
return self._download_json(
'https://feeder.acast.com/api/v1/shows/' + path, video_id, query=query)
class ACastIE(ACastBaseIE):
IE_NAME = 'acast'
_VALID_URL = r'''(?x)
https?://
@ -28,15 +59,15 @@ class ACastIE(InfoExtractor):
'''
_TESTS = [{
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
'md5': '16d936099ec5ca2d5869e3a813ee8dc4',
'md5': 'f5598f3ad1e4776fed12ec1407153e4b',
'info_dict': {
'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
'ext': 'mp3',
'title': '2. Raggarmordet - Röster ur det förflutna',
'description': 'md5:4f81f6d8cf2e12ee21a321d8bca32db4',
'description': 'md5:a992ae67f4d98f1c0141598f7bebbf67',
'timestamp': 1477346700,
'upload_date': '20161024',
'duration': 2766.602563,
'duration': 2766,
'creator': 'Anton Berg & Martin Johnson',
'series': 'Spår',
'episode': '2. Raggarmordet - Röster ur det förflutna',
@ -45,7 +76,7 @@ class ACastIE(InfoExtractor):
'url': 'http://embed.acast.com/adambuxton/ep.12-adam-joeschristmaspodcast2015',
'only_matching': True,
}, {
'url': 'https://play.acast.com/s/rattegangspodden/s04e09-styckmordet-i-helenelund-del-22',
'url': 'https://play.acast.com/s/rattegangspodden/s04e09styckmordetihelenelund-del2-2',
'only_matching': True,
}, {
'url': 'https://play.acast.com/s/sparpodcast/2a92b283-1a75-4ad8-8396-499c641de0d9',
@ -54,40 +85,14 @@ class ACastIE(InfoExtractor):
def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups()
s = self._download_json(
'https://feeder.acast.com/api/v1/shows/%s/episodes/%s' % (channel, display_id),
display_id)
media_url = s['url']
if re.search(r'[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}', display_id):
episode_url = s.get('episodeUrl')
if episode_url:
display_id = episode_url
else:
channel, display_id = re.match(self._VALID_URL, s['link']).groups()
cast_data = self._download_json(
'https://play-api.acast.com/splash/%s/%s' % (channel, display_id),
display_id)['result']
e = cast_data['episode']
title = e.get('name') or s['title']
return {
'id': compat_str(e['id']),
'display_id': display_id,
'url': media_url,
'title': title,
'description': e.get('summary') or clean_html(e.get('description') or s.get('description')),
'thumbnail': e.get('image'),
'timestamp': unified_timestamp(e.get('publishingDate') or s.get('publishDate')),
'duration': float_or_none(e.get('duration') or s.get('duration')),
'filesize': int_or_none(e.get('contentLength')),
'creator': try_get(cast_data, lambda x: x['show']['author'], compat_str),
'series': try_get(cast_data, lambda x: x['show']['name'], compat_str),
'season_number': int_or_none(e.get('seasonNumber')),
'episode': title,
'episode_number': int_or_none(e.get('episodeNumber')),
}
episode = self._call_api(
'%s/episodes/%s' % (channel, display_id),
display_id, {'showInfo': 'true'})
return self._extract_episode(
episode, self._extract_show_info(episode.get('show') or {}))
class ACastChannelIE(InfoExtractor):
class ACastChannelIE(ACastBaseIE):
IE_NAME = 'acast:channel'
_VALID_URL = r'''(?x)
https?://
@ -102,34 +107,97 @@ class ACastChannelIE(InfoExtractor):
'info_dict': {
'id': '4efc5294-5385-4847-98bd-519799ce5786',
'title': 'Today in Focus',
'description': 'md5:9ba5564de5ce897faeb12963f4537a64',
'description': 'md5:c09ce28c91002ce4ffce71d6504abaae',
},
'playlist_mincount': 35,
'playlist_mincount': 200,
}, {
'url': 'http://play.acast.com/s/ft-banking-weekly',
'only_matching': True,
}]
_API_BASE_URL = 'https://play.acast.com/api/'
_PAGE_SIZE = 10
@classmethod
def suitable(cls, url):
return False if ACastIE.suitable(url) else super(ACastChannelIE, cls).suitable(url)
def _fetch_page(self, channel_slug, page):
casts = self._download_json(
self._API_BASE_URL + 'channels/%s/acasts?page=%s' % (channel_slug, page),
channel_slug, note='Download page %d of channel data' % page)
for cast in casts:
yield self.url_result(
'https://play.acast.com/s/%s/%s' % (channel_slug, cast['url']),
'ACast', cast['id'])
def _real_extract(self, url):
show_slug = self._match_id(url)
show = self._call_api(show_slug, show_slug)
show_info = self._extract_show_info(show)
entries = []
for episode in (show.get('episodes') or []):
entries.append(self._extract_episode(episode, show_info))
return self.playlist_result(
entries, show.get('id'), show.get('title'), show.get('description'))
class ACastPlayerIE(InfoExtractor):
IE_NAME = 'acast:player'
_VALID_URL = r'https?://player\.acast\.com/(?:[^/]+/episodes/)?(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://player.acast.com/600595844cac453f8579eca0/episodes/maciej-konieczny-podatek-medialny-to-mechanizm-kontroli?theme=default&latest=1',
'info_dict': {
'id': '601dc897fb37095537d48e6f',
'ext': 'mp3',
'title': 'Maciej Konieczny: "Podatek medialny to bardziej mechanizm kontroli niż podatkowy”',
'upload_date': '20210208',
'timestamp': 1612764000,
},
}, {
'url': 'https://player.acast.com/5d09057251a90dcf7fa8e985?theme=default&latest=1',
'info_dict': {
'id': '5d09057251a90dcf7fa8e985',
'title': 'DGPtalk: Obiektywnie o biznesie',
},
'playlist_mincount': 5,
}]
@staticmethod
def _extract_urls(webpage, **kw):
return [mobj.group('url')
for mobj in re.finditer(
r'(?x)<iframe\b[^>]+\bsrc=(["\'])(?P<url>%s(?:\?[^#]+)?(?:\#.+?)?)\1' % ACastPlayerIE._VALID_URL,
webpage)]
def _real_extract(self, url):
channel_slug = self._match_id(url)
channel_data = self._download_json(
self._API_BASE_URL + 'channels/%s' % channel_slug, channel_slug)
entries = OnDemandPagedList(functools.partial(
self._fetch_page, channel_slug), self._PAGE_SIZE)
return self.playlist_result(entries, compat_str(
channel_data['id']), channel_data['name'], channel_data.get('description'))
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
data = self._parse_json(
js_to_json(
self._search_regex(
r'(?s)var _global\s*=\s*({.+?});',
webpage, 'podcast data')), display_id)
show = data['show']
players = [{
'id': player['_id'],
'title': player['title'],
'url': player['audio'],
'duration': float_or_none(player.get('duration')),
'timestamp': parse_iso8601(player.get('publishDate')),
'thumbnail': urljoin('https://player.acast.com/', player.get('cover')),
'series': show['title'],
'episode': player['title'],
} for player in data['player']]
if len(players) > 1:
info_dict = {
'_type': 'playlist',
'entries': players,
'id': show['_id'],
'title': show['title'],
'series': show['title'],
}
if show.get('cover'):
info_dict['thumbnails'] = [{
'url': urljoin('https://player.acast.com/', show['cover']['url']),
'filesize': int_or_none(show['cover'].get('size')),
}]
return info_dict
if len(players) == 1:
return players[0]
raise ExtractorError('No podcast episodes found')
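
A minimal sketch of the feeder API calls the refactored extractors above make (URL layout copied from _call_api; Python 3 standard library only):

import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = 'https://feeder.acast.com/api/v1/shows/'

def fetch_episode(channel, display_id):
    # showInfo=true embeds the parent show, so creator/series metadata
    # arrives in the same response as the episode itself
    return json.load(urlopen(API_BASE + '%s/episodes/%s?%s' % (
        channel, display_id, urlencode({'showInfo': 'true'}))))

def fetch_show(show_slug):
    # the show payload already carries an 'episodes' list, which is why
    # ACastChannelIE no longer needs the paginated play-api endpoint
    return json.load(urlopen(API_BASE + show_slug))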


@ -10,6 +10,7 @@ import random
from .common import InfoExtractor
from ..aes import aes_cbc_decrypt
from ..compat import (
compat_HTTPError,
compat_b64decode,
compat_ord,
)
@ -18,11 +19,14 @@ from ..utils import (
bytes_to_long,
ExtractorError,
float_or_none,
int_or_none,
intlist_to_bytes,
long_to_bytes,
pkcs1pad,
strip_or_none,
urljoin,
try_get,
unified_strdate,
urlencode_postdata,
)
@ -31,16 +35,30 @@ class ADNIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?animedigitalnetwork\.fr/video/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://animedigitalnetwork.fr/video/blue-exorcist-kyoto-saga/7778-episode-1-debut-des-hostilites',
'md5': 'e497370d847fd79d9d4c74be55575c7a',
'md5': '0319c99885ff5547565cacb4f3f9348d',
'info_dict': {
'id': '7778',
'ext': 'mp4',
'title': 'Blue Exorcist - Kyôto Saga - Épisode 1',
'title': 'Blue Exorcist - Kyôto Saga - Episode 1',
'description': 'md5:2f7b5aa76edbc1a7a92cedcda8a528d5',
'series': 'Blue Exorcist - Kyôto Saga',
'duration': 1467,
'release_date': '20170106',
'comment_count': int,
'average_rating': float,
'season_number': 2,
'episode': 'Début des hostilités',
'episode_number': 1,
}
}
_NETRC_MACHINE = 'animedigitalnetwork'
_BASE_URL = 'http://animedigitalnetwork.fr'
_RSA_KEY = (0xc35ae1e4356b65a73b551493da94b8cb443491c0aa092a357a5aee57ffc14dda85326f42d716e539a34542a0d3f363adf16c5ec222d713d5997194030ee2e4f0d1fb328c01a81cf6868c090d50de8e169c6b13d1675b9eeed1cbc51e1fffca9b38af07f37abd790924cd3bee59d0257cfda4fe5f3f0534877e21ce5821447d1b, 65537)
_API_BASE_URL = 'https://gw.api.animedigitalnetwork.fr/'
_PLAYER_BASE_URL = _API_BASE_URL + 'player/'
_HEADERS = {}
_LOGIN_ERR_MESSAGE = 'Unable to log in'
_RSA_KEY = (0x9B42B08905199A5CCE2026274399CA560ECB209EE9878A708B1C0812E1BB8CB5D1FB7441861147C1A1F2F3A0476DD63A9CAC20D3E983613346850AA6CB38F16DC7D720FD7D86FC6E5B3D5BBC72E14CD0BF9E869F2CEA2CCAD648F1DCE38F1FF916CEFB2D339B64AA0264372344BC775E265E8A852F88144AB0BD9AA06C1A4ABB, 65537)
_POS_ALIGN_MAP = {
'start': 1,
'end': 3,
@ -54,26 +72,24 @@ class ADNIE(InfoExtractor):
def _ass_subtitles_timecode(seconds):
return '%01d:%02d:%02d.%02d' % (seconds / 3600, (seconds % 3600) / 60, seconds % 60, (seconds % 1) * 100)
def _get_subtitles(self, sub_path, video_id):
if not sub_path:
def _get_subtitles(self, sub_url, video_id):
if not sub_url:
return None
enc_subtitles = self._download_webpage(
urljoin(self._BASE_URL, sub_path),
video_id, 'Downloading subtitles location', fatal=False) or '{}'
sub_url, video_id, 'Downloading subtitles location', fatal=False) or '{}'
subtitle_location = (self._parse_json(enc_subtitles, video_id, fatal=False) or {}).get('location')
if subtitle_location:
enc_subtitles = self._download_webpage(
urljoin(self._BASE_URL, subtitle_location),
video_id, 'Downloading subtitles data', fatal=False,
headers={'Origin': 'https://animedigitalnetwork.fr'})
subtitle_location, video_id, 'Downloading subtitles data',
fatal=False, headers={'Origin': 'https://animedigitalnetwork.fr'})
if not enc_subtitles:
return None
# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
bytes_to_intlist(compat_b64decode(enc_subtitles[24:])),
bytes_to_intlist(binascii.unhexlify(self._K + '4b8ef13ec1872730')),
bytes_to_intlist(binascii.unhexlify(self._K + 'ab9f52f5baae7c72')),
bytes_to_intlist(compat_b64decode(enc_subtitles[:24]))
))
subtitles_json = self._parse_json(
@ -117,61 +133,100 @@ Format: Marked,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text'''
}])
return subtitles
def _real_initialize(self):
username, password = self._get_login_info()
if not username:
return
try:
access_token = (self._download_json(
self._API_BASE_URL + 'authentication/login', None,
'Logging in', self._LOGIN_ERR_MESSAGE, fatal=False,
data=urlencode_postdata({
'password': password,
'rememberMe': False,
'source': 'Web',
'username': username,
})) or {}).get('accessToken')
if access_token:
self._HEADERS = {'authorization': 'Bearer ' + access_token}
except ExtractorError as e:
message = None
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
resp = self._parse_json(
e.cause.read().decode(), None, fatal=False) or {}
message = resp.get('message') or resp.get('code')
self.report_warning(message or self._LOGIN_ERR_MESSAGE)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
player_config = self._parse_json(self._search_regex(
r'playerConfig\s*=\s*({.+});', webpage,
'player config', default='{}'), video_id, fatal=False)
if not player_config:
config_url = urljoin(self._BASE_URL, self._search_regex(
r'(?:id="player"|class="[^"]*adn-player-container[^"]*")[^>]+data-url="([^"]+)"',
webpage, 'config url'))
player_config = self._download_json(
config_url, video_id,
'Downloading player config JSON metadata')['player']
video_base_url = self._PLAYER_BASE_URL + 'video/%s/' % video_id
player = self._download_json(
video_base_url + 'configuration', video_id,
'Downloading player config JSON metadata',
headers=self._HEADERS)['player']
options = player['options']
video_info = {}
video_info_str = self._search_regex(
r'videoInfo\s*=\s*({.+});', webpage,
'video info', fatal=False)
if video_info_str:
video_info = self._parse_json(
video_info_str, video_id, fatal=False) or {}
user = options['user']
if not user.get('hasAccess'):
self.raise_login_required()
options = player_config.get('options') or {}
metas = options.get('metas') or {}
links = player_config.get('links') or {}
sub_path = player_config.get('subtitles')
error = None
if not links:
links_url = player_config.get('linksurl') or options['videoUrl']
token = options['token']
self._K = ''.join([random.choice('0123456789abcdef') for _ in range(16)])
message = bytes_to_intlist(json.dumps({
'k': self._K,
'e': 60,
't': token,
}))
token = self._download_json(
user.get('refreshTokenUrl') or (self._PLAYER_BASE_URL + 'refresh/token'),
video_id, 'Downloading access token', headers={
'x-player-refresh-token': user['refreshToken']
}, data=b'')['token']
links_url = try_get(options, lambda x: x['video']['url']) or (video_base_url + 'link')
self._K = ''.join([random.choice('0123456789abcdef') for _ in range(16)])
message = bytes_to_intlist(json.dumps({
'k': self._K,
't': token,
}))
# Sometimes authentication fails for no apparent reason; retry with
# a different random padding
links_data = None
for _ in range(3):
padded_message = intlist_to_bytes(pkcs1pad(message, 128))
n, e = self._RSA_KEY
encrypted_message = long_to_bytes(pow(bytes_to_long(padded_message), e, n))
authorization = base64.b64encode(encrypted_message).decode()
links_data = self._download_json(
urljoin(self._BASE_URL, links_url), video_id,
'Downloading links JSON metadata', headers={
'Authorization': 'Bearer ' + authorization,
})
links = links_data.get('links') or {}
metas = metas or links_data.get('meta') or {}
sub_path = sub_path or links_data.get('subtitles') or \
'index.php?option=com_vodapi&task=subtitles.getJSON&format=json&id=' + video_id
sub_path += '&token=' + token
error = links_data.get('error')
title = metas.get('title') or video_info['title']
try:
links_data = self._download_json(
links_url, video_id, 'Downloading links JSON metadata', headers={
'X-Player-Token': authorization
}, query={
'freeWithAds': 'true',
'adaptive': 'false',
'withMetadata': 'true',
'source': 'Web'
})
break
except ExtractorError as e:
if not isinstance(e.cause, compat_HTTPError):
raise e
if e.cause.code == 401:
# This usually goes away with a different random pkcs1pad, so retry
continue
error = self._parse_json(e.cause.read(), video_id)
message = error.get('message')
if e.cause.code == 403 and error.get('code') == 'player-bad-geolocation-country':
self.raise_geo_restricted(msg=message)
raise ExtractorError(message)
else:
raise ExtractorError('Giving up retrying')
links = links_data.get('links') or {}
metas = links_data.get('metadata') or {}
sub_url = (links.get('subtitles') or {}).get('all')
video_info = links_data.get('video') or {}
title = metas['title']
formats = []
for format_id, qualities in links.items():
for format_id, qualities in (links.get('streaming') or {}).items():
if not isinstance(qualities, dict):
continue
for quality, load_balancer_url in qualities.items():
@ -189,19 +244,26 @@ Format: Marked,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text'''
for f in m3u8_formats:
f['language'] = 'fr'
formats.extend(m3u8_formats)
if not error:
error = options.get('error')
if not formats and error:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
self._sort_formats(formats)
video = (self._download_json(
self._API_BASE_URL + 'video/%s' % video_id, video_id,
'Downloading additional video metadata', fatal=False) or {}).get('video') or {}
show = video.get('show') or {}
return {
'id': video_id,
'title': title,
'description': strip_or_none(metas.get('summary') or video_info.get('resume')),
'thumbnail': video_info.get('image'),
'description': strip_or_none(metas.get('summary') or video.get('summary')),
'thumbnail': video_info.get('image') or player.get('image'),
'formats': formats,
'subtitles': self.extract_subtitles(sub_path, video_id),
'episode': metas.get('subtitle') or video_info.get('videoTitle'),
'series': video_info.get('playlistTitle'),
'subtitles': self.extract_subtitles(sub_url, video_id),
'episode': metas.get('subtitle') or video.get('name'),
'episode_number': int_or_none(video.get('shortNumber')),
'series': show.get('title'),
'season_number': int_or_none(video.get('season')),
'duration': int_or_none(video_info.get('duration') or video.get('duration')),
'release_date': unified_strdate(video.get('releaseDate')),
'average_rating': float_or_none(video.get('rating') or metas.get('rating')),
'comment_count': int_or_none(video.get('commentsCount')),
}
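
A condensed sketch of the token handshake above; it assumes this repo's utils are importable as haruhi_dl.utils, and the (n, e) pair is the _RSA_KEY constant:

import base64
import json
import random

from haruhi_dl.utils import (  # assumed package name of this fork
    bytes_to_intlist,
    bytes_to_long,
    intlist_to_bytes,
    long_to_bytes,
    pkcs1pad,
)

def build_player_token(refresh_token, rsa_key):
    # 16 random hex chars double as the first half of the AES-CBC key
    # later used to decrypt the subtitles (the second half is hardcoded)
    k = ''.join(random.choice('0123456789abcdef') for _ in range(16))
    message = bytes_to_intlist(json.dumps({'k': k, 't': refresh_token}))
    n, e = rsa_key
    # PKCS#1 pad up to the 128-byte modulus size, then raw RSA: m^e mod n
    padded = intlist_to_bytes(pkcs1pad(message, 128))
    encrypted = long_to_bytes(pow(bytes_to_long(padded), e, n))
    # sent as the X-Player-Token header of the 'link' request; on HTTP 401
    # the extractor simply retries with a fresh random padding
    return k, base64.b64encode(encrypted).decode()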


@ -5,20 +5,32 @@ import re
from .theplatform import ThePlatformIE
from ..utils import (
extract_attributes,
ExtractorError,
GeoRestrictedError,
int_or_none,
smuggle_url,
update_url_query,
)
from ..compat import (
compat_urlparse,
urlencode_postdata,
)
class AENetworksBaseIE(ThePlatformIE):
_BASE_URL_REGEX = r'''(?x)https?://
(?:(?:www|play|watch)\.)?
(?P<domain>
(?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
fyi\.tv
)/'''
_THEPLATFORM_KEY = 'crazyjava'
_THEPLATFORM_SECRET = 's3cr3t'
_DOMAIN_MAP = {
'history.com': ('HISTORY', 'history'),
'aetv.com': ('AETV', 'aetv'),
'mylifetime.com': ('LIFETIME', 'lifetime'),
'lifetimemovieclub.com': ('LIFETIMEMOVIECLUB', 'lmc'),
'fyi.tv': ('FYI', 'fyi'),
'historyvault.com': (None, 'historyvault'),
'biography.com': (None, 'biography'),
}
def _extract_aen_smil(self, smil_url, video_id, auth=None):
query = {'mbr': 'true'}
@ -31,7 +43,7 @@ class AENetworksBaseIE(ThePlatformIE):
'assetTypes': 'high_video_s3'
}, {
'assetTypes': 'high_video_s3',
'switch': 'hls_ingest_fastly'
'switch': 'hls_high_fastly',
}]
formats = []
subtitles = {}
@ -44,6 +56,8 @@ class AENetworksBaseIE(ThePlatformIE):
tp_formats, tp_subtitles = self._extract_theplatform_smil(
m_url, video_id, 'Downloading %s SMIL data' % (q.get('switch') or q['assetTypes']))
except ExtractorError as e:
if isinstance(e, GeoRestrictedError):
raise
last_e = e
continue
formats.extend(tp_formats)
@ -57,24 +71,45 @@ class AENetworksBaseIE(ThePlatformIE):
'subtitles': subtitles,
}
def _extract_aetn_info(self, domain, filter_key, filter_value, url):
requestor_id, brand = self._DOMAIN_MAP[domain]
result = self._download_json(
'https://feeds.video.aetnd.com/api/v2/%s/videos' % brand,
filter_value, query={'filter[%s]' % filter_key: filter_value})['results'][0]
title = result['title']
video_id = result['id']
media_url = result['publicUrl']
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
auth = None
if theplatform_metadata.get('AETN$isBehindWall'):
resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'],
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
theplatform_metadata['ratings'][0]['rating'])
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
info.update(self._extract_aen_smil(media_url, video_id, auth))
info.update({
'title': title,
'series': result.get('seriesName'),
'season_number': int_or_none(result.get('tvSeasonNumber')),
'episode_number': int_or_none(result.get('tvSeasonEpisodeNumber')),
})
return info
class AENetworksIE(AENetworksBaseIE):
IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?P<domain>
(?:history(?:vault)?|aetv|mylifetime|lifetimemovieclub)\.com|
fyi\.tv
)/
(?:
shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|
movies/(?P<movie_display_id>[^/]+)(?:/full-movie)?|
specials/(?P<special_display_id>[^/]+)/(?:full-special|preview-)|
collections/[^/]+/(?P<collection_display_id>[^/]+)
)
'''
_VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'''(?P<id>
shows/[^/]+/season-\d+/episode-\d+|
(?:
(?:movie|special)s/[^/]+|
(?:shows/[^/]+/)?videos
)/[^/?#&]+
)'''
_TESTS = [{
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'info_dict': {
@ -91,22 +126,23 @@ class AENetworksIE(AENetworksBaseIE):
'skip_download': True,
},
'add_ie': ['ThePlatform'],
}, {
'url': 'http://www.history.com/shows/ancient-aliens/season-1',
'info_dict': {
'id': '71889446852',
},
'playlist_mincount': 5,
}, {
'url': 'http://www.mylifetime.com/shows/atlanta-plastic',
'info_dict': {
'id': 'SERIES4317',
'title': 'Atlanta Plastic',
},
'playlist_mincount': 2,
'skip': 'This video is only available for users of participating TV providers.',
}, {
'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
'only_matching': True
'info_dict': {
'id': '600587331957',
'ext': 'mp4',
'title': 'Inlawful Entry',
'description': 'md5:57c12115a2b384d883fe64ca50529e08',
'timestamp': 1452634428,
'upload_date': '20160112',
'uploader': 'AENE-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
}, {
'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
'only_matching': True
@ -117,78 +153,125 @@ class AENetworksIE(AENetworksBaseIE):
'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
'only_matching': True
}, {
'url': 'https://www.lifetimemovieclub.com/movies/a-killer-among-us',
'url': 'https://watch.lifetimemovieclub.com/movies/10-year-reunion/full-movie',
'only_matching': True
}, {
'url': 'http://www.history.com/specials/sniper-into-the-kill-zone/full-special',
'only_matching': True
}, {
'url': 'https://www.historyvault.com/collections/america-the-story-of-us/westward',
'only_matching': True
}, {
'url': 'https://www.aetv.com/specials/hunting-jonbenets-killer-the-untold-story/preview-hunting-jonbenets-killer-the-untold-story',
'only_matching': True
}, {
'url': 'http://www.history.com/videos/history-of-valentines-day',
'only_matching': True
}, {
'url': 'https://play.aetv.com/shows/duck-dynasty/videos/best-of-duck-dynasty-getting-quack-in-shape',
'only_matching': True
}]
_DOMAIN_TO_REQUESTOR_ID = {
'history.com': 'HISTORY',
'aetv.com': 'AETV',
'mylifetime.com': 'LIFETIME',
'lifetimemovieclub.com': 'LIFETIMEMOVIECLUB',
'fyi.tv': 'FYI',
}
def _real_extract(self, url):
domain, show_path, movie_display_id, special_display_id, collection_display_id = re.match(self._VALID_URL, url).groups()
display_id = show_path or movie_display_id or special_display_id or collection_display_id
webpage = self._download_webpage(url, display_id, headers=self.geo_verification_headers())
if show_path:
url_parts = show_path.split('/')
url_parts_len = len(url_parts)
if url_parts_len == 1:
entries = []
for season_url_path in re.findall(r'(?s)<li[^>]+data-href="(/shows/%s/season-\d+)"' % url_parts[0], webpage):
entries.append(self.url_result(
compat_urlparse.urljoin(url, season_url_path), 'AENetworks'))
if entries:
return self.playlist_result(
entries, self._html_search_meta('aetn:SeriesId', webpage),
self._html_search_meta('aetn:SeriesTitle', webpage))
else:
# single season
url_parts_len = 2
if url_parts_len == 2:
entries = []
for episode_item in re.findall(r'(?s)<[^>]+class="[^"]*(?:episode|program)-item[^"]*"[^>]*>', webpage):
episode_attributes = extract_attributes(episode_item)
episode_url = compat_urlparse.urljoin(
url, episode_attributes['data-canonical'])
entries.append(self.url_result(
episode_url, 'AENetworks',
episode_attributes.get('data-videoid') or episode_attributes.get('data-video-id')))
return self.playlist_result(
entries, self._html_search_meta('aetn:SeasonId', webpage))
domain, canonical = re.match(self._VALID_URL, url).groups()
return self._extract_aetn_info(domain, 'canonical', '/' + canonical, url)
video_id = self._html_search_meta('aetn:VideoID', webpage)
media_url = self._search_regex(
[r"media_url\s*=\s*'(?P<url>[^']+)'",
r'data-media-url=(?P<url>(?:https?:)?//[^\s>]+)',
r'data-media-url=(["\'])(?P<url>(?:(?!\1).)+?)\1'],
webpage, 'video url', group='url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'https?://link\.theplatform\.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
auth = None
if theplatform_metadata.get('AETN$isBehindWall'):
requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain]
resource = self._get_mvpd_resource(
requestor_id, theplatform_metadata['title'],
theplatform_metadata.get('AETN$PPL_pplProgramId') or theplatform_metadata.get('AETN$PPL_pplProgramId_OLD'),
theplatform_metadata['ratings'][0]['rating'])
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)
info.update(self._search_json_ld(webpage, video_id, fatal=False))
info.update(self._extract_aen_smil(media_url, video_id, auth))
return info
class AENetworksListBaseIE(AENetworksBaseIE):
def _call_api(self, resource, slug, brand, fields):
return self._download_json(
'https://yoga.appsvcs.aetnd.com/graphql',
slug, query={'brand': brand}, data=urlencode_postdata({
'query': '''{
%s(slug: "%s") {
%s
}
}''' % (resource, slug, fields),
}))['data'][resource]
def _real_extract(self, url):
domain, slug = re.match(self._VALID_URL, url).groups()
_, brand = self._DOMAIN_MAP[domain]
playlist = self._call_api(self._RESOURCE, slug, brand, self._FIELDS)
base_url = 'http://watch.%s' % domain
entries = []
for item in (playlist.get(self._ITEMS_KEY) or []):
doc = self._get_doc(item)
canonical = doc.get('canonical')
if not canonical:
continue
entries.append(self.url_result(
base_url + canonical, AENetworksIE.ie_key(), doc.get('id')))
description = None
if self._PLAYLIST_DESCRIPTION_KEY:
description = playlist.get(self._PLAYLIST_DESCRIPTION_KEY)
return self.playlist_result(
entries, playlist.get('id'),
playlist.get(self._PLAYLIST_TITLE_KEY), description)
class AENetworksCollectionIE(AENetworksListBaseIE):
IE_NAME = 'aenetworks:collection'
_VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'(?:[^/]+/)*(?:list|collections)/(?P<id>[^/?#&]+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'https://watch.historyvault.com/list/america-the-story-of-us',
'info_dict': {
'id': '282',
'title': 'America The Story of Us',
},
'playlist_mincount': 12,
}, {
'url': 'https://watch.historyvault.com/shows/america-the-story-of-us-2/season-1/list/america-the-story-of-us',
'only_matching': True
}, {
'url': 'https://www.historyvault.com/collections/mysteryquest',
'only_matching': True
}]
_RESOURCE = 'list'
_ITEMS_KEY = 'items'
_PLAYLIST_TITLE_KEY = 'display_title'
_PLAYLIST_DESCRIPTION_KEY = None
_FIELDS = '''id
display_title
items {
... on ListVideoItem {
doc {
canonical
id
}
}
}'''
def _get_doc(self, item):
return item.get('doc') or {}
class AENetworksShowIE(AENetworksListBaseIE):
IE_NAME = 'aenetworks:show'
_VALID_URL = AENetworksBaseIE._BASE_URL_REGEX + r'shows/(?P<id>[^/?#&]+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'http://www.history.com/shows/ancient-aliens',
'info_dict': {
'id': 'SERIES1574',
'title': 'Ancient Aliens',
'description': 'md5:3f6d74daf2672ff3ae29ed732e37ea7f',
},
'playlist_mincount': 150,
}]
_RESOURCE = 'series'
_ITEMS_KEY = 'episodes'
_PLAYLIST_TITLE_KEY = 'title'
_PLAYLIST_DESCRIPTION_KEY = 'description'
_FIELDS = '''description
id
title
episodes {
canonical
id
}'''
def _get_doc(self, item):
return item
class HistoryTopicIE(AENetworksBaseIE):
@ -204,6 +287,7 @@ class HistoryTopicIE(AENetworksBaseIE):
'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
'timestamp': 1375819729,
'upload_date': '20130806',
'uploader': 'AENE-NEW',
},
'params': {
# m3u8 download
@ -212,36 +296,47 @@ class HistoryTopicIE(AENetworksBaseIE):
'add_ie': ['ThePlatform'],
}]
def theplatform_url_result(self, theplatform_url, video_id, query):
return {
'_type': 'url_transparent',
'id': video_id,
'url': smuggle_url(
update_url_query(theplatform_url, query),
{
'sig': {
'key': self._THEPLATFORM_KEY,
'secret': self._THEPLATFORM_SECRET,
},
'force_smil_url': True
}),
'ie_key': 'ThePlatform',
}
def _real_extract(self, url):
display_id = self._match_id(url)
return self.url_result(
'http://www.history.com/videos/' + display_id,
AENetworksIE.ie_key())
class HistoryPlayerIE(AENetworksBaseIE):
IE_NAME = 'history:player'
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|biography)\.com)/player/(?P<id>\d+)'
_TESTS = []
def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups()
return self._extract_aetn_info(domain, 'id', video_id, url)
class BiographyIE(AENetworksBaseIE):
_VALID_URL = r'https?://(?:www\.)?biography\.com/video/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.biography.com/video/vincent-van-gogh-full-episode-2075049808',
'info_dict': {
'id': '30322987',
'ext': 'mp4',
'title': 'Vincent Van Gogh - Full Episode',
'description': 'A full biography about the most influential 20th century painter, Vincent Van Gogh.',
'timestamp': 1311970571,
'upload_date': '20110729',
'uploader': 'AENE-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['ThePlatform'],
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'<phoenix-iframe[^>]+src="[^"]+\btpid=(\d+)', webpage, 'tpid')
result = self._download_json(
'https://feeds.video.aetnd.com/api/v2/history/videos',
video_id, query={'filter[id]': video_id})['results'][0]
title = result['title']
info = self._extract_aen_smil(result['publicUrl'], video_id)
info.update({
'title': title,
'description': result.get('description'),
'duration': int_or_none(result.get('duration')),
'timestamp': int_or_none(result.get('added'), 1000),
})
return info
player_url = self._search_regex(
r'<phoenix-iframe[^>]+src="(%s)' % HistoryPlayerIE._VALID_URL,
webpage, 'player URL')
return self.url_result(player_url, HistoryPlayerIE.ie_key())
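
For reference, the GraphQL body that AENetworksListBaseIE._call_api POSTs, rendered here for the show resource with an example slug; it is sent urlencoded as {'query': ...} with the brand as a query-string parameter:

query = '''{
  series(slug: "ancient-aliens") {
    description
    id
    title
    episodes {
      canonical
      id
    }
  }
}'''
# POST https://yoga.appsvcs.aetnd.com/graphql?brand=history
# the response is unwrapped as ['data']['series']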


@ -0,0 +1,253 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import uuid
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
ExtractorError,
)
# this id is not an article id; it has to be extracted from the article
class WyborczaVideoIE(InfoExtractor):
_VALID_URL = r'wyborcza:video:(?P<id>\d+)'
IE_NAME = 'wyborcza:video'
_TESTS = [{
'url': 'wyborcza:video:26207634',
'info_dict': {
'id': '26207634',
'ext': 'mp4',
'title': '- Polska w 2020 r. jest innym państwem niż w 2015 r. Nie zmieniła się konstytucja, ale jest to już inny ustrój - mówi Adam Bodnar',
'description': ' ',
'uploader': 'Dorota Roman',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
meta = self._download_json('https://wyborcza.pl/api-video/%s' % video_id, video_id)
formats = []
base_url = meta['redirector'].replace('http://', 'https://') + meta['basePath']
for quality in ('standard', 'high'):
if not meta['files'].get(quality):
continue
formats.append({
'url': base_url + meta['files'][quality],
'height': int_or_none(
self._search_regex(
r'p(\d+)[a-z]+\.mp4$', meta['files'][quality],
'mp4 video height', default=None)),
'format_id': quality,
})
if meta['files'].get('dash'):
formats.extend(
self._extract_mpd_formats(
base_url + meta['files']['dash'], video_id))
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'title': meta['title'],
'description': meta.get('lead'),
'uploader': meta.get('signature'),
'thumbnail': meta.get('imageUrl'),
'duration': meta.get('duration'),
}
class WyborczaPodcastIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://(?:www\.)?
(?:wyborcza\.pl/podcast(?:/0,172673\.html)?
|wysokieobcasy\.pl/wysokie-obcasy/0,176631\.html)
(?:\?(?:[^&]+?&)*?podcast=(?P<episode_id>\d+))?
'''
_TESTS = [{
'url': 'https://wyborcza.pl/podcast/0,172673.html?podcast=100720#S.main_topic-K.C-B.6-L.1.podcast',
'info_dict': {
'id': '100720',
'ext': 'mp3',
'title': 'Cyfrodziewczyny. Kim były pionierki polskiej informatyki ',
'uploader': 'Michał Nogaś ',
'upload_date': '20210117',
'description': 'md5:49f0a06ffc4c1931210d3ab1416a651d',
},
}, {
'url': 'https://www.wysokieobcasy.pl/wysokie-obcasy/0,176631.html?podcast=100673',
'info_dict': {
'id': '100673',
'ext': 'mp3',
'title': 'Czym jest ubóstwo menstruacyjne i dlaczego dotyczy każdej i każdego z nas?',
'uploader': 'Agnieszka Urazińska ',
'upload_date': '20210115',
'description': 'md5:c161dc035f8dbb60077011fc41274899',
},
}, {
'url': 'https://wyborcza.pl/podcast',
'info_dict': {
'id': '334',
'title': 'Gościnnie w TOK FM: Wyborcza, 8:10',
},
'playlist_mincount': 370,
}, {
'url': 'https://www.wysokieobcasy.pl/wysokie-obcasy/0,176631.html',
'info_dict': {
'id': '395',
'title': 'Gościnnie w TOK FM: Wysokie Obcasy',
},
'playlist_mincount': 12,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
podcast_id = mobj.group('episode_id')
# the URL links to the playlist, not a specific episode
if not podcast_id:
return {
'_type': 'url',
'url': 'tokfm:audition:%s' % ('395' if 'wysokieobcasy.pl/' in url else '334'),
'ie_key': 'TokFMAudition',
}
meta = self._download_json('https://wyborcza.pl/api/podcast?guid=%s%s' % (podcast_id,
'&type=wo' if 'wysokieobcasy.pl/' in url else ''),
podcast_id)
published_date = meta['publishedDate'].split(' ')
upload_date = '%s%s%s' % (published_date[2], {
'stycznia': '01',
'lutego': '02',
'marca': '03',
'kwietnia': '04',
'maja': '05',
'czerwca': '06',
'lipca': '07',
'sierpnia': '08',
'września': '09',
'października': '10',
'listopada': '11',
'grudnia': '12',
}.get(published_date[1]), ('0' + published_date[0])[-2:])
return {
'id': podcast_id,
'title': meta['title'],
'url': meta['url'],
'description': meta.get('description'),
'thumbnail': meta.get('imageUrl'),
'duration': parse_duration(meta.get('duration')),
'uploader': meta.get('author'),
'upload_date': upload_date,
}
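
A worked example of the publishedDate conversion above, assuming the field looks like '17 stycznia 2021' (day, Polish month name in genitive, year), as the indexing implies:

MONTHS = {
    'stycznia': '01', 'lutego': '02', 'marca': '03', 'kwietnia': '04',
    'maja': '05', 'czerwca': '06', 'lipca': '07', 'sierpnia': '08',
    'września': '09', 'października': '10', 'listopada': '11',
    'grudnia': '12',
}

def to_upload_date(published_date):
    # '17 stycznia 2021' -> '20210117'
    day, month, year = published_date.split(' ')[:3]
    return year + MONTHS[month] + day.zfill(2)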
class TokFMPodcastIE(InfoExtractor):
_VALID_URL = r'(?:https?://audycje\.tokfm\.pl/podcast/|tokfm:podcast:)(?P<id>\d+),?'
IE_NAME = 'tokfm:podcast'
_TESTS = [{
'url': 'https://audycje.tokfm.pl/podcast/91275,-Systemowy-rasizm-Czy-zamieszki-w-USA-po-morderstwie-w-Minneapolis-doprowadza-do-zmian-w-sluzbach-panstwowych',
'info_dict': {
'id': '91275',
'ext': 'mp3',
'title': '"Systemowy rasizm." Czy zamieszki w USA po morderstwie w Minneapolis doprowadzą do zmian w służbach państwowych?',
'series': 'Analizy',
},
}]
def _real_extract(self, url):
media_id = self._match_id(url)
metadata = self._download_json(
# the endpoint inexplicably prefixes the podcast id with a literal "3"
# if this breaks, see the endpoint below instead (it returns a lot of redundant data):
# https://api.podcast.radioagora.pl/api4/getPodcasts?podcast_id=100091&with_guests=true&with_leaders_for_mobile=true
'https://audycje.tokfm.pl/getp/3%s' % (media_id),
media_id, 'Downloading podcast metadata')
if len(metadata) == 0:
raise ExtractorError('No such podcast')
metadata = metadata[0]
formats = []
for ext in ('aac', 'mp3'):
url_data = self._download_json(
'https://api.podcast.radioagora.pl/api4/getSongUrl?podcast_id=%s&device_id=%s&ppre=false&audio=%s' % (media_id, uuid.uuid4(), ext),
media_id, 'Downloading podcast %s URL' % ext)
# prevents inserting the mp3 (default) multiple times
if 'link_ssl' in url_data and ('.%s' % ext) in url_data['link_ssl']:
formats.append({
'url': url_data['link_ssl'],
'ext': ext,
})
return {
'id': media_id,
'formats': formats,
'title': metadata['podcast_name'],
'series': metadata.get('series_name'),
'episode': metadata['podcast_name'],
}
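
A sketch of the per-format probing above: the device_id is a throwaway UUID, and a returned link is kept only when its extension matches the requested codec, so the server's mp3 fallback is not added twice:

import uuid

def song_url(media_id, ext):
    # mirrors the getSongUrl call in TokFMPodcastIE
    return ('https://api.podcast.radioagora.pl/api4/getSongUrl'
            '?podcast_id=%s&device_id=%s&ppre=false&audio=%s'
            % (media_id, uuid.uuid4(), ext))

def keep_format(url_data, ext):
    return '.%s' % ext in (url_data.get('link_ssl') or '')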
class TokFMAuditionIE(InfoExtractor):
_VALID_URL = r'(?:https?://audycje\.tokfm\.pl/audycja/|tokfm:audition:)(?P<id>\d+),?'
IE_NAME = 'tokfm:audition'
_TESTS = [{
'url': 'https://audycje.tokfm.pl/audycja/218,Analizy',
'info_dict': {
'id': '218',
'title': 'Analizy',
'series': 'Analizy',
},
'playlist_count': 1635,
}]
def _real_extract(self, url):
audition_id = self._match_id(url)
headers = {
'User-Agent': 'Mozilla/5.0 (Linux; Android 9; Redmi 3S Build/PQ3A.190801.002; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/87.0.4280.101 Mobile Safari/537.36',
}
data = self._download_json(
'https://api.podcast.radioagora.pl/api4/getSeries?series_id=%s' % (audition_id),
audition_id, 'Downloading audition metadata', headers=headers)
if len(data) == 0:
raise ExtractorError('No such audition')
data = data[0]
entries = []
for page in range(0, (int(data['total_podcasts']) // 30) + 1):
podcast_page = False
retries = 0
while retries <= 5 and podcast_page is False:
podcast_page = self._download_json(
'https://api.podcast.radioagora.pl/api4/getPodcasts?series_id=%s&limit=30&offset=%d&with_guests=true&with_leaders_for_mobile=true' % (audition_id, page),
audition_id, 'Downloading podcast list (page #%d%s)' % (
page + 1,
(', try %d' % retries) if retries > 0 else ''),
headers=headers)
retries += 1
if podcast_page is False:
raise ExtractorError('Agora returned an invalid response 5 times in a row', expected=True)
for podcast in podcast_page:
entries.append({
'_type': 'url_transparent',
'url': podcast['podcast_sharing_url'],
'title': podcast['podcast_name'],
'episode': podcast['podcast_name'],
'description': podcast.get('podcast_description'),
'timestamp': int_or_none(podcast.get('podcast_timestamp')),
'series': data['series_name'],
})
return {
'_type': 'playlist',
'id': audition_id,
'title': data['series_name'],
'series': data['series_name'],
'entries': entries,
}


@ -0,0 +1,95 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
int_or_none,
)
from ..compat import compat_urllib_parse_urlencode
class AlbiclaIE(InfoExtractor):
_VALID_URL = r'https?://albicla\.com/[a-zA-Z\d]+/post/(?P<id>\d+)'
_LOGIN_REQUIRED = True
_NETRC_MACHINE = 'albicla'
_TESTS = [{
'url': 'https://albicla.com/PolandDailycom/post/1000270222',
'info_dict': {
'id': '1000270222',
'uploader': 'PolandDailycom',
},
'playlist_count': 1,
'params': {
'username': 'albicla@haruhi.download',
'password': 'fedupwithallthis',
'extract_flat': True,
},
}]
def _login(self):
email, password = self._get_login_info()
if not email:
self.report_warning('No Albicla login data found; use --username and --password or --netrc to provide them')
return
self._download_webpage('https://albicla.com/login', 'login', 'Logging in',
data=bytes(compat_urllib_parse_urlencode({
'email': email,
'pass': password,
'remember': 'remember-me',
'signin': 'zaloguj',
}).encode('utf-8')), headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Origin': 'https://albicla.com',
'Referer': 'https://albicla.com/login',
})
def _real_initialize(self):
self._login()
def _real_extract(self, url):
post_id = self._match_id(url)
webpage = self._download_webpage(url, post_id)
post = re.search(r'''(?xs)
<div\b[^>]+\bclass="post-item">.+?
<p\b[^>]+>@(?P<username>[a-zA-Z\d]+).+?
<span\b[^>]+\bdata-timestamp="(?P<timestamp>\d+)".+?
<div\b[^>]+\bclass="user-post">\s+
<p\b[^>]*>(?P<content>[^<]*)</p>\s+
(?:<div\b[^>]+\bclass="card-full[ ]yt"[^>]*>
<iframe\b[^>]+\bsrc="(?P<yt_url>https?://(?:www\.)?youtube(?:-nocookie)?\.com/embed/[a-zA-Z\d_-]{11})"[^>]*>\s*</iframe>)?
(?:.+?<i\b[^>]+\bclass="fa[ ]fa-comment[^"]*"></i>\s*(?P<comments>\d+)</button>)?
(?:.+?<i\b[^>]+\bclass="fa[ ]fa-retweet"></i>\s*<span[^>]+>\s*(?P<forwards>\d+)</span>)?
(?:.+?<i\b[^>]+\bclass="fa[ ]fa-heart"></i>\s*<span[^>]+>\s*(?P<likes>\d+)</span>)?
''', webpage)
if not post:
raise ExtractorError('Could not extract post content')
content, yt_url, comment_count, repost_count, like_count, uploader, timestamp = post.group('content', 'yt_url', 'comments', 'forwards', 'likes', 'username', 'timestamp')
if not yt_url:
raise ExtractorError('Could not find youtube embed in the post')
return {
'_type': 'playlist',
'id': post_id,
'title': clean_html(content),
'entries': [{
'_type': 'url',
'url': yt_url,
'ie_key': 'Youtube',
}],
'uploader': uploader,
'uploader_url': 'https://albicla.com/%s' % uploader,
'comment_count': int_or_none(comment_count),
'repost_count': int_or_none(repost_count),
'like_count': int_or_none(like_count),
}


@ -1,13 +1,16 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
class AlJazeeraIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?aljazeera\.com/(?:programmes|video)/.*?/(?P<id>[^/]+)\.html'
_VALID_URL = r'https?://(?:www\.)?aljazeera\.com/(?P<type>program/[^/]+|(?:feature|video)s)/\d{4}/\d{1,2}/\d{1,2}/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',
'url': 'https://www.aljazeera.com/program/episode/2014/9/19/deliverance',
'info_dict': {
'id': '3792260579001',
'ext': 'mp4',
@ -20,14 +23,34 @@ class AlJazeeraIE(InfoExtractor):
'add_ie': ['BrightcoveNew'],
'skip': 'Not accessible from Travis CI server',
}, {
'url': 'http://www.aljazeera.com/video/news/2017/05/sierra-leone-709-carat-diamond-auctioned-170511100111930.html',
'url': 'https://www.aljazeera.com/videos/2017/5/11/sierra-leone-709-carat-diamond-to-be-auctioned-off',
'only_matching': True,
}, {
'url': 'https://www.aljazeera.com/features/2017/8/21/transforming-pakistans-buses-into-art',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/665003303001/default_default/index.html?videoId=%s'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
def _real_extract(self, url):
program_name = self._match_id(url)
webpage = self._download_webpage(url, program_name)
brightcove_id = self._search_regex(
r'RenderPagesVideo\(\'(.+?)\'', webpage, 'brightcove id')
return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
post_type, name = re.match(self._VALID_URL, url).groups()
post_type = {
'features': 'post',
'program': 'episode',
'videos': 'video',
}[post_type.split('/')[0]]
video = self._download_json(
'https://www.aljazeera.com/graphql', name, query={
'operationName': 'SingleArticleQuery',
'variables': json.dumps({
'name': name,
'postType': post_type,
}),
}, headers={
'wp-site': 'aje',
})['data']['article']['video']
video_id = video['id']
account_id = video.get('accountId') or '665003303001'
player_id = video.get('playerId') or 'BkeSH5BDb'
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, video_id),
'BrightcoveNew', video_id)
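
To make the new template concrete: with the fallback accountId and playerId hardcoded above, the Brightcove handoff URL for the first test's video id comes out as:

BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
url = BRIGHTCOVE_URL_TEMPLATE % ('665003303001', 'BkeSH5BDb', '3792260579001')
# http://players.brightcove.net/665003303001/BkeSH5BDb_default/index.html?videoId=3792260579001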


@ -0,0 +1,103 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .youtube import YoutubeIE
from .vimeo import VimeoIE
from ..utils import (
int_or_none,
parse_iso8601,
update_url_query,
)
class AmaraIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?amara\.org/(?:\w+/)?videos/(?P<id>\w+)'
_TESTS = [{
# Youtube
'url': 'https://amara.org/en/videos/jVx79ZKGK1ky/info/why-jury-trials-are-becoming-less-common/?tab=video',
'md5': 'ea10daf2b6154b8c1ecf9922aca5e8ae',
'info_dict': {
'id': 'h6ZuVdvYnfE',
'ext': 'mp4',
'title': 'Why jury trials are becoming less common',
'description': 'md5:a61811c319943960b6ab1c23e0cbc2c1',
'thumbnail': r're:^https?://.*\.jpg$',
'subtitles': dict,
'upload_date': '20160813',
'uploader': 'PBS NewsHour',
'uploader_id': 'PBSNewsHour',
'timestamp': 1549639570,
}
}, {
# Vimeo
'url': 'https://amara.org/en/videos/kYkK1VUTWW5I/info/vimeo-at-ces-2011',
'md5': '99392c75fa05d432a8f11df03612195e',
'info_dict': {
'id': '18622084',
'ext': 'mov',
'title': 'Vimeo at CES 2011!',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'subtitles': dict,
'timestamp': 1294763658,
'upload_date': '20110111',
'uploader': 'Sam Morrill',
'uploader_id': 'sammorrill'
}
}, {
# Direct Link
'url': 'https://amara.org/en/videos/s8KL7I3jLmh6/info/the-danger-of-a-single-story/',
'md5': 'd3970f08512738ee60c5807311ff5d3f',
'info_dict': {
'id': 's8KL7I3jLmh6',
'ext': 'mp4',
'title': 'The danger of a single story',
'description': 'md5:d769b31139c3b8bb5be9177f62ea3f23',
'thumbnail': r're:^https?://.*\.jpg$',
'subtitles': dict,
'upload_date': '20091007',
'timestamp': 1254942511,
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
meta = self._download_json(
'https://amara.org/api/videos/%s/' % video_id,
video_id, query={'format': 'json'})
title = meta['title']
video_url = meta['all_urls'][0]
subtitles = {}
for language in (meta.get('languages') or []):
subtitles_uri = language.get('subtitles_uri')
if not (subtitles_uri and language.get('published')):
continue
subtitle = subtitles.setdefault(language.get('code') or 'en', [])
for f in ('json', 'srt', 'vtt'):
subtitle.append({
'ext': f,
'url': update_url_query(subtitles_uri, {'format': f}),
})
info = {
'url': video_url,
'id': video_id,
'subtitles': subtitles,
'title': title,
'description': meta.get('description'),
'thumbnail': meta.get('thumbnail'),
'duration': int_or_none(meta.get('duration')),
'timestamp': parse_iso8601(meta.get('created')),
}
for ie in (YoutubeIE, VimeoIE):
if ie.suitable(video_url):
info.update({
'_type': 'url_transparent',
'ie_key': ie.ie_key(),
})
break
return info
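
A small sketch of the subtitle fan-out above, assuming this repo's utils are importable as haruhi_dl.utils; every published language expands into json/srt/vtt variants of its subtitles_uri (the URL below is illustrative):

from haruhi_dl.utils import update_url_query  # assumed package name

def subtitle_variants(subtitles_uri):
    return [{
        'ext': f,
        'url': update_url_query(subtitles_uri, {'format': f}),
    } for f in ('json', 'srt', 'vtt')]

# subtitle_variants('https://amara.org/api/videos/s8KL7I3jLmh6/languages/en/subtitles/')
# -> three entries ending in ?format=json, ?format=srt and ?format=vtt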


@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .theplatform import ThePlatformIE
from ..utils import (
int_or_none,
@ -11,25 +13,22 @@ from ..utils import (
class AMCNetworksIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|(?:we|sundance)tv)\.com/(?:movies|shows(?:/[^/]+)+)/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?(?P<site>amc|bbcamerica|ifc|(?:we|sundance)tv)\.com/(?P<id>(?:movies|shows(?:/[^/]+)+)/[^/?#&]+)'
_TESTS = [{
'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1',
'md5': '',
'url': 'https://www.bbcamerica.com/shows/the-graham-norton-show/videos/tina-feys-adorable-airline-themed-family-dinner--51631',
'info_dict': {
'id': 's3MX01Nl4vPH',
'id': '4Lq1dzOnZGt0',
'ext': 'mp4',
'title': 'Maron - Season 4 - Step 1',
'description': 'In denial about his current situation, Marc is reluctantly convinced by his friends to enter rehab. Starring Marc Maron and Constance Zimmer.',
'age_limit': 17,
'upload_date': '20160505',
'timestamp': 1462468831,
'title': "The Graham Norton Show - Season 28 - Tina Fey's Adorable Airline-Themed Family Dinner",
'description': "It turns out child stewardesses are very generous with the wine! All-new episodes of 'The Graham Norton Show' premiere Fridays at 11/10c on BBC America.",
'upload_date': '20201120',
'timestamp': 1605904350,
'uploader': 'AMCN',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Requires TV provider accounts',
}, {
'url': 'http://www.bbcamerica.com/shows/the-hunt/full-episodes/season-1/episode-01-the-hardest-challenge',
'only_matching': True,
@ -55,32 +54,34 @@ class AMCNetworksIE(ThePlatformIE):
'url': 'https://www.sundancetv.com/shows/riviera/full-episodes/season-1/episode-01-episode-1',
'only_matching': True,
}]
_REQUESTOR_ID_MAP = {
'amc': 'AMC',
'bbcamerica': 'BBCA',
'ifc': 'IFC',
'sundancetv': 'SUNDANCE',
'wetv': 'WETV',
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
site, display_id = re.match(self._VALID_URL, url).groups()
requestor_id = self._REQUESTOR_ID_MAP[site]
properties = self._download_json(
'https://content-delivery-gw.svc.ds.amcn.com/api/v2/content/amcn/%s/url/%s' % (requestor_id.lower(), display_id),
display_id)['data']['properties']
query = {
'mbr': 'true',
'manifest': 'm3u',
}
media_url = self._search_regex(
r'window\.platformLinkURL\s*=\s*[\'"]([^\'"]+)',
webpage, 'media url')
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
r'link\.theplatform\.com/s/([^?]+)',
media_url, 'theplatform_path'), display_id)
tp_path = 'M_UwQC/media/' + properties['videoPid']
media_url = 'https://link.theplatform.com/s/' + tp_path
theplatform_metadata = self._download_theplatform_metadata(tp_path, display_id)
info = self._parse_theplatform_metadata(theplatform_metadata)
video_id = theplatform_metadata['pid']
title = theplatform_metadata['title']
rating = try_get(
theplatform_metadata, lambda x: x['ratings'][0]['rating'])
auth_required = self._search_regex(
r'window\.authRequired\s*=\s*(true|false);',
webpage, 'auth required')
if auth_required == 'true':
requestor_id = self._search_regex(
r'window\.requestor_id\s*=\s*[\'"]([^\'"]+)',
webpage, 'requestor id')
video_category = properties.get('videoCategory')
if video_category and video_category.endswith('-Auth'):
resource = self._get_mvpd_resource(
requestor_id, title, video_id, rating)
query['auth'] = self._extract_mvpd_auth(


@ -1,82 +1,159 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
int_or_none,
js_to_json,
try_get,
unified_strdate,
unified_timestamp,
)
class AmericasTestKitchenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?(?:americastestkitchen|cooks(?:country|illustrated))\.com/(?P<resource_type>episode|videos)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers',
'md5': 'b861c3e365ac38ad319cfd509c30577f',
'info_dict': {
'id': '5b400b9ee338f922cb06450c',
'title': 'Weeknight Japanese Suppers',
'title': 'Japanese Suppers',
'ext': 'mp4',
'description': 'md5:3d0c1a44bb3b27607ce82652db25b4a8',
'description': 'md5:64e606bfee910627efc4b5f050de92b3',
'thumbnail': r're:^https?://',
'timestamp': 1523664000,
'upload_date': '20180414',
'release_date': '20180414',
'timestamp': 1523318400,
'upload_date': '20180410',
'release_date': '20180410',
'series': "America's Test Kitchen",
'season_number': 18,
'episode': 'Weeknight Japanese Suppers',
'episode': 'Japanese Suppers',
'episode_number': 15,
},
'params': {
'skip_download': True,
},
}, {
# Metadata parsing behaves differently for newer episodes (705) as opposed to older episodes (582 above)
'url': 'https://www.americastestkitchen.com/episode/705-simple-chicken-dinner',
'md5': '06451608c57651e985a498e69cec17e5',
'info_dict': {
'id': '5fbe8c61bda2010001c6763b',
'title': 'Simple Chicken Dinner',
'ext': 'mp4',
'description': 'md5:eb68737cc2fd4c26ca7db30139d109e7',
'thumbnail': r're:^https?://',
'timestamp': 1610755200,
'upload_date': '20210116',
'release_date': '20210116',
'series': "America's Test Kitchen",
'season_number': 21,
'episode': 'Simple Chicken Dinner',
'episode_number': 3,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.americastestkitchen.com/videos/3420-pan-seared-salmon',
'only_matching': True,
}, {
'url': 'https://www.cookscountry.com/episode/564-when-only-chocolate-will-do',
'only_matching': True,
}, {
'url': 'https://www.cooksillustrated.com/videos/4478-beef-wellington',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
resource_type, video_id = re.match(self._VALID_URL, url).groups()
is_episode = resource_type == 'episode'
if is_episode:
resource_type = 'episodes'
webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(
self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
webpage, 'initial context'),
video_id, js_to_json)
ep_data = try_get(
video_data,
(lambda x: x['episodeDetail']['content']['data'],
lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {})
zype_id = ep_data.get('zype_id') or ep_meta['zype_id']
title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get(
'description') or ep_meta.get('description'))
thumbnail = try_get(ep_meta, lambda x: x['photo']['image_url'])
release_date = unified_strdate(ep_data.get('aired_at'))
season_number = int_or_none(ep_meta.get('season_number'))
episode = ep_meta.get('title')
episode_number = int_or_none(ep_meta.get('episode_number'))
resource = self._download_json(
'https://www.americastestkitchen.com/api/v6/%s/%s' % (resource_type, video_id), video_id)
video = resource['video'] if is_episode else resource
episode = resource if is_episode else resource.get('episode') or {}
return {
'_type': 'url_transparent',
'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id,
'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % video['zypeId'],
'ie_key': 'Zype',
'title': title,
'description': description,
'thumbnail': thumbnail,
'release_date': release_date,
'series': "America's Test Kitchen",
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
'description': clean_html(video.get('description')),
'timestamp': unified_timestamp(video.get('publishDate')),
'release_date': unified_strdate(video.get('publishDate')),
'episode_number': int_or_none(episode.get('number')),
'season_number': int_or_none(episode.get('season')),
'series': try_get(episode, lambda x: x['show']['title']),
'episode': episode.get('title'),
}
class AmericasTestKitchenSeasonIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?P<show>americastestkitchen|cookscountry)\.com/episodes/browse/season_(?P<id>\d+)'
_TESTS = [{
# ATK Season
'url': 'https://www.americastestkitchen.com/episodes/browse/season_1',
'info_dict': {
'id': 'season_1',
'title': 'Season 1',
},
'playlist_count': 13,
}, {
# Cooks Country Season
'url': 'https://www.cookscountry.com/episodes/browse/season_12',
'info_dict': {
'id': 'season_12',
'title': 'Season 12',
},
'playlist_count': 13,
}]
def _real_extract(self, url):
show_name, season_number = re.match(self._VALID_URL, url).groups()
season_number = int(season_number)
slug = 'atk' if show_name == 'americastestkitchen' else 'cco'
season = 'Season %d' % season_number
season_search = self._download_json(
'https://y1fnzxui30-dsn.algolia.net/1/indexes/everest_search_%s_season_desc_production' % slug,
season, headers={
'Origin': 'https://www.%s.com' % show_name,
'X-Algolia-API-Key': '8d504d0099ed27c1b73708d22871d805',
'X-Algolia-Application-Id': 'Y1FNZXUI30',
}, query={
'facetFilters': json.dumps([
'search_season_list:' + season,
'search_document_klass:episode',
'search_show_slug:' + slug,
]),
'attributesToRetrieve': 'description,search_%s_episode_number,search_document_date,search_url,title' % slug,
'attributesToHighlight': '',
'hitsPerPage': 1000,
})
def entries():
for episode in (season_search.get('hits') or []):
search_url = episode.get('search_url')
if not search_url:
continue
yield {
'_type': 'url',
'url': 'https://www.%s.com%s' % (show_name, search_url),
'id': try_get(episode, lambda e: e['objectID'].split('_')[-1]),
'title': episode.get('title'),
'description': episode.get('description'),
'timestamp': unified_timestamp(episode.get('search_document_date')),
'season_number': season_number,
'episode_number': int_or_none(episode.get('search_%s_episode_number' % slug)),
'ie_key': AmericasTestKitchenIE.ie_key(),
}
return self.playlist_result(
entries(), 'season_%d' % season_number, season)
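
For clarity, the Algolia request the season extractor builds, shown for an ATK season; the application id and the search-only API key are the public ones embedded in the code above:

import json

slug, season = 'atk', 'Season 1'
params = {
    'facetFilters': json.dumps([
        'search_season_list:' + season,
        'search_document_klass:episode',
        'search_show_slug:' + slug,
    ]),
    'attributesToRetrieve': 'description,search_%s_episode_number,search_document_date,search_url,title' % slug,
    'attributesToHighlight': '',
    'hitsPerPage': 1000,
}
# GET https://y1fnzxui30-dsn.algolia.net/1/indexes/everest_search_atk_season_desc_production
# with the Origin, X-Algolia-API-Key and X-Algolia-Application-Id headers as above;
# each hit's search_url is then turned back into an episode page URL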


@ -8,6 +8,7 @@ from ..utils import (
int_or_none,
mimetype2ext,
parse_iso8601,
unified_timestamp,
url_or_none,
)
@ -88,7 +89,7 @@ class AMPIE(InfoExtractor):
self._sort_formats(formats)
timestamp = parse_iso8601(item.get('pubDate'), ' ') or parse_iso8601(item.get('dc-date'))
timestamp = unified_timestamp(item.get('pubDate'), ' ') or parse_iso8601(item.get('dc-date'))
return {
'id': video_id,


@ -116,8 +116,6 @@ class AnimeOnDemandIE(InfoExtractor):
r'(?s)<div[^>]+itemprop="description"[^>]*>(.+?)</div>',
webpage, 'anime description', default=None)
entries = []
def extract_info(html, video_id, num=None):
title, description = [None] * 2
formats = []
@ -233,7 +231,7 @@ class AnimeOnDemandIE(InfoExtractor):
self._sort_formats(info['formats'])
f = common_info.copy()
f.update(info)
entries.append(f)
yield f
# Extract teaser/trailer only when full episode is not available
if not info['formats']:
@ -247,7 +245,7 @@ class AnimeOnDemandIE(InfoExtractor):
'title': m.group('title'),
'url': urljoin(url, m.group('href')),
})
entries.append(f)
yield f
def extract_episodes(html):
for num, episode_html in enumerate(re.findall(
@ -275,7 +273,8 @@ class AnimeOnDemandIE(InfoExtractor):
'episode_number': episode_number,
}
extract_entries(episode_html, video_id, common_info)
for e in extract_entries(episode_html, video_id, common_info):
yield e
def extract_film(html, video_id):
common_info = {
@ -283,11 +282,18 @@ class AnimeOnDemandIE(InfoExtractor):
'title': anime_title,
'description': anime_description,
}
extract_entries(html, video_id, common_info)
for e in extract_entries(html, video_id, common_info):
yield e
extract_episodes(webpage)
def entries():
has_episodes = False
for e in extract_episodes(webpage):
has_episodes = True
yield e
if not entries:
extract_film(webpage, anime_id)
if not has_episodes:
for e in extract_film(webpage, anime_id):
yield e
return self.playlist_result(entries, anime_id, anime_title, anime_description)
return self.playlist_result(
entries(), anime_id, anime_title, anime_description)
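
The hunks above replace list building with generators; the has_episodes flag lets the film fallback run only when the episode source yielded nothing, without iterating it twice. The same pattern in isolation (all names here are illustrative):

def entries(episode_source, film_source):
    # Lazily forward episodes; fall back to the film only if none came.
    has_episodes = False
    for e in episode_source:
        has_episodes = True
        yield e
    if not has_episodes:
        for e in film_source:
            yield e

print(list(entries(iter([]), iter(['film']))))               # ['film']
print(list(entries(iter(['ep1', 'ep2']), iter(['film']))))   # ['ep1', 'ep2']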


@ -116,7 +116,76 @@ class AnvatoIE(InfoExtractor):
'anvato_scripps_app_ios_prod_409c41960c60b308db43c3cc1da79cab9f1c3d93': 'WPxj5GraLTkYCyj3M7RozLqIycjrXOEcDGFMIJPn',
'EZqvRyKBJLrgpClDPDF8I7Xpdp40Vx73': '4OxGd2dEakylntVKjKF0UK9PDPYB6A9W',
'M2v78QkpleXm9hPp9jUXI63x5vA6BogR': 'ka6K32k7ZALmpINkjJUGUo0OE42Md1BQ',
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ'
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ',
'X8POa4zPPaKVZHqmWjuEzfP31b1QM9VN': 'Dn5vOY9ooDw7VSl9qztjZI5o0g08mA0z',
'M2v78QkBMpNJlSPp9diX5F2PBmBy6Bog': 'ka6K32kyo7nDZfNkjQCGWf1lpApXMd1B',
'bvJ0dQpav07l0hG5JgfVLF2dv1vARwpP': 'BzoQW24GrJZoJfmNodiJKSPeB9B8NOxj',
'lxQMLg2XZKuEZaWgsqubBxV9INZ6bryY': 'Vm2Mx6noKds9jB71h6urazwlTG3m9x8l',
'04EnjvXeoSmkbJ9ckPs7oY0mcxv7PlyN': 'aXERQP9LMfQVlEDsgGs6eEA1SWznAQ8P',
'mQbO2ge6BFRWVPYCYpU06YvNt80XLvAX': 'E2BV1NGmasN5v7eujECVPJgwflnLPm2A',
'g43oeBzJrCml7o6fa5fRL1ErCdeD8z4K': 'RX34mZ6zVH4Nr6whbxIGLv9WSbxEKo8V',
'VQrDJoP7mtdBzkxhXbSPwGB1coeElk4x': 'j2VejQx0VFKQepAF7dI0mJLKtOVJE18z',
'WxA5NzLRjCrmq0NUgaU5pdMDuZO7RJ4w': 'lyY5ADLKaIOLEgAsGQCveEMAcqnx3rY9',
'M4lpMXB71ie0PjMCjdFzVXq0SeRVqz49': 'n2zVkOqaLIv3GbLfBjcwW51LcveWOZ2e',
'dyDZGEqN8u8nkJZcJns0oxYmtP7KbGAn': 'VXOEqQW9BtEVLajfZQSLEqxgS5B7qn2D',
'E7QNjrVY5u5mGvgu67IoDgV1CjEND8QR': 'rz8AaDmdKIkLmPNhB5ILPJnjS5PnlL8d',
'a4zrqjoKlfzg0dwHEWtP31VqcLBpjm4g': 'LY9J16gwETdGWa3hjBu5o0RzuoQDjqXQ',
'dQP5BZroMsMVLO1hbmT5r2Enu86GjxA6': '7XR3oOdbPF6x3PRFLDCq9RkgsRjAo48V',
'M4lKNBO1NFe0PjMCj1tzVXq0SeRVqzA9': 'n2zoRqGLRUv3GbLfBmTwW51LcveWOZYe',
'nAZ7MZdpGCGg1pqFEbsoJOz2C60mv143': 'dYJgdqA9aT4yojETqGi7yNgoFADxqmXP',
'3y1MERYgOuE9NzbFgwhV6Wv2F0YKvbyz': '081xpZDQgC4VadLTavhWQxrku56DAgXV',
'bmQvmEXr5HWklBMCZOcpE2Z3HBYwqGyl': 'zxXPbVNyMiMAZldhr9FkOmA0fl4aKr2v',
'wA7oDNYldfr6050Hwxi52lPZiVlB86Ap': 'ZYK16aA7ni0d3l3c34uwpxD7CbReMm8Q',
'g43MbKMWmFml7o7sJoSRkXxZiXRvJ3QK': 'RX3oBJonvs4Nr6rUWBCGn3matRGqJPXV',
'mA9VdlqpLS0raGaSDvtoqNrBTzb8XY4q': '0XN4OjBD3fnW7r7IbmtJB4AyfOmlrE2r',
'mAajOwgkGt17oGoFmEuklMP9H0GnW54d': 'lXbBLPGyzikNGeGujAuAJGjZiwLRxyXR',
'vy8vjJ9kbUwrRqRu59Cj5dWZfzYErlAb': 'K8l7gpwaGcBpnAnCLNCmPZRdin3eaQX0',
'xQMWBpR8oHEZaWaSMGUb0avOHjLVYn4Y': 'm2MrN4vEaf9jB7BFy5Srb40jTrN67AYl',
'xyKEmVO3miRr6D6UVkt7oB8jtD6aJEAv': 'g2ddDebqDfqdgKgswyUKwGjbTWwzq923',
'7Qk0wa2D9FjKapacoJF27aLvUDKkLGA0': 'b2kgBEkephJaMkMTL7s1PLe4Ua6WyP2P',
'3QLg6nqmNTJ5VvVTo7f508LPidz1xwyY': 'g2L1GgpraipmAOAUqmIbBnPxHOmw4MYa',
'3y1B7zZjXTE9NZNSzZSVNPZaTNLjo6Qz': '081b5G6wzH4VagaURmcWbN5mT4JGEe2V',
'lAqnwvkw6SG6D8DSqmUg6DRLUp0w3G4x': 'O2pbP0xPDFNJjpjIEvcdryOJtpkVM4X5',
'awA7xd1N0Hr6050Hw2c52lPZiVlB864p': 'GZYKpn4aoT0d3l3c3PiwpxD7CbReMmXQ',
'jQVqPLl9YHL1WGWtR1HDgWBGT63qRNyV': '6X03ne6vrU4oWyWUN7tQVoajikxJR3Ye',
'GQRMR8mL7uZK797t7xH3eNzPIP5dOny1': 'm2vqPWGd4U31zWzSyasDRAoMT1PKRp8o',
'zydq9RdmRhXLkNkfNoTJlMzaF0lWekQB': '3X7LnvE7vH5nkEkSqLiey793Un7dLB8e',
'VQrDzwkB2IdBzjzu9MHPbEYkSB50gR4x': 'j2VebLzoKUKQeEesmVh0gM1eIp9jKz8z',
'mAa2wMamBs17oGoFmktklMP9H0GnW54d': 'lXbgP74xZTkNGeGujVUAJGjZiwLRxy8R',
'7yjB6ZLG6sW8R6RF2xcan1KGfJ5dNoyd': 'wXQkPorvPHZ45N5t4Jf6qwg5Tp4xvw29',
'a4zPpNeWGuzg0m0iX3tPeanGSkRKWXQg': 'LY9oa3QAyHdGW9Wu3Ri5JGeEik7l1N8Q',
'k2rneA2M38k25cXDwwSknTJlxPxQLZ6M': '61lyA2aEVDzklfdwmmh31saPxQx2VRjp',
'bK9Zk4OvPnvxduLgxvi8VUeojnjA02eV': 'o5jANYjbeMb4nfBaQvcLAt1jzLzYx6ze',
'5VD6EydM3R9orHmNMGInGCJwbxbQvGRw': 'w3zjmX7g4vnxzCxElvUEOiewkokXprkZ',
'70X35QbVYVYNPUmP9YfbzI06YqYQk2R1': 'vG4Aj2BMjMjoztB7zeFOnCVPJpJ8lMOa',
'26qYwQVG9p1Bks2GgBckjfDJOXOAMgG1': 'r4ev9X0mv5zqJc0yk5IBDcQOwZw8mnwQ',
'rvVKpA56MBXWlSxMw3cobT5pdkd4Dm7q': '1J7ZkY53pZ645c93owcLZuveE7E8B3rL',
'qN1zdy1zlYL23IWZGWtDvfV6WeWQWkJo': 'qN1zdy1zlYL23IWZGWtDvfV6WeWQWkJo',
'jdKqRGF16dKsBviMDae7IGDl7oTjEbVV': 'Q09l7vhlNxPFErIOK6BVCe7KnwUW5DVV',
'3QLkogW1OUJ5VvPsrDH56DY2u7lgZWyY': 'g2LRE1V9espmAOPhE4ubj4ZdUA57yDXa',
'wyJvWbXGBSdbkEzhv0CW8meou82aqRy8': 'M2wolPvyBIpQGkbT4juedD4ruzQGdK2y',
'7QkdZrzEkFjKap6IYDU2PB0oCNZORmA0': 'b2kN1l96qhJaMkPs9dt1lpjBfwqZoA8P',
'pvA05113MHG1w3JTYxc6DVlRCjErVz4O': 'gQXeAbblBUnDJ7vujbHvbRd1cxlz3AXO',
'mA9blJDZwT0raG1cvkuoeVjLC7ZWd54q': '0XN9jRPwMHnW7rvumgfJZOD9CJgVkWYr',
'5QwRN5qKJTvGKlDTmnf7xwNZcjRmvEy9': 'R2GP6LWBJU1QlnytwGt0B9pytWwAdDYy',
'eyn5rPPbkfw2KYxH32fG1q58CbLJzM40': 'p2gyqooZnS56JWeiDgfmOy1VugOQEBXn',
'3BABn3b5RfPJGDwilbHe7l82uBoR05Am': '7OYZG7KMVhbPdKJS3xcWEN3AuDlLNmXj',
'xA5zNGXD3HrmqMlF6OS5pdMDuZO7RJ4w': 'yY5DAm6r1IOLE3BCVMFveEMAcqnx3r29',
'g43PgW3JZfml7o6fDEURL1ErCdeD8zyK': 'RX3aQn1zrS4Nr6whDgCGLv9WSbxEKo2V',
'lAqp8WbGgiG6D8LTKJcg3O72CDdre1Qx': 'O2pnm6473HNJjpKuVosd3vVeh975yrX5',
'wyJbYEDxKSdbkJ6S6RhW8meou82aqRy8': 'M2wPm7EgRSpQGlAh70CedD4ruzQGdKYy',
'M4lgW28nLCe0PVdtaXszVXq0SeRVqzA9': 'n2zmJvg4jHv3G0ETNgiwW51LcveWOZ8e',
'5Qw3OVvp9FvGKlDTmOC7xwNZcjRmvEQ9': 'R2GzDdml9F1Qlnytw9s0B9pytWwAdD8y',
'vy8a98X7zCwrRqbHrLUjYzwDiK2b70Qb': 'K8lVwzyjZiBpnAaSGeUmnAgxuGOBxmY0',
'g4eGjJLLoiqRD3Pf9oT5O03LuNbLRDQp': '6XqD59zzpfN4EwQuaGt67qNpSyRBlnYy',
'g43OPp9boIml7o6fDOIRL1ErCdeD8z4K': 'RX33alNB4s4Nr6whDPUGLv9WSbxEKoXV',
'xA2ng9OkBcGKzDbTkKsJlx7dUK8R3dA5': 'z2aPnJvzBfObkwGC3vFaPxeBhxoMqZ8K',
'xyKEgBajZuRr6DEC0Kt7XpD1cnNW9gAv': 'g2ddlEBvRsqdgKaI4jUK9PrgfMexGZ23',
'BAogww51jIMa2JnH1BcYpXM5F658RNAL': 'rYWDmm0KptlkGv4FGJFMdZmjs9RDE6XR',
'BAokpg62VtMa2JnH1mHYpXM5F658RNAL': 'rYWryDnlNslkGv4FG4HMdZmjs9RDE62R',
'a4z1Px5e2hzg0m0iMMCPeanGSkRKWXAg': 'LY9eorNQGUdGW9WuKKf5JGeEik7l1NYQ',
'kAx69R58kF9nY5YcdecJdl2pFXP53WyX': 'gXyRxELpbfPvLeLSaRil0mp6UEzbZJ8L',
'BAoY13nwViMa2J2uo2cY6BlETgmdwryL': 'rYWwKzJmNFlkGvGtNoUM9bzwIJVzB1YR',
}
_MCP_TO_ACCESS_KEY_TABLE = {
@ -189,19 +258,17 @@ class AnvatoIE(InfoExtractor):
video_data_url += '&X-Anvato-Adst-Auth=' + base64.b64encode(auth_secret).decode('ascii')
anvrid = md5_text(time.time() * 1000 * random.random())[:30]
payload = {
'api': {
'anvrid': anvrid,
'anvstk': md5_text('%s|%s|%d|%s' % (
access_key, anvrid, server_time,
self._ANVACK_TABLE.get(access_key, self._API_KEY))),
'anvts': server_time,
},
api = {
'anvrid': anvrid,
'anvts': server_time,
}
api['anvstk'] = md5_text('%s|%s|%d|%s' % (
access_key, anvrid, server_time,
self._ANVACK_TABLE.get(access_key, self._API_KEY)))
return self._download_json(
video_data_url, video_id, transform_source=strip_jsonp,
data=json.dumps(payload).encode('utf-8'))
data=json.dumps({'api': api}).encode('utf-8'))
def _get_anvato_videos(self, access_key, video_id):
video_data = self._get_video_json(access_key, video_id)
@ -259,7 +326,7 @@ class AnvatoIE(InfoExtractor):
'description': video_data.get('def_description'),
'tags': video_data.get('def_tags', '').split(','),
'categories': video_data.get('categories'),
'thumbnail': video_data.get('thumbnail'),
'thumbnail': video_data.get('src_image_url') or video_data.get('thumbnail'),
'timestamp': int_or_none(video_data.get(
'ts_published') or video_data.get('ts_added')),
'uploader': video_data.get('mcp_id'),
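
The payload hunk above flattens the nested dict but keeps the signing scheme: anvstk is an MD5 over pipe-joined request fields. A standalone sketch, with the per-access-key secret from _ANVACK_TABLE replaced by a placeholder:

import hashlib
import random
import time

def md5_text(text):
    return hashlib.md5(str(text).encode('utf-8')).hexdigest()

def build_api_payload(access_key, server_time, secret='PLACEHOLDER'):
    # anvrid: 30-char pseudo-random request id, as in the diff above.
    anvrid = md5_text(time.time() * 1000 * random.random())[:30]
    return {
        'api': {
            'anvrid': anvrid,
            'anvts': server_time,
            'anvstk': md5_text('%s|%s|%d|%s' % (
                access_key, anvrid, server_time, secret)),
        },
    }

print(build_api_payload('some-access-key', 1624300000))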


@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .yahoo import YahooIE
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
@ -15,9 +15,9 @@ from ..utils import (
)
class AolIE(InfoExtractor):
class AolIE(YahooIE):
IE_NAME = 'aol.com'
_VALID_URL = r'(?:aol-video:|https?://(?:www\.)?aol\.(?:com|ca|co\.uk|de|jp)/video/(?:[^/]+/)*)(?P<id>[0-9a-f]+)'
_VALID_URL = r'(?:aol-video:|https?://(?:www\.)?aol\.(?:com|ca|co\.uk|de|jp)/video/(?:[^/]+/)*)(?P<id>\d{9}|[0-9a-f]{24}|[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12})'
_TESTS = [{
# video with 5min ID
@ -76,10 +76,16 @@ class AolIE(InfoExtractor):
}, {
'url': 'https://www.aol.jp/video/playlist/5a28e936a1334d000137da0c/5a28f3151e642219fde19831/',
'only_matching': True,
}, {
# Yahoo video
'url': 'https://www.aol.com/video/play/991e6700-ac02-11ea-99ff-357400036f61/24bbc846-3e30-3c46-915e-fe8ccd7fcc46/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
if '-' in video_id:
return self._extract_yahoo_video(video_id, 'us')
response = self._download_json(
'https://feedapi.b2c.on.aol.com/v1.0/app/videos/aolon/%s/details' % video_id,
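
The widened _VALID_URL above admits three id shapes, and _real_extract routes only the dashed UUID form to the Yahoo extractor. A quick sketch of that dispatch (the first two ids are illustrative; the UUID is from the new test):

import re

ID_RE = re.compile(
    r'^(?:\d{9}|[0-9a-f]{24}|'
    r'[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12})$')

for video_id in ('518167793',                    # 9-digit legacy id
                 '5707d6b8e4b090497b04f706',     # 24-hex id
                 '991e6700-ac02-11ea-99ff-357400036f61'):  # Yahoo UUID
    assert ID_RE.match(video_id)
    print(video_id, '->', 'yahoo' if '-' in video_id else 'aol feed API')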


@ -6,25 +6,21 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
js_to_json,
int_or_none,
url_or_none,
)
class APAIE(InfoExtractor):
_VALID_URL = r'https?://[^/]+\.apa\.at/embed/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_VALID_URL = r'(?P<base_url>https?://[^/]+\.apa\.at)/embed/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{
'url': 'http://uvp.apa.at/embed/293f6d17-692a-44e3-9fd5-7b178f3a1029',
'md5': '2b12292faeb0a7d930c778c7a5b4759b',
'info_dict': {
'id': 'jjv85FdZ',
'id': '293f6d17-692a-44e3-9fd5-7b178f3a1029',
'ext': 'mp4',
'title': '"Blau ist mysteriös": Die Blue Man Group im Interview',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'title': '293f6d17-692a-44e3-9fd5-7b178f3a1029',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 254,
'timestamp': 1519211149,
'upload_date': '20180221',
},
}, {
'url': 'https://uvp-apapublisher.sf.apa.at/embed/2f94e9e6-d945-4db2-9548-f9a41ebf7b78',
@ -38,7 +34,7 @@ class APAIE(InfoExtractor):
}]
@staticmethod
def _extract_urls(webpage):
def _extract_urls(webpage, **kwargs):
return [
mobj.group('url')
for mobj in re.finditer(
@ -46,9 +42,11 @@ class APAIE(InfoExtractor):
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
video_id, base_url = mobj.group('id', 'base_url')
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(
'%s/player/%s' % (base_url, video_id), video_id)
jwplatform_id = self._search_regex(
r'media[iI]d\s*:\s*["\'](?P<id>[a-zA-Z0-9]{8})', webpage,
@ -59,16 +57,18 @@ class APAIE(InfoExtractor):
'jwplatform:' + jwplatform_id, ie='JWPlatform',
video_id=video_id)
sources = self._parse_json(
self._search_regex(
r'sources\s*=\s*(\[.+?\])\s*;', webpage, 'sources'),
video_id, transform_source=js_to_json)
def extract(field, name=None):
return self._search_regex(
r'\b%s["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % field,
webpage, name or field, default=None, group='value')
title = extract('title') or video_id
description = extract('description')
thumbnail = extract('poster', 'thumbnail')
formats = []
for source in sources:
if not isinstance(source, dict):
continue
source_url = url_or_none(source.get('file'))
for format_id in ('hls', 'progressive'):
source_url = url_or_none(extract(format_id))
if not source_url:
continue
ext = determine_ext(source_url)
@ -77,18 +77,19 @@ class APAIE(InfoExtractor):
source_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
else:
height = int_or_none(self._search_regex(
r'(\d+)\.mp4', source_url, 'height', default=None))
formats.append({
'url': source_url,
'format_id': format_id,
'height': height,
})
self._sort_formats(formats)
thumbnail = self._search_regex(
r'image\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'thumbnail', fatal=False, group='url')
return {
'id': video_id,
'title': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats,
}
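
The rewrite above drops the sources-array JSON parsing in favour of a backreference regex that pulls quoted fields out of the player JS. How that extract() helper behaves, against a fabricated snippet:

import re

SNIPPET = """
  "title": 'Blue Man Group im Interview',
  "poster": "https://example.invalid/poster.jpg",
  "hls": "https://example.invalid/stream.m3u8",
"""

def extract(field, text=SNIPPET):
    # Group 1 grabs the opening quote; the tempered (?!\1) lookahead
    # stops the value at the matching close quote.
    m = re.search(
        r'\b%s["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % field, text)
    return m.group('value') if m else None

print(extract('title'))  # Blue Man Group im Interview
print(extract('hls'))    # https://example.invalid/stream.m3u8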


@ -3,6 +3,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
get_element_by_id,
int_or_none,
merge_dicts,
mimetype2ext,
@ -39,23 +40,15 @@ class AparatIE(InfoExtractor):
webpage = self._download_webpage(url, video_id, fatal=False)
if not webpage:
# Note: There is an easier-to-parse configuration at
# http://www.aparat.com/video/video/config/videohash/%video_id
# but the URL in there does not work
webpage = self._download_webpage(
'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id,
video_id)
options = self._parse_json(
self._search_regex(
r'options\s*=\s*JSON\.parse\(\s*(["\'])(?P<value>(?:(?!\1).)+)\1\s*\)',
webpage, 'options', group='value'),
video_id)
player = options['plugins']['sabaPlayerPlugin']
options = self._parse_json(self._search_regex(
r'options\s*=\s*({.+?})\s*;', webpage, 'options'), video_id)
formats = []
for sources in player['multiSRC']:
for sources in (options.get('multiSRC') or []):
for item in sources:
if not isinstance(item, dict):
continue
@ -85,11 +78,12 @@ class AparatIE(InfoExtractor):
info = self._search_json_ld(webpage, video_id, default={})
if not info.get('title'):
info['title'] = player['title']
info['title'] = get_element_by_id('videoTitle', webpage) or \
self._html_search_meta(['og:title', 'twitter:title', 'DC.Title', 'title'], webpage, fatal=True)
return merge_dicts(info, {
'id': video_id,
'thumbnail': url_or_none(options.get('poster')),
'duration': int_or_none(player.get('duration')),
'duration': int_or_none(options.get('duration')),
'formats': formats,
})


@ -9,10 +9,10 @@ from ..utils import (
class AppleConnectIE(InfoExtractor):
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/idsa\.(?P<id>[\w-]+)'
_TEST = {
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/(?:id)?sa\.(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://itunes.apple.com/us/post/idsa.4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'md5': 'e7c38568a01ea45402570e6029206723',
'md5': 'c1d41f72c8bcaf222e089434619316e4',
'info_dict': {
'id': '4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'ext': 'm4v',
@ -22,7 +22,10 @@ class AppleConnectIE(InfoExtractor):
'upload_date': '20150710',
'timestamp': 1436545535,
},
}
}, {
'url': 'https://itunes.apple.com/us/post/sa.0fe0229f-2457-11e5-9f40-1bb645f2d5d9',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@ -36,7 +39,7 @@ class AppleConnectIE(InfoExtractor):
video_data = self._parse_json(video_json, video_id)
timestamp = str_to_int(self._html_search_regex(r'data-timestamp="(\d+)"', webpage, 'timestamp'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count', default=None))
return {
'id': video_id,


@ -0,0 +1,62 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_podcast_url,
int_or_none,
parse_iso8601,
try_get,
)
class ApplePodcastsIE(InfoExtractor):
_VALID_URL = r'https?://podcasts\.apple\.com/(?:[^/]+/)?podcast(?:/[^/]+){1,2}.*?\bi=(?P<id>\d+)'
_TESTS = [{
'url': 'https://podcasts.apple.com/us/podcast/207-whitney-webb-returns/id1135137367?i=1000482637777',
'md5': 'df02e6acb11c10e844946a39e7222b08',
'info_dict': {
'id': '1000482637777',
'ext': 'mp3',
'title': '207 - Whitney Webb Returns',
'description': 'md5:13a73bade02d2e43737751e3987e1399',
'upload_date': '20200705',
'timestamp': 1593921600,
'duration': 6425,
'series': 'The Tim Dillon Show',
}
}, {
'url': 'https://podcasts.apple.com/podcast/207-whitney-webb-returns/id1135137367?i=1000482637777',
'only_matching': True,
}, {
'url': 'https://podcasts.apple.com/podcast/207-whitney-webb-returns?i=1000482637777',
'only_matching': True,
}, {
'url': 'https://podcasts.apple.com/podcast/id1135137367?i=1000482637777',
'only_matching': True,
}]
def _real_extract(self, url):
episode_id = self._match_id(url)
webpage = self._download_webpage(url, episode_id)
ember_data = self._parse_json(self._search_regex(
r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
webpage, 'ember data'), episode_id)
ember_data = ember_data.get(episode_id) or ember_data
episode = ember_data['data']['attributes']
description = episode.get('description') or {}
series = None
for inc in (ember_data.get('included') or []):
if inc.get('type') == 'media/podcast':
series = try_get(inc, lambda x: x['attributes']['name'])
return {
'id': episode_id,
'title': episode['name'],
'url': clean_podcast_url(episode['assetUrl']),
'description': description.get('standard') or description.get('short'),
'timestamp': parse_iso8601(episode.get('releaseDateTime')),
'duration': int_or_none(episode.get('durationInMilliseconds'), 1000),
'series': series,
}
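
The new extractor reads the episode out of the shoebox-ember-data-store JSON island. A sketch of that step against a trimmed, fabricated sample of the script tag (real pages may nest the store one level deeper, which the ember_data.get(episode_id) fallback above handles):

import json
import re

HTML = '''<script id="shoebox-ember-data-store" type="fastboot/shoebox">
{"data": {"attributes": {"name": "207 - Whitney Webb Returns",
 "assetUrl": "https://example.invalid/ep.mp3",
 "durationInMilliseconds": 6425000}}}
</script>'''

ember_data = json.loads(re.search(
    r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<', HTML,
    re.DOTALL).group(1))
episode = ember_data['data']['attributes']
print(episode['name'])
print(episode['assetUrl'], episode['durationInMilliseconds'] // 1000)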


@ -2,15 +2,17 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
unified_strdate,
clean_html,
extract_attributes,
unified_strdate,
unified_timestamp,
)
class ArchiveOrgIE(InfoExtractor):
IE_NAME = 'archive.org'
IE_DESC = 'archive.org videos'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#]+)(?:[?].*)?$'
_VALID_URL = r'https?://(?:www\.)?archive\.org/(?:details|embed)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://archive.org/details/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'md5': '8af1d4cf447933ed3c7f4871162602db',
@ -19,8 +21,11 @@ class ArchiveOrgIE(InfoExtractor):
'ext': 'ogg',
'title': '1968 Demo - FJCC Conference Presentation Reel #1',
'description': 'md5:da45c349df039f1cc8075268eb1b5c25',
'upload_date': '19681210',
'uploader': 'SRI International'
'creator': 'SRI International',
'release_date': '19681210',
'uploader': 'SRI International',
'timestamp': 1268695290,
'upload_date': '20100315',
}
}, {
'url': 'https://archive.org/details/Cops1922',
@ -29,22 +34,43 @@ class ArchiveOrgIE(InfoExtractor):
'id': 'Cops1922',
'ext': 'mp4',
'title': 'Buster Keaton\'s "Cops" (1922)',
'description': 'md5:89e7c77bf5d965dd5c0372cfb49470f6',
'description': 'md5:43a603fd6c5b4b90d12a96b921212b9c',
'timestamp': 1387699629,
'upload_date': '20131222',
}
}, {
'url': 'http://archive.org/embed/XD300-23_68HighlightsAResearchCntAugHumanIntellect',
'only_matching': True,
}, {
'url': 'https://archive.org/details/MSNBCW_20131125_040000_To_Catch_a_Predator/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://archive.org/embed/' + video_id, video_id)
jwplayer_playlist = self._parse_json(self._search_regex(
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\)",
webpage, 'jwplayer playlist'), video_id)
info = self._parse_jwplayer_data(
{'playlist': jwplayer_playlist}, video_id, base_url=url)
playlist = None
play8 = self._search_regex(
r'(<[^>]+\bclass=["\']js-play8-playlist[^>]+>)', webpage,
'playlist', default=None)
if play8:
attrs = extract_attributes(play8)
playlist = attrs.get('value')
if not playlist:
# Old jwplayer fallback
playlist = self._search_regex(
r"(?s)Play\('[^']+'\s*,\s*(\[.+\])\s*,\s*{.*?}\)",
webpage, 'jwplayer playlist', default='[]')
jwplayer_playlist = self._parse_json(playlist, video_id, fatal=False)
if jwplayer_playlist:
info = self._parse_jwplayer_data(
{'playlist': jwplayer_playlist}, video_id, base_url=url)
else:
# HTML5 media fallback
info = self._parse_html5_media_entries(url, webpage, video_id)[0]
info['id'] = video_id
def get_optional(metadata, field):
return metadata.get(field, [None])[0]
@ -58,8 +84,12 @@ class ArchiveOrgIE(InfoExtractor):
'description': clean_html(get_optional(metadata, 'description')),
})
if info.get('_type') != 'playlist':
creator = get_optional(metadata, 'creator')
info.update({
'uploader': get_optional(metadata, 'creator'),
'upload_date': unified_strdate(get_optional(metadata, 'date')),
'creator': creator,
'release_date': unified_strdate(get_optional(metadata, 'date')),
'uploader': get_optional(metadata, 'publisher') or creator,
'timestamp': unified_timestamp(get_optional(metadata, 'publicdate')),
'language': get_optional(metadata, 'language'),
})
return info
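
get_optional above encodes the shape of archive.org item metadata, where fields arrive as single-element lists, and the uploader change prefers publisher over creator. In isolation, with sample values modelled on the updated test:

SAMPLE_METADATA = {
    'creator': ['SRI International'],
    'publisher': ['SRI International'],
    'publicdate': ['2010-03-15 00:00:00'],
}

def get_optional(metadata, field):
    # First list entry, or None when the field is absent.
    return metadata.get(field, [None])[0]

creator = get_optional(SAMPLE_METADATA, 'creator')
print(creator)                                    # SRI International
print(get_optional(SAMPLE_METADATA, 'language'))  # None
print(get_optional(SAMPLE_METADATA, 'publisher') or creator)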


@ -0,0 +1,174 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
extract_attributes,
int_or_none,
parse_iso8601,
try_get,
)
class ArcPublishingIE(InfoExtractor):
_UUID_REGEX = r'[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12}'
_VALID_URL = r'arcpublishing:(?P<org>[a-z]+):(?P<id>%s)' % _UUID_REGEX
_TESTS = [{
# https://www.adn.com/politics/2020/11/02/video-senate-candidates-campaign-in-anchorage-on-eve-of-election-day/
'url': 'arcpublishing:adn:8c99cb6e-b29c-4bc9-9173-7bf9979225ab',
'only_matching': True,
}, {
# https://www.bostonglobe.com/video/2020/12/30/metro/footage-released-showing-officer-talking-about-striking-protesters-with-car/
'url': 'arcpublishing:bostonglobe:232b7ae6-7d73-432d-bc0a-85dbf0119ab1',
'only_matching': True,
}, {
# https://www.actionnewsjax.com/video/live-stream/
'url': 'arcpublishing:cmg:cfb1cf1b-3ab5-4d1b-86c5-a5515d311f2a',
'only_matching': True,
}, {
# https://elcomercio.pe/videos/deportes/deporte-total-futbol-peruano-seleccion-peruana-la-valorizacion-de-los-peruanos-en-el-exterior-tras-un-2020-atipico-nnav-vr-video-noticia/
'url': 'arcpublishing:elcomercio:27a7e1f8-2ec7-4177-874f-a4feed2885b3',
'only_matching': True,
}, {
# https://www.clickondetroit.com/video/community/2020/05/15/events-surrounding-woodward-dream-cruise-being-canceled/
'url': 'arcpublishing:gmg:c8793fb2-8d44-4242-881e-2db31da2d9fe',
'only_matching': True,
}, {
# https://www.wabi.tv/video/2020/12/30/trenton-company-making-equipment-pfizer-covid-vaccine/
'url': 'arcpublishing:gray:0b0ba30e-032a-4598-8810-901d70e6033e',
'only_matching': True,
}, {
# https://www.lateja.cr/el-mundo/video-china-aprueba-con-condiciones-su-primera/dfcbfa57-527f-45ff-a69b-35fe71054143/video/
'url': 'arcpublishing:gruponacion:dfcbfa57-527f-45ff-a69b-35fe71054143',
'only_matching': True,
}, {
# https://www.fifthdomain.com/video/2018/03/09/is-america-vulnerable-to-a-cyber-attack/
'url': 'arcpublishing:mco:aa0ca6fe-1127-46d4-b32c-be0d6fdb8055',
'only_matching': True,
}, {
# https://www.vl.no/kultur/2020/12/09/en-melding-fra-en-lytter-endret-julelista-til-lewi-bergrud/
'url': 'arcpublishing:mentormedier:47a12084-650b-4011-bfd0-3699b6947b2d',
'only_matching': True,
}, {
# https://www.14news.com/2020/12/30/whiskey-theft-caught-camera-henderson-liquor-store/
'url': 'arcpublishing:raycom:b89f61f8-79fa-4c09-8255-e64237119bf7',
'only_matching': True,
}, {
# https://www.theglobeandmail.com/world/video-ethiopian-woman-who-became-symbol-of-integration-in-italy-killed-on/
'url': 'arcpublishing:tgam:411b34c1-8701-4036-9831-26964711664b',
'only_matching': True,
}, {
# https://www.pilotonline.com/460f2931-8130-4719-8ea1-ffcb2d7cb685-132.html
'url': 'arcpublishing:tronc:460f2931-8130-4719-8ea1-ffcb2d7cb685',
'only_matching': True,
}]
_POWA_DEFAULTS = [
(['cmg', 'prisa'], '%s-config-prod.api.cdn.arcpublishing.com/video'),
([
'adn', 'advancelocal', 'answers', 'bonnier', 'bostonglobe', 'demo',
'gmg', 'gruponacion', 'infobae', 'mco', 'nzme', 'pmn', 'raycom',
'spectator', 'tbt', 'tgam', 'tronc', 'wapo', 'wweek',
], 'video-api-cdn.%s.arcpublishing.com/api'),
]
@staticmethod
def _extract_urls(webpage, **kw):
entries = []
# https://arcpublishing.atlassian.net/wiki/spaces/POWA/overview
for powa_el in re.findall(r'(<div[^>]+class="[^"]*\bpowa\b[^"]*"[^>]+data-uuid="%s"[^>]*>)' % ArcPublishingIE._UUID_REGEX, webpage):
powa = extract_attributes(powa_el) or {}
org = powa.get('data-org')
uuid = powa.get('data-uuid')
if org and uuid:
entries.append('arcpublishing:%s:%s' % (org, uuid))
return entries
def _real_extract(self, url):
org, uuid = re.match(self._VALID_URL, url).groups()
for orgs, tmpl in self._POWA_DEFAULTS:
if org in orgs:
base_api_tmpl = tmpl
break
else:
base_api_tmpl = '%s-prod-cdn.video-api.arcpublishing.com/api'
if org == 'wapo':
org = 'washpost'
video = self._download_json(
'https://%s/v1/ansvideos/findByUuid' % (base_api_tmpl % org),
uuid, query={'uuid': uuid})[0]
title = video['headlines']['basic']
is_live = video.get('status') == 'live'
urls = []
formats = []
for s in video.get('streams', []):
s_url = s.get('url')
if not s_url or s_url in urls:
continue
urls.append(s_url)
stream_type = s.get('stream_type')
if stream_type == 'smil':
smil_formats = self._extract_smil_formats(
s_url, uuid, fatal=False)
for f in smil_formats:
if f['url'].endswith('/cfx/st'):
f['app'] = 'cfx/st'
if not f['play_path'].startswith('mp4:'):
f['play_path'] = 'mp4:' + f['play_path']
if isinstance(f['tbr'], float):
f['vbr'] = f['tbr'] * 1000
del f['tbr']
f['format_id'] = 'rtmp-%d' % f['vbr']
formats.extend(smil_formats)
elif stream_type in ('ts', 'hls'):
m3u8_formats = self._extract_m3u8_formats(
s_url, uuid, 'mp4', 'm3u8' if is_live else 'm3u8_native',
m3u8_id='hls', fatal=False)
if all([f.get('acodec') == 'none' for f in m3u8_formats]):
continue
for f in m3u8_formats:
if f.get('acodec') == 'none':
f['preference'] = -40
elif f.get('vcodec') == 'none':
f['preference'] = -50
height = f.get('height')
if not height:
continue
vbr = self._search_regex(
r'[_x]%d[_-](\d+)' % height, f['url'], 'vbr', default=None)
if vbr:
f['vbr'] = int(vbr)
formats.extend(m3u8_formats)
else:
vbr = int_or_none(s.get('bitrate'))
formats.append({
'format_id': '%s-%d' % (stream_type, vbr) if vbr else stream_type,
'vbr': vbr,
'width': int_or_none(s.get('width')),
'height': int_or_none(s.get('height')),
'filesize': int_or_none(s.get('filesize')),
'url': s_url,
'preference': -1,
})
self._sort_formats(
formats, ('preference', 'width', 'height', 'vbr', 'filesize', 'tbr', 'ext', 'format_id'))
subtitles = {}
for subtitle in (try_get(video, lambda x: x['subtitles']['urls'], list) or []):
subtitle_url = subtitle.get('url')
if subtitle_url:
subtitles.setdefault('en', []).append({'url': subtitle_url})
return {
'id': uuid,
'title': self._live_title(title) if is_live else title,
'thumbnail': try_get(video, lambda x: x['promo_image']['url']),
'description': try_get(video, lambda x: x['subheadlines']['basic']),
'formats': formats,
'duration': int_or_none(video.get('duration'), 100),
'timestamp': parse_iso8601(video.get('created_date')),
'subtitles': subtitles,
'is_live': is_live,
}
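
The _POWA_DEFAULTS lookup above leans on Python's for/else: the else branch runs only when the loop finished without a break, i.e. when no org group matched. A sketch with an abridged org table:

POWA_DEFAULTS = [
    (['cmg', 'prisa'], '%s-config-prod.api.cdn.arcpublishing.com/video'),
    (['adn', 'bostonglobe', 'tgam', 'wapo'],
     'video-api-cdn.%s.arcpublishing.com/api'),
]

def api_base(org):
    for orgs, tmpl in POWA_DEFAULTS:
        if org in orgs:
            base_api_tmpl = tmpl
            break
    else:
        # No group matched: fall back to the generic per-org template.
        base_api_tmpl = '%s-prod-cdn.video-api.arcpublishing.com/api'
    if org == 'wapo':
        org = 'washpost'
    return base_api_tmpl % org

print(api_base('wapo'))  # video-api-cdn.washpost.arcpublishing.com/api
print(api_base('gray'))  # gray-prod-cdn.video-api.arcpublishing.com/api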


@ -187,13 +187,13 @@ class ARDMediathekIE(ARDMediathekBaseIE):
if doc.tag == 'rss':
return GenericIE()._extract_rss(url, video_id, doc)
title = self._html_search_regex(
title = self._og_search_title(webpage, default=None) or self._html_search_regex(
[r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>',
r'<meta name="dcterms\.title" content="(.*?)"/>',
r'<h4 class="headline">(.*?)</h4>',
r'<title[^>]*>(.*?)</title>'],
webpage, 'title')
description = self._html_search_meta(
description = self._og_search_description(webpage, default=None) or self._html_search_meta(
'dcterms.abstract', webpage, 'description', default=None)
if description is None:
description = self._html_search_meta(
@ -249,31 +249,40 @@ class ARDMediathekIE(ARDMediathekBaseIE):
class ARDIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_VALID_URL = r'(?P<mainurl>https?://(?:www\.)?daserste\.de/(?:[^/?#&]+/)+(?P<id>[^/?#&]+))\.html'
_TESTS = [{
# available till 14.02.2019
'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
'md5': '8e4ec85f31be7c7fc08a26cdbc5a1f49',
# available till 7.01.2022
'url': 'https://www.daserste.de/information/talk/maischberger/videos/maischberger-die-woche-video100.html',
'md5': '867d8aa39eeaf6d76407c5ad1bb0d4c1',
'info_dict': {
'display_id': 'das-groko-drama-zerlegen-sich-die-volksparteien-video',
'id': '102',
'id': 'maischberger-die-woche-video100',
'display_id': 'maischberger-die-woche-video100',
'ext': 'mp4',
'duration': 4435.0,
'title': 'Das GroKo-Drama: Zerlegen sich die Volksparteien?',
'upload_date': '20180214',
'duration': 3687.0,
'title': 'maischberger. die woche vom 7. Januar 2021',
'upload_date': '20210107',
'thumbnail': r're:^https?://.*\.jpg$',
},
}, {
'url': 'https://www.daserste.de/information/reportage-dokumentation/erlebnis-erde/videosextern/woelfe-und-herdenschutzhunde-ungleiche-brueder-102.html',
'url': 'https://www.daserste.de/information/politik-weltgeschehen/morgenmagazin/videosextern/dominik-kahun-aus-der-nhl-direkt-zur-weltmeisterschaft-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/information/nachrichten-wetter/tagesthemen/videosextern/tagesthemen-17736.html',
'only_matching': True,
}, {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/unterhaltung/serie/in-aller-freundschaft-die-jungen-aerzte/Drehpause-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/unterhaltung/film/filmmittwoch-im-ersten/videos/making-ofwendezeit-video-100.html',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
display_id = mobj.group('id')
player_url = mobj.group('mainurl') + '~playerXml.xml'
doc = self._download_xml(player_url, display_id)
@ -284,25 +293,47 @@ class ARDIE(InfoExtractor):
formats = []
for a in video_node.findall('.//asset'):
file_name = xpath_text(a, './fileName', default=None)
if not file_name:
continue
format_type = a.attrib.get('type')
format_url = url_or_none(file_name)
if format_url:
ext = determine_ext(file_name)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=format_type or 'hls', fatal=False))
continue
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
update_url_query(format_url, {'hdcore': '3.7.0'}),
display_id, f4m_id=format_type or 'hds', fatal=False))
continue
f = {
'format_id': a.attrib['type'],
'width': int_or_none(a.find('./frameWidth').text),
'height': int_or_none(a.find('./frameHeight').text),
'vbr': int_or_none(a.find('./bitrateVideo').text),
'abr': int_or_none(a.find('./bitrateAudio').text),
'vcodec': a.find('./codecVideo').text,
'tbr': int_or_none(a.find('./totalBitrate').text),
'format_id': format_type,
'width': int_or_none(xpath_text(a, './frameWidth')),
'height': int_or_none(xpath_text(a, './frameHeight')),
'vbr': int_or_none(xpath_text(a, './bitrateVideo')),
'abr': int_or_none(xpath_text(a, './bitrateAudio')),
'vcodec': xpath_text(a, './codecVideo'),
'tbr': int_or_none(xpath_text(a, './totalBitrate')),
}
if a.find('./serverPrefix').text:
f['url'] = a.find('./serverPrefix').text
f['playpath'] = a.find('./fileName').text
server_prefix = xpath_text(a, './serverPrefix', default=None)
if server_prefix:
f.update({
'url': server_prefix,
'playpath': file_name,
})
else:
f['url'] = a.find('./fileName').text
if not format_url:
continue
f['url'] = format_url
formats.append(f)
self._sort_formats(formats)
return {
'id': mobj.group('id'),
'id': xpath_text(video_node, './videoId', default=display_id),
'formats': formats,
'display_id': display_id,
'title': video_node.find('./title').text,
@ -313,19 +344,19 @@ class ARDIE(InfoExtractor):
class ARDBetaMediathekIE(ARDMediathekBaseIE):
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?:[^/]+/)?(?:player|live|video)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'https://ardmediathek.de/ard/video/die-robuste-roswita/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f',
'url': 'https://www.ardmediathek.de/mdr/video/die-robuste-roswita/Y3JpZDovL21kci5kZS9iZWl0cmFnL2Ntcy84MWMxN2MzZC0wMjkxLTRmMzUtODk4ZS0wYzhlOWQxODE2NGI/',
'md5': 'a1dc75a39c61601b980648f7c9f9f71d',
'info_dict': {
'display_id': 'die-robuste-roswita',
'id': '70153354',
'id': '78566716',
'title': 'Die robuste Roswita',
'description': r're:^Der Mord.*trüber ist als die Ilm.',
'description': r're:^Der Mord.*totgeglaubte Ehefrau Roswita',
'duration': 5316,
'thumbnail': 'https://img.ardmediathek.de/standard/00/70/15/33/90/-1852531467/16x9/960?mandant=ard',
'timestamp': 1577047500,
'upload_date': '20191222',
'thumbnail': 'https://img.ardmediathek.de/standard/00/78/56/67/84/575672121/16x9/960?mandant=ard',
'timestamp': 1596658200,
'upload_date': '20200805',
'ext': 'mp4',
},
}, {
@ -343,22 +374,22 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
}, {
'url': 'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/video/coronavirus-update-ndr-info/astrazeneca-kurz-lockdown-und-pims-syndrom-81/ndr/Y3JpZDovL25kci5kZS84NzE0M2FjNi0wMWEwLTQ5ODEtOTE5NS1mOGZhNzdhOTFmOTI/',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3dkci5kZS9CZWl0cmFnLWQ2NDJjYWEzLTMwZWYtNGI4NS1iMTI2LTU1N2UxYTcxOGIzOQ/tatort-duo-koeln-leipzig-ihr-kinderlein-kommet',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
display_id = mobj.group('display_id')
if display_id:
display_id = display_id.rstrip('/')
if not display_id:
display_id = video_id
video_id = self._match_id(url)
player_page = self._download_json(
'https://api.ardmediathek.de/public-gateway',
display_id, data=json.dumps({
video_id, data=json.dumps({
'query': '''{
playerPage(client:"%s", clipId: "%s") {
playerPage(client: "ard", clipId: "%s") {
blockedByFsk
broadcastedOn
maturityContentRating
@ -388,7 +419,7 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
}
}
}
}''' % (mobj.group('client'), video_id),
}''' % video_id,
}).encode(), headers={
'Content-Type': 'application/json'
})['data']['playerPage']
@ -413,7 +444,6 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
info.update({
'age_limit': age_limit,
'display_id': display_id,
'title': title,
'description': description,
'timestamp': unified_timestamp(player_page.get('broadcastedOn')),
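
The simplified query above always uses client "ard" and takes the Y3JpZDovL... clip id straight from the URL. A minimal sketch of the public-gateway call, abridged to two fields (broadcastedOn appears in the hunk above; title is assumed from the info dict it feeds):

import json
import urllib.request

def fetch_player_page(clip_id):
    query = '''{
  playerPage(client: "ard", clipId: "%s") {
    title
    broadcastedOn
  }
}''' % clip_id
    req = urllib.request.Request(
        'https://api.ardmediathek.de/public-gateway',
        data=json.dumps({'query': query}).encode(),
        headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)['data']['playerPage']

page = fetch_player_page(
    'Y3JpZDovL21kci5kZS9iZWl0cmFnL2Ntcy84MWMxN2MzZC0wMjkxLTRmMzUt'
    'ODk4ZS0wYzhlOWQxODE2NGI')
print(page.get('title'), page.get('broadcastedOn'))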


@ -103,7 +103,7 @@ class ArkenaIE(InfoExtractor):
f_url, video_id, mpd_id=kind, fatal=False))
elif kind == 'silverlight':
# TODO: process when ism is supported (see
# https://github.com/ytdl-org/haruhi-dl/issues/8118)
# https://github.com/ytdl-org/youtube-dl/issues/8118)
continue
else:
tbr = float_or_none(f.get('Bitrate'), 1000)


@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
float_or_none,
int_or_none,
parse_iso8601,
remove_start,
)
class ArnesIE(InfoExtractor):
IE_NAME = 'video.arnes.si'
IE_DESC = 'Arnes Video'
_VALID_URL = r'https?://video\.arnes\.si/(?:[a-z]{2}/)?(?:watch|embed|api/(?:asset|public/video))/(?P<id>[0-9a-zA-Z]{12})'
_TESTS = [{
'url': 'https://video.arnes.si/watch/a1qrWTOQfVoU?t=10',
'md5': '4d0f4d0a03571b33e1efac25fd4a065d',
'info_dict': {
'id': 'a1qrWTOQfVoU',
'ext': 'mp4',
'title': 'Linearna neodvisnost, definicija',
'description': 'Linearna neodvisnost, definicija',
'license': 'PRIVATE',
'creator': 'Polona Oblak',
'timestamp': 1585063725,
'upload_date': '20200324',
'channel': 'Polona Oblak',
'channel_id': 'q6pc04hw24cj',
'channel_url': 'https://video.arnes.si/?channel=q6pc04hw24cj',
'duration': 596.75,
'view_count': int,
'tags': ['linearna_algebra'],
'start_time': 10,
}
}, {
'url': 'https://video.arnes.si/api/asset/s1YjnV7hadlC/play.mp4',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/en/watch/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC?t=123&hideRelated=1',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/api/public/video/s1YjnV7hadlC',
'only_matching': True,
}]
_BASE_URL = 'https://video.arnes.si'
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
self._BASE_URL + '/api/public/video/' + video_id, video_id)['data']
title = video['title']
formats = []
for media in (video.get('media') or []):
media_url = media.get('url')
if not media_url:
continue
formats.append({
'url': self._BASE_URL + media_url,
'format_id': remove_start(media.get('format'), 'FORMAT_'),
'format_note': media.get('formatTranslation'),
'width': int_or_none(media.get('width')),
'height': int_or_none(media.get('height')),
})
self._sort_formats(formats)
channel = video.get('channel') or {}
channel_id = channel.get('url')
thumbnail = video.get('thumbnailUrl')
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': self._BASE_URL + thumbnail,
'description': video.get('description'),
'license': video.get('license'),
'creator': video.get('author'),
'timestamp': parse_iso8601(video.get('creationTime')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': self._BASE_URL + '/?channel=' + channel_id if channel_id else None,
'duration': float_or_none(video.get('duration'), 1000),
'view_count': int_or_none(video.get('views')),
'tags': video.get('hashtags'),
'start_time': int_or_none(compat_parse_qs(
compat_urllib_parse_urlparse(url).query).get('t', [None])[0]),
}
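
The Arnes extractor also carries the watch URL's ?t= offset into start_time via compat_parse_qs. The same parsing with the stdlib directly:

from urllib.parse import parse_qs, urlparse

def start_time(url):
    # First value of the t parameter, if present and numeric.
    t = parse_qs(urlparse(url).query).get('t', [None])[0]
    return int(t) if t and t.isdigit() else None

print(start_time('https://video.arnes.si/watch/a1qrWTOQfVoU?t=10'))  # 10
print(start_time('https://video.arnes.si/watch/a1qrWTOQfVoU'))       # None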


@ -4,23 +4,57 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
ExtractorError,
int_or_none,
qualities,
try_get,
unified_strdate,
url_or_none,
)
# There are different sources of video in arte.tv, the extraction process
# is different for each one. The videos usually expire in 7 days, so we can't
# add tests.
class ArteTVBaseIE(InfoExtractor):
def _extract_from_json_url(self, json_url, video_id, lang, title=None):
info = self._download_json(json_url, video_id)
_ARTE_LANGUAGES = 'fr|de|en|es|it|pl'
_API_BASE = 'https://api.arte.tv/api/player/v1'
class ArteTVIE(ArteTVBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:
(?:www\.)?arte\.tv/(?P<lang>%(langs)s)/videos|
api\.arte\.tv/api/player/v\d+/config/(?P<lang_2>%(langs)s)
)
/(?P<id>\d{6}-\d{3}-[AF])
''' % {'langs': ArteTVBaseIE._ARTE_LANGUAGES}
_TESTS = [{
'url': 'https://www.arte.tv/en/videos/088501-000-A/mexico-stealing-petrol-to-survive/',
'info_dict': {
'id': '088501-000-A',
'ext': 'mp4',
'title': 'Mexico: Stealing Petrol to Survive',
'upload_date': '20190628',
},
}, {
'url': 'https://www.arte.tv/pl/videos/100103-000-A/usa-dyskryminacja-na-porodowce/',
'only_matching': True,
}, {
'url': 'https://api.arte.tv/api/player/v2/config/de/100605-013-A',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
lang = mobj.group('lang') or mobj.group('lang_2')
info = self._download_json(
'%s/config/%s/%s' % (self._API_BASE, lang, video_id), video_id)
player_info = info['videoJsonPlayer']
vsr = try_get(player_info, lambda x: x['VSR'], dict)
@ -37,18 +71,11 @@ class ArteTVBaseIE(InfoExtractor):
if not upload_date_str:
upload_date_str = (player_info.get('VRA') or player_info.get('VDA') or '').split(' ')[0]
title = (player_info.get('VTI') or title or player_info['VID']).strip()
title = (player_info.get('VTI') or player_info['VID']).strip()
subtitle = player_info.get('VSU', '').strip()
if subtitle:
title += ' - %s' % subtitle
info_dict = {
'id': player_info['VID'],
'title': title,
'description': player_info.get('VDE'),
'upload_date': unified_strdate(upload_date_str),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
qfunc = qualities(['MQ', 'HQ', 'EQ', 'SQ'])
LANGS = {
@ -65,6 +92,10 @@ class ArteTVBaseIE(InfoExtractor):
formats = []
for format_id, format_dict in vsr.items():
f = dict(format_dict)
format_url = url_or_none(f.get('url'))
streamer = f.get('streamer')
if not format_url and not streamer:
continue
versionCode = f.get('versionCode')
l = re.escape(langcode)
@ -107,6 +138,16 @@ class ArteTVBaseIE(InfoExtractor):
else:
lang_pref = -1
media_type = f.get('mediaType')
if media_type == 'hls':
m3u8_formats = self._extract_m3u8_formats(
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)
for m3u8_format in m3u8_formats:
m3u8_format['language_preference'] = lang_pref
formats.extend(m3u8_formats)
continue
format = {
'format_id': format_id,
'preference': -10 if f.get('videoFormat') == 'M3U8' else None,
@ -118,7 +159,7 @@ class ArteTVBaseIE(InfoExtractor):
'quality': qfunc(f.get('quality')),
}
if f.get('mediaType') == 'rtmp':
if media_type == 'rtmp':
format['url'] = f['streamer']
format['play_path'] = 'mp4:' + f['url']
format['ext'] = 'flv'
@ -127,56 +168,50 @@ class ArteTVBaseIE(InfoExtractor):
formats.append(format)
self._check_formats(formats, video_id)
self._sort_formats(formats)
info_dict['formats'] = formats
return info_dict
return {
'id': player_info.get('VID') or video_id,
'title': title,
'description': player_info.get('VDE'),
'upload_date': unified_strdate(upload_date_str),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
'formats': formats,
}
class ArteTVPlus7IE(ArteTVBaseIE):
IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>\d{6}-\d{3}-[AF])'
class ArteTVEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?arte\.tv/player/v\d+/index\.php\?.*?\bjson_url=.+'
_TESTS = [{
'url': 'https://www.arte.tv/en/videos/088501-000-A/mexico-stealing-petrol-to-survive/',
'url': 'https://www.arte.tv/player/v5/index.php?json_url=https%3A%2F%2Fapi.arte.tv%2Fapi%2Fplayer%2Fv2%2Fconfig%2Fde%2F100605-013-A&lang=de&autoplay=true&mute=0100605-013-A',
'info_dict': {
'id': '088501-000-A',
'id': '100605-013-A',
'ext': 'mp4',
'title': 'Mexico: Stealing Petrol to Survive',
'upload_date': '20190628',
'title': 'United we Stream November Lockdown Edition #13',
'description': 'md5:be40b667f45189632b78c1425c7c2ce1',
'upload_date': '20201116',
},
}, {
'url': 'https://www.arte.tv/player/v3/index.php?json_url=https://api.arte.tv/api/player/v2/config/de/100605-013-A',
'only_matching': True,
}]
def _real_extract(self, url):
lang, video_id = re.match(self._VALID_URL, url).groups()
return self._extract_from_json_url(
'https://api.arte.tv/api/player/v1/config/%s/%s' % (lang, video_id),
video_id, lang)
class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)
https://www\.arte\.tv
/player/v3/index\.php\?json_url=
(?P<json_url>
https?://api\.arte\.tv/api/player/v1/config/
(?P<lang>[^/]+)/(?P<id>\d{6}-\d{3}-[AF])
)
'''
_TESTS = []
@staticmethod
def _extract_urls(webpage):
return [url for _, url in re.findall(
r'<(?:iframe|script)[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?arte\.tv/player/v\d+/index\.php\?.*?\bjson_url=.+?)\1',
webpage)]
def _real_extract(self, url):
json_url, lang, video_id = re.match(self._VALID_URL, url).groups()
return self._extract_from_json_url(json_url, video_id, lang)
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
json_url = qs['json_url'][0]
video_id = ArteTVIE._match_id(json_url)
return self.url_result(
json_url, ie=ArteTVIE.ie_key(), video_id=video_id)
class ArteTVPlaylistIE(ArteTVBaseIE):
IE_NAME = 'arte.tv:playlist'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>RC-\d{6})'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>%s)/videos/(?P<id>RC-\d{6})' % ArteTVBaseIE._ARTE_LANGUAGES
_TESTS = [{
'url': 'https://www.arte.tv/en/videos/RC-016954/earn-a-living/',
'info_dict': {
@ -185,17 +220,35 @@ class ArteTVPlaylistIE(ArteTVBaseIE):
'description': 'md5:d322c55011514b3a7241f7fb80d494c2',
},
'playlist_mincount': 6,
}, {
'url': 'https://www.arte.tv/pl/videos/RC-014123/arte-reportage/',
'only_matching': True,
}]
def _real_extract(self, url):
lang, playlist_id = re.match(self._VALID_URL, url).groups()
collection = self._download_json(
'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos'
% (lang, playlist_id), playlist_id)
'%s/collectionData/%s/%s?source=videos'
% (self._API_BASE, lang, playlist_id), playlist_id)
entries = []
for video in collection['videos']:
if not isinstance(video, dict):
continue
video_url = url_or_none(video.get('url')) or url_or_none(video.get('jsonUrl'))
if not video_url:
continue
video_id = video.get('programId')
entries.append({
'_type': 'url_transparent',
'url': video_url,
'id': video_id,
'title': video.get('title'),
'alt_title': video.get('subtitle'),
'thumbnail': url_or_none(try_get(video, lambda x: x['mainImage']['url'], compat_str)),
'duration': int_or_none(video.get('durationSeconds')),
'view_count': int_or_none(video.get('views')),
'ie_key': ArteTVIE.ie_key(),
})
title = collection.get('title')
description = collection.get('shortDescription') or collection.get('teaserText')
entries = [
self._extract_from_json_url(
video['jsonUrl'], video.get('programId') or playlist_id, lang)
for video in collection['videos'] if video.get('jsonUrl')]
return self.playlist_result(entries, playlist_id, title, description)
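
ArteTVEmbedIE above no longer extracts anything itself: it unwraps json_url from the player URL's query string and defers to ArteTVIE via url_result. A sketch of that unwrapping, using the test URL from the diff with its trailing parameters dropped:

import re
from urllib.parse import parse_qs, urlparse

EMBED = ('https://www.arte.tv/player/v5/index.php?json_url='
         'https%3A%2F%2Fapi.arte.tv%2Fapi%2Fplayer%2Fv2%2Fconfig'
         '%2Fde%2F100605-013-A&lang=de')

json_url = parse_qs(urlparse(EMBED).query)['json_url'][0]
video_id = re.search(r'\d{6}-\d{3}-[AF]', json_url).group(0)
print(json_url)  # https://api.arte.tv/api/player/v2/config/de/100605-013-A
print(video_id)  # 100605-013-A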


@ -1,27 +1,91 @@
# coding: utf-8
from __future__ import unicode_literals
import functools
import re
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import extract_attributes
from ..utils import (
extract_attributes,
int_or_none,
OnDemandPagedList,
parse_age_limit,
strip_or_none,
try_get,
)
class AsianCrushIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|cocoro\.tv))'
_VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % _VALID_URL_BASE
class AsianCrushBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|(?:cocoro|retrocrush)\.tv))'
_KALTURA_KEYS = [
'video_url', 'progressive_url', 'download_url', 'thumbnail_url',
'widescreen_thumbnail_url', 'screencap_widescreen',
]
_API_SUFFIX = {'retrocrush.tv': '-ott'}
def _call_api(self, host, endpoint, video_id, query, resource):
return self._download_json(
'https://api%s.%s/%s' % (self._API_SUFFIX.get(host, ''), host, endpoint), video_id,
'Downloading %s JSON metadata' % resource, query=query,
headers=self.geo_verification_headers())['objects']
def _download_object_data(self, host, object_id, resource):
return self._call_api(
host, 'search', object_id, {'id': object_id}, resource)[0]
def _get_object_description(self, obj):
return strip_or_none(obj.get('long_description') or obj.get('short_description'))
def _parse_video_data(self, video):
title = video['name']
entry_id, partner_id = [None] * 2
for k in self._KALTURA_KEYS:
k_url = video.get(k)
if k_url:
mobj = re.search(r'/p/(\d+)/.+?/entryId/([^/]+)/', k_url)
if mobj:
partner_id, entry_id = mobj.groups()
break
meta_categories = try_get(video, lambda x: x['meta']['categories'], list) or []
categories = list(filter(None, [c.get('name') for c in meta_categories]))
show_info = video.get('show_info') or {}
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, entry_id),
'ie_key': KalturaIE.ie_key(),
'id': entry_id,
'title': title,
'description': self._get_object_description(video),
'age_limit': parse_age_limit(video.get('mpaa_rating') or video.get('tv_rating')),
'categories': categories,
'series': show_info.get('show_name'),
'season_number': int_or_none(show_info.get('season_num')),
'season_id': show_info.get('season_id'),
'episode_number': int_or_none(show_info.get('episode_num')),
}
class AsianCrushIE(AsianCrushBaseIE):
_VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % AsianCrushBaseIE._VALID_URL_BASE
_TESTS = [{
'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/',
'url': 'https://www.asiancrush.com/video/004289v/women-who-flirt',
'md5': 'c3b740e48d0ba002a42c0b72857beae6',
'info_dict': {
'id': '1_y4tmjm5r',
'ext': 'mp4',
'title': 'Women Who Flirt',
'description': 'md5:7e986615808bcfb11756eb503a751487',
'description': 'md5:b65c7e0ae03a85585476a62a186f924c',
'timestamp': 1496936429,
'upload_date': '20170608',
'uploader_id': 'craig@crifkin.com',
'age_limit': 13,
'categories': 'count:5',
'duration': 5812,
},
}, {
'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/',
@ -41,67 +105,35 @@ class AsianCrushIE(InfoExtractor):
}, {
'url': 'https://www.cocoro.tv/video/the-wonderful-wizard-of-oz/008878v-the-wonderful-wizard-of-oz-ep01/',
'only_matching': True,
}, {
'url': 'https://www.retrocrush.tv/video/true-tears/012328v-i...gave-away-my-tears',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
host, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id)
entry_id, partner_id, title = [None] * 3
vars = self._parse_json(
self._search_regex(
if host == 'cocoro.tv':
webpage = self._download_webpage(url, video_id)
embed_vars = self._parse_json(self._search_regex(
r'iEmbedVars\s*=\s*({.+?})', webpage, 'embed vars',
default='{}'), video_id, fatal=False)
if vars:
entry_id = vars.get('entry_id')
partner_id = vars.get('partner_id')
title = vars.get('vid_label')
default='{}'), video_id, fatal=False) or {}
video_id = embed_vars.get('entry_id') or video_id
if not entry_id:
entry_id = self._search_regex(
r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
player = self._download_webpage(
'https://api.%s/embeddedVideoPlayer' % host, video_id,
query={'id': entry_id})
kaltura_id = self._search_regex(
r'entry_id["\']\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1', player,
'kaltura id', group='id')
if not partner_id:
partner_id = self._search_regex(
r'/p(?:artner_id)?/(\d+)', player, 'partner id',
default='513551')
description = self._html_search_regex(
r'(?s)<div[^>]+\bclass=["\']description["\'][^>]*>(.+?)</div>',
webpage, 'description', fatal=False)
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, kaltura_id),
'ie_key': KalturaIE.ie_key(),
'id': video_id,
'title': title,
'description': description,
}
video = self._download_object_data(host, video_id, 'video')
return self._parse_video_data(video)
class AsianCrushPlaylistIE(InfoExtractor):
_VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushIE._VALID_URL_BASE
class AsianCrushPlaylistIE(AsianCrushBaseIE):
_VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushBaseIE._VALID_URL_BASE
_TESTS = [{
'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/',
'url': 'https://www.asiancrush.com/series/006447s/fruity-samurai',
'info_dict': {
'id': '12481',
'title': 'Scholar Who Walks the Night',
'description': 'md5:7addd7c5132a09fd4741152d96cce886',
'id': '6447',
'title': 'Fruity Samurai',
'description': 'md5:7535174487e4a202d3872a7fc8f2f154',
},
'playlist_count': 20,
'playlist_count': 13,
}, {
'url': 'https://www.yuyutv.com/series/013920s/peep-show/',
'only_matching': True,
@ -111,35 +143,58 @@ class AsianCrushPlaylistIE(InfoExtractor):
}, {
'url': 'https://www.cocoro.tv/series/008549s/the-wonderful-wizard-of-oz/',
'only_matching': True,
}, {
'url': 'https://www.retrocrush.tv/series/012355s/true-tears',
'only_matching': True,
}]
_PAGE_SIZE = 1000000000
def _fetch_page(self, domain, parent_id, page):
videos = self._call_api(
domain, 'getreferencedobjects', parent_id, {
'max': self._PAGE_SIZE,
'object_type': 'video',
'parent_id': parent_id,
'start': page * self._PAGE_SIZE,
}, 'page %d' % (page + 1))
for video in videos:
yield self._parse_video_data(video)
def _real_extract(self, url):
playlist_id = self._match_id(url)
host, playlist_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, playlist_id)
if host == 'cocoro.tv':
webpage = self._download_webpage(url, playlist_id)
entries = []
entries = []
for mobj in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>%s.*?)\1[^>]*>' % AsianCrushIE._VALID_URL,
webpage):
attrs = extract_attributes(mobj.group(0))
if attrs.get('class') == 'clearfix':
entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key()))
for mobj in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>%s.*?)\1[^>]*>' % AsianCrushIE._VALID_URL,
webpage):
attrs = extract_attributes(mobj.group(0))
if attrs.get('class') == 'clearfix':
entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key()))
title = self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
if title:
title = re.sub(r'\s*\|\s*.+?$', '', title)
title = self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
if title:
title = re.sub(r'\s*\|\s*.+?$', '', title)
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'twitter:description', webpage, 'description', fatal=False)
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'twitter:description', webpage, 'description', fatal=False)
else:
show = self._download_object_data(host, playlist_id, 'show')
title = show.get('name')
description = self._get_object_description(show)
entries = OnDemandPagedList(
functools.partial(self._fetch_page, host, playlist_id),
self._PAGE_SIZE)
return self.playlist_result(entries, playlist_id, title, description)
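
For non-cocoro hosts the playlist side now pulls episodes through OnDemandPagedList instead of scraping anchors. A generator-based stand-in for that lazy paging, with a fake fetch_page in place of the getreferencedobjects call (the real _PAGE_SIZE is set so large that everything arrives in one page):

import functools

PAGE_SIZE = 3
FAKE_API = list(range(8))  # stand-ins for video objects

def fetch_page(parent_id, page):
    start = page * PAGE_SIZE
    return FAKE_API[start:start + PAGE_SIZE]

def paged_entries(fetch):
    # Request pages on demand; a short page marks the end.
    page = 0
    while True:
        chunk = fetch(page)
        for item in chunk:
            yield item
        if len(chunk) < PAGE_SIZE:
            return
        page += 1

entries = paged_entries(functools.partial(fetch_page, 'series-id'))
print(list(entries))  # [0, 1, 2, 3, 4, 5, 6, 7]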


@ -6,22 +6,18 @@ from ..utils import unified_strdate
class ATTTechChannelIE(InfoExtractor):
_VALID_URL = r'https?://techchannel\.att\.com/play-video\.cfm/([^/]+/)*(?P<id>.+)'
_TEST = {
_TESTS = [{
'url': 'http://techchannel.att.com/play-video.cfm/2014/1/27/ATT-Archives-The-UNIX-System-Making-Computers-Easier-to-Use',
'info_dict': {
'id': '11316',
'display_id': 'ATT-Archives-The-UNIX-System-Making-Computers-Easier-to-Use',
'ext': 'flv',
'title': 'AT&T Archives : The UNIX System: Making Computers Easier to Use',
'ext': 'm3u8',
'title': 'AT&T Archives: The UNIX System: Making Computers Easier to Use',
'description': 'A 1982 film about UNIX is the foundation for software in use around Bell Labs and AT&T.',
'thumbnail': r're:^https?://.*\.jpg$',
'upload_date': '20140127',
},
'params': {
# rtmp download
'skip_download': True,
},
}
}]
def _real_extract(self, url):
display_id = self._match_id(url)
@ -29,16 +25,19 @@ class ATTTechChannelIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
video_url = self._search_regex(
r"url\s*:\s*'(rtmp://[^']+)'",
r'(?m)^\s*"src=\'(https?://.+?\.m3u8)\'',
webpage, 'video URL')
video_id = self._search_regex(
r'mediaid\s*=\s*(\d+)',
webpage, 'video id', fatal=False)
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
formats = self._extract_m3u8_formats(video_url, video_id)
title = self._html_search_regex(r'<title>(.+?)</title>', webpage, 'title')
description = self._html_search_meta('description', webpage, 'description')
thumbnail = self._search_regex(r'poster=\'(https?://.+?)\'',
webpage, 'thumbnail', fatal=False)
upload_date = unified_strdate(self._search_regex(
r'[Rr]elease\s+date:\s*(\d{1,2}/\d{1,2}/\d{4})',
webpage, 'upload date', fatal=False), False)
@ -46,8 +45,7 @@ class ATTTechChannelIE(InfoExtractor):
return {
'id': video_id,
'display_id': display_id,
'url': video_url,
'ext': 'flv',
'formats': formats,
'title': title,
'description': description,
'thumbnail': thumbnail,


@ -48,6 +48,7 @@ class AWAANBaseIE(InfoExtractor):
'duration': int_or_none(video_data.get('duration')),
'timestamp': parse_iso8601(video_data.get('create_time'), ' '),
'is_live': is_live,
'uploader_id': video_data.get('user_id'),
}
@ -107,6 +108,7 @@ class AWAANLiveIE(AWAANBaseIE):
'title': 're:Dubai Al Oula [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'upload_date': '20150107',
'timestamp': 1420588800,
'uploader_id': '71',
},
'params': {
# m3u8 download


@ -47,7 +47,7 @@ class AZMedienIE(InfoExtractor):
'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True
}]
_API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'
_API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/a4016f65fe62b81dc6664dd9f4910e4ab40383be'
_PARTNER_ID = '1719221'
def _real_extract(self, url):


@ -0,0 +1,37 @@
# coding: utf-8
from __future__ import unicode_literals
from .brightcove import BrightcoveNewIE
from ..utils import extract_attributes
class BandaiChannelIE(BrightcoveNewIE):
IE_NAME = 'bandaichannel'
_VALID_URL = r'https?://(?:www\.)?b-ch\.com/titles/(?P<id>\d+/\d+)'
_TESTS = [{
'url': 'https://www.b-ch.com/titles/514/001',
'md5': 'a0f2d787baa5729bed71108257f613a4',
'info_dict': {
'id': '6128044564001',
'ext': 'mp4',
'title': 'メタルファイターMIKU 第1話',
'timestamp': 1580354056,
'uploader_id': '5797077852001',
'upload_date': '20200130',
'duration': 1387.733,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
attrs = extract_attributes(self._search_regex(
r'(<video-js[^>]+\bid="bcplayer"[^>]*>)', webpage, 'player'))
bc = self._download_json(
'https://pbifcd.b-ch.com/v1/playbackinfo/ST/70/' + attrs['data-info'],
video_id, headers={'X-API-KEY': attrs['data-auth'].strip()})['bc']
return self._parse_brightcove_metadata(bc, bc['id'])
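A hedged sketch of the playback-info exchange performed above, assuming data_info and data_auth have already been scraped from the <video-js id="bcplayer"> tag; requests is used here in place of the extractor's _download_json, and the argument values are placeholders, not real credentials.

import requests

def fetch_bandai_brightcove_meta(data_info, data_auth):
    # Endpoint and X-API-KEY header mirror the extractor above; the returned
    # 'bc' object is the Brightcove playback metadata fed to
    # _parse_brightcove_metadata.
    resp = requests.get(
        'https://pbifcd.b-ch.com/v1/playbackinfo/ST/70/' + data_info,
        headers={'X-API-KEY': data_auth.strip()})
    resp.raise_for_status()
    return resp.json()['bc']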


@@ -1,3 +1,4 @@
# coding: utf-8
from __future__ import unicode_literals
import random
@@ -5,10 +6,7 @@ import re
import time
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urlparse,
)
from ..compat import compat_str
from ..utils import (
ExtractorError,
float_or_none,
@@ -17,30 +15,32 @@ from ..utils import (
parse_filesize,
str_or_none,
try_get,
unescapeHTML,
update_url_query,
unified_strdate,
unified_timestamp,
url_or_none,
urljoin,
)
class BandcampIE(InfoExtractor):
_VALID_URL = r'https?://[^/]+\.bandcamp\.com/track/(?P<title>[^/?#&]+)'
_VALID_URL = r'https?://[^/]+\.bandcamp\.com/track/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://haruhi-dl.bandcamp.com/track/haruhi-dl-test-song',
'md5': 'c557841d5e50261777a6585648adf439',
'info_dict': {
'id': '1812978515',
'ext': 'mp3',
'title': "haruhi-dl \"'/\\\u00e4\u21ad - haruhi-dl test song \"'/\\\u00e4\u21ad",
'title': "haruhi-dl \"'/\\ä↭ - haruhi-dl \"'/\\ä↭ - haruhi-dl test song \"'/\\ä↭",
'duration': 9.8485,
'uploader': 'haruhi-dl "\'/\\ä↭',
'upload_date': '20121129',
'timestamp': 1354224127,
},
'_skip': 'There is a limit of 200 free downloads / month for the test song'
}, {
# free download
'url': 'http://benprunty.bandcamp.com/track/lanius-battle',
'md5': '853e35bf34aa1d6fe2615ae612564b36',
'info_dict': {
'id': '2650410135',
'ext': 'aiff',
@@ -49,6 +49,7 @@ class BandcampIE(InfoExtractor):
'uploader': 'Ben Prunty',
'timestamp': 1396508491,
'upload_date': '20140403',
'release_timestamp': 1396483200,
'release_date': '20140403',
'duration': 260.877,
'track': 'Lanius (Battle)',
@@ -69,6 +70,7 @@ class BandcampIE(InfoExtractor):
'uploader': 'Mastodon',
'timestamp': 1322005399,
'upload_date': '20111122',
'release_timestamp': 1076112000,
'release_date': '20040207',
'duration': 120.79,
'track': 'Hail to Fire',
@@ -79,11 +81,16 @@ class BandcampIE(InfoExtractor):
},
}]
def _extract_data_attr(self, webpage, video_id, attr='tralbum', fatal=True):
return self._parse_json(self._html_search_regex(
r'data-%s=(["\'])({.+?})\1' % attr, webpage,
attr + ' data', group=2), video_id, fatal=fatal)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title = mobj.group('title')
title = self._match_id(url)
webpage = self._download_webpage(url, title)
thumbnail = self._html_search_meta('og:image', webpage, default=None)
tralbum = self._extract_data_attr(webpage, title)
thumbnail = self._og_search_thumbnail(webpage)
track_id = None
track = None
@@ -91,10 +98,7 @@ class BandcampIE(InfoExtractor):
duration = None
formats = []
track_info = self._parse_json(
self._search_regex(
r'trackinfo\s*:\s*\[\s*({.+?})\s*\]\s*,\s*?\n',
webpage, 'track info', default='{}'), title)
track_info = try_get(tralbum, lambda x: x['trackinfo'][0], dict)
if track_info:
file_ = track_info.get('file')
if isinstance(file_, dict):
@@ -111,37 +115,25 @@ class BandcampIE(InfoExtractor):
'abr': int_or_none(abr_str),
})
track = track_info.get('title')
track_id = str_or_none(track_info.get('track_id') or track_info.get('id'))
track_id = str_or_none(
track_info.get('track_id') or track_info.get('id'))
track_number = int_or_none(track_info.get('track_num'))
duration = float_or_none(track_info.get('duration'))
def extract(key):
return self._search_regex(
r'\b%s\s*["\']?\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % key,
webpage, key, default=None, group='value')
artist = extract('artist')
album = extract('album_title')
embed = self._extract_data_attr(webpage, title, 'embed', False)
current = tralbum.get('current') or {}
artist = embed.get('artist') or current.get('artist') or tralbum.get('artist')
timestamp = unified_timestamp(
extract('publish_date') or extract('album_publish_date'))
release_date = unified_strdate(extract('album_release_date'))
current.get('publish_date') or tralbum.get('album_publish_date'))
download_link = self._search_regex(
r'freeDownloadPage\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'download link', default=None, group='url')
download_link = tralbum.get('freeDownloadPage')
if download_link:
track_id = self._search_regex(
r'(?ms)var TralbumData = .*?[{,]\s*id: (?P<id>\d+),?$',
webpage, 'track id')
track_id = compat_str(tralbum['id'])
download_webpage = self._download_webpage(
download_link, track_id, 'Downloading free downloads page')
blob = self._parse_json(
self._search_regex(
r'data-blob=(["\'])(?P<blob>{.+?})\1', download_webpage,
'blob', group='blob'),
track_id, transform_source=unescapeHTML)
blob = self._extract_data_attr(download_webpage, track_id, 'blob')
info = try_get(
blob, (lambda x: x['digital_items'][0],
@@ -207,20 +199,20 @@ class BandcampIE(InfoExtractor):
'thumbnail': thumbnail,
'uploader': artist,
'timestamp': timestamp,
'release_date': release_date,
'release_timestamp': unified_timestamp(tralbum.get('album_release_date')),
'duration': duration,
'track': track,
'track_number': track_number,
'track_id': track_id,
'artist': artist,
'album': album,
'album': embed.get('album_title'),
'formats': formats,
}
class BandcampAlbumIE(InfoExtractor):
class BandcampAlbumIE(BandcampIE):
IE_NAME = 'Bandcamp:album'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<album_id>[^/?#&]+))?'
_VALID_URL = r'https?://(?:(?P<subdomain>[^.]+)\.)?bandcamp\.com(?:/album/(?P<id>[^/?#&]+))?'
_TESTS = [{
'url': 'http://blazo.bandcamp.com/album/jazz-format-mixtape-vol-1',
@@ -230,7 +222,10 @@ class BandcampAlbumIE(InfoExtractor):
'info_dict': {
'id': '1353101989',
'ext': 'mp3',
'title': 'Intro',
'title': 'Blazo - Intro',
'timestamp': 1311756226,
'upload_date': '20110727',
'uploader': 'Blazo',
}
},
{
@@ -238,7 +233,10 @@ class BandcampAlbumIE(InfoExtractor):
'info_dict': {
'id': '38097443',
'ext': 'mp3',
'title': 'Kero One - Keep It Alive (Blazo remix)',
'title': 'Blazo - Kero One - Keep It Alive (Blazo remix)',
'timestamp': 1311757238,
'upload_date': '20110727',
'uploader': 'Blazo',
}
},
],
@@ -274,6 +272,7 @@ class BandcampAlbumIE(InfoExtractor):
'title': '"Entropy" EP',
'uploader_id': 'jstrecords',
'id': 'entropy-ep',
'description': 'md5:0ff22959c943622972596062f2f366a5',
},
'playlist_mincount': 3,
}, {
@@ -283,6 +282,7 @@ class BandcampAlbumIE(InfoExtractor):
'id': 'we-are-the-plague',
'title': 'WE ARE THE PLAGUE',
'uploader_id': 'insulters',
'description': 'md5:b3cf845ee41b2b1141dc7bde9237255f',
},
'playlist_count': 2,
}]
@@ -294,41 +294,34 @@ class BandcampAlbumIE(InfoExtractor):
else super(BandcampAlbumIE, cls).suitable(url))
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
uploader_id = mobj.group('subdomain')
album_id = mobj.group('album_id')
uploader_id, album_id = re.match(self._VALID_URL, url).groups()
playlist_id = album_id or uploader_id
webpage = self._download_webpage(url, playlist_id)
track_elements = re.findall(
r'(?s)<div[^>]*>(.*?<a[^>]+href="([^"]+?)"[^>]+itemprop="url"[^>]*>.*?)</div>', webpage)
if not track_elements:
tralbum = self._extract_data_attr(webpage, playlist_id)
track_info = tralbum.get('trackinfo')
if not track_info:
raise ExtractorError('The page doesn\'t contain any tracks')
# Only tracks with duration info have songs
entries = [
self.url_result(
compat_urlparse.urljoin(url, t_path),
ie=BandcampIE.ie_key(),
video_title=self._search_regex(
r'<span\b[^>]+\bitemprop=["\']name["\'][^>]*>([^<]+)',
elem_content, 'track title', fatal=False))
for elem_content, t_path in track_elements
if self._html_search_meta('duration', elem_content, default=None)]
urljoin(url, t['title_link']), BandcampIE.ie_key(),
str_or_none(t.get('track_id') or t.get('id')), t.get('title'))
for t in track_info
if t.get('duration')]
current = tralbum.get('current') or {}
title = self._html_search_regex(
r'album_title\s*:\s*"((?:\\.|[^"\\])+?)"',
webpage, 'title', fatal=False)
if title:
title = title.replace(r'\"', '"')
return {
'_type': 'playlist',
'uploader_id': uploader_id,
'id': playlist_id,
'title': title,
'title': current.get('title'),
'description': current.get('about'),
'entries': entries,
}
class BandcampWeeklyIE(InfoExtractor):
class BandcampWeeklyIE(BandcampIE):
IE_NAME = 'Bandcamp:weekly'
_VALID_URL = r'https?://(?:www\.)?bandcamp\.com/?\?(?:.*?&)?show=(?P<id>\d+)'
_TESTS = [{
@@ -343,29 +336,23 @@ class BandcampWeeklyIE(InfoExtractor):
'release_date': '20170404',
'series': 'Bandcamp Weekly',
'episode': 'Magic Moments',
'episode_number': 208,
'episode_id': '224',
}
},
'params': {
'format': 'opus-lo',
},
}, {
'url': 'https://bandcamp.com/?blah/blah@&show=228',
'only_matching': True
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
show_id = self._match_id(url)
webpage = self._download_webpage(url, show_id)
blob = self._parse_json(
self._search_regex(
r'data-blob=(["\'])(?P<blob>{.+?})\1', webpage,
'blob', group='blob'),
video_id, transform_source=unescapeHTML)
blob = self._extract_data_attr(webpage, show_id, 'blob')
show = blob['bcw_show']
# This is desired because any invalid show id redirects to `bandcamp.com`
# which happens to expose the latest Bandcamp Weekly episode.
show_id = int_or_none(show.get('show_id')) or int_or_none(video_id)
show = blob['bcw_data'][show_id]
formats = []
for format_id, format_url in show['audio_stream'].items():
@@ -390,20 +377,8 @@ class BandcampWeeklyIE(InfoExtractor):
if subtitle:
title += ' - %s' % subtitle
episode_number = None
seq = blob.get('bcw_seq')
if seq and isinstance(seq, list):
try:
episode_number = next(
int_or_none(e.get('episode_number'))
for e in seq
if isinstance(e, dict) and int_or_none(e.get('id')) == show_id)
except StopIteration:
pass
return {
'id': video_id,
'id': show_id,
'title': title,
'description': show.get('desc') or show.get('short_desc'),
'duration': float_or_none(show.get('audio_duration')),
@@ -411,7 +386,6 @@ class BandcampWeeklyIE(InfoExtractor):
'release_date': unified_strdate(show.get('published_date')),
'series': 'Bandcamp Weekly',
'episode': show.get('subtitle'),
'episode_number': episode_number,
'episode_id': compat_str(video_id),
'episode_id': show_id,
'formats': formats
}
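A minimal sketch of the data-attribute parsing that the new _extract_data_attr helper centralizes, assuming Bandcamp-style markup where the tralbum/embed/blob JSON sits HTML-escaped in a data-* attribute; the sample tag is illustrative only.

import json
import re
from html import unescape

SAMPLE = ('<script data-tralbum="{&quot;id&quot;: 1812978515, '
          '&quot;trackinfo&quot;: [{&quot;title&quot;: &quot;test song&quot;}]}">'
          '</script>')

def extract_data_attr(webpage, attr='tralbum'):
    # Same shape as the regex above: a quoted JSON object in data-<attr>.
    m = re.search(r'data-%s=(["\'])({.+?})\1' % attr, unescape(webpage))
    return json.loads(m.group(2)) if m else None

tralbum = extract_data_attr(SAMPLE)
print(tralbum['id'], tralbum['trackinfo'][0]['title'])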


@@ -1,31 +1,39 @@
# coding: utf-8
from __future__ import unicode_literals
import functools
import itertools
import json
import re
from .common import InfoExtractor
from ..compat import (
compat_etree_Element,
compat_HTTPError,
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
OnDemandPagedList,
clean_html,
dict_get,
ExtractorError,
float_or_none,
get_element_by_class,
int_or_none,
js_to_json,
parse_duration,
parse_iso8601,
strip_or_none,
try_get,
unescapeHTML,
unified_timestamp,
url_or_none,
urlencode_postdata,
urljoin,
)
from ..compat import (
compat_etree_Element,
compat_HTTPError,
compat_urlparse,
)
class BBCCoUkIE(InfoExtractor):
@@ -49,22 +57,17 @@ class BBCCoUkIE(InfoExtractor):
_LOGIN_URL = 'https://account.bbc.com/signin'
_NETRC_MACHINE = 'bbc'
_MEDIASELECTOR_URLS = [
_MEDIA_SELECTOR_URL_TEMPL = 'https://open.live.bbc.co.uk/mediaselector/6/select/version/2.0/mediaset/%s/vpid/%s'
_MEDIA_SETS = [
# Provides HQ HLS streams with even better quality than the pc mediaset, but fails
# with geolocation in some cases even when it's not geo-restricted at all (e.g.
# http://www.bbc.co.uk/programmes/b06bp7lf). May also fail with selectionunavailable.
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s',
'iptv-all',
'pc',
]
_MEDIASELECTION_NS = 'http://bbc.co.uk/2008/mp/mediaselection'
_EMP_PLAYLIST_NS = 'http://bbc.co.uk/2008/emp/playlist'
_NAMESPACES = (
_MEDIASELECTION_NS,
_EMP_PLAYLIST_NS,
)
_TESTS = [
{
'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
@@ -209,7 +212,7 @@ class BBCCoUkIE(InfoExtractor):
},
'skip': 'Now it\'s really geo-restricted',
}, {
# compact player (https://github.com/ytdl-org/haruhi-dl/issues/8147)
# compact player (https://github.com/ytdl-org/youtube-dl/issues/8147)
'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player',
'info_dict': {
'id': 'p028bfkj',
@@ -261,8 +264,6 @@ class BBCCoUkIE(InfoExtractor):
'only_matching': True,
}]
_USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'
def _login(self):
username, password = self._get_login_info()
if username is None:
@@ -307,22 +308,14 @@ class BBCCoUkIE(InfoExtractor):
def _extract_items(self, playlist):
return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
def _findall_ns(self, element, xpath):
elements = []
for ns in self._NAMESPACES:
elements.extend(element.findall(xpath % ns))
return elements
def _extract_medias(self, media_selection):
error = media_selection.find('./{%s}error' % self._MEDIASELECTION_NS)
if error is None:
media_selection.find('./{%s}error' % self._EMP_PLAYLIST_NS)
if error is not None:
raise BBCCoUkIE.MediaSelectionError(error.get('id'))
return self._findall_ns(media_selection, './{%s}media')
error = media_selection.get('result')
if error:
raise BBCCoUkIE.MediaSelectionError(error)
return media_selection.get('media') or []
def _extract_connections(self, media):
return self._findall_ns(media, './{%s}connection')
return media.get('connection') or []
def _get_subtitles(self, media, programme_id):
subtitles = {}
@@ -334,13 +327,13 @@ class BBCCoUkIE(InfoExtractor):
cc_url, programme_id, 'Downloading captions', fatal=False)
if not isinstance(captions, compat_etree_Element):
continue
lang = captions.get('{http://www.w3.org/XML/1998/namespace}lang', 'en')
subtitles[lang] = [
subtitles['en'] = [
{
'url': connection.get('href'),
'ext': 'ttml',
},
]
break
return subtitles
def _raise_extractor_error(self, media_selection_error):
@@ -350,10 +343,10 @@ class BBCCoUkIE(InfoExtractor):
def _download_media_selector(self, programme_id):
last_exception = None
for mediaselector_url in self._MEDIASELECTOR_URLS:
for media_set in self._MEDIA_SETS:
try:
return self._download_media_selector_url(
mediaselector_url % programme_id, programme_id)
self._MEDIA_SELECTOR_URL_TEMPL % (media_set, programme_id), programme_id)
except BBCCoUkIE.MediaSelectionError as e:
if e.id in ('notukerror', 'geolocation', 'selectionunavailable'):
last_exception = e
@@ -362,8 +355,8 @@ class BBCCoUkIE(InfoExtractor):
self._raise_extractor_error(last_exception)
def _download_media_selector_url(self, url, programme_id=None):
media_selection = self._download_xml(
url, programme_id, 'Downloading media selection XML',
media_selection = self._download_json(
url, programme_id, 'Downloading media selection JSON',
expected_status=(403, 404))
return self._process_media_selector(media_selection, programme_id)
@@ -377,7 +370,6 @@ class BBCCoUkIE(InfoExtractor):
if kind in ('video', 'audio'):
bitrate = int_or_none(media.get('bitrate'))
encoding = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
@@ -392,8 +384,6 @@ class BBCCoUkIE(InfoExtractor):
supplier = connection.get('supplier')
transfer_format = connection.get('transferFormat')
format_id = supplier or conn_kind or protocol
if service:
format_id = '%s_%s' % (service, format_id)
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
@@ -408,20 +398,11 @@ class BBCCoUkIE(InfoExtractor):
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
if re.search(self._USP_RE, href):
usp_formats = self._extract_m3u8_formats(
re.sub(self._USP_RE, r'/\1.ism/\1.m3u8', href),
programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False)
for f in usp_formats:
if f.get('height') and f['height'] > 720:
continue
formats.append(f)
elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False))
else:
if not service and not supplier and bitrate:
if not supplier and bitrate:
format_id += '-%d' % bitrate
fmt = {
'format_id': format_id,
@@ -554,7 +535,7 @@ class BBCCoUkIE(InfoExtractor):
webpage = self._download_webpage(url, group_id, 'Downloading video page')
error = self._search_regex(
r'<div\b[^>]+\bclass=["\']smp__message delta["\'][^>]*>([^<]+)<',
r'<div\b[^>]+\bclass=["\'](?:smp|playout)__message delta["\'][^>]*>\s*([^<]+?)\s*<',
webpage, 'error', default=None)
if error:
raise ExtractorError(error, expected=True)
@@ -607,16 +588,9 @@ class BBCIE(BBCCoUkIE):
IE_DESC = 'BBC'
_VALID_URL = r'https?://(?:www\.)?bbc\.(?:com|co\.uk)/(?:[^/]+/)+(?P<id>[^/#?]+)'
_MEDIASELECTOR_URLS = [
# Provides HQ HLS streams but fails with geolocation in some cases even
# when it's not geo-restricted at all
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/%s',
# Provides more formats, namely direct mp4 links, but fails on some videos with
# notukerror for non-UK (?) users (e.g.
# http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
'http://open.live.bbc.co.uk/mediaselector/4/mtis/stream/%s',
# Provides fewer formats, but works everywhere for everybody (hopefully)
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/journalism-pc/vpid/%s',
_MEDIA_SETS = [
'mobile-tablet-main',
'pc',
]
_TESTS = [{
@@ -790,8 +764,17 @@ class BBCIE(BBCCoUkIE):
'only_matching': True,
}, {
# custom redirection to www.bbc.com
# also, video with window.__INITIAL_DATA__
'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
'only_matching': True,
'info_dict': {
'id': 'p02xzws1',
'ext': 'mp4',
'title': "Pluto may have 'nitrogen glaciers'",
'description': 'md5:6a95b593f528d7a5f2605221bc56912f',
'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1437785037,
'upload_date': '20150725',
},
}, {
# single video article embedded with data-media-vpid
'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
@@ -827,11 +810,25 @@ class BBCIE(BBCCoUkIE):
'description': 'Learn English words and phrases from this story',
},
'add_ie': [BBCCoUkIE.ie_key()],
}, {
# BBC Reel
'url': 'https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness',
'info_dict': {
'id': 'p07c6sb9',
'ext': 'mp4',
'title': 'How positive thinking is harming your happiness',
'alt_title': 'The downsides of positive thinking',
'description': 'md5:fad74b31da60d83b8265954ee42d85b4',
'duration': 235,
'thumbnail': r're:https?://.+/p07c9dsr.jpg',
'upload_date': '20190604',
'categories': ['Psychology'],
},
}]
@classmethod
def suitable(cls, url):
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE)
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerEpisodesIE, BBCCoUkIPlayerGroupIE, BBCCoUkPlaylistIE)
return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
else super(BBCIE, cls).suitable(url))
@@ -963,7 +960,7 @@ class BBCIE(BBCCoUkIE):
else:
entry['title'] = info['title']
entry['formats'].extend(info['formats'])
except Exception as e:
except ExtractorError as e:
# Some playlist URL may fail with 500, at the same time
# the other one may work fine (e.g.
# http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
@@ -981,7 +978,7 @@ class BBCIE(BBCCoUkIE):
group_id = self._search_regex(
r'<div[^>]+\bclass=["\']video["\'][^>]+\bdata-pid=["\'](%s)' % self._ID_REGEX,
webpage, 'group id', default=None)
if playlist_id:
if group_id:
return self.url_result(
'https://www.bbc.co.uk/programmes/%s' % group_id,
ie=BBCCoUkIE.ie_key())
@@ -1014,6 +1011,37 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles,
}
# bbc reel (e.g. https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness)
initial_data = self._parse_json(self._html_search_regex(
r'<script[^>]+id=(["\'])initial-data\1[^>]+data-json=(["\'])(?P<json>(?:(?!\2).)+)',
webpage, 'initial data', default='{}', group='json'), playlist_id, fatal=False)
if initial_data:
init_data = try_get(
initial_data, lambda x: x['initData']['items'][0], dict) or {}
smp_data = init_data.get('smpData') or {}
clip_data = try_get(smp_data, lambda x: x['items'][0], dict) or {}
version_id = clip_data.get('versionID')
if version_id:
title = smp_data['title']
formats, subtitles = self._download_media_selector(version_id)
self._sort_formats(formats)
image_url = smp_data.get('holdingImageURL')
display_date = init_data.get('displayDate')
topic_title = init_data.get('topicTitle')
return {
'id': version_id,
'title': title,
'formats': formats,
'alt_title': init_data.get('shortTitle'),
'thumbnail': image_url.replace('$recipe', 'raw') if image_url else None,
'description': smp_data.get('summary') or init_data.get('shortSummary'),
'upload_date': display_date.replace('-', '') if display_date else None,
'subtitles': subtitles,
'duration': int_or_none(clip_data.get('duration')),
'categories': [topic_title] if topic_title else None,
}
# Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
# Several setPayload calls may be present, but the video
# always seems to be related to the first one
@@ -1075,7 +1103,7 @@ class BBCIE(BBCCoUkIE):
thumbnail = None
image_url = current_programme.get('image_url')
if image_url:
thumbnail = image_url.replace('{recipe}', '1920x1920')
thumbnail = image_url.replace('{recipe}', 'raw')
return {
'id': programme_id,
'title': title,
@@ -1092,10 +1120,26 @@ class BBCIE(BBCCoUkIE):
self._search_regex(
r'(?s)bbcthreeConfig\s*=\s*({.+?})\s*;\s*<', webpage,
'bbcthree config', default='{}'),
playlist_id, transform_source=js_to_json, fatal=False)
if bbc3_config:
playlist_id, transform_source=js_to_json, fatal=False) or {}
payload = bbc3_config.get('payload') or {}
if payload:
clip = payload.get('currentClip') or {}
clip_vpid = clip.get('vpid')
clip_title = clip.get('title')
if clip_vpid and clip_title:
formats, subtitles = self._download_media_selector(clip_vpid)
self._sort_formats(formats)
return {
'id': clip_vpid,
'title': clip_title,
'thumbnail': dict_get(clip, ('poster', 'imageUrl')),
'description': clip.get('description'),
'duration': parse_duration(clip.get('duration')),
'formats': formats,
'subtitles': subtitles,
}
bbc3_playlist = try_get(
bbc3_config, lambda x: x['payload']['content']['bbcMedia']['playlist'],
payload, lambda x: x['content']['bbcMedia']['playlist'],
dict)
if bbc3_playlist:
playlist_title = bbc3_playlist.get('title') or playlist_title
@@ -1118,6 +1162,56 @@ class BBCIE(BBCCoUkIE):
return self.playlist_result(
entries, playlist_id, playlist_title, playlist_description)
initial_data = self._parse_json(self._search_regex(
r'window\.__INITIAL_DATA__\s*=\s*({.+?});', webpage,
'preload state', default='{}'), playlist_id, fatal=False)
if initial_data:
def parse_media(media):
if not media:
return
for item in (try_get(media, lambda x: x['media']['items'], list) or []):
item_id = item.get('id')
item_title = item.get('title')
if not (item_id and item_title):
continue
formats, subtitles = self._download_media_selector(item_id)
self._sort_formats(formats)
item_desc = None
blocks = try_get(media, lambda x: x['summary']['blocks'], list)
if blocks:
summary = []
for block in blocks:
text = try_get(block, lambda x: x['model']['text'], compat_str)
if text:
summary.append(text)
if summary:
item_desc = '\n\n'.join(summary)
item_time = None
for meta in try_get(media, lambda x: x['metadata']['items'], list) or []:
if try_get(meta, lambda x: x['label']) == 'Published':
item_time = unified_timestamp(meta.get('timestamp'))
break
entries.append({
'id': item_id,
'title': item_title,
'thumbnail': item.get('holdingImageUrl'),
'formats': formats,
'subtitles': subtitles,
'timestamp': item_time,
'description': strip_or_none(item_desc),
})
for resp in (initial_data.get('data') or {}).values():
name = resp.get('name')
if name == 'media-experience':
parse_media(try_get(resp, lambda x: x['data']['initialItem']['mediaItem'], dict))
elif name == 'article':
for block in (try_get(resp, lambda x: x['data']['blocks'], list) or []):
if block.get('type') != 'media':
continue
parse_media(block.get('model'))
return self.playlist_result(
entries, playlist_id, playlist_title, playlist_description)
def extract_all(pattern):
return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False),
@@ -1278,21 +1372,149 @@ class BBCCoUkPlaylistBaseIE(InfoExtractor):
playlist_id, title, description)
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
class BBCCoUkIPlayerPlaylistBaseIE(InfoExtractor):
_VALID_URL_TMPL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/%%s/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
@staticmethod
def _get_default(episode, key, default_key='default'):
return try_get(episode, lambda x: x[key][default_key])
def _get_description(self, data):
synopsis = data.get(self._DESCRIPTION_KEY) or {}
return dict_get(synopsis, ('large', 'medium', 'small'))
def _fetch_page(self, programme_id, per_page, series_id, page):
elements = self._get_elements(self._call_api(
programme_id, per_page, page + 1, series_id))
for element in elements:
episode = self._get_episode(element)
episode_id = episode.get('id')
if not episode_id:
continue
thumbnail = None
image = self._get_episode_image(episode)
if image:
thumbnail = image.replace('{recipe}', 'raw')
category = self._get_default(episode, 'labels', 'category')
yield {
'_type': 'url',
'id': episode_id,
'title': self._get_episode_field(episode, 'subtitle'),
'url': 'https://www.bbc.co.uk/iplayer/episode/' + episode_id,
'thumbnail': thumbnail,
'description': self._get_description(episode),
'categories': [category] if category else None,
'series': self._get_episode_field(episode, 'title'),
'ie_key': BBCCoUkIE.ie_key(),
}
def _real_extract(self, url):
pid = self._match_id(url)
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
series_id = qs.get('seriesId', [None])[0]
page = qs.get('page', [None])[0]
per_page = 36 if page else self._PAGE_SIZE
fetch_page = functools.partial(self._fetch_page, pid, per_page, series_id)
entries = fetch_page(int(page) - 1) if page else OnDemandPagedList(fetch_page, self._PAGE_SIZE)
playlist_data = self._get_playlist_data(self._call_api(pid, 1))
return self.playlist_result(
entries, pid, self._get_playlist_title(playlist_data),
self._get_description(playlist_data))
class BBCCoUkIPlayerEpisodesIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:episodes'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'episodes'
_TESTS = [{
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
'info_dict': {
'id': 'b05rcz9v',
'title': 'The Disappearance',
'description': 'French thriller serial about a missing teenager.',
'description': 'md5:58eb101aee3116bad4da05f91179c0cb',
},
'playlist_mincount': 6,
'skip': 'This programme is not currently available on BBC iPlayer',
'playlist_mincount': 8,
}, {
# all seasons
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 10,
}, {
# explicit season
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster?seriesId=b094m6nv',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 5,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 37,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove?page=2',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 1,
}]
_PAGE_SIZE = 100
_DESCRIPTION_KEY = 'synopsis'
def _get_episode_image(self, episode):
return self._get_default(episode, 'image')
def _get_episode_field(self, episode, field):
return self._get_default(episode, field)
@staticmethod
def _get_elements(data):
return data['entities']['results']
@staticmethod
def _get_episode(element):
return element.get('episode') or {}
def _call_api(self, pid, per_page, page=1, series_id=None):
variables = {
'id': pid,
'page': page,
'perPage': per_page,
}
if series_id:
variables['sliceId'] = series_id
return self._download_json(
'https://graph.ibl.api.bbc.co.uk/', pid, headers={
'Content-Type': 'application/json'
}, data=json.dumps({
'id': '5692d93d5aac8d796a0305e895e61551',
'variables': variables,
}).encode('utf-8'))['data']['programme']
@staticmethod
def _get_playlist_data(data):
return data
def _get_playlist_title(self, data):
return self._get_default(data, 'title')
class BBCCoUkIPlayerGroupIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:group'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'group'
_TESTS = [{
# Available for over a year, unlike the 30 days for most other programmes
'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
'info_dict': {
@@ -1301,14 +1523,56 @@ class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
},
'playlist_mincount': 10,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 47,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7?page=2',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 11,
}]
_PAGE_SIZE = 200
_DESCRIPTION_KEY = 'synopses'
def _extract_title_and_description(self, webpage):
title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
description = self._search_regex(
r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
webpage, 'description', fatal=False, group='value')
return title, description
def _get_episode_image(self, episode):
return self._get_default(episode, 'images', 'standard')
def _get_episode_field(self, episode, field):
return episode.get(field)
@staticmethod
def _get_elements(data):
return data['elements']
@staticmethod
def _get_episode(element):
return element
def _call_api(self, pid, per_page, page=1, series_id=None):
return self._download_json(
'http://ibl.api.bbc.co.uk/ibl/v1/groups/%s/episodes' % pid,
pid, query={
'page': page,
'per_page': per_page,
})['group_episodes']
@staticmethod
def _get_playlist_data(data):
return data['group']
def _get_playlist_title(self, data):
return data.get('title')
class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
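A hedged sketch of the paged iBL group call that BBCCoUkIPlayerGroupIE now makes, with requests standing in for the extractor's helpers and OnDemandPagedList; the group pid comes from the test above, and stopping on the first empty page is an assumption about how the endpoint paginates.

import requests

def iter_group_episodes(pid, per_page=200):
    # Walk /groups/<pid>/episodes page by page, yielding (id, title) pairs.
    page = 1
    while True:
        data = requests.get(
            'http://ibl.api.bbc.co.uk/ibl/v1/groups/%s/episodes' % pid,
            params={'page': page, 'per_page': per_page},
        ).json()['group_episodes']
        elements = data.get('elements') or []
        if not elements:
            return
        for episode in elements:
            yield episode.get('id'), episode.get('title')
        page += 1

for episode_id, title in iter_group_episodes('p02tcc32'):
    print(episode_id, title)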


@@ -1,194 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
compat_str,
float_or_none,
int_or_none,
parse_iso8601,
try_get,
urljoin,
)
class BeamProBaseIE(InfoExtractor):
_API_BASE = 'https://mixer.com/api/v1'
_RATINGS = {'family': 0, 'teen': 13, '18+': 18}
def _extract_channel_info(self, chan):
user_id = chan.get('userId') or try_get(chan, lambda x: x['user']['id'])
return {
'uploader': chan.get('token') or try_get(
chan, lambda x: x['user']['username'], compat_str),
'uploader_id': compat_str(user_id) if user_id else None,
'age_limit': self._RATINGS.get(chan.get('audience')),
}
class BeamProLiveIE(BeamProBaseIE):
IE_NAME = 'Mixer:live'
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://mixer.com/niterhayven',
'info_dict': {
'id': '261562',
'ext': 'mp4',
'title': 'Introducing The Witcher 3 // The Grind Starts Now!',
'description': 'md5:0b161ac080f15fe05d18a07adb44a74d',
'thumbnail': r're:https://.*\.jpg$',
'timestamp': 1483477281,
'upload_date': '20170103',
'uploader': 'niterhayven',
'uploader_id': '373396',
'age_limit': 18,
'is_live': True,
'view_count': int,
},
'skip': 'niterhayven is offline',
'params': {
'skip_download': True,
},
}
_MANIFEST_URL_TEMPLATE = '%s/channels/%%s/manifest.%%s' % BeamProBaseIE._API_BASE
@classmethod
def suitable(cls, url):
return False if BeamProVodIE.suitable(url) else super(BeamProLiveIE, cls).suitable(url)
def _real_extract(self, url):
channel_name = self._match_id(url)
chan = self._download_json(
'%s/channels/%s' % (self._API_BASE, channel_name), channel_name)
if chan.get('online') is False:
raise ExtractorError(
'{0} is offline'.format(channel_name), expected=True)
channel_id = chan['id']
def manifest_url(kind):
return self._MANIFEST_URL_TEMPLATE % (channel_id, kind)
formats = self._extract_m3u8_formats(
manifest_url('m3u8'), channel_name, ext='mp4', m3u8_id='hls',
fatal=False)
formats.extend(self._extract_smil_formats(
manifest_url('smil'), channel_name, fatal=False))
self._sort_formats(formats)
info = {
'id': compat_str(chan.get('id') or channel_name),
'title': self._live_title(chan.get('name') or channel_name),
'description': clean_html(chan.get('description')),
'thumbnail': try_get(
chan, lambda x: x['thumbnail']['url'], compat_str),
'timestamp': parse_iso8601(chan.get('updatedAt')),
'is_live': True,
'view_count': int_or_none(chan.get('viewersTotal')),
'formats': formats,
}
info.update(self._extract_channel_info(chan))
return info
class BeamProVodIE(BeamProBaseIE):
IE_NAME = 'Mixer:vod'
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>[^?#&]+)'
_TESTS = [{
'url': 'https://mixer.com/willow8714?vod=2259830',
'md5': 'b2431e6e8347dc92ebafb565d368b76b',
'info_dict': {
'id': '2259830',
'ext': 'mp4',
'title': 'willow8714\'s Channel',
'duration': 6828.15,
'thumbnail': r're:https://.*source\.png$',
'timestamp': 1494046474,
'upload_date': '20170506',
'uploader': 'willow8714',
'uploader_id': '6085379',
'age_limit': 13,
'view_count': int,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://mixer.com/streamer?vod=IxFno1rqC0S_XJ1a2yGgNw',
'only_matching': True,
}, {
'url': 'https://mixer.com/streamer?vod=Rh3LY0VAqkGpEQUe2pN-ig',
'only_matching': True,
}]
@staticmethod
def _extract_format(vod, vod_type):
if not vod.get('baseUrl'):
return []
if vod_type == 'hls':
filename, protocol = 'manifest.m3u8', 'm3u8_native'
elif vod_type == 'raw':
filename, protocol = 'source.mp4', 'https'
else:
assert False
data = vod.get('data') if isinstance(vod.get('data'), dict) else {}
format_id = [vod_type]
if isinstance(data.get('Height'), compat_str):
format_id.append('%sp' % data['Height'])
return [{
'url': urljoin(vod['baseUrl'], filename),
'format_id': '-'.join(format_id),
'ext': 'mp4',
'protocol': protocol,
'width': int_or_none(data.get('Width')),
'height': int_or_none(data.get('Height')),
'fps': int_or_none(data.get('Fps')),
'tbr': int_or_none(data.get('Bitrate'), 1000),
}]
def _real_extract(self, url):
vod_id = self._match_id(url)
vod_info = self._download_json(
'%s/recordings/%s' % (self._API_BASE, vod_id), vod_id)
state = vod_info.get('state')
if state != 'AVAILABLE':
raise ExtractorError(
'VOD %s is not available (state: %s)' % (vod_id, state),
expected=True)
formats = []
thumbnail_url = None
for vod in vod_info['vods']:
vod_type = vod.get('format')
if vod_type in ('hls', 'raw'):
formats.extend(self._extract_format(vod, vod_type))
elif vod_type == 'thumbnail':
thumbnail_url = urljoin(vod.get('baseUrl'), 'source.png')
self._sort_formats(formats)
info = {
'id': vod_id,
'title': vod_info.get('name') or vod_id,
'duration': float_or_none(vod_info.get('duration')),
'thumbnail': thumbnail_url,
'timestamp': parse_iso8601(vod_info.get('createdAt')),
'view_count': int_or_none(vod_info.get('viewsTotal')),
'formats': formats,
}
info.update(self._extract_channel_info(vod_info.get('channel') or {}))
return info


@@ -0,0 +1,103 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import extract_attributes
class BFMTVBaseIE(InfoExtractor):
_VALID_URL_BASE = r'https?://(?:www\.)?bfmtv\.com/'
_VALID_URL_TMPL = _VALID_URL_BASE + r'(?:[^/]+/)*[^/?&#]+_%s[A-Z]-(?P<id>\d{12})\.html'
_VIDEO_BLOCK_REGEX = r'(<div[^>]+class="video_block"[^>]*>)'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'
def _brightcove_url_result(self, video_id, video_block):
account_id = video_block.get('accountid') or '876450612001'
player_id = video_block.get('playerid') or 'I2qBTln4u'
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, video_id),
'BrightcoveNew', video_id)
class BFMTVIE(BFMTVBaseIE):
IE_NAME = 'bfmtv'
_VALID_URL = BFMTVBaseIE._VALID_URL_TMPL % 'V'
_TESTS = [{
'url': 'https://www.bfmtv.com/politique/emmanuel-macron-l-islam-est-une-religion-qui-vit-une-crise-aujourd-hui-partout-dans-le-monde_VN-202010020146.html',
'info_dict': {
'id': '6196747868001',
'ext': 'mp4',
'title': 'Emmanuel Macron: "L\'Islam est une religion qui vit une crise aujourdhui, partout dans le monde"',
'description': 'Le Président s\'exprime sur la question du séparatisme depuis les Mureaux, dans les Yvelines.',
'uploader_id': '876450610001',
'upload_date': '20201002',
'timestamp': 1601629620,
},
}]
def _real_extract(self, url):
bfmtv_id = self._match_id(url)
webpage = self._download_webpage(url, bfmtv_id)
video_block = extract_attributes(self._search_regex(
self._VIDEO_BLOCK_REGEX, webpage, 'video block'))
return self._brightcove_url_result(video_block['videoid'], video_block)
class BFMTVLiveIE(BFMTVIE):
IE_NAME = 'bfmtv:live'
_VALID_URL = BFMTVBaseIE._VALID_URL_BASE + '(?P<id>(?:[^/]+/)?en-direct)'
_TESTS = [{
'url': 'https://www.bfmtv.com/en-direct/',
'info_dict': {
'id': '5615950982001',
'ext': 'mp4',
'title': r're:^le direct BFMTV WEB \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'uploader_id': '876450610001',
'upload_date': '20171018',
'timestamp': 1508329950,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.bfmtv.com/economie/en-direct/',
'only_matching': True,
}]
class BFMTVArticleIE(BFMTVBaseIE):
IE_NAME = 'bfmtv:article'
_VALID_URL = BFMTVBaseIE._VALID_URL_TMPL % 'A'
_TESTS = [{
'url': 'https://www.bfmtv.com/sante/covid-19-un-responsable-de-l-institut-pasteur-se-demande-quand-la-france-va-se-reconfiner_AV-202101060198.html',
'info_dict': {
'id': '202101060198',
'title': 'Covid-19: un responsable de l\'Institut Pasteur se demande "quand la France va se reconfiner"',
'description': 'md5:947974089c303d3ac6196670ae262843',
},
'playlist_count': 2,
}, {
'url': 'https://www.bfmtv.com/international/pour-bolsonaro-le-bresil-est-en-faillite-mais-il-ne-peut-rien-faire_AD-202101060232.html',
'only_matching': True,
}, {
'url': 'https://www.bfmtv.com/sante/covid-19-oui-le-vaccin-de-pfizer-distribue-en-france-a-bien-ete-teste-sur-des-personnes-agees_AN-202101060275.html',
'only_matching': True,
}]
def _real_extract(self, url):
bfmtv_id = self._match_id(url)
webpage = self._download_webpage(url, bfmtv_id)
entries = []
for video_block_el in re.findall(self._VIDEO_BLOCK_REGEX, webpage):
video_block = extract_attributes(video_block_el)
video_id = video_block.get('videoid')
if not video_id:
continue
entries.append(self._brightcove_url_result(video_id, video_block))
return self.playlist_result(
entries, bfmtv_id, self._og_search_title(webpage, fatal=False),
self._html_search_meta(['og:description', 'description'], webpage))
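A minimal sketch of how a scraped video_block maps onto a Brightcove player URL, using the template and the fallback account/player ids shown above; the sample attributes are illustrative.

BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/%s_default/index.html?videoId=%s'

def brightcove_url(video_block):
    # Fall back to the site-wide ids used above when the video_block div
    # carries no explicit accountid/playerid attributes.
    account_id = video_block.get('accountid') or '876450612001'
    player_id = video_block.get('playerid') or 'I2qBTln4u'
    return BRIGHTCOVE_URL_TEMPLATE % (account_id, player_id, video_block['videoid'])

print(brightcove_url({'videoid': '6196747868001'}))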


@@ -0,0 +1,30 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class BibelTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bibeltv\.de/mediathek/videos/(?:crn/)?(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.bibeltv.de/mediathek/videos/329703-sprachkurs-in-malaiisch',
'md5': '252f908192d611de038b8504b08bf97f',
'info_dict': {
'id': 'ref:329703',
'ext': 'mp4',
'title': 'Sprachkurs in Malaiisch',
'description': 'md5:3e9f197d29ee164714e67351cf737dfe',
'timestamp': 1608316701,
'uploader_id': '5840105145001',
'upload_date': '20201218',
}
}, {
'url': 'https://www.bibeltv.de/mediathek/videos/crn/326374',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5840105145001/default_default/index.html?videoId=ref:%s'
def _real_extract(self, url):
crn_id = self._match_id(url)
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % crn_id, 'BrightcoveNew')


@@ -156,6 +156,7 @@ class BiliBiliIE(InfoExtractor):
cid = js['result']['cid']
headers = {
'Accept': 'application/json',
'Referer': url
}
headers.update(self.geo_verification_headers())
@@ -232,7 +233,7 @@ class BiliBiliIE(InfoExtractor):
webpage)
if uploader_mobj:
info.update({
'uploader': uploader_mobj.group('name'),
'uploader': uploader_mobj.group('name').strip(),
'uploader_id': uploader_mobj.group('id'),
})
if not info.get('uploader'):


@@ -90,13 +90,19 @@ class BleacherReportCMSIE(AMPIE):
_VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36}|\d{5})'
_TESTS = [{
'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1&library=video-cms',
'md5': '2e4b0a997f9228ffa31fada5c53d1ed1',
'md5': '670b2d73f48549da032861130488c681',
'info_dict': {
'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
'ext': 'flv',
'ext': 'mp4',
'title': 'Cena vs. Rollins Would Expose the Heavyweight Division',
'description': 'md5:984afb4ade2f9c0db35f3267ed88b36e',
'upload_date': '20150723',
'timestamp': 1437679032,
},
'expected_warnings': [
'Unable to download f4m manifest'
]
}]
def _real_extract(self, url):


@@ -1,86 +0,0 @@
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
remove_start,
int_or_none,
)
class BlinkxIE(InfoExtractor):
_VALID_URL = r'(?:https?://(?:www\.)blinkx\.com/#?ce/|blinkx:)(?P<id>[^?]+)'
IE_NAME = 'blinkx'
_TEST = {
'url': 'http://www.blinkx.com/ce/Da0Gw3xc5ucpNduzLuDDlv4WC9PuI4fDi1-t6Y3LyfdY2SZS5Urbvn-UPJvrvbo8LTKTc67Wu2rPKSQDJyZeeORCR8bYkhs8lI7eqddznH2ofh5WEEdjYXnoRtj7ByQwt7atMErmXIeYKPsSDuMAAqJDlQZ-3Ff4HJVeH_s3Gh8oQ',
'md5': '337cf7a344663ec79bf93a526a2e06c7',
'info_dict': {
'id': 'Da0Gw3xc',
'ext': 'mp4',
'title': 'No Daily Show for John Oliver; HBO Show Renewed - IGN News',
'uploader': 'IGN News',
'upload_date': '20150217',
'timestamp': 1424215740,
'description': 'HBO has renewed Last Week Tonight With John Oliver for two more seasons.',
'duration': 47.743333,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = video_id[:8]
api_url = ('https://apib4.blinkx.com/api.php?action=play_video&'
+ 'video=%s' % video_id)
data_json = self._download_webpage(api_url, display_id)
data = json.loads(data_json)['api']['results'][0]
duration = None
thumbnails = []
formats = []
for m in data['media']:
if m['type'] == 'jpg':
thumbnails.append({
'url': m['link'],
'width': int(m['w']),
'height': int(m['h']),
})
elif m['type'] == 'original':
duration = float(m['d'])
elif m['type'] == 'youtube':
yt_id = m['link']
self.to_screen('Youtube video detected: %s' % yt_id)
return self.url_result(yt_id, 'Youtube', video_id=yt_id)
elif m['type'] in ('flv', 'mp4'):
vcodec = remove_start(m['vcodec'], 'ff')
acodec = remove_start(m['acodec'], 'ff')
vbr = int_or_none(m.get('vbr') or m.get('vbitrate'), 1000)
abr = int_or_none(m.get('abr') or m.get('abitrate'), 1000)
tbr = vbr + abr if vbr and abr else None
format_id = '%s-%sk-%s' % (vcodec, tbr, m['w'])
formats.append({
'format_id': format_id,
'url': m['link'],
'vcodec': vcodec,
'acodec': acodec,
'abr': abr,
'vbr': vbr,
'tbr': tbr,
'width': int_or_none(m.get('w')),
'height': int_or_none(m.get('h')),
})
self._sort_formats(formats)
return {
'id': display_id,
'fullid': video_id,
'title': data['title'],
'formats': formats,
'uploader': data['channel_name'],
'timestamp': data['pubdate_epoch'],
'description': data.get('description'),
'thumbnails': thumbnails,
'duration': duration,
}


@@ -0,0 +1,60 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
try_get,
urlencode_postdata,
)
class BongaCamsIE(InfoExtractor):
_VALID_URL = r'https?://(?P<host>(?:[^/]+\.)?bongacams\d*\.com)/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://de.bongacams.com/azumi-8',
'only_matching': True,
}, {
'url': 'https://cn.bongacams.com/azumi-8',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
channel_id = mobj.group('id')
amf = self._download_json(
'https://%s/tools/amf.php' % host, channel_id,
data=urlencode_postdata((
('method', 'getRoomData'),
('args[]', channel_id),
('args[]', 'false'),
)), headers={'X-Requested-With': 'XMLHttpRequest'})
server_url = amf['localData']['videoServerUrl']
uploader_id = try_get(
amf, lambda x: x['performerData']['username'], compat_str) or channel_id
uploader = try_get(
amf, lambda x: x['performerData']['displayName'], compat_str)
like_count = int_or_none(try_get(
amf, lambda x: x['performerData']['loversCount']))
formats = self._extract_m3u8_formats(
'%s/hls/stream_%s/playlist.m3u8' % (server_url, uploader_id),
channel_id, 'mp4', m3u8_id='hls', live=True)
self._sort_formats(formats)
return {
'id': channel_id,
'title': self._live_title(uploader or uploader_id),
'uploader': uploader,
'uploader_id': uploader_id,
'like_count': like_count,
'age_limit': 18,
'is_live': True,
'formats': formats,
}
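A hedged sketch of the AMF room lookup above, assuming the form-encoded method/args[] protocol shown; requests stands in for _download_json, and the host and channel in the usage comment are placeholders.

import requests

def get_room_data(host, channel_id):
    # Repeated args[] keys are sent as a list of tuples, matching the
    # urlencode_postdata call above.
    return requests.post(
        'https://%s/tools/amf.php' % host,
        data=[('method', 'getRoomData'), ('args[]', channel_id), ('args[]', 'false')],
        headers={'X-Requested-With': 'XMLHttpRequest'},
    ).json()

# e.g. get_room_data('de.bongacams.com', 'azumi-8')['localData']['videoServerUrl']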


@@ -0,0 +1,98 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_iso8601,
# try_get,
update_url_query,
)
class BoxIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^.]+\.)?app\.box\.com/s/(?P<shared_name>[^/]+)/file/(?P<id>\d+)'
_TEST = {
'url': 'https://mlssoccer.app.box.com/s/0evd2o3e08l60lr4ygukepvnkord1o1x/file/510727257538',
'md5': '1f81b2fd3960f38a40a3b8823e5fcd43',
'info_dict': {
'id': '510727257538',
'ext': 'mp4',
'title': 'Garber St. Louis will be 28th MLS team +scarving.mp4',
'uploader': 'MLS Video',
'timestamp': 1566320259,
'upload_date': '20190820',
'uploader_id': '235196876',
}
}
def _real_extract(self, url):
shared_name, file_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, file_id)
request_token = self._parse_json(self._search_regex(
r'Box\.config\s*=\s*({.+?});', webpage,
'Box config'), file_id)['requestToken']
access_token = self._download_json(
'https://app.box.com/app-api/enduserapp/elements/tokens', file_id,
'Downloading token JSON metadata',
data=json.dumps({'fileIDs': [file_id]}).encode(), headers={
'Content-Type': 'application/json',
'X-Request-Token': request_token,
'X-Box-EndUser-API': 'sharedName=' + shared_name,
})[file_id]['read']
shared_link = 'https://app.box.com/s/' + shared_name
f = self._download_json(
'https://api.box.com/2.0/files/' + file_id, file_id,
'Downloading file JSON metadata', headers={
'Authorization': 'Bearer ' + access_token,
'BoxApi': 'shared_link=' + shared_link,
'X-Rep-Hints': '[dash]', # TODO: extract `hls` formats
}, query={
'fields': 'authenticated_download_url,created_at,created_by,description,extension,is_download_available,name,representations,size'
})
title = f['name']
query = {
'access_token': access_token,
'shared_link': shared_link
}
formats = []
# for entry in (try_get(f, lambda x: x['representations']['entries'], list) or []):
# entry_url_template = try_get(
# entry, lambda x: x['content']['url_template'])
# if not entry_url_template:
# continue
# representation = entry.get('representation')
# if representation == 'dash':
# TODO: append query to every fragment URL
# formats.extend(self._extract_mpd_formats(
# entry_url_template.replace('{+asset_path}', 'manifest.mpd'),
# file_id, query=query))
authenticated_download_url = f.get('authenticated_download_url')
if authenticated_download_url and f.get('is_download_available'):
formats.append({
'ext': f.get('extension') or determine_ext(title),
'filesize': f.get('size'),
'format_id': 'download',
'url': update_url_query(authenticated_download_url, query),
})
self._sort_formats(formats)
creator = f.get('created_by') or {}
return {
'id': file_id,
'title': title,
'formats': formats,
'description': f.get('description') or None,
'uploader': creator.get('name'),
'timestamp': parse_iso8601(f.get('created_at')),
'uploader_id': creator.get('id'),
}
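A hedged sketch of the two-step Box token exchange above: the page's requestToken is traded for a read token scoped to the shared file, which then authorizes the files API call; shared_name, file_id and request_token are placeholders for values scraped from the page, and requests stands in for the extractor's helpers.

import json
import requests

def box_file_metadata(shared_name, file_id, request_token):
    # Step 1: exchange the page's request token for a per-file read token.
    access_token = requests.post(
        'https://app.box.com/app-api/enduserapp/elements/tokens',
        data=json.dumps({'fileIDs': [file_id]}),
        headers={
            'Content-Type': 'application/json',
            'X-Request-Token': request_token,
            'X-Box-EndUser-API': 'sharedName=' + shared_name,
        },
    ).json()[file_id]['read']
    # Step 2: the BoxApi shared_link header scopes the API call to the share.
    return requests.get(
        'https://api.box.com/2.0/files/' + file_id,
        headers={
            'Authorization': 'Bearer ' + access_token,
            'BoxApi': 'shared_link=https://app.box.com/s/' + shared_name,
        },
    ).json()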


@@ -12,7 +12,7 @@ from ..utils import (
class BravoTVIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?(?P<req_id>bravotv|oxygen)\.com/(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.bravotv.com/top-chef/season-16/episode-15/videos/the-top-chef-season-16-winner-is',
'md5': 'e34684cfea2a96cd2ee1ef3a60909de9',
@@ -28,10 +28,13 @@ class BravoTVIE(AdobePassIE):
}, {
'url': 'http://www.bravotv.com/below-deck/season-3/ep-14-reunion-part-1',
'only_matching': True,
}, {
'url': 'https://www.oxygen.com/in-ice-cold-blood/season-2/episode-16/videos/handling-the-horwitz-house-after-the-murder-season-2',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
site, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
settings = self._parse_json(self._search_regex(
r'<script[^>]+data-drupal-selector="drupal-settings-json"[^>]*>({.+?})</script>', webpage, 'drupal settings'),
@@ -53,11 +56,14 @@ class BravoTVIE(AdobePassIE):
tp_path = release_pid = tve['release_pid']
if tve.get('entitlement') == 'auth':
adobe_pass = settings.get('tve_adobe_auth', {})
if site == 'bravotv':
site = 'bravo'
resource = self._get_mvpd_resource(
adobe_pass.get('adobePassResourceId', 'bravo'),
adobe_pass.get('adobePassResourceId') or site,
tve['title'], release_pid, tve.get('rating'))
query['auth'] = self._extract_mvpd_auth(
url, release_pid, adobe_pass.get('adobePassRequestorId', 'bravo'), resource)
url, release_pid,
adobe_pass.get('adobePassRequestorId') or site, resource)
else:
shared_playlist = settings['ls_playlist']
account_pid = shared_playlist['account_pid']


@@ -28,6 +28,7 @@ from ..utils import (
parse_iso8601,
smuggle_url,
str_or_none,
try_get,
unescapeHTML,
unsmuggle_url,
UnsupportedError,
@@ -129,7 +130,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'skip': 'Unsupported URL',
},
{
# playlist with 'playlistTab' (https://github.com/ytdl-org/haruhi-dl/issues/9965)
# playlist with 'playlistTab' (https://github.com/ytdl-org/youtube-dl/issues/9965)
'url': 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=AQ%7E%7E,AAABXlLMdok%7E,NJ4EoMlZ4rZdx9eU1rkMVd8EaYPBBUlg',
'info_dict': {
'id': '1522758701001',
@@ -153,10 +154,10 @@ class BrightcoveLegacyIE(InfoExtractor):
<object class="BrightcoveExperience">{params}</object>
"""
# Fix up some stupid HTML, see https://github.com/ytdl-org/haruhi-dl/issues/1553
# Fix up some stupid HTML, see https://github.com/ytdl-org/youtube-dl/issues/1553
object_str = re.sub(r'(<param(?:\s+[a-zA-Z0-9_]+="[^"]*")*)>',
lambda m: m.group(1) + '/>', object_str)
# Fix up some stupid XML, see https://github.com/ytdl-org/haruhi-dl/issues/1608
# Fix up some stupid XML, see https://github.com/ytdl-org/youtube-dl/issues/1608
object_str = object_str.replace('<--', '<!--')
# remove namespace to simplify extraction
object_str = re.sub(r'(<object[^>]*)(xmlns=".*?")', r'\1', object_str)
@@ -470,13 +471,18 @@ class BrightcoveNewIE(AdobePassIE):
def _parse_brightcove_metadata(self, json_data, video_id, headers={}):
title = json_data['name'].strip()
num_drm_sources = 0
formats = []
for source in json_data.get('sources', []):
sources = json_data.get('sources') or []
for source in sources:
container = source.get('container')
ext = mimetype2ext(source.get('type'))
src = source.get('src')
# https://support.brightcove.com/playback-api-video-fields-reference#key_systems_object
if ext == 'ism' or container == 'WVM' or source.get('key_systems'):
if container == 'WVM' or source.get('key_systems'):
num_drm_sources += 1
continue
elif ext == 'ism':
continue
elif ext == 'm3u8' or container == 'M2TS':
if not src:
@@ -533,20 +539,15 @@ class BrightcoveNewIE(AdobePassIE):
'format_id': build_format_id('rtmp'),
})
formats.append(f)
if not formats:
# for sonyliv.com DRM protected videos
s3_source_url = json_data.get('custom_fields', {}).get('s3sourceurl')
if s3_source_url:
formats.append({
'url': s3_source_url,
'format_id': 'source',
})
errors = json_data.get('errors')
if not formats and errors:
error = errors[0]
raise ExtractorError(
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
if not formats:
errors = json_data.get('errors')
if errors:
error = errors[0]
raise ExtractorError(
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
if sources and num_drm_sources == len(sources):
raise ExtractorError('This video is DRM protected.', expected=True)
self._sort_formats(formats)
@@ -600,24 +601,27 @@ class BrightcoveNewIE(AdobePassIE):
store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x)
def extract_policy_key():
webpage = self._download_webpage(
'http://players.brightcove.net/%s/%s_%s/index.min.js'
% (account_id, player_id, embed), video_id)
policy_key = None
catalog = self._search_regex(
r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog:
catalog = self._parse_json(
js_to_json(catalog), video_id, fatal=False)
if catalog:
policy_key = catalog.get('policyKey')
base_url = 'http://players.brightcove.net/%s/%s_%s/' % (account_id, player_id, embed)
config = self._download_json(
base_url + 'config.json', video_id, fatal=False) or {}
policy_key = try_get(
config, lambda x: x['video_cloud']['policy_key'])
if not policy_key:
policy_key = self._search_regex(
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
webpage = self._download_webpage(
base_url + 'index.min.js', video_id)
catalog = self._search_regex(
r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog:
catalog = self._parse_json(
js_to_json(catalog), video_id, fatal=False)
if catalog:
policy_key = catalog.get('policyKey')
if not policy_key:
policy_key = self._search_regex(
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
store_pk(policy_key)
return policy_key
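A minimal sketch of the reordered policy-key lookup above: config.json is now consulted first, with the minified player JS only as a fallback; requests and re stand in for the extractor's helpers, and the catalog({...}) intermediate step is skipped here for brevity.

import re
import requests

def find_policy_key(account_id, player_id, embed='default'):
    base_url = 'http://players.brightcove.net/%s/%s_%s/' % (
        account_id, player_id, embed)
    config = requests.get(base_url + 'config.json').json()
    policy_key = (config.get('video_cloud') or {}).get('policy_key')
    if not policy_key:
        # Fallback: scrape policyKey from the player JS (the extractor also
        # tries a catalog({...}) JSON blob before this regex).
        player_js = requests.get(base_url + 'index.min.js').text
        m = re.search(r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1', player_js)
        policy_key = m.group('pk') if m else None
    return policy_key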


@@ -8,18 +8,20 @@ from .gigya import GigyaBaseIE
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
strip_or_none,
clean_html,
extract_attributes,
float_or_none,
get_element_by_class,
int_or_none,
merge_dicts,
parse_iso8601,
str_or_none,
strip_or_none,
url_or_none,
)
class CanvasIE(InfoExtractor):
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza|dako)/assets/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'md5': '68993eda72ef62386a15ea2cf3c93107',
@@ -37,6 +39,7 @@ class CanvasIE(InfoExtractor):
'url': 'https://mediazone.vrt.be/api/v1/canvas/assets/mz-ast-5e5f90b6-2d72-4c40-82c2-e134f884e93e',
'only_matching': True,
}]
_GEO_BYPASS = False
_HLS_ENTRY_PROTOCOLS_MAP = {
'HLS': 'm3u8_native',
'HLS_AES': 'm3u8',
@@ -47,29 +50,34 @@ class CanvasIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
site_id, video_id = mobj.group('site_id'), mobj.group('id')
# Old API endpoint, serves more formats but may fail for some videos
data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), video_id, 'Downloading asset JSON',
'Unable to download asset JSON', fatal=False)
data = None
if site_id != 'vrtvideo':
# Old API endpoint, serves more formats but may fail for some videos
data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s'
% (site_id, video_id), video_id, 'Downloading asset JSON',
'Unable to download asset JSON', fatal=False)
# New API endpoint
if not data:
headers = self.geo_verification_headers()
headers.update({'Content-Type': 'application/json'})
token = self._download_json(
'%s/tokens' % self._REST_API_BASE, video_id,
'Downloading token', data=b'',
headers={'Content-Type': 'application/json'})['vrtPlayerToken']
'Downloading token', data=b'', headers=headers)['vrtPlayerToken']
data = self._download_json(
'%s/videos/%s' % (self._REST_API_BASE, video_id),
video_id, 'Downloading video JSON', fatal=False, query={
video_id, 'Downloading video JSON', query={
'vrtPlayerToken': token,
'client': '%s@PROD' % site_id,
}, expected_status=400)
message = data.get('message')
if message and not data.get('title'):
if data.get('code') == 'AUTHENTICATION_REQUIRED':
self.raise_login_required(message)
raise ExtractorError(message, expected=True)
if not data.get('title'):
code = data.get('code')
if code == 'AUTHENTICATION_REQUIRED':
self.raise_login_required()
elif code == 'INVALID_LOCATION':
self.raise_geo_restricted(countries=['BE'])
raise ExtractorError(data.get('message') or code, expected=True)
title = data['title']
description = data.get('description')
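
The new-endpoint flow above boils down to two HTTP calls: an empty POST that yields a vrtPlayerToken, then the video JSON with that token attached. A minimal sketch with requests (the REST base value is an assumption taken from the surrounding extractor, not shown in this hunk):

import requests

REST_API_BASE = 'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/v1'  # assumed

def fetch_video_json(site_id, video_id):
    token = requests.post(
        '%s/tokens' % REST_API_BASE, data=b'',
        headers={'Content-Type': 'application/json'}).json()['vrtPlayerToken']
    return requests.get(
        '%s/videos/%s' % (REST_API_BASE, video_id),
        params={'vrtPlayerToken': token, 'client': '%s@PROD' % site_id}).json()
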
@ -205,20 +213,24 @@ class CanvasEenIE(InfoExtractor):
class VrtNUIE(GigyaBaseIE):
IE_DESC = 'VrtNU.be'
_VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?vrt\.be/vrtnu/a-z/(?:[^/]+/){2}(?P<id>[^/?#&]+)'
_TESTS = [{
# Available via old API endpoint
'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/',
'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1989/postbus-x-s1989a1/',
'info_dict': {
'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
'id': 'pbs-pub-e8713dac-899e-41de-9313-81269f4c04ac$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
'ext': 'mp4',
'title': 'De zwarte weduwe',
'description': 'md5:db1227b0f318c849ba5eab1fef895ee4',
'title': 'Postbus X - Aflevering 1 (Seizoen 1989)',
'description': 'md5:b704f669eb9262da4c55b33d7c6ed4b7',
'duration': 1457.04,
'thumbnail': r're:^https?://.*\.jpg$',
'season': 'Season 1',
'season_number': 1,
'series': 'Postbus X',
'season': 'Seizoen 1989',
'season_number': 1989,
'episode': 'De zwarte weduwe',
'episode_number': 1,
'timestamp': 1595822400,
'upload_date': '20200727',
},
'skip': 'This video is only available for registered users',
'params': {
@ -300,69 +312,73 @@ class VrtNUIE(GigyaBaseIE):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(url, display_id)
webpage = self._download_webpage(url, display_id)
attrs = extract_attributes(self._search_regex(
r'(<nui-media[^>]+>)', webpage, 'media element'))
video_id = attrs['videoid']
publication_id = attrs.get('publicationid')
if publication_id:
video_id = publication_id + '$' + video_id
page = (self._parse_json(self._search_regex(
r'digitalData\s*=\s*({.+?});', webpage, 'digital data',
default='{}'), video_id, fatal=False) or {}).get('page') or {}
info = self._search_json_ld(webpage, display_id, default={})
# title is optional here since it may be extracted by the extractor
# that this one delegates to
title = strip_or_none(self._html_search_regex(
r'(?ms)<h1 class="content__heading">(.+?)</h1>',
webpage, 'title', default=None))
description = self._html_search_regex(
r'(?ms)<div class="content__description">(.+?)</div>',
webpage, 'description', default=None)
season = self._html_search_regex(
[r'''(?xms)<div\ class="tabs__tab\ tabs__tab--active">\s*
<span>seizoen\ (.+?)</span>\s*
</div>''',
r'<option value="seizoen (\d{1,3})" data-href="[^"]+?" selected>'],
webpage, 'season', default=None)
season_number = int_or_none(season)
episode_number = int_or_none(self._html_search_regex(
r'''(?xms)<div\ class="content__episode">\s*
<abbr\ title="aflevering">afl</abbr>\s*<span>(\d+)</span>
</div>''',
webpage, 'episode_number', default=None))
release_date = parse_iso8601(self._html_search_regex(
r'(?ms)<div class="content__broadcastdate">\s*<time\ datetime="(.+?)"',
webpage, 'release_date', default=None))
# If there's a ? or a # in the URL, remove them and everything after
clean_url = urlh.geturl().split('?')[0].split('#')[0].strip('/')
securevideo_url = clean_url + '.mssecurevideo.json'
try:
video = self._download_json(securevideo_url, display_id)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
self.raise_login_required()
raise
# We are dealing with a '../<show>.relevant' URL
redirect_url = video.get('url')
if redirect_url:
return self.url_result(self._proto_relative_url(redirect_url, 'https:'))
# There is only one entry, but with an unknown key, so just get
# the first one
video_id = list(video.values())[0].get('videoid')
return merge_dicts(info, {
'_type': 'url_transparent',
'url': 'https://mediazone.vrt.be/api/v1/vrtvideo/assets/%s' % video_id,
'ie_key': CanvasIE.ie_key(),
'id': video_id,
'display_id': display_id,
'season_number': int_or_none(page.get('episode_season')),
})
class DagelijkseKostIE(InfoExtractor):
IE_DESC = 'dagelijksekost.een.be'
_VALID_URL = r'https?://dagelijksekost\.een\.be/gerechten/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://dagelijksekost.een.be/gerechten/hachis-parmentier-met-witloof',
'md5': '30bfffc323009a3e5f689bef6efa2365',
'info_dict': {
'id': 'md-ast-27a4d1ff-7d7b-425e-b84f-a4d227f592fa',
'display_id': 'hachis-parmentier-met-witloof',
'ext': 'mp4',
'title': 'Hachis parmentier met witloof',
'description': 'md5:9960478392d87f63567b5b117688cdc5',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 283.02,
},
'expected_warnings': ['is not a supported codec'],
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = strip_or_none(get_element_by_class(
'dish-metadata__title', webpage
) or self._html_search_meta(
'twitter:title', webpage))
description = clean_html(get_element_by_class(
'dish-description', webpage)
) or self._html_search_meta(
('description', 'twitter:description', 'og:description'),
webpage)
video_id = self._html_search_regex(
r'data-url=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id',
group='id')
return {
'_type': 'url_transparent',
'url': 'https://mediazone.vrt.be/api/v1/dako/assets/%s' % video_id,
'ie_key': CanvasIE.ie_key(),
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'season': season,
'season_number': season_number,
'episode_number': episode_number,
'release_date': release_date,
})
}


@ -0,0 +1,91 @@
# coding: utf-8
from .common import InfoExtractor
from ..utils import (
parse_duration,
)
import re
class CastosHostedIE(InfoExtractor):
_VALID_URL = r'https?://[^/.]+\.castos\.com/(?:player|episodes)/(?P<id>[\da-zA-Z-]+)'
IE_NAME = 'castos:hosted'
_TESTS = [{
'url': 'https://audience.castos.com/player/408278',
'info_dict': {
'id': '408278',
'ext': 'mp3',
},
}, {
'url': 'https://audience.castos.com/episodes/improve-your-podcast-production',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage, **kw):
return [mobj.group(1) for mobj
in re.finditer(
r'<iframe\b[^>]+(?<!-)src="(https?://[^/.]+\.castos\.com/player/\d+)',
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
series = self._html_search_regex(
r'<div class="show">\s+<strong>([^<]+)</strong>', webpage, 'series name')
title = self._html_search_regex(
r'<div class="episode-title">([^<]+)</div>', webpage, 'episode title')
audio_url = self._html_search_regex(
r'<audio class="clip">\s+<source\b[^>]+src="(https?://[^"]+)"', webpage, 'audio url')
duration = parse_duration(self._search_regex(
r'<time id="duration">(\d\d(?::\d\d)+)</time>', webpage, 'duration'))
return {
'id': video_id,
'title': title,
'url': audio_url,
'duration': duration,
'series': series,
'episode': title,
}
class CastosSSPIE(InfoExtractor):
@classmethod
def _extract_entries(self, webpage, **kw):
entries = []
for found in re.finditer(
r'(?s)<div class="castos-player[^"]*"[^>]*data-episode="(\d+)-[a-z\d]+">(.+?</nav>)\s*</div>',
webpage):
video_id, entry = found.group(1, 2)
def search_entry(regex):
res = re.search(regex, entry)
if res:
return res.group(1)
series = search_entry(r'<div class="show">\s+<strong>([^<]+)</strong>')
title = search_entry(r'<div class="episode-title">([^<]+)</div>')
audio_url = search_entry(
r'<audio class="clip[^"]*">\s+<source\b[^>]+src="(https?://[^"]+)"')
duration = parse_duration(
search_entry(r'<time id="duration[^"]*">(\d\d(?::\d\d)+)</time>'))
if not title or not audio_url:
continue
entries.append({
'id': video_id,
'title': title,
'url': audio_url,
'duration': duration,
'series': series,
'episode': title,
})
return entries
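
Both classes are designed to be driven from other extractors: _extract_urls returns embeddable player URLs suitable for url_result, while _extract_entries yields ready info dicts. A hypothetical caller might look like this (names and structure assumed, not taken from the repository):

# hypothetical usage from another extractor's _real_extract
def _real_extract(self, url):
    page_id = self._match_id(url)
    webpage = self._download_webpage(url, page_id)
    entries = CastosSSPIE._extract_entries(webpage)
    if entries:
        return self.playlist_result(entries, playlist_id=page_id)
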


@ -27,7 +27,7 @@ class CBSBaseIE(ThePlatformFeedIE):
class CBSIE(CBSBaseIE):
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:(?:cbs|paramountplus)\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
@ -52,6 +52,9 @@ class CBSIE(CBSBaseIE):
}, {
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True,
}, {
'url': 'https://www.paramountplus.com/shows/all-rise/video/QmR1WhNkh1a_IrdHZrbcRklm176X_rVc/all-rise-space/',
'only_matching': True,
}]
def _extract_video_info(self, content_id, site='cbs', mpx_acc=2198311517):


@ -11,7 +11,47 @@ from ..utils import (
class CBSLocalIE(AnvatoIE):
_VALID_URL = r'https?://[a-z]+\.cbslocal\.com/(?:\d+/\d+/\d+|video)/(?P<id>[0-9a-z-]+)'
_VALID_URL_BASE = r'https?://[a-z]+\.cbslocal\.com/'
_VALID_URL = _VALID_URL_BASE + r'video/(?P<id>\d+)'
_TESTS = [{
'url': 'http://newyork.cbslocal.com/video/3580809-a-very-blue-anniversary/',
'info_dict': {
'id': '3580809',
'ext': 'mp4',
'title': 'A Very Blue Anniversary',
'description': 'CBS2s Cindy Hsu has more.',
'thumbnail': 're:^https?://.*',
'timestamp': int,
'upload_date': r're:^\d{8}$',
'uploader': 'CBS',
'subtitles': {
'en': 'mincount:5',
},
'categories': [
'Stations\\Spoken Word\\WCBSTV',
'Syndication\\AOL',
'Syndication\\MSN',
'Syndication\\NDN',
'Syndication\\Yahoo',
'Content\\News',
'Content\\News\\Local News',
],
'tags': ['CBS 2 News Weekends', 'Cindy Hsu', 'Blue Man Group'],
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
mcp_id = self._match_id(url)
return self.url_result(
'anvato:anvato_cbslocal_app_web_prod_547f3e49241ef0e5d30c79b2efbca5d92c698f67:' + mcp_id, 'Anvato', mcp_id)
class CBSLocalArticleIE(AnvatoIE):
_VALID_URL = CBSLocalIE._VALID_URL_BASE + r'\d+/\d+/\d+/(?P<id>[0-9a-z-]+)'
_TESTS = [{
# Anvato backend
@ -52,31 +92,6 @@ class CBSLocalIE(AnvatoIE):
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://newyork.cbslocal.com/video/3580809-a-very-blue-anniversary/',
'info_dict': {
'id': '3580809',
'ext': 'mp4',
'title': 'A Very Blue Anniversary',
'description': 'CBS2s Cindy Hsu has more.',
'thumbnail': 're:^https?://.*',
'timestamp': int,
'upload_date': r're:^\d{8}$',
'uploader': 'CBS',
'subtitles': {
'en': 'mincount:5',
},
'categories': [
'Stations\\Spoken Word\\WCBSTV',
'Syndication\\AOL',
'Syndication\\MSN',
'Syndication\\NDN',
'Syndication\\Yahoo',
'Content\\News',
'Content\\News\\Local News',
],
'tags': ['CBS 2 News Weekends', 'Cindy Hsu', 'Blue Man Group'],
},
}]
def _real_extract(self, url):


@ -26,7 +26,7 @@ class CBSNewsEmbedIE(CBSIE):
def _real_extract(self, url):
item = self._parse_json(zlib.decompress(compat_b64decode(
compat_urllib_parse_unquote(self._match_id(url))),
-zlib.MAX_WBITS), None)['video']['items'][0]
-zlib.MAX_WBITS).decode('utf-8'), None)['video']['items'][0]
return self._extract_video_info(item['mpxRefId'], 'cbsnews')
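
The embed id is a URL-quoted, base64-encoded raw DEFLATE stream (hence the negative wbits); the added .decode('utf-8') is what makes the JSON parse work on Python 3, where zlib.decompress() returns bytes. A self-contained round trip with made-up data:

import base64
import json
import zlib
from urllib.parse import quote, unquote

payload = json.dumps({'video': {'items': [{'mpxRefId': 'example'}]}}).encode('utf-8')
co = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)  # raw DEFLATE, no zlib header
embed_id = quote(base64.b64encode(co.compress(payload) + co.flush()))

item = json.loads(zlib.decompress(
    base64.b64decode(unquote(embed_id)),
    -zlib.MAX_WBITS).decode('utf-8'))['video']['items'][0]
assert item['mpxRefId'] == 'example'
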


@ -1,38 +1,113 @@
from __future__ import unicode_literals
from .cbs import CBSBaseIE
import re
# from .cbs import CBSBaseIE
from .common import InfoExtractor
from ..utils import (
int_or_none,
try_get,
)
class CBSSportsIE(CBSBaseIE):
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/(?:video|news)/(?P<id>[^/?#&]+)'
# class CBSSportsEmbedIE(CBSBaseIE):
class CBSSportsEmbedIE(InfoExtractor):
IE_NAME = 'cbssports:embed'
_VALID_URL = r'''(?ix)https?://(?:(?:www\.)?cbs|embed\.247)sports\.com/player/embed.+?
(?:
ids%3D(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})|
pcid%3D(?P<pcid>\d+)
)'''
_TESTS = [{
'url': 'https://www.cbssports.com/nba/video/donovan-mitchell-flashes-star-potential-in-game-2-victory-over-thunder/',
'info_dict': {
'id': '1214315075735',
'ext': 'mp4',
'title': 'Donovan Mitchell flashes star potential in Game 2 victory over Thunder',
'description': 'md5:df6f48622612c2d6bd2e295ddef58def',
'timestamp': 1524111457,
'upload_date': '20180419',
'uploader': 'CBSI-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
}
'url': 'https://www.cbssports.com/player/embed/?args=player_id%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26ids%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26resizable%3D1%26autoplay%3Dtrue%26domain%3Dcbssports.com%26comp_ads_enabled%3Dfalse%26watchAndRead%3D0%26startTime%3D0%26env%3Dprod',
'only_matching': True,
}, {
'url': 'https://www.cbssports.com/nba/news/nba-playoffs-2018-watch-76ers-vs-heat-game-3-series-schedule-tv-channel-online-stream/',
'url': 'https://embed.247sports.com/player/embed/?args=%3fplayer_id%3d1827823171591%26channel%3dcollege-football-recruiting%26pcid%3d1827823171591%26width%3d640%26height%3d360%26autoplay%3dTrue%26comp_ads_enabled%3dFalse%26uvpc%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_v4%2526partner%253d247%26uvpc_m%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_m_v4%2526partner_m%253d247_mobile%26utag%3d247sportssite%26resizable%3dTrue',
'only_matching': True,
}]
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
# def _extract_video_info(self, filter_query, video_id):
# return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
def _real_extract(self, url):
uuid, pcid = re.match(self._VALID_URL, url).groups()
query = {'id': uuid} if uuid else {'pcid': pcid}
video = self._download_json(
'https://www.cbssports.com/api/content/video/',
uuid or pcid, query=query)[0]
video_id = video['id']
title = video['title']
metadata = video.get('metaData') or {}
# return self._extract_video_info('byId=%d' % metadata['mpxOutletId'], video_id)
# return self._extract_video_info('byGuid=' + metadata['mpxRefId'], video_id)
formats = self._extract_m3u8_formats(
metadata['files'][0]['url'], video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
self._sort_formats(formats)
image = video.get('image')
thumbnails = None
if image:
image_path = image.get('path')
if image_path:
thumbnails = [{
'url': image_path,
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
'filesize': int_or_none(image.get('size')),
}]
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': video.get('description'),
'timestamp': int_or_none(try_get(video, lambda x: x['dateCreated']['epoch'])),
'duration': int_or_none(metadata.get('duration')),
}
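
The rewrite drops ThePlatform (the commented-out calls) in favour of a single JSON request against the content API, keyed by either the embed uuid or the pcid, with HLS coming from metaData.files. Reduced to its essence as a sketch, not the extractor itself:

import requests

def cbssports_video(uuid=None, pcid=None):
    query = {'id': uuid} if uuid else {'pcid': pcid}
    video = requests.get(
        'https://www.cbssports.com/api/content/video/', params=query).json()[0]
    return {
        'id': video['id'],
        'title': video['title'],
        'm3u8_url': (video.get('metaData') or {}).get('files', [{}])[0].get('url'),
    }
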
class CBSSportsBaseIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
[r'(?:=|%26)pcid%3D(\d+)', r'embedVideo(?:Container)?_(\d+)'],
webpage, 'video id')
return self._extract_video_info('byId=%s' % video_id, video_id)
iframe_url = self._search_regex(
r'<iframe[^>]+(?:data-)?src="(https?://[^/]+/player/embed[^"]+)"',
webpage, 'embed url')
return self.url_result(iframe_url, CBSSportsEmbedIE.ie_key())
class CBSSportsIE(CBSSportsBaseIE):
IE_NAME = 'cbssports'
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/video/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.cbssports.com/college-football/video/cover-3-stanford-spring-gleaning/',
'info_dict': {
'id': 'b56c03a6-231a-4bbe-9c55-af3c8a8e9636',
'ext': 'mp4',
'title': 'Cover 3: Stanford Spring Gleaning',
'description': 'The Cover 3 crew break down everything you need to know about the Stanford Cardinal this spring.',
'timestamp': 1617218398,
'upload_date': '20210331',
'duration': 502,
},
}]
class TwentyFourSevenSportsIE(CBSSportsBaseIE):
IE_NAME = '247sports'
_VALID_URL = r'https?://(?:www\.)?247sports\.com/Video/(?:[^/?#&]+-)?(?P<id>\d+)'
_TESTS = [{
'url': 'https://247sports.com/Video/2021-QB-Jake-Garcia-senior-highlights-through-five-games-10084854/',
'info_dict': {
'id': '4f1265cb-c3b5-44a8-bb1d-1914119a0ccc',
'ext': 'mp4',
'title': '2021 QB Jake Garcia senior highlights through five games',
'description': 'md5:8cb67ebed48e2e6adac1701e0ff6e45b',
'timestamp': 1607114223,
'upload_date': '20201204',
'duration': 208,
},
}]


@ -1,15 +1,18 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import datetime
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
extract_timezone,
int_or_none,
parse_duration,
parse_iso8601,
parse_resolution,
try_get,
url_or_none,
)
@ -24,8 +27,9 @@ class CCMAIE(InfoExtractor):
'ext': 'mp4',
'title': 'L\'espot de La Marató de TV3',
'description': 'md5:f12987f320e2f6e988e9908e4fe97765',
'timestamp': 1470918540,
'upload_date': '20160811',
'timestamp': 1478608140,
'upload_date': '20161108',
'age_limit': 0,
}
}, {
'url': 'http://www.ccma.cat/catradio/alacarta/programa/el-consell-de-savis-analitza-el-derbi/audio/943685/',
@ -35,8 +39,24 @@ class CCMAIE(InfoExtractor):
'ext': 'mp3',
'title': 'El Consell de Savis analitza el derbi',
'description': 'md5:e2a3648145f3241cb9c6b4b624033e53',
'upload_date': '20171205',
'timestamp': 1512507300,
'upload_date': '20170512',
'timestamp': 1494622500,
'vcodec': 'none',
'categories': ['Esports'],
}
}, {
'url': 'http://www.ccma.cat/tv3/alacarta/crims/crims-josep-tallada-lespereu-me-capitol-1/video/6031387/',
'md5': 'b43c3d3486f430f3032b5b160d80cbc3',
'info_dict': {
'id': '6031387',
'ext': 'mp4',
'title': 'Crims - Josep Talleda, l\'"Espereu-me" (capítol 1)',
'description': 'md5:7cbdafb640da9d0d2c0f62bad1e74e60',
'timestamp': 1582577700,
'upload_date': '20200224',
'subtitles': 'mincount:4',
'age_limit': 16,
'series': 'Crims',
}
}]
@ -72,17 +92,28 @@ class CCMAIE(InfoExtractor):
informacio = media['informacio']
title = informacio['titol']
durada = informacio.get('durada', {})
durada = informacio.get('durada') or {}
duration = int_or_none(durada.get('milisegons'), 1000) or parse_duration(durada.get('text'))
timestamp = parse_iso8601(informacio.get('data_emissio', {}).get('utc'))
tematica = try_get(informacio, lambda x: x['tematica']['text'])
timestamp = None
data_utc = try_get(informacio, lambda x: x['data_emissio']['utc'])
try:
timezone, data_utc = extract_timezone(data_utc)
timestamp = calendar.timegm((datetime.datetime.strptime(
data_utc, '%Y-%d-%mT%H:%M:%S') - timezone).timetuple())
except TypeError:
pass
subtitles = {}
subtitols = media.get('subtitols', {})
if subtitols:
sub_url = subtitols.get('url')
subtitols = media.get('subtitols') or []
if isinstance(subtitols, dict):
subtitols = [subtitols]
for st in subtitols:
sub_url = st.get('url')
if sub_url:
subtitles.setdefault(
subtitols.get('iso') or subtitols.get('text') or 'ca', []).append({
st.get('iso') or st.get('text') or 'ca', []).append({
'url': sub_url,
})
@ -97,6 +128,16 @@ class CCMAIE(InfoExtractor):
'height': int_or_none(imatges.get('alcada')),
}]
age_limit = None
codi_etic = try_get(informacio, lambda x: x['codi_etic']['id'])
if codi_etic:
codi_etic_s = codi_etic.split('_')
if len(codi_etic_s) == 2:
if codi_etic_s[1] == 'TP':
age_limit = 0
else:
age_limit = int_or_none(codi_etic_s[1])
return {
'id': media_id,
'title': title,
@ -106,4 +147,9 @@ class CCMAIE(InfoExtractor):
'thumbnails': thumbnails,
'subtitles': subtitles,
'formats': formats,
'age_limit': age_limit,
'alt_title': informacio.get('titol_complet'),
'episode_number': int_or_none(informacio.get('capitol')),
'categories': [tematica] if tematica else None,
'series': informacio.get('programa'),
}
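
The age limit above is parsed from an ethical-code id of the form <prefix>_<rating>, where a TP rating ('tots els públics') means no restriction. Condensed into a standalone helper (example id values assumed from the hunk):

def age_limit_from_codi_etic(codi_etic):
    # e.g. 'NC_TP' -> 0, 'NC_16' -> 16 (example ids)
    parts = (codi_etic or '').split('_')
    if len(parts) != 2:
        return None
    if parts[1] == 'TP':  # 'tots els públics': suitable for everyone
        return 0
    try:
        return int(parts[1])
    except ValueError:
        return None
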


@ -1,24 +1,77 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import base64
import datetime
import hashlib
import hmac
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_unquote_plus,
ExtractorError,
float_or_none,
multipart_encode,
parse_duration,
random_birthday,
rot47,
urljoin,
int_or_none,
try_get,
)
class CDAIE(InfoExtractor):
class CDABaseExtractor(InfoExtractor):
_BASE_URL = 'https://api.cda.pl'
_BASE_HEADERS = {
'Accept': 'application/vnd.cda.public+json',
'User-Agent': 'pl.cda 1.0 (version 1.2.88 build 15306; Android 9; Xiaomi Redmi 3S)',
# gets replaced with Bearer token after the login request
# apparently the token is hardcoded in the app
'Authorization': 'Basic YzU3YzBlZDUtYTIzOC00MWQwLWI2NjQtNmZmMWMxY2Y2YzVlOklBTm95QlhRRVR6U09MV1hnV3MwMW0xT2VyNWJNZzV4clRNTXhpNGZJUGVGZ0lWUlo5UGVYTDhtUGZaR1U1U3Q',
}
_NETRC_MACHINE = 'cda'
_bearer = None
# logs into cda.pl and returns _BASE_HEADERS with the Bearer token
def _get_headers(self, video_id):
headers = dict(self._BASE_HEADERS)  # copy, so the class-level dict is never mutated
if self._bearer and self._bearer['valid_until'] > datetime.datetime.now().timestamp() + 5:
headers.update({
'Authorization': 'Bearer %s' % self._bearer['token'],
})
return headers
username, password = self._get_login_info()
if username is None or password is None:
username = 'niezesrajciesiecda'
password_hash = 'VD3QbYWSb_uwAShBZKN7F1DwEg_tRTdb4Xd3JvFsx6Y'
account_type = 'shared'
else:
# hex digest of the password's MD5; equivalent to the previous
# per-byte loop, but shorter and portable
pwd_md5 = hashlib.md5(password.encode('utf-8')).hexdigest()
digest = hmac.new(
's01m1Oer5IANoyBXQETzSOLWXgWs01m1Oer5bMg5xrTMMxRZ9Pi4fIPeFgIVRZ9PeXL8mPfXQETZGUAN5StRZ9P'.encode('utf-8'),
pwd_md5.encode('utf-8'), hashlib.sha256).digest()
password_hash = base64.urlsafe_b64encode(digest).decode('utf-8').replace('=', '')
account_type = 'user'
token_res = self._download_json('%s/oauth/token?grant_type=password&login=%s&password=%s' % (self._BASE_URL, username, password_hash),
video_id, 'Logging into cda.pl with a %s account' % account_type, headers=headers, data=bytes(''.encode('utf-8')))
self._bearer = {
'token': token_res['access_token'],
'valid_until': token_res['expires_in'] + datetime.datetime.now().timestamp(),
}
headers.update({
'Authorization': 'Bearer %s' % token_res['access_token'],
})
return headers
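
The password handling above condenses to: MD5 hex digest, HMAC-SHA256 with the key hardcoded in the app, then unpadded urlsafe base64. As a standalone function:

import base64
import hashlib
import hmac

APP_SECRET = 's01m1Oer5IANoyBXQETzSOLWXgWs01m1Oer5bMg5xrTMMxRZ9Pi4fIPeFgIVRZ9PeXL8mPfXQETZGUAN5StRZ9P'

def cda_password_hash(password):
    pwd_md5 = hashlib.md5(password.encode('utf-8')).hexdigest()
    digest = hmac.new(APP_SECRET.encode('utf-8'),
                      pwd_md5.encode('utf-8'), hashlib.sha256).digest()
    # unpadded urlsafe base64, as the API expects
    return base64.urlsafe_b64encode(digest).decode('utf-8').rstrip('=')
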
class CDAIE(CDABaseExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?cda\.pl/video|ebd\.cda\.pl/[0-9]+x[0-9]+)/(?P<id>[0-9a-z]+)'
_BASE_URL = 'http://www.cda.pl/'
_TESTS = [{
'url': 'http://www.cda.pl/video/5749950c',
'md5': '6f844bf51b15f31fae165365707ae970',
@ -28,7 +81,7 @@ class CDAIE(InfoExtractor):
'height': 720,
'title': 'Oto dlaczego przed zakrętem należy zwolnić.',
'description': 'md5:269ccd135d550da90d1662651fcb9772',
'thumbnail': r're:^https?://.*\.jpg$',
'thumbnail': r're:^https?://.*\.jpg(?:\?t=\d+)?$',
'average_rating': float,
'duration': 39,
'age_limit': 0,
@ -41,7 +94,7 @@ class CDAIE(InfoExtractor):
'ext': 'mp4',
'title': 'Lądowanie na lotnisku na Maderze',
'description': 'md5:60d76b71186dcce4e0ba6d4bbdb13e1a',
'thumbnail': r're:^https?://.*\.jpg$',
'thumbnail': r're:^https?://.*\.jpg(?:\?t=\d+)?$',
'uploader': 'crash404',
'average_rating': float,
'duration': 137,
@ -55,7 +108,7 @@ class CDAIE(InfoExtractor):
'ext': 'mp4',
'title': 'Cycki',
'description': 'cycki cycuszki fajne ciekawe azja ajzatka',
'thumbnail': r're:^https?://.*\.jpg$',
'thumbnail': r're:^https?://.*\.jpg(?:\?t=\d+)?$',
'duration': 6,
'age_limit': 18,
'average_rating': float,
@ -65,111 +118,47 @@ class CDAIE(InfoExtractor):
'only_matching': True,
}]
def _download_age_confirm_page(self, url, video_id, *args, **kwargs):
form_data = random_birthday('rok', 'miesiac', 'dzien')
form_data.update({'return': url, 'module': 'video', 'module_id': video_id})
data, content_type = multipart_encode(form_data)
return self._download_webpage(
urljoin(url, '/a/validatebirth'), video_id, *args,
data=data, headers={
'Referer': url,
'Content-Type': content_type,
}, **kwargs)
def _real_extract(self, url):
video_id = self._match_id(url)
self._set_cookie('cda.pl', 'cda.player', 'html5')
webpage = self._download_webpage(
self._BASE_URL + '/video/' + video_id, video_id)
if 'Ten film jest dostępny dla użytkowników premium' in webpage:
raise ExtractorError('This video is only available for premium users.', expected=True)
headers = self._get_headers(video_id)
need_confirm_age = False
if self._html_search_regex(r'(<form[^>]+action="[^>]*/a/validatebirth")',
webpage, 'birthday validate form', default=None):
webpage = self._download_age_confirm_page(
url, video_id, note='Confirming age')
need_confirm_age = True
metadata = self._download_json(
self._BASE_URL + '/video/' + video_id, video_id, headers=headers)['video']
uploader = try_get(metadata, lambda x: x['author']['login'])
# anonymous uploader
if uploader == 'anonim':
uploader = None
formats = []
uploader = self._search_regex(r'"author":\s*{[^}]*"name":\s*"([^"]+)"',
webpage, 'uploader', default=None)
average_rating = self._search_regex(
r'<span class="rating">\s*([\d.]+)',
webpage, 'rating', fatal=False)
info_dict = {
'id': video_id,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'uploader': uploader,
'average_rating': float_or_none(average_rating),
'thumbnail': self._og_search_thumbnail(webpage),
'formats': formats,
'duration': None,
'age_limit': 18 if need_confirm_age else 0,
}
def extract_format(page, version):
json_str = self._html_search_regex(
r'player_data=(\\?["\'])(?P<player_data>.+?)\1', page,
'%s player_json' % version, fatal=False, group='player_data')
if not json_str:
return
player_data = self._parse_json(
json_str, '%s player_data' % version, fatal=False)
if not player_data:
return
video = player_data.get('video')
if not video or 'file' not in video:
self.report_warning('Unable to extract %s version information' % version)
return
video['file'] = rot47(compat_urllib_parse_unquote_plus(video['file']))
if not video['file'].startswith('http'):
video['file'] = 'https://' + video['file']
video['file'] = video['file'].replace('.3cda.pl', '.cda.pl')
if video['file'].endswith('adc.mp4'):
video['file'] = video['file'].replace('adc.mp4', '.mp4')
if not video['file'].endswith('.mp4'):
video['file'] = video['file'][:-3] + '.mp4'
f = {
'url': video['file'],
}
m = re.search(
r'<a[^>]+data-quality="(?P<format_id>[^"]+)"[^>]+href="[^"]+"[^>]+class="[^"]*quality-btn-active[^"]*">(?P<height>[0-9]+)p',
page)
if m:
f.update({
'format_id': m.group('format_id'),
'height': int(m.group('height')),
})
info_dict['formats'].append(f)
if not info_dict['duration']:
info_dict['duration'] = parse_duration(video.get('duration'))
extract_format(webpage, 'default')
for href, resolution in re.findall(
r'<a[^>]+data-quality="[^"]+"[^>]+href="([^"]+)"[^>]+class="quality-btn"[^>]*>([0-9]+p)',
webpage):
if need_confirm_age:
handler = self._download_age_confirm_page
else:
handler = self._download_webpage
webpage = handler(
href if href.startswith('http') else self._BASE_URL + href, video_id,
'Downloading %s version information' % resolution, fatal=False)
if not webpage:
# Manually report warning because empty page is returned when
# invalid version is requested.
self.report_warning('Unable to download %s version information' % resolution)
for quality in metadata['qualities']:
if not quality['file']:
continue
formats.append({
'url': quality['file'],
'format': quality['title'],
'resolution': quality['name'],
'height': int_or_none(quality['name'][:-1]), # for the format sorting
'filesize': quality.get('length'),
})
extract_format(webpage, resolution)
if not formats:
if metadata.get('premium') is True and metadata.get('premium_free') is not True:
raise ExtractorError('This video is only available for premium users.', expected=True)
raise ExtractorError('No video qualities found', video_id=video_id)
self._sort_formats(formats)
return info_dict
return {
'id': video_id,
'title': metadata['title'],
'description': metadata.get('description'),
'uploader': uploader,
'average_rating': float_or_none(metadata.get('rating')),
'thumbnail': metadata.get('thumb'),
'formats': formats,
'duration': metadata.get('duration'),
'age_limit': 18 if metadata.get('for_adults') else 0,
'view_count': metadata.get('views'),
}
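
For reference, each entry of metadata['qualities'] maps straight onto a format dict; with invented values:

quality = {'name': '720p', 'title': 'HD', 'file': 'https://example.cda.pl/v.mp4', 'length': 12345678}
# yields:
# {'url': 'https://example.cda.pl/v.mp4', 'format': 'HD', 'resolution': '720p',
#  'height': 720, 'filesize': 12345678}
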


@ -157,7 +157,7 @@ class CeskaTelevizeIE(InfoExtractor):
stream_formats = self._extract_mpd_formats(
stream_url, playlist_id,
mpd_id='dash-%s' % format_id, fatal=False)
# See https://github.com/ytdl-org/haruhi-dl/issues/12119#issuecomment-280037031
# See https://github.com/ytdl-org/youtube-dl/issues/12119#issuecomment-280037031
if format_id == 'audioDescription':
for f in stream_formats:
f['source_preference'] = -10


@ -82,7 +82,7 @@ class Channel9IE(InfoExtractor):
_RSS_URL = 'http://channel9.msdn.com/%s/RSS'
@staticmethod
def _extract_urls(webpage):
def _extract_urls(webpage, **kwargs):
return re.findall(
r'<iframe[^>]+src=["\'](https?://channel9\.msdn\.com/(?:[^/]+/)+)player\b',
webpage)


@ -1,14 +1,17 @@
# coding: utf-8
from __future__ import unicode_literals
from .onet import OnetBaseIE
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
from .pulsembed import PulseVideoIE, PulsEmbedIE
class ClipRsIE(OnetBaseIE):
class ClipRsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+'
_TEST = {
'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732',
'md5': 'c412d57815ba07b56f9edc7b5d6a14e5',
_TESTS = [{
'url': 'https://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732',
'info_dict': {
'id': '1488842.1399140381',
'ext': 'mp4',
@ -18,16 +21,42 @@ class ClipRsIE(OnetBaseIE):
'timestamp': 1459850243,
'upload_date': '20160405',
}
}
}, {
'url': 'https://www.clip.rs/u-novom-sadu-se-sinoc-desio-jedan-zimski-neum-svi-su-zaboravili-na-koronu-uhvatili-se-u-kolo-i-nastao-je-hit-video/15686',
'info_dict': {
'id': '2210721.1689293351',
'ext': 'mp4',
'title': 'U Novom Sadu se sinoć desio jedan zimski Neum: Svi su zaboravili na koronu, uhvatili se u kolo i nastao je HIT VIDEO',
'description': 'md5:b1d7d6c0b029b922f06a2a08c9761852',
'timestamp': 1609405068,
'upload_date': '20201231',
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
info_dict = {}
mvp_id = self._search_mvp_id(webpage)
info_dict = self._extract_from_id(mvp_id, webpage)
info_dict['display_id'] = display_id
mvp_id = PulseVideoIE._search_mvp_id(webpage, default=None)
if mvp_id:
info_dict.update({
'url': 'pulsevideo:%s' % PulseVideoIE._search_mvp_id(webpage),
'ie_key': PulseVideoIE.ie_key(),
})
else:
entries = PulsEmbedIE._extract_entries(webpage)
if not entries:
raise ExtractorError('Video ID not found on webpage')
if len(entries) > 1:
raise ExtractorError('More than 1 PulsEmbed')
info_dict.update(entries[0])
info_dict.update({
'_type': 'url_transparent',
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'display_id': display_id,
})
return info_dict


@ -41,7 +41,7 @@ class CloudflareStreamIE(InfoExtractor):
}]
@staticmethod
def _extract_urls(webpage):
def _extract_urls(webpage, **kwargs):
return [
mobj.group('url')
for mobj in re.finditer(


@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import smuggle_url
@ -38,7 +39,7 @@ class CNBCIE(InfoExtractor):
class CNBCVideoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cnbc\.com/video/(?:[^/]+/)+(?P<id>[^./?#&]+)'
_VALID_URL = r'https?://(?:www\.)?cnbc\.com(?P<path>/video/(?:[^/]+/)+(?P<id>[^./?#&]+)\.html)'
_TEST = {
'url': 'https://www.cnbc.com/video/2018/07/19/trump-i-dont-necessarily-agree-with-raising-rates.html',
'info_dict': {
@ -56,11 +57,15 @@ class CNBCVideoIE(InfoExtractor):
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'content_id["\']\s*:\s*["\'](\d+)', webpage, display_id,
'video id')
path, display_id = re.match(self._VALID_URL, url).groups()
video_id = self._download_json(
'https://webql-redesign.cnbcfm.com/graphql', display_id, query={
'query': '''{
page(path: "%s") {
vcpsId
}
}''' % path,
})['data']['page']['vcpsId']
return self.url_result(
'http://video.cnbc.com/gallery/?video=%s' % video_id,
'http://video.cnbc.com/gallery/?video=%d' % video_id,
CNBCIE.ie_key())
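
The page path is now resolved to a numeric video id with a single GraphQL query instead of scraping the page. Roughly, as a sketch against the endpoint shown above:

import requests

def cnbc_vcps_id(path):
    query = '{ page(path: "%s") { vcpsId } }' % path
    data = requests.get(
        'https://webql-redesign.cnbcfm.com/graphql',
        params={'query': query}).json()
    return data['data']['page']['vcpsId']  # an int, hence the %d above
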


@ -96,7 +96,10 @@ class CNNIE(TurnerBaseIE):
config['data_src'] % path, page_title, {
'default': {
'media_src': config['media_src'],
}
},
'f4m': {
'host': 'cnn-vh.akamaihd.net',
},
})


@ -1,142 +1,51 @@
from __future__ import unicode_literals
from .mtv import MTVServicesInfoExtractor
from .common import InfoExtractor
class ComedyCentralIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?cc\.com/
(video-clips|episodes|cc-studios|video-collections|shows(?=/[^/]+/(?!full-episodes)))
/(?P<title>.*)'''
_VALID_URL = r'https?://(?:www\.)?cc\.com/(?:episodes|video(?:-clips)?)/(?P<id>[0-9a-z]{6})'
_FEED_URL = 'http://comedycentral.com/feeds/mrss/'
_TESTS = [{
'url': 'http://www.cc.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother',
'md5': 'c4f48e9eda1b16dd10add0744344b6d8',
'url': 'http://www.cc.com/video-clips/5ke9v2/the-daily-show-with-trevor-noah-doc-rivers-and-steve-ballmer---the-nba-player-strike',
'md5': 'b8acb347177c680ff18a292aa2166f80',
'info_dict': {
'id': 'cef0cbb3-e776-4bc9-b62e-8016deccb354',
'id': '89ccc86e-1b02-4f83-b0c9-1d9592ecd025',
'ext': 'mp4',
'title': 'CC:Stand-Up|August 18, 2013|1|0101|Uncensored - Too Good of a Mother',
'description': 'After a certain point, breastfeeding becomes c**kblocking.',
'timestamp': 1376798400,
'upload_date': '20130818',
'title': 'The Daily Show with Trevor Noah|August 28, 2020|25|25149|Doc Rivers and Steve Ballmer - The NBA Player Strike',
'description': 'md5:5334307c433892b85f4f5e5ac9ef7498',
'timestamp': 1598670000,
'upload_date': '20200829',
},
}, {
'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/interviews/6yx39d/exclusive-rand-paul-extended-interview',
'url': 'http://www.cc.com/episodes/pnzzci/drawn-together--american-idol--parody-clip-show-season-3-ep-314',
'only_matching': True,
}]
class ComedyCentralFullEpisodesIE(MTVServicesInfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?cc\.com/
(?:full-episodes|shows(?=/[^/]+/full-episodes))
/(?P<id>[^?]+)'''
_FEED_URL = 'http://comedycentral.com/feeds/mrss/'
_TESTS = [{
'url': 'http://www.cc.com/full-episodes/pv391a/the-daily-show-with-trevor-noah-november-28--2016---ryan-speedo-green-season-22-ep-22028',
'info_dict': {
'description': 'Donald Trump is accused of exploiting his president-elect status for personal gain, Cuban leader Fidel Castro dies, and Ryan Speedo Green discusses "Sing for Your Life."',
'title': 'November 28, 2016 - Ryan Speedo Green',
},
'playlist_count': 4,
}, {
'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
mgid = self._extract_triforce_mgid(webpage, data_zone='t2_lc_promo1')
videos_info = self._get_videos_info(mgid)
return videos_info
class ToshIE(MTVServicesInfoExtractor):
IE_DESC = 'Tosh.0'
_VALID_URL = r'^https?://tosh\.cc\.com/video-(?:clips|collections)/[^/]+/(?P<videotitle>[^/?#]+)'
_FEED_URL = 'http://tosh.cc.com/feeds/mrss'
_TESTS = [{
'url': 'http://tosh.cc.com/video-clips/68g93d/twitter-users-share-summer-plans',
'info_dict': {
'description': 'Tosh asked fans to share their summer plans.',
'title': 'Twitter Users Share Summer Plans',
},
'playlist': [{
'md5': 'f269e88114c1805bb6d7653fecea9e06',
'info_dict': {
'id': '90498ec2-ed00-11e0-aca6-0026b9414f30',
'ext': 'mp4',
'title': 'Tosh.0|June 9, 2077|2|211|Twitter Users Share Summer Plans',
'description': 'Tosh asked fans to share their summer plans.',
'thumbnail': r're:^https?://.*\.jpg',
# It's really reported to be published on year 2077
'upload_date': '20770610',
'timestamp': 3390510600,
'subtitles': {
'en': 'mincount:3',
},
},
}]
}, {
'url': 'http://tosh.cc.com/video-collections/x2iz7k/just-plain-foul/m5q4fp',
'url': 'https://www.cc.com/video/k3sdvm/the-daily-show-with-jon-stewart-exclusive-the-fourth-estate',
'only_matching': True,
}]
class ComedyCentralTVIE(MTVServicesInfoExtractor):
_VALID_URL = r'https?://(?:www\.)?comedycentral\.tv/(?:staffeln|shows)/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?comedycentral\.tv/folgen/(?P<id>[0-9a-z]{6})'
_TESTS = [{
'url': 'http://www.comedycentral.tv/staffeln/7436-the-mindy-project-staffel-4',
'url': 'https://www.comedycentral.tv/folgen/pxdpec/josh-investigates-klimawandel-staffel-1-ep-1',
'info_dict': {
'id': 'local_playlist-f99b626bdfe13568579a',
'ext': 'flv',
'title': 'Episode_the-mindy-project_shows_season-4_episode-3_full-episode_part1',
'id': '15907dc3-ec3c-11e8-a442-0e40cf2fc285',
'ext': 'mp4',
'title': 'Josh Investigates',
'description': 'Steht uns das Ende der Welt bevor?',
},
'params': {
# rtmp download
'skip_download': True,
},
}, {
'url': 'http://www.comedycentral.tv/shows/1074-workaholics',
'only_matching': True,
}, {
'url': 'http://www.comedycentral.tv/shows/1727-the-mindy-project/bonus',
'only_matching': True,
}]
_FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'
_GEO_COUNTRIES = ['DE']
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
mrss_url = self._search_regex(
r'data-mrss=(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'mrss url', group='url')
return self._get_videos_info_from_url(mrss_url, video_id)
class ComedyCentralShortnameIE(InfoExtractor):
_VALID_URL = r'^:(?P<id>tds|thedailyshow|theopposition)$'
_TESTS = [{
'url': ':tds',
'only_matching': True,
}, {
'url': ':thedailyshow',
'only_matching': True,
}, {
'url': ':theopposition',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
shortcut_map = {
'tds': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'thedailyshow': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
'theopposition': 'http://www.cc.com/shows/the-opposition-with-jordan-klepper/full-episodes',
def _get_feed_query(self, uri):
return {
'accountOverride': 'intl.mtvi.com',
'arcEp': 'web.cc.tv',
'ep': 'b9032c3a',
'imageEp': 'web.cc.tv',
'mgid': uri,
}
return self.url_result(shortcut_map[video_id])


@ -17,7 +17,7 @@ import math
from ..compat import (
compat_cookiejar_Cookie,
compat_cookies,
compat_cookies_SimpleCookie,
compat_etree_Element,
compat_etree_fromstring,
compat_getpass,
@ -70,6 +70,7 @@ from ..utils import (
str_or_none,
str_to_int,
strip_or_none,
try_get,
unescapeHTML,
unified_strdate,
unified_timestamp,
@ -204,6 +205,14 @@ class InfoExtractor(object):
* downloader_options A dictionary of downloader options as
described in FileDownloader
Internally, extractors can include subtitles in the format
list, in this format:
* _subtitle The subtitle object, in the same format
as in subtitles field
* _key The tag for the provided subtitle
This is never included in the output JSON, but moved
into the subtitles field.
url: Final video URL.
ext: Video filename extension.
format: The video format, defaults to ext (used for --get-format)
@ -230,8 +239,10 @@ class InfoExtractor(object):
uploader: Full name of the video uploader.
license: License name the video is licensed under.
creator: The creator of the video.
release_timestamp: UNIX timestamp of the moment the video was released.
release_date: The date (YYYYMMDD) when the video was released.
timestamp: UNIX timestamp of the moment the video became available.
timestamp: UNIX timestamp of the moment the video became available
(uploaded).
upload_date: Video upload date (YYYYMMDD).
If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader.
@ -245,11 +256,15 @@ class InfoExtractor(object):
subtitles: The available subtitles as a dictionary in the format
{tag: subformats}. "tag" is usually a language code, and
"subformats" is a list sorted from lower to higher
preference, each element is a dictionary with the "ext"
entry and one of:
preference, each element is a dictionary,
which must contain one of these values:
* "data": The subtitles file contents
* "url": A URL pointing to the subtitles file
"ext" will be calculated from URL if missing
These values, if missing, are guessed based on other data,
in a way analogous to the formats data:
* "ext" - subtitle extension name (vtt, srt, ...)
* "proto" - download protocol (https, http, m3u8, ...)
* "http_headers"
automatic_captions: Like 'subtitles', used by the YoutubeIE for
automatically generated captions
duration: Length of the video in seconds, as an integer or float.
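
Put concretely, a subtitles entry under the extended rules above might look like this (all values invented):

info['subtitles'] = {
    'en': [{
        'url': 'https://example.com/subs/en.vtt',  # or 'data': '...'
        'ext': 'vtt',       # guessed from the URL if missing
        'proto': 'https',   # guessed like for formats
    }],
}
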
@ -336,8 +351,8 @@ class InfoExtractor(object):
object, each element of which is a valid dictionary by this specification.
Additionally, playlists can have "id", "title", "description", "uploader",
"uploader_id", "uploader_url" attributes with the same semantics as videos
(see above).
"uploader_id", "uploader_url", "duration" attributes with the same semantics
as videos (see above).
_type "multi_video" indicates that there are multiple videos that
@ -394,6 +409,8 @@ class InfoExtractor(object):
_GEO_COUNTRIES = None
_GEO_IP_BLOCKS = None
_WORKING = True
_SELFHOSTED = False
_REQUIRES_PLAYWRIGHT = False
def __init__(self, downloader=None):
"""Constructor. Receives an optional downloader."""
@ -1237,8 +1254,16 @@ class InfoExtractor(object):
'ViewAction': 'view',
}
def extract_interaction_type(e):
interaction_type = e.get('interactionType')
if isinstance(interaction_type, dict):
interaction_type = interaction_type.get('@type')
return str_or_none(interaction_type)
def extract_interaction_statistic(e):
interaction_statistic = e.get('interactionStatistic')
if isinstance(interaction_statistic, dict):
interaction_statistic = [interaction_statistic]
if not isinstance(interaction_statistic, list):
return
for is_e in interaction_statistic:
@ -1246,8 +1271,8 @@ class InfoExtractor(object):
continue
if is_e.get('@type') != 'InteractionCounter':
continue
interaction_type = is_e.get('interactionType')
if not isinstance(interaction_type, compat_str):
interaction_type = extract_interaction_type(is_e)
if not interaction_type:
continue
# For interaction count some sites provide string instead of
# an integer (as per spec) with non digit characters (e.g. ",")
@ -1263,16 +1288,40 @@ class InfoExtractor(object):
continue
info[count_key] = interaction_count
def extract_video_object(e):
assert e['@type'] == 'VideoObject'
def extract_author(e):
if not e:
return None
if not e.get('author'):
return None
e = e['author']
if isinstance(e, str):
info['uploader'] = e
elif isinstance(e, dict):
etype = e.get('@type')
if etype in ('Person', 'Organization'):
info.update({
'uploader': e.get('name'),
'uploader_id': e.get('identifier'),
'uploader_url': try_get(e, lambda x: x['url']['url'], str),
})
media_object_types = ('MediaObject', 'VideoObject', 'AudioObject', 'MusicVideoObject')
def extract_media_object(e):
assert e['@type'] in media_object_types
thumbnails = e.get('thumbnailUrl') or e.get('thumbnailURL')
if isinstance(thumbnails, compat_str):
thumbnails = [thumbnails]
elif thumbnails is None:
thumbnails = []
thumbnails = [({'url': thumb}) for thumb in thumbnails]
info.update({
'url': url_or_none(e.get('contentUrl')),
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
'thumbnails': thumbnails,
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'uploader': str_or_none(e.get('author')),
'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
@ -1280,14 +1329,15 @@ class InfoExtractor(object):
'view_count': int_or_none(e.get('interactionCount')),
})
extract_interaction_statistic(e)
extract_author(e)
for e in json_ld:
if '@context' in e:
item_type = e.get('@type')
if expected_type is not None and expected_type != item_type:
continue
if item_type in ('TVEpisode', 'Episode'):
episode_name = unescapeHTML(e.get('name'))
if item_type in ('TVEpisode', 'Episode', 'PodcastEpisode'):
episode_name = unescapeHTML(e.get('name') or e.get('headline'))
info.update({
'episode': episode_name,
'episode_number': int_or_none(e.get('episodeNumber')),
@ -1302,7 +1352,7 @@ class InfoExtractor(object):
'season_number': int_or_none(part_of_season.get('seasonNumber')),
})
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries', 'PodcastSeries'):
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Movie':
info.update({
@ -1317,21 +1367,29 @@ class InfoExtractor(object):
'title': unescapeHTML(e.get('headline')),
'description': unescapeHTML(e.get('articleBody')),
})
elif item_type == 'VideoObject':
extract_video_object(e)
elif item_type in media_object_types:
extract_media_object(e)
if expected_type is None:
continue
else:
break
video = e.get('video')
if isinstance(video, dict) and video.get('@type') == 'VideoObject':
extract_video_object(video)
for media_key in ('video', 'associatedMedia'):
media = e.get(media_key)
if isinstance(media, dict) and media.get('@type') in media_object_types:
extract_media_object(media)
if expected_type is None:
continue
else:
break
return dict((k, v) for k, v in info.items() if v is not None)
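
An example of a JSON-LD object the widened code now understands, exercising the PodcastEpisode type, associatedMedia and a structured author (all values invented):

example_json_ld = {
    '@context': 'https://schema.org',
    '@type': 'PodcastEpisode',
    'name': 'Episode 1',
    'episodeNumber': 1,
    'partOfSeries': {'@type': 'PodcastSeries', 'name': 'Example Cast'},
    'associatedMedia': {
        '@type': 'AudioObject',
        'contentUrl': 'https://example.com/ep1.mp3',
        'author': {'@type': 'Organization', 'name': 'Example Org'},
    },
}
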
def _search_nextjs_data(self, webpage, video_id, **kw):
return self._parse_json(
self._search_regex(
r'<script id="__NEXT_DATA__"[^>]+>(.+?)</script>',
webpage, 'next.js data', **kw),
video_id, **kw)
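
A typical call site for the new helper, assuming a Next.js page that keeps its video data under pageProps (structure invented for illustration):

next_data = self._search_nextjs_data(webpage, video_id)
video = try_get(next_data, lambda x: x['props']['pageProps']['video'], dict)
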
@staticmethod
def _hidden_inputs(html):
html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)
@ -1365,6 +1423,10 @@ class InfoExtractor(object):
f['tbr'] = f['abr'] + f['vbr']
def _formats_key(f):
# manifest subtitle workaround
if '_subtitle' in f:
return (-1,)
# TODO remove the following workaround
from ..utils import determine_ext
if not f.get('ext') and 'url' in f:
@ -1384,7 +1446,19 @@ class InfoExtractor(object):
preference -= 0.5
protocol = f.get('protocol') or determine_protocol(f)
proto_preference = 0 if protocol in ['http', 'https'] else (-0.5 if protocol == 'rtsp' else -0.1)
if protocol in ['http', 'https']:
proto_preference = 0
elif protocol == 'rtsp':
proto_preference = -0.5
elif protocol == 'bittorrent':
if self._downloader.params.get('prefer_p2p') is True:
proto_preference = 1
elif self._downloader.params.get('allow_p2p') is True:
proto_preference = -0.1
else:
proto_preference = -2
else:
proto_preference = -0.1
if f.get('vcodec') == 'none': # audio only
preference -= 50
@ -1456,9 +1530,10 @@ class InfoExtractor(object):
try:
self._request_webpage(url, video_id, 'Checking %s URL' % item, headers=headers)
return True
except ExtractorError:
except ExtractorError as e:
self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, item))
'%s: %s URL is invalid, skipping: %s'
% (video_id, item, error_to_compat_str(e.cause)))
return False
def http_scheme(self):
@ -1492,7 +1567,7 @@ class InfoExtractor(object):
manifest_url, video_id, 'Downloading f4m manifest',
'Unable to download f4m manifest',
# Some manifests may be malformed, e.g. prosiebensat1 generated manifests
# (see https://github.com/ytdl-org/haruhi-dl/issues/6215#issuecomment-121704244)
# (see https://github.com/ytdl-org/youtube-dl/issues/6215#issuecomment-121704244)
transform_source=transform_source,
fatal=fatal, data=data, headers=headers, query=query)
@ -1523,7 +1598,7 @@ class InfoExtractor(object):
manifest_version = '2.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
# Remove unsupported DRM protected media from final formats
# rendition (see https://github.com/ytdl-org/haruhi-dl/issues/8573).
# rendition (see https://github.com/ytdl-org/youtube-dl/issues/8573).
media_nodes = remove_encrypted_media(media_nodes)
if not media_nodes:
return formats
@ -1654,8 +1729,8 @@ class InfoExtractor(object):
# References:
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-21
# 2. https://github.com/ytdl-org/haruhi-dl/issues/12211
# 3. https://github.com/ytdl-org/haruhi-dl/issues/18923
# 2. https://github.com/ytdl-org/youtube-dl/issues/12211
# 3. https://github.com/ytdl-org/youtube-dl/issues/18923
# We should try extracting formats only from master playlists [1, 4.3.4],
# i.e. playlists that describe available qualities. On the other hand
@ -1687,7 +1762,7 @@ class InfoExtractor(object):
if not (media_type and group_id and name):
return
groups.setdefault(group_id, []).append(media)
if media_type not in ('VIDEO', 'AUDIO'):
if media_type not in ('VIDEO', 'AUDIO', 'SUBTITLES'):
return
media_url = media.get('URI')
if media_url:
@ -1695,17 +1770,27 @@ class InfoExtractor(object):
for v in (m3u8_id, group_id, name):
if v:
format_id.append(v)
f = {
'format_id': '-'.join(format_id),
'url': format_url(media_url),
'manifest_url': m3u8_url,
'language': media.get('LANGUAGE'),
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
}
if media_type == 'AUDIO':
f['vcodec'] = 'none'
if media_type == 'SUBTITLES':
f = {
'_subtitle': {
'url': format_url(media_url),
'ext': 'vtt',
'protocol': entry_protocol,
},
'_key': media.get('LANGUAGE'),
}
else:
f = {
'format_id': '-'.join(format_id),
'url': format_url(media_url),
'manifest_url': m3u8_url,
'language': media.get('LANGUAGE'),
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
}
if media_type == 'AUDIO':
f['vcodec'] = 'none'
formats.append(f)
def build_stream_name():
@ -2211,7 +2296,7 @@ class InfoExtractor(object):
# First off, % characters outside $...$ templates
# must be escaped by doubling for proper processing
# by % operator string formatting used further (see
# https://github.com/ytdl-org/haruhi-dl/issues/16867).
# https://github.com/ytdl-org/youtube-dl/issues/16867).
t = ''
in_template = False
for c in tmpl:
@ -2230,7 +2315,7 @@ class InfoExtractor(object):
# @initialization is a regular template like @media one
# so it should be handled just the same way (see
# https://github.com/ytdl-org/haruhi-dl/issues/11605)
# https://github.com/ytdl-org/youtube-dl/issues/11605)
if 'initialization' in representation_ms_info:
initialization_template = prepare_template(
'initialization',
@ -2316,7 +2401,7 @@ class InfoExtractor(object):
elif 'segment_urls' in representation_ms_info:
# Segment URLs with no SegmentTimeline
# Example: https://www.seznam.cz/zpravy/clanek/cesko-zasahne-vitr-o-sile-vichrice-muze-byt-i-zivotu-nebezpecny-39091
# https://github.com/ytdl-org/haruhi-dl/pull/14844
# https://github.com/ytdl-org/youtube-dl/pull/14844
fragments = []
segment_duration = float_or_none(
representation_ms_info['segment_duration'],
@ -2354,8 +2439,8 @@ class InfoExtractor(object):
# According to [1, 5.3.5.2, Table 7, page 35] @id of Representation
# is not necessarily unique within a Period thus formats with
# the same `format_id` are quite possible. There are numerous examples
# of such manifests (see https://github.com/ytdl-org/haruhi-dl/issues/15111,
# https://github.com/ytdl-org/haruhi-dl/issues/13919)
# of such manifests (see https://github.com/ytdl-org/youtube-dl/issues/15111,
# https://github.com/ytdl-org/youtube-dl/issues/13919)
full_info = formats_dict.get(representation_id, {}).copy()
full_info.update(f)
formats.append(full_info)
@ -2518,7 +2603,7 @@ class InfoExtractor(object):
media_tags.extend(re.findall(
# We only allow video|audio followed by a whitespace or '>'.
# Allowing more characters may end up in significant slow down (see
# https://github.com/ytdl-org/haruhi-dl/issues/11979, example URL:
# https://github.com/ytdl-org/youtube-dl/issues/11979, example URL:
# http://www.porntrex.com/maps/videositemap.xml).
r'(?s)(<(?P<tag>(?:amp-)?(?:video|audio))(?:\s+[^>]*)?>)(.*?)</(?P=tag)>', webpage))
for media_tag, media_type, media_content in media_tags:
@ -2594,7 +2679,15 @@ class InfoExtractor(object):
return entries
def _extract_akamai_formats(self, manifest_url, video_id, hosts={}):
signed = 'hdnea=' in manifest_url
if not signed:
# https://learn.akamai.com/en-us/webhelp/media-services-on-demand/stream-packaging-user-guide/GUID-BE6C0F73-1E06-483B-B0EA-57984B91B7F9.html
manifest_url = re.sub(
r'(?:b=[\d,-]+|(?:__a__|attributes)=off|__b__=\d+)&?',
'', manifest_url).strip('?')
formats = []
hdcore_sign = 'hdcore=3.7.0'
f4m_url = re.sub(r'(https?://[^/]+)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
hds_host = hosts.get('hds')
@ -2607,13 +2700,38 @@ class InfoExtractor(object):
for entry in f4m_formats:
entry.update({'extra_param_to_segment_url': hdcore_sign})
formats.extend(f4m_formats)
m3u8_url = re.sub(r'(https?://[^/]+)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
hls_host = hosts.get('hls')
if hls_host:
m3u8_url = re.sub(r'(https?://)[^/]+', r'\1' + hls_host, m3u8_url)
formats.extend(self._extract_m3u8_formats(
m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
m3u8_id='hls', fatal=False)
formats.extend(m3u8_formats)
http_host = hosts.get('http')
if http_host and m3u8_formats and not signed:
REPL_REGEX = r'https?://[^/]+/i/([^,]+),([^/]+),([^/]+)\.csmil/.+'
qualities = re.match(REPL_REGEX, m3u8_url).group(2).split(',')
qualities_length = len(qualities)
if len(m3u8_formats) in (qualities_length, qualities_length + 1):
i = 0
for f in m3u8_formats:
if f['vcodec'] != 'none':
for protocol in ('http', 'https'):
http_f = f.copy()
del http_f['manifest_url']
http_url = re.sub(
REPL_REGEX, protocol + r'://%s/\g<1>%s\3' % (http_host, qualities[i]), f['url'])
http_f.update({
'format_id': http_f['format_id'].replace('hls-', protocol + '-'),
'url': http_url,
'protocol': protocol,
})
formats.append(http_f)
i += 1
return formats
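
The HTTP variants are derived by splicing each quality token from the comma-delimited csmil segment of the HLS URL into a progressive URL on the http host. For a made-up Akamai-style URL:

import re

REPL_REGEX = r'https?://[^/]+/i/([^,]+),([^/]+),([^/]+)\.csmil/.+'
m3u8_url = 'https://example-vh.akamaihd.net/i/videos/clip_,500,1000,1500,.mp4.csmil/master.m3u8'
qualities = re.match(REPL_REGEX, m3u8_url).group(2).split(',')
# qualities == ['500', '1000', '1500']
http_url = re.sub(REPL_REGEX, r'https://media.example.com/\g<1>%s\3' % qualities[0], m3u8_url)
# http_url == 'https://media.example.com/videos/clip_500.mp4'
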
def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]):
@ -2684,9 +2802,14 @@ class InfoExtractor(object):
if isinstance(jwplayer_data, dict):
return jwplayer_data
def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
def _extract_jwplayer_data(self, webpage, video_id, *args, fatal=True, **kwargs):
jwplayer_data = self._find_jwplayer_data(
webpage, video_id, transform_source=js_to_json)
if jwplayer_data is None:
if fatal:
raise ExtractorError("jwplayer data could not be found")
else:
return None
return self._parse_jwplayer_data(
jwplayer_data, video_id, *args, **kwargs)
@ -2859,10 +2982,10 @@ class InfoExtractor(object):
self._downloader.cookiejar.set_cookie(cookie)
def _get_cookies(self, url):
""" Return a compat_cookies.SimpleCookie with the cookies for the url """
""" Return a compat_cookies_SimpleCookie with the cookies for the url """
req = sanitized_Request(url)
self._downloader.cookiejar.add_cookie_header(req)
return compat_cookies.SimpleCookie(req.get_header('Cookie'))
return compat_cookies_SimpleCookie(req.get_header('Cookie'))
def _apply_first_set_cookie_header(self, url_handle, cookie):
"""
@ -2875,7 +2998,7 @@ class InfoExtractor(object):
We will workaround this issue by resetting the cookie to
the first one manually.
1. https://new.vk.com/
2. https://github.com/ytdl-org/haruhi-dl/issues/9841#issuecomment-227871201
2. https://github.com/ytdl-org/youtube-dl/issues/9841#issuecomment-227871201
3. https://learning.oreilly.com/
"""
for header, cookies in url_handle.headers.items():
@ -3020,3 +3143,85 @@ class SearchInfoExtractor(InfoExtractor):
@property
def SEARCH_KEY(self):
return self._SEARCH_KEY
class SelfhostedInfoExtractor(InfoExtractor):
"""Selfhosted Information Extractor class.
Selfhosted info extractors are for services that cannot be handled
by just listing all of their domains: mostly free and open source
software that anyone is allowed to host on their own servers
(like PeerTube, Funkwhale, Mastodon, Nextcloud, and lots of others).
The _VALID_URL value should not match actual URLs, but it may
match the extractor-specific ID pointer string
(e.g. the Mastodon extractor can match "mastodon:donotsta.re:9xN1v6yM7WhzE7aIIC",
but not "https://donotsta.re/notice/9xN1v6yM7WhzE7aIIC").
https://git.sakamoto.pl/laudom/haruhi-dl/-/issues/10
"""
_SELFHOSTED = True
"""Regular expression that matches the actual URLs, or None if should not be checked"""
_SH_VALID_URL = None
"""An iterable of strings, of which *any* should be contained in the webpage contents, or None if should not be checked"""
_SH_VALID_CONTENT_STRINGS = None
"""An iterable of regular expression strings, of which *any* should match the webpage contents, or None if should not be checked"""
_SH_VALID_CONTENT_REGEXES = None
@property
def IE_NAME(self):
return compat_str(type(self).__name__[:-4])
@classmethod
def suitable_selfhosted(cls, url, webpage):
"""Receives a URL and webpage contents, and returns True if suitable for this IE."""
if cls._SH_VALID_URL:
if '_SH_VALID_URL_RE' not in cls.__dict__:
cls._SH_VALID_URL_RE = re.compile(cls._SH_VALID_URL)
if cls._SH_VALID_URL_RE.match(url) is None:
return False
if webpage is None:
# if no webpage, assume just matching the URL is fine
if cls._SH_VALID_URL:
return True
# failing, there's nothing more to check
return False
if any(p in webpage for p in (cls._SH_VALID_CONTENT_STRINGS or ())):
return True
# no strings? check regexes!
if '_SH_CONTENT_REGEXES_RES' not in cls.__dict__:
cls._SH_VALID_CONTENT_REGEXES_RES = (re.compile(rgx)
for rgx in cls._SH_VALID_CONTENT_REGEXES or ())
if not any(rgx.match(webpage) is not None for rgx in cls._SH_VALID_CONTENT_REGEXES_RES):
return False
def _real_extract(self, url):
"""Unreal extraction process. Do NOT redefine in subclasses."""
return self._selfhosted_extract(url)
def _selfhosted_extract(self, url, webpage=None):
"""Real extraction process. Redefine in subclasses.
`webpage` is a string (the website contents, as downloaded by GenericIE) or None"""
pass
@classmethod
def _match_id_and_host(cls, url):
if '_VALID_URL_RE' not in cls.__dict__:
cls._VALID_URL_RE = re.compile(cls._VALID_URL)
m = cls._VALID_URL_RE.match(url)
if m is None:
if '_SH_VALID_URL_RE' not in cls.__dict__:
cls._SH_VALID_URL_RE = re.compile(cls._SH_VALID_URL)
m = cls._SH_VALID_URL_RE.match(url)
assert m
return compat_str(m.group('host')), compat_str(m.group('id'))
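To make the contract concrete, here is a minimal hypothetical subclass; the "ExampleTube" service, its URL scheme, API path and content marker are all invented. Note the class name ends in SHIE, which the IE_NAME property above trims to "ExampleTube":

class ExampleTubeSHIE(SelfhostedInfoExtractor):
    # everything here is illustrative; no such service exists
    _VALID_URL = r'exampletube:(?P<host>[^:]+):(?P<id>[\da-f-]+)'
    _SH_VALID_URL = r'https?://(?P<host>[^/]+)/videos/watch/(?P<id>[\da-f-]+)'
    _SH_VALID_CONTENT_STRINGS = ('<meta name="generator" content="ExampleTube"',)

    def _selfhosted_extract(self, url, webpage=None):
        host, video_id = self._match_id_and_host(url)
        # query the instance named in the URL (or the ID pointer string)
        data = self._download_json(
            'https://%s/api/v1/videos/%s' % (host, video_id), video_id)
        return {
            'id': video_id,
            'title': data['name'],
            'url': data['files'][0]['fileUrl'],
        }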


@@ -36,7 +36,7 @@ class UnicodeBOMIE(InfoExtractor):
    _VALID_URL = r'(?P<bom>\ufeff)(?P<id>.*)$'

    # Disable test for python 3.2 since BOM is broken in re in this version
-    # (see https://github.com/ytdl-org/haruhi-dl/issues/9751)
+    # (see https://github.com/ytdl-org/youtube-dl/issues/9751)
    _TESTS = [] if (3, 0) < sys.version_info <= (3, 3) else [{
        'url': '\ufeffhttp://www.youtube.com/watch?v=BaW_jenozKc',
        'only_matching': True,


@@ -1,9 +1,15 @@
from __future__ import unicode_literals

from urllib.parse import parse_qs

from .common import InfoExtractor
from ..compat import (
    compat_urlparse,
)
from ..utils import (
    try_get,
    ExtractorError,
)
class RtmpIE(InfoExtractor):
@@ -58,3 +64,71 @@ class MmsIE(InfoExtractor):
            'title': title,
            'url': url,
        }


class BitTorrentMagnetIE(InfoExtractor):
    IE_DESC = False
    _VALID_URL = r'(?i)magnet:\?.+'
    _TESTS = [{
        'url': 'magnet:?xs=https%3A%2F%2Fvideo.internet-czas-dzialac.pl%2Fstatic%2Ftorrents%2F9085aa69-90c2-40c6-a707-3472b92cafc8-0.torrent&xt=urn:btih:0ae4cc8cb0e098a1a40b3224aa578bb4210a8cff&dn=Podcast+Internet.+Czas+dzia%C5%82a%C4%87!+-+Trailer&tr=wss%3A%2F%2Fvideo.internet-czas-dzialac.pl%3A443%2Ftracker%2Fsocket&tr=https%3A%2F%2Fvideo.internet-czas-dzialac.pl%2Ftracker%2Fannounce&ws=https%3A%2F%2Fvideo.internet-czas-dzialac.pl%2Fstatic%2Fwebseed%2F9085aa69-90c2-40c6-a707-3472b92cafc8-0.mp4',
        'info_dict': {
            'id': 'urn:btih:0ae4cc8cb0e098a1a40b3224aa578bb4210a8cff',
            'ext': 'torrent',
            'title': 'Podcast Internet. Czas działać! - Trailer',
        },
        'params': {
            'allow_p2p': True,
            'prefer_p2p': True,
            'skip_download': True,
        },
    }]

    def _real_extract(self, url):
        qs = parse_qs(url[len('magnet:?'):])
        # eXact Topic
        video_id = qs['xt'][0]
        if not video_id.startswith('urn:btih:'):
            raise ExtractorError('Not a BitTorrent magnet')
        # Display Name
        title = try_get(qs, lambda x: x['dn'][0], str) or video_id[len('urn:btih:'):]
        formats = [{
            'url': url,
            'protocol': 'bittorrent',
        }]
        # Web Seed
        if qs.get('ws'):
            for ws in qs['ws']:
                formats.append({
                    'url': ws,
                })
        # Acceptable Source
        if qs.get('as'):
            for as_ in qs['as']:
                formats.append({
                    'url': as_,
                    'preference': -2,
                })
        # eXact Source
        if qs.get('xs'):
            for xs in qs['xs']:
                formats.append({
                    'url': xs,
                    'protocol': 'bittorrent',
                })
        self._sort_formats(formats)
        # eXact Length
        if qs.get('xl'):
            xl = int(qs['xl'][0])
            for f in formats:
                f['filesize'] = xl
        return {
            'id': video_id,
            'title': title,
            'formats': formats,
        }
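As a quick sanity check of the field handling above, this is how parse_qs splits a magnet link — the hash, display name and web seed below are made up:

from urllib.parse import parse_qs

url = ('magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567'
       '&dn=Example+Clip&ws=https%3A%2F%2Fexample.com%2Fseed.mp4&xl=12345')
qs = parse_qs(url[len('magnet:?'):])
print(qs['xt'][0])       # urn:btih:0123... -> becomes the video id
print(qs['dn'][0])       # 'Example Clip' ('+' decodes to a space)
print(qs['ws'][0])       # https://example.com/seed.mp4 (percent-decoded)
print(int(qs['xl'][0]))  # 12345 -> applied as filesize to every format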


@@ -16,6 +16,8 @@ from ..utils import (
    mimetype2ext,
    orderedSet,
    parse_iso8601,
+    strip_or_none,
+    try_get,
)
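For reference, the two helpers newly imported here behave roughly as follows; the sample data is invented:

data = {'transformed': {'video': {'id': '123', 'description': ' A demo. '}}}

# try_get walks a nested structure, returning None on any missing key
video = try_get(data, lambda x: x['transformed']['video'])  # -> inner dict
missing = try_get(data, lambda x: x['nope']['video'])       # -> None

# strip_or_none trims a string, and turns non-strings into None
print(strip_or_none(video.get('description')))  # 'A demo.'
print(strip_or_none(None))                      # None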
@@ -82,6 +84,7 @@ class CondeNastIE(InfoExtractor):
            'uploader': 'gq',
            'upload_date': '20170321',
            'timestamp': 1490126427,
+            'description': 'How much grimmer would things be if these people were competent?',
        },
    }, {
        # JS embed
@@ -93,7 +96,7 @@ class CondeNastIE(InfoExtractor):
            'title': '3D printed TSA Travel Sentry keys really do open TSA locks',
            'uploader': 'arstechnica',
            'upload_date': '20150916',
-            'timestamp': 1442434955,
+            'timestamp': 1442434920,
        }
    }, {
        'url': 'https://player.cnevids.com/inline/video/59138decb57ac36b83000005.js?target=js-cne-player',
@@ -196,6 +199,13 @@ class CondeNastIE(InfoExtractor):
            })
        self._sort_formats(formats)

+        subtitles = {}
+        for t, caption in video_info.get('captions', {}).items():
+            caption_url = caption.get('src')
+            if not (t in ('vtt', 'srt', 'tml') and caption_url):
+                continue
+            subtitles.setdefault('en', []).append({'url': caption_url})

        return {
            'id': video_id,
            'formats': formats,
@@ -208,6 +218,7 @@ class CondeNastIE(InfoExtractor):
            'season': video_info.get('season_title'),
            'timestamp': parse_iso8601(video_info.get('premiere_date')),
            'categories': video_info.get('categories'),
+            'subtitles': subtitles,
        }

    def _real_extract(self, url):
@@ -225,8 +236,16 @@ class CondeNastIE(InfoExtractor):
        if url_type == 'series':
            return self._extract_series(url, webpage)
        else:
-            params = self._extract_video_params(webpage, display_id)
-            info = self._search_json_ld(
-                webpage, display_id, fatal=False)
+            video = try_get(self._parse_json(self._search_regex(
+                r'__PRELOADED_STATE__\s*=\s*({.+?});', webpage,
+                'preload state', '{}'), display_id),
+                lambda x: x['transformed']['video'])
+            if video:
+                params = {'videoId': video['id']}
+                info = {'description': strip_or_none(video.get('description'))}
+            else:
+                params = self._extract_video_params(webpage, display_id)
+                info = self._search_json_ld(
+                    webpage, display_id, fatal=False)
            info.update(self._extract_video(params))
            return info
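A toy run of the new preload-state branch: when the page embeds __PRELOADED_STATE__, the video id and description come straight from it, and the old params/JSON-LD path is only the fallback. The page fragment below is invented:

import json
import re

webpage = ('<script>window.__PRELOADED_STATE__ = {"transformed": '
           '{"video": {"id": "abc123", "description": " A demo. "}}};</script>')

# same pattern the extractor uses; lazily grabs the JSON object before the ';'
m = re.search(r'__PRELOADED_STATE__\s*=\s*({.+?});', webpage)
state = json.loads(m.group(1)) if m else {}
video = (state.get('transformed') or {}).get('video')
if video:
    print(video['id'])  # -> 'abc123', fed to params = {'videoId': video['id']}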

Some files were not shown because too many files have changed in this diff.