Compare commits


237 commits

Author SHA1 Message Date
Lauren Liberda 2f375d447c fix/speedup ci 2021-09-09 12:38:11 +02:00
Lauren Liberda d464b29113 vider support 2021-09-06 22:34:06 +02:00
Lauren Liberda 19602fb3f5 [polskieradio] fix PR4 audition shit 2021-08-31 20:25:12 +02:00
Lauren Liberda a550e21b8c [ipla] state the DRM requirement clearly 2021-08-07 02:23:28 +02:00
Lauren Liberda 1ae67712e8 [ipla] error handling 2021-08-07 01:08:07 +02:00
Dominika Liberda a96bf110da * version 2021.08.01 2021-08-01 17:44:07 +02:00
Lauren Liberda 973652cf4d [youtube] fix age gate for *some* videos 2021-08-01 17:39:30 +02:00
Lauren Liberda d81137a604 [peertube] pt 3.3+ url scheme support, fix tests, minor fixes 2021-07-30 20:40:19 +02:00
Lauren Liberda a0d52ce5be [niconico] dmc downloader and other stuff from yt-dlp (as of 40078a5) 2021-06-26 14:40:02 +02:00
Dominika Liberda 81b5018d99 * version 2021.06.24.1 2021-06-24 14:01:25 +02:00
Dominika Liberda 31b7bf5bdb * fixes crash if signature decryption code isn't packed with artifacts 2021-06-24 13:58:36 +02:00
Dominika Liberda a0cb1b40a2 * fix in release script 2021-06-24 13:18:36 +02:00
Dominika Liberda c3e48f4934 * version 2021.06.24 2021-06-24 13:07:07 +02:00
Dominika Liberda ca6cbb6234 * fixes youtube list extractor 2021-06-24 12:27:39 +02:00
Lauren Liberda 7858dc7b9f fix app crash/tests 2021-06-22 03:17:30 +02:00
Lauren Liberda 2234b1100c [liveleak] remove for real 2021-06-22 03:02:52 +02:00
Lauren Liberda 75442522b2 [soundcloud] prerelease client id fetching 2021-06-22 02:43:50 +02:00
Lauren Liberda f4070e6fe4 prerelease artifact generator, for youtube sig 2021-06-21 23:01:02 +02:00
Lauren Liberda b30cd7afbb [liveleak] remove extractor 2021-06-21 20:43:52 +02:00
Lauren Liberda 29389b4935 [pornhub] Add support for pornhubthbh7ap3u.onion
Original author: dstftw <dstftw@gmail.com>
2021-06-21 20:26:48 +02:00
Sergey M. 3fc2d04e08 [pornhub] Detect geo restriction 2021-06-21 20:22:14 +02:00
Sergey M. 30a3fb457e [pornhub] Dismiss tbr extracted from download URLs (closes #28927)
No longer reliable
2021-06-21 20:22:07 +02:00
Sergey M. 69813b6be8 [curiositystream:collection] Extend _VALID_URL (closes #26326, closes #29117) 2021-06-21 20:22:00 +02:00
Tianyi Shi f1a365faf8 [bilibili] Strip uploader name (#29202) 2021-06-21 20:21:17 +02:00
Logan B 86c90f7d47 [umg:de] Update GraphQL API URL (#29304)
Previous one no longer resolves

Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-06-21 20:20:56 +02:00
Sergey M. a33a92ba4b [nrk] Switch psapi URL to https (closes #29344)
Catalog calls no longer work via http
2021-06-21 20:20:49 +02:00
kikuyan 6057163d97 [postprocessor/ffmpeg] Show ffmpeg output on error (refs #22680) (#29336) 2021-06-21 20:20:43 +02:00
kikuyan aad8936157 [egghead] Add support for app.egghead.io (closes #28404) (#29303)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-06-21 20:20:36 +02:00
kikuyan 18dd355e39 [appleconnect] Fix extraction (#29208) 2021-06-21 20:20:29 +02:00
kikuyan e628fc3794 [orf:tvthek] Add support for MPD formats (closes #28672) (#29236) 2021-06-21 20:20:18 +02:00
Sergey M. ac99e96a1e [facebook] Improve login required detection 2021-06-21 20:19:41 +02:00
Sergey M. 93131809f2 [youporn] Fix formats and view count extraction (closes #29216) 2021-06-21 20:19:35 +02:00
Sergey M. 9cced7b3d2 [orf:tvthek] Fix thumbnails extraction (closes #29217) 2021-06-21 20:19:28 +02:00
Remita Amine b526b67bc1 [formula1] fix extraction(closes #29206) 2021-06-21 20:19:20 +02:00
Lauren Liberda e676b759d1 [youtube] fix the fancy georestricted error 2021-06-20 23:00:58 +02:00
Dominika Liberda 1d54631bfb * version 2021.06.20 2021-06-20 22:26:02 +02:00
Lauren Liberda 073959a503 update changelog 2021-06-20 22:10:21 +02:00
Dominika Liberda eaf7a8bd6e * fixes agegate on youtube 2021-06-20 22:08:26 +02:00
Lauren Liberda ed273bfbf2 [youtube] cleanup, speed up age-gated extraction, fix videos with js-like syntax 2021-06-06 16:40:57 +02:00
Lauren Liberda 9373a2f667 [options] fix playwright headlessness behavior 2021-06-03 17:16:39 +02:00
Lauren Liberda f2a5fa2e53 [playwright] option to force a specific browser 2021-06-03 17:13:41 +02:00
Lauren Liberda 9b1ef5167d [tiktok] fix empty video lists
I'm fucking stupid
2021-06-03 16:53:01 +02:00
Lauren Liberda 7787c45730 [playwright] simplify code 2021-06-03 14:33:14 +02:00
Dominika Liberda f34b024e70 * version 2021.06.01 2021-06-01 10:39:48 +02:00
Lauren Liberda 0d8ef28280 update changelog 2021-05-31 23:37:14 +02:00
Sergey M. 132d7674e3 [ard] Relax _VALID_URL and fix video ids (closes #22724, closes #29091) 2021-05-31 23:29:01 +02:00
Sergey M. e19e102a56 [ustream] Detect https embeds (closes #29133) 2021-05-31 23:28:53 +02:00
Sergey M. dd62e6bab3 [ted] Prefer own formats over external sources (closes #29142) 2021-05-31 23:28:48 +02:00
Sergey M. 484dabbf8a [twitch:clips] Improve extraction (closes #29149) 2021-05-31 23:28:40 +02:00
phlip 2e387cb356 [twitch:clips] Add access token query to download URLs (closes #29136) 2021-05-31 23:28:25 +02:00
Remita Amine 177f5c64de [vimeo] fix vimeo pro embed extraction(closes #29126) 2021-05-31 23:28:17 +02:00
Remita Amine a9f7bf158b [redbulltv] fix embed data extraction(closes #28770) 2021-05-31 23:28:11 +02:00
Remita Amine 80c9bfae14 [shahid] relax _VALID_URL(closes #28772, closes #28930) 2021-05-31 23:28:05 +02:00
Sergey M. 8fac551776 [playstuff] Add extractor (closes #28901, closes #28931) 2021-05-31 23:27:42 +02:00
Sergey M. 47fec1e10b [eroprofile] Skip test 2021-05-31 23:27:36 +02:00
Sergey M. 57c88d40d3 [eroprofile] Fix extraction (closes #23200, closes #23626, closes #29008) 2021-05-31 23:27:20 +02:00
kr4ssi ce5c2526bc [vivo] Add support for vivo.st (#29009)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-05-31 23:27:14 +02:00
Sergey M. 8c826fe7ce [generic] Add support for og:audio (closes #28311, closes #29015) 2021-05-31 23:27:07 +02:00
Sergey M. 1c539931b6 [options] Fix thumbnail option group name (closes #29042) 2021-05-31 23:27:01 +02:00
Sergey M. 6d5cb9e661 [phoenix] Fix extraction (closes #29057) 2021-05-31 23:26:55 +02:00
Lauren Liberda 97abd98bc3 [generic] Add support for sibnet embeds
286e01ce30
2021-05-31 23:26:35 +02:00
Sergey M. 646a08b1c5 [vk] Add support for sibnet embeds (closes #9500) 2021-05-31 23:23:22 +02:00
Sergey M. e32f3c07ea [generic] Add Referer header for direct videojs download URLs (closes #2879, closes #20217, closes #29053) 2021-05-31 23:23:16 +02:00
Lukas Anzinger 56d9861eb5 [orf:radio] Switch download URLs to HTTPS (closes #29012) (#29046) 2021-05-31 23:23:09 +02:00
Sergey M. 8d0c50580c [blinkx] Remove extractor (closes #28941)
No longer exists.
2021-05-31 23:23:02 +02:00
catboy 07adc2e4cd [medaltv] Relax _VALID_URL (#28884)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-05-31 23:22:56 +02:00
Jacob Chapman a3e21baccc [YoutubeDL] Improve extract_info doc (#28946)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-05-31 23:22:47 +02:00
Sergey M. c3b5074fcd [funimation] Add support for optional lang code in URLs (closes #28950) 2021-05-31 23:22:07 +02:00
Sergey M. 30d8947496 [gdcvault] Add support for HTML5 videos 2021-05-31 23:20:33 +02:00
Sergey M. 2489669316 [dispeak] DRY and update tests (closes #28970) 2021-05-31 23:20:13 +02:00
Ben Rog-Wilhelm 5512cc0f37 [dispeak] Improve FLV extraction (closes #13513) 2021-05-31 23:20:06 +02:00
Ben Rog-Wilhelm 1643b0b490 [kaltura] Improve iframe extraction (#28969)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-05-31 23:19:55 +02:00
Lauren Liberda 41cd26d4cf [kaltura] Make embed code alternatives actually work 2021-05-31 23:19:38 +02:00
Lauren Liberda 993cb8ce4c [youtube] fix videos with age gate 2021-05-31 22:31:08 +02:00
Lauren Liberda fca8c46c7b radiokapital extractors 2021-05-30 15:37:30 +02:00
Lauren Liberda 9d9b571371 [misskey] add tests 2021-05-28 00:53:27 +02:00
Lauren Liberda d540126206 utils: flake8 2021-05-28 00:50:34 +02:00
Lauren Liberda fa290c78e7 misskey extractor 2021-05-28 00:50:00 +02:00
Lauren Liberda 2c8fa677b2 [tiktok] deduplicate videos 2021-05-23 17:44:08 +02:00
Lauren Liberda ad5cc09566 [peertube] logging in 2021-05-23 14:43:00 +02:00
Lauren Liberda e83f44815c [mastodon] support cards to external services 2021-05-07 14:24:46 +02:00
Lauren Liberda 6adb5ea838 [mastodon] cache apps on logging in 2021-05-04 01:07:29 +02:00
Lauren Liberda 8dee2b0f85 changelog update 2021-05-03 23:16:25 +02:00
Sergey M. 36bc893bd8 [twitter] Improve formats extraction from vmap URL (closes #28909) 2021-05-03 23:00:18 +02:00
Sergey M. ceab7dc7ec [xtube] Fix formats extraction (closes #28870) 2021-05-03 23:00:13 +02:00
Sergey M. 560a3ab05d [svtplay] Improve extraction (closes #28507, closes #28876) 2021-05-03 23:00:07 +02:00
Sergey M. b7f9dc517f [tv2dk] Fix extraction (closes #28888) 2021-05-03 22:59:59 +02:00
schnusch d56b6a0b75 [xfileshare] Add support for wolfstream.tv (#28858) 2021-05-03 22:59:44 +02:00
Sergey M. 2403ecd42d [francetvinfo] Improve video id extraction (closes #28792) 2021-05-03 22:59:38 +02:00
catboy 19dc8442c2 [medaltv] Fix extraction (#28807)
numeric clip ids are no longer used by medal, and integer user ids are now sent as strings.
2021-05-03 22:59:31 +02:00
The Hatsune Daishi d40d350a69 [tver] Redirect all downloads to Brightcove (#28849) 2021-05-03 22:59:23 +02:00
Sergey M. 63c541a3cd [test_execution] Add test for lazy extractors (refs #28780) 2021-05-03 22:57:13 +02:00
Sergey M. c9c96706eb [bbc] Extract full description from __INITIAL_DATA__ (refs #28774) 2021-05-03 22:55:55 +02:00
dirkf 35043c6160 [bbc] Extract description and timestamp from __INITIAL_DATA__ (#28774) 2021-05-03 22:55:49 +02:00
Lauren Liberda 5c054ee942 [mastodon] oh haruhi what did I NOT do here
+ --force-use-mastodon option
+ logging in to mastodon/pleroma
+ fetching posts via different mastodon/pleroma instances to get follower-only/direct posts
+ fetching peertube videos via pleroma instances to circuvument censorship (?)
2021-05-03 01:41:05 +02:00
Lauren Liberda 76d4e8de92 [wppilot] add tests 2021-04-28 11:43:51 +02:00
Lauren Liberda e9f7e06635 [wppilot] reduce logging in and throw meaningful errors 2021-04-28 01:53:41 +02:00
Lauren Liberda 64ec930237 wp pilot extractors 2021-04-28 00:33:27 +02:00
Lauren Liberda ac8b9e45fb yet another update on funding 2021-04-26 17:16:19 +02:00
Lauren Liberda 8b4a9656f0 [tvp] fix website extracting with weird urls 2021-04-26 17:14:13 +02:00
Lauren Liberda 6ad8f4990a [tvn] better extraction method choosing 2021-04-26 14:40:15 +02:00
Lauren Liberda b31ca60b3a update on donations 2021-04-23 18:09:37 +02:00
Lauren Liberda eb67a3cd44 [tvp:embed] handling formats better way 2021-04-22 15:49:51 +02:00
Lauren Liberda cde74b6420 [youtube:channel] fix multiple page extraction 2021-04-20 23:57:15 +02:00
Lauren Liberda d68515cd12 readme update 2021-04-19 01:27:42 +02:00
Lauren Liberda 379b17f27e [tvp] fix jp2.tvp.pl 2021-04-18 21:49:22 +02:00
Lauren Liberda 83a294d881 [mastodon] support for soapbox and audio files 2021-04-18 18:19:33 +02:00
Sergey M. 4c46e374bc [cbsnews] Fix extraction for python <3.6 (closes #23359) 2021-04-18 16:41:09 +02:00
Sergey M. 5f6bcc20f5 [utils] Add support for experimental HTTP response status code 308 Permanent Redirect (refs #27877, refs #28768) 2021-04-18 16:38:56 +02:00
quyleanh 865b8fd65f [pluralsight] Extend anti-throttling timeout (#28712) 2021-04-18 16:37:48 +02:00
Aaron Lipinski f7cde33162 [maoritv] Add new extractor(closes #24552) 2021-04-18 16:37:39 +02:00
Remita Amine 9ef69b9a67 [mtv] Fix Viacom A/B Testing Video Player extraction(closes #28703) 2021-04-18 16:37:32 +02:00
Sergey M. 05f71071f4 [pornhub] Extract DASH and HLS formats from get_media end point (closes #28698) 2021-04-18 16:37:25 +02:00
Remita Amine f755095cb3 [cbssports] fix extraction(closes #28682) 2021-04-18 16:37:17 +02:00
Remita Amine 85f9e11581 [jamendo] fix track extraction(closes #28686) 2021-04-18 16:37:11 +02:00
Remita Amine 6108793376 [curiositystream] fix format extraction(closes #26845, closes #28668) 2021-04-18 16:37:02 +02:00
Lauren Liberda d94f06105c compat simplecookie again because reasons 2021-04-18 16:36:44 +02:00
Sergey M. 0a6031afcb [compat] Use more conventional name for compat SimpleCookie 2021-04-18 16:35:21 +02:00
guredora d8d8cc0945 [line] add support live.line.me (closes #17205)(closes #28658) 2021-04-18 16:30:36 +02:00
Lauren Liberda 8deedd7636 added compat_SimpleCookie for compatibility with ytdl 2021-04-18 16:28:42 +02:00
Remita Amine 229b4d1671 [compat] add compat_SimpleCookie 2021-04-18 16:26:49 +02:00
Lauren Liberda 2208983e30 [vimeo] extraction improvements
originally by Remita Amine <remitamine@gmail.com>
2021-04-18 16:24:11 +02:00
RomanEmelyanov 8f35a39d9f [youku] Update ccode(closes #17852, closes #28447, closes #28460) (#28648) 2021-04-18 16:10:33 +02:00
Remita Amine 97b46bced6 [extractor/common] fix _get_cookies method for python 2(#20673, #23256, #20326, closes #28640) 2021-04-18 16:09:45 +02:00
Remita Amine 6f678388cb [screencastomatic] fix extraction(closes #11976, closes #24489) 2021-04-18 16:09:39 +02:00
Allan Daemon 40ef0c5a1c [palcomp3] Add new extractor(closes #13120) 2021-04-18 16:09:32 +02:00
Vid f0dd168230 [arnes] Add new extractor(closes #28483) 2021-04-18 16:09:19 +02:00
Adrian Heine df566be96f [magentamusik360] Add new extractor 2021-04-16 23:41:22 +02:00
Lauren Liberda 923069eb48 [core] merge formats by codecs 2021-04-15 01:41:42 +02:00
Lauren Liberda a0986f874d [senat.pl] support for live videos 2021-04-14 14:21:07 +02:00
Lauren Liberda 12a935cf42 [sejm.pl] support live streams 2021-04-14 14:01:54 +02:00
Lauren Liberda 44ed85b18b + castos extractors 2021-04-13 00:17:17 +02:00
Lauren Liberda 2bd0f6069a spryciarze.pl extractors 2021-04-12 20:53:07 +02:00
Lauren Liberda e2764f61ea json_dl: better author extraction 2021-04-12 20:52:49 +02:00
Lauren Liberda 66e93478d8 [spreaker] embedded player support 2021-04-12 14:45:18 +02:00
Lauren Liberda a4d58a6adf [spreaker] new url schemes 2021-04-12 14:33:51 +02:00
Lauren Liberda abb792e7b5 senat.pl extractor 2021-04-12 13:55:46 +02:00
Lauren Liberda 55e021da8e [sejm.pl] extracting ism formats, small changes to work with senat 2021-04-12 13:55:30 +02:00
Lauren Liberda 13cc377d6f [sejm.pl] multiple cameras and PJM translator 2021-04-12 12:07:19 +02:00
Lauren Liberda 46d28e0fd5 + sejm.gov.pl archival video extractor 2021-04-12 01:14:28 +02:00
Lauren Liberda 9c0e55eb79 improve documentation on subtitles 2021-04-09 18:41:55 +02:00
Lauren Liberda 860a8f2061 [tvp] support for tvp.info vue pages 2021-04-09 17:07:59 +02:00
Lauren Liberda 557fe650bb [cda] fix premium videos for premium users (?) 2021-04-09 13:56:17 +02:00
Lauren Liberda baf8549c0a [tvn24] refactor nextjs frontend handling
mitigating HTTP 404 response issues
2021-04-09 01:50:18 +02:00
Lauren Liberda dae5140251 - [ninateka] remove extractor [*]
ninateka uses DRM protection now
2021-04-06 14:51:34 +02:00
Lauren Liberda 9eaffe8278 [tvp:series] error handling, fallback to web 2021-04-05 04:11:02 +02:00
Lauren Liberda 6ed5f6bbc8 copykitku: get ready for merging other fork changes 2021-04-05 02:44:06 +02:00
Dominika Liberda a71cc68530 * version 2021.04.01 2021-04-01 20:57:36 +02:00
Lauren Liberda 8a0ec69c60 [vlive] merge all updates from ytdl 2021-04-01 14:54:46 +02:00
Sergey M. 607734c7ef [francetvinfo] Improve video id extraction (closes #28584) 2021-04-01 14:49:09 +02:00
Chris Hranj 3a0f408546 [instagram] Improve title extraction and extract duration (#28469)
Co-authored-by: Sergey M. <dstftw@gmail.com>
2021-04-01 14:49:01 +02:00
Lauren Liberda a067097513 [youtube] better consent workaround 2021-04-01 14:07:32 +02:00
Dominika Liberda b428c73970 * version 2021.03.30 2021-03-30 22:43:27 +02:00
Lauren Liberda e824771caf [makefile] use python3 2021-03-30 22:33:20 +02:00
Lauren Liberda ecf455300f [youtube] consent shit workaround (fuck google)
Co-authored-by: Dominika Liberda <ja@sdomi.pl>
2021-03-30 22:32:29 +02:00
Remita Amine 605ba1f477 [sbs] add support for ondemand watch URLs(closes #28566) 2021-03-28 22:26:01 +02:00
Remita Amine 6277a6f4c7 [picarto] fix live stream extraction(closes #28532) 2021-03-28 22:25:34 +02:00
Remita Amine 608d64024e [vimeo] fix unlisted video extraction(closes #28414) 2021-03-28 22:25:29 +02:00
Remita Amine b587a7656e [ard] improve clip id extraction(#22724)(closes #28528) 2021-03-28 22:25:21 +02:00
Roman Sebastian Karwacik 14ee975fb4 [zoom] Add new extractor(closes #16597, closes #27002, closes #28531) 2021-03-28 22:25:14 +02:00
The Hatsune Daishi 1cd1ed16ed [extractor] escape forgotten dot for hostnames in regular expression (#28530) 2021-03-28 22:25:08 +02:00
Remita Amine 74ae4cb2be [bbc] fix BBC IPlayer Episodes/Group extraction(closes #28360) 2021-03-28 22:25:02 +02:00
Remita Amine 7bee125ade [zingmp3] fix extraction(closes #11589, closes #16409, closes #16968, closes #27205) 2021-03-28 22:24:43 +02:00
Martin Ström 847a1ddff4 [vgtv] Add support for new tv.aftonbladet.se URL schema (#28514)
Co-authored-by: Sergey M <dstftw@gmail.com>
2021-03-28 22:24:36 +02:00
Lauren Liberda 64f7b37d8e [tiktok] detect private videos 2021-03-28 22:24:13 +02:00
Lauren Liberda 2404fc148e --ie-key cli option 2021-03-28 22:01:33 +02:00
Lauren Liberda 9aa7e4481b fix dw:article, refactor dw 2021-03-28 21:45:08 +02:00
Lauren Liberda 2e7f27f566 + patronite audio extractor 2021-03-24 16:40:15 +01:00
Dominika Liberda 7ba6fd5e2c * version 2021.03.21 2021-03-21 03:33:30 +01:00
Lauren Liberda d2d859b0cb changelog update 2021-03-21 03:17:30 +01:00
Lauren Liberda 1644003935 [youtube] meaningful error for age-gated no-embed videos 2021-03-21 02:40:20 +01:00
Lauren Liberda d7455472c7 - removed tvnplayer extractor 2021-03-21 00:08:12 +01:00
Sergey M. a688593c71 [yandexmusic:playlist] Request missing tracks in chunks (closes #27355, closes #28184) 2021-03-20 20:40:49 +01:00
Sergey M. ce1c406432 [yandexmusic:album] Simplify 2021-03-20 20:40:43 +01:00
Sergey M. ef06ab2626 [yandexmusic] Add support for music.yandex.com (closes #27425) 2021-03-20 20:40:37 +01:00
Sergey M. 5403f15eca [yandexmusic] DRY _VALID_URL base 2021-03-20 20:40:31 +01:00
Sergey M. 11e7d9a9bc [yandexmusic:album] Improve album title extraction (closes #27418) 2021-03-20 20:40:26 +01:00
Sergey M. ad4946376d [yandexmusic] Refactor and add support for artist's tracks and albums (closes #11887, closes #22284) 2021-03-20 20:40:20 +01:00
Lauren Liberda fae71efe4b [peertube] improve thumbnail extraction
Original author: remitamine
2021-03-20 20:35:40 +01:00
Lauren Liberda 2a36637212 [PATCH] [vimeo:album] Fix extraction for albums with number of videos multiple to page size
Original author: dstftw
2021-03-20 20:32:05 +01:00
Remita Amine 051da7778d [vvvvid] fix kenc format extraction(closes #28473) 2021-03-20 20:29:36 +01:00
Remita Amine 6faaa046ba [mlb] fix video extracion(#21241) 2021-03-20 20:29:29 +01:00
Sergey M. 3216bd2742 [svtplay] Improve extraction (closes #28448) 2021-03-20 20:29:24 +01:00
Remita Amine 8210d0d578 [applepodcasts] fix extraction(closes #28445) 2021-03-20 20:29:16 +01:00
Remita Amine 1df8de409f [rtve] improve extraction
- extract all formats
- fix RTVE Infantil extraction(closes #24851)
- extract is_live and series
2021-03-20 20:29:11 +01:00
Sergey M. bc2dfba575 [southpark] Fix extraction and add support for southparkstudios.com (closes #26763, closes #28413) 2021-03-20 20:29:05 +01:00
Remita Amine 7e5f6863ca [sportdeutschland] fix extraction(closes #21856)(closes #28425) 2021-03-20 20:28:59 +01:00
Remita Amine 8e580fb912 [pinterest] reduce the number of HLS format requests 2021-03-20 20:28:52 +01:00
Remita Amine a84bff7941 [tver] improve title extraction(closes #28418) 2021-03-20 20:28:41 +01:00
Remita Amine c07c6fd0bf [fujitv] fix HLS formats extension(closes #28416) 2021-03-20 20:28:35 +01:00
Remita Amine 0bf5bb20bb [shahid] fix format extraction(closes #28383) 2021-03-20 20:28:28 +01:00
Sergey M. 19f1ef28f1 [bandcamp] Extract release_timestamp 2021-03-20 20:28:21 +01:00
Sergey M. 06a0a2404e Introduce release_timestamp meta field (refs #28386) 2021-03-20 20:28:11 +01:00
Lauren Liberda b7c5d42047 [pornhub] Detect flagged videos
Original author: dstftw
2021-03-20 20:27:13 +01:00
Sergey M. 8332796684 [pornhub] Extract formats from get_media end point (#28395) 2021-03-20 20:25:13 +01:00
Remita Amine fd211154d3 [bilibili] fix video info extraction(closes #28341) 2021-03-20 20:25:07 +01:00
Remita Amine e6efc4cc87 [cbs] add support for Paramount+ (closes #28342) 2021-03-20 20:25:02 +01:00
Remita Amine 9f9d5f98fd [trovo] Add Origin header to VOD formats(closes #28346) 2021-03-20 20:24:57 +01:00
Remita Amine 6e95b224c2 [voxmedia] fix volume embed extraction(closes #28338) 2021-03-20 20:24:52 +01:00
Remita Amine 0eab1a6949 [9c9media] fix extraction for videos with multiple ContentPackages(closes #28309) 2021-03-20 20:24:44 +01:00
Remita Amine a28058ddeb [bbc] correct catched exception type 2021-03-20 20:24:38 +01:00
dirkf 62d5e81ff1 [bbc] add support for BBC Reel videos(closes #21870, closes #23660, closes #28268) 2021-03-20 20:24:32 +01:00
Sergey M. ef668c9585 [zdf] Rework extractors (closes #11606, closes #13473, closes #17354, closes #21185, closes #26711, closes #27068, closes #27930, closes #28198, closes #28199, closes #28274)

* Generalize unique video ids for zdf based extractors
* Improve extraction
* Fix 3sat and phoenix
2021-03-20 20:24:19 +01:00
Lauren Liberda 63755989fc fix the patch hook 2021-03-20 20:23:58 +01:00
Remita Amine f67e11c888 [stretchinternet] Fix extraction(closes #28297) 2021-03-20 20:20:26 +01:00
Remita Amine 28d7757c8b [urplay] fix episode data extraction(closes #28292) 2021-03-20 20:20:03 +01:00
Remita Amine d49b9356ce [bandaichannel] Add new extractor(closes #21404) 2021-03-20 20:19:49 +01:00
Lauren Liberda efffe9e670 [tvp:embed] extracting video subtitles 2021-03-20 16:13:40 +01:00
Lauren Liberda a4a4af8546 fix m3u8 parsing test 2021-03-16 22:23:26 +01:00
Lauren Liberda 8e8af58d04 fix possible crash 2021-03-16 22:23:13 +01:00
Lauren Liberda 4ddca367de support for vtt subtitles in m3u8 manifests 2021-03-16 21:34:15 +01:00
Lauren Liberda 3e7425297f [pulsevideo] unduplicating formats 2021-03-16 15:23:46 +01:00
Lauren Liberda 3b151afce7 [polskieradio] radiokierowcow.pl extractor 2021-03-16 14:57:21 +01:00
Lauren Liberda 999ab0298b [youtube] some formats are now just static 2021-03-15 23:50:56 +01:00
Lauren Liberda 510512606a [youtube] better signature handling for DASH formats 2021-03-15 23:25:24 +01:00
Lauren Liberda a8e3f00134 [generic] extracting mpd manifests properly 2021-03-14 00:15:17 +01:00
Lauren Liberda ca57ada0fc + bittorrent magnet extractor 2021-03-12 04:59:48 +01:00
Lauren Liberda ade6eb8abc [generic] detecting bittorrent manifest files 2021-03-12 04:10:40 +01:00
Lauren Liberda 6f3c4fd2f8 [peertube] bittorrent formats 2021-03-12 03:04:49 +01:00
Lauren Liberda 58538a2c64 initial bittorrent support 2021-03-12 03:04:37 +01:00
Lauren Liberda 3426d75467 [tiktok] hashtag and music extractors 2021-03-11 22:13:57 +01:00
Lauren Liberda 199edacd48 [onnetwork] refactor 2021-03-10 17:58:23 +01:00
Lauren Liberda 9e535b8762 [polskieradio] podcast support 2021-03-09 19:47:12 +01:00
Lauren Liberda 0b5407d6ec [youtube] more descriptive geo-lock messages (with countries) 2021-03-09 18:30:31 +01:00
Timothy Wynn c10469c0a8 Update go.py 2021-03-07 22:52:29 +01:00
Lauren Liberda 9759eb7182 removed a lot of deprecated platform support code 2021-03-05 22:45:56 +01:00
Lauren Liberda 5311710390 new exe build script 2021-03-05 21:12:35 +01:00
Lauren Liberda c42920795e [playwright] more verbose errors if --verbose 2021-03-05 15:41:47 +01:00
Lauren Liberda 0de898ecb5 [youtube] signature function caching 2021-03-04 21:05:12 +01:00
Lauren Liberda ec0abef671 fix links to ytdl issues 2021-03-04 14:22:51 +01:00
Lauren Liberda d5ad78cd0b pypy tests 2021-03-04 12:01:19 +01:00
Lauren Liberda 3e69892860 videotarget extractor 2021-03-03 23:15:44 +01:00
Lauren Liberda 3240e9f582 acast player extractor 2021-03-03 20:17:44 +01:00
Dominika Liberda ba5cda94c7 version 2021.03.01 2021-03-01 23:16:35 +01:00
Lauren Liberda 1786d6c1c4 [peertube] playlist, channel and account extractor 2021-03-01 21:44:26 +01:00
Lauren Liberda 0234f9eacc [cda] logging in with a user account 2021-03-01 18:18:43 +01:00
Laura Liberda 0f68e2ad09 remove some unused devscripts/docs 2021-02-27 02:32:47 +01:00
171 changed files with 7248 additions and 8234 deletions

.github/FUNDING.yml

@@ -0,0 +1,2 @@
github: selfisekai
ko_fi: selfisekai

.gitignore

@@ -15,6 +15,7 @@ haruhi-dl.1
haruhi-dl.bash-completion
haruhi-dl.fish
haruhi_dl/extractor/lazy_extractors.py
haruhi_dl/extractor_artifacts/
haruhi-dl
haruhi-dl.exe
haruhi-dl.tar.gz

.gitlab-ci.yml

@@ -1,8 +1,29 @@
default:
before_script:
- sed -i "s@dl-cdn.alpinelinux.org@alpine.sakamoto.pl@g" /etc/apk/repositories
- apk add bash
- pip install nose
pypy3.6-core:
image: pypy:3.6-slim
variables:
HDL_TEST_SET: core
before_script:
- apt-get update && apt-get install -y bash && apt-get clean
- pip install nose
script:
- ./devscripts/run_tests.sh
pypy3.7-core:
image: pypy:3.7-slim
variables:
HDL_TEST_SET: core
before_script:
- apt-get update && apt-get install -y bash && apt-get clean
- pip install nose
script:
- ./devscripts/run_tests.sh
py3.6-core:
image: python:3.6-alpine
variables:
@@ -39,18 +60,6 @@ py3.9-download:
script:
- ./devscripts/run_tests.sh
#jython-core:
# image: openjdk:11-slim
# variables:
# HDL_TEST_SET: core
# allow_failure: true
# before_script:
# - apt-get update
# - apt-get install -y wget
# - ./devscripts/install_jython.sh
# - export PATH="$HOME/jython/bin:$PATH"
# script:
# - ./devscripts/run_tests.sh
playwright-tests-core:
image: mcr.microsoft.com/playwright:focal

ChangeLog

@@ -1,3 +1,200 @@
version 2021.08.01
Extractor
* [youtube] fixed agegate
* [niconico] dmc downloader from youtube-dlp
* [peertube] new URL schemas
version 2021.06.20
Core
* [playwright] fixed headlessness
+ [playwright] option to force a specific browser
Extractor
* [tiktok] fix empty video lists
* [youtube] fix and speed-up age-gate circumvention
* [youtube] fix videos with JS-like syntax
version 2021.06.01
Core
* merging formats by codecs
* [json_ld] better author extraction
+ --force-use-mastodon option
* support for HTTP 308 redirects
+ [test_execution] add test for lazy extractors
* Improve extract_info doc
* [options] Fix thumbnail option group name
Extractor
* [tvp:series] fallback to web
- [ninateka] remove extractor
* [tvn24] refactor handling next.js frontend
* [cda] fix premium videos for premium users (?)
* [tvp] support for tvp.info vue.js pages
+ [sejm.gov.pl] new extractors
+ [senat.gov.pl] new extractors
* [spreaker] new url schemes
* [spreaker] support for embedded player
+ [spryciarze.pl] new extractors
+ [castos] new extractors
+ [magentamusik360] new extractor
+ [arnes] new extractor
+ [palcomp3] new extractor
* [screencastomatic] fix extraction
* [youku] update ccode
+ [line] support live.line.me
* [curiositystream] fix format extraction
* [jamendo] fix track extraction
* [pornhub] extracting DASH and HLS formats
* [mtv] fix Viacom A/B testing video player
+ [maoritv] new extractor
* [pluralsight] extend anti-throttling timeout
* [mastodon] support for soapbox and audio files
* [tvp] fix jp2.tvp.pl
* [youtube:channel] fix multiple page extraction
* [tvp:embed] handling formats better way
* [tvn] better extraction method choosing
* [tvp] fix tvp:website extracting with weird urls
+ [wppilot] new extractors
+ [mastodon] logging in to mastodon/pleroma
+ [mastodon] fetching posts via different instances
+ [mastodon] fetching peertube videos via pleroma instances
* [bbc] extract full description from __INITIAL_DATA__
* [tver] redirect all downloads to Brightcove
* [medaltv] fix extraction
* [francetvinfo] improve video id extraction
* [xfileshare] support for wolfstream.tv
* [tv2dk] fix extraction
* [svtplay] improve extraction
* [xtube] fix formats extraction
* [twitter] improve formats extraction from vmap URL
* [mastodon] cache apps on logging in
* [mastodon] support cards to external services
* [peertube] logging in
* [tiktok] deduplicate videos
+ [misskey] new extractor
+ [radiokapital] new extractors
* [youtube] fix videos with age gate
* [kaltura] Make embed code alternatives actually work
* [kaltura] Improve iframe extraction
* [dispeak] Improve FLV extraction
* [dispeak] DRY and update tests
* [gdcvault] Add support for HTML5 videos
* [funimation] Add support for optional lang code in URLs
* [medaltv] Relax _VALID_URL
- [blinkx] Remove extractor
* [orf:radio] Switch download URLs to HTTPS
+ [generic] Add Referer header for direct videojs download URLs
+ [vk] Add support for sibnet embeds
+ [generic] Add support for sibnet embeds
* [phoenix] Fix extraction
* [generic] Add support for og:audio
* [vivo] Add support for vivo.st
* [eroprofile] Fix extraction
* [playstuff] Add extractor
* [shahid] relax _VALID_URL
* [redbulltv] fix embed data extraction
* [vimeo] fix vimeo pro embed extraction
* [twitch:clips] Add access token query to download URLs
* [twitch:clips] Improve extraction
* [ted] Prefer own formats over external sources
* [ustream] Detect https embeds
* [ard] Relax _VALID_URL and fix video ids
version 2021.04.01
Core
- Removed Herobrine
Extractor
* [youtube] fixed GDPR consent workaround
* [instagram] improve title extraction and extract duration
* [francetvinfo] improve video ID extraction
* [vlive] merge all updates from YTDL
version 2021.03.30
Core
* `--ie-key` commandline option for selecting specific extractor
Extractor
* [tiktok] detect private videos
* [dw:article] fix extractor
+ [patroniteaudio] added extractor
+ [sbs] Add support for ondemand watch URLs
* [picarto] Fix live stream extraction
* [vimeo] Fix unlisted video extraction
* [ard] Improve clip id extraction
+ [zoom] Add support for zoom.us
* [bbc] Fix BBC IPlayer Episodes/Group extraction
* [zingmp3] Fix extraction
* [youtube] added workaround for cookie consent
version 2021.03.21
Core
* [playwright] More verbose errors
- Removed a lot of deprecated platform support code
* New win32 exe build system
+ Support for BitTorrent formats
+ Support for VTT subtitles in m3u8 (HLS) manifests
+ `release_timestamp` meta field
Extractor
+ [acast:player] new extractor
+ [videotarget] new extractor
* [youtube] caching extracted signature functions
* [go] fix extraction
* [youtube] more descriptive geo-lock messages (with countries)
* [polskieradio] podcast support
* [onnetwork] refactored extraction
+ [tiktok] hashtag and music extractors
* [peertube] bittorrent formats
* [generic] detecting bittorrent manifest files
+ bittorrent magnet extractor
* [generic] extracting mpd manifests properly
* [youtube] better signature handling for DASH formats
* [youtube] some DASH formats are now just static files
+ [polskieradio] radiokierowcow.pl extractor
* [pulsevideo] unduplicating formats
+ [tvp:embed] extracting video subtitles
+ [bandaichannel] Add new extractor
* [urplay] fix episode data extraction
* [stretchinternet] Fix extraction
* [zdf] Rework extractors
+ [bbc] add support for BBC Reel videos
* [9c9media] fix extraction for videos with multiple ContentPackages
* [voxmedia] fix volume embed extraction
* [trovo] Add Origin header to VOD formats
* [cbs] add support for Paramount+
* [bilibili] fix video info extraction
* [pornhub] Extract formats from get_media end point
* [pornhub] Detect flagged videos
* [bandcamp] Extract release_timestamp
* [shahid] fix format extraction
* [fujitv] fix HLS formats extension
* [tver] improve title extraction
* [pinterest] reduce the number of HLS format requests
* [sportdeutschland] fix extraction
* [southpark] Fix extraction and add support for southparkstudios.com
* [rtve] improve extraction
* [applepodcasts] fix extraction
* [svtplay] Improve extraction
* [mlb] fix video extracion
* [vvvvid] fix kenc format extraction
* [vimeo:album] Fix extraction for albums with number of videos multiple to page size
* [peertube] improve thumbnail extraction
* [yandexmusic] Refactor and add support for artist's tracks and albums
* [yandexmusic:album] Improve album title extraction
* [yandexmusic] DRY _VALID_URL base
* [yandexmusic] Add support for music.yandex.com
* [yandexmusic:playlist] Request missing tracks in chunks
- [tvnplayer] removed extractor
* [youtube] meaningful error for age-gated no-embed videos
version 2021.03.01
Extractor
* [cda] logging in with a user account
* [peertube] playlist, channel and account extractor
version 2021.02.27
Core
+ Use proxy sites option

Makefile

@@ -9,7 +9,7 @@ PREFIX ?= /usr/local
BINDIR ?= $(PREFIX)/bin
MANDIR ?= $(PREFIX)/man
SHAREDIR ?= $(PREFIX)/share
-PYTHON ?= /usr/bin/env python
+PYTHON ?= /usr/bin/env python3
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)

README.md

@@ -14,30 +14,84 @@ A Microsoft GitHub mirror exists as well: https://github.com/haruhi-dl/haruhi-dl
## Installing
haruhi-dl is available on PyPI: [![version on PyPI](https://img.shields.io/pypi/v/haruhi-dl?style=flat-square)](https://pypi.org/project/haruhi-dl/)
System-specific ways:
- [Windows .exe files](https://git.sakamoto.pl/laudompat/haruhi-dl/-/releases) ([mirror](https://github.com/haruhi-dl/haruhi-dl/releases)) - just unpack and run the exe file in cmd/powershell! (ffmpeg/rtmpdump not included, playwright extractors won't work)
- [Arch Linux (AUR)](https://aur.archlinux.org/packages/haruhi-dl/) - `yay -S haruhi-dl` (managed by mlunax)
- [macOS (homebrew)](https://formulae.brew.sh/formula/haruhi-dl) - `brew install haruhi-dl` (managed by Homebrew)
haruhi-dl is also available on PyPI: [![version on PyPI](https://img.shields.io/pypi/v/haruhi-dl?style=flat-square)](https://pypi.org/project/haruhi-dl/)
Install release from PyPI on Python 3.x:
```sh
$ python3 -m pip install --upgrade haruhi-dl
```
Install from master (unstable) on Python 3.x:
```sh
$ python3 -m pip install --upgrade git+https://git.sakamoto.pl/laudompat/haruhi-dl.git
```
-**Python 2 support is dropped and we recommend to switch to Python 3**, though it may still work.
+**Python 2 support is dropped, use Python 3.**
## Usage
```sh
$ haruhi-dl "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
```
That's it! You just got rickrolled!
Full manual with all options:
```sh
$ haruhi-dl --help
```
## Differences from youtube-dl
_This is not a complete list._
- Changed license from Unlicense to LGPL 3.0
- Extracting and downloading video with subtitles from m3u8 (HLS) - this also includes subtitles from Twitter and some other services
- Support for BitTorrent protocol (only used when explicitly enabled by user with `--allow-p2p` or `--prefer-p2p`; aria2c required; see the usage sketch below this list)
- Specific way to handle selfhosted services (untied to specific providers/domains, like PeerTube, Funkwhale, Mastodon)
- Specific way to handle content proxy sites (like Nitter for Twitter)
- Merging formats by codecs instead of file extensions, if possible (you'd rather have your AV1+opus downloads from YouTube be .webm than .mkv, wouldn't you?)
- New/improved/fixed extractors:
- PeerTube (extracting playlists, channels and user accounts, optionally downloading with BitTorrent)
- Funkwhale
- TikTok (extractors for user profiles, hashtags and music - all except single video and music with `--no-playlist` require Playwright)
- cda.pl
- Ipla
- Weibo (DASH formats)
- LinkedIn (videos from user posts)
- Acast
- Mastodon (including Pleroma, Gab Social, Soapbox)
- Ring Publishing (aka PulsEmbed, PulseVideo, OnetMVP; Ringier Axel Springer)
- TVP (support for TVPlayer2, client-rendered sites and TVP ABC, refactored some extractors to use mobile JSON API)
- TVN24 (support for main page, Fakty and magazine frontend)
- PolskieRadio
- Agora (wyborcza.pl video, wyborcza.pl/wysokieobcasy.pl/audycje.tokfm.pl podcasts, tuba.fm)
- sejm.gov.pl/senat.gov.pl
- Some improvements with handling JSON-LD
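A minimal usage sketch of a few fork-specific options named above and in the ChangeLog (`--allow-p2p`, `--force-use-mastodon`, `--ie-key`); the URLs below are placeholders, not real pages:
```sh
# BitTorrent formats are opt-in; aria2c must be installed
$ haruhi-dl --allow-p2p "https://peertube.example.com/videos/watch/11111111-2222-3333-4444-555555555555"

# force the Mastodon/Pleroma code path for a selfhosted instance that is not auto-detected
$ haruhi-dl --force-use-mastodon "https://pleroma.example.com/notice/A1b2C3d4E5f6"

# hand the URL directly to one extractor by its IE key, skipping URL pattern matching
$ haruhi-dl --ie-key PeerTube "https://peertube.example.com/videos/watch/11111111-2222-3333-4444-555555555555"
```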
## Bug reports
Please send the bug details to <bug@haruhi.download> or on [Microsoft GitHub](https://github.com/haruhi-dl/haruhi-dl/issues).
## Contributing
If you want to contribute, send us a diff to <contribute@haruhi.download>, or submit a Pull Request on [our mirror at Microsoft GitHub](https://github.com/haruhi-dl/haruhi-dl).
Why contribute to this fork, and not youtube-dl?
- You make sure your contributions will always be free - under the Unlicense, anyone can take your code, modify it, and close the source. LGPL 3.0 makes it clear that any contributions must be published.
## Donations
If my contributions helped you, please consider sending me a small tip.
[![Buy Me a Coffee at ko-fi.com](https://cdn.ko-fi.com/cdn/kofi1.png?v=2)](https://ko-fi.com/selfisekai)

@@ -1,6 +1,6 @@
#!/bin/bash
data="$(curl -s "https://www.youtube.com/s/player/$1/player_ias.vflset/en_GB/base.js")"
func="$(grep -P '[a-z]\=a\.split.*a\.join' <<< "$data")"
func="$(grep -P '[a-z]\=a\.split\([\"'"'"']{2}.*a\.join' <<< "$data")"
echo "full extracted function: $func"
obfuscatedName="$(grep -Poh '\(""\);[A-Za-z]+' <<< "$func" | sed -s 's/("");//')"

devscripts/buildserver.py

@@ -1,433 +0,0 @@
#!/usr/bin/python3
import argparse
import ctypes
import functools
import shutil
import subprocess
import sys
import tempfile
import threading
import traceback
import os.path
sys.path.insert(0, os.path.dirname(os.path.dirname((os.path.abspath(__file__)))))
from haruhi_dl.compat import (
compat_input,
compat_http_server,
compat_str,
compat_urlparse,
)
# These are not used outside of buildserver.py thus not in compat.py
try:
import winreg as compat_winreg
except ImportError: # Python 2
import _winreg as compat_winreg
try:
import socketserver as compat_socketserver
except ImportError: # Python 2
import SocketServer as compat_socketserver
class BuildHTTPServer(compat_socketserver.ThreadingMixIn, compat_http_server.HTTPServer):
allow_reuse_address = True
advapi32 = ctypes.windll.advapi32
SC_MANAGER_ALL_ACCESS = 0xf003f
SC_MANAGER_CREATE_SERVICE = 0x02
SERVICE_WIN32_OWN_PROCESS = 0x10
SERVICE_AUTO_START = 0x2
SERVICE_ERROR_NORMAL = 0x1
DELETE = 0x00010000
SERVICE_STATUS_START_PENDING = 0x00000002
SERVICE_STATUS_RUNNING = 0x00000004
SERVICE_ACCEPT_STOP = 0x1
SVCNAME = 'youtubedl_builder'
LPTSTR = ctypes.c_wchar_p
START_CALLBACK = ctypes.WINFUNCTYPE(None, ctypes.c_int, ctypes.POINTER(LPTSTR))
class SERVICE_TABLE_ENTRY(ctypes.Structure):
_fields_ = [
('lpServiceName', LPTSTR),
('lpServiceProc', START_CALLBACK)
]
HandlerEx = ctypes.WINFUNCTYPE(
ctypes.c_int, # return
ctypes.c_int, # dwControl
ctypes.c_int, # dwEventType
ctypes.c_void_p, # lpEventData,
ctypes.c_void_p, # lpContext,
)
def _ctypes_array(c_type, py_array):
ar = (c_type * len(py_array))()
ar[:] = py_array
return ar
def win_OpenSCManager():
res = advapi32.OpenSCManagerW(None, None, SC_MANAGER_ALL_ACCESS)
if not res:
raise Exception('Opening service manager failed - '
'are you running this as administrator?')
return res
def win_install_service(service_name, cmdline):
manager = win_OpenSCManager()
try:
h = advapi32.CreateServiceW(
manager, service_name, None,
SC_MANAGER_CREATE_SERVICE, SERVICE_WIN32_OWN_PROCESS,
SERVICE_AUTO_START, SERVICE_ERROR_NORMAL,
cmdline, None, None, None, None, None)
if not h:
raise OSError('Service creation failed: %s' % ctypes.FormatError())
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_uninstall_service(service_name):
manager = win_OpenSCManager()
try:
h = advapi32.OpenServiceW(manager, service_name, DELETE)
if not h:
raise OSError('Could not find service %s: %s' % (
service_name, ctypes.FormatError()))
try:
if not advapi32.DeleteService(h):
raise OSError('Deletion failed: %s' % ctypes.FormatError())
finally:
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_service_report_event(service_name, msg, is_error=True):
with open('C:/sshkeys/log', 'a', encoding='utf-8') as f:
f.write(msg + '\n')
event_log = advapi32.RegisterEventSourceW(None, service_name)
if not event_log:
raise OSError('Could not report event: %s' % ctypes.FormatError())
try:
type_id = 0x0001 if is_error else 0x0004
event_id = 0xc0000000 if is_error else 0x40000000
lines = _ctypes_array(LPTSTR, [msg])
if not advapi32.ReportEventW(
event_log, type_id, 0, event_id, None, len(lines), 0,
lines, None):
raise OSError('Event reporting failed: %s' % ctypes.FormatError())
finally:
advapi32.DeregisterEventSource(event_log)
def win_service_handler(stop_event, *args):
try:
raise ValueError('Handler called with args ' + repr(args))
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_set_status(handle, status_code):
svcStatus = SERVICE_STATUS()
svcStatus.dwServiceType = SERVICE_WIN32_OWN_PROCESS
svcStatus.dwCurrentState = status_code
svcStatus.dwControlsAccepted = SERVICE_ACCEPT_STOP
svcStatus.dwServiceSpecificExitCode = 0
if not advapi32.SetServiceStatus(handle, ctypes.byref(svcStatus)):
raise OSError('SetServiceStatus failed: %r' % ctypes.FormatError())
def win_service_main(service_name, real_main, argc, argv_raw):
try:
# args = [argv_raw[i].value for i in range(argc)]
stop_event = threading.Event()
handler = HandlerEx(functools.partial(stop_event, win_service_handler))
h = advapi32.RegisterServiceCtrlHandlerExW(service_name, handler, None)
if not h:
raise OSError('Handler registration failed: %s' %
ctypes.FormatError())
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_start(service_name, real_main):
try:
cb = START_CALLBACK(
functools.partial(win_service_main, service_name, real_main))
dispatch_table = _ctypes_array(SERVICE_TABLE_ENTRY, [
SERVICE_TABLE_ENTRY(
service_name,
cb
),
SERVICE_TABLE_ENTRY(None, ctypes.cast(None, START_CALLBACK))
])
if not advapi32.StartServiceCtrlDispatcherW(dispatch_table):
raise OSError('ctypes start failed: %s' % ctypes.FormatError())
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def main(args=None):
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--install',
action='store_const', dest='action', const='install',
help='Launch at Windows startup')
parser.add_argument('-u', '--uninstall',
action='store_const', dest='action', const='uninstall',
help='Remove Windows service')
parser.add_argument('-s', '--service',
action='store_const', dest='action', const='service',
help='Run as a Windows service')
parser.add_argument('-b', '--bind', metavar='<host:port>',
action='store', default='0.0.0.0:8142',
help='Bind to host:port (default %default)')
options = parser.parse_args(args=args)
if options.action == 'install':
fn = os.path.abspath(__file__).replace('v:', '\\\\vboxsrv\\vbox')
cmdline = '%s %s -s -b %s' % (sys.executable, fn, options.bind)
win_install_service(SVCNAME, cmdline)
return
if options.action == 'uninstall':
win_uninstall_service(SVCNAME)
return
if options.action == 'service':
win_service_start(SVCNAME, main)
return
host, port_str = options.bind.split(':')
port = int(port_str)
print('Listening on %s:%d' % (host, port))
srv = BuildHTTPServer((host, port), BuildHTTPRequestHandler)
thr = threading.Thread(target=srv.serve_forever)
thr.start()
compat_input('Press ENTER to shut down')
srv.shutdown()
thr.join()
def rmtree(path):
for name in os.listdir(path):
fname = os.path.join(path, name)
if os.path.isdir(fname):
rmtree(fname)
else:
os.chmod(fname, 0o666)
os.remove(fname)
os.rmdir(path)
class BuildError(Exception):
def __init__(self, output, code=500):
self.output = output
self.code = code
def __str__(self):
return self.output
class HTTPError(BuildError):
pass
class PythonBuilder(object):
def __init__(self, **kwargs):
python_version = kwargs.pop('python', '3.4')
python_path = None
for node in ('Wow6432Node\\', ''):
try:
key = compat_winreg.OpenKey(
compat_winreg.HKEY_LOCAL_MACHINE,
r'SOFTWARE\%sPython\PythonCore\%s\InstallPath' % (node, python_version))
try:
python_path, _ = compat_winreg.QueryValueEx(key, '')
finally:
compat_winreg.CloseKey(key)
break
except Exception:
pass
if not python_path:
raise BuildError('No such Python version: %s' % python_version)
self.pythonPath = python_path
super(PythonBuilder, self).__init__(**kwargs)
class GITInfoBuilder(object):
def __init__(self, **kwargs):
try:
self.user, self.repoName = kwargs['path'][:2]
self.rev = kwargs.pop('rev')
except ValueError:
raise BuildError('Invalid path')
except KeyError as e:
raise BuildError('Missing mandatory parameter "%s"' % e.args[0])
path = os.path.join(os.environ['APPDATA'], 'Build archive', self.repoName, self.user)
if not os.path.exists(path):
os.makedirs(path)
self.basePath = tempfile.mkdtemp(dir=path)
self.buildPath = os.path.join(self.basePath, 'build')
super(GITInfoBuilder, self).__init__(**kwargs)
class GITBuilder(GITInfoBuilder):
def build(self):
try:
subprocess.check_output(['git', 'clone', 'git://github.com/%s/%s.git' % (self.user, self.repoName), self.buildPath])
subprocess.check_output(['git', 'checkout', self.rev], cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(GITBuilder, self).build()
class HaruhiDLBuilder(object):
authorizedUsers = ['fraca7', 'phihag', 'rg3', 'FiloSottile', 'ytdl-org']
def __init__(self, **kwargs):
if self.repoName != 'haruhi-dl':
raise BuildError('Invalid repository "%s"' % self.repoName)
if self.user not in self.authorizedUsers:
raise HTTPError('Unauthorized user "%s"' % self.user, 401)
super(HaruhiDLBuilder, self).__init__(**kwargs)
def build(self):
try:
proc = subprocess.Popen([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'], stdin=subprocess.PIPE, cwd=self.buildPath)
proc.wait()
#subprocess.check_output([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'],
# cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(HaruhiDLBuilder, self).build()
class DownloadBuilder(object):
def __init__(self, **kwargs):
self.handler = kwargs.pop('handler')
self.srcPath = os.path.join(self.buildPath, *tuple(kwargs['path'][2:]))
self.srcPath = os.path.abspath(os.path.normpath(self.srcPath))
if not self.srcPath.startswith(self.buildPath):
raise HTTPError(self.srcPath, 401)
super(DownloadBuilder, self).__init__(**kwargs)
def build(self):
if not os.path.exists(self.srcPath):
raise HTTPError('No such file', 404)
if os.path.isdir(self.srcPath):
raise HTTPError('Is a directory: %s' % self.srcPath, 401)
self.handler.send_response(200)
self.handler.send_header('Content-Type', 'application/octet-stream')
self.handler.send_header('Content-Disposition', 'attachment; filename=%s' % os.path.split(self.srcPath)[-1])
self.handler.send_header('Content-Length', str(os.stat(self.srcPath).st_size))
self.handler.end_headers()
with open(self.srcPath, 'rb') as src:
shutil.copyfileobj(src, self.handler.wfile)
super(DownloadBuilder, self).build()
class CleanupTempDir(object):
def build(self):
try:
rmtree(self.basePath)
except Exception as e:
print('WARNING deleting "%s": %s' % (self.basePath, e))
super(CleanupTempDir, self).build()
class Null(object):
def __init__(self, **kwargs):
pass
def start(self):
pass
def close(self):
pass
def build(self):
pass
class Builder(PythonBuilder, GITBuilder, HaruhiDLBuilder, DownloadBuilder, CleanupTempDir, Null):
pass
class BuildHTTPRequestHandler(compat_http_server.BaseHTTPRequestHandler):
actionDict = {'build': Builder, 'download': Builder} # They're the same, no more caching.
def do_GET(self):
path = compat_urlparse.urlparse(self.path)
paramDict = dict([(key, value[0]) for key, value in compat_urlparse.parse_qs(path.query).items()])
action, _, path = path.path.strip('/').partition('/')
if path:
path = path.split('/')
if action in self.actionDict:
try:
builder = self.actionDict[action](path=path, handler=self, **paramDict)
builder.start()
try:
builder.build()
finally:
builder.close()
except BuildError as e:
self.send_response(e.code)
msg = compat_str(e).encode('UTF-8')
self.send_header('Content-Type', 'text/plain; charset=UTF-8')
self.send_header('Content-Length', len(msg))
self.end_headers()
self.wfile.write(msg)
else:
self.send_response(500, 'Unknown build method "%s"' % action)
else:
self.send_response(500, 'Malformed URL')
if __name__ == '__main__':
main()

@@ -5,6 +5,17 @@
module.exports = function patchHook(patchContent) {
[
[/(?:youtube-|yt-?)dl\.org/g, 'haruhi.download'],
// fork: https://github.com/blackjack4494/yt-dlc
[/youtube_dlc/g, 'haruhi_dl'],
[/youtube-dlc/g, 'haruhi-dl'],
[/ytdlc/g, 'hdl'],
[/yt-dlc/g, 'hdl'],
// fork: https://github.com/yt-dlp/yt-dlp
[/yt_dlp/g, 'haruhi_dl'],
[/yt-dlp/g, 'haruhi-dl'],
[/ytdlp/g, 'hdl'],
[/youtube_dl/g, 'haruhi_dl'],
[/youtube-dl/g, 'haruhi-dl'],
[/youtubedl/g, 'haruhidl'],
@@ -14,8 +25,10 @@ module.exports = function patchHook(patchContent) {
[/ydl/g, 'hdl'],
// prevent from linking to non-existent repository
-[/github\.com\/ytdl-org\/haruhi-dl/g, 'github.com/ytdl-org/youtube-dl'],
+[/github\.com\/(?:yt|h)dl-org\/haruhi-dl/g, 'github.com/ytdl-org/youtube-dl'],
[/github\.com\/rg3\/haruhi-dl/g, 'github.com/ytdl-org/youtube-dl'],
[/github\.com\/blackjack4494\/hdl/g, 'github.com/blackjack4494/yt-dlc'],
[/github\.com\/hdl\/hdl/g, 'github.com/yt-dlp/yt-dlp'],
// prevent changing the smuggle URLs (for compatibility with ytdl)
[/__haruhidl_smuggle/g, '__youtubedl_smuggle'],
].forEach(([regex, replacement]) => patchContent = patchContent.replace(regex, replacement));

devscripts/install_jython.sh

@@ -1,5 +0,0 @@
#!/bin/bash
wget https://repo1.maven.org/maven2/org/python/jython-installer/2.7.2/jython-installer-2.7.2.jar
java -jar jython-installer-2.7.2.jar -s -d "$HOME/jython"
$HOME/jython/bin/jython -m pip install nose

devscripts/make_contributing.py

@@ -1,33 +0,0 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
import re
def main():
parser = optparse.OptionParser(usage='%prog INFILE OUTFILE')
options, args = parser.parse_args()
if len(args) != 2:
parser.error('Expected an input and an output filename')
infile, outfile = args
with io.open(infile, encoding='utf-8') as inf:
readme = inf.read()
bug_text = re.search(
r'(?s)#\s*BUGS\s*[^\n]*\s*(.*?)#\s*COPYRIGHT', readme).group(1)
dev_text = re.search(
r'(?s)(#\s*DEVELOPER INSTRUCTIONS.*?)#\s*EMBEDDING YOUTUBE-DL',
readme).group(1)
out = bug_text + dev_text
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
if __name__ == '__main__':
main()

devscripts/make_issue_template.py

@@ -1,29 +0,0 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
def main():
parser = optparse.OptionParser(usage='%prog INFILE OUTFILE')
options, args = parser.parse_args()
if len(args) != 2:
parser.error('Expected an input and an output filename')
infile, outfile = args
with io.open(infile, encoding='utf-8') as inf:
issue_template_tmpl = inf.read()
# Get the version from haruhi_dl/version.py without importing the package
exec(compile(open('haruhi_dl/version.py').read(),
'haruhi_dl/version.py', 'exec'))
out = issue_template_tmpl % {'version': locals()['__version__']}
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
if __name__ == '__main__':
main()

devscripts/make_readme.py

@@ -1,26 +0,0 @@
from __future__ import unicode_literals
import io
import sys
import re
README_FILE = 'README.md'
helptext = sys.stdin.read()
if isinstance(helptext, bytes):
helptext = helptext.decode('utf-8')
with io.open(README_FILE, encoding='utf-8') as f:
oldreadme = f.read()
header = oldreadme[:oldreadme.index('# OPTIONS')]
footer = oldreadme[oldreadme.index('# CONFIGURATION'):]
options = helptext[helptext.index(' General Options:') + 19:]
options = re.sub(r'(?m)^ (\w.+)$', r'## \1', options)
options = '# OPTIONS\n' + options + '\n'
with io.open(README_FILE, 'w', encoding='utf-8') as f:
f.write(header)
f.write(options)
f.write(footer)

devscripts/make_supportedsites.py

@@ -1,46 +0,0 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
import os
import sys
# Import haruhi_dl
ROOT_DIR = os.path.join(os.path.dirname(__file__), '..')
sys.path.insert(0, ROOT_DIR)
import haruhi_dl
def main():
parser = optparse.OptionParser(usage='%prog OUTFILE.md')
options, args = parser.parse_args()
if len(args) != 1:
parser.error('Expected an output filename')
outfile, = args
def gen_ies_md(ies):
for ie in ies:
ie_md = '**{0}**'.format(ie.IE_NAME)
ie_desc = getattr(ie, 'IE_DESC', None)
if ie_desc is False:
continue
if ie_desc is not None:
ie_md += ': {0}'.format(ie.IE_DESC)
if not ie.working():
ie_md += ' (Currently broken)'
yield ie_md
ies = sorted(haruhi_dl.gen_extractors(), key=lambda i: i.IE_NAME.lower())
out = '# Supported sites\n' + ''.join(
' - ' + md + '\n'
for md in gen_ies_md(ies))
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
if __name__ == '__main__':
main()

devscripts/prerelease_codegen.py

@@ -0,0 +1,32 @@
# this is intended to speed-up some extractors,
# which sometimes need to extract some data that doesn't change very often,
# but does change at random times, like youtube's signature "crypto" or soundcloud's client id
import os
from os.path import dirname as dirn
import sys
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
from haruhi_dl import HaruhiDL
from haruhi_dl.utils import (
ExtractorError,
)
hdl = HaruhiDL(params={
'quiet': True,
})
artifact_dir = os.path.join(dirn(dirn((os.path.abspath(__file__)))), 'haruhi_dl', 'extractor_artifacts')
if not os.path.exists(artifact_dir):
os.mkdir(artifact_dir)
for ie_name in (
'Youtube',
'Soundcloud',
):
ie = hdl.get_info_extractor(ie_name)
try:
file_contents = ie._generate_prerelease_file()
with open(os.path.join(artifact_dir, ie_name.lower() + '.py'), 'w') as file:
file.write(file_contents)
except ExtractorError as err:
print(err)

devscripts/release.sh

@@ -1,141 +1,24 @@
#!/bin/bash
# IMPORTANT: the following assumptions are made
# * the GH repo is on the origin remote
# * the gh-pages branch is named so locally
# * the git config user.signingkey is properly set
# You will need
# pip install coverage nose rsa wheel
# TODO
# release notes
# make hash on local files
set -e
skip_tests=true
gpg_sign_commits=""
buildserver='localhost:8142'
while true
do
case "$1" in
--run-tests)
skip_tests=false
shift
;;
--gpg-sign-commits|-S)
gpg_sign_commits="-S"
shift
;;
--buildserver)
buildserver="$2"
shift 2
;;
--*)
echo "ERROR: unknown option $1"
exit 1
;;
*)
break
;;
esac
done
if [ -z "$1" ]; then echo "ERROR: specify version number like this: $0 1994.09.06"; exit 1; fi
version="$1"
major_version=$(echo "$version" | sed -n 's#^\([0-9]*\.[0-9]*\.[0-9]*\).*#\1#p')
if test "$major_version" '!=' "$(date '+%Y.%m.%d')"; then
echo "$version does not start with today's date!"
exit 1
if [[ "$(basename $(pwd))" == 'devscripts' ]]; then
cd ..
fi
if [ ! -z "`git tag | grep "$version"`" ]; then echo 'ERROR: version already present'; exit 1; fi
if [ ! -z "`git status --porcelain | grep -v CHANGELOG`" ]; then echo 'ERROR: the working directory is not clean; commit or stash changes'; exit 1; fi
useless_files=$(find haruhi_dl -type f -not -name '*.py')
if [ ! -z "$useless_files" ]; then echo "ERROR: Non-.py files in haruhi_dl: $useless_files"; exit 1; fi
if [ ! -f "updates_key.pem" ]; then echo 'ERROR: updates_key.pem missing'; exit 1; fi
if ! type pandoc >/dev/null 2>/dev/null; then echo 'ERROR: pandoc is missing'; exit 1; fi
if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi
if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi
v="$(date "+%Y.%m.%d")"
read -p "Is ChangeLog up to date? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
/bin/echo -e "\n### First of all, testing..."
make clean
if $skip_tests ; then
echo 'SKIPPING TESTS'
else
nosetests --verbose --with-coverage --cover-package=haruhi_dl --cover-html test --stop || exit 1
if [[ "$(grep "'$v" haruhi_dl/version.py)" != '' ]]; then #' is this the first release of the day?
if [[ "$(grep -Poh '[0-9]{4}\.[0-9]{2}\.[0-9]{2}\.[0-9]' haruhi_dl/version.py)" != '' ]]; then # so, 2nd or nth?
v="$v.$(($(cat haruhi_dl/version.py | grep -Poh '[0-9]{4}\.[0-9]{2}\.[0-9]{2}\.[0-9]' | grep -Poh '[0-9]+$')+1))"
else
v="$v.1"
fi
fi
/bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" haruhi_dl/version.py
sed "s/__version__ = '.*'/__version__ = '$v'/g" -i haruhi_dl/version.py
/bin/echo -e "\n### Changing version in ChangeLog..."
sed -i "s/<unreleased>/$version/" ChangeLog
/bin/echo -e "\n### Committing documentation, templates and haruhi_dl/version.py..."
make README.md CONTRIBUTING.md issuetemplates supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE/1_broken_site.md .github/ISSUE_TEMPLATE/2_site_support_request.md .github/ISSUE_TEMPLATE/3_site_feature_request.md .github/ISSUE_TEMPLATE/4_bug_report.md .github/ISSUE_TEMPLATE/5_feature_request.md .github/ISSUE_TEMPLATE/6_question.md docs/supportedsites.md haruhi_dl/version.py ChangeLog
git commit $gpg_sign_commits -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..."
git tag -s -m "Release $version" "$version"
git show "$version"
read -p "Is it good, can I push? (y/n) " -n 1
if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
echo
MASTER=$(git rev-parse --abbrev-ref HEAD)
git push origin $MASTER:master
git push origin "$version"
/bin/echo -e "\n### OK, now it is time to build the binaries..."
REV=$(git rev-parse HEAD)
make haruhi-dl haruhi-dl.tar.gz
read -p "VM running? (y/n) " -n 1
wget "http://$buildserver/build/ytdl-org/haruhi-dl/haruhi-dl.exe?rev=$REV" -O haruhi-dl.exe
mkdir -p "build/$version"
mv haruhi-dl haruhi-dl.exe "build/$version"
mv haruhi-dl.tar.gz "build/$version/haruhi-dl-$version.tar.gz"
RELEASE_FILES="haruhi-dl haruhi-dl.exe haruhi-dl-$version.tar.gz"
(cd build/$version/ && md5sum $RELEASE_FILES > MD5SUMS)
(cd build/$version/ && sha1sum $RELEASE_FILES > SHA1SUMS)
(cd build/$version/ && sha256sum $RELEASE_FILES > SHA2-256SUMS)
(cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS)
/bin/echo -e "\n### Signing and uploading the new binaries to GitHub..."
for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done
ROOT=$(pwd)
python devscripts/create-github-release.py ChangeLog $version "$ROOT/build/$version"
#ssh ytdl@yt-dl.org "sh html/update_latest.sh $version"
/bin/echo -e "\n### Now switching to gh-pages..."
git clone --branch gh-pages --single-branch . build/gh-pages
(
set -e
ORIGIN_URL=$(git config --get remote.origin.url)
cd build/gh-pages
"$ROOT/devscripts/gh-pages/add-version.py" $version
"$ROOT/devscripts/gh-pages/update-feed.py"
"$ROOT/devscripts/gh-pages/sign-versions.py" < "$ROOT/updates_key.pem"
"$ROOT/devscripts/gh-pages/generate-download.py"
"$ROOT/devscripts/gh-pages/update-copyright.py"
"$ROOT/devscripts/gh-pages/update-sites.py"
git add *.html *.html.in update
git commit $gpg_sign_commits -m "release $version"
git push "$ROOT" gh-pages
git push "$ORIGIN_URL" gh-pages
)
rm -rf build
make pypi-files
echo "Uploading to PyPi ..."
python setup.py sdist bdist_wheel upload
make clean
/bin/echo -e "\n### DONE!"
python3 setup.py build_lazy_extractors
python3 devscripts/prerelease_codegen.py
rm -R build dist
python3 setup.py sdist bdist_wheel
python3 -m twine upload dist/*
devscripts/wine-py2exe.sh setup.py
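
The same-day version-suffix logic above is dense in grep/sed form; roughly the same computation in Python (a sketch for illustration only, assuming version.py contains a line like `__version__ = '2021.06.24'`):

import datetime
import re

def next_version(version_py):
    # First release of the day gets YYYY.MM.DD; subsequent ones YYYY.MM.DD.n
    today = datetime.date.today().strftime('%Y.%m.%d')
    if "'" + today not in version_py:
        return today
    m = re.search(r'\d{4}\.\d{2}\.\d{2}\.(\d+)', version_py)
    return '%s.%s' % (today, int(m.group(1)) + 1 if m else 1)

print(next_version("__version__ = '2021.06.24'"))  # today's date, or '<today>.1' on 2021-06-24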


@ -2,7 +2,8 @@
# Run with, as its parameter, a setup.py that works in the current directory
# e.g. one with no os.chdir()
# It will run twice; the first run will crash
# Wine >=6.3 required: https://bugs.winehq.org/show_bug.cgi?id=3591
set -e
@ -10,36 +11,30 @@ SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
if [ ! -d wine-py2exe ]; then
sudo apt-get install wine1.3 axel bsdiff
mkdir wine-py2exe
cd wine-py2exe
export WINEPREFIX=`pwd`
axel -a "http://www.python.org/ftp/python/2.7/python-2.7.msi"
axel -a "http://downloads.sourceforge.net/project/py2exe/py2exe/0.6.9/py2exe-0.6.9.win32-py2.7.exe"
#axel -a "http://winetricks.org/winetricks"
echo "Downloading Python 3.8.8"
aria2c "https://www.python.org/ftp/python/3.8.8/python-3.8.8.exe"
# this will need to be upgraded when switching to a newer version of python
winetricks win7
# http://appdb.winehq.org/objectManager.php?sClass=version&iId=21957
echo "Follow python setup on screen"
wine msiexec /i python-2.7.msi
echo "Installing Python 3.8.8"
wine python-3.8.8.exe /quiet InstallAllUsers=1 'DefaultAllUsersTargetDir=C:\\python38'
echo "Follow py2exe setup on screen"
wine py2exe-0.6.9.win32-py2.7.exe
echo "Installing py2exe"
wine 'C:\\python38\\python.exe' -m pip install wheel
wine 'C:\\python38\\python.exe' -m pip install py2exe
#wine 'C:\\python38\\python.exe' -m pip install playwright===1.9.0
#wine 'C:\\python38\\python.exe' -m playwright install
#echo "Follow Microsoft Visual C++ 2008 Redistributable Package setup on screen"
#bash winetricks vcrun2008
rm py2exe-0.6.9.win32-py2.7.exe
rm python-2.7.msi
#rm winetricks
# http://bugs.winehq.org/show_bug.cgi?id=3591
mv drive_c/Python27/Lib/site-packages/py2exe/run.exe drive_c/Python27/Lib/site-packages/py2exe/run.exe.backup
bspatch drive_c/Python27/Lib/site-packages/py2exe/run.exe.backup drive_c/Python27/Lib/site-packages/py2exe/run.exe "$SCRIPT_DIR/SizeOfImage.patch"
mv drive_c/Python27/Lib/site-packages/py2exe/run_w.exe drive_c/Python27/Lib/site-packages/py2exe/run_w.exe.backup
bspatch drive_c/Python27/Lib/site-packages/py2exe/run_w.exe.backup drive_c/Python27/Lib/site-packages/py2exe/run_w.exe "$SCRIPT_DIR/SizeOfImage_w.patch"
rm python-3.8.8.exe
cd -
@ -49,8 +44,8 @@ else
fi
wine "C:\\Python27\\python.exe" "$1" py2exe > "py2exe.log" 2>&1 || true
echo '# Copying python27.dll' >> "py2exe.log"
cp "$WINEPREFIX/drive_c/windows/system32/python27.dll" build/bdist.win32/winexe/bundle-2.7/
wine "C:\\Python27\\python.exe" "$1" py2exe >> "py2exe.log" 2>&1
mkdir -p build/bdist.win32/winexe/bundle-3.8/
# cp "$WINEPREFIX/drive_c/python38/python38.dll" build/bdist.win32/winexe/bundle-3.8/
echo "Making the exe file"
# cannot be piped into a file: https://forum.winehq.org/viewtopic.php?t=33992
wine 'C:\\python38\\python.exe' "$1" py2exe | tee py2exe.log

docs/.gitignore

@ -1 +0,0 @@
_build/


@ -1,177 +0,0 @@
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
help:
@echo "Please use \`make <target>' where <target> is one of"
@echo " html to make standalone HTML files"
@echo " dirhtml to make HTML files named index.html in directories"
@echo " singlehtml to make a single large HTML file"
@echo " pickle to make pickle files"
@echo " json to make JSON files"
@echo " htmlhelp to make HTML files and a HTML help project"
@echo " qthelp to make HTML files and a qthelp project"
@echo " devhelp to make HTML files and a Devhelp project"
@echo " epub to make an epub"
@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
@echo " latexpdf to make LaTeX files and run them through pdflatex"
@echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
@echo " text to make text files"
@echo " man to make manual pages"
@echo " texinfo to make Texinfo files"
@echo " info to make Texinfo files and run them through makeinfo"
@echo " gettext to make PO message catalogs"
@echo " changes to make an overview of all changed/added/deprecated items"
@echo " xml to make Docutils-native XML files"
@echo " pseudoxml to make pseudoxml-XML files for display purposes"
@echo " linkcheck to check all external links for integrity"
@echo " doctest to run all doctests embedded in the documentation (if enabled)"
clean:
rm -rf $(BUILDDIR)/*
html:
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
dirhtml:
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
singlehtml:
$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
@echo
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
pickle:
$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
@echo
@echo "Build finished; now you can process the pickle files."
json:
$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
@echo
@echo "Build finished; now you can process the JSON files."
htmlhelp:
$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
@echo
@echo "Build finished; now you can run HTML Help Workshop with the" \
".hhp project file in $(BUILDDIR)/htmlhelp."
qthelp:
$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
@echo
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/haruhi-dl.qhcp"
@echo "To view the help file:"
@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/haruhi-dl.qhc"
devhelp:
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
@echo
@echo "Build finished."
@echo "To view the help file:"
@echo "# mkdir -p $$HOME/.local/share/devhelp/haruhi-dl"
@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/haruhi-dl"
@echo "# devhelp"
epub:
$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
@echo
@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
latex:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo
@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
@echo "Run \`make' in that directory to run these through (pdf)latex" \
"(use \`make latexpdf' here to do that automatically)."
latexpdf:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through pdflatex..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
latexpdfja:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through platex and dvipdfmx..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
text:
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
@echo
@echo "Build finished. The text files are in $(BUILDDIR)/text."
man:
$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
@echo
@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
texinfo:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo
@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
@echo "Run \`make' in that directory to run these through makeinfo" \
"(use \`make info' here to do that automatically)."
info:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo "Running Texinfo files through makeinfo..."
make -C $(BUILDDIR)/texinfo info
@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
gettext:
$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
@echo
@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
changes:
$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
@echo
@echo "The overview file is in $(BUILDDIR)/changes."
linkcheck:
$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
@echo
@echo "Link check complete; look for any errors in the above output " \
"or in $(BUILDDIR)/linkcheck/output.txt."
doctest:
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
@echo "Testing of doctests in the sources finished, look at the " \
"results in $(BUILDDIR)/doctest/output.txt."
xml:
$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
@echo
@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
pseudoxml:
$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
@echo
@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."


@ -1,71 +0,0 @@
# coding: utf-8
#
# haruhi-dl documentation build configuration file, created by
# sphinx-quickstart on Fri Mar 14 21:05:43 2014.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys
import os
# Allows importing haruhi_dl
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# -- General configuration ------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'haruhi-dl'
copyright = u'2014, Ricardo Garcia Gonzalez'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
from haruhi_dl.version import __version__
version = __version__
# The full version, including alpha/beta/rc tags.
release = version
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build']
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'default'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Output file base name for HTML help builder.
htmlhelp_basename = 'haruhi-dldoc'


@ -1,23 +0,0 @@
Welcome to haruhi-dl's documentation!
======================================
*haruhi-dl* is a command-line program to download videos from YouTube.com and other sites.
It can also be used in Python code.
Developer guide
---------------
This section contains information for using *haruhi-dl* from Python programs.
.. toctree::
:maxdepth: 2
module_guide
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


@ -1,67 +0,0 @@
Using the ``haruhi_dl`` module
===============================
When using the ``haruhi_dl`` module, you start by creating an instance of :class:`HaruhiDL` and adding all the available extractors:
.. code-block:: python
>>> from haruhi_dl import HaruhiDL
>>> hdl = HaruhiDL()
>>> hdl.add_default_info_extractors()
Extracting video information
----------------------------
You use the :meth:`HaruhiDL.extract_info` method for getting the video information, which returns a dictionary:
.. code-block:: python
>>> info = hdl.extract_info('http://www.youtube.com/watch?v=BaW_jenozKc', download=False)
[youtube] Setting language
[youtube] BaW_jenozKc: Downloading webpage
[youtube] BaW_jenozKc: Downloading video info webpage
[youtube] BaW_jenozKc: Extracting video information
>>> info['title']
'haruhi-dl test video "\'/\\ä↭𝕐'
>>> info['height'], info['width']
(720, 1280)
If you want to download or play the video you can get its url:
.. code-block:: python
>>> info['url']
'https://...'
Extracting playlist information
-------------------------------
The playlist information is extracted in a similar way, but the dictionary is a bit different:
.. code-block:: python
>>> playlist = hdl.extract_info('http://www.ted.com/playlists/13/open_source_open_world', download=False)
[TED] open_source_open_world: Downloading playlist webpage
...
>>> playlist['title']
'Open-source, open world'
You can access the videos in the playlist with the ``entries`` field:
.. code-block:: python
>>> for video in playlist['entries']:
... print('Video #%d: %s' % (video['playlist_index'], video['title']))
Video #1: How Arduino is open-sourcing imagination
Video #2: The year open data went worldwide
Video #3: Massive-scale online collaboration
Video #4: The art of asking
Video #5: How cognitive surplus will change the world
Video #6: The birth of Wikipedia
Video #7: Coding a better government
Video #8: The era of open innovation
Video #9: The currency of the new economy is trust
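
Passing options
---------------

Options go into the ``params`` dictionary when constructing the instance, the same dictionary the command-line front-end builds from its flags (a minimal sketch; see ``haruhi_dl/options.py`` for the available keys):

.. code-block:: python

    >>> hdl = HaruhiDL(params={'format': 'bestaudio/best', 'quiet': True})
    >>> hdl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])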

File diff suppressed because it is too large


@ -60,6 +60,7 @@ from .utils import (
format_bytes,
formatSeconds,
GeoRestrictedError,
HaruhiDLError,
int_or_none,
ISO3166Utils,
locked_file,
@ -777,22 +778,38 @@ class HaruhiDL(object):
def extract_info(self, url, download=True, ie_key=None, extra_info={},
process=True, force_generic_extractor=False):
'''
Returns a list with a dictionary for each video we find.
If 'download', also downloads the videos.
extra_info is a dict containing the extra values to add to each result
'''
"""
Return a list with a dictionary for each video extracted.
Arguments:
url -- URL to extract
Keyword arguments:
download -- whether to download videos during extraction
ie_key -- extractor key hint
extra_info -- dictionary containing the extra values to add to each result
process -- whether to resolve all unresolved references (URLs, playlist items),
must be True for download to work.
force_generic_extractor -- force using the generic extractor
"""
if not ie_key and force_generic_extractor:
ie_key = 'Generic'
force_use_mastodon = self.params.get('force_use_mastodon')
if not ie_key and force_use_mastodon:
ie_key = 'MastodonSH'
if not ie_key:
ie_key = self.params.get('ie_key')
if ie_key:
ies = [self.get_info_extractor(ie_key)]
else:
ies = self._ies
for ie in ies:
if not ie.suitable(url):
if not force_use_mastodon and not ie.suitable(url):
continue
ie = self.get_info_extractor(ie.ie_key())
@ -906,7 +923,7 @@ class HaruhiDL(object):
# url_transparent. In such cases outer metadata (from ie_result)
# should be propagated to inner one (info). For this to happen
# _type of info should be overridden with url_transparent. This
# fixes issue from https://github.com/ytdl-org/haruhi-dl/pull/11163.
# fixes issue from https://github.com/ytdl-org/youtube-dl/pull/11163.
if new_result.get('_type') == 'url':
new_result['_type'] = 'url_transparent'
@ -914,7 +931,7 @@ class HaruhiDL(object):
new_result, download=download, extra_info=extra_info)
elif result_type in ('playlist', 'multi_video'):
# Protect from infinite recursion due to recursively nested playlists
# (see https://github.com/hdl-org/haruhi-dl/issues/27833)
# (see https://github.com/ytdl-org/youtube-dl/issues/27833)
webpage_url = ie_result['webpage_url']
if webpage_url in self._playlist_urls:
self.to_screen(
@ -1515,14 +1532,18 @@ class HaruhiDL(object):
if 'display_id' not in info_dict and 'id' in info_dict:
info_dict['display_id'] = info_dict['id']
if info_dict.get('upload_date') is None and info_dict.get('timestamp') is not None:
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
# see http://bugs.python.org/issue1646728)
try:
upload_date = datetime.datetime.utcfromtimestamp(info_dict['timestamp'])
info_dict['upload_date'] = upload_date.strftime('%Y%m%d')
except (ValueError, OverflowError, OSError):
pass
for ts_key, date_key in (
('timestamp', 'upload_date'),
('release_timestamp', 'release_date'),
):
if info_dict.get(date_key) is None and info_dict.get(ts_key) is not None:
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
# see http://bugs.python.org/issue1646728)
try:
upload_date = datetime.datetime.utcfromtimestamp(info_dict[ts_key])
info_dict[date_key] = upload_date.strftime('%Y%m%d')
except (ValueError, OverflowError, OSError):
pass
# Auto generate title fields corresponding to the *_number fields when missing
# in order to always have clean titles. This is very common for TV series.
@ -1530,6 +1551,19 @@ class HaruhiDL(object):
if info_dict.get('%s_number' % field) is not None and not info_dict.get(field):
info_dict[field] = '%s %d' % (field.capitalize(), info_dict['%s_number' % field])
# Some fragmented media manifests, like m3u8, allow embedding subtitles.
# This is a hack to expose those subtitles to users without a huge refactor of the extractors
if 'formats' in info_dict:
formats_subtitles = list(filter(lambda x: x.get('_subtitle'), info_dict['formats']))
if formats_subtitles:
info_dict.setdefault('subtitles', {})
for sub in formats_subtitles:
if sub['_key'] not in info_dict['subtitles']:
info_dict['subtitles'][sub['_key']] = []
info_dict['subtitles'][sub['_key']].append(sub['_subtitle'])
# remove these subtitles from formats now
info_dict['formats'] = list(filter(lambda x: '_subtitle' not in x, info_dict['formats']))
for cc_kind in ('subtitles', 'automatic_captions'):
cc = info_dict.get(cc_kind)
if cc:
@ -1537,6 +1571,12 @@ class HaruhiDL(object):
for subtitle_format in subtitle:
if subtitle_format.get('url'):
subtitle_format['url'] = sanitize_url(subtitle_format['url'])
if subtitle_format.get('protocol') is None:
subtitle_format['protocol'] = determine_protocol(subtitle_format)
if subtitle_format.get('http_headers') is None:
full_info = info_dict.copy()
full_info.update(subtitle_format)
subtitle_format['http_headers'] = self._calc_headers(full_info)
if subtitle_format.get('ext') is None:
subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
@ -1649,7 +1689,7 @@ class HaruhiDL(object):
# by extractor are incomplete or not (i.e. whether extractor provides only
# video-only or audio-only formats) for proper formats selection for
# extractors with such incomplete formats (see
# https://github.com/ytdl-org/haruhi-dl/pull/5556).
# https://github.com/ytdl-org/youtube-dl/pull/5556).
# Since formats may be filtered during format selection and may not match
# the original formats the results may be incorrect. Thus original formats
# or pre-calculated metrics should be passed to format selection routines
@ -1657,7 +1697,7 @@ class HaruhiDL(object):
# We will pass a context object containing all necessary additional data
# instead of just formats.
# This fixes incorrect format selection issue (see
# https://github.com/ytdl-org/haruhi-dl/issues/10083).
# https://github.com/ytdl-org/youtube-dl/issues/10083).
incomplete_formats = (
# All formats are video-only or
all(f.get('vcodec') != 'none' and f.get('acodec') == 'none' for f in formats)
@ -1853,7 +1893,6 @@ class HaruhiDL(object):
# subtitles download errors are already managed as troubles in relevant IE
# that way it will silently go on when used with unsupporting IE
subtitles = info_dict['requested_subtitles']
ie = self.get_info_extractor(info_dict['extractor_key'])
for sub_lang, sub_info in subtitles.items():
sub_format = sub_info['ext']
sub_filename = subtitles_filename(filename, sub_lang, sub_format, info_dict.get('ext'))
@ -1864,7 +1903,7 @@ class HaruhiDL(object):
if sub_info.get('data') is not None:
try:
# Use newline='' to prevent conversion of newline characters
# See https://github.com/ytdl-org/haruhi-dl/issues/10268
# See https://github.com/ytdl-org/youtube-dl/issues/10268
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
subfile.write(sub_info['data'])
except (OSError, IOError):
@ -1872,10 +1911,8 @@ class HaruhiDL(object):
return
else:
try:
sub_data = ie._request_webpage(
sub_info['url'], info_dict['id'], note=False).read()
with io.open(encodeFilename(sub_filename), 'wb') as subfile:
subfile.write(sub_data)
subd = get_suitable_downloader(sub_info, self.params)(self, self.params)
subd.download(sub_filename, sub_info)
except (ExtractorError, IOError, OSError, ValueError) as err:
self.report_warning('Unable to download subtitle for "%s": %s' %
(sub_lang, error_to_compat_str(err)))
@ -1903,6 +1940,10 @@ class HaruhiDL(object):
fd.add_progress_hook(ph)
if self.params.get('verbose'):
self.to_screen('[debug] Invoking downloader on %r' % info.get('url'))
if info.get('protocol') == 'bittorrent' and not self.params.get('allow_p2p'):
raise HaruhiDLError('A peer-to-peer format was selected, but peer-to-peer '
'downloads are not allowed. '
'Choose a different format or add the --allow-p2p option')
return fd.download(name, info)
if info_dict.get('requested_formats') is not None:
@ -1919,8 +1960,32 @@ class HaruhiDL(object):
def compatible_formats(formats):
video, audio = formats
# Check extension
# Check extensions and codecs
video_ext, audio_ext = video.get('ext'), audio.get('ext')
video_codec, audio_codec = video.get('vcodec'), audio.get('acodec')
if video_codec and audio_codec:
COMPATIBLE_CODECS = {
'mp4': (
# fourcc (m3u8, mpd)
'av01', 'hevc', 'avc1', 'mp4a',
# lowercased FourCCs as they appear in ISM manifests
'h264', 'aacl',
),
'webm': (
'av01', 'vp9', 'vp8', 'opus', 'vrbs',
# these are in the WebM spec, so including them to be sure
'vp9x', 'vp8x',
),
}
video_codec = video_codec[:4].lower()
audio_codec = audio_codec[:4].lower()
for ext in COMPATIBLE_CODECS:
if all(codec in COMPATIBLE_CODECS[ext]
for codec in (video_codec, audio_codec)):
info_dict['ext'] = ext
return True
if video_ext and audio_ext:
COMPATIBLE_EXTS = (
('mp3', 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'ismv', 'isma'),
@ -1929,7 +1994,6 @@ class HaruhiDL(object):
for exts in COMPATIBLE_EXTS:
if video_ext in exts and audio_ext in exts:
return True
# TODO: Check acodec/vcodec
return False
filename_real_ext = os.path.splitext(filename)[1][1:]
@ -2283,7 +2347,7 @@ class HaruhiDL(object):
return
if type('') is not compat_str:
# Python 2.6 on SLES11 SP1 (https://github.com/ytdl-org/haruhi-dl/issues/3326)
# Python 2.6 on SLES11 SP1 (https://github.com/ytdl-org/youtube-dl/issues/3326)
self.report_warning(
'Your Python is broken! Update to a newer and supported version')
@ -2377,7 +2441,7 @@ class HaruhiDL(object):
proxies = {'http': opts_proxy, 'https': opts_proxy}
else:
proxies = compat_urllib_request.getproxies()
# Set HTTPS proxy to HTTP one if given (https://github.com/ytdl-org/haruhi-dl/issues/805)
# Set HTTPS proxy to HTTP one if given (https://github.com/ytdl-org/youtube-dl/issues/805)
if 'http' in proxies and 'https' not in proxies:
proxies['https'] = proxies['http']
proxy_handler = PerRequestProxyHandler(proxies)
@ -2391,7 +2455,7 @@ class HaruhiDL(object):
# When passing our own FileHandler instance, build_opener won't add the
# default FileHandler and allows us to disable the file protocol, which
# can be used for malicious purposes (see
# https://github.com/ytdl-org/haruhi-dl/issues/8227)
# https://github.com/ytdl-org/youtube-dl/issues/8227)
file_handler = compat_urllib_request.FileHandler()
def file_open(*args, **kwargs):
@ -2403,7 +2467,7 @@ class HaruhiDL(object):
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
# (See https://github.com/ytdl-org/haruhi-dl/issues/1309 for details)
# (See https://github.com/ytdl-org/youtube-dl/issues/1309 for details)
opener.addheaders = []
self._opener = opener
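
The codec-compatibility check added to compatible_formats() above boils down to a small table lookup on four-character codec prefixes; as a standalone sketch (same table as the diff, hedged rendition rather than the actual method):

COMPATIBLE_CODECS = {
    'mp4': ('av01', 'hevc', 'avc1', 'mp4a', 'h264', 'aacl'),
    'webm': ('av01', 'vp9', 'vp8', 'opus', 'vrbs', 'vp9x', 'vp8x'),
}

def merged_ext(vcodec, acodec):
    # Return the container both codecs fit into, or None if they are
    # incompatible (mirrors the compatible_formats() change above).
    vcodec, acodec = vcodec[:4].lower(), acodec[:4].lower()
    for ext, codecs in COMPATIBLE_CODECS.items():
        if vcodec in codecs and acodec in codecs:
            return ext
    return None

assert merged_ext('avc1.64001f', 'mp4a.40.2') == 'mp4'
assert merged_ext('vp9', 'opus') == 'webm'
assert merged_ext('vp9', 'mp4a.40.2') is None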


@ -50,7 +50,7 @@ from .HaruhiDL import HaruhiDL
def _real_main(argv=None):
# Compatibility fixes for Windows
if sys.platform == 'win32':
# https://github.com/ytdl-org/haruhi-dl/issues/820
# https://github.com/ytdl-org/youtube-dl/issues/820
codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)
workaround_optparse_bug9161()
@ -176,6 +176,10 @@ def _real_main(argv=None):
opts.max_sleep_interval = opts.sleep_interval
if opts.ap_mso and opts.ap_mso not in MSO_INFO:
parser.error('Unsupported TV Provider, use --ap-list-mso to get a list of supported TV Providers')
if opts.force_generic_extractor and opts.force_use_mastodon:
parser.error('--force-generic-extractor and --force-use-mastodon are mutually exclusive')
if opts.force_playwright_browser not in ('firefox', 'chromium', 'webkit', None):
parser.error('invalid browser forced, must be one of: firefox, chromium, webkit')
def parse_retries(retries):
if retries in ('inf', 'infinite'):
@ -348,6 +352,8 @@ def _real_main(argv=None):
'restrictfilenames': opts.restrictfilenames,
'ignoreerrors': opts.ignoreerrors,
'force_generic_extractor': opts.force_generic_extractor,
'force_use_mastodon': opts.force_use_mastodon,
'ie_key': opts.ie_key,
'ratelimit': opts.ratelimit,
'nooverwrites': opts.nooverwrites,
'retries': opts.retries,
@ -420,6 +426,7 @@ def _real_main(argv=None):
'headless_playwright': opts.headless_playwright,
'sleep_interval': opts.sleep_interval,
'max_sleep_interval': opts.max_sleep_interval,
'force_playwright_browser': opts.force_playwright_browser,
'external_downloader': opts.external_downloader,
'list_thumbnails': opts.list_thumbnails,
'playlist_items': opts.playlist_items,
@ -438,6 +445,8 @@ def _real_main(argv=None):
'geo_bypass': opts.geo_bypass,
'geo_bypass_country': opts.geo_bypass_country,
'geo_bypass_ip_block': opts.geo_bypass_ip_block,
'allow_p2p': opts.allow_p2p if not opts.prefer_p2p else True,
'prefer_p2p': opts.prefer_p2p,
# just for deprecation check
'autonumber': opts.autonumber if opts.autonumber is True else None,
'usetitle': opts.usetitle if opts.usetitle is True else None,

File diff suppressed because it is too large


@ -1,5 +1,18 @@
from __future__ import unicode_literals
from ..utils import (
determine_protocol,
)
def _get_real_downloader(info_dict, protocol=None, *args, **kwargs):
info_copy = info_dict.copy()
if protocol:
info_copy['protocol'] = protocol
return get_suitable_downloader(info_copy, *args, **kwargs)
# Some of these require _get_real_downloader
from .common import FileDownloader
from .f4m import F4mFD
from .hls import HlsFD
@ -8,15 +21,13 @@ from .rtmp import RtmpFD
from .dash import DashSegmentsFD
from .rtsp import RtspFD
from .ism import IsmFD
from .niconico import NiconicoDmcFD
from .external import (
get_external_downloader,
Aria2cFD,
FFmpegFD,
)
from ..utils import (
determine_protocol,
)
PROTOCOL_MAP = {
'rtmp': RtmpFD,
'm3u8_native': HlsFD,
@ -26,6 +37,8 @@ PROTOCOL_MAP = {
'f4m': F4mFD,
'http_dash_segments': DashSegmentsFD,
'ism': IsmFD,
'bittorrent': Aria2cFD,
'niconico_dmc': NiconicoDmcFD,
}
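
_get_real_downloader lets a wrapper downloader (NiconicoDmcFD below, FFmpegFD elsewhere) re-run downloader selection for the inner URL under an overridden protocol. Schematically (a self-contained sketch of the dispatch only; the real get_suitable_downloader also honours external-downloader params):

# Sketch: protocol-keyed dispatch like PROTOCOL_MAP above.
DISPATCH = {'m3u8_native': 'HlsFD', 'http_dash_segments': 'DashSegmentsFD',
            'bittorrent': 'Aria2cFD', 'niconico_dmc': 'NiconicoDmcFD'}

def pick_downloader(info_dict, protocol=None):
    info = dict(info_dict, protocol=protocol) if protocol else info_dict
    return DISPATCH.get(info.get('protocol'), 'HttpFD')

print(pick_downloader({'url': 'https://example.com/a.mpd',
                       'protocol': 'http_dash_segments'}))  # DashSegmentsFD
print(pick_downloader({'url': 'https://example.com/x', 'protocol': 'niconico_dmc'},
                      protocol='m3u8_native'))              # HlsFD, via the override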


@ -182,15 +182,16 @@ class Aria2cFD(ExternalFD):
AVAILABLE_OPT = '-v'
def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '-c']
cmd = [self.exe or 'aria2c', '-c']
cmd += self._configuration_args([
'--min-split-size', '1M', '--max-connection-per-server', '4'])
dn = os.path.dirname(tmpfilename)
if dn:
cmd += ['--dir', dn]
cmd += ['--out', os.path.basename(tmpfilename)]
for key, val in info_dict['http_headers'].items():
cmd += ['--header', '%s: %s' % (key, val)]
if info_dict['protocol'] != 'bittorrent':
for key, val in info_dict['http_headers'].items():
cmd += ['--header', '%s: %s' % (key, val)]
cmd += self._option('--interface', 'source_address')
cmd += self._option('--all-proxy', 'proxy')
cmd += self._bool_option('--check-certificate', 'nocheckcertificate', 'false', 'true', '=')
@ -240,7 +241,7 @@ class FFmpegFD(ExternalFD):
# setting -seekable prevents ffmpeg from guessing if the server
# supports seeking(by adding the header `Range: bytes=0-`), which
# can cause problems in some cases
# https://github.com/ytdl-org/haruhi-dl/issues/11800#issuecomment-275037127
# https://github.com/ytdl-org/youtube-dl/issues/11800#issuecomment-275037127
# http://trac.ffmpeg.org/ticket/6125#comment:10
args += ['-seekable', '1' if seekable else '0']
@ -317,7 +318,9 @@ class FFmpegFD(ExternalFD):
args += ['-fs', compat_str(self._TEST_FILE_SIZE)]
if protocol in ('m3u8', 'm3u8_native'):
if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
if info_dict['ext'] == 'vtt':
args += ['-f', 'webvtt']
elif self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
args += ['-f', 'mpegts']
else:
args += ['-f', 'mp4']
@ -341,7 +344,7 @@ class FFmpegFD(ExternalFD):
# mp4 file couldn't be played, but if we ask ffmpeg to quit it
# produces a file that is playable (this is mostly useful for live
# streams). Note that Windows is not affected and produces playable
# files (see https://github.com/ytdl-org/haruhi-dl/issues/8300).
# files (see https://github.com/ytdl-org/youtube-dl/issues/8300).
if sys.platform != 'win32':
proc.communicate(b'q')
raise
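
The new branch above only changes the muxer chosen for HLS inputs: WebVTT subtitle tracks get `-f webvtt` instead of being forced into an mpegts/mp4 container. Condensed into a standalone sketch of the selection (illustrative only):

def hls_muxer_args(ext, hls_use_mpegts=False, to_stdout=False):
    # Mirrors the m3u8/m3u8_native branch above.
    if ext == 'vtt':
        return ['-f', 'webvtt']
    if hls_use_mpegts or to_stdout:
        return ['-f', 'mpegts']
    return ['-f', 'mp4']

assert hls_muxer_args('vtt') == ['-f', 'webvtt']
assert hls_muxer_args('mp4', to_stdout=True) == ['-f', 'mpegts']
assert hls_muxer_args('mp4') == ['-f', 'mp4']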


@ -324,8 +324,8 @@ class F4mFD(FragmentFD):
urlh = self.hdl.urlopen(self._prepare_url(info_dict, man_url))
man_url = urlh.geturl()
# Some manifests may be malformed, e.g. prosiebensat1 generated manifests
# (see https://github.com/ytdl-org/haruhi-dl/issues/6215#issuecomment-121704244
# and https://github.com/ytdl-org/haruhi-dl/issues/7823)
# (see https://github.com/ytdl-org/youtube-dl/issues/6215#issuecomment-121704244
# and https://github.com/ytdl-org/youtube-dl/issues/7823)
manifest = fix_xml_ampersands(urlh.read().decode('utf-8', 'ignore')).strip()
doc = compat_etree_fromstring(manifest)
@ -409,7 +409,7 @@ class F4mFD(FragmentFD):
# In tests, segments may be truncated, and thus
# FlvReader may not be able to parse the whole
# chunk. If so, write the segment as is
# See https://github.com/ytdl-org/haruhi-dl/issues/9214
# See https://github.com/ytdl-org/youtube-dl/issues/9214
dest_stream.write(down_data)
break
raise


@ -154,8 +154,8 @@ class HlsFD(FragmentFD):
except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporary) fragments may be served.
# First we try to retry then either skip or abort.
# See https://github.com/ytdl-org/haruhi-dl/issues/10165,
# https://github.com/ytdl-org/haruhi-dl/issues/10448).
# See https://github.com/ytdl-org/youtube-dl/issues/10165,
# https://github.com/ytdl-org/youtube-dl/issues/10448).
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries)
@ -173,7 +173,7 @@ class HlsFD(FragmentFD):
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.hdl.urlopen(
self._prepare_url(info_dict, info_dict.get('_decryption_key_url') or decrypt_info['URI'])).read()
# Don't decrypt the content in tests since the data is explicitly truncated and it's not to a valid block
# size (see https://github.com/hdl-org/haruhi-dl/pull/27660). Tests only care that the correct data downloaded,
# size (see https://github.com/ytdl-org/youtube-dl/pull/27660). Tests only care that the correct data downloaded,
# not what it decrypts to.
if not test:
frag_content = AES.new(


@ -118,7 +118,7 @@ class HttpFD(FileDownloader):
# to match the value of requested Range HTTP header. This is due to a webservers
# that don't support resuming and serve a whole file with no Content-Range
# set in response despite of requested Range (see
# https://github.com/ytdl-org/haruhi-dl/issues/6057#issuecomment-126129799)
# https://github.com/ytdl-org/youtube-dl/issues/6057#issuecomment-126129799)
if has_range:
content_range = ctx.data.headers.get('Content-Range')
if content_range:


@ -0,0 +1,55 @@
# coding: utf-8
from __future__ import unicode_literals
import threading
from .common import FileDownloader
from ..downloader import _get_real_downloader
from ..extractor.niconico import NiconicoIE
from ..compat import compat_urllib_request
class NiconicoDmcFD(FileDownloader):
""" Downloading niconico douga from DMC with heartbeat """
FD_NAME = 'niconico_dmc'
def real_download(self, filename, info_dict):
self.to_screen('[%s] Downloading from DMC' % self.FD_NAME)
ie = NiconicoIE(self.hdl)
info_dict, heartbeat_info_dict = ie._get_heartbeat_info(info_dict)
fd = _get_real_downloader(info_dict, params=self.params)(self.hdl, self.params)
success = download_complete = False
timer = [None]
heartbeat_lock = threading.Lock()
heartbeat_url = heartbeat_info_dict['url']
heartbeat_data = heartbeat_info_dict['data'].encode()
heartbeat_interval = heartbeat_info_dict.get('interval', 30)
def heartbeat():
try:
compat_urllib_request.urlopen(url=heartbeat_url, data=heartbeat_data)
except Exception:
self.to_screen('[%s] Heartbeat failed' % self.FD_NAME)
with heartbeat_lock:
if not download_complete:
timer[0] = threading.Timer(heartbeat_interval, heartbeat)
timer[0].start()
heartbeat_info_dict['ping']()
self.to_screen('[%s] Heartbeat with %d second interval ...' % (self.FD_NAME, heartbeat_interval))
try:
heartbeat()
if type(fd).__name__ == 'HlsFD':
info_dict.update(ie._extract_m3u8_formats(info_dict['url'], info_dict['id'])[0])
success = fd.real_download(filename, info_dict)
finally:
if heartbeat_lock:
with heartbeat_lock:
timer[0].cancel()
download_complete = True
return success
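
Stripped of the DMC specifics, the keep-alive structure above is a self-re-arming threading.Timer guarded by a lock; the generic pattern looks like this (a sketch, not part of the codebase):

import threading

def run_with_heartbeat(work, ping, interval=30):
    # Ping once up front, then every `interval` seconds until `work` returns.
    done = False
    lock = threading.Lock()
    timer = [None]

    def beat():
        ping()
        with lock:
            if not done:
                timer[0] = threading.Timer(interval, beat)
                timer[0].start()

    beat()
    try:
        return work()
    finally:
        with lock:
            done = True
            if timer[0] is not None:
                timer[0].cancel()

print(run_with_heartbeat(lambda: 'ok', lambda: None, interval=1))  # 'ok'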


@ -7,8 +7,12 @@ from .common import InfoExtractor
from ..utils import (
clean_html,
clean_podcast_url,
float_or_none,
int_or_none,
js_to_json,
parse_iso8601,
urljoin,
ExtractorError,
)
@ -124,3 +128,76 @@ class ACastChannelIE(ACastBaseIE):
entries.append(self._extract_episode(episode, show_info))
return self.playlist_result(
entries, show.get('id'), show.get('title'), show.get('description'))
class ACastPlayerIE(InfoExtractor):
IE_NAME = 'acast:player'
_VALID_URL = r'https?://player\.acast\.com/(?:[^/]+/episodes/)?(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://player.acast.com/600595844cac453f8579eca0/episodes/maciej-konieczny-podatek-medialny-to-mechanizm-kontroli?theme=default&latest=1',
'info_dict': {
'id': '601dc897fb37095537d48e6f',
'ext': 'mp3',
'title': 'Maciej Konieczny: "Podatek medialny to bardziej mechanizm kontroli niż podatkowy”',
'upload_date': '20210208',
'timestamp': 1612764000,
},
}, {
'url': 'https://player.acast.com/5d09057251a90dcf7fa8e985?theme=default&latest=1',
'info_dict': {
'id': '5d09057251a90dcf7fa8e985',
'title': 'DGPtalk: Obiektywnie o biznesie',
},
'playlist_mincount': 5,
}]
@staticmethod
def _extract_urls(webpage, **kw):
return [mobj.group('url')
for mobj in re.finditer(
r'(?x)<iframe\b[^>]+\bsrc=(["\'])(?P<url>%s(?:\?[^#]+)?(?:\#.+?)?)\1' % ACastPlayerIE._VALID_URL,
webpage)]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
data = self._parse_json(
js_to_json(
self._search_regex(
r'(?s)var _global\s*=\s*({.+?});',
webpage, 'podcast data')), display_id)
show = data['show']
players = [{
'id': player['_id'],
'title': player['title'],
'url': player['audio'],
'duration': float_or_none(player.get('duration')),
'timestamp': parse_iso8601(player.get('publishDate')),
'thumbnail': urljoin('https://player.acast.com/', player.get('cover')),
'series': show['title'],
'episode': player['title'],
} for player in data['player']]
if len(players) > 1:
info_dict = {
'_type': 'playlist',
'entries': players,
'id': show['_id'],
'title': show['title'],
'series': show['title'],
}
if show.get('cover'):
info_dict['thumbnails'] = [{
'url': urljoin('https://player.acast.com/', show['cover']['url']),
'filesize': int_or_none(show['cover'].get('size')),
}]
return info_dict
if len(players) == 1:
return players[0]
raise ExtractorError('No podcast episodes found')
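
_extract_urls is the hook the generic extractor calls to discover embedded players in arbitrary pages; with the regex simplified (the real one interpolates _VALID_URL), its behaviour looks like this:

import re

webpage = '<iframe src="https://player.acast.com/5d09057251a90dcf7fa8e985?theme=default"></iframe>'
found = [m.group('url') for m in re.finditer(
    r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>https?://player\.acast\.com/[^"\']+)\1',
    webpage)]
print(found)  # ['https://player.acast.com/5d09057251a90dcf7fa8e985?theme=default']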


@ -9,10 +9,10 @@ from ..utils import (
class AppleConnectIE(InfoExtractor):
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/idsa\.(?P<id>[\w-]+)'
_TEST = {
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/(?:id)?sa\.(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://itunes.apple.com/us/post/idsa.4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'md5': 'e7c38568a01ea45402570e6029206723',
'md5': 'c1d41f72c8bcaf222e089434619316e4',
'info_dict': {
'id': '4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'ext': 'm4v',
@ -22,7 +22,10 @@ class AppleConnectIE(InfoExtractor):
'upload_date': '20150710',
'timestamp': 1436545535,
},
}
}, {
'url': 'https://itunes.apple.com/us/post/sa.0fe0229f-2457-11e5-9f40-1bb645f2d5d9',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@ -36,7 +39,7 @@ class AppleConnectIE(InfoExtractor):
video_data = self._parse_json(video_json, video_id)
timestamp = str_to_int(self._html_search_regex(r'data-timestamp="(\d+)"', webpage, 'timestamp'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count', default=None))
return {
'id': video_id,


@ -42,6 +42,7 @@ class ApplePodcastsIE(InfoExtractor):
ember_data = self._parse_json(self._search_regex(
r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
webpage, 'ember data'), episode_id)
ember_data = ember_data.get(episode_id) or ember_data
episode = ember_data['data']['attributes']
description = episode.get('description') or {}
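
The added line copes with the shoebox JSON apparently arriving in two shapes, bare or keyed by episode id; schematically:

def unwrap(ember_data, episode_id):
    # Normalise both observed payload shapes to the bare one.
    return ember_data.get(episode_id) or ember_data

assert unwrap({'data': {'attributes': {'name': 'x'}}}, '123')['data']['attributes']['name'] == 'x'
assert unwrap({'123': {'data': {'attributes': {'name': 'x'}}}}, '123')['data']['attributes']['name'] == 'x'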


@ -249,14 +249,14 @@ class ARDMediathekIE(ARDMediathekBaseIE):
class ARDIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://(?:www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?:video-?)?(?P<id>[0-9]+))\.html'
_VALID_URL = r'(?P<mainurl>https?://(?:www\.)?daserste\.de/(?:[^/?#&]+/)+(?P<id>[^/?#&]+))\.html'
_TESTS = [{
# available till 7.01.2022
'url': 'https://www.daserste.de/information/talk/maischberger/videos/maischberger-die-woche-video100.html',
'md5': '867d8aa39eeaf6d76407c5ad1bb0d4c1',
'info_dict': {
'display_id': 'maischberger-die-woche',
'id': '100',
'id': 'maischberger-die-woche-video100',
'display_id': 'maischberger-die-woche-video100',
'ext': 'mp4',
'duration': 3687.0,
'title': 'maischberger. die woche vom 7. Januar 2021',
@ -264,16 +264,25 @@ class ARDIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$',
},
}, {
'url': 'https://www.daserste.de/information/reportage-dokumentation/erlebnis-erde/videosextern/woelfe-und-herdenschutzhunde-ungleiche-brueder-102.html',
'url': 'https://www.daserste.de/information/politik-weltgeschehen/morgenmagazin/videosextern/dominik-kahun-aus-der-nhl-direkt-zur-weltmeisterschaft-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/information/nachrichten-wetter/tagesthemen/videosextern/tagesthemen-17736.html',
'only_matching': True,
}, {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/unterhaltung/serie/in-aller-freundschaft-die-jungen-aerzte/Drehpause-100.html',
'only_matching': True,
}, {
'url': 'https://www.daserste.de/unterhaltung/film/filmmittwoch-im-ersten/videos/making-ofwendezeit-video-100.html',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
display_id = mobj.group('id')
player_url = mobj.group('mainurl') + '~playerXml.xml'
doc = self._download_xml(player_url, display_id)
@ -324,7 +333,7 @@ class ARDIE(InfoExtractor):
self._sort_formats(formats)
return {
'id': mobj.group('id'),
'id': xpath_text(video_node, './videoId', default=display_id),
'formats': formats,
'display_id': display_id,
'title': video_node.find('./title').text,
@ -335,7 +344,7 @@ class ARDIE(InfoExtractor):
class ARDBetaMediathekIE(ARDMediathekBaseIE):
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?:[^/]+/)?(?:player|live|video)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'https://www.ardmediathek.de/mdr/video/die-robuste-roswita/Y3JpZDovL21kci5kZS9iZWl0cmFnL2Ntcy84MWMxN2MzZC0wMjkxLTRmMzUtODk4ZS0wYzhlOWQxODE2NGI/',
'md5': 'a1dc75a39c61601b980648f7c9f9f71d',
@ -365,22 +374,22 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
}, {
'url': 'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/video/coronavirus-update-ndr-info/astrazeneca-kurz-lockdown-und-pims-syndrom-81/ndr/Y3JpZDovL25kci5kZS84NzE0M2FjNi0wMWEwLTQ5ODEtOTE5NS1mOGZhNzdhOTFmOTI/',
'only_matching': True,
}, {
'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3dkci5kZS9CZWl0cmFnLWQ2NDJjYWEzLTMwZWYtNGI4NS1iMTI2LTU1N2UxYTcxOGIzOQ/tatort-duo-koeln-leipzig-ihr-kinderlein-kommet',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
display_id = mobj.group('display_id')
if display_id:
display_id = display_id.rstrip('/')
if not display_id:
display_id = video_id
video_id = self._match_id(url)
player_page = self._download_json(
'https://api.ardmediathek.de/public-gateway',
display_id, data=json.dumps({
video_id, data=json.dumps({
'query': '''{
playerPage(client:"%s", clipId: "%s") {
playerPage(client: "ard", clipId: "%s") {
blockedByFsk
broadcastedOn
maturityContentRating
@ -410,7 +419,7 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
}
}
}
}''' % (mobj.group('client'), video_id),
}''' % video_id,
}).encode(), headers={
'Content-Type': 'application/json'
})['data']['playerPage']
@ -435,7 +444,6 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
info.update({
'age_limit': age_limit,
'display_id': display_id,
'title': title,
'description': description,
'timestamp': unified_timestamp(player_page.get('broadcastedOn')),


@ -103,7 +103,7 @@ class ArkenaIE(InfoExtractor):
f_url, video_id, mpd_id=kind, fatal=False))
elif kind == 'silverlight':
# TODO: process when ism is supported (see
# https://github.com/ytdl-org/haruhi-dl/issues/8118)
# https://github.com/ytdl-org/youtube-dl/issues/8118)
continue
else:
tbr = float_or_none(f.get('Bitrate'), 1000)


@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
float_or_none,
int_or_none,
parse_iso8601,
remove_start,
)
class ArnesIE(InfoExtractor):
IE_NAME = 'video.arnes.si'
IE_DESC = 'Arnes Video'
_VALID_URL = r'https?://video\.arnes\.si/(?:[a-z]{2}/)?(?:watch|embed|api/(?:asset|public/video))/(?P<id>[0-9a-zA-Z]{12})'
_TESTS = [{
'url': 'https://video.arnes.si/watch/a1qrWTOQfVoU?t=10',
'md5': '4d0f4d0a03571b33e1efac25fd4a065d',
'info_dict': {
'id': 'a1qrWTOQfVoU',
'ext': 'mp4',
'title': 'Linearna neodvisnost, definicija',
'description': 'Linearna neodvisnost, definicija',
'license': 'PRIVATE',
'creator': 'Polona Oblak',
'timestamp': 1585063725,
'upload_date': '20200324',
'channel': 'Polona Oblak',
'channel_id': 'q6pc04hw24cj',
'channel_url': 'https://video.arnes.si/?channel=q6pc04hw24cj',
'duration': 596.75,
'view_count': int,
'tags': ['linearna_algebra'],
'start_time': 10,
}
}, {
'url': 'https://video.arnes.si/api/asset/s1YjnV7hadlC/play.mp4',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/en/watch/s1YjnV7hadlC',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC?t=123&hideRelated=1',
'only_matching': True,
}, {
'url': 'https://video.arnes.si/api/public/video/s1YjnV7hadlC',
'only_matching': True,
}]
_BASE_URL = 'https://video.arnes.si'
def _real_extract(self, url):
video_id = self._match_id(url)
video = self._download_json(
self._BASE_URL + '/api/public/video/' + video_id, video_id)['data']
title = video['title']
formats = []
for media in (video.get('media') or []):
media_url = media.get('url')
if not media_url:
continue
formats.append({
'url': self._BASE_URL + media_url,
'format_id': remove_start(media.get('format'), 'FORMAT_'),
'format_note': media.get('formatTranslation'),
'width': int_or_none(media.get('width')),
'height': int_or_none(media.get('height')),
})
self._sort_formats(formats)
channel = video.get('channel') or {}
channel_id = channel.get('url')
thumbnail = video.get('thumbnailUrl')
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': self._BASE_URL + thumbnail,
'description': video.get('description'),
'license': video.get('license'),
'creator': video.get('author'),
'timestamp': parse_iso8601(video.get('creationTime')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': self._BASE_URL + '/?channel=' + channel_id if channel_id else None,
'duration': float_or_none(video.get('duration'), 1000),
'view_count': int_or_none(video.get('views')),
'tags': video.get('hashtags'),
'start_time': int_or_none(compat_parse_qs(
compat_urllib_parse_urlparse(url).query).get('t', [None])[0]),
}
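
The start_time field at the end is read straight from the watch URL's ?t= query parameter; the same lookup with the standard library (the extractor goes through the compat_ wrappers for Python 2 support):

from urllib.parse import parse_qs, urlparse

url = 'https://video.arnes.si/watch/a1qrWTOQfVoU?t=10'
start_time = parse_qs(urlparse(url).query).get('t', [None])[0]
print(start_time)  # '10' -- int_or_none turns it into 10 in the extractor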


@ -0,0 +1,37 @@
# coding: utf-8
from __future__ import unicode_literals
from .brightcove import BrightcoveNewIE
from ..utils import extract_attributes
class BandaiChannelIE(BrightcoveNewIE):
IE_NAME = 'bandaichannel'
_VALID_URL = r'https?://(?:www\.)?b-ch\.com/titles/(?P<id>\d+/\d+)'
_TESTS = [{
'url': 'https://www.b-ch.com/titles/514/001',
'md5': 'a0f2d787baa5729bed71108257f613a4',
'info_dict': {
'id': '6128044564001',
'ext': 'mp4',
'title': 'メタルファイターMIKU 第1話',
'timestamp': 1580354056,
'uploader_id': '5797077852001',
'upload_date': '20200130',
'duration': 1387.733,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
attrs = extract_attributes(self._search_regex(
r'(<video-js[^>]+\bid="bcplayer"[^>]*>)', webpage, 'player'))
bc = self._download_json(
'https://pbifcd.b-ch.com/v1/playbackinfo/ST/70/' + attrs['data-info'],
video_id, headers={'X-API-KEY': attrs['data-auth'].strip()})['bc']
return self._parse_brightcove_metadata(bc, bc['id'])


@ -49,6 +49,7 @@ class BandcampIE(InfoExtractor):
'uploader': 'Ben Prunty',
'timestamp': 1396508491,
'upload_date': '20140403',
'release_timestamp': 1396483200,
'release_date': '20140403',
'duration': 260.877,
'track': 'Lanius (Battle)',
@ -69,6 +70,7 @@ class BandcampIE(InfoExtractor):
'uploader': 'Mastodon',
'timestamp': 1322005399,
'upload_date': '20111122',
'release_timestamp': 1076112000,
'release_date': '20040207',
'duration': 120.79,
'track': 'Hail to Fire',
@ -197,7 +199,7 @@ class BandcampIE(InfoExtractor):
'thumbnail': thumbnail,
'uploader': artist,
'timestamp': timestamp,
'release_date': unified_strdate(tralbum.get('album_release_date')),
'release_timestamp': unified_timestamp(tralbum.get('album_release_date')),
'duration': duration,
'track': track,
'track_number': track_number,


@ -1,31 +1,39 @@
# coding: utf-8
from __future__ import unicode_literals
import functools
import itertools
import json
import re
from .common import InfoExtractor
from ..compat import (
compat_etree_Element,
compat_HTTPError,
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import (
ExtractorError,
OnDemandPagedList,
clean_html,
dict_get,
ExtractorError,
float_or_none,
get_element_by_class,
int_or_none,
js_to_json,
parse_duration,
parse_iso8601,
strip_or_none,
try_get,
unescapeHTML,
unified_timestamp,
url_or_none,
urlencode_postdata,
urljoin,
)
from ..compat import (
compat_etree_Element,
compat_HTTPError,
compat_urlparse,
)
class BBCCoUkIE(InfoExtractor):
@ -204,7 +212,7 @@ class BBCCoUkIE(InfoExtractor):
},
'skip': 'Now it\'s really geo-restricted',
}, {
# compact player (https://github.com/ytdl-org/haruhi-dl/issues/8147)
# compact player (https://github.com/ytdl-org/youtube-dl/issues/8147)
'url': 'http://www.bbc.co.uk/programmes/p028bfkf/player',
'info_dict': {
'id': 'p028bfkj',
@ -756,8 +764,17 @@ class BBCIE(BBCCoUkIE):
'only_matching': True,
}, {
# custom redirection to www.bbc.com
# also, video with window.__INITIAL_DATA__
'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
'only_matching': True,
'info_dict': {
'id': 'p02xzws1',
'ext': 'mp4',
'title': "Pluto may have 'nitrogen glaciers'",
'description': 'md5:6a95b593f528d7a5f2605221bc56912f',
'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1437785037,
'upload_date': '20150725',
},
}, {
# single video article embedded with data-media-vpid
'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
@ -793,11 +810,25 @@ class BBCIE(BBCCoUkIE):
'description': 'Learn English words and phrases from this story',
},
'add_ie': [BBCCoUkIE.ie_key()],
}, {
# BBC Reel
'url': 'https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness',
'info_dict': {
'id': 'p07c6sb9',
'ext': 'mp4',
'title': 'How positive thinking is harming your happiness',
'alt_title': 'The downsides of positive thinking',
'description': 'md5:fad74b31da60d83b8265954ee42d85b4',
'duration': 235,
'thumbnail': r're:https?://.+/p07c9dsr.jpg',
'upload_date': '20190604',
'categories': ['Psychology'],
},
}]
@classmethod
def suitable(cls, url):
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE)
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerEpisodesIE, BBCCoUkIPlayerGroupIE, BBCCoUkPlaylistIE)
return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
else super(BBCIE, cls).suitable(url))
@ -929,7 +960,7 @@ class BBCIE(BBCCoUkIE):
else:
entry['title'] = info['title']
entry['formats'].extend(info['formats'])
except Exception as e:
except ExtractorError as e:
# Some playlist URL may fail with 500, at the same time
# the other one may work fine (e.g.
# http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
@ -980,6 +1011,37 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles,
}
# bbc reel (e.g. https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness)
initial_data = self._parse_json(self._html_search_regex(
r'<script[^>]+id=(["\'])initial-data\1[^>]+data-json=(["\'])(?P<json>(?:(?!\2).)+)',
webpage, 'initial data', default='{}', group='json'), playlist_id, fatal=False)
if initial_data:
init_data = try_get(
initial_data, lambda x: x['initData']['items'][0], dict) or {}
smp_data = init_data.get('smpData') or {}
clip_data = try_get(smp_data, lambda x: x['items'][0], dict) or {}
version_id = clip_data.get('versionID')
if version_id:
title = smp_data['title']
formats, subtitles = self._download_media_selector(version_id)
self._sort_formats(formats)
image_url = smp_data.get('holdingImageURL')
display_date = init_data.get('displayDate')
topic_title = init_data.get('topicTitle')
return {
'id': version_id,
'title': title,
'formats': formats,
'alt_title': init_data.get('shortTitle'),
'thumbnail': image_url.replace('$recipe', 'raw') if image_url else None,
'description': smp_data.get('summary') or init_data.get('shortSummary'),
'upload_date': display_date.replace('-', '') if display_date else None,
'subtitles': subtitles,
'duration': int_or_none(clip_data.get('duration')),
'categories': [topic_title] if topic_title else None,
}
# Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
# There are several setPayload calls may be present but the video
# seems to be always related to the first one
@ -1041,7 +1103,7 @@ class BBCIE(BBCCoUkIE):
thumbnail = None
image_url = current_programme.get('image_url')
if image_url:
thumbnail = image_url.replace('{recipe}', '1920x1920')
thumbnail = image_url.replace('{recipe}', 'raw')
return {
'id': programme_id,
'title': title,
@ -1114,12 +1176,29 @@ class BBCIE(BBCCoUkIE):
continue
formats, subtitles = self._download_media_selector(item_id)
self._sort_formats(formats)
item_desc = None
blocks = try_get(media, lambda x: x['summary']['blocks'], list)
if blocks:
summary = []
for block in blocks:
text = try_get(block, lambda x: x['model']['text'], compat_str)
if text:
summary.append(text)
if summary:
item_desc = '\n\n'.join(summary)
item_time = None
for meta in try_get(media, lambda x: x['metadata']['items'], list) or []:
if try_get(meta, lambda x: x['label']) == 'Published':
item_time = unified_timestamp(meta.get('timestamp'))
break
entries.append({
'id': item_id,
'title': item_title,
'thumbnail': item.get('holdingImageUrl'),
'formats': formats,
'subtitles': subtitles,
'timestamp': item_time,
'description': strip_or_none(item_desc),
})
for resp in (initial_data.get('data') or {}).values():
name = resp.get('name')
@ -1293,21 +1372,149 @@ class BBCCoUkPlaylistBaseIE(InfoExtractor):
playlist_id, title, description)
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:playlist'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
class BBCCoUkIPlayerPlaylistBaseIE(InfoExtractor):
_VALID_URL_TMPL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/%%s/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
@staticmethod
def _get_default(episode, key, default_key='default'):
return try_get(episode, lambda x: x[key][default_key])
def _get_description(self, data):
synopsis = data.get(self._DESCRIPTION_KEY) or {}
return dict_get(synopsis, ('large', 'medium', 'small'))
def _fetch_page(self, programme_id, per_page, series_id, page):
elements = self._get_elements(self._call_api(
programme_id, per_page, page + 1, series_id))
for element in elements:
episode = self._get_episode(element)
episode_id = episode.get('id')
if not episode_id:
continue
thumbnail = None
image = self._get_episode_image(episode)
if image:
thumbnail = image.replace('{recipe}', 'raw')
category = self._get_default(episode, 'labels', 'category')
yield {
'_type': 'url',
'id': episode_id,
'title': self._get_episode_field(episode, 'subtitle'),
'url': 'https://www.bbc.co.uk/iplayer/episode/' + episode_id,
'thumbnail': thumbnail,
'description': self._get_description(episode),
'categories': [category] if category else None,
'series': self._get_episode_field(episode, 'title'),
'ie_key': BBCCoUkIE.ie_key(),
}
def _real_extract(self, url):
pid = self._match_id(url)
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
series_id = qs.get('seriesId', [None])[0]
page = qs.get('page', [None])[0]
per_page = 36 if page else self._PAGE_SIZE
fetch_page = functools.partial(self._fetch_page, pid, per_page, series_id)
entries = fetch_page(int(page) - 1) if page else OnDemandPagedList(fetch_page, self._PAGE_SIZE)
playlist_data = self._get_playlist_data(self._call_api(pid, 1))
return self.playlist_result(
entries, pid, self._get_playlist_title(playlist_data),
self._get_description(playlist_data))
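A minimal sketch of the pagination pattern above, assuming haruhi-dl's OnDemandPagedList keeps youtube-dl's semantics: pages are fetched lazily through a zero-based page callback, while an explicit ?page= query bypasses the pager and uses the site default of 36 entries per page.

import functools

from haruhi_dl.utils import OnDemandPagedList  # assumption: same location as in youtube-dl

def fetch_page(pid, per_page, page_num):
    # a real implementation would call the iBL API here; page_num is zero-based
    print('fetching page %d of %s (%d entries per page)' % (page_num + 1, pid, per_page))
    return iter(())

# entries are only downloaded as the pager is actually consumed
pager = OnDemandPagedList(functools.partial(fetch_page, 'b05rcz9v', 100), 100)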
class BBCCoUkIPlayerEpisodesIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:episodes'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'episodes'
_TESTS = [{
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
'info_dict': {
'id': 'b05rcz9v',
'title': 'The Disappearance',
'description': 'French thriller serial about a missing teenager.',
'description': 'md5:58eb101aee3116bad4da05f91179c0cb',
},
'playlist_mincount': 6,
'skip': 'This programme is not currently available on BBC iPlayer',
'playlist_mincount': 8,
}, {
# all seasons
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 10,
}, {
# explicit season
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster?seriesId=b094m6nv',
'info_dict': {
'id': 'b094m5t9',
'title': 'Doctor Foster',
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
},
'playlist_mincount': 5,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 37,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove?page=2',
'info_dict': {
'id': 'm0004c4v',
'title': 'Beechgrove',
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
},
'playlist_mincount': 1,
}]
_PAGE_SIZE = 100
_DESCRIPTION_KEY = 'synopsis'
def _get_episode_image(self, episode):
return self._get_default(episode, 'image')
def _get_episode_field(self, episode, field):
return self._get_default(episode, field)
@staticmethod
def _get_elements(data):
return data['entities']['results']
@staticmethod
def _get_episode(element):
return element.get('episode') or {}
def _call_api(self, pid, per_page, page=1, series_id=None):
variables = {
'id': pid,
'page': page,
'perPage': per_page,
}
if series_id:
variables['sliceId'] = series_id
return self._download_json(
'https://graph.ibl.api.bbc.co.uk/', pid, headers={
'Content-Type': 'application/json'
}, data=json.dumps({
'id': '5692d93d5aac8d796a0305e895e61551',
'variables': variables,
}).encode('utf-8'))['data']['programme']
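For reference, a hedged reconstruction of the POST body _call_api sends to graph.ibl.api.bbc.co.uk; the top-level "id" is a persisted-query hash, not a programme id, and the variable values below are examples taken from the tests:

request_body = {
    'id': '5692d93d5aac8d796a0305e895e61551',  # persisted GraphQL query hash
    'variables': {
        'id': 'b05rcz9v',       # programme pid
        'page': 1,              # 1-based page number
        'perPage': 100,
        'sliceId': 'b094m6nv',  # only present when a seriesId was given
    },
}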
@staticmethod
def _get_playlist_data(data):
return data
def _get_playlist_title(self, data):
return self._get_default(data, 'title')
class BBCCoUkIPlayerGroupIE(BBCCoUkIPlayerPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:iplayer:group'
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'group'
_TESTS = [{
# Available for over a year, unlike the 30 days typical of most other programmes
'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
'info_dict': {
@@ -1316,14 +1523,56 @@ class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
},
'playlist_mincount': 10,
}, {
# all pages
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 47,
}, {
# explicit page
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7?page=2',
'info_dict': {
'id': 'p081d7j7',
'title': 'Music in Scotland',
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
},
'playlist_mincount': 11,
}]
_PAGE_SIZE = 200
_DESCRIPTION_KEY = 'synopses'
def _extract_title_and_description(self, webpage):
title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
description = self._search_regex(
r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
webpage, 'description', fatal=False, group='value')
return title, description
def _get_episode_image(self, episode):
return self._get_default(episode, 'images', 'standard')
def _get_episode_field(self, episode, field):
return episode.get(field)
@staticmethod
def _get_elements(data):
return data['elements']
@staticmethod
def _get_episode(element):
return element
def _call_api(self, pid, per_page, page=1, series_id=None):
return self._download_json(
'http://ibl.api.bbc.co.uk/ibl/v1/groups/%s/episodes' % pid,
pid, query={
'page': page,
'per_page': per_page,
})['group_episodes']
@staticmethod
def _get_playlist_data(data):
return data['group']
def _get_playlist_title(self, data):
return data.get('title')
class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):


@@ -156,6 +156,7 @@ class BiliBiliIE(InfoExtractor):
cid = js['result']['cid']
headers = {
'Accept': 'application/json',
'Referer': url
}
headers.update(self.geo_verification_headers())
@@ -232,7 +233,7 @@
webpage)
if uploader_mobj:
info.update({
'uploader': uploader_mobj.group('name'),
'uploader': uploader_mobj.group('name').strip(),
'uploader_id': uploader_mobj.group('id'),
})
if not info.get('uploader'):

View file

@@ -1,86 +0,0 @@
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
remove_start,
int_or_none,
)
class BlinkxIE(InfoExtractor):
_VALID_URL = r'(?:https?://(?:www\.)blinkx\.com/#?ce/|blinkx:)(?P<id>[^?]+)'
IE_NAME = 'blinkx'
_TEST = {
'url': 'http://www.blinkx.com/ce/Da0Gw3xc5ucpNduzLuDDlv4WC9PuI4fDi1-t6Y3LyfdY2SZS5Urbvn-UPJvrvbo8LTKTc67Wu2rPKSQDJyZeeORCR8bYkhs8lI7eqddznH2ofh5WEEdjYXnoRtj7ByQwt7atMErmXIeYKPsSDuMAAqJDlQZ-3Ff4HJVeH_s3Gh8oQ',
'md5': '337cf7a344663ec79bf93a526a2e06c7',
'info_dict': {
'id': 'Da0Gw3xc',
'ext': 'mp4',
'title': 'No Daily Show for John Oliver; HBO Show Renewed - IGN News',
'uploader': 'IGN News',
'upload_date': '20150217',
'timestamp': 1424215740,
'description': 'HBO has renewed Last Week Tonight With John Oliver for two more seasons.',
'duration': 47.743333,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = video_id[:8]
api_url = ('https://apib4.blinkx.com/api.php?action=play_video&'
+ 'video=%s' % video_id)
data_json = self._download_webpage(api_url, display_id)
data = json.loads(data_json)['api']['results'][0]
duration = None
thumbnails = []
formats = []
for m in data['media']:
if m['type'] == 'jpg':
thumbnails.append({
'url': m['link'],
'width': int(m['w']),
'height': int(m['h']),
})
elif m['type'] == 'original':
duration = float(m['d'])
elif m['type'] == 'youtube':
yt_id = m['link']
self.to_screen('Youtube video detected: %s' % yt_id)
return self.url_result(yt_id, 'Youtube', video_id=yt_id)
elif m['type'] in ('flv', 'mp4'):
vcodec = remove_start(m['vcodec'], 'ff')
acodec = remove_start(m['acodec'], 'ff')
vbr = int_or_none(m.get('vbr') or m.get('vbitrate'), 1000)
abr = int_or_none(m.get('abr') or m.get('abitrate'), 1000)
tbr = vbr + abr if vbr and abr else None
format_id = '%s-%sk-%s' % (vcodec, tbr, m['w'])
formats.append({
'format_id': format_id,
'url': m['link'],
'vcodec': vcodec,
'acodec': acodec,
'abr': abr,
'vbr': vbr,
'tbr': tbr,
'width': int_or_none(m.get('w')),
'height': int_or_none(m.get('h')),
})
self._sort_formats(formats)
return {
'id': display_id,
'fullid': video_id,
'title': data['title'],
'formats': formats,
'uploader': data['channel_name'],
'timestamp': data['pubdate_epoch'],
'description': data.get('description'),
'thumbnails': thumbnails,
'duration': duration,
}


@@ -130,7 +130,7 @@ class BrightcoveLegacyIE(InfoExtractor):
'skip': 'Unsupported URL',
},
{
# playlist with 'playlistTab' (https://github.com/ytdl-org/haruhi-dl/issues/9965)
# playlist with 'playlistTab' (https://github.com/ytdl-org/youtube-dl/issues/9965)
'url': 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=AQ%7E%7E,AAABXlLMdok%7E,NJ4EoMlZ4rZdx9eU1rkMVd8EaYPBBUlg',
'info_dict': {
'id': '1522758701001',
@@ -154,10 +154,10 @@ class BrightcoveLegacyIE(InfoExtractor):
<object class="BrightcoveExperience">{params}</object>
"""
# Fix up some stupid HTML, see https://github.com/ytdl-org/haruhi-dl/issues/1553
# Fix up some stupid HTML, see https://github.com/ytdl-org/youtube-dl/issues/1553
object_str = re.sub(r'(<param(?:\s+[a-zA-Z0-9_]+="[^"]*")*)>',
lambda m: m.group(1) + '/>', object_str)
# Fix up some stupid XML, see https://github.com/ytdl-org/haruhi-dl/issues/1608
# Fix up some stupid XML, see https://github.com/ytdl-org/youtube-dl/issues/1608
object_str = object_str.replace('<--', '<!--')
# remove namespace to simplify extraction
object_str = re.sub(r'(<object[^>]*)(xmlns=".*?")', r'\1', object_str)


@@ -0,0 +1,91 @@
# coding: utf-8
from .common import InfoExtractor
from ..utils import (
parse_duration,
)
import re
class CastosHostedIE(InfoExtractor):
_VALID_URL = r'https?://[^/.]+\.castos\.com/(?:player|episodes)/(?P<id>[\da-zA-Z-]+)'
IE_NAME = 'castos:hosted'
_TESTS = [{
'url': 'https://audience.castos.com/player/408278',
'info_dict': {
'id': '408278',
'ext': 'mp3',
},
}, {
'url': 'https://audience.castos.com/episodes/improve-your-podcast-production',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage, **kw):
return [mobj.group(1) for mobj
in re.finditer(
r'<iframe\b[^>]+(?<!-)src="(https?://[^/.]+\.castos\.com/player/\d+)',
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
series = self._html_search_regex(
r'<div class="show">\s+<strong>([^<]+)</strong>', webpage, 'series name')
title = self._html_search_regex(
r'<div class="episode-title">([^<]+)</div>', webpage, 'episode title')
audio_url = self._html_search_regex(
r'<audio class="clip">\s+<source\b[^>]+src="(https?://[^"]+)"', webpage, 'audio url')
duration = parse_duration(self._search_regex(
r'<time id="duration">(\d\d(?::\d\d)+)</time>', webpage, 'duration'))
return {
'id': video_id,
'title': title,
'url': audio_url,
'duration': duration,
'series': series,
'episode': title,
}
class CastosSSPIE(InfoExtractor):
@classmethod
def _extract_entries(cls, webpage, **kw):
entries = []
for found in re.finditer(
r'(?s)<div class="castos-player[^"]*"[^>]*data-episode="(\d+)-[a-z\d]+">(.+?</nav>)\s*</div>',
webpage):
video_id, entry = found.group(1, 2)
def search_entry(regex):
res = re.search(regex, entry)
if res:
return res.group(1)
series = search_entry(r'<div class="show">\s+<strong>([^<]+)</strong>')
title = search_entry(r'<div class="episode-title">([^<]+)</div>')
audio_url = search_entry(
r'<audio class="clip[^"]*">\s+<source\b[^>]+src="(https?://[^"]+)"')
duration = parse_duration(
search_entry(r'<time id="duration[^"]*">(\d\d(?::\d\d)+)</time>'))
if not title or not audio_url:
continue
entries.append({
'id': video_id,
'title': title,
'url': audio_url,
'duration': duration,
'series': series,
'episode': title,
})
return entries


@@ -27,7 +27,7 @@ class CBSBaseIE(ThePlatformFeedIE):
class CBSIE(CBSBaseIE):
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:(?:cbs|paramountplus)\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
_TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
@@ -52,6 +52,9 @@ class CBSIE(CBSBaseIE):
}, {
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True,
}, {
'url': 'https://www.paramountplus.com/shows/all-rise/video/QmR1WhNkh1a_IrdHZrbcRklm176X_rVc/all-rise-space/',
'only_matching': True,
}]
def _extract_video_info(self, content_id, site='cbs', mpx_acc=2198311517):


@@ -26,7 +26,7 @@ class CBSNewsEmbedIE(CBSIE):
def _real_extract(self, url):
item = self._parse_json(zlib.decompress(compat_b64decode(
compat_urllib_parse_unquote(self._match_id(url))),
-zlib.MAX_WBITS), None)['video']['items'][0]
-zlib.MAX_WBITS).decode('utf-8'), None)['video']['items'][0]
return self._extract_video_info(item['mpxRefId'], 'cbsnews')
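A sketch of the decoding chain this fix completes, on the assumption (consistent with the code above) that the embed id is URL-quoted base64 of a raw-DEFLATE-compressed JSON document; zlib.decompress() returns bytes, hence the added .decode('utf-8'):

import base64
import json
import zlib
from urllib.parse import unquote

def decode_embed_payload(quoted_id):
    raw = base64.b64decode(unquote(quoted_id))
    # a negative wbits value selects a raw DEFLATE stream (no zlib header/checksum)
    return json.loads(zlib.decompress(raw, -zlib.MAX_WBITS).decode('utf-8'))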


@@ -1,38 +1,113 @@
from __future__ import unicode_literals
from .cbs import CBSBaseIE
import re
# from .cbs import CBSBaseIE
from .common import InfoExtractor
from ..utils import (
int_or_none,
try_get,
)
class CBSSportsIE(CBSBaseIE):
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/(?:video|news)/(?P<id>[^/?#&]+)'
# class CBSSportsEmbedIE(CBSBaseIE):
class CBSSportsEmbedIE(InfoExtractor):
IE_NAME = 'cbssports:embed'
_VALID_URL = r'''(?ix)https?://(?:(?:www\.)?cbs|embed\.247)sports\.com/player/embed.+?
(?:
ids%3D(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})|
pcid%3D(?P<pcid>\d+)
)'''
_TESTS = [{
'url': 'https://www.cbssports.com/nba/video/donovan-mitchell-flashes-star-potential-in-game-2-victory-over-thunder/',
'info_dict': {
'id': '1214315075735',
'ext': 'mp4',
'title': 'Donovan Mitchell flashes star potential in Game 2 victory over Thunder',
'description': 'md5:df6f48622612c2d6bd2e295ddef58def',
'timestamp': 1524111457,
'upload_date': '20180419',
'uploader': 'CBSI-NEW',
},
'params': {
# m3u8 download
'skip_download': True,
}
'url': 'https://www.cbssports.com/player/embed/?args=player_id%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26ids%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26resizable%3D1%26autoplay%3Dtrue%26domain%3Dcbssports.com%26comp_ads_enabled%3Dfalse%26watchAndRead%3D0%26startTime%3D0%26env%3Dprod',
'only_matching': True,
}, {
'url': 'https://www.cbssports.com/nba/news/nba-playoffs-2018-watch-76ers-vs-heat-game-3-series-schedule-tv-channel-online-stream/',
'url': 'https://embed.247sports.com/player/embed/?args=%3fplayer_id%3d1827823171591%26channel%3dcollege-football-recruiting%26pcid%3d1827823171591%26width%3d640%26height%3d360%26autoplay%3dTrue%26comp_ads_enabled%3dFalse%26uvpc%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_v4%2526partner%253d247%26uvpc_m%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_m_v4%2526partner_m%253d247_mobile%26utag%3d247sportssite%26resizable%3dTrue',
'only_matching': True,
}]
def _extract_video_info(self, filter_query, video_id):
return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
# def _extract_video_info(self, filter_query, video_id):
# return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
def _real_extract(self, url):
uuid, pcid = re.match(self._VALID_URL, url).groups()
query = {'id': uuid} if uuid else {'pcid': pcid}
video = self._download_json(
'https://www.cbssports.com/api/content/video/',
uuid or pcid, query=query)[0]
video_id = video['id']
title = video['title']
metadata = video.get('metaData') or {}
# return self._extract_video_info('byId=%d' % metadata['mpxOutletId'], video_id)
# return self._extract_video_info('byGuid=' + metadata['mpxRefId'], video_id)
formats = self._extract_m3u8_formats(
metadata['files'][0]['url'], video_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False)
self._sort_formats(formats)
image = video.get('image')
thumbnails = None
if image:
image_path = image.get('path')
if image_path:
thumbnails = [{
'url': image_path,
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
'filesize': int_or_none(image.get('size')),
}]
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnails': thumbnails,
'description': video.get('description'),
'timestamp': int_or_none(try_get(video, lambda x: x['dateCreated']['epoch'])),
'duration': int_or_none(metadata.get('duration')),
}
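The _VALID_URL above matches percent-encoded key=value pairs inside the raw URL; an equivalent way to see what it is picking out is to decode the "args" query parameter twice:

from urllib.parse import parse_qs, urlparse

embed = ('https://www.cbssports.com/player/embed/?args=player_id%3Db56c03a6-231a-4bbe-'
         '9c55-af3c8a8e9636%26ids%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26autoplay%3Dtrue')
args = parse_qs(urlparse(embed).query)['args'][0]  # one level of percent-encoding removed
inner = parse_qs(args)
print(inner['ids'][0])  # b56c03a6-231a-4bbe-9c55-af3c8a8e9636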
class CBSSportsBaseIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
[r'(?:=|%26)pcid%3D(\d+)', r'embedVideo(?:Container)?_(\d+)'],
webpage, 'video id')
return self._extract_video_info('byId=%s' % video_id, video_id)
iframe_url = self._search_regex(
r'<iframe[^>]+(?:data-)?src="(https?://[^/]+/player/embed[^"]+)"',
webpage, 'embed url')
return self.url_result(iframe_url, CBSSportsEmbedIE.ie_key())
class CBSSportsIE(CBSSportsBaseIE):
IE_NAME = 'cbssports'
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/video/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.cbssports.com/college-football/video/cover-3-stanford-spring-gleaning/',
'info_dict': {
'id': 'b56c03a6-231a-4bbe-9c55-af3c8a8e9636',
'ext': 'mp4',
'title': 'Cover 3: Stanford Spring Gleaning',
'description': 'The Cover 3 crew break down everything you need to know about the Stanford Cardinal this spring.',
'timestamp': 1617218398,
'upload_date': '20210331',
'duration': 502,
},
}]
class TwentyFourSevenSportsIE(CBSSportsBaseIE):
IE_NAME = '247sports'
_VALID_URL = r'https?://(?:www\.)?247sports\.com/Video/(?:[^/?#&]+-)?(?P<id>\d+)'
_TESTS = [{
'url': 'https://247sports.com/Video/2021-QB-Jake-Garcia-senior-highlights-through-five-games-10084854/',
'info_dict': {
'id': '4f1265cb-c3b5-44a8-bb1d-1914119a0ccc',
'ext': 'mp4',
'title': '2021 QB Jake Garcia senior highlights through five games',
'description': 'md5:8cb67ebed48e2e6adac1701e0ff6e45b',
'timestamp': 1607114223,
'upload_date': '20201204',
'duration': 208,
},
}]


@@ -1,7 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import datetime
import hashlib
import hmac
from .common import InfoExtractor
from ..utils import (
@@ -21,6 +24,7 @@ class CDABaseExtractor(InfoExtractor):
# apparently the token is hardcoded in the app
'Authorization': 'Basic YzU3YzBlZDUtYTIzOC00MWQwLWI2NjQtNmZmMWMxY2Y2YzVlOklBTm95QlhRRVR6U09MV1hnV3MwMW0xT2VyNWJNZzV4clRNTXhpNGZJUGVGZ0lWUlo5UGVYTDhtUGZaR1U1U3Q',
}
_NETRC_MACHINE = 'cda'
_bearer = None
# logs into cda.pl and returns _BASE_HEADERS with the Bearer token
@@ -33,8 +37,27 @@ class CDABaseExtractor(InfoExtractor):
})
return headers
token_res = self._download_json(self._BASE_URL + '/oauth/token?grant_type=password&login=niezesrajciesiecda&password=VD3QbYWSb_uwAShBZKN7F1DwEg_tRTdb4Xd3JvFsx6Y',
video_id, 'Logging into cda.pl', headers=headers, data=bytes(''.encode('utf-8')))
username, password = self._get_login_info()
if username is None or password is None:
username = 'niezesrajciesiecda'
password_hash = 'VD3QbYWSb_uwAShBZKN7F1DwEg_tRTdb4Xd3JvFsx6Y'
account_type = 'shared'
else:
pwd_md5 = ""
for byte in hashlib.md5(password.encode('utf-8')).digest():
# bytes() takes an iterable of ints, not a single int
hexik = bytes((byte & 255, )).hex()
while len(hexik) < 2:
hexik = "0" + hexik
pwd_md5 += hexik
digest = hmac.new(
's01m1Oer5IANoyBXQETzSOLWXgWs01m1Oer5bMg5xrTMMxRZ9Pi4fIPeFgIVRZ9PeXL8mPfXQETZGUAN5StRZ9P'.encode('utf-8'),
pwd_md5.encode('utf-8'), hashlib.sha256).digest()
password_hash = base64.urlsafe_b64encode(digest).decode('utf-8').replace('=', '')
account_type = 'user'
token_res = self._download_json('%s/oauth/token?grant_type=password&login=%s&password=%s' % (self._BASE_URL, username, password_hash),
video_id, 'Logging into cda.pl with a %s account' % account_type, headers=headers, data=bytes(''.encode('utf-8')))
self._bearer = {
'token': token_res['access_token'],
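A condensed sketch of the user-account hashing introduced above, producing the same output (assuming the hardcoded HMAC key from this diff): MD5-hex the password, HMAC-SHA256 that hex string, then URL-safe base64 with the '=' padding stripped.

import base64
import hashlib
import hmac

def cda_password_hash(password, hmac_key):
    pwd_md5 = hashlib.md5(password.encode('utf-8')).hexdigest()  # equals the per-byte hex loop above
    digest = hmac.new(hmac_key.encode('utf-8'), pwd_md5.encode('utf-8'), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode('utf-8').rstrip('=')  # '=' only appears as padding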
@@ -103,9 +126,6 @@ class CDAIE(CDABaseExtractor):
metadata = self._download_json(
self._BASE_URL + '/video/' + video_id, video_id, headers=headers)['video']
if metadata.get('premium') is True and metadata.get('premium_free') is not True:
raise ExtractorError('This video is only available for premium users.', expected=True)
uploader = try_get(metadata, lambda x: x['author']['login'])
# anonymous uploader
if uploader == 'anonim':
@@ -113,6 +133,8 @@ class CDAIE(CDABaseExtractor):
formats = []
for quality in metadata['qualities']:
if not quality['file']:
continue
formats.append({
'url': quality['file'],
'format': quality['title'],
@@ -121,6 +143,13 @@ class CDAIE(CDABaseExtractor):
'filesize': quality.get('length'),
})
if not formats:
if metadata.get('premium') is True and metadata.get('premium_free') is not True:
raise ExtractorError('This video is only available for premium users.', expected=True)
raise ExtractorError('No video qualities found', video_id=video_id)
self._sort_formats(formats)
return {
'id': video_id,
'title': metadata['title'],


@@ -157,7 +157,7 @@ class CeskaTelevizeIE(InfoExtractor):
stream_formats = self._extract_mpd_formats(
stream_url, playlist_id,
mpd_id='dash-%s' % format_id, fatal=False)
# See https://github.com/ytdl-org/haruhi-dl/issues/12119#issuecomment-280037031
# See https://github.com/ytdl-org/youtube-dl/issues/12119#issuecomment-280037031
if format_id == 'audioDescription':
for f in stream_formats:
f['source_preference'] = -10


@@ -17,7 +17,7 @@ import math
from ..compat import (
compat_cookiejar_Cookie,
compat_cookies,
compat_cookies_SimpleCookie,
compat_etree_Element,
compat_etree_fromstring,
compat_getpass,
@@ -70,6 +70,7 @@ from ..utils import (
str_or_none,
str_to_int,
strip_or_none,
try_get,
unescapeHTML,
unified_strdate,
unified_timestamp,
@@ -204,6 +205,14 @@ class InfoExtractor(object):
* downloader_options A dictionary of downloader options as
described in FileDownloader
Internally, extractors can include subtitles in the format
list, in this format:
* _subtitle The subtitle object, in the same format
as in subtitles field
* _key The tag for the provided subtitle
This is never included in the output JSON, but moved
into the subtitles field.
url: Final video URL.
ext: Video filename extension.
format: The video format, defaults to ext (used for --get-format)
@@ -230,8 +239,10 @@ class InfoExtractor(object):
uploader: Full name of the video uploader.
license: License name the video is licensed under.
creator: The creator of the video.
release_timestamp: UNIX timestamp of the moment the video was released.
release_date: The date (YYYYMMDD) when the video was released.
timestamp: UNIX timestamp of the moment the video became available.
timestamp: UNIX timestamp of the moment the video became available
(uploaded).
upload_date: Video upload date (YYYYMMDD).
If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader.
@@ -245,11 +256,15 @@ class InfoExtractor(object):
subtitles: The available subtitles as a dictionary in the format
{tag: subformats}. "tag" is usually a language code, and
"subformats" is a list sorted from lower to higher
preference, each element is a dictionary with the "ext"
entry and one of:
preference, each element is a dictionary,
which must contain one of these values:
* "data": The subtitles file contents
* "url": A URL pointing to the subtitles file
"ext" will be calculated from URL if missing
If missing, these values are guessed from other data,
in a way analogous to the formats data:
* "ext" - subtitle extension name (vtt, srt, ...)
* "proto" - download protocol (https, http, m3u8, ...)
* "http_headers"
automatic_captions: Like 'subtitles', used by the YoutubeIE for
automatically generated captions
duration: Length of the video in seconds, as an integer or float.
@@ -1273,6 +1288,23 @@ class InfoExtractor(object):
continue
info[count_key] = interaction_count
def extract_author(e):
if not e:
return None
if not e.get('author'):
return None
e = e['author']
if isinstance(e, str):
info['uploader'] = e
elif isinstance(e, dict):
etype = e.get('@type')
if etype in ('Person', 'Organization'):
info.update({
'uploader': e.get('name'),
'uploader_id': e.get('identifier'),
'uploader_url': try_get(e, lambda x: x['url']['url'], str),
})
media_object_types = ('MediaObject', 'VideoObject', 'AudioObject', 'MusicVideoObject')
def extract_media_object(e):
@@ -1290,7 +1322,6 @@ class InfoExtractor(object):
'thumbnails': thumbnails,
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'uploader': str_or_none(e.get('author')),
'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
@@ -1298,6 +1329,7 @@ class InfoExtractor(object):
'view_count': int_or_none(e.get('interactionCount')),
})
extract_interaction_statistic(e)
extract_author(e)
for e in json_ld:
if '@context' in e:
@@ -1391,6 +1423,10 @@ class InfoExtractor(object):
f['tbr'] = f['abr'] + f['vbr']
def _formats_key(f):
# manifest subtitle workaround
if '_subtitle' in f:
return (-1,)
# TODO remove the following workaround
from ..utils import determine_ext
if not f.get('ext') and 'url' in f:
@@ -1410,7 +1446,19 @@ class InfoExtractor(object):
preference -= 0.5
protocol = f.get('protocol') or determine_protocol(f)
proto_preference = 0 if protocol in ['http', 'https'] else (-0.5 if protocol == 'rtsp' else -0.1)
if protocol in ['http', 'https']:
proto_preference = 0
elif protocol == 'rtsp':
proto_preference = -0.5
elif protocol == 'bittorrent':
if self._downloader.params.get('prefer_p2p') is True:
proto_preference = 1
elif self._downloader.params.get('allow_p2p') is True:
proto_preference = -0.1
else:
proto_preference = -2
else:
proto_preference = -0.1
if f.get('vcodec') == 'none': # audio only
preference -= 50
@@ -1519,7 +1567,7 @@ class InfoExtractor(object):
manifest_url, video_id, 'Downloading f4m manifest',
'Unable to download f4m manifest',
# Some manifests may be malformed, e.g. prosiebensat1 generated manifests
# (see https://github.com/ytdl-org/haruhi-dl/issues/6215#issuecomment-121704244)
# (see https://github.com/ytdl-org/youtube-dl/issues/6215#issuecomment-121704244)
transform_source=transform_source,
fatal=fatal, data=data, headers=headers, query=query)
@@ -1550,7 +1598,7 @@ class InfoExtractor(object):
manifest_version = '2.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
# Remove unsupported DRM protected media from final formats
# rendition (see https://github.com/ytdl-org/haruhi-dl/issues/8573).
# rendition (see https://github.com/ytdl-org/youtube-dl/issues/8573).
media_nodes = remove_encrypted_media(media_nodes)
if not media_nodes:
return formats
@@ -1681,8 +1729,8 @@ class InfoExtractor(object):
# References:
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-21
# 2. https://github.com/ytdl-org/haruhi-dl/issues/12211
# 3. https://github.com/ytdl-org/haruhi-dl/issues/18923
# 2. https://github.com/ytdl-org/youtube-dl/issues/12211
# 3. https://github.com/ytdl-org/youtube-dl/issues/18923
# We should try extracting formats only from master playlists [1, 4.3.4],
# i.e. playlists that describe available qualities. On the other hand
@@ -1714,7 +1762,7 @@ class InfoExtractor(object):
if not (media_type and group_id and name):
return
groups.setdefault(group_id, []).append(media)
if media_type not in ('VIDEO', 'AUDIO'):
if media_type not in ('VIDEO', 'AUDIO', 'SUBTITLES'):
return
media_url = media.get('URI')
if media_url:
@@ -1722,17 +1770,27 @@ class InfoExtractor(object):
for v in (m3u8_id, group_id, name):
if v:
format_id.append(v)
f = {
'format_id': '-'.join(format_id),
'url': format_url(media_url),
'manifest_url': m3u8_url,
'language': media.get('LANGUAGE'),
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
}
if media_type == 'AUDIO':
f['vcodec'] = 'none'
if media_type == 'SUBTITLES':
f = {
'_subtitle': {
'url': format_url(media_url),
'ext': 'vtt',
'protocol': entry_protocol,
},
'_key': media.get('LANGUAGE'),
}
else:
f = {
'format_id': '-'.join(format_id),
'url': format_url(media_url),
'manifest_url': m3u8_url,
'language': media.get('LANGUAGE'),
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
}
if media_type == 'AUDIO':
f['vcodec'] = 'none'
formats.append(f)
def build_stream_name():
@@ -2238,7 +2296,7 @@ class InfoExtractor(object):
# First of all, % characters outside $...$ templates
# must be escaped by doubling for proper processing
# by % operator string formatting used further (see
# https://github.com/ytdl-org/haruhi-dl/issues/16867).
# https://github.com/ytdl-org/youtube-dl/issues/16867).
t = ''
in_template = False
for c in tmpl:
@@ -2257,7 +2315,7 @@ class InfoExtractor(object):
# @initialization is a regular template like @media one
# so it should be handled just the same way (see
# https://github.com/ytdl-org/haruhi-dl/issues/11605)
# https://github.com/ytdl-org/youtube-dl/issues/11605)
if 'initialization' in representation_ms_info:
initialization_template = prepare_template(
'initialization',
@@ -2343,7 +2401,7 @@ class InfoExtractor(object):
elif 'segment_urls' in representation_ms_info:
# Segment URLs with no SegmentTimeline
# Example: https://www.seznam.cz/zpravy/clanek/cesko-zasahne-vitr-o-sile-vichrice-muze-byt-i-zivotu-nebezpecny-39091
# https://github.com/ytdl-org/haruhi-dl/pull/14844
# https://github.com/ytdl-org/youtube-dl/pull/14844
fragments = []
segment_duration = float_or_none(
representation_ms_info['segment_duration'],
@@ -2381,8 +2439,8 @@ class InfoExtractor(object):
# According to [1, 5.3.5.2, Table 7, page 35] @id of Representation
# is not necessarily unique within a Period thus formats with
# the same `format_id` are quite possible. There are numerous examples
# of such manifests (see https://github.com/ytdl-org/haruhi-dl/issues/15111,
# https://github.com/ytdl-org/haruhi-dl/issues/13919)
# of such manifests (see https://github.com/ytdl-org/youtube-dl/issues/15111,
# https://github.com/ytdl-org/youtube-dl/issues/13919)
full_info = formats_dict.get(representation_id, {}).copy()
full_info.update(f)
formats.append(full_info)
@@ -2545,7 +2603,7 @@ class InfoExtractor(object):
media_tags.extend(re.findall(
# We only allow video|audio followed by a whitespace or '>'.
# Allowing more characters may end up in significant slow down (see
# https://github.com/ytdl-org/haruhi-dl/issues/11979, example URL:
# https://github.com/ytdl-org/youtube-dl/issues/11979, example URL:
# http://www.porntrex.com/maps/videositemap.xml).
r'(?s)(<(?P<tag>(?:amp-)?(?:video|audio))(?:\s+[^>]*)?>)(.*?)</(?P=tag)>', webpage))
for media_tag, media_type, media_content in media_tags:
@@ -2924,10 +2982,10 @@ class InfoExtractor(object):
self._downloader.cookiejar.set_cookie(cookie)
def _get_cookies(self, url):
""" Return a compat_cookies.SimpleCookie with the cookies for the url """
""" Return a compat_cookies_SimpleCookie with the cookies for the url """
req = sanitized_Request(url)
self._downloader.cookiejar.add_cookie_header(req)
return compat_cookies.SimpleCookie(req.get_header('Cookie'))
return compat_cookies_SimpleCookie(req.get_header('Cookie'))
def _apply_first_set_cookie_header(self, url_handle, cookie):
"""
@@ -2940,7 +2998,7 @@ class InfoExtractor(object):
We will workaround this issue by resetting the cookie to
the first one manually.
1. https://new.vk.com/
2. https://github.com/ytdl-org/haruhi-dl/issues/9841#issuecomment-227871201
2. https://github.com/ytdl-org/youtube-dl/issues/9841#issuecomment-227871201
3. https://learning.oreilly.com/
"""
for header, cookies in url_handle.headers.items():
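A minimal sketch (not the actual haruhi-dl code path) of how format entries carrying the internal '_subtitle'/'_key' keys documented above, and produced by the new SUBTITLES branch of the m3u8 parser, could be separated back out into the regular subtitles mapping:

def split_subtitle_formats(formats):
    real_formats, subtitles = [], {}
    for f in formats:
        if '_subtitle' in f:
            # '_key' is the language tag; fall back to 'und' (undetermined) if absent
            subtitles.setdefault(f.get('_key') or 'und', []).append(f['_subtitle'])
        else:
            real_formats.append(f)
    return real_formats, subtitles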


@@ -36,7 +36,7 @@ class UnicodeBOMIE(InfoExtractor):
_VALID_URL = r'(?P<bom>\ufeff)(?P<id>.*)$'
# Disable test for python 3.2 since BOM is broken in re in this version
# (see https://github.com/ytdl-org/haruhi-dl/issues/9751)
# (see https://github.com/ytdl-org/youtube-dl/issues/9751)
_TESTS = [] if (3, 0) < sys.version_info <= (3, 3) else [{
'url': '\ufeffhttp://www.youtube.com/watch?v=BaW_jenozKc',
'only_matching': True,


@@ -1,9 +1,15 @@
from __future__ import unicode_literals
from urllib.parse import parse_qs
from .common import InfoExtractor
from ..compat import (
compat_urlparse,
)
from ..utils import (
try_get,
ExtractorError,
)
class RtmpIE(InfoExtractor):
@@ -58,3 +64,71 @@ class MmsIE(InfoExtractor):
'title': title,
'url': url,
}
class BitTorrentMagnetIE(InfoExtractor):
IE_DESC = False
_VALID_URL = r'(?i)magnet:\?.+'
_TESTS = [{
'url': 'magnet:?xs=https%3A%2F%2Fvideo.internet-czas-dzialac.pl%2Fstatic%2Ftorrents%2F9085aa69-90c2-40c6-a707-3472b92cafc8-0.torrent&xt=urn:btih:0ae4cc8cb0e098a1a40b3224aa578bb4210a8cff&dn=Podcast+Internet.+Czas+dzia%C5%82a%C4%87!+-+Trailer&tr=wss%3A%2F%2Fvideo.internet-czas-dzialac.pl%3A443%2Ftracker%2Fsocket&tr=https%3A%2F%2Fvideo.internet-czas-dzialac.pl%2Ftracker%2Fannounce&ws=https%3A%2F%2Fvideo.internet-czas-dzialac.pl%2Fstatic%2Fwebseed%2F9085aa69-90c2-40c6-a707-3472b92cafc8-0.mp4',
'info_dict': {
'id': 'urn:btih:0ae4cc8cb0e098a1a40b3224aa578bb4210a8cff',
'ext': 'torrent',
'title': 'Podcast Internet. Czas działać! - Trailer',
},
'params': {
'allow_p2p': True,
'prefer_p2p': True,
'skip_download': True,
},
}]
def _real_extract(self, url):
qs = parse_qs(url[len('magnet:?'):])
# eXact Topic
video_id = qs['xt'][0]
if not video_id.startswith('urn:btih:'):
raise ExtractorError('Not a BitTorrent magnet')
# Display Name
title = try_get(qs, lambda x: x['dn'][0], str) or video_id[len('urn:btih:'):]
formats = [{
'url': url,
'protocol': 'bittorrent',
}]
# Web Seed
if qs.get('ws'):
for ws in qs['ws']:
formats.append({
'url': ws,
})
# Acceptable Source
if qs.get('as'):
for as_ in qs['as']:
formats.append({
'url': as_,
'preference': -2,
})
# eXact Source
if qs.get('xs'):
for xs in qs['xs']:
formats.append({
'url': xs,
'protocol': 'bittorrent',
})
self._sort_formats(formats)
# eXact Length
if qs.get('xl'):
xl = int(qs['xl'][0])
for f in formats:
    f['filesize'] = xl
return {
'id': video_id,
'title': title,
'formats': formats,
}
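For reference, the magnet parameters the extractor above reads, using only keys that appear in its test URL; a quick worked example with the same parse_qs helper:

from urllib.parse import parse_qs

# xt: exact topic (info hash, becomes the id), dn: display name (title),
# ws: web seed, as: acceptable source, xs: exact source, xl: exact length (bytes)
qs = parse_qs('xt=urn:btih:0ae4cc8cb0e098a1a40b3224aa578bb4210a8cff&dn=Trailer&xl=12345')
print(qs['xt'][0], qs['dn'][0], int(qs['xl'][0]))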


@@ -112,7 +112,7 @@ class CrunchyrollBaseIE(InfoExtractor):
# > This content may be inappropriate for some people.
# > Are you sure you want to continue?
# since it's not disabled by default in crunchyroll account's settings.
# See https://github.com/ytdl-org/haruhi-dl/issues/7202.
# See https://github.com/ytdl-org/youtube-dl/issues/7202.
qs['skip_wall'] = ['1']
return compat_urlparse.urlunparse(
parsed_url._replace(query=compat_urllib_parse_urlencode(qs, True)))
@@ -267,7 +267,7 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
request = (url_or_request if isinstance(url_or_request, compat_urllib_request.Request)
else sanitized_Request(url_or_request))
# Accept-Language must be set explicitly to accept any language to avoid issues
# similar to https://github.com/ytdl-org/haruhi-dl/issues/6797.
# similar to https://github.com/ytdl-org/youtube-dl/issues/6797.
# Along with IP address Crunchyroll uses Accept-Language to guess whether georestriction
# should be imposed or not (from what I can see it just takes the first language
# ignoring the priority and requires it to correspond to the IP). By the way this causes


@@ -25,12 +25,12 @@ class CuriosityStreamBaseIE(InfoExtractor):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error), expected=True)
def _call_api(self, path, video_id):
def _call_api(self, path, video_id, query=None):
headers = {}
if self._auth_token:
headers['X-Auth-Token'] = self._auth_token
result = self._download_json(
self._API_BASE_URL + path, video_id, headers=headers)
self._API_BASE_URL + path, video_id, headers=headers, query=query)
self._handle_errors(result)
return result['data']
@@ -52,62 +52,75 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/video/2',
'md5': '262bb2f257ff301115f1973540de8983',
'info_dict': {
'id': '2',
'ext': 'mp4',
'title': 'How Did You Develop The Internet?',
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
}
},
'params': {
'format': 'bestvideo',
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
media = self._call_api('media/' + video_id, video_id)
title = media['title']
formats = []
for encoding in media.get('encodings', []):
m3u8_url = encoding.get('master_playlist_url')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
encoding_url = encoding.get('url')
file_url = encoding.get('file_url')
if not encoding_url and not file_url:
continue
f = {
'width': int_or_none(encoding.get('width')),
'height': int_or_none(encoding.get('height')),
'vbr': int_or_none(encoding.get('video_bitrate')),
'abr': int_or_none(encoding.get('audio_bitrate')),
'filesize': int_or_none(encoding.get('size_in_bytes')),
'vcodec': encoding.get('video_codec'),
'acodec': encoding.get('audio_codec'),
'container': encoding.get('container_type'),
}
for f_url in (encoding_url, file_url):
if not f_url:
for encoding_format in ('m3u8', 'mpd'):
media = self._call_api('media/' + video_id, video_id, query={
'encodingsNew': 'true',
'encodingsFormat': encoding_format,
})
for encoding in media.get('encodings', []):
playlist_url = encoding.get('master_playlist_url')
if encoding_format == 'm3u8':
# use `m3u8` entry_protocol until EXT-X-MAP is properly supported by `m3u8_native` entry_protocol
formats.extend(self._extract_m3u8_formats(
playlist_url, video_id, 'mp4',
m3u8_id='hls', fatal=False))
elif encoding_format == 'mpd':
formats.extend(self._extract_mpd_formats(
playlist_url, video_id, mpd_id='dash', fatal=False))
encoding_url = encoding.get('url')
file_url = encoding.get('file_url')
if not encoding_url and not file_url:
continue
fmt = f.copy()
rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$', f_url)
if rtmp:
fmt.update({
'url': rtmp.group('url'),
'play_path': rtmp.group('playpath'),
'app': rtmp.group('app'),
'ext': 'flv',
'format_id': 'rtmp',
})
else:
fmt.update({
'url': f_url,
'format_id': 'http',
})
formats.append(fmt)
f = {
'width': int_or_none(encoding.get('width')),
'height': int_or_none(encoding.get('height')),
'vbr': int_or_none(encoding.get('video_bitrate')),
'abr': int_or_none(encoding.get('audio_bitrate')),
'filesize': int_or_none(encoding.get('size_in_bytes')),
'vcodec': encoding.get('video_codec'),
'acodec': encoding.get('audio_codec'),
'container': encoding.get('container_type'),
}
for f_url in (encoding_url, file_url):
if not f_url:
continue
fmt = f.copy()
rtmp = re.search(r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+))/(?P<playpath>mp[34]:.+)$', f_url)
if rtmp:
fmt.update({
'url': rtmp.group('url'),
'play_path': rtmp.group('playpath'),
'app': rtmp.group('app'),
'ext': 'flv',
'format_id': 'rtmp',
})
else:
fmt.update({
'url': f_url,
'format_id': 'http',
})
formats.append(fmt)
self._sort_formats(formats)
title = media['title']
subtitles = {}
for closed_caption in media.get('closed_captions', []):
sub_url = closed_caption.get('file')
@@ -132,7 +145,7 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream:collection'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collection|series)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/(?:collections?|series)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://app.curiositystream.com/collection/2',
'info_dict': {
@@ -140,10 +153,13 @@ class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
'title': 'Curious Minds: The Internet',
'description': 'How is the internet shaping our lives in the 21st Century?',
},
'playlist_mincount': 17,
'playlist_mincount': 16,
}, {
'url': 'https://curiositystream.com/series/2',
'only_matching': True,
}, {
'url': 'https://curiositystream.com/collections/36',
'only_matching': True,
}]
def _real_extract(self, url):


@@ -32,6 +32,18 @@ class DigitallySpeakingIE(InfoExtractor):
# From http://www.gdcvault.com/play/1013700/Advanced-Material
'url': 'http://sevt.dispeak.com/ubm/gdc/eur10/xml/11256_1282118587281VNIT.xml',
'only_matching': True,
}, {
# From https://gdcvault.com/play/1016624, empty speakerVideo
'url': 'https://sevt.dispeak.com/ubm/gdc/online12/xml/201210-822101_1349794556671DDDD.xml',
'info_dict': {
'id': '201210-822101_1349794556671DDDD',
'ext': 'flv',
'title': 'Pre-launch - Preparing to Take the Plunge',
},
}, {
# From http://www.gdcvault.com/play/1014846/Conference-Keynote-Shigeru, empty slideVideo
'url': 'http://events.digitallyspeaking.com/gdc/project25/xml/p25-miyamoto1999_1282467389849HSVB.xml',
'only_matching': True,
}]
def _parse_mp4(self, metadata):
@@ -84,26 +96,20 @@ class DigitallySpeakingIE(InfoExtractor):
'vcodec': 'none',
'format_id': audio.get('code'),
})
slide_video_path = xpath_text(metadata, './slideVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(slide_video_path, '.flv'),
'ext': 'flv',
'format_note': 'slide deck video',
'quality': -2,
'preference': -2,
'format_id': 'slides',
})
speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(speaker_video_path, '.flv'),
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
for video_key, format_id, preference in (
('slide', 'slides', -2), ('speaker', 'speaker', -1)):
video_path = xpath_text(metadata, './%sVideo' % video_key)
if not video_path:
continue
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(video_path, '.flv'),
'ext': 'flv',
'format_note': '%s video' % video_key,
'quality': preference,
'preference': preference,
'format_id': format_id,
})
return formats
def _real_extract(self, url):


@@ -1,193 +1,43 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_strdate,
xpath_text,
determine_ext,
float_or_none,
ExtractorError,
)
from .zdf import ZDFIE
class DreiSatIE(InfoExtractor):
class DreiSatIE(ZDFIE):
IE_NAME = '3sat'
_GEO_COUNTRIES = ['DE']
_VALID_URL = r'https?://(?:www\.)?3sat\.de/mediathek/(?:(?:index|mediathek)\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)'
_TESTS = [
{
'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',
'md5': 'be37228896d30a88f315b638900a026e',
'info_dict': {
'id': '45918',
'ext': 'mp4',
'title': 'Waidmannsheil',
'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
'uploader': 'SCHWEIZWEIT',
'uploader_id': '100000210',
'upload_date': '20140913'
},
'params': {
'skip_download': True, # m3u8 downloads
}
_VALID_URL = r'https?://(?:www\.)?3sat\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)\.html'
_TESTS = [{
# Same as https://www.zdf.de/dokumentation/ab-18/10-wochen-sommer-102.html
'url': 'https://www.3sat.de/film/ab-18/10-wochen-sommer-108.html',
'md5': '0aff3e7bc72c8813f5e0fae333316a1d',
'info_dict': {
'id': '141007_ab18_10wochensommer_film',
'ext': 'mp4',
'title': 'Ab 18! - 10 Wochen Sommer',
'description': 'md5:8253f41dc99ce2c3ff892dac2d65fe26',
'duration': 2660,
'timestamp': 1608604200,
'upload_date': '20201222',
},
{
'url': 'http://www.3sat.de/mediathek/mediathek.php?mode=play&obj=51066',
'only_matching': True,
}, {
'url': 'https://www.3sat.de/gesellschaft/schweizweit/waidmannsheil-100.html',
'info_dict': {
'id': '140913_sendung_schweizweit',
'ext': 'mp4',
'title': 'Waidmannsheil',
'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
'timestamp': 1410623100,
'upload_date': '20140913'
},
]
def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
param_groups = {}
for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
group_id = param_group.get(self._xpath_ns(
'id', 'http://www.w3.org/XML/1998/namespace'))
params = {}
for param in param_group:
params[param.get('name')] = param.get('value')
param_groups[group_id] = params
formats = []
for video in smil.findall(self._xpath_ns('.//video', namespace)):
src = video.get('src')
if not src:
continue
bitrate = int_or_none(self._search_regex(r'_(\d+)k', src, 'bitrate', None)) or float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
group_id = video.get('paramGroup')
param_group = param_groups[group_id]
for proto in param_group['protocols'].split(','):
formats.append({
'url': '%s://%s' % (proto, param_group['host']),
'app': param_group['app'],
'play_path': src,
'ext': 'flv',
'format_id': '%s-%d' % (proto, bitrate),
'tbr': bitrate,
})
self._sort_formats(formats)
return formats
def extract_from_xml_url(self, video_id, xml_url):
doc = self._download_xml(
xml_url, video_id,
note='Downloading video info',
errnote='Failed to download video info')
status_code = xpath_text(doc, './status/statuscode')
if status_code and status_code != 'ok':
if status_code == 'notVisibleAnymore':
message = 'Video %s is not available' % video_id
else:
message = '%s returned error: %s' % (self.IE_NAME, status_code)
raise ExtractorError(message, expected=True)
title = xpath_text(doc, './/information/title', 'title', True)
urls = []
formats = []
for fnode in doc.findall('.//formitaeten/formitaet'):
video_url = xpath_text(fnode, 'url')
if not video_url or video_url in urls:
continue
urls.append(video_url)
is_available = 'http://www.metafilegenerator' not in video_url
geoloced = 'static_geoloced_online' in video_url
if not is_available or geoloced:
continue
format_id = fnode.attrib['basetype']
format_m = re.match(r'''(?x)
(?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
(?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
''', format_id)
ext = determine_ext(video_url, None) or format_m.group('container')
if ext == 'meta':
continue
elif ext == 'smil':
formats.extend(self._extract_smil_formats(
video_url, video_id, fatal=False))
elif ext == 'm3u8':
# the certificates are misconfigured (see
# https://github.com/ytdl-org/haruhi-dl/issues/8665)
if video_url.startswith('https://'):
continue
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
video_url, video_id, f4m_id=format_id, fatal=False))
else:
quality = xpath_text(fnode, './quality')
if quality:
format_id += '-' + quality
abr = int_or_none(xpath_text(fnode, './audioBitrate'), 1000)
vbr = int_or_none(xpath_text(fnode, './videoBitrate'), 1000)
tbr = int_or_none(self._search_regex(
r'_(\d+)k', video_url, 'bitrate', None))
if tbr and vbr and not abr:
abr = tbr - vbr
formats.append({
'format_id': format_id,
'url': video_url,
'ext': ext,
'acodec': format_m.group('acodec'),
'vcodec': format_m.group('vcodec'),
'abr': abr,
'vbr': vbr,
'tbr': tbr,
'width': int_or_none(xpath_text(fnode, './width')),
'height': int_or_none(xpath_text(fnode, './height')),
'filesize': int_or_none(xpath_text(fnode, './filesize')),
'protocol': format_m.group('proto').lower(),
})
geolocation = xpath_text(doc, './/details/geolocation')
if not formats and geolocation and geolocation != 'none':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
self._sort_formats(formats)
thumbnails = []
for node in doc.findall('.//teaserimages/teaserimage'):
thumbnail_url = node.text
if not thumbnail_url:
continue
thumbnail = {
'url': thumbnail_url,
}
thumbnail_key = node.get('key')
if thumbnail_key:
m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
if m:
thumbnail['width'] = int(m.group(1))
thumbnail['height'] = int(m.group(2))
thumbnails.append(thumbnail)
upload_date = unified_strdate(xpath_text(doc, './/details/airtime'))
return {
'id': video_id,
'title': title,
'description': xpath_text(doc, './/information/detail'),
'duration': int_or_none(xpath_text(doc, './/details/lengthSec')),
'thumbnails': thumbnails,
'uploader': xpath_text(doc, './/details/originChannelTitle'),
'uploader_id': xpath_text(doc, './/details/originChannelId'),
'upload_date': upload_date,
'formats': formats,
'params': {
'skip_download': True,
}
def _real_extract(self, url):
video_id = self._match_id(url)
details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?id=%s' % video_id
return self.extract_from_xml_url(video_id, details_url)
}, {
# Same as https://www.zdf.de/filme/filme-sonstige/der-hauptmann-112.html
'url': 'https://www.3sat.de/film/spielfilm/der-hauptmann-100.html',
'only_matching': True,
}, {
# Same as https://www.zdf.de/wissen/nano/nano-21-mai-2019-102.html, equal media ids
'url': 'https://www.3sat.de/wissen/nano/nano-21-mai-2019-102.html',
'only_matching': True,
}]


@@ -1,15 +1,57 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
smuggle_url,
unified_strdate,
unsmuggle_url,
ExtractorError,
)
from ..compat import compat_urlparse
class DWIE(InfoExtractor):
class DWVideoIE(InfoExtractor):
IE_NAME = 'dw:video'
_VALID_URL = r'dw:(?P<id>\d+)'
def _get_dw_formats(self, media_id, hidden_inputs):
if hidden_inputs.get('player_type') == 'video':
# https://www.dw.com/smil/v-{video_id} returns more formats,
# but they are all RTMP. ytdl used to do this:
# url.replace('rtmp://tv-od.dw.de/flash/', 'http://tv-download.dw.de/dwtv_video/flv/')
# that rewrite yields formats, but whether they actually work is essentially random.
formats = [{
'url': fmt['file'],
'format_code': fmt['label'],
'height': int_or_none(fmt['label']),
} for fmt in self._download_json(
'https://www.dw.com/playersources/v-%s' % media_id,
media_id, 'Downloading JSON formats')]
self._sort_formats(formats)
else:
formats = [{'url': hidden_inputs['file_name']}]
return {
'id': media_id,
'title': hidden_inputs['media_title'],
'formats': formats,
'duration': int_or_none(hidden_inputs.get('file_duration')),
'upload_date': hidden_inputs.get('display_date'),
'thumbnail': hidden_inputs.get('preview_image'),
'is_live': hidden_inputs.get('isLiveVideo'),
}
def _real_extract(self, url):
media_id = self._match_id(url)
_, hidden_inputs = unsmuggle_url(url)
if not hidden_inputs:
return self.url_result('https://www.dw.com/en/av-%s' % media_id, 'DW', media_id)
return self._get_dw_formats(media_id, hidden_inputs)
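Illustrative only: the shape of the playersources response the code above assumes, with field names taken from the code and values made up:

playersources_example = [
    {'file': 'https://tv-download.dw.de/dwtv_video/mp4/example_720p.mp4', 'label': '720'},
    {'file': 'https://tv-download.dw.de/dwtv_video/mp4/example_360p.mp4', 'label': '360'},
]  # 'label' doubles as the height, hence height=int_or_none(fmt['label'])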
class DWIE(DWVideoIE):
IE_NAME = 'dw'
_VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+(?:av|e)-(?P<id>\d+)'
_TESTS = [{
@@ -21,7 +63,7 @@ class DWIE(InfoExtractor):
'ext': 'mp4',
'title': 'Intelligent light',
'description': 'md5:90e00d5881719f2a6a5827cb74985af1',
'upload_date': '20160311',
'upload_date': '20160605',
}
}, {
# audio
@@ -52,57 +94,57 @@ class DWIE(InfoExtractor):
media_id = self._match_id(url)
webpage = self._download_webpage(url, media_id)
hidden_inputs = self._hidden_inputs(webpage)
title = hidden_inputs['media_title']
media_id = hidden_inputs.get('media_id') or media_id
if hidden_inputs.get('player_type') == 'video' and hidden_inputs.get('stream_file') == '1':
formats = self._extract_smil_formats(
'http://www.dw.com/smil/v-%s' % media_id, media_id,
transform_source=lambda s: s.replace(
'rtmp://tv-od.dw.de/flash/',
'http://tv-download.dw.de/dwtv_video/flv/'))
self._sort_formats(formats)
else:
formats = [{'url': hidden_inputs['file_name']}]
info_dict = {
'description': self._og_search_description(webpage),
}
info_dict.update(self._get_dw_formats(media_id, hidden_inputs))
upload_date = hidden_inputs.get('display_date')
if not upload_date:
if info_dict.get('upload_date') is None:
upload_date = self._html_search_regex(
r'<span[^>]+class="date">([0-9.]+)\s*\|', webpage,
'upload date', default=None)
upload_date = unified_strdate(upload_date)
info_dict['upload_date'] = unified_strdate(upload_date)
return {
'id': media_id,
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': hidden_inputs.get('preview_image'),
'duration': int_or_none(hidden_inputs.get('file_duration')),
'upload_date': upload_date,
'formats': formats,
}
return info_dict
class DWArticleIE(InfoExtractor):
class DWArticleIE(DWVideoIE):
IE_NAME = 'dw:article'
_VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+a-(?P<id>\d+)'
_TEST = {
'url': 'http://www.dw.com/en/no-hope-limited-options-for-refugees-in-idomeni/a-19111009',
'md5': '8ca657f9d068bbef74d6fc38b97fc869',
'url': 'https://www.dw.com/pl/zalecenie-ema-szczepmy-si%C4%99-astrazenec%C4%85/a-56919770',
'info_dict': {
'id': '19105868',
'id': '56911196',
'ext': 'mp4',
'title': 'The harsh life of refugees in Idomeni',
'description': 'md5:196015cc7e48ebf474db9399420043c7',
'upload_date': '20160310',
}
'title': 'Czy AstraZeneca jest bezpieczna?',
'upload_date': '20210318',
},
}
def _real_extract(self, url):
article_id = self._match_id(url)
webpage = self._download_webpage(url, article_id)
hidden_inputs = self._hidden_inputs(webpage)
media_id = hidden_inputs['media_id']
media_path = self._search_regex(r'href="([^"]+av-%s)"\s+class="overlayLink"' % media_id, webpage, 'media url')
media_url = compat_urlparse.urljoin(url, media_path)
return self.url_result(media_url, 'DW', media_id)
# materialize the matches: a re.finditer() iterator is always truthy,
# so the "no videos" check below would otherwise never trigger
videos = list(re.finditer(
    r'<div class="mediaItem" data-media-id="(?P<id>\d+)">(?P<hidden_inputs>.+?)<div',
    webpage))
if not videos:
    raise ExtractorError('No videos found')
entries = []
for video in videos:
video_id, hidden_inputs = video.group('id', 'hidden_inputs')
hidden_inputs = self._hidden_inputs(hidden_inputs)
entries.append({
'_type': 'url_transparent',
'title': hidden_inputs['media_title'],
'url': smuggle_url('dw:%s' % video_id, hidden_inputs),
'ie_key': 'DWVideo',
})
return {
'_type': 'playlist',
'entries': entries,
'id': article_id,
'title': self._html_search_regex(r'<h1>([^>]+)</h1>', webpage, 'article title'),
'description': self._og_search_description(webpage),
}
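A minimal sketch of the smuggle_url round trip used above, assuming the helpers behave as in youtube-dl: the extra dict is serialized into a URL fragment, so DWArticleIE can hand its already-scraped hidden inputs to DWVideoIE without a second page download.

from haruhi_dl.utils import smuggle_url, unsmuggle_url  # assumption: same location as in youtube-dl

url = smuggle_url('dw:56911196', {'media_title': 'Czy AstraZeneca jest bezpieczna?'})
plain_url, data = unsmuggle_url(url)
print(plain_url, data['media_title'])  # dw:56911196 Czy AstraZeneca jest bezpieczna?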


@@ -22,16 +22,19 @@ class EggheadBaseIE(InfoExtractor):
class EggheadCourseIE(EggheadBaseIE):
IE_DESC = 'egghead.io course'
IE_NAME = 'egghead:course'
_VALID_URL = r'https://egghead\.io/courses/(?P<id>[^/?#&]+)'
_TEST = {
_VALID_URL = r'https://(?:app\.)?egghead\.io/(?:course|playlist)s/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript',
'playlist_count': 29,
'info_dict': {
'id': '72',
'id': '432655',
'title': 'Professor Frisby Introduces Composable Functional JavaScript',
'description': 're:(?s)^This course teaches the ubiquitous.*You\'ll start composing functionality before you know it.$',
},
}
}, {
'url': 'https://app.egghead.io/playlists/professor-frisby-introduces-composable-functional-javascript',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
@@ -65,7 +68,7 @@ class EggheadCourseIE(EggheadBaseIE):
class EggheadLessonIE(EggheadBaseIE):
IE_DESC = 'egghead.io lesson'
IE_NAME = 'egghead:lesson'
_VALID_URL = r'https://egghead\.io/(?:api/v1/)?lessons/(?P<id>[^/?#&]+)'
_VALID_URL = r'https://(?:app\.)?egghead\.io/(?:api/v1/)?lessons/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
'info_dict': {
@@ -88,6 +91,9 @@ class EggheadLessonIE(EggheadBaseIE):
}, {
'url': 'https://egghead.io/api/v1/lessons/react-add-redux-to-a-react-application',
'only_matching': True,
}, {
'url': 'https://app.egghead.io/lessons/javascript-linear-data-flow-with-container-style-types-box',
'only_matching': True,
}]
def _real_extract(self, url):


@@ -6,7 +6,7 @@ from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlencode
from ..utils import (
ExtractorError,
unescapeHTML
merge_dicts,
)
@@ -24,7 +24,8 @@ class EroProfileIE(InfoExtractor):
'title': 'sexy babe softcore',
'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18,
}
},
'skip': 'Video not found',
}, {
'url': 'http://www.eroprofile.com/m/videos/view/Try-It-On-Pee_cut_2-wmv-4shared-com-file-sharing-download-movie-file',
'md5': '1baa9602ede46ce904c431f5418d8916',
@ -77,19 +78,15 @@ class EroProfileIE(InfoExtractor):
[r"glbUpdViews\s*\('\d*','(\d+)'", r'p/report/video/(\d+)'],
webpage, 'video id', default=None)
video_url = unescapeHTML(self._search_regex(
r'<source src="([^"]+)', webpage, 'video url'))
title = self._html_search_regex(
r'Title:</th><td>([^<]+)</td>', webpage, 'title')
thumbnail = self._search_regex(
r'onclick="showVideoPlayer\(\)"><img src="([^"]+)',
webpage, 'thumbnail', fatal=False)
(r'Title:</th><td>([^<]+)</td>', r'<h1[^>]*>(.+?)</h1>'),
webpage, 'title')
return {
info = self._parse_html5_media_entries(url, webpage, video_id)[0]
return merge_dicts(info, {
'id': video_id,
'display_id': display_id,
'url': video_url,
'title': title,
'thumbnail': thumbnail,
'age_limit': 18,
}
})
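merge_dicts() lets earlier dicts win for keys they already fill, so fields parsed from the HTML5 <video>/<source> tags take priority and the literal dict only supplies what is missing. A rough illustration, assuming haruhi-dl keeps youtube-dl's helper (values invented):

from haruhi_dl.utils import merge_dicts

info = {'formats': ['...'], 'thumbnail': 'https://example.invalid/a.jpg'}
merged = merge_dicts(info, {'thumbnail': 'fallback.jpg', 'age_limit': 18})
# merged['thumbnail'] == 'https://example.invalid/a.jpg'; merged['age_limit'] == 18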

haruhi_dl/extractor/extractors.py

@ -17,6 +17,7 @@ from .academicearth import AcademicEarthCourseIE
from .acast import (
ACastIE,
ACastChannelIE,
ACastPlayerIE,
)
from .adn import ADNIE
from .adobeconnect import AdobeConnectIE
@ -82,6 +83,7 @@ from .arte import (
ArteTVEmbedIE,
ArteTVPlaylistIE,
)
from .arnes import ArnesIE
from .asiancrush import (
AsianCrushIE,
AsianCrushPlaylistIE,
@ -100,11 +102,13 @@ from .awaan import (
)
from .azmedien import AZMedienIE
from .baidu import BaiduVideoIE
from .bandaichannel import BandaiChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
from .bbc import (
BBCCoUkIE,
BBCCoUkArticleIE,
BBCCoUkIPlayerPlaylistIE,
BBCCoUkIPlayerEpisodesIE,
BBCCoUkIPlayerGroupIE,
BBCCoUkPlaylistIE,
BBCIE,
)
@ -139,7 +143,6 @@ from .bleacherreport import (
BleacherReportIE,
BleacherReportCMSIE,
)
from .blinkx import BlinkxIE
from .bloomberg import BloombergIE
from .bokecc import BokeCCIE
from .bongacams import BongaCamsIE
@ -180,6 +183,7 @@ from .carambatv import (
CarambaTVPageIE,
)
from .cartoonnetwork import CartoonNetworkIE
from .castos import CastosHostedIE
from .cbc import (
CBCIE,
CBCPlayerIE,
@ -198,7 +202,11 @@ from .cbsnews import (
CBSNewsIE,
CBSNewsLiveVideoIE,
)
from .cbssports import CBSSportsIE
from .cbssports import (
CBSSportsEmbedIE,
CBSSportsIE,
TwentyFourSevenSportsIE,
)
from .ccc import (
CCCIE,
CCCPlaylistIE,
@ -251,6 +259,7 @@ from .comedycentral import (
)
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonprotocols import (
BitTorrentMagnetIE,
MmsIE,
RtmpIE,
)
@ -327,6 +336,7 @@ from .dropbox import DropboxIE
from .dw import (
DWIE,
DWArticleIE,
DWVideoIE,
)
from .eagleplatform import EaglePlatformIE
from .ebaumsworld import EbaumsWorldIE
@ -620,7 +630,11 @@ from .limelight import (
LimelightChannelIE,
LimelightChannelListIE,
)
from .line import LineTVIE
from .line import (
LineTVIE,
LineLiveIE,
LineLiveChannelIE,
)
from .linkedin import (
LinkedInPostIE,
LinkedInLearningIE,
@ -629,10 +643,6 @@ from .linkedin import (
from .linuxacademy import LinuxAcademyIE
from .litv import LiTVIE
from .livejournal import LiveJournalIE
from .liveleak import (
LiveLeakIE,
LiveLeakEmbedIE,
)
from .livestream import (
LivestreamIE,
LivestreamOriginalIE,
@ -648,6 +658,7 @@ from .lynda import (
LyndaCourseIE
)
from .m6 import M6IE
from .magentamusik360 import MagentaMusik360IE
from .mailru import (
MailRuIE,
MailRuMusicIE,
@ -659,6 +670,7 @@ from .mangomolo import (
MangomoloLiveIE,
)
from .manyvids import ManyVidsIE
from .maoritv import MaoriTVIE
from .markiza import (
MarkizaIE,
MarkizaPageIE,
@ -696,6 +708,7 @@ from .minds import (
from .ministrygrid import MinistryGridIE
from .minoto import MinotoIE
from .miomio import MioMioIE
from .misskey import MisskeySHIE
from .mit import TechTVMITIE, OCWMITIE
from .mitele import MiTeleIE
from .mixcloud import (
@ -703,7 +716,10 @@ from .mixcloud import (
MixcloudUserIE,
MixcloudPlaylistIE,
)
from .mlb import MLBIE
from .mlb import (
MLBIE,
MLBVideoIE,
)
from .mnet import MnetIE
from .moevideo import MoeVideoIE
from .mofosex import (
@ -811,7 +827,6 @@ from .nick import (
NickRuIE,
)
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninateka import NinatekaIE
from .ninecninemedia import NineCNineMediaIE
from .ninegag import NineGagIE
from .ninenow import NineNowIE
@ -908,12 +923,23 @@ from .packtpub import (
PacktPubIE,
PacktPubCourseIE,
)
from .palcomp3 import (
PalcoMP3IE,
PalcoMP3ArtistIE,
PalcoMP3VideoIE,
)
from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .patronite import PatroniteAudioIE
from .pbs import PBSIE
from .pearvideo import PearVideoIE
from .peertube import PeerTubeSHIE
from .peertube import (
PeerTubeSHIE,
PeerTubePlaylistSHIE,
PeerTubeAccountSHIE,
PeerTubeChannelSHIE,
)
from .people import PeopleIE
from .performgroup import PerformGroupIE
from .periscope import (
@ -941,6 +967,7 @@ from .platzi import (
from .playfm import PlayFMIE
from .playplustv import PlayPlusTVIE
from .plays import PlaysTVIE
from .playstuff import PlayStuffIE
from .playtvak import PlaytvakIE
from .playvid import PlayvidIE
from .playwire import PlaywireIE
@ -955,6 +982,9 @@ from .polskieradio import (
PolskieRadioIE,
PolskieRadioCategoryIE,
PolskieRadioPlayerIE,
PolskieRadioPodcastIE,
PolskieRadioPodcastListIE,
PolskieRadioRadioKierowcowIE,
)
from .popcorntimes import PopcorntimesIE
from .popcorntv import PopcornTVIE
@ -1001,6 +1031,10 @@ from .radiode import RadioDeIE
from .radiojavan import RadioJavanIE
from .radiobremen import RadioBremenIE
from .radiofrance import RadioFranceIE
from .radiokapital import (
RadioKapitalIE,
RadioKapitalShowIE,
)
from .rai import (
RaiPlayIE,
RaiPlayLiveIE,
@ -1091,7 +1125,12 @@ from .scte import (
SCTECourseIE,
)
from .seeker import SeekerIE
from .sejmpl import (
SejmPlIE,
SejmPlVideoIE,
)
from .senateisvp import SenateISVPIE
from .senatpl import SenatPlIE
from .sendtonews import SendtoNewsIE
from .servus import ServusIE
from .sevenplus import SevenPlusIE
@ -1191,6 +1230,10 @@ from .spreaker import (
)
from .springboardplatform import SpringboardPlatformIE
from .sprout import SproutIE
from .spryciarze import (
SpryciarzeIE,
SpryciarzePageIE,
)
from .srgssr import (
SRGSSRIE,
SRGSSRPlayIE,
@ -1279,6 +1322,8 @@ from .threeqsdn import ThreeQSDNIE
from .tiktok import (
TikTokIE,
TikTokUserIE,
TikTokHashtagIE,
TikTokMusicIE,
)
from .tinypic import TinyPicIE
from .tmz import (
@ -1359,10 +1404,9 @@ from .tvc import (
from .tver import TVerIE
from .tvigle import TvigleIE
from .tvland import TVLandIE
from .tvn24 import TVN24IE
from .tvnplayer import (
TVNPlayerIE,
TVNPlayerSeriesIE,
from .tvn24 import (
TVN24IE,
TVN24NuviIE,
)
from .tvnet import TVNetIE
from .tvnoe import TVNoeIE
@ -1468,6 +1512,8 @@ from .videomore import (
VideomoreSeasonIE,
)
from .videopress import VideoPressIE
from .videotarget import VideoTargetIE
from .vider import ViderIE
from .vidio import VidioIE
from .vidlii import VidLiiIE
from .vidme import (
@ -1516,7 +1562,7 @@ from .vk import (
from .vlive import (
VLiveIE,
VLiveChannelIE,
VLivePlaylistIE
VLivePostIE,
)
from .vodlocker import VodlockerIE
from .vodpl import VODPlIE
@ -1574,6 +1620,10 @@ from .weibo import (
from .weiqitv import WeiqiTVIE
from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
from .wppilot import (
WPPilotIE,
WPPilotChannelsIE,
)
from .wppl import WpPlIE
from .wsj import (
WSJIE,
@ -1619,6 +1669,8 @@ from .yandexmusic import (
YandexMusicTrackIE,
YandexMusicAlbumIE,
YandexMusicPlaylistIE,
YandexMusicArtistTracksIE,
YandexMusicArtistAlbumsIE,
)
from .yandexvideo import YandexVideoIE
from .yapfiles import YapFilesIE
@ -1674,5 +1726,9 @@ from .zattoo import (
)
from .zdf import ZDFIE, ZDFChannelIE
from .zhihu import ZhihuIE
from .zingmp3 import ZingMp3IE
from .zingmp3 import (
ZingMp3IE,
ZingMp3AlbumIE,
)
from .zoom import ZoomIE
from .zype import ZypeIE

haruhi_dl/extractor/facebook.py

@ -521,7 +521,10 @@ class FacebookIE(InfoExtractor):
raise ExtractorError(
'The video is not available, Facebook said: "%s"' % m_msg.group(1),
expected=True)
elif '>You must log in to continue' in webpage:
elif any(p in webpage for p in (
'>You must log in to continue',
'id="login_form"',
'id="loginbutton"')):
self.raise_login_required()
if not video_data and '/watchparty/' in url:

haruhi_dl/extractor/formula1.py

@ -5,29 +5,23 @@ from .common import InfoExtractor
class Formula1IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?formula1\.com/(?:content/fom-website/)?en/video/\d{4}/\d{1,2}/(?P<id>.+?)\.html'
_TESTS = [{
'url': 'http://www.formula1.com/content/fom-website/en/video/2016/5/Race_highlights_-_Spain_2016.html',
'md5': '8c79e54be72078b26b89e0e111c0502b',
_VALID_URL = r'https?://(?:www\.)?formula1\.com/en/latest/video\.[^.]+\.(?P<id>\d+)\.html'
_TEST = {
'url': 'https://www.formula1.com/en/latest/video.race-highlights-spain-2016.6060988138001.html',
'md5': 'be7d3a8c2f804eb2ab2aa5d941c359f8',
'info_dict': {
'id': 'JvYXJpMzE6pArfHWm5ARp5AiUmD-gibV',
'id': '6060988138001',
'ext': 'mp4',
'title': 'Race highlights - Spain 2016',
'timestamp': 1463332814,
'upload_date': '20160515',
'uploader_id': '6057949432001',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://www.formula1.com/en/video/2016/5/Race_highlights_-_Spain_2016.html',
'only_matching': True,
}]
'add_ie': ['BrightcoveNew'],
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/6057949432001/S1WMrhjlh_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
ooyala_embed_code = self._search_regex(
r'data-videoid="([^"]+)"', webpage, 'ooyala embed code')
bc_id = self._match_id(url)
return self.url_result(
'ooyala:%s' % ooyala_embed_code, 'Ooyala', ooyala_embed_code)
self.BRIGHTCOVE_URL_TEMPLATE % bc_id, 'BrightcoveNew', bc_id)

haruhi_dl/extractor/francetv.py

@ -164,7 +164,7 @@ class FranceTVIE(InfoExtractor):
ext = determine_ext(video_url)
if ext == 'f4m':
if georestricted:
# See https://github.com/ytdl-org/haruhi-dl/issues/3963
# See https://github.com/ytdl-org/youtube-dl/issues/3963
# m3u8 urls work fine
continue
formats.extend(self._extract_f4m_formats(
@ -383,6 +383,10 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
}, {
'url': 'http://france3-regions.francetvinfo.fr/limousin/emissions/jt-1213-limousin',
'only_matching': True,
}, {
# "<figure id=" pattern (#28792)
'url': 'https://www.francetvinfo.fr/culture/patrimoine/incendie-de-notre-dame-de-paris/notre-dame-de-paris-de-l-incendie-de-la-cathedrale-a-sa-reconstruction_4372291.html',
'only_matching': True,
}]
def _real_extract(self, url):
@ -399,7 +403,8 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
video_id = self._search_regex(
(r'player\.load[^;]+src:\s*["\']([^"\']+)',
r'id-video=([^@]+@[^"]+)',
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"'),
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"',
r'(?:data-id|<figure[^<]+\bid)=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'),
webpage, 'video id')
return self._make_url_result(video_id)

haruhi_dl/extractor/fujitv.py

@ -17,7 +17,7 @@ class FujiTVFODPlus7IE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
formats = self._extract_m3u8_formats(
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id)
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id, 'mp4')
for f in formats:
wh = self._BITRATE_MAP.get(f.get('tbr'))
if wh:

haruhi_dl/extractor/funimation.py

@ -16,7 +16,7 @@ from ..utils import (
class FunimationIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/shows/[^/]+/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?:[^/]+/)?shows/[^/]+/(?P<id>[^/?#&]+)'
_NETRC_MACHINE = 'funimation'
_TOKEN = None
@ -51,6 +51,10 @@ class FunimationIE(InfoExtractor):
}, {
'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/',
'only_matching': True,
}, {
# with lang code
'url': 'https://www.funimation.com/en/shows/hacksign/role-play/',
'only_matching': True,
}]
def _login(self):

haruhi_dl/extractor/gdcvault.py

@ -6,6 +6,7 @@ from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import (
HEADRequest,
remove_start,
sanitized_Request,
smuggle_url,
urlencode_postdata,
@ -102,6 +103,26 @@ class GDCVaultIE(InfoExtractor):
'format': 'mp4-408',
},
},
{
# Kaltura embed, whitespace between quote and embedded URL in iframe's src
'url': 'https://www.gdcvault.com/play/1025699',
'info_dict': {
'id': '0_zagynv0a',
'ext': 'mp4',
'title': 'Tech Toolbox',
'upload_date': '20190408',
'uploader_id': 'joe@blazestreaming.com',
'timestamp': 1554764629,
},
'params': {
'skip_download': True,
},
},
{
# HTML5 video
'url': 'http://www.gdcvault.com/play/1014846/Conference-Keynote-Shigeru',
'only_matching': True,
},
]
def _login(self, webpage_url, display_id):
@ -175,7 +196,18 @@ class GDCVaultIE(InfoExtractor):
xml_name = self._html_search_regex(
r'<iframe src=".*?\?xml(?:=|URL=xml/)(.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename')
start_page, 'xml filename', default=None)
if not xml_name:
info = self._parse_html5_media_entries(url, start_page, video_id)[0]
info.update({
'title': remove_start(self._search_regex(
r'>Session Name:\s*<.*?>\s*<td>(.+?)</td>', start_page,
'title', default=None) or self._og_search_title(
start_page, default=None), 'GDC Vault - '),
'id': video_id,
'display_id': display_id,
})
return info
embed_url = '%s/xml/%s' % (xml_root, xml_name)
ie_key = 'DigitallySpeaking'

haruhi_dl/extractor/generic.py

@ -84,7 +84,6 @@ from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE
from .arkena import ArkenaIE
from .instagram import InstagramIE
from .liveleak import LiveLeakIE
from .threeqsdn import ThreeQSDNIE
from .theplatform import ThePlatformIE
from .kaltura import KalturaIE
@ -136,6 +135,12 @@ from .pulsembed import PulsEmbedIE
from .arcpublishing import ArcPublishingIE
from .medialaan import MedialaanIE
from .simplecast import SimplecastIE
from .spreaker import SpreakerIE
from .castos import (
CastosHostedIE,
CastosSSPIE,
)
from .vk import VKIE
class GenericIE(InfoExtractor):
@ -487,7 +492,7 @@ class GenericIE(InfoExtractor):
},
},
{
# https://github.com/ytdl-org/haruhi-dl/issues/2253
# https://github.com/ytdl-org/youtube-dl/issues/2253
'url': 'http://bcove.me/i6nfkrc3',
'md5': '0ba9446db037002366bab3b3eb30c88c',
'info_dict': {
@ -512,7 +517,7 @@ class GenericIE(InfoExtractor):
},
},
{
# https://github.com/ytdl-org/haruhi-dl/issues/3541
# https://github.com/ytdl-org/youtube-dl/issues/3541
'add_ie': ['BrightcoveLegacy'],
'url': 'http://www.kijk.nl/sbs6/leermijvrouwenkennen/videos/jqMiXKAYan2S/aflevering-1',
'info_dict': {
@ -976,7 +981,7 @@ class GenericIE(InfoExtractor):
}
},
# Multiple brightcove videos
# https://github.com/ytdl-org/haruhi-dl/issues/2283
# https://github.com/ytdl-org/youtube-dl/issues/2283
{
'url': 'http://www.newyorker.com/online/blogs/newsdesk/2014/01/always-never-nuclear-command-and-control.html',
'info_dict': {
@ -1634,34 +1639,6 @@ class GenericIE(InfoExtractor):
'upload_date': '20160409',
},
},
# LiveLeak embed
{
'url': 'http://www.wykop.pl/link/3088787/',
'md5': '7619da8c820e835bef21a1efa2a0fc71',
'info_dict': {
'id': '874_1459135191',
'ext': 'mp4',
'title': 'Man shows poor quality of new apartment building',
'description': 'The wall is like a sand pile.',
'uploader': 'Lake8737',
},
'add_ie': [LiveLeakIE.ie_key()],
'params': {
'force_generic_extractor': True,
},
},
# Another LiveLeak embed pattern (#13336)
{
'url': 'https://milo.yiannopoulos.net/2017/06/concealed-carry-robbery/',
'info_dict': {
'id': '2eb_1496309988',
'ext': 'mp4',
'title': 'Thief robs place where everyone was armed',
'description': 'md5:694d73ee79e535953cf2488562288eee',
'uploader': 'brazilwtf',
},
'add_ie': [LiveLeakIE.ie_key()],
},
# Duplicated embedded video URLs
{
'url': 'http://www.hudl.com/athlete/2538180/highlights/149298443',
@ -2212,12 +2189,12 @@ class GenericIE(InfoExtractor):
# OnNetwork.tv embed
'url': 'https://wiadomosci.gazeta.pl/wiadomosci/7,114883,26377890,panstwo-polskie-nie-uznaje-takich-rodzin-jak-nasza-i-krzywdzi.html',
'info_dict': {
'id': '337382',
'title': 'Rodzina+ odc. 1. Karolina i Ania',
'ext': 'm3u8',
'age_limit': 16,
'upload_date': '20200929',
'id': '7,114883,26377890,panstwo-polskie-nie-uznaje-takich-rodzin-jak-nasza-i-krzywdzi',
'title': '"Państwo polskie nie uznaje takich rodzin jak nasza i krzywdzi w ten sposób dzieci" [RODZINA+]',
'uploader': 'wiadomosci.gazeta.pl',
},
# 1x onnetwork:script, which resolves to onnetwork:frame
'playlist_count': 1,
},
{
# Embetty video embeds (youtube, vimeo, facebook)
@ -2301,6 +2278,43 @@ class GenericIE(InfoExtractor):
},
'playlist_mincount': 52,
},
{
# Spreaker embed
'url': 'https://socjalizm.fm/jak-bedzie-w-socjalizmie/praca/',
'info_dict': {
'id': '44098221',
'ext': 'mp3',
'title': 'Jak będzie w socjalizmie? Praca.',
'uploader': 'Socjalizm FM',
'description': 'md5:d2833c41296a996153353890c329e1af',
'upload_date': '20210329',
'uploader_id': '13705223',
'timestamp': 1617024666,
},
},
{
# Castos (hosted) player
'url': 'https://castos.com/enhanced-podcast-player/',
'info_dict': {
'id': '210448',
'ext': 'mp3',
'title': '4 Ways To Create A Video Podcast (And Why You Should Try It)',
},
},
{
# Castos Super Simple Podcasting (WordPress plugin, selfhosted)
'url': 'https://pzbn.pl/4-heated-terf-moment/',
'info_dict': {
'id': '38',
'ext': 'mp3',
'title': '#4: Heated TERF moment',
},
},
{
# Sibnet embed (https://help.sibnet.ru/?sibnet_video_embed)
'url': 'https://phpbb3.x-tk.ru/bbcode-video-sibnet-t24.html',
'only_matching': True,
},
]
def report_following_redirect(self, new_url):
@ -2482,17 +2496,20 @@ class GenericIE(InfoExtractor):
# Check for direct link to a video
content_type = head_response.headers.get('Content-Type', '').lower()
m = re.match(r'^(?P<type>audio|video|application(?=/(?:ogg$|(?:vnd\.apple\.|x-)?mpegurl)))/(?P<format_id>[^;\s]+)', content_type)
m = re.match(r'^(?P<type>audio|video|application(?=/(?:ogg$|(?:vnd\.(?:apple|mpeg\.dash)\.|x-)?(?:mpegurl|mpd|bittorrent))))/(?P<format_id>[^;\s]+)', content_type)
if m:
format_id = compat_str(m.group('format_id'))
if format_id.endswith('mpegurl'):
formats = self._extract_m3u8_formats(url, video_id, 'mp4')
elif format_id == 'f4m':
formats = self._extract_f4m_formats(url, video_id)
elif format_id.endswith('mpd'):
formats = self._extract_mpd_formats(url, video_id)
else:
formats = [{
'format_id': format_id,
'url': url,
'protocol': 'bittorrent' if format_id.endswith('bittorrent') else None,
'vcodec': 'none' if m.group('type') == 'audio' else None
}]
info_dict['direct'] = True
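The widened lookahead now recognizes DASH manifests and torrents next to HLS. A self-contained check of which sample Content-Type values reach each branch (the types below are chosen for illustration):

import re

PATTERN = r'^(?P<type>audio|video|application(?=/(?:ogg$|(?:vnd\.(?:apple|mpeg\.dash)\.|x-)?(?:mpegurl|mpd|bittorrent))))/(?P<format_id>[^;\s]+)'
for ct in ('application/vnd.apple.mpegurl',  # matches; endswith('mpegurl') -> HLS branch
           'application/vnd.mpeg.dash.mpd',  # matches; endswith('mpd') -> DASH branch
           'application/x-bittorrent',       # matches; endswith('bittorrent') -> direct download
           'application/dash+xml'):          # not covered by the lookahead: no match
    m = re.match(PATTERN, ct)
    print(ct, '->', m.group('format_id') if m else None)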
@ -2526,6 +2543,20 @@ class GenericIE(InfoExtractor):
self._sort_formats(info_dict['formats'])
return info_dict
# Is it a BitTorrent manifest file?
if any(first_bytes.startswith(byt) for byt in (
b'd8:announce',
b'd13:announce-list',
b'd7:comment',
b'd4:info',
)):
info_dict['formats'] = [{
'url': url,
'protocol': 'bittorrent',
}]
# info_dict['direct'] = True
return info_dict
# Maybe it's a direct link to a video?
# Be careful not to download the whole thing!
if not is_html(first_bytes):
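The manifest sniffing above keys off bencode: a .torrent file is a bencoded dictionary, so it opens with 'd' followed by a length-prefixed key such as '8:announce'. A tiny standalone check (the sample bytes are invented):

first_bytes = b'd8:announce41:udp://tracker.example.invalid:80/announce'
is_torrent = any(first_bytes.startswith(byt) for byt in (
    b'd8:announce',        # 'announce' is 8 bytes
    b'd13:announce-list',  # 'announce-list' is 13 bytes
    b'd7:comment',
    b'd4:info',
))
assert is_torrent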
@ -2593,12 +2624,12 @@ class GenericIE(InfoExtractor):
return camtasia_res
# Sometimes embedded video player is hidden behind percent encoding
# (e.g. https://github.com/ytdl-org/haruhi-dl/issues/2448)
# (e.g. https://github.com/ytdl-org/youtube-dl/issues/2448)
# Unescaping the whole page allows to handle those cases in a generic way
webpage = compat_urllib_parse_unquote(webpage)
# Unescape squarespace embeds to be detected by generic extractor,
# see https://github.com/ytdl-org/haruhi-dl/issues/21294
# see https://github.com/ytdl-org/youtube-dl/issues/21294
webpage = re.sub(
r'<div[^>]+class=[^>]*?\bsqs-video-wrapper\b[^>]*>',
lambda x: unescapeHTML(x.group(0)), webpage)
@ -2684,7 +2715,6 @@ class GenericIE(InfoExtractor):
SoundcloudEmbedIE,
TuneInBaseIE,
JWPlatformIE,
LiveLeakIE,
DBTVIE,
VideaIE,
TwentyMinutenIE,
@ -2722,6 +2752,8 @@ class GenericIE(InfoExtractor):
ArcPublishingIE,
MedialaanIE,
SimplecastIE,
SpreakerIE,
CastosHostedIE,
):
try:
ie_key = embie.ie_key()
@ -2933,7 +2965,7 @@ class GenericIE(InfoExtractor):
webpage)
if not mobj:
mobj = re.search(
r'data-video-link=["\'](?P<url>http://m.mlb.com/video/[^"\']+)',
r'data-video-link=["\'](?P<url>http://m\.mlb\.com/video/[^"\']+)',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'MLB')
@ -3184,6 +3216,15 @@ class GenericIE(InfoExtractor):
if pulsembed_entries:
return self.playlist_result(pulsembed_entries, video_id, video_title)
castos_ssp_entries = CastosSSPIE._extract_entries(webpage)
if castos_ssp_entries:
return self.playlist_result(castos_ssp_entries, video_id, video_title)
# Look for sibnet embedded player
sibnet_urls = VKIE._extract_sibnet_urls(webpage)
if sibnet_urls:
return self.playlist_from_matches(sibnet_urls, video_id, video_title)
# Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
if entries:
@ -3210,7 +3251,7 @@ class GenericIE(InfoExtractor):
jwplayer_data, video_id, require_title=False, base_url=url)
return merge_dicts(info, info_dict)
except ExtractorError:
# See https://github.com/ytdl-org/haruhi-dl/pull/16735
# See https://github.com/ytdl-org/youtube-dl/pull/16735
pass
# Video.js embed
@ -3247,6 +3288,9 @@ class GenericIE(InfoExtractor):
'url': src,
'ext': (mimetype2ext(src_type)
or ext if ext in KNOWN_EXTENSIONS else 'mp4'),
'http_headers': {
'Referer': full_response.geturl(),
},
})
if formats:
self._sort_formats(formats)
@ -3315,7 +3359,7 @@ class GenericIE(InfoExtractor):
m_video_type = re.findall(r'<meta.*?property="og:video:type".*?content="video/(.*?)"', webpage)
# We only look in og:video if the MIME type is a video, don't try if it's a Flash player:
if m_video_type is not None:
found = filter_video(re.findall(r'<meta.*?property="og:video".*?content="(.*?)"', webpage))
found = filter_video(re.findall(r'<meta.*?property="og:(?:video|audio)".*?content="(.*?)"', webpage))
if not found:
REDIRECT_REGEX = r'[0-9]{,2};\s*(?:URL|url)=\'?([^\'"]+)'
found = re.search(

haruhi_dl/extractor/go.py

@ -155,7 +155,7 @@ class GoIE(AdobePassIE):
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)',
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)'
r'\bvideoIdCode["\']\s*:\s*["\'](vdka\w+)'
), webpage, 'video id', default=video_id)
if not site_info:
brand = self._search_regex(

haruhi_dl/extractor/googledrive.py

@ -36,7 +36,7 @@ class GoogleDriveIE(InfoExtractor):
}
}, {
# video can't be watched anonymously due to view count limit reached,
# but can be downloaded (see https://github.com/ytdl-org/haruhi-dl/issues/14046)
# but can be downloaded (see https://github.com/ytdl-org/youtube-dl/issues/14046)
'url': 'https://drive.google.com/file/d/0B-vUyvmDLdWDcEt4WjBqcmI2XzQ/view',
'md5': 'bfbd670d03a470bb1e6d4a257adec12e',
'info_dict': {

haruhi_dl/extractor/instagram.py

@ -12,6 +12,7 @@ from ..compat import (
)
from ..utils import (
ExtractorError,
float_or_none,
get_element_by_attribute,
int_or_none,
lowercase_escape,
@ -32,6 +33,7 @@ class InstagramIE(InfoExtractor):
'title': 'Video by naomipq',
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 0,
'timestamp': 1371748545,
'upload_date': '20130620',
'uploader_id': 'naomipq',
@ -48,6 +50,7 @@ class InstagramIE(InfoExtractor):
'ext': 'mp4',
'title': 'Video by britneyspears',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 0,
'timestamp': 1453760977,
'upload_date': '20160125',
'uploader_id': 'britneyspears',
@ -86,6 +89,24 @@ class InstagramIE(InfoExtractor):
'title': 'Post by instagram',
'description': 'md5:0f9203fc6a2ce4d228da5754bcf54957',
},
}, {
# IGTV
'url': 'https://www.instagram.com/tv/BkfuX9UB-eK/',
'info_dict': {
'id': 'BkfuX9UB-eK',
'ext': 'mp4',
'title': 'Fingerboarding Tricks with @cass.fb',
'thumbnail': r're:^https?://.*\.jpg',
'duration': 53.83,
'timestamp': 1530032919,
'upload_date': '20180626',
'uploader_id': 'instagram',
'uploader': 'Instagram',
'like_count': int,
'comment_count': int,
'comments': list,
'description': 'Meet Cass Hirst (@cass.fb), a fingerboarding pro who can perform tiny ollies and kickflips while blindfolded.',
}
}, {
'url': 'https://instagram.com/p/-Cmh1cukG2/',
'only_matching': True,
@ -141,7 +162,7 @@ class InstagramIE(InfoExtractor):
lambda x: x['entry_data']['PostPage'][0]['media']),
dict)
# _sharedData.entry_data.PostPage is empty when authenticated (see
# https://github.com/hdl-org/haruhi-dl/pull/22880)
# https://github.com/ytdl-org/youtube-dl/pull/22880)
if not media:
additional_data = self._parse_json(
self._search_regex(
@ -159,7 +180,9 @@ class InstagramIE(InfoExtractor):
description = try_get(
media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
compat_str) or media.get('caption')
title = media.get('title')
thumbnail = media.get('display_src') or media.get('display_url')
duration = float_or_none(media.get('video_duration'))
timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date'))
uploader = media.get('owner', {}).get('full_name')
uploader_id = media.get('owner', {}).get('username')
@ -200,9 +223,10 @@ class InstagramIE(InfoExtractor):
continue
entries.append({
'id': node.get('shortcode') or node['id'],
'title': 'Video %d' % edge_num,
'title': node.get('title') or 'Video %d' % edge_num,
'url': node_video_url,
'thumbnail': node.get('display_url'),
'duration': float_or_none(node.get('video_duration')),
'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])),
'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])),
'view_count': int_or_none(node.get('video_view_count')),
@ -239,8 +263,9 @@ class InstagramIE(InfoExtractor):
'id': video_id,
'formats': formats,
'ext': 'mp4',
'title': 'Video by %s' % uploader_id,
'title': title or 'Video by %s' % uploader_id,
'description': description,
'duration': duration,
'thumbnail': thumbnail,
'timestamp': timestamp,
'uploader_id': uploader_id,

haruhi_dl/extractor/ipla.py

@ -8,6 +8,7 @@ from .common import InfoExtractor
from ..utils import (
int_or_none,
url_or_none,
ExtractorError,
)
@ -79,7 +80,11 @@ class IplaIE(InfoExtractor):
'Content-type': 'application/json'
}
res = self._download_json('http://b2c-mobile.redefine.pl/rpc/navigation/', media_id, data=req, headers=headers)
res = self._download_json('https://b2c-mobile.redefine.pl/rpc/navigation/', media_id, data=req, headers=headers)
if not res.get('result'):
if res['error']['code'] == 13404:
raise ExtractorError('Video requires DRM protection', expected=True)
raise ExtractorError(f"Ipla said: {res['error']['message']} - {res['error']['data']['userMessage']}")
return res['result']['mediaItem']
def get_url(self, media_id, source_id):
@ -93,4 +98,6 @@ class IplaIE(InfoExtractor):
}
res = self._download_json('https://b2c-mobile.redefine.pl/rpc/drm/', media_id, data=req, headers=headers)
if not res.get('result'):
raise ExtractorError(f"Ipla said: {res['error']['message']} - {res['error']['data']['userMessage']}")
return res['result']['url']
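For reference, the failing RPC payload these checks assume has roughly this shape; every value below is illustrative except the 13404 code, which the change treats as "DRM required":

res = {
    'error': {
        'code': 13404,
        'message': 'NavigationError',
        'data': {'userMessage': 'A human-readable message for the viewer'},
    }
}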

haruhi_dl/extractor/jamendo.py

@ -29,34 +29,51 @@ class JamendoIE(InfoExtractor):
'id': '196219',
'display_id': 'stories-from-emona-i',
'ext': 'flac',
'title': 'Maya Filipič - Stories from Emona I',
'artist': 'Maya Filipič',
# 'title': 'Maya Filipič - Stories from Emona I',
'title': 'Stories from Emona I',
# 'artist': 'Maya Filipič',
'track': 'Stories from Emona I',
'duration': 210,
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1217438117,
'upload_date': '20080730',
'license': 'by-nc-nd',
'view_count': int,
'like_count': int,
'average_rating': int,
'tags': ['piano', 'peaceful', 'newage', 'strings', 'upbeat'],
}
}, {
'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock',
'only_matching': True,
}]
def _call_api(self, resource, resource_id):
path = '/api/%ss' % resource
rand = compat_str(random.random())
return self._download_json(
'https://www.jamendo.com' + path, resource_id, query={
'id[]': resource_id,
}, headers={
'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
})[0]
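# A standalone sketch of the anti-bot header built above: a SHA-1 of the
# request path concatenated with a random float, framed as $<digest>*<rand>~.
import hashlib
import random

path = '/api/tracks'
rand = str(random.random())
token = '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)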
def _real_extract(self, url):
track_id, display_id = self._VALID_URL_RE.match(url).groups()
webpage = self._download_webpage(
'https://www.jamendo.com/track/' + track_id, track_id)
models = self._parse_json(self._html_search_regex(
r"data-bundled-models='([^']+)",
webpage, 'bundled models'), track_id)
track = models['track']['models'][0]
# webpage = self._download_webpage(
# 'https://www.jamendo.com/track/' + track_id, track_id)
# models = self._parse_json(self._html_search_regex(
# r"data-bundled-models='([^']+)",
# webpage, 'bundled models'), track_id)
# track = models['track']['models'][0]
track = self._call_api('track', track_id)
title = track_name = track['name']
get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {}
artist = get_model('artist')
artist_name = artist.get('name')
if artist_name:
title = '%s - %s' % (artist_name, title)
album = get_model('album')
# get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {}
# artist = get_model('artist')
# artist_name = artist.get('name')
# if artist_name:
# title = '%s - %s' % (artist_name, title)
# album = get_model('album')
formats = [{
'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
@ -74,7 +91,7 @@ class JamendoIE(InfoExtractor):
urls = []
thumbnails = []
for _, covers in track.get('cover', {}).items():
for covers in (track.get('cover') or {}).values():
for cover_id, cover_url in covers.items():
if not cover_url or cover_url in urls:
continue
@ -88,13 +105,14 @@ class JamendoIE(InfoExtractor):
})
tags = []
for tag in track.get('tags', []):
for tag in (track.get('tags') or []):
tag_name = tag.get('name')
if not tag_name:
continue
tags.append(tag_name)
stats = track.get('stats') or {}
license = track.get('licenseCC') or []
return {
'id': track_id,
@ -103,11 +121,11 @@ class JamendoIE(InfoExtractor):
'title': title,
'description': track.get('description'),
'duration': int_or_none(track.get('duration')),
'artist': artist_name,
# 'artist': artist_name,
'track': track_name,
'album': album.get('name'),
# 'album': album.get('name'),
'formats': formats,
'license': '-'.join(track.get('licenseCC', [])) or None,
'license': '-'.join(license) if license else None,
'timestamp': int_or_none(track.get('dateCreated')),
'view_count': int_or_none(stats.get('listenedAll')),
'like_count': int_or_none(stats.get('favorited')),
@ -116,9 +134,9 @@ class JamendoIE(InfoExtractor):
}
class JamendoAlbumIE(InfoExtractor):
class JamendoAlbumIE(JamendoIE):
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)'
_TEST = {
_TESTS = [{
'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
'info_dict': {
'id': '121486',
@ -151,17 +169,7 @@ class JamendoAlbumIE(InfoExtractor):
'params': {
'playlistend': 2
}
}
def _call_api(self, resource, resource_id):
path = '/api/%ss' % resource
rand = compat_str(random.random())
return self._download_json(
'https://www.jamendo.com' + path, resource_id, query={
'id[]': resource_id,
}, headers={
'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
})[0]
}]
def _real_extract(self, url):
album_id = self._match_id(url)
@ -169,7 +177,7 @@ class JamendoAlbumIE(InfoExtractor):
album_name = album.get('name')
entries = []
for track in album.get('tracks', []):
for track in (album.get('tracks') or []):
track_id = track.get('id')
if not track_id:
continue

haruhi_dl/extractor/kaltura.py

@ -120,7 +120,7 @@ class KalturaIE(InfoExtractor):
def _extract_urls(webpage, url=None):
# Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site
finditer = (
re.finditer(
list(re.finditer(
r"""(?xs)
kWidget\.(?:thumb)?[Ee]mbed\(
\{.*?
@ -128,8 +128,8 @@ class KalturaIE(InfoExtractor):
(?P<q2>['"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
(?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s*
(?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
""", webpage)
or re.finditer(
""", webpage))
or list(re.finditer(
r'''(?xs)
(?P<q1>["'])
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
@ -142,16 +142,16 @@ class KalturaIE(InfoExtractor):
\[\s*(?P<q2_1>["'])entry_?[Ii]d(?P=q2_1)\s*\]\s*=\s*
)
(?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
''', webpage)
or re.finditer(
''', webpage))
or list(re.finditer(
r'''(?xs)
<(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])
<(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])\s*
(?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
(?:(?!(?P=q1)).)*
[?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
(?:(?!(?P=q1)).)*
(?P=q1)
''', webpage)
''', webpage))
)
urls = []
for mobj in finditer:
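The list() wrappers are the whole fix here: re.finditer() returns a lazy iterator, which is truthy even when it matches nothing, so the or chain could never fall through to the later patterns. Two lines make the point:

import re

empty = re.finditer(r'kWidget', 'page without embeds')
bool(empty)        # True - an iterator object is always truthy
bool(list(empty))  # False - an empty list lets `or` try the next pattern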

haruhi_dl/extractor/line.py

@ -4,7 +4,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import js_to_json
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
js_to_json,
str_or_none,
)
class LineTVIE(InfoExtractor):
@ -88,3 +94,137 @@ class LineTVIE(InfoExtractor):
for thumbnail in video_info.get('thumbnails', {}).get('list', [])],
'view_count': video_info.get('meta', {}).get('count'),
}
class LineLiveBaseIE(InfoExtractor):
_API_BASE_URL = 'https://live-api.line-apps.com/web/v4.0/channel/'
def _parse_broadcast_item(self, item):
broadcast_id = compat_str(item['id'])
title = item['title']
is_live = item.get('isBroadcastingNow')
thumbnails = []
for thumbnail_id, thumbnail_url in (item.get('thumbnailURLs') or {}).items():
if not thumbnail_url:
continue
thumbnails.append({
'id': thumbnail_id,
'url': thumbnail_url,
})
channel = item.get('channel') or {}
channel_id = str_or_none(channel.get('id'))
return {
'id': broadcast_id,
'title': self._live_title(title) if is_live else title,
'thumbnails': thumbnails,
'timestamp': int_or_none(item.get('createdAt')),
'channel': channel.get('name'),
'channel_id': channel_id,
'channel_url': 'https://live.line.me/channels/' + channel_id if channel_id else None,
'duration': int_or_none(item.get('archiveDuration')),
'view_count': int_or_none(item.get('viewerCount')),
'comment_count': int_or_none(item.get('chatCount')),
'is_live': is_live,
}
class LineLiveIE(LineLiveBaseIE):
_VALID_URL = r'https?://live\.line\.me/channels/(?P<channel_id>\d+)/broadcast/(?P<id>\d+)'
_TESTS = [{
'url': 'https://live.line.me/channels/4867368/broadcast/16331360',
'md5': 'bc931f26bf1d4f971e3b0982b3fab4a3',
'info_dict': {
'id': '16331360',
'title': '振りコピ講座😙😙😙',
'ext': 'mp4',
'timestamp': 1617095132,
'upload_date': '20210330',
'channel': '白川ゆめか',
'channel_id': '4867368',
'view_count': int,
'comment_count': int,
'is_live': False,
}
}, {
# archiveStatus == 'DELETED'
'url': 'https://live.line.me/channels/4778159/broadcast/16378488',
'only_matching': True,
}]
def _real_extract(self, url):
channel_id, broadcast_id = re.match(self._VALID_URL, url).groups()
broadcast = self._download_json(
self._API_BASE_URL + '%s/broadcast/%s' % (channel_id, broadcast_id),
broadcast_id)
item = broadcast['item']
info = self._parse_broadcast_item(item)
protocol = 'm3u8' if info['is_live'] else 'm3u8_native'
formats = []
for k, v in (broadcast.get(('live' if info['is_live'] else 'archived') + 'HLSURLs') or {}).items():
if not v:
continue
if k == 'abr':
formats.extend(self._extract_m3u8_formats(
v, broadcast_id, 'mp4', protocol,
m3u8_id='hls', fatal=False))
continue
f = {
'ext': 'mp4',
'format_id': 'hls-' + k,
'protocol': protocol,
'url': v,
}
if not k.isdigit():
f['vcodec'] = 'none'
formats.append(f)
if not formats:
archive_status = item.get('archiveStatus')
if archive_status != 'ARCHIVED':
raise ExtractorError('this video has been ' + archive_status.lower(), expected=True)
self._sort_formats(formats)
info['formats'] = formats
return info
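# The format loop above assumes the HLS URL maps are keyed by rendition,
# roughly like this (keys and URLs illustrative, not from the commit):
# 'abr' is an adaptive master playlist expanded via _extract_m3u8_formats,
# digit keys are fixed video heights, and any other key is treated as
# audio-only (vcodec 'none').
archived_hls_urls = {
    'abr': 'https://example.invalid/master.m3u8',
    '720': 'https://example.invalid/720p.m3u8',
    'aac': 'https://example.invalid/audio.m3u8',
}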
class LineLiveChannelIE(LineLiveBaseIE):
_VALID_URL = r'https?://live\.line\.me/channels/(?P<id>\d+)(?!/broadcast/\d+)(?:[/?&#]|$)'
_TEST = {
'url': 'https://live.line.me/channels/5893542',
'info_dict': {
'id': '5893542',
'title': 'いくらちゃん',
'description': 'md5:c3a4af801f43b2fac0b02294976580be',
},
'playlist_mincount': 29
}
def _archived_broadcasts_entries(self, archived_broadcasts, channel_id):
while True:
for row in (archived_broadcasts.get('rows') or []):
share_url = str_or_none(row.get('shareURL'))
if not share_url:
continue
info = self._parse_broadcast_item(row)
info.update({
'_type': 'url',
'url': share_url,
'ie_key': LineLiveIE.ie_key(),
})
yield info
if not archived_broadcasts.get('hasNextPage'):
return
archived_broadcasts = self._download_json(
self._API_BASE_URL + channel_id + '/archived_broadcasts',
channel_id, query={
'lastId': info['id'],
})
def _real_extract(self, url):
channel_id = self._match_id(url)
channel = self._download_json(self._API_BASE_URL + channel_id, channel_id)
return self.playlist_result(
self._archived_broadcasts_entries(channel.get('archivedBroadcasts') or {}, channel_id),
channel_id, channel.get('title'), channel.get('information'))

haruhi_dl/extractor/liveleak.py (deleted)

@ -1,191 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class LiveLeakIE(InfoExtractor):
_VALID_URL = r'https?://(?:\w+\.)?liveleak\.com/view\?.*?\b[it]=(?P<id>[\w_]+)'
_TESTS = [{
'url': 'http://www.liveleak.com/view?i=757_1364311680',
'md5': '0813c2430bea7a46bf13acf3406992f4',
'info_dict': {
'id': '757_1364311680',
'ext': 'mp4',
'description': 'extremely bad day for this guy..!',
'uploader': 'ljfriel2',
'title': 'Most unlucky car accident',
'thumbnail': r're:^https?://.*\.jpg$'
}
}, {
'url': 'http://www.liveleak.com/view?i=f93_1390833151',
'md5': 'd3f1367d14cc3c15bf24fbfbe04b9abf',
'info_dict': {
'id': 'f93_1390833151',
'ext': 'mp4',
'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.',
'uploader': 'ARD_Stinkt',
'title': 'German Television does first Edward Snowden Interview (ENGLISH)',
'thumbnail': r're:^https?://.*\.jpg$'
}
}, {
# Prochan embed
'url': 'http://www.liveleak.com/view?i=4f7_1392687779',
'md5': '42c6d97d54f1db107958760788c5f48f',
'info_dict': {
'id': '4f7_1392687779',
'ext': 'mp4',
'description': "The guy with the cigarette seems amazingly nonchalant about the whole thing... I really hope my friends' reactions would be a bit stronger.\r\n\r\nAction-go to 0:55.",
'uploader': 'CapObveus',
'title': 'Man is Fatally Struck by Reckless Car While Packing up a Moving Truck',
'age_limit': 18,
},
'skip': 'Video is dead',
}, {
# Covers https://github.com/ytdl-org/haruhi-dl/pull/5983
# Multiple resolutions
'url': 'http://www.liveleak.com/view?i=801_1409392012',
'md5': 'c3a449dbaca5c0d1825caecd52a57d7b',
'info_dict': {
'id': '801_1409392012',
'ext': 'mp4',
'description': 'Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.',
'uploader': 'bony333',
'title': 'Crazy Hungarian tourist films close call waterspout in Croatia',
'thumbnail': r're:^https?://.*\.jpg$'
}
}, {
# Covers https://github.com/ytdl-org/haruhi-dl/pull/10664#issuecomment-247439521
'url': 'http://m.liveleak.com/view?i=763_1473349649',
'add_ie': ['Youtube'],
'info_dict': {
'id': '763_1473349649',
'ext': 'mp4',
'title': 'Reporters and public officials ignore epidemic of black on asian violence in Sacramento | Colin Flaherty',
'description': 'Colin being the warrior he is and showing the injustice Asians in Sacramento are being subjected to.',
'uploader': 'Ziz',
'upload_date': '20160908',
'uploader_id': 'UCEbta5E_jqlZmEJsriTEtnw'
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.liveleak.com/view?i=677_1439397581',
'info_dict': {
'id': '677_1439397581',
'title': 'Fuel Depot in China Explosion caught on video',
},
'playlist_count': 3,
}, {
'url': 'https://www.liveleak.com/view?t=HvHi_1523016227',
'only_matching': True,
}, {
# No original video
'url': 'https://www.liveleak.com/view?t=C26ZZ_1558612804',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage, **kwargs):
return re.findall(
r'<iframe[^>]+src="(https?://(?:\w+\.)?liveleak\.com/ll_embed\?[^"]*[ift]=[\w_]+[^"]+)"',
webpage)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_title = self._og_search_title(webpage).replace('LiveLeak.com -', '').strip()
video_description = self._og_search_description(webpage)
video_uploader = self._html_search_regex(
r'By:.*?(\w+)</a>', webpage, 'uploader', fatal=False)
age_limit = int_or_none(self._search_regex(
r'you confirm that you are ([0-9]+) years and over.',
webpage, 'age limit', default=None))
video_thumbnail = self._og_search_thumbnail(webpage)
entries = self._parse_html5_media_entries(url, webpage, video_id)
if not entries:
# Maybe an embed?
embed_url = self._search_regex(
r'<iframe[^>]+src="((?:https?:)?//(?:www\.)?(?:prochan|youtube)\.com/embed[^"]+)"',
webpage, 'embed URL')
return {
'_type': 'url_transparent',
'url': embed_url,
'id': video_id,
'title': video_title,
'description': video_description,
'uploader': video_uploader,
'age_limit': age_limit,
}
for idx, info_dict in enumerate(entries):
formats = []
for a_format in info_dict['formats']:
if not a_format.get('height'):
a_format['height'] = int_or_none(self._search_regex(
r'([0-9]+)p\.mp4', a_format['url'], 'height label',
default=None))
formats.append(a_format)
# Removing '.*.mp4' gives the raw video, which is essentially
# the same video without the LiveLeak logo at the top (see
# https://github.com/ytdl-org/haruhi-dl/pull/4768)
orig_url = re.sub(r'\.mp4\.[^.]+', '', a_format['url'])
if a_format['url'] != orig_url:
format_id = a_format.get('format_id')
format_id = 'original' + ('-' + format_id if format_id else '')
if self._is_valid_url(orig_url, video_id, format_id):
formats.append({
'format_id': format_id,
'url': orig_url,
'preference': 1,
})
self._sort_formats(formats)
info_dict['formats'] = formats
# Don't append entry ID for one-video pages to keep backward compatibility
if len(entries) > 1:
info_dict['id'] = '%s_%s' % (video_id, idx + 1)
else:
info_dict['id'] = video_id
info_dict.update({
'title': video_title,
'description': video_description,
'uploader': video_uploader,
'age_limit': age_limit,
'thumbnail': video_thumbnail,
})
return self.playlist_result(entries, video_id, video_title)
class LiveLeakEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?liveleak\.com/ll_embed\?.*?\b(?P<kind>[ift])=(?P<id>[\w_]+)'
# See generic.py for actual test cases
_TESTS = [{
'url': 'https://www.liveleak.com/ll_embed?i=874_1459135191',
'only_matching': True,
}, {
'url': 'https://www.liveleak.com/ll_embed?f=ab065df993c1',
'only_matching': True,
}]
def _real_extract(self, url):
kind, video_id = re.match(self._VALID_URL, url).groups()
if kind == 'f':
webpage = self._download_webpage(url, video_id)
liveleak_url = self._search_regex(
r'(?:logourl\s*:\s*|window\.open\()(?P<q1>[\'"])(?P<url>%s)(?P=q1)' % LiveLeakIE._VALID_URL,
webpage, 'LiveLeak URL', group='url')
else:
liveleak_url = 'http://www.liveleak.com/view?%s=%s' % (kind, video_id)
return self.url_result(liveleak_url, ie=LiveLeakIE.ie_key())

haruhi_dl/extractor/magentamusik360.py (new file)

@ -0,0 +1,61 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class MagentaMusik360IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?magenta-musik-360\.de/([a-z0-9-]+-(?P<id>[0-9]+)|festivals/.+)'
_TESTS = [{
'url': 'https://www.magenta-musik-360.de/within-temptation-wacken-2019-1-9208205928595185932',
'md5': '65b6f060b40d90276ec6fb9b992c1216',
'info_dict': {
'id': '9208205928595185932',
'ext': 'm3u8',
'title': 'WITHIN TEMPTATION',
'description': 'Robert Westerholt und Sharon Janny den Adel gründeten die Symphonic Metal-Band. Privat sind die Niederländer ein Paar und haben zwei Kinder. Die Single Ice Queen brachte ihnen Platin und Gold und verhalf 2002 zum internationalen Durchbruch. Charakteristisch für die Band war Anfangs der hohe Gesang von Frontfrau Sharon. Stilistisch fing die Band im Gothic Metal an. Mit neuem Sound, schnellen Gitarrenriffs und Gitarrensoli, avancierte Within Temptation zur erfolgreichen Rockband. Auch dieses Jahr wird die Band ihre Fangemeinde wieder mitreißen.',
}
}, {
'url': 'https://www.magenta-musik-360.de/festivals/wacken-world-wide-2020-body-count-feat-ice-t',
'md5': '81010d27d7cab3f7da0b0f681b983b7e',
'info_dict': {
'id': '9208205928595231363',
'ext': 'm3u8',
'title': 'Body Count feat. Ice-T',
'description': 'Body Count feat. Ice-T konnten bereits im vergangenen Jahr auf dem „Holy Ground“ in Wacken überzeugen. 2020 gehen die Crossover-Metaller aus einem Club in Los Angeles auf Sendung und bringen mit ihrer Mischung aus Metal und Hip-Hop Abwechslung und ordentlich Alarm zum WWW. Bereits seit 1990 stehen die beiden Gründer Ice-T (Gesang) und Ernie C (Gitarre) auf der Bühne. Sieben Studioalben hat die Gruppe bis jetzt veröffentlicht, darunter das Debüt „Body Count“ (1992) mit dem kontroversen Track „Cop Killer“.',
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
# _match_id casts to string, but since "None" is not a valid video_id for magenta
# there is no risk for confusion
if video_id == "None":
webpage = self._download_webpage(url, video_id)
video_id = self._html_search_regex(r'data-asset-id="([^"]+)"', webpage, 'video_id')
json = self._download_json("https://wcps.t-online.de/cvss/magentamusic/vodplayer/v3/player/58935/%s/Main%%20Movie" % video_id, video_id)
xml_url = json['content']['feature']['representations'][0]['contentPackages'][0]['media']['href']
metadata = json['content']['feature'].get('metadata')
title = None
description = None
duration = None
thumbnails = []
if metadata:
title = metadata.get('title')
description = metadata.get('fullDescription')
duration = metadata.get('runtimeInSeconds')
for img_key in ('teaserImageWide', 'smallCoverImage'):
if img_key in metadata:
thumbnails.append({'url': metadata[img_key].get('href')})
xml = self._download_xml(xml_url, video_id)
final_url = xml[0][0][0].attrib['src']
return {
'id': video_id,
'title': title,
'description': description,
'url': final_url,
'duration': duration,
'thumbnails': thumbnails
}

haruhi_dl/extractor/maoritv.py (new file)

@ -0,0 +1,31 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class MaoriTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?maoritelevision\.com/shows/(?:[^/]+/)+(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.maoritelevision.com/shows/korero-mai/S01E054/korero-mai-series-1-episode-54',
'md5': '5ade8ef53851b6a132c051b1cd858899',
'info_dict': {
'id': '4774724855001',
'ext': 'mp4',
'title': 'Kōrero Mai, Series 1 Episode 54',
'upload_date': '20160226',
'timestamp': 1456455018,
'description': 'md5:59bde32fd066d637a1a55794c56d8dcb',
'uploader_id': '1614493167001',
},
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1614493167001/HJlhIQhQf_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
brightcove_id = self._search_regex(
r'data-main-video-id=["\'](\d+)', webpage, 'brightcove id')
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
'BrightcoveNew', brightcove_id)

haruhi_dl/extractor/mastodon.py

@ -5,12 +5,25 @@ from .common import SelfhostedInfoExtractor
from ..utils import (
clean_html,
float_or_none,
int_or_none,
str_or_none,
try_get,
unescapeHTML,
url_or_none,
ExtractorError,
)
from urllib.parse import (
parse_qs,
urlencode,
urlparse,
)
import json
import re
from .peertube import PeerTubeSHIE
class MastodonSHIE(SelfhostedInfoExtractor):
"""
@ -23,6 +36,7 @@ class MastodonSHIE(SelfhostedInfoExtractor):
"""
IE_NAME = 'mastodon'
_VALID_URL = r'mastodon:(?P<host>[^:]+):(?P<id>.+)'
_NETRC_MACHINE = 'mastodon'
_SH_VALID_URL = r'''(?x)
https?://
(?P<host>[^/\s]+)/
@ -45,6 +59,7 @@ class MastodonSHIE(SelfhostedInfoExtractor):
'<li><a href="https://docs.joinmastodon.org/">Documentation</a></li>',
'<title>Pleroma</title>',
'<noscript>To use Pleroma, please enable JavaScript.</noscript>',
'<noscript>To use Soapbox, please enable JavaScript.</noscript>',
'Alternatively, try one of the <a href="https://apps.gab.com">native apps</a> for Gab Social for your platform.',
)
_SH_VALID_CONTENT_REGEXES = (
@ -96,39 +111,238 @@ class MastodonSHIE(SelfhostedInfoExtractor):
'title': 're:.+ - He shoots, he scores and the crowd went wild.... #Animal #Sports',
'ext': 'mp4',
},
}, {
# Soapbox, audio file
'url': 'https://gleasonator.com/notice/9zvJY6h7jJzwopKAIi',
'info_dict': {
'id': '9zvJY6h7jJzwopKAIi',
'title': 're:.+ - #FEDIBLOCK',
'ext': 'oga',
},
}, {
# mastodon, card to youtube
'url': 'https://mstdn.social/@polamatysiak/106183574509332910',
'info_dict': {
'id': 'RWDU0BjcYp0',
'ext': 'mp4',
'title': 'polamatysiak - Moje wczorajsze wystąpienie w Sejmie, koniecznie zobaczcie do końca 🙂 \n#pracaposłanki\n\nhttps://youtu.be/RWDU0BjcYp0',
'description': 'md5:0c16fa11a698d5d1b171963fd6833297',
'uploader': 'Paulina Matysiak',
'uploader_id': 'UCLRAd9-Hw6kEI1aPBrSaF9A',
'upload_date': '20210505',
},
}]
def _determine_instance_software(self, host, webpage=None):
if webpage:
for i, string in enumerate(self._SH_VALID_CONTENT_STRINGS):
if string in webpage:
return ['mastodon', 'mastodon', 'pleroma', 'pleroma', 'pleroma', 'gab'][i]
if any(s in webpage for s in PeerTubeSHIE._SH_VALID_CONTENT_STRINGS):
return 'peertube'
nodeinfo_href = self._download_json(
f'https://{host}/.well-known/nodeinfo', host, 'Downloading instance nodeinfo link')
nodeinfo = self._download_json(
nodeinfo_href['links'][-1]['href'], host, 'Downloading instance nodeinfo')
return nodeinfo['software']['name']
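# NodeInfo, used as the fallback above, is a small well-known discovery
# document most Fediverse servers expose; its link list points at versioned
# schema documents whose software.name identifies the codebase. A minimal
# sketch outside the extractor framework (host is hypothetical):
import json
from urllib.request import urlopen

host = 'example.social'
links = json.load(urlopen('https://%s/.well-known/nodeinfo' % host))
nodeinfo = json.load(urlopen(links['links'][-1]['href']))
print(nodeinfo['software']['name'])  # e.g. 'mastodon', 'pleroma', 'misskey'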
def _login(self):
username, password = self._get_login_info()
if not username:
return False
# very basic regex, but the instance domain (the one where user has an account)
# must be separated from the user login
mobj = re.match(r'^(?P<username>[^@]+(?:@[^@]+)?)@(?P<instance>.+)$', username)
if not mobj:
self.report_warning(
'Invalid login format - must be in format [username or email]@[instance]')
username, instance = mobj.group('username', 'instance')
app_info = self._downloader.cache.load('mastodon-apps', instance)
if not app_info:
app_info = self._download_json(
f'https://{instance}/api/v1/apps', None, 'Creating an app', headers={
'Content-Type': 'application/json',
}, data=bytes(json.dumps({
'client_name': 'haruhi-dl',
'redirect_uris': 'urn:ietf:wg:oauth:2.0:oob',
'scopes': 'read',
'website': 'https://haruhi.download',
}).encode('utf-8')))
self._downloader.cache.store('mastodon-apps', instance, app_info)
login_webpage = self._download_webpage(
f'https://{instance}/oauth/authorize', None, 'Downloading login page', query={
'client_id': app_info['client_id'],
'scope': 'read',
'redirect_uri': 'urn:ietf:wg:oauth:2.0:oob',
'response_type': 'code',
})
oauth_token = None
# this needs to be codebase-specific, as the HTML page differs between codebases
if 'xlink:href="#mastodon-svg-logo-full"' in login_webpage:
# mastodon
if '@' not in username:
self.report_warning(
'Invalid login format - for Mastodon instances e-mail address is required')
login_form = self._hidden_inputs(login_webpage)
login_form['user[email]'] = username
login_form['user[password]'] = password
login_req, urlh = self._download_webpage_handle(
f'https://{instance}/auth/sign_in', None, 'Sending login details',
headers={
'Content-Type': 'application/x-www-form-urlencoded',
}, data=bytes(urlencode(login_form).encode('utf-8')))
# cached apps may already be authorized
if '/oauth/authorize/native' in urlh.url:
oauth_token = parse_qs(urlparse(urlh.url).query)['code'][0]
else:
auth_form = self._hidden_inputs(
self._search_regex(
r'(?s)(<form\b[^>]+>.+?>Authorize</.+?</form>)',
login_req, 'authorization form'))
_, urlh = self._download_webpage_handle(
f'https://{instance}/oauth/authorize', None, 'Confirming authorization',
headers={
'Content-Type': 'application/x-www-form-urlencoded',
}, data=bytes(urlencode(auth_form).encode('utf-8')))
oauth_token = parse_qs(urlparse(urlh.url).query)['code'][0]
elif 'content: "\\fe0e";' in login_webpage:
# pleroma
login_form = self._hidden_inputs(login_webpage)
login_form['authorization[scope][]'] = 'read'
login_form['authorization[name]'] = username
login_form['authorization[password]'] = password
login_req = self._download_webpage(
f'https://{instance}/oauth/authorize', None, 'Sending login details',
headers={
'Content-Type': 'application/x-www-form-urlencoded',
}, data=bytes(urlencode(login_form).encode('utf-8')))
# TODO: 2FA, error handling
oauth_token = self._search_regex(
r'<h2>\s*Token code is\s*<br>\s*([a-zA-Z\d_-]+)\s*</h2>',
login_req, 'oauth token')
else:
raise ExtractorError('Unknown instance type')
actual_token = self._download_json(
f'https://{instance}/oauth/token', None, 'Downloading the actual token',
headers={
'Content-Type': 'application/x-www-form-urlencoded',
}, data=bytes(urlencode({
'client_id': app_info['client_id'],
'client_secret': app_info['client_secret'],
'redirect_uri': 'urn:ietf:wg:oauth:2.0:oob',
'scope': 'read',
'code': oauth_token,
'grant_type': 'authorization_code',
}).encode('utf-8')))
return {
'instance': instance,
'authorization': f"{actual_token['token_type']} {actual_token['access_token']}",
}
def _selfhosted_extract(self, url, webpage=None):
mobj = re.match(self._VALID_URL, url)
ap_censorship_circuvement = False
if not mobj:
mobj = re.match(self._SH_VALID_URL, url)
if not mobj and self._downloader.params.get('force_use_mastodon'):
mobj = re.match(PeerTubeSHIE._VALID_URL, url)
if mobj:
ap_censorship_circuvement = 'peertube'
if not mobj and self._downloader.params.get('force_use_mastodon'):
mobj = re.match(PeerTubeSHIE._SH_VALID_URL, url)
if mobj:
ap_censorship_circuvement = 'peertube'
if not mobj:
raise ExtractorError('Unrecognized url type')
host, id = mobj.group('host', 'id')
if any(frag in url for frag in ('/objects/', '/activities/')):
if not webpage:
webpage = self._download_webpage(url, '%s:%s' % (host, id), expected_status=302)
real_url = self._og_search_property('url', webpage, default=None)
if real_url:
return self.url_result(real_url, ie='MastodonSH')
login_info = self._login()
metadata = self._download_json('https://%s/api/v1/statuses/%s' % (host, id), '%s:%s' % (host, id))
if not metadata['media_attachments']:
raise ExtractorError('No attached medias')
if login_info and host != login_info['instance']:
wf_url = url
if not url.startswith('http'):
software = ap_censorship_circuvement
if not software:
software = self._determine_instance_software(host, webpage)
url_part = None
if software == 'pleroma':
if '-' in id: # UUID
url_part = 'objects'
else:
url_part = 'notice'
elif software == 'peertube':
url_part = 'videos/watch'
elif software in ('mastodon', 'gab'):
# mastodon and gab social require usernames in the url,
# but we can't determine the username without fetching the post,
# but we can't fetch the post without determining the username...
raise ExtractorError(f'Use the full url with --force-use-mastodon to download from {software}', expected=True)
else:
raise ExtractorError(f'Unknown software: {software}')
wf_url = f'https://{host}/{url_part}/{id}'
search = self._download_json(
f"https://{login_info['instance']}/api/v2/search", '%s:%s' % (host, id),
query={
'q': wf_url,
'type': 'statuses',
'resolve': True,
}, headers={
'Authorization': login_info['authorization'],
})
assert len(search['statuses']) == 1
metadata = search['statuses'][0]
else:
if not login_info and any(frag in url for frag in ('/objects/', '/activities/')):
if not webpage:
webpage = self._download_webpage(url, '%s:%s' % (host, id), expected_status=302)
real_url = self._og_search_property('url', webpage, default=None)
if real_url:
return self.url_result(real_url, ie='MastodonSH')
metadata = self._download_json(
'https://%s/api/v1/statuses/%s' % (host, id), '%s:%s' % (host, id),
headers={
'Authorization': login_info['authorization'],
} if login_info else {})
entries = []
for media in metadata['media_attachments']:
if media['type'] == 'video':
for media in metadata['media_attachments'] or ():
if media['type'] in ('video', 'audio'):
entries.append({
'id': media['id'],
'title': str_or_none(media['description']),
'url': str_or_none(media['url']),
'thumbnail': str_or_none(media['preview_url']),
'thumbnail': str_or_none(media['preview_url']) if media['type'] == 'video' else None,
'vcodec': 'none' if media['type'] == 'audio' else None,
'duration': float_or_none(try_get(media, lambda x: x['meta']['original']['duration'])),
'width': int_or_none(try_get(media, lambda x: x['meta']['original']['width'])),
'height': int_or_none(try_get(media, lambda x: x['meta']['original']['height'])),
'tbr': int_or_none(try_get(media, lambda x: x['meta']['original']['bitrate'])),
})
if len(entries) == 0:
raise ExtractorError('No audio/video attachments')
title = '%s - %s' % (str_or_none(metadata['account'].get('display_name') or metadata['account']['acct']), clean_html(str_or_none(metadata['content'])))
if ap_censorship_circuvement == 'peertube':
title = unescapeHTML(
self._search_regex(
r'^<p><a href="[^"]+">(.+?)</a></p>',
metadata['content'], 'video title'))
if len(entries) == 0:
card = metadata.get('card')
if card:
return {
'_type': 'url_transparent',
'url': card['url'],
'title': title,
'thumbnail': url_or_none(card.get('image')),
}
raise ExtractorError('No audio/video attachments')
info_dict = {
"id": id,

View file

@ -15,33 +15,39 @@ from ..utils import (
class MedalTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?medal\.tv/clips/(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?medal\.tv/clips/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://medal.tv/clips/34934644/3Is9zyGMoBMr',
'url': 'https://medal.tv/clips/2mA60jWAGQCBH',
'md5': '7b07b064331b1cf9e8e5c52a06ae68fa',
'info_dict': {
'id': '34934644',
'id': '2mA60jWAGQCBH',
'ext': 'mp4',
'title': 'Quad Cold',
'description': 'Medal,https://medal.tv/desktop/',
'uploader': 'MowgliSB',
'timestamp': 1603165266,
'upload_date': '20201020',
'uploader_id': 10619174,
'uploader_id': '10619174',
}
}, {
'url': 'https://medal.tv/clips/36787208',
'url': 'https://medal.tv/clips/2um24TWdty0NA',
'md5': 'b6dc76b78195fff0b4f8bf4a33ec2148',
'info_dict': {
'id': '36787208',
'id': '2um24TWdty0NA',
'ext': 'mp4',
'title': 'u tk me i tk u bigger',
'description': 'Medal,https://medal.tv/desktop/',
'uploader': 'Mimicc',
'timestamp': 1605580939,
'upload_date': '20201117',
'uploader_id': 5156321,
'uploader_id': '5156321',
}
}, {
'url': 'https://medal.tv/clips/37rMeFpryCC-9',
'only_matching': True,
}, {
'url': 'https://medal.tv/clips/2WRj40tpY_EU9',
'only_matching': True,
}]
def _real_extract(self, url):

View file

@ -0,0 +1,74 @@
# coding: utf-8
from .common import SelfhostedInfoExtractor
from ..utils import (
mimetype2ext,
parse_iso8601,
ExtractorError,
)
import json
class MisskeySHIE(SelfhostedInfoExtractor):
IE_NAME = 'misskey'
_VALID_URL = r'misskey:(?P<host>[^:]+):(?P<id>[\da-z]+)'
_SH_VALID_URL = r'https?://(?P<host>[^/]+)/notes/(?P<id>[\da-z]+)'
_SH_VALID_CONTENT_STRINGS = (
'<meta name="application-name" content="Misskey"',
'<meta name="misskey:',
'<!-- If you are reading this message... how about joining the development of Misskey? -->',
)
_TESTS = [{
'url': 'https://catgirl.life/notes/8lh52dlrii',
'info_dict': {
'id': '8lh52dlrii',
'ext': 'mp4',
'timestamp': 1604387877,
'upload_date': '20201103',
'title': '@graf@poa.st @Moon@shitposter.club \n*kickstarts your federation*',
},
}]
def _selfhosted_extract(self, url, webpage=None):
host, video_id = self._match_id_and_host(url)
post = self._download_json(f'https://{host}/api/notes/show', video_id,
data=bytes(json.dumps({
'noteId': video_id,
}).encode('utf-8')),
headers={
'Content-Type': 'application/json',
})
entries = []
for file in post['files']:
if not file['type'].startswith('video/') and not file['type'].startswith('audio/'):
continue
entries.append({
'id': file['id'],
'url': file['url'],
'ext': mimetype2ext(file.get('type')),
'title': file.get('name'),
'thumbnail': file.get('thumbnailUrl'),
'timestamp': parse_iso8601(file.get('createdAt')),
'filesize': file.get('size') or None,
'age_limit': 18 if file.get('isSensitive') else 0,
})
if len(entries) == 0:
raise ExtractorError('No media found in post')
elif len(entries) == 1:
info_dict = entries[0]
else:
info_dict = {
'_type': 'playlist',
'entries': entries,
}
info_dict.update({
'id': video_id,
'title': post.get('text') or '_',
})
return info_dict
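# A hedged standalone sketch of the notes/show call above, using the
# `requests` package and the note id from the test case:
import requests
post = requests.post('https://catgirl.life/api/notes/show',
json={'noteId': '8lh52dlrii'}).json()
media = [f for f in post['files']
if f['type'].startswith(('video/', 'audio/'))]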

View file

@ -1,15 +1,91 @@
from __future__ import unicode_literals
from .nhl import NHLBaseIE
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
parse_duration,
parse_iso8601,
try_get,
)
class MLBIE(NHLBaseIE):
class MLBBaseIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
video = self._download_video_data(display_id)
video_id = video['id']
title = video['title']
feed = self._get_feed(video)
formats = []
for playback in (feed.get('playbacks') or []):
playback_url = playback.get('url')
if not playback_url:
continue
name = playback.get('name')
ext = determine_ext(playback_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
playback_url, video_id, 'mp4',
'm3u8_native', m3u8_id=name, fatal=False))
else:
f = {
'format_id': name,
'url': playback_url,
}
mobj = re.search(r'_(\d+)K_(\d+)X(\d+)', name)
if mobj:
f.update({
'height': int(mobj.group(3)),
'tbr': int(mobj.group(1)),
'width': int(mobj.group(2)),
})
mobj = re.search(r'_(\d+)x(\d+)_(\d+)_(\d+)K\.mp4', playback_url)
if mobj:
f.update({
'fps': int(mobj.group(3)),
'height': int(mobj.group(2)),
'tbr': int(mobj.group(4)),
'width': int(mobj.group(1)),
})
formats.append(f)
self._sort_formats(formats)
thumbnails = []
for cut in (try_get(feed, lambda x: x['image']['cuts'], list) or []):
src = cut.get('src')
if not src:
continue
thumbnails.append({
'height': int_or_none(cut.get('height')),
'url': src,
'width': int_or_none(cut.get('width')),
})
language = (video.get('language') or 'EN').lower()
return {
'id': video_id,
'title': title,
'formats': formats,
'description': video.get('description'),
'duration': parse_duration(feed.get('duration')),
'thumbnails': thumbnails,
'timestamp': parse_iso8601(video.get(self._TIMESTAMP_KEY)),
'subtitles': self._extract_mlb_subtitles(feed, language),
}
class MLBIE(MLBBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:[\da-z_-]+\.)*(?P<site>mlb)\.com/
(?:[\da-z_-]+\.)*mlb\.com/
(?:
(?:
(?:[^/]+/)*c-|
(?:[^/]+/)*video/[^/]+/c-|
(?:
shared/video/embed/(?:embed|m-internal-embed)\.html|
(?:[^/]+/)+(?:play|index)\.jsp|
@ -18,7 +94,6 @@ class MLBIE(NHLBaseIE):
(?P<id>\d+)
)
'''
_CONTENT_DOMAIN = 'content.mlb.com'
_TESTS = [
{
'url': 'https://www.mlb.com/mariners/video/ackleys-spectacular-catch/c-34698933',
@ -76,18 +151,6 @@ class MLBIE(NHLBaseIE):
'thumbnail': r're:^https?://.*\.jpg$',
},
},
{
'url': 'https://www.mlb.com/news/blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer/c-118550098',
'md5': 'e09e37b552351fddbf4d9e699c924d68',
'info_dict': {
'id': '75609783',
'ext': 'mp4',
'title': 'Must C: Pillar climbs for catch',
'description': '4/15/15: Blue Jays outfielder Kevin Pillar continues his defensive dominance by climbing the wall in left to rob Tim Beckham of a home run',
'timestamp': 1429139220,
'upload_date': '20150415',
}
},
{
'url': 'https://www.mlb.com/video/hargrove-homers-off-caldwell/c-1352023483?tid=67793694',
'only_matching': True,
@ -113,8 +176,92 @@ class MLBIE(NHLBaseIE):
'url': 'http://mlb.mlb.com/shared/video/embed/m-internal-embed.html?content_id=75609783&property=mlb&autoplay=true&hashmode=false&siteSection=mlb/multimedia/article_118550098/article_embed&club=mlb',
'only_matching': True,
},
{
'url': 'https://www.mlb.com/cut4/carlos-gomez-borrowed-sunglasses-from-an-as-fan/c-278912842',
'only_matching': True,
}
]
_TIMESTAMP_KEY = 'date'
@staticmethod
def _get_feed(video):
return video
@staticmethod
def _extract_mlb_subtitles(feed, language):
subtitles = {}
for keyword in (feed.get('keywordsAll') or []):
keyword_type = keyword.get('type')
if keyword_type and keyword_type.startswith('closed_captions_location_'):
cc_location = keyword.get('value')
if cc_location:
subtitles.setdefault(language, []).append({
'url': cc_location,
})
return subtitles
def _download_video_data(self, display_id):
return self._download_json(
'http://content.mlb.com/mlb/item/id/v1/%s/details/web-v1.json' % display_id,
display_id)
class MLBVideoIE(MLBBaseIE):
_VALID_URL = r'https?://(?:www\.)?mlb\.com/(?:[^/]+/)*video/(?P<id>[^/?&#]+)'
_TEST = {
'url': 'https://www.mlb.com/mariners/video/ackley-s-spectacular-catch-c34698933',
'md5': '632358dacfceec06bad823b83d21df2d',
'info_dict': {
'id': 'c04a8863-f569-42e6-9f87-992393657614',
'ext': 'mp4',
'title': "Ackley's spectacular catch",
'description': 'md5:7f5a981eb4f3cbc8daf2aeffa2215bf0',
'duration': 66,
'timestamp': 1405995000,
'upload_date': '20140722',
'thumbnail': r're:^https?://.+',
},
}
_TIMESTAMP_KEY = 'timestamp'
@classmethod
def suitable(cls, url):
return False if MLBIE.suitable(url) else super(MLBVideoIE, cls).suitable(url)
@staticmethod
def _get_feed(video):
return video['feeds'][0]
@staticmethod
def _extract_mlb_subtitles(feed, language):
subtitles = {}
for cc_location in (feed.get('closedCaptions') or []):
subtitles.setdefault(language, []).append({
'url': cc_location,
})
return subtitles
def _download_video_data(self, display_id):
# https://www.mlb.com/data-service/en/videos/[SLUG]
return self._download_json(
'https://fastball-gateway.mlb.com/graphql',
display_id, query={
'query': '''{
mediaPlayback(ids: "%s") {
description
feeds(types: CMS) {
closedCaptions
duration
image {
cuts {
width
height
src
}
}
playbacks {
name
url
}
}
id
timestamp
title
}
}''' % display_id,
})['data']['mediaPlayback'][0]
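# A hedged standalone sketch of the same GraphQL lookup with the `requests`
# package; the slug comes from the test case above and the field selection
# is trimmed to the essentials:
import requests
query = '{ mediaPlayback(ids: "ackley-s-spectacular-catch-c34698933") { id title feeds(types: CMS) { playbacks { name url } } } }'
video = requests.get('https://fastball-gateway.mlb.com/graphql',
params={'query': query}).json()['data']['mediaPlayback'][0]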

View file

@ -111,7 +111,7 @@ class MSNIE(InfoExtractor):
continue
if 'format=m3u8-aapl' in format_url:
# m3u8_native should not be used here until
# https://github.com/ytdl-org/haruhi-dl/issues/9913 is fixed
# https://github.com/ytdl-org/youtube-dl/issues/9913 is fixed
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4',
m3u8_id='hls', fatal=False))

View file

@ -255,7 +255,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
@staticmethod
def _extract_child_with_type(parent, t):
return next(c for c in parent['children'] if c.get('type') == t)
for c in parent['children']:
if c.get('type') == t:
return c
def _extract_mgid(self, webpage):
try:
@ -286,7 +288,8 @@ class MTVServicesInfoExtractor(InfoExtractor):
data = self._parse_json(self._search_regex(
r'__DATA__\s*=\s*({.+?});', webpage, 'data'), None)
main_container = self._extract_child_with_type(data, 'MainContainer')
video_player = self._extract_child_with_type(main_container, 'VideoPlayer')
ab_testing = self._extract_child_with_type(main_container, 'ABTesting')
video_player = self._extract_child_with_type(ab_testing or main_container, 'VideoPlayer')
mgid = video_player['props']['media']['video']['config']['uri']
return mgid
@ -320,7 +323,7 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media.mtvnservices.com/embed/.+?)\1', webpage)
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media\.mtvnservices\.com/embed/.+?)\1', webpage)
if mobj:
return mobj.group('url')

View file

@ -108,7 +108,7 @@ class NHLIE(NHLBaseIE):
'timestamp': 1454544904,
},
}, {
# Some m3u8 URLs are invalid (https://github.com/ytdl-org/haruhi-dl/issues/10713)
# Some m3u8 URLs are invalid (https://github.com/ytdl-org/youtube-dl/issues/10713)
'url': 'https://www.nhl.com/predators/video/poile-laviolette-on-subban-trade/t-277437416/c-44315003',
'md5': '50b2bb47f405121484dda3ccbea25459',
'info_dict': {

View file

@ -1,25 +1,28 @@
# coding: utf-8
from __future__ import unicode_literals
import datetime
import functools
import re
import json
import math
import datetime
from .common import InfoExtractor
from ..postprocessor.ffmpeg import FFmpegPostProcessor
from ..compat import (
compat_str,
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import (
determine_ext,
dict_get,
ExtractorError,
float_or_none,
InAdvancePagedList,
int_or_none,
float_or_none,
OnDemandPagedList,
parse_duration,
parse_iso8601,
PostProcessingError,
str_or_none,
remove_start,
try_get,
unified_timestamp,
@ -34,7 +37,7 @@ class NiconicoIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.nicovideo.jp/watch/sm22312215',
'md5': 'd1a75c0823e2f629128c43e1212760f9',
'md5': 'a5bad06f1347452102953f323c69da34',
'info_dict': {
'id': 'sm22312215',
'ext': 'mp4',
@ -162,6 +165,11 @@ class NiconicoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|secure\.|sp\.)?nicovideo\.jp/watch/(?P<id>(?:[a-z]{2})?[0-9]+)'
_NETRC_MACHINE = 'niconico'
_API_HEADERS = {
'X-Frontend-ID': '6',
'X-Frontend-Version': '0'
}
def _real_initialize(self):
self._login()
@ -188,40 +196,92 @@ class NiconicoIE(InfoExtractor):
if compat_parse_qs(parts.query).get('message', [None])[0] == 'cant_login':
login_ok = False
if not login_ok:
self._downloader.report_warning('unable to log in: bad username or password')
self.report_warning('unable to log in: bad username or password')
return login_ok
def _extract_format_for_quality(self, api_data, video_id, audio_quality, video_quality):
def yesno(boolean):
return 'yes' if boolean else 'no'
def _get_heartbeat_info(self, info_dict):
session_api_data = api_data['video']['dmcInfo']['session_api']
session_api_endpoint = session_api_data['urls'][0]
video_id, video_src_id, audio_src_id = info_dict['url'].split(':')[1].split('/')
format_id = '-'.join(map(lambda s: remove_start(s['id'], 'archive_'), [video_quality, audio_quality]))
api_data = (
info_dict.get('_api_data')
or self._parse_json(
self._html_search_regex(
'data-api-data="([^"]+)"',
self._download_webpage('http://www.nicovideo.jp/watch/' + video_id, video_id),
'API data', default='{}'),
video_id))
session_api_data = try_get(api_data, lambda x: x['media']['delivery']['movie']['session'])
session_api_endpoint = try_get(session_api_data, lambda x: x['urls'][0])
def ping():
status = try_get(
self._download_json(
'https://nvapi.nicovideo.jp/v1/2ab0cbaa/watch', video_id,
query={'t': try_get(api_data, lambda x: x['media']['delivery']['trackingId'])},
note='Acquiring permission for downloading video',
headers=self._API_HEADERS),
lambda x: x['meta']['status'])
if status != 200:
self.report_warning('Failed to acquire permission for playing video. The video may not download.')
yesno = lambda x: 'yes' if x else 'no'
# m3u8 (encryption)
if try_get(api_data, lambda x: x['media']['delivery']['encryption']) is not None:
protocol = 'm3u8'
encryption = self._parse_json(session_api_data['token'], video_id)['hls_encryption']
session_api_http_parameters = {
'parameters': {
'hls_parameters': {
'encryption': {
encryption: {
'encrypted_key': try_get(api_data, lambda x: x['media']['delivery']['encryption']['encryptedKey']),
'key_uri': try_get(api_data, lambda x: x['media']['delivery']['encryption']['keyUri'])
}
},
'transfer_preset': '',
'use_ssl': yesno(session_api_endpoint['isSsl']),
'use_well_known_port': yesno(session_api_endpoint['isWellKnownPort']),
'segment_duration': 6000,
}
}
}
# http
else:
protocol = 'http'
session_api_http_parameters = {
'parameters': {
'http_output_download_parameters': {
'use_ssl': yesno(session_api_endpoint['isSsl']),
'use_well_known_port': yesno(session_api_endpoint['isWellKnownPort']),
}
}
}
session_response = self._download_json(
session_api_endpoint['url'], video_id,
query={'_format': 'json'},
headers={'Content-Type': 'application/json'},
note='Downloading JSON metadata for %s' % format_id,
note='Downloading JSON metadata for %s' % info_dict['format_id'],
data=json.dumps({
'session': {
'client_info': {
'player_id': session_api_data['player_id'],
'player_id': session_api_data.get('playerId'),
},
'content_auth': {
'auth_type': session_api_data['auth_types'][session_api_data['protocols'][0]],
'content_key_timeout': session_api_data['content_key_timeout'],
'auth_type': try_get(session_api_data, lambda x: x['authTypes'][session_api_data['protocols'][0]]),
'content_key_timeout': session_api_data.get('contentKeyTimeout'),
'service_id': 'nicovideo',
'service_user_id': session_api_data['service_user_id']
'service_user_id': session_api_data.get('serviceUserId')
},
'content_id': session_api_data['content_id'],
'content_id': session_api_data.get('contentId'),
'content_src_id_sets': [{
'content_src_ids': [{
'src_id_to_mux': {
'audio_src_ids': [audio_quality['id']],
'video_src_ids': [video_quality['id']],
'audio_src_ids': [audio_src_id],
'video_src_ids': [video_src_id],
}
}]
}],
@ -229,52 +289,81 @@ class NiconicoIE(InfoExtractor):
'content_uri': '',
'keep_method': {
'heartbeat': {
'lifetime': session_api_data['heartbeat_lifetime']
'lifetime': session_api_data.get('heartbeatLifetime')
}
},
'priority': session_api_data['priority'],
'priority': session_api_data.get('priority'),
'protocol': {
'name': 'http',
'parameters': {
'http_parameters': {
'parameters': {
'http_output_download_parameters': {
'use_ssl': yesno(session_api_endpoint['is_ssl']),
'use_well_known_port': yesno(session_api_endpoint['is_well_known_port']),
}
}
}
'http_parameters': session_api_http_parameters
}
},
'recipe_id': session_api_data['recipe_id'],
'recipe_id': session_api_data.get('recipeId'),
'session_operation_auth': {
'session_operation_auth_by_signature': {
'signature': session_api_data['signature'],
'token': session_api_data['token'],
'signature': session_api_data.get('signature'),
'token': session_api_data.get('token'),
}
},
'timing_constraint': 'unlimited'
}
}).encode())
resolution = video_quality.get('resolution', {})
info_dict['url'] = session_response['data']['session']['content_uri']
info_dict['protocol'] = protocol
# get heartbeat info
heartbeat_info_dict = {
'url': session_api_endpoint['url'] + '/' + session_response['data']['session']['id'] + '?_format=json&_method=PUT',
'data': json.dumps(session_response['data']),
# interval: heartbeatLifetime is in milliseconds; dividing by 3000 converts it to seconds and keeps only a third of the lifetime as a safety buffer.
'interval': float_or_none(session_api_data.get('heartbeatLifetime'), scale=3000),
'ping': ping
}
return info_dict, heartbeat_info_dict
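# How the (info_dict, heartbeat_info_dict) pair above is meant to be
# consumed (a hedged sketch, not haruhi-dl's actual downloader code):
# heartbeat['ping']() runs once up front, then the session data is PUT
# back to the session URL every `interval` seconds to keep the DMC
# session alive for the duration of the download:
#
# info, heartbeat = ie._get_heartbeat_info(info_dict)
# heartbeat['ping']()
# while not download_finished:
#     time.sleep(heartbeat['interval'])
#     requests.put(heartbeat['url'], data=heartbeat['data'])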
def _extract_format_for_quality(self, api_data, video_id, audio_quality, video_quality):
def parse_format_id(id_code):
mobj = re.match(r'''(?x)
(?:archive_)?
(?:(?P<codec>[^_]+)_)?
(?:(?P<br>[\d]+)kbps_)?
(?:(?P<res>[\d+]+)p_)?
''', '%s_' % id_code)
return mobj.groupdict() if mobj else {}
protocol = 'niconico_dmc'
format_id = '-'.join(map(lambda s: remove_start(s['id'], 'archive_'), [video_quality, audio_quality]))
vdict = parse_format_id(video_quality['id'])
adict = parse_format_id(audio_quality['id'])
resolution = try_get(video_quality, lambda x: x['metadata']['resolution'], dict) or {'height': vdict.get('res')}
vbr = try_get(video_quality, lambda x: x['metadata']['bitrate'], float)
return {
'url': session_response['data']['session']['content_uri'],
'url': '%s:%s/%s/%s' % (protocol, video_id, video_quality['id'], audio_quality['id']),
'format_id': format_id,
'format_note': 'DMC %s' % try_get(video_quality, lambda x: x['metadata']['label'], compat_str),
'ext': 'mp4',  # the Session API is used by the HTML5 player, which always serves mp4
'abr': float_or_none(audio_quality.get('bitrate'), 1000),
'vbr': float_or_none(video_quality.get('bitrate'), 1000),
'height': resolution.get('height'),
'width': resolution.get('width'),
'vcodec': vdict.get('codec'),
'acodec': adict.get('codec'),
'vbr': float_or_none(vbr, 1000) or float_or_none(vdict.get('br')),
'abr': float_or_none(audio_quality.get('bitrate'), 1000) or float_or_none(adict.get('br')),
'height': int_or_none(resolution.get('height', vdict.get('res'))),
'width': int_or_none(resolution.get('width')),
'quality': -2 if 'low' in format_id else -1, # Default quality value is -1
'protocol': protocol,
'http_headers': {
'Origin': 'https://www.nicovideo.jp',
'Referer': 'https://www.nicovideo.jp/watch/' + video_id,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
# Get video webpage. We are not actually interested in it for normal
# cases, but need the cookies in order to be able to download the
# info webpage
# Get video webpage for API data.
webpage, handle = self._download_webpage_handle(
'http://www.nicovideo.jp/watch/' + video_id, video_id)
if video_id.startswith('so'):
@ -284,86 +373,136 @@ class NiconicoIE(InfoExtractor):
'data-api-data="([^"]+)"', webpage,
'API data', default='{}'), video_id)
def _format_id_from_url(video_url):
return 'economy' if video_real_url.endswith('low') else 'normal'
def get_video_info_web(items):
return dict_get(api_data['video'], items)
try:
video_real_url = api_data['video']['smileInfo']['url']
except KeyError: # Flash videos
# Get flv info
flv_info_webpage = self._download_webpage(
'http://flapi.nicovideo.jp/api/getflv/' + video_id + '?as3=1',
video_id, 'Downloading flv info')
# Get video info
video_info_xml = self._download_xml(
'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id,
video_id, note='Downloading video info page')
flv_info = compat_parse_qs(flv_info_webpage)
if 'url' not in flv_info:
if 'deleted' in flv_info:
raise ExtractorError('The video has been deleted.',
expected=True)
elif 'closed' in flv_info:
raise ExtractorError('Niconico videos now require logging in',
expected=True)
elif 'error' in flv_info:
raise ExtractorError('%s reports error: %s' % (
self.IE_NAME, flv_info['error'][0]), expected=True)
else:
raise ExtractorError('Unable to find video URL')
def get_video_info_xml(items):
if not isinstance(items, list):
items = [items]
for item in items:
ret = xpath_text(video_info_xml, './/' + item)
if ret:
return ret
video_info_xml = self._download_xml(
'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id,
video_id, note='Downloading video info page')
if get_video_info_xml('error'):
error_code = get_video_info_xml('code')
def get_video_info(items):
if not isinstance(items, list):
items = [items]
for item in items:
ret = xpath_text(video_info_xml, './/' + item)
if ret:
return ret
if error_code == 'DELETED':
raise ExtractorError('The video has been deleted.',
expected=True)
elif error_code == 'NOT_FOUND':
raise ExtractorError('The video is not found.',
expected=True)
elif error_code == 'COMMUNITY':
self.to_screen('%s: The video is community members only.' % video_id)
else:
raise ExtractorError('%s reports error: %s' % (self.IE_NAME, error_code))
video_real_url = flv_info['url'][0]
# Start extracting video formats
formats = []
extension = get_video_info('movie_type')
if not extension:
extension = determine_ext(video_real_url)
# Get HTML5 videos info
quality_info = try_get(api_data, lambda x: x['media']['delivery']['movie'])
if not quality_info:
raise ExtractorError('The video can\'t be downloaded', expected=True)
formats = [{
'url': video_real_url,
'ext': extension,
'format_id': _format_id_from_url(video_real_url),
}]
else:
formats = []
for audio_quality in quality_info.get('audios') or {}:
for video_quality in quality_info.get('videos') or {}:
if not audio_quality.get('isAvailable') or not video_quality.get('isAvailable'):
continue
formats.append(self._extract_format_for_quality(
api_data, video_id, audio_quality, video_quality))
dmc_info = api_data['video'].get('dmcInfo')
if dmc_info: # "New" HTML5 videos
quality_info = dmc_info['quality']
for audio_quality in quality_info['audios']:
for video_quality in quality_info['videos']:
if not audio_quality['available'] or not video_quality['available']:
continue
formats.append(self._extract_format_for_quality(
api_data, video_id, audio_quality, video_quality))
# Get flv/swf info
timestamp = None
video_real_url = try_get(api_data, lambda x: x['video']['smileInfo']['url'])
if video_real_url:
is_economy = video_real_url.endswith('low')
self._sort_formats(formats)
else: # "Old" HTML5 videos
formats = [{
if is_economy:
self.report_warning('Site is currently in economy mode! You will only have access to lower quality streams')
# Invoking ffprobe to determine resolution
pp = FFmpegPostProcessor(self._downloader)
cookies = self._get_cookies('https://nicovideo.jp').output(header='', sep='; path=/; domain=nicovideo.jp;\n')
self.to_screen('%s: %s' % (video_id, 'Checking smile format with ffprobe'))
try:
metadata = pp.get_metadata_object(video_real_url, ['-cookies', cookies])
except PostProcessingError as err:
raise ExtractorError(err.msg, expected=True)
v_stream = a_stream = {}
# Some complex swf files don't have a video stream (e.g. nm4809023)
for stream in metadata['streams']:
if stream['codec_type'] == 'video':
v_stream = stream
elif stream['codec_type'] == 'audio':
a_stream = stream
# Community restricted videos seem to have issues with the thumb API not returning anything at all
filesize = int(
(get_video_info_xml('size_high') if not is_economy else get_video_info_xml('size_low'))
or metadata['format']['size']
)
extension = (
get_video_info_xml('movie_type')
or 'mp4' if 'mp4' in metadata['format']['format_name'] else metadata['format']['format_name']
)
# The 'creation_time' tag on the video stream of re-encoded SMILEVIDEO mp4 files is '1970-01-01T00:00:00.000000Z'.
timestamp = (
parse_iso8601(get_video_info_web('first_retrieve'))
or unified_timestamp(get_video_info_web('postedDateTime'))
)
metadata_timestamp = (
parse_iso8601(try_get(v_stream, lambda x: x['tags']['creation_time']))
or timestamp if extension != 'mp4' else 0
)
# According to compconf, smile videos from pre-2017 are always better quality than their DMC counterparts
smile_threshold_timestamp = parse_iso8601('2016-12-08T00:00:00+09:00')
is_source = timestamp < smile_threshold_timestamp or metadata_timestamp > 0
# If the reported file size is unstable, the old-server file is not the source video.
if filesize > 1:
formats.append({
'url': video_real_url,
'ext': 'mp4',
'format_id': _format_id_from_url(video_real_url),
}]
'format_id': 'smile' if not is_economy else 'smile_low',
'format_note': 'SMILEVIDEO source' if not is_economy else 'SMILEVIDEO low quality',
'ext': extension,
'container': extension,
'vcodec': v_stream.get('codec_name'),
'acodec': a_stream.get('codec_name'),
# Some complex swf files don't have total bit rate metadata (e.g. nm6049209)
'tbr': int_or_none(metadata['format'].get('bit_rate'), scale=1000),
'vbr': int_or_none(v_stream.get('bit_rate'), scale=1000),
'abr': int_or_none(a_stream.get('bit_rate'), scale=1000),
'height': int_or_none(v_stream.get('height')),
'width': int_or_none(v_stream.get('width')),
'source_preference': 5 if not is_economy else -2,
'quality': 5 if is_source and not is_economy else None,
'filesize': filesize
})
def get_video_info(items):
return dict_get(api_data['video'], items)
self._sort_formats(formats)
# Start extracting information
title = get_video_info('title')
if not title:
title = self._og_search_title(webpage, default=None)
if not title:
title = self._html_search_regex(
title = (
get_video_info_xml('title') # prefer to get the untranslated original title
or get_video_info_web(['originalTitle', 'title'])
or self._og_search_title(webpage, default=None)
or self._html_search_regex(
r'<span[^>]+class="videoHeaderTitle"[^>]*>([^<]+)</span>',
webpage, 'video title')
webpage, 'video title'))
watch_api_data_string = self._html_search_regex(
r'<div[^>]+id="watchAPIDataContainer"[^>]+>([^<]+)</div>',
@ -372,14 +511,15 @@ class NiconicoIE(InfoExtractor):
video_detail = watch_api_data.get('videoDetail', {})
thumbnail = (
get_video_info(['thumbnail_url', 'thumbnailURL'])
self._html_search_regex(r'<meta property="og:image" content="([^"]+)">', webpage, 'thumbnail data', default=None)
or dict_get( # choose highest from 720p to 240p
get_video_info_web('thumbnail'),
['ogp', 'player', 'largeUrl', 'middleUrl', 'url'])
or self._html_search_meta('image', webpage, 'thumbnail', default=None)
or video_detail.get('thumbnail'))
description = get_video_info('description')
description = get_video_info_web('description')
timestamp = (parse_iso8601(get_video_info('first_retrieve'))
or unified_timestamp(get_video_info('postedDateTime')))
if not timestamp:
match = self._html_search_meta('datePublished', webpage, 'date published', default=None)
if match:
@ -388,19 +528,25 @@ class NiconicoIE(InfoExtractor):
timestamp = parse_iso8601(
video_detail['postedAt'].replace('/', '-'),
delimiter=' ', timezone=datetime.timedelta(hours=9))
timestamp = timestamp or try_get(api_data, lambda x: parse_iso8601(x['video']['registeredAt']))
view_count = int_or_none(get_video_info(['view_counter', 'viewCount']))
view_count = int_or_none(get_video_info_web(['view_counter', 'viewCount']))
if not view_count:
match = self._html_search_regex(
r'>Views: <strong[^>]*>([^<]+)</strong>',
webpage, 'view count', default=None)
if match:
view_count = int_or_none(match.replace(',', ''))
view_count = view_count or video_detail.get('viewCount')
view_count = (
view_count
or video_detail.get('viewCount')
or try_get(api_data, lambda x: x['video']['count']['view']))
comment_count = (
int_or_none(get_video_info_web('comment_num'))
or video_detail.get('commentCount')
or try_get(api_data, lambda x: x['video']['count']['comment']))
comment_count = (int_or_none(get_video_info('comment_num'))
or video_detail.get('commentCount')
or try_get(api_data, lambda x: x['thread']['commentCount']))
if not comment_count:
match = self._html_search_regex(
r'>Comments: <strong[^>]*>([^<]+)</strong>',
@ -409,22 +555,41 @@ class NiconicoIE(InfoExtractor):
comment_count = int_or_none(match.replace(',', ''))
duration = (parse_duration(
get_video_info('length')
get_video_info_web('length')
or self._html_search_meta(
'video:duration', webpage, 'video duration', default=None))
or video_detail.get('length')
or get_video_info('duration'))
or get_video_info_web('duration'))
webpage_url = get_video_info('watch_url') or url
webpage_url = get_video_info_web('watch_url') or url
# for channel movie and community movie
channel_id = try_get(
api_data,
(lambda x: x['channel']['globalId'],
lambda x: x['community']['globalId']))
channel = try_get(
api_data,
(lambda x: x['channel']['name'],
lambda x: x['community']['name']))
# Note: cannot use api_data.get('owner', {}) because owner may be set to "null"
# in the JSON, which will cause None to be returned instead of {}.
owner = try_get(api_data, lambda x: x.get('owner'), dict) or {}
uploader_id = get_video_info(['ch_id', 'user_id']) or owner.get('id')
uploader = get_video_info(['ch_name', 'user_nickname']) or owner.get('nickname')
uploader_id = str_or_none(
get_video_info_web(['ch_id', 'user_id'])
or owner.get('id')
or channel_id
)
uploader = (
get_video_info_web(['ch_name', 'user_nickname'])
or owner.get('nickname')
or channel
)
return {
'id': video_id,
'_api_data': api_data,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
@ -432,6 +597,8 @@ class NiconicoIE(InfoExtractor):
'uploader': uploader,
'timestamp': timestamp,
'uploader_id': uploader_id,
'channel': channel,
'channel_id': channel_id,
'view_count': view_count,
'comment_count': comment_count,
'duration': duration,
@ -440,7 +607,7 @@ class NiconicoIE(InfoExtractor):
class NiconicoPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/(?:user/\d+/)?mylist/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/(?:user/\d+/|my/)?mylist/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.nicovideo.jp/mylist/27411728',
@ -456,60 +623,77 @@ class NiconicoPlaylistIE(InfoExtractor):
'url': 'https://www.nicovideo.jp/user/805442/mylist/27411728',
'only_matching': True,
}]
_PAGE_SIZE = 100
def _call_api(self, list_id, resource, query):
return self._download_json(
'https://nvapi.nicovideo.jp/v2/mylists/' + list_id, list_id,
'Downloading %s JSON metadata' % resource, query=query,
headers={'X-Frontend-Id': 6})['data']['mylist']
def _parse_owner(self, item):
owner = item.get('owner') or {}
if owner:
return {
'uploader': owner.get('name'),
'uploader_id': owner.get('id'),
}
return {}
def _fetch_page(self, list_id, page):
page += 1
items = self._call_api(list_id, 'page %d' % page, {
'page': page,
'pageSize': self._PAGE_SIZE,
})['items']
for item in items:
video = item.get('video') or {}
video_id = video.get('id')
if not video_id:
continue
count = video.get('count') or {}
get_count = lambda x: int_or_none(count.get(x))
info = {
'_type': 'url',
'id': video_id,
'title': video.get('title'),
'url': 'https://www.nicovideo.jp/watch/' + video_id,
'description': video.get('shortDescription'),
'duration': int_or_none(video.get('duration')),
'view_count': get_count('view'),
'comment_count': get_count('comment'),
'ie_key': NiconicoIE.ie_key(),
}
info.update(self._parse_owner(video))
yield info
_API_HEADERS = {
'X-Frontend-ID': '6',
'X-Frontend-Version': '0'
}
def _real_extract(self, url):
list_id = self._match_id(url)
mylist = self._call_api(list_id, 'list', {
'pageSize': 1,
})
entries = InAdvancePagedList(
functools.partial(self._fetch_page, list_id),
math.ceil(mylist['totalItemCount'] / self._PAGE_SIZE),
self._PAGE_SIZE)
result = self.playlist_result(
entries, list_id, mylist.get('name'), mylist.get('description'))
result.update(self._parse_owner(mylist))
return result
def get_page_data(pagenum, pagesize):
return self._download_json(
'http://nvapi.nicovideo.jp/v2/mylists/' + list_id, list_id,
query={'page': 1 + pagenum, 'pageSize': pagesize},
headers=self._API_HEADERS).get('data').get('mylist')
data = get_page_data(0, 1)
title = data.get('name')
description = data.get('description')
uploader = (data.get('owner') or {}).get('name')
uploader_id = (data.get('owner') or {}).get('id')
def pagefunc(pagenum):
data = get_page_data(pagenum, 25)
return ({
'_type': 'url',
'url': 'http://www.nicovideo.jp/watch/' + item.get('watchId'),
} for item in data.get('items'))
return {
'_type': 'playlist',
'id': list_id,
'title': title,
'description': description,
'uploader': uploader,
'uploader_id': uploader_id,
'entries': OnDemandPagedList(pagefunc, 25),
}
class NiconicoUserIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/user/(?P<id>\d+)/?(?:$|[#?])'
_TEST = {
'url': 'https://www.nicovideo.jp/user/419948',
'info_dict': {
'id': '419948',
},
'playlist_mincount': 101,
}
_API_URL = "https://nvapi.nicovideo.jp/v1/users/%s/videos?sortKey=registeredAt&sortOrder=desc&pageSize=%s&page=%s"
_PAGE_SIZE = 100
_API_HEADERS = {
'X-Frontend-ID': '6',
'X-Frontend-Version': '0'
}
def _entries(self, list_id):
total_count = 1
count = page_num = 0
while count < total_count:
json_parsed = self._download_json(
self._API_URL % (list_id, self._PAGE_SIZE, page_num + 1), list_id,
headers=self._API_HEADERS,
note='Downloading JSON metadata%s' % (' page %d' % page_num if page_num else ''))
if not page_num:
total_count = int_or_none(json_parsed['data'].get('totalCount'))
for entry in json_parsed["data"]["items"]:
count += 1
yield self.url_result('https://www.nicovideo.jp/watch/%s' % entry['id'])
page_num += 1
def _real_extract(self, url):
list_id = self._match_id(url)
return self.playlist_result(self._entries(list_id), list_id, ie=NiconicoIE.ie_key())
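# A hedged standalone sketch of the paged user-videos call above (user id
# from the test case); the X-Frontend-ID headers appear to be required by
# nvapi, which is why the extractor always sends them:
import requests
page = requests.get(
'https://nvapi.nicovideo.jp/v1/users/419948/videos',
params={'sortKey': 'registeredAt', 'sortOrder': 'desc', 'pageSize': 100, 'page': 1},
headers={'X-Frontend-ID': '6', 'X-Frontend-Version': '0'}).json()
video_ids = [item['id'] for item in page['data']['items']]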

View file

@ -1,100 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
determine_ext,
js_to_json,
)
class NinatekaIE(InfoExtractor):
IE_NAME = 'ninateka'
_VALID_URL = r'https?://ninateka\.pl/(?:film|audio)/(?P<id>[^/\?#]+)'
_TESTS = [{
'url': 'https://ninateka.pl/film/dziwne-przygody-kota-filemona-7',
'md5': '8b25c2998b48e1add7d93a5e27030786',
'info_dict': {
'id': 'dziwne-przygody-kota-filemona-7',
'ext': 'mp4',
'title': 'Dziwny świat kota Filemona | Poważne zmartwienie',
'description': 'Filemon ma kłopot z własnym wyglądem, czy uda mu się z nim uporać?',
},
}, {
'url': 'https://ninateka.pl/audio/telefon-drony-fisz-1-12',
'md5': 'fa03fc229d3b4d8eaa18976a7020909e',
'info_dict': {
'id': 'telefon-drony-fisz-1-12',
'ext': 'm4a',
'title': 'Telefon | Drony | Fisz Emade Tworzywo | 1/12',
'description': 'Utwór z długo wyczekiwanego albumu studyjnego Fisz Emade Tworzywo pt. „Drony”',
},
}]
def decode_url(self, encoded):
xor_val = ord('h') ^ ord(encoded[0])
return ''.join(chr(ord(c) ^ xor_val) for c in encoded)
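# A worked example of the XOR scheme above: every decoded URL is assumed to
# start with 'h' (as in 'https://'), so the key falls out of the first byte.
# 'jvvr' is a hypothetical ciphertext, not real site data:
key = ord('h') ^ ord('j')  # = 2
assert ''.join(chr(ord(c) ^ key) for c in 'jvvr') == 'http'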
def extract_formats(self, data, video_id, name):
info = self._parse_json(data, video_id, transform_source=js_to_json)
formats = []
for source_info in info['sources']:
url = self.decode_url(source_info['src'])
type_ = source_info.get('type')
if type_ == 'application/vnd.ms-sstr+xml' or url.endswith('/Manifest'):
formats.extend(self._extract_ism_formats(
url, video_id, ism_id='mss-{}'.format(name), fatal=False))
elif type_ == 'application/x-mpegURL' or url.endswith('.m3u8'):
formats.extend(self._extract_m3u8_formats(
url, video_id, ext='mp4', m3u8_id='hls-{}'.format(name), fatal=False))
elif type_ == 'application/dash+xml' or url.endswith('.mpd'):
formats.extend(self._extract_mpd_formats(
url, video_id, mpd_id='dash-{}'.format(name), fatal=False))
elif url.endswith('.f4m'):
formats.extend(self._extract_f4m_formats(
url, video_id, f4m_id='hds-{}'.format(name), fatal=False))
else:
formats.append({
'format_id': 'direct-{}'.format(name),
'url': url,
'ext': determine_ext(url, 'mp4'),
})
return formats
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
main = self._search_regex(
r'(?m)(?:var|let|const)\s+playerOptionsWithMainSource\s*=\s*(\{.*?\})\s*;\s*?$',
webpage, 'main source')
formats = self.extract_formats(main, video_id, 'main')
audiodesc = self._search_regex(
r'(?m)(?:var|let|const)\s+playerOptionsWithAudioDescriptionSource\s*=\s*(\{.*?\})\s*;\s*?$',
webpage, 'audio description', default=None)
if audiodesc:
formats.extend(self.extract_formats(audiodesc, video_id, 'audiodescription'))
english_ver = self._search_regex(
r'(?m)(?:var|let|const)\s+playerOptionsWithEnglishVersion\s*=\s*(\{.*?\})\s*;\s*?$',
webpage, 'english version', default=None)
if english_ver:
formats.extend(self.extract_formats(english_ver, video_id, 'english'))
self._sort_formats(formats)
return {
'id': video_id,
'title': self._og_search_title(webpage),
'formats': formats,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}

View file

@ -23,11 +23,9 @@ class NineCNineMediaIE(InfoExtractor):
destination_code, content_id = re.match(self._VALID_URL, url).groups()
api_base_url = self._API_BASE_TEMPLATE % (destination_code, content_id)
content = self._download_json(api_base_url, content_id, query={
'$include': '[Media,Season,ContentPackages]',
'$include': '[Media.Name,Season,ContentPackages.Duration,ContentPackages.Id]',
})
title = content['Name']
if len(content['ContentPackages']) > 1:
raise ExtractorError('multiple content packages')
content_package = content['ContentPackages'][0]
package_id = content_package['Id']
content_package_url = api_base_url + 'contentpackages/%s/' % package_id

View file

@ -115,7 +115,7 @@ class NocoIE(InfoExtractor):
# Timestamp adjustment offset between server time and local time
# must be calculated in order to use timestamps closest to server's
# in all API requests (see https://github.com/ytdl-org/haruhi-dl/issues/7864)
# in all API requests (see https://github.com/ytdl-org/youtube-dl/issues/7864)
webpage = self._download_webpage(url, video_id)
player_url = self._search_regex(

View file

@ -58,7 +58,7 @@ class NRKBaseIE(InfoExtractor):
def _call_api(self, path, video_id, item=None, note=None, fatal=True, query=None):
return self._download_json(
urljoin('http://psapi.nrk.no/', path),
urljoin('https://psapi.nrk.no/', path),
video_id, note or 'Downloading %s JSON' % item,
fatal=fatal, query=query,
headers={'Accept-Encoding': 'gzip, deflate, br'})

View file

@ -21,7 +21,7 @@ class OnceIE(InfoExtractor):
progressive_formats = []
for adaptive_format in formats:
# Prevent advertisement from embedding into m3u8 playlist (see
# https://github.com/ytdl-org/haruhi-dl/issues/8893#issuecomment-199912684)
# https://github.com/ytdl-org/youtube-dl/issues/8893#issuecomment-199912684)
adaptive_format['url'] = re.sub(
r'\badsegmentlength=\d+', r'adsegmentlength=0', adaptive_format['url'])
rendition_id = self._search_regex(

View file

@ -3,10 +3,10 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
js_to_json,
)
import re
import datetime
class OnNetworkLoaderIE(InfoExtractor):
@ -45,51 +45,46 @@ class OnNetworkLoaderIE(InfoExtractor):
class OnNetworkFrameIE(InfoExtractor):
IE_NAME = 'onnetwork:frame'
_VALID_URL = r'https?://video\.onnetwork\.tv/frame84\.php\?(?:[^&]+&)*?mid=(?P<mid>[^&]+)&(?:[^&]+&)*?id=(?P<vid>[^&]+)'
_VALID_URL = r'https?://video\.onnetwork\.tv/frame\d+\.php\?(?:[^&]+&)*?mid=(?P<mid>[^&]+)&(?:[^&]+&)*?id=(?P<vid>[^&]+)'
_TESTS = [{
'url': 'https://video.onnetwork.tv/frame84.php?mid=MCwxNng5LDAsMCwxNzU1LDM3MjksMSwwLDEsMzYsNSwwLDIsMCw0LDEsMCwxLDEsMiwwLDAsMSwwLDAsMCwwLC0xOy0xOzIwOzIwLDAsNTAsMA==&preview=0&iid=0&e=1&widget=524&id=ffEXS991c5f8f4dbb502b540687287098d2d8',
'only_matching': True,
}]
_BASE_OBJECT_RE = r'''var onplayer\s*=\s*new tUIPlayer\(\s*{\s*videos\s*:\s*\[\s*{.*?'''
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
vid = mobj.group('vid')
webpage = self._download_webpage(url, vid, 'Downloading video frame')
video_id = self._search_regex(
self._BASE_OBJECT_RE + r'id\s*:\s*(\d+)',
webpage, 'video id')
m3u_url = self._search_regex(
self._BASE_OBJECT_RE + r'(?:urls\s*:\[{[^}]+}\],)?url\s*:"([^"]+)"',
webpage, 'm3u url')
title = self._search_regex(
self._BASE_OBJECT_RE + r"(?<!p)title\s*:\s*'([^']+)'",
webpage, 'title')
thumbnail = self._search_regex(
self._BASE_OBJECT_RE + r"""(?<![a-z])poster\s*:\s*'([^']+)'""",
webpage, 'thumbnail', fatal=False)
duration = self._search_regex(
self._BASE_OBJECT_RE + r'duration\s*:\s*(\d+)',
webpage, 'duration', fatal=False)
age_limit = self._search_regex(
self._BASE_OBJECT_RE + r'ageallow\s*:\s*(\d+)',
webpage, 'age limit', fatal=False)
upload_date_unix = self._search_regex(
self._BASE_OBJECT_RE + r'adddate\s*:\s*(\d+)',
webpage, 'upload date', fatal=False)
if upload_date_unix:
upload_date = datetime.datetime.fromtimestamp(int(upload_date_unix)).strftime('%Y%m%d')
data = self._search_regex(
r'(?s)var onplayer\s*=\s*new tUIPlayer\(\s*({\s*videos\s*:\s*\[\s*{.*?})\s*,\s*OnPlayerUI',
webpage, 'video data')
data = js_to_json(data)
data = re.sub(
r'\((?P<value>\d+(?:\.\d+)?|(["\']).+?\2)(?:\s*\|\|\s*.+?)?\)',
lambda x: x.group('value'), data)
data = re.sub(r'"\s*\+\s*"', '', data)
data = self._parse_json(data, vid)
formats = self._extract_m3u8_formats(m3u_url, video_id)
entries = []
for video in data['videos']:
video_id = str(video['id'])
formats = self._extract_m3u8_formats(video['url'], video_id)
self._sort_formats(formats)
entries.append({
'id': video_id,
'title': video['title'],
'formats': formats,
'thumbnail': video.get('poster'),
'duration': int_or_none(video.get('duration')),
'age_limit': int_or_none(video.get('ageallow')),
'timestamp': int_or_none(video.get('adddate')),
})
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'duration': int_or_none(duration),
'age_limit': int_or_none(age_limit),
'upload_date': upload_date,
'_type': 'playlist',
'entries': entries,
'id': vid,
}
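# A worked example of the two regex passes above, applied to a hypothetical
# blob that js_to_json has already quoted; they strip "(x || fallback)"
# wrappers and '" + "' string concatenations so the result parses as JSON:
import json
import re
s = '{"videos": [{"id": (123||0), "url": "https://example.com/a" + "/b.m3u8"}]}'
s = re.sub(r'\((?P<value>\d+(?:\.\d+)?|(["\']).+?\2)(?:\s*\|\|\s*.+?)?\)',
lambda x: x.group('value'), s)
s = re.sub(r'"\s*\+\s*"', '', s)
assert json.loads(s)['videos'][0]['id'] == 123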

View file

@ -98,6 +98,9 @@ class ORFTVthekIE(InfoExtractor):
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
src, video_id, f4m_id=format_id, fatal=False))
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
src, video_id, mpd_id=format_id, fatal=False))
else:
formats.append({
'format_id': format_id,
@ -140,6 +143,25 @@ class ORFTVthekIE(InfoExtractor):
})
upload_date = unified_strdate(sd.get('created_date'))
thumbnails = []
preview = sd.get('preview_image_url')
if preview:
thumbnails.append({
'id': 'preview',
'url': preview,
'preference': 0,
})
image = sd.get('image_full_url')
if not image and len(data_jsb) == 1:
image = self._og_search_thumbnail(webpage)
if image:
thumbnails.append({
'id': 'full',
'url': image,
'preference': 1,
})
entries.append({
'_type': 'video',
'id': video_id,
@ -149,7 +171,7 @@ class ORFTVthekIE(InfoExtractor):
'description': sd.get('description'),
'duration': int_or_none(sd.get('duration_in_seconds')),
'upload_date': upload_date,
'thumbnail': sd.get('image_full_url'),
'thumbnails': thumbnails,
})
return {
@ -182,7 +204,7 @@ class ORFRadioIE(InfoExtractor):
duration = end - start if end and start else None
entries.append({
'id': loop_stream_id.replace('.mp3', ''),
'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (self._LOOP_STATION, loop_stream_id),
'url': 'https://loopstream01.apa.at/?channel=%s&id=%s' % (self._LOOP_STATION, loop_stream_id),
'title': title,
'description': clean_html(data.get('subtitle')),
'duration': duration,

View file

@ -0,0 +1,148 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
str_or_none,
try_get,
)
class PalcoMP3BaseIE(InfoExtractor):
_GQL_QUERY_TMPL = '''{
artist(slug: "%s") {
%s
}
}'''
_ARTIST_FIELDS_TMPL = '''music(slug: "%%s") {
%s
}'''
_MUSIC_FIELDS = '''duration
hls
mp3File
musicID
plays
title'''
def _call_api(self, artist_slug, artist_fields):
return self._download_json(
'https://www.palcomp3.com.br/graphql/', artist_slug, query={
'query': self._GQL_QUERY_TMPL % (artist_slug, artist_fields),
})['data']
def _parse_music(self, music):
music_id = compat_str(music['musicID'])
title = music['title']
formats = []
hls_url = music.get('hls')
if hls_url:
formats.append({
'url': hls_url,
'protocol': 'm3u8_native',
'ext': 'mp4',
})
mp3_file = music.get('mp3File')
if mp3_file:
formats.append({
'url': mp3_file,
})
return {
'id': music_id,
'title': title,
'formats': formats,
'duration': int_or_none(music.get('duration')),
'view_count': int_or_none(music.get('plays')),
}
def _real_initialize(self):
self._ARTIST_FIELDS_TMPL = self._ARTIST_FIELDS_TMPL % self._MUSIC_FIELDS
def _real_extract(self, url):
artist_slug, music_slug = re.match(self._VALID_URL, url).groups()
artist_fields = self._ARTIST_FIELDS_TMPL % music_slug
music = self._call_api(artist_slug, artist_fields)['artist']['music']
return self._parse_music(music)
class PalcoMP3IE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:song'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<artist>[^/]+)/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.palcomp3.com/maiaraemaraisaoficial/nossas-composicoes-cuida-bem-dela/',
'md5': '99fd6405b2d8fd589670f6db1ba3b358',
'info_dict': {
'id': '3162927',
'ext': 'mp3',
'title': 'Nossas Composições - CUIDA BEM DELA',
'duration': 210,
'view_count': int,
}
}]
@classmethod
def suitable(cls, url):
return False if PalcoMP3VideoIE.suitable(url) else super(PalcoMP3IE, cls).suitable(url)
class PalcoMP3ArtistIE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:artist'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<id>[^/?&#]+)'
_TESTS = [{
'url': 'https://www.palcomp3.com.br/condedoforro/',
'info_dict': {
'id': '358396',
'title': 'Conde do Forró',
},
'playlist_mincount': 188,
}]
_ARTIST_FIELDS_TMPL = '''artistID
musics {
nodes {
%s
}
}
name'''
@classmethod
def suitable(cls, url):
return False if re.match(PalcoMP3IE._VALID_URL, url) else super(PalcoMP3ArtistIE, cls).suitable(url)
def _real_extract(self, url):
artist_slug = self._match_id(url)
artist = self._call_api(artist_slug, self._ARTIST_FIELDS_TMPL)['artist']
def entries():
for music in (try_get(artist, lambda x: x['musics']['nodes'], list) or []):
yield self._parse_music(music)
return self.playlist_result(
entries(), str_or_none(artist.get('artistID')), artist.get('name'))
class PalcoMP3VideoIE(PalcoMP3BaseIE):
IE_NAME = 'PalcoMP3:video'
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<artist>[^/]+)/(?P<id>[^/?&#]+)/?#clipe'
_TESTS = [{
'url': 'https://www.palcomp3.com/maiaraemaraisaoficial/maiara-e-maraisa-voce-faz-falta-aqui-ao-vivo-em-vicosa-mg/#clipe',
'add_ie': ['Youtube'],
'info_dict': {
'id': '_pD1nR2qqPg',
'ext': 'mp4',
'title': 'Maiara e Maraisa - Você Faz Falta Aqui - DVD Ao Vivo Em Campo Grande',
'description': 'md5:7043342c09a224598e93546e98e49282',
'upload_date': '20161107',
'uploader_id': 'maiaramaraisaoficial',
'uploader': 'Maiara e Maraisa',
}
}]
_MUSIC_FIELDS = 'youtubeID'
def _parse_music(self, music):
youtube_id = music['youtubeID']
return self.url_result(youtube_id, 'Youtube', youtube_id)
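# A hedged standalone sketch of the GraphQL call assembled above, with the
# slugs from the first PalcoMP3IE test and a trimmed field list:
import requests
query = '{ artist(slug: "maiaraemaraisaoficial") { music(slug: "nossas-composicoes-cuida-bem-dela") { musicID title mp3File duration plays } } }'
music = requests.get('https://www.palcomp3.com.br/graphql/',
params={'query': query}).json()['data']['artist']['music']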

View file

@ -0,0 +1,36 @@
# coding: utf-8
from .common import InfoExtractor
from ..utils import (
js_to_json,
)
class PatroniteAudioIE(InfoExtractor):
IE_NAME = 'patronite:audio'
_VALID_URL = r'https?://patronite\.pl/(?P<id>[a-zA-Z\d-]+)'
_TESTS = [{
'url': 'https://patronite.pl/radionowyswiat',
'info_dict': {
'id': 'radionowyswiat',
'ext': 'unknown_video',
'title': 'Radio Nowy Świat',
'description': 'Dobre radio tworzą nie tylko dziennikarze, realizatorzy, technicy czy reporterzy. Bez nich nie byłoby radia, ale też radia nie byłoby bez słuchaczy. Dziś każdy z Was może pójść o krok dalej - stając się współtwórcą i mecenasem Radia Nowy Świat!',
},
}]
def _real_extract(self, url):
# only works with radio streams, no podcast support
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
data = self._parse_json(self._search_regex(
r"(?s)const player\s*=\s*new window\.PatroniteWebPlayer\('\.web-player',\s*({.+?})\);",
webpage, 'player data'), display_id, js_to_json)
return {
'id': display_id,
'url': data['url'],
'title': data['title'],
'description': self._og_search_description(webpage),
'thumbnail': data.get('artwork'),
'vcodec': 'none',
}

View file

@ -305,7 +305,7 @@ class PBSIE(InfoExtractor):
{
# Video embedded in iframe containing angle brackets as attribute's value (e.g.
# "<iframe style='position: absolute;<br />\ntop: 0; left: 0;' ...", see
# https://github.com/ytdl-org/haruhi-dl/issues/7059)
# https://github.com/ytdl-org/youtube-dl/issues/7059)
'url': 'http://www.pbs.org/food/features/a-chefs-life-season-3-episode-5-prickly-business/',
'md5': '59b0ef5009f9ac8a319cc5efebcd865e',
'info_dict': {
@ -348,7 +348,7 @@ class PBSIE(InfoExtractor):
},
},
{
# https://github.com/ytdl-org/haruhi-dl/issues/13801
# https://github.com/ytdl-org/youtube-dl/issues/13801
'url': 'https://www.pbs.org/video/pbs-newshour-full-episode-july-31-2017-1501539057/',
'info_dict': {
'id': '3003333873',
@ -642,7 +642,7 @@ class PBSIE(InfoExtractor):
# we won't try extracting them.
# Since summer 2016 higher quality formats (4500k and 6500k) are also available
# albeit they are not documented in [2].
# 1. https://github.com/ytdl-org/haruhi-dl/commit/cbc032c8b70a038a69259378c92b4ba97b42d491#commitcomment-17313656
# 1. https://github.com/ytdl-org/youtube-dl/commit/cbc032c8b70a038a69259378c92b4ba97b42d491#commitcomment-17313656
# 2. https://projects.pbs.org/confluence/display/coveapi/COVE+Video+Specifications
if not bitrate or int(bitrate) < 400:
continue

View file

@ -1,11 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
import datetime
from urllib.parse import urlencode
import re
from .common import SelfhostedInfoExtractor
from ..compat import compat_str
from ..utils import (
determine_ext,
int_or_none,
parse_resolution,
str_or_none,
@ -13,28 +16,180 @@ from ..utils import (
unified_timestamp,
url_or_none,
urljoin,
ExtractorError,
)
class PeerTubeSHIE(SelfhostedInfoExtractor):
_UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'
_API_BASE = 'https://%s/api/v1/videos/%s/%s'
_VALID_URL = r'peertube:(?P<host>[^:]+):(?P<id>%s)' % (_UUID_RE)
_SH_VALID_URL = r'https?://(?P<host>[^/]+)/(?:videos/(?:watch|embed)|api/v\d/videos)/(?P<id>%s)' % _UUID_RE
class PeerTubeBaseExtractor(SelfhostedInfoExtractor):
_UUID_RE = r'[\da-zA-Z]{22}|[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'
_API_BASE = 'https://%s/api/v1/%s/%s/%s'
_SH_VALID_CONTENT_STRINGS = (
'<title>PeerTube<',
'There will be other non JS-based clients to access PeerTube',
'>There are other non JS-based unofficial clients to access PeerTube',
'>We are sorry but it seems that PeerTube is not compatible with your web browser.<',
'<meta property="og:platform" content="PeerTube"',
)
_NETRC_MACHINE = 'peertube'
_LOGIN_INFO = None
def _login(self):
if self._LOGIN_INFO:
ts = datetime.datetime.now().timestamp()
if self._LOGIN_INFO['expires_on'] >= ts + 5:
return True
username, password = self._get_login_info()
if not username:
return None
# the instance domain (the one where the user has an account) is appended to the username or e-mail, separated by '@'
mobj = re.match(r'^(?P<username>[^@]+(?:@[^@]+)?)@(?P<instance>.+)$', username)
if not mobj:
self.report_warning(
'Invalid login format - must be in format [username or email]@[instance]')
return None
username, instance = mobj.group('username', 'instance')
oauth_keys = self._downloader.cache.load('peertube-oauth', instance)
if not oauth_keys:
oauth_keys = self._download_json(f'https://{instance}/api/v1/oauth-clients/local', instance, 'Downloading OAuth keys')
self._downloader.cache.store('peertube-oauth', instance, oauth_keys)
client_id, client_secret = oauth_keys['client_id'], oauth_keys['client_secret']
auth_res = self._download_json(f'https://{instance}/api/v1/users/token', instance, 'Logging in', data=bytes(urlencode({
'client_id': client_id,
'client_secret': client_secret,
'response_type': 'code',
'grant_type': 'password',
'scope': 'user',
'username': username,
'password': password,
}).encode('utf-8')))
ts = datetime.datetime.now().timestamp()
auth_res['instance'] = instance
auth_res['expires_on'] = ts + auth_res['expires_in']
auth_res['refresh_token_expires_on'] = ts + auth_res['refresh_token_expires_in']
# stored on the base class rather than on self, so all peertube extractors share the login details
PeerTubeBaseExtractor._LOGIN_INFO = auth_res
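# The login above is PeerTube's stock OAuth2 password grant; a hedged
# standalone sketch (instance and credentials are placeholders):
#
# keys = requests.get('https://peertube.example/api/v1/oauth-clients/local').json()
# token = requests.post('https://peertube.example/api/v1/users/token', data={
#     'client_id': keys['client_id'], 'client_secret': keys['client_secret'],
#     'grant_type': 'password', 'response_type': 'code', 'scope': 'user',
#     'username': 'alice', 'password': 'hunter2',
# }).json()['access_token']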
def _call_api(self, host, resource, resource_id, path, note=None, errnote=None, fatal=True):
return self._download_json(
self._API_BASE % (host, resource, resource_id, path), resource_id,
headers={
'Authorization': f'Bearer {self._LOGIN_INFO["access_token"]}',
} if self._LOGIN_INFO and self._LOGIN_INFO['instance'] == host else {},
note=note, errnote=errnote, fatal=fatal)
def _parse_video(self, video, url):
host, display_id = self._match_id_and_host(url)
info_dict = {}
formats = []
files = video.get('files') or []
for playlist in (video.get('streamingPlaylists') or []):
if not isinstance(playlist, dict):
continue
playlist_files = playlist.get('files')
if not (playlist_files and isinstance(playlist_files, list)):
continue
files.extend(playlist_files)
for file_ in files:
if not isinstance(file_, dict):
continue
file_url = url_or_none(file_.get('fileUrl'))
if not file_url:
continue
file_size = int_or_none(file_.get('size'))
format_id = try_get(
file_, lambda x: x['resolution']['label'], compat_str)
f = parse_resolution(format_id)
f.update({
'url': file_url,
'format_id': format_id,
'filesize': file_size,
})
if format_id == '0p':
f['vcodec'] = 'none'
else:
f['fps'] = int_or_none(file_.get('fps'))
formats.append(f)
if file_.get('torrentDownloadUrl'):
f = f.copy()
f.update({
'url': file_['torrentDownloadUrl'],
'ext': determine_ext(file_url),
'format_id': '%s-torrent' % format_id,
'protocol': 'bittorrent',
})
formats.append(f)
if files:
self._sort_formats(formats)
info_dict['formats'] = formats
else:
info_dict.update({
'_type': 'url_transparent',
'url': 'peertube:%s:%s' % (host, video['uuid']),
'ie_key': 'PeerTubeSH',
})
def data(section, field, type_):
return try_get(video, lambda x: x[section][field], type_)
def account_data(field, type_):
return data('account', field, type_)
def channel_data(field, type_):
return data('channel', field, type_)
category = data('category', 'label', compat_str)
categories = [category] if category else None
nsfw = video.get('nsfw')
if isinstance(nsfw, bool):
age_limit = 18 if nsfw else 0
else:
age_limit = None
webpage_url = 'https://%s/videos/watch/%s' % (host, display_id)
info_dict.update({
'id': video['uuid'],
'title': video['name'],
'description': video.get('description'),
'thumbnail': urljoin(webpage_url, video.get('thumbnailPath')),
'timestamp': unified_timestamp(video.get('publishedAt')),
'uploader': account_data('displayName', compat_str),
'uploader_id': str_or_none(account_data('id', int)),
'uploader_url': url_or_none(account_data('url', compat_str)),
'channel': channel_data('displayName', compat_str),
'channel_id': str_or_none(channel_data('id', int)),
'channel_url': url_or_none(channel_data('url', compat_str)),
'language': data('language', 'id', compat_str),
'license': data('licence', 'label', compat_str),
'duration': int_or_none(video.get('duration')),
'view_count': int_or_none(video.get('views')),
'like_count': int_or_none(video.get('likes')),
'dislike_count': int_or_none(video.get('dislikes')),
'age_limit': age_limit,
'tags': try_get(video, lambda x: x['tags'], list),
'categories': categories,
})
return info_dict
class PeerTubeSHIE(PeerTubeBaseExtractor):
_VALID_URL = r'peertube:(?P<host>[^:]+):(?P<id>%s)' % (PeerTubeBaseExtractor._UUID_RE)
_SH_VALID_URL = r'https?://(?P<host>[^/]+)/(?:videos/(?:watch|embed)|api/v\d/videos|w)/(?P<id>%s)' % (PeerTubeBaseExtractor._UUID_RE)
_TESTS = [{
'url': 'https://framatube.org/videos/watch/9c9de5e8-0a1e-484a-b099-e80766180a6d',
'md5': '9bed8c0137913e17b86334e5885aacff',
'md5': '8563064d245a4be5705bddb22bb00a28',
'info_dict': {
'id': '9c9de5e8-0a1e-484a-b099-e80766180a6d',
'ext': 'mp4',
'title': 'What is PeerTube?',
'description': 'md5:3fefb8dde2b189186ce0719fda6f7b10',
'description': 'md5:96adbaf219b4d41747bfc5937df0b017',
'thumbnail': r're:https?://.*\.(?:jpg|png)',
'timestamp': 1538391166,
'upload_date': '20181001',
@ -65,6 +220,27 @@ class PeerTubeSHIE(SelfhostedInfoExtractor):
'upload_date': '20200420',
'uploader': 'Drew DeVault',
}
}, {
# new url scheme since PeerTube 3.3
'url': 'https://peertube2.cpy.re/w/3fbif9S3WmtTP8gGsC5HBd',
'info_dict': {
'id': '122d093a-1ede-43bd-bd34-59d2931ffc5e',
'ext': 'mp4',
'title': 'E2E tests',
'uploader_id': '37855',
'timestamp': 1589276219,
'upload_date': '20200512',
'uploader': 'chocobozzz',
},
}, {
'url': 'https://peertube2.cpy.re/w/122d093a-1ede-43bd-bd34-59d2931ffc5e',
'only_matching': True,
}, {
'url': 'https://peertube2.cpy.re/api/v1/videos/3fbif9S3WmtTP8gGsC5HBd',
'only_matching': True,
}, {
'url': 'peertube:peertube2.cpy.re:3fbif9S3WmtTP8gGsC5HBd',
'only_matching': True,
}, {
'url': 'https://peertube.tamanoir.foucry.net/videos/watch/0b04f13d-1e18-4f1d-814e-4979aa7c9c44',
'only_matching': True,
@@ -91,14 +267,9 @@ class PeerTubeSHIE(SelfhostedInfoExtractor):
return ['peertube:%s:%s' % (mobj.group('host'), mobj.group('video_id'))
for mobj in entries]
def _get_subtitles(self, host, video_id):
captions = self._call_api(
host, 'videos', video_id, 'captions', note='Downloading captions JSON',
fatal=False)
if not isinstance(captions, dict):
return
@@ -117,101 +288,188 @@ class PeerTubeSHIE(SelfhostedInfoExtractor):
return subtitles
def _selfhosted_extract(self, url, webpage=None):
host, video_id = self._match_id_and_host(url)
self._login()
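# when logged in on a different instance, resolve the video there so the
# (possibly authenticated) local copy of the remote video is used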
if self._LOGIN_INFO and self._LOGIN_INFO['instance'] != host:
video_search = self._call_api(
self._LOGIN_INFO['instance'], 'search', 'videos', '?' + urlencode({
'search': f'https://{host}/videos/watch/{video_id}',
}), note='Searching for remote video')
if not video_search.get('data'):
raise ExtractorError('Remote video not found')
host, video_id = self._LOGIN_INFO['instance'], video_search['data'][0]['uuid']
video = self._call_api(
host, 'videos', video_id, '', note='Downloading video JSON')
info_dict = self._parse_video(video, url)
info_dict['subtitles'] = self.extract_subtitles(host, video_id)
description = None
if webpage:
description = self._og_search_description(webpage, default=None)
if not description:
full_description = self._call_api(
host, 'videos', video_id, 'description', note='Downloading description JSON',
fatal=False)
if isinstance(full_description, dict):
description = str_or_none(full_description.get('description'))
if not description:
description = video.get('description')
info_dict['description'] = description
return info_dict
class PeerTubePlaylistSHIE(PeerTubeBaseExtractor):
_VALID_URL = r'peertube:playlist:(?P<host>[^:]+):(?P<id>.+)'
_SH_VALID_URL = r'https?://(?P<host>[^/]+)/(?:videos/(?:watch|embed)/playlist|api/v\d/video-playlists|w/p)/(?P<id>%s)' % (PeerTubeBaseExtractor._UUID_RE)
_TESTS = [{
'url': 'https://video.internet-czas-dzialac.pl/videos/watch/playlist/3c81b894-acde-4539-91a2-1748b208c14c?playlistPosition=1',
'info_dict': {
'id': '3c81b894-acde-4539-91a2-1748b208c14c',
'title': 'Podcast Internet. Czas Działać!',
'uploader_id': 3,
'uploader': 'Internet. Czas działać!',
},
'playlist_mincount': 14,
}, {
'url': 'https://peertube2.cpy.re/w/p/hrAdcvjkMMkHJ28upnoN21',
'only_matching': True,
}]
def _selfhosted_extract(self, url, webpage=None):
host, display_id = self._match_id_and_host(url)
self._login()
playlist_data = self._call_api(host, 'video-playlists', display_id, '', 'Downloading playlist metadata')
entries = []
i = 0
videos = {'total': 0}
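# the API pages results 25 at a time; 'total' arrives with the first
# response, and i == 0 forces that first request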
while len(entries) < videos['total'] or i == 0:
videos = self._call_api(host, 'video-playlists', display_id,
'videos?start=%d&count=25' % (i * 25),
note=('Downloading playlist video list (page #%d)' % i))
i += 1
for video in videos['data']:
entries.append(self._parse_video(video['video'], url))
return {
'_type': 'playlist',
'entries': entries,
'id': playlist_data['uuid'],
'title': playlist_data['displayName'],
'description': playlist_data.get('description'),
'channel': playlist_data['videoChannel']['displayName'],
'channel_id': playlist_data['videoChannel']['id'],
'channel_url': playlist_data['videoChannel']['url'],
'uploader': playlist_data['ownerAccount']['displayName'],
'uploader_id': playlist_data['ownerAccount']['id'],
'uploader_url': playlist_data['ownerAccount']['url'],
}
class PeerTubeChannelSHIE(PeerTubeBaseExtractor):
_VALID_URL = r'peertube:channel:(?P<host>[^:]+):(?P<id>.+)'
_SH_VALID_URL = r'https?://(?P<host>[^/]+)/(?:(?:api/v\d/)?video-channels|c)/(?P<id>[^/?#]+)(?:/videos)?'
_TESTS = [{
'url': 'https://video.internet-czas-dzialac.pl/video-channels/internet_czas_dzialac/videos',
'info_dict': {
'id': '2',
'title': 'Internet. Czas działać!',
'description': 'md5:ac35d70f6625b04b189e0b4b76e62e17',
'uploader_id': 3,
'uploader': 'Internet. Czas działać!',
},
'playlist_mincount': 14,
}, {
'url': 'https://video.internet-czas-dzialac.pl/c/internet_czas_dzialac',
'only_matching': True,
}]
def _selfhosted_extract(self, url, webpage=None):
host, display_id = self._match_id_and_host(url)
self._login()
channel_data = self._call_api(host, 'video-channels', display_id, '', 'Downloading channel metadata')
entries = []
i = 0
videos = {'total': 0}
while len(entries) < videos['total'] or i == 0:
videos = self._call_api(host, 'video-channels', display_id,
'videos?start=%d&count=25&sort=publishedAt' % (i * 25),
note=('Downloading channel video list (page #%d)' % i))
i += 1
for video in videos['data']:
entries.append(self._parse_video(video, url))
return {
'_type': 'playlist',
'entries': entries,
'id': str(channel_data['id']),
'title': channel_data['displayName'],
'display_id': channel_data['name'],
'description': channel_data.get('description'),
'channel': channel_data['displayName'],
'channel_id': channel_data['id'],
'channel_url': channel_data['url'],
'uploader': channel_data['ownerAccount']['displayName'],
'uploader_id': channel_data['ownerAccount']['id'],
'uploader_url': channel_data['ownerAccount']['url'],
}
class PeerTubeAccountSHIE(PeerTubeBaseExtractor):
_VALID_URL = r'peertube:account:(?P<host>[^:]+):(?P<id>.+)'
_SH_VALID_URL = r'https?://(?P<host>[^/]+)/(?:(?:api/v\d/)?accounts|a)/(?P<id>[^/?#]+)(?:/video(?:s|-channels))?'
_TESTS = [{
'url': 'https://video.internet-czas-dzialac.pl/accounts/icd/video-channels',
'info_dict': {
'id': '3',
'description': 'md5:ac35d70f6625b04b189e0b4b76e62e17',
'uploader': 'Internet. Czas działać!',
'title': 'Internet. Czas działać!',
'uploader_id': 3,
},
'playlist_mincount': 14,
}, {
'url': 'https://video.internet-czas-dzialac.pl/a/icd',
'only_matching': True,
}]
def _selfhosted_extract(self, url, webpage=None):
host, display_id = self._match_id_and_host(url)
self._login()
account_data = self._call_api(host, 'accounts', display_id, '', 'Downloading account metadata')
entries = []
i = 0
videos = {'total': 0}
while len(entries) < videos['total'] or i == 0:
videos = self._call_api(host, 'accounts', display_id,
'videos?start=%d&count=25&sort=publishedAt' % (i * 25),
note=('Downloading account video list (page #%d)' % i))
i += 1
for video in videos['data']:
entries.append(self._parse_video(video, url))
return {
'_type': 'playlist',
'entries': entries,
'id': str(account_data['id']),
'title': account_data['displayName'],
'display_id': account_data['name'],
'description': account_data.get('description'),
'uploader': account_data['displayName'],
'uploader_id': account_data['id'],
'uploader_url': account_data['url'],
}

View file

@@ -1,45 +1,133 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .youtube import YoutubeIE
from .zdf import ZDFBaseIE
from ..compat import compat_str
from ..utils import (
int_or_none,
merge_dicts,
try_get,
unified_timestamp,
urljoin,
)
class PhoenixIE(ZDFBaseIE):
IE_NAME = 'phoenix.de'
_VALID_URL = r'https?://(?:www\.)?phoenix\.de/(?:[^/]+/)*[^/?#&]*-a-(?P<id>\d+)\.html'
_TESTS = [{
# Same as https://www.zdf.de/politik/phoenix-sendungen/wohin-fuehrt-der-protest-in-der-pandemie-100.html
'url': 'https://www.phoenix.de/sendungen/ereignisse/corona-nachgehakt/wohin-fuehrt-der-protest-in-der-pandemie-a-2050630.html',
'md5': '34ec321e7eb34231fd88616c65c92db0',
'info_dict': {
'id': '210222_phx_nachgehakt_corona_protest',
'ext': 'mp4',
'title': 'Wohin führt der Protest in der Pandemie?',
'description': 'md5:7d643fe7f565e53a24aac036b2122fbd',
'duration': 1691,
'timestamp': 1613902500,
'upload_date': '20210221',
'uploader': 'Phoenix',
'series': 'corona nachgehakt',
'episode': 'Wohin führt der Protest in der Pandemie?',
},
}, {
# Youtube embed
'url': 'https://www.phoenix.de/sendungen/gespraeche/phoenix-streitgut-brennglas-corona-a-1965505.html',
'info_dict': {
'id': 'hMQtqFYjomk',
'ext': 'mp4',
'title': 'phoenix streitgut: Brennglas Corona - Wie gerecht ist unsere Gesellschaft?',
'description': 'md5:ac7a02e2eb3cb17600bc372e4ab28fdd',
'duration': 3509,
'upload_date': '20201219',
'uploader': 'phoenix',
'uploader_id': 'phoenix',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.phoenix.de/entwicklungen-in-russland-a-2044720.html',
'only_matching': True,
}, {
# no media
'url': 'https://www.phoenix.de/sendungen/dokumentationen/mit-dem-jumbo-durch-die-nacht-a-89625.html',
'only_matching': True,
}, {
# Same as https://www.zdf.de/politik/phoenix-sendungen/die-gesten-der-maechtigen-100.html
'url': 'https://www.phoenix.de/sendungen/dokumentationen/gesten-der-maechtigen-i-a-89468.html?ref=suche',
'only_matching': True,
}]
def _real_extract(self, url):
article_id = self._match_id(url)
article = self._download_json(
'https://www.phoenix.de/response/id/%s' % article_id, article_id,
'Downloading article JSON')
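# the article JSON is a list of content blocks ("absaetze"); the first
# block carries the video payload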
video = article['absaetze'][0]
title = video.get('titel') or article.get('subtitel')
if video.get('typ') == 'video-youtube':
video_id = video['id']
return self.url_result(
video_id, ie=YoutubeIE.ie_key(), video_id=video_id,
video_title=title)
video_id = compat_str(video.get('basename') or video.get('content'))
details = self._download_json(
'https://www.phoenix.de/php/mediaplayer/data/beitrags_details.php',
video_id, 'Downloading details JSON', query={
'ak': 'web',
'ptmd': 'true',
'id': video_id,
'profile': 'player2',
})
title = title or details['title']
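# the PTMD asset id is taken from the Nielsen tracking metadata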
content_id = details['tracking']['nielsen']['content']['assetid']
info = self._extract_ptmd(
'https://tmd.phoenix.de/tmd/2/ngplayer_2_3/vod/ptmd/phoenix/%s' % content_id,
content_id, None, url)
duration = int_or_none(try_get(
details, lambda x: x['tracking']['nielsen']['content']['length']))
timestamp = unified_timestamp(details.get('editorialDate'))
series = try_get(
details, lambda x: x['tracking']['nielsen']['content']['program'],
compat_str)
episode = title if details.get('contentType') == 'episode' else None
thumbnails = []
teaser_images = try_get(details, lambda x: x['teaserImageRef']['layouts'], dict) or {}
for thumbnail_key, thumbnail_url in teaser_images.items():
thumbnail_url = urljoin(url, thumbnail_url)
if not thumbnail_url:
continue
thumbnail = {
'url': thumbnail_url,
}
m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
if m:
thumbnail['width'] = int(m.group(1))
thumbnail['height'] = int(m.group(2))
thumbnails.append(thumbnail)
return merge_dicts(info, {
'id': content_id,
'title': title,
'description': details.get('leadParagraph'),
'duration': duration,
'thumbnails': thumbnails,
'timestamp': timestamp,
'uploader': details.get('tvService'),
'series': series,
'episode': episode,
})

View file

@@ -1,22 +1,15 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
js_to_json,
)
class PicartoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)'
_TEST = {
'url': 'https://picarto.tv/Setz',
'info_dict': {
@@ -34,65 +27,46 @@ class PicartoIE(InfoExtractor):
return False if PicartoVodIE.suitable(url) else super(PicartoIE, cls).suitable(url)
def _real_extract(self, url):
channel_id = self._match_id(url)
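# a single GraphQL query returns both the channel metadata and the
# load-balancer endpoint for the stream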
data = self._download_json(
'https://ptvintern.picarto.tv/ptvapi', channel_id, query={
'query': '''{
channel(name: "%s") {
adult
id
online
stream_name
title
}
getLoadBalancerUrl(channel_name: "%s") {
url
}
}''' % (channel_id, channel_id),
})['data']
metadata = data['channel']
if metadata.get('online') == 0:
raise ExtractorError('Stream is offline', expected=True)
title = metadata['title']
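# the load balancer serves the per-stream source list as JSON under a .js path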
cdn_data = self._download_json(
data['getLoadBalancerUrl']['url'] + '/stream/json_' + metadata['stream_name'] + '.js',
channel_id, 'Downloading load balancing info')
formats = []
for source in (cdn_data.get('source') or []):
source_url = source.get('url')
if not source_url:
continue
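# each source entry is either an HLS master playlist or a direct MP4 URL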
source_type = source.get('type')
if source_type == 'html5/application/vnd.apple.mpegurl':
formats.extend(self._extract_m3u8_formats(
source_url, channel_id, 'mp4', m3u8_id='hls', fatal=False))
elif source_type == 'html5/video/mp4':
formats.append({
'url': source_url,
})
self._sort_formats(formats)
mature = metadata.get('adult')
@@ -103,10 +77,10 @@ class PicartoIE(InfoExtractor):
return {
'id': channel_id,
'title': self._live_title(title.strip()),
'is_live': True,
'channel': channel_id,
'channel_id': metadata.get('id'),
'channel_url': 'https://picarto.tv/%s' % channel_id,
'age_limit': age_limit,
'formats': formats,

View file

@@ -31,6 +31,7 @@ class PinterestBaseIE(InfoExtractor):
title = (data.get('title') or data.get('grid_title') or video_id).strip()
urls = []
formats = []
duration = None
if extract_formats:
@@ -38,8 +39,9 @@ class PinterestBaseIE(InfoExtractor):
if not isinstance(format_dict, dict):
continue
format_url = url_or_none(format_dict.get('url'))
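# the same URL may appear under several format keys; extract each only once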
if not format_url or format_url in urls:
continue
urls.append(format_url)
duration = float_or_none(format_dict.get('duration'), scale=1000)
ext = determine_ext(format_url)
if 'hls' in format_id.lower() or ext == 'm3u8':

Some files were not shown because too many files have changed in this diff.