Hello!
Some channels have stopped downloading EPG from the port.hu website.
1. WebGrab+Plus/w MDB & REX Postprocess -- version V3.2.3.0
2. Ubuntu 16.04.3 LTS
3. Mono JIT compiler version 6.12.0.122 (tarball Mon Feb 22 17:33:28 UTC 2021)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
Thanks in advance for any answers!
looks like they have some sort of flood protection running.
grabbing gets blocked when you grab too much data too fast.
try adding some delays to slow webgrab down.
add the retry= part to the site {xxx} line like below.
don't forget the |, that's used to separate the settings.
site {channelnameprefix=port|retry=<retry time-out="30" channel-delay="5" index-delay="5" show-delay="5">4</retry>}
you can also try lowering the delays. i used the above and didn't get blocked.
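For anyone who prefers to generate that line rather than edit it by hand, here is a minimal Python sketch that builds the same retry= fragment and appends it with the | separator. The helper names are made up for illustration, and the input site line is a shortened stand-in (the real line in port.hu.ini has more settings); only the output format is taken from the line quoted above.

```python
def retry_fragment(timeout, channel_delay, index_delay, show_delay, attempts=4):
    """Build a retry= setting in the format shown in the post above."""
    return (
        f'retry=<retry time-out="{timeout}" channel-delay="{channel_delay}" '
        f'index-delay="{index_delay}" show-delay="{show_delay}">{attempts}</retry>'
    )


def add_to_site_line(site_line, fragment):
    """Append a setting before the closing brace, using | as the separator."""
    head, brace, tail = site_line.rpartition("}")
    if not brace:
        raise ValueError("site line has no closing brace")
    return f"{head}|{fragment}}}{tail}"


# Shortened, hypothetical site line; the real one carries more settings.
line = add_to_site_line("site {channelnameprefix=port}", retry_fragment(30, 5, 5, 5))
print(line)
```

Lowering the delays is then just a matter of changing the numbers passed to retry_fragment.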
thanks for the answer!
but I don’t understand where in the config this needs to be inserted. can you paste it into my config as an example so that I can understand?
it's not in the config, it's in the port.hu.ini
edit:
i made a typo, corrected.
I understand now, thank you. Looks like it's working.
fyi it looks like the details page (show-delay) is the main cause of getting blocked.
you could probably use an index-delay and channel-delay of 1, maybe even a show-delay of 1 as well.
if you get blocked i would increase the show-delay and try again.
the values in the post above slow webgrab down a lot, maybe more than necessary.
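To get a feel for how much the delays cost, here is a rough Python sketch. It assumes each delay is a pause in seconds applied once per channel, per index page and per details page; that is a simplification, not a description of WebGrab+Plus internals, and the page counts are invented for the example.

```python
def added_wait_seconds(channels, index_pages, detail_pages,
                       channel_delay, index_delay, show_delay):
    """Total pause time, assuming one pause per channel, index page and details page."""
    return (channels * channel_delay
            + index_pages * index_delay
            + detail_pages * show_delay)


# Invented workload: 10 channels, 70 index pages, 2000 details pages in total.
channels, index_pages, detail_pages = 10, 70, 2000

slow = added_wait_seconds(channels, index_pages, detail_pages, 5, 5, 5)
fast = added_wait_seconds(channels, index_pages, detail_pages, 1, 1, 1)
show_share = detail_pages * 5 / slow

print(f"delays of 5: {slow} s, delays of 1: {fast} s")
print(f"share of the wait caused by show-delay: {show_share:.0%}")
```

Because details pages far outnumber index pages and channels, show-delay dominates the total, so it is the value worth tuning first.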
thanks, I'll test it.
But there is a similar problem with the site musor.tv
Hi!
I can confirm the issues with musor.tv, and I can add a really strange one: while most channels are indexed/grabbed with the proper time/EIT (that is, with the correct hour), a couple of channels were grabbed with an incorrect date. The best example is Galaxy 4: I tried to grab it multiple times, but the result was the same (see attached picture). According to the grabbed data the program is supposed to be on tomorrow, but according to the website (musor.tv) it is actually on today at the given time.
I didn't modify the site ini, as the other channels/dates/times from the site are OK, and the backend software doesn't adjust the EPG time either.
I've never encountered an issue like this until now (not with other site inis).
Setup:
- WebGrab+Plus/w MDB & REX Postprocess -- version V5.1.3 beta
- Debian 11
- TvHeadend running under OSMC (Vero 4k+)
try this.
it's a version i have, different from the one in siteini.pack
Edit: Updated Feb 11 2024
* @Revision 17 - [24/12/2023] Blackbear199
* details title fix for random index_urlshow failures
* change start time to UTC
Hi!
Thx, I'll try it @ the next run and will report back!
UPDATE: For some reason (using your files) ALL of the channels just gave back 'no shows on indexpage', BUT after copying back the original (siteini pack) files, it started grabbing properly again (except for the original issue).
works fine for me with V5.1.3 and linux.
edit the original ini and on the site {xxx} line change firstshow=now to firstshow=1
that's what is causing the data to be shifted by one day.
i checked a bunch of channels and the firstshow=x setting is probably not even needed, but it's safer to have it as the site uses a start time only, with no date part.
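To see why a listing with start times but no dates is fragile, here is a small Python illustration. It is not WebGrab+Plus's internal logic and it does not model what firstshow does; it only shows that when dates have to be inferred from clock times, the day assumed for the first entry decides the date of everything after it. The sample times are invented.

```python
from datetime import date, datetime, time, timedelta


def assign_dates(start_times, first_day):
    """Attach dates to time-only entries, moving to the next day whenever the clock goes backwards."""
    result = []
    day = first_day
    previous = None
    for t in start_times:
        if previous is not None and t < previous:
            day += timedelta(days=1)
        result.append(datetime.combine(day, t))
        previous = t
    return result


# Invented index page: the first entry is a late show left over from the previous evening.
listing = [time(23, 30), time(0, 45), time(6, 0), time(20, 0)]
guide_day = date(2023, 11, 11)

# Assuming the first entry belongs to the guide day pushes the rest onto the next day.
shifted = assign_dates(listing, guide_day)
# Assuming it belongs to the previous evening keeps the rest on the guide day.
correct = assign_dates(listing, guide_day - timedelta(days=1))

for wrong, right in zip(shifted, correct):
    print(wrong, "vs", right)
```

Every entry in the first result is exactly one day later than in the second, which is the kind of one-day shift described above.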
Thx for the info. I'll check it later, as at the moment I'm facing an issue with my donation (meaning my 'donator' status has just disappeared).
i just checked and you're good till 2025, so not sure what happened. i sent a msg to jan to fix it.
Hello!
Grabbing does not work for me with your new config on musor.tv
WG version 5.1.3
Alpine Linux v3.18
works fine for me on windows and linux (ubuntu).
upload your wg config (remove license info).
added in previous post
no idea why it's not working.
pinchin had the same problem.
did u try changing the firstshow= setting in the original ini as i said above?
mtv europe has the wrong site_id="xx", it should be site_id="MTV_EURO"
that's why it's giving the 404 error, other than that the rest looks fine.
looking at your config i noticed you're using a different user agent than i do.
try the one i use:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36 Edg/117.0.2045.47
added firstshow=1, but it still doesn't work
I've noticed 2 things in your config file which are a bit off:
1. You're not using any delays (channel, show etc.) when scraping; using them is recommended (min. 2 sec for channel/show etc.). Also increase the timeout to 20. Grabbing too fast on musor.tv can lead to an IP ban.
2. You're not using the suggested user agent (as written in musor.tv.ini).
Edit: Blackbear199 was faster than I type on phone :P
after changing the user-agent it worked
Thanks Blackbear199
To PinChin
Can you write me an example of what to insert into the musor.ini config?
If you meant what to put in the 'musor.tv.ini' file: I wouldn't really touch that, but if you open it, you'll see in the 'Remarks' section of the header what is advised to change.
If your WebGrab++ config file is mainly for these couple of channels and you don't want to grab from another site as well, then I'd change this line (in the WebGrab++ config file):
time-out="10"
to this:
time-out="20" channel-delay="3" index-delay="3" show-delay="3"
With this you can (likely) avoid the IP ban.
added these settings
If you're changing the ini file, then also change the channel-delay and index-delay to 3.
BUT (as you might have realised by now) if you change the ini file, those settings will apply ONLY to the site the ini file belongs to, and during grabbing they override the corresponding settings in the WebGrab++ config file.
glad you guys were able to figure it out.
changing the user agent to the mobile one mentioned in the remarks of the ini in siteini.pack is not really the ideal way.
it's a mobile device user agent.
the user agent set in the webgrab config is a global setting, meaning it's used for all site inis.
it may fix musor.tv but break other inis you use.
the more correct way to do this is to use channel grouping.
the documented configuration files on the downloads page explain this (it's in the webgrab config one).
basically you wrap your musor <channel lines inside <channels>xxxx</channels> tags in your webgrab config.
doing this lets you set specific settings for specific channels, like the user agent, retry and many other settings.
i edited my post above with a txt file with an example and a few small fixes (see the revision comments).
why does this happen? nothing changes.
the same data is overwritten by the same data
( 8/178 ) MUSOR.TV -- chan. (xmltv_id=MTV 00s) -- mode Incremental
iiic
epg correction :
CHANGED show corrected,
show with ---- start = 11/11/2023 14:00:00 stop = 11/11/2023 15:00:00 title = Crazy In Love!
Replaces ----- start = 11/11/2023 14:00:00 stop = 11/11/2023 15:00:00 title = Crazy In Love!
c
epg correction :
CHANGED show corrected,
show with ---- start = 11/11/2023 15:00:00 stop = 11/11/2023 19:00:00 title = Non-Stop Y2Ks!
Replaces ----- start = 11/11/2023 15:00:00 stop = 11/11/2023 19:00:00 title = Non-Stop Y2Ks!
c
epg correction :
CHANGED show corrected,
show with ---- start = 11/11/2023 19:00:00 stop = 11/11/2023 22:00:00 title = 40 Worldwide Hits From The Boys!
Replaces ----- start = 11/11/2023 19:00:00 stop = 11/11/2023 22:00:00 title = 40 Worldwide Hits From The Boys!
c
epg correction :
CHANGED show corrected,
show with ---- start = 11/11/2023 22:00:00 stop = 12/11/2023 03:00:00 title = Get The Party Started!
Replaces ----- start = 11/11/2023 22:00:00 stop = 12/11/2023 03:00:00 title = Get The Party Started!
c
epg correction :
CHANGED show corrected,
show with ---- start = 12/11/2023 03:00:00 stop = 12/11/2023 04:00:00 title = Dancefloor Fillers!
Replaces ----- start = 12/11/2023 03:00:00 stop = 12/11/2023 04:00:00 title = Dancefloor Fillers!
c
epg correction :
CHANGED show corrected,
show with ---- start = 12/11/2023 04:00:00 stop = 12/11/2023 09:00:00 title = Non-Stop Y2Ks!
Replaces ----- start = 12/11/2023 04:00:00 stop = 12/11/2023 09:00:00 title = Non-Stop Y2Ks!
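The pairs in a log like this can be compared mechanically. Below is a minimal Python sketch that reads the "show with" / "Replaces" lines and reports whether the start, stop and title printed for each pair are identical. The regular expression is written against the lines pasted above, not against any documented log format, and it can only compare the three fields the log prints; whether some other field differed cannot be seen from these lines.

```python
import re

# Matches both the "show with ----" and "Replaces -----" lines from the log excerpt.
ENTRY = re.compile(
    r"^(?:show with -+|Replaces -+)\s+start = (.+?) stop = (.+?) title = (.*)$"
)


def correction_pairs(log_text):
    """Yield (new, old) tuples of (start, stop, title) for each correction entry."""
    new = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = ENTRY.match(line)
        if not match:
            continue
        if line.startswith("show with"):
            new = match.groups()
        elif new is not None:
            yield new, match.groups()
            new = None


sample = """\
epg correction :
CHANGED show corrected,
show with ---- start = 11/11/2023 14:00:00 stop = 11/11/2023 15:00:00 title = Crazy In Love!
Replaces ----- start = 11/11/2023 14:00:00 stop = 11/11/2023 15:00:00 title = Crazy In Love!
"""

pairs = list(correction_pairs(sample))
identical = [new == old for new, old in pairs]
print(f"{sum(identical)} of {len(pairs)} corrections show identical start/stop/title")
```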
I think it's because your grabbing is in incremental mode, which mainly just corrects the already grabbed times
could be a number of reasons.
corrupt data from previous broken ini.
i would run a grab once using
<update>f</update>
then remove the f
<update></update>
update will now run in incremental mode (the default update mode, taken from the <channel update="x" setting of each channel).
you should see all .... meaning no changes or corrections
update requested for - 1 - out of - 1 - channels for 1 day(s)
( 1/1 ) MUSOR.TV -- chan. (xmltv_id=AXN (HD)) -- mode Incremental
i.............
Summary for update of AXN (HD)
no changes, no update necessary !
unchanged shows inspected 13
total after update 13
Hi all,
I am using the latest 5.1.3 version of wg++ on Ubuntu 18.04. I don't have any issues with the port.hu portal, only with musor.hu.
I didn't want to open a new topic for my issue, because I saw that you commented here about musor.hu.
I have a 403 error, and I tried everything that you suggested above, but nothing helped.
I changed the user agent and updated the ini from Revision 13 to Revision 16.
The error message I am getting is:
Job started at 11/02/2024 11:05:18
Checking License ..
For License request/update data, see WGLicense.log.txt
found: /home/hts/.hts/tvheadend/.wg++/./siteini.pack/Hungary/musor.tv.ini -- Revision 16
encrypted in 'new (V3)' mode
timezone=UTC+00:00 mapped with timezone_id "Atlantic/Canary"
found: /home/hts/.hts/tvheadend/.wg++/./siteini.pack/Misc/dummy.ini -- Revision 02
processing /home/hts/.hts/tvheadend/.wg++/guide.xml ...
Found existing channel (xmltv_id=Max 4) in the config file
Found existing channel (xmltv_id=Example) in the config file
....
i=index .=same c=change g=gab r=replace n=new
Group (0) :
update requested for - 2 - out of - 2 - channels for 1 day(s)
( 1/2 ) MUSOR.TV -- chan. (xmltv_id=Max 4) -- mode Force
i
error downloading page: Response status code does not indicate success: 403 (Forbidden).
Unable to update channel Max 4
Generic syntax exception:
message:
no index page data received from Max 4
unable to update channel, try again later
Existing guide data restored!
( 1/2 ) DUMMY -- chan. (xmltv_id=Example) -- mode Force
in
Summary for update of Example
missing shows added 0
changed shows updated 0
new shows added 1
unchanged shows inspected 0
total after update 1
Job finished at 11/02/2024 11:05:19 done in 1s
What am I doing wrong?
My WebGrab++.config.xml is below:
<?xml version="1.0"?>
guide.xml
rex
Mozilla/5.0 (Android 13; Mobile; rv:68.0) Gecko/68.0 Firefox/114.0
decrypt_userkey
To force a license update; replace this text with the letter f
on
4
0
f
Max 4
Example
Thank you in advance.
403 forbidden can be caused by a number of things.
first, redownload the file in post 11 (revision 17).
second, i would upgrade to V5.1.4.2
https://github.com/SilentButeo2/webgrabplus-siteinipack/blob/master/eval...
third..
please don't paste log data,
upload the entire file.
the same goes for your webgrab config, upload the entire file (remove username/password info).
everyone pastes it (i don't know why) but as u can see the forum messes up the tags.
To answer my own question: I got BANNED.
I connected over WireGuard to my host and tried to load the page in Firefox on a Windows PC, and I got:
"Forbidden
You don't have permission to access this resource."
Does anyone know how long the BAN lasts?
Is there any other way to get around the BAN?
wait 15 min and try again.
edit the ini and on the site {xxx} line increase the delays in the retry= part.
show-delay is the one that would cause it the most.
next would be index-delay
lastly channel-delay.
the current settings work fine for me.
i just tried a channel and grabbed 14 days and didn't get banned.
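After waiting and raising the delays, it helps to check the next log for 403 responses before running a full grab. Here is a minimal Python sketch that lists the channels whose update hit a 403. The patterns are written against the log lines pasted earlier in this thread and are an assumption about the format, not a documented interface.

```python
import re

# Channel header as it appears in the pasted log, e.g.
# "( 1/2 ) MUSOR.TV -- chan. (xmltv_id=Max 4) -- mode Force"
CHANNEL = re.compile(
    r"^\(\s*\d+/\d+\s*\)\s+(\S+) -- chan\. \(xmltv_id=(.+?)\) -- mode (\w+)"
)
FORBIDDEN = "403 (Forbidden)"


def channels_with_403(log_text):
    """Return the xmltv_id of every channel whose section contains a 403 error."""
    blocked = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = CHANNEL.match(line)
        if match:
            current = match.group(2)
        elif FORBIDDEN in line and current is not None and current not in blocked:
            blocked.append(current)
    return blocked


sample = """\
( 1/2 ) MUSOR.TV -- chan. (xmltv_id=Max 4) -- mode Force
i
error downloading page: Response status code does not indicate success: 403 (Forbidden).
Unable to update channel Max 4
( 1/2 ) DUMMY -- chan. (xmltv_id=Example) -- mode Force
in
"""

blocked = channels_with_403(sample)
print(blocked)
```

An empty list after a one-channel, one-day test grab suggests the ban has lifted; if the channel still shows up, wait longer before trying again.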
Hi,
Thank you very much for your quick support and for providing a link to the new version of the software as well as the site ini.
I tried after 20 minutes and I am still BANNED, so I will try again after a few hours.
I will apply your recommended settings.