
tvgids.tv double entries

doglover
Joined: 11 years
Last seen: 3 years

The attached tvgids.tv siteini seems to work.
However, I still have a problem. On the website http://www.tvgids.tv/zenders/rtl-4/1 the first show is a duplicate of the last show of the day before.
WebGrab seems to handle this correctly. However, I want to make sure it really does work correctly (I had problems before on other websites with duplicate shows). In the past I solved this in the showsplit with an exclude statement.
In this case, however, I cannot get that to work properly.
 
Can somebody help please?
 
Willy

doglover

Solved it myself now.
See attached file
Willy

Paul
Donator
Joined: 12 years
Last seen: 10 months

Thanks for sharing Willy!

doglover

Downloading around 1h00 at night still gives problems (wrong day indication).
Downloading after 06h00 and before 24h00 is OK.
I'll keep looking.
 
Willy

WGMaker
WG++ Team member, Donator
Joined: 12 years
Last seen: 59 min

Willy,
 
I use tvgids.tv daily as my main siteini for most of the Dutch channels.
(I use the 'released' version http://www.webgrabplus.com/sites/default/files/download/ini/info/SiteIni.Pack/Netherlands/tvgids.tv.ini
which is slightly different from the one you use. However, I don't think it differs much in the area where you have the problems.)
 
I don't have the problems you describe, but maybe that is because I only use it in the daytime.
I would very much like to help you: can you do a run (one that has the problem) with index_showsplit in debug mode and send me the log file? (Try to limit the data in the log file by reducing the timespan and grabbing just one channel.)
 
Besides that, some suggestions:
Have you tried this?
index_showsplit.modify {cleanup(removeduplicates)}
It is described, including the optional extra arguments, in 4.6.4.6.1 of the manual. If the 'duplicates' are not fully equal you can play with the matchingfactor.
 
As a last resort you can use the link argument. It is a bit difficult to explain, but I will try: with link you can remove duplicates from two 'linked' multi-value elements.
 
Suppose you have the multi-value elements el1 and el2.
 
The values of el1 are a b c c d e (so two duplicates, at index 2 and 3)
The values of el2 are aa bb cc cx dd ee (no duplicates)
 
If you do this:
el1.modify {cleanup(removeduplicates link="el2")}
it will remove the duplicates from el1 (result: a b c d e), but it will also remove the values in el2 that have the same index as the duplicates in el1 (result: aa bb cc dd ee).
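 
The effect of the link argument can be mimicked outside WG++ with a few lines of Python. This is only a sketch of the behaviour described above, not WG++ code: the function name remove_duplicates_linked is made up for the illustration, and it assumes that the first occurrence of a value is the one that is kept (which matches the result given above, where cc stays and cx goes).

```python
def remove_duplicates_linked(el1, el2):
    """Drop repeated values from el1 and the values at the same index in el2.

    Assumption: the first occurrence of each el1 value is kept.
    """
    seen = set()
    kept1, kept2 = [], []
    for value1, value2 in zip(el1, el2):
        if value1 in seen:
            # Duplicate in el1: skip it together with its linked el2 value.
            continue
        seen.add(value1)
        kept1.append(value1)
        kept2.append(value2)
    return kept1, kept2


el1 = ["a", "b", "c", "c", "d", "e"]
el2 = ["aa", "bb", "cc", "cx", "dd", "ee"]
new_el1, new_el2 = remove_duplicates_linked(el1, el2)
print(new_el1)  # ['a', 'b', 'c', 'd', 'e']
print(new_el2)  # ['aa', 'bb', 'cc', 'dd', 'ee']
```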
 
You can use this as a trick in index_showsplit. Suppose el2 is your index_showsplit and you want to remove cx because it is a duplicate of cc but not completely equal. If you create el1 from el2 with el1.modify {substring(type=char)|'el2' 0 1} (this extracts the first character of each el2 value and puts it in el1, as you know), you get el1 as in the example above. After that you can remove the duplicate from el1 with the link set to el2.
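 
Put together, the trick amounts to de-duplicating on a derived key instead of on the full value. A rough Python sketch of that idea follows; the name dedupe_by_key and the key_length parameter are hypothetical, and in a real siteini the key would be whatever part of the showsplit element identifies the show (a start time, for example) rather than simply the first character.

```python
def dedupe_by_key(values, key_length=1):
    """Remove near-duplicates by comparing only the first key_length characters."""
    # Step 1: derive the helper element (el1) from the original one (el2).
    keys = [value[:key_length] for value in values]
    # Step 2: drop every value whose key was already seen, keeping the first.
    seen = set()
    result = []
    for key, value in zip(keys, values):
        if key not in seen:
            seen.add(key)
            result.append(value)
    return result


showsplit = ["aa", "bb", "cc", "cx", "dd", "ee"]
print(dedupe_by_key(showsplit))  # ['aa', 'bb', 'cc', 'dd', 'ee']
```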
 
I have used that trick in http://www.webgrabplus.com/sites/default/files/download/ini/info/SiteIni.Pack/Malaysia/tm.com.my.ini for the same problem as you seem to have with tvgids.tv.
 
Hope this helps      Jan

doglover

Jan,
I have only just seen your reply. But last night I had my PC run a test (so before I had seen your reply) with something I dreamt up myself.
 
I had this showsplit defined:
index_showsplit.scrub {multi(exclude=first)|<li style='margin-bottom: 10px;'>|<div class='program btm_space'|</span>|<script type="text/javascript">}
 
and no date scrubbing.
The grab was run at 0h30.
This seems to work now, at least on the two channels I tried.
And since it works, I can also explain why it works. The website lists today's shows starting with the one that is running at that moment. At 0h00 this can be a show which started before midnight. If you leave this one in the grab, it could cause the grabber to conclude that the show starts just before the next midnight, and a wrong date change could be set, resulting in a schedule shifted by a day.
The tomorrow page again lists the last show of the previous day. The showsplit as defined would also delete this first show there, so the duplicate is removed.
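 
To make the day shift concrete, here is a small Python sketch. It is not how WG++ works internally (that is not described in this thread); it only assumes a naive grabber that gets start times without dates and moves to the next day whenever the clock time goes backwards. The function assign_dates, the grab date and the listing are all invented for the illustration.

```python
from datetime import date, datetime, time, timedelta


def assign_dates(start_times, first_day):
    """Attach dates to HH:MM start times.

    Assumption: the day advances whenever a start time is earlier than
    the previous one.
    """
    result = []
    current_day = first_day
    previous = None
    for text in start_times:
        hour, minute = map(int, text.split(":"))
        start = time(hour, minute)
        if previous is not None and start < previous:
            current_day += timedelta(days=1)
        result.append(datetime.combine(current_day, start))
        previous = start
    return result


today = date(2015, 3, 10)  # hypothetical day of a grab run at 0h30
listing = ["23:30", "00:15", "01:00"]  # first show started before midnight

with_carry_over = assign_dates(listing, today)
without_first = assign_dates(listing[1:], today)

print(with_carry_over[1])  # 2015-03-11 00:15:00, one day too late
print(without_first[0])    # 2015-03-10 00:15:00, the correct day
```

With the carried-over 23:30 show left in, everything after it lands on the next day; with that first show excluded, the dates come out right.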
 
One possible problem still exists. If the website is not updated properly (for instance because it has the wrong time set) and starts to list more shows prior to the one running, a problem could arise again, as the schedule will end up on the wrong date.
This is normally tackled by use of date.scrub. That is not possible here, as the only dates are mentioned in the shows. You cannot use the first show because that could be a show from yesterday. Using the second show to set the date does not work either, because it is possible that there is no second show. Using the last show does not work either because, in the case of only one show, this is also the first show, which started yesterday.
So no date.scrub.
 
Willy
PS:
The published version with:
index_showsplit.scrub {multi()|<div class='program btm_space'||</span>|<script type="text/javascript">}
index_date.scrub {single (force include=first)|<div class='program btm_space|pstart='|' pend}
index_date.modify {calculate(scope=datelogo format=date)}
works during the daytime, but not between 0h00 and ....
The first show has to be finished first.
 

WGMaker

Willy,
good that you found a solution! However, your explanation is not entirely correct. The exclude=first only removes the first element of the total result of index_showsplit, and not, as you thought, also the first element of the next day's showsplit. That is because the statement index_showsplit.scrub { etc } first collects all results from the two HTML pages added together, and only at the end is the exclude argument executed. That results in a showsplit result with only the very first element removed, while the duplicates around the transition to the next day are still there. But in this case WG++ takes care of those correctly through its automatic duplicate removal.
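 
In Python terms the order of operations looks roughly like the sketch below. It only illustrates the description above (pages concatenated first, exclude applied once at the end); the function showsplit and the show titles are made up, and it is not WG++ code.

```python
def showsplit(pages, exclude_first=False):
    """Concatenate the elements of all pages, then apply the exclude once."""
    combined = [show for page in pages for show in page]
    return combined[1:] if exclude_first else combined


today_page = ["23:30 Late film", "00:15 News", "22:45 Talk show"]
tomorrow_page = ["22:45 Talk show", "00:30 Night news"]

result = showsplit([today_page, tomorrow_page], exclude_first=True)
print(result)
# ['00:15 News', '22:45 Talk show', '22:45 Talk show', '00:30 Night news']
```

Only the carried-over first show of today's page is gone; the repeated "22:45 Talk show" at the page boundary is still present and has to be removed by some other step.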
 
For the problem you noticed (a first show in the index that can start on the previous day) there is also a special site-dependent setting in the ini, firstday, with the same result. Like this:
site {firstday=1}
See 4.3 of the manual. It works a little bit differently: it doesn't remove this first show (so you won't see it disappearing when you set index_showsplit in debug mode), but instead this first show will not be processed when scrubbing the index details.
 
Jan


Brought to you by Jan van Straaten

Program Development - Jan van Straaten ------- Web design - Francis De Paemeleere
Supported by: servercare.nl