Get Channel Logo - horizon.tv.ch
Hello, I noticed that the horizon.tv.ch.ini siteini is not grabbing the channel icons.
I spent the last hour trying the following entry, but unfortunately it's not working. Maybe you know what I am doing wrong here?
Page I tried to grab:
https://www.horizon.tv/de_ch/tv-schauen/live-channel.html/256616999367/1...
My Ini Entry:
index_urlchannellogo.scrub { url () ||<div class="box-art logo boxart-small">| src="|"|alt=|</div>}
The html part of the homepage I want to grab:
<div class="description">
<div class="box-art logo boxart-small">
<img src="https://wp9-images-ch-dynamic.horizon.tv/channellogos/02/itv_3.png?w=75&... alt="ITV 3">
</div>
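The idea behind the entry above (look after a head string, then take the text between a start and an end separator) can be illustrated outside WebGrab+Plus. This is only a Python sketch of that idea under assumptions, not how wg++ implements its scrub: the helper name `between` is hypothetical, and the sample fragment uses the logo URL without its query string because the query is truncated in the post.

```python
def between(text, head, start, end):
    """Return the text between `start` and `end`, searching only after `head`.

    Returns None if any of the three separators cannot be found.
    """
    pos = text.find(head)
    if pos == -1:
        return None
    begin = text.find(start, pos + len(head))
    if begin == -1:
        return None
    begin += len(start)
    stop = text.find(end, begin)
    if stop == -1:
        return None
    return text[begin:stop]


# Sample fragment modelled on the page excerpt (query string left out).
html = """<div class="description">
<div class="box-art logo boxart-small">
<img src="https://wp9-images-ch-dynamic.horizon.tv/channellogos/02/itv_3.png" alt="ITV 3">
</div>"""

logo = between(html, '<div class="box-art logo boxart-small">', 'src="', '"')
print(logo)
```

If the head string never occurs in the downloaded page, the helper returns None. That is the situation to rule out first when a scrub finds nothing.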
Once this is working, it could be uploaded to the siteini list as well :)
Hello, thanks for your hints. Unfortunately it's not working; the output is the following:
<channel id="ITV 3 UK">
<display-name lang="de">ITV 3 UK</display-name>
<url>http://www.horizon.tv</url>
</channel>
The ITV 3 logo that is part of the HTML document is missing.
Thanks for your help and patience. I have added the debug attribute to url and got the following output.
[ Info ] channel (xmltv_id=ITV 3 UK) site -- HORIZON.TV.CH -- mode full
[ Debug ] Debugging information SiteIni
[ Debug ] Element: INDEX_URLCHANNELLOGO
[ Debug ] html source written to : C:\ProgramData\ServerCare\WebGrab\html.source.htm
[ Debug ] scrub strings:
[ Debug ] type & arguments : url(debug)
[ Debug ] headstring : <div class="box-art logo boxart-small">
[ Debug ] blockstart (bs): src="
[ Debug ] elementstart (es): ">
[ Debug ] elementend (ee): </div>
[ Debug ] blockend (be): <h2
[ Debug ]
[ Debug ] No Block with these separators
And here is the source output for one ID. The URL of the itv_3 logo is not in it.
{"id":"16807026","title":"KRIMI","scheme":"urn:tva:metadata:cs:UPCEventGenreCS:2009"}
],"isAdult":false,"cast":["Kevin Whately","Laurence Fox","Clare Holman","Rebecca Front","Owen Teale","Tom Harper","Gina McKee"],"directors":["Sarah Harding"],"images":[{"assetType":"boxart-xlarge","assetTypes":["boxart-xlarge"],"width":210,"height":303,"url":"https://wp21-images-ch-dynamic.horizon.tv/linear_images/20731373532.p.jp..."}
,{"assetType":"boxart-small","assetTypes":["boxart-small"],"width":75,"height":108,"url":"https://wp21-images-ch-dynamic.horizon.tv/linear_images/20731373532.p.jp..."}
,{"assetType":"boxart-medium","assetTypes":["boxart-medium"],"width":110,"height":159,"url":"https://wp21-images-ch-dynamic.horizon.tv/linear_images/20731373532.p.jp..."}
,{"assetType":"boxart-large","assetTypes":["boxart-large"],"width":180,"height":260,"url":"https://wp21-images-ch-dynamic.horizon.tv/linear_images/20731373532.p.jp..."}
,{"assetType":"tva-boxcover","assetTypes":["tva-boxcover"],"width":180,"height":260,"url":"https://wp21-images-ch-dynamic.horizon.tv/linear_images/20731373532.p.jpg"}
],"mediaGroupId":"crid:~~2F~~2Feventis.nl~~2F00000000-0000-1000-0008-00000000899B","secondaryTitle":"Old School Ties","shortDescription":"When an ambitious Oxford student is found dead in her hotel room after inviting a reformed computer hacker to speak at the Union, Lewis and Hathaway investigate. IMDb rating: 7.6/10.","mediaType":"Episode","year":"2007","isReplayTv":false,"seriesEpisodeNumber":"2","seriesNumber":"1","videoStreams":[],"airDate":1167609600000,"entitlements":["VIP","_OPEN_"],"currentProductIds":[],"currentTvodProductIds":[]}
}
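A quick way to confirm that a show record like this carries no channel logo is to scan its `images` array for a URL containing `/channellogos/`. The following Python sketch assumes a trimmed sample that keeps only the one image entry whose URL is complete in the excerpt; the function name `logo_urls` is hypothetical.

```python
import json

# Trimmed sample: only the "tva-boxcover" entry, whose URL is not truncated.
sample = """{"images":[
 {"assetType":"tva-boxcover","assetTypes":["tva-boxcover"],"width":180,"height":260,
  "url":"https://wp21-images-ch-dynamic.horizon.tv/linear_images/20731373532.p.jpg"}
]}"""


def logo_urls(show):
    """Return every image URL in a show record that looks like a channel logo."""
    return [
        img["url"]
        for img in show.get("images", [])
        if "/channellogos/" in img.get("url", "")
    ]


show = json.loads(sample)
print(logo_urls(show))  # empty list: only programme artwork is present
```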
This seems to be harder than expected :) Still the same. I assume the URL I posted at the beginning is not correct, so I have uploaded the ini and config file with that channel as an example. Maybe that helps you a bit more.
Nice try. If I understand you correctly, you are trying to separate the URL out of this part, right?
Hi, thank you very much. I created an output of the separated URL to the image. I assume that ?w=110&h=150&mode=box needs to be removed, right?
[ Debug ] ----------begin--block----------
[ Debug ] content="https://wp9-images-ch-dynamic.horizon.tv/channellogos/02/itv_3.png?w=110..."/>
[ Debug ] ----------end----block----------
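Dropping the resize parameters from such a URL can be sketched in Python with the standard library. The thread does not confirm that the query string has to be removed, so treat this purely as an illustration. The input URL is assembled from the logo path and the query string quoted in the post, and `strip_query` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return `url` without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


url = ("https://wp9-images-ch-dynamic.horizon.tv/channellogos/02/itv_3.png"
       "?w=110&h=150&mode=box")
print(strip_query(url))
```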
Thanks a lot. Even with the proper URL I don't get the channel icon into the XML. The goal is not far away :)
There is something that's pretty confusing: when I change index_urlchannellogo to index_urlshow just for testing, I get exactly the channel icon, but it ends up as part of the movie description. index_urlchannellogo itself does not seem to work.
<channel id="ITV 3 UK">
<display-name lang="de">ITV 3 UK</display-name>
<url>http://www.horizon.tv.ch</url> --> Missing: <icon>Bla</icon>
</channel>
[ Debug ] Modify
[ Debug ] command & arguments : set(debug)
[ Debug ] Expression-1 : 'temp_4'
[ Debug ] Element value before operation:
[ Debug ] https://wp9-images-ch-dynamic.horizon.tv/channellogos/02/itv_3.png
[ Debug ] String composer result for Expression-1 :
[ Debug ] Expression-1 expanded : https://wp9-images-ch-dynamic.horizon.tv/channellogos/02/itv_3.png
[ Debug ] Element value after operation:
[ Debug ] https://wp9-images-ch-dynamic.horizon.tv/channellogos/02/itv_3.png
[ Debug ] skipped
Well, here is the explanation of why it confuses you, and I think I've got bad news: I don't think you will get the channel icon with the current siteini implementation.
The channel logo URL is only grabbed on the index page (see 4.6.1.1 in the docs), during the datelogo scope (before the showsplit!).
What you are trying to do is grab the channel logo from a show page.
If you add debug to the showsplit and check the html.source.htm file, you will see the data where wg++ can search for the channel logo. Currently I don't see any connection between that data and the channel logo. That's why I think the current siteini cannot support the channel logo URL.
Hi Francis, thank you very much for your explanation. At least we know a bit more now; this might be a new feature request :)
I made a temporary workaround. The channel logo URL is available in the .channels.xml generation part, so I added the info with the site_id value.
So now there is a new .channels.xml file that you must use with this siteini.
Yes, it could be a feature request, but we currently have many other things on our minds. We are rewriting part of the configuration handling in the code, and the new installer is also still on our list. But time .....
Thank you Francis
This is the output I get (see below), but I will try to extract the image from the XML you provided. I understand that there are many other things in your pipeline.
Just a side question: will there be some kind of parallelization when grabbing? I don't know which language WebGrab is written in, but in .NET it's fairly simple to implement such features.
<channel id="ITV 2 UK">
<display-name lang="de">ITV 2 UK</display-name>
<icon src="27909159384" />
<url>http://www.horizon.tv</url>
</channel>
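Whether a generated guide actually contains usable channel icons can be checked mechanically. The Python sketch below is an assumption-laden illustration, not part of WebGrab+Plus: `icon_problem` is a hypothetical helper, the sample wraps two channel entries in an XMLTV `<tv>` root, and the second entry shows what a working result might look like using the logo URL from the debug output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif")


def icon_problem(channel):
    """Return None if the channel has a plausible icon URL, else a short reason."""
    icon = channel.find("icon")
    if icon is None:
        return "no <icon> element"
    parts = urlsplit(icon.get("src", ""))
    if parts.scheme not in ("http", "https"):
        return "icon src is not an absolute http(s) URL"
    if not parts.path.lower().endswith(IMAGE_SUFFIXES):
        return "icon src does not point at an image file"
    return None


sample = """<tv>
<channel id="ITV 3 UK">
<display-name lang="de">ITV 3 UK</display-name>
<url>http://www.horizon.tv</url>
</channel>
<channel id="ITV 3 UK (hypothetical working result)">
<display-name lang="de">ITV 3 UK</display-name>
<icon src="https://wp9-images-ch-dynamic.horizon.tv/channellogos/02/itv_3.png" />
<url>http://www.horizon.tv</url>
</channel>
</tv>"""

for channel in ET.fromstring(sample).iter("channel"):
    print(channel.get("id"), "->", icon_problem(channel) or "ok")
```

An icon whose src is only a number, or any other value that is not an image URL, is reported as a problem rather than accepted.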
OK, here is an adjusted siteini. (The previous one also works correctly, but this one won't output a channel logo if you use an incorrect channel.)
The main reason you don't see it working is that you used the old channel definitions in your config file.
So delete all the horizon.tv.ch channels in your config file, open my new .channels.xml file (post #20, above) and use these new entries.
About parallelizing things: no, nothing is done that way. In the early days of wg++ I also asked to introduce such a thing, because I wanted to speed things up.
But looking back now, with the knowledge and experience I have today, it's not a top priority any more.
If you have looked at some of the wg++ settings, there are even "slowdown" settings, to make sure the site does not block you for putting too much load on their server. So running multiple grabs in multiple threads would not help and would create strange effects. Nevertheless, it is still in the back of my mind to implement such a thing. Also, for most siteini grabbing, the webpage download is the bottleneck (the slowest part), and putting multiple of those requests in separate threads won't help a lot.
And if it were that simple, it would already have been implemented. But hopefully one day we can do this.
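The interaction between a per-site slowdown and multiple threads can be sketched in Python. This is a generic illustration, not wg++ code: the `Throttle` class, the `fake_download` stand-in (which sleeps instead of touching the network) and the delay values are all assumptions. With a shared minimum gap between request starts, the total time is bounded below by that gap times the number of requests, however many worker threads are used.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Throttle:
    """Let at most one request start every `delay` seconds, across all threads."""

    def __init__(self, delay):
        self.delay = delay
        self._lock = threading.Lock()
        self._next_start = 0.0

    def wait(self):
        # Reserve the next free start slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            start = max(now, self._next_start)
            self._next_start = start + self.delay
        time.sleep(start - now)


def fake_download(page, throttle):
    """Stand-in for a page download: honours the throttle, then simulates latency."""
    throttle.wait()
    time.sleep(0.01)
    return f"content of {page}"


def grab(pages, workers, delay):
    """Fetch all pages with `workers` threads sharing one throttle."""
    throttle = Throttle(delay)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: fake_download(p, throttle), pages))


pages = ["page1", "page2", "page3"]
t0 = time.monotonic()
results = grab(pages, workers=3, delay=0.02)
elapsed = time.monotonic() - t0
print(results, f"{elapsed:.3f}s")
```

Threads only pay off here when the politeness delay is small compared with the download latency, which is a trade-off against the risk of being blocked by the site.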
Your explanation changed my view on parallel optimisation. The only benefit I see would be running several threads to grab different pages.
By the way, thank you very much for all your effort; your files are working pretty well :)