Hi Guys,
Less than 24 hours ago I discovered WebGrab+. It works perfectly, but the code syntax is hard to understand. I know PHP, ASP, T-SQL and a little shell scripting, but WebGrab is not easy to understand and is not similar to anything else I know (at least for me, no offense). I am interested in Turkish channels. Unfortunately some of them don't work, so I tried to fix the ini files. As I mentioned, I have been working with WebGrab for less than 24 hours, and I have finally fixed Hurriyet.ini. I want to share it with you, but I have no idea how to add it to the EPG Channels section of this website.
I need to fix TRT.NET.TR too. But I need your help. Here is the original text:
</div>
<div id="gunlukAkisDIV">
<p class="tur0"><a href="http://www.webgrabplus.com/./detay.aspx?pid=83" target="_self"><span class="aks0">06:23</span><span class="aks1">İstiklal Marşı ve Günün Program Akışı</span></a></p><p class="tur4"><a href="http://www.webgrabplus.com/./detay.aspx?pid=46762" target="_self"><span class="aks0">06:25</span><span class="aks1">Adını Sen Koy</span></a></p><p class="tur4"><a href="http://www.webgrabplus.com/./detay.aspx?pid=30349" target="_self"><span class="aks0">07:40</span><span class="aks1">Beni Böyle Sev</span></a></p><p class="tur5"><a href="http://www.webgrabplus.com/./detay.aspx?pid=25413" target="_self"><span class="aks0">10:10</span><span class="aks1">Yabancı Sinema "Geronimo: Bir Amerikan Efsanesi"</span></a></p><p class="tur4"><a href="http://www.webgrabplus.com/./detay.aspx?pid=39073" target="_self"><span class="aks0">12:25</span><span class="aks1">Yeşil Deniz</span></a></p><p class="tur4"><a href="http://www.webgrabplus.com/./detay.aspx?pid=43454" target="_self"><span class="aks0">15:00</span><span class="aks1">Baba Candır</span></a></p><p class="tur4"><a href="http://www.webgrabplus.com/./detay.aspx?pid=46762" target="_self"><span class="aks0">17:35</span><span class="aks1">Adını Sen Koy</span></a></p><p class="tur6"><a href="http://www.webgrabplus.com/./detay.aspx?pid=78" target="_self"><span class="aks0">19:10</span><span class="aks1">Hava Durumu</span></a></p><p class="tur6"><span class="aks0">19:15</span><span class="aks1">Habere Doğru</span></p><p class="tur6"><a href="http://www.webgrabplus.com/./detay.aspx?pid=46658" target="_self"><span class="aks0">19:30</span><span class="aks1">Işıl Açıkkar İle Ana Haber</span></a></p><p class="tur8"><a href="http://www.webgrabplus.com/./detay.aspx?pid=58663" target="_self"><span class="aks0">20:00</span><span class="aks1">Sıra Sende Türkiye</span></a></p><p class="tur2"><a href="http://www.webgrabplus.com/./detay.aspx?pid=39147" target="_self"><span class="aks0">00:00</span><span class="aks1">Çanak Çömlek 
Patladı</span></a></p><p class="tur5"><a href="http://www.webgrabplus.com/./detay.aspx?pid=25413" target="_self"><span class="aks0">01:00</span><span class="aks1">Yabancı Sinema "Geronimo: Bir Amerikan Efsanesi"</span></a></p><p class="tur4"><a href="http://www.webgrabplus.com/./detay.aspx?pid=30349" target="_self"><span class="aks0">02:55</span><span class="aks1">Beni Böyle Sev</span></a></p><p class="tur2"><a href="http://www.webgrabplus.com/./detay.aspx?pid=56620" target="_self"><span class="aks0">05:20</span><span class="aks1">El Emeği</span></a></p><p class="tur6"><span class="aks0">06:35</span><span class="aks1">-</span></p>
</div>
<div style="clear:both">
</div>
</div>
I already grab the title and description, but I also want to grab the category (genre). In <p class="turX">, X is the category ID. Here is the legend:
<li><a class="kLS kateS1" turID="0" href="http://www.webgrabplus.com/">Genel</a></li>
<li><a class="kLS kateS0" turID="4" href="http://www.webgrabplus.com/">Dizi</a></li>
<li><a class="kLS kateS0" turID="2" href="http://www.webgrabplus.com/">Kültür</a></li>
<li><a class="kLS kateS0" turID="8" href="http://www.webgrabplus.com/">Müzik</a></li>
<li><a class="kLS kateS0" turID="9" href="http://www.webgrabplus.com/">Eğlence</a></li>
<li><a class="kLS kateS0" turID="6" href="http://www.webgrabplus.com/">Haber</a></li>
<li><a class="kLS kateS0" turID="5" href="http://www.webgrabplus.com/">Sinema</a></li>
<li><a class="kLS kateS0" turID="3" href="http://www.webgrabplus.com/">Çocuk</a></li>
<li><a class="kLS kateS0" turID="1" href="http://www.webgrabplus.com/">Eğitim</a></li>
<li><a class="kLS kateS0" turID="7" href="http://www.webgrabplus.com/">Spor</a></li>
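To make the goal concrete, here is a minimal sketch in Python (not WebGrab+ syntax) of the mapping I am after: read the digit from class="turX" and look it up in the legend above. The names TUR_CATEGORIES, SHOW_RE and parse_shows are made up for this illustration, and the sample string is a shortened copy of two entries from the page.

```python
import re

# Legend copied from the turID list above.
TUR_CATEGORIES = {
    "0": "Genel", "1": "Eğitim", "2": "Kültür", "3": "Çocuk", "4": "Dizi",
    "5": "Sinema", "6": "Haber", "7": "Spor", "8": "Müzik", "9": "Eğlence",
}

# One <p class="turX"> ... </p> element per show; the <a> wrapper is optional.
SHOW_RE = re.compile(
    r'<p class="tur(?P<tur>\d+)">.*?'
    r'<span class="aks0">(?P<start>[^<]*)</span>'
    r'<span class="aks1">(?P<title>.*?)</span>.*?</p>',
    re.DOTALL,
)


def parse_shows(html):
    """Return start time, title and category name for every show element."""
    shows = []
    for m in SHOW_RE.finditer(html):
        shows.append({
            "start": m.group("start"),
            "title": m.group("title"),
            # Unknown IDs fall back to a placeholder instead of raising.
            "category": TUR_CATEGORIES.get(m.group("tur"), "unknown"),
        })
    return shows


sample = (
    '<p class="tur4"><a href="./detay.aspx?pid=46762" target="_self">'
    '<span class="aks0">06:25</span><span class="aks1">Adını Sen Koy</span></a></p>'
    '<p class="tur6"><span class="aks0">19:15</span>'
    '<span class="aks1">Habere Doğru</span></p>'
)

for show in parse_shows(sample):
    print(show)
```

The point is only that the category digit sits in the opening <p> tag, so whatever splits the page into shows has to keep that tag inside each element.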
Here is my ini file for TRT.NET.TR:
site {url=trt.net.tr|timezone=UTC+03:00|maxdays=6|cultureinfo=tr-TR|charset=UTF-8|titlematchfactor=90|nopageoverlaps}
site {ratingsystem=TR|episodesystem=onscreen|grabengine=|firstshow=0|firstday=0000000}
url_index{url|http://www.trt.net.tr/televizyon/akis.aspx?kanal=|channel|&gun=|urldate|}
*http://www.trt.net.tr/televizyon/akis.aspx?kanal=trt-1&gun=0
urldate.format {daycounter|0}
*subpage.format {number||1|}
index_showsplit.scrub {multi|<div id="gunlukAkisDIV">|<p class="tur|</p>|<div style="clear:both">}
index_urlshow {url|http://www.trt.net.tr/televizyon|href=".|||"}
index_start.scrub {single|<span|class="aks0">|</span>|<span}
*index_stop.scrub {single|}
index_title.scrub {single|<span|class="aks1">|</span>|</a>}
*
index_category.scrub {single|<p|class="||">|">}
*
description.scrub {single|<meta name="description"|content="|"| />}
description.modify {remove|- TRT Televizyon}
description.modify {cleanup}
*director.scrub {single|}
*actor.scrub {single(separator=", ")|}
*presenter.scrub {single|}
*producer.scrub {single|}
*writer.scrub {single|}
*composer.scrub {single|}
*rating.scrub {multi|}
*ratingicon.scrub {multi|}
*category.scrub {single|}
productiondate.scrub {single|Yapım Yılı|<li class="kocontent">|</li>|</ul>}
*starrating.scrub {single|}
*episode.scrub {single|}
*subtitles.scrub {single|}
*premiere.scrub {single|}
*previousshown.scrub {single|}
*
* operations:
I am new and I don't know how to fix this. I hope someone can help me. Thanks. You can find the fixed hurriyet.ini here:
Thank you very much for your fast response. Unfortunately it didn't work because:
index_showsplit.scrub {multi|<div id="gunlukAkisDIV">|<p class="tur|</p>|<div style="clear:both">}
I use this line. That is why the p class attribute is not captured. I need to change the line above. Could you help me? After we finish this I will try to work on DSMART (which you are already working on) and tivibu: http://www.tivibuspor.com.tr/yayin-akisi
I get an error on the Windows host. See the log:
[Error ] Unable to update channel TRT Okul
[Critical] See log file for details
[Critical] Exception.Message: parsing "<div id="gunlukAkisDIV">(?:.*?)(<p class.+?</p>)(?:.*?))*<div style="clear:both">" - Too many )'s.
[Critical] Exception.StackTrace: at System.Text.RegularExpressions.RegexParser.ScanRegex()
at System.Text.RegularExpressions.RegexParser.Parse(String re, RegexOptions op)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options, TimeSpan matchTimeout, Boolean useCache)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options)
at WGconsole.Scrub.GrebElements(String source, String[] filters, Boolean fromscrub)
at WGconsole.Scrub.GetElements(String from, String[] filters, Boolean fromscrub)
at WGconsole.Scrub.SplitIndex(String index, SiteIni ScrubStrings)
at WGconsole.Program.UpdateChannel(String strIndex, ChannelToUpdate Chan, XmlTarget xTarget)
at WGconsole.Program.ConsoleApplication(String[] args)
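The "Too many )'s" message can be checked outside WebGrab+. Below is a rough Python sketch that counts the parentheses in the pattern from the log and then tries to compile it. Python's re module is not the .NET regex engine that WebGrab+ uses, so the error wording differs; paren_balance is a helper name made up for this example.

```python
import re

# Pattern copied verbatim from the Exception.Message line in the log.
PATTERN = (
    '<div id="gunlukAkisDIV">(?:.*?)(<p class.+?</p>)(?:.*?))*'
    '<div style="clear:both">'
)


def paren_balance(pattern):
    """Return (opens, closes), skipping backslash-escaped characters."""
    opens = closes = 0
    escaped = False
    for ch in pattern:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == "(":
            opens += 1
        elif ch == ")":
            closes += 1
    return opens, closes


print(paren_balance(PATTERN))

try:
    re.compile(PATTERN)
except re.error as exc:
    # Python also rejects the pattern because of the extra ")".
    print("invalid pattern:", exc)
```

The count comes out as three opening and four closing parentheses, so the generated expression has one ")" with no matching "(" just before the "*".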
You are the man :) You fixed it. I will share the new ini for TRT.NET.TR with everyone. I hope someone can update it on the website. Here is the updated ini. Thank you.
Does anyone have a working Tivibu source? @mkaand, does the Tivibu EPG work for you? I can't get it running...