MDB and Russian Language scraping

JohnnyParanoia
Joined: 11 years
Last seen: 2 years
Title says it all really lol.
Is there a way to scrape the English title and description from IMDb for Russian-language channels using MDB and add them to the guide? When I try to scrape, it returns no Episode or Movie candidates.
I've tried using Google to search for the Russian title and it does return the English IMDb result on the right, so the info is available.
I had some success with port.hu using the subtitle (which also had the English name) and then merging it, so that the English title appeared after the Hungarian title and the English description appeared after the Hungarian description.
Thanks for any help in advance :)
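
For readers who want to see the kind of merge described above, here is a minimal standalone sketch in Python. It is not how WebGrab++ or MDB does it; it only shows the idea of appending an English title and description after the local-language ones in an XMLTV `programme` element. The function names `merge_text` and `merge_programme`, the ` / ` separator and the sample programme are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def merge_text(local, english, sep=" / "):
    """Return the local text followed by the English text (hypothetical helper)."""
    local = (local or "").strip()
    english = (english or "").strip()
    if not english or english == local:
        return local            # nothing new to append
    if not local:
        return english          # no local text, fall back to English
    return f"{local}{sep}{english}"


def merge_programme(programme, english_title, english_desc):
    """Append English title/description after the existing ones, in place."""
    for tag, english in (("title", english_title), ("desc", english_desc)):
        node = programme.find(tag)
        if node is not None:
            node.text = merge_text(node.text, english)


# Made-up sample programme, only to show the shape of the data.
sample = (
    '<programme><title lang="ru">Ирония судьбы</title>'
    '<desc lang="ru">Комедия.</desc></programme>'
)
programme = ET.fromstring(sample)
merge_programme(programme, "The Irony of Fate", "A comedy.")
print(programme.find("title").text)
```

If the English text is missing or identical to the local text, the element is left unchanged, so a run with no match does not damage the guide data.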

JohnnyParanoia

Bumpity Bump :)

WGMaker
WG++ Team member, Donator
Joined: 12 years
Last seen: 4 hours

Hi Johnny

I was very busy and had little time left to answer, sorry!

Send me:

An xmltv input file with the Russian titles.
The mdb.config you used.

With those I can see what happens and maybe find some improvement.

Jan

JohnnyParanoia
No problem, I didn't know if anyone had seen my post lol :)
Please find attached my configs and thanks for your help.

WGMaker

Hi,
It saves me time if you send me the xmltv input file. Run your config with the mdb postprocessor off, for one or two channels for which you would like to get the IMDb data, then send me the xmltv output.
 
Jan

JohnnyParanoia

Here you go, sorry I misunderstood :)

WGMaker

Hi,

I ran the mdb postprocessor with your xmltv. To make the differences clear, I used three primary search facilities: ask.com, bing.com and google.com.

With ask.com I got 10 matches, with bing.com 12 matches and with google.com 2 matches.

(Google blocks robot searches because there is no money in it for them, so after a few searches you are kicked out. Thanks Google for the service!!)

Clearly the best is bing.

A breakdown of the result (see also the attached details at the end):

There were 32 unique shows in your xmltv.
For 19 of them bing found an IMDb showid.
For 12 of these 19, IMDb provided a Russian aka (also known as) title to match with the xmltv title.
End result: for 12 out of 32 shows IMDb data was extracted.

There is very little we can do about this.

In the attached details you can see that for one of the shows bing found 3 possible showids, but for none of them does IMDb have a matching Russian title in its aka list. So there is no way for the postprocessor to figure out which of the showids is the correct one.
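
To illustrate why a missing aka title blocks the match, here is a rough Python sketch of the selection step as described above. It is not the actual MDB postprocessor code; the `Candidate` class, the `pick_candidate` function, the title normalization and the example showids are assumptions made only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    show_id: str                                     # hypothetical showid from the search step
    akas: List[str] = field(default_factory=list)    # "also known as" titles listed for it


def normalize(title: str) -> str:
    """Case-fold and collapse whitespace so small differences do not matter."""
    return " ".join(title.casefold().split())


def pick_candidate(xmltv_title: str, candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the single candidate whose aka list contains the xmltv title.

    Returns None when no candidate matches, or when more than one does,
    because then there is nothing to decide between them (an assumption
    of this sketch).
    """
    wanted = normalize(xmltv_title)
    matches = [c for c in candidates
               if wanted in {normalize(a) for a in c.akas}]
    return matches[0] if len(matches) == 1 else None


# Made-up example: three showids found, none with a Russian aka title.
found = [
    Candidate("tt0000001", ["Some English Title"]),
    Candidate("tt0000002", ["Another Title"]),
    Candidate("tt0000003", []),
]
print(pick_candidate("Неизвестное шоу", found))
```

With the numbers from the run above, 12 matched shows out of 32 is a match rate of 37.5%; the search step found a showid for 19 of them, so the 7 lost in between are the ones without a usable aka title.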

Jan

 

JohnnyParanoia

Thanks for that :)
I must have messed up my config somewhere because it wasn't even identifying movies or series. I've got that sorted now and it's at least picking them up, even if there are no results :)

WGMaker

Strange that you got no results! If you send me your mdb.config.xml I can have a look at what's wrong.
Jan

Brought to you by Jan van Straaten
