first ini for me

msallal
Offline
Donator
Joined: 3 years
Last seen: 1 year
Hi guys,

I like this project and would like to learn so that I can contribute.

I am trying to grab aljazeera.net myself. I know a siteini is already there, but it is encrypted.
I found this request in DevTools: https://www.aljazeera.net/graphql?wp-site=aja&operationName=SchedulePage...

When I try to grab it, it shows:
no index page data received from aljazeera
unable to update channel, try again later
Existing guide data restored!

I tried with a cookie, but unfortunately I get the same error.
Can you advise what I am missing?

I appreciate your support.

mat8861
Offline
WG++ Team member, Donator
Joined: 9 years
Last seen: 5 hours

Post your WebGrab log please, you are getting closer ;)
PS: ask the authors to enable debug on your license; the new version does things that you cannot do with 2.1.

msallal

Here it is :)

mat8861

Wait, let me see if I can enable debug...

msallal

Do you want me to add debug and send you another log?

mat8861

ok try

msallal

Here it is.

mat8861

OK, close your browser and clear the cookies, then open your browser again and load that link (do not open other sites or aljazeera first). The answer is there; tell me whether you get data or find the solution ;)

msallal

It says the site header is not specified,

but I already set a header:
url_index.headers {customheader=Accept-Encoding=gzip,deflate}

mat8861

Then we can discuss showsplit.

mat8861
msallal wrote:

It says the site header is not specified,
but I already set a header:
url_index.headers {customheader=Accept-Encoding=gzip,deflate}

Good!! But that is not what you need... Now, what does the "site" header want? What do you think?

mat8861

Suggestion: the same address can give you English or Arabic... find the difference and see what changes.

msallal

It needs the same request headers as in DevTools.
But how can I add those? Can you provide an example?

mat8861

Ask Jan to enable debug for you (send mail via the forum contact page). Compared to old versions, WG 3.1 does a lot more... and more easily.

mat8861
msallal wrote:

It needs the same request headers as in DevTools.
But how can I add those? Can you provide an example?

Only one is needed, so you add a url_index.headers {customheader=

Look at the difference between the link in your picture and this link:
https://www.aljazeera.net/graphql?wp-site=aje&operationName=SchedulePage...

That difference tells the aljazeera site which schedule you want, Arabic or English.

msallal

I don't understand what you are asking. I read your post five times, but I am not getting your point. What debug can I get?

mat8861
msallal

ooooooh my god, wp-site=aje

thank you sooooooooo much

But the question now is: where did you get the aje value from?

msallal

You are awesome. Thank you so much for such kind help,
much appreciated.

mat8861

url_index.headers {customheader=wp-site=aje} * This specifies the site header, i.e. the site language that you also indicate in the URL.
aje = aljazeera english
aja = aljazeera arabic
Now I want to see your ini ;)
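As a rough illustration of the header hint above, here is a Python sketch (not siteini syntax) that builds, without sending, a request carrying the wp-site header. The URL is a hypothetical placeholder, since the real query string is truncated in this thread, and the function and dictionary names are made up for the example.

```python
from urllib.request import Request

# Site codes quoted in the thread: aje = Al Jazeera English, aja = Al Jazeera Arabic.
SITE_CODES = {"english": "aje", "arabic": "aja"}


def build_schedule_request(url: str, language: str) -> Request:
    """Build (but do not send) a GET request that carries the wp-site header."""
    request = Request(url)
    request.add_header("wp-site", SITE_CODES[language])
    return request


# Hypothetical placeholder URL: the real query string is elided in the thread.
request = build_schedule_request("https://example.invalid/graphql", "english")

# urllib normalises header capitalisation, so compare names in lower case.
sent_headers = {name.lower(): value for name, value in request.header_items()}
print(sent_headers)
```

Switching the language argument to "arabic" would send aja instead, which is the only difference between the two schedules discussed here.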

msallal

It generates the guide now, but because the data comes from JSON rather than HTML, the start date was Sunday, not today.

"schedule":[
{
"showDay":"Sunday",
"showTimeslot":"00:00",
"showName":"نشرة الأخبار",
"showDescription":"نشرة تقدم الأخبار السياسية العربية والعالمية.",
"duration":"01:26:0",
"startDate":"1619308800",
"__typename":"Schedule"
},
{
"showDay":"Sunday",
"showTimeslot":"01:26",
"showName":"النشرة الجويـة",
"showDescription":"التنبؤات بأوضاع الطقس ومتغيراته، ودرجات الحرارة والرطوبة والمنخفضات الجوية المتوقعة.",
"duration":"00:34:0",
"startDate":"1619308800",
"__typename":"Schedule"
},

I am stuck on how to handle the dates.

Can you give me a clue?
Thank you so much, Mat.

mat8861

In the data link you get shows with a "startDate":"1620000000" (= 3 May as a Unix date) or a "showDay":"Monday" (weekday name)... so you have 2 possibilities.
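To see that the two possibilities agree, here is a small Python sketch (an illustration only, not WebGrab+Plus code) that decodes a startDate value and derives the weekday name from it. It assumes the timestamps are meant as UTC, which the thread does not state.

```python
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def day_from_start_date(start_date: str):
    """Return (ISO date, weekday name) for a Unix timestamp string, read as UTC."""
    moment = datetime.fromtimestamp(int(start_date), tz=timezone.utc)
    return moment.date().isoformat(), WEEKDAYS[moment.weekday()]


print(day_from_start_date("1620000000"))  # ('2021-05-03', 'Monday')
print(day_from_start_date("1619308800"))  # ('2021-04-25', 'Sunday')
```

The second value is the startDate from the JSON excerpt posted earlier, whose showDay is indeed "Sunday".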

msallal

Yes, I know, but how do I filter on those values?
I am stuck here.

Could you advise, please?

By the way, I contacted Jan and he enabled debug mode for me.

msallal

I have been trying for two hours to grab the data based on startDate (Unix date) or showDay, and I could not find a way to do it.

Can you advise?
Thank you so much.

mat8861

index_start is wrong, should be "showTimeslot":"||",|",
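Conceptually, that pattern picks the text between a start marker and an end marker. The Python sketch below mimics the idea on one entry from the JSON excerpt; it is only an analogy under that assumption, not how WebGrab+Plus is implemented, and scrub_single is a made-up helper name.

```python
def scrub_single(text: str, element_start: str, element_end: str):
    """Return the first substring between element_start and element_end, or None."""
    begin = text.find(element_start)
    if begin == -1:
        return None
    begin += len(element_start)
    end = text.find(element_end, begin)
    if end == -1:
        return None
    return text[begin:end]


# A shortened entry from the JSON excerpt posted earlier in the thread.
entry = '"showDay":"Sunday","showTimeslot":"01:26","duration":"00:34:0",'

print(scrub_single(entry, '"showTimeslot":"', '",'))  # 01:26
```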

msallal

Yes, sorry, my mistake: I had changed it just to test and sent the old file. But the problem is that the guide does not start from today. How can I set the start date to match startDate?

mat8861

You did not read post 22... scrub index_date.

mat8861

Also change the revision to v3.1 and remove the cookie.

msallal

I read that, but I could not find a way to do it. Can you edit my ini and show me the correct way, please?

msallal

Here it is.

mat8861

Pay attention to what I write: multi is for when, within the same block start (BS) and block end (BE), there are multiple repeated elements, each with an element start (ES) and element end (EE). That is not the case here, because every value is single.
See documentation section 4.2.1.1: http://webgrabplus.com/sites/default/files/download/documentation/Manual...
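The single versus multi distinction can be pictured with a small Python sketch. This is only a conceptual analogy, not the WebGrab+Plus implementation; the "cast" field is hypothetical data invented for the demo, and scrub_multi is a made-up helper name.

```python
def scrub_multi(text, block_start, block_end, element_start, element_end):
    """Collect every element found between block_start (BS) and block_end (BE)."""
    begin = text.find(block_start)
    if begin == -1:
        return []
    begin += len(block_start)
    end = text.find(block_end, begin)
    if end == -1:
        return []
    block = text[begin:end]

    elements = []
    cursor = 0
    while True:
        start = block.find(element_start, cursor)  # ES
        if start == -1:
            break
        start += len(element_start)
        stop = block.find(element_end, start)      # EE
        if stop == -1:
            break
        elements.append(block[start:stop])
        cursor = stop + len(element_end)
    return elements


# Hypothetical data: "cast" does not exist in the Al Jazeera feed.
sample = '"cast":["Ann","Bob","Cid"],"showTimeslot":"01:26",'

print(scrub_multi(sample, '"cast":[', ']', '"', '"'))        # ['Ann', 'Bob', 'Cid']
print(scrub_multi(sample, '"showTimeslot":', ',', '"', '"'))  # ['01:26']
```

The second call finds exactly one element per block, which is the situation in this feed and the reason a single scrub is enough.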

msallal

Oh god, I did not notice index_date.

Much, much appreciated, thank you Mat :)

Tomorrow I will do another siteini.

mat8861

Before you start, read the documentation above... I still read it after 5 years.

msallal

now I got this

msallal

I guess it is working fine now; I just added these two lines:

index_date.scrub {single|startDate":"||",|",}
index_date.modify {calculate(format=utctime)}

Thank you so much

mat8861

Your solution is not correct. Do it the easy way: remove index_date and set firstday=1 in the site line so it starts on Monday.

msallal

Hi Mat,

You are right, there was an error I did not see yesterday. When I used firstday=1 it grabbed the data, but the date is still not correct.
I wonder how firstday=1 works. I checked the documentation, but it does not explain how to map firstday to the property I am looking for, which in my example looks like "startDate":"1619308800".

The generated file works, but the time is not correct. How can I set the date from startDate?

Please advise.

mat8861

I mistyped: it should be firstday=0123456, and then urldate.format {daycounter|1} to bypass the first day of the index. In the log you should see an indication of skipped shows: shows that happened before 'today'.
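The "skipped" idea can be pictured as follows. This Python sketch is an assumption about the general principle (drop anything that starts before today's midnight), not a description of what WebGrab+Plus actually does internally; the fixed "now" and the UTC interpretation are chosen only for the demo.

```python
from datetime import datetime, timezone


def split_by_today(start_dates, now):
    """Separate Unix start dates into kept (today or later) and skipped (before today)."""
    today_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    kept, skipped = [], []
    for raw in start_dates:
        moment = datetime.fromtimestamp(int(raw), tz=timezone.utc)
        if moment >= today_midnight:
            kept.append(raw)
        else:
            skipped.append(raw)
    return kept, skipped


# Assumed "today" for the demo: Tuesday 4 May 2021, 09:30 UTC.
now = datetime(2021, 5, 4, 9, 30, tzinfo=timezone.utc)
kept, skipped = split_by_today(["1619308800", "1620000000", "1620086400"], now)

print("kept:", kept)
print("skipped:", skipped)
```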

msallal

Why do I need urldate.format {daycounter|1}?
The data from this URL is not generated for a specified date:
https://www.aljazeera.net/graphql?wp-site=aja&operationName=SchedulePage...

Check the attached .json.

The data covers almost a week, and what I am looking for is to combine "showTimeslot" + "startDate" into the start index.

I tried multiple times, but it still does not show the correct result.
"schedule":[
{
"showDay":"Sunday",
"showTimeslot":"00:00",
"showName":"نشرة الأخبار",
"showDescription":"نشرة تقدم الأخبار السياسية العربية والعالمية.",
"duration":"01:26:0",
"startDate":"1619308800",
"__typename":"Schedule"
},
{
"showDay":"Sunday",
"showTimeslot":"01:26",
"showName":"النشرة الجويـة",
"showDescription":"التنبؤات بأوضاع الطقس ومتغيراته، ودرجات الحرارة والرطوبة والمنخفضات الجوية المتوقعة.",
"duration":"00:34:0",
"startDate":"1619308800",
"__typename":"Schedule"
},

Can you check how you did it in your original version, "aljazeera.com.ini"?

I have spent 3 days on this, trying to learn and figure out how to do it.
Please check my ini as well.

thanks Mat :)

mat8861

Fix firstday properly, see page 19.

msallal

Can we do a Zoom meeting for 5 minutes?

msallal

Yes, I added those already, but the problem with this JSON is that startDate is a Unix date only, not a datetime: converted, it shows Tue May 04 2021 00:00:00 GMT+0000. So to get the correct index_start I need to calculate startDate + showTimeslot, and this is where I am stuck.

{
"showDay":"Sunday",
"showTimeslot":"01:26",
"showName":"النشرة الجويـة",
"showDescription":"التنبؤات بأوضاع الطقس ومتغيراته، ودرجات الحرارة والرطوبة والمنخفضات الجوية المتوقعة.",
"duration":"00:34:0",
"startDate":"1619308800",
"__typename":"Schedule"
},
And if you check the JSON I sent in post 39, the data is not sorted by date; it is sorted by weekday, so Tuesday = 1620086400 is followed by Wednesday = 1619568000, which is last week, not the next day.

Can you just modify the file for me?
Thanks
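The ordering problem described above can be checked with a short Python sketch (an illustration only, reading the timestamps as UTC, which is an assumption). Sorting the two quoted values by startDate puts Wednesday before Tuesday.

```python
from datetime import datetime, timezone

# (showDay, startDate) pairs in the order the feed lists them, as quoted above.
feed_order = [("Tuesday", "1620086400"), ("Wednesday", "1619568000")]


def to_date(start_date: str) -> str:
    """Convert a Unix timestamp string to an ISO calendar date (UTC)."""
    moment = datetime.fromtimestamp(int(start_date), tz=timezone.utc)
    return moment.date().isoformat()


chronological = sorted(feed_order, key=lambda item: int(item[1]))
for day, start_date in chronological:
    print(day, to_date(start_date))
# Wednesday 2021-04-28
# Tuesday 2021-05-04
```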

mat8861

Use only showTimeslot":" as index start... no index_date.

msallal

This is what I have been doing since yesterday. The question is: when I use only showTimeslot, how does the grabber know the date from the JSON? Either I don't understand you or I don't understand the whole idea of WG++.

I have been stuck on this for three days trying to get it done. I thought it would be fun, but I don't think it is for me; I have wasted a lot of time. I will do it the stupid way, with the HTML page, and grab every 8 hours.

Thank you, Mat.

mat8861

I think I now understand what you want to do... wait (I am at work now).

msallal

Please

mat8861

So do this: change lang=en, then
1. Scrub startDate into a temp_1, then modify it with calculate format=yyyy/MM/dd.
2. Scrub showTimeslot into another element, temp_2.
3. addstart 'index_temp_1' 'index_temp_2' to index_start.
4. Modify index_start with calculate format=date,unix.
5. With modify addend, add (lang=ar) to title and description.
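The date arithmetic behind steps 1 to 4 can be sketched in Python. This is not siteini syntax and not the attached solution; it only shows the same sequence (date from startDate, time from showTimeslot, join, convert back to Unix time), and it assumes both values are in UTC, which the thread does not confirm.

```python
from datetime import datetime, timezone


def show_start(entry: dict) -> datetime:
    """Combine startDate (midnight of the day) with showTimeslot (time of day)."""
    # Step 1: startDate -> calendar date in yyyy/MM/dd form (read as UTC).
    day = datetime.fromtimestamp(int(entry["startDate"]), tz=timezone.utc)
    day_text = day.strftime("%Y/%m/%d")
    # Step 2: the time of day, as given in the feed.
    slot_text = entry["showTimeslot"]
    # Step 3: put the date in front of the time.
    combined = f"{day_text} {slot_text}"
    # Step 4: parse the combined text back into a full datetime.
    parsed = datetime.strptime(combined, "%Y/%m/%d %H:%M")
    return parsed.replace(tzinfo=timezone.utc)


# Fields taken from the second entry of the JSON excerpt in this thread.
entry = {"showTimeslot": "01:26", "startDate": "1619308800"}
start = show_start(entry)

print(start.isoformat())       # 2021-04-25T01:26:00+00:00
print(int(start.timestamp()))  # 1619313960
```

Because every show carries its own date this way, the weekday ordering of the feed no longer matters for computing start times.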

msallal

I already did all of these steps. Did you check my ini?

mat8861

I see max7 in your latest siteini; it should be 7.1, or whatever the number of days is, as it is a .1 siteini.

msallal

Yes, you are right.

Can you modify my ini and send it back to me?
It needs your touch.
Thanks

mat8861

If you keep asking without trying and understanding, you will never learn, so this is the first and last time I do it for you. From now on, only suggestions :)

