Topic Summary

Posted by: Dhammañāṇa
« on: August 03, 2019, 02:05:14 PM »

The idea of invisible characters came up because a search for the defective strings would only match the page they were copied from. As for which pages are affected: just the next, next, ... page of BMC, seemingly only in the BMC2 part.... ohh, ... No. Because the next-page links are wrong, always pointing to the same page.

So it's just one page. My person then guesses that a connection problem caused a certain action not to be fulfilled. Atma also thinks it would not be worth investigating the plugin fully and redeveloping it. It may, however, be good to inform the developer that such things happen.

Sadhu for care and much joy with good undertakings, Nyom Moritz .
Posted by: Moritz
« on: August 03, 2019, 01:47:35 PM »

Still no clear idea.

But it seems I have also not yet seen the full picture of what all went wrong.


So, apparently, something must have gone wrong when saving the page http://accesstoinsight.eu/en/lib/authors/thanissaro/bmc/section0059

between the subsequent revisions
2019/07/30 06:36 -- div headings
and
2019/08/01 16:10 -- <p id

where the UTF8 encoding of special characters somehow was garbled, which can be seen in the comparison between these two directly subsequent revisions.
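
A garbling of this kind typically appears when UTF-8 bytes are read with a single-byte encoding and then saved again. A minimal Python sketch of that round trip follows. Latin-1 as the wrong encoding is only an assumption for illustration, and the helper names are hypothetical; what actually happened on section0059 is not known.

```python
# Sketch: UTF-8 text misread as Latin-1 (an assumed wrong encoding).

def garble(text):
    """Simulate saving UTF-8 bytes after misreading them as Latin-1."""
    return text.encode("utf-8").decode("latin-1")

def repair(text):
    """Undo garble(); only valid if the mistake was exactly that round trip."""
    return text.encode("latin-1").decode("utf-8")

# "Dhammañāṇa" written with escapes so the code points are unambiguous.
original = "Dhamma\u00f1\u0101\u1e47a"
garbled = garble(original)

# Every non-ASCII character has turned into two or three characters.
print(len(original), len(garbled))
print(repair(garbled) == original)
```

If the stored text can be repaired this way without errors, that would support the theory of a wrong decode somewhere between reading and saving; if not, the damage came from something else.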

That is the obvious error that I have seen. Everything else mentioned is not clear to me:

Plain text replacement &amp; -> & seems to have "destroyed" certain characters, mostly Pali.

More likely it was a simple header replacement of the h5 tag. The action caused invisible characters in &.. HTML values.
I have not found where invisible characters appear in relation to h5-tag replacements or elsewhere.

Interesting is also that the replacements, although looking similar, are unique on each page. So the added invisible characters differ while the strings appear to be equal.
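
To check whether two strings that look equal really differ by invisible characters, one can list every format, control, or unusual space character with its position. A small Python sketch, assuming the strings are pasted in from the page source; `describe_hidden` is a hypothetical helper, not part of any plugin.

```python
import unicodedata

# Categories that usually render as nothing or as a plain-looking gap:
# Cf = format (zero-width space, joiners, BOM), Cc = control,
# Zs/Zl/Zp = space and line/paragraph separators.
HIDDEN_CATEGORIES = {"Cf", "Cc", "Zs", "Zl", "Zp"}
ORDINARY = " \n\r\t"

def describe_hidden(text):
    """Return (index, code point, name) for each hidden character."""
    found = []
    for index, ch in enumerate(text):
        if ch in ORDINARY:
            continue
        if unicodedata.category(ch) in HIDDEN_CATEGORIES:
            name = unicodedata.name(ch, "<unnamed>")
            found.append((index, f"U+{ord(ch):04X}", name))
    return found

plain = "<h5>Object</h5>"
tainted = "<h5>\u200bObject</h5>"  # same on screen, one zero-width space

print(plain == tainted)
print(describe_hidden(tainted))
```

Running this over the same heading copied from two different pages would show whether the extra characters are really different on each page.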

I have not seen any page other than BMC section 59 where the mentioned UTF8 encoding error happened, exactly between these two revisions.

Can Bhante point to other pages where this happened?
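
One way to look for other affected pages without opening each one would be to scan the stored page files. The sketch below is only a heuristic: it assumes the pages are the `.txt` files under the wiki's `data/pages` directory, and that the garbling looks like UTF-8 misread as Latin-1. Both the function name and the pattern are illustrative.

```python
import re
from pathlib import Path

# A UTF-8 lead byte followed by continuation bytes, as they look when
# misread as Latin-1 (assumption about the wrong encoding).
MOJIBAKE = re.compile("[\u00c2-\u00f4][\u0080-\u00bf]+")

def suspicious_pages(root):
    """Yield (path, reason) for page files that look wrongly encoded."""
    for path in sorted(Path(root).rglob("*.txt")):
        raw = path.read_bytes()
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            yield path, "not valid UTF-8"
            continue
        if MOJIBAKE.search(text):
            yield path, "possible double-encoded UTF-8"
```

Calling `suspicious_pages("data/pages")` in a loop and printing each result would give a list of candidates to compare by hand. False positives are possible, so each hit would still need a look at its revision diff.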


In any case this looks like an encoding error, which happened only one or maybe more times (which I have not seen) when saving some page(s) with BatchEdit.

My first idea would have been that maybe the file(s) was/were edited in an external program in between which could not deal properly with the UTF8 encoding and saved it wrongly.
DokuWiki has some mechanism for recognizing external edits and including them as such in the revision history. But it might be that it does not always work, does not always become "aware" when something was changed from outside (I think the check only happens when saving), so that the change would appear included in another change (in this case a replacement which was mostly correct, but based on a file that had already been modified and wrongly encoded in between).
I am not sure whether it is possible that BatchEdit in particular skips this mechanism of "being aware" that something has changed from outside.

Apart from that, I have seen that BatchEdit has become a lot more complex since the last time I looked into the code. Many things happen which I don't understand so quickly. For example it looks like the results of a BatchEdit search (with matches and replacements, even before they are applied) are stored in a certain structure in some temporary files, and most likely they are read back again from those files when finally applying the replacements. Maybe it could happen somehow that the data gets wrongly encoded there sometimes and reloaded from there afterwards for some reason with the wrong encoding. But that is just vague speculation now since I don't really yet see how it all works.


As long as it is not possible to replicate the error in some clear test case, I think it will be difficult to figure it out.  ::)


_/\_ _/\_ _/\_

* Moritz now probably not having much time the next days or week to find the reason.
Posted by: Dhammañāṇa
« on: August 03, 2019, 09:36:32 AM »

Interesting is also that the replacements, although looking similar, are unique on each page. So the added invisible characters differ while the strings appear to be equal.
Posted by: Dhammañāṇa
« on: August 03, 2019, 06:44:48 AM »

As it infected all non-standard characters: maybe BatchEdit has some process which deals with character sets and which was possibly interrupted by connectivity problems. Sometimes, when there is no reaction, Atma would also send orders twice, which might "disturb" ongoing processes.

h3 and h4 had been converted with similar regexes before, but that did not touch the sample page (maybe 20 pages affected in all).

Something unusual is that this change is printed in gray in the list, possibly pointing to something.
Posted by: Dhammañāṇa
« on: August 03, 2019, 06:15:49 AM »

search:

/\s*<h5 id=["']([^"']+?)['"]>​([^<>]+?)<\/h5>\s*/

replace:

\n\n=== //$2// ===\n<span anchor #$1></span>\n\n

Very simple.
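
For testing outside the wiki, the same search and replace can be tried in Python. This is only a sketch under assumptions: the pattern is retyped here in plain ASCII, the `$1`/`$2` references are expressed through a replacement function, and the sample text is made up. It says nothing about what BatchEdit does internally when it saves a page.

```python
import re

# The posted search pattern, without the surrounding / delimiters.
H5 = re.compile(r"""\s*<h5 id=["']([^"']+?)['"]>([^<>]+?)</h5>\s*""")

def to_wiki_heading(match):
    """Build the posted replacement: heading text first, then the anchor."""
    anchor, title = match.group(1), match.group(2)
    return f"\n\n=== //{title}// ===\n<span anchor #{anchor}></span>\n\n"

def convert_h5(text):
    return H5.sub(to_wiki_heading, text)

# Made-up sample with Pali diacritics in the heading text.
sample = 'Before.\n<h5 id="pc1">P\u0101cittiya</h5>\nAfter.'
print(convert_h5(sample))
```

In this sketch the diacritics pass through unchanged, which fits the view that the regex itself is harmless and that the garbling happened somewhere else.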
Posted by: Moritz
« on: August 03, 2019, 03:51:39 AM »

That looks strange. No clear idea at the moment.
Knowing exactly what the regex to replace was might help to understand it better.

Connection problems should not be a possible reason.
Possibly a programming error from some of the modifications I made.

I think it is best to keep the old revision, better than trying to replace from this result.

I can look in the morning, or if Bhante could send me the FTP password (already have the password now), maybe I find something useful from here (at work now, in taxi, but a "boring" night so far only sitting and waiting).

_/\_
Posted by: Dhammañāṇa
« on: August 02, 2019, 05:49:56 PM »

Plain text replacement &amp; -> & seems to have "destroyed" certain characters, mostly Pali.

More likely it was a simple header replacement of the h5 tag. The action caused invisible characters in &.. HTML values.

http://accesstoinsight.eu/en/lib/authors/thanissaro/bmc/section0059?do=diff&rev2%5B0%5D=1564461374&rev2%5B1%5D=1564741857&difftype=sidebyside

Could such happen just by connectivity problems?

Not sure if it's better to try to recover the versions with the Spam-tool or to replace the stray characters (here the invisible ones may be difficult to handle well).
Posted by: Dhammañāṇa
« on: August 02, 2019, 02:48:04 PM »

Looks like certain character combinations are not matched properly by the header-styling plugin. Sample:

=== //Object:// ===

returns

Object://

Atma didn't test it in detail yet.

http://accesstoinsight.eu/en/lib/authors/thanissaro/bmc/section0020

ohh... the doc of the wikistyle plugin says: ":// won't be converted."
Posted by: Dhammañāṇa
« on: July 31, 2019, 05:28:32 PM »

Divisions should be "fine" already. A few alien spans are left, and will be done tomorrow, when the sun is shining.

There is one issue regarding classes, but more so regarding ids: lower-case letters. While Notepad++ gives the possibility to replace with the lower-case value, BatchEdit seems not to support \l$1. Since there might be a lot of anchors, and links to them, containing upper-case letters (the links are not so problematic, since they are cut down to lower case by the system if right, also for the anchor extension), maybe someone has an idea how to transform them more globally.
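
Where the replacement syntax cannot change case, a small script with a replacement function can. A minimal Python sketch, assuming the ids appear either as `id="..."` attributes or as `#anchor` inside the wiki-style `<div ...>` and `<span ...>` tags; the function name and both patterns are illustrative, not part of BatchEdit.

```python
import re

# id="Value" or id='Value' in remaining HTML tags.
ID_ATTR = re.compile(r"""(\bid=)(["'])([^"']+)\2""")
# #Anchor inside wiki-style <div ...> / <span ...> tags.
WIKI_ANCHOR = re.compile(r"(<(?:div|span)\b[^<>#]*#)([^\s<>]+)")

def lowercase_ids(text):
    """Lowercase id attribute values and wiki-style anchors."""
    text = ID_ATTR.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{m.group(3).lower()}{m.group(2)}",
        text,
    )
    return WIKI_ANCHOR.sub(lambda m: m.group(1) + m.group(2).lower(), text)

print(lowercase_ids('<h5 id="Pc1">Pc 1</h5> <span anchor #Sg2></span>'))
```

Only the id and anchor values are touched; heading text and class names stay as they are. As with any global change, it would be wise to run it on copies of a few pages first.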
Posted by: Dhammañāṇa
« on: July 26, 2019, 02:40:22 PM »

Updates Info:

Link-tags should now all be replaced by wiki mark-up.

Certain things will need further fixes in regard to links:

  • Relative links missing ./ or ../ at the beginning would not work for now. Generally my person tends to replace relative links with the whole path, starting with :en:....:file

  • Since the structure has been slightly changed in regard to the tipitaka folder, starting with adding :sut: and renaming folders toward the cs-rm standard (not all done for now, especially in the kn folder), not all links are correct for now and need work.
  • Media links may not be correct for now. Since image links with a target require an absolute URL, either host/_detail/en/.../file (display of pictures within the media manager) or host/_media/en/.../file (direct download), it is not possible to replace relative links correctly on a global scale. An alternative would be either to change the logos used by the media manager or to let go of the zze-logos for certain files.

  • Links which have been generated by script in the doc-info at the footer need to be made static.
  • It might be that my person has deleted certain anchors which were referenced from the index and other pages for headers. Those would need to be restored.
  • There is no idea yet of how to maintain hover-texts in a good way. Most have been given up. The rest have been placed within images (Info sign).
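
For the first point, replacing relative links with the whole path, a helper along these lines could compute the absolute id from the page a link sits on. It is a sketch under assumptions: it takes the old HTML folder layout to map one-to-one onto namespaces, which the renamed tipitaka folders already break, and `to_absolute_id` is a hypothetical name.

```python
import posixpath

def to_absolute_id(page_id, href):
    """Resolve a relative HTML href against the id of the page holding it."""
    target, _, fragment = href.partition("#")
    page_path = page_id.strip(":").replace(":", "/")
    if target:
        folder = posixpath.dirname(page_path)
        path = posixpath.normpath(posixpath.join(folder, target))
        if path.endswith(".html"):
            path = path[: -len(".html")]
    else:
        path = page_path  # link to an anchor on the same page
    absolute = ":" + path.replace("/", ":")
    return f"{absolute}#{fragment}" if fragment else absolute

page = "en:lib:authors:thanissaro:bmc:section0059"
print(to_absolute_id(page, "../index.html"))
print(to_absolute_id(page, "section0060.html#pc1"))
```

Links with and without a leading ./ or ../ are handled the same way here. Renamed folders would still need a separate mapping table on top of this.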


My person will focus now on replacing the rest of div and span tags.

The Portuguese pages, btw, even though only 3 pages, would have increased the work and time by 50% and are left behind meanwhile; it seems much faster to edit the three pages manually alongside the global replacements.

The last and greatest challenge of replacements will then be that of the special lists and tables... and of course many small and special things will be left. 
Posted by: Dhammañāṇa
« on: July 21, 2019, 11:45:25 AM »

Although no problem, Nyom Danilo, generally in Dhamma, when correcting, never risk that something gets lost. If not seeing the use for now, simply "hide" it. In that way the Dhamma could be maintained until our days.

Atma, as told, will need some days more to replace the most (lists and tables will need manual edits).

Next step would be to bring it into a nice, easy standard and create templates for new pages.

Ati had a huge standard and my person thinks that most is good to carry on. It needs a while to understand all (working now 7+ years with it, still finding hidden treasures).

Is Nyom familiar with CSS?

Some of the ATI syntax features he can find at Ati.eu Syntax. Detailed documentation is not written for now, though it has been started. The old posts in this topic give some impressions for understanding.

To see what Atma is currently doing, best to check Activity Lists or http://accesstoinsight.eu/index?do=recent

If wishing to use regex for many pages (be careful: it can damage much, and recovery is hard or even impossible, incl. all Sanghayana Tipitakas), he finds the BatchEdit tool in the Admin area.

(Over the last year+ Atma has started all anew surely 2-3 times... 10,000 pages, because of some mistakes...)
Posted by: Danilo
« on: July 21, 2019, 08:14:27 AM »

Not sure if Nyom Danilo is familiar and skilled with regex: powerful, but also dangerous, able to destroy a lot. Let it be known if wishing to use it for global changes.

I have some experience with regex.

If Bhante thinks it's a good idea, he could specify the patterns to be matched in the HTML pages and the desired output in the DokuWiki pages. That way a standard model would be clearly defined for reference, and I (or anyone else) could come up with the corresponding regex rules and make the appropriate changes.

When editing the HTML page, I saw many tags which didn't appear to have any effect, so I ended up removing them.


Sadhu! (and good to see Danilo back here)

I have not read everything in detail now, but have, after taking a quick look at it, installed the WikiStyle Script plugin.
It seems the plugin does not change any stored data, but only affects how things are rendered.

Possible that there might be conflicts with other plugins like the include-plugin, as Bhante says. But it would not destroy any data.
If there are any problems with it, one can simply uninstall/deactivate it and maybe think of other solutions.

Not yet looked much at any results and if all works correctly, but the example page seems to look fine so far.

I gave Danilo Admin rights now.

_/\_ _/\_ _/\_


* Moritz will not have time to look at anything deeper until Sunday

Thanks, Moritz. _/\_
Posted by: Dhammañāṇa
« on: July 19, 2019, 10:09:06 PM »

Sadhu

The mod seems also to solve the problem of styling in link texts and seems to release from a lot of work so far.
Posted by: Moritz
« on: July 19, 2019, 10:00:43 PM »

Sadhu! (and good to see Danilo back here)

I have not read everything in detail now, but have, after taking a quick look at it, installed the WikiStyle Script plugin.
It seems the plugin does not change any stored data, but only affects how things are rendered.

Possible that there might be conflicts with other plugins like the include-plugin, as Bhante says. But it would not destroy any data.
If there are any problems with it, one can simply uninstall/deactivate it and maybe think of other solutions.

Not yet looked much at any results and if all works correctly, but the example page seems to look fine so far.

I gave Danilo Admin rights now.

_/\_ _/\_ _/\_


* Moritz will not have time to look at anything deeper until Sunday
Posted by: Dhammañāṇa
« on: July 19, 2019, 11:21:51 AM »

It's pretty simple:
1. Rename the plugin's directory to "wikiformatstyling"
2. Place the directory in "dokuwiki/lib/plugins"

Installing plugins: it's best, most secure and easiest done via the Admin panel. Not sure if Nyom Danilo has admin rights, which should be no problem. It is good, however, since some tools are very powerful and could even destroy much, to coordinate with Nyom Moritz or ask my person if not sure in some regards.

In regard to distinguishing tags and divs, whether already changed or old ones: old HTML tags always include ="..." or other marks. The wiki styling tags always look like this: <div class_texts #anchor_text> or <span class_texts #anchor_text> or <span #anchor_text> or <div class_texts>. If seeing others than these (old ones), best to collect them in a list with a link to the place where seen (maybe a topic only for that, or here just a post).

Not sure if Nyom Danilo is familiar and skilled with regex: powerful, but also dangerous, able to destroy a lot. Let it be known if wishing to use it for global changes.
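
The rule above, old tags carry ="..." while the new wiki-style tags do not, can be turned into a simple finder for building such a list. A Python sketch, assuming only div and span tags matter and that each tag sits on one line; `find_old_tags` is a hypothetical helper.

```python
import re

# A div or span tag that still contains attr="..." or attr='...'.
OLD_TAG = re.compile(r"""<(div|span)\b[^<>]*=\s*["'][^<>]*>""")

def find_old_tags(text):
    """Return (line number, tag) for each old-style div/span tag."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in OLD_TAG.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits

page = (
    '<div class="chapter">\n'
    "<div chapter #ch1>\n"
    "<span #note1>x</span> <span id='n2'>y</span>"
)
for lineno, tag in find_old_tags(page):
    print(lineno, tag)
```

The output per page could be pasted into the list together with the page link, so that nothing has to be changed before it has been reviewed.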