coollector.com
https://www.coollector.com/

"Scanning your Files": BUGS or IMPROVEMENTS ? or both ?
https://www.coollector.com/viewtopic.php?f=2&t=513
Page 1 of 3

Author:  kepler42 [ Sun Dec 26, 2010 8:39 pm ]
Post subject:  "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

I must say I have mixed feelings about the scanning of my existing video hard drives.
On one hand, it is fabulous when it works.
On the other hand, it's a nightmare when it does not, and this happens in roughly 15% of the cases. The two major problems are:

1/ FALSE "POSITIVES". Coollector "thinks" it has found a correspondence for a given file, and this "finding" happens to be erroneous. Repairing that goes either through a change in name (hazardous) or a "manual fix". But in this latter case the next scan will re-introduce the error, and this will force us to change the names of the files, WHICH WE DON'T WANT (because there might be subtitles or picture files based on this name, or it can be a torrent file that we IMPERATIVELY have to leave with its ORIGINAL name, etc.).

2/ FALSE "NEGATIVES". Coollector does NOT find the database item for a file. You suggest requesting its inclusion through the appropriate thread. (BTW why the hell am I asked for the IMDb number AND THE NAME? Save me some typing, the IMDb number is plenty enough!!) Then I receive a fancy and polite answer telling me that 8 out of 10 of my requests were ALREADY in the database. :cry2: Arrgh! So we are down to a simple question: is the file name parsing really effective? Sometimes it just does not want to recognise the name, with ABSOLUTELY no difference. Sometimes it is touchy, does not accept dots in lieu of spaces. It also NEVER finds a correspondence when the spaces are removed in the video filename, etc.

So it seems clear that the search algorithm needs some improvement.
It also needs to provide an interactive phase with the user (refusing suggestions or directing choices after the "Preview") AND TO MEMORIZE the choices that were made, so it "improves" itself and does not "regress" in the next scan.

So I am very grateful for this tool, and also very upset by these little encumbrances.
Thanks for your attention. Let me know if you want detailed examples.

Author:  (cool) Hector [ Sun Dec 26, 2010 8:51 pm ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

kepler42 wrote:
* tt0069678 Addio Fratello Crudele
* tt1583303 Les Invités De Mon Père
* tt0381849 3:10 to Yuma
* tt0829459 A Mighty Heart
* tt0037522 Back to Bataan
* tt1210042 Brooklyn's Finest
* tt0379725 Capote
* tt0050306 Designing Woman
* tt0429727 Il caimano
* tt0053619 L'avventura
* tt0054452 La Vérité
* tt1363376 La Dernière Fugue
* tt1359553 Le Cameleon
* tt1509638 Les Petits Ruisseaux
* tt0074084 Novecento
* tt0131409 Geri's game
* tt0323250 Mike's new car
* tt0089841 Prizzi's Honor
* tt0365190 Red Lights
* tt0240913 Sous Le Sable
* tt0044331 Affair In Trinidad
* tt1289449 Yo Tambien
* tt0047878 The Big Combo
* tt0061407 The Taming Of The Shrew
* tt0057634 The V.I.P.s
* tt0343737 The Good Shepherd
* tt0140352 The Insider


Please tell me what were the exact file names for those movies, so I can understand why they were not recognized.

Author:  (cool) Hector [ Sun Dec 26, 2010 8:55 pm ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

kepler42 wrote:
But in this latter case the next scan will re-introduce the error

I'm surprised. When a file is already associated with a movie, it's skipped by all the scans.

Do you confirm this bug ?
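
The skipping behaviour described in this reply can be pictured with a small sketch. It is not Coollector's code, which is not shown in this thread; the `scan` function, the `associations` dictionary and the `match` callback are hypothetical names chosen for illustration.

```python
def scan(files, associations, match):
    """Associate files with movies, leaving existing associations untouched.

    files        -- iterable of file paths found on disk
    associations -- dict mapping file path -> movie (hypothetical storage)
    match        -- callable returning a movie for a path, or None
    """
    for path in files:
        if path in associations:
            # Already associated (by an earlier scan or a manual fix): skip.
            continue
        movie = match(path)
        if movie is not None:
            associations[path] = movie
    return associations


# Toy data: one file was fixed manually before the scan.
known = {"a.avi": "Manual Fix"}
result = scan(["a.avi", "b.avi"], known, lambda path: "Automatic Match")
```

Under this assumption a manual fix can only be overwritten if the stored association is lost, for example because the file was renamed or moved.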

Author:  (cool) Hector [ Sun Dec 26, 2010 8:59 pm ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

kepler42 wrote:
Sometimes it is touchy, does not accept dots in lieu of spaces

I'm surprised. Are you sure ?


kepler42 wrote:
It also NEVER finds a correspondence when the spaces are removed in the video filename, etc.

Of course ! The algorithm recognizes words. If you remove spaces, you get different words.
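
To illustrate what "recognizes words" could mean, here is a minimal sketch. It assumes that any run of characters other than letters and digits acts as a separator; the thread does not say how Coollector actually splits names, and the function name `words` is made up.

```python
import re


def words(name):
    """Split a name into lowercase words.

    Assumption: every run of non-alphanumeric characters (spaces, dots,
    underscores, hyphens, ...) separates two words.
    """
    return [w.lower() for w in re.findall(r"[^\W_]+", name)]


dotted = words("La.Dernière.Fugue")
spaced = words("Les Invités De Mon Père")
glued = words("LesInvitésDeMonPère")
```

With such a split, dots and spaces give the same words, while a name with its spaces removed collapses into one long word that matches nothing.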

Author:  (cool) Hector [ Sun Dec 26, 2010 9:01 pm ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

kepler42 wrote:
Repairing that goes either through a change in name (hazardous) or a "manual fix"

You should always do a scan preview, especially if your files are named loosely.

Author:  kepler42 [ Sun Dec 26, 2010 10:19 pm ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

Here are some cases that I was preparing, they fit in with your questions.

(cool) Hector wrote:
kepler42 wrote:
Sometimes it is touchy, does not accept dots in lieu of spaces

I'm surprised. Are you sure ?


Definitely.
1/ Unable to interpret dots as spaces ( La.Dernière.Fugue vs La Dernière Fugue )
Same remark for UNDERSCORES _ or HYPHENS -

(cool) Hector wrote:
kepler42 wrote:
It also NEVER finds a correspondence when the spaces are removed in the video filename, etc.

Of course ! The algorithm recognizes words. If you remove spaces, you get different words.


2/Unable to check filename with no spaces ( LesInvitésDeMonPère vs Les Invités De Mon Père)
Your answer "Of course..." makes me fear that you might not be extremely open to discussion :sorry:
When the first "brute force" pass (which is what "The Algorithm" seems to be) does not find a match, there should be a "smart second pass" where, for this case, all spaces are removed in the target name (the name from the database).


3/ Unable to collate accented letters ( La Derniere Fugue vs La Dernière Fugue )
Here again the "smart second (or third) pass" should replace all accented letters by their unaccented equivalents, in the target too, before performing looser comparisons.

4/Unable to match missing ' ( Brooklyns Finest vs Brooklyn's Finest )
Same remark

5/Requires character forbidden in Filename( the colon in 3:10 to Yuma )
This is just "unfixable" if you do not accept the idea that "El Algorithmo" could be improved some day.

6/Inability to use (and parse) folders ( ex: Folder "Inception" containing CD1.avi and CD2.avi )
Here we are getting into even smarter methods, but programmatically it's simple: just recurse on the folder name(s) instead of the file name.
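
The looser comparison asked for in points 2/ to 5/ could be sketched as follows. This is only an illustration of the suggestion, not anything Coollector is known to do; `fold` and `second_pass` are hypothetical names, and the titles are taken from the list earlier in the thread.

```python
import unicodedata


def fold(text):
    """Reduce a title to a loose comparison key.

    Lowercases, strips accents (via Unicode NFD decomposition) and drops
    everything that is not a letter or a digit: spaces, dots, apostrophes,
    colons, underscores, hyphens.
    """
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if ch.isalnum())


def second_pass(file_title, db_titles):
    """Return the database titles whose folded form equals the file's."""
    key = fold(file_title)
    return [title for title in db_titles if fold(title) == key]


db_titles = ["Les Invités De Mon Père", "Brooklyn's Finest",
             "3:10 to Yuma", "La Vérité"]
no_spaces = second_pass("LesInvitésDeMonPère", db_titles)
no_apostrophe = second_pass("Brooklyns Finest", db_titles)
no_accents = second_pass("La verite", db_titles)
no_colon = second_pass("3 10 to Yuma", db_titles)
```

Such a pass would only make sense after the stricter one has failed, since throwing away word boundaries and accents can also produce new false positives.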

BUT THE IMPORTANT POINT is that, after the Preview scan, there should be, at a minimum, a "review of the preview" interaction before launching the actual update of the owned videos.
And, finally, I want to underline again that there are many circumstances where changing the filename to benefit from the feature is just NOT possible, and boring anyway.

I have other suggestions (I insist, NOT criticisms), but we'll see later. Thanks for replying so fast. :clap:

Author:  (cool) Hector [ Sun Dec 26, 2010 11:18 pm ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

kepler42 wrote:
Unable to interpret dots as spaces ( La.Dernière.Fugue vs La Dernière Fugue )

It's not at all a problem of dot vs. space.

There's simply no movie in our database named "La Dernière Fugue".

In this case, the algorithm matches the most popular movie that matches the longest string.

longest string => "la dernière".

most popular movie starting with "la dernière" => "la dernière femme".
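
A rough sketch of the rule described in this post (longest matching string first, then popularity) might look like the code below. It is a guess at the behaviour from the description only: the word-splitting rule, the `best_match` name, the popularity numbers and the second "La Dernière" title are all invented for the example.

```python
import re


def words(name):
    """Lowercase words; any non-alphanumeric run is a separator (assumption)."""
    return [w.lower() for w in re.findall(r"[^\W_]+", name)]


def best_match(filename, database):
    """Pick the most popular title sharing the longest leading word run.

    database -- list of (title, popularity) pairs; higher means more popular.
    """
    file_words = words(filename)
    # Try the longest prefix of the file's words first, then shorter ones.
    for n in range(len(file_words), 0, -1):
        prefix = file_words[:n]
        candidates = [(title, pop) for title, pop in database
                      if words(title)[:n] == prefix]
        if candidates:
            return max(candidates, key=lambda c: c[1])[0]
    return None


# Toy database: popularity values and the "Xyz" title are made up.
database = [("La Dernière Femme", 50), ("La Dernière Xyz", 10),
            ("La Vérité", 80)]
match = best_match("La.Dernière.Fugue.avi", database)
```

With this toy data the file falls back to the two-word prefix "la dernière" and lands on "La Dernière Femme", which is the kind of false positive discussed in this thread when the real movie is missing from the database.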

Author:  (cool) Hector [ Sun Dec 26, 2010 11:20 pm ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

kepler42 wrote:
Unable to check filename with no spaces ( LesInvitésDeMonPère vs Les Invités De Mon Père)

Of course, our database doesn't have any movie starting with the word "LesInvitésDeMonPère".

Author:  (cool) Hector [ Sun Dec 26, 2010 11:21 pm ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

kepler42 wrote:
Unable to collate accented letters ( La Derniere Fugue vs La Dernière Fugue )

Please stop with "La dernière Fugue". This movie is simply not in our database. This has nothing to do with accented letters. The algorithm can handle accented letters.

Author:  (cool) Hector [ Sun Dec 26, 2010 11:23 pm ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

kepler42 wrote:
Unable to match missing ' ( Brooklyns Finest vs Brooklyn's Finest )

"Brooklyns" and "Brooklyn" are 2 different words.

Author:  (cool) Hector [ Sun Dec 26, 2010 11:26 pm ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

kepler42 wrote:
Requires character forbidden in Filename( the colon in 3:10 to Yuma )

Windows won't allow the ":" character. But the program recognizes perfectly "3 10 to Yuma" or "3.10 to Yuma". What's your file named ?

Author:  (cool) Hector [ Sun Dec 26, 2010 11:29 pm ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

kepler42 wrote:
This is just "unfixable" if you do not accept the idea that "El Algorithmo" could be improved some day.

Please don't say that. I'm OK with improving the algorithm if you give me a good example. But don't expect the algorithm to magically fix your typos for you.

Author:  (cool) Hector [ Sun Dec 26, 2010 11:31 pm ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

kepler42 wrote:
Inability to use (and parse) folders ( ex: Folder "Inception" containing CD1.avi and CD2.avi )

avi files need to have the movie title in their name. If the algorithm used the folder names, we would have many false results. The algorithm uses the folder names only in the case of VIDEO_TS folders.
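
As an illustration of this rule, the sketch below picks the text to match from a path. It assumes that, for DVD rips, the folder that contains VIDEO_TS carries the title; the post does not say which folder is used, and `title_source` is a hypothetical name.

```python
from pathlib import PurePosixPath


def title_source(path):
    """Return the text that would be matched against the database.

    Assumption: for files under a VIDEO_TS folder, the parent folder of
    VIDEO_TS is used; for every other file, only the file name (without
    extension) is used and folder names are ignored.
    """
    p = PurePosixPath(path)
    parts = p.parts
    for i, part in enumerate(parts):
        if part.upper() == "VIDEO_TS" and i > 0:
            return parts[i - 1]
    return p.stem


dvd = title_source("Movies/Inception/VIDEO_TS/VTS_01_1.VOB")
avi = title_source("Movies/Inception/CD1.avi")
```

Under this rule the file "CD1.avi" yields only "CD1", which is why it cannot be recognized even though its folder is named after the movie.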

Author:  (cool) Hector [ Sun Dec 26, 2010 11:33 pm ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

kepler42 wrote:
And, finally, I want to underline again that there are many circumstances where changing the filename to benefit from the feature is just NOT possible, and boring anyway.

Please give me some examples. I want examples. I need examples. For the moment, none of your examples were meaningful.

But don't expect miracles if you name your files like a pig :doh:

Author:  kepler42 [ Mon Dec 27, 2010 5:00 am ]
Post subject:  Re: "Scanning your Files": BUGS or IMPROVEMENTS ? or both ?

(cool) Hector wrote:
kepler42 wrote:
Unable to collate accented letters ( La Derniere Fugue vs La Dernière Fugue )

Please stop with "La dernière Fugue". This movie is simply not in our database. This has nothing to do with accented letters. The algorithm can handle accented letters.

I am not harassing you,
I was just trying to be helpful by quoting an example.
It was just bad luck that I fell on a missing movie.

Here is another example (La verite vs La vérité).
I DO NOT think "the algorithm" properly handles accented letters.
Why don't you use one of the public matching algorithms?
You would save trouble for you (and your users).
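
One example of a publicly available matcher is Python's standard `difflib` module, shown below purely as an illustration of the idea; nothing in this thread says which algorithm Coollector uses or could adopt. The `closest_title` helper and the 0.6 cutoff are assumptions.

```python
import difflib


def closest_title(file_title, db_titles, cutoff=0.6):
    """Return the database title most similar to file_title, or None.

    Uses difflib.get_close_matches on lowercased strings; cutoff is the
    minimum similarity ratio (0.0 to 1.0) required to accept a match.
    """
    lowered = {title.lower(): title for title in db_titles}
    hits = difflib.get_close_matches(file_title.lower(), list(lowered),
                                     n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


db_titles = ["La Vérité", "Brooklyn's Finest", "Les Invités De Mon Père"]
accent = closest_title("La verite", db_titles)
apostrophe = closest_title("Brooklyns Finest", db_titles)
```

A similarity cutoff trades false negatives for false positives, so a lower value would recognize more loosely named files at the price of more wrong matches to review.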

All times are UTC + 1 hour [ DST ]